We’ve been talking a bit about arrays the past few weeks. We’ve talked about how each element of an array is accessed by it’s index. Throughout that discussion, we’ve always presumed that the index is a number, specifically an integer.
But does it always have to be an integer? What if we had an array which could be indexed by some other value?
Freedom of Association
One way to think about the indexes of an array is to think about the connection between the index and the data. We normally say that a piece of data is stored under a specific index in an array. However, we can also restate that by saying we are associating the data with a specific index. This allows us to treat the index value itself as another piece of data, which can be of any type whatsoever.
This is a subtle but important shift in thinking. We associate different things together all the time in real life. Smells and sounds are often associated with strong memories. Circadian sleep cycles are associated with the presence and absence of light. Habits are formed and broken by repeating associations between behaviors and triggers.
By associating one experience with another, we create meaning and give importance to the things around us. Similarly for data, associating one piece of data with another helps us create more meaningful code.
Association in Action
An example might help here. Let’s say we wanted to store data about a person’s contact information in an array. Specifically, we need to store a name, address, and phone number. So we can create an array called contacts
that could look something like this:
Index | 0 | 1 | 2 |
---|---|---|---|
Data | Joe Cool | 123 Cool Drive, New York | (212) 555-1212 |
The name of the first contact is accessed as contacts[0]
, the address as contacts[1]
, and the phone number as contacts[2]
.
Now what if we wanted to add more people? We could keep adding indexes…
Index | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
Data | Joe Cool | 123 Cool Drive, New York | (212) 555-1212 | Jane Rad | 456 Rad Way, New York | (212) 555-3434 |
Accessing the first contact is the same, but the second contact’s name is under contacts[3]
, their address as contacts[4]
, and their phone number as contacts[5]
.
Even with just two people, it’s starting to get confusing. How do you lookup a contact and find the phone number? Which index contains the address for Jane Rad? You could do some math to figure out the indexes where the names and phone numbers are stored, but there’s another complication.
Let’s say Joe Cool has two phone numbers. How do we do that? We can add another entry to the array:
Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
Data | Joe Cool | 123 Cool Drive, New York | (212) 555-1212 | (203) 123-4567 | Jane Rad | 456 Rad Way, New York | (212) 555-3434 |
But now the math for looking things up won’t work. Jane Rad’s name has now moved to contacts[4]
, which moves her address and phone number as well. So maybe instead you put the second phone number in the same place as the first:
Index | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
Data | Joe Cool | 123 Cool Drive, New York | (212) 555-1212, (203) 123-4567 | Jane Rad | 456 Rad Way, New York | (212) 555-3434 |
Now you have to do some work to figure out which number you want.
You could even add a second entry for Joe Cool with the second number, but what happens when he moves and the address changes? You need to change both entries, if you remember you have two entries.
There is a better solution for this than a standard array. They’re called various things in various programming languages, but for this article, we’ll call them dictionaries.
Look It Up
Why dictionary? Because a dictionary associates specific words with other data, like proununciation, definition, and history. The word acts like a key, which you use to find the rest of the data you need. In fact, many languages use the term key to refer to the index value you use to find data in the dictionary.
So how would this work for our contact information data? First, let’s think about storing just a single person’s contact information in a dictionary. Again, we want to store a name, address, and phone number. So you create a dictionary called contact1
with the following keys and associated data:
Key | Data |
---|---|
Name | Joe Cool |
Address | 123 Cool Drive, New York |
Phone | (212) 555-1212 |
If you want to find the name of this contact, you would use contact1["Name"]
, which is "Joe Cool". The address is in contact1["address"]
and the phone number is contact1["phone"]
.
We can do the same thing for the second contact, creating a similar dictionary called contact2
:
Key | Data |
---|---|
Name | Jane Rad |
Address | 456 Rad Way, New York |
Phone | (212) 555-3434 |
As with the first entry, the name is contact2["name"]
, the address is in contact2["address"]
, and the phone number is contact2["phone"]
.
We can create dictionaries for as many contacts as we have. However, the next question is how do we bring these different contacts together into a single place? The variable names are a clue. They lead us to use a regular array to store each dictionary.
You can create an array called contacts
to store them all:
Index | 0 | 1 |
---|---|---|
Data | contact1 |
contact2 |
In reality, the data stored is the address of the dictionaries we created, but this diagram works for visualizing what is going on.
So how do we find the name of the first contact? When we created the dictionary for that contact, we called it contact1
. Now, that dictionary is stored as contacts[1]
, so you can say contacts[1]["name"]
.
Of course, different languages have different ways of referring to nested data structures, but in general, you start with the index or key of the containing structure (in this case, a normal array), followed by the index or key of the structure stored at that location (in this case, a dictionary).
Expanding On a Theme
So how does this solve the problem of Joe Cool having a second phone number?
Remember, dictionaries allow us to associate any key with any piece of data. This allows us to attack this problem in two different ways.
The most obvious is to add a new key for the second phone number:
Key | Data |
---|---|
Name | Joe Cool |
Address | 123 Cool Drive, New York |
Phone | (212) 555-1212 |
Phone2 | (203) 123-4567 |
While this is a quick and effective way to handle it, it only allows one extra number. What if Jane had three numbers?
Another way to handle this is to change the data which is stored:
Key | Data |
---|---|
Name | Joe Cool |
Address | 123 Cool Drive, New York |
Phone | array("(212) 555-1212", "(203) 123-4567") |
Now instead of storing a single number, the phone
key stores another array, which can have as many numbers as necessary.
Of course, this solution only stores the numbers. It doesn’t tell us which is which. For example, is the first number a mobile number, or a fax number? Is there a work number? There’s no association between the data which is stored and what that data means.
So…. What if we made that a dictionary instead?
Key | Data |
---|---|
Name | Joe Cool |
Address | 123 Cool Drive, New York |
Phone | dict("work: (212) 555-1212", "mobile: (203) 123-4567") |
Of course, this makes accessing the proper phone number a little more complicated, but good key names can help manage the complexity. For example, Joe Cool’s work phone number is referenced as contacts[0]["phone"]["work"]
. Compare that to the regular array example of where Joe’s work number is stored as contacts[2]
.
In most languages, you can nest arrays and associative arrays as much as you want. Of course, with great power comes great responsibility, but you already knew that…
A Rose By Any Other Name
Not every programming language calls these structures dictionaries. You may read or hear the terms associative array, map, or even symbol table. They all refer to the same thing, namely a way to associate one piece of data with another in a formal data structure.
Next Time
We expand on the ideas here to show other ways you can store different types of data in a single data structure.