It’s been a long time since I posted here. I keep trying to make it a regular habit, but have found it difficult, especially around the holidays. I won’t bore you with another resolution to make this a more regular feature. I will simply try to do better. Whether I succeed or not is up to you to decide.
In any case, I want to wrap up the discussion of storing different kinds of data in a single data structure.
Dynamic v. Static
Recall from my last post that I described a structure called a dictionary (also called a map or associative array). A dictionary is like a normal array, except that the indices aren’t numbers but other data, such as strings. For example, a dictionary can be used to hold contact information:
Key | Data |
---|---|
Name | Joe Cool |
Address | 123 Cool Drive, New York |
Phone | (212) 555-1212 |
In a dictionary structure, keys are used to access each individual datum. Together, the collection of keys and the data they reference make up the dictionary.
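As a minimal sketch in Python, the table above could be written as a dictionary literal. The variable name contact is an assumption, chosen to match the examples that follow.

```python
# The contact table as a Python dictionary: each key maps to one datum.
contact = {
    "Name": "Joe Cool",
    "Address": "123 Cool Drive, New York",
    "Phone": "(212) 555-1212",
}

# Look up a datum by its key, much like indexing an array by number.
phone = contact["Phone"]
print(phone)

# The keys can be listed on their own, separately from the data.
keys = list(contact.keys())
print(keys)
```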
Like an array, a dictionary can be modified at will. You can add both new keys and new data. For example, if you want to start tracking email addresses in your contact list, you can execute something as simple as:
contact["email"] = "joe@cool.com"
This bit of Python code adds a new key/datum pair to the dictionary called contact. You can also delete a key, if you want. In other words, you can dynamically change the data structure.
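Here is a small sketch of both operations together. The starting contents of the dictionary are made up for illustration; del is Python's standard way to remove a key.

```python
# A starting dictionary with two keys (illustrative values).
contact = {"name": "Joe Cool", "phone": "(212) 555-1212"}

# Add a new key/datum pair.
contact["email"] = "joe@cool.com"

# Remove a key, and its datum along with it.
del contact["phone"]

print(contact)
```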
This is both a strength and a weakness of the dictionary structure. Being able to dynamically modify the structure adds a lot of flexibility to your code. However, it also requires you to handle cases where a key may or may not be present.
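To make that concrete, here is one way the "key may be missing" cases might be handled in Python. The fallback strings and the fax key are placeholders I made up for the example.

```python
contact = {"name": "Joe Cool", "phone": "(212) 555-1212"}

# Option 1: test for the key before using it.
if "email" in contact:
    email = contact["email"]
else:
    email = "(no email on file)"

# Option 2: get() returns a fallback instead of raising an error.
address = contact.get("address", "(no address on file)")

# Without either guard, asking for a missing key raises KeyError.
try:
    contact["fax"]
    missing_key_raised = False
except KeyError:
    missing_key_raised = True

print(email, address, missing_key_raised)
```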
If you know you will never need to add new keys to the data you track, you can define a static structure. This requires you to define the keys you will use beforehand, but it also ensures that those keys will be present anytime you need to process the structure.
Artificial constructs
Different languages handle static data structures differently. The basic form is best illustrated in a pseudo-C language.
Let’s assume we are tracking our contact information. We can define the following structure:
struct contact_info {
    String name;
    String address;
    String phone;
    String email;
};
struct contact_info contact;
In C, the struct keyword defines a single data structure (here called contact_info) which can hold multiple data of different types. You then define a variable, contact, which holds the data in one of these static data structures.
Accessing the data in a struct is a little different as well. Let’s say we want to access the address of our contact. We can do that as:
print(contact.address);
The struct isn’t an array of any kind, so we don’t need brackets. Since the key name is statically defined, you don’t need to wrap it in quotes. The dot replaces both of them, and tells the computer to find the datum associated with the key called address in the variable contact.
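If you are following along in Python rather than C, a rough equivalent can be sketched with a dataclass, which is about as close as Python gets to a struct. The class name ContactInfo is my own choice for this illustration, not anything standard.

```python
from dataclasses import dataclass


@dataclass
class ContactInfo:
    # The field names are fixed when the class is defined.
    name: str
    address: str
    phone: str
    email: str


contact = ContactInfo(
    name="Joe Cool",
    address="123 Cool Drive, New York",
    phone="(212) 555-1212",
    email="joe@cool.com",
)

# Dot access: no brackets, and no quotes around the key name.
print(contact.address)

# Leaving out a field fails right away, so every key is always present.
try:
    ContactInfo(name="Joe Cool")
    construction_failed = False
except TypeError:
    construction_failed = True
```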
Of course, holding one contact is relatively useless. You can have an array of structs, if you want:
struct contact_info[] contacts;
...
print(contacts[4].name);
Since contacts is an array, you need to find the specific element to access (in this case, contacts[4]), then use the dot to find the key called name, then print that datum.
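The same two-step lookup can be sketched in Python with a list of dataclass instances. The ContactInfo class and the five placeholder contacts are invented for the example; only the contacts[4].name access mirrors the code above.

```python
from dataclasses import dataclass


@dataclass
class ContactInfo:
    name: str
    address: str
    phone: str
    email: str


# Build a list of five placeholder contacts (indices 0 through 4).
contacts = [
    ContactInfo(
        name=f"Contact {i}",
        address=f"{i} Example Street",
        phone="(212) 555-1212",
        email=f"contact{i}@example.com",
    )
    for i in range(5)
]

# First pick the element with brackets, then pick the field with the dot.
print(contacts[4].name)
```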
So why use a struct instead of a dictionary? Or vice versa?
Limits
When to use a static data structure over a dynamic data structure is largely influenced by your use case.
For example, if you control both the program and the composition of the data it will process, a static data structure may lead to better performance and less code complexity. Examining the contents of a folder on your hard drive is one such case: the data about each file (called metadata) is well defined in advance by your operating system, so a static data structure fits it well.
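Python's standard library offers a handy illustration of this: os.stat returns a structure whose fields (such as st_size and st_mtime) are defined in advance rather than looked up by arbitrary keys. The sketch below uses a throwaway temporary file so that it doesn't depend on anything on your machine.

```python
import os
import tempfile

# Create a throwaway file with known contents (five bytes).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    path = tmp.name

# os.stat returns a structure with a fixed, predefined set of fields.
info = os.stat(path)
size_in_bytes = info.st_size
modified_time = info.st_mtime  # last-modified time, in seconds since the epoch

print(size_in_bytes)

# Clean up the temporary file.
os.remove(path)
```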
On the other hand, if the data is not under your control, you may need a dynamic structure to manage it fully. The contact tracking system we’ve been working with is a great example, since the user may wish to add new kinds of data for any contact.
Not all languages support these data structures natively. C does not have a built-in dictionary type, although implementing one is relatively easy. Python, being inherently dynamic, doesn’t have a formal struct type, although the dataclass comes closest. Java supports both data-only classes (called records) and dictionaries, which it calls maps (for example, HashMap).
What Next?
No matter what language you choose, and what you are trying to do, understanding how data is stored at a basic level will help you decide how to do it.
But now, we turn to other matters. I originally started writing this blog to explain why arrays in some languages use indices starting with zero, while others start with one. Trying to answer that simple question requires a knowledge of how memory works, which is where this all started.
However, I find that in order to keep writing consistently, I need to do two things:
- Write shorter articles.
- Write about things I am currently engaged with.
To that end, the next article will be a cautionary tale about writing software and making sure we get it right.
Then we’ll do some fun stuff in a specific language.