I Need Arrays

For the past little while, we’ve been talking about how computers store data in memory, and in particular how to store a single piece of data in a variable, like a = 437 or name = "Jon".

But what if you need to store a lot of data in a single variable?

Wait, why would you even need to do that?

At an Average Pace

I’m a runner. I’ve run lots of races, IRL and virtual, including one marathon so far. It’s not for everyone, but it’s something I do.

One aspect of running is tracking my slowest, fastest, and average pace for different race lengths. I track every race I run, the distance I ran (5K, 10K, half-marathon, and so on), and my finishing time for the race. From there, I can calculate my average pace for the race, while my Polar M430 running watch figures out my fastest and slowest splits per kilometer. All of this data comes together to help me set my expectations for later races.

There are a few ways to track this data. I could track each race in a set of different variables:

race1_distance = 5    # This was a 5K
race1_time = 30.50    # Time in minutes, so this is 00:30:30
race1_average = race1_time / race1_distance

race2_distance = 10   # This was a 10K
race2_time = 65.25    # Time in minutes, so this is 01:05:15
race2_average = race2_time / race2_distance

overall_average = (race1_average + race2_average) / 2

This works great for these two races, but what happens when I add a third race? Now I need to not only add race3_distance, race3_time, and race3_average, but also modify overall_average:

race3_distance = 5   # This was a 5K
race3_time = 29.80   # Time in minutes, so this is 00:29:48
race3_average = race3_time / race3_distance

overall_average = (race1_average + race2_average + race3_average) / 3

This still works, but I’ve run 20 or so races so far, and I plan to run more in the future. So now I need to type a lot of variable names and hope I don’t make any mistakes. This also doesn’t tell me what my fastest average time is, or let me figure out my fastest 5K, fastest 10K, etc. And anytime I need to add a new race, I need to change the equation calculating my average.

If only there was a better way.

You don’t have to call me array…

What if, instead of adding a bunch of individual variables for each race, you had just a single variable for each category of data you wanted to store? You would store all the distances in a race_distance variable, and all the times in a race_time variable. You could also store the calculated averages in a separate race_average variable.

But how? That’s where an array comes in.

An array (sometimes called a list as well) is a single variable that can hold many different values. You access the different values using a numeric value called an index. You can store any type of data in an array, but the array usually only holds one type of data.

Here’s a quick example of an array and index used to hold a string value:

contact_name[4] = "John Smith"

The array variable is contact_name. The index is 4, which means you store the value "John Smith" in the fourth slot of the array. Here’s what that might look like in memory:

[Figure: What an array looks like conceptually]
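
To make the slot idea concrete, here is a minimal sketch in Python, where arrays are called lists. Two things in it are assumptions that go beyond the example above: Python counts indexes from 0 rather than 1, and a slot has to exist before you can assign to it, so the sketch first creates a list with five empty slots. The size of five is arbitrary.

```python
# Assumption: five empty slots, numbered 0 through 4. The size is arbitrary.
contact_name = [None] * 5

# Store a value in the slot with index 4.
contact_name[4] = "John Smith"

# Read the value back using the same index.
print(contact_name[4])

# Because Python starts counting at 0, index 4 is the last of the five slots.
print(len(contact_name))
```

Other languages number their slots differently, so treat the exact position as a detail of the language you are using.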

So going back to the running example, instead of using race1_time and race1_distance, let’s use an array. Again, in pseudo-code:

race_distance = array()
race_time = array()
race_average = array()

race_distance[1] = 5    # This was a 5K
race_time[1] = 30.50    # Time in minutes, so this is 00:30:30
race_average[1] = race_time[1] / race_distance[1]

race_distance[2] = 10   # This was a 10K
race_time[2] = 65.25    # Time in minutes, so this is 01:05:15
race_average[2] = race_time[2] / race_distance[2]

It’s not that much better, is it? You still have to type the numbers for the index, and you still have to calculate the race average for each. Or do you?

The index is just a number, which means you could, if you wanted, use another variable as the index. We’ll see how to do that next week, when we cover loops.
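
As a small preview that stops short of loops, here is a sketch in Python of the race arrays with a variable used as the index. It assumes Python lists, which are numbered from 0 rather than from 1 as in the pseudo-code above, and the variable name i is just an illustrative choice.

```python
# The same race data as above, stored in Python lists.
# Note: Python numbers slots from 0, so the first race is at index 0.
race_distance = [5, 10]        # Distances in kilometers
race_time = [30.50, 65.25]     # Times in minutes
race_average = [0.0, 0.0]      # Placeholders, filled in below

# i is a hypothetical variable name that holds the index.
i = 0
race_average[i] = race_time[i] / race_distance[i]

# Change the index, and the exact same line works for the next race.
i = 1
race_average[i] = race_time[i] / race_distance[i]

print(race_average)
```

The line that calculates the average is identical both times. Only the value of i changes.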