AoC Day 3: Learning About HashMaps

Remember when I said last week that if you can get the data out of the input file and into a good internal format, you were 90% of the way to solving the problem? Well, Day 3 reinforces that by changing the problem from reading and processing individual lines to reading and processing a whole grid.

You need to identify all the numbers in a grid which are engine part numbers. You are given a grid like this:

467..114..
...*......
..35..633.
......#...
617*......
.....+.58.
..592.....
......755.
...$.*....
.664.598..

The periods (.) represent blank spaces. Any number which is adjacent to a symbol other than a period (.) is a part number. By adjacent, we mean in all directions, including diagonally. In the example given, only the numbers 114 and 58 are not adjacent to any other symbol — adding the others gives you your answer.

Approach

My approach for this was fairly straight forward:

  • Start scanning the grid from left to right, top to bottom, starting in the upper left corner.
  • When I find a number
    • Scan until I find the end of the number
    • Parse that number
    • Scan all the characters around the number
    • If any are not periods ('.'), then add them to the final answer

To aid in this algorithm, I created a Vec<Vec<char>> from all the lines in the input. In Rust, a Vec is not unlike a list in Python, or an ArrayList in Java — in short, they are dynamic resizeable arrays. This structure makes scanning the grid easier, as I can treat it like a two-dimensional plane.

At this point, it’s safe to say that I overcomplicated things by using a struct with a custom method to do this. Three guesses which part of the Rust book I was on at the time. I avoided this technique in future solutions, unless it really helped.

Of course, my rough approach above leaves some smaller problems to solve.

To find a number, I scan until I find a digit, then find the last digit in the number by scanning forward until I see something that isn’t a digit. I collect those characters into a String so I can parse it as a usize:

let mut last_x = x;
while last_x < line.len() && engine.scheme[y][last_x].is_digit(10){
    last_x += 1;
}

// Collect the slice into a string and parse it into a usize
let num = line[x..last_x].iter().collect::<String>().parse::<usize>().unwrap();

Now I have two important pieces of data — the actual part number, but also where it lives on the grid. This helps me find the symbols around the number.

Each part number is confined to a single line y, so I just need to look at the line above and below it, or y-1 and y+1. I also know the part number starts at x and stops at last_x, so I can examine every character from (x-1..last_x+1) horizontally, and (y-1..y+1) vertically. If I find something that isn’t part of the number or a blank, I know this is a valid part number:

let mut good = false;
for scan_y in start_y..end_y+1{
    for scan_x in start_x..end_x+1{
        if engine.scheme[scan_y][scan_x]!='.' &&
          !engine.scheme[scan_y][scan_x].is_digit(10){
            good = true;
        }
    }
}

If there are any symbols in this block, good is set to true and I add the part number to the final sum.

Part 2

Part 2 complicates things by introducing the concept of gears.

Any '*' symbol on the grid is a gear, but only if it is next to two and only two numbers. So returning to the sample above, even though there are three '*' characters in the grid, only two of them are gears. One connects 467 and 35, and the other connects 755 and 598. The '*' next to 617 isn’t a gear since it doesn’t connect two numbers.

Of course, all gears have gear ratios — you calculate that by multiplying the part numbers connected by a gear. Add all the gear ratios for your final answer.

So how did I do this? It’s easy enough to find numbers next to gears — but how do I know how many numbers are next to gears? And how do I know if there are precisely two numbers, not one or three?

Let me introduce you to the magic of the HashMap.

HashMaps

Most languages have the concept of the hash map. In Python they are called dictionaries, and some languages call them associative arrays.

The idea behind the hash map is simple. You associate data you want to store with a key that identifies that data. Using an actual dictionary as an analogy, you find the definition of a word (the data) by looking it up using the word itself as the key. The words themselves are listed in the dictionary in alphabetical order, making them easy to find. In a hash map, the keys are located using a hash function.

A hash function is a function which converts data in some format into a number. That number can then be used as the index into an array.

So what a hash map does is take your key, turn it into a number using a hash function, then use that number as the index in another structure like an array to store your data.

Why such a convoluted process, I hear you ask. Why not just store the data in an array and search for it? Simple — because searching for data is a lot slower than computing the final place where the data lives. They are called computers, not finders, after all…

Think about this analogy — you are standing in front of 101 Main Street. You need to get to 1205 Main Street. Which is faster — looking at each house number starting with 101, and stopping when you get to 1203? Or simply skipping ahead 11 blocks, then going to the second house you see?

The defense rests.

So how do I use the hash map here? What is the key, and what is the data?

Gearing Up

I use the same code I used in Part 1 to find numbers for Part 2. However, instead of looking for anything that isn’t a period or a number, I look specifically for gears.

The hash map I use tracks the location of each gear as a tuple of the x and y location of the gear in the grid. The data is a Vec<usize> containing all the adjacent numbers:

// Track all of our gears here
// Keys are tuples of the coordinates of the gear, 
// Data is a vector of numbers adjacent to it.
let mut gears: HashMap<(usize, usize), Vec<usize>> = HashMap::new();

Here’s the logic behind this structure.

Once I find a number, I start looking for any gears adjacent to the number. Each gear lives in only one specific location in the grid, no matter what part number is adjacent to it. So, if I find a gear at (x,y) for one adjacent number, it will be at the same (x,y) when I find it for another adjacent number.

This makes the gear location an ideal key for a hashmap, since it never changes locations.

I put the coordinates of the gear in a tuple to use as the key, and look for it in the hash map. If I find it, I add the current number to the Vec data for this particular gear. If this is the first time I’m seeing this gear, I add it to the gears hash map first:

for scan_y in start_y..end_y+1{
    for scan_x in start_x..end_x+1{
        // Did we find a gear?
        if engine.scheme[scan_y][scan_x]=='*'{
            // Add this gear if we haven't seen it before
            if !gears.contains_key(&(scan_y, scan_x)){
                gears.insert((scan_y, scan_x), vec![num]);
            } else {
                // Add this number to the vec of the existing gear
                gears.get_mut(&(scan_y, scan_x)).unwrap().push(num);
            }
        }
    }
}

The .get_mut() call returns a mutable reference to the Vec data. Without that, I wouldn’t be able to add the new number to the Vec. Rust!

OK, so what, I hear you say. This still doesn’t solve the problem — how do you figure out if a gear connects only two numbers? Well, that’s actually really easy now.

All I do is get every key-value pair out of the hash map. Any Vec values with exactly two members represent gears which connect two and only two numbers:

// Now we can scan the gears hashmap
// Any item with exactly two numbers is what we want
for (gear, numbers) in &gears{
    if numbers.len() == 2 {
        sum += numbers[0]*numbers[1];
    }
}

Technically, I don’t need the variable gear above — the loop could be for (_, numbers) in &gears, or even for numbers in &gears.values(). But I didn’t think about that until later.

Summary

Check out my final code here, and let me know what you think — is there an improvement you would make?

This is a great example of how using the right data structures can make solving a problem easier. Without knowing how hash maps work, part 2 would have been a much more difficult problem to solve.

It was also a great way to learn Rust. As I progressed with the contest and revisited older problems, I saw ways to improve my Rust code, although I applied those lessons moving forward.