For this one, I solved part 1 twice — the first time, I used regular expressions, and I used recursion for the second time.
The first time, I used what I thought was the right approach — surprise, it wasn’t. Then a quick scan of Reddit uncovered a better solution which also worked well for part 2.
The Problem
Here, your task is to identify broken hot springs. As always, read the description for the complete details.
You are given a string consisting of two parts. The first part is a pattern consisting of three characters:
- A
'.'
indicates a known working hot spring. - A
'#'
indicates a known broken hot spring. - A
'?'
indicates a hot spring of unknown condition.
After each pattern, a sequence of comma-separated numbers is listed, where each number indicates how many contiguous damaged springs there are in a group. Each group of damaged springs has at least one undamaged spring in between them. Here’s an example:
???.### 1,1,3
The pattern shows that you don’t know if the first three springs are functional or not. The fourth spring works, and the last three are damaged. The numbers indicate that there are three groups of damaged springs. The first group has one damaged spring, the second has one as well, and the last has three damaged springs.
You have to use this information to figure out how many different arrangements of working and damaged springs there are.
So how to do this?
The Approach, v1
My first thought was that this looks like a regular expression problem.
Using this theory, I created a regular expression from the list of numbers, then generated every possible combination of strings from the pattern. Every match was counted, and the final answer displayed.
This had the advantage of being fairly straightforward to code, once I found the crate for handling regular expressions in Rust.
The problem was that it was slooow — it took at least eight seconds to solve my input, which is ridiculously slow for this. I need to generate $2^{x}$ strings to test, where $x$ is the number of '?'
characters in the string, which is time consuming and memory intensive. It also didn’t allow me to apply any other time-saving techniques, as you’ll see later.
I won’t go into detail about this code here, since it was slow for part one, and completely unworkable for part 2. You can see how this worked here if you are interested.
The Approach, v2
I did some digging into Reddit, and found this thread which started talking about a recursive solution. I wasn’t sure myself how to apply it, so I read a bit more, and found an approach which worked for me.
This solution relies on two key pieces of info:
- We don’t care about working springs — they can be safely ignored for the most part.
- When checking for a group of damaged springs, we can treat wildcards as damaged springs.
You’ll see how these play out below.
The recursive solution works like this (Warning — lots of code):
fn count_arrangements_rec(pattern: Vec<char>, counts: Vec<usize>) -> usize {
// Are we done?
if pattern.is_empty() && counts.is_empty() { return 1; }
// If only the pattern is empty, we've got another problem
else if pattern.is_empty() && !counts.is_empty() { return 0; }
// If the string starts with a '.', drop it and keep going
else if pattern[0] == '.' {
return count_arrangements_rec((pattern[1..]).to_vec(), counts);
}
// If it starts with a '?', we have to check both possibilities
else if pattern[0] == '?' {
let mut p1 = pattern.clone();
let mut p2 = pattern.clone();
p1[0] = '.';
p2[0] = '#';
return count_arrangements_rec(p1, counts.clone())
+ count_arrangements_rec(p2, counts.clone());
}
// If it starts with a '#', we need to do some checking
else if pattern[0] == '#' {
// How many '#' do we need?
// Count from here to that number
// If we find enough '#' or '?' chars, we can pull those
// Else it's a no bueno
// If we don't have any counts, we're done
if counts.is_empty() { return 0; }
for i in 0..counts[0] {
// Are we past the end of the pattern? No bueno
if i == pattern.len() { return 0; }
// Did we find a '.' before we ran out of chars? No bueno
if pattern[i] == '.' { return 0; }
}
// We can keep going
// Remove count chars
let mut p1 = pattern[counts[0]..].to_vec();
// If the next char is a '?', change it to a '.'
// for the end of the pattern
if !p1.is_empty() && p1[0] == '?' { p1[0] = '.'; }
// If the next char is a '#', we've didn't clear the whole pattern
if !p1.is_empty() && p1[0] == '#' { return 0; }
// Else we can continue
return count_arrangements_rec(p1, counts[1..].to_vec());
}
0
}
For each pattern and set of numbers (which I call pattern
and counts
), do the following:
- If both
pattern
andcounts
are empty, this is a match, return 1. - If
pattern
is empty, but counts isn’t, this is not a match, return 0. - If the first character in
pattern
is a.
(a working spring), remove it frompattern
and check what’s left. - If the first character in
pattern
is a?
(a wildcard), create two new patterns:- One substituting a
#
for the wildcard - One substituting a
.
for the wildcard
Then check each new pattern
- One substituting a
- If the first character is a
#
(a damaged spring)- Get the first element of
counts
(call itcount
) - Check the first
count
characters of the pattern are either#
or?
- If so
- Remove
count
characters frompattern
- If the first remaining character is a
#
, this is not a match, return 0. - If the first remaining character is a
?
- Change it to
.
to end the pattern - Remove the first element from
counts
- Check the new
pattern
andcounts
- Change it to
- Remove
- Else this is not a match, return 0.
- Get the first element of
Notice how many places return 0 — there are plenty of places where I don’t need to check the whole pattern. As soon as part of the pattern fails, the whole thing fails. Add in a bunch of boundary checking to avoid panics, and I had a solution which worked in a fraction of the time as the regular expression version. You can see the whole thing here.
Part 2
The complication with part two was typical — solve the same thing, but bigger. The "bigger" this time was five-fold — the pattern was five copies of the original pattern, separated by ?
characters, and the damaged group counts were five copies as well. This makes the original example:
???.### 1,1,3
Look like this:
???.###????.###????.###????.###????.### 1,1,3,1,1,3,1,1,3,1,1,3,1,1,3
Here is where the original regex solution falls down — the first pattern has three wildcards leading to $2^{3} = 8$ patterns to check. The part two pattern has $2^{19} = 524,288$ patterns to check. And that’s just the first example pattern — my actual input file had lines with 19 wildcards. That turns into $19\times{5}+{4} = 99$ wildcards, or $2^{99}\approx 6.338\mathrm{e}{29}$ combinations). It was after waiting a full minute with no answer to the first sample string, and my computer cooling fans spinning up for take-off on a short runway, that I decided to abandon the regular expression path.
However, the recursive solution wouldn’t be much better either. While I didn’t need to check all 633 octillion patterns in the worst case, there were still a lot of calculations I needed to do. Repeating those calculations would still be time-consuming, and the recursive solution wouldn’t work.
At least, not on it’s own.
Take a Memo(ize)
Let’s say you want to calculate Fibonacci numbers — these are numbers in the following sequence:
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...
Each number in the sequence is generated by adding the two previous numbers. The sequence starts with 0
and 1
, and the rest is just adding.
This is great for a recursive solution — here’s a Python version which calculates the n
th Fibonacci number:
def fib(n):
if n == 0:
return 0
if n == 1:
return 1
return fib(n-1) + fib(n-2)
Simple, right? Ask it for a Fibonacci number, and you get it. But there is a hidden problem.
Let’s look at what happens when you call fib(5)
, by tracking what other calls are made to the fib()
function:
fib(5)
needs to findfib(4)
andfib(3)
fib(4)
needs to findfib(3)
andfib(2)
fib(3)
needs to findfib(2)
andfib(1)
fib(2)
needs to findfib(1)
andfib(0)
fib(2)
needs to findfib(1)
andfib(0)
fib(3)
needs to findfib(2)
andfib(1)
fib(2)
needs to findfib(1)
andfib(0)
Notice how many times you call fib(3)
and fib(2)
. You are calculating the same number two or three times, which takes time and memory. Think about trying to find fib(10)
or fib(20)
this way, and how many times you’ll calculate fib(2)
that way.
But what if you stored the results of calculations as you went? That way, you wouldn’t need to recalculate them, but checked if they were stored instead? Take a look at this:
fib_store = {0:0, 1:1}
def fib(n):
if n in fib_store.keys:
return fib_store[n]
fib_store[n-2] = fib(n-2)
fib_store[n-1] = fib(n-1)
fib_store[n] = fib_store[n-1] + fib_store[n-2]
return fib_store[n]
The big difference here is the fib_store
dictionary, which holds the results of each calculation. The first thing this version does is look for prior results in the dictionary — if they exist, those are used instead. Only if the results aren’t there does this version make the recursive call to calculate the current number, storing the results in fib_store
.
Now let’s look at how the calls to fib(5)
go:
fib(5)
needs to calculatefib(4)
andfib(3)
fib(4)
needs to calculatefib(3)
andfib(2)
fib(3)
needs to calculatefib(2)
fib(2)
findsfib_store[1]
andfib_store[0]
, and createsfib_store[2]
fib(1)
is found atfib_store[1]
fib_store[3]
is created
fib(2)
is found atfib_store[2]
fib(3)
is found atfib_store[3]
fib_store[4]
is created
fib_store[5]
is created
Now, instead of calculating fib(2)
three times, it’s calculated only once — the other times it’s needed, it’s found in fib_store
. Imagine this same thing with fib(20)
, or even fib(1314)
.
This storing of intermediate results and looking them up for future calculations is called memoization, and is a key technique in making recursive algorithms work better and more efficiently.
So how did I make memoization work here?
External Crates
Rust has a great ecosystem of libraries available, just like Python. The crate system lets you download and install functionality from others you can use in your apps.
In this case, there is a memoize crate I installed (which I learned about from the same Reddit thread). After adding and use
ing the crate, all I had to do is add #[memoize]
attribute to my recursive function. As long as the arguments to the function are Clone
-able and Hash
-able, and the return value is Clone
-able, the crate takes care of the everything.
So what is this crate doing? It’s Clone
-ing the arguments, Hash
-ing them to make a simpler look-up, and storing the Clone
-d result in a HashMap
. Which is what you saw above with the Python example. And what I could do on my own as well. This just makes it easier.
So for Part 2, I generate the new pattern and count string, then just call the recursive solution with memoization. The result is a lightning quick answer — memoizing the results from previous calls meant I could look them up when necessary, which sped the solution up immensely.
Summary
My complete code is here, and I welcome all comments and PRs.
I’ll admit, I sometimes don’t know how to approach a problem, and I need some hints from time to time. Reddit is great for this, and as long as I don’t copy someone else’s code, it’s all good. I’m learning by doing, even if I do need some help.