Advent of Code is a free, annual coding event where each day in December brings a new set of coding and problem-solving challenges.
This year I've decided to blog my experience and the key learnings. I am coding it all in Python,
but many of the lessons and tactics are language-agnostic.
Day 1
The problem setup was the following:
... The newly-improved calibration document consists of lines of text; each line originally contained a specific calibration value that the Elves now need to recover. On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number.
For example:
1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet
In this example, the calibration values of these four lines are 12, 38, 15, and 77. Adding these together produces 142.
Consider your entire calibration document. What is the sum of all of the calibration values?
Part 1
Here's how I broke it down:
Identify all the digits and store them in a list
Extract the first and last digits
Sum them all up
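import re

def part1(lines):
    numbers = []
    for line in lines:
        # Collect every digit character on the line, in order.
        digits = re.findall(r'\d', line)
        first = digits[0]
        last = digits[-1]
        # Join as strings so '1' and '2' become 12, not 3.
        joined = int(f"{first}{last}")
        numbers.append(joined)
    print(sum(numbers))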
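As a quick sanity check, the same steps can be wrapped in a function that returns the total instead of printing it, which makes it easy to compare against the sample from the puzzle text. This is only a minimal sketch: the name calibration_sum is made up for illustration, and it assumes every line contains at least one digit.

```python
import re

def calibration_sum(lines):
    """Sum the two-digit values formed from each line's first and last digit.

    Hypothetical helper for illustration; assumes each line has a digit.
    """
    total = 0
    for line in lines:
        digits = re.findall(r"\d", line)
        # Join as strings: '1' and '2' give 12, not 3.
        total += int(digits[0] + digits[-1])
    return total

sample = ["1abc2", "pqr3stu8vwx", "a1b2c3d4e5f", "treb7uchet"]
print(calibration_sum(sample))  # 142, the total given in the puzzle example
```

Note that a line with a single digit, like treb7uchet, uses that digit as both first and last, giving 77.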
Part 2
This part added complexity. Within the input strings, there were digits literally spelled out, like one, two, three...
Simple enough: I just hardcoded a dictionary to convert the spelled-out numbers to ints, then adjusted the regex to search for either the spelled-out digits or actual digits.
string_to_number_dict = {
    'one': 1,
    'two': 2,
    'three': 3,
    'four': 4,
    'five': 5,
    'six': 6,
    'seven': 7,
    'eight': 8,
    'nine': 9,
}
And my regex pattern became:
digits = re.findall(r'(?=(one|two|three|four|five|six|seven|eight|nine|\d))', i)
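One optional variation, not what I did in my solution: since the words in the pattern are exactly the dictionary keys, the alternation can be generated from the dictionary so the two cannot drift apart. A small sketch, where the names words and pattern are just for illustration:

```python
import re

string_to_number_dict = {
    'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
    'six': 6, 'seven': 7, 'eight': 8, 'nine': 9,
}

# Join the keys with '|' so the alternation always mirrors the dictionary.
words = "|".join(string_to_number_dict)
pattern = re.compile(rf"(?=({words}|\d))")

print(pattern.pattern)
print(pattern.findall("eightwo3"))  # ['eight', 'two', '3']
```

The generated pattern is the same string as the hand-written one, because dictionaries keep their insertion order.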
Putting it all together I had this:
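# Assumes `import re` and string_to_number_dict from above.
def part2(lines):
    numbers = []
    for line in lines:
        # Debug output: show each line and what was matched on it.
        print("=" * 80)
        print(line)
        # The lookahead (?=...) lets overlapping words such as 'eightwo' both match.
        digits = re.findall(r'(?=(one|two|three|four|five|six|seven|eight|nine|\d))', line)
        print(digits)
        # Spelled-out words are longer than one character; map them through the dictionary.
        digits = [string_to_number_dict[d] if len(d) > 1 else int(d) for d in digits]
        print(digits)
        first = digits[0]
        last = digits[-1]
        joined = int(f"{first}{last}")
        print(joined)
        numbers.append(joined)
    print(sum(numbers))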
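To check the logic without all the debug printing, here is a rough, self-contained sketch of the per-line step. The names WORDS, PATTERN and calibration_value are made up for illustration, and the test lines are the example strings used later in this post rather than real puzzle input.

```python
import re

WORDS = {
    'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
    'six': 6, 'seven': 7, 'eight': 8, 'nine': 9,
}
PATTERN = re.compile(r"(?=(one|two|three|four|five|six|seven|eight|nine|\d))")

def calibration_value(line):
    """Return the two-digit value for one line, counting spelled-out digits.

    Hypothetical helper; assumes the line contains at least one digit or word.
    """
    tokens = PATTERN.findall(line)
    # Words are looked up in the dictionary; anything else is a plain digit.
    digits = [WORDS[t] if t in WORDS else int(t) for t in tokens]
    return digits[0] * 10 + digits[-1]

for line in ["sdpgz3five4seven6fiveh", "fiveight", "sevenine", "eightwo"]:
    print(line, calibration_value(line))
# sdpgz3five4seven6fiveh 35
# fiveight 58
# sevenine 79
# eightwo 82
```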
Overlapping Matches
The stumbling block in part 2 was the fact that some of the string inputs had overlapping matches. I'll explain with an example. Consider the setup below:
>>> inputs = ['sdpgz3five4seven6fiveh']
>>> digits = re.findall(r'(one|two|three|four|five|six|seven|eight|nine|\d)', inputs[0])
>>> print(digits)
['3', 'five', '4', 'seven', '6', 'five']
The regex pattern above is designed to match either 'one', 'two', 'three'... or any digit \d. And by manual review, it seems to work pretty well. It captures all the digits, and also 3 and five as the correct first and last digit, respectively.
However, what I overlooked in the sample input was that some inputs looked like the ones below. They contained spelled-out digits that overlap. In other words, two words share one letter.
>>> inputs = ['fiveight', 'sevenine', 'eightwo']
>>> digits = [m for i in inputs for m in re.findall(r'(one|two|three|four|five|six|seven|eight|nine|\d)', i)]
>>> print(digits)
['five', 'seven', 'eight']
When you run the regex on those inputs you only get one digit returned per input. But as you can see, each input technically contains two digits.
There is a clue as to why this is happening within the re docs. To quote the definition of re.findall():
Return all non-overlapping matches of pattern in string, as a list of strings or tuples.
The solution to handling overlapping pattern matches was to utilise what is called a positive lookahead.
Positive Lookahead
i = 'sdpgzeightwosvnfvh'
digits = re.findall(r'(one|two|three|four|five|six|seven|eight|nine|\d)', i)
With the above example, re.findall() will first come across 'eight' and then essentially continue searching from where that match finished.
sdpgzeightwosvnfvh
And so what is left to 'search' is the following:
wosvnfvh
And naturally it misses the overlapping two.
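You can see where the search resumes by using re.finditer(), which yields match objects with their positions instead of plain strings. A small sketch (the variable names are just for illustration):

```python
import re

pattern = re.compile(r"(one|two|three|four|five|six|seven|eight|nine|\d)")
line = "sdpgzeightwosvnfvh"

for m in pattern.finditer(line):
    # m.end() is the index where the scan resumes after this match.
    print(m.group(), m.span(), "remaining:", line[m.end():])
# eight (5, 10) remaining: wosvnfvh
```

The t at index 9 has already been consumed by eight, so by the time the scan resumes at index 10 there is no two left to find.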
When we utilise the positive lookahead, which is written as (?=...), it changes the search behaviour.
i = 'sdpgzeightwosvnfvh'
digits = re.findall(r'(?=(one|two|three|four|five|six|seven|eight|nine|\d))', i)
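The positive lookahead is saying: at every character, look ahead for any of my one|two|three.... If I find it, I report a match, then move one character forward and repeat my 'lookahead' search. Because the lookahead itself consumes no characters, the two in eightwo is still available after eight has been reported.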
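To make the "consumes no characters" part concrete, here is a small sketch using re.finditer() with the lookahead pattern. Each match is empty (its start and end are the same index), and the word itself comes from the capture group inside the lookahead. The variable names are just for illustration.

```python
import re

lookahead = re.compile(r"(?=(one|two|three|four|five|six|seven|eight|nine|\d))")
line = "sdpgzeightwosvnfvh"

for m in lookahead.finditer(line):
    # The match itself is empty (start == end); the word lives in group 1.
    print(m.start(), m.end(), repr(m.group(0)), m.group(1))
# 5 5 '' eight
# 9 9 '' two
```

The second match starts at index 9, the shared t, which is exactly the position the plain pattern had already skipped past.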