Advent of Code 2023: Day 1

Advent of Code is a free, annual coding event where, each day in December, there is a new set of coding and problem-solving challenges to solve.

This year I've decided to blog my experience and the key learnings. I am coding it all in Python, but many of the lessons and tactics used are language-agnostic.

Day 1

The problem setup was the following:

... The newly-improved calibration document consists of lines of text; each line originally contained a specific calibration value that the Elves now need to recover. On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number.

For example:

1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet

In this example, the calibration values of these four lines are 12, 38, 15, and 77. Adding these together produces 142.

Consider your entire calibration document. What is the sum of all of the calibration values?
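The arithmetic in the example can be reproduced with a short sketch. `calibration_value` is a hypothetical helper name, not taken from the solution below; it simply joins the first and last digit characters of each line:

```python
import re

def calibration_value(line: str) -> int:
    """Join the first and last digit characters of a line into a two-digit number."""
    digits = re.findall(r'\d', line)
    # A line with a single digit uses it twice, e.g. 'treb7uchet' -> 77
    return int(digits[0] + digits[-1])

sample = ['1abc2', 'pqr3stu8vwx', 'a1b2c3d4e5f', 'treb7uchet']
values = [calibration_value(line) for line in sample]
print(values)       # [12, 38, 15, 77]
print(sum(values))  # 142
```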

Part 1

Here's how I broke it down:

Identify all the digits and store them in a list
Extract the first and last digits and combine them into a two-digit number
Sum them all up

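import re

def part1(lines):

    numbers = []

    for line in lines:
        # Find every digit character in the line, in order of appearance
        digits = re.findall(r'\d', line)
        first = digits[0]
        last = digits[-1]
        # Concatenate the two digits, e.g. '1' and '2' -> 12
        joined = int(f"{first}{last}")
        numbers.append(joined)

    print(sum(numbers))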

Part 2

This part added complexity. Within the input strings, some digits were spelled out with letters, like one, two, three...

Simple enough: I just hardcoded a dictionary to convert the spelled-out numbers to ints, then adjusted the regex to search for either the spelled-out digits or actual digits.

string_to_number_dict = {
    'one': 1,
    'two': 2,
    'three': 3,
    'four': 4,
    'five': 5,
    'six': 6,
    'seven': 7,
    'eight': 8,
    'nine': 9,
}
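Hardcoding is perfectly fine for nine entries. As a possible alternative (not what I used), the same mapping can be generated from a list of words with `enumerate(..., start=1)`, assuming the words are listed in numeric order; the regex alternation can then be built from that same list so the two never drift apart:

```python
words = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']

# enumerate(..., start=1) yields (1, 'one'), (2, 'two'), ...
string_to_number_dict = {word: value for value, word in enumerate(words, start=1)}

# Build the 'one|two|...|nine|\d' alternation from the same list
pattern = '|'.join(words) + r'|\d'
print(pattern)  # one|two|three|four|five|six|seven|eight|nine|\d
```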

And my regex pattern became:

digits = re.findall(r'(?=(one|two|three|four|five|six|seven|eight|nine|\d))', i)

Putting it all together I had this:

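import re

def part2(lines):

    numbers = []

    for line in lines:
        # (?=...) is a positive lookahead: it matches at every position without
        # consuming characters, so overlapping words like 'eightwo' both match
        digits = re.findall(r'(?=(one|two|three|four|five|six|seven|eight|nine|\d))', line)
        # Digit characters go through int(); spelled-out words use
        # string_to_number_dict (defined above)
        digits = [int(d) if d.isdigit() else string_to_number_dict[d] for d in digits]
        first = digits[0]
        last = digits[-1]
        joined = int(f"{first}{last}")
        numbers.append(joined)

    print(sum(numbers))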
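Because part2() prints rather than returns, a return-based variant is easier to check against known answers. The sketch below is a hypothetical refactor: the names WORDS, PATTERN and line_value are illustrative, not from the original code, and it swaps the string concatenation for digits[0] * 10 + digits[-1], which produces the same two-digit number. Here it is run on example strings from this post:

```python
import re

# Spelled-out digits mapped to their integer values
WORDS = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
         'six': 6, 'seven': 7, 'eight': 8, 'nine': 9}

# Positive lookahead so overlapping words such as 'eightwo' are both found
PATTERN = re.compile(r'(?=(one|two|three|four|five|six|seven|eight|nine|\d))')

def line_value(line: str) -> int:
    """Return the two-digit calibration value for one line under the part 2 rules."""
    found = PATTERN.findall(line)
    digits = [int(d) if d.isdigit() else WORDS[d] for d in found]
    return digits[0] * 10 + digits[-1]

examples = ['sdpgz3five4seven6fiveh', 'fiveight', 'sevenine', 'eightwo']
print([line_value(line) for line in examples])  # [35, 58, 79, 82]
```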

Overlapping Matches

The stumbling block in part 2 was that some of the input strings contained overlapping matches. I'll explain with an example; consider the setup below:

>>> import re
>>> line = 'sdpgz3five4seven6fiveh'
>>> digits = re.findall(r'(one|two|three|four|five|six|seven|eight|nine|\d)', line)
>>> print(digits)
['3', 'five', '4', 'seven', '6', 'five']

The regex pattern above is designed to match either 'one', 'two', 'three'... or any single digit \d. On manual review it seems to work pretty well: it captures all the digits, and correctly yields 3 and five as the first and last digits, respectively.

However, what I overlooked in the sample input was that some lines contained spelled-out digits that overlap, sharing one letter, like the inputs below.

>>> for line in ['fiveight', 'sevenine', 'eightwo']:
...     print(re.findall(r'(one|two|three|four|five|six|seven|eight|nine|\d)', line))
...
['five']
['seven']
['eight']

When you run the regex on each of those inputs, only one digit is returned, even though each string technically contains two.

The re docs give a clue as to why this happens. To quote the definition of re.findall():

Return all non-overlapping matches of pattern in string, as a list of strings or tuples.

re.findall() - Python 3.12.0 documentation
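A toy example (not from the puzzle input) makes "non-overlapping" concrete: 'aa' begins at indices 0, 1 and 2 of 'aaaa', yet findall only returns two matches, because each new search starts where the previous match ended. re.finditer() exposes the (start, end) span of each match, which shows this directly:

```python
import re

# 'aa' starts at indices 0, 1 and 2, but matches may not overlap
print(re.findall(r'aa', 'aaaa'))  # ['aa', 'aa']

# The second match starts exactly where the first one ended
spans = [m.span() for m in re.finditer(r'aa', 'aaaa')]
print(spans)  # [(0, 2), (2, 4)]
```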

The solution to handling overlapping pattern matches was to utilise what is called a positive lookahead.

Positive Lookahead

i = 'sdpgzeightwosvnfvh'
digits = re.findall(r'(one|two|three|four|five|six|seven|eight|nine|\d)', i)

In the example above, re.findall() first matches 'eight' and then continues searching from where that match finished.

sdpgzeightwosvnfvh

And so what is left to 'search' is the following:

wosvnfvh

And naturally it misses the overlapping two, because its 't' was already consumed as part of 'eight'.
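This consumption can be checked with re.finditer(): each match object records where the match ends, and slicing the string from that index shows exactly what is left to search. A small sketch:

```python
import re

pattern = r'(one|two|three|four|five|six|seven|eight|nine|\d)'
line = 'sdpgzeightwosvnfvh'

matches = list(re.finditer(pattern, line))
print([(m.group(1), m.span()) for m in matches])  # [('eight', (5, 10))]

# The 't' at index 9 belongs to 'eight', so the scan resumes at index 10
print(line[matches[0].end():])  # wosvnfvh
```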

When we wrap the pattern in a positive lookahead, written as (?=...), it changes the search behaviour.

i = 'sdpgzeightwosvnfvh'
digits = re.findall(r'(?=(one|two|three|four|five|six|seven|eight|nine|\d))', i)

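The positive lookahead is saying: at every character position, look ahead for any of one|two|three...|\d without consuming any characters. If I find one, I report the captured text as a match, then move one character forward and repeat my 'lookahead' search. Because nothing is consumed, the 't' at the end of 'eight' is still available to start 'two', so this example returns ['eight', 'two'].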
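One detail worth seeing: the lookahead itself matches an empty string, so every match has zero width (start equals end), and findall returns the contents of the capturing group instead. A quick check with re.finditer():

```python
import re

pattern = r'(?=(one|two|three|four|five|six|seven|eight|nine|\d))'
line = 'sdpgzeightwosvnfvh'

# span() is zero-width; group(1) holds the text the lookahead saw
matches = [(m.span(), m.group(1)) for m in re.finditer(pattern, line)]
print(matches)  # [((5, 5), 'eight'), ((9, 9), 'two')]

print(re.findall(pattern, line))  # ['eight', 'two']
```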