It all started as a simple error message from a piece of software that had run seamlessly thousands of times over the past two years.
And yet, once a week in late 2016, this error message would rear its ugly head. At first, it was thought to be a user error. But after some testing, it was clear that we were dealing with a software bug. Thus began a month-long hunt for this elusive bug, one that became as much a personal vendetta as a search for a solution to an intractable problem.
Software bugs are a fact of life, born out of the ever-increasing complexity of our world. We work hard to reduce their frequency and impact, but bugs can never be eliminated entirely.
In our case, what complicated matters was our inability to replicate the problem, which would have allowed us to reliably test and fix the issue. No matter how many times we tried, we simply could not reproduce the bug, and yet, it continued to haunt us week after week.
The problematic code was only eight lines long. After several weeks of chasing, these eight lines were rewritten and simplified. This should have been the end of the hunt, but instead of using the rewritten code, it was decided, perhaps foolishly, that the hunt must go on.
Eventually, the whole company was pulled in to aid in the hunt. The source of the bug was finally discovered, but at what cost?
Listen to your bug-hunter host Hany Fahim explain how it all went down.
Connect with Hany through his company, stack.io, and on LinkedIn.
Don’t forget to leave us a review and subscribe to our channel to keep up with the latest episodes!