Improve your debugging by asking broad questions


February 2, 2023

Another form of divide-and-conquer

I recently had to help a friend debug a Word issue where fonts would randomly change to Greek symbols. It got me thinking about theories of debugging in general. At my last job, I was the Debugging Guy. I'd semi-regularly get a sprint task like "this other team is seeing weird behavior in an old system, help them figure out what's going on." I was pretty good at it, but I couldn't explain to other people what good debugging looked like.
Since then, I've found a couple of good resources. The first is Julia Evans's posts on debugging. The second is David Agans' book Debugging, which I like so much as an introduction that I bought a bunch of copies to give as gifts to early-career friends.
On top of what they say, here's one technique I like: you can speed up debugging by asking broader questions.
Debugging as hypothesis building
Debugging is really an application of the scientific method. We observe a discrepancy between the system's expected behavior and its actual behavior. Based on that observation, we come up with a hypothesis for why they diverge. We then check our hypothesis, for example by adding instrumentation, writing a test, or just trying out a fix. Finally, we either confirm our hypothesis or falsify it, in which case we come up with a new one.
Most of the time we ask narrow questions, which are helpful when confirmed and unhelpful when rejected. If you make a lot of wrong predictions, debugging boils down to guess-and-check. If you instead ask broad questions, you learn less when they're true but more when they're not. Then you iteratively close in on the actual source of the bug.
Okay, that's all really vague, so I'll give an equally vague example. Let's say I've got a ticketing system that randomly crashes, but only on Wednesdays. It's a complex system, so there are a lot of possible bugs that could cause this: a sort of "bug space" of, say, 100 possible explanations.

I look at this and think "aha! 'Wednesday' has 9 characters in it! I bet TimeFarbler, one of the system's components, is storing the day string in a fixed-size array and the crash is an overflow." Kind of a weird thing to jump to, but I've seen weirder bugs.
The prediction is narrow: it only covers one possible explanation. If I'm right, then hooray, I'm done.

Now I go and check that hypothesis and, to my shock, it's not that at all. Instead of ruling out 99 explanations, I've ruled out 1. Making that prediction didn't do me much good.

Now let's instead make a broad prediction: there is something different about how the system is used on Wednesdays, and that difference is causing the bug. This is a lot vaguer than "'Wednesday' causes an overflow", and if it's true, it doesn't give me an obvious candidate for the bug, just 20 possible candidates.

On the other hand, if it's false, it still rules out 20 possibilities! That narrows things down much faster. If I make three broad predictions and all of them are wrong, I rule out over half the possible explanations (60 of 100, assuming the predictions don't overlap), whereas three wrong narrow predictions leave me with 97 possibilities.
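
To make the arithmetic concrete, here's a minimal Python sketch. It assumes the bug space is exactly 100 equally plausible explanations, numbered 0 to 99, and that the three broad predictions cover disjoint sets of explanations. The numbering and the rule_out helper are hypothetical, just for illustration.

```python
# Hypothetical "bug space": 100 equally plausible explanations, numbered 0..99.
BUG_SPACE = set(range(100))


def rule_out(candidates: set[int], wrong_predictions: list[set[int]]) -> set[int]:
    """Remove every explanation covered by a prediction that turned out false."""
    remaining = set(candidates)
    for prediction in wrong_predictions:
        remaining -= prediction
    return remaining


# Three narrow predictions, each covering a single explanation
# (like "'Wednesday' overflows a fixed-size array in TimeFarbler").
narrow = [{0}, {1}, {2}]

# Three broad predictions, each covering 20 explanations.
# Assumption: they don't overlap, so together they cover 60 explanations.
broad = [set(range(0, 20)), set(range(20, 40)), set(range(40, 60))]

print(len(rule_out(BUG_SPACE, narrow)))  # 97
print(len(rule_out(BUG_SPACE, broad)))   # 40
```

If the broad predictions overlapped, they would rule out fewer than 60 explanations between them, so "over half" is the best case rather than a guarantee.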

Limitations
So, there's a reason we normally ask narrow questions. Broad questions have three drawbacks:

1. It's harder to come up with broad questions. I find I naturally jump from narrow guess to narrow guess, and forcing myself to zoom out and ask big questions is just tougher.
2. It's harder to test a broad prediction. It's one thing to inspect TimeFarbler for a buffer overflow, quite another to find all the differences in Wednesday usage patterns!
3. I wrote this as if broad questions give you a binary yes/no on each explanation, but that's not really true. Even if the system is used differently on Wednesdays, that could just be a coincidence. Rather, broad questions make some explanations more or less likely. You still need narrow predictions to pin down the actual bug.
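
One way to put numbers on "more or less likely" is Bayes' rule. This is a swapped-in framing for illustration, not a claim about how the example was meant to be worked, and the likelihoods are made up: I assume Wednesday usage would certainly look different if a usage-related bug were the cause, and would still look different half the time by coincidence otherwise. The bug space, the USAGE_RELATED set, and the update helper are all hypothetical.

```python
from collections.abc import Callable
from fractions import Fraction

# Hypothetical bug space: 100 explanations with a uniform prior.
# Explanations 0..19 are the "Wednesday usage is different" family.
USAGE_RELATED = set(range(20))
prior = {h: Fraction(1, 100) for h in range(100)}


def update(prior: dict[int, Fraction],
           likelihood: Callable[[int], Fraction]) -> dict[int, Fraction]:
    """Bayes' rule: posterior(h) is proportional to prior(h) * P(evidence | h)."""
    unnormalized = {h: p * likelihood(h) for h, p in prior.items()}
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}


def usage_differs(h: int) -> Fraction:
    # Assumed P(Wednesday usage looks different | explanation h):
    # certain for usage-related bugs, a 50% coincidence for everything else.
    return Fraction(1) if h in USAGE_RELATED else Fraction(1, 2)


posterior = update(prior, usage_differs)
usage_mass = sum(posterior[h] for h in USAGE_RELATED)
print(usage_mass)                   # 1/3, up from 1/5 under the prior
print(min(posterior.values()) > 0)  # True: nothing is ruled out outright
```

Confirming the broad prediction shifts probability toward the usage-related family, but every other explanation keeps a nonzero probability, which is why narrow predictions are still needed to finish the job.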

Now the obvious response to this is that broad predictions are just a tool, like any other tool, that you use when appropriate. But I think it's also true that it's a tool that takes more skill to use. So I'm not just arguing that "you should ask broader questions when debugging", but also that "you should ask broader questions when debugging, even if it doesn't necessarily seem that helpful, because it takes some time to get good at it and investing that time is really worthwhile."
Damn, that's a mouthful.
To make "asking broad questions" useful more quickly, here's a synergistic technique I like. Whenever I find a tricky bug, I ask myself "how could I have found this faster?" Would it have been easier if I'd known my debugging tools better? If I'd known more about the broader system architecture? If I'd been quicker to ask a teammate for help? The feedback cycle helps me upskill more quickly.

March TLA+ Workshop
So far we're at 25% capacity for the 2023-03-20 TLA+ Workshop. Will it be fun? Yes. Will it feel like taking a power drill to the skull? Also yes.

                    If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
