I’m the kind of person who hears about a good idea and immediately wants to try it. So when I read about reference class forecasting, and coincidentally heard some managers in my org discussing the problem of software projects going way over (time) budget, I eagerly suggested we try reference class forecasting.
Reference class forecasting is a familiar principle: you should use data to inform your estimate. People tend to underestimate the time it takes to deliver a project when they’re deep in the details of the project. Optimism and overconfidence play a part; it’s the estimation version of “everyone thinks they’re above average.” I know I often underestimate. Reference class forecasting suggests you instead predict a project’s delivery time based on the actual delivery times of past “similar” projects. In mathy-softwarey lingo: collect a dataset of project completion time estimates and their actual completion times, define a probability distribution over the past data, and compare your estimate for a new project to the distribution. This should prompt you to adjust your estimate to, hopefully, better match the truth.
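To make the mathy-softwarey version concrete, here is a minimal sketch of one way it could work, not a canonical recipe. It assumes the "distribution" is simply the empirical spread of actual-to-estimated ratios in the reference class, and it uses a nearest-rank percentile. The `history` numbers and the function names are made up for illustration.

```python
# Hypothetical reference class: (estimated_weeks, actual_weeks) for past
# projects judged "similar" to the new one. These numbers are invented.
history = [(4, 6), (8, 8), (10, 15), (6, 12), (12, 15)]


def overrun_ratios(history):
    """Sorted actual/estimate ratios; 1.0 means the estimate was exactly right."""
    return sorted(actual / estimate for estimate, actual in history)


def adjusted_estimate(history, new_estimate, percentile):
    """Scale a fresh estimate by a percentile of past overrun ratios.

    Uses the nearest-rank method with an integer percentile (1 to 100).
    """
    ratios = overrun_ratios(history)
    rank = max(1, (percentile * len(ratios) + 99) // 100)  # integer ceiling
    return new_estimate * ratios[rank - 1]


def share_on_time(history):
    """Fraction of past projects that finished within their estimate."""
    ratios = overrun_ratios(history)
    return sum(r <= 1.0 for r in ratios) / len(ratios)


gut_feeling = 10  # weeks, the "deep in the details" estimate
print(adjusted_estimate(history, gut_feeling, 50))  # typical outcome
print(adjusted_estimate(history, gut_feeling, 90))  # pessimistic outcome
print(share_on_time(history))
```

With this toy data, a 10-week gut feeling becomes 15 weeks at the median and 20 weeks at the 90th percentile, and only one in five past projects landed on time. A real dataset would want more than five rows before the percentiles mean much.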
“Similar” introduces a bit of ambiguity here. How do you decide what class of projects to compare this new project against? That can be a problem, but somehow people are decently good at telling whether two projects are similar in scope and complexity. With enough examples of projects like “refactor codebase” and “migrate off deprecated dependency” and “extract business logic into a new service,” the correct reference class to pick should be clear without needing to be too rigorous. After all, how many truly novel kinds of software features and projects are there?
This became the topic of an hours-long meeting in which nothing got done, which perhaps betrays my own overconfidence and optimism. My suggestion was very concrete: for each sufficiently large project (expected to take at least one quarter), everyone involved comes up with an estimate for when it will be completed. We keep the predictions in a spreadsheet or database and, once we have enough, consult it for future estimates.
It was met with mostly crickets. Some people questioned whether the right metric was completion time (calendar time) rather than total engineer-months spent. That sort of makes sense, since priorities shift quickly, but it adds the complexity that people would have to track the effort they spend, which is harder than it seems. Another question was what to do with a project that gets abandoned. I suggested it would be good to know how often projects are abandoned.
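For what it’s worth, the record-keeping I had in mind is small. The sketch below is a hypothetical shape for one row of that spreadsheet, not something we ever built. The field names, the optional `engineer_months` column, and the example projects and dates are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ProjectRecord:
    """One row of the (hypothetical) prediction log."""
    name: str
    started: date
    predicted_done: date
    actual_done: Optional[date] = None       # None while in progress or if abandoned
    abandoned: bool = False
    engineer_months: Optional[float] = None  # optional, since effort is harder to track


def calendar_overrun(record):
    """Actual / predicted calendar duration, or None if the project never finished."""
    if record.abandoned or record.actual_done is None:
        return None
    predicted_days = (record.predicted_done - record.started).days
    actual_days = (record.actual_done - record.started).days
    return actual_days / predicted_days


def abandonment_rate(records):
    """Share of closed projects (finished or abandoned) that were abandoned."""
    closed = [r for r in records if r.abandoned or r.actual_done is not None]
    if not closed:
        return None
    return sum(r.abandoned for r in closed) / len(closed)


records = [
    ProjectRecord("migrate off deprecated dependency",
                  date(2023, 1, 2), date(2023, 4, 2), actual_done=date(2023, 7, 1)),
    ProjectRecord("extract business logic into a new service",
                  date(2023, 2, 1), date(2023, 6, 1), abandoned=True),
    ProjectRecord("refactor codebase",
                  date(2023, 5, 1), date(2023, 9, 1)),  # still in progress
]

print(calendar_overrun(records[0]))
print(abandonment_rate(records))
```

Abandoned projects stay in the log instead of being deleted, so the abandonment rate falls out of the same data as the overrun ratios.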
Others seemed opposed to the idea of keeping a record of predictions. While I might speculate that they were concerned about egos being bruised, externally they voiced that they didn’t think it was worth the work. Either they thought it wouldn’t improve the estimates enough, or else that the value of having better estimates was too low.
This surprised me. Why wouldn’t better estimates be valuable? Is recording estimates really such a burden? This reminded me of the writing of Cassie Kozyrkov, who taught a series of excellent statistics courses internal to Google. Cassie stresses the importance of determining whether the question you want to answer is actually important. That is, will the decision maker commit to a different course of action if the answer to the question changes? If so, then you can devote the time to form hypotheses, carefully collect data, and do statistical analyses.
But more often than you’d expect, people have already decided what they want to do. Data will not change their minds. They may reject the need for data outright. Or they may appeal to data, but only for inspiration and reinforcement. When you need data to confirm your immutable belief, you can always torture the data enough to get a confession. Or at least, you can rationalize discrepancies away in a puff of smoke. In these situations, you, the person responsible for answering the question, are lucky. You don’t need to do any hard work. You can do just what’s necessary to placate the decision maker’s need for a justification to do whatever it is they were going to do anyway.
In the end, we didn’t start tracking project estimates. It seems that “how long will a feature take to deliver” is simply not an important enough question, and we decide to build projects no matter how long they take, or give up when priorities change. So I can provide care-free estimates, and be happy that there are no consequences to being wrong.
Wrapped up in this is a familiar lesson. Talk is cheap, and people will often say they value something while their actions say otherwise. More subtly, as a consequence, people like me, with bright-eyed, bushy-tailed enthusiasm and naiveté, grow jaded, lose trust, and lose motivation. I’m more willing to lean on institutional bloat, and to reply to “this is valuable” with a masked form of “prove it with your actions.” I have to spend more effort reading between the lines and engaging in politics, and less time building useful software. Even if I had nothing else to do, I’d at least clean up old tech debt, which is more valuable than pretending to care about software estimation.
But then, I’m still interested in whether reference class forecasting works well for software! Do you have experience with reference class forecasting for software projects? If so, how did it go? Or does your experience line up with mine: people say they want better estimates, but their actions suggest it’s not valuable enough to put in the work?