Over the weekend, I read the question “Is the Formal Methods Winter About to End” on Lobsters:
Formal methods (theorem proving, model checking, static analysis, abstract interpretation, advanced type systems, program semantics, etc) have remained a relatively niche topic during the last two and a half decades. The field saw much interest during the 1980s and early 1990s. Tony Hoare’s 1996 speech recognized formal methods were so far unable to scale to real-world software, and that event perhaps marked the beginning of a formal methods winter.

Do you think the field is starting to regain attention?
People asked for my thoughts, and they went a bit long as a response, so I’m making it a newsletter. Caveat: since this is a newsletter and not a full post, I’m not doing as much research as I normally would: this is all off the top of my head.1 Also, I’m going to be treating formal methods as a giant homogeneous mass, in particular blurring the code/model distinction between approaches. A more serious essay would talk about them separately. I’m assuming you don’t have a background in FM but have the info of an “educated outsider”, about as much as this essay covers.
First of all, one assumption in the question I need to address: there has not been a formal methods winter. The term comes from the AI Winter(s), where the industry suddenly and dramatically lost interest in AI. The same thing never happened to FM, because there never was any industry interest in the first place.
When it comes to improving software correctness, we can divide approaches by what they try to make correct. On one hand, we have the correctness of an abstract algorithm, primarily focused on well-behaved, formal systems. These are the problems that directly come out of the domain of computation: sorting, concurrency primitives, etc. On the other, we have the correctness of code doing something in a “physical” domain.2 We usually think of this as an “academic vs industry” divide, but the split doesn’t map cleanly onto that.
Early formal methods only focused on abstract correctness. A quick bullet list of early verification:
Abstract algorithm verification had very little relevance to industry, as abstract algorithms make up a microscopic slice of interesting programs. The broader computing world cares much more about physical domain problems. FM designed for physical domains looked very different from abstract FM:
It also took six years of work, 115,000 lines of B, and the writing of two separate B→Ada compilers. That gets at the main reason that industry never got all that interested in formal methods.
Abstract algorithms have very strict correctness metrics, close to a binary “right/wrong”. Physical domains are fuzzier. As such, there’s a countervailing force: how much you’re willing to invest to get correctness. Formal methods, historically, let you push the envelope extremely close to 100% correct, at the cost of massive investment.
This is a fundamental rule of correctness techniques: They compete on power/cost ratio. People aren’t deciding between FM and nothing, they’re deciding between investing in FM and investing in other approaches. FM was competing with the rise of testing, code review, safer languages, and OOP.
Many of these, unlike FM, originated in physical domains. Even the most heavyweight of the informal approaches, stuff like RUP and cleanroom engineering, came from people struggling with physical domains. This meant they all had much better power/cost ratios. FM had a higher maximum power attainable, but that doesn’t matter when correctness is a gradient. So FM never caught on in the industry.
That’s why I don’t think there was an “FM Winter”. Investment in FM was marginal all the way from its conception to 1996. It also didn’t meaningfully drop after 1996: academic research continued at about the same pace. In particular, the first SMT solver came out in 1998. SMT solvers are a deep and rich topic, but they boil down to “automating a lot of the verification steps you’d otherwise have to do by hand.” A year after that, Leslie Lamport publicly released TLA+. Both of those set the stage for the 2010s “revival” of FM. But the steady pace of work continued through the 2000s. It was niche, but it was always niche.
Not yet, anyway.
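To make “automating verification steps” concrete, here’s a toy sketch in Python. This is my own illustration, not how real SMT solvers work: tools like Z3 use clever decision procedures, not brute force. But it shows the interface an SMT-backed verifier gives you: hand it a claim, and it either confirms the claim holds or hands back a counterexample.

```python
def check(claim, domain):
    """Verify a claim over a bounded domain by exhaustive search.
    Returns None if the claim holds everywhere, else a counterexample.
    (Real SMT solvers answer the same question over unbounded domains,
    without enumerating.)"""
    for x in domain:
        if not claim(x):
            return x
    return None

domain = range(256)  # all 8-bit unsigned integers

# True claim: clearing the lowest set bit never increases the value.
assert check(lambda x: (x & (x - 1)) <= x, domain) is None

# False claim: every value is below 200. The checker finds a witness.
assert check(lambda x: x < 200, domain) == 200
```

Before SMT solvers, establishing facts like these inside a proof meant doing the case analysis yourself; afterwards, they became a single automated query.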
The AI winter ended in 2012, when AlexNet proved the power of deep learning by crushing the ImageNet challenge. AlexNet drove up the AI power/cost ratio so much that industry suddenly needed to pay attention again.
The closest thing FM has to an equivalent paradigm shift was Z3 in 2006. Z3 was the first SMT solver with a decent UI. In other words, it improved the power/cost ratio by driving the cost down. This is a recurring theme in FM adoption: we need to make it cheaper.
But that was 15 years ago! Why are we only noticing the impact now? AlexNet had an immediate impact because it directly connected to physical needs. Z3 was still fairly “abstract”. What it did do was make it easier for other people to build higher-level programming languages with more FM affordances. That pushed the cost down further, because people could now apply those languages to physical domains.
In other words, Z3 was monumental technology that indirectly led to a slow tick up in FM usage. FM doesn’t have a watershed moment, a killer app that makes everybody want to use it. It grows gradually due to slow, backbreaking labour by lots of people nudging up the power/cost ratio.
But the same thing is happening with every other correctness technique. And they have a systemic advantage that (most) FM does not: power limits. The law of diminishing returns applies to correctness, too. If you want a tool to be able to give you 100% correctness, that tool has to make fundamental tradeoffs that limit it in other regards. You’d expect FM to have a power/cost “penalty” for this reason: equivalent investment in FM will not give you as much of a ratio improvement as investment in unit testing.
And remember: in physical domains, correctness is a gradient. It is just one constraint of however many you have to deal with. Consider a phone messenger app: would you rather it never crashed, or crashed once a week but ran 3x faster?
I firmly believe that use of formal methods will never be widespread. Which is fine. There’s still a huge amount of room for FM to grow. The majority of the companies who could benefit from FM haven’t heard of it, or never even considered it. We should just be cognizant that, even at saturation, it’ll still be a minority of the broader industry.
Nonetheless, interest in FM is now higher than ever before. Not to the point of being a springtime, but enough to make the future look bright. We’re finally seeing sustained industry investment in FM. The biggest tech companies are now doing so many different things that they all have a few projects where correctness is a higher priority. Microsoft has been here a while (they used Z3 to verify Windows drivers); now AWS, Google, and Facebook are all in the game too. There’s also a lot of interest from B2B tech companies which specialize in commoditizing solutions to difficult abstract problems.
There have also been some important cultural shifts. First of all, there’s broader interest in software correctness in principle, especially via safer languages like Rust and Haskell. These act as gateways into FM, especially towards similar high-level languages. Lots of Haskellers try Liquid Haskell or Idris, for example, and I suspect there’s a small-but-nonzero set of people who go from Rust to SPARK.
But the most exciting thing by far is that we’re close to a self-sustaining critical mass. This is simply a numbers thing: there are now enough people who know some form of FM that they can talk shop with each other. This is really, really important. When people talk, they share ideas, they teach each other, they riff, they inspire. Out of this comes new material to draw people in, new tools to make FM easier, and new techniques to apply it more broadly, all of which improve the power/cost ratio. A healthy community and culture lead to sustainable adoption of the tech. I don’t think we’re there yet, but we’re a lot closer than we were in 2014, or even 2018.
Here’s some promising signs I’ve noticed:
I think FM will see a lot of growth in the next decade. I don’t think it will go exponential, and I don’t think it will go mainstream. But it won’t be a flash in the pan either. Just the slow expansion of the steady-state as people continue the slow, backbreaking labour of changing the ratio.
A couple of quick things that didn’t fit anywhere else, but people asked me to cover them, so:
Formal methods could see a massive uptick in adoption if a government mandated that software be built with it. I think this is incredibly unlikely, though. Existing software regulations like DO-178C and TIR80002 don’t mandate the use of FM; I don’t believe any do. There is interest in regulated industries in using FM to meet requirements more cheaply, but that’s a much more complicated question.
Someone asked me to cover “DbC interacting (in both directions) with compilers, linters and optimizers and unit tests.”
Design by Contract (DbC) is programming with assertions: if asserts fail, you stop the program because you definitely know there’s a bug. Bertrand Meyer developed it in 1988 as part of his work on Eiffel.
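To make that concrete, here’s a minimal Design-by-Contract sketch in plain Python (all the names here are my own invention; Eiffel builds contracts into the language itself, as part of a method’s interface). A decorator checks the precondition before the call and the postcondition after it, mirroring the Hoare-triple shape, and an assertion failure stops the program because we definitely know there’s a bug.

```python
import functools

def contract(pre, post):
    """Wrap a function with precondition and postcondition checks.
    pre takes the arguments; post takes the result plus the arguments."""
    def wrap(f):
        @functools.wraps(f)
        def checked(*args):
            assert pre(*args), "precondition violated"
            result = f(*args)
            assert post(result, *args), "postcondition violated"
            return result
        return checked
    return wrap

@contract(pre=lambda xs: len(xs) > 0,
          post=lambda r, xs: r in xs and all(r <= x for x in xs))
def minimum(xs):
    return min(xs)

assert minimum([3, 1, 2]) == 1
# minimum([]) raises AssertionError: the precondition fails, so the
# program stops rather than limping along with a known bug.
```

The postcondition is the interesting part: it states *what* the result must be (an element no larger than any other), independently of *how* the function computes it.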
I’m not 100% sure why DbC never caught on, but I suspect a lot of it has to do with Meyer himself. He just seems like a very difficult person to work with. He has a very rigid view of computing, getting into a fight with Wikipedia because they wouldn’t write Eiffel code in blue.3 And he trademarked Design by Contract.
Then again, DbC was just a variant of assertion-based programming; while Meyer added a lot of ideas on top of it, the “{precondition} statement {postcondition}” core is basically a Hoare triple. The ideas had been bouncing around in academia for a long time. Many imperative verification languages use that as the base syntax, so if any of the conditions could fail, the code won’t compile. I believe Ada 2012 has used assertions for compiler optimization, too, but I’m not certain.
I’m guessing the challenge to wider use of contract-aware optimizing compilers is that you’d need both FM experts and compiler experts to develop them, maybe even individuals who are experts in both. This is all off the top of my head, though.
Hi, past me, you’re a filthy liar. ↩
“physical domain” is fuzzy: are employee payrolls a physical system? I’m using it here to mean one where the effects of the computation extend outside the computer, so that the payroll system leads to someone getting money. Something where reality gets in the way. ↩
This was in 2006 when Wikipedia didn’t even have a blue font for code. ↩