Whole lotta talk about formal methods last week. Moshe Vardi dropped an ACM piece, Jean Yang wrote an essay, and Nicholas Tietz-Sokolsky frontpaged Hacker News. I missed most of these as they happened because
I was fucking around with ML art working on important consulting stuff. Looking back at the discussions, though, I saw a lot of the usual misconceptions people have about formal methods. It’s a very interesting and important field, but not one that programmers are familiar with, and that leads to a lot of misunderstandings. This is an attempt to address some of these. Disclaimer, I specialize in a couple of these tools, but believe I have enough familiarity with the rest to present them accurately.
(To be fair, a lot of these myths are actually mostly accurate, but they don’t paint the full picture, where the full picture is different enough to make FM considerably more appealing.)
“Correct” is a dirty word in FM. In FM, we verify conformance to a specification: a rigorous, unambiguous description of the properties of our code. “Correctness” is something the client decides. We generally write the spec under the assumption that it represents “correctness”, so that conformance to spec means “correct”, but that’s not always true. If the spec is wrong or incomplete, then conformance doesn’t mean much.
This sounds like a quibble, but it’s very important for two reasons. First, it’s more honest about the limitations of FM. You might hear FM people say “the environment is outside the scope of the proof.” That’s us being very careful about what we can and can’t guarantee: If our spec doesn’t explicitly include environmental effects, we can’t say anything about how our program behaves under environmental effects. Second, knowing how to write good specs is just as fundamental to FM as knowing how to verify them. In fact, I’d argue that “writing good specs” is the most useful part of the FM discipline for general programmers. It’s something that’s very hard to outsiders, but very easy once you have some practice, and very useful in general. Being able to see a function and go “ah, here’s the spec” is useful for writing tests, assertions, types, everything. I also find that my workshops students need a lot of practice with it, regardless of the specific formal method they’re using.
It is true that code is the most common target of FM. But anything that can be “computationally encodable” can be verified. This includes broad-scale system designs (TLA+), requirements (Alloy), language semantics (K), business processes (BPMN), hardware, ontologies… Most of the focus has been on code, because there’s a whole lot of it, and people are less likely to realize they can encode other stuff in a verifiable way. Which is a shame: we have lots of tooling besides FM to help us write correct code but almost nothing to help us write correct other stuff. Making that other stuff verifiable gives us tools we don’t otherwise have, and I think that’s where formal methods can do the most good the most quickly.
Again, mostly true, but proving implies you’re writing a full mathematical proof yourself. That only happens when you’re working with a proof assistant like Coq or Isabelle. Most of the time people do formal verification via tooling abstractions: model checkers, SMT solvers, simulations, typecheckers, etc.
This might seem like a minor terminology quibble, but it has big impact in how we think about the field. It implies that you need a strong background in pure math to get anything done. While knowing math certainly helps, there are lots of tools you can use without having to write full proofs yourself. Take a gander at Let’s Prove Leftpad, a collection of almost 20 proofs of leftpad in different formal verification languages. Some of them are full step-by-step proofs. Some of them look like this:
That’s the entire verification of leftpad in Idris. At that point, you’re not really writing “proofs” anymore: the computer is doing most of the proof for you. This makes the field a lot more open to newcomers who’d otherwise be scared off by writing mathematical proofs.
Here’s a formally verified LeftPad in C. Frama-C retrofits a specification language onto C, so you can use formal methods on C code. Most enterprise languages have an FM retrofit available. These languages aren’t designed to make FM easy, though, which is why people do a lot of work in specialized FM languages. But you can still use FM on legacy projects.
(Generally developer-oriented affordances make verifying code harder. Things like dynamic dispatch, free aliasing, user input, etc. There are also “hybrid” languages like Ada, where a subset of the language is designed around verification and the rest around general usability.)
You need a lot of expertise to use a proof assistant, where you’re writing most of the proof by hand. The further you get from proof assistants, the less academic background you need. I’m not saying easy, I’m just saying it’s something everyday devs can learn to use well (with time investment).
Proof assistants dominate popular imagination for two reasons:
People think that FM is only useful if you can verify all properties of your entire program: a full verification of an end-to-end spec. This is true if you want absolute assurance. You can also do partial verification, where you verify less in a way that’s cheaper. You could:
I’ve seen all of these work well in practice. Really depends on the problem domain and what you consider important.
The public perception is that FM is only worthwhile on incredibly expensive mission-critical systems: “nukes ‘n NASA”, as I think of it. This is because most people consider the tradeoff as
time learning FM + time applying FM vs financial cost of a bug in production
Since most bugs don’t cause that much financial damage, the time spent applying FM has to be really small to make this all worthwhile. Which means you can’t do much with it.
I think a more accurate tradeoff is
time learning FM + time applying FM - time saved applying FM vs financial cost of a bug in production + time spent debugging
There are lots of bugs that are only mildly financially damaging but are brutal to find and fix. Eliminating them beforehand doesn’t save much raw money but does save a lot of developer time. More important is the “time saved applying FM”, which takes a lot of people by surprise. Why would spending more time on verification save time on the project overall? Lots of ways!
It’s rare that FM will save you net development time over not using it (though I’ve seen it happen). But the design speedup and the time saved on debugging (which can be significant) tilt the economic cost of applying FM, making it economical in a wider range of settings.
(We can push the balance further by making FM tooling easier to learn and use. I’m really proud of the work I’ve done on FM pedagogy, which opened the discipline up to a lot of people who otherwise wouldn’t have bothered.)
As my readers are no doubt sick to death of hearing, I first started writing formal specifications at a company with ten engineers. In the course of my teaching, I’ve heard from people who’ve successfully applied it at even smaller companies, even in cases with just one or two engineers. I think you’re more likely to see FM at larger companies because they have more projects that “obviously” benefit from it, and with more engineers comes a bigger chance of an individual championing the discipline.
I do believe that verifying systems is beneficial to a wider range of companies than verifying code is, which is why I specialize in teaching TLA+ and Alloy and not Coq or Dafny. But I’m always on the lookout for new languages or technique improvements that would help code verification hit mainstream, and I’m confident we’ll start seeing big advances in affordability over the next few years.
“FM tells you if you built the thing right, but not if you built the right thing.” This is just an extreme example of a basic fact about client requirements: if they change, you’re going to have to throw work away, whether that’s just some code, a full test suite, or a mathematical proof. It’s just that unit tests are cheaper to write than formal proofs, so you’re out more money with FM.
The problem is this is usually trotted out as a universal fact of software, when it’s only true for a subset. Requirement changes happen most often when you’re writing software for an outside client, like you’d see in consulting. If you’re making internal software, or selling a software product, then you have more control over requirement changes. You’re also more likely to be maintaining legacy software and writing infrastructure code. If you need to move a nightly batch job to a realtime task queue for performance reasons, it’s incredibly unlikely your boss is going to say “nevermind put it on a blockchain”. I mean they could, but it happens less often. Project timelines are longer and so FM is more valuable.
Let’s close with a misconception on the other side of things: the common idea that without formal methods, software will never be “good”. The poster child for this misconception is Edgar Dijkstra’s quote:
program testing can be used very effectively to show the presence of bugs but never to show their absence.
Funnily enough in the exact same lecture he says
Instead of trying to devise methods to establish the correctness of arbitrary, given programs, we are now looking for the subclass of what I would like to call “intellectually manageable programs”, which can be understood and for which we can justify our belief in their proper operation under all circumstances without excessive amounts of reasoning.
The whole piece is actually about structured programming! Funny that.
Regardless, there’s a big difference between code being correct and being verified correct. You can determine code conforms to a spec to a very high degree of confidence without needing a proof: sort of being “99% sure it’s 100% correct.” You can do this with really thorough testing, or really thorough code review, or following strict processes, etc. Oftentimes these will be cheaper than full end-to-end code verification. Even with “mainstream” techniques like testing, CI/CD, and regular code review can get you really high confidence in your correctness. You don’t need formal methods for any of this.
Hopefully this cleared up some basic misconceptions on formal methods! Probably not all of them, but there’s only so much I can do in a single newsletter. If you’re interested in learning more, I’ve got an old piece called Why Don’t People Use Formal Methods, which is both a general intro and a discussion of the practical challenges. Hoping to drop a spiritual sequel later this year!
Also I consult in this stuff, yada yada
Now this Friday to accommodate 4th of July vacays. We’re also accepting remote submissions!
Proof assistants have also gotten better in the past 15 years, with things like Sledgehammer removing a lot of the tedium. ↩