Worked pretty well last time, so let’s do it again! This Friday, 11 AM CST, Zoom Room [redacted], password is [redacted]. Not sure if this is going to be a weekly thing, might move the time if it is. Feel free to send thoughts my way.
I recently got a lot of followers from my essays on software history. A lot of you might not know what it is I do for an actual living. And it’s something I’ve always had trouble explaining to people, even other programmers. I work in a pretty weird niche of programming, and while I want to make it more mainstream, it’s a pretty uphill battle as you’d expect. Let’s see if I can explain what formal methods are!
Note: I cover this all in a lot more detail in Why Don’t People Use Formal Methods?. If you want a deeper intro, that’s a good 6,000 words for ya.
I’m sure you’re all familiar with testing. I have a function sort and want to make sure works properly. I’ll assume for now that we are only dealing with integer arrays. We might write a test like
assert sort([2, 1, 3]) == [1, 2, 3]
But all this is doing is testing a single example of what the sort function should do. I have no guarantee based on this test that, say, an input like [1, -5, 0, 0, 2, -9, 100] would work! All I know is that the specific input I tested works.
[1, -5, 0, 0, 2, -9, 100]
This is fine for most cases. But there are certain domains where this is not enough. It’s not enough for something to be probably correct, or definitely correct most of the time. It must be definitely correct all of the time. We’re willing to put in extra effort if it guarantees that we are safe in every possible case.
To do that, I first need a specification. The specification is a description of what I want my program to definitely do. In this case the specification of sort is this:
forall i, j in 0..Len(l)-1:
if i < j, then l[i] <= l[j]
If I can show that this code implements this specification then I’m guaranteed that the output of this function will always be sorted, no matter what I put in. This is called formal verification. We take a piece of code and mathematically prove that it does what we expected to in all circumstances. This is not a pipe dream! There are tools that do this right now. We’ve actually been doing this kind of work since at least 1975, and the tools have only gotten better since.
If it works, why haven’t you heard of it? It’s much harder than it sounds, and it sounds pretty hard. First of all, it’s meaningless to talk about correctness without being very explicit in what kind of specification you mean. Proving something correct only proves that it matches the specification you intend, not the you have the right specification the first place. For example! Here’s a function that satisfies our specification of sort:
def sort(l: list[int]): list[int]
ensures forall i, j in 0..Len(l)-1:
if i < j, then l[i] <= l[j]
The problem is that we didn’t specify that it should be the input list that is sorted. Our specification did not match our intentions. Writing the correct spec in the first place is half the battle.
And even then, lots of things can interfere with correctness. Consider a function that modifies an array in place. If you look at just that function, it might seem that it guarantees the array is sorted. Now imagine that in a separate thread we are mutating the same array. The sort function cannot guarantee that the output is sorted, because something else could unsort it before sort finished. Pretty much everything that makes programming languages more expressive also makes them harder to prove.
Finally, there’s the actual process of proving itself. Most code verification tools need a lot of hints from the programmer or steps filled in for them. It’s still possible, but you need a lot of time and skill. This all makes it cost-ineffective for most orgs compared to things like code review and testing.
Okay so proving code is hard. One thing we can do is make better tools to prove code. And as I’ve said before, tools have been getting a lot better over time. There’s also something else we can do: instead of proving code correct, we prove abstract models of the system correct. Then, once we’re confident the system works, we implement it in code. There’s no connection between the proven abstract model and the implemented code, besides for code review and careful testing and stuff. This makes it a lot easier to write and prove properties about. In fact, showing properties hold is often completely automatic, with no extra “hinting” from the programmer needed.
It’s kind of hard to explain, but I like to think of it as testing ideas instead of code. I’ve also heard it described as “design verification”, “debuggable design”, and “lightweight formal methods”. It’s also a lot more effective than people think. Probably best to show that with an example. Imagine you have
What kinds of pathological cases would you test? What kind of weird edge cases do you need to worry about?
How about this one?
That’s where a user has direct access to a parent resource, but it also has a role that is blocked by the child resource. Does this case satisfy all your properties? If you’re lucky, you spend a day tracing the code before deciding that yes, it does. If you’re unlucky, you find out late in development and have to redo your entire access model in order to handle this situation.
By contrast, I wrote those four rules in Alloy and it found this particular case in a tenth of a second. And it would test this case, and all other complicated cases, for any property I wanted. Would you rather find problems a few minutes into modeling or months into your development process?
This is stuff I believe is world-changing. I’ve employed a few of these design verification languages to production systems and the results have been consistently stellar. My favorite case was a greenfield project I was hired on as a consultant. They were trying to create a new kind of blockchain for their purposes and had already built a prototype. Within three hours of modelling, we discovered that a blockchain was completely the wrong data structure and they had to start from square one. Possibly the only time anyone was persuaded not to use a blockchain.
This kind of design modeling is incredibly powerful. But it’s also hard to use. This isn’t an essential problem but an accidental one. It just happens that they don’t have great UI/UX, and most aren’t well documented. If something is hard to learn, you’re not going to use it, even if it has the potential to cut months off your development time.¹
Most of my work is on making these tools more accessible. That’s mostly been writing documentation and guides to make it easier for beginners to get started.² More recently, I’ve been doing work on auxiliary tooling, mostly CLI. I also teach, which is where most of my income comes from. Companies want to use formal specifications, so they bring me in to teach some people.
(I care a lot about teaching. One of the best compliments I’ve gotten was from a student who said it was the only corporate workshop he’s ever been to that was worth the time. 16 hours of hands-on labs and group exercises.)
I’ve done the most work on two of these tools:
Here’s some of my favorite stuff I’ve written on using these tools:
If you’re interested in using this stuff at work, I also wrote a few guides:
¹ I call this the “flossing problem”. Half of Americans don’t floss, even though it’s like two minutes a day. If the annoyingness of flossing is enough to get people to chance root canals, then poor UI/UX on a tool is enough to sink it.
² I also run Let’s Prove Leftpad, where people can compare and contrast different theorem provers. But that’s more on the “verifying production code” side of things.