Just a few useful ideas from the formal methods world. Read it here!
Let’s stay on theme this week with a quick overview of data invariants. These are statements about the data that determine correctness: an invariant can only be broken if there’s a bug in your program.
One I had to implement recently was an open-ended interval. A start integer could be defined, or an end integer, or both. If only a start was defined, then the interval contains all numbers ≥ the starting point, and similarly with the end point. So the invariants were:
Or, formally
1. Null(start) => !Null(end)
2. Null(end) => !Null(start)
3. !(Null(start) || Null(end)) => start <= end
Once we have invariants, we can use those assumptions to simplify further code. In my case, I had a list of intervals and wanted to find the maximum number explicitly covered. I can write it like this:^{1}
def max(i: Interval): int {
if Null(i.end) {
return i.start;
}
return i.end;
}
Notice I don’t need to check if i.start
exists: by invariant (2), it must exist if i.end
doesn’t. And if i.end
does exist, I know by (3) that it must be the maximal number covered.
For pure data, the invariant can be just checked on initialization. Things are more difficult with mutable data (like an object with state). For this, let’s switch to a slightly less weird Interval type which always has start and end points, so we only need to care about invariant (3). Now what happens if we define setStart
?
def Interval.setStart(new_start: int) {
start := new_start;
}
If we pick new_start > end
, then we break our invariant! So we need to be careful with how we use setStart
and guarantee it’s only called with safe values (or make it defensive). Now here’s a more interesting case.
def Interval.translate(shift: int) {
start := start + shift;
end := end + shift;
}
This looks like it preserves our invariant, but there’s a subtle problem. Imagine we translate the interval (2, 4)
to (6, 8)
. In between the two assignments the interval is temporarily (6, 4)
, which breaks our invariant! It held at the beginning of the call, held at the end of the call, but broke in the middle of it. We could fix this by swapping the order of the two statements, but then we can instead break the invariant with a sufficiently-negative shift.
The easy way out is to say that invariants are “observable” properties and nothing can observe the broken invariant in the middle of the method call, ie we only need to check it at the beginning and end. Okay, but then what about this?
def Interval.translate(shift: int) {
setStart(start + shift);
setEnd(end + shift);
}
It’s the same thing semantically, but now the invariant is preserved at the beginning of the setStart
method call and broken at the end. And we haven’t even gotten into complications like concurrent access or aliasing…
In short, there’s a lot more you have to think about when preserving mutable invariants, which is one of the reasons why immutable-everything is so tempting for people.
Now how do you actually check the invariant is preserved? There’s a few different ways. The first is to use a technique called MISU: “Make Illegal States Unrepresentable”. If you encode the invariant directly into the nature of your data structure, it’s literally impossible for it to break.
struct Interval {
start: int,
offset: nat
}
Where nat
is a natural number (an integer ≥ 0). Here I 2 3
represents the interval (2, 5)
. Since natural numbers cannot be negative, we’ll always compute an end that’s larger than the start.MISU can be really useful when it works. The problem here is that we’re now storing our data in a potentially “unnatural” way that is more difficult to use. Here’s our new setStart
:
def Interval.setStart(new_start: int) {
start := new_start;
offset := offset - (new_start - start);
}
I don’t even know if I got that right! Looks right and passed a couple test cases but I didn’t write a property test just to check this newsletter and I doubt many people would in practice.
Okay if we don’t always want MISU, the next option is the good old assert statement. Assert statements are great. Shove them anywhere you aren’t sure the invariant still holds and a few places you are sure, just in case. Using assert statements is just a good idea in general.
You can formalize “use lots of asserts” into a “contract system”, with preconditions, postconditions, and invariants. Some languages have contracts built-in, others have them through libraries. Usually the contract framework has opinions on how to handle mutable data, as well as other features like only verifying invariants on public methods, letting you toggle off checking invariants, things like that.
(Contracts are actually broader than “just” runtime asserts, as you also have the option to statically check them, similar to types. This becomes a form of formal verification, which is safer and way harder.)
Another thing you can do is separate defining the invariants from checking them, and so be able to test potential data for invariants. Ruby on Rails does this with their ActiveRecord validations. You can put any data you want into a model, and only once you try to save it does it check the invariants. The upside is that you can then call model.errors
to see what invariants would be violated if you tried to save it.
Regardless of how you verify the invariant, just knowing about and communicating it is good for software quality. If gives us a way to check if changes to functionality are safe or not: a change that maintains invariants may be safe, while a change that breaks one is definitely unsafe.
There’s probably a bunch of other stuff I forgot to talk about, might eventually upgrade this to a blog essay if there’s too much I left out
If this newsletter ever gets famous enough to have a bingo card one of the spaces has gotta be “examples in a weird pseudocode” ↩