Hi nerds, I’m back! I didn’t think about tech at all last week so don’t have a whole lot on my mind this week, but let’s see what I can dredge up.
I’ve unpublished the Alloy workshop due to lack of interest, but there’s still availability for the TLA+ Workshop! Jan 4th-6th, 10 AM - 6 PM CST (4 PM - Midnight UTC). Learn how to find critical bugs in your systems in hours instead of weeks! It’s 24 hours of intense, hands-on instruction that will melt your brain out your ears.
Safety and Liveness
I was thinking a bit about Software Engineers Don’t Have Disciplines, where I argue that we haven’t built a lot of the surrounding theory of software engineering: the “disciplines” of things like requirements gathering or designing work queues or whatnot. It occured to me that a lot of the online material about programming lacks explanations and references; it’s all tutorials and how-tos. The more esoteric a topic gets, the more the four types of documentation get out of whack.
Well, not quite. There’s plenty of reference available. The problem is that there’s not references for subsets of the topic needed to help people reach mastery. Nobody lays out the kinds of properties a specification might have in one place. I started out trying to write a quick one and got sidetracked by writing 1500 words on safety and liveness, so that’s what you get instead. Also, I didn’t edit it, because editing is for chumps.
(For new readers: when I talk about a specification, I mean a formal model of the system. Because we’re talking about behaviors, how the system evolves over time, the space of properties is much richer than you’d have with pure functions or single mutations. That makes it harder to test or even express the desired properties at the code level, which is why we use more sophisticated tools like TLA+ or Alloy.)
Most software modeling tools use separate languages for specifying the system and specifying the properties of the system. They commonly use some variant of Linear Temporal Logic (LTL) for properties. In LTL, each behavior of the system is a single sequence of states from start to finish. So if we had
x := 1; EITHER x++ OR x--; x := 10, then the two behaviors would be
This is in contrast to Computation Tree Logic (CTL), which we’ll talk about later. LTL is popular because it is simple, and because it has a very nice property: all LTL properties are some combination of Safety and Liveness properties.
A safety property is, informally, that “bad things never happen”. Like “We never go above the rate limit”, or “we never allow an unauthorized user to access this” or “The clock never runs backwards”. Some common genres of safety properties are:
- Invariants must be true for every single state. The account balance is never below zero. A key is never assigned to two different people.
- Action/Transition Properties limit how the system can change from state to state. If a message leaves the task queue, it must have been taken by a worker or moved to the dead queue. The regulator only changes the temperature by at most 2 degrees per step. The history log monotonically grows.
- Safety properties on subsequences of states (I don’t think there’s a name for this). If the editor shows
x, then you make a change, and then undo it, then you should have
- Boundedness requirements, like “we must alert all other machines between starting shutdown and completing it.” Anything where A must happen within a certain timeframe of B, or before C happens.
- More general properties can be things like “y should never be more than the highest value x ever was”. Having trouble thinking of a more realistic property that isn’t trivially convertible into an invariant.
If a safety property is violated, the system can do it in a finite number of steps. There is a point where you can say “This and this and this happened, and now we’ve messed up.”
A liveness property is, informally, that “good things always happen”. Like “the database will eventually commit the data” or “if the worker continually retries, it will eventually succeed” or “if someone requests a book, they eventually get a chance to borrow it.”
These are much less common than safety properties but often core business requirements. A system satisfies all safety properties if it just sits there and does nothing, since it never does anything bad. But it never does anything good, either. Liveness properties are only violatible by infinite traces: you have to show you never eventually do the thing to have a problem.
Some common genres of liveness properties:
- $thing eventually is true. We eventually show an alert, even if it disappears later.
- $thing eventually is true and stays true. The database nodes can disagree on what was written, but eventually they will all converge on the same thing and it stays that way.
- $thing always comes back to true. The vehicle can temporarily go over the speed limit, but it can’t stay above the speed limit forever, and must eventually come back down.
- $condition leads to $thing being true in the future. If I send a message, you eventually receive it.
- Fairness: If the something can- okay look, fairness is hard to explain in a bullet point, I wrote a lot about it here. But it’s really important.
Common Safety/Liveness Properties
- Algorithm termination is a liveness property. “The algorithm eventually gets the right answer” is liveness. “If the algorithm gets an answer, then it is right” is safety.
- Type safety is a safety property.
- Real-Time Properties: what the system needs to do in a finite time limit. If we get a request, respond within 100ms. We do not allow more than one action per solar day. A batch job must start at least once per hour. These are all safety properties.
- Deadlock-Freedom: There’s never a point where no elements of the system can do anything. More generally, a state where nothing can change ever again. It’s still deadlock-free even if the elements are just spinning in circles; that’s where liveness comes in.
- Starvation-Freedom: Every element of the system eventually reaches its goals. Liveness.
- Refinement, showing one spec is an abstraction(ish) of another spec, is both safety and liveness, and also a huge topic I really need to write about.
- Race conditions are neither. A “race condition” is a class of possible behaviors of the spec, not its properties. Race conditions do often lead to violating properties, though.
Limitations of LTL
A lot of properties are some combination of liveness and safety, enough so to make LTL a popular choice for expressing system properties. The downside is that there are also lots of properties which aren’t a combination, meaning that they can’t be expressed in LTL. One example is reachability. It should always be possible for me to turn off my phone, even if I never do. The state where the phone is off should always be reachable. LTL can’t express that. LTL can’t even express the weaker property “there’s at least one state where it’s possible for me to turn off my phone.”
These are possible in CTL, which looks at branching timelines. The above example of
x := 1; EITHER x++ OR x--; x := 10 becomes
So reachability becomes something like “there is a branch where
x = 2”. But CTL has limitations, too, like being unable to say “$thing eventually stays true”. Different ways of representing properties have different tradeoffs. It seems that tools geared towards verifying hardware designs are more likely to use CTL, but I don’t know if that’s actually true or just my perception.
There’s a language that combines the two called CTL*, but it seems to be less common than using either LTL or CTL, which I’m guessing is for implementation and tractability reasons.
I was not expecting the safety and liveness sections to be as long as they did, so I’ll be brief here. There are plenty of properties that aren’t safety, liveness, or reachability. Probablistic properties, like “$thing should be true 25% of the time”, need language extensions to be expressible. This is more of an adjective than an entirely new class of property, so you might be expressing P-safety with PCTL or something.
Hyperproperties are properties on sets of behaviors. “Observational determinism” says that an algorithm always produces the same answer each time; that’s hypersafety. I don’t even want to try to give an example of hyperliveness or hyperreachability. I don’t think anybody has written a probablistic hypersafety property before, but it’s certainly possible.
All of this is only on behaviors. If you wanted to get really nasty, you could write properties on the state space or behavior space directly. Something like “There are less than 1000 unique states in this model”. I don’t know what the use case would be, but it’s certainly possible.
And you can get more exotic than that, like properties on the class of specifications or ones that assert the existence of behaviors or specs or DOWN THIS PATH LIES MADNESS
All of these are only properties on general temporal systems. As soon as you add in a problem domain other categories may become more important. Like topological properties matter in data modeling problems.
Yeah and this is why I spend weeks editing my essays before putting them on my website. Good thing this is the newsletter!