One reason I like teaching is it helps me understand things better. This came up in a couple different ways while working on the new book, and I’m too delighted by it not to share. Say we have the following user requirements:
Our conference center has a bunch of rooms that are reserved by people each day. Each room has a set of time slots that can be reserved. We need software that supports this.
That gives us three domain objects,
Room, and (time)
Slot, and the concept of reservations. But there are many different ways our software could represent reservations:
phas a reservation for
schedule[r][s] == p.
(r, s) in p.reservations.
table.exists?(t => t.r == r && t.p == p && s between (t.start, t.end).
All of these have their tradeoffs, and there’s a lot of factors you have to consider. One of these factors is correctness: do any representations make bugs less likely? And here’s where predicate logic can help. If the system diverges from the user requirements, that’s a bug, and we can use logic to formalize which requirements matter. First, let’s define the primitive predicate
Reserved(p, r, s):
Reserved(p, r, s): Person `p` has a reservation for room `r` at time slot `s`
This is higher-level than any of our possible representations: for each one the implementation of
Reserved in our code will look different, but the meaning is independent of our choice of structure. Now we can use this to define properties of the system. For example, the user might want that the same room and time can’t be reserved by two separate people simultaneously. In logic, we’d write that
all p in Person, r in Room, s in Slot: Reserved(p, r, s) => all p' in Person - p: !Reserved(p', r, s)
In English, for any given person, room, and slot, if the person has a reservation for that room and slot, then all other people don’t have a reservation for that room and slot. It’s still possible for nobody to have a reservation, that isn’t ruled out by our property.
The field and table representations I gave can violate this property, so we’d need validation to make sure it never happens. The schedule representation, however, cannot violate it. Since
schedule[r][s] can only have one value, there will only be at most one person scheduled for a given pair. You might be familiar with this concept as making illegal states unrepresentable— we used predicate logic to determine what states should be illegal.
This doesn’t mean that the schedule is the best representation! There’s other tradeoffs we need to consider besides this one property, some of which may even be other properties that other representations can satisfy more easily. Maybe the client also wants that the minimum reservation time is two hours:
all p, r, s: Reserved(p, r, s) => Reserved(p, r, s+1) || Reserved(p, r, s-1)
In which case the table representation would most easily ensure that property. The properties give us feedback on the best implementation, but they don’t decide it for us. Though if we decide something else, we’ll know what props we need to validate, and which ones we’ll need to start validating if we change our data representation later.
Anyway, I only really internalized the connections between formalizing requirements and data representation while writing a lesson on the former. That’s why I like teaching so much: it helps me understand things better.
Something else I realized while writing this: I defined schedules as
schedule[r][s] == p. I could have also done
schedule[s][r] == p. In most programming languages, that would be a dict of dicts (or an array of arrays), so the two are structurally different representations. You can’t switch between the representations. In array programming languages, like J, that would be a single 2D array, making multiple (similar) representations freely possible:
NB. two rooms, four slots ] x =: 4 2 $ 'acbcb ba' ac bc b ba NB. put slots first 1 0 |: x abbb cc a
I’m guessing there’s cool stuff you can do by making a 3D boolean array so that all three domain objects are axes, but I wanna avoid that rabbit hole until I send this newsletter, at least.
Why not have multiple representations simultaneously? Probably because they’d fall out of sync and cause all sorts of horrible bugs. But besides that? If you’re doing constraint solving, you’ll often want to represent the same data in multiple formats. Different constraints work best on different formats, so using several representations can be a major optimization. This is called channeling.