When designing software systems, we care about flexibility as requirements change. One might break that down into two rough parts. First, flexibility across systems—i.e., when the way that two or more systems interacts needs to change, one can adapt without significantly changing any individual component. A lot has been written about this, including the Unix Philosophy, SOLID (some of which overlaps this article’s content), modular programming, and software contracts.
Second, which I will focus on in this article, is flexibility within a component. For this article, a “component” is a single deployable binary, which could be a web service, or a background job, or a data processing pipeline.
By flexibility within a component, I mean that changes to requirements—such as the semantic interpretation of its output, the incorporation of a new input, or an efficiency improvement—don’t require major redesigns. Sometimes it’s inevitable, but ideally the amount of code that needs to change should be somehow proportional to the desired impact of the requirement. The benefits should justify the effort.
One approach to designing a component for flexibility is what I informally think of as “layering abstractions.” A software component is an onion. The outer layers get weathered, bruised, and eventually peeled off, but they protect the inner layers. Then, breaking with the metaphor, the core rarely needs to change, and can be reused as the core of another onion. The amount of work required for new applications depends on how many layers need to be modified or rebuilt to support it.
For me, this approach works well in the context of a mathematically inclined system, where the layers usually correspond to:1
Interfacing with the outside world : ! ! ! ! ! ! : ! ! ! ! ! ! : --------- r-' : --------- : !`. : --------- : ! !`. : ========= n ! ! !`.-.-.-.-.-.-. '-n ! ! ! ! ! ! ! ! : '-n ! ! ! ! ! ! ! : '-n ! ! ! ! ! ! : .'-n --------- : .' ! ----------- : .' ! ! ----------- : Applying hyper-specific business logic (hacks and patches) .-.-.-.-.-.' ! ! ! ----------- : : ! ! ! ! ! ! ! ! ============= : : ! ! ! ! ! ! ! r-' : ! ! ! ! ! ! r-' : --------- r-' : --------- : !`. : --------- : ! !`. : ========= n ! ! !`.-.-.-.-.-.-. '-n ! ! ! ! ! ! ! ! : '-n ! ! ! ! ! ! ! : '-n ! ! ! ! ! ! : .'-n --------- : .' ! ----------- : .' ! ! ----------- : Generally applicable business logic .-.-.-.-.-.' ! ! ! ----------- : : ! ! ! ! ! ! ! ! ============= : : ! ! ! ! ! ! ! r-' : ! ! ! ! ! ! r-' : --------- r-' : --------- : !`. : --------- : ! !`. : ========= n ! ! !`.-.-.-.-.-.-. '-n ! ! ! ! ! ! ! ! : '-n ! ! ! ! ! ! ! : '-n ! ! ! ! ! ! : .'-n --------- : .' ! ----------- : .' ! ! ----------- : Interface to mathematical model .-.-.-.-.-.' ! ! ! ----------- : : ! ! ! ! ! ! ! ! ============= : : ! ! ! ! ! ! ! r-' : ! ! ! ! ! ! r-' : --------- r-' : --------- : !`. : --------- : ! !`. : ========= n ! ! !`.-.-.-.-.-.-. '-n ! ! ! ! ! ! ! ! : '-n ! ! ! ! ! ! ! : '-n ! ! ! ! ! ! : '-n --------- : .'---------- : .'! ---------- : .'! ! ---------- : .'! ! ! ========== : ! ! ! r-' :: Raw algorithm ! ! r-' :: or model ! r-' ::
Here’s an example I worked on recently, with the domain of datacenters replaced with the domain of libraries. In this problem you manage a library system and you want to make sure there are enough children’s books among the different branches. You know where the books are now, and you have some idea of where the demand for books is, and you want to make sure that some branches get priority over other branches,2 but that there is a minimum supply of books in every branch.
This problem naturally comes with caveats that you only learn along the way. For one, we also care about particular categories within “children’s books,” such as cardboard books, classics, new releases, books about animals, etc. Categories might overlap, and we want enough of each category at each branch, not just total book count. And there are always exceptions: Goodnight Moon is required at every library. We also need to be aware of which books are currently checked out and can’t be redistributed yet.
The non-layered approach would pick an algorithm, and then expose the algorithm to the details of the children’s books and categories, branches, checked-out books, and exceptional cases. A layered approach would pick a general algorithm at the bottom layer, and higher layers would shape or simplify domain specificity and data to fit the algorithm’s interface.
For example, whatever algorithm we choose, the lowest level should know nothing about books. Instead “books” would be a bucket of integer-valued “supply” of a generic resource bucket. And the demand for books in various branches would be similar (I’m using a second type for name clarity later). In Java, modeling everything with just interfaces for brevity:
interface Supply<B, L> { B bucket(); int quantity(); } interface Demand<B, L> { B bucket(); int quantity(); }
Here, crucially, a B
represents not just a single item,
but a “bucket” of items that can include
the branch where the books are currently,
the category of books—which could be a singleton,
like the “Goodnight Moon” category—or
other details we haven’t thought of yet.
Notably, this setup assumes buckets don’t overlap.
If categories overlap, then a bucket would have to
correspond to a multiset of category labels,
which would be simple to model for supply,
but might be difficult to model for demand.
Prioritization rules would be provided to the lowest level as a comparison of buckets, and minimum thresholds are similar.
interface DemandPrioritization<B> implements Comparator<B> { @Override int comparePriority(B left, B right); } interface SupplyBuffer<B> { int minimumBuffer(B bucket); }
The core algorithm would then operate with this interface
interface SupplyReallocation<B> { Collection<Supply<B>> reallocate( Collection<Supply<B>> supply, Collection<Demand<B>> demand, DemandPrioritization<B> priority, SupplyBuffer<B> minimumSupply); }
The caller of this code (still in the same component)
is required to define the buckets as concrete types,
make choices about how special cases are represented
(such as a Goodnight Moon bucket),
and implement the two interfaces required.
For example, minimumSupply
might be provided
by a class that is backed by the data pulled
from a database table,
or even hard coded until “real” data is available.
Imagine a twist: there are some books in “sets.” If you move one book, you have to move the whole set. Where would we make this change? We could redefine the buckets so that the unit of supply is a “book or set.” This would require no changes to the hard, algorithmic part.
Alternatively, if the rules for how books may move around get sufficiently complex, or the buckets suffer combinatorial explosion, one could expand the algorithm to be more general by accepting as input a “reallocation impact rule” that, according to some limited interface, defines impact of book movement. E.g.,
interface ReallocationChoice<B> { B sourceBin(); B targetBin(); int quantity(); } interface ReallocationEffect<B> { Collection<ReallocationChoice<B>> impactOfChoice(ReallocationChoice<B> choice); }
This should have well-defined behavior like throwing an exception if the choice is incompatible with the required effects. Of course, not all algorithm implementation choices would be compatible with such an extension. But the point is that by generalizing the domain slightly, you admit more flexibility for changes, and they often require only changes to how the data in the outer layers is reorganized and passed to the core. And, as in mathematics, sometimes generalizing the problem slightly points to a more obvious, efficient solution.
At the layer just above the core algorithm, we can split the business processing into more layers. Typically I like to think of these higher layers as split between long-term and short-term business logic. The long-term logic might be organized like these interfaces.
interface BookBranchQty { Book book(); Branch branch(); int quantity(); } interface BookReallocator { Collection<BookBranchQty> reallocate(Collection<BookBranchQty> currentDistribution); }
Here the details about books, branches,
and such are made explicit,
as well as being aware of things like
checked-out books vs not-checked-out books, if needed.
In the above,
BookReallocator
might wrap (via a builder)
the additional data needed to group BookBranchQty
into the right Supply
buckets, along with
a DemandPrioritization
instance to pass to
SupplyReallocation::reallocate
.
And crucially, “hacks” meant to deal
with short term situations
should live outside of BookReallocator
,
and only be applied to the data passed to it.
In some sense,
that means that you’re designing
both the BookReallocator
and the deeper SupplyReallocation
instance
to be flexible enough that “hacks”
can be injected via the normal interface.
One might also say that makes them “not hacks,”
but your mileage may vary on that.
Say a beloved children’s book author donates her private collection to your library system. We can inject new data to BookReallocator representing the donation as its own branch, with zero demand or buffer at that branch.
Say some books are suddenly deemed too racy for children, and they need to be phased out quickly, but we want to make sure we replenish enough books before pulling the bad books out of circulation. Add a new fake branch called “The Dump”, with infinite demand for the bad book, and gets highest priority.
And, of course, the core algorithm is generally useful! You could apply it to other kinds of supply/demand problems related to your library. Perhaps reallocating volunteer hours or funding dollars. And those, likely, would come with their own peculiar hacks and business logic.
Stairs ASCII art from baustein.eu. ↩
One might assert that libraries, as a public resource, should prioritize underserved communities. ↩