Software Isomorphisms

also

                        October 25, 2022

                Software Isomorphisms

                        SUPRISE MATH LESSON

                        Last week Gabriella Gonzalez wrote What does "isomorphic" mean (in Haskell), which covers isomorphism from a first-principles perspective. At the very end she says

In my experience, the more you train your ability to reason formally about isomorphisms the more you broaden your ability to recognize disparate things as equivalent and draw interesting connections between them. [\n] For example, fluency with many common isomorphisms is a useful skill for API design because often there might be a way to take an API which is not very ergonomic and refactor it into an equivalent (isomorphic) API which is more ergonomic to use.

I want to expand a bit more on how they're useful. I'll be gleefully butchering rigor here, so if you want a more rigorous treatment I'll rerecommend Gabriella's post.
Isomorphisms
Let's start with an example. Take the string "abc". We can either represent it with a string type or as an array of characters, [a, b, c]. We can also create conversion functions between each format:
split("abc") = [a, b, c]
join([a, b, c]) = "abc"

These two functions are inverses of each other, and split(join(char_arr)) == char_arr and similar with str. Now let's say you have [a, b, c] and you want to upcase it, where upcase only applies to strings and characters. You can do this with map(upcase, [a, b, c]). But you can also do it this way:
split(upcase(join([a, b, c]))) = [A, B, C]

And, pushing this further:
split(upcase("abc")) = map(upcase, [a, b, c])
join(map(upcase, [a, b, c])) = upcase("abc")

In fact, for any string transformation ts, there's a corresponding array transformation ta, such that ta(split(str)) == split(ts(str)) and similar for char_array. We can say that *waves hands* strings and character arrays are isomorphic to each other. 
[aside] (Okay this isn't really an isomorphism, it's actually a consequence of the isomorphism, which is formally defined as split and join forming a bijective monoid homomorphism (over concatenation). If that didn't make sense to you, no worries I'll explain at the end after we've built more of an intuition. If you understood it and it was wrong, I'm sorry it's been years since I studied abstract algebra and all my brains have leaked out of my ears.) [/aside]
In other words, when data structures are isomorphic, we can choose which version we operate on. 
Now, here the "fast and loose" part comes in: we can also say that a subset of data values are isomorphic to another subset. F.ex strings are not isomorphic to arrays of strings (at least with split and join):
split(join([ab, c])) == [a, b, c] != [ab, c]

But the set of "string arrays where each value of the array is a length-1 string" is isomorphic to the set of strings. As long as we stay in that subset, the isomorphism is maintained.
Okay, now that we have a rough concept of an isomorphism, let's talk about some uses. 
Simple Uses
So the most common way we use isomorphisms is to make data more convenient to work with. If you need to modify a JSON file, do you directly sed file itself or do you load it into a data structure, manipulate the data in a program, and then dump it back to the file?¹ Bam, you used an isomorphism. Sometimes when working with small intricate data I'll create multiple synchronized representations of the same data structure to make various operations easier.
We can also use isomorphisms to avoid bugs. Consider storing sequential information in a database, like the order of songs in a playlist. Standard practice is to add an order column to the table, which you update every time you modify the playlist. If you rearrange or remove songs, you risk leaving gaps or giving two fields the same order value.² However, the subset of a valid playlists is isomorphic to arrays of songs:
Song1 | Order2
Song2 | Order1

->

[Song2, Song1]

Mapping the data into an array, transforming the array, and then mapping the data back to the database is equivalent to just transforming the database records directly, but without the risk of breaking the data invariant. 
Exotic uses
So sometimes the point isn't the operations, but the isomorphism itself. IE if you have to synchronize data with a third party vendor, who uses an inane data format due to constraints you don't have or care about. Then we can test if our translation is correct with property tests:

First, test round trip: To(From(VendorFormat)) == VF, From(To(OurFormat)) == OF
Create some transformation on both forms of the data and test To(T1(From(Vendor))) == T2(Vendor), etc

If any of the tests fail then we know we haven't shown an isomorphism: either our conversion functions are buggy or we screwed up writing T1 and T2.
(I don't know if this gives you more firepower than just roundtripping combined with other property tests, but it's good to keep in mind.)
Sometimes one form of the data is much performant than another form, for example storing aggregate data as a struct of arrays instead of an array of structs. The challenge then is preserving the isomorphism, like making sure all of the arrays are the same length. This might be a case where that testing idea could bear fruit.
Homomorphisms
While I'm here I might as well throw a math lesson in. An isomorphism between two types T and U (at least in abstract algebra) is defined as a bijective homomorphism. "Bijective" just means it's one-to-one, so it maps to every element in U with exactly one in T. Homomorphism is the interesting bit. Let's say the types have binary operations □ : (T, T) -> T and ◇: (U, U) -> U. Then the function f is homomorphic iff
f(t₁ □ t₂) == f(t₁) ◇ f(t₂)

So with the string and character array, □ is string-append (.) and ◇ is concat: split("abc" . "def") == concat(split(abc), split(def)). Other examples:

Len is a homomorphism between strings and integers. Len(abc . def) == Len(abc) + Len(def)
f(x, y) = (x, y, 0) is a homomorphism from 2D points to 3D points.
g(x, y, z) = (x, y) is a homomorphism from 3D points to 2D points. 

Exercise for the student: there's a homomorphism from 2D points to 3D points and vice versa. So are they isomorphic? Why or why not?
I have no idea how this relates to homomorphic encryption.

I just checked and jq creates an internal representation too, so you're still using an isomorphism if you use that directly. ↩

You could prevent the "duplicate values" with a uniqueness constraint, but I don't think you can easily prevent "all intermediate values exist". ↩

                    If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.

                        Don't miss what's next. Subscribe to Computer Things: