This was inspired by a few conversations I had last week. There’s a certain class of problems that are hard to test.
Some examples of this: computer vision code, simulations, lots of business logic, most “data pipelines”. I was struggling to find a good semantic term for this class of problems, then gave up and started mentally calling them “rho problems”. There’s no meaning behind the name, just the first word that popped into my head.
Most rho problems are complex, such that providing a realistic example would take too long to set up and explain. Here’s instead a purely contrived rho problem.
```python
def f(x, y, z):
    out = 0
    for i in range(10):
        out = out * x + abs(y*z - i**2)
        x, y, z = y+1, z, x
    return abs(out) % 100 < 10
```
Not code you’d see in the real world, but it has all the properties of a rho problem. Imagine this is the code in our
deploy branch, and we try to refactor this code on a new branch. The refactor introduces a subtle bug that affects only 1% of possible inputs, which we’ll simulate with this change:
```diff
 def f(x, y, z):
     out = 0
     for i in range(10):
         out = out * x + abs(y*z - i**2)
         x, y, z = y+1, z, x
-    return abs(out) % 100 < 10
+    return abs(out) % 100 < 9
```
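To see how subtle this change is, we can estimate how often the two versions disagree by sampling random inputs. A quick sketch (the names `f_old` and `f_new` are mine, inlining the two branches' versions):

```python
import random

def f_old(x, y, z):
    # The version on the deploy branch
    out = 0
    for i in range(10):
        out = out * x + abs(y*z - i**2)
        x, y, z = y+1, z, x
    return abs(out) % 100 < 10

def f_new(x, y, z):
    # The refactored version with the subtle bug (< 9 instead of < 10)
    out = 0
    for i in range(10):
        out = out * x + abs(y*z - i**2)
        x, y, z = y+1, z, x
    return abs(out) % 100 < 9

random.seed(0)
samples = [tuple(random.randint(-1000, 1000) for _ in range(3))
           for _ in range(10_000)]
diverging = sum(f_old(*s) != f_new(*s) for s in samples)
print(f"{diverging / len(samples):.1%} of sampled inputs diverge")
```

On random inputs the divergence rate hovers around 1%, which is exactly what makes this hard to catch with a handful of hand-picked examples.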
What kind of test would detect this error? Standard example-based tests don’t help much, because the function is too opaque: f(1, 1, 1) tells us little about f(1, 1, 2).
(Yes, I know we could try decomposing the function and testing the parts individually. That’s not the point: we’re trying to find general principles here regardless of the problem.)
But we do have a “correct” reference point: the deploy branch! We can use property testing, where the property is that our refactored code always gives the same output as our regular code.[1] Step one, use git worktree:
```shell
git worktree add ref deploy
```
That creates a new ref folder in our directory that’s identical to our deploy branch. And by “identical”, I mean that you’re effectively on the deploy branch when in that folder, to the point of being able to commit. Worktree is pretty cool! Anyway, if our code is foo.f, the deploy version is ref.foo.f. We can write a property test (using hypothesis and pytest):
```python
from hypothesis import given
from hypothesis.strategies import integers

import ref.foo as ref
import foo

@given(integers(), integers(), integers())
def test_f(a, b, c):
    assert ref.f(a, b, c) == foo.f(a, b, c)
```
Running this gives us an erroneous input:
```
Falsifying example: test_f(
    a=5,
    b=6,
    c=5,
)
```
This is nondeterministic: I’ve also gotten (0, 4, 0) and (-114, -114, -26354) as error cases. Regardless, I still think it works as a proof of concept: we find an error case where the original code and the refactor diverge. The main blocker I see is that version control systems aren’t designed for this kind of use case. Before someone told me about git worktree, I had pytest setup code manually change between branches and reload the module, which is… not exactly usable in production. But git worktree seems close enough that a sufficiently-motivated person could pull it off.
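As a sanity check on these counterexamples: the two branches disagree exactly when `abs(out) % 100` equals 9, since then `< 10` is True but `< 9` is False. A quick sketch (`core` is my name for the loop both versions share):

```python
def core(x, y, z):
    # The loop shared by both versions; only the final threshold differs
    out = 0
    for i in range(10):
        out = out * x + abs(y*z - i**2)
        x, y, z = y+1, z, x
    return abs(out) % 100

# Each reported counterexample should land exactly on 9, the one
# remainder where `< 10` and `< 9` disagree.
for case in [(5, 6, 5), (0, 4, 0), (-114, -114, -26354)]:
    print(case, core(*case))
```

For (5, 6, 5) and (0, 4, 0) the remainder is indeed 9, so the two branches return different booleans for those inputs.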
This bears a lot of similarity to a technique known as “snapshot testing”, where you compare the output to a result stored in a text file. The difference is that this is generative: we aren’t tied to fixed inputs. Also, snapshot testing is actually used in production while this is a wild fringe-tech idea.
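For contrast, the core of a snapshot test can be sketched in a few lines (this is a hand-rolled illustration, not any particular snapshot library’s API; the file naming convention is mine):

```python
import os

def snapshot_check(name, actual, update=False):
    # Compare `actual` against a stored text snapshot;
    # write the snapshot on the first run (or when updating).
    path = f"{name}.snapshot.txt"
    if update or not os.path.exists(path):
        with open(path, "w") as fh:
            fh.write(actual)
        return True
    with open(path) as fh:
        return fh.read() == actual
```

The fixed input/output pair lives in the snapshot file, which is exactly the limitation the generative approach above avoids.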
[1] While we’re here, this also counts as metamorphic testing. Instead of our relation being between inputs, it’s between versions of the function. Metamorphic testing is pretty neat!
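For a flavor of the usual input-relation form of metamorphic testing, here’s a sketch using math.sin: we may not know sin’s exact value at an arbitrary point, but we do know a relation between its outputs:

```python
import math
import random

# Metamorphic relation: sin(pi - x) == sin(x) for all x.
# We can't easily assert sin's exact output, but we can assert the relation.
random.seed(1)
for _ in range(1000):
    x = random.uniform(-10, 10)
    assert math.isclose(math.sin(math.pi - x), math.sin(x), abs_tol=1e-9)
```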