In Defense of Testing Mocks

                            January 9, 2023

                        Computer Things: 2021 Edition
It's over a year late, I know, but the 2021 Newsletter collection is now available to purchase as a PDF. 70,000 words, 250 pages, 20 bucks. Unlike last year, there are no private subscriber-only emails, so this is purely for people who want to read it on the go / give me money. I might add a postmortem review or an introduction piece if there's enough interest.
I also wrote a bunch of automation so later editions get out a lot faster. I'll try to get around to the 2022 edition by the end of the month.
In Defense of Testing Mocks
This isn't going to be a great defense, because I generally agree with the conventional wisdom that mocking should be avoided when possible. But I think it's important to give things a fair shake, even if I don't like them. 
So first, by "mocking" I mean anything that replaces a real entity in your test with an artificial one. Technically speaking these are "test doubles", of which "mocks" are a specific type, where your tests assert the mock was actually called. See here for more on the specific differences. However, outside the specific TDD circles that distinguished the term, everybody uses "mock" as the general term.
A lot's been written about the problems of mocking (1 2 3). It boils down to two things: 1) mocks couple the test to the internal implementation of the object, not just the behavior. 2) If the mocked entity changes, mocked code will still pass, so the tests become less useful. 
Okay, so with that premise, when are mocks a good idea?
Why do we test?
The purpose of testing isn't to be sure code is 100% correct. If it was, we'd use formal methods instead. Rather, testing is to get sufficient confidence that the code is sufficiently correct. We want to do this as painlessly as possible, which is why we want tests that are easy to write and give a lot of confidence, and also aren't brittle or flaky.
For mocks to be useful, using them needs to give us tests that provide value: giving us additional confidence of sufficient correctness, while also being painless.
Initial and Marginal Values
Some tests are more valuable than others, and some tests make others less valuable. Take these test cases for division:
12 / 1 = 12
12 / 2 = 6
12 / 3 = 4

The first test is the most valuable, as any code that passes (1) is very likely to pass (2) and (3). But the other two are still a little valuable, as they rule out functions like iden (the identity function). Tests with lower marginal value can still be worth writing, as long as they're easy enough to write. That's where mocking would come in.
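Here's a rough sketch of that marginal value in Python. The names `divide`, `iden`, and `passing_cases` are just for illustration, and I'm assuming `iden` returns its first argument unchanged:

```python
def divide(a, b):
    return a / b

def iden(a, b):
    # Hypothetical wrong implementation: ignores b and returns a unchanged.
    return a

# The three test cases from above, as ((a, b), expected) pairs.
cases = [((12, 1), 12), ((12, 2), 6), ((12, 3), 4)]

def passing_cases(f):
    """Return the inputs for which f gives the expected answer."""
    return [args for args, expected in cases if f(*args) == expected]

print(passing_cases(divide))  # all three inputs
print(passing_cases(iden))    # only (12, 1)
```

The wrong function slips past the first case and is only caught by the cheap followups.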
Abstractions and Refinement
So I said that testing isn't formal methods, but FM has one thing that's really important here. When we specify large and complex systems, verifying the entire system at once is often intractable. One very useful remedy is to break the system into subcomponents and then specify each component as an abstraction, where we just specify the (correct) behavior. Then we refine one component, or flesh out how it interacts with the abstractions.
As an example, let's say I'm modeling a distributed task pool, with workers and coordinators. Each is going to have a complex specification. What I'd do is first write both of them as behaviors:
Coordinator:
  Assign tasks to workers
  Get results back
  Return final answer

Worker:
  Get input
  Return correct output

Then I'd write a spec refining just the worker, keeping the coordinator abstract:
Coordinator:
  Assign tasks to workers
  Get results back
  Return final answer

Worker:
  Get input
    Complex implementation logic
    More logic
    Maybe hit another API
  Return correct output

Now I have increased confidence that, if the coordinator is working correctly, the worker logic will work correctly too. At some point I might have to verify the whole, detailed system, but I can do that having built on a solid foundation of individual components.
Verifying the whole system gives high confidence but is painful. Verifying parts of the system against abstractions gives less confidence and is also less painful. I can use it to reduce the amount of painful testing I have to do.
Abstraction/refinement looks an awful lot to me like mocking/"real code".
When Mocks Are Useful
Mocks are useful when they make testing more convenient without removing the marginal value of mock-involving tests. They can do this by replacing implemented dependencies of the tested entity with abstractions. 
(Some people would argue that if testing is painful then that's a sign the code itself needs to be changed. I'm not convinced that "good code" is always 100% equivalent to "easy to test".)
Let's give a couple of examples. 
1.
def analyze_webhooks(user):
  webhooks = WebhookSystem.get_webhooks(user)
  # A bunch of analysis code here

The get_webhooks method can make testing analyze_webhooks painful in a lot of ways. Most commonly, it could be making an API request to a third party. But even if it works purely locally, it could take a long time to run, or require a lot of setup. If we need to write a lot of tests for analyze_webhooks, that pain can add up fast.
Instead, we can have a few "proper" tests for the function, and have a bunch of tests that mock out get_webhooks. The mocking tests have lower marginal value but are also less painful. This can make it easier to get to sufficient confidence of sufficient correctness, which is the whole point of testing.
Without the real tests, the mock tests have significantly less marginal value. They should all share the same setup, same input, and same (stubbed) output as a real test, so that if the real test fails we know to fix the mock tests, too. The abstraction should match the refinement.
(In specification it's the opposite: the abstraction comes first, which the refinement must match. In code you start with implementation and build out the abstractions.)
See also: contract tests, verified fakes.
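For the first example, a mocked test might look something like this sketch using the standard library's unittest.mock. The body of `WebhookSystem`, the "count events" analysis, and the shape of the webhook data are all made up for illustration, since the real ones aren't shown here:

```python
from unittest import mock

class WebhookSystem:
    @staticmethod
    def get_webhooks(user):
        # Stand-in for the slow or remote call; the real one isn't shown.
        raise RuntimeError("would hit a third-party API")

def analyze_webhooks(user):
    webhooks = WebhookSystem.get_webhooks(user)
    # Hypothetical analysis: count webhooks per event type.
    counts = {}
    for hook in webhooks:
        counts[hook["event"]] = counts.get(hook["event"], 0) + 1
    return counts

def test_counts_events_with_mocked_webhooks():
    # Ideally this stubbed output is copied from what a "proper" test saw.
    stubbed = [{"event": "push"}, {"event": "push"}, {"event": "issue"}]
    with mock.patch.object(WebhookSystem, "get_webhooks", return_value=stubbed):
        assert analyze_webhooks("alice") == {"push": 2, "issue": 1}

test_counts_events_with_mocked_webhooks()
```

The patch only lasts for the `with` block, so the "proper" tests in the same file still hit the real get_webhooks.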
2.
def reserve_item(user, item):
  # stuff
  result = request_reservation(user, item)
  if result == error:
    ...  # stuff
  else:
    ...  # other stuff

def request_reservation(user, item) -> Either[Reservation, Error]:
  record_request_audit()
  log_request_event()
  get_user_permissions()
  get_item_requirements()
  check_user_authorized_request()
  # …

A lot of request_reservation is implementation details immaterial to reserve_item. While it should be fully tested, writing a lot of tests would be painful. In some of the tests we could use a simpler abstraction, like
def request_reservation(user, item) -> Either[Reservation, Error]:
  get_user_permissions()
  get_item_requirements()
  check_user_authorized_request()
  # …

Or even
def request_reservation(user, item, denied=False):
  if denied:
    return Error
  return Reservation

Reduced marginal benefit, but also more convenient.
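As a sketch of how that last fake could slot into a test: everything below is an assumption for illustration, including the placeholder `Reservation` and `Error` classes, the return values of `reserve_item`, and the choice to swap the function in with `mock.patch.dict` on the module globals rather than touching the production code.

```python
from functools import partial
from unittest import mock

class Reservation: ...
class Error: ...

def request_reservation(user, item):
    # Stand-in for the real function (auditing, logging, permissions, ...).
    raise RuntimeError("real implementation not available in this sketch")

def fake_request_reservation(user, item, denied=False):
    # The simplest abstraction from above.
    if denied:
        return Error
    return Reservation

def reserve_item(user, item):
    result = request_reservation(user, item)
    if result is Error:
        return "sorry"      # hypothetical error handling
    return "reserved"       # hypothetical success handling

def test_reserve_item_denied():
    denied = partial(fake_request_reservation, denied=True)
    with mock.patch.dict(globals(), {"request_reservation": denied}):
        assert reserve_item("alice", "book") == "sorry"

def test_reserve_item_granted():
    with mock.patch.dict(globals(), {"request_reservation": fake_request_reservation}):
        assert reserve_item("alice", "book") == "reserved"

test_reserve_item_denied()
test_reserve_item_granted()
```

Both branches of reserve_item get exercised without setting up users, items, or permissions.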
When Mocks Are Problematic
The biggest red flag is when you're mocking everything for all tests. Mocks are there to supplement the real tests with lots of simple followups, not to replace them.
Another red flag is when you modify the production code to make it easier to mock, for example by adding dependency injection. At that point the mocks themselves are being painful, which defeats the entire point of mocking.
I also think tests that assert a mock was called are a bad idea.

That's the best defense I can do in a day. One big flaw is that mocking isn't painless, and there are cases where it can be a lot more trouble than refactoring your code. One example is when you're dealing with collections:
def some_aggregation_function(users):
  for user in users:
    hooks = analyze_webhooks(user)
    # blah blah blah

If you want to test that with three users, you have to figure out how to mock get_webhooks() with three different outputs, and then you bring in a mocking library, and then you start rewriting your code to be more mockable, and then bad things happen.
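For what it's worth, the first step of that slide might look like the sketch below, using unittest.mock's `side_effect` to pick an output per user. The bodies of `WebhookSystem`, `analyze_webhooks`, and the aggregation are invented for illustration:

```python
from unittest import mock

class WebhookSystem:
    @staticmethod
    def get_webhooks(user):
        # Stand-in for the real lookup, which isn't shown.
        raise RuntimeError("would hit a third-party API")

def analyze_webhooks(user):
    webhooks = WebhookSystem.get_webhooks(user)
    return len(webhooks)  # hypothetical analysis: just count them

def some_aggregation_function(users):
    total = 0
    for user in users:
        hooks = analyze_webhooks(user)
        total += hooks  # hypothetical aggregation
    return total

def test_aggregation_with_three_users():
    # One canned output per user; side_effect chooses by argument.
    canned = {"ann": ["a1"], "bob": ["b1", "b2"], "cy": []}
    with mock.patch.object(
        WebhookSystem, "get_webhooks", side_effect=lambda user: canned[user]
    ):
        assert some_aggregation_function(["ann", "bob", "cy"]) == 3

test_aggregation_with_three_users()
```

It works, but the test now knows that the aggregation calls analyze_webhooks, which calls get_webhooks once per user. That's exactly the coupling to internals that the critics complain about.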
