I recently led a refactor of one of our codebases at work (we’re hiring!). It was a fantastic learning experience, and I’m documenting some learnings for future reference. There’s a good chance these aren’t new to you. Most weren’t new to me, but there’s a difference between reading about them and experiencing them.
We designed our initial abstractions by carefully enumerating the future scenarios we’d want to support. We thought we were being responsible and investing in our future.
We were wrong.
There’s a fine line between preparing for the future and predicting it. Preparing for the future is a must, especially with decisions that leave options open. In the context of early-stage abstractions, flexibility is critical. You preserve flexibility in part by maintaining a minimal surface area. Each new abstraction is a commitment, and while refactoring is almost always an option, it’s painful. We crossed over into predicting the future by implementing abstractions for our perceived use cases.
In effect, we were making a deal with our early abstractions. We’ll invest some of the time we have now, and in exchange, we’ll increase our velocity in the future. The payoff hinges on our predictions becoming reality. Barring omniscience, this is a deal you won’t consistently win.
Sure, we would have saved time if our eventual use cases required these abstractions. But as time passed, we refined our product, and our engineering requirements changed. Our abstractions were no longer a great match for them. They weren’t a bad match, so they were usable. Arguably, this was worse since we accepted the complexity of subtle discrepancies instead of fixing the root issues.
If your software supports a current or future product, it must stay aligned with the product strategy over time. There’s a healthy push and pull between product and engineering. Without consistent product direction, timelines will happily fill with well-intentioned but misguided engineering “requirements.”
Ruthless prioritization with the help of product maintains focus on what matters.
Looking back, the intellectual and aesthetic allure of well-crafted code was dangerous. Earlier in my career, I fell victim to the belief that code quality was an end instead of just a means to an end. Reducing complexity, type checking, test cases, and maintainable software: they’re all in service of the end goal of efficiently discovering, building, and adapting something people want.
At the onset of this project, another team was developing a new internal service. Because it was new, our team had a unique opportunity to help shape it. This sounded great, but unfortunately it did not end up as we had hoped.
The benefits of successful coordination were clear: our product could provide much-needed guidance for creating a generalizable internal service. In exchange, our product could offload a critical piece of functionality to this service.
In hindsight, we were again in the untenable situation of predicting the future.
While abstractions in your project are hard to change, cross-team services are even more challenging.
Dependencies like SQLite, PostgreSQL, and S3 are stable, known quantities. We can also think of them as constraints. Once we commit to one, we’re constrained by its features, performance, and data model. This is a good thing! Constraints limit possibilities, and for an early-stage project, this is welcomed. Constraints allow you to focus on areas you control—areas where you can make a meaningful impact.
As I’ve learned more about design, I’ve better appreciated the importance of constraints. I didn’t realize how much this applied to software engineering as well.
At best, testing approximates your code’s behavior in the real world. Observability, on the other hand, is a window into reality.
Tests are critical, but when a bug slips through to production, you need a way to determine what went wrong quickly. Observable services are necessary to maintain your development velocity, and it’s worth getting a good observability story as soon as possible.
This is especially true as your service becomes more complex. It’s hard enough debugging a simple monolith, but the difficulties compound once you expand to a distributed system.
Observability is also critical for machine learning systems to help ensure auditability. It’s essential to measure how updating data sources, training processes, or dependencies affects the various steps of your pipeline.
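To make that concrete, here’s a minimal, stdlib-only sketch of one way to get per-step auditability: fingerprint each step’s inputs and outputs and log them. The step name and data here are invented for illustration; a real pipeline would need to handle non-JSON-serializable data and richer metadata.

```python
import hashlib
import json
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def audited(step):
    """Log a fingerprint of a step's input and output plus its duration,
    so runs can be compared after a data or dependency update."""
    @wraps(step)
    def wrapper(data):
        def digest(obj):
            # Stable hash of JSON-serializable data (assumption for this sketch).
            return hashlib.sha256(
                json.dumps(obj, sort_keys=True).encode()
            ).hexdigest()[:12]

        start = time.perf_counter()
        result = step(data)
        log.info(
            "step=%s in=%s out=%s secs=%.3f",
            step.__name__, digest(data), digest(result),
            time.perf_counter() - start,
        )
        return result
    return wrapper


@audited
def normalize(rows):
    # Hypothetical pipeline step: clean up raw text rows.
    return [r.strip().lower() for r in rows]
```

If a dependency bump changes `normalize`’s output, the `out=` fingerprint changes even when the code didn’t, which is exactly the kind of drift you want surfaced.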
Easy in theory, hard in practice. There’s a constant “append-only” pressure on a codebase over time, especially as new features need to be added under time and resource constraints. It’s easier for someone to add an extra if condition or subclass to handle a piece of functionality than to bite the bullet and touch more code than necessary.
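A hypothetical sketch of what that pressure looks like (the function and flags are invented, not from our codebase): each deadline adds one more flag and one more branch, until the interactions between them are the real complexity.

```python
# "Append-only" style: every new requirement bolts on another flag and branch.
def format_price(amount, currency, promo=False, bulk=False, vip=False):
    if promo:
        amount *= 0.9
    if bulk:
        amount *= 0.95
    if vip:  # added under deadline pressure; silently stacks with the others
        amount *= 0.85
    return f"{amount:.2f} {currency}"


# Biting the bullet: model discounts as data, so adding one no longer means
# threading a new keyword argument through every call site.
def format_price_v2(amount, currency, discounts=()):
    for rate in discounts:
        amount *= 1 - rate
    return f"{amount:.2f} {currency}"
```

The second version touches more code up front, but the next discount is a one-line change at the call site instead of a new parameter everywhere.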
Specifically, don't use complicated types, especially if your project uses Python. In my experience, you quickly run up against mypy deficiencies when you use some “advanced” types, notably generics. Mapping your types closely to the domain may be slick, but it’s likely not the best use of your time to spend it on mypy’s issue tracker. Trust me 😅
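As a hypothetical illustration of the trade-off (these names are invented, not from our codebase): the generic version mirrors the domain more faithfully, but in my experience it’s exactly this kind of code that runs into mypy’s rough edges, while the boring concrete version type-checks without drama.

```python
from dataclasses import dataclass
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


# The "slick" version: one generic wrapper for every paginated response.
@dataclass
class Paginated(Generic[T]):
    items: List[T]
    next_cursor: Optional[str]


# The boring alternative: a concrete class per payload. More repetition,
# far fewer surprises from the type checker.
@dataclass
class PaginatedUserIds:
    items: List[str]
    next_cursor: Optional[str]
```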
There’s also the increased complexity of onboarding new members to the project. Overly complex abstractions increase the risk of someone not adequately understanding the system or introducing bugs.
Another thing: be aware of your dependency balance. If the standard library provides adequate functionality for your use case, it’s probably best to just rely on it rather than adding or using a dependency.
Dataclasses are the motivating example behind this. They’re amazing, and we likely don’t use them enough. However, we ran into annoying issues while parsing some external data: our code passed static type checks, but at runtime, the external data didn’t match the declared types. Dataclasses do not validate your input data. We could have implemented some manual validation code, but this seemed redundant. After all, all the required type information is already declared on the dataclass!
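A minimal illustration of the behavior (the field names are hypothetical): the annotation promises a float, but nothing enforces it at runtime, so bad external data flows straight through.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    sensor_id: str
    value: float  # a promise to the type checker, not a runtime check


# Imagine this came from parsed external data. No error is raised,
# even though "12.7" is a str, not a float.
m = Measurement(sensor_id="s1", value="12.7")
```

The mismatch only surfaces later, e.g. when some downstream code does arithmetic on `m.value`.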
We turned to pydantic, specifically their dataclass replacement. It’s a great package, and it provided the functionality we sought. We excitedly converted most of our standard library dataclasses into pydantic ones. This resolved most of our parsing woes, but it uncovered something whose importance I didn’t appreciate at first: editor support. The standard library dataclasses have excellent and broad editor support; pydantic’s, as we discovered, do not.
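For contrast, here’s a sketch of the pydantic version (hypothetical model; assumes pydantic is installed). Its drop-in dataclass validates, and where sensible coerces, input at construction time:

```python
from pydantic import ValidationError
from pydantic.dataclasses import dataclass


@dataclass
class Server:
    host: str
    port: int


# Valid input: the string "8080" is coerced to the int 8080.
ok = Server(host="db.internal", port="8080")

# Invalid input is rejected at construction time instead of lurking
# until some downstream code trips over it.
try:
    Server(host="db.internal", port="not-a-port")
except ValidationError:
    print("rejected bad port")
```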
I use VS Code, and VS Code’s Python plugin does not play nicely with pydantic dataclasses. (If you use PyCharm, you’re in luck.) Niceties like autocompletion and inline type-checking fail. While only a mild annoyance initially, this was a persistent pain point, especially when multiple developers experienced it.
Was it nice not to have to write our validation logic? Yes. Was it worth the lack of editor support? In my experience, probably not.
State-of-the-art machine learning models frequently require GPUs for acceptable performance, but the developer experience around GPUs has a long way to go.
Unless you have an Nvidia GPU on your local machine, working with these models is a challenge. You can run the same models on your CPU, but it’s slow, especially if they require training. GitHub Codespaces (or a similar tool) is a massive step in the right direction. I haven’t had a chance to try it, but it’s on the list.
I’m grateful I was on this project since its inception and have been able to work on it long enough to realize and experience all the mistakes I made. This refactor was a fantastic learning experience, and in the end, we ended up with around 20,000 fewer lines of code than we started with. That’s a fun statistic :)
As always, thanks for reading!