Building containers is easy when your applications are small. I usually work on Go projects, but at work we have a monolithic Django application, and once we install all of our system and application dependencies, we end up with a 4.5GB container image. It takes a while to build and even longer to push. Because of that, I started re-reading any literature I could find about improving Dockerfile usage.
It comes as no surprise that the first thing everyone tells you to do is try multi-stage builds. It makes sense. In a language like Go, you can download all of your dependencies and compile in one stage, then copy just the binary into the stage that becomes your final container image. Unfortunately, the Python ecosystem doesn’t work quite like that. You can install everything into a virtual environment, but there isn’t quite the same concept of a single binary. There’s a project called pex that comes close, but building a pex artifact just bundles everything into a single file, which can help but doesn’t really reduce the size of what you ship. This is something of an oversimplification, but the pex format can be thought of as zipping up the virtual environment. You still include everything.
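To make the “zipping up the virtual environment” analogy concrete, here is a minimal Python sketch. It does not use pex itself. It only uses the standard library’s zipfile module on a tiny made-up directory (the file names and the helper functions make_fake_env and bundle_directory are hypothetical stand-ins, not anything from our real project), and it shows that the single-file archive still lists every file that went into it.

```python
import tempfile
import zipfile
from pathlib import Path


def make_fake_env(root: Path) -> list[str]:
    """Create a tiny stand-in for a virtual environment (hypothetical files)."""
    files = {
        "bin/python-wrapper": "#!/bin/sh\n",
        "lib/site-packages/django/__init__.py": "VERSION = 'fake'\n",
        "lib/site-packages/myapp/settings.py": "DEBUG = False\n",
    }
    for rel, content in files.items():
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
    return sorted(files)


def bundle_directory(src: Path, archive: Path) -> list[str]:
    """Zip every file under src into one archive and return the stored names."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(src).as_posix())
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        expected = make_fake_env(root / "venv")
        # The archive lives outside the source directory so it is not zipped into itself.
        names = bundle_directory(root / "venv", root / "app.zip")
        print("one file on disk:", (root / "app.zip").name)
        print("files inside it: ", len(names), "of", len(expected))
```

You end up with one file to copy around, but nothing has been left out of it. Compression may shave some bytes off, but every dependency is still in there.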