I’ve recently had cause to work on scaling up a web application. It’s got a pretty traditional architecture: a CDN in front of a fast native-code web server (e.g. Apache or nginx), in front of app code in a slow interpreted language (e.g. Python or Ruby), which talks to a database (e.g. Postgres or MySQL), with an additional in-memory key-value store (e.g. Redis or memcached) as a cache. As you might imagine, scaling this application ends up involving a lot of “add more caching.”
As I’ve talked about before, I like to think about performance questions in terms of hardware utilization — what hardware resources are we consuming in order to accomplish a given unit of work? So, this seems like a good opportunity to write up a model that’s been floating around in my head: when we cache stuff in, say,
memcached, what are we actually doing in terms of usage of the underlying physical resources?
One of the reasons I started pondering this taxonomy was to answer the question: Why might we cache database point lookups by primary key? Databases already have their own in-memory caches for recently-accessed data, so why would we put an additional cache in front of them? Nonetheless, a highly unscientific Twitter poll confirms that it is a fairly common practice to put memcache or similar technology in front of the database even for primary-key lookups. As I run through the following taxonomy, I will consider each perspective’s purpose for caching in general, and then also what it tells us about caching lookups by primary key.
I’ve identified three different “fundamental” purposes a cache server likely plays in a traditional webapp architecture. These are “pure” perspectives; any real cache deployment involves some combination of all of them. But I find it helpful to think of them a bit separately, and to have an idea of which of them I am hoping for when I try to add caching to some endpoint.
The first is perhaps the most classical conception of what application caching is for. You have some expensive operation, and instead of doing it every time you need the result, you do it less frequently, and you store the result in the cache. The net result is that your system uses less CPU, but more memory, since it needs to store the cached results. A classic example of this pattern in web architecture is something like whole-page caching, where you cache the entire rendered page somewhere (be it in your app framework or in a frontend web server or CDN). Caching the output of a complex database query so that it can be replaced by a single point lookup would be another example.
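As a concrete illustration, the whole-page variant of this pattern might look roughly like the following sketch. A plain dict stands in for a memcached/Redis client, and `render_page_uncached`, the `page:` key scheme, and the 60-second TTL are all hypothetical choices, not anything prescribed by a particular framework:

```python
import time

# A dict stands in for a real cache client (e.g. memcached or Redis).
# Values are (expiry_timestamp, cached_html) pairs.
CACHE = {}
TTL_SECONDS = 60

def render_page_uncached(page_id):
    # Stand-in for the expensive operation: slow-language templating,
    # complex database queries, and so on.
    return f"<html>...rendered page {page_id}...</html>"

def render_page(page_id):
    key = f"page:{page_id}"
    entry = CACHE.get(key)
    if entry is not None and entry[0] > time.time():
        # Cache hit: we spend memory on the stored page, but no CPU
        # re-rendering it.
        return entry[1]
    html = render_page_uncached(page_id)
    CACHE[key] = (time.time() + TTL_SECONDS, html)
    return html
```

The tradeoff is visible in the data structure itself: every entry in `CACHE` is memory spent so that a future request can skip the render.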
This tradeoff makes the most sense when the computations we are caching are expensive, and the output relatively small (to reduce the memory we need). Web applications are often written in relatively slow languages like Python or Ruby, which makes it more likely that this tradeoff is worthwhile. However, more generally, this may be an appropriate trade-off any time that we are more CPU-bound than we are memory-limited.
It’s worth noting that we don’t need a separate cache server to make this tradeoff. If we store the results of some interim computation in the database itself, we can make a similar tradeoff; versions of this strategy generally go by the name “materialization.”
This view of caching is probably the least informative for understanding why we might cache primary-key lookups. The cache will store approximately the same data as the database, so it’s not obvious what computation we’re saving. That said, I will note that an in-memory cache likely consumes less CPU than a general-purpose database in order to serve a point lookup. In most cases, the database has to parse a query, inspect table metadata, find the right index, and walk a B-Tree (even if all of the relevant pages are in memory), while a tool like memcached has a much simpler parsing task, followed by essentially a single hash-table lookup. Thus, one view on caching point lookups is that we’re trading off memory in the cache server against CPU usage in the database server.
Of course, it’s also relevant that the cache node is usually a different piece of hardware from the database. That brings us to the second feature of in-memory caching services:
Traditional database architectures are fairly monolithic: You have one machine running MySQL or PostgreSQL, serving all read and write traffic. That gives us a limit on how much CPU and RAM we can add to the server (and, for practical reasons, we’re often unable or unwilling to use the very largest instances money can buy). Even in sharded architectures, adding more shards is often a time-consuming or heavyweight operation. Thus, we are limited in practice in our ability to turn money into more CPU or more memory in the database.
However, if we can move work out of the database and into a cache, it effectively gives us access to more total CPU or memory in the system. It’s important to note that adding more resources in this way does not in general increase efficiency, as measured by “cost to serve a single request.” In fact, it tends to decrease efficiency, since we are adding the overhead of moving data between the different nodes in the system. If we are trading CPU off against memory, as described previously, we may actually reduce the total cost per request. But if we are just adding resources, we enable ourselves to trade money for throughput, without typically making individual requests any cheaper.
Caches also tend to be easier to scale than databases, operationally. Since each access is a lookup by a single cache key, sharding is conceptually very simple, allowing us to add multiple cache nodes. In this way, cache architectures can potentially multiply the available memory for caching and CPU for handling these requests many times above what our database server can offer.
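A minimal sketch of that sharding scheme: hash each key to pick one of N cache nodes. The node names here are made up, and real clients typically use consistent hashing rather than this simple modulo scheme, so that adding or removing a node remaps only a fraction of the keyspace:

```python
import hashlib

# Hypothetical cache fleet; each node holds a disjoint slice of the keyspace.
NODES = ["cache-1:11211", "cache-2:11211", "cache-3:11211"]

def node_for_key(key):
    # Hash the key and take it modulo the node count. Deterministic, so
    # every app server routes a given key to the same cache node.
    digest = hashlib.md5(key.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(NODES)
    return NODES[index]
```

Because each node owns its slice of keys, adding nodes adds both RAM and CPU to the cache tier, with no duplication of data between nodes.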
Here we also can compare cache nodes to database read replicas, another common strategy to scale database clusters. Read replicas also add additional CPU and RAM to the system, without substantially changing the performance characteristics of each individual query. However, since each replica typically has to replicate the entire data set, write traffic consumes some amount of CPU and I/O bandwidth on every replica, reducing the effectiveness of the scale-out. It’s also hard or impossible to shard which portions of the keyspace each replica keeps in cache, making it challenging to use the additional RAM nearly as efficiently as in a sharded caching architecture.
This perspective does an excellent job explaining the value of caching point lookups! Even if the cache and the database are identically efficient at serving individual lookups, adding cache nodes lets us scale our memory and CPU horizontally much more easily than we can scale the database cluster, thereby increasing the total number of records held in memory, and the total QPS available.
This last point is in some ways the most domain-specific, but also one of the most interesting to me, because it was the most surprising.
Most databases implement their caching at the level of “pages,” which are commonly in the 4k-16k range. The page is the fundamental unit of storage and access on disk, and the database cache sits on top of the I/O layer and caches entire pages.
The implication of this architecture is that we have to cache at least a full page of data, even if we only want a single record. If the “working set” of records that are frequently accessed is much smaller than the total database collection, this can lead to terrible memory efficiency! Imagine a table with 100-byte rows: in the very worst case we might pull a full 8k of data into memory just to cache a single row, for an 80x blowup in RAM required to cache our working set! In general our records are unlikely to be sparse enough to hit that limit, but the overhead is still going to be quite significant.
In contrast, in-memory stores like Redis or Memcache cache individual objects directly. Caching a 100-byte record will inevitably take some overhead for the hash table, the allocator, and such, but probably on the order of 2x at most! Thus, even given the same amount of memory, an object-based cache can often cache many more objects than the database’s own page cache, providing us with our third advantage of using caches over just relying on the database’s own cache.
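To put rough numbers on that, here is the arithmetic from the preceding paragraphs worked out. The 8k page size, 100-byte rows, worst-case one hot row per page, and ~2x object-cache overhead are the illustrative figures from the text, not measurements of any particular system:

```python
ROW_BYTES = 100
PAGE_BYTES = 8 * 1024          # one database page
OBJECT_OVERHEAD = 2.0          # rough factor for hash table + allocator

# Worst case: each hot row lands on its own page, so the page cache
# spends a full page per cached row.
worst_case_blowup = PAGE_BYTES / ROW_BYTES          # ~82x (the "80x" above)

# Hot rows cacheable per MB of RAM under each scheme, in the worst case:
rows_per_mb_page_cache = (1 << 20) // PAGE_BYTES                    # 128
rows_per_mb_object_cache = (1 << 20) // int(ROW_BYTES * OBJECT_OVERHEAD)
```

Under these (deliberately pessimistic) assumptions, the object cache holds roughly 40x more hot rows per MB than the page cache; with a less sparse working set the gap shrinks, but it remains substantial.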
This result was very surprising to me when I first realized it! The first two uses of caches outlined in this document were quite intuitive to me, but I naively never expected dedicated caches to have this kind of drastic efficiency gain over the database’s own caches. This reason, to me, is an extraordinarily compelling case for caching primary key lookups in a dedicated cache server, at least in some regimes of operation: you may be able to cache several times as many records per MB of RAM as your database itself can!