Pre-warming Memcache for fun and profit

Datetime: 2016-08-23 01:30:05          Topic: Memcached

One of the services my team runs in AWS makes good use of Memcached (via the ElastiCache product). I say “good” use as we manage to achieve a hit rate of something like 98% most of the time, although I now realise this comes at a cost: when the cache disappears, the application suffers badly. Unlike applications that traditionally cache the results of MySQL queries, this particular application stores GOB-encoded binary metadata, but what the application does is outside the scope of this post. When the cached entries aren’t there, the application has to do a reasonable amount of work to regenerate them and store them back.

Recently I observed that when one of our ElastiCache nodes is restarted (which can happen for maintenance, or due to system failure) the application takes an undesirable hit. We could minimise this impact by having more nodes in the cluster with less capacity each, for the same overall cluster capacity. Going from, say, 3 nodes where we lose 33% of our cache capacity on a restart to 8 nodes where we lose only 12.5% is a far better situation. I also realised we could upgrade to the latest generation of cache nodes at the same time, which sweetens the deal.

The problem that arises is: how can I cycle out the ElastiCache cluster with minimal impact to the application and user experience? To cut a long story short, there’s no way to change individual nodes in a cluster to a different instance type, and if you maintain your configuration in CloudFormation and change the instance type there, the entire cluster will be destroyed and recreated – losing your cache in the process (in fact, you’ll be without any cache at all for a short period). I decided instead to create a new CloudFormation stack altogether, pre-warm its cache and bring it into operation gently.

How can you pre-warm the cache? Ideally, you could dump the entire contents and simply insert them into the new cluster (much like MySQL dumps or backups), but with Memcached this is impossible. There is the stats cachedump command, which is capable of dumping out the first 2MB of keys of a given slab. If you’re not familiar with how Memcached stores its data: it divides its memory allocation into “slabs” of increasing sizes and stores each value in the smallest slab that will fit it (always rounding up), so internally the data is segmented by size. You can list stats for all of the current slabs with stats slabs , then dump the keys of one with stats cachedump {slab} {limit} .
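If you want to poke at this yourself, here is a minimal Go sketch of that conversation: it speaks the plain-text protocol over TCP and parses the ITEM lines that stats cachedump returns. The hostname, slab ID and key limit are placeholders you’d substitute with your own.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// dumpKeys issues "stats cachedump <slab> <limit>" and collects the key
// names from the response, which looks like:
//   ITEM somekey [52 b; 1471915740 s]
//   ...
//   END
func dumpKeys(conn net.Conn, slab, limit int) ([]string, error) {
	fmt.Fprintf(conn, "stats cachedump %d %d\r\n", slab, limit)
	var keys []string
	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		line := strings.TrimRight(scanner.Text(), "\r")
		if line == "END" {
			break
		}
		if fields := strings.Fields(line); len(fields) >= 2 && fields[0] == "ITEM" {
			keys = append(keys, fields[1])
		}
	}
	return keys, scanner.Err()
}

func main() {
	// Placeholder endpoint – use one of your ElastiCache node addresses.
	conn, err := net.Dial("tcp", "cache.example.com:11211")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Slab ID 5 and a 1000-key limit are arbitrary; pick real slab IDs
	// out of the `stats slabs` output first.
	keys, err := dumpKeys(conn, 5, 1000)
	if err != nil {
		panic(err)
	}
	for _, key := range keys {
		fmt.Println(key)
	}
}
```

The same conversation works interactively over telnet or nc if you just want to eyeball a slab.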

There are a couple of problems with this. One is the aforementioned 2MB limit on the returned data, which in my case genuinely limited how useful the approach was – some slabs had several hundred thousand objects, and I couldn’t retrieve anywhere near the whole keyspace. Secondly, the developer community around Memcached is opposed to the continued existence of this command, and it may be removed in a future release (perhaps it already has been, I’m not sure, but it still exists in 1.4.14, which I’m using) – I’m sure they have good reasons. I was also concerned that using the command would lock internal data structures and cause operational issues for the application accessing the server.

You can see the not-so-reassuring function comment here describing the locking characteristics of this operation. Sure enough, the critical section is properly locked with pthread_mutex_lock on the LRU lock for the slab, which I initially assumed meant that only cache evictions would be affected by taking this lock. Based on some tests (and common sense) I suspect it is an LRU lock in name only, and more generally locks the data structure against writes (it also records cache access stats somewhere, perhaps in another structure). In any case, as mentioned above, I was only able to retrieve a small fraction of the total keyspace from my cluster, so as well as being a somewhat dangerous exercise, the stats cachedump command was simply not useful for my original purpose.

Later in the day I decided instead to retrieve the Elastic Load Balancer logs from the last few days, run awk over them to extract the request paths (for requests that would trigger a cache fill) and simply replay the same requests against the new cluster. This takes more effort up-front, since the ELB logs can be quite large and unfortunately are not compressed, but awk is very fast. The second part of this approach (or any, for that matter) is using Vegeta to “attack” your new cluster of machines, replaying the requests you’ve pulled from the ELB logs.
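As a sketch of the extraction step, here is roughly the same thing in Go (a stand-in for the awk one-liner, since the exact awk fields depend on your log layout): it reads ELB access log lines on stdin and writes a Vegeta targets file pointing at the new cluster. The new-cluster hostname is a placeholder, and it assumes the cache-filling requests are GETs.

```go
package main

import (
	"bufio"
	"fmt"
	"net/url"
	"os"
	"strings"
)

func main() {
	// Placeholder for the endpoint fronting the new ElastiCache-backed stack.
	const newHost = "new-cluster.example.com"

	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // ELB log lines can be long
	for scanner.Scan() {
		// Classic ELB access logs quote the request, e.g.
		// ... "GET http://old.example.com:80/v1/thing?id=42 HTTP/1.1" ...
		parts := strings.SplitN(scanner.Text(), `"`, 3)
		if len(parts) < 2 {
			continue
		}
		req := strings.Fields(parts[1])
		if len(req) < 2 || req[0] != "GET" {
			continue // only replay the cache-filling GETs
		}
		u, err := url.Parse(req[1])
		if err != nil {
			continue
		}
		// Emit one Vegeta target per request, rewritten to the new host.
		fmt.Printf("GET http://%s%s\n", newHost, u.RequestURI())
	}
}
```

Pipe the output to a file and feed it to something like vegeta attack -targets=targets.txt -rate=50 -duration=60s, tuning the rate to whatever the new cluster can comfortably absorb.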

A more adventurous approach might be to use Elastic MapReduce to parse the logs, pull out the request paths and use the streaming API to call an external script that makes the HTTP request to the ELB. That way you could farm out the work of making a large number of parallel requests across a much larger time period, and pre-warm the cache far more thoroughly with historical requests. Or you could poll your log store frequently and replay ELB requests against the new cluster just a short delay after they happen on your primary cluster. If you attempt either of these and enjoy some success, let me know!
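I haven’t built that, but the external script invoked by the streaming API could be as simple as something like this sketch: read one request path per line on stdin and replay it against the new cluster (the hostname is again a placeholder).

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"
)

func main() {
	// Placeholder endpoint for the new stack.
	const base = "http://new-cluster.example.com"

	client := &http.Client{Timeout: 10 * time.Second}
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		path := strings.TrimSpace(scanner.Text())
		if path == "" {
			continue
		}
		resp, err := client.Get(base + path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		// The response body doesn't matter; the cache fill is the point.
		resp.Body.Close()
	}
}
```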




