Wednesday, November 7, 2012

Caching architectures using Memcached & Amazon ElastiCache

Applications can often be made to perform better and run faster by caching critical pieces of data in memory. Frequently accessed data, HTML fragments, results of time-consuming or expensive database queries, search results, sessions, and results of complex calculations and processes are usually very good candidates for cache storage. Not all application architectures benefit equally from a caching layer: read-intensive applications usually see large performance gains from a cache, whereas write-intensive applications may not get much benefit.
There are various ways in which a caching layer can be designed on AWS infrastructure. The most popular model is distributed caching using Memcached.
Memcached is a high-performance, distributed, in-memory key-value store used as a memory object caching system.

Anatomy of a Memcached system

  • A memcached client which is given a list of the available memcached servers in the farm
  • A client-side hashing algorithm which chooses the server for each GET/SET based on the "key" input
  • Memcached server instances which store your values with their keys in an internal hash table
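
The three parts listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the server names are hypothetical, each "server" is a plain in-process dictionary instead of a real memcached daemon, and the hash-modulo rule is one simple way a client may pick a server, not the rule of any particular client library.

```python
import hashlib

# Hypothetical server list; a real client would be given actual host:port pairs.
SERVERS = ["memcached-a:11211", "memcached-b:11211", "memcached-x:11211"]


def pick_server(key, servers):
    """Client-side hashing: map a key to exactly one server in the list."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


class ToyCacheFarm:
    """Stand-in for a memcached farm: one internal hash table per server."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.tables = {server: {} for server in self.servers}

    def set(self, key, value):
        # The client alone decides which server stores the item.
        self.tables[pick_server(key, self.servers)][key] = value

    def get(self, key):
        # The same hash sends the read to the server that holds the item.
        return self.tables[pick_server(key, self.servers)].get(key)
```

Note that the servers never talk to each other in this sketch; all routing knowledge lives in the client, which is why every client needs the same server list.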

Memcached uses memory and network heavily, followed by CPU. Memcached supports TCP and UDP for communication, with both text and binary protocols. Client libraries are available for popular programming languages like Java, .NET, PHP, Python, Ruby etc.
In this article let us explore and analyze some popular memcached/ElastiCache deployment architectures in AWS.

Architecture 1: Apache + Memcached shared in same EC2 (Distributed Cache)

Memcached shares an Amazon EC2 instance with Apache. Imagine an m1.large EC2 instance whose 7.5 GB RAM and 2 CPU cores are shared between the OS, Apache and memcached. Since Amazon ElastiCache runs on a separate tier, it does not fit this shared approach. Apache-A can contact memcached-A, memcached-B or memcached-X depending upon the key hash. Since otherwise unused memory is given to memcached and no dedicated EC2 instances are launched for the caching tier, this model is usually cost effective. We have seen some implementations using this approach in production, but in my opinion it is not suitable for applications which demand heavy scaling and clear service separation. This model can be scaled up but not scaled out optimally, i.e. based on your traffic demand you can scale up the shared EC2 instance to bigger capacities like Xlarge, Quadruple, High IO, M3 class etc., but you cannot easily add new instances of this type. Some negatives that this approach brings to the table are:
Maintenance: Since Apache and memcached share the same EC2 instance, it is strictly advised not to Auto Scale this layer. Only manual scaling is possible in this layer, which might add a heavy configuration burden on the IT team during traffic peaks and valleys.
Auto Scaling: Using Amazon Auto Scaling we can add new web/app EC2 instances dynamically depending upon the traffic demands. But when the web/app server instance also runs memcached, it brings cascading complexity into the architecture. Imagine there are 2 m1.large Apache + memcached EC2 instances running and a third one is launched by Amazon Auto Scaling based on the traffic. The load balancer now sends 1/3rd of the web traffic to this new Apache EC2 instance. Since its cache is empty, that 1/3rd of the requests will hit the backend database heavily. Now imagine that instead of 1 EC2 instance you scale out by 2 Apache EC2 instances during a peak: about 50% of the requests now land on instances with unwarmed memcached, and the database load rises accordingly. Secondly, the new memcached endpoint has to be propagated and configured on the other memcached clients, which adds further complexity and devops engineering to the architecture. Finally, Amazon Auto Scaling will pull out an Apache EC2 instance when the load decreases; if the instance pulled out is an Apache + memcached that is properly warmed, the resulting cache misses will again increase the DB load.
Note: We can still try to address this problem by adding more design and engineering complexity, such as progressive weighted EC2 balancing, scaling out with internal cache warming techniques etc., but if you ask whether it is worth it, many times it is not. Alternatively, we can avoid this complexity altogether by simplifying the overall architecture of the system, which we will see as the article progresses.
Sharing: We observed earlier that sharing Apache + memcached on the same EC2 instance saves cost. On the other hand, this sharing also causes problems if one is not aware of the environment. In our case Apache and memcached are shared, and Apache-A can talk to memcached on the same EC2 instance or on another Apache EC2 instance depending upon the key hash. Based on this flow let us explore some problems in the sharing approach.
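
The cold-cache arithmetic in the Auto Scaling example can be written down as a tiny helper. It is a rough sketch that assumes the load balancer splits traffic evenly across instances and that every request served by a freshly launched instance misses its empty cache; the function name is made up for illustration.

```python
def cold_traffic_share(warm_instances, new_instances):
    """Share of evenly balanced traffic that lands on newly launched, unwarmed instances."""
    total = warm_instances + new_instances
    if warm_instances < 0 or new_instances < 0 or total == 0:
        raise ValueError("instance counts must be non-negative and not both zero")
    return new_instances / total


# 2 warm instances plus 1 new one: one third of the traffic sees an empty cache.
one_new = cold_traffic_share(2, 1)
# 2 warm instances plus 2 new ones: half of the traffic sees an empty cache.
two_new = cold_traffic_share(2, 2)
```

The larger the warm fleet already is, the smaller this share becomes, which is why rapid scale-out hurts small deployments the most.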
  • Apache is usually heavy on memory and CPU. Memcached is low on CPU, and high on memory and network depending upon the average size of your items.
  • If memcached is not configured with memory limits it can crash your Apache and OS. If the website is heavily loaded and built to be cache dependent, there will be heavy CPU contention between Apache and memcached.
  • If the requests/responses of Apache and memcached are large, there will be more contention on the shared network layer. Overall request throughput can drop because of heavy buffering and network contention.
  • The Apache EC2 instance now has the bigger headache of handling all the TCP sockets flowing between the Internet, the database, the internal network and memcached. Some teams address this last point marginally by using the UDP protocol for memcached communication, which reduces temporary TCP socket exhaustion. Overall this is a patch and not a proper solution.

Architecture 2: Apache + Memcached shared in same EC2 (like Local Cache)

This approach differs slightly from the one above. Apache/NginX and memcached share the same EC2 instance, but the web (Apache or NginX) process calls only the local memcached and never a remote memcached. Basically, memcached is used here as a local instance cache and not as a distributed cache. Every Apache/NginX caches items in its own memcached and uses it as extended memory. Since the items come from the same EC2 instance, throughput and latency are better for cached entries. Though this approach has less configuration headache than the previous approach, it still inherits lots of problems from it. A sticky session algorithm is preferred on the load balancing tier to reuse the cached items optimally and reduce the DB load, because a round robin (RR) algorithm can heavily exercise the DB during the initial cache warming phase. Rapid scaling out and scaling down should be avoided on smaller deployments because it transfers the load to the DB immediately. If there is already a large fleet (100s) of NginX + memcached instances running, then rapidly adding a few (5-10) EC2 instances of this kind will not cause huge problems on the DB. Proper architecture guidance is recommended before fitting the above architecture into the use case.
As we observed in detail, the above approaches might be cost effective for smaller deployments, but as the site gets popular, traffic increases and scalability is demanded, they become complex to handle. Usually in architecture, complexity that arises from improper design is followed by heavy maintenance and management cost.
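
To illustrate why sticky sessions help a local cache, here is a small simulation. It is a sketch under assumptions: `LocalCache` is a dictionary standing in for the memcached on each instance, `read_through` is a generic cache-aside helper, and the two routing functions are simplified stand-ins for load balancer algorithms, not real load balancer behaviour.

```python
import zlib


class LocalCache:
    """Dict-backed stand-in for the memcached running on one web instance."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)

    def set(self, key, value):
        self._items[key] = value


def read_through(cache, key, load_from_db):
    """Return the cached value, loading it from the DB and caching it on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.set(key, value)
    return value


def round_robin(request_number, session_id, instance_count):
    return request_number % instance_count


def sticky(request_number, session_id, instance_count):
    # The same session always maps to the same instance.
    return zlib.crc32(session_id.encode("utf-8")) % instance_count


def count_db_loads(route, requests, instance_count):
    """Replay (session_id, key) requests and count how often the DB is hit."""
    caches = [LocalCache() for _ in range(instance_count)]
    loads = []
    for number, (session_id, key) in enumerate(requests):
        instance = route(number, session_id, instance_count)
        read_through(caches[instance], key, lambda k: loads.append(k) or "value-for-" + k)
    return len(loads)


requests = [("session-1", "profile:1")] * 4
rr_loads = count_db_loads(round_robin, requests, 2)
sticky_loads = count_db_loads(sticky, requests, 2)
```

With round robin the same item is loaded once per instance, while the sticky route loads it once in total, which is the DB-load difference described above.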

Now that we have understood the impacts of sharing memcached with the web/app server, a simple solution is to split memcached out into separate EC2 instances. The recently introduced M3 class instance types are good candidates for designing a separate memcached tier. But the question is whether we really need to manage and maintain a separate additional memcached layer. The answer is NO, USE AMAZON ELASTICACHE.

Amazon ElastiCache is a web service that is protocol-compliant with Memcached, a widely adopted memory object caching system, so code, applications, and popular tools that you use today with existing Memcached environments will work seamlessly with the service.

Architecture 3: Apache + Amazon ElastiCache in separate tier

Apache and caching run on clearly separated tiers in this approach. Since the tiers are separated, the Apache EC2 instances can be easily scaled out using Amazon Auto Scaling or custom scaling. Dynamically launching/terminating Apache instances will not swamp the database, because the warmed cache is separate and still accessible by all the Apache EC2 instances. It is also easy to roll out configuration changes, add new nodes to the caching layer and propagate the changes to the cache clients. The clear separation also enables us to isolate and address issues creeping up in the Apache and caching layers individually.
ElastiCache nodes are grouped inside an ElastiCache cluster. An ElastiCache cluster is a collection of one or more cache nodes, each running an instance of the memcached service. The word cluster in this context should be read as “grouping” and not “data synchronization”, because ElastiCache nodes do not talk to each other or exchange information inside the cluster. Most operations like configuration, security and parameter changes are performed at the cache cluster level and not at the individual cache node level. This enables easy maintenance and management of the caching tier as a whole.
Since ElastiCache is protocol compliant with memcached, programs written in Java, PHP and Python on Apache can still use their respective memcached clients and perform SET/GET operations seamlessly. The ElastiCache node endpoints (like “”) need to be configured on the memcached clients of the Apache EC2 instances. “ecache1a” is the cluster name, “0001” is the node number and 11211 is the port in the above mentioned endpoint URL. Whenever a new node is added to this “ecache1a” cluster, the next numbers in the sequence, like “0002” and “0003”, are assigned in the endpoint URLs of the nodes. This predictable pattern helps us automate the detection of cache node endpoints on the client side in scalable environments.
Since a single ElastiCache cluster can currently reside only in a single Amazon Availability Zone, it is advised to keep both the Apache EC2 and ElastiCache instances in the same Availability Zone for better latencies. Inside a single AZ, a single SET/GET operation between Apache and ElastiCache takes around 1-5 milliseconds using AWS High Memory Quadruple instance types. This latency also depends upon parameters like the Apache EC2 instance type, the ElastiCache instance type, the size of the SET/GET requests, single or bulk operations etc. Imagine you use m1.large for the Apache and ElastiCache instances and every SET/GET is around 1 MB in size. If the available network bandwidth between the Apache EC2 instance and ElastiCache is only 15 MB at that instant of time, only 15-20 requests can be performed concurrently at that instant. You may find the CPU under-utilized and max connections well set in ElastiCache, but the throughput is still low for the above reason. This is not a problem of ElastiCache performance, but rather a poor understanding of how the architecture components behave.
If the web app is cache dependent, it is advised to spread the items across multiple cache nodes. Imagine you have a cache size requirement close to 20 GB. You can distribute it across either 2 m1.xlarge ElastiCache nodes or 4 m1.large ElastiCache nodes. The cache data will be distributed by the memcached client to the nodes based on the key hash. In case one cache node goes down, 50% of the cache load will hit the backend data stores in the m1.xlarge approach, whereas only 25% of the cache load will hit the data stores in the m1.large approach. Also, since it is currently not possible to have multiple cache node instance types inside a single ElastiCache cluster, I advise you to do proper capacity planning, taking into consideration the cache dependency and the capacity of the backend DB to take direct requests, before deciding the number of cache nodes, their size and consolidation levels.
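
The two back-of-the-envelope calculations above can be sketched as follows. The helper names are hypothetical, the bandwidth figure is read as megabytes per second, and keys are assumed to be spread evenly across nodes, so treat the results as rough planning numbers rather than measurements.

```python
def max_requests_per_second(bandwidth_mb_per_s, item_size_mb):
    """Upper bound on SET/GET operations per second when the network is the bottleneck."""
    if bandwidth_mb_per_s <= 0 or item_size_mb <= 0:
        raise ValueError("bandwidth and item size must be positive")
    return bandwidth_mb_per_s / item_size_mb


def miss_share_after_node_loss(node_count, failed_nodes=1):
    """Share of cached keys lost when nodes fail, assuming an even key distribution."""
    if node_count <= 0 or not 0 <= failed_nodes <= node_count:
        raise ValueError("failed_nodes must be between 0 and node_count")
    return failed_nodes / node_count


# 15 MB/s of available bandwidth and 1 MB items: about 15 operations per second.
network_bound = max_requests_per_second(15, 1)
# Losing 1 of 2 larger nodes versus 1 of 4 smaller nodes.
two_node_loss = miss_share_after_node_loss(2)
four_node_loss = miss_share_after_node_loss(4)
```

More, smaller nodes reduce the blast radius of a single node failure, at the price of more endpoints to manage.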
With Amazon ElastiCache, as the name suggests, you can automatically or manually add or remove cache nodes from an existing ElastiCache cluster, making the whole tier elastic and flexible for customers. This is one of the important features of Amazon ElastiCache, and it eventually lands on the roadmap of any growing website. Now let us try to understand the remapping implications of adding or removing cache nodes from the cache cluster.
With a normal hashing algorithm, changing the number of servers can cause many keys to be remapped to different servers, resulting in huge sets of cache misses. Imagine you have 10 ElastiCache nodes in your cache cluster: adding an eleventh server may cause 40%+ of your keys to suddenly point to different servers than before (with plain hash-modulo placement, close to 90%). This is undesirable: it causes cache misses and eventually swamps your backend DB with requests. To minimize this remapping it is recommended to follow a consistent hashing model in your cache clients. Consistent hashing describes methods for mapping keys to a list of servers such that adding or removing servers causes only a minimal shift in where keys map to. Using this approach, adding an eleventh server should cause less than 10% of your keys to be reassigned. This % may vary in production, but it is far more efficient in such elastic scenarios than normal hash algorithms. It is also advised to keep the memcached server ordering and the number of servers the same in all client configurations while using consistent hashing. Java applications can use the “Ketama library” through spymemcached to integrate this algorithm into their systems. More information on consistent hashing can be found at
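
The remapping effect can be measured with a small experiment. The sketch below compares plain hash-modulo placement with a simple hash ring that uses virtual points; it is an illustrative ring under assumptions (hypothetical node names, 160 points per server, MD5 as the hash), not the Ketama implementation itself.

```python
import bisect
import hashlib


def _hash(value):
    """Stable 64-bit hash derived from MD5."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def modulo_server(key, servers):
    """Normal hashing: server index is hash(key) modulo the server count."""
    return servers[_hash(key) % len(servers)]


class HashRing:
    """Minimal consistent hash ring with several virtual points per server."""

    def __init__(self, servers, points_per_server=160):
        ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._points = [point for point, _ in ring]
        self._owners = [server for _, server in ring]

    def server_for(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        index = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[index]


def moved_share(lookup_before, lookup_after, keys):
    """Fraction of keys whose server changes between two lookup functions."""
    moved = sum(1 for key in keys if lookup_before(key) != lookup_after(key))
    return moved / len(keys)


ten = [f"node-{n:04d}" for n in range(1, 11)]      # hypothetical node names
eleven = ten + ["node-0011"]
keys = [f"key-{n}" for n in range(20000)]

modulo_moved = moved_share(
    lambda k: modulo_server(k, ten), lambda k: modulo_server(k, eleven), keys
)
ring_before, ring_after = HashRing(ten), HashRing(eleven)
ring_moved = moved_share(ring_before.server_for, ring_after.server_for, keys)

print(f"modulo: {modulo_moved:.1%} of keys moved, ring: {ring_moved:.1%} of keys moved")
```

In this setup the modulo figure should come out near 10/11 and the ring figure near 1/11, and on the ring every key that moves goes to the newly added node, so the existing nodes keep their warmed items.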

Deep dive into Amazon ElastiCache and understand the internals like connection overheads, memory allocations, Elasticity implications in this article:

Architecture 4: Apache + Amazon ElastiCache in Multiple Availability Zones

This is an extension of the previous approach: for better availability, the cache nodes are distributed among multiple Availability Zones of an Amazon EC2 region. Most of the points discussed in the above approach apply to this architecture as well. Since an ElastiCache cluster currently cannot span multiple AZs, you can create multiple ElastiCache clusters in multiple AZs. Example: you can create ElastiCache cluster “ecache1a” in Amazon AZ 1A and have a node launched with endpoint “”. In the same way you can create another ElastiCache cluster “ecache1b” in Amazon AZ 1B and have a node launched with endpoint “”. Both cache node endpoints should be configured in the memcached clients. Since the AZ concept is built transparently by AWS, the memcached clients on the Apache EC2 instances can distribute data seamlessly to both cache nodes across AZs. In this approach you can manage the cache clusters separately and still distribute the data across AZs. In case an entire AZ is affected, the cache nodes in the other AZ will still be accessible and functional. Instead of the DB getting swamped by 100% cache misses, you are now reducing it to ~50% with AZ distribution. This % can be reduced much further if the data is distributed among more than two AZs with more cache nodes inside them.
The ElastiCache Maintenance Window allows you to specify the time range (UTC) during which scheduled maintenance activities, such as software patching or pending cache cluster modifications you requested, would occur. Scheduled maintenance activities occur infrequently (generally once every few months) and will be announced on the AWS forum two weeks prior to being scheduled. After a maintenance window your cache nodes may lose all the data stored in their memory and need to be warmed again. Imagine having a single ElastiCache cluster with 10 cache nodes, all of them needing the cache warming phase after the maintenance period. This puts a heavy burden on your DB and other backend data stores during the refresh phase, and on heavily cache dependent architectures it can even bring your system to its knees. Since AWS is very elastic and flexible, you can either plan to increase your backend capacity on demand for a few hours to a few days until the cache layer is adequately warmed, or leverage the multi-AZ ElastiCache approach. Imagine you have 4 ElastiCache clusters distributed in 4 Availability Zones inside an Amazon EC2 region. You can configure maintenance windows on different days for the different cache clusters. Example: ecache1a can have maintenance on Monday, ecache1b on Tuesday, and so forth. This distribution of ElastiCache maintenance windows may give you enough time to warm the cache nodes in phases, and also helps you avoid swamping your backend with requests all at once.
This architecture approach is not suitable for smaller deployments running in a single AZ. I suggest it only for larger deployments where the Apache EC2 instances are auto scaled and the Apache and ElastiCache clusters are well distributed across multiple AZs, so that overall cache item SET/GET latencies stay at acceptable levels.
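
A staggered schedule like the one described above can be generated with a short helper. This is a sketch under assumptions: the cluster names beyond ecache1a and ecache1b are hypothetical, the hours are arbitrary, and the `ddd:hh24:mi-ddd:hh24:mi` string layout is my understanding of how ElastiCache expresses a preferred maintenance window, so verify it against the current AWS documentation before using it.

```python
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def staggered_windows(cluster_names, start="05:00", end="06:00"):
    """Assign each cluster a maintenance window (UTC) on a different weekday."""
    if len(cluster_names) > len(DAYS):
        raise ValueError("more clusters than weekdays; windows would overlap")
    return {
        name: f"{day}:{start}-{day}:{end}"
        for name, day in zip(cluster_names, DAYS)
    }


# ecache1c and ecache1d are made-up names continuing the article's pattern.
windows = staggered_windows(["ecache1a", "ecache1b", "ecache1c", "ecache1d"])
for cluster, window in windows.items():
    print(cluster, window)
```

Each generated string would then be set as the maintenance window of the corresponding cache cluster, so that at most one cluster needs warming on any given day.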

Architecture 5: Apache + Amazon ElastiCache + Redundancy

This is a slightly different approach built for availability and redundancy. Apache and ElastiCache are deployed in separate tiers. The Apache EC2 instances can be individually auto scaled across multiple AZs. Multiple ElastiCache clusters are created across multiple Availability Zones inside an Amazon EC2 region (so far very similar to the previous approach). In this approach, certain items are redundantly cached in two cache nodes in ElastiCache clusters in different AZs for better availability. Results of time-consuming and expensive database queries, results of complex calculations etc. are good candidates for this approach. Imagine an expensive query that pounds the database for ~250 or more milliseconds: if the data does not change frequently, it can be redundantly stored in 2 ElastiCache nodes. If cache node 1 is down, throws a connection error, or a cache miss occurs, then the redundant cache node 2 can be asked for the same item. If the item is not present in cache node 2 either, then as a last resort the DB is queried and the latest result is stored in both cache nodes redundantly. Imagine it takes around 2-5 ms for a single ElastiCache node to return a value: hitting 2 cache nodes redundantly still gives the result in ~10 ms, which is far better than pounding the DB for the result. This approach is not suitable for frequently changing data because it may result in fetching stale data from the cache; for such scenarios the ElastiCache -> DB fallback approach is better. Also, it is not necessary to build redundancy for all the cache nodes and cache clusters; you should build redundancy only for specific cache nodes in the system. This feature is not currently built into the memcached APIs and has to be implemented manually in the application code by crudely making multiple calls to multiple sets of cache nodes.
Though it reduces the overall GET time for complex requests, your SET times will marginally increase because of the multiple requests made to cache nodes.
It is costlier and more complex than the other architecture approaches mentioned above. But for some use cases it can heavily reduce your database hardware capacity cost and provide immense infrastructure cost savings overall. It is suggested to carefully analyze the fit of this approach based on your use case, cost and maintenance needs.
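
The redundant read and write flow described above could look roughly like this in application code. It is a minimal sketch, not a client library feature: `FakeNode` and `CacheNodeDown` are hypothetical stand-ins for a memcached client and its connection error, and a real implementation would also need timeouts and expiry handling.

```python
class CacheNodeDown(Exception):
    """Raised by the stand-in node when it is unreachable."""


class FakeNode:
    """Dict-backed stand-in for one cache node in one AZ."""

    def __init__(self):
        self.items = {}
        self.up = True

    def get(self, key):
        if not self.up:
            raise CacheNodeDown()
        return self.items.get(key)

    def set(self, key, value):
        if not self.up:
            raise CacheNodeDown()
        self.items[key] = value


def redundant_get(key, node_1, node_2, load_from_db):
    """Try node 1, then node 2, then the DB; on a DB load, store in both nodes."""
    for node in (node_1, node_2):
        try:
            value = node.get(key)
        except CacheNodeDown:
            continue  # treat an unreachable node like a miss
        if value is not None:
            return value

    # Last resort: run the expensive query, then cache the result redundantly.
    value = load_from_db(key)
    for node in (node_1, node_2):
        try:
            node.set(key, value)
        except CacheNodeDown:
            pass  # the other node still gets the item
    return value
```

The double write is where the slightly higher SET cost comes from, while the read path only touches the second node when the first one misses or fails.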

Related Articles
Part 1: Understanding Amazon ElastiCache Internals : Connection overhead
Part 2: Understanding Amazon ElastiCache Internals : Elasticity Implication and Solutions
Part 3: Understanding Amazon ElastiCache Internals : Auto Discovery
Part 4: Understanding Amazon ElastiCache Internals : Economics of Choosing Cache Node Type
Launching Amazon ElastiCache in 3 Easy Steps
Caching architectures using Memcached & Amazon ElastiCache
Web Session Synchronization patterns in AWS


Randeep said...

Which is the best Java memcached client? I was using the Danga memcached client but I think it's deprecated. Which one is most used now? I'm a sys admin, not a developer, so I have no idea about it. I have tried setting up ElastiCache + the Danga client for caching DB queries, but it takes more time than the usual results. Any suggestions?

Harish Ganesan said...

Randeep, try spymemcached or xmemcached for java

Anonymous said...

Hello Harish,
Great article.
You talked about auto-scaling when we have memcached running on the same server as Apache. How does that work? My understanding is that your app should be aware of each memcached instance running for it to be a truly distributed cache. So unless you bring down all the instances, change the configuration in each to add the new server and start them up again, it will not work. Is that true auto-scaling? Am I missing something here?

Harish Ganesan said...


Thanks for taking the time to read this article. I assume you are talking about Architecture 1 in reference to Auto Scaling. Yes, you are right that auto scaling a web/app + memcached combination is not an easy proposition. I have mentioned this difficulty in the article: "The new memcached endpoint has to be propagated and configured on other memcached clients, which adds another complexity and devops engineering into the architecture". What I mean by devops engineering is that you should engineer a centralized discovery service which is updated/queried by the web/app + memcached instances for new endpoints. Netflix Asgard follows that line, but I have not fully evaluated it. Also, as you rightly pointed out, it will require a restart of your web/app process to recognize the new memcached configuration. Thanks for your question, I will take this as an opportunity to elaborate on this point in detail in the article.

Wayne B. said...

Architecture 4 Question: if I am using the AmazonElastiCacheClusterClient-1.0.1 which allows me to read/write to memcache using the configuration endpoint then would I need to always write to both clusters if I have 2 availability zones for my deployment? Do I randomly choose one to read from and if it fails read from the other?

Anonymous said...

Thanks for your article.

Unknown said...

Nice post!

Jaya said...

Is there a coHQL equivalent in Elastic cache ? Is there a way to query the cache?


Saurav said...

Hey Harish,

This blog is a huge help for a beginner like me. I just have a simple query, naive it may sound though. Can we use a single node or a cluster across multiple EC2 instances? Or is it that ElastiCache is equivalent to (and in some cases better than) EC2 + Memcached? Hope you would clarify it for me. Thanks in anticipation.

