Tuesday, January 8, 2013

Deep dive into Amazon ElastiCache

1. Connection Overheads:

An Amazon ElastiCache node (Memcached engine) uses a connection buffer per TCP connection to read/write data over the network. This is not a problem when there are few concurrent connections to a node, but when you reach hundreds of thousands of connections across hundreds of Amazon ElastiCache nodes, the connection memory overhead adds up. On each Amazon ElastiCache node, the memory made available for storing cache items is the total available memory on that node minus the memory used for connections and other overhead (like the TCP connection buffers). This overhead can be configured using the memcached_connections_overhead parameter in Amazon ElastiCache. For example, a cache node of type cache.m3.2xlarge has a max_cache_memory of 29600 MB; with the default memcached_connections_overhead value of 100 MB, the Memcached process will have 29500 MB available to store cache items. The default value of 100 MB satisfies most use cases; however, the required allocation for connection overhead varies with request rate, payload size, and the number of connections. For a heavily cache-dependent site with high concurrency using multiple cache.m3.2xlarge nodes, an overhead of just 100 MB may not be enough, which can cause swapping and degrade performance. That is why Amazon ElastiCache has made this overhead a user-configurable property; the configuration change affects all cache nodes in the cluster. You need to monitor swap usage, latency, and the number of concurrent requests periodically with Amazon CloudWatch and increase this parameter accordingly (the increase could range from a few hundred MB to GBs, depending on the readings).
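To make the arithmetic concrete, the relationship between max_cache_memory, memcached_connections_overhead, and the memory actually available for items can be sketched as below. The 29600 MB figure is the cache.m3.2xlarge value quoted above; the 2048 MB alternative is a hypothetical tuning for a very high connection count, not an AWS recommendation:

```python
def usable_memory_mb(max_cache_memory_mb, connections_overhead_mb=100):
    """Memory left for cache items after the connection-overhead reservation."""
    return max_cache_memory_mb - connections_overhead_mb

# Default 100 MB overhead on a cache.m3.2xlarge node:
print(usable_memory_mb(29600))        # 29500
# Hypothetical 2 GB overhead tuned for a very high connection count:
print(usable_memory_mb(29600, 2048))  # 27552
```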

Instead of spending memory on connection buffers and overhead, it would be better used to store user data. Some techniques employed in the industry to reclaim this memory:
• Implement a per-thread shared connection buffer pool for TCP and UDP sockets. This change can reclaim multiple gigabytes of memory per server. The Facebook engineering team published a patch for a per-thread shared connection buffer pool on their engineering blog; we need to check with the AWS ElastiCache team whether this patch, or equivalent tuning, is applied on Amazon ElastiCache nodes to reduce this overhead.
• Use UDP wherever applicable and feasible. Currently UDP is not supported by AWS ElastiCache; it could be a possibility in the future.

2. Elasticity Implications:

In this section we will explore the implications of the elasticity that Amazon ElastiCache brings to an architecture. Amazon ElastiCache nodes currently use the Memcached engine, and each node is unaware of the other Memcached nodes running inside the cache cluster. Since the cache nodes do not know how data is distributed among their peers, the balancing act has to be done by the clients. Cache clients usually use a simple hashing algorithm to PUT/GET values to and from the corresponding cache nodes. This works well if the cache cluster size is static, but for a growing web application a static cluster will not serve the purpose. Assume an online business decides to run a promotion during which nearly 6X more visitors are expected to hit the site. For any heavily cache-dependent site, memory needs will increase correspondingly during this period. Amazon ElastiCache understands this elastic behavior of online business and provides an easy mechanism, via API or console, to add or remove cache nodes in an existing cluster. Whenever your cache memory needs grow, you can simply add "N" new cache nodes to the Amazon ElastiCache cluster. But as an IT architect you need to understand that this elastic action comes with certain side effects in a distributed caching scenario with Memcached-based ElastiCache nodes, and if they are not dealt with properly they may swamp your backend with requests. Let us examine these effects in detail:

Effect 1: Cold Cache: Amazon ElastiCache nodes using the Memcached engine are ephemeral in nature: the key/value data is stored in memory and is not persisted to disk. Whenever you add new cache nodes, you need to understand the percentage increase in capacity and choose a scale-out strategy accordingly. Imagine you have 2 cache nodes of cache.m1.large capacity; if you decide to add 2 more cache nodes of the same type to the cluster, you are adding close to 50% of capacity in a cold state without proper cache warming. This can have undesirable consequences and swamp your backend with heavy requests until the new nodes are properly warmed. Whereas if you have 100 cache.m1.large nodes in your cluster and plan to add 5 more, it will not have a big impact if the backend is designed to handle that small spike. Some best practices in this area:
Plan the addition of nodes well in advance of the promotion, so that there is enough time to warm the cache nodes adequately.
Add cache nodes in proportions your backend can absorb without performance disruption. Example: instead of adding 2 cache nodes at once to a 2 x cache.m1.large cluster, adding them one by one with enough time to warm in between adds only ~25% load to your backend at a time. For advanced strategies using cache redundancy, maintenance windows, etc. in AWS to address this situation, refer to this URL: http://harish11g.blogspot.in/2012/11/amazon-elasticache-memcached-ec2.html

Effect 2: Object Remapping: The most common approach used by clients of Amazon ElastiCache is to distribute an object "o" among "n" cache nodes by putting it in the cache node numbered hash(o) mod n. This approach is good for a static set of cache nodes, but when you add or remove nodes, object "o" may need to be hashed to a new location every time "n" changes. This operation can thunder your backend with heavy load and cause undesirable consequences. Ideally, when a cache node is added or removed, only a fair share of objects would be remapped to other cache nodes. This can be achieved by using "Consistent Hashing" in the cache clients. Since Amazon ElastiCache nodes are not peer aware, no server-side change is required; this intelligent hashing approach is applied only in the cache clients. Consistent hashing was introduced by David Karger et al. in the paper "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web". The paper can be found here: http://www.akamai.com/dl/technical_publications/ConsistenHashingandRandomTreesDistributedCachingprotocolsforrelievingHotSpotsontheworldwideweb.pdf
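A quick sketch illustrates how disruptive hash(o) mod n placement is when n changes. The key names and node counts below are purely illustrative, and MD5 stands in for whatever hash function your client actually uses:

```python
import hashlib

def node_for(key, n):
    """Naive placement: hash(key) mod n."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"user:{i}" for i in range(10_000)]
before = {k: node_for(k, 4) for k in keys}  # 4-node cluster
after = {k: node_for(k, 5) for k in keys}   # one node added
moved = sum(1 for k in keys if before[k] != after[k])
# Roughly 1 - 1/5 = 80% of keys land on a different node after the change,
# and every one of those keys is now a cache miss hitting the backend.
print(f"{moved / len(keys):.0%} of keys remapped")
```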
Consistent hashing was first implemented by last.fm team on memcached library as “ketama”. Refer URL: http://www.last.fm/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients
Now let us explore what consistent hashing is and how it helps. The idea behind the consistent hashing algorithm is to hash both objects and cache nodes using the same hash function, and to map objects to the same cache node as consistently as possible. Consistent hashing uses a mechanism that acts like a clock: the hash function maps objects and cache nodes onto a number range, and the values wrap around like a circle; that is why this circle is called a continuum. Imagine in the picture below a circle with objects 1, 2, 3, 4 and cache nodes A, B, and C. To determine which cache node an object goes to, you move clockwise around the circle until you find a cache node. So in the diagram below, objects 1 and 4 belong to cache node A, object 2 belongs to cache node B, and object 3 belongs to cache node C.

Consider what happens if cache node C is removed from the cluster: object 3 will be remapped to cache node A, and all other object mappings are unchanged. Likewise, if we add another cache node D in the position marked below in the diagram, it will take objects 3 and 4, leaving only object 1 belonging to A.

The principal advantage of consistent hashing is that the departure or arrival of a node only affects its immediate neighbors; other nodes remain unaffected. This reduces the remapping of objects between cache nodes and thereby significantly decreases the swamping of backend servers when the cache is resized.
Consistent hashing implementations are available in most of the popular Memcached clients used with Amazon ElastiCache. But, as the saying goes, "everything comes with a consequence": since the placement produced by consistent hashing is essentially random, it is possible to end up with a very non-uniform distribution of data and load across the cache nodes of a cluster.
To address this issue, more advanced KV systems like Membase and Amazon Dynamo use a variant of consistent hashing. Instead of mapping a node to a single point on the circle, each node is assigned multiple points on the ring (Amazon Dynamo uses the concept of "virtual nodes"). A virtual node looks like a single node in the system, but each physical node can be responsible for more than one virtual node. Effectively, when a new node is added to the system, it is assigned multiple positions (henceforth, "tokens") on the ring. To learn more about this approach, read the Amazon Dynamo paper. I hope the Amazon ElastiCache team implements this concept in their distributed caching service as well, because it will help us use nodes optimally.
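A minimal ketama-style sketch (not the exact ketama implementation; the node names, vnode count, and use of MD5 are illustrative assumptions) shows both the clockwise ring walk and how little data moves when a node joins:

```python
import bisect
import hashlib

def _hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Each node occupies many points ("virtual nodes") on the continuum."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted((_hash(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def node_for(self, key):
        # Walk clockwise to the first ring point at or after hash(key).
        idx = bisect.bisect_left(self.points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

keys = [f"obj:{i}" for i in range(1000)]
before = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
after = ConsistentHashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
moved = sum(1 for k in keys if before.node_for(k) != after.node_for(k))
# Only the keys adjacent to cache-d's points move: roughly 1/4, not ~80%.
print(f"{moved / len(keys):.0%} of keys moved")
```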

When your online user base grows, the cache naturally grows with it for a cache-dependent application. AWS understands this well and designed Amazon ElastiCache around the core idea that you can elastically increase and decrease the number of cache nodes. Though it is easy to add new nodes to the cluster, this activity has impacts on the client side, i.e. the Memcached client library.


3. Auto Discovery:

Imagine you have an online application which has following system components:

  • 25+ Java-based web/app servers deployed on Amazon EC2 in Availability Zone A
  • 5+ Amazon ElastiCache nodes in a single cluster in Availability Zone A
  • Web/app servers use a Java-based Memcached client (for example, the spymemcached client) to connect to the Amazon ElastiCache (Memcached) nodes

Now suppose you plan to run an online sales promotion and choose to add a few ElastiCache nodes to the cluster to match your growing memory and performance requirements. The Java-based Memcached clients cannot immediately identify and recognize the new ElastiCache nodes; you would have to add the cache endpoints manually in the client configuration and restart the web/application server process (in this case, on 25+ web/app EC2 instances). This re-initialization can cause temporary disruption of some services, and even downtime in some architectures.
Amazon ElastiCache has recently introduced an Auto Discovery feature for the spymemcached (Java) and PHP clients to eliminate this complexity. You can download this patch from GitHub. Soon AWS will make this feature available for all popular Memcached clients.
Using the Amazon ElastiCache Auto Discovery feature, customer applications now transparently adapt to the addition or deletion of cache nodes in the cluster, reacting quickly to changes without downtime. Each Amazon ElastiCache cluster now includes a unique Configuration Endpoint (a DNS record) that contains the DNS names of each of the cache nodes belonging to the cluster. The Amazon ElastiCache service ensures that the Configuration Endpoint always points to at least one such "target" cache node. A query to the target cache node then returns endpoints for all the nodes in the cluster. The AWS team has implemented this config command (which queries the target node) as an extension to the Memcached ASCII protocol. Since ElastiCache remains 100% Memcached-compatible, you can keep using your existing Memcached client libraries with new and existing clusters, but to take advantage of Auto Discovery you must use an Auto Discovery-capable client provided by AWS.

How does Auto Discovery (AD) work?

  • First, the web/app AD client resolves the configuration endpoint's DNS name. Since the configuration endpoint maintains CNAME entries for all of the cache nodes inside the cluster, the DNS name resolves to one of the nodes.
  • Second, the web/app AD client connects to that node and requests the configuration information for all of the other nodes. Since each node maintains configuration information for all of the nodes in the cluster, any node can pass this information to the client upon request.
  • Third, the web/app AD client receives the current list of cache node hostnames and IP addresses. The Auto Discovery library then connects asynchronously to all of the other nodes in the cache cluster.
  • Since cache nodes can be added and removed, consistent hashing (ketama) is built into the Auto Discovery client. To learn more about why consistent hashing is needed, refer to this article: http://harish11g.blogspot.in/2012/12/amazon-elasticache-memcached-consistent.html
  • Auto Discovery polls with a 1-minute frequency by default. Since cache nodes are not usually added or removed every few minutes or hours in production, this frequency is more than enough for most cases. Before reducing this polling interval, it is recommended to test how it impacts the web/app process and compute cycles.
  • Since there can be a gap of ~60 seconds between a cache node's removal and its detection by Auto Discovery polling, it is recommended to design web/app clients with exception handling and failover capability to the backend data store.
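The first two steps above can be sketched roughly as follows. The "config get cluster" command is the ASCII-protocol extension mentioned earlier, but the response parsing and the endpoint name in the comment are assumptions based on the documented response shape (a version line followed by space-separated host|ip|port entries), not an exact client implementation:

```python
import socket

def parse_cluster_config(payload):
    """Body of a 'config get cluster' reply: line 1 is the config version,
    line 2 is space-separated host|ip|port entries."""
    lines = payload.strip().splitlines()
    version = int(lines[0])
    nodes = [tuple(entry.split("|")) for entry in lines[1].split()]
    return version, nodes

def get_cluster_config(config_endpoint, port=11211):
    """Query the cluster's configuration endpoint for the full node list."""
    with socket.create_connection((config_endpoint, port), timeout=5) as sock:
        sock.sendall(b"config get cluster\r\n")
        data = b""
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    # Drop the "CONFIG cluster 0 <bytes>" header line and the END marker.
    body = data.decode().split("\r\n", 1)[1].rsplit("\r\nEND", 1)[0]
    return parse_cluster_config(body)

# Against a live cluster (hypothetical endpoint name):
#   get_cluster_config("mycluster.cfg.use1.cache.amazonaws.com")
# Parsing an example payload in the documented shape:
print(parse_cluster_config("12\nnode1|10.0.0.1|11211 node2|10.0.0.2|11211"))
```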

Benefits of using Auto Discovery 

  • It avoids manual intervention and downtime.
  • When you scale up a cache cluster, the new nodes register themselves with the configuration endpoint and with all of the other nodes. When you scale down, the departing nodes de-register themselves. In both cases, all of the other nodes in the cluster are updated with the latest cache node metadata.
  • Client programs poll the cluster at an adjustable interval (by default, once per minute). If there are any changes to the cluster configuration, such as new or deleted nodes, the client receives an updated list of metadata and acts accordingly (connecting to or disconnecting from them).

Getting started with Auto Discovery:

To get started, download the Amazon ElastiCache Cluster Client by clicking the “Download ElastiCache Cluster Client” link on the Amazon ElastiCache console. Before you can download, you must have an Amazon ElastiCache account; if you do not already have one, you can sign up from the Amazon ElastiCache detail page. After you download the client, you can begin setting up and activating your Amazon ElastiCache cluster by visiting the Amazon ElastiCache console. More details can be found here.


4. Economics behind choosing Cache nodes:

Amazon ElastiCache offers a variety of node types, and users often end up choosing the cheapest instance type and scaling out whenever cache memory needs increase. But cheap is not always best: it may not be optimal for your requirements in terms of performance or cost, and as a customer you may end up paying more because of bad choices.
In this article I have taken sample cache memory requirements and application characteristics and, based on them, explored the economics of choosing an Amazon ElastiCache node type that makes sense.
The following table illustrates the sample cost of running Amazon ElastiCache with 100 GB of cache memory. Multiply this cost by 10X if your cache memory requirements are in the TB range. The cost of building a 100 GB Amazon ElastiCache cluster using various cache node types is listed below:

Category 1 Cache Node types for Heavy Utilization
Network IO: These node types come with high network IO capacity. They can be used for cache-dependent sites with heavy cache utilization in terms of requests and data transfer.
Compute utilization: Amazon ElastiCache internally uses the Memcached engine. Memcached is written on the libevent model, which allows it to scale well across multiple cores and effectively deliver better throughput. For applications that need high concurrency and request throughput, it is recommended to plan capacity around cache node types with multiple cores. The cache nodes under category 1 have 4-8 cores, which are ideally suited to applications that utilize the cache heavily.
Elasticity: Any growing web application has elastic caching needs; cache nodes may be added or removed on a daily/weekly/monthly basis depending on those needs. This activity affects object remapping, node discovery, and the cache warming process. The number of existing cache nodes, the proportion of nodes added or removed, the cache node type, and the frequency of addition/removal are some of the parameters that need to be carefully considered and planned during elasticity. Consistent hashing and Auto Discovery are techniques to adopt to minimize the complexity of this action. In general, it is better to build a strategy combining the points below:
  • Proper base node type selection based on current and projected needs, consolidating nodes as the cache memory size grows
  • Distributing the data across multiple nodes (more is better than few)
  • Adding nodes in small proportions with good cache warming procedures

Category 2 Cache Node types for Moderate Utilization
Network IO: Same as category 1: high IO, but regulated by node type (size).
Compute utilization: Suitable for moderate workloads. Since cache node types in this category have only 2 cores, they are not suitable for heavily concurrent, compute-driven cache access workloads.
Elasticity: All points discussed for category 1 apply here.

Category 3 Node types for Low Utilization
Most of these are costlier, some perform poorly (network IO and compute), and they offer less value for money for bigger cache memory requirements.

Economics of choosing Cache Node Types:
From the table we can observe the following points about the economics of choosing a cache node type:
  • For a cache memory requirement of 100 GB, cache.m3.2xlarge, cache.m1.xlarge, cache.m2.2xlarge, and cache.m2.4xlarge are good candidates. Amazon ElastiCache clusters built with these node types are a very good package and offer better overall value in terms of elasticity, proportions, price, and performance.
  • Smaller cache node types (cache.m1.small) with a lower per-hour price are not necessarily cost efficient overall once your cache memory need grows (~100 GB or more).
  • High-CPU EC2 (cache.c1.xlarge) will be the costliest if cache memory needs grow into the GB/TB range. This cache node type is not a good candidate for most cache-size-heavy use cases. The only use case I can think of for cache.c1.xlarge is a ~32 GB cache memory requirement that is not elastic and will be heavily utilized; anything above 32 GB of cache memory will bleed cost on this node type.
  • Moderate-IO cache node types (like cache.m3.xlarge, cache.m1.medium, cache.m1.small) are costlier and less efficient compared to their high-IO peers. You end up paying more under heavy utilization if your cache cluster is composed of moderate-IO node types.
  • These points emphasize how important it is to choose the right cache node type for your cache memory size. Planning the right node type according to your current and future requirements helps you save costs. I also recommend consolidating your cache node types periodically until you reach a size of a few hundred GBs. Past 500 GB, building your cache clusters with high-memory cache node types is usually cost effective. Example: for a 100 GB cache, 2 x cache.m2.4xlarge are sufficient, but choosing this node type for this cache size is not ideal because of the distribution and churn problems (if you are going to expand the memory size frequently). On the other hand, if the cache memory size is, or will grow to, 1 TB, then building the cluster with 15 x cache.m2.4xlarge makes a lot of sense in terms of proportion and cost.
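The back-of-the-envelope math behind these observations can be sketched as follows. The per-node memory sizes and hourly prices below are hypothetical placeholders, not AWS's actual figures; substitute current values from the ElastiCache pricing page before drawing conclusions:

```python
import math

# HYPOTHETICAL memory sizes and hourly prices -- replace with real figures
# from the Amazon ElastiCache pricing page.
NODE_TYPES = {
    "cache.m1.small":   {"memory_gb": 1.3,  "price_hr": 0.022},
    "cache.m1.xlarge":  {"memory_gb": 14.6, "price_hr": 0.183},
    "cache.m2.4xlarge": {"memory_gb": 68.0, "price_hr": 0.545},
}

def monthly_cost(node_type, cache_gb, hours=730):
    """Nodes needed to reach cache_gb, and the resulting monthly bill."""
    spec = NODE_TYPES[node_type]
    nodes = math.ceil(cache_gb / spec["memory_gb"])
    return nodes, nodes * spec["price_hr"] * hours

for node_type in NODE_TYPES:
    nodes, cost = monthly_cost(node_type, 100)
    print(f"{node_type}: {nodes} nodes, ~${cost:,.0f}/month for 100 GB")
```

Even with placeholder prices, the shape of the result holds: many small nodes multiply the per-node overhead and hourly charges, while a few large high-memory nodes reach the same capacity more cheaply.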

Choosing the Cache Node types:

Category 1 cache node types (Heavy Utilization nodes) suit applications with:
  • Heavy cache dependency and heavy cache data usage
  • Distributed cache size: 100 GB to 1 TB and above
  • Varied message sizes
  • High concurrency of cache requests
  • Cache nodes added or removed on a daily/weekly/monthly basis
  • A need for better performance and value for money

Category 2 cache node types (Moderate Utilization nodes) suit applications with:
  • Moderate cache dependency and usage
  • Distributed cache size: 50-100 GB
  • Varied message sizes
  • Moderate concurrency requirements
  • Not very elastic needs
  • Moderate performance at a cost-efficient price

Category 3 cache node types (Low Utilization nodes) suit applications with:
  • Low cache dependency and usage
  • Distributed cache size: ~50 GB and above
  • Small message sizes
  • Low concurrency requirements
  • Not very elastic needs
  • Moderate performance, accepting $$$ leakage when memory needs grow

5. Memory Allocation and Eviction policies:

Amazon ElastiCache internally uses the Memcached 1.4.5 engine. In this section we will explore the memory allocation and eviction policies of Amazon ElastiCache through an understanding of Memcached internals. If you are a heavy and extensive user of Amazon ElastiCache, it is better to understand how the internals work in detail, to avoid memory wastage and maintain optimum overall performance.

How is it organized internally?

An Amazon ElastiCache node breaks the allocated memory into smaller parts called pages. Each page is 1 megabyte in size. Each page is then assigned to a slab class when necessary, and each slab class is in turn divided into chunks of a specific size. All the chunks in a given slab class have the same size.
Following diagram illustrates the above divisions:

If you are using Memcached on EC2 you can view the slab, chunk, and byte details with the following command:
./memcached -vv
slab class   1: chunk size        80 perslab   13107
slab class   2: chunk size       104 perslab   10082
slab class   3: chunk size       136 perslab    7710
slab class   4: chunk size       176 perslab    5957
slab class   5: chunk size       224 perslab    4681
slab class   6: chunk size       280 perslab    3744
slab class   7: chunk size       352 perslab    2978
slab class   8: chunk size       440 perslab    2383
slab class   9: chunk size       552 perslab    1899
slab class  10: chunk size       696 perslab    1506

From the output of the above command you can observe that there is a page with 80-byte chunks (slab class 1) and a page with 104-byte chunks (slab class 2). In slab class 1, each chunk is 80 bytes, and each page can contain 13,107 chunks (1 MB / 80 bytes). This continues all the way up the slab chain until 1 megabyte (with a growth factor of ~1.25). Note: There can be multiple pages assigned to each slab class, but as soon as a page is assigned to a slab class, the assignment is permanent.
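The slab progression printed by memcached -vv can be reproduced with a short sketch. The 8-byte alignment step is an assumption about how Memcached rounds chunk sizes, but with it the numbers match the table above:

```python
ALIGN = 8                # assumed chunk alignment used by memcached
PAGE_SIZE = 1024 * 1024  # each page is 1 MB

def slab_classes(initial_size=80, growth_factor=1.25, max_size=PAGE_SIZE):
    """Chunk-size ladder: each class ~growth_factor larger, 8-byte aligned."""
    sizes = []
    size = initial_size
    while size < max_size:
        sizes.append(size)
        size = int(size * growth_factor)
        size += -size % ALIGN  # round up to the alignment boundary
    return sizes

# Reproduce the first ten lines of the `memcached -vv` table above.
for i, chunk in enumerate(slab_classes()[:10], start=1):
    print(f"slab class {i:3d}: chunk size {chunk:9d} perslab {PAGE_SIZE // chunk:7d}")
```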
When you store items in Amazon ElastiCache, they are pushed into the slab class of the nearest fit. If your key + miscellaneous data + value is 70 bytes total, it will go into slab class 1, with an overhead loss of 10 bytes per item. If your data is 90 bytes total, it will go into slab class 2, with an overhead of 14 bytes. If your cache access pattern ends up putting 90% of your pages into slab class 2, there is less memory available for slab class 3. This simple model is followed by the Memcached engine to avoid memory fragmentation and to get better speed in the performance-versus-memory-usage tradeoff.
When a page is full (meaning all chunks in the page are filled) and another item needs to be stored, the engine fetches a new free page from the pool, assigns it to the appropriate slab class (depending on item size), partitions it into chunks, and uses the first available chunk to store the new item. Note: these 1 MB pages are assigned to slab classes on a first come, first served basis.
When there are no more pages left to assign to a slab class, the engine uses an LRU algorithm to evict items and reclaim memory. The following section looks in detail at how the eviction mechanism of Amazon ElastiCache works.

When are items evicted?

Memory for an item is not actively reclaimed in Amazon ElastiCache. The Memcached engine has no built-in background thread that explicitly expires items and reclaims chunks. For instance, if you store an item and it expires, it still sits in the LRU cache at its position until it falls to the end and is reused. However, if you fetch an expired item, Memcached will find that the item is expired and free its memory for reuse.
To explain the eviction mechanism in detail: items are evicted from Amazon ElastiCache if they have expired, or if the slab class is completely out of free chunks and there are no free pages to assign to it. When there are no free chunks and no free pages in the appropriate slab class, Amazon ElastiCache looks at the tail of the LRU to reclaim an item. It searches the last few items at the "end", identifies any that have already expired, and frees them for reuse. If it cannot find an expired item at the end, it "evicts" one that has not yet expired. As a result, you can end up with one slab class constantly evicting recently used items while another slab class holds a bunch of old items that just sit around. For example, when a 104-byte chunk is needed, a 104-byte chunk is evicted, even though there might be an even older 280-byte chunk available. This explains the internal behavior: each slab class has its own LRU and statistical counters and behaves like a separate cache in itself. In short, there is no global LRU, only a per-slab-class LRU.
Amazon ElastiCache exposes configurable parameters like slab_automove (0-2) and slab_reassign (0, 1). A note from the Memcached release notes: slab_automove 2 enables an ultra-aggressive page reassignment algorithm; on every eviction, it will try to move a slab page into that class. You should never run this in production unless you have a very, very good idea of what's going to happen. For most people who have spurious evictions everywhere, you'll end up mass-evicting random data and hurting your hit rate. It can be useful to enable momentarily in emergency situations, or if you have a data access pattern where evictions should never happen.

How to reduce memory overheads?

If you are running Amazon ElastiCache clusters of a few MB/GB, overheads will not matter much. But if you are running clusters spanning hundreds of GB to TBs, you will end up losing a lot of allocated memory to overhead if the chunk size and chunk size growth factor are not planned properly. (The chunk size growth factor controls the size of each successive chunk: each chunk is chunk_size_growth_factor times larger than the previous one.)
Since the Memcached engine of Amazon ElastiCache does not defragment memory, you need to plan for better packing and avoid overheads by keeping the chunk sizes close to your item sizes. This is not a one-time activity, but a periodic one aligned to your cache growth.
A simple approach that can be followed is:
• Identify the item size usage and cache access patterns periodically.
• Set the initial chunk_size and chunk_size_growth_factor accordingly in the AWS console or via the APIs for Amazon ElastiCache. (You can also set the growth factor with the -f option and the initial chunk size with the -s option if you run Memcached on EC2.)
• If item sizes are big and predictable, it is recommended to use bigger chunks and growth factors; if item sizes are varied, it is better to use a smaller initial size and growth factor. This will keep wastage minimal.
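To see why the growth factor matters at scale, the sketch below compares per-item wastage for two growth factors against a hypothetical sample of item sizes (the initial size, factors, and item sizes are all illustrative, and the 8-byte alignment is an assumption about Memcached's rounding):

```python
# Hypothetical item-size sample; in practice, gather real sizes from your
# application's cache usage (e.g. Memcached "stats sizes" output).
item_sizes = [70, 90, 150, 300, 310, 500, 900, 1500]

def slab_sizes(initial, factor, align=8, max_size=1024 * 1024):
    """Chunk-size ladder: each class ~factor larger, rounded up to 8 bytes."""
    sizes, size = [], initial
    while size < max_size:
        sizes.append(size)
        size = int(size * factor)
        size += -size % align
    return sizes

def wasted_bytes(items, chunks):
    # Each item lands in the smallest chunk that fits it; the rest is overhead.
    return sum(min(c for c in chunks if c >= i) - i for i in items)

for factor in (1.25, 1.10):
    waste = wasted_bytes(item_sizes, slab_sizes(48, factor))
    print(f"growth factor {factor}: {waste} bytes wasted across the sample")
```

For this sample, the smaller growth factor wastes considerably less memory per item, at the cost of more slab classes; multiplied across billions of items in a multi-TB cluster, that difference becomes significant.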

Disclaimer: The article is entirely based on the premise that Amazon ElastiCache uses memcached 1.4.5 as the engine. If AWS team has customized some sections of memcached engine, the author will not be liable for misinforming readers.
