In this article we will explore the implications of the elasticity that Amazon ElastiCache brings to an architecture. Amazon ElastiCache nodes currently use the Memcached engine, and they are not aware of the other Memcached nodes running inside the cache cluster. Since the cache nodes do not know how data is distributed among their peers, the balancing act has to be done by the clients. Cache clients usually use a simple hashing algorithm to PUT/GET values from the corresponding cache nodes.
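As a rough illustration of what such clients do internally, here is a minimal sketch of modulo-based key sharding. The node endpoints are hypothetical, md5 stands in for whatever hash the client actually uses, and real Memcached client libraries handle this for you:

import hashlib

# Hypothetical ElastiCache node endpoints, for illustration only.
NODES = ["cache-node-1:11211", "cache-node-2:11211"]

def node_for(key: str) -> str:
    """Pick a cache node with simple modulo hashing: hash(o) mod n."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

print(node_for("user:42:profile"))  # always the same node while NODES is static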
This works well while the cache cluster size is static, but for any growing web application a static cluster will not serve the purpose. Assume an online business decides to run a promotion during which nearly 6X more visitors are expected to hit the site. For any heavily cache-dependent site, memory needs will increase correspondingly during this period. The Amazon ElastiCache service understands this elastic behavior in online business and provides an easy mechanism, through the API or the console, to add or remove cache nodes in an existing cluster. Whenever your cache memory needs grow, you can simply add "N" new cache nodes to the Amazon ElastiCache cluster. But as an IT architect you need to understand that this elastic action comes with certain side effects in a distributed caching scenario with Memcached-based Amazon ElastiCache nodes: it may swamp your backend with requests if it is not dealt with properly. Let us understand these effects in detail:
Effect 1: Cold Cache: Amazon ElastiCache nodes using the Memcached engine are ephemeral in nature: the KV data is stored in memory and is not persisted to disk. Whenever you add new cache nodes you need to understand what percentage of capacity the increase represents, and choose a cache node scale-out strategy accordingly. Imagine you have 2 cache nodes, each of cache.m1.large capacity; if you decide to add 2 more cache nodes of the same type to the cluster, close to 50% of the cluster's capacity is now in a cold state, without proper cache warming. This action may bring undesirable consequences and swamp your backend with heavy requests until the new Amazon ElastiCache nodes are properly warmed. Whereas if you have 100 cache.m1.large nodes running in your cluster and you plan to add 5 more nodes, it will not have a big impact, provided the backend is designed to handle this small spike. Some best practices that can be followed here are:
- Plan for the addition of the nodes well in advance of the promotion, so that enough time is given for warming the cache nodes adequately.
- Add the cache nodes in proportions that your backend can optimally absorb without performance disruption. Example: instead of adding 2 cache nodes at once to a 2 x cache.m1.large cluster, adding them one by one, with enough time to warm each, adds only ~25% cold capacity at a time for your backend to cover (see the sketch after this list). For some advanced strategies using cache redundancy, maintenance windows etc. in AWS to address this situation, refer to this URL: http://harish11g.blogspot.in/2012/11/amazon-elasticache-memcached-ec2.html
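As an illustration of the one-by-one approach, here is a minimal sketch using the boto3 SDK. The cluster id, target size and warm-up window are hypothetical assumptions; the sketch simply raises NumCacheNodes one step at a time and waits between steps:

import time
import boto3

elasticache = boto3.client("elasticache", region_name="us-east-1")

CLUSTER_ID = "my-memcached-cluster"  # hypothetical cluster id
TARGET_NODES = 4                     # e.g. grow from 2 to 4, one node at a time
WARM_UP_SECONDS = 3600               # assumed warming window per node

def current_node_count() -> int:
    resp = elasticache.describe_cache_clusters(CacheClusterId=CLUSTER_ID)
    return resp["CacheClusters"][0]["NumCacheNodes"]

while current_node_count() < TARGET_NODES:
    elasticache.modify_cache_cluster(
        CacheClusterId=CLUSTER_ID,
        NumCacheNodes=current_node_count() + 1,  # add a single node
        ApplyImmediately=True,
    )
    # Wait until the cluster finishes modifying, then let the node warm.
    elasticache.get_waiter("cache_cluster_available").wait(CacheClusterId=CLUSTER_ID)
    time.sleep(WARM_UP_SECONDS)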
Effect 2: Object Remapping: The most common approach used by clients of Amazon ElastiCache nodes to distribute an object "o" among "n" cache nodes is to put object o in the cache node whose number is the result of hash(o) mod n. This approach is good for a static cache node scenario, but when you add or remove cache nodes, object "o" may need to be hashed to a new location every time "n" changes. This operation can thunder your backend with heavy load and cause undesirable consequences, as the short sketch below quantifies.
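A minimal sketch of why this hurts: under hash(o) mod n, growing a cluster from 3 to 4 nodes remaps roughly (n-1)/n, here around 75%, of the keys. The key names are synthetic and md5 stands in for the client's hash function:

import hashlib

def bucket(key: str, n: int) -> int:
    """Naive placement: hash(o) mod n."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"item:{i}" for i in range(100_000)]
moved = sum(1 for k in keys if bucket(k, 3) != bucket(k, 4))
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly 75%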
Ideally, when a cache node is added or removed, only a fair share of the objects would be remapped to other cache nodes in Amazon ElastiCache. This can be achieved by using "Consistent Hashing" in the cache node clients. Since Amazon ElastiCache nodes are not peer aware, no change is required on the nodes themselves; it is only in the cache clients that we need to apply this intelligent hashing approach. Consistent hashing was introduced by David Karger et al. in the paper "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web". The paper can be found here: http://www.akamai.com/dl/technical_publications/ConsistenHashingandRandomTreesDistributedCachingprotocolsforrelievingHotSpotsontheworldwideweb.pdf
Consistent hashing was first implemented for Memcached by the last.fm team as the "ketama" library. Refer to this URL: http://www.last.fm/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients
Now let us explore what consistent hashing is and how it helps. The idea behind the consistent hashing algorithm is to hash both the objects and the cache nodes using the same hash function, and to map objects to the same cache node as consistently as possible. Consistent hashing uses a mechanism that acts like a clock: the hash function maps objects and cache nodes onto a number range, and the values of that range wrap around like a circle, which is why we call this circle a continuum. Imagine in the picture below a circle with objects 1, 2, 3, 4 and cache nodes A, B and C. To decide which cache node an object goes to, you move clockwise around the circle until you find a cache node. So in the diagram below, objects 1 and 4 belong to cache node A, object 2 belongs to cache node B and object 3 belongs to cache node C.
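To make the continuum concrete, here is a minimal consistent-hash ring sketch, a toy stand-in for what ketama-style clients implement internally; the node names are hypothetical and md5 is used for illustration:

import bisect
import hashlib

def _hash(value: str) -> int:
    """Map a string onto the continuum."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Each node sits at one point on the continuum.
        self._points = sorted((_hash(n), n) for n in nodes)

    def add(self, node):
        bisect.insort(self._points, (_hash(node), node))

    def remove(self, node):
        self._points.remove((_hash(node), node))

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the next node."""
        points = [p for p, _ in self._points]
        idx = bisect.bisect(points, _hash(key)) % len(self._points)
        return self._points[idx][1]

ring = HashRing(["node-A", "node-B", "node-C"])
print(ring.node_for("object-1"))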
Consider what happens if cache node C is removed from the cache cluster: object 3 will now be remapped to cache node A, and all the other object mappings remain unchanged. In the same way, if we add another cache node D at the position marked in the diagram below, it will take objects 3 and 4, leaving only object 1 belonging to A.
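Continuing the HashRing sketch above, we can measure how few keys actually move when a node leaves the ring; the fraction is roughly the departing node's share of the continuum, which varies with where the nodes happen to land:

keys = [f"object-{i}" for i in range(100_000)]

ring = HashRing(["node-A", "node-B", "node-C"])
before = {k: ring.node_for(k) for k in keys}

ring.remove("node-C")  # only node C's keys should move
moved = sum(1 for k in keys if ring.node_for(k) != before[k])
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly node C's share only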
The principal advantage of consistent hashing is that the departure or arrival of a node only affects its immediate neighbors, while the other nodes remain unaffected. This approach reduces the remapping of objects between the cache nodes and thereby significantly decreases the swamping of backend servers in the event of cache elasticity. A consistent hashing implementation is available in most of the popular Amazon ElastiCache-Memcached clients we use every day. But there is a saying that "everything comes with a consequence": since the placement of nodes on the continuum is effectively random, it is possible to end up with a very non-uniform distribution of data and load between cache nodes inside/across a cluster.
To address this issue, more advanced KV systems like Membase and Amazon Dynamo use a variant of consistent hashing. Instead of mapping a node to a single point on the circle, each node gets assigned to multiple points on the ring (Amazon Dynamo uses the concept of "virtual nodes"). A virtual node looks like a single node in the system, but each physical node can be responsible for more than one virtual node. Effectively, when a new node is added to the system, it is assigned multiple positions (henceforth, "tokens") on the ring. To know more about this approach, read the Amazon Dynamo paper. I hope the Amazon ElastiCache team implements this concept in their distributed caching service in future as well, because it will help us use nodes optimally.
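A minimal sketch of the virtual-node variant, extending the ring idea above: each physical node is hashed onto the continuum many times, which smooths out the distribution. The replica count of 100 is an arbitrary illustrative choice:

import bisect
import hashlib
from collections import Counter

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class VirtualNodeRing:
    def __init__(self, nodes, replicas=100):
        # Each physical node is placed on the ring `replicas` times.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )

    def node_for(self, key: str) -> str:
        points = [p for p, _ in self._points]
        idx = bisect.bisect(points, _hash(key)) % len(self._points)
        return self._points[idx][1]

ring = VirtualNodeRing(["node-A", "node-B", "node-C"])
print(Counter(ring.node_for(f"object-{i}") for i in range(30_000)))
# With virtual nodes the per-node counts come out far closer to uniform.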