While designing highly scalable systems, the load balancing tier becomes an integral part of any architecture. We have captured some of our prior experiences working with Amazon ELB in this article as the points detailed below. Some of the points mentioned here will be encountered only by advanced users in complex use cases, but if you or your team take note of them, it might shorten your effort while debugging a problem or designing a solution, and spare you the effort cycle and pain our team went through.
In AWS, there is a wide variety of solution choices for the load balancing layer, such as Amazon Elastic Load Balancing (ELB) and EC2 AMIs like HAProxy, Nginx, Zeus, and Citrix NetScaler. In this article we are going to dissect our experience with the Amazon ELB layer as a series of points which you will not frequently encounter in Amazon documents or the blogosphere.
To know more about configuring Amazon ELB in 4 easy steps, refer to the article:
Currently there are 18 points in this article and I plan to add some more in the coming days. So if you are an advanced user of Amazon ELB, please watch this article closely.
Some of the points are:
Point 1) Algorithms supported by Amazon ELB
Currently Amazon ELB supports only the Round Robin (RR) and sticky session algorithms.
The Round Robin algorithm can be used for load balancing traffic between:
- Web/App EC2 instances which are designed stateless
- Web/App EC2 instances which synchronize state between themselves
- Web/App EC2 instances which synchronize state using common data stores like Memcached, ElastiCache, a database, etc.
- Web/App EC2 instances which are designed to be stateful
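The round robin behavior described above can be sketched as a minimal simulation (the backend names are hypothetical; ELB's real implementation is internal to AWS):

```python
from itertools import cycle

def round_robin_dispatch(backends, requests):
    """Assign each request to the next backend in strict rotation,
    the way a round robin load balancer distributes traffic."""
    rotation = cycle(backends)
    return [(req, next(rotation)) for req in requests]

# Three hypothetical stateless Web/App EC2 instances behind the balancer.
backends = ["ec2-a", "ec2-b", "ec2-c"]
assignments = round_robin_dispatch(backends, ["req1", "req2", "req3", "req4"])
# The fourth request wraps around to the first backend again.
```

Note that this even rotation is only safe when the instances are stateless or share state, as in the bullets above; otherwise sticky sessions are needed.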
Point 2) Amazon ELB is not a PAGE CACHE
Amazon ELB is just a load balancer and is not to be confused with a page cache server or web accelerator. Web accelerators like Varnish can cache pages, static assets, etc. and also do RR load balancing to backend EC2 servers. Amazon ELB is designed to do just load balancing, efficiently and elastically. If you need a page accelerator plus LB, you can use Varnish or NetScaler in your LB tier (refer to the article on Varnish vs NetScaler). Amazon ELB can be used with Amazon CloudFront to deliver static assets, and dynamic assets that can be page cached, at the edge location itself to reduce latency for the above use cases.
Point 3) Amazon ELB can be pre-warmed on request
Amazon ELB can be pre-warmed by raising a request with the AWS Support team. The Amazon team will pre-warm the load balancers in the ELB tier to handle the sudden load/flash traffic. This is advisable for scenarios like quarterly sales, launch campaigns, promotions, etc. which follow a flash traffic pattern. The AWS team will require details from your team such as the estimated requests per second, average request size in bytes, average response size in bytes, what percentage of traffic is SSL vs non-SSL, and whether HTTP/1.1 keep-alive is enabled. Once provided, pre-warming will be activated by them. Amazon ELB pre-warming cannot be done on an hourly/daily basis (as far as I know). It would be a cool feature if the Amazon team offered ELB pre-warming as a configurable option in the AWS console (like the Amazon DynamoDB console).
Point 4) Amazon ELB is not designed for sudden load spikes /Flash traffic
Amazon ELB is designed to handle a virtually unlimited number of concurrent requests per second under a "gradually increasing" load pattern. It is not designed to handle a heavy sudden spike of load or flash traffic. For example: imagine an e-commerce website whose traffic increases gradually to thousands of concurrent requests/sec over hours; Amazon ELB can easily handle this traffic pattern. According to a RightScale benchmark, Amazon ELB was easily able to handle 20K+ requests/sec in such patterns. In contrast, for use cases like a mass online exam, a GILT-style load pattern, or a 3-hour sales/launch campaign site expecting a sudden spike of 20K+ concurrent requests/sec within a few minutes, Amazon ELB will struggle to handle the load. If this sudden spike pattern is not a frequent occurrence, we can pre-warm the ELB; otherwise we need to look for alternative load balancers in the AWS infrastructure.
For a comparison analysis of HAProxy vs Amazon ELB, refer to the article:
Point 5) Protocols and ports supported by Amazon ELB
Currently Amazon ELB supports only the following protocols: HTTP, HTTPS (secure HTTP), SSL (secure TCP) and TCP. ELB supports load balancing on the following TCP ports: 25, 80, 443, and 1024-65535. If RTMP or an HTTP streaming protocol is needed, use Amazon CloudFront CDN in your architecture.
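The allowed port ranges above can be captured in a small validation helper (the function name is mine, not an AWS API) for sanity-checking listener configuration before applying it:

```python
# Ports ELB accepts for listeners: 25, 80, 443, and the 1024-65535 range.
ALLOWED_FIXED_PORTS = {25, 80, 443}

def is_valid_elb_listener_port(port):
    """Return True if Amazon ELB will accept a listener on this TCP port."""
    return port in ALLOWED_FIXED_PORTS or 1024 <= port <= 65535

assert is_valid_elb_listener_port(443)      # HTTPS
assert is_valid_elb_listener_port(8080)     # in the high range
assert not is_valid_elb_listener_port(22)   # SSH port is not allowed
```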
Point 6) Amazon ELB times out idle connections at 60 seconds
Amazon ELB currently times out persistent socket connections at 60 seconds if they are kept idle. This will be a problem for use cases which generate large files (PDFs, reports, etc.) at the backend EC2, send them back as the response, and keep the connection idle during the entire generation process. To avoid this you will have to send something on the socket every 40 or so seconds to keep the connection active in Amazon ELB. Note: I have heard this value can be extended after explaining the use case to the AWS support team.
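The "send something every 40 seconds" workaround can be sketched as a background heartbeat thread (a local socket pair and a shortened interval are used here so the sketch is runnable; in practice the payload must be a byte your protocol can tolerate):

```python
import socket
import threading
import time

def start_keepalive(sock, interval=40.0, payload=b" "):
    """Periodically send a harmless byte so an idle-timeout proxy
    (ELB drops idle connections at 60s) keeps the connection open.
    Returns a function that stops the heartbeat."""
    stop = threading.Event()
    def loop():
        while not stop.wait(interval):
            try:
                sock.sendall(payload)
            except OSError:
                break  # connection closed; stop heartbeating
    threading.Thread(target=loop, daemon=True).start()
    return stop.set

# Demonstration with a local socket pair and a 50ms interval.
a, b = socket.socketpair()
cancel = start_keepalive(a, interval=0.05)
time.sleep(0.2)          # the connection sits "idle" from the app's view
cancel()
received = b.recv(1024)  # heartbeat bytes arrived in the meantime
```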
Point 7) Amazon ELB does not provide a permanent or fixed IP for its load balancers
Currently Amazon ELB does not provide fixed or permanent IP addresses for the load balancing instances launched in its tier. This will be a bottleneck for enterprises which are required to whitelist their load balancer IPs in external firewalls/gateways. For such use cases, we can currently use HAProxy, Nginx, or NetScaler on EC2 with attached Elastic IPs as load balancers in the AWS infrastructure.
Designing High Availability @ HAProxy / ELB Layer
Point 8) Amazon ELB cannot do Multi AWS Region Load Balancing
Amazon ELB can be used to load balance:
- Multiple EC2 instances launched inside a single Amazon Availability Zone
- Multiple EC2 instances launched inside multiple Availability Zones within a single region
It cannot load balance across multiple AWS regions; for that, DNS-based or geo-distributed load balancing with Amazon Route 53 is needed.
To know more about DNS Load Balancing :
To know more about Geo Distributed Load Balancing using Amazon Route 53 :
Point 9) Amazon ELB sticks request when traffic is generated from Single IP
This point comes as a surprise to many users of Amazon ELB. Amazon ELB behaves a little strangely when incoming traffic originates from a single IP or a specific IP range: it does not do round robin efficiently and instead sticks requests. Under such conditions, Amazon ELB starts favoring a single EC2 instance, or EC2 instances in a single Availability Zone, even in Multi-AZ deployments. For example: suppose you have application A (a customer company) and application B, and application B is deployed inside the AWS infrastructure with an ELB front end. If all the traffic generated by application A (a single host) is sent to application B in AWS, the ELB of application B will not efficiently round robin the traffic to the Web/App EC2 instances deployed under it. This is because the entire incoming traffic from application A arrives from a single firewall/NAT or a specific IP range, and ELB will unevenly stick the requests to a single EC2 instance or to EC2 instances in a single AZ.
Note: Users encounter this usually during load test, so it is ideal to load test AWS Infra from multiple distributed agents.
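One contributing factor can be illustrated with a toy simulation: if a balancer picks a backend per *connection* rather than per request (an assumption of this sketch, not an official description of ELB internals), then a single load-test host reusing one keep-alive connection sticks to one backend, while distributed agents spread out:

```python
from itertools import cycle

class ConnectionLevelBalancer:
    """Toy balancer that chooses a backend per connection, not per
    request; requests reusing a connection stick to one backend."""
    def __init__(self, backends):
        self._rotation = cycle(backends)
        self._conn_backend = {}

    def route(self, conn_id):
        if conn_id not in self._conn_backend:
            self._conn_backend[conn_id] = next(self._rotation)
        return self._conn_backend[conn_id]

lb = ConnectionLevelBalancer(["ec2-a", "ec2-b", "ec2-c"])

# A single load-test agent reusing one keep-alive connection:
single_agent = {lb.route("agent1-conn") for _ in range(100)}
# All 100 requests land on the same backend.

# Distributed agents, each opening its own connection, spread out:
spread = {lb.route(f"agent{i}-conn") for i in range(1, 7)}
```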
Point 10) Overly long load balancer CNAMEs cause issues in some firewalls/ISPs
Some ISPs do not allow Amazon ELB CNAMEs that exceed 32 characters, and some firewall versions/models (like Cisco PIX) will not allow longer CNAMEs; in such cases try to use a shorter name.
Point 11) Amazon ELB cannot Load Balance based on URL patterns
Amazon ELB cannot load balance based on URL patterns the way other reverse proxies can. For example, Amazon ELB cannot direct and load balance requests between the URLs www.xyz.com/URL1 and www.xyz.com/URL2. Currently, for such use cases, you can use HAProxy on EC2.
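For clarity, this is the kind of routing decision ELB cannot make, sketched as a longest-prefix lookup (the pool names are hypothetical; HAProxy expresses the same idea with `acl` path matches and `use_backend` rules):

```python
def route_by_url(path, pools, default_pool):
    """Pick a backend pool by longest matching URL prefix --
    the URL-pattern routing ELB lacks but reverse proxies offer."""
    match = ""
    for prefix in pools:
        if path.startswith(prefix) and len(prefix) > len(match):
            match = prefix
    return pools[match] if match else default_pool

# Hypothetical pools behind www.xyz.com:
pools = {"/URL1": ["app1-ec2-a", "app1-ec2-b"],
         "/URL2": ["app2-ec2-a"]}
chosen = route_by_url("/URL1/page", pools, ["default-ec2"])
# Requests under /URL1 go to app1's pool, /URL2 to app2's pool,
# and anything else falls through to the default pool.
```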
Point 12) Amazon ELB can easily support more than 20K+ Concurrent reqs/sec
Amazon ELB is designed to handle a virtually unlimited number of concurrent requests per second. ELB is inherently scalable and can elastically increase/decrease its capacity depending upon the traffic. According to a benchmark done by RightScale, Amazon ELB was easily able to scale out and handle 20K or more concurrent requests/sec. Refer URL: http://blog.rightscale.com/2010/04/01/benchmarking-load-balancers-in-the-cloud/
Point 13) Amazon ELB does not provide logs
Amazon ELB currently does not provide access to its log files for analysis. Because we do not have access to the ELB logs, we cannot debug load balancing problems, analyze traffic and access patterns, or categorize bots vs visitors. This will also be a bottleneck for organizations which have strong audit/compliance requirements to be met at all layers of their infrastructure. Amazon ELB could generate the logs and put them in Amazon S3 buckets (a feature request to the Amazon ELB product team).
Point 14) Monitoring Amazon ELB
Amazon ELB is an AWS building block and it does not currently provide access to its log or stats files for monitoring. Secondly, we cannot get full access to the load balancers launched inside the ELB tier to install monitoring agents on them. This closed model of ELB makes us rely only on CloudWatch metrics for monitoring. Refer to this URL for the ELB metrics that can currently be monitored: http://harish11g.blogspot.in/2012/02/cloudwatch-elastic-load-balancing.html
Point 15) Amazon ELB and Compliance requirements
SSL termination can be done at 2 levels using Amazon ELB in your application architecture. They are:
- SSL termination at the Amazon ELB tier, which means the connection is encrypted between the client (browser, etc.) and Amazon ELB, but the connection between the ELB and the Web/App EC2 is in the clear. This configuration may not be acceptable in strictly secure environments and will not pass compliance requirements.
- SSL termination at the backend with end-to-end encryption, which means the connection is encrypted between the client and Amazon ELB, and the connection between the ELB and the Web/App EC2 backend is also encrypted. This is the recommended ELB configuration for meeting compliance requirements at the LB level.
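The two modes above map to different listener configurations, sketched here as plain data (the field names follow the ELB listener model loosely and are illustrative, not an exact API):

```python
# Mode 1: SSL terminated at the ELB -- encrypted client->ELB, clear ELB->backend.
elb_terminated = {"protocol": "HTTPS", "load_balancer_port": 443,
                  "instance_protocol": "HTTP", "instance_port": 80}

# Mode 2: end-to-end encryption -- the ELB re-encrypts to the backend as well.
end_to_end = {"protocol": "HTTPS", "load_balancer_port": 443,
              "instance_protocol": "HTTPS", "instance_port": 443}

def backend_leg_encrypted(listener):
    """True when the ELB-to-backend leg is also encrypted."""
    return listener["instance_protocol"] in ("HTTPS", "SSL")

assert not backend_leg_encrypted(elb_terminated)  # fails strict compliance
assert backend_leg_encrypted(end_to_end)          # compliant end to end
```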
- Important ELB-SSL Reference URLs
- Refer this URL to understand how to configure SSL Offloading in Amazon ELB : http://harish11g.blogspot.in/2012/03/ssl-offloading-elastic-load-balancing.html
- How Amazon ELB SSL support/offloading helps cloud admins: http://harish11g.blogspot.in/2010/10/amazon-elastic-load-balancing-support.html
Point 16) Amazon ELB IP addresses ending in .255 can cause routing issues
Sometimes ELB assigns its load balancers an IP address ending in X.X.X.255. Though this is technically fine, certain networks will not properly route to an IP address ending in .255. Unfortunately, it is currently not possible to exclude an IP address ending in .255 from ELB. In such circumstances, it is possible that some requests from certain users may face issues. Keep this in mind when you are debugging ELB for missing requests.
Point 17) Amazon ELB is an inherently fault tolerant and scalable service
The Elastic Load Balancer does not cap the number of connections that it can attempt to establish with the load balanced Amazon EC2 instances. We can expect this number to scale with the number of concurrent HTTP, HTTPS, or SSL requests, or the number of concurrent TCP connections, that the Elastic Load Balancer receives. Since multiple load balancers are launched in the ELB tier, it is inherently fault tolerant as well. If you need a scalable and elastic LB layer, then ELB comes highly recommended. Amazon ELB can be deployed to support the following HA architectures in AWS: http://harish11g.blogspot.in/2012/02/elastic-load-balancing-aws-deployment.html
Point 18) Amazon ELB + Amazon AutoScaling : No graceful connection termination
Amazon ELB can be configured to work seamlessly with Amazon Auto Scaling and Amazon CloudWatch. New EC2 instances launched by Auto Scaling are added to the ELB for load balancing automatically, and whenever load drops, existing EC2 instances can be removed from the ELB by Auto Scaling. Both Auto Scaling and ELB use CloudWatch monitoring to enable this functionality. The important point to remember while using this kind of integration is that Amazon Auto Scaling does not gracefully (without interruption to existing connections) remove Web/App EC2 instances from Amazon ELB. The connections are instantly dropped when the Web/App EC2 instance is removed, and no grace period is given by ELB or Auto Scaling. This behavior can cause dozens or hundreds of users to see error pages while using the application when such an event occurs in the backend infrastructure.
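The missing behavior is what is usually called connection draining; the difference between abrupt removal and a drain step can be illustrated with a pure simulation (this is not an ELB or Auto Scaling API, just a model of the two outcomes):

```python
def remove_instance(in_flight, drain=False):
    """Simulate removing a backend that has `in_flight` open connections.
    Without draining (ELB + Auto Scaling behavior described above), every
    open connection is dropped; with draining, all finish first.
    Returns (connections_finished, connections_dropped)."""
    if drain:
        return list(in_flight), []   # existing connections complete
    return [], list(in_flight)       # users on these connections see errors

# Abrupt removal, as Auto Scaling does it: all three users get error pages.
done, errors = remove_instance(["conn1", "conn2", "conn3"])

# With a hypothetical drain step, no connection is interrupted.
drained_done, drained_errors = remove_instance(["conn1", "conn2"], drain=True)
```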
To know more about Amazon Auto Scaling :
Harish's presentation @ AWS Summit on Amazon Auto Scaling
How Auto Scaling can save costs ?
Other Load Balancing Articles