Friday, August 24, 2012

HAProxy: Amazon EC2 Instance Types and AMI Types

Before choosing the optimal Amazon EC2 instance type for the HAProxy layer in AWS, we need to understand at least the parameters involved. They are:

On HAProxy Side
  • HAProxy follows an event-driven, single-process model. Event-driven models handle memory limits, system scheduler limits, and lock contention more efficiently than Apache-like models because all tasks run in user space, which allows finer resource and time management.
  • HAProxy can process several hundred tasks per millisecond, and memory usage is usually on the order of a few kilobytes per session, compared with megabytes per process in Apache-like models.
  • HAProxy is also optimized to use very little CPU under both moderate and high loads. The trade-off is that event-driven programs generally do not scale well on multi-processor systems.
  • From the above, it is evident that HAProxy is by design not memory intensive and does not make use of additional cores.
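
To put the memory point in rough numbers, here is a back-of-the-envelope sketch in Python. The post only says "a few kilobytes per session" and "megabytes per process", so the 16 KB and 2 MB figures below are illustrative assumptions, not measurements.

```python
# Illustrative figures only: the exact per-session and per-process sizes
# below are assumptions chosen to match "a few kilobytes" and "megabytes".
KB = 1024
MB = 1024 * KB

def estimate_memory_bytes(concurrent_sessions, bytes_per_session):
    """Return the total memory needed to hold the given number of sessions."""
    return concurrent_sessions * bytes_per_session

sessions = 10_000
event_driven = estimate_memory_bytes(sessions, 16 * KB)   # assumed 16 KB per session
process_based = estimate_memory_bytes(sessions, 2 * MB)   # assumed 2 MB per process

print(f"event-driven:  {event_driven / MB:.0f} MB")
print(f"process-based: {process_based / MB:.0f} MB")
```

Even with generous per-session figures, the event-driven model stays within the memory of a small instance, which is why memory is rarely the deciding factor when sizing HAProxy.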

On Amazon EC2 Side
  • Amazon Web Services offers various EC2 instance types, on both 32-bit and 64-bit platforms. Because of this variety, new users are often unsure which instance type suits the HAProxy tier.
  • The virtual network interface of an Amazon EC2 instance peaks at roughly 100K PPS (packets per second) in total (input + output). Earlier load tests indicated a maximum of about 125K PPS, with an average of ~120K being more typical for m1.large. In such scenarios, other instances running on the same physical hardware can be affected.
  • A scale-out strategy is better than scale-up for many use cases where HAProxy is the load balancer in Amazon EC2. In light of the point above, scaling out also reduces the risk of being affected by noisy neighbors.
  • EC2 instances can be launched from EBS-backed AMIs or S3-backed AMIs. Which is ideal for an HAProxy tier?
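
The PPS ceiling and the scale-out advice can be combined into a simple sizing estimate. The sketch below is a hypothetical helper, not an AWS or HAProxy API; the 50% headroom default is an assumption meant to keep each instance well below its ceiling.

```python
import math

def instances_needed(expected_pps, per_instance_pps_ceiling, headroom=0.5):
    """Return how many instances keep each one at or below
    `headroom` (a fraction) of its packets-per-second ceiling."""
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    usable_pps = per_instance_pps_ceiling * headroom
    return max(1, math.ceil(expected_pps / usable_pps))

# Assumed workload of 250K PPS against the ~100K PPS ceiling quoted above.
print(instances_needed(250_000, 100_000))
```

For high availability you would normally run at least two instances even when the estimate returns one.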

On Application Side
  • Does the application use HTTP or HTTPS?
  • What concurrency is expected at the application's load balancer tier?

Choosing an EC2 Instance Type

Scenario 1: If the application expects a concurrency of around 4-10K over HTTP, multiple (2-3) m1.large EC2 instances launched from S3-backed AMIs are ideal. Sync logs to S3. A 64-bit platform is preferred. m1.large offers high I/O performance compared to m1.small or c1.medium.

Scenario 2: If the application expects around 100 requests/sec over HTTP, a single c1.medium EC2 instance launched from an S3-backed AMI is ideal. Sync logs to S3. Either a 32-bit or a 64-bit platform is fine. Since the traffic is not heavy, c1.medium might be sufficient. t1.micro suits only infrequent, spiky traffic patterns and is definitely not recommended for this scenario.

Scenario 3: If the application expects around 4-10K requests/sec over HTTPS, use multiple m1.xlarge EC2 instances launched from S3-backed AMIs. SSL termination needs more memory and CPU. Use “taskset” to assign CPU core affinity to the HAProxy and Stud processes. A 64-bit platform is mandatory for such use cases. m1.xlarge offers high I/O performance compared to smaller instance types.
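
As a minimal sketch of the core-affinity idea in Scenario 3, the Python helper below builds `taskset -c` command lines. The core layout (HAProxy on core 0, Stud on cores 1-3 of a 4-core instance) and the config path are assumptions for illustration, and Stud's own options are left out.

```python
import shlex

def taskset_command(cpu_list, command):
    """Build an argv list that runs `command` pinned to the given
    CPU cores using `taskset -c <cpu-list> <command...>`."""
    cpus = ",".join(str(cpu) for cpu in cpu_list)
    return ["taskset", "-c", cpus] + list(command)

# Assumed layout: HAProxy (single process) on core 0,
# Stud (SSL termination) on the remaining cores.
haproxy_cmd = taskset_command([0], ["haproxy", "-f", "/etc/haproxy/haproxy.cfg"])
stud_cmd = taskset_command([1, 2, 3], ["stud"])  # stud's own options omitted

print(shlex.join(haproxy_cmd))
print(shlex.join(stud_cmd))
```

The resulting lists can be passed to `subprocess.Popen` or pasted into an init script. Keeping HAProxy on its own core stops the CPU-heavy SSL work from competing with the proxy's event loop.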

Choosing an EC2 AMI Type

Amazon Web Services has two types of AMIs: S3-backed and EBS-backed.
EBS-backed AMIs come with an EBS volume and are faster to launch (~20 secs), whereas S3-backed AMIs are a little slower to launch (~150 secs). EC2 instances launched from S3-backed AMIs come only with ephemeral disks and have no EBS volume; we need to attach one if required. EBS-backed AMIs cost more than S3-backed AMIs because of the built-in EBS volume.
When constructing a load balancing tier using HAProxy in AWS, there are multiple approaches we can adopt:

Option 1: Install HAProxy on an S3-backed AMI, re-bundle it, and launch instances from the new AMI. HAProxy and its logs reside on the ephemeral disk of the EC2 instance. The logs have to be synced periodically to S3. Log analysis can be done using AWS EMR or other programs. Since HAProxy has no dependency on EBS, we completely avoid EBS performance issues and EBS-dependent outages.

Option 2: Install HAProxy on an S3-backed AMI and attach an EBS volume for storing the logs. The logs can be kept on EBS as long as we want. Logs can also be synced periodically to S3 or a Splunk server. Log analysis can be done using AWS EMR from S3, Splunk, or other programs.

Option 3: Some customers prefer a single Linux variant (like Red Hat) across all tiers of their stack for easy maintenance. Note: some Linux variants are currently available only as EBS-backed AMIs in AWS. Also, some customers do not have the admin resources to manage their systems and logs efficiently; their HAProxy logs sit on EBS disks for months and are analyzed only when needed. In both cases, an EBS-backed AMI is preferable for HAProxy.

Other general recommendations:
  • Since HAProxy cannot be clustered, achieve high availability by adding multiple HAProxy EC2 instances under Route 53 using weighted or round-robin DNS records.
  • HAProxy can be scaled out automatically using custom scripts, both inside and outside Amazon VPC.
  • Depending on load volatility (for example, seasonal loads), the EC2 instance capacity for HAProxy should be revisited to save costs in AWS.
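
To illustrate how weighted DNS records spread clients across several HAProxy instances, here is a small Python simulation. It does not call Route 53; the endpoint addresses (documentation-range IPs) and weights are hypothetical, and the model simply picks each endpoint with probability weight / sum of weights.

```python
import random
from collections import Counter

# Hypothetical HAProxy endpoints and their DNS record weights.
HAPROXY_WEIGHTS = {"192.0.2.10": 3, "192.0.2.11": 1}

def pick_endpoint(weights, rng=random):
    """Pick one endpoint with probability weight / sum(weights)."""
    endpoints = list(weights)
    return rng.choices(endpoints, weights=[weights[e] for e in endpoints], k=1)[0]

def simulate(weights, lookups, seed=0):
    """Count how many of `lookups` simulated DNS answers land on each endpoint."""
    rng = random.Random(seed)
    return Counter(pick_endpoint(weights, rng) for _ in range(lookups))

print(simulate(HAPROXY_WEIGHTS, 10_000))
```

Equal weights give plain round-robin behaviour on average; unequal weights are handy for sending a small share of traffic to a newly launched HAProxy instance.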


jeff said...

Why wouldn't you just use their new 'internal ELB'?

Unknown said...

In recent tests, I found the PPS limit to depend on the instance type. m1.small and c1.xlarge were in the 10 kpps range, while I could achieve 120 kpps on m2.2xlarge. Tests were done with iperf in UDP mode with 128-byte packets, between 2 instances of the same type.

Anonymous said...

Internal ELB only works if you are in a VPC.

Unknown said...

Moreover the ELB is brutal with existing connections when scaling down:

With HAProxy, you can gracefully remove servers, as it cleanly drains existing connections.
