Cloud, Big Data and Mobile: Deeper Health Checks and Problems in Load Balancing in AWS

Sunday, August 18, 2013

Deeper Health Checks and Problems in Load Balancing in AWS

Health checks are one of the essential mechanisms that help keep an N-tier system highly available. Usually a simple script or program is deployed on the Web/App server, and the health check component of the load balancer is configured to call it frequently over a lightweight protocol. Based on the response, the load balancer decides the status of each Web/App server and directs requests only to the healthy ones. This is the usual mechanism in popular load balancers used on the AWS cloud, such as Amazon ELB, NetScaler, HAProxy and Nginx. It sounds simple and straightforward, but some of the customers I have consulted for follow a much deeper health check mechanism, and it can cause problems when migrated to the AWS cloud.
Let us explore this case in detail:

What is the architecture?
A simple multi-tiered architecture: a load balancer is deployed at the front, the Web/App server hosts the health script/program, and the database is MySQL deployed in Master + Slave mode.

What is a deeper health check?
The script/program deployed on the Web/App server is a little more intelligent: when the load balancer calls it, it performs simple operations and also checks the status of the database. So a response from the health check script/program tells the load balancer tier whether both the DB and the Web/App server are healthy.
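To make the idea concrete, here is a minimal sketch of what such a deeper health script might do. The function name and the two probe callables are hypothetical placeholders, not taken from any customer system; a real script would run something like a trivial query against MySQL in place of the database probe.

```python
from typing import Callable


def deep_health_check(check_app: Callable[[], bool],
                      check_db: Callable[[], bool]) -> int:
    """Return an HTTP status code the way a deeper health script might.

    Both probes are hypothetical stand-ins: check_app would exercise the
    Web/App process, check_db would touch the database.
    """
    try:
        if check_app() and check_db():
            return 200  # both tiers answered, report healthy
    except Exception:
        # A timeout or connection error in either probe counts as a failure.
        pass
    return 503  # the load balancer sees the whole server as unhealthy


def db_times_out() -> bool:
    # Simulates the database probe timing out.
    raise TimeoutError("database did not answer")


print(deep_health_check(lambda: True, lambda: True))
print(deep_health_check(lambda: True, db_times_out))
```

Note that the second call reports a failure even though the App probe itself succeeded, which is exactly the coupling this post is about.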

What is the problem scenario?
Imagine that when migrating this infrastructure to AWS you adopted the standard architecture pattern, consisting of:

Amazon ELB is used as the Load balancer
Web/App Server in auto scaling mode
MySQL moved to Amazon RDS Multi-AZ with Read Replicas (RR)

Now let us explore this problem in detail:

Imagine either of the following conditions in production: the network between the database and the Web/App tier goes down intermittently for a few minutes, or RDS MySQL is promoting the hot standby to be the new master. In such scenarios the health check actually times out at the database level, yet because of the deeper health check the load balancer marks even healthy App servers as unhealthy. This is especially bad in Auto Scaled setups: Amazon ELB marks the Web/App EC2 instances as unhealthy, and Amazon Auto Scaling keeps terminating and relaunching them automatically to maintain the minimum healthy farm. This unwanted effect can cascade and reduce overall availability, which is certainly not good for production in AWS. In short, deeper health checks are not recommended for complex N-tier systems that follow auto scaling/healing and service-oriented architecture patterns in AWS.
Usually the purpose of a health check is to verify the status of the next tier or service that a particular tier consumes. A deeper health check is heavyweight and usually takes much longer to respond, because the majority of your tiers are exercised in the process. If the check frequency is set too aggressively, the health checks themselves will eat a lot of CPU, so the check interval and the response timeout have to be set considerably larger. Also, under heavy traffic such heavyweight calls can get queued, and the deeper health check may not respond quickly.
Deeper health checks are usually suitable for simple, fixed infrastructures. When your infrastructure is not elastic, decisions are taken manually by the ops team after analyzing the particular failing tier. For elastic, auto scaled workloads in AWS it is better to keep the load balancer health checks separate from the deeper health checks, which can still be used for assessing the availability of the infrastructure as a whole.
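The cascade can be illustrated with a toy simulation. This is only a rough model under stated assumptions: the fleet size, the unhealthy threshold of 2 consecutive failures and the `simulate` function are all made up for illustration, the App process is assumed to be healthy throughout, and real Auto Scaling behaviour (grace periods, cooldowns, launch times) is ignored.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    failed_checks: int = 0  # consecutive failed health checks


def simulate(db_up_by_round, deep, fleet_size=3, unhealthy_threshold=2):
    """Count instance replacements over a series of health check rounds.

    db_up_by_round: one boolean per check round, True if the DB is reachable.
    deep: True if the health check also depends on the database.
    """
    fleet = [Instance(f"app-{i}") for i in range(fleet_size)]
    next_id = fleet_size
    replacements = 0
    for db_up in db_up_by_round:
        for idx, inst in enumerate(fleet):
            # The App process itself is always fine in this model, so a
            # shallow check always passes; a deep check follows the DB.
            passed = db_up if deep else True
            inst.failed_checks = 0 if passed else inst.failed_checks + 1
            if inst.failed_checks >= unhealthy_threshold:
                # Marked unhealthy: the instance is replaced by a fresh one.
                fleet[idx] = Instance(f"app-{next_id}")
                next_id += 1
                replacements += 1
    return replacements


# A DB outage (or failover) spanning four consecutive check rounds.
outage = [True, False, False, False, False, True]
print("deep checks:   ", simulate(outage, deep=True), "replacements")
print("shallow checks:", simulate(outage, deep=False), "replacements")
```

In this model the deep check replaces the whole fleet twice during the outage (6 replacements), while the shallow check replaces nothing, even though the App servers were equally healthy in both runs.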
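A quick back-of-the-envelope calculation shows why the interval matters. The numbers below are invented purely for illustration (0.4 s of CPU per deep check, 0.002 s per shallow check, two checkers hitting each server), and `health_check_cpu_share` is a hypothetical helper, so treat the result as an order-of-magnitude sketch rather than a measurement.

```python
def health_check_cpu_share(cpu_seconds_per_check: float,
                           interval_seconds: float,
                           checkers: int = 1) -> float:
    """Fraction of one CPU core spent answering health checks.

    checkers is the number of independent clients calling the health
    script each interval (an assumption of this sketch).
    """
    return cpu_seconds_per_check * checkers / interval_seconds


# Hypothetical costs: a deep check that exercises several tiers versus a
# shallow check that only confirms the App process is alive.
deep_fast = health_check_cpu_share(0.4, interval_seconds=5, checkers=2)
deep_slow = health_check_cpu_share(0.4, interval_seconds=30, checkers=2)
shallow = health_check_cpu_share(0.002, interval_seconds=5, checkers=2)

print(f"deep check every 5s:     {deep_fast:.2%} of a core")
print(f"deep check every 30s:    {deep_slow:.2%} of a core")
print(f"shallow check every 5s:  {shallow:.2%} of a core")
```

Stretching the interval brings the deep check's cost down, but it also means the load balancer takes longer to notice a server that has genuinely failed.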
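One way to do this separation is to expose two endpoints: a shallow one for the load balancer and a deeper one for your monitoring. The sketch below uses only the Python standard library; the paths `/health` and `/health/deep`, the `route_health` function and the `check_database` placeholder are all hypothetical names chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler
from typing import Callable, Tuple


def check_database() -> bool:
    """Hypothetical placeholder; a real probe would run a trivial query."""
    return True


def route_health(path: str,
                 db_probe: Callable[[], bool] = check_database) -> Tuple[int, str]:
    """Map a request path to (status code, body)."""
    if path == "/health":
        # Shallow check for the load balancer: the process is up and
        # able to answer, nothing else is touched.
        return 200, "OK"
    if path == "/health/deep":
        # Deep check for monitoring/alerting: also probes the database.
        try:
            db_ok = db_probe()
        except Exception:
            db_ok = False
        return (200, "OK") if db_ok else (503, "database unreachable")
    return 404, "not found"


class HealthHandler(BaseHTTPRequestHandler):
    """Serves both health endpoints over HTTP."""

    def do_GET(self):
        status, body = route_health(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

To try it locally you could serve the handler with `http.server.HTTPServer(("", 8080), HealthHandler).serve_forever()`. The load balancer would then be pointed at `/health`, while a monitoring system polls `/health/deep` and alerts the ops team, so a database failover raises an alarm without taking healthy App servers out of rotation.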
