Tuesday, July 24, 2012

Exploring Amazon Availability Zones

Amazon Web Services (AWS) currently serves hundreds of thousands of customers in more than 190 countries. AWS is steadily expanding its global infrastructure to help customers achieve lower latency and higher throughput, and to ensure that customer data resides only in the Region they specify. AWS currently operates 9 Regions around the world and continues to expand its infrastructure as I write this article. The following diagram illustrates the current regional distribution:

Note: Image is not the latest

Each Amazon EC2 Region is designed to be completely isolated from the other Amazon EC2 Regions. This design achieves the greatest possible failure independence and stability. By launching EC2 instances in separate Amazon Regions, we can also place our application closer to specific customers or meet legal, compliance, or other requirements.
Every Amazon Region is further subdivided into Availability Zones (AZs). By launching EC2 instances in separate Availability Zones, we can protect our applications from the failure of a single location.
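
To make the benefit concrete, here is a minimal sketch of spreading instances evenly across zones and counting what survives when one zone goes down. It is only a planning model in plain Python; the function names and the zone names passed in are assumptions for illustration, and nothing here calls an AWS API.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instance_count, zones):
    """Assign instances to zones round-robin so the zones stay balanced."""
    if not zones:
        raise ValueError("at least one Availability Zone is required")
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(instance_count)]


def surviving_instances(placement, failed_zone):
    """Count the instances that remain if one whole zone becomes unavailable."""
    return sum(1 for zone in placement if zone != failed_zone)


placement = spread_across_zones(6, ["us-east-1a", "us-east-1b", "us-east-1c"])
print(Counter(placement))
print(surviving_instances(placement, "us-east-1a"))
```

With six instances over three zones, losing any one zone still leaves four instances running, whereas a single-zone deployment would lose all six.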

So what is an Amazon Availability Zone?

Amazon operates state-of-the-art, highly available data center facilities. However, failures can occur that affect the availability of EC2 instances in the same location. Although this is rare, if you host all your Amazon EC2 instances in a single location that is affected by such a failure, your instances will be unavailable.
To overcome this, every Amazon Region is further subdivided into Availability Zones. Availability Zones are distinct physical locations inside the same Region, connected to each other by low-latency networking and engineered to be insulated from failures in other AZs. Each Availability Zone runs on its own physically distinct, independent infrastructure and is engineered to be highly reliable, with independent power, cooling, network, and security. Common points of failure such as generators and cooling equipment are not shared across Availability Zones. Additionally, the zones are physically separate, such that even extremely uncommon disasters such as fires, tornados, or flooding would only affect a single Availability Zone. The following diagram illustrates the Availability Zone concept (image source: AWS):

We can think of each AZ as consisting internally of a single data center or multiple data centers. The following diagram illustrates the current AZ allocation of AWS infrastructure (image source: AWS):

Note: Image is not the latest
  • Region: US-East (Northern Virginia) has 5 availability zones
  • Region: US-West (California + Oregon) has 6 availability zones
  • Region: Europe-West (Dublin) has 3 availability zones
  • Region: Asia Pacific (Japan) has 2 availability zones
  • Region: Asia Pacific (Singapore) has 2 availability zones
  • Region: South America (Sao Paulo) has 2 availability zones
  • Region: Australia (Sydney) has 2 availability zones
  • Region: GovCloud

Why should we leverage AWS Availability Zones?

Just moving a system into the cloud doesn’t make it fault-tolerant or highly available; we need to understand in depth the availability features offered by the cloud provider and then architect our application to use them.
In AWS it is usually recommended, as a best practice, to architect applications across multiple Availability Zones inside a Region. When you design such a system, you should understand the zone dependencies up front.
Availability Zones (AZs) are distinct geographical locations that are engineered to be insulated from failures in other AZs. Certain services that provide basic infrastructure, such as Amazon Elastic Compute Cloud (EC2) and Amazon Elastic Block Store (EBS), need to be explicitly architected around this Multi-AZ concept. By placing Amazon EC2 instances in multiple AZs, an application can be protected from the failure of a single EC2 instance, a single location, or a single data center. It is important to run independent application stacks in more than one AZ, so that if one zone fails, the application in the other zone can continue to run without disruption.
Applications that are architected with multi-tiered distribution and statelessness in mind can usually leverage AWS high availability features without any code change. Stateful applications can still use Amazon Multi-AZ deployment, but they will not be completely fault tolerant because of their design. Some of the techniques used in AWS Multi-AZ deployments to avoid a single point of failure (SPOF) in the web/app layer are:

  • Since AWS infrastructure does not currently support the multicast protocol, application-layer software should synchronize data using a unicast TCP mechanism. Example: Java-based servers can use JGroups or Terracotta NAM to synchronize cluster information in AWS.
  • If the web/app servers are written in PHP, .NET, Python, etc., all user and session data can be stored on centralized systems such as a Memcached EC2 instance, ElastiCache nodes, or Amazon DynamoDB. Note: deploy redundant ElastiCache clusters in different Availability Zones for HA, or build cache warming into your design for the ElastiCache/Memcached layers.
  • Uploaded user files and documents should be stored centrally on an NFS pool, a Red Hat Gluster storage pool, or Amazon S3.
These are some of the simple points to keep in mind during the architecture or migration phase in AWS.
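
The session-storage point above can be sketched as follows. This is a minimal illustration, not a production design: the in-memory `SessionStore` class is a hypothetical stand-in for a shared system such as Memcached, ElastiCache, or DynamoDB, and the class and method names are assumptions made for the example.

```python
import uuid


class SessionStore:
    """Stand-in for a centralized store shared by all web/app nodes."""

    def __init__(self):
        self._data = {}

    def put(self, session_id, values):
        self._data[session_id] = dict(values)

    def get(self, session_id):
        return self._data.get(session_id)


class WebServer:
    """A stateless web/app node: it keeps no session data of its own."""

    def __init__(self, zone, store):
        self.zone = zone
        self.store = store

    def login(self, user):
        session_id = uuid.uuid4().hex
        self.store.put(session_id, {"user": user})
        return session_id

    def whoami(self, session_id):
        session = self.store.get(session_id)
        return session["user"] if session else None


store = SessionStore()
server_a = WebServer("us-east-1a", store)
server_b = WebServer("us-east-1b", store)

sid = server_a.login("alice")
# A node in the other zone can serve the same session because the
# state lives in the shared store, not on the node that created it.
print(server_b.whoami(sid))
```

Because neither node holds the session locally, a load balancer can send the next request to either zone, and losing one node does not log the user out.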

How do AWS building blocks inherently leverage Availability Zones?

Most of the higher-level AWS services, such as Amazon Simple Storage Service (S3), Amazon DynamoDB, Amazon CloudWatch, Amazon Simple Queue Service (SQS), and Amazon Elastic Load Balancing (ELB), have been built with inherent fault tolerance and high availability in mind. They already work across multiple Availability Zones inside an Amazon Region. Let us see how each of them uses the Availability Zone concept:

Amazon ELB: Amazon ELB can direct and load balance requests to multiple EC2 instances launched across multiple Availability Zones inside a Region. As traffic increases, load balancer instances are also spawned internally in multiple Availability Zones within the ELB tier.
Amazon RDS MySQL: Amazon RDS for MySQL and Oracle currently supports a Multi-AZ hot standby architecture, where the primary (master) runs in one Availability Zone and the hot standby (secondary) runs in another. In the event of a primary AZ failure, the hot standby starts serving requests from the alternate AZ. We can also create and launch multiple RDS Read Replicas in multiple Availability Zones for read scaling and HA.
Amazon S3: Any data stored on S3 (Standard storage option) is synchronized across multiple locations/facilities inside an Amazon Region. Amazon S3 objects are redundantly stored on multiple devices across multiple facilities (AZs) in an Amazon S3 Region. To help ensure durability, Amazon S3 PUT and COPY operations synchronously store our data across multiple facilities before returning a successful response.
Amazon DynamoDB: Amazon DynamoDB replicates data across three facilities/Availability Zones in an AWS Region to provide fault tolerance in the event of a server failure or Availability Zone outage. To achieve high uptime and durability, DynamoDB replicates data synchronously across these facilities.
Amazon SQS: Amazon SQS stores all queues and messages redundantly on multiple servers and in multiple data centers/Availability Zones, which means that no single computer or network failure will render SQS messages inaccessible.
Amazon ElastiCache: Currently an Amazon ElastiCache cluster cannot span multiple Availability Zones inside an AWS Region. Deploy redundant ElastiCache clusters in different Availability Zones for HA, or alternatively use efficient cache warming techniques in your architecture for fault tolerance in this layer.
Amazon CloudWatch: Amazon CloudWatch can monitor EC2 instances deployed in multiple Availability Zones inside an Amazon Region. CloudWatch is an AWS building block and is a highly available service by design.
Amazon AutoScaling: Amazon AutoScaling can scale EC2 instances out and in across multiple Availability Zones inside an Amazon Region. It cannot scale EC2 instances out across Amazon Regions.
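
The synchronous replication described for S3 and DynamoDB above can be pictured with a toy model. This is only a sketch of the idea "acknowledge a write after every facility has a copy"; it is not how AWS implements these services, and the `Facility` and `ReplicatedStore` classes and the facility names are hypothetical.

```python
class Facility:
    """One storage location, for example one Availability Zone."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self.objects = {}


class ReplicatedStore:
    """Toy store that keeps a full copy of every object in every facility."""

    def __init__(self, facilities):
        self.facilities = list(facilities)

    def put(self, key, value):
        # Simplification: refuse the write up front if any facility is down,
        # so a successful return always means every copy was stored.
        for facility in self.facilities:
            if not facility.available:
                raise ConnectionError(f"{facility.name} is unavailable")
        for facility in self.facilities:
            facility.objects[key] = value
        return True

    def get(self, key):
        # Any available facility holding the key can answer the read.
        for facility in self.facilities:
            if facility.available and key in facility.objects:
                return facility.objects[key]
        raise KeyError(key)


facilities = [Facility("az-1"), Facility("az-2"), Facility("az-3")]
store = ReplicatedStore(facilities)
store.put("report.pdf", b"quarterly numbers")

facilities[0].available = False  # simulate the loss of one facility
print(store.get("report.pdf"))
```

Since the write returned only after all three copies existed, the read still succeeds after one facility is lost.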

Multiple Availability Zones inside Amazon VPC

Using Amazon Virtual Private Cloud (Amazon VPC) we can provision a private, isolated section of the AWS cloud and launch AWS resources in a virtual network that we define. We have seen that enterprises mostly prefer the VPC model of deployment in the Amazon cloud.
Inside Amazon VPC, we can define a virtual network topology that closely resembles a traditional network, with complete control over our virtual networking environment, including selection of the IP address range, creation of subnets, and configuration of route tables and network gateways. Since the Availability Zone concept also applies inside a VPC, highly available applications inside a VPC should run in multiple Availability Zones.
Create multiple subnets inside a VPC and put each subnet in a distinct Availability Zone for high availability. Currently a single VPC can span multiple Availability Zones and have multiple VPN connections, but a subnet inside an Amazon VPC cannot span multiple Availability Zones. Amazon VPC is available in all AWS Regions.
VPN gateways are regional objects and can be accessed from any of the subnets (subject, of course, to any network ACLs that you create). A sample architecture using VPC with Multi-AZ is illustrated below:

Let me explain the diagram above:
Point 1) An Amazon VPC is created in the US-East Region of AWS.
Point 2) Multiple subnets are created inside the VPC, and each subnet is placed in a distinct Availability Zone for high availability. Example: distribute your web, app, and DB layers in public/private subnets inside Availability Zone 1a, and keep a similar set in Availability Zone 1b for HA. Since a subnet inside Amazon VPC cannot span multiple Availability Zones, HA has to be achieved with this subnet-per-AZ network architecture.
Point 3) Multiple VPN connections from the single VPC are attached to multiple customer gateways located in multiple geographies (simulating a "branch office" architecture).
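
The subnet-per-AZ layout in Point 2 can be planned with Python's standard `ipaddress` module. The sketch below is an assumption-laden illustration: the `10.0.0.0/16` range, the `/24` subnet size, the tier names, and the `plan_subnets` helper are all chosen for the example and are not taken from the diagram.

```python
import ipaddress


def plan_subnets(vpc_cidr, zones, tiers, new_prefix=24):
    """Give every (zone, tier) pair its own subnet; no subnet spans zones."""
    network = ipaddress.ip_network(vpc_cidr)
    blocks = network.subnets(new_prefix=new_prefix)
    plan = {}
    for zone in zones:
        for tier in tiers:
            # next() raises StopIteration if the VPC range is too small
            # for the number of subnets requested.
            plan[(zone, tier)] = next(blocks)
    return plan


plan = plan_subnets(
    "10.0.0.0/16",
    zones=["us-east-1a", "us-east-1b"],
    tiers=["web", "app", "db"],
)
for (zone, tier), cidr in plan.items():
    print(zone, tier, cidr)
```

Each tier ends up with one subnet in each zone, which mirrors keeping a similar web/app/DB set in 1a and 1b.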

Availability Zone Names are logical 

Availability Zone names are not the same across AWS accounts. There is a common misconception that an AZ name like "us-east-1a" identifies a specific physical Availability Zone for everyone. In fact, AWS can map the same AZ name to different physical Availability Zones for different accounts. The Availability Zone us-east-1a for account A is not necessarily the same as us-east-1a for account B; zone assignments are mapped independently for each account. This matters when our infrastructure or use case spans multiple accounts. Example: if infrastructure is provisioned through Account-A and load testing agents are launched through Account-B, both pointing to "us-east-1a", they may not be in the same physical AZ.
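
The idea can be illustrated with a toy model in which each account gets its own stable mapping from logical names to physical zones. How AWS actually assigns the mapping is not described here, so the seeded shuffle, the physical zone labels, and the account identifiers below are purely hypothetical.

```python
import random

PHYSICAL_ZONES = ["physical-zone-1", "physical-zone-2", "physical-zone-3"]
LOGICAL_NAMES = ["us-east-1a", "us-east-1b", "us-east-1c"]


def zone_mapping(account_id):
    """Return one account's mapping of logical AZ names to physical zones."""
    rng = random.Random(account_id)  # seeded so the mapping is stable per account
    shuffled = PHYSICAL_ZONES[:]
    rng.shuffle(shuffled)
    return dict(zip(LOGICAL_NAMES, shuffled))


def same_physical_zone(account_a, account_b, logical_name):
    """Check whether two accounts land in the same place for one AZ name."""
    return zone_mapping(account_a)[logical_name] == zone_mapping(account_b)[logical_name]


print(zone_mapping("account-A"))
print(zone_mapping("account-B"))
print(same_physical_zone("account-A", "account-B", "us-east-1a"))
```

The practical takeaway is that two accounts should not assume co-location just because they use the same AZ name.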

Guidelines for architecting applications across AWS Availability Zones

A typical web application stack consists of the following tiers: DNS, load balancers, web, app, cache, database, and storage. It is important to run independent application stacks in more than one AZ, so that if one zone fails, the application in the other zone can continue to run without disruption. All of the tiers above can be distributed across at least 2 Availability Zones inside an AWS Region. In our experience, the following tiers and software can be deployed across multiple Availability Zones with minimal configuration changes:

Sample Web/App Architecture using Multiple Availability zones

The following diagram illustrates a reference architecture using multiple Amazon Availability Zones. Let us look at the tiers in this architecture and how each uses the AZ concept.

DNS Tier: Amazon Route 53
Load Balancing Tier: Amazon ELB, which has Multi-AZ support built in
Web/App Tier: EC2 instances launched across multiple Availability Zones using Amazon AutoScaling, integrated with Amazon ELB and CloudWatch
Database Tier: RDS MySQL with a hot standby, plus multiple Read Replicas in multiple AZs depending on the read scaling needed
Caching Tier: Amazon ElastiCache clusters or EC2 Memcached distributed across multiple AZs
Search Tier: Since Amazon CloudSearch currently supports only a single AZ, users can look at deploying Solr replication/replicated shards across multiple Availability Zones
NoSQL Tier: Amazon DynamoDB inherently replicates data across multiple AZs/facilities
CDN Tier: Amazon CloudFront internally uses a global network of edge locations
Monitoring Tier: Amazon CloudWatch is used for monitoring the infrastructure. EC2 instances use detailed monitoring.
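
A simple way to review such an architecture is to list, per tier, the zones its instances run in and flag any tier confined to one zone. The sketch below is a hypothetical helper in plain Python; the tier names and the example deployment are assumptions, not a description of the diagram.

```python
def single_zone_tiers(deployment, minimum_zones=2):
    """Return the tiers whose instances sit in fewer than `minimum_zones` zones."""
    return sorted(
        tier
        for tier, zones in deployment.items()
        if len(set(zones)) < minimum_zones
    )


# One entry per running instance, naming the zone it was launched in.
deployment = {
    "web": ["us-east-1a", "us-east-1b"],
    "app": ["us-east-1a", "us-east-1b"],
    "database": ["us-east-1a", "us-east-1b"],
    "cache": ["us-east-1a", "us-east-1a"],
}

print(single_zone_tiers(deployment))
```

Here the cache tier is reported because both of its nodes sit in the same zone, so a failure of that zone would empty the cache layer.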

AWS Availability Zones: Usage Charges

We pay a small bandwidth charge (Regional Data Transfer) for data that crosses Availability Zones. Regional Data Transfer rates apply if at least one of the following is true, but the charge is applied only once for a given instance even if both are true:
  • The other instance is in a different Availability Zone, regardless of which type of address is used.
      • Data transferred between instances in the same Availability Zone on EC2 costs $0.00 per GB.
      • Data transferred between instances in different Availability Zones on EC2 costs $0.01 per GB.
  • Public or Elastic IP addresses are used, regardless of which zone the other instance is in.
Also, data transfer between VPC and non-VPC instances in the same Region, regardless of Availability Zone, is charged at the usual rate of $0.01 per GB.
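
The charging rule can be written down as a small calculation. The rates are the ones quoted in this article and may have changed since; the function and parameter names are assumptions for the example.

```python
SAME_ZONE_RATE = 0.00  # USD per GB, same zone over private addresses
REGIONAL_RATE = 0.01   # USD per GB, Regional Data Transfer


def regional_transfer_cost(gigabytes, src_zone, dst_zone, uses_public_ip=False):
    """Estimate the EC2-to-EC2 transfer charge inside one Region."""
    crosses_zones = src_zone != dst_zone
    # The regional rate applies once, even when both conditions hold.
    applies = crosses_zones or uses_public_ip
    rate = REGIONAL_RATE if applies else SAME_ZONE_RATE
    return gigabytes * rate


print(regional_transfer_cost(500, "us-east-1a", "us-east-1a"))
print(regional_transfer_cost(500, "us-east-1a", "us-east-1b"))
print(regional_transfer_cost(500, "us-east-1a", "us-east-1b", uses_public_ip=True))
```

At these rates, replicating 500 GB to another zone costs about 5 USD, and using a public or Elastic IP for the same transfer does not double the charge.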

AWS Availability Zones: Simple latency test

We ran a simple test to check the latency between Availability Zones 1a and 1b (as mapped in our account).
The test architecture had:
  • Multiple Grinder load clients
  • Tomcat application server and 1 RDS MySQL database server
  • The m1.large instance type for both Tomcat and RDS MySQL
  • 100,000 txns generated in 60 seconds
  • Region: US-East, Availability Zones: 1a and 1b in our account
  • In practice, the majority of web/app EC2 instances and the master database should reside in the same AZ. We used the architecture above only for ease of testing.

Test 1: Observations
  • The Java/Tomcat-based app EC2 instance was launched in us-east-1a
  • RDS MySQL was launched in us-east-1a (same Availability Zone)
  • Both instances were of m1.large capacity
  • 100K txns/minute were generated from Tomcat to the RDS instance
  • An average latency of 3 ms was observed for 100K txns in the single-AZ setup

Test 2: Observations
  • The Java/Tomcat-based app EC2 instance was launched in us-east-1a
  • RDS MySQL was launched in us-east-1b (different Availability Zone)
  • Both instances were of m1.large capacity
  • 100K txns/minute were generated from Tomcat to the RDS instance
  • An average latency of 9 ms was observed for 100K txns in the multi-AZ setup
  • Although this is roughly a 3X increase in latency over Test 1, the average latency is still in the single-digit millisecond range, so Multi-AZ deployments are suggested for applications requiring high availability. Note: the observed ~3X difference will vary depending on the Availability Zones involved (origin AZ, destination AZ, AWS Region) and should not be treated as an exact figure.
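
The arithmetic behind these observations is easy to reproduce. In the sketch below, the 3 ms and 9 ms figures come from the two tests; the helper names and the example of 10 sequential database calls per request are assumptions added to show how the difference adds up.

```python
from statistics import mean


def average_latency_ms(samples_ms):
    """Average of per-transaction latencies, in milliseconds."""
    return mean(samples_ms)


def latency_ratio(single_az_ms, multi_az_ms):
    """How many times slower the cross-zone path is."""
    return multi_az_ms / single_az_ms


def added_time_ms(single_az_ms, multi_az_ms, sequential_calls):
    """Extra time for one request that makes its calls one after another."""
    return (multi_az_ms - single_az_ms) * sequential_calls


print(latency_ratio(3, 9))
print(added_time_ms(3, 9, sequential_calls=10))
```

A request that makes 10 sequential database calls would spend about 60 ms more in the cross-zone setup, which is why chatty applications feel the difference more than the single-digit average suggests.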


Akash said...

This was a really nice post with examples covered (I loved the latency part).

I usually transfer data from one EC2 instance to another via S3. Since S3 is also available across different Availability Zones, how are latency and price affected in this case?

Rajshekar said...

Hi Harish,
Nice post & covered advantages of the zone based deployments. I have a small doubt...Is the zones geographically separated within the region OR a separate cloud within the same DC with independent resources (viz. power, cooling etc). If the same is built across multiple DCs then intra zonal private communication is achieved through seamless integration between VLANs of the account....Do we have any AWS write up on this explaining the zones?

Admin said...

AWS states that AZ's are "physically separate, such that even extremely uncommon disasters such as fires, tornados or flooding would only affect a single Availability Zone".

So the inference is that they are separate DCs.

Admin said...

AWS states that AZ's are "physically separate, such that even extremely uncommon disasters such as fires, tornados or flooding would only affect a single Availability Zone"

So the inference is that these DCers are physically distant.

