We know that EBS volumes have built-in redundancy, so a volume does not fail just because an individual drive fails. That redundancy, however, is limited to the scope of a single Availability Zone. EBS does not automatically replicate data across multiple Availability Zones the way some other AWS services do (for example S3 and DynamoDB, or RDS in a Multi-AZ deployment).
AWS describes the durability of EBS on its site as follows:
“The durability of your EBS volume depends both on the size of your volume and the percentage of the data that has changed since your last snapshot. As an example, volumes that operate with 20 GB or less of modified data since their most recent Amazon EBS snapshot can expect an annual failure rate (AFR) of between 0.1% – 0.5%, where failure refers to a complete loss of the volume. This compares with commodity hard disks that will typically fail with an AFR of around 4%, making EBS volumes 10 times more reliable than typical commodity disk drives.”
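To put the quoted figures in perspective, here is a small arithmetic sketch. The function names are mine, and the multi-year estimate assumes that failures in different years are independent, which is my simplification and not something AWS states.

```python
def reliability_ratio(disk_afr: float, ebs_afr: float) -> float:
    """How many times lower the EBS annual failure rate is than the disk's."""
    return disk_afr / ebs_afr


def loss_probability(afr: float, years: int) -> float:
    """Chance of at least one failure within `years`.

    Assumes each year is independent (a simplification for illustration).
    """
    return 1.0 - (1.0 - afr) ** years


if __name__ == "__main__":
    disk_afr = 0.04  # "around 4%" from the quote
    for ebs_afr in (0.001, 0.005):  # the quoted 0.1% and 0.5% bounds
        ratio = reliability_ratio(disk_afr, ebs_afr)
        five_year = loss_probability(ebs_afr, 5)
        print(f"EBS AFR {ebs_afr:.1%}: {ratio:.0f}x lower than disk, "
              f"{five_year:.2%} chance of loss within 5 years")
```

With the quoted bounds the ratio works out to between 8x and 40x, so the "10 times" figure sits near the conservative end of that range.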
Technically, we can reduce the risk of volume failure by mirroring EBS volumes, but mirroring does not help if the failure is at the AZ level. This constraint strongly suggests that, to safeguard your data, you need to take backups and store them across multiple Availability Zones. Common challenges in the backup process include the time it takes to create data copies, the disk space required, and the impact on server operations during the copy. Newer storage arrays can speed up the backup process dramatically by using a technique called a “snapshot”.
A snapshot is the state of a system (for example, a LUN-level copy of data) at a particular point in time. One of the most common types is the differential snapshot, which allows fast creation and reduced disk space consumption. Common implementations of differential snapshots include copy-on-write and allocate-on-write. Better implementations of these techniques create copies instantly, allow the copies to be used read-write, and permit many copies to coexist and be active at the same time.
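As an illustration of the copy-on-write idea, here is a toy sketch of my own (the class and method names are hypothetical, and real storage arrays are far more involved). Creating a snapshot copies nothing; an old block is preserved only at the moment it is about to be overwritten.

```python
class CowVolume:
    """Toy block device with copy-on-write snapshots (illustrative only)."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self.snapshots = []  # one dict per snapshot: block index -> preserved data

    def snapshot(self):
        """Create a snapshot instantly: no data is copied at this point."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        """Preserve the old block for snapshots that still need it, then overwrite."""
        for saved in self.snapshots:
            if index not in saved:
                saved[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        """Read a block as it was when the snapshot was taken."""
        saved = self.snapshots[snap_id]
        # Blocks never overwritten since the snapshot are read from the live volume.
        return saved.get(index, self.blocks[index])


if __name__ == "__main__":
    vol = CowVolume(4)
    vol.write(0, b"old")
    snap = vol.snapshot()
    vol.write(0, b"new")
    print(vol.read_snapshot(snap, 0), vol.blocks[0])
```

The snapshot consumes space only for blocks that changed after it was taken, which is where the fast creation and reduced disk usage come from.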
Amazon EBS snapshots are incremental backups, meaning that every snapshot copies only the blocks in the volume that have changed since the last snapshot. In subsequent snapshots, only the table of contents (TOC) and the changed blocks are copied (in compressed form) to S3. If you have a volume with 10 GB of data, but only 2 GB of data has changed since your last snapshot, only the 2 GB of modified data is written to Amazon S3 during the snapshot process.
AWS does not disclose the internals of its snapshot technology, but based on our understanding of storage systems, let us explore how it works:
Step 1) When you take a snapshot of an EBS volume for the first time, it is a full snapshot, but it copies only the blocks in the EBS volume that contain data. During the first snapshot, the full TOC and all blocks containing data (A, B, C, D, and E) are moved asynchronously to S3.
Step 2) Imagine that in the meantime blocks D and E were changed and block F was newly added since snapshot 1. When you take snapshot 2, only the TOC and the changed blocks D1, E1 and F are moved to S3.
Step 3) Before you take snapshot 3, blocks E and F are changed and G is newly added, as per the diagram. This time only the TOC and the changed blocks E2, F1 and G are moved to S3.
Step 4) Since snapshot 3 is the most recent and contains the latest data, you can go ahead and delete older snapshots such as 1 and 2. Blocks that snapshot 3 still references, such as A, B, C and D1, are retained. The capacity occupied by blocks D, E, F and E1 is no longer relevant, so it is released and no longer charged by AWS.
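The four steps above can be simulated with a small sketch. This is only my mental model, not AWS's actual implementation: the `SnapshotStore` class is hypothetical, each snapshot is a TOC that maps block names to stored blocks, unchanged blocks are shared between snapshots, and deleting a snapshot releases only the blocks that no remaining snapshot references.

```python
class SnapshotStore:
    """Toy model of incremental snapshots with shared blocks (illustrative only)."""

    def __init__(self):
        self.blocks = {}   # stored block id -> data (stands in for S3)
        self.tocs = {}     # snapshot name -> {block name: stored block id}
        self._next_id = 0

    def take(self, name, volume):
        """Snapshot `volume` (block name -> data); return names of uploaded blocks."""
        previous = list(self.tocs.values())[-1] if self.tocs else {}
        toc, uploaded = {}, []
        for block, data in volume.items():
            old_id = previous.get(block)
            if old_id is not None and self.blocks[old_id] == data:
                toc[block] = old_id  # unchanged: reference the existing copy
            else:
                self.blocks[self._next_id] = data  # new or changed: upload it
                toc[block] = self._next_id
                self._next_id += 1
                uploaded.append(block)
        self.tocs[name] = toc
        return uploaded

    def delete(self, name):
        """Delete a snapshot; return the data of blocks that were released."""
        del self.tocs[name]
        live = {bid for toc in self.tocs.values() for bid in toc.values()}
        released = [bid for bid in self.blocks if bid not in live]
        return [self.blocks.pop(bid) for bid in released]


if __name__ == "__main__":
    store = SnapshotStore()
    volume = {"A": "A", "B": "B", "C": "C", "D": "D", "E": "E"}
    print("snapshot 1 uploads", store.take("snap1", volume))
    volume.update({"D": "D1", "E": "E1", "F": "F"})
    print("snapshot 2 uploads", store.take("snap2", volume))
    volume.update({"E": "E2", "F": "F1", "G": "G"})
    print("snapshot 3 uploads", store.take("snap3", volume))
    print("deleting snapshot 1 releases", store.delete("snap1"))
    print("deleting snapshot 2 releases", store.delete("snap2"))
    print("still stored:", sorted(store.blocks.values()))
```

Running the demo, deleting snapshots 1 and 2 releases D, E, E1 and F, while A, B, C, D1, E2, F1 and G stay in place because snapshot 3 still needs them.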
You can observe that the above mechanism is cost effective because you pay only for what has changed. In addition, the overall backup capacity is used efficiently, and snapshots are faster to take than traditional backups. Note that taking a snapshot can reduce the IOPS you get from your volume while the snapshot is pending; the impact usually lasts from a few milliseconds to a few seconds, depending on how much data changed between snapshots.
In Amazon infrastructure, snapshots are usually used to achieve some of the following objectives:
- Expand the size of an EBS volume
- Create multiple duplicate volumes (copies) inside an AZ
- Create volumes across Amazon Availability Zones inside an Amazon EC2 region (in the event of failure)
- Create similar volumes across Amazon EC2 regions using the EBS snapshot copy mechanism. This feature helps during geographic expansion, data center migration, and disaster recovery.
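The objectives above map onto two EC2 API calls, `create_volume` and `copy_snapshot`. Here is a minimal sketch, assuming you pass in a boto3 EC2 client; the helper names, IDs, zones and regions are placeholders of mine.

```python
def restore_volume(ec2, snapshot_id, availability_zone, size_gib=None):
    """Create a volume from a snapshot in the given AZ, optionally larger."""
    params = {"SnapshotId": snapshot_id, "AvailabilityZone": availability_zone}
    if size_gib is not None:
        params["Size"] = size_gib  # must not be smaller than the snapshot's volume
    return ec2.create_volume(**params)["VolumeId"]


def copy_snapshot_to_region(dest_ec2, source_region, snapshot_id):
    """Copy a snapshot into the region that `dest_ec2` is connected to."""
    response = dest_ec2.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
    )
    return response["SnapshotId"]


# Example usage (needs boto3 and AWS credentials; all IDs are placeholders):
#   import boto3
#   east = boto3.client("ec2", region_name="us-east-1")
#   west = boto3.client("ec2", region_name="us-west-2")
#   restore_volume(east, "snap-0123456789abcdef0", "us-east-1b")        # other AZ
#   restore_volume(east, "snap-0123456789abcdef0", "us-east-1a", 200)   # bigger
#   copy_snapshot_to_region(west, "us-east-1", "snap-0123456789abcdef0")
```

Note that the copy is requested from the destination region's client. After restoring into a larger volume, the file system on it typically still has to be grown before the extra space is usable.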
Although EBS snapshots can be taken regardless of whether the volume is attached to a running Amazon EC2 instance, it is strongly recommended to either detach the volume or freeze all writes before taking a snapshot, so that the snapshot is consistent and no in-flight data is lost. Detaching a volume is not always possible. Imagine you are running a database or Solr search on EC2: these services need to run continuously, so detaching is not feasible and might prove very costly. In the Amazon cloud it is a recommended practice to use a file system such as XFS, which provides an option to freeze writes briefly so that the snapshot can be taken consistently. XFS is also very useful when we use EBS striping (RAID 0).
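The freeze-then-snapshot sequence can be sketched as follows. It assumes the standard `xfs_freeze` tool (`-f` to freeze, `-u` to unfreeze) is installed and that the script runs with enough privileges; the function name and the way the snapshot call is passed in are my own choices.

```python
import subprocess


def snapshot_while_frozen(mount_point, take_snapshot, run=subprocess.run):
    """Freeze the XFS file system, start the snapshot, then always unfreeze.

    `take_snapshot` is any callable that initiates the snapshot(s), for
    example a boto3 `create_snapshot` call. `run` is injectable for testing.
    """
    run(["xfs_freeze", "-f", mount_point], check=True)
    try:
        return take_snapshot()
    finally:
        # Unfreeze even if the snapshot call fails, so writes are not blocked.
        run(["xfs_freeze", "-u", mount_point], check=True)


# Example usage (needs boto3, credentials and root; values are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2")
#   snapshot_while_frozen(
#       "/data",
#       lambda: ec2.create_snapshot(VolumeId="vol-0123456789abcdef0",
#                                   Description="consistent XFS snapshot"),
#   )
```

The file system only needs to stay frozen until the snapshot has been initiated, which is why the unfreeze happens as soon as the call returns. For a RAID 0 stripe, `take_snapshot` would need to start a snapshot of every member volume while the file system is still frozen.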
A snapshot of an EBS volume writes a copy of the volume data to Amazon S3 (though not to buckets that you can see). S3 is an excellent option for snapshot storage because:
- S3 is infrastructure separate from EBS storage, so it improves availability and reduces the dependency on EBS in the event of an EBS failure
- EBS volumes have Availability Zone scope and can be attached only to Amazon EC2 instances launched in the same AZ. Since the snapshots are stored in S3, however, you can create a new volume from them in any AZ inside the Amazon EC2 region.
- Since the snapshots are not stored directly in buckets, you cannot access them using the S3 APIs; you can only list the snapshots using the EC2 API
- On the other hand, one negative I have observed is that accessing data for the first time on a volume created from an Amazon S3 snapshot can cause latency during the initial loading period. Whenever you create a new volume from an existing snapshot, its data loads lazily in the background. If your EC2 instance accesses data that hasn’t yet been loaded from S3, the volume immediately downloads the requested data from S3 and then continues loading the rest of the data in the background. If you are accessing S3 snapshots from a private subnet inside a VPC, make sure your NAT instance capacity is right-sized to reduce the latency during loading.
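One common way to take the first-access penalty up front is to read every block of the new volume once before putting it into service. Here is a minimal sketch of that idea; the function name and the device path in the usage comment are hypothetical, and reading a raw device normally requires root.

```python
def prewarm(path, chunk_size=1024 * 1024):
    """Read `path` sequentially, discarding the data; return the bytes read.

    Touching every block forces a lazily loaded volume to fetch it.
    """
    total = 0
    with open(path, "rb", buffering=0) as device:
        while True:
            chunk = device.read(chunk_size)
            if not chunk:  # empty read means end of device
                break
            total += len(chunk)
    return total


# Example usage (device name is a placeholder):
#   prewarm("/dev/xvdf")
```

On a large volume this can take a long time, so it is a trade-off between slow first reads in production and a longer preparation step.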