Amazon S3 is one of the earliest and most popular AWS services for storing files and documents.
Customers store a wide variety of files in Amazon S3, including logs, documents,
images, videos, and database dumps. Different files have different lifetimes and
use cases in any production application. Some documents are frequently accessed
for a limited period of time; after that, you might not need real-time access to
these objects anymore, and they become candidates for deletion or archival.
For example:
- Log files have a limited lifetime; they can be parsed into a data store or archived every few months
- Database and data store dumps also have a retention period and hence a limited lifetime
- Files related to campaigns are usually not needed once the sales promotion is over
- Customer documents depend on the customer usage life cycle and have to be retained as long as the customer is active in the application
- Digital media archives and financial and healthcare records must be retained for regulatory compliance
Usually, IT teams have to build some sort of in-house mechanism or
automated program to track these document ages and initiate a
deletion process (individual or bulk) from time to time. In my customer consulting
experience, I have often observed that such a mechanism is not adequately in
place, for the following reasons:
- Not all IT teams are efficient in their development and operations
- No mechanism or automation is in place to manage retention periods efficiently
- IT staff are not fully equipped with AWS cloud knowledge
- IT teams are usually occupied with the solutions/products catering to their business, and hence do not have time to keep up with the rapid pace of AWS feature roll-outs
Imagine your application stores ~5 TB of documents every month. In a year this aggregates to ~60 TB of documents in Amazon S3 Standard storage, which in the US East Region costs roughly 30,000 USD for the year. Now imagine that ~20 TB of the documents aggregated over the year have a limited lifetime and could be deleted or archived every month. That equates to roughly 1,650 USD of cost leakage a year (20 TB ≈ 20,480 GB × ~0.08 USD/GB ≈ 1,638 USD). This can be avoided if a proper mechanism or automation is put in place by the respective teams.
Note: Current charges for Amazon S3 Standard storage in US East are 0.095 USD per GB-month for the first 1 TB and 0.080 USD per GB-month for the next 49 TB.
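As a rough illustration of this arithmetic, here is a minimal sketch of the cost model. It assumes the tiered prices from the note above, 1 TB = 1,024 GB, and (for simplicity) the next-49-TB rate for everything beyond the first terabyte; it lands in the same ballpark as the ~30,000 and ~1,650 USD figures cited above.

```python
# Rough sketch of the cost arithmetic above. Assumptions: tiered US East
# S3 Standard pricing from the note, 1 TB = 1,024 GB, and data growing
# by 5 TB every month. All figures are approximate.

GB_PER_TB = 1024
FIRST_TB_RATE = 0.095   # USD per GB-month for the first 1 TB
NEXT_TIER_RATE = 0.080  # USD per GB-month beyond the first 1 TB (simplified)

def monthly_cost(tb_stored: float) -> float:
    """Approximate monthly S3 Standard charge for tb_stored terabytes."""
    gb = tb_stored * GB_PER_TB
    first_tier = min(gb, GB_PER_TB)
    rest = max(gb - GB_PER_TB, 0)
    return first_tier * FIRST_TB_RATE + rest * NEXT_TIER_RATE

# The bucket holds 5, 10, ..., 60 TB over the twelve months of the year.
year_total = sum(monthly_cost(5 * month) for month in range(1, 13))
print(f"Storage cost for the year: ~{year_total:,.0f} USD")   # ~32,000 USD

# Carrying 20 TB of stale, deletable data costs roughly:
print(f"Leakage from 20 TB: ~{20 * GB_PER_TB * NEXT_TIER_RATE:,.0f} USD")  # ~1,638 USD
```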
But is there a simpler way for IT teams to cut this leakage and save costs in Amazon S3? Yes: use the Amazon S3 Object Expiration feature.
What is Amazon S3 Object Expiration?
Amazon S3 introduced a feature called Object Expiration in late 2011 to ease exactly this kind of automation. It is very helpful for customers who want their data on S3 only for a limited period of time, after which the files are no longer needed and should be deleted automatically by Amazon S3. Earlier, you as a customer were responsible for deleting those files manually once they stopped being useful; now you do not have to worry about it, just use Amazon S3 Object Expiration.
The leakage of ~1,650 USD in the scenario above can be recovered by implementing the Amazon S3 Object Expiration feature in your system. Since it involves no automation effort, no compute hours for a deletion program to run, and no manual labor, it offers invisible savings in addition to the direct savings:
Overall savings = ~1,650 USD (scenario) + compute hours for the deletion program + automation engineering effort (or manual deletion effort)
How does it work?
The Amazon S3 Object Expiration feature allows you to define rules that schedule the removal of your objects after a pre-defined time period. The rules are specified in the lifecycle configuration of an Amazon S3 bucket and can be updated either through the AWS Management Console or the S3 APIs.
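For example, here is a minimal sketch of setting such a rule programmatically, using the modern boto3 SDK (which wraps the S3 lifecycle API); the bucket name, prefix, and rule ID are placeholders for illustration:

```python
# Minimal sketch: define an Object Expiration rule in a bucket's
# lifecycle configuration via boto3. Bucket/prefix names are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-app-logs",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-logs",
                "Filter": {"Prefix": "logs/"},   # apply only to objects under logs/
                "Status": "Enabled",
                "Expiration": {"Days": 90},      # delete objects ~90 days after creation
            }
        ]
    },
)
```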
Once the rule is set, Amazon S3 calculates the Object Expiration time by adding the expiration lifetime to the file's creation time and then rounding the result up to the next midnight UTC. For example, if a file was created on 11/12/2012 at 11:00 UTC and the expiration period was specified as 3 days, Amazon S3 would calculate the expiration date-time of the file as 15/12/2012 00:00 UTC. Once objects are past their expiration date, they are queued for deletion. You can use Object Expiration rules on objects stored in both Standard and Reduced Redundancy storage in Amazon S3.
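The date arithmetic from that example can be sketched in a few lines (my own illustration of the rounding rule described above, not AWS code):

```python
# Sketch of the expiration-date arithmetic: creation time + lifetime,
# rounded up to the next midnight UTC.
from datetime import datetime, timedelta, timezone

def expiration_date(created: datetime, days: int) -> datetime:
    """Approximate the date-time at which S3 expires an object."""
    expires = created + timedelta(days=days)
    # Round up to the next midnight UTC.
    return (expires + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )

created = datetime(2012, 12, 11, 11, 0, tzinfo=timezone.utc)
print(expiration_date(created, 3))  # 2012-12-15 00:00:00+00:00
```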
Other Tips
Cost Saving Tip 1: Amazon SQS Long Polling and Batch requests
Cost Saving Tip 2: How right search technology choice saves cost in AWS?
Cost Saving Tip 3: Using Amazon CloudFront Price Class to minimize costs
Cost Saving Tip 4: Right Sizing Amazon ElastiCache Cluster
Cost Saving Tip 5: How Amazon Auto Scaling can save costs?
Cost Saving Tip 6: Amazon Auto Scaling Termination policy and savings
Cost Saving Tip 7: Use Amazon S3 Object Expiration
Cost Saving Tip 8: Use Amazon S3 Reduced Redundancy Storage