Tuesday, February 14, 2012

5 Essentials every architect should know about SimpleDB

1:Shard the Data 
Spread the data across multiple SimpleDB domains for better throughput. Many benchmarks published on the Internet suggest that a single SimpleDB domain can handle about 70 puts/sec. By default, every account can create 250 SimpleDB domains, and more can be requested from AWS by filling out a limit-increase request form.

Example: Let's assume a single SimpleDB domain offers 70 puts/sec, and your application layer requires a concurrent throughput of 7,000 req/sec. To reach that overall write/read throughput, shard the data across 100 SimpleDB domains (7,000 / 70 = 100).
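The arithmetic and the routing step can be sketched in Python as below. This is a minimal sketch, not the author's implementation: the domain prefix `orders_shard_` and the use of an MD5 hash of the item name as the shard key are assumptions for illustration. What matters is that the item-to-domain mapping is deterministic, so a read for an item goes to the same domain the item was written to.

```python
import hashlib
import math

PUTS_PER_SEC_PER_DOMAIN = 70      # benchmark figure quoted above
REQUIRED_PUTS_PER_SEC = 7000      # target throughput from the example
DOMAIN_PREFIX = "orders_shard_"   # hypothetical naming scheme


def domains_needed(required_rate, per_domain_rate):
    """Smallest number of domains whose combined rate covers required_rate."""
    return math.ceil(required_rate / per_domain_rate)


def domain_for_item(item_name, num_domains, prefix=DOMAIN_PREFIX):
    """Map an item name to a shard domain with a stable hash.

    MD5 is used only for its even, process-independent distribution
    (Python's built-in hash() of a str is randomized per process).
    """
    digest = hashlib.md5(item_name.encode("utf-8")).hexdigest()
    return f"{prefix}{int(digest, 16) % num_domains:03d}"


NUM_DOMAINS = domains_needed(REQUIRED_PUTS_PER_SEC, PUTS_PER_SEC_PER_DOMAIN)
```

With the figures above, `NUM_DOMAINS` is 100. Note that a query that is not keyed by item name has to be sent to every shard domain and the results merged in the application.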

2:Retries and Exponential Backoff

Amazon SimpleDB is accessed through web service calls, and you may occasionally encounter HTTP 500 and 503 errors. The usual technique for dealing with such error responses in AWS is to implement retries in the application layer. An application that retries this way can maintain an excellent level of performance and availability, because it automatically handles overload and server errors. This technique also increases the overall reliability of applications that consume the Amazon SimpleDB service.

In addition to simple retries, the best practice is to use an exponential backoff algorithm for better flow control; this logic has to be built into your application code. The idea behind exponential backoff is to wait progressively longer between retries after consecutive error responses: for example, up to 500 milliseconds before the first retry, up to 1500 milliseconds before the second, up to 6000 milliseconds before the third, and so on. The exact timings can vary depending on your use case.
Refer to this article for more information: http://aws.amazon.com/articles/Amazon-SimpleDB/1394
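A minimal Python sketch of retries with exponential backoff follows. `SimpleDBError` and the `request` callable are hypothetical stand-ins for whatever your SDK raises and calls. The sketch retries only HTTP 500 and 503 errors and waits a random time up to a cap that doubles on each retry (0.5 s, 1 s, 2 s, ...), never exceeding `max_delay`. The doubling factor is an assumption: the schedule quoted above grows faster, so tune `base_delay`, the factor, and `max_delay` to your use case. Randomizing the wait also keeps many clients from retrying in lockstep.

```python
import random
import time


class SimpleDBError(Exception):
    """Hypothetical stand-in for the error your SDK raises on a failed call."""

    def __init__(self, status):
        super().__init__(f"SimpleDB request failed with HTTP {status}")
        self.status = status


# Server-side errors worth retrying; 4xx client errors are not retried.
RETRYABLE_STATUSES = {500, 503}


def call_with_backoff(request, max_retries=3, base_delay=0.5, max_delay=6.0,
                      sleep=time.sleep):
    """Call request(), retrying 500/503 errors with capped exponential backoff.

    Before retry n (n = 1, 2, ...) this sleeps a random time in [0, cap],
    where cap = min(max_delay, base_delay * 2 ** (n - 1)).
    """
    for attempt in range(max_retries + 1):
        try:
            return request()
        except SimpleDBError as err:
            if err.status not in RETRYABLE_STATUSES or attempt == max_retries:
                raise  # not retryable, or out of retries
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

In practice you would wrap each SimpleDB call, e.g. `call_with_backoff(lambda: put_item(domain, item))`, where `put_item` is your own (hypothetical) function. The `sleep` parameter exists so the delays can be observed in tests without actually waiting.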

3:Run from Amazon EC2
Amazon SimpleDB gives better performance in terms of latency when queries are executed from Amazon EC2, because the web service calls travel over Amazon's network instead of the public Internet, which makes each round trip much shorter. By default, SimpleDB domains are created in the US East AWS region. Applications running in other AWS regions, such as Asia Pacific (Tokyo) or South America (Brazil), should make sure they use Amazon SimpleDB and Amazon EC2 in the same region to get better performance. Also, network bandwidth between EC2 and SimpleDB within the same region is free.
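One way to keep SimpleDB and EC2 in the same region is to derive the SimpleDB endpoint from the region the application runs in, rather than hard-coding the default. The Python sketch below assumes the endpoint naming pattern `sdb.amazonaws.com` for US East and `sdb.<region>.amazonaws.com` elsewhere; both that pattern and the region list are assumptions to confirm against AWS's endpoint documentation. The region itself would come from your deployment configuration.

```python
# Assumed list of regions offering SimpleDB; confirm against AWS's
# current regional endpoint documentation.
SIMPLEDB_REGIONS = {
    "us-east-1", "us-west-1", "us-west-2", "eu-west-1",
    "ap-southeast-1", "ap-northeast-1", "sa-east-1",
}


def simpledb_endpoint(ec2_region):
    """Return the SimpleDB endpoint in the same region as the EC2 instance."""
    if ec2_region not in SIMPLEDB_REGIONS:
        raise ValueError(f"no SimpleDB endpoint known for region {ec2_region!r}")
    if ec2_region == "us-east-1":
        # US East is the default region and uses the unqualified host name.
        return "sdb.amazonaws.com"
    return f"sdb.{ec2_region}.amazonaws.com"
```

For an instance in Tokyo, `simpledb_endpoint("ap-northeast-1")` returns `"sdb.ap-northeast-1.amazonaws.com"`. Failing loudly on an unknown region is a deliberate choice: silently falling back to US East would reintroduce the cross-region latency this section warns about.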

4:Query optimization
  • Use the BatchPutAttributes API instead of PutAttributes for better write performance. A single BatchPutAttributes call accepts up to 25 items against one domain, with up to 256 attributes per item and a maximum request size of 1 MB. It works like a batch commit in an RDBMS: the call succeeds or fails as a whole, so in case of failure none of the items are written. We have observed about 20x higher write throughput with BatchPutAttributes than with individual PutAttributes calls.
  • Avoid non-indexed queries such as "Select * from Domain..." in Amazon SimpleDB.
  • When storing dates, store them all in a sortable ISO 8601 format (for example, formatted with Joda-Time) and use a single time zone, such as UTC.
  • Zero-pad numbers so that they sort correctly, padding to the width of the largest number in your data set; SimpleDB compares values as strings.
  • Design queries so that they finish within 5 seconds; beyond that limit, Amazon SimpleDB returns a time-out error or a partial (possibly empty) result set with a NextToken.
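To use BatchPutAttributes, writes have to be grouped into calls that respect its limits. The Python sketch below splits items into chunks of at most 25 and rejects items with more than 256 attribute pairs. It assumes each item is a dict of single-valued attributes (so one key equals one name/value pair) and does not check the 1 MB request size; sending each chunk is left to your SDK's BatchPutAttributes call.

```python
MAX_ITEMS_PER_BATCH = 25   # BatchPutAttributes: items per call
MAX_ATTRS_PER_ITEM = 256   # BatchPutAttributes: attribute pairs per item


def batches(items, size=MAX_ITEMS_PER_BATCH):
    """Yield lists of (item_name, attrs) tuples with at most `size` entries.

    `items` is a dict of {item_name: {attribute_name: value}}; each
    attribute is assumed to be single-valued.
    """
    batch = []
    for name, attrs in items.items():
        if len(attrs) > MAX_ATTRS_PER_ITEM:
            raise ValueError(
                f"item {name!r} has more than {MAX_ATTRS_PER_ITEM} attributes")
        batch.append((name, attrs))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, partially filled batch
        yield batch
```

For 60 items this yields batches of 25, 25 and 10, i.e. three calls instead of sixty.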
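The date and zero-padding tips both follow from SimpleDB comparing values as strings: "9" sorts after "10" unless it is stored as "09". A minimal Python sketch of both encodings is below. It uses Python's `datetime` in place of Joda-Time (the Java library the tip mentions), stores dates at one-second resolution in UTC, and handles only non-negative integers; negative numbers would additionally need an offset, which is not shown.

```python
from datetime import datetime, timezone


def pad_int(value, width):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    text = str(value)
    if len(text) > width:
        raise ValueError(f"{value} does not fit in {width} digits")
    return text.zfill(width)


def encode_datetime(dt: datetime) -> str:
    """Encode a timezone-aware datetime as a fixed-width ISO 8601 UTC string."""
    if dt.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Because every encoded value has the same width and field order, a plain string comparison in a SimpleDB query (for example `where created > '2012-02-14T00:00:00Z'`) then matches chronological order.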

5:Understand the SimpleDB limits

Amazon SimpleDB also enforces certain limits on domain data size, domain names, query execution time, result set size, and more. Please understand them before designing applications on SimpleDB.
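Some of these limits can be checked in application code before a request is sent. The Python sketch below validates domain names (3 to 255 characters from a-z, A-Z, 0-9, underscore, hyphen and period) and the 1024-byte maximum size of an attribute name or value, as documented by AWS for SimpleDB; treat the exact figures as assumptions to re-check against the current Developer Guide. Other limits, such as the 10 GB maximum per domain and the 5-second query time, have to be handled by design rather than validated per request.

```python
import re

# SimpleDB limits as documented by AWS; re-check before relying on them.
DOMAIN_NAME_RE = re.compile(r"[A-Za-z0-9_.-]{3,255}")
MAX_ATTR_BYTES = 1024  # max UTF-8 size of an attribute name or value


def check_domain_name(name):
    """Raise ValueError if `name` is not a valid SimpleDB domain name."""
    if not DOMAIN_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid SimpleDB domain name: {name!r}")


def check_attribute(name, value):
    """Raise ValueError if an attribute name or value exceeds 1024 bytes."""
    for label, text in (("name", name), ("value", value)):
        if len(text.encode("utf-8")) > MAX_ATTR_BYTES:
            raise ValueError(f"attribute {label} exceeds {MAX_ATTR_BYTES} bytes")
```

The size check counts UTF-8 bytes rather than characters, so 513 copies of a two-byte character such as "é" already exceed the limit.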


All posts, comments, and views expressed in this blog are my own and do not represent the positions or views of my past, present or future employers. The intention of this blog is to share my experience and views. Content is subject to change without notice. While I do my best to quote the original author or copyright owners wherever I reference them, if you find any of the content or images violating copyright, please let me know and I will act on it immediately. Lastly, I encourage you to share the content of this blog with other online communities for non-commercial and educational purposes.