Saturday, April 6, 2013

Cost Saving Tips : Part 2 : How right search technology choice saves cost in AWS ?


The search tier is becoming an important component of online tech stacks. Widely used search software and services on AWS infrastructure include Apache Solr, Elasticsearch and Amazon CloudSearch. Choosing the right search technology is essential for efficient operations and cost savings in the cloud. In this article we will explore the economics of choosing Amazon CloudSearch versus Apache Solr (v3.6) on EC2 for the search tier, and how you can cut leakages and save costs by making the right choice for the right requirements.
I have taken 3 sample scenarios as the basis for the cost comparison and the savings conclusions.
Scenario 1: A small application with constant load pattern
Application nature:
  • Load Volatility pattern: Constant
  • Utilization: Low to Medium
  • Dependency on Search Layer: Low
Data requirements:
  • Each document size is ~ 1 KB (For easy calculation purposes)
  • 500 MB of Search Index data
  • 50K to 100K requests per day
  • Low concurrency
Batch and Index Rebuilds:
  • 24 Batch uploads per day (each batch 100 documents of 1KB each).
  • Explicit Index Rebuild once or twice a month
  • 50 MB increase in Search index data per month
Administration Efforts:
  • Initial Provisioning ( One time)
  • Monitoring
  • Regular Backups
  • Index Rebuilds


Item (costs in USD/month)                     | Amazon CloudSearch | Apache Solr v3.6 on EC2
Compute                                       | 74.4               | 48.36
Storage (EBS)                                 | -                  | 12
Batch upload                                  | 0.10               | 0
Index rebuild                                 | 4                  | 0
Data in/out                                   | 0                  | 0
Admin effort (person-hours/month @ 75 USD/hr) | 2                  | 10
Administration cost                           | 150                | 750
Total                                         | ~230               | ~811

  • Amazon EC2 US-East region, costs in USD, 1 month of compute = 744 hrs, instance type: Small
  • EBS: 10 GB volume + 100 million IO per month + 10 GB snapshot for Apache Solr on EC2. For CloudSearch pricing, refer to http://aws.amazon.com/cloudsearch/
  • Search expert administration effort: hourly price averaged to a minimum of 75 USD/hr
  • We can observe that the infra costs are almost the same, but the provisioning, administration and management costs spike for Apache Solr on EC2. These can be minimized by using Amazon CloudSearch.
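The Scenario 1 totals can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming the line items are taken verbatim from the table above; the function and variable names are illustrative and not part of any AWS tooling.

```python
# Monthly line items in USD, copied from the Scenario 1 table.
ADMIN_RATE_USD_PER_HOUR = 75  # averaged search-expert rate used in this article


def monthly_total(infra_items, admin_hours, rate=ADMIN_RATE_USD_PER_HOUR):
    """Return (infra cost, admin cost, total cost) for one month."""
    infra = sum(infra_items.values())
    admin = admin_hours * rate
    return infra, admin, infra + admin


cloudsearch_items = {"compute": 74.40, "batch_upload": 0.10, "index_rebuild": 4.00}
solr_items = {"compute": 48.36, "ebs_storage": 12.00}

cs_infra, cs_admin, cs_total = monthly_total(cloudsearch_items, admin_hours=2)
solr_infra, solr_admin, solr_total = monthly_total(solr_items, admin_hours=10)

print(f"CloudSearch: infra {cs_infra:.2f} + admin {cs_admin} = {cs_total:.2f}")
print(f"Solr on EC2: infra {solr_infra:.2f} + admin {solr_admin} = {solr_total:.2f}")
```

The infra portions come to 78.50 and 60.36 USD, so almost the entire gap between the two totals is the administration line.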

To learn more about how Apache Solr compares with Amazon CloudSearch, refer to the article: http://harish11g.blogspot.in/2013/01/amazon-cloudsearch-vs-apache-solr_16.html



Scenario 2: Heavily utilized Search tier

Application nature:
  • Load Volatility pattern: Peaks and valleys within a day
  • Utilization: High
  • Dependency on Search Layer: High
Data requirements:
  • Each document size is ~ 1 KB (For easy calculation purposes)
  • 50 GB of Search data (Index)
  • 10 million requests per day, Each Response size is 10 KB, ~ 100 GB data out per day, 3 TB per month
  • High concurrency
Batch and Index Rebuilds:
  • 2048 Batch uploads per Month (each batch 5 MB of data).
  • Explicit Index Rebuild 12 times a month
  • Search Data growth: 10 GB index added every month
Administration Efforts:
  • Initial Provisioning
  • Partitioning and read scaling frequently
  • Monitoring, Regular Backups and maintaining the HA
  • Index Rebuilds


Item (costs in USD/month)                     | Amazon CloudSearch | Apache Solr v3.6 on EC2
Compute                                       | ~2865              | ~2440
Storage (EBS)                                 | -                  | 140
Batch upload                                  | 0.30               | -
Index rebuild                                 | 50                 | -
Data in/out                                   | -                  | -
Admin effort (person-hours/month @ 75 USD/hr) | 8                  | 24
Administration cost                           | 600                | 1800
Total                                         | ~3515              | ~4380

  • Amazon EC2 US-East region, costs in USD, 1 month of compute = 744 hrs
    • 7 Xlarge search instances in Amazon CloudSearch + more instances depending upon growth
    • EBS: 100 GB volume + 500 million IO per month + X GB snapshot for Apache Solr on EC2
  • Apache Solr on EC2 costs:
    • If we scale up the capacity of the Solr nodes in multiple phases from m1.xlarge to m2.4xlarge as the index grows every month (10 GB in our case), a lot of manual admin labour is needed. This adds to our administration cost.
    • If we decide not to scale up frequently and instead start Solr on m2.4xlarge from the outset, we are overprovisioned for the first 5 months, which again means cost leakage. Monitoring, backups etc. still have to be done on Apache Solr on EC2. The table above shows the cost calculated with this approach. At the end of the 6th month we still need to either shard Solr on EC2, because the index will exceed m2.4xlarge capacity, or scale up again to more costly EC2 instances to keep up with the growth. That labour effort has to be added as well.
    • Even after all this, there is no guarantee that Apache Solr on EC2 can handle the load. Because the load pattern is spiky within a day, there could be times when Solr is pounded and performs poorly. This underperformance may lead to losing customers.
  • Amazon CloudSearch handles the scaling up, scaling out and partitioning complexities automatically. Labour is one of the important costs in large-scale search tier setups, and Amazon CloudSearch helps keep it at a minimum as we grow. The larger and more elastic our search requirements, the more easily Amazon CloudSearch beats Apache Solr on EC2 on cost.
  • Scale-out during heavy load is automatic in CloudSearch, whereas it is a cumbersome manual effort in Apache Solr on EC2. Note that load-based scale-out costs are not included in the figures for either Apache Solr or Amazon CloudSearch.
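Because the Scenario 2 infra cost is actually higher for CloudSearch, the comparison hinges on admin hours. The sketch below, assuming only the figures from the table above, estimates how many Solr admin hours per month would make the two totals equal; the helper name `break_even_admin_hours` is illustrative.

```python
ADMIN_RATE_USD_PER_HOUR = 75

# Infra line items in USD/month from the Scenario 2 table.
cloudsearch_infra = 2865 + 0.30 + 50   # compute + batch upload + index rebuild
solr_infra = 2440 + 140                # compute + EBS storage
cloudsearch_admin_hours = 8


def break_even_admin_hours(managed_infra, managed_hours, self_infra, rate):
    """Self-managed admin hours/month at which both monthly totals are equal."""
    managed_total = managed_infra + managed_hours * rate
    return (managed_total - self_infra) / rate


hours = break_even_admin_hours(
    cloudsearch_infra, cloudsearch_admin_hours, solr_infra, ADMIN_RATE_USD_PER_HOUR
)
print(f"Solr on EC2 matches CloudSearch at about {hours:.1f} admin hours/month")
```

With these inputs the break-even is about 12.5 hours per month, roughly half of the 24 hours estimated for Solr in the table. If your own team needs fewer Solr admin hours than that, the conclusion for this scenario would flip.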

To learn more about Apache SolrCloud deployment best practices on Amazon VPC, refer to the article: http://harish11g.blogspot.in/2013/03/Apache-Solr-cloud-on-Amazon-EC2-AWS-VPC-implementation-deployment.html



Scenario 3: Seasonal Loads 

Application nature:
  • Load Volatility pattern: Seasonal load (1 week campaign every 2 months), other times minimal activity
  • Utilization: High
  • Dependency on Search Layer: High
Data requirements:
  • Each document size is ~ 1 KB (For easy calculation purposes)
  • Getting started with 5 GB of Search data
  • ~750 million requests or more during the campaign week
  • 12 hours heavy utilization and 12 hours under utilization during campaign days.
  • 10 million requests during normal days
  • High concurrency during campaign week
Batch and Index Rebuilds:
  • 512 Batch uploads per Month (each batch 5 MB of data).
  • Explicit Index Rebuild 12 times a month
  • Search Data growth: 2.5 GB added every month
Administration Efforts:
  • Initial Provisioning
  • Partitioning and read scaling frequently
  • Monitoring, Regular Backups and maintaining the HA
  • Index Rebuilds

Item (costs in USD/month)                     | Amazon CloudSearch     | Apache Solr v3.6 on EC2
Compute                                       | ~410 + 150 (scale out) | ~357 + 70 (scale out)
Storage (EBS)                                 | -                      | 50
Batch upload                                  | 0.10                   | -
Index rebuild                                 | 5                      | -
Data in/out                                   | -                      | -
Admin effort (person-hours/month @ 75 USD/hr) | 10                     | 40
Administration cost                           | 750                    | 3000
Total                                         | ~1320                  | ~3477


Amazon EC2 US-East region, costs in USD, 1 month of compute = 744 hrs. Instance types are listed below.

Amazon CloudSearch Costs
  • 1 Xlarge search instance in Amazon CloudSearch during normal days
  • Assume 3 additional Xlarge instances are spawned during the 1-week campaign period
  • Automated scale-out and scale-down. No manual effort needed.
  • From morning till night, when utilization is heavy, the 3 additional Xlarge instances are launched. From night till morning, when there is little load, these additional instances are removed.
Apache Solr on EC2 Costs:
  • 2 x m1.large EC2 instances for Solr on normal days
  • 2 additional m1.large instances during the campaign period
  • Manual effort to scale out before the campaign week and scale down after it
  • The additional EC2 instances run all 24 hrs during the campaign week
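The scale-out assumptions above translate into extra instance-hours for the campaign week. This is a minimal sketch of that arithmetic only; the function name is illustrative, and no hourly prices are assumed because the article does not list them.

```python
CAMPAIGN_DAYS = 7  # 1-week campaign


def extra_instance_hours(extra_instances, hours_per_day, days=CAMPAIGN_DAYS):
    """Instance-hours consumed by the additional scale-out capacity."""
    return extra_instances * hours_per_day * days


# CloudSearch: 3 extra Xlarge instances, only during the 12 heavy hours each day.
cloudsearch_extra_hours = extra_instance_hours(extra_instances=3, hours_per_day=12)

# Solr on EC2: 2 extra m1.large instances, left running all 24 hours.
solr_extra_hours = extra_instance_hours(extra_instances=2, hours_per_day=24)

print(f"CloudSearch scale-out: {cloudsearch_extra_hours} instance-hours")
print(f"Solr on EC2 scale-out: {solr_extra_hours} instance-hours")
```

This gives 252 instance-hours for CloudSearch and 336 for Solr on EC2. The instance types differ, so the hours are not directly comparable in cost, but the sketch shows how scaling down overnight limits the billed hours even with more instances at peak.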
From the above analysis we can infer that Amazon CloudSearch is more cost effective than Apache Solr in all labor-dependent scenarios. It is recommended to use Amazon CloudSearch as part of your stack for cost-efficient operations, unless there is a strong case for Apache Solr.

Note: I architected a solution using Apache SolrCloud (4.0) after writing this article. Apache SolrCloud seems to drastically reduce the complexity of distributing search queries, sharding and replication. But it does not offer automatic shard increase/decrease or replica addition under load. It is less labor intensive than Solr 3.6, but still not comparable to Amazon CloudSearch in the infra aspects.


To learn more about how to migrate from Apache Solr to Amazon CloudSearch, refer to the article: http://harish11g.blogspot.in/2013/03/migration-from-Apache-Solr-to-Amazon-CloudSearch.html



Other Tips

Cost Saving Tip 1: Amazon SQS Long Polling and Batch requests
Cost Saving Tip 2: How right search technology choice saves cost in AWS ?
Cost Saving Tip 3: Using Amazon CloudFront Price Class to minimize costs
Cost Saving Tip 4 : Right Sizing Amazon ElastiCache Cluster
Cost Saving Tip 5: How Amazon Auto Scaling can save costs ?
Cost Saving Tip 6: Amazon Auto Scaling Termination policy and savings
Cost Saving Tip 7: Use Amazon S3 Object Expiration
Cost Saving Tip 8: Use Amazon S3 Reduced Redundancy Storage

