In this article we explore the economics of choosing between Amazon CloudSearch and Apache Solr (v3.6) on EC2 for the search tier. The cost comparison is based on three scenarios.
Scenario 1: A small application with constant load pattern
Application nature:
· Load volatility pattern: Constant
· Utilization: Low to medium
· Dependency on search layer: Low
Data requirements:
· Each document is ~1 KB (for easy calculation)
· 500 MB of search index data
· 50K to 100K requests per day
· Low concurrency
Batch and Index Rebuilds:
· 24 batch uploads per day (each batch is 100 documents of 1 KB each)
· Explicit index rebuild once or twice a month
· 50 MB increase in search index data per month
Administration Efforts:
· Initial provisioning (one time)
· Monitoring
· Regular backups
· Index rebuilds
Cost item | Amazon CloudSearch | Apache Solr v3.6
Compute | 74.4 | 48.36
Storage (EBS) | - | 12
Batch upload | 0.10 | 0
Index rebuild | 4 | 0
Data in/out | 0 | 0
Admin effort (person-hours/month @ 75 USD/hr) | 2 | 10
Administration cost | 150 | 750
Total | ~230 | ~811
· Amazon EC2 US-East region; costs in USD; 1 month of compute = 744 hrs; instance type: Small
· EBS: 10 GB volume + 100 million I/O requests per month + 10 GB snapshot for Apache Solr on EC2
· Refer to pricing at http://aws.amazon.com/cloudsearch/
· Search expert administration effort: hourly price averaged to a minimum of 75 USD/hr
· The infrastructure costs are almost the same, but the provisioning, administration, and management costs spike for Apache Solr on EC2. Amazon CloudSearch keeps them to a minimum.
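The Scenario 1 totals can be re-derived from the line items in the table. The following is a minimal Python sketch of that arithmetic, not the author's own tooling: the dictionary keys and the `monthly_total` helper are names chosen here for illustration, and the "-" entry for CloudSearch storage is assumed to mean 0.

```python
# Scenario 1 monthly line items in USD, copied from the table above.
# Assumption: the "-" storage entry for CloudSearch is treated as 0.
ADMIN_RATE_USD_PER_HOUR = 75

scenario_1 = {
    "Amazon CloudSearch": {
        "compute": 74.4,
        "storage_ebs": 0.0,
        "batch_upload": 0.10,
        "index_rebuild": 4.0,
        "data_in_out": 0.0,
        "admin_hours": 2,
    },
    "Apache Solr v3.6": {
        "compute": 48.36,
        "storage_ebs": 12.0,
        "batch_upload": 0.0,
        "index_rebuild": 0.0,
        "data_in_out": 0.0,
        "admin_hours": 10,
    },
}


def monthly_total(items, admin_rate=ADMIN_RATE_USD_PER_HOUR):
    """Return (infrastructure, administration, total) in USD per month."""
    infra = sum(value for key, value in items.items() if key != "admin_hours")
    admin = items["admin_hours"] * admin_rate
    return infra, admin, infra + admin


for name, items in scenario_1.items():
    infra, admin, total = monthly_total(items)
    print(f"{name}: infra {infra:.2f} + admin {admin:.2f} = {total:.2f} USD/month")
```

Splitting infrastructure from administration makes the point of this scenario visible: the infrastructure figures are within about 20 USD of each other, and the gap in the totals comes almost entirely from person-hours.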
Scenario 2: Heavily utilized Search tier
Application nature:
· Load volatility pattern: Peaks and valleys in a day
· Utilization: High
· Dependency on search layer: High
Data requirements:
· Each document is ~1 KB (for easy calculation)
· 50 GB of search data (index)
· 10 million requests per day; each response is 10 KB, so ~100 GB of data out per day, 3 TB per month
· High concurrency
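The data-out figure in the list above is simple arithmetic on the request volume and response size. A small Python sketch, assuming a 30-day month and decimal units (1 GB = 1,000,000 KB), neither of which the article states explicitly; the function name `data_out` is illustrative.

```python
def data_out(requests_per_day, response_size_kb, days_per_month=30):
    """Return (GB per day, TB per month) of response traffic.

    Assumptions: decimal units and a 30-day month.
    """
    kb_per_day = requests_per_day * response_size_kb
    gb_per_day = kb_per_day / 1_000_000
    tb_per_month = gb_per_day * days_per_month / 1_000
    return gb_per_day, tb_per_month


gb_per_day, tb_per_month = data_out(10_000_000, 10)
print(f"{gb_per_day:.0f} GB per day, {tb_per_month:.0f} TB per month")
```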
Batch and Index Rebuilds:
· 2048 batch uploads per month (each batch is 5 MB of data)
· Explicit index rebuild 12 times a month
· Search data growth: 10 GB of index added every month
Administration Efforts:
· Initial provisioning
· Frequent partitioning and read scaling
· Monitoring, regular backups, and maintaining HA
· Index rebuilds
Cost item | Amazon CloudSearch | Apache Solr v3.6
Compute | ~2865 | ~2440
Storage (EBS) | - | 140
Batch upload | 0.30 | -
Index rebuild | 50 | -
Data in/out | - | -
Admin effort (person-hours/month @ 75 USD/hr) | 8 | 24
Administration cost | 600 | 1800
Total | ~3515 | ~4380
· Amazon EC2 US-East region; costs in USD; 1 month of compute = 744 hrs; instance type: Small
· 7 Xlarge search instances in Amazon CloudSearch, plus more instances depending on growth
· EBS: 100 GB volume + 500 million I/O requests per month + X GB snapshot for Apache Solr on EC2
· Apache Solr on EC2 costs:
  o If we scale up the Solr nodes in multiple phases from m1.xlarge to m2.4xlarge as the index grows (10 GB every month in our case), a lot of manual administration effort is needed. This adds to our administration cost.
  o If we decide not to scale up frequently and instead start Solr on m2.4xlarge from the beginning, we are overprovisioned for the first 5 months, which again means cost leakage. Monitoring, backups, etc. still have to be done for Apache Solr on EC2. The table above shows the cost calculated with this approach. Even so, at the end of the 6th month we need to shard Solr on EC2 because it will exceed the m2.4xlarge capacity, or scale up again to more costly EC2 instances to keep up with the growth. Again, labour effort has to be added.
  o Even after all this, there is no guarantee that Apache Solr on EC2 can handle the load. Since the load is spiky within a day, there could be times when Solr is pounded and performs poorly. This underperformance may lead to losing customers.
· Amazon CloudSearch handles the scale-up, scale-out, and partitioning complexities automatically. Labour cost is one of the important costs in large-scale search tier setups, and Amazon CloudSearch helps us keep it to a minimum as we grow. The larger and more elastic our search requirements, the more decisively Amazon CloudSearch beats Apache Solr on EC2.
· Scale-out during heavy load is automatic in CloudSearch, whereas it is a manual, cumbersome effort in Apache Solr on EC2. Note that load-based scale-out costs are not included for either Apache Solr or Amazon CloudSearch.
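To put a number on the labour-cost point, one can compute the share of each Scenario 2 monthly total that goes to administration. This is an illustrative Python sketch using only the "Administration cost" and "Total" rows of the table; the `admin_share` helper and the dictionary layout are assumptions of this example.

```python
# Scenario 2 figures in USD per month, taken from the table above.
scenario_2 = {
    "Amazon CloudSearch": {"admin_cost": 600, "total": 3515},
    "Apache Solr v3.6": {"admin_cost": 1800, "total": 4380},
}


def admin_share(admin_cost, total):
    """Fraction of the monthly total spent on administration."""
    return admin_cost / total


for name, row in scenario_2.items():
    share = admin_share(row["admin_cost"], row["total"])
    print(f"{name}: {share:.1%} of the monthly total is administration")
```

With these inputs, administration is roughly 17% of the CloudSearch total and roughly 41% of the Solr on EC2 total.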
Scenario 3: Seasonal Loads
Application nature:
· Load volatility pattern: Seasonal load (1-week campaign every 2 months), minimal activity at other times
· Utilization: High
· Dependency on search layer: High
Data requirements:
· Each document is ~1 KB (for easy calculation)
· Getting started with 5 GB of search data
· ~750 million requests or more during the campaign week
· 12 hours of heavy utilization and 12 hours of underutilization during campaign days
· 10 million requests during normal days
· High concurrency during the campaign week
Batch and Index Rebuilds:
· 512 batch uploads per month (each batch is 5 MB of data)
· Explicit index rebuild 12 times a month
· Search data growth: 2.5 GB added every month
Administration Efforts:
· Initial provisioning
· Frequent partitioning and read scaling
· Monitoring, regular backups, and maintaining HA
· Index rebuilds
Cost item | Amazon CloudSearch | Apache Solr v3.6
Compute | ~410 + 150 (scale out) | ~357 + 70 (scale out)
Storage (EBS) | - | 50
Batch upload | 0.10 | -
Index rebuild | 5 | -
Data in/out | - | -
Admin effort (person-hours/month @ 75 USD/hr) | 10 | 40
Administration cost | 750 | 3000
Total | ~1320 | ~3477
· Amazon EC2 US-East region; costs in USD; 1 month of compute = 744 hrs; instance type: Small
· Amazon CloudSearch costs:
  o 1 Xlarge search instance in Amazon CloudSearch during normal days
  o Assume 3 additional Xlarge instances are spawned during the 1-week campaign period
  o Scale-out and scale-down are automated; no effort is needed
  o From morning till night, when utilization is heavy, the 3 additional Xlarge instances are launched. From night till morning, when there is not much load, these additional instances are removed.
· Apache Solr on EC2 costs:
  o 2 x m1.large EC2 instances for Solr on normal days
  o 2 additional m1.large instances during the campaign period
  o Manual effort to scale out before the campaign week and scale down after it
  o The additional EC2 instances run all 24 hrs during the campaign week
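The scale-out assumptions above translate into extra instance-hours for one campaign week. The Python sketch below only counts hours: 3 extra CloudSearch instances running 12 hours a day versus 2 extra Solr instances running around the clock. Hourly prices are deliberately left out because the article does not list them, and the helper name `extra_instance_hours` is illustrative.

```python
CAMPAIGN_DAYS = 7  # one campaign week, as described in the scenario


def extra_instance_hours(extra_instances, active_hours_per_day, days=CAMPAIGN_DAYS):
    """Instance-hours consumed by scale-out capacity during the campaign."""
    return extra_instances * active_hours_per_day * days


# CloudSearch: 3 extra Xlarge instances, only during the 12 busy hours.
cloudsearch_hours = extra_instance_hours(extra_instances=3, active_hours_per_day=12)

# Solr on EC2: 2 extra m1.large instances kept running all 24 hours.
solr_hours = extra_instance_hours(extra_instances=2, active_hours_per_day=24)

print(f"CloudSearch scale-out: {cloudsearch_hours} instance-hours")
print(f"Solr on EC2 scale-out: {solr_hours} instance-hours")
```

The two counts are not directly comparable in cost because the instance types differ, but they show how automated scale-down during the quiet 12 hours limits the hours that are billed.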
Related Articles:
Introduction to Apache SolrCloud on AWS
Apache SolrCloud Implementation on Amazon VPC
Configuring Apache SolrCloud on Amazon VPC
Apache SolrCloud on AWS FAQ
Part 1: Comparison Analysis: Amazon CloudSearch vs Apache Solr
2 comments:
Solr 4.0 has been out for almost 6 months, and it has significant feature and economic advantages over Solr 3.6. Consider updating this (and other) related pages.
Hi mmoody,
Thanks for the comment and for taking the time to read my blog. SolrCloud 4.0 has been GA since October 2012, and we have started implementing it for some of our customers as well. Yes, it reduces some of the complexity of adding shards, replicas, etc. Very soon I will be publishing a detailed article sharing our experience with SolrCloud on AWS.