I have summarized all the features compared in the previous articles into a table for easy reference. The table is listed below.
Legend: * means positive, X means negative.
Weight: High/Medium/Low indicates the importance of a feature (from my perspective).
| No. | Feature | Weight | Amazon CloudSearch | Apache Solr on EC2 |
|-----|---------|--------|--------------------|--------------------|
| 1. | Getting Started | High | * | X |
| 2. | Scalability | High | * | X |
| 3. | Partitioning | High | * | X |
| 4. | Index Replication | High | * | X |
| 5. | High Availability | High | * | X |
| 6. | Cost | High | * | X |
| 7. | Faceted Search | High | * | * |
| 8. | Field Weighting/Boosting | High | * | * |
| 9. | Rich Documents Support | High | * | * |
| 10. | Stemming | High | * | * |
| 11. | Stop words | High | * | * |
| 12. | Synonyms | High | * | * |
| 13. | Protocols Support | High | * | * |
| 14. | “Find Similar” Feature | High | X | * |
| 15. | “Did you mean” Feature | High | X | * |
| 16. | Breed | Medium | * | * |
| 17. | Feature Customization | Medium | X | * |
| 18. | Auto Suggest | Medium | X | * |
| 19. | Geo Spatial Search | Medium | X | * |
| 20. | Algorithms | Low | X | * |
| 21. | Multilingual Support | Low | X | * |
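To make the table easier to reason about, here is a minimal Python sketch that tallies the positive marks per weight tier for each option. The rows are transcribed from the table above. The numeric points per weight (High=3, Medium=2, Low=1) and the names `WEIGHT_POINTS`, `tally` and `weighted_score` are my own assumptions for illustration, not part of the original comparison.

```python
from collections import defaultdict

# (feature, weight, CloudSearch mark, Solr-on-EC2 mark), transcribed from the table.
# "*" means positive, "X" means negative.
ROWS = [
    ("Getting Started", "High", "*", "X"),
    ("Scalability", "High", "*", "X"),
    ("Partitioning", "High", "*", "X"),
    ("Index Replication", "High", "*", "X"),
    ("High Availability", "High", "*", "X"),
    ("Cost", "High", "*", "X"),
    ("Faceted Search", "High", "*", "*"),
    ("Field Weighting/Boosting", "High", "*", "*"),
    ("Rich Documents Support", "High", "*", "*"),
    ("Stemming", "High", "*", "*"),
    ("Stop words", "High", "*", "*"),
    ("Synonyms", "High", "*", "*"),
    ("Protocols Support", "High", "*", "*"),
    ("Find Similar", "High", "X", "*"),
    ("Did you mean", "High", "X", "*"),
    ("Breed", "Medium", "*", "*"),
    ("Feature Customization", "Medium", "X", "*"),
    ("Auto Suggest", "Medium", "X", "*"),
    ("Geo Spatial Search", "Medium", "X", "*"),
    ("Algorithms", "Low", "X", "*"),
    ("Multilingual Support", "Low", "X", "*"),
]

# ASSUMPTION: numeric points per weight tier; the article only uses the labels.
WEIGHT_POINTS = {"High": 3, "Medium": 2, "Low": 1}


def tally(rows):
    """Count positive marks per weight tier for each option."""
    counts = {"CloudSearch": defaultdict(int), "Solr on EC2": defaultdict(int)}
    for _feature, weight, cloudsearch, solr in rows:
        if cloudsearch == "*":
            counts["CloudSearch"][weight] += 1
        if solr == "*":
            counts["Solr on EC2"][weight] += 1
    return counts


def weighted_score(per_weight):
    """Sum the positive marks, scaled by the assumed points per tier."""
    return sum(WEIGHT_POINTS[w] * n for w, n in per_weight.items())


counts = tally(ROWS)
for option, per_weight in counts.items():
    print(option, dict(per_weight), "weighted:", weighted_score(per_weight))
```

The weighted total is sensitive to the assumed points, so treat it as a way to explore the table with your own priorities rather than as a verdict.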
Observations:
- Amazon CloudSearch scores well overall on most of the “High” priority features in comparison with Apache Solr, especially on infrastructure-related features such as scaling and partitioning. These infrastructure features are essential for any online application that depends heavily on the search tier. Activities like scaling, partitioning and replication usually involve complex manual effort, planning and execution in the search tier. Amazon CloudSearch eliminates this complexity by automating these essentials for us.
- The manual effort involved in the above-mentioned search infrastructure activities translates directly into the cost of training, managing and maintaining this tier with the help of experts, and these experts are usually costly. Amazon CloudSearch, with its automation, brings down this manual effort (and thereby the cost) significantly compared to expanding Apache Solr setups on EC2. This is an important aspect to consider when selecting a search tier for your online applications. If your online application is constantly growing in terms of index and compute, then Amazon CloudSearch is the way to go compared to Apache Solr.
- Amazon CloudSearch is a mature, robust and stable search service built on the A9 search platform. For most online use cases, such as ecommerce, job search, document search and content search, it is more than sufficient.
- IT teams of startups and mid-sized companies that are usually short of technical staff (especially those that cannot afford dedicated expertise for the search tier) should first evaluate whether Amazon CloudSearch fits their needs. On the whole it will be a better package for them.
- Enterprises and software vendors who are refining their products for AWS should seriously consider the merits of Amazon CloudSearch vs Apache Solr/MongoDB in their technical stack. In addition, if their deployments have unpredictable or elastic load, Amazon CloudSearch will be a top contender for cost savings.
- Features like “Find similar” and “Did you mean” are generally used in the search modules of job and ecommerce applications. They are available in Apache Solr and would be good to have in Amazon CloudSearch. Though they are currently not available, I assume AWS might work on them if lots of customers request them. (+1 vote from me for these features)
- If you are looking to build a specialized search module with customizations, geospatial and multilingual intelligence, currently the best choice is Apache Solr on Amazon EC2. Location-aware and localized applications can easily use the geospatial and multilingual features of Apache Solr on EC2 (missing in Amazon CloudSearch). I have also noticed a pattern on AWS over the last few years where customers use MongoDB for searching documents and geospatial indexes. Though these features are somewhat specific, Amazon CloudSearch should introduce them for wider use case adoption. (+1 vote from me for these features)
- For open source developers who are looking to extend or customize the functionality of the search tier, Amazon CloudSearch is not recommended and Apache Solr is the best fit.
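For readers who want a feel for the Solr-side features mentioned in the observations above (“Find similar”, “Did you mean” and geospatial search), here is a rough Python sketch that only builds the query URLs. The `mlt.*`, `spellcheck.*` and `{!geofilt}` parameters are standard Solr request parameters; the host, the `jobs` collection, and the field names `title`, `description` and `location` are hypothetical. It also assumes the request handler has the spellcheck component configured and that `location` is a spatial field type.

```python
from urllib.parse import urlencode

# ASSUMPTION: hypothetical Solr host and collection; adjust for your deployment.
SOLR_SELECT = "http://localhost:8983/solr/jobs/select"


def find_similar_url(doc_id, fields, count=5):
    """'Find similar': ask the MoreLikeThis component for documents like doc_id."""
    params = {
        "q": f"id:{doc_id}",
        "mlt": "true",
        "mlt.fl": ",".join(fields),  # fields used to judge similarity
        "mlt.count": count,          # similar documents returned per result
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


def did_you_mean_url(text):
    """'Did you mean': request spelling suggestions alongside the results."""
    params = {
        "q": text,
        "spellcheck": "true",
        "spellcheck.collate": "true",  # ask for a corrected version of the whole query
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


def nearby_url(lat, lon, distance_km, field="location"):
    """Geospatial: keep only documents within distance_km of the given point."""
    params = {
        "q": "*:*",
        "fq": f"{{!geofilt sfield={field} pt={lat},{lon} d={distance_km}}}",
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


print(find_similar_url("42", ["title", "description"]))
print(did_you_mean_url("sofware enginer"))
print(nearby_url(45.15, -93.85, 5))
```

The sketch stops at URL construction on purpose: the response shape depends on how the Solr handlers and schema are configured in your setup.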
Related Articles:
Introduction to Apache SolrCloud on AWS
Apache SolrCloud Implementation on Amazon VPC
Configuring Apache SolrCloud on Amazon VPC
Apache SolrCloud on AWS FAQ
Part 1: Comparison Analysis: Amazon CloudSearch vs Apache Solr
3 comments:
Two questions
1) Cost - Apache Solr is free - why do you have an X for it?
2) This post is titled, "Part 5" - I cannot see a "Part 4" in the list - is there one?
Thanks
Getting Started, Partitioning and Index Replication are supported in Apache Solr. Please update your post, as it is misleading, or clarify.
I checked the latest feature set of Amazon CloudSearch, and some of the features that were not available when this blog was posted have since been added.
Here is the list:
Autocomplete suggestions
Customizable relevance ranking and query-time rank expressions
Field weighting
Geospatial search
Highlighting
Support for 34 languages