Tuesday, January 15, 2013

Part 5: Comparison Analysis: Amazon CloudSearch vs Apache Solr


I have summarized all the features compared in the previous articles in the table below for easy reference.
* means positive, X means negative.
Weight: High/Medium/Low indicates the importance of a feature (from my perspective).


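| #  | Feature                  | Weight | Amazon CloudSearch | Apache Solr on EC2 |
|----|--------------------------|--------|--------------------|--------------------|
| 1  | Getting Started          | High   | *                  | X                  |
| 2  | Scalability              | High   | *                  | X                  |
| 3  | Partitioning             | High   | *                  | X                  |
| 4  | Index Replication        | High   | *                  | X                  |
| 5  | High Availability        | High   | *                  | X                  |
| 6  | Cost                     | High   | *                  | X                  |
| 7  | Faceted Search           | High   | *                  | *                  |
| 8  | Field Weighting/Boosting | High   | *                  | *                  |
| 9  | Rich Documents Support   | High   | *                  | *                  |
| 10 | Stemming                 | High   | *                  | *                  |
| 11 | Stop words               | High   | *                  | *                  |
| 12 | Synonyms                 | High   | *                  | *                  |
| 13 | Protocols Support        | High   | *                  | *                  |
| 14 | “Find Similar” Feature   | High   | X                  | *                  |
| 15 | “Did you mean” Feature   | High   | X                  | *                  |
| 16 | Breed                    | Medium | *                  | *                  |
| 17 | Feature Customization    | Medium | X                  | *                  |
| 18 | Auto Suggest             | Medium | X                  | *                  |
| 19 | Geo Spatial Search       | Medium | X                  | *                  |
| 20 | Algorithms               | Low    | X                  | *                  |
| 21 | Multilingual Support     | Low    | X                  | *                  |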
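
For readers who want to tally the table themselves, here is a minimal Python sketch. The rows are transcribed from the table above. The numeric points per weight (High = 3, Medium = 2, Low = 1) are purely my assumption for illustration and are not part of the original comparison; change them to match your own priorities.

```python
from collections import Counter

# Rows transcribed from the comparison table:
# (feature, weight, positive for CloudSearch?, positive for Solr on EC2?)
ROWS = [
    ("Getting Started", "High", True, False),
    ("Scalability", "High", True, False),
    ("Partitioning", "High", True, False),
    ("Index Replication", "High", True, False),
    ("High Availability", "High", True, False),
    ("Cost", "High", True, False),
    ("Faceted Search", "High", True, True),
    ("Field Weighting/Boosting", "High", True, True),
    ("Rich Documents Support", "High", True, True),
    ("Stemming", "High", True, True),
    ("Stop words", "High", True, True),
    ("Synonyms", "High", True, True),
    ("Protocols Support", "High", True, True),
    ("Find Similar", "High", False, True),
    ("Did you mean", "High", False, True),
    ("Breed", "Medium", True, True),
    ("Feature Customization", "Medium", False, True),
    ("Auto Suggest", "Medium", False, True),
    ("Geo Spatial Search", "Medium", False, True),
    ("Algorithms", "Low", False, True),
    ("Multilingual Support", "Low", False, True),
]

# Hypothetical points per weight level (an assumption, not from the article).
WEIGHT_POINTS = {"High": 3, "Medium": 2, "Low": 1}


def tally(rows, column):
    """Count positive marks per weight level for one product column.

    column is the tuple index: 2 for CloudSearch, 3 for Solr on EC2.
    """
    counts = Counter()
    for row in rows:
        if row[column]:
            counts[row[1]] += 1
    return counts


def weighted_score(counts):
    """Sum the positive marks, scaled by the assumed points per weight."""
    return sum(WEIGHT_POINTS[weight] * n for weight, n in counts.items())


cloudsearch = tally(ROWS, 2)
solr = tally(ROWS, 3)

for name, counts in (("Amazon CloudSearch", cloudsearch), ("Apache Solr on EC2", solr)):
    per_weight = ", ".join(f"{w}: {counts[w]}" for w in ("High", "Medium", "Low"))
    print(f"{name}: {per_weight}, weighted score: {weighted_score(counts)}")
```

With these assumed points, CloudSearch leads on the "High" rows while Solr has more positive marks overall, so the outcome depends heavily on how much the lower-weight features matter to you.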


Observations:
  • Amazon CloudSearch scores well on most of the “High” priority features in comparison with Apache Solr, especially on infrastructure-related features such as scaling and partitioning. These infrastructure features are essential for any online application that depends heavily on the search tier. Activities like scaling, partitioning and replication usually involve complex manual effort, planning and execution in the search tier. Amazon CloudSearch removes this complexity by automating these essentials for us.
  • The manual effort involved in the search infrastructure activities mentioned above translates directly into the cost of training for, managing and maintaining this tier with the help of experts, and these experts are usually expensive. With its automation, Amazon CloudSearch reduces this manual effort (and therefore cost) significantly compared with expanding Apache Solr setups on EC2. This is an important aspect to consider when selecting a search tier for your online applications. If your online application is constantly growing in terms of index size and compute, then Amazon CloudSearch is the way to go compared to Apache Solr.
  • Amazon CloudSearch is a mature, robust and stable search service built on the A9 search platform. For most online use cases, such as ecommerce, job search, document search and content search, it is more than sufficient.
  • IT teams at startups and mid-sized companies, which are usually short of technical staff (especially those that cannot afford dedicated expertise for the search tier), should first evaluate whether Amazon CloudSearch fits their needs. On the whole it will be a better package for them.
  • Enterprises and software vendors who are refining their products for AWS should weigh the merits of Amazon CloudSearch against Apache Solr/MongoDB in their technical stack. In addition, if their deployments have unpredictable or elastic load, Amazon CloudSearch will be a top contender for cost savings.
  • Features like “Find similar” and “Did you mean” are generally used in the search modules of job and ecommerce applications. They are available in Apache Solr and would be good to have in Amazon CloudSearch. Though they are currently not available, I assume AWS might work on them if lots of customers request them. (+1 vote from me for these features)
  • If you are looking to build a specialized search module with customizations, geospatial search and multilingual intelligence, currently the best choice is Apache Solr on Amazon EC2. Location-aware and localized applications can easily use the geospatial and multilingual features of Apache Solr on EC2 (missing in Amazon CloudSearch). Over the last few years I have also noticed a pattern on AWS where customers use MongoDB for searching documents and geospatial indexes. Though these features are somewhat specific, Amazon CloudSearch should introduce them for wider use case adoption. (+1 vote from me for these features)
  • For open source developers who are looking to extend or customize the functionality of the search tier, Amazon CloudSearch is not recommended; Apache Solr is the best fit.


3 comments:

  1. Two questions
    1) Cost - Apache Solr is free - why do you have an X for it?

    2) This post is titled, "Part 5" - I cannot see a "Part 4" in the list - is there one?

    Thanks

  2. Getting Started, Partitioning and Index Replication are supported in Apache Solr. Please update your post, as it is misleading, or clarify.

  3. I checked the latest feature set of Amazon CloudSearch, and some of the features that were not available when this blog was posted have since been added.

    Here is the list

    Autocomplete suggestions
    Customizable relevance ranking and query-time rank expressions
    Field weighting
    Geospatial search
    Highlighting
    Support for 34 languages
