The search module plays an integral role in today’s websites and online applications. It is an entry point for many online business transactions, so the user experience it delivers is critical. A search feature has the power to make or break an online business. In this multi-part series, we compare Apache Solr, one of the most popular open source search engines, with the recently introduced Amazon CloudSearch web service in detail.
Breed:
Apache Solr is open source software. It is written entirely in Java and uses Lucene under the hood. Amazon CloudSearch is a proprietary service based on Amazon’s time-tested A9 technology. Highly specialized search solution companies such as LucidWorks and Search Technologies may prefer building plugins and modules on open source code. Though open source gives us flexibility and control, not many companies change the source and customize it often. In general, we have seen more businesses prefer to consume search as an appliance or a service. Customers are happy if you provide them a robust service with a well-defined feature set and relieve them of operational intricacies, as Amazon CloudSearch does.
Advantage: Neutral
Feature Weight: Medium
Configuration & Getting Started Effort:
A deep understanding and considerable time and effort are needed to properly configure Apache Solr and get it up and running on Amazon EC2 infrastructure. This includes common tasks such as downloading Apache Solr, learning enough Java, configuring environment variables, deploying Solr on a server, configuring it properly, understanding the admin commands, applying patches, tuning performance and upgrading to newer versions. And this is just the start: when your application grows rapidly, you also need to factor in high availability, scalability and partitioning for the search tier. Application and IT teams need to be aware of the replication and sharding techniques of Apache Solr and configure them as needed. On the other hand, Amazon CloudSearch is a fully managed search service that offloads the administrative burden of operating your search tier. You can get started with Amazon CloudSearch in a few clicks using the AWS Management Console. Customers need not worry about hardware provisioning, data partitioning, running out of disk space, planning compute capacity or applying software patches.
Advantage: Amazon CloudSearch
Feature Weight: High
Multilingual Support:
Apache Solr has multilingual support. Custom analyzers and tokenizers have to be written and plugged in for this functionality. One of the recommended approaches to multilingual search in Apache Solr is a multi-core architecture in which each core addresses one language.
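As a rough illustration of the one-core-per-language idea, the application tier can route each query to the core that indexes the user's language. The sketch below is only an assumption about how such routing might look: the core names, the host and port, and the fallback to English are all hypothetical.

```python
# Hypothetical mapping of language codes to Solr core names (assumption).
CORES_BY_LANGUAGE = {
    "en": "products_en",
    "fr": "products_fr",
    "de": "products_de",
}


def select_url_for(language, base_url="http://localhost:8983/solr", default="en"):
    """Return the /select URL of the core that indexes the given language.

    Unknown languages fall back to the default core (a design assumption).
    """
    core = CORES_BY_LANGUAGE.get(language, CORES_BY_LANGUAGE[default])
    return f"{base_url}/{core}/select"


french_url = select_url_for("fr")    # routed to the French core
fallback_url = select_url_for("ja")  # no Japanese core, so the English one is used
```

Each core can then carry its own analyzer chain for its language, which keeps stemming and stop-word rules from interfering with each other.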
Currently, Amazon CloudSearch supports only the English language for tokenizing words in the index. Though not critical, multilingual support is a good-to-have feature for applications that offer localized services to a worldwide audience.
Advantage: Apache Solr
Feature Weight: Low
Faceted Search:
Faceting is one of the most important features of ecommerce website search modules. It lets you categorize your results into sub-groups, each of which can be used as the basis for another search. In recent times, faceting has gained popularity because it lets users narrow down search results in an easy-to-use and efficient way.
Faceting is best explained with the help of a picture (see the figure below, from Amazon.com). As you can observe on the left side of the figure, a search for “java programming” results in a lot of hits. You can clearly see that the search produced 3 facets (or sub-groups) with which you can narrow down your search. For example, if you click on “PDF” in the “Format” facet (see “Facet 2” in the figure), the search query now essentially means “java programming AND only pdf format”, which narrows the search space and leads you to better, more convenient results. You can also observe that each member of a facet is accompanied by a number called the Facet Count. In the “Format” facet, you can see “PDF (14)”, which means that there are 14 “java programming” results in PDF format. The important aspect of this feature is that as you go deeper using facets, the resultant search space is vastly reduced and hence the search will be considerably faster.
Both Apache Solr and Amazon CloudSearch allow
the user to perform faceting with minimal effort.
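To make the idea of facet counts and narrowing concrete, here is a minimal, engine-neutral sketch in Python. It does not use either product's API; the sample documents, field names and counts are made up for illustration, and a real engine computes the same kind of counts inside the index rather than in application code.

```python
from collections import Counter

# Made-up results for a "java programming" query (illustrative data only).
RESULTS = [
    {"title": "Java Programming Basics", "format": "PDF", "department": "Books"},
    {"title": "Advanced Java Programming", "format": "PDF", "department": "Books"},
    {"title": "Java Programming Video Course", "format": "DVD", "department": "Software"},
    {"title": "Java Programming Cookbook", "format": "Paperback", "department": "Books"},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of the given field."""
    return Counter(doc[field] for doc in docs if field in doc)


def narrow(docs, field, value):
    """Keep only the documents whose field equals the chosen facet value."""
    return [doc for doc in docs if doc.get(field) == value]


format_facet = facet_counts(RESULTS, "format")  # the numbers shown next to each facet member
pdf_only = narrow(RESULTS, "format", "PDF")     # "java programming AND only pdf format"
```

Clicking a facet member corresponds to calling narrow() and then recomputing the counts on the smaller result set.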
Advantage: Neutral
Feature Weight: High
Field Weighting / Boosting:
Field weighting is the process of assigning different levels of prominence to the same word depending on where it appears in a document. For example, when the phrase “Harry Potter” is present in the title of a document, the document is ranked higher than when the same phrase is present only in its References section.
Both Apache Solr and Amazon CloudSearch allow
field boosting with minimal effort.
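The effect of field weighting can be sketched with a toy scoring function. This is a deliberately simplified illustration, not how either engine actually scores documents: the field names and weight values are assumptions, and real engines combine boosts with term statistics.

```python
# Hypothetical per-field weights (assumption): a title hit counts far more
# than a hit in the references section.
FIELD_WEIGHTS = {"title": 5.0, "body": 1.0, "references": 0.2}


def weighted_score(doc, phrase, weights=FIELD_WEIGHTS):
    """Sum the phrase occurrences in each field, scaled by that field's weight."""
    phrase = phrase.lower()
    return sum(
        weight * doc.get(field, "").lower().count(phrase)
        for field, weight in weights.items()
    )


docs = [
    {
        "id": "survey",
        "title": "A Survey of Fantasy Fiction",
        "body": "An overview of the genre.",
        "references": "Rowling, J.K. Harry Potter series.",
    },
    {
        "id": "novel",
        "title": "Harry Potter and the Goblet of Fire",
        "body": "The fourth book in the series.",
        "references": "",
    },
]

# Highest score first: the title match outranks the references match.
ranked = sorted(docs, key=lambda d: weighted_score(d, "Harry Potter"), reverse=True)
```

Both documents contain the phrase exactly once, so the ordering comes purely from where the phrase appears.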
Advantage: Neutral
Feature Weight: High
Auto Suggest:
Many search boxes present suggestions of popular queries relevant to the input as the user types a search query, and refine the suggestion list as additional characters are typed. This feature is called Auto Suggest. It can be implemented at the search engine level or at the search application level.
Apache Solr has native support for the autosuggest feature. It can be implemented in several ways, using NGramFilterFactory, EdgeNGramFilterFactory or TermsComponent. This feature of Apache Solr is usually used in conjunction with jQuery to create a powerful auto-suggestion experience in applications.
Amazon CloudSearch currently has no direct support for the autosuggest feature. We have to implement it in our search application tier.
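For readers who need to build autosuggest in the application tier, the following is a minimal sketch of the edge n-gram idea: every leading prefix of a known query is indexed, so looking up what the user has typed so far is a single dictionary access. It only mimics the concept behind edge n-gram analysis and is not Solr code; the sample queries, popularity numbers and function names are hypothetical.

```python
from collections import defaultdict


def edge_ngrams(term, min_len=1, max_len=20):
    """Return the leading prefixes of a term, e.g. 'java' -> j, ja, jav, java."""
    term = term.lower()
    return [term[:n] for n in range(min_len, min(len(term), max_len) + 1)]


def build_index(popularity_by_query):
    """Map every prefix to the queries that start with it, most popular first."""
    index = defaultdict(list)
    for query, popularity in popularity_by_query.items():
        for gram in edge_ngrams(query):
            index[gram].append((popularity, query))
    for entries in index.values():
        entries.sort(key=lambda pair: (-pair[0], pair[1]))
    return index


def suggest(index, typed, limit=5):
    """Return up to `limit` suggestions for the text typed so far.

    Input longer than max_len characters finds nothing in this sketch.
    """
    return [query for _, query in index.get(typed.lower(), [])[:limit]]


# Made-up query log with popularity counts (illustrative data only).
index = build_index({
    "java programming": 120,
    "java tutorial": 90,
    "javascript": 300,
    "jquery": 50,
})
suggestions = suggest(index, "jav")
```

The trade-off is memory for speed: the index stores one entry per prefix, which is acceptable for a list of popular queries but not for arbitrary document text.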
Advantage: Apache Solr
Feature Weight: Medium
Geospatial Search:
Consider an example in which a user searches for “Starbucks” and the search engine must show the nearest outlet based on the user’s current location. Such location-aware searches produce significantly better results and help the user find the right information more effectively and efficiently. This use case shows the importance of geospatial search. In today’s mobile world, it is an important feature in many location-aware business applications.
Apache Solr supports geospatial search through the implementation class solr.LatLonType. Actions such as sorting the results by distance and boosting documents by distance can be performed.
Amazon CloudSearch has a very limited geospatial search feature set. As of now, it can return documents within a specific area. Missing features include sorting by geographical distance and faceting by distance.
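When the engine can only return documents within an area, sorting by distance can be approximated in the application tier. The sketch below uses the haversine great-circle formula on an already fetched result set; the outlet names and coordinates are illustrative approximations, and the "lat" and "lon" field names are assumptions about how the documents are stored.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sort_by_distance(docs, user_lat, user_lon):
    """Order documents from nearest to farthest from the user's location."""
    return sorted(
        docs,
        key=lambda d: haversine_km(user_lat, user_lon, d["lat"], d["lon"]),
    )


# Illustrative outlets with approximate coordinates (not real store data).
outlets = [
    {"name": "Times Square", "lat": 40.7580, "lon": -73.9855},
    {"name": "Capitol Hill", "lat": 47.6253, "lon": -122.3222},
    {"name": "Pike Place", "lat": 47.6101, "lon": -122.3421},
]

# A user standing in downtown Seattle (assumed location).
nearest_first = sort_by_distance(outlets, 47.6097, -122.3331)
```

This only works well when the engine has already cut the result set down to a small area, since every candidate has to be fetched and measured by the application.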
Advantage: Apache Solr
Feature Weight: Medium
“Find Similar” feature:
The search engine suggests similar records based on a particular record. This is like the “Find Similar” resumes feature used by popular job search engines. Ecommerce sites also benefit from this feature, as research suggests that users typically compare products before making a transaction and are likely to buy the product they judge to be better. Apache Solr implements this feature using handlers/components such as MoreLikeThisHandler or MoreLikeThisComponent.
Amazon CloudSearch currently does not support this feature.
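On the Solr side, a “find similar” request through MoreLikeThisComponent is mostly a matter of adding the mlt parameters to a normal query. The sketch below only builds the request URL and sends nothing. The host, the /solr/select path, the unique key field "id", the document id and the field names are assumptions, so adjust them to your own schema and deployment.

```python
from urllib.parse import urlencode


def more_like_this_url(doc_id, fields, count=5,
                       base_url="http://localhost:8983/solr/select"):
    """Build a Solr query URL that asks for documents similar to doc_id."""
    params = {
        "q": f"id:{doc_id}",         # the record to find neighbours for
        "mlt": "true",               # switch MoreLikeThisComponent on
        "mlt.fl": ",".join(fields),  # fields compared for similarity
        "mlt.count": count,          # similar documents returned per result
        "mlt.mintf": 1,              # minimum term frequency in the source document
        "mlt.mindf": 1,              # minimum number of documents containing a term
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical product id and field names.
url = more_like_this_url("SKU-42", ["title", "description"])
```

The low mintf and mindf values are chosen so that small test indexes still return matches; larger indexes usually benefit from higher thresholds.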
Advantage: Apache Solr
Feature Weight: High
1 comment:
For "configuration and getting started," cloud hosted Solr services, such as my own http://websolr.com/ can really help swing in Solr's favor on that front.