Comparison Analysis: Apache Solr vs Amazon CloudSearch (continued from Part 1)
“Did you mean…” feature:
Sometimes when you search for a misspelled word, you are presented with the correct spelling. Search engines like Google automatically correct the spelling and even show you the search results for the corrected term. This feature of presenting the user with spelling-corrected suggestions is called the “Did you mean” feature.
Apache Solr supports this feature through the Spellcheck search component. The recommended approach is to build the word corpus from the index, principally because your data will contain proper nouns and other words not present in a general-purpose dictionary.
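To illustrate why the corpus should come from the index, here is a minimal Python sketch of an index-based “Did you mean” suggester. It is not how Solr's Spellcheck component is implemented: it simply collects terms from a hypothetical list of indexed documents and uses the standard library's difflib similarity matching in place of Solr's own distance measures. The function names build_corpus and did_you_mean are hypothetical.

```python
import difflib
import re
from collections import Counter


def build_corpus(documents):
    """Collect every token that appears in the indexed documents, with its frequency."""
    counts = Counter()
    for text in documents:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


def did_you_mean(query, corpus, cutoff=0.8):
    """Return a corrected query, or None if every query term is already in the corpus."""
    corrected = []
    changed = False
    for term in re.findall(r"[a-z0-9]+", query.lower()):
        if term in corpus:
            corrected.append(term)
            continue
        # Candidate terms from the index, ordered by similarity (best first).
        candidates = difflib.get_close_matches(term, corpus.keys(), n=3, cutoff=cutoff)
        if candidates:
            # Among similar candidates, prefer the one that occurs most often in the index.
            corrected.append(max(candidates, key=lambda c: corpus[c]))
            changed = True
        else:
            corrected.append(term)
    return " ".join(corrected) if changed else None


# Hypothetical indexed documents containing a proper noun ("Kubernetes")
# that a general-purpose dictionary would not know about.
corpus = build_corpus([
    "Kubernetes cluster autoscaling",
    "Solr spellcheck component",
    "Kubernetes pod scheduling",
])
print(did_you_mean("kubernets scheduling", corpus))  # kubernetes scheduling
```

Because the suggestions come only from terms that actually exist in the index, a correction always points the user at something that will return results.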
Amazon CloudSearch currently has no support for the “Did you mean…” feature.
Advantage: Apache Solr
Feature Weight: High
Rich Documents Support:
Rich document types such as HTML, PDF and Word can be uploaded into the search engine to make them searchable. The uploaded documents are parsed into a native format and indexed by the search engine, after which users and applications can search them using common search terms and patterns. Document management systems, CMSs and similar systems typically use this feature of a search engine/service to help their customers search through uploaded documents. In an enterprise scenario you can expect a variety of document formats to flow into the search system from different applications.
Apache Solr supports rich document parsing and indexing using Apache Tika.
Amazon CloudSearch expects data to be in Search Data Format (SDF), either JSON or XML. CloudSearch supports uploading rich documents via the Console or via the cs-generate-sdf command line tool. With CloudSearch you can use cs-generate-sdf to extract the data on the client and send the extracted text to CloudSearch.
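The client-side route can be sketched as follows: extract plain text from an HTML file on the client, then wrap it in a JSON document batch. This is a minimal Python sketch, not what cs-generate-sdf does internally. It assumes the type/id/fields layout used by CloudSearch JSON batches in the 2013-01-01 API (the original 2011-02-01 API also expected version and lang attributes), and the field names title and content are hypothetical; they must match the index fields configured on your own domain. Formats such as PDF or Word would need a real extractor like Apache Tika rather than this HTML-only parser.

```python
import json
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the visible text of an HTML document, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML string as one space-separated string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def to_sdf_batch(docs):
    """docs: iterable of (doc_id, title, html). Returns a JSON 'add' batch as a string.

    Assumes the 2013-01-01 batch layout; field names are hypothetical.
    """
    batch = [
        {"type": "add", "id": doc_id,
         "fields": {"title": title, "content": extract_text(html)}}
        for doc_id, title, html in docs
    ]
    return json.dumps(batch, indent=2)


html = ("<html><head><style>p {}</style></head>"
        "<body><h1>Quarterly report</h1><p>Revenue grew.</p></body></html>")
print(to_sdf_batch([("doc1", "Report", html)]))
```

The resulting string is what you would upload to the domain's document endpoint; only the extracted text travels to CloudSearch, which is the trade-off the comparison above describes.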
Advantage: Neutral
Feature Weight: High
Feature Customization:
Sometimes search software may not support a specific feature natively because there is not enough demand for it to be added to the core. In such cases, some search products provide the ability to customize and extend their existing feature sets through plugins and modules. Amazon CloudSearch, being a proprietary service, does not allow any customization, either through plugin integration or by extending its functionality. New features are rolled out only by the AWS team. In my experience, the AWS team is usually very proactive, accessible and receptive. You can speak to an AWS architect or product manager and explain your specific need. If your need turns out to be less unusual than you think and is being requested by a considerable number of customers around the world, they will include it in their roadmap.
Apache Solr, being open source, allows customization of analysers, tokenizers, indexers and query analysis through plugins and by extending its code base.
Advantage: Apache Solr
Feature Weight: Medium
Stemming, Stop Words and Synonyms:
Stemming: A stemming dictionary maps related words to a common stem. A stem is typically the root or base word from which variants are derived. For example, run is the stem of running and ran.
Stop words: Stop words are words that should typically be ignored both during indexing and at search time because they are either insignificant or so common that including them would result in a massive number of matches. For example, a, an, and, the and to are commonly used words that can be ignored during indexing.
Synonyms: You can configure synonyms for terms that appear in the
data you are searching. That way, if a user searches for the synonym rather
than the indexed term, the results will include documents that contain the
indexed term. For example, you might want to configure synonyms so that a
search for "Rocky Four" or "Rocky 4" will match the movie
titled "Rocky IV". To do that, you would configure 4 and four as
synonyms of the indexed term IV.
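The three techniques can be combined into one analysis step. The following is a minimal Python sketch, not Solr's or CloudSearch's actual analyzer: it uses a small hypothetical stemming dictionary (a real deployment would typically use a much larger dictionary or an algorithmic stemmer), the stop words listed above, and the Rocky IV synonyms. The key design point is that the same analyze function runs on both the indexed text and the query, so "Rocky Four", "Rocky 4" and "Rocky IV" all reduce to the same terms.

```python
import re

# Stop words from the example above.
STOP_WORDS = {"a", "an", "and", "the", "to"}

# Hypothetical stemming dictionary: variant -> stem.
STEMS = {"running": "run", "ran": "run", "runs": "run"}

# Synonyms mapped onto the indexed term (lowercased).
SYNONYMS = {"4": "iv", "four": "iv"}


def analyze(text):
    """Lowercase and tokenize, drop stop words, then map synonyms and stems."""
    terms = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token in STOP_WORDS:
            continue
        token = SYNONYMS.get(token, token)
        token = STEMS.get(token, token)
        terms.append(token)
    return terms


def matches(query, document):
    """True if every analyzed query term appears among the analyzed document terms."""
    doc_terms = set(analyze(document))
    return all(term in doc_terms for term in analyze(query))


print(analyze("Rocky Four"))               # ['rocky', 'iv']
print(analyze("The man ran to the store"))  # ['man', 'run', 'store']
print(matches("Rocky 4", "Rocky IV"))      # True
```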
Both Apache Solr and Amazon CloudSearch support these features.
Advantage: Neutral
Feature Weight: High
Support for Protocols:
Both Amazon CloudSearch and Apache Solr support the HTTP and HTTPS protocols. Amazon CloudSearch also includes web service interfaces to configure firewall settings that control network access to your domain.
Advantage: Neutral