Apache Solr and Elasticsearch expect users to have basic practical knowledge of the search engine, and to complete a few significant tasks before the first ‘Getting Started’ step is accomplished. In Amazon CloudSearch, the ‘Getting Started’ activities are easier: end users can have a CloudSearch instance up and running with a few clicks, in a few minutes.
In this section, we’ll discuss some important administrative operations:
• Index backup
• Patch management
• Re-indexing and recovery
Data backup is a routine operation, carried out at defined intervals. It is an essential task for recovering data quickly from failures such as hardware crashes, data corruption, or related events.
Apache Solr
Apache Solr provides a feature called ‘ReplicationHandler’. The main objective of ReplicationHandler is to replicate index data to slave servers, but it can also be used to maintain a backup copy. A replication slave node can be configured with the Solr master and designated solely as a backup server, with no other operations taking place on that node.
Solr’s implicit support for replication allows ReplicationHandler to be used as an API. The API has optional parameters such as the backup location, the snapshot name, and the number of backups to keep. The backup API is bound to storing snapshots on local disk; for any other storage option it requires customization.
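As a rough illustration, the backup command can be invoked over HTTP. A minimal Python sketch, assuming placeholder host, core, and path values:

import requests

# Hypothetical Solr core URL; adjust to your deployment.
SOLR = "http://localhost:8983/solr/mycore"

# Trigger a backup snapshot, keeping at most three snapshots on disk.
resp = requests.get(SOLR + "/replication", params={
    "command": "backup",        # ReplicationHandler backup command
    "location": "/var/backups", # directory on the Solr server's local disk
    "name": "nightly",          # snapshot name
    "numberToKeep": 3,          # retention count
})
print(resp.status_code)

# Check the status of the most recent backup.
details = requests.get(SOLR + "/replication",
                       params={"command": "details", "wt": "json"})
print(details.json())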
If you are required to store backups in a different location, such as Amazon’s Simple Storage Service (S3), a local storage server, or a remote data center, ReplicationHandler has to be customized. Solr’s core libraries are open source, which allows for any such customization.
Elasticsearch
Elasticsearch provides an advanced option called the ‘Snapshot API’ for backing up an entire cluster. The API backs up the current cluster state and related data and saves it to a shared repository.
The first backup is a complete copy of the data; subsequent backups capture only the delta between the current data and the previous snapshot. Elasticsearch prompts end users to create a repository of a chosen type:
• Shared file system
• Amazon S3
• Hadoop Distributed File System (HDFS)
• Azure Cloud
This integration gives developers greater flexibility in managing their backups.
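A minimal sketch of that flow over the REST API, assuming a shared file system repository and placeholder names:

import requests, json

ES = "http://localhost:9200"  # placeholder node address

# 1. Register a repository backed by a shared file system path.
requests.put(ES + "/_snapshot/my_backup", data=json.dumps({
    "type": "fs",
    "settings": {"location": "/mount/backups/my_backup"}
}))

# 2. Snapshot the whole cluster and wait for completion.
requests.put(ES + "/_snapshot/my_backup/snapshot_1",
             params={"wait_for_completion": "true"})

# 3. Restore the snapshot later, if needed.
requests.post(ES + "/_snapshot/my_backup/snapshot_1/_restore")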
Backup Process
The backup options in Apache Solr and Elasticsearch can be executed manually or automated. To automate the entire backup process, one has to write custom scripts that call the relevant API or handler. Most engineering companies follow this model of writing custom scripts for backup automation.
Backup also involves maintenance of the latest snapshots and archives. The key management tasks are snapshot retrieval, archival, and expiration.
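As a sketch of what such a retention script might look like for Elasticsearch snapshots (the repository name and retention window are assumptions):

import requests

ES = "http://localhost:9200"  # placeholder node address
REPO = "my_backup"            # placeholder repository name
KEEP = 7                      # number of snapshots to retain

# List all snapshots in the repository, oldest first.
snaps = requests.get(ES + "/_snapshot/" + REPO + "/_all").json()["snapshots"]
snaps.sort(key=lambda s: s["start_time_in_millis"])

# Expire everything beyond the retention window.
for snap in snaps[:-KEEP]:
    requests.delete(ES + "/_snapshot/" + REPO + "/" + snap["snapshot"])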
In an alternate approach, if the Solr or Elasticsearch cluster is set up in replication mode, one of the slave nodes can be designated as the backup server. Automating backups from that slave node again requires a script written by the developer.
Amazon CloudSearch
Amazon CloudSearch inherently takes care of the data that is stored and indexed, leaving a lighter load for engineering and operations teams. CloudSearch self-manages all data backup; the backups are maintained internally, behind the scenes. In the event of a hardware failure or other problem, Amazon CloudSearch restores from backup automatically, and this process is transparent to end users.
Conclusion
The default option in Apache Solr is to back up only to a local disk store; it does not offer other storage options as Elasticsearch does. However, engineers can write their own handlers to manage the backup process.
Elasticsearch is packaged with multiple storage-option plugins, which is an added advantage for engineers.
Amazon CloudSearch relieves users of the intricacies of backup and its management. The IT operations or managed services team has a smaller role in the backup process, as the entire operation is managed behind the scenes by CloudSearch.
2.2 System upgrades and patch management
Patch management and system upgrades such as OS patches and fixes are inevitable in operations and administration. For any system, there is always a version upgrade, OS maintenance, or a hardware or software change.
Rolling Restarts
Apache Solr and Elasticsearch both recommend ‘Rolling Restarts’ for patch management, operating system upgrades, and other fixes. A Rolling Restart stops and starts each node in the cluster sequentially, which allows the cluster to keep serving search requests while each node is updated with the latest code, fixes, or patches. Rolling Restarts are adopted when high availability is mandatory and downtime is not allowed.
Sometimes Rolling Restarts require intelligent decision making based on cluster topology. If a cluster consists of shards and replicas, the order in which each node is restarted has to be decided carefully.
Apache Solr
Apache ZooKeeper runs as a stand-alone service and does not get upgraded automatically when Apache Solr is upgraded; it should be upgraded manually at the same time.
Elasticsearch
Elasticsearch recommends disabling ‘shard allocation’ during node restarts. Because the cluster starts re-balancing as soon as it detects a node loss, this setting tells Elasticsearch not to reallocate the missing shards while the node is being restarted.
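A minimal sketch of toggling that setting through the cluster settings API (the node address is a placeholder):

import requests, json

ES = "http://localhost:9200"  # placeholder node address

def set_allocation(mode):
    # 'none' disables shard allocation; 'all' restores the default.
    requests.put(ES + "/_cluster/settings", data=json.dumps({
        "transient": {"cluster.routing.allocation.enable": mode}
    }))

set_allocation("none")  # before stopping the node for patching
# ... patch and restart the node, wait for it to rejoin ...
set_allocation("all")   # re-enable allocation afterwards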
Amazon CloudSearch
Amazon CloudSearch internally manages all patches and upgrades related to its operating system. As new features are rolled out to the managed search service, upgrades are self-managed and immediately available to all customers without any action on their part.
Conclusion
Patch management in Apache Solr and Elasticsearch has to be carried out manually using Rolling Restarts. Customers automate this process by developing custom scripts for system upgrades and patch management.
Patch management in Amazon CloudSearch is transparent to customers. The upgrades and patches applied to Amazon CloudSearch are regularly noted in the ‘What’s New’ section of the CloudSearch documentation.
Any business application changes over its lifetime, as the business running it changes. Business change has a direct effect on the data structure of the system’s persistent information store. The search engine, which serves as a secondary or alternate store, will eventually have to change its data structure as well. Any change to the search engine’s data structure requires re-indexing of the data.
Example: a product company starts collecting ‘feedback’ from its customers for a given product. The text string from the new ‘feedback’ field needs to be added to the search schema, which may require re-indexing.
If the search data is not re-indexed after a structural change, the data that has already been indexed can become inaccurate, and the search results may behave differently than expected.
Re-indexing becomes necessary over time as the application grows. It is a common and mandatory admin operation, executed periodically based on application requirements.
Apache Solr
Apache Solr recommends re-indexing whenever there is a change in your schema definitions. The options below are widely used by the Apache Solr user community:
• Create a fresh index with the new settings and copy all of the documents from the old index to the new one.
• Configure the Data Import Handler with ‘SolrEntityProcessor’. The SolrEntityProcessor imports data from Solr instances or cores for a given search query. Its limitation is that it can only copy fields that are stored in the source index.
• Configure the Data Import Handler with the original data source and push the data freshly to the new index.
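For the second option, a sketch of the Data Import Handler configuration might look like the following; the source core URL and query are placeholder assumptions:

<dataConfig>
  <document>
    <!-- Pull every stored document from the old core into this one -->
    <entity name="reindex"
            processor="SolrEntityProcessor"
            url="http://localhost:8983/solr/old_core"
            query="*:*"
            rows="500"/>
  </document>
</dataConfig>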
Elasticsearch
Elasticsearch proposes several approaches for data re-indexing. The following approaches are usually combined:
• Use Elasticsearch’s Scan and Scroll and Bulk APIs to fetch data from the old index and push it into the new one.
• Update or create an index alias pointing at the new index under the old index name, then delete the old index.
• Use open source Elasticsearch plugins that can extract all data from the cluster and re-index it. Most of these plugins internally use the Scan and Scroll and Bulk APIs (as mentioned above), which reduces development time.
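A rough sketch of the first approach over the REST API, with placeholder index names and node address (the scroll request body form shown here matches later Elasticsearch releases, and error handling is omitted):

import requests, json

ES, SRC, DST = "http://localhost:9200", "old_index", "new_index"

# Open a scroll over the source index.
page = requests.post(ES + "/" + SRC + "/_search", params={"scroll": "2m"},
                     data=json.dumps({"size": 500,
                                      "query": {"match_all": {}}})).json()

while page["hits"]["hits"]:
    # Build an NDJSON bulk body that indexes each document into the target.
    lines = []
    for h in page["hits"]["hits"]:
        lines.append(json.dumps({"index": {"_index": DST, "_id": h["_id"]}}))
        lines.append(json.dumps(h["_source"]))
    requests.post(ES + "/_bulk", data="\n".join(lines) + "\n")
    # Fetch the next page of the scroll.
    page = requests.post(ES + "/_search/scroll",
                         data=json.dumps({"scroll": "2m",
                                          "scroll_id": page["_scroll_id"]})).json()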
Amazon CloudSearch
Amazon CloudSearch recommends rebuilding the index when index fields are added or modified, and expects users to issue an indexing request after a configuration change. Whenever there is a configuration change, the CloudSearch domain status changes to ‘NEEDS INDEXING’. During the index rebuild, the domain’s status changes to ‘PROCESSING’, and upon completion the status changes to ‘ACTIVE’.
Amazon CloudSearch can continue to serve search requests during the indexing process, but the configuration changes are not reflected in the search results until it completes. The re-indexing process can take some time; it is directly proportional to the volume of data in your index.
Amazon CloudSearch also allows document uploads while indexing is in progress, but updates can become slower if there is a large volume of document updates. In such a scenario, the uploads or updates can be throttled or paused until the Amazon CloudSearch domain returns to an ‘ACTIVE’ state.
Customers can initiate re-indexing by issuing the index-documents command using the RESTful API, the AWS command line interface (CLI), or an AWS SDK. They can also initiate re-indexing from the CloudSearch management console.
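For example, with the AWS SDK for Python (boto3), a minimal sketch might look like this (the domain name is a placeholder, and AWS credentials are assumed to be configured):

import time
import boto3

cs = boto3.client("cloudsearch")
cs.index_documents(DomainName="my-domain")

# Poll until the domain finishes processing and returns to ACTIVE.
while True:
    status = cs.describe_domains(DomainNames=["my-domain"])["DomainStatusList"][0]
    if not status["Processing"]:
        break
    time.sleep(30)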
Conclusion
Re-indexing in Apache Solr and Elasticsearch is mostly a manual process, because it requires decisions that factor in data size, current request volume, and offline hours.
Amazon CloudSearch manages the re-indexing process inherently and leaves much less to administrators. The re-indexing time window is abstracted away and not disclosed to administrators, but Amazon CloudSearch runs the re-indexing process based on the best practices mentioned above.
Monitoring server health is an essential daily task for operations and administration. In this section, we will describe the built-in monitoring capabilities of all three search engines.
Apache Solr
Apache Solr has a built-in web console for monitoring indexes, performance metrics, information about index distribution and replication, and all threads running in the Java Virtual Machine (JVM) at the time.
For more detailed monitoring, Java Management Extensions (JMX) can be configured with Solr to expose runtime statistics as MBeans. The JVM container running Apache Solr has built-in instrumentation that enables monitoring using JMX.
Elasticsearch
Elasticsearch has a management and monitoring plugin called ‘Marvel’. Marvel includes an interactive console called ‘Sense’ that helps users interact easily with Elasticsearch nodes. Elasticsearch has diversified built-in APIs that emit heap usage, garbage collection stats, file descriptors, and more. Marvel is tightly integrated with these APIs: it periodically polls them, collects statistics, and stores the data back in Elasticsearch. Marvel’s interactive graph dashboard allows administrators to query and aggregate historical stats data.
Amazon CloudSearch
Amazon CloudSearch recently introduced Amazon CloudWatch integration. The Amazon CloudSearch metrics can be used to make scaling decisions, troubleshoot issues, and manage clusters.
Amazon CloudSearch publishes four metrics into Amazon CloudWatch: SuccessfulRequests, SearchableDocuments, IndexUtilization, and Partitions.
CloudWatch alarms can be configured on these metrics to notify administrators through Amazon Simple Notification Service.
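As a sketch, an alarm on the IndexUtilization metric could be created with boto3; the domain name, AWS account client id, threshold, and SNS topic ARN below are placeholders:

import boto3

cw = boto3.client("cloudwatch")
cw.put_metric_alarm(
    AlarmName="cloudsearch-index-utilization-high",
    Namespace="AWS/CloudSearch",
    MetricName="IndexUtilization",
    Dimensions=[{"Name": "DomainName", "Value": "my-domain"},
                {"Name": "ClientId", "Value": "123456789012"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:search-alerts"],
)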
Conclusion
Apache Solr and Elasticsearch integrate with both built-in and external plugins. They can also support SaaS-based monitoring plugins or custom plugins developed by customers.
CloudSearch’s integration with CloudWatch already exposes some good metrics, and it is expected to offer new ones in the future.
Schema: a schema is a definition of the fields and field types the search system uses to organize data within the documents it indexes.
Schema definition is the foremost task in search data structure design. It is important that the schema definition caters to all business requirements and is designed to suit the application.
Apache Solr and Elasticsearch
Both Elasticsearch and Apache Solr can run a search application in ‘schema-less’ or ‘schema’ mode. Schema mode is suitable for application development and for any production environment.
Schema-less mode is a very good option for newcomers getting started. After server setup, users can start the application without a schema structure and let field definitions be created during indexing. However, to run a production-grade application, a proper schema structure becomes mandatory and a schema definition is a necessity.
Amazon CloudSearch
Amazon CloudSearch also allows users to set up search domains without any index fields. The index fields can be added at any time, but must exist before documents are indexed or search requests served.
In addition, the CloudSearch management console integrates with Amazon Web Services like S3 and DynamoDB, and can read from a local machine, so that a schema can be imported directly into a CloudSearch domain. After the schema import, CloudSearch allows the user to edit fields or add new ones. This is a convenient feature when a pre-built schema is to be migrated to a CloudSearch domain.
Conclusion
Apache Solr and Elasticsearch can be started without any schema, but they cannot be put into production use that way. Amazon CloudSearch allows creating domains without any index fields, but before any indexing or search requests can be served the schema must be created.
The general best practice in schema management is to rehearse and design a schema that suits the application requirements before finalizing the search structure. The underlying schema concept of all three search engines is consistent with this practice.
Dynamic fields are like regular field definitions, except that they support wildcard matching on field names. They allow the indexing of documents without knowing the names of all the fields they contain. A dynamic field is defined using a wildcard pattern (*) as the first, last, or only character. All undefined fields are checked against the dynamic field rules, and a field that matches a pattern inherits that dynamic field’s indexing options.
Apache Solr and Elasticsearch
Apache Solr and Elasticsearch allow end users to set up dynamic fields and rules using the RESTful API or schema configuration.
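In Elasticsearch, for instance, the equivalent facility is a dynamic template in the index mapping. A minimal sketch, where the index name, type, and field pattern are placeholders (mapping syntax follows the Elasticsearch 1.x era of this comparison):

import requests, json

ES = "http://localhost:9200"

# Any previously undefined field ending in '_s' is mapped as a raw string.
requests.put(ES + "/products", data=json.dumps({
    "mappings": {
        "doc": {
            "dynamic_templates": [{
                "strings_suffix": {
                    "match": "*_s",
                    "mapping": {"type": "string", "index": "not_analyzed"}
                }
            }]
        }
    }
}))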
Amazon CloudSearch
In Amazon CloudSearch, dynamic fields can be configured through the indexing options in the CloudSearch management console, the CloudSearch RESTful API, or the AWS SDKs.
Conclusion
If you are unsure about the schema structure or exact field names, dynamic fields come in handy. Amazon CloudSearch, Apache Solr, and Elasticsearch all offer the flexibility to configure dynamic fields. This helps the application development team cover any omitted field definitions in the schema document.
A variety of data types is supported by these search engines. The table below illustrates the data field types supported by each search engine.
Data type | Solr | Elasticsearch | CloudSearch
String / Text | Yes | Yes | Yes
Number types | integer, double, float, long | byte, short, integer, long, float, double | integer, double
Date types | Yes | Yes | Yes
Enum fields | Yes | Yes | No
Currency | Yes | No | No
Geo location / Latitude – Longitude | Yes | Yes | Yes
Boolean | Yes | Yes | No
Array types | Yes | Yes | Yes
Conclusion
The most important data types, like string, date, and number types, are supported by all three search engines. The geolocation data type, which is now regularly used by modern applications, is also supported by all three.
Engineers and developers may use an alternate data type if a particular type is not supported by their chosen search engine. For example, the ‘currency’ data type supported in Solr is not available in Elasticsearch or CloudSearch; in such cases, engineers use a number type as an alternative.
The most important task in search application development is data migration from the originating source to the search engine. The source can be a database, a file system, or another persistent store. To build the initial search data set, the full data set has to be migrated or imported from its origin into the search engine.
Likewise, extracting data from a search engine and exporting it to a different destination is also a crucial task, though executed only occasionally.
Apache Solr
Apache Solr has a built-in handler called the Data Import Handler (DIH). The DIH provides a tool for migrating and/or importing data from the originating store. The DIH can index data from sources such as:
• Relational Database Management Systems (RDBMS)
• Email
• HTTP URL endpoints
• Feeds like RSS and Atom
• Structured XML files
The DIH also has more advanced features, like Apache Tika integration, delta import, and transformers, to quickly migrate the data.
The Apache Solr export handler can export query result data in JavaScript Object Notation (JSON) or comma-separated values (CSV) format. The export query expects sort and filter query parameters and returns only the stored fields. Users also have the option of developing a custom export handler and incorporating it into the Solr core libraries.
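As a rough sketch, results can be pulled as CSV over HTTP using the standard response writer (host, core, and field names are placeholders):

import requests

SOLR = "http://localhost:8983/solr/mycore"

resp = requests.get(SOLR + "/select", params={
    "q": "*:*",
    "fl": "id,name,price",  # stored fields to include
    "sort": "id asc",
    "rows": 10000,
    "wt": "csv",            # or "json"
})
open("export.csv", "w").write(resp.text)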
Elasticsearch
Elasticsearch ‘Rivers’ is an elegant pluggable service that runs inside the Elasticsearch cluster. This service can be configured to pull or push the data that is indexed into the cluster. Some of the popular Elasticsearch River modules are CouchDB, Dropbox, DynamoDB, FileSystem, Java Database Connectivity (JDBC), Java Messaging Service (JMS), MongoDB, Neo4j, Redis, Solr, Twitter, and Wikipedia.
However, ‘Rivers’ will be deprecated in a newer release of Elasticsearch, which recommends using the official client libraries built for popular programming languages instead. Alternatively, the Logstash input plugins are another identified tool for shipping data into Elasticsearch.
For data export, an Elasticsearch snapshot can be taken of any individual indices, or of an entire cluster, into a remote repository. This is discussed in detail in the section ‘Operations and Management – Backup’.
Amazon CloudSearch
Amazon CloudSearch recommends sending documents in batches when uploading to a CloudSearch domain. A batch is a collection of add, update, and delete operations, described in JSON or XML format.
Amazon CloudSearch limits a single batch to 5 MB, but allows running batch uploads in parallel to reduce the time needed for a full data upload. The number of parallel batch uploads a domain can absorb is directly proportional to the CloudSearch instance type: larger instance types have a higher upload capacity, smaller ones a lower capacity. Batch upload programs should therefore throttle uploads intelligently based on instance capacity.
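A minimal boto3 sketch of a batch upload; the document service endpoint and documents are placeholders, and a production uploader would also split batches to stay under the 5 MB limit:

import json
import boto3

domain = boto3.client(
    "cloudsearchdomain",
    endpoint_url="https://doc-my-domain.us-east-1.cloudsearch.amazonaws.com")

# A batch is a JSON array of add/delete operations.
batch = [
    {"type": "add", "id": "doc1", "fields": {"title": "Smartphone", "price": 199}},
    {"type": "add", "id": "doc2", "fields": {"title": "Laptop", "price": 899}},
]
resp = domain.upload_documents(documents=json.dumps(batch),
                               contentType="application/json")
print(resp["status"], resp["adds"], "documents added")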
Conclusion
Apache Solr has good handlers for exporting and importing data. If the available options are not viable, Apache Solr allows developers to write a new custom handler, or customize an existing one, for data import and export.
Elasticsearch integrates with popular data sources in the form of ‘River’ modules or plugins. However, future versions of Elasticsearch strongly recommend using Logstash input plugins, or developing and contributing new Logstash inputs, since plugin customization is allowed in Elasticsearch.
Amazon CloudSearch does not have elaborate options like the other two search engines. However, by combining custom programs with the bulk upload recommendations for Amazon CloudSearch, customers can successfully migrate data into CloudSearch.
In this section, we will evaluate the ‘Search and Indexing’ features present in the search engines we are evaluating. This is a very important feature set, widely used by search application engineers.
Generally speaking, a search engine prepares text strings for indexing and searching using analyzers, tokenizers, and filters. These components are configured as libraries for indexing and searching the data, and most of the time they are composed in a sequential series.
• During indexing and querying, the analyzer assesses the field text and tokenizes each block of text into individual terms. Each token is a sub-sequence of the characters in the text.
• Each token filter then processes the tokens in the stream sequentially, applying its filter functionality.
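Elasticsearch’s analyze API makes this pipeline easy to observe. A small sketch against a placeholder node (the JSON body form shown is the one accepted by more recent Elasticsearch versions):

import requests, json

ES = "http://localhost:9200"

# Run a sample string through a tokenizer plus a filter chain.
resp = requests.get(ES + "/_analyze", data=json.dumps({
    "tokenizer": "standard",
    "filter": ["lowercase", "porter_stem"],
    "text": "Running Searches Quickly"
}))
print([t["token"] for t in resp.json()["tokens"]])
# Prints roughly: ['run', 'search', 'quickli']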
Apache Solr and Elasticsearch
Apache Solr and Elasticsearch have multifaceted built-in libraries of analyzers, tokenizers, and token filters. These libraries are packaged with the search engine installation and can be configured for indexing and searching.
Although analyzers can be configured for both indexing and querying, the same series of libraries does not need to be used for both operations. The indexing and searching operations can be configured with different tokenizers and filters, as their goals can differ.
Search Engine | Tokenizers | Filters
Apache Solr | Standard, Classic, Keyword, Letter, Lower Case, N-Gram, Edge N-Gram, ICU, Path Hierarchy, Regular Expression Pattern, UAX29 URL Email, White Space | ASCII Folding, Beider-Morse, Classic, Common Grams, Collation Key, Daitch-Mokotoff Soundex, Double Metaphone, Edge N-Gram, English Minimal Stem, Hunspell Stem, Hyphenated Words, ICU Folding, ICU Normalizer 2, ICU Transform, Keep Words, KStem, Length, Lower Case, Managed Stop, Managed Synonym, N-Gram, Numeric Payload Token, Pattern Replace, Phonetic, Porter Stem, Remove Duplicates Token, Reversed Wildcard, Shingle, Snowball Porter, Stemmer, Standard, Stop, Suggest Stop, Synonym, Token Offset Payload, Trim, Type As Payload, Type Token, Word Delimiter
Elasticsearch | Standard, Edge NGram, Keyword, Letter, Lowercase, NGram, Whitespace, Pattern, UAX Email URL, Path Hierarchy, Classic, Thai | Standard Token, ASCII Folding Token, Length Token, Lowercase Token, Uppercase Token, NGram Token, Edge NGram Token, Porter Stem Token, Shingle Token, Stop Token, Word Delimiter Token, Stemmer Token, Stemmer Override Token, Keyword Marker Token, Keyword Repeat Token, KStem Token, Snowball Token, Phonetic Token, Synonym Token, Compound Word Token, Reverse Token, Elision Token, Truncate Token, Unique Token, Pattern Capture Token, Pattern Replace Token, Trim Token, Limit Token Count Token, Hunspell Token, Common Grams Token, Normalization Token, CJK Width Token, CJK Bigram Token, Delimited Payload Token, Keep Words Token, Keep Types Token, Classic Token, Apostrophe Token
Amazon CloudSearch
The Amazon CloudSearch analysis scheme configuration is used for analyzing text data during indexing. Analysis schemes control:
• Text field content processing
• Stemming
• Inclusion of stopwords and synonyms
• Tokenization (Japanese language)
• Bigrams (Chinese, Japanese, and Korean languages)
The following analysis options are applied when text fields are configured with an analysis scheme:
1. Algorithmic stemming: the level of algorithmic stemming (minimal, light, or heavy) to perform. The available stemming levels vary by analysis scheme language.
2. Stemming dictionary: a dictionary that overrides the results of the algorithmic stemming.
3. Japanese tokenization dictionary: a dictionary specifying how particular characters should be grouped into words (Japanese only).
4. Stopwords: a set of terms that should be ignored both during indexing and at search time.
5. Synonyms: a dictionary of words that have the same meaning in the text data.
Before processing the analysis scheme, Amazon CloudSearch tokenizes and normalizes the text data. During tokenization, the text is split into multiple tokens, as in all search engine text processing. During normalization, upper case characters are converted to lower case and further formatting is applied. After tokenization and normalization are completed, stemming, stopwords, and synonyms are applied.
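As a sketch, an analysis scheme can be defined with boto3; the domain, scheme name, stopwords, and synonyms below are placeholder assumptions:

import boto3

cs = boto3.client("cloudsearch")
cs.define_analysis_scheme(
    DomainName="my-domain",
    AnalysisScheme={
        "AnalysisSchemeName": "my_english",
        "AnalysisSchemeLanguage": "en",
        "AnalysisOptions": {
            "AlgorithmicStemming": "light",
            "Stopwords": '["a", "an", "the"]',
            "Synonyms": '{"groups": [["cell phone", "smartphone", "mobile"]]}',
        },
    },
)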
Conclusion
Apache Solr and Elasticsearch are packaged with varied libraries of analyzers, tokenizers, and filters, each with distinct functions. These libraries can also be customized, which gives developers greater flexibility.
Amazon CloudSearch does not carry sophisticated tokenizer or filter libraries like Apache Solr or Elasticsearch, but it has simplified the configuration. The Amazon CloudSearch tokenizers and filters cover the most common search requirements and use cases. This is ideal for developers who want to quickly integrate search functionality into their application stack.
Faceting is the composition of search results into categories or groups based on indexed terms. Faceting allows search results to be categorized into sub-groups, which can be used as the basis for filters or further searches. Faceting also enables efficient computation of result counts per facet. For example, facets for ‘Laptop’ search results might be ‘Price’, ‘Operating System’, ‘RAM’, or ‘Shipping Method’.
Faceting is a popular function that helps consumers filter through search results easily and effectively.
Apache Solr
Apache Solr has extensive faceting options, ranging from simple to very advanced faceting behavior. The table below details the parameters used during faceting. They can be grouped into field value, date, range, pivot, multi-select, and interval faceting.
Facet grouping | Parameters
Field value | facet.field, facet.prefix, facet.sort, facet.limit, facet.offset, facet.mincount, facet.missing, facet.method, facet.enum.cache.minDf, facet.threads
Date faceting | facet.date, facet.date.start, facet.date.end, facet.date.gap, facet.date.hardend, facet.date.other, facet.date.include
Range faceting | facet.range, facet.range.start, facet.range.end, facet.range.gap, facet.range.hardend, facet.range.other, facet.range.include
Pivot | facet.pivot, facet.pivot.mincount
Interval | facet.interval, facet.interval.set
Elasticsearch
Elasticsearch has deprecated facets and announced that they will be removed in a future release. The Elasticsearch team felt that their facet implementation was not designed from the ground up to support complex aggregations, so facets will be replaced with aggregations in the next release.
Elasticsearch says: “An aggregation can be seen as a unit-of-work that builds analytic information over a set of documents. The context of the execution defines what this document set is (for example, a top-level aggregation executes within the context of the executed query/filters of the search request).”
Elasticsearch strongly recommends migrating from facets to aggregations. Aggregations are classified into two main families, bucketing and metric. The following table lists the aggregations available in Elasticsearch.
Elasticsearch Aggregators | Min Aggregation, Max Aggregation, Sum Aggregation, Avg Aggregation, Stats Aggregation, Extended Stats Aggregation, Value Count Aggregation, Percentiles Aggregation, Percentile Ranks Aggregation, Cardinality Aggregation, Geo Bounds Aggregation, Top Hits Aggregation, Scripted Metric Aggregation, Global Aggregation, Filter Aggregation, Filters Aggregation, Missing Aggregation, Nested Aggregation, Reverse Nested Aggregation, Children Aggregation, Terms Aggregation, Significant Terms Aggregation, Range Aggregation, Date Range Aggregation, IPv4 Range Aggregation, Histogram Aggregation, Date Histogram Aggregation, Geo Distance Aggregation, GeoHash Grid Aggregation
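A small sketch of a terms aggregation over the REST API; the index, field, and node address are placeholders:

import requests, json

ES = "http://localhost:9200"

# Count laptops per operating system (the aggregation-era replacement
# for a field-value facet).
resp = requests.post(ES + "/products/_search", data=json.dumps({
    "size": 0,  # only the aggregation is wanted, not the hits
    "query": {"match": {"category": "laptop"}},
    "aggs": {"by_os": {"terms": {"field": "operating_system"}}}
}))
for bucket in resp.json()["aggregations"]["by_os"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])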
Amazon CloudSearch
Amazon CloudSearch simplifies facet configuration: facets are defined as part of the indexing options. These facets are targeted at common use cases like e-commerce, online travel, classifieds, and so on. Any field of type date, literal, or numeric can be made facet-enabled; this is done during CloudSearch domain configuration. Amazon CloudSearch also allows bucket definitions, to calculate facet counts for particular subsets of the facet values.
Facet information can be retrieved in two ways:
Sort: returns facet information sorted either by facet counts or by facet values.
Buckets: returns facet information for particular facet values or ranges.
During searching, facet information can be fetched for any facet-enabled field by specifying the ‘facet.FIELD’ parameter in the search request (‘FIELD’ is the name of a facet-enabled field).
Amazon CloudSearch allows multiple facets in a single request, which helps refine search results further. See the example below.
Example: "q=poet&facet.genres={}&facet.rating={}&facet.year={}&return=_no_fields"
Conclusion
All three search engines allow users to perform faceting with minimal effort. However, for advanced and complex implementations, the approaches differ for each search engine.
When a user types a search query, suggestions relevant to the input are presented, and as the user types more characters, the suggestions are refined. This feature is called auto-suggest. Auto-suggest is an appealing and useful requirement, employed in many search user interfaces.
The feature can be implemented at the search engine level or at the search application level. Below are some options available in these three search engines.
Apache Solr
Apache Solr has native support for the auto-suggest feature. It can be implemented using NGramFilterFactory, EdgeNGramFilterFactory, or TermsComponent. Usually, this Apache Solr feature is used in conjunction with jQuery or asynchronous client libraries to create a powerful auto-suggestion user experience in front-end applications.
Elasticsearch
Elasticsearch also offers edge n-grams, which are easy to set up, flexible, and fast. In addition, Elasticsearch introduced a new data structure for suggestions, the Finite State Transducer (FST), which resembles a big graph. This structure is held in memory, which makes it much faster than a term-based query could be. Elasticsearch recommends using edge n-grams when the query input and its word ordering are less predictable.
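As a sketch, an edge n-gram autocomplete field could be set up as follows; the index name and field are placeholders, and note that the index-time analyzer deliberately differs from the search-time analyzer (mapping syntax follows the Elasticsearch 1.x era of this comparison):

import requests, json

ES = "http://localhost:9200"

# Index titles as edge n-grams but analyze queries plainly, so the
# partial input 'sma' matches the indexed term 'smartphone'.
requests.put(ES + "/catalog", data=json.dumps({
    "settings": {"analysis": {
        "filter": {"edge_filter": {"type": "edge_ngram",
                                   "min_gram": 2, "max_gram": 15}},
        "analyzer": {"autocomplete": {"tokenizer": "standard",
                                      "filter": ["lowercase", "edge_filter"]}}
    }},
    "mappings": {"item": {"properties": {
        "title": {"type": "string",
                  "index_analyzer": "autocomplete",
                  "search_analyzer": "standard"}
    }}}
}))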
Amazon CloudSearch
Amazon CloudSearch offers ‘Suggesters’ to achieve auto-suggest. A CloudSearch Suggester is configured on a particular text field. When a Suggester is queried with a search string, CloudSearch lists all documents whose Suggester field begins with that search string. Suggesters can be configured to find matches for the exact query, or to perform fuzzy matching to correct the query string, with a fuzziness level of None (the default), Low, or High.
Suggesters can also be configured with a SortExpression, which computes a score for each suggestion. It is important to re-index the domain when a new Suggester is configured; suggestions will not be reflected until all of the documents are indexed.
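A sketch of both steps with boto3, using placeholder domain, field, and endpoint names:

import boto3

cs = boto3.client("cloudsearch")

# Configure a suggester on the 'title' field with fuzzy matching.
cs.define_suggester(
    DomainName="my-domain",
    Suggester={
        "SuggesterName": "title_suggester",
        "DocumentSuggesterOptions": {"SourceField": "title",
                                     "FuzzyMatching": "low"},
    },
)
cs.index_documents(DomainName="my-domain")  # suggester takes effect after indexing

# Later, query suggestions through the domain's search endpoint.
domain = boto3.client(
    "cloudsearchdomain",
    endpoint_url="https://search-my-domain.us-east-1.cloudsearch.amazonaws.com")
print(domain.suggest(query="sma", suggester="title_suggester", size=5))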
Conclusion
Amazon CloudSearch provides a simple yet powerful ‘suggest’ implementation, which is sufficient for most applications. If you are looking for advanced options or further customization of suggestions, Apache Solr and Elasticsearch offer some good options.
Highlighting is a way of giving formatting clues to end users in the search results. It is a valuable feature in which the front-end search application highlights snippets of text from each search result, conveying to end users why the result document matched their query. In this section, we will describe the options present in all three search engines.
Apache Solr
Apache Solr includes matched document text fragments in the query response. These fragments are returned as a highlighted section that search clients use as presentation cues. Apache Solr is packaged with a good collection of highlighting utilities, which give control over the text fragments, fragment size, fragment formatting, and so on. These highlighters can be incorporated with Solr query parsers and request handlers.
Apache Solr comes with three highlighting utilities:
• Standard Highlighter
• FastVector Highlighter
• Postings Highlighter
The Standard Highlighter is the one most commonly used by search engineers, because it is a good choice for a wide variety of search use cases. The FastVector Highlighter is ideal for large documents and for highlighting text in a variety of languages. The Postings Highlighter works well for full-text keyword search.
Elasticsearch
Elasticsearch also allows highlighting of search results on one or more fields. The implementation uses the Lucene-based highlighter, fast-vector-highlighter, and postings-highlighter. In Elasticsearch, the highlighter type can be forced in the query itself, which is a very flexible option for developers who want a specific highlighter to suit their requirements.
The three highlighters present in Elasticsearch behave the same way as their Solr counterparts, since they are inherited from the same Lucene family.
Amazon CloudSearch
Amazon CloudSearch simplifies highlighting: the highlight.FIELD parameter is specified in the search request, and Amazon CloudSearch returns excerpts with the search results showing where the search terms occur within a particular field of a matching document.
For example, the search term ‘smart phone’ is highlighted in the description field:
"highlights": {"description": "A *smartphone* is a mobile phone with an advanced mobile operating system. They typically combine the features of a cell phone with those of other popular mobile devices, such as personal digital assistant (PDA), media player and GPS navigation unit. A *smartphone* has a touchscreen user interface and can run third-party apps, and are camera phones."}
Amazon CloudSearch also provides controls such as the number of search term occurrences within an excerpt, and how they should be highlighted (plain text or HTML).
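A sketch of a highlighted search request, with a placeholder endpoint:

import requests

SEARCH = ("https://search-my-domain.us-east-1"
          ".cloudsearch.amazonaws.com/2013-01-01/search")

resp = requests.get(SEARCH, params={
    "q": "smart phone",
    # At most two highlighted occurrences, wrapped in asterisks.
    "highlight.description": '{"max_phrases":2,"format":"text",'
                             '"pre_tag":"*","post_tag":"*"}',
})
for hit in resp.json()["hits"]["hit"]:
    print(hit.get("highlights", {}).get("description"))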
Conclusion
From a development perspective, all three search engines provide easy and simple highlighting implementations. If you are looking for different, more advanced highlighting options, Apache Solr and Elasticsearch have some good features.
Multilingualism is a very important feature for global applications that cater to non-English speaking geographies. A leading information measurement company’s survey reveals that search engines built with multilingual features are emerging and successful because of native language support and a focus on the cultural background of the users.
Business impact: multilingual search is an effective marketing strategy for getting consumers’ attention. In e-commerce, presenting the platform in the customer’s native tongue creates opportunities to do more business.
Apache Solr
Apache Solr is packaged with multilingual support for the most common languages. Apache Solr carries many language-specific tokenizer and filter libraries, which can be configured during indexing and querying.
Apache Solr engineering forums recommend a multi-core architecture in which each core manages one language. Solr also supports language detection using the Tika and LangDetect detection features, which helps map text data to language-specific fields during indexing.
Elasticsearch
Elasticsearch incorporates a vast collection of language analyzers for the most commonly spoken languages. The primary role of a language analyzer is to split, stem, filter, and apply the required transformations specific to the language.
Elasticsearch also allows a user to define a custom analyzer as a base extension of another analyzer.
Amazon CloudSearch
Amazon CloudSearch has strong support for language-specific text processing. Amazon CloudSearch provides pre-defined default analysis schemes for 34 languages, and processes text and text-array fields based on the configured language-specific analysis scheme.
Amazon CloudSearch also allows a user to define a new analysis scheme as an extension of a default language analysis scheme.
Conclusion
All three search engines have ample and effective support for the widely spoken international languages.
Languages Support
The table below lists the languages supported by each search engine.
Search engine | Languages supported
Apache Solr | Arabic, Brazilian Portuguese, Bulgarian, Catalan, Chinese, Simplified Chinese, CJK, Czech, Danish, Dutch, Finnish, French, Galician, German, Greek, Hebrew, Lao, Myanmar, Khmer, Hindi, Indonesian, Italian, Irish, Japanese, Latvian, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Scandinavian, Serbian, Spanish, Swedish, Thai and Turkish
Elasticsearch | Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Kurdish, Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish, Thai and Turkish
Amazon CloudSearch | Arabic, Armenian, Basque, Bulgarian, Catalan, Chinese - Simplified, Chinese - Traditional, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hebrew, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Latvian, Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish and Thai
Search engine | Request formats | Response formats
Apache Solr | XML, JSON, CSV | JSON, XML, CSV
Elasticsearch | JSON | JSON
Amazon CloudSearch | XML, JSON | XML, JSON
Search engine | Integrations available
Apache Solr | Drupal, Magento, Django, ColdFusion, WordPress, OpenCMS, Plone, Typo3, eZ Publish, Symfony2, Riak, DataStax Enterprise Search, Cloudera Search, Hortonworks Data Platform, MapR
Elasticsearch | Drupal, Django, Symfony2, WordPress, Couchbase, SearchBlox, Hortonworks Data Platform, MapR
Amazon CloudSearch | —
Search engine | Protocols support
Apache Solr | HTTP, HTTPS
Elasticsearch | HTTP, HTTPS
Amazon CloudSearch | HTTP, HTTPS
Feature 8: High Availability
All three search engines are architected for:
• High availability (HA)
• Replication
• Scaling design principles
In this next section, we will discuss the high availability options present in these three search engines.
8.1 Replication
Replication is the copying or synchronizing of the search index from master nodes to slave nodes so that data can be managed efficiently.
Replication is a key design principle in high availability and scaling. From a high availability perspective, replication enables both HA and failover from master nodes (shards or leaders) to slave nodes (replicas). From a scaling perspective, replication is used to scale out the slave or replica nodes when request traffic increases.
Apache Solr
Apache Solr supports two models of replication: legacy mode and SolrCloud. In legacy mode, the replication handler copies data from the master node’s index to the slave nodes. The master server manages all index updates, while the slave nodes handle read queries. This segregation of master and slave allows a Solr cluster to be scaled to serve heavy query loads.
Apache SolrCloud is an advanced distributed cluster setup of Solr nodes, designed for high availability and fault tolerance. Unlike legacy mode, there is no explicit concept of ‘master/slave’ nodes; instead, the search cluster is split into leaders and replicas. A leader is responsible for ensuring its replicas hold the same data stored on the leader. Apache Solr has a configuration called ‘numShards’ that defines the number of shards (leaders). During start-up, the core index is split across the ‘numShards’ shards, which are designated as leaders. Nodes that join the Solr cluster after the initial ‘numShards’ are automatically assigned as replicas for the leaders.
Elasticsearch
Elasticsearch follows a concept similar to SolrCloud. In brief, an Elasticsearch index can be split into multiple shards, and each shard can be replicated to any number of nodes (0, 1, 2 … n). Once replication completes, the index has primary shards and replica shards. The number of shards and replicas is defined at index creation time; the number of replicas can be changed dynamically, but the shard count cannot.
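A minimal sketch of that distinction over the REST API (the index name and counts are placeholders):

import requests, json

ES = "http://localhost:9200"

# Shard count is fixed at index creation time.
requests.put(ES + "/catalog", data=json.dumps({
    "settings": {"number_of_shards": 3, "number_of_replicas": 1}
}))

# Replica count can be raised later to scale read capacity.
requests.put(ES + "/catalog/_settings", data=json.dumps({
    "number_of_replicas": 2
}))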
Apache Solr and Elasticsearch
Both Apache Solr and Elasticsearch support synchronous and asynchronous replication models. If replication is configured in ‘synchronous’ mode, the primary (leader) shard waits for successful responses from the replica shards before acknowledging the request. In ‘asynchronous’ mode, the response is returned to the client as soon as the request has executed on the primary or leader shard, and the request is forwarded to the replicas asynchronously.
The table below depicts the replication concept followed in Solr and Elasticsearch.
Replication handling in Apache Solr and Elasticsearch:
S1 | Node 1 | Shard 1 of the cluster
S2 | Node 2 | Shard 2 of the cluster
R1 | Node 3 | Replica 1 of Shard 1
R2 | Node 4 | Replica 1 of Shard 2
R3 | Node 5 | Replica 2 of Shard 1
R4 | Node 6 | Replica 2 of Shard 2
Amazon CloudSearch
Amazon CloudSearch is simple and refined in how it handles replication, which streamlines the job of search engineers and administrators. When configuring scaling, Amazon CloudSearch prompts for the desired replication count, which should be based on load requirements.
Amazon CloudSearch automatically scales the replicas for a domain up and down based on request traffic and data volume, but never below the desired replication count. The replication scaling option can be changed at any time. If the scaling requirement is temporary (for example, anticipated spikes from a seasonal sale), the desired replication count of the domain can be pre-scaled and then reverted after the request volume returns to a steady state. Modifying the replication count does not require any index rebuilding, but the time for replicas to sync depends on the size of the search index.
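For example, pre-scaling for a seasonal spike could be scripted with boto3; the domain name and values are placeholders:

import boto3

cs = boto3.client("cloudsearch")
cs.update_scaling_parameters(
    DomainName="my-domain",
    ScalingParameters={
        "DesiredInstanceType": "search.m3.large",
        "DesiredReplicationCount": 3,
        "DesiredPartitionCount": 1,
    },
)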
The benefits of the Amazon CloudSearch replication model are:
• Search instance capacity is automatically replicated and load is distributed, so the search layer is robust and highly available at all times.
• Improved fault tolerance: if one of the replicas goes down, the other replica(s) continue to handle requests while the failed replica recovers.
• The entire process of scaling and distribution is automated, avoiding manual intervention and support.
Conclusion
All three search engines have a good foundation for replication. Apache Solr and Elasticsearch let you define your own replication topology, which can be configured for synchronous or asynchronous replication, and can be scaled manually, or automatically by writing custom programs, based on application requirements. However, substantial managed-service operations are required when cluster replication is set up at enterprise scale.
Amazon CloudSearch fully manages replication, handling scaling, load distribution, and fault tolerance. This simplicity saves operations costs for enterprises and companies.
Failover is a back-end operation that switches to secondary or standby nodes in the event of a primary server failure. Failover is an important fault tolerance function for systems with low or zero downtime requirements.
Apache Solr and Elasticsearch
When an Apache Solr or Elasticsearch cluster is built with shards and replicas, the cluster inherently becomes fault-tolerant and mechanically supports failover.
During any failure, the cluster is expected to keep supporting operations while the failed node is put into a recovery state. Both the Apache Solr and Elasticsearch documentation strongly recommend a distributed cluster setup to protect the user experience from application or infrastructure failure.
If all of the nodes storing a shard and its replicas fail, client requests will also fail. If the shards are configured to be tolerant, partial results can be returned from the available shards. This behavior is the same in both Apache Solr and Elasticsearch.
The representation below depicts how failover is handled in a cluster. This flow is applicable to both Solr and Elasticsearch.
Node | Replica 1 | Replica 2
SHARD 1 – Node Number 1 | SHARD 1 FIRST REPLICA – Node Number 3 | SHARD 1 SECOND REPLICA – Node Number 5
SHARD 2 – Node Number 2 | SHARD 2 FIRST REPLICA – Node Number 4 | SHARD 2 SECOND REPLICA – Node Number 6
The table below illustrates failure scenarios in a search cluster.
Scenario A | If SHARD 1 fails, then one of its replica nodes, either Node number 3 or Node number 5, is chosen as leader.
Scenario B | If SHARD 2 fails, then one of its replica nodes, either Node number 4 or Node number 6, is chosen as leader.
Scenario C | If SHARD 1 REPLICA 1 fails, then Shard 1 Replica 2 continues to support replication as well as serve requests.
Scenario D | If SHARD 2 REPLICA 1 fails, then Shard 2 Replica 2 continues to support replication as well as serve requests.
Elasticsearch uses its internal Zen Discovery to detect failures: if the node holding a primary shard dies, a replica is promoted to primary. Apache Solr uses Apache ZooKeeper for coordination, failure detection, and leader election; ZooKeeper initiates the leader election process among the replicas when a leader/primary shard fails.
Amazon CloudSearch
Amazon CloudSearch has built-in failover support, and recommends its scaling and availability options to increase fault tolerance in the event of a service disruption or node failure.
When Multi-AZ is turned on, Amazon CloudSearch provisions the same number of instances for your search domain in a second availability zone within the region. The instances in the primary and secondary zones are each capable of handling the full load in the event of a failure.
If there is a service disruption or failure in one availability zone, traffic is automatically redirected to the secondary availability zone. In parallel, Amazon CloudSearch self-heals the failed part of the cluster, and Multi-AZ restores the nodes without any administrative intervention. During this switch, in-flight queries might fail and will need to be retried from the front-end application.
Failover support can also be improved by increasing the partitions and replicas in the Amazon CloudSearch scaling options. If one of the replicas or partitions fails, the other nodes (replica or partition) handle requests while it is recovered.
Amazon CloudSearch is very sophisticated in its failure handling: node health is continuously monitored, and in the event of infrastructure failures the nodes are automatically recovered or replaced.
Conclusion
Failover can be architected by applying techniques like replication, sharding, service discovery, and failure detection. Apache Solr and Elasticsearch advocate building your search system in ‘cluster mode’ to address failover; they take on that responsibility through service discovery, which detects unhealthy nodes, maintains the cluster state, and rebalances the search cluster when node failures are detected.
Amazon CloudSearch supports failover for single-node as well as cluster-mode domains. Behind the scenes, CloudSearch continuously monitors the health of the search instances, and they are automatically managed during failures.
The ability to scale in terms of computing power, memory, or data volume is essential in any data- and traffic-bound application. Scaling is a significant design principle employed to improve performance, load balancing, and high availability.
Over time, the search cluster is expected to be scaled horizontally (scale out) or vertically (scale up), depending on needs. Scale-up is the process of moving from a small server to a larger server; scale-out is the process of adding more servers to handle the load. The scaling strategy should be selected based on application requirements.
Apache Solr and Elasticsearch
Scaling an Apache Solr or Elasticsearch application involves manual processes. These can range from a simple server addition to advanced tasks like cluster topology changes, storage changes, or infrastructure upgrades.
For vertical scaling, the search cluster has to go through processes like new setup and configuration, downtime, and node restarts. For horizontal scaling, the process may involve re-sharding, rebalancing, or cache warming.
While a search cluster can benefit from powerful hardware, vertical scaling has its own limitations. Upgrading the infrastructure specification of the same server can involve tasks like:
• New setup
• Backup
• Downtime
• Application re-testing
Scaling out is generally an easier task than scaling up.
An expert search administrator (Apache Solr or Elasticsearch) is usually assigned to keep a close watch on the performance of the search servers; infrastructure and search metrics play a key role in the administrator’s decision making. When these metrics pass the threshold of a particular server and start affecting overall performance, new server(s) have to be manually spawned. The scaling task can also expand to index partitioning, auto-warming, caching, and re-routing/distribution of search queries to the new instances. It requires a Solr expert on your team to identify and execute this activity periodically.
Sharding and Replication
Though scaling up, scaling out, and scaling down involve manual work, technology-driven companies automate the process by developing custom programs. These smart programs continuously monitor the cluster and make elastic scaling decisions; the outcome is quite similar to what AWS Auto Scaling offers.
In terms of administration functionality, both Apache Solr and Elasticsearch offer two scaling techniques: sharding and replication.
Sharding (partitioning) is a method in which a search index is split into multiple logical units called ‘shards’. If the indexed documents exceed a collection’s physical capacity, administrators recommend sharding. When sharding is enabled, search requests are distributed to every shard in the collection, and the results are individually collected and then merged.
The other scaling technique, replication (see 8.1 Replication, discussed in detail above), allows adding new servers with redundant copies of your index data, to handle higher concurrent query loads by distributing requests across multiple nodes.
Amazon CloudSearch
Amazon CloudSearch is a fully managed search service; it scales up and down seamlessly as the amount of data or query volume changes. CloudSearch can scale on data volume or on request traffic. When the search data volume increases, CloudSearch scales from a smaller search instance type to a larger one; if the capacity of the largest instance type is exceeded, CloudSearch partitions the search index across multiple search instances (the sharding technique).
When traffic and concurrency grow, Amazon CloudSearch deploys additional search instances (replicas) to support the traffic load. This automation removes the complexity and manual labour of the scaling-out process. Conversely, when traffic drops, Amazon CloudSearch scales your search domain down by removing the additional search instances, to minimize costs.
The Amazon CloudSearch management console allows users to configure the desired partition count and the desired replication count, and the instance type can be changed (scaling up) at any time. This inherent elastic scaling is one of the most important points in favor of Amazon CloudSearch.
Conclusion
Scaling in search is implemented in the form of sharding and replication. All three search engines have strong scaling support for setting up the search tier in ‘cluster mode’.
Scaling in Apache Solr and Elasticsearch often requires administration, as there is no hard and fast rule. Techniques like elastic scaling can be implemented only up to a limit; when the cluster grows further, manual intervention and judgment are required. Vertical scaling in Apache Solr and Elasticsearch is even more delicate: it requires individual management of the nodes in the cluster, executed with techniques like rolling restarts and custom scripts.
Amazon CloudSearch takes all of these operational intricacies away from administrators. The desired partition count and desired replication count options in CloudSearch automatically scale the domain up and down based on data volume and request traffic. This saves a lot of effort and cost in operations and management.
At times, a search system may not have support for a specific feature or a built-in integration with other systems. In such cases, most open source software allows developers to customize and extend the desired features as plugins, extensions, or modules. Often, the developer community shares extension libraries that address a practical need; these libraries can be customized and integrated with the system.
Apache Solr and Elasticsearch
Apache Solr and Elasticsearch share the same open source lineage, and both allow customization of:
• Analyzers
• Filters
• Tokenizers
• Language analysis
• Field types
• Validators
• Fall-back query analysis
• Alternate custom query handlers
Since both products are open source, developers can customize or extend the libraries to fit the required feature modifications through plugins and libraries. Build and deployment become the developer’s responsibility after extending the code base.
Apache Solr and Elasticsearch have many plugin extensions that allow developers to add custom functionality for a variety of purposes. These plugins are configured as special libraries that the application references through configuration mapping.
Amazon CloudSearch
Amazon CloudSearch does not allow any customization. The search features in Amazon CloudSearch are offered by AWS after careful thought and collective feedback from customers, and the Amazon CloudSearch team continually evaluates new features and rolls them out proactively.
Conclusion
Amazon CloudSearch has a highly capable feature set for building search systems. However, if you anticipate heavy customization of your search functionality, Apache Solr or Elasticsearch are better choices, since their core search libraries are open source. It is also important to note that any customization of the core libraries leaves the build and deployment process to the developer, and the customization has to be maintained across every version upgrade or new release of your search engine.
Client libraries are required for communicating with search engines. They are essential for developers, as they handle the details of connecting to the search engine and allow applications to interact through easy, high-level methods.
Apache Solr
Apache Solr has open source API clients for interacting with Solr through simple high-level methods. Client libraries are available for PHP, Ruby, Rails, AJAX, Perl, Scala, Python, .NET, and JavaScript.
Elasticsearch
Elasticsearch provides official clients for Groovy, JavaScript, .NET, PHP, Perl, Python, and Ruby. There are also community-provided client libraries that can be integrated with Elasticsearch.
RESTful API
Besides the official and open source client APIs, Elasticsearch and Apache Solr can be integrated using their RESTful APIs. A REST client can be a typical web client developed in the preferred programming language, or even invoked from the command line.
Amazon CloudSearch
Amazon CloudSearch exposes a RESTful API for configuration, document service, and search:
• The configuration API is used for CloudSearch domain creation, configuration, and end-to-end management.
• The document service API enables the user to add, replace, or delete documents in the Amazon CloudSearch domain.
• The search API is used to send search or suggestion requests to the Amazon CloudSearch domain.
Alternatively, AWS also provides a downloadable SDK package, which simplifies coding. The SDK is available for popular languages like Java, .NET, PHP, Python, and more. The SDK APIs cover most Amazon Web Services, including Amazon S3, Amazon EC2, CloudSearch, and DynamoDB, and the package includes the AWS library, code samples, and documentation.
From an overall perspective, cost is a very important factor, and companies always look for ways to reduce Total Cost of Ownership (TCO). In this section, we will look at the cost components of these three search engines.
Apache Solr and Elasticsearch
The cost factors for Apache Solr and Elasticsearch include infrastructure resource costs, managed services costs, and people costs. For any type of deployment, server costs and engineering costs are unavoidable; the commitment to continuous admin operations depends on application requirements and criticality.
Amazon CloudSearch
The Amazon CloudSearch cost components also include server costs and engineering costs, which are essential for any search deployment, as with the other two. Being a fully managed service, Amazon CloudSearch covers the managed services as part of the server costs. Also, Amazon CloudSearch does not charge anything up front; customers are billed at the end of the month based on CloudSearch usage.
Conclusion
|
The net operating costs are essentially the same across all three search engines, but people costs will be around 30% higher for self-managed Apache Solr or Elasticsearch compared to Amazon CloudSearch.
For example, a highly critical search application will require 24x7 support and managed services. This cost is incurred as part of managed services, which is an additional cost in Apache Solr and Elasticsearch deployments.
Search is an indispensable feature in most business applications.
Apache Solr and Elasticsearch are time-proven solutions. Many larger organizations have used Apache Solr and Elasticsearch for years, but are now looking for greater operational efficiency and cost effectiveness. Other companies are looking for innovative ways to grow their businesses and provide value. In recent years, a huge number of technology companies have started to reap the benefits of cloud-based search services, mainly in terms of getting started easily and then accommodating growth without needing to switch vendors. When scalability, cost, and speed-to-market are primary concerns, we recommend using some form of cloud service. And if you want to enjoy the benefits of a cloud solution built on the architecture of Apache Solr, we recommend Amazon CloudSearch.