Wednesday, March 6, 2013

Apache SolrCloud on Amazon EC2 FAQ

  1. How are the Solr replicas assigned to shards?
    1. When a new Solr server is added, it is assigned to the shard with the fewest replicas, tie-breaking on the lowest shard number.
  2. Why should I list every ZooKeeper server in the -DzkHost parameter?
    1. The ZooKeeper client inside each Solr instance holds a TCP connection to one server from the list. When that ZooKeeper server goes down, the connection is lost and the client reconnects to another server in the list. Listing every server therefore keeps the SolrCloud cluster fault tolerant even when one or more ZooKeeper servers are down, provided the ensemble still has a quorum.
  3. What is a Solr transaction log?
    1. It is an append-only log of write operations maintained by each node. It records all write operations performed on an index between two commits. Anytime the indexing process is interrupted, any uncommitted updates can be replayed from the transaction log.
  4. When does the old-style replication kick in?
    1. When a Solr machine is added to the cluster as a replica, it must synchronize with its shard. If the replica is more than 100 updates behind, old-style master-slave replication kicks in. Otherwise, the transaction log is replayed to bring the replica up to date.
  5. How is load balancing performed on the Solr client side?
    1. The Solr client uses LBHttpSolrServer, a simple round-robin implementation. Note that it should NOT be used for indexing.
  6. What will happen if the entire ZooKeeper ensemble goes down or quorum is not maintained?
    1. Each SolrCloud instance keeps a local cache of the cluster configuration it receives from ZooKeeper. When a search request arrives, the Solr instance reads the cluster information from this local cache and executes the query, so search requests do not need the ZooKeeper ensemble to be running. Bear in mind that any new instances added to the cluster during the outage will not be visible to the other instances.
    2. A write (index) request is more complicated. An index write operation results in a new Lucene segment being added or existing Lucene segments being merged, and this information has to be sent to ZooKeeper. Each Solr server must report to ZooKeeper which cores it has installed. Each hosts file is of the form host_version. It is the responsibility of each Solr host/server to match the state of the cores_N file: each Solr server must install the cores defined for it and, after a successful install, write its hosts file out to ZooKeeper. Hence, an index write operation always needs the ZooKeeper ensemble to be running.
  7. Can the ZooKeeper cluster be dynamically changed?
    1. The ZooKeeper cluster cannot easily be changed dynamically, although this is on the ZooKeeper roadmap. A workaround is to do a rolling restart.
  8. Is there a way to find out which Solr instance is a leader and which one is a replica?
    1. The Solr 4.0 Admin console shows these roles graphically.
  9. How much time does it take to add a new replica to a shard?
    1. For a shard leader index size of 500MB, an m1.medium EC2 cold-start replica instance takes about 30 seconds to synchronize its index with the leader and become part of the cluster.
  10. Is it possible to ensure that the SolrCloud cluster is HA even when an entire AZ goes out of action?
    1. If the cluster must keep running while an entire AZ is down, the simplest and recommended approach is to place all leaders in one AZ and the replicas in other AZs, with the ZooKeeper ensemble spanning AZs.
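
The assignment rule from question 1 (fewest replicas first, lowest shard number on a tie) can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not Solr's actual code; the function name assign_shard and the dictionary layout are made up for the example.

```python
def assign_shard(replica_counts):
    """Pick the shard a newly added node would join.

    replica_counts maps shard number -> current number of replicas.
    The shard with the fewest replicas wins; ties go to the lowest
    shard number, as described in question 1.
    """
    if not replica_counts:
        raise ValueError("at least one shard is required")
    # Sort key: (replica count, shard number), so the tie-break is built in.
    return min(replica_counts, key=lambda shard: (replica_counts[shard], shard))


# Hypothetical cluster: shard 1 has two replicas, shards 2 and 3 have one each.
counts = {1: 2, 2: 1, 3: 1}
chosen = assign_shard(counts)  # shard 2: fewest replicas, lowest number on the tie
counts[chosen] += 1
```

Calling assign_shard again on the updated counts would pick shard 3, since it is then the only shard with a single replica.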
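
The choice described in question 4 comes down to a single threshold check. The Python sketch below assumes the replica knows how many updates it has missed; the names PEER_SYNC_LIMIT and recovery_strategy are invented for illustration and are not Solr identifiers.

```python
# Threshold taken from the answer to question 4.
PEER_SYNC_LIMIT = 100


def recovery_strategy(missed_updates):
    """Return how a replica would catch up with its shard.

    More than PEER_SYNC_LIMIT missed updates -> full old-style replication.
    Otherwise -> replay of the transaction log.
    """
    if missed_updates < 0:
        raise ValueError("missed_updates cannot be negative")
    if missed_updates > PEER_SYNC_LIMIT:
        return "replication"
    return "tlog_replay"


for missed in (0, 100, 101, 5000):
    print(missed, recovery_strategy(missed))
```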
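
To make the round-robin idea from question 5 concrete, here is a minimal Python sketch of a client-side balancer. It only mimics the rotation; it is not LBHttpSolrServer or any SolrJ API, and the class name and server URLs are placeholders.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation, one per request."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the list endlessly in the original order.
        self._rotation = cycle(servers)

    def next_server(self):
        return next(self._rotation)


# Placeholder URLs; each query goes to the next server in turn.
balancer = RoundRobinBalancer([
    "http://solr1:8983/solr",
    "http://solr2:8983/solr",
    "http://solr3:8983/solr",
])
targets = [balancer.next_server() for _ in range(4)]
# The fourth request wraps around to the first server again.
```

A rotation like this spreads search traffic evenly across servers, which fits the warning above that this kind of balancing is meant for queries and not for indexing.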
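
Question 6 depends on whether the ZooKeeper ensemble still has a quorum, which means a strict majority of its servers is alive. The Python sketch below combines that majority rule with the behavior described in the answer (searches served from cached cluster state, index writes requiring ZooKeeper). The function names are hypothetical, and the search/index rules simply restate the answer above.

```python
def has_quorum(ensemble_size, live_servers):
    """A ZooKeeper ensemble has a quorum when a strict majority is alive."""
    if ensemble_size <= 0 or not 0 <= live_servers <= ensemble_size:
        raise ValueError("invalid ensemble size or live server count")
    return live_servers > ensemble_size // 2


def allowed_operations(ensemble_size, live_servers):
    """Which request types can proceed, per the answer to question 6."""
    ops = ["search"]  # served from each node's cached cluster state
    if has_quorum(ensemble_size, live_servers):
        ops.append("index")  # index writes need a working ensemble
    return ops


# A three-server ensemble tolerates one failure, but not two.
print(allowed_operations(3, 2))
print(allowed_operations(3, 1))
```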

Original article was authored by vijay. He can be reached @
