Realtime Search: Solr vs Elasticsearch

May 31, 2011

What is Elasticsearch?

Elasticsearch is a REST-based, distributed search engine powered by the excellent Lucene library. The built-in JSON + HTTP API provides an elegant platform that is easy to integrate with (for example, via the elastic_searchable Ruby gem). It’s simple, scalable and “cool, bonsai cool”.

Why is it better than Solr?

First of all, let’s set the record straight: Solr is fast. I’m serious… it’s really fast! Solr is the de facto search engine for a reason. It’s stable, reliable and, out of the box, it outperforms nearly every search solution for basic vanilla searches (including Elasticsearch).

Unfortunately, it is really easy to break Solr as well. All it takes is performing searches while concurrently updating the index with new content. This is a pretty serious problem if you need to update your search index regularly.

Now throw a few million documents into the index and Solr will be buckling at the knees while Elasticsearch doesn’t break a sweat!

It is painfully apparent that Solr’s architecture was not built for realtime search applications. The demands of realtime web applications require delivery of updates in near realtime as new content is generated by users. The distributed nature of Elasticsearch allows it to keep up with concurrent search + index requests without skipping a beat.

Real-world Results…

After transitioning our search infrastructure from Solr to Elasticsearch, we saw an instant ~50x improvement in search performance!

And now for something a bit more interesting…

The typical realtime search architecture goes something like this:

  1. index user content into the search engine
  2. perform set of queries against search engine to determine if content matches particular criteria
  3. perform specific logic notifying registered channels that new content is available
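To make the three steps concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `ToySearchEngine` is a hypothetical in-memory stand-in (not a Solr or Elasticsearch client), and `poll_and_notify` is a made-up helper that re-runs every saved query and notifies each channel about matches it has not seen yet.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class ToySearchEngine:
    """Hypothetical in-memory stand-in for a search engine (illustration only)."""
    docs: List[str] = field(default_factory=list)

    def index(self, text: str) -> None:
        # Step 1: index user content.
        self.docs.append(text)

    def search(self, terms: List[str]) -> List[str]:
        # Naive matcher: a document matches when it contains every term.
        return [d for d in self.docs
                if all(t in d.lower().split() for t in terms)]


def poll_and_notify(engine: ToySearchEngine,
                    saved_queries: Dict[str, List[str]],
                    seen: Set[Tuple[str, str]],
                    notify: Callable[[str, str], None]) -> None:
    # Step 2: run every saved query against the engine.
    for channel, terms in saved_queries.items():
        for doc in engine.search(terms):
            # Step 3: notify the channel once per new match.
            if (channel, doc) not in seen:
                seen.add((channel, doc))
                notify(channel, doc)
```

Note that the application has to call `poll_and_notify` again after every update, and every call re-runs every saved query, whether or not anything new matches.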

Elasticsearch can support this model quite well, but it also offers a feature that turns this entire workflow on its head.

Introducing: Percolation!

Elasticsearch percolation is similar to webhooks. The idea is to have Elasticsearch notify your application when new content matches your filters instead of having to constantly poll the search engine to check for new updates.

The new workflow looks like this:

  1. register specific query (percolation) in Elasticsearch
  2. index new content (passing a flag to trigger percolation)
  3. the response to the indexing operation will contain the matched percolations
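The percolation workflow can be sketched the same way. This is a toy model of the idea only, not Elasticsearch's actual API: `ToyPercolator`, `register_query`, and the `percolate` flag are hypothetical names, and the "all terms must appear" matching rule is an assumption chosen to keep the example short.

```python
from typing import Dict, List


class ToyPercolator:
    """Hypothetical in-memory model of percolation (not the Elasticsearch API)."""

    def __init__(self) -> None:
        self.queries: Dict[str, List[str]] = {}
        self.docs: List[str] = []

    def register_query(self, name: str, terms: List[str]) -> None:
        # Step 1: register a named query ahead of time.
        self.queries[name] = [t.lower() for t in terms]

    def index(self, text: str, percolate: bool = False) -> List[str]:
        # Step 2: index new content, optionally asking for percolation.
        self.docs.append(text)
        if not percolate:
            return []
        words = set(text.lower().split())
        # Step 3: the indexing response carries the names of matching queries.
        return sorted(name for name, terms in self.queries.items()
                      if all(t in words for t in terms))


percolator = ToyPercolator()
percolator.register_query("es-fans", ["elasticsearch"])
percolator.register_query("solr-fans", ["solr"])
matches = percolator.index("Elasticsearch percolation rocks", percolate=True)
```

The point of the design is that the application learns about matches as a side effect of indexing, so there is no separate polling loop to run.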

This is the perfect architecture for realtime search and a true gamechanger.

The Bottom Line

Solr may be the weapon of choice when building standard search applications, but Elasticsearch takes it to the next level with an architecture for creating modern realtime search applications. Percolation is an exciting and innovative feature that singlehandedly blows Solr right out of the water. Elasticsearch is scalable, speedy and a dream to integrate with. Adios Solr, it was nice knowing you.

Comments

  • Cool article. Now, i know why I love ES ! ;)

    Commented on May 31, 2011 at 11:26 am
  • Was the ‘Search Fresh Index while Idle’ performed against an elasticsearch 5 shard index (the default setup for a newly created index) or a single shard index?

    Commented on May 31, 2011 at 11:26 am
    • @jrawlings these benchmarks are for the “out of the box” vanilla install of Elasticsearch and Solr so yes, this is using the 5 shard index setting.

      Commented on May 31, 2011 at 11:29 am
  • Elasticsearch is a peach, when it doesn’t break. I’ve had so many nightmares trying to recover from a broken elasticsearch cluster that I wouldn’t recommend it to anyone.

    I guess for small sites it’s ok. For serious business, I’ll stick with solr.

    It would be nice to see a comparison with riaksearch as well.

    Commented on May 31, 2011 at 11:37 am
  • That percolation business is awesome. Webhooks make updating realtime data sources easy, and it’s brilliant that Elasticsearch takes that approach. Thanks for sharing.

    Commented on May 31, 2011 at 11:38 am
  • Good blog post. What were some of the parameters around index sizes (per shard) and commit rates? We have some massive warming times on our solr indexes that requires us to batch our adds before a commit, certainly not a position to be in with real time search though. I can see how without tuning and default cache warming you might run into bunches of overlapping warming searchers.

    Commented on May 31, 2011 at 11:38 am
  • And why not use a master-slave configuration in Solr? Isn’t that the perfect solution for separating add-doc and query operations?

    Commented on June 1, 2011 at 11:39 am
    • @MarcMarc master-slave really isn’t an option for realtime search applications. The current Solr replication solution is not synchronous so once your update operation is complete on the master, the data is not yet available on all slaves for subsequent searches.

      Introducing master-slave for the search index also introduces a lot of operational complexity that if you can avoid, you really should.

      Commented on June 1, 2011 at 11:49 am
  • Ryan, what was the commit strategy you used with Solr? Commit after each request, autocommit after X secs, autocommit after X docs? This can greatly impact update performance. See http://wiki.apache.org/solr/SolrPerformanceFactors#Updates_and_Commit_Frequency_Tradeoffs, http://blog.raspberry.nl/2011/04/08/solr-update-performance/ and http://www.elevatedcode.com/articles/2009/01/14/speeding-up-solr-indexing/

    Commented on June 1, 2011 at 11:39 am
    • @vlad we require all content to be immediately available for searches after indexing, so we commit after each update operation. This is the nature of the beast when building a true realtime search application and, as you point out, is not the “preferred” way to integrate with Solr.

      Commented on June 1, 2011 at 11:50 am
  • Nice post. You’ll need to compare ES and Solr once Solr starts making use of the underlying Lucene NRT mechanism.

    Just to make it clear to readers not familiar with the underlying details:
    It is Lucene that adds the NRT support. ES uses it, while Solr does not use it yet, which is different from Solr using the same Lucene API as ES and doing it/still performing poorly.

    Commented on June 1, 2011 at 11:40 am
  • Being a Xapian fan as of many years I’d love to see Xapian benchmarked against ES.

    Commented on June 1, 2011 at 11:41 am
  • What’s the difference between “search fresh index” and “search full index”?

    Were you running Solr and ElasticSearch on the same hardware?

    Commented on June 1, 2011 at 11:41 am
    • @andy the fresh index benchmarks are done against an empty/clean index. the “full index” benchmarks were done after populating the index with a few million documents. The index is never technically “full”, but it was just a quick way of getting more realistic and real world benchmarks.

      Commented on June 1, 2011 at 11:53 am
  • Interesting that umad says he had so many issues with broken clusters, that he stopped recommending ES for production usage. We’ve been running in production for 6 months with significant traffic volume on behalf of demanding clients.

    There have been some nice robustness improvements in ES 0.16

    We evaluated Solr vs ES and for our data with a wide range of queries, ES was significantly faster than Solr. Tuning Solr is challenging.

    David

    Commented on June 7, 2011 at 11:42 am
  • Solr doesn’t support GeoPolygons either, so if you need spatial searches look to ElasticSearch.

    Commented on August 24, 2011 at 11:42 am
  • Field collapsing (grouping, or whatever you call it) is still awaited in ES, but exists in Solr.

    This is a must-have feature in some particular use cases (think about SKUs in an index where search results must be products, not SKUs).

    Commented on September 16, 2011 at 11:42 am
  • Ryan –
    Nice blog, I was looking at comparing Solr to other products like FAST and this gives me a good example of pros and cons. Most likely Solr does not support a good way to do Federated Searches and there is a limitation for the real-time searching.

    Commented on November 7, 2011 at 11:43 am
  • Nice blog! If anyone wants to read about NRT in Solr, which is on its way, you can find more info about it here:
    http://wiki.apache.org/solr/NearRealtimeSearch.
    Seems it would be very interesting to see how Solr performs compared to ES when NRT is added.

    Commented on November 15, 2011 at 11:43 am
  • This article is a nice start, but I agree with other commenters that Solr really deserves to be benchmarked with its NRT commits. Stock commits in 1.4 and 3 just aren’t designed for real-time updates.

    At Websolr, we do actually have support for NRT commits with Solr 3 in our hosted Solr service at http://websolr.com/. You should drop us a line at info@onemorecloud.com if you would like to try it out.

    Commented on November 22, 2011 at 11:44 am
  • “NRT commit” solves it for the single node case, but in the distributed case, you really need a distributed model that supports it. Solr, currently, does not, while elasticsearch does. Check this video for more info: http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-searchengine-berlinbuzzwords.html

    Commented on November 29, 2011 at 11:45 am
  • Hi Ryan,

    Could you please elaborate why percolation is the perfect architecture for realtime search?

    Thanks

    Commented on December 23, 2011 at 11:45 am
  • Awesome article, thanks.

    Commented on March 19, 2012 at 11:46 am
  • I’m trying to decide on either elastic or solr now, this article raised some interesting points for me to consider, thanks!

    Commented on April 16, 2012 at 11:47 am
  • I am looking for a solution to search a large document database (10 million docs). Reading this article was excellent. Congratulations to all!

    Commented on May 22, 2012 at 11:47 am
  • I have suffered a lot trying to configure SOLR, hope ES alleviates my pain!

    Commented on November 20, 2012 at 4:53 am
  • nice article. interested to see how SOLR stacks with NRT. But still nested document feature and native JSON format is an attractive point. percolation has very interesting usages, especially in an application where I am rep, looking for a particular kind of support ticket, I just get notified about my related change with out polling.

    Commented on December 18, 2012 at 5:29 am
  • Ryan, you can try Solr soft commits; it’s better. E.g.:
    soft commit per second / per doc
    hard commit per minute
    It works very well.

    Commented on January 19, 2013 at 2:22 am
  • With the recent release of Solr 4.1 this seems really out of date.
    http://www.gossamer-threads.com/lists/lucene/general/181326

    Can you re-run these test with the new version. I’d be very interested to see how the new NRT, etc. of SolrCloud stacks up.
    http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

    Thanks

    Commented on February 15, 2013 at 11:26 am
  • This is seriously out of date now – Can you update based on Solr 4.2?

    Commented on March 14, 2013 at 1:47 pm
  • Hi Ryan,
    Thanks for your nice post. Though in my experiment, ElasticSearch took more than 20 hours to index 360k simple objects with 3 fields “id”, “title”, “tags”. Solr 4.3 only took 14 minutes.

    You mentioned you have indexed 1 million record in your experiment?

    Thanks and best regards,
    Malix

    Commented on May 13, 2013 at 10:17 am
  • Hi people – Just curious to know if anyone has evaluated the latest release of ES vs. the latest release of Solr! I am trying to understand how much of the above comparisons still hold good.

    Amit

    Commented on May 24, 2013 at 8:57 pm
  • Somebody please update this study with the latest binaries from both sides, I’m sure this question arises frequently enough.

    Commented on September 30, 2013 at 2:41 am
  • How to do a benchmark for the Elasticsearch?

    Commented on April 9, 2014 at 11:30 am


Author Spotlight

Ryan Sonnek, Senior Software Engineer

What is Socialcast?

Socialcast by VMware (NYSE: VMW) is a social network for business uniting people, information, and applications with its real-time enterprise activity stream engine. Behind the firewall or in the cloud, Socialcast enables instant collaboration in a secure environment. Socialcast is headquartered in San Francisco, California. www.socialcast.com