Realtime Search: Solr vs Elasticsearch
What is Elasticsearch?
Elasticsearch is a REST-based, distributed search engine powered by the excellent Lucene library. The built-in JSON + HTTP API provides an elegant platform that is easy to integrate with (for example, through the elastic_searchable Ruby gem). It’s simple, scalable and “cool, bonsai cool”.
Why is it better than Solr?
First of all, let’s set the record straight: Solr is fast. I’m serious… it’s really fast! Solr is the de facto search engine for a reason. It’s stable, reliable and, out of the box, it outperforms nearly every search solution for basic vanilla searches (including Elasticsearch).
Unfortunately, it is really easy to break Solr as well. All it takes is performing searches while concurrently updating the index with new content. This is a pretty serious problem if you need to update your search index regularly.
Now throw a few million documents into the index and Solr will be buckling at the knees while Elasticsearch doesn’t break a sweat!
It is painfully apparent that Solr’s architecture was not built for realtime search applications. The demands of realtime web applications require delivery of updates in near realtime as new content is generated by users. The distributed nature of Elasticsearch allows it to keep up with concurrent search + index requests without skipping a beat.
After transitioning our search infrastructure from Solr to Elasticsearch, we saw an instant ~50x improvement in search performance!
And now for something a bit more interesting…
The typical realtime search architecture goes something like this:
- index user content into the search engine
- perform a set of queries against the search engine to determine if the content matches particular criteria
- perform specific logic notifying registered channels that new content is available
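To make the three steps above concrete, here is a minimal sketch in Python. It is a toy in-memory model, not a Solr or Elasticsearch client: the class `PollingSearch`, its methods, and the channel names are all hypothetical, and "matching" is just a whole-word check.

```python
from collections import defaultdict


class PollingSearch:
    """Toy in-memory stand-in for a search engine (hypothetical, not a real client)."""

    def __init__(self):
        self.documents = []                 # the "index"
        self.saved_queries = {}             # channel name -> list of required terms
        self.notified = defaultdict(list)   # channel name -> ids of matching documents

    def index(self, text):
        # Step 1: index user content and return its document id.
        self.documents.append(text)
        return len(self.documents) - 1

    def search(self, terms):
        # Return the ids of all documents that contain every term as a whole word.
        return [
            doc_id
            for doc_id, text in enumerate(self.documents)
            if all(term in text.lower().split() for term in terms)
        ]

    def poll(self, doc_id):
        # Steps 2 and 3: run every saved query against the index and
        # notify each channel whose query matched the new document.
        for channel, terms in self.saved_queries.items():
            if doc_id in self.search(terms):
                self.notified[channel].append(doc_id)


engine = PollingSearch()
engine.saved_queries = {"solr-news": ["solr"], "es-news": ["elasticsearch"]}
new_id = engine.index("Elasticsearch handles concurrent indexing")
engine.poll(new_id)
print(dict(engine.notified))  # {'es-news': [0]}
```

Notice that `poll` runs every saved query over the whole index each time new content arrives. That extra round of searching is the cost of this model.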
Elasticsearch can support this model quite well, but it also offers a feature that turns this entire workflow on its head.
Elasticsearch percolation is similar to webhooks. The idea is to have Elasticsearch notify your application when new content matches your filters instead of having to constantly poll the search engine to check for new updates.
The new workflow looks like this:
- register a specific query (a percolation) in Elasticsearch
- index new content (passing a flag to trigger percolation)
- the response to the indexing operation will contain the matched percolations
This is the perfect architecture for realtime search and a true game changer.
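The percolation workflow can be sketched the same way. Again, this is a toy in-memory model written in Python, not the Elasticsearch API: `ToyPercolator`, the `percolate` argument, and the shape of the response dictionary are assumptions made for illustration only.

```python
class ToyPercolator:
    """Toy in-memory model of percolation (hypothetical, not the Elasticsearch API)."""

    def __init__(self):
        self.documents = []      # the "index"
        self.percolations = {}   # percolation name -> set of required terms

    def register(self, name, terms):
        # Step 1: register a query ahead of time.
        self.percolations[name] = {term.lower() for term in terms}

    def index(self, text, percolate=False):
        # Step 2: index new content, optionally triggering percolation.
        self.documents.append(text)
        response = {"id": len(self.documents) - 1, "matches": []}
        if percolate:
            words = set(text.lower().split())
            # Step 3: the indexing response carries the matched percolations.
            response["matches"] = sorted(
                name
                for name, terms in self.percolations.items()
                if terms <= words
            )
        return response


engine = ToyPercolator()
engine.register("es-news", ["elasticsearch"])
engine.register("solr-news", ["solr"])
result = engine.index("Elasticsearch handles concurrent indexing", percolate=True)
print(result)  # {'id': 0, 'matches': ['es-news']}
```

The difference from the polling sketch is that only the new document is checked against the registered queries, and the matches come back in the same call that indexed it.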
The Bottom Line
Solr may be the weapon of choice when building standard search applications, but Elasticsearch takes it to the next level with an architecture for creating modern realtime search applications. Percolation is an exciting and innovative feature that single-handedly blows Solr right out of the water. Elasticsearch is scalable, speedy and a dream to integrate with. Adios, Solr, it was nice knowing you.