RavenDB to Elasticsearch replication for real-time reporting and data analysis
RavenDB is a document database written in C#. It's very easy to work with, and it serves as a great replacement for SQL databases and ORMs for many types of applications. But reporting and ad-hoc querying are not among its strong suits, which is why the official recommendation is to use tools like SQL Server to create reports on data coming from RavenDB.
Elasticsearch is a search server built around Lucene, and is getting a lot of traction lately thanks to its amazing capabilities in the field of real-time analytics. With its built-in features for performing real-time aggregations on huge amounts of data, it is an awesome tool for live reporting on data. It is also open-source and completely free.
While replicating data from RavenDB to SQL Server or the like does make sense, every report can take a while to generate. Replicating to Elasticsearch provides a real-time view of the data, and fast reporting capabilities on it. Elasticsearch also provides better tools for analysing the data, but that's for a different post.
Now that we've established it makes a lot of sense to use Elasticsearch as a reporting tool, how do we get data to it from a RavenDB database?
The RavenDB.ElasticsearchReplication Bundle
Meet the RavenDB.ElasticsearchReplication Bundle, available here.
A Bundle is a RavenDB plugin. What it does, in short, is integrate with RavenDB at a low level so it can get notified on every change made to every database hosted on the RavenDB server.
For each such notification, one or more replication scripts are executed. Each script tells RavenDB which documents it expects to work on, and then maps the data from those documents to an Elasticsearch document. This mapping process is very similar to the way the RavenDB indexing process works, only this one uses a JavaScript-based map function.
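To make the mapping step concrete, here is a minimal sketch of what such a map function does. It is written in Python purely for illustration (the bundle's real scripts are JavaScript), and the document shape, the `orders/` ID prefix, the field names and the `map_order` helper are all assumptions made up for this example rather than anything taken from the bundle itself.

```python
from typing import Any, Dict, Optional


def map_order(doc_id: str, doc: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Map one order document (hypothetical shape) to a flat reporting document."""
    # A script only works on the documents it expects; here that is
    # approximated by an ID prefix check (an assumption for this sketch).
    if not doc_id.startswith("orders/"):
        return None

    lines = doc.get("Lines", [])
    return {
        "id": doc_id,
        "customer": doc.get("CustomerId"),
        "created_at": doc.get("CreatedAt"),
        # Flatten nested order lines into fields that are easy to aggregate on.
        "products": [line["Product"] for line in lines],
        "total": sum(line["Price"] * line["Quantity"] for line in lines),
    }
```

The point of the flattening is that the reporting side gets one self-contained document per order, with the totals already computed, instead of a nested structure it would have to unpick at query time.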
After the mapping is done, the bundle posts those documents to an Elasticsearch cluster. The bundle supports specifying multiple known nodes from the cluster, which gives it full high-availability support.
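The posting step can be pictured roughly like this. The sketch below builds a payload for Elasticsearch's `_bulk` endpoint and tries a list of known nodes in order until one accepts it. It is not the bundle's actual code: `build_bulk_body`, `post_with_failover` and the injected `send` callable are hypothetical names, and the real transport and retry policy may well differ.

```python
import json
from typing import Callable, Dict, Iterable, List, Tuple


def build_bulk_body(index: str, docs: Iterable[Tuple[str, Dict]]) -> str:
    """Build a newline-delimited JSON payload for the _bulk endpoint."""
    lines: List[str] = []
    for doc_id, source in docs:
        # One action line followed by one source line per document.
        # Note: older Elasticsearch versions also expect a "_type" here.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk payload has to end with a newline.
    return "\n".join(lines) + "\n"


def post_with_failover(
    nodes: List[str], body: str, send: Callable[[str, str], str]
) -> str:
    """Try each known node in order and return the first successful response."""
    last_error = None
    for node in nodes:
        try:
            return send(node + "/_bulk", body)
        except OSError as err:  # connection refused, timeout, and similar
            last_error = err
    raise ConnectionError("no Elasticsearch node reachable") from last_error
```

Passing `send` in as a parameter keeps the sketch independent of any particular HTTP client; in practice it would wrap whatever client performs the POST.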
Once the data is there, we can start generating reports from it, or use Kibana to create nice-looking dashboards.
Full source code, demo and usage instructions are available here.
Kibana Dashboards
Kibana is a nice UI for building dashboards on top of Elasticsearch. This comes in handy when you want to provide reports on data. Kibana lets you create interactive, nice-looking reports on huge amounts of data, in real time.
Running the Demo app provided with the bundle will generate a Kibana dashboard similar to this:
This dashboard shows a histogram of shopping carts and orders made in an imaginary system, and can give you a good idea of how things are working in real time. If something fails, you will see a drop in the histogram. And by zooming out you can easily spot trends, even with the naked eye.
There is also the running total for the time period selected, and obviously you can change it by clicking on the Time Picker (upper menu, right) or selecting a range from the histogram. It is also easy to find the Top Customers and Top Products.
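For readers who want to build the same kind of report without Kibana, the panels described above correspond roughly to an Elasticsearch search request like the one sketched here. This is an illustration only: the field names (`created_at`, `total`, `customer`, `products`) and the `dashboard_query` helper are assumptions, and it is not necessarily how Kibana builds its own panels.

```python
from typing import Any, Dict


def dashboard_query(start: str, end: str, interval: str = "1h") -> Dict[str, Any]:
    """Build a search body covering the histogram, total and top-N panels."""
    return {
        # Only the aggregations are needed, not the matching documents.
        "size": 0,
        # The selected time period, as picked in the Time Picker.
        "query": {"range": {"created_at": {"gte": start, "lte": end}}},
        "aggs": {
            # Histogram of orders over time. Newer Elasticsearch versions
            # replace "interval" with "fixed_interval"/"calendar_interval".
            "orders_over_time": {
                "date_histogram": {"field": "created_at", "interval": interval}
            },
            # Total for the selected time period.
            "running_total": {"sum": {"field": "total"}},
            # Top Customers and Top Products.
            "top_customers": {"terms": {"field": "customer", "size": 10}},
            "top_products": {"terms": {"field": "products", "size": 10}},
        },
    }
```

The resulting dictionary can be serialized to JSON and sent to the `_search` endpoint of the index holding the replicated documents.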
At the bottom of the dashboard you can see the actual documents that matched in Elasticsearch to generate this dashboard.
There are more ways to see trends (like the terms stats facet, or the significant terms aggregation). Some may be a bit harder to do with the current version of Kibana, but once the data is in Elasticsearch there's practically no limit to what you can do with it.
While this looks quite impressive, this is probably the easiest dashboard to generate with Kibana. Depending on your data model and patience you can come up with much better dashboards that better suit your business requirements.
Kibana dashboards can give awesome insight into your data, in real time, and it doesn't really matter how much data you generate.
Setting up
RavenDB.ElasticsearchReplication is a RavenDB Bundle (read: plugin) that, once installed on the server, enables replicating data from RavenDB to an Elasticsearch cluster (or multiple clusters).
The current version requires the latest version of RavenDB 2.5. Install the plugin by adding it to the Plugins folder of the server, and then set up the replication configuration documents.
Further instructions and a working demo (including a self-hosted Kibana instance) can be found on GitHub.