Why you shouldn't use AWS Elasticsearch Service
Elasticsearch is very widely used today for text and geospatial search, real-time BI dashboards and log analysis. While it is tempting to use a managed Elasticsearch cloud service instead of running your own cluster on your own machines, Amazon's Elasticsearch Service is a bad choice, as bad as it gets in fact, and here is why.
This summary post is based on experience with quite a few customers over the past months. All the companies I've worked with felt the same pains and moved away from the service shortly after hitting some or all of the points below. I list alternatives at the end of this post.
Not adhering to Elasticsearch best practices
AWS opted for creating a hosted Elasticsearch offering but it seems like they are lacking important know-how and real-world experience, as many of their decisions around the service just don't make sense.
Invalid number of master nodes. AWS ES allows you to request dedicated master nodes for your cluster. For any cluster of a non-trivial size that is important to have. However, in the drop-down for picking the number of master nodes to provision you will also find the completely invalid option of 2 master nodes. This setting is known to cause split-brain situations and I can't even guess why they allow it.
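To see why two master nodes is never a sensible choice, it helps to work through the majority arithmetic. The sketch below is only an illustration of the quorum rule (a strict majority of master-eligible nodes); the function names are made up for this post and are not part of any Elasticsearch API.

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority survives."""
    return master_eligible - quorum(master_eligible)


# Compare a few cluster layouts: masters, quorum, failures tolerated.
for n in (1, 2, 3, 5):
    print(n, quorum(n), tolerated_failures(n))
```

With 2 masters the quorum is 2, so losing either node leaves the cluster without a master: you pay for an extra machine and gain no fault tolerance over a single master. The only way to keep such a cluster writable after a failure is to lower the required quorum to 1 (in the Zen discovery versions this is the discovery.zen.minimum_master_nodes setting), and then each side of a network partition can elect its own master, which is exactly the split-brain scenario. Three masters is the smallest layout that tolerates one failure safely.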
Multi AZ support exists but is not enabled by default - probably to save on Regional Data Transfer costs but I'd recommend any real-world cluster to use this affinity feature.
No in-place upgrades. This is the biggest show-stopper. Once you have provisioned Elasticsearch on the AWS service you get a cluster running a specific Elasticsearch version with no ability whatsoever to upgrade it to a newer version. AWS ES doesn't support in-place / rolling version upgrades - which is the easiest and also the recommended way to upgrade. If you find yourself running a version that has a bug, or simply need a feature from a newer version, you have to go through a lengthy process of launching a new cluster, whitelisting it for reindexing, executing the reindexing and then updating all your systems and permissions to point to the new cluster. Not fun.
Backup can only be executed once a day. Backups in Elasticsearch are very cheap to execute and I usually recommend a backup twice an hour or more for critical systems. The best you can achieve with AWS ES is backup in a given hour once a day, and that is a terrible default for a production system.
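On a cluster you control, a twice-an-hour backup is just a scheduled call to the snapshot API (PUT /_snapshot/<repository>/<snapshot name>). Below is a minimal sketch of how a scheduler could derive the request path for each half-hour slot. The repository name my_backup and the snap- naming scheme are assumptions for illustration, and the repository is assumed to be registered already.

```python
from datetime import datetime, timezone


def snapshot_path(repo: str, now: datetime, interval_minutes: int = 30) -> str:
    """Return the snapshot API path for the interval slot containing `now`.

    `interval_minutes` is assumed to divide 60 evenly (e.g. 10, 15, 30).
    """
    slot_minute = now.minute - now.minute % interval_minutes
    slot = now.replace(minute=slot_minute, second=0, microsecond=0)
    # Snapshot names must be lowercase, so stick to digits and dashes.
    name = slot.strftime("snap-%Y%m%d-%H%M")
    return f"/_snapshot/{repo}/{name}"


# Hypothetical repository name; the timestamp is fixed to keep the example stable.
print(snapshot_path("my_backup", datetime(2018, 1, 15, 9, 47, tzinfo=timezone.utc)))
```

Naming snapshots after the slot rather than the exact time means a retried job within the same half hour targets the same snapshot name instead of creating a duplicate.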
Security. It is way too easy to leave your Elasticsearch cluster open to the public, as many security features aren't enforced. Also, and most notably, X-Pack is not supported, and as such the flexibility for doing security right is non-existent. Most systems use Elasticsearch for sensitive data, so this is usually a show-stopper - but one you find out about too late in the process.
Slow in releasing ES versions. Elastic moves fast, and there is a new release every so often with important fixes and useful new features. Most times it takes AWS weeks, and sometimes even months, to make a newer release available in the AWS ES service (but to be fair, that is also the case with many other hosted ES solutions).
Various limitations and blocking behavior. AWS have opted to intervene in cluster operation in some cases, limiting your control over the cluster and disabling some APIs and behaviors. For example, the close/open indices API - which is required for using the Shrink API or for updating certain index settings - is disabled, and trying it will result in an error message saying "Your request is not allowed by Amazon Elasticsearch Service". Similarly, there is the ClusterBlockException, which blocks all cluster write operations when some CloudWatch alerts are triggered - see the docs here. The main problem with such an exception is that it takes quite a while to reset once the underlying issue is fixed, due to the statistical nature of CloudWatch - meaning the index will be blocked for a lot longer than necessary. A much better behavior would have been for AWS to just let ES do its thing.
No extensibility, no visibility
No configurations allowed. There is very (very!) limited support for configuration changes. Out of dozens of important configurations, only a small subset of about 5 options can be changed. As a result, many performance optimizations (e.g. query cache sizes, thread pool sizes) and even standard functionality are simply not available (e.g. reindex.remote.whitelist is not supported, so reindexing from a remote cluster is not possible). This makes many real-world production uses impossible with AWS ES, and it is my second biggest show-stopper to using it.
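To make the reindex.remote.whitelist example concrete: on a cluster where you can edit elasticsearch.yml, you add the old cluster's host:port to that setting and then send a _reindex request whose source has a remote block. The sketch below builds such a request body and mimics the whitelist check on the client side with a simple exact match. The host and index names are hypothetical, and the helper is an illustration rather than a client library call.

```python
import json
from urllib.parse import urlparse


def remote_reindex_body(remote_url, source_index, dest_index, whitelist):
    """Build a _reindex body that pulls `source_index` from a remote cluster."""
    parsed = urlparse(remote_url)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError("remote host must include scheme, host and port")
    host_port = f"{parsed.hostname}:{parsed.port}"
    # The destination cluster rejects remotes missing from
    # reindex.remote.whitelist; failing early here gives a clearer error.
    if host_port not in whitelist:
        raise ValueError(f"{host_port} is not in reindex.remote.whitelist")
    return {
        "source": {"remote": {"host": remote_url}, "index": source_index},
        "dest": {"index": dest_index},
    }


# Hypothetical names: "oldcluster" is the cluster being migrated away from.
body = remote_reindex_body(
    "http://oldcluster:9200", "logs", "logs_v2", whitelist=["oldcluster:9200"]
)
print(json.dumps(body, indent=2))  # body for POST /_reindex on the new cluster
```

Without the ability to set the whitelist on the destination cluster, the request is refused no matter how the body is built, which is why this single missing setting rules out the simplest migration path.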
No logs. There is absolutely no visibility into logs, even though the Elasticsearch logs are often real time savers. The complaints, warnings, GC logs, slow logs and even the info bits are just too precious for any production system to ignore.
No plugins. No support for installing plugins robs you of the ability to use Elasticsearch to its full extent. It's not only about X-Pack (which adds Security and proper Monitoring among other things - which AWS ES completely lacks), but also about the inability to install analyzer plugins, ingest plugins and more - many of which are a crucial part of a fully functioning ES system in the real world.
Zero visibility. AWS ES offers close to zero monitoring and cluster metrics visibility. CloudWatch which is enabled by default to monitor the underlying VMs is far from being enough - it only shows some general machine metrics joined by a few high level cluster metrics. In a real ES installation you want to have much deeper knowledge of your cluster for optimizations in normal operations and to be able to debug efficiently when the cluster starts gagging. That includes a better view of machine metrics, a deeper look at many cluster metrics (caches, thread pools, rates, etc). To start with, every ES cluster should have Kibana Monitoring and a Grafana dashboard installed, as well as a Cerebro instance for seeing current cluster state clearly.
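As an illustration of the kind of cluster metric worth watching, the sketch below sums thread pool rejections per pool from a node stats response (GET /_nodes/stats/thread_pool). Rejections are an early sign that a cluster is saturated. The sample document is hand-written and trimmed to the fields used; it is not output from a real cluster, and the node names are made up.

```python
def rejected_by_pool(node_stats: dict) -> dict:
    """Sum the `rejected` counter of every thread pool across all nodes."""
    totals = {}
    for node in node_stats.get("nodes", {}).values():
        for pool, stats in node.get("thread_pool", {}).items():
            totals[pool] = totals.get(pool, 0) + stats.get("rejected", 0)
    # Only report pools that actually rejected work.
    return {pool: count for pool, count in totals.items() if count > 0}


# Hand-written sample shaped like a node stats response (assumption, trimmed).
sample = {
    "nodes": {
        "node-a": {"thread_pool": {
            "search": {"queue": 12, "rejected": 3},
            "bulk": {"queue": 0, "rejected": 0},
        }},
        "node-b": {"thread_pool": {
            "search": {"queue": 4, "rejected": 2},
            "bulk": {"queue": 50, "rejected": 7},
        }},
    }
}
print(rejected_by_pool(sample))
```

These counters are cumulative since node start, so a dashboard would normally plot the rate of change rather than the raw value.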
Limits on instance types. Some instance types are not supported - limiting your ability to define your cluster size as needed.
There are two main alternatives to running Elasticsearch on AWS:
Elastic Cloud - a hosted solution on AWS, managed and maintained by Elastic, the company behind Elasticsearch. As such, it supports the most recent version immediately following its release, and provides almost all the visibility needed and most of the extensibility you will ever need - including log visibility and uploading your custom plugins. Upgrading to newer versions of ES without any downtime and resizing the cluster on the fly are also supported out of the box.
Running your own Elasticsearch cluster on AWS EC2 will be much cheaper (by a factor of about 2) and will give you full control, visibility and accessibility if you set it up right. Deployment is really easy if done using Terraform (see my Elasticsearch Cloud Deploy repository), and then sizing and monitoring are as easy as with a cloud offering. Nowadays a properly-sized cluster requires almost no attention as long as growth is linear.
Admittedly, the biggest benefit of running AWS ES is the pluggability and seamless integration with other AWS services (like Kinesis Firehose and many others). As far as that goes, some code will obviously need to be written if you run your own cluster or use Elastic's managed solution, but for some use-cases you may be able to find good tools which already do the work. For instance, here is a small tool I wrote to use AWS Lambda to forward events from Kinesis Streams to a non-AWS Elasticsearch cluster: https://github.com/synhershko/lambda-streams-to-elasticsearch.
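To give a feel for how little glue code such forwarding needs, here is a minimal sketch of its core step: turning the batch of Kinesis records a Lambda function receives into a request body for the Elasticsearch bulk API. This is not the code from the repository linked above. It assumes every record payload is a JSON document, and the index name is hypothetical. Sending the body to the cluster's _bulk endpoint (with Content-Type application/x-ndjson) is left out.

```python
import base64
import json


def kinesis_event_to_bulk(event: dict, index: str) -> str:
    """Convert a Kinesis Lambda event into a newline-delimited bulk body."""
    lines = []
    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        doc = json.loads(payload)  # assumption: payloads are JSON documents
        # Reusing the Kinesis event ID as the document ID makes retries
        # overwrite the same document rather than create duplicates.
        action = {"index": {"_index": index, "_id": record["eventID"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


# Hand-built event with a single record, shaped like what Lambda receives.
event = {"Records": [{
    "eventID": "shardId-000000000000:1",
    "kinesis": {"data": base64.b64encode(b'{"msg": "hi"}').decode("ascii")},
}]}
print(kinesis_event_to_bulk(event, "events"), end="")
```

Older Elasticsearch versions also expect a _type in the action line, so adjust the action dictionary to match the version you run.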
I can hardly see a reason to use the AWS Elasticsearch service other than for initial experimenting. Any real-world use-case will find it greatly lacking, while easier and significantly better alternatives (even cheaper ones) definitely exist. Feel free to use any of the tools and pointers linked here, and if you require any further assistance we offer consultancy and development services as an official Elastic partner.