Why you shouldn't use AWS Elasticsearch Service

Elasticsearch is very widely used today for text and geospatial search, real-time BI dashboards and log analysis. While it is tempting to use a managed Elasticsearch cloud service instead of running your own cluster on your own machines, Amazon's Elasticsearch Service is a bad choice, as bad as it gets in fact, and here is why.

This summary post is based on experience with quite a few customers over the past months. All the companies I've worked with felt the same pains and moved away from the service shortly after hitting some or all of the points below. I list alternatives at the end of this post.

Not adhering to Elasticsearch best practices

AWS opted for creating a hosted Elasticsearch offering but it seems like they are lacking important know-how and real-world experience, as many of their decisions around the service just don't make sense.

  1. Invalid number of master nodes. AWS ES allows you to request dedicated master nodes for your cluster, which is important to have for any cluster of non-trivial size. However, the drop-down for picking the number of master nodes to provision also offers the completely invalid option of 2 master nodes. This setting is known to cause split-brain situations and I can't even guess why they allow it.

  2. Multi-AZ support exists but is not enabled by default, probably to save on Regional Data Transfer costs. I'd recommend that any real-world cluster use this affinity feature.

  3. No in-place upgrades. This is the biggest show-stopper. Once you have provisioned Elasticsearch on the AWS service, you get a cluster running a specific Elasticsearch version with no ability whatsoever to upgrade it to a newer one. AWS ES doesn't support in-place / rolling version upgrades, which are the easiest and also the recommended way to upgrade. If you find yourself running a version that has a bug, or simply need a feature from a newer version, you have to go through a lengthy process of launching a new cluster, whitelisting it for reindexing, executing the reindexing and then updating all your system permissions to point to the new cluster. Not fun.

  4. Backup can only be executed once a day. Backups in Elasticsearch are very cheap to execute and I usually recommend a backup twice an hour or more for critical systems. The best you can achieve with AWS ES is backup in a given hour once a day, and that is a terrible default for a production system.

  5. Security. It is way too easy to leave your Elasticsearch cluster open to the public, as many security features aren't enforced. Most notably, X-Pack is not supported, and as such the flexibility for doing security right is non-existent. Most systems use Elasticsearch for sensitive data, so this is usually a show-stopper, but one you find out about too late in the process.

  6. Slow in releasing ES versions. Elasticsearch is moving fast, and new releases with important fixes and useful features come out frequently. It usually takes AWS weeks, and sometimes even months, to make a newer release available in the AWS ES service (but to be fair, that is also the case with many other hosted ES solutions).

  7. Various limitations and blocking behavior. AWS has opted to intervene in the cluster's operation in some cases, limiting your control over the cluster and disabling some APIs and behaviors. For example, the close/open indices API, which is required for using the Shrink API or for updating certain index settings, is disabled, and trying to use it results in an error message saying "Your request is not allowed by Amazon Elasticsearch Service". Similarly, there is the ClusterBlockException, which blocks all cluster write operations when some CloudWatch alerts are triggered (see the AWS docs). The main problem with this exception is that, due to the statistical nature of CloudWatch, it takes quite a while to clear after the underlying issue has been fixed, meaning the index stays blocked for a lot longer than necessary. A much better behavior would have been for AWS to just let ES do its thing.
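
To make the split-brain concern from point 1 concrete, here is a minimal Python sketch of the majority arithmetic behind master election. It assumes the usual quorum rule of floor(n / 2) + 1 master-eligible nodes (the value normally recommended for discovery.zen.minimum_master_nodes); the function names are made up for illustration and nothing here talks to a cluster.

```python
def quorum(master_eligible: int) -> int:
    """Smallest majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"masters={n} quorum={quorum(n)} tolerated_failures={tolerated_failures(n)}")
```

With 2 masters the quorum is 2, so losing either node leaves the cluster without a master; lowering the requirement to 1 instead lets each node elect itself during a network partition, which is the split-brain case. Either way, 2 masters buy no more fault tolerance than 1, while 3 masters tolerate one failure.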

No extensibility, no visibility

  1. No configurations allowed. There is very (very!) limited support for configuration changes. Out of dozens of important configurations, only a small subset of about 5 options can be changed. As a result, many performance optimizations (e.g. query cache sizes, thread pool sizes) and even standard functionality are simply not available (e.g. reindex.remote.whitelist is not supported, so reindexing from a remote cluster is not possible). This rules out many real-world production uses on AWS ES, and it is my second biggest show-stopper to using it.

  2. No logs. There is absolutely no visibility into logs, even though the Elasticsearch logs are sometimes real time savers. The complaints, warnings, GC slow logs and even the info bits are just too precious for any production system to ignore.

  3. No plugins. No support for installing plugins robs you of the ability to use Elasticsearch to its full extent. It's not only about X-Pack (which adds Security and proper Monitoring among other things, both of which AWS ES completely lacks), but also about the inability to install analyzer plugins, ingest plugins and more, many of which are a crucial part of a fully functioning ES system in the real world.

  4. Zero visibility. AWS ES offers close to zero monitoring and cluster metrics visibility. CloudWatch, which is enabled by default to monitor the underlying VMs, is far from being enough: it only shows some general machine metrics joined by a few high-level cluster metrics. In a real ES installation you want much deeper knowledge of your cluster, both for optimizations in normal operations and to be able to debug efficiently when the cluster starts gagging. That includes a better view of machine metrics and a deeper look at many cluster metrics (caches, thread pools, rates, etc.). To start with, every ES cluster should have Kibana Monitoring and a Grafana dashboard installed, as well as a Cerebro instance for seeing the current cluster state clearly.

  5. Limits on instance types. Some instance types are not supported - limiting your ability to define your cluster size as needed.
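
To illustrate the reindex.remote.whitelist limitation from point 1, here is a minimal Python sketch of the request body that reindex-from-remote expects. The host and index names are hypothetical placeholders, and the sketch only builds the JSON; it does not contact any cluster.

```python
import json


def remote_reindex_body(remote_host: str, source_index: str, dest_index: str) -> dict:
    """Build the body for a reindex request that pulls documents from another cluster."""
    return {
        "source": {
            "remote": {"host": remote_host},  # the cluster to read from
            "index": source_index,
        },
        "dest": {"index": dest_index},  # index on the cluster receiving the request
    }


if __name__ == "__main__":
    # Hypothetical host and index names, for illustration only.
    body = remote_reindex_body("http://old-cluster.example.com:9200", "logs-2017", "logs-2017")
    print(json.dumps(body, indent=2))
```

On a cluster you control, this body would be sent as POST _reindex to the destination cluster, which must also list the remote host under reindex.remote.whitelist in elasticsearch.yml. That last step is the part you cannot do when the setting is locked down.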

Alternatives

There are two main alternatives to running Elasticsearch on AWS:

  1. Elastic Cloud - a hosted solution on AWS managed and maintained by Elastic, the company behind Elasticsearch. As such, it supports the most recent versions immediately following their release, and provides almost all the visibility needed and most of the extensibility you will ever need, including log visibility and uploading your custom plugins. Upgrading to newer versions of ES without any downtime and resizing the cluster on the fly are also supported out of the box.

  2. Running your own Elasticsearch cluster on AWS EC2 will be much cheaper (roughly half the cost) and will give you full control, visibility and accessibility if you set it up right. Deployment is really easy if done using Terraform (see my Elasticsearch Cloud Deploy repository), and then sizing and monitoring are as easy as with a cloud offering. Nowadays a properly-sized cluster requires almost no attention as long as growth is linear.

Admittedly, the biggest benefit of running AWS ES is the pluggability and seamless integration with other AWS services (like Kinesis Firehose and many others). As far as that goes, some code will obviously need to be written if you run your own cluster or use Elastic's managed solution, but for some use-cases you may be able to find good tools which already do the work. For instance, here is a small tool I wrote to use AWS Lambda to forward events from Kinesis Streams to a non-AWS Elasticsearch cluster: https://github.com/synhershko/lambda-streams-to-elasticsearch.
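
To give a feel for how little glue code this kind of forwarding needs, here is a rough Python sketch that turns a Kinesis-triggered Lambda event into an Elasticsearch bulk request body. It is not the code from the repository above. It assumes each Kinesis record carries one base64-encoded JSON document, the function and index names are made up for illustration, and actually sending the body to the cluster's _bulk endpoint is left out. Depending on your Elasticsearch version, the action line may also need a _type.

```python
import base64
import json


def kinesis_event_to_bulk(event: dict, index: str) -> str:
    """Convert Kinesis records from a Lambda event into a newline-delimited bulk body."""
    lines = []
    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded under record["kinesis"]["data"].
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        doc = json.loads(payload)  # assumption: every payload is a JSON document
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))  # document line
    # The bulk API expects every line, including the last, to end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""


if __name__ == "__main__":
    sample = {"Records": [{"kinesis": {"data": base64.b64encode(b'{"msg": "hello"}').decode("ascii")}}]}
    print(kinesis_event_to_bulk(sample, "events"), end="")
```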

Summary

I can hardly see a reason to use the AWS Elasticsearch service other than for initial experimenting. Any real-world use-case will find it greatly lacking, while easier and significantly better alternatives (even cheaper ones) definitely exist. Feel free to use any of the tools and pointers linked here, and if you require any further assistance, we offer consultancy and development services as an official Elastic partner.


Comments

  • Assaf Lavie

    Great write up. Thank you! I’ve always hated AWS’s hosted ES, but was actually tempted by their recent announcements to give it another look. You reminded me just how behind it really is. Thanks again.

  • Jon Webster

    Nice post Itamar, and you might want to add some of the very compelling points made in this post: https://read.acloud.guru/things-you-should-know-before-using-awss-elasticsearch-service-7cd70c9afb4f

  • ME

    I'll just use Splunk, thanks.

  • Muhammad Rehan Saeed

    It's not just ElasticSearch. AWS is behind in other SaaS offerings too. Their AWS ECS Docker story still uses their own custom orchestration instead of industry standard options like Swarm or Kubernetes. Also, their RDS offering still doesn't let you upgrade to SQL Server 2017 and has some limitations when it comes to server configuration.

  • Arthur

    Great post. Thanks for spreading the truth.

  • Filip Tepper

    You can (should) run as many backups as you wish. If you use Amazon’s ES you should run them yourself, so that you can access them without reaching out for technical assistance.

    Only Amazon’s automated backups are limited to one a day.

  • Zee Baig

    Well, these are the reasons one should use AWS ES (instead of not using). Managing ES yourself could become a pain for growing organizations. You won't be upgrading all the time in production there should be proper deployment lifecycle for it.

    • Itamar Syn-Hershko

      Like you haven't read a single word in this post, specifically the "Alternatives" section. Nobody is saying you have to manage ES yourself.

    • JM

      You won't be upgrading all the time in production there should be proper deployment lifecycle for it.

      Welcome to 2018 where we deploy our solutions (...When there are changes...) and upgrade dependencies (...When they are needed...) to production all the time (can even happen multiple times a day)... Without even remotely affecting operations...
