Elasticsearch One Tip a Day: Managing Index Mappings Like a Pro

No data store with secondary indexes is truly schema-less, and Elasticsearch is no different. While it doesn't require you to define your data structure up front, it does derive a schema from the structure of your documents and decides how to index them based on that. In that respect, Elasticsearch uses an implicit schema, rather than an explicit one, to index your data.

Those indexing decisions are important and have a big impact on how you can search your data. If it's a string field, should it be tokenized and normalized? If so, how? If it's a numeric field, what precision is required? There are many more field types, such as date-time fields, geo-spatial shapes and parent/child relationships, that require special care.

In Elasticsearch, this is called a Mapping. A Mapping is simply a definition of a document's fields and their types, along with the other fine-tuning and configuration options that are available, the full list of which can be found in the Core Types page in Elasticsearch's documentation.

Elasticsearch creates a Mapping automatically when you add the first document to an index. The Mapping is derived from the structure of that document, with the defaults applied to it. So, for example, a string field will automatically be prepared for full-text search (tokenized and normalized). You can have several Mappings per index, to control different schemas for different document types within the same index, via Elasticsearch's notion of types.
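
As a minimal sketch of this behavior, assuming a fresh cluster with no templates registered, you could index a document without defining anything first and then ask Elasticsearch which Mapping it derived. The index name, type name, field names and values below are made up for illustration:

PUT /twitter/tweet/1
{
    "user" : "kimchy",
    "message" : "Trying out Elasticsearch",
    "post_date" : "2009-11-15T14:12:12"
}

GET /twitter/_mapping

The response should list every field of the document along with the type Elasticsearch picked for it.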

Creating Mappings Manually

Very soon after starting to use Elasticsearch in real-world projects, you come to realize the Mapping defaults are far from perfect. The most common requirement is to make certain string fields not_analyzed, or to use a non-default analyzer; basically, to control how they can be searched. For that, you need to override the defaults, or provide your own Mapping before the defaults are applied.

You can define a mapping up-front when creating an index, by simply creating the index explicitly and passing a Mapping JSON while at it:

PUT /twitter
{
    "mappings" : {
        "tweet" : {
            "properties" : {
                "message" : {"type" : "string" },
                "user" : {"type" : "string", "index" : "not_analyzed" }
            }
        }
    }
}

Not all fields have to be defined up front in the mapping (just as defining the mapping up front is itself optional). If new fields are introduced as you index documents, Elasticsearch's defaults will be applied to them. Read more details in the Put Mapping API documentation.
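
For illustration, adding a field to the Mapping of an index that already exists might look like the sketch below. It assumes the twitter index and tweet type from above. The hashtags field is hypothetical, and the exact URL form of the Put Mapping API has varied between Elasticsearch versions, so check it against the version you run:

PUT /twitter/_mapping/tweet
{
    "tweet" : {
        "properties" : {
            "hashtags" : {"type" : "string", "index" : "not_analyzed" }
        }
    }
}

Fields already present in the Mapping are left alone; only the new field definition is merged in.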

However, this is still bothersome. Creating an index explicitly requires more code and more maintenance, and when an index gets deleted, its mapping gets deleted with it. Quite the party pooper, especially in development environments!

Index Templates

Enter index templates. You can pre-register Mappings using index templates, and they will be applied automatically to any newly created index whose name matches the pattern defined in the template:

PUT /_template/my_template
{
    "template" : "tw*",
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "tweet" : {
            "properties" : {
                "message" : {"type" : "string" },
                "user" : {"type" : "string", "index" : "not_analyzed" }
            }
        }
    }
}

The above template defines the same mapping we saw before, but this time we don't have to do anything for it to be applied to our twitter index. Once the index is created, the template will be picked up and its mappings applied to the new index, because the index name matches the wildcard we defined (tw*). Now you don't have to create the index explicitly; if it doesn't already exist, it will be created automatically once the first document is added to it.
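
A quick way to see this in action, sketched under the assumption that the template above is registered and that no index called twitter_dev exists yet (the index name and document are made up; any name starting with tw would do):

PUT /twitter_dev/tweet/1
{
    "user" : "kimchy",
    "message" : "Templates are handy"
}

GET /twitter_dev/_mapping

If the template was picked up, the returned Mapping should show user as not_analyzed even though the index was never created explicitly.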

Also note how we can apply custom index settings using the template, so we can automatically set the number of shards and replicas, or define custom analyzers.
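
As a rough sketch of what that could look like, the template below sets the replica count and registers a custom analyzer alongside the Mapping. The template name, the lowercase_keyword analyzer name and the chosen values are assumptions made for this example:

PUT /_template/my_settings_template
{
    "template" : "tw*",
    "settings" : {
        "number_of_shards" : 1,
        "number_of_replicas" : 0,
        "analysis" : {
            "analyzer" : {
                "lowercase_keyword" : {
                    "type" : "custom",
                    "tokenizer" : "keyword",
                    "filter" : ["lowercase"]
                }
            }
        }
    },
    "mappings" : {
        "tweet" : {
            "properties" : {
                "user" : {"type" : "string", "analyzer" : "lowercase_keyword" }
            }
        }
    }
}

Here user is indexed as a single lowercased token, so an analyzed query such as a match query should find it regardless of case.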

When an index is created, multiple templates can match it, and the order in which they are applied is controlled by the order property. This allows for creating complex mappings, all managed as documents in your Elasticsearch instance, and applied dynamically as your business logic requires.
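
To make the ordering concrete, here is a hedged sketch with two templates that both match the twitter index. The template names, patterns and values are invented for the example. As the Elasticsearch documentation describes it, templates with a lower order are applied first and those with a higher order override them:

PUT /_template/defaults
{
    "template" : "*",
    "order" : 0,
    "settings" : {
        "number_of_shards" : 5,
        "number_of_replicas" : 1
    }
}

PUT /_template/twitter_overrides
{
    "template" : "tw*",
    "order" : 1,
    "settings" : {
        "number_of_replicas" : 0
    }
}

With both registered, a new index named twitter would be expected to end up with 5 shards from the first template and 0 replicas from the second.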

At this point it is worth noting that changing a Mapping on an existing index is only partially supported: some configurations cannot be changed after creation. Similarly, a template is only applied when an index is created. Changing the template afterwards will not affect indexes that already exist.

Read more about index templates here, and start using them instead of manually creating indexes with explicit mapping. You will never look back.

