Elasticsearch Nested Field Type: The importance of defining the mapping of the index

Miguel Barrios
Jan 10, 2021

Elasticsearch is an open-source, distributed search and analytics engine that stores documents and makes them searchable.

When documents contain arrays of objects, queries against the inner fields may not return what we expect unless the index mapping declares those fields as nested.

For example, suppose we create an index called “candidates” in Elasticsearch without defining its fields or how their data should be stored:

PUT candidates
{}

If the request succeeds, Elasticsearch returns the following response:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "candidates"
}

During indexing, Elasticsearch stores each document and builds an inverted index, which makes the document's data searchable in near real time.
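To make the idea of an inverted index concrete, here is a minimal Python sketch. It is a toy illustration, not how Elasticsearch is implemented: the function name `build_inverted_index` and the whitespace tokenizer are assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


# Hypothetical sample data loosely based on the candidates in this article.
docs = {
    1: "Mike New York",
    2: "Andre Texas",
    3: "Peter Iowa",
}

index = build_inverted_index(docs)
print(sorted(index["texas"]))  # [2]
print(sorted(index["new"]))    # [1]
```

Looking up a term is then a dictionary access instead of a scan over every document, which is what makes searching fast.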

Next, we index three documents whose “language” field holds an array of objects, so that the information can be queried later:

POST candidates/_doc
{
  "firstname": "Mike",
  "age": 31,
  "city": "New York",
  "language": [
    {
      "name": "English",
      "level": "native"
    },
    {
      "name": "French",
      "level": "basic"
    },
    {
      "name": "German",
      "level": "advanced"
    },
    {
      "name": "Spanish",
      "level": "low"
    }
  ]
}

POST candidates/_doc
{
  "firstname": "Andre",
  "age": 28,
  "city": "Texas",
  "language": [
    {
      "name": "Portuguese",
      "level": "native"
    }
  ]
}

POST candidates/_doc
{
  "firstname": "Peter",
  "age": 37,
  "city": "Iowa",
  "language": [
    {
      "name": "French",
      "level": "basic"
    }
  ]
}

When we search the firstname field for the value “Mike” with a query to the Search API, we get the matching document back.

What happens when we want to search the data inside the array (the part in square brackets)? Let's try the same with the language field to find out who can speak Portuguese:

GET candidates/_search
{
  "query": {
    "match": {
      "language.name": "Portugese"
    }
  }
}

The hits field is empty:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

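What has happened?

Two things. First, the search term in the request above is spelled “Portugese”, so it does not match the indexed value “Portuguese”. Second, and more important in the long run, Elasticsearch has not kept the hierarchy of objects in the document JSON: because no mapping was defined, “language” was mapped dynamically as a plain object field rather than as nested. The array of objects is flattened, and the link between each language's name and its level is lost.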
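To illustrate how the object hierarchy gets lost when a field is not mapped as nested, here is a small Python simulation. It does not talk to Elasticsearch; the helper names `flatten_objects`, `matches_flattened` and `matches_nested` are hypothetical and only mimic the behaviour: an array of objects is stored as one multi-valued list per inner field, so a search for a language named “French” with level “advanced” wrongly matches Mike.

```python
def flatten_objects(objects):
    """Simulate non-nested storage: one multi-valued list per inner field,
    with the pairing between the fields of each object lost."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(key, []).append(value)
    return flat


def matches_flattened(objects, name, level):
    """Each condition is checked against the whole array independently."""
    flat = flatten_objects(objects)
    return name in flat.get("name", []) and level in flat.get("level", [])


def matches_nested(objects, name, level):
    """Both conditions must hold inside the same object."""
    return any(o.get("name") == name and o.get("level") == level for o in objects)


mike_languages = [
    {"name": "English", "level": "native"},
    {"name": "French", "level": "basic"},
    {"name": "German", "level": "advanced"},
    {"name": "Spanish", "level": "low"},
]

print(flatten_objects(mike_languages))
print(matches_flattened(mike_languages, "French", "advanced"))  # True (false positive)
print(matches_nested(mike_languages, "French", "advanced"))     # False
```

Mike speaks French at a basic level and German at an advanced level, so only the second check gives the answer we would expect.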

To fix it:

  • Delete the index “candidates” (1)
  • Define the properties in the mapping with _template (2)
  • Index all documents again

# 1) Delete the index
DELETE candidates

# 2) Define the mapping in a template
PUT _template/candidates
{
  "index_patterns": ["candidates"],
  "mappings": {
    "properties": {
      "language": {
        "type": "nested"
      }
    }
  }
}

Now that the field “language” is mapped as type nested, its inner fields can only be reached through a nested query that names the path:

GET candidates/_search
{
  "query": {
    "nested": {
      "path": "language",
      "query": {
        "match": {
          "language.name": "Portuguese"
        }
      }
    }
  }
}
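A request body like this one can also be assembled programmatically. The following Python sketch only builds and prints the JSON; it does not send anything to Elasticsearch. The helper name `nested_match_query` is hypothetical, and combining several conditions with `bool`/`must` is an assumption added for the example, so that every condition has to match within the same language entry.

```python
import json


def nested_match_query(path, conditions):
    """Build a nested query in which every condition must match the same inner object."""
    must = [
        {"match": {f"{path}.{field}": value}}
        for field, value in conditions.items()
    ]
    return {"query": {"nested": {"path": path, "query": {"bool": {"must": must}}}}}


# Who speaks French at a basic level?
body = nested_match_query("language", {"name": "French", "level": "basic"})
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a GET candidates/_search request.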

And we get the expected response back:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.3862942,
    "hits" : [
      {
        "_index" : "candidates",
        "_type" : "_doc",
        "_id" : "WYOdm3YBrd5s6S9m1UFu",
        "_score" : 1.3862942,
        "_source" : {
          "firstname" : "Andre",
          "age" : 28,
          "city" : "Texas",
          "language" : [
            {
              "name" : "Portuguese",
              "level" : "native"
            }
          ]
        }
      }
    ]
  }
}

What happens if we want to use aggregations?

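Aggregations follow the same rule: they must be wrapped in a nested aggregation that names the path. Because the template does not map “language.name” explicitly, it is mapped dynamically as a text field with a keyword sub-field, and the terms aggregation has to target that sub-field:

GET candidates/_search?filter_path=aggregations
{
  "aggs": {
    "nestedField": {
      "nested": {
        "path": "language"
      },
      "aggs": {
        "nameField": {
          "terms": {
            "field": "language.name.keyword",
            "size": 10
          }
        }
      }
    }
  }
}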
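To get a feeling for what this aggregation counts, here is a rough Python approximation over the three documents indexed earlier. It is a simulation under stated assumptions, not Elasticsearch code: the function name `nested_terms` is hypothetical, each language entry is counted once, and buckets are ordered by count (descending) and then by key.

```python
from collections import Counter

candidates = [
    {"firstname": "Mike", "language": [
        {"name": "English", "level": "native"},
        {"name": "French", "level": "basic"},
        {"name": "German", "level": "advanced"},
        {"name": "Spanish", "level": "low"},
    ]},
    {"firstname": "Andre", "language": [
        {"name": "Portuguese", "level": "native"},
    ]},
    {"firstname": "Peter", "language": [
        {"name": "French", "level": "basic"},
    ]},
]


def nested_terms(docs, path, field, size=10):
    """Count the values of `field` across all inner objects found under `path`."""
    counts = Counter()
    for doc in docs:
        for obj in doc.get(path, []):
            if field in obj:
                counts[obj[field]] += 1
    # Highest count first; ties are broken alphabetically by key.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


buckets = nested_terms(candidates, "language", "name")
for bucket in buckets:
    print(bucket)
```

With this data, “French” is the only language that appears twice, so it comes first.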

Finally, it is useful to know that the filter_path parameter can be used to reduce the response returned by Elasticsearch. In the request above, it limits the response to the aggregations.
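The effect of filter_path can be pictured with a simplified Python sketch. This is only a simulation for a single dotted path; the real parameter also accepts wildcards and several comma-separated filters, which are not covered here. The function names and the sample response are made up for the example.

```python
_MISSING = object()


def _walk(node, keys):
    """Return the part of `node` addressed by `keys`, or _MISSING if absent."""
    if not keys:
        return node
    if isinstance(node, list):
        # Apply the remaining path to every element and keep the ones that match.
        kept = [r for r in (_walk(item, keys) for item in node) if r is not _MISSING]
        return kept if kept else _MISSING
    if isinstance(node, dict) and keys[0] in node:
        child = _walk(node[keys[0]], keys[1:])
        return _MISSING if child is _MISSING else {keys[0]: child}
    return _MISSING


def filter_path(response, path):
    """Keep only the branch of `response` addressed by one dotted path (no wildcards)."""
    result = _walk(response, path.split("."))
    return {} if result is _MISSING else result


# Hypothetical, shortened response used only for this illustration.
response = {
    "took": 0,
    "timed_out": False,
    "hits": {"hits": [{"_id": "a", "_score": 1.0}]},
    "aggregations": {"nestedField": {"doc_count": 6}},
}

print(filter_path(response, "aggregations"))
print(filter_path(response, "hits.hits._id"))
```

Everything outside the requested branch, such as “took” and “timed_out”, is dropped from the result.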

If this article has helped you, please tap or click “♥︎” or follow me on LinkedIn.

Thanks for reading!
