How To Index JSON With Elasticsearch

Datetime: 2016-08-22 22:57:20          Topic: Elastic Search

Let's look at the basics of indexing data into Elasticsearch. A solid understanding of Elasticsearch will help you see why you sometimes encounter issues when working with Logstash and Kibana. Many of the issues new users encounter come from not understanding how Logstash and Kibana interact with Elasticsearch.

In our previous article we indexed JSON directly using Sense. We indexed the data using the HTTP verb PUT.

For this article I’m assuming that you are making use of Sense to interact with Elasticsearch.

PUT /candidate_index/candidate/1?pretty=true
{
  "name" : "Donald John Trump",
  "affiliation" : "Republican",
  "age" : "69",
  "occupation" : "businessman",
  "twitter" : "https://twitter.com/realDonaldTrump",
  "website" : "http://www.donaldjtrump.com"
}

What is the difference between indexing data with POST and with PUT in the context of Elasticsearch? For starters, PUT lets you create an index and index data on the fly, but only if you supply a document ID in the URL. For example, if you try to index the following document with PUT and no ID, you will get an error:

PUT /candidate_index/candidate/
{
  "name" : "Hillary Clinton",
  "affiliation" : "Democratic",
  "age" : "68",
  "occupation" : "politician",
  "twitter" : "https://twitter.com/HillaryClinton",
  "website" : "https://www.hillaryclinton.com"
}

This is the error:

"No handler found for uri [/candidate_index/candidate/] and method [PUT]"

When you use PUT, you are placing or updating data at a known ID. When you use the POST HTTP verb, the ID is optional because Elasticsearch generates one for you. This is a nice feature: imagine indexing more than 100 presidential candidates and having to keep track of which IDs you have already used.


You would have to keep incrementing the ID numbers like this:

PUT /candidate_index/candidate/1
{
  "name" : "Hillary Clinton",
  "affiliation" : "Democratic",
  "age" : "68",
  "occupation" : "politician",
  "twitter" : "https://twitter.com/HillaryClinton",
  "website" : "https://www.hillaryclinton.com"
}

This was our first candidate. Note that the ID now has to be incremented. The next candidate will have an ID of 2, because we want to index a new document and not replace the document with an ID of 1. So, let's index our next candidate.

PUT /candidate_index/candidate/2
{
  "name" : "Donald John Trump",
  "affiliation" : "Republican",
  "age" : "69",
  "occupation" : "businessman",
  "twitter" : "https://twitter.com/realDonaldTrump",
  "website" : "http://www.donaldjtrump.com"
}

For the next candidates we will need to keep adding 1 to the ID. The ID of the next candidate will be 3. If you don't feel like specifying an ID every time you index data, you can POST the data to Elasticsearch like this:

POST /candidate_index/candidate/
{
  "name" : "Hillary Clinton",
  "affiliation" : "Democratic",
  "age" : "68",
  "occupation" : "politician",
  "twitter" : "https://twitter.com/HillaryClinton",
  "website" : "https://www.hillaryclinton.com"
}

The only difference is that we used the HTTP verb POST instead of PUT. This is useful because we can index items one by one without having to worry about what the next ID should be. This brings us to another important topic.
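
Before moving on, the PUT versus POST rule described above can be summarised in a few lines of Python. This is only a sketch: the function name build_index_request is made up for illustration, and it only builds the method, path and body rather than sending anything to Elasticsearch.

```python
import json


def build_index_request(index, doc_type, document, doc_id=None):
    """Return (method, path, body) for indexing one document.

    Hypothetical helper: with an ID we PUT to /index/type/id,
    without one we POST to /index/type/ and let Elasticsearch
    generate the ID.
    """
    body = json.dumps(document)
    if doc_id is None:
        return "POST", "/{}/{}/".format(index, doc_type), body
    return "PUT", "/{}/{}/{}".format(index, doc_type, doc_id), body


candidate = {"name": "Hillary Clinton", "affiliation": "Democratic"}

# Known ID: PUT places (or replaces) the document at ID 1.
print(build_index_request("candidate_index", "candidate", candidate, doc_id=1)[:2])

# No ID: POST, and Elasticsearch assigns an ID of its own.
print(build_index_request("candidate_index", "candidate", candidate)[:2])
```

The method and path that come back correspond to the first line of the Sense requests shown earlier.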

Mappings

You can think of mappings as a way to define which data type each field has and how that field should be indexed. Mappings are important because when we index data we expect to get it back in a readable form that makes sense. For example, if we index the name "Hillary Clinton", we do not expect Elasticsearch to break the data up into "Hillary" and "Clinton". When we retrieve the name of our candidate we want the full name and surname: "Hillary Clinton". Because we did not specify any settings for our fields, our string fields use the default analyzer, the Standard Analyzer (which we can change). If we view the data in Kibana, it might then appear that the name has been split into "Hillary" and "Clinton" instead of being kept as the single value "Hillary Clinton".

We can prevent this by setting a mapping for our index and marking the name field as not_analyzed. We start by looking at the current mapping for the data we are working with.

GET candidate_index/_mapping

This gives us the following result:

{
  "candidate_index": {
    "mappings": {
      "candidate": {
        "properties": {
          "affiliation": {
            "type": "string"
          },
          "age": {
            "type": "string"
          },
          "name": {
            "type": "string"
          },
          "occupation": {
            "type": "string"
          },
          "twitter": {
            "type": "string"
          },
          "website": {
            "type": "string"
          }
        }
      }
    }
  }
}
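
To keep the name field from being analyzed, as discussed above, a mapping along the following lines could be supplied when an index is created. This is a minimal sketch that only builds and prints the request body. It uses the string type shown in the mapping output above (newer Elasticsearch releases use different type names), and the index name candidate_index_v2 is hypothetical; it is a new index because the mapping of a field that already holds data cannot simply be switched.

```python
import json

# Explicit mapping for the "candidate" type: index "name" as one exact
# value instead of running it through an analyzer.
candidate_mapping = {
    "mappings": {
        "candidate": {
            "properties": {
                "name": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}

# Body for: PUT /candidate_index_v2   (hypothetical index name)
print(json.dumps(candidate_mapping, indent=2))
```

Fields that are not listed, such as affiliation or age, would still be mapped dynamically when the first document arrives.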

Dynamic Index Creation

Elasticsearch has an interesting feature called automatic, or dynamic, index creation. If you try to index data into an index that does not exist yet, Elasticsearch will create it for you. Elasticsearch also creates a dynamic type mapping for any index that does not have a predefined mapping. In other words, if you haven't defined a mapping for an index, Elasticsearch will come up with one for you.

In a previous article we created an index by the name of "candidate_index" and with the type "candidate", but we didn't define a mapping for it.

If we look at this request in Sense:

PUT /candidate_index/candidate/1
{
  "name" : "Hillary Clinton",
  "affiliation" : "Democratic",
  "age" : "68",
  "occupation" : "politician",
  "twitter" : "https://twitter.com/HillaryClinton",
  "website" : "https://www.hillaryclinton.com"
}

Notice that we didn't have an index by this name before sending the request. Elasticsearch automatically created the index for us, which is why we could add data to it even though it didn't exist. We added data to a type, "candidate", that we came up with, and Elasticsearch automatically created a dynamic mapping for this type.
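
To see why "age" ended up as a string in the mapping shown earlier, it can help to imitate the idea behind dynamic mapping in a few lines of Python. This is a deliberately simplified illustration and not Elasticsearch's actual logic: the function names are made up, and it ignores date detection, nested objects, arrays and floating point numbers.

```python
def guess_field_type(value):
    """Rough stand-in for dynamic mapping; NOT Elasticsearch's own code."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, str):
        return "string"
    raise ValueError("value type not covered by this sketch")


def guess_mapping(document):
    """Guess a type for every top-level field of a document."""
    return {field: {"type": guess_field_type(value)}
            for field, value in document.items()}


# "68" is quoted in the JSON we sent, so it is treated as a string.
print(guess_mapping({"name": "Hillary Clinton", "age": "68"}))

# Sent as an unquoted number, the same field would be guessed as numeric.
print(guess_mapping({"age": 68}))
```

The point is that the guess is driven by how the value is written in the JSON, not by what the field is called.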

We can view the mapping for this type "candidate" with the following command:

GET /candidate_index/_mapping/candidate

The mapping will appear the same as when we had a look at the index wide mapping earlier, since we only have one type on our index. But if we had more than one type in our index then it would make sense to check the mapping per type.

Let's add data into another type in our index just so my examples can be more interesting and make more sense. I'm creating a type to store data related to each party.

PUT /candidate_index/party/1
{
  "party_name" : "Republican Party",
  "aka" : "GOP",
  "Founded" : "1854",
  "chair" : "Reince Priebus"
}

Now if you have a look at the index wide mapping you will see that there is a mapping for the type "candidate" and one for the type "party". Have a look at the index wide mapping:

GET candidate_index/_mapping

Now have a look at the per type mapping:

GET /candidate_index/_mapping/party

You can define a mapping by making use of what is called a template.

Templates

A template defines the settings and the mapping to use when you create a new index whose name matches a certain pattern. Because the mapping can be part of the template, you can use templates to set mappings as well. Templates are very useful if you are creating indexes with similar names, such as indexes prefixed with a name and followed by a timestamp.

With templates you can set the mapping for any future indices matching a naming pattern. If you have been using Logstash, you have been making use of templates, probably without even realizing it. If you index data from Logstash into an Elasticsearch index with a name of the form logstash-*, you are making use of a default template that has been set for index names matching logstash-*.

If you spend enough time on the Elasticsearch forum you will notice that many Logstash beginners always index their data into an index with the name of "logstash-*", because they are aware of the template that the "logstash-*" index uses but don't know how to create their own templates.


Write your own templates and make them easy to use. If you are working on a big team, add your template to your other code, or at least to your documentation, with notes for others on how to use the template and why you are using it.
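
As a minimal sketch of what such a template might contain, the Python below builds a template body that applies a mapping to every new index whose name matches a pattern. The template name candidate_template and the pattern candidate_* are hypothetical. The "template" key is what Elasticsearch releases from the time of this article use for the index name pattern; later releases renamed it, so check the documentation for your version.

```python
import json

# Hypothetical template: applies to any new index whose name starts with
# "candidate_". The mapping keeps "name" as a single, unanalyzed value.
candidate_template = {
    "template": "candidate_*",
    "mappings": {
        "candidate": {
            "properties": {
                "name": {"type": "string", "index": "not_analyzed"}
            }
        }
    },
}

# Body for: PUT /_template/candidate_template   (hypothetical template name)
print(json.dumps(candidate_template, indent=2))
```

Keeping the template as data in a file like this also makes it easy to store alongside your other code, as suggested above.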

Conclusion

As you progress on your journey with the ELK Stack, you will sometimes find that you want to change the mapping of data you have already indexed. This can be done, although you will have to reindex the data.
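
One way to do that reindexing, assuming your Elasticsearch version provides the _reindex API (older releases do not), is to create a new index with the mapping you want and then copy the documents across. The sketch below only builds the request body, and the destination index name candidate_index_v2 is hypothetical.

```python
import json

# Copy every document from the old index into a new index that was
# created beforehand with the desired mapping.
reindex_body = {
    "source": {"index": "candidate_index"},
    "dest": {"index": "candidate_index_v2"},  # hypothetical new index
}

# Body for: POST /_reindex
print(json.dumps(reindex_body, indent=2))
```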

If you want to master using Elasticsearch and Logstash together, you need to learn how Elasticsearch works and how to work with it directly, without Logstash. This will help you debug complex problems related to the ELK Stack.




