All About The Elasticsearch Bucket Script

Datetime: 2016-08-23 02:06:49    Topic: Elastic Search

In previous installments of the pipeline aggregation blog series, we discussed ready-made aggregations that can be used directly in a query. In this post, we explore pipeline aggregations that run scripts, which gives you the flexibility to compute your own values from the aggregated field data.

Data Set

You can download the data here.

Bucket Script

The "bucket_script" pipeline aggregation is a parent pipeline aggregation. It executes a script that performs per-bucket calculations on specified metrics of the parent multi-bucket aggregation. The metrics it reads must be numeric, and the script must return a numeric value.

Note: You must enable inline scripts for these aggregations to work; otherwise Elasticsearch will reject the request with a script execution error.
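On a self-managed node this usually means editing `elasticsearch.yml`. The sketch below assumes Elasticsearch 2.x, where dynamic inline scripts are controlled by the `script.inline` setting; the helper name `enable_inline_scripts` and the config path in the usage comment are hypothetical, so check the scripting documentation for your version first. Enabling inline scripts lets any client that can query the cluster run scripts, so only do this on clusters that untrusted users cannot reach.

```shell
# Hypothetical helper: append the Elasticsearch 2.x inline-scripting setting
# to the given elasticsearch.yml, unless a script.inline line already exists.
enable_inline_scripts() {
  config_file="$1"
  if grep -q '^script\.inline:' "$config_file" 2>/dev/null; then
    # Leave an existing setting alone; the user should check its value.
    echo "script.inline is already set in $config_file" >&2
    return 0
  fi
  # If the file does not end with a newline, add one so the setting
  # starts on its own line.
  if [ -s "$config_file" ] && [ -n "$(tail -c 1 "$config_file")" ]; then
    printf '\n' >> "$config_file"
  fi
  printf '%s\n' 'script.inline: true' >> "$config_file"
}

# Usage (path is an assumption; adjust to your install), then restart the node:
# enable_inline_scripts /etc/elasticsearch/elasticsearch.yml
```

The setting is only read at startup, so the node has to be restarted before the queries in this post can use inline scripts.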

In the data set we generated, all temperature values are in Fahrenheit, but we want them in degrees Celsius. To convert Fahrenheit to Celsius, subtract 32 and then multiply by 5/9 (about 0.5556). To apply this conversion on a per-bucket basis, we can use the following query with a "bucket_script" aggregation:

curl -XPOST 'localhost:9200/weather-data/_search?pretty' -d '{
  "query": {
    "match": {
      "city": "NY"
    }
  },
  "aggs": {
    "temp": {
      "date_histogram": {
        "field": "date",
        "interval": "month",
        "format": "dd-MM-yyyy"
      },
      "aggs": {
        "monthly": {
          "avg": {
            "field": "temp"
          }
        },
        "farenhietTo_celsius": {
          "bucket_script": {
            "buckets_path": {
              "farht_value": "monthly"
            },
            "script": "(farht_value - 32) * 5 / 9"
          }
        }
      }
    }
  }
}'

After running the query above, each monthly bucket in the response contains a "farenhietTo_celsius" value holding that month's average temperature converted to degrees Celsius.
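To sanity-check the conversion outside Elasticsearch, the sketch below applies the same formula as the script, (F - 32) * 5 / 9, using awk; the function name `f_to_c` is hypothetical. The fixed points make a quick test: 32 °F must come out as 0 °C and 212 °F as 100 °C, whereas multiplying by 0.5556 alone would turn 32 °F into roughly 17.8 °C.

```shell
# Hypothetical helper: convert one Fahrenheit value to Celsius with the
# same formula as the bucket_script, printed with two decimals.
f_to_c() {
  awk -v f="$1" 'BEGIN { printf "%.2f\n", (f - 32) * 5 / 9 }'
}

f_to_c 32    # prints 0.00
f_to_c 212   # prints 100.00
f_to_c 50    # prints 10.00
```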

Bucket Selector Aggregation

The "bucket_selector" aggregation is also a parent pipeline aggregation. It runs a script to decide which buckets of the parent multi-bucket aggregation are kept in the response. The metrics it reads must be numeric, and the script must return a boolean: buckets for which it returns false are removed.


Suppose you want to retrieve only the months in which the average temperature for New York City is 50 or higher. Use the bucket selector aggregation as follows:

curl -XPOST 'localhost:9200/weather-data/_search?pretty' -d '{
  "query": {
    "match": {
      "city": "NY"
    }
  },
  "aggs": {
    "temp": {
      "date_histogram": {
        "field": "date",
        "interval": "month",
        "format": "dd-MM-yyyy"
      },
      "aggs": {
        "monthly": {
          "avg": {
            "field": "temp"
          }
        },
        "temp_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "temp_var": "monthly"
            },
            "script": "temp_var >= 50"
          }
        }
      }
    }
  }
}'

Here the "bucket_selector" aggregation is given the name "temp_bucket_filter". Its "buckets_path" defines a variable "temp_var" that holds each bucket's value from the "monthly" average aggregation, and the script "temp_var >= 50" keeps only the buckets whose average is greater than or equal to 50; all other buckets are dropped from the response.
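Conceptually, the selector acts as a filter over the finished list of buckets: the query and the date histogram produce the monthly buckets first, and then each bucket is kept or dropped based on its own metric value. The awk sketch below mimics that step offline; `select_buckets` is a hypothetical name, and the input lines (bucket key and monthly average) are made-up numbers, not values from the weather data set.

```shell
# Hypothetical helper: read "<bucket key> <monthly average>" lines and keep
# only those whose average is >= the threshold, like "temp_var >= 50".
select_buckets() {
  awk -v t="$1" '$2 >= t'
}

# Made-up sample buckets; only the June and October lines are printed.
printf '%s\n' '01-01-2016 38.4' '01-06-2016 71.2' '01-10-2016 50.0' |
  select_buckets 50
```

Because the comparison is `>=`, a bucket averaging exactly 50 is kept, matching the script in the query.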

Conclusion

In this post we covered two pipeline aggregations that use scripts for their operations: "bucket_script" and "bucket_selector". Remember that both require inline scripting to be enabled; otherwise Elasticsearch will reject the scripts. In the next post of this series, we will look at another set of useful aggregations.




