A multi-value metrics aggregation that computes stats over numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents, or be generated by a provided script.
The extended_stats aggregation is an extended version of the stats aggregation that adds further metrics such as sum_of_squares, variance, std_deviation and std_deviation_bounds.
Assume the data consists of documents representing students' exam grades (between 0 and 100):
GET /exams/_search
{
  "size": 0,
  "aggs" : {
    "grades_stats" : { "extended_stats" : { "field" : "grade" } }
  }
}
The above aggregation computes the grade statistics over all documents. The aggregation type is extended_stats and the field setting defines the numeric field of the documents the stats will be computed on. The above will return the following:
{
  ...
  "aggregations": {
    "grades_stats": {
      "count": 2,
      "min": 50.0,
      "max": 100.0,
      "avg": 75.0,
      "sum": 150.0,
      "sum_of_squares": 12500.0,
      "variance": 625.0,
      "std_deviation": 25.0,
      "std_deviation_bounds": {
        "upper": 125.0,
        "lower": 25.0
      }
    }
  }
}
The name of the aggregation (grades_stats above) also serves as the key by which the aggregation result can be retrieved from the returned response.
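The figures in the sample response can be checked by hand. A count of 2 with a min of 50 and a max of 100 means the two grades are 50 and 100, and the sketch below recomputes each metric from them in plain Python. It is an illustration only, not Elasticsearch's implementation: the function name `extended_stats` is hypothetical, and the population variance formula (sum_of_squares / count minus avg squared) is an assumption chosen because it reproduces the numbers shown above.

```python
import math


def extended_stats(values, sigma=2.0):
    """Recompute the sample response metrics for a non-empty list of numbers."""
    count = len(values)
    total = sum(values)
    avg = total / count
    sum_of_squares = sum(v * v for v in values)
    # Assumed formula: population variance = mean of squares - square of mean.
    # Clamp at 0 to guard against tiny negative results from rounding.
    variance = max(0.0, sum_of_squares / count - avg * avg)
    std_deviation = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }


# The two grades implied by the sample response (count 2, min 50, max 100).
stats = extended_stats([50.0, 100.0])
print(stats)
```

With these two grades the sketch yields a variance of 625.0, a std_deviation of 25.0 and bounds of 125.0 and 25.0, matching the response.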
By default, the extended_stats metric returns an object called std_deviation_bounds, which provides an interval of plus/minus two standard
deviations from the mean. This can be a useful way to visualize the variance of your data. If you want a different boundary, for example
three standard deviations, you can set sigma in the request:
GET /exams/_search
{
  "size": 0,
  "aggs" : {
    "grades_stats" : {
      "extended_stats" : {
        "field" : "grade",
        "sigma" : 3
      }
    }
  }
}
sigma can be any non-negative double, meaning you can request non-integer values such as 1.5. A value of 0 is valid, but will simply
return the average for both upper and lower bounds.
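The effect of sigma on the bounds can be sketched as simple arithmetic. The helper `std_deviation_bounds` below is hypothetical and only mirrors the behavior described above (bounds at avg plus/minus sigma standard deviations); it plugs in the avg of 75.0 and std_deviation of 25.0 from the sample response.

```python
def std_deviation_bounds(avg, std_deviation, sigma=2.0):
    """Return the interval avg +/- sigma * std_deviation (illustrative only)."""
    if sigma < 0:
        # The text above states that sigma must be non-negative.
        raise ValueError("sigma must be non-negative")
    return {
        "upper": avg + sigma * std_deviation,
        "lower": avg - sigma * std_deviation,
    }


# avg and std_deviation taken from the sample response.
bounds_by_sigma = {
    sigma: std_deviation_bounds(75.0, 25.0, sigma) for sigma in (0, 1.5, 2, 3)
}
for sigma, bounds in bounds_by_sigma.items():
    print(sigma, bounds)
```

For sigma 3 this gives bounds of 150.0 and 0.0, for 1.5 it gives 112.5 and 37.5, and for 0 both bounds collapse to the average, 75.0.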
The standard deviation and its bounds are displayed by default, but they are not applicable to every data set. Your data must be normally distributed for the metrics to make sense. The statistics behind standard deviations assume normally distributed data, so if your data is heavily skewed left or right, the values returned will be misleading.
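A small, made-up example shows how skew can make the bounds misleading. The grades below are hypothetical and deliberately lopsided, and the sketch uses Python's `statistics` module with the default of two standard deviations rather than anything from Elasticsearch.

```python
import statistics

# Hypothetical, heavily skewed grades: four zeros and one perfect score.
grades = [0, 0, 0, 0, 100]

avg = statistics.fmean(grades)
std_deviation = statistics.pstdev(grades)  # population standard deviation

# Default bounds: two standard deviations either side of the mean.
lower = avg - 2 * std_deviation
upper = avg + 2 * std_deviation

print(f"avg={avg}, std_deviation={std_deviation}, bounds=({lower}, {upper})")
print("lower bound below the smallest grade:", lower < min(grades))
```

Here the lower bound falls well below 0, a grade no student can receive, so the interval says little about where the data actually lies.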
Computing the grades stats based on a script:
GET /exams/_search
{
  "size": 0,
  "aggs" : {
    "grades_stats" : {
      "extended_stats" : {
        "script" : {
          "source" : "doc['grade'].value",
          "lang" : "painless"
        }
      }
    }
  }
}
This will interpret the script parameter as an inline script with the painless script language and no script parameters. To use a stored script use the following syntax:
GET /exams/_search
{
  "size": 0,
  "aggs" : {
    "grades_stats" : {
      "extended_stats" : {
        "script" : {
          "id": "my_script",
          "params": {
            "field": "grade"
          }
        }
      }
    }
  }
}
Suppose the exam turned out to be well above the students' level and a grade correction needs to be applied. We can use a value script to get the new stats:
GET /exams/_search
{
  "size": 0,
  "aggs" : {
    "grades_stats" : {
      "extended_stats" : {
        "field" : "grade",
        "script" : {
          "lang" : "painless",
          "source": "_value * params.correction",
          "params" : {
            "correction" : 1.2
          }
        }
      }
    }
  }
}
The missing parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.
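The difference between ignoring documents without a grade and substituting a value for them can be sketched in plain Python. The helper `grade_values`, the sample documents, and the substitute value of 0 are all hypothetical; the sketch mirrors the behavior described above, not Elasticsearch's implementation.

```python
def grade_values(docs, missing=None):
    """Collect grades; skip docs without one unless a substitute is given."""
    values = []
    for doc in docs:
        if "grade" in doc:
            values.append(doc["grade"])
        elif missing is not None:
            # Treat the document as if it had this value.
            values.append(missing)
    return values


# Hypothetical documents: the third one has no grade field.
docs = [{"grade": 50}, {"grade": 100}, {"student": "no grade recorded"}]

ignored = grade_values(docs)                 # default: document is skipped
substituted = grade_values(docs, missing=0)  # document counts as a grade of 0

print(len(ignored), sum(ignored) / len(ignored))
print(len(substituted), sum(substituted) / len(substituted))
```

Skipping the document leaves a count of 2 and an average of 75.0, while substituting 0 raises the count to 3 and lowers the average to 50.0, so the choice of substitute changes every derived metric.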