A filtering aggregation used to limit any sub aggregations' processing to a sample of the top-scoring documents.
Example use cases:
- Tightening the focus of analytics to high-relevance matches rather than the potentially very long tail of low-quality matches
- Reducing the running cost of aggregations that can produce useful results using only samples, e.g. significant_terms
Example:
A query on StackOverflow data for the popular term javascript OR the rarer term
kibana will match many documents - most of them missing the word Kibana. To focus
the significant_terms aggregation on top-scoring documents that are more likely to match
the most interesting parts of our query, we use a sample.
POST /stackoverflow/_search?size=0
{
    "query": {
        "query_string": {
            "query": "tags:kibana OR tags:javascript"
        }
    },
    "aggs": {
        "sample": {
            "sampler": {
                "shard_size": 200
            },
            "aggs": {
                "keywords": {
                    "significant_terms": {
                        "field": "tags",
                        "exclude": ["kibana", "javascript"]
                    }
                }
            }
        }
    }
}
Response:
{
    ...
    "aggregations": {
        "sample": {
            "doc_count": 200, "keywords": {
                "doc_count": 200,
                "bg_count": 650,
                "buckets": [
                    {
                        "key": "elasticsearch",
                        "doc_count": 150,
                        "score": 1.078125,
                        "bg_count": 200
                    },
                    {
                        "key": "logstash",
                        "doc_count": 50,
                        "score": 0.5625,
                        "bg_count": 50
                    }
                ]
            }
        }
    }
}
            "keywords": {
                "doc_count": 200,
                "bg_count": 650,
                "buckets": [
                    {
                        "key": "elasticsearch",
                        "doc_count": 150,
                        "score": 1.078125,
                        "bg_count": 200
                    },
                    {
                        "key": "logstash",
                        "doc_count": 50,
                        "score": 0.5625,
                        "bg_count": 50
                    }
                ]
            }
        }
    }
}| 200 documents were sampled in total. The cost of performing the nested significant_terms aggregation was therefore limited rather than unbounded. | 
Without the sampler aggregation, the request query considers the full "long tail" of low-quality matches and therefore identifies less-significant terms such as jquery and angular rather than focusing on the more insightful Kibana-related terms.
POST /stackoverflow/_search?size=0
{
    "query": {
        "query_string": {
            "query": "tags:kibana OR tags:javascript"
        }
    },
    "aggs": {
             "low_quality_keywords": {
                "significant_terms": {
                    "field": "tags",
                    "size": 3,
                    "exclude":["kibana", "javascript"]
                }
        }
    }
}
Response:
{
    ...
    "aggregations": {
        "low_quality_keywords": {
            "doc_count": 600,
            "bg_count": 650,
            "buckets": [
                {
                    "key": "angular",
                    "doc_count": 200,
                    "score": 0.02777,
                    "bg_count": 200
                },
                {
                    "key": "jquery",
                    "doc_count": 200,
                    "score": 0.02777,
                    "bg_count": 200
                },
                {
                    "key": "logstash",
                    "doc_count": 50,
                    "score": 0.0069,
                    "bg_count": 50
                }
            ]
        }
    }
}
The shard_size parameter limits how many top-scoring documents are collected in the sample processed on each shard.
The default value is 100.
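For instance, if shard_size is omitted the sampler falls back to that default of 100 documents per shard, so on a five-shard index up to 500 documents in total would feed the sub-aggregations. A minimal sketch relying on the default (index, query, and sub-aggregation reused from the example above):
POST /stackoverflow/_search?size=0
{
    "query": {
        "query_string": {
            "query": "tags:kibana OR tags:javascript"
        }
    },
    "aggs": {
        "sample": {
            "sampler": {},
            "aggs": {
                "keywords": {
                    "significant_terms": {
                        "field": "tags",
                        "exclude": ["kibana", "javascript"]
                    }
                }
            }
        }
    }
}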
Being a quality-based filter, the sampler aggregation needs access to the relevance score produced for each document. It therefore cannot be nested under a terms aggregation whose collect_mode has been switched from the default depth_first mode to breadth_first, as that mode discards scores. In this situation an error will be thrown, as in the request shape sketched below.
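For illustration, a request shape that would be rejected for this reason (a sketch only; it reuses the stackoverflow index and tags field from the examples above):
POST /stackoverflow/_search?size=0
{
    "aggs": {
        "tags": {
            "terms": {
                "field": "tags",
                "collect_mode": "breadth_first"
            },
            "aggs": {
                "sample": {
                    "sampler": {
                        "shard_size": 200
                    },
                    "aggs": {
                        "keywords": {
                            "significant_terms": {
                                "field": "tags"
                            }
                        }
                    }
                }
            }
        }
    }
}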