A bucket aggregation returning a form of adjacency matrix.
The request provides a collection of named filter expressions, similar to the filters aggregation
request.
Each bucket in the response represents a non-empty cell in the matrix of intersecting filters.
Given filters named A, B and C the response would return buckets with the following names:
| A | B | C | |
|---|---|---|---|
A | A | A&B | A&C |
B | B | B&C | |
C | C |
The intersecting buckets e.g A&C are labelled using a combination of the two filter names separated by
the ampersand character. Note that the response does not also include a "C&A" bucket as this would be the
same set of documents as "A&C". The matrix is said to be symmetric so we only return half of it. To do this we sort
the filter name strings and always use the lowest of a pair as the value to the left of the "&" separator.
An alternative separator parameter can be passed in the request if clients wish to use a separator string
other than the default of the ampersand.
Example:
PUT /emails/_bulk?refresh
{ "index" : { "_id" : 1 } }
{ "accounts" : ["hillary", "sidney"]}
{ "index" : { "_id" : 2 } }
{ "accounts" : ["hillary", "donald"]}
{ "index" : { "_id" : 3 } }
{ "accounts" : ["vladimir", "donald"]}
GET emails/_search
{
"size": 0,
"aggs" : {
"interactions" : {
"adjacency_matrix" : {
"filters" : {
"grpA" : { "terms" : { "accounts" : ["hillary", "sidney"] }},
"grpB" : { "terms" : { "accounts" : ["donald", "mitt"] }},
"grpC" : { "terms" : { "accounts" : ["vladimir", "nigel"] }}
}
}
}
}
}In the above example, we analyse email messages to see which groups of individuals have exchanged messages. We will get counts for each group individually and also a count of messages for pairs of groups that have recorded interactions.
Response:
{
"took": 9,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"interactions": {
"buckets": [
{
"key":"grpA",
"doc_count": 2
},
{
"key":"grpA&grpB",
"doc_count": 1
},
{
"key":"grpB",
"doc_count": 2
},
{
"key":"grpB&grpC",
"doc_count": 1
},
{
"key":"grpC",
"doc_count": 1
}
]
}
}
}On its own this aggregation can provide all of the data required to create an undirected weighted graph.
However, when used with child aggregations such as a date_histogram the results can provide the
additional levels of data required to perform dynamic network analysis
where examining interactions over time becomes important.