Cross-cluster search

» »

Cross-cluster search

The cross-cluster search feature allows any node to act as a federated client across multiple clusters. A cross-cluster search node won’t join the remote cluster, instead it connects to a remote cluster in a light fashion in order to execute federated search requests. For details on communication and compatibility between different clusters, see Remote clusters.

Using cross-cluster search

Cross-cluster search requires configuring remote clusters.

PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": {
          "seeds": [
            "127.0.0.1:9300"
          ]
        },
        "cluster_two": {
          "seeds": [
            "127.0.0.1:9301"
          ]
        },
        "cluster_three": {
          "seeds": [
            "127.0.0.1:9302"
          ]
        }
      }
    }
  }
}

Copy as cURL View in Console

To search the twitter index on remote cluster cluster_one the index name must be prefixed with the alias of the remote cluster followed by the : character:

GET /cluster_one:twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}

Copy as cURL View in Console

{
  "took": 150,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 1,
    "successful": 1,
    "skipped": 0
  },
  "hits": {
    "total" : {
        "value": 1,
        "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}

Indices with the same name on different clusters can also be searched:

GET /cluster_one:twitter,twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}

Copy as cURL View in Console

Search results are disambiguated the same way as the indices are disambiguated in the request. Indices with same names are treated as different indices when results are merged. All results retrieved from an index located in a remote cluster are prefixed with their corresponding cluster alias:

{
  "took": 150,
  "timed_out": false,
  "num_reduce_phases": 3,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 2,
    "successful": 2,
    "skipped": 0
  },
  "hits": {
    "total" : {
        "value": 2,
        "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 2,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}

Skipping disconnected clusters

By default, all remote clusters that are searched via cross-cluster search need to be available when the search request is executed. Otherwise, the whole request fails; even if some of the clusters are available, no search results are returned. You can use the boolean skip_unavailable setting to make remote clusters optional. By default, it is set to false.

PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.cluster_two.skip_unavailable": true 
  }
}

Copy as cURL View in Console

cluster_two is made optional

GET /cluster_one:twitter,cluster_two:twitter,twitter/_search 
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}

Copy as cURL View in Console

Search against the twitter index in cluster_one, cluster_two and also locally

{
  "took": 150,
  "timed_out": false,
  "num_reduce_phases": 3,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": { 
    "total": 3,
    "successful": 2,
    "skipped": 1
  },
  "hits": {
    "total" : {
        "value": 2,
        "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 2,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}

The clusters section indicates that one cluster was unavailable and got skipped