The reroute command allows for manual changes to the allocation of individual shards in the cluster. For example, a shard can be moved from one node to another explicitly, an allocation can be cancelled, and an unassigned shard can be explicitly allocated to a specific node.
Here is a short example of a simple reroute API call:
POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "test", "shard": 0,
        "from_node": "node1", "to_node": "node2"
      }
    },
    {
      "allocate_replica": {
        "index": "test", "shard": 1,
        "node": "node3"
      }
    }
  ]
}
It is important to note that after processing any reroute commands, Elasticsearch will perform rebalancing as normal (respecting the values of settings such as cluster.routing.rebalance.enable) in order to remain in a balanced state. For example, if the requested allocation includes moving a shard from node1 to node2, then this may cause a shard to be moved from node2 back to node1 to even things out.
The cluster can be set to disable allocations using the cluster.routing.allocation.enable setting. If allocations are disabled, then the only allocations that will be performed are explicit ones given using the reroute command, and consequent allocations due to rebalancing.
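As an illustration of the setting mentioned above, the following sketch disables automatic allocation through the cluster settings API. The choice of a persistent (rather than transient) setting is an assumption made for the example; pick whichever scope suits your situation.

```console
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "none"
  }
}
```

Setting the value back to "all", or to null to restore the default, re-enables automatic allocation.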
It is possible to run reroute commands in "dry run" mode by using the ?dry_run URI query parameter, or by passing "dry_run": true in the request body. This calculates the result of applying the commands to the current cluster state and returns the resulting cluster state after the commands (and rebalancing) have been applied, but does not actually perform the requested changes.
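A dry run of a move command might look like the following sketch. The index name test and the node names node1 and node2 are placeholders carried over from the earlier example, not names that exist in your cluster.

```console
POST /_cluster/reroute?dry_run
{
  "commands": [
    {
      "move": {
        "index": "test", "shard": 0,
        "from_node": "node1", "to_node": "node2"
      }
    }
  ]
}
```

The response describes the cluster state that would result, while the shard itself stays where it is.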
If the ?explain URI query parameter is included, then a detailed explanation of why the commands could or could not be executed is included in the response.
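The two parameters can be combined, which is a convenient way to check whether a command would be accepted before running it for real. This is a sketch only; the index and node names are the same placeholders used in the earlier example.

```console
POST /_cluster/reroute?dry_run&explain
{
  "commands": [
    {
      "allocate_replica": {
        "index": "test", "shard": 1,
        "node": "node3"
      }
    }
  ]
}
```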
The commands supported are:
move
Moves a shard from one node to another. Accepts index and shard for the index name and shard number, from_node for the node to move the shard from, and to_node for the node to move the shard to.
cancel
Cancels the allocation of a shard. Accepts index and shard for the index name and shard number, and node for the node to cancel the shard allocation on. This can be used to force resynchronization of existing replicas from the primary shard by cancelling them and allowing them to be reinitialized through the standard recovery process. By default, only replica shard allocations can be cancelled. If it is necessary to cancel the allocation of a primary shard, then the allow_primary flag must also be included in the request.
allocate_replica
Allocates an unassigned replica shard to a node. Accepts index and shard for the index name and shard number, and node for the node to allocate the shard to. Takes allocation deciders into account.
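The opening example already shows move and allocate_replica. As a complementary sketch, a cancel command for a primary shard could be written as follows. The index and node names are placeholders, and allow_primary is only needed because the example assumes that the copy on node1 is a primary.

```console
POST /_cluster/reroute
{
  "commands": [
    {
      "cancel": {
        "index": "test", "shard": 0,
        "node": "node1",
        "allow_primary": true
      }
    }
  ]
}
```

To cancel a replica allocation instead, omit allow_primary.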
The cluster will attempt to allocate a shard a maximum of index.allocation.max_retries times in a row (defaults to 5) before giving up and leaving the shard unallocated. This scenario can be caused by structural problems such as having an analyzer which refers to a stopwords file which doesn’t exist on all nodes.
Once the problem has been corrected, allocation can be manually retried by calling the reroute API with the ?retry_failed URI query parameter, which will attempt a single retry round for these shards.
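A retry needs no commands, so the request can be sent without a body, as in this minimal sketch:

```console
POST /_cluster/reroute?retry_failed=true
```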
Two more commands are available that allow the allocation of a primary shard to a node. These commands should however be used with extreme care, as primary shard allocation is usually fully automatically handled by Elasticsearch. Reasons why a primary shard cannot be automatically allocated include the following:
The following two commands are dangerous and may result in data loss. They are meant to be used in cases where the original data cannot be recovered and the cluster administrator accepts the loss. If you have suffered a temporary issue that can be fixed, please see the retry_failed flag described above. To emphasise: if these commands are performed and then a node joins the cluster that holds a copy of the affected shard, then the copy on the newly-joined node will be deleted or overwritten.
allocate_stale_primary
Allocates a primary shard to a node that holds a stale copy. Accepts index and shard for the index name and shard number, and node for the node to allocate the shard to. Using this command may lead to data loss for the provided shard id. If a node which has the good copy of the data rejoins the cluster later on, that data will be deleted or overwritten with the data of the stale copy that was forcefully allocated with this command. To ensure that these implications are well understood, this command requires the flag accept_data_loss to be explicitly set to true.
allocate_empty_primary
Allocates an empty primary shard to a node. Accepts index and shard for the index name and shard number, and node for the node to allocate the shard to. Using this command leads to a complete loss of all data that was indexed into this shard, if it was previously started. If a node which has a copy of the data rejoins the cluster later on, that data will be deleted. To ensure that these implications are well understood, this command requires the flag accept_data_loss to be explicitly set to true.
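For reference, a request using allocate_stale_primary might be shaped like the sketch below. The index and node names are placeholders, and the example assumes that node1 holds the stale copy you have decided to promote. allocate_empty_primary takes the same fields. Running the request first with ?dry_run, described earlier, is a reasonable precaution.

```console
POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "test", "shard": 0,
        "node": "node1",
        "accept_data_loss": true
      }
    }
  ]
}
```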