Pagination of results can be done by using the from
and size
but the cost becomes prohibitive when the deep pagination is reached.
The index.max_result_window
which defaults to 10,000 is a safeguard, search requests take heap memory and time proportional to from + size
.
The Scroll api is recommended for efficient deep scrolling but scroll contexts are costly and it is not
recommended to use it for real time user requests.
The search_after
parameter circumvents this problem by providing a live cursor.
The idea is to use the results from the previous page to help the retrieval of the next page.
Suppose that the query to retrieve the first page looks like this:
GET twitter/_search { "size": 10, "query": { "match" : { "title" : "elasticsearch" } }, "sort": [ {"date": "asc"}, {"tie_breaker_id": "asc"} ] }
A field with one unique value per document should be used as the tiebreaker
of the sort specification. Otherwise the sort order for documents that have
the same sort values would be undefined and could lead to missing or duplicate
results. The _id
field has a unique value per document
but it is not recommended to use it as a tiebreaker directly.
Beware that search_after
looks for the first document which fully or partially
matches tiebreaker’s provided value. Therefore if a document has a tiebreaker value of
"654323"
and you search_after
for "654"
it would still match that document
and return results found after it.
doc value are disabled on this field so sorting on it requires
to load a lot of data in memory. Instead it is advised to duplicate (client side
or with a set ingest processor) the content
of the _id
field in another field that has
doc value enabled and to use this new field as the tiebreaker
for the sort.
The result from the above request includes an array of sort values
for each document.
These sort values
can be used in conjunction with the search_after
parameter to start returning results "after" any
document in the result list.
For instance we can use the sort values
of the last document and pass it to search_after
to retrieve the next page of results:
GET twitter/_search { "size": 10, "query": { "match" : { "title" : "elasticsearch" } }, "search_after": [1463538857, "654323"], "sort": [ {"date": "asc"}, {"tie_breaker_id": "asc"} ] }
The parameter from
must be set to 0 (or -1) when search_after
is used.
search_after
is not a solution to jump freely to a random page but rather to scroll many queries in parallel.
It is very similar to the scroll
API but unlike it, the search_after
parameter is stateless, it is always resolved against the latest
version of the searcher. For this reason the sort order may change during a walk depending on the updates and deletes of your index.