The rank_feature query is a specialized query that only works on rank_feature
fields and rank_features fields. Its goal is to boost the score of documents
based on the values of numeric features. It is typically put in a should clause
of a bool query so that its score is added to the score of the query.
Compared to using function_score or other ways to modify the score, this query
has the benefit of being able to efficiently skip non-competitive hits when
track_total_hits is not set to true. Speedups may be spectacular.
Here is an example that indexes various features:
- pagerank, a measure of the importance of a website,
- url_length, the length of the URL, which typically correlates negatively with relevance,
- topics, which associates a list of topics with every document alongside a measure of how well the document is connected to this topic.
The example then runs a query that searches for "2016" and boosts based on
pagerank, url_length and the sports topic.
PUT test
{
  "mappings": {
    "properties": {
      "pagerank": { "type": "rank_feature" },
      "url_length": { "type": "rank_feature", "positive_score_impact": false },
      "topics": { "type": "rank_features" }
    }
  }
}

PUT test/_doc/1
{
  "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": { "sports": 50, "brazil": 30 }
}

PUT test/_doc/2
{
  "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016 at the Autódromo José Carlos Pace in São Paulo, Brazil",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": { "sports": 35, "formula one": 65, "brazil": 20 }
}

PUT test/_doc/3
{
  "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": { "movies": 60, "super hero": 65 }
}

POST test/_refresh

GET test/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "2016" } }
      ],
      "should": [
        { "rank_feature": { "field": "pagerank" } },
        { "rank_feature": { "field": "url_length", "boost": 0.1 } },
        { "rank_feature": { "field": "topics.sports", "boost": 0.4 } }
      ]
    }
  }
}
The rank_feature query supports three functions in order to boost scores using
the values of rank features. If you do not know where to start, we recommend
that you start with the saturation function, which is the default when no
function is provided.
The saturation function gives a score that is equal to S / (S + pivot), where S
is the value of the rank feature and pivot is a configurable pivot value, so
that the result is less than 0.5 if S is less than pivot, exactly 0.5 if S
equals pivot, and greater than 0.5 otherwise. Scores are always in (0, 1).
If the rank feature has a negative score impact then the function will be
computed as pivot / (S + pivot), which decreases when S increases.
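The two saturation formulas above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Elasticsearch's implementation: the function name `saturation` and its `positive_score_impact` parameter are assumptions made for this sketch, and the query boost is ignored.

```python
def saturation(s: float, pivot: float, positive_score_impact: bool = True) -> float:
    """Illustrative saturation score for a feature value s and a pivot.

    Hypothetical helper: mirrors S / (S + pivot) for features with a
    positive score impact and pivot / (S + pivot) for a negative one.
    """
    if s <= 0 or pivot <= 0:
        raise ValueError("s and pivot must be positive")
    if positive_score_impact:
        return s / (s + pivot)
    return pivot / (s + pivot)


# Scores stay in (0, 1) and cross 0.5 where the feature value equals the pivot.
for value in (2.0, 8.0, 50.3):
    print(value, round(saturation(value, pivot=8), 3))
```

With a negative score impact, the same inputs give scores that shrink as the feature value grows, which is the behavior wanted for a feature such as url_length.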
GET test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
If pivot is not supplied then Elasticsearch will compute a default value that
will be approximately equal to the geometric mean of all feature values that
exist in the index. We recommend this if you haven’t had the opportunity to
train a good pivot value.
GET test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
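To get a feel for what a geometric-mean pivot looks like, the sketch below computes the geometric mean of the three url_length values from the example documents. It only illustrates the statistic: Elasticsearch computes an approximation from the index itself, so the default it picks may differ from this number. The helper name `geometric_mean` is an assumption of this sketch.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty collection of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# url_length values of documents 1, 2 and 3 in the example index.
url_lengths = [42, 47, 37]
pivot = geometric_mean(url_lengths)
print(round(pivot, 2))
```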
The log function gives a score that is equal to log(scaling_factor + S), where
S is the value of the rank feature and scaling_factor is a configurable scaling
factor. Scores are unbounded.
This function only supports rank features that have a positive score impact.
GET test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
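The logarithmic formula can be sketched as follows. The helper name `log_score` is hypothetical, and the sketch assumes the natural logarithm, which the text above does not state explicitly; the query boost is again ignored.

```python
import math


def log_score(s: float, scaling_factor: float) -> float:
    """Illustrative log score: log(scaling_factor + S).

    Assumption: natural logarithm. math.log raises ValueError if
    scaling_factor + s is not positive.
    """
    return math.log(scaling_factor + s)


# Unlike saturation, the score keeps growing with the feature value.
for value in (1.0, 50.3, 5000.0):
    print(value, round(log_score(value, scaling_factor=4), 3))
```

Because the score is unbounded, a very large feature value can dominate the rest of the query more easily than it can with saturation.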
The sigmoid function is an extension of saturation which adds a configurable
exponent. Scores are computed as S^exp / (S^exp + pivot^exp). Like for the
saturation function, pivot is the value of S that gives a score of 0.5 and
scores are in (0, 1).
exponent must be positive, but is typically in [0.5, 1]. A good value should be
computed via training. If you don’t have the opportunity to do so, we recommend
that you stick to the saturation function instead.
GET test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
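As a rough illustration of the formula S^exp / (S^exp + pivot^exp), the Python sketch below evaluates it for a few feature values. The function name `sigmoid_score` is an assumption of this sketch, and the query boost is ignored.

```python
def sigmoid_score(s: float, pivot: float, exponent: float) -> float:
    """Illustrative sigmoid score: S^exp / (S^exp + pivot^exp)."""
    if s <= 0 or pivot <= 0 or exponent <= 0:
        raise ValueError("s, pivot and exponent must be positive")
    s_exp = s ** exponent
    return s_exp / (s_exp + pivot ** exponent)


# Same pivot and exponent as the request above.
for value in (1.0, 7.0, 50.3):
    print(value, round(sigmoid_score(value, pivot=7, exponent=0.6), 3))
```

With an exponent of 1 the formula reduces to S / (S + pivot), which is the saturation function. Smaller exponents flatten the curve around the pivot.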