Elasticsearch allows you to configure a scoring algorithm or similarity per
field. The similarity setting provides a simple way of choosing a similarity
algorithm other than the default BM25, such as TF/IDF.
Similarities are mostly useful for text fields, but can also apply to other
field types.
Custom similarities can be configured by tuning the parameters of the built-in
similarities. For more details about these expert options, see the
similarity module.
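As a rough sketch of what such tuning could look like, the request below defines a custom similarity in the index settings and references it from a field mapping. The index name my_custom_index, the similarity name my_bm25, the field name title, and the k1 and b values are all illustrative assumptions, not recommendations; consult the similarity module for the parameters each similarity type accepts.

```console
PUT my_custom_index
{
  "settings": {
    "index": {
      "similarity": {
        "my_bm25": {
          "type": "BM25",
          "k1": 1.3,
          "b": 0.5
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "similarity": "my_bm25"
      }
    }
  }
}
```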
The only similarities which can be used out of the box, without any further
configuration, are:
- BM25
  The Okapi BM25 algorithm. The algorithm used by default in Elasticsearch and
  Lucene. See Pluggable Similarity Algorithms for more information.
- classic
  The TF/IDF algorithm which used to be the default in Elasticsearch and
  Lucene. See Lucene’s Practical Scoring Function for more information.
- boolean
  A simple boolean similarity, which is used when full-text ranking is not
  needed and the score should only be based on whether the query terms match
  or not. Boolean similarity gives terms a score equal to their query boost.
The similarity can be set on the field level when a field is first created,
as follows:
PUT my_index
{
  "mappings": {
    "properties": {
      "default_field": {
        "type": "text"
      },
      "boolean_sim_field": {
        "type": "text",
        "similarity": "boolean"
      }
    }
  }
}
The default_field uses the BM25 similarity.
The boolean_sim_field uses the boolean similarity.
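To see the effect of the boolean similarity, a query along the following lines could be run against the index created above. This is a minimal sketch: the search term and the boost value are made up for illustration. Since boolean similarity gives terms a score equal to their query boost, a document matching the term would be expected to score on the boost alone, regardless of term frequency or field length.

```console
GET my_index/_search
{
  "query": {
    "match": {
      "boolean_sim_field": {
        "query": "quick",
        "boost": 2.0
      }
    }
  }
}
```

Running the same query against default_field would instead rank matches with BM25, which makes the two fields a convenient side-by-side comparison.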