Strings longer than the ignore_above
setting will not be indexed or stored.
For arrays of strings, ignore_above
will be applied for each array element separately and string elements longer than ignore_above
will not be indexed or stored.
All strings and array elements will still be present in the _source
field, if the latter is enabled, which is the default in Elasticsearch.
PUT my_index
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword",
        "ignore_above": 20
      }
    }
  }
}

PUT my_index/_doc/1
{
  "message": "Syntax error"
}

PUT my_index/_doc/2
{
  "message": "Syntax error with some long stacktrace"
}

GET _search
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}
The message field will ignore any string longer than 20 characters.
Document 1 is indexed successfully.
Document 2 will be indexed, but without indexing the message field.
The search returns both documents, but only the first is present in the terms aggregation.
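The per-element behavior for arrays of strings can be sketched with the same my_index mapping. The document ID 3 is an assumption made for illustration. With ignore_above set to 20, the first array element should be indexed and the second skipped, while both remain in _source.

```console
PUT my_index/_doc/3
{
  "message": [
    "Syntax error",
    "Syntax error with some long stacktrace"
  ]
}
```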
The ignore_above
setting can be updated on
existing fields using the PUT mapping API.
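As a minimal sketch, assuming the my_index index and message field from the example above, such an update might look like the following request. The new value of 256 is arbitrary and chosen only for illustration. The field type is repeated because the mapping definition for the field is sent as a whole.

```console
PUT my_index/_mapping
{
  "properties": {
    "message": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
```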
This option is also useful for protecting against Lucene’s term byte-length
limit of 32766.
The value for ignore_above
is the character count, but Lucene counts
bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
set the limit to 32766 / 4 = 8191
since UTF-8 characters may occupy at most
4 bytes.
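A mapping that applies this conservative limit might look like the sketch below. The index name my_utf8_index is hypothetical, and the field name message is reused from the earlier example only for consistency.

```console
PUT my_utf8_index
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword",
        "ignore_above": 8191
      }
    }
  }
}
```

With this setting, even a string made up entirely of 4-byte characters stays within 8191 * 4 = 32764 bytes, which is below the 32766-byte limit.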