Performs the analysis process on a text and returns the token breakdown of the text.
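As a rough illustration of what a "token breakdown" contains, the Python sketch below mimics the shape of the response. Each token carries its text, its start and end character offsets, and its position in the token stream. The toy_analyze function is hypothetical and does not call Elasticsearch. It only approximates the standard analyzer with a regex: Elasticsearch uses Unicode text segmentation, so results can differ on some inputs, and real responses also report a token type.

```python
import re


def toy_analyze(text):
    """Hypothetical approximation of the standard analyzer.

    Splits on runs of word characters and lowercases each token. This is a
    simplification; it is not how Elasticsearch tokenizes text.
    """
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append(
            {
                "token": match.group().lower(),
                "start_offset": match.start(),  # inclusive character offset
                "end_offset": match.end(),      # exclusive character offset
                "position": position,           # ordinal of the token in the stream
            }
        )
    return {"tokens": tokens}


if __name__ == "__main__":
    for token in toy_analyze("this is a test")["tokens"]:
        print(token)
```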
It can be used without specifying an index, against one of the many built-in analyzers:
GET _analyze
{
  "analyzer" : "standard",
  "text" : "this is a test"
}

If the text parameter is provided as an array of strings, it is analyzed as a multi-valued field:
GET _analyze
{
  "analyzer" : "standard",
  "text" : ["this is a test", "the second text"]
}

Or by building a custom transient analyzer out of tokenizers, token filters, and char filters. Token filters can use the shorter filter parameter name:
GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "text" : "this is a test"
}

GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "char_filter" : ["html_strip"],
  "text" : "this is a <b>test</b>"
}
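The request above chains a char filter, a tokenizer, and a token filter, which run in that order: char filters see the raw text, the tokenizer splits it, and token filters transform the resulting tokens. The Python sketch below is a toy model of that ordering, not Elasticsearch's implementation, and every function name in it is hypothetical. Its html_strip is a crude regex that only removes tags (the real char filter also decodes HTML entities). It also includes whitespace-tokenizer and stop-filter stand-ins matching the custom stop filter request further down.

```python
import re


def html_strip(text):
    # Crude stand-in for the html_strip char filter: drop anything shaped like a tag.
    return re.sub(r"<[^>]*>", "", text)


def keyword_tokenizer(text):
    # The keyword tokenizer emits the entire input as a single token.
    return [text]


def whitespace_tokenizer(text):
    # Splits on runs of whitespace.
    return text.split()


def lowercase(tokens):
    return [t.lower() for t in tokens]


def make_stop(stopwords):
    # Returns a token filter that removes any token found in `stopwords`.
    stop = set(stopwords)
    return lambda tokens: [t for t in tokens if t not in stop]


def analyze(text, char_filters=(), tokenizer=keyword_tokenizer, filters=()):
    # Order matters: char filters -> tokenizer -> token filters.
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("this is a <b>test</b>",
              char_filters=[html_strip],
              tokenizer=keyword_tokenizer,
              filters=[lowercase]))
# → ['this is a test']
```

Because the keyword tokenizer does not split, the whole stripped string comes back as one token; swapping in the whitespace tokenizer and a stop filter changes only the arguments, not the pipeline.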
Use filter and char_filter instead of filters and char_filters. The token_filters parameter has been removed.
Custom tokenizers, token filters, and character filters can be specified in the request body as follows:
GET _analyze
{
  "tokenizer" : "whitespace",
  "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
  "text" : "this is a test"
}

It can also run against a specific index:
GET analyze_sample/_analyze
{
  "text" : "this is a test"
}

The above will run an analysis on the "this is a test" text using the default index analyzer associated with the analyze_sample index. A different analyzer can also be provided explicitly:
GET analyze_sample/_analyze
{
  "analyzer" : "whitespace",
  "text" : "this is a test"
}

The analyzer can also be derived from a field mapping, for example:
GET analyze_sample/_analyze
{
  "field" : "obj1.field1",
  "text" : "this is a test"
}

This causes the analysis to use the analyzer configured in the mapping for obj1.field1 (or, if none is configured, the default index analyzer).
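The fallback rule for field-based analysis (use the field's configured analyzer, otherwise the index default) can be sketched as a lookup over a nested mapping. This is a hypothetical illustration: the resolve_analyzer function, the mapping contents, and the "default" placeholder standing in for the index's default analyzer are all assumptions, and the real resolution happens inside Elasticsearch.

```python
def resolve_analyzer(mappings, field_path, default="default"):
    """Hypothetical sketch: find the analyzer configured for a dotted field path.

    Walks nested "properties" objects ("obj1.field1" -> obj1 -> field1).
    Falls back to `default`, a placeholder for the index's default analyzer,
    when the field is unmapped or declares no analyzer.
    """
    node = mappings
    for part in field_path.split("."):
        node = node.get("properties", {}).get(part)
        if node is None:
            return default
    return node.get("analyzer", default)


# Hypothetical mapping for the analyze_sample index.
mappings = {
    "properties": {
        "obj1": {
            "properties": {
                "field1": {"type": "text", "analyzer": "whitespace"},
                "field2": {"type": "text"},
            }
        }
    }
}

print(resolve_analyzer(mappings, "obj1.field1"))  # whitespace
print(resolve_analyzer(mappings, "obj1.field2"))  # default
```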
A normalizer can be provided for a keyword field, using a normalizer associated with the analyze_sample index:
GET analyze_sample/_analyze
{
  "normalizer" : "my_normalizer",
  "text" : "BaR"
}

Or by building a custom transient normalizer out of token filters and char filters:
GET _analyze
{
  "filter" : ["lowercase"],
  "text" : "BaR"
}
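A normalizer works like an analyzer without a tokenizer, so the whole input always comes back as a single token. The Python sketch below models that with hypothetical function names; it is a toy, not Elasticsearch's implementation.

```python
def lowercase(tokens):
    return [t.lower() for t in tokens]


def normalize(text, char_filters=(), filters=()):
    # No tokenizer step: after char filters, the entire input is one token.
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = [text]
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(normalize("BaR", filters=[lowercase]))      # ['bar']
print(normalize("BaR Baz", filters=[lowercase]))  # ['bar baz'], still one token
```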