The path_hierarchy tokenizer takes a hierarchical value like a filesystem
path, splits on the path separator, and emits a term for each component in the
tree.
POST _analyze
{
  "tokenizer": "path_hierarchy",
  "text": "/one/two/three"
}
The above text would produce the following terms:
[ /one, /one/two, /one/two/three ]
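To make the default behaviour concrete, here is a minimal plain-Python sketch that emits every cumulative prefix of a path, split on a single-character delimiter. It is not Elasticsearch's or Lucene's implementation. The function name path_prefixes is hypothetical, and the sketch assumes well-formed paths with no consecutive or trailing delimiters.

```python
def path_prefixes(path: str, delimiter: str = "/") -> list[str]:
    """Hypothetical emulation: return each cumulative prefix of `path`."""
    if not path:
        return []  # empty input yields no terms
    tokens = []
    # Start scanning at index 1 so a leading delimiter stays attached
    # to the first component ("/one" rather than "").
    for i in range(1, len(path)):
        if path[i] == delimiter:
            tokens.append(path[:i])  # prefix up to (not including) this delimiter
    tokens.append(path)  # the full path is always the last term
    return tokens


print(path_prefixes("/one/two/three"))
# ['/one', '/one/two', '/one/two/three']
```

Emitting every prefix is what lets a query for a single directory, such as /one/two, match documents stored anywhere beneath it.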
The path_hierarchy tokenizer accepts the following parameters:
delimiter: The character to use as the path separator. Defaults to /.
replacement: An optional replacement character to use for the delimiter. Defaults to the delimiter.
buffer_size: The number of characters read into the term buffer in a single pass. Defaults to 1024.
reverse: If set to true, emits the tokens in reverse order. Defaults to false.
skip: The number of initial tokens to skip. Defaults to 0.
In this example, we configure the path_hierarchy tokenizer to split on -
characters, and to replace them with /. The first two tokens are skipped:
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "-",
          "replacement": "/",
          "skip": 2
        }
      }
    }
  }
}
POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "one-two-three-four-five"
}
The above example produces the following terms:
[ /three, /three/four, /three/four/five ]
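The effect of delimiter, replacement and skip in this example can be approximated with a short plain-Python sketch. The function path_hierarchy below is a hypothetical emulation, not the Elasticsearch API. It assumes no empty components apart from an optional leading delimiter, and it ignores buffer_size, which only affects how the real tokenizer reads its input.

```python
from typing import Optional


def path_hierarchy(text: str, delimiter: str = "/",
                   replacement: Optional[str] = None, skip: int = 0) -> list[str]:
    """Hypothetical emulation of forward path_hierarchy tokenization."""
    if replacement is None:
        replacement = delimiter  # replacement defaults to the delimiter
    if not text:
        return []
    leading = text.startswith(delimiter)
    parts = text.split(delimiter)
    if leading:
        parts = parts[1:]  # drop the empty component before a leading delimiter
    kept = parts[skip:]  # skip the first `skip` components
    # Terms start with the separator if the input did, or if components
    # before them were skipped (giving "/three" rather than "three").
    prefix = replacement if (leading or skip > 0) else ""
    return [prefix + replacement.join(kept[:i]) for i in range(1, len(kept) + 1)]


print(path_hierarchy("one-two-three-four-five", delimiter="-", replacement="/", skip=2))
# ['/three', '/three/four', '/three/four/five']
```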
If we were to set reverse to true, it would produce the following:
[ one/two/three/, two/three/, three/ ]
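In reverse mode the terms are suffixes rather than prefixes, and skip drops components from the end of the path instead of the start. The sketch below is a hypothetical plain-Python emulation (reverse_path_hierarchy is not a real API) that reproduces the output listed for this example. It assumes, based only on that output, that a term cut short by skip keeps the separator that preceded the skipped components; edge cases may differ from Lucene's tokenizer.

```python
from typing import Optional


def reverse_path_hierarchy(text: str, delimiter: str = "/",
                           replacement: Optional[str] = None, skip: int = 0) -> list[str]:
    """Hypothetical emulation of path_hierarchy with reverse set to true."""
    if replacement is None:
        replacement = delimiter  # replacement defaults to the delimiter
    if not text:
        return []
    parts = text.split(delimiter)
    # In reverse mode, skip removes components from the end of the path.
    kept = parts[:max(len(parts) - skip, 0)]
    # Terms that stop short of the end keep the separator before the skipped part.
    suffix = replacement if skip > 0 else ""
    # Each term is the suffix of the kept components starting at index i.
    return [replacement.join(kept[i:]) + suffix for i in range(len(kept))]


print(reverse_path_hierarchy("one-two-three-four-five", delimiter="-", replacement="/", skip=2))
# ['one/two/three/', 'two/three/', 'three/']
```

Emitting suffixes suits values whose most general part comes last, such as domain names, where a query for a parent domain should match its subdomains.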