Several different result types are created for each job. You can query anomaly results for buckets, influencers, and records by using the results API. Summarized bucket results over multiple jobs can be queried as well; those results are called overall buckets.
Results are written for each bucket_span. The timestamp for the results is the start of the bucket time interval.
The results include scores, which are calculated for each anomaly result type and each bucket interval. These scores are aggregated in order to reduce noise, and normalized in order to identify and rank the most mathematically significant anomalies.
Bucket results provide the top level, overall view of the job and are ideal for alerts. For example, the bucket results might indicate that at 16:05 the system was unusual. This information is a summary of all the anomalies, pinpointing when they occurred.
Influencer results show which entities were anomalous and when. For example,
the influencer results might indicate that at 16:05 user_name: Bob
was unusual.
This information is a summary of all the anomalies for each entity, so there
can be a lot of these results. Once you have identified a notable bucket time,
you can look to see which entities were significant.
Record results provide details about what the individual anomaly was, when it occurred and which entity was involved. For example, the record results might indicate that at 16:05 Bob sent 837262434 bytes, when the typical value was 1067 bytes. Once you have identified a bucket time and perhaps a significant entity too, you can drill through to the record results in order to investigate the anomalous behavior.
Categorization results contain the definitions of categories that have been identified. These are only applicable for jobs that are configured to analyze unstructured log data using categorization. These results do not contain a timestamp or any calculated scores. For more information, see Categorizing Log Messages.
All of these resources and properties are informational; you cannot change their values.
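The drill-down described above maps directly onto the results API. The following is a minimal Python sketch rather than a definitive recipe: it assumes an unsecured cluster at http://localhost:9200 and a hypothetical job named it-ops, and it uses the standard _ml/anomaly_detectors/<job_id>/results/... endpoints. Adjust the URL, authentication, and job name for your environment.

import requests

ES = "http://localhost:9200"   # placeholder cluster URL
JOB = "it-ops"                 # hypothetical job name

def results(kind, body):
    # The results API accepts its query options in the request body.
    response = requests.post(f"{ES}/_ml/anomaly_detectors/{JOB}/results/{kind}", json=body)
    response.raise_for_status()
    return response.json()

# 1. Bucket results: when was the system unusual?
buckets = results("buckets", {"anomaly_score": 75, "sort": "anomaly_score", "desc": True})

for bucket in buckets["buckets"]:
    start = bucket["timestamp"]                 # epoch milliseconds
    end = start + bucket["bucket_span"] * 1000  # bucket_span is in seconds

    # 2. Influencer results: which entities were significant in that bucket?
    influencers = results("influencers", {"start": start, "end": end})

    # 3. Record results: what exactly happened, and which entity was involved?
    records = results("records", {"start": start, "end": end})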
Bucket results provide the top level, overall view of the job and are best for alerting.
Each bucket has an anomaly_score, which is a statistically aggregated and normalized view of the combined anomalousness of all the record results within each bucket.
One bucket result is written for each bucket_span for each job, even if it is not considered to be anomalous. If the bucket is not anomalous, it has an anomaly_score of zero.
When you identify an anomalous bucket, you can investigate further by expanding the bucket resource to show the records as nested objects. Alternatively, you can access the records resource directly and filter by the date range.
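As a sketch of those two approaches (the cluster URL, job name, bucket timestamp, and 5-minute bucket_span below are placeholders), you can either request the bucket with expand set to true so that its records come back as nested objects, or query the records resource for the same date range:

import requests

ES = "http://localhost:9200"
JOB = "it-ops"
BUCKET_TS = 1454944200000  # hypothetical bucket timestamp, epoch milliseconds

# Option 1: expand the bucket so its anomaly records are returned as nested objects.
expanded = requests.get(
    f"{ES}/_ml/anomaly_detectors/{JOB}/results/buckets/{BUCKET_TS}",
    params={"expand": "true"},
).json()

# Option 2: query the records resource directly, filtered by the bucket's date range.
records = requests.post(
    f"{ES}/_ml/anomaly_detectors/{JOB}/results/records",
    json={"start": BUCKET_TS, "end": BUCKET_TS + 300000},  # assumes a 5m bucket_span
).json()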
A bucket resource has the following properties:
anomaly_score
bucket_influencers
bucket_span: Matches the bucket_span that is specified in the job.
event_count
initial_anomaly_score: The maximum anomaly_score for any of the bucket influencers. This is the initial value that was calculated at the time the bucket was processed.
is_interim
job_id
processing_time_ms
result_type: This value is always bucket.
timestamp: Events that occur exactly at the timestamp of the bucket are included in the results for the bucket.
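For orientation, a hypothetical bucket result might look like the following. It uses only the properties listed above, and every value is illustrative rather than real output:

bucket = {
    "job_id": "it-ops",
    "result_type": "bucket",
    "timestamp": 1454944200000,      # start of the bucket interval, epoch milliseconds
    "bucket_span": 300,              # matches the job's bucket_span, in seconds
    "anomaly_score": 94.2,           # current, re-normalized score
    "initial_anomaly_score": 83.5,   # score calculated when the bucket was processed
    "event_count": 820,
    "is_interim": False,
    "processing_time_ms": 6,
    "bucket_influencers": [],        # nested bucket influencer objects, described below
}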
Bucket influencer results are available as nested objects contained within
bucket results. These results are an aggregation for each type of influencer.
For example, if both client_ip
and user_name
were specified as influencers,
then you would be able to determine when the client_ip
or user_name
values
were collectively anomalous.
There is a built-in bucket influencer called bucket_time, which is always
available. This bucket influencer is the aggregation of all records in the
bucket; it is not just limited to a type of influencer.
A bucket influencer is a type of influencer. For example, client_ip
or
user_name
can be bucket influencers, whereas 192.168.88.2
and Bob
are
influencers.
A bucket influencer object has the following properties:
anomaly_score
bucket_span: Matches the bucket_span that is specified in the job.
initial_anomaly_score
influencer_field_name: For example, client_ip or user_name.
influencer_field_value: For example, 192.168.88.2 or Bob.
is_interim
job_id
probability: The anomaly_score is provided as a human-readable and friendly interpretation of this value.
raw_anomaly_score
result_type: This value is always bucket_influencer.
timestamp
Influencers are the entities that have contributed to, or are to blame for,
the anomalies. Influencer results are available only if an
influencer_field_name
is specified in the job configuration.
Influencers are given an influencer_score, which is calculated based on the anomalies that have occurred in each bucket interval. For jobs with more than one detector, this gives a powerful view of the most anomalous entities.
For example, if you are analyzing unusual bytes sent and unusual domains
visited and you specified user_name
as the influencer, then an
influencer_score
for each anomalous user name is written per bucket. For
example, if user_name: Bob
had an influencer_score
greater than 75, then
Bob
would be considered very anomalous during this time interval in one or
both of those areas (unusual bytes sent or unusual domains visited).
One influencer result is written per bucket for each influencer that is considered anomalous.
When you identify an influencer with a high score, you can investigate further by accessing the records resource for that bucket and enumerating the anomaly records that contain the influencer.
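A sketch of that workflow follows, with placeholder connection details and a hypothetical job name. The nested influencers structure on each record (influencer_field_name and influencer_field_values) is an assumption about the record document shape:

import requests

ES = "http://localhost:9200"
JOB = "it-ops"

# Find the most anomalous influencers first.
influencers = requests.post(
    f"{ES}/_ml/anomaly_detectors/{JOB}/results/influencers",
    json={"influencer_score": 75, "sort": "influencer_score", "desc": True},
).json()

for inf in influencers["influencers"]:
    start = inf["timestamp"]
    end = start + inf["bucket_span"] * 1000

    # Enumerate the anomaly records in the same bucket interval...
    records = requests.post(
        f"{ES}/_ml/anomaly_detectors/{JOB}/results/records",
        json={"start": start, "end": end},
    ).json()

    # ...and keep those that name this influencer entity.
    related = [
        rec for rec in records["records"]
        if any(
            entry.get("influencer_field_name") == inf["influencer_field_name"]
            and inf["influencer_field_value"] in entry.get("influencer_field_values", [])
            for entry in rec.get("influencers", [])
        )
    ]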
An influencer object has the following properties:
bucket_span: Matches the bucket_span that is specified in the job.
influencer_score: Unlike initial_influencer_score, this value will be updated by a re-normalization process as new data is analyzed.
initial_influencer_score
influencer_field_name
influencer_field_value
is_interim
job_id
probability: The influencer_score is provided as a human-readable and friendly interpretation of this value.
result_type: This value is always influencer.
timestamp
Additional influencer properties are added, depending on the fields being
analyzed. For example, if it’s analyzing user_name
as an influencer, then a
field user_name
is added to the result document. This information enables you to
filter the anomaly results more easily.
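For example, a hypothetical influencer result for a job that uses user_name as an influencer might look like this (illustrative values only); note the extra user_name field that mirrors influencer_field_value:

influencer = {
    "job_id": "it-ops",
    "result_type": "influencer",
    "timestamp": 1454944200000,
    "bucket_span": 300,
    "influencer_field_name": "user_name",
    "influencer_field_value": "Bob",
    "user_name": "Bob",              # added because user_name is analyzed as an influencer
    "influencer_score": 82.1,
    "initial_influencer_score": 78.6,
    "probability": 0.000023,
    "is_interim": False,
}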
Records contain the detailed analytical results. They describe the anomalous activity that has been identified in the input data based on the detector configuration.
For example, if you are looking for unusually large data transfers, an anomaly record can identify the source IP address, the destination, the time window during which it occurred, the expected and actual size of the transfer, and the probability of this occurrence.
There can be many anomaly records depending on the characteristics and size of the input data. In practice, there are often too many to be able to manually process them. The machine learning features therefore perform a sophisticated aggregation of the anomaly records into buckets.
The number of record results depends on the number of anomalies found in each bucket, which relates to the number of time series being modeled and the number of detectors.
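In practice you therefore rarely page through every record. A sketch of pulling back only the most significant ones (placeholder URL and job name) filters on record_score, sorts by it, and pages through the result set:

import requests

ES = "http://localhost:9200"
JOB = "it-ops"

records = requests.post(
    f"{ES}/_ml/anomaly_detectors/{JOB}/results/records",
    json={
        "record_score": 75,                 # keep only high-scoring records
        "sort": "record_score",
        "desc": True,
        "page": {"from": 0, "size": 100},   # page through large result sets
    },
).json()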
A record object has the following properties:
actual
bucket_span: Matches the bucket_span that is specified in the job.
by_field_name: For example, client_ip.
by_field_value: The value of by_field_name. This value is present only if it is specified in the detector. For example, 192.168.66.2.
causes: An array of anomaly records for the over_field_name. For scalability reasons, a maximum of the 10 most significant causes of the anomaly are returned. As part of the core analytical modeling, these low-level anomaly records are aggregated for their parent over field record. The causes resource contains similar elements to the record resource, namely actual, typical, *_field_name and *_field_value. Probability and scores are not applicable to causes.
detector_index
field_name: Certain functions require a field to operate on, for example sum(). For those functions, this value is the name of the field to be analyzed.
function: For example, max.
function_description
influencers: If influencers was specified in the detector configuration, then this array contains influencers that contributed to or were to blame for an anomaly.
initial_record_score
is_interim
job_id
over_field_name: For example, user.
over_field_value: The value of over_field_name. This value is present only if it was specified in the detector. For example, Bob.
partition_field_name: For example, region.
partition_field_value: The value of partition_field_name. This value is present only if it was specified in the detector. For example, us-east-1.
probability: The record_score is provided as a human-readable and friendly interpretation of this value.
multi_bucket_impact
record_score: Unlike initial_record_score, this value will be updated by a re-normalization process as new data is analyzed.
result_type: This value is always record.
timestamp
typical
Additional record properties are added, depending on the fields being
analyzed. For example, if it’s analyzing hostname
as a by field, then a field
hostname
is added to the result document. This information enables you to
filter the anomaly results more easily.
When categorization_field_name
is specified in the job configuration, it is
possible to view the definitions of the resulting categories. A category
definition describes the common terms matched and contains examples of matched
values.
The anomaly results from a categorization analysis are available as bucket, influencer, and record results. For example, the results might indicate that at 16:45 there was an unusual count of log message category 11. You can then examine the description and examples of that category.
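A sketch of examining those definitions follows (placeholder URL and job name; category number 11 is taken from the example above). Omit the category ID to page through all categories, or append it to fetch a single definition:

import requests

ES = "http://localhost:9200"
JOB = "it-ops-logs"

# All category definitions, a page at a time.
categories = requests.post(
    f"{ES}/_ml/anomaly_detectors/{JOB}/results/categories",
    json={"page": {"from": 0, "size": 20}},
).json()

# A single category, such as the category 11 flagged in the example above.
category_11 = requests.get(
    f"{ES}/_ml/anomaly_detectors/{JOB}/results/categories/11"
).json()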
A category resource has the following properties:
category_id
examples
grok_pattern
job_id
max_matching_length
regex
terms
Overall buckets provide a summary of bucket results over multiple jobs.
Their bucket_span
equals the longest bucket_span
of the jobs in question.
The overall_score
is the top_n
average of the max anomaly_score
per job
within the overall bucket time interval.
This means that you can fine-tune the overall_score
so that it is more
or less sensitive to the number of jobs that detect an anomaly at the same time.
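To make the arithmetic concrete, here is a small sketch with made-up per-job maximum scores: with top_n set to 1, a single anomalous job is enough to raise the overall_score, whereas larger top_n values require several jobs to be anomalous at the same time.

# Hypothetical maximum anomaly_score per job within one overall bucket interval.
max_scores_per_job = [90.0, 40.0, 10.0]

def overall_score(scores, top_n):
    # Average of the top_n highest per-job maximum scores.
    top = sorted(scores, reverse=True)[:top_n]
    return sum(top) / top_n

overall_score(max_scores_per_job, top_n=1)  # 90.0  - any single anomalous job dominates
overall_score(max_scores_per_job, top_n=3)  # ~46.7 - several jobs must agree to score highly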
An overall bucket resource has the following properties:
timestamp
bucket_span: Matches the bucket_span of the job with the longest one.
overall_score: The top_n average of the maximum bucket anomaly_score per job.
jobs: An array of objects that contain the max_anomaly_score per job_id.
is_interim
result_type: This value is always overall_bucket.
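A request sketch for overall buckets across several jobs (placeholder URL and job names), asking only for intervals where the top two jobs average above 50:

import requests

ES = "http://localhost:9200"

overall = requests.post(
    f"{ES}/_ml/anomaly_detectors/job-1,job-2,job-3/results/overall_buckets",
    json={"top_n": 2, "overall_score": 50},
).json()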