# Anomalies

## Details covered in the analysis

The Anomalies app calculates the 'normal' values and patterns in a dataset, then identifies data points or ranges that significantly deviate from these norms. Finally, it generates a narrative that highlights and contextualizes the anomalies found.

The anomalies identified by the app fall into three categories:

Global outliers (also known as point anomalies) occur when the value of a single data point differs significantly from all other data points in the dataset. For example:

• If Sales of Karateon are significantly higher than Sales of any other Product across the dataset, Karateon is a global outlier.

• If Sales in March 2021 are significantly higher than Sales in every other month in the dataset, March 2021 is a global outlier.
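To make the idea of a global outlier concrete, here is a minimal Python sketch that flags any item whose value sits far from the rest of the series. It uses the modified z-score (median and median absolute deviation) with the commonly used 3.5 cut-off. This is an assumed, swapped-in technique for illustration only, not the Anomalies app's own algorithm. The function name `global_outliers` and the sales figures are hypothetical.

```python
from statistics import median


def global_outliers(values: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Return the keys whose value is a global (point) outlier.

    Illustrative only: uses the modified z-score (median and median absolute
    deviation), not the Anomalies app's own algorithm.
    """
    if len(values) < 7:
        # Mirrors the documented minimum of seven data points per series.
        raise ValueError("at least seven data points are required")
    center = median(values.values())
    mad = median(abs(v - center) for v in values.values())
    if mad == 0:
        # At least half the values equal the median, so any other value stands out.
        return [k for k, v in values.items() if v != center]
    return [
        k for k, v in values.items()
        if 0.6745 * abs(v - center) / mad > threshold
    ]


sales_by_product = {  # hypothetical figures
    "Karateon": 950, "Product A": 100, "Product B": 110, "Product C": 95,
    "Product D": 105, "Product E": 98, "Product F": 102,
}
print(global_outliers(sales_by_product))  # ['Karateon']
```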

Contextual anomalies (also known as seasonal or conditional anomalies) occur when a single data point or a range of data points contains values that are not necessarily outside of the normal global range but are considered abnormal in a specific context. For example:

• If Sales in the US usually increase significantly during November and December due to the holiday season but don't increase during November and December 2021, that is a contextual anomaly.
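A contextual anomaly can be sketched by comparing each point with the same month in other years rather than with the whole series. The Python below is a hypothetical illustration: the helper `contextual_anomalies`, the 25% `tolerance`, and the sales figures are assumptions, and the app's real seasonal model is not described here. Note that the flagged 2021 values (100) lie inside the overall range of the series (100 to 180), which is what separates a contextual anomaly from a global outlier.

```python
from collections import defaultdict
from statistics import mean


def contextual_anomalies(
    series: dict[tuple[int, int], float], tolerance: float = 0.25
) -> list[tuple[int, int]]:
    """Flag (year, month) points that are unusual for their month of the year.

    Hypothetical sketch: the expected value for a month is the mean of the same
    month in all other years, and a point is flagged when it differs from that
    expectation by more than `tolerance` (a fraction, e.g. 0.25 = 25%).
    """
    by_month: dict[int, dict[int, float]] = defaultdict(dict)
    for (year, month), value in series.items():
        by_month[month][year] = value

    flagged = []
    for (year, month), value in sorted(series.items()):
        others = [v for y, v in by_month[month].items() if y != year]
        if len(others) < 2:
            continue  # needs this month in at least two other years (three in total)
        expected = mean(others)
        # Skip a zero expectation to avoid dividing by zero.
        if expected and abs(value - expected) / abs(expected) > tolerance:
            flagged.append((year, month))
    return flagged


# Hypothetical US Sales: November and December jump to 180, except in 2021.
us_sales: dict[tuple[int, int], float] = {}
for year in range(2018, 2022):
    for month in range(1, 13):
        holiday = month in (11, 12) and year < 2021
        us_sales[(year, month)] = 180.0 if holiday else 100.0

print(contextual_anomalies(us_sales))  # [(2021, 11), (2021, 12)]
```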

Collective anomalies (also known as continual anomalies) occur when a collection of values deviates significantly from the usual range or pattern observed across the entire dataset. For example:

• If the range of Profit values is generally between $1M and $2M, but the range of Profit values in April, May, and June 2020 is between $0.5M and $5M, that three-month period is a collective anomaly.
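One simple way to detect a collective anomaly is to look for a run of consecutive values that all fall outside the usual range of the series. The sketch below is an assumed heuristic, not the app's algorithm: the usual range is estimated as the median plus or minus `k` times the median absolute deviation, and the function name `collective_anomalies`, the defaults `k=1.6` and `min_run=3`, and the Profit values are hypothetical.

```python
from statistics import median


def collective_anomalies(
    values: list[float], k: float = 1.6, min_run: int = 3
) -> list[tuple[int, int]]:
    """Return inclusive (start, end) index pairs of runs of unusual values.

    Hypothetical sketch: the "usual range" is median +/- k * MAD of the whole
    series, and a collective anomaly is a run of at least `min_run`
    consecutive values that fall outside that range.
    """
    if len(values) < 9:
        # Mirrors the documented minimum of nine data points per series.
        raise ValueError("at least nine data points are required")
    center = median(values)
    mad = median(abs(v - center) for v in values)
    outside = [abs(v - center) > k * mad for v in values]

    runs, start = [], None
    for i, flag in enumerate(outside + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    return runs


# Hypothetical monthly Profit for 2020, in $M (Jan..Dec).
profit = [1.2, 1.5, 1.8, 0.5, 5.0, 0.9, 1.4, 1.6, 1.1, 1.9, 1.3, 1.7]
print(collective_anomalies(profit))  # [(3, 5)], i.e. April to June
```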

### Tip

For contextual and collective anomalies only, use the Sensitivity narrative option to control how much deviation from the norm is considered significant.

The narrative has a section for each measure (e.g. Sales). Each section includes some or all of the following details, depending on your data selection and the type of anomalies that exist in your data:

• Outliers

• For non-time series data: Outliers by Dimension

When no time dimensions are selected, you get a summary of global outliers and a drilldown analysis that uses up to three non-time dimensions (e.g. Country, City, and Product).

• For time series data: Outliers Over Time

When time dimensions are selected (e.g. Month and Year), you get a summary of global outliers over time and a drilldown analysis that uses up to three non-time dimensions (e.g. Country, City, and Product).

• Anomalies Over Time (time series data only)

• Daily/Monthly/Quarterly Comparisons to Other Weeks/Years

This section provides an overview of contextual anomalies and, if non-time dimensions have been selected, a breakdown by each of those dimensions under its own heading (e.g. By Country).

• Over the Full Range of Time

This section provides an overview of collective anomalies and, if non-time dimensions have been selected, a breakdown by each of those dimensions under its own heading (e.g. By Country).

### Note

The order of drilldown is determined by the order of dimensions set in Step 1.

### Tip

The line graph and the narrative below are generated using the same sales data. At first glance, they may appear to contradict each other: the narrative states that no anomalies were found in the sales data, but the graph seems to show a significant spike.

However, this apparent mismatch occurs because the dataset contains one or more null values (empty cells) in the Date time dimension. Unlike the line chart, the Anomalies app ignores null values.
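The effect of null dates can be reproduced with a few lines of Python. In this hypothetical sketch, one row has an empty Date cell and a large Sales value: a chart that counts every row includes it, while an analysis that, like the Anomalies app, ignores null values never sees it. The row layout and the helper name `rows_for_analysis` are assumptions for illustration.

```python
from datetime import date

# Hypothetical monthly Sales rows; the last row has an empty (null) Date cell.
rows = [
    {"Date": date(2021, m, 1), "Sales": 100 + m} for m in range(1, 8)
] + [{"Date": None, "Sales": 900}]


def rows_for_analysis(rows: list[dict]) -> list[dict]:
    """Drop rows whose Date is null, mimicking how the app ignores null values."""
    return [row for row in rows if row["Date"] is not None]


analyzed = rows_for_analysis(rows)
print(len(rows), len(analyzed))               # 8 7
print(max(row["Sales"] for row in analyzed))  # 107
```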


## Data requirements

All NLG Apps have the same data requirements relating to supported formats and field names. See the NLG Apps Data requirements topic for details.

In addition, each app has its own specific data requirements. The requirements for the Anomalies app are described below.

If your data selection and configuration do not meet the specific requirements for a particular app, the app is grayed out and disabled in the wizard. In this example, the Describe a Pie Chart app has been disabled:

### Dimensions and measures

The number of dimensions and measures required depends on whether you're analyzing time series data or non-time series data.

#### Non-time series data

If your data selection does not contain a time dimension, this app can analyze:

• multiple measures and up to three non-time dimensions.

Minimum requirements: one measure and one non-time dimension.

### Important

At least seven data points are required in each series. If insufficient data is selected, an error is generated.

#### Time series data

If your data selection contains a time dimension, this app can analyze:

• multiple measures, up to three non-time dimensions, and multiple time dimensions*.

Minimum requirements: one measure and one time dimension*.

*Time dimension requirements for each analysis period:

| Analysis period | Required date fields |
| --- | --- |
| Year | Date or Year |
| Quarter | Date or Year+Quarter |
| Month | Date or Year+Month |
| Day | Date or Year+Month+Day |
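The date-field requirements listed above can be encoded as a small lookup. This Python sketch checks which analysis periods a given set of date fields supports; the names `REQUIRED_FIELDS` and `supported_periods` are hypothetical and are not part of the app.

```python
# Hypothetical encoding of the requirements: each analysis period can be
# satisfied by any one of several alternative sets of date fields.
REQUIRED_FIELDS: dict[str, list[set[str]]] = {
    "Year": [{"Date"}, {"Year"}],
    "Quarter": [{"Date"}, {"Year", "Quarter"}],
    "Month": [{"Date"}, {"Year", "Month"}],
    "Day": [{"Date"}, {"Year", "Month", "Day"}],
}


def supported_periods(fields: set[str]) -> list[str]:
    """Return the analysis periods whose required date fields are all present."""
    return [
        period
        for period, alternatives in REQUIRED_FIELDS.items()
        if any(required <= fields for required in alternatives)
    ]


print(supported_periods({"Year", "Month"}))  # ['Year', 'Month']
print(supported_periods({"Date"}))           # ['Year', 'Quarter', 'Month', 'Day']
```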

### Important

Global outliers: at least seven data points are required in each series.

Contextual anomalies: at least seven data points are required in each series. In addition, at least three weeks of data are required to report by Day, and at least three years of data are required to report by Month or Quarter.

Collective anomalies: at least nine data points are required in each series.

A maximum of 20 instances of each dimension is reported in the narrative.
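The minimum-data rules above can be collected into a single check. This is a hypothetical Python helper, not part of the app: `requirement_problems` and `MIN_POINTS` are invented names, and reading "three weeks" as a span of at least 21 days and "three years" as a span of at least 36 calendar months is an assumption, since the exact interpretation is not documented here.

```python
from datetime import date, timedelta

MIN_POINTS = {"global": 7, "contextual": 7, "collective": 9}


def requirement_problems(kind: str, dates: list[date], period: str) -> list[str]:
    """List the documented minimums that a single series fails to meet.

    Hypothetical helper. "Three weeks" is read as a span of at least 21 days
    and "three years" as a span of at least 36 calendar months (assumptions).
    """
    problems = []
    if len(dates) < MIN_POINTS[kind]:
        problems.append(f"{kind}: needs at least {MIN_POINTS[kind]} data points")
    if kind == "contextual" and dates:
        first, last = min(dates), max(dates)
        if period == "Day" and (last - first).days + 1 < 21:
            problems.append("contextual by Day: needs at least 3 weeks of data")
        months = (last.year - first.year) * 12 + last.month - first.month + 1
        if period in ("Month", "Quarter") and months < 36:
            problems.append(f"contextual by {period}: needs at least 3 years of data")
    return problems


daily = [date(2021, 3, 1) + timedelta(days=i) for i in range(10)]
monthly = [date(2020 + m // 12, m % 12 + 1, 1) for m in range(24)]
print(requirement_problems("contextual", daily, "Day"))
print(requirement_problems("contextual", monthly, "Month"))
```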

### Aggregation and entity types

Not all combinations of aggregation and entity types are supported for measures. The Anomalies app can analyze the following combinations:

### Tip

Aggregation types are set in Power BI. Note that Power BI may automatically set the aggregation type of a calculated measure to Don't summarize.

Entity types and other data attributes are set in Step 1 of the Arria for Power BI wizard.


## Configuration options

The following options are available to configure:

### Period

Select a time period to analyze. The time dimensions in your data selection must include the period you select here, either as a Date field or as a Day, Month, Quarter, or Year field.

### Sensitivity

For contextual and collective anomalies only, controls how sensitive the Anomalies algorithm is to deviations from the expected pattern or range of values in your data.

At a sensitivity of 100, even a slight deviation is considered an anomaly. At a sensitivity of 0, only major deviations are considered anomalies.

This field is available only when your data selection contains a time dimension (time series data).

Default: 75
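The documented behavior of Sensitivity (100 flags even slight deviations, 0 flags only major ones) implies that a higher sensitivity means a smaller deviation threshold. The mapping below is purely illustrative: the linear interpolation, the helper name `tolerance_for_sensitivity`, and the endpoint values are assumptions, because how the app converts the setting into a threshold is not documented here.

```python
def tolerance_for_sensitivity(
    sensitivity: float, strictest: float = 0.05, loosest: float = 1.0
) -> float:
    """Map a 0-100 sensitivity to a relative deviation threshold (hypothetical).

    Sensitivity 100 returns `strictest` (small deviations count as anomalies);
    sensitivity 0 returns `loosest` (only large deviations count).
    """
    if not 0 <= sensitivity <= 100:
        raise ValueError("sensitivity must be between 0 and 100")
    return loosest - (loosest - strictest) * sensitivity / 100


for s in (0, 75, 100):
    print(s, round(tolerance_for_sensitivity(s), 4))
```

A linear mapping is simply the easiest monotonic choice; any function that decreases as sensitivity rises would match the documented behavior equally well.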
