Anomaly Detection
Anomaly Detection
IQR
Usage
This function is used to detect anomalies based on IQR. Points distributing beyond 1.5 times IQR are selected.
Name: IQR
Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
method
: When set to "batch", anomaly test is conducted after importing all data points; when set to "stream", it is required to provide upper and lower quantiles. The default method is "batch".q1
: The lower quantile when method is set to "stream".q3
: The upper quantile when method is set to "stream".
Output Series: Output a single series. The type is DOUBLE.
Note:
Examples
Batch computing
Input series:
+-----------------------------+------------+
| Time|root.test.s1|
+-----------------------------+------------+
|1970-01-01T08:00:00.100+08:00| 0.0|
|1970-01-01T08:00:00.200+08:00| 0.0|
|1970-01-01T08:00:00.300+08:00| 1.0|
|1970-01-01T08:00:00.400+08:00| -1.0|
|1970-01-01T08:00:00.500+08:00| 0.0|
|1970-01-01T08:00:00.600+08:00| 0.0|
|1970-01-01T08:00:00.700+08:00| -2.0|
|1970-01-01T08:00:00.800+08:00| 2.0|
|1970-01-01T08:00:00.900+08:00| 0.0|
|1970-01-01T08:00:01.000+08:00| 0.0|
|1970-01-01T08:00:01.100+08:00| 1.0|
|1970-01-01T08:00:01.200+08:00| -1.0|
|1970-01-01T08:00:01.300+08:00| -1.0|
|1970-01-01T08:00:01.400+08:00| 1.0|
|1970-01-01T08:00:01.500+08:00| 0.0|
|1970-01-01T08:00:01.600+08:00| 0.0|
|1970-01-01T08:00:01.700+08:00| 10.0|
|1970-01-01T08:00:01.800+08:00| 2.0|
|1970-01-01T08:00:01.900+08:00| -2.0|
|1970-01-01T08:00:02.000+08:00| 0.0|
+-----------------------------+------------+
SQL for query:
select iqr(s1) from root.test
Output series:
+-----------------------------+-----------------+
| Time|iqr(root.test.s1)|
+-----------------------------+-----------------+
|1970-01-01T08:00:01.700+08:00| 10.0|
+-----------------------------+-----------------+
KSigma
Usage
This function is used to detect anomalies based on the Dynamic K-Sigma Algorithm.
Within a sliding window, the input value with a deviation of more than k times the standard deviation from the average will be output as anomaly.
Name: KSIGMA
Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
k
: How many times to multiply on standard deviation to define anomaly, the default value is 3.window
: The window size of Dynamic K-Sigma Algorithm, the default value is 10000.
Output Series: Output a single series. The type is same as input series.
Note: Only when is larger than 0, the anomaly detection will be performed. Otherwise, nothing will be output.
Examples
Assigning k
Input series:
+-----------------------------+---------------+
| Time|root.test.d1.s1|
+-----------------------------+---------------+
|2020-01-01T00:00:02.000+08:00| 0.0|
|2020-01-01T00:00:03.000+08:00| 50.0|
|2020-01-01T00:00:04.000+08:00| 100.0|
|2020-01-01T00:00:06.000+08:00| 150.0|
|2020-01-01T00:00:08.000+08:00| 200.0|
|2020-01-01T00:00:10.000+08:00| 200.0|
|2020-01-01T00:00:14.000+08:00| 200.0|
|2020-01-01T00:00:15.000+08:00| 200.0|
|2020-01-01T00:00:16.000+08:00| 200.0|
|2020-01-01T00:00:18.000+08:00| 200.0|
|2020-01-01T00:00:20.000+08:00| 150.0|
|2020-01-01T00:00:22.000+08:00| 100.0|
|2020-01-01T00:00:26.000+08:00| 50.0|
|2020-01-01T00:00:28.000+08:00| 0.0|
|2020-01-01T00:00:30.000+08:00| NaN|
+-----------------------------+---------------+
SQL for query:
select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30
Output series:
+-----------------------------+---------------------------------+
|Time |ksigma(root.test.d1.s1,"k"="3.0")|
+-----------------------------+---------------------------------+
|2020-01-01T00:00:02.000+08:00| 0.0|
|2020-01-01T00:00:03.000+08:00| 50.0|
|2020-01-01T00:00:26.000+08:00| 50.0|
|2020-01-01T00:00:28.000+08:00| 0.0|
+-----------------------------+---------------------------------+
LOF
Usage
This function is used to detect density anomaly of time series. According to k-th distance calculation parameter and local outlier factor (lof) threshold, the function judges if a set of input values is an density anomaly, and a bool mark of anomaly values will be output.
Name: LOF
Input Series: Multiple input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
method
:assign a detection method. The default value is "default", when input data has multiple dimensions. The alternative is "series", when a input series will be transformed to high dimension.k
:use the k-th distance to calculate lof. Default value is 3.window
: size of window to split origin data points. Default value is 10000.windowsize
:dimension that will be transformed into when method is "series". The default value is 5.
Output Series: Output a single series. The type is DOUBLE.
Note: Incomplete rows will be ignored. They are neither calculated nor marked as anomaly.
Examples
Using default parameters
Input series:
+-----------------------------+---------------+---------------+
| Time|root.test.d1.s1|root.test.d1.s2|
+-----------------------------+---------------+---------------+
|1970-01-01T08:00:00.100+08:00| 0.0| 0.0|
|1970-01-01T08:00:00.200+08:00| 0.0| 1.0|
|1970-01-01T08:00:00.300+08:00| 1.0| 1.0|
|1970-01-01T08:00:00.400+08:00| 1.0| 0.0|
|1970-01-01T08:00:00.500+08:00| 0.0| -1.0|
|1970-01-01T08:00:00.600+08:00| -1.0| -1.0|
|1970-01-01T08:00:00.700+08:00| -1.0| 0.0|
|1970-01-01T08:00:00.800+08:00| 2.0| 2.0|
|1970-01-01T08:00:00.900+08:00| 0.0| null|
+-----------------------------+---------------+---------------+
SQL for query:
select lof(s1,s2) from root.test.d1 where time<1000
Output series:
+-----------------------------+-------------------------------------+
| Time|lof(root.test.d1.s1, root.test.d1.s2)|
+-----------------------------+-------------------------------------+
|1970-01-01T08:00:00.100+08:00| 3.8274824267668244|
|1970-01-01T08:00:00.200+08:00| 3.0117631741126156|
|1970-01-01T08:00:00.300+08:00| 2.838155437762879|
|1970-01-01T08:00:00.400+08:00| 3.0117631741126156|
|1970-01-01T08:00:00.500+08:00| 2.73518261244453|
|1970-01-01T08:00:00.600+08:00| 2.371440975708148|
|1970-01-01T08:00:00.700+08:00| 2.73518261244453|
|1970-01-01T08:00:00.800+08:00| 1.7561416374270742|
+-----------------------------+-------------------------------------+
Diagnosing 1d timeseries
Input series:
+-----------------------------+---------------+
| Time|root.test.d1.s1|
+-----------------------------+---------------+
|1970-01-01T08:00:00.100+08:00| 1.0|
|1970-01-01T08:00:00.200+08:00| 2.0|
|1970-01-01T08:00:00.300+08:00| 3.0|
|1970-01-01T08:00:00.400+08:00| 4.0|
|1970-01-01T08:00:00.500+08:00| 5.0|
|1970-01-01T08:00:00.600+08:00| 6.0|
|1970-01-01T08:00:00.700+08:00| 7.0|
|1970-01-01T08:00:00.800+08:00| 8.0|
|1970-01-01T08:00:00.900+08:00| 9.0|
|1970-01-01T08:00:01.000+08:00| 10.0|
|1970-01-01T08:00:01.100+08:00| 11.0|
|1970-01-01T08:00:01.200+08:00| 12.0|
|1970-01-01T08:00:01.300+08:00| 13.0|
|1970-01-01T08:00:01.400+08:00| 14.0|
|1970-01-01T08:00:01.500+08:00| 15.0|
|1970-01-01T08:00:01.600+08:00| 16.0|
|1970-01-01T08:00:01.700+08:00| 17.0|
|1970-01-01T08:00:01.800+08:00| 18.0|
|1970-01-01T08:00:01.900+08:00| 19.0|
|1970-01-01T08:00:02.000+08:00| 20.0|
+-----------------------------+---------------+
SQL for query:
select lof(s1, "method"="series") from root.test.d1 where time<1000
Output series:
+-----------------------------+--------------------+
| Time|lof(root.test.d1.s1)|
+-----------------------------+--------------------+
|1970-01-01T08:00:00.100+08:00| 3.77777777777778|
|1970-01-01T08:00:00.200+08:00| 4.32727272727273|
|1970-01-01T08:00:00.300+08:00| 4.85714285714286|
|1970-01-01T08:00:00.400+08:00| 5.40909090909091|
|1970-01-01T08:00:00.500+08:00| 5.94999999999999|
|1970-01-01T08:00:00.600+08:00| 6.43243243243243|
|1970-01-01T08:00:00.700+08:00| 6.79999999999999|
|1970-01-01T08:00:00.800+08:00| 7.0|
|1970-01-01T08:00:00.900+08:00| 7.0|
|1970-01-01T08:00:01.000+08:00| 6.79999999999999|
|1970-01-01T08:00:01.100+08:00| 6.43243243243243|
|1970-01-01T08:00:01.200+08:00| 5.94999999999999|
|1970-01-01T08:00:01.300+08:00| 5.40909090909091|
|1970-01-01T08:00:01.400+08:00| 4.85714285714286|
|1970-01-01T08:00:01.500+08:00| 4.32727272727273|
|1970-01-01T08:00:01.600+08:00| 3.77777777777778|
+-----------------------------+--------------------+
MissDetect
Usage
This function is used to detect missing anomalies.
In some datasets, missing values are filled by linear interpolation.
Thus, there are several long perfect linear segments.
By discovering these perfect linear segments,
missing anomalies are detected.
Name: MISSDETECT
Input Series: Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.
Parameter:
error
: The minimum length of the detected missing anomalies, which is an integer greater than or equal to 10. By default, it is 10.
Output Series: Output a single series. The type is BOOLEAN. Each data point which is miss anomaly will be labeled as true.
Examples
Input series:
+-----------------------------+---------------+
| Time|root.test.d2.s2|
+-----------------------------+---------------+
|2021-07-01T12:00:00.000+08:00| 0.0|
|2021-07-01T12:00:01.000+08:00| 1.0|
|2021-07-01T12:00:02.000+08:00| 0.0|
|2021-07-01T12:00:03.000+08:00| 1.0|
|2021-07-01T12:00:04.000+08:00| 0.0|
|2021-07-01T12:00:05.000+08:00| 0.0|
|2021-07-01T12:00:06.000+08:00| 0.0|
|2021-07-01T12:00:07.000+08:00| 0.0|
|2021-07-01T12:00:08.000+08:00| 0.0|
|2021-07-01T12:00:09.000+08:00| 0.0|
|2021-07-01T12:00:10.000+08:00| 0.0|
|2021-07-01T12:00:11.000+08:00| 0.0|
|2021-07-01T12:00:12.000+08:00| 0.0|
|2021-07-01T12:00:13.000+08:00| 0.0|
|2021-07-01T12:00:14.000+08:00| 0.0|
|2021-07-01T12:00:15.000+08:00| 0.0|
|2021-07-01T12:00:16.000+08:00| 1.0|
|2021-07-01T12:00:17.000+08:00| 0.0|
|2021-07-01T12:00:18.000+08:00| 1.0|
|2021-07-01T12:00:19.000+08:00| 0.0|
|2021-07-01T12:00:20.000+08:00| 1.0|
+-----------------------------+---------------+
SQL for query:
select missdetect(s2,'minlen'='10') from root.test.d2
Output series:
+-----------------------------+------------------------------------------+
| Time|missdetect(root.test.d2.s2, "minlen"="10")|
+-----------------------------+------------------------------------------+
|2021-07-01T12:00:00.000+08:00| false|
|2021-07-01T12:00:01.000+08:00| false|
|2021-07-01T12:00:02.000+08:00| false|
|2021-07-01T12:00:03.000+08:00| false|
|2021-07-01T12:00:04.000+08:00| true|
|2021-07-01T12:00:05.000+08:00| true|
|2021-07-01T12:00:06.000+08:00| true|
|2021-07-01T12:00:07.000+08:00| true|
|2021-07-01T12:00:08.000+08:00| true|
|2021-07-01T12:00:09.000+08:00| true|
|2021-07-01T12:00:10.000+08:00| true|
|2021-07-01T12:00:11.000+08:00| true|
|2021-07-01T12:00:12.000+08:00| true|
|2021-07-01T12:00:13.000+08:00| true|
|2021-07-01T12:00:14.000+08:00| true|
|2021-07-01T12:00:15.000+08:00| true|
|2021-07-01T12:00:16.000+08:00| false|
|2021-07-01T12:00:17.000+08:00| false|
|2021-07-01T12:00:18.000+08:00| false|
|2021-07-01T12:00:19.000+08:00| false|
|2021-07-01T12:00:20.000+08:00| false|
+-----------------------------+------------------------------------------+
Range
Usage
This function is used to detect range anomaly of time series. According to upper bound and lower bound parameters, the function judges if a input value is beyond range, aka range anomaly, and a new time series of anomaly will be output.
Name: RANGE
Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
lower_bound
:lower bound of range anomaly detection.upper_bound
:upper bound of range anomaly detection.
Output Series: Output a single series. The type is the same as the input.
Note: Only when upper_bound
is larger than lower_bound
, the anomaly detection will be performed. Otherwise, nothing will be output.
Examples
Assigning Lower and Upper Bound
Input series:
+-----------------------------+---------------+
| Time|root.test.d1.s1|
+-----------------------------+---------------+
|2020-01-01T00:00:02.000+08:00| 100.0|
|2020-01-01T00:00:03.000+08:00| 101.0|
|2020-01-01T00:00:04.000+08:00| 102.0|
|2020-01-01T00:00:06.000+08:00| 104.0|
|2020-01-01T00:00:08.000+08:00| 126.0|
|2020-01-01T00:00:10.000+08:00| 108.0|
|2020-01-01T00:00:14.000+08:00| 112.0|
|2020-01-01T00:00:15.000+08:00| 113.0|
|2020-01-01T00:00:16.000+08:00| 114.0|
|2020-01-01T00:00:18.000+08:00| 116.0|
|2020-01-01T00:00:20.000+08:00| 118.0|
|2020-01-01T00:00:22.000+08:00| 120.0|
|2020-01-01T00:00:26.000+08:00| 124.0|
|2020-01-01T00:00:28.000+08:00| 126.0|
|2020-01-01T00:00:30.000+08:00| NaN|
+-----------------------------+---------------+
SQL for query:
select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30
Output series:
+-----------------------------+------------------------------------------------------------------+
|Time |range(root.test.d1.s1,"lower_bound"="101.0","upper_bound"="125.0")|
+-----------------------------+------------------------------------------------------------------+
|2020-01-01T00:00:02.000+08:00| 100.0|
|2020-01-01T00:00:28.000+08:00| 126.0|
+-----------------------------+------------------------------------------------------------------+
TwoSidedFilter
Usage
The function is used to filter anomalies of a numeric time series based on two-sided window detection.
Name: TWOSIDEDFILTER
Input Series: Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE
Output Series: Output a single series. The type is the same as the input. It is the input without anomalies.
Parameter:
len
: The size of the window, which is a positive integer. By default, it's 5. Whenlen
=3, the algorithm detects forward window and backward window with length 3 and calculates the outlierness of the current point.threshold
: The threshold of outlierness, which is a floating number in (0,1). By default, it's 0.3. The strict standard of detecting anomalies is in proportion to the threshold.
Examples
Input series:
+-----------------------------+------------+
| Time|root.test.s0|
+-----------------------------+------------+
|1970-01-01T08:00:00.000+08:00| 2002.0|
|1970-01-01T08:00:01.000+08:00| 1946.0|
|1970-01-01T08:00:02.000+08:00| 1958.0|
|1970-01-01T08:00:03.000+08:00| 2012.0|
|1970-01-01T08:00:04.000+08:00| 2051.0|
|1970-01-01T08:00:05.000+08:00| 1898.0|
|1970-01-01T08:00:06.000+08:00| 2014.0|
|1970-01-01T08:00:07.000+08:00| 2052.0|
|1970-01-01T08:00:08.000+08:00| 1935.0|
|1970-01-01T08:00:09.000+08:00| 1901.0|
|1970-01-01T08:00:10.000+08:00| 1972.0|
|1970-01-01T08:00:11.000+08:00| 1969.0|
|1970-01-01T08:00:12.000+08:00| 1984.0|
|1970-01-01T08:00:13.000+08:00| 2018.0|
|1970-01-01T08:00:37.000+08:00| 1484.0|
|1970-01-01T08:00:38.000+08:00| 1055.0|
|1970-01-01T08:00:39.000+08:00| 1050.0|
|1970-01-01T08:01:05.000+08:00| 1023.0|
|1970-01-01T08:01:06.000+08:00| 1056.0|
|1970-01-01T08:01:07.000+08:00| 978.0|
|1970-01-01T08:01:08.000+08:00| 1050.0|
|1970-01-01T08:01:09.000+08:00| 1123.0|
|1970-01-01T08:01:10.000+08:00| 1150.0|
|1970-01-01T08:01:11.000+08:00| 1034.0|
|1970-01-01T08:01:12.000+08:00| 950.0|
|1970-01-01T08:01:13.000+08:00| 1059.0|
+-----------------------------+------------+
SQL for query:
select TwoSidedFilter(s0, 'len'='5', 'threshold'='0.3') from root.test
Output series:
+-----------------------------+------------+
| Time|root.test.s0|
+-----------------------------+------------+
|1970-01-01T08:00:00.000+08:00| 2002.0|
|1970-01-01T08:00:01.000+08:00| 1946.0|
|1970-01-01T08:00:02.000+08:00| 1958.0|
|1970-01-01T08:00:03.000+08:00| 2012.0|
|1970-01-01T08:00:04.000+08:00| 2051.0|
|1970-01-01T08:00:05.000+08:00| 1898.0|
|1970-01-01T08:00:06.000+08:00| 2014.0|
|1970-01-01T08:00:07.000+08:00| 2052.0|
|1970-01-01T08:00:08.000+08:00| 1935.0|
|1970-01-01T08:00:09.000+08:00| 1901.0|
|1970-01-01T08:00:10.000+08:00| 1972.0|
|1970-01-01T08:00:11.000+08:00| 1969.0|
|1970-01-01T08:00:12.000+08:00| 1984.0|
|1970-01-01T08:00:13.000+08:00| 2018.0|
|1970-01-01T08:01:05.000+08:00| 1023.0|
|1970-01-01T08:01:06.000+08:00| 1056.0|
|1970-01-01T08:01:07.000+08:00| 978.0|
|1970-01-01T08:01:08.000+08:00| 1050.0|
|1970-01-01T08:01:09.000+08:00| 1123.0|
|1970-01-01T08:01:10.000+08:00| 1150.0|
|1970-01-01T08:01:11.000+08:00| 1034.0|
|1970-01-01T08:01:12.000+08:00| 950.0|
|1970-01-01T08:01:13.000+08:00| 1059.0|
+-----------------------------+------------+