CN113159163A - Lightweight unsupervised anomaly detection method based on multivariate time series data analysis


Info

Publication number: CN113159163A
Application number: CN202110418526.1A
Authority: CN (China)
Prior art keywords: abnormal, data, score, model, detection
Prior art date: 2021-04-19
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 刘振涛, 樊谨, 汪森, 陈金华, 冯龙超, 匡振中
Current assignee: Hangzhou Dianzi University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Hangzhou Dianzi University
Priority date: 2021-04-19 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2021-04-19
Publication date: 2021-07-23
Application filed by Hangzhou Dianzi University
Priority to CN202110418526.1A
Publication of CN113159163A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention discloses a lightweight unsupervised anomaly detection method based on multivariate time series data analysis. The invention comprises two models: a detection model and an inference model. The detection model extracts temporal-dependency features from the captured multivariate time series data through a stochastic convolutional neural network and then encodes and decodes the feature-extracted multivariate time series data through a deep Bayesian network; the detection model can determine the achievable range of detection accuracy. The inference model is composed of a score attention unit, an automatic threshold selection unit and a point adjustment unit: the score attention unit uses an attention mechanism to enlarge the feature difference between abnormal and normal data and provides a theoretical basis for anomaly interpretation, the automatic threshold selection unit computes a threshold automatically, and the point adjustment unit simulates the process by which real anomalies arise. The inference model improves the accuracy, stability and interpretability of anomaly detection. The invention can cope with rapidly growing data scale and complex, changeable anomaly types.

Description

Lightweight unsupervised anomaly detection method based on multivariate time series data analysis
Technical Field
The invention belongs to the field of machine learning anomaly detection, and particularly relates to a lightweight unsupervised anomaly detection method based on multivariate time series data analysis.
Background
Anomalies are individual observations or subsets of observations that differ significantly from the main body of observations and deviate from the original generating pattern. Anomaly detection is the process of finding these individuals or subsets. Multivariate time series data is a set of numerical sequences with a temporal precedence relationship, in which each element is a multidimensional vector. Multivariate time series data can describe the state of the observed subject and imply the underlying pattern of change, so the analysis of multivariate time series data is very important in the field of anomaly detection.
Traditional supervised learning methods require labeled data for model training and can only identify known anomaly types, so their applicability is very limited. Unsupervised anomaly detection for multivariate time series data can be roughly divided into three categories:
and (5) reducing the dimensionality. When multiple sensors monitor a single system, there is often a relationship between the values generated by each sensor, and dimensionality reduction is intended to identify and abstract the key relationships between these attributes. A common method of dealing with multivariate situations is Principal Component Analysis (PCA), which reduces the overall size of the system under study. Projection pursuit offers another way to reduce the dimensionality of multivariate systems, but like PCA, it can cost a significant amount of computation. The Auto Encoder (Auto Encoder) reduces the length of the hidden layer and can produce the same effect as the PCA method, but the computation cost is much lower. Convolutional variational autocoder (CNN-VAE) greatly reduces the size, complexity and training cost of the autocoder, making it more suitable for industrial internet of things, but it is deficient in timing dependency capture. The LSTM is used for replacing a feedforward network in the VAE, the LSTM and the VAE are combined together, and the problem of lack of timing dependence capture can be solved to a certain extent.
Clustering. In previous research, an ensemble method that performs anomaly detection with isolation trees (called an isolation forest) has been proposed; compared with one-class SVMs, random forests and the like, it performs well. Multiple kernel anomaly detection (MKAD) uses kernel functions to learn similarity measures between variables in a data stream and uses a one-class SVM to perform the classification task.
Other methods. A series of methods that capture temporal dependence in multivariate time series using RNNs have been proposed. Neural networks based on LSTM and GRU are clearly superior to clustering methods on a variety of anomaly detection tasks. A CNN combined with a trainable wavelet transform layer is able to identify gradual drift of the input data distribution as concepts change over time.
OmniAnomaly, a stochastic recurrent neural network for multivariate time series anomaly detection, can learn multivariate time series with stochastic variable connection and planar normalizing flow, but the model is too large and the training cost is too high. A GAN-based autoencoder architecture can learn how to amplify the reconstruction error of inputs containing anomalies; compared with OmniAnomaly it greatly improves training time, but problems inherent to GANs, such as mode collapse, difficult convergence and difficult training, greatly reduce its practicality and stability, so it is not easy to use in practice. Compared with these methods, the lightweight unsupervised anomaly detection method based on multivariate time series data analysis can explicitly model the temporal dependence and randomness of the time series, and its inference model further improves the accuracy of the model and enables anomaly interpretation. In addition, the model size is greatly reduced, and stability and practicality are clearly improved.
Disclosure of Invention
The technical problem the invention aims to solve is to reduce the scale of model training as much as possible and to improve the stability and usability of the anomaly detection method while maintaining high anomaly detection accuracy, so that the method can be deployed and used on devices with limited resources and computing power.
In anomaly detection based on supervised learning, collecting labeled data is difficult and often infeasible, newly occurring anomaly types cannot be recognized, and practicality and stability are therefore greatly reduced. Traditional RNN-based anomaly detection methods suffer from large model size, high training difficulty and long training time, and are difficult to apply to devices with limited resources and computing power.
The technical solution adopted by the invention to solve these problems is as follows: a stochastic convolutional neural network replaces the traditional RNN-based neural network, and the processed data is then reconstructed using the variational-encoder technique, realizing unsupervised learning; in addition, the scheme designs a general inference model for multivariate time series anomaly detection, which adaptively selects a corresponding threshold from the generated anomaly scores for anomaly judgment. The method is realized through the following steps:
Step 1: Preprocess and partition the data: the data are processed into segments matching the set window size so as to meet the input requirements of the detection model; the processed data are divided into training data and test data.
Step 2: Train the model. Train the whole model on the training data obtained in Step 1 to obtain the required parameters W*-s, b*-s, φ-s and θ-s.
Step 3: Start the anomaly detection process and, using a group of window test data obtained in Step 1, perform N detections with the stochastic convolutional neural network in the detection model: dilated convolution, weight normalization, rectification units, residual connection processing and temporal-dependency capture.
Step 4: Reconstruct the data processed in Step 3 using the deep Bayesian network parameters trained in Step 2 to obtain the multivariate anomaly score.
Step 5: Using the multivariate anomaly scores obtained in Step 4, the score attention unit in the inference model computes the degree of abnormality of each observation instance, yielding the anomaly score of each observation instance.
Step 6: The automatic threshold selection unit constructs a threshold from the set of anomaly scores obtained in Step 5 using the generalized Pareto distribution with parameters.
Step 7: The point adjustment unit performs anomaly judgment based on the anomaly scores obtained in Step 5 and the threshold obtained in Step 6, using the characteristics of real anomalies.
Step 8: Repeat Steps 3 to 7 until all the test data have been detected.
The invention reduces the model size and training cost by using a stochastic convolutional neural network, realizes unsupervised anomaly detection using the variational-encoder technique, and improves the usability and stability of the model with the score attention unit, the automatic threshold selection unit and the point adjustment unit.
The beneficial effects of the invention are as follows: when detecting anomalies in multivariate time series data, supervised learning methods are hampered by the difficulty of obtaining labeled data and the variability of anomaly types, so detection accuracy is low in practical applications and anomalies are difficult to identify; traditional RNN-based anomaly detection methods are large in scale and narrow in application range. The invention restructures the anomaly detection method for multivariate time series data; while ensuring detection accuracy it overcomes the shortcomings of supervised learning and greatly reduces the model size, thereby effectively alleviating the difficulty of applying current anomaly detection methods to resource-limited devices and greatly improving the practicality of anomaly detection.
Drawings
Fig. 1 is an overall system architecture of the present invention.
Fig. 2 is a structural diagram of a detection model.
Fig. 3 is an implementation of a stochastic convolutional neural network.
FIG. 4 is a block diagram of an inference model.
FIG. 5 is a graph comparing training time and model size with the OmniAnomaly method on three public data sets.
Fig. 6 is a graph comparing the F1 metric of four unsupervised anomaly detection techniques on three public data sets.
Detailed Description
The invention is further described with reference to the accompanying drawings and specific implementation steps:
a lightweight unsupervised anomaly detection method based on multivariate time series data analysis comprises the following steps:
step 1: data are preprocessed and divided, and the data are processed into sizes corresponding to the sizes of windows according to set window parameters so as to adapt to the requirements of a detection model on input data; the processed data is divided into training data and test data.
As shown in fig. 1, the overall structure of the present invention is shown. The data processing and dividing part is arranged at the entrance of the structure of the invention and is responsible for carrying out primary processing on the original data to form a data structure required by the detection model. Notably, we account for the window size as T +1, test data x at time TtDepends on xt-T:tSequences, rather than individual test values. Wherein xt-T:tRepresenting the sequence of test values from time T-T to time T.
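For illustration only, the following is a minimal NumPy sketch of such a windowing and split step, assuming a window size of T+1 = 31, an 8-dimensional series and a 70/30 train/test split (these values, and the function name make_windows, are assumptions of the example rather than values given in the patent):

```python
import numpy as np

def make_windows(series: np.ndarray, window: int) -> np.ndarray:
    """Cut a (time, features) series into overlapping windows of length `window` (= T+1)."""
    n = series.shape[0] - window + 1
    return np.stack([series[i:i + window] for i in range(n)])

# Example: 1000 time steps, 8 sensor channels, window size T+1 = 31
raw = np.random.randn(1000, 8).astype(np.float32)
windows = make_windows(raw, window=31)          # shape: (970, 31, 8)

# Split the windowed data into training and test sets (70/30 split assumed here)
split = int(0.7 * len(windows))
train_windows, test_windows = windows[:split], windows[split:]
```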
Step 2: and (5) training a model. Training the whole model according to the training data obtained in the step 1 to obtain the required parameter W*-s,b*S, φ -s and θ -s.
Model training is an important part influencing an anomaly detection result, parameters are adjusted in an ELBO optimizing mode in detection model training, and a loss function of a single observation example can be described as follows:
Figure BDA0003026943390000041
the detection model comprises an encoder qφ(z | x) and decoder pθ(x | z) module, its parameter W*-s,b*S, φ -s and θ -s are adjusted synchronously by optimizing ELBO.
Figure BDA0003026943390000051
Represents xtAt the point of satisfying qφ(zt∣ht) Expectation under distributed conditions. Setting the window size to be T +1 according to the training data obtained in the step 1, and simulating and calculating the expectation by using a Monte Carlo methodSetting the sampling length to be L, wherein the first sample is recorded as
Figure BDA0003026943390000052
ht=TCN(xt-T:t),
Figure BDA0003026943390000053
TCN is a paradigm of a stochastic convolutional neural network, represented as
Figure BDA0003026943390000054
xtIs subject to a priori probability
Figure BDA0003026943390000055
First term of formula (1)
Figure BDA0003026943390000056
Is a standard multivariate Gaussian normal distribution, the second term represents the loss of Kullback-Leibler (KL), and the third term
Figure BDA0003026943390000057
Is a non-negative reconstruction error. The total loss under the entire window is recorded as:
Figure BDA0003026943390000058
where T +1 represents the window size,
Figure BDA0003026943390000059
representing the loss of a single instance of observation,
Figure BDA00030269433900000510
representing the total loss of observed data under the entire window.
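To make the structure of this objective concrete, the following is a hedged PyTorch sketch of a Monte Carlo ELBO of the general form described above (standard Gaussian prior, approximate posterior q_φ(z_t|h_t), Gaussian reconstruction term averaged over L samples); tensor shapes and the helper name elbo_loss are assumptions of the example, not the patent's own formulation:

```python
import torch
import torch.distributions as D

def elbo_loss(x_t, mu_z, sigma_z, mu_x, sigma_x, z_samples):
    """Monte Carlo ELBO for one observation x_t.

    x_t:          (batch, M) observation
    mu_z/sigma_z: (batch, latent) parameters of q_phi(z_t | h_t)
    mu_x/sigma_x: (batch, L, M) decoder parameters for each of the L samples
    z_samples:    (batch, L, latent) samples drawn from q_phi(z_t | h_t)
    """
    prior = D.Normal(torch.zeros_like(z_samples), torch.ones_like(z_samples))
    posterior = D.Normal(mu_z.unsqueeze(1), sigma_z.unsqueeze(1))
    likelihood = D.Normal(mu_x, sigma_x)

    log_p_z = prior.log_prob(z_samples).sum(-1)              # log p_theta(z_t)
    log_q_z = posterior.log_prob(z_samples).sum(-1)          # log q_phi(z_t | h_t)
    log_p_x = likelihood.log_prob(x_t.unsqueeze(1)).sum(-1)  # log p_theta(x_t | z_t)

    elbo = (log_p_z - log_q_z + log_p_x).mean(dim=1)         # average over the L samples
    return -elbo.mean()                                      # negative ELBO as the loss

# Example shapes: batch of 4 observations, M = 8, latent = 5, L = 10 samples
loss = elbo_loss(torch.randn(4, 8), torch.randn(4, 5), torch.rand(4, 5) + 0.1,
                 torch.randn(4, 10, 8), torch.rand(4, 10, 8) + 0.1, torch.randn(4, 10, 5))
```

Training would then minimize the sum of this loss over all observations in a window, in line with formula (2).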
Step 3: Detect each group of test data, N groups in total, using the stochastic convolutional neural network in the detection model; detection includes dilated convolution, weight normalization, rectification units, residual connections, and temporal-dependency capture.
The model structure for capturing temporal dependence is shown in Fig. 2, which represents the implementation of the stochastic convolutional neural network in our detection model. The dilated convolution operation is defined as

F(s) = (x ∗_d f)(s) = Σ_{i=0}^{k−1} f(i) · x_{s−d·i}

where the input sequence x ∈ R^n, the filter f: {0, ..., k−1} → R, d is the dilation factor, k is the convolution kernel size, and s − d·i is the index of the history value. Dilation means that a fixed step is introduced between two adjacent filter taps; when d = 1, the dilated convolution reduces to a regular convolution. Using a larger dilation factor effectively enlarges the receptive field of the convolutional network.
In addition, residual connections effectively prevent the vanishing-gradient phenomenon, so that each layer learns the change relative to its input more accurately instead of learning the entire transformation mapping. The residual connection is defined as

o = Activation(x + F(x))

where F(x) is the output of the convolutional transformations applied to x. The whole temporal-dependency capture process over the test data can therefore be summarized as

h = TCN(x_{t−T:t})

where h is the generated hidden-layer variable.
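As a sketch of one such temporal block (dilated causal convolution, weight normalization, ReLU rectification and a residual connection), the following PyTorch module is illustrative only; the channel sizes, kernel size and class name TemporalBlock are assumptions of the example:

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

class TemporalBlock(nn.Module):
    """One TCN block: dilated causal conv -> ReLU -> dilated causal conv -> ReLU, plus a residual connection."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        pad = (kernel_size - 1) * dilation              # left padding keeps the convolution causal
        self.pad = nn.ConstantPad1d((pad, 0), 0.0)
        self.conv1 = weight_norm(nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation))
        self.conv2 = weight_norm(nn.Conv1d(out_ch, out_ch, kernel_size, dilation=dilation))
        self.relu = nn.ReLU()
        self.downsample = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):                               # x: (batch, channels, time)
        y = self.relu(self.conv1(self.pad(x)))
        y = self.relu(self.conv2(self.pad(y)))
        return self.relu(y + self.downsample(x))        # residual connection: o = Activation(x + F(x))

# Example: window of 31 time steps with 8 input channels -> 16 hidden channels, dilation 2
h = TemporalBlock(8, 16, kernel_size=3, dilation=2)(torch.randn(4, 8, 31))   # (4, 16, 31)
```

Stacking several such blocks with increasing dilation factors is the usual way to cover a long window with few layers.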
Step 4: Using the detection-model parameters W*-s, b*-s, φ-s and θ-s, encode and decode the test data processed by the stochastic convolutional neural network to obtain the multivariate anomaly score of each group.
Each group of test data comprises a plurality of test instances; the dimensions of the multivariate anomaly score correspond to the dimensions of each test instance, and the value of each dimension represents the degree of abnormality of the corresponding dimension of the observation instance.
This step is the core of the detection model; a practical framework for the detection model is designed using the ideas of the VAE and the stochastic convolutional neural network, as shown in Fig. 3.
The left part of Fig. 3 shows the details of the encoder module, which comprises the stochastic convolutional neural network layer, the VAE layer and the Planar NF layer. In this module, the hidden-layer sequence h_{t−T:t} generated in Step 3 is input into the VAE layer to generate the mean vector μ_z and the variance vector σ_z, and the latent-space sequence z_0 is obtained using the resampling technique. Finally, the Planar NF layer converts z_0 into the latent variable z_k. The process is summarized as:

h_{t−T:t} = TCN(x_{t−T:t})
μ_z = Linear(f_φ(h_t))
σ_z = SoftPlus(Linear(f_φ(h_t))) + ε
z_0 ~ N(μ_z, σ_z²)
z_k = z_{k−1} + u_z · tanh(W_z z_{k−1} + b_z),  k = 1, ..., K
z = z_K    (6)

The first expression in formula (6) represents the temporal-dependency capture process of the stochastic convolutional neural network in Step 3; the second and third expressions represent the inference probability-generation process in the VAE, where f_φ(h) denotes the hidden layer using the ReLU activation function, the mean vector μ_z comes from the linear layer, and the variance vector σ_z is generated by the Soft-Plus activation function and a small perturbation ε. The fourth, fifth and sixth expressions show the generation of the latent-space variable z, where z = z_K and z_0 ~ N(μ_z, σ_z²) denotes a Gaussian distribution with mean μ_z and variance σ_z. u_z, W_z, b_z and φ represent the model parameters trained in Step 2. Through the encoder module, the latent-space sequence z_{t−T:t} is finally obtained.
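A hedged sketch of this encoder path (hidden state to μ_z and σ_z, resampling of z_0, then K planar normalizing-flow steps) is given below; the layer sizes, the number of flow steps and the class names Encoder and PlanarFlow are assumptions of the example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlanarFlow(nn.Module):
    """One planar normalizing-flow step: z_k = z_{k-1} + u * tanh(w^T z_{k-1} + b)."""
    def __init__(self, dim: int):
        super().__init__()
        self.u = nn.Parameter(torch.randn(dim) * 0.1)
        self.w = nn.Parameter(torch.randn(dim) * 0.1)
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, z):                                  # z: (batch, dim)
        return z + self.u * torch.tanh(z @ self.w + self.b).unsqueeze(-1)

class Encoder(nn.Module):
    def __init__(self, hidden_dim: int, latent_dim: int, n_flows: int = 3):
        super().__init__()
        self.f_phi = nn.Linear(hidden_dim, hidden_dim)     # f_phi(h) followed by ReLU
        self.mu = nn.Linear(hidden_dim, latent_dim)        # mean vector mu_z from a linear layer
        self.sigma = nn.Linear(hidden_dim, latent_dim)     # pre-activation for sigma_z
        self.flows = nn.ModuleList([PlanarFlow(latent_dim) for _ in range(n_flows)])

    def forward(self, h_t, eps: float = 1e-4):
        f = F.relu(self.f_phi(h_t))
        mu_z = self.mu(f)
        sigma_z = F.softplus(self.sigma(f)) + eps          # Soft-Plus plus a small perturbation
        z = mu_z + sigma_z * torch.randn_like(sigma_z)     # resampling: z_0 ~ N(mu_z, sigma_z^2)
        for flow in self.flows:                            # Planar NF layer: z_0 -> z_K
            z = flow(z)
        return z, mu_z, sigma_z

z_k, mu_z, sigma_z = Encoder(hidden_dim=16, latent_dim=8)(torch.randn(4, 16))
```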
The right half of Fig. 3 shows the details of the decoder module, which comprises the stochastic convolutional neural network layer and the VAE layer. The process is the same as in the encoder module, with minor changes, and is formulated as:

h'_{t−T:t} = TCN(z_{t−T:t})
μ_x = Linear(f_θ(h'_t))
σ_x = SoftPlus(Linear(f_θ(h'_t))) + ε
x̂_t ~ N(μ_x, σ_x²)    (7)

The first expression of formula (7) represents the process in which the stochastic convolutional neural network module generates the hidden-layer sequence h'_{t−T:t}. The second and third expressions of formula (7) are similar to the second and third expressions of formula (6); the only difference is that the reconstructed sequence x̂_{t−T:t} comes from the probability distribution N(μ_x, σ_x²) rather than from the Planar NF layer. Likewise, the corresponding parameters and θ represent the model parameters trained in Step 2.
The detection model uses the reconstruction probability as the anomaly score of each dimension of the test instance, defined as

s_t = log p_θ(x_t | z_{t−T:t})

where s_t represents the anomaly score of x_t, and its dimensionality is the same as that of x_t.
Step 5: Using the multivariate anomaly scores obtained in Step 4, the score attention unit in the inference model computes the degree of abnormality of each observation instance, yielding the anomaly score of each observation instance.
The score attention unit can effectively handle anomalies that are close to normal instances and plays a great role in improving the accuracy of the whole model. For an M-dimensional anomaly score, the unit can be described as

AS_t = Σ_{i=1}^{M} a_i · s_i

where AS_t represents the anomaly score of the observation instance, s_i is the anomaly score of the i-th dimension of s_t, and a_i is the weight of the i-th dimension of s_t. The anomaly score is an index that measures the degree of abnormality of an observation instance: the larger the anomaly score, the less likely the observation instance is to be abnormal; conversely, the smaller the score, the more likely it is to be abnormal.
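One possible reading of this unit is an attention-weighted sum of the per-dimension scores in which the weights a_i come from a softmax; the softmax choice below is an assumption of the sketch, not something stated in the patent text:

```python
import torch
import torch.nn.functional as F

def score_attention(s_t: torch.Tensor) -> torch.Tensor:
    """Collapse an M-dimensional anomaly score s_t into one instance-level score AS_t = sum_i a_i * s_i.

    s_t: (batch, M) per-dimension anomaly scores. The weights a_i are assumed here to be a
    softmax over the negated scores, so dimensions with low reconstruction probability get more weight.
    """
    a = F.softmax(-s_t, dim=-1)           # weights a_i, one per dimension
    return (a * s_t).sum(dim=-1)          # AS_t, one score per observation instance

AS_t = score_attention(torch.randn(4, 8))   # shape: (4,)
```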
Step 6: the auto-select threshold unit may use the generalized Pareto distribution with parameters to construct a threshold using the set of anomaly scores obtained in step 5.
And 7: the point adjusting means determines an abnormality based on the abnormality score obtained in step 5 and the threshold value obtained in step 6, using the characteristics of an actual abnormality.
The abnormality judgment is the last step of the abnormality detection method, and the judgment process is as follows: and comparing the abnormal score of each observation example with a threshold, if the abnormal score is higher than the threshold, judging the observation example to be abnormal, and if not, judging the observation example to be normal.
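The point-adjust mechanism is commonly realized as follows: if any point inside a contiguous anomalous segment is flagged, the whole segment is treated as detected, which reflects how real anomalies are handled in practice. A NumPy sketch under that assumption (its use of ground-truth segment labels is an assumption of this illustration, not a requirement stated in the patent):

```python
import numpy as np

def point_adjust(pred: np.ndarray, label: np.ndarray) -> np.ndarray:
    """If any point of an anomalous segment is predicted anomalous, mark the entire segment as detected."""
    adjusted = pred.copy()
    i = 0
    while i < len(label):
        if label[i] == 1:                        # start of an anomalous segment
            j = i
            while j < len(label) and label[j] == 1:
                j += 1
            if adjusted[i:j].any():              # at least one hit inside the segment
                adjusted[i:j] = 1
            i = j
        else:
            i += 1
    return adjusted

pred  = np.array([0, 0, 1, 0, 0, 0, 1, 0])
label = np.array([0, 1, 1, 1, 0, 0, 0, 0])
print(point_adjust(pred, label))                 # -> [0 1 1 1 0 0 1 0]
```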
And 8: and (5) repeating the steps 3 to 7 until all the test data are detected.
FIG. 5 shows a comparison of training time and model size for different anomaly detection techniques under the same experimental environment on three public data sets. Thanks to the lightweight stochastic convolutional neural network used in LUAD, LUAD reduces training time by a factor of 1.5 and model size by a factor of 24 compared with the current state-of-the-art method OmniAnomaly.
Fig. 6 compares the performance, measured by the F1 metric, of five unsupervised anomaly detection techniques on the three public data sets. The single-model approaches, AE and LSTM-VAE, perform much worse than the other three methods. OmniAnomaly uses a traditional RNN-based anomaly detection method; its large model performs well on small data sets but its performance degrades on large data sets. USAD is an unsupervised anomaly detection model based on GANs; it performs better at detecting "lightweight" anomalies, but it requires a large amount of training data and its practicality is relatively poor. LUAD overcomes the training-time and model-size drawbacks of traditional RNN-based methods and also performs excellently on data sets of various sizes.

Claims (4)

1. A lightweight unsupervised anomaly detection method based on multivariate time series data analysis, characterized in that the method is realized by two models: a detection model and an inference model; the detection model extracts temporal-dependency features from the captured multivariate time series data through a stochastic convolutional neural network and then encodes and decodes the feature-extracted multivariate time series data through a deep Bayesian network, and the detection model can determine the detection accuracy range; the inference model is composed of a score attention unit, an automatic threshold selection unit and a point adjustment unit, wherein the score attention unit uses an attention mechanism to enlarge the feature difference between abnormal and normal data and provides a theoretical basis for anomaly interpretation, the automatic threshold selection unit automatically computes a threshold using a peaks-over-threshold method, the point adjustment unit simulates the generation process of real anomalies using a point-adjust mechanism, and the inference model improves the accuracy, stability and interpretability of anomaly detection.
2. The lightweight unsupervised anomaly detection method based on multivariate time series data analysis as claimed in claim 1, characterized in that the method comprises the following steps:
step 1: preprocess and partition the data: the data are processed into segments matching the set window size so as to meet the input requirements of the detection model; the processed data are divided into training data and test data;
step 2: train the model; train the detection model on the training data obtained in step 1 to obtain the required model parameters W*-s, b*-s, φ-s and θ-s;
model training is an important part in determining the anomaly detection result; during detection-model training the parameters are adjusted by optimizing the ELBO, and the loss function of a single observation instance can be described as

ℒ(x_t) = E_{q_φ(z_t|h_t)}[ log p_θ(z_t) − log q_φ(z_t|h_t) + log p_θ(x_t|z_t) ]    (1)

the detection model comprises an encoder module q_φ(z|x) and a decoder module p_θ(x|z), whose parameters W*-s, b*-s, φ-s and θ-s are adjusted synchronously by optimizing the ELBO; E_{q_φ(z_t|h_t)}[·] denotes the expectation for x_t under the distribution q_φ(z_t|h_t); with the window size set to T+1 according to the training data obtained in step 1, the expectation is approximated by Monte Carlo simulation with sampling length L, the l-th sample being denoted z_t^(l), 1 ≤ l ≤ L; h_t = TCN(x_{t−T:t}), where TCN denotes the stochastic convolutional neural network and z_t^(l) ~ q_φ(z_t|h_t); x_t is subject to the generative probability p_θ(x_t|z_t); in formula (1), p_θ(z_t) is a standard multivariate Gaussian distribution, the first two terms together form the Kullback-Leibler (KL) loss, and the last term yields the non-negative reconstruction error; the total loss over the entire window is recorded as

ℒ(x_{t−T:t}) = Σ_{i=t−T}^{t} ℒ(x_i)    (2)

where T+1 represents the window size, ℒ(x_i) represents the loss of a single observation instance, and ℒ(x_{t−T:t}) represents the total loss of the observed data over the entire window;
step 3: detect each group of test data, N groups in total, using the stochastic convolutional neural network in the detection model; detection comprises dilated convolution, weight normalization, rectification units, residual connections and temporal-dependency capture;
step 4: using the detection-model parameters W*-s, b*-s, φ-s and θ-s, encode and decode the test data processed by the stochastic convolutional neural network to obtain the multivariate anomaly score of each group;
each group of test data comprises a plurality of test instances; the dimensions of each anomaly score correspond to the dimensions of each test instance, and the value of each dimension represents the degree of abnormality of the corresponding dimension of the test instance;
step 5: take the multivariate anomaly score as the input of the score attention unit in the inference model, and compute the degree of abnormality of each test instance through the score attention unit, thereby obtaining the anomaly score of each test instance;
the score attention unit can effectively handle anomalies that are close to normal instances and plays a great role in improving the precision of the whole model; for an M-dimensional anomaly score, the score attention unit describes the anomaly score as

AS_t = Σ_{i=1}^{M} a_i · s_i

where AS_t represents the anomaly score of the current t-th test instance s_t, a_i is the weight of the i-th dimension of the current test instance s_t, and s_i is the anomaly score of the i-th dimension of the current test instance s_t; the anomaly score is an index that measures the degree of abnormality of an observation instance: the larger the anomaly score, the smaller the possibility that the observation instance is abnormal; conversely, the higher the possibility that it is abnormal;
step 6: the automatic threshold selection unit constructs a threshold from each group of anomaly scores obtained in step 5 using the generalized Pareto distribution with parameters;
step 7: the point adjustment unit performs anomaly judgment on each group of anomaly scores obtained in step 5 and the corresponding threshold obtained in step 6, using the characteristics of real anomalies;
anomaly judgment is the last step of the anomaly detection method, and the judgment process is as follows: compare the anomaly score of each observation instance with the threshold; if the anomaly score is higher than the threshold, the observation instance is judged to be abnormal, otherwise it is judged to be normal;
step 8: sort the anomaly scores of each dimension of the observation instances judged to be abnormal; the larger the anomaly score, the greater its influence on the observation instance.
3. The lightweight unsupervised anomaly detection method based on multivariate time series data analysis as claimed in claim 2, characterized in that the encoder module comprises a stochastic convolutional neural network layer, a VAE layer and a Planar NF layer; the hidden-layer sequence h_{t−T:t} generated in step 3 is input into the VAE layer to generate the mean vector μ_z and the variance vector σ_z, and the latent-space sequence z_0 is obtained using the resampling technique; finally, the Planar NF layer converts z_0 into the latent variable z_k; the process is summarized as:

h_{t−T:t} = TCN(x_{t−T:t})
μ_z = Linear(f_φ(h_t))
σ_z = SoftPlus(Linear(f_φ(h_t))) + ε
z_0 ~ N(μ_z, σ_z²)
z_k = z_{k−1} + u_z · tanh(W_z z_{k−1} + b_z),  k = 1, ..., K
z = z_K    (6)

the first expression in formula (6) represents the temporal-dependency capture process of the stochastic convolutional neural network in step 3; the second and third expressions represent the inference probability-generation process in the VAE, where f_φ(h) denotes the hidden layer using the ReLU activation function, the mean vector μ_z comes from the linear layer, and the variance vector σ_z is generated by the Soft-Plus activation function and a small perturbation ε; the fourth, fifth and sixth expressions show the generation of the latent-space variable z, where z = z_K and z_0 ~ N(μ_z, σ_z²) denotes a Gaussian distribution with mean μ_z and variance σ_z; u_z, W_z, b_z and φ represent the model parameters trained in step 2; through the encoder module, the latent-space sequence z_{t−T:t} is finally obtained.
4. The lightweight unsupervised anomaly detection method based on multivariate time series data analysis as claimed in claim 3, characterized in that the decoder module comprises a stochastic convolutional neural network layer and a VAE layer; the process is similar to that of the encoder module and is formulated as:

h'_{t−T:t} = TCN(z_{t−T:t})
μ_x = Linear(f_θ(h'_t))
σ_x = SoftPlus(Linear(f_θ(h'_t))) + ε
x̂_t ~ N(μ_x, σ_x²)    (7)

wherein the first expression of formula (7) represents the process in which the stochastic convolutional neural network module generates the hidden-layer sequence h'_{t−T:t}; the second and third expressions of formula (7) are similar to the second and third expressions of formula (6), the only difference being that the reconstructed sequence x̂_{t−T:t} comes from the probability distribution N(μ_x, σ_x²) rather than from the Planar NF layer; likewise, the corresponding parameters and θ represent the model parameters trained in step 2.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110418526.1A CN113159163A (en) 2021-04-19 2021-04-19 Lightweight unsupervised anomaly detection method based on multivariate time series data analysis

Publications (1)

Publication Number Publication Date
CN113159163A 2021-07-23

Family

ID=76868803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110418526.1A Pending CN113159163A (en) 2021-04-19 2021-04-19 Lightweight unsupervised anomaly detection method based on multivariate time series data analysis

Country Status (1)

Country Link
CN (1) CN113159163A (en)


Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113568774B (en) * 2021-07-27 2024-01-16 东华大学 Multi-dimensional time sequence data real-time abnormality detection method using unsupervised deep neural network
CN113568774A (en) * 2021-07-27 2021-10-29 东华大学 Real-time anomaly detection method for multi-dimensional time sequence data by using unsupervised deep neural network
CN113722176A (en) * 2021-07-30 2021-11-30 银清科技有限公司 Self-adaptive abnormal performance index determining method and device
CN113794689A (en) * 2021-08-20 2021-12-14 浙江网安信创电子技术有限公司 Malicious domain name detection method based on TCN
CN113780238A (en) * 2021-09-27 2021-12-10 京东科技信息技术有限公司 Multi-index time sequence signal abnormity detection method and device and electronic equipment
CN113780238B (en) * 2021-09-27 2024-04-05 京东科技信息技术有限公司 Abnormality detection method and device for multi-index time sequence signal and electronic equipment
CN114217136A (en) * 2022-02-22 2022-03-22 山东卓朗检测股份有限公司 Lightning protection grounding resistance detection statistical method based on big data
CN114217136B (en) * 2022-02-22 2022-05-06 山东卓朗检测股份有限公司 Lightning protection grounding resistance detection statistical method based on big data
CN114692767A (en) * 2022-03-31 2022-07-01 中国电信股份有限公司 Abnormality detection method and apparatus, computer-readable storage medium, and electronic device
CN114692767B (en) * 2022-03-31 2024-01-19 中国电信股份有限公司 Abnormality detection method and apparatus, computer-readable storage medium, and electronic device
CN116232772B (en) * 2023-05-08 2023-07-07 中国人民解放军国防科技大学 Unsupervised network data intrusion detection method based on ensemble learning
CN116232772A (en) * 2023-05-08 2023-06-06 中国人民解放军国防科技大学 Unsupervised network data intrusion detection method based on ensemble learning
CN117131452A (en) * 2023-08-29 2023-11-28 国网山东省电力公司信息通信公司 Abnormality detection method and system based on normalized flow and Bayesian network
CN117893528A (en) * 2024-03-13 2024-04-16 云南迪安医学检验所有限公司 Method and device for constructing cardiovascular and cerebrovascular disease classification model
CN117893528B (en) * 2024-03-13 2024-05-17 云南迪安医学检验所有限公司 Method and device for constructing cardiovascular and cerebrovascular disease classification model

Similar Documents

Publication Publication Date Title
CN113159163A (en) Lightweight unsupervised anomaly detection method based on multivariate time series data analysis
Shao et al. A deep learning approach for fault diagnosis of induction motors in manufacturing
Wang et al. A novel weighted sparse representation classification strategy based on dictionary learning for rotating machinery
Ko et al. Fault classification in high-dimensional complex processes using semi-supervised deep convolutional generative models
Harrou et al. Detecting abnormal ozone measurements with a deep learning-based strategy
US8630962B2 (en) Error detection method and its system for early detection of errors in a planar or facilities
US20230094389A1 (en) Quantum computing based deep learning for detection, diagnosis and other applications
Paynabar et al. Monitoring and diagnosis of multichannel nonlinear profile variations using uncorrelated multilinear principal component analysis
CN109902399A (en) Rolling bearing fault recognition methods under a kind of variable working condition based on ATT-CNN
Zhang et al. Fault detection and recognition of multivariate process based on feature learning of one-dimensional convolutional neural network and stacked denoised autoencoder
Yang et al. Combined wireless network intrusion detection model based on deep learning
CN114492826A (en) Unsupervised anomaly detection analysis solution method based on multivariate time sequence flow data
Zhang et al. Recognition of mixture control chart pattern using multiclass support vector machine and genetic algorithm based on statistical and shape features
CN110991471B (en) Fault diagnosis method for high-speed train traction system
CN115290326A (en) Rolling bearing fault intelligent diagnosis method
CN113177577A (en) Bearing fault diagnosis method based on improved convolutional neural network
CN116796272A (en) Method for detecting multivariate time sequence abnormality based on transducer
Li et al. A framework for diagnosing the out-of-control signals in multivariate process using optimized support vector machines
Che et al. Deep meta-learning and variational autoencoder for coupling fault diagnosis of rolling bearing under variable working conditions
CN114861778A (en) Method for rapidly classifying rolling bearing states under different loads by improving width transfer learning
Wu et al. Real-time monitoring and diagnosis scheme for IoT-enabled devices using multivariate SPC techniques
Yang et al. Label propagation algorithm based on non-negative sparse representation
Wu et al. A multi-sensor fusion-based prognostic model for systems with partially observable failure modes
CN111898565B (en) Forest smoke and fire real-time monitoring system and method based on robust multi-view
Howard et al. Identifying nonlinear variation patterns with deep autoencoders

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination