CN116663613A - Multi-element time sequence anomaly detection method for intelligent Internet of things system - Google Patents

Multi-element time sequence anomaly detection method for intelligent Internet of things system

Info

Publication number
CN116663613A
CN116663613A (application CN202310339800.5A)
Authority
CN
China
Prior art keywords
time
sequence
encoder
potential
time sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310339800.5A
Other languages
Chinese (zh)
Inventor
张啸
许书晴
徐伟涛
韩旭
于东晓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Inspur Cloud Information Technology Co Ltd
Original Assignee
Shandong University
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University, Inspur Cloud Information Technology Co Ltd filed Critical Shandong University
Priority to CN202310339800.5A priority Critical patent/CN116663613A/en
Publication of CN116663613A publication Critical patent/CN116663613A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y40/00 IoT characterised by the purpose of the information processing
    • G16Y40/10 Detection; Monitoring
    • G16Y40/40 Maintenance of things

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application belongs to the field of the Internet of Things and specifically relates to a multivariate time series anomaly detection method for intelligent Internet of Things systems, comprising the following steps: S1, a spatial encoder models the potential graph relationships among sequences using a graph learning strategy; S2, a temporal encoder captures a latent time series representation using a multi-head self-attention mechanism and a GAT network; S3, a reconstruction decoder reconstructs the input sequence from the latent time series representation using a multi-head self-attention network, and a prediction decoder performs multi-step prediction of the future time series using a multi-head attention network; S4, the parameters of the hierarchical variational autoencoder formed by the spatial encoder, temporal encoder, reconstruction decoder, and prediction decoder are updated backward with a stochastic gradient descent algorithm, yielding a model that has learned the normal pattern of the time series; S5, anomaly detection is performed with an anomaly scoring mechanism that combines history and future. The method has the advantage of addressing the two major problems of temporal randomness and spatial randomness in multivariate time series anomaly detection.

Description

Multi-element time sequence anomaly detection method for intelligent Internet of things system
Technical Field
The application belongs to the field of the Internet of Things and specifically relates to a multivariate time series anomaly detection method for intelligent Internet of Things systems.
Background
Multivariate time series anomaly detection aims to monitor systems in various real-world scenarios, find anomalies that deviate from the normal pattern in time, and give early warnings, helping administrators make correct decisions and perform system maintenance. In recent years, as sensing devices have been widely deployed in intelligent IoT infrastructure, smart grids, intelligent transportation, smart industry, and other systems, multivariate time series anomaly detection has become a research hotspot in academia and industry. Anomalies are generally considered unexpected values that deviate significantly from the normal pattern, so existing approaches focus on learning the normal pattern of the time series and judging data that deviate from it as outliers. Common deep learning anomaly detection methods are mainly prediction-based or reconstruction-based. Prediction-based methods often model the temporal dependence and sequence-to-sequence relationships of time series with long short-term memory networks (LSTM), Transformers, graph neural networks (GNN), and the like, and treat points whose prediction error exceeds a threshold as anomalies. Reconstruction-based methods often use autoencoders (AE), variational autoencoders (VAE), generative adversarial networks (GAN), and the like to learn a robust latent representation, and judge points whose reconstruction error exceeds a threshold as anomalies. In practice, learning a robust representation of the normal pattern from the multivariate time series is crucial for anomaly detection, while the robustness of anomaly detection is strongly limited by the temporal and spatial randomness of the multivariate time series. Specifically, temporal randomness means that the time-dependent patterns of a sequence are highly variable: normal temporal fluctuations are common in a multivariate time series, but these normal fluctuations are easily misjudged as anomalies. Various methods based on variational autoencoders, such as OmniAnomaly, have therefore been proposed to model temporal randomness in a multivariate time series by introducing latent variables to learn a robust representation. In addition, some work considers the potential spatial relationships in multivariate time series, significantly improving anomaly detection performance, such as GRELEN. Most existing approaches, such as GDN, learn the graph structure by measuring the similarity between the embeddings of individual sequences and selecting the top-K most similar sequences as neighbors. Clearly, the top-K strategy cannot model the randomness of the graph relationships, i.e., the spatial randomness, well, which leads to loss of graph-relationship information and impairs anomaly detection capability.
Disclosure of Invention
To address these problems, a highly robust multivariate time series anomaly detection method for intelligent Internet of Things systems is provided, which resolves the two problems of temporal randomness and spatial randomness in multivariate time series anomaly detection through an end-to-end hierarchical spatio-temporal variational autoencoder framework. The technical scheme is as follows.
A multi-element time sequence anomaly detection method for an intelligent Internet of things system comprises the following steps:
S1, a spatial encoder models the potential graph relationships among sequences using a graph learning strategy;
S2, a temporal encoder captures a latent time series representation using a multi-head self-attention mechanism and a graph attention network;
S3, a reconstruction decoder reconstructs the input sequence from the latent time series representation using a multi-head self-attention network, and a prediction decoder performs multi-step prediction of the future time series using a multi-head attention network based on the latent time series representation;
S4, the parameters of the hierarchical variational autoencoder formed by the spatial encoder, the temporal encoder, the reconstruction decoder, and the prediction decoder are updated backward with a stochastic gradient descent algorithm, yielding a model that has learned the normal pattern of the time series;
S5, anomaly detection is performed with an anomaly scoring mechanism that combines history and future.
Preferably, in step S1, the spatial encoder models the potential graph relationships among the multivariate time series using a graph learning strategy based on a self-attention mechanism; the specific steps are as follows:
S11, a linear network maps any two raw time series $x_i$, $x_j$ into a high-dimensional space, extracting high-dimensional temporal features $g_i$, $g_j$, i.e. $g_i = x_i W_g$, $g_j = x_j W_g$, where $W_g$ is a learnable parameter matrix;
S12, the extracted temporal features $g_i$, $g_j$ are mapped to vectors $q_i$, $k_i$ and $q_j$, $k_j$ in different subspaces, i.e. $q_i = g_i W_q$, $k_i = g_i W_k$, $q_j = g_j W_q$, $k_j = g_j W_k$, where $W_q$ and $W_k$ are learnable parameter matrices;
S13, assuming the latent graph-relationship random variable obeys a Bernoulli distribution, the variational distribution is $q_{\phi_e}(e_{i,j} \mid x_i, x_j) = \mathrm{Ber}(\pi_{i,j})$; the vectors $q_i$ and $k_j$ are concatenated and fed through a multi-layer perceptron (MLP) network to obtain a probability value between 0 and 1, taken as the probability parameter $\pi_{i,j}$ of the Bernoulli distribution, i.e. $\pi_{i,j} = \mathrm{MLP}(W_s[q_i, k_j])$, where $W_s$ is a learnable parameter matrix;
S14, resampling with the Gumbel-Softmax distribution yields a continuous approximation of the Bernoulli random variable and thus the latent graph-relationship variable $e_{i,j}$, i.e. $e_{i,j} = \frac{\exp\big((\log \pi_{i,j}^{1} + g_{1})/\tau\big)}{\exp\big((\log \pi_{i,j}^{1} + g_{1})/\tau\big) + \exp\big((\log \pi_{i,j}^{0} + g_{0})/\tau\big)}$, where $\pi_{i,j}^{1}$ and $\pi_{i,j}^{0}$ are the probabilities that a correlation does and does not exist between time series i and j, respectively, $\tau$ is a temperature coefficient controlling the smoothness of the Gumbel-Softmax distribution, and $g_{1}$, $g_{0}$ are independent, identically distributed samples drawn from a standard Gumbel distribution;
S15, S11-S14 are executed in parallel for any two of the N input time series to obtain the adjacency matrix E.
Preferably, in step S2, the temporal encoder combines the multi-head self-attention mechanism with the GAT network to capture the latent time series representation and model the randomness of time dependence; the specific steps are as follows:
S21, for each time series, an embedding layer module containing position encoding converts the input value $x_i^t$ at timestamp t of the i-th time series into a d-dimensional input embedding vector $h_i^t$ as the input of the multi-head self-attention module, i.e. $h_i^t = \mathrm{emb}(x_i^t) + \mathrm{pos}(t)$, where emb is a linear layer and pos denotes the position encoding;
S22, time dependence is modeled for each time series i by a multi-head self-attention network, which fuses the time series data within a time window and takes a weighted average according to the importance of the data in the window, obtaining the temporal feature representation $\tilde{h}_i$, i.e. $\tilde{h}_i = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{o}$, where h is the number of heads of the multi-head attention network and $W^{o}$ is a learnable parameter matrix; for the m-th head, $\mathrm{head}_m = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\big(QK^{\top}/\sqrt{d}\big)V$, where Q, K, V denote the query, the key, and the value of the data itself, obtained from $h_i$ by linear mappings, specifically $Q = h_i W_i^{Q}$, $K = h_i W_i^{K}$, $V = h_i W_i^{V}$, with $W_i^{Q} \in \mathbb{R}^{d \times d}$, $W_i^{K} \in \mathbb{R}^{d \times d}$, $W_i^{V} \in \mathbb{R}^{d \times d}$ being different linear mapping matrices;
S23, the temporal feature representations of all sequences are fed into the GAT network, which aggregates neighbor information according to the latent graph-relationship adjacency matrix learned by the spatial encoder, obtaining the updated temporal feature representation $\hat{h}_i$ of each sequence, i.e. $\hat{h}_i = \mathrm{ReLU}\big(\sum_{j \in N(i)} \alpha_{j,i} W \tilde{h}_j\big)$, where ReLU is the activation function, $N(i) = \{j \mid e_{j,i} = 1\}$, W is a trainable parameter matrix, and $\alpha_{j,i}$ denotes the attention coefficient, $\alpha_{j,i} = \frac{\exp\big(\mathrm{LeakyReLU}(a^{\top}[W\tilde{h}_j \| W\tilde{h}_i])\big)}{\sum_{k \in N(i)} \exp\big(\mathrm{LeakyReLU}(a^{\top}[W\tilde{h}_k \| W\tilde{h}_i])\big)}$, where LeakyReLU is the activation function and a is the learnable coefficient of the attention mechanism;
S24, assuming the latent time series random variable obeys a Gaussian distribution, the variational distribution is $q_{\phi_z}(z_i^t \mid x_i, e) = N(\mu_i^t, \sigma_i^t)$; $\hat{h}_i^t$ is passed through linear networks to obtain the mean and variance of the Gaussian distribution, $\mu_i^t$ and $\sigma_i^t$, i.e. $\mu_i^t = \mathrm{Linear}(\hat{h}_i^t)$, $\sigma_i^t = \mathrm{Linear}(\hat{h}_i^t)$;
S25, the latent time series representation $z_i^t$ of the i-th time series at time t is obtained with the reparameterization trick, i.e. $z_i^t = \mu_i^t + \sigma_i^t \odot \epsilon$ with $\epsilon \sim N(0, I)$, where $z_i \in \mathbb{R}^{s \times d_z}$, s is the time window length, and $d_z$ is the dimension of the latent time series representation vector at each moment of each sequence.
Preferably, in step S3, the reconstruction decoder models the distribution $p_{\theta_r}(x_i \mid z_i, e)$ and reconstructs the input sequence as follows: for each time series i, its latent time series representation $z_i$ is fed as input to a multi-head self-attention network, which learns the time dependence between different moments in the latent time series representation to produce a feature representation, from which the reconstruction output $\hat{x}_i$ is then obtained, realizing the reconstruction of the input sequence.
Preferably, in step S3, the prediction decoder models $p_{\theta_f}(y_i \mid z_i, e)$ and performs multi-step prediction as follows: for each time series i, the prediction decoder consists of a two-layer attention network. The first layer is a self-attention layer that takes a randomly generated token vector as its original input and, during prediction, autoregressively takes the predicted output of the previous moment as the input of the current moment, obtaining the first-layer features. The second layer is an encoder-decoder attention layer: the query comes from the output of the first self-attention layer, and the key and value come from the latent time series representation $z_i$. This lets every moment in the decoder attend to the information of all moments in the input sequence, realizing multi-step prediction of the future time series and producing the predicted values $y_i$.
Preferably, in step S4, the Gaussian prior distribution is set to $p(z_i) = N(0, I)$ and the Bernoulli prior distribution to $\mathrm{Ber}(b)$, where b is a hyperparameter controlling graph sparsity; the evidence lower bound (ELBO) of the variational autoencoder is used as the loss function, and the parameters $\phi_e$, $\phi_z$, $\theta_r$, $\theta_f$ of the hierarchical variational autoencoder are updated backward with a stochastic gradient descent algorithm. Specifically, the loss is
$\mathrm{Loss} = \mathbb{E}_q\big[-\log p_{\theta_r}(x \mid z, e)\big] + \mathbb{E}_q\big[-\log p_{\theta_f}(y \mid z, e)\big] + \mathrm{KL}\big(q_{\phi_z}(z \mid x, e) \,\|\, p(z)\big) + \mathrm{KL}\big(q_{\phi_e}(e \mid x) \,\|\, p(e)\big)$
where $\phi_e$ are the parameters of the spatial encoder, $\phi_z$ the parameters of the temporal encoder, $\theta_r$ the parameters of the reconstruction decoder, $\theta_f$ the parameters of the prediction decoder, and X the input data of the spatial and temporal encoders. The first term is the reconstruction loss, representing the ability of the reconstruction decoder to reconstruct the historical input; the second term is the prediction loss, representing the prediction accuracy of the prediction decoder; the last two terms use the KL divergence to keep the divergence between the variational distributions and the prior distributions as small as possible.
Preferably, in step S5:
S51, the test set data are input into the trained model to obtain the reconstructed and predicted values; for each time series i, for the historical moments $t' = t-s, \ldots, t-1$, the mean square error between the reconstructed value $\hat{x}_i^{t'}$ and the true value $x_i^{t'}$ is taken as the reconstruction error $err_r^i(t)$, and for the future moments $t' = t, \ldots, t+s$, the mean square error between the predicted value $y_i^{t'}$ and the true value $x_i^{t'}$ is taken as the prediction error $err_f^i(t)$;
S52, for each moment t, the average of the reconstruction errors of all sequences at that moment is taken as the reconstruction error $Err_r(t)$ of moment t, and the average of the prediction errors of all sequences at that moment as the prediction error $Err_f(t)$;
S53, the maximum reconstruction error within the historical time window $(t-s, \ldots, t-1)$ is selected as the historical anomaly factor $A_h(t)$ of moment t, the maximum prediction error within the future time window $(t, \ldots, t+s)$ as the future anomaly factor $A_f(t)$, and the prediction error at moment t itself as the current anomaly factor $A_c(t)$;
S54, the historical anomaly factor $A_h(t)$, the current anomaly factor $A_c(t)$, and the future anomaly factor $A_f(t)$ are weighted by coefficients to obtain the anomaly score based on reconstruction and prediction errors, $A_1(t) = A_c(t) + \gamma \cdot A_h(t) + \eta \cdot A_f(t)$, where $\gamma$ and $\eta$ are the scaling coefficients of $A_h(t)$ and $A_f(t)$, obtained by a grid search algorithm;
S55, the average neighbor count of all sequences is taken as the anomaly score based on graph sparsity, $A_2(t) = \frac{1}{N}\sum_{i=1}^{N} D_i$, where $D_i$ denotes the number of neighbors of the i-th sequence;
S56, if $A_1(t) > thr_1$ or $A_2(t) < thr_2$, moment t is considered anomalous and the anomaly label $a_t = 1$, where the thresholds $thr_1$ and $thr_2$ can be obtained by a grid search algorithm.
Compared with the prior art, the application has the following advantages:
1. A hierarchical spatio-temporal variational autoencoder, HSTVAE, is proposed that models the temporal randomness and the spatial randomness of a multivariate time series simultaneously under a unified end-to-end framework.
2. In the spatial encoder, whether a relationship exists between sequences is modeled as a binary random variable, and the graph structure can be learned automatically through the graph learning strategy. The temporal encoder then embeds the highly structured time series into latent random variables to capture complex time dependencies and neighbor information. In addition, the history-future combined anomaly scoring mechanism weights the historical anomaly factor, the future anomaly factor, and the prediction error of the current timestamp to achieve anomaly detection that is more sensitive to the current timestamp.
Drawings
FIG. 1 is a schematic flow chart of the method of the present application;
FIG. 2 is the probabilistic graphical model of the hierarchical variational autoencoder;
FIG. 3 is the overall framework diagram of the hierarchical variational autoencoder;
FIG. 4 is a flow chart of anomaly detection;
FIG. 5 compares the anomaly detection effect of the HSTVAE proposed by the present application with three prior-art methods.
Detailed Description
The application will be further described in detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
GAT: Graph Attention Network
HSTVAE: Hierarchical Spatio-Temporal Variational Autoencoder
Example 1:
This embodiment provides a highly robust multivariate time series anomaly detection method for intelligent Internet of Things systems, built on an end-to-end hierarchical spatio-temporal variational autoencoder framework; the flow is shown in FIG. 1.
First, the probabilistic graphical model of the hierarchical spatio-temporal variational autoencoder HSTVAE is shown in FIG. 2. It represents the dependencies between the latent graph-relationship variables and the latent time series variables: each node represents a random variable, shaded nodes represent observations, and arrows represent dependencies between variables. The black dashed lines represent the inference model of the variational autoencoder, and the black solid lines represent the generative model. If the latent graph-relationship variable $e_{ij} = 1$, there is an edge from node i to node j, i.e., time series i is correlated with time series j; if $e_{ij} = 0$, there is no edge from node i to node j.
The specific implementation steps are as follows:
Step S01, the three public multivariate time series anomaly detection datasets are normalized to obtain the input of the hierarchical variational autoencoder. The method is as follows.
S02, dataset selection: extensive experiments were performed on the following three real-world public multivariate time series datasets: SWaT (Secure Water Treatment testbed), WADI (Water Distribution testbed), and SMD (Server Machine Dataset from an Internet company). The time series in SWaT and WADI record indicators such as the water pressure values logged by various sensors in the intelligent water treatment platforms, while the time series in SMD record how server indicators such as CPU load, network usage, and memory usage change over time. Table 1 summarizes the statistics of these three datasets.
Table 1. Dataset attribute statistics

Attribute \ Dataset                      SWaT     WADI     SMD
Number of sequences                      50       98       28*38
Training set size (normal data only)     47519    78457    708405
Test set size (contains anomalies)       44931    17280    708420
Number of anomaly occurrences            36       15       327
Duration of each anomaly (minutes)       2~25     1.5~30   0.5~53
Anomaly rate (%)                         12.21    5.76     4.16
S03, dataset division: the training set contains only normal data, while the test set contains anomalous data. Within the normal data, the ratio of training set data to validation set data is 8:2.
S04, normalization: min-max scaling is performed on each dataset, scaling the values to between 0 and 1 with the formula $x = \frac{x' - \min(x')}{\max(x') - \min(x')}$, where $x'$ is the raw, unnormalized dataset data and the scaled result forms the input X of the spatial and temporal encoders. A code sketch of this step follows.
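As a minimal illustration of step S04, the following Python sketch applies column-wise min-max scaling; the function name and the small epsilon guard are illustrative additions, not part of the patent.

```python
import numpy as np

def min_max_scale(x_raw: np.ndarray) -> np.ndarray:
    """Column-wise min-max scaling of a raw dataset to [0, 1] (step S04)."""
    x_min = x_raw.min(axis=0)
    x_max = x_raw.max(axis=0)
    # the epsilon (an illustrative addition) guards against constant columns
    return (x_raw - x_min) / (x_max - x_min + 1e-8)
```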
Steps S1-S5 describe the training process: the hierarchical spatio-temporal variational autoencoder learns the latent graph relationships and the latent time series representation through the spatial encoder and the temporal encoder in turn; then, based on the latent time series representation, the reconstruction decoder reconstructs the historical sequence and the prediction decoder predicts the future sequence; finally, the ELBO of the hierarchical spatio-temporal variational autoencoder is used as the loss to update the model parameters backward.
As shown in FIGS. 1-4, the multivariate time series anomaly detection method for intelligent Internet of Things systems comprises the following steps:
In step S1, the spatial encoder models the potential graph relationships among the multivariate time series using a graph learning strategy based on a self-attention mechanism; the specific steps are as follows (a code sketch is given after step S15):
S11, a linear network maps any two raw time series $x_i$, $x_j$ into a high-dimensional space, extracting high-dimensional temporal features $g_i$, $g_j$, i.e. $g_i = x_i W_g$, $g_j = x_j W_g$, where $W_g$ is a learnable parameter matrix;
S12, the extracted temporal features $g_i$, $g_j$ are mapped to vectors $q_i$, $k_i$ and $q_j$, $k_j$ in different subspaces, i.e. $q_i = g_i W_q$, $k_i = g_i W_k$, $q_j = g_j W_q$, $k_j = g_j W_k$, where $W_q$ and $W_k$ are learnable parameter matrices;
S13, assuming the latent graph-relationship random variable obeys a Bernoulli distribution, the variational distribution is $q_{\phi_e}(e_{i,j} \mid x_i, x_j) = \mathrm{Ber}(\pi_{i,j})$; the vectors $q_i$ and $k_j$ are concatenated and fed through a multi-layer perceptron (MLP) network to obtain a probability value between 0 and 1, taken as the probability parameter $\pi_{i,j}$ of the Bernoulli distribution, i.e. $\pi_{i,j} = \mathrm{MLP}(W_s[q_i, k_j])$, where $W_s$ is a learnable parameter matrix;
S14, resampling with the Gumbel-Softmax distribution yields a continuous approximation of the Bernoulli random variable and thus the latent graph-relationship variable $e_{i,j}$, i.e. $e_{i,j} = \frac{\exp\big((\log \pi_{i,j}^{1} + g_{1})/\tau\big)}{\exp\big((\log \pi_{i,j}^{1} + g_{1})/\tau\big) + \exp\big((\log \pi_{i,j}^{0} + g_{0})/\tau\big)}$, where $\pi_{i,j}^{1}$ and $\pi_{i,j}^{0}$ are the probabilities that a correlation does and does not exist between sequences i and j, respectively, $\tau$ is a temperature coefficient controlling the smoothness of the Gumbel-Softmax distribution, and $g_{1}$, $g_{0}$ are independent, identically distributed samples drawn from a standard Gumbel distribution; specifically, $g = -\log(-\log(u))$, where $u \sim \mathrm{Uniform}(0, 1)$;
S15, S11-S14 are executed in parallel for any two of the N input sequences to obtain the adjacency matrix E.
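To make steps S11-S14 concrete, the following PyTorch sketch samples relaxed Bernoulli edge variables for all sequence pairs with the Gumbel-Softmax trick. The class and dimension names (GraphLearner, window, d) and the MLP layout are illustrative assumptions, not the patent's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphLearner(nn.Module):
    """Illustrative sketch of the self-attention graph learning strategy (S11-S14)."""
    def __init__(self, window: int, d: int, tau: float = 0.5):
        super().__init__()
        self.W_g = nn.Linear(window, d)            # S11: map raw series to high-dim features
        self.W_q = nn.Linear(d, d, bias=False)     # S12: query subspace
        self.W_k = nn.Linear(d, d, bias=False)     # S12: key subspace
        self.mlp = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(),
                                 nn.Linear(d, 1), nn.Sigmoid())
        self.tau = tau                             # temperature coefficient

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (N, window), one row per series
        g = self.W_g(x)                            # high-dimensional temporal features
        q, k = self.W_q(g), self.W_k(g)
        # S13: Bernoulli probability pi_{i,j} from the concatenation [q_i, k_j]
        pairs = torch.cat([q.unsqueeze(1).expand(-1, x.size(0), -1),
                           k.unsqueeze(0).expand(x.size(0), -1, -1)], dim=-1)
        pi = self.mlp(pairs).squeeze(-1)           # (N, N), probability an edge exists
        # S14: Gumbel-Softmax relaxation of the Bernoulli variable
        logits = torch.stack([torch.log(pi + 1e-8), torch.log(1 - pi + 1e-8)], dim=-1)
        gumbel = -torch.log(-torch.log(torch.rand_like(logits)))  # g = -log(-log(u))
        e = F.softmax((logits + gumbel) / self.tau, dim=-1)[..., 0]
        return e                                   # relaxed adjacency matrix E (S15)
```

As the temperature tau decreases, the sampled edges approach hard 0/1 values, matching the role of the temperature coefficient described in S14.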
In step S2, the temporal encoder combines the multi-head self-attention mechanism with the GAT network to capture the latent time series representation and model the randomness of time dependence; the specific steps are as follows (a code sketch is given after step S25):
S21, for each time series, an embedding layer module converts the input value $x_i^t$ at timestamp t of the i-th sequence into a d-dimensional input embedding vector $h_i^t$ as the input of the multi-head self-attention module, i.e. $h_i^t = \mathrm{emb}(x_i^t) + \mathrm{pos}(t)$, where emb is a linear layer and pos denotes sine-cosine position encoding;
S22, time dependence is modeled for each time series i by a multi-head self-attention network, which fuses the time series data within a time window and takes a weighted average according to the importance of the data in the window, obtaining the temporal feature representation $\tilde{h}_i$, i.e. $\tilde{h}_i = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{o}$, where h is the number of heads of the multi-head attention network and $W^{o}$ is a learnable parameter matrix; for the m-th head, $\mathrm{head}_m = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\big(QK^{\top}/\sqrt{d}\big)V$, where Q, K, V denote the query, the key, and the value of the data itself, obtained from $h_i$ by linear mappings, specifically $Q = h_i W_i^{Q}$, $K = h_i W_i^{K}$, $V = h_i W_i^{V}$, with $W_i^{Q} \in \mathbb{R}^{d \times d}$, $W_i^{K} \in \mathbb{R}^{d \times d}$, $W_i^{V} \in \mathbb{R}^{d \times d}$ being different linear mapping matrices;
S23, the temporal feature representations of all sequences are fed into the GAT network, which aggregates neighbor information according to the latent graph-relationship adjacency matrix E learned by the spatial encoder, obtaining the updated temporal feature representation $\hat{h}_i$ of each sequence, i.e. $\hat{h}_i = \mathrm{ReLU}\big(\sum_{j \in N(i)} \alpha_{j,i} W \tilde{h}_j\big)$, where ReLU is the activation function, $N(i) = \{j \mid e_{j,i} = 1\}$, W is a trainable parameter matrix, and $\alpha_{j,i}$ denotes the attention coefficient, $\alpha_{j,i} = \frac{\exp\big(\mathrm{LeakyReLU}(a^{\top}[W\tilde{h}_j \| W\tilde{h}_i])\big)}{\sum_{k \in N(i)} \exp\big(\mathrm{LeakyReLU}(a^{\top}[W\tilde{h}_k \| W\tilde{h}_i])\big)}$, where LeakyReLU is the activation function and a is the learnable coefficient of the attention mechanism;
S24, assuming the latent time series random variable obeys a Gaussian distribution, the variational distribution is $q_{\phi_z}(z_i^t \mid x_i, e) = N(\mu_i^t, \sigma_i^t)$; $\hat{h}_i^t$ is passed through linear networks to obtain the mean and variance of the Gaussian distribution, $\mu_i^t$ and $\sigma_i^t$, i.e. $\mu_i^t = \mathrm{Linear}(\hat{h}_i^t)$, $\sigma_i^t = \mathrm{Linear}(\hat{h}_i^t)$;
S25, the latent time series representation $z_i^t$ of the i-th time series at time t is obtained with the reparameterization trick, i.e. $z_i^t = \mu_i^t + \sigma_i^t \odot \epsilon$ with $\epsilon \sim N(0, I)$, where $z_i \in \mathbb{R}^{s \times d_z}$, s is the time window length, $d_z$ is the dimension of the latent time series representation vector at each moment of each sequence, and $\mathbb{R}$ denotes the real numbers.
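The reparameterization in S24-S25 can be sketched as follows, assuming the updated features from the self-attention and GAT layers are already available; predicting the log-variance rather than the variance directly is a common stabilization choice assumed here, not stated in the patent.

```python
import torch
import torch.nn as nn

class LatentHead(nn.Module):
    """Sketch of S24-S25: map the GAT output to a Gaussian latent and resample."""
    def __init__(self, d: int, d_z: int):
        super().__init__()
        self.mu = nn.Linear(d, d_z)          # mean of the variational Gaussian
        self.logvar = nn.Linear(d, d_z)      # log-variance keeps sigma positive

    def forward(self, h_hat: torch.Tensor):  # h_hat: (N, s, d) updated features
        mu = self.mu(h_hat)
        sigma = torch.exp(0.5 * self.logvar(h_hat))
        eps = torch.randn_like(sigma)        # epsilon ~ N(0, I)
        z = mu + sigma * eps                 # reparameterization trick (S25)
        return z, mu, sigma                  # z: (N, s, d_z)
```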
In step S3, the reconstruction decoder models the distribution $p_{\theta_r}(x_i \mid z_i, e)$ and reconstructs the input sequence. The specific steps are as follows: for each sequence i, its latent time series representation $z_i$ is fed as input to a multi-head self-attention network, which learns the time dependence between different moments in the latent time series representation to produce a feature representation, from which the reconstructed value $\hat{x}_i$ is then obtained, realizing the reconstruction of the input sequence.
In step S3, the prediction decoder models $p_{\theta_f}(y_i \mid z_i, e)$ and performs multi-step prediction. The specific steps are as follows: for each time series i, the prediction decoder consists of a two-layer attention network. The first layer is a self-attention layer that takes a randomly generated token vector as its original input and, during prediction, autoregressively takes the predicted output of the previous moment as the input of the current moment, obtaining the first-layer features. The second layer is an encoder-decoder attention layer: the query comes from the output of the first self-attention layer, and the key and value come from the latent time series representation $z_i$. This lets every moment in the decoder attend to the information of all moments in the input sequence, realizing multi-step prediction of the future time series and producing the predicted values $y_i$. A code sketch follows.
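A rough sketch of the two-layer attention prediction decoder is given below, using PyTorch's nn.MultiheadAttention. Feeding the last decoder feature back as the next input is a simplification of the autoregressive scheme described above, so all names, widths, and that feedback detail are assumptions.

```python
import torch
import torch.nn as nn

class PredictiveDecoder(nn.Module):
    """Illustrative sketch of the two-layer attention prediction decoder (S3)."""
    def __init__(self, d_z: int, d: int, n_heads: int = 4):
        super().__init__()
        self.start = nn.Parameter(torch.randn(1, 1, d))   # randomly generated start token
        self.self_attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.z_proj = nn.Linear(d_z, d)                   # lift latent z to model width
        self.out = nn.Linear(d, 1)                        # one scalar value per step

    def forward(self, z: torch.Tensor, steps: int) -> torch.Tensor:
        mem = self.z_proj(z)                              # keys/values come from z
        dec_in = self.start.expand(z.size(0), -1, -1)
        preds = []
        for _ in range(steps):                            # autoregressive multi-step loop
            f, _ = self.self_attn(dec_in, dec_in, dec_in)   # first layer: self-attention
            c, _ = self.cross_attn(f, mem, mem)             # second layer: query from f
            y = self.out(c[:, -1:, :])                    # next-step prediction
            preds.append(y)
            # simplification: feed the last decoder feature back as the next input
            dec_in = torch.cat([dec_in, c[:, -1:, :]], dim=1)
        return torch.cat(preds, dim=1).squeeze(-1)        # (N, steps)
```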
In step S4, the parameters of the hierarchical variational autoencoder formed by the two encoders and the two decoders are updated backward with a stochastic gradient descent algorithm, and training yields a model that has learned the normal pattern of the time series. Specifically: the Gaussian prior distribution is set to $p(z_i) = N(0, I)$ and the Bernoulli prior distribution to $\mathrm{Ber}(b)$, where b is a hyperparameter controlling graph sparsity. The evidence lower bound (ELBO) of the variational autoencoder is used as the loss function, and the parameters $\phi_e$, $\phi_z$, $\theta_r$, $\theta_f$ of the hierarchical variational autoencoder are updated backward with a stochastic gradient descent algorithm. Specifically, the loss is
$\mathrm{Loss} = \mathbb{E}_q\big[-\log p_{\theta_r}(x \mid z, e)\big] + \mathbb{E}_q\big[-\log p_{\theta_f}(y \mid z, e)\big] + \mathrm{KL}\big(q_{\phi_z}(z \mid x, e) \,\|\, p(z)\big) + \mathrm{KL}\big(q_{\phi_e}(e \mid x) \,\|\, p(e)\big)$
where $\phi_e$ are the parameters of the spatial encoder, $\phi_z$ the parameters of the temporal encoder, $\theta_r$ the parameters of the reconstruction decoder, $\theta_f$ the parameters of the prediction decoder, and X the input data of the spatial and temporal encoders. The first term is the reconstruction loss, representing the ability of the reconstruction decoder to reconstruct the historical input; the second term is the prediction loss, representing the prediction accuracy of the prediction decoder. The last two terms use the KL divergence to keep the divergence between the variational distributions and the prior distributions as small as possible. A sketch of this loss follows.
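Under the priors above, both KL terms have closed forms, so the loss in step S4 can be sketched as follows. Using the mean square error in place of the Gaussian log-likelihood terms and weighting the four terms equally are assumptions made for illustration.

```python
import torch

def hstvae_loss(x, x_rec, y_true, y_pred, mu, sigma, pi, b: float = 0.1):
    """Sketch of the negative ELBO in step S4; b is the Bernoulli prior Ber(b)."""
    rec = torch.mean((x_rec - x) ** 2)          # reconstruction term
    pred = torch.mean((y_pred - y_true) ** 2)   # prediction term
    # KL( N(mu, sigma^2) || N(0, 1) ), closed form
    kl_z = 0.5 * torch.mean(mu ** 2 + sigma ** 2 - 2 * torch.log(sigma) - 1)
    # KL( Ber(pi) || Ber(b) ), closed form
    eps = 1e-8
    kl_e = torch.mean(pi * torch.log((pi + eps) / b)
                      + (1 - pi) * torch.log((1 - pi + eps) / (1 - b)))
    return rec + pred + kl_z + kl_e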
In step S5, anomaly detection is performed using the history-future combined anomaly scoring mechanism, as shown in FIG. 4. The specific steps are as follows (a code sketch is given after step S56):
s51, inputting the test set data into the trained model to obtain a reconstruction and prediction value. For each time series i, pairAt the historical time t= (t-s, …, t-1), the values are reconstructedAnd true value +.>Mean square error of (2) as reconstruction errorFor the future instant t= (t, … t+w), the predicted value +.>And true value +.>Mean square error of (2) as prediction error->
S52, regarding each time t, taking the average value of the reconstruction errors of all sequences at the time as the reconstruction error Err of the time t r (t) taking the average of the prediction errors of all sequences at that time as the prediction error Err at time t f (t)。
S53, selecting the maximum reconstruction error in the history time window (t-s, …, t-1) as the history abnormality factor A of the moment t h (t) selecting the maximum prediction error in the future time window (t, … t+s) as the future abnormality factor A of the moment t f (t) the prediction error at the time t is used as an abnormality factor A c (t)。
S54, historical abnormality factor A h (t) current abnormality factor A c (t) future abnormality factor A f (t) coefficient weighting to obtain an anomaly score A based on reconstruction and prediction errors 1 (t)=A c (t)+γ·A h (t)+η·A f (t) wherein γ and η are each A h (t)、A f The scaling factor of (t) is derived from a grid search algorithm.
S55, averaging neighbors of all sequencesNumber as graph sparsity-based anomaly score Wherein D is i Representing the number of neighbors of the ith time series.
S56 if A 1 (t)>thr 1 Or A 2 (t)<thr 2 If the time t is considered to be abnormal, the abnormal label a t =1; wherein the threshold thr 1 And thr 2 Can be obtained by a grid search algorithm.
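The scoring logic of S51-S56 reduces to a few array operations once the per-timestamp errors and the learned graph's neighbor counts are available. The following sketch assumes those inputs have been precomputed and that window boundaries are handled by simply skipping the first and last s timestamps; both are assumptions for illustration.

```python
import numpy as np

def anomaly_labels(err_r, err_f, deg, s, gamma, eta, thr1, thr2):
    """Sketch of the history-future anomaly scoring (S51-S56).

    err_r, err_f : per-timestamp reconstruction / prediction errors,
                   already averaged over the N series (S52).
    deg          : per-timestamp mean neighbor count of the learned graph (S55).
    """
    T = len(err_r)
    labels = np.zeros(T, dtype=int)
    for t in range(s, T - s):
        A_h = err_r[t - s:t].max()           # S53: worst error in the history window
        A_f = err_f[t:t + s + 1].max()       # S53: worst error in the future window
        A_c = err_f[t]                       # S53: current prediction error
        A1 = A_c + gamma * A_h + eta * A_f   # S54: weighted anomaly score
        A2 = deg[t]                          # S55: graph-sparsity score
        if A1 > thr1 or A2 < thr2:           # S56: flag the timestamp as anomalous
            labels[t] = 1
    return labels
```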
The anomaly detection performance evaluation metrics are computed from the anomaly labels detected by the hierarchical variational autoencoder and the ground-truth labels of the datasets: precision, recall, and F1 score, where the F1 score is the harmonic mean of precision and recall. A series of experiments, including ablation studies, visualization analysis, and parameter analysis, were then carried out.
Table 2 compares the anomaly detection performance, in terms of precision, recall, and F1 score, of the HSTVAE method proposed by the present application and other prior-art methods on the SWaT, WADI, and SMD datasets, where the best values are highlighted in bold and the second-best are underlined. The proposed HSTVAE is significantly superior to all other methods on the three datasets, particularly in recall and F1. The F1 score and recall of HSTVAE exceed all baselines, and its precision and recall are well balanced. In particular, the method achieves the best F1 scores: 0.8955 on SWaT, 0.8401 on WADI, and 0.8963 on SMD. To obtain a higher recall, unlike other methods combining reconstruction and prediction, such as DVGCRN and MTAD-GAT, which compute anomaly scores from reconstruction and prediction errors only, the method also uses the sparsity of the learned graph as a detection indicator: if the sparsity of the obtained relationship graph falls below a certain threshold, the moment is judged anomalous. Compared with the second-best model, the recall of HSTVAE on the three datasets increases by 2.50%, 1.34%, and 0.64%, respectively.
Table 2. Comparison of anomaly detection performance of different methods
To verify the function of each module of the hierarchical variational autoencoder, an ablation experiment was also carried out; the results are shown in Table 3. It can be observed that recall drops significantly after the GAT module is removed, showing that aggregating appropriate neighbor information via graph learning helps the model learn a more effective latent time series representation. Removing the reconstruction or prediction decoder likewise reduces the precision and recall of anomaly detection, proving that the reconstruction and prediction decoders also play an important role in modeling temporal and spatial randomness.
Table 3. Experimental results of the ablation study
To verify the robustness of the hierarchical variational autoencoder framework to temporal and spatial randomness, the anomaly scores and thresholds of HSTVAE and of OmniAnomaly, GRELEN, and GDN are visualized in FIG. 5: the curves are the anomaly scores, the straight lines the thresholds, the rectangular boxes the regions where anomalies occur, and the other regions are normal. The periods shown include temporal fluctuations, relationship fluctuations, and anomalies. In the normal period on the left, the HSTVAE of the present application achieves the smoothest score, while the score of OmniAnomaly is relatively high and approaches the threshold, probably because it considers only temporal randomness and ignores spatial randomness. In particular, GRELEN produces a false detection where temporal fluctuations occur near t=200, because it models only spatial randomness without considering temporal randomness. As a deterministic method that does not consider the randomness of time series patterns, GDN produces two false positives, which also indicates the necessity of modeling randomness. In the anomalous period marked by the rectangle on the right, although all methods detect the anomaly, the proposed method detects it earlier, which reflects the sensitivity of the model to anomalies and also demonstrates the feasibility of using reconstruction and prediction errors together with graph sparsity as indicators for identifying anomalies.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (7)

1. A multi-element time sequence anomaly detection method for an intelligent Internet of things system, characterized by comprising the following steps:
S1, a spatial encoder models the potential graph relationships among sequences using a graph learning strategy;
S2, a temporal encoder captures a latent time series representation using a multi-head self-attention mechanism and a graph attention network;
S3, a reconstruction decoder reconstructs the input sequence from the latent time series representation using a multi-head self-attention network, and a prediction decoder performs multi-step prediction of the future time series using a multi-head attention network based on the latent time series representation;
S4, the parameters of the hierarchical variational autoencoder formed by the spatial encoder, the temporal encoder, the reconstruction decoder, and the prediction decoder are updated backward with a stochastic gradient descent algorithm, yielding a model that has learned the normal pattern of the time series;
S5, anomaly detection is performed with an anomaly scoring mechanism that combines history and future.
2. The multi-element time sequence anomaly detection method for an intelligent Internet of things system according to claim 1, characterized in that, in step S1,
the spatial encoder models the potential graph relationships among the multivariate time series using a graph learning strategy based on a self-attention mechanism; the specific steps are as follows:
S11, a linear network maps any two raw time series $x_i$, $x_j$ into a high-dimensional space, extracting high-dimensional temporal features $g_i$, $g_j$, i.e. $g_i = x_i W_g$, $g_j = x_j W_g$, where $W_g$ is a learnable parameter matrix;
S12, the extracted temporal features $g_i$, $g_j$ are mapped to vectors $q_i$, $k_i$ and $q_j$, $k_j$ in different subspaces, i.e. $q_i = g_i W_q$, $k_i = g_i W_k$, $q_j = g_j W_q$, $k_j = g_j W_k$, where $W_q$ and $W_k$ are learnable parameter matrices;
S13, assuming the latent graph-relationship random variable obeys a Bernoulli distribution, the variational distribution is $q_{\phi_e}(e_{i,j} \mid x_i, x_j) = \mathrm{Ber}(\pi_{i,j})$; the vectors $q_i$ and $k_j$ are concatenated and fed through a multi-layer perceptron (MLP) network to obtain a probability value between 0 and 1, taken as the probability parameter $\pi_{i,j}$ of the Bernoulli distribution, i.e. $\pi_{i,j} = \mathrm{MLP}(W_s[q_i, k_j])$, where $W_s$ is a learnable parameter matrix;
S14, resampling with the Gumbel-Softmax distribution yields a continuous approximation of the Bernoulli random variable and thus the latent graph-relationship variable $e_{i,j}$, i.e. $e_{i,j} = \frac{\exp\big((\log \pi_{i,j}^{1} + g_{1})/\tau\big)}{\exp\big((\log \pi_{i,j}^{1} + g_{1})/\tau\big) + \exp\big((\log \pi_{i,j}^{0} + g_{0})/\tau\big)}$, where $\pi_{i,j}^{1}$ and $\pi_{i,j}^{0}$ are the probabilities that a correlation does and does not exist between time series i and j, respectively, $\tau$ is a temperature coefficient controlling the smoothness of the Gumbel-Softmax distribution, and $g_{1}$, $g_{0}$ are independent, identically distributed samples drawn from a standard Gumbel distribution;
S15, S11-S14 are executed in parallel for any two of the N input time series to obtain the adjacency matrix E.
3. The multi-element time sequence anomaly detection method for an intelligent Internet of things system according to claim 1, characterized in that, in step S2, the temporal encoder combines the multi-head self-attention mechanism with the GAT network to capture the latent time series representation and model the randomness of time dependence; the specific steps are as follows:
S21, for each time series, an embedding layer module containing position encoding converts the input value $x_i^t$ at timestamp t of the i-th time series into a d-dimensional input embedding vector $h_i^t$ as the input of the multi-head self-attention module, i.e. $h_i^t = \mathrm{emb}(x_i^t) + \mathrm{pos}(t)$, where emb is a linear layer and pos denotes the position encoding;
S22, time dependence is modeled for each time series i by a multi-head self-attention network, which fuses the time series data within a time window and takes a weighted average according to the importance of the data in the window, obtaining the temporal feature representation $\tilde{h}_i$, i.e. $\tilde{h}_i = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{o}$, where h is the number of heads of the multi-head attention network and $W^{o}$ is a learnable parameter matrix; for the m-th head, $\mathrm{head}_m = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\big(QK^{\top}/\sqrt{d}\big)V$, where Q, K, V denote the query, the key, and the value of the data itself, obtained from $h_i$ by linear mappings, specifically $Q = h_i W_i^{Q}$, $K = h_i W_i^{K}$, $V = h_i W_i^{V}$, with $W_i^{Q} \in \mathbb{R}^{d \times d}$, $W_i^{K} \in \mathbb{R}^{d \times d}$, $W_i^{V} \in \mathbb{R}^{d \times d}$ being different linear mapping matrices;
S23, the temporal feature representations of all sequences are fed into the GAT network, which aggregates neighbor information according to the latent graph-relationship adjacency matrix learned by the spatial encoder, obtaining the updated temporal feature representation $\hat{h}_i$ of each sequence, i.e. $\hat{h}_i = \mathrm{ReLU}\big(\sum_{j \in N(i)} \alpha_{j,i} W \tilde{h}_j\big)$, where ReLU is the activation function, $N(i) = \{j \mid e_{j,i} = 1\}$, W is a trainable parameter matrix, and $\alpha_{j,i}$ denotes the attention coefficient, $\alpha_{j,i} = \frac{\exp\big(\mathrm{LeakyReLU}(a^{\top}[W\tilde{h}_j \| W\tilde{h}_i])\big)}{\sum_{k \in N(i)} \exp\big(\mathrm{LeakyReLU}(a^{\top}[W\tilde{h}_k \| W\tilde{h}_i])\big)}$, where LeakyReLU is the activation function and a is the learnable coefficient of the attention mechanism;
S24, assuming the latent time series random variable obeys a Gaussian distribution, the variational distribution is $q_{\phi_z}(z_i^t \mid x_i, e) = N(\mu_i^t, \sigma_i^t)$; $\hat{h}_i^t$ is passed through linear networks to obtain the mean and variance of the Gaussian distribution, $\mu_i^t$ and $\sigma_i^t$, i.e. $\mu_i^t = \mathrm{Linear}(\hat{h}_i^t)$, $\sigma_i^t = \mathrm{Linear}(\hat{h}_i^t)$;
S25, the latent time series representation $z_i^t$ of the i-th time series at time t is obtained with the reparameterization trick, i.e. $z_i^t = \mu_i^t + \sigma_i^t \odot \epsilon$ with $\epsilon \sim N(0, I)$, where $z_i \in \mathbb{R}^{s \times d_z}$, s is the time window length, and $d_z$ is the dimension of the latent time series representation vector at each moment of each sequence.
4. The multi-element time sequence anomaly detection method for an intelligent Internet of things system according to claim 1, characterized in that, in step S3, the reconstruction decoder models the distribution $p_{\theta_r}(x_i \mid z_i, e)$ and reconstructs the input sequence as follows: for each time series i, its latent time series representation $z_i$ is fed as input to a multi-head self-attention network, which learns the time dependence between different moments in the latent time series representation to produce a feature representation, from which the reconstruction output $\hat{x}_i$ is then obtained, realizing the reconstruction of the input sequence.
5. The multi-element time sequence anomaly detection method for an intelligent Internet of things system according to claim 1, characterized in that, in step S3, the prediction decoder models $p_{\theta_f}(y_i \mid z_i, e)$ and performs multi-step prediction as follows: for each time series i, the prediction decoder consists of a two-layer attention network. The first layer is a self-attention layer that takes a randomly generated token vector as its original input and, during prediction, autoregressively takes the predicted output of the previous moment as the input of the current moment, obtaining the first-layer features. The second layer is an encoder-decoder attention layer: the query comes from the output of the first self-attention layer, and the key and value come from the latent time series representation $z_i$. This lets every moment in the decoder attend to the information of all moments in the input sequence, realizing multi-step prediction of the future time series and producing the predicted values $y_i$.
6. The multi-element time sequence anomaly detection method for an intelligent Internet of things system according to claim 1, characterized in that, in step S4, the Gaussian prior distribution is set to $p(z_i) = N(0, I)$ and the Bernoulli prior distribution to $\mathrm{Ber}(b)$, where b is a hyperparameter controlling graph sparsity; the evidence lower bound (ELBO) of the variational autoencoder is used as the loss function, and the parameters $\phi_e$, $\phi_z$, $\theta_r$, $\theta_f$ of the hierarchical variational autoencoder are updated backward with a stochastic gradient descent algorithm. Specifically, the loss is
$\mathrm{Loss} = \mathbb{E}_q\big[-\log p_{\theta_r}(x \mid z, e)\big] + \mathbb{E}_q\big[-\log p_{\theta_f}(y \mid z, e)\big] + \mathrm{KL}\big(q_{\phi_z}(z \mid x, e) \,\|\, p(z)\big) + \mathrm{KL}\big(q_{\phi_e}(e \mid x) \,\|\, p(e)\big)$
where $\phi_e$ are the parameters of the spatial encoder, $\phi_z$ the parameters of the temporal encoder, $\theta_r$ the parameters of the reconstruction decoder, $\theta_f$ the parameters of the prediction decoder, and X the input data of the spatial and temporal encoders. The first term is the reconstruction loss, representing the ability of the reconstruction decoder to reconstruct the historical input; the second term is the prediction loss, representing the prediction accuracy of the prediction decoder; the last two terms use the KL divergence to keep the divergence between the variational distributions and the prior distributions as small as possible.
7. The multi-element time sequence anomaly detection method for an intelligent Internet of things system according to claim 1, characterized in that, in step S5,
S51, the test set data are input into the trained model to obtain the reconstructed and predicted values; for each time series i, for the historical moments $t' = t-s, \ldots, t-1$, the mean square error between the reconstructed value $\hat{x}_i^{t'}$ and the true value $x_i^{t'}$ is taken as the reconstruction error $err_r^i(t)$, and for the future moments $t' = t, \ldots, t+s$, the mean square error between the predicted value $y_i^{t'}$ and the true value $x_i^{t'}$ is taken as the prediction error $err_f^i(t)$;
S52, for each moment t, the average of the reconstruction errors of all sequences at that moment is taken as the reconstruction error $Err_r(t)$ of moment t, and the average of the prediction errors of all sequences at that moment as the prediction error $Err_f(t)$;
S53, the maximum reconstruction error within the historical time window $(t-s, \ldots, t-1)$ is selected as the historical anomaly factor $A_h(t)$ of moment t, the maximum prediction error within the future time window $(t, \ldots, t+s)$ as the future anomaly factor $A_f(t)$, and the prediction error at moment t itself as the current anomaly factor $A_c(t)$;
S54, the historical anomaly factor $A_h(t)$, the current anomaly factor $A_c(t)$, and the future anomaly factor $A_f(t)$ are weighted by coefficients to obtain the anomaly score based on reconstruction and prediction errors, $A_1(t) = A_c(t) + \gamma \cdot A_h(t) + \eta \cdot A_f(t)$, where $\gamma$ and $\eta$ are the scaling coefficients of $A_h(t)$ and $A_f(t)$, obtained by a grid search algorithm;
S55, the average neighbor count of all sequences is taken as the anomaly score based on graph sparsity, $A_2(t) = \frac{1}{N}\sum_{i=1}^{N} D_i$, where $D_i$ denotes the number of neighbors of the i-th sequence;
S56, if $A_1(t) > thr_1$ or $A_2(t) < thr_2$, moment t is considered anomalous and the anomaly label $a_t = 1$, where the thresholds $thr_1$ and $thr_2$ can be obtained by a grid search algorithm.
CN202310339800.5A 2023-04-03 2023-04-03 Multi-element time sequence anomaly detection method for intelligent Internet of things system Pending CN116663613A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310339800.5A CN116663613A (en) 2023-04-03 2023-04-03 Multi-element time sequence anomaly detection method for intelligent Internet of things system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310339800.5A CN116663613A (en) 2023-04-03 2023-04-03 Multi-element time sequence anomaly detection method for intelligent Internet of things system

Publications (1)

Publication Number Publication Date
CN116663613A true CN116663613A (en) 2023-08-29

Family

ID=87719557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310339800.5A Pending CN116663613A (en) 2023-04-03 2023-04-03 Multi-element time sequence anomaly detection method for intelligent Internet of things system

Country Status (1)

Country Link
CN (1) CN116663613A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117407697A (en) * 2023-12-14 2024-01-16 南昌科晨电力试验研究有限公司 Graph anomaly detection method and system based on automatic encoder and attention mechanism
CN117407697B (en) * 2023-12-14 2024-04-02 南昌科晨电力试验研究有限公司 Graph anomaly detection method and system based on automatic encoder and attention mechanism
CN117851920A (en) * 2024-03-07 2024-04-09 国网山东省电力公司信息通信公司 Power Internet of things data anomaly detection method and system

Similar Documents

Publication Publication Date Title
Teh et al. Sensor data quality: A systematic review
CN116663613A (en) Multi-element time sequence anomaly detection method for intelligent Internet of things system
US9245235B2 (en) Integrated approach to model time series dynamics in complex physical systems
Darban et al. Deep learning for time series anomaly detection: A survey
CN115618296B (en) Dam monitoring time sequence data anomaly detection method based on graph attention network
Luan et al. Out-of-distribution detection for deep neural networks with isolation forest and local outlier factor
CN112163020A (en) Multi-dimensional time series anomaly detection method and system
CN114048546B (en) Method for predicting residual service life of aeroengine based on graph convolution network and unsupervised domain self-adaption
Qin et al. CSCAD: Correlation structure-based collective anomaly detection in complex system
Jiang et al. A timeseries supervised learning framework for fault prediction in chiller systems
Qin et al. Remaining useful life prediction using temporal deep degradation network for complex machinery with attention-based feature extraction
CN117332344A (en) Air quality anomaly detection method based on error optimization automatic encoder model
CN117371571A (en) Regional air quality prediction model based on multi-scale dynamic synchronous diagram mechanism
Zhang et al. Anomaly detection method for building energy consumption in multivariate time series based on graph attention mechanism
CN116956089A (en) Training method and detection method for temperature anomaly detection model of electrical equipment
CN115577295A (en) Data detection method and device, computer equipment and storage medium
Saneja et al. A hybrid approach for outlier detection in weather sensor data
CN113792776A (en) Interpretation method of deep learning model in network security anomaly detection
KR102333893B1 (en) Method and Apparatus for Received Signal Strength Indicator Database Update based on Crowdsourced Data for Maintenance of Indoor Localization System
Langbridge et al. Causal Temporal Graph Convolutional Neural Networks (CTGCN)
Zheng et al. The sequence-to-sequence architecture with an embedded module for long-term traffic speed forecasting with missing data
He et al. Overview of Key Performance Indicator Anomaly Detection
Zhang et al. Deep Trajectory Similarity Model: A Fast Method for Trajectory Similarity Computation
Ameli et al. Explainable Unsupervised Multi-Sensor Industrial Anomaly Detection and Categorization
Shen et al. Long-term multivariate time series forecasting in data centers based on multi-factor separation evolutionary spatial–temporal graph neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination