CN115169430A

CN115169430A - Cloud network end resource multidimensional time sequence anomaly detection method based on multi-scale decoding

Info

Publication number: CN115169430A
Application number: CN202210456392.7A
Authority: CN
Inventors: 王树良; 徐卓辉; 袁汉宁; 耿晶; 滕腾; 党迎旭
Original assignee: Shenzhen Surui Data Intelligent Technology Research Institute; Beijing Institute of Technology BIT
Current assignee: Shenzhen Surui Data Intelligent Technology Research Institute; Beijing Institute of Technology BIT
Priority date: 2022-04-27
Filing date: 2022-04-27
Publication date: 2022-10-11

Abstract

The invention discloses a multi-dimensional time series anomaly detection method for cloud network end resources based on multi-scale decoding. It relates to the technical field of computer science, provides a multi-dimensional time series anomaly detection scheme based on multi-scale integrated decoding, and improves the accuracy of multi-dimensional time series anomaly detection. The technical scheme of the invention comprises the following steps. First, the correlation among the time series is calculated and a correlation feature matrix of the sequences is constructed. Second, based on the correlation feature matrix, the time series correlations are encoded with an encoder and the temporal patterns are captured with an attention-based convolutional long short-term memory network. Finally, decoders of different scales perform the decoding, the outputs of the different decoders are constrained by the similarity of tensor kernels, the feature matrix is reconstructed by fusing all decoding results, and the reconstruction error is calculated to detect anomalies.

Description

Cloud network end resource multidimensional time sequence anomaly detection method based on multi-scale decoding

Technical Field

The invention relates to the technical field of computer science, and in particular to a multi-scale-decoding-based method for detecting anomalies in the multi-dimensional time series of cloud network end resources.

Background

Time series anomaly detection is an important task for discovering problems and avoiding risks in time. In Internet applications, technicians monitor the data of cloud network end resources and analyze the time series of each dimension to find abnormal conditions. This allows potential risks to be found and alarms to be raised promptly, which reduces the economic losses of projects, safeguards information security, and so on.

Here, "cloud" refers to cloud computing and the infrastructure and resources that support it, "network" generally refers to the Internet, and "end" refers to terminal devices. In terms of how the three developed, terminals are connected to form a network, and the network in turn gives rise to the cloud. In practice, the maturity and popularization of network technology have in turn promoted the development of computers and various terminal devices, while cloud computing has made the network more intelligent. The three complement each other and form a mutually integrated whole.

The development of terminal devices makes it possible to acquire large amounts of multi-dimensional time series data. Compared with a one-dimensional time series, a multi-dimensional time series is more diverse and larger in volume, contains more invalid and interfering information, and exhibits more complex correlations among its dimensions, all of which make multi-dimensional time series anomaly detection more difficult. How to effectively mine the deep features of a multi-dimensional time series so that abnormal data can be better distinguished has therefore attracted wide attention from scholars in China and abroad.

Overall, current research on multi-dimensional time series anomaly detection falls into two categories. The first is traditional time series anomaly detection algorithms, most of which are improvements on clustering-based and classification-based methods. When the data scale is small, these methods split the multi-dimensional time series horizontally into several one-dimensional time series and find abnormal patterns with algorithms from the one-dimensional domain, which works well. However, they perform poorly on large data sets, and because they ignore the correlations among sequences, they cannot effectively identify system-level anomalies. The second category is deep-learning-based anomaly detection algorithms, of which two are common: methods based on recurrent neural networks (RNN) and methods based on autoencoders (AE).

RNN-based algorithms mainly learn the temporal order of the data, retain valuable historical information, predict the data at future time steps, and identify anomalies from the error between the predicted and true values. Autoencoder-based methods instead learn the hidden features of the data under normal patterns and identify anomalies through the reconstruction error after decoding.

RNN-based methods can capture the temporal order of the data, but they ignore both the correlations among sequences and the temporal patterns at different scales, so they cannot effectively detect system-level anomalies. Autoencoder-based methods cannot avoid the error accumulation caused by sequential decoding over long time series, and they cannot accurately exploit multi-scale information for anomaly detection in the decoding stage.

Therefore, for the multi-dimensional time series of cloud network end resources, there is currently no scheme that both accounts for the correlations among sequences and the temporal patterns at different scales and accurately exploits multi-scale information for anomaly detection in the decoding stage.

Disclosure of Invention

In view of this, the invention provides a multi-scale-decoding-based anomaly detection method for the multi-dimensional time series of cloud network end resources. It offers a multi-dimensional time series anomaly detection scheme based on multi-scale integrated decoding and improves the accuracy of multi-dimensional time series anomaly detection.

To achieve this objective, the technical scheme of the invention comprises the following steps:

Step 1: calculate the correlation among all time series and construct the correlation feature matrix of the sequences.

Step 2: based on the correlation feature matrix, encode the time series correlation with an encoder and capture the temporal pattern with an attention-based convolutional long short-term memory network.

Step 3: decode with decoders of different scales, constrain the outputs of the different decoders by the similarity of tensor kernels, reconstruct the feature matrix by fusing all decoding results, and calculate the reconstruction error to detect anomalies.

In the embodiment of the invention, step 1 is preceded by the following step:

after the multi-dimensional time series data are standardized, they are segmented with a sliding window algorithm to obtain a number of different time series segments.

Assume the time series has dimension n and length T, and the time window has size w. The time series segment obtained by sliding-window segmentation of the i-th dimension is

$$x_i^t = \left(x_i^{t-w}, x_i^{t-w+1}, \ldots, x_i^{t}\right)$$

where $x_i^{t-w}, \ldots, x_i^{t}$ are the data of the i-th dimension at times $t-w$ through $t$ in the time series segment.
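The standardization and sliding-window steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes z-score standardization (the text only says "standardized"), 0-based time indices, and windows that include both endpoints $t-w$ and $t$. The function names `zscore` and `sliding_segments` are hypothetical.

```python
import math
from typing import List


def zscore(series: List[float]) -> List[float]:
    """Standardize one dimension to zero mean and unit population variance."""
    mean = sum(series) / len(series)
    var = sum((v - mean) ** 2 for v in series) / len(series)
    std = math.sqrt(var) or 1.0  # a constant series gives std 0; avoid dividing by zero
    return [(v - mean) / std for v in series]


def sliding_segments(data: List[List[float]], w: int) -> List[List[List[float]]]:
    """Cut an n x T series into segments.

    data[i][t] is dimension i at time t (0-based). For every end time t that has
    a full window, the segment holds x_i^{t-w}, ..., x_i^{t} for each dimension i.
    """
    if w < 1:
        raise ValueError("window size w must be at least 1")
    T = len(data[0])
    return [[row[t - w:t + 1] for row in data] for t in range(w, T)]


# Hypothetical usage: standardize each dimension, then segment with w = 2.
raw = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0]]
segments = sliding_segments([zscore(row) for row in raw], w=2)
```

Each segment keeps all n dimensions side by side, which is what the correlation step needs.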

In the embodiment of the invention, step 1 specifically comprises:

the correlation between each dimension of the data in the current time series segment and the other dimensions in the historical time series segment is calculated through inner products to construct an n × n correlation feature matrix $M^t$. The element $m_{ij}^t$ in the i-th row and j-th column of the matrix represents the correlation between the j-th dimension in the current time series segment and the i-th dimension in the historical time series segment, and its value is calculated by:

$$m_{ij}^t = \frac{\sum_{\delta=0}^{w} x_i^{t-\delta}\, x_j^{t-\delta}}{\kappa}$$

where $x_i^{t-\delta}$ and $x_j^{t-\delta}$ are the values of the i-th and j-th dimensions in the corresponding time series segment, t denotes the current time, and κ is a scaling factor with κ = w.
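The inner-product formula for $m_{ij}^t$ can be checked with a short sketch. It assumes a segment is stored as a list of n rows, each holding $x_i^{t-w}, \ldots, x_i^{t}$. The name `correlation_matrix` is hypothetical, and `kappa` plays the role of the scaling factor κ = w.

```python
from typing import List


def correlation_matrix(segment: List[List[float]], kappa: float) -> List[List[float]]:
    """Build the n x n matrix M^t for one segment.

    segment[i] holds x_i^{t-w}, ..., x_i^{t}; entry (i, j) is the inner product
    of dimensions i and j over the window, divided by the scaling factor kappa.
    """
    n = len(segment)
    return [
        [sum(a * b for a, b in zip(segment[i], segment[j])) / kappa for j in range(n)]
        for i in range(n)
    ]


# Hypothetical segment with n = 2 dimensions and w = 2 (three points each).
seg = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
M_t = correlation_matrix(seg, kappa=2)  # kappa = w
# → [[7.0, 1.0], [1.0, 0.5]]
```

Because the inner product is commutative, $M^t$ is symmetric. Its diagonal holds each dimension's scaled energy over the window.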

In the embodiment of the invention, step 2 is specifically divided into the following steps:

Step 2.1: after feature extraction of the multi-dimensional time data is completed, $T-w+1$ correlation feature matrices are obtained;

Step 2.2: the temporal information in the correlation feature matrices is modeled with a long short-term memory network (LSTM), which outputs the latent features of the data, i.e., the time series correlation code.

The encoding process using the LSTM can be expressed as:

$$h_t = \mathrm{LSTM}\left([x_t; h_{t-1}]\right)$$

$$h = F_{MLP}\left(\mathrm{concat}[h_1; h_2; \ldots; h_T]\right)$$

where LSTM(·) denotes an LSTM unit and $h_t$ is the hidden state at time t, jointly determined by the hidden state $h_{t-1}$ at time $t-1$ and the input $x_t$ at time t. When the input $x_t$ is a correlation feature matrix, $h_t$ is the latent feature of the data. h denotes the hidden state of the whole input, obtained by splicing the hidden states $h_1; h_2; \ldots; h_T$ of all time steps along the time dimension. concat denotes the splicing function, and $F_{MLP}$ is a fully connected layer.
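The second equation, $h = F_{MLP}(\mathrm{concat}[h_1; \ldots; h_T])$, can be sketched as a flatten followed by one dense layer. This is an illustration only: the text does not say whether $F_{MLP}$ has an activation, so the sketch assumes a purely linear layer. The helper names and the weights are hypothetical.

```python
from typing import List, Sequence


def concat(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Splice h_1; h_2; ...; h_T along the time dimension into one flat vector."""
    return [value for h in hidden_states for value in h]


def fully_connected(x: Sequence[float], W: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """A single dense layer y = W x + b (no activation; an assumption)."""
    if any(len(row) != len(x) for row in W) or len(W) != len(b):
        raise ValueError("shape mismatch between W, x and b")
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def encode(hidden_states: Sequence[Sequence[float]], W, b) -> List[float]:
    """h = F_MLP(concat[h_1; ...; h_T])."""
    return fully_connected(concat(hidden_states), W, b)


# Hypothetical usage: T = 2 hidden states of size 2, projected to size 2.
h = encode([[1.0, 2.0], [3.0, 4.0]],
           W=[[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 1.0]],
           b=[0.5, -1.0])
# → [10.5, 3.0]
```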

In the embodiment of the invention, step 3 specifically comprises:

Step 3.1: the latent features of the data are decoded by decoders with different output lengths, which output several reconstructed feature matrices. To keep the output sequences of the decoders similar, tensor similarity is used to constrain the temporal pattern of the outputs;

Step 3.2: a multi-scale fusion strategy fuses the decoder outputs from coarse to fine, and the error between the fused output of the highest-scale decoder and the original input is used to measure anomalies and obtain the anomaly detection result.

In the embodiment of the invention, step 3 comprises the following steps:

Step 3.1: assume a decoder set D, in which the output length of the k-th decoder $D^{(k)}$ is $T^{(k)}$ and the original sequence length is T. $T^{(k)}$ is defined as follows:

$$T^{(k)} = \alpha_k T$$

$$\alpha_k = \frac{1}{\tau^{k-1}} \in (0, 1]$$

where $\alpha_k$ is the coefficient of the k-th decoder, $\alpha_1 = 1$ and τ > 1, which ensures that the output length of the highest-scale decoder is T;
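The length schedule $T^{(k)} = T/\tau^{k-1}$ is easy to tabulate. Since $\alpha_k T$ need not be an integer, this sketch rounds down and keeps at least one step. Both choices are assumptions, because the text does not say how fractional lengths are handled. `decoder_lengths` is a hypothetical name.

```python
import math
from typing import List


def decoder_lengths(T: int, tau: float, num_decoders: int) -> List[int]:
    """Output lengths T^(k) = alpha_k * T with alpha_k = 1 / tau^(k-1).

    Rounding down to an integer (minimum 1) is an assumption of this sketch.
    """
    if tau <= 1:
        raise ValueError("tau must be greater than 1")
    return [max(1, math.floor(T / tau ** (k - 1))) for k in range(1, num_decoders + 1)]


print(decoder_lengths(64, 2, 4))  # [64, 32, 16, 8]
print(decoder_lengths(10, 3, 3))  # [10, 3, 1]
```

The first entry is always T, which matches the requirement that the highest-scale decoder $D^{(1)}$ reconstructs the full sequence.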

Step 3.2: the similarity of tensors is used to constrain the temporal patterns output by the different decoders.

The input matrix sequence M has dimension n × n × T, and the feature matrix sequence $Y^{(k)}$ output by the k-th decoder has dimension $n \times n \times T^{(k)}$. Both M and $Y^{(k)}$ are regarded as third-order tensors, and each is decomposed to obtain tensor kernels (core tensors) of the same size. The similarity of the tensor kernels then approximately stands in for the similarity of the tensors themselves.

Denote the tensor kernel of the input matrix sequence M by $C^{(M)}$ and that of the feature matrix sequence $Y^{(k)}$ output by the k-th decoder by $C^{(k)}$. The similarity constraint between the two tensors, $\mathrm{Cos}(M, Y^{(k)})$, is given by:

$$\mathrm{Cos}(M, Y^{(k)}) = \mathrm{Cos}\left(C^{(M)}, C^{(k)}\right)$$

$$\mathcal{L}_{sim} = \frac{1}{L^{(D)}} \sum_{k=1}^{L^{(D)}} \mathrm{Cos}(M, Y^{(k)})$$

where $\mathcal{L}_{sim}$ represents the similarity constraint on the input and output temporal patterns, Cos denotes the cosine similarity between two tensor kernels, and $L^{(D)}$ denotes the number of decoders. $\mathcal{L}_{sim}$ is thus the average tensor similarity between the outputs of the decoders and the original input sequence;
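Once the tensor kernels are available, the averaged cosine similarity is a few lines of code. This sketch assumes the cores have already been produced by a Tucker decomposition with identical ranks, so they have the same size. It also assumes the cosine is taken over the flattened cores (the Frobenius inner product). The decomposition itself is not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Core = Sequence[Sequence[Sequence[float]]]  # a third-order core tensor


def _flatten(core: Core) -> List[float]:
    return [v for mat in core for row in mat for v in row]


def core_cosine(a: Core, b: Core) -> float:
    """Cosine similarity of two same-size core tensors (Frobenius inner product)."""
    fa, fb = _flatten(a), _flatten(b)
    if len(fa) != len(fb):
        raise ValueError("tensor kernels must have the same size")
    dot = sum(x * y for x, y in zip(fa, fb))
    na = math.sqrt(sum(x * x for x in fa))
    nb = math.sqrt(sum(y * y for y in fb))
    return dot / (na * nb) if na and nb else 0.0


def mean_similarity(input_core: Core, decoder_cores: Sequence[Core]) -> float:
    """Average Cos(C^(M), C^(k)) over the L^(D) decoders."""
    return sum(core_cosine(input_core, c) for c in decoder_cores) / len(decoder_cores)


# Hypothetical 1 x 2 x 2 cores: identical cores score ~1, negated cores ~-1.
C_M = [[[1.0, 0.0], [0.0, 1.0]]]
C_neg = [[[-1.0, 0.0], [0.0, -1.0]]]
s = mean_similarity(C_M, [C_M, C_neg])  # close to 0.0
```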

Step 3.3: the multi-scale fusion strategy is set as follows:

[Fusion formula not recoverable from this text.]

In the fusion formula, each fused term is formed by fusing two quantities. The parameter τ is the preset value from the decoder-length definition, $F'_{MLP}$ is a two-layer fully connected network, and β is a weight. A further variable denotes the final hidden-layer variable at time t, and the latent feature at time t is jointly determined by two terms.

The decoding process of the k-th decoder is:

[Decoding formula not recoverable from this text.]

In the decoding formula, one term represents the latent features of the data at time t and a further state term is initialized to zero. The output of the k-th decoder at time t is computed with the learnable parameters $W^{(k)}$ and $b^{(k)}$. Another term is jointly determined by two quantities, and δ is artificially introduced noise following a normal distribution.

The output of the highest-scale decoder $D^{(1)}$ is the reconstruction of the input sequence, and the reconstructed feature matrix sequence is written as $Y^{(1)} = (Y^{(1)}_1, Y^{(1)}_2, \ldots, Y^{(1)}_T)$. With the original feature matrix sequence written as $M = (M_1, M_2, \ldots, M_T)$, the reconstruction error is:

$$\mathcal{L}_{rec} = \sum_{t=1}^{T} \left\lVert M_t - Y^{(1)}_t \right\rVert_2$$

where $\lVert\cdot\rVert_2$ denotes the two-norm of a matrix.

The total loss function $\mathcal{L}$ combines two aspects, the error between the reconstructed and original sequences and the temporal-pattern similarity constraint:

$$\mathcal{L} = \mathcal{L}_{rec} + \lambda\,\mathcal{L}_{sim}$$

where $\mathcal{L}_{sim}$ is the temporal-pattern similarity constraint from step 3.2 and λ is a hyperparameter that weights the temporal-pattern similarity error.
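A training objective of this shape can be sketched in plain Python. Two assumptions are labeled here and in the code. First, "two-norm of a matrix" is read as the element-wise (Frobenius) norm rather than the spectral norm. Second, minimizing a raw similarity would push the decoder outputs apart, so the sketch penalizes $1 - \mathcal{L}_{sim}$ instead of $\mathcal{L}_{sim}$ itself. The function names are hypothetical.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def matrix_norm_diff(a: Matrix, b: Matrix) -> float:
    """Element-wise (Frobenius) norm of a - b, standing in for the matrix two-norm."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def reconstruction_error(M: Sequence[Matrix], Y1: Sequence[Matrix]) -> float:
    """Sum over t of ||M_t - Y_t^(1)|| for the highest-scale decoder output."""
    if len(M) != len(Y1):
        raise ValueError("D^(1) must output a sequence of length T")
    return sum(matrix_norm_diff(m, y) for m, y in zip(M, Y1))


def total_loss(M: Sequence[Matrix], Y1: Sequence[Matrix], mean_similarity: float, lam: float) -> float:
    """Reconstruction error plus lambda times a similarity penalty.

    Using (1 - mean_similarity) as the penalty is an assumption of this sketch.
    """
    return reconstruction_error(M, Y1) + lam * (1.0 - mean_similarity)


M = [[[1.0, 0.0], [0.0, 1.0]]]
Y1 = [[[1.0, 0.0], [0.0, 0.0]]]
loss = total_loss(M, Y1, mean_similarity=0.5, lam=0.2)  # 1.0 + 0.2 * 0.5
```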

Step 3.4: anomalies are detected with the reconstruction error, specifically:

when the loss function reaches a convergence value after several training rounds, an offline anomaly detection model is obtained, and the anomaly score recorded during training is taken as the anomaly score threshold.

The trained offline anomaly detection model is then applied to the set of samples to be detected, and any score that exceeds the threshold during detection is judged abnormal.

Advantages:

1. The invention provides a multi-dimensional time series anomaly detection method based on multi-scale integrated decoding. The algorithm uses the inter-sequence correlation matrix instead of the original time series as the model input, which effectively retains the inter-sequence correlation information. Constructing the feature matrix introduces the correlations among the dimensions of the multi-dimensional time series, which makes system-level anomaly detection possible. The invention also brings multi-scale decoding information into multi-dimensional time series anomaly detection: combining coarse and fine granularity makes full use of the information in different temporal patterns, and the multi-scale integrated decoding scheme fully mines information at different scales, which alleviates error accumulation. This helps further improve the accuracy of multi-dimensional time series anomaly detection and promotes the practical application of the algorithm on large-scale data sets.

2. In the multi-scale decoding process, the method introduces the similarity of tensor kernels to constrain the similarity of temporal patterns, which ensures that the learned temporal patterns are similar to those of the input sequence.

3. In the prior art, methods need to take values over several time windows when calculating the feature matrix and add multi-layer convolution operations, which makes their computation relatively time-consuming. The invention introduces multi-scale information in the decoding stage, makes full use of the information in different temporal patterns, and reduces the decoder length exponentially, which lowers the computational complexity.

4. A similarity constraint is added when the autoencoder learns temporal patterns. The similarity constraint between tensors ensures that the learned temporal patterns are similar to the input sequence, which in turn ensures the accuracy of the multi-scale information fusion in the decoding stage.
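Item 3's point about exponentially decreasing decoder lengths can be made concrete with a short calculation. This sketch treats the total number of output steps across all K decoders as a rough proxy for decoding cost; that is an assumption, since per-step cost also depends on the decoder architecture. With $T^{(k)} = T/\tau^{k-1}$, the lengths form a geometric series:

```latex
\sum_{k=1}^{K} T^{(k)}
  = T \sum_{k=1}^{K} \tau^{-(k-1)}
  = T\,\frac{1 - \tau^{-K}}{1 - \tau^{-1}}
  < \frac{\tau}{\tau - 1}\,T
```

For example, with τ = 2 and K = 4 decoders on T = 64 steps, the lengths are 64, 32, 16 and 8, for a total of 120 steps. This stays below the bound 2T = 128, and adding more decoders can never reach it.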

Drawings

Fig. 1 is a technical roadmap of the multi-scale-decoding-based anomaly detection method for the multi-dimensional time series of cloud network end resources according to the invention.

Fig. 2 is a schematic diagram of the process of performing Tucker decomposition on each segment of data in the embodiment of the invention.

Detailed Description

The invention is described in detail below by way of example with reference to the accompanying drawings.

The invention provides a cloud network end resource multi-dimensional time sequence anomaly detection method based on multi-scale decoding. The overall technical roadmap is shown in fig. 1.

First, the correlation among the sequences is calculated and their correlation feature matrix is constructed. Second, given the feature matrix, the time series correlation is encoded with an encoder and the temporal pattern is captured with an attention-based convolutional long short-term memory network. Finally, decoders of different scales perform the decoding, the similarity of tensor kernels constrains the outputs of the different decoders, the feature matrix is reconstructed by fusing all decoding results, and the reconstruction error is calculated to detect anomalies.

The method comprises the following steps:

The state of a time series at the current moment is affected by historical data, and the data at a single moment cannot fully reflect the characteristics of the time series. To make the data to be detected more stable and to detect abnormal data in the time series effectively, the method first standardizes the multi-dimensional time series data and then segments them with a sliding window algorithm to obtain a number of different time series segments.

Suppose the multi-dimensional time series data are $X = [x_1, x_2, \ldots, x_n]$, where the dimension of the time series is n, its length is T, and the time window size is w. The time series segment obtained by sliding-window segmentation of the i-th dimension is

$$x_i^t = \left(x_i^{t-w}, x_i^{t-w+1}, \ldots, x_i^{t}\right)$$

where $x_i^{t-w}, \ldots, x_i^{t}$ are the data of the i-th dimension at times $t-w$ through $t$ in the time series segment.

Step 1: calculate the correlation among the time series and construct the correlation feature matrix of the sequences. Step 1 specifically comprises:

the correlation between each dimension of the data in the current time series segment and the other dimensions in the historical time series segment is calculated through inner products to construct an n × n correlation feature matrix $M^t$. The element $m_{ij}^t$ in the i-th row and j-th column represents the correlation between the j-th dimension in the current time series segment and the i-th dimension in the historical time series segment, and its value is calculated by:

$$m_{ij}^t = \frac{\sum_{\delta=0}^{w} x_i^{t-\delta}\, x_j^{t-\delta}}{\kappa}$$

where $x_i^{t-\delta}$ and $x_j^{t-\delta}$ are the values of the i-th and j-th dimensions in the corresponding time series segment, t denotes the current time, and κ = w is a scaling factor.

Step 2: an effective latent representation of the feature matrices is key to multi-dimensional time series anomaly detection, because it better preserves the temporal correlations among the different dimensions. After feature extraction of the multi-dimensional time data is completed, $T-w+1$ correlation matrices are obtained. In time series research, RNNs are commonly used to encode time series data because they fully account for the influence of recent states on the current state. However, RNNs cannot handle the vanishing gradients caused by recursion and therefore cannot exploit information from long time series. The LSTM, a variant of the RNN, was proposed to solve these problems. On top of the RNN, the LSTM adds filtering of past states, so it can choose which states have more influence at the current time instead of simply taking the most recent state as an ordinary RNN does. The LSTM is therefore used here to model the temporal information in the feature matrices and output the latent features of the data.
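The "filtering of past states" is done by the LSTM's forget gate. A single step of a standard LSTM cell can be sketched in plain Python to show where that filtering happens. This is the textbook cell, not the attention-based convolutional LSTM named in step 2. Feeding each n × n correlation matrix as a flattened vector $x_t$ is an assumption. The parameter names (`Wf`, `bf`, and so on) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W: Sequence[Sequence[float]], x: Sequence[float], b: Sequence[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def lstm_step(x: Sequence[float], h_prev: Sequence[float], c_prev: Sequence[float],
              p: Dict[str, list]) -> Tuple[List[float], List[float]]:
    """One standard LSTM step on the concatenated input [x_t; h_{t-1}]."""
    z = list(x) + list(h_prev)
    f = [sigmoid(v) for v in affine(p["Wf"], z, p["bf"])]    # forget gate: how much of c_{t-1} to keep
    i = [sigmoid(v) for v in affine(p["Wi"], z, p["bi"])]    # input gate: how much new content to write
    g = [math.tanh(v) for v in affine(p["Wg"], z, p["bg"])]  # candidate cell content
    o = [sigmoid(v) for v in affine(p["Wo"], z, p["bo"])]    # output gate
    c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


def run_lstm(inputs: Sequence[Sequence[float]], hidden_size: int, p: Dict[str, list]) -> List[List[float]]:
    """Run the cell over x_1..x_T from zero states and return h_1..h_T."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    states = []
    for x in inputs:
        h, c = lstm_step(x, h, c, p)
        states.append(h)
    return states


# Hypothetical parameters: input size 2, hidden size 1, all weights zero.
# A large forget bias keeps the old cell state; a negative input bias blocks new content.
zero = [[0.0, 0.0, 0.0]]
params = {"Wf": zero, "bf": [50.0], "Wi": zero, "bi": [-50.0],
          "Wg": zero, "bg": [0.0], "Wo": zero, "bo": [0.0]}
h, c = lstm_step([1.0, 2.0], [0.0], [0.7], params)  # c stays at about 0.7
```

In the usage line the forget gate is nearly 1 and the input gate nearly 0, so the cell carries its past state forward almost unchanged. An ordinary RNN has no gate of this kind.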

Based on the correlation feature matrix, the time series correlation is encoded with an encoder and the temporal pattern is captured with an attention-based convolutional long short-term memory network. Step 2 is specifically divided into the following steps:

Step 2.1: after feature extraction of the multi-dimensional time data is completed, $T-w+1$ correlation feature matrices are obtained;

Step 2.2: the temporal information in the correlation feature matrices is modeled with an LSTM, which outputs the latent features of the data, i.e., the time series correlation code.

The LSTM encoding is expressed as:

$$h_t = \mathrm{LSTM}\left([x_t; h_{t-1}]\right)$$

$$h = F_{MLP}\left(\mathrm{concat}[h_1; h_2; \ldots; h_T]\right)$$

where LSTM(·) denotes an LSTM unit and $h_t$ is the hidden state at time t, jointly determined by the hidden state $h_{t-1}$ at time $t-1$ and the input $x_t$ at time t. When the input $x_t$ is a correlation feature matrix, $h_t$ is the latent feature of the data. h denotes the hidden state of the whole input, obtained by splicing the hidden states $h_1; h_2; \ldots; h_T$ of all time steps along the time dimension. concat denotes the splicing function, and $F_{MLP}$ is a fully connected layer.

Step 3: decode with decoders of different scales, constrain the outputs of the different decoders by the similarity of tensor kernels, reconstruct the feature matrix by fusing all decoding results, and calculate the reconstruction error to detect anomalies.

To capture the temporal behavior of the time series at different scales, decoders with different output lengths decode the latent features of the data and yield several reconstructed feature matrices. Decoders with short output lengths focus on macroscopic temporal characteristics, while decoders with longer output lengths capture more detailed local temporal patterns. At the same time, the outputs of the decoders must keep similar temporal patterns, so tensor similarity is used to constrain the temporal pattern of the outputs. Finally, an appropriate multi-scale fusion strategy fuses the decoder outputs from coarse to fine, and the error between the fused output of the highest-scale decoder and the original input is used to measure anomalies.

Step 3 can be carried out with the following specific steps:

Step 3.1: decoders with different output lengths decode the latent features of the data and output several reconstructed feature matrices. To keep the temporal patterns of the decoder outputs similar, tensor similarity is used to constrain the temporal pattern of the outputs;

Step 3.2: a multi-scale fusion strategy fuses the decoder outputs from coarse to fine, and the error between the fused output of the highest-scale decoder and the original input is used to measure anomalies and obtain the anomaly detection result.

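The invention also provides the following embodiment, in which step 3 is specifically divided into the following steps:

Step 3.1: assume a decoder set D, in which the output length of the k-th decoder $D^{(k)}$ is $T^{(k)}$ and the original sequence length is T. $T^{(k)}$ is defined as follows:

$$T^{(k)} = \alpha_k T$$

$$\alpha_k = \frac{1}{\tau^{k-1}} \in (0, 1]$$

where $\alpha_k$ is the coefficient of the k-th decoder, $\alpha_1 = 1$ and τ > 1, which ensures that the output length of the highest-scale decoder is T;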

Step 3.2: the similarity of tensors is used to constrain the temporal patterns output by the different decoders.

The output of each decoder is a sequence of feature matrices, and the outputs of different decoders differ in sequence length. The similarity of tensors is therefore used to constrain the temporal patterns of the different decoder outputs. The input matrix sequence M has dimension n × n × T, and the feature matrix sequence $Y^{(k)}$ output by the k-th decoder has dimension $n \times n \times T^{(k)}$. Both are regarded as third-order tensors, each is decomposed to obtain tensor kernels of the same size, and the similarity of the tensor kernels approximately stands in for the similarity of the tensors.

Because the time series length T is generally very large, the whole tensor cannot be decomposed directly. Following the idea of a time window, the data are divided into several segments and Tucker decomposition is performed on each segment separately, as shown in Fig. 2.
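The segment-wise decomposition starts by cutting the time axis of the n × n × T tensor into consecutive windows. A minimal sketch follows, assuming the tensor is held as a list of T matrices. The text does not say whether a shorter final window is kept, dropped, or padded; this sketch keeps it. Each chunk would then go to a Tucker routine, for example the `tucker` function in TensorLy's `tensorly.decomposition` module, which is not reproduced here. `split_time_windows` is a hypothetical name.

```python
from typing import List, Sequence, TypeVar

Item = TypeVar("Item")  # one n x n feature matrix (any object works for the split)


def split_time_windows(matrices: Sequence[Item], window: int) -> List[List[Item]]:
    """Split a sequence of T matrices (an n x n x T tensor) into consecutive
    chunks of `window` matrices along the time axis; the last chunk may be shorter."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [list(matrices[s:s + window]) for s in range(0, len(matrices), window)]


# Integers stand in for matrices: 7 time steps split into windows of 3.
chunks = split_time_windows(list(range(7)), 3)
# → [[0, 1, 2], [3, 4, 5], [6]]
```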

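Denote the tensor kernel of the input matrix sequence M by $C^{(M)}$ and that of the feature matrix sequence $Y^{(k)}$ output by the k-th decoder by $C^{(k)}$. The similarity constraint between the two tensors, $\mathrm{Cos}(M, Y^{(k)})$, is given by:

$$\mathrm{Cos}(M, Y^{(k)}) = \mathrm{Cos}\left(C^{(M)}, C^{(k)}\right)$$

$$\mathcal{L}_{sim} = \frac{1}{L^{(D)}} \sum_{k=1}^{L^{(D)}} \mathrm{Cos}(M, Y^{(k)})$$

where $\mathcal{L}_{sim}$, the similarity constraint on the input and output temporal patterns, is the average tensor similarity between the outputs of the decoders and the original input sequence, and $L^{(D)}$ is the number of decoders;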

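Step 3.3: the multi-scale fusion strategy is set as follows:

[Fusion formula not recoverable from this text.]

In the fusion formula, each fused term is formed by fusing two quantities, and the latent feature at time t is jointly determined by two terms.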

The decoding process of the k-th decoder is:

[Decoding formula not recoverable from this text.]

In the decoding formula, one term represents the latent features of the data at time t, a further state term is initialized to zero, another term is jointly determined by two quantities, and δ is artificially introduced noise following a normal distribution;
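The recoverable pieces of the decoding step can be assembled into a hedged sketch: a state that starts at zero, an output read out with learnable $W^{(k)}$ and $b^{(k)}$, and Gaussian noise δ. Because the recurrence itself is not recoverable, the caller supplies it as `step_fn`. Three further choices are assumptions rather than the patent's method: the linear read-out $y_t = W h_t + b$, adding δ to the state, and the name `run_decoder`.

```python
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def run_decoder(
    latent: Sequence[float],
    length: int,
    hidden_size: int,
    step_fn: Callable[[Vector, Sequence[float]], Vector],
    W: Sequence[Sequence[float]],
    b: Sequence[float],
    noise_std: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[Vector]:
    """Unroll a hypothetical k-th decoder for `length` = T^(k) steps.

    step_fn stands in for the recurrent update, which this sketch does not specify.
    """
    rng = rng or random.Random(0)
    h: Vector = [0.0] * hidden_size  # decoder state initialized to zero
    outputs: List[Vector] = []
    for _ in range(length):
        h = step_fn(h, latent)
        # delta ~ N(0, noise_std^2); adding it to the state is an assumption
        h = [v + rng.gauss(0.0, noise_std) for v in h]
        # assumed linear read-out y_t = W h_t + b with learnable W, b
        outputs.append([sum(w * v for w, v in zip(row, h)) + bias for row, bias in zip(W, b)])
    return outputs


# Hypothetical usage: an accumulating state, no noise, three steps.
ys = run_decoder([1.0], length=3, hidden_size=1,
                 step_fn=lambda h, z: [hv + zv for hv, zv in zip(h, z)],
                 W=[[2.0]], b=[0.5])
# → [[2.5], [4.5], [6.5]]
```

Running the decoders of one set with lengths $T^{(1)} > T^{(2)} > \ldots$ would give the multi-scale outputs that step 3.3 fuses.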

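The original feature matrix sequence is written as $M = (M_1, M_2, \ldots, M_T)$, and the reconstruction error is:

$$\mathcal{L}_{rec} = \sum_{t=1}^{T} \left\lVert M_t - Y^{(1)}_t \right\rVert_2$$

where $\lVert\cdot\rVert_2$ denotes the two-norm of a matrix and $Y^{(1)}_t$ is the t-th matrix reconstructed by the highest-scale decoder $D^{(1)}$;

The total loss function is:

$$\mathcal{L} = \mathcal{L}_{rec} + \lambda\,\mathcal{L}_{sim}$$

where $\mathcal{L}_{sim}$ is the temporal-pattern similarity constraint from step 3.2 and λ is a hyperparameter that weights it.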

Step 3.4: anomalies are detected with the reconstruction error, specifically:

when the loss function reaches a convergence value after several training rounds, an offline anomaly detection model is obtained, and the anomaly score recorded during training is used as the anomaly score threshold;

In general, the definition of the anomaly score is closely related to the loss function used in training. The model can detect anomalies because training teaches it the characteristics and distribution of normal data, which lowers the loss on normal data so that normal and abnormal data can be distinguished. In this respect the anomaly score shares the rationale of the loss function; the difference is that the anomaly score is defined in the detection phase while the loss function is defined in the training phase. After several training rounds the loss function converges, but the resulting offline anomaly detection model cannot be used directly to judge anomalies. Instead, the error between the reconstructed and original feature matrices, which is available during training or prediction, is used as the anomaly score for the subsequent anomaly judgment.
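The per-time-step anomaly score described here, the error between each original matrix $M_t$ and its reconstruction, can be sketched as follows. The choice of the element-wise (Frobenius) norm is an assumption. `anomaly_scores` is a hypothetical name.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def anomaly_scores(original: Sequence[Matrix], reconstructed: Sequence[Matrix]) -> List[float]:
    """score(t) = element-wise L2 norm of M_t - Y_t for each time step t."""
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have the same length")
    return [
        math.sqrt(sum((a - b) ** 2 for ra, rb in zip(m, y) for a, b in zip(ra, rb)))
        for m, y in zip(original, reconstructed)
    ]


# Hypothetical 2 x 2 matrices: the first step is reconstructed perfectly, the second is not.
M = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
Y = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
scores = anomaly_scores(M, Y)
# → [0.0, 4.0]
```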

The definition of the threshold strongly affects the detection performance of the model. The method defines the threshold from the maximum anomaly score on the validation set, and any score that exceeds the threshold during detection is judged abnormal. The threshold is defined as follows:

$$th = \gamma \cdot \max\{score(t)_{valid}\}$$

where th denotes the threshold, $\max\{score(t)_{valid}\}$ denotes the maximum anomaly score on the validation set, and γ ∈ [1, 2] is a threshold scaling parameter.
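The threshold rule $th = \gamma \cdot \max\{score(t)_{valid}\}$ and the "exceeds the threshold" test can be sketched directly. Enforcing γ ∈ [1, 2] with an exception is a design choice of this sketch, and the function names are hypothetical.

```python
from typing import List, Sequence


def threshold(valid_scores: Sequence[float], gamma: float = 1.0) -> float:
    """th = gamma * max{score(t)_valid}, with gamma in [1, 2]."""
    if not 1.0 <= gamma <= 2.0:
        raise ValueError("gamma must lie in [1, 2]")
    return gamma * max(valid_scores)


def detect(scores: Sequence[float], th: float) -> List[bool]:
    """Flag time steps whose anomaly score exceeds the threshold."""
    return [s > th for s in scores]


th = threshold([0.2, 0.5, 0.4], gamma=1.2)   # about 0.6
flags = detect([0.1, 0.7, 0.55], th)         # [False, True, False]
```

A γ above 1 leaves headroom over the largest validation score, which trades some sensitivity for fewer false alarms.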

In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

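1. A multi-scale-decoding-based anomaly detection method for the multi-dimensional time series of cloud network end resources, characterized by comprising the following steps:

step 1, calculating the correlation among the time series and constructing the correlation feature matrix of the sequences;

step 2, based on the correlation feature matrix, encoding the time series correlation with an encoder and capturing the temporal pattern with an attention-based convolutional long short-term memory network;

step 3, decoding with decoders of different scales, constraining the outputs of the different decoders by the similarity of tensor kernels, reconstructing the feature matrix by fusing all decoding results, and calculating the reconstruction error to detect anomalies.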

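2. The multi-scale-decoding-based multi-dimensional time series anomaly detection method for cloud network end resources according to claim 1, wherein step 1 is preceded by the following step:

after the multi-dimensional time series data are standardized, segmenting them with a sliding window algorithm to obtain a number of different time series segments;

wherein, with the time series having dimension n and length T and the time window having size w, the time series segment obtained by sliding-window segmentation of the i-th dimension is $x_i^t = (x_i^{t-w}, x_i^{t-w+1}, \ldots, x_i^{t})$, and $x_i^{t-w}, \ldots, x_i^{t}$ are the data of the i-th dimension at times $t-w$ through $t$.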

3. The multi-scale-decoding-based multi-dimensional time series anomaly detection method for cloud network end resources according to claim 2, wherein step 1 specifically comprises:

calculating, through inner products, the correlation between each dimension of the data in the current time series segment and the other dimensions in the historical time series segment, and constructing an n × n correlation feature matrix $M^t$, whose element $m_{ij}^t$ in the i-th row and j-th column represents the correlation between the j-th dimension in the current time series segment and the i-th dimension in the historical time series segment and is calculated by:

$$m_{ij}^t = \frac{\sum_{\delta=0}^{w} x_i^{t-\delta}\, x_j^{t-\delta}}{\kappa}$$

wherein $x_i^{t-\delta}$ and $x_j^{t-\delta}$ are the values of the i-th and j-th dimensions in the corresponding time series segment, t denotes the current time, and κ = w is a scaling factor.

4. The multi-scale-decoding-based multi-dimensional time series anomaly detection method for cloud network end resources according to claim 3, wherein step 2 is specifically divided into the following steps:

step 2.1, obtaining $T-w+1$ correlation feature matrices after feature extraction of the multi-dimensional time data is completed;

step 2.2, modeling the temporal information in the correlation feature matrices with an LSTM and outputting the latent features of the data, i.e., the time series correlation code, wherein the LSTM encoding process is expressed as:

$$h_t = \mathrm{LSTM}\left([x_t; h_{t-1}]\right)$$

$$h = F_{MLP}\left(\mathrm{concat}[h_1; h_2; \ldots; h_T]\right)$$

wherein LSTM(·) denotes an LSTM unit; $h_t$ is the hidden state at time t, jointly determined by the hidden state $h_{t-1}$ at time $t-1$ and the input $x_t$ at time t, and when the input $x_t$ is a correlation feature matrix, $h_t$ is the latent feature of the data; h denotes the hidden state of the whole input, obtained by splicing the hidden states $h_1; h_2; \ldots; h_T$ of all time steps along the time dimension; concat denotes the splicing function; and $F_{MLP}$ is a fully connected layer.

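5. The multi-scale-decoding-based multi-dimensional time series anomaly detection method for cloud network end resources according to any one of claims 1 to 4, wherein step 3 specifically comprises:

step 3.1, decoding the latent features of the data with decoders of different output lengths, the decoders outputting several reconstructed feature matrices, and constraining the temporal pattern of the outputs with tensor similarity to keep the decoder output sequences similar;

step 3.2, fusing the decoder outputs from coarse to fine with a multi-scale fusion strategy, and measuring anomalies by the error between the fused output of the highest-scale decoder and the original input to obtain the anomaly detection result.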

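6. The multi-scale-decoding-based multi-dimensional time series anomaly detection method for cloud network end resources according to claim 5, wherein step 3 comprises the following steps:

step 3.1, assuming a decoder set D in which the output length of the k-th decoder $D^{(k)}$ is $T^{(k)}$ and the original sequence length is T, defining $T^{(k)}$ as:

$$T^{(k)} = \alpha_k T$$

$$\alpha_k = \frac{1}{\tau^{k-1}} \in (0, 1]$$

wherein $\alpha_k$ is the coefficient of the k-th decoder, $\alpha_1 = 1$ and τ > 1;

step 3.2, constraining the temporal patterns output by the different decoders with the similarity of tensors, wherein the input matrix sequence M has dimension n × n × T and the feature matrix sequence $Y^{(k)}$ output by the k-th decoder has dimension $n \times n \times T^{(k)}$; both are regarded as third-order tensors, each is decomposed to obtain tensor kernels of the same size, and the similarity of the tensor kernels approximately stands in for the similarity of the tensors; with $C^{(M)}$ and $C^{(k)}$ denoting the tensor kernels of M and $Y^{(k)}$:

$$\mathrm{Cos}(M, Y^{(k)}) = \mathrm{Cos}\left(C^{(M)}, C^{(k)}\right)$$

$$\mathcal{L}_{sim} = \frac{1}{L^{(D)}} \sum_{k=1}^{L^{(D)}} \mathrm{Cos}(M, Y^{(k)})$$

wherein the similarity constraint $\mathcal{L}_{sim}$ is the average tensor similarity between the outputs of the $L^{(D)}$ decoders and the original input sequence;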

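step 3.3, setting the multi-scale fusion strategy as follows:

[Fusion formula not recoverable from this text.]

wherein each fused term is formed by fusing two quantities, and the latent feature at time t is jointly determined by two terms;

the decoding process of the k-th decoder being:

[Decoding formula not recoverable from this text.]

wherein one term represents the latent features of the data at time t, a further state term is initialized to zero, and another term is jointly determined by two quantities;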

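the reconstruction error between the original feature matrix sequence $M = (M_1, \ldots, M_T)$ and the sequence $Y^{(1)}$ reconstructed by the highest-scale decoder being $\mathcal{L}_{rec} = \sum_{t=1}^{T} \lVert M_t - Y^{(1)}_t \rVert_2$, wherein $\lVert\cdot\rVert_2$ denotes the two-norm of a matrix;

the total loss function $\mathcal{L} = \mathcal{L}_{rec} + \lambda\,\mathcal{L}_{sim}$ being given by two aspects, the error between the reconstructed and original sequences and the temporal-pattern similarity constraint $\mathcal{L}_{sim}$, wherein λ is a hyperparameter weighting the temporal-pattern similarity error;

step 3.4, when the loss function reaches a convergence value after several training rounds, obtaining an offline anomaly detection model and taking the anomaly score recorded during training as the anomaly score threshold;

and applying the trained offline anomaly detection model to the set of samples to be detected, a score that exceeds the threshold during detection being judged abnormal.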