CN116796272A - Transformer-based multivariate time series anomaly detection method - Google Patents

Transformer-based multivariate time series anomaly detection method

Info

Publication number
CN116796272A
CN116796272A (application CN202310747003.0A)
Authority
CN
China
Prior art keywords
time sequence
time
data
reconstruction
encoder
Prior art date
Legal status
Pending
Application number
CN202310747003.0A
Other languages
Chinese (zh)
Inventor
赵进
谢梦玮
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
Filing date
Publication date
Application filed by Fudan University
Priority to CN202310747003.0A
Publication of CN116796272A
Legal status: Pending

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/094 Adversarial learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2123/00 Data types
    • G06F 2123/02 Data types in the time domain, e.g. time-series data

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Optimization (AREA)
  • General Health & Medical Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a Transformer-based method for detecting anomalies in multivariate time series, comprising the following steps: for a multivariate time series data set, normalize the time series data along the feature dimension; apply sliding windows to the normalized data, dividing the original series into a sequence of windows to obtain multiple sliding-window time series; encode the sliding-window time series with time-stamp encoding and data encoding; obtain the reconstruction values of the time series in all sliding windows using a Transformer-model-based anomaly detection scheme; determine the dynamic anomaly threshold of each time series from the reconstruction errors and historical error factors; and compute the anomaly score of each time series against its dynamic anomaly threshold to obtain the anomaly detection result. Compared with the prior art, the method detects anomalies in multivariate time series accurately and robustly.

Description

Transformer-based multivariate time series anomaly detection method
Technical Field
The invention relates to the technical field of time series anomaly detection, in particular to a Transformer-based multivariate time series anomaly detection method.
Background
As industrialization advances, large volumes of data are generated and stored, and among the many data types, time series data is one of the most important. A time series is formed by sampling certain indicators at regular intervals and arranging the collected values in order. It describes how each dimension of a system changes over time, which is essential for capturing correlations between successive observations and for analyzing how the system evolves.
Time series anomaly detection has therefore long been an important problem in industry: continuous time series data is analyzed to determine which instances differ from the others, i.e., to discover the anomalies present in the data. Time series data arises in many practical domains, such as financial markets, biological data, user behavior, and industrial equipment; analyzing its anomalies is crucial for keeping systems running normally and for making timely predictions that avoid economic losses.
Traditional anomaly detection was done by data-mining professionals who reported errors by manually inspecting data that deviated from normal trends. However, in systems that monitor industrial equipment, the number of deployed sensors keeps growing, data patterns become more complex, and manual fault identification becomes increasingly difficult. With the development of artificial intelligence, big-data analysis and deep learning can effectively assist experts with this problem. Current time series anomaly detection techniques mainly fall into clustering- or classification-based approaches and deep-learning-based approaches, but each still has shortcomings:
First, clustering-based anomaly detection methods find it difficult to determine the number of clusters K. A classical representative is the K-Means clustering algorithm, also called subsequence time series clustering. A given time series is converted into a set of subsequences according to a defined sliding-window length. Once the number of clusters K is fixed, K-Means operates on the subsequences until they converge to K categories. To detect anomalies, the distance from each subsequence to its nearest cluster is computed, usually as the Euclidean distance; if the distance exceeds a threshold, the corresponding series is anomalous. Assuming the value of K in advance, however, is difficult for a real operating system.
Second, classification-based anomaly detection treats anomaly detection as a classification problem: a model is optimized on a training set, then applied to the test set. Depending on how anomalies are labeled in the training set, these methods divide into multi-class and one-class anomaly detection. The original support vector machine (SVM) is a linear supervised method; the kernel trick extends it to nonlinear classification. A later variant, the One-Class Support Vector Machine (OC-SVM), is a semi-supervised approach whose training set contains only one class: normal data. After fitting the model to the training set, test data is classified by its similarity to normal data, so anomalies can be detected. In reality, however, labels are usually imbalanced, classification methods perform poorly on label-imbalanced data sets, and the training sets of most real data sets usually contain anomalous data.
Third, anomaly detection based on recurrent neural networks relies on a deep recurrent model that takes an input sequence as training data, predicts the value at the next time stamp for each input time stamp, and uses the prediction error as the basis for anomaly detection. For example, LSTM (Long Short-Term Memory) networks are autoregressive models that learn the sequential dependencies in the data, where the prediction at each time stamp uses feedback from the outputs of previous time stamps. Because such models are recurrent, training is slow on long input sequences, even more so when the input data is noisy. Moreover, these systems do not consider correlations between different time series, so they perform poorly on data sets with inter-series correlation.
Fourth, anomaly detection based on graph neural networks has recently attracted much attention, but it relies on constructing a graph structure. Graph neural networks focus on data in graph form, mining the relations between nodes and along edges. Such methods cannot be applied directly to time series data; an explicit graph structure must be built in advance. Graph-based systems exploit the correlation information between time series effectively and improve detection accuracy, but because they depend on an explicit graph, the feature-relation graph becomes too small or too sparse when the multivariate series has few dimensions or the series are only weakly related. The information the model can then extract from the data is limited, creating a performance bottleneck.
Disclosure of Invention
The present invention aims to overcome the above drawbacks of the prior art by providing a Transformer-based method that detects anomalies in multivariate time series accurately and robustly.
This aim is achieved by the following technical scheme: a Transformer-based multivariate time series anomaly detection method, comprising the following steps:
s1, normalizing time sequence data in characteristic dimension aiming at a multivariable time sequence data set;
s2, applying sliding windows to the normalized time sequence data, and dividing the original time sequence data into a series of sliding windows to obtain a plurality of sliding window time sequence data;
s3, encoding the sliding window time sequence data through time stamp encoding and data encoding;
s4, solving the reconstruction values of the time sequences corresponding to all sliding windows by adopting an anomaly detection mode based on a transducer model;
s5, determining dynamic abnormal thresholds corresponding to the time sequences according to the reconstruction errors and the historical error factors;
s6, solving the anomaly score of each time sequence according to the corresponding dynamic anomaly threshold value to obtain the anomaly detection result of the time sequence data.
Further, step S3 specifically comprises the following steps:
S31, decompose the time information contained in the time stamp into its components and perform time-stamp encoding to obtain a time-stamp encoding vector;
S32, analyze the periodicity of the time series data with a Fourier transform and periodically encode the time-stamp data to obtain a periodic encoding vector;
S33, embed and project the periodic encoding vector and the time-stamp encoding vector to obtain a global temporal encoding;
S34, apply local temporal encoding to the time series data inside the sliding window;
S35, add the global and local temporal encodings to the input time series data to obtain the input of the Transformer model.
Further, step S33 specifically performs the embedding and projection of the periodic encoding vector and the time-stamp encoding vector through a learnable embedding layer.
Further, the Transformer model in step S4 comprises an encoder, a feature fusion module, a first discriminator, a decoder and a second discriminator, where the encoder contains a feature attention module and a temporal attention module and converts a subsequence of the input data into the corresponding hidden variable;
the feature fusion module combines the multiple hidden variables output by the encoder into one vector representation;
the first discriminator and the encoder form a first adversarial network: a prior distribution is imposed on the hidden variables, and an adversarial training strategy guides the prior and posterior distributions of the hidden variables toward each other;
the decoder reconstructs the original input to obtain the reconstruction result;
the second discriminator and the decoder form a second adversarial network, which applies adversarial training between the reconstruction result and the original input.
Further, the specific process of step S4 is as follows:
S41, pass the original input data to the encoder and, applying a multidimensional attention mechanism, run the feature attention module and the temporal attention module to convert the original time series input X into the hidden variable Y;
S42, repeat step S41 on T non-overlapping subsequences to obtain the corresponding T hidden variables Y;
S43, fuse the T instances, i.e. the T hidden variables Y, by linear interpolation to form a new hidden-variable encoding Z;
S44, impose a prior distribution on the hidden variables and use an adversarial training strategy to guide the prior and posterior distributions of the hidden variables toward each other;
S45, feed the hidden variables into the decoder and reconstruct the model's original input to obtain the reconstruction X′;
S46, apply adversarial training between the reconstruction X′ and the original input X to obtain the trained anomaly detection model, pass the current input data to the model, and output the reconstruction values of the time series in all sliding windows.
Further, in step S41 the feature attention module learns the correlations between different features of the time series, and the temporal attention module learns the long- and short-term dependencies within the time series;
the specific process of step S41 is as follows:
first, the input data is passed to the feature attention module, which computes a feature encoding from the time series input; the feature encoding is combined with the original time series input, converting the original input X into the hidden variable Y;
then the hidden variable Y is fed into the temporal attention module to obtain the temporal features; these are combined with the module's input Y, converting the series into the hidden variable Z:

Z = k1 · O_VA + k2 · O_TA + X + time_enc(X)

where O_VA is the feature encoding, O_TA the temporal features, k1 the weighting coefficient of the feature attention module, k2 the weighting coefficient of the temporal attention module, and time_enc(X) the temporal encoding of the input X.
Further, in step S43 the T instances are merged into one vector by linear interpolation over the feature vectors at each time step t.
Further, the specific process of step S44 is as follows:
assume Z follows the prior distribution p(Z) and the original data X follows p(X); q(Z|X) is the encoding distribution, and the encoder produces a posterior distribution over the hidden variable Z, where X_t denotes the original data at time step t.
Assuming q(Z|X_t) is Gaussian, gradients are back-propagated through the encoder network using the re-parameterization trick. The first discriminator D1 guides the posterior distribution q(Z) of the hidden variable Z toward its prior p(Z): the discriminator's objective is to widen the gap between the two distributions, while the generator G1 formed by the encoder tries to narrow it in order to confuse D1. The process is optimized with a min-max strategy corresponding to the objective

min_{G1} max_{D1} V(D1, G1) = E_{z∼p(z)}[log D1(z)] + E_{x∼p(x)}[log(1 − D1(G1(x)))]

where V(D1, G1) is the cross-entropy loss between the encoder and the first discriminator.
Further, the adversarial training between the reconstruction X′ and the original input X in step S46 proceeds as follows:
the decoder G2 produces a reconstruction from the output of the encoder G1:

X′_t = G2(G1(X_t))

The decoder G2 aims to confuse the second discriminator D2; the corresponding optimization objective is

min_{G2} max_{D2} V(D2, G2) = E_{x∼p(x)}[log D2(x)] + E_{x∼p(x)}[log(1 − D2(G2(G1(x))))]

where V(D2, G2) is the cross-entropy loss between the decoder and the second discriminator.
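Both adversarial objectives are standard min-max cross-entropy games between a generator and a discriminator. The sketch below is illustrative only (the patent does not describe the network internals): it shows how the discriminator and generator losses behind V(D1, G1) and V(D2, G2) would be computed from discriminator scores.

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy, the loss behind both V(D1,G1) and V(D2,G2)."""
    pred = np.clip(pred, 1e-7, 1 - 1e-7)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def discriminator_loss(d_real, d_fake):
    """D tries to score real samples (prior z / original X) as 1
    and generated samples (posterior z / reconstruction X') as 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    """G tries to fool D into scoring its outputs as 1."""
    return bce(d_fake, np.ones_like(d_fake))

d_real = np.array([0.9, 0.8])    # D is confident on real inputs
d_fake = np.array([0.2, 0.1])    # D is confident on generated inputs
print(discriminator_loss(d_real, d_fake) < discriminator_loss(d_fake, d_real))  # True
```

A well-trained discriminator (high scores on real, low on fake) incurs a lower loss than one whose scores are swapped, which is exactly the pressure the min-max objectives apply to D1/D2 and G1/G2.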
Further, the specific process of step S5 is as follows:
S51, for all time series, compute the absolute difference between the real data and the reconstructed data;
S52, cache the historical reconstruction errors: after the reconstruction error at time t is computed, append it to the historical error vector;
S53, smooth the reconstruction errors with a moving-average model, substituting the errors several times, and denote the result the moving-average-processed error sequence;
S54, determine the error threshold ε from the smoothed error sequence using Δμ, the decrease of the error mean; μ, the error mean; Δσ, the decrease of the error variance; and σ, the error variance. If the reconstruction error at a time point exceeds ε, that time point is abnormal; otherwise it is normal.
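Steps S53 and S54 can be sketched as follows. The exact threshold formula is not given here beyond the quantities μ, Δμ, σ, Δσ, so the sketch assumes a Hundman-style choice ε = μ + z·σ with z selected to maximize Δμ/μ + Δσ/σ; the smoothing window and the candidate range for z are arbitrary stand-ins.

```python
import numpy as np

def smooth(errors, w=5):
    """Moving-average smoothing of the reconstruction-error sequence (step S53)."""
    kernel = np.ones(w) / w
    return np.convolve(errors, kernel, mode='valid')

def dynamic_threshold(e_s, z_range=np.arange(2.0, 6.0, 0.5)):
    """Assumed threshold rule: epsilon = mu + z*sigma, choosing the z that
    maximizes (delta_mu / mu + delta_sigma / sigma), i.e. the largest relative
    drop in mean and deviation once points above the threshold are removed."""
    mu, sigma = e_s.mean(), e_s.std()
    best_eps, best_score = mu + z_range[0] * sigma, -np.inf
    for z in z_range:
        eps = mu + z * sigma
        kept = e_s[e_s < eps]
        if len(kept) == 0 or len(kept) == len(e_s):
            continue
        score = (mu - kept.mean()) / mu + (sigma - kept.std()) / sigma
        if score > best_score:
            best_score, best_eps = score, eps
    return best_eps

rng = np.random.default_rng(0)
errors = np.abs(rng.normal(0.1, 0.02, 500))
errors[100] = 2.0                       # one injected anomalous error
e_s = smooth(errors, w=5)
eps = dynamic_threshold(e_s)
print((e_s > eps).sum() > 0)            # True: the anomalous region crosses epsilon
```

On the toy error sequence, the injected spike survives smoothing and exceeds the selected threshold, while the baseline noise stays below it.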
Further, in step S6 the anomaly score s^(i) of the i-th subsequence is computed from its error e^(i) and the dynamic anomaly threshold.
Compared with the prior art, the invention has the following advantages:
1. The time series data is first normalized along the feature dimension and divided into a sequence of sliding windows; the windowed data is then encoded by time-stamp encoding and data encoding; the reconstruction values of the time series in all sliding windows are obtained with a Transformer-model-based anomaly detection method; the dynamic anomaly threshold of each series is determined from the reconstruction errors and historical error factors; and the anomaly score of each series is then computed, measuring the degree of abnormality of the data. The method can therefore detect anomalies in multivariate time series accurately while keeping the detection process robust.
2. The invention designs a Transformer model comprising an encoder, a feature fusion module, a first discriminator, a decoder and a second discriminator: the encoder converts a subsequence of the input data into the corresponding hidden variable; the feature fusion module combines the hidden variables output by the encoder into one vector representation; the first discriminator and the encoder form a first adversarial network that imposes a prior distribution on the hidden variables and uses adversarial training to pull the prior and posterior distributions together; the decoder reconstructs the original input; and the second discriminator and the decoder form a second adversarial network that applies adversarial training between the reconstruction and the original input. The Transformer model is thus applied effectively to time series anomaly detection, and the reconstruction values of the time series in all sliding windows can be obtained accurately and reliably.
3. The encoder of the Transformer model contains a feature attention module and a temporal attention module: the former learns the correlations between different features of the time series, the latter the long- and short-term dependencies inside the series. Both the inter-feature correlations and the internal long/short-term dependencies are therefore captured fully and effectively, ensuring the accuracy of subsequent anomaly detection.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of time series encoding in the present invention;
FIG. 3 is a schematic diagram of the Transformer model architecture constructed in the embodiment;
FIG. 4 is a schematic diagram of the encoder architecture in the Transformer model in the embodiment.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples.
Examples
As shown in FIG. 1, a Transformer-based multivariate time series anomaly detection method comprises the following steps:
s1, normalizing time sequence data in characteristic dimension aiming at a multivariable time sequence data set;
s2, applying sliding windows to the normalized time sequence data, and dividing the original time sequence data into a series of sliding windows to obtain a plurality of sliding window time sequence data;
s3, encoding the sliding window time sequence data through time stamp encoding and data encoding;
s4, solving the reconstruction values of the time sequences corresponding to all sliding windows by adopting an anomaly detection mode based on a transducer model;
s5, determining dynamic abnormal thresholds corresponding to the time sequences according to the reconstruction errors and the historical error factors;
s6, solving the anomaly score of each time sequence according to the corresponding dynamic anomaly threshold value to obtain the anomaly detection result of the time sequence data.
Applying this technical scheme, the embodiment mainly comprises the following steps:
1. for a multivariate time series data set consisting of many time points, normalize the time series data along the feature dimension;
2. apply sliding windows to the normalized data and divide the original series into a sequence of windows;
3. encode the sliding-window time series with time-stamp encoding and data encoding;
4. obtain the reconstruction values of the time series in all sliding windows using a Transformer-model-based anomaly detection method;
5. determine the dynamic anomaly threshold of each time series from the reconstruction errors and historical error factors;
6. compute the anomaly score of each time series against its dynamic anomaly threshold, measuring the degree of abnormality of the data.
In step 3, the time stamp corresponding to the normalized, windowed time series data is determined first; the time stamp is then decomposed into its components (year, month, day, hour, etc.), time-stamp encoded, and, combined with the periodic encoding, embedded and projected to form the local and global temporal encodings; finally the local and global temporal encodings are added to the input time series data to form the input of the Transformer model.
Specifically, as shown in FIG. 2, the time information contained in the time stamp is decomposed into its components;
the time-stamp data, comprising year, month, day, hour and similar information, is position-encoded;
the periodicity of the time series data is analyzed with a Fourier transform;
the time-stamp data is periodically encoded;
the periodic encoding vector and the time-stamp encoding vector are embedded and projected through a learnable embedding layer to form the global position encoding;
the time series data inside the sliding window is locally position-encoded;
the global and local position encodings are combined and appended to the input data.
In step 4, the constructed Transformer model architecture is shown in FIG. 3, and the specific flow is as follows:
(4.1) pass the input data to the Transformer-based encoder and, applying a multidimensional attention mechanism, run the feature attention module and the temporal attention module to convert the original time series input X into the hidden variable Y; the encoder architecture is shown in FIG. 4;
(4.2) repeat the above operation on T non-overlapping subsequences to form the corresponding T hidden variables Y;
(4.3) fuse the T instances by linear interpolation in the feature fusion module FFM to form a new hidden-variable encoding Z;
(4.4) impose a prior distribution on the hidden variables and use an adversarial training strategy (a generator G1 based on the encoder and a first discriminator D1) to guide the prior and posterior distributions of the hidden variables toward each other;
(4.5) feed the hidden variable into the Transformer-based decoder and reconstruct the model's original input to obtain X′;
(4.6) apply adversarial training between the reconstruction and the original input: the second discriminator D2 guides the reconstruction toward the original input data while trying to amplify the gap between the two, and the generator G2 formed by the decoder aims to confuse D2;
(4.7) train the two generators (the encoder's G1 and the decoder's G2) and the two discriminators (D1 and D2), and apply the trained Transformer model to the data set to obtain the reconstructions corresponding to the time series data set.
The specific content of the multi-dimensional attention mechanism applied to the input data in the step (4.1) is as follows:
in order to obtain the correlation between different features in the time series and the long-short-term dependency in the time series, the multidimensional attention mechanism comprises a feature attention module VA and a time sequence attention module TA. First, a feature attention module is operated, which calculates a feature O using time-series input data X VA . Obtaining characteristic O VA Then, the original time series input X is combined, and the original time series input X is converted into an hidden variable Y. The timing attention module TA then calculates the timing feature O using the hidden variable Y as input TA . Obtaining characteristic O TA Then, the time sequence is converted into hidden variable Z by combining with the input Y of the time sequence attention module, and the hidden variable Z is combined with O VA and OTA Is a multidimensional feature, and the calculation process is as follows:
Z = k1·O_VA + k2·O_TA + X + time_enc(X)
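Read as tensor algebra over a (window length x number of variables) matrix, the combination above is elementwise; a minimal sketch, with arbitrary values for the weighting coefficients k1 and k2 (how they are obtained is not specified in this passage), is:

```python
import numpy as np

def combine(x, o_va, o_ta, t_enc, k1=0.5, k2=0.5):
    """Z = k1*O_VA + k2*O_TA + X + time_enc(X), all shaped (T, d)."""
    return k1 * o_va + k2 * o_ta + x + t_enc

x = np.ones((10, 4))                               # window of length 10, 4 variables
z = combine(x, 2 * x, 4 * x, np.zeros_like(x))     # toy O_VA, O_TA, zero time encoding
```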
In the encoder, the workflow of the feature attention module VA is as follows: for a given time point, the input of the module is the feature sequence X of all variables within a window of length T. X is first projected into three feature spaces, producing Q(X), K(X), and V(X), as shown in the following equation:
Q(X) = X·W_Q, K(X) = X·W_K, V(X) = X·W_V
W_Q, W_K, and W_V are the linear transformation matrices corresponding to Q, K, and V, respectively. The features in the query space and the key space are then used to calculate the attention between the variables, as shown in the following equation:
S = Q(X)·K(X)^T
S is the matrix of inner products of Q and K, and α_(q,k) is the attention coefficient obtained after normalization, where S_(q,k) is the inner product at row q, column k, and S_(q,j) is the inner product at row q, column j:
α_(q,k) = exp(S_(q,k)) / Σ_j exp(S_(q,j))
Finally, the normalized attention matrix is applied to the value space V(X), and a convolution operation is used to calculate the output O_VA, as shown in the following equation:
O_VA = α·V(X)·W_o
α is the matrix of attention coefficients, and W_o is the convolution coefficient.
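Under the three equations above, and approximating the final convolution with W_o by a plain linear map (an assumption; the kernel size is not given here), the feature attention computation can be sketched as:

```python
import numpy as np

def softmax(s, axis=-1):
    """Row-wise normalization turning inner products S into coefficients alpha."""
    e = np.exp(s - s.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def feature_attention(x, w_q, w_k, w_v, w_o):
    q, k, v = x @ w_q, x @ w_k, x @ w_v   # Q(X), K(X), V(X)
    s = q @ k.T                           # S = Q(X)·K(X)^T
    alpha = softmax(s)                    # normalized attention matrix
    return alpha @ v @ w_o                # O_VA = alpha·V(X)·W_o

rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal((6, d))           # 6 variables as rows, d-dim features
ws = [rng.standard_normal((d, d)) for _ in range(4)]
o_va = feature_attention(x, *ws)
```

Rows of x are treated as the variables being attended over, so the normalized matrix alpha holds variable-to-variable attention coefficients.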
The workflow of the temporal attention module TA is as follows:
For a fixed variable, the input of this module consists of the outputs of the position encoding module and the feature attention module VA. As in the feature attention module, an attention matrix S is calculated, where S_(i,j) represents the attention of the history memory at time step j to the features at time step i; S_(i,j) is set to zero when i < j, i.e., the upper-right triangle of the matrix S is set to 0, so that no time step attends to future steps.
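Zeroing the upper-right triangle of S is a causal mask; a minimal sketch of the masking step described above:

```python
import numpy as np

def mask_future(s):
    """Set S[i, j] = 0 for j > i so time step i never attends to future steps."""
    return np.tril(s)

s = np.ones((4, 4))
masked = mask_future(s)
```

In practice such a mask is often applied before normalization (with -inf entries) so that masked positions receive exactly zero weight after the softmax; the text above describes zeroing S directly.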
The feature fusion operation of step (4.3), which merges the T instances through linear interpolation, proceeds as follows:
The feature fusion layer merges the T hidden-variable instances into a single vector through linear interpolation.
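The interpolation weights are not given in this excerpt; assuming the simplest case of uniform weights, the fusion layer reduces the T instance vectors to one:

```python
import numpy as np

def feature_fusion(instances):
    """Merge T instance vectors, shape (T, d), into one (d,) vector.

    Uniform interpolation weights are an assumption; the text only states
    that the merge is done by linear interpolation.
    """
    instances = np.asarray(instances, dtype=float)
    t = len(instances)
    weights = np.full(t, 1.0 / t)
    return weights @ instances

z = feature_fusion([[1.0, 2.0], [3.0, 4.0]])
```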
Step (4.4) adopts an adversarial training strategy to guide the posterior distribution of the hidden variables toward the prior distribution, as follows:
The generated hidden variables are merged into a vector Z through the feature fusion module FFM. Assume that Z satisfies the prior distribution p(Z), the original data X satisfies the distribution p(X), and q(Z|X) is the encoding distribution. The encoder induces the aggregated posterior distribution of the hidden variable Z by:
q(Z) = ∫ q(Z|X)·p(X) dX
The first adversarial network guides the posterior distribution q(Z) to match the prior distribution p(Z). Assuming q(Z|X) satisfies a Gaussian distribution, a re-parameterization technique is employed so that gradients can back-propagate through the encoder network. The first discriminator D1 judges the similarity between the posterior distribution q(Z) of the hidden variable Z and its prior distribution p(Z); its objective is to amplify the distance between the two, while the generator G1 constituted by the encoder tries to narrow that gap in order to confuse D1. The process corresponds to the following objective, optimized with a min-max strategy:
min_{G1} max_{D1} V(D1, G1) = E_{Z~p(Z)}[log D1(Z)] + E_{X~p(X)}[log(1 - D1(q(Z|X)))]
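With a toy linear-sigmoid discriminator standing in for D1 (its real architecture is not specified here), the re-parameterization step and the two sides of the min-max game can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(1)
EPS = 1e-8

def reparameterize(mu, log_var):
    """z = mu + sigma*eps keeps the sampling step differentiable for the encoder."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

def d1_loss(d, z_prior, z_post):
    """D1 maximizes E[log D1(z_prior)] + E[log(1 - D1(z_post))] (shown negated)."""
    return -(np.log(d(z_prior) + EPS).mean() + np.log(1.0 - d(z_post) + EPS).mean())

def g1_loss(d, z_post):
    """The encoder-generator G1 minimizes -E[log D1(z_post)] to fool D1."""
    return -np.log(d(z_post) + EPS).mean()

w = rng.standard_normal(8)
d1 = lambda z: 1.0 / (1.0 + np.exp(-(z @ w)))   # toy linear-sigmoid discriminator

mu, log_var = np.zeros((64, 8)), np.zeros((64, 8))
z_post = reparameterize(mu, log_var)            # stand-in for q(Z|X) samples
z_prior = rng.standard_normal((64, 8))          # samples from p(Z) = N(0, I)
ld, lg = d1_loss(d1, z_prior, z_post), g1_loss(d1, z_post)
```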
The process of applying adversarial training between the reconstruction result and the original input in step (4.6) is as follows:
The decoder G2 generates the reconstruction from the output of the encoder G1:
X' = G2(G1(X))
The generator G2 constituted by the decoder aims to confuse the second discriminator D2. The corresponding optimization objective for this process is:
min_{G2} max_{D2} V(D2, G2) = E_{X~p(X)}[log D2(X)] + E_{Z~q(Z)}[log(1 - D2(G2(Z)))]
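The second game can be sketched in the same way; the added L2 reconstruction term on the generator side is an assumption (a common pairing with the adversarial term), not something stated in this passage:

```python
import numpy as np

EPS = 1e-8

def d2_loss(d, x_real, x_rec):
    """D2 tries to separate original windows from reconstructions (shown negated)."""
    return -(np.log(d(x_real) + EPS).mean() + np.log(1.0 - d(x_rec) + EPS).mean())

def g2_loss(d, x_real, x_rec, lam=1.0):
    """G2 fools D2 while (assumed) also minimizing the L2 reconstruction error."""
    adv = -np.log(d(x_rec) + EPS).mean()
    rec = np.mean((x_real - x_rec) ** 2)
    return adv + lam * rec

rng = np.random.default_rng(2)
w = rng.standard_normal(5)
d2 = lambda x: 1.0 / (1.0 + np.exp(-(x @ w)))   # toy discriminator over windows

x = rng.standard_normal((32, 5))
x_rec = x + 0.1 * rng.standard_normal((32, 5))  # near-perfect toy reconstruction
```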
In step five, the dynamic anomaly threshold corresponding to each time series is determined comprehensively from the reconstruction errors and historical error factors, as follows:
(5.1) For all time series data, calculate the absolute value of the difference between the real time series data and the reconstructed time series data: e^(t) = |x^(t) - x'^(t)|.
(5.2) Cache the historical reconstruction errors, so that anomaly judgment on the current data point can take historical error factors into account. At time t, after the corresponding reconstruction error is calculated, it is appended to the historical error vector, as shown in the following formula: e = [e^(t-h), ..., e^(t-1), e^(t)], where h is the number of cached historical errors.
(5.3) Smooth the reconstruction errors with a moving-average model, applying the smoothing several times, and denote the error sequence after the moving-average processing as e_s.
(5.4) Select an error threshold ε for the smoothed error sequence; time points whose smoothed error is greater than ε are regarded as anomalous, and the rest as normal. The threshold is selected from a candidate set as the value that, once the points above it are removed, maximizes the relative drop in the mean and variance of the errors, Δμ/μ + Δσ/σ.
(5.5) According to the obtained error threshold, calculate an anomaly score for each anomalous time series to measure its degree of abnormality.
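One common instantiation of steps (5.1)-(5.5), consistent with the error mean μ, variance σ, and their changes Δμ, Δσ described for the threshold selection, is sketched below; the moving-average window and the candidate set {μ + z·σ} are assumptions, not values stated in the text:

```python
import numpy as np

def smooth(errors, w=5):
    """Moving-average smoothing of the reconstruction-error sequence (step 5.3)."""
    kernel = np.ones(w) / w
    return np.convolve(errors, kernel, mode="same")

def select_threshold(e_s, z_candidates=np.arange(1.0, 5.0, 0.5)):
    """Pick eps = mu + z*sigma maximizing the relative drop in mean and variance
    of the errors once the points above eps are removed (step 5.4)."""
    mu, sigma = e_s.mean(), e_s.std()
    best_eps, best_score = mu + z_candidates[0] * sigma, -np.inf
    for z in z_candidates:
        eps = mu + z * sigma
        below = e_s[e_s <= eps]
        if below.size == 0 or below.size == e_s.size:
            continue
        score = (mu - below.mean()) / mu + (sigma - below.std()) / sigma
        if score > best_score:
            best_eps, best_score = eps, score
    return best_eps

rng = np.random.default_rng(3)
errors = np.abs(rng.standard_normal(200)) * 0.1   # step 5.1: |x - x'| per time step
errors[120] = 5.0                                  # one injected anomaly
e_s = smooth(errors)
eps = select_threshold(e_s)
anomalies = np.where(e_s > eps)[0]                 # points flagged as anomalous
```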
To verify the effectiveness of the present solution, this embodiment first prepares a data set for anomaly detection from the existing public data set SWaT (Secure Water Treatment), collected by the iTrust laboratory of the Singapore University of Technology and Design on a state-of-the-art water treatment testbed. The data set records sensor values (water level, flow rate, etc.) and actuator values (including valves and pumps). It contains 7 days of normal operation and 4 days of abnormal operation, with an anomaly rate of about 11.98%; it comprises 51 dimensions, and the training/test split ratio is about 1:1.
Data preprocessing is then performed: the time series contained in the SWaT data set are normalized, specifically by normalizing each of the 51 dimensions separately. For each dimension, the maximum and minimum values of the feature are selected, and the value at each time step is replaced by its difference from the minimum divided by the difference between the maximum and the minimum. A sliding window is then applied to the normalized time series data, with a sliding window length of 10 for the SWaT data set; after normalization and sliding-window division, the time series data set contained in SWaT forms n - 10 sliding-window sequences, where n is the time series length of the training or test set.
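The per-dimension min-max normalization and sliding-window split can be sketched as follows (stride 1 is assumed; with stride 1 the window count is n - w + 1):

```python
import numpy as np

def minmax_normalize(data):
    """Normalize each feature column to [0, 1] independently: (x - min)/(max - min)."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (data - lo) / span

def sliding_windows(data, w=10):
    """Split an (n, d) series into overlapping windows of length w, stride 1."""
    n = len(data)
    return np.stack([data[i:i + w] for i in range(n - w + 1)])

rng = np.random.default_rng(4)
series = rng.standard_normal((100, 51)) * 7 + 3   # toy stand-in for the 51 SWaT columns
norm = minmax_normalize(series)
wins = sliding_windows(norm, w=10)
```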
Then the Transformer model is trained. The main hyper-parameters of the model are: training environment, batch size, learning rate, optimizer, sliding window length, and the number of encoder and decoder modules. In this embodiment the PyTorch framework is adopted to implement the proposed model architecture, and the hyper-parameters are set as follows:
training environment: the GPU is an NVIDIA RTX 3060 graphics card, the CPU is an Intel(R) Core(TM) i7-11700, with 32 GB memory;
the batch size is set to 50;
the initial learning rate is set to 0.01;
the optimizer is Adam;
selecting the length of a sliding window to be 10;
the number of modules of the encoder/decoder is set to 1, respectively.
Finally, experimental tests were performed. In time series data sets the amount of normal data usually far exceeds that of abnormal data, so the labels are extremely unbalanced; this embodiment therefore evaluates the performance of the method with three mainstream indexes: precision (Pre), recall (Rec), and F1 score (F1). The experimental results, comparing 6 commonly used baseline methods with the method proposed in this scheme, are shown in Table 1.
TABLE 1
Model Pre Rec F1
PCA 0.2667 0.2325 0.2484
DAGMM 0.2641 0.7182 0.3861
LSTM-NDT 0.7778 0.5109 0.6167
OmniAnomaly 0.9678 0.6869 0.8035
GDN 0.9697 0.6947 0.8094
MTAD-GAT 0.9689 0.6956 0.8098
RTAD (invention) 0.9764 0.6997 0.8152
From the results in Table 1, the F1 score obtained by this scheme on the SWaT data set is 0.8152, exceeding all baseline models and achieving the best result.
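The three indexes can be computed point-wise from labels and predictions as below; whether any point-adjustment protocol is applied before scoring is not stated in this passage:

```python
def precision_recall_f1(y_true, y_pred):
    """Pre = TP/(TP+FP), Rec = TP/(TP+FN), F1 = their harmonic mean; 1 = anomaly."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return pre, rec, f1

pre, rec, f1 = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```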
To further verify the validity of each module in the multidimensional-attention-based robust Transformer approach, this embodiment also performs ablation experiments measuring each module's gain on the final result. The settings are as follows: first, the Transformer encoder-decoder structure is removed and replaced with a feedforward neural network; the corresponding model is named w/o transformer. Second, the global position encoding is removed and replaced with the absolute position encoding of the original Transformer network; the corresponding model is named w/o ts_enc. Third, the feature attention module that models variable associations is removed, i.e., only temporal attention is considered; this model is named w/o var_emb. Finally, the temporal attention module that models long- and short-term temporal dependencies is removed, i.e., only the associations between variables are considered; this model is named w/o tim_emb. Table 2 shows the effect of the different modules on the model.
TABLE 2
Model Pre Rec F1
w/o transformer 0.6696 0.5943 0.6297
w/o ts_enc 0.9678 0.6874 0.8038
w/o var_emb 0.9346 0.6569 0.7715
w/o tim_emb 0.9296 0.6564 0.7694
RTAD 0.9764 0.6997 0.8152
As the results in Table 2 show, every module of the method proposed in this scheme contributes a gain to the final result, and there is no redundant design.
In summary, this technical scheme innovatively applies the Transformer model to time series anomaly detection and fully considers both the correlations between different features in the time series and the long- and short-term dependencies within the time series, thereby effectively improving the accuracy and robustness of multivariate time series anomaly detection.

Claims (10)

1. A method for detecting multivariate time series anomalies based on a Transformer, characterized by comprising the following steps:
S1, normalizing the time series data in the feature dimension for a multivariate time series data set;
S2, applying sliding windows to the normalized time series data, dividing the original time series data into a series of sliding windows to obtain a plurality of sliding-window time series;
S3, encoding the sliding-window time series data through timestamp encoding and data encoding;
S4, solving the reconstruction values of the time series corresponding to all sliding windows by means of anomaly detection based on a Transformer model;
S5, determining the dynamic anomaly threshold corresponding to each time series according to the reconstruction errors and historical error factors;
S6, solving the anomaly score of each time series according to the corresponding dynamic anomaly threshold to obtain the anomaly detection result of the time series data.
2. The method for detecting multivariate time series anomalies based on a Transformer according to claim 1, wherein step S3 specifically comprises the following steps:
S31, decomposing the timing information contained in the timestamp into corresponding timestamp data and performing timestamp encoding to obtain a timestamp encoding vector;
S32, analyzing the periodicity of the time series data by means of the Fourier transform and periodically encoding the timestamp data to obtain a periodic encoding vector;
S33, embedding and projecting the periodic encoding vector and the timestamp encoding vector to obtain a global temporal encoding;
S34, performing local temporal encoding on the time series data within the sliding window;
S35, adding the global temporal encoding and the local temporal encoding to the input time series data to obtain the input data of the Transformer model.
3. The method according to claim 1, wherein the Transformer model in step S4 comprises an encoder, a feature fusion module, a first discriminator, a decoder, and a second discriminator; the encoder comprises a feature attention module and a temporal attention module and is configured to convert the subsequences of the input data into corresponding hidden variables;
the feature fusion module is used for combining a plurality of hidden variables output by the encoder into a vector representation;
a first adversarial network is formed between the first discriminator and the encoder, guiding the posterior distribution of the hidden variables to approximate the prior distribution by applying a prior distribution to the hidden variables and adopting an adversarial training strategy;
the decoder is used for reconstructing the original input to obtain a reconstruction result;
and a second adversarial network is formed between the second discriminator and the decoder, used for applying adversarial training between the reconstruction result and the original input.
4. The method for detecting multivariate time series anomalies based on Transformer according to claim 3, wherein the specific process of step S4 is as follows:
S41, transmitting the original input data to the encoder and applying the multidimensional attention mechanism, running the feature attention module and the temporal attention module in turn to convert the original time series input X into a hidden variable Y;
S42, repeating the operation of step S41 on T non-overlapping subsequences to obtain the corresponding T hidden variables Y;
S43, performing a feature fusion operation on the T instances, i.e., the T hidden variables Y, by linear interpolation to form a new hidden-variable code Z;
S44, applying a prior distribution to the hidden variables and adopting an adversarial training strategy to guide the posterior distribution of the hidden variables toward the prior distribution;
S45, inputting the hidden variables into the decoder and reconstructing the original input of the model to obtain the reconstruction result X';
S46, applying adversarial training between the reconstruction result X' and the original input X to obtain a trained anomaly detection model, transmitting the current input data to the anomaly detection model, and outputting the reconstruction values of the time series corresponding to all sliding windows.
5. The method for detecting multivariate time series anomalies based on Transformer according to claim 4, wherein in step S41 the feature attention module is used to learn the correlations between different features of the time series, and the temporal attention module is used to learn the long- and short-term dependencies within the time series;
the specific process of step S41 is as follows:
first, the input data is transmitted to the feature attention module, which computes the feature encoding from the time series input; the feature encoding is then combined with the original time series input, converting the original input X into a hidden variable Y;
then, the hidden variable Y is input into the temporal attention module to obtain the temporal features; after the temporal features are obtained, they are combined with the input Y of the temporal attention module, converting the time series into a hidden variable Z:
Z = k1·O_VA + k2·O_TA + X + time_enc(X)
where O_VA is the feature encoding, O_TA is the temporal feature, k1 is the weighting coefficient of the feature attention module, k2 is the weighting coefficient of the temporal attention module, and time_enc(X) is the temporal encoding result of the input X.
6. The method for detecting multivariate time series anomalies based on Transformer according to claim 5, wherein in step S43 the T instances, each a feature vector with time index t, are merged into one vector by linear interpolation.
7. The method for detecting multivariate time series anomalies based on Transformer according to claim 6, wherein the specific process of step S44 is as follows:
assuming that Z satisfies the prior distribution p(Z), the original data X satisfies the distribution p(X), and q(Z|X) is the encoding distribution, the encoder induces the aggregated posterior distribution of the hidden variable Z by:
q(Z) = ∫ q(Z|X)·p(X) dX
assuming that q(Z|X) satisfies a Gaussian distribution, back-propagation through the encoder network uses a re-parameterization technique; the first discriminator D1 judges the similarity between the posterior distribution q(Z) of the hidden variable Z and its prior distribution p(Z), its objective being to amplify the distance between the two, while the generator G1 constituted by the encoder tries to narrow that gap in order to confuse the first discriminator D1; the process is optimized with a min-max strategy corresponding to the following optimization objective:
min_{G1} max_{D1} V(D1, G1)
where V(D1, G1) is the cross entropy loss between the encoder and the first discriminator.
8. The method for detecting multivariate time series anomalies based on Transformer according to claim 7, wherein the specific process of applying adversarial training between the reconstruction result X' and the original input X in step S46 is:
the decoder G2 generates the reconstruction from the output of the encoder G1:
X' = G2(G1(X))
the decoder G2 aims to confuse the second discriminator D2, and the corresponding optimization function for this process is:
min_{G2} max_{D2} V(D2, G2)
where V(D2, G2) is the cross entropy loss between the decoder and the second discriminator.
9. The method for detecting multivariate time series anomalies based on Transformer according to claim 8, wherein the specific process of step S5 is as follows:
S51, calculating, for all time series data, the absolute value of the difference between the real time series data and the reconstructed time series data;
S52, caching the historical reconstruction errors: at time t, after the corresponding reconstruction error is calculated, it is appended to the historical error vector;
S53, smoothing the reconstruction errors with a moving-average model and denoting the error sequence after the moving-average processing as e_s;
S54, determining the error threshold ε based on the smoothed error sequence, where Δμ is the change in the mean of the errors, μ is the mean of the errors, Δσ is the change in the variance of the errors, and σ is the variance of the errors; if the reconstruction error corresponding to a time point is greater than the error threshold ε, the time point is anomalous, otherwise it is normal.
10. The method for detecting multivariate time series anomalies based on Transformer according to claim 9, wherein in step S6 the anomaly score s^(i) corresponding to the i-th subsequence is calculated from the error e^(i) of the i-th subsequence.
CN202310747003.0A 2023-06-21 2023-06-21 Method for detecting multivariate time sequence abnormality based on transducer Pending CN116796272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310747003.0A CN116796272A (en) 2023-06-21 2023-06-21 Method for detecting multivariate time sequence abnormality based on transducer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310747003.0A CN116796272A (en) 2023-06-21 2023-06-21 Method for detecting multivariate time sequence abnormality based on transducer

Publications (1)

Publication Number Publication Date
CN116796272A true CN116796272A (en) 2023-09-22

Family

ID=88039749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310747003.0A Pending CN116796272A (en) 2023-06-21 2023-06-21 Method for detecting multivariate time sequence abnormality based on transducer

Country Status (1)

Country Link
CN (1) CN116796272A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117190078A (en) * 2023-11-03 2023-12-08 山东省计算中心(国家超级计算济南中心) Abnormality detection method and system for monitoring data of hydrogen transportation pipe network
CN117190078B (en) * 2023-11-03 2024-02-09 山东省计算中心(国家超级计算济南中心) Abnormality detection method and system for monitoring data of hydrogen transportation pipe network
CN117648215A (en) * 2024-01-26 2024-03-05 国网山东省电力公司营销服务中心(计量中心) Abnormal tracing method and system for electricity consumption information acquisition system
CN117648215B (en) * 2024-01-26 2024-05-24 国网山东省电力公司营销服务中心(计量中心) Abnormal tracing method and system for electricity consumption information acquisition system
CN117786374A (en) * 2024-02-28 2024-03-29 南京信息工程大学 Multivariate time sequence anomaly detection method and system based on graph neural network
CN117786374B (en) * 2024-02-28 2024-05-14 南京信息工程大学 Multivariate time sequence anomaly detection method and system based on graph neural network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination