CN115983087B - Method for detecting time sequence data abnormality by combining attention mechanism with LSTM (Long Short-Term Memory) and terminal - Google Patents

Method for detecting time sequence data abnormality by combining attention mechanism with LSTM (Long Short-Term Memory) and terminal

Info

Publication number
CN115983087B
CN115983087B (application CN202211203293.4A)
Authority
CN
China
Prior art keywords
data
anomaly
input data
model
lstm
Prior art date
Legal status
Active
Application number
CN202211203293.4A
Other languages
Chinese (zh)
Other versions
CN115983087A (en)
Inventor
刘慧
姜凯
阮怀军
赵佳
周蕊
梁慧玲
Current Assignee
Shandong University of Finance and Economics
Original Assignee
Shandong University of Finance and Economics
Priority date
Filing date
Publication date
Application filed by Shandong University of Finance and Economics filed Critical Shandong University of Finance and Economics
Publication of CN115983087A
Application granted
Publication of CN115983087B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S 10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S 10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention provides a method and terminal for detecting time series data anomalies by combining an attention mechanism with an LSTM. Input data X is fed into a self-encoder; an attention mechanism is introduced to process the input data X; the anomalous state of the input data X is calculated using the reconstruction error; the input data is predicted by an ALAE model, and data deviating from normal values is detected. The model calculates an independent anomaly score for each sensor, combines these into an anomaly score for each time point, and determines from the anomaly score whether anomalous data appears at each time point. The invention uses a self-encoder framework in which a combination of attention mechanisms and long short-term memory networks forms the building blocks of the encoder and decoder, capturing the time dependence of the sequence; meanwhile, with the attention mechanism added, the learning capacity remains high when the data dimension is high, solving the problem of reduced learning capacity at excessively high dimensions.

Description

Method for detecting time sequence data abnormality by combining attention mechanism with LSTM (Long Short-Term Memory) and terminal
Technical Field
The invention relates to the technical field of hydropower station abnormal data detection, in particular to a method and a terminal for detecting time sequence data abnormality by combining an attention mechanism with an LSTM.
Background
Time series data is defined as a sequence of data values with a time attribute: isolated data values are connected through the time dimension, revealing state changes in computer software and hardware and thus carrying valuable information. During the operation of a hydropower station, the various sensors producing time-stamped data are prone to anomalies under interference from the external environment. Investigations have shown that when hydropower stations, comparable facilities, or the servers of large internet companies shut down, every second of downtime causes huge losses. Effectively identifying and diagnosing abnormal values therefore has practical significance: abnormal data values can be discovered in time and early warnings generated, so that management staff can take measures to solve potential problems.
Hydropower station operation data outliers are generally considered to be values that deviate significantly from the rest of the data. In time series data, however, the data changes over time, so data from a recent period might have been considered anomalous in the past. Anomalies in time series data are therefore typically defined as data that do not conform to a well-defined pattern of normal behavior. As the data dimension and data volume grow, anomalies in time series data become difficult to spot by eye, which makes anomaly detection in hydropower station time series data an important and difficult problem.
Conventional methods for detecting abnormal hydropower station data include the following: statistical models, such as anomaly detection based on the 3-sigma criterion, determine a reasonable fluctuation range for the target data from the distribution of historical data; clustering models, such as the K-means algorithm, cluster normal data to find the boundary between abnormal and normal values; and classification-based anomaly detection algorithms, such as the OC-SVM, find a hyperplane separating normal from abnormal values. All of these methods perform well when the data volume is small and the dimension is low. However, with the rapid expansion of data volume and the continuous increase in dimension, these methods can no longer meet the needs of hydropower station anomaly detection.
With the growth in computing power, deep learning models of all kinds have emerged and shown excellent performance in data mining. In anomaly detection over multivariate data, deep learning methods can be broadly divided into supervised and unsupervised. Supervised methods offer better performance, but they require a large number of labeled training samples, and in real-world data the number of abnormal samples is far smaller than the number of normal samples, so unsupervised methods are more practical in real applications. Among unsupervised methods, approaches based on predicting or reconstructing the data are common: the data is predicted or reconstructed, an anomaly score is computed from the reconstruction error, and the score decides whether the data is anomalous. Such methods can be improved by mining the relationships between data features when predicting the data, which strengthens the learning capacity of the deep learning model at high data dimensions, improves the accuracy of multivariate prediction, and thereby improves model performance. Methods that use only a deep learning model for prediction or reconstruction degrade gradually as the data dimension grows, so the possible interrelationships between dimensions must be taken into account when solving this problem with the idea of prediction or reconstruction.
The data dimension is a key factor limiting anomaly detection performance on multivariate time series data: as the dimension keeps increasing, existing unsupervised anomaly detection methods lack sufficient learning capacity and cannot fully capture the potentially complex relationships among many variables, leading to poor detection results.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a method for detecting time sequence data anomalies by combining an attention mechanism with an LSTM, wherein an automatic encoder based on the LSTM can learn characteristic information of a longer sequence, and an attention network compensates for the problem of reduced learning ability when the dimension is too high.
The method for detecting time series data abnormality by combining an attention mechanism with LSTM comprises the following steps:
step 1: adding input data X to the self-encoder;
(11) Mapping the input data X into variables of a self-encoder model structure, and then reconstructing sequences in a potential space;
(12) Compression encoding is carried out on the input data X, and the high-dimensional input data X is represented by a low-dimensional vector, so that the typical characteristics of the input data X can be reserved by the compressed low-dimensional vector;
(13) Reducing a reconstruction error between the input data X and the reconstruction data X' by training;
(14) Compressing and restoring the input data X, extracting key features, and enabling the data obtained by compression and restoration to be close to real data;
step 2: the attention introducing mechanism processes the input data X;
step 3: calculating an abnormal state of the input data X using the reconstruction error;
predicting input data through an ALAE model, and detecting data deviating from a normal value;
the model calculates the independent anomaly score of each sensor, combines the independent anomaly scores into the anomaly score of each time point, and defines whether anomaly data appear at each time point through the anomaly scores.
It should be further noted that, the first step further includes:
the input data X is fed into the forgetting gate to generate a vector f_t; each element of f_t lies between 0 and 1, where 1 means complete retention and 0 means complete forgetting:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
the input data X then reaches the input gate; the vector i_t in the input gate is generated from the hidden vector h_{t-1} of the previous cell and the current input x_t as follows:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
the candidate cell state C̃_t represents an update of the cell state, using tanh as the activation function:

C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)

the forgetting gate and input gate vectors jointly determine the cell state C_t of the current cell:

C_t = f_t * C_{t-1} + i_t * C̃_t
the cell output o_t is calculated as follows, and the hidden output is determined by the state of the current cell:

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
after LSTM processing, the encoding and decoding process is expressed as:

y_t = σ(W · x_t + b)
x̂_t = σ(W' · y_t + b')

where W and b are the weight matrix and bias in the encoder and y_t is the vector in the latent space; W' and b' are the weight matrix and bias in the decoder, and x̂_t is the data value of the current timestamp reconstructed by the model.
It should be further noted that step 2 further includes:
define the states of the input data X at previous time points as H = {h_1, h_2, ..., h_{t-1}}, and extract the context vector v_t from H;

v_t is a weighted combination of the columns h_i of H and represents information related to the current time step;

let the scoring function be f: R^m × R^m → R, which calculates the correlation between its input vectors;

the context vector v_t is calculated as:

v_t = Σ_i α_i h_i,  α_i = exp(f(h_i, h_t)) / Σ_j exp(f(h_j, h_t))
the learning ability on high-dimensional sequence data is achieved by using a multi-head attention mechanism, and the final output is:

MultiHead(Q, K, V) = Concat(head_1, ..., head_h)W

where h is the total number of attention heads, each defined as:

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)

where the projections W_i^Q, W_i^K and W_i^V are parameter matrices.
It should be further noted that, step 3 further includes:
the anomaly score compares the data at time t with the reconstructed data, and calculates the error value e at time t:

e_t^i = |x_t^i − x̂_t^i|

the error values are normalized in the ALAE manner:

a_t^i = (e_t^i − μ̃_i) / σ̃_i

where μ̃_i and σ̃_i are the median and interquartile range of the e values at the current time point;

an anomaly score is calculated for each feature at each time point, and the top β largest anomaly scores at each time point are aggregated into the anomaly score value;

a search within the interval of the anomaly score yields the preset model threshold;

at test time, any time point at which the anomaly score exceeds the preset model threshold is considered anomalous.
The invention also provides a terminal comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the program, implements the steps of the method for detecting time series data anomalies by combining an attention mechanism with an LSTM.
From the above technical scheme, the invention has the following advantages:
the LSTM layer used in the invention is added into a mode of combining an encoder and a decoder to obtain the model LAE, so that the attention mechanism plays a role in the whole model. According to the method for detecting time sequence data anomalies by combining the attention mechanism with the LSTM, which is provided by the invention, the dependency relationship between characteristic data can be learned by combining the attention mechanism with the LSTM and introducing the attention mechanism with the LSTM into the self-encoder structure, the learning capability is better in data with more data characteristics, and the method has good performance in anomaly detection of high-dimensional data.
The invention uses a self-encoder framework in which a combination of attention mechanisms and long short-term memory networks forms the building blocks of the encoder and decoder, capturing the time dependence of the sequence; meanwhile, with the attention mechanism added, the learning capacity remains high when the data dimension is high, solving the problem of reduced learning capacity at excessively high dimensions.
Drawings
In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the description will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for detecting anomalies in time series data by combining an attention mechanism with an LSTM;
FIG. 2 is a deep learning model architecture diagram under AE architecture combining LSTM and attention mechanisms;
FIG. 3 is a block diagram of LSTM;
FIG. 4 is a graph of WADI dataset experimental comparison results;
FIG. 5 is a graph showing the effect of the size of the sliding window on the experimental results.
Detailed Description
The attention mechanism and LSTM combined time series data anomaly detection method provided by the invention consists of an encoder and a decoder; the encoder and decoder integrate an attention network and a long short-term memory (LSTM) network to detect anomalies in multivariate time series data. The LSTM-based automatic encoder can learn feature information over longer sequences, and the attention network compensates for the reduced learning capacity at excessively high dimensions. Model performance was verified against advanced methods on real-world multivariate time series datasets: the two water-treatment sensor datasets SWaT and WADI, the spacecraft sensor datasets SMAP and MSL, and a UCI dataset. Compared with the baseline methods, the combined model provided by the invention has better detection performance.
The attention mechanism and LSTM combined time series data anomaly detection method can acquire and process associated data based on artificial intelligence technology. Among these, artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
FIG. 1 is a flow chart of a preferred embodiment of a method for detecting a time series data anomaly by combining the attention mechanism with the LSTM. The attention mechanism and LSTM combined method of detecting time series data anomalies is applied to one or more terminals, which are devices capable of automatically performing numerical calculations and/or information processing according to preset or stored instructions, the hardware of which includes, but is not limited to, microprocessors, application specific integrated circuits (Application Specific Integrated Circuit, ASICs), programmable gate arrays (Field-Programmable Gate Array, FPGAs), digital processors (Digital Signal Processor, DSPs), embedded devices, and the like.
The terminal may be any electronic product that can interact with a user, such as a personal computer, tablet, smart phone, personal digital assistant (Personal Digital Assistant, PDA), interactive web tv (Internet Protocol Television, IPTV), etc.
The network in which the terminal is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (Virtual Private Network, VPN), and the like.
The method for detecting time series data anomalies by combining the attention mechanism with the LSTM is described in detail below with reference to FIGS. 1 to 3. The method can be applied to the analysis of abnormal values in hydropower station operation data; the trend of that data supports evaluating whether the operation data is abnormal, which has a positive effect on guaranteeing the stable operation of the hydropower station.
Referring to fig. 1 to 3, a flowchart of a method for detecting an anomaly of time-series data by combining an attention mechanism with an LSTM is shown, in which a reconstructed anomaly detection model is constructed.
The specific method is as follows: a codec model combining a long short-term memory network (LSTM) and an Attention mechanism is constructed to detect anomalies; through training, the deep learning model can reconstruct abnormal or normal sequences after the encoding and decoding processes. As shown in FIG. 2, an attention-based LSTM self-encoder model, ALAE, is proposed. Since the amount of normal data fed back by the system is likely far greater than the amount of abnormal data, the invention adopts an unsupervised method and trains the model on normal data only; model performance is then tested on a test set containing anomalies.
The anomaly detection problem for a multivariate time series is defined as follows. Given a multivariate time series X = {x_1, x_2, x_3, ..., x_n}, x_i ∈ R^m (m > 1), each vector x_i in the sequence represents the data of the m variables at one time point in the dataset, and m is the total number of variables in the physical environment. The output target of the model is the output vector x̂_t; x_t and x̂_t are compared through a sliding window, and the data of each timestamp is given an anomaly score. The anomaly score returns a label y_i ∈ {0, 1} representing whether the data x_i at each time point is anomalous. The method of the invention uses a dynamic threshold to screen out the optimal result when returning labels from the anomaly score; y_i = 1 indicates that the data at this time point is anomalous.
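The windowed problem setup described here can be sketched as follows; this is a hypothetical helper for illustration, since the patent does not publish code, and the exact window indexing is an assumption:

```python
import numpy as np

def sliding_windows(X, n):
    """Split a (T, m) multivariate series into overlapping windows of
    length n ending at each time point t; illustrative only."""
    return np.stack([X[t - n + 1:t + 1] for t in range(n - 1, len(X))])
```

Each window is then encoded, decoded, and scored as a whole; a window containing an attacked time point would receive the label y = 1.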
The ALAE method of the invention models the normal time series in a reconstruction manner, modeling a time window of size n, x_t = {x_{t-n}, x_{t-n+1}, ..., x_t}, for single-step prediction. The model realizes the encoding and decoding process by learning two mapping functions D(X) and E(X). In the first stage of the model, the input time series data is compressed by the encoder: key features of the data are extracted and the data is compressed into a low-dimensional space z; in the second stage, the data in the latent space is reconstructed by the decoder.
The model training targets are as follows:
AE(x) = D(z), z = E(x)
ℓ_AE = ‖x − AE(x)‖²
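A toy illustration of this training objective, with linear maps standing in for the LSTM/attention encoder and decoder; all names, shapes, and the learning rate are assumptions, and the patent's actual model is not linear:

```python
import numpy as np

rng = np.random.RandomState(0)
# Linear encoder E and decoder D trained by gradient descent to minimise
# l_AE = ||x - D(E(x))||^2. Purely illustrative.
m, k, T = 6, 2, 200
X = rng.randn(T, k) @ rng.randn(k, m)      # data lying on a k-dim subspace
We = rng.randn(m, k) * 0.1                 # encoder weights, z = E(x) = x We
Wd = rng.randn(k, m) * 0.1                 # decoder weights, AE(x) = D(z) = z Wd
loss0 = np.mean(np.sum((X - (X @ We) @ Wd) ** 2, axis=1))
lr = 0.01
for _ in range(300):
    Z = X @ We                             # z = E(x)
    Xh = Z @ Wd                            # AE(x) = D(z)
    G = 2.0 * (Xh - X) / T                 # gradient of the mean loss w.r.t. AE(x)
    Wd -= lr * Z.T @ G                     # descend on the decoder
    We -= lr * X.T @ (G @ Wd.T)            # descend on the encoder
loss = np.mean(np.sum((X - (X @ We) @ Wd) ** 2, axis=1))
```

Training drives the reconstruction error below its initial value, which is the behaviour the objective ℓ_AE formalizes.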
The ALAE model likewise has an encoder part and a decoder part, so as to predict data anomalies accurately. To address the difficulty that a standard self-encoder model cannot learn fully from high-dimensional data, the ALAE model improves anomaly detection performance in two respects. On the one hand, adding LSTM to the self-encoder structure lets the model recognize the characteristics of normal input data and perform good reconstruction, compensating for the inherent limitations of the AE; on the other hand, by combining a multi-head attention mechanism with LSTM, the model retains good reconstruction capability on long sequences with many features. The model is trained on the collected normal data; through the codec formed by LSTM and the attention mechanism, ALAE can learn the latent relationships in the multivariate sequence and therefore excels at predicting normal sequences.
Further, as a refinement and extension of the specific implementation manner of the foregoing embodiment, for fully describing the specific implementation process in this embodiment, the method includes the following steps:
s101: adding input data X to the self-encoder;
in order to process the temporal characteristics of sequence data, both the encoder and the decoder in the ALAE method use a long short-term memory (LSTM) network.
The input data X is mapped by the encoder model structure into a set of latent variables, and the sequence is then reconstructed in the latent space. The processing of the data in the encoder is a compression-encoding process: high-dimensional data is represented by a low-dimensional vector, so that the compressed low-dimensional vector retains the typical characteristics of the input data and normal data can be reconstructed. Training continuously reduces the reconstruction error between the input data X and the reconstructed data X', so the expected target can be accurately predicted. Key features are extracted by compressing and restoring the data, and the reconstructed data should be as close as possible to the real data. For long sequence data with temporal features, a simple self-encoder structure is difficult to apply.
The strong learning capacity of LSTM is used in the ALAE model to learn the complex rules within long sequences, as part of the encoder and decoder. After the data is input into the model, it first passes through the forgetting gate of the cell, which decides which information should be forgotten from the cell state; each element of the generated vector f_t lies between 0 and 1, where 1 means "fully retained" and 0 means "fully forgotten":

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
The data then reaches the input gate; the vector i_t in the input gate is generated from the hidden vector h_{t-1} of the previous cell and the current input x_t, and decides which information is stored in the cell state:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
The candidate cell state C̃_t represents an update of the cell state, typically using tanh as the activation function:

C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)

The forgetting gate and input gate vectors jointly determine the cell state C_t of the current cell:

C_t = f_t * C_{t-1} + i_t * C̃_t
Finally, the cell output o_t is calculated, in the same way as the forgetting gate and input gate vectors; the hidden output is determined by the state of the current cell:

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
After the LSTM improvement, the encoding and decoding process can be expressed as:

y_t = σ(W · x_t + b)
x̂_t = σ(W' · y_t + b')

where W and b are the weight matrix and bias in the encoder and y_t is the vector in the latent space; W' and b' are the weight matrix and bias in the decoder, and x̂_t is the data value of the current timestamp reconstructed by the model. After the LSTM is fused into the self-encoder, the temporal order becomes part of what the model learns, which significantly improves the model's learning accuracy on time series.
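The gate equations above might be sketched numerically as follows; this NumPy single-step cell is illustrative only, and the packing of the four gate weight matrices into one matrix W, as well as all shapes, are assumptions rather than the patent's code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM step implementing the forget/input/output gate equations.
    W maps the concatenated [h_{t-1}, x_t] to all four gate pre-activations."""
    z = np.concatenate([h_prev, x_t]) @ W + b       # (4*hidden,)
    hidden = h_prev.shape[0]
    f_t = sigmoid(z[:hidden])                       # forget gate: 1 keep, 0 forget
    i_t = sigmoid(z[hidden:2 * hidden])             # input gate
    c_tilde = np.tanh(z[2 * hidden:3 * hidden])     # candidate cell state
    o_t = sigmoid(z[3 * hidden:])                   # output gate
    c_t = f_t * c_prev + i_t * c_tilde              # C_t = f_t*C_{t-1} + i_t*C̃_t
    h_t = o_t * np.tanh(c_t)                        # h_t = o_t * tanh(C_t)
    return h_t, c_t
```

Stacking such cells over the window, in both the encoder and the decoder, yields the LSTM-based codec described above.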
S102: the attention introducing mechanism processes the input data X;
A self-encoder improved with a recurrent neural network models long sequences by propagating sequence context, and has good learning capacity on univariate time series. However, there may be complex latent relationships between multi-feature data, which such a model may under-learn when modeling the time dependence. The method of the invention therefore introduces an attention mechanism to enhance the model's learning capacity on multi-feature data.
ALAE characterizes the dependency between output features and input features through the attention mechanism: the stronger the association, the larger the input weight, and these weights shape the model's representation of the multi-feature sequence. The method of the invention first defines the previous states H = {h_1, h_2, ..., h_{t-1}} and extracts the context vector v_t from H. v_t is a weighted combination of the columns h_i of H and represents information related to the current time step. v_t is further integrated with the current h_t to produce the prediction. Let the scoring function be f: R^m × R^m → R, which calculates the correlation between its input vectors. The context vector v_t is calculated as:

v_t = Σ_i α_i h_i,  α_i = exp(f(h_i, h_t)) / Σ_j exp(f(h_j, h_t))
The multi-head attention mechanism is used in the ALAE method to enhance the learning capacity on high-dimensional sequence data; the final output is:

MultiHead(Q, K, V) = Concat(head_1, ..., head_h)W

where h is the total number of attention heads, each defined as:

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)

where the projections W_i^Q, W_i^K and W_i^V are parameter matrices. In the ALAE method the attention parameter matrices are obtained through learning; introducing attention increases the complexity of model learning, but brings a significant increase in experimental accuracy.
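The multi-head formulas above might look as follows in NumPy; the scaled dot-product form of Attention (with the 1/√d_k factor) follows the standard formulation and is an assumption, as are all names and shapes:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention over already-projected Q, K, V."""
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head(Q, K, V, heads_params, W_out):
    """MultiHead(Q,K,V) = Concat(head_1..head_h) W, with the per-head
    projections W_i^Q, W_i^K, W_i^V supplied in heads_params."""
    heads = [attention(Q @ Wq, K @ Wk, V @ Wv) for Wq, Wk, Wv in heads_params]
    return np.concatenate(heads, axis=-1) @ W_out
```

In a trained model the projection matrices and the output matrix W are learned parameters, as the text notes.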
S103: calculating an abnormal state of the input data X using the reconstruction error;
The ALAE model detects data deviating from normal values by predicting the data. The model calculates an independent anomaly score for each sensor, and these are combined into an anomaly score for each time point, from which it is decided whether an anomaly occurs at that time point. The anomaly score compares the data at time t with the reconstructed data and calculates the error value e at time t:

e_t^i = |x_t^i − x̂_t^i|
Since different features may have different characteristics, the error values may also have very different scales. The ALAE method of the invention normalizes the error values to prevent this bias from distorting the results:

a_t^i = (e_t^i − μ̃_i) / σ̃_i

where μ̃_i and σ̃_i are the median and interquartile range of the e values at the current time point, which makes the anomaly score more robust.
The invention obtains an anomaly score for each feature at each time point, introduces a parameter β, and aggregates the top β largest anomaly scores at each time point into the anomaly score value.
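A minimal NumPy sketch of this scoring procedure; the IQR floor, the summation over the top β normalized scores, and all names are assumptions, since the patent does not publish this code:

```python
import numpy as np

def point_scores(x, x_hat, beta=5):
    """Combine per-sensor reconstruction errors into one anomaly score
    per time point, via robust median/IQR normalisation and top-beta
    aggregation. Illustrative sketch only."""
    err = np.abs(x - x_hat)                         # (T, m) errors e per sensor
    med = np.median(err, axis=0)                    # robust centre per sensor
    q75, q25 = np.percentile(err, [75, 25], axis=0)
    iqr = np.maximum(q75 - q25, 1e-8)               # floor avoids division by zero
    a = (err - med) / iqr                           # normalised scores a_t^i
    top = np.sort(a, axis=1)[:, -beta:]             # top-beta sensors per point
    return top.sum(axis=1)                          # anomaly score per time point
```

A perfect reconstruction yields a score of zero at every time point, so only genuinely deviating sensors raise the per-point score.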
the ALAE method uses a method of threshold search to determine the threshold, and the outlier score has been normalized within the range in the method, so a search is made within the interval of the outlier score to obtain the optimal model threshold. At the time of testing, any moment when the anomaly score exceeds a threshold value is considered an anomaly.
In order to verify that the method of the present invention has a good performance in anomaly detection of multivariate data, the following examples are presented for the purpose of explanation.
The present invention uses 5 publicly available datasets to verify the proposed method, and the statistics of the 5 datasets are summarized in table 1.
The SWaT and WADI datasets come from real, purpose-built water treatment testbeds, annotated by operators simulating attack scenarios on a real water treatment plant, and represent small-scale versions of modern physical systems. SWaT data collection lasted 11 days: 7 days of normal operation and 4 days under attack scenarios; all normal and abnormal data from 51 sensors were collected and labeled, with 41 attacks launched in total. The WADI dataset was collected on a water distribution testbed that extends the SWaT testbed, over 16 days in total: 14 days of normal operation and 2 days of attack scenarios, with data collected from 123 sensors and actuators.
The Soil Moisture Active Passive (SMAP) satellite and Mars Science Laboratory (MSL) rover datasets are two real-world public datasets from NASA's expert-labeled archives. They contain data from 55/27 entities, each monitored by m = 25/55 metrics, respectively.
The Occupancy dataset is selected from the UCI public datasets; it was collected over 7 days, and the task is to accurately detect occupancy in an office from 6 features such as light, temperature, and humidity.
Table 1: baseline dataset, (%) is the percentage of outlier data points in the dataset.
The method of the invention adopts the standard evaluation metrics commonly used in anomaly detection tasks: precision (Precision), recall (Recall), and F1 score (F1-Score).
The F1 score is an evaluation metric that combines Precision and Recall. If a positive sample is classified as negative, the error is a false negative (FN); if a negative sample is classified as positive, the error is a false positive (FP). Similarly, true positives (TP) and true negatives (TN) are obtained:

P = TP / (TP + FP)
R = TP / (TP + FN)
F1 = 2PR / (P + R)

The F1 score computed this way suits the class imbalance of the datasets used by the detection model; the higher the F1 score, the better the overall performance of the model. As shown in the formulas above, P represents the precision of the model's anomaly judgments: the larger the value, the more of the flagged anomalies are genuine. R represents the recall: the larger the value, the fewer genuine anomalies the model misses. In practical application scenarios, detecting the genuine anomalies usually matters most, so the method of the invention focuses on the model's Precision and F1 score, which reflect the model's overall effectiveness.
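For reference, a minimal computation of these metrics from the confusion counts (the zero-division guards are an implementation choice, not from the patent):

```python
def prf1(tp, fp, fn):
    """Precision P, recall R, and F1 from the confusion counts
    defined in the text; returns (P, R, F1)."""
    p = tp / (tp + fp) if tp + fp else 0.0   # precision
    r = tp / (tp + fn) if tp + fn else 0.0   # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```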
The method provided by the invention uses normal data observation values in training, learns the normal behavior of the sequence, and then detects on the test set. In an actual application scenario, the occurrence of the anomaly is usually continuous, so when an anomaly occurs in a window, we consider that the observed value of the window is abnormal.
The baseline methods are as follows:
PCA: principal component analysis maps the data to a low-dimensional projection and reconstructs the data from that projection;
KNN: the K-nearest-neighbour algorithm uses distance as the anomaly score to judge whether a point is an outlier;
DAGMM: reconstructs the data with a compression network consisting of an autoencoder, coupled with an estimation network based on a Gaussian mixture model;
LSTM-VAE: replaces the feed-forward networks of the VAE's autoencoder structure with LSTMs, reconstructs the time series, and judges anomalies through the reconstruction error;
MAD-GAN: uses LSTM as the base model within the GAN framework to reconstruct the data.
The experimental results obtained with the method of the invention are shown in Tables 2 and 3.
table 2: experimental results, (%) is the percentage of outlier data points in the dataset
Table 3: experimental results, (%) is the percentage of outlier data points in the dataset
Comparative experimental analysis: the anomaly detection precision, recall, and F1 scores of the proposed ALAE method and of the baseline methods on the five datasets SWAT, WADI, SMAP, UCI, and MSL are shown in Tables 2 and 3. Each baseline method uses its own threshold selection method and the corresponding F1-score calculation.
In real application scenarios the overall performance of the model is usually what matters most, so the invention focuses on the F1 score in the results. The proposed scheme achieves the highest F1 score on the SWAT, WADI, UCI, and SMAP datasets. It trails the LSTM-VAE algorithm slightly in precision on the WADI dataset, and trails the MAD-GAN method slightly in recall on the SMAP and MSL datasets.
As can be seen from the tables, conventional machine learning algorithms perform poorly on larger, higher-dimensional datasets, while the deep learning models perform better. Deep learning models based on recurrent neural networks, such as LSTM and LSTM-VAE, excel at modelling long sequences and capturing temporal correlations. The DAGMM method, by contrast, is typically used for multivariate data without temporal information; its input is not organized in time windows, so it is unsuited to modelling the temporal correlations that are essential for time-series anomaly detection. Among LSTM-based schemes such as LSTM-VAE, most existing methods ignore modelling the correlations between latent variables; MAD-GAN uses a recurrent neural network and reconstructs the original data with a generic adversarial training procedure, but its learning of multi-feature sequences is insufficient.
In Table 3, the F1 score of ALAE is slightly lower than the best baseline method on the MSL dataset, and its performance on SMAP is only about 1% higher. We found that relationships do exist between the features contained in the NASA datasets, such as temperature and radiation intensity, but they are much weaker than the relationships between the sensors in the SWAT dataset, so models that focus on learning latent relationships between features have no particular advantage there. Consequently, when modelling such data, generative models such as MAD-GAN show better results.
The invention makes an experimental comparison on the WADI dataset. The anomaly shown in Fig. 4 is caused by the malicious opening of the electro-valve MV_001, which makes the main tank overflow; the resulting data changes are reflected in LT_001 and FIT_001. The figure shows the predicted and observed values of the three variables associated with the anomaly, whose values remain within the normal range. At this stage the error between the predicted and observed values of the variable MV_001 after model processing is large, and since the ALAE model uses the maximum error over the variables in its calculation, a larger anomaly score is produced and the anomaly can be detected more accurately.
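The per-timestep aggregation mentioned here (taking the maximum error over all variables) can be sketched as follows; the function signature and nested-list data layout are assumptions of this illustration:

```python
def max_error_score(observed, predicted):
    """Per-timestep anomaly score = largest absolute error over all variables.

    observed / predicted: lists of timesteps, each a list of variable values.
    """
    return [max(abs(o - p) for o, p in zip(obs_t, pred_t))
            for obs_t, pred_t in zip(observed, predicted)]
```

Taking the maximum (rather than the mean) keeps an anomaly visible even when only one of many sensors deviates.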
The parameter-sensitivity analysis of the invention proceeds as follows. The entire set of variables is considered simultaneously during data processing to capture potential interactions between the variables. The invention uses a sliding window to divide the multivariate data into several subsequences; with the sliding-window size set to S_W and the sliding step to S_S, the number of subsequences obtained from a sequence of length N is ⌊(N − S_W)/S_S⌋ + 1. Different encoder lengths affect the modelling capability of the LSTM, so the sequence length is treated as a hyperparameter in the method of the invention, and different sequence lengths are used in the experiments to determine the best model performance.
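The sliding-window segmentation can be sketched as follows (an illustrative helper, not the patent's code; the window count follows the standard formula floor((N - S_W)/S_S) + 1):

```python
def sliding_windows(series, window_size, step):
    """Split a sequence into overlapping subsequences of length window_size,
    advancing by `step` each time."""
    return [series[i:i + window_size]
            for i in range(0, len(series) - window_size + 1, step)]
```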
FIG. 5 uses the WADI dataset, which has a higher data dimension, to verify the effect of the sliding-window size on the experimental results. As can be seen from FIG. 5, the sliding-window size has a considerable effect on the results: as the window grows, the accuracy of the model keeps increasing. Of the five points shown in the figure, the F1 score is largest when the sliding window is 20, at which point the model performs best.
Table 4: model of ALAE and variants thereof performance of anomaly detection on SWAT, WADI and UCI datasets
To illustrate the importance of each component of the ALAE method of the invention, an ablation experiment was conducted as shown in Table 4. Using only the LSTM layers combined with the encoder and decoder yields the model LAE, which isolates the effect of the attention mechanism on the whole model. From Table 4, the ALAE method with the attention mechanism performs better; the advantage is most pronounced on the feature-rich SWAT and WADI datasets, while on the UCI dataset, which has fewer features, performance decreases only slightly. This shows that within the whole model the attention mechanism strengthens the model's ability to learn from multi-feature data. LSTM and the attention mechanism both play an important role in the overall model structure, which also gives a more reasonable explanation of the model relative to the baseline methods.
In the method for detecting time-series data anomalies by combining an attention mechanism with LSTM provided by the invention, introducing the attention mechanism together with LSTM into the self-encoder structure allows the dependency relationships between feature data to be learned. The method learns better on data with many features and performs well in anomaly detection on high-dimensional data. On the five datasets used in the experiments, the ALAE method provided by the invention performs slightly better than the baseline methods overall.
The elements and algorithm steps of the examples described in connection with the embodiments disclosed herein for the attention-mechanism-and-LSTM method of detecting time-series data anomalies can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the invention.
Those skilled in the art will appreciate that the various aspects of the method for detecting time-series data anomalies by combining an attention mechanism with LSTM provided by the invention may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "system."
For the method of detecting time-series data anomalies by combining an attention mechanism with LSTM provided by the invention, program code for performing the operations of the disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or wide area network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (3)

1. A method for detecting time series data anomalies by combining an attention mechanism with an LSTM, the method comprising:
step 1: feeding the input data X into the self-encoder;
(11) mapping the input data X into variables of the self-encoder model structure, and then reconstructing the sequence in the latent space;
(12) compression-encoding the input data X and representing the high-dimensional input data X by a low-dimensional vector, so that the compressed low-dimensional vector retains the typical characteristics of the input data X;
(13) reducing the reconstruction error between the input data X and the reconstructed data X' through training;
(14) compressing and restoring the input data X to extract key features, so that the data obtained by compression and restoration are close to the real data;
step 2: processing the input data X by introducing the attention mechanism;
defining the states H = {h_1, h_2, ..., h_{t-1}} of the input data X at the previous time points, from which the context vector v_t is extracted;
setting the scoring function as f: R^m × R^m → R, which calculates the correlation between input vectors;
the context vector v_t over the preceding time points is calculated as:
v_t = Σ_i α_i h_i,  where α_i = exp(f(h_i, h_{t-1})) / Σ_j exp(f(h_j, h_{t-1}))
the learning ability of the high-dimensional sequence data is enhanced by using a multi-head attention mechanism, and the final output is as follows:
MultiHead(Q,K,V)=Concat(head 1 ,...,head h )W
h is the total number of attention heads, each defined as:
head i =Attention(QW i Q ,KW i K ,VW i V )
wherein the projection is a parameter matrix
step 3: calculating the anomaly state of the input data X using the reconstruction error;
predicting the input data through the ALAE model, and detecting data that deviate from normal values;
the model calculates an independent anomaly score for each sensor, combines these into an anomaly score for each time point, and uses this score to decide whether anomalous data appear at each time point;
the anomaly score compares the data at time t with the reconstructed data and calculates the error value e_t at time t:
e_t = |x_t − x̂_t|
the error values are normalized in the ALAE manner:
ẽ_t = (e_t − μ̃_t) / σ̃_t
where μ̃_t and σ̃_t are the median and inter-quartile range of the e values at the current time point;
an anomaly score value is calculated for the features at each time point, and the β largest anomaly scores at each time point are aggregated to obtain the anomaly score value:
a preset model threshold is obtained by searching over the range of anomaly scores;
at test time, any time point whose anomaly score exceeds the preset model threshold is considered anomalous.
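An illustrative sketch of the scoring steps above (median/IQR normalisation and top-β aggregation), under the assumption that per-feature error values are available as plain lists; names and layout are assumptions of this sketch, not the patent's code:

```python
import statistics

def normalized_scores(errors):
    """Robustly normalise error values with the median and inter-quartile range."""
    med = statistics.median(errors)
    q = statistics.quantiles(errors, n=4)  # three quartile cut points
    iqr = (q[2] - q[0]) or 1.0             # guard against a zero IQR
    return [(e - med) / iqr for e in errors]

def timepoint_score(feature_scores, beta):
    """Aggregate the beta largest per-feature anomaly scores at one time point."""
    return sum(sorted(feature_scores, reverse=True)[:beta])
```

Median and IQR are used instead of mean and standard deviation so that the normalisation itself is not distorted by the very anomalies being scored.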
2. The method for detecting time-series data anomalies by combining an attention mechanism with LSTM according to claim 1, wherein
step 1 further comprises:
feeding the input data X to the forget gate, which generates a vector f_t whose components lie between 0 and 1, where 1 means complete retention and 0 means complete forgetting;
f_t = σ_1(W_f · [h_{t-1}, x_t] + b_f)
feeding the input data X to the input gate, where the vector i_t is generated from the hidden vector h_{t-1} of the previous cell and the current input x_t as follows:
i_t = σ_3(W_i · [h_{t-1}, x_t] + b_i)
the candidate update of the cell state uses tanh as the activation function:
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
the forget-gate and input-gate vectors jointly determine the cell state C_t of the current cell:
C_t = f_t * C_{t-1} + i_t * C̃_t
the cell output o_t is calculated as follows, with the hidden-variable output determined by the state of the current cell:
o_t = σ_3(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
after LSTM processing, the encoding-decoding process is expressed as:
y_t = σ(W x_t + b)
x̂_t = σ(W' y_t + b')
where W and b are the weight matrix and bias in the encoder, y_t is the vector in the latent space, W' and b' are the weight matrix and bias in the decoder, and x̂_t is the data value of the current timestamp reconstructed by the model.
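The gate equations of claim 2 can be illustrated with a scalar (one-dimensional) toy LSTM step; the weight-dictionary layout and scalar parameterisation are assumptions of this sketch, not the patent's implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, w):
    """One LSTM cell step following the gate equations, with scalar weights."""
    f_t = sigmoid(w["wf"] * h_prev + w["uf"] * x_t + w["bf"])      # forget gate
    i_t = sigmoid(w["wi"] * h_prev + w["ui"] * x_t + w["bi"])      # input gate
    c_hat = math.tanh(w["wc"] * h_prev + w["uc"] * x_t + w["bc"])  # candidate state
    c_t = f_t * c_prev + i_t * c_hat                               # new cell state
    o_t = sigmoid(w["wo"] * h_prev + w["uo"] * x_t + w["bo"])      # output gate
    h_t = o_t * math.tanh(c_t)                                     # hidden state
    return h_t, c_t
```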
3. A terminal comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the program it implements the steps of the method for detecting time-series data anomalies by combining an attention mechanism with LSTM according to any one of claims 1 to 2.
CN202211203293.4A 2022-09-16 2022-09-29 Method for detecting time sequence data abnormality by combining attention mechanism with LSTM (link state machine) and terminal Active CN115983087B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211131500 2022-09-16
CN202211131500X 2022-09-16

Publications (2)

Publication Number Publication Date
CN115983087A CN115983087A (en) 2023-04-18
CN115983087B true CN115983087B (en) 2023-10-13

Family

ID=85965335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211203293.4A Active CN115983087B (en) 2022-09-16 2022-09-29 Method for detecting time sequence data abnormality by combining attention mechanism with LSTM (link state machine) and terminal

Country Status (1)

Country Link
CN (1) CN115983087B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117407697B (en) * 2023-12-14 2024-04-02 南昌科晨电力试验研究有限公司 Graph anomaly detection method and system based on automatic encoder and attention mechanism
CN117648215A (en) * 2024-01-26 2024-03-05 国网山东省电力公司营销服务中心(计量中心) Abnormal tracing method and system for electricity consumption information acquisition system

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112148955A (en) * 2020-10-22 2020-12-29 南京航空航天大学 Method and system for detecting abnormal time sequence data of Internet of things
CN112488235A (en) * 2020-12-11 2021-03-12 江苏省特种设备安全监督检验研究院 Elevator time sequence data abnormity diagnosis method based on deep learning
CN112966714A (en) * 2021-02-02 2021-06-15 湖南大学 Edge time sequence data anomaly detection and network programmable control method
CN113051822A (en) * 2021-03-25 2021-06-29 浙江工业大学 Industrial system anomaly detection method based on graph attention network and LSTM automatic coding model
CN113076215A (en) * 2021-04-08 2021-07-06 华南理工大学 Unsupervised anomaly detection method independent of data types
CN113569243A (en) * 2021-08-03 2021-10-29 上海海事大学 Deep semi-supervised learning network intrusion detection method based on self-supervised variation LSTM
CN114221790A (en) * 2021-11-22 2022-03-22 浙江工业大学 BGP (Border gateway protocol) anomaly detection method and system based on graph attention network
CN114298217A (en) * 2021-12-27 2022-04-08 上海大学 Time sequence data abnormity detection method and device based on memory enhanced self-encoder
CN114548216A (en) * 2021-12-31 2022-05-27 南京理工大学 Online abnormal driving behavior identification method based on Encoder-Decoder attention network and LSTM
CN114611409A (en) * 2022-03-25 2022-06-10 国网江苏省电力有限公司泰州供电分公司 Method and device for establishing power distribution terminal abnormity detection model
CN114743678A (en) * 2022-04-29 2022-07-12 山东大学 Intelligent bracelet physiological index abnormity analysis method and system based on improved GDN algorithm


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Real-Time Anomaly Detection for Streaming Analytics; Subutai Ahmad et al.; The Institute of Electronics, Information and Communication Engineers; pp. 1-4 *
Research on Anomaly Detection Algorithms for Time-Series Data Based on Deep Learning; Wu Piao; China Master's Theses Full-text Database, Information Science and Technology; pp. I138-735 *
Research on Anomaly Detection Algorithms for Time-Series Data; Ma Haoxuan et al.; Science and Technology Innovation; pp. 78-81 *

Also Published As

Publication number Publication date
CN115983087A (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN115983087B (en) Method for detecting time sequence data abnormality by combining attention mechanism with LSTM (link state machine) and terminal
CN111914873A (en) Two-stage cloud server unsupervised anomaly prediction method
CN112966714B (en) Edge time sequence data anomaly detection and network programmable control method
Ji et al. A novel deep learning approach for anomaly detection of time series data
Du et al. GAN-based anomaly detection for multivariate time series using polluted training set
CN111538759A (en) Industrial process intelligent monitoring method and system based on distributed dictionary learning
CN111832825A (en) Wind power prediction method and system integrating long-term and short-term memory network and extreme learning machine
Chadha et al. Comparison of semi-supervised deep neural networks for anomaly detection in industrial processes
CN115587335A (en) Training method of abnormal value detection model, abnormal value detection method and system
CN116451117A (en) Power data anomaly detection method based on federal learning
Kara Multi-scale deep neural network approach with attention mechanism for remaining useful life estimation
CN116522265A (en) Industrial Internet time sequence data anomaly detection method and device
CN115510975A (en) Multivariable time sequence abnormality detection method and system based on parallel Transomer-GRU
CN117076171A (en) Abnormality detection and positioning method and device for multi-element time sequence data
Wagner et al. Timesead: Benchmarking deep multivariate time-series anomaly detection
CN117290800B (en) Timing sequence anomaly detection method and system based on hypergraph attention network
Qin et al. CSCAD: Correlation structure-based collective anomaly detection in complex system
Wang et al. Fully-Connected Spatial-Temporal Graph for Multivariate Time-Series Data
Jia et al. State of health prediction of lithium-ion batteries based on bidirectional gated recurrent unit and transformer
CN115694985A (en) TMB-based hybrid network traffic attack prediction method
Wu et al. Robust low-rank clustering contrastive learning integrating transformer for noisy industrial soft sensors
Sun et al. Anomaly detection for CPS via memory-augmented reconstruction and time series prediction
CN116401537A (en) Network multi-element time flow sequence anomaly detection method and device based on multi-task
KR20220028727A (en) Method and Apparatus for Real Time Fault Detection Using Time series data According to Degradation
CN115831339B (en) Medical system risk management and control pre-prediction method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant