CN115983087A - Method and terminal for detecting time-series data anomalies by combining an attention mechanism with LSTM - Google Patents

Method and terminal for detecting time-series data anomalies by combining an attention mechanism with LSTM

Info

Publication number
CN115983087A
Authority
CN
China
Prior art keywords
data
abnormal
lstm
input data
model
Prior art date
Legal status
Granted
Application number
CN202211203293.4A
Other languages
Chinese (zh)
Other versions
CN115983087B (en)
Inventor
刘慧
姜凯
阮怀军
赵佳
周蕊
梁慧玲
Current Assignee
Shandong University of Finance and Economics
Original Assignee
Shandong University of Finance and Economics
Priority date
Filing date
Publication date
Application filed by Shandong University of Finance and Economics
Publication of CN115983087A
Application granted
Publication of CN115983087B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S 10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S 10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention provides a method and a terminal for detecting time-series data anomalies by combining an attention mechanism with LSTM. Input data X are fed into an autoencoder; an attention mechanism is introduced to process the input data X; the anomaly state of the input data X is computed from the reconstruction error; the input data are predicted by an ALAE model, and data deviating from normal values are detected. The model computes an individual anomaly score for each sensor, combines these into an anomaly score for each time point, and uses that score to decide whether anomalous data occur at each time point. The invention adopts an autoencoder framework and captures the temporal dependence of the sequence by using an attention mechanism together with a long short-term memory recurrent network as the building blocks of the encoder and decoder. With the attention mechanism added, the learning ability remains strong even when the data dimensionality is high, which addresses the drop in learning ability that otherwise occurs when the dimensionality is too high.

Description

Method and terminal for detecting time-series data anomalies by combining an attention mechanism with LSTM
Technical Field
The invention relates to the technical field of abnormal-data detection for hydropower stations, in particular to a method and a terminal for detecting time-series data anomalies by combining an attention mechanism with an LSTM.
Background
By definition, time-series data are a set of data sequences with temporal characteristics: isolated data values are connected along the time dimension, revealing state changes of computer software and hardware and carrying more valuable information. The various sensors that generate time-stamped data during the operation of a hydropower station are prone to anomalies when disturbed by the external environment. Surveys show that when equipment such as a hydropower station or a large Internet company's servers shuts down, every second of downtime causes enormous losses. Effectively identifying and diagnosing abnormal values is therefore of practical significance: abnormal data can be found in time and early warnings generated, making it easier for managers to take measures against potential problems.
Abnormal values in hydropower station operating data are generally taken to be values that deviate significantly from the rest of the data; in time-series data, however, because the data change greatly over time, data from a recent period might have been considered abnormal in an earlier period. Anomalies in time-series data are therefore usually defined as data that do not conform to a well-defined pattern of normal behavior. As the data dimensionality and volume grow, anomalies in time-series data become hard to spot with the naked eye, so anomaly detection for hydropower station time-series data is an important and difficult problem.
Conventional methods for detecting abnormal hydropower station data include: statistical models that determine a reasonable fluctuation range for the monitored data from the distribution of historical data, such as anomaly detection based on the 3-sigma rule; clustering models, such as the K-means algorithm, which cluster normal data to find the boundary between abnormal and normal values; and classification-based anomaly detection algorithms, such as the OC-SVM, which decide by finding a hyperplane between normal and abnormal values. These methods perform well when the amount of data is small and the dimensionality is low, but with the rapid growth of data volume and the rise of dimensionality they can no longer meet the needs of hydropower station anomaly detection.
With the growth of computing power, a wide variety of deep learning models have emerged and performed excellently in data mining. In anomaly detection, deep learning approaches can be broadly divided into supervised and unsupervised methods. Supervised methods achieve better performance than unsupervised ones, but they require a large number of labeled training samples, and in real-world data the number of abnormal samples is far smaller than the number of normal samples, so unsupervised methods are more practical in real applications. Among unsupervised methods, prediction- and reconstruction-based approaches are the most common: the data are predicted or reconstructed, the reconstruction error is turned into an anomaly score, and the anomaly score then decides whether the data are abnormal. Improved methods predict the data by mining the relationships among data features, thereby strengthening the learning ability of the deep learning model when the data dimensionality is very high, improving the accuracy of multivariate prediction, and improving model performance. The performance of methods that rely only on a deep learning model for prediction or reconstruction gradually degrades as the data dimensionality grows, so when prediction or reconstruction is used to solve this problem, the possible interrelations among the many dimensions and their influence on the result must be taken into account.
Data dimensionality is a key factor limiting the anomaly detection performance on multivariate time-series data: as the dimensionality keeps climbing, existing unsupervised anomaly detection methods lack the learning capacity to fully capture the potentially complex relations among the many variables, and the detection results suffer.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a method for detecting time-series data anomalies by combining an attention mechanism with an LSTM (long short-term memory network). In the method, an LSTM-based autoencoder can learn the feature information of longer sequences, and the attention network compensates for the drop in learning ability when the dimensionality is too high.
The method for detecting time-series data anomalies by combining the attention mechanism with the LSTM comprises the following steps:
step 1: feeding input data X into an autoencoder;
(11) mapping the input data X to latent variables of the autoencoder model, and then reconstructing the sequence in the latent space;
(12) compression-encoding the input data X and representing the high-dimensional input data X with low-dimensional vectors, so that the compressed low-dimensional vectors retain the typical features of the input data X;
(13) reducing the reconstruction error between the input data X and the reconstructed data X' through training;
(14) compressing and restoring the input data X and extracting key features, so that the compressed-and-restored data are close to the real data;
step 2: introducing an attention mechanism to process the input data X;
step 3: computing the anomaly state of the input data X using the reconstruction error;
predicting the input data with an ALAE model and detecting data that deviate from normal values;
the model computes an individual anomaly score for each sensor, combines these into an anomaly score for each time point, and uses that score to decide whether anomalous data occur at each time point.
It should be further noted that step 1 further comprises:
inputting the input data X into the forget gate to generate a vector f_t; f_t lies between 0 and 1, where 1 represents complete retention and 0 represents complete forgetting:
f_t = σ_1(W_f · [h_{t-1}, x_t] + b_f)
inputting the input data X into the input gate, where the vector i_t in the input gate is generated from the hidden vector h_{t-1} of the previous cell and the current input x_t as follows:
i_t = σ_3(W_i · [h_{t-1}, x_t] + b_i)
The candidate cell state C̃_t, representing the update of the cell state, uses tanh as the activation function:
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
The forget gate and input gate vectors together determine the cell state C_t of the current cell:
C_t = f_t * C_{t-1} + i_t * C̃_t
The cell output o_t is calculated by the following formula, and the output of the hidden variable is determined by the state of the current cell:
o_t = σ_3(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
After LSTM processing, the encoding and decoding process is expressed as:
y_t = σ(W · h_t + b)
x̂_t = σ(W' · y_t + b')
where W and b are the weight matrix and bias in the encoder and y_t is a vector in the latent space; W' and b' are the weight matrix and bias in the decoder, and x̂_t is the model's reconstructed data value for the current timestamp.
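For illustration only, the gate equations above can be exercised with a short NumPy sketch of a single LSTM cell step; the stacked weight layout, the weight initialization, and the toy dimensions are assumptions made for a self-contained example, not details taken from the patent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev, x_t] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    d = h_prev.shape[0]
    f_t = sigmoid(z[0:d])               # forget gate: 1 = keep, 0 = forget
    i_t = sigmoid(z[d:2 * d])           # input gate
    c_tilde = np.tanh(z[2 * d:3 * d])   # candidate cell state (tanh update)
    o_t = sigmoid(z[3 * d:4 * d])       # output gate
    c_t = f_t * c_prev + i_t * c_tilde  # C_t = f_t * C_{t-1} + i_t * C~_t
    h_t = o_t * np.tanh(c_t)            # h_t = o_t * tanh(C_t)
    return h_t, c_t

# toy usage: 5 input features, 8 hidden units, a window of 10 timestamps
rng = np.random.default_rng(0)
m, d = 5, 8
W = rng.normal(scale=0.1, size=(4 * d, d + m))
b = np.zeros(4 * d)
h, c = np.zeros(d), np.zeros(d)
for x_t in rng.normal(size=(10, m)):
    h, c = lstm_cell_step(x_t, h, c, W, b)
```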
It should be further noted that step 2 further includes:
defining the states of the input data X at the previous time points as h = {h_1, h_2, ..., h_{t-1}} and extracting a context vector v_t from h;
v_t is a weighted sum of the columns h_i of h, and v_t represents the information related to the current time step;
let the scoring function be f: R^m × R^m → R, which computes the correlation between its input vectors;
the context vector v_t is calculated as:
α_i = exp(f(h_i, h_t)) / Σ_{j=1}^{t-1} exp(f(h_j, h_t))
v_t = Σ_{i=1}^{t-1} α_i · h_i
The multi-head attention mechanism strengthens the learning ability on high-dimensional sequence data, and its final output is:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W
where h is the total number of attention heads, and each head is defined as:
head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)
where the projections are the parameter matrices W_i^Q ∈ R^{d_model×d_k}, W_i^K ∈ R^{d_model×d_k}, W_i^V ∈ R^{d_model×d_v} and W ∈ R^{h·d_v×d_model}.
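The multi-head formula above can be read as the standard scaled dot-product construction; the sketch below assumes that reading, including the 1/√d_k scaling and the even split of d_model across heads, neither of which is spelled out in the patent text.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(Q, K, V, Wq, Wk, Wv, Wo):
    """Q, K, V: (T, d_model); Wq/Wk/Wv: per-head projection lists; Wo: output projection."""
    d_k = Wq[0].shape[1]
    heads = []
    for Wqi, Wki, Wvi in zip(Wq, Wk, Wv):
        q, k, v = Q @ Wqi, K @ Wki, V @ Wvi          # head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
        weights = softmax(q @ k.T / np.sqrt(d_k))    # scaled dot-product attention weights
        heads.append(weights @ v)
    return np.concatenate(heads, axis=-1) @ Wo       # Concat(head_1, ..., head_h) W

# toy usage: self-attention over a window of 20 timestamps, d_model = 16, 4 heads
rng = np.random.default_rng(1)
T, d_model, n_heads = 20, 16, 4
X = rng.normal(size=(T, d_model))
Wq = [rng.normal(scale=0.1, size=(d_model, d_model // n_heads)) for _ in range(n_heads)]
Wk = [rng.normal(scale=0.1, size=(d_model, d_model // n_heads)) for _ in range(n_heads)]
Wv = [rng.normal(scale=0.1, size=(d_model, d_model // n_heads)) for _ in range(n_heads)]
Wo = rng.normal(scale=0.1, size=(d_model, d_model))
out = multi_head_attention(X, X, X, Wq, Wk, Wv, Wo)  # shape (20, 16)
```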
It should be further noted that step 3 further includes:
The anomaly score compares the data at time t with the reconstructed data, and the error value e at time t is calculated as:
e_t = |x_t - x̂_t|
The error value is normalized in the ALAE manner:
a_t = (e_t - μ̃_t) / σ̃_t
where μ̃_t and σ̃_t are the median and the interquartile range of the e values at the current time point;
an anomaly score is computed for each feature at each time point, and the β largest anomaly scores at each time point are aggregated into the anomaly score value:
A_t = Σ_{i=1}^{β} a_t^{(i)}, where a_t^{(1)} ≥ a_t^{(2)} ≥ ... are the feature-wise scores at time t sorted in descending order;
searching within the interval of anomaly scores to obtain the preset model threshold;
any time point whose anomaly score exceeds the preset model threshold during testing is considered abnormal.
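The scoring steps just listed (element-wise error, robust normalization by the median and interquartile range, and aggregation of the β largest feature scores) admit a short NumPy sketch; the small epsilon guard and the choice to take the statistics over the whole window are assumptions of this illustration.

```python
import numpy as np

def anomaly_scores(x, x_hat, beta=3):
    """x, x_hat: (T, m) observed and reconstructed windows; returns one score per timestamp."""
    e = np.abs(x - x_hat)                                    # per-feature error e_t
    med = np.median(e, axis=0)                               # median of e per feature
    iqr = np.quantile(e, 0.75, axis=0) - np.quantile(e, 0.25, axis=0)
    a = (e - med) / (iqr + 1e-6)                             # robust normalized score
    top_beta = np.sort(a, axis=1)[:, -beta:]                 # beta largest feature scores per timestamp
    return top_beta.sum(axis=1)                              # aggregated anomaly score A_t

# a time point is flagged as abnormal when its aggregated score exceeds the searched threshold:
# labels = (anomaly_scores(x, x_hat) > threshold).astype(int)
```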
The invention also provides a terminal, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the program, implements the steps of the method for detecting time-series data anomalies by combining the attention mechanism with the LSTM.
According to the technical scheme, the invention has the following advantages:
the LSTM layer used by the invention is combined with the encoder and the decoder to obtain the model LAE, so as to play a role of attention mechanism on the whole model. According to the attention mechanism and LSTM combined time sequence data anomaly detection method provided by the invention, the attention mechanism and the LSTM are combined and introduced into the self-encoder structure, so that the dependency relationship between characteristic data can be learned, the learning capability is better in data with more data characteristics, and the method is good in anomaly detection of high-dimensional data.
The invention uses a self-encoder framework, and captures the time dependence of the sequence by using an attention mechanism and combining a long and short term recurrent neural network as a component module of an encoder and a decoder; meanwhile, after the attention mechanism is added, the learning ability is still high when the dimensionality of data is high, and the problem that the learning ability is reduced when the dimensionality is too high is solved.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings used in the description are briefly introduced below. The drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flow chart of the method for detecting time-series data anomalies by combining an attention mechanism with LSTM;
FIG. 2 is a diagram of the deep learning model architecture combining LSTM and the attention mechanism under the AE framework;
FIG. 3 is a diagram of the structure of the LSTM;
FIG. 4 is a graph of experimental comparison results for a WADI data set;
FIG. 5 is a graph of the effect of the size of the sliding window on the experimental results;
Detailed Description
The method for detecting time-series data anomalies by combining an attention mechanism with LSTM provided by the invention consists of an encoder and a decoder; the encoder and the decoder integrate an attention network with a long short-term memory recurrent network to detect anomalies in multivariate time-series data. The LSTM-based autoencoder can learn the feature information of long sequences, and the attention network compensates for the drop in learning ability when the dimensionality is too high. The performance of the model is verified against state-of-the-art methods on multivariate time-series datasets, including two real-world water plant sensor datasets, SWAT and WADI, the Mars sensor datasets SMAP and MSL, and a UCI dataset. Compared with the baseline methods, the combined model provided by the invention achieves better detection performance.
The method for detecting time-series data anomalies by combining the attention mechanism with the LSTM can acquire and process the associated data based on artificial intelligence techniques. Artificial intelligence (AI) here refers to theories, methods, techniques, and application devices that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results.
FIG. 1 is a flow chart of a preferred embodiment of the method for detecting time-series data anomalies by combining the attention mechanism with LSTM according to the present invention. The method is applied to one or more terminals, where a terminal is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The terminal may be any electronic product capable of human-computer interaction with a user, for example a personal computer, a tablet computer, a smartphone, a Personal Digital Assistant (PDA), an interactive Internet Protocol Television (IPTV), and the like.
The network where the terminal is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a Virtual Private Network (VPN), and the like.
The method for detecting time-series data anomalies by combining the attention mechanism with LSTM is described in detail below with reference to FIGS. 1 to 3. The method can be applied to analyzing abnormal values and variation trends in hydropower station operating data, supports assessing whether the operating data are abnormal, and plays a positive role in ensuring the stable operation of the hydropower station.
FIGS. 1 to 3 illustrate the method for detecting time-series data anomalies by combining an attention mechanism with LSTM according to an embodiment, in which a reconstruction-based anomaly detection model is constructed.
The concrete approach is as follows: an encoder-decoder model combining a long short-term memory recurrent network (LSTM) with an attention mechanism is constructed to detect anomalies; through training, the deep learning model learns to reconstruct an abnormal or normal sequence after the encoding and decoding process. As shown in FIG. 2, an attention-based LSTM autoencoder model, ALAE, is proposed. In the invention, the amount of normal data fed back by the system may be far greater than the amount of abnormal data, so an unsupervised approach is adopted and the model is trained only on normal data; its performance is then tested on a test set that contains anomalies.
The anomaly detection problem for a multivariate time series is defined as follows: given a multivariate time series x = {x_1, x_2, x_3, ..., x_n}, x_i ∈ R^m (m > 1), the vector x_i in the sequence represents the data of the dataset's variables at one time point, and m represents the total number of variables in the entity environment. The output target of the model is an output vector x̂_t; the reconstruction error between x_t and x̂_t, computed over a sliding window, gives an anomaly score to the data of each timestamp. A label y_i ∈ {0, 1} is returned from the anomaly score, representing the anomaly status of the data x_i at each time point. The method screens out the optimal result with a dynamic threshold when mapping anomaly scores back to labels; y_i = 1 means the data at that time point are abnormal.
In the ALAE method, the normal time series is modeled by reconstruction, and single-step time prediction is carried out on a time series x_t = {x_{t-n}, x_{t-n+1}, ..., x_t} with window size n. The model implements the encoding and decoding process by learning two mapping functions, E(x) and D(x). In the first stage, the encoder compresses the input time-series data, extracts its key features, and compresses the data into a low-dimensional space z; in the second stage, the decoder reconstructs the data from the latent space.
The model training targets are:
AE(x) = D(z), z = E(x)
l_AE = ||x - AE(x)||^2
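A minimal sketch of the windowing and of the training objective l_AE = ||x - AE(x)||^2, assuming a simple stride-based window split and treating the autoencoder as an arbitrary callable; the helper names and the stride handling are illustrative, not taken from the patent.

```python
import numpy as np

def sliding_windows(series, window, stride=1):
    """series: (T, m) multivariate sequence -> array of shape (num_windows, window, m)."""
    T = series.shape[0]
    starts = range(0, T - window + 1, stride)
    return np.stack([series[s:s + window] for s in starts])

def reconstruction_loss(x, ae):
    """l_AE = ||x - AE(x)||^2, averaged over a batch of windows; `ae` is any encode/decode callable."""
    x_hat = ae(x)
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=(1, 2))))

# toy usage: 100 timestamps, 5 sensors, windows of length 20
x = np.random.default_rng(2).normal(size=(100, 5))
windows = sliding_windows(x, window=20)                 # shape (81, 20, 5)
print(reconstruction_loss(windows, lambda w: 0.9 * w))  # placeholder "autoencoder"
```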
The ALAE model of the invention likewise comprises an encoder and a decoder, and it improves anomaly detection performance in two ways to address the difficulty a standard autoencoder has in learning adequately from high-dimensional data. On the one hand, adding LSTM to the autoencoder structure allows the characteristics of normal input data to be recognized and a good reconstruction model to be learned, compensating for the inherent limitations of the AE; on the other hand, using a multi-head attention mechanism together with the LSTM lets the model retain a strong ability to reconstruct long, multi-feature sequences. The model is trained on the collected normal data; through the codec composed of the LSTM and the attention mechanism, ALAE can learn the latent relations in the multivariate sequence and performs excellently in predicting normal sequences.
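One possible wiring of the components described here, with an LSTM encoder, multi-head self-attention over the encoder states, a low-dimensional latent projection, and an LSTM decoder trained on normal data with a reconstruction loss, is sketched below in PyTorch. The layer sizes, the placement of the attention block, and the training details are assumptions; the patent text fixes only the overall encoder/attention/decoder composition.

```python
import torch
import torch.nn as nn

class ALAESketch(nn.Module):
    """Attention + LSTM autoencoder sketch: encode -> attend -> project -> decode."""
    def __init__(self, n_features, hidden=64, latent=32, n_heads=4):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.to_latent = nn.Linear(hidden, latent)
        self.decoder = nn.LSTM(latent, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                # x: (batch, window, n_features)
        h, _ = self.encoder(x)           # LSTM hidden states over the window
        ctx, _ = self.attn(h, h, h)      # multi-head self-attention over encoder states
        z = self.to_latent(ctx)          # low-dimensional latent sequence
        d, _ = self.decoder(z)
        return self.out(d)               # reconstructed window

# training on normal data only, minimizing the reconstruction error
model = ALAESketch(n_features=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(16, 20, 8)               # toy batch: 16 windows of 20 timestamps, 8 sensors
opt.zero_grad()
loss = nn.functional.mse_loss(model(x), x)
loss.backward()
opt.step()
```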
Further, as a refinement and an extension of the specific implementation of the above embodiment, in order to fully illustrate the specific implementation process in this embodiment, the method includes the following steps:
s101: adding input data X into an autoencoder;
in order to process the time characteristics of sequence data, the encoder and the decoder in the ALAE method both use a long-short-term cyclic neural network LSTM.
The self-encoder model structure maps input data X into a group of latent variables, and then sequences in a latent space are reconstructed; the processing of the data in the encoder is a compression encoding process of the data, and high-dimensional data is represented by low-dimensional vectors, so that the compressed low-dimensional vectors can keep the typical characteristics of the input data, and further normal data can be reconstructed. And the reconstruction error between the input data X and the reconstruction data X' is continuously reduced through training, so that the expected target is accurately predicted. By compressing and restoring the data, extracting key features, the reconstructed data is as close to real data as possible; for long sequence data with temporal characteristics, a simple self-encoder structure is difficult to handle.
The strong learning capacity of the LSTM is used in the ALAE model, as part of the encoder and decoder, to learn the complex rules within long sequences. After the data enter the model, they first pass through the forget gate of the cell, which decides which information should be forgotten from the cell state and generates a vector f_t between 0 and 1, where 1 means "completely retained" and 0 means "completely forgotten":
f_t = σ_1(W_f · [h_{t-1}, x_t] + b_f)
The data then reach the input gate. The vector i_t in the input gate is generated from the hidden vector h_{t-1} of the previous cell and the current input x_t, and decides which information can be stored in the cell state:
i_t = σ_3(W_i · [h_{t-1}, x_t] + b_i)
The candidate cell state C̃_t, representing the update of the cell state, typically uses tanh as the activation function:
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
The forget gate and input gate vectors together determine the cell state C_t of the current cell:
C_t = f_t * C_{t-1} + i_t * C̃_t
Finally, the cell output o_t is calculated in the same manner as the forget gate and input gate vectors, and the output of the hidden variable is determined by the state of the current cell:
o_t = σ_3(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
After LSTM refinement, the codec process can be expressed as:
Figure SMS_22
Figure SMS_23
wherein
Figure SMS_24
And b is the weight matrix and offset in the decoder, y t Is a potential air spaceVectors in between. />
Figure SMS_25
And b' is the weight matrix and offset in the decoder, based on the sum of the weights and offsets>
Figure SMS_26
The reconstructed data value of the current timestamp for the model. After the LSTM is integrated into the self-encoder, the time sequence is used as a part of model learning, and the accuracy of the model in time sequence learning can be remarkably improved.
S102: an attention mechanism is introduced to process the input data X;
The autoencoder improved with a recurrent neural network models long sequences by passing context from the front of the sequence to the back, and has good learning ability on univariate time series. However, there may be complex latent relationships among multi-feature data, and its learning ability is insufficient when modeling the time dependence. The method of the invention therefore introduces an attention mechanism to strengthen the model's learning ability on multi-feature data.
ALAE describes the dependency between output features and input features through the attention mechanism: the stronger the association, the larger the input weight, and through these weights the attention influences the model's expression of the multi-feature sequence. The method first defines the previous states h = {h_1, h_2, ..., h_{t-1}} and extracts a context vector v_t from h. v_t is a weighted sum of the columns h_i of h and represents the information associated with the current time step; v_t is further combined with the current h_t to produce the prediction. Assume a scoring function f: R^m × R^m → R that computes the correlation between its input vectors. The context vector v_t is calculated as:
α_i = exp(f(h_i, h_t)) / Σ_{j=1}^{t-1} exp(f(h_j, h_t))
v_t = Σ_{i=1}^{t-1} α_i · h_i
In the ALAE method, a multi-head attention mechanism is used to strengthen the learning ability on high-dimensional sequence data, and the final output is:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W
where h is the total number of attention heads, and each head is defined as:
head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)
where the projections are the parameter matrices W_i^Q ∈ R^{d_model×d_k}, W_i^K ∈ R^{d_model×d_k}, W_i^V ∈ R^{d_model×d_v} and W ∈ R^{h·d_v×d_model}.
In the ALAE method, an attention parameter matrix is obtained through learning; the introduction of attention increases the complexity of model learning, but the improvement in the accuracy of the experiment is significant.
S103: calculating an abnormal state of the input data X using the reconstruction error;
The ALAE model detects data deviating from normal values by predicting the data; it calculates an individual anomaly score for each sensor and combines the anomaly scores at each time point, so that whether a time point is anomalous is decided by the anomaly score. The anomaly score compares the data at time t with the reconstructed data, and the error value e at time t is calculated as:
e_t = |x_t - x̂_t|
Since different features may have different characteristics, the error values may also differ greatly in scale. The ALAE method therefore normalizes the error values to prevent such bias from distorting the result:
a_t = (e_t - μ̃_t) / σ̃_t
where μ̃_t and σ̃_t are the median and the interquartile range of the e values at the current time point, which makes the anomaly score more robust.
Therefore, the invention obtains an abnormal score value for each characteristic of each time point, introduces a parameter beta, and aggregates the previous beta maximum abnormal scores at each moment to obtain an abnormal score value:
Figure SMS_34
The ALAE method determines the threshold by threshold search: since the anomaly scores have already been normalized to a range, the search is carried out within the interval of anomaly scores to obtain the optimal model threshold. Any time point whose anomaly score exceeds the threshold during testing is considered anomalous.
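The threshold search described above can be sketched as a scan over the range of normalized scores that keeps the value maximizing F1 on labeled validation data; the grid resolution and the choice of F1 as the selection criterion are assumptions of this illustration.

```python
import numpy as np

def search_threshold(scores, labels, steps=200):
    """Scan candidate thresholds over the observed score range and keep the best F1."""
    best_f1, best_thr = 0.0, float(scores.min())
    for thr in np.linspace(scores.min(), scores.max(), steps):
        pred = (scores > thr).astype(int)
        tp = np.sum((pred == 1) & (labels == 1))
        fp = np.sum((pred == 1) & (labels == 0))
        fn = np.sum((pred == 0) & (labels == 1))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:
            best_f1, best_thr = f1, float(thr)
    return best_thr, best_f1

# toy usage: well-separated normal and anomalous score populations
rng = np.random.default_rng(3)
scores = np.concatenate([rng.normal(0.0, 1.0, 900), rng.normal(4.0, 1.0, 100)])
labels = np.concatenate([np.zeros(900, dtype=int), np.ones(100, dtype=int)])
print(search_threshold(scores, labels))
```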
To verify that the method of the present invention performs well in the detection of anomalies in multivariate data, the following examples are presented for purposes of illustration.
The invention uses 5 publicly available datasets to validate the proposed method; their statistics are summarized in Table 1.
The SWAT and WADI datasets are based on real-world water treatment testbeds; operators simulate attack scenarios on real water treatment plants to label the datasets, which represent small-scale versions of modern cyber-physical systems. The SWAT data collection lasted 11 days: 7 days of normal operation and 4 days under attack scenarios, during which all normal and abnormal data from 51 sensors were collected and labeled, with 41 attacks launched in total. The WADI dataset was collected over 16 days from a water distribution testbed extended from the SWAT testbed, with 14 days of normal operation and 2 days of attack scenarios, and data from 123 sensors and actuators were collected.
The Soil Moisture Active Passive (SMAP) satellite and Mars Science Laboratory (MSL) rover datasets are two real-world public datasets from NASA, labeled by experts. They contain data from 55 and 27 entities respectively, each monitored by m = 25 and 55 metrics.
The Occupancy dataset is selected from the UCI public datasets; it was collected over 7 days and detects office occupancy from 6 features such as light, temperature, and humidity.
Table 1: the benchmark datasets; (%) is the percentage of abnormal data points in the dataset.
Figure SMS_35
The method is evaluated with the standard metrics commonly used in anomaly detection tasks: precision (Precision), recall (Recall), and F1 score (F1-Score).
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)
The F1 score is an evaluation metric that combines precision and recall. If a positive sample is classified as negative, the error is a false negative (FN); if a negative sample is mistakenly classified as positive, it is a false positive (FP). Similarly, true positives (TP) and true negatives (TN) are obtained. The F1 score computed this way copes with the class imbalance of the datasets used by the model; the higher the F1 score, the better the overall performance of the model. In the formulas, P (precision) represents the accuracy of the model's anomaly judgments: the higher the value, the more of the flagged anomalies are real. R (recall) reflects how completely the model captures real anomalies: the higher the value, the fewer anomalies the model misses. In practical application scenarios, the goal is generally to detect real anomalies, so the method of the invention pays more attention to the precision and the F1 score of the model, which reflect its overall effectiveness.
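For reference, the three metrics can be computed from point-wise predictions with scikit-learn; the library choice and the toy labels below are assumptions, since the patent does not name an evaluation toolkit.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([0, 0, 1, 1, 0, 1])   # ground-truth point labels (1 = anomalous)
y_pred = np.array([0, 1, 1, 1, 0, 0])   # labels obtained by thresholding the anomaly score
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary", pos_label=1)
print(p, r, f1)
```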
The method provided by the invention uses normal observations during training to learn the normal behavior of the sequence, and then performs detection on a test set. In practical application scenarios, anomalies generally occur in contiguous stretches, so when an anomaly occurs within a window, the observations of that window are considered anomalous.
The following baseline methods are used for comparison:
PCA: principal component analysis maps the data to a low-dimensional projection and reconstructs the data from that projection;
KNN: the K-nearest-neighbor algorithm uses distance as the anomaly score to judge whether a point is an abnormal value;
DAGMM: reconstructs the data with a compression network consisting of an autoencoder and an estimation network consisting of a Gaussian mixture model;
LSTM-VAE: replaces the feed-forward network in the VAE with LSTM, reconstructs the time series, and judges anomalies by the reconstruction error;
MAD-GAN: applies LSTM as the base model within the GAN framework to reconstruct the data.
The results of the experiments based on the approach of the present invention are shown in Tables 2 and 3.
table 2: experimental results, (%) is the percentage of abnormal data points in the data set
Figure SMS_39
Figure SMS_40
Table 3: results of the experiment, (%) is the percentage of abnormal data points in the dataset
Figure SMS_41
Comparative experiment analysis: the anomaly detection precision, recall, and F1 score of the proposed ALAE method and the baseline methods on the five datasets SWAT, WADI, SMAP, UCI, and MSL are shown in Table 2 and Table 3. Each baseline method uses its own threshold selection method and the corresponding F1 score calculation.
In real application scenarios the overall performance of the model tends to matter most, so the invention focuses on the F1 score in the results. The proposed scheme obtains the highest F1 score on four datasets: SWAT, WADI, UCI, and SMAP. In terms of precision it is slightly behind the LSTM-VAE algorithm on the WADI dataset, and in terms of recall it is slightly behind the MAD-GAN method on the SMAP and MSL datasets.
As can be seen from the tables, traditional machine learning algorithms do not perform well on large, high-dimensional datasets, whereas deep learning models do. Recurrent deep learning models such as LSTM and LSTM-VAE excel at modeling long sequences and capturing temporal correlations. The DAGMM method is typically used for multivariate data without temporal information and does not take its input through a time window, so it is unsuitable for modeling temporal correlation, which is crucial for anomaly detection on time-series data. Among existing LSTM-based schemes such as LSTM-VAE, the modeling of correlations among latent variables is mostly ignored; MAD-GAN uses a recurrent neural network and a generic adversarial training scheme to reconstruct the original data, but its performance on multi-feature sequence learning is insufficient.
In Table 3, the F1 score of ALAE on the MSL dataset is slightly lower than the baseline methods, and on SMAP it is only about 1% higher. Investigation shows that in the NASA datasets, features such as temperature and radiation intensity are related, but the strength of these relations is far weaker than the relations among the sensors in the SWAT dataset, so a model that mainly learns the latent relations among features has no advantage there. When learning on such data, generative models such as MAD-GAN are therefore more effective.
The invention carries out an experimental comparison on the WADI dataset. The anomaly shown in FIG. 4 was caused by the main tank overflowing after the electric valve MV_001 was maliciously opened, and the data change is reflected in LT_001 and FIT_001. The figure shows the predicted and observed values of the three variables related to the anomaly, whose values stay within the normal range. At this stage there is a large error between the model's predicted value and the observed value of the variable MV_001; because the ALAE model uses the largest error among the variables in its calculation, a larger anomaly score is produced and the anomaly is detected more accurately.
Parameter sensitivity analysis: the whole variable set is considered simultaneously when the data are processed, so that potential interactions between the variables are captured. The invention divides the multivariate data into a number of subsequences with a sliding window; with window size S_W and sliding step S_S, the number of subsequences over a series of length n is ⌊(n - S_W)/S_S⌋ + 1.
Different encoder lengths affect the modeling capacity of the LSTM, so the sequence length is treated as a hyperparameter in the method of the invention; different sequence lengths are tried in the experiments to determine the best model performance.
FIG. 5 uses the WADI dataset, which has a higher data dimensionality, to examine the influence of the sliding window size on the experimental results. As FIG. 5 shows, changing the sliding window size has a considerable influence on the results, and the accuracy of the model keeps increasing as the window grows; among the five points shown in the figure, the F1 value is largest and the model performs best when the sliding window is 20.
Table 4: anomaly detection performance of ALAE and its variant on the SWAT, WADI, and UCI datasets
Figure SMS_43
To illustrate the importance of each component of ALAE in the method of the invention, ablation experiments were performed as shown in Table 4. A variant that uses only the LSTM layer combined with the encoder and decoder, named LAE, was built to show the effect of the attention mechanism on the whole model. Table 4 shows that the ALAE method with the attention mechanism added performs better; the advantage is more pronounced on the feature-rich SWAT and WADI datasets, while on the UCI dataset with fewer features the performance drops only slightly. This indicates that, within the overall model, the attention mechanism strengthens the model's learning ability on multi-feature data. The LSTM and the attention mechanism both play an important role in the structure of the whole model and help explain its advantage in the comparison with the baseline methods.
In the method for detecting time-series data anomalies by combining an attention mechanism with LSTM provided by the invention, the attention mechanism and the LSTM are jointly introduced into the autoencoder structure, so the dependencies among feature data can be learned; the learning ability is better on data with many features, and the method performs well in anomaly detection on high-dimensional data. On the five datasets used in the experiments, the proposed ALAE method performs slightly better than the baseline methods.
The units and algorithm steps of each example described in connection with the embodiments disclosed herein, for the method for detecting time-series data anomalies by combining an attention mechanism with LSTM, can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the components and steps of each example have been described above in general terms of their functions. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Those skilled in the art will appreciate that aspects of the present invention, which provides a method for detecting time-series data anomalies by combining an attention mechanism with LSTM, may be embodied as a system, a method, or a program product. Accordingly, various aspects of the present disclosure may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
The method for detecting time-series data anomalies by combining an attention mechanism with LSTM provided by the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (5)

1. A method for detecting time-series data anomalies by combining an attention mechanism with an LSTM, characterized by comprising the following steps:
step 1: feeding input data X into an autoencoder;
(11) mapping the input data X to latent variables of the autoencoder model, and then reconstructing the sequence in the latent space;
(12) compression-encoding the input data X and representing the high-dimensional input data X with low-dimensional vectors, so that the compressed low-dimensional vectors retain the typical features of the input data X;
(13) reducing the reconstruction error between the input data X and the reconstructed data X' through training;
(14) compressing and restoring the input data X and extracting key features, so that the compressed-and-restored data are close to the real data;
step 2: introducing an attention mechanism to process the input data X;
step 3: computing the anomaly state of the input data X using the reconstruction error;
predicting the input data with an ALAE model and detecting data that deviate from normal values;
the model computes an individual anomaly score for each sensor, combines these into an anomaly score for each time point, and uses that score to decide whether anomalous data occur at each time point.
2. The method for detecting time-series data anomalies by combining an attention mechanism with an LSTM according to claim 1, wherein
step 1 further comprises:
inputting the input data X into the forget gate to generate a vector f_t, where f_t lies between 0 and 1, 1 meaning complete retention and 0 meaning complete forgetting:
f_t = σ_1(W_f · [h_{t-1}, x_t] + b_f)
inputting the input data X into the input gate, where the vector i_t in the input gate is generated from the hidden vector h_{t-1} of the previous cell and the current input x_t as follows:
i_t = σ_3(W_i · [h_{t-1}, x_t] + b_i)
the candidate cell state C̃_t, representing the update of the cell state, uses tanh as the activation function:
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
the forget gate and input gate vectors together determine the cell state C_t of the current cell:
C_t = f_t * C_{t-1} + i_t * C̃_t
the cell output o_t is calculated by the following formula, and the output of the hidden variable is determined by the state of the current cell:
o_t = σ_3(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
after LSTM processing, the encoding and decoding process is expressed as:
y_t = σ(W · h_t + b)
x̂_t = σ(W' · y_t + b')
where W and b are the weight matrix and bias in the encoder and y_t is a vector in the latent space; W' and b' are the weight matrix and bias in the decoder, and x̂_t is the model's reconstructed data value for the current timestamp.
3. The method for detecting time-series data anomalies by combining an attention mechanism with an LSTM according to claim 1, wherein
step 2 further comprises:
defining the states of the input data X at the previous time points as h = {h_1, h_2, ..., h_{t-1}} and extracting a context vector v_t from h;
v_t is a weighted sum of the columns h_i of h, and v_t represents the information related to the current time step;
let the scoring function be f: R^m × R^m → R, which computes the correlation between its input vectors;
the context vector v_t is calculated as:
α_i = exp(f(h_i, h_t)) / Σ_{j=1}^{t-1} exp(f(h_j, h_t))
v_t = Σ_{i=1}^{t-1} α_i · h_i
the multi-head attention mechanism strengthens the learning ability on high-dimensional sequence data, and its final output is:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W
where h is the total number of attention heads, and each head is defined as:
head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)
where the projections are the parameter matrices W_i^Q ∈ R^{d_model×d_k}, W_i^K ∈ R^{d_model×d_k}, W_i^V ∈ R^{d_model×d_v} and W ∈ R^{h·d_v×d_model}.
4. The method for detecting time-series data anomalies by combining an attention mechanism with an LSTM according to claim 1, wherein
step 3 further comprises:
the anomaly score compares the data at time t with the reconstructed data, and the error value e at time t is calculated as:
e_t = |x_t - x̂_t|
the error value is normalized in the ALAE manner:
a_t = (e_t - μ̃_t) / σ̃_t
where μ̃_t and σ̃_t are the median and the interquartile range of the e values at the current time point;
an anomaly score is computed for each feature at each time point, and the β largest anomaly scores at each time point are aggregated into the anomaly score value:
A_t = Σ_{i=1}^{β} a_t^{(i)}, where a_t^{(1)} ≥ a_t^{(2)} ≥ ... are the feature-wise scores at time t sorted in descending order;
searching within the interval of anomaly scores to obtain the preset model threshold;
any time point whose anomaly score exceeds the preset model threshold during testing is considered abnormal.
5. A terminal comprising a memory, a processor, and a computer program stored on said memory and executable on said processor, wherein said processor, when executing said program, implements the steps of the method for detecting time-series data anomalies by combining an attention mechanism with an LSTM according to any one of claims 1 to 4.
CN202211203293.4A 2022-09-16 2022-09-29 Method for detecting time-series data anomalies by combining attention mechanism with LSTM (long short-term memory) and terminal Active CN115983087B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211131500X 2022-09-16
CN202211131500 2022-09-16

Publications (2)

Publication Number Publication Date
CN115983087A true CN115983087A (en) 2023-04-18
CN115983087B CN115983087B (en) 2023-10-13

Family

ID=85965335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211203293.4A Active CN115983087B (en) 2022-09-16 2022-09-29 Method for detecting time-series data anomalies by combining attention mechanism with LSTM (long short-term memory) and terminal

Country Status (1)

Country Link
CN (1) CN115983087B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117034755A (en) * 2023-08-07 2023-11-10 兰州理工大学 Cold-rolled steel mechanical property prediction method integrating multi-head attention mechanism
CN117407697A (en) * 2023-12-14 2024-01-16 南昌科晨电力试验研究有限公司 Graph anomaly detection method and system based on automatic encoder and attention mechanism
CN117648215A (en) * 2024-01-26 2024-03-05 国网山东省电力公司营销服务中心(计量中心) Abnormal tracing method and system for electricity consumption information acquisition system

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112148955A (en) * 2020-10-22 2020-12-29 南京航空航天大学 Method and system for detecting abnormal time sequence data of Internet of things
CN112488235A (en) * 2020-12-11 2021-03-12 江苏省特种设备安全监督检验研究院 Elevator time sequence data abnormity diagnosis method based on deep learning
CN112966714A (en) * 2021-02-02 2021-06-15 湖南大学 Edge time sequence data anomaly detection and network programmable control method
CN113051822A (en) * 2021-03-25 2021-06-29 浙江工业大学 Industrial system anomaly detection method based on graph attention network and LSTM automatic coding model
CN113076215A (en) * 2021-04-08 2021-07-06 华南理工大学 Unsupervised anomaly detection method independent of data types
CN113569243A (en) * 2021-08-03 2021-10-29 上海海事大学 Deep semi-supervised learning network intrusion detection method based on self-supervised variation LSTM
CN114221790A (en) * 2021-11-22 2022-03-22 浙江工业大学 BGP (Border gateway protocol) anomaly detection method and system based on graph attention network
CN114298217A (en) * 2021-12-27 2022-04-08 上海大学 Time sequence data abnormity detection method and device based on memory enhanced self-encoder
CN114548216A (en) * 2021-12-31 2022-05-27 南京理工大学 Online abnormal driving behavior identification method based on Encoder-Decoder attention network and LSTM
CN114611409A (en) * 2022-03-25 2022-06-10 国网江苏省电力有限公司泰州供电分公司 Method and device for establishing power distribution terminal abnormity detection model
CN114743678A (en) * 2022-04-29 2022-07-12 山东大学 Intelligent bracelet physiological index abnormity analysis method and system based on improved GDN algorithm


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SUBUTAI AHMAD ET AL.: "Real-Time Anomaly Detection for Streaming Analytics", 《THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS》, pages 1 - 4 *
吴飘: "基于深度学习的时序数据异常检测算法研究" (Research on anomaly detection algorithms for time-series data based on deep learning), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology), pages 138-735
马浩轩 等: "时序数据异常检测算法研究" (Research on anomaly detection algorithms for time-series data), 《科学技术创新》 (Science and Technology Innovation), pages 78-81

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117034755A (en) * 2023-08-07 2023-11-10 兰州理工大学 Cold-rolled steel mechanical property prediction method integrating multi-head attention mechanism
CN117407697A (en) * 2023-12-14 2024-01-16 南昌科晨电力试验研究有限公司 Graph anomaly detection method and system based on automatic encoder and attention mechanism
CN117407697B (en) * 2023-12-14 2024-04-02 南昌科晨电力试验研究有限公司 Graph anomaly detection method and system based on automatic encoder and attention mechanism
CN117648215A (en) * 2024-01-26 2024-03-05 国网山东省电力公司营销服务中心(计量中心) Abnormal tracing method and system for electricity consumption information acquisition system
CN117648215B (en) * 2024-01-26 2024-05-24 国网山东省电力公司营销服务中心(计量中心) Abnormal tracing method and system for electricity consumption information acquisition system

Also Published As

Publication number Publication date
CN115983087B (en) 2023-10-13

Similar Documents

Publication Publication Date Title
Wen et al. A generalized remaining useful life prediction method for complex systems based on composite health indicator
Xu et al. Predicting pipeline leakage in petrochemical system through GAN and LSTM
Peel et al. Detecting change points in the large-scale structure of evolving networks
CN111914873A (en) Two-stage cloud server unsupervised anomaly prediction method
CN115983087A (en) Method for detecting time sequence data abnormity by combining attention mechanism and LSTM and terminal
CN112966714B (en) Edge time sequence data anomaly detection and network programmable control method
CN110838364A (en) Crohn disease prediction method and device based on deep learning hybrid model
CN116451117A (en) Power data anomaly detection method based on federal learning
CN115587335A (en) Training method of abnormal value detection model, abnormal value detection method and system
CN116522265A (en) Industrial Internet time sequence data anomaly detection method and device
Gong et al. Causal discovery from temporal data: An overview and new perspectives
CN115759461A (en) Internet of things-oriented multivariate time sequence prediction method and system
Hosseini et al. A study on Markovian and deep learning based architectures for household appliance-level load modeling and recognition
Zhang et al. Multi-label prediction in time series data using deep neural networks
Fu et al. MCA-DTCN: A novel dual-task temporal convolutional network with multi-channel attention for first prediction time detection and remaining useful life prediction
Li et al. Knowledge enhanced ensemble method for remaining useful life prediction under variable working conditions
CN117316462A (en) Medical data management method
CN117092582A (en) Electric energy meter abnormality detection method and device based on contrast self-encoder
Dang et al. seq2graph: Discovering dynamic non-linear dependencies from multivariate time series
Pamir et al. Electricity theft detection for energy optimization using deep learning models
Sun et al. Anomaly detection for CPS via memory-augmented reconstruction and time series prediction
You et al. Attention-based LSTM for motion switching detection of particles in living cells
CN117829822B (en) Power transformer fault early warning method and system
Zhang et al. Correlation-aware unsupervised change-point detection via graph neural networks
Long et al. Multivariate Time Series Anomaly Detection with Improved Encoder-Decoder Based Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant