CN116361640A - Multivariate time series anomaly detection method based on hierarchical attention network - Google Patents

Multivariate time series anomaly detection method based on hierarchical attention network Download PDF

Info

Publication number
CN116361640A
Authority
CN
China
Prior art keywords
sequence
variable
graph
variables
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310024568.6A
Other languages
Chinese (zh)
Inventor
栾宁
张震宇
赵琳
冯曙明
曹杰
王惠
陶海成
缪佳伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Finance and Economics
Jiangsu Electric Power Information Technology Co Ltd
Original Assignee
Nanjing University of Finance and Economics
Jiangsu Electric Power Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Finance and Economics, Jiangsu Electric Power Information Technology Co Ltd filed Critical Nanjing University of Finance and Economics
Priority to CN202310024568.6A priority Critical patent/CN116361640A/en
Publication of CN116361640A publication Critical patent/CN116361640A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods


Abstract

The invention discloses a multivariate time series anomaly detection method based on a hierarchical attention network. The method first extracts temporal and sequence features with a bidirectional gated recurrent unit (Bi-GRU). It then builds the variables and the time series into a similarity graph with a graph attention network (GAN), in which the nodes of the graph represent the variables in the sequence and the edges represent the relationships between the variables. A first graph attention layer is constructed on the graph to extract feature representations of the relationships between different variables, namely variable-level learning, and a second graph attention layer is constructed to learn the interactions between the variables and the sequence, namely sequence-level learning. Finally, an autoencoder reconstructs the time series and a loss value is computed to detect abnormal sequences. The method effectively detects anomalies in multivariate time series, and the experimental results outperform the current state-of-the-art methods.

Description

Multivariate time series anomaly detection method based on hierarchical attention network
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a multivariate time series anomaly detection method based on a hierarchical attention network.
Background
Anomaly detection aims at identifying data records or events that differ significantly from normal data. In recent years, researchers have carried out extensive work on anomaly detection in fields such as network security and fraud detection. Anomaly detection usually involves time-series data; traditional methods based on univariate time series ignore the relationships between different variables, whereas methods based on multivariate time series (Multivariate Time Series, MTS) can fully capture these temporal correlations.
Anomaly detection in multivariate time series plays an important role in most real-world physical systems, such as smart grids, and analyzing the relationships between different variables in MTS data is essential for detecting anomalies. As technology advances, MTS data has become high-dimensional and complex, with dependencies among variables, which poses challenges for anomaly detection. Methods based on distance, linear models, or probability and density estimation cannot meet the requirements of high-dimensional, complex multivariate time-series anomaly detection, while deep learning methods greatly improve the feature representation of time-series data. Most existing methods use graph neural networks to learn the relationships between variables, but they often ignore that variable relationships differ across sequences, i.e. the relationships between variables are dynamic and evolve over time. For example, suppose a smart grid uses three sensors, for voltage, temperature and current, to monitor the grid's operating conditions. Normally the three sensors follow the same trend: the temperature always rises as the voltage or current increases. In some specific cases, however, this trend is violated without the data being abnormal, e.g. when the outdoor temperature is high and the temperature inside the plant is lowered manually. This exposes a common problem in real-world applications: how to capture the temporal dependencies in different time series and integrate them with the relationships between different variables.
Patents 202210819480.9 and 202210351790.2 propose detecting abnormal sequences with graph neural networks, but they only consider the static correlations that may exist between variables and ignore that the relationships between variables differ across time sequences. Patent 202210042038X proposes time-series detection based on a generative adversarial network, which can extract correlation features between time series, but it obtains global dependencies through a fully connected layer and ignores the correlations between variables. The multivariate time-series anomaly detection methods based on inter-metric and temporal embeddings proposed by Li et al., on graph neural networks proposed by Deng et al., and on graph attention networks proposed by Zhao et al. all model the relationships between different variables implicitly or explicitly, but they study only static correlations and ignore the dynamic nature of variable relationships.
Disclosure of Invention
The invention aims to provide a multivariate time series anomaly detection method based on a hierarchical attention network (Hierarchical Attention Network for Context Anomaly Detection, abbreviated HAN-CAD), which extracts the dynamic characteristics of variables in different time sequences, improves MTS anomaly detection efficiency, effectively detects anomalies in multivariate time series, and achieves experimental results superior to the current state-of-the-art methods.
The invention adopts the following technical scheme:
a multi-variable time sequence anomaly detection method based on a layered attention network firstly adopts a Bi-directional gating circulation unit Bi-GRU to extract characteristics of time and sequence, then adopts a graph attention network GAN to construct a similar graph, nodes of the graph represent variables in the sequence, edges of the graph represent relations among the variables, a first graph attention layer is constructed on the graph, characteristic representation of the relation among different variables, namely variable learning, is extracted, a second graph attention layer is constructed on the graph, interaction between the learning variables and the sequence, namely sequence learning, finally adopts an automatic encoder to reconstruct a time sequence, calculates loss values, and performs experimental verification on a real data set.
The method comprises the following specific steps:
Step 1, data definition: define a set of multivariate time series data $X = \{x_1, x_2, \ldots, x_N\}$, $x_t \in \mathbb{R}^d$.
Step 2, feature learning (Feature Learning): extract the time-related features $v_i$ of the variables and the time series with a bidirectional gated recurrent unit (Bidirectional Gated Recurrent Unit, Bi-GRU).
Step 3, variable-level learning (Variable-level Learning): obtain the relationships $\alpha_{ij}$ between different variables with a graph attention network (Graph Attention Network, GAN).
Step 4, sequence-level learning (Sequence-level Learning): learn the evolving relationship between the variables and the time series with an attention mechanism and obtain the feature vector $s$ of the sequence.
Step 5, reconstruction-based detection (Reconstruction-based Detection): reconstruct the sequence X into $\hat{X}$ with an autoencoder (AE), thereby detecting abnormal sequences.
Further, in step 1, $x_t \in \mathbb{R}^d$ denotes the observed values (variable features) of the d variables at time step t, and N denotes the maximum length in time steps; the task is to detect whether a subsequence $X_{i:j} = \{x_i, \ldots, x_j\}$ of the normal series is abnormal, with $1 \le i \le N$ and $1 \le j \le N$.
In step 2, the steps of extracting the time-related features $v_i$ of the variables and the time series are:
1) Take a sequence $X_L \in \mathbb{R}^{d \times L}$ of length L, $1 \le L \le N$, as the input of the feature-learning model; the vectors $v_i$ and $s$ serve as the feature outputs of variable i and of the sequence, respectively;
2) Use a bidirectional gated recurrent unit (Bi-GRU) network model to capture the temporal dependencies in the time series and extract the variable and sequence features. Let $x^i = \{x^i_1, x^i_2, \ldots, x^i_L\}$ be the initialized feature representation of variable i, which consists of L consecutive observations (for example, readings over consecutive time steps of the running values of hardware such as the CPU, memory and network of a smart-grid server). The feature values are updated by nonlinear transformations:

$r^i_t = \sigma\big(W_r \, [h^i_{t-1}, x^i_t]\big)$
$z^i_t = \sigma\big(W_z \, [h^i_{t-1}, x^i_t]\big)$
$\tilde{h}^i_t = \tanh\big(W_i \, [r^i_t \odot h^i_{t-1}, x^i_t]\big)$
$h^i_t = z^i_t \odot h^i_{t-1} + (1 - z^i_t) \odot \tilde{h}^i_t$

where $W_r$, $W_z$ and $W_i$ all denote training weight matrices, $h^i_{t-1}$ denotes the model output for variable i at time step t-1, $r^i_t$ denotes the output of the Bi-GRU reset-gate neurons for variable i at time step t, $z^i_t$ denotes the output of the Bi-GRU update-gate neurons for variable i at time step t, and $\tilde{h}^i_t$ denotes the candidate hidden value of the fully connected layer computed with the tanh activation function for variable i at time step t; if the update-gate value is close to 0 the hidden value of the previous time step t-1 is discarded, and if it is close to 1 the hidden value of the previous time step t-1 is retained; $\overrightarrow{h}^i_t$ denotes the forward output of the Bi-GRU model and $\overleftarrow{h}^i_t$ its backward output; the final feature representation of variable i is:

$v_i = \big[\overrightarrow{h}^i_L \,;\, \overleftarrow{h}^i_L\big]$
In step 3, the steps of obtaining the relationships $\alpha_{ij}$ between different variables are:
1) Model the relationships with a graph attention network GAN to obtain the updated variable features. First construct a similarity graph between the different variables, the variable graph $G = \{V, E\}$, whose nodes V and edges E represent the variables and the relationships between them, respectively; the node set is $V = \{v_1, v_2, \ldots, v_d\}$, and the node features in V are the variable features extracted by the Bi-GRU, i.e. $\{v_1, v_2, \ldots, v_d\}$; the similarity between variables is computed as the cosine similarity:

$e_{ij} = \dfrac{v_i^{\top} v_j}{\lVert v_i \rVert \, \lVert v_j \rVert}$

and the results are sorted in descending order;
2) Select the top k most similar pairs as edges and model the relationships between the variables with a graph attention mechanism, thereby learning the variable features;
3) Use a multi-head attention mechanism to extract more expressive node features $v'_i$:

$v'_i = \Big\Vert_{h=1}^{H} \, \sigma\Big(\sum_{j \in N_i} \alpha^h_{ij} W^h v_j\Big)$

where H denotes the number of heads, $1 \le h \le H$, $N_i$ denotes the neighbourhood set of node i, i.e. the top k attention-weighted nodes $v_j$ connected to node $v_i$, of which node j is one, and $\alpha_{ij}$ is the attention weight of node i's neighbours normalized with the softmax function, specifically the attention value measuring the contribution of neighbour node j to node i; $\alpha_{ij}$ is computed as:

$r_{ij} = \mathrm{LeakyReLU}\big(W_r (v_i \oplus v_j)\big)$
$\alpha_{ij} = \dfrac{\exp(r_{ij})}{\sum_{l=1}^{d} \exp(r_{il})}$

where $r_{ij}$ denotes the dependency of node j on node i, $\oplus$ denotes concatenation, $W_r$ denotes training weights, LeakyReLU is a nonlinear activation function, and d is the number of nodes, i.e. the d variables of step 1.
In step 4, the steps of obtaining the feature vector s of the sequence are:
1) Learn the interactions between the variables and the sequence with an attention mechanism; the sequence attention weights $\beta_j$ are computed as:

$m_j = \mathrm{LeakyReLU}\big(W_m v_j\big)$
$\beta_j = \dfrac{\exp(m_j^{\top} s)}{\sum_{l=1}^{d} \exp(m_l^{\top} s)}$

where $m_j$ denotes the value of neighbour node j after the nonlinear LeakyReLU transformation, s denotes the feature vector of the sequence, the normalization is computed with the softmax function, and $W_m$ denotes training parameters;
2) Update the sequence feature:

$s' = \sum_{j=1}^{d} \beta_j v_j$
In step 5, the steps of reconstructing the sequence X into $\hat{X}$ are:
1) Obtain the sequence $X = \{x_1, x_2, \ldots, x_L\}$ through the hierarchical attention process of steps 1-4;
2) Reconstruct the sequence X with an autoencoder; let $f_e(\cdot)$ denote encoding and $f_d(\cdot)$ denote decoding; for the feature vector $s'$ of sequence X, the encoding process maps $s'$ to a latent representation z and the decoding process maps z to the reconstruction $\hat{X}$:

$z = f_e(s', W_e)$
$\hat{X} = f_d(z, W_d)$

where $W_e$ and $W_d$ are training parameters;
3) Compute the loss function Loss:

$\mathrm{Loss} = \big\lVert X - \hat{X} \big\rVert_2$

where $\lVert \cdot \rVert_2$ denotes the $\ell_2$ norm; if the reconstruction loss is greater than a certain threshold, the sequence is regarded as abnormal, and the threshold is adjusted continuously to obtain the maximum F1 score.
The invention has the following characteristics:
(1) To make full use of the information from earlier time steps (forward) and later time steps (backward) and to capture the temporal dependencies in the sequence, the invention adopts a bidirectional gated recurrent unit network Bi-GRU, which better extracts the variable and sequence features.
(2) Learning the features of the variables in a multivariate time series in isolation often fails to adequately capture the characteristics of sequence anomalies; moreover, the relationships between the variables can reveal different time-dependent patterns. To better detect anomalies in the sequence, the invention therefore models the relationships between the variables with the graph attention network GAN, analyzes their mutual influence, and integrates the variable features.
(3) The relationships between variables are not stable but evolve over time, and anomalies of strongly correlated variables can vary significantly across time sequences. Previous studies treat the variables and the sequence equally and assign them the same weight, which does not reflect the effect of the sequence on the variables. To capture the time-series dependencies, the invention learns the interactions between the variables and the sequence with another attention mechanism.
Drawings
FIG. 1 is the HAN-CAD framework of the detection method proposed by the present invention;
FIG. 2 shows the F1 values of the HAN-CAD, MTAD-GAT, GDN and InterFusion methods under different sliding windows;
FIG. 3 shows the F1 values of the MTAD-GAT, GDN and proposed HAN-CAD methods at different edge ratios on the three data sets.
Detailed Description
The technical solution of the present invention will be described in detail below with reference to the accompanying drawings. The embodiments described below are only some embodiments of the invention; a person skilled in the art can obtain other embodiments from them without inventive effort.
A multivariate time series anomaly detection method based on a hierarchical attention network: first, a bidirectional gated recurrent unit Bi-GRU extracts temporal and sequence features; then a graph attention network GAN builds a similarity graph, in which the nodes of the graph represent the variables in the sequence and the edges represent the relationships between the variables; a first graph attention layer is constructed on the graph to extract feature representations of the relationships between different variables, namely variable-level learning; a second graph attention layer is constructed on the graph to learn the interactions between the variables and the sequence, namely sequence-level learning; finally, an autoencoder reconstructs the time series, loss values are computed, and the method is experimentally verified on real data sets. The method comprises the following steps:
step 1: data definition
Define a set of multivariate time series data of length N, $X = \{x_1, x_2, \ldots, x_N\}$, where $x_t \in \mathbb{R}^d$ denotes the observed values (variable features) of the d variables at time step t and N denotes the maximum length in time steps; the task is to detect whether a subsequence $X_{i:j} = \{x_i, \ldots, x_j\}$ of the normal series is abnormal, with $1 \le i \le N$ and $1 \le j \le N$. For example, with i = 1, j = 4 and d = 1 (one variable, current; d = 2 would represent the two variables current and voltage), $X_{1:4}$ denotes the observed values, i.e. the feature values, of the current variable from time step 1 to time step 4.
Step 2: feature learning
1) Take a sequence $X_L \in \mathbb{R}^{d \times L}$ of length L, $1 \le L \le N$, as the input of the feature-learning model; the vectors $v_i$ and $s$ serve as the feature outputs of variable i and of the sequence, respectively;
2) Use a bidirectional gated recurrent unit (Bi-GRU) network model to capture the temporal dependencies in the time series and extract the variable and sequence features. Let $x^i = \{x^i_1, x^i_2, \ldots, x^i_L\}$ be the initialized feature representation of variable i, which consists of L consecutive observations (for example, readings over consecutive time steps of the running values of hardware such as the CPU, memory and network of a smart-grid server). The feature values are updated by nonlinear transformations:

$r^i_t = \sigma\big(W_r \, [h^i_{t-1}, x^i_t]\big)$ (1)
$z^i_t = \sigma\big(W_z \, [h^i_{t-1}, x^i_t]\big)$ (2)
$\tilde{h}^i_t = \tanh\big(W_i \, [r^i_t \odot h^i_{t-1}, x^i_t]\big)$ (3)
$h^i_t = z^i_t \odot h^i_{t-1} + (1 - z^i_t) \odot \tilde{h}^i_t$ (4)

In formulas (1), (2), (3) and (4), $W_r$, $W_z$ and $W_i$ all denote training weight matrices, and $h^i_{t-1}$ denotes the model output for variable i at time step t-1. $r^i_t$ denotes the output of the Bi-GRU reset-gate neurons for variable i at time step t, $z^i_t$ denotes the output of the Bi-GRU update-gate neurons for variable i at time step t, and $\tilde{h}^i_t$ denotes the candidate hidden value of the fully connected layer computed with the tanh activation function for variable i at time step t. If the update-gate value is close to 0, the hidden value of the previous time step t-1 is discarded; if it is close to 1, the hidden value of the previous time step t-1 is retained. $\overrightarrow{h}^i_t$ denotes the forward output of the Bi-GRU model and $\overleftarrow{h}^i_t$ its backward output. Thus, the final feature representation of variable i is:

$v_i = \big[\overrightarrow{h}^i_L \,;\, \overleftarrow{h}^i_L\big]$ (5)
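A minimal PyTorch sketch of this feature-learning step, under the assumption (not stated explicitly in the patent) that a shared Bi-GRU processes each variable's L observations as a univariate sequence and that $v_i$ concatenates the final forward and backward hidden states; all layer sizes are illustrative:

    import torch
    import torch.nn as nn

    class FeatureLearning(nn.Module):
        def __init__(self, hidden=32):
            super().__init__()
            # a bidirectional GRU implements the gate updates of formulas (1)-(4)
            self.gru = nn.GRU(input_size=1, hidden_size=hidden,
                              batch_first=True, bidirectional=True)

        def forward(self, x):                          # x: (batch, L, d)
            b, L, d = x.shape
            u = x.permute(0, 2, 1).reshape(b * d, L, 1)  # one sequence per variable
            out, _ = self.gru(u)                       # (b*d, L, 2*hidden)
            v = out[:, -1, :]                          # forward/backward concat, as in formula (5)
            return v.reshape(b, d, -1)                 # v_i per variable: (b, d, 2*hidden)

    v = FeatureLearning()(torch.randn(8, 100, 2))      # batch of 8 windows, L=100, d=2
    print(v.shape)                                     # torch.Size([8, 2, 64])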
step 3: variable learning
1) Model the relationships with a graph attention network GAN to obtain the updated variable features. First construct a similarity graph between the different variables, the variable graph $G = \{V, E\}$, whose nodes V and edges E represent the variables and the relationships between them, respectively; the node set is $V = \{v_1, v_2, \ldots, v_d\}$, and the node features in V are the variable features extracted by the Bi-GRU in step 2, i.e. $\{v_1, v_2, \ldots, v_d\}$. The similarity between variables is computed as the cosine similarity:

$e_{ij} = \dfrac{v_i^{\top} v_j}{\lVert v_i \rVert \, \lVert v_j \rVert}$ (6)

The results are sorted in descending order.
2) Select the top k most similar pairs as edges and model the relationships between the variables with a graph attention mechanism, thereby learning the variable features.
3) Use a multi-head attention mechanism to extract more expressive node features $v'_i$:

$v'_i = \Big\Vert_{h=1}^{H} \, \sigma\Big(\sum_{j \in N_i} \alpha^h_{ij} W^h v_j\Big)$ (7)

In formula (7), H denotes the number of heads, $1 \le h \le H$, $N_i$ denotes the neighbourhood set of node i, i.e. the top k attention-weighted nodes $v_j$ connected to node $v_i$, of which node j is one, and $\alpha_{ij}$ is the attention weight of node i's neighbours normalized with the softmax function, specifically the attention value measuring the contribution of neighbour node j to node i. $\alpha_{ij}$ is computed as:

$r_{ij} = \mathrm{LeakyReLU}\big(W_r (v_i \oplus v_j)\big)$ (8)
$\alpha_{ij} = \dfrac{\exp(r_{ij})}{\sum_{l=1}^{d} \exp(r_{il})}$ (9)

In formulas (8) and (9), $r_{ij}$ denotes the dependency of node j on node i, $\oplus$ denotes concatenation, $W_r$ denotes training weights, LeakyReLU is a nonlinear activation function, and d is the number of nodes, i.e. the d variables of step 1.
Step 4: sequence learning
1) Learn the interactions between the variables and the sequence with an attention mechanism; the sequence attention weights $\beta_j$ are computed as:

$m_j = \mathrm{LeakyReLU}\big(W_m v_j\big)$ (10)
$\beta_j = \dfrac{\exp(m_j^{\top} s)}{\sum_{l=1}^{d} \exp(m_l^{\top} s)}$ (11)

In formulas (10) and (11), $m_j$ denotes the value of neighbour node j after the nonlinear LeakyReLU transformation, s denotes the feature vector of the sequence, the normalization is computed with the softmax function, and $W_m$ denotes training parameters.
2) Update the sequence feature:

$s' = \sum_{j=1}^{d} \beta_j v_j$ (12)
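A short sketch of this sequence-level attention (Python/PyTorch; scoring $m_j$ against s in formula (11) is a reconstruction from the surrounding text, and all names and shapes are illustrative):

    import torch
    import torch.nn.functional as F

    def sequence_attention(v, s, W_m):
        # v: (d, f) variable features, s: (f,) sequence feature
        m = F.leaky_relu(v @ W_m)                # m_j, formula (10)
        beta = torch.softmax(m @ s, dim=0)       # beta_j, formula (11)
        s_new = (beta.unsqueeze(-1) * v).sum(0)  # s' = sum_j beta_j v_j, formula (12)
        return s_new, beta

    v, s = torch.randn(5, 8), torch.randn(8)
    W_m = torch.randn(8, 8)
    s_new, beta = sequence_attention(v, s, W_m)
    print(s_new.shape, float(beta.sum()))        # torch.Size([8]) 1.0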
step 5: detection method based on reconstruction
1) Obtain the sequence $X = \{x_1, x_2, \ldots, x_L\}$ through the hierarchical attention process of steps 1 to 4.
2) Reconstruct the sequence X with an autoencoder. Let $f_e(\cdot)$ denote encoding and $f_d(\cdot)$ denote decoding; for the feature vector $s'$ of sequence X, the encoding process maps $s'$ to a latent representation z and the decoding process maps z to the reconstruction $\hat{X}$:

$z = f_e(s', W_e)$ (13)
$\hat{X} = f_d(z, W_d)$ (14)

In formulas (13) and (14), $W_e$ and $W_d$ are training parameters.
3) Compute the loss function Loss:

$\mathrm{Loss} = \big\lVert X - \hat{X} \big\rVert_2$ (15)

In formula (15), $\lVert \cdot \rVert_2$ denotes the $\ell_2$ norm. If the reconstruction loss is greater than a certain threshold, the sequence can be regarded as abnormal; the threshold is adjusted continuously to obtain the maximum F1 score.
The validity of the proposed method is verified based on the actual data.
The experiments use three multivariate time-series anomaly detection datasets: SMD, WADI and ASD. The SMD dataset is available at https://github.com/NetManAIOps/OmniAnomaly, the WADI dataset at https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/, and the ASD dataset at https://github.com/zhhlee/InterFusion. SMD and WADI are experimental datasets commonly used for multivariate time-series anomaly detection, and ASD is a newer dataset from a large Internet company. The sample information of each dataset is shown in Table 1.
TABLE 1 Sample information of the three data sets

Data set                   ASD     SMD     WADI
Feature number             19      38      112
Training sample number     8640    28479   335999
Test sample number         4320    28479   172801
Abnormal sample ratio (%)  3.40    5.84    5.85
In the experiments, the sliding-window lengths for ASD, SMD and WADI were set to 100, 100 and 30, respectively. Model parameters were optimized with the Adam optimizer, the learning rate was set to 5e-4, and the variable and sequence representation length was 64. The dropout algorithm was used to prevent overfitting of the training results, with the dropout probability set to 0.2, meaning that 20% of the training model's neurons are randomly dropped, i.e. made inactive. The number of heads of the multi-head attention mechanism was 2. All experiments were trained on a Microsoft server with a 3.60 GHz Intel i9-9900K CPU, 11 GB of GPU memory, and an Nvidia GeForce RTX 2080 Ti graphics chip.
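The experimental settings above, collected into one configuration sketch (Python/PyTorch; the dictionary layout and the placeholder model are illustrative, only the values come from the text above):

    import torch

    config = {
        "window": {"ASD": 100, "SMD": 100, "WADI": 30},  # sliding-window lengths
        "lr": 5e-4,                                      # Adam learning rate
        "repr_dim": 64,                                  # variable/sequence representation
        "dropout": 0.2,                                  # 20% of neurons dropped
        "heads": 2,                                      # multi-head attention heads
    }

    model = torch.nn.Linear(1, 1)                        # placeholder for the HAN-CAD model
    optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])
    dropout = torch.nn.Dropout(p=config["dropout"])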
To verify the superiority of the proposed HAN-CAD method, five recently proposed MTS anomaly detection methods, LSTM-AE, MAD-GAN, MTAD-GAT, GDN and InterFusion, were selected and compared with HAN-CAD in terms of precision, recall and F1 value. The experimental results are shown in Table 2. They show that the proposed HAN-CAD detection method achieves better precision and recall and the largest F1 value. Table 2 also shows that: 1) ranked from best to worst, the methods are the proposed method, InterFusion, GDN, MTAD-GAT, MAD-GAN and LSTM-AE, where the proposed HAN-CAD, InterFusion, GDN and MTAD-GAT are all superior to the traditional reconstruction methods; 2) all detection methods perform worse on the WADI dataset than on the other two datasets, because WADI contains 112 variables with complex relationships among them, which makes anomaly detection difficult; nevertheless, the proposed HAN-CAD performs best, verifying that it can capture more complex variable relationships, namely the dynamic relationships between variables in different sequences.
TABLE 2 Precision, recall and F1 values of LSTM-AE, MAD-GAN, MTAD-GAT, GDN, InterFusion and the proposed HAN-CAD method
FIG. 2 compares the F1 values of the proposed HAN-CAD method with MTAD-GAT, GDN and InterFusion under different sliding windows; in each group of FIG. 2, the first bar from the left is MTAD-GAT, the second GDN, the third InterFusion, and the fourth HAN-CAD. FIG. 2 shows that the proposed HAN-CAD method always performs best and most stably on the three datasets, and obtains the highest F1 value when the sliding-window length is 100. The results of the other three methods fluctuate to some extent, indicating that integrating the relationships between variables with the time series makes graph-neural-network-based MTS anomaly detection more robust.
Fig. 3 shows the F1 values of MTAD-GAT, GDN and the proposed HAN-CAD method at different edge ratios on the three datasets. Fig. 3 shows that the proposed HAN-CAD outperforms the graph-neural-network methods MTAD-GAT and GDN. In addition, MTAD-GAT and GDN perform worse when the graph has fewer edges, since graph-neural-network-based anomaly detection methods can extract few nonlinear structural features from a sparse graph with a small number of edges.
Table 3 compares, on the three datasets, the precision, recall and F1 values of the proposed HAN-CAD method, the w/o feature learning variant (HAN-CAD without Bi-GRU feature learning), and the w/o variable learning variant (HAN-CAD without GAN variable learning). Table 3 shows that without Bi-GRU and GAN the experimental result values are considerably lower, illustrating that the graph attention mechanism is very important for feature learning, because it can extract the complex relationships between variables.
TABLE 3 Precision, recall and F1 values of the HAN-CAD, w/o feature learning and w/o variable learning methods
Based on a hierarchical attention network, the method extracts the dynamic characteristics of variables in different time sequences, improves MTS anomaly detection efficiency, effectively detects anomalies in multivariate time series, and achieves experimental results superior to the current state-of-the-art methods.

Claims (7)

1. A multivariate time series anomaly detection method based on a hierarchical attention network, characterized by comprising the following steps: first, extracting temporal and sequence features with a bidirectional gated recurrent unit Bi-GRU; then constructing a similarity graph with a graph attention network GAN, the nodes of the graph representing the variables in the sequence and the edges representing the relationships between the variables; constructing a first graph attention layer on the graph and extracting feature representations of the relationships between different variables, namely variable-level learning; constructing a second graph attention layer on the graph and learning the interactions between the variables and the sequence, namely sequence-level learning; finally, reconstructing the time series with an autoencoder, computing loss values, and performing experimental verification on real data sets.
2. The hierarchical attention network based multivariate time series anomaly detection method of claim 1, comprising the following steps:
Step 1, data definition: define a set of multivariate time series data $X = \{x_1, x_2, \ldots, x_N\}$, $x_t \in \mathbb{R}^d$;
Step 2, feature learning: extract the time-related features $v_i$ of the variables and the time series with a bidirectional gated recurrent unit;
Step 3, variable-level learning: obtain the relationships $\alpha_{ij}$ between different variables with a graph attention network;
Step 4, sequence-level learning: learn the evolving relationship between the variables and the time series with an attention mechanism and obtain the feature vector s of the sequence;
Step 5, reconstruction-based detection: reconstruct the sequence X into $\hat{X}$ with an autoencoder, thereby detecting abnormal sequences.
3. The hierarchical attention network based multivariate time series anomaly detection method of claim 2, wherein: in step 1, $x_t \in \mathbb{R}^d$ denotes the observed values (variable features) of the d variables at time step t, and N denotes the maximum length in time steps; the task is to detect whether a subsequence $X_{i:j} = \{x_i, \ldots, x_j\}$ of the normal series is abnormal, with $1 \le i \le N$ and $1 \le j \le N$.
4. The hierarchical attention network based multivariate time series anomaly detection method of claim 2, wherein in step 2 the steps of extracting the time-related features $v_i$ of the variables and the time series are:
1) Take a sequence $X_L \in \mathbb{R}^{d \times L}$ of length L, $1 \le L \le N$, as the input of the feature-learning model; the vectors $v_i$ and $s$ serve as the feature outputs of variable i and of the sequence, respectively;
2) Use a bidirectional gated recurrent unit (Bi-GRU) network model to capture the temporal dependencies in the time series and extract the variable and sequence features; let $x^i = \{x^i_1, x^i_2, \ldots, x^i_L\}$ be the initialized feature representation of variable i, which consists of L consecutive observations; the feature values are updated by nonlinear transformations:

$r^i_t = \sigma\big(W_r \, [h^i_{t-1}, x^i_t]\big)$
$z^i_t = \sigma\big(W_z \, [h^i_{t-1}, x^i_t]\big)$
$\tilde{h}^i_t = \tanh\big(W_i \, [r^i_t \odot h^i_{t-1}, x^i_t]\big)$
$h^i_t = z^i_t \odot h^i_{t-1} + (1 - z^i_t) \odot \tilde{h}^i_t$

where $W_r$, $W_z$ and $W_i$ all denote training weight matrices, $h^i_{t-1}$ denotes the model output for variable i at time step t-1, $r^i_t$ denotes the output of the Bi-GRU reset-gate neurons for variable i at time step t, $z^i_t$ denotes the output of the Bi-GRU update-gate neurons for variable i at time step t, and $\tilde{h}^i_t$ denotes the candidate hidden value of the fully connected layer computed with the tanh activation function for variable i at time step t; if the update-gate value is close to 0 the hidden value of the previous time step t-1 is discarded, and if it is close to 1 the hidden value of the previous time step t-1 is retained; $\overrightarrow{h}^i_t$ denotes the forward output of the Bi-GRU model and $\overleftarrow{h}^i_t$ its backward output; the final feature representation of variable i is:

$v_i = \big[\overrightarrow{h}^i_L \,;\, \overleftarrow{h}^i_L\big]$
5. The hierarchical attention network based multivariate time series anomaly detection method of claim 2, wherein in step 3 the steps of obtaining the relationships $\alpha_{ij}$ between different variables are:
1) Model the relationships with a graph attention network GAN to obtain the updated variable features; first construct a similarity graph between the different variables, the variable graph $G = \{V, E\}$, whose nodes V and edges E represent the variables and the relationships between them, respectively; the node set is $V = \{v_1, v_2, \ldots, v_d\}$, and the node features in V are the variable features extracted by the Bi-GRU, i.e. $\{v_1, v_2, \ldots, v_d\}$; the similarity between variables is computed as the cosine similarity:

$e_{ij} = \dfrac{v_i^{\top} v_j}{\lVert v_i \rVert \, \lVert v_j \rVert}$

and the results are sorted in descending order;
2) Select the top k most similar pairs as edges and model the relationships between the variables with a graph attention mechanism, thereby learning the variable features;
3) Use a multi-head attention mechanism to extract more expressive node features $v'_i$:

$v'_i = \Big\Vert_{h=1}^{H} \, \sigma\Big(\sum_{j \in N_i} \alpha^h_{ij} W^h v_j\Big)$

where H denotes the number of heads, $1 \le h \le H$, $N_i$ denotes the neighbourhood set of node i, i.e. the top k attention-weighted nodes $v_j$ connected to node $v_i$, of which node j is one, and $\alpha_{ij}$ is the attention weight of node i's neighbours normalized with the softmax function, specifically the attention value measuring the contribution of neighbour node j to node i; $\alpha_{ij}$ is computed as:

$r_{ij} = \mathrm{LeakyReLU}\big(W_r (v_i \oplus v_j)\big)$
$\alpha_{ij} = \dfrac{\exp(r_{ij})}{\sum_{l=1}^{d} \exp(r_{il})}$

where $r_{ij}$ denotes the dependency of node j on node i, $\oplus$ denotes concatenation, $W_r$ denotes training weights, LeakyReLU is a nonlinear activation function, and d is the number of nodes.
6. The hierarchical attention network based multivariate time series anomaly detection method of claim 2, wherein in step 4 the steps of obtaining the feature vector s of the sequence are:
1) Learn the interactions between the variables and the sequence with an attention mechanism; the sequence attention weights $\beta_j$ are computed as:

$m_j = \mathrm{LeakyReLU}\big(W_m v_j\big)$
$\beta_j = \dfrac{\exp(m_j^{\top} s)}{\sum_{l=1}^{d} \exp(m_l^{\top} s)}$

where $m_j$ denotes the value of neighbour node j after the nonlinear LeakyReLU transformation, s denotes the feature vector of the sequence, the normalization is computed with the softmax function, and $W_m$ denotes training parameters;
2) Update the sequence feature:

$s' = \sum_{j=1}^{d} \beta_j v_j$
7. The hierarchical attention network based multivariate time series anomaly detection method of claim 2, wherein in step 5 the steps of reconstructing the sequence X into $\hat{X}$ are:
1) Obtain the sequence $X = \{x_1, x_2, \ldots, x_L\}$ through the hierarchical attention process of steps 1-4;
2) Reconstruct the sequence X with an autoencoder; let $f_e(\cdot)$ denote encoding and $f_d(\cdot)$ denote decoding; for the feature vector $s'$ of sequence X, the encoding process maps $s'$ to a latent representation z and the decoding process maps z to the reconstruction $\hat{X}$:

$z = f_e(s', W_e)$
$\hat{X} = f_d(z, W_d)$

where $W_e$ and $W_d$ are training parameters;
3) Compute the loss function Loss:

$\mathrm{Loss} = \big\lVert X - \hat{X} \big\rVert_2$

where $\lVert \cdot \rVert_2$ denotes the $\ell_2$ norm; if the reconstruction loss is greater than a certain threshold, the sequence is regarded as abnormal, and the threshold is adjusted continuously to obtain the maximum F1 score.
CN202310024568.6A 2023-01-09 2023-01-09 Multivariate time series anomaly detection method based on hierarchical attention network Pending CN116361640A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310024568.6A CN116361640A (en) Multivariate time series anomaly detection method based on hierarchical attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310024568.6A CN116361640A (en) Multivariate time series anomaly detection method based on hierarchical attention network

Publications (1)

Publication Number Publication Date
CN116361640A true CN116361640A (en) 2023-06-30

Family

ID=86938454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310024568.6A Pending CN116361640A (en) Multivariate time series anomaly detection method based on hierarchical attention network

Country Status (1)

Country Link
CN (1) CN116361640A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116541794A (en) * 2023-07-06 2023-08-04 中国科学技术大学 Sensor data anomaly detection method based on adaptive graph attention network
CN116541794B (en) * 2023-07-06 2023-10-20 中国科学技术大学 Sensor data anomaly detection method based on adaptive graph attention network

Similar Documents

Publication Publication Date Title
Wang et al. A novel weighted sparse representation classification strategy based on dictionary learning for rotating machinery
CN110020623B (en) Human body activity recognition system and method based on conditional variation self-encoder
CN111914873A (en) Two-stage cloud server unsupervised anomaly prediction method
Che et al. Hybrid multimodal fusion with deep learning for rolling bearing fault diagnosis
CN109612513B (en) Online anomaly detection method for large-scale high-dimensional sensor data
CN113673346B (en) Motor vibration data processing and state identification method based on multiscale SE-Resnet
Lee et al. Studies on the GAN-based anomaly detection methods for the time series data
CN113159163A (en) Lightweight unsupervised anomaly detection method based on multivariate time series data analysis
CN112465798B (en) Anomaly detection method based on generation countermeasure network and memory module
CN116361640A (en) Multivariate time series anomaly detection method based on hierarchical attention network
CN116522265A (en) Industrial Internet time sequence data anomaly detection method and device
CN114067915A (en) scRNA-seq data dimension reduction method based on deep antithetical variational self-encoder
CN116796272A (en) Method for detecting multivariate time series anomalies based on Transformer
CN112163020A (en) Multi-dimensional time series anomaly detection method and system
CN116451117A (en) Power data anomaly detection method based on federal learning
CN115587335A (en) Training method of abnormal value detection model, abnormal value detection method and system
CN117056874A (en) Unsupervised electricity larceny detection method based on deep twin autoregressive network
Zhang et al. MS-TCN: A multiscale temporal convolutional network for fault diagnosis in industrial processes
Terbuch et al. Hybrid machine learning for anomaly detection in industrial time-series measurement data
CN116306780B (en) Dynamic graph link generation method
CN117009900A (en) Internet of things signal anomaly detection method and system based on graph neural network
CN111858343A (en) Countermeasure sample generation method based on attack capability
CN116400168A (en) Power grid fault diagnosis method and system based on depth feature clustering
CN110990383A (en) Similarity calculation method based on industrial big data set
Wu et al. Genetic-algorithm-based Convolutional Neural Network for Robust Time Series Classification with Unreliable Data.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination