CN114298200A

CN114298200A - Abnormal data diagnosis method based on deep parallel time sequence relation network

Info

Publication number: CN114298200A
Application number: CN202111589040.0A
Authority: CN
Inventors: 凡时财; 杨淳; 邹见效; 徐红兵
Original assignee: Higher Research Institute Of University Of Electronic Science And Technology Shenzhen
Current assignee: Higher Research Institute Of University Of Electronic Science And Technology Shenzhen
Priority date: 2021-12-23
Filing date: 2021-12-23
Publication date: 2022-04-08

Abstract

The invention discloses an abnormal data diagnosis method based on a deep parallel time sequence relation network, which comprises the steps of collecting characteristic data under various abnormal working conditions of an industrial production system and standardizing the characteristic data to obtain a training data matrix, then extracting to obtain a characteristic vector sequence, taking the characteristic vector sequence as input and taking the corresponding abnormal working condition serial number as output to form a training sample, and constructing a DPTRN model. The invention can improve the processing speed of time sequence data and ensure the detection performance of abnormal data.

Description

Abnormal data diagnosis method based on deep parallel time sequence relation network

Technical Field

The invention belongs to the technical field of diagnosis of abnormal data in an industrial process, and particularly relates to an abnormal data diagnosis method based on a deep parallel time sequence relation network.

Background

With the continuous development of modern industrial technology, industrial systems keep growing in scale and complexity. If abnormal data arising in an industrial process is not identified and handled in time, it not only causes economic loss but, in serious cases, also threatens personal safety. It is therefore essential to monitor industrial processes with robust and reliable abnormal data diagnosis techniques.

Traditional model-based abnormal data diagnosis methods suffer from high complexity, poor maintainability and low robustness, and cannot keep up with increasingly modern industrial systems, so data-driven methods are attracting growing attention. A data-driven method analyzes the underlying patterns in historical data collected from the industrial process and derives a data model that is both robust and accurate, so that abnormal data detection or fault diagnosis can be performed on new data.

Deep learning, the most popular data-driven approach in recent years, has achieved considerable practical success in industrial fault detection and diagnosis. Compared with traditional machine learning methods, deep learning avoids a large amount of manual feature engineering, automatically learns latent high-dimensional representations of the data, and shows clear advantages on various evaluation metrics.

Time-series data is data recorded in chronological order. Industrial processes generally evolve over time, and the features at a single point in time cannot fully capture that evolution. Abnormal data diagnosis methods based on time-series data make fuller use of historical information, learn how the industrial process changes over time, and therefore have strong feature extraction and abnormal data diagnosis capability.

Deep learning on time-series data generally employs neural networks based on the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), or Gated Recurrent Unit (GRU). However, these structures extract features by processing the time-series data serially, which limits the data processing speed and cannot meet the requirement of fast, real-time diagnosis in industrial processes.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides an abnormal data diagnosis method based on a Deep Parallel Time sequence Relation Network (DPTRN). A Multilayer Perceptron (MLP) is used to combine the relation features among the feature data at different moments, and the DPTRN model is built on this basis. This realizes parallel processing of time sequence data, greatly improves the processing speed of time sequence data, and ensures the detection performance for abnormal data.

In order to achieve the above object, the method for diagnosing abnormal data based on a deep parallel time series relationship network of the present invention comprises the following steps:

S1: under D abnormal working conditions of the industrial production system, a plurality of preset sensors acquire working data of each abnormal working condition, and the dimensionality of the feature vector at each sampling moment is M; the feature vector obtained at the t-th sampling moment under the d-th abnormal working condition is recorded as x_d(t), d = 1,2,…,D, t = 1,2,…,N_d, where N_d represents the number of sampling moments under the d-th abnormal working condition; the feature vectors x_d(t) are taken as row vectors and arranged in ascending order of sampling time to obtain the original training data matrix X_d:

$$X_d = \begin{bmatrix} x_d(1) \\ x_d(2) \\ \vdots \\ x_d(N_d) \end{bmatrix}$$

S2: the original training data matrix X_d is normalized to obtain a normalized training data matrix $\hat{X}_d$;

S3: the feature vectors $\hat{x}_d(t)$ of the training data matrix $\hat{X}_d$ are divided, according to a preset time sequence length K, into Q_d feature vector sequences

$$Z_d^{q} = \big[\hat{x}_d((q-1)K+1),\ \hat{x}_d((q-1)K+2),\ \ldots,\ \hat{x}_d(qK)\big]$$

where q = 1,2,…,Q_d, $Q_d = \lfloor N_d / K \rfloor$ represents the number of feature vector sequences obtained by dividing the training data matrix $\hat{X}_d$, and $\lfloor \cdot \rfloor$ represents rounding down;

S4: each feature vector sequence $Z_d^{q}$ obtained in step S3 is taken as the input of a training sample, and the serial number d of the corresponding abnormal working condition is taken as the expected output, thereby forming a training sample;

S5: a DPTRN model is constructed, which comprises a relation module, a decoupling position vector calculation module, a relation weight calculation module, a history information vector calculation module, a vector splicing module and a multilayer perceptron, wherein:

the relation module is used for extracting a preliminary relation weight vector from the input feature vector sequence and sending it to the relation weight calculation module, specifically as follows:

the input feature vector sequence is recorded as Z = [z(1), z(2), …, z(K)], where z(k) represents the k-th M-dimensional feature vector in the feature vector sequence Z, k = 1,2,…,K;

the relation module comprises K-1 relation units, wherein the k′-th relation unit is used for calculating the preliminary relation weight rw_pre(k′) between the feature vector z(k′) and the feature vector z(K), k′ = 1,2,…,K-1, thus obtaining the preliminary relation weight vector RW_pre = [rw_pre(1), rw_pre(2), …, rw_pre(K-1)];

each relation unit comprises a vector splicing unit and a multilayer perceptron unit, wherein:

the vector splicing unit is used for splicing the feature vector z(k′) and the feature vector z(K) to obtain a spliced feature vector C(k′), which is sent to the multilayer perceptron unit; the expression of the feature vector C(k′) is as follows:

$$C(k') = \mathrm{concat}\big(z(k'),\ z(K),\ z(K) - z(k'),\ z(K) + z(k')\big)$$

wherein concat() represents vector splicing;

the multilayer perceptron unit receives the spliced feature vector C(k′) and processes it to obtain the preliminary relation weight rw_pre(k′) between the feature vector z(k′) and the feature vector z(K);

the decoupling position vector calculation module is used for extracting the decoupled position vector of the historical moments and the current moment from the input feature vector sequence Z and sending it to the relation weight calculation module; the decoupling position vector calculation module comprises a position coding module and a position vector decoupling module, wherein:

the position coding module is used for generating a corresponding M-dimensional position code PE(k) for each feature vector z(k) in the feature vector sequence Z and sending it to the position vector decoupling module;

the position vector decoupling module calculates, from the position codes PE(k) of the feature vectors z(k), the decoupled position code dpe(k′) of the feature vector z(k′) at each of the previous K-1 historical moments, thereby obtaining the decoupled position vector DPE = [dpe(1), dpe(2), …, dpe(K-1)]; dpe(k′) is calculated as:

dpe(k′) = [PE(k′)·Pos_query] ⊙ [PE(K)·Pos_key]

wherein ⊙ denotes the inner product, Pos_query represents the position vector query matrix, and Pos_key represents the position vector key-value matrix;

the relation weight calculation module is used for calculating the final relation weight vector RW = [rw(1), rw(2), …, rw(K-1)] from the preliminary relation weight vector RW_pre and the decoupled position vector DPE = [dpe(1), dpe(2), …, dpe(K-1)], wherein each rw(k′) is obtained from rw_pre(k′) and dpe(k′), and then sending the relation weight vector RW to the history information vector calculation module;

the history information vector calculation module is used for processing the first K-1 feature vectors z(k′) in the feature vector sequence Z according to the relation weight vector RW to obtain the weighted feature vectors

$$\tilde{z}(k') = rw(k') \otimes z(k')$$

where $\otimes$ represents the vector outer product; the K-1 feature vectors $\tilde{z}(k')$ are then sum-pooled to obtain the history information vector HI, which is sent to the vector splicing module;

the vector splicing module is used for splicing the history information vector HI and the feature vector z(K) in the feature vector sequence Z and sending the resulting spliced vector Con to the multilayer perceptron;

the multilayer perceptron is used for processing the spliced vector Con to obtain the abnormal working condition serial number corresponding to the input feature vector sequence;

S6: the DPTRN model constructed in step S5 is trained with the training samples obtained in step S4 to obtain a trained DPTRN model;

S7: when abnormal data diagnosis needs to be carried out on the industrial production system, the same working data acquisition method as in step S1 is used to obtain the M-dimensional feature vectors at the current moment T and the previous K-1 moments, which form a data matrix X_T:

$$X_T = \begin{bmatrix} x(T-K+1) \\ x(T-K+2) \\ \vdots \\ x(T) \end{bmatrix}$$

the data matrix X_T is standardized by the same method as in step S2 to obtain a standardized data matrix $\hat{X}_T$;

the data matrix $\hat{X}_T$ is input into the DPTRN model trained in step S6 to obtain the abnormal data diagnosis result.

The invention relates to an abnormal data diagnosis method based on a deep parallel time sequence relation network, which comprises the steps of collecting characteristic data under various abnormal working conditions of an industrial production system and standardizing the characteristic data to obtain a training data matrix, then extracting to obtain a characteristic vector sequence, taking the characteristic vector sequence as input and taking the corresponding abnormal working condition serial number as output to form a training sample, and constructing a DPTRN model.

The DPTRN model is constructed based on the multilayer perceptron, and can capture the relation between each historical moment and the current moment in time sequence data, so that the DPTRN model has the capability of processing data in parallel. Compared with the traditional method of extracting the time sequence data characteristics in a serial mode by using models such as RNN, LSTM and GRU, the data processing efficiency of the method is greatly improved. In addition, by means of technologies such as decoupling position coding and relation weight, the abnormal data diagnosis capability of the method is guaranteed.

Drawings

FIG. 1 is a flow chart of an embodiment of an abnormal data diagnosis method based on a deep parallel time series relationship network according to the present invention;

FIG. 2 is a block diagram of a DPTRN model according to the present invention;

FIG. 3 is a block diagram of a relationship unit in the present invention;

FIG. 4 is a block diagram of the decoupled position vector calculation module of the present invention.

Detailed Description

Specific embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. It should be noted that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the present invention.

Examples

FIG. 1 is a flowchart of an embodiment of an abnormal data diagnosis method based on a deep parallel time series relationship network according to the present invention. As shown in fig. 1, the method for diagnosing abnormal data based on a deep parallel time series relationship network of the present invention specifically includes the steps of:

S101: collecting training data:

under D abnormal working conditions of the industrial production system, a plurality of preset sensors collect working data of each abnormal working condition, and the dimension of the feature vector at each sampling moment is M, i.e., the number of feature data at each sampling moment is M. The feature vector obtained at the t-th sampling moment under the d-th abnormal working condition is recorded as x_d(t), d = 1,2,…,D, t = 1,2,…,N_d, where N_d represents the number of sampling moments under the d-th abnormal working condition. The feature vectors x_d(t) are taken as row vectors and arranged in ascending order of sampling time to obtain the original training data matrix X_d:

$$X_d = \begin{bmatrix} x_d(1) \\ x_d(2) \\ \vdots \\ x_d(N_d) \end{bmatrix}$$

S102: training data normalization:

to facilitate subsequent data processing, the original data matrix X_d is normalized to obtain a normalized training data matrix $\hat{X}_d$.

The standardized calculation formula in this embodiment is:

$$\hat{x}_d(t)(m) = \frac{x_d(t)(m) - \mathrm{mean}(X_d(m))}{\mathrm{std}(X_d(m))}$$

wherein x_d(t)(m) represents the m-th feature data of the feature vector x_d(t), m = 1,2,…,M, $\hat{x}_d(t)(m)$ represents the normalized value of the feature data x_d(t)(m), mean(X_d(m)) represents the mean of the m-th feature data over all feature vectors of the original data matrix X_d, and std(X_d(m)) represents the standard deviation of the m-th feature data over all feature vectors of the original data matrix X_d.

With this formula, each feature is rescaled to zero mean and unit variance.
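The standardization step can be illustrated with a minimal NumPy sketch. The function name `standardize`, the toy matrix, and the guard for constant features are illustrative assumptions and are not taken from the patent; the population standard deviation (`ddof=0`) is also an assumed choice.

```python
import numpy as np

def standardize(X_d: np.ndarray) -> np.ndarray:
    """Column-wise z-score of an (N_d, M) matrix whose rows are x_d(t)."""
    mean = X_d.mean(axis=0)               # mean of each feature m over all rows
    std = X_d.std(axis=0)                 # standard deviation of each feature (ddof=0)
    std = np.where(std == 0, 1.0, std)    # assumption: constant features map to 0
    return (X_d - mean) / std

# Toy data: N_d = 4 sampling moments, M = 2 features.
X_d = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0],
                [4.0, 40.0]])
X_hat = standardize(X_d)
```

After the call, every column of `X_hat` has zero mean and unit variance regardless of the original scale of the sensor.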

S103: data time sequencing:

in order to improve the accuracy of abnormal data diagnosis, the invention uses time sequence data as the input of the abnormal data diagnosis model, so the training data matrix $\hat{X}_d$ needs to be divided into time sequence data. In order to avoid data leakage among samples, the invention uses non-overlapping time sequence windows, specifically as follows:

the feature vectors $\hat{x}_d(t)$ of the training data matrix $\hat{X}_d$ are divided, according to a preset time sequence length K, into Q_d feature vector sequences

$$Z_d^{q} = \big[\hat{x}_d((q-1)K+1),\ \hat{x}_d((q-1)K+2),\ \ldots,\ \hat{x}_d(qK)\big]$$

where q = 1,2,…,Q_d, $Q_d = \lfloor N_d / K \rfloor$ represents the number of feature vector sequences obtained by dividing the training data matrix $\hat{X}_d$, and $\lfloor \cdot \rfloor$ represents rounding down.
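The non-overlapping split can be sketched as follows. The helper name `split_sequences` and the toy sizes are hypothetical; the sketch also assumes that trailing rows which do not fill a whole window are discarded, which is what rounding down the number of sequences implies.

```python
import numpy as np

def split_sequences(X_hat: np.ndarray, K: int) -> np.ndarray:
    """Cut an (N_d, M) matrix into floor(N_d / K) non-overlapping windows of length K."""
    N_d, M = X_hat.shape
    Q_d = N_d // K                                  # number of sequences, rounded down
    # Rows beyond Q_d * K do not fill a whole window and are dropped.
    return X_hat[:Q_d * K].reshape(Q_d, K, M)

X_hat = np.arange(22, dtype=float).reshape(11, 2)   # N_d = 11, M = 2
Z = split_sequences(X_hat, K=4)                     # two windows, three rows dropped
```

Because consecutive windows share no rows, no sampling moment appears in more than one training sample.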

S104: obtaining a training sample:

each feature vector sequence $Z_d^{q}$ obtained in step S103 is taken as the input of a training sample, and the corresponding abnormal working condition serial number d is taken as the expected output, thereby forming a training sample.

S105: constructing a DPTRN model:

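in order to realize abnormal data diagnosis, a DPTRN model is constructed in the invention. Fig. 2 is a structural diagram of the DPTRN model in the present invention. As shown in fig. 2, the DPTRN model of the present invention includes a relation module, a decoupling position vector calculation module, a history information vector calculation module, a vector splicing module, and a multilayer perceptron; each module is described in detail below.

The relation module is used for extracting a preliminary relation weight vector from the input feature vector sequence. The input feature vector sequence is recorded as Z = [z(1), z(2), …, z(K)], where z(k) represents the k-th M-dimensional feature vector in the feature vector sequence Z. The relation module comprises K-1 relation units, wherein the k′-th relation unit is used for calculating the preliminary relation weight rw_pre(k′) between the feature vector z(k′) and the feature vector z(K), k′ = 1,2,…,K-1, thus obtaining the preliminary relation weight vector RW_pre = [rw_pre(1), rw_pre(2), …, rw_pre(K-1)].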

Fig. 3 is a structural diagram of a relation unit in the present invention. As shown in fig. 3, each relation unit in the present invention includes a vector splicing unit and a multilayer perceptron unit, wherein:

the vector splicing unit is used for splicing the feature vector z(k′) and the feature vector z(K) to obtain a spliced feature vector C(k′), which is sent to the multilayer perceptron unit; the expression of the feature vector C(k′) is as follows:

$$C(k') = \mathrm{concat}\big(z(k'),\ z(K),\ z(K) - z(k'),\ z(K) + z(k')\big)$$

wherein concat() represents vector splicing.

That is, the feature vector at the current moment K and the feature vector at the historical moment k′ are subtracted and added, and the results are spliced with the two original feature vectors, which improves the model's ability to mine relations in the data.

The multilayer perceptron unit receives the spliced feature vector C(k′) and processes it to obtain the preliminary relation weight rw_pre(k′) between the feature vector z(k′) and the feature vector z(K). The multilayer perceptron (MLP) is a common neural network, and its specific principle and process are not described here.

In this embodiment, in order to reduce the size of the DPTRN model and the difficulty of model training, the multilayer perceptron units in the K-1 relation units share their weight parameters, that is, the same weights are used for all historical moments.

As can be seen from the above process, in order to highlight the importance of specific historical moments and prevent important historical moments from being smoothed out, the relation unit does not apply Softmax normalization to the final RW_pre, which means that the elements of RW_pre over the historical moments do not necessarily sum to 1.
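The following NumPy sketch shows how K-1 relation units with shared weights can be evaluated in one batched pass instead of serially. It is an illustration under stated assumptions, not the patented implementation: the order of the spliced terms, the single ReLU hidden layer, the hidden width `H`, and the names `relation_inputs` and `shared_mlp` are all hypothetical.

```python
import numpy as np

def relation_inputs(Z: np.ndarray) -> np.ndarray:
    """Build the spliced vectors for all K-1 history steps at once. Z: (K, M)."""
    hist = Z[:-1]                                   # z(1) ... z(K-1)
    cur = np.broadcast_to(Z[-1], hist.shape)        # z(K) repeated K-1 times
    # Assumed term order: both originals, then their difference and sum.
    return np.concatenate([hist, cur, cur - hist, cur + hist], axis=1)  # (K-1, 4M)

def shared_mlp(C, W1, b1, W2, b2):
    """Same weights for every history step; the output is NOT softmax-normalized."""
    h = np.maximum(C @ W1 + b1, 0.0)                # ReLU hidden layer (assumed)
    return (h @ W2 + b2).ravel()                    # one scalar weight per row of C

rng = np.random.default_rng(0)
K, M, H = 5, 3, 8                                   # illustrative sizes
Z = rng.normal(size=(K, M))
W1, b1 = rng.normal(size=(4 * M, H)), np.zeros(H)
W2, b2 = rng.normal(size=(H, 1)), np.zeros(1)
RW_pre = shared_mlp(relation_inputs(Z), W1, b1, W2, b2)   # shape (K-1,)
```

Because the units share one set of weights, all K-1 pairs go through a single matrix product, which is where the parallelism over historical moments comes from.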

The decoupling position vector calculation module is used for extracting decoupling position vectors of historical time and current time from the input characteristic vector sequence Z and sending the decoupling position vectors to the historical information vector calculation module. The decoupling position vector is introduced to enable the DPTRN model to consider the relation of each time node when parallel processing time series data. FIG. 4 is a block diagram of the decoupled position vector calculation module of the present invention. As shown in fig. 4, the decoupling position vector calculation module includes a position encoding module and a position vector decoupling module, wherein:

the position coding module is used for generating a corresponding M-dimensional position code PE(k) for each feature vector z(k) in the feature vector sequence Z and sending it to the position vector decoupling module. The position coding module in this embodiment adopts the absolute position coding method of the BERT model.

the position vector decoupling module calculates, from the position codes PE(k) of the feature vectors z(k), the decoupled position code dpe(k′) of the feature vector z(k′) at each of the previous K-1 historical moments, thereby obtaining the decoupled position vector DPE = [dpe(1), dpe(2), …, dpe(K-1)]; dpe(k′) is calculated as:

dpe(k′) = [PE(k′)·Pos_query] ⊙ [PE(K)·Pos_key]

wherein ⊙ denotes the inner product, Pos_query represents a trainable position vector query matrix, and Pos_key represents a trainable position vector key-value matrix.

In conventional methods, position codes are usually added directly to the feature vectors, which easily introduces noise. The invention therefore sets up the position vector query matrix and the position vector key-value matrix, which avoids the noise caused by the position codes, further learns the relation between each historical moment and the current moment, and improves the convergence efficiency and robustness of the network.
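A minimal sketch of the decoupled position code, assuming that Pos_query and Pos_key are M×M matrices and that the result of the inner product is one scalar per historical moment; neither the matrix shapes nor the function name `decoupled_position_vector` is stated in the source. The position table `PE` is filled with random values here as a stand-in for learned absolute position embeddings.

```python
import numpy as np

def decoupled_position_vector(PE, Pos_query, Pos_key):
    """Return [dpe(1), ..., dpe(K-1)] from a (K, M) table of position codes."""
    q = PE[:-1] @ Pos_query        # (K-1, M): projected codes of the history steps
    k = PE[-1] @ Pos_key           # (M,):     projected code of the current step K
    return q @ k                   # inner product per history step, shape (K-1,)

rng = np.random.default_rng(0)
K, M = 5, 3                        # illustrative sizes
PE = rng.normal(size=(K, M))       # stand-in for trainable position codes
Pos_query = rng.normal(size=(M, M))
Pos_key = rng.normal(size=(M, M))
DPE = decoupled_position_vector(PE, Pos_query, Pos_key)
```

In this sketch the position term depends only on the positions and the two matrices, not on the sensor values, so the feature vectors themselves are never perturbed by the position codes.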

The relation weight vector RW = [rw(1), rw(2), …, rw(K-1)], calculated from the preliminary relation weight vector RW_pre and the decoupled position vector DPE, is then sent to the history information vector calculation module.

The history information vector calculation module processes the first K-1 feature vectors z(k′) in the feature vector sequence Z according to the relation weight vector RW to obtain the weighted feature vectors

$$\tilde{z}(k') = rw(k') \otimes z(k')$$

where $\otimes$ represents the vector outer product. The K-1 feature vectors $\tilde{z}(k')$ are then sum-pooled to obtain the history information vector HI, which is sent to the vector splicing module.

The vector splicing module is used for splicing the history information vector HI and the feature vector z(K) in the feature vector sequence Z and sending the resulting spliced vector Con to the multilayer perceptron.

The multilayer perceptron is used for processing the splicing vector Con to obtain an abnormal working condition serial number corresponding to the input characteristic vector sequence.
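The last three modules can be sketched together. The sketch assumes that each relation weight is a scalar that scales its history vector, and it replaces the final multilayer perceptron with a single linear layer to stay short; the names `history_information` and `classify` and all numeric values are illustrative.

```python
import numpy as np

def history_information(Z, RW):
    """Weight each history vector by its relation weight, then sum-pool. Z: (K, M)."""
    weighted = RW[:, None] * Z[:-1]       # scale z(1) ... z(K-1) row by row
    return weighted.sum(axis=0)           # history information vector, shape (M,)

def classify(Z, RW, W, b):
    """Splice HI with z(K) and apply a linear layer (stand-in for the final MLP)."""
    con = np.concatenate([history_information(Z, RW), Z[-1]])   # shape (2M,)
    logits = con @ W + b
    return int(np.argmax(logits))         # index of the predicted working condition

Z = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [3.0, 3.0]])                # K = 3, M = 2
RW = np.array([0.5, 2.0])                 # one weight per history step
HI = history_information(Z, RW)           # 0.5*[1, 0] + 2.0*[0, 2]
```

Sum pooling keeps the size of HI fixed at M regardless of the sequence length K, so the classifier input always has 2M entries.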

S106: training a DPTRN model:

the DPTRN model constructed in step S105 is trained with the training samples obtained in step S104 to obtain a trained DPTRN model.

In order to improve the convergence speed and the convergence stability of the invention, the embodiment adopts an Adam optimization strategy in training the DPTRN model, and the strategy has the advantages of high calculation efficiency, small memory requirement and stable gradient propagation. In addition, in order to improve the robustness and the generalization of the model, the model introduces an L2 regularization and a Dropout strategy in the training process, so that the possible over-fitting tendency in the training process is avoided.
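For readers unfamiliar with these training techniques, the sketch below shows one Adam update with an L2 penalty folded into the gradient, plus inverted dropout. The patent does not give hyperparameters; the defaults used here (learning rate 1e-3, beta1 0.9, beta2 0.999, eps 1e-8) are the commonly used Adam defaults, and the function names are hypothetical.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, l2=0.0):
    """One Adam update; the L2 regularization term l2 * w is added to the gradient."""
    g = grad + l2 * w
    m = beta1 * m + (1 - beta1) * g            # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g        # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias correction for step t >= 1
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def dropout(x, p, rng):
    """Inverted dropout: zero each entry with probability p and rescale the rest."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

w = np.array([1.0, -2.0])
grad = np.array([0.5, -0.25])
w1, m1, v1 = adam_step(w, grad, np.zeros(2), np.zeros(2), t=1, lr=0.1)
```

On the first step the bias-corrected moments reduce the update to roughly `lr * sign(grad)`, which is why Adam is insensitive to the raw gradient scale.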

S107: abnormal data diagnosis:

when abnormal data diagnosis needs to be carried out on the industrial production system, the same working data acquisition method as in step S101 is used to obtain the M-dimensional feature vectors at the current moment T and the previous K-1 moments, which form a data matrix X_T:

$$X_T = \begin{bmatrix} x(T-K+1) \\ x(T-K+2) \\ \vdots \\ x(T) \end{bmatrix}$$

The data matrix X_T is standardized by the same method as in step S102 to obtain a standardized data matrix $\hat{X}_T$.

The data matrix $\hat{X}_T$ is input into the DPTRN model trained in step S106 to obtain the abnormal data diagnosis result.

In order to better illustrate the technical effects of the invention, the invention is experimentally verified using specific examples. In this embodiment, two data sets are used: the TE chemical process data set and the KDDCUP99 data set.

The TE chemical process is based on a real chemical process. It comprises five main units: a reactor, a condenser, a compressor, a separator and a stripping tower. Because its internal mechanism is relatively complex, the TE process is widely used to verify abnormal data diagnosis methods. The whole TE chemical process comprises 22 continuous process measurement variables, 19 composition measurement variables and 12 operation variables, and can simulate the normal working condition and 20 abnormal working conditions.

KDDCUP is an annual competition in the field of machine learning and data mining organized by ACM SIGKDD, and KDDCUP99 is the data set used for the 1999 competition. The data set contains 9 weeks of network connection data collected from a simulated U.S. Air Force local area network, and its records fall into two types, normal and attack. The data set has 41 features, 9 of which are discrete and the rest continuous. Since the data is acquired strictly in time order, the data set is widely used in research on time sequence data methods.

Abnormal data diagnosis methods based on 4 models are used for comparison in the experiment: MLP, LSTM+MLP, Bi_LSTM+MLP and 1DCNN+MLP. In order to ensure that the experimental results are affected only by the feature extraction part, the classification layer of each network structure uses an MLP with the same structure. Table 1 gives the structural information of the 4 models in this experiment.

TABLE 1

In addition, two variants of the DPTRN model adopted by the present invention are provided as comparison methods: DPTRN_a, which does not use a position vector, and DPTRN_b, which uses a position vector without decoupling. The training time, the inference time for a single sample, the recall rate, the accuracy and the F1 value are used as evaluation parameters in the experiment.
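The classification metrics named above can be computed as in the following sketch. The F1 value is the harmonic mean of precision and recall, so precision is computed here as well even though the text does not name it separately; how the per-class values are averaged in the experiment is not stated, so the sketch reports them for a single class. The function name and the toy labels are illustrative.

```python
import numpy as np

def recall_precision_f1(y_true, y_pred, label):
    """Recall, precision and F1 for one class, treated as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == label) & (y_true == label))   # true positives
    fp = np.sum((y_pred == label) & (y_true != label))   # false positives
    fn = np.sum((y_pred != label) & (y_true == label))   # false negatives
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1

y_true = [0, 0, 1, 1, 1, 2]        # toy working-condition labels
y_pred = [0, 1, 1, 1, 2, 2]
accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
r, p, f = recall_precision_f1(y_true, y_pred, label=1)
```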

Table 2 is a table of performance parameters for abnormal data diagnosis of the TE chemical process data set using the present invention and 6 comparison methods.

TABLE 2

Table 3 is a table of abnormal data diagnostic performance parameters for KDDCUP99 data sets using the present invention and 6 comparison methods.

TABLE 3

As shown in tables 2 and 3, with a comparable number of parameters, the method of the invention exploits parallel computation to achieve a better abnormal data diagnosis effect while keeping the training time and inference time low.

Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be understood that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined by the appended claims, and all inventions and creations that make use of the inventive concept are protected.

Claims

1. An abnormal data diagnosis method based on a deep parallel time sequence relation network is characterized by comprising the following steps:

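S3: the feature vectors $\hat{x}_d(t)$ of the training data matrix $\hat{X}_d$ are divided, according to a preset time sequence length K, into feature vector sequences $Z_d^{q}$, where q = 1,2,…,Q_d, $Q_d = \lfloor N_d / K \rfloor$ represents the number of feature vector sequences obtained by dividing the training data matrix $\hat{X}_d$, and $\lfloor \cdot \rfloor$ represents rounding down;

S4: each feature vector sequence $Z_d^{q}$ obtained in step S3 is taken as the input of a training sample, and the serial number d of the corresponding abnormal working condition is taken as the expected output, thereby forming a training sample;

the input feature vector sequence is recorded as Z = [z(1), z(2), …, z(K)];

the relation module comprises K-1 relation units, and the k′-th relation unit is used for calculating the preliminary relation weight rw_pre(k′) between the feature vector z(k′) and the feature vector z(K), thereby obtaining the preliminary relation weight vector RW_pre = [rw_pre(1), rw_pre(2), …, rw_pre(K-1)];

wherein concat() represents vector splicing;

dpe(k′) = [PE(k′)·Pos_query] ⊙ [PE(K)·Pos_key]

wherein ⊙ denotes the inner product, Pos_query represents the position vector query matrix, and Pos_key represents the position vector key-value matrix;

$$\tilde{z}(k') = rw(k') \otimes z(k')$$

wherein $\otimes$ represents the vector outer product; the K-1 feature vectors $\tilde{z}(k')$ are then sum-pooled to obtain the history information vector HI, which is sent to the vector splicing module;

the data matrix $\hat{X}_T$ is input into the DPTRN model trained in step S6 to obtain the abnormal data diagnosis result.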

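2. The abnormal data diagnosis method according to claim 1, wherein the standardized calculation formula in step S2 is:

$$\hat{x}_d(t)(m) = \frac{x_d(t)(m) - \mathrm{mean}(X_d(m))}{\mathrm{std}(X_d(m))}$$

wherein x_d(t)(m) represents the m-th feature data of the feature vector x_d(t), m = 1,2,…,M, $\hat{x}_d(t)(m)$ represents its normalized value, and mean(X_d(m)) and std(X_d(m)) represent the mean and the standard deviation of the m-th feature data over all feature vectors of the original data matrix X_d.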

3. The abnormal data diagnosis method according to claim 1, wherein the weight parameters are shared by the multi-layer perceptron units among the K-1 relational units in step S5.

4. The abnormal data diagnosis method according to claim 1, wherein the position coding module in step S5 adopts an absolute position coding method in a BERT model.