CN115484102A - Industrial control system-oriented anomaly detection system and method - Google Patents

Info

Publication number
CN115484102A
Authority
CN
China
Prior art keywords
data
time
dimensional
abnormality
time sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211131264.1A
Other languages
Chinese (zh)
Inventor
夏武
还约辉
杨根科
褚健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Guoli Network Security Technology Co ltd
Ningbo Institute Of Artificial Intelligence Shanghai Jiaotong University
Original Assignee
Zhejiang Guoli Network Security Technology Co ltd
Ningbo Institute Of Artificial Intelligence Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Guoli Network Security Technology Co ltd, Ningbo Institute Of Artificial Intelligence Shanghai Jiaotong University
Priority to CN202211131264.1A
Publication of CN115484102A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H04L67/125 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks involving control of end-device applications over a network

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention discloses an anomaly detection system and method for an industrial control system, relating to the field of anomaly detection. The system comprises a data acquisition module, a data preprocessing module, an anomaly detection model training module, a threshold setting module, an anomaly detection module and a result output module. The method comprises the following steps: step 1, collecting training data; step 2, preprocessing the data; step 3, learning time-dimension features; step 4, learning the correlation between time series; step 5, reconstructing the multi-dimensional time series data; step 6, setting an anomaly threshold; step 7, real-time online detection; step 8, outputting a detection result. The method can model time series effectively, learn the periodic patterns of normal sequences, and robustly model time series data contaminated by anomalies. The reconstruction errors produced during training are learned using extreme value theory and the threshold is set automatically, which avoids the inconvenience of setting it from empirical values.

Description

Industrial control system-oriented anomaly detection system and method
Technical Field
The invention relates to the field of anomaly detection, in particular to an anomaly detection system and method for an industrial control system.
Background
Industrial control systems are widely used in industrial sectors and key infrastructure, such as the electrical, water and wastewater, oil and gas, chemical, transportation, pharmaceutical, pulp and paper, food and beverage, and discrete manufacturing (e.g., automotive, aerospace, and durable goods) industries. Timely and effective anomaly detection for an industrial control system helps ensure the long-term stable operation of industrial production.
In an actual production process, labeled data is often difficult to obtain, so anomaly detection is mostly based on unsupervised methods. An industrial control system has a complex structure, so univariate time series anomaly detection is difficult to apply to it. During industrial production, a large number of sensors monitor the current state in real time and actuators operate the system, so the multivariate time series data acquired from the sensors and actuators is an important research object in the field of anomaly detection.
Although traditional anomaly detection methods based on machine learning offer good interpretability, they rely on expert experience to build complex feature engineering on the time series data. With the development of technology and the growth of computing power, anomaly detection methods based on deep learning have begun to receive wide attention. A recurrent neural network can capture long-term dependencies in time series data, but it does not consider the correlation between different time series, so its results are not ideal when modeling multi-dimensional time series data with latent correlations between series. In addition, in an actual industrial scenario, changes in workload and noise from the working environment are unavoidable, and the robustness of such models is poor. Finally, the setting of the anomaly detection threshold depends on experience and lacks flexibility.
Jin Yaohui et al., in the Chinese patent application "a method and system for detecting abnormalities in a multidimensional time series" (application number: 202011060906.4), provide a method and system for detecting anomalies in a multi-dimensional time series, comprising: reconstructing the sampled low-dimensional variables into a multi-dimensional time series through a recurrent neural network autoencoder, optimizing the model with a regularization method based on a Markov smoothness assumption on the time series, and calculating time series anomaly values based on the probability distribution of the reconstructed time series. However, the method only learns the patterns of the time series along the time dimension and ignores the latent correlation between time series, which can lead to low model accuracy and low detection performance.
Zhao Peihai et al., in the Chinese patent application "real-time anomaly detection method using unsupervised deep neural network for multidimensional time series data" (application number: 202110848400.8), provide a method and system for detecting anomalies in a multi-dimensional time series, comprising: calculating a correlation feature matrix SFM for the acquired data; inputting the feature matrix sequence into a feature extraction and data reconstruction module that uses a four-layer convolutional neural network as the feature extractor, with one LSTM network layer added to each convolutional layer; reconstructing the feature extraction matrix output by each LSTM layer to obtain a reconstruction matrix, all reconstruction matrices together forming a reconstruction matrix sequence; using the reconstruction matrix sequence as the input of a linear regression whose output is the predicted acquired data in the form of an n-order square matrix PSFM; calculating the difference between PSFM and SFM to obtain an anomaly score sc; and judging whether the anomaly score sc falls in the abnormal range according to a given threshold delta. Although the method considers the correlation between time series, the correlation calculation is limited to linear relations, and it is difficult to achieve good results for systems with complex relations.
Pi Dechang et al., in the Chinese patent application "a method and system for detecting abnormality of multidimensional time series data" (application number: 202111371649.0), provide a method and system for detecting anomalies in multi-dimensional time series data, comprising the following steps: first, preprocessing satellite telemetry data to obtain encoded additional data and time-scale fusion features; then, fusing the encoded additional data and the fusion features to obtain the fused input information; then, inputting the fused input information into a Transformer variational autoencoder for encoding and decoding to obtain a reconstruction result, and calculating the reconstruction error; and finally, smoothing the reconstruction error with a weighted moving average, judging the satellite telemetry data under test to be abnormal when the smoothed error exceeds a threshold range, and recording the time point of the anomaly. The method can effectively capture the correlation between time series, but it still uses empirical values when setting the anomaly detection threshold and lacks flexibility and adaptability.
Accordingly, those skilled in the art are directed to developing a new system and method for detecting anomalies in an industrial control system that overcomes the above-mentioned deficiencies of the prior art.
Disclosure of Invention
In view of the above defects in the prior art, the technical problem to be solved by the present invention is how to overcome three shortcomings: low model performance caused by not learning the latent relationships between time series, overfitting of the deep learning anomaly detection model when the training data set contains noise, and an anomaly threshold whose setting depends on experience.
To this end, the invention provides a multi-dimensional time series anomaly detection system and method for an industrial control system.
The invention provides a multi-dimensional time series anomaly detection system for an industrial control system, which comprises:
the data acquisition module, which records multi-dimensional time series data of the industrial control system;
the data preprocessing module, which is connected with the data acquisition module and preprocesses the acquired multi-dimensional time series data to obtain a plurality of batches of multi-dimensional time series subsequences;
the anomaly detection model training module, which is connected with the data preprocessing module, receives the batches of multi-dimensional time series subsequences, constructs and trains a neural network model for anomaly detection, called the anomaly detection model, and outputs reconstruction data of the input multi-dimensional time series data;
the threshold setting module, which is connected with the anomaly detection model training module, calculates the error between the reconstruction data and the input multi-dimensional time series data, called the reconstruction error, uses the reconstruction errors as sample data, learns from them using extreme value theory, and automatically sets an anomaly threshold;
the anomaly detection module, which passes the time series data acquired in real time through the data preprocessing module and into the trained anomaly detection model, calculates the reconstruction error, takes the reconstruction error as an anomaly score, and compares the anomaly score with the anomaly threshold; when the anomaly score is smaller than the anomaly threshold, the time series data acquired in real time is considered to be abnormal;
and the result output module, which is connected with the anomaly detection module and outputs an anomaly detection result for the time series data acquired in real time in which an anomaly has been detected.
Further, the real-time collected time sequence data is also collected by the data collection module.
Furthermore, the data preprocessing module first segments the data using a sliding window technique, and then divides the segmented subsequences into batches and normalizes them.
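The normalization scheme is not specified here. As a minimal sketch, assuming per-dimension min-max scaling with statistics taken from the training data (the helper names `fit_min_max` and `apply_min_max` are hypothetical and not part of the described system), the step might look like this:

```python
from typing import List, Tuple

Series = List[List[float]]  # N time steps, each holding M readings


def fit_min_max(train: Series) -> Tuple[List[float], List[float]]:
    """Return the per-dimension minima and maxima of the training data."""
    columns = list(zip(*train))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(data: Series, lo: List[float], hi: List[float]) -> Series:
    """Scale each dimension to [0, 1] using the training statistics.

    Assumption: a constant dimension (hi == lo) is mapped to 0.0
    to avoid a division by zero.
    """
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in data
    ]


# Toy example: 3 time steps, 2 signals (one sensor, one actuator).
train = [[10.0, 0.0], [20.0, 1.0], [30.0, 0.0]]
lo, hi = fit_min_max(train)
scaled = apply_min_max(train, lo, hi)
```

Reusing the training minima and maxima for data acquired in real time keeps the scores comparable between the training and detection stages.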
Further, in the anomaly detection model training module, the anomaly detection model uses a one-dimensional convolutional neural network to learn the temporal features of the time series, uses a stochastic recurrent neural network to learn the correlation between time series, and reconstructs the input multi-dimensional time series data following the idea of a variational autoencoder; a loss function is constructed from the reconstruction error and the KL divergence between the posterior distribution of the latent space vector and the assumed prior distribution; the model is trained by optimizing the loss function, and the parameters of the anomaly detection model are saved when the loss function reaches its minimum.
Further, in the result output module, the abnormality detection result includes an abnormality occurrence position, an occurrence time, and a duration length.
The invention also provides a multi-dimensional time series anomaly detection method for the industrial control system, which comprises the following steps:
step 1, the a sensors and b actuators of an industrial control system are sampled continuously at a preset frequency f for a sampling duration T, giving a sample X of multi-dimensional time series data with $X \in \mathbb{R}^{N \times M}$, where N is the length of the sampled data, calculated from the frequency f and the sampling duration T, and M is the dimension of the sampled data, M = a + b;
step 2, data preprocessing: a sliding time window is set, with a start time $s_t$ and an end time $e_t$; the length of the sliding time window is $w = e_t - s_t$ and its width is the data dimension M of the multi-dimensional time series data; the sliding time window slides over the multi-dimensional time series data until the end of the series; with the step size of each slide set to s, the multi-dimensional time series data is divided into a plurality of subsequence segments x of length w and dimension M, $x \in \mathbb{R}^{w \times M}$; when the remaining data is shorter than the sliding time window, the remaining segment is taken directly as a subsequence; a batch size $b_s$ is selected and the subsequences are grouped into batches of size $b_s$, each batch having shape $(b_s, w, M)$;
step 3, learning time-dimension features: first, a plurality of one-dimensional convolutional neural networks perform one-dimensional convolution on the input multi-dimensional time series data along its time dimension, learning a low-dimensional representation $z_1$ of the temporal features; then $z_1$ is deconvolved, and the output is d;
step 4, learning the correlation between time series: first, d is input into the variational encoding network of the stochastic recurrent neural network, which learns the correlation between time series to obtain a low-dimensional representation $z_2$; then $z_2$ is passed through a RealNVP flow to obtain an enhanced low-dimensional representation; step 3 and step 4 together constitute the approximate inference network in the variational autoencoder structure:
[equation image: Figure BDA0003850509900000041]
specifically:
[equation image: Figure BDA0003850509900000042]
where:
$f(\cdot)$ and $f^{-1}(\cdot)$ denote the one-dimensional convolution operation and the deconvolution operation, and the symbol shown in
[equation image: Figure BDA0003850509900000043]
denotes a gated recurrent unit (GRU);
step 5, reconstructing the multidimensional time sequence data;
step 6, setting an abnormal threshold value;
step 7, real-time online detection;
step 8, outputting a detection result.
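The segmentation and batching of step 2 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function names `sliding_windows` and `make_batches` are hypothetical, the data is held in plain nested lists, and a trailing segment shorter than w is kept as its own subsequence, following the rule given in step 2.

```python
from typing import List

Series = List[List[float]]  # N time steps x M dimensions


def sliding_windows(data: Series, w: int, s: int) -> List[Series]:
    """Cut `data` into windows of length w, advancing s steps per slide.

    A final segment shorter than w is kept as its own subsequence.
    """
    windows = []
    start = 0
    while start < len(data):
        windows.append(data[start:start + w])
        if start + w >= len(data):
            break  # the window has reached the end of the series
        start += s
    return windows


def make_batches(windows: List[Series], batch_size: int) -> List[List[Series]]:
    """Group consecutive windows into batches of at most `batch_size`."""
    return [windows[i:i + batch_size] for i in range(0, len(windows), batch_size)]


# Toy data: N = 11 samples of M = 2 signals.
data = [[float(t), float(t) * 0.1] for t in range(11)]
windows = sliding_windows(data, w=4, s=3)
batches = make_batches(windows, batch_size=2)
```

With N = 11, w = 4 and s = 3, the windows start at samples 0, 3, 6 and 9, and the last one holds only the two remaining samples. A tensor library would usually pad or drop that short segment so every batch has the shape $(b_s, w, M)$; that choice is left open here.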
Further, in step 5, the low-dimensional representation $z_1$ of the temporal features is deconvolved to obtain the input e of the decoding network; the temporal information contained in e is fed as an external input into the stochastic recurrent neural network, and a neural network model for anomaly detection, called the anomaly detection model, is constructed and trained; combined with the low-dimensional representation $z_2$ of the correlation between time series, the originally input multi-dimensional time series data is reconstructed to obtain the reconstruction data, which can be expressed as:
$p_\theta(x, z_1, z_2) = p_\theta(x \mid z_1, z_2)\, p_\theta(z_2 \mid z_1)$
specifically:
[equation image: Figure BDA0003850509900000047]
where $g(\cdot)$ denotes the deconvolution operation applied to $z_1$; a loss function is constructed to jointly optimize the approximate inference model and the generative model;
following the form of the variational autoencoder objective:
[equation image: Figure BDA0003850509900000044]
the optimization objective is constructed as:
[equation image: Figure BDA0003850509900000045]
the optimization uses Monte Carlo sampling, the SGVB estimator and the reparameterization trick.
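The objective of this document is given only in the equation images, so it is not reproduced here. As a rough illustration of the ingredients named above, the sketch below computes the two terms of a standard single-layer variational autoencoder loss, assuming a diagonal Gaussian posterior, a standard normal prior and a unit-variance Gaussian likelihood. All function names are hypothetical, and the hierarchical structure with $z_1$ and $z_2$ is deliberately left out.

```python
import math
import random
from typing import List


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def gaussian_log_likelihood(x: List[float], x_mean: List[float]) -> float:
    """log N(x | x_mean, I), the reconstruction term for unit variance."""
    sq = sum((a - b) ** 2 for a, b in zip(x, x_mean))
    return -0.5 * sq - 0.5 * len(x) * math.log(2.0 * math.pi)


def negative_elbo(x: List[float], x_mean: List[float],
                  mu: List[float], log_var: List[float]) -> float:
    """Single-sample Monte Carlo estimate of the loss to minimize."""
    return -gaussian_log_likelihood(x, x_mean) + kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, 0.0], rng)  # one latent sample
loss = negative_elbo([0.2, 0.4], [0.2, 0.4], [0.0, 1.0], [0.0, 0.0])
```

Writing the sample as a deterministic function of `mu`, `log_var` and independent noise is what allows gradients to flow through the sampling step when the same computation is expressed in an automatic differentiation framework.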
Further, in step 6, the error between the reconstruction data and the originally input multi-dimensional time series data is calculated, giving a series of reconstruction errors $\{er_1, er_2, \dots\}$;
taking the reconstruction errors as samples, the threshold, denoted threshold, is set automatically using extreme value theory, which is applied to the reconstruction errors as follows:
[equation image: Figure BDA0003850509900000046]
the anomaly threshold is calculated as:
[equation image: Figure BDA0003850509900000051]
where th is the initially set threshold, the quantities shown in
[equation image: Figure BDA0003850509900000052]
are the parameters to be learned, q is the set probability, N is the number of input samples, and $N_t$ is the number of samples larger than the initial threshold; the detection threshold is updated iteratively by adding the anomaly scores calculated in real time.
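The threshold formula itself appears only in the equation images. The quantities listed (an initial threshold th, two learned parameters, a probability q, N and $N_t$) match the peaks-over-threshold approach of extreme value theory, so the sketch below assumes that form: $th + \frac{\sigma}{\gamma}\left(\left(\frac{qN}{N_t}\right)^{-\gamma} - 1\right)$, with the excesses over th fitted by a generalized Pareto distribution. The method-of-moments estimates of $\gamma$ and $\sigma$, the `init_quantile` parameter and the function name `pot_threshold` are all assumptions made for illustration and may differ from the estimator actually used.

```python
import math
import random
from statistics import mean, pvariance
from typing import List


def pot_threshold(errors: List[float], init_quantile: float = 0.98,
                  q: float = 1e-3) -> float:
    """Peaks-over-threshold estimate of an upper-tail threshold (assumed form)."""
    n = len(errors)
    ordered = sorted(errors)
    th = ordered[min(n - 1, int(init_quantile * n))]  # initial threshold
    excesses = [e - th for e in errors if e > th]
    n_t = len(excesses)
    if n_t < 2:
        return th  # too few peaks to fit a tail
    m, v = mean(excesses), pvariance(excesses)
    if v == 0.0:
        return th
    ratio = m * m / v
    gamma = 0.5 * (1.0 - ratio)      # GPD shape, method of moments
    sigma = 0.5 * m * (ratio + 1.0)  # GPD scale, method of moments
    r = q * n / n_t
    if abs(gamma) < 1e-9:            # exponential-tail limit as gamma -> 0
        return th - sigma * math.log(r)
    return th + (sigma / gamma) * (r ** (-gamma) - 1.0)


# Synthetic reconstruction errors, for illustration only.
rng = random.Random(0)
errors = [rng.expovariate(1.0) for _ in range(5000)]
threshold = pot_threshold(errors, q=1e-3)
stricter = pot_threshold(errors, q=1e-4)
```

A smaller q asks for a rarer event and therefore yields a larger threshold. Appending newly computed scores to `errors` and calling the function again is one simple way to mimic the iterative update described above.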
Further, in step 7, the time series data acquired in real time at time t is input into the anomaly detection model to obtain an anomaly score, namely the reconstruction probability of the anomaly detection model for the input data; when score < threshold, an anomaly is considered to have occurred; otherwise, the data is considered normal.
Further, in step 8, for each detected abnormal time series segment, the anomaly likelihood of every dimension of the input data at that time is calculated; the dimensions are ranked from high to low and the top k are selected as the abnormal dimensions; the names of the sensors or actuators corresponding to the abnormal dimensions, the time at which the anomaly occurred and its duration are output.
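The ranking in step 8 can be sketched in a few lines. How the per-dimension anomaly likelihood is computed is not detailed here, so the sketch takes those values as given; the function name `top_k_dimensions` and the signal names are hypothetical.

```python
from typing import Dict, List


def top_k_dimensions(dim_likelihood: Dict[str, float], k: int) -> List[str]:
    """Return the names of the k dimensions with the highest anomaly likelihood."""
    ranked = sorted(dim_likelihood.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical per-dimension likelihoods for one abnormal segment.
likelihood = {"pump_1": 0.2, "valve_3": 0.9, "level_2": 0.5}
report = {
    "abnormal_dimensions": top_k_dimensions(likelihood, k=2),
    "start_index": 120,   # hypothetical sample index where the anomaly begins
    "duration": 15,       # hypothetical number of consecutive abnormal samples
}
```

The report dictionary mirrors the three outputs named in step 8: which sensors or actuators are implicated, when the anomaly occurred, and how long it lasted.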
The multi-dimensional time series anomaly detection system and method for an industrial control system provided by the invention have at least the following technical effects:
1. the technical solution is based on a hierarchical variational autoencoder strategy: it learns the temporal relationships of the multi-dimensional time series along time and captures the latent correlations between different time series, so it can model the time series effectively and learn the periodic patterns of normal sequences;
2. through the one-dimensional convolutional network and the inverse one-dimensional convolution operation, the input is reconstructed so that anomalies present in the original data are filtered out, which enables robust modeling of time series data contaminated by anomalies;
3. extreme value theory is used to learn from the reconstruction errors produced during training and the threshold is set automatically, avoiding the inconvenience of setting it from empirical values.
The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.
Drawings
FIG. 1 is an overall block diagram of a preferred embodiment of the present invention;
FIG. 2 is a multi-dimensional time series and sliding window diagram of the embodiment of FIG. 1;
FIG. 3 is a stochastic recurrent neural network of the embodiment of FIG. 1;
FIG. 4 is a diagram illustrating an overall structure of the anomaly detection model of the embodiment shown in FIG. 1.
Detailed Description
The technical content of the preferred embodiments of the present invention will be clearer and easier to understand with reference to the accompanying drawings. The present invention may be embodied in many different forms, and its scope is not limited to the embodiments set forth herein.
The technical problem to be solved by the invention is how to overcome three shortcomings of the prior art: low model performance caused by not learning the latent relationships between time series, overfitting of the deep learning anomaly detection model when the training data set contains noise, and an anomaly threshold whose setting depends on experience. To solve this problem, the temporal relationships of the multi-dimensional time series along time are learned with one-dimensional convolution, the latent correlations between different time series are captured by a stochastic variational autoencoder, the data is reconstructed from the temporal and correlation information, and the model is trained following the variational autoencoder strategy, so that the time series can be modeled effectively and the periodic patterns of normal sequences can be learned. In addition, extreme value theory is used to learn from the reconstruction errors produced during training, and the anomaly detection threshold is set automatically, avoiding the inconvenience of setting it from empirical values.
The multi-dimensional time series anomaly detection system and method for an industrial control system use deep learning to learn the normal periodic patterns of industrial multi-dimensional time series and detect anomalies based on the probability distribution of the reconstructed time series, which improves the accuracy and stability of anomaly detection and the reliability of the detection model.
As shown in FIG. 1, the multi-dimensional time series anomaly detection system for an industrial control system provided by the invention comprises:
the data acquisition module, which records multi-dimensional time series data of the industrial control system; during normal operation of the industrial control system, the data acquisition module records the current state information of the sensors and actuators, obtained by continuous sampling at a fixed frequency, and generates multi-dimensional time series data; in the model training stage of the anomaly detection system, this data forms the data set required for neural network modeling; in the model operation stage of the anomaly detection system, the data acquisition module acquires system state data in real time for anomaly detection;
the data preprocessing module, which is connected with the data acquisition module and preprocesses the acquired multi-dimensional time series data to obtain a plurality of batches of multi-dimensional time series subsequences; specifically, the data preprocessing module segments the multi-dimensional time series using a sliding window technique, and then divides the segmented subsequences into batches and normalizes them to obtain a plurality of batches of multi-dimensional time series subsequences;
the anomaly detection model training module, which is connected with the data preprocessing module, receives the batches of multi-dimensional time series subsequences, constructs and trains a neural network model for anomaly detection, called the anomaly detection model, and outputs reconstruction data of the input multi-dimensional time series data. Specifically, the anomaly detection model uses a one-dimensional convolutional neural network to learn the temporal features of the time series, uses a stochastic recurrent neural network to learn the correlation between time series, and reconstructs the input multi-dimensional time series data following the idea of a variational autoencoder; a loss function is constructed from the reconstruction error and the KL divergence between the posterior distribution of the latent space vector and the assumed prior distribution; the model is trained by optimizing the loss function, and the parameters of the anomaly detection model are saved when the loss function reaches its minimum;
the threshold setting module, which is connected with the anomaly detection model training module, calculates the error between the reconstruction data and the input multi-dimensional time series data, called the reconstruction error, uses the reconstruction errors as sample data, learns from them using extreme value theory, and automatically sets an anomaly threshold;
the anomaly detection module, which passes the time series data acquired in real time through the data preprocessing module and into the trained anomaly detection model, calculates the reconstruction error, takes the reconstruction error as an anomaly score, and compares the anomaly score with the anomaly threshold; when the anomaly score is smaller than the anomaly threshold, the time series data acquired in real time is considered to be abnormal.
The time series data acquired in real time is also acquired by the data acquisition module.
The result output module is connected with the anomaly detection module and outputs an anomaly detection result for the time series data acquired in real time in which an anomaly has been detected. The anomaly detection result comprises the position where the anomaly occurred, the time at which it occurred and its duration.
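As a minimal sketch of how the detection and result output modules might turn a stream of anomaly scores into occurrence times and durations, the function below follows the comparison stated above (a score below the threshold is flagged) and groups consecutive flagged samples into segments. The name `anomalous_segments` and the example values are hypothetical.

```python
from typing import List, Tuple


def anomalous_segments(scores: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of scores below `threshold`."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = i  # a new abnormal run begins here
        elif start is not None:
            segments.append((start, i - start))
            start = None
    if start is not None:  # the run extends to the end of the data
        segments.append((start, len(scores) - start))
    return segments


scores = [0.9, 0.8, 0.1, 0.2, 0.9, 0.05]
segments = anomalous_segments(scores, threshold=0.5)
```

Dividing a start index or a length by the sampling frequency f converts it into an occurrence time or a duration in the time unit of f.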
The invention also provides a multi-dimensional time series anomaly detection method for an industrial control system. First, the signals of the sensors and actuators in the industrial production process are measured and collected to generate a multi-dimensional time series, which is segmented into time series segments with a sliding window. The low-dimensional features of the time series are then learned hierarchically: a one-dimensional convolutional neural network learns temporal features along the time dimension, and a stochastic recurrent neural network learns the latent correlations between time series along the feature dimension. The input time series is reconstructed from the learned temporal features and correlation information, and a loss function built according to the variational autoencoder strategy is used to train and optimize the model. The error between the reconstruction data and the original input is calculated as the reconstruction error, the reconstruction errors are learned, and the anomaly detection threshold is set. Finally, data collected in real time is input into the anomaly detection model, an anomaly score is calculated and compared with the threshold, and if an anomaly is found, an anomaly alarm is output and the anomaly is located.
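To make "convolution along the time dimension" concrete, the sketch below slides a fixed kernel over the time axis of one window, applying it to each signal independently. This is a simplification under stated assumptions: a trained one-dimensional convolutional layer would use learned weights and would normally mix channels, and the name `conv1d_time`, the averaging kernel and the stride are all hypothetical.

```python
from typing import List

Series = List[List[float]]  # w time steps x M dimensions


def conv1d_time(x: Series, kernel: List[float], stride: int = 1) -> Series:
    """Valid 1-D convolution along time, applied per dimension.

    Implemented as a cross-correlation, as is usual in deep learning.
    """
    k = len(kernel)
    m = len(x[0])
    out = []
    for t in range(0, len(x) - k + 1, stride):
        out.append([sum(kernel[j] * x[t + j][d] for j in range(k)) for d in range(m)])
    return out


# One window with w = 5 time steps and M = 2 signals.
window = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
features = conv1d_time(window, kernel=[0.5, 0.5], stride=2)
```

With a stride of 2, the five time steps are compressed into two, which illustrates how convolution over time yields a shorter, lower-dimensional temporal representation.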
Specifically, the method comprises the following steps:
step 1, a sensors and b actuators of an industrial control system are continuously sampled according to preset frequency f, the sampling time length is T, a sample X of multi-dimensional time sequence data is obtained, and the size of the sample X belongs to R N×M Wherein, N is the length of sampling data, and is calculated by frequency f and sampling time length T; m is the dimension of the sampled data, M = a + b;
step 2, data preprocessing is carried out, as shown in fig. 2, a sliding time window is set, including the starting time s t And a termination time e t (ii) a The length of the sliding time window is w = e t -s t The width is the dimension M of the sampling data of the multidimensional time sequence data; sliding a sliding time window on the multi-dimensional time sequence data until the sequence data is finished; setting the step size of each sliding as s, dividing the multi-dimensional time sequence data into a plurality of subsequence fragments x with the sampling time length of w and the sampling data dimension of M, wherein x belongs to R w×M (ii) a When the sampling time length is less than the length of the sliding time window, directly taking the segment as a subsequence; selected batch size b s The sub-sequence fragments after being divided into a plurality of fragments with the size b s Each batch having a subsequence size of (b) s W, M), and then inputting the abnormal detection model in sequence for training;
step 3, learning time-dimension features: first, several one-dimensional convolutional neural networks perform one-dimensional convolution on the input multi-dimensional time series data along its time dimension, learning a low-dimensional representation $z_1$ of the temporal features; then $z_1$ is deconvolved and the output is d; the deconvolution operation is intended to filter out abnormal data noise present in the training data and to ensure the accuracy of the model and the consistency of the subsequent learning of time series correlations;
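As a rough illustration of convolving along the time dimension and then deconvolving, the following sketch applies a strided one-dimensional convolution and a transposed convolution to each channel of one window. It is only a sketch under stated assumptions: the kernels here are fixed by hand (averaging and repetition), whereas the networks of step 3 learn their kernels, and the number of layers, kernel sizes and strides are not given in the text. The helper names are hypothetical.

```python
from typing import Callable, List


def conv1d(x: List[float], kernel: List[float], stride: int = 1) -> List[float]:
    """Valid (unpadded) 1-D cross-correlation of x with kernel."""
    k = len(kernel)
    return [
        sum(x[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(x) - k + 1, stride)
    ]


def conv_transpose1d(z: List[float], kernel: List[float], stride: int = 1) -> List[float]:
    """Transposed 1-D convolution: scatter each z[i] * kernel into the output."""
    k = len(kernel)
    out = [0.0] * ((len(z) - 1) * stride + k)
    for i, value in enumerate(z):
        for j in range(k):
            out[i * stride + j] += value * kernel[j]
    return out


def along_time(window: List[List[float]],
               fn: Callable[[List[float]], List[float]]) -> List[List[float]]:
    """Apply fn to every column (one sensor or actuator channel) of a w x M window."""
    columns = [fn(list(col)) for col in zip(*window)]
    return [list(row) for row in zip(*columns)]


# One window with w = 6 time steps and M = 2 channels (illustrative values).
window = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0],
          [7.0, 70.0], [9.0, 90.0], [11.0, 110.0]]

# Compress the time axis from 6 to 3 steps, then expand it back to 6 steps.
z1 = along_time(window, lambda c: conv1d(c, [0.5, 0.5], stride=2))
d = along_time(z1, lambda c: conv_transpose1d(c, [1.0, 1.0], stride=2))
```

Here `z1` holds pairwise averages over time and `d` has the original length again but with the fast variation within each pair smoothed away, which gives an intuition for how the convolution and deconvolution pair can suppress noise.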
step 4, learning time series correlations, as shown in FIG. 3: first, d is input into the variational encoding network of a stochastic recurrent neural network, which learns the correlations between the time series to obtain a low-dimensional representation $z_2$; then $z_2$ is passed through a realNVP flow to obtain an enhanced low-dimensional representation; step 3 and step 4 together constitute the approximate inference network in the variational autoencoder structure:
Figure BDA0003850509900000081
specifically:
Figure BDA0003850509900000082
wherein:
$f(\cdot)$ and $f^{-1}(\cdot)$ denote the one-dimensional convolution operation and the deconvolution operation, respectively, and
Figure BDA0003850509900000083
denotes a gated recurrent unit (GRU);
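For readers unfamiliar with the gated recurrent unit mentioned above, a single GRU update can be sketched as follows. This is the textbook GRU cell in one common gating convention, not the patent's full network: the variational outputs (mean and variance of $z_2$) and the realNVP flow are not shown, and the parameter names (`W_z`, `U_z`, `b_z`, and so on) are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(W: Matrix, x: Vector, U: Matrix, h: Vector, b: Vector) -> Vector:
    """Compute W x + U h + b."""
    return [a + c + bias for a, c, bias in zip(matvec(W, x), matvec(U, h), b)]


def gru_step(x: Vector, h: Vector, p: Dict[str, list]) -> Vector:
    """One GRU update: returns the next hidden state given input x and state h."""
    z = [sigmoid(v) for v in affine(p["W_z"], x, p["U_z"], h, p["b_z"])]  # update gate
    r = [sigmoid(v) for v in affine(p["W_r"], x, p["U_r"], h, p["b_r"])]  # reset gate
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    candidate = [math.tanh(v) for v in affine(p["W_h"], x, p["U_h"], gated_h, p["b_h"])]
    # Convention used here: h' = (1 - z) * h + z * candidate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


# Untrained (all-zero) parameters for input size 3 and hidden size 2, illustration only.
params = {
    "W_z": zeros(2, 3), "U_z": zeros(2, 2), "b_z": [0.0, 0.0],
    "W_r": zeros(2, 3), "U_r": zeros(2, 2), "b_r": [0.0, 0.0],
    "W_h": zeros(2, 3), "U_h": zeros(2, 2), "b_h": [0.0, 0.0],
}
h_next = gru_step([0.3, -0.1, 0.7], [1.0, -2.0], params)
```

With all-zero parameters both gates equal 0.5 and the candidate state is zero, so the hidden state is simply halved; trained parameters would instead let the gates decide how much of the past state to keep at each time step.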
step 5, reconstructing the multi-dimensional time series data: the low-dimensional representation $z_1$ of the temporal features is deconvolved to obtain the input e of the decoding network; the temporal information contained in e is fed as an external input into the stochastic recurrent neural network, and a neural network model for anomaly detection, called the anomaly detection model, is constructed and trained; combined with the low-dimensional representation $z_2$ of the correlations between time series, the original input multi-dimensional time series data is reconstructed to obtain the reconstruction data, which can be expressed as:
$p_\theta(x, z_1, z_2) = p_\theta(x \mid z_1, z_2)\, p_\theta(z_2 \mid z_1)$
specifically:
Figure BDA0003850509900000089
where $g(\cdot)$ denotes the deconvolution operation applied to $z_1$; a loss function is constructed to jointly optimize the approximate inference model and the generative model;
following the form of the optimization function of the variational autoencoder:
Figure BDA0003850509900000084
the optimization function is constructed as:
Figure BDA0003850509900000085
the optimization uses Monte Carlo sampling, the SGVB estimator and the reparameterization trick;
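The optimization functions themselves appear only as figures and are not reproduced here. As a hedged illustration of the ingredients named above, the sketch below shows the standard single-latent variational autoencoder objective: a reparameterized sample, the closed-form KL divergence to a standard normal prior, and a Gaussian reconstruction log-likelihood. It assumes diagonal Gaussians and one latent variable, whereas the method described here uses two latent representations ($z_1$, $z_2$) and a flow; all function names are hypothetical.

```python
import math
import random
from typing import List


def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def gaussian_log_likelihood(x: List[float], x_mean: List[float],
                            x_log_var: List[float]) -> float:
    """log p(x | z) for a diagonal Gaussian decoder output."""
    return -0.5 * sum(
        math.log(2.0 * math.pi) + lv + (xi - mi) ** 2 / math.exp(lv)
        for xi, mi, lv in zip(x, x_mean, x_log_var)
    )


def negative_elbo(x: List[float], x_mean: List[float], x_log_var: List[float],
                  mu: List[float], log_var: List[float]) -> float:
    """Single-sample Monte Carlo estimate of the loss: KL term minus log-likelihood."""
    return kl_to_standard_normal(mu, log_var) - gaussian_log_likelihood(x, x_mean, x_log_var)


rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)   # one latent sample, illustrative values
loss = negative_elbo([0.0], [0.0], [0.0], [0.0], [0.0])
```

Because the sample `z` is written as a deterministic function of `mu`, `log_var` and independent noise, gradients of the loss can flow back to the encoder parameters, which is what makes the SGVB estimator usable with stochastic gradient descent.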
step 6, setting the anomaly threshold, as shown in FIG. 4: the error between the reconstruction data and the original input multi-dimensional time series data is calculated to obtain a series of reconstruction errors $error = \{er_1, er_2, \dots\}$;
The reconstruction errors are taken as samples, and the threshold, denoted threshold, is set automatically using extreme value theory, according to which:
Figure BDA0003850509900000086
the anomaly threshold is calculated as:
Figure BDA0003850509900000087
where th is the initially set threshold,
Figure BDA0003850509900000088
is the parameter to be learned, q is the set probability, N is the number of input samples, and $N_t$ is the number of samples greater than the initial threshold; anomaly scores calculated in real time are added to the samples so that the detection threshold is updated iteratively;
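The threshold formula itself is given only as a figure. The sketch below shows the peaks-over-threshold estimate commonly used in extreme value theory, which uses the same quantities th, q, N and $N_t$; whether it matches the figure exactly is an assumption. The shape and scale parameters `gamma` and `sigma` of the generalized Pareto distribution are names introduced here, and they are fitted with a simple method-of-moments estimator chosen for brevity rather than a maximum-likelihood fit. The sketch thresholds the upper tail of the samples; for a score where low values are abnormal, such as the reconstruction probability of step 7, the same computation could be applied to the negated scores.

```python
import math
from typing import Sequence


def pot_threshold(scores: Sequence[float], th: float, q: float) -> float:
    """Peaks-over-threshold estimate of the level exceeded with probability q.

    scores: observed samples (for example reconstruction errors)
    th:     initial threshold, typically a high empirical quantile
    q:      target probability of exceeding the returned threshold
    """
    excesses = [s - th for s in scores if s > th]
    n, n_t = len(scores), len(excesses)
    if n_t < 2:
        raise ValueError("need at least two samples above the initial threshold")
    mean = sum(excesses) / n_t
    var = sum((e - mean) ** 2 for e in excesses) / (n_t - 1)
    if var == 0.0:
        raise ValueError("excesses have zero variance")
    # Method-of-moments fit of a generalized Pareto distribution (an assumption).
    ratio = mean * mean / var
    gamma = 0.5 * (1.0 - ratio)          # shape
    sigma = 0.5 * mean * (1.0 + ratio)   # scale
    r = q * n / n_t
    if abs(gamma) < 1e-9:
        return th - sigma * math.log(r)  # limit of the formula as gamma -> 0
    return th + (sigma / gamma) * (r ** (-gamma) - 1.0)


# Illustrative data: 100 evenly spaced samples, initial threshold at the 90th value.
scores = [float(v) for v in range(1, 101)]
threshold = pot_threshold(scores, th=90.0, q=0.01)
```

To update the threshold online, newly computed scores would be appended to `scores` and the function called again, so that N, $N_t$ and the fitted parameters follow the incoming data.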
step 7, real-time online detection: the time series data collected in real time at time t is input into the anomaly detection model to obtain an anomaly score, score, which is the reconstruction probability that the anomaly detection model assigns to the input data; when score < threshold, an anomaly is considered to have occurred; otherwise, the data is considered normal;
step 8, outputting the detection result: for each detected abnormal time series segment, the likelihood of abnormality is calculated for every dimension of the input data at that time, the dimensions are sorted from high to low and the top k are selected as abnormal, and the name of the sensor or actuator corresponding to each abnormal dimension, the time at which the anomaly occurred and the duration of the anomaly are output.
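The decision rule of step 7 and the ranking of step 8 can be sketched as follows. The text does not say how the per-dimension likelihood of abnormality is computed, so the sketch assumes the mean squared reconstruction error of each dimension over the window as a stand-in; the function names and the sensor and actuator names are hypothetical.

```python
from typing import List, Sequence, Tuple


def is_anomalous(score: float, threshold: float) -> bool:
    """Step 7 rule: a score (reconstruction probability) below the threshold is abnormal."""
    return score < threshold


def top_k_dimensions(original: Sequence[Sequence[float]],
                     reconstructed: Sequence[Sequence[float]],
                     names: Sequence[str], k: int) -> List[Tuple[str, float]]:
    """Rank dimensions by mean squared reconstruction error and keep the k largest."""
    w = len(original)
    errors = []
    for j, name in enumerate(names):
        mse = sum((original[t][j] - reconstructed[t][j]) ** 2 for t in range(w)) / w
        errors.append((name, mse))
    errors.sort(key=lambda item: item[1], reverse=True)
    return errors[:k]


# Illustrative window with w = 2 time steps and M = 3 dimensions.
names = ["flow_sensor", "level_sensor", "valve_actuator"]
original = [[1.0, 5.0, 0.0], [1.0, 5.0, 0.0]]
reconstructed = [[1.0, 2.0, 0.5], [1.0, 3.0, 0.5]]

if is_anomalous(score=0.2, threshold=0.5):
    ranked = top_k_dimensions(original, reconstructed, names, k=2)
```

In this example the level sensor is reconstructed worst and is reported first, followed by the valve actuator; the time of occurrence and the duration would be taken from the timestamps of the windows flagged as abnormal.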
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (10)

1. A multi-dimensional time series anomaly detection system for an industrial control system, characterized by comprising:
a data acquisition module, which records multi-dimensional time series data of the industrial control system;
a data preprocessing module, connected to the data acquisition module, which preprocesses the acquired multi-dimensional time series data to obtain a plurality of batches of multi-dimensional time series subsequences;
an anomaly detection model training module, connected to the data preprocessing module, which receives the batches of multi-dimensional time series subsequences, constructs and trains a neural network model for anomaly detection, called the anomaly detection model, and outputs reconstruction data of the input multi-dimensional time series data;
a threshold setting module, connected to the anomaly detection model training module, which calculates the error between the reconstruction data and the input multi-dimensional time series data, called the reconstruction error, takes the reconstruction errors as sample data, learns from them using extreme value theory, and automatically sets an anomaly threshold;
an anomaly detection module, which passes time series data acquired in real time through the data preprocessing module and inputs it into the trained anomaly detection model, calculates the reconstruction error, takes the reconstruction error as an anomaly score, and compares the anomaly score with the anomaly threshold; when the anomaly score is smaller than the anomaly threshold, the time series data acquired in real time is considered abnormal;
and a result output module, connected to the anomaly detection module, which outputs an anomaly detection result for the time series data acquired in real time that has been detected as abnormal.
2. The industrial control system-oriented multi-dimensional time series anomaly detection system of claim 1, wherein the time series data acquired in real time is also acquired by the data acquisition module.
3. The industrial control system-oriented multi-dimensional time series anomaly detection system of claim 1, wherein the data preprocessing module first segments the data using a sliding window technique and then batches and normalizes the segmented subsequences.
4. The industrial control system-oriented multi-dimensional time series anomaly detection system of claim 1, wherein, in the anomaly detection model training module, the anomaly detection model uses a one-dimensional convolutional neural network to learn the temporal features of the time series and a stochastic recurrent neural network to learn the correlations between the time series, and reconstructs the input multi-dimensional time series data following the idea of the variational autoencoder; a loss function is constructed from the reconstruction error and the KL divergence between the posterior distribution of the latent space vector and the assumed prior distribution; the model is trained by optimizing the loss function, and the parameters of the anomaly detection model are saved when the loss function reaches its minimum.
5. The industrial control system-oriented multi-dimensional time series anomaly detection system of claim 1, wherein, in the result output module, the anomaly detection result includes the location where the anomaly occurred, the time of occurrence and the duration.
6. The industrial control system oriented multi-dimensional time series anomaly detection method as recited in claim 1, characterized in that the method comprises the steps of:
step 1, a sensors and b actuators of the industrial control system are sampled continuously at a preset frequency f for a sampling duration T to obtain a sample X of multi-dimensional time series data, with $X \in \mathbb{R}^{N \times M}$, where N is the length of the sampled data, calculated from the frequency f and the sampling duration T, and M is the dimension of the sampled data, M = a + b;
step 2, data preprocessing: a sliding time window is set with a start time $s_t$ and an end time $e_t$; the length of the sliding time window is $w = e_t - s_t$ and its width is the dimension M of the sampled multi-dimensional time series data; the sliding time window is slid over the multi-dimensional time series data until the end of the series is reached; with the step size of each slide set to s, the multi-dimensional time series data is divided into a number of subsequence segments x of sampling length w and data dimension M, $x \in \mathbb{R}^{w \times M}$; when the sampling length is less than the length of the sliding time window, the segment is taken directly as a subsequence; a batch size $b_s$ is selected and the segmented subsequences are divided into batches of size $b_s$, each batch having size $(b_s, w, M)$;
step 3, learning time-dimension features: first, several one-dimensional convolutional neural networks perform one-dimensional convolution on the input multi-dimensional time series data along its time dimension, learning a low-dimensional representation $z_1$ of the temporal features; then the low-dimensional representation $z_1$ of the temporal features is deconvolved and the output is d;
step 4, learning time series correlations: first, d is input into the variational encoding network of the stochastic recurrent neural network, which learns the correlations between the time series to obtain a low-dimensional representation $z_2$; then $z_2$ is passed through a realNVP flow to obtain an enhanced low-dimensional representation; said step 3 and said step 4 together constitute the approximate inference network in the variational autoencoder structure:
Figure FDA0003850509890000021
specifically:
Figure FDA0003850509890000022
wherein:
$f(\cdot)$ and $f^{-1}(\cdot)$ denote the one-dimensional convolution operation and the deconvolution operation, respectively, and
Figure FDA0003850509890000023
denotes a gated recurrent unit (GRU);
step 5, reconstructing the multidimensional time sequence data;
step 6, setting an abnormal threshold value;
step 7, real-time online detection;
step 8, outputting a detection result.
7. The industrial control system-oriented multi-dimensional time series anomaly detection method as recited in claim 6, wherein in step 5, the low-dimensional representation $z_1$ of the temporal features is deconvolved to obtain the input e of the decoding network; the temporal information contained in e is fed as an external input into the stochastic recurrent neural network, and a neural network model for anomaly detection, called the anomaly detection model, is constructed and trained; combined with said low-dimensional representation $z_2$ of the correlations between time series, the original input multi-dimensional time series data is reconstructed to obtain the reconstruction data, which can be expressed as:
$p_\theta(x, z_1, z_2) = p_\theta(x \mid z_1, z_2)\, p_\theta(z_2 \mid z_1)$
specifically:
Figure FDA0003850509890000031
where $g(\cdot)$ denotes the deconvolution operation applied to $z_1$; a loss function is constructed to jointly optimize the approximate inference model and the generative model;
following the form of the optimization function of the variational autoencoder:
Figure FDA0003850509890000032
the optimization function is constructed as:
Figure FDA0003850509890000033
the optimization uses Monte Carlo sampling, the SGVB estimator and the reparameterization trick.
8. The method as claimed in claim 7, wherein in step 6, the error between the reconstruction data and the original input multi-dimensional time series data is calculated to obtain a series of reconstruction errors $error = \{er_1, er_2, \dots\}$;
The reconstruction errors are taken as samples, and the threshold, denoted threshold, is set automatically using extreme value theory, according to which:
Figure FDA0003850509890000034
the anomaly threshold is calculated as:
Figure FDA0003850509890000035
where th is the initially set threshold,
Figure FDA0003850509890000036
is the parameter to be learned, q is the set probability, N is the number of input samples, and $N_t$ is the number of samples greater than the initial threshold; anomaly scores calculated in real time are added to the samples so that the detection threshold is updated iteratively.
9. The method as claimed in claim 8, wherein in step 7, the time series data collected in real time at time t is input into the anomaly detection model to obtain an anomaly score, score, which is the reconstruction probability that the anomaly detection model assigns to the input data; when score < threshold, an anomaly is considered to have occurred; otherwise, the data is considered normal.
10. The industrial control system-oriented multi-dimensional time series anomaly detection method according to claim 9, wherein in step 8, for each detected abnormal time series segment, the likelihood of abnormality is calculated for every dimension of the input data at that time, the dimensions are sorted from high to low and the top k are selected as abnormal, and the name of the sensor or actuator corresponding to each abnormal dimension, the time at which the anomaly occurred and the duration of the anomaly are output.
CN202211131264.1A 2022-09-16 2022-09-16 Industrial control system-oriented anomaly detection system and method Pending CN115484102A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211131264.1A CN115484102A (en) 2022-09-16 2022-09-16 Industrial control system-oriented anomaly detection system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211131264.1A CN115484102A (en) 2022-09-16 2022-09-16 Industrial control system-oriented anomaly detection system and method

Publications (1)

Publication Number Publication Date
CN115484102A true CN115484102A (en) 2022-12-16

Family

ID=84392798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211131264.1A Pending CN115484102A (en) 2022-09-16 2022-09-16 Industrial control system-oriented anomaly detection system and method

Country Status (1)

Country Link
CN (1) CN115484102A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115964636A (en) * 2022-12-23 2023-04-14 浙江苍南仪表集团股份有限公司 Gas flow abnormity detection method and system based on machine learning and dynamic threshold
CN115964636B (en) * 2022-12-23 2023-11-07 浙江苍南仪表集团股份有限公司 Gas flow abnormality detection method and system based on machine learning and dynamic threshold
CN115795350A (en) * 2023-01-29 2023-03-14 北京众驰伟业科技发展有限公司 Abnormal data information processing method in production process of blood rheology test cup
CN116361673B (en) * 2023-06-01 2023-08-11 西南石油大学 Quasi-periodic time sequence unsupervised anomaly detection method, system and terminal
CN116361673A (en) * 2023-06-01 2023-06-30 西南石油大学 Quasi-periodic time sequence unsupervised anomaly detection method, system and terminal
CN116662811A (en) * 2023-06-13 2023-08-29 无锡物联网创新中心有限公司 Time sequence state data reconstruction method and related device of industrial equipment
CN116738170A (en) * 2023-06-13 2023-09-12 无锡物联网创新中心有限公司 Abnormality analysis method and related device for industrial equipment
CN116662811B (en) * 2023-06-13 2024-02-06 无锡物联网创新中心有限公司 Time sequence state data reconstruction method and related device of industrial equipment
CN116738170B (en) * 2023-06-13 2024-06-18 无锡物联网创新中心有限公司 Abnormality analysis method and related device for industrial equipment
CN117150407A (en) * 2023-09-04 2023-12-01 国网上海市电力公司 Abnormality detection method for industrial carbon emission data
CN117150407B (en) * 2023-09-04 2024-10-01 国网上海市电力公司 Abnormality detection method for industrial carbon emission data
CN118378092A (en) * 2024-06-20 2024-07-23 阿里云飞天(杭州)云计算技术有限公司 Model training method, abnormality detection system, electronic device, and storage medium
CN118378092B (en) * 2024-06-20 2024-10-25 阿里云飞天(杭州)云计算技术有限公司 Model training method, abnormality detection system, electronic device, and storage medium

Similar Documents

Publication Publication Date Title
CN115484102A (en) Industrial control system-oriented anomaly detection system and method
CN111504676B (en) Equipment fault diagnosis method, device and system based on multi-source monitoring data fusion
Zhang et al. Deep learning-driven data curation and model interpretation for smart manufacturing
CN108304941A (en) A kind of failure prediction method based on machine learning
CN105467975A (en) Equipment fault diagnosis method
Tian et al. Identification of abnormal conditions in high-dimensional chemical process based on feature selection and deep learning
CN113344295A (en) Method, system and medium for predicting residual life of equipment based on industrial big data
CN117290800B (en) Timing sequence anomaly detection method and system based on hypergraph attention network
CN117314900B (en) Semi-self-supervision feature matching defect detection method
Zhang et al. Gated recurrent unit-enhanced deep convolutional neural network for real-time industrial process fault diagnosis
CN114500004A (en) Anomaly detection method based on conditional diffusion probability generation model
CN117131110B (en) Method and system for monitoring dielectric loss of capacitive equipment based on correlation analysis
CN114385614A (en) Water quality early warning method based on Informmer model
CN114118225A (en) Method, system, electronic device and storage medium for predicting remaining life of generator
CN110779988A (en) Bolt life prediction method based on deep learning
CN116502164A (en) Multidimensional time series data anomaly detection method, device and medium based on countermeasure training and frequency domain improvement self-attention mechanism
Yang et al. Remaining useful life prediction based on normalizing flow embedded sequence-to-sequence learning
Pan et al. Unsupervised fault detection with a decision fusion method based on Bayesian in the pumping unit
CN114741945B (en) On-line fault diagnosis method for aero-engine
Gao et al. A novel fault detection model based on vector quantization sparse autoencoder for nonlinear complex systems
Li et al. Knowledge enhanced ensemble method for remaining useful life prediction under variable working conditions
CN113052302B (en) Machine health monitoring method and device based on cyclic neural network and terminal equipment
CN113984389A (en) Rolling bearing fault diagnosis method based on multi-receptive-field and improved capsule map neural network
CN118132934A (en) Real-time state analysis method and system for machine tool spindle
CN117744495A (en) Method for predicting service life of extra-large bearing driven by multiple models in different degradation stages

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination