CN114595448B - Industrial control anomaly detection method, system and equipment based on correlation analysis and three-dimensional convolution and storage medium - Google Patents


Info

Publication number
CN114595448B
Authority
CN
China
Prior art keywords
sequence
correlation
sequence length
length
industrial control
Legal status
Active
Application number
CN202210247513.7A
Other languages
Chinese (zh)
Other versions
CN114595448A
Inventor
丁潇
徐丽娟
赵大伟
周洋
陈川
仝丰华
Current Assignee
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Original Assignee
Shandong Computer Science Center National Super Computing Center in Jinan
Application filed by Shandong Computer Science Center National Super Computing Center in Jinan
Priority to CN202210247513.7A
Publication of CN114595448A
Application granted
Publication of CN114595448B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/552 Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention relates to an industrial control anomaly detection method, system, device and storage medium based on correlation analysis and three-dimensional convolution. The correlation between target data collected at adjacent times is calculated to determine the longest sequence length, and the size of the RGB (red, green, blue) image is determined from that length. The correlation of the observed data is then calculated and compared against a sequence length list to identify coarse-grained abnormal sequences. Sequences of different lengths, obtained according to the sequence length list, are used as input to an improved three-dimensional convolutional neural network, which learns data features along both the temporal and spatial dimensions, analyzes the key information points of the data in depth, and identifies abnormal data at a fine granularity. By analyzing industrial control data in two stages, coarse-grained and fine-grained, the invention can effectively detect abnormal data in the industrial control process and improves the accuracy of anomaly detection.

Description

Industrial control anomaly detection method, system and equipment based on correlation analysis and three-dimensional convolution and storage medium
Technical Field
The invention belongs to the technical field of industrial control system anomaly detection research, and particularly relates to an industrial control anomaly detection method, system, equipment and storage medium based on correlation analysis and three-dimensional convolution.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
With the rapid development of the Internet and industrial electronics technology, a large number of critical infrastructures are facing unprecedented large-scale attacks because they lack sophisticated security measures. The industrial control system is the central nervous system of national critical infrastructure, and its security bears directly on the security of society and the nation. Attacks on critical infrastructure not only harm social order and the public interest but also call into question the security of industries tied to the development of the industrial Internet. Recent Internet security threat reports show that the number of attacks on industrial control systems is growing dramatically. As Internet technology advances, attack techniques advance with it, and frequent industrial control attack incidents are a continual reminder that the protection of industrial control systems must be strengthened. The current vulnerability of industrial control systems has a historical cause: systems designed before network threats emerged paid little attention to security. Protecting industrial control systems from attack is therefore an urgent matter.
Industrial control systems are equipped with various networked sensors and actuators that generate large amounts of multivariate time series data, which can be used to detect anomalies. Anomaly detection is an important technique for preventing accidents in industrial control systems and ensuring their reliable operation, and many researchers in China and abroad have studied it intensively. Current anomaly detection methods for industrial control systems fall into three categories: methods based on traditional feature mining, methods based on classical machine learning, and methods based on deep learning. Traditional feature mining methods include a time series correlation graph method proposed by one research team and the state transition graph method proposed by the Luxuefen team. However, the time series correlation graph method cannot detect anomalies between binary data, and the state transition graph method rarely considers latent correlations between sensors. Methods based on classical machine learning have improved detection accuracy to some extent, but as data scale keeps growing they also show limitations: higher computational cost, degraded detection performance, and difficulty exploiting the temporal correlation of industrial control data.
Compared with traditional feature mining and classical machine learning methods, the main advantage of deep learning-based anomaly detection is that it can mine the inherent nonlinear correlations hidden in large-scale multivariate time series without manually designed features. However, current deep learning-based methods still have shortcomings: they consider only the temporal features of the data when handling industrial control anomaly detection, whereas the dependencies among sensors must also be considered to improve detection accuracy and reduce the false alarm rate. In addition, determining an adaptive length for the time series input to a three-dimensional convolutional neural network is difficult. Motivated by these research issues, the present invention applies an improved deep learning method to anomaly detection in industrial control systems.
Disclosure of Invention
To address the shortcomings of existing industrial control anomaly detection techniques, the invention provides an industrial control anomaly detection method based on correlation analysis and three-dimensional convolution.
The invention aims to overcome the following shortcomings of existing industrial control system anomaly detection methods: they mine the latent relationships between data insufficiently, they do not consider features along both the temporal and spatial dimensions, and they cannot capture long-term dependencies. It also addresses the problem that the time series input to a three-dimensional convolutional neural network can only have a fixed length, so as to ensure high accuracy and a low false alarm rate for the anomaly detection model and to improve its robustness.
The method first converts industrial control data into RGB (red, green, blue) images, then applies a three-dimensional convolutional neural network to the industrial control anomaly detection task so that both the temporal and spatial correlations among industrial control data are taken into account, and combines this with an attention mechanism to memorize long-term dependencies in multidimensional time series. The invention can effectively detect anomalies in industrial control system data and meets the requirements of high accuracy and a low false alarm rate.
The invention also provides an industrial control anomaly detection system, device and storage medium based on correlation analysis and three-dimensional convolution.
Interpretation of terms:
1. Three-dimensional convolutional neural network: an extension of the two-dimensional convolutional neural network that overcomes the latter's inability to use temporal information. Several consecutive images are stacked into a cube, and a cubic convolution kernel is used to fully mine features along both the temporal and spatial dimensions.
2. Attention mechanism: a technique that assigns weights to the input so that key information is retained. It selects important information from a large amount of information and gives it a high weight, while less important information receives a low weight.
3. RGB graph: a color image with three channels R, G and B, representing red, green and blue respectively. Each channel value ranges from 0 to 255, and the larger a channel's value, the brighter that color component.
The technical scheme of the invention is as follows:
an industrial control anomaly detection method based on correlation analysis and three-dimensional convolution comprises the following steps:
acquiring industrial control data, preprocessing the industrial control data, and dividing the preprocessed industrial control data into a training set and a test set;
designing a dynamic sequence length for the training set based on the correlation;
generating an RGB map based on the dynamic sequence length;
inputting the generated RGB image into an anomaly detection model for training to obtain a trained anomaly detection model;
performing correlation calculation on the test set to obtain sequence lengths, generating RGB images from the test set using the longest sequence length $L_{MAX}$ obtained from the training set, and inputting them into the trained anomaly detection model to determine whether an anomaly has occurred;
subjecting the industrial control data to be detected, in sequence, to preprocessing, dynamic sequence length design, and RGB image generation based on the dynamic sequence length, and inputting the generated RGB images into the anomaly detection model that has been validated on the test set to determine whether an anomaly has occurred;
wherein designing the dynamic sequence length comprises performing correlation calculation using the correlation characteristics of the sequence to obtain time series of different lengths.
Preferably, performing correlation calculation using the correlation characteristics of the sequence to obtain time series of different lengths comprises: normalizing the industrial control data; and calculating the correlation between the industrial control data at two adjacent times, incrementing the current sequence length by one while the correlation exceeds a threshold, until the correlation falls below the threshold, thereby obtaining a sequence length.
Specifically, the method comprises the following steps:
Step 1: Suppose that, in an industrial control system, $E = [E_1, E_2, \ldots, E_m]$ represents m devices including sensors and actuators, where $E_m$ denotes the m-th device. The measured device values (i.e., the industrial control data) obtained at time i are denoted $x_i$:
$x_i = [x_i^{1}, x_i^{2}, \ldots, x_i^{m}]$
where $x_i^{m}$ is the measurement of device $E_m$ at time i;
the industrial control data are normalized as shown in formula (I):
$x_i^* = \dfrac{x_i - \bar{x}_i}{\sigma_{x_i}}$ (I)
In formula (I), $\bar{x}_i$ is the mean of $x_i$, $\sigma_{x_i}$ is the standard deviation of $x_i$, and $x_i^*$ is the normalized $x_i$.
Step 2: Let num denote the current sequence length, which records that the current num consecutive measured values $x_i^*$ (1 ≤ i ≤ num) are strongly correlated with one another. Calculate the correlation between the measured values $x_i^*$ and $x_j^*$ at any two adjacent times. If the correlation between $x_i^*$ and $x_j^*$ exceeds the threshold τ, increment num by one; continue until the correlation falls below the threshold, then record the current sequence length num, add it to the sequence length list L, and start recording a new sequence length. The correlation between $x_i^*$ and $x_j^*$ is calculated as shown in formula (II):
$\rho(x_i^*, x_j^*) = \dfrac{\sum_{k=1}^{m} (x_i^{k*} - \bar{x}_i^*)(x_j^{k*} - \bar{x}_j^*)}{\sqrt{\sum_{k=1}^{m} (x_i^{k*} - \bar{x}_i^*)^2} \sqrt{\sum_{k=1}^{m} (x_j^{k*} - \bar{x}_j^*)^2}}$ (II)
In formula (II), $x_i^{k*}$ is the k-th component of $x_i^*$, $\bar{x}_i^*$ is the mean of $x_i^*$, and $\bar{x}_j^*$ is the mean of $x_j^*$.
Finally, the sequence length list $L = [l_1, l_2, \ldots, l_n]$ is obtained.
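Steps 1 and 2 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: formula (II) is taken to be the Pearson correlation across the m devices (consistent with the means defined above), a correlation exactly equal to τ ends a run, the data arrive as a time × devices array, and the function names are hypothetical.

```python
import numpy as np

def zscore_rows(X):
    """Formula (I): normalize each time step x_i (one row of X) by its own mean and std."""
    mu = X.mean(axis=1, keepdims=True)
    sigma = X.std(axis=1, keepdims=True)
    sigma[sigma == 0] = 1.0  # assumption: guard against constant rows
    return (X - mu) / sigma

def pearson(a, b):
    """Formula (II), assumed Pearson: correlation between two device-value vectors."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def dynamic_sequence_lengths(X, tau=0.9):
    """Split the rows of X (time x devices) into runs of strongly correlated neighbours."""
    Xn = zscore_rows(np.asarray(X, dtype=float))
    lengths = []
    num = 1  # current sequence length, initial value 1
    for i in range(len(Xn) - 1):
        if pearson(Xn[i], Xn[i + 1]) > tau:
            num += 1  # neighbours strongly correlated: extend the current run
        else:
            lengths.append(num)  # correlation dropped: close the run
            num = 1
    lengths.append(num)  # close the final run
    return lengths
```

For example, three proportional rows followed by two reversed rows, `[[1,2,3],[2,4,6],[3,6,9],[3,2,1],[6,4,2]]`, give the length list `[3, 2]`. The lengths always sum to the number of time steps, so every record belongs to exactly one run.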
Preferably, generating the RGB image based on the dynamic sequence length comprises: normalizing the data; reconstructing the normalized data into matrices, each element of which corresponds to one of 256 gray values, to obtain grayscale images; and stacking three consecutive grayscale images into one RGB image.
The method specifically comprises the following steps:
from the sequence length list L, obtain its longest sequence length $L_{MAX}$, and let M be the perfect square closest to $m \times L_{MAX}$;
Step a: Let $E = [E_1, E_2, \ldots, E_m]$ represent m devices including sensors and actuators, and let $S = [s_1, s_2, \ldots, s_m]$ represent the measured values of the m devices, where the measured values of each device form a column vector. Each column vector in S is normalized as shown in formula (III):
$s_i' = \dfrac{s_i - s_i^{\min}}{s_i^{\max} - s_i^{\min}}$ (III)
In formula (III), $s_i^{\min}$ is the minimum value of $s_i$, $s_i^{\max}$ is the maximum value of $s_i$, and $s_i'$ is the normalized $s_i$;
Step b: According to the sequence length list L, the values of $l_i$ (1 ≤ i ≤ n) measurement records are reconstructed, in order, into a $\sqrt{M} \times \sqrt{M}$ matrix. When $l_i \times m$ (1 ≤ i ≤ n) is less than $\sqrt{M} \times \sqrt{M}$, zeros are padded at the end. This step yields n matrices of size $\sqrt{M} \times \sqrt{M}$;
Step c: Each element of a matrix takes one of 256 possible values, corresponding to the 256 gray levels, so each matrix yields a grayscale image. Three consecutive grayscale images are stacked into one RGB image, and all RGB images are arranged in time order to obtain an RGB image stream.
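A sketch of steps a to c, under several assumptions that the text does not pin down: the square side is chosen as the smallest $\sqrt{M}$ with $M \geq m \times L_{MAX}$ (so that zero padding always suffices, whereas the text says "closest"), normalized values are mapped to gray levels by rounding $255 \cdot s'$, consecutive grayscale images are grouped into non-overlapping triples, and all function names are hypothetical.

```python
import math
import numpy as np

def minmax_columns(S):
    """Formula (III): min-max normalize each device column s_i to [0, 1]."""
    S = np.asarray(S, dtype=float)
    lo, hi = S.min(axis=0), S.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # assumption: constant columns map to 0
    return (S - lo) / span

def side_length(m, l_max):
    """Hypothetical choice: smallest side with side**2 >= m * L_MAX, so every run fits."""
    return math.isqrt(m * l_max - 1) + 1

def rgb_stream(S, lengths):
    """Steps a-c: one grayscale matrix per run, then stack three consecutive ones as R, G, B."""
    Sn = minmax_columns(S)
    m = Sn.shape[1]
    side = side_length(m, max(lengths))
    grays, start = [], 0
    for l in lengths:
        flat = Sn[start:start + l].ravel()  # l records x m devices, in time order
        start += l
        padded = np.zeros(side * side)
        padded[:flat.size] = flat  # zero padding at the end
        grays.append(np.rint(padded * 255).astype(np.uint8).reshape(side, side))
    # assumption: non-overlapping groups of three; a trailing remainder is dropped
    return [np.stack(grays[k:k + 3], axis=-1) for k in range(0, len(grays) - 2, 3)]
```

With 2 devices and $L_{MAX} = 3$, $m \times L_{MAX} = 6$ and the side length is 3, so each run of three records fills six of the nine pixels and the last row is zero padding. Whether the original method uses overlapping or non-overlapping triples is not stated; a sliding window would only change the final list comprehension.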
Preferably, inputting the generated RGB images into the anomaly detection model for training to obtain a trained anomaly detection model is implemented as follows:
Construct a three-dimensional convolutional neural network, namely a 3D ResNet. The convolutional layers use a kernel size of 3 × 3, a stride of 1, a padding of 1, and 128 convolution kernels in total. The kernel size is written $(k_w, k_h, k_c)$, where $k_w$, $k_h$ and $k_c$ denote the width, height and number of channels of the kernel, respectively;
let the input data size be $(w_{in}, h_{in}, c_{in})$ and the output data size be $(w_{out}, h_{out}, c_{out})$, with padding p and stride s. The output feature map size is calculated as shown in formulas (IV), (V) and (VI):
$w_{out} = \left\lfloor \dfrac{w_{in} - k_w + 2p}{s} \right\rfloor + 1$ (IV)
$h_{out} = \left\lfloor \dfrac{h_{in} - k_h + 2p}{s} \right\rfloor + 1$ (V)
c out =c out (Ⅵ)
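Formulas (IV) and (V) can be checked with a one-line helper. This assumes the standard floor-division form of the convolution output-size rule; the function name is hypothetical. With the stated settings (kernel 3, stride 1, padding 1), the width and height are preserved, which keeps the $\sqrt{M} \times \sqrt{M}$ image size unchanged from layer to layer.

```python
def conv_output_size(size_in, kernel, stride=1, padding=1):
    """Formulas (IV)/(V): output size of a convolution along one spatial axis."""
    return (size_in + 2 * padding - kernel) // stride + 1

# Stated settings: kernel 3, stride 1, padding 1 keep the size unchanged.
print(conv_output_size(32, 3))  # 32
# For comparison, a stride of 2 roughly halves it.
print(conv_output_size(32, 3, stride=2))  # 16
```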
After the anomaly detection model is constructed, the mean squared error (MSE) is used as the loss function during training, as shown in formula (VII):
$MSE = \dfrac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ (VII)
In formula (VII), $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and n is the total number of sequences; a smaller MSE indicates a more accurately trained model.
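Formula (VII) is the ordinary mean squared error; a minimal NumPy version (hypothetical helper name) makes the arithmetic concrete. In a PyTorch training loop the same quantity is what `torch.nn.MSELoss` computes with its default mean reduction.

```python
import numpy as np

def mse(y_true, y_pred):
    """Formula (VII): mean of squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# (0 + 0.25 + 1) / 3 = 0.4166...
print(mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
```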
Preferably, performing correlation calculation on the test set to obtain sequence lengths, generating RGB images from the test set using the longest sequence length $L_{MAX}$ obtained from the training set, and inputting them into the trained anomaly detection model to determine whether an anomaly has occurred comprises the following steps:
Step A: Let $L = [l_1, l_2, \ldots, l_n]$ be the sequence length list obtained from the training set, $L_{MAX}$ the longest sequence length obtained from the training set, and $L_{test} = [l_1, l_2, \ldots, l_r]$ the sequence length list obtained from the test set. The current sequence length num records the sequence length and is initialized to 1. M is the perfect square closest to $m \times L_{MAX}$, where m is the number of measuring devices in the industrial control system;
Step B: Shallow (coarse-grained) analysis of abnormal data. Calculate the data sequence lengths $l_r$ of the test set: first normalize the test data according to formula (I), then calculate the correlation between each pair of adjacent test records in time order. While the correlation exceeds the threshold τ = 0.9, increment num by one; when the correlation between two adjacent records falls below τ, record the current value of num as $l_r$ and add it to the test sequence length list $L_{test}$. Then determine whether $l_r$ appears in the training sequence length list $L = [l_1, l_2, \ldots, l_n]$; a test sequence whose length does not appear in L is a coarse-grained abnormal sequence, and the check proceeds to the next test sequence length;
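Step B can be sketched as two small functions: one computes run lengths on the test data (reusing formula (I) and, as an assumption, the Pearson correlation via `np.corrcoef`), and one flags any run length that never occurred in the training list L. Both function names are hypothetical, and the sketch assumes no test record is constant across all devices (its standard deviation would be zero).

```python
import numpy as np

def run_lengths(X, tau=0.9):
    """Formula (I) per time step, then count runs of adjacent steps with correlation > tau."""
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    lengths, num = [], 1  # num starts at 1, as in Step A
    for a, b in zip(X[:-1], X[1:]):
        if np.corrcoef(a, b)[0, 1] > tau:
            num += 1
        else:
            lengths.append(num)
            num = 1
    return lengths + [num]

def coarse_check(test_lengths, train_lengths):
    """Step B: flag each test run whose length never occurred in the training list L."""
    known = set(train_lengths)
    return [(l, l not in known) for l in test_lengths]

# Test runs of lengths [2, 1, 1] against a training list containing only 2 and 3:
print(coarse_check(run_lengths([[1, 2, 3], [2, 4, 6], [3, 2, 1], [1, 2, 3]]), [2, 3]))
```

The check is a set lookup, so it is cheap enough to run on every test run before the more expensive fine-grained prediction in Step C.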
Step C: Deep (fine-grained) analysis of abnormal data. Normalize the test set according to formula (III). Using the test sequence length list $L_{test} = [l_1, l_2, \ldots, l_r]$ obtained in Step B, reconstruct the values of $l_r$ measurement records, in time order, into a $\sqrt{M} \times \sqrt{M}$ matrix (1 ≤ r ≤ n); when $l_r \times m$ (1 ≤ r ≤ n) is less than $\sqrt{M} \times \sqrt{M}$, pad zeros at the end. This step yields n matrices of size $\sqrt{M} \times \sqrt{M}$. Each matrix element corresponds to one of 256 gray values, so each matrix yields a grayscale image; three consecutive grayscale images are stacked into an RGB image, and all RGB images are arranged in time order to obtain the RGB image stream of the test set. The trained anomaly detection model then predicts on the test RGB image stream, using the historical values over the current time length t to predict the sequence over the future time length t, and the residual vector is obtained as shown in formula (VIII):
$r_t = y_t - \hat{y}_t$ (VIII)
In formula (VIII), $r_t$ is the residual vector obtained by predicting the sequence over the future time length t from the historical values over the current time length t, $y_t$ is the actual value of the sequence over the future time length t, and $\hat{y}_t$ is the predicted value of that sequence; t is the time length, i.e., historical values over the current time length t are used to predict the sequence over the future time length t;
the residual vector is normalized to obtain $\tilde{r}_t$, as shown in formula (IX):
$\tilde{r}_t = \dfrac{r_t - \mu_{r_t}}{\sigma_{r_t}}$ (IX)
In formula (IX), $\mu_{r_t}$ is the mean of $r_t$ and $\sigma_{r_t}$ is the standard deviation of $r_t$;
if $\tilde{r}_t$ satisfies the anomaly threshold condition, the data are judged abnormal; otherwise they are judged normal.
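Formulas (VIII) and (IX) can be sketched as below. The exact decision condition appears only as a formula image in the source and is not reproduced here, so this sketch substitutes a hypothetical rule: a residual element is flagged when its normalized magnitude exceeds k (default 3). The signed residual, the choice of k, and the function name are all assumptions.

```python
import numpy as np

def residual_flags(actual, predicted, k=3.0):
    """Formulas (VIII)-(IX) under assumptions: signed residual, z-score, |z| > k flagged."""
    r = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)  # (VIII)
    sigma = r.std()
    z = (r - r.mean()) / (sigma if sigma > 0 else 1.0)  # (IX), guarding zero spread
    return np.abs(z) > k  # hypothetical threshold rule, not the patent's own condition

# One large prediction error among twenty otherwise exact predictions:
actual = np.zeros(20)
predicted = np.zeros(20)
predicted[7] = -10.0
print(np.flatnonzero(residual_flags(actual, predicted)))  # [7]
```

Normalizing the residual before thresholding makes k independent of the measurement scale of each device, which is presumably why formula (IX) precedes the decision step.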
Preferably, the preprocessing comprises filling missing values in the industrial control data by mean interpolation.
An industrial control anomaly detection system based on correlation analysis and three-dimensional convolution comprises:
a data pre-processing module configured to: acquiring industrial control data, preprocessing the industrial control data, and dividing the preprocessed industrial control data into a training set and a test set;
a sequence correlation calculation module configured to: performing correlation calculation by using the correlation characteristics of the sequence to obtain time sequences with different time lengths;
an RGB map generation module configured to: generating an RGB map based on the dynamic sequence length;
a model construction and parameter tuning module configured to: inputting the generated RGB images into the anomaly detection model for training to obtain a trained anomaly detection model;
an anomaly detection module configured to: performing correlation calculation on the test set to obtain sequence lengths, generating RGB images from the test set using the longest sequence length $L_{MAX}$ obtained from the training set, and inputting them into the trained anomaly detection model to determine whether an anomaly has occurred.
A computer device comprises a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the industrial control anomaly detection method based on correlation analysis and three-dimensional convolution when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the industrial control anomaly detection method based on correlation analysis and three-dimensional convolution.
The invention has the beneficial effects that:
Existing industrial control system anomaly detection methods suffer from low accuracy and high false alarm rates: some models ignore the temporal and spatial characteristics of industrial control data during modeling, and others, when they do consider temporal characteristics, capture dependencies only within a fixed, short time window. Compared with existing industrial control system anomaly detection models, the invention has the following beneficial effects:
1. The invention performs correlation calculation on time series data and obtains time series of different lengths according to the correlation, removing the restriction that only fixed-length time series data can be analyzed;
2. The invention uses RGB images as the carrier and an improved three-dimensional convolutional neural network to capture features along both the temporal and spatial dimensions, so that the spatio-temporal correlations of multidimensional data in an industrial control system can be analyzed comprehensively;
3. The invention first analyzes sequence data at a coarse granularity, detecting anomalies in the sequence lengths of time series data, and then analyzes the data at a fine granularity, using deep learning to analyze data features in depth and detect anomalous data.
Drawings
FIG. 1 is a general block diagram of the anomaly detection method of the present invention;
FIG. 2 is a flow chart of an industrial control anomaly detection method based on correlation analysis and three-dimensional convolution;
FIG. 3 is a schematic diagram of an anomaly detection model according to the present invention;
FIG. 4(a) is a grayscale map generated by the present invention on the BATADAL dataset;
FIG. 4(b) is an RGB map generated by the present invention on the BATADAL dataset;
FIG. 5(a) is a grayscale map generated by the present invention on the SWaT dataset;
FIG. 5(b) is an RGB map generated by the present invention on the SWaT dataset;
FIG. 6(a) is a grayscale map generated by the present invention on the WADI dataset;
FIG. 6(b) is an RGB map generated by the present invention on the WADI dataset;
FIG. 7(a) is a schematic diagram of the training loss variation of the present invention on the BATADAL dataset;
FIG. 7(b) is a schematic diagram of the training loss variation of the present invention on the SWaT dataset;
FIG. 7(c) is a schematic diagram of the training loss variation of the present invention on the WADI dataset.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
An industrial control anomaly detection method based on correlation analysis and three-dimensional convolution is shown in fig. 1 and 2, and comprises the following steps:
acquiring industrial control data, preprocessing the industrial control data, and dividing the preprocessed industrial control data into a training set and a test set;
designing a dynamic sequence length for the training set based on the correlation;
generating an RGB map based on the dynamic sequence length;
inputting the generated RGB image into an anomaly detection model for training to obtain a trained anomaly detection model;
performing correlation calculation on the test set to obtain sequence lengths, generating RGB images from the test set using the longest sequence length $L_{MAX}$ obtained from the training set, and inputting them into the trained anomaly detection model to determine whether an anomaly has occurred;
subjecting the industrial control data to be detected, in sequence, to preprocessing, dynamic sequence length design, and RGB image generation based on the dynamic sequence length, and inputting the generated RGB images into the anomaly detection model that has been validated on the test set to determine whether an anomaly has occurred;
wherein designing the dynamic sequence length comprises performing correlation calculation using the correlation characteristics of the sequence to obtain time series of different lengths.
Example 2
The industrial control anomaly detection method based on correlation analysis and three-dimensional convolution according to Example 1, differing in that:
Preprocessing: missing values in the industrial control data are filled by mean interpolation.
Specifically, the data acquisition equipment of the industrial control system collects raw device data (sensor and actuator data) over a period of time. Given the characteristics of raw data collected by an industrial control system, the collected data must be preprocessed, specifically by handling missing values: each missing value is filled by mean interpolation, i.e., with the mean of the values over a period of time preceding it. The data are then divided in a 6:4 ratio into a training set $D_{train}$ and a test set $D_{test}$.
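The preprocessing above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the look-back window size `window`, the fallback value used when no history exists yet, and the assumption that the 6:4 split preserves time order are all hypothetical choices that the text does not fix.

```python
import math
from statistics import mean


def fill_missing_with_window_mean(column, window=10):
    """Fill missing values (None or NaN) with the mean of up to `window`
    preceding values. `window` is a hypothetical parameter; the text only
    says 'a period of time before the missing value'."""
    filled = []
    for value in column:
        is_missing = value is None or (isinstance(value, float) and math.isnan(value))
        if is_missing:
            history = filled[-window:]  # preceding values, earlier fills included
            value = mean(history) if history else 0.0  # assumed fallback
        filled.append(value)
    return filled


def split_train_test(rows, train_ratio=0.6):
    """Split rows 6:4 into D_train and D_test, keeping time order (assumption)."""
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]
```

For example, `fill_missing_with_window_mean([1.0, 2.0, None, 4.0])` fills the gap with the mean of the two preceding values, 1.5. The function is applied per device column before normalization.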
Performing correlation calculation using the correlation characteristics of the sequence to obtain time sequences of different lengths comprises: normalizing the industrial control data; and calculating the correlation between the industrial control data at two adjacent times: while the correlation exceeds a threshold, the current sequence length is increased by one, and once the correlation falls below the threshold, the sequence length is recorded. This step takes the periodic characteristics of the industrial control system into account. Only the training set $D_{train}$ is used in this part.
Specifically, the method comprises the following steps:
Step 1: Suppose that in an industrial control system $E = [E_1, E_2, \ldots, E_m]$ denotes the $m$ devices, including sensors and actuators, where $E_m$ is the $m$-th device. The measured device values (i.e., the industrial control data) obtained at time $i$ are denoted $x_i$:

$$x_i = [x_i^1, x_i^2, \ldots, x_i^m]$$

where $x_i^m$ is the measurement of device $E_m$ at time $i$.

To calculate the correlation of the sequence, the industrial control data must be normalized. Taking $x_i^m$ in $x_i$ as an example, the normalization is given by formula (I):

$$x_i^{m*} = \frac{x_i^m - \bar{x}_i}{\sigma_i} \tag{I}$$

In formula (I), $\bar{x}_i$ is the mean of $x_i$, $\sigma_i$ is the standard deviation of $x_i$, and $x_i^*$ is the normalized $x_i$:

$$x_i^* = [x_i^{1*}, x_i^{2*}, \ldots, x_i^{m*}]$$
Step 2: Let $num$ denote the current sequence length, i.e., the number of consecutive normalized measurement vectors $x_i^*$ ($1 \le i \le num$) that are strongly correlated with one another. To obtain the dynamic sequence length, the correlation between the normalized measurement vectors $x_i^*$ and $x_j^*$ at any two adjacent times is calculated. If the correlation between $x_i^*$ and $x_j^*$ exceeds the threshold $\tau$, the current sequence length $num$ is increased by one; when the correlation falls below the threshold, the current value of $num$ is recorded and appended to the sequence length list $L$, and counting starts again for a new sequence. For example, with $\tau = 0.9$, a correlation above 0.9 indicates a strong correlation between the measured values at two adjacent times. The correlation between $x_i^*$ and $x_j^*$ is calculated by formula (II):

$$\rho(x_i^*, x_j^*) = \frac{\sum_{k=1}^{m} (x_i^{k*} - \bar{x}_i^*)(x_j^{k*} - \bar{x}_j^*)}{\sqrt{\sum_{k=1}^{m} (x_i^{k*} - \bar{x}_i^*)^2}\,\sqrt{\sum_{k=1}^{m} (x_j^{k*} - \bar{x}_j^*)^2}} \tag{II}$$

In formula (II), $\bar{x}_i^*$ is the mean of $x_i^*$ and $\bar{x}_j^*$ is the mean of $x_j^*$.

Finally, the sequence length list $L = [l_1, l_2, \ldots, l_n]$ is obtained.
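Steps 1 and 2 can be sketched in plain Python as below. The sketch assumes that formula (I) standardizes each time-step vector across its m device values (population standard deviation) and that formula (II) is the Pearson correlation between two adjacent normalized vectors. The function names are illustrative, and mapping a constant vector to zeros (and an undefined correlation to 0) is an assumption the text does not address.

```python
import math


def zscore(vec):
    """Formula (I): standardize one time step x_i across its m devices."""
    m = len(vec)
    mu = sum(vec) / m
    sigma = math.sqrt(sum((v - mu) ** 2 for v in vec) / m)
    if sigma == 0:  # constant vector: assumed to map to all zeros
        return [0.0] * m
    return [(v - mu) / sigma for v in vec]


def pearson(a, b):
    """Formula (II): Pearson correlation of two normalized vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0  # undefined correlation treated as 0


def sequence_lengths(rows, tau=0.9):
    """Walk the series in time order: extend the current sequence while the
    correlation of adjacent normalized vectors exceeds tau, otherwise record
    its length in L and start a new sequence."""
    if not rows:
        return []
    normed = [zscore(r) for r in rows]
    lengths, num = [], 1  # num starts at 1
    for prev, cur in zip(normed, normed[1:]):
        if pearson(prev, cur) > tau:
            num += 1
        else:
            lengths.append(num)
            num = 1
    lengths.append(num)  # close the last open sequence
    return lengths
```

The list returned is $L$, and $L_{MAX}$ is then simply `max(sequence_lengths(train_rows))`. Because every time step belongs to exactly one sequence, the lengths always sum to the number of rows.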
Generating RGB maps based on the dynamic sequence length comprises: normalizing the data; reconstructing the normalized data into matrices in which each element corresponds to one of 256 gray values, thereby obtaining gray maps; and superposing every three consecutive gray maps into one RGB map.
The method specifically comprises the following steps:
The longest sequence length $L_{MAX}$ is obtained from the sequence length list $L$. Note that $L_{MAX}$ is calculated on the training set $D_{train}$ only, and the value obtained there is also applied to the test set. Let $M$ be the perfect square closest to $m \times L_{MAX}$;
Step a: Suppose $E = [E_1, E_2, \ldots, E_m]$ denotes the $m$ devices, including sensors and actuators, and $S = [s_1, s_2, \ldots, s_m]$ denotes the corresponding measured values, where the measurements of each device form a column vector. Each column vector in $S$ is normalized by formula (III):

$$s_i' = \frac{s_i - \min(s_i)}{\max(s_i) - \min(s_i)} \tag{III}$$

In formula (III), $\min(s_i)$ is the minimum of $s_i$, $\max(s_i)$ is the maximum of $s_i$, and $s_i'$ is the normalized $s_i$;
Step b: According to the sequence length list $L$, the $l_i$ measurement vectors ($1 \le i \le n$) are reconstructed in order into a $\sqrt{M} \times \sqrt{M}$ matrix; when $l_i \times m$ is less than $M$, the end is filled with zeros. This step yields $n$ matrices of size $\sqrt{M} \times \sqrt{M}$;
Step c: Each element of a matrix takes one of 256 possible values, corresponding to 256 gray levels, so each matrix yields a gray map; every three consecutive gray maps are superposed into one RGB map, and all RGB maps are arranged in time order to obtain an RGB map stream.
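Steps a to c can be sketched in plain Python as below. Several details are assumptions rather than statements of the source: $M$ is taken as the smallest perfect square not less than $m \times L_{MAX}$ (so that every $l_i \times m$ block fits before zero padding), gray levels are obtained by rounding value × 255, and gray maps are grouped into non-overlapping triples, with a trailing incomplete triple dropped. All function names are hypothetical.

```python
import math


def minmax_columns(rows):
    """Formula (III): min-max normalize each device column s_i to [0, 1].
    rows[t][k] is the value of device k at time t."""
    columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(r) for r in zip(*columns)]


def square_side(m, l_max):
    """sqrt(M), taking M as the smallest perfect square >= m * L_MAX."""
    return math.isqrt(m * l_max - 1) + 1


def segment_to_gray(segment, side):
    """Flatten l_i rows of m values in time order, zero-pad to side*side,
    map [0, 1] to gray levels 0..255 and reshape to side x side."""
    flat = [v for row in segment for v in row]
    assert len(flat) <= side * side, "segment longer than L_MAX allows"
    flat += [0.0] * (side * side - len(flat))
    gray = [min(255, round(v * 255)) for v in flat]
    return [gray[r * side:(r + 1) * side] for r in range(side)]


def build_rgb_stream(rows, lengths, side):
    """Cut the normalized series by the sequence length list, make one gray
    map per segment, and stack consecutive triples as R, G, B channels."""
    grays, start = [], 0
    for length in lengths:
        grays.append(segment_to_gray(rows[start:start + length], side))
        start += length
    return [grays[k:k + 3] for k in range(0, len(grays) - 2, 3)]
```

Each element of the returned stream is a list of three `side × side` gray maps, i.e. one RGB map in channel-first layout; the stream keeps time order.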
Inputting the generated RGB maps into the anomaly detection model for training to obtain a trained anomaly detection model comprises:
The RGB maps obtained above serve as the input of the anomaly detection model and as the basis for subsequent detection. The invention improves the classical three-dimensional convolutional neural network in three respects: first, residual modules are added to the conventional three-dimensional convolutional neural network to alleviate network degradation; second, the input layer of the anomaly detection model is designed so that the input size can be chosen according to the data characteristics; third, given the high correlation among the features of industrial control spatio-temporal data, the number of nodes in the output layer is modified and an attention mechanism is added to analyze the spatio-temporal features in depth.
The specific implementation process is as follows:
The anomaly detection model sorts the raw industrial control data in time order and groups it according to correlation, in order to obtain the upper limit of the sequence length usable in actual detection. The value of $M$ satisfying the condition is determined from this upper limit, and the industrial control data are converted into RGB maps of size $\sqrt{M} \times \sqrt{M}$, which serve as the model input;
A three-dimensional convolutional neural network (3D ResNet) is constructed, in which each convolutional layer has 128 convolution kernels of size 3 × 3 × 3, with stride 1 and padding 1. The size of a convolution kernel is written $(k_w, k_h, k_c)$, where $k_w$, $k_h$ and $k_c$ are its width, height and number of channels;
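Let the size of the input data be $(w_{in}, h_{in}, c_{in})$ and that of the output data be $(w_{out}, h_{out}, c_{out})$. The size of the output feature map is calculated by formulas (IV), (V) and (VI):

$$w_{out} = \left\lfloor \frac{w_{in} - k_w + 2 \cdot padding}{stride} \right\rfloor + 1 \tag{IV}$$

$$h_{out} = \left\lfloor \frac{h_{in} - k_h + 2 \cdot padding}{stride} \right\rfloor + 1 \tag{V}$$

$$c_{out} = \text{number of convolution kernels} \tag{VI}$$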
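A quick check of formulas (IV) to (VI), assuming they follow the standard convolution size arithmetic: with a 3 × 3 × 3 kernel, stride 1 and padding 1, the spatial size is preserved, which agrees with the statement that each feature map has the same size as the input RGB map. The helper name is hypothetical.

```python
def conv_output_size(w_in, h_in, k_w, k_h, n_kernels, stride=1, padding=1):
    """Output (w_out, h_out, c_out) of one convolution layer.

    w_out = floor((w_in - k_w + 2*padding) / stride) + 1, likewise for h_out;
    c_out equals the number of convolution kernels (128 in the text)."""
    w_out = (w_in - k_w + 2 * padding) // stride + 1
    h_out = (h_in - k_h + 2 * padding) // stride + 1
    return w_out, h_out, n_kernels
```

For the 24 × 24, 100 × 100 and 112 × 112 RGB maps used in the experiments, `conv_output_size(side, side, 3, 3, 128)` returns `(side, side, 128)`, so stacking the four 3D-CNN blocks does not shrink the maps.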
The overall neural network architecture of the anomaly detection model is shown in FIG. 3. The first part is a feature extraction network, shown on the left of FIG. 3, consisting of four 3D-CNN Block layers. In each block, "Conv3D, 3 × 3 × 3, 128" denotes a three-dimensional convolution with 3 × 3 × 3 kernels and 128 channels, "BN3D, 128" denotes batch normalization over 128 channels, and "ReLU" denotes the ReLU activation function. The second part is feature map processing: each RGB map passes through the three-dimensional convolutional neural network to yield a set of feature maps, denoted $I = [I_1, I_2, \ldots, I_g]$, where $I_g$ is the $g$-th feature map. Each feature map in $I$ is square and has the same size as the input RGB map, so all feature maps in $I$ together can be regarded as a $g$-channel picture. This picture is cut into $N \times N$ small pictures, where $N$ is chosen so that the side length $\sqrt{M}$ of the picture is exactly divisible by $N$. Each of the $N \times N$ small pictures is flattened into a vector, and each vector is assigned a position vector recording the position of its small picture; each vector is added to its position vector, and the result passes through a Dropout layer with probability 0.1 to prevent overfitting. The third part is the prediction network, shown on the right of FIG. 3, which uses a Transformer model consisting of L Encoder blocks and L Decoder blocks, with L = 6 (here L is the block count, not the sequence length list). "Multi-Head Attention" denotes the multi-head attention mechanism, "Feed Forward" a feed-forward neural network, "Layer Norm" layer normalization, and "Dropout" a Dropout layer with probability 0.1. The vectors output by the Dropout layer are input into the prediction network, which finally outputs a prediction vector.
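The feature map processing step resembles the patch embedding used in Vision Transformers. The pure-Python sketch below assumes the g feature maps are given as nested lists, that each patch vector concatenates all g channels, and that the Dropout layer uses inverted scaling as in common deep-learning frameworks. How the position vectors are produced (learned or fixed) is not stated, so they are simply passed in; the function names are hypothetical.

```python
import random


def split_into_patches(feature_maps, n):
    """Cut a g-channel picture (g square side x side maps) into n x n
    patches and flatten each patch, all channels included, into a vector."""
    side = len(feature_maps[0])
    if side % n:
        raise ValueError("feature map side must be divisible by N")
    p = side // n  # side length of one patch
    vectors = []
    for pr in range(n):  # patch row index
        for pc in range(n):  # patch column index
            vectors.append([fm[r][c]
                            for fm in feature_maps
                            for r in range(pr * p, (pr + 1) * p)
                            for c in range(pc * p, (pc + 1) * p)])
    return vectors


def add_positions_and_dropout(vectors, positions, p_drop=0.1, rng=random):
    """Add a position vector to each patch vector, then apply inverted
    dropout: zero each element with probability p_drop, rescale the rest."""
    out = []
    for vec, pos in zip(vectors, positions):
        summed = [v + q for v, q in zip(vec, pos)]
        out.append([0.0 if rng.random() < p_drop else s / (1.0 - p_drop)
                    for s in summed])
    return out
```

With N = 2 each feature map is split into four equal parts, which is the setting used in the experiments on the three data sets.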
After the anomaly detection model is constructed, the mean square error (MSE) is used as the loss function during training, as shown in formula (VII):

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{VII}$$

In formula (VII), $y_i$ is the actual value, $\hat{y}_i$ is the predicted value and $n$ is the total number of sequences; the smaller the MSE, the more accurate the trained model.
Performing correlation calculation on the test set to obtain its sequence lengths, generating RGB maps from the test set using the longest sequence length $L_{MAX}$ obtained from the training set, and inputting them into the trained anomaly detection model to judge whether an anomaly occurs comprises the following steps:
Step A: Let $L = [l_1, l_2, \ldots, l_n]$ be the sequence length list obtained from the training set and $L_{MAX}$ the longest sequence length obtained from the training set; let $L_{test} = [l_1, l_2, \ldots, l_r]$ be the sequence length list obtained from the test set; let the current sequence length $num$, used to record the sequence length, have an initial value of 1; and let $M$ be the perfect square closest to $m \times L_{MAX}$, where $m$ is the number of measuring devices in the industrial control system;
Step B, shallow analysis of abnormal data: calculate the sequence lengths $l_r$ of the test data. The test data are first normalized by formula (I), and the correlation between adjacent test data is then calculated in time order. While the correlation exceeds the threshold $\tau$ (e.g., 0.9), $num$ is increased by one; when the correlation between two adjacent test data falls below $\tau$, the current value of $num$ is recorded as $l_r$ and appended to the test-set sequence length list $L_{test}$. It is then determined whether $l_r$ is in the training sequence length list $L = [l_1, l_2, \ldots, l_n]$; if the sequence length of the test data is not in $L$, the same check is applied to the next sequence length of the test set. Step C is used to detect sequence-order anomalies and more covert attacks designed specifically for industrial control systems.
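The membership test in step B can be sketched as below. The source does not spell out what happens to a test sequence whose length is absent from $L$; this sketch assumes, in line with the name "shallow analysis", that such sequences are reported as suspicious. The function name is hypothetical.

```python
def shallow_check(test_lengths, train_lengths):
    """Return (position, length) for each test-set sequence length l_r that
    does not occur in the training sequence length list L."""
    known = set(train_lengths)  # set lookup: O(1) per test sequence
    return [(r, length) for r, length in enumerate(test_lengths)
            if length not in known]
```

Because $L_{test}$ is built with the same normalization and correlation procedure as $L$ (formulas (I) and (II)), the same segmentation routine can produce both inputs.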
Step C, deep analysis of abnormal data: the test set is normalized by formula (III), and, using the test-set sequence length list $L_{test} = [l_1, l_2, \ldots, l_r]$ obtained in step B, the $l_r$ measurement vectors are reconstructed in time order into a $\sqrt{M} \times \sqrt{M}$ matrix ($1 \le r \le n$); when $l_r \times m$ is less than $M$, the end is filled with zeros. This step yields $n$ matrices of size $\sqrt{M} \times \sqrt{M}$. Each matrix element corresponds to one of 256 gray values, so each matrix yields a gray map; every three consecutive gray maps are superposed into an RGB map, and all RGB maps are arranged in time order to obtain the RGB map stream of the test set. The trained anomaly detection model is then used to predict the RGB map stream of the test set: the historical values over the current time length $t$ are used to predict the sequence over the next time length $t$, and the residual vector is obtained as shown in formula (VIII):
Figure BDA0003545631120000124
In formula (VIII),
Figure BDA0003545631120000125
denotes the residual vector obtained by using the historical values over the current time length $t$ to predict the sequence over the next time length $t$;
Figure BDA0003545631120000126
denotes the true value of the sequence over the next time length $t$;
Figure BDA0003545631120000127
denotes the predicted value of the sequence over the next time length $t$. Here $t$ is the length of both the history window and the prediction window;
The residual vector $r_t$ obtained from formula (VIII) is then normalized to $\tilde{r}_t$ by formula (IX):

$$\tilde{r}_t = \frac{r_t - \bar{r}_t}{\sigma_{r_t}} \tag{IX}$$

In formula (IX), $\bar{r}_t$ is the mean of $r_t$ and $\sigma_{r_t}$ is the standard deviation of $r_t$;
If the normalized residual vector
Figure BDA00035456311200001214
satisfies
Figure BDA00035456311200001215
the data are judged abnormal; otherwise they are judged normal. The threshold $\tau$ is determined by the maximum acceptable false-alarm rate.
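A sketch of the residual-based decision, under explicit assumptions: the residual in formula (VIII) is taken as the absolute prediction error, formula (IX) is applied within one residual vector, a window is flagged when any normalized component exceeds $\tau$, and $\tau$ is chosen as an upper order statistic of scores from known-normal data so that at most the maximum acceptable false-alarm rate of them exceed it. None of these choices is confirmed by the text, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def residual_vector(actual, predicted):
    """Residual between true and predicted values (absolute error assumed)."""
    return [abs(a - p) for a, p in zip(actual, predicted)]


def normalize_residual(res):
    """Formula (IX): subtract the mean of the residual vector and divide by
    its standard deviation; a constant vector maps to zeros (assumption)."""
    mu, sd = mean(res), pstdev(res)
    return [(r - mu) / sd if sd else 0.0 for r in res]


def pick_threshold(normal_scores, max_false_alarm_rate=0.01):
    """Choose tau so that at most max_false_alarm_rate of the scores from
    known-normal data lie strictly above it."""
    ordered = sorted(normal_scores)
    allowed = int(len(ordered) * max_false_alarm_rate)  # alarms we may raise
    return ordered[max(0, len(ordered) - 1 - allowed)]


def is_anomalous(normalized, tau):
    """Assumed decision rule: flag if any normalized component exceeds tau."""
    return max(normalized) > tau
```

Tying $\tau$ to an order statistic of normal-data scores is one direct way to respect a maximum false-alarm rate; a lower rate pushes $\tau$ up and trades recall for fewer alarms.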
In this example, the BATADAL, Secure Water Treatment (SWaT) and Water Distribution (WADI) data sets provided by the cyber security research centre (iTrust) of the Singapore University of Technology and Design are used for verification.
The experimental conditions of this example are as follows:
A high-performance server running 64-bit Windows 10, with a Python environment and the PyTorch framework.
The performance of the model is evaluated with the following four metrics.
First, the experimental results are divided into the following four sample sets (note that here a "positive" sample is a normal one):
(1) TP: number of instances that are actually normal samples and are detected as normal by the model.
(2) FP: number of instances that are actually abnormal samples but are detected as normal by the model.
(3) FN: number of instances that are actually normal samples but are detected by the model as abnormal.
(4) TN: number of instances that are actually anomalous samples and are detected by the model as anomalous.
After the results are classified, the performance of the model is evaluated by calculating accuracy, precision, recall and F-measure, defined as follows:
Accuracy:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision:

$$Precision = \frac{TP}{TP + FP}$$

Recall:

$$Recall = \frac{TP}{TP + FN}$$

F-Measure:

$$F\text{-}Measure = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
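The four metrics can be computed directly from the confusion counts, as in the sketch below. Following the definitions above, the normal class is treated as positive. Returning 0.0 for an undefined ratio is an assumption of this sketch, and the function name is hypothetical.

```python
def evaluate(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure from the four counts,
    with 'positive' meaning a normal sample, as defined above."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}
```

The F-measure is the harmonic mean of precision and recall, so it stays low unless both are high.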
The BATADAL, SWaT and WADI data sets are each preprocessed and normalized, after which the sequence length list of each data set is calculated and its longest sequence length obtained. To speed up training, the sequence length was capped at 100 in the experiments, giving a longest sequence length of 12 for BATADAL, 100 for SWaT and 100 for WADI. The BATADAL, SWaT and WADI data sets contain 44, 50 and 123 measuring devices respectively, and are reconstructed into RGB maps of size 24 × 24, 100 × 100 and 112 × 112 respectively. FIG. 4(a) is a gray map generated by the invention on the BATADAL data set; FIG. 4(b) is an RGB map generated on the BATADAL data set; FIG. 5(a) is a gray map generated on the SWaT data set; FIG. 5(b) is an RGB map generated on the SWaT data set; FIG. 6(a) is a gray map generated on the WADI data set; FIG. 6(b) is an RGB map generated on the WADI data set. The RGB maps are the input of the feature learning network, which performs feature learning on them; the resulting feature maps are processed by the feature map processing module, which divides each feature map into four equal parts and represents each part as a vector input to the prediction network. The predictions of the prediction network are compared with the true values to obtain the residual vector, and the normalized residual vector is compared with the threshold to detect abnormal data.
In the training phase, fig. 7(a) is a schematic diagram of the variation of training loss on the BATADAL data set according to the present invention; FIG. 7(b) is a schematic diagram of the training loss variation of the present invention on a SWaT data set; FIG. 7(c) is a graphical illustration of the training loss variation of the present invention over a WADI data set; it can be seen that the BATADAL dataset converged around 200 iterations, the SWaT dataset converged around 100 iterations, and the WADI dataset converged around 10 iterations.
The test results on the BATADAL, SWaT and WADI data sets are shown in Table 1, which compares the performance of the invention on the three data sets.
TABLE 1: Performance of the invention on the BATADAL, SWaT and WADI data sets
As Table 1 shows, the invention achieves high detection accuracy.
Example 3
An industrial control anomaly detection system based on correlation analysis and three-dimensional convolution comprises:
a data pre-processing module configured to: acquiring industrial control data, preprocessing the industrial control data, and dividing the preprocessed industrial control data into a training set and a test set;
a sequence correlation calculation module configured to: performing correlation calculation by using the correlation characteristics of the sequence to obtain time sequences with different time lengths;
an RGB map generation module configured to: generating an RGB map based on the dynamic sequence length;
a model construction module and a parameter tuning module configured to: inputting the generated RGB maps into an anomaly detection model for training to obtain a trained anomaly detection model;
an anomaly detection module configured to: performing correlation calculation on the test set to obtain its sequence lengths, generating RGB maps from the test set using the longest sequence length $L_{MAX}$ obtained from the training set, and inputting them into the trained anomaly detection model to judge whether an anomaly occurs.
Example 4
A computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the steps of the industrial control anomaly detection method based on correlation analysis and three-dimensional convolution according to Example 1 or 2.
Example 5
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the industrial control anomaly detection method based on correlation analysis and three-dimensional convolution according to Example 1 or 2.

Claims (8)

1. An industrial control anomaly detection method based on correlation analysis and three-dimensional convolution is characterized by comprising the following steps:
acquiring industrial control data, preprocessing the industrial control data, and dividing the preprocessed industrial control data into a training set and a test set;
designing a dynamic sequence length for the training set based on the correlation;
generating an RGB map based on the dynamic sequence length;
inputting the generated RGB image into an anomaly detection model for training to obtain a trained anomaly detection model;
performing correlation calculation on the test set to obtain its sequence lengths, generating RGB maps from the test set using the longest sequence length $L_{MAX}$ obtained from the training set, and inputting them into the trained anomaly detection model to judge whether an anomaly occurs;
performing, on industrial control data to be detected, preprocessing, dynamic sequence length design and RGB map generation based on the dynamic sequence length in turn, and inputting the generated RGB maps into the anomaly detection model that has passed the test-set evaluation to judge whether an anomaly occurs;
wherein designing the dynamic sequence length comprises performing correlation calculation using the correlation characteristics of the sequence to obtain time sequences of different lengths, including: normalizing the industrial control data; and calculating the correlation between the industrial control data at two adjacent times, increasing the current sequence length by one while the correlation exceeds a threshold, until the correlation falls below the threshold, thereby obtaining the sequence length;
wherein performing correlation calculation on the test set to obtain its sequence lengths, generating RGB maps from the test set using the longest sequence length $L_{MAX}$ obtained from the training set, and inputting them into the trained anomaly detection model to judge whether an anomaly occurs comprises the following steps:
step A: the sequence length list obtained from the training set is $L = [l_1, l_2, \ldots, l_n]$, the longest sequence length obtained from the training set is $L_{MAX}$, and the sequence length list obtained from the test set is $L_{test} = [l_1, l_2, \ldots, l_r]$; the current sequence length $num$ is used to record the sequence length and has an initial value of 1; $M$ is the perfect square closest to $m \times L_{MAX}$, where $m$ is the number of measuring devices in the industrial control system;
step B, shallow analysis of abnormal data: calculating the sequence lengths $l_r$ of the test data: the test data are first normalized, and the correlation between every two adjacent test data is then calculated in time order; while the correlation exceeds the threshold $\tau$ of 0.9, the value of $num$ is increased by one, until the correlation between two adjacent test data falls below $\tau$, whereupon the current value of $num$ is recorded as $l_r$ and appended to the test-set sequence length list $L_{test}$, and it is determined whether $l_r$ is in the sequence length list $L = [l_1, l_2, \ldots, l_n]$; if the sequence length of the test data is not in $L$, it is determined whether the next sequence length of the test set is in $L$;
step C, deep analysis of abnormal data: after the test set is normalized, the $l_r$ measurement vectors are reconstructed in time order into a $\sqrt{M} \times \sqrt{M}$ matrix using the test-set sequence length list $L_{test} = [l_1, l_2, \ldots, l_r]$ obtained in step B, where $1 \le r \le n$; when $l_r \times m$ is less than $M$, the end is filled with zeros, and this step yields $n$ matrices of size $\sqrt{M} \times \sqrt{M}$; each matrix element corresponds to one of 256 gray values, each matrix yields a gray map, every three consecutive gray maps are superposed into an RGB map, and all RGB maps are arranged in time order to obtain the RGB map stream of the test set; the trained anomaly detection model is used to predict the RGB map stream of the test set, the historical values over the current time length $t$ being used to predict the sequence over the next time length $t$, and the residual vector is obtained as shown in formula (VIII):
Figure FDA0003808003360000024
in formula (VIII),
Figure FDA0003808003360000025
denotes the residual vector obtained by using the historical values over the current time length $t$ to predict the sequence over the next time length $t$,
Figure FDA0003808003360000026
denotes the true value of the sequence over the next time length $t$,
Figure FDA0003808003360000027
denotes the predicted value of the sequence over the next time length $t$, where $t$ is the length of both the history window and the prediction window;
the residual vector $r_t$ obtained from formula (VIII) is normalized to $\tilde{r}_t$ by formula (IX):

$$\tilde{r}_t = \frac{r_t - \bar{r}_t}{\sigma_{r_t}} \tag{IX}$$

in formula (IX), $\bar{r}_t$ is the mean of $r_t$ and $\sigma_{r_t}$ is the standard deviation of $r_t$;
if the normalized residual vector
Figure FDA00038080033600000214
satisfies
Figure FDA00038080033600000215
the data are judged abnormal; otherwise they are judged normal.
2. The industrial control anomaly detection method based on correlation analysis and three-dimensional convolution according to claim 1, characterized by comprising the following steps:
step 1: in the industrial control system, $E = [E_1, E_2, \ldots, E_m]$ denotes the $m$ devices, including sensors and actuators, where $E_m$ is the $m$-th device; the measured device values, i.e., the industrial control data, obtained at time $i$ are denoted $x_i = [x_i^1, x_i^2, \ldots, x_i^m]$, where $x_i^m$ is the measurement of device $E_m$ at time $i$;
the industrial control data are normalized by formula (I):

$$x_i^{m*} = \frac{x_i^m - \bar{x}_i}{\sigma_i} \tag{I}$$

in formula (I), $\bar{x}_i$ is the mean of $x_i$, $\sigma_i$ is the standard deviation of $x_i$, and $x_i^* = [x_i^{1*}, x_i^{2*}, \ldots, x_i^{m*}]$ is the normalized $x_i$;
step 2: $num$ denotes the current sequence length and records that the current $num$ consecutive normalized measurement vectors $x_i^*$ ($1 \le i \le num$) are strongly correlated; the correlation between the normalized measurement vectors $x_i^*$ and $x_j^*$ at any two adjacent times is calculated; if the correlation between $x_i^*$ and $x_j^*$ exceeds the threshold $\tau$, the current sequence length $num$ is increased by one, until the correlation falls below the threshold, whereupon the current value of $num$ is recorded and appended to the sequence length list $L$ and counting starts again for a new sequence; the correlation between $x_i^*$ and $x_j^*$ is calculated by formula (II):

$$\rho(x_i^*, x_j^*) = \frac{\sum_{k=1}^{m} (x_i^{k*} - \bar{x}_i^*)(x_j^{k*} - \bar{x}_j^*)}{\sqrt{\sum_{k=1}^{m} (x_i^{k*} - \bar{x}_i^*)^2}\,\sqrt{\sum_{k=1}^{m} (x_j^{k*} - \bar{x}_j^*)^2}} \tag{II}$$

in formula (II), $\bar{x}_i^*$ is the mean of $x_i^*$ and $\bar{x}_j^*$ is the mean of $x_j^*$;
finally, the sequence length list $L = [l_1, l_2, \ldots, l_n]$ is obtained.
3. The industrial control anomaly detection method based on correlation analysis and three-dimensional convolution according to claim 1, wherein generating RGB maps based on the dynamic sequence length comprises: normalizing the data, reconstructing the normalized data into matrices in which each element corresponds to one of 256 gray values to obtain gray maps, and superposing every three consecutive gray maps into an RGB map; specifically:
obtaining the longest sequence length $L_{MAX}$ from the sequence length list $L$, $M$ being the perfect square closest to $m \times L_{MAX}$;
step a: $E = [E_1, E_2, \ldots, E_m]$ denotes the $m$ devices, including sensors and actuators, and $S = [s_1, s_2, \ldots, s_m]$ denotes the measured values of the $m$ devices, the measured values of each device forming a column vector; each column vector in $S$ is normalized by formula (III):

$$s_i' = \frac{s_i - \min(s_i)}{\max(s_i) - \min(s_i)} \tag{III}$$

in formula (III), $\min(s_i)$ is the minimum of $s_i$, $\max(s_i)$ is the maximum of $s_i$, and $s_i'$ is the normalized $s_i$;
step b: according to the sequence length list $L$, the $l_i$ measurement vectors are reconstructed in order into a $\sqrt{M} \times \sqrt{M}$ matrix, where $1 \le i \le n$; when $l_i \times m$ is less than $M$, the end is filled with zeros, and this step yields $n$ matrices of size $\sqrt{M} \times \sqrt{M}$;
step c: each element of a matrix takes one of 256 possible values, corresponding to 256 gray values, so that each matrix yields a gray map; every three consecutive gray maps are superposed into an RGB map, and all RGB maps are arranged in time order to obtain an RGB map stream.
4. The industrial control anomaly detection method based on correlation analysis and three-dimensional convolution according to claim 1, wherein the generated RGB map is input into an anomaly detection model for training to obtain a trained anomaly detection model, and the method comprises the following steps:
constructing a three-dimensional convolution neural network, namely 3D ResNet;
the size of the input data is (w) in ,h in ,c in ) The size of the output data is (w) out ,h out ,c out ) (ii) a The size calculation method of the output characteristic diagram is shown as formulas (IV), (V) and (VI):
Figure FDA0003808003360000041
Figure FDA0003808003360000042
c out =c out (Ⅵ)
after the anomaly detection model is constructed, the mean squared error (MSE) is used as the loss function during training, as shown in formula (VII):
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \quad \text{(VII)}$$
In formula (VII), $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and n is the total number of sequences; a smaller MSE indicates more accurate model training.
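Formula (VII) is the ordinary mean squared error. A plain-Python sketch (the function name `mse` is hypothetical) is shown below; in a PyTorch training loop the same loss is available as `torch.nn.MSELoss`.

```python
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and equal in length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # 4/3, about 1.3333
```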
5. The industrial control anomaly detection method based on correlation analysis and three-dimensional convolution according to any one of claims 1 to 4, wherein the preprocessing comprises filling missing values in the industrial control data by mean imputation.
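Mean imputation replaces each missing reading with the mean of the observed readings. The sketch assumes missing values are encoded as `None` or NaN and that imputation is done separately for each device column; both are assumptions, and `mean_impute` is a hypothetical name.

```python
import math
from typing import List, Optional


def mean_impute(column: List[Optional[float]]) -> List[float]:
    """Replace missing readings (None or NaN) with the mean of the observed ones."""
    def missing(v: Optional[float]) -> bool:
        return v is None or math.isnan(v)

    observed = [v for v in column if not missing(v)]
    if not observed:
        raise ValueError("column has no observed values to average")
    mean = sum(observed) / len(observed)
    return [mean if missing(v) else v for v in column]


print(mean_impute([1.0, None, 3.0, float("nan")]))  # [1.0, 2.0, 3.0, 2.0]
```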
6. An industrial control anomaly detection system based on correlation analysis and three-dimensional convolution, characterized by comprising:
a data preprocessing module configured to: acquire industrial control data, preprocess it, and divide the preprocessed industrial control data into a training set and a test set;
a sequence correlation calculation module configured to: perform correlation calculation using the correlation characteristics of the sequence to obtain time sequences of different lengths;
an RGB image generation module configured to: generate RGB images based on the dynamic sequence lengths;
a model construction module and a parameter tuning module configured to: input the generated RGB images into an anomaly detection model for training to obtain a trained anomaly detection model;
an anomaly detection module configured to: perform correlation calculation on the test set to obtain sequence lengths, generate RGB images from the test set using the longest sequence length L_MAX obtained from the training set, and input them into the trained anomaly detection model to determine whether an anomaly occurs;
wherein performing correlation calculation using the correlation characteristics of the sequence to obtain time sequences of different lengths comprises: normalizing the industrial control data; and calculating the correlation between the industrial control data at two adjacent time points, incrementing the current sequence length by one while the correlation exceeds a threshold, until the correlation falls below the threshold, which finally yields the sequence length;
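The claim does not state which correlation coefficient is used or how two time points are compared. The sketch below assumes Pearson correlation between the m-dimensional reading vectors at adjacent time steps, with the threshold τ = 0.9 that step B names. Treating a zero-variance vector as uncorrelated, and resetting num to 1 after each recorded length, are also assumptions; the function names are hypothetical.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return cov / math.sqrt(vx * vy)


def sequence_lengths(rows: List[List[float]], tau: float = 0.9) -> List[int]:
    """Grow the current length num while adjacent time steps stay correlated."""
    if not rows:
        return []
    lengths, num = [], 1
    for prev, cur in zip(rows, rows[1:]):
        if pearson(prev, cur) > tau:
            num += 1
        else:
            lengths.append(num)  # correlation dropped: close the current sequence
            num = 1
    lengths.append(num)
    return lengths


rows = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.1], [2.0, 1.0, 0.0], [2.1, 1.0, 0.0]]
print(sequence_lengths(rows))  # [2, 2]
```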
wherein performing correlation calculation on the test set to obtain sequence lengths, generating RGB images from the test set using the longest sequence length L_MAX obtained from the training set, and inputting them into the trained anomaly detection model to determine whether an anomaly occurs comprises the following steps:
step A: the sequence length list obtained from the training set is L = [l_1, l_2, ..., l_n], the longest sequence length obtained from the training set is L_MAX, and the sequence length list obtained from the test set is L_test = [l_1, l_2, ..., l_r]; the current sequence length num records the sequence length and has an initial value of 1; M is the value closest to m × L_MAX, where m is the number of measuring devices in the industrial control system;
step B: shallow analysis of abnormal data: calculate the data sequence length l_r of the test set. The test data are first normalized, and the correlation between each two adjacent test data is then calculated in time order; while the correlation exceeds the correlation threshold τ = 0.9, num is incremented by one, until the correlation between two adjacent test data falls below τ. The current value of num is then recorded as l_r, l_r is added to the test-set sequence length list L_test, and it is determined whether l_r is in the sequence length list L = [l_1, l_2, ..., l_n]; if the test-set data sequence length is not in L, it is then determined whether the next test-set data sequence length is in L;
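Step B compares each test-set length against the lengths seen in training. The sketch below assumes, as the step's name suggests, that a length absent from L marks the corresponding segment as suspicious; the claim text does not spell out that consequence, and `shallow_check` is a hypothetical name.

```python
from typing import List, Tuple


def shallow_check(test_lengths: List[int], train_lengths: List[int]) -> List[Tuple[int, int]]:
    """Return (index, length) for every test-set length not seen in the training list L."""
    known = set(train_lengths)  # set lookup keeps the membership test O(1)
    return [(idx, l_r) for idx, l_r in enumerate(test_lengths) if l_r not in known]


print(shallow_check([2, 5, 3], [2, 3, 4]))  # [(1, 5)]
```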
step C: deep analysis of abnormal data: after the test set is normalized, using the test-set sequence length list L_test = [l_1, l_2, ..., l_r] obtained in step B, the l_r records of measuring-device values are reconstructed in time order into one matrix; when l_r × m is less than the number of matrix elements (1 ≤ r ≤ n), the tail is padded with zeros, and n matrices are obtained in this step. Each element of a matrix corresponds to one of 256 gray values, so each matrix yields a grayscale image; three consecutive grayscale images are stacked into an RGB image, and all RGB images are arranged in time order to obtain the RGB image stream of the test set. The trained anomaly detection model then predicts on the RGB image stream of the test set: the historical values over the current time length t are used to predict the sequence over the future time length t, and the residual vector is obtained as shown in formula (VIII):
$$\mathbf{e}^{(t)} = \mathbf{y}^{(t)} - \hat{\mathbf{y}}^{(t)} \quad \text{(VIII)}$$
In formula (VIII), $\mathbf{e}^{(t)}$ is the residual vector obtained by predicting the sequence over the future time length t from the historical values over the current time length t, $\mathbf{y}^{(t)}$ is the real value of the sequence over the future time length t, $\hat{\mathbf{y}}^{(t)}$ is the predicted value of that sequence, and t indicates that the historical values over the current time length t are used to predict the sequence over the future time length t;
the residual vector $\mathbf{e}^{(t)}$ is normalized to obtain $\tilde{\mathbf{e}}^{(t)}$, as shown in formula (IX):
$$\tilde{\mathbf{e}}^{(t)} = \frac{\mathbf{e}^{(t)} - \mu^{(t)}}{\sigma^{(t)}} \quad \text{(IX)}$$
In formula (IX), $\mu^{(t)}$ is the mean of $\mathbf{e}^{(t)}$ and $\sigma^{(t)}$ is the standard deviation of $\mathbf{e}^{(t)}$;
if the normalized residual $\tilde{\mathbf{e}}^{(t)}$ satisfies the anomaly condition [condition not reproduced], the data are judged abnormal; otherwise they are judged normal.
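The residual and normalization steps of step C can be sketched as below. Two assumptions are labeled in the code: the residual is taken as the plain difference between real and predicted values, and σ is the population standard deviation. The final decision condition is not reproduced in this text, so the sketch stops at the normalized residual; `normalized_residual` is a hypothetical name.

```python
import statistics
from typing import List


def normalized_residual(actual: List[float], predicted: List[float]) -> List[float]:
    """Residual vector e = y - y_hat, then z-score normalization (e - mu) / sigma."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    # Assumed form of the residual: plain difference of real and predicted values.
    residual = [y - p for y, p in zip(actual, predicted)]
    mu = statistics.fmean(residual)
    sigma = statistics.pstdev(residual)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0] * len(residual)
    return [(e - mu) / sigma for e in residual]


z = normalized_residual([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 0.0])
print(z)  # the last entry, about 1.732, stands out from the others
```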
7. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the industrial control anomaly detection method based on correlation analysis and three-dimensional convolution according to any one of claims 1 to 5.
8. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the industrial control anomaly detection method based on correlation analysis and three-dimensional convolution according to any one of claims 1 to 5.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210247513.7A CN114595448B (en) 2022-03-14 2022-03-14 Industrial control anomaly detection method, system and equipment based on correlation analysis and three-dimensional convolution and storage medium

Publications (2)

Publication Number Publication Date
CN114595448A 2022-06-07
CN114595448B 2022-09-27

Family

ID=81809617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210247513.7A Active CN114595448B (en) 2022-03-14 2022-03-14 Industrial control anomaly detection method, system and equipment based on correlation analysis and three-dimensional convolution and storage medium

Country Status (1)

Country Link
CN (1) CN114595448B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221214

Address after: 250014 No. 19, ASTRI Road, Lixia District, Shandong, Ji'nan

Patentee after: SHANDONG COMPUTER SCIENCE CENTER(NATIONAL SUPERCOMPUTER CENTER IN JINAN)

Patentee after: Qilu University of Technology

Address before: 250014 No. 19, ASTRI Road, Ji'nan, Shandong

Patentee before: SHANDONG COMPUTER SCIENCE CENTER(NATIONAL SUPERCOMPUTER CENTER IN JINAN)