CN113162811B - Industrial control network flow abnormity detection method and device based on deep learning - Google Patents


Info

Publication number
CN113162811B
Authority
CN
China
Prior art keywords
data
flow
prediction model
lstm prediction
predicting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110605897.0A
Other languages
Chinese (zh)
Other versions
CN113162811A (en
Inventor
穆洪涛
毕超然
姜海昆
范宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changyang Technology Beijing Co ltd
Original Assignee
Changyang Tech Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changyang Tech Beijing Co ltd filed Critical Changyang Tech Beijing Co ltd
Priority to CN202110605897.0A priority Critical patent/CN113162811B/en
Publication of CN113162811A publication Critical patent/CN113162811A/en
Application granted granted Critical
Publication of CN113162811B publication Critical patent/CN113162811B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Abstract

The invention relates to a deep-learning-based method and device for detecting anomalies in industrial control network traffic. The method comprises the following steps: acquiring to-be-detected data and historical data of network traffic and preprocessing both to obtain flow characteristics; selecting the flow characteristics corresponding to a section of historical data, inputting them into an LSTM prediction model, and predicting the flow characteristics corresponding to the to-be-detected data; calculating a dynamic threshold for anomaly detection based on the historical data, and detecting whether a network traffic anomaly occurs in the time period corresponding to the to-be-detected data by combining the prediction result with the real flow characteristics of that data; and storing the flow characteristic data, periodically training a new LSTM prediction model, and replacing the LSTM prediction model currently used for prediction if the new model is superior to it. The invention detects anomalies in industrial control network traffic with a low false alarm rate and remains effective over the long term.

Description

Industrial control network flow abnormity detection method and device based on deep learning
Technical Field
The invention relates to the technical field of computer security, and in particular to a deep-learning-based industrial control network traffic anomaly detection method and device, computer equipment, and a computer-readable storage medium.
Background
Network traffic prediction is an application of time series prediction, and many methods have been applied to it. By modeling approach they fall roughly into two types: the first type is linear prediction based on time series modeling, such as the autoregressive (AR) model, the autoregressive moving average (ARMA) model, the autoregressive integrated moving average (ARIMA) model, and Holt-Winters exponential smoothing; the second type is non-linear prediction based on machine learning algorithms, such as support vector machines and recurrent neural networks.
Linear prediction methods such as the Autoregressive Integrated Moving Average (ARIMA) model are simple and easy to implement, and need only endogenous variables without any exogenous variables. However, ARIMA requires the time series to be stationary (at least after differencing) and can only capture linear relationships, so it is not suitable for industrial control network traffic prediction.
Among non-linear prediction methods, support vector machines are not suitable for large-scale samples. A Recurrent Neural Network (RNN) takes sequence data as input and recurses along the direction in which the sequence evolves, with all nodes (recurrent units) connected in a chain, but a plain RNN cannot handle long-term dependence well. The Long Short-Term Memory network (LSTM) was proposed to solve this problem. Because existing industrial control network traffic prediction and anomaly detection methods are often inaccurate and depend heavily on manual experience, a more accurate and more intelligent method is needed.
Disclosure of Invention
The invention aims to provide a deep-learning-based industrial control network traffic prediction and anomaly detection method that overcomes at least some of the above defects.
In a first aspect, the invention provides an industrial control network traffic anomaly detection method based on deep learning, which comprises the following steps:
step S1, acquiring to-be-detected data and historical data of network traffic and preprocessing them to obtain corresponding flow characteristics;
step S2, selecting the flow characteristics corresponding to a section of historical data to form sequence data, inputting the sequence data into an LSTM prediction model for predicting flow characteristics, predicting the flow characteristics corresponding to the to-be-detected data, and outputting a prediction result;
wherein the flow characteristics predicted by the LSTM prediction model comprise at least the flow duration, the number of forward packets, the number of reverse packets, the total number of bytes of the forward packets, the total number of bytes of the reverse packets, the total number of bytes of the forward packet headers, the total number of bytes of the reverse packet headers, the total number of bytes of the forward sub-flows and the total number of bytes of the reverse sub-flows;
step S3, calculating a dynamic threshold for anomaly detection based on the historical data, and detecting whether a network traffic anomaly occurs in the time period corresponding to the to-be-detected data by combining the prediction result with the real flow characteristics corresponding to the to-be-detected data;
step S4, storing the flow characteristic data and periodically training a new LSTM prediction model, and, if the new LSTM prediction model is superior to the LSTM prediction model currently used for predicting flow characteristics, replacing the latter with the new model.
Optionally, the LSTM prediction model for predicting the flow characteristics is constructed by:
preprocessing real network flow data to obtain flow characteristics, wherein the preprocessing comprises parsing the flow characteristics to be predicted from real pcap flow packet data, and aggregating the flow characteristics at consecutive moments by time window to obtain the real values for each time period and form the corresponding feature vectors;
forming a sample from a plurality of feature vectors arranged consecutively in chronological order, forming a training set and a test set from a plurality of samples, and training an LSTM-based prediction model on the training set and the test set until the model converges.
Optionally, step S2 (selecting the flow characteristics corresponding to a section of historical data to form sequence data, inputting the sequence data into the LSTM prediction model for predicting flow characteristics, predicting the flow characteristics corresponding to the to-be-detected data, and outputting a prediction result) includes:
determining the length of the sequence data, selecting the flow characteristics corresponding to historical data to form a group of feature vectors that covers consecutive time periods and includes the flow characteristics of the latest time period, inputting the group into the LSTM prediction model for predicting flow characteristics, and predicting the flow characteristics of the next time period with the LSTM prediction model; the latest time period and the next time period are adjacent in time, and the next time period is the time period corresponding to the to-be-detected data;
and outputting the prediction result of the LSTM prediction model on the flow characteristics of the next period, wherein the prediction result comprises the corresponding prediction value of each flow characteristic.
Optionally, in step S3, calculating a dynamic threshold for anomaly detection based on historical data, and detecting whether a network traffic anomaly occurs in the time period by combining the prediction result and a real traffic feature corresponding to the data to be detected, including:
step S3-1, retrieving the real values and predicted values of each flow characteristic corresponding to the historical data;
step S3-2, calculating a dynamic threshold by a formula [formula image not reproduced in the source text], whose terms are: the dynamic threshold corresponding to the i-th flow characteristic; a historical data set consisting of real values of the i-th flow characteristic; and a historical data set consisting of predicted values of the i-th flow characteristic, both historical data sets having length n;
step S3-3, taking the real value of the i-th flow characteristic in a time period and the predicted value of the i-th flow characteristic in the prediction result for the same time period, and, if the detection condition [formula image not reproduced in the source text] is satisfied for all i, considering that a network traffic anomaly occurs in that time period.
Optionally, in step S3-2, the value of n is an integer multiple of the length of the input sequence data of the LSTM prediction model; the historical data set of real values and the historical data set of predicted values each consist of continuous historical data; the set of real values includes the real value of the flow characteristic for the latest time period, and the set of predicted values includes the predicted value of the flow characteristic for the latest time period, wherein the latest time period is adjacent in time to the time period corresponding to the to-be-detected data.
Optionally, in step S4, storing the flow characteristic data and periodically training a new LSTM prediction model, and if the new LSTM prediction model is better than the LSTM prediction model currently used for predicting the flow characteristic, updating the LSTM prediction model currently used for predicting the flow characteristic, including:
step S4-1, when storing the flow characteristic data: if detection finds no network traffic anomaly in a time period, storing the real value of each flow characteristic in that time period;
if detection finds a network traffic anomaly in the time period, storing the predicted value of each flow characteristic in that time period;
step S4-2, when the stored data reaches the data volume required for transfer (migration) training, using the stored data for online learning to train a new LSTM prediction model;
after the new LSTM prediction model is trained, testing both the new LSTM prediction model and the LSTM prediction model currently used for predicting flow characteristics on historical data of the network traffic, and scoring the prediction accuracy of each;
if the score of the new LSTM prediction model is higher than that of the LSTM prediction model currently used for predicting flow characteristics, replacing the latter with the new LSTM prediction model.
Optionally, in the step S4-2, when the new LSTM prediction model and the LSTM prediction model currently used for predicting the traffic characteristics are checked by using the historical data of the network traffic, the used historical data of the network traffic is an actually obtained real value;
when the prediction accuracy of the two LSTM prediction models is scored, the flow characteristics corresponding to the same historical data are input into each of the two LSTM prediction models for flow characteristic prediction, and the difference between the predicted value and the real value of each flow characteristic is scored and weighted, using the expression:
$$R2\_score = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
wherein i indexes the i-th data point of a flow characteristic, $y_i$ represents the real value, $\bar{y}$ represents the average of the real values, $\hat{y}_i$ represents the predicted value, n represents the total number of data points of one flow characteristic, and R2_score represents the accuracy score corresponding to that flow characteristic;
obtaining an R2_score for each flow characteristic according to the expression, and adding it to an R2_score set;
calculating a weighted average of the R2_score set over all flow characteristics, wherein the weight is the variance of the R2_score of each flow characteristic in the R2_score set, to obtain a final model score, a higher model score representing higher accuracy;
and judging whether the new LSTM prediction model is superior to the LSTM prediction model currently used for predicting the flow characteristics or not through the model score.
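The scoring and comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names `r2_score` and `model_score` and the sample values are hypothetical, and because the description of the weights is hard to pin down from the text, the weights are passed in as a parameter with equal weights as an assumed default.

```python
from statistics import fmean

def r2_score(true_vals, pred_vals):
    """Coefficient of determination for one flow characteristic."""
    mean_true = fmean(true_vals)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_vals, pred_vals))
    ss_tot = sum((t - mean_true) ** 2 for t in true_vals)
    return 1.0 - ss_res / ss_tot  # undefined if all real values are identical

def model_score(true_by_feature, pred_by_feature, weights=None):
    """Weighted average of per-feature R2 scores (equal weights are an assumption)."""
    scores = [r2_score(t, p) for t, p in zip(true_by_feature, pred_by_feature)]
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Two flow characteristics, four historical periods each (made-up values)
true_by_feature = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
old_preds = [[2.0, 2.0, 2.0, 2.0], [25.0, 25.0, 25.0, 25.0]]  # predicts a constant
new_preds = [[1.0, 2.0, 3.0, 5.0], [10.0, 20.0, 30.0, 40.0]]  # one small miss

old_score = model_score(true_by_feature, old_preds)
new_score = model_score(true_by_feature, new_preds)
use_new_model = new_score > old_score  # update only if the new model scores higher
```

Both models are scored on the same real historical values, so the comparison reflects only the difference in their predictions.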
In a second aspect, the present invention further provides a deep learning-based industrial control network traffic anomaly detection apparatus, including:
the preprocessing module is used for acquiring to-be-detected data and historical data of network traffic and preprocessing the to-be-detected data and the historical data to obtain corresponding traffic characteristics;
the flow prediction module is used for selecting flow characteristics corresponding to a section of historical data to form sequence data, inputting an LSTM prediction model for predicting the flow characteristics, predicting the flow characteristics corresponding to the data to be detected and outputting a prediction result;
the LSTM prediction model at least predicts the flow characteristics, wherein the flow characteristics comprise flow duration, the number of forward packets, the number of reverse packets, the total number of bytes of the forward packets, the total number of bytes of the reverse packets, the total number of bytes of forward packet headers, the total number of bytes of reverse packet headers, the total number of bytes of forward sub-streams and the total number of bytes of reverse sub-streams;
the anomaly detection module is used for calculating a dynamic threshold value for anomaly detection based on historical data and detecting whether network traffic anomaly occurs in a time period corresponding to the data to be detected or not by combining the prediction result and the real traffic characteristics corresponding to the data to be detected;
and the online learning module is used for storing the flow characteristic data and training a new LSTM prediction model periodically, and if the new LSTM prediction model is superior to the LSTM prediction model used for predicting the flow characteristic currently, the LSTM prediction model used for predicting the flow characteristic currently is updated.
In a third aspect, the present invention further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of any one of the above-mentioned industrial control network traffic anomaly detection methods based on deep learning when executing the computer program.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of any one of the above-mentioned industrial control network traffic anomaly detection methods based on deep learning.
The technical scheme of the invention has the following advantages: the invention provides an industrial control network flow abnormity detection method and device based on deep learning, computer equipment and a computer readable storage medium.
Drawings
FIG. 1 is a schematic diagram illustrating steps of a deep learning-based industrial control network traffic anomaly detection method in an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a deep learning-based industrial control network traffic anomaly detection method in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an industrial control network traffic anomaly detection device based on deep learning in an embodiment of the present invention.
In the figures: 100: preprocessing module; 200: flow prediction module; 300: anomaly detection module; 400: online learning module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1 and fig. 2, an industrial control network traffic anomaly detection method based on deep learning according to an embodiment of the present invention includes the following steps:
Step S1, acquiring to-be-detected data and historical data of the network traffic and preprocessing them to obtain corresponding flow characteristics.
Step S1 acquires and preprocesses network traffic data. During acquisition, pcap flow packet data, which records real industrial control network traffic, is captured in the industrial control network. During preprocessing, the pcap flow packets are parsed in real time into a plurality of industrial control network flow characteristics, and the values of these characteristics are aggregated by time window for subsequent use.
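As a rough illustration of the time-window aggregation in step S1, the sketch below sums per-packet counters into one feature vector per window. The record layout `(timestamp, direction, total_bytes, header_bytes)`, the function name `aggregate_by_window`, and the choice of six counters are assumptions made for the example; a real implementation would obtain such records from a pcap parser and would cover all the characteristics the method uses.

```python
from collections import defaultdict

# Hypothetical parsed-packet record:
#   (timestamp_seconds, direction, total_bytes, header_bytes), direction is "fwd" or "rev".
# Vector layout (assumed): [fwd_pkts, rev_pkts, fwd_bytes, rev_bytes, fwd_hdr_bytes, rev_hdr_bytes]

def aggregate_by_window(packets, window_seconds=60):
    """Sum per-packet counters into one feature vector per time window."""
    windows = defaultdict(lambda: [0, 0, 0, 0, 0, 0])
    for ts, direction, total_bytes, header_bytes in packets:
        key = int(ts // window_seconds)          # index of the window this packet falls in
        vec = windows[key]
        offset = 0 if direction == "fwd" else 1  # even slots: forward, odd slots: reverse
        vec[0 + offset] += 1                     # packet count
        vec[2 + offset] += total_bytes           # total packet bytes
        vec[4 + offset] += header_bytes          # header bytes
    # Chronological list of (window_start_seconds, feature_vector)
    return [(key * window_seconds, windows[key]) for key in sorted(windows)]

packets = [
    (0.5, "fwd", 100, 20),
    (10.0, "rev", 60, 20),
    (59.9, "fwd", 40, 20),
    (61.0, "rev", 500, 20),
]
features = aggregate_by_window(packets)
```

Windows that contain no packets are simply absent from the result in this sketch; a production version would likely need to emit zero vectors for them so that the sequence stays evenly spaced.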
Step S2, selecting the flow characteristics corresponding to a section of historical data to form sequence data, inputting the sequence data into an LSTM prediction model for predicting flow characteristics, predicting the flow characteristics corresponding to the to-be-detected data, and outputting a prediction result.
The LSTM prediction model is implemented based on LSTM and predicts at least the following flow characteristics of the industrial control network: the flow duration, the number of forward packets, the number of reverse packets, the total number of bytes of the forward packets, the total number of bytes of the reverse packets, the total number of bytes of the forward packet headers, the total number of bytes of the reverse packet headers, the total number of bytes of the forward sub-flows and the total number of bytes of the reverse sub-flows. The number of forward packets is the number of data packets transmitted in the forward direction (each row in the pcap flow packet is referred to as a packet); the total number of bytes of the forward packets is the sum of all data bytes of the packets transmitted in the forward direction; the total number of bytes of the forward packet headers is the sum of the header bytes of the packets transmitted in the forward direction; and the total number of bytes of the forward sub-flows is the sum of the bytes of all data transmitted in the forward direction over each sub-flow of the current session. The four reverse-direction characteristics are defined in the same way for packets transmitted in the reverse direction. When the data is processed, the pcap flow packet is split into sessions; each session may have multiple connections for sending data, and each such connection is a sub-flow.
Data may or may not be transmitted over sub-flows, so some data may be carried by multiple sub-flows while other data is carried without any sub-flow.
Step S2 performs network traffic prediction and outputs the prediction result. To improve the accuracy of anomaly detection, the invention predicts a plurality of flow characteristics simultaneously and judges anomalies on all of them. Following the principle of selecting features that are both predictable and helpful for anomaly detection, the invention selects, based on analysis and practice, the nine flow characteristics above (namely the flow duration, the number of forward packets, the number of reverse packets, the total number of bytes of the forward packets, the total number of bytes of the reverse packets, the total number of bytes of the forward packet headers, the total number of bytes of the reverse packet headers, the total number of bytes of the forward sub-flows, and the total number of bytes of the reverse sub-flows) for prediction and anomaly detection. Jointly predicting and checking these nine flow characteristics improves the accuracy of industrial control network traffic anomaly detection and reduces its dependence on manual experience.
The core of the LSTM prediction model is an LSTM. Its input is a sequence of feature vectors, obtained by preprocessing real network traffic data, that covers a continuous period of time; its output is the predicted feature vector (namely the prediction result) of the to-be-detected data in the next time period. Each feature vector contains at least the nine flow characteristics above.
It should be noted that, as shown in fig. 2, the trained LSTM prediction model is used in step S2, and when performing prediction, a segment of sequence data is directly input into the trained LSTM prediction model, so as to predict the flow characteristics corresponding to the data to be detected, and the LSTM prediction model for predicting the flow characteristics does not need to be trained for each prediction.
Step S3, calculating a dynamic threshold for anomaly detection based on the historical data of the network traffic, and detecting whether the network traffic anomaly occurs in the time period corresponding to the data to be detected, in combination with the prediction result given by the LSTM prediction model for predicting traffic characteristics in step S2 and the real data of the next time period (i.e., the real traffic characteristics corresponding to the data to be detected).
This step S3 is intended to implement dynamic threshold calculation and network traffic anomaly detection using historical data. In the detection process, the trained LSTM prediction model continuously predicts, meanwhile, the dynamic threshold calculated according to the historical data continuously changes along with the time, and when the prediction result and the real situation of the network flow are greatly different, namely, the difference between the real flow characteristic corresponding to the data to be detected and the flow characteristic contained in the prediction result exceeds the respective corresponding dynamic threshold, the current network flow is considered to be abnormal. Accordingly, an early warning can be given to prompt the user that a safety problem possibly occurs.
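The detection logic of step S3 can be sketched as follows. The threshold formula is not reproduced in the text, so the rule used here (mean plus k standard deviations of the historical absolute prediction errors) is purely an assumption for illustration, as are the names `dynamic_threshold` and `is_anomalous` and the sample values. The decision rule follows the wording of step S3-3, which requires the condition to hold for all flow characteristics.

```python
from statistics import fmean, pstdev

def dynamic_threshold(true_hist, pred_hist, k=3.0):
    """ASSUMED rule, not the patent's formula: mean + k * std of historical absolute errors."""
    errors = [abs(t - p) for t, p in zip(true_hist, pred_hist)]
    return fmean(errors) + k * pstdev(errors)

def is_anomalous(true_vec, pred_vec, thresholds):
    """Flag the period when every characteristic's error exceeds its own threshold."""
    return all(abs(t - p) > th for t, p, th in zip(true_vec, pred_vec, thresholds))

# Two flow characteristics, each with n = 4 historical real/predicted values (made up)
true_hist = [[10.0, 12.0, 11.0, 13.0], [100.0, 105.0, 98.0, 102.0]]
pred_hist = [[11.0, 11.0, 12.0, 12.0], [101.0, 103.0, 99.0, 104.0]]
thresholds = [dynamic_threshold(t, p) for t, p in zip(true_hist, pred_hist)]

normal = is_anomalous([12.0, 101.0], [11.5, 100.0], thresholds)  # small errors
attack = is_anomalous([40.0, 400.0], [12.0, 101.0], thresholds)  # large errors
```

Because the thresholds are recomputed from the most recent n real and predicted values, they move with the traffic over time, which is what makes them dynamic.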
Step S4, storing data containing the flow characteristics and periodically training a new LSTM prediction model, and, if the new LSTM prediction model is superior to the LSTM prediction model currently used for predicting flow characteristics, replacing the latter with the new model.
Step S4 implements online learning of the LSTM prediction model based on the stored flow characteristic data. Online learning starts once the stored data reaches the data volume required for transfer (migration) training. After learning, that is, training, is complete, the new LSTM prediction model and the old LSTM prediction model (namely the LSTM prediction model currently used for predicting flow characteristics) are each tested: if the new model performs better than the old one, the LSTM prediction model is updated; if the new model performs worse, it is not updated.
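A minimal sketch of the storage rule used for online learning follows, assuming a simple in-memory buffer. The class name `FeatureStore` and the `retrain_size` parameter are hypothetical; the rule itself (store real values for normal periods, predicted values for anomalous periods) comes from step S4-1 of the method. Substituting predictions for anomalous periods presumably keeps abnormal traffic out of the retraining data, although the text does not state the reason.

```python
class FeatureStore:
    """In-memory buffer of feature vectors collected for retraining."""

    def __init__(self, retrain_size):
        self.retrain_size = retrain_size  # hypothetical data volume that triggers retraining
        self.buffer = []

    def add(self, true_vec, pred_vec, anomalous):
        # Step S4-1: keep real values for normal periods,
        # predicted values for periods flagged as anomalous.
        self.buffer.append(list(pred_vec if anomalous else true_vec))

    def ready_for_retraining(self):
        return len(self.buffer) >= self.retrain_size

store = FeatureStore(retrain_size=3)
store.add([5.0, 50.0], [5.5, 49.0], anomalous=False)
store.add([90.0, 900.0], [5.2, 51.0], anomalous=True)   # prediction stored instead
ready_before = store.ready_for_retraining()
store.add([6.0, 52.0], [5.8, 50.5], anomalous=False)
ready_after = store.ready_for_retraining()
```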
It should be noted that, in order to save processing time, the step S4, the step S2, and the step S3 may be performed simultaneously, that is, real-time data is collected and online learning is performed while prediction and abnormality detection are performed.
The industrial control network traffic anomaly detection method based on deep learning uses the deep learning model based on LSTM to predict a plurality of network traffic characteristics, and the prediction result is closer to the true value; the threshold value of the abnormal detection is set to be a dynamic threshold value calculated based on historical data, and the number of the selected flow characteristics is large, so that the judgment result of the abnormal detection is more reasonable, and the false alarm rate is lower; and through the online learning mode, the LSTM prediction model for predicting the traffic characteristics can be continuously adjusted and optimized along with the change of the network environment, and the method can be always suitable for the current network environment, and can ensure that the anomaly detection is long-term and effective.
Optionally, in the present invention, the LSTM prediction model for predicting the flow characteristics is constructed as follows:
acquiring real network flow data and preprocessing it to obtain flow characteristics, wherein the preprocessing comprises parsing the flow characteristics to be predicted from real pcap flow packet data, and aggregating the flow characteristics at consecutive moments by time window to obtain the real values for each time period and form the corresponding feature vectors;
forming a sample from a plurality of feature vectors arranged consecutively in chronological order, the data length of a sample being the length of the sequence data input to the network; forming a training set and a test set from a plurality of samples; and training an LSTM-based prediction model on the training set and the test set until the model converges.
The specific steps for training the LSTM prediction model can refer to the prior art, and are not further described herein.
After the training of the LSTM prediction model is completed, the LSTM prediction model can be used for prediction.
In step S2, a section of historical data is selected for prediction; to ensure prediction quality, this section preferably includes the most recent historical data. Therefore, optionally, step S2 (selecting the flow characteristics corresponding to a section of historical data to form sequence data, inputting the sequence data into the LSTM prediction model for predicting flow characteristics, predicting the flow characteristics corresponding to the to-be-detected data, and outputting a prediction result) includes:
determining the length of the sequence data, selecting the flow characteristics corresponding to historical data to form a group of feature vectors that covers consecutive time periods and includes the flow characteristics of the latest time period, inputting the group into the LSTM prediction model for predicting flow characteristics, and predicting the flow characteristics of the next time period with the LSTM prediction model; the latest time period and the next time period are adjacent in time, and the next time period is the time period corresponding to the to-be-detected data;
and outputting the prediction result of the LSTM prediction model on the flow characteristics of the next period, wherein the prediction result comprises the corresponding prediction value of each flow characteristic.
In a preferred embodiment, the time window in the deep learning-based industrial control network traffic anomaly detection method may be set to 1 minute; that is, flow characteristics at consecutive moments are aggregated over 1-minute windows, so that each minute corresponds to one feature vector. Each feature vector has length 9, one element for each of the nine parsed characteristics. The LSTM prediction model takes 10 minutes of feature vector sequence data as input and predicts the feature vector of the 11th minute. The input shape of the LSTM prediction model is (Batch, 10, 9) and the output shape is (Batch, 9), where Batch is the number of samples fed to the LSTM per batch during training.
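The relationship between the per-minute feature vectors and the (Batch, 10, 9) input shape can be sketched as a sliding window. This is an illustrative sketch only; the helper name `make_samples` is hypothetical, and plain Python lists stand in for whatever tensor type the training framework uses.

```python
SEQ_LEN = 10       # minutes of history fed to the model
NUM_FEATURES = 9   # length of each feature vector


def make_samples(vectors, seq_len=SEQ_LEN):
    """Slice a time-ordered list of feature vectors into (input, target) pairs.

    Each input is `seq_len` consecutive vectors; the target is the vector
    of the immediately following time period.
    """
    inputs, targets = [], []
    for start in range(len(vectors) - seq_len):
        inputs.append(vectors[start:start + seq_len])
        targets.append(vectors[start + seq_len])
    return inputs, targets


# Twelve minutes of synthetic data: minute t has all nine values equal to t.
vectors = [[float(t)] * NUM_FEATURES for t in range(12)]
inputs, targets = make_samples(vectors)
```

With twelve minutes of data the sketch yields two samples, each of shape (10, 9), with targets of shape (9,). Stacking `Batch` such samples gives the (Batch, 10, 9) input described above.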
Optionally, when a sample is formed from a plurality of feature vectors arranged consecutively in forward time order, mean-variance normalization is applied to the data values of the feature vectors, converting them into data with a mean of 0 and a variance of 1, so as to reduce the amount of computation and speed up model convergence. When the LSTM prediction model is trained to convergence, the loss function is the mean squared error (MSE), the optimizer is Adam, and BatchSize (the size of a Batch) is set to 64. Further, a specific LSTM prediction model has the network structure shown in Table 1, which comprises an LSTM layer, a Dense (fully connected) layer, and a batch normalization layer, and uses PReLU as the activation function. This design lets the network capture sequence features in the data and increases its capacity, while reducing the risk of overfitting and gradient problems and speeding up convergence. None in Table 1 indicates that the model places no requirement on the amount of data input in one Batch.
TABLE 1 LSTM prediction model network architecture
[Table 1: image in the original publication]
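The mean-variance normalization mentioned above can be sketched as follows. The function names are hypothetical, and the guard for constant features is an added assumption that the source does not discuss.

```python
import statistics


def fit_normalizer(vectors):
    """Compute the per-feature mean and population standard deviation."""
    columns = list(zip(*vectors))
    means = [statistics.fmean(col) for col in columns]
    # ASSUMPTION: a constant feature keeps scale 1.0 to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def normalize(vectors, means, stds):
    """Shift and scale every feature to zero mean and unit variance."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in vectors]


vectors = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
means, stds = fit_normalizer(vectors)
scaled = normalize(vectors, means, stds)
```

The statistics would normally be fitted on the training data and reused unchanged at prediction time, so that the model sees inputs on the same scale in both phases.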
Optionally, in step S3, calculating a dynamic threshold for anomaly detection based on historical data, and detecting whether a network traffic anomaly occurs in the time period by combining the prediction result with the real flow characteristics corresponding to the data to be detected, comprises:
step S3-1, retrieving the true values and predicted values of each flow characteristic corresponding to the historical data;
step S3-2, calculating a dynamic threshold for detection by:
[expression for $\mathrm{thresh}_i$: image in the original publication]
wherein $\mathrm{thresh}_i$ denotes the dynamic threshold corresponding to the i-th flow characteristic; i is the index of the flow characteristic and takes at least the values 1 to 9, and if further flow characteristics are added to the nine above, the range of i increases accordingly; the historical true-value set consists of the true values of the i-th flow characteristic, and the historical predicted-value set consists of the predicted values of the i-th flow characteristic; both sets have length n, that is, each contains n data items;
step S3-3, letting $\mathrm{t\_now}_i$ denote the true value of the i-th flow characteristic in a time period and $\mathrm{p\_now}_i$ denote the predicted value of the i-th flow characteristic in the prediction result for the corresponding time period obtained in step S2, and combining the prediction result with the real data of the next time period: if all i satisfy
$|\mathrm{t\_now}_i - \mathrm{p\_now}_i| > \mathrm{thresh}_i$
then a network traffic anomaly is considered to have occurred in that time period.
Here n means that the feature vector data of n consecutive time periods take part in calculating the dynamic threshold. If n is too small, the dynamic threshold is too sensitive; if n is too large, the dynamic threshold responds too slowly. Both cases harm the result.
The LSTM prediction model captures and predicts a trend from input feature vector sequence data of a certain length, and that length is the basic unit of the trend. The dynamic threshold captures the reasonable volatility of the characteristics from historical data of a certain length. Both lengths serve to capture historical regularities for anomaly detection. Therefore, optionally, to keep the calculated dynamic threshold reasonable, when historical data is selected the value of n is preferably an integer multiple of the length of the input sequence data of the LSTM prediction model (that is, the sequence data formed by a plurality of feature vectors), and the multiple should not be too large. Further, n may range from 3 to 5 times the length of the input sequence data of the LSTM prediction model.
Further, the historical true-value set and the historical predicted-value set preferably consist of continuous historical data, and the time period of the most recent historical data in the two sets is immediately adjacent to the next time period corresponding to the prediction result. That is, the true-value set includes the true values of the flow characteristics for the latest time period in the historical data, and the predicted-value set includes the predicted values of the flow characteristics for the latest time period in the historical data, so as to ensure the timeliness of the dynamic threshold.
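The detection logic of step S3 can be sketched as below. The threshold expression itself appears only as an image in the original publication, so this sketch substitutes a stand-in rule (mean plus k standard deviations of the historical absolute residuals); that rule, the parameter `k`, and the function names are assumptions, not the patented formula. The decision rule follows the text: a period is flagged only when every characteristic exceeds its threshold.

```python
import statistics


def dynamic_thresholds(true_hist, pred_hist, k=3.0):
    """Per-characteristic threshold from the last n periods of history.

    `true_hist` and `pred_hist` are lists of n feature vectors.
    ASSUMPTION: mean + k * std of absolute residuals stands in for the
    source's expression, which is not reproduced in the text.
    """
    thresholds = []
    for t_col, p_col in zip(zip(*true_hist), zip(*pred_hist)):
        residuals = [abs(t - p) for t, p in zip(t_col, p_col)]
        thresholds.append(statistics.fmean(residuals) + k * statistics.pstdev(residuals))
    return thresholds


def is_anomalous(t_now, p_now, thresholds):
    """Flag the period if |t_now_i - p_now_i| > thresh_i holds for all i."""
    return all(abs(t - p) > th for t, p, th in zip(t_now, p_now, thresholds))


# Two characteristics, n = 4 historical periods (synthetic values).
true_hist = [[10.0, 100.0], [12.0, 104.0], [11.0, 98.0], [9.0, 102.0]]
pred_hist = [[11.0, 101.0], [11.0, 102.0], [11.0, 100.0], [10.0, 100.0]]
thresholds = dynamic_thresholds(true_hist, pred_hist)

normal = is_anomalous([10.0, 101.0], [10.5, 100.0], thresholds)
attack = is_anomalous([40.0, 300.0], [10.5, 100.0], thresholds)
```

Because the thresholds are recomputed from the most recent n periods, they move with the traffic. With a 10-step input sequence, the guidance above would put n between 30 and 50.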
Optionally, in step S4, storing the flow characteristic data and periodically training a new LSTM prediction model, and, if the new LSTM prediction model is better than the LSTM prediction model currently used for predicting the flow characteristics, updating the LSTM prediction model currently used for predicting the flow characteristics, comprises:
step S4-1, when storing the flow characteristic data, if the detection in step S3 shows that no network traffic anomaly occurred in the current time period, storing the true value of each flow characteristic in the current time period;
if the detection in step S3 shows that a network traffic anomaly occurred in the current time period, storing the predicted value of each flow characteristic in the current time period, so that data is collected for consecutive moments while abnormal data is kept out;
step S4-2, when the stored data reaches the data volume of the migration training, the stored data is used for on-line learning to train a new LSTM prediction model;
after the new LSTM prediction model is trained, the new LSTM prediction model and the LSTM prediction model used for predicting the flow characteristics at present are checked by using the historical data of the network flow, and the prediction accuracy of the two LSTM prediction models is scored respectively;
if the score of the new LSTM prediction model is higher than the score of the LSTM prediction model currently used to predict the traffic characteristics, the LSTM prediction model that is then used to predict the traffic characteristics is updated to the new LSTM prediction model.
The specific data volume required by the migration training can be set according to the actual time period of the industrial control network operation and the network flow change condition, and is not further limited here.
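The storage rule of step S4-1 and the retraining trigger of step S4-2 can be sketched as follows. The function names and the `required_periods` parameter are hypothetical; the source leaves the required data volume to the deployment.

```python
def value_to_store(true_vector, predicted_vector, anomalous):
    """Choose what enters the retraining buffer for one time period.

    Normal periods contribute their true values; anomalous periods are
    replaced by the model's prediction, so the stored series stays
    continuous without admitting abnormal data.
    """
    return list(predicted_vector) if anomalous else list(true_vector)


def ready_for_retraining(buffer, required_periods):
    """ASSUMPTION: the data volume is expressed as a number of stored periods."""
    return len(buffer) >= required_periods


buffer = []
buffer.append(value_to_store([10.0, 100.0], [10.5, 101.0], anomalous=False))
buffer.append(value_to_store([40.0, 300.0], [10.5, 101.0], anomalous=True))
```

Substituting the prediction for an anomalous period keeps the time axis gap-free, which matters because the LSTM samples are built from consecutive periods.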
Further, in step S4-2, when the new LSTM prediction model and the LSTM prediction model currently used for predicting the flow characteristics are validated with historical network traffic data, the historical data used consists of actually observed true values of the flow characteristics, and is preferably continuous data that is close to the current time period and comes from the same source as the training set.
When the prediction accuracy of the two LSTM prediction models is scored, the flow characteristics corresponding to the same historical data are input into each of the two LSTM prediction models for flow characteristic prediction, and the difference between the predicted value and the true value of each flow characteristic is scored and weighted, where the expression is:
$$\mathrm{R2\_score} = 1 - \frac{\sum_{i=1}^{n} (y_i - p_i)^2}{\sum_{i=1}^{n} (y_i - y_{mean})^2}$$
wherein i denotes the i-th data item of a flow characteristic, $y_i$ denotes the true value, $y_{mean}$ denotes the mean of the true values, $p_i$ denotes the predicted value, n denotes the total number of data items for one flow characteristic, and R2_score denotes the accuracy score corresponding to that flow characteristic;
according to the expression, an R2_score is obtained for each flow characteristic and added to an R2_score set;
calculating a weighted average over the R2_score set corresponding to all flow characteristics, wherein the weight is the variance of the R2_score of each flow characteristic in the R2_score set, to obtain the final model score; a higher model score indicates higher accuracy;
judging by the model score whether the new LSTM prediction model is better than the LSTM prediction model currently used for predicting the flow characteristics: if the model score of the new LSTM prediction model is higher than that of the current model, the new LSTM prediction model is considered better.
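The model comparison in step S4-2 can be sketched as below, assuming R2_score is the standard coefficient of determination, which is what the listed symbols suggest. The weighting is described only briefly in the text, so `model_score` takes the weights as a caller-supplied argument; that interface, the equal weights in the example, and the function names are assumptions.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination for one flow characteristic."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot  # undefined when y_true is constant


def model_score(per_feature_r2, weights):
    """Weighted average of the per-characteristic R2 scores.

    ASSUMPTION: weights are passed in explicitly rather than derived
    inside this function.
    """
    return sum(w * r for w, r in zip(weights, per_feature_r2)) / sum(weights)


def should_replace(current_score, new_score):
    """The new model replaces the current one only if it scores higher."""
    return new_score > current_score


# Two characteristics, four periods each (synthetic values).
y_true = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
old_pred = [[1.5, 2.5, 2.5, 3.5], [12.0, 18.0, 33.0, 37.0]]
new_pred = [[1.0, 2.0, 3.0, 4.0], [10.0, 21.0, 29.0, 40.0]]
weights = [1.0, 1.0]

old_score = model_score([r2_score(t, p) for t, p in zip(y_true, old_pred)], weights)
new_score = model_score([r2_score(t, p) for t, p in zip(y_true, new_pred)], weights)
```

Both models are scored on the same historical true values, so the comparison reflects only the difference in their predictions.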
As shown in FIG. 3, the present invention further provides a deep learning-based industrial control network traffic anomaly detection apparatus, which includes a preprocessing module 100, a flow prediction module 200, an anomaly detection module 300, and an online learning module 400. Specifically:
the preprocessing module 100 is configured to obtain data to be detected and historical data of network traffic and perform preprocessing to obtain corresponding traffic characteristics;
the flow prediction module 200 is configured to select flow characteristics corresponding to a segment of historical data to form sequence data, input an LSTM prediction model for predicting the flow characteristics, predict the flow characteristics corresponding to the data to be detected, and output a prediction result; the LSTM prediction model at least predicts the following flow characteristics, wherein the flow characteristics comprise flow duration, the number of forward packets, the number of reverse packets, the total number of bytes of the forward packets, the total number of bytes of the reverse packets, the total number of bytes of forward packet headers, the total number of bytes of reverse packet headers, the total number of bytes of forward sub-streams and the total number of bytes of reverse sub-streams;
the anomaly detection module 300 is configured to calculate a dynamic threshold for anomaly detection based on historical data, and detect whether a network traffic anomaly occurs in a time period corresponding to data to be detected in combination with a prediction result and a real traffic feature corresponding to the data to be detected;
the online learning module 400 is used to store the traffic characteristic data and periodically train a new LSTM prediction model, and if the new LSTM prediction model is better than the LSTM prediction model currently used to predict the traffic characteristic, the LSTM prediction model currently used to predict the traffic characteristic is updated.
Optionally, the LSTM prediction model for predicting the flow characteristics is constructed as follows:
preprocessing real network traffic data to obtain flow characteristics, wherein the preprocessing comprises parsing the flow characteristics to be predicted from real pcap packet data, and aggregating the flow characteristics at consecutive moments by time window to obtain the true value for each time period and form the corresponding feature vector;
forming a sample from a plurality of feature vectors arranged consecutively in forward time order, forming a training set and a test set from a plurality of samples, and training the LSTM-based prediction model on the training set and the test set until the model converges.
Optionally, the flow prediction module 200 being configured to select the flow characteristics corresponding to a segment of historical data to form sequence data, input the sequence data into the LSTM prediction model for predicting flow characteristics, predict the flow characteristics corresponding to the data to be detected, and output a prediction result comprises:
determining the length of the sequence data, selecting the flow characteristics corresponding to the historical data to form a group of feature vectors that covers consecutive time periods and includes the flow characteristics of the latest time period, inputting the group into the LSTM prediction model for predicting flow characteristics, and predicting the flow characteristics of the next time period with the LSTM prediction model, wherein the latest time period and the next time period are adjacent in time, and the next time period is the time period corresponding to the data to be detected;
outputting the prediction result of the LSTM prediction model for the flow characteristics of the next time period, wherein the prediction result comprises a predicted value for each flow characteristic.
Optionally, the anomaly detection module 300 being configured to calculate a dynamic threshold for anomaly detection based on historical data, and to detect whether a network traffic anomaly occurs in the time period by combining the prediction result with the real flow characteristics corresponding to the data to be detected, comprises:
retrieving the true values and predicted values of each flow characteristic corresponding to the historical data;
calculating the dynamic threshold by:
[expression for $\mathrm{thresh}_i$: image in the original publication]
wherein $\mathrm{thresh}_i$ denotes the dynamic threshold corresponding to the i-th flow characteristic, the historical true-value set consists of the true values of the i-th flow characteristic, the historical predicted-value set consists of the predicted values of the i-th flow characteristic, and both sets have length n;
letting $\mathrm{t\_now}_i$ denote the true value of the i-th flow characteristic in a time period and $\mathrm{p\_now}_i$ denote the predicted value of the i-th flow characteristic in the prediction result for the corresponding time period, and combining the prediction result with the real data of the next time period: if all i satisfy
$|\mathrm{t\_now}_i - \mathrm{p\_now}_i| > \mathrm{thresh}_i$
then a network traffic anomaly is considered to have occurred in that time period.
Optionally, the value of n is an integer multiple of the length of the input sequence data of the LSTM prediction model; the historical true-value set and the historical predicted-value set consist of continuous historical data, the true-value set includes the true value of the flow characteristic for the latest time period, and the predicted-value set includes the predicted value of the flow characteristic for the latest time period, wherein the latest time period is adjacent in time to the time period corresponding to the data to be predicted.
Optionally, the online learning module 400 is configured to store the traffic characteristic data and periodically train a new LSTM prediction model, and if the new LSTM prediction model is better than the LSTM prediction model currently used for predicting the traffic characteristic, update the LSTM prediction model currently used for predicting the traffic characteristic, including:
when storing the flow characteristic data, if detection shows that no network traffic anomaly occurred in the time period, storing the true value of each flow characteristic in that time period; if detection shows that a network traffic anomaly occurred in the time period, storing the predicted value of each flow characteristic in that time period;
when the stored data reach the data volume of the migration training, the stored data are used for on-line learning to train a new LSTM prediction model; after the new LSTM prediction model is trained, the new LSTM prediction model and the LSTM prediction model used for predicting the flow characteristics at present are checked by using the historical data of the network flow, and the prediction accuracy of the two LSTM prediction models is scored respectively; if the score of the new LSTM prediction model is higher than the score of the LSTM prediction model currently used to predict the traffic characteristics, the LSTM prediction model that is then used to predict the traffic characteristics is updated to the new LSTM prediction model.
Further, when the new LSTM prediction model and the LSTM prediction model used for predicting the flow characteristics at present are checked by using the historical data of the network flow, the adopted historical data of the network flow is an actual value of the flow characteristics which is actually obtained;
when the prediction accuracy of the two LSTM prediction models is scored, the flow characteristics corresponding to the same historical data are input into each of the two LSTM prediction models for flow characteristic prediction, and the difference between the predicted value and the true value of each flow characteristic is scored and weighted, where the expression is:
$$\mathrm{R2\_score} = 1 - \frac{\sum_{i=1}^{n} (y_i - p_i)^2}{\sum_{i=1}^{n} (y_i - y_{mean})^2}$$
wherein i denotes the i-th data item of a flow characteristic, $y_i$ denotes the true value, $y_{mean}$ denotes the mean of the true values, $p_i$ denotes the predicted value, n denotes the total number of data items for one flow characteristic, and R2_score denotes the accuracy score corresponding to that flow characteristic;
an R2_score is obtained for each flow characteristic according to the expression and added to an R2_score set;
calculating a weighted average over the R2_score set corresponding to all flow characteristics, wherein the weight is the variance of the R2_score of each flow characteristic in the R2_score set, to obtain the final model score; a higher model score indicates higher accuracy;
and judging whether the new LSTM prediction model is superior to the LSTM prediction model currently used for predicting the flow characteristics or not through the model score.
The information exchange, execution processes, and other details among the modules of the deep learning-based industrial control network traffic anomaly detection apparatus are based on the same concept as the method embodiments of the present invention; for specifics, refer to the description of the method embodiments, which is not repeated here.
In the above embodiments, the hardware module may be implemented mechanically or electrically. For example, a hardware module may comprise permanently dedicated circuitry or logic (such as a dedicated processor, FPGA or ASIC) to perform the corresponding operations. A hardware module may also include programmable logic or circuitry (e.g., a general-purpose processor or other programmable processor) that may be temporarily configured by software to perform the corresponding operations. The specific implementation (mechanical, or dedicated permanent, or temporarily set) may be determined based on cost and time considerations.
Particularly, in some preferred embodiments of the present invention, there is further provided a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the deep learning based industrial control network traffic anomaly detection method in any one of the above embodiments when executing the computer program.
In other preferred embodiments of the present invention, a computer-readable storage medium is further provided, where a computer program is stored, and when being executed by a processor, the computer program implements the steps of the deep learning-based industrial control network traffic anomaly detection method described in any one of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes in the method for implementing the embodiments described above may be implemented by a computer program, and the computer program may be stored in a non-volatile computer-readable storage medium, and when executed, the computer program may include the processes in the above embodiments of the deep learning-based industrial control network traffic anomaly detection method, and will not be described again here.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A deep learning-based industrial control network flow anomaly detection method is characterized by comprising the following steps:
step S1, acquiring to-be-detected data and historical data of network traffic and preprocessing the to-be-detected data and the historical data to obtain corresponding flow characteristics;
step S2, selecting flow characteristics corresponding to a section of historical data to form sequence data, inputting the sequence data into an LSTM prediction model for predicting the flow characteristics, predicting the flow characteristics corresponding to the data to be detected, and outputting a prediction result;
the LSTM prediction model at least predicts the flow characteristics, wherein the flow characteristics comprise flow duration, the number of forward packets, the number of reverse packets, the total number of bytes of the forward packets, the total number of bytes of the reverse packets, the total number of bytes of forward packet headers, the total number of bytes of reverse packet headers, the total number of bytes of forward sub-streams and the total number of bytes of reverse sub-streams;
step S3, calculating a dynamic threshold for anomaly detection based on the historical data, and detecting whether a network traffic anomaly occurs in a time period corresponding to the data to be detected in combination with the prediction result and the real traffic characteristics corresponding to the data to be detected, including:
step S3-1, retrieving the true values and predicted values of each flow characteristic corresponding to the historical data;
step S3-2, calculating a dynamic threshold by:
[expression for $\mathrm{thresh}_i$: image in the original publication]
wherein $\mathrm{thresh}_i$ denotes the dynamic threshold corresponding to the i-th flow characteristic, the historical true-value set consists of the true values of the i-th flow characteristic, the historical predicted-value set consists of the predicted values of the i-th flow characteristic, and both sets have length n;
step S3-3, letting $\mathrm{t\_now}_i$ denote the true value of the i-th flow characteristic in a time period and $\mathrm{p\_now}_i$ denote the predicted value of the i-th flow characteristic in the prediction result for the corresponding time period, if all i satisfy $|\mathrm{t\_now}_i - \mathrm{p\_now}_i| > \mathrm{thresh}_i$, a network traffic anomaly is considered to have occurred in that time period;
and step S4, storing the flow characteristic data and training a new LSTM prediction model periodically, and if the new LSTM prediction model is superior to the LSTM prediction model used for predicting the flow characteristic currently, updating the LSTM prediction model used for predicting the flow characteristic currently.
2. The industrial control network traffic anomaly detection method based on deep learning according to claim 1,
the LSTM prediction model for predicting the flow characteristics is constructed by the following method:
preprocessing real network traffic data to obtain flow characteristics, wherein the preprocessing comprises parsing the flow characteristics to be predicted from real pcap packet data, and aggregating the flow characteristics at consecutive moments by time window to obtain the true value for each time period and form the corresponding feature vector;
forming a sample from a plurality of feature vectors arranged consecutively in forward time order, forming a training set and a test set from a plurality of samples, and training the LSTM-based prediction model on the training set and the test set until the model converges.
3. The industrial control network traffic anomaly detection method based on deep learning according to claim 1,
in step S2, selecting a flow characteristic corresponding to a segment of historical data to form sequence data, inputting an LSTM prediction model for predicting the flow characteristic, predicting the flow characteristic corresponding to the data to be detected, and outputting a prediction result, including:
determining the length of the sequence data, selecting the flow characteristics corresponding to the historical data to form a group of feature vectors that covers consecutive time periods and includes the flow characteristics of the latest time period, inputting the group into the LSTM prediction model for predicting flow characteristics, and predicting the flow characteristics of the next time period with the LSTM prediction model, wherein the latest time period and the next time period are adjacent in time, and the next time period is the time period corresponding to the data to be detected;
outputting the prediction result of the LSTM prediction model for the flow characteristics of the next time period, wherein the prediction result comprises a predicted value for each flow characteristic.
4. The industrial control network traffic anomaly detection method based on deep learning according to claim 1, characterized in that:
in step S3-2, the value of n is an integer multiple of the length of the input sequence data of the LSTM prediction model;
the historical true-value set and the historical predicted-value set consist of continuous historical data, the true-value set includes the true value of the flow characteristic for the latest time period, and the predicted-value set includes the predicted value of the flow characteristic for the latest time period, wherein the latest time period is adjacent in time to the time period corresponding to the data to be predicted.
5. The industrial control network traffic anomaly detection method based on deep learning according to claim 1,
in step S4, storing the flow characteristic data and periodically training a new LSTM prediction model, and if the new LSTM prediction model is better than the LSTM prediction model currently used for predicting the flow characteristic, updating the LSTM prediction model currently used for predicting the flow characteristic, including:
step S4-1, when storing the flow characteristic data, if detection shows that no network traffic anomaly occurred in the time period, storing the true value of each flow characteristic in that time period;
if detection shows that a network traffic anomaly occurred in the time period, storing the predicted value of each flow characteristic in that time period;
step S4-2, when the stored data reaches the data volume of the migration training, the stored data is used for on-line learning to train a new LSTM prediction model;
after the new LSTM prediction model is trained, the new LSTM prediction model and the LSTM prediction model used for predicting the flow characteristics at present are checked by using the historical data of the network flow, and the prediction accuracy of the two LSTM prediction models is scored respectively;
if the score of the new LSTM prediction model is higher than that of the LSTM prediction model used for predicting the flow characteristics at present, updating the LSTM prediction model used for predicting the flow characteristics into the new LSTM prediction model;
in step S4-2, when the new LSTM prediction model and the LSTM prediction model currently used for predicting the traffic characteristics are checked by using the historical data of the network traffic, the used historical data of the network traffic is an actually obtained true value;
when the prediction accuracy of the two LSTM prediction models is scored, the flow characteristics corresponding to the same historical data are input into each of the two LSTM prediction models for flow characteristic prediction, and the difference between the predicted value and the true value of each flow characteristic is scored and weighted, where the expression is:
$$\mathrm{R2\_score} = 1 - \frac{\sum_{i=1}^{n} (y_i - p_i)^2}{\sum_{i=1}^{n} (y_i - y_{mean})^2}$$
wherein i denotes the i-th data item of a flow characteristic, $y_i$ denotes the true value, $y_{mean}$ denotes the mean of the true values, $p_i$ denotes the predicted value, n denotes the total number of data items for one flow characteristic, and R2_score denotes the accuracy score corresponding to that flow characteristic;
an R2_score is obtained for each flow characteristic according to the expression and added to an R2_score set;
calculating a weighted average over the R2_score set corresponding to all flow characteristics, wherein the weight is the variance of the R2_score of each flow characteristic in the R2_score set, to obtain the final model score; a higher model score indicates higher accuracy;
and judging whether the new LSTM prediction model is superior to the LSTM prediction model currently used for predicting the flow characteristics or not through the model score.
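The scoring and comparison steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the expression itself survives only as a figure reference in this text, so the sketch assumes the standard coefficient of determination, which is consistent with the symbols y_i, y_mean, p_i and n defined above. The function names (`r2_score`, `model_score`, `should_replace`) are hypothetical, and because the wording on the weights is ambiguous, the weights are passed in by the caller rather than derived inside the function.

```python
from statistics import fmean


def r2_score(y_true, y_pred):
    """R2 score for one flow characteristic (ASSUMED standard definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    y_mean = fmean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("true values are constant, so R2 is undefined")
    return 1.0 - ss_res / ss_tot


def model_score(truth, predictions, weights):
    """Weighted average of the per-characteristic R2 scores.

    truth and predictions hold one sequence per flow characteristic.
    weights holds one weight per flow characteristic (caller-supplied,
    since the claim wording on how they are derived is ambiguous).
    """
    if not (len(truth) == len(predictions) == len(weights)):
        raise ValueError("one truth, prediction and weight entry per characteristic")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    scores = [r2_score(t, p) for t, p in zip(truth, predictions)]
    return sum(w * s for w, s in zip(weights, scores)) / total


def should_replace(truth, current_pred, new_pred, weights):
    """True if the new model scores strictly higher on the same history."""
    return model_score(truth, new_pred, weights) > model_score(
        truth, current_pred, weights
    )
```

Both models are scored against the same true historical values, so the comparison reflects only the difference in their predictions.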
6. A deep learning-based industrial control network traffic anomaly detection device, characterized by comprising:
the preprocessing module is used for acquiring to-be-detected data and historical data of network traffic and preprocessing the to-be-detected data and the historical data to obtain corresponding traffic characteristics;
the flow prediction module is used for selecting the flow characteristics corresponding to a section of historical data to form sequence data, inputting the sequence data into an LSTM prediction model for predicting the flow characteristics, predicting the flow characteristics corresponding to the data to be detected, and outputting a prediction result;
the LSTM prediction model at least predicts the flow characteristics, wherein the flow characteristics comprise flow duration, the number of forward packets, the number of reverse packets, the total number of bytes of the forward packets, the total number of bytes of the reverse packets, the total number of bytes of forward packet headers, the total number of bytes of reverse packet headers, the total number of bytes of forward sub-streams and the total number of bytes of reverse sub-streams;
the anomaly detection module is used for calculating a dynamic threshold value for anomaly detection based on historical data, and detecting whether network traffic anomaly occurs in a time period corresponding to the data to be detected or not by combining the prediction result and the real traffic characteristics corresponding to the data to be detected, and comprises the following steps:
calling real values and predicted values of all flow characteristics corresponding to the historical data;
the dynamic threshold is calculated by:
Figure FDA0003333522650000051
wherein thresh_i denotes the dynamic threshold corresponding to the i-th flow characteristic,
Figure FDA0003333522650000052
representing a historical data set consisting of true values of the ith flow characteristic,
Figure FDA0003333522650000053
representing a historical data set consisting of predicted values of the ith flow characteristic,
Figure FDA0003333522650000054
and
Figure FDA0003333522650000055
the lengths are all n;
let t_now_i denote the actual value of the i-th flow characteristic over a period of time, and p_now_i denote the predicted value of the i-th flow characteristic in the prediction result for the corresponding period; the prediction result is combined with the real data of the next period, and if |t_now_i - p_now_i| > thresh_i holds for all i, a network traffic anomaly is considered to have occurred in that period;
and the online learning module is used for storing the flow characteristic data and training a new LSTM prediction model periodically, and if the new LSTM prediction model is superior to the LSTM prediction model used for predicting the flow characteristic currently, the LSTM prediction model used for predicting the flow characteristic currently is updated.
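The decision rule of the anomaly detection module can be sketched as follows. This is a hedged illustration only: the dynamic threshold expression survives in this text only as a figure reference and cannot be recovered, so `standin_threshold` below is a hypothetical stand-in (mean plus k standard deviations of the historical absolute prediction error) and is NOT the formula of the claim. Only `is_anomalous` follows the claim wording, flagging a period when every flow characteristic deviates from its prediction by more than its threshold. All function names and the parameter `k` are assumptions.

```python
from statistics import fmean, pstdev


def standin_threshold(true_hist, pred_hist, k=3.0):
    """HYPOTHETICAL stand-in threshold, not the patent's expression.

    Uses the mean plus k population standard deviations of the absolute
    error between historical true and predicted values of one characteristic.
    """
    if len(true_hist) != len(pred_hist) or not true_hist:
        raise ValueError("histories must be non-empty and equal in length")
    errors = [abs(t - p) for t, p in zip(true_hist, pred_hist)]
    return fmean(errors) + k * pstdev(errors)


def is_anomalous(t_now, p_now, thresh):
    """Apply the claimed rule: |t_now_i - p_now_i| > thresh_i for all i.

    t_now, p_now and thresh each hold one value per flow characteristic.
    """
    if not (len(t_now) == len(p_now) == len(thresh)) or not t_now:
        raise ValueError("inputs must be non-empty and equal in length")
    return all(abs(t - p) > th for t, p, th in zip(t_now, p_now, thresh))
```

Because the rule requires all characteristics to exceed their thresholds, a large deviation in a single characteristic alone does not raise an alarm under this reading of the claim.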
7. A computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the deep learning-based industrial control network traffic anomaly detection method according to any one of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, implements the steps of the deep learning-based industrial control network traffic anomaly detection method according to any one of claims 1 to 5.
CN202110605897.0A 2021-06-01 2021-06-01 Industrial control network flow abnormity detection method and device based on deep learning Active CN113162811B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110605897.0A CN113162811B (en) 2021-06-01 2021-06-01 Industrial control network flow abnormity detection method and device based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110605897.0A CN113162811B (en) 2021-06-01 2021-06-01 Industrial control network flow abnormity detection method and device based on deep learning

Publications (2)

Publication Number Publication Date
CN113162811A CN113162811A (en) 2021-07-23
CN113162811B true CN113162811B (en) 2021-12-28

Family

ID=76875564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110605897.0A Active CN113162811B (en) 2021-06-01 2021-06-01 Industrial control network flow abnormity detection method and device based on deep learning

Country Status (1)

Country Link
CN (1) CN113162811B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113568774B (en) * 2021-07-27 2024-01-16 东华大学 Multi-dimensional time sequence data real-time abnormality detection method using unsupervised deep neural network
CN113347384B (en) * 2021-08-06 2021-11-05 北京电信易通信息技术股份有限公司 Video conference flow prediction method and system based on time sequence representation learning
CN113505857B (en) * 2021-08-06 2023-06-27 红云红河烟草(集团)有限责任公司 Data anomaly detection method for real-time data acquisition of cigarettes
CN113949653B (en) * 2021-10-18 2023-07-07 中铁二院工程集团有限责任公司 Encryption protocol identification method and system based on deep learning
CN115589310A (en) * 2022-09-23 2023-01-10 中国电信股份有限公司 Attack detection method, device and related equipment
CN116708030A (en) * 2023-08-04 2023-09-05 浙江大学 Industrial edge computing gateway and protocol flow monitoring method and device thereof

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101621019B1 (en) * 2015-01-28 2016-05-13 한국인터넷진흥원 Method for detecting attack suspected anomal event
CN108900546A (en) * 2018-08-13 2018-11-27 杭州安恒信息技术股份有限公司 The method and apparatus of time series Network anomaly detection based on LSTM
CN109886833B (en) * 2019-01-21 2023-01-17 广东电网有限责任公司信息中心 Deep learning method for smart grid server flow anomaly detection
CN109934337B (en) * 2019-03-14 2020-12-25 哈尔滨工业大学 Spacecraft telemetry data anomaly detection method based on integrated LSTM
CN110381524B (en) * 2019-07-15 2022-12-20 安徽理工大学 Bi-LSTM-based large scene mobile flow online prediction method, system and storage medium
CN112529234A (en) * 2019-09-18 2021-03-19 上海交通大学 Surface water quality prediction method based on deep learning
US11496495B2 (en) * 2019-10-25 2022-11-08 Cognizant Technology Solutions India Pvt. Ltd. System and a method for detecting anomalous patterns in a network
CN111798051B (en) * 2020-07-02 2023-11-10 杭州电子科技大学 Air quality space-time prediction method based on long-term and short-term memory neural network
CN112132324A (en) * 2020-08-26 2020-12-25 浙江工业大学 Ultrasonic water meter data restoration method based on deep learning model
CN112202736B (en) * 2020-09-15 2021-07-06 浙江大学 Communication network anomaly classification method based on statistical learning and deep learning
CN112906982A (en) * 2021-03-22 2021-06-04 哈尔滨理工大学 GNN-LSTM combination-based network flow prediction method
CN113343587A (en) * 2021-07-01 2021-09-03 国网湖南省电力有限公司 Flow abnormity detection method for electric power industrial control network

Also Published As

Publication number Publication date
CN113162811A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
CN113162811B (en) Industrial control network flow abnormity detection method and device based on deep learning
CN109492193B (en) Abnormal network data generation and prediction method based on deep machine learning model
CN112858919B (en) Battery system online fault diagnosis method and system based on cluster analysis
CN111444953B (en) Sensor fault monitoring method based on improved particle swarm optimization algorithm
WO2020077672A1 (en) Method and device for training service quality evaluation model
CN109407654B (en) Industrial data nonlinear causal analysis method based on sparse deep neural network
CN107423433A (en) A kind of data sampling rate control method and device
CN109033513B (en) Power transformer fault diagnosis method and power transformer fault diagnosis device
CN107247653A (en) A kind of Fault Classification and device of data center's monitoring system
CN110362772B (en) Real-time webpage quality evaluation method and system based on deep neural network
CN107679715A (en) A kind of electric energy meter comprehensive error process merit rating method and evaluation system based on SPC
CN112836720B (en) Building operation and maintenance equipment abnormality diagnosis method, system and computer readable storage medium
CN112505570A (en) Method for estimating battery health state of electric automobile
CN110986407A (en) Fault diagnosis method for centrifugal water chilling unit
CN114036647A (en) Power battery safety risk assessment method based on real vehicle data
Zhong et al. Anomaly detection and sampling cost control via hierarchical GANs
CN112491627A (en) Network quality real-time analysis method and device
CN117156442A (en) Cloud data security protection method and system based on 5G network
Thi et al. One-class collective anomaly detection based on long short-term memory recurrent neural networks
Altmemi et al. A New Approach Based on Intelligent Method to Classify Quality of Service
CN115935814A (en) Transformer fault prediction method based on ARIMA-SVM model
CN114840581A (en) Method and device for generating dynamic threshold for equipment early warning based on statistical model
CN115829089A (en) Load composition analysis method, device and equipment
CN111866017B (en) Method and device for detecting abnormal frame interval of CAN bus
CN114416410A (en) Anomaly analysis method and device and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 01, floor 1, building 104, No. 3 minzhuang Road, Haidian District, Beijing 100195

Patentee after: Changyang Technology (Beijing) Co.,Ltd.

Address before: 100195 room 01, 2 / F, building 103, 3 minzhuang Road, Haidian District, Beijing

Patentee before: CHANGYANG TECH (BEIJING) Co.,Ltd.
