CN114124447A - Intrusion detection method and device based on Modbus data packet recombination - Google Patents


Publication number
CN114124447A
Authority
CN
China
Prior art keywords
data
intrusion detection
modbus
convolution
data packet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111187044.6A
Other languages
Chinese (zh)
Other versions
CN114124447B (en
Inventor
尹微皓
郑秋华
胡海忠
夏帅凡
叶楷
张旭
翟亮
吴铤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202111187044.6A
Publication of CN114124447A
Application granted
Publication of CN114124447B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40 Bus networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40 Bus networks
    • H04L2012/40208 Bus networks characterized by the use of a particular bus standard
    • H04L2012/40228 Modbus

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Hardware Design (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an intrusion detection method and device based on Modbus data packet recombination. Modbus traffic data of an industrial control system, extracted from a database or a log file, is spliced head to tail by packet type and in time order to obtain recombined data; the recombined data then undergoes feature screening and conversion into images; finally, intrusion detection is performed with an intrusion detection model based on a convolutional deep neural network. The invention takes into account the business logic relationships specific to industrial control systems and extracts the context information in industrial network data through packet recombination followed by feature screening. Compared with traditional traffic detection methods for industrial control systems, it makes fuller use of the correlation among Modbus traffic data packets.

Description

Intrusion detection method and device based on Modbus data packet recombination
Technical Field
The invention belongs to the technical field of cyberspace security and intrusion detection, and particularly relates to an intrusion detection method and device based on Modbus data packet recombination.
Background
With the arrival of the Industry 4.0 era, the industrial internet has developed rapidly and industrial equipment is increasingly connected to the internet. This breaks the physical isolation between industrial control systems and the outside world, so the security threats these systems face keep growing. Industrial control systems generally rely on three security mechanisms: firewalls, intrusion detection, and encryption. Unlike the other mechanisms, intrusion detection is an active defense technique applied to the traffic, protocols and hosts of a system; it monitors the system in real time, senses abnormal behavior, and provides data analysis and alarm functions. It can meet the overall protection and security policy requirements of an industrial control system, protecting the system internally while also resisting external attacks, and is therefore widely used in industrial control security. Compared with the ordinary internet environment, industrial environments are characterized by dedicated industrial communication protocols, close contextual relationships between messages, and fixed communication patterns. Existing intrusion detection techniques for industrial control systems mainly inspect individual data packets. They cannot make full use of the context information among packets and lack an understanding of the business logic in the industrial control system, which leads to an excessively high missed-detection rate. They also lack semantic understanding and use of control instructions, and so cannot effectively detect attacks that target the industrial process.
In an industrial system there is a clear causal relationship between the instructions sent by an operator and the state of the system, and this relationship shows up between instruction packets and state query packets. A method based on data packet recombination is therefore proposed: context information reflecting this causal relationship is extracted from multiple data packets, and a deep learning model is then designed on that basis to detect intrusions in the Modbus network.
Disclosure of Invention
The invention aims to provide an intrusion detection method for industrial control systems, based on Modbus data packet recombination and tailored to the characteristics of the industrial internet environment.
The technical scheme adopted by the invention to solve the technical problem is as follows:
An intrusion detection method based on Modbus data packet recombination comprises the following steps:
Step 1, obtaining Modbus data packets;
acquiring the four types of Modbus data packets, namely command read, response read, command write and response write, from the Modbus network traffic of an industrial control system;
Step 2, Modbus data packet recombination and feature extraction;
splicing the four types of Modbus data packets head to tail, in the order command read, response read, command write, response write and in time order, to obtain recombined data; dividing the recombined data into 8-bit units according to the Modbus frame structure to obtain a plurality of features;
Step 3, feature screening;
because the recombined data contains much redundant information, feature screening is needed to remove unnecessary information and further improve detection accuracy;
calculating the information gain of every feature obtained in step 2, sorting the information gains in descending order, and retaining the features corresponding to the N largest information gains; the information gain measures the degree of correlation between a feature and the classification labels and thus indicates the importance of the feature; N is a user-defined parameter;
3-1, calculating the information entropy of each feature, where information entropy is a measure of the degree of disorder, as shown in formula (1):
H(X_i) = -\sum_{j} P(x_{i,j}) \log P(x_{i,j})    (1)
where x_{i,j} is the j-th value of feature X_i, X_i denotes all values taken by the i-th feature, P(x_{i,j}) is the probability that the feature takes the value x_{i,j}, and H(X_i) is the information entropy of the i-th feature;
3-2, calculating the conditional entropy of each feature, where conditional entropy is the information entropy under a given condition, as shown in formula (2):
H(Y|X_i) = -\sum_{j} P(x_{i,j}) \sum_{k} P(y_k|x_{i,j}) \log P(y_k|x_{i,j})    (2)
where H(Y|X_i) is the uncertainty of Y once the i-th feature is selected, y_k is the k-th classification category, and Y is the set of classification categories;
3-3, calculating the information gain of each feature from its conditional entropy and the information entropy of the classification categories by formula (3):
IG(X_i) = H(Y) - H(Y|X_i)    (3)
H(Y) = -\sum_{k} P(y_k) \log P(y_k)
where IG(X_i) is the information gain of the i-th feature and H(Y) is the information entropy of the classification categories;
3-4, sorting the information gains of all features in descending order and retaining the features corresponding to the N largest information gains;
Step 4, data imaging: converting the feature data screened in step 3 into a binary image using conventional techniques;
after the four types of data packets have been combined into one record and the redundant features removed, the original hexadecimal data stream is converted into a binary stream, and the binary stream is then converted into a square RGB image, where each 0 in the binary stream is replaced by RGB(255, 255, 255) and each 1 by RGB(0, 0, 0);
Step 5, constructing a data set from the binary images obtained in step 4 and labeling it manually; dividing the data set into a training set and a test set; each type of Modbus data packet has two possible labels, attack behavior and normal behavior;
preferably, the labels of the data set are 4-bit binary numbers whose digits, from the highest bit to the lowest, give the labels of the four types of Modbus data packets: command read, response read, command write and response write.
Step 6, establishing an intrusion detection model based on a convolutional deep neural network and training it with the training set;
the intrusion detection model based on the convolutional deep neural network comprises, cascaded in sequence, an input layer, a first convolutional layer, a first max pooling layer, a second convolutional layer, a second max pooling layer, a third convolutional layer, a first linear fully connected layer, a second linear fully connected layer and a third linear fully connected layer;
the input of the input layer is a binary image from the training set, and the input size is 3 × image length × image width;
the number of convolution kernels, the kernel size, the stride and the padding of the first, second and third convolutional layers are adjusted according to the size of the input data; the relationship between input size and output size is given by formula (4):
O = \frac{I - K + 2P}{S} + 1    (4)
where O is the size of the output image, I is the size of the input image, K is the size of the convolution kernel, P is the padding size, and S is the stride of the convolution kernel.
The convolution results of the first, second and third convolutional layers all use the ReLU function as the activation function; the ReLU function is shown in formula (5):
ReLU(r) = max(0, r)    (5)
where r is a convolution result value of the first, second or third convolutional layer, and max(0, r) returns the larger of 0 and r.
The pooling window sizes of the first and second max pooling layers are set according to the size of the input data;
the output of the third convolutional layer is flattened into a one-dimensional tensor, which serves as the input of the first linear fully connected layer;
the first, second and third linear fully connected layers perform linear operations and finally output a sixteen-dimensional vector.
The vector output by the third linear fully connected layer is passed through a Softmax function to obtain the final classification result; the Softmax function is shown in formula (6):
Softmax(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}}    (6)
where z_i is the output value of the i-th node and C is the number of output nodes, i.e. the number of classification categories.
The model is optimized with the Adam optimizer, which combines the AdaGrad and RMSProp optimization algorithms and updates the step size using both the first moment estimate and the second moment estimate of the gradient. The loss function is the categorical cross-entropy function, given by formula (7):
Loss = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} \hat{l}_{ij} \log l_{ij}    (7)
where n is the number of samples, m is the number of classification categories, l is the actual output value of the model, i.e. the predicted value, and \hat{l} is the true value.
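As an illustration of the first and second moment estimates mentioned for Adam, the following is a minimal sketch of a single Adam update for one scalar parameter. The function name `adam_step` and the hyperparameter defaults (learning rate 0.001, beta1 0.9, beta2 0.999, eps 1e-8) are assumptions taken from common practice; the patent text does not state which values were used.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t is the 1-based step count.

    Hyperparameter defaults are assumed, not taken from the patent.
    """
    m = beta1 * m + (1 - beta1) * grad          # first moment estimate (running mean of gradients)
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment estimate (running mean of squared gradients)
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# One step on f(x) = x**2 starting from x = 1.0, where the gradient is 2 * x
x, m, v = adam_step(param=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected ratio is close to 1 in magnitude, so the parameter moves by roughly the learning rate regardless of the gradient's scale.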
The trained model is saved to a file so that the detection module can load and call it.
Step 7, testing the trained intrusion detection model based on the convolutional deep neural network with the test set;
Step 8, acquiring the Modbus network traffic packets to be detected in the industrial control system from network traffic, a database or a log file, repeating steps 2 to 4, and performing intrusion detection on them with the tested intrusion detection model based on the convolutional deep neural network.
Another object of the invention is to provide an intrusion detection system based on Modbus data packet recombination, comprising a data packet recombination module, an intrusion detection model training module and a detection module;
the data packet recombination module recombines the four types of Modbus data packets according to the business relationships in the industrial control system and performs feature screening and data imaging on the recombined data;
the intrusion detection model training module manually labels the data processed by the data packet recombination module, divides it into a training set and a test set, trains the intrusion detection model based on a convolutional neural network with the training set, and optimizes the model with the test set;
the detection module passes unlabeled Modbus data packets through the data packet recombination module, inputs the result into the trained intrusion detection model based on a convolutional neural network, and identifies from the detection result whether the packets are intrusion data.
Compared with existing intrusion detection methods, the invention has the following notable advantages:
(1) The invention takes into account the business logic relationships specific to industrial control systems and extracts the context information in industrial network data through packet recombination followed by feature screening; compared with traditional traffic detection methods for industrial control systems, it makes fuller use of the correlation among Modbus traffic data packets.
(2) Compared with traditional machine learning intrusion detection models, the deep learning network model adopted here (namely, the intrusion detection model based on a convolutional neural network) saves the labor cost of expert feature engineering and can automatically extract deeper features from the data.
Drawings
FIG. 1 is a flow chart of an intrusion detection method based on Modbus packet reassembly;
FIG. 2 is a Modbus communication process diagram;
FIG. 3 is an exemplary diagram of a Modbus packet;
FIG. 4 is a diagram of a convolutional neural network architecture.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific examples.
The intrusion detection system of the invention mainly comprises a data packet recombination module, an intrusion detection model training module and a detection module, as shown in FIG. 1.
(1) The data packet recombination module splices the four related Modbus data packets together head to tail in time order, according to the business relationships in the industrial control system, and performs feature screening and imaging on the recombined data.
(2) The intrusion detection model training module divides the labeled data produced by the data packet recombination module into a training set and a test set, trains the intrusion detection model based on a convolutional neural network with the training set, feeds the test set to the trained model, and uses the model's output as the basis for optimizing the model. Finally, the best trained model is saved to a file for later use.
(3) The detection module passes the unlabeled Modbus data packets to be detected through the data packet recombination module, inputs the result into the trained intrusion detection model to obtain detection results, and screens out the packets suspected of attack behavior according to those results.
The basic flow chart of the invention is shown in fig. 1, and comprises the following steps:
step 1, obtaining a Modbus data packet;
The Modbus protocol is one of the most widely used industrial network protocols. It adopts a master-slave architecture: an industrial control system has only one master node, and all other nodes are slaves. Only the master node can initiate a request, while the slave nodes can only respond passively. In the Modbus protocol, a single packet carries only one function. According to the function type, Modbus protocol data packets can be divided into four types: command read, command write, response read and response write. The communication process of these four types of packets is shown in FIG. 2.
A command read packet is the request sent by the client to the server to read the system state. A response read packet contains the state information that the server sends back to the client. A command write packet contains the instruction issued by the client to the server. A response write packet is the reply sent by the server to the client confirming that the command was received. Packets of the same type carry the same kind of function information and have the same length. For example, response read packets all contain the state information of the equipment, and command write packets all contain the instructions issued by the operator to the equipment. An example of the four types of packets is shown in FIG. 3.
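The four packet types can be told apart from the Modbus function code together with the direction of the message. The sketch below is illustrative only: the helper name `packet_type` and the `from_master` flag are hypothetical, and the patent does not say how packets are classified in practice. The function codes listed are the standard public Modbus codes for reading and writing coils and registers.

```python
# Standard public Modbus function codes
READ_CODES = {0x01, 0x02, 0x03, 0x04}    # read coils, discrete inputs, holding registers, input registers
WRITE_CODES = {0x05, 0x06, 0x0F, 0x10}   # write single/multiple coils and registers

def packet_type(function_code: int, from_master: bool) -> str:
    """Map a function code and message direction to one of the four packet types.

    `from_master` is an assumed flag: True for requests, False for responses.
    """
    if function_code in READ_CODES:
        return "command_read" if from_master else "response_read"
    if function_code in WRITE_CODES:
        return "command_write" if from_master else "response_write"
    raise ValueError(f"unsupported function code: {function_code:#04x}")
```

For instance, `packet_type(0x03, True)` yields `"command_read"` and `packet_type(0x10, False)` yields `"response_write"`.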
Step 2, Modbus data packet recombination and feature extraction
According to the business relationships in the industrial control system, the four kinds of data packets (command read, response read, command write and response write) in the Modbus network traffic collected from the industrial environment are recombined and spliced into a new piece of recombined data. The specific recombination strategy is as follows:
(1) According to the information they carry, all data packets are divided into four types: command read, command write, response read and response write. Each of the four types has two cases, namely packets carrying attack data and packets showing normal behavior.
(2) The four data packets are spliced end to end, in the order command read, response read, command write, response write and in time order, and recombined into a new data message.
The recombined data is divided into 8-bit units according to the structure of the Modbus frame to obtain a plurality of features;
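The recombination strategy above can be sketched as follows. This is a minimal illustration, assuming the four packets of one group have already been matched by time; the names `recombine` and `split_features` and the example payloads are hypothetical and not taken from the patent.

```python
from typing import Dict, List

# Fixed splicing order described in the text
ORDER = ("command_read", "response_read", "command_write", "response_write")

def recombine(packets: Dict[str, bytes]) -> bytes:
    """Concatenate one packet of each type head to tail in the fixed order."""
    return b"".join(packets[name] for name in ORDER)

def split_features(record: bytes) -> List[int]:
    """Each 8-bit unit (one byte) of the recombined record becomes one feature."""
    return list(record)

# Hypothetical payloads, for illustration only
group = {
    "command_read": bytes.fromhex("010300000002"),
    "response_read": bytes.fromhex("01030400010002"),
    "command_write": bytes.fromhex("010600010003"),
    "response_write": bytes.fromhex("010600010003"),
}
record = recombine(group)
features = split_features(record)
```

With these example payloads the recombined record is 25 bytes long, so it yields 25 features, each an integer between 0 and 255.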
step 3, characteristic screening
The recombined data contains much redundant information, so feature screening is needed to remove unnecessary information and further improve detection accuracy.
The information gain of every feature obtained in step 2 is calculated, the information gains are sorted in descending order, and the features corresponding to the N largest information gains are retained. The information gain measures the degree of correlation between a feature and the classification labels and thus indicates the importance of the feature.
3-1, calculating the information entropy of each feature, where information entropy is a measure of the degree of disorder, as shown in formula (1):
H(X_i) = -\sum_{j} P(x_{i,j}) \log P(x_{i,j})    (1)
where x_{i,j} is the j-th value of feature X_i, X_i denotes all values taken by the i-th feature, P(x_{i,j}) is the probability that the feature takes the value x_{i,j}, and H(X_i) is the information entropy of the i-th feature.
3-2, calculating the conditional entropy of each feature, where conditional entropy is the information entropy under a given condition, as shown in formula (2):
H(Y|X_i) = -\sum_{j} P(x_{i,j}) \sum_{k} P(y_k|x_{i,j}) \log P(y_k|x_{i,j})    (2)
where H(Y|X_i) is the uncertainty of Y once the i-th feature is selected, y_k is the k-th classification category, and Y is the set of classification categories.
3-3, the information gain of each feature is calculated by formula (3):
IG(X_i) = H(Y) - H(Y|X_i)    (3)
H(Y) = -\sum_{k} P(y_k) \log P(y_k)
where IG(X_i) is the information gain of the i-th feature and H(Y) is the information entropy of the classification categories.
3-4, sorting the information gains of all features in descending order and retaining the features corresponding to the N = 40 largest information gains.
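Steps 3-1 to 3-4 can be sketched in a few lines. This is an illustrative implementation under stated assumptions: probabilities are estimated by counting over the data set, the logarithm is taken to base 2 (the text does not specify the base), and the function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

def entropy(values: Sequence) -> float:
    """Information entropy of a sequence, with probabilities estimated by counting."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(feature: Sequence, labels: Sequence) -> float:
    """H(Y | X): label entropy within each feature value, weighted by P(value)."""
    n = len(feature)
    total = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        total += (count / n) * entropy(subset)
    return total

def information_gain(feature: Sequence, labels: Sequence) -> float:
    return entropy(labels) - conditional_entropy(feature, labels)

def top_n_features(rows: List[List[int]], labels: List, n: int) -> List[int]:
    """Return the column indices of the n features with the largest information gain."""
    columns = list(zip(*rows))
    gains = [information_gain(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: gains[i], reverse=True)
    return sorted(ranked[:n])

# Toy data: column 0 determines the label, columns 1 and 2 carry no information
rows = [[1, 0, 7], [1, 1, 7], [2, 0, 7], [2, 1, 7]]
labels = ["normal", "normal", "attack", "attack"]
selected = top_n_features(rows, labels, n=1)
```

In the toy data only the first column is retained, because it alone separates the two labels; the embodiment would call the same routine with n = 40 on the byte features of the recombined records.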
Step 4, converting the data that passed feature screening in step 3 into a binary image;
after the four Modbus packets have been combined into one record and the redundant features removed, the original hexadecimal data stream is converted into a binary stream, and the binary stream is then converted into a square RGB image, where each 0 in the binary stream is replaced by RGB(255, 255, 255) and each 1 by RGB(0, 0, 0).
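A minimal sketch of this conversion is given below. The patent does not say how a bit count that is not a perfect square is handled, so padding the tail with 0 bits (white pixels) is an assumption, as are the function names; the image is represented as a nested list of RGB tuples to keep the example dependency-free.

```python
import math
from typing import List, Tuple

Pixel = Tuple[int, int, int]
WHITE: Pixel = (255, 255, 255)  # used for bit 0
BLACK: Pixel = (0, 0, 0)        # used for bit 1

def bytes_to_bits(features: List[int]) -> List[int]:
    """Expand each byte into its 8 bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in features for shift in range(7, -1, -1)]

def bits_to_square_image(bits: List[int]) -> List[List[Pixel]]:
    """Lay the bits out row by row in the smallest square that holds them."""
    side = math.isqrt(len(bits) - 1) + 1 if bits else 0
    padded = bits + [0] * (side * side - len(bits))  # assumption: pad with 0 bits
    return [
        [BLACK if padded[r * side + c] else WHITE for c in range(side)]
        for r in range(side)
    ]

bits = bytes_to_bits([0xA0, 0x01])   # 10100000 00000001
image = bits_to_square_image(bits)   # 16 bits fit a 4 x 4 image exactly
```

Here the top-left and bottom-right pixels are black because the first and last bits are 1, and every other pixel except the third is white.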
Step 5, constructing a data set from the binary images obtained in step 4 and labeling it manually; dividing the data set into a training set and a test set;
Each of the four types of data packets has two cases, a packet carrying attack data or a packet showing normal behavior, so the combination of four packets has sixteen possible classes. The manually assigned label consists of 4 binary digits; from the highest bit to the lowest, the digits give the results for the first, second, third and fourth data packet respectively. A digit of 0 means the corresponding packet is normal, and a digit of 1 means it is an attack packet. For example, the label 0000 indicates that all four combined packets are normal; the label 0101 indicates that the second and fourth of the four combined packets are attack packets.
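Because the model ends in a sixteen-way classifier, each 4-bit label has to be mapped to a class index. One natural encoding, shown here as an assumption since the text does not spell out the mapping, is to read the label as a binary number and turn the index into a one-hot target vector. The function names are hypothetical.

```python
from typing import List

def label_to_class(label: str) -> int:
    """Read a 4-digit binary label such as '0101' as a class index from 0 to 15."""
    if len(label) != 4 or set(label) - {"0", "1"}:
        raise ValueError(f"expected 4 binary digits, got {label!r}")
    return int(label, 2)

def one_hot(class_index: int, num_classes: int = 16) -> List[float]:
    """Target vector with 1.0 at the class index and 0.0 elsewhere."""
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]

target = one_hot(label_to_class("0101"))  # second and fourth packets are attacks
```

Under this encoding the label 0000 maps to class 0 and the label 0101 maps to class 5.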
The constructed data set is randomly shuffled and divided proportionally into a training set and a test set.
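The shuffle-and-split step can be sketched as below. The 80/20 ratio and the fixed seed are assumptions chosen for illustration; the text only says that the division is proportional.

```python
import random
from typing import List, Sequence, Tuple

def shuffle_split(samples: Sequence, train_ratio: float = 0.8, seed: int = 0) -> Tuple[List, List]:
    """Shuffle a copy of the samples, then cut it into training and test parts.

    train_ratio and seed are assumed values, not taken from the patent.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)   # seeded so the split is reproducible
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]

train_set, test_set = shuffle_split(range(10))
```

Every sample ends up in exactly one of the two sets, and with ten samples the split is eight for training and two for testing.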
Step 6, establishing an intrusion detection model based on a convolutional deep neural network and training it with the training set;
the intrusion detection model based on the convolutional deep neural network comprises, cascaded in sequence, an input layer, a first convolutional layer, a first max pooling layer, a second convolutional layer, a second max pooling layer, a third convolutional layer, a first linear fully connected layer, a second linear fully connected layer and a third linear fully connected layer; the structure of the model is shown in FIG. 4, and its detailed structure and parameter settings are described below.
The first layer of the model is the input layer, which takes the image data matrix as input; the input size is 3 × 26;
the number of convolution kernels, the kernel size, the stride and the padding of the first, second and third convolutional layers are adjusted according to the size of the input data; the relationship between input size and output size is given by formula (4):
O = \frac{I - K + 2P}{S} + 1    (4)
where O is the size of the output image, I is the size of the input image, K is the size of the convolution kernel, P is the padding size, and S is the stride of the convolution kernel. The result of each convolution uses the ReLU function as the activation function; the ReLU function is shown in formula (5):
ReLU(r) = max(0, r)    (5)
where r is the value obtained after convolution, and max(0, r) returns the larger of 0 and r.
The second layer of the model is the first convolutional layer; the number of convolution kernels is set to 32, the kernel size to 3 x 3, the stride to 1, and the padding to 1;
the third layer of the model is the first max pooling layer, with a pooling window of 2 x 2;
the fourth layer of the model is the second convolutional layer; the number of convolution kernels is set to 32, the kernel size to 5 x 5, the stride to 1, and the padding to 2;
the fifth layer of the model is the second max pooling layer, also with a pooling window of 2 x 2;
the sixth layer of the model is the third convolutional layer; the number of convolution kernels is set to 64, the kernel size to 3 x 3, the stride to 1, and the padding to 2;
the seventh layer of the model is the first linear fully connected layer; its input is the flattened output of the sixth layer, specifically a 3136-dimensional vector, and its output is a 1024-dimensional vector;
the eighth layer of the model is the second linear fully connected layer, whose input is a 1024-dimensional vector and whose output is a 128-dimensional vector;
the ninth layer of the model is the third linear fully connected layer, whose input is a 128-dimensional vector and whose output is a 16-dimensional vector.
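Formula (4) can be applied layer by layer to trace the spatial size of the feature maps through the network. The sketch below is illustrative and rests on assumptions the text does not state: the input is taken to be a square image, pooling is non-overlapping with floor rounding, and the function names are hypothetical.

```python
from typing import Tuple

def conv_out(size: int, kernel: int, padding: int, stride: int = 1) -> int:
    """Formula (4): O = (I - K + 2P) / S + 1, with integer division."""
    return (size - kernel + 2 * padding) // stride + 1

def pool_out(size: int, window: int = 2) -> int:
    """Assumption: pooling stride equals the window size, rounding down."""
    return size // window

def trace(size: int, conv3_padding: int) -> Tuple[int, int]:
    """Return the final feature map side and the flattened vector length."""
    size = conv_out(size, kernel=3, padding=1)               # first convolution, 32 kernels
    size = pool_out(size)                                    # first max pooling
    size = conv_out(size, kernel=5, padding=2)               # second convolution, 32 kernels
    size = pool_out(size)                                    # second max pooling
    size = conv_out(size, kernel=3, padding=conv3_padding)   # third convolution, 64 kernels
    return size, 64 * size * size

as_listed = trace(26, conv3_padding=2)     # 26 x 26 input with the padding values listed above
alternative = trace(28, conv3_padding=1)   # hypothetical variant, for comparison
```

Under these assumptions a 26 x 26 input with the listed padding values gives an 8 x 8 map and a flattened length of 4096, whereas the stated 3136 equals 64 × 7 × 7, which would arise from, for example, a 28 x 28 input with padding 1 in the third convolution. The text does not say which configuration was intended.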
Calculating the vector output by the third linear fully-connected layer through a Softmax function to obtain a final classification result, wherein the formula of the Softmax function is shown in a formula (6), and z is shown in the formulaiAnd C is the output value of the ith node, and the number of output nodes, namely the number of classification categories.
Figure BDA0003299687200000091
The model is optimized with the Adam optimizer, which combines the advantages of the AdaGrad and RMSProp optimization algorithms and uses both the first-moment and second-moment estimates of the gradient to update the step size. The loss function is the categorical cross-entropy function, given by formula (7), where n is the number of samples, m is the number of classification categories, $l_{ij}$ is the actual output value of the model, i.e. the predicted value, and $\hat{l}_{ij}$ is the true value.
$$Loss = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m} \hat{l}_{ij} \log l_{ij} \qquad (7)$$
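The loss of formula (7) can be sketched as follows. This is an illustration rather than the patent's implementation: averaging over the n samples and clamping predictions away from zero are assumptions, and the function name is hypothetical.

```python
import math


def categorical_cross_entropy(predicted, true, eps=1e-12):
    """Mean over the samples of -sum_j true[i][j] * log(predicted[i][j])."""
    n = len(predicted)
    total = 0.0
    for p_row, t_row in zip(predicted, true):
        for p, t in zip(p_row, t_row):
            total -= t * math.log(max(p, eps))  # clamp to avoid log(0)
    return total / n


if __name__ == "__main__":
    predicted = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]  # hypothetical Softmax outputs
    true = [[1, 0, 0], [0, 1, 0]]                   # one-hot true values
    print(categorical_cross_entropy(predicted, true))
```

With one-hot true values only the predicted probability of the correct category contributes to each sample's loss.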
The trained model is stored in a file so as to facilitate the extraction and calling of the model by a detection module.
Step 7, testing the trained intrusion detection model based on the convolutional deep neural network by using the test set;
Step 8, acquiring the Modbus network flow packet to be detected in the industrial control system from the network flow, the database or the log file, repeating steps 2 to 4, and carrying out intrusion detection on the Modbus network flow packet by using the tested intrusion detection model based on the convolutional deep neural network.
Modbus traffic data of the industrial control system, extracted from network traffic, a database or a log file, is used for intrusion detection. The data is processed by the data packet recombination module and then input directly into the detection module. The detection module selects, from the stored model files, a model whose training accuracy and recall rate reach a given threshold, builds the intrusion detection model from it, takes the data to be detected as input, and outputs the detection result corresponding to each item of data. For example, an output of 0000 indicates that all four combined data packets are normal; an output of 1011 indicates that the first, third and fourth of the four combined data packets are attack data packets.
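Reading a detection result such as 1011 can be sketched as below. The packet order (command read, response read, command write, response write, from the highest bit to the lowest) follows the labeling scheme of the method; the mapping from a class index 0..15 to its 4-bit binary string is an assumption, and both function names are hypothetical.

```python
PACKET_TYPES = ("command read", "response read", "command write", "response write")


def class_to_bits(index):
    """Assumed mapping: class index 0..15 to its 4-bit binary label."""
    if not 0 <= index < 16:
        raise ValueError("class index must be in the range 0..15")
    return format(index, "04b")


def decode_result(bits):
    """Return the packet types flagged as attacks in a 4-digit result."""
    if len(bits) != 4 or set(bits) - {"0", "1"}:
        raise ValueError("expected four binary digits, e.g. '1011'")
    return [name for name, bit in zip(PACKET_TYPES, bits) if bit == "1"]


if __name__ == "__main__":
    print(decode_result("0000"))
    print(decode_result(class_to_bits(11)))
```

An empty list means that all four combined packets are classified as normal.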
Although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. The specification is written in this way only for clarity; those skilled in the art should treat it as a whole, and the technical solutions of the embodiments may be appropriately adapted and implemented as understood by those skilled in the art. The detailed descriptions listed above are merely specific illustrations of possible embodiments of the present invention and are not intended to limit its scope; all equivalent embodiments or modifications that do not depart from the technical spirit of the present invention shall fall within the scope of the present invention.

Claims (10)

1. An intrusion detection method based on Modbus data packet reorganization is characterized by comprising the following steps:
step 1, obtaining a Modbus data packet;
acquiring four types of Modbus data packets including command reading, response reading, command writing and response writing in a Modbus network flow packet in an industrial control system;
step 2, Modbus data packet recombination and feature extraction;
performing head-to-tail splicing on the four types of Modbus data packets according to the sequence of command reading, response reading, command writing and response writing and the time sequence to obtain recombined data; dividing the recombined data by taking 8 bits as a unit according to the characteristics of a Modbus frame structure to obtain a plurality of characteristics;
step 3, characteristic screening;
calculating information gains of all the characteristics obtained by dividing in the step 2, sequencing the information gains from large to small, and reserving the characteristics corresponding to the first N information gains;
step 4, data graphical processing: converting the feature data screened in step 3 into a binary picture;
step 5, constructing a data set according to the binary picture obtained in the step 4, and carrying out manual labeling; meanwhile, dividing the data set into a training set and a testing set;
step 6, establishing an intrusion detection model based on a convolutional deep neural network, and training it by using the training set;
the intrusion detection model based on the convolutional deep neural network comprises an input layer, a first convolution layer, a first maximum pooling layer, a second convolution layer, a second maximum pooling layer, a third convolution layer, a first linear fully-connected layer, a second linear fully-connected layer and a third linear fully-connected layer which are sequentially cascaded;
step 7, testing the trained intrusion detection model based on the convolutional deep neural network by using the test set;
step 8, acquiring the Modbus network flow packet to be detected in the industrial control system from the database or the log file, repeating steps 2 to 3, and carrying out intrusion detection on the Modbus network flow packet by using the tested intrusion detection model based on the convolutional deep neural network.
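The recombination and feature division of step 2 can be sketched as follows: the four packets are concatenated head to tail in the stated order, and each 8-bit unit of the result becomes one feature. The example frames are hypothetical Modbus/TCP packets written as hex strings purely for illustration, and the function names are not taken from the patent.

```python
def reassemble(command_read, response_read, command_write, response_write):
    """Concatenate the four packets head to tail in the claimed order."""
    return command_read + response_read + command_write + response_write


def split_features(recombined):
    """Treat every 8-bit unit (one byte) of the recombined data as one feature."""
    return list(recombined)


if __name__ == "__main__":
    # Hypothetical Modbus/TCP frames, for illustration only.
    packets = [bytes.fromhex(h) for h in (
        "000100000006010300000002",    # command read
        "00010000000701030400640032",  # response read
        "0002000000060106000100c8",    # command write
        "0002000000060106000100c8",    # response write
    )]
    features = split_features(reassemble(*packets))
    print(len(features), features[:8])
```

Each feature is an integer from 0 to 255, so one recombined sample is a fixed-order vector of byte values that the feature screening of step 3 can rank.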
2. The intrusion detection method based on Modbus data packet reassembly according to claim 1, wherein each type of Modbus data packet in step 1 has two types of tags, namely attack behavior and normal behavior.
3. The intrusion detection method based on Modbus data packet reassembly according to claim 1, wherein the step 3 specifically is:
3-1, calculating the information entropy of each feature, the information entropy being a measure of the degree of disorder, as shown in formula (1):
$$H(X_i) = -\sum_{j} P(x_{i,j}) \log P(x_{i,j}) \qquad (1)$$
wherein $x_{i,j}$ represents the j-th value of feature $X_i$, $X_i$ is the set of all values of the i-th feature, $P(x_{i,j})$ is the probability that the feature takes the value $x_{i,j}$, and $H(X_i)$ is the information entropy of the i-th feature;
3-2, calculating the conditional entropy of each feature, the conditional entropy being the information entropy under a given condition, as shown in formula (2):
$$H(Y \mid X_i) = \sum_{j} P(x_{i,j})\, H(Y \mid X_i = x_{i,j}) \qquad (2)$$
wherein $H(Y \mid X_i)$ represents the uncertainty of Y given the i-th feature, $y_i$ is the i-th classification category, and Y represents the set of classification categories;
3-3, calculating the information gain of each feature by formula (3) from the conditional entropy of the feature and the information entropy of the classification categories:
$$IG(X_i) = H(Y) - H(Y \mid X_i) \qquad (3)$$
$$H(Y) = -\sum_{i} P(y_i) \log P(y_i)$$
wherein $IG(X_i)$ represents the information gain of the i-th feature, and $H(Y)$ is the information entropy of the classification categories;
3-4, sorting the information gains of all the features from large to small, and retaining the features corresponding to the first N information gains.
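Steps 3-1 to 3-4 can be sketched as below. This is a minimal illustration under stated assumptions: each feature is given as a column of discrete byte values, base-2 logarithms are used (the ranking does not depend on the base), and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(feature, labels):
    """H(Y | X): entropy of the labels within each feature value, weighted by P(x)."""
    n = len(feature)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    return sum(len(ys) / n * entropy(ys) for ys in groups.values())


def information_gain(feature, labels):
    """IG(X) = H(Y) - H(Y | X)."""
    return entropy(labels) - conditional_entropy(feature, labels)


def top_n_features(columns, labels, n):
    """Indices of the n feature columns with the largest information gain."""
    gains = [information_gain(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: gains[i], reverse=True)[:n]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]                     # hypothetical class labels
    columns = [[7, 7, 7, 7], [5, 5, 9, 9]]    # a constant byte and an informative byte
    print(top_n_features(columns, labels, 1))
```

A byte position that never changes carries no information about the class, so its gain is zero and it is the first to be dropped.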
4. The intrusion detection method based on Modbus data packet reassembly according to claim 1, wherein step 4 specifically comprises: converting the data processed in step 3 from a hexadecimal data stream into a binary stream, and then converting the binary stream into a square RGB image; wherein each 0 in the binary stream is replaced by RGB (255, 255, 255) and each 1 in the binary stream is replaced by RGB (0, 0, 0).
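The conversion described in claim 4 can be sketched as follows. Several details are assumptions made for illustration: bits are taken most significant bit first, bit 0 is drawn as white (255, 255, 255), the square side is the smallest one that holds all bits, and any unused pixels are padded with 0 bits. The function names are hypothetical.

```python
import math

WHITE = (255, 255, 255)  # assumed colour for bit 0
BLACK = (0, 0, 0)        # colour for bit 1


def bytes_to_bits(data):
    """Expand each byte into its 8 bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_square_image(bits):
    """Lay the bits out row by row in the smallest square that holds them."""
    side = math.isqrt(len(bits) - 1) + 1 if bits else 0
    padded = bits + [0] * (side * side - len(bits))  # zero padding is an assumption
    pixels = [BLACK if b else WHITE for b in padded]
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


if __name__ == "__main__":
    image = bits_to_square_image(bytes_to_bits(bytes.fromhex("f00f")))
    for row in image:
        print(row)
```

For example, 50 retained byte features would give 400 bits and therefore a 20 × 20 image under these assumptions.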
5. The intrusion detection method based on Modbus data packet reassembly according to claim 1, wherein the tags of the data set in step 5 adopt 4-bit binary digits, and the digits from the highest bit to the lowest bit represent tags of four types of Modbus data packets, namely command read, response read, command write and response write, respectively.
6. The intrusion detection method based on Modbus data packet reassembly according to claim 1, wherein the number of convolution kernels, the size of convolution kernels, the step size and the padding of the first convolution layer, the second convolution layer and the third convolution layer in the intrusion detection model based on the convolution deep neural network are adjusted according to the size of input data, and the relationship between the input size and the output size is as shown in formula (4):
$$O = \frac{I - K + 2P}{S} + 1 \qquad (4)$$
where O is the size of the output image, I is the size of the input image, K is the size of the convolution kernel, P is the padding size, and S is the stride of the convolution kernel.
7. The intrusion detection method based on Modbus data packet reassembly according to claim 6, wherein the convolved results of the first convolutional layer, the second convolutional layer and the third convolutional layer in the intrusion detection model based on the convolutional deep neural network all use a Relu function as an activation function; the Relu function is shown in equation (5):
Relu(r)=max(0,r) (5)
wherein r is the convolution result value of the first convolution layer, the second convolution layer and the third convolution layer, and the max (0, r) function represents taking the maximum value of 0 and r.
8. The intrusion detection method based on Modbus packet reassembly according to claim 7, wherein the vector output by the third linear fully-connected layer in the intrusion detection model based on the convolutional deep neural network is calculated by a Softmax function, and the formula of the Softmax function is shown in formula (6);
$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}} \qquad (6)$$
wherein $z_i$ is the output value of the i-th node, and C is the number of output nodes, i.e. the number of classification categories.
9. The intrusion detection method based on Modbus packet reassembly according to claim 8, wherein the loss function is a categorical cross-entropy function, calculated by formula (7):
$$Loss = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m} \hat{l}_{ij} \log l_{ij} \qquad (7)$$
wherein n is the number of samples, m is the number of classification categories, $l_{ij}$ is the actual output value of the model, i.e. the predicted value, and $\hat{l}_{ij}$ is the true value.
10. An intrusion detection system based on Modbus data packet reorganization is characterized by comprising a data packet reorganization module, an intrusion detection model training module and a detection module;
the data packet recombination module is used for recombining the four Modbus data packets according to the service relationship in the industrial control system, and performing characteristic screening and data imaging processing on the recombined data;
the intrusion detection model training module is used for manually marking the data processed by the data packet restructuring module, dividing the data into a training set and a testing set, training an intrusion detection model based on a convolutional neural network by using the data in the training set, and optimizing the intrusion detection model based on the convolutional neural network by using the data in the testing set;
and the detection module is used for inputting unmarked Modbus data packets into a trained intrusion detection model based on the convolutional neural network after the data packets are processed by the data packet recombination module, and identifying whether the data packets are intrusion data according to detection results.
CN202111187044.6A 2021-10-12 2021-10-12 Intrusion detection method and device based on Modbus data packet reorganization Active CN114124447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111187044.6A CN114124447B (en) 2021-10-12 2021-10-12 Intrusion detection method and device based on Modbus data packet reorganization

Publications (2)

Publication Number Publication Date
CN114124447A true CN114124447A (en) 2022-03-01
CN114124447B CN114124447B (en) 2024-02-02

Family

ID=80441801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111187044.6A Active CN114124447B (en) 2021-10-12 2021-10-12 Intrusion detection method and device based on Modbus data packet reorganization

Country Status (1)

Country Link
CN (1) CN114124447B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107438052A (en) * 2016-05-26 2017-12-05 中国科学院沈阳自动化研究所 A kind of anomaly detection method towards unknown industrial communication protocol stipulations
CN108429753A (en) * 2018-03-16 2018-08-21 重庆邮电大学 A kind of matched industrial network DDoS intrusion detection methods of swift nature
CN110912867A (en) * 2019-09-29 2020-03-24 惠州蓄能发电有限公司 Intrusion detection method, device, equipment and storage medium for industrial control system
CN113179279A (en) * 2021-05-20 2021-07-27 哈尔滨凯纳科技股份有限公司 Industrial control network intrusion detection method and device based on AE-CNN

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
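Chi Yaping; Yang Yintan; Li Gefei; Wang Zhiqiang; Xu Ping: "Design and implementation of a network intrusion detection model based on the GR-CNN algorithm", Computer Applications and Software, no. 12 *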

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114553606A (en) * 2022-04-26 2022-05-27 科大天工智能装备技术(天津)有限公司 Industrial control network intrusion detection method and system
CN115333957A (en) * 2022-08-05 2022-11-11 国家电网有限公司信息通信分公司 Service flow prediction method and system based on user behaviors and enterprise service characteristics
CN115333957B (en) * 2022-08-05 2023-09-05 国家电网有限公司信息通信分公司 Service flow prediction method and system based on user behavior and enterprise service characteristics

Also Published As

Publication number Publication date
CN114124447B (en) 2024-02-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant