CN115396204A - Industrial control network flow abnormity detection method and device based on sequence prediction - Google Patents

Publication number
CN115396204A
Authority
CN
China
Legal status
Pending
Application number
CN202211031858.5A
Other languages
Chinese (zh)
Inventor
潘洁
耿洋洋
车欣
邓瑞龙
赵成成
孙铭阳
程鹏
陈积明
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/12 Network monitoring probes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a method and a device for detecting abnormal industrial control network traffic based on sequence prediction, which improve the recall and precision of anomaly detection in industrial control systems. The invention is tailored to the characteristics of industrial control traffic. Beyond quintuple information, it considers the function codes specific to industrial control protocols, the packet timing characteristics that arise from long-term stable operation of industrial control systems, and the coupling between function codes and packet lengths. Messages are first screened with a multi-level white list, which improves detection efficiency and reduces the influence of abnormal data on model performance. An LSTM-SVM model is then used: a neural network that accounts for temporal order extracts the hidden logical relationships among packets, and an SVM outputs the classification result, which improves detection accuracy.

Description

Industrial control network flow abnormity detection method and device based on sequence prediction
Technical Field
The invention belongs to the field of industrial control system security and relates to a method and a device for detecting abnormal industrial control network traffic based on sequence prediction.
Background
An industrial control system consists of multiple control loops, each implemented by a controller, sensors and actuators. The sensors transmit measured physical information to the controller. The controller processes this information according to a control algorithm to obtain a control signal and sends it to the actuators. The actuators then act on the control signal, which achieves the control objective.
Distributed Control Systems (DCS) and Supervisory Control and Data Acquisition (SCADA) systems are common control system architectures. A DCS controls a production process within a single geographical location. A SCADA system acquires, monitors and controls data. It may contain multiple DCSs for local control and performs remote control through remote terminal units (RTUs). Programmable logic controllers (PLCs) are important control components in both SCADA and DCS systems. They carry out specific operations and provide local process management through loop control.
Industrial control systems were originally physically isolated from external networks. Their designers therefore considered security only in terms of reliability and physical safety. With digitization, information technology is increasingly applied to industrial control systems so that production data can be better captured and used for enterprise management and similar purposes. As a result, industrial control systems are now exposed to threats, such as viruses and Trojans, that previously targeted conventional information systems. Effective defensive measures are therefore required to keep industrial control systems secure.
Intrusion detection is a defensive technique first applied to information systems. When abnormal behavior occurs, it can discover potential threats and trigger further defensive measures without extensive changes to the original system. Ethernet and TCP/IP networks have spread into industrial control systems, so intrusion detection techniques can now be applied to them as well. Intrusion detection falls into two broad categories: host-based and network-based. Host-based techniques require frequent updates to the host configuration, which makes them unsuitable for the particular constraints of industrial control. Network-based intrusion detection obtains traffic information through network switching equipment and analyzes that information for suspicious behavior patterns.
Disclosure of Invention
The object of the invention is to provide a method and a device for detecting abnormal industrial control network traffic based on sequence prediction, addressing the security shortcomings of existing industrial control systems.
The purpose of the invention is realized by the following technical scheme:
According to a first aspect of the present specification, a method for detecting abnormal industrial control network traffic based on sequence prediction is provided. The method comprises the following steps:
(1) Capturing communication data of the industrial control system with packet-capture software in promiscuous mode. The communication data comprise packets captured during long-term normal operation and packets captured in abnormal states. The normal traffic generated when intranet hosts automatically query the default gateway is filtered out, each packet is labeled with its category, and a training set is constructed;
(2) Performing protocol analysis on each packet of the industrial control system and identifying and extracting its effective features, namely: source IP, destination IP, source port, destination port, protocol type, industrial control protocol function code, packet length, the time interval between two consecutive packets, and the industrial control protocol data segment length;
(3) Creating a white list and using it for preliminary screening. The white list comprises three parts applied in sequence: a quintuple white list, an industrial control protocol function code white list, and an industrial control protocol data segment length white list. Only packets that fall within the white list pass the screening; packets that are screened out are marked as abnormal;
(4) Preprocessing the effective features extracted from each packet and converting them into a standardized vector;
(4.1) Preprocessing of the time-interval feature: calculating the interval between the receipt time of the current packet and that of the previous packet; taking the base-10 logarithm of the interval and applying min-max normalization; dividing the normalized values into several distribution intervals with a clustering algorithm; and writing each distribution-interval number back to the corresponding original packet;
(4.2) Preprocessing of the packet-length feature: proportionally compressing the lengths of packets in different byte ranges into different numeric ranges, then applying min-max normalization to the compressed values to obtain the packet-length feature;
(4.3) Concatenating all categorical features of each packet, the distribution-interval number from step (4.1) and the packet-length feature from step (4.2) into a hash string, numbering the resulting strings, and converting them into one-hot vectors;
(5) Using the one-hot vectors obtained in step (4) to build a prediction model with an LSTM-SVM structure that predicts the type of the packet at the next moment. The prediction model casts anomaly detection as the optimization of a loss function, and training updates the model's parameters;
(6) Using the model trained in step (5) to examine packets from the actual industrial control system and determine whether each is normal or abnormal.
Further, in step (1), an external device is connected to the internal communication network of the industrial control system. The packet-capture software Wireshark is run in promiscuous mode on that device to collect the system's communication data. The data source is actual field data or data from a security test platform.
Further, in step (2), source IP, destination IP, source port, destination port, protocol type and industrial control protocol function code are categorical features. Packet length and time interval are numerical features that carry information about traffic volume and communication frequency. The time interval also carries a degree of fingerprint information about the industrial control devices. The protocol type focuses on which proprietary industrial control protocol is used. The industrial control protocol function code is a feature unique to the industrial control field and represents the operator's intent.
Further, in step (2), the industrial control protocol data segment is the part of an industrial control communication packet that carries the host computer's operations on the controller, the controller's real-time state, or the controller's memory data. Its length and format are specifically defined and correlate with the function code. Identifying the data segment length and checking it against the function code makes it possible to verify the packet's validity and analyze its purpose, which enables simple and fast detection.
Further, in step (3), the quintuple white list is constructed by extracting the source IP, destination IP, source port, destination port and protocol type from the packets collected in step (1) during long-term normal operation and storing them in a hash table;
the industrial control protocol function code white list is constructed by extracting the function codes from the packets collected in step (1) during long-term normal operation;
the industrial control protocol data segment length white list is constructed as follows: the data segment length correlates with the function code, so for a given function code the data segment length is either fixed or confined to a certain range. These length ranges are set from expert experience to form the industrial control protocol data segment length white list.
Further, in step (4), the time-interval feature is preprocessed as follows:
(a) The time interval between the receipt of the current packet and the receipt of the previous packet is calculated as:

$$\Delta t_i = \begin{cases} c, & i = 1 \\ t_i - t_{i-1}, & i > 1 \end{cases}$$

where $i$ is the index of the current packet, $\Delta t_i$ is the time-interval feature of packet $i$, and $t_i$ is the time at which packet $i$ is received. The interval $c$ of the first captured packet is estimated by least squares from the 2nd to 4th interval values;
(b) The base-10 logarithm of the time-interval feature is taken and min-max normalization is applied. The normalized values are divided into several distribution intervals with a clustering algorithm, and each distribution-interval number is written back to the corresponding original packet.
Further, in step (4), the packet-length feature is preprocessed as follows. Lengths in the 0 to 150 byte range are proportionally compressed to 0 to 9, and lengths in the 150 to 999 byte range are proportionally compressed to 9 to 20. The piecewise-compressed value therefore lies between 0 and 20. This value is then min-max normalized to give the packet-length feature.
Further, in step (5), the prediction model comprises the following layers, connected in sequence: an Embedding layer, LSTM hidden layer 1, a Dropout layer, LSTM hidden layer 2 and an SVM layer. The Embedding layer converts the input one-hot vector into a word vector of length N. The two LSTM hidden layers receive the sample features for training. The Dropout layer prevents overfitting. The SVM layer serves as the output layer: it takes the sparse hidden features output by LSTM hidden layer 2 as input and outputs the packet type. The classification decision function f of the SVM layer is:

$$f(x) = \operatorname{sgn}\left( \sum_{n} \alpha_n^{*} y^{(n)} k\left(x^{(n)}, x\right) + b^{*} \right)$$

where $\alpha_n^{*}$ is a Lagrange multiplier and [formula image BDA0003817422770000043]. $y^{(n)} \in \{+1, -1\}$ is the class label. $\operatorname{sgn}(\cdot)$ is the sign function: $\operatorname{sgn}(x) = 1$ for $x > 0$, $0$ for $x = 0$ and $-1$ for $x < 0$. $b^{*}$ is the bias, $k(\cdot,\cdot)$ is the radial basis kernel function, $x^{(n)}$ is the $n$-th sample in the training set, and $x$ is the input variable.
Further, in step (5), the loss function L is:

$$L = \max\left(0,\ 1 - y^{(n)} w^{\mathsf{T}} x^{(n)}\right)$$

where $w$ is the weight vector obtained by training and $\mathsf{T}$ denotes transposition. During training, the weight vector is updated with the Adam optimization algorithm.
According to a second aspect of the present specification, a device for detecting abnormal industrial control network traffic based on sequence prediction is provided. It comprises a memory and one or more processors. The memory stores executable code, and when the processors execute that code they implement the method of the first aspect.
The beneficial effects of the invention are as follows. The method and device improve the recall and precision of anomaly detection in industrial control systems. The invention is tailored to the characteristics of industrial control traffic. Beyond quintuple information, it considers the function codes specific to industrial control protocols, the packet timing characteristics arising from long-term stable operation of industrial control systems, and the coupling between function codes and packet lengths. It screens messages first with a multi-level white list, which improves detection efficiency and reduces the influence of abnormal data on model performance. It then uses an LSTM-SVM model: a neural network that accounts for temporal order extracts the hidden logical relationships among packets, and an SVM outputs the classification result, which improves detection accuracy.
Drawings
Fig. 1 is a flowchart of a method for detecting abnormal traffic in an industrial control network based on sequence prediction according to an exemplary embodiment.
Fig. 2 is a schematic diagram of a white list structure according to an exemplary embodiment.
FIG. 3 is a schematic diagram of the structure of the LSTM-SVM prediction model according to an exemplary embodiment.
Fig. 4 is a block diagram of an industrial control network traffic anomaly detection device based on sequence prediction according to an exemplary embodiment.
Detailed Description
The technical solution in the present embodiment will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The embodiment of the invention provides an industrial control network flow abnormity detection method based on sequence prediction, which comprises the following steps:
(1) An external device is connected to the internal communication network of the industrial control system. Packet-capture software running in promiscuous mode collects the system's communication data, which comprise packets from long-term normal operation and packets from abnormal states. The normal traffic generated when intranet hosts automatically query the default gateway is filtered out. Each packet is then labeled with its category (normal or abnormal), and a training set and a test set are constructed. The data source may be actual field data or data from a security test platform. The packet-capture software may be Wireshark.
(2) Focusing on the communication process, protocol analysis is performed on each packet of the industrial control system to identify and extract its effective features: source IP (IPsrc), destination IP (IPdst), source port (PORTsrc), destination port (PORTdst), protocol type (Protocol), industrial control protocol function code (FunCode), packet length, the time interval between two consecutive packets (Δt), and the industrial control protocol data segment length (P_Length). The first six are categorical features: each value is an identifier denoting one category. Packet length and time interval are numerical features that reflect traffic volume and communication frequency, and the time interval also carries a degree of fingerprint information about the industrial control devices. Unlike in conventional traffic classification, the protocol type here focuses on which proprietary industrial control protocol is used. The function code is a feature unique to the industrial control field and represents the operator's intent. The industrial control protocol data segment is the part of an industrial control communication packet that carries the host computer's operations on the controller, the controller's real-time state, or the controller's memory data. Its length and format are specifically defined and correlate with the function code.
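The text does not name a specific industrial control protocol. Purely as an illustration, assume the traffic is Modbus/TCP. The sketch below shows how FunCode and P_Length could be extracted from a captured TCP payload. `IcsFields` and `parse_modbus_tcp` are hypothetical names. The header layout follows the Modbus/TCP MBAP header, not anything specified in this patent.

```python
import struct
from dataclasses import dataclass


@dataclass
class IcsFields:
    transaction_id: int
    unit_id: int
    func_code: int  # FunCode: encodes the operator's intent (read, write, ...)
    p_length: int   # P_Length: bytes in the data segment after the function code


def parse_modbus_tcp(payload: bytes) -> IcsFields:
    """Extract FunCode and P_Length from a Modbus/TCP application payload.

    Assumed protocol: Modbus/TCP. MBAP header (7 bytes, big-endian):
    transaction id (2), protocol id (2, always 0), length (2, counts the unit id
    plus the PDU), unit id (1). The PDU starts with a 1-byte function code.
    """
    if len(payload) < 8:
        raise ValueError("payload shorter than MBAP header + function code")
    tid, pid, length, unit = struct.unpack(">HHHB", payload[:7])
    if pid != 0:
        raise ValueError("protocol id is not 0, so this is not Modbus/TCP")
    if len(payload) != 6 + length:
        raise ValueError("MBAP length field does not match the payload size")
    func_code = payload[7]
    # length covers unit id (1 byte) + function code (1 byte) + data segment
    return IcsFields(tid, unit, func_code, length - 2)
```

For example, a Read Holding Registers request (function code 3) with a 4-byte data segment parses to `func_code=3, p_length=4`. The pair (FunCode, P_Length) is exactly what the data segment length white list checks.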
(3) A white list is created and used for preliminary screening.
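The white list comprises three parts applied in sequence: a quintuple white list, an industrial control protocol function code white list, and an industrial control protocol data segment length white list. Only packets that fall within the white list pass the screening.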
The quintuple white list is constructed by extracting the source IP, destination IP, source port, destination port and protocol type from the packets collected in step (1) during long-term normal operation and storing them in a hash table;
the industrial control protocol function code white list is constructed by extracting the function codes from the packets collected in step (1) during long-term normal operation;
the industrial control protocol data segment length white list is constructed as follows: the data segment length correlates with the function code, so for a given function code the data segment length is either fixed or confined to a certain range. These length ranges are set from expert experience to form the industrial control protocol data segment length white list;
the packets to be examined are screened in turn against the quintuple white list, the function code white list and the data segment length white list. Packets that are screened out are marked as abnormal.
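The three-tier screening described above can be sketched as a minimal in-memory structure. Python sets serve as the hash tables, and the per-function-code length ranges are passed in, standing in for the expert-defined values. `Packet`, `Whitelist` and `failed_tier` are hypothetical names. Rejecting a function code that has no expert-defined length range is an assumption the text does not state.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Packet:
    """Hypothetical container for the fields extracted in step (2)."""
    ip_src: str
    ip_dst: str
    port_src: int
    port_dst: int
    protocol: str
    func_code: int
    p_length: int  # industrial control protocol data segment length


FiveTuple = Tuple[str, str, int, int, str]


def five_tuple(p: Packet) -> FiveTuple:
    return (p.ip_src, p.ip_dst, p.port_src, p.port_dst, p.protocol)


class Whitelist:
    """Quintuple -> function code -> data segment length, checked in that order."""

    def __init__(self, normal: Iterable[Packet],
                 length_ranges: Dict[int, Tuple[int, int]]) -> None:
        self.tuples: Set[FiveTuple] = set()  # Python sets are hash tables
        self.func_codes: Set[int] = set()
        for p in normal:  # packets from long-term normal operation
            self.tuples.add(five_tuple(p))
            self.func_codes.add(p.func_code)
        # {func_code: (min_len, max_len)}, supplied from expert knowledge
        self.length_ranges = length_ranges

    def failed_tier(self, p: Packet) -> Optional[str]:
        """Name of the first tier the packet fails, or None if it passes all three."""
        if five_tuple(p) not in self.tuples:
            return "quintuple"
        if p.func_code not in self.func_codes:
            return "func_code"
        rng = self.length_ranges.get(p.func_code)
        # assumption: a function code with no expert-defined range is rejected
        if rng is None or not rng[0] <= p.p_length <= rng[1]:
            return "data_length"
        return None
```

In this sketch, any non-`None` result means the packet is marked abnormal directly and never reaches the LSTM-SVM model. Checking the cheap set lookups first mirrors the sequential order given in the text.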
(4) The effective features extracted from each packet are preprocessed and converted into a standardized vector, so that each packet corresponds to one standardized vector;
(4.1) The time-interval feature is preprocessed as follows:
(a) The time interval between the receipt of the current packet and the receipt of the previous packet is calculated as:

$$\Delta t_i = \begin{cases} c, & i = 1 \\ t_i - t_{i-1}, & i > 1 \end{cases}$$

where $i$ is the index of the current packet, $\Delta t_i$ is the time-interval feature of packet $i$, and $t_i$ is the time at which packet $i$ is received. The interval $c$ of the first captured packet is estimated by least squares from the 2nd to 4th interval values;
(b) The base-10 logarithm of the time-interval feature is taken and min-max normalization is applied. The normalized values are divided into several distribution intervals with a clustering algorithm, and each distribution-interval number is written back to the corresponding original packet.
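Steps (a) and (b) can be sketched in plain Python as follows. Several choices here are assumptions, not the patent's specification: the least-squares model for $c$ (a straight line in the packet index, fitted to $\Delta t_2..\Delta t_4$ and evaluated at $i = 1$), the small floor `eps` that keeps `log10` defined for zero or negative intervals, and 1-D k-means as the clustering algorithm, which the text leaves unnamed. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def time_intervals(t: Sequence[float]) -> List[float]:
    """Return [Δt_1, ..., Δt_n] for receive times t_1..t_n (t[0] is packet 1).

    Δt_i = t_i - t_{i-1} for i > 1. Assumption: c = Δt_1 comes from a least-squares
    line Δt ≈ a*i + b fitted to the 2nd-4th intervals and evaluated at i = 1.
    """
    if len(t) < 4:
        raise ValueError("need at least 4 timestamps to estimate c")
    dt = [t[i] - t[i - 1] for i in range(1, len(t))]  # Δt_2 .. Δt_n
    xs, ys = [2.0, 3.0, 4.0], dt[:3]
    x_mean, y_mean = sum(xs) / 3, sum(ys) / 3
    a = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs)
    b = y_mean - a * x_mean
    return [a * 1.0 + b] + dt


def log_minmax(dt: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Base-10 log, then min-max scaling to [0, 1].

    eps is an assumed floor so zero or negative intervals do not break log10.
    """
    logs = [math.log10(max(d, eps)) for d in dt]
    lo, hi = min(logs), max(logs)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in logs]


def kmeans_1d(values: Sequence[float], k: int, iters: int = 50) -> List[int]:
    """Plain 1-D k-means, standing in for the unspecified clustering algorithm.

    Returns one distribution-interval number per value, numbered by ascending centroid.
    """
    vs = sorted(values)
    cents = [vs[(len(vs) - 1) * j // max(k - 1, 1)] for j in range(k)]  # spread init

    def nearest(v: float) -> int:
        return min(range(k), key=lambda j: abs(v - cents[j]))

    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            groups[nearest(v)].append(v)
        cents = [sum(g) / len(g) if g else cents[j] for j, g in enumerate(groups)]
    rank = {j: r for r, j in enumerate(sorted(range(k), key=lambda j: cents[j]))}
    return [rank[nearest(v)] for v in values]
```

Taking the logarithm before normalizing compresses the long tail of idle gaps. Without it, a single multi-second pause would squeeze all the regular polling intervals into one bucket.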
(4.2) The packet-length feature is preprocessed as follows:
lengths in the 0 to 150 byte range are proportionally compressed to 0 to 9, and lengths in the 150 to 999 byte range are proportionally compressed to 9 to 20. The piecewise-compressed value therefore lies between 0 and 20. This value is then min-max normalized to give the packet-length feature.
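The piecewise compression above maps two byte ranges linearly onto 0 to 9 and 9 to 20. This allots 9 units to the first 150 bytes and 11 units to the next 849, so short packets are spread more finely. The sketch below makes two assumptions the text does not settle: lengths above 999 bytes are clamped, and min-max normalization is taken over the given batch. Bounds saved from the training set could be reused instead. The function names are hypothetical.

```python
from typing import List, Sequence


def compress_length(n_bytes: int) -> float:
    """Piecewise-linear compression: 0-150 B -> 0-9 and 150-999 B -> 9-20.

    Assumption: lengths above 999 B are clamped to 999, since the text does not
    say how they are handled.
    """
    n = min(max(n_bytes, 0), 999)
    if n <= 150:
        return n * 9 / 150
    return 9 + (n - 150) * 11 / (999 - 150)


def length_features(lengths: Sequence[int]) -> List[float]:
    """Compress every length, then min-max normalise over the given batch."""
    comp = [compress_length(n) for n in lengths]
    lo, hi = min(comp), max(comp)
    span = (hi - lo) or 1.0
    return [(c - lo) / span for c in comp]
```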
(4.3) All categorical features of each packet, the distribution-interval number from step (4.1) and the packet-length feature from step (4.2) are concatenated into a hash string. A Python dict assigns each distinct string a number, which is then converted into a one-hot vector.
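Step (4.3) can be sketched as follows, using a Python dict as the text describes. The `|` separator and the 3-decimal rounding of the length feature are illustrative assumptions, and `packet_key`, `Vocab` and `one_hot` are hypothetical names. The text does not say whether the string is additionally passed through a hash function, so this sketch uses the joined string directly as the dict key.

```python
from typing import Dict, List, Sequence


def packet_key(categorical: Sequence[object], interval_no: int,
               length_feat: float) -> str:
    """Join categorical fields, distribution-interval number and length feature.

    The '|' separator and 3-decimal rounding are illustrative assumptions.
    """
    parts = [str(v) for v in categorical] + [str(interval_no), f"{length_feat:.3f}"]
    return "|".join(parts)


class Vocab:
    """Python dict that numbers each distinct packet string in order of first sight."""

    def __init__(self) -> None:
        self.index: Dict[str, int] = {}

    def id_of(self, key: str) -> int:
        # len(self.index) is evaluated before insertion, so new keys get 0, 1, 2, ...
        return self.index.setdefault(key, len(self.index))


def one_hot(idx: int, size: int) -> List[int]:
    vec = [0] * size
    vec[idx] = 1
    return vec
```

In practice an embedding layer is usually fed the integer id directly. This is equivalent to multiplying the one-hot vector by the embedding matrix, without materializing the vector.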
(5) Using the one-hot vectors obtained in step (4), a prediction model with an LSTM-SVM structure is built to predict the type of the packet at the next moment.
The prediction model comprises the following layers, connected in sequence: an Embedding layer, LSTM hidden layer 1, a Dropout layer, LSTM hidden layer 2 and an SVM layer. The Embedding layer converts the input one-hot vector into a word vector of length N. The two LSTM hidden layers receive the sample features for training. The Dropout layer prevents overfitting. The SVM layer serves as the output layer: it takes the sparse hidden features output by LSTM hidden layer 2 as input and outputs the packet type, that is, whether the packet is normal or abnormal.
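Because the model predicts from the preceding packet sequence, the numbered packet stream has to be cut into training samples. One plausible framing is sketched below: a sliding window of past packet ids paired with the id that follows. The window length is a hypothetical hyperparameter that the text does not give, and `next_step_pairs` is an illustrative name. Each history window would be the input to the Embedding layer.

```python
from typing import List, Sequence, Tuple


def next_step_pairs(ids: Sequence[int], window: int) -> List[Tuple[List[int], int]]:
    """Cut a packet-id sequence into (history window, next id) training pairs.

    window is an assumed hyperparameter; the source does not specify one.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    return [(list(ids[i:i + window]), ids[i + window])
            for i in range(len(ids) - window)]
```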
The SVM layer is implemented as follows. A radial basis kernel function implicitly maps samples from the original space into a high-dimensional space, which handles samples that are not linearly separable in the original feature space. The radial basis kernel function is:

$$k\left(x^{(n)}, x\right) = \exp\left( -\frac{\lVert x^{(n)} - x \rVert^{2}}{2\sigma^{2}} \right)$$

where $k(\cdot,\cdot)$ is the radial basis kernel function, $x^{(n)}$ is the $n$-th sample in the training set, $x$ is the input variable, $\sigma$ is an adjustable free parameter, and $\lVert x^{(n)} - x \rVert^{2}$ is the squared Euclidean distance between the two feature vectors.
The classification decision function f of the SVM layer is:

$$f(x) = \operatorname{sgn}\left( \sum_{n} \alpha_n^{*} y^{(n)} k\left(x^{(n)}, x\right) + b^{*} \right)$$

where $\alpha_n^{*}$ is a Lagrange multiplier and [formula image BDA0003817422770000073]. $y^{(n)} \in \{+1, -1\}$ is the class label. $\operatorname{sgn}(x)$ is the sign function: $\operatorname{sgn}(x) = 1$ when $x > 0$, $\operatorname{sgn}(x) = 0$ when $x = 0$, and $\operatorname{sgn}(x) = -1$ when $x < 0$. $b^{*}$ is the bias.
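The kernel and decision function above can be evaluated directly, given already-trained multipliers and a bias. The sketch below assumes those values come from elsewhere; it does not solve the SVM dual problem. `rbf_kernel`, `sgn` and `svm_decision` are hypothetical names.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(xn: Vector, x: Vector, sigma: float) -> float:
    """k(x^(n), x) = exp(-||x^(n) - x||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xn, x))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sgn(v: float) -> int:
    """Sign function: 1 for v > 0, 0 for v == 0, -1 for v < 0."""
    return (v > 0) - (v < 0)


def svm_decision(x: Vector, support: Sequence[Vector], alphas: Sequence[float],
                 labels: Sequence[int], b: float, sigma: float) -> int:
    """f(x) = sgn(sum_n alpha_n* y^(n) k(x^(n), x) + b*), with alphas/b assumed trained."""
    s = sum(a * y * rbf_kernel(xn, x, sigma)
            for xn, a, y in zip(support, alphas, labels))
    return sgn(s + b)
```

Each training sample contributes in proportion to its multiplier and its kernel similarity to $x$. Samples with $\alpha_n^{*} = 0$ drop out, which is why only the support vectors need to be kept.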
The prediction model casts anomaly detection as the optimization of a loss function. The model is trained and optimized, and its parameters are updated, using the loss function L:

$$L = \max\left(0,\ 1 - y^{(n)} w^{\mathsf{T}} x^{(n)}\right)$$

where $w$ is the weight vector obtained by training and $\mathsf{T}$ denotes transposition.
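This loss is the hinge loss. The minimal sketch below evaluates it, takes its subgradient with respect to $w$, and applies a standard Adam update with bias correction, the optimizer the text names for training. It trains only a linear $w$ on fixed feature vectors. In the full LSTM-SVM the gradient would also flow back into the LSTM layers, which this sketch omits. The learning rate and all names are assumptions.

```python
import math
from typing import List, Sequence


def margin(w: Sequence[float], x: Sequence[float], y: int) -> float:
    return y * sum(wi * xi for wi, xi in zip(w, x))


def hinge_loss(w: Sequence[float], x: Sequence[float], y: int) -> float:
    """L = max(0, 1 - y * w^T x) with y in {+1, -1}."""
    return max(0.0, 1.0 - margin(w, x, y))


def hinge_subgrad(w: Sequence[float], x: Sequence[float], y: int) -> List[float]:
    """Subgradient dL/dw: -y * x while the margin is violated, otherwise 0."""
    if margin(w, x, y) < 1.0:
        return [-y * xi for xi in x]
    return [0.0] * len(w)


class Adam:
    """Standard Adam update with bias-corrected first and second moments."""

    def __init__(self, n: int, lr: float = 0.01, b1: float = 0.9,
                 b2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m = [0.0] * n  # first-moment (mean) estimates
        self.v = [0.0] * n  # second-moment (uncentred variance) estimates
        self.t = 0

    def step(self, w: Sequence[float], g: Sequence[float]) -> List[float]:
        self.t += 1
        new_w = []
        for i, (wi, gi) in enumerate(zip(w, g)):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * gi
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * gi * gi
            m_hat = self.m[i] / (1 - self.b1 ** self.t)
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            new_w.append(wi - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_w
```

Usage: start from `w = [0.0, 0.0]` and `opt = Adam(2, lr=0.1)`, then repeatedly call `w = opt.step(w, hinge_subgrad(w, x, y))` over the labelled samples. Training stops pushing on a sample once its margin reaches 1.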
(6) The model trained in step (5) is used to examine packets from the actual industrial control system and determine whether each is normal or abnormal.
FIG. 1 is a flowchart of the anomaly detection method of the invention. Network communication data are collected from the industrial control system and then processed. Packets collected in the normal state are labeled normal and used to build the white list. Packets collected in abnormal states are labeled abnormal. Together with the normal packets, they form the training and test sets used to train the LSTM-SVM prediction model. Packets to be examined first pass through the white list, and those that fail are marked abnormal directly. Packets that pass are preprocessed and then classified by the LSTM-SVM prediction model to obtain the detection result.
Fig. 2 is a specific white list structure diagram. The white list includes three parts arranged in sequence: a quintuple white list, an industrial control protocol function code white list and an industrial control protocol data segment length white list; only data packets that are within the white list will pass the screening.
FIG. 3 is a specific hierarchical structure of the LSTM-SVM prediction model. The prediction model is divided into an Embedding layer, an LSTM hidden layer 1, a Dropout layer, an LSTM hidden layer 2 and an SVM layer. And training the prediction model by using a training set, and updating the weight vector by using an Adam optimization algorithm in the training process.
Corresponding to the embodiment of the industrial control network flow abnormity detection method based on sequence prediction, the invention also provides an embodiment of an industrial control network flow abnormity detection device based on sequence prediction.
Referring to Fig. 4, the device for detecting abnormal industrial control network traffic based on sequence prediction provided by the embodiment of the invention comprises a memory and one or more processors. The memory stores executable code. When the processors execute that code, they implement the sequence-prediction-based detection method of the above embodiment.
The embodiment of the device for detecting abnormal industrial control network traffic based on sequence prediction can be applied to any device with data processing capability, such as a computer. The device embodiment may be implemented in software, in hardware, or in a combination of both. In a software implementation, the device exists as a logical apparatus. It is formed when the processor of the host device reads the corresponding computer program instructions from non-volatile memory into memory and runs them. In terms of hardware, Fig. 4 shows the hardware structure of a device with data processing capability on which the apparatus resides. Besides the processor, memory, network interface and non-volatile memory shown in Fig. 4, such a device may include other hardware depending on its actual function, which is not described further here.
For how the functions and actions of each unit in the above device are implemented, see the implementation of the corresponding steps in the above method; the details are not repeated here.
Since the device embodiments substantially correspond to the method embodiments, reference may be made to the relevant parts of the description of the method embodiments. The device embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present invention, which a person of ordinary skill in the art can understand and implement without inventive effort.
An embodiment of the invention further provides a computer-readable storage medium storing a program which, when executed by a processor, implements the sequence-prediction-based industrial control network traffic anomaly detection method of the above embodiments.
The computer-readable storage medium may be an internal storage unit, such as a hard disk or memory, of any device with data processing capability described in the foregoing embodiments. It may also be an external storage device of such a device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a flash memory card provided on the device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the device. It is used to store the computer program and other programs and data required by the device, and may also be used to temporarily store data that has been output or is to be output.
It should also be noted that the terms "comprises," "comprising," and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein in one or more embodiments to describe various kinds of information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments herein, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," or "in response to determining."
The above description covers only preferred embodiments of one or more embodiments of the present disclosure and is not intended to limit the scope of one or more embodiments of the present disclosure. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of one or more embodiments of the present disclosure shall fall within the scope of protection of one or more embodiments of the present disclosure.

Claims (10)

1. A sequence-prediction-based industrial control network traffic anomaly detection method, characterized by comprising the following steps:
(1) Acquiring communication data of the industrial control system with packet-capture software in promiscuous mode, the communication data including packets captured during long-term normal operation and packets captured in abnormal states; removing the normal communication behavior in which intranet hosts automatically query the default gateway; labeling each packet in the communication data with a category label; and constructing a training set;
(2) Performing protocol parsing on each packet of the industrial control system, and identifying and extracting the effective features of each packet, the effective features comprising: source IP, destination IP, source port, destination port, protocol type, industrial control protocol function code, packet length, time interval between two consecutive packets, and industrial control protocol data segment length;
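(3) Creating a whitelist and using it for preliminary screening, the whitelist comprising three parts applied in sequence (a five-tuple whitelist, an industrial control protocol function code whitelist, and an industrial control protocol data segment length whitelist); only packets within the range of the whitelist pass the screening, and packets that fail the screening are marked as abnormal;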
(4) Preprocessing the effective characteristics extracted from each data packet, and converting the effective characteristics into standardized vector data;
(4.1) the time interval feature is preprocessed as follows: calculating the interval between the reception time of the current packet and that of the previous packet; taking the base-10 logarithm of the interval and then applying min-max normalization; dividing the normalized interval features into several distribution intervals with a clustering algorithm; and writing each distribution interval number back to the corresponding original packet;
(4.2) the packet length feature is preprocessed as follows: proportionally compressing the lengths of packets in different length ranges into different numeric ranges, and applying min-max normalization to the compressed values to obtain the packet length feature;
(4.3) concatenating all categorical features of each packet, the distribution interval number obtained in step (4.1), and the packet length feature obtained in step (4.2) into a hash string, numbering the distinct strings, and converting each packet into a one-hot vector;
(5) Using the one-hot vectors obtained in step (4) to build a prediction model with an LSTM-SVM structure that predicts the type of the packet at the next time step; casting the anomaly detection problem as the optimization of a loss function over the prediction model; and training the prediction model and updating its parameters;
(6) Using the model trained in step (5) to examine packets in the actual industrial control system and determine whether each packet is normal or abnormal.
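Step (4.3) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the field order, the `|` separator, rounding the length feature to two decimals, and the helper names `packet_key`, `build_vocab`, and `one_hot` are all assumptions, and the sample packets are made-up values.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-packet record: six categorical fields, the interval-bin
# number from step (4.1), and the normalised length feature from step (4.2).
Packet = Tuple[str, str, int, int, str, int, int, float]


def packet_key(p: Packet, length_decimals: int = 2) -> str:
    """Concatenate all features into one string usable as a hash-table key."""
    src_ip, dst_ip, sport, dport, proto, func_code, interval_bin, length_feat = p
    # Rounding is an assumption: it keeps float noise from creating new keys.
    return "|".join([src_ip, dst_ip, str(sport), str(dport), proto,
                     str(func_code), str(interval_bin),
                     f"{length_feat:.{length_decimals}f}"])


def build_vocab(packets: Sequence[Packet]) -> Dict[str, int]:
    """Number each distinct key in order of first appearance."""
    vocab: Dict[str, int] = {}
    for p in packets:
        vocab.setdefault(packet_key(p), len(vocab))
    return vocab


def one_hot(index: int, size: int) -> List[int]:
    """Return a list with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


# Illustrative Modbus/TCP-like traffic (made-up addresses and values).
packets = [
    ("10.0.0.1", "10.0.0.2", 50000, 502, "modbus", 3, 1, 0.25),
    ("10.0.0.2", "10.0.0.1", 502, 50000, "modbus", 3, 0, 0.30),
    ("10.0.0.1", "10.0.0.2", 50000, 502, "modbus", 3, 1, 0.25),
]
vocab = build_vocab(packets)
ids = [vocab[packet_key(p)] for p in packets]
vectors = [one_hot(i, len(vocab)) for i in ids]
```

Here the first and third packets share a key, so `ids` is `[0, 1, 0]`. The vocabulary would normally be built from the training set only; how keys never seen in training are handled at detection time is not specified in the source.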
2. The sequence-prediction-based industrial control network traffic anomaly detection method according to claim 1, wherein in step (1), an external device is connected to the internal communication network of the industrial control system and captures its communication data in promiscuous mode using the packet-capture software Wireshark; the data source is actual field data or data from a security test platform.
3. The sequence-prediction-based industrial control network traffic anomaly detection method according to claim 1, wherein in step (2), the source IP, destination IP, source port, destination port, protocol type, and industrial control protocol function code are categorical features, while the length and time interval are numerical features representing the traffic volume and communication frequency carried by the packets during communication; the time interval carries some fingerprint information of the industrial control equipment; the protocol type indicates which proprietary industrial control protocol is used; and the industrial control protocol function code is a feature unique to the industrial control field that represents the operator's intent.
4. The sequence-prediction-based industrial control network traffic anomaly detection method according to claim 1, wherein in step (2), the industrial control protocol data segment is a part specific to industrial control system communication packets and contains the upper computer's operations on the controller, the controller's real-time status, or the controller's memory data; the length and format of the data segment are specially defined, and the data segment is correlated with the industrial control protocol function code.
5. The sequence-prediction-based industrial control network traffic anomaly detection method according to claim 1, wherein in step (3), the five-tuple whitelist is constructed as follows: extracting the source IP, destination IP, source port, destination port, and protocol type from the packets collected in step (1) during long-term normal operation, and storing them in a hash table to form the five-tuple whitelist;
the industrial control protocol function code whitelist is constructed as follows: extracting the industrial control protocol function codes from the packets collected in step (1) during long-term normal operation to form the function code whitelist;
the industrial control protocol data segment length whitelist is constructed as follows: since the data segment length is correlated with the industrial control protocol function code, the data segment length for a given function code is either confined to a certain range or fixed; the data segment length range is set according to expert experience to form the industrial control protocol data segment length whitelist.
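The three-part whitelist of claim 5, applied in the order described for step (3), might look like the Python sketch below. It assumes packets have already been parsed into dictionaries with the hypothetical keys `src_ip`, `dst_ip`, `src_port`, `dst_port`, `proto`, `func_code`, and `data_len`; the class name `Whitelist`, the example length range, and the rule for function codes without an expert range are illustrative assumptions, not taken from the source.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

FiveTuple = Tuple[str, str, int, int, str]


def five_tuple(p: dict) -> FiveTuple:
    """Extract the (src IP, dst IP, src port, dst port, protocol) key."""
    return (p["src_ip"], p["dst_ip"], p["src_port"], p["dst_port"], p["proto"])


class Whitelist:
    """Three-part whitelist checked in order: five-tuple, function code, length."""

    def __init__(self, length_ranges: Dict[int, Tuple[int, int]]) -> None:
        self.five_tuples: Set[FiveTuple] = set()  # Python sets are hash tables
        self.func_codes: Set[int] = set()
        # Expert-supplied (min_len, max_len) per function code;
        # a fixed length is written as min_len == max_len.
        self.length_ranges = length_ranges

    def learn(self, normal_packets: Iterable[dict]) -> None:
        """Fill the first two parts from long-term normal traffic."""
        for p in normal_packets:
            self.five_tuples.add(five_tuple(p))
            self.func_codes.add(p["func_code"])

    def is_allowed(self, p: dict) -> bool:
        """Return False if the packet should be marked abnormal."""
        if five_tuple(p) not in self.five_tuples:
            return False
        if p["func_code"] not in self.func_codes:
            return False
        rng: Optional[Tuple[int, int]] = self.length_ranges.get(p["func_code"])
        # Assumption: codes without an expert range skip the length check.
        if rng is not None and not (rng[0] <= p["data_len"] <= rng[1]):
            return False
        return True


normal = [{"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 50000,
           "dst_port": 502, "proto": "modbus", "func_code": 3, "data_len": 5}]
wl = Whitelist(length_ranges={3: (5, 5)})
wl.learn(normal)
bad_code = dict(normal[0], func_code=16)  # unseen function code
bad_len = dict(normal[0], data_len=9)     # length outside the expert range
```

Checking the cheap five-tuple lookup first means most foreign traffic is rejected before the protocol-specific checks run.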
6. The sequence-prediction-based industrial control network traffic anomaly detection method according to claim 1, wherein in step (4), the time interval feature is preprocessed as follows:
(a) Calculating the time interval between the reception time of the current packet and that of the previous packet:
$$\Delta t_i = \begin{cases} t_i - t_{i-1}, & i \ge 2 \\ c, & i = 1 \end{cases}$$
where $i$ is the index of the current packet, $\Delta t_i$ is the time interval feature of the $i$-th packet, and $t_i$ is the reception time of packet $i$; the time interval $c$ of the first captured packet is estimated by least squares from the 2nd to 4th time interval values;
(b) Taking the base-10 logarithm of the time interval features, then applying min-max normalization, dividing the normalized features into several distribution intervals with a clustering algorithm, and writing each distribution interval number back to the corresponding original packet.
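The interval preprocessing of claim 6 can be sketched as follows. The source states only that c is estimated by least squares from the 2nd to 4th intervals and that "a clustering algorithm" is used, so the linear-trend extrapolation for c, the epsilon guard before the logarithm, the choice of 1-D k-means, the number of bins `k`, and all function names here are assumptions.

```python
import math
from typing import List, Sequence


def time_intervals(times: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Return [c, t_2 - t_1, t_3 - t_2, ...] for reception times t_1, t_2, ..."""
    if len(times) < 4:
        raise ValueError("need at least 4 timestamps to estimate c")
    dts = [times[i] - times[i - 1] for i in range(1, len(times))]
    # Assumption: least-squares line dt = a*i + b through (i, dt_i), i = 2..4,
    # extrapolated to i = 1; clamped to eps so the later log10 is defined.
    xs, ys = [2.0, 3.0, 4.0], dts[:3]
    x_mean, y_mean = sum(xs) / 3, sum(ys) / 3
    a = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
    b = y_mean - a * x_mean
    return [max(a * 1.0 + b, eps)] + dts


def log_minmax(dts: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Base-10 log (eps guards zero gaps), then min-max scaling to [0, 1]."""
    logs = [math.log10(max(d, eps)) for d in dts]
    lo, hi = min(logs), max(logs)
    span = (hi - lo) or 1.0  # avoid division by zero if all gaps are equal
    return [(v - lo) / span for v in logs]


def kmeans_1d(values: Sequence[float], k: int, iters: int = 50) -> List[int]:
    """Plain 1-D k-means; returns bin numbers ordered by cluster centre."""
    srt = sorted(values)
    # Spread the initial centres evenly over the sorted values.
    centers = [srt[int(j * (len(srt) - 1) / max(k - 1, 1))] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    # Renumber clusters so bin 0 holds the shortest intervals.
    rank = {j: r for r, j in enumerate(sorted(range(k), key=lambda j: centers[j]))}
    return [rank[lab] for lab in labels]


times = [0.0, 0.125, 0.25, 0.375, 1.375, 1.5, 2.5]
dts = time_intervals(times)
bins = kmeans_1d(log_minmax(dts), k=2)
```

With these timestamps the short gaps (0.125 s) fall in bin 0 and the long gaps (1 s) in bin 1; these bin numbers are what step (4.3) writes back to each packet.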
7. The sequence-prediction-based industrial control network traffic anomaly detection method according to claim 1, wherein in step (4), the packet length feature is preprocessed as follows: lengths of packets in the 0-150 byte range are proportionally compressed to 0-9, and lengths in the 150-999 byte range are proportionally compressed to 9-20, so that the piecewise-compressed length value lies between 0 and 20; the compressed value is then min-max normalized and used as the packet length feature.
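The piecewise compression of claim 7 is a two-segment linear map, sketched below in Python. Clamping lengths outside 0-999 bytes and using the fixed bounds 0 and 20 for the min-max step (instead of the minimum and maximum observed in a capture) are assumptions, as are the function names.

```python
from typing import List, Sequence


def compress_length(length: int) -> float:
    """Piecewise-linear map: 0-150 bytes -> 0-9, 150-999 bytes -> 9-20."""
    # Assumption: lengths outside 0-999 (not covered by the claim) are clamped.
    length = min(max(length, 0), 999)
    if length <= 150:
        return 9.0 * length / 150.0
    return 9.0 + 11.0 * (length - 150) / (999 - 150)


def length_features(lengths: Sequence[int]) -> List[float]:
    """Min-max normalise the compressed values.

    Assumption: the bounds are the fixed range 0 and 20 rather than the
    observed extremes, so features stay comparable between captures.
    """
    lo, hi = 0.0, 20.0
    return [(compress_length(n) - lo) / (hi - lo) for n in lengths]
```

Both segments meet at 9 for a 150-byte packet, so the mapping is continuous and monotonic; a 75-byte packet, for example, compresses to 4.5 and normalizes to 0.225.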
8. The sequence-prediction-based industrial control network traffic anomaly detection method according to claim 1, wherein in step (5), the prediction model comprises an Embedding layer, LSTM hidden layer 1, a Dropout layer, LSTM hidden layer 2, and an SVM layer connected in sequence; the Embedding layer converts the input one-hot vector into a word vector of length N; the two LSTM hidden layers take the sample features as input for training; the Dropout layer is used to prevent overfitting; the SVM layer serves as the output layer, taking the sparse hidden-layer features output by LSTM hidden layer 2 as input and outputting the packet type; the classification decision function f of the SVM layer is:
$$f(x) = \operatorname{sgn}\left( \sum_{n} \alpha_n^{*} y^{(n)} k\left(x^{(n)}, x\right) + b^{*} \right)$$
where $\alpha_n^{*}$ is the Lagrange multiplier, $y^{(n)} \in \{+1, -1\}$ is the class label, $\operatorname{sgn}(\cdot)$ is the sign function, $b^{*}$ is the bias, $k(\cdot,\cdot)$ is the radial basis kernel function, $x^{(n)}$ is the $n$-th sample in the training set, and $x$ is the argument.
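The decision function of claim 8 can be evaluated as below once training has produced the multipliers, labels, support vectors, and bias. This sketch assumes a Gaussian RBF kernel with a hypothetical width parameter `gamma`, treats sgn(0) as +1, and uses plain feature vectors in place of the LSTM hidden-layer outputs; the numeric values are invented for illustration.

```python
import math
from typing import Sequence


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Radial basis kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def svm_decision(x: Sequence[float],
                 support_vectors: Sequence[Sequence[float]],
                 labels: Sequence[int],
                 alphas: Sequence[float],
                 bias: float,
                 gamma: float = 0.5) -> int:
    """f(x) = sgn(sum_n alpha_n* y^(n) k(x^(n), x) + b*); sgn(0) taken as +1."""
    score = sum(a * y * rbf_kernel(xn, x, gamma)
                for xn, y, a in zip(support_vectors, labels, alphas)) + bias
    return 1 if score >= 0 else -1


# Made-up "trained" values for illustration only.
sv = [[0.0, 0.0], [1.0, 1.0]]
ys = [-1, 1]
alphas = [1.0, 1.0]
```

For instance, `svm_decision([0.1, 0.0], sv, ys, alphas, bias=0.0)` returns -1 because the input lies much closer to the negative support vector.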
9. The sequence-prediction-based industrial control network traffic anomaly detection method according to claim 8, wherein in step (5), the loss function L is expressed as:
$$L = \max\left(0,\ 1 - y^{(n)} w^{T} x^{(n)}\right)$$
where $w$ is the weight vector obtained by training and $T$ denotes transposition; the weight vector is updated with the Adam optimization algorithm during training.
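Claim 9's hinge loss and Adam update can be illustrated with a linear model in pure Python. This is a sketch under stated assumptions: it trains only the weight vector w on toy 2-D vectors rather than the full LSTM-SVM network, uses the subgradient of the hinge loss, applies one Adam step per sample, and uses the commonly quoted Adam defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-8) with an assumed learning rate of 0.05. Like the claimed loss, it has no bias term.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector x, label y in {+1, -1})


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def hinge_loss(w: Sequence[float], x: Sequence[float], y: int) -> float:
    """L = max(0, 1 - y * w^T x)."""
    return max(0.0, 1.0 - y * dot(w, x))


def hinge_subgradient(w: Sequence[float], x: Sequence[float], y: int) -> List[float]:
    """-y*x while the margin y*w^T x is below 1, otherwise the zero vector."""
    if y * dot(w, x) < 1.0:
        return [-y * xi for xi in x]
    return [0.0] * len(w)


def train_adam(data: Sequence[Sample], dim: int, lr: float = 0.05,
               beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8,
               epochs: int = 100) -> List[float]:
    """Minimise the hinge loss with per-sample Adam updates."""
    w, m, v = [0.0] * dim, [0.0] * dim, [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            g = hinge_subgradient(w, x, y)
            for i in range(dim):
                m[i] = beta1 * m[i] + (1 - beta1) * g[i]       # 1st moment
                v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2  # 2nd moment
                m_hat = m[i] / (1 - beta1 ** t)                # bias correction
                v_hat = v[i] / (1 - beta2 ** t)
                w[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


data = [([1.0, 2.0], 1), ([2.0, 1.0], 1), ([-1.0, -2.0], -1), ([-2.0, -1.0], -1)]
w = train_adam(data, dim=2)
total_loss = sum(hinge_loss(w, x, y) for x, y in data)
```

The loss is zero once every sample has margin y w^T x of at least 1, which this linearly separable toy set reaches, so `total_loss` ends at 0.0.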
10. A sequence-prediction-based industrial control network traffic anomaly detection device, comprising a memory and one or more processors, wherein the memory stores executable code, and the one or more processors, when executing the executable code, implement the sequence-prediction-based industrial control network traffic anomaly detection method according to any one of claims 1 to 9.
CN202211031858.5A 2022-08-26 2022-08-26 Industrial control network flow abnormity detection method and device based on sequence prediction Pending CN115396204A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211031858.5A CN115396204A (en) 2022-08-26 2022-08-26 Industrial control network flow abnormity detection method and device based on sequence prediction

Publications (1)

Publication Number Publication Date
CN115396204A (en) 2022-11-25

Family

ID=84122591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211031858.5A Pending CN115396204A (en) 2022-08-26 2022-08-26 Industrial control network flow abnormity detection method and device based on sequence prediction

Country Status (1)

Country Link
CN (1) CN115396204A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116170241A (en) * 2023-04-26 2023-05-26 国家工业信息安全发展研究中心 Intrusion detection method, system and equipment of industrial control system
CN116578037A (en) * 2023-07-10 2023-08-11 杭州鄂达精密机电科技有限公司 Full inspection machine PLC control system and full inspection machine system
CN116578037B (en) * 2023-07-10 2023-09-29 杭州鄂达精密机电科技有限公司 Full inspection machine PLC control system and full inspection machine system
CN116957049A (en) * 2023-09-20 2023-10-27 南京邮电大学 Unsupervised internal threat detection method based on countermeasure self-encoder
CN116957049B (en) * 2023-09-20 2023-12-15 南京邮电大学 Unsupervised internal threat detection method based on countermeasure self-encoder
CN117579400A (en) * 2024-01-17 2024-02-20 国网四川省电力公司电力科学研究院 Industrial control system network safety monitoring method and system based on neural network
CN117579400B (en) * 2024-01-17 2024-03-29 国网四川省电力公司电力科学研究院 Industrial control system network safety monitoring method and system based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination