CN115987643A - Industrial control network intrusion detection method based on LSTM and SDN

Info

Publication number: CN115987643A
Application number: CN202211669957.6A
Authority: CN
Inventors: 张佳艺; 孙建国; 刘畅; 李思照
Original assignee: Harbin Engineering University
Current assignee: Harbin Engineering University
Priority date: 2022-12-25
Filing date: 2022-12-25
Publication date: 2023-04-18

Abstract

The invention belongs to the technical field of intrusion detection for industrial control networks, and in particular relates to an industrial control network intrusion detection method based on LSTM and SDN. The invention adopts an SDN architecture to decouple the forwarding plane from the control plane, which facilitates the deployment, optimization and management of the industrial control network, and makes effective use of SDN mirroring to collect data. Anomaly detection is performed by comparing the future values of the sequence predicted online by the LSTM network with the real values, so that attack behaviors are identified from the sequence data of the network; this improves the real-time performance and accuracy of the network intrusion detection model and offers an approach to the security problems of applying SDN to industrial control networks. An attention mechanism is incorporated into the LSTM network so that more computing resources are concentrated on the intrusion behaviors that actually harm the industrial control network, which improves detection efficiency and reduces the false alarm rate of anomaly detection.

Description

Industrial control network intrusion detection method based on LSTM and SDN

Technical Field

The invention belongs to the technical field of intrusion detection of industrial control networks, and particularly relates to an industrial control network intrusion detection method based on LSTM and SDN.

Background

An industrial control network is a network whose nodes are measurement and control instruments that have digital communication capability and can be widely distributed across a production site. Its operational goal is to let field devices exchange information freely, making it easier to accomplish the tasks of the control system.

With the continuous and rapid development of industrial informatization, industrial control networks are directly responsible for many critical national infrastructures and large-scale systems. If such a network is maliciously attacked from the Internet, serious consequences such as information leakage and system paralysis can result.

A typical industrial control system has a three-layer structure comprising, from top to bottom, the enterprise management layer, the monitoring layer and the field layer. The enterprise management layer is connected to the Internet by network communication technology, which facilitates decision management of the industrial process. The monitoring layer is responsible for data transmission between the enterprise management layer and the field layer and controls the field equipment. The field layer contains a wide variety of sensors, actuators, transmitters and I/O devices; it is primarily responsible for sensing field information and operating the field devices, and for exchanging digital or analog data between field devices over the fieldbus. The three major protection goals of information security are confidentiality, integrity and availability. In IT systems their importance decreases in that order, whereas in industrial control systems availability is of the utmost importance.

Software-defined networking (SDN) is a new type of network architecture that builds an open, programmable network by decoupling network control from network forwarding. SDN supports finer-grained control of the network through programming, so that network maintenance can focus on implementing centralized control in software without being concerned with the details of specific network devices. OpenFlow has been very significant in the development of SDN: it serves as a prototype implementation of SDN and embodies the separation of the control plane from the data plane. The invention adopts OpenFlow-based SDN technology. The three-layer architecture of a software-defined network comprises the data plane, the control plane and the application plane.

Separating logical control from data forwarding in SDN simplifies the forwarding function of the hardware: forwarding decisions are made in software, and the hardware concentrates on forwarding packets. Network operation and maintenance can deploy a network configuration automatically simply by updating software, which shortens the deployment cycle of the network and reduces its complexity and construction cost. In the security field, SDN enables global monitoring of network devices and states, and its open, programmable nature also opens new possibilities for implementing network security flexibly.

While bringing these benefits, SDN also gives attackers the opportunity to mount malicious attacks through its open interfaces, so security issues in an SDN architecture need to be treated with greater care.

An intrusion detection system (IDS) is a system implemented in software, or in a combination of software and hardware. It collects data from the protected network and hosts, analyzes the collected data, builds an intrusion detection model and extracts the key features of intrusion behaviors, in order to detect, identify and prevent intrusions.

At present there are two main approaches to intrusion detection: misuse-based and anomaly-based. The former builds a feature library from known intrusion behaviors and matches behavior in the network against it, but it has difficulty detecting unknown types of attack; as industrial control networks keep growing and attack techniques become more complex, this approach cannot adequately address the security of industrial control networks. Moreover, in actual industrial production far more normal samples than abnormal samples are collected, so it is very difficult to identify attacks by learning the characteristics of abnormal samples. The invention therefore uses anomaly-based intrusion detection, which learns the characteristics of normal samples in order to identify patterns in the network that do not conform to expected behavior.

Although anomaly detection generalizes well, its false alarm rate is high: behavior may be flagged as anomalous even though it is not an attack. Reducing the false alarm rate is therefore an important goal for improvement.

In anomaly detection methods based on traditional machine learning, effective feature extraction is an important research direction for improving detection accuracy. RNNs are commonly used to model sequence data because they can learn the sequential relationships hidden in variable-length input sequences. A network attack usually requires several steps, which may appear as several events; if an attack sequence can be mined from the network and the causal relationships within it discovered, the attack is more likely to be detected and the attacker's intent may even be inferred. However, an RNN may suffer from vanishing or exploding gradients when learning long-term temporal dependencies in large-scale sequence data. The invention uses the LSTM as the basic model for anomaly detection and improves it with an attention mechanism.

Disclosure of Invention

The invention aims to provide an industrial control network intrusion detection method based on LSTM and SDN.

An industrial control network intrusion detection method based on LSTM and SDN comprises the following steps:

Step 1: acquiring historical sequence data of nodes in the industrial control network, and pre-training an LSTM-based intrusion detection model;

Step 2: collecting data in the network from devices in the field layer of the industrial control network by means of network probing tools and SDN mirroring;

Step 3: transmitting the collected data to the SDN control layer through the southbound interface, OpenFlow;

Step 4: the LSTM-based intrusion detection model deployed in the SDN application layer performs analysis and calculation through the northbound interface, a RESTful API: the model outputs the predicted sequence values for the next finite number of time steps; these values are obtained by modeling the behavior of the network in its normal state and serve as the reference sequence values;

Step 5: calculating the distance between the real data and the reference sequence values; if the distance exceeds a given threshold, the data at that time step are considered abnormal; if abnormalities occur at several consecutive time steps, the intrusion detection program raises an alarm, and the network is considered to be at elevated risk and in need of inspection.
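The decision rule of step 5 can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not say which distance is used, so the Euclidean distance is an assumption, and the names `detect`, `threshold` and `min_consecutive` are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between an observed vector and its predicted reference (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect(real: Sequence[Sequence[float]],
           reference: Sequence[Sequence[float]],
           threshold: float,
           min_consecutive: int) -> List[int]:
    """Return the time-step indices at which an alarm is raised.

    A step is abnormal when its distance to the reference exceeds `threshold`;
    an alarm fires once `min_consecutive` abnormal steps have occurred in a row.
    """
    alarms: List[int] = []
    run = 0  # length of the current streak of abnormal steps
    for t, (obs, ref) in enumerate(zip(real, reference)):
        run = run + 1 if euclidean(obs, ref) > threshold else 0
        if run >= min_consecutive:
            alarms.append(t)
    return alarms
```

Requiring several consecutive abnormal steps before alarming trades a short detection delay for fewer alarms on isolated outliers, which matches the stated goal of lowering the false alarm rate.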

Further, the LSTM-based intrusion detection model includes an input layer, a hidden layer, an attention layer, a fully connected layer and an output layer, and the specific training process is as follows:

Step 1.1: in the input layer, the historical sequence data $S = \{S_1, S_2, S_3, \ldots, S_r\}$ of the nodes in the industrial control network are processed with the Z-score standardization formula to obtain $S' = \{S_1', S_2', S_3', \ldots, S_r'\}$;

Step 1.2: the input sequence is processed into sequences of uniform length and fed into the LSTM network: the data are segmented with a window of fixed length $k$ to obtain $X = \{X_1, X_2, X_3, \ldots, X_k\}$, where any element of $X$ is an $n$-dimensional vector $X_t = \{x_1, x_2, x_3, \ldots, x_n\}$;

Step 1.3: entering the hidden layer. At time $t$, in addition to the hidden state $h_t$, the LSTM maintains a cell state $C_t$, and three gating structures control the flow of information: a forget gate, an input gate and an output gate;

Forget gate: discards part of the useless information. $h_{t-1}$ and $x_t$ are passed through the Sigmoid function to obtain $f_t$, whose value lies in $[0, 1]$ and represents the probability of forgetting the state of the previous hidden cell;

Update formula: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

Input gate: adds the important part of the new information. The Sigmoid and tanh activation functions are applied separately and their results are then multiplied to update the cell state. $i_t$ represents the probability of accepting the newly learned information;

$\tilde{C}_t$ represents the knowledge learned at this step, which is filtered by $i_t$ and then added to the cell state $C_t$ of the current step;

Update formula: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$

Output gate: combines the cell state produced by the forget gate and the input gate and passes the result to the next unit. It consists of two parts: $o_t$ filters the information in $C_t$, selecting from all the accumulated knowledge the information relevant to the current problem, after which the result is obtained;

Update formula: $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$

$h_t = o_t * \tanh(C_t)$

where $x_t$ is the current input vector; $h_t$ is the current hidden-layer vector, which contains the outputs of all LSTM cells; and $b$, $U$ and $W$ denote the bias, the input weights and the recurrent weights, respectively;

Step 1.4: the feature $h_t$ output by the LSTM hidden layer is passed to the attention layer for calculation;

Step 1.4.1: calculating the attention distribution over all input information;

An attention variable $z \in [1, n]$ is defined to indicate the index position of the selected information, where $z = i$ means that the $i$-th input information is selected. Given a query vector $q$ and an information vector $x$ used for searching and selecting, the probability $\alpha_i$ of selecting the $i$-th input information is then calculated;

The function $s$ computes the similarity between $q$ and the different dimensions of $x$; features with higher similarity are assigned higher weights. The soft attention mechanism adopted here does not select only one of the pieces of information stored in $x$; instead it extracts a portion from all of them, with the amount extracted differing from one piece to another:

where $W$, $U$ and $v$ are learnable network parameters that are updated dynamically with the objective of minimizing the loss function, and $n$ is the dimension of the input information;

Step 1.4.2: calculating a weighted average of the input information according to the attention distribution;

The attention distribution $\alpha_i$ represents the degree of correlation between the $i$-th information in the input vector $x$ and the query $q$, given $q$. It is normalized with the Softmax function to obtain the normalized attention weight matrix; a weighted average is then taken to output the processed sequence feature vector, denoted att;

The output att of the attention layer is fed into the fully connected layer and activated with the Sigmoid function, and the model finally outputs the predicted values $Y = \{Y_1, Y_2, Y_3, \ldots, Y_k\}$;

Step 1.5: the optimization goal of the model is to minimize the loss function value; the loss function is defined as follows:

the loss function value reflects the error between the predicted and true values of the neural network.
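Purely as an illustrative assumption, since the text does not state which loss is used, a mean squared error over the $k$ predicted steps would read as follows, with $Y_t$ the predicted value and $Y_t^{\mathrm{true}}$ (a symbol introduced here) the observed value at step $t$:

\[
\mathcal{L} = \frac{1}{k} \sum_{t=1}^{k} \left\lVert Y_t - Y_t^{\mathrm{true}} \right\rVert^2
\]

Any other regression loss that measures the error between predicted and true values would fit the description equally well.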

The invention has the beneficial effects that:

The invention adopts an SDN architecture to decouple the forwarding plane from the control plane, which facilitates the deployment, optimization and management of the industrial control network, and makes effective use of SDN mirroring to collect data. Anomaly detection is performed by comparing the future values of the sequence predicted online by the LSTM network with the real values, so that attack behaviors are identified from the sequence data of the network; this improves the real-time performance and accuracy of the network intrusion detection model and offers an approach to the security problems of applying SDN to industrial control networks. An attention mechanism is incorporated into the LSTM network so that more computing resources are concentrated on the intrusion behaviors that actually harm the industrial control network, which improves detection efficiency and reduces the false alarm rate of anomaly detection.

Drawings

Fig. 1 is a structure diagram of the SDN-based industrial control network of the present invention.

Fig. 2 is a flow chart of the LSTM-based intrusion detection method of the present invention.

Fig. 3 is a block diagram of the LSTM network incorporating the attention mechanism of the present invention.

Fig. 4 is a schematic diagram of an RNN hidden unit in the present invention.

Fig. 5 is a schematic diagram of an LSTM hidden unit in the present invention.

Fig. 6 is a schematic diagram of the sliding window used to extract historical time-series features in the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

The invention aims to solve the problem of intrusion detection in industrial control networks more effectively by combining two new technologies, the SDN infrastructure and the LSTM neural network, so that more computing resources are concentrated on the intrusion behaviors that actually harm the industrial control network, thereby improving detection efficiency and reducing the false alarm rate of anomaly detection.

data in the network are collected from devices in the field layer of the industrial control network by means of conventional network probing tools and SDN mirroring;

the collected data are transmitted to the SDN control layer through the southbound interface, OpenFlow;

the anomaly-detection-based intrusion detection program deployed in the SDN application layer performs analysis and calculation through the northbound interface, a RESTful API:

historical sequence data of the nodes in the industrial control network are input into the LSTM-based intrusion detection model, whose output is the predicted sequence values for the next finite number of time steps; the model is mainly a unidirectional long short-term memory neural network improved by the addition of an attention mechanism;

an attack in the network requires a series of steps to be completed; it usually consists of several events that are ordered in time, so the data are input in the order in which the data streams arrive;

the model is first trained offline on historical data, with the neural network parameters adjusted iteratively to minimize the cost function, and is then used for online detection;

the predicted sequence values for the next finite number of time steps output by the model are obtained by modeling the behavior of the network in its normal state and serve as the reference sequence values;

the data in the network at the current moment are measured and collected in real time and the model is used for online detection: the distance between the real data and the reference sequence values is calculated, and if the distance exceeds a given threshold, the data at that time step are considered abnormal;

if abnormalities occur at several consecutive time steps, the intrusion detection program raises an alarm, and the network is considered to be at elevated risk and in need of inspection; the number of consecutive time steps is chosen according to the specific conditions of each network in combination with expert experience;

the LSTM-based intrusion detection model specifically includes:

inputting the sequence data of the industrial control network nodes into the network, dividing the original sequence data set and standardizing the data;

owing to the characteristics of the LSTM and the application scenario of the invention, the input and output sequences have equal length, so the number of neurons in the input layer equals that in the output layer; a segmentation window of a given fixed length is set and the standardized data are segmented;

initializing the hidden layer structure, building a single-layer long short-term memory neural network, and inputting the processed data into the hidden layer;

the parameters of the LSTM are shared; after passing through the repeating module, the information is filtered and selectively output; the hidden state of the hidden layer at each time step is returned and fed into the attention layer, which determines which dimensions of the data play a key role in the prediction;

in the attention layer a soft attention mechanism extracts a portion from all of the information, in differing amounts; the attention distribution is calculated and then used to compute the weighted average of the input information, giving the final attention value;

inputting the output of the attention layer into the fully connected layer, and finally obtaining the output from the output layer;

the output is the predicted sequence values for the next finite number of time steps.

The invention provides an industrial control network intrusion detection method based on LSTM and SDN, that is, a method for real-time intrusion detection by means of an LSTM network in an industrial control network with an SDN architecture, and offers an approach for applying SDN to intrusion detection in industrial control networks.

As shown in Fig. 1, the structure diagram of the SDN-based industrial control network, this embodiment uses a software-defined network architecture to address the problems of conventional industrial control networks. The centralized network control and management of SDN makes it easier to manage network resources from a global view and provides a guarantee for real-time transmission in the industrial network. The design is based on the basic structure of SDN; application components, service managers and the like that are specific to the production and manufacturing processes of an industrial control system are omitted.

The three-layer architecture of the software defined network comprises: data plane, control plane, application plane.

Reading Fig. 1 from the top down, the intrusion detection module to be implemented by the invention is deployed as an application program in the application plane of the SDN architecture. The intrusion detection program communicates with the control plane through the northbound interface. It accesses the controller and obtains the network state information that the controller has gathered from the network devices of the data plane, models the data and detects whether an intrusion event has occurred in the network; if so, it raises an alarm and returns the result to the control plane, which carries out the security management of the data-plane network devices.

The control plane is the core of the SDN. It obtains information about the underlying infrastructure through the southbound interface (the interface between the data plane and the control plane, such as OpenFlow) and provides an extensible northbound interface (the interface between the control plane and the application plane, such as a RESTful API) to facilitate the construction of the application plane; for example, an application in the application plane accesses the API to obtain network state information, manage network resources and adjust forwarding rules.

In OpenFlow-based SDN, OpenFlow specifies that rules are organized in flow tables, and network packets are matched and processed according to rules that are user-defined or preset. The processing unit of each OpenFlow switch consists of flow tables whose entries represent forwarding rules; a packet entering the switch obtains its corresponding operation by looking up the flow table. The invention takes the flow as the basic unit when training the neural network.

The data plane plays the role of a simple ("dumb") data unit in the network. It comprises the specific physical devices, which in an industrial control network include controllers, actuators, PLCs, instruments and the like. Data forwarding requires the participation of OpenFlow switches: the processing unit of each switch consists of flow tables, and a packet entering the switch is handled by looking up the flow table, matching a rule and executing the corresponding operation; if no matching rule exists, the packet is sent to the SDN controller over a secure channel to request processing. In addition, the data plane collects network state information, including topology discovery, routing policy and traffic statistics, and sends the collected data to the router of the group to which it belongs.
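The lookup-then-fallback behavior described above can be sketched as follows. This is a simplified model, not an OpenFlow implementation: real switches also use priorities, counters and timeouts, which are omitted, and the names `FlowEntry`, `lookup` and the action strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FlowEntry:
    match: Dict[str, object]  # header fields that must all equal the packet's fields
    action: str               # operation to execute on a match


def lookup(flow_table: List[FlowEntry], packet: Dict[str, object]) -> str:
    """Return the action of the first matching entry.

    If no rule matches, the packet is handed to the controller,
    mirroring the table-miss behavior described in the text.
    """
    for entry in flow_table:
        if all(packet.get(field) == value for field, value in entry.match.items()):
            return entry.action
    return "send_to_controller"
```

For example, a table containing one entry that matches destination port 502 forwards such packets and sends everything else to the controller.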

Compared with a traditional network, SDN enables comprehensive monitoring of network devices and states and supports programmatic control, which better safeguards the operation of the industrial control system and addresses the difficulties of collecting information and executing responses.

The intrusion detection process described in the embodiment of the invention is shown in Fig. 2. Reading the figure from the bottom up, data acquisition relies on two main sources:

1) Conventional network probing tools (e.g., Traceroute, Nmap, tcpdump, Wireshark, Advanced Port Scanner, GFI LanGuard, Nagios, etc.);

the scan covers, for example, ICMP messages on a specified interface, messages of a specified protocol, messages from a specified source host, messages on a specified port, hosts within a specified range of IP addresses, and the operating system type of a host. Traceroute, Nmap, tcpdump and similar tools are common network probing commands or security auditing tools; they can scan large networks quickly, also work against a single host, are simple and flexible, and support many operating systems.

2) Data collected by SDN mirroring;

SDN mirroring can receive information such as the MAC addresses and ports reported by the switch, send the mirroring flow table to the switch according to the configured settings, and store the information that matches the mirroring flow table in the switch. SDN network mirroring captures the communication traffic in the industrial control system, parses the communication messages, preprocesses the data, extracts data features and completes the statistical analysis. During preprocessing, erroneous packets are filtered out and the remaining packets are timestamped; feature extraction parses information such as MAC addresses, IP addresses, window sizes and port numbers; the statistical analysis focuses on the key devices in the industrial control network, and thresholds on the value ranges of some of their attributes are set from historical data for use in the subsequent intrusion detection.
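The preprocessing and feature-extraction stage described above might look like the following sketch. It assumes that packets have already been parsed into dictionaries and that a packet lacking any of the listed fields counts as erroneous; the field names and the `preprocess` function are hypothetical.

```python
import time
from typing import Callable, Dict, List

# Features named in the text: MAC addresses, IP addresses, window size, port numbers.
REQUIRED_FIELDS = ("src_mac", "dst_mac", "src_ip", "dst_ip",
                   "window_size", "src_port", "dst_port")


def preprocess(packets: List[Dict[str, object]],
               now: Callable[[], float] = time.time) -> List[Dict[str, object]]:
    """Drop incomplete packets, keep the listed features and timestamp each record."""
    records = []
    for pkt in packets:
        if any(pkt.get(field) is None for field in REQUIRED_FIELDS):
            continue  # assumed definition of an erroneous packet
        record = {field: pkt[field] for field in REQUIRED_FIELDS}
        # Keep a capture timestamp if one exists, otherwise mark the record now.
        record["timestamp"] = pkt.get("timestamp", now())
        records.append(record)
    return records
```

Passing the clock as a parameter keeps the function deterministic in tests while defaulting to wall-clock time in use.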

Because the SDN mirroring configuration tasks are carried out on the SDN controller, copying received or sent data streams to the SDN controller is easy to implement. The maintainer of an industrial control network only needs to consider how to use the open, programmable nature of SDN: an intrusion detection module is added at the application layer, and analysis, calculation and early warning are realized by calling the global data in the network view maintained by the SDN controller.

The intrusion detection program starts working after acquiring the data stream from the controller.

The method used by the invention performs intrusion detection based on the idea of anomaly detection: it learns from a large number of data samples in the network, builds a profile of normal behavior, classifies behavior outside that profile as anomalous, then checks whether the behavior is an attack and raises an alarm if it is. Compared with misuse-based intrusion detection, which places high demands on the completeness of the feature library, anomaly-based intrusion detection has a low missed-detection rate and strong generalization capability, but it may mislabel normal behavior as anomalous.

In an industrial production process many working cycles repeat, so the data in the network may be periodic; if an external intrusion occurs, the operating state of the system is likely to deviate from normal. The invention therefore uses machine-learning-based anomaly detection. Anomaly detection plays an important role in safeguarding the industrial process and giving early warning of dangerous conditions.

The invention uses a long short-term memory neural network model combined with an attention mechanism (Attention-Based Long Short-Term Memory), that is, an attention layer is added to the long short-term memory (LSTM) artificial neural network.

Because industrial control networks play a crucial role in national industrial activity, the network model must be trained in advance before it is put into practical use; the method therefore trains the model offline and updates the data and parameters periodically. A network attack is a systematic behavior: to reach the goal the attacker must complete a series of steps, which usually consist of several events ordered in time, so the invention feeds the data streams into the network in their order of arrival.

Among the many deep learning models, the recurrent neural network (RNN) is well suited to the analysis of time-series data. An RNN applies a single model at every time step and for every sequence length, which lets it generalize to sequence lengths not seen in training, and it needs far fewer training samples to estimate than a model without parameter sharing.

The output of an RNN at a given time step depends on the inputs of several previous time steps and on its current state, but the state at each step carries forward the earlier information and accumulates error, which may cause exploding gradients; when the sequence is too long and carries too much information, it becomes difficult for the RNN to learn dependencies that are far apart, and vanishing gradients may occur.
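A rough numerical illustration of this effect, assuming a scalar linear recurrence $h_t = w \cdot h_{t-1}$ (a deliberate simplification, not the model of the invention): the gradient of $h_T$ with respect to $h_0$ is $w^T$, so its magnitude shrinks or grows exponentially with the number of steps. The function name `backprop_factor` is hypothetical.

```python
def backprop_factor(w: float, steps: int) -> float:
    """Magnitude of the gradient after being scaled by |w| at each of `steps` time steps."""
    return abs(w) ** steps


# |w| < 1: the gradient vanishes; |w| > 1: it explodes.
vanishing = backprop_factor(0.5, 10)
exploding = backprop_factor(1.5, 50)
```

With a recurrent weight of 0.5 the factor after 10 steps is already below one hundredth, while a weight of 1.5 yields a factor above one hundred million after 50 steps.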

The LSTM is a variant of the RNN that better handles the RNN's problems of vanishing gradients, exploding gradients and insufficient long-term memory. The key to solving the long-term dependency problem is to connect earlier information to the current task and to use existing knowledge to infer the state at the current stage. The LSTM achieves this through a special design of the repeating module of the RNN.
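For reference, the repeating module of a standard LSTM is usually written as below. The gate equations follow the notation of the disclosure; the candidate state $\tilde{C}_t$ with its parameters $W_C$, $b_C$ and the cell-state update are the conventional textbook form and are stated here as an assumption, since the text does not spell them out.

\[
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\
C_t &= f_t * C_{t-1} + i_t * \tilde{C}_t \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
h_t &= o_t * \tanh(C_t)
\end{aligned}
\]

Because $C_t$ is updated additively, gradients can flow across many time steps without being repeatedly squashed, which is what mitigates the vanishing-gradient problem.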

The LSTM can use $h_t$ as a lossy summary of the task-relevant aspects of the past sequence; depending on the training criterion, the summary may selectively retain certain aspects of the past sequence with greater precision.

The summary is generally lossy because the LSTM network, as a variant of the RNN, shares the RNN's basic features: each element of the output is a function of the previous element, and parameters are shared across different parts of the model. Regardless of the sequence length, the learned model always has the same input size, because it is specified as a transition from one state to another rather than as an operation on a variable-length history of states. The same transition function $f$ with the same parameters can be used at every time step. Mapping a sequence of arbitrary length to a fixed-length vector is what makes the summary generally lossy.

When an LSTM network corresponding to a large-scale industrial control network is trained, the efficiency and the precision of a model are influenced by the problems that the performance of the network is reduced along with the increase of the input length, the network lacks the extraction and the strengthening of characteristics and the like. The different characteristics in the industrial control network greatly contribute to the detection result, so an attention mechanism is introduced, a mode of processing information by a human brain is simulated, a part with important processing is selected from a large number of inputs, and the characteristics with poor correlation are ignored by distributing low weight.

An attention mechanism is introduced to focus on key information in the LSTM training process, so that the long-distance interdependent features in sequence data can be captured; and the attention mechanism directly establishes the dependency relationship between the input and the output without circulation, so that the parallelism of the calculation is improved. In summary, attention is drawn to a mechanism that allows LSTM to better model variable length sequence data.

The essence of the attention mechanism is to adaptively assign weights to the information input to the network and then perform a weighted summation. The mechanism can be applied over time steps, to determine which time steps influence the result most, or over feature dimensions, to determine which dimensions play a key role in the prediction. After the attention layer is added behind the hidden layer of the LSTM, the LSTM returns the hidden state of its last hidden layer at every time step; a fully connected layer follows the output of the attention layer, and the final result is emitted by the output layer.

The modified network structure is shown in FIG. 3. The input layer preprocesses the data stream of the original industrial control network, for example by appropriate cleaning and transformation; the hidden layer is the core of the LSTM network; the attention layer calculates attention values to improve feature extraction; the fully connected layer integrates the previously extracted features; and the output layer provides the prediction result. In this model, a single-layer recurrent neural network is constructed from the cell structure shown in the figure. The network training and application of the embodiment of the present invention are analyzed next:

In the input layer, the raw sequence data S = {S_1, S_2, S_3, …, S_r} is divided into data sets and the elements of the training set are normalized. The training set processed with the Z-score normalization formula is denoted S' = {S_1', S_2', S_3', …, S_r'}.
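As an illustrative, non-limiting sketch, Z-score normalization of the training set could be written as follows. The function names `fit_z_score` and `apply_z_score`, the sample values, and the choice of the population standard deviation are assumptions made for this example only; the embodiment does not specify them.

```python
import statistics

def fit_z_score(train):
    """Compute the per-feature mean and standard deviation from the training set only."""
    columns = list(zip(*train))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; a constant column falls back to 1.0
    # so that the division below never divides by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds

def apply_z_score(rows, means, stds):
    """Apply (value - mean) / std to every feature of every sample."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical raw samples with two features each (values are made up).
S = [[10.0, 200.0], [12.0, 220.0], [14.0, 240.0], [16.0, 260.0], [50.0, 900.0]]
train, test = S[:4], S[4:]

means, stds = fit_z_score(train)
train_norm = apply_z_score(train, means, stds)
test_norm = apply_z_score(test, means, stds)  # reuse the training statistics
```

Fitting the statistics on the training set and reusing them for later data keeps the scaling of real-time traffic consistent with the scaling the model saw during training.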

The input sequence is processed into sequences of consistent length before being input to the LSTM network: the data is segmented with a window of fixed length k to obtain X = {X_1, X_2, X_3, …, X_k}, where each X_t in X is an n-dimensional vector X_t = {x_1, x_2, x_3, …, x_n} with components x_i, i ∈ {1, 2, …, n}.
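The fixed-length segmentation can be sketched as below. This is only an example under assumptions: the helper name `segment_windows` is hypothetical, and a stride of one time step (overlapping windows) is assumed because the text fixes only the window length k.

```python
def segment_windows(sequence, k):
    """Split a sequence of n-dimensional vectors into windows of fixed length k.

    Assumption: consecutive windows are shifted by one time step.
    """
    if k <= 0:
        raise ValueError("window length k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

# Hypothetical normalized sequence: 6 time steps, n = 2 features per step.
S_norm = [[0.1 * t, -0.1 * t] for t in range(6)]
windows = segment_windows(S_norm, k=4)
```

Each element of `windows` is one input X = {X_1, …, X_k}; a sequence shorter than k produces no window at all.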

After the input layer has processed the sequence, the data enters the hidden layer. As the RNN hidden unit diagram of FIG. 4 shows, h_t is obtained from x_t and h_{t-1}; once computed, h_t is used both for the loss calculation of the current layer and for the calculation of h_{t+1} in the next layer. By contrast, as the LSTM hidden unit structure of FIG. 5 shows, at time t the LSTM maintains a second hidden state in addition to h_t: the cell state C_t. Three gating structures are also added at time t to control the information flow: the forget gate, the input gate, and the output gate.

Forget gate: discards part of the useless information. h_{t-1} and x_t are passed through the Sigmoid function to obtain f_t. This output lies in [0, 1] and represents the probability of forgetting the state of the previous hidden cell.

Update formula: f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

Input gate: adds part of the important new information. It consists of two parts, which use the Sigmoid and tanh activation functions respectively; their results are then multiplied to update the cell state. i_t indicates the probability of accepting the information newly learned at this step.

The candidate state produced by the tanh part represents the knowledge learned at this step; after being filtered by i_t, it is added to the cell state C_t of this layer.

Update formula: i_t = σ(W_i · [h_{t-1}, x_t] + b_i)

Output gate: integrates the cell state produced by the forget gate and the input gate and outputs it to the next unit. It consists of two parts: o_t filters the information, selecting from C_t (all the knowledge held so far) the information needed to solve the current problem, and the result h_t is then obtained.

Update formulas: o_t = σ(W_o · [h_{t-1}, x_t] + b_o)

h_t = o_t * tanh(C_t)

Variable definitions: x_t is the current input vector; h_t is the current hidden layer vector, which contains the outputs of all LSTM cells; b, U, and W denote the bias, the input weights, and the recurrent weights, respectively.
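For illustration only, one time step of the gate calculations can be sketched in plain Python. The gate formulas for f_t, i_t, o_t, and h_t follow the text; the candidate state and the cell-state update C_t = f_t * C_{t-1} + i_t * (candidate) are not written out in this text and are assumed here to take the standard LSTM form. The names `lstm_step`, `params`, `W_c`, and `b_c` are hypothetical.

```python
import math

def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def affine(W, b, z):
    """Compute W . z + b for a weight matrix W (list of rows) and a bias vector b."""
    return [sum(w * zi for w, zi in zip(row, z)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns (h_t, C_t)."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(v) for v in affine(params["W_f"], params["b_f"], z)]  # forget gate
    i_t = [sigmoid(v) for v in affine(params["W_i"], params["b_i"], z)]  # input gate
    o_t = [sigmoid(v) for v in affine(params["W_o"], params["b_o"], z)]  # output gate
    # Assumed standard form (not given in the text): tanh candidate and cell update.
    cand = [math.tanh(v) for v in affine(params["W_c"], params["b_c"], z)]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, cand)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Toy usage: hidden size 2, input size 1, all weights and biases zero.
hidden, n_in = 2, 1
zero_W = [[0.0] * (hidden + n_in) for _ in range(hidden)]
params = {name: zero_W for name in ("W_f", "W_i", "W_o", "W_c")}
params.update({name: [0.0] * hidden for name in ("b_f", "b_i", "b_o", "b_c")})
h_t, c_t = lstm_step([1.0], [0.0, 0.0], [1.0, 1.0], params)
```

With all-zero parameters every gate evaluates to 0.5 and the candidate to 0, so the cell state is halved and h_t equals 0.5 * tanh(0.5), which gives a quick sanity check of the wiring.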

The feature h_t output by the LSTM hidden layer is passed to the attention layer, whose calculation consists of the following two main steps:

(1) Calculating the attention distribution over all input information

Define an attention variable z ∈ [1, n] as the index position of the selected information, where z = i means that the i-th input information is selected. Then, given a query vector q and the information vector x to be searched and selected from, calculate the probability α_i of selecting the i-th input information. The calculation formula is as follows:

Here the function s calculates the similarity between q and x in each dimension, and features with high similarity are assigned higher weights. The soft attention mechanism adopted here does not select only one of the items of information stored in x; it draws on all of them, each to a different degree. In this embodiment the scaled dot-product model is chosen as the similarity formula:

where W, U, and v are learnable network parameters that are updated dynamically with the goal of minimizing the loss function, and n is the dimension of the input information.

(2) Calculating a weighted average of the input information according to the attention distribution

The attention distribution α_i represents the degree of correlation between the i-th item of information in the input vector x and the query q, given q. It is normalized with the Softmax function to obtain a normalized attention weight matrix, and a weighted average then yields the processed sequence feature vector, denoted att:

The output att of the attention layer is input into the fully connected layer and activated with the Sigmoid function, and the model finally outputs the predicted values Y = {Y_1, Y_2, Y_3, …, Y_k}.
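The two attention steps and the fully connected output can be sketched as follows. This is a minimal example under stated assumptions: it uses the scaled dot-product similarity named in the text without the learnable parameters W, U, and v, it takes the last hidden state as the query q, and the names `soft_attention` and `dense_sigmoid` as well as all numeric values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable Softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(xs, q):
    """Return (alpha, att) for information vectors xs and query vector q."""
    n = len(q)
    # Step (1): scaled dot-product similarity, then Softmax gives the distribution alpha.
    scores = [sum(a * b for a, b in zip(x, q)) / math.sqrt(n) for x in xs]
    alpha = softmax(scores)
    # Step (2): att is the alpha-weighted average of the information vectors.
    att = [sum(a * x[j] for a, x in zip(alpha, xs)) for j in range(n)]
    return alpha, att

def dense_sigmoid(att, W, b):
    """Fully connected layer followed by a Sigmoid activation."""
    logits = [sum(w * a for w, a in zip(row, att)) + bi for row, bi in zip(W, b)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Hypothetical hidden states h_1..h_3 with n = 2; the last one is used as the query.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alpha, att = soft_attention(H, q=H[-1])
Y = dense_sigmoid(att, W=[[0.5, -0.5], [1.0, 1.0]], b=[0.0, 0.0])
```

In this toy input the third hidden state is most similar to the query and therefore receives the largest weight, while the first two receive equal weights.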

The optimization goal of the model is to minimize the loss function value. The loss function is defined as follows:

The loss function value reflects the error between the values predicted by the neural network and the true values. Because the LSTM receives input from two directions, the error must also be propagated back along two directions. A common back-propagation algorithm is BPTT (backpropagation through time) with truncation: a distance w is chosen, and once the error has been propagated back over this distance it is truncated and propagated no further. This distance determines the number of hidden layers in the unrolled network and ultimately affects the prediction accuracy of the neural network.

The hyperparameters of the network must be determined before training. A common method is grid search: the value range of each hyperparameter is set manually, and a suitable hyperparameter combination is selected from these ranges using the loss function value as the evaluation criterion.
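A grid search of this kind can be sketched as below. The hyperparameter names, their candidate values, and the function `toy_loss` are hypothetical; `toy_loss` merely stands in for training the network with a given combination and returning its loss value.

```python
import itertools

def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and keep the one with the lowest loss."""
    names = list(param_grid)
    best_params, best_loss = None, float("inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = evaluate(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

def toy_loss(p):
    """Placeholder for 'train the model with p and return the loss value'."""
    return ((p["hidden_units"] - 64) ** 2 / 1000
            + abs(p["learning_rate"] - 0.01)
            + 0.001 * p["window_k"])

# Manually chosen value ranges (illustrative only).
grid = {
    "hidden_units": [32, 64, 128],
    "learning_rate": [0.1, 0.01, 0.001],
    "window_k": [10, 20],
}
best, best_loss = grid_search(grid, toy_loss)
```

The cost grows with the product of the range sizes (here 3 × 3 × 2 = 18 evaluations), which is why the ranges are kept small and set by hand.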

After the LSTM neural network has been trained and optimized on historical data, the model is used to predict real-time data. After standardized preprocessing, the n-dimensional data from time t-m to time t is input and the forward propagation algorithm is executed once, yielding n-dimensional data of length m. The output of the model is the predicted values of X_{t-m+1} to X_{t+1} over these m steps. Because the values up to the current time are already known, only the predicted value of X_{t+1} needs to be retained from a single calculation; it is combined with the values of X_{t-m+1} to X_t to form a new sequence that continues to participate in the calculation. The data input in an industrial control network is generally a continuous time series, so the data at a given moment may take part in the calculation of the network model many times, each time at a different position, similar to a sliding window, as shown in FIG. 6.
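The sliding-window prediction can be illustrated with the following sketch. It assumes one reading of the procedure, in which each retained prediction is appended to the window and the oldest value drops out. The name `forecast` is hypothetical, and `linear_predictor` is only a stand-in for the trained LSTM model, which is not reproduced here.

```python
from collections import deque

def forecast(history, m, steps, predict_next):
    """Predict `steps` future values by sliding a window of length m.

    After each call, only the prediction for the next time step is kept;
    it joins the window and the oldest value is dropped.
    """
    if len(history) < m:
        raise ValueError("need at least m known values")
    window = deque(history[-m:], maxlen=m)
    predictions = []
    for _ in range(steps):
        nxt = predict_next(list(window))
        predictions.append(nxt)
        window.append(nxt)  # deque(maxlen=m) discards the oldest element
    return predictions

def linear_predictor(window):
    """Placeholder for the trained model: linear extrapolation of the last two values."""
    return window[-1] + (window[-1] - window[-2])

predicted = forecast([1.0, 2.0, 3.0, 4.0], m=3, steps=3, predict_next=linear_predictor)
```

In this toy run the windows are [2, 3, 4], [3, 4, 5], and [4, 5, 6], so the same value takes part in several calculations at different positions.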

The predicted values calculated by the LSTM network cannot by themselves indicate whether the system is in an abnormal state. The real data from time t+1 to time t+m is therefore collected and compared with the predicted values; this embodiment uses the mean square error to represent the distance between the two.

If the mean square error exceeds a threshold λ for P consecutive time steps, the real values differ greatly from the LSTM predictions and the system is likely to be in an abnormal state. Whether the system is under attack must then be checked as soon as possible, and the intrusion detection program returns an alarm to the SDN control plane.

The threshold λ used for the comparison is an important factor in the anomaly detection criterion; it must be determined from test results in the actual production process together with expert experience.
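The decision rule can be sketched as follows. The names `mse` and `detect_anomaly`, the per-time-step error calculation over the n dimensions, and the sample values of λ and P are assumptions made for this example; sending the alarm to the SDN control plane is outside the scope of the sketch.

```python
def mse(predicted, observed):
    """Mean square error between two equally long vectors."""
    return sum((p - r) ** 2 for p, r in zip(predicted, observed)) / len(predicted)

def detect_anomaly(predicted_seq, observed_seq, lam, P):
    """Return the index of the time step at which the error has exceeded lam
    for P consecutive steps, or None if that never happens."""
    run = 0
    for t, (pred, real) in enumerate(zip(predicted_seq, observed_seq)):
        run = run + 1 if mse(pred, real) > lam else 0
        if run >= P:
            return t
    return None

# Hypothetical data: the model predicts zeros, the observed traffic drifts away.
predicted_seq = [[0.0, 0.0] for _ in range(6)]
observed_seq = [[0.1, 0.0], [1.0, 1.0], [0.1, 0.1], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
alarm_step = detect_anomaly(predicted_seq, observed_seq, lam=0.5, P=3)
```

Requiring P consecutive violations means that the isolated spike at the second time step does not raise an alarm; only the sustained deviation at the end does.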

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An industrial control network intrusion detection method based on LSTM and SDN, characterized by comprising the following steps:

Step 4: the LSTM-based intrusion detection model deployed at the SDN application layer performs analysis and calculation through the northbound RESTful API: the LSTM-based intrusion detection model outputs the predicted sequence values for the next finite number of time steps, these predicted values being the result of modeling normal-state behavior in the network and serving as the reference sequence values;

2. The industrial control network intrusion detection method based on LSTM and SDN according to claim 1, wherein: the LSTM-based intrusion detection model comprises an input layer, a hidden layer, an attention layer, a fully connected layer, and an output layer, and the specific training process is as follows:

Forget gate: discards part of the useless information; h_{t-1} and x_t are passed through the Sigmoid function to obtain f_t; this output lies in [0, 1] and represents the probability of forgetting the state of the previous hidden cell;

Update formula: f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

Input gate: adds part of the important new information, using the Sigmoid and tanh activation functions respectively and then multiplying their results to update the cell state; i_t represents the probability of accepting the information newly learned at this step;

the candidate state produced by the tanh part represents the knowledge learned at this step and, after being filtered by i_t, is added to the cell state C_t of this layer;

Update formula: i_t = σ(W_i · [h_{t-1}, x_t] + b_i)

Output gate: integrates the cell state produced by the forget gate and the input gate and outputs it to the next unit; it consists of two parts: o_t filters the information, selecting from C_t (all the knowledge held so far) the information needed to solve the current problem, and the result is then obtained;

Update formulas: o_t = σ(W_o · [h_{t-1}, x_t] + b_o)

h_t = o_t * tanh(C_t)

wherein x_t is the current input vector; h_t is the current hidden layer vector, which contains the outputs of all LSTM cells; b, U, and W denote the bias, the input weights, and the recurrent weights, respectively;

Step 1.4.1: calculating the attention distribution over all input information

defining an attention variable z ∈ [1, n] as the index position of the selected information, where z = i means that the i-th input information is selected; then, given a query vector q and the information vector x to be searched and selected from, calculating the probability α_i of selecting the i-th input information;

the attention distribution α_i represents the degree of correlation between the i-th item of information in the input vector x and the query q, given q; it is normalized with the Softmax function to obtain a normalized attention weight matrix, and a weighted average then yields the processed sequence feature vector, denoted att;

the output att of the attention layer is input into the fully connected layer and activated with the Sigmoid function, and the model finally outputs the predicted values Y = {Y_1, Y_2, Y_3, …, Y_k};