CN113783841A

CN113783841A - Industrial Internet of things intrusion detection network architecture construction method, device and equipment

Info

Publication number: CN113783841A
Application number: CN202110906235.7A
Authority: CN
Inventors: 李贝贝; 何俊江; 刘翱; 杜卿芸; 欧阳远凯; 朱子青
Original assignee: Chengdu Mojia Information Technology Co ltd
Current assignee: Chengdu Mojia Information Technology Co ltd
Priority date: 2021-08-06
Filing date: 2021-08-06
Publication date: 2021-12-10
Anticipated expiration: 2041-08-06
Also published as: CN113783841B

Abstract

The invention discloses a method for constructing an industrial Internet of things intrusion detection network architecture, which comprises: obtaining environmental state data based on historical Internet of things data; acquiring a reward cumulative function of an intrusion detection network framework based on the environmental state data; obtaining a training strategy of the intrusion detection network framework according to the value function of the intrusion detection network framework; and obtaining a loss function of the intrusion detection network architecture by using the reward cumulative function, so as to obtain the intrusion detection network architecture. The invention also discloses an industrial Internet of things intrusion detection network architecture construction device, equipment and a storage medium. By combining the environmental state, the reward cumulative function, the training strategy and the loss function, the invention obtains an intrusion detection network architecture suited to an industrial Internet of things whose operating environment and structure change continuously; training, verification and detection are carried out with this architecture, so that the adaptability of intrusion detection can be improved.

Description

Industrial Internet of things intrusion detection network architecture construction method, device and equipment

Technical Field

The invention relates to the technical field of network security, in particular to a construction method, a device and equipment of an industrial Internet of things intrusion detection network architecture.

Background

The industrial internet of things is a complex network, and any failure or abnormality of a part of the system can cause great damage to the whole system in a short time. Therefore, early detection of a network attack is critical to timely and effective network response. An Intrusion Detection System (IDS) is an important component of network security protection, and can help the System to effectively discover network Intrusion behavior.

However, in recent years, as the operating environment and structure of the industrial internet of things continuously change, a traditional intrusion detection model (such as an intrusion detection model based on simple machine learning) often does not have the adaptive adjustment capability for network threats, cannot dynamically adjust the identification strategy of the traditional intrusion detection model when the network risk environment of the industrial internet of things changes, and further cannot provide adaptive detection, response, defense and the like for complex network attacks.

Therefore, how to construct an industrial internet of things intrusion detection model suitable for continuous changes of operating environment and structure is a technical problem which needs to be solved urgently.

The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.

Disclosure of Invention

The invention mainly aims to provide a construction method, a construction device and construction equipment of an industrial Internet of things intrusion detection network architecture, and aims to solve the technical problem that a traditional intrusion detection model in the prior art cannot provide self-adaptive intrusion detection for industrial Internet of things with continuously changing operating environment and structure.

In order to achieve the purpose, the invention provides a construction method of an industrial internet of things intrusion detection network architecture, which comprises the following steps:

obtaining environmental state data based on historical internet of things data, wherein the historical internet of things data is text data of network interaction and system states;

acquiring a reward cumulative function of an intrusion detection network framework based on the environment state data;

obtaining a value function of the intrusion detection network framework according to the reward cumulative function, and obtaining a training strategy of the intrusion detection network framework according to the value function;

and obtaining a loss function of the intrusion detection network architecture by utilizing the reward cumulative function so as to obtain the intrusion detection network architecture.

Optionally, the step of obtaining the environmental state data based on historical internet of things data specifically includes:

obtaining common network users in the environment state data according to network user data in historical internet of things data;

according to attack intrusion data in historical data of the Internet of things, obtaining malicious attackers in the environmental state data;

and obtaining a detection manager in the environment state data according to intrusion detection data in historical internet of things data.

Optionally, the step of obtaining a reward cumulative function of the intrusion detection network framework based on the environmental status data specifically includes:

based on the environmental state data, obtaining a feedback signal r_t for each time step t;

according to the feedback signal r_t of each time step t, acquiring a reward cumulative function of the intrusion detection network framework.

Optionally, the step of obtaining the feedback signal r_t of each time step t based on the environmental state data specifically includes:

when the intrusion detection network framework detects attack intrusion data and successfully classifies the type of the attack intrusion data, obtaining a positive feedback signal r_t = +1; or

when the intrusion detection network framework does not detect attack intrusion data, or detects attack intrusion data but does not classify it successfully, obtaining a negative feedback signal r_t = -1; or

when the intrusion detection network framework detects network user data and does not send intrusion detection data, feeding back no signal.

Optionally, after the step of obtaining the reward cumulative function of the intrusion detection network framework according to the feedback signal r_t of each time step t, the method further comprises the following step:

processing the reward cumulative function obtained by the intrusion detection network framework by using a discount factor to obtain the processed reward cumulative function; wherein the expression of the processed reward cumulative function is:

$$G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}$$

wherein γ ∈ [0, 1] is the discount coefficient, t is the time step in the environmental state, and R_{t+k+1} is the reward at time step t + k + 1, so that G_t is the discounted cumulative sum of the rewards following time step t.

Optionally, the step of obtaining a value function of the intrusion detection network framework according to the reward cumulative function, and obtaining a training strategy of the intrusion detection network framework according to the value function specifically includes:

obtaining a value function of the intrusion detection network framework according to the reward cumulative function;

based on the value function, obtaining a state value function and an action value function of the intrusion detection network framework;

acquiring the value of each environmental state data and the value of different actions under each environmental state data according to the state value function and the action value function of the intrusion detection network framework;

selecting the action that maximizes the state value function and the action value function under the current environmental state data, so as to obtain the training strategy of the intrusion detection network framework; wherein:

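the expression of the state value function is:

$$V_\pi(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s) \Big[ R(s, a) + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}(s' \mid s, a)\, V_\pi(s') \Big]$$

the expression of the action value function is:

$$Q_\pi(s, a) = R(s, a) + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}(s' \mid s, a) \sum_{a' \in \mathcal{A}} \pi(a' \mid s')\, Q_\pi(s', a')$$

in the formulas, $\mathcal{P}$ is the state transition probability, $\mathcal{S}$ is the state space set, $\mathcal{A}$ is the action space set, R is the reward function, γ is the discount coefficient, π is the strategy, s is the state, a is the action, s' is the next state to which state s transitions, and a' is the next action performed after action a.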

Optionally, the obtaining a loss function of the intrusion detection network framework by using the reward cumulative function to obtain an intrusion detection network framework specifically includes:

processing a strategy network and a value network of the training strategy based on the reward cumulative function to obtain an advantage function value of the training strategy and the update ratio between the new strategy and the old strategy;

constructing a loss function of the training strategy according to the advantage function value and the update ratio between the new strategy and the old strategy;

and obtaining the intrusion detection network framework according to the training strategy and the loss function.
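By way of non-limiting illustration, the advantage function value, the ratio between the new strategy and the old strategy, and a loss built from the two might be computed as in the following Python sketch. The function names, the use of the discounted return minus the value network's estimate as the advantage, and the clipping of the ratio with `clip_eps` (in the style of proximal policy optimization) are assumptions of this sketch and are not specified in the steps above.

```python
import math
from typing import Sequence


def probability_ratio(new_log_prob: float, old_log_prob: float) -> float:
    """Ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities."""
    return math.exp(new_log_prob - old_log_prob)


def advantage(discounted_return: float, value_estimate: float) -> float:
    """Assumed estimator: how much better the return was than the value network predicted."""
    return discounted_return - value_estimate


def clipped_surrogate_loss(
    ratios: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # hypothetical clipping range, not taken from the source
) -> float:
    """Negative mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        terms.append(min(r * a, clipped * a))
    return -sum(terms) / len(terms)
```

Minimizing this loss raises the probability of actions with a positive advantage while limiting how far a single update can move the new strategy away from the old one.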

In addition, in order to achieve the above object, the present invention further provides an industrial internet of things intrusion detection network architecture construction device, which includes:

the state acquisition module is used for acquiring environmental state data based on historical internet of things data, wherein the historical internet of things data is text data of network interaction and system states;

the reward acquisition module is used for acquiring a reward cumulative function of the intrusion detection network framework based on the environment state data;

the strategy acquisition module is used for acquiring a value function of the intrusion detection network framework according to the reward cumulative function and acquiring a training strategy of the intrusion detection network framework according to the value function;

and the framework acquisition module is used for acquiring the loss function of the intrusion detection network framework by utilizing the reward cumulative function so as to obtain the intrusion detection network framework.

In addition, in order to achieve the above object, the present invention further provides industrial internet of things intrusion detection network architecture construction equipment, which includes: a memory, a processor, and an industrial internet of things intrusion detection network architecture construction program that is stored on the memory and can run on the processor, wherein the steps of the industrial internet of things intrusion detection network architecture construction method described above are realized when the processor executes the industrial internet of things intrusion detection network architecture construction program.

In addition, in order to achieve the above object, the present invention further provides a storage medium, where an industrial internet of things intrusion detection network architecture construction program is stored on the storage medium, and the industrial internet of things intrusion detection network architecture construction program, when executed by a processor, implements the steps of the industrial internet of things intrusion detection network architecture construction method.

According to the method, environmental state data are obtained based on historical internet of things data; a reward cumulative function of an intrusion detection network framework is acquired based on the environmental state data; a training strategy of the intrusion detection network framework is obtained according to the value function of the intrusion detection network framework; and a loss function of the intrusion detection network architecture is obtained by using the reward cumulative function, so as to obtain the intrusion detection network architecture. By combining the environmental state, the reward cumulative function, the training strategy and the loss function, the invention obtains an intrusion detection network architecture suited to an industrial Internet of things whose operating environment and structure change continuously; training, verification and detection are carried out with this architecture, so that the adaptability of intrusion detection can be improved.

Drawings

Fig. 1 is a schematic structural diagram of a hardware operating environment and an industrial internet of things intrusion detection network architecture construction device according to an embodiment of the present invention.

Fig. 2 is a diagram of a communication network system architecture according to an embodiment of the present invention.

Fig. 3 is a schematic flow chart of a first embodiment of an industrial internet of things intrusion detection network architecture construction method according to the present invention.

Fig. 4 is a flowchart illustrating a second embodiment of the method for constructing an intrusion detection network architecture of the industrial internet of things according to the present invention.

Fig. 5 is a detailed flow chart of step a100 in fig. 4.

Fig. 6 is another detailed flowchart of step a100 in fig. 4.

Fig. 7 is a schematic diagram of a training process of the construction method of the intrusion detection network architecture of the industrial internet of things.

Fig. 8 is a flowchart illustrating a third embodiment of the method for constructing an intrusion detection network architecture of the industrial internet of things according to the present invention.

Fig. 9 is a block diagram of an embodiment of an industrial internet of things intrusion detection network architecture construction device according to the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The industrial internet of things is a complex network, and any failure or abnormality of a part of the system can cause great damage to the whole system in a short time. Therefore, early detection of a network attack is critical to timely and effective network response. An Intrusion Detection System (IDS) is an important component of network security protection, and can help the System to effectively discover network Intrusion behavior. However, in recent years, as the operating environment and structure of the industrial internet of things continuously change, a traditional intrusion detection model (such as an intrusion detection model based on simple machine learning) often does not have the adaptive adjustment capability for network threats, cannot dynamically adjust the identification strategy of the traditional intrusion detection model when the network risk environment of the industrial internet of things changes, and further cannot provide adaptive detection, response, defense and the like for complex network attacks.

In order to solve this problem, the invention provides various embodiments of the construction method of the industrial internet of things intrusion detection network architecture. In the construction method provided by the invention, an intrusion detection network architecture suited to an industrial internet of things whose operating environment and structure change continuously is obtained from the environmental state, the reward cumulative function, the training strategy and the loss function; this architecture is then trained and used for detection, so that it can be judged whether intrusion behavior exists in the current industrial internet of things.

Referring to fig. 1, fig. 1 is a schematic structural diagram of a hardware operating environment and an industrial internet of things intrusion detection network architecture construction device according to an embodiment of the present invention.

The device may be a User Equipment (UE) such as a Mobile phone, smart phone, laptop, digital broadcast receiver, Personal Digital Assistant (PDA), tablet computer (PAD), handheld device, vehicular device, wearable device, computing device or other processing device connected to a wireless modem, Mobile Station (MS), or the like. The device may be referred to as a user terminal, portable terminal, desktop terminal, etc.

Generally, the apparatus comprises: at least one processor 301, a memory 302, and an industrial internet of things intrusion detection network architecture building program stored on the memory and operable on the processor, the industrial internet of things intrusion detection network architecture building program is configured to implement the steps of the industrial internet of things intrusion detection network architecture building method as described above.

The processor 301 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 301 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 301 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 301 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. The processor 301 may further include an AI (Artificial Intelligence) processor, which is configured to process the operations related to constructing the industrial internet of things intrusion detection network architecture, so that the construction model can be trained and learned autonomously, thereby improving efficiency and accuracy.

Memory 302 may include one or more computer-readable storage media, which may be non-transitory. Memory 302 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 302 is configured to store at least one instruction for execution by the processor 301 to implement the method for building an industrial internet of things intrusion detection network architecture provided by the method embodiments herein.

In some embodiments, the terminal may further include: a communication interface 303 and at least one peripheral device. The processor 301, the memory 302 and the communication interface 303 may be connected by a bus or signal lines. Various peripheral devices may be connected to communication interface 303 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 304, a display screen 305, and a power source 306.

The communication interface 303 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 301 and the memory 302. The communication interface 303 is used for receiving the movement tracks of the plurality of mobile terminals uploaded by the user and other data through the peripheral device. In some embodiments, processor 301, memory 302, and communication interface 303 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 301, the memory 302 and the communication interface 303 may be implemented on a single chip or circuit board, which is not limited in this embodiment.

The Radio Frequency circuit 304 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 304 communicates with a communication network and other communication devices through electromagnetic signals, so as to obtain the movement tracks and other data of a plurality of mobile terminals. The rf circuit 304 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 304 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 304 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 304 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.

The display screen 305 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 305 is a touch display screen, the display screen 305 also has the ability to capture touch signals on or over the surface of the display screen 305. The touch signal may be input to the processor 301 as a control signal for processing. In this case, the display screen 305 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 305, disposed on the front panel of the electronic device; in other embodiments, there may be at least two display screens 305, respectively disposed on different surfaces of the electronic device or in a folded design; in still other embodiments, the display screen 305 may be a flexible display screen disposed on a curved surface or a folded surface of the electronic device. Furthermore, the display screen 305 may be arranged in a non-rectangular irregular shape, i.e. a shaped screen. The display screen 305 may be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).

The power supply 306 is used to power various components in the electronic device. The power source 306 may be alternating current, direct current, disposable or rechargeable. When the power source 306 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.

Those skilled in the art will appreciate that the configuration shown in fig. 1 does not constitute a limitation of the industrial internet of things intrusion detection device, and may include more or fewer components than those shown, or some components in combination, or a different arrangement of components.

In order to facilitate understanding of the embodiment of the present invention, a communication network system on which the industrial internet of things intrusion detection device of the present invention is based is described below.

Referring to fig. 2, fig. 2 is an architecture diagram of a communication Network system according to an embodiment of the present invention, where the communication Network system is an LTE system of a universal mobile telecommunications technology, and the LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and an IP service 204 of an operator, which are in communication connection in sequence.

Specifically, the UE201 may be the terminal 100 described above, and is not described herein again.

The E-UTRAN202 includes eNodeB2021 and other eNodeBs 2022, among others. Among them, the eNodeB2021 may be connected with other eNodeB2022 through backhaul (e.g., X2 interface), the eNodeB2021 is connected to the EPC203, and the eNodeB2021 may provide the UE201 access to the EPC 203.

The EPC203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving gateway) 2034, a PGW (PDN gateway) 2035, and a PCRF (Policy and Charging Rules Function) 2036, and the like. The MME2031 is a control node that handles signaling between the UE201 and the EPC203, and provides bearer and connection management. HSS2032 is used to provide registers to manage functions such as home location register (not shown) and holds subscriber specific information about service characteristics, data rates, etc. All user data may be sent through SGW2034, PGW2035 may provide IP address assignment for UE201 and other functions, and PCRF2036 is a policy and charging control policy decision point for traffic data flow and IP bearer resources, which selects and provides available policy and charging control decisions for a policy and charging enforcement function (not shown).

The IP services 204 may include the internet, intranets, IMS (IP Multimedia Subsystem), or other IP services, among others.

Although the LTE system is described as an example, it should be understood by those skilled in the art that the present invention is not limited to the LTE system, but may also be applied to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems.

Based on the hardware structure of the industrial Internet of things intrusion detection network architecture construction equipment and the communication network system, the embodiment of the industrial Internet of things intrusion detection network architecture construction method is provided.

An embodiment of the invention provides a method for constructing an industrial internet of things intrusion detection network architecture, and referring to fig. 3, fig. 3 is a schematic flow diagram of a first embodiment of the method for constructing the industrial internet of things intrusion detection network architecture.

In this embodiment, the method for implementing the intrusion detection network architecture of the industrial internet of things includes the following steps:

Step S100: obtaining environmental state data based on historical internet of things data.

Specifically, in practical application, an execution main body of the method of the embodiment is an industrial internet of things intrusion detection network architecture construction device, and the industrial internet of things intrusion detection network architecture construction device may be various types of electronic devices such as a mobile phone, a tablet, a computer, or a wearable device. Of course, other devices with similar functions may also be used, and the present embodiment is not limited thereto.

It should be noted that the environmental state data is simulated by using a real industrial internet of things data set, and an environmental state required by the intrusion detection network architecture structure of the embodiment is formed. In this embodiment, the historical data set of the industrial internet of things comprises network interaction data and system state data; the network interaction data comprises an equipment address, a function code, a message length, message error check information, a message interval and the like; the system state data includes sensor measurements, monitoring inputs, distributed control states, and the like.
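By way of non-limiting illustration, one record of such a historical data set could be represented as in the following Python sketch. The class name `IIoTRecord`, the field names and types, the `label` convention and the `to_state_vector` helper are assumptions made for this sketch; the embodiment names only the kinds of data involved.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IIoTRecord:
    """One hypothetical sample of historical industrial IoT data."""

    # Network interaction data
    device_address: int
    function_code: int
    message_length: int
    crc_ok: bool             # message error check information
    message_interval: float  # time since the previous message, in seconds
    # System state data
    sensor_measurements: List[float]
    monitoring_inputs: List[float]
    control_states: List[int]
    # Assumed labeling: 0 = normal traffic, 1..n = attack type
    label: int = 0

    def to_state_vector(self) -> List[float]:
        """Flatten the record into a numeric environment state."""
        return [
            float(self.device_address),
            float(self.function_code),
            float(self.message_length),
            1.0 if self.crc_ok else 0.0,
            self.message_interval,
            *self.sensor_measurements,
            *self.monitoring_inputs,
            *(float(c) for c in self.control_states),
        ]
```

Replaying such vectors one per time step is one simple way to simulate the environmental state from a recorded data set.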

Specifically, based on historical internet of things data, obtaining environmental status data includes:

Step S101: obtaining common network users in the environmental state data according to network user data in the historical internet of things data.

Step S102: obtaining malicious attackers in the environmental state data according to attack intrusion data in the historical internet of things data.

Step S103: obtaining a detection manager in the environmental state data according to intrusion detection data in the historical internet of things data.
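By way of non-limiting illustration, steps S101 to S103 can be read as sorting the historical data into three roles of the simulated environment. The following Python sketch assumes that each historical item arrives as a `(source, payload)` pair; the source tags, role names and function name are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical mapping from the kind of historical data to the environment role.
ROLE_BY_SOURCE = {
    "network_user": "common_network_user",       # step S101
    "attack_intrusion": "malicious_attacker",    # step S102
    "intrusion_detection": "detection_manager",  # step S103
}


def build_environment_roles(
    records: Iterable[Tuple[str, Any]],
) -> Dict[str, List[Any]]:
    """Group (source, payload) pairs by the environment role they populate."""
    roles: Dict[str, List[Any]] = defaultdict(list)
    for source, payload in records:
        roles[ROLE_BY_SOURCE[source]].append(payload)
    return dict(roles)
```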

It is readily understood that, based on the acquired environmental state data, the intrusion detection network architecture can sense the environmental state and obtain its reward cumulative function according to the feedback signal provided by the environment. The intrusion detection network architecture senses the environmental state and, according to the feedback signal r_t provided by the environment, selects the action that maximizes the future reward; that is, the cumulative reward from the current time step t until the final state is R_t = r_{t,1} + r_{t,2} + … + r_{t,n}.

Step S200: acquiring a reward cumulative function of the intrusion detection network framework based on the environmental state data.

Specifically, the step of obtaining the reward cumulative function of the intrusion detection network framework based on the environment state data comprises the following steps:

based on the environmental state data, obtaining a feedback signal r_t for each time step t; and, according to the feedback signal r_t of each time step t, acquiring a reward cumulative function of the intrusion detection network framework.

Wherein, the step of obtaining the feedback signal r_t of each time step t based on the environmental state data specifically comprises:

when the intrusion detection network framework detects attack intrusion data and successfully classifies the type of the attack intrusion data, a positive feedback signal r_t = +1 is obtained; or when the intrusion detection network framework does not detect attack intrusion data, or detects attack intrusion data but does not classify it successfully, a negative feedback signal r_t = -1 is obtained; or when the intrusion detection network framework detects network user data and does not send intrusion detection data, no signal is fed back.
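By way of non-limiting illustration, the three feedback cases can be written as a small Python function. The label convention (0 for normal traffic, 1..n for attack types), the function name, and the use of `None` for "no signal" are assumptions of this sketch. The case of normal traffic wrongly flagged as an attack is not described in this embodiment; the sketch treats it as -1, which is likewise an assumption.

```python
from typing import Optional

NORMAL = 0  # assumed label for normal traffic; 1..n denote attack types


def feedback_signal(true_label: int, predicted_label: int) -> Optional[int]:
    """Return r_t for one time step, or None when no signal is fed back."""
    if true_label == NORMAL:
        if predicted_label == NORMAL:
            # Normal user data and no intrusion alert: no signal is fed back.
            return None
        # False alarm: not covered by the embodiment, penalized here by assumption.
        return -1
    if predicted_label == true_label:
        # Attack detected and its type classified correctly.
        return +1
    # Attack missed (predicted normal) or classified as the wrong type.
    return -1
```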

Further, to reduce uncertainty and randomness, the present embodiment uses a discount factor to reduce the strong correlation between steps, replacing the future reward with the discounted cumulative future reward G_t.
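By way of non-limiting illustration, the discounted cumulative reward can be computed backwards over a finite episode as in the following Python sketch. It assumes the usual definition, in which G_t is the sum over k of γ^k times the reward received k + 1 steps after t, truncated at the end of the episode; the function name is hypothetical.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G for every step, where rewards[i] is the reward following step i."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so that each G reuses the one after it: G_i = r_i + gamma * G_{i+1}.
    for i in range(len(rewards) - 1, -1, -1):
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns
```

With γ = 0 only the immediate reward counts, and with γ = 1 the result reduces to the undiscounted sum of the remaining rewards.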

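Specifically, after the step of obtaining the reward cumulative function of the intrusion detection network framework according to the feedback signal r_t of each time step t, the method further comprises the following step: processing the reward cumulative function obtained by the intrusion detection network framework by using a discount factor to obtain the processed reward cumulative function.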

Step S300: obtaining a value function of the intrusion detection network framework according to the reward cumulative function, and obtaining a training strategy of the intrusion detection network framework according to the value function.

It should be noted that, after the reward cumulative function is obtained, the value function of the intrusion detection network architecture can be derived from it, and the training strategy and the loss function of the intrusion detection network architecture are then obtained, so as to obtain the intrusion detection network architecture used for intrusion detection.

Specifically, the steps of obtaining a cost function of the intrusion detection network framework according to the reward cumulative function and obtaining a training strategy of the intrusion detection network framework according to the cost function include:

step S301: and obtaining a cost function of the intrusion detection network framework according to the reward cumulative function.

Step S302: acquiring a state value function and an action value function of the intrusion detection network framework based on the value function.

Step S303: acquiring the value of each environment state data and the value of different actions under each environment state data according to the state value function and the action value function of the intrusion detection network framework.

Step S304: selecting the action that maximizes the state value function and the action value function under the current environment state data, so as to obtain the training strategy of the intrusion detection network framework.
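Steps S303 and S304 can be pictured with the following sketch, which assumes the action values are already available as a table. The name `greedy_policy` and the dictionary layout of `q_table` are hypothetical; in the embodiment the values come from a neural network rather than a table.

```python
def greedy_policy(q_table):
    """Pick, for every state, the action with the highest action value.

    q_table: dict mapping a state to a list of action values, where
             index 0 = "normal traffic" and indices 1..n = attack types.
    Returns (policy, state_value): the chosen action per state and the
    value of each state under that greedy choice.
    """
    policy = {}
    state_value = {}
    for state, action_values in q_table.items():
        # max() returns the first index on ties.
        best_action = max(range(len(action_values)), key=action_values.__getitem__)
        policy[state] = best_action
        state_value[state] = action_values[best_action]
    return policy, state_value
```

For a table such as `{"s0": [0.1, 0.9, 0.3], "s1": [0.5, 0.2, 0.1]}`, the sketch selects attack type 1 in state `s0` and normal traffic in state `s1`.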

It should be understood that the value function evaluates how good it is for the intrusion detection agent to be in a given state at a given time t; it guides the agent's choice of action and thereby influences the next action decision of the intrusion detection network architecture. In this embodiment, Q_π(s, a) is defined as the action value function and V_π(s) as the state value function. The former evaluates the expected return when the agent starts from state s, performs action a, and thereafter follows policy π; the latter represents the expected return when the agent starts from state s and follows policy π, where

$$V_\pi(s) = \mathbb{E}_\pi\left[G_t \mid S_t = s\right], \qquad Q_\pi(s, a) = \mathbb{E}_\pi\left[G_t \mid S_t = s, A_t = a\right].$$

In this embodiment, the action space of the intrusion detection network framework consists of non-negative discrete values: "0" indicates that the framework predicts normal traffic, and "1, 2, …, n" indicate the n types of attacks. Specifically, a Markov decision process is adopted to define the state value function and the action value function of the intrusion detection agent in the action decision process, and the state value function or the action value function is then formally expressed through a Bellman equation to complete the action decision process of the intrusion detection network framework.

In particular, the Markov decision process has the Markov property, namely: at time step t+1, the feedback of the environment depends only on the state s and action a of the previous time step t, and is independent of time step t-1 and all earlier time steps. The next state of the system is related only to the current state, which simplifies the decision-making process of the intrusion detection network architecture. The Markov decision process of the system consists of five elements: S is the state space set, A is the action space set, P_sa represents the state transition probability (the probability distribution over the next state S' and reward R after performing action a in state S, written as P(S', R | S, a)), R is the reward function, and γ is the discount factor. The Bellman equation adds the immediate reward R_t to the discounted state value γV(S_{t+1}) of time step t+1, reflecting the relationship between the state value function V(S_t) of the current state and the state value function V(S_{t+1}) of the next state.

The Markov decision process is expressed as:

$$\mathrm{MDP} = (S, A, P_{sa}, R, \gamma), \quad S = \{S_1, S_2, \ldots, S_n\}, \quad A = \{A_1, A_2, \ldots, A_n\}.$$

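The expression of the Bellman equation is as follows:

$$V(s) = \mathbb{E}\left[R_t + \gamma V(S_{t+1}) \mid S_t = s\right]$$

Similarly, the expression of the action value can be obtained as follows:

$$Q(s, a) = \mathbb{E}\left[R_t + \gamma Q(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a\right]$$

Considering the recursive updating of the Bellman equation, the Bellman equation expression is divided into an action value function and a state value function. When the next action is taken, the two value functions are each updated following the policy π, where $P_{sa}(s')$ represents the state transition probability.

The expression of the state value function is:

$$V_\pi(s) = \sum_{a \in A} \pi(a \mid s) \sum_{s' \in S} P_{sa}(s') \left[R(s, a) + \gamma V_\pi(s')\right]$$

The expression of the action value function is:

$$Q_\pi(s, a) = \sum_{s' \in S} P_{sa}(s') \left[R(s, a) + \gamma \sum_{a' \in A} \pi(a' \mid s') Q_\pi(s', a')\right]$$

In the formulas, $P_{sa}(s')$ is the probability of transitioning to state $s'$ after performing action a in state s, and S is the state space set.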
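To make the recursive Bellman update concrete, the sketch below evaluates a fixed policy on a small tabular Markov decision process by repeatedly applying the state value recursion, and then derives the action values from the result. The function names, the nested-dictionary layout of `P`, `R` and `policy`, and the default `gamma` and `tol` are assumptions made for illustration; the embodiment itself approximates these functions with neural networks.

```python
def evaluate_policy(states, actions, P, R, policy, gamma=0.9, tol=1e-10):
    """Iterate V(s) = sum_a pi(a|s) * sum_s' P[s][a][s'] * (R[s][a] + gamma * V(s')).

    P[s][a]      : dict {next_state: probability}
    R[s][a]      : immediate reward for taking action a in state s
    policy[s][a] : probability of choosing action a in state s
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            new_v = sum(
                policy[s][a]
                * sum(p * (R[s][a] + gamma * V[s2]) for s2, p in P[s][a].items())
                for a in actions
            )
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:  # stop once a full sweep no longer changes the values
            return V


def action_values(states, actions, P, R, V, gamma=0.9):
    """Q(s, a) = sum_s' P[s][a][s'] * (R[s][a] + gamma * V(s'))."""
    return {
        s: {a: sum(p * (R[s][a] + gamma * V[s2]) for s2, p in P[s][a].items())
            for a in actions}
        for s in states
    }
```

Because the discount factor is below 1, each sweep shrinks the remaining error, so the loop terminates for any finite table.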

Step S400: obtaining a loss function of the intrusion detection network architecture by using the reward cumulative function, so as to obtain the intrusion detection network architecture.

It should be noted that, after the training strategy of the intrusion detection network architecture is obtained, the loss function of the intrusion detection network architecture can be obtained according to the reward cumulative function, and the intrusion detection network architecture for intrusion detection is then obtained through the training strategy and the loss function.

Specifically, the step of obtaining the loss function of the intrusion detection network architecture by using the reward cumulative function to obtain the intrusion detection network architecture includes:

Step S401: processing the policy network and the value network of the training strategy based on the reward cumulative function to obtain the advantage function value of the training strategy and the update ratio between the new policy and the old policy.

Step S402: constructing a loss function of the training strategy according to the advantage function value and the update ratio between the new policy and the old policy.

Step S403: obtaining the intrusion detection network framework according to the training strategy and the loss function.

Specifically, when the loss function of the training strategy is constructed from the advantage function value and the update ratio between the new and old policies, this embodiment uses the PPO2 policy gradient algorithm to process the advantage function value and the update ratio to obtain the gradient of the training strategy, and then processes this gradient with a stochastic gradient ascent algorithm to obtain the policy gradient loss function.

It is easily understood that the PPO2 policy gradient algorithm computes an estimator of the policy gradient and plugs it into a stochastic gradient ascent algorithm; the policy gradient loss is computed and the parameters of the policy network are updated by performing stochastic gradient ascent on the policy parameter θ.

Specifically, the expression of the policy gradient algorithm is:

$$\hat{g} = \hat{\mathbb{E}}_t\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \hat{A}_t\right]$$

The expression for updating the policy network is:

$$A^\pi(s, a) = Q^\pi(s, a) - V^\pi(s)$$

where $\hat{A}_t$ is the advantage function estimate at time step t. When $\hat{A}_t$ is positive, the gradient is positive and the probability of the corresponding actions should be increased; conversely, it should be decreased. The expectation $\hat{\mathbb{E}}_t$ denotes the empirical average over a finite batch of samples. The policy $\pi_\theta$ is generally a neural network that takes the state observed from the environment as input and outputs the action to take; $\log \pi_\theta$ is the log probability output by the policy network, $a_t$ is the action at time step t, $s_t$ is the state at time step t, $Q^\pi(s, a)$ is the action value function, and $V^\pi(s)$ is the state value function.

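In this embodiment, to prevent excessive oscillation during training of the intrusion detection model, PPO2 introduces a clipped surrogate objective function (Clipped Surrogate Objective) to constrain the update ratio between the new policy and the old policy, so that minibatch updates can be performed over multiple steps. Define

$$r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}$$

as the ratio of the new policy to the old policy. The Conservative Policy Iteration (CPI) loss expression is:

$$L^{CPI}(\theta) = \hat{\mathbb{E}}_t\left[r_t(\theta)\, \hat{A}_t\right]$$

It should be noted that, without a constraint, maximizing the CPI objective may cause an excessively large policy update (gradient explosion). A clipping function is therefore used to penalize changes that move the ratio away from 1, and the constrained ratio is:

$$\mathrm{clip}\left(r_t(\theta),\, 1 - \varepsilon,\, 1 + \varepsilon\right)$$

Updating the policy with the clipped ratio gives the updated policy gradient expression:

$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\, \hat{A}_t,\; \mathrm{clip}\left(r_t(\theta),\, 1 - \varepsilon,\, 1 + \varepsilon\right) \hat{A}_t\right)\right]$$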

where ε = 0.2 is a hyperparameter; the first term inside the min is the CPI objective, and the second term modifies the surrogate objective by clipping the ratio, which keeps r_t within the interval [1-ε, 1+ε]. clip is the clipping function, and the min function makes the final objective a lower bound on the CPI objective. The change in the ratio is ignored only when it would improve the objective, and is taken into account when it makes the objective worse.
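The effect of the clipping can be checked numerically with the sketch below, which assumes the standard PPO form of the objective, namely the batch average of min(r_t·Â_t, clip(r_t, 1-ε, 1+ε)·Â_t). The function names are hypothetical, and the ratios and advantage estimates are passed in directly rather than computed from a policy network.

```python
def clip(x, low, high):
    """Limit x to the closed interval [low, high]."""
    return max(low, min(x, high))


def clipped_surrogate_objective(ratios, advantages, epsilon=0.2):
    """Average of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.

    ratios:     new-policy / old-policy probability ratios r_t
    advantages: advantage estimates A_t for the same time steps
    """
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        unclipped = ratio * advantage
        clipped = clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
        # Taking the minimum gives a pessimistic (lower) bound on the unclipped term.
        terms.append(min(unclipped, clipped))
    return sum(terms) / len(terms)
```

With a positive advantage of 1.0, a ratio of 1.5 contributes only about 1.2, because the gain from moving the ratio above 1+ε is cut off. With a negative advantage of -1.0, the same ratio contributes the full -1.5, so a change that makes the objective worse is not hidden by the clipping.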

Further, based on the first embodiment of the method for constructing the intrusion detection network architecture of the industrial internet of things, the second embodiment of the method for constructing the intrusion detection network architecture of the industrial internet of things is provided. Referring to fig. 4, fig. 4 is a schematic flowchart of a second embodiment of the construction method of the intrusion detection network architecture of the industrial internet of things.

In practical application, when intrusion detection is performed on the industrial internet of things, after the data to be detected is obtained, an intrusion detection agent trained by deep reinforcement learning judges the intrusion behaviors and actions in the data to be detected, so as to obtain an intrusion detection result. The intrusion detection agent is trained using the training strategy and the loss function, both of which are derived from the reward cumulative function obtained by simulating the environment state.

This embodiment provides a specific implementation scheme for obtaining the intrusion detection agent by training with the training strategy and the loss function derived from the reward cumulative function, which specifically includes the following steps:

Step A100: obtaining network sample data of the target Internet of things.

Step A200: inputting the network sample data into an intrusion detection model to obtain an expected intrusion detection result.

Step A300: judging whether the intrusion detection model meets a convergence condition according to the expected intrusion detection result.

Step A400: if not, adjusting the training strategy of the intrusion detection model by using the reward cumulative function, returning to the step of inputting the network sample data into the intrusion detection model, and repeating until the expected intrusion detection result meets the convergence condition, so as to obtain the intrusion detection agent.

Specifically, in this embodiment, the intrusion detection model is the industrial internet of things intrusion detection network architecture constructed in the above embodiment, and the intrusion detection agent with the converged model is obtained by performing cyclic training on the industrial internet of things intrusion detection network architecture.

It should be noted that the network sample data is acquired target internet of things historical data, the target internet of things historical data is used for training the acquired intrusion detection model, and the data type required by the training is network sample feature data acquired by performing feature screening and preprocessing on the target internet of things historical network data. In this embodiment, the network sample data is text data for recording interaction and operation of the target internet of things.

For convenience of understanding, referring to fig. 5, fig. 5 is a flowchart illustrating a specific implementation scheme of step A100 of the method for constructing an intrusion detection network architecture of the industrial internet of things according to the present invention. This embodiment provides a specific implementation scheme for performing feature screening on the network sample data to obtain network feature data, which is specifically as follows:

A101: extracting features from the network sample data to obtain initial network feature data.

A102: performing feature screening on the initial network feature data by adopting a LightGBM algorithm to obtain the network feature data.

Specifically, in this step, obtaining the network feature data from the network sample data includes two parts: feature extraction from the network sample data, and screening of the extracted feature data.

The feature extraction of the network sample data is used for extracting feature data in the network sample data to obtain initial network feature data used for training the detection model. The feature extraction may use feature extraction algorithms such as Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA) to process the network sample data, and of course, other algorithms with feature extraction may also be used, which is not limited in this embodiment.

Feature extraction is performed on the network sample data, which is text data recording the interaction and operation of the target Internet of things, to obtain interaction feature data and operation feature data of the target Internet of things. The interaction feature data includes the time interval between adjacent messages, the device ID in command messages, the device ID in response messages, the values of sub-function codes in command/response messages, and the like; the operation feature data includes sensor measurement values, response functions, set values, command function code values, device state values, and the like.

It should be noted that after the feature extraction is performed on the network sample data, the feature screening is further performed on the initial network feature data to obtain the processed network feature data. In this embodiment, the feature screening of the initial network feature data adopts a LightGBM algorithm to process the initial network feature data to delete the target network feature data in the initial network feature data, so as to obtain the network feature data for training the detection model.

Specifically, in this embodiment, the target network feature data to be deleted includes: network feature data whose missing rate is greater than a first preset value; network feature data having only a single unique value; either one of each pair of strongly correlated network feature data; and network feature data whose feature importance rank, obtained according to the LightGBM algorithm, is lower than a second preset value. It should be noted that a pair of strongly correlated network feature data in this embodiment is a pair whose Pearson correlation coefficient is not lower than 0.99.

In this embodiment, the first preset value is set to 60%: when the missing rate of a feature is greater than 60%, the feature is considered to have a large adverse influence on the accuracy of intrusion detection and needs to be deleted. The second preset value is set to 70%: when the importance rank of a feature falls below 70% of the number of all features, the feature is considered to have little influence on the intrusion detection result and needs to be deleted. It should be noted that, when different industrial internet of things systems process different feature data, different first and second preset values may be set to improve the applicability of the intrusion detection feature data, which is not limited in this embodiment.
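The four deletion rules can be sketched in plain Python as follows. The function names, the column layout (a dictionary of lists with `None` for missing values) and the parameter names are hypothetical. The importance scores are simply passed in as a dictionary; in practice they might come from a trained LightGBM model, but that step is outside this sketch. The rank rule is interpreted here as "delete a feature whose rank position, with 1 being the most important, exceeds 70% of the total number of features", which is one reading of the text.

```python
import math


def pearson(x, y):
    """Pearson correlation over rows where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    mean_x = sum(a for a, _ in pairs) / len(pairs)
    mean_y = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def screen_features(columns, importances, max_missing=0.60, min_corr=0.99,
                    rank_fraction=0.70):
    """Return the names of the features that survive all four deletion rules.

    columns:     dict {feature name: list of values, None = missing}
    importances: dict {feature name: importance score} (source of scores is assumed)
    """
    ranked = sorted(columns, key=lambda name: importances[name], reverse=True)
    rank = {name: position for position, name in enumerate(ranked, start=1)}
    cutoff = rank_fraction * len(columns)
    kept = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if 1 - len(present) / len(values) > max_missing:
            continue  # rule 1: missing rate above the first preset value
        if len(set(present)) <= 1:
            continue  # rule 2: only a single unique value
        if rank[name] > cutoff:
            continue  # rule 4: importance rank below the second preset value
        if any(pearson(columns[other], values) >= min_corr for other in kept):
            continue  # rule 3: strongly correlated with a feature already kept
        kept.append(name)
    return kept
```

In this sketch the later feature of a strongly correlated pair is the one dropped; the text only requires that either one of the pair be deleted.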

In addition, it is easy to understand that, in order to reduce noise redundancy of network sample data and improve multi-class detection accuracy of the model, feature selection is performed on the network sample data first, and the redundant dimension of the data is effectively reduced while intrusion detection performance is ensured. After the target Internet of things sample data is obtained, feature extraction and feature screening are carried out on the network sample data to obtain network feature data used for training a detection model.

For convenience of understanding, referring to fig. 6, fig. 6 is a flowchart illustrating another specific implementation of step A100 of the method for constructing an intrusion detection network architecture of the industrial internet of things according to the present invention. This embodiment provides a specific implementation scheme for preprocessing the network feature data, which is as follows:

A111: normalizing the network feature data to obtain normalized feature data.

A112: performing feature vectorization on the normalized feature data to obtain feature vector data.

A113: performing one-hot encoding on the feature vector data to obtain the data to be detected.

Specifically, the step of preprocessing the network feature data to obtain the data to be detected includes normalization, feature vectorization, and one-hot encoding of the network feature data.

Data normalization, also called data standardization, limits the data to be processed to a certain range after processing by some algorithm. Data standardization is a basic task in data mining: different evaluation indexes often have different dimensions and dimension units, which affects the result of data analysis. To eliminate the dimensional influence among indexes, the data needs to be normalized so that the data indexes become comparable.

Specifically, in the present embodiment, data normalization uses the min-max function to scale the feature values to the [0, 1] interval, so that all variables in different ranges are normalized. The normalization formula is:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where x is the original value, x' is the normalized value, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature.
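A minimal sketch of min-max scaling for one feature column is shown below. The function name is hypothetical, and mapping a constant column to all zeros is an assumption made to avoid division by zero; the text does not address that case.

```python
def min_max_normalize(values):
    """Scale a list of numbers to the [0, 1] interval using min-max scaling."""
    low, high = min(values), max(values)
    if high == low:
        # ASSUMPTION: a constant feature carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]
```

For example, the column `[10, 20, 30]` becomes `[0.0, 0.5, 1.0]`, regardless of the unit in which the original values were recorded.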

It should be noted that the feature vectorization is used to extract and combine attributes in the network feature data, each piece of network feature data has its attribute, different attributes are represented by different attribute values, and a feature vector is obtained by combining multiple attribute values.

It should be noted that one-hot encoding (One-Hot encoding) uses an N-bit state register to encode N states; each state has its own register bit, and only one bit is active at any time. In this embodiment, the feature vector data is processed through one-hot encoding so that only one bit is activated for each encoded feature, yielding sparse feature vector data and thus the data to be detected that can be directly processed by the detection model.
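As a short illustration, one-hot encoding of a categorical value can be sketched as follows. The function names and the idea of building the state list from the sorted distinct values are assumptions of this sketch.

```python
def one_hot(index, num_states):
    """Return an N-bit vector with exactly one active bit at position `index`."""
    if not 0 <= index < num_states:
        raise ValueError("index must be in the range [0, num_states)")
    return [1 if i == index else 0 for i in range(num_states)]


def one_hot_column(values):
    """Encode a column of categorical values; states are the sorted distinct values."""
    states = sorted(set(values))
    position = {state: i for i, state in enumerate(states)}
    return [one_hot(position[v], len(states)) for v in values]
```

A column of device states such as `["off", "on", "fault", "on"]` has three distinct states, so each value is encoded with three bits, exactly one of which is set.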

It is easy to understand that the purpose of this embodiment is to let the detection model use the data to be detected directly. After the network feature data extracted from the original network data of the target Internet of things is obtained, the network feature data is preprocessed to obtain data to be detected that can be directly input into the detection model. Because the obtained network feature data has different dimensions and different orders of magnitude, the preprocessing unifies it to the same dimension and the same order of magnitude.

It should be noted that, when the intrusion detection model is trained according to network sample data, the intrusion detection model after each training needs to be determined, and therefore, an expected intrusion detection result of the network sample data processed by the intrusion detection model after each training needs to be obtained. Judging whether the intrusion detection model meets a convergence condition or not according to an expected intrusion detection result; if the intrusion detection model meets the convergence condition, acquiring an intrusion detection intelligent agent for intrusion detection; and if the intrusion detection model does not meet the convergence condition, adjusting the training strategy of the intrusion detection model by using a reward cumulative function, and returning to the step of inputting the network sample data into the intrusion detection model.

Specifically, in this embodiment, the convergence condition used to judge the intrusion detection model is either that the intrusion detection model has converged or that the training has reached a preset number of steps.
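The train-and-check loop can be sketched as below. The names `train_agent`, `predict` and `update` are hypothetical placeholders for the intrusion detection model and its policy adjustment, and "convergence" is approximated here by reaching a target accuracy on the sample data, which is an assumption; the text does not define the convergence test precisely.

```python
def train_agent(samples, predict, update, target_accuracy=0.99, max_steps=1000):
    """Repeat: evaluate the model, stop if converged or out of steps, else adjust it.

    samples: list of (features, label) pairs
    predict: callable(features) -> predicted label      (hypothetical model)
    update:  callable(samples) that adjusts the policy  (hypothetical adjustment)
    Returns (number of evaluation steps performed, last measured accuracy).
    """
    accuracy = 0.0
    for step in range(1, max_steps + 1):
        # Obtain the expected detection result for every sample.
        correct = sum(1 for features, label in samples if predict(features) == label)
        accuracy = correct / len(samples)
        # Convergence condition: good enough, or the preset number of steps is reached.
        if accuracy >= target_accuracy or step == max_steps:
            return step, accuracy
        # Otherwise adjust the training strategy and go back to the evaluation step.
        update(samples)
    return 0, accuracy
```

The preset step limit guarantees that the loop ends even when the accuracy target is never reached.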

It should be noted that, in this embodiment, the step of adjusting the training strategy of the intrusion detection model by using the reward cumulative function specifically includes:

A401: processing the policy network and the value network of the training strategy based on the reward cumulative function to obtain the advantage function value of the training strategy and the update ratio between the new policy and the old policy.

A402: constructing a loss function of the training strategy according to the advantage function value and the update ratio between the new policy and the old policy.

A403: updating the policy parameters of the training strategy by using the loss function, so as to adjust the training strategy of the intrusion detection model.

It should be noted that the PPO2 strategy gradient algorithm is described in detail in the foregoing embodiments, and will not be described herein.

Further, as shown in fig. 7, in this embodiment, in order to update the neural network in time, the policy network and the value network are fused: the two networks share weights and are updated at the same time. PPO2 uses a set of historical records of states and actions to form fixed-length trajectory segments. In each iteration, N parallel agents each collect T time steps of data. The loss is constructed on these NT time steps of data and optimized using minibatch gradient descent or the Adam optimizer. The policy network and the value network share an MLP with 3 hidden layers, with 128 neurons in layer 1, 64 neurons in layer 2 and 64 neurons in layer 3, and a rectified linear unit (ReLU) activation function is added after each hidden layer.
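The shared network layout can be illustrated with a dependency-free forward pass. The hidden layer sizes (128, 64, 64) and the ReLU activations follow the text; the class name, the random initialization, the softmax policy head and the single-output value head are assumptions of this sketch, and no training step is included.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create (weights, biases) for a fully connected layer with random weights."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SharedActorCritic:
    """Policy and value heads on top of one shared 128-64-64 MLP trunk."""

    def __init__(self, n_features, n_actions, seed=0):
        rng = random.Random(seed)
        sizes = [n_features, 128, 64, 64]
        self.trunk = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        self.policy_head = make_layer(64, n_actions, rng)  # ASSUMPTION: softmax over actions
        self.value_head = make_layer(64, 1, rng)           # ASSUMPTION: scalar state value

    def forward(self, x):
        h = list(x)
        for layer in self.trunk:
            h = relu(linear(layer, h))  # ReLU after each shared hidden layer
        action_probs = softmax(linear(self.policy_head, h))
        state_value = linear(self.value_head, h)[0]
        return action_probs, state_value
```

Because both heads read the same trunk output, a single backward pass through the trunk would update the weights used by the policy and the value estimate at the same time, which is the sharing described above.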

And it is easy to understand that, this embodiment provides a training process for an intrusion detection agent for detecting data to be detected, in the training process, network sample data is processed through an intrusion detection model, before the intrusion detection model meets a convergence condition, a reward cumulative function is used to adjust a training strategy of the intrusion detection model and return to the step of inputting the network sample data into the intrusion detection model, and the process is circulated until the expected intrusion detection result meets the convergence condition to obtain the intrusion detection agent. The method can dynamically adjust the identification strategy of the industrial Internet of things according to the characteristics of continuous change of the environment and the structure of the industrial Internet of things, and obtain an accurate intrusion detection result.

Further, based on the first embodiment and the second embodiment of the method for constructing the intrusion detection network architecture of the industrial internet of things, the third embodiment of the method for constructing the intrusion detection network architecture of the industrial internet of things is provided. Referring to fig. 8, fig. 8 is a schematic flowchart of a third embodiment of the method for constructing an intrusion detection network architecture of the industrial internet of things according to the present invention.

In this embodiment, the method for implementing the intrusion detection of the industrial internet of things by using the intrusion detection agent trained by the intrusion detection network architecture of the industrial internet of things includes the following steps:

Step B100: acquiring original network data of the target Internet of things.

The target internet of things is an industrial internet of things to be subjected to intrusion detection, and includes but is not limited to high-automation industrial field internet of things systems such as an electric power internet of things system, an oil and gas internet of things system, an urban rail transit internet of things system and the like. The original network data of the industrial Internet of things are original network data of text data for recording interaction and operation of the target Internet of things and the like.

In addition, it is worth mentioning that, in this embodiment, the action of acquiring the original network data of the target internet of things may be specified by a user. For example, for an industrial internet of things with high real-time requirement and fast response to an intrusion action, the original network data can be acquired every day, every hour or every minute, or even acquired without intervals; for industrial internet of things with low real-time requirements and slow response of intrusion actions, the original network data can be acquired every three days, every week or every month, or even can be acquired manually by a user when the intrusion detection is required.

And it is easy to understand that, according to the requirements of the user, the original network data in the target internet of things can be obtained at each preset time interval or no time interval or manually, and whether the original network data access to the industrial internet of things has intrusion actions and behaviors or not is judged by processing the original network data.

Step B200: performing feature screening on the original network data to obtain network feature data.

It should be noted that, when the intrusion action and behavior are determined according to the original network data, the feature data of the traffic information is used instead of the traffic information data, and after the original network data of the target internet of things is obtained, feature extraction and feature screening are performed on the original network data to obtain the network feature data for detecting the intrusion action.

Specifically, feature extraction and screening are performed on the original network data, which is text data recording the interaction and operation of the target internet of things, to obtain interaction feature data and operation feature data of the target internet of things. The interaction feature data includes the time interval between adjacent messages, the device ID in command messages, the device ID in response messages, the values of sub-function codes in command/response messages, and the like; the operation feature data includes sensor measurement values, response functions, set values, command function code values, device state values, and the like. Of course, other network feature data derived from text data recording the interaction and operation of the target internet of things may also be used, which is not limited in this embodiment.

It is easy to understand that, in this embodiment, after the original data of the target internet of things is obtained, feature extraction and feature screening are performed on the original network data to obtain network feature data for detecting an intrusion behavior, and then whether the access of the target internet of things is an intrusion action and behavior is determined according to the network feature data.

Step B300: preprocessing the network feature data to obtain data to be detected.

It is easy to understand that the acquired network characteristic data may be based on different terminal devices and different communication networks, and the data dimensions and data magnitudes obtained by the network characteristic data will be different, so that the acquired network characteristic data needs to be preprocessed before the network characteristic data is input into the detection model for intrusion action and behavior determination to obtain the data to be detected.

Specifically, after network characteristic data extracted from original network data of a target internet of things are obtained, in order to use the network characteristic data with different dimensions and different magnitudes as detection data of the same detection model, adaptive preprocessing needs to be performed on the network characteristic data, so that the data to be detected obtained through preprocessing can be directly input into the detection model to obtain an intrusion detection result, time consumed by feature engineering of the detection model in the detection process is reduced, and influence of unprocessed network characteristic data on the detection rate of the detection model is reduced.

It is easy to understand that, in this embodiment, after the network feature data extracted from the original network data of the target internet of things is obtained, the network feature data is preprocessed to obtain the data to be detected, which can be directly input into the detection model, where the data to be detected is the network feature data with different dimensions and different magnitudes, and the network feature data with different dimensions and different magnitudes are unified in the same dimension and the same magnitude.

It should be noted that the original network data is used to input the intrusion detection agent for detection, and the type of the data required by the intrusion detection agent is the same as the data to be detected obtained by performing feature screening and preprocessing on the network sample data of the target internet of things in the foregoing embodiment, which is not described in detail in this embodiment.

Step B400: inputting the data to be detected into the trained intrusion detection agent to obtain an intrusion detection result.

It should be noted that, in this embodiment, after the data to be detected is obtained, the intrusion detection agent based on the deep reinforcement learning training is adopted to judge the intrusion behavior and the action of the data to be detected, so as to obtain an intrusion detection result.

Specifically, in this embodiment, the intrusion behaviors and actions in the data to be detected are judged to obtain an intrusion detection result. The intrusion detection agent that produces this result is trained using the training strategy and the loss function derived from the reward cumulative function, which is obtained by simulating the environment state, and the environment state is obtained from historical industrial internet of things data.

In this embodiment, the data to be detected is obtained by processing the original network data and is then input into the trained intrusion detection agent, whose training strategy has been continuously updated through the loss function, for intrusion detection. The identification strategy of the intrusion detection agent can thus be dynamically adjusted to the continuously changing environment and structure of the industrial internet of things, so that an accurate intrusion detection result is obtained and the agent has strong adaptability.
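Steps B100 to B400 form a simple pipeline, sketched below. All four callables are hypothetical stand-ins: `extract_features` for the feature screening of step B200, `preprocess` for the normalization and encoding of step B300, and `agent_predict` for the trained intrusion detection agent of step B400. The label convention (0 for normal traffic, 1 to n for attack types) follows the action space described earlier.

```python
def detect_intrusions(raw_records, extract_features, preprocess, agent_predict):
    """Run every raw network record through screening, preprocessing and detection.

    Returns a list of predicted labels: 0 = normal traffic, 1..n = attack type.
    """
    results = []
    for record in raw_records:                   # step B100: acquired raw data
        features = extract_features(record)      # step B200: feature screening
        observation = preprocess(features)       # step B300: preprocessing
        results.append(agent_predict(observation))  # step B400: detection
    return results
```

Keeping the three stages as separate callables means the same screening and preprocessing code can be reused for both training samples and live data, which matches the requirement that the two kinds of data have the same type.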

Referring to fig. 9, fig. 9 is a structural block diagram of an intrusion detection network architecture construction device for the industrial internet of things.

As shown in fig. 9, the device for constructing an intrusion detection network architecture of an industrial internet of things according to an embodiment of the present invention includes:

a state acquisition module 10, configured to acquire environmental state data based on historical internet of things data;

a reward obtaining module 20, configured to obtain a reward cumulative function of the intrusion detection network framework based on the environment state data;

a policy obtaining module 30, configured to obtain a value function of the intrusion detection network framework according to the reward cumulative function, and obtain a training policy of the intrusion detection network framework according to the value function;

and the framework obtaining module 40 is configured to obtain a loss function of the intrusion detection network framework by using the reward cumulative function, so as to obtain the intrusion detection network framework.

Other embodiments or specific implementation manners of the industrial internet of things intrusion detection network architecture construction device provided by the invention can refer to the above method embodiments, and are not described herein again.
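As a rough illustration of how the four modules could fit together, the following sketch maps each module to one function. The discount factor `GAMMA`, the +1/-1 feedback signal, the greedy policy, and the squared temporal-difference loss are assumptions chosen for illustration; this section does not specify them, and all function names are hypothetical.

```python
from typing import List, Tuple

GAMMA = 0.9  # assumed discount factor; not specified in this section


def acquire_states(history: List[Tuple[List[float], int]]) -> List[Tuple[List[float], int]]:
    """Module 10: turn labelled historical records into (state, label) pairs."""
    return [(list(features), label) for features, label in history]


def feedback(action: int, label: int) -> float:
    """Assumed feedback signal r_t: +1 for a correct judgment, -1 otherwise."""
    return 1.0 if action == label else -1.0


def cumulative_reward(rewards: List[float], gamma: float = GAMMA) -> float:
    """Module 20: discounted sum of the feedback signals over time steps."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def greedy_policy(q_row: List[float]) -> int:
    """Module 30: select the action with the largest action value."""
    return max(range(len(q_row)), key=q_row.__getitem__)


def td_loss(q_sa: float, reward: float, q_next_max: float, gamma: float = GAMMA) -> float:
    """Module 40: squared temporal-difference error, used here as the loss."""
    target = reward + gamma * q_next_max
    return (target - q_sa) ** 2
```

Keeping one function per module mirrors the block diagram of fig. 9, so each stage can be replaced independently, for example by swapping the greedy policy for an exploratory one during training.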

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. An element defined by the phrase "comprising", without further limitation, does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, etc. does not denote any order; these terms are to be interpreted as names.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g., a Read Only Memory (ROM)/Random Access Memory (RAM), a magnetic disk, an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A construction method of an industrial Internet of things intrusion detection network architecture is characterized by comprising the following steps:

2. The method for constructing an industrial internet of things intrusion detection network architecture according to claim 1, wherein the step of obtaining environmental state data based on historical internet of things data specifically comprises:

3. The method for constructing an intrusion detection network architecture of the industrial internet of things according to claim 2, wherein the step of obtaining the reward cumulative function of the intrusion detection network architecture based on the environment state data specifically comprises:

4. The construction method of the intrusion detection network architecture of the industrial internet of things according to claim 3, wherein the step of obtaining the feedback signal r_t of each time step t based on the environment state data specifically comprises:

5. The method for constructing an industrial Internet of things intrusion detection network architecture according to claim 3, wherein after the step of obtaining the reward cumulative function of the intrusion detection network framework according to the feedback signal r_t of each time step t, the method further comprises the following steps:

6. The method for constructing an intrusion detection network architecture of the industrial internet of things according to claim 5, wherein the step of obtaining the value function of the intrusion detection network architecture according to the reward cumulative function and obtaining the training strategy of the intrusion detection network architecture according to the value function specifically comprises:

the expression of the state value function is:

the expression of the action value function is:

in the formula,

in order to be the probability of a state transition,

in the form of a set of state spaces,

7. The method for constructing an intrusion detection network architecture of the industrial internet of things according to claim 1, wherein the step of obtaining a loss function of the intrusion detection network architecture by using the reward cumulative function to obtain the intrusion detection network architecture specifically comprises:

8. An industrial Internet of things intrusion detection network architecture construction device, characterized in that the industrial Internet of things intrusion detection network architecture construction device comprises:

9. An industrial Internet of things intrusion detection network architecture construction equipment, characterized in that the industrial Internet of things intrusion detection network architecture construction equipment comprises: a memory, a processor, and an industrial Internet of things intrusion detection network architecture construction program that is stored in the memory and executable on the processor, wherein the steps of the industrial Internet of things intrusion detection network architecture construction method according to any one of claims 1 to 7 are implemented when the industrial Internet of things intrusion detection network architecture construction program is executed by the processor.

10. A storage medium having an industrial internet of things intrusion detection network architecture construction program stored thereon, wherein the industrial internet of things intrusion detection network architecture construction program, when executed by a processor, implements the steps of the industrial internet of things intrusion detection network architecture construction method according to any one of claims 1 to 7.