CN117040809A - Method for generating a defense strategy for an industrial cyber-physical system based on a Bayesian stochastic game - Google Patents

Method for generating a defense strategy for an industrial cyber-physical system based on a Bayesian stochastic game


Publication number
CN117040809A
Authority
CN
China
Prior art keywords: attacker, game, Bayesian, state, information
Prior art date
Legal status: Granted
Application number
CN202310894809.2A
Other languages: Chinese (zh)
Other versions: CN117040809B (en)
Inventor
杨强
姚鹏超
颜秉晶
王文海
Current Assignee: Zhejiang University (ZJU)
Original Assignee: Zhejiang University (ZJU)
Priority date
Filing date
Publication date
Application filed by Zhejiang University (ZJU)
Priority to CN202310894809.2A
Publication of CN117040809A
Application granted
Publication of CN117040809B


Classifications

    • H — Electricity
    • H04 — Electric communication technique
    • H04L — Transmission of digital information, e.g. telegraphic communication
    • H04L 63/00 — Network architectures or network communication protocols for network security
    • H04L 63/14 — Detecting or protecting against malicious traffic
    • H04L 63/1433 — Vulnerability analysis
    • H04L 63/20 — Managing network security; network security policies in general
    • H04L 63/205 — Involving negotiation or determination of the security mechanisms to be used
    • H04L 9/00 — Cryptographic mechanisms or arrangements for secret or secure communications; network security protocols
    • H04L 9/40 — Network security protocols


Abstract

The invention provides a method for generating a defense strategy for an industrial cyber-physical system based on a Bayesian stochastic game. The invention establishes a Bayesian stochastic game model to describe the characteristics of attack-defense competition in an industrial cyber-physical system environment and to generate a defensive response strategy under incomplete network-attack information; adopts a unified measurement framework to standardize the configuration of game parameters, ensuring the consistency and compatibility of the cyber-physical model construction; and adopts the Harsanyi transformation to convert incomplete information about the attacker's utility function into imperfect information about the attacker's type, guaranteeing the existence of a Bayesian Nash equilibrium strategy. The invention further provides a multi-agent Bayesian Q-learning algorithm that learns a Nash equilibrium defense strategy in a dynamic network environment with unknown game parameters, helping decision makers formulate robust and effective security defense strategies.

Description

Method for generating a defense strategy for an industrial cyber-physical system based on a Bayesian stochastic game
Technical Field
The invention relates to a network security defense method for industrial cyber-physical systems, belongs to the field of industrial control cyberspace security, and particularly relates to a method for generating a defense strategy for an industrial cyber-physical system based on a Bayesian stochastic game.
Background
The rapid development and application of information and communication technology has transformed traditional industrial control systems into networked industrial cyber-physical systems. This digital revolution has greatly improved industrial production efficiency and manufacturing capacity. However, advanced information and communication technologies also expose industrial cyber-physical systems to a growing number of attacks from cyberspace (unauthorized access, data leakage, system destruction, and other malicious activities), which can compromise the confidentiality, integrity, and availability of critical infrastructure and industrial processes. Formulating effective network security countermeasures is therefore of great importance.
Security measures in the industrial cyber-physical system field today rely mainly on passive defense mechanisms (such as intrusion detection systems and firewalls). In view of the persistence and complexity of network attacks, it is critical to develop active decision methods that can effectively predict and respond to emerging threats in real time. Game theory, as a formal framework for studying strategic interactions, provides a systematic approach that helps system administrators formulate timely and proactive defense strategies. By analyzing the interactions between attacker and defender, game theory allows the administrator to understand the incentives, motivations, and potential strategies of adversaries, and thereby develop a defensive strategy that maximizes the security outcome while accounting for the associated costs and benefits.
Disclosure of Invention
The invention aims to solve the problem of active decision response in the network security defense of industrial cyber-physical systems. Addressing the shortcomings of existing game-theoretic research on the security protection of industrial cyber-physical systems, a method for generating a defense strategy for an industrial cyber-physical system based on a Bayesian stochastic game is provided. The method has guiding significance for the network security protection of critical industrial infrastructure.
The aim of the invention can be achieved by the following technical scheme:
the invention firstly provides a method for generating a defense strategy for an industrial cyber-physical system based on a Bayesian stochastic game, which comprises the following steps:
1) Construct a Bayesian stochastic game model that describes the attack-defense interaction process in an industrial cyber-physical system under incomplete knowledge of network-attack information;
2) The industrial cyber-physical system comprises an information layer and a physical layer; using a unified time-based quantization method, with time as the quantization index, construct the game utility functions of both attacker and defender for the information layer and the physical layer in the Bayesian stochastic game model;
3) Using the Harsanyi transformation, convert the defender's incomplete information about the attacker's game utility function into imperfect information about the attacker's type, obtaining an imperfect-information Bayesian stochastic game model in which multiple attacker types participate, and guaranteeing the existence of a Bayesian Nash equilibrium defense strategy;
4) Based on the Bayesian stochastic game model obtained in step 3), learn a Nash equilibrium defense strategy with a multi-agent Bayesian Q-learning algorithm (MABQL) under a dynamic network environment and unknown game parameters, obtaining the optimal defense strategy for the industrial cyber-physical system.
According to a preferred scheme of the invention, the construction of the Bayesian stochastic game model in step 1) specifically comprises:
abstracting the attack-defense interaction in the industrial cyber-physical system into a non-zero-sum Bayesian stochastic game model; under this model, stochastic game theory is used to discretize the continuous attack-defense interaction into a number of game states, each corresponding to a specific security state of the information layer or the physical layer of the system; the attacker and the defender have corresponding game utility functions in each information-layer or physical-layer game state; the current state may transition to the next game state according to a probability distribution determined jointly by the actions of the defender and the attacker;
considering that a defender does not have complete information about the attacker in a real attack-defense scenario, Bayesian game theory is used to convert each game state into a static Bayesian game in which the utility function is one-sided (private) information;
finally, the Bayesian stochastic game model is defined as the tuple:
G = <N, S, Θ, P_A, A, D, T, O, π_A, π_D, U_A, U_D>
wherein the elements are defined as: N = {Attacker, Defender} is the set of game participants; S = {s_1, s_2, ..., s_K} is the set of game states, where each state represents a security state of the network; Θ = {θ_1, θ_2, ..., θ_I} is the set of attacker types; P_A = {P_1^A, P_2^A, ..., P_K^A} is the set of the defender's probability distributions over attacker types, where P_k^A is the distribution in state s_k; A = {A_1, A_2, ..., A_K} is the attacker action set, where A_k = {a_1, a_2, ..., a_n} is the action set in state s_k; D = {D_1, D_2, ..., D_K} is the defender action set, where D_k = {d_1, d_2, ..., d_n} is the action set in state s_k; O is the state-transition probability function; π_A is the strategy set of the attacker; π_D is the strategy set of the defender; U_A and U_D are the utility functions of the attacker and the defender, respectively.
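As a concrete, non-normative illustration, the tuple above can be sketched as a Python data structure; the field types and the example values are assumptions for illustration only, not part of the patent:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class BayesianStochasticGame:
    """Illustrative container for G = <N, S, Theta, P_A, A, D, T, O, pi_A, pi_D, U_A, U_D>."""
    players: Tuple[str, str] = ("Attacker", "Defender")        # N
    states: List[str] = field(default_factory=list)            # S
    attacker_types: List[str] = field(default_factory=list)    # Theta
    # P_A: for each state, the defender's prior over attacker types
    type_priors: Dict[str, Dict[str, float]] = field(default_factory=dict)
    attacker_actions: Dict[str, List[str]] = field(default_factory=dict)   # A
    defender_actions: Dict[str, List[str]] = field(default_factory=dict)   # D
    # O: transition distribution keyed by (state, attacker action, defender action)
    transitions: Dict[Tuple[str, str, str], Dict[str, float]] = field(default_factory=dict)

g = BayesianStochasticGame(
    states=["s1", "s2"],
    attacker_types=["theta1", "theta2"],
    type_priors={"s1": {"theta1": 0.7, "theta2": 0.3}},
    attacker_actions={"s1": ["a1", "a2"]},
    defender_actions={"s1": ["d1", "d2"]},
)
```

The strategy sets π_A, π_D and the utility functions U_A, U_D would be added as mappings from states (and attacker types) to action distributions; they are omitted here for brevity.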
According to a preferred scheme of the invention, in step 2), the game utility functions of both attacker and defender for the information layer and the physical layer in the Bayesian stochastic game model are constructed with time as the quantization index, specifically:
2.1) Define the attack time T_a: the time required for an attacker to execute an attack action against a particular target, quantifying the span from vulnerability scanning to the attacker's successful exploitation of the vulnerability;
2.2) Define the defense time T_d: the time required for the defender to execute a defensive action, quantifying the span from recognizing the attack to completing the defensive action;
2.3) Define the recovery time T_r: the time required for an attacked device to resume its normal operating state after the attack. To quantize the recovery times of the information layer and the physical layer uniformly, T_r is defined as:
T_r = T_r,c + T_r,p
where T_r,c is the time required to restore the compromised device to a normal state through bug fixes or a switch to a standby device, and T_r,p is the time required to restore the physical-layer control process to a normal state. When the attacked device is an information-layer host, T_r = T_r,c; when the attacked device is a physical-layer sensor or actuator, T_r = T_r,c + T_r,p. This means the defender must not only repair the compromised device but also restore the control process;
2.4) To compute T_r,p, the physical-layer control process is modeled as:
x_{k+1} = A x_k + B u_k + B_a u'_k + w_k
y_k = C x_k + v_k
where x_k is the system state at time k, y_k is the measurement at time k, u_k is the control input at time k, u'_k is the attack signal at time k, A is the transfer matrix, B is the input control matrix, C is the output observation matrix, B_a is the attack matrix, w_k is process noise, and v_k is measurement noise. T_r,p denotes the total recovery time for the state variables to return to their normal range after deviating from it when the physical-layer control process is attacked.
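A minimal numerical sketch of how T_r,p could be estimated from the state-space model above: simulate the closed loop after the attack signal u'_k is removed and count the steps until the state re-enters its normal range. The matrices, the feedback gain K, and the ±1.0 "normal range" are illustrative assumptions, not values from the patent:

```python
import numpy as np

def recovery_steps(A, B, K, x0, normal_range=1.0, max_steps=1000):
    """Steps for x_k to return to |x| <= normal_range under feedback u_k = -K x_k."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_steps):
        if np.all(np.abs(x) <= normal_range):
            return k
        # attack over: u'_k = 0; noise omitted for clarity
        x = A @ x + B @ (-K @ x)
    return max_steps

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # stable plant (illustrative)
B = np.array([[1.0], [0.5]])
K = np.array([[0.2, 0.1]])               # stabilizing feedback gain (illustrative)
steps = recovery_steps(A, B, K, x0=[5.0, -3.0])   # state pushed out of range by an attack
```

In practice K would come from the plant's actual controller, and T_r,p would be the step count multiplied by the sampling period.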
According to a preferred embodiment of the present invention, the Harsanyi transformation in step 3) proceeds as follows:
3.1) Introduce a third-party participant, converting the defender's incomplete information about the attacker's utility function into imperfect information about the actions of the third-party participant;
3.2) After the attacker and defender select their respective actions, the third-party participant selects the next game state according to the transition probabilities; the new state can be observed by both the attacker and the defender;
3.3) The third-party participant selects the attacker type according to the defender's probability distribution over attacker types; this selection can be observed only by the attacker;
3.4) After the third-party participant completes its selections, the attacker and defender each select actions according to their respective strategies and then obtain their respective rewards;
3.5) Repeat 3.1) to 3.4) until the game ends, obtaining the imperfect-information Bayesian stochastic game model in which multiple attacker types participate. According to a preferred embodiment of the present invention, the utility functions of the attacker and defender in step 3) are defined as:
U_A(s_t, a, d, θ_i) = R_A(s_t, a, d, θ_i) + γ Σ_{s_k ∈ S} O(s_k | s_t, a, d) V_A(s_k)
U_D(s_t, a, d, θ_i) = R_D(s_t, a, d, θ_i) + γ Σ_{s_k ∈ S} O(s_k | s_t, a, d) V_D(s_k)
where R_A(s_t, a, d, θ_i) and R_D(s_t, a, d, θ_i) are the immediate rewards of the attacker and defender, and the discounted sums over successor states represent the future rewards;
the immediate rewards in the utility function are defined as:
R_A(s_t, a, d, θ_i) = ε(s_t, a, d) T_r(a, θ_i) − T_a(θ_i)
R_D(s_t, a, d, θ_i) = −ε(s_t, a, d) T_r(a, θ_i) − T_d
where T_r(a, θ_i) is the recovery time required by the system after attacker θ_i succeeds; T_a(θ_i) is the time attacker θ_i needs to execute the attack action; ε(s_t, a, d) is the probability that the attack succeeds under the action pair (a, d); and T_d is the time required to implement the defensive action;
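A quick numerical check of the immediate-reward formulas; all time values below are invented, in arbitrary units:

```python
def immediate_rewards(eps, t_r, t_a, t_d):
    """R_A = eps*T_r - T_a ; R_D = -eps*T_r - T_d (the formulas given above)."""
    r_attacker = eps * t_r - t_a
    r_defender = -eps * t_r - t_d
    return r_attacker, r_defender

# hypothetical values: 60% success chance, 120 units of recovery time,
# 30 units of attack effort, 10 units of defense effort
r_a, r_d = immediate_rewards(eps=0.6, t_r=120.0, t_a=30.0, t_d=10.0)
# r_a = 42.0, r_d = -82.0: the attacker gains expected downtime minus effort,
# while the defender always pays T_d and loses the expected recovery time
```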
For the future rewards in the utility functions, O(s_k | s_t, a, d) is the probability of transitioning from state s_t to state s_k when the attacker and defender take the action pair (a, d), with O(s_k | s_t, a, d) = ε_k(a, d). V_A(s_k) and V_D(s_k) denote the expected utility value in a state, also referred to as the state value, and can be obtained by:
V_A(s_k) = Σ_{θ_i ∈ Θ} P_k^A(θ_i) Σ_{a ∈ A_k} Σ_{d ∈ D_k} π_A(a | s_k, θ_i) π_D(d | s_k) U_A(s_k, a, d, θ_i)
V_D(s_k) = Σ_{θ_i ∈ Θ} P_k^A(θ_i) Σ_{a ∈ A_k} Σ_{d ∈ D_k} π_A(a | s_k, θ_i) π_D(d | s_k) U_D(s_k, a, d, θ_i)
The Bayesian Nash equilibrium for each game state is the strategy pair (π_A*, π_D*) satisfying, for every state s_k and every attacker type θ_i,
U_A(s_k, π_A*, π_D*, θ_i) ≥ U_A(s_k, π_A, π_D*, θ_i) for every attacker strategy π_A, and
U_D(s_k, π_A*, π_D*) ≥ U_D(s_k, π_A*, π_D) for every defender strategy π_D;
this pair is the Bayesian Nash equilibrium strategy. Because the attacker type set Θ, the state set S, the attacker action set A, and the defender action set D are all finite sets, the game has a Bayesian Nash equilibrium, i.e., a Bayesian Nash equilibrium defense strategy exists.
According to the preferred scheme of the invention, in step 4), the multi-agent Bayesian Q-learning algorithm (MABQL) is used to learn a Nash equilibrium defense strategy under a dynamic network environment and unknown game parameters, specifically:
4.1) The Q-function of the Q-learning algorithm is defined as:
Q_{t+1}(s_t, a, d) = (1 − α) Q_t(s_t, a, d) + α [R(s_t, a, d) + γ V(s_{t+1})]
where α is the learning rate and γ is the discount factor; the expected payoff of the Q-function can then be defined as:
E[Q(s_t)] = Σ_{a ∈ A} Σ_{d ∈ D} π_A(a | s_t) π_D(d | s_t) Q(s_t, a, d)
4.2) The flow of learning a Nash equilibrium defense strategy is as follows: (1) initialize the parameters: r ← 0, Q ← 0, for k = 1, 2, ..., K; (2) the third-party participant selects the current game state s_t; (3) the third-party participant selects attacker θ_i according to the distribution over Θ; (4) both game players select actions based on the ε-greedy strategy; (5) the third-party participant selects the next state; (6) update the Q-function according to the formula in 4.1); (7) compute the Nash equilibrium strategy (π_A*, π_D*) of the current state; (8) set r ← r + 1 and jump to (2), repeating the above process; (9) when the change in strategy between consecutive iterations falls below the threshold parameter δ, return the Nash equilibrium strategy. This yields the best strategies of all attacker types and the best defense strategy that cooperatively defends against all attackers.
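The flow above can be sketched in heavily simplified form. This toy loop is not the full MABQL algorithm: the attacker is a random stand-in, the defender plays ε-greedy on its own Q-table, and the state value is approximated by a best response against a uniform attacker rather than a Bayesian Nash computation; the payoffs and states are invented:

```python
import random

random.seed(0)
states = ["s1", "s2"]
atk_actions = ["a1", "a2"]
def_actions = ["d1", "d2"]
alpha, gamma, epsilon = 0.5, 0.9, 0.2

# Defender's Q-table over (state, attacker action, defender action)
Q = {(s, a, d): 0.0 for s in states for a in atk_actions for d in def_actions}

def state_value(s):
    # crude stand-in for the Nash value: best response to a uniform attacker
    return max(
        sum(Q[(s, a, d)] for a in atk_actions) / len(atk_actions)
        for d in def_actions
    )

for episode in range(500):
    s = random.choice(states)                                   # step (2)
    a = random.choice(atk_actions)                              # attacker stand-in, step (3)/(4)
    if random.random() < epsilon:                               # step (4): epsilon-greedy defender
        d = random.choice(def_actions)
    else:
        d = max(def_actions, key=lambda d_: Q[(s, a, d_)])
    r = 1.0 if def_actions.index(d) == atk_actions.index(a) else -1.0  # toy reward: block = +1
    s_next = random.choice(states)                              # step (5)
    Q[(s, a, d)] += alpha * (r + gamma * state_value(s_next) - Q[(s, a, d)])  # step (6)

# step (7), simplified: read off a greedy defense policy per state
policy = {s: max(def_actions, key=lambda d_: sum(Q[(s, a, d_)] for a in atk_actions))
          for s in states}
```

The real algorithm would additionally track the attacker type θ_i drawn from P_k^A, solve the stage game for a mixed Nash equilibrium in step (7), and test convergence against the threshold δ in step (9).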
The beneficial effect of the invention is to solve the lack of active decision capability of industrial cyber-physical systems when facing highly persistent and highly covert network attacks. The invention provides a security decision method based on a Bayesian stochastic game model, which accurately captures the dynamic attack-defense interaction process and can generate an optimal coordinated defense against different types of attackers. To overcome the challenge that the information and physical layers have differing objective utility functions in traditional game models, the invention introduces a unified time-based quantization method that integrates the various utility functions into one framework, bridging the gap between the information-security and physical-security domains. Unlike traditional methods that require complete information to solve the game, the invention proposes a data-driven reinforcement learning method, MABQL, which can obtain an optimal defense strategy under unknown game parameters or an uncertain network environment. The invention accurately models the network attack process in real industrial control scenarios and uses Bayesian Nash equilibrium theory and the MABQL algorithm to generate the optimal defense strategy in dynamic network security states, actively and efficiently preventing network penetration attacks on industrial cyber-physical systems.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a Harsanyi converted game tree;
FIG. 3 is a diagram of a test platform system architecture;
FIG. 4 is a game-state Markov transition diagram;
FIG. 5 is a MABQL algorithm defense strategy convergence diagram;
FIG. 6 is a graph of expected benefit convergence for the MABQL algorithm;
FIG. 7 is the defensive strategy evolution process under a changing environment.
Detailed Description
The objects and effects of the present invention are explained more clearly below with reference to the accompanying drawings.
As shown in FIG. 1, the implementation flow of the method for generating a defense strategy for an industrial cyber-physical system based on a Bayesian stochastic game mainly comprises four sub-steps; in a specific embodiment, the method comprises the following steps:
1) Construct a Bayesian stochastic game model that describes the attack-defense interaction process in an industrial cyber-physical system under incomplete knowledge of network-attack information;
Specifically, the attack-defense interaction in the industrial cyber-physical system is abstracted into a non-zero-sum Bayesian stochastic game model; under this model, stochastic game theory is used to discretize the continuous attack-defense interaction into a number of game states, each corresponding to a specific security state of the information layer or the physical layer of the system; the attacker and the defender have corresponding game utility functions in each information-layer or physical-layer game state; the current state may transition to the next game state according to a probability distribution determined jointly by the actions of the defender and the attacker;
considering that a defender does not have complete information about the attacker in a real attack-defense scenario, Bayesian game theory is used to convert each game state into a static Bayesian game in which the utility function is one-sided (private) information;
finally, the Bayesian stochastic game model is defined as the tuple:
G = <N, S, Θ, P_A, A, D, T, O, π_A, π_D, U_A, U_D>
wherein the elements are defined as: N = {Attacker, Defender} is the set of game participants; S = {s_1, s_2, ..., s_K} is the set of game states, where each state represents a security state of the network; Θ = {θ_1, θ_2, ..., θ_I} is the set of attacker types; P_A = {P_1^A, P_2^A, ..., P_K^A} is the set of the defender's probability distributions over attacker types, where P_k^A is the distribution in state s_k; A = {A_1, A_2, ..., A_K} is the attacker action set, where A_k = {a_1, a_2, ..., a_n} is the action set in state s_k; D = {D_1, D_2, ..., D_K} is the defender action set, where D_k = {d_1, d_2, ..., d_n} is the action set in state s_k; O is the state-transition probability function; π_A is the strategy set of the attacker; π_D is the strategy set of the defender; U_A and U_D are the utility functions of the attacker and the defender, respectively.
2) The industrial cyber-physical system comprises an information layer and a physical layer; using a unified time-based quantization method, with time as the quantization index, the game utility functions of both attacker and defender for the information layer and the physical layer in the Bayesian stochastic game model are constructed, specifically:
2.1) Define the attack time T_a: the time required for an attacker to execute an attack action against a particular target, quantifying the span from vulnerability scanning to the attacker's successful exploitation of the vulnerability;
2.2) Define the defense time T_d: the time required for the defender to execute a defensive action, quantifying the span from recognizing the attack to completing the defensive action;
2.3) Define the recovery time T_r: the time required for an attacked device to resume its normal operating state after the attack. To quantize the recovery times of the information layer and the physical layer uniformly, T_r is defined as:
T_r = T_r,c + T_r,p
where T_r,c is the time required to restore the compromised device to a normal state through bug fixes or a switch to a standby device, and T_r,p is the time required to restore the physical-layer control process to a normal state. When the attacked device is an information-layer host, T_r = T_r,c; when the attacked device is a physical-layer sensor or actuator, T_r = T_r,c + T_r,p. This means the defender must not only repair the compromised device but also restore the control process;
2.4) To compute T_r,p, the physical-layer control process is modeled as:
x_{k+1} = A x_k + B u_k + B_a u'_k + w_k
y_k = C x_k + v_k
where x_k is the system state at time k, y_k is the measurement at time k, u_k is the control input at time k, u'_k is the attack signal at time k, A is the transfer matrix, B is the input control matrix, C is the output observation matrix, B_a is the attack matrix, w_k is process noise, and v_k is measurement noise. T_r,p denotes the total recovery time for the state variables to return to their normal range after deviating from it when the physical-layer control process is attacked.
3) Using the Harsanyi transformation, convert the defender's incomplete information about the attacker's game utility function into imperfect information about the attacker's type, obtaining an imperfect-information Bayesian stochastic game model in which multiple attacker types participate, and guaranteeing the existence of a Bayesian Nash equilibrium defense strategy;
As shown in FIG. 2, the Harsanyi transformation proceeds as follows:
3.1) Introduce a third-party participant, converting the defender's incomplete information about the attacker's utility function into imperfect information about the actions of the third-party participant;
3.2) After the attacker and defender select their respective actions, the third-party participant selects the next game state according to the transition probabilities; the new state can be observed by both the attacker and the defender;
3.3) The third-party participant selects the attacker type according to the defender's probability distribution over attacker types; this selection can be observed only by the attacker;
3.4) After the third-party participant completes its selections, the attacker and defender each select actions according to their respective strategies and then obtain their respective rewards;
3.5) Repeat 3.1) to 3.4) until the game ends, obtaining the imperfect-information Bayesian stochastic game model in which multiple attacker types participate.
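One round of the transformed game can be sketched as follows; the "Nature" sampler below plays the role of the third-party participant, and all probability values are illustrative assumptions:

```python
import random

random.seed(1)

def nature_choice(dist):
    """Sample an outcome from a {outcome: probability} dict (the third-party move)."""
    r, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r <= acc:
            return outcome
    return outcome  # guard against floating-point round-off

type_prior = {"theta1": 0.7, "theta2": 0.3}   # step 3.3: observed only by the attacker
transition = {"s1": 0.4, "s2": 0.6}           # step 3.2: observed by both players

attacker_type = nature_choice(type_prior)
next_state = nature_choice(transition)
```

In the full game the transition distribution would depend on the action pair (a, d) just chosen, i.e. it would be O(· | s_t, a, d).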
The utility functions of the attacker and defender in step 3) are defined as:
U_A(s_t, a, d, θ_i) = R_A(s_t, a, d, θ_i) + γ Σ_{s_k ∈ S} O(s_k | s_t, a, d) V_A(s_k)
U_D(s_t, a, d, θ_i) = R_D(s_t, a, d, θ_i) + γ Σ_{s_k ∈ S} O(s_k | s_t, a, d) V_D(s_k)
where R_A(s_t, a, d, θ_i) and R_D(s_t, a, d, θ_i) are the immediate rewards of the attacker and defender, and the discounted sums over successor states represent the future rewards;
the immediate rewards in the utility function are defined as:
R_A(s_t, a, d, θ_i) = ε(s_t, a, d) T_r(a, θ_i) − T_a(θ_i)
R_D(s_t, a, d, θ_i) = −ε(s_t, a, d) T_r(a, θ_i) − T_d
where T_r(a, θ_i) is the recovery time required by the system after attacker θ_i succeeds; T_a(θ_i) is the time attacker θ_i needs to execute the attack action; ε(s_t, a, d) is the probability that the attack succeeds under the action pair (a, d); and T_d is the time required to implement the defensive action;
For the future rewards in the utility functions, O(s_k | s_t, a, d) is the probability of transitioning from state s_t to state s_k when the attacker and defender take the action pair (a, d), with O(s_k | s_t, a, d) = ε_k(a, d). V_A(s_k) and V_D(s_k) denote the expected utility value in a state, also referred to as the state value, and can be obtained by:
V_A(s_k) = Σ_{θ_i ∈ Θ} P_k^A(θ_i) Σ_{a ∈ A_k} Σ_{d ∈ D_k} π_A(a | s_k, θ_i) π_D(d | s_k) U_A(s_k, a, d, θ_i)
V_D(s_k) = Σ_{θ_i ∈ Θ} P_k^A(θ_i) Σ_{a ∈ A_k} Σ_{d ∈ D_k} π_A(a | s_k, θ_i) π_D(d | s_k) U_D(s_k, a, d, θ_i)
The Bayesian Nash equilibrium for each game state is the strategy pair (π_A*, π_D*) satisfying, for every state s_k and every attacker type θ_i,
U_A(s_k, π_A*, π_D*, θ_i) ≥ U_A(s_k, π_A, π_D*, θ_i) for every attacker strategy π_A, and
U_D(s_k, π_A*, π_D*) ≥ U_D(s_k, π_A*, π_D) for every defender strategy π_D;
this pair is the Bayesian Nash equilibrium strategy. Because the attacker type set Θ, the state set S, the attacker action set A, and the defender action set D are all finite sets, the game has a Bayesian Nash equilibrium, i.e., a Bayesian Nash equilibrium defense strategy exists.
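Numerically, the equilibrium condition above can be checked for a single state by comparing a candidate mixed strategy against all pure deviations; the 2×2 defender payoff matrix below is invented for illustration:

```python
import numpy as np

# rows: defender actions, cols: attacker actions (illustrative payoffs)
U_D = np.array([[-5.0, -1.0],
                [-2.0, -4.0]])

def expected_utility(U, pi_d, pi_a):
    """Expected utility pi_d^T U pi_a for mixed strategies over finite action sets."""
    return float(np.asarray(pi_d) @ U @ np.asarray(pi_a))

pi_a = np.array([0.5, 0.5])   # candidate attacker mixed strategy
pi_d = np.array([0.5, 0.5])   # candidate defender mixed strategy
value = expected_utility(U_D, pi_d, pi_a)

# pi_d is a best response iff no pure deviation (a row of the identity) improves on it
best_pure = max(expected_utility(U_D, e, pi_a) for e in np.eye(2))
is_best_response = best_pure <= value + 1e-9
```

Running the same check from the attacker's side (with U_A) on both candidate strategies verifies a Bayesian Nash equilibrium for that state; for multiple attacker types, the defender's expected utility is additionally weighted by the prior P_k^A.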
4) Based on the Bayesian stochastic game model obtained in step 3), a Nash equilibrium defense strategy is learned with the multi-agent Bayesian Q-learning algorithm (MABQL) under a dynamic network environment and unknown game parameters, obtaining the optimal defense strategy for the industrial cyber-physical system.
In step 4), the multi-agent Bayesian Q-learning algorithm (MABQL) is used to learn a Nash equilibrium defense strategy under a dynamic network environment and unknown game parameters, specifically:
4.1) The Q-function of the Q-learning algorithm is defined as:
Q_{t+1}(s_t, a, d) = (1 − α) Q_t(s_t, a, d) + α [R(s_t, a, d) + γ V(s_{t+1})]
where α is the learning rate and γ is the discount factor; the expected payoff of the Q-function can then be defined as:
E[Q(s_t)] = Σ_{a ∈ A} Σ_{d ∈ D} π_A(a | s_t) π_D(d | s_t) Q(s_t, a, d)
4.2) The flow of learning a Nash equilibrium defense strategy is as follows: (1) initialize the parameters: r ← 0, Q ← 0, for k = 1, 2, ..., K; (2) the third-party participant selects the current game state s_t; (3) the third-party participant selects attacker θ_i according to the distribution over Θ; (4) both game players select actions based on the ε-greedy strategy; (5) the third-party participant selects the next state; (6) update the Q-function according to the formula in 4.1); (7) compute the Nash equilibrium strategy (π_A*, π_D*) of the current state; (8) set r ← r + 1 and jump to (2), repeating the above process; (9) when the change in strategy between consecutive iterations falls below the threshold parameter δ, return the Nash equilibrium strategy. This yields the best strategies of all attacker types and the best defense strategy that cooperatively defends against all attackers.
The present invention is evaluated on a laboratory test platform at Zhejiang University that simulates a power generation process. FIG. 3 shows the system architecture of the test platform. In the enterprise network, management host 1 (MH1) plays the role of an external network service, while management host 2 (MH2) acts as an SQL server for data storage; both can communicate with hosts in the control network. In the control network, a human-machine interface (HMI) is used to monitor the control process, while the engineer station (ES) and operator station (OS) issue control commands to the programmable logic controllers (PLCs). S7-1200 series PLCs were selected to control the underlying actuators and sensors.
There are several security vulnerabilities in the test platform environment, as shown in Table 1.
TABLE 1
Each vulnerability is assigned a unique ID from the Common Vulnerabilities and Exposures (CVE) database. For example, CVE-2017-0144 denotes a vulnerability in the Windows Server Message Block (SMB) service that allows remote arbitrary code execution over the SMB interface. The probability of successfully exploiting a vulnerability is determined using the exploitability score of the Common Vulnerability Scoring System (CVSS).
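As an illustrative, assumed mapping, an attack-success probability ε can be derived by normalizing a CVSS v3 exploitability sub-score, whose maximum is 3.9; the patent does not specify this exact calibration:

```python
def success_probability(exploitability_subscore, max_subscore=3.9):
    """Map a CVSS v3 exploitability sub-score (0..3.9) to a probability in [0, 1]."""
    return max(0.0, min(1.0, exploitability_subscore / max_subscore))

# e.g. a vulnerability with an exploitability sub-score of 2.2
p = success_probability(2.2)
```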
We assume that the attacker's main goal is to compromise the sensors and actuators. Given the tight access control in industrial networks, the attacker must first exploit vulnerabilities to elevate privileges on an enterprise-network host. The attacker then moves laterally to compromise control-network hosts and establishes covert communication with the controllers while masquerading as legitimate hosts. Finally, the attacker launches targeted attacks on actuators or sensors with the aim of disturbing the physical process. Table 2 shows the security states accessible to the attacker, and FIG. 4 illustrates the corresponding transition relationships between game states.
TABLE 2
For the attacker, the invention considers the type set Θ = {θ_1, θ_2}, where θ_1 denotes a low-level attacker and θ_2 a high-level attacker. Table 3 lists all potential attack actions and the corresponding action time required by each type of attacker.
TABLE 3
The set of attack actions available in each state can be represented as A_1 = {a_1, a_2}, A_2 = {a_3, a_4, a_5}, A_3 = {a_3, a_4, a_5}, A_4 = {a_6, a_7, a_8}, A_5 = {a_6, a_7, a_8}, A_6 = {a_3, a_4}. In addition, Table 4 shows the recovery time T_r(a, θ_i) required after a successful attack.
TABLE 4
For the defender, the prior probability distribution over attacker types in each state is P_1^A = {0.7, 0.3}, P_2^A = {0.5, 0.5}, P_3^A = {0.5, 0.5}, P_4^A = {0.26, 0.74}, P_5^A = {0.26, 0.74}, P_6^A = {0.3, 0.7}. To prevent potential intrusion by an attacker, the available defensive actions are listed in Table 5.
TABLE 5
The set of defensive actions available in each state can then be described as D_1 = {d_1, d_2, d_3}, D_2 = {d_4, d_5, d_6}, D_3 = {d_4, d_5, d_6}, D_4 = {d_7, d_8, d_9, d_10}, D_5 = {d_7, d_8, d_9, d_10}, D_6 = {d_4, d_5}.
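The state, action, and belief structure described above can be collected into a small data sketch. The names mirror the patent's notation (Python is used here purely for illustration):

```python
# Data sketch of the game structure: six security states, per-state
# attacker/defender action sets, and the defender's prior beliefs over the
# two attacker types (theta1 = low-level, theta2 = high-level).

attacker_actions = {
    "S1": ["a1", "a2"],
    "S2": ["a3", "a4", "a5"],
    "S3": ["a3", "a4", "a5"],
    "S4": ["a6", "a7", "a8"],
    "S5": ["a6", "a7", "a8"],
    "S6": ["a3", "a4"],
}

defender_actions = {
    "S1": ["d1", "d2", "d3"],
    "S2": ["d4", "d5", "d6"],
    "S3": ["d4", "d5", "d6"],
    "S4": ["d7", "d8", "d9", "d10"],
    "S5": ["d7", "d8", "d9", "d10"],
    "S6": ["d4", "d5"],
}

# Prior distribution P_k^A over attacker types in each state.
type_priors = {
    "S1": (0.7, 0.3),
    "S2": (0.5, 0.5),
    "S3": (0.5, 0.5),
    "S4": (0.26, 0.74),
    "S5": (0.26, 0.74),
    "S6": (0.3, 0.7),
}

# Sanity check: every prior is a valid probability distribution.
assert all(abs(sum(p) - 1.0) < 1e-9 for p in type_priors.values())
```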
To verify the feasibility and convergence of the proposed MABQL algorithm, a conventional nonlinear programming method is first used to obtain the optimal attack and defense strategies under complete information about all parameters. The results are presented in Table 6, which provides a comprehensive overview of the strategies employed by the attacker and the defender at the Nash equilibrium point and their corresponding expected utility values.
TABLE 6
The results show that attackers θ_1 and θ_2 select different optimal strategies. For example, in state S_1, attacker θ_1 employs a mixed strategy while attacker θ_2 employs a pure strategy, and the defender adopts a corresponding mixed strategy to counter both θ_1 and θ_2. These results illustrate the subtle and varied decision processes of the different players in the game. In addition, Table 6 lists the players' expected utility values at the Nash equilibrium point, showing that the utility functions in the proposed game model are non-zero-sum.
Subsequently, the invention applies the proposed MABQL algorithm to solve the game model under incomplete information. The learning rate α is set to 0.5 and the discount factor γ is set to 1. Fig. 5 depicts the defender's strategy-learning process, which converges after about 100 learning steps. Fig. 6 illustrates the evolution of the defender's expected utility value in each state. The experimental results show that the proposed algorithm converges to the Nash equilibrium strategy.
The present invention defines the following changes to the network environment to evaluate the adaptability of the proposed algorithm: (1) at time t = t_1, the state transition probabilities of the network environment change, with O(s_k | s_t, a, d) = ε_k(a, d)/2; (2) at time t = t_2, the defender updates its probability belief about the attacker type in the network; in state S_2, for example, the belief changes from {0.5, 0.5} to {0.3, 0.7}.
The results are shown in Fig. 7, indicating that the algorithm initially converges to a Nash equilibrium. When the network environment changes at t = t_1, the algorithm adapts quickly and converges again after 100 learning steps, with the defense strategy changing to {0.543, 0.309, 0.148}. Likewise, when the probability distribution over attacker types changes at t = t_2, the algorithm quickly converges to a new Nash equilibrium strategy {0.402, 0.386, 0.212}. These results verify that the proposed algorithm can quickly learn the optimal defense strategy and demonstrate the effectiveness of the invention in formulating optimal defense strategies.
The foregoing is merely illustrative of specific embodiments of the invention. Obviously, the invention is not limited to the above embodiments; many variations are possible. All modifications that one skilled in the art can directly derive or infer from the present disclosure should be considered within the scope of the present invention.

Claims (6)

1. A method for generating an industrial information physical system defense strategy based on Bayesian random game is characterized by comprising the following steps:
1) Constructing a Bayesian random game model, wherein the Bayesian random game model describes an attack-defense interaction process under the condition that network attack information is incompletely known in an industrial information physical system;
2) The industrial information physical system comprises an information layer and a physical layer; taking time as the quantization index, a unified time-based quantization method is used to construct the game utility functions of both the attacker and the defender for the information layer and the physical layer in the Bayesian random game model;
3) Converting the defender's incomplete information about the attacker's game utility function into imperfect information about the attacker type using the Harsanyi conversion method, obtaining an imperfect-information Bayesian random game model in which multiple attacker types participate, and ensuring the existence of a Bayesian Nash equilibrium defense strategy;
4) Based on the Bayesian random game model obtained in the step 3), learning by utilizing a multi-agent Bayesian Q-learning algorithm (MABQL) under the conditions of a dynamic network environment and unknown game parameters to obtain a Nash equilibrium defense strategy, and obtaining the optimal defense strategy of the industrial information physical system.
2. The method for generating the defense strategy of the industrial information physical system based on the Bayesian random game according to claim 1, wherein the construction process of the Bayesian random game model in the step 1) is specifically as follows:
abstracting the attack and defense interaction in an industrial information physical system into a non-zero-sum Bayesian random game model; under this model, the continuous attack-defense interaction process is discretized into a number of game states using stochastic game theory, where each game state corresponds to a specific security state of the information layer or the physical layer of the system; the attacker and the defender have corresponding game utility functions in each information-layer or physical-layer game state; the current state may transition to the next game state according to a probability distribution determined jointly by the actions of the defender and the attacker;
considering that the defender does not have complete information about the attacker in a real attack and defense scenario, each game state is converted into a static Bayesian game using Bayesian game theory, in which the attacker's utility function is one-sided private information;
finally, the bayesian stochastic game model is defined as an 11-tuple:
G = <N, S, Θ, P^A, A, D, T, O, π^A, π^D, U^A, U^D>
wherein the elements are defined as follows: N = {Attacker, Defender} is the set of game participants; S = {s_1, s_2, ..., s_K} is the set of game states, where each state represents a security state of the network; Θ = {θ_1, θ_2, ..., θ_I} is the set of attacker types; P^A = {P_1^A, P_2^A, ..., P_K^A} is the set of the defender's probability distributions over attacker types, where P_k^A is the distribution in state s_k; A = {A_1, A_2, ..., A_K} is the attacker action set, where A_k = {a_1, a_2, ..., a_n} is the action set in state s_k; D = {D_1, D_2, ..., D_K} is the defender action set, where D_k = {d_1, d_2, ..., d_n} is the action set in state s_k; π^A is the strategy set of the attacker; π^D is the strategy set of the defender; U^A and U^D are the utility functions of the attacker and the defender, respectively.
3. The method for generating the defense strategy of the industrial information physical system based on the Bayesian random game according to claim 2, wherein in the step 2), game utility functions of both the attacking and defending party in the Bayesian random game model aiming at the information layer and the physical layer are constructed by taking time as a quantization index, specifically:
2.1) Define the attack time T_a: the time required for an attacker to perform an attack action against a particular target, quantifying the period from vulnerability scanning to the attacker's successful exploitation of the vulnerability;
2.2) Define the defense time T_d: the time required for the defender to perform a defensive action, quantifying the period from recognition of the attack to completion of the defensive action;
2.3) Define the recovery time T_r: the time required for the attacked device to resume its normal operating state after an attack; to uniformly quantize the recovery times of the information layer and the physical layer, T_r is defined as:
T_r = T_{r,c} + T_{r,p}
where T_{r,c} is the time required to restore the compromised device to a normal state through bug fixes or switching to a standby device, and T_{r,p} is the time required to restore the physical-layer control process to a normal state; when the attacked device is an information-layer host, T_r = T_{r,c}; when the attacked device is a physical-layer sensor or actuator, T_r = T_{r,c} + T_{r,p}, meaning that the defender must not only repair the damaged equipment but also restore the control process;
2.4) To calculate T_{r,p}, the physical-layer control process is modeled as:
x_{k+1} = A x_k + B u_k + B_a u'_k + w_k
y_k = C x_k + v_k
where x_k is the system state at time k, y_k is the measured output at time k, u_k is the control input, u'_k is the attack signal at time k, A is the state transition matrix, B is the input control matrix, C is the output observation matrix, B_a is the attack matrix, w_k is process noise, and v_k is measurement noise; T_{r,p} is the overall recovery time from the moment the state variables deviate from the normal range under attack until they return to the normal range.
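A minimal numerical sketch of this model shows how T_{r,p} can be measured as the number of steps until the state decays back into range. The scalar matrices, the normal-range threshold, and the noise-free free-decay assumption are all illustrative choices, not values from the patent:

```python
import numpy as np

# Sketch of the discrete-time model x_{k+1} = A x_k + B u_k + B_a u'_k + w_k,
# y_k = C x_k + v_k, reduced to a scalar example. After the attack signal
# ends we let the state decay freely (no control input, no noise) and count
# the steps until it re-enters the assumed normal range.

A = np.array([[0.9]])   # state transition matrix (stable: |0.9| < 1)
B = np.array([[0.1]])   # input control matrix (unused in the free decay)
Ba = np.array([[1.0]])  # attack matrix (attack already over in this sketch)
C = np.array([[1.0]])   # output observation matrix

NORMAL_RANGE = 0.5      # |x| below this counts as "normal" (assumed)

def recovery_steps(x0: float, max_steps: int = 200) -> int:
    """Steps until the unforced state decays back into the normal range."""
    x = np.array([[x0]])
    for k in range(max_steps):
        if abs(float(x[0, 0])) < NORMAL_RANGE:
            return k
        x = A @ x  # free decay: no control, attack, or noise terms
    return max_steps

# After an attack drove the state to x = 3.0, count the decay steps:
t_rp = recovery_steps(3.0)
```

In a full simulation the B, B_a, C matrices, noise terms, and a controller would all participate; here only the stable A matrix matters for the decay.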
4. The method for generating the defense strategy of the industrial information physical system based on the Bayesian random game according to claim 2, wherein the Harsanyi conversion method in the step 3) is as follows:
3.1) Introduce a third-party participant, converting the defender's incomplete information about the attacker's utility function into imperfect information about the third-party participant's actions;
3.2) After the attacker and the defender select their respective actions, the third-party participant selects the next game state according to the transition probabilities; the new state can be observed by both the attacker and the defender;
3.3) The third-party participant selects the attacker type according to the defender's probability distribution over attacker types; the selected type can be observed only by the attacker;
3.4) After the third-party participant completes its selections, the attacker and the defender each select actions according to their respective strategies and then receive their respective rewards;
3.5 Repeating the processes from 3.1) to 3.4) until the game is finished, and obtaining the imperfect information Bayesian random game model with participation of various attacker types.
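The third-party participant's role in steps 3.2) and 3.3) amounts to sampling from two distributions. The sketch below illustrates this "Nature" move; the transition distribution and belief values are illustrative assumptions:

```python
import random

# Sketch of the Harsanyi conversion's third-party participant ("Nature"):
# it samples the next game state from the transition distribution
# O(. | s_t, a, d), then samples the attacker type from the defender's
# belief P_k^A. Only the attacker observes the sampled type.

def nature_move(transition: dict, type_belief: dict,
                rng: random.Random) -> tuple:
    """Return (next_state, attacker_type) sampled by the third party."""
    next_state = rng.choices(list(transition),
                             weights=transition.values())[0]
    attacker_type = rng.choices(list(type_belief),
                                weights=type_belief.values())[0]
    return next_state, attacker_type

rng = random.Random(0)  # seeded for reproducibility
state, atk_type = nature_move({"S2": 0.7, "S3": 0.3},
                              {"theta1": 0.5, "theta2": 0.5}, rng)
```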
5. The method for generating an industrial information physical system defense strategy based on Bayesian random game according to claim 4, wherein the utility functions of the attacker and the defender in step 3) are defined as:
U^A(s_t, a, d, θ_i) = R^A(s_t, a, d, θ_i) + Σ_{s_k ∈ S} O(s_k | s_t, a, d) V^A(s_k)
U^D(s_t, a, d, θ_i) = R^D(s_t, a, d, θ_i) + Σ_{s_k ∈ S} O(s_k | s_t, a, d) V^D(s_k)
where R^A(s_t, a, d, θ_i) and R^D(s_t, a, d, θ_i) are the immediate rewards of the attacker and the defender, and the summation terms are the future rewards;
the immediate rewards in the utility function are defined as:
R^A(s_t, a, d, θ_i) = ε(s_t, a, d) T_r(a, θ_i) − T_a(θ_i)
R^D(s_t, a, d, θ_i) = −ε(s_t, a, d) T_r(a, θ_i) − T_d
where T_r(a, θ_i) is the recovery time required by the system after attacker θ_i attacks successfully, T_a(θ_i) is the time attacker θ_i requires to perform the attack action, ε(s_t, a, d) is the probability that the attack succeeds under the action pair (a, d), and T_d is the time required to implement the defensive action;
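These immediate-reward formulas translate directly into code; the numeric values below are illustrative assumptions:

```python
# Sketch of the immediate rewards. eps is the attack-success probability
# under the action pair (a, d); t_r, t_a, t_d are the recovery, attack,
# and defense times from the time-based quantization.

def attacker_reward(eps: float, t_r: float, t_a: float) -> float:
    """R_A = epsilon(s_t, a, d) * T_r(a, theta_i) - T_a(theta_i)"""
    return eps * t_r - t_a

def defender_reward(eps: float, t_r: float, t_d: float) -> float:
    """R_D = -epsilon(s_t, a, d) * T_r(a, theta_i) - T_d"""
    return -eps * t_r - t_d

# Example: a 60%-likely attack whose success costs 50 time units to
# recover from, taking the attacker 10 units to mount and the defender
# 5 units to counter.
r_a = attacker_reward(0.6, 50.0, 10.0)  # 20.0
r_d = defender_reward(0.6, 50.0, 5.0)   # -35.0
```

Note that r_a + r_d ≠ 0 in general, consistent with the non-zero-sum formulation of the game.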
For the future rewards in the utility functions, O(s_k | s_t, a, d) is the probability of transitioning from state s_t to state s_k when the attacker and the defender take the action pair (a, d), with O(s_k | s_t, a, d) = ε_k(a, d); V^A(s_k) and V^D(s_k) are the expected utility values in state s_k, also referred to as state values, obtained as:
V^A(s_k) = Σ_{θ_i ∈ Θ} P_k^A(θ_i) U^A(s_k, π^{A*}, π^{D*}, θ_i)
V^D(s_k) = Σ_{θ_i ∈ Θ} P_k^A(θ_i) U^D(s_k, π^{A*}, π^{D*}, θ_i)
The Bayesian Nash equilibrium (π^{A*}, π^{D*}) of each game state satisfies, for every attacker type θ_i and all strategies π^A and π^D,
U^A(s_t, π^A, π^{D*}, θ_i) ≤ U^A(s_t, π^{A*}, π^{D*}, θ_i)
U^D(s_t, π^{A*}, π^D) ≤ U^D(s_t, π^{A*}, π^{D*})
and (π^{A*}, π^{D*}) is the Bayesian Nash equilibrium strategy. Because the attacker type set Θ, the state set S, the attacker action set A, and the defender action set D are all finite sets, a Bayesian Nash equilibrium exists in the game, i.e., a Bayesian Nash equilibrium defense strategy exists.
6. The method for generating the defense strategy of the industrial information physical system based on the Bayesian random game according to claim 2, wherein in step 4), the Nash equilibrium defense strategy is obtained by learning with the multi-agent Bayesian Q-learning algorithm (MABQL) under a dynamic network environment with unknown game parameters, specifically:
4.1) The Q-function of the Q-learning algorithm is defined as:
Q_{r+1}(s_t, a, d) = (1 − α) Q_r(s_t, a, d) + α [R(s_t, a, d) + γ V(s_{t+1})]
where α is the learning rate and γ is the discount factor; the expected payoff of the Q-function is its expectation over the defender's belief about the attacker type:
E[Q(s_t, a, d)] = Σ_{θ_i ∈ Θ} P_t^A(θ_i) Q(s_t, a, d, θ_i)
4.2) The flow of learning the Nash equilibrium defense strategy is as follows: (1) initialize the parameters: r ← 0 and the Q-values of each state s_k, k = 1, 2, ..., K; (2) the third-party participant selects the current game state s_t; (3) the third-party participant selects attacker type θ_i according to the distribution over Θ; (4) both players select actions according to the ε-greedy strategy; (5) the third-party participant selects the next state; (6) update the Q-function using the formula in 4.1); (7) compute the Nash equilibrium strategy of the current state; (8) set r ← r + 1 and jump to (2), repeating the above process; (9) when the convergence condition |Q_{r+1} − Q_r| < δ is met, where δ is a threshold parameter, return the Nash equilibrium strategy, obtaining the optimal strategies of all attacker types and the optimal defense strategy that cooperatively defends against all attackers.
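The Q-update and ε-greedy selection at the heart of the flow above can be sketched as follows. This is a simplified tabular, single-type illustration rather than the full multi-agent Bayesian algorithm; the constants match the values used in the experiments (α = 0.5, γ = 1):

```python
import random

# Sketch of the core MABQL updates:
#   Q(s, a, d) <- (1 - alpha) * Q(s, a, d) + alpha * (r + gamma * V(s'))
# with an epsilon-greedy action choice for exploration.

ALPHA, GAMMA, EPS_GREEDY = 0.5, 1.0, 0.1  # learning rate, discount, exploration

def q_update(q: dict, s: str, a: str, d: str, r: float,
             v_next: float) -> None:
    """One temporal-difference update of the tabular Q-function."""
    key = (s, a, d)
    q[key] = (1 - ALPHA) * q.get(key, 0.0) + ALPHA * (r + GAMMA * v_next)

def epsilon_greedy(q: dict, s: str, actions: list, d: str,
                   rng: random.Random) -> str:
    """Explore with probability EPS_GREEDY, otherwise act greedily."""
    if rng.random() < EPS_GREEDY:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((s, a, d), 0.0))

q = {}
q_update(q, "S1", "a1", "d1", r=2.0, v_next=4.0)  # Q = 0.5*0 + 0.5*6 = 3.0
q_update(q, "S1", "a1", "d1", r=2.0, v_next=4.0)  # Q = 0.5*3 + 0.5*6 = 4.5
```

In the full algorithm V(s') would be the Nash-equilibrium state value of the next state and the Q-table would additionally be indexed by attacker type with belief-weighted expectations.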
CN202310894809.2A 2023-07-20 2023-07-20 Method for generating defense strategy of industrial information physical system based on Bayesian random game Active CN117040809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310894809.2A CN117040809B (en) 2023-07-20 2023-07-20 Method for generating defense strategy of industrial information physical system based on Bayesian random game

Publications (2)

Publication Number Publication Date
CN117040809A true CN117040809A (en) 2023-11-10
CN117040809B CN117040809B (en) 2024-04-05

Family

ID=88638137

Country Status (1)

Country Link
CN (1) CN117040809B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105307175A (en) * 2015-09-22 2016-02-03 绍兴文理学院 Method for selecting IDA (intrusion detection agent) start strategies of wireless sensor network
CN107070956A (en) * 2017-06-16 2017-08-18 福建中信网安信息科技有限公司 APT Attack Prediction methods based on dynamic bayesian game
US20190385473A1 (en) * 2017-03-03 2019-12-19 Mbda France Method and device for predicting optimum attack and defence solutions in a military conflict scenario
CN112115469A (en) * 2020-09-15 2020-12-22 浙江科技学院 Edge intelligent moving target defense method based on Bayes-Stackelberg game
CN115102166A (en) * 2022-07-27 2022-09-23 南京邮电大学 Active power distribution network dynamic defense performance optimization method based on game theory
CN115348064A (en) * 2022-07-28 2022-11-15 南京邮电大学 Power distribution network defense strategy design method based on dynamic game under network attack
CN115932752A (en) * 2023-01-06 2023-04-07 中国人民解放军海军大连舰艇学院 Radar cognitive interference decision method based on incomplete information game
US20230185912A1 (en) * 2021-12-13 2023-06-15 International Business Machines Corporation Defending deep generative models against adversarial attacks




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant