CN117040809A - Method for generating a defense strategy for an industrial cyber-physical system based on a Bayesian stochastic game - Google Patents

Method for generating a defense strategy for an industrial cyber-physical system based on a Bayesian stochastic game


Publication number
CN117040809A
Authority
CN
China
Prior art keywords: attacker, game, Bayesian, state, information
Prior art date
Legal status: Granted
Application number
CN202310894809.2A
Other languages: Chinese (zh)
Other versions: CN117040809B (en)
Inventor
杨强
姚鹏超
颜秉晶
王文海
Current Assignee: Zhejiang University (ZJU)
Original Assignee: Zhejiang University (ZJU)
Priority date
Filing date
Publication date
Application filed by Zhejiang University (ZJU)
Priority to CN202310894809.2A
Publication of CN117040809A
Application granted
Publication of CN117040809B


Classifications

    • H — Electricity
    • H04 — Electric communication technique
    • H04L — Transmission of digital information, e.g. telegraphic communication
    • H04L 63/00 — Network architectures or network communication protocols for network security
    • H04L 63/14 — Detecting or protecting against malicious traffic
    • H04L 63/1433 — Vulnerability analysis
    • H04L 63/20 — Managing network security; network security policies in general
    • H04L 63/205 — Involving negotiation or determination of the security mechanisms to be used
    • H04L 9/00 — Cryptographic mechanisms or arrangements for secret or secure communications; network security protocols
    • H04L 9/40 — Network security protocols


Abstract

The invention provides a method for generating a defense strategy for an industrial cyber-physical system based on a Bayesian stochastic game. The invention establishes a Bayesian stochastic game model to describe the characteristics of attack-defense competition in an industrial cyber-physical system environment and to generate a defensive response strategy under incomplete network-attack information; adopts a unified measurement framework to standardize the configuration of game parameters, ensuring the consistency and compatibility of the cyber-physical model construction; and adopts the Harsanyi transformation to convert incomplete information about the attacker's utility function into imperfect information about the attacker's type, guaranteeing the existence of a Bayesian Nash equilibrium strategy. The invention further provides a multi-agent Bayesian Q-learning algorithm that learns a Nash equilibrium defense strategy in a dynamic network environment with unknown game parameters, helping decision makers formulate robust and effective security defense strategies.

Description

Method for generating a defense strategy for an industrial cyber-physical system based on a Bayesian stochastic game
Technical Field
The invention relates to a network security defense method for industrial cyber-physical systems, belongs to the field of industrial control cyberspace security, and particularly relates to a method for generating a defense strategy for an industrial cyber-physical system based on a Bayesian stochastic game.
Background
The rapid development and application of information and communication technology has transformed traditional industrial control systems into networked industrial cyber-physical systems. This digital revolution has greatly improved industrial production efficiency and manufacturing capacity. However, advanced information and communication technologies also expose industrial cyber-physical systems to a growing number of attacks from cyberspace (unauthorized access, data leakage, system destruction, and other malicious activities), which can compromise the confidentiality, integrity, and availability of critical infrastructure and industrial processes. Formulating effective network security countermeasures is therefore of great importance.
Security measures in the industrial cyber-physical system field today rely mainly on passive defense mechanisms (such as intrusion detection systems and firewalls). In view of the persistence and complexity of network attacks, it is critical to develop active decision methods that can effectively predict and respond to emerging threats in real time. Game theory, as a formal framework for studying strategic interactions, provides a systematic approach that helps system administrators formulate timely and proactive defense strategies. By analyzing the interactions between attacker and defender, game theory allows the administrator to understand the incentives, motivations, and potential strategies of adversaries, and thereby develop a defensive strategy that maximizes the security outcome while accounting for the associated costs and benefits.
Disclosure of Invention
The invention aims to solve the problem of active decision response in the network security defense of industrial cyber-physical systems. Addressing the shortcomings of existing game-theoretic research on the security protection of industrial cyber-physical systems, a method for generating a defense strategy for an industrial cyber-physical system based on a Bayesian stochastic game is provided. The method has guiding significance for the network security protection of critical industrial infrastructure.
The aim of the invention can be achieved by the following technical scheme:
the invention firstly provides a method for generating a defense strategy for an industrial cyber-physical system based on a Bayesian stochastic game, which comprises the following steps:
1) Construct a Bayesian stochastic game model that describes the attack-defense interaction process in an industrial cyber-physical system under incomplete knowledge of network-attack information;
2) The industrial cyber-physical system comprises an information layer and a physical layer; using a unified time-based quantization method, with time as the quantization index, construct the game utility functions of both attacker and defender for the information layer and the physical layer in the Bayesian stochastic game model;
3) Using the Harsanyi transformation, convert the defender's incomplete information about the attacker's game utility function into imperfect information about the attacker's type, obtaining an imperfect-information Bayesian stochastic game model in which multiple attacker types participate, and guaranteeing the existence of a Bayesian Nash equilibrium defense strategy;
4) Based on the Bayesian stochastic game model obtained in step 3), learn a Nash equilibrium defense strategy with a multi-agent Bayesian Q-learning algorithm (MABQL) under a dynamic network environment and unknown game parameters, obtaining the optimal defense strategy for the industrial cyber-physical system.
According to a preferred scheme of the invention, the construction of the Bayesian stochastic game model in step 1) specifically comprises:
abstracting the attack-defense interaction in the industrial cyber-physical system into a non-zero-sum Bayesian stochastic game model; under this model, stochastic game theory is used to discretize the continuous attack-defense interaction into a number of game states, each corresponding to a specific security state of the information layer or the physical layer of the system; the attacker and the defender have corresponding game utility functions in each information-layer or physical-layer game state; the current state may transition to the next game state according to a probability distribution determined jointly by the actions of the defender and the attacker;
considering that a defender does not have complete information about the attacker in a real attack-defense scenario, Bayesian game theory is used to convert each game state into a static Bayesian game in which the utility function is one-sided (private) information;
finally, the Bayesian stochastic game model is defined as the tuple:
G = <N, S, Θ, P_A, A, D, T, O, π_A, π_D, U_A, U_D>
wherein the elements are defined as: N = {Attacker, Defender} is the set of game participants; S = {s_1, s_2, ..., s_K} is the set of game states, where each state represents a security state of the network; Θ = {θ_1, θ_2, ..., θ_I} is the set of attacker types; P_A = {P_1^A, P_2^A, ..., P_K^A} is the set of the defender's probability distributions over attacker types, where P_k^A is the distribution in state s_k; A = {A_1, A_2, ..., A_K} is the attacker action set, where A_k = {a_1, a_2, ..., a_n} is the action set in state s_k; D = {D_1, D_2, ..., D_K} is the defender action set, where D_k = {d_1, d_2, ..., d_n} is the action set in state s_k; O is the state-transition probability function; π_A is the strategy set of the attacker; π_D is the strategy set of the defender; U_A and U_D are the utility functions of the attacker and the defender, respectively.
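As a concrete, non-normative illustration, the tuple above can be sketched as a Python data structure; the field types and the example values are assumptions for illustration only, not part of the patent:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class BayesianStochasticGame:
    """Illustrative container for G = <N, S, Theta, P_A, A, D, T, O, pi_A, pi_D, U_A, U_D>."""
    players: Tuple[str, str] = ("Attacker", "Defender")        # N
    states: List[str] = field(default_factory=list)            # S
    attacker_types: List[str] = field(default_factory=list)    # Theta
    # P_A: for each state, the defender's prior over attacker types
    type_priors: Dict[str, Dict[str, float]] = field(default_factory=dict)
    attacker_actions: Dict[str, List[str]] = field(default_factory=dict)   # A
    defender_actions: Dict[str, List[str]] = field(default_factory=dict)   # D
    # O: transition distribution keyed by (state, attacker action, defender action)
    transitions: Dict[Tuple[str, str, str], Dict[str, float]] = field(default_factory=dict)

g = BayesianStochasticGame(
    states=["s1", "s2"],
    attacker_types=["theta1", "theta2"],
    type_priors={"s1": {"theta1": 0.7, "theta2": 0.3}},
    attacker_actions={"s1": ["a1", "a2"]},
    defender_actions={"s1": ["d1", "d2"]},
)
```

The strategy sets π_A, π_D and the utility functions U_A, U_D would be added as mappings from states (and attacker types) to action distributions; they are omitted here for brevity.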
According to a preferred scheme of the invention, in step 2), the game utility functions of both attacker and defender for the information layer and the physical layer in the Bayesian stochastic game model are constructed with time as the quantization index, specifically:
2.1) Define the attack time T_a: the time required for an attacker to execute an attack action against a particular target, quantifying the span from vulnerability scanning to the attacker's successful exploitation of the vulnerability;
2.2) Define the defense time T_d: the time required for the defender to execute a defensive action, quantifying the span from recognizing the attack to completing the defensive action;
2.3) Define the recovery time T_r: the time required for an attacked device to resume its normal operating state after the attack. To quantize the recovery times of the information layer and the physical layer uniformly, T_r is defined as:
T_r = T_r,c + T_r,p
where T_r,c is the time required to restore the compromised device to a normal state through bug fixes or a switch to a standby device, and T_r,p is the time required to restore the physical-layer control process to a normal state. When the attacked device is an information-layer host, T_r = T_r,c; when the attacked device is a physical-layer sensor or actuator, T_r = T_r,c + T_r,p. This means the defender must not only repair the compromised device but also restore the control process;
2.4) To compute T_r,p, the physical-layer control process is modeled as:
x_{k+1} = A x_k + B u_k + B_a u'_k + w_k
y_k = C x_k + v_k
where x_k is the system state at time k, y_k is the measurement at time k, u_k is the control input at time k, u'_k is the attack signal at time k, A is the transfer matrix, B is the input control matrix, C is the output observation matrix, B_a is the attack matrix, w_k is process noise, and v_k is measurement noise. T_r,p denotes the total recovery time for the state variables to return to their normal range after deviating from it when the physical-layer control process is attacked.
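A minimal numerical sketch of how T_r,p could be estimated from the state-space model above: simulate the closed loop after the attack signal u'_k is removed and count the steps until the state re-enters its normal range. The matrices, the feedback gain K, and the ±1.0 "normal range" are illustrative assumptions, not values from the patent:

```python
import numpy as np

def recovery_steps(A, B, K, x0, normal_range=1.0, max_steps=1000):
    """Steps for x_k to return to |x| <= normal_range under feedback u_k = -K x_k."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_steps):
        if np.all(np.abs(x) <= normal_range):
            return k
        # attack over: u'_k = 0; noise omitted for clarity
        x = A @ x + B @ (-K @ x)
    return max_steps

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # stable plant (illustrative)
B = np.array([[1.0], [0.5]])
K = np.array([[0.2, 0.1]])               # stabilizing feedback gain (illustrative)
steps = recovery_steps(A, B, K, x0=[5.0, -3.0])   # state pushed out of range by an attack
```

In practice K would come from the plant's actual controller, and T_r,p would be the step count multiplied by the sampling period.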
According to a preferred embodiment of the present invention, the Harsanyi transformation in step 3) proceeds as follows:
3.1) Introduce a third-party participant, converting the defender's incomplete information about the attacker's utility function into imperfect information about the actions of the third-party participant;
3.2) After the attacker and defender select their respective actions, the third-party participant selects the next game state according to the transition probabilities; the new state can be observed by both the attacker and the defender;
3.3) The third-party participant selects the attacker type according to the defender's probability distribution over attacker types; this selection can be observed only by the attacker;
3.4) After the third-party participant completes its selections, the attacker and defender each select actions according to their respective strategies and then obtain their respective rewards;
3.5) Repeat 3.1) to 3.4) until the game ends, obtaining the imperfect-information Bayesian stochastic game model in which multiple attacker types participate. According to a preferred embodiment of the present invention, the utility functions of the attacker and defender in step 3) are defined as:
U_A(s_t, a, d, θ_i) = R_A(s_t, a, d, θ_i) + γ Σ_{s_k ∈ S} O(s_k | s_t, a, d) V_A(s_k)
U_D(s_t, a, d, θ_i) = R_D(s_t, a, d, θ_i) + γ Σ_{s_k ∈ S} O(s_k | s_t, a, d) V_D(s_k)
where R_A(s_t, a, d, θ_i) and R_D(s_t, a, d, θ_i) are the immediate rewards of the attacker and defender, and the discounted sums over successor states represent the future rewards;
the immediate rewards in the utility function are defined as:
R_A(s_t, a, d, θ_i) = ε(s_t, a, d) T_r(a, θ_i) − T_a(θ_i)
R_D(s_t, a, d, θ_i) = −ε(s_t, a, d) T_r(a, θ_i) − T_d
where T_r(a, θ_i) is the recovery time required by the system after attacker θ_i succeeds; T_a(θ_i) is the time attacker θ_i needs to execute the attack action; ε(s_t, a, d) is the probability that the attack succeeds under the action pair (a, d); and T_d is the time required to implement the defensive action;
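A quick numerical check of the immediate-reward formulas; all time values below are invented, in arbitrary units:

```python
def immediate_rewards(eps, t_r, t_a, t_d):
    """R_A = eps*T_r - T_a ; R_D = -eps*T_r - T_d (the formulas given above)."""
    r_attacker = eps * t_r - t_a
    r_defender = -eps * t_r - t_d
    return r_attacker, r_defender

# hypothetical values: 60% success chance, 120 units of recovery time,
# 30 units of attack effort, 10 units of defense effort
r_a, r_d = immediate_rewards(eps=0.6, t_r=120.0, t_a=30.0, t_d=10.0)
# r_a = 42.0, r_d = -82.0: the attacker gains expected downtime minus effort,
# while the defender always pays T_d and loses the expected recovery time
```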
For the future rewards in the utility functions, O(s_k | s_t, a, d) is the probability of transitioning from state s_t to state s_k when the attacker and defender take the action pair (a, d), with O(s_k | s_t, a, d) = ε_k(a, d). V_A(s_k) and V_D(s_k) denote the expected utility value in a state, also referred to as the state value, and can be obtained by:
V_A(s_k) = Σ_{θ_i ∈ Θ} P_k^A(θ_i) Σ_{a ∈ A_k} Σ_{d ∈ D_k} π_A(a | s_k, θ_i) π_D(d | s_k) U_A(s_k, a, d, θ_i)
V_D(s_k) = Σ_{θ_i ∈ Θ} P_k^A(θ_i) Σ_{a ∈ A_k} Σ_{d ∈ D_k} π_A(a | s_k, θ_i) π_D(d | s_k) U_D(s_k, a, d, θ_i)
The Bayesian Nash equilibrium for each game state is the strategy pair (π_A*, π_D*) satisfying, for every state s_k and every attacker type θ_i,
U_A(s_k, π_A*, π_D*, θ_i) ≥ U_A(s_k, π_A, π_D*, θ_i) for every attacker strategy π_A, and
U_D(s_k, π_A*, π_D*) ≥ U_D(s_k, π_A*, π_D) for every defender strategy π_D;
this pair is the Bayesian Nash equilibrium strategy. Because the attacker type set Θ, the state set S, the attacker action set A, and the defender action set D are all finite sets, the game has a Bayesian Nash equilibrium, i.e., a Bayesian Nash equilibrium defense strategy exists.
According to the preferred scheme of the invention, in step 4), the multi-agent Bayesian Q-learning algorithm (MABQL) is used to learn a Nash equilibrium defense strategy under a dynamic network environment and unknown game parameters, specifically:
4.1) The Q-function of the Q-learning algorithm is defined as:
Q_{t+1}(s_t, a, d) = (1 − α) Q_t(s_t, a, d) + α [R(s_t, a, d) + γ V(s_{t+1})]
where α is the learning rate and γ is the discount factor; the expected payoff of the Q-function can then be defined as:
E[Q(s_t)] = Σ_{a ∈ A} Σ_{d ∈ D} π_A(a | s_t) π_D(d | s_t) Q(s_t, a, d)
4.2) The flow of learning a Nash equilibrium defense strategy is as follows: (1) initialize the parameters: r ← 0, Q ← 0, for k = 1, 2, ..., K; (2) the third-party participant selects the current game state s_t; (3) the third-party participant selects attacker θ_i according to the distribution over Θ; (4) both game players select actions based on the ε-greedy strategy; (5) the third-party participant selects the next state; (6) update the Q-function according to the formula in 4.1); (7) compute the Nash equilibrium strategy (π_A*, π_D*) of the current state; (8) set r ← r + 1 and jump to (2), repeating the above process; (9) when the change in strategy between consecutive iterations falls below the threshold parameter δ, return the Nash equilibrium strategy. This yields the best strategies of all attacker types and the best defense strategy that cooperatively defends against all attackers.
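The flow above can be sketched in heavily simplified form. This toy loop is not the full MABQL algorithm: the attacker is a random stand-in, the defender plays ε-greedy on its own Q-table, and the state value is approximated by a best response against a uniform attacker rather than a Bayesian Nash computation; the payoffs and states are invented:

```python
import random

random.seed(0)
states = ["s1", "s2"]
atk_actions = ["a1", "a2"]
def_actions = ["d1", "d2"]
alpha, gamma, epsilon = 0.5, 0.9, 0.2

# Defender's Q-table over (state, attacker action, defender action)
Q = {(s, a, d): 0.0 for s in states for a in atk_actions for d in def_actions}

def state_value(s):
    # crude stand-in for the Nash value: best response to a uniform attacker
    return max(
        sum(Q[(s, a, d)] for a in atk_actions) / len(atk_actions)
        for d in def_actions
    )

for episode in range(500):
    s = random.choice(states)                                   # step (2)
    a = random.choice(atk_actions)                              # attacker stand-in, step (3)/(4)
    if random.random() < epsilon:                               # step (4): epsilon-greedy defender
        d = random.choice(def_actions)
    else:
        d = max(def_actions, key=lambda d_: Q[(s, a, d_)])
    r = 1.0 if def_actions.index(d) == atk_actions.index(a) else -1.0  # toy reward: block = +1
    s_next = random.choice(states)                              # step (5)
    Q[(s, a, d)] += alpha * (r + gamma * state_value(s_next) - Q[(s, a, d)])  # step (6)

# step (7), simplified: read off a greedy defense policy per state
policy = {s: max(def_actions, key=lambda d_: sum(Q[(s, a, d_)] for a in atk_actions))
          for s in states}
```

The real algorithm would additionally track the attacker type θ_i drawn from P_k^A, solve the stage game for a mixed Nash equilibrium in step (7), and test convergence against the threshold δ in step (9).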
The beneficial effect of the invention is to solve the lack of active decision capability of industrial cyber-physical systems when facing highly persistent and highly covert network attacks. The invention provides a security decision method based on a Bayesian stochastic game model, which accurately captures the dynamic attack-defense interaction process and can generate an optimal coordinated defense against different types of attackers. To overcome the challenge that the information and physical layers have differing objective utility functions in traditional game models, the invention introduces a unified time-based quantization method that integrates the various utility functions into one framework, bridging the gap between the information-security and physical-security domains. Unlike traditional methods that require complete information to solve the game, the invention proposes a data-driven reinforcement learning method, MABQL, which can obtain an optimal defense strategy under unknown game parameters or an uncertain network environment. The invention accurately models the network attack process in real industrial control scenarios and uses Bayesian Nash equilibrium theory and the MABQL algorithm to generate the optimal defense strategy in dynamic network security states, actively and efficiently preventing network penetration attacks on industrial cyber-physical systems.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a Harsanyi converted game tree;
FIG. 3 is a diagram of a test platform system architecture;
FIG. 4 is a game-state Markov transition diagram;
FIG. 5 is a MABQL algorithm defense strategy convergence diagram;
FIG. 6 is a graph of expected benefit convergence for the MABQL algorithm;
FIG. 7 is the defensive strategy evolution process under a changing environment.
Detailed Description
The objects and effects of the present invention are explained more clearly below with reference to the accompanying drawings.
As shown in FIG. 1, the implementation flow of the method for generating a defense strategy for an industrial cyber-physical system based on a Bayesian stochastic game mainly comprises four sub-steps; in a specific embodiment, the method comprises the following steps:
1) Construct a Bayesian stochastic game model that describes the attack-defense interaction process in an industrial cyber-physical system under incomplete knowledge of network-attack information;
Specifically, the attack-defense interaction in the industrial cyber-physical system is abstracted into a non-zero-sum Bayesian stochastic game model; under this model, stochastic game theory is used to discretize the continuous attack-defense interaction into a number of game states, each corresponding to a specific security state of the information layer or the physical layer of the system; the attacker and the defender have corresponding game utility functions in each information-layer or physical-layer game state; the current state may transition to the next game state according to a probability distribution determined jointly by the actions of the defender and the attacker;
considering that a defender does not have complete information about the attacker in a real attack-defense scenario, Bayesian game theory is used to convert each game state into a static Bayesian game in which the utility function is one-sided (private) information;
finally, the Bayesian stochastic game model is defined as the tuple:
G = <N, S, Θ, P_A, A, D, T, O, π_A, π_D, U_A, U_D>
wherein the elements are defined as: N = {Attacker, Defender} is the set of game participants; S = {s_1, s_2, ..., s_K} is the set of game states, where each state represents a security state of the network; Θ = {θ_1, θ_2, ..., θ_I} is the set of attacker types; P_A = {P_1^A, P_2^A, ..., P_K^A} is the set of the defender's probability distributions over attacker types, where P_k^A is the distribution in state s_k; A = {A_1, A_2, ..., A_K} is the attacker action set, where A_k = {a_1, a_2, ..., a_n} is the action set in state s_k; D = {D_1, D_2, ..., D_K} is the defender action set, where D_k = {d_1, d_2, ..., d_n} is the action set in state s_k; O is the state-transition probability function; π_A is the strategy set of the attacker; π_D is the strategy set of the defender; U_A and U_D are the utility functions of the attacker and the defender, respectively.
2) The industrial cyber-physical system comprises an information layer and a physical layer; using a unified time-based quantization method, with time as the quantization index, the game utility functions of both attacker and defender for the information layer and the physical layer in the Bayesian stochastic game model are constructed, specifically:
2.1) Define the attack time T_a: the time required for an attacker to execute an attack action against a particular target, quantifying the span from vulnerability scanning to the attacker's successful exploitation of the vulnerability;
2.2) Define the defense time T_d: the time required for the defender to execute a defensive action, quantifying the span from recognizing the attack to completing the defensive action;
2.3) Define the recovery time T_r: the time required for an attacked device to resume its normal operating state after the attack. To quantize the recovery times of the information layer and the physical layer uniformly, T_r is defined as:
T_r = T_r,c + T_r,p
where T_r,c is the time required to restore the compromised device to a normal state through bug fixes or a switch to a standby device, and T_r,p is the time required to restore the physical-layer control process to a normal state. When the attacked device is an information-layer host, T_r = T_r,c; when the attacked device is a physical-layer sensor or actuator, T_r = T_r,c + T_r,p. This means the defender must not only repair the compromised device but also restore the control process;
2.4) To compute T_r,p, the physical-layer control process is modeled as:
x_{k+1} = A x_k + B u_k + B_a u'_k + w_k
y_k = C x_k + v_k
where x_k is the system state at time k, y_k is the measurement at time k, u_k is the control input at time k, u'_k is the attack signal at time k, A is the transfer matrix, B is the input control matrix, C is the output observation matrix, B_a is the attack matrix, w_k is process noise, and v_k is measurement noise. T_r,p denotes the total recovery time for the state variables to return to their normal range after deviating from it when the physical-layer control process is attacked.
3) Using the Harsanyi transformation, convert the defender's incomplete information about the attacker's game utility function into imperfect information about the attacker's type, obtaining an imperfect-information Bayesian stochastic game model in which multiple attacker types participate, and guaranteeing the existence of a Bayesian Nash equilibrium defense strategy;
As shown in FIG. 2, the Harsanyi transformation proceeds as follows:
3.1) Introduce a third-party participant, converting the defender's incomplete information about the attacker's utility function into imperfect information about the actions of the third-party participant;
3.2) After the attacker and defender select their respective actions, the third-party participant selects the next game state according to the transition probabilities; the new state can be observed by both the attacker and the defender;
3.3) The third-party participant selects the attacker type according to the defender's probability distribution over attacker types; this selection can be observed only by the attacker;
3.4) After the third-party participant completes its selections, the attacker and defender each select actions according to their respective strategies and then obtain their respective rewards;
3.5) Repeat 3.1) to 3.4) until the game ends, obtaining the imperfect-information Bayesian stochastic game model in which multiple attacker types participate.
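One round of the transformed game can be sketched as follows; the "Nature" sampler below plays the role of the third-party participant, and all probability values are illustrative assumptions:

```python
import random

random.seed(1)

def nature_choice(dist):
    """Sample an outcome from a {outcome: probability} dict (the third-party move)."""
    r, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r <= acc:
            return outcome
    return outcome  # guard against floating-point round-off

type_prior = {"theta1": 0.7, "theta2": 0.3}   # step 3.3: observed only by the attacker
transition = {"s1": 0.4, "s2": 0.6}           # step 3.2: observed by both players

attacker_type = nature_choice(type_prior)
next_state = nature_choice(transition)
```

In the full game the transition distribution would depend on the action pair (a, d) just chosen, i.e. it would be O(· | s_t, a, d).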
The utility functions of the attacker and defender in step 3) are defined as:
U_A(s_t, a, d, θ_i) = R_A(s_t, a, d, θ_i) + γ Σ_{s_k ∈ S} O(s_k | s_t, a, d) V_A(s_k)
U_D(s_t, a, d, θ_i) = R_D(s_t, a, d, θ_i) + γ Σ_{s_k ∈ S} O(s_k | s_t, a, d) V_D(s_k)
where R_A(s_t, a, d, θ_i) and R_D(s_t, a, d, θ_i) are the immediate rewards of the attacker and defender, and the discounted sums over successor states represent the future rewards;
the immediate rewards in the utility function are defined as:
R_A(s_t, a, d, θ_i) = ε(s_t, a, d) T_r(a, θ_i) − T_a(θ_i)
R_D(s_t, a, d, θ_i) = −ε(s_t, a, d) T_r(a, θ_i) − T_d
where T_r(a, θ_i) is the recovery time required by the system after attacker θ_i succeeds; T_a(θ_i) is the time attacker θ_i needs to execute the attack action; ε(s_t, a, d) is the probability that the attack succeeds under the action pair (a, d); and T_d is the time required to implement the defensive action;
For the future rewards in the utility functions, O(s_k | s_t, a, d) is the probability of transitioning from state s_t to state s_k when the attacker and defender take the action pair (a, d), with O(s_k | s_t, a, d) = ε_k(a, d). V_A(s_k) and V_D(s_k) denote the expected utility value in a state, also referred to as the state value, and can be obtained by:
V_A(s_k) = Σ_{θ_i ∈ Θ} P_k^A(θ_i) Σ_{a ∈ A_k} Σ_{d ∈ D_k} π_A(a | s_k, θ_i) π_D(d | s_k) U_A(s_k, a, d, θ_i)
V_D(s_k) = Σ_{θ_i ∈ Θ} P_k^A(θ_i) Σ_{a ∈ A_k} Σ_{d ∈ D_k} π_A(a | s_k, θ_i) π_D(d | s_k) U_D(s_k, a, d, θ_i)
The Bayesian Nash equilibrium for each game state is the strategy pair (π_A*, π_D*) satisfying, for every state s_k and every attacker type θ_i,
U_A(s_k, π_A*, π_D*, θ_i) ≥ U_A(s_k, π_A, π_D*, θ_i) for every attacker strategy π_A, and
U_D(s_k, π_A*, π_D*) ≥ U_D(s_k, π_A*, π_D) for every defender strategy π_D;
this pair is the Bayesian Nash equilibrium strategy. Because the attacker type set Θ, the state set S, the attacker action set A, and the defender action set D are all finite sets, the game has a Bayesian Nash equilibrium, i.e., a Bayesian Nash equilibrium defense strategy exists.
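Numerically, the equilibrium condition above can be checked for a single state by comparing a candidate mixed strategy against all pure deviations; the 2×2 defender payoff matrix below is invented for illustration:

```python
import numpy as np

# rows: defender actions, cols: attacker actions (illustrative payoffs)
U_D = np.array([[-5.0, -1.0],
                [-2.0, -4.0]])

def expected_utility(U, pi_d, pi_a):
    """Expected utility pi_d^T U pi_a for mixed strategies over finite action sets."""
    return float(np.asarray(pi_d) @ U @ np.asarray(pi_a))

pi_a = np.array([0.5, 0.5])   # candidate attacker mixed strategy
pi_d = np.array([0.5, 0.5])   # candidate defender mixed strategy
value = expected_utility(U_D, pi_d, pi_a)

# pi_d is a best response iff no pure deviation (a row of the identity) improves on it
best_pure = max(expected_utility(U_D, e, pi_a) for e in np.eye(2))
is_best_response = best_pure <= value + 1e-9
```

Running the same check from the attacker's side (with U_A) on both candidate strategies verifies a Bayesian Nash equilibrium for that state; for multiple attacker types, the defender's expected utility is additionally weighted by the prior P_k^A.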
4) Based on the Bayesian stochastic game model obtained in step 3), a Nash equilibrium defense strategy is learned with the multi-agent Bayesian Q-learning algorithm (MABQL) under a dynamic network environment and unknown game parameters, obtaining the optimal defense strategy for the industrial cyber-physical system.
In step 4), the multi-agent Bayesian Q-learning algorithm (MABQL) is used to learn a Nash equilibrium defense strategy under a dynamic network environment and unknown game parameters, specifically:
4.1) The Q-function of the Q-learning algorithm is defined as:
Q_{t+1}(s_t, a, d) = (1 − α) Q_t(s_t, a, d) + α [R(s_t, a, d) + γ V(s_{t+1})]
where α is the learning rate and γ is the discount factor; the expected payoff of the Q-function can then be defined as:
E[Q(s_t)] = Σ_{a ∈ A} Σ_{d ∈ D} π_A(a | s_t) π_D(d | s_t) Q(s_t, a, d)
4.2) The flow of learning a Nash equilibrium defense strategy is as follows: (1) initialize the parameters: r ← 0, Q ← 0, for k = 1, 2, ..., K; (2) the third-party participant selects the current game state s_t; (3) the third-party participant selects attacker θ_i according to the distribution over Θ; (4) both game players select actions based on the ε-greedy strategy; (5) the third-party participant selects the next state; (6) update the Q-function according to the formula in 4.1); (7) compute the Nash equilibrium strategy (π_A*, π_D*) of the current state; (8) set r ← r + 1 and jump to (2), repeating the above process; (9) when the change in strategy between consecutive iterations falls below the threshold parameter δ, return the Nash equilibrium strategy. This yields the best strategies of all attacker types and the best defense strategy that cooperatively defends against all attackers.
The present invention is evaluated on a laboratory test platform at Zhejiang University that simulates a power generation process. FIG. 3 shows the system architecture of the test platform. In the enterprise network, management host 1 (MH1) plays the role of an external network service, while management host 2 (MH2) acts as an SQL server for data storage; both can communicate with hosts in the control network. In the control network, a human-machine interface (HMI) is used to monitor the control process, while the engineer station (ES) and operator station (OS) issue control commands to the programmable logic controllers (PLCs). S7-1200 series PLCs were selected to control the underlying actuators and sensors.
There are several security vulnerabilities in the test platform environment, as shown in Table 1.
TABLE 1
Each vulnerability is assigned a unique ID from the Common Vulnerabilities and Exposures (CVE) database. For example, CVE-2017-0144 denotes a vulnerability in the Windows Server Message Block (SMB) service that allows remote arbitrary code execution over the SMB interface. The probability of successfully exploiting a vulnerability is determined using the exploitability score of the Common Vulnerability Scoring System (CVSS).
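As an illustrative, assumed mapping, an attack-success probability ε can be derived by normalizing a CVSS v3 exploitability sub-score, whose maximum is 3.9; the patent does not specify this exact calibration:

```python
def success_probability(exploitability_subscore, max_subscore=3.9):
    """Map a CVSS v3 exploitability sub-score (0..3.9) to a probability in [0, 1]."""
    return max(0.0, min(1.0, exploitability_subscore / max_subscore))

# e.g. a vulnerability with an exploitability sub-score of 2.2
p = success_probability(2.2)
```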
We assume that the attacker's main goal is to compromise the sensors and actuators. Given the tight access control in industrial networks, the attacker must first exploit vulnerabilities to elevate privileges on an enterprise-network host. The attacker then moves laterally to compromise control-network hosts and establishes covert communication with the controllers while masquerading as legitimate hosts. Finally, the attacker launches targeted attacks on actuators or sensors with the aim of disturbing the physical process. Table 2 shows the security states accessible to the attacker, and FIG. 4 illustrates the corresponding transition relationships between game states.
TABLE 2
For the attacker, the invention considers the type set Θ = {θ_1, θ_2}, where θ_1 denotes a low-level attacker and θ_2 a high-level attacker. Table 3 lists all potential attack actions and the corresponding action time required by each type of attacker.
TABLE 3
The set of attack actions available in each state can be represented as A_1 = {a_1, a_2}, A_2 = {a_3, a_4, a_5}, A_3 = {a_3, a_4, a_5}, A_4 = {a_6, a_7, a_8}, A_5 = {a_6, a_7, a_8}, A_6 = {a_3, a_4}. In addition, Table 4 shows the recovery time T_r(a, θ_i) required after a successful attack.
TABLE 4
For the defender, the prior probability distribution over attacker types in each state is P_1^A = {0.7, 0.3}, P_2^A = {0.5, 0.5}, P_3^A = {0.5, 0.5}, P_4^A = {0.26, 0.74}, P_5^A = {0.26, 0.74}, P_6^A = {0.3, 0.7}. To prevent potential intrusion by an attacker, the available defensive actions are listed in Table 5.
TABLE 5
The set of defensive actions available in each state can then be described as D_1 = {d_1, d_2, d_3}, D_2 = {d_4, d_5, d_6}, D_3 = {d_4, d_5, d_6}, D_4 = {d_7, d_8, d_9, d_10}, D_5 = {d_7, d_8, d_9, d_10}, D_6 = {d_4, d_5}.
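The state, action, and belief structure described above can be collected into a small data sketch. The names mirror the patent's notation (Python is used here purely for illustration):

```python
# Data sketch of the game structure: six security states, per-state
# attacker/defender action sets, and the defender's prior beliefs over the
# two attacker types (theta1 = low-level, theta2 = high-level).

attacker_actions = {
    "S1": ["a1", "a2"],
    "S2": ["a3", "a4", "a5"],
    "S3": ["a3", "a4", "a5"],
    "S4": ["a6", "a7", "a8"],
    "S5": ["a6", "a7", "a8"],
    "S6": ["a3", "a4"],
}

defender_actions = {
    "S1": ["d1", "d2", "d3"],
    "S2": ["d4", "d5", "d6"],
    "S3": ["d4", "d5", "d6"],
    "S4": ["d7", "d8", "d9", "d10"],
    "S5": ["d7", "d8", "d9", "d10"],
    "S6": ["d4", "d5"],
}

# Prior distribution P_k^A over attacker types in each state.
type_priors = {
    "S1": (0.7, 0.3),
    "S2": (0.5, 0.5),
    "S3": (0.5, 0.5),
    "S4": (0.26, 0.74),
    "S5": (0.26, 0.74),
    "S6": (0.3, 0.7),
}

# Sanity check: every prior is a valid probability distribution.
assert all(abs(sum(p) - 1.0) < 1e-9 for p in type_priors.values())
```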
To verify the feasibility and convergence of the proposed MABQL algorithm, a conventional nonlinear programming method is first used to obtain the optimal attack and defense strategies under complete information about all parameters. The results are presented in Table 6, which provides a comprehensive overview of the strategies employed by the attacker and the defender at the Nash equilibrium point and their corresponding expected utility values.
TABLE 6
The results show that attackers θ_1 and θ_2 select different optimal strategies. For example, in state S_1, attacker θ_1 employs a mixed strategy while attacker θ_2 employs a pure strategy, and the defender adopts a corresponding mixed strategy to counter both θ_1 and θ_2. These results illustrate the subtle and varied decision processes of the different players in the game. In addition, Table 6 lists the players' expected utility values at the Nash equilibrium point, showing that the utility functions in the proposed game model are non-zero-sum.
Subsequently, the invention applies the proposed MABQL algorithm to solve the game model under incomplete information. The learning rate α is set to 0.5 and the discount factor γ is set to 1. Fig. 5 depicts the defender's strategy-learning process, which converges after about 100 learning steps. Fig. 6 illustrates the evolution of the defender's expected utility value in each state. The experimental results show that the proposed algorithm converges to the Nash equilibrium strategy.
The present invention defines the following changes to the network environment to evaluate the adaptability of the proposed algorithm: (1) at time t = t_1, the state transition probabilities of the network environment change, with O(s_k | s_t, a, d) = ε_k(a, d)/2; (2) at time t = t_2, the defender updates its probability belief about the attacker type in the network; in state S_2, for example, the belief changes from {0.5, 0.5} to {0.3, 0.7}.
The results are shown in Fig. 7, indicating that the algorithm initially converges to a Nash equilibrium. When the network environment changes at t = t_1, the algorithm adapts quickly and converges again after 100 learning steps, with the defense strategy changing to {0.543, 0.309, 0.148}. Likewise, when the probability distribution over attacker types changes at t = t_2, the algorithm quickly converges to a new Nash equilibrium strategy {0.402, 0.386, 0.212}. These results verify that the proposed algorithm can quickly learn the optimal defense strategy and demonstrate the effectiveness of the invention in formulating optimal defense strategies.
The foregoing is merely illustrative of specific embodiments of the invention. Obviously, the invention is not limited to the above embodiments; many variations are possible. All modifications that one skilled in the art can directly derive or infer from the present disclosure should be considered within the scope of the present invention.

Claims (6)

1. A method for generating an industrial information physical system defense strategy based on Bayesian random game is characterized by comprising the following steps:
1) Constructing a Bayesian random game model, wherein the Bayesian random game model describes an attack-defense interaction process under the condition that network attack information is incompletely known in an industrial information physical system;
2) The industrial information physical system comprises an information layer and a physical layer; taking time as the quantization index, a unified time-based quantization method is used to construct the game utility functions of both the attacker and the defender for the information layer and the physical layer in the Bayesian random game model;
3) Converting the defender's incomplete information about the attacker's game utility function into imperfect information about the attacker type using the Harsanyi conversion method, obtaining an imperfect-information Bayesian random game model in which multiple attacker types participate, and ensuring the existence of a Bayesian Nash equilibrium defense strategy;
4) Based on the Bayesian random game model obtained in the step 3), learning by utilizing a multi-agent Bayesian Q-learning algorithm (MABQL) under the conditions of a dynamic network environment and unknown game parameters to obtain a Nash equilibrium defense strategy, and obtaining the optimal defense strategy of the industrial information physical system.
2. The method for generating the defense strategy of the industrial information physical system based on the Bayesian random game according to claim 1, wherein the construction process of the Bayesian random game model in the step 1) is specifically as follows:
abstracting the attack and defense interaction in an industrial information physical system into a non-zero-sum Bayesian random game model; under this model, the continuous attack-defense interaction process is discretized into a number of game states using stochastic game theory, where each game state corresponds to a specific security state of the information layer or the physical layer of the system; the attacker and the defender have corresponding game utility functions in each information-layer or physical-layer game state; the current state may transition to the next game state according to a probability distribution determined jointly by the actions of the defender and the attacker;
considering that the defender does not have complete information about the attacker in a real attack and defense scenario, each game state is converted into a static Bayesian game using Bayesian game theory, in which the attacker's utility function is one-sided private information;
finally, the bayesian stochastic game model is defined as an 11-tuple:
G = <N, S, Θ, P^A, A, D, T, O, π^A, π^D, U^A, U^D>
wherein the elements are defined as follows: N = {Attacker, Defender} is the set of game participants; S = {s_1, s_2, ..., s_K} is the set of game states, where each state represents a security state of the network; Θ = {θ_1, θ_2, ..., θ_I} is the set of attacker types; P^A = {P_1^A, P_2^A, ..., P_K^A} is the set of the defender's probability distributions over attacker types, where P_k^A is the distribution in state s_k; A = {A_1, A_2, ..., A_K} is the attacker action set, where A_k = {a_1, a_2, ..., a_n} is the action set in state s_k; D = {D_1, D_2, ..., D_K} is the defender action set, where D_k = {d_1, d_2, ..., d_n} is the action set in state s_k; π^A is the strategy set of the attacker; π^D is the strategy set of the defender; U^A and U^D are the utility functions of the attacker and the defender, respectively.
3. The method for generating the defense strategy of the industrial information physical system based on the Bayesian random game according to claim 2, wherein in the step 2), game utility functions of both the attacking and defending party in the Bayesian random game model aiming at the information layer and the physical layer are constructed by taking time as a quantization index, specifically:
2.1) Define the attack time T_a: the time required for an attacker to perform an attack action against a particular target, quantifying the period from vulnerability scanning to the attacker's successful exploitation of the vulnerability;
2.2) Define the defense time T_d: the time required for the defender to perform a defensive action, quantifying the period from recognition of the attack to completion of the defensive action;
2.3) Define the recovery time T_r: the time required for the attacked device to resume its normal operating state after an attack; to uniformly quantize the recovery times of the information layer and the physical layer, T_r is defined as:
T_r = T_{r,c} + T_{r,p}
where T_{r,c} is the time required to restore the compromised device to a normal state through bug fixes or switching to a standby device, and T_{r,p} is the time required to restore the physical-layer control process to a normal state; when the attacked device is an information-layer host, T_r = T_{r,c}; when the attacked device is a physical-layer sensor or actuator, T_r = T_{r,c} + T_{r,p}, meaning that the defender must not only repair the damaged equipment but also restore the control process;
2.4) To calculate T_{r,p}, the physical-layer control process is modeled as:
x_{k+1} = A x_k + B u_k + B_a u'_k + w_k
y_k = C x_k + v_k
where x_k is the system state at time k, y_k is the measured output at time k, u_k is the control input, u'_k is the attack signal at time k, A is the state transition matrix, B is the input control matrix, C is the output observation matrix, B_a is the attack matrix, w_k is process noise, and v_k is measurement noise; T_{r,p} is the overall recovery time from the moment the state variables deviate from the normal range under attack until they return to the normal range.
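A minimal numerical sketch of this model shows how T_{r,p} can be measured as the number of steps until the state decays back into range. The scalar matrices, the normal-range threshold, and the noise-free free-decay assumption are all illustrative choices, not values from the patent:

```python
import numpy as np

# Sketch of the discrete-time model x_{k+1} = A x_k + B u_k + B_a u'_k + w_k,
# y_k = C x_k + v_k, reduced to a scalar example. After the attack signal
# ends we let the state decay freely (no control input, no noise) and count
# the steps until it re-enters the assumed normal range.

A = np.array([[0.9]])   # state transition matrix (stable: |0.9| < 1)
B = np.array([[0.1]])   # input control matrix (unused in the free decay)
Ba = np.array([[1.0]])  # attack matrix (attack already over in this sketch)
C = np.array([[1.0]])   # output observation matrix

NORMAL_RANGE = 0.5      # |x| below this counts as "normal" (assumed)

def recovery_steps(x0: float, max_steps: int = 200) -> int:
    """Steps until the unforced state decays back into the normal range."""
    x = np.array([[x0]])
    for k in range(max_steps):
        if abs(float(x[0, 0])) < NORMAL_RANGE:
            return k
        x = A @ x  # free decay: no control, attack, or noise terms
    return max_steps

# After an attack drove the state to x = 3.0, count the decay steps:
t_rp = recovery_steps(3.0)
```

In a full simulation the B, B_a, C matrices, noise terms, and a controller would all participate; here only the stable A matrix matters for the decay.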
4. The method for generating the defense strategy of the industrial information physical system based on the Bayesian random game according to claim 2, wherein the Harsanyi conversion method in the step 3) is as follows:
3.1) Introduce a third-party participant, converting the defender's incomplete information about the attacker's utility function into imperfect information about the third-party participant's actions;
3.2) After the attacker and the defender select their respective actions, the third-party participant selects the next game state according to the transition probabilities; the new state can be observed by both the attacker and the defender;
3.3) The third-party participant selects the attacker type according to the defender's probability distribution over attacker types; the selected type can be observed only by the attacker;
3.4) After the third-party participant completes its selections, the attacker and the defender each select actions according to their respective strategies and then receive their respective rewards;
3.5 Repeating the processes from 3.1) to 3.4) until the game is finished, and obtaining the imperfect information Bayesian random game model with participation of various attacker types.
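The third-party participant's role in steps 3.2) and 3.3) amounts to sampling from two distributions. The sketch below illustrates this "Nature" move; the transition distribution and belief values are illustrative assumptions:

```python
import random

# Sketch of the Harsanyi conversion's third-party participant ("Nature"):
# it samples the next game state from the transition distribution
# O(. | s_t, a, d), then samples the attacker type from the defender's
# belief P_k^A. Only the attacker observes the sampled type.

def nature_move(transition: dict, type_belief: dict,
                rng: random.Random) -> tuple:
    """Return (next_state, attacker_type) sampled by the third party."""
    next_state = rng.choices(list(transition),
                             weights=transition.values())[0]
    attacker_type = rng.choices(list(type_belief),
                                weights=type_belief.values())[0]
    return next_state, attacker_type

rng = random.Random(0)  # seeded for reproducibility
state, atk_type = nature_move({"S2": 0.7, "S3": 0.3},
                              {"theta1": 0.5, "theta2": 0.5}, rng)
```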
5. The method for generating an industrial information physical system defense strategy based on Bayesian random game according to claim 4, wherein the utility functions of the attacker and the defender in step 3) are defined as:
U^A(s_t, a, d, θ_i) = R^A(s_t, a, d, θ_i) + Σ_{s_k ∈ S} O(s_k | s_t, a, d) V^A(s_k)
U^D(s_t, a, d, θ_i) = R^D(s_t, a, d, θ_i) + Σ_{s_k ∈ S} O(s_k | s_t, a, d) V^D(s_k)
where R^A(s_t, a, d, θ_i) and R^D(s_t, a, d, θ_i) are the immediate rewards of the attacker and the defender, and the summation terms are the future rewards;
the immediate rewards in the utility function are defined as:
R^A(s_t, a, d, θ_i) = ε(s_t, a, d) T_r(a, θ_i) − T_a(θ_i)
R^D(s_t, a, d, θ_i) = −ε(s_t, a, d) T_r(a, θ_i) − T_d
where T_r(a, θ_i) is the recovery time required by the system after attacker θ_i attacks successfully, T_a(θ_i) is the time attacker θ_i requires to perform the attack action, ε(s_t, a, d) is the probability that the attack succeeds under the action pair (a, d), and T_d is the time required to implement the defensive action;
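These immediate-reward formulas translate directly into code; the numeric values below are illustrative assumptions:

```python
# Sketch of the immediate rewards. eps is the attack-success probability
# under the action pair (a, d); t_r, t_a, t_d are the recovery, attack,
# and defense times from the time-based quantization.

def attacker_reward(eps: float, t_r: float, t_a: float) -> float:
    """R_A = epsilon(s_t, a, d) * T_r(a, theta_i) - T_a(theta_i)"""
    return eps * t_r - t_a

def defender_reward(eps: float, t_r: float, t_d: float) -> float:
    """R_D = -epsilon(s_t, a, d) * T_r(a, theta_i) - T_d"""
    return -eps * t_r - t_d

# Example: a 60%-likely attack whose success costs 50 time units to
# recover from, taking the attacker 10 units to mount and the defender
# 5 units to counter.
r_a = attacker_reward(0.6, 50.0, 10.0)  # 20.0
r_d = defender_reward(0.6, 50.0, 5.0)   # -35.0
```

Note that r_a + r_d ≠ 0 in general, consistent with the non-zero-sum formulation of the game.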
For the future rewards in the utility functions, O(s_k | s_t, a, d) is the probability of transitioning from state s_t to state s_k when the attacker and the defender take the action pair (a, d), with O(s_k | s_t, a, d) = ε_k(a, d); V^A(s_k) and V^D(s_k) are the expected utility values in state s_k, also referred to as state values, obtained as:
V^A(s_k) = Σ_{θ_i ∈ Θ} P_k^A(θ_i) U^A(s_k, π^{A*}, π^{D*}, θ_i)
V^D(s_k) = Σ_{θ_i ∈ Θ} P_k^A(θ_i) U^D(s_k, π^{A*}, π^{D*}, θ_i)
The Bayesian Nash equilibrium (π^{A*}, π^{D*}) of each game state satisfies, for every attacker type θ_i and all strategies π^A and π^D,
U^A(s_t, π^A, π^{D*}, θ_i) ≤ U^A(s_t, π^{A*}, π^{D*}, θ_i)
U^D(s_t, π^{A*}, π^D) ≤ U^D(s_t, π^{A*}, π^{D*})
and (π^{A*}, π^{D*}) is the Bayesian Nash equilibrium strategy. Because the attacker type set Θ, the state set S, the attacker action set A, and the defender action set D are all finite sets, a Bayesian Nash equilibrium exists in the game, i.e., a Bayesian Nash equilibrium defense strategy exists.
6. The method for generating the defense strategy of the industrial information physical system based on the Bayesian random game according to claim 2, wherein in step 4), the Nash equilibrium defense strategy is obtained by learning with the multi-agent Bayesian Q-learning algorithm (MABQL) under a dynamic network environment with unknown game parameters, specifically:
4.1) The Q-function of the Q-learning algorithm is defined as:
Q_{r+1}(s_t, a, d) = (1 − α) Q_r(s_t, a, d) + α [R(s_t, a, d) + γ V(s_{t+1})]
where α is the learning rate and γ is the discount factor; the expected payoff of the Q-function is its expectation over the defender's belief about the attacker type:
E[Q(s_t, a, d)] = Σ_{θ_i ∈ Θ} P_t^A(θ_i) Q(s_t, a, d, θ_i)
4.2) The flow of learning the Nash equilibrium defense strategy is as follows: (1) initialize the parameters: r ← 0 and the Q-values of each state s_k, k = 1, 2, ..., K; (2) the third-party participant selects the current game state s_t; (3) the third-party participant selects attacker type θ_i according to the distribution over Θ; (4) both players select actions according to the ε-greedy strategy; (5) the third-party participant selects the next state; (6) update the Q-function using the formula in 4.1); (7) compute the Nash equilibrium strategy of the current state; (8) set r ← r + 1 and jump to (2), repeating the above process; (9) when the convergence condition |Q_{r+1} − Q_r| < δ is met, where δ is a threshold parameter, return the Nash equilibrium strategy, obtaining the optimal strategies of all attacker types and the optimal defense strategy that cooperatively defends against all attackers.
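The Q-update and ε-greedy selection at the heart of the flow above can be sketched as follows. This is a simplified tabular, single-type illustration rather than the full multi-agent Bayesian algorithm; the constants match the values used in the experiments (α = 0.5, γ = 1):

```python
import random

# Sketch of the core MABQL updates:
#   Q(s, a, d) <- (1 - alpha) * Q(s, a, d) + alpha * (r + gamma * V(s'))
# with an epsilon-greedy action choice for exploration.

ALPHA, GAMMA, EPS_GREEDY = 0.5, 1.0, 0.1  # learning rate, discount, exploration

def q_update(q: dict, s: str, a: str, d: str, r: float,
             v_next: float) -> None:
    """One temporal-difference update of the tabular Q-function."""
    key = (s, a, d)
    q[key] = (1 - ALPHA) * q.get(key, 0.0) + ALPHA * (r + GAMMA * v_next)

def epsilon_greedy(q: dict, s: str, actions: list, d: str,
                   rng: random.Random) -> str:
    """Explore with probability EPS_GREEDY, otherwise act greedily."""
    if rng.random() < EPS_GREEDY:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((s, a, d), 0.0))

q = {}
q_update(q, "S1", "a1", "d1", r=2.0, v_next=4.0)  # Q = 0.5*0 + 0.5*6 = 3.0
q_update(q, "S1", "a1", "d1", r=2.0, v_next=4.0)  # Q = 0.5*3 + 0.5*6 = 4.5
```

In the full algorithm V(s') would be the Nash-equilibrium state value of the next state and the Q-table would additionally be indexed by attacker type with belief-weighted expectations.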
CN202310894809.2A 2023-07-20 2023-07-20 Method for generating defense strategy of industrial information physical system based on Bayesian random game Active CN117040809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310894809.2A CN117040809B (en) 2023-07-20 2023-07-20 Method for generating defense strategy of industrial information physical system based on Bayesian random game

Publications (2)

Publication Number Publication Date
CN117040809A true CN117040809A (en) 2023-11-10
CN117040809B CN117040809B (en) 2024-04-05

Family

ID=88638137

Country Status (1)

Country Link
CN (1) CN117040809B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105307175A (en) * 2015-09-22 2016-02-03 绍兴文理学院 Method for selecting IDA (intrusion detection agent) start strategies of wireless sensor network
CN107070956A (en) * 2017-06-16 2017-08-18 福建中信网安信息科技有限公司 APT Attack Prediction methods based on dynamic bayesian game
US20190385473A1 (en) * 2017-03-03 2019-12-19 Mbda France Method and device for predicting optimum attack and defence solutions in a military conflict scenario
CN112115469A (en) * 2020-09-15 2020-12-22 浙江科技学院 Edge intelligent moving target defense method based on Bayes-Stackelberg game
CN115102166A (en) * 2022-07-27 2022-09-23 南京邮电大学 Active power distribution network dynamic defense performance optimization method based on game theory
CN115348064A (en) * 2022-07-28 2022-11-15 南京邮电大学 Power distribution network defense strategy design method based on dynamic game under network attack
CN115932752A (en) * 2023-01-06 2023-04-07 中国人民解放军海军大连舰艇学院 Radar cognitive interference decision method based on incomplete information game
US20230185912A1 (en) * 2021-12-13 2023-06-15 International Business Machines Corporation Defending deep generative models against adversarial attacks




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant