CN116032653A

CN116032653A - Method, device, equipment and storage medium for constructing network security game strategy

Info

Publication number: CN116032653A
Application number: CN202310098667.9A
Authority: CN
Inventors: 夏辉; 封学财; 姜曙亮; 张睿; 徐硕
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2023-02-03
Filing date: 2023-02-03
Publication date: 2023-04-28

Abstract

The application discloses a method, a device, equipment and a storage medium for constructing a network security game strategy, relating to the field of network security. The method comprises: acquiring a preset resource grid cell diagram and preset proxy resources, where the preset proxy resources comprise a preset defender and a preset attacker, and the preset defender comprises a preset detector and a preset executor; controlling the preset attacker and the preset defender to move in the preset resource grid cell diagram by using an attacker multi-parameter behavior model and a current security game strategy, and counting the corresponding movement information; judging whether the movement information meets a preset game stop condition; and if so, stopping the current round of the game, determining the current defender score by using a preset score determining rule, and updating the current security game strategy according to the current defender score, so as to construct a corresponding target network security game strategy based on the updated current security game strategy. In this way, the stability of the network security game strategy can be increased.

Description

Method, device, equipment and storage medium for constructing network security game strategy

Technical Field

The present invention relates to the field of network security, and in particular, to a method, an apparatus, a device, and a storage medium for constructing a network security game policy.

Background

The Internet of Things (IoT) is an ecosystem formed by interconnecting many terminal devices over a network, and it has given rise to new paradigms such as the Industrial Internet of Things (IIoT) and the Internet of Medical Things (IoMT). The Internet of Things makes the devices we rely on more flexible and intelligent, but as the number of networked devices grows, the number and complexity of network attacks also increase significantly, and the lack of corresponding defensive measures exposes the private data of high-value terminal devices to a serious risk of leakage.

However, existing defense schemes give insufficient consideration to how limited defense resources should be allocated. Moreover, because the game model lacks an effective coordination mechanism among multiple defense agents, the stability of the defense strategy is greatly reduced, and when facing complex variants of the game model, existing schemes have difficulty constructing a stable defense strategy.

Therefore, how to construct a relatively stable network security game strategy is a current urgent problem in the art.

Disclosure of Invention

Accordingly, the present invention is directed to a method, apparatus, device and storage medium for constructing a network security game policy, which can improve the learning effect of the network security game policy, thereby increasing the stability of the defending policy in the network security game policy. The specific scheme is as follows:

in a first aspect, the present application provides a method for constructing a network security game policy, including:

acquiring a preset resource grid cell diagram and preset proxy resources; the preset resource grid cell diagram is a diagram which is built in advance and comprises a plurality of terminal equipment resources in each grid cell; the preset proxy resource comprises a preset defender and a preset attacker, wherein the preset defender comprises a preset detector for detecting the attacker and a preset executor for clearing the detected attacker;

controlling the preset attacker and the preset defender to move in the preset resource grid cell diagram by utilizing an attacker multiparameter behavior model and a current security game strategy, and counting corresponding movement information;

judging whether the currently counted movement information meets a preset game stop condition;

if yes, stopping the round of game, determining a current defender score by using a preset score determining rule, and updating the current security game strategy according to the current defender score so as to construct a corresponding target network security game strategy based on the updated current security game strategy.

Optionally, the judging whether the currently counted movement information meets a preset game stop condition includes:

judging whether the current number of position updates in the movement information is greater than a preset update-count threshold;

and if the current number of position updates is greater than the preset update-count threshold, determining that the currently counted movement information meets the preset game stop condition.

Optionally, the determining the current defending person score by using the preset score determining rule includes:

and determining the score of the current defender according to the number of preset attackers meeting the preset attacker clearing condition in the process of the round of game.

Optionally, the controlling the preset defender to move in the preset resource grid cell map by using the current secure game policy includes:

and controlling the preset defenders to move in the preset resource grid cell diagram by utilizing a preset multi-agent Dueling DQN network, a preset centralized training and decentralized execution framework and the current security game strategy.

Optionally, the updating the current security game policy according to the current defender score so as to construct a corresponding target network security game policy based on the updated current security game policy includes:

updating the current security game strategy according to the current defender score;

judging whether a preset update stop condition is met currently;

if not, returning to the step of controlling the preset attacker and the preset defender to move in the preset resource grid cell diagram by utilizing the attacker multi-parameter behavior model and the current security game strategy;

if yes, constructing a corresponding target network security game strategy based on the updated current security game strategy.

Optionally, the controlling the preset attacker and the preset defender to move in the preset resource grid cell diagram by using the attacker multiparameter behavior model and the current security game policy includes:

acquiring current resource information of each grid cell on the resource grid cell map; the current resource information is information including the number of terminal equipment resources contained in the current grid cell;

determining an initial attack position of the preset attacker, an initial detection position of the preset detector and an initial execution position of the preset executor in the preset resource grid cell diagram by utilizing the current resource information and the current security game strategy;

Determining mobile parameters of the preset attacker and the preset defender based on the attacker multiparameter behavior model and the current security game strategy respectively; the attacker multi-parameter behavior model comprises the initial detection position, the initial execution position and the current resource information;

and controlling the preset attacker and the preset defender to move in the preset resource grid cell diagram based on the initial attack position of the preset attacker, the initial detection position of the preset detector, the initial execution position of the preset executor and the movement parameters of the preset attacker and the preset defender.

Optionally, determining the movement parameter of the preset defender based on the current security game policy includes:

determining a movement parameter of the preset detector based on the current security game strategy, and acquiring a detection result of the preset detector on a current detection position in the preset resource grid cell diagram;

and determining the movement parameters of the preset executor according to the detection result and the current security game strategy.

In a second aspect, the present application provides a network security game policy building device, including:

The preset resource information acquisition module is used for acquiring a preset resource grid cell diagram and preset proxy resources; the preset resource grid cell diagram is a diagram which is built in advance and comprises a plurality of terminal equipment resources in each grid cell; the preset proxy resource comprises a preset defender and a preset attacker, wherein the preset defender comprises a preset detector for detecting the attacker and a preset executor for clearing the detected attacker;

the agent resource moving module is used for controlling the preset attacker and the preset defender to move in the preset resource grid cell diagram by utilizing the attacker multi-parameter behavior model and the current security game strategy, and counting corresponding movement information;

the game stop condition judging module is used for judging whether the currently counted movement information meets a preset game stop condition;

and the target security game strategy construction module is used for stopping the round of game and determining the current defender score by utilizing a preset score determination rule if the round of game is yes, and updating the current security game strategy according to the current defender score so as to construct a corresponding target network security game strategy based on the updated current security game strategy.

In a third aspect, the present application provides an electronic device, including:

a memory for storing a computer program;

and the processor is used for executing the computer program to realize the network security game policy construction method.

In a fourth aspect, the present application provides a computer readable storage medium storing a computer program, where the computer program when executed by a processor implements the network security game policy building method described above.

In the application, a preset resource grid cell diagram and preset proxy resources are acquired; the preset resource grid cell diagram is a diagram which is built in advance and comprises a plurality of terminal equipment resources in each grid cell; the preset proxy resource comprises a preset defender and a preset attacker, wherein the preset defender comprises a preset detector for detecting the attacker and a preset executor for clearing the detected attacker; controlling the preset attacker and the preset defender to move in the preset resource grid cell diagram by utilizing an attacker multiparameter behavior model and a current security game strategy, and counting corresponding movement information; judging whether the currently counted mobile information meets a preset game stop condition or not; if yes, stopping the round of game, determining a current defender score by using a preset score determining rule, and updating the current security game strategy according to the current defender score so as to construct a corresponding target network security game strategy based on the updated current security game strategy. Through the scheme, the preset attacker and the preset defender can be controlled to move in the preset resource grid cell diagram by utilizing the attacker multi-parameter behavior model and the current security game strategy so as to complete the game of the current round and update the current security game strategy, so that the corresponding target network security game strategy is constructed based on the updated current security game strategy. Therefore, the learning effect of the network security game strategy can be improved, and the stability of the defending strategy in the network security game strategy is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a method for constructing a network security game policy provided in the present application;

FIG. 2 is a schematic diagram of a network security gaming process provided in the present application;

FIG. 3 is a flowchart of a specific method for constructing a network security game policy;

FIG. 4 is a diagram of a network security gaming policy framework provided herein;

FIG. 5 is a schematic structural diagram of a network security game policy building device provided in the present application;

FIG. 6 is a block diagram of an electronic device provided in the present application.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Existing defense schemes give insufficient consideration to how limited defense resources should be allocated. Moreover, because the game model lacks an effective coordination mechanism among multiple defense agents, the stability of the defense strategy is greatly reduced, and when facing complex variants of the game model, existing schemes have difficulty constructing a stable defense strategy. Therefore, the present application provides a method for constructing a network security game strategy that can improve the learning effect of the network security game strategy and thereby improve the stability of the defense strategy within it.

Referring to fig. 1, the embodiment of the invention discloses a method for constructing a network security game strategy, which comprises the following steps:

Step S11, acquiring a preset resource grid cell diagram and preset proxy resources; the preset resource grid cell diagram is a diagram which is built in advance and comprises a plurality of terminal equipment resources in each grid cell; the preset proxy resources comprise preset defenders and preset attackers, wherein the preset defenders comprise preset detectors for detecting the attackers and preset executors for clearing the detected attackers.

In this embodiment, the environment of the game model is built in advance. The physical world connected by the terminal network in a real Internet of Things is divided by region and deployed as a grid world; this grid world represents the area to be protected, that is, the preset resource grid cell diagram, and each grid cell in it contains the terminal equipment resources of the corresponding region. The preset resource grid cell diagram also contains a plurality of preset proxy resources, which comprise preset defenders and preset attackers; the preset defenders comprise preset detectors for detecting attackers and preset executors for clearing detected attackers.
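The grid world described above can be sketched as a small data structure. This is only an illustrative sketch: the names `GridWorld` and `make_grid_world`, the random device counts and the `max_devices` bound are assumptions introduced here and do not come from the application.

```python
import random
from dataclasses import dataclass, field


@dataclass
class GridWorld:
    """n x n grid; resources[r][c] is the number of terminal devices in cell (r, c)."""
    n: int
    resources: list
    attackers: list = field(default_factory=list)  # (row, col) positions
    detectors: list = field(default_factory=list)
    executors: list = field(default_factory=list)


def make_grid_world(n, max_devices=10, seed=0):
    # Hypothetical initialisation: each cell gets 1..max_devices terminal devices.
    rng = random.Random(seed)
    resources = [[rng.randint(1, max_devices) for _ in range(n)] for _ in range(n)]
    return GridWorld(n=n, resources=resources)


world = make_grid_world(4)
world.attackers.append((0, 0))
world.detectors.append((1, 2))
world.executors.append((3, 3))
```

In a real deployment the device counts would come from the regional partition of the terminal network rather than from a random generator.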

Step S12, controlling the preset attacker and the preset defender to move in the preset resource grid cell diagram by utilizing the attacker multi-parameter behavior model and the current security game strategy, and counting corresponding movement information.

In this embodiment, as shown in FIG. 2, the preset defender and the preset attacker play the game in the preset resource grid cell diagram. The leftmost grid in FIG. 2 is the preset resource grid cell diagram, in which A, B and C denote the preset attacker, the preset detector and the preset executor, respectively. The specific game process is as follows. The preset attacker and the preset defender are controlled to move in the preset resource grid cell diagram by using the attacker multi-parameter behavior model and the current security game strategy. The preset attacker attacks the grid cell in which it is currently located, and uses the attacker multi-parameter behavior model to determine a target attack grid cell, to which it then moves to attack. The preset detector of the preset defender has a detection function: when the preset detector and the preset attacker are in the same grid cell at the same time, the preset detector can detect the preset attacker. The preset detector can send warning information to a preset attacker in the grid cell to deter the attack, but it cannot clear the preset attacker, and the preset attacker decides from the warning information whether to continue attacking the current grid cell. The preset detector can also send the detection result obtained for the grid cell in which it is located to the preset executor, and the preset executor determines its current target grid cell according to the detection result and the current security game strategy. If the preset executor and the preset attacker are in the same grid cell at the same time, the preset executor can detect and clear the preset attacker.

The action spaces of the preset detector, the preset executor and the preset attacker can be expressed as follows:

Preset detector: [up, down, left, right, stay in place] × [no operation, alert preset attacker, notify preset executor];

Preset executor: [up, down, left, right, stay in place];

Preset attacker: [up, down, left, right, stay in place].
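The three action spaces listed above can be enumerated directly; the detector's space is the Cartesian product of five moves and three signalling actions, giving 15 joint actions. The identifier names and the boundary rule in `apply_move` (a move that would leave the grid keeps the agent in place) are assumptions made for this sketch.

```python
from itertools import product

# Row index grows downward, so "up" decreases the row.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1), "stay": (0, 0)}
SIGNALS = ("no_op", "alert_attacker", "notify_executor")

DETECTOR_ACTIONS = list(product(MOVES, SIGNALS))  # 5 moves x 3 signals
EXECUTOR_ACTIONS = list(MOVES)
ATTACKER_ACTIONS = list(MOVES)


def apply_move(pos, move, n):
    """Apply a move inside an n x n grid; off-grid moves leave the position unchanged (assumed rule)."""
    dr, dc = MOVES[move]
    r, c = pos[0] + dr, pos[1] + dc
    return (r, c) if 0 <= r < n and 0 <= c < n else pos
```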

Step S13, judging whether the currently counted movement information meets a preset game stop condition.

In this embodiment, whether the preset game stop condition is met is judged from the currently counted movement information of the preset attacker and the preset defender. It may be appreciated that judging whether the currently counted movement information meets the preset game stop condition may specifically include: judging whether the current number of position updates in the movement information is greater than a preset update-count threshold; and if so, determining that the currently counted movement information meets the preset game stop condition. That is, the position updates of the preset defender can be counted to determine the current number of position updates in the movement information. If this number is greater than the preset update-count threshold, the preset attackers have not been completely cleared within the preset number of updates; the currently counted movement information is then judged to meet the preset game stop condition, and the current round of the game is stopped.
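A minimal sketch of this stop condition follows. `MovementInfo` and `should_stop` are hypothetical names; the only behaviour taken from the text is the strict "greater than" comparison against the threshold.

```python
from dataclasses import dataclass


@dataclass
class MovementInfo:
    """Counted movement information for one round (only the update counter is modelled here)."""
    position_updates: int = 0


def should_stop(info, max_updates):
    # The round stops only once the counter strictly exceeds the threshold.
    return info.position_updates > max_updates
```

With a threshold of 10, the round continues at exactly 10 position updates and stops at 11.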

Step S14, if yes, stopping the round of game, determining a current defender score by using a preset score determining rule, and updating the current security game strategy according to the current defender score so as to construct a corresponding target network security game strategy based on the updated current security game strategy.

In this embodiment, if the currently counted movement information meets the preset game stop condition, the current round of the game is stopped, and the current defender score is determined by using the preset score determining rule. It may be appreciated that determining the current defender score by using the preset score determining rule may specifically include: determining the current defender score according to the number of preset attackers meeting the preset attacker clearing condition during the round. For the preset defender, if a preset attacker is detected and cleared in a time step of the current game, the preset defender obtains a corresponding positive reward $r^{d+} > 0$; if the preset attacker is not cleared or stops attacking, the preset defender obtains a corresponding negative reward $r^{d-} < 0$. These rewards are related to the number of terminal equipment resources in the grid cell, and the total reward of the preset defender at each time step is denoted $r_t^d$. The current defender score of the preset defender is denoted $R^d$ and is obtained by

[formula not reproduced in the source text]

The current attacker score of the preset attacker is denoted $R^a$; because the game is zero-sum, $R^a = -R^d$.
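The scoring rule can be illustrated with a small sketch. Two assumptions are made here because the text does not fix them: the round score is taken as the plain (undiscounted) sum of the per-time-step rewards, and the magnitude of each reward equals the number of terminal devices in the affected cell. Only the sign convention and the zero-sum relation between the two scores come from the description.

```python
def step_reward(events):
    """Defender reward for one time step.

    events: list of (cleared, devices_in_cell) pairs, one per attacker.
    Assumption: reward magnitude equals the device count of the cell.
    """
    return sum(devices if cleared else -devices for cleared, devices in events)


def round_scores(per_step_events):
    # Assumption: the round score is the undiscounted sum of step rewards.
    defender_score = sum(step_reward(events) for events in per_step_events)
    attacker_score = -defender_score  # zero-sum game
    return defender_score, attacker_score


# Step 1: one attacker cleared in a 3-device cell.
# Step 2: one attacker missed in a 2-device cell, one cleared in a 1-device cell.
scores = round_scores([[(True, 3)], [(False, 2), (True, 1)]])
```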

It may be appreciated that the updating the current security game policy according to the current defender score so as to construct a corresponding target network security game policy based on the updated current security game policy may specifically include: updating the current safety game strategy according to the current defender score; judging whether a preset update stop condition is met currently; if not, re-jumping to the step of controlling the preset attacker and the preset defender to move in the preset resource grid cell diagram by utilizing the attacker multi-parameter behavior model and the current security game strategy; if yes, constructing a corresponding target network security game strategy based on the updated current security game strategy. Wherein, the determining whether the preset update stop condition is satisfied currently may include: judging whether the current strategy updating times are larger than a preset strategy updating times threshold value or not; if yes, judging that the preset updating stop condition is met currently. In this way, if the current policy update times is greater than the preset policy update times threshold, a corresponding target network security game policy can be constructed based on the updated current security game policy; and if the current strategy updating times are not greater than the preset strategy updating times threshold, re-jumping to the step of controlling the preset attacker and the preset defender to move in the preset resource grid cell diagram by utilizing the attacker multi-parameter behavior model and the current safety game strategy.
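The outer loop described above (play a round, update the strategy, check the update-count threshold, repeat) can be sketched as follows. The callables `play_round` and `update_policy` are placeholders supplied by the caller; they stand in for the game simulation and the learning update, neither of which is specified at this level of the description.

```python
def build_target_policy(policy, play_round, update_policy, max_policy_updates):
    """Repeat rounds until the number of strategy updates exceeds the threshold."""
    updates = 0
    while True:
        defender_score = play_round(policy)             # one full round of the game
        policy = update_policy(policy, defender_score)  # strategy update from the score
        updates += 1
        if updates > max_policy_updates:                # preset update stop condition
            return policy


# Toy usage: the "policy" is a number and every round scores 1.
final_policy = build_target_policy(
    policy=0,
    play_round=lambda p: 1,
    update_policy=lambda p, score: p + score,
    max_policy_updates=3,
)
```

Because the stop condition is a strict "greater than", a threshold of 3 yields four updates in this toy run.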

In this embodiment, a preset resource grid cell diagram and a preset proxy resource are obtained; the preset resource grid cell diagram is a diagram which is built in advance and comprises a plurality of terminal equipment resources in each grid cell; the preset proxy resource comprises a preset defender and a preset attacker, wherein the preset defender comprises a preset detector for detecting the attacker and a preset executor for clearing the detected attacker; controlling the preset attacker and the preset defender to move in the preset resource grid cell diagram by utilizing an attacker multiparameter behavior model and a current security game strategy, and counting corresponding movement information; judging whether the currently counted mobile information meets a preset game stop condition or not; if yes, stopping the round of game, determining a current defender score by using a preset score determining rule, and updating the current security game strategy according to the current defender score so as to construct a corresponding target network security game strategy based on the updated current security game strategy. Through the scheme, the preset attacker and the preset defender can be controlled to move in the preset resource grid cell diagram by utilizing the attacker multi-parameter behavior model and the current security game strategy so as to complete the game of the current round and update the current security game strategy, so that the corresponding target network security game strategy is constructed based on the updated current security game strategy. Therefore, the learning effect of the network security game strategy can be improved, and the stability of the defending strategy in the network security game strategy is improved.

Referring to fig. 3, the embodiment of the invention discloses a specific method for constructing a network security game policy, which comprises the following steps:

Step S21, acquiring a preset resource grid cell diagram and preset proxy resources; the preset resource grid cell diagram is a diagram which is built in advance and comprises a plurality of terminal equipment resources in each grid cell; the preset proxy resources comprise preset defenders and preset attackers, wherein the preset defenders comprise preset detectors for detecting the attackers and preset executors for clearing the detected attackers.

Step S22, obtaining the current resource information of each grid cell on the resource grid cell diagram; the current resource information is information including the number of terminal device resources contained in the current grid cell.

Step S23, determining an initial attack position of the preset attacker, an initial detection position of the preset detector and an initial execution position of the preset executor in the preset resource grid cell diagram by using the current resource information and the current security game strategy.

In this embodiment, the preset defender and the preset attacker play the game in the preset resource grid cell diagram, and the game can be divided into two phases: an allocation phase and a patrol phase. FIG. 4 shows the security game strategy framework disclosed in the present application.

In the allocation phase, when the game is a zero-sum game, common solution concepts include the Nash Equilibrium (NE) and the Stackelberg Security Equilibrium (SSE). In the present application, an n×n preset resource grid cell diagram is constructed in advance, which gives a total of

[formula not reproduced in the source text]

possible preset defender allocations, where $n_d$ is the number of preset detectors and $n_e$ is the number of preset executors. Because conventional optimization with LP (Linear Programming) or MILP (Mixed-Integer Linear Programming) may fail to converge, the present application determines the allocation strategies with a gradient-based approximation method. Allocations are represented by learned embeddings, and auxiliary embeddings are used to compensate for allocation information that the learned embedding fails to capture. The strategy is also decomposed into a component that operates in the low-dimensional space of action representations and a component that converts a representation into an actual action; because the representation allows generalization across actions, this improves both learning speed and performance. The present application may generate a corresponding allocation data set from the preset resource grid cell diagram and use it to train an autoencoder neural network to learn the action-space representation (embedding) of the allocations, that is, of the possible operations of the proxy resources. The encoder network yields the primary embeddings $e^d$ and $e^a$ of the allocations of the preset defender and the preset attacker, together with the auxiliary embedded action representations $e'^{d}$ and $e'^{a}$. Neural networks are also used to parameterize the allocation strategies of the preset attacker and the preset defender: one is the primary allocation policy network and the other is the secondary allocation policy network, given by $\pi^d(\cdot \mid s_i; \sigma^d)$, $\pi^a(\cdot \mid s_i; \sigma^a)$ and $\pi'^{d}(\cdot \mid s_i; \sigma'^{d})$, $\pi'^{a}(\cdot \mid s_i; \sigma'^{a})$ respectively, where $\sigma^d$, $\sigma^a$, $\sigma'^{d}$, $\sigma'^{a}$ denote the weights of the corresponding neural networks. The terminal equipment resources of the preset resource grid cell diagram are represented by a two-dimensional tensor, which is provided to the networks as the state information $s$.

Since the present application uses two allocation networks, the returns R of both the preset attacker and the preset defender depend on both the primary and the secondary allocation policy networks. Following the policy optimization method COPO (Coordinated Policy Optimization), the present application takes the training of the primary allocation policy network as an example (the secondary allocation policy network is adjusted correspondingly): the relevant weights of the primary allocation policy network are updated using the return R obtained jointly from the primary embedding and the auxiliary embedding, so that the combined effect of the two allocation networks is taken into account. The objective function of the preset defender and of the preset attacker is the expected return, which can be expressed as:

[leading terms not reproduced in the source text] $R^d(s, e^d, e^a, e'^{d}, e'^{a})\, \mathrm{d}s\, \mathrm{d}e^d\, \mathrm{d}e^a$

where $F^d$ and $F^a$ are the objective functions of the preset defender and the preset attacker respectively, and $F^a = -F^d$.

The two objectives are solved simultaneously, and the weights of the corresponding networks are updated through COPO to obtain $\sigma^{d*}$ and $\sigma^{a*}$ [expressions not reproduced in the source text].

Step S24, determining the movement parameters of the preset attacker and the preset defender based on the attacker multi-parameter behavior model and the current security game strategy respectively; the attacker multi-parameter behavior model comprises the initial detection position, the initial execution position and the current resource information.

In this embodiment, the attacker multi-parameter behavior model, which determines the attack behavior strategy in the patrol phase, is built from the current resource information of the grid cells in the preset resource grid cell diagram and the initial allocation positions of the preset defenders. The preset attacker's choice of target attack grid cell depends on the terminal device density of the grid cell, the terminal device density of the adjacent grid cells, and the distance to the initial allocation positions of the preset defenders. This reflects that a preset attacker tries to attack areas where the current cell and its surroundings contain many terminal equipment resources, while staying as far as possible from the preset defenders to avoid detection and clearing. The preset attacker calculates an attack score for each grid cell i: it calculates the standardized distance between each grid cell and the initial allocation positions of the preset defenders, and a weighted average score for each grid cell from the terminal densities of that cell and its surrounding cells. In each subsequent episode, the attack score of the grid cell is updated according to the following formula:

$p_{i,ep} = p_{i,ep} + \beta \cdot p_{i,ep-1}, \quad \text{episode} > 1$

In the attack score, λ is a measurement parameter, $\omega_i$ is the terminal density of grid cell i, η is the influence parameter of the adjacent cells, $\omega_j$ is the terminal density of a cell adjacent to grid cell i, dist is the distance from the preset attacker to the initial allocation position of a preset defender, and β is the discount parameter. At each time step of each episode, the preset attacker selects the adjacent cell with the highest score for the next attack.
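The attacker's scoring can be sketched as below. The episode update $p_{i,ep} \leftarrow p_{i,ep} + \beta \cdot p_{i,ep-1}$ follows the formula in the text; everything else is an assumption made for illustration, in particular the form of the base score (λ times the cell density, plus η times the mean neighbour density, plus a Manhattan distance to the nearest defender normalised by the grid diameter), the four-neighbour adjacency, and the default parameter values. The grid is assumed to be at least 2×2.

```python
def neighbors(cell, n):
    """Four-connected neighbours of a cell inside an n x n grid (assumed adjacency)."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < n and 0 <= b < n]


def base_score(cell, density, defenders, lam, eta):
    # Assumed form: own density, mean neighbour density, normalised distance to defenders.
    n = len(density)
    nbrs = neighbors(cell, n)
    nbr_avg = sum(density[a][b] for a, b in nbrs) / len(nbrs)
    dist = min(abs(cell[0] - r) + abs(cell[1] - c) for r, c in defenders)
    return lam * density[cell[0]][cell[1]] + eta * nbr_avg + dist / (2 * (n - 1))


def episode_scores(density, defenders, prev=None, lam=0.6, eta=0.3, beta=0.5):
    """Scores for one episode; from the second episode on, add beta times the previous score."""
    n = len(density)
    scores = {}
    for r in range(n):
        for c in range(n):
            p = base_score((r, c), density, defenders, lam, eta)
            if prev is not None:
                p += beta * prev[(r, c)]
            scores[(r, c)] = p
    return scores


def next_target(pos, scores, n):
    # The attacker moves to the adjacent cell with the highest score.
    return max(neighbors(pos, n), key=lambda cell: scores[cell])


density = [[1, 1], [1, 9]]
defenders = [(0, 0)]
s1 = episode_scores(density, defenders)
s2 = episode_scores(density, defenders, prev=s1)
target = next_target((0, 1), s1, 2)
```

In this toy grid the attacker at (0, 1) prefers the dense, defender-free cell (1, 1) over the defended cell (0, 0).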

The present application models the patrol phase as a Partially Observable Markov Decision Process (POMDP), represented by a tuple. In this tuple, S is the true state of the environment; the action set is the set of all possible actions; Ω → O_d × O_e is the set of all preset defender observations, where O_d is the observation set of all preset detectors and O_e is the observation set of all preset executors; the observation function is that of the corresponding preset defender; and the remaining elements are a state transition function, a reward function, and a discount factor γ ∈ [0, 1].

In this embodiment, a multi-agent Dueling DQN (Deep Q-Network) can be used to calculate the patrol strategies of the preset detectors and preset executors. That is, a stochastic policy is learned for each preset executor e_i by a Dueling DQN whose parameters are shared by all preset executors; likewise, a stochastic policy is learned for each preset detector d_i by a Dueling DQN whose parameters are shared by all preset detectors. To enable more efficient collaboration between preset defenders, the multi-agent Dueling DQN can be trained using a centralized training and decentralized execution (CTDE, Centralized Training Decentralized Execution) framework, i.e. one centralized controller is used to control all preset executors, while another centralized controller is used to control all preset detectors. Owing to the features of the CTDE framework, although each Dueling DQN is trained centrally on the experience of all the preset executors or all the preset detectors that it controls, the actions of each preset executor or preset detector are independent. The preset defenders can share known information with each other through the communication action of the preset detector.
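As an illustration of parameter sharing with decentralized execution, the following minimal Python sketch uses the standard Dueling DQN aggregation Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). The linear "network", the softmax used to turn Q-values into a stochastic policy, and all names are assumptions for illustration only; the text does not specify the network architecture or how actions are sampled.

```python
import math
import random

class SharedDuelingHead:
    """Toy linear dueling head; one instance is shared by all agents of one type."""

    def __init__(self, obs_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.w_value = [rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
        self.w_adv = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                      for _ in range(n_actions)]

    def q_values(self, obs):
        # State value V(s) and per-action advantages A(s, a).
        value = sum(w * x for w, x in zip(self.w_value, obs))
        adv = [sum(w * x for w, x in zip(row, obs)) for row in self.w_adv]
        mean_adv = sum(adv) / len(adv)
        # Standard dueling aggregation: Q = V + A - mean(A).
        return [value + a - mean_adv for a in adv]

def softmax_policy(q, temperature=1.0):
    """ASSUMED way of deriving a stochastic policy from Q-values."""
    m = max(q)
    exps = [math.exp((x - m) / temperature) for x in q]
    total = sum(exps)
    return [e / total for e in exps]

def act(head, obs, rng):
    """Decentralized execution: each agent acts on its own observation only."""
    probs = softmax_policy(head.q_values(obs))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

A single `SharedDuelingHead` would be created for the executors and another for the detectors; every agent of a type calls `act` with the same head but its own observation, which mirrors the shared parameters and independent actions described above.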

It should be noted that, determining the movement parameters of the preset defender based on the current security game policy may specifically include: determining a movement parameter of the preset detector based on the current security game strategy, and acquiring a detection result of the preset detector on a current detection position in the preset resource grid cell diagram; and determining the movement parameters of the preset executor according to the detection result and the current safety game strategy.

Step S25, controlling the preset attacker and the preset defender to move in the preset resource grid cell map based on the initial attack position of the preset attacker, the initial detection position of the preset detector, the initial execution position of the preset executor and the movement parameters of the preset attacker and the preset defender, and counting corresponding movement information.

Step S26, judging whether the currently counted movement information meets a preset game stop condition.

Step S27, if so, stopping the current round of the game, determining a current defender score by using a preset score determining rule, and updating the current security game strategy according to the current defender score so as to construct a corresponding target network security game strategy based on the updated current security game strategy.
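Steps S25 to S27 can be sketched as a single round loop in Python. This is a simplified illustration under stated assumptions: positions are grid coordinates, the movement policies are passed in as callables, the round stops once the number of position updates exceeds a threshold, and the defender score is the number of attackers cleared. The clearing condition used here (an executor reaches the cell of an attacker that a detector has already detected) and all names are hypothetical.

```python
def play_round(attackers, detectors, executors, move_attacker, move_detector,
               move_executor, max_updates):
    """Run one round and return the defender score (number of attackers cleared)."""
    detected, cleared, updates = set(), set(), 0
    # Stop once the number of position updates exceeds the threshold.
    while updates <= max_updates:
        attackers = {k: move_attacker(p) for k, p in attackers.items()}
        detectors = [move_detector(p) for p in detectors]
        # Detection result at the detectors' current positions.
        for k, p in attackers.items():
            if p in detectors:
                detected.add(k)
        # Executors move according to the detection result.
        targets = [attackers[k] for k in detected if k not in cleared]
        executors = [move_executor(p, targets) for p in executors]
        # ASSUMED clearing condition: an executor shares a cell with a detected attacker.
        for k in detected - cleared:
            if attackers[k] in executors:
                cleared.add(k)
        attackers = {k: p for k, p in attackers.items() if k not in cleared}
        updates += 1
    return len(cleared)
```

Passing the movement rules as callables keeps the loop independent of how the attacker model and the defender policy are actually implemented.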

In this embodiment, at each update of the current security game strategy, primary action embeddings and auxiliary action embeddings are first sampled from π^d, π^a and π'^d, π'^a. Each action embedding is then matched to the closest allocation that it represents by using a cosine similarity measure, and the specific allocation position information of the preset defender and the preset attacker is obtained by calculating the lower limit value of the action embeddings through a weighted sum.
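The cosine similarity matching step can be sketched in Python as follows. The representation of each candidate allocation as an embedding vector stored in a dictionary, and the function names, are assumptions for illustration; the weighted-sum step is not shown because the text does not give its exact form.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def closest_allocation(action_embedding, allocation_embeddings):
    """Return the key of the allocation whose embedding is most similar."""
    return max(
        allocation_embeddings,
        key=lambda k: cosine_similarity(action_embedding, allocation_embeddings[k]),
    )
```

For example, with two hypothetical allocations `{"cell_0": [1.0, 0.0], "cell_1": [0.0, 1.0]}`, an action embedding of `[0.9, 0.1]` is matched to `"cell_0"`.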

As can be seen from the foregoing embodiments, this allocation information is used as the allocation of the initial positions of the preset proxy resources, after which the patrol phase is simulated for n_t loop iterations. For the allocation information of the preset defender and the preset attacker, the environment returns the current defender score R^d and the current attacker score R^a. Finally, the corresponding allocation position information and the obtained scores are used to update π^d(·∣s_i; σ^d), π^a(·∣s_i; σ^a) and π'^d(·∣s_i; σ'^d), π'^a(·∣s_i; σ'^a) via COPO, thereby completing the update of the current security game strategy.
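The overall allocate, patrol, score and update cycle can be sketched in Python as below. The update rule itself is left as a caller-supplied function because the text only names it (COPO) without giving its details; `allocate`, `simulate_patrol`, `update` and `should_stop` are hypothetical placeholders.

```python
def build_policy(policy, allocate, simulate_patrol, update, should_stop):
    """Repeat allocate -> patrol -> score -> update until the stop condition holds."""
    history = []
    while True:
        allocation = allocate(policy)  # initial positions of the proxy resources
        defender_score, attacker_score = simulate_patrol(allocation)
        policy = update(policy, allocation, defender_score, attacker_score)
        history.append(defender_score)
        if should_stop(history):       # preset update stop condition
            return policy, history
```

Returning the score history alongside the policy makes it straightforward to base the stop condition on a fixed number of updates or on score convergence.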

Reference may be made to the corresponding disclosure in the foregoing embodiments for the specific implementation of step S21 and step S26, and no further description is given here.

In this embodiment, a preset resource grid cell diagram and a preset proxy resource are obtained; the preset resource grid cell diagram is a diagram which is built in advance and comprises a plurality of terminal equipment resources in each grid cell; the preset proxy resource comprises a preset defender and a preset attacker, wherein the preset defender comprises a preset detector for detecting the attacker and a preset executor for clearing the detected attacker; current resource information of each grid cell on the resource grid cell map is acquired; the current resource information is information including the number of terminal equipment resources contained in the current grid cell; an initial attack position of the preset attacker, an initial detection position of the preset detector and an initial execution position of the preset executor in the preset resource grid cell diagram are determined by utilizing the current resource information and the current security game strategy; movement parameters of the preset attacker and the preset defender are determined based on the attacker multi-parameter behavior model and the current security game strategy, respectively; the attacker multi-parameter behavior model comprises the initial detection position, the initial execution position and the current resource information; the preset attacker and the preset defender are controlled to move in the preset resource grid cell diagram based on the initial attack position of the preset attacker, the initial detection position of the preset detector, the initial execution position of the preset executor and the movement parameters of the preset attacker and the preset defender, and corresponding movement information is counted; whether the currently counted movement information meets a preset game stop condition is judged; if yes, the current round of the game is stopped, a current defender score is determined by using a preset score determining rule, and the current security game strategy is updated according to the current defender score so as to construct a corresponding target network security game strategy based on the updated current security game strategy. In this way, the movement parameters of the preset attacker and the preset defender are determined based on the attacker multi-parameter behavior model and the current security game strategy, respectively, so that the realism of the preset attacker's behavior and the preset defender's ability to respond to complex security game scenes can be improved, which in turn improves the defending effect of the network security game strategy and the stability of the defending policy within it.

Referring to Fig. 5, the application discloses a network security game policy building device, which includes:

a preset resource information acquiring module 11, configured to acquire a preset resource grid cell map and a preset proxy resource; the preset resource grid cell diagram is a diagram which is built in advance and comprises a plurality of terminal equipment resources in each grid cell; the preset proxy resource comprises a preset defender and a preset attacker, wherein the preset defender comprises a preset detector for detecting the attacker and a preset executor for clearing the detected attacker;

the proxy resource moving module 12 is configured to control the preset attacker and the preset defender to move in the preset resource grid cell diagram by using the attacker multi-parameter behavior model and the current security game policy, and count corresponding movement information;

a game stop condition judging module 13, configured to judge whether the currently counted movement information meets a preset game stop condition;

and the target security game policy construction module 14 is configured to, if so, stop the current round of the game, determine a current defender score by using a preset score determination rule, and update the current security game policy according to the current defender score so as to construct a corresponding target network security game policy based on the updated current security game policy.

In some embodiments, the game stop condition determining module 13 may specifically include:

a position update count judging unit, configured to judge whether the current number of position updates in the movement information is greater than a preset update count threshold;

and a game stop condition judging unit, configured to determine that the currently counted movement information meets the preset game stop condition if the current number of position updates is greater than the preset update count threshold.

In some specific embodiments, the target security game policy building module 14 may specifically include:

and the defender score determining unit is used for determining the current defender score according to the number of preset attackers meeting the preset attacker clearing condition in the process of the round of game.

In some embodiments, the proxy resource movement module 12 may specifically include:

a preset network and framework movement control unit, configured to control the preset defender to move in the preset resource grid cell diagram by using a preset multi-agent Dueling DQN network, a preset centralized training and decentralized execution framework, and the current security game strategy.

In some specific embodiments, the target security game policy building module 14 may specifically include:

a current security game strategy updating unit, configured to update the current security game strategy according to the current defender score;

an update stop condition judging unit, configured to judge whether a preset update stop condition is currently satisfied;

a step jump unit, configured to, if not, jump back to the step of controlling the preset attacker and the preset defender to move in the preset resource grid cell diagram by using the attacker multi-parameter behavior model and the current security game strategy;

and a network security game strategy construction unit, configured to, if so, construct a corresponding target network security game strategy based on the updated current security game strategy.

In some specific embodiments, the proxy resource movement module 12 may specifically include:

a grid resource information obtaining unit, configured to obtain current resource information of each grid cell on the resource grid cell map; the current resource information is information including the number of terminal equipment resources contained in the current grid cell;

an initial position determining unit, configured to determine an initial attack position of the preset attacker, an initial detection position of the preset detector, and an initial execution position of the preset executor, which are located in the preset resource grid cell diagram, using the current resource information and the current security game policy;

a movement parameter determination submodule, configured to determine movement parameters of the preset attacker and the preset defender based on the attacker multi-parameter behavior model and the current security game strategy, respectively; the attacker multi-parameter behavior model comprises the initial detection position, the initial execution position and the current resource information;

and a proxy resource movement control unit, configured to control the preset attacker and the preset defender to move in the preset resource grid cell diagram based on the initial attack position of the preset attacker, the initial detection position of the preset detector, the initial execution position of the preset executor and the movement parameters of the preset attacker and the preset defender.

In some specific embodiments, the movement parameter determination submodule may specifically include:

a detector movement parameter determining unit, configured to determine movement parameters of the preset detector based on the current security game policy;

a detection result obtaining unit, configured to obtain a detection result of the preset detector on a current detection position in the preset resource grid cell map;

and the actuator movement parameter determining unit is used for determining the movement parameters of the preset actuator according to the detection result and the current safety game strategy.

Further, an embodiment of the present application also discloses an electronic device. Fig. 6 is a structural diagram of the electronic device 20 according to an exemplary embodiment; the content of the drawing is not to be construed as limiting the scope of use of the present application.

Fig. 6 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. The memory 22 is configured to store a computer program, where the computer program is loaded and executed by the processor 21 to implement the relevant steps of the network security game policy building method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may specifically be a computer.

In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol to be followed is any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.

The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon may include an operating system 221, a computer program 222, and the like, and the storage may be temporary storage or permanent storage.

The operating system 221 is used for managing and controlling the hardware devices on the electronic device 20 and the computer program 222, and may be Windows Server, NetWare, Unix, Linux, etc. In addition to the computer program that performs the network security game policy construction method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs for performing other specific tasks.

Further, the application also discloses a computer readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the network security game policy construction method disclosed above. For specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no further description is given here.

In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief; for relevant details, refer to the description of the method.

Those skilled in the art will further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The foregoing describes the preferred embodiments of the present application in detail. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to facilitate understanding of the method and core ideas of the present application. Meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific implementations and the scope of application. In view of the above, the content of this description should not be construed as limiting the present application.

Claims

1. The method for constructing the network security game strategy is characterized by comprising the following steps of:

2. The method for constructing a network security game policy according to claim 1, wherein said determining whether the currently counted movement information satisfies a preset game stop condition comprises:

3. The method of claim 1, wherein determining the current defender score using a preset score determination rule comprises:

4. The network security game policy construction method according to claim 1, wherein controlling the movement of the preset defender in the preset resource grid cell map by using the current security game policy comprises:

5. The method of claim 1, wherein the updating the current security gaming policy according to the current defender score to construct a corresponding target network security gaming policy based on the updated current security gaming policy comprises:

judging whether a preset update stop condition is met currently;

6. The method for constructing a network security game policy according to any one of claims 1 to 5, wherein controlling the movement of the preset attacker and the preset defender in the preset resource grid cell map by using an attacker multi-parameter behavior model and a current security game policy comprises:

7. The network security game policy construction method of claim 6, wherein determining movement parameters of the preset defender based on the current security game policy comprises:

8. A network security gaming policy building device, comprising:

9. An electronic device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the network security gaming policy construction method of any of claims 1 to 7.

10. A computer readable storage medium for storing a computer program which when executed by a processor implements the network security gaming policy construction method of any of claims 1 to 7.