CN115550078A - Method and system for fusing scheduling and response of dynamic resource pool - Google Patents


Info

Publication number
CN115550078A
Authority
CN
China
Prior art keywords
defense
strategy
resources
state
attack
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211537098.5A
Other languages
Chinese (zh)
Other versions
CN115550078B (en)
Inventor
陈玉强
秦峰
吴昊
陆月明
韩道岐
高佳琪
王成月
樊明睿
王秦君
王大明
徐文杰
陆文强
王占峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Guoxin Blue Shield Technology Co ltd
Original Assignee
Beijing Guoxin Blue Shield Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Guoxin Blue Shield Technology Co ltd filed Critical Beijing Guoxin Blue Shield Technology Co ltd
Priority to CN202211537098.5A priority Critical patent/CN115550078B/en
Publication of CN115550078A publication Critical patent/CN115550078A/en
Application granted granted Critical
Publication of CN115550078B publication Critical patent/CN115550078B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433 Vulnerability analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H04L63/205 Network architectures or network communication protocols for network security for managing network security; network security policies in general involving negotiation or determination of the one or more network security mechanisms to be used, e.g. by negotiation between the client and the server or between peers or by selection according to the capabilities of the entities involved

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention provides a method and a system for fusing dynamic resource pool scheduling and response. The method comprises the following steps. S1: according to the characteristics of the entity defense resources, performing lightweight processing on the entity defense resources and adding them to a resource pool. S2: according to the application scenario, defining the device states in the network, establishing the strategy spaces and payoff spaces of the attacker and the defender based on an attack-defense strategy generation model, and generating an optimal defense resource scheduling strategy through Nash equilibrium. S3: iteratively optimizing the optimal defense strategy with a reinforcement learning algorithm according to the device states in the network after a network threat. By constructing a dynamic security resource pool, the method and system allow security defense resources to be orchestrated in a lightweight and flexible way, which lays a foundation for the adaptive generation of defense strategies. They also enable cooperative defense against network threats and thereby maximize the combined effect of the security defense resources.

Description

Method and system for fusing scheduling and response of dynamic resource pool
Technical Field
The invention relates to the definition, construction, and scheduling of defense resources, belongs to the technical field of cyberspace security, and in particular relates to a method and a system for fusing dynamic resource pool scheduling and response.
Background
The rapid development of network technology is bringing the Internet of Things ever closer to daily life. Given the complex and varied threats in current networks, clusters of intelligent devices require that multiple security defense resources be combined dynamically to defend against threats. Because current security devices come in many hardware forms and are fixed in place, they cannot support dynamic deployment and elastic scaling of security defense functions. To address the complex and changing threats in the Internet of Things, existing work generally treats the devices in the Internet of Things as network resources contested by attackers and defenders, and then models and analyzes this contest as an attack-defense game. The participants' optimization problem is solved with ideas from game theory, whose characteristics closely match the opposing goals, strategic interdependence, and non-cooperation of the attacking and defending parties in cyberspace. Describing the game process in simulation defense with game theory makes it possible to solve the problem of selecting an optimal defense strategy. Different events in the network are modeled as different states, and the evolution of these states is described by differential equations. The continuous-time differential equations are discretized with a Gauss-Seidel-type implicit finite difference method, a saddle-point strategy is obtained by iteration, the network state is presented numerically, and the optimal strategy selection is obtained.
However, existing differential strategy analysis of attack-defense strategies for the Internet of Things faces three main problems. 1. Much current research on defense strategies orchestrates virtual resources directly. In practice, however, defense devices follow different designs, the security hardware at access nodes is highly varied, and resource entities are fixed in place, so dynamic deployment and elastic scaling of security defense functions cannot be supported, which in turn hinders the deployment of defense strategies. 2. Current research divides the network states only coarsely and then solves for the optimal strategy directly from the relations among the states, without fully considering the actual scenario. 3. The states of the attacking and defending parties change constantly and unknown attacks may occur, so defense strategies must be adjusted and optimized in real time, yet existing game-theoretic strategy generation methods lack an optimization and adaptation process.
Disclosure of Invention
This patent is directed to a method for fusing dynamic resource pool scheduling and response. Container virtualization is applied to the entity defense resource devices, which removes the limitations of entity resources. A dynamic defense resource pool is then constructed, the device states in the network are defined according to the application scenario, and differential equations are constructed in the spirit of game theory to express the network states numerically. Once initialization is complete, the game-theoretic equations are solved using Nash equilibrium. Because the network environment and the attack-defense states change continuously, the Q-Learning algorithm from reinforcement learning is used to explore an adaptive matching strategy for security defense strategies, so that security defense resources are optimized and the defense payoff is maximized.
The invention mainly comprises a dynamic resource pool construction module and the corresponding defense strategy scheduling and optimization modules, specifically a dynamic construction module for the security defense resource pool, a defense resource scheduling strategy generation module, and a defense strategy adaptive matching and optimization module. First, a method for dynamically constructing a security defense resource pool is provided, comprising virtualization of entity resources and containerized packaging of the corresponding software. On the basis of this pool, an attack-defense environment space is constructed and, following game theory, the payoff of the defense resources is maximized by solving the game, so that the best effect is achieved with limited resources. At the same time, the Q-learning method is used to generate defense strategies adaptively on the basis of the existing strategies. The specific steps are as follows:
A security defense resource pool is established: resource virtualization is tailored to the characteristics of each defense resource, and the resulting defense resources are added to the resource pool for use in subsequent strategy generation.
A game-theoretic defense strategy generation method is established: the device states are defined according to the actual application scenario, the strategy spaces and payoff spaces of the attacker and the defender are established, and the optimal defense strategy orchestration under the current conditions is obtained by solving for the Nash equilibrium.
A defense resource adaptive module is established. Because the attack-defense environment changes continuously, the Q-Learning reinforcement learning algorithm is used, building on the above, to explore an adaptive matching strategy for security defense strategies and thereby maximize the effect of the security defense resources. The goal of optimal security defense payoff serves as the feedback signal, and adaptive strategy selection is achieved by iterating the Q values. A strategy is usually selected in the direction of the maximum Q value, and during selection the Q table is updated through repeated "security strategy - defense state - security strategy" iterations so as to maximize the Q value and reach convergence.
The specific technical scheme of the invention is as follows:
In a first aspect, the present invention provides a method for fusing dynamic resource pool scheduling and response, comprising the following steps:
S1: according to the characteristics of the entity defense resources, performing lightweight processing on the entity defense resources and adding them to a resource pool;
S2: according to the application scenario, defining the device states in the network, establishing the strategy spaces and payoff spaces of the attacker and the defender based on an attack-defense strategy generation model, and generating an optimal defense resource scheduling strategy through Nash equilibrium;
S3: iteratively optimizing the optimal defense strategy with a reinforcement learning algorithm according to the device states in the network after a network threat.
In some embodiments, S1 further comprises:
S11: acquiring the characteristics of the entity defense resources;
S12: determining the type of each entity defense resource; if it is a software program, performing containerization; if it is not a software program, virtualizing the physical resources by means of a virtual machine;
S13: adding the entity defense resources that have undergone lightweight processing to the resource pool.
In some embodiments, S2 comprises:
S21: according to the application scenario, defining the device states in the network, including a defense state, an attack state, a normal state, an attacked state (under attack but not yet affected), and a paralysis state;
S22: according to the transition relations between the device states, establishing an attack-defense strategy generation model. The defining expression and its symbols are rendered as formula images in the source (DEST_PATH_IMAGE001 to DEST_PATH_IMAGE015) and are not reproduced here. In the order in which they appear, the symbols represent:
the attacker;
the defender;
the set of attack strategies of different strengths;
the n-th attack strategy;
the set of defense strategies of different strengths;
the n-th defense strategy;
the probabilities of selecting the attack strategies of different strengths;
the probability of selecting the n-th attack strategy;
the probabilities of selecting the different defense strategies;
the probability of selecting the n-th defense strategy;
the payoff function of the attacker;
the payoff function of the defender;
the i-th attack strategy;
and the j-th defense strategy;
S23: calculating the optimal defense strategy from the Nash equilibrium of the attack-defense strategy generation model, with the objective of minimizing system loss.
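As an illustration of step S23, the following minimal sketch solves a 2x2 attack-defense game for its mixed-strategy Nash equilibrium. It assumes a zero-sum simplification in which the attacker's payoff equals the defender's loss; the patent does not state this, and the function name `solve_zero_sum_2x2` and the loss values are hypothetical.

```python
from fractions import Fraction

def solve_zero_sum_2x2(loss):
    """Solve a 2x2 zero-sum game (an assumed simplification of the patent's model).

    loss[i][j] is the system loss when the attacker plays strategy i and the
    defender plays strategy j. The attacker maximizes it, the defender minimizes it.
    Returns (attacker_mix, defender_mix, game_value).
    """
    (a, b), (c, d) = loss
    row_mins = [min(a, b), min(c, d)]   # worst case for the attacker per row
    col_maxs = [max(a, c), max(b, d)]   # worst case for the defender per column
    maximin, minimax = max(row_mins), min(col_maxs)
    if maximin == minimax:
        # A pure-strategy saddle point exists.
        i, j = row_mins.index(maximin), col_maxs.index(minimax)
        p = [Fraction(int(k == i)) for k in range(2)]
        q = [Fraction(int(k == j)) for k in range(2)]
        return p, q, Fraction(maximin)
    # No saddle point: each side mixes so that the opponent is indifferent.
    denom = Fraction(a - b - c + d)
    p0 = Fraction(d - c) / denom        # probability of attack strategy 0
    q0 = Fraction(d - b) / denom        # probability of defense strategy 0
    value = Fraction(a * d - b * c) / denom
    return [p0, 1 - p0], [q0, 1 - q0], value

# Hypothetical loss matrix: rows are attack strategies, columns are defense strategies.
attack_mix, defense_mix, expected_loss = solve_zero_sum_2x2([[4, 1], [2, 3]])
print(attack_mix, defense_mix, expected_loss)
```

The defender's mix is the one that minimizes the worst-case expected loss. Larger strategy spaces or general-sum payoffs would need a linear-programming or other equilibrium solver, which this sketch does not cover.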
In some embodiments, S3 comprises:
S31: setting a plurality of defense strategies, recording the defense state of each security defense strategy after a network threat, and constructing an initial defense state matrix according to whether each defense state meets the defense requirements;
S32: setting an action set that supplements or replaces defense resources for under-defended strategies and reduces defense resources for over-defended ones;
S33: setting a reward function, and increasing the reward value in the initial defense state matrix after each action is executed until the matrix reaches its maximum value.
In a second aspect, the present invention provides a system for fusing dynamic resource pool scheduling and response, comprising:
a dynamic resource pool construction module, configured to perform lightweight processing on the entity defense resources according to their characteristics and add them to the resource pool;
a defense strategy generation module, configured to define the device states in the network according to the application scenario, establish the strategy spaces and payoff spaces of the attacker and the defender based on an attack-defense strategy generation model, and generate an optimal defense resource scheduling strategy through Nash equilibrium;
and a defense resource adaptive module, configured to iteratively optimize the optimal defense strategy with a reinforcement learning algorithm according to the device states in the network after a network threat.
In some embodiments, the dynamic resource pool construction module comprises:
a defense resource characteristic acquisition submodule, configured to acquire the characteristics of the entity defense resources;
a defense resource lightweight submodule, configured to determine the type of each entity defense resource, perform containerization if it is a software program, and virtualize the physical resources by means of a virtual machine if it is not;
and a defense resource adding submodule, configured to add the entity defense resources that have undergone lightweight processing to the resource pool.
In some embodiments, the defense strategy generation module comprises:
a device state definition submodule, configured to define the device states in the network according to the application scenario, the device states including a defense state, an attack state, a normal state, an attacked state (under attack but not yet affected), and a paralysis state;
an attack-defense strategy generation model establishing submodule, configured to establish an attack-defense strategy generation model according to the transition relations between the device states. The defining expression and its symbols are rendered as formula images in the source (DEST_PATH_IMAGE001 to DEST_PATH_IMAGE015) and are not reproduced here. In the order in which they appear, the symbols represent:
the attacker;
the defender;
the set of attack strategies of different strengths;
the n-th attack strategy;
the set of defense strategies of different strengths;
the n-th defense strategy;
the probabilities of selecting the attack strategies of different strengths;
the probability of selecting the n-th attack strategy;
the probabilities of selecting the different defense strategies;
the probability of selecting the n-th defense strategy;
the payoff function of the attacker;
the payoff function of the defender;
the i-th attack strategy;
and the j-th defense strategy;
and an optimal defense strategy generation submodule, configured to calculate the optimal defense strategy from the Nash equilibrium of the attack-defense strategy generation model, with the objective of minimizing system loss.
In some embodiments, the defense resource adaptive module further comprises:
an initial defense state matrix construction submodule, configured to set a plurality of defense strategies, record the defense state of each security defense strategy after a network threat, and construct an initial defense state matrix according to whether each defense state meets the defense requirements;
an action set setting submodule, configured to set an action set that supplements or replaces defense resources for under-defended strategies and reduces defense resources for over-defended ones;
and a reward function setting submodule, configured to set a reward function and increase the reward value in the initial defense state matrix after each action is executed until the matrix reaches its maximum value.
In a third aspect, the invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of the above when executing the computer program.
In a fourth aspect, the invention provides a computer-readable storage medium, on which a computer program is stored which, when executed by a processor, performs the method of any one of the above.
The beneficial effect of this application is:
The method and system for fusing dynamic resource pool scheduling and response provided by the present application construct, by the above technical means, a fused module for dynamic resource pool scheduling and response. To address the problem that entity resources cannot be fully utilized when strategies are generated directly from an analysis of virtual resources, resource virtualization is studied: entity resources are virtualized on the basis of an analysis of the device characteristics and security requirements of the existing resources, and a dynamic resource pool is constructed, laying a foundation for research on defense strategies. Furthermore, to address the problem that existing state divisions are coarse and lack comprehensive analysis, a state space with more complete coverage is established. This makes it easier to select the optimal strategy with game theory, reduces the complexity of the analysis, speeds up strategy generation, and provides better countermeasures against the different attacks in the OWASP Top 10. Building on the generated strategies, the continuous change of the attack-defense environment and the occurrence of unknown attacks are taken into account: the Q-Learning reinforcement learning algorithm adaptively adjusts and optimizes the defense strategy, reducing the likelihood of gaps in the defense.
Drawings
FIG. 1 is a schematic overall flow diagram of the present application;
FIG. 2 is a flowchart of a method for merging dynamic resource pool scheduling and response according to the present application;
FIG. 3 is a sub-flowchart of step S1 of the present application;
FIG. 4 is a sub-flowchart of step S2 of the present application;
FIG. 5 is a sub-flowchart of step S3 of the present application.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth to illustrate, but are not to be construed to limit the scope of the invention.
In order that the above objects, features and advantages of the present application can be more clearly understood, the present disclosure will be further described in detail with reference to the accompanying drawings and examples. It is to be understood that the embodiments described are only a few embodiments of the present disclosure, and not all embodiments. The specific embodiments described herein are merely illustrative of the disclosure and are not limiting of the application. All other embodiments that can be derived by one of ordinary skill in the art from the description of the embodiments are intended to be within the scope of the present disclosure.
It is noted that, in this document, relational terms such as "first" and "second," and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
FIG. 1 is a schematic overall flow chart of the present application. This patent is directed to a method for fusing dynamic resource pool scheduling and response. Container virtualization is applied to the entity defense resource devices, which removes the limitations of entity resources. A dynamic defense resource pool is then constructed, the device states in the network are defined according to the application scenario, and differential equations are constructed in the spirit of game theory to express the network states numerically. Once initialization is complete, the game-theoretic equations are solved using Nash equilibrium. Because the network environment and the attack-defense states change continuously, the Q-Learning algorithm from reinforcement learning is used to explore an adaptive matching strategy for security defense strategies, so that security defense resources are optimized and the defense payoff is maximized.
The invention mainly comprises a dynamic resource pool construction module and the corresponding defense strategy scheduling and optimization modules, specifically a dynamic construction module for the security defense resource pool, a defense resource scheduling strategy generation module, and a defense strategy adaptive matching and optimization module. First, a method for dynamically constructing a security defense resource pool is provided, comprising virtualization of entity resources and containerized packaging of the corresponding software. On the basis of this pool, an attack-defense environment space is constructed and, following game theory, the payoff of the defense resources is maximized by solving the game, so that the best effect is achieved with limited resources. At the same time, the Q-learning method is used to generate defense strategies adaptively on the basis of the existing strategies. The specific steps are as follows:
A security defense resource pool is established: resource virtualization is tailored to the characteristics of each defense resource, and the resulting defense resources are added to the resource pool for use in subsequent strategy generation.
A game-theoretic defense strategy generation method is established: the device states are defined according to the actual application scenario, the strategy spaces and payoff spaces of the attacker and the defender are established, and the optimal defense strategy orchestration under the current conditions is obtained by solving for the Nash equilibrium.
A defense resource adaptive module is established. Because the attack-defense environment changes continuously, the Q-Learning reinforcement learning algorithm is used, building on the above, to explore an adaptive matching strategy for security defense strategies and thereby maximize the effect of the security defense resources. The goal of optimal security defense payoff serves as the feedback signal, and adaptive strategy selection is achieved by iterating the Q values. A strategy is usually selected in the direction of the maximum Q value, and during selection the Q table is updated through repeated "security strategy - defense state - security strategy" iterations so as to maximize the Q value and reach convergence.
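For reference, the Q table iteration described above is conventionally written as the standard tabular Q-Learning update below. The patent text does not give this formula. The symbols are the usual textbook ones and are assumptions here: s_t is the defense state, a_t the selected security strategy adjustment, r_{t+1} the reward derived from the defense payoff, \alpha the learning rate, and \gamma the discount factor.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

Selecting the action with the largest Q value in each state corresponds to choosing the strategy "in the direction of the maximum Q value".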
With reference to FIG. 1, a method for fusing dynamic resource pool scheduling and response comprises the following steps:
S1: according to the characteristics of the entity defense resources, performing lightweight processing on the entity defense resources and adding them to a resource pool;
In some embodiments, with reference to FIG. 2, which is a sub-flowchart of step S1 of the present solution, S1 further comprises:
S11: acquiring the characteristics of the entity defense resources;
S12: determining the type of each entity defense resource; if it is a software program, performing containerization; if it is not a software program, virtualizing the physical resources by means of a virtual machine;
S13: adding the entity defense resources that have undergone lightweight processing to the resource pool.
Specifically, the core task in building the defense resource pool is virtualization, which can be defined as a technique for creating logical services from resources running on hardware. In other words, virtualization distributes the capabilities of resources such as networks, storage, servers, or applications among multiple users or environments. In particular, operating-system-level virtualization provides isolated computing environments, called containers, within a shared operating system. A container is a set of one or more processes isolated from the rest of the system, and it includes an application together with all of its library dependencies and configuration files. A container also offers a reproducible execution environment and lightweight lifecycle management, and its performance is closer to bare metal than that of a traditional virtual machine deployment. Lightweight processing can be carried out in either of two ways, virtual machines or containers, and the virtualization method is chosen according to the security state requirements and environment configuration requirements of each device. Through virtualization, traditional security functions can be decoupled from dedicated security hardware and run on general-purpose servers.
Container technology is aimed mainly at virtualizing software applications. By packaging the software code together with the environment it requires, a container avoids compatibility problems, so that users and developers can build, deploy, and maintain applications flexibly and independently of the underlying infrastructure. Its isolation also reduces resource consumption and makes deployment more flexible. For other kinds of virtualization, virtual machines are better suited. Virtual machines rest on many years of research; they can virtualize the physical resources of a physical computer, such as the CPU (central processing unit) and other hardware, which makes flexible scheduling convenient for users, and they also provide stronger security. A lightweight technology, either virtual machines or containerization, is selected according to the characteristics and security level of each device. With these technologies, traditional security functions can be decoupled from dedicated security hardware and run on general-purpose servers, so that a dynamic resource pool is constructed and the necessary conditions are provided for the scheduling strategies that follow.
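The choice between containerization and virtual machines in steps S11 to S13 can be sketched as follows. This is a minimal illustration only: the class names `DefenseResource`, `Packaging`, and `ResourcePool`, the single `is_software` feature, and the example resource names are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from enum import Enum

class Packaging(Enum):
    CONTAINER = "container"
    VIRTUAL_MACHINE = "virtual_machine"

@dataclass(frozen=True)
class DefenseResource:
    name: str
    is_software: bool  # S11: the only characteristic this sketch inspects

@dataclass
class ResourcePool:
    entries: dict = field(default_factory=dict)  # resource name -> Packaging

    def add(self, resource):
        # S12: software programs are containerized, everything else is
        # virtualized through a virtual machine.
        if resource.is_software:
            packaging = Packaging.CONTAINER
        else:
            packaging = Packaging.VIRTUAL_MACHINE
        # S13: register the lightweight resource in the pool.
        self.entries[resource.name] = packaging
        return packaging

pool = ResourcePool()
pool.add(DefenseResource("web-application-firewall", is_software=True))
pool.add(DefenseResource("hardware-ids-appliance", is_software=False))
print(pool.entries)
```

A real classifier would also weigh the security level and environment configuration requirements of each device, which the text names as selection criteria but does not specify in detail.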
During container execution, the container orchestrator manages and organizes the microservice architecture at scale and handles the automation and lifecycle management of the containers and services in the cluster. The container orchestrator consists of the following modules. The scheduling module determines the best placement for each incoming task. The resource allocation module reserves cluster resources on a per-request basis, using either a static or a dynamic method. The load balancing module distributes tasks across container instances according to fairness, cost, energy, or priority criteria. The admission control module checks that the cluster has enough resources to run a user's job and that the user never exceeds the allocated quota. The billing module monitors the resources available to each user, while the monitoring module tracks the real-time resource consumption of each node and collects resource health metrics to support fault tolerance. The scheduling module is the core of the container orchestrator, and scheduling is also the core step in constructing the dynamic resource pool.
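The admission control check described above can be illustrated with a minimal sketch. The class names, field names, and the choice of CPU as the quota dimension are assumptions made for illustration only; the source does not specify a data model.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    free_cpu: float      # CPU cores not yet reserved in the cluster (hypothetical field)
    free_mem_gb: float   # memory not yet reserved, in GiB (hypothetical field)


@dataclass
class Job:
    user: str
    cpu: float
    mem_gb: float


def admit(job, cluster, used_cpu_by_user, cpu_quota_by_user):
    """Return True only if the job fits in the cluster and within the user's CPU quota."""
    fits_cluster = job.cpu <= cluster.free_cpu and job.mem_gb <= cluster.free_mem_gb
    used = used_cpu_by_user.get(job.user, 0.0)
    quota = cpu_quota_by_user.get(job.user, 0.0)  # users without a quota get none
    within_quota = used + job.cpu <= quota
    return fits_cluster and within_quota


cluster = Cluster(free_cpu=8.0, free_mem_gb=16.0)
print(admit(Job("alice", 2.0, 4.0), cluster, {"alice": 1.0}, {"alice": 4.0}))
```

A job is rejected either because the cluster lacks free capacity or because admitting it would push the user past the quota; both conditions come from the module description, while the units are illustrative.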
It should also be noted that the dynamic resource pool contains the various device resources in the network. These are physical resources virtualized through virtual machines and containers, and they include the network devices in the different states named in the attack and defense strategy model established in S2: devices in the defense state, the attack state, the normal state, the intermediate state of being attacked but not yet affected, and the paralyzed state. The state space of the network devices defined in S2 is based on the virtualized resource definitions of S1, and a mapping exists between the two. The optimization in S3 is in turn based on the strategy space defined in S2, so the virtual resources in a strategy correspond one-to-one to the resources in the dynamic resource pool. After S3 generates a resource orchestration strategy, the strategy is handed to the control center of the dynamic resource pool, which carries out the corresponding operations.
S2: according to the application scenario, defining the device states in the network, establishing the strategy spaces and payoff spaces of the attacker and the defender based on an attack and defense strategy generation model, and generating an optimal defense resource scheduling strategy through Nash equilibrium.
In some embodiments, with reference to the sub-flowchart of step S2 in Fig. 3, S2 includes:
S21: according to the application scenario, defining the device states in the network, including a defense state, an attack state, a normal state, an attacked-but-unaffected state, and a paralyzed state;
S22: establishing an attack and defense strategy generation model according to the transition relations between the device states, defined as
[formula image 001]
where [formula image 002] represents the attacker; [formula image 003] represents the defender; [formula image 004] represents the attack strategies of different strengths available for selection, and [formula image 005] represents the n-th attack strategy; [formula image 006] represents the defense strategies of different strengths available for selection, and [formula image 007] represents the n-th defense strategy; [formula image 008] represents the probabilities of selecting the attack strategies of different strengths, and [formula image 009] represents the probability of selecting the n-th attack strategy; [formula image 010] represents the probabilities of selecting the different defense strategies, and [formula image 011] represents the probability of selecting the n-th defense strategy; [formula image 012] represents the payoff function of the attacker; [formula image 013] represents the payoff function of the defender; [formula image 014] represents the i-th attack strategy; and [formula image 015] represents the j-th defense strategy;
S23: calculating the optimal defense strategy from the attack and defense strategy generation model by means of Nash equilibrium, with minimizing system loss as the objective.
Specifically, to obtain the optimal defense strategy, this scheme models the states of the attacker and the defender in the current network. Following dynamic game theory, it sets up differential equations for the different states of the two sides in the network and uses Nash equilibrium to solve for the optimal defense strategy.
First, because the resource pool contains complex and varied defense resources, a sound defense resource scheduling strategy is needed to plan their use as a whole. Dynamic game theory is a typical method for solving attack and defense game problems. In step S21, an Attack and Defense Strategy Generation Model (ADSGM) is defined, which includes the following device states. D is the defense state: an Internet of Things device in this state is controlled by the defender and contributes to the defense. A is the attack state: an Internet of Things device in this state is controlled by the attacker and benefits the attacker. N is the normal state: the device works normally in the Internet of Things but has certain vulnerabilities and may be attacked. I is the attacked-but-unaffected state: the device is under attack but its service function is not yet affected; under a strong attack it becomes paralyzed and can no longer provide service, whereas after a successful defense it returns to the normal state and continues to provide service. M is the paralyzed state: after being attacked, a device that cannot cope is shut down to limit the impact on other devices, and after a waiting period it can be restarted and resume service.
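The device states and the transitions named in the description (N to I, I to M, I to N, M to N) can be written down as a small state machine. This is an illustrative sketch: the event names are hypothetical labels, and transitions into and out of states D and A are left out because the text does not spell them out.

```python
from enum import Enum


class DeviceState(Enum):
    D = "defense"               # controlled by the defender
    A = "attack"                # controlled by the attacker
    N = "normal"                # working, but vulnerable
    I = "attacked_unaffected"   # under attack, service not yet affected
    M = "paralyzed"             # shut down, waiting for restart


# Event names are assumptions for illustration; only the state pairs come from the text.
TRANSITIONS = {
    (DeviceState.N, "attacked"): DeviceState.I,
    (DeviceState.I, "strong_attack"): DeviceState.M,
    (DeviceState.I, "defense_succeeded"): DeviceState.N,
    (DeviceState.M, "restarted"): DeviceState.N,
}


def step(state, event):
    """Return the next state, or the unchanged state if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


state = DeviceState.N
for event in ["attacked", "strong_attack", "restarted"]:
    state = step(state, event)
print(state)
```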
For example, the defense resource pool holds many resources, and resources with overlapping functions provide redundancy. If a strong network attack paralyzes some resources, the system can use this redundancy to switch quickly to other resources of the same kind, which keeps tasks running and improves system stability. The switch works as follows: reinforcement learning obtains the current network state and generates a corresponding strategy, and the strategy is then passed to the control center of the resource pool, which carries out the corresponding operations.
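As a simplified illustration of the redundancy idea only, the sketch below picks a non-paralyzed resource of the same kind as the failed one. It deliberately replaces the reinforcement-learning decision described above with a plain lookup, and the pool layout (`kind`, `state` keys) is a hypothetical data model.

```python
def failover(pool, failed_name):
    """Return the name of a healthy resource of the same kind as the failed one, or None.

    `pool` maps a resource name to a dict with hypothetical keys "kind" and "state";
    state "M" marks a paralyzed resource.
    """
    failed = pool[failed_name]
    for name, resource in pool.items():
        same_kind = resource["kind"] == failed["kind"]
        if name != failed_name and same_kind and resource["state"] != "M":
            return name
    return None


pool = {
    "fw-1": {"kind": "firewall", "state": "M"},
    "fw-2": {"kind": "firewall", "state": "N"},
    "ids-1": {"kind": "ids", "state": "N"},
}
print(failover(pool, "fw-1"))
```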
Further, in step S22, the attack and defense strategy model is defined as
[formula image 001]
where [formula image 002] represents the attacker; [formula image 003] represents the defender; [formula image 004] represents the attack strategies of different strengths available for selection, and [formula image 005] represents the n-th attack strategy; [formula image 006] represents the defense strategies of different strengths available for selection, and [formula image 007] represents the n-th defense strategy; [formula image 008] represents the probabilities of selecting the attack strategies of different strengths, and [formula image 009] represents the probability of selecting the n-th attack strategy; [formula image 010] represents the probabilities of selecting the different defense strategies, and [formula image 011] represents the probability of selecting the n-th defense strategy; [formula image 012] represents the payoff function of the attacker; [formula image 013] represents the payoff function of the defender; [formula image 014] represents the i-th attack strategy; and [formula image 015] represents the j-th defense strategy. The attacker's goal is to maximize the loss function of the system and the defender's goal is to minimize the system loss, so the two sides play a zero-sum game.
Further, in step S23, coefficient K1 is defined as the loss caused by the attack when a device changes from the normal state N to the attacked state I. Coefficient K2 is defined as the loss averted by the defense system when a device returns from state I to the normal state N. Coefficient K3 is the loss caused by shutting the device down when it enters the paralyzed state M, and coefficient K4 is the benefit brought by the defense strategy when the device recovers from the paralyzed state to the normal state. The attack return function is then:
[formula image 016]
The defense return function is:
[formula image 017]
The cost functions of the attacker and the defender are defined, respectively, as:
[formula image 018]
[formula image 019]
The cost functions of the attacker, the defender, and the system are as follows:
[formula image 020]
where ca and cd denote the attack cost coefficient and the defense cost coefficient, respectively; [formula image 021], [formula image 022], [formula image 023], and [formula image 024] denote, respectively, the probability of transitioning from the normal state N to the attacked-but-unaffected state I, from the attacked-but-unaffected state I to the paralyzed state M, from the attacked-but-unaffected state I to the normal state N, and from the paralyzed state M to the normal state N; D(t), A(t), N(t), M(t), and I(t) denote, respectively, the number of nodes in the defense state D, the attack state A, the normal state N, the paralyzed state M, and the attacked-but-unaffected state I in the system at time t; [formula image 025] represents the i-th attack strategy; [formula image 026] represents the probability of selecting the i-th attack strategy; [formula image 027] represents the [formula image 028]-th defense strategy; and [formula image 029] represents the probability of selecting the i-th defense strategy.
Game theory has established that every finite strategic game has a mixed-strategy Nash equilibrium, so a mixed attack and defense strategy exists for the attacker and the defender. With the attack and defense strategies and their probabilities defined above, the utilities of the attacker and the defender are:
[formula image 030]
[formula image 031]
Related research shows that a mixed strategy with probability distribution [formula image 032] is a Nash equilibrium if:
[formula image 033]
where [formula image 032] is a probability distribution satisfying the attacker and defender utility equations above. Nash equilibrium specifically means that, for any probability distribution [formula image 034] over the attack strategies of different strengths, the attacker's utility under [formula image 032] is greater than or equal to the attacker's utility under [formula image 035]; and that, for any probability distribution [formula image 036] over the defense strategies of different strengths, the defender's utility under [formula image 032] is greater than or equal to the defender's utility under [formula image 037].
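Because the formula images are not reproduced here, the following block restates the two conditions described in words above in standard textbook form. All symbols are assumed notation chosen for illustration (mixed strategies $P_a$, $P_d$ with components $p_a^{i}$, $p_d^{j}$; pure strategies $s_a^{i}$, $s_d^{j}$; payoff functions $U_a$, $U_d$) and are not taken from the original figures.

```latex
% Expected utilities under mixed strategies (assumed notation)
\begin{align}
U_a(P_a, P_d) &= \sum_{i=1}^{m} \sum_{j=1}^{n} p_a^{i}\, p_d^{j}\, U_a(s_a^{i}, s_d^{j}), \\
U_d(P_a, P_d) &= \sum_{i=1}^{m} \sum_{j=1}^{n} p_a^{i}\, p_d^{j}\, U_d(s_a^{i}, s_d^{j}).
\end{align}
% Nash equilibrium: neither side gains by deviating unilaterally
\begin{align}
U_a(P_a^{*}, P_d^{*}) &\ge U_a(P_a, P_d^{*}) \quad \text{for all } P_a, \\
U_d(P_a^{*}, P_d^{*}) &\ge U_d(P_a^{*}, P_d) \quad \text{for all } P_d.
\end{align}
```

Here $(P_a^{*}, P_d^{*})$ plays the role of the equilibrium distribution, and the two inequalities mirror the attacker and defender conditions stated in the paragraph above.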
Because the game is zero-sum, the Nash equilibrium is solved with the minimax method. Computing the optimal mixed strategy is then equivalent to computing the minimax strategy, which minimizes the maximum expected utility that the adversary can obtain.
[formula image 038]
Here k represents the attacker's maximum expected payoff, m represents the different attack strategies, and n represents the different defense strategies. The defender's goal is to minimize k, so the problem can be converted into a linear programming problem and solved.
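For intuition, the smallest case can be solved without a linear programming solver. The sketch below is not the linear program referred to above; it uses the textbook closed-form solution of a 2x2 zero-sum game, which gives the same minimax value k in that special case. The payoff matrix values are made up for illustration, with `a[i][j]` taken to be the attacker's payoff when attack strategy i meets defense strategy j.

```python
def solve_2x2_zero_sum(a):
    """Solve a 2x2 zero-sum game given the attacker's payoff matrix a[i][j].

    Returns (p, q, k): the attacker's mixed strategy, the defender's mixed
    strategy, and the game value k (the attacker's maximum expected payoff
    when the defender plays the minimax strategy).
    """
    (a11, a12), (a21, a22) = a
    row_min = [min(a11, a12), min(a21, a22)]
    col_max = [max(a11, a21), max(a12, a22)]
    maximin = max(row_min)   # payoff the attacker can guarantee with a pure strategy
    minimax = min(col_max)   # cap the defender can enforce with a pure strategy
    if maximin == minimax:
        # Saddle point: pure strategies are already optimal.
        i = 0 if row_min[0] >= row_min[1] else 1
        j = 0 if col_max[0] <= col_max[1] else 1
        p = [1.0, 0.0] if i == 0 else [0.0, 1.0]
        q = [1.0, 0.0] if j == 0 else [0.0, 1.0]
        return p, q, float(maximin)
    # No saddle point: each side mixes so that the opponent is indifferent.
    denom = a11 - a12 - a21 + a22
    p1 = (a22 - a21) / denom
    q1 = (a22 - a12) / denom
    k = (a11 * a22 - a12 * a21) / denom
    return [p1, 1 - p1], [q1, 1 - q1], k


p, q, k = solve_2x2_zero_sum([[3, 1], [0, 2]])
print(p, q, k)
```

For larger strategy sets, the general formulation is to minimize k over the defender's probabilities subject to every attack strategy's expected payoff being at most k, which is the linear program the description refers to.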
S3: iteratively optimizing the optimal defense strategy with a reinforcement learning algorithm, according to the states of the devices in the network after a network threat.
In some embodiments, with reference to the sub-flowchart of step S3 in Fig. 4, S3 includes:
S31: setting a plurality of defense strategies, recording the defense state of each security defense strategy after a network threat, and constructing an initial defense state matrix according to whether each defense state meets the defense requirement;
S32: setting an action set, supplementing or replacing defense resources for strategies with insufficient defense, and reducing defense resources for strategies with excessive defense;
S33: setting a reward function, and after each action is executed, increasing the reward value in the initial defense state matrix until the matrix reaches its maximum value.
Specifically, to make the defense strategy adaptive on top of the solved optimal strategy, incoming known and unknown network threats must be analyzed comprehensively and turned into feedback data that can be analyzed and quantified. Known and unknown network threats can be divided into seven types of attack behavior, corresponding to the seven steps of the cyber kill chain: reconnaissance, weaponization, delivery, exploitation, installation, command and control, and actions on objectives. Reconnaissance means that the attacker collects data about the target and about possible attack strategies, including email addresses and other information; intruders use automated scanners to find vulnerable points in the system, and scan firewalls, intrusion prevention systems, and the like to find an entry point for the attack. Weaponization means that the attacker exploits security vulnerabilities to develop malware, designed according to the attacker's needs and intent; at this step the attacker also tries to reduce the chance of detection by the organization's existing security solutions. Delivery means that the attacker spreads the weaponized malware through phishing emails or another medium; the most common delivery media for weaponized payloads are websites, removable disks, and emails. Exploitation means that the malicious code is delivered into the organization's system and the perimeter is breached; the attacker gains the opportunity to exploit the organization's systems by installing tools, running scripts, and modifying security credentials. Installation means that the malware installs a backdoor or a remote access Trojan that gives the intruder access. Command and control means that the attacker gains control of the organization's systems and networks: the attacker obtains access to privileged accounts, attempts brute-force attacks, searches for credentials, and changes permissions to take over control. Actions on objectives means that the attacker finally extracts data from the system; the objective involves collecting, encrypting, and exfiltrating confidential information from the organization's environment.
Each of the seven types of attack behavior can be matched with a targeted defense strategy. In the reconnaissance stage, which the defender usually perceives least, attention should be paid to abnormal traffic, logs, and data (especially leaked data), which should be stored for later reference; an analysis model should be built and optimized, and the corresponding state space updated in time. The weaponization stage is almost imperceptible to the defender but is closely tied to the defender's resources, because building the weapon relies on vulnerabilities or defects in those resources (in operating systems, application software, and social engineering); this stage should therefore focus on whether the vulnerability, patch, and repair processes for the assets are complete. The delivery stage is particularly important: the defense strategy should be updated promptly and adjusted according to the attacker's actions so that an optimal defense strategy is generated. The exploitation stage calls for security detection, security monitoring, blocking, and auditing, that is, day-to-day security monitoring, and it is also the stage in which the defense strategy is put to use. In the installation stage, the most important thing is to detect and isolate in the shortest possible time, paying attention to terminal and server security management strategies and antivirus measures, blocking and shutting down problem devices promptly, and updating the corresponding security strategy. The command and control stage is the defender's last chance to stop the attack: if the adversary cannot issue commands, the defender can contain the impact. Because this stage is the final test of the response strategy, more attention should be paid to access control, and the corresponding defense strategy should be formed in time. In the actions-on-objectives stage the attacker's goal has been achieved, so the defender must reduce the impact of the attack on the whole system as far as possible, recover paralyzed devices promptly, eliminate the danger, and improve the strategy at the same time.
On this basis, the defense strategy is adapted with the reinforcement learning Q-learning algorithm so that resource utility is optimized. Q-learning uses the iteratively updated Q value as its feedback model, which makes strategy selection adaptive. A strategy is usually selected in the direction of the largest Q value, and the Q table is updated during selection through the repeated cycle of strategy generation, defense state, and strategy generation, so that the Q value is maximized, the algorithm converges, and the optimal defense strategy is found. Implementing Q-learning requires states, strategies, and a reward function to be set.
First, in step S31, an initial defense state matrix is set. In the adaptive matching and optimization stage of the defense strategy, the system's real-time protection state against network threats is the key to optimizing the security defense strategy. An initial set of security defense strategies is formulated, and the defense state of each security defense strategy after a network threat is recorded. If the defense state meets the defense requirement, the state value of the corresponding network threat and security strategy pair is marked 1; if it does not, the value is marked -1; and if there is no result for the pair, the value is set to 0. The initial defense state matrix is constructed in this way.
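The 1 / -1 / 0 encoding of step S31 can be sketched as follows. The function name, the threat and strategy labels, and the `outcomes` dictionary are hypothetical; only the three-valued encoding comes from the description.

```python
def build_state_matrix(threats, strategies, outcomes):
    """Build the initial defense state matrix (rows: threats, columns: strategies).

    `outcomes` maps a (threat, strategy) pair to True if the defense requirement
    was met and False if it was not; pairs that are absent have no result yet.
    """
    matrix = []
    for threat in threats:
        row = []
        for strategy in strategies:
            result = outcomes.get((threat, strategy))
            if result is None:
                row.append(0)          # no corresponding result
            else:
                row.append(1 if result else -1)
        matrix.append(row)
    return matrix


threats = ["scan", "phishing"]
strategies = ["s1", "s2"]
outcomes = {("scan", "s1"): True, ("scan", "s2"): False, ("phishing", "s1"): False}
print(build_state_matrix(threats, strategies, outcomes))
```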
Then, in step S32, an action set is set according to the defense state matrix. A reconfiguration of a security strategy is regarded as an action; an action adds virtual functions to, or removes them from, the set of virtual functions contained in the security strategy, and all virtual functions are selected from the resource pool. Strategies with insufficient defense have resources supplemented or replaced, and strategies with excessive defense have resources trimmed, so that security defense resources are used effectively.
Finally, in step S33, a reward function is formulated. After an action is executed, the reward function yields a reward value, and the corresponding Q value in the Q table is updated; the Q table maps defense states and actions to rewards. The goal is to maximize the utility of the security defense resources: based on the defense state matrix and the corresponding defense states, the system executes actions that optimize the security resource strategy for the next round. When an entry of the defense state matrix is 1 and the action reduces the security defense resource configuration or leaves it unchanged, the reward for the updated network threat and security strategy pair is set to 1 if the strategy keeps the entry at 1 in the next round of security threats, and to -1 otherwise. When an entry is -1 and the action increases the security defense resource configuration, the reward for the updated pair is set to 1 if the strategy brings the entry to 1 in the next round of security threats, and to -1 otherwise. Reward values accumulate during this process, and the resulting reward matrix of network threat and security strategy pairs, that is, the Q table, yields the optimal security defense strategy. The optimized defense strategy is also used to adjust and optimize the knowledge transfer model of the strategy.
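The reward rule above can be paired with the standard tabular Q-learning update, as in the sketch below. The action labels, the zero reward for combinations the text does not cover, and the learning rate `alpha` and discount factor `gamma` are all assumptions; the description states only that reward values accumulate, so the textbook update is shown as one possible realization rather than the patented procedure.

```python
def reward(prev_state, action, next_state):
    """Reward rule paraphrased from step S33.

    States are 1 (requirement met) or -1 (not met); actions are the hypothetical
    labels "reduce", "keep", and "increase".
    """
    if prev_state == 1 and action in ("reduce", "keep"):
        return 1 if next_state == 1 else -1
    if prev_state == -1 and action == "increase":
        return 1 if next_state == 1 else -1
    return 0  # assumption: combinations the text does not cover earn no reward


def q_update(q, state, action, r, next_state, alpha=0.5, gamma=0.9):
    """Apply the standard tabular Q-learning update to a dict-of-dicts Q table."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (r + gamma * best_next - q[state][action])
    return q[state][action]


actions = ("reduce", "keep", "increase")
q = {s: {a: 0.0 for a in actions} for s in (1, -1)}
r = reward(-1, "increase", 1)          # insufficient defense fixed by adding resources
print(q_update(q, -1, "increase", r, 1))
```

Selecting, in each state, the action with the largest Q value then corresponds to choosing the strategy "in the direction of the maximum Q value" mentioned earlier.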
The second aspect of the present invention further provides a system for fusing scheduling and response of a dynamic resource pool, including:
a dynamic resource pool building module, used for performing lightweight processing on the physical defense resources according to their characteristics and adding them to the resource pool;
a defense strategy generation module, used for defining the device states in the network according to the application scenario, establishing the strategy spaces and payoff spaces of the attacker and the defender based on an attack and defense strategy generation model, and generating an optimal defense resource scheduling strategy through Nash equilibrium;
and a defense resource adaptation module, used for iteratively optimizing the optimal defense strategy with a reinforcement learning algorithm according to the states of the devices in the network after a network threat.
In some embodiments, the dynamic resource pool building module comprises:
a defense resource characteristic acquisition submodule, used for acquiring the characteristics of the physical defense resources;
a defense resource lightweight processing submodule, used for determining the type of each physical defense resource, containerizing it if it is a software program, and virtualizing the physical resource through a virtual machine if it is not a software program;
and a defense resource adding submodule, used for adding the physical defense resources that have undergone lightweight processing to the resource pool.
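The three submodules can be pictured as a short pipeline. The sketch below only records which lightweight method would be chosen and does not call any container or hypervisor tooling; the resource names, the `resource_kind` labels, and the pool structure are hypothetical.

```python
def choose_lightweight_method(resource_kind):
    """Software programs are containerized; everything else goes to a virtual machine."""
    return "container" if resource_kind == "software" else "virtual_machine"


def add_to_pool(pool, name, resource_kind):
    """Record a defense resource in the pool together with its lightweight method."""
    entry = {"name": name, "kind": resource_kind,
             "method": choose_lightweight_method(resource_kind)}
    pool.append(entry)
    return entry


pool = []
add_to_pool(pool, "intrusion-detection", "software")
add_to_pool(pool, "hardware-firewall", "hardware")
print([(e["name"], e["method"]) for e in pool])
```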
In some embodiments, the defense strategy generation module includes:
a device state definition submodule, used for defining the device states in the network according to the application scenario, the device states including a defense state, an attack state, a normal state, an attacked-but-unaffected state, and a paralyzed state;
an attack and defense strategy generation model establishing submodule, used for establishing an attack and defense strategy generation model according to the transition relations between the device states, defined as
[formula image 039]
where [formula image 040] represents the attacker; [formula image 041] represents the defender; [formula image 042] represents the attack strategies of different strengths available for selection, and [formula image 043] represents the n-th attack strategy; [formula image 044] represents the defense strategies of different strengths available for selection, and [formula image 045] represents the n-th defense strategy; [formula image 046] represents the probabilities of selecting the attack strategies of different strengths, and [formula image 047] represents the probability of selecting the n-th attack strategy; [formula image 048] represents the probabilities of selecting the different defense strategies, and [formula image 049] represents the probability of selecting the n-th defense strategy; [formula image 050] represents the payoff function of the attacker; [formula image 051] represents the payoff function of the defender; [formula image 025] represents the i-th attack strategy; and [formula image 052] represents the j-th defense strategy;
and an optimal defense strategy generation submodule, used for calculating the optimal defense strategy from the attack and defense strategy generation model by means of Nash equilibrium, with minimizing system loss as the objective.
In some embodiments, the defense resource adaptation module further comprises:
an initial defense state matrix construction submodule, used for setting a plurality of defense strategies, recording the defense state of each security defense strategy after a network threat, and constructing an initial defense state matrix according to whether each defense state meets the defense requirement;
an action set setting submodule, used for setting an action set, supplementing or replacing defense resources for strategies with insufficient defense, and reducing defense resources for strategies with excessive defense;
and a reward function setting submodule, used for setting a reward function and, after each action is executed, increasing the reward value in the initial defense state matrix until the matrix reaches its maximum value.
The third aspect of the invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method as claimed in any one of the above when executing the computer program.
The fourth aspect of the invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any one of the above.
It will be understood by those skilled in the art that although some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments.
Those skilled in the art will appreciate that the description of each embodiment has a respective emphasis, and reference may be made to the related description of other embodiments for those parts of an embodiment that are not described in detail.
Although the embodiments of the present application have been described with reference to the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the application, and such modifications and variations fall within the scope defined by the appended claims. The scope of the present invention is not limited to the specific embodiments: any equivalent modification or substitution that a person skilled in the art can readily conceive within the technical scope of the present disclosure is intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for fusing scheduling and response of a dynamic resource pool, characterized by comprising the following steps:
S1: performing lightweight processing on physical defense resources according to their characteristics, and adding them to a resource pool;
S2: according to the application scenario, defining the device states in the network, establishing the strategy spaces and payoff spaces of the attacker and the defender based on an attack and defense strategy generation model, and generating an optimal defense resource scheduling strategy through Nash equilibrium;
S3: iteratively optimizing the optimal defense strategy with a reinforcement learning algorithm, according to the states of the devices in the network after a network threat.
2. The method for fusing scheduling and response of a dynamic resource pool of claim 1, wherein S1 further comprises:
S11: acquiring the characteristics of the physical defense resources;
S12: determining the type of each physical defense resource; if it is a software program, containerizing it; if it is not a software program, virtualizing the physical resource through a virtual machine;
S13: adding the physical defense resources that have undergone lightweight processing to the resource pool.
3. The method of claim 2, wherein S2 comprises:
S21: according to the application scenario, defining the device states in the network, including a defense state, an attack state, a normal state, an attacked-but-unaffected state, and a paralyzed state;
S22: establishing an attack and defense strategy generation model according to the transition relations between the device states, defined as
[claim formula image 001]
where [claim formula image 002] represents the attacker; [claim formula image 003] represents the defender; [claim formula image 004] represents the attack strategies of different strengths available for selection, and [claim formula image 005] represents the n-th attack strategy; [claim formula image 006] represents the defense strategies of different strengths available for selection, and [claim formula image 008] represents the n-th defense strategy; [claim formula image 009] represents the probabilities of selecting the attack strategies of different strengths, and [claim formula image 011] represents the probability of selecting the n-th attack strategy; [claim formula image 012] represents the probabilities of selecting the different defense strategies, and [claim formula image 013] represents the probability of selecting the n-th defense strategy; [claim formula image 014] represents the payoff function of the attacker; [claim formula image 015] represents the payoff function of the defender; [claim formula image 016] represents the i-th attack strategy; and [claim formula image 017] represents the j-th defense strategy;
S23: calculating the optimal defense strategy according to Nash equilibrium and the defense strategy generation model, with the objective of minimizing system loss.
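As a hedged illustration of S23, the sketch below computes a mixed-strategy Nash equilibrium for a hypothetical game with two attack strategies and two defense strategies, using the standard indifference conditions. The payoff values, the zero-sum assumption, and the function names `mixed_nash_2x2` and `expected_payoff` are assumptions made for the example and are not taken from the patent.

```python
from fractions import Fraction


def mixed_nash_2x2(attacker_payoff, defender_payoff):
    """Return (p, q) for an interior mixed equilibrium of a 2x2 game.

    p is the probability that the attacker plays attack strategy 0,
    q is the probability that the defender plays defense strategy 0.
    Payoff matrices are indexed [attack strategy][defense strategy].
    """
    a, d = attacker_payoff, defender_payoff
    den_q = a[0][0] - a[0][1] - a[1][0] + a[1][1]
    den_p = d[0][0] - d[0][1] - d[1][0] + d[1][1]
    if den_q == 0 or den_p == 0:
        raise ValueError("indifference conditions have no unique solution")
    # q makes the attacker indifferent between its two attack strategies.
    q = Fraction(a[1][1] - a[0][1], den_q)
    # p makes the defender indifferent between its two defense strategies.
    p = Fraction(d[1][1] - d[1][0], den_p)
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("no interior mixed equilibrium; check pure strategies")
    return p, q


def expected_payoff(payoff, p, q):
    """Expected payoff under mixing probabilities p (attacker) and q (defender)."""
    return (p * q * payoff[0][0] + p * (1 - q) * payoff[0][1]
            + (1 - p) * q * payoff[1][0] + (1 - p) * (1 - q) * payoff[1][1])


# Hypothetical zero-sum payoffs: an attack gains nothing when it meets the
# matching defense, and gains 4 or 2 when it meets the other defense.
attacker = [[0, 4], [2, 0]]
defender = [[-v for v in row] for row in attacker]

p, q = mixed_nash_2x2(attacker, defender)
loss = -expected_payoff(defender, p, q)
print(p, q, loss)
```

With these assumed payoffs the defender's equilibrium mix assigns more probability to the defense that blocks the more damaging attack, which is the sense in which the equilibrium strategy limits system loss.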
4. The method of claim 3, wherein S3 comprises:
S31: setting a plurality of defense strategies, collecting the defense state of each security defense strategy after a network threat, and constructing an initial defense state matrix according to whether each defense state meets the defense requirements;
S32: setting an action set, in which defense resources are supplemented or replaced for strategies with insufficient defense, and defense resources are reduced where defense is excessive;
S33: setting a reward function, and increasing the reward value in the initial defense state matrix after each action is executed, until the initial defense state matrix reaches its maximum value.
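The loop in S31 to S33 can be sketched with tabular Q-learning. The claim does not name a specific reinforcement learning algorithm, so Q-learning is an assumption here, as are the three-level state encoding, the action names, the reward values, and the hyperparameters.

```python
import random

# Hypothetical encoding of one strategy's defense state after a threat:
# -1 = insufficient defense, 0 = meets requirements, 1 = excessive defense.
STATES = (-1, 0, 1)
# Action set in the spirit of S32: supplement, keep, or reduce resources.
ACTIONS = {"add": 1, "hold": 0, "reduce": -1}


def step(state, action):
    """Apply an action and reward states that meet the defense requirement."""
    nxt = max(-1, min(1, state + ACTIONS[action]))
    reward = 1.0 if nxt == 0 else -1.0
    return nxt, reward


def train(steps=5000, alpha=0.5, gamma=0.9, seed=0):
    """Off-policy Q-learning driven by uniformly random exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = rng.choice(STATES)
    for _ in range(steps):
        action = rng.choice(list(ACTIONS))
        nxt, reward = step(state, action)
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        # Standard Q-learning update toward reward plus discounted best value.
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = nxt
    return q


def greedy_policy(q):
    """Pick the highest-valued action for each defense state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


policy = greedy_policy(train())
print(policy)
```

The learned table plays the role of the defense state matrix whose values grow with reward; the greedy policy then supplements resources when defense is insufficient and reduces them when it is excessive.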
5. A system for fusing dynamic resource pool scheduling and response, comprising:
a dynamic resource pool building module, configured to perform lightweight processing on the entity defense resources according to their characteristics and add them to the resource pool;
a defense strategy generation module, configured to define the device states in the network according to the application scenario, establish the strategy spaces and payoff spaces of the attacker and the defender based on an attack-defense strategy generation model, and generate an optimal defense resource scheduling strategy through Nash equilibrium;
and a defense resource adaptive module, configured to iteratively optimize the optimal defense strategy by using a reinforcement learning algorithm according to the states of the devices in the network after a network threat.
6. The system of claim 5, wherein the dynamic resource pool building module comprises:
a defense resource characteristic acquisition submodule, configured to acquire the characteristics of the entity defense resources;
a defense resource lightweight submodule, configured to determine the type of each entity defense resource; if the resource is a software program, containerization processing is performed; if the resource is not a software program, the physical resource is virtualized through a virtual machine;
and a defense resource adding submodule, configured to add the entity defense resources that have undergone lightweight processing to the resource pool.
7. The system of claim 6, wherein the defense policy generation module comprises:
a device state definition submodule, configured to define the device states in the network according to the application scenario, the device states including a defense state, an attack state, a normal state, an attack state and a paralysis state;
an attack-defense strategy generation model establishing submodule, configured to establish an attack-defense strategy generation model according to the transition relations between the device states, the model being defined as
[formula image 001]
wherein
[formula image 002] represents the attacker,
[formula image 003] represents the defender,
[formula image 004] represents the selection of attack strategies of different strengths,
[formula image 005] represents the n-th attack strategy,
[formula image 006] represents the selection of defense strategies of different strengths,
[formula image 008] represents the n-th defense strategy,
[formula image 009] represents the probabilities of selecting the attack strategies of different strengths,
[formula image 011] represents the probability of selecting the n-th attack strategy,
[formula image 012] represents the probabilities of selecting the different defense strategies,
[formula image 013] represents the probability of selecting the n-th defense strategy,
[formula image 014] represents the payoff function of the attacker,
[formula image 015] represents the payoff function of the defender,
[formula image 016] represents the i-th attack strategy, and
[formula image 017] represents the j-th defense strategy;
and an optimal defense strategy generation submodule, configured to calculate the optimal defense strategy according to Nash equilibrium and the defense strategy generation model, with the objective of minimizing system loss.
8. The system of claim 7, wherein the defense resource adaptive module further comprises:
an initial defense state matrix construction submodule, configured to set a plurality of defense strategies, collect the defense state of each security defense strategy after a network threat, and construct an initial defense state matrix according to whether each defense state meets the defense requirements;
an action set setting submodule, configured to set an action set, in which defense resources are supplemented or replaced for strategies with insufficient defense, and defense resources are reduced where defense is excessive;
and a reward function setting submodule, configured to set a reward function and increase the reward value in the initial defense state matrix after each action is executed, until the initial defense state matrix reaches its maximum value.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-4 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 4.
CN202211537098.5A 2022-12-02 2022-12-02 Method and system for fusing scheduling and response of dynamic resource pool Active CN115550078B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211537098.5A CN115550078B (en) 2022-12-02 2022-12-02 Method and system for fusing scheduling and response of dynamic resource pool

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211537098.5A CN115550078B (en) 2022-12-02 2022-12-02 Method and system for fusing scheduling and response of dynamic resource pool

Publications (2)

Publication Number Publication Date
CN115550078A true CN115550078A (en) 2022-12-30
CN115550078B CN115550078B (en) 2023-04-07

Family

ID=84721897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211537098.5A Active CN115550078B (en) 2022-12-02 2022-12-02 Method and system for fusing scheduling and response of dynamic resource pool

Country Status (1)

Country Link
CN (1) CN115550078B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116389075A (en) * 2023-03-08 2023-07-04 安芯网盾(北京)科技有限公司 Dynamic interception method and device for attack behaviors of host
CN116827685A (en) * 2023-08-28 2023-09-29 成都乐超人科技有限公司 Dynamic defense strategy method of micro-service system based on deep reinforcement learning
CN117593095A (en) * 2024-01-17 2024-02-23 苏州元脑智能科技有限公司 Method, device, computer equipment and storage medium for self-adaptive parameter adjustment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150199623A1 (en) * 2012-05-11 2015-07-16 Saab Ab Method for resource allocation in mission planning
CN109327427A (en) * 2018-05-16 2019-02-12 中国人民解放军战略支援部队信息工程大学 A kind of dynamic network variation decision-making technique and its system in face of unknown threat
CN109639729A (en) * 2019-01-16 2019-04-16 北京科技大学 A kind of dynamic game method and device of internet of things oriented intimidation defense resource allocation
CN110602047A (en) * 2019-08-14 2019-12-20 中国人民解放军战略支援部队信息工程大学 Multi-step attack dynamic defense decision selection method and system for network attack and defense
CN112819300A (en) * 2021-01-21 2021-05-18 南京邮电大学 Power distribution network risk assessment method based on random game network under network attack
CN115208618A (en) * 2022-05-24 2022-10-18 华北电力大学 Novel power system APT attack active defense strategy based on multi-level attack and defense game
CN115348073A (en) * 2022-08-11 2022-11-15 浙江大学 CPPS defense strategy decision method under DDoS attack based on game theory

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116389075A (en) * 2023-03-08 2023-07-04 安芯网盾(北京)科技有限公司 Dynamic interception method and device for attack behaviors of host
CN116389075B (en) * 2023-03-08 2023-10-20 安芯网盾(北京)科技有限公司 Dynamic interception method and device for attack behaviors of host
CN116827685A (en) * 2023-08-28 2023-09-29 成都乐超人科技有限公司 Dynamic defense strategy method of micro-service system based on deep reinforcement learning
CN116827685B (en) * 2023-08-28 2023-11-14 成都乐超人科技有限公司 Dynamic defense strategy method of micro-service system based on deep reinforcement learning
CN117593095A (en) * 2024-01-17 2024-02-23 苏州元脑智能科技有限公司 Method, device, computer equipment and storage medium for self-adaptive parameter adjustment
CN117593095B (en) * 2024-01-17 2024-03-22 苏州元脑智能科技有限公司 Method, device, computer equipment and storage medium for self-adaptive parameter adjustment

Also Published As

Publication number Publication date
CN115550078B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN115550078B (en) Method and system for fusing scheduling and response of dynamic resource pool
Somani et al. DDoS attacks in cloud computing: Issues, taxonomy, and future directions
Kadhim et al. A review study on cloud computing issues
Shan et al. A game-theoretic approach to modeling attacks and defenses of smart grids at three levels
Guha Roy et al. A blockchain‐based cyber attack detection scheme for decentralized Internet of Things using software‐defined network
Gai et al. Secure cyber incident analytics framework using Monte Carlo simulations for financial cybersecurity insurance in cloud computing
Wang et al. A polymorphic heterogeneous security architecture for edge-enabled smart grids
Enayaty-Ahangar et al. A survey of optimization models and methods for cyberinfrastructure security
Jakóbik et al. Stackelberg games for modeling defense scenarios against cloud security threats
Xu et al. A QoS-driven approach to the cloud service addressing attributes of security
Alavizadeh et al. Evaluating the security and economic effects of moving target defense techniques on the cloud
Lakhno et al. Development of a support system for managing the cyber security of information and communication environment of transport
Khan et al. Resource Allocation in Networking and Computing Systems: A Security and Dependability Perspective
Yadav et al. SmartPatch: A patch prioritization framework
Shehu et al. Cyber Kill Chain Analysis Using Artificial Intelligence
Salinas et al. An integral cybersecurity approach using a many-objective optimization strategy
Gill et al. A systematic review on game-theoretic models and different types of security requirements in cloud environment: challenges and opportunities
Kim et al. Ontology modeling for APT attack detection in an IoT-based power system
Narwal et al. A review of game-theoretic approaches for secure virtual machine resource allocation in cloud
Lakhdhar et al. Proactive security for safety and sustainability of mission critical systems
Zhou et al. Intrusion detection system for IoT heterogeneous perceptual network based on game theory
Bellini et al. Cyber-resilience
Farooq et al. A Comprehensive Security Review on Cloud Computing
Mtsweni et al. Building an integrated cyber defence capability for African missions
Thabet et al. Approximate co-location-resistant VM placement strategy with low energy consumption

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Method and System for Integrating Dynamic Resource Pool Scheduling and Response

Effective date of registration: 20230811

Granted publication date: 20230407

Pledgee: Zhongguancun Branch of Bank of Beijing Co.,Ltd.

Pledgor: Beijing Guoxin Blue Shield Technology Co.,Ltd.

Registration number: Y2023110000330

PC01 Cancellation of the registration of the contract for pledge of patent right

Granted publication date: 20230407

Pledgee: Zhongguancun Branch of Bank of Beijing Co.,Ltd.

Pledgor: Beijing Guoxin Blue Shield Technology Co.,Ltd.

Registration number: Y2023110000330

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A method and system for integrating dynamic resource pool scheduling and response

Granted publication date: 20230407

Pledgee: Zhongguancun Branch of Bank of Beijing Co.,Ltd.

Pledgor: Beijing Guoxin Blue Shield Technology Co.,Ltd.

Registration number: Y2024110000297
