Disclosure of Invention
This patent is directed to a scheduling and response method for a converged dynamic resource pool: the physical defense resource devices are virtualized into containers, overcoming the limitations of physical resources. Meanwhile, a dynamic defense resource pool is constructed, the states of the devices in the network are defined according to different application scenarios, and, combining the idea of game theory, differential equations are constructed to digitize the network states. After initialization is completed, the game-theoretic equations are solved using Nash equilibrium. Further, since the network environment and the attack-defense states change continuously, the Q-Learning algorithm of reinforcement learning is selected to explore an adaptive matching strategy for security defense policies, so that the security defense resources are optimized and the goal of maximizing defense profit is achieved.
The invention mainly comprises a dynamic resource pool construction module and a corresponding defense strategy scheduling optimization module; specifically, a dynamic construction module for the security defense resource pool, a defense resource scheduling strategy generation module, and a defense strategy adaptive matching optimization module. First, a dynamic construction method for the security defense resource pool is provided, comprising virtualization of physical resources and containerized encapsulation of the corresponding software. On the basis of the dynamically constructed security defense resource pool, an attack-defense environment space is constructed; adopting the idea of game theory, the profit of the defense resources is maximized by solving the game, achieving optimal effect with limited resources, while the Q-learning method realizes adaptive generation of defense strategies on the basis of the existing strategies. The specific steps are as follows:
A security defense resource pool is established: personalized resource virtualization is performed according to the characteristics of the different defense resources, and the corresponding defense resources are added to the resource pool for subsequent strategy generation.
A defense strategy generation method based on game theory is established: the corresponding device states are defined according to the actual application scenario, the strategy spaces and the profit space of the attacker and the defender are established, and the optimal defense strategy arrangement under the current conditions is obtained using the Nash equilibrium solution method.
A defense resource adaptive module is established: because the attack-defense environment changes continuously, on the basis of the above, a set of adaptive matching strategies for security defense policies is explored through the reinforcement-learning Q-Learning algorithm, thereby maximizing the effect of the security defense resources. The optimal security defense profit serves as the feedback target, and strategy selection is made adaptive by iterating the Q value, usually following the direction of the maximum Q value; during selection the Q table is updated through continuous 'security strategy - defense state - security strategy' iterations until the Q value is maximized and the algorithm converges.
The specific technical scheme of the invention is as follows:
in a first aspect, the present invention provides a method for fusing scheduling and response of a dynamic resource pool, including the following steps:
s1: according to the characteristics of the entity defense resources, performing lightweight processing on the entity defense resources and adding them to a resource pool;
s2: according to the application scenario, defining the states of the devices in the network, establishing the strategy spaces and the profit space of the attacker and the defender based on an attack and defense strategy generation model, and generating an optimal defense resource scheduling strategy through Nash equilibrium;
s3: performing iterative optimization on the optimal defense strategy with a reinforcement learning algorithm according to the states of the devices in the network after a network threat.
In some embodiments, the S1 further comprises:
s11: acquiring the characteristics of entity defense resources;
s12: judging the entity defense resource type: if it is a software program, performing containerization processing; if it is not a software program, virtualizing the physical resource through a virtual machine;
s13: adding the lightweighted entity defense resources to the resource pool.
In some embodiments, the S2 comprises:
s21: according to the application scenario, defining the states of the devices in the network, including a defense state, an attack state, a normal state, an attacked-but-unaffected state, and a paralysis state;
s22: according to the conversion relations between the device states, establishing an attack and defense strategy generation model, defined as
G = (N, S, P, U);
wherein N = (N_A, N_D), N_A representing the attacker and N_D representing the defender; S = (S_A, S_D), S_A = {a_1, a_2, ..., a_n} representing the selectable attack strategies of different strengths, with a_n the nth attack strategy, and S_D = {d_1, d_2, ..., d_n} representing the selectable defense strategies of different strengths, with d_n the nth defense strategy; P = (P_A, P_D), P_A = {p_1, p_2, ..., p_n} representing the probabilities of selecting the attack strategies of different strengths, with p_n the probability of selecting the nth attack strategy, and P_D = {q_1, q_2, ..., q_n} representing the probabilities of selecting the defense strategies, with q_n the probability of selecting the nth defense strategy; and U = (U_A, U_D), U_A(a_i, d_j) representing the revenue function of the attacker and U_D(a_i, d_j) the revenue function of the defender, with a_i the ith attack strategy and d_j the jth defense strategy;
s23: calculating the optimal defense strategy according to Nash equilibrium and the attack and defense strategy generation model, with the goal of minimizing system loss.
In some embodiments, the S3 comprises:
s31: setting a plurality of defense strategies, counting the defense state of each security defense strategy after a network threat, and constructing an initial defense state matrix according to whether the defense states meet the defense requirements;
s32: setting an action set, supplementing or replacing defense resources for strategies whose defense is insufficient, and reducing defense resources where defense is excessive;
s33: setting a reward function, and increasing the reward value in the initial defense state matrix after each action is executed until the matrix reaches its maximum value.
In a second aspect, the present invention provides a system for merging dynamic resource pool scheduling and response, including:
the dynamic resource pool building module, used for performing lightweight processing on the entity defense resources according to their characteristics and adding them to the resource pool;
the defense strategy generation module, used for defining the device states in the network according to the application scenario, establishing the strategy spaces and profit space of the attacker and the defender based on an attack and defense strategy generation model, and generating an optimal defense resource scheduling strategy through Nash equilibrium;
and the defense resource adaptive module, used for performing iterative optimization on the optimal defense strategy with a reinforcement learning algorithm according to the states of the devices in the network after a network threat.
In some embodiments, the dynamic resource pool building module comprises:
the defense resource characteristic acquisition submodule is used for acquiring the defense resource characteristics of the entity;
the defense resource lightweight submodule, used for judging the entity defense resource type: if it is a software program, containerization processing is performed; if it is not a software program, the physical resource is virtualized through a virtual machine;
and the defense resource adding submodule is used for adding the entity defense resources subjected to the lightweight processing into the resource pool.
In some embodiments, the defense policy generation module includes:
the device state definition submodule, used for defining the device states in the network according to the application scenario, the device states including a defense state, an attack state, a normal state, an attacked-but-unaffected state, and a paralysis state;
the attack and defense strategy generation model establishing submodule, used for establishing an attack and defense strategy generation model according to the conversion relations between the device states, the model being defined as
G = (N, S, P, U);
wherein N = (N_A, N_D), N_A representing the attacker and N_D representing the defender; S = (S_A, S_D), S_A = {a_1, a_2, ..., a_n} representing the selectable attack strategies of different strengths, with a_n the nth attack strategy, and S_D = {d_1, d_2, ..., d_n} representing the selectable defense strategies of different strengths, with d_n the nth defense strategy; P = (P_A, P_D), P_A = {p_1, p_2, ..., p_n} representing the probabilities of selecting the attack strategies of different strengths, with p_n the probability of selecting the nth attack strategy, and P_D = {q_1, q_2, ..., q_n} representing the probabilities of selecting the defense strategies, with q_n the probability of selecting the nth defense strategy; and U = (U_A, U_D), U_A(a_i, d_j) representing the revenue function of the attacker and U_D(a_i, d_j) the revenue function of the defender, with a_i the ith attack strategy and d_j the jth defense strategy;
and the optimal defense strategy generation submodule is used for calculating the optimal defense strategy according to Nash equilibrium and a defense strategy generation model and with the aim of minimizing system loss.
In some embodiments, the defensive resource adaptation module further comprises:
the initial defense state matrix construction submodule, used for setting a plurality of defense strategies, counting the defense state of each security defense strategy after a network threat, and constructing an initial defense state matrix according to whether the defense states meet the defense requirements;
the action set setting submodule, used for setting an action set, supplementing or replacing defense resources for strategies whose defense is insufficient, and reducing defense resources where defense is excessive;
and the reward function setting submodule, used for setting a reward function and increasing the reward value in the initial defense state matrix after each action is executed until the matrix reaches its maximum value.
In a third aspect, the invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of the above when executing the computer program.
In a fourth aspect, the invention provides a computer-readable storage medium, on which a computer program is stored which, when executed by a processor, performs the method of any one of the above.
The beneficial effect of this application is:
According to the method and system for converged dynamic resource pool scheduling and response, the construction of a converged scheduling-and-response module is realized through the above technical means. Aiming at the problem that entity resources cannot be fully utilized when defense strategies are generated from direct analysis of the resources, resource virtualization is studied: on the basis of analyzing the device characteristics and security requirements of the existing resources, the entity resources are virtualized and a dynamic resource pool is constructed, laying a foundation for the study of defense strategies. Furthermore, to address the rough state division and lack of comprehensive analysis in existing work, a state space with more complete coverage is established, which facilitates optimal strategy selection with the game-theoretic approach, reduces the complexity of analysis, speeds up strategy generation, and provides better coping strategies against the different attacks in the OWASP Top 10. On the basis of existing strategy generation, considering the continuous change of the attack-defense environment and the appearance of unknown attacks, adaptive adjustment and optimization of the defense strategy is realized with the reinforcement-learning Q-Learning algorithm, reducing the possibility of defense loopholes.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth to illustrate, but are not to be construed to limit the scope of the invention.
In order that the above objects, features and advantages of the present application can be more clearly understood, the present disclosure will be further described in detail with reference to the accompanying drawings and examples. It is to be understood that the embodiments described are only a few embodiments of the present disclosure, and not all embodiments. The specific embodiments described herein are merely illustrative of the disclosure and are not limiting of the application. All other embodiments that can be derived by one of ordinary skill in the art from the description of the embodiments are intended to be within the scope of the present disclosure.
It is noted that, in this document, relational terms such as "first" and "second," and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
Fig. 1 is a schematic overall flow chart of the present application. This patent is directed to a scheduling and response method for a converged dynamic resource pool: the physical defense resource devices are virtualized into containers, overcoming the limitations of physical resources. Meanwhile, a dynamic defense resource pool is constructed, the states of the devices in the network are defined according to different application scenarios, and, combining the idea of game theory, differential equations are constructed to digitize the network states. After initialization is completed, the game-theoretic equations are solved using Nash equilibrium. Further, since the network environment and the attack-defense states change continuously, the Q-Learning algorithm of reinforcement learning is selected to explore an adaptive matching strategy for security defense policies, so that the security defense resources are optimized and the goal of maximizing defense profit is achieved.
The invention mainly comprises a dynamic resource pool construction module and a corresponding defense strategy scheduling optimization module; specifically, a dynamic construction module for the security defense resource pool, a defense resource scheduling strategy generation module, and a defense strategy adaptive matching optimization module. First, a dynamic construction method for the security defense resource pool is provided, comprising virtualization of physical resources and containerized encapsulation of the corresponding software. On the basis of the dynamically constructed security defense resource pool, an attack-defense environment space is constructed; adopting the idea of game theory, the profit of the defense resources is maximized by solving the game, achieving optimal effect with limited resources, while the Q-learning method realizes adaptive generation of defense strategies on the basis of the existing strategies. The specific steps are as follows:
A security defense resource pool is established: personalized resource virtualization is performed according to the characteristics of the different defense resources, and the corresponding defense resources are added to the resource pool for subsequent strategy generation.
A defense strategy generation method based on game theory is established: the corresponding device states are defined according to the actual application scenario, the strategy spaces and the profit space of the attacker and the defender are established, and the optimal defense strategy arrangement under the current conditions is obtained using the Nash equilibrium solution method.
A defense resource adaptive module is established: because the attack-defense environment changes continuously, on the basis of the above, a set of adaptive matching strategies for security defense policies is explored through the reinforcement-learning Q-Learning algorithm, thereby maximizing the effect of the security defense resources. The optimal security defense profit serves as the feedback target, and strategy selection is made adaptive by iterating the Q value, usually following the direction of the maximum Q value; during selection the Q table is updated through continuous 'security strategy - defense state - security strategy' iterations until the Q value is maximized and the algorithm converges.
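The 'security strategy - defense state - security strategy' iteration above can be sketched as a standard tabular Q-Learning loop. In the illustrative Python below, the defense states, the action set, the reward, and the environment transition are all invented placeholders rather than part of the invention; only the Q-value update rule is the standard Q-Learning formula.

```python
import random

# Hypothetical defense postures (states) and scheduling actions.
states  = ["under-defended", "adequate", "over-defended"]
actions = ["add-resource", "keep", "remove-resource"]
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, eps = 0.1, 0.9, 0.2   # learning rate, discount factor, exploration rate

CORRECT = {"under-defended": "add-resource",
           "adequate": "keep",
           "over-defended": "remove-resource"}

def reward(state, action):
    # Placeholder reward: positive when the action moves toward an adequate posture.
    return 1.0 if CORRECT[state] == action else -0.5

def env_step(state, action):
    # Placeholder environment: the right action restores an adequate posture,
    # a wrong one leaves the defense state uncertain (modeled as random).
    return "adequate" if CORRECT[state] == action else random.choice(states)

random.seed(0)
state = "under-defended"
for _ in range(5000):
    if random.random() < eps:                       # explore
        action = random.choice(actions)
    else:                                           # exploit: follow the maximum Q value
        action = max(actions, key=lambda a: Q[(state, a)])
    nxt = env_step(state, action)
    # Standard Q-Learning update toward the best next-state Q value.
    Q[(state, action)] += alpha * (reward(state, action)
                                   + gamma * max(Q[(nxt, a)] for a in actions)
                                   - Q[(state, action)])
    state = nxt

# The greedy policy read off the converged Q table.
policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
```

After enough iterations the greedy policy selects the resource adjustment appropriate to each defense posture, illustrating how the Q table drives adaptive strategy matching.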
A method for converged dynamic resource pool scheduling and response, with reference to fig. 1, includes the following steps:
s1: according to the characteristics of the entity defense resources, carrying out lightweight treatment on the entity defense resources, and adding the entity defense resources into a resource pool;
In some embodiments, with reference to fig. 2, a sub-flowchart of step S1 of the present solution, said S1 further includes:
s11: acquiring the characteristics of entity defense resources;
s12: judging the entity defense resource type: if it is a software program, performing containerization processing; if it is not a software program, virtualizing the physical resource through a virtual machine;
s13: adding the lightweighted entity defense resources to the resource pool.
Specifically, the core task of building the defense resource pool is virtualization, which can be defined as a technique for creating logical services on top of resources running on hardware; that is, virtualization distributes the capability of a resource such as a network, storage, a server, or an application among multiple users or environments. In particular, operating-system-level virtualization provides an isolated computing environment, referred to as a container, within a common operating system. A container is a set of one or more processes isolated from the rest of the system; it includes an application together with all its library dependencies and configuration files. Furthermore, a container has a reproducible execution environment and lightweight lifecycle management, and is closer to bare-metal performance than a traditional virtual machine deployment. Lightweight processing mainly comprises two modes, virtual machines and containers, and different virtualization modes can be adopted according to the security-state requirements and environment-configuration requirements of different devices. Through virtualization, traditional security functions can be decoupled from dedicated security hardware and run on general-purpose servers.
Container technology is primarily directed at the virtualization of software applications. By bundling software code with its required environment, containers avoid adaptability problems, enabling users and developers to build, deploy and maintain applications flexibly, independent of the underlying infrastructure, through the containerization process; meanwhile, the isolation reduces resource consumption and makes deployment more flexible. For other virtualization needs, the virtual machine has better applicability: virtual-machine technology has a research foundation of many years, can virtualize physical resources such as the CPU and other hardware of a physical computer for flexible scheduling by the user, and also offers higher security. By selecting different lightweighting technologies, including virtual machines and containerization, according to the characteristics and security levels of different devices, traditional security functions can be separated from dedicated security hardware and run on general-purpose servers, thereby constructing a dynamic resource pool and providing the necessary execution conditions for the subsequent scheduling strategies.
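As an illustration of the lightweighting choice described above, the following sketch (with a hypothetical `DefenseResource` type and invented resource names) routes software programs to containerization and non-software resources to virtual-machine virtualization:

```python
from dataclasses import dataclass

@dataclass
class DefenseResource:
    name: str
    is_software: bool   # software program vs. physical device

def lightweight(resource: DefenseResource) -> str:
    """Return the virtualization mode used to add the resource to the pool."""
    if resource.is_software:
        # Software programs are packaged with their dependencies as containers.
        return "container"
    # Non-software (physical) resources are virtualized through a virtual machine.
    return "virtual-machine"

# Build a small pool mapping each resource to its lightweighting mode.
pool = {}
for r in [DefenseResource("waf", True), DefenseResource("firewall-appliance", False)]:
    pool[r.name] = lightweight(r)

print(pool)   # {'waf': 'container', 'firewall-appliance': 'virtual-machine'}
```

The decision rule mirrors step S12: the resource type determines the mode, and the result is recorded in the pool for later scheduling.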
During container execution, the container orchestrator is responsible for large-scale management and organization of the microservice architecture, handling automation and lifecycle management of the containers and services in the cluster. The orchestrator consists of five modules. The scheduling module determines the best placement for each incoming task; the resource allocation module reserves cluster resources on a request basis, by either a static or a dynamic method; the load balancing module distributes tasks across container instances according to fairness, cost-energy, or priority criteria; the admission control module checks whether the cluster has enough resources to run a user's job without ever exceeding the quota allocated to it; and the billing module monitors the resources available to each user, while the monitoring module tracks real-time resource-consumption metrics of each node and collects resource-health metrics to support fault tolerance. The scheduling module is the core of the container orchestrator and also the core step in constructing the dynamic resource pool.
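The admission-control and scheduling duties of the orchestrator can be illustrated with a minimal sketch; the quota check and the pick-the-freest-node placement rule below are simplifying assumptions for illustration, not the orchestrator's actual algorithms:

```python
from typing import Optional

def admit(requested_cpu: float, used_cpu: float, quota_cpu: float) -> bool:
    """Admission control: reject jobs that would exceed the user's allocated quota."""
    return used_cpu + requested_cpu <= quota_cpu

def schedule(requested_cpu: float, free_cpu_by_node: dict) -> Optional[str]:
    """Scheduling: place the task on the node with the most free CPU, or None."""
    candidates = {n: f for n, f in free_cpu_by_node.items() if f >= requested_cpu}
    return max(candidates, key=candidates.get) if candidates else None

# A job inside quota is admitted and placed on the freest capable node.
assert admit(1.0, used_cpu=2.0, quota_cpu=4.0)
assert schedule(1.0, {"node-a": 0.5, "node-b": 2.0}) == "node-b"
```

In a real orchestrator the scoring would also weigh fairness, cost-energy, and priority criteria as described above; the sketch keeps only the capacity dimension.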
Meanwhile, it should be noted that the dynamic resource pool includes the various device resources in the network, which are entity resources virtualized by virtual-machine and container operations, including the network devices in the different states mentioned in the attack and defense strategy model established in S2, such as devices in the defense state, the attack state, the normal state, the intermediate state of being attacked but temporarily unaffected, and the paralyzed state. The definition of the network-device state space in S2 is based on the virtualized resource definitions of S1, with a corresponding mapping between them. The optimization in S3 is likewise based on the strategy space defined in S2, so the virtual resources in a strategy correspond one-to-one with the resources in the dynamic resource pool; after the corresponding resource arrangement strategy is generated in S3, the strategy is handed to the control center of the dynamic resource pool to perform the corresponding operations.
S2: according to the application scenario, defining the states of the devices in the network, establishing the strategy spaces and the profit space of the attacker and the defender based on an attack and defense strategy generation model, and generating an optimal defense resource scheduling strategy through Nash equilibrium;
In some embodiments, with reference to fig. 3, a sub-flowchart of step S2 of the present solution, said S2 includes:
s21: according to the application scenario, defining the states of the devices in the network, including a defense state, an attack state, a normal state, an attacked-but-unaffected state, and a paralysis state; s22: establishing an attack and defense strategy generation model according to the conversion relations between the device states, defined as
G = (N, S, P, U);
wherein N = (N_A, N_D), N_A representing the attacker and N_D representing the defender; S = (S_A, S_D), S_A = {a_1, a_2, ..., a_n} representing the selectable attack strategies of different strengths, with a_n the nth attack strategy, and S_D = {d_1, d_2, ..., d_n} representing the selectable defense strategies of different strengths, with d_n the nth defense strategy; P = (P_A, P_D), P_A = {p_1, p_2, ..., p_n} representing the probabilities of selecting the attack strategies of different strengths, with p_n the probability of selecting the nth attack strategy, and P_D = {q_1, q_2, ..., q_n} representing the probabilities of selecting the defense strategies, with q_n the probability of selecting the nth defense strategy; and U = (U_A, U_D), U_A(a_i, d_j) representing the revenue function of the attacker and U_D(a_i, d_j) the revenue function of the defender, with a_i the ith attack strategy and d_j the jth defense strategy;
s23: calculating the optimal defense strategy according to Nash equilibrium and the attack and defense strategy generation model, with the goal of minimizing system loss.
Specifically, to obtain the optimal defense strategy, the scheme models the states of the attacking and defending parties in the current network, adopts the idea of dynamic game theory to create corresponding differential equations for the different states of the two parties in the network, and uses Nash equilibrium to solve for the optimal defense strategy.
First, because the resource pool contains complex and varied defense resources, a reasonable defense resource scheduling strategy is required for their overall planning; dynamic game theory is a typical method for solving the attack-defense game problem. In step S21, an Attack and Defense Strategy Generation Model (ADSGM) is defined, which involves a plurality of device states. D represents the defense state: an Internet-of-Things device in this state is controlled by the defender and contributes positively to defense. A represents the attack state: a device in this state is controlled by an attacker and brings benefits to the attacker. N represents the normal state: the device works normally in the Internet of Things but has certain vulnerabilities and may be attacked. I represents a device that is under attack but whose service function is temporarily unaffected; under a strong attack such a device becomes paralyzed and cannot provide service, while under a successful defense it recovers to the normal state and continues to provide the corresponding service. M represents the paralyzed state: after a device is attacked and cannot be handled, it can be shut down to reduce the influence on other devices, and after waiting a certain time it can be restarted to continue providing services.
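The five device states and the transitions named above (N to I under attack, I to N under successful defense, I to M under a strong attack, M to N after restart) can be sketched as a small state machine; the event names are invented for illustration:

```python
from enum import Enum

class DeviceState(Enum):
    D = "defense"    # controlled by the defender
    A = "attack"     # controlled by the attacker
    N = "normal"     # working normally but potentially vulnerable
    I = "attacked"   # attacked, service temporarily unaffected
    M = "paralyzed"  # shut down after a successful strong attack

# Transitions described in the text; event labels are hypothetical.
TRANSITIONS = {
    (DeviceState.N, "attacked"):    DeviceState.I,
    (DeviceState.I, "defended"):    DeviceState.N,
    (DeviceState.I, "overwhelmed"): DeviceState.M,
    (DeviceState.M, "restarted"):   DeviceState.N,
}

def step(state: DeviceState, event: str) -> DeviceState:
    """Apply an event; states keep their value for events with no defined transition."""
    return TRANSITIONS.get((state, event), state)
```

This table is exactly the conversion relation between device states from which the strategy generation model of step S22 is built.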
For example, multiple resources exist in the defense resource pool; overlapping resources produce redundancy, and when part of the resources are paralyzed by a strong network attack, this redundancy allows a quick switch to other similar resources, ensuring that tasks continue and enhancing the stability of the system. The specific switching procedure is to generate a corresponding strategy through reinforcement learning after acquiring the network state, and then pass the strategy to the control center of the resource pool to perform the corresponding operation.
Further, in step S22, the attack and defense strategy is modeled and defined as
G = (N, S, P, U);
wherein N = (N_A, N_D), N_A representing the attacker and N_D representing the defender; S = (S_A, S_D), S_A = {a_1, a_2, ..., a_n} representing the selectable attack strategies of different strengths, with a_n the nth attack strategy, and S_D = {d_1, d_2, ..., d_n} representing the selectable defense strategies of different strengths, with d_n the nth defense strategy; P = (P_A, P_D), P_A = {p_1, p_2, ..., p_n} representing the probabilities of selecting the attack strategies of different strengths, with p_n the probability of selecting the nth attack strategy, and P_D = {q_1, q_2, ..., q_n} representing the probabilities of selecting the defense strategies, with q_n the probability of selecting the nth defense strategy; and U = (U_A, U_D), U_A(a_i, d_j) representing the revenue function of the attacker and U_D(a_i, d_j) the revenue function of the defender, with a_i the ith attack strategy and d_j the jth defense strategy. The attacker's goal is to maximize the loss function of the system and the defender's goal is to minimize the system loss, thus forming a zero-sum game.
Further, in step S23, the coefficient K1 is defined as the loss caused by an attack when a device changes from the normal state N to the attacked-but-unaffected state I; the coefficient K2 as the loss avoided by the defense system when a device returns from the state I to the normal state N; the coefficient K3 as the loss caused by shutting a device down when it enters the paralyzed state M; and the coefficient K4 as the benefit brought by the defense strategy when a device recovers from the paralyzed state to the normal state. Let α, β, γ and δ respectively represent the possibility of converting from the normal state N to the attacked-but-unaffected state I, from the state I to the paralyzed state M, from the state I back to the normal state N, and from the paralyzed state M back to the normal state N; and let D(t), A(t), N(t), M(t) and I(t) respectively represent the number of nodes in the defense state D, the attack state A, the normal state N, the paralyzed state M and the attacked-but-unaffected state I in the system at time t. The attack return function at this time is
R_A(t) = K1 · α · N(t) + K3 · β · I(t)
and the defense return function is
R_D(t) = K2 · γ · I(t) + K4 · δ · M(t)
Meanwhile, the cost functions of the attacker and the defender are respectively defined as
C_A = c_a · Σ_{i=1..n} p_i · a_i,   C_D = c_d · Σ_{i=1..n} q_i · d_i
and the utility functions of the attacker and the defender in the system are
U_A = R_A(t) − R_D(t) − C_A,   U_D = R_D(t) − R_A(t) − C_D
wherein c_a and c_d represent the attack cost coefficient and the defense cost coefficient respectively; a_i is the ith attack strategy, p_i the probability of selecting the ith attack strategy, d_i the ith defense strategy, and q_i the probability of selecting the ith defense strategy.
But as the game theory research proves that each limited strategic game has a mixed strategy Nash equilibrium. There is a mixed attack defense strategy considering attackers and defenders. The above defines the strategy and probability of attack and defense, and can obtain the utility of the attacker and defender:
Related research proves that there exists a mixed strategy whose probability distribution (p*, q*) satisfies the above attacker and defender utility equations and constitutes a Nash equilibrium. Nash equilibrium here specifically means that, for attacks of any strength, the attacker's utility when the probability distribution is p* is greater than or equal to the attacker's utility under any other probability distribution p; and, for defenses of any strength, the defender's utility when the probability distribution is q* is greater than or equal to the defender's utility under any other probability distribution q.
In the process of solving the Nash equilibrium, considering that the game is a zero-sum game, a minimax method is adopted. Computing the optimal mixed strategy is then equivalent to computing the maximin strategy, which minimizes the maximum expected utility that the adversary can obtain.
Here, k represents the attacker's maximum expected benefit, m indexes the attack strategies, and n indexes the defense strategies. The defender aims to minimize the value of k, and the problem can then be converted into a linear programming problem and solved.
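The equilibrium computation described above can be sketched in code. Rather than calling a linear-programming solver directly, the dependency-free sketch below uses fictitious play, a classical iterative method that is known to converge to the same mixed-strategy Nash equilibrium (and game value k) in finite zero-sum games; the 2x2 payoff matrix is purely illustrative and is not the patent's actual attack/defense utility matrix.

```python
# Fictitious play for a two-player zero-sum game: each side repeatedly plays a
# best response to the opponent's empirical strategy mix; in zero-sum games the
# empirical frequencies converge to a mixed-strategy Nash equilibrium.

def fictitious_play(payoff, rounds=20000):
    """payoff[i][j]: attacker utility when attacker plays i, defender plays j."""
    m, n = len(payoff), len(payoff[0])
    atk_counts = [0] * m            # how often the attacker has played each strategy
    def_counts = [0] * n            # how often the defender has played each strategy
    atk_counts[0] = def_counts[0] = 1  # arbitrary initial play
    for _ in range(rounds):
        # Attacker best-responds to the defender's empirical mix (maximizes utility).
        atk_payoffs = [sum(payoff[i][j] * def_counts[j] for j in range(n))
                       for i in range(m)]
        atk_counts[max(range(m), key=atk_payoffs.__getitem__)] += 1
        # Defender best-responds to the attacker's empirical mix (minimizes attacker utility).
        def_payoffs = [sum(payoff[i][j] * atk_counts[i] for i in range(m))
                       for j in range(n)]
        def_counts[min(range(n), key=def_payoffs.__getitem__)] += 1
    total_a, total_d = sum(atk_counts), sum(def_counts)
    return ([c / total_a for c in atk_counts], [c / total_d for c in def_counts])

# Matching-pennies-style game: the equilibrium mixes both strategies 50/50
# and the game value k is 0.
p, q = fictitious_play([[1, -1], [-1, 1]])
```

An off-the-shelf LP solver (e.g. scipy.optimize.linprog, minimizing k subject to the expected-utility constraints) would produce the same equilibrium exactly; the iterative sketch is used here only to keep the example self-contained.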
S3: and performing iterative optimization on the optimal defense strategy by using a reinforcement learning algorithm according to the state of equipment in the network after the network threat.
In some embodiments, with reference to the sub-flowchart of step S3 in fig. 4, step S3 of the present solution includes:
s31: setting a plurality of defense strategies, counting the defense states of the safety defense strategies after network threat, and constructing an initial defense state matrix according to whether the defense states meet the defense requirements or not;
s32: setting an action set, supplementing or replacing defense resources for strategies with insufficient defense, and reducing defense resources where defense is excessive;
s33: and setting a reward function, and increasing a reward value in the initial defense state matrix after the action is executed until the initial defense state matrix reaches the maximum value.
Specifically, in order to complete the adaptive study of defense strategies on the basis of solving the optimal strategy, the incoming known and unknown network threats must be comprehensively analyzed to form data feedback that can be analyzed and quantified. Known and unknown network threats can be divided mainly into 7 types of attack behavior, corresponding to the 7 steps of the cyber kill chain: reconnaissance, weaponization, delivery, exploitation, installation, command and control, and actions on objectives. Reconnaissance refers to the attacker collecting data about the target and the attack strategy, including collecting email addresses and other information; intruders use automated scanners to find vulnerable points in the system, scanning firewalls, intrusion prevention systems and the like to obtain an entry point for the attack. Weaponization refers to the attacker exploiting security vulnerabilities to develop malware; attackers design malware based on their needs and intent, and in this process also attempt to reduce the chance of detection by the organization's existing security solutions. Delivery refers to the attacker spreading the weaponized malware through phishing emails or some other medium; the most common delivery media for weaponized payloads include websites, removable disks and emails. Exploitation refers to the malicious code being delivered into the organization's system: the boundary is breached here, and attackers have the opportunity to exploit the organization's systems by installing tools, running scripts and modifying security credentials. Installation means that the malware installs a backdoor or a remote-access Trojan to provide access to the intruder. Command and control refers to the attacker gaining the ability to control the organization's systems and networks: the attacker gains access to privileged accounts, attempts brute-force attacks, searches for credentials and changes permissions to take over control. Actions on objectives means that the attacker finally extracts data from the system; this goal involves collecting, encrypting and extracting confidential information from the organization's environment.
For these 7 types of attack behavior, targeted defense strategies can be defined respectively. In the reconnaissance stage, which the defender often perceives only weakly, attention should be paid to abnormal traffic, logs and data (particularly leaked data); these should be stored for future reference, the analysis model should be established and optimized, and the corresponding state-space conditions updated in time. In the weaponization stage, the attacker's behavior is almost imperceptible to the defender and is closely related to the defender's "resources": building a weapon requires vulnerabilities or defects in those resources (including operating systems, application software and social engineering), so at this stage attention should be paid to whether the vulnerability, patch and repair processes related to the "assets" are complete. The delivery stage is particularly important: the corresponding defense strategy must be updated in time and adjusted promptly according to the attacker's actions to generate the optimal defense strategy. The exploitation stage requires security detection, security monitoring, blocking and audit, i.e. daily security monitoring work, and is also the stage in which the defense strategy is actually applied. In the installation stage, the most important thing is to detect and isolate within the shortest time, paying attention to terminal/server security management strategies and anti-virus, blocking and shutting down problematic devices in time and updating the corresponding security strategies. The command and control stage is the defender's last chance to stop the attack: if the adversary cannot send commands, the defender can contain the impact, and since this stage is the last opportunity to test the response strategy, particular attention should be paid to access control, with the corresponding defense strategy formed in time. In the actions-on-objectives stage, the attacker's goal has already been achieved, and the defender must reduce the attack's influence on the whole system as much as possible, recover paralyzed devices in time, eliminate the danger and meanwhile perfect the strategy.
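The stage-by-stage recommendations above can be summarized as a lookup from kill-chain phase to defensive focus; the sketch below is illustrative only, with phase keys following the seven-step kill chain and the focus strings condensing this document's recommendations.

```python
# Illustrative mapping from cyber-kill-chain phase to the defensive focus
# described in the text; names and wording are a summary, not a normative API.

KILL_CHAIN_DEFENSE = {
    "reconnaissance": "monitor and retain abnormal traffic, logs and data; build analysis models",
    "weaponization": "audit asset-related vulnerability, patch and repair processes",
    "delivery": "update defense strategies promptly against attacker actions",
    "exploitation": "daily security detection, monitoring, blocking and audit",
    "installation": "detect and isolate quickly; block and shut down compromised devices",
    "command_and_control": "tighten access control; last chance to stop the attack",
    "actions_on_objectives": "limit impact; recover paralyzed devices; refine the strategy",
}

def defense_focus(phase):
    """Return the recommended defensive focus for a kill-chain phase."""
    return KILL_CHAIN_DEFENSE.get(phase, "unknown phase")
```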
On this basis, the defense strategy is adaptively refined using the reinforcement-learning Q-Learning algorithm, so that the utility of the resources is optimized. The scheme adopts the Q-Learning algorithm because it achieves adaptive strategy selection by using the iterated Q value as a feedback model: strategies are usually selected in the direction of the maximum Q value, and the Q table is updated during selection through the continuous iteration "strategy generation - defense state - strategy generation", so as to maximize the Q value, achieve convergence of the algorithm and thereby find the optimal defense strategy. Implementing the Q-Learning algorithm requires setting the states, the strategies (actions) and the reward function.
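The iteration described above is driven by the standard Q-Learning update rule; the formula below is a reconstruction in common notation, since this section does not print it explicitly:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right]$$

where $s_t$ is the current defense state, $a_t$ the executed action, $r_{t+1}$ the value returned by the reward function, $\alpha$ the learning rate, and $\gamma$ the discount factor weighting future defense benefit against the immediate reward.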
First, in step S31, an initial defense state matrix is set. In the defense-strategy adaptive matching optimization stage, the system's real-time protection state against network threats is the key to security defense strategy optimization: an initial security defense strategy is formulated, and the defense states of all security defense strategies after a network threat are counted. If the defense state meets the defense requirement, the state value of the corresponding network-threat/security-policy pair is marked as 1; if it does not, the state value is marked as -1; and if there is no corresponding result, the state value is set to 0, thereby constructing the initial defense state matrix.
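The 1/-1/0 encoding of step S31 can be sketched as follows; the threat and policy names are hypothetical examples, not identifiers from the patent.

```python
# Hypothetical construction of the initial defense state matrix: rows are
# network threats, columns are security policies; 1 = defense requirement met,
# -1 = not met, 0 = no observed result for that threat/policy pair.

def build_state_matrix(outcomes, threats, policies):
    """outcomes maps (threat, policy) -> True/False; missing pairs become 0."""
    matrix = []
    for t in threats:
        row = []
        for p in policies:
            if (t, p) not in outcomes:
                row.append(0)               # no corresponding result
            else:
                row.append(1 if outcomes[(t, p)] else -1)
        matrix.append(row)
    return matrix

m = build_state_matrix(
    {("ddos", "rate_limit"): True, ("ddos", "waf"): False},
    threats=["ddos", "phishing"],
    policies=["rate_limit", "waf"],
)
```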
Then, in step S32, an action set is set. The action set must be defined according to the defense state matrix: a reconfiguration of the security policy is regarded as an action, including increasing or decreasing the set of virtual functions contained in the security policy, with all virtual functions selected from the resource pool. Strategies with insufficient defense are supplemented with or have their resources replaced, and over-provisioned defense resources are trimmed, so that the security defense resources are used effectively.
Finally, in step S33, a reward function is formulated. After an action is executed, a reward value is obtained from the reward function and the corresponding Q value in the Q table is updated, where the Q table is a mapping between defense states, actions and rewards. The goal of this approach is to maximize the utility of the security defense resources: based on the analysis of the defense state matrix and the corresponding defense states, the system executes actions to optimize the security-resource policy for the next round. For the case where the defense state value is 1, when the executed action reduces the security defense resource configuration or leaves it unchanged, then in the next round of security threats, if the corresponding security policy can keep the defense state value at 1, the reward value of the corresponding network-threat/security-policy pair is set to 1, otherwise to -1. For the case where the defense state value is -1, when the executed action increases the security defense resource allocation, then in the next round of security threats, if the corresponding security policy can raise the defense state value to 1, the reward value of the corresponding network-threat/security-policy pair is set to 1, otherwise to -1. In this process the reward values are accumulated, so that the reward matrix of network threats and security policies, i.e. the Q table, is finally obtained in order to derive the optimal security defense policy. Meanwhile, the optimized defense strategy adjusts and optimizes the knowledge migration model of the strategy.
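Steps S31 to S33 can be sketched as a minimal Q-Learning loop. Everything below is a toy illustration under stated assumptions: the states are defense-state values from the matrix (-1 = requirement not met, 1 = met), the three actions reconfigure the security policy, and `toy_env` is a hypothetical stand-in for real post-threat feedback, not the patent's actual environment.

```python
import random

ACTIONS = ["add_resources", "keep", "reduce_resources"]
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate

def toy_env(state, action):
    """Hypothetical feedback: adding resources fixes an inadequate (-1) state;
    keeping or trimming an adequate (1) state is rewarded, over-adding is not."""
    if state == -1:
        return (1, 1) if action == "add_resources" else (-1, -1)
    return (1, 1) if action in ("keep", "reduce_resources") else (1, -1)

random.seed(0)
q = {(s, a): 0.0 for s in (-1, 1) for a in ACTIONS}  # the Q table

for _ in range(200):                    # episodes: a fresh threat arrives
    state = random.choice([-1, 1])
    for _ in range(5):                  # a few interaction steps per threat
        if random.random() < EPS:       # epsilon-greedy action selection
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward = toy_env(state, action)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Standard Q-Learning update over the defense-state/action table.
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state

best = max(ACTIONS, key=lambda a: q[(-1, a)])  # learned response to insufficient defense
```

After training, the greedy policy for an inadequate defense state converges to supplementing resources, mirroring the reward rules of step S33.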
The second aspect of the present invention further provides a system for merging scheduling and response of dynamic resource pools, including:
the dynamic resource pool building module is used for performing lightweight processing on entity defense resources according to their characteristics and adding them into the resource pool;
the defense strategy generation module is used for defining the equipment state in the network according to an application scene, establishing strategy spaces of an attack and defense party and a profit space based on an attack and defense strategy generation model, and generating an optimal defense resource scheduling strategy through Nash balance;
and the defense resource self-adaptive module is used for performing iterative optimization on the optimal defense strategy by using a reinforcement learning algorithm according to the state of equipment in the network after the network threat.
In some embodiments, the dynamic resource pool building module comprises:
the defense resource characteristic acquisition submodule is used for acquiring the defense resource characteristics of the entity;
the defense resource lightweight submodule is used for judging the entity defense resource type and, if it is a software program, performing containerization processing; if it is not a software program, virtualizing the physical resource through a virtual machine;
and the defense resource adding submodule is used for adding the entity defense resources processed by light weight into the resource pool.
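The three submodules above can be sketched as a simple dispatch; the function and field names are illustrative assumptions, not the patent's actual API, and the `kind` field plays the role of the acquired resource characteristics.

```python
# Hypothetical sketch of the lightweight-processing pipeline: software
# defense resources are containerized, all other (physical) resources are
# virtualized via a virtual machine, and the result joins the resource pool.

def lightweight_process(resource):
    """Containerize software programs; virtualize anything else via a VM."""
    form = "container" if resource["kind"] == "software" else "virtual_machine"
    return {"name": resource["name"], "form": form}

def add_to_pool(resource, pool):
    """Add the lightweight-processed resource to the dynamic resource pool."""
    pool.append(lightweight_process(resource))

pool = []
add_to_pool({"name": "ids_engine", "kind": "software"}, pool)
add_to_pool({"name": "hw_firewall", "kind": "appliance"}, pool)
```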
In some embodiments, the defense policy generation module includes:
the device state definition submodule is used for defining the device states in the network according to the application scene, wherein the device states comprise a defense state, an attack state, a normal state, an attacked-but-unaffected state and a paralyzed state;
the attack and defense strategy generation model establishing submodule is used for establishing an attack and defense strategy generation model according to the conversion relations between the device states, defined as G = (A, D, S_A, S_D, P_A, P_D, U_A, U_D); wherein A represents the attacker and D represents the defender; S_A = {a_1, a_2, ..., a_n} represents the selectable attack strategies of different strengths, with a_n the n-th attack strategy; S_D = {d_1, d_2, ..., d_n} represents the selectable defense strategies of different strengths, with d_n the n-th defense strategy; P_A = {p_1, p_2, ..., p_n} represents the probabilities of selecting the attack strategies of different strengths, with p_n the probability of selecting the n-th attack strategy; P_D = {q_1, q_2, ..., q_n} represents the probabilities of selecting the defense strategies of different strengths, with q_n the probability of selecting the n-th defense strategy; U_A represents the revenue function of the attacker and U_D the revenue function of the defender; a_i denotes the i-th attack strategy and d_j the j-th defense strategy.
And the optimal defense strategy generation submodule is used for calculating the optimal defense strategy according to Nash equilibrium and a defense strategy generation model and with the aim of minimizing system loss.
In some embodiments, the defensive resource adaptation module further comprises:
the initial defense state matrix construction submodule is used for setting a plurality of defense strategies, counting the defense states of all the security defense strategies after network threat, and constructing an initial defense state matrix according to whether the defense states meet defense requirements or not;
the action set setting submodule is used for setting an action set, supplementing or replacing defense resources for strategies with insufficient defense, and reducing defense resources where defense is excessive;
and the reward function setting submodule is used for setting a reward function, and increasing a reward value in the initial defense state matrix after the action is executed until the initial defense state matrix reaches the maximum value.
The third aspect of the invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method as claimed in any one of the above when executing the computer program.
The fourth aspect of the invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any one of the above.
It will be understood by those skilled in the art that although some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments.
Those skilled in the art will appreciate that the description of each embodiment has a respective emphasis, and reference may be made to the related description of other embodiments for those parts of an embodiment that are not described in detail.
Although the embodiments of the present application have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the application, and such modifications and variations fall within the scope defined by the appended claims. The scope of the present invention is not limited to the specific embodiments: any person skilled in the art can easily conceive of equivalent modifications and substitutions within the technical scope of the present disclosure, and these are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.