CN113810406B - Network space security defense method based on dynamic defense graph and reinforcement learning - Google Patents
- Publication number: CN113810406B (application CN202111078688.1A)
- Authority
- CN
- China
- Prior art keywords
- defense
- graph
- information
- reinforcement learning
- vulnerability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1483—Countermeasures against service impersonation, e.g. phishing, pharming or web spoofing
- H04L63/1491—Countermeasures using deception, e.g. honeypots, honeynets, decoys or entrapment
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in wire-line communication networks, e.g. low power modes or reduced link rate
Abstract
The invention discloses a network security defense method based on a dynamic defense graph and reinforcement learning. The method scans target network information with a network vulnerability scanner such as Nmap; takes the network topology structure information and vulnerability information from the scan as the input of the dynamic defense graph to generate the dynamic defense graph; trains over all penetration paths in the whole attack graph with deep reinforcement learning to obtain the optimal defense path; deploys a corresponding honeypot or intrusion detection system; and finally dynamically updates the defense graph again according to the deployment information of the intrusion detection system and the honeypot, obtaining the optimal defense path anew through deep reinforcement learning. The method can improve the efficiency and accuracy of network security defense and save its defense cost.
Description
Technical Field
The invention belongs to the field of network security protection oriented to dynamic defense graphs and reinforcement learning, and in particular relates to a network security protection method based on deep reinforcement learning and dynamic construction of a network model.
Background
With the rapid development of computer technology, network attack techniques are also developing rapidly, and network attack events of all kinds emerge endlessly. Networks carry many kinds of sensitive information, which inevitably attracts attacks from all over the world, such as information disclosure, information theft, data tampering, data addition and deletion, and computer viruses. To ensure the security of cyberspace, the key lies in analyzing the network topology and vulnerabilities and determining the network's optimal defense strategy, so as to prevent attackers from exploiting these vulnerabilities for illegal penetration. Unlike traditional manual network defense, deep reinforcement learning combined with dynamic defense graph technology can derive the optimal defense path in advance and defend dynamically.
The defense graph is a model-based network security assessment technology. From the defender's perspective, on the basis of a comprehensive analysis of the network configurations and vulnerability information, it finds all possible defense paths and provides a visual representation of the attack-process scenario, helping network security administrators intuitively understand the relationships among the vulnerabilities in a target network, the relationship between vulnerabilities and the network security configuration, and the potential threats those vulnerabilities create. Defense-graph-based network security assessment performs in-depth security assessment modeling and analysis on top of the defense graph.
Reinforcement learning is in general a sequential decision-making process whose basic idea is to learn the optimal strategy by maximizing the cumulative reward the agent receives from the environment. Deep reinforcement learning uses a neural network as the parametric structure, combining the perception capability of deep learning with the decision-making capability of reinforcement learning to optimize the policy, so that the agent can continuously learn by itself from its environment over time.
However, traditional network evaluation with a defense graph considers only static network data, and static analysis can only determine the a priori risks of network components. A dynamic defense graph, by contrast, can update these risks based on evidence that a network component may be compromised, such as alerts from security information and event management (SIEM) systems and intrusion detection systems (IDS). Dynamic analysis also allows analyzing the attacker's path to determine which nodes are more likely to be attacked next. This enables an administrator to assess the security risk of valuable resources in the network.
However, whether static or dynamic, most defense graphs do not take into account the attacker's abilities, and therefore the likelihood that a particular attack will actually be carried out. Without these considerations, threats and their effects are easily misjudged, causing significant cost losses.
At present, defense graph technology still has problems such as static network data input, tediously long defense paths, and the inability to judge network environment changes dynamically. To achieve a dual defense effect, both dynamic detection of the network environment and data and optimal-path judgment over the defense paths are needed.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a network security defense method based on a dynamic defense graph and deep reinforcement learning.
In order to achieve the purpose, the technical scheme of the invention is as follows: a network security defense method based on dynamic defense graphs and reinforcement learning specifically comprises the following steps:
(1) Scanning and detecting a host, a port and a vulnerability of a target network, and storing and classifying the obtained scanning information; defining a scanning information data set, and analyzing the connectivity relation between hosts;
(2) Respectively generating nodes and edges of a defense graph by using the scanning information data set acquired in the step (1) to construct the defense graph;
(3) Building a deep reinforcement learning simulation attacker environment, taking the nodes and edges of the defense graph built in the step (2) as the input of the deep reinforcement learning, obtaining the most easily penetrated path of the defense graph through the deep reinforcement learning, arranging a corresponding honeypot or intrusion detection system on the most easily penetrated path, and recording real-time information;
(4) And (4) repeatedly scanning and detecting the host, the port and the vulnerability of the target network according to the intrusion information recorded in real time in the step (3), constructing a dynamic defense graph, and iteratively updating the most easily penetrated path.
Further, the step (1) specifically includes the following sub-steps:
(1.1) scanning and detecting a host, a port and a vulnerability of a target network, acquiring vulnerability information and host configuration information of the target network, and storing and classifying the acquired scanning information;
(1.2) defining the scan information dataset as a set X containing N_host hosts, where each host is represented as x_i ∈ R^(V×H) (i = 1, 2, …, N_host), i.e., x_i is a matrix of V×H elements, with V representing the host's vulnerabilities and H representing the connectivity relationships between hosts.
Further, the step (2) specifically includes the following sub-steps:
(2.1) generating the nodes of the defense graph: according to the scan information dataset acquired in step (1), the host's vulnerability is taken as a CVE (Common Vulnerabilities and Exposures) node N_CVE in the defense graph, the precondition of the vulnerability as a pre-node N_pre, the postcondition of the vulnerability as a post-node N_post, and nodes satisfying multiple preconditions of the vulnerability as joint nodes N_f;
(2.2) according to the network topology relationships, performing connectivity analysis on the nodes of the defense graph and connecting them directionally to form the edges of the defense graph, where E_f denotes an edge connecting a joint node with the multiple pre-nodes around it, and E_n denotes an edge connecting a CVE node N_CVE with its pre-node N_pre and post-node N_post;
(2.3) constructing the defense graph from the nodes and edges obtained in steps (2.1) and (2.2), where the defense graph is represented as DefendGraph = {E_f, E_n, N_pre, N_post, N_CVE, N_f}.
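The DefendGraph tuple of step (2.3) can be held in memory in many ways; the following is a minimal, hypothetical Python sketch (node and edge names are invented for illustration and are not from the patent):

```python
# Illustrative sketch (not the patent's implementation): one possible
# in-memory representation of DefendGraph = {E_f, E_n, N_pre, N_post,
# N_CVE, N_f}.

def build_defend_graph(n_pre, n_post, n_cve, n_f, e_f, e_n):
    """Bundle the node and edge sets of the defense graph."""
    return {
        "N_pre": set(n_pre),    # precondition nodes
        "N_post": set(n_post),  # postcondition nodes
        "N_CVE": set(n_cve),    # vulnerability (CVE) nodes
        "N_f": set(n_f),        # joint nodes (multiple preconditions)
        "E_f": set(e_f),        # joint-node -> pre-node edges
        "E_n": set(e_n),        # pre -> CVE -> post edges
    }

g = build_defend_graph(
    n_pre=["pre:net_access_h1"],
    n_post=["post:root_h1"],
    n_cve=["CVE-2021-0001"],
    n_f=[],
    e_f=[],
    e_n=[("pre:net_access_h1", "CVE-2021-0001"),
         ("CVE-2021-0001", "post:root_h1")],
)
```

The sets make membership checks and the incremental deletion of step (4.2) cheap; a real deployment would attach CVSS scores and host metadata to each node.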
Further, the step (3) specifically includes the following sub-steps:
(3.1) according to the number N of all nodes of the defense graph, constructing an N multiplied by N model map, simultaneously writing the connectivity relation among the nodes into an action set of an attacker intelligent agent, and sequentially writing all the nodes into an attacker intelligent agent state set to construct a deep reinforcement learning simulation attacker environment;
(3.2) pre-training an attacker agent based on the deep Q-network algorithm (DQN) in reinforcement learning to obtain a target policy π_t, building the deep reinforcement learning simulated attacker environment, and constructing the reinforcement learning training model;
(3.3) inputting the N×N model map obtained in step (3.1) into the reinforcement learning training model obtained in step (3.2); according to the policy π_t of the deep reinforcement learning pre-trained model, generating the attacker's attack-sequence state-action pairs at T moments and connecting them all to obtain the most easily penetrated path;
and (3.4) according to the most easily penetrated path obtained in the step (3.3), arranging the honeypot system and the intrusion detection system in the nodes and the host in the most easily penetrated path, and recording real-time information.
Further, the step (4) specifically includes the following sub-steps:
(4.1) when the intrusion detection system or honeypot system records attacker information, i.e., the attacker has carried out attack activity on a node or path, scanning and probing the hosts, ports, and vulnerabilities of the target network again, and adding the newly added honeypot-system vulnerabilities to the input information of the defense graph.
(4.2) when constructing the dynamic defense graph, performing incremental deletion on the basis of the initial defense graph: the information scanned in step (4.1) is compared with the information scanned in step (1.1) to obtain the differences in topology and the differences between host vulnerabilities under the same topology, and for these differences nodes and edges are deleted on the basis of the initial defense graph.
And (4.3) repeating the steps (3.2) to (3.4), and iterating the updated most easily penetrated path to construct a dynamic defense graph.
The technical conception of the invention is as follows: in deep reinforcement learning training that simulates attackers attacking the target network, an attacker can penetrate the target network according to its vulnerability information, topology information, and so on, and attack the target network's hosts through information extraction, virus planting, and the like, causing the target network to lose its security. Against this background, the dynamic defense graph and reinforcement learning are used to judge the optimal path for network protection, while a honeypot system and an intrusion detection system are deployed on the basis of the training result, achieving the goal of network security protection. First, the network configuration information and vulnerability information of the target network are acquired with vulnerability scanning tools such as Nmap, and the information is classified and sorted. Second, the classified information serves as the input of the defense graph; nodes and edges are constructed with the defense graph algorithm to generate a complete defense graph. Then, the node and edge information of the defense graph is input as the states and actions of deep reinforcement learning, and the most easily penetrated path of the defense graph is obtained with a deep Q-network (DQN). Next, a honeypot system and an intrusion detection system are deployed on that path, interacting in real time to obtain attacker information. Finally, the honeypot and intrusion detection information obtained through real-time interaction forms the trigger signal of the dynamic defense graph: the target network is scanned again with scanners such as Nmap, the defense graph is dynamically constructed again, the most easily penetrated path is recalculated with deep reinforcement learning, and the honeypot and intrusion detection system are redeployed, thereby protecting the target network and improving the efficiency and accuracy of network security defense.
The invention has the following beneficial effects: 1) the method uses defense graph technology to visually display the model structure of the target network; 2) the dynamic defense graph update technique reduces the cost of regenerating a full attack graph; 3) deep reinforcement learning trains on the target network's most easily attacked path, saving the defense cost of network security defense; 4) the honeypot technology and intrusion detection system serve as the defense method, and their signals act as the trigger signals of the dynamic defense graph, so the second round of training is carried out more automatically to achieve network security defense.
Drawings
FIG. 1 is a schematic diagram of the process of the present invention;
FIG. 2 is a schematic diagram of a dynamic defense graph of the method of the present invention;
fig. 3 is a schematic diagram of an algorithm structure of DQN in reinforcement learning in the method of the present invention.
Detailed Description
The following detailed description of specific embodiments of the present invention is provided in conjunction with the accompanying drawings.
Referring to fig. 1 to 3, a network security defense method based on a dynamic defense graph and reinforcement learning includes the following steps:
(1) Network information data generation, comprising the following substeps:
(1.1) Scanning target network information: the hosts, ports, and vulnerabilities of the target network are scanned and probed with the open-source network vulnerability scanner Nmap, and the scan information is stored and classified. Because Nmap is well open-sourced, the vulnerability information and configuration information of the target network can easily be obtained using Nmap's integrated scripts, such as vulnerability scanning and route tracing.
(1.2) Defining the scan information dataset as a set X containing N_host samples, each represented as x_i ∈ R^(V×H) (i = 1, 2, …, N_host), i.e., x_i is a matrix of V×H elements, where V represents host vulnerabilities and H represents the connectivity relationships between hosts. According to the host vulnerability identifiers and the Common Vulnerability Scoring System (CVSS) scores of the National Vulnerability Database (NVD), a set S of N vulnerability CVSS scores is created, S = {s_1, s_2, …, s_N}, where each s_i ∈ R^2 (i = 1, 2, …, N), i.e., s_i is a vector of 2 elements: the base score and the exploitability score of the vulnerability.
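A minimal sketch of the data sets defined in step (1.2); all dimensions, scores, and the highlighted matrix entry below are invented for illustration:

```python
# Hypothetical sketch of the data sets in step (1.2): X holds one V x H
# matrix per host; S holds a (base score, exploitability score) pair per
# vulnerability. All numbers are made up.
N_host, V, H = 3, 2, 3

# x_i in R^{V x H}: rows index the host's vulnerabilities, columns flag
# connectivity to the other hosts.
X = [[[0.0] * H for _ in range(V)] for _ in range(N_host)]
X[0][1][2] = 1.0  # host 0's vulnerability 1 is reachable from host 2

# s_i in R^2: (CVSS base score, exploitability score) per vulnerability
S = [(7.5, 3.9), (9.8, 3.9)]

assert len(X) == N_host and len(X[0]) == V and len(X[0][0]) == H
```

In practice the matrices would be filled from parsed Nmap output and the scores pulled from NVD records.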
(2) Respectively generating nodes and edges of a defense graph by using the scanning information data set acquired in the step (1) to construct the defense graph; the method specifically comprises the following substeps:
(2.1) Generation of defense graph nodes: according to the scan information dataset acquired in step (1), the different information is divided into different nodes, taking the host as the unit. A host's vulnerability is taken as a CVE node N_CVE in the defense graph; CVE nodes correspond to the hosts' vulnerabilities, and each host may have multiple vulnerabilities, i.e., one host node may have multiple CVE nodes. Corresponding pre-nodes N_pre and post-nodes N_post are created at the same time: the precondition of a vulnerability is taken as a pre-node N_pre, representing the prerequisites an attacker needs in order to exploit that vulnerability, and the postcondition of a vulnerability is taken as a post-node N_post. A node satisfying multiple preconditions of a vulnerability is taken as a joint node N_f; the precondition of a joint node N_f may itself be the postcondition of other vulnerabilities.
(2.2) Generation of defense graph edges: after the different nodes are created in step (2.1), they need to be connected by edges. Still taking the host as the unit, the vulnerability pre-node, vulnerability node, and vulnerability post-node of each host are directionally connected in sequence. Since different hosts have different topological relations, connectivity analysis and defense-graph rule derivation are performed according to those relations, and a host's post-node N_post is directionally connected to another host's pre-node N_pre. The edges are thus represented as: E_f, an edge connecting a joint node with the multiple pre-nodes around it; and E_n, an edge connecting a CVE node N_CVE with its pre-node N_pre and post-node N_post;
(2.3) Constructing the defense graph from the nodes and edges obtained in steps (2.1) and (2.2), and visually displaying the model structure of the target network with the Graphviz tool. The defense graph is represented as DefendGraph = {E_f, E_n, N_pre, N_post, N_CVE, N_f}, where DefendGraph is the general term for the defense graph formed by the target network.
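Step (2.3) mentions visualization with Graphviz; the following hypothetical sketch emits DOT text that the `dot` command-line tool could render, without requiring Graphviz at build time (node names are invented):

```python
# Sketch (assumed workflow): emit Graphviz DOT text for a small defense
# graph; the result can be rendered offline with `dot -Tpng`.
def to_dot(edges):
    lines = ["digraph DefendGraph {"]
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')  # one directed edge per line
    lines.append("}")
    return "\n".join(lines)

edges = [
    ("pre:net_access_h1", "CVE-2021-0001"),
    ("CVE-2021-0001", "post:user_h1"),
    ("post:user_h1", "CVE-2021-0002"),  # postcondition feeds the next exploit
]
dot_text = to_dot(edges)
```

Keeping the graph as an edge list and serializing to DOT on demand keeps the visualization step decoupled from graph construction.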
(3) Building a deep reinforcement learning simulation attacker environment, taking the nodes and edges of the defense graph built in the step (2) as the input of the deep reinforcement learning, obtaining the most easily penetrated path of the defense graph through the deep reinforcement learning, and arranging a corresponding honeypot or intrusion detection system on the most easily penetrated path; the method specifically comprises the following substeps:
(3.1) Building the deep reinforcement learning simulated attacker environment: according to the number N of all nodes of the defense graph, an N×N model map is constructed; the connectivity relations between the nodes are written into the action set of the attacker agent (Attacker), and all the nodes are written in turn into the attacker agent's state set, yielding the constructed deep reinforcement learning simulated attacker environment.
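The N×N model map and agent state/action sets of step (3.1) could be sketched as follows (the toy node names and edges are invented, not taken from the patent):

```python
# Sketch of step (3.1): from a defense graph's N nodes and directed edges,
# build an N x N adjacency "model map", plus the agent's state and action
# sets.
nodes = ["pre:net", "CVE-1", "post:user", "CVE-2", "post:root"]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]  # indices into `nodes`

N = len(nodes)
model_map = [[0] * N for _ in range(N)]
for src, dst in edges:
    model_map[src][dst] = 1  # 1 = attacker can move from src to dst

states = list(range(N))  # every defense-graph node is a state
actions = {s: [d for d in range(N) if model_map[s][d]] for s in states}
```

The `actions` mapping is exactly the "connectivity relations written into the action set": from each state, only the nodes reachable along a defense-graph edge are legal moves.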
(3.2) Pre-train the attacker agent in the deep reinforcement learning simulated attacker environment built in step (3.1) to obtain the target policy π_t: the attacker agent is trained based on the deep Q-network algorithm (DQN) in reinforcement learning, the attacker's goal being to penetrate the target host as quickly as possible. DQN combines Q-learning with a convolutional neural network to construct the reinforcement learning training model. The algorithm steps are as follows, as shown in FIG. 3:
(3.2.1) By combining a deep neural network with the Q-learning algorithm of reinforcement learning, DQN not only resolves the difficulty of maintaining an overly large state space, but also, thanks to the strong feature-extraction capability of neural networks, has potential far beyond hand-crafted feature representations. Q-learning iteratively updates the state-action value function Q through the Bellman equation in a temporal-difference manner:
Q_(i+1)(s_t, a_t) = Q_i(s_t, a_t) + α(y_i − Q_i(s_t, a_t))

where y_i = r_t + γ·max_(a_(t+1)) Q_i(s_(t+1), a_(t+1)) is the target Q value, s_(t+1) is the next state produced by action a_t, and a_(t+1) is a possible action in state s_(t+1); α is the learning rate and γ is the discount factor. According to the Bellman optimality equation, continuously iterating the update above lets the Q function approximate the true value Q*, from which the optimal policy π* = argmax_a Q*(s, a) is finally obtained.
(3.2.2) DQN also uses a target-network mechanism: alongside the current Q_θ network, a target network Q_θ' with the same structure is set up, and together they form the overall DQN model framework. During training, the predicted Q value output by the current Q_θ network is used to select the action a, while the target network Q_θ' is used to compute the target Q value. The loss function is defined as the mean squared error between the predicted and target Q values:

L(θ) = E[(y − Q_θ(s_t, a_t))²]

where y = r_t + γ·max_(a_(t+1)) Q_θ'(s_(t+1), a_(t+1)) is the target Q value; the parameters θ of the current Q_θ network are updated by gradient back-propagation through the neural network.
(3.2.3) During training, DQN adopts an experience replay mechanism: each transition (state s_i, action a_i, reward r_i, next state s_i') is stored in the experience replay buffer Buff as the training dataset of the network model, and batch learning is performed by random sampling.
(3.2.4) Sample N training samples from the experience replay buffer Buff and update the parameters of the current Q_θ network by minimizing the loss function. The parameters of the target network Q_θ' need not be updated iteratively; instead, they are copied from the current Q_θ network at fixed intervals, after which the next round of learning proceeds.
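Steps (3.2.1) to (3.2.4) can be illustrated with a tabular stand-in for DQN: the neural network is replaced by a Q-table for brevity, but the Bellman target, experience replay buffer, and periodic target copy are all shown. The toy 3-state environment is an assumption for illustration, not the patent's target network:

```python
import random

# Tabular stand-in for the DQN loop of steps (3.2.1)-(3.2.4). Toy
# environment: a 3-state chain 0 -> 1 -> 2; action 1 advances, action 0
# stays; reward 1.0 on reaching the terminal state 2.
random.seed(0)
N_STATES, ACTIONS = 3, [0, 1]
alpha, gamma = 0.5, 0.9

Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
Q_target = [row[:] for row in Q]  # "target network" copy (3.2.2)
buff = []                         # experience replay buffer (3.2.3)

def step(s, a):
    s2 = min(s + a, N_STATES - 1)
    r = 1.0 if s2 == N_STATES - 1 and s2 != s else 0.0
    return s2, r

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        a = random.choice(ACTIONS)       # exploratory behavior policy
        s2, r = step(s, a)
        buff.append((s, a, r, s2))       # store transition (3.2.3)
        s = s2
    # sample a batch and apply the Bellman update of (3.2.1)
    for s_i, a_i, r_i, s2_i in random.sample(buff, min(8, len(buff))):
        y = r_i + gamma * max(Q_target[s2_i])     # target Q value
        Q[s_i][a_i] += alpha * (y - Q[s_i][a_i])  # TD update
    if episode % 10 == 0:                # periodic copy (3.2.4)
        Q_target = [row[:] for row in Q]
```

After training, "advance" dominates "stay" in every non-terminal state; in the patent's method, the table would be a convolutional network trained by gradient descent on the mean-squared loss instead.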
(3.3) Input the N×N model map obtained in step (3.1) into the reinforcement learning training model obtained in step (3.2); according to the policy π_t of the deep reinforcement learning pre-trained model, generate the attacker's attack-sequence state-action pairs at T moments and connect them all to obtain (state, action) pairs {(s_1, a_1), …, (s_T, a_T)} as the most easily penetrated path.
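A hypothetical sketch of the greedy rollout in step (3.3), stringing state-action pairs into the most easily penetrated path (the Q-table values below are invented):

```python
# Sketch of step (3.3): roll out a greedy policy from a learned Q-table to
# collect the (s_1, a_1), ..., (s_T, a_T) pairs forming the path.
def extract_path(Q, start, goal, max_steps=10):
    path, s = [], start
    for _ in range(max_steps):
        if s == goal:
            break
        a = max(range(len(Q[s])), key=lambda i: Q[s][i])  # greedy action
        path.append((s, a))
        s = a  # here an "action" is the next defense-graph node to move to
    return path

# Toy 3-node Q-table: from node 0 prefer node 1, from node 1 prefer node 2
Q = [[0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
path = extract_path(Q, start=0, goal=2)  # -> [(0, 1), (1, 2)]
```

The resulting pair list marks the nodes where step (3.4) would place the honeypot and intrusion detection systems.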
(3.4) According to the most easily penetrated path obtained in step (3.3), deploy a honeypot system and an intrusion detection system on the nodes and hosts along that path, serving as protection for network security defense, and record information in real time.
(4) And (4) repeatedly scanning and detecting the host, the port and the vulnerability of the target network according to the intrusion information recorded in real time in the step (3), constructing a dynamic defense graph, and iteratively updating the most easily penetrated path.
(4.1) Because frequent network scanning consumes a large amount of time, the recorded information of the intrusion detection system (IDS) and the honeypot system is used as the trigger signal for dynamic defense graph scanning. When attacker information appears in the IDS or honeypot, i.e., the attacker has carried out attack activity on that node or path, Nmap scanning is performed again, and the newly added honeypot-system vulnerabilities are added to the defense graph input information.
(4.2) Time cost is an important consideration in the construction of the defense graph. Therefore, when constructing the dynamic defense graph, incremental deletion is performed on the basis of the initial defense graph, avoiding the cost of reconstructing existing nodes. The information scanned in step (4.1) is compared with the information scanned in step (1.1) to obtain the differences in topology and the differences between host vulnerabilities under the same topology; for these differences, nodes are deleted on the basis of the initial defense graph, thereby achieving the effect of dynamic updating.
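The incremental comparison of step (4.2) might be sketched as follows; the host and CVE names are invented, and the snapshot format (host → set of vulnerabilities) is an assumption:

```python
# Sketch of step (4.2): diff an old and a new scan snapshot so the defense
# graph can be updated incrementally instead of rebuilt from scratch.
def diff_scans(old, new):
    removed_hosts = set(old) - set(new)   # topology difference: hosts gone
    added_hosts = set(new) - set(old)     # topology difference: hosts added
    changed = {}                          # vulnerability differences
    for h in set(old) & set(new):
        gone, fresh = old[h] - new[h], new[h] - old[h]
        if gone or fresh:
            changed[h] = {"removed": gone, "added": fresh}
    return removed_hosts, added_hosts, changed

old = {"h1": {"CVE-A", "CVE-B"}, "h2": {"CVE-C"}}
new = {"h1": {"CVE-A"}, "h3": {"CVE-D"}}  # h2 gone; h3 (a honeypot) added
removed_hosts, added_hosts, changed = diff_scans(old, new)
```

Only the nodes and edges named in the diff need to be deleted from (or added to) the initial defense graph, which is what keeps the dynamic update cheap.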
And (4.3) repeating the steps (3.2) to (3.4), iterating the updated most easily penetrated path, and constructing a dynamic defense graph, thereby achieving the effect of target network defense.
In conclusion, the invention uses defense graph technology to visually display the model structure of the target network, reduces the cost of regenerating a full attack graph, and saves the defense cost of network security defense. The invention uses honeypot technology and an intrusion detection system to record information in real time, achieving the purpose of network security defense more automatically.
The embodiments described in this specification are merely illustrative of the implementation forms of the inventive concept, and the scope of the present invention should not be considered limited to the specific forms set forth in the embodiments, but also equivalent technical means that can be conceived by one skilled in the art based on the inventive concept.
Claims (4)
1. A network security defense method based on dynamic defense graphs and reinforcement learning is characterized by comprising the following steps:
(1) Scanning and detecting a host, a port and a vulnerability of a target network, and storing and classifying the obtained scanning information; defining a scanning information data set, and analyzing the connectivity relation between hosts;
(2) Respectively generating nodes and edges of a defense graph by using the scanning information data set acquired in the step (1) to construct the defense graph;
(3) Building a deep reinforcement learning simulation attacker environment, taking the nodes and edges of the defense graph built in the step (2) as the input of the deep reinforcement learning, obtaining the most easily penetrated path of the defense graph through the deep reinforcement learning, arranging a corresponding honeypot or intrusion detection system on the most easily penetrated path, and recording real-time information;
the step (3) specifically comprises the following substeps:
(3.1) according to the number N of all nodes of the defense graph, constructing an N multiplied by N model map, simultaneously writing the connectivity relation among the nodes into an action set of an attacker intelligent agent, and sequentially writing all the nodes into an attacker intelligent agent state set to construct a deep reinforcement learning simulation attacker environment;
(3.2) pre-training an attacker agent based on the deep Q-network algorithm in reinforcement learning to obtain the target policy π_t, building the deep reinforcement learning simulated attacker environment, and constructing the reinforcement learning training model;
(3.3) inputting the N×N model map obtained in step (3.1) into the reinforcement learning training model obtained in step (3.2); according to the policy π_t of the deep reinforcement learning pre-trained model, generating the attacker's attack-sequence state-action pairs at T moments and connecting them all to obtain the most easily penetrated path;
(3.4) according to the most easily penetrated path obtained in step (3.3), arranging a honeypot system and an intrusion detection system on the nodes and hosts along the most easily penetrated path, and recording real-time information;
(4) repeatedly scanning and detecting the hosts, ports and vulnerabilities of the target network according to the intrusion information recorded in real time in step (3), constructing a dynamic defense graph, and iteratively updating the most easily penetrated path.
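Sub-steps (3.1)-(3.3) can be sketched as follows. This is a minimal, hypothetical stand-in: tabular Q-learning replaces the deep Q network named in the claim, and the 5-node connectivity map, target node and reward values are all invented for illustration.

```python
import random

# Hypothetical 5-node connectivity map (the N x N model map of sub-step (3.1));
# node 4 plays the attacker's target. All rewards are invented.
N = 5
ADJ = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
TARGET = 4

def train(episodes=2000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Pre-train the attacker agent (sub-step (3.2)); tabular Q-learning
    stands in for the deep Q network of the claim."""
    rng = random.Random(seed)
    Q = [[0.0] * N for _ in range(N)]
    for _ in range(episodes):
        s = 0
        while s != TARGET:
            acts = ADJ[s]
            # epsilon-greedy action selection over the connectivity-derived action set
            a = rng.choice(acts) if rng.random() < eps else max(acts, key=lambda x: Q[s][x])
            r = 10.0 if a == TARGET else -1.0          # invented reward shaping
            nxt = max((Q[a][x] for x in ADJ[a]), default=0.0)
            Q[s][a] += alpha * (r + gamma * nxt - Q[s][a])
            s = a
    return Q

def easiest_path(Q):
    """Greedy rollout of the learned strategy pi_t: connecting the state-action
    pairs yields the most easily penetrated path (sub-step (3.3))."""
    path, s = [0], 0
    while s != TARGET:
        s = max(ADJ[s], key=lambda x: Q[s][x])
        path.append(s)
    return path

path = easiest_path(train())
print(path)  # a start-to-target path such as [0, 1, 3, 4]
```

Under sub-step (3.4), the nodes on the returned path would then be the candidates for honeypot or intrusion-detection placement.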
2. The dynamic defense graph and reinforcement learning-based network security defense method according to claim 1, wherein the step (1) specifically comprises the following sub-steps:
(1.1) scanning and detecting a host, a port and a vulnerability of a target network, acquiring vulnerability information and host configuration information of the target network, and storing and classifying the acquired scanning information;
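A minimal sketch of the storage and classification of sub-step (1.1). The record fields, hosts and CVE identifiers are assumptions invented for illustration, not taken from the patent:

```python
# Hypothetical raw scanner output for sub-step (1.1).
raw_scan = [
    {"host": "10.0.0.2", "port": 3389, "service": "rdp", "cve": "CVE-2019-0708"},
    {"host": "10.0.0.2", "port": 445,  "service": "smb", "cve": None},
    {"host": "10.0.0.3", "port": 445,  "service": "smb", "cve": "CVE-2017-0144"},
]

def classify(records):
    """Store and classify scan output into host-configuration information and
    vulnerability information, forming the scanning information data set."""
    host_config, vuln_info = {}, []
    for r in records:
        host_config.setdefault(r["host"], []).append((r["port"], r["service"]))
        if r["cve"] is not None:
            vuln_info.append({"host": r["host"], "cve": r["cve"]})
    return host_config, vuln_info

host_config, vuln_info = classify(raw_scan)
print(sorted(host_config))  # ['10.0.0.2', '10.0.0.3']
print(len(vuln_info))       # 2
```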
3. The dynamic defense graph and reinforcement learning-based network security defense method according to claim 1, wherein the step (2) specifically comprises the following sub-steps:
(2.1) generating the nodes of the defense graph: according to the scanning information data set acquired in step (1), taking each vulnerability of a host as a CVE node N_CVE in the defense graph, taking the precondition of the vulnerability as a pre-node N_pre in the defense graph, taking the post-condition of the vulnerability as a post-node N_post in the defense graph, and taking a node satisfying a plurality of preconditions of a vulnerability as a joint node N_f in the defense graph;
(2.2) according to the network topology relationship, performing connectivity analysis on the nodes of the defense graph and connecting them directionally to form the edges of the defense graph, wherein E_f represents a connecting edge between a joint node and the plurality of pre-nodes around it, and E_n represents a connecting edge between a CVE node N_CVE and its pre-node N_pre or post-node N_post;
(2.3) constructing the defense graph from the nodes and edges obtained in steps (2.1) and (2.2), the defense graph being represented as DefendGraph = {E_f, E_n, N_pre, N_post, N_CVE, N_f}.
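Sub-steps (2.1)-(2.3) can be sketched as follows. The scan records, precondition names and post-condition names are invented assumptions; only the node and edge categories follow the claim:

```python
# Invented scanning information data set: each record is one vulnerability
# with its CVE id, preconditions and post-condition.
scan_data = [
    {"cve": "CVE-2019-0708",  "pre": ["rdp_reachable"],                  "post": "root_hostA"},
    {"cve": "CVE-2021-34527", "pre": ["root_hostA", "spooler_running"],  "post": "root_hostB"},
]

def build_defense_graph(records):
    """Generate the node sets N_CVE/N_pre/N_post/N_f (sub-step (2.1)) and the
    directed edge sets E_n/E_f (sub-step (2.2)), then assemble DefendGraph."""
    g = {"N_CVE": set(), "N_pre": set(), "N_post": set(), "N_f": set(),
         "E_n": set(), "E_f": set()}
    for r in records:
        g["N_CVE"].add(r["cve"])
        g["N_pre"].update(r["pre"])
        g["N_post"].add(r["post"])
        if len(r["pre"]) > 1:
            # joint node N_f collects a vulnerability's several preconditions
            joint = "AND(" + ",".join(sorted(r["pre"])) + ")"
            g["N_f"].add(joint)
            g["E_f"].update((p, joint) for p in r["pre"])  # pre-nodes -> joint node
            g["E_n"].add((joint, r["cve"]))
        else:
            g["E_n"].add((r["pre"][0], r["cve"]))          # pre-node -> CVE node
        g["E_n"].add((r["cve"], r["post"]))                # CVE node -> post-node
    return g

dg = build_defense_graph(scan_data)
print(dg["N_f"])  # {'AND(root_hostA,spooler_running)'}
```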
4. The dynamic defense graph and reinforcement learning-based network security defense method according to claim 1, wherein the step (4) specifically comprises the following sub-steps:
(4.1) when the intrusion detection system or the honeypot system has attacker information, namely the attacker has attack activity on the node or the path, scanning and detecting the host, the port and the vulnerability of the target network again, and adding the newly added vulnerability of the honeypot system into the input information of the defense graph;
(4.2) when constructing the dynamic defense graph, performing incremental deletion on the basis of the original defense graph: comparing the information scanned in step (4.1) with the information scanned in step (1.1) to obtain the differences in topology and the differences between host vulnerabilities under the same topology, and deleting nodes and edges from the original defense graph according to these differences;
(4.3) repeating steps (3.2) to (3.4), iteratively updating the most easily penetrated path, so as to construct the dynamic defense graph.
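The scan comparison of sub-step (4.2) amounts to a set difference over the two scans. A minimal sketch, keying each vulnerability by an assumed (host, cve) pair with invented data:

```python
# Hypothetical original and re-scan results for sub-step (4.2).
old_scan = [("10.0.0.2", "CVE-2019-0708"), ("10.0.0.3", "CVE-2017-0144")]
new_scan = [("10.0.0.2", "CVE-2019-0708"), ("10.0.0.9", "CVE-2021-34527")]

def scan_diff(old, new):
    """Compare the re-scan with the original scan and return the vulnerabilities
    whose defense-graph nodes and edges must be deleted or newly generated."""
    old_keys, new_keys = set(old), set(new)
    return old_keys - new_keys, new_keys - old_keys  # (stale, newly added)

stale, added = scan_diff(old_scan, new_scan)
print(stale)  # {('10.0.0.3', 'CVE-2017-0144')}
print(added)  # {('10.0.0.9', 'CVE-2021-34527')}
```

The stale set drives the incremental deletion on the original defense graph; the added set (e.g. new honeypot vulnerabilities from sub-step (4.1)) feeds back into the graph's input information.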
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111078688.1A CN113810406B (en) | 2021-09-15 | 2021-09-15 | Network space security defense method based on dynamic defense graph and reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111078688.1A CN113810406B (en) | 2021-09-15 | 2021-09-15 | Network space security defense method based on dynamic defense graph and reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113810406A CN113810406A (en) | 2021-12-17 |
CN113810406B true CN113810406B (en) | 2023-04-07 |
Family
ID=78940905
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111078688.1A Active CN113810406B (en) | 2021-09-15 | 2021-09-15 | Network space security defense method based on dynamic defense graph and reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113810406B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114301647A (en) * | 2021-12-20 | 2022-04-08 | 上海纽盾科技股份有限公司 | Prediction defense method, device and system for vulnerability information in situation awareness |
CN114338203B (en) * | 2021-12-31 | 2023-10-03 | 河南信大网御科技有限公司 | Intranet detection system and method based on mimicry honeypot |
CN116866084B (en) * | 2023-08-30 | 2023-11-21 | 国网山东省电力公司信息通信公司 | Intrusion response decision-making method and system based on reinforcement learning |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108494810A (en) * | 2018-06-11 | 2018-09-04 | 中国人民解放军战略支援部队信息工程大学 | Network security situation prediction method, apparatus and system towards attack |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101282332B (en) * | 2008-05-22 | 2011-05-11 | 上海交通大学 | System for generating assaulting chart facing network safety alarm incident |
CN103139220A (en) * | 2013-03-07 | 2013-06-05 | 南京理工大学常熟研究院有限公司 | Network security attack defense method using state attack and defense graph model |
CN107948137A (en) * | 2017-11-01 | 2018-04-20 | 北京理工大学 | A kind of optimal attack paths planning method based on improved Q study |
US11347867B2 (en) * | 2018-05-18 | 2022-05-31 | Ns Holdings Llc | Methods and apparatuses to evaluate cyber security risk by establishing a probability of a cyber-attack being successful |
CN110874470A (en) * | 2018-12-29 | 2020-03-10 | 北京安天网络安全技术有限公司 | Method and device for predicting network space security based on network attack |
CN110166428B (en) * | 2019-04-12 | 2021-05-07 | 中国人民解放军战略支援部队信息工程大学 | Intelligent defense decision-making method and device based on reinforcement learning and attack and defense game |
CN110138764B (en) * | 2019-05-10 | 2021-04-09 | 中北大学 | Attack path analysis method based on hierarchical attack graph |
CN112491818B (en) * | 2020-11-12 | 2023-02-03 | 南京邮电大学 | Power grid transmission line defense method based on multi-agent deep reinforcement learning |
CN113037777B (en) * | 2021-04-09 | 2021-12-03 | 广州锦行网络科技有限公司 | Honeypot bait distribution method and device, storage medium and electronic equipment |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108494810A (en) * | 2018-06-11 | 2018-09-04 | 中国人民解放军战略支援部队信息工程大学 | Network security situation prediction method, apparatus and system towards attack |
Also Published As
Publication number | Publication date |
---|---|
CN113810406A (en) | 2021-12-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113810406B (en) | Network space security defense method based on dynamic defense graph and reinforcement learning | |
US10185832B2 (en) | Methods and systems for defending cyber attack in real-time | |
De Vries et al. | Systems for detecting advanced persistent threats: A development roadmap using intelligent data analysis | |
CN115296924B (en) | Network attack prediction method and device based on knowledge graph | |
CN104809404A (en) | Data layer system of information security attack-defense platform | |
CN106534195A (en) | Network attacker behavior analyzing method based on attack graph | |
Keshk et al. | An explainable deep learning-enabled intrusion detection framework in IoT networks | |
Derbyshire et al. | “Talking a different Language”: Anticipating adversary attack cost for cyber risk assessment | |
Zhu | Attack pattern discovery in forensic investigation of network attacks | |
CN116566674A (en) | Automated penetration test method, system, electronic equipment and storage medium | |
Giacobe | Measuring the effectiveness of visual analytics and data fusion techniques on situation awareness in cyber-security | |
CN115580430A (en) | Attack tree-pot deployment defense method and device based on deep reinforcement learning | |
Yamin et al. | Use of cyber attack and defense agents in cyber ranges: A case study | |
CN114386042A (en) | Method suitable for deduction of power enterprise network war chess | |
Chelliah et al. | Similarity-based optimised and adaptive adversarial attack on image classification using neural network | |
Şeker | Use of Artificial Intelligence Techniques/Applications in Cyber Defense | |
CN115361215A (en) | Network attack behavior detection method based on causal graph | |
CN114978595A (en) | Threat model construction method and device and computer equipment | |
Chen et al. | State-based attack detection for cloud | |
Aly et al. | Navigating the Deception Stack: In-Depth Analysis and Application of Comprehensive Cyber Defense Solutions | |
CN113837398A (en) | Graph classification task poisoning attack method based on federal learning | |
Sweet et al. | Synthetic intrusion alert generation through generative adversarial networks | |
Grant et al. | Identifying tools and technologies for professional offensive cyber operations | |
Al-Saraireh | Enhancing the Penetration Testing Approach and Detecting Advanced Persistent Threat Using Machine Learning | |
Gill et al. | A Systematic Review on Game-Theoretic Models and Different Types of Security Requirements in Cloud Environment: Challenges and Opportunities |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||