CN113810406B - Network space security defense method based on dynamic defense graph and reinforcement learning - Google Patents


Info

Publication number: CN113810406B
Application number: CN202111078688.1A
Authority: CN (China)
Prior art keywords: defense, graph, information, reinforcement learning, vulnerability
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Other languages: Chinese (zh)
Other versions: CN113810406A (en)
Inventors: 陈晋音, 李晓豪, 李玮峰, 贾澄钰
Current assignee: Zhejiang University of Technology ZJUT (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: Zhejiang University of Technology ZJUT
Application filed by Zhejiang University of Technology ZJUT; priority claimed from application CN202111078688.1A; published as CN113810406A, granted as CN113810406B

Classifications

    • H04L63/1441: Countermeasures against malicious traffic (network architectures or protocols for network security)
    • H04L63/1416: Event detection, e.g. attack signature detection
    • H04L63/1483: Countermeasures against service impersonation, e.g. phishing, pharming or web spoofing
    • H04L63/1491: Countermeasures using deception, e.g. honeypots, honeynets, decoys or entrapment
    • G06N3/045: Neural network architectures; combinations of networks
    • G06N3/084: Learning methods; backpropagation, e.g. using gradient descent
    • Y02D30/50: Reducing energy consumption in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention discloses a network security defense method based on a dynamic defense graph and reinforcement learning. The method scans target network information with a network vulnerability scanner such as Nmap, and takes the network topology structure information and vulnerability information from the scan as the input of the dynamic defense graph to generate the dynamic defense graph. Deep reinforcement learning is then trained over all penetration paths in the whole attack graph to obtain the optimal defense path, and a corresponding honeypot or intrusion detection system is deployed. Finally, the defense graph is dynamically updated again according to the deployment information of the intrusion detection system and the honeypot, and the optimal defense path is obtained again with deep reinforcement learning. The method can improve the efficiency and accuracy of network security defense and can save its defense cost.

Description

Network space security defense method based on dynamic defense graph and reinforcement learning
Technical Field
The invention belongs to the field of network security protection oriented to dynamic defense graphs and reinforcement learning, and particularly relates to a network security protection method based on deep reinforcement learning and the dynamic construction of a network model.
Background
With the rapid development of computer technology, network attack techniques are also developing rapidly, and network attack events emerge endlessly. Networks carry many kinds of sensitive information that inevitably attract attacks from all over the world, such as information disclosure, information theft, data tampering, data addition and deletion, and computer viruses. To ensure the security of cyberspace, the key lies in analyzing the cyberspace topology and its vulnerabilities and determining the optimal defense strategy of the network, so as to prevent attackers from exploiting these vulnerabilities for illegal penetration. Unlike traditional manual network defense, deep reinforcement learning and dynamic defense graph technology can derive the optimal defense path in advance and defend dynamically.
The defense graph is a model-based network security assessment technology. From the defender's perspective, on the basis of comprehensively analyzing the various network configurations and vulnerability information, it finds all possible defense paths and provides a visualization of the attack-process scenario, helping network security administrators intuitively understand the relationships among the vulnerabilities in a target network, the relationships between the vulnerabilities and the network security configuration, and the potential threats the vulnerabilities create. Defense-graph-based network security assessment then performs in-depth security assessment modeling and analysis on top of the defense graph.
Reinforcement learning is generally a sequential decision-making process whose basic idea is to learn the optimal strategy by maximizing the cumulative reward the agent receives from the environment. Deep reinforcement learning uses a neural network as the parametric structure and optimizes the strategy by combining the perception capability of deep learning with the decision-making capability of reinforcement learning, so that the agent can continuously learn from its environment over time.
However, when a defense graph is used for network evaluation in the traditional way, only static network data is considered, and static analysis can only determine the a priori risks of network components. A dynamic defense graph, by contrast, can update these risks based on evidence that a network component may be compromised, for example from security information and event management (SIEM) and intrusion detection systems (IDS). Dynamic analysis also allows analyzing the attacker's path to determine which nodes are more likely to be attacked next, enabling an administrator to assess the security risk of valuable resources in the network.
However, whether static or dynamic, most defense graphs do not take the attacker's abilities into account, and therefore not the likelihood that a particular attack is actually carried out. Without these considerations, threats and their effects are easily misjudged, causing significant cost loss.
At present, defense graph technology still has problems such as static network data input, tedious defense paths, and the inability to respond dynamically to network environment changes. To achieve a dual defense effect, both dynamic detection of the network environment and data and optimal judgment of the defense path are needed.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a network security defense method based on a dynamic defense graph and deep reinforcement learning.
In order to achieve the purpose, the technical scheme of the invention is as follows: a network security defense method based on dynamic defense graphs and reinforcement learning specifically comprises the following steps:
(1) Scanning and detecting a host, a port and a vulnerability of a target network, and storing and classifying the obtained scanning information; defining a scanning information data set, and analyzing the connectivity relation between hosts;
(2) Respectively generating nodes and edges of a defense graph by using the scanning information data set acquired in the step (1) to construct the defense graph;
(3) Building a deep reinforcement learning simulation attacker environment, taking the nodes and edges of the defense graph built in the step (2) as the input of the deep reinforcement learning, obtaining the most easily penetrated path of the defense graph through the deep reinforcement learning, arranging a corresponding honeypot or intrusion detection system on the most easily penetrated path, and recording real-time information;
(4) And (4) repeatedly scanning and detecting the host, the port and the vulnerability of the target network according to the intrusion information recorded in real time in the step (3), constructing a dynamic defense graph, and iteratively updating the most easily penetrated path.
Further, the step (1) specifically includes the following sub-steps:
(1.1) scanning and detecting a host, a port and a vulnerability of a target network, acquiring vulnerability information and host configuration information of the target network, and storing and classifying the acquired scanning information;
(1.2) defining the scan information dataset as a set X containing N_host hosts,

X = {x_1, x_2, ..., x_{N_host}},

where each host is represented as x_i ∈ R^(V×H) (i = 1, 2, ..., N_host), i.e. x_i is a matrix containing V×H elements, in which V represents the host vulnerabilities and H represents the connectivity relationships between hosts.
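As a minimal sketch of the dataset X just defined (all host counts, vulnerability flags and connectivity values below are illustrative assumptions, not data from the patent):

```python
# Hypothetical sketch of the scan-information dataset X described above.
# Each host x_i is a V x H matrix: V rows for vulnerabilities found on the
# host, H columns for connectivity to the other hosts.

N_HOST = 3   # number of scanned hosts (assumption for this sketch)
V = 2        # vulnerabilities tracked per host
H = N_HOST   # connectivity columns, one per host

def make_host(vuln_flags, connectivity):
    """Build one host matrix x_i with V rows and H columns."""
    return [[v & c for c in connectivity] for v in vuln_flags]

# Host 0 carries both vulnerabilities and can reach hosts 1 and 2.
X = [make_host(vuln_flags=[1, 1], connectivity=[0, 1, 1]),
     make_host([1, 0], [1, 0, 1]),
     make_host([0, 0], [1, 1, 0])]

assert len(X) == N_HOST
assert all(len(x) == V and len(x[0]) == H for x in X)
```

A row of x_i is nonzero only where the host both carries the vulnerability and can reach the peer, which is one plausible encoding of the V×H coupling described above.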
Further, the step (2) specifically includes the following sub-steps:
(2.1) generating the nodes of the defense graph: according to the scan information dataset acquired in step (1), the vulnerability of a host is taken as a CVE node N_CVE in the defense graph, the precondition of the vulnerability as a pre-node N_pre, the postcondition of the vulnerability as a post-node N_post, and a node satisfying several preconditions of a vulnerability as a joint node N_f.
(2.2) according to the network topology relationships, performing connectivity analysis on the nodes of the defense graph and connecting them directionally to form the edges of the defense graph, where E_f denotes a connecting edge between a joint node and the several pre-nodes around it, and E_n denotes a connecting edge between a CVE node N_CVE and its pre-node N_pre and post-node N_post;
(2.3) constructing the defense graph with the nodes and edges obtained in step (2.1) and step (2.2), where the defense graph is represented as DefendGraph = {E_f, E_n, N_pre, N_post, N_CVE, N_f}.
Further, the step (3) specifically includes the following sub-steps:
(3.1) according to the number N of all nodes of the defense graph, constructing an N×N model map, writing the connectivity relationships among the nodes into the attacker agent's action set, and writing all the nodes in sequence into the attacker agent's state set, to build the deep reinforcement learning simulated attacker environment;
(3.2) pre-training an attacker agent Attacker based on the deep Q-network algorithm (DQN) in reinforcement learning to obtain a target strategy π_t, building the deep reinforcement learning simulated attacker environment and constructing a reinforcement learning training model;
(3.3) inputting the N×N model map obtained in step (3.1) into the reinforcement learning training model obtained in step (3.2), generating the attacker's attack-sequence state-action pairs at T moments according to the target strategy π_t of the pre-trained deep reinforcement learning model, and connecting all of the attacker's attack-sequence state-action pairs to obtain the most easily penetrated path;
and (3.4) according to the most easily penetrated path obtained in the step (3.3), arranging the honeypot system and the intrusion detection system in the nodes and the host in the most easily penetrated path, and recording real-time information.
Further, the step (4) specifically includes the following sub-steps:
(4.1) when the intrusion detection system or the honeypot system records attacker information, i.e. the attacker carries out attack activity on a node or path, scanning and detecting the hosts, ports and vulnerabilities of the target network again, and adding the newly exposed vulnerabilities of the honeypot system to the input information of the defense graph.
(4.2) when constructing the dynamic defense graph, performing incremental deletion on the basis of the initial defense graph: the information scanned in step (4.1) is compared with the information scanned in step (1.1) to obtain the differences in topology and the differences between host vulnerabilities under the same topology, and nodes and edges are deleted from the initial defense graph according to these differences.
And (4.3) repeating the steps (3.2) to (3.4), and iterating the updated most easily penetrated path to construct a dynamic defense graph.
The technical conception of the invention is as follows: in deep reinforcement learning training that simulates attackers attacking a target network, an attacker can penetrate the target network according to its vulnerability information, topology information and so on, and carry out attacks such as information extraction and virus planting on the hosts of the target network, so that the target network loses its security. On this basis, the dynamic defense graph and reinforcement learning are used to judge the optimal path for network protection, and a honeypot system and an intrusion detection system are deployed according to the training result, achieving the goal of network security protection. First, the network configuration information and vulnerability information of the target network are acquired with vulnerability scanning tools such as Nmap, then classified and sorted. Second, the classified information is taken as the input of the defense graph, and the nodes and edges are constructed with the defense graph algorithm to generate a complete defense graph. Then, the node and edge information in the defense graph is input as the state and action of deep reinforcement learning, and the most easily penetrated path of the defense graph is acquired with a deep Q-network (DQN). Next, a honeypot system and an intrusion detection system are deployed on that path, interacting in real time to obtain attacker information. Finally, the real-time honeypot and intrusion detection information serves as the trigger signal of the dynamic defense graph: the target network is scanned again with scanners such as Nmap, the defense graph is dynamically constructed again, the most easily penetrated path is recomputed with deep reinforcement learning, and honeypots and the intrusion detection system are deployed again, thereby protecting the target network and improving the efficiency and accuracy of network security defense.
The invention has the following beneficial effects: 1) the method uses defense graph technology to visually display the model structure of the target network; 2) the dynamic defense graph update technology reduces the efficiency cost of regenerating a pure attack graph; 3) deep reinforcement learning trains the most easily attacked path of the target network, saving the defense cost of network security defense; 4) the honeypot technology and the intrusion detection system serve as the defense method, and their signals serve as the trigger signals of the dynamic defense graph, so that the second round of training is carried out more automatically to achieve network security defense.
Drawings
FIG. 1 is a schematic diagram of the process of the present invention;
FIG. 2 is a schematic diagram of a dynamic defense graph of the method of the present invention;
FIG. 3 is a schematic diagram of the algorithm structure of DQN in reinforcement learning in the method of the present invention.
Detailed Description
The following detailed description of specific embodiments of the present invention is provided in conjunction with the accompanying drawings.
Referring to FIGS. 1 to 3, a network security defense method based on a dynamic defense graph and reinforcement learning includes the following steps:
(1) Network information data generation, comprising the following substeps:
(1.1) Scanning target network information: the hosts, ports and vulnerabilities of the target network are scanned and detected with the open-source network vulnerability scanner Nmap, and the scan information is stored and classified. Since Nmap is well open-sourced, the vulnerability information and configuration information of the target network can easily be obtained with Nmap's integrated capabilities such as vulnerability scanning and route tracing.
(1.2) Defining the scan information dataset as a set X containing N_host samples,

X = {x_1, x_2, ..., x_{N_host}},

where each sample x_i ∈ R^(V×H) (i = 1, 2, ..., N_host), i.e. x_i is a matrix containing V×H elements, in which V represents the host vulnerabilities and H the connectivity relationships between hosts. According to the serial numbers of the host vulnerabilities and the Common Vulnerability Scoring System (CVSS) scores from the National Vulnerability Database (NVD), a set S of N vulnerability CVSS scores is created, S = {s_1, s_2, ..., s_N}, where each sample s_i ∈ R^2 (i = 1, 2, ..., N), i.e. s_i is a matrix containing 2 elements: the base score of the vulnerability and the exploitability score of the vulnerability.
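An illustrative sketch of building the scoring set S follows. The CVE identifiers and score values are made-up placeholders, not real NVD data; in practice each pair would be looked up from the NVD by CVE id:

```python
# Illustrative construction of the CVSS scoring set S described above.
cvss_lookup = {                 # hypothetical CVE -> (base, exploitability)
    "CVE-2021-0001": (9.8, 3.9),
    "CVE-2021-0002": (7.5, 2.2),
}

def score_vector(cve_id):
    """Return s_i in R^2: [base score, exploitability score]."""
    base, exploitability = cvss_lookup[cve_id]
    return [base, exploitability]

S = [score_vector(cve) for cve in sorted(cvss_lookup)]
assert all(len(s) == 2 for s in S)   # each s_i has exactly 2 elements
```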
(2) Respectively generating nodes and edges of a defense graph by using the scanning information data set acquired in the step (1) to construct the defense graph; the method specifically comprises the following substeps:
(2.1) Generation of the defense graph nodes: according to the scan information dataset acquired in step (1), the different information is divided into different nodes. Taking the host as the unit, a vulnerability of a host is taken as a CVE node N_CVE in the defense graph; the CVE nodes correspond to the vulnerabilities of the hosts, and each host may have multiple vulnerabilities, i.e. one host node may have multiple CVE nodes. At the same time, corresponding pre-nodes N_pre and post-nodes N_post are established: the precondition of a vulnerability is taken as a pre-node N_pre in the defense graph, representing the prerequisites an attacker needs to exploit the vulnerability; the postcondition of a vulnerability is taken as a post-node N_post; and a node satisfying several preconditions of a vulnerability is taken as a joint node N_f, whose preconditions can be the postconditions of other vulnerabilities.
(2.2) Generation of the defense graph edges: after creating the different nodes in step (2.1), the nodes need to be connected by edges. Still taking the host as the unit, the vulnerability pre-node, the vulnerability node and the vulnerability post-node of each host are connected directionally in sequence. However, different hosts have different topological relationships, so connectivity analysis and defense-graph rule derivation must be performed according to those relationships, directionally connecting a host's post-node N_post with a pre-node N_pre. The edges are therefore represented as: E_f, a connecting edge between a joint node and the several pre-nodes around it; and E_n, a connecting edge between a CVE node N_CVE and its pre-node N_pre and post-node N_post.
(2.3) Constructing the defense graph with the nodes and edges obtained in step (2.1) and step (2.2), and visually displaying the model structure of the target network with the Graphviz tool; the defense graph is represented as DefendGraph = {E_f, E_n, N_pre, N_post, N_CVE, N_f}, where DefendGraph is the general term for the defense graph formed by the target network.
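The DefendGraph tuple and its export to Graphviz can be sketched as follows. This is a minimal, hedged sketch using only the standard library; the node names are illustrative, and the DOT text it emits would be rendered by an external Graphviz installation rather than by this code:

```python
# Minimal sketch of the DefendGraph structure from step (2.3) and of emitting
# it as Graphviz DOT text for visualization (node names are illustrative).

defend_graph = {
    "N_pre":  ["pre:net_access"],
    "N_CVE":  ["cve:CVE-2021-0001"],
    "N_post": ["post:root_shell"],
    "N_f":    [],
    "E_n":    [("pre:net_access", "cve:CVE-2021-0001"),
               ("cve:CVE-2021-0001", "post:root_shell")],
    "E_f":    [],
}

def to_dot(graph):
    """Serialize the defense graph edges as a DOT digraph."""
    lines = ["digraph DefendGraph {"]
    for src, dst in graph["E_n"] + graph["E_f"]:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

dot = to_dot(defend_graph)
assert dot.startswith("digraph DefendGraph {")
```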
(3) Building a deep reinforcement learning simulation attacker environment, taking the nodes and edges of the defense graph built in the step (2) as the input of the deep reinforcement learning, obtaining the most easily penetrated path of the defense graph through the deep reinforcement learning, and arranging a corresponding honeypot or intrusion detection system on the most easily penetrated path; the method specifically comprises the following substeps:
(3.1) Building the deep reinforcement learning simulated attacker environment: according to the number N of all nodes of the defense graph, an N×N model map is constructed; the connectivity relationships among the nodes are written into the action set of the attacker agent Attacker, and all the nodes are written in sequence into the Attacker's state set, yielding the deep reinforcement learning simulated attacker environment.
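Step (3.1) can be sketched as follows. The nodes and edges below are illustrative placeholders; the pattern is only one plausible reading of how the N×N model map, state set and action set fit together:

```python
# Sketch of step (3.1): turn N defense-graph nodes into an N x N adjacency
# "model map", a state set, and per-state action sets for the attacker agent.

nodes = ["pre:net_access", "cve:CVE-2021-0001", "post:root_shell"]
edges = [("pre:net_access", "cve:CVE-2021-0001"),
         ("cve:CVE-2021-0001", "post:root_shell")]

N = len(nodes)
index = {name: i for i, name in enumerate(nodes)}

# N x N model map: model_map[i][j] = 1 iff node i connects to node j.
model_map = [[0] * N for _ in range(N)]
for src, dst in edges:
    model_map[index[src]][index[dst]] = 1

states = list(range(N))                                   # attacker state set
actions = {s: [t for t in range(N) if model_map[s][t]]    # legal moves per state
           for s in states}
assert actions[0] == [1] and actions[1] == [2] and actions[2] == []
```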
(3.2) Pre-training the attacker agent Attacker in the deep reinforcement learning simulated attacker environment built in step (3.1) to obtain the target strategy π_t: the attacker agent is trained based on the deep Q-network algorithm (DQN) in reinforcement learning, the attacker's goal being to penetrate the target host as fast as possible. DQN combines Q-learning with a convolutional neural network to construct the reinforcement learning training model; the algorithm steps are as follows, as shown in FIG. 3:
(3.2.1) By combining a deep neural network with the Q-learning algorithm of reinforcement learning, DQN not only solves the problem that the state space is too large to maintain; thanks to the strong feature-extraction capability of the neural network, it also has potential far beyond handcrafted feature representations. Q-learning in reinforcement learning iteratively updates the state-action value function Q through the Bellman equation in a temporal-difference manner:

Q_{i+1}(s_t, a_t) = Q_i(s_t, a_t) + α(y_i − Q_i(s_t, a_t))

where y_i = r_t + γ·max_{a_{t+1}} Q_i(s_{t+1}, a_{t+1}) is the target Q value, s_{t+1} is the next state after action a_t, and a_{t+1} is a possible action in state s_{t+1}; α is the learning rate and γ is the discount factor. According to the theory of the Bellman optimality equation, by continuously and iteratively applying the update above, the Q function can approximate the true value Q*, finally yielding the optimal strategy:

π* = argmax_a Q*(s, a)
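As a self-contained sketch of the temporal-difference update above (states, actions and the reward value are illustrative, and a tabular Q is used in place of the neural network for clarity):

```python
# One tabular Q-learning update matching the Bellman temporal-difference rule.
alpha, gamma = 0.1, 0.9           # learning rate and discount factor
Q = {(0, 1): 0.0, (1, 2): 0.0}    # Q table over (state, action) pairs

def q_update(s_t, a_t, r_t, s_next, legal_next):
    """Q_{i+1}(s_t,a_t) = Q_i(s_t,a_t) + alpha * (y_i - Q_i(s_t,a_t))."""
    best_next = max((Q[(s_next, a)] for a in legal_next), default=0.0)
    y = r_t + gamma * best_next                 # target Q value y_i
    Q[(s_t, a_t)] += alpha * (y - Q[(s_t, a_t)])

q_update(s_t=0, a_t=1, r_t=1.0, s_next=1, legal_next=[2])
assert abs(Q[(0, 1)] - 0.1) < 1e-9   # 0 + 0.1 * (1.0 + 0.9*0 - 0) = 0.1
```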
(3.2.2) DQN also uses a target network mechanism: alongside the current Q_θ network, a target network Q_θ' with the same structure is built, and together they form the overall DQN model framework. During training, the predicted Q value output by the current Q_θ network is used to select the action a, while the target Q_θ' network is used to compute the target Q value. The loss function is defined as the mean squared error between the predicted Q value and the target Q value:

L(θ) = E[(y − Q_θ(s_t, a_t))²]

where y = r_t + γ·max_{a_{t+1}} Q_θ'(s_{t+1}, a_{t+1}) is the target Q value; the parameters θ of the current Q_θ network are updated by back-propagating the gradient through the neural network.
(3.2.3) During training, DQN adopts an experience replay mechanism: each state transition (state s_i, action a_i, reward r_i, next state s_i') is stored in the experience replay buffer Buff as a training dataset for the network model, and batch learning proceeds by random sampling.
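The experience replay mechanism of step (3.2.3) can be sketched with the standard library alone; the buffer capacity, batch size and dummy transitions are illustrative assumptions:

```python
# Sketch of experience replay: transitions (s_i, a_i, r_i, s_i') go into a
# bounded buffer and are sampled in random mini-batches for training.
import random
from collections import deque

buff = deque(maxlen=1000)                 # experience replay buffer Buff

def store(s, a, r, s_next):
    buff.append((s, a, r, s_next))

def sample_batch(batch_size):
    """Random sampling breaks the temporal correlation between samples."""
    return random.sample(buff, min(batch_size, len(buff)))

for step in range(50):                    # fill with dummy transitions
    store(s=step, a=step % 3, r=1.0, s_next=step + 1)

batch = sample_batch(8)
assert len(batch) == 8 and all(len(t) == 4 for t in batch)
```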
(3.2.4) N training samples are drawn from the experience replay buffer Buff, and the network parameters of the current Q_θ network are updated by minimizing the loss function; the parameters of the target Q_θ' network need not be updated iteratively, but are instead copied from the current Q_θ network at fixed intervals, after which the next round of learning is carried out.
(3.3) Inputting the N×N model map obtained in step (3.1) into the reinforcement learning training model obtained in step (3.2), generating the attacker's attack-sequence state-action pairs at T moments according to the target strategy π_t of the pre-trained deep reinforcement learning model, and connecting all of the attacker's attack-sequence state-action pairs to obtain the (state, action) sequence {(s_1, a_1), ..., (s_T, a_T)} as the most easily penetrated path.
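Extracting the path in step (3.3) amounts to rolling out the learned strategy greedily. A hedged sketch with a hand-filled Q table standing in for the trained model (all values and the graph layout are illustrative):

```python
# Sketch of step (3.3): roll out the (here: hand-filled) Q table greedily
# from the entry node to obtain {(s_1, a_1), ..., (s_T, a_T)}.

Q = {(0, 1): 0.9, (0, 2): 0.2, (1, 3): 0.8, (2, 3): 0.1}  # illustrative values
actions = {0: [1, 2], 1: [3], 2: [3], 3: []}               # legal moves

def penetration_path(start, max_steps=10):
    path, s = [], start
    for _ in range(max_steps):
        if not actions[s]:
            break                                     # terminal node reached
        a = max(actions[s], key=lambda a: Q[(s, a)])  # greedy action choice
        path.append((s, a))
        s = a                                         # moving to node a = new state
    return path

assert penetration_path(0) == [(0, 1), (1, 3)]
```

The returned state-action pairs are exactly the nodes on which step (3.4) would deploy honeypots or intrusion detection.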
(3.4) According to the most easily penetrated path obtained in step (3.3), a honeypot system and an intrusion detection system are deployed on the nodes and hosts along that path, serving as protection for network security defense, and information is recorded in real time.
(4) And (4) repeatedly scanning and detecting the host, the port and the vulnerability of the target network according to the intrusion information recorded in real time in the step (3), constructing a dynamic defense graph, and iteratively updating the most easily penetrated path.
(4.1) Because frequent network scanning consumes a large amount of time, the recorded information of the intrusion detection system (IDS) and the honeypot system is used as the trigger signal for dynamic defense graph scanning: when attacker information appears in the IDS or the honeypot, i.e. the attacker carries out attack activity on a node or path, an Nmap scan is performed again, and the newly exposed vulnerabilities of the honeypot system are added to the defense graph input information.
(4.2) Time cost must be carefully considered in the construction of the defense graph. Therefore, when constructing the dynamic defense graph, incremental deletion is carried out on the basis of the initial defense graph, avoiding the cost of reconstructing the original nodes. The information scanned in step (4.1) is compared with the information scanned in step (1.1) to obtain the differences in topology and the differences between host vulnerabilities under the same topology, and nodes are deleted from the initial defense graph according to these differences, achieving the effect of dynamic updating.
And (4.3) repeating the steps (3.2) to (3.4), iterating the updated most easily penetrated path, and constructing a dynamic defense graph, thereby achieving the effect of target network defense.
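The incremental comparison in step (4.2) can be sketched as a diff between two scan snapshots; host and CVE names below are illustrative, and the snapshots are hand-written rather than produced by a real Nmap run:

```python
# Sketch of step (4.2): diff the new scan against the initial one and delete
# the vanished nodes/edges instead of rebuilding the whole defense graph.

initial_scan = {"hostA": {"CVE-2021-0001", "CVE-2021-0002"},
                "hostB": {"CVE-2021-0003"}}
new_scan     = {"hostA": {"CVE-2021-0001"},          # one vuln patched
                "hostB": {"CVE-2021-0003"}}

def diff_scans(old, new):
    """Return hosts and per-host vulnerabilities that disappeared."""
    removed_hosts = set(old) - set(new)
    removed_vulns = {h: old[h] - new.get(h, set()) for h in old}
    return removed_hosts, {h: v for h, v in removed_vulns.items() if v}

removed_hosts, removed_vulns = diff_scans(initial_scan, new_scan)
assert removed_hosts == set()
assert removed_vulns == {"hostA": {"CVE-2021-0002"}}
```

Only the nodes and edges corresponding to `removed_vulns` and `removed_hosts` need to be deleted from the initial defense graph, which is the incremental update the patent describes.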
In conclusion, the invention uses defense graph technology to visually display the model structure of the target network, reduces the efficiency cost of generating a pure attack graph, and saves the defense cost of network security defense. The invention uses honeypot technology and the intrusion detection system to record information in real time, thereby realizing network security defense more automatically.
The embodiments described in this specification are merely illustrative of the implementation forms of the inventive concept, and the scope of the present invention should not be considered limited to the specific forms set forth in the embodiments, but also equivalent technical means that can be conceived by one skilled in the art based on the inventive concept.

Claims (4)

1. A network security defense method based on dynamic defense graphs and reinforcement learning is characterized by comprising the following steps:
(1) Scanning and detecting a host, a port and a vulnerability of a target network, and storing and classifying the obtained scanning information; defining a scanning information data set, and analyzing the connectivity relation between hosts;
(2) Respectively generating nodes and edges of a defense graph by using the scanning information data set acquired in the step (1) to construct the defense graph;
(3) Building a deep reinforcement learning simulation attacker environment, taking the nodes and edges of the defense graph built in the step (2) as the input of the deep reinforcement learning, obtaining the most easily penetrated path of the defense graph through the deep reinforcement learning, arranging a corresponding honeypot or intrusion detection system on the most easily penetrated path, and recording real-time information;
the step (3) specifically comprises the following substeps:
(3.1) according to the number N of all nodes of the defense graph, constructing an N×N model map, writing the connectivity relationships among the nodes into the action set of the attacker agent, and writing all the nodes in turn into the attacker agent's state set, so as to build the deep reinforcement learning simulated attacker environment;
(3.2) pre-training an attacker agent on the basis of the deep Q network algorithm in reinforcement learning to obtain a target strategy π_t, building the deep reinforcement learning simulated attacker environment, and constructing a reinforcement learning training model;
(3.3) inputting the N×N model map obtained in step (3.1) into the reinforcement learning training model obtained in step (3.2), generating, according to the strategy π_t of the deep reinforcement learning pre-trained model, the attacker's attack-sequence state-action pairs at T moments, and connecting all the attack-sequence state-action pairs to obtain the most easily penetrated path;
(3.4) according to the most easily penetrated path obtained in the step (3.3), arranging a honeypot system and an intrusion detection system in the nodes and the host in the most easily penetrated path, and recording real-time information;
(4) Repeatedly scanning and probing the hosts, ports and vulnerabilities of the target network according to the intrusion information recorded in real time in step (3), constructing a dynamic defense graph, and iteratively updating the most easily penetrated path.
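Purely as an illustration of steps (3.1)-(3.3) of claim 1 (the graph, reward shaping and the tabular Q-learning stand-in for the deep Q network are assumptions of this sketch, not features disclosed in the patent), the most easily penetrated path can be read off as the greedy rollout of a trained policy over the defense-graph connectivity:

```python
import random

def train_q(adj, goal, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning stand-in for the deep Q network: learn action
    values over the connectivity `adj` ({node: [reachable nodes]}),
    rewarding only the transition into the attack goal."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in adj[s]} for s in adj if adj[s]}
    for _ in range(episodes):
        s = rng.choice(list(q))
        for _ in range(50):
            if s == goal or s not in q:
                break
            a = (rng.choice(list(q[s])) if rng.random() < eps
                 else max(q[s], key=q[s].get))          # ε-greedy action
            r = 1.0 if a == goal else 0.0
            nxt = max(q[a].values()) if a in q and q[a] else 0.0
            q[s][a] += alpha * (r + gamma * nxt - q[s][a])
            s = a
    return q

def easiest_path(q, start, goal, limit=20):
    """Greedy rollout of the learned policy π_t: the chain of
    state-action pairs forms the most easily penetrated path."""
    path = [start]
    while path[-1] != goal and len(path) < limit and path[-1] in q:
        path.append(max(q[path[-1]], key=q[path[-1]].get))
    return path
```

Honeypots and intrusion detection would then be deployed along the nodes of the returned path, as in step (3.4).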
2. The dynamic defense graph and reinforcement learning-based network security defense method according to claim 1, wherein the step (1) specifically comprises the following sub-steps:
(1.1) scanning and detecting a host, a port and a vulnerability of a target network, acquiring vulnerability information and host configuration information of the target network, and storing and classifying the acquired scanning information;
(1.2) defining the scan information data set as a set X containing N_host hosts,
X = {x_1, x_2, ..., x_{N_host}},
where each host x_i ∈ R^(V×H) (i = 1, 2, ..., N_host), i.e., x_i is a matrix containing V×H elements, where V represents the host vulnerabilities and H represents the connectivity relationships between hosts.
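The host descriptor of claim 2 — a set X of N_host matrices x_i ∈ R^(V×H) — can be sketched as follows; the concrete sizes and the 0/1 element encoding are assumptions of this illustration, not limitations of the claim:

```python
def host_matrix(vuln_flags, conn_flags):
    """One host descriptor x_i as a V x H 0/1 matrix: row v, column h
    is 1 when the host exposes vulnerability v toward connected host h
    (one plausible element semantics; the patent does not fix it)."""
    return [[v * h for h in conn_flags] for v in vuln_flags]

# Scan information data set X = {x_1, ..., x_{N_host}} for
# N_host = 3 hosts, V = 2 vulnerability flags, H = 3 connectivity
# flags (all sizes assumed for the example).
X = [
    host_matrix([1, 0], [1, 1, 0]),  # host 1
    host_matrix([1, 1], [0, 1, 1]),  # host 2
    host_matrix([0, 0], [1, 0, 1]),  # host 3
]
```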
3. The dynamic defense graph and reinforcement learning-based network security defense method according to claim 1, wherein the step (2) specifically comprises the following sub-steps:
(2.1) generating the nodes of the defense graph: according to the scan information data set acquired in step (1), the vulnerabilities of a host are taken as CVE nodes N_CVE in the defense graph, the precondition of a vulnerability as a pre-node N_pre in the defense graph, the postcondition of a vulnerability as a post-node N_post in the defense graph, and a node satisfying several preconditions of a vulnerability as a joint node N_f in the defense graph;
(2.2) according to the network topological relationship, performing connectivity analysis on the nodes of the defense graph and connecting them directionally to form the edges of the defense graph, wherein E_f represents a connecting edge between a joint node and the several pre-nodes around it, and E_n represents a connecting edge between a CVE node N_CVE and its pre-node N_pre or post-node N_post;
(2.3) constructing the defense graph from the nodes and edges obtained in steps (2.1) and (2.2), the defense graph being represented as DeffendGraph = {E_f, E_n, N_pre, N_post, N_CVE, N_f}.
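As a non-limiting sketch of the six-component structure DeffendGraph = {E_f, E_n, N_pre, N_post, N_CVE, N_f} of claim 3 (the class name, the AND-node naming scheme and the `add_cve` helper are assumptions of this illustration):

```python
from dataclasses import dataclass, field

@dataclass
class DefendGraph:
    """Container mirroring the claimed set {E_f, E_n, N_pre, N_post,
    N_CVE, N_f}."""
    N_pre: set = field(default_factory=set)   # precondition nodes
    N_post: set = field(default_factory=set)  # postcondition nodes
    N_CVE: set = field(default_factory=set)   # CVE (vulnerability) nodes
    N_f: set = field(default_factory=set)     # joint nodes
    E_n: set = field(default_factory=set)     # CVE <-> pre/post edges
    E_f: set = field(default_factory=set)     # joint-node edges

    def add_cve(self, cve, pres, post):
        """Wire one vulnerability into the graph, inserting a joint
        node when several preconditions must hold simultaneously."""
        self.N_CVE.add(cve)
        self.N_post.add(post)
        self.N_pre.update(pres)
        if len(pres) > 1:  # joint node for the AND of the preconditions
            joint = "AND(" + ",".join(sorted(pres)) + ")"
            self.N_f.add(joint)
            self.E_f.update((p, joint) for p in pres)
            self.E_n.add((joint, cve))
        else:
            self.E_n.update((p, cve) for p in pres)
        self.E_n.add((cve, post))
```

Chaining `add_cve` calls over the scan data set yields the directed defense graph whose nodes and edges feed the reinforcement learning stage.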
4. The dynamic defense graph and reinforcement learning-based network security defense method according to claim 1, wherein the step (4) specifically comprises the following sub-steps:
(4.1) when the intrusion detection system or the honeypot system records attacker information, namely the attacker carries out attack activity on a node or path, scanning and probing the hosts, ports and vulnerabilities of the target network again, and adding the newly discovered vulnerabilities from the honeypot system into the input information of the defense graph;
(4.2) when the dynamic defense graph is constructed, carrying out incremental deletion on the basis of the initial defense graph: the information scanned in step (4.1) is compared with the information scanned in step (1.1) to obtain the differences in topological structure and the differences between host vulnerabilities under the same topology, and for these differences, nodes and edges are deleted on the basis of the initial defense graph;
and (4.3) repeating the steps (3.2) to (3.4), and iterating the updated most easily penetrated path to construct a dynamic defense graph.
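The event-driven loop of claim 4 can be sketched as follows; every callable here (`scan`, `build_graph`, `find_path`, `deploy`) is an assumed interface standing in for steps (1)-(3), not an API disclosed in the patent, and the full rebuild shown stands in for the patent's incremental update:

```python
def dynamic_defense_cycle(scan, build_graph, find_path, deploy, alerts):
    """Rescan only when the IDS or a honeypot reports attacker
    activity (a truthy alert), then rebuild the defense graph and
    redeploy honeypots/IDS along the new easiest path."""
    baseline = scan()                  # step (1): initial scan
    graph = build_graph(baseline)      # step (2): defense graph
    deploy(find_path(graph))           # step (3): easiest path + traps
    trace = []
    for alert in alerts:               # step (4): real-time records
        if not alert:
            continue
        snapshot = scan()
        graph = build_graph(snapshot)
        path = find_path(graph)
        deploy(path)
        trace.append(path)
    return trace
```

With stub callables this makes the trigger condition explicit: deployment happens once at start-up and once per positive alert, never on quiet intervals.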
CN202111078688.1A 2021-09-15 2021-09-15 Network space security defense method based on dynamic defense graph and reinforcement learning Active CN113810406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111078688.1A CN113810406B (en) 2021-09-15 2021-09-15 Network space security defense method based on dynamic defense graph and reinforcement learning


Publications (2)

Publication Number Publication Date
CN113810406A CN113810406A (en) 2021-12-17
CN113810406B (en) 2023-04-07

Family

ID=78940905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111078688.1A Active CN113810406B (en) 2021-09-15 2021-09-15 Network space security defense method based on dynamic defense graph and reinforcement learning

Country Status (1)

Country Link
CN (1) CN113810406B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114301647A (en) * 2021-12-20 2022-04-08 上海纽盾科技股份有限公司 Prediction defense method, device and system for vulnerability information in situation awareness
CN114338203B (en) * 2021-12-31 2023-10-03 河南信大网御科技有限公司 Intranet detection system and method based on mimicry honeypot
CN116866084B (en) * 2023-08-30 2023-11-21 国网山东省电力公司信息通信公司 Intrusion response decision-making method and system based on reinforcement learning

Citations (1)

Publication number Priority date Publication date Assignee Title
CN108494810A (en) * 2018-06-11 2018-09-04 中国人民解放军战略支援部队信息工程大学 Network security situation prediction method, apparatus and system towards attack

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
CN101282332B (en) * 2008-05-22 2011-05-11 上海交通大学 System for generating assaulting chart facing network safety alarm incident
CN103139220A (en) * 2013-03-07 2013-06-05 南京理工大学常熟研究院有限公司 Network security attack defense method using state attack and defense graph model
CN107948137A (en) * 2017-11-01 2018-04-20 北京理工大学 A kind of optimal attack paths planning method based on improved Q study
US11347867B2 (en) * 2018-05-18 2022-05-31 Ns Holdings Llc Methods and apparatuses to evaluate cyber security risk by establishing a probability of a cyber-attack being successful
CN110874470A (en) * 2018-12-29 2020-03-10 北京安天网络安全技术有限公司 Method and device for predicting network space security based on network attack
CN110166428B (en) * 2019-04-12 2021-05-07 中国人民解放军战略支援部队信息工程大学 Intelligent defense decision-making method and device based on reinforcement learning and attack and defense game
CN110138764B (en) * 2019-05-10 2021-04-09 中北大学 Attack path analysis method based on hierarchical attack graph
CN112491818B (en) * 2020-11-12 2023-02-03 南京邮电大学 Power grid transmission line defense method based on multi-agent deep reinforcement learning
CN113037777B (en) * 2021-04-09 2021-12-03 广州锦行网络科技有限公司 Honeypot bait distribution method and device, storage medium and electronic equipment

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN108494810A (en) * 2018-06-11 2018-09-04 中国人民解放军战略支援部队信息工程大学 Network security situation prediction method, apparatus and system towards attack


Similar Documents

Publication Publication Date Title
CN113810406B (en) Network space security defense method based on dynamic defense graph and reinforcement learning
US10185832B2 (en) Methods and systems for defending cyber attack in real-time
De Vries et al. Systems for detecting advanced persistent threats: A development roadmap using intelligent data analysis
CN115296924B (en) Network attack prediction method and device based on knowledge graph
CN104809404A (en) Data layer system of information security attack-defense platform
CN106534195A (en) Network attacker behavior analyzing method based on attack graph
Keshk et al. An explainable deep learning-enabled intrusion detection framework in IoT networks
Derbyshire et al. “Talking a different Language”: Anticipating adversary attack cost for cyber risk assessment
Zhu Attack pattern discovery in forensic investigation of network attacks
CN116566674A (en) Automated penetration test method, system, electronic equipment and storage medium
Giacobe Measuring the effectiveness of visual analytics and data fusion techniques on situation awareness in cyber-security
CN115580430A (en) Attack tree-pot deployment defense method and device based on deep reinforcement learning
Yamin et al. Use of cyber attack and defense agents in cyber ranges: A case study
CN114386042A (en) Method suitable for deduction of power enterprise network war chess
Chelliah et al. Similarity-based optimised and adaptive adversarial attack on image classification using neural network
Şeker Use of Artificial Intelligence Techniques/Applications in Cyber Defense
CN115361215A (en) Network attack behavior detection method based on causal graph
CN114978595A (en) Threat model construction method and device and computer equipment
Chen et al. State-based attack detection for cloud
Aly et al. Navigating the Deception Stack: In-Depth Analysis and Application of Comprehensive Cyber Defense Solutions
CN113837398A (en) Graph classification task poisoning attack method based on federal learning
Sweet et al. Synthetic intrusion alert generation through generative adversarial networks
Grant et al. Identifying tools and technologies for professional offensive cyber operations
Al-Saraireh Enhancing the Penetration Testing Approach and Detecting Advanced Persistent Threat Using Machine Learning
Gill et al. A Systematic Review on Game-Theoretic Models and Different Types of Security Requirements in Cloud Environment: Challenges and Opportunities

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant