CN113783881B - Network honeypot deployment method facing penetration attack - Google Patents


Publication number
CN113783881B
CN113783881B CN202111078546.5A CN202111078546A CN113783881B CN 113783881 B CN113783881 B CN 113783881B CN 202111078546 A CN202111078546 A CN 202111078546A CN 113783881 B CN113783881 B CN 113783881B
Authority
CN
China
Prior art keywords
network
attack
obj
honeypot
honeypots
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111078546.5A
Other languages
Chinese (zh)
Other versions
CN113783881A (en)
Inventor
陈晋音
李玮峰
李晓豪
贾澄钰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202111078546.5A
Publication of CN113783881A
Application granted
Publication of CN113783881B
Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1491 Countermeasures against malicious traffic using deception as countermeasure, e.g. honeypots, honeynets, decoys or entrapment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433 Vulnerability analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a network honeypot deployment method for penetration attacks. The defender scans its own network structure, generates an attribute attack graph, converts it into a Bayesian attack graph containing penetration success rate information, and records and stores the penetration success probability of each network node. A reward function is then established based on the node penetration success probability: when a honeypot captures an attacker, the defender obtains a reward related to the penetration success probability, and the higher the success rate, the higher the reward value. Each honeypot deployment carries a basic negative reward value, so deploying honeypots without limit does not yield the maximum benefit. This effectively avoids the high maintenance cost of deploying honeypots on every path. The node and path information in the attack graph is then used as the input to reinforcement learning, which is carried out with the SARSA learning mechanism and provides the honeypot deployment scheme with the maximum profit for the given environment.

Description

Network honeypot deployment method facing penetration attack
Technical field:
The invention relates to the field of network security protection based on attack graph technology and reinforcement learning, and in particular to a network honeypot deployment method for penetration attacks.
Technical background:
As more people take part in online life and enjoy the convenience the internet brings, network security problems are growing in number and severity. Information security threats from hackers keep evolving in today's cyberspace. As a result, networks are exposed to higher levels of interference, which makes network security more fragile. The diversity of devices also makes maintaining them (e.g., patching vulnerabilities) a more challenging management problem.
A deceptive defense technique called the honeypot emerged in response. Over 20 years of development, honeypots have been continuously updated and iterated in order to cope with emerging threats. From "data packets found on the internet" in 1993 to locking onto and capturing internet of things attacks, the development of honeypots has become a cyclical process. Retrospective analysis of the malware captured on honeypots has set a new direction for subsequent national-defense network security and honeypot development.
Modern computer networks are highly connected and heterogeneous to provide more sophisticated services and to accommodate ever-increasing and rapidly changing demands. For example, these networks connect computers of different operating systems and protocols. Furthermore, more and more devices are added to the network each day. For example, deployment of wireless devices, as well as internet of things, robotics, sensors, makes networks larger and denser.
The contribution of honeypots to security is considered a reactive process. The value of a honeypot deployment comes from the data set it captures: the longer an attack interaction can be maintained, the larger the data set available for subsequent analysis. Global honeypot projects track emerging threats. Virtualization provides honeypot operators with a means to abstract deployments from the production network and bare-metal infrastructure. In response to the prevalence of honeypots, honeypot detection tools have been developed, and detection techniques have been incorporated into malware deployments.
The attack graph is a model-based network vulnerability assessment method. Attack graph technology correlates the vulnerabilities of all hosts in the network for in-depth analysis, discovers attack paths that threaten network security, and displays them as a graph. With an attack graph, the security manager can see the relationships between the vulnerabilities in the network directly and choose the lowest-cost way to remediate them.
Reinforcement learning is a class of algorithms that lets a computer learn by trial and error: it tries repeatedly, learns from its mistakes, and eventually finds the rules that achieve its goal. Reinforcement learning is currently applied in a variety of scenarios where actions or decisions need to be made.
Internet of things network architectures are also deployed in battlefield environments, where they are referred to as the Internet of Battlefield Things. In a broader sense, the term also covers devices used in military combat that may communicate over tactical networks other than the internet. It is therefore crucial to protect the resilience and robustness of the network's critical nodes on the battlefield. In the information collection phase (also called the reconnaissance phase), the attacker collects internal information about the target network using a series of tools and scanning techniques. Attackers typically map the target network with a software scanning tool (e.g., nmap) or infer the network through traffic analysis. Conversely, a network administrator (defender) can protect its own network at the early reconnaissance stage by deceiving the attacker and manipulating network interfaces to mask the true state of the network. Although existing honeypot deployment schemes achieve automatic deployment, they do not address problems such as high technical requirements or the cost increase that too many devices may cause, and because they automate deployment with scripts, they are not flexible enough and struggle to cope with complex network attacks. The invention obtains network information by generating a Bayesian attack graph and deploys the honeypot system through reinforcement learning driven by the penetration success probability, thereby protecting the system and overcoming the problem of network redundancy.
At present, existing honeypot systems suffer from redundant deployment: honeypots are placed mechanically at certain nodes. Although this raises the attacker's cost, the underlying problems are that the deployment is poorly targeted, the server scale grows, and the maintenance cost of the server system increases. Moreover, once a honeypot is bypassed, it provides no protection at all.
Disclosure of Invention
To address the defects of the prior art, the invention provides a network honeypot deployment method for penetration attacks.
To achieve this purpose, the technical scheme of the invention is as follows: a network honeypot deployment method for penetration attacks comprises the following steps:
(1) Scanning and probing a target network to obtain scan information of the target network, and storing and classifying the scan information; performing connectivity analysis according to the network topology relationships and the host vulnerability relationships to generate an attribute attack graph; calculating the exploitability E of each vulnerability using the Common Vulnerability Scoring System (CVSS), calculating the conditional probability P between attribute nodes and the probability P(obj) that the target node is attacked, and generating a Bayesian attack graph from the attribute attack graph;
(2) Enumerating each possible attack path, distributing a set of k honeypots along the attacker's possible paths according to the probability of attack P(obj), defining the cost of deploying a honeypot, setting a reward value Cap, calculating a reward function, and optimizing the honeypot deployment;
(3) Optimizing the honeypots deployed in step (2) using the SARSA reinforcement learning algorithm in combination with the Bayesian attack graph generated in step (1), to obtain an optimal honeypot deployment path.
Further, the step (1) includes the sub-steps of:
(1.1) scanning and probing the hosts, ports and vulnerabilities of the target network to obtain scan information of the target network, and storing and classifying the scan information (port information and the like);
(1.2) defining the data set as a set X containing N_host hosts,
X = {x_1, x_2, ..., x_(N_host)}
where each host x_i ∈ R^(Q×H) (i = 1, 2, ..., N_host), i.e. x_i is a matrix of Q × H elements, Q represents the host vulnerabilities, and H represents the connectivity relationships between hosts;
(1.3) generating an attribute attack graph: using the scan information of the target network acquired in step (1.1), performing connectivity analysis according to the network topology relationships and the host vulnerability relationships, and adding directed connections that form the edges of the attribute attack graph; dividing the scan information of the target network into different nodes according to the data set of scan information from step (1.1); and connecting the edges and nodes to generate the attribute attack graph;
(1.4) calculating the exploitability E of each vulnerability based on the Common Vulnerability Scoring System, using the formula:
E = 20 × V × C × U (0 ≤ E ≤ 10)
where V is the access vector, C is the access complexity, and U is the authentication metric;
(1.5) calculating the difficulty of each atomic attack from the exploitability E obtained in step (1.4), where a larger difficulty value means a harder attack; the calculation formula is as follows:
Figure BDA0003262957660000032
wherein D represents the difficulty of the corresponding atomic attack;
the difficulty of the atomic attack determines the conditional probability P between attribute nodes, according to the formula:
Figure BDA0003262957660000033
if the target attribute that the attacker wants to obtain is obj, the set of all direct and indirect parent nodes of obj on the attack path is Pre(obj), and the set of direct parent nodes is DPre(obj), then the probability P(obj) that the target node is attacked is:
P(obj) = P(obj | Pre(obj)) × P(Pre(obj));
(1.6) obtaining the attacker's possible paths according to the probability P(obj) that the target node is attacked from step (1.5), and converting the attribute attack graph generated in step (1.3) into a Bayesian attack graph; eliminating loops and analyzing the success probability of each penetration path.
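As an illustration of steps (1.4) and (1.5), the following minimal Python sketch evaluates E = 20 × V × C × U and the chain rule for P(obj). The numeric weights are the published CVSS v2 base-metric values. The function names are hypothetical, and the chain rule is unrolled under the simplifying assumption that the path is a single chain of nodes. Because the text gives the mapping from E to the conditional probability only as a figure, the conditional probabilities are taken here as inputs.

```python
# CVSS v2 base-metric weights (values published in the CVSS v2 specification).
ACCESS_VECTOR = {"local": 0.395, "adjacent": 0.646, "network": 1.0}
ACCESS_COMPLEXITY = {"high": 0.35, "medium": 0.61, "low": 0.71}
AUTHENTICATION = {"multiple": 0.45, "single": 0.56, "none": 0.704}


def exploitability(access_vector, access_complexity, authentication):
    """Return E = 20 * V * C * U, kept inside the stated range [0, 10]."""
    e = (20.0
         * ACCESS_VECTOR[access_vector]
         * ACCESS_COMPLEXITY[access_complexity]
         * AUTHENTICATION[authentication])
    return min(max(e, 0.0), 10.0)


def attack_probability(root_prior, conditionals):
    """Chain rule P(obj) = P(obj | Pre(obj)) * P(Pre(obj)), unrolled along
    one path (assumption: each node has a single parent on that path).

    root_prior   -- prior probability of the path's entry attribute
    conditionals -- conditional probability of each later step given its parent
    """
    p = root_prior
    for cond in conditionals:
        p *= cond
    return p


if __name__ == "__main__":
    # Remotely reachable, low-complexity, unauthenticated vulnerability.
    print(round(exploitability("network", "low", "none"), 4))
    # Hypothetical three-node chain, numbers chosen only for illustration.
    print(attack_probability(0.5, [0.4, 0.5]))
```

The clamp in `exploitability` never triggers for these weights; it is kept only to mirror the stated bound 0 ≤ E ≤ 10.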
Further, the step (2) includes the sub-steps of:
(2.1) the defender enumerating each possible attack path, distributing a set of k honeypots along the attacker's possible paths, which can deviate the attacker from the true target node;
(2.2) weighting the defender's reward by the probability P (obj) that the protected or attacked node is breached; the defender placing a new honeypot at the edge of the network incurs a fixed cost, assuming that the cost of placing this honeypot is P;
(2.3) for a simple target network, specifically: if the honeypot does not capture the attacker, the reward value Cap is set to 0; if the honeypot captures the attacker, the reward value Cap is set to a user-defined value; the reward function R for deploying a single honeypot is defined as:
R = -(P - Cap × (1 - E_p))
where E_p = P(obj) is the probability that the protected or attacked node is attacked;
for a complex target network, specifically: if the honeypot does not capture the attacker, the reward value Cap is set to 0; if the honeypot captures the attacker, the reward value Cap is set to a user-defined value; honeypots are deployed according to the probability of attack P(obj) calculated in step (1.5); the reward function R is as follows:
R = -(P - Cap × (1 - max(P(obj))));
a simple target network has no more than 2 network layers and fewer than 4 hosts; otherwise the network is regarded as a complex network.
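The reward rules of step (2.3) can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function and parameter names are hypothetical, and `cost` stands for the fixed honeypot cost P, renamed to avoid confusion with the probabilities.

```python
def is_simple_network(num_layers, num_hosts):
    """Simple network: no more than 2 network layers and fewer than 4 hosts."""
    return num_layers <= 2 and num_hosts < 4


def single_honeypot_reward(cost, cap, p_obj, captured):
    """R = -(P - Cap * (1 - E_p)), with Cap forced to 0 when nothing is captured."""
    effective_cap = cap if captured else 0.0
    return -(cost - effective_cap * (1.0 - p_obj))


def complex_network_reward(cost, cap, path_probabilities, captured):
    """R = -(P - Cap * (1 - max(P(obj)))) for the most likely attack path."""
    return single_honeypot_reward(cost, cap, max(path_probabilities), captured)


if __name__ == "__main__":
    # Illustrative values only: cost P = 1, Cap = 10.
    print(single_honeypot_reward(1.0, 10.0, 0.0328, captured=True))
    print(single_honeypot_reward(1.0, 10.0, 0.0328, captured=False))
    print(is_simple_network(num_layers=2, num_hosts=3))
```

A honeypot that captures nothing always returns the bare cost `-cost`, which is why unlimited deployment does not maximize the total reward.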
Further, the step (3) includes the sub-steps of:
(3.1) establishing a Q-table that stores the states S, all actions A that may be taken, and the values Q(S, A), to serve as the training data set of the network model;
(3.2) creating an Agent, which comprises a learning algorithm and an action space; the Agent is trained with the SARSA reinforcement learning algorithm; in training, the first network path state is initialized randomly; then, for each step of a training round, an action a is selected from the Q-table with the ε-greedy strategy based on the current network state S; executing a yields a new network path state S' and the current reward r; an action a' in S' is obtained with the ε-greedy strategy, and a' is used to update the value of Q(S, A) in the table before training continues;
(3.3) repeating step (3.2) until the Q-table is no longer updated, which yields an optimal policy π; the update formula is:
Q(S, A) = Q(S, A) + α(R + γQ(S', A') - Q(S, A))
where α is the learning rate and γ is the reward discount;
(3.4) obtaining, from the optimal policy π of step (3.3), the action a to be executed in state s; the action a is one of two choices, deploying a honeypot or taking no action, and the state s depends only on the attack success rate of the attack path in the Bayesian attack graph; finally, the honeypots are deployed according to the optimal policy π.
The beneficial effects of the invention are as follows: 1) The Bayesian attack graph gives the defender a basic understanding of the security of its own network structure, and the honeypot system is deployed on that basis. 2) The honeypot system is deployed using a reward mechanism: each deployment adds deployment cost, but successfully capturing a hacker earns a reward that depends on the penetration success rate of the node. 3) For network structures with a large network space, an importance mechanism is introduced and honeypots are deployed selectively according to node importance, which reduces honeypot deployment and maintenance costs. 4) The honeypot system is deployed by combining reinforcement learning with the Bayesian attack graph, which improves network security.
Drawings
FIG. 1 is a diagram of the network environment architecture of the experiment of the present invention;
FIG. 2 is a flow chart of a method of the present invention;
FIG. 3 is a schematic diagram of a honeypot application;
FIG. 4 is a schematic diagram illustrating the generation principle of an attack graph;
FIG. 5 is a Bayesian attack graph of the present experimental environment;
FIG. 6 is a schematic diagram of reinforcement learning.
Detailed description of the embodiments:
Embodiments of the invention are described in detail below with reference to the accompanying drawings. Referring to FIG. 1 to FIG. 6, a reinforcement learning honeypot deployment method based on an attack graph is described.
The technical conception of the invention is as follows. An attacker who targets the defended network can use tools to learn its structure and generate an attack graph with which to attack it. The network defender can likewise use Nmap to scan its own network structure, generate an attribute attack graph, and convert it into a Bayesian attack graph containing penetration success rate information. Once the Bayesian attack graph exists, the penetration success probability of each network node is recorded and stored. A reward function is then established based on the node penetration success probability: when a honeypot captures an attacker, the defender obtains a reward related to the penetration success probability, and the higher the success rate, the higher the reward value. Each honeypot deployment carries a basic negative reward value, so deploying honeypots without limit does not yield the maximum benefit. This effectively avoids the high maintenance cost of deploying honeypots on every path. The node and path information in the attack graph is then used as the state input of reinforcement learning, and the decision whether to deploy a honeypot is used as the action input. Learning is carried out with the SARSA learning mechanism. Finally, reinforcement learning provides the honeypot deployment scheme with the largest return for the given environment. After the honeypots are deployed, the network space changes, so the attacker's attack graph may no longer be valid; the attacker's workload increases, the attack may fail altogether, and more traceable information is left behind, which achieves the purpose of protecting the network space.
The invention discloses a reinforcement learning honeypot deployment method based on an attack graph. FIG. 2 is a flow chart of the method, which comprises the following steps:
(1) Scanning and probing the target network to obtain scan information of the target network, and storing and classifying the scan information; performing connectivity analysis according to the network topology relationships and the host vulnerability relationships to generate an attribute attack graph; calculating the exploitability E of each vulnerability using the Common Vulnerability Scoring System (CVSS), calculating the conditional probability P between attribute nodes and the probability P(obj) that the target node is attacked, and generating a Bayesian attack graph from the attribute attack graph:
(1.1) The experimental network structure is shown in FIG. 1: device A is a host providing web services; node (B) is a firewall connected to device C, which provides FTP, SSH and RSH services, and to device B, which provides FTP and RSH services; node (a) is the access entry. First, the experimental network is scanned with an open-source scanning tool such as Nmap to detect the hosts, ports and vulnerabilities of the target network, and the scan information is stored and classified.
(1.2) defining the data set as a set X containing N_host samples,
X = {x_1, x_2, ..., x_(N_host)}
where each sample x_i ∈ R^(Q×H) (i = 1, 2, ..., N_host), i.e. x_i is a matrix of Q × H elements, Q represents the host vulnerabilities, and H represents the connectivity relationships between hosts.
(1.3) Generating an attribute attack graph. Using the scan information of the target network acquired in step (1.1), the information is divided into different nodes, host by host: the vulnerabilities, preconditions, postconditions and joint nodes owned by each host form the host vulnerability relationships. After the nodes are created, they are connected by edges. To determine the edges of the attack graph, connectivity analysis is performed according to information such as the network topology relationships and the host vulnerability relationships: still host by host, the vulnerability precondition nodes, vulnerability nodes and vulnerability postcondition nodes of each host are connected in sequence. Because different hosts have different topological relations, the postcondition nodes and precondition nodes of different hosts are then connected according to those relations. The connections form the edges of the attribute attack graph, and the nodes and edges together make up the attribute attack graph.
(1.4) In the Common Vulnerability Scoring System (CVSS), the exploitability index E of a vulnerability is defined as:
E = 20 × V × C × U (0 ≤ E ≤ 10)
where V is the Access Vector (AV), C is the Access Complexity (AC), and U is the Authentication (AU); these three parameters together describe the exploitability of the vulnerability. The smaller the exploitability E of a vulnerability, the more difficult the corresponding atomic attack. After the vulnerability data is queried, the prior probabilities of the 5 nodes ftp(0,1), user(0), ftp(0,2), ftp(1,2) and sshd(0,1) are 0.6, 0.3, 0.4, 0.7 and 0.5 respectively.
(1.5) The exploitability E of a vulnerability is inversely proportional to the attack difficulty, so the difficulty of each atomic attack is calculated from the exploitability E obtained in step (1.4); a larger difficulty value means a harder attack. The calculation formula is as follows:
Figure BDA0003262957660000062
wherein D represents the difficulty of the corresponding atomic attack;
Edges between attribute nodes in the Bayesian network represent the exploitation step of an attack. The higher the attack difficulty, the lower the probability that the vulnerability is exploited; the two are inversely proportional. The difficulty of an atomic attack therefore determines the conditional probability P between attribute nodes, as follows:
Figure BDA0003262957660000063
if the target attribute that the attacker wants to obtain is obj, the set of all direct and indirect parent nodes of obj on the attack path is Pre(obj), and the set of direct parent nodes is DPre(obj), then the probability P(obj) that the target node is attacked is:
P(obj) = P(obj | Pre(obj)) × P(Pre(obj));
(1.6) According to the probability P(obj) that the target node is attacked from step (1.5), the attribute attack graph generated in step (1.3) is converted and a Bayesian attack graph is generated using MulVAL; a schematic diagram of the generation of the Bayesian attack graph is as shown in the step 3. During the conversion, when a child node is already present on the stack, a loop has been found and is recorded from the stack; the difficulty of each atomic attack in the loop is calculated, the atomic attack node with the greatest difficulty is found, and its edge is deleted, which eliminates the loop. The success probability of each penetration path is then analyzed. As can be seen from FIG. 5, the attacker has three attack paths:
①Path_1:[ftp(0,1)and user(0)]→[trust(1,0)]→[user(1)and ftp(1,2)]→[trust(2,1)]→[user(2)]→[root(2)];
②Path_2:[user(0)and sshd(0,1)]→[temp]→[user(1)and ftp(1,2)]→[trust(2,1)]→[user(2)]→[root(2)];
③Path_3:[user(0),ftp(0,2)]→[trust(2,0)]→[user(2)]→[root(2)].
According to FIG. 5, the target network has three attack paths. The penetration success probability of each of the three attack paths is obtained by calculating the probability P(obj) that the target node is attacked. Comparing the penetration success probabilities shows that, given the evidence P(ftp_rhost(0,1)) = 1, P(sshd(0,1)) = 1 and P(ftp_rhost(0,2)) = 1, the success probabilities of the corresponding attack paths Path_1, Path_2 and Path_3 each increase. Since user(0) occurs in all 3 attack paths, the success probabilities of Path_1, Path_2 and Path_3 all increase given the evidence P(user(0)) = 1.
The calculated penetration success probabilities of the three paths are 0.0215, 0.0328 and 0.0258 respectively. In practice, the more attributes (that is, the more evidence) an attacker obtains, the higher the probability that the attack succeeds. The analysis above shows that adding evidence increases the probability of attack success, so the experimental data is consistent with the actual situation.
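The loop elimination described in step (1.6) can be sketched as a depth-first search that keeps the current path on a stack. The code below is one possible reading of that description, not the procedure used by MulVAL: it treats "the edge of the atomic attack node with the greatest difficulty" as the hardest edge on the loop, and the node names and difficulty values in the example are hypothetical.

```python
def find_loop(graph):
    """Return the edges of one directed loop, or None if the graph has none.

    graph maps each node to a list of child nodes.
    """
    on_stack, finished, stack = set(), set(), []

    def visit(node):
        on_stack.add(node)
        stack.append(node)
        for child in graph.get(node, []):
            if child in on_stack:
                # The child is already on the stack: the stack holds a loop.
                loop = stack[stack.index(child):] + [child]
                return list(zip(loop, loop[1:]))
            if child not in finished:
                found = visit(child)
                if found:
                    return found
        stack.pop()
        on_stack.discard(node)
        finished.add(node)
        return None

    for start in list(graph):
        if start not in finished:
            found = visit(start)
            if found:
                return found
    return None


def eliminate_loops(graph, difficulty):
    """Delete the most difficult edge of each loop until no loop remains.

    difficulty maps (parent, child) edges to the atomic attack difficulty D.
    Returns the list of deleted edges; graph is modified in place.
    """
    removed = []
    while True:
        loop = find_loop(graph)
        if loop is None:
            return removed
        hardest = max(loop, key=lambda edge: difficulty[edge])
        graph[hardest[0]].remove(hardest[1])
        removed.append(hardest)


if __name__ == "__main__":
    # Hypothetical four-node graph with one loop a -> b -> c -> a.
    g = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
    d = {("a", "b"): 0.2, ("b", "c"): 0.9, ("c", "a"): 0.5, ("c", "d"): 0.1}
    print(eliminate_loops(g, d))
```

Deleting the hardest edge removes the step an attacker is least likely to take, so the remaining acyclic graph keeps the more probable routes.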
(2) Distributing a set of k honeypots along the attacker's possible paths according to the probability of attack P(obj), defining the cost of deploying a honeypot, setting a reward value Cap, calculating a reward function, and optimizing the honeypot deployment:
The defender does not know the attacker's specific intention and knows little about the attacker's attack graph, but from the Bayesian attack graph the defender can infer which node the attacker will attack next (the next target is a host connected to the current node). The defender can therefore place honeypots on the path to raise the attacker's attack cost.
In a simple target network, a defender attempts to defend the next node that could be intruded from the entry point node. In a complex target network, a defender will defend a node located "jumping off the entry node".
Since an attacker follows a path inside the network, the allocated honeypots also need to cover a path in the network. Otherwise, randomly allocated honeypots cannot guarantee the security of the set of nodes considered in this honeypot deployment model.
(2.1) The defender enumerates each possible attack path and assigns a set of k honeypots along the attacker's possible paths, to fool the attacker into believing the target has been reached and to mislead the attacker's actions. Such honeypots can divert the attacker from the true target node. For a complex target network environment with N paths, the honeypots can be placed according to the attack success probability P(obj) provided by the attack graph;
(2.2) the defender's reward is weighted by the probability P(obj) that the protected or attacked node is breached; placing a new honeypot at the edge of the network incurs a fixed cost for the defender, assumed to be P; the total deployment cost R_t of the honeypots is calculated as follows:
Figure BDA0003262957660000081
wherein, l is the total node size of the network system, a is the action matrix, and h is the number of deployed honeypots.
(2.3) For a simple target network, specifically: if the honeypot does not capture the attacker, the reward value Cap is set to 0; if the honeypot captures the attacker, the reward value Cap is set to a user-defined value. The reward function for deploying a single honeypot is then defined as:
R = -(P - Cap × (1 - E_p))
where E_p = P(obj) is the probability that the protected or attacked node is attacked. The reward function R accounts for the base cost P of deploying a honeypot: the defender receives a reward of -P whenever a honeypot is deployed at a node, but if the action pays off (the honeypot successfully captures the attacker), a weighted reward of Cap × (1 - E_p) is also obtained. This offsets part of the deployment cost P.
Three possible attack paths were obtained in the experiment, so honeypots are deployed on all three possible attack paths.
The attacker eventually falls into the honeypot on path 2. The total deployment cost obtained by this calculation method is:
R_t = -3P + 0.9672 Cap
A simple target network has no more than 2 network layers and fewer than 4 hosts; otherwise the network is regarded as a complex network.
For a complex target network, specifically: honeypots are deployed according to the probability of attack P(obj) calculated in step (1.5). The reward function R is then as follows:
R = -(P - Cap × (1 - max(P(obj))))
With this new calculation method, the total deployment cost becomes:
R_t = -P + 0.9672 Cap
The deployment cost of two honeypots (2P) is therefore saved directly, the complexity of the protected network space is reduced, and the honeypot deployment is optimized.
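The two totals above can be checked numerically. The sketch below is an illustration only: the per-path probabilities come from step (1.6), the attacker is assumed to be captured on Path_2 as in the experiment, and the concrete values for the honeypot cost and Cap are hypothetical.

```python
# Penetration success probabilities of the three paths, from step (1.6).
PATH_SUCCESS = {"Path_1": 0.0215, "Path_2": 0.0328, "Path_3": 0.0258}


def total_deployment_reward(deployed_paths, captured_path, cost, cap):
    """Sum R = -(P - Cap * (1 - E_p)) over every deployed honeypot.

    Cap counts only for the honeypot that actually captures the attacker.
    """
    total = 0.0
    for path in deployed_paths:
        effective_cap = cap if path == captured_path else 0.0
        total += -(cost - effective_cap * (1.0 - PATH_SUCCESS[path]))
    return total


if __name__ == "__main__":
    cost, cap = 1.0, 10.0  # hypothetical values for P and Cap

    # Honeypots on all three paths: R_t = -3P + 0.9672 Cap.
    everywhere = total_deployment_reward(list(PATH_SUCCESS), "Path_2", cost, cap)

    # One honeypot on the most likely path only: R_t = -P + 0.9672 Cap.
    likeliest = max(PATH_SUCCESS, key=PATH_SUCCESS.get)
    targeted = total_deployment_reward([likeliest], "Path_2", cost, cap)

    print(everywhere, targeted, targeted - everywhere)
```

The coefficient 0.9672 is 1 - 0.0328, the complement of the success probability of Path_2, and the gap between the two totals is the cost of the two honeypots that were not deployed.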
(3) Optimizing the honeypots deployed in the step (2) by using an SARSA reinforcement learning algorithm to obtain an optimal honeypot deployment path, and comprising the following substeps:
(3.1) establish a Q-table to save the state s and all actions a, Q (s, a) that will be taken. The action is stored in an action space Q-table and used as a training data set of the network model;
and (3.2) creating an Agent, wherein the Agent comprises a learning algorithm and an action space, and the learning algorithm is an algorithm for how the Agent selects a strategy. In this experiment, the Agent's learning algorithm uses the SARSA reinforcement learning algorithm, and the schematic diagram of reinforcement learning is shown in FIG. 6. Randomly initializing a first network path state, firstly using an element-greed from a Q-table for each step in a round based on a current state s of a network (when the Q table does not have the state during the first operation, an action space of s-a can be created by the Q table, and the action space is initially all 0, selecting an action a, executing the action a, then obtaining a new network path state s 'and a current reward r, and simultaneously using the element-greed to obtain a' when the next state s 'is obtained, directly using the value of Q (s, a) in the updated table by the aid of the a',
(3.3) Repeat step (3.2) until the Q-table is no longer updated, at which point the optimal strategy π is generated. The update formula is as follows:
Q(S,A) = Q(S,A) + α(R + γQ(S′,A′) - Q(S,A))
where α is the learning rate and γ is the reward discount factor of the discounted cumulative reward mechanism. The Agent records the reward values of all strategies; after it has traversed all possible attack scenarios in the experimental network, it produces the strategy that maximizes the reward value r, i.e. the optimal strategy.
(3.4) The Agent's task is to learn a policy π through trial and error in the network environment. Under this policy, the action to perform in state x is known:
a=π(x)
The quality of a policy depends on the cumulative reward obtained after executing it over the long term. In the invention, the action a is one of two actions, deploying a honeypot or doing nothing, and the state s depends only on the attack success rate of the attack path in the Bayesian attack graph. As shown in the previous section, path 2 has the highest penetration success rate, so the Agent finally chooses to deploy the honeypot system on path 2 to increase the attacker's workload. Deploying honeypots changes the structure of the network space and can invalidate the attacker's original attack graph. If the attacker still follows the old attack path, the attack may fail because of the added honeypots; if the attacker falls into a honeypot, personal information may be left behind, which makes it easier for the defender to collect evidence and pursue subsequent sanctions against the attacker.
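Substeps (3.1) to (3.4) can be illustrated with a minimal tabular SARSA sketch in Python. It is not the implementation used in the invention. The environment is a toy assumption: one state per candidate attack path, visited in order, with the two actions "do nothing" and "deploy a honeypot". The reward `cap * path_prob - cost` is a hypothetical stand-in, not the reward function of step (2), and all function names, parameter names, and default values are illustrative.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # 0 = do nothing, 1 = deploy a honeypot


def epsilon_greedy(q, state, epsilon, rng):
    """With probability epsilon explore at random, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma, done=False):
    """Q(S,A) <- Q(S,A) + alpha * (R + gamma * Q(S',A') - Q(S,A))."""
    bootstrap = 0.0 if done else gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (r + bootstrap - q[(s, a)])


def toy_reward(action, path_prob, cost, cap):
    """Hypothetical reward: expected capture value minus deployment cost."""
    return cap * path_prob - cost if action == 1 else 0.0


def train(path_probs, cost, cap, episodes=2000,
          alpha=0.1, gamma=0.5, epsilon=0.1, seed=0):
    """Run SARSA over the toy path environment and return the Q-table."""
    rng = random.Random(seed)
    q = defaultdict(float)        # unseen (state, action) pairs start at 0
    n = len(path_probs)
    for _ in range(episodes):
        s = rng.randrange(n)      # random initial network path state
        a = epsilon_greedy(q, s, epsilon, rng)
        while True:
            r = toy_reward(a, path_probs[s], cost, cap)
            s_next = s + 1
            if s_next == n:       # last path evaluated: the episode ends
                sarsa_update(q, s, a, r, None, None, alpha, gamma, done=True)
                break
            a_next = epsilon_greedy(q, s_next, epsilon, rng)
            sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma)
            s, a = s_next, a_next
    return q


def greedy_policy(q, n_states):
    """a = pi(s): the action with the largest Q-value in each state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(n_states)]


if __name__ == "__main__":
    q_table = train([0.2, 0.9, 0.4], cost=1.0, cap=2.0)
    print(greedy_policy(q_table, 3))
```

With the assumed path probabilities 0.2, 0.9 and 0.4, a cost of 1.0 and a capture reward of 2.0, deploying only pays off on the second path, so the learned greedy policy deploys a honeypot there and does nothing elsewhere. This mirrors the Agent choosing the path with the highest penetration success rate.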
The embodiments described in this specification are merely illustrative of implementations of the inventive concept and the scope of the present invention should not be considered limited to the specific forms set forth in the embodiments but rather by the equivalents thereof as may occur to those skilled in the art upon consideration of the present inventive concept.

Claims (3)

1. A network honeypot deployment method facing penetration attack is characterized by comprising the following steps:
(1) Scanning and detecting the target network to obtain scanning information of the target network, and storing and classifying the scanning information; performing connectivity analysis according to the network topology relationship and the host vulnerability relationship to generate an attribute attack graph; calculating the exploitability E of each vulnerability using the Common Vulnerability Scoring System, and calculating the conditional probability P_s between attribute nodes and the probability P(obj) of the target node being attacked; generating a Bayesian attack graph from the attribute attack graph;
(2) Enumerating each possible attack path, allocating a group of k honeypots along the possible paths of the attacker according to P(obj), defining the cost of deploying the honeypots, setting a reward value Cap, calculating a reward function, and optimizing honeypot deployment;
(3) Optimizing the honeypots deployed in the step (2) by using an SARSA reinforcement learning algorithm in combination with the Bayesian attack graph generated in the step (1) to obtain an optimal honeypot deployment path;
the step (3) includes the substeps of:
(3.1) establishing a Q-table that stores the state S and all actions A to be taken, denoted Q(S, A); using all the actions A and Q(S, A) as the training data set of a network model;
(3.2) creating an Agent, wherein the Agent comprises a learning algorithm and an action space; the Agent is trained with the SARSA reinforcement learning algorithm; in the training, a first network path state is initialized randomly; then, at each step of a training episode, an action a is selected from the Q-table using the ε-greedy method based on the current network state S; the action a is executed to obtain a new network path state S′ and a current reward r; an action a′ in S′ is obtained using the ε-greedy method, and the value of Q(S, A) in the table is updated using the action a′ to continue the training;
(3.3) continuously repeating the method in the step (3.2) until the Q-table is not updated any more, and generating an optimal strategy pi, wherein the formula is as follows:
Q(S,A) = Q(S,A) + α(R + γQ(S′,A′) - Q(S,A))
where α represents the learning rate and γ is the reward discount;
(3.4) obtaining the action a to be executed in state s according to the optimal strategy π obtained in step (3.3); the action a is one of two actions, deploying a honeypot or doing nothing, and the state s depends only on the attack success rate of the attack path in the Bayesian attack graph; and finally deploying the honeypots according to the optimal strategy π.
2. The network honeypot deployment method facing penetration attack according to claim 1, wherein the step (1) comprises the following substeps:
(1.1) scanning and detecting a host, a port and a vulnerability of a target network to obtain scanning information of the target network, and storing and classifying the port information;
(1.2) defining the data set as a set X containing N_host hosts,
Figure FDA0004043738080000021
each host is represented as x_i ∈ R^(Q×H), i = 1, 2, ..., N_host, i.e. x_i is a matrix containing Q×H elements, wherein Q represents host vulnerabilities and H represents the connectivity relations between hosts;
(1.3) generating an attribute attack graph: performing connectivity analysis according to the network topology relationship and the host vulnerability relationship by using the scanning information of the target network acquired in the step (1.1), and performing directional connection to form an edge of an attribute attack graph; dividing the scanning information of different target networks into different nodes according to the data set of the target scanning information in the step (1.1); connecting edges and nodes to generate an attribute attack graph;
(1.4) calculating the exploitability E of the vulnerability based on the Common Vulnerability Scoring System, using the following formula:
E = 20·V·C·U, 0 ≤ E ≤ 10
where V is the access vector, C is the access complexity, and U is the authentication;
(1.5) calculating the difficulty of the atomic attack from the exploitability E of the vulnerability obtained in step (1.4), where a larger difficulty value means a harder attack; the calculation formula is as follows:
Figure FDA0004043738080000022
wherein D represents the difficulty of the corresponding atomic attack;
the difficulty of an atomic attack is used to express the conditional probability P_s between attribute nodes, with the following formula:
Figure FDA0004043738080000023
if the target attribute to be obtained by the attacker is obj, all direct and indirect parent nodes of obj on the attack path are Pre(obj), and the direct parent node is DPre(obj), then the probability P(obj) of the target node being attacked is:
P(obj) = P(obj | Pre(obj)) · P(Pre(obj));
(1.6) obtaining the possible paths of the attacker according to the probability P(obj) of the target node being attacked obtained in step (1.5), and converting the attribute attack graph generated in step (1.3) into a Bayesian attack graph; then completing loop elimination and analyzing the success probability of each penetration path.
3. The network honeypot deployment method facing penetration attack according to claim 1, wherein the step (2) comprises the following substeps:
(2.1) enumerating each possible attack path by the defender, and distributing a group of k honeypots along the possible paths of the attacker to make the attacker deviate from a real target node;
(2.2) weighting the defender's reward by the probability P (obj) that the protected or attacked node is breached; the defender placing a new honeypot at the edge of the network incurs a fixed cost, assuming that the cost of placing this honeypot is P;
(2.3) for simple target networks specifically: if the new honeypot does not capture the attacker, setting the reward value Cap to 0; if the new honeypot captures the attacker, custom setting a reward value Cap, and defining a reward function R formula for deploying a single honeypot as follows:
R = -(P - Cap·(1 - E_p))
wherein E_p = P(obj) is the probability of the protected or attacked node being attacked;
for complex target networks, specifically: if the new honeypot does not capture the attacker, setting the reward value Cap to 0; if the new honeypot captures the attacker, custom setting the reward value Cap; deploying honeypots according to the attack probability P(obj) calculated in step (1.5); the reward function
Figure FDA0004043738080000031
is given by the following formula:
Figure FDA0004043738080000032
wherein a simple target network has no more than 2 network layers and fewer than 4 hosts; otherwise, the network is regarded as a complex network.
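The formulas stated in text form in claims 2 and 3 can be sketched in Python as follows. This is an illustrative sketch only. The metric weights are the standard CVSS v2 values, the function names are hypothetical, and `target_probability` assumes a single linear attack chain whose root precondition holds with probability `root_prob`. The formulas for the difficulty D and for P_s appear only as figures, so the conditional probabilities are passed in as given inputs rather than derived from E.

```python
from math import prod

# CVSS v2 metric weights (values from the CVSS v2 base equations).
ACCESS_VECTOR = {"local": 0.395, "adjacent": 0.646, "network": 1.0}
ACCESS_COMPLEXITY = {"high": 0.35, "medium": 0.61, "low": 0.71}
AUTHENTICATION = {"multiple": 0.45, "single": 0.56, "none": 0.704}


def exploitability(vector, complexity, authentication):
    """E = 20 * V * C * U, which lies between 0 and 10."""
    v = ACCESS_VECTOR[vector]
    c = ACCESS_COMPLEXITY[complexity]
    u = AUTHENTICATION[authentication]
    return 20 * v * c * u


def target_probability(conditional_probs, root_prob=1.0):
    """P(obj) = P(obj | Pre(obj)) * P(Pre(obj)), unrolled along one chain.

    Assumption: the path is a simple chain, so P(obj) is the product of
    the conditional probabilities P_s on the path times root_prob.
    """
    return root_prob * prod(conditional_probs)


def single_honeypot_reward(cost, cap, e_p, captured):
    """R = -(P - Cap * (1 - E_p)), with Cap = 0 when nobody is captured."""
    effective_cap = cap if captured else 0.0
    return -(cost - effective_cap * (1.0 - e_p))


if __name__ == "__main__":
    print(exploitability("network", "low", "none"))
    p_obj = target_probability([0.8, 0.5])
    print(single_honeypot_reward(cost=1.0, cap=2.0, e_p=p_obj, captured=True))
```

When no attacker is captured, the reward reduces to the negative deployment cost -P, which matches the case in which Cap is set to 0.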
CN202111078546.5A 2021-09-15 2021-09-15 Network honeypot deployment method facing penetration attack Active CN113783881B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111078546.5A CN113783881B (en) 2021-09-15 2021-09-15 Network honeypot deployment method facing penetration attack

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111078546.5A CN113783881B (en) 2021-09-15 2021-09-15 Network honeypot deployment method facing penetration attack

Publications (2)

Publication Number Publication Date
CN113783881A CN113783881A (en) 2021-12-10
CN113783881B true CN113783881B (en) 2023-04-07

Family

ID=78843886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111078546.5A Active CN113783881B (en) 2021-09-15 2021-09-15 Network honeypot deployment method facing penetration attack

Country Status (1)

Country Link
CN (1) CN113783881B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114205161B (en) * 2021-12-13 2024-03-29 北京影安电子科技有限公司 Network attacker discovery and tracking method
CN114268559B (en) * 2021-12-27 2024-02-20 天翼物联科技有限公司 Directional network detection method, device, equipment and medium based on TF-IDF algorithm
CN114338203B (en) * 2021-12-31 2023-10-03 河南信大网御科技有限公司 Intranet detection system and method based on mimicry honeypot
CN114465784A (en) * 2022-01-21 2022-05-10 内蒙古工业大学 Honeypot identification method and device of industrial control system
CN114363093B (en) * 2022-03-17 2022-10-11 浙江君同智能科技有限责任公司 Honeypot deployment active defense method based on deep reinforcement learning
CN114978731B (en) * 2022-05-30 2023-06-30 北京计算机技术及应用研究所 System and method for realizing honeypot trapping based on diversity expansion
CN115001855A (en) * 2022-07-18 2022-09-02 南京理工大学 Deep reinforcement learning intelligent agent selection attack method based on track approximation
CN117081855B (en) * 2023-10-13 2024-02-02 深圳市前海新型互联网交换中心有限公司 Honeypot optimization method, honeypot protection method and honeypot optimization system
CN117808174B (en) * 2024-03-01 2024-05-28 山东大学 Micro-grid operation optimization method and system based on reinforcement learning under network attack
CN118101332A (en) * 2024-04-22 2024-05-28 广州大学 Self-adaptive honey point deployment method based on attack graph

Citations (1)

Publication number Priority date Publication date Assignee Title
CN112926055A (en) * 2021-03-09 2021-06-08 中国人民解放军空军工程大学 Virus attack defense method based on time probability attack graph

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN110750691A (en) * 2019-10-10 2020-02-04 腾讯云计算(北京)有限责任公司 Method and device for computer security management
CN110768987A (en) * 2019-10-28 2020-02-07 电子科技大学 SDN-based dynamic deployment method and system for virtual honey network
CN111371758B (en) * 2020-02-25 2022-03-25 东南大学 Network spoofing efficiency evaluation method based on dynamic Bayesian attack graph
CN112653582B (en) * 2020-12-21 2022-03-01 上海交通大学 Semi-passive industrial control network security analysis tool and method based on Bayesian attack graph

Also Published As

Publication number Publication date
CN113783881A (en) 2021-12-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant