CN116545687A

CN116545687A - Automatic network simulation attack framework based on attack tree and deep reinforcement learning

Info

Publication number: CN116545687A
Application number: CN202310487592.3A
Authority: CN
Inventors: 王昊天
Original assignee: Shanghai Dragon Technology Co ltd
Current assignee: Shanghai Dragon Technology Co ltd
Priority date: 2023-05-04
Filing date: 2023-05-04
Publication date: 2023-08-04

Abstract

The invention discloses an automated network simulation attack framework based on an attack tree and deep reinforcement learning, comprising a data collection module, a deep learning algorithm module and a simulation attack module. The data collection module collects network data and constructs the training data required by the reinforcement learning algorithm. The deep learning algorithm module trains a deep learning model and uses the trained model to perform penetration testing. The simulation attack module automatically carries out a simulated attack test along the optimal attack path identified by the deep learning model. The invention constructs a real network topology by collecting information about the target of the simulated attack, converts the topology into an attack tree model, applies deep reinforcement learning to the attack tree model to search for the most suitable attack path, and carries out an automated simulated penetration test along that path, thereby finding the best attack path for the given topology and identifying the security weaknesses of the network under test.

Description

Automatic network simulation attack framework based on attack tree and deep reinforcement learning

Technical Field

The invention belongs to the technical field of network information security, and particularly relates to an automatic network simulation attack framework based on attack trees and deep reinforcement learning.

Background

Simulated attacks are commonly used in computer and network security research and experiments, with the aim of driving the design of security protection systems that are as reliable and usable as possible. A simulated attack uses non-real attack means to imitate an attack on a computer system or network system, in order to evaluate how targeted and how generalizable the defensive system is, and thus to assess the security and adaptability of the system so that more complete security measures can be developed. Simulated attack is an effective security testing technique: it detects weak points of applications loaded in a computer system, establishes the actual reliability of the system, and provides advice on system security. Simulated attacks can target both software and hardware; attacks on software are more common, and attacks on hardware less so.

In one class of prior art, a security model of the client network is provided according to the client's requirements. The model can use a graph-based tool to determine what network reachability information and vulnerability information is available to a possible attacker, generate the attacker's possible attack steps, and analyze the set of attack paths with a high success rate in order to evaluate the risk of the network system and reduce the cost of simulating network attacks; see, for example, the technical solution disclosed in CN201810593920.7, a network attack target recognition method and system based on an attack graph. Such solutions focus on generating a set of attack paths from the attacker to the attack target. Although they can give the set of possible attack paths, they can hardly determine which specific path an attacker would use on each occasion. In addition, they give no specific attack plan and do not simulate the real interaction between attacker and defender.

In another class of prior art, a network attack framework determines a suitable attack path from the vulnerability or threat information already held in a database, simulates a network attack on the client network, and helps the client find vulnerabilities or weak points in its network system; see, for example, CN202110056313.9, a network attack behavior recognition method based on the reinforcement learning Dyna framework. In such solutions the attack is generated from fixed information in an existing vulnerability repository and threat artifacts, and the attack strategy must be selected manually by the user. They cannot perform an automated network simulation attack, cannot correct the attack according to feedback from the network, and lack an adequate attack planning method.

Disclosure of Invention

The invention provides an automated network simulation attack framework based on an attack tree and deep reinforcement learning. The framework collects information about the target of the network simulation attack, constructs a real network topology, converts the topology into an attack tree model, applies deep reinforcement learning to the attack tree model to search for the most suitable attack path, and carries out an automated simulated penetration test along that path, thereby finding the best attack path for a given topology and identifying the security weaknesses of the network under test. Converting the real network topology into an attack tree represents the interdependence between attack behaviors and attack steps, which allows the attack process to be formalized and documented. Finding the optimal attack path for a given network topology by deep reinforcement learning, and then automatically carrying out a simulated penetration test along that path, is a capability that other network simulation attack schemes do not currently possess, and it solves the problems described in the background.

In order to solve the technical problems, the invention is realized by the following technical scheme:

The invention discloses an automated network simulation attack framework based on an attack tree and deep reinforcement learning, comprising a data collection module, a deep learning algorithm module and a simulation attack module;

The data collection module is responsible for collecting network data and constructing the training data required by the reinforcement learning algorithm; collecting network data includes collecting a host data set and collecting a vulnerability data set;

The deep learning algorithm module trains a deep learning model and then uses the trained model to perform penetration testing; it converts the network topology obtained by the data collection module into an attack tree to generate the training data set required by the deep learning algorithm; the attack tree is simplified using a depth-first search (DFS) algorithm; based on the network topology in the host data set and the vulnerability CVSS scores in the vulnerability data set, both collected by the data collection module, each node in the attack tree is assigned a score for reaching that node, and these scores are then used as reward scores in the deep learning algorithm; continuous training is used to determine the easiest attack path; during learning, the deep learning model acts as the attacker, the target environment is modeled by the scored attack tree, and the attacker moves from node to node until it reaches the target server;

The simulation attack module carries out an automated simulated attack test along the optimal attack path identified by the deep learning model; it automatically tests along the optimal attack path on the attack tree obtained by the deep learning algorithm module, returns the test result, and helps the user understand the weaknesses and vulnerabilities of the target network.

Further, the host data set is collected using techniques including network asset scanning and service fingerprint identification: data packets built according to different rules are sent to each address of the target network, and the services, protocols, operating systems and other components used by the target network are inferred from the different responses; data including actual addresses, ports in use, protocols and known properties are stored in a database to obtain a real host data set, which serves as the basis for the deep learning algorithm module.

Further, the data packets with different rules are sent so as to imitate a normal user request to establish a connection; the server then determines that a normal user wants to establish a connection and returns the service and protocol information corresponding to the data packet, and by receiving the content returned by the server, the framework learns which service the server exposes on that port.

Further, the vulnerability data set is a collection of currently known vulnerabilities, including the CVE number and Microsoft vulnerability identification number of each vulnerability, together with its CVSS score and exploitability score; the vulnerability data set comprises all vulnerability data in the National Vulnerability Database (NVD) and the Microsoft (MS) vulnerability database, and newly appearing vulnerability data is added to the data set in real time.

Further, the CVSS score of a vulnerability is a standardized indicator for assessing vulnerability severity; it comprises a base score and an exploitability score, wherein the base score represents the severity of the vulnerability and ranges from 0 to 10; the metrics behind the base score include factors such as the attack complexity of the vulnerability, its scope of impact, and its detectability; the exploitability score represents how difficult it is for an attacker to exploit the vulnerability and ranges from 0 to 10; the metrics behind the exploitability score include the privileges of the attacker, the attacker's mode of access, and the resources the attacker requires.

Further, the attack tree is simplified by searching for all reachable nodes of the attack tree and retaining only those reachable nodes.

Further, the reward score is calculated according to the vulnerability scoring standard of the Common Vulnerability Scoring System (CVSS), taking both the base score and the exploitability score of the vulnerability into account; the calculation formula is:

reward score of a vulnerability = base score of the vulnerability + exploitability score of the vulnerability / 10.

Compared with the prior art, the invention has the following beneficial effects:

(1) By combining an attack tree with a deep learning model, the invention provides an automated network simulation attack framework; the framework can automatically scan the topology of the target network and combine it with a separately maintained vulnerability database to generate an attack tree model that can be fed to a deep learning framework; the CVSS scores in the vulnerability information on the attack tree are associated with the deep learning reward signal to find the optimal attack path, and the automated simulation attack module carries out a network simulation attack test along that path to check the weaknesses and vulnerabilities of the network system;

(2) The method combining the attack tree and deep learning can find the most suitable attack path in a large and complex network topology. Previously, most such attack paths were selected manually by a tester, who first had to analyze the target system and then exploit the discovered vulnerabilities in different ways to penetrate the system and compromise network resources in a proof-of-concept attack. This process is tedious, time-consuming and complex, requires a large amount of implicit knowledge that is not easily formalized, and is subject to human error. The framework can also suggest attack strategies to assist network security attack training activities.

Of course, it is not necessary for any one product to practice the invention to achieve all of the advantages set forth above at the same time.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a topology diagram of the automated network simulation attack framework based on attack trees and deep reinforcement learning of the present invention;

FIG. 2 is a schematic diagram of an attack tree;

FIG. 3 is an example of raw data collected in a particular embodiment;

FIG. 4 is an example of some content in a vulnerability information data set.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The invention provides an automatic network simulation attack framework based on attack tree and deep reinforcement learning, which constructs a real network topology by collecting information of a target of network simulation attack, converts the topology into an attack tree model, carries out deep reinforcement learning aiming at the attack tree model, searches for the most suitable network simulation attack path, carries out an automatic network simulation penetration attack test on the path, finds out the best attack path for a given topology, and identifies the security weakness of a simulated attack network.

Converting the real network topology into an attack tree represents the interdependencies between attack behaviors and attack steps, which allows the attack process to be formalized and documented. The optimal attack path for the given network topology is found by deep reinforcement learning, and a simulated penetration test is then carried out automatically along that path; most other network simulation attack software on the market does not offer this capability.

As shown in FIG. 1, the automatic network simulation attack framework based on the attack tree and the deep reinforcement learning comprises a data collection module, a deep learning algorithm module and a simulation attack module;

The deep learning algorithm module trains a deep learning model and then uses the trained model to perform penetration testing; it converts the network topology obtained by the data collection module into an attack tree to generate the training data set required by the deep learning algorithm; the attack tree is simplified using a depth-first search algorithm; based on the network topology in the host data set and the vulnerability CVSS scores in the vulnerability data set, both collected by the data collection module, each node in the attack tree is assigned a score for reaching that node, and these scores are then used as reward scores in the deep learning algorithm; continuous training is used to determine the easiest attack path; during learning, the deep learning model acts as the attacker, the target environment is modeled by the scored attack tree, and the attacker moves from node to node until it reaches the target server;

The host data set is collected using network asset scanning and service fingerprint identification techniques: data packets built according to different rules are sent to each address of the target network, and the services, protocols, operating systems and other components used by the target network are inferred from the different responses. Data including actual addresses, ports in use, protocols and known properties are stored in a database to obtain a real host data set, which serves as the basis for the deep learning algorithm module.
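As an illustration of how scan results might be stored, the following is a minimal sketch using Python's built-in `sqlite3` module. The table name, column names and helper functions (`open_host_db`, `store_host_record`, `ports_for`) are assumptions made for this example; the patent does not specify a schema or a database engine.

```python
import sqlite3

# Hypothetical schema: the column names are illustrative, not taken from the patent.
SCHEMA = """
CREATE TABLE IF NOT EXISTS hosts (
    address  TEXT    NOT NULL,
    port     INTEGER NOT NULL,
    protocol TEXT    NOT NULL,
    service  TEXT,
    os       TEXT,
    PRIMARY KEY (address, port, protocol)
)
"""

def open_host_db(path=":memory:"):
    """Open (or create) the host data set database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def store_host_record(conn, address, port, protocol, service=None, os_name=None):
    """Insert one scanned endpoint, replacing any earlier record for it."""
    conn.execute(
        "INSERT OR REPLACE INTO hosts VALUES (?, ?, ?, ?, ?)",
        (address, port, protocol, service, os_name),
    )
    conn.commit()

def ports_for(conn, address):
    """Return the sorted list of ports recorded for one address."""
    rows = conn.execute(
        "SELECT port FROM hosts WHERE address = ? ORDER BY port", (address,)
    )
    return [row[0] for row in rows]
```

Keying each row on (address, port, protocol) means that a repeated scan updates the existing record instead of duplicating it.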

The fingerprint identification technique mainly sends specially crafted data packets with different structures to a host and port. For example, if a server has port 80 open, a packet that imitates a normal user request to establish a connection is sent to that port. The server treats the packet as ordinary user access and replies with information such as the service and protocol in use. By receiving the content returned by the server, the framework learns which service the server exposes on that port.
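The step of interpreting a server's reply can be sketched as a pure function that matches the returned bytes against a small signature table. The signature list and the function name `identify_service` are assumptions made for illustration; the patent does not describe its fingerprint database, and a real one would be far larger.

```python
import re

# Illustrative signatures only, not the patent's fingerprint database.
SIGNATURES = [
    ("ssh", re.compile(rb"^SSH-(\d+\.\d+)-(\S+)")),
    ("http", re.compile(rb"^HTTP/\d\.\d \d{3}")),
    ("ftp", re.compile(rb"^220[ -].*FTP", re.IGNORECASE)),
    ("smtp", re.compile(rb"^220[ -].*SMTP", re.IGNORECASE)),
]

SERVER_HEADER = re.compile(rb"^Server:\s*(.+?)\r?$", re.IGNORECASE | re.MULTILINE)

def identify_service(response: bytes) -> dict:
    """Guess the service behind a port from the bytes it returned."""
    for name, pattern in SIGNATURES:
        match = pattern.search(response)
        if not match:
            continue
        info = {"service": name}
        if name == "ssh":
            # The SSH identification string carries the software version.
            info["product"] = match.group(2).decode("latin-1")
        elif name == "http":
            header = SERVER_HEADER.search(response)
            if header:
                info["product"] = header.group(1).decode("latin-1")
        return info
    return {"service": "unknown"}
```

Keeping the parsing separate from the network I/O makes the matching rules easy to test against recorded responses.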

In addition to the host data, a vulnerability data set is collected that covers currently known vulnerabilities, including the CVE number and Microsoft vulnerability identification number of each vulnerability, together with the base score and exploitability score of its CVSS rating. The CVSS (Common Vulnerability Scoring System) score of a vulnerability is a standardized indicator for assessing vulnerability severity. It comprises a base score and an exploitability score. The base score represents the severity of the vulnerability and ranges from 0 to 10; its metrics include factors such as the attack complexity of the vulnerability, its scope of impact, and its detectability. The exploitability score represents how difficult it is for an attacker to exploit the vulnerability and ranges from 0 to 10; its metrics include the privileges of the attacker, the attacker's mode of access, the resources the attacker requires, and other factors. The vulnerability data set comprises all vulnerability data in the National Vulnerability Database (NVD) and the Microsoft (MS) vulnerability database, and security engineers can also add newly appearing vulnerability data to the data set in real time.

CVSS can be understood as an official, widely accepted industry standard: an open standard for assessing the severity of vulnerabilities in computer systems. The scores are mainly used to help security professionals and organizations better understand how severe a vulnerability is and decide what action is needed to fix it.

From the CVSS rating, the framework uses the base score and the exploitability score of each vulnerability. The base score evaluates the inherent severity of the vulnerability and covers aspects such as attack vector, attack complexity, authentication requirement, confidentiality, integrity and availability. The exploitability score is a sub-score of the base score that evaluates how easily an attacker can exploit the vulnerability. It takes into account the steps and skills the attacker needs, as well as whether specific tools or techniques are required to exploit the vulnerability successfully. Some specific examples:

CVE-2021-26855: Exchange Server ProxyShell vulnerability, CVSS score 10.0 (base score 9.1, exploitability score 3.9).

CVE-2021-34527: Windows Print Spooler vulnerability, CVSS score 7.8 (base score 7.8, exploitability score 3.9).

CVE-2021-3129: Laravel framework deserialization vulnerability, CVSS score 9.8 (base score 9.8, exploitability score 3.9).

The network topology structure obtained by the data collection module is converted into an attack tree, specifically:

The previous step yields the network topology information, such as which hosts are in the network, which ports are open on each host, and which vulnerabilities may be present. For example, suppose that only host A can connect to host B, that the goal is to obtain the confidential content on host B, and that host A has a vulnerability V that allows an attacker to log in remotely. Host A and vulnerability V are then added as child nodes beneath the host B node. Proceeding in the same way, all network topology information is traversed and nodes are continually added to the attack tree until a complete attack tree is obtained.
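The construction just described can be sketched as a recursive expansion from the target host. The data shapes (`reachable_from`, `vulns`) and the names `AttackNode` and `build_attack_tree` are assumptions made for this example; the patent does not specify how the topology is represented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class AttackNode:
    host: str
    vulnerability: Optional[str] = None  # vulnerability used to gain this host
    children: List["AttackNode"] = field(default_factory=list)

def build_attack_tree(target: str,
                      reachable_from: Dict[str, List[str]],
                      vulns: Dict[str, List[str]]) -> AttackNode:
    """Build a tree rooted at the target host.

    reachable_from[h] lists hosts that can connect to h (assumed input shape).
    vulns[h] lists vulnerability identifiers found on h (assumed input shape).
    """
    def expand(host, vuln, on_path):
        node = AttackNode(host, vuln)
        for source in reachable_from.get(host, []):
            if source in on_path:
                continue  # do not revisit a host already on this path
            for v in vulns.get(source, []):
                # A source host with an exploitable vulnerability becomes a child.
                node.children.append(expand(source, v, on_path | {source}))
        return node

    return expand(target, None, {target})
```

For the example in the text, `build_attack_tree("B", {"B": ["A"]}, {"A": ["V"]})` yields a root node for host B with a single child for host A reached through vulnerability V.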

An attack tree is a tool for describing and analyzing security threats to computer systems and networks. It is a tree structure that, starting from the root node, describes the paths and methods by which an attacker may attack a target system or network, together with the conditions and probabilities of a successful attack. The root node represents the target of the attack, such as a particular application or system. The nodes one level below the root represent the different attack paths or modes an attacker may take, such as an attack on a system vulnerability, a social engineering attack, or a malware attack. Each child node further expands the attack steps until the lowest-level nodes are reached, which represent the conditions and probabilities of attack success.

As shown in FIG. 2, the nodes of the attack tree are divided into "and" nodes and "or" nodes. The label "and" indicates that the parent node can be achieved only after all of its child nodes have been achieved. The label "or" indicates that the parent node can be achieved as soon as any one of its child nodes has been achieved. Node G0 is the attack target, for example logging in to host B and obtaining file read permission on host B; nodes G1 and G2 are attack conditions, for example logging in to host A and being able to exploit vulnerability V.
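The "and"/"or" semantics can be expressed as a short recursive check. This is a minimal sketch; the class `GoalNode` and the function `is_achieved` are names introduced here for illustration and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GoalNode:
    name: str
    kind: str = "leaf"        # "and", "or", or "leaf"
    achieved: bool = False    # only meaningful for leaf nodes
    children: List["GoalNode"] = field(default_factory=list)

def is_achieved(node: GoalNode) -> bool:
    """Evaluate whether a goal is achieved under and/or semantics."""
    if node.kind == "leaf":
        return node.achieved
    results = [is_achieved(child) for child in node.children]
    if node.kind == "and":
        return all(results)   # every child must be achieved
    if node.kind == "or":
        return any(results)   # one achieved child is enough
    raise ValueError(f"unknown node kind: {node.kind!r}")
```

With G1 achieved and G2 not achieved, an "and" parent G0 evaluates to false while an "or" parent G0 evaluates to true.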

The CVSS score of a vulnerability is a standardized indicator for assessing vulnerability severity; it comprises a base score and an exploitability score, wherein the base score represents the severity of the vulnerability and ranges from 0 to 10; the metrics behind the base score include factors such as the attack complexity of the vulnerability, its scope of impact, and its detectability; the exploitability score represents how difficult it is for an attacker to exploit the vulnerability and ranges from 0 to 10; the metrics behind the exploitability score include the privileges of the attacker, the attacker's mode of access, and the resources the attacker requires.

The attack tree is then optimized: a depth-first search (DFS) algorithm finds all reachable nodes of the attack tree and only those nodes are retained, which simplifies the tree and improves algorithm efficiency. Based on the network topology in the host data set and the vulnerability CVSS scores in the vulnerability data set collected by the data collection module, each node in the attack tree is assigned a score for reaching that node, and these scores are then used as reward scores in the deep learning algorithm. In the deep learning model, continuous training is used to determine the easiest attack path; the "softmax" function is selected as the activation function, and the output of the model is the best attack path. During learning, the deep learning model acts as the attacker, the target environment is modeled by the scored attack tree obtained above, and the attacker can move from one node to another until it reaches the target server. The reward for each vulnerability in the deep learning model is calculated according to the vulnerability scoring standard of the Common Vulnerability Scoring System (CVSS), taking both the base score and the exploitability score of the vulnerability into account; the calculation formula is as follows:

reward score of a vulnerability = base score of the vulnerability + exploitability score of the vulnerability / 10;
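A minimal sketch of this formula follows. It assumes ordinary operator precedence, so that only the second term (the CVSS exploitability sub-score) is divided by 10; the text does not state the grouping explicitly, and the function name `reward_score` is introduced here for illustration.

```python
def reward_score(base_score: float, exploitability_score: float) -> float:
    """Reward for a vulnerability: base score plus one tenth of the second term.

    Assumption: the formula is read as base + (exploitability / 10).
    Both inputs are expected in the 0 to 10 range stated in the text.
    """
    for name, value in (("base_score", base_score),
                        ("exploitability_score", exploitability_score)):
        if not 0.0 <= value <= 10.0:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    return base_score + exploitability_score / 10.0
```

Under this reading, a vulnerability with a base score of 9.8 and a second term of 3.9 receives a reward of about 10.19, so the base score dominates and the second term mainly breaks ties between vulnerabilities of equal severity.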

The deep learning algorithm module described above specifically uses a deep reinforcement learning algorithm, which combines reinforcement learning with a deep neural network and in which an agent acts. The agent takes the environment state as input, predicts the Q value of each action with the deep neural network, and selects the action with the highest Q value to execute. As the agent interacts with the environment, its Q-value function is gradually optimized, so that the agent adapts better to the environment and learns the optimal strategy. The Q value is the expected long-term return (cumulative reward) obtained by performing a given action in a given state. It measures how good a particular action is in that state and is therefore used to select the optimal strategy. Specifically, the deep reinforcement learning algorithm uses a Q-value function to estimate the Q value of each state-action pair: the function receives the current state and a candidate action as input and outputs the corresponding Q value. When the agent interacts with the environment, it selects an action based on the current state and the learned Q-value function. The agent then observes the new state and the reward earned, and uses this information to update the Q-value function so that its estimates become more accurate. This process continues until the Q-value function converges to the optimal solution.
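The update loop described above can be illustrated with tabular Q-learning on a tiny attack graph. This is a deliberately simplified stand-in: the patent uses a deep neural network to approximate the Q-value function, whereas the sketch below stores Q values in a dictionary. The graph, the reward values, the hyperparameters and the function names are all assumptions made for the example.

```python
import random

def train_q_table(transitions, rewards, start, goal,
                  episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; an action is the choice of the next node.

    transitions[state] lists the nodes reachable from state.
    rewards[(state, next_state)] is the reward for that move.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, nexts in transitions.items() for a in nexts}
    for _ in range(episodes):
        state = start
        for _ in range(50):  # cap on steps per episode
            actions = transitions.get(state, [])
            if state == goal or not actions:
                break
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            next_state = action
            best_next = max(
                (q[(next_state, a)] for a in transitions.get(next_state, [])),
                default=0.0,
            )
            target = rewards[(state, action)] + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q

def greedy_path(q, transitions, start, goal, max_steps=50):
    """Follow the highest Q value from start until the goal or a dead end."""
    path = [start]
    while path[-1] != goal and transitions.get(path[-1]) and len(path) <= max_steps:
        state = path[-1]
        path.append(max(transitions[state], key=lambda a: q[(state, a)]))
    return path
```

After training, `greedy_path` reads the learned table back as a node sequence, which corresponds to the "optimal attack path" output that the text attributes to the model.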

A depth-first search (DFS) algorithm is used to simplify the attack tree: after the attack tree has been generated in the previous step, DFS explores all reachable nodes of the tree. Depth-first search is a commonly used graph traversal algorithm that gives priority to depth. It starts from some vertex of the graph and keeps searching forward along one path until that path can no longer be extended, then returns to the previous node and continues along another path. This continues until all nodes connected to the starting node have been traversed. For example, suppose the attack tree contains a node for host C, but the network structure and vulnerability information discovered so far offer no way to penetrate to host C, that is, node C cannot be reached. The depth-first search never visits node C, so the node is deleted from the attack tree. This operation can greatly reduce the size of the attack tree and improve algorithm efficiency.
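The pruning step can be sketched with an iterative depth-first search over an adjacency mapping. The representation (`graph[node]` as a list of neighbours) and the function name `prune_unreachable` are assumptions made for this example.

```python
def prune_unreachable(graph, start):
    """Keep only the nodes that a depth-first search from start can reach.

    graph[node] lists the nodes directly reachable from node (assumed shape).
    """
    visited = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        # Reverse so the first-listed neighbour is explored first.
        stack.extend(reversed(graph.get(node, [])))
    # Every neighbour of a visited node was itself visited, so the
    # adjacency lists can be copied unchanged.
    return {node: list(graph.get(node, [])) for node in visited}
```

In the host C example from the text, C has no incoming path from the attacker's starting node, so it never enters `visited` and is absent from the returned graph.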

The easiest attack path is determined by the deep learning model. The main idea is to combine the difficulty of an attack and the gain after a successful attack with a deep neural network in order to estimate the expected long-term return (cumulative reward), that is, the Q value, of executing a given action in a given state. Training continues until the Q-value function converges to the optimal solution, after which the algorithm can choose each next action according to the Q values and thereby obtain the path it regards as optimal.

The activation function is a nonlinear function applied at each node of the neural network; having an activation function on every node is a basic concept in machine learning. Its main role is to map the node's output into a nonlinear space so that the model can learn more complex relationships.
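The description names softmax as the chosen activation function. A minimal, numerically stable sketch in plain Python is shown below; how the patent's model applies it (for example, over the Q values of the candidate next nodes) is not specified, so the function is given on its own.

```python
import math

def softmax(values):
    """Map a list of real numbers to probabilities that sum to 1."""
    if not values:
        return []
    shift = max(values)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```

Larger inputs receive larger probabilities, so applying softmax to a vector of scores produces a distribution that favours the highest-scoring option without ruling out the others.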

A specific example of automated simulated attack testing with the framework follows. The data collection module first scans the target network address and collects its network information, including the actual IP addresses, the ports and protocols in use, and the vulnerabilities that may exist; FIG. 3 is an example of the raw data collected. In addition to the collected host data set, a vulnerability information data set is maintained that contains the CVE number and Microsoft vulnerability identification number of each vulnerability, together with the base score and exploitability score of its CVSS rating. FIG. 4 is an example of some content in a vulnerability information data set. From the collected host data set and vulnerability information data set, an attack tree is generated according to the topology of the target network and the vulnerability information, and each node of the attack tree contains a score for reaching that node.

The resulting attack tree model is input to the deep learning algorithm module, the reward for each vulnerability in the model is associated with the base score and exploitability score obtained earlier, and the optimal attack path is finally obtained.

Finally, the simulation attack module carries out an automated simulated attack along the attack path and returns the attack result, helping users to understand the weaknesses and vulnerabilities of the target network.

The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. The preferred embodiments are not exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof.

Claims

1. The automatic network simulation attack framework based on the attack tree and the deep reinforcement learning is characterized by comprising a data collection module, a deep learning algorithm module and a simulation attack module;

2. The automated network simulation attack framework based on attack tree and deep reinforcement learning according to claim 1, wherein the collection of host data sets is to use network asset scanning and service fingerprint identification technologies, send data packets with different rules to each address of a target network, judge component information such as service, protocol, operating system and the like used by the target network according to different returns, store data including actual address, used port and protocol, known property into a database, obtain a real host data set, and use the obtained real host data set as a basis of a deep learning algorithm module.

3. The automated network simulation attack framework based on attack tree and deep reinforcement learning according to claim 2, wherein the data packets with different rules are sent so as to imitate a normal user request to establish a connection; the server then determines that a normal user wants to establish a connection and returns the service and protocol information corresponding to the data packet; by receiving the content returned by the server, the service-related information of the server opened at the port can be known.

4. The automated network simulation attack framework based on attack trees and deep reinforcement learning according to claim 1, wherein the vulnerability data set is a collection of currently known vulnerabilities, including the CVE number and Microsoft vulnerability identification number of each vulnerability, together with its CVSS score and exploitability score; the vulnerability data set comprises all vulnerability data in the National Vulnerability Database and the Microsoft vulnerability database, and newly appearing vulnerability data on the network is added to the vulnerability data set in real time.

5. The automated network simulation attack framework based on attack trees and deep reinforcement learning according to claim 4, wherein the CVSS score of the vulnerability is a standardized indicator for assessing vulnerability severity; the CVSS score of the vulnerability comprises a base score and an exploitability score, wherein the base score represents the severity of the vulnerability and ranges from 0 to 10; the metrics behind the base score include factors such as the attack complexity of the vulnerability, its scope of impact, and its detectability; the exploitability score represents how difficult it is for an attacker to exploit the vulnerability and ranges from 0 to 10; the metrics behind the exploitability score include the privileges of the attacker, the attacker's mode of access, and the resources the attacker requires.

6. The automated network simulation attack framework based on attack tree and deep reinforcement learning according to claim 1, wherein the simplified attack tree is implemented by searching all reachable nodes of the attack tree, and only the reachable nodes in the attack tree are reserved.

7. The automated network simulation attack framework based on attack trees and deep reinforcement learning according to claim 1, wherein the reward score is calculated according to the vulnerability scoring standard of the Common Vulnerability Scoring System, taking both the base score and the exploitability score of the vulnerability into account, and the calculation formula is specifically: