CN113660241B

CN113660241B - Automatic penetration test method based on deep reinforcement learning

Info

Publication number: CN113660241B
Application number: CN202110916929.9A
Authority: CN
Inventors: 郑超; 陆秋文; 李鑫; 孙彦斌; 崔翔
Original assignee: Jizhi Hainan Information Technology Co ltd
Current assignee: Jizhi Hainan Information Technology Co ltd
Priority date: 2021-08-11
Filing date: 2021-08-11
Publication date: 2023-05-23
Anticipated expiration: 2041-08-11
Also published as: CN113660241A

Abstract

The invention discloses an automated penetration testing method and framework based on deep reinforcement learning. The method as a whole comprises network environment data acquisition and preprocessing, construction of a deep reinforcement learning model, and external wrapping of a penetration testing tool. Specifically, the network environment is scanned with ZoomEye to collect topology information, a state reachable matrix is constructed from the collected data, a model is trained with the Double DQN algorithm, and a penetration testing tool is used as an external tool to interact with the real environment and feed the results back to the Double DQN model, thereby achieving attack path prediction with good performance. Compared with general automated penetration schemes, this technical scheme can effectively improve the efficiency of penetration testing and is a feasible automated penetration scheme.

Description

Automatic penetration test method based on deep reinforcement learning

Technical Field

The invention relates to the field of automated penetration testing in network security, and in particular to a method based on deep reinforcement learning.

Background

With the rapid development of information technology and the continuous expansion of network applications, networks bring people great convenience but also expose them to security threats. These threats include viruses, worms, Trojans and the like, which launch attacks mainly by exploiting security vulnerabilities in systems and applications. To guard against these threats, security problems in computer systems must be discovered early and the degree of harm of each vulnerability determined. It is therefore necessary to perform penetration tests periodically on the computer network systems of an enterprise network and to repair the systems according to the test results.

Penetration testing is a method of evaluating the security of a computer network system by simulating the attack methods of a malicious hacker. The process includes actively analyzing any weakness, technical flaw, or vulnerability of the system from a position an attacker might occupy, and, where conditions permit, actively exploiting security vulnerabilities from that position in order to gain a degree of control and obtain the assets of the target system.

A penetration test is generally divided into seven stages: pre-engagement interaction, information gathering, threat modeling, vulnerability analysis, exploitation, post-exploitation, and reporting. The resulting penetration test report can be used to analyze and determine, quickly and effectively, the threats faced by all of an organization's information assets, to assess the threat to each asset, and to balance expenditure against quantified risk, effectively reducing the organization's investment in security costs.

Penetration testing is mainly carried out by professional security personnel who use various security testing tools against the target system. In an actual penetration test, however, many security personnel must work together to obtain privileges on the target system. The threat modeling and exploitation stages in particular demand a great deal of effort from security personnel, so these are the stages where automation is most needed and matters most for efficiency.

Many semi-automated penetration testing frameworks exist, but most of them use similar strategies, functions and usage methods, and they require manual assistance to take the characteristics of the target system fully into account and customize the penetration testing strategy. They therefore still depend on a great deal of security experience.

Therefore, in order to save labor costs and realize an automated penetration testing system with a unified format, a deep reinforcement learning algorithm is used to make optimal choices among penetration attacks through continuous interaction during the penetration testing stage, finally yielding the most likely penetration attack scheme and improving the efficiency and accuracy of penetration testing.

Disclosure of Invention

To overcome the problems in the related art, embodiments of the disclosure provide an automated penetration testing method and system based on deep reinforcement learning. The proposed method interacts with a real environment and receives feedback from it, performs well, and can effectively obtain a penetration attack path and scheme.

An embodiment of the invention provides an automated penetration testing method and system based on deep reinforcement learning, comprising:

A network environment information collection and processing module, which collects information about and detects vulnerabilities on the hosts in the network topology, converts the network topology into a state reachable matrix according to a rule base, and inputs the state reachable matrix into the Double DQN network.

A deep reinforcement learning module, which takes the state reachable matrix as input, combines deep learning with reinforcement learning, and trains a Double DQN model on attack paths to generate the most likely attack path.

A vulnerability verification module, which verifies the detected vulnerabilities on real hosts and feeds the results back to the Double DQN model so as to arrive at the most likely attack path.

The network environment information collection and processing module mainly uses the online scanning tool ZoomEye to collect host information and vulnerabilities.

Further, the collected information is matched against a rule base, and MulVAL then uses the matched vulnerability information or tactical behaviors to build an attack tree over the topology.

Further, the rule base is divided into vulnerability rules and tactical behavior rules. Vulnerability rules are drawn from known vulnerability databases such as CVE, NVD and CNVD, and tactical behavior rules are drawn from specific tactics in the ATT&CK framework published by MITRE.

Furthermore, the attack tree generated by MulVAL gives all possible paths through the network topology. Breadth-first search (BFS) is used to simplify the traversal, and a state reachable matrix is constructed from all the possible attack paths discovered. The resulting matrix is well suited as input to the Double DQN model.
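As an illustration of this step, the following minimal Python sketch runs BFS over a small attack graph and builds a 0/1 matrix over the nodes it discovers. The patent does not specify how the state reachable matrix is encoded, so the encoding used here (entry [i][j] is 1 when node j is directly reachable from node i) is an assumption, as are the node names and the `edges` dictionary format. MulVAL's actual output would first have to be parsed into this form.

```python
from collections import deque


def reachable_matrix(edges, start):
    """Run BFS from `start` and return (order, matrix).

    `edges` maps each node to the nodes an attacker can move to next
    (hypothetical format). `order` lists the discovered nodes in BFS
    order, and matrix[i][j] is 1 when order[j] is a direct successor
    of order[i].
    """
    order = [start]
    index = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in index:  # visit each node only once
                index[nxt] = len(order)
                order.append(nxt)
                queue.append(nxt)

    n = len(order)
    matrix = [[0] * n for _ in range(n)]
    for src in order:
        for dst in edges.get(src, []):
            matrix[index[src]][index[dst]] = 1
    return order, matrix


# Hypothetical topology: "printer" cannot be reached from the attacker,
# so BFS never discovers it and it is left out of the matrix.
example_edges = {
    "attacker": ["web", "mail"],
    "web": ["db"],
    "mail": ["db"],
    "db": [],
    "printer": ["db"],
}
order, matrix = reachable_matrix(example_edges, "attacker")
```

Restricting the matrix to nodes found by BFS is one way to read "simplifying" the traversal: hosts that no attack path can reach do not enlarge the model input.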

The deep reinforcement learning module uses a Double DQN model. It is mainly responsible for iteratively selecting optimal attack paths over the input state matrix and finally outputting the best possible penetration attack path.

Preferably, the DQN algorithm approximates Q-Learning with a neural network. Q-Learning is a model-free reinforcement learning technique that can handle stochastic transitions and rewards without a model of the environment. Starting from the current state, it seeks to maximize the expected total return over all subsequent steps, which allows the optimal policy to be selected. Its core is the iterative update of the Q-value function. The formula is as follows:
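The formula is not reproduced in this text. For reference, the standard Q-Learning update, which matches the description above, is commonly written as shown below. The symbols are the usual textbook ones and are assumed here rather than taken from the patent: α is the learning rate, γ the discount factor, r the reward received, and s' the next state.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```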

Further, Q-Learning computes the Q values of all actions for a state and then directly selects the action with the largest Q value, which yields an estimate higher than the true value. The DQN algorithm also uses the max operation, so using the same values both to select and to evaluate an action leads to overestimation.

Preferably, Double Q-Learning decouples the two steps by using two value functions. The two value functions are updated at random, each using the other's experience, with network weights θ and θ⁻ respectively. Separating action selection from action evaluation avoids the overestimation problem. The formula is as follows:
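The formula is not reproduced in this text. The Double Q-Learning target is commonly written as below, with the weights θ used to select the action and the weights θ⁻ used to evaluate it. The symbols r, γ, s' and a' (reward, discount factor, next state and candidate action) are the usual ones and are assumed here rather than taken from the patent. The roles of the two sets of weights are swapped at random between updates.

```latex
Y^{\text{DoubleQ}} = r + \gamma \, Q\!\left( s', \arg\max_{a'} Q(s', a'; \theta);\ \theta^{-} \right)
```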

Further, the estimation bias of Q-Learning grows as the number of actions m increases, whereas Double Q-Learning is an unbiased estimate whose error does not change appreciably as m increases and stays close to 0.
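This effect can be checked numerically. The sketch below is an illustration, not the patent's own experiment. It assumes that m is the number of available actions, that the true value of every action is 0, and that each estimate carries independent standard normal noise. The function name and parameters are hypothetical.

```python
import random
import statistics


def estimation_bias(num_actions, trials=20000, seed=0):
    """Return (single_bias, double_bias) when all true action values are 0.

    Any nonzero average is pure estimation bias, because the true
    maximum value is 0 by construction.
    """
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(trials):
        # Two independent noisy estimates of the same (all-zero) values.
        q_a = [rng.gauss(0.0, 1.0) for _ in range(num_actions)]
        q_b = [rng.gauss(0.0, 1.0) for _ in range(num_actions)]
        # Single estimator: select and evaluate with the same estimates.
        single.append(max(q_a))
        # Double estimator: select with q_a, evaluate with q_b.
        best = max(range(num_actions), key=q_a.__getitem__)
        double.append(q_b[best])
    return statistics.fmean(single), statistics.fmean(double)


for m in (2, 10):
    single_bias, double_bias = estimation_bias(m)
    print(f"m={m}: single={single_bias:.3f} double={double_bias:.3f}")
```

Under these assumptions the single estimator's average grows with m, while the double estimator's average stays near 0, because the noise used to pick the action is independent of the noise in the value that is reported.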

Preferably, Double DQN, like DQN, has two Q networks with the same structure. Building on the DQN algorithm, it eliminates the overestimation problem by decoupling the selection of the action for the target Q value from the calculation of the target Q value. The formula is as follows:
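The formula is not reproduced in this text. In the commonly cited form of Double DQN, the target is Y = r + γ Q(s', argmax over a' of Q(s', a'; θ); θ⁻), where θ are the weights of the online network and θ⁻ those of the target network. The Python sketch below computes this target for a single transition and, for comparison, the plain DQN target. The function names, the `done` flag and the default discount factor are assumptions made for illustration.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target for one transition.

    next_q_online: Q(s', a'; theta) for every action a' (online network).
    next_q_target: Q(s', a'; theta^-) for every action a' (target network).
    """
    if done:  # no bootstrapping from a terminal state
        return reward
    # Select the action with the online network ...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ... but evaluate it with the target network.
    return reward + gamma * next_q_target[best_action]


def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Plain DQN target: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


# Hypothetical Q values for three actions in the next state.
online = [1.0, 3.0, 2.0]
target = [5.0, 0.5, 4.0]
y_double = double_dqn_target(1.0, online, target, gamma=0.9)
y_plain = dqn_target(1.0, target, gamma=0.9)
```

In this example the online network prefers the second action, so the Double DQN target uses the target network's value for that action instead of the target network's own maximum.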

Further, all parameters of the Q network are updated by gradient backpropagation through the neural network using a mean squared error loss function. The formula is as follows:
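The formula is not reproduced in this text. A mean squared error loss of the kind described is commonly written as below, where Y is the target Q value, Q(s, a; θ) is the network's current estimate, and α is the learning rate. The notation is assumed here rather than taken from the patent.

```latex
L(\theta) = \mathbb{E}\left[ \left( Y - Q(s, a; \theta) \right)^{2} \right],
\qquad
\theta \leftarrow \theta - \alpha \, \nabla_{\theta} L(\theta)
```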

Preferably, the model is optimized by manually tuning the Double DQN model; because the parameters were originally set for DQN, tuning them for Double DQN can further reduce overestimation. Through continued iteration of path selection toward the attack target, an attack path that reaches the target computer is finally obtained and can be taken as the optimal attack path.

Preferably, for the automated penetration testing framework to attack real systems, the system must be able to interact with the actual network environment, for example by running commands and exploiting vulnerabilities.

Further, at this stage the system uses the existing penetration testing tool Metasploit to interact with the hosts in the actual topology. By sending commands to a real host or exploiting a vulnerability, it obtains feedback from the host, which is sent to the Double DQN model as a reward.

Further, a wrapper is created for the penetration testing tool so that the output of the Double DQN model trained as described above can be used to send commands to the penetration tool Metasploit, which in turn carries out those commands against the real target system. The results of these actions are received as feedback and used to decide how to continue along a given attack path.
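One possible shape for such a wrapper is sketched below in Python. It is an assumption-laden illustration: the class and field names, the reward values, and the `backend` callable are all hypothetical, and the sketch deliberately does not call any real Metasploit interface. In a real deployment the backend would be replaced by code that drives the penetration tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ActionResult:
    """Outcome reported by the penetration tool for one action (hypothetical shape)."""
    got_privileges: bool
    output: str = ""


class PenToolWrapper:
    """Turns tool results into rewards for the learning model."""

    def __init__(self, backend: Callable[[str, str], ActionResult],
                 success_reward: float = 10.0, step_cost: float = -1.0):
        # backend(host, exploit) performs the action and reports the result.
        self.backend = backend
        self.success_reward = success_reward  # assumed value
        self.step_cost = step_cost            # assumed value
        self.compromised = set()

    def step(self, host: str, exploit: str):
        """Run one action and return (reward, result)."""
        result = self.backend(host, exploit)
        if result.got_privileges:
            self.compromised.add(host)
            return self.success_reward, result
        return self.step_cost, result


def fake_backend(host: str, exploit: str) -> ActionResult:
    # Stand-in for a real tool: only one (host, exploit) pair succeeds.
    return ActionResult(got_privileges=(host, exploit) == ("10.0.0.5", "example-exploit"))


wrapper = PenToolWrapper(fake_backend)
reward, result = wrapper.step("10.0.0.5", "example-exploit")
```

Keeping the tool behind a single callable makes it possible to train against a simulated backend first and switch to the real tool later without changing the learning code.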

The technical scheme provided by the method and system for automated penetration testing based on deep reinforcement learning can have the following beneficial effects. Because current network attacks are launched through vulnerabilities, the traditional approaches of hiring security testers or using semi-automated penetration testing tools struggle to find the optimal attack path effectively and quickly. By constructing a deep reinforcement learning model and adding interactive feedback between the penetration testing tool and real hosts, the attack path is continuously updated and reselected until the optimal attack path is obtained.

Drawings

FIG. 1 is a schematic overall flow chart of the present invention;

FIG. 2 is a schematic diagram of an automated penetration test system based on deep reinforcement learning according to the present invention;

FIG. 3 is a schematic diagram of interaction between the Double DQN model and penetration test of the present invention.

Detailed Description

The embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.

Referring to FIG. 1, the invention discloses an automated penetration testing method and system based on deep reinforcement learning, comprising several stages such as information collection on the target topology, information processing, and model construction.

The specific implementation process is as follows:

As shown in FIG. 2, the target network topology to be penetration tested is scanned with ZoomEye, and the API provided by ZoomEye is used to receive information about the hosts in the target network, including operating system information, open ports, protocols in use, and unpatched known vulnerabilities.

Then a crawler collects vulnerability information from CVE, NVD and CNVD, and the useful fields, including vulnerability score and vulnerability type, are extracted to build a vulnerability matching rule base. A crawler likewise collects the specific tactics in MITRE ATT&CK and the tools they use to build a behavior matching rule base.
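A minimal Python sketch of what matching host information against a vulnerability rule base might look like is given below. The record layout, the field names and the placeholder rule identifiers are assumptions made for illustration. The patent does not describe the rule format, and in the described system the matching itself is performed with MulVAL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VulnRule:
    rule_id: str   # placeholder identifier, not a real CVE number
    product: str
    version: str
    score: float   # severity score extracted from the vulnerability record
    vuln_type: str


def match_rules(host_services, rules):
    """Return the rules whose (product, version) appears among a host's services."""
    found = set(host_services)
    return [rule for rule in rules if (rule.product, rule.version) in found]


# Hypothetical rule base and scan result.
rules = [
    VulnRule("RULE-A", "exampled", "1.0", 9.8, "remote code execution"),
    VulnRule("RULE-B", "exampled", "2.0", 5.3, "information disclosure"),
    VulnRule("RULE-C", "otherd", "3.1", 7.5, "privilege escalation"),
]
host_services = [("exampled", "1.0"), ("otherd", "3.1")]
matched = match_rules(host_services, rules)
```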

Then the MulVAL tool matches the collected host information against the rule bases and generates and outputs an attack tree that reaches the target host. The attack tree is simplified with a breadth-first traversal algorithm and converted into a state reachable matrix that the Double DQN model can accept.

Then the Double DQN model is trained. The initialized state reachable matrix is taken as the input of the Double DQN model, and a 3-layer convolutional neural network (CNN) first reads in the matrix.

After convolution and pooling, two fully connected layers apply a nonlinear transformation, the output layer produces a Q value for each action, and the optimal path is finally selected according to the Q values.
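The final selection step can be illustrated with a short Python sketch. It assumes that states are host indices, that a 0/1 matrix marks which host can be reached directly from which, and that `q[i][j]` holds the learned Q value of moving from host i to host j. These conventions and the function name are hypothetical and are not specified in the patent.

```python
def greedy_path(q, matrix, start, target, max_steps=None):
    """Follow the highest-Q reachable move from `start` until `target`.

    Returns the list of visited states, or None if no path is found.
    """
    n = len(matrix)
    max_steps = max_steps or n
    path = [start]
    state = start
    for _ in range(max_steps):
        if state == target:
            return path
        # Only moves allowed by the matrix, and never revisit a state.
        candidates = [j for j in range(n) if matrix[state][j] and j not in path]
        if not candidates:
            return None  # dead end
        state = max(candidates, key=lambda j: q[state][j])
        path.append(state)
    return path if state == target else None


# Hypothetical 4-host example: 0 is the attacker, 3 is the target.
matrix = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
q = [
    [0.0, 0.2, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.0],
]
path = greedy_path(q, matrix, start=0, target=3)
```

The greedy rollout takes the locally best move at each step, which is how a trained Q function is normally read out. It does not search over whole paths.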

Then the penetration testing tool Metasploit sends exploit code or commands to the target host along the path given by the Double DQN model, and feedback and reward are returned to the model according to whether privileges on the target host were obtained.

After the automated penetration test finishes, the system gives the best possible attack path according to the training results, together with the vulnerabilities and commands used from the initial attack point to the target host.

The training process involves a number of parameters. Parameters to be set in the model include host information parameters, threat scoring criteria, model reward settings, and so on. Different initial parameters may lead to different implementations.

While the foregoing is directed to embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, and such changes and modifications are also intended to be included within the scope of the invention.

Claims

1. An automated penetration testing system based on deep reinforcement learning, the system comprising: a network environment information collection and processing module, configured to collect information about and detect vulnerabilities on hosts in a network topology, convert the network topology into a state reachable matrix according to a rule base, and input the state reachable matrix into a Double DQN network; a deep reinforcement learning module, configured to take the state reachable matrix as input and train the model by combining deep learning with reinforcement learning, so as to predict the most likely attack path; and a vulnerability verification module, configured to verify detected vulnerabilities on real hosts and feed the results back to the Double DQN model so as to arrive at the most likely attack path; wherein the vulnerability scanning engine ZoomEye is used to collect information about the network topology environment; MulVAL is used to build an attack tree from the collected information; the attack tree is converted by breadth-first search (BFS) into a state reachable matrix that serves as acceptable input to the Double DQN model; a representative value-function-based method is used, in which the Double DQN model is built by combining a convolutional neural network (CNN) with the Q-learning algorithm of traditional reinforcement learning, to obtain the most likely attack path; and the existing penetration testing tool Metasploit is used to interact with the actual network environment, the result of the interaction being fed back as input to the Double DQN model to decide how to proceed with the attack along a specific path.