CN113660241A - Automatic penetration testing method based on deep reinforcement learning - Google Patents

Automatic penetration testing method based on deep reinforcement learning

Info

Publication number
CN113660241A
CN113660241A (application number CN202110916929.9A)
Authority
CN
China
Prior art keywords
reinforcement learning
vulnerability
network
deep reinforcement
model
Prior art date
Legal status
Granted
Application number
CN202110916929.9A
Other languages
Chinese (zh)
Other versions
CN113660241B (en)
Inventor
郑超
陆秋文
李鑫
孙彦斌
崔翔
Current Assignee
Zhongdian Jizhi Hainan Information Technology Co Ltd
Original Assignee
Zhongdian Jizhi Hainan Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhongdian Jizhi Hainan Information Technology Co Ltd filed Critical Zhongdian Jizhi Hainan Information Technology Co Ltd
Priority to CN202110916929.9A priority Critical patent/CN113660241B/en
Publication of CN113660241A publication Critical patent/CN113660241A/en
Application granted granted Critical
Publication of CN113660241B publication Critical patent/CN113660241B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 - Network architectures or network communication protocols for network security
    • H04L 63/14 - Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1408 - Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L 63/1416 - Event detection, e.g. attack signature detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 - Network architectures or network communication protocols for network security
    • H04L 63/14 - Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1433 - Vulnerability analysis
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 - Network architectures or network communication protocols for network security
    • H04L 63/14 - Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1441 - Countermeasures against malicious traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)

Abstract

The invention discloses an automated penetration testing method and framework based on deep reinforcement learning. The method comprises three parts: network environment data acquisition and preprocessing, deep reinforcement learning model construction, and external wrapping of a penetration tool. Specifically, the network environment is scanned with ZoomEye to collect topology information, a state reachability matrix is constructed from the collected data, a Double DQN algorithm is used for model training and learning, and a penetration tool is used as an external component to interact with the real environment and feed the results back to the Double DQN model, thereby achieving attack path prediction with good performance. Compared with general automated penetration schemes, the technical scheme disclosed by the invention can effectively improve the efficiency of penetration testing and constitutes a feasible automated penetration scheme.

Description

Automatic penetration testing method based on deep reinforcement learning
Technical Field
The invention relates to the field of automated penetration testing in network security, and in particular to a method based on deep reinforcement learning.
Background
With the rapid development of information technology and the continuous expansion of network applications, people enjoy great convenience but are also exposed to hidden network security threats. These threats include viruses, worms, Trojans, and the like, and their primary attack vector is security vulnerabilities in system applications. To guard against these threats, the security problems in a computer system must be detected early and the damage that existing vulnerabilities could cause must be assessed. It is therefore necessary to perform periodic penetration tests on the computer network systems of an enterprise and to carry out security repairs of the systems according to the test results.
Penetration testing is an evaluation method that assesses the security of a computer network system by simulating the attack techniques of a malicious hacker. The process actively analyzes the system's vulnerabilities, technical defects, and weaknesses from the position of a potential attacker and, under controlled conditions, exploits the discovered security vulnerabilities to gain a certain degree of control over the target system's assets.
Penetration testing is generally divided into seven stages: pre-engagement interaction, information gathering, threat modeling, vulnerability analysis, exploitation, post-exploitation, and report generation. The generated penetration test report allows the threats faced by all of an organization's information assets to be analyzed and determined quickly and effectively, a threat assessment to be made for each asset, and expenditure to be balanced against quantified risk, thereby effectively reducing the organization's security costs.
Penetration testing is mainly carried out by professional security personnel who use different security testing tools against a target system; in practice, however, many security personnel must work together to obtain control of the target system. During penetration, the threat modeling and exploitation stages in particular demand considerable effort from security personnel, so these are the links that most need to be automated and whose efficiency most needs to be improved.
Many semi-automated penetration testing frameworks exist, but the strategies, functions, and usage of most of them are similar, and manual assistance is still needed to fully account for the characteristics of the target system and to customize the penetration testing strategy, which in turn requires substantial security experience.
Therefore, an automated penetration testing system with a uniform format that saves labor cost is needed in this field. By using a deep reinforcement learning algorithm, the optimal choice at each step of a penetration test attack can be made through continuous interaction during the testing stage, and the most probable penetration attack scheme can finally be obtained, improving both the efficiency and the accuracy of penetration testing.
Disclosure of Invention
To overcome the problems in the related field, the embodiments of the present disclosure provide an automated penetration testing method and system based on deep reinforcement learning. A new deep reinforcement learning approach that interacts with and receives feedback from the real environment is proposed; it performs well and can effectively obtain the path and scheme of a penetration attack.
The embodiment of the invention provides an automated penetration testing method and system based on deep reinforcement learning, which comprises the following modules:
and the network environment information collection processing module is used for carrying out information collection and vulnerability detection on the host in the network topology, converting the network topology into a state reachable matrix according to the rule base and inputting the state reachable matrix into the Double DQN network.
A deep reinforcement learning module, which combines deep learning and reinforcement learning on the input state reachability matrix and trains the Double DQN model to learn and generate the most probable attack path.
A vulnerability verification module, which verifies the detected vulnerabilities on the real hosts and feeds the results back to the Double DQN model, thereby converging on the most probable attack path.
The network environment information collection and processing module mainly uses the online scanning tool ZoomEye to collect host information and vulnerabilities.
Further, the collected information is matched against the rule base, and MulVAL then builds an attack tree over the topology from the matched vulnerability information or tactical behaviors.
Further, the rule base is divided into vulnerability rules and tactical behavior rules: data from well-known vulnerability databases such as CVE, NVD, and CNVD serve as the vulnerability rules, and specific tactics from ATT&CK published by MITRE serve as the tactical behavior rules.
Further, the attack tree generated by MulVAL gives all possible paths through the network topology. Breadth-first search (BFS) is used to simplify the traversal, and a state reachability matrix is constructed from all discovered attack paths; the resulting matrix is well suited as input to the Double DQN model.
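For illustration only, the following minimal Python sketch shows how an attack tree could be traversed breadth-first and converted into a state reachability matrix; the adjacency-list format, node names, and helper names are assumptions rather than part of the claimed method.

```python
from collections import deque
import numpy as np

def attack_tree_to_reach_matrix(attack_tree, start):
    """Breadth-first traversal of an attack tree, given here as an adjacency
    list {node: [successor nodes]}, producing a 0/1 state reachability matrix
    whose entry (i, j) is 1 when node j is directly reachable from node i."""
    nodes = sorted(set(attack_tree) | {v for vs in attack_tree.values() for v in vs})
    index = {n: i for i, n in enumerate(nodes)}
    reach = np.zeros((len(nodes), len(nodes)), dtype=np.int8)

    visited, queue = {start}, deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in attack_tree.get(cur, []):
            reach[index[cur], index[nxt]] = 1   # cur -> nxt is one attack step
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return reach, index

# Toy topology: attacker -> web server -> database -> domain controller
tree = {"attacker": ["web"], "web": ["db"], "db": ["dc"], "dc": []}
matrix, idx = attack_tree_to_reach_matrix(tree, "attacker")
```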
The deep reinforcement learning module uses a Double DQN model and is mainly responsible for iteratively selecting the best attack path over the input state matrix and finally outputting the most probable penetration attack path.
Preferably, the DQN algorithm approximates the Q-Learning value function with a neural network. Q-Learning is a model-free reinforcement learning technique that can handle stochastic transitions and rewards without an environment model; starting from the current state, the optimal strategy is found by maximizing the expected total return over all subsequent steps. The core of the method is the iterative update of the Q value function, with the following formula:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

where α is the learning rate and γ is the discount factor.
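As a minimal illustration of this iterative update (not part of the claimed method), a tabular Q-learning step can be sketched as follows; the state and action indices and the hyperparameter values are arbitrary.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step on a |S| x |A| value table:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = np.zeros((4, 2))                                # toy table: 4 states, 2 actions
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
```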
Further, because Q-learning always follows the action with the largest Q value among all actions in a state, the estimated value tends to exceed the true value. The same max operation is used in the DQN algorithm, so the same values are used both to select and to evaluate an action, which leads to overestimation.
Preferably, Double Q-Learning decouples selection from evaluation by using two value functions, updated at random, each using the other's experience to update the network weights θ and θ⁻. Splitting the selection of an action from its evaluation avoids the overestimation problem. The formula is as follows:
$$Y_t = r_{t+1} + \gamma \, Q\left( s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta); \; \theta^{-} \right)$$
Further, the overestimation of Q-Learning grows as the number of actions m increases, whereas Double Q-Learning gives an essentially unbiased estimate that stays close to 0 as m increases.
Preferably, the Double DQN algorithm shares the structure of the DQN algorithm, with two Q networks in total. On top of the DQN algorithm, the overestimation problem is removed by decoupling the selection of the action for the target Q value from the calculation of the target Q value itself. The formula is as follows:
$$Y_t^{\mathrm{DoubleDQN}} = r_{t+1} + \gamma \, Q\left( s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta_t); \; \theta_t^{-} \right)$$
Further, all parameters of the Q network are updated by gradient backpropagation through the neural network, using the mean squared error loss function. The formula is as follows:
$$L(\theta) = \frac{1}{N} \sum_{j=1}^{N} \left( Y_j - Q(s_j, a_j; \theta) \right)^2$$

where N is the minibatch size and Y_j is the target value defined above.
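For illustration, a PyTorch-style sketch of the Double DQN target and the mean-squared-error update could look as follows; the batch field names and the assumption that q_net and target_net share one architecture are not dictated by the invention.

```python
import torch
import torch.nn.functional as F

def double_dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Double DQN loss on a batch of transitions (dict of tensors).
    The online network selects the next action, the target network evaluates
    it, which decouples action selection from action evaluation."""
    s, a, r = batch["s"], batch["a"], batch["r"]
    s_next, done = batch["s_next"], batch["done"]
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)           # Q(s,a; theta)
    with torch.no_grad():
        a_star = q_net(s_next).argmax(dim=1, keepdim=True)         # argmax_a Q(s',a; theta)
        q_next = target_net(s_next).gather(1, a_star).squeeze(1)   # Q(s',a*; theta^-)
        y = r + gamma * (1.0 - done) * q_next                      # target Y
    return F.mse_loss(q_sa, y)  # backpropagated to update all parameters of q_net
```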
Preferably, the parameters of the Double DQN model are tuned manually to achieve the best results; since all of these parameters belong to the DQN itself, overestimation can be suppressed further. Through continuous iterative selection of paths toward the attack target, an attack path reaching the target host is finally obtained and taken as the optimal attack path.
Preferably, in order to attack a real system with the automated penetration testing framework, the system must be able to interact with the actual network environment, for example by running commands and exploiting vulnerabilities.
Further, at this stage the system uses the existing penetration testing tool Metasploit to interact with the hosts in the actual topology. It sends operation commands to, or exploits vulnerabilities on, the real hosts, obtains feedback from them, and feeds this feedback into the Double DQN model as the reward signal.
Further, a wrapper is created for the penetration testing tool so that the output of the trained Double DQN model can be turned into commands for Metasploit, which then performs the corresponding operations on the real target system. The results of these operations are received as feedback and used to decide how to continue along a given attack path.
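A minimal sketch of such a wrapper, assuming msfconsole is installed and reachable on the path, might look as follows; the action table, module paths, timeout, and reward rule are illustrative assumptions and not the commands claimed by the invention.

```python
import subprocess

class MetasploitWrapper:
    """Turns an action index chosen by the Double DQN model into msfconsole
    commands and converts the console output into a scalar reward."""

    ACTIONS = {
        0: "use auxiliary/scanner/portscan/tcp; set RHOSTS {target}; run; exit",
        1: "use exploit/windows/smb/ms17_010_eternalblue; set RHOSTS {target}; run; exit",
    }

    def execute(self, action_id: int, target: str) -> float:
        cmds = self.ACTIONS[action_id].format(target=target)
        result = subprocess.run(
            ["msfconsole", "-q", "-x", cmds],
            capture_output=True, text=True, timeout=600,
        )
        # Simple illustrative feedback rule: an opened session counts as success.
        return 10.0 if "session" in result.stdout.lower() else -1.0

# reward = MetasploitWrapper().execute(1, "10.0.0.42")
```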
The technical scheme provided by the automated penetration testing method and system based on deep reinforcement learning can have the following beneficial effects. Current network attacks are typically launched through vulnerabilities, and the traditional approach of employing security testers or using semi-automated penetration testing tools makes it difficult to obtain the optimal attack path effectively and quickly. By constructing a deep reinforcement learning model and continuously updating the selected attack path through interactive feedback between the penetration testing tool and the real hosts, the optimal attack path is finally obtained.
Drawings
FIG. 1 is a schematic overall flow diagram of the present invention;
FIG. 2 is a schematic structural diagram of an automated penetration testing system based on deep reinforcement learning according to the present invention;
FIG. 3 is a schematic diagram of interaction between the Double DQN model and penetration test according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, the present invention discloses an automated penetration testing method and system based on deep reinforcement learning, which comprises several stages for the target topology, such as information collection, information processing, and model construction.
The specific implementation process is as follows:
As shown in FIG. 2, the target topology network to be penetration tested is first scanned with ZoomEye, and the API interface provided by ZoomEye is used to retrieve target network host information, including the operating system, open ports, protocols in use, unpatched known vulnerabilities, and so on.
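For illustration, a host scan through the ZoomEye API could be sketched as follows; the endpoint path, header name, and response field names are assumptions that should be checked against the current ZoomEye API documentation.

```python
import requests

ZOOMEYE_API = "https://api.zoomeye.org/host/search"   # assumed endpoint

def scan_target_network(query: str, api_key: str):
    """Query ZoomEye for hosts matching `query` (e.g. 'cidr:10.0.0.0/24') and
    keep the fields used later in the pipeline: IP, OS, open port, service."""
    resp = requests.get(
        ZOOMEYE_API,
        params={"query": query, "page": 1},
        headers={"API-KEY": api_key},     # assumed header name
        timeout=30,
    )
    resp.raise_for_status()
    hosts = []
    for m in resp.json().get("matches", []):   # assumed response layout
        portinfo = m.get("portinfo", {})
        hosts.append({
            "ip": m.get("ip"),
            "os": portinfo.get("os"),
            "port": portinfo.get("port"),
            "service": portinfo.get("service"),
        })
    return hosts
```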
Then, vulnerability information is crawled from CVE, NVD, and CNVD, effective fields such as vulnerability scores and vulnerability types are extracted, and a vulnerability matching rule base is built; a behavior matching rule base is built by crawling the specific tactics in MITRE ATT&CK and the tools they use.
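As a minimal illustration of building the vulnerability matching rule base (the record field names, sample CVE, and score are placeholders, not part of the claimed method):

```python
import json
from dataclasses import dataclass

@dataclass
class VulnRule:
    cve_id: str
    score: float     # e.g. a CVSS base score used for threat scoring
    vuln_type: str   # e.g. "remote code execution"

def build_rule_base(crawled_records):
    """Build the vulnerability rule base from records crawled from CVE/NVD/CNVD;
    a real crawler would map each source's schema onto these generic fields."""
    return {rec["id"]: VulnRule(rec["id"], float(rec["score"]), rec["type"])
            for rec in crawled_records}

def match(rule_base, host_cve_ids):
    """Match CVE identifiers found on a host against the rule base."""
    return [rule_base[c] for c in host_cve_ids if c in rule_base]

records = json.loads('[{"id": "CVE-2017-0144", "score": 8.1, "type": "remote code execution"}]')
rules = build_rule_base(records)
print(match(rules, ["CVE-2017-0144"]))
```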
Next, the MulVAL tool matches the collected host information against the rule base and generates and outputs an attack tree reaching the target host. The attack tree is simplified by a breadth-first traversal and converted into a reachability matrix in a form acceptable to the Double DQN model.
Then the Double DQN model is trained: the initialized state reachability matrix is taken as the model input and is first read in by a 3-layer convolutional neural network (CNN);
after convolution and pooling, two fully connected layers with non-linear activations follow, the output layer produces a Q value for each action, and the optimal path is finally selected according to these Q values.
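A PyTorch sketch of such a Q network is given below for illustration; the channel sizes, kernel sizes, matrix size, and number of actions are assumptions, not values fixed by the invention.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Reads the state reachability matrix through three convolutional layers
    with pooling, then two fully connected layers, and outputs one Q value per
    action."""
    def __init__(self, n_actions: int, matrix_size: int = 16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        flat = 32 * (matrix_size // 2) ** 2
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 128), nn.ReLU(),   # first fully connected layer
            nn.Linear(128, n_actions),         # output layer: Q value per action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, matrix_size, matrix_size) state reachability matrix
        return self.head(self.features(x))

q_values = QNetwork(n_actions=8)(torch.zeros(1, 1, 16, 16))
```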
Next, the penetration testing tool Metasploit sends exploit code or commands to the target host along the path given by the Double DQN model, and the model is rewarded according to whether control of the target host is obtained.
After the automated penetration is finished, the system outputs the most probable attack path according to the training and learning results, together with the vulnerabilities and commands used in the attack from the starting point to the target host.
The training process involves several parameters; those to be set in the model include host information parameters, threat scoring criteria, model reward settings, and so on. Different initial parameters may lead to different implementations.
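An illustrative configuration covering these parameter groups is sketched below; all names and values are assumptions chosen for the example, not the parameters claimed by the patent.

```python
# Illustrative training configuration (not the patented parameter values).
config = {
    "host_info": {"target_subnet": "10.0.0.0/24", "goal_host": "10.0.0.42"},
    "threat_scoring": {"cvss_weight": 0.7, "attck_tactic_weight": 0.3},
    "rewards": {"goal_host_compromised": 100.0, "intermediate_host": 10.0,
                "failed_exploit": -1.0, "step_cost": -0.1},
    "double_dqn": {"gamma": 0.99, "learning_rate": 1e-4, "batch_size": 32,
                   "target_update_interval": 500, "epsilon_decay": 0.995},
}
```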
While the foregoing is directed to embodiments of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (4)

1. An automated penetration testing method based on deep reinforcement learning, characterized in that it comprises the following modules:
and the network environment information collection processing module is used for carrying out information collection and vulnerability detection on the host in the network topology, converting the network topology into a state reachable matrix according to the rule base and inputting the state reachable matrix into the Double DQN network.
a deep reinforcement learning module, which combines deep learning and reinforcement learning to train and learn a model on the input state reachability matrix so as to predict the most probable attack path;
a vulnerability verification module, which verifies the detected vulnerabilities on the real hosts and feeds the results back to the Double DQN model, thereby converging on the most probable attack path.
2. The network environment information collection and processing module of claim 1, wherein the vulnerability scanning engine ZoomEye is used to collect information on the network topology environment, MulVAL builds an attack tree from the collected information, and the attack tree is converted into a state reachability matrix by breadth-first search (BFS) as an acceptable input to the Double DQN model.
3. The deep reinforcement learning module of claim 1, wherein a value-function-based method is used: a Double DQN model built on a convolutional neural network (CNN), combined with the Q-learning algorithm from conventional reinforcement learning, obtains the most probable attack path.
4. The vulnerability verification module of claim 1, wherein the existing penetration testing tool Metasploit is used to interact with the actual network environment, and the interaction result is fed back as input to the Double DQN model to decide how to carry out the attack along a specific path.
CN202110916929.9A 2021-08-11 2021-08-11 Automatic penetration test method based on deep reinforcement learning Active CN113660241B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110916929.9A CN113660241B (en) 2021-08-11 2021-08-11 Automatic penetration test method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110916929.9A CN113660241B (en) 2021-08-11 2021-08-11 Automatic penetration test method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113660241A true CN113660241A (en) 2021-11-16
CN113660241B CN113660241B (en) 2023-05-23

Family

ID=78479468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110916929.9A Active CN113660241B (en) 2021-08-11 2021-08-11 Automatic penetration test method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113660241B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116112278A (en) * 2023-02-17 2023-05-12 西安电子科技大学 Q-learning-based network optimal attack path prediction method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106549950A (en) * 2016-11-01 2017-03-29 南京理工大学 A kind of matrix method for visualizing based on state attacking and defending figure
CN109407676A (en) * 2018-12-20 2019-03-01 哈尔滨工业大学 The moving robot obstacle avoiding method learnt based on DoubleDQN network and deeply
CN110309658A (en) * 2019-06-27 2019-10-08 暨南大学 A kind of dangerous XSS defensive system recognition methods based on intensified learning
US20200067962A1 (en) * 2018-08-24 2020-02-27 California Institute Of Technology Model based methodology for translating high-level cyber threat descriptions into system-specific actionable defense tactics
CN112221149A (en) * 2020-09-29 2021-01-15 中北大学 Artillery and soldier continuous intelligent combat drilling system based on deep reinforcement learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106549950A (en) * 2016-11-01 2017-03-29 南京理工大学 A kind of matrix method for visualizing based on state attacking and defending figure
US20200067962A1 (en) * 2018-08-24 2020-02-27 California Institute Of Technology Model based methodology for translating high-level cyber threat descriptions into system-specific actionable defense tactics
CN109407676A (en) * 2018-12-20 2019-03-01 哈尔滨工业大学 The moving robot obstacle avoiding method learnt based on DoubleDQN network and deeply
CN110309658A (en) * 2019-06-27 2019-10-08 暨南大学 A kind of dangerous XSS defensive system recognition methods based on intensified learning
CN112221149A (en) * 2020-09-29 2021-01-15 中北大学 Artillery and soldier continuous intelligent combat drilling system based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
贵重;: "基于ATT&CK的多源数据深度安全检测技术研究" *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116112278A (en) * 2023-02-17 2023-05-12 西安电子科技大学 Q-learning-based network optimal attack path prediction method and system

Also Published As

Publication number Publication date
CN113660241B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
Hu et al. Automated penetration testing using deep reinforcement learning
CN105871882B (en) Network security risk analysis method based on network node fragility and attack information
CN111581645B (en) Iterative attack method of automatic penetration test system based on AI
KR100851521B1 (en) Cyber Attack System for Vulnerability Assessment and Method Thereof
CN111475817B (en) Data collection method of automatic penetration test system based on AI
CN111488587B (en) Automatic penetration test system based on AI
CN107908645B (en) Online social platform rumor propagation immune method based on seepage analysis
CN111488588B (en) Automatic penetration test method based on AI
CN111475818B (en) Penetration attack method of automatic penetration test system based on AI
Zhou et al. NIG-AP: a new method for automated penetration testing
CN116405246A (en) Vulnerability exploitation chain construction technology based on attack and defense combination
CN111049827A (en) Network system safety protection method, device and related equipment
KR102419451B1 (en) Artificial intelligence based threat analysis automation system and method
CN112491860A (en) Industrial control network-oriented collaborative intrusion detection method
CN115037553B (en) Information security monitoring model construction method and device, information security monitoring model application method and device, and storage medium
CN110493262A (en) It is a kind of to improve the network attack detecting method classified and system
CN113660241B (en) Automatic penetration test method based on deep reinforcement learning
CN115580430A (en) Attack tree-pot deployment defense method and device based on deep reinforcement learning
CN116545687A (en) Automatic network simulation attack framework based on attack tree and deep reinforcement learning
CN114386042A (en) Method suitable for deduction of power enterprise network war chess
CN111488586B (en) Automatic permeation testing system post-permeation method based on AI
KR102578421B1 (en) Method And System for managing of attack equipment of Cyber Attack Simulation Platform
CN115242487B (en) APT attack sample enhancement and detection method based on meta-behavior
CN115277065B (en) Anti-attack method and device in abnormal traffic detection of Internet of things
Conti et al. Bio-inspired security analysis for IoT scenarios

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 571924 Room 301, 3rd floor, building A09, Hainan Ecological Software Park, Laocheng hi tech Industrial Demonstration Zone, Chengmai County, Haikou City, Hainan Province

Applicant after: Jizhi (Hainan) Information Technology Co.,Ltd.

Address before: 571924 Room 301, 3rd floor, building A09, Hainan Ecological Software Park, Laocheng hi tech Industrial Demonstration Zone, Chengmai County, Haikou City, Hainan Province

Applicant before: Zhongdian Jizhi (Hainan) Information Technology Co.,Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant