CN113660241A - Automatic penetration testing method based on deep reinforcement learning - Google Patents

Automatic penetration testing method based on deep reinforcement learning

Info

Publication number
CN113660241A
CN113660241A (application number CN202110916929.9A)
Authority
CN
China
Prior art keywords
reinforcement learning
vulnerability
network
deep reinforcement
model
Prior art date
Legal status
Granted
Application number
CN202110916929.9A
Other languages
Chinese (zh)
Other versions
CN113660241B (en)
Inventor
郑超
陆秋文
李鑫
孙彦斌
崔翔
Current Assignee
Zhongdian Jizhi Hainan Information Technology Co Ltd
Original Assignee
Zhongdian Jizhi Hainan Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhongdian Jizhi Hainan Information Technology Co Ltd filed Critical Zhongdian Jizhi Hainan Information Technology Co Ltd
Priority to CN202110916929.9A priority Critical patent/CN113660241B/en
Publication of CN113660241A publication Critical patent/CN113660241A/en
Application granted granted Critical
Publication of CN113660241B publication Critical patent/CN113660241B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 - Network architectures or network communication protocols for network security
    • H04L 63/14 - Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1408 - Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L 63/1416 - Event detection, e.g. attack signature detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 - Network architectures or network communication protocols for network security
    • H04L 63/14 - Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1433 - Vulnerability analysis
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 - Network architectures or network communication protocols for network security
    • H04L 63/14 - Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1441 - Countermeasures against malicious traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)

Abstract

The invention discloses an automated penetration testing method and framework based on deep reinforcement learning. The method comprises three parts: network environment data acquisition and preprocessing, deep reinforcement learning model construction, and external wrapping of a penetration tool. Specifically, the network environment is scanned with ZoomEye to collect topology information, a state reachability matrix is constructed from the collected data, a Double DQN algorithm is used for model training and learning, and a penetration tool is used as an external component to interact with the real environment and feed the results back to the Double DQN model, thereby achieving attack path prediction with good performance. Compared with general automated penetration schemes, the technical scheme disclosed by the invention can effectively improve the efficiency of penetration testing and constitutes a feasible automated penetration scheme.

Description

Automatic penetration testing method based on deep reinforcement learning
Technical Field
The invention relates to the field of automated penetration testing in network security, and in particular to a method based on deep reinforcement learning.
Background
With the rapid development of information technology and the continuous expansion of network applications, people enjoy great convenience but are also exposed to hidden network security threats. These threats include viruses, worms, Trojans, and the like, and their primary attack vector is security vulnerabilities in system applications. To guard against these threats, the security problems in a computer system must be detected early and the damage that existing vulnerabilities could cause must be assessed. It is therefore necessary to perform periodic penetration tests on the computer network systems of an enterprise and to carry out security repairs of the systems according to the test results.
Penetration testing is an evaluation method that assesses the security of a computer network system by simulating the attack techniques of a malicious hacker. The process actively analyzes the system's vulnerabilities, technical defects, and weaknesses from the position of a potential attacker and, under controlled conditions, exploits the discovered security vulnerabilities to gain a certain degree of control over the target system's assets.
Penetration testing is generally divided into seven stages: pre-engagement interaction, information gathering, threat modeling, vulnerability analysis, exploitation, post-exploitation, and report generation. The generated penetration test report allows the threats faced by all of an organization's information assets to be analyzed and determined quickly and effectively, a threat assessment to be made for each asset, and expenditure to be balanced against quantified risk, thereby effectively reducing the organization's security costs.
Penetration testing is mainly carried out by professional security personnel who use different security testing tools against a target system; in practice, however, many security personnel must work together to obtain control of the target system. During penetration, the threat modeling and exploitation stages in particular demand considerable effort from security personnel, so these are the links that most need to be automated and whose efficiency most needs to be improved.
Many semi-automated penetration testing frameworks exist, but the strategies, functions, and usage of most of them are similar, and manual assistance is still needed to fully account for the characteristics of the target system and to customize the penetration testing strategy, which in turn requires substantial security experience.
Therefore, an automated penetration testing system with a uniform format that saves labor cost is needed in this field. By using a deep reinforcement learning algorithm, the optimal choice at each step of a penetration test attack can be made through continuous interaction during the testing stage, and the most probable penetration attack scheme can finally be obtained, improving both the efficiency and the accuracy of penetration testing.
Disclosure of Invention
To overcome the problems in the related field, the embodiments of the present disclosure provide an automated penetration testing method and system based on deep reinforcement learning. A new deep reinforcement learning approach that interacts with and receives feedback from the real environment is proposed; it performs well and can effectively obtain the path and scheme of a penetration attack.
The embodiment of the invention provides an automated penetration testing method and system based on deep reinforcement learning, which comprises the following modules:
and the network environment information collection processing module is used for carrying out information collection and vulnerability detection on the host in the network topology, converting the network topology into a state reachable matrix according to the rule base and inputting the state reachable matrix into the Double DQN network.
A deep reinforcement learning module, which combines deep learning and reinforcement learning on the input state reachability matrix and trains the Double DQN model to learn and generate the most probable attack path.
A vulnerability verification module, which verifies the detected vulnerabilities on the real hosts and feeds the results back to the Double DQN model, thereby converging on the most probable attack path.
The network environment information collection and processing module mainly uses the online scanning tool ZoomEye to collect host information and vulnerabilities.
Further, the collected information is matched against the rule base, and MulVAL then builds an attack tree over the topology from the matched vulnerability information or tactical behaviors.
Further, the rule base is divided into vulnerability rules and tactical behavior rules: data from well-known vulnerability databases such as CVE, NVD, and CNVD serve as the vulnerability rules, and specific tactics from ATT&CK published by MITRE serve as the tactical behavior rules.
Further, the attack tree generated by MulVAL gives all possible paths through the network topology. Breadth-first search (BFS) is used to simplify the traversal, and a state reachability matrix is constructed from all discovered attack paths; the resulting matrix is well suited as input to the Double DQN model.
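For illustration only, the following minimal Python sketch shows how an attack tree could be traversed breadth-first and converted into a state reachability matrix; the adjacency-list format, node names, and helper names are assumptions rather than part of the claimed method.

```python
from collections import deque
import numpy as np

def attack_tree_to_reach_matrix(attack_tree, start):
    """Breadth-first traversal of an attack tree, given here as an adjacency
    list {node: [successor nodes]}, producing a 0/1 state reachability matrix
    whose entry (i, j) is 1 when node j is directly reachable from node i."""
    nodes = sorted(set(attack_tree) | {v for vs in attack_tree.values() for v in vs})
    index = {n: i for i, n in enumerate(nodes)}
    reach = np.zeros((len(nodes), len(nodes)), dtype=np.int8)

    visited, queue = {start}, deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in attack_tree.get(cur, []):
            reach[index[cur], index[nxt]] = 1   # cur -> nxt is one attack step
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return reach, index

# Toy topology: attacker -> web server -> database -> domain controller
tree = {"attacker": ["web"], "web": ["db"], "db": ["dc"], "dc": []}
matrix, idx = attack_tree_to_reach_matrix(tree, "attacker")
```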
The deep reinforcement learning module uses a Double DQN model and is mainly responsible for iteratively selecting the best attack path over the input state matrix and finally outputting the most probable penetration attack path.
Preferably, the DQN algorithm approximates the Q-Learning value function with a neural network. Q-Learning is a model-free reinforcement learning technique that can handle stochastic transitions and rewards without an environment model; starting from the current state, the optimal strategy is found by maximizing the expected total return over all subsequent steps. The core of the method is the iterative update of the Q value function, with the following formula:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

where α is the learning rate and γ is the discount factor.
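As a minimal illustration of this iterative update (not part of the claimed method), a tabular Q-learning step can be sketched as follows; the state and action indices and the hyperparameter values are arbitrary.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step on a |S| x |A| value table:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = np.zeros((4, 2))                                # toy table: 4 states, 2 actions
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
```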
Further, because Q-learning always follows the action with the largest Q value among all actions in a state, the estimated value tends to exceed the true value. The same max operation is used in the DQN algorithm, so the same values are used both to select and to evaluate an action, which leads to overestimation.
Preferably, Double Q-Learning decouples selection from evaluation by using two value functions, updated at random, each using the other's experience to update the network weights θ and θ⁻. Splitting the selection of an action from its evaluation avoids the overestimation problem. The formula is as follows:
$$Y_t = r_{t+1} + \gamma \, Q\left( s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta); \; \theta^{-} \right)$$
Further, the overestimation of Q-Learning grows as the number of actions m increases, whereas Double Q-Learning gives an essentially unbiased estimate that stays close to 0 as m increases.
Preferably, the Double DQN algorithm shares the structure of the DQN algorithm, with two Q networks in total. On top of the DQN algorithm, the overestimation problem is removed by decoupling the selection of the action for the target Q value from the calculation of the target Q value itself. The formula is as follows:
$$Y_t^{\mathrm{DoubleDQN}} = r_{t+1} + \gamma \, Q\left( s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta_t); \; \theta_t^{-} \right)$$
Further, all parameters of the Q network are updated by gradient backpropagation through the neural network, using the mean squared error loss function. The formula is as follows:
$$L(\theta) = \frac{1}{N} \sum_{j=1}^{N} \left( Y_j - Q(s_j, a_j; \theta) \right)^2$$

where N is the minibatch size and Y_j is the target value defined above.
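For illustration, a PyTorch-style sketch of the Double DQN target and the mean-squared-error update could look as follows; the batch field names and the assumption that q_net and target_net share one architecture are not dictated by the invention.

```python
import torch
import torch.nn.functional as F

def double_dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Double DQN loss on a batch of transitions (dict of tensors).
    The online network selects the next action, the target network evaluates
    it, which decouples action selection from action evaluation."""
    s, a, r = batch["s"], batch["a"], batch["r"]
    s_next, done = batch["s_next"], batch["done"]
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)           # Q(s,a; theta)
    with torch.no_grad():
        a_star = q_net(s_next).argmax(dim=1, keepdim=True)         # argmax_a Q(s',a; theta)
        q_next = target_net(s_next).gather(1, a_star).squeeze(1)   # Q(s',a*; theta^-)
        y = r + gamma * (1.0 - done) * q_next                      # target Y
    return F.mse_loss(q_sa, y)  # backpropagated to update all parameters of q_net
```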
Preferably, the parameters of the Double DQN model are tuned manually to achieve the best results; since all of these parameters belong to the DQN itself, overestimation can be suppressed further. Through continuous iterative selection of paths toward the attack target, an attack path reaching the target host is finally obtained and taken as the optimal attack path.
Preferably, in order to attack a real system with the automated penetration testing framework, the system must be able to interact with the actual network environment, for example by running commands and exploiting vulnerabilities.
Further, at this stage the system uses the existing penetration testing tool Metasploit to interact with the hosts in the actual topology. It sends operation commands to, or exploits vulnerabilities on, the real hosts, obtains feedback from them, and feeds this feedback into the Double DQN model as the reward signal.
Further, a wrapper is created for the penetration testing tool so that the output of the trained Double DQN model can be turned into commands for Metasploit, which then performs the corresponding operations on the real target system. The results of these operations are received as feedback and used to decide how to continue along a given attack path.
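A minimal sketch of such a wrapper, assuming msfconsole is installed and reachable on the path, might look as follows; the action table, module paths, timeout, and reward rule are illustrative assumptions and not the commands claimed by the invention.

```python
import subprocess

class MetasploitWrapper:
    """Turns an action index chosen by the Double DQN model into msfconsole
    commands and converts the console output into a scalar reward."""

    ACTIONS = {
        0: "use auxiliary/scanner/portscan/tcp; set RHOSTS {target}; run; exit",
        1: "use exploit/windows/smb/ms17_010_eternalblue; set RHOSTS {target}; run; exit",
    }

    def execute(self, action_id: int, target: str) -> float:
        cmds = self.ACTIONS[action_id].format(target=target)
        result = subprocess.run(
            ["msfconsole", "-q", "-x", cmds],
            capture_output=True, text=True, timeout=600,
        )
        # Simple illustrative feedback rule: an opened session counts as success.
        return 10.0 if "session" in result.stdout.lower() else -1.0

# reward = MetasploitWrapper().execute(1, "10.0.0.42")
```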
The technical scheme provided by the automated penetration testing method and system based on deep reinforcement learning can have the following beneficial effects. Current network attacks are typically launched through vulnerabilities, and the traditional approach of employing security testers or using semi-automated penetration testing tools makes it difficult to obtain the optimal attack path effectively and quickly. By constructing a deep reinforcement learning model and continuously updating the selected attack path through interactive feedback between the penetration testing tool and the real hosts, the optimal attack path is finally obtained.
Drawings
FIG. 1 is a schematic overall flow diagram of the present invention;
FIG. 2 is a schematic structural diagram of an automated penetration testing system based on deep reinforcement learning according to the present invention;
FIG. 3 is a schematic diagram of interaction between the Double DQN model and penetration test according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, the present invention discloses an automated penetration testing method and system based on deep reinforcement learning, which comprises several stages for the target topology, such as information collection, information processing, and model construction.
The specific implementation process is as follows:
As shown in FIG. 2, the target topology network to be penetration tested is first scanned with ZoomEye, and the API interface provided by ZoomEye is used to retrieve target network host information, including the operating system, open ports, protocols in use, unpatched known vulnerabilities, and so on.
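For illustration, a host scan through the ZoomEye API could be sketched as follows; the endpoint path, header name, and response field names are assumptions that should be checked against the current ZoomEye API documentation.

```python
import requests

ZOOMEYE_API = "https://api.zoomeye.org/host/search"   # assumed endpoint

def scan_target_network(query: str, api_key: str):
    """Query ZoomEye for hosts matching `query` (e.g. 'cidr:10.0.0.0/24') and
    keep the fields used later in the pipeline: IP, OS, open port, service."""
    resp = requests.get(
        ZOOMEYE_API,
        params={"query": query, "page": 1},
        headers={"API-KEY": api_key},     # assumed header name
        timeout=30,
    )
    resp.raise_for_status()
    hosts = []
    for m in resp.json().get("matches", []):   # assumed response layout
        portinfo = m.get("portinfo", {})
        hosts.append({
            "ip": m.get("ip"),
            "os": portinfo.get("os"),
            "port": portinfo.get("port"),
            "service": portinfo.get("service"),
        })
    return hosts
```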
Then, vulnerability information is crawled from CVE, NVD, and CNVD, effective fields such as vulnerability scores and vulnerability types are extracted, and a vulnerability matching rule base is built; a behavior matching rule base is built by crawling the specific tactics in MITRE ATT&CK and the tools they use.
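As a minimal illustration of building the vulnerability matching rule base (the record field names, sample CVE, and score are placeholders, not part of the claimed method):

```python
import json
from dataclasses import dataclass

@dataclass
class VulnRule:
    cve_id: str
    score: float     # e.g. a CVSS base score used for threat scoring
    vuln_type: str   # e.g. "remote code execution"

def build_rule_base(crawled_records):
    """Build the vulnerability rule base from records crawled from CVE/NVD/CNVD;
    a real crawler would map each source's schema onto these generic fields."""
    return {rec["id"]: VulnRule(rec["id"], float(rec["score"]), rec["type"])
            for rec in crawled_records}

def match(rule_base, host_cve_ids):
    """Match CVE identifiers found on a host against the rule base."""
    return [rule_base[c] for c in host_cve_ids if c in rule_base]

records = json.loads('[{"id": "CVE-2017-0144", "score": 8.1, "type": "remote code execution"}]')
rules = build_rule_base(records)
print(match(rules, ["CVE-2017-0144"]))
```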
Next, the MulVAL tool matches the collected host information against the rule base and generates and outputs an attack tree reaching the target host. The attack tree is simplified by a breadth-first traversal and converted into a reachability matrix in a form acceptable to the Double DQN model.
Then the Double DQN model is trained: the initialized state reachability matrix is taken as the model input and is first read in by a 3-layer convolutional neural network (CNN);
after convolution and pooling, two fully connected layers with non-linear activations follow, the output layer produces a Q value for each action, and the optimal path is finally selected according to these Q values.
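A PyTorch sketch of such a Q network is given below for illustration; the channel sizes, kernel sizes, matrix size, and number of actions are assumptions, not values fixed by the invention.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Reads the state reachability matrix through three convolutional layers
    with pooling, then two fully connected layers, and outputs one Q value per
    action."""
    def __init__(self, n_actions: int, matrix_size: int = 16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        flat = 32 * (matrix_size // 2) ** 2
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 128), nn.ReLU(),   # first fully connected layer
            nn.Linear(128, n_actions),         # output layer: Q value per action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, matrix_size, matrix_size) state reachability matrix
        return self.head(self.features(x))

q_values = QNetwork(n_actions=8)(torch.zeros(1, 1, 16, 16))
```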
Next, the penetration testing tool Metasploit sends exploit code or commands to the target host along the path given by the Double DQN model, and the model is rewarded according to whether control of the target host is obtained.
After the automated penetration is finished, the system outputs the most probable attack path according to the training and learning results, together with the vulnerabilities and commands used in the attack from the starting point to the target host.
The training process involves several parameters; those to be set in the model include host information parameters, threat scoring criteria, model reward settings, and so on. Different initial parameters may lead to different implementations.
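An illustrative configuration covering these parameter groups is sketched below; all names and values are assumptions chosen for the example, not the parameters claimed by the patent.

```python
# Illustrative training configuration (not the patented parameter values).
config = {
    "host_info": {"target_subnet": "10.0.0.0/24", "goal_host": "10.0.0.42"},
    "threat_scoring": {"cvss_weight": 0.7, "attck_tactic_weight": 0.3},
    "rewards": {"goal_host_compromised": 100.0, "intermediate_host": 10.0,
                "failed_exploit": -1.0, "step_cost": -0.1},
    "double_dqn": {"gamma": 0.99, "learning_rate": 1e-4, "batch_size": 32,
                   "target_update_interval": 500, "epsilon_decay": 0.995},
}
```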
While the foregoing is directed to embodiments of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (4)

1. An automated penetration testing method based on deep reinforcement learning, characterized in that it comprises the following modules:
and the network environment information collection processing module is used for carrying out information collection and vulnerability detection on the host in the network topology, converting the network topology into a state reachable matrix according to the rule base and inputting the state reachable matrix into the Double DQN network.
a deep reinforcement learning module, which combines deep learning and reinforcement learning to train and learn a model on the input state reachability matrix so as to predict the most probable attack path;
a vulnerability verification module, which verifies the detected vulnerabilities on the real hosts and feeds the results back to the Double DQN model, thereby converging on the most probable attack path.
2. The network environment information collection and processing module of claim 1, wherein the vulnerability scanning engine ZoomEye is used to collect information on the network topology environment, MulVAL builds an attack tree from the collected information, and the attack tree is converted into a state reachability matrix by breadth-first search (BFS) as an acceptable input to the Double DQN model.
3. The deep reinforcement learning module of claim 1, wherein a value-function-based method is used: a Double DQN model built on a convolutional neural network (CNN), combined with the Q-learning algorithm from conventional reinforcement learning, obtains the most probable attack path.
4. The vulnerability verification module of claim 1, wherein the existing penetration testing tool Metasploit is used to interact with the actual network environment, and the interaction result is fed back as input to the Double DQN model to decide how to carry out the attack along a specific path.
CN202110916929.9A 2021-08-11 2021-08-11 Automatic penetration test method based on deep reinforcement learning Active CN113660241B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110916929.9A CN113660241B (en) 2021-08-11 2021-08-11 Automatic penetration test method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110916929.9A CN113660241B (en) 2021-08-11 2021-08-11 Automatic penetration test method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113660241A true CN113660241A (en) 2021-11-16
CN113660241B CN113660241B (en) 2023-05-23

Family

ID=78479468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110916929.9A Active CN113660241B (en) 2021-08-11 2021-08-11 Automatic penetration test method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113660241B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116112278A (en) * 2023-02-17 2023-05-12 西安电子科技大学 Q-learning-based network optimal attack path prediction method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106549950A (en) * 2016-11-01 2017-03-29 南京理工大学 A kind of matrix method for visualizing based on state attacking and defending figure
CN109407676A (en) * 2018-12-20 2019-03-01 哈尔滨工业大学 The moving robot obstacle avoiding method learnt based on DoubleDQN network and deeply
CN110309658A (en) * 2019-06-27 2019-10-08 暨南大学 A kind of dangerous XSS defensive system recognition methods based on intensified learning
US20200067962A1 (en) * 2018-08-24 2020-02-27 California Institute Of Technology Model based methodology for translating high-level cyber threat descriptions into system-specific actionable defense tactics
CN112221149A (en) * 2020-09-29 2021-01-15 中北大学 Artillery and soldier continuous intelligent combat drilling system based on deep reinforcement learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106549950A (en) * 2016-11-01 2017-03-29 南京理工大学 A kind of matrix method for visualizing based on state attacking and defending figure
US20200067962A1 (en) * 2018-08-24 2020-02-27 California Institute Of Technology Model based methodology for translating high-level cyber threat descriptions into system-specific actionable defense tactics
CN109407676A (en) * 2018-12-20 2019-03-01 哈尔滨工业大学 The moving robot obstacle avoiding method learnt based on DoubleDQN network and deeply
CN110309658A (en) * 2019-06-27 2019-10-08 暨南大学 A kind of dangerous XSS defensive system recognition methods based on intensified learning
CN112221149A (en) * 2020-09-29 2021-01-15 中北大学 Artillery and soldier continuous intelligent combat drilling system based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
贵重;: "基于ATT&CK的多源数据深度安全检测技术研究" *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116112278A (en) * 2023-02-17 2023-05-12 西安电子科技大学 Q-learning-based network optimal attack path prediction method and system

Also Published As

Publication number Publication date
CN113660241B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
Hu et al. Automated penetration testing using deep reinforcement learning
CN105871882B (en) Network security risk analysis method based on network node fragility and attack information
CN111581645B (en) Iterative attack method of automatic penetration test system based on AI
KR100851521B1 (en) Cyber Attack System for Vulnerability Assessment and Method Thereof
CN111475817B (en) Data collection method of automatic penetration test system based on AI
CN111488587B (en) Automatic penetration test system based on AI
CN107908645B (en) Online social platform rumor propagation immune method based on seepage analysis
CN111488588B (en) Automatic penetration test method based on AI
CN111475818B (en) Penetration attack method of automatic penetration test system based on AI
Zhou et al. NIG-AP: a new method for automated penetration testing
CN116405246A (en) Vulnerability exploitation chain construction technology based on attack and defense combination
CN111049827A (en) Network system safety protection method, device and related equipment
KR102419451B1 (en) Artificial intelligence based threat analysis automation system and method
CN112491860A (en) Industrial control network-oriented collaborative intrusion detection method
CN115037553B (en) Information security monitoring model construction method and device, information security monitoring model application method and device, and storage medium
CN110493262A (en) It is a kind of to improve the network attack detecting method classified and system
CN113660241B (en) Automatic penetration test method based on deep reinforcement learning
CN115580430A (en) Attack tree-pot deployment defense method and device based on deep reinforcement learning
CN116545687A (en) Automatic network simulation attack framework based on attack tree and deep reinforcement learning
CN114386042A (en) Method suitable for deduction of power enterprise network war chess
CN111488586B (en) Automatic permeation testing system post-permeation method based on AI
KR102578421B1 (en) Method And System for managing of attack equipment of Cyber Attack Simulation Platform
CN115242487B (en) APT attack sample enhancement and detection method based on meta-behavior
CN115277065B (en) Anti-attack method and device in abnormal traffic detection of Internet of things
Conti et al. Bio-inspired security analysis for IoT scenarios

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 571924 Room 301, 3rd floor, building A09, Hainan Ecological Software Park, Laocheng hi tech Industrial Demonstration Zone, Chengmai County, Haikou City, Hainan Province

Applicant after: Jizhi (Hainan) Information Technology Co.,Ltd.

Address before: 571924 Room 301, 3rd floor, building A09, Hainan Ecological Software Park, Laocheng hi tech Industrial Demonstration Zone, Chengmai County, Haikou City, Hainan Province

Applicant before: Zhongdian Jizhi (Hainan) Information Technology Co.,Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant