CN113660241B - Automatic penetration test method based on deep reinforcement learning - Google Patents

Automatic penetration test method based on deep reinforcement learning

Info

Publication number
CN113660241B
Authority
CN
China
Prior art keywords
attack
reinforcement learning
learning
model
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110916929.9A
Other languages
Chinese (zh)
Other versions
CN113660241A (en)
Inventor
郑超
陆秋文
李鑫
孙彦斌
崔翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jizhi Hainan Information Technology Co ltd
Original Assignee
Jizhi Hainan Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jizhi Hainan Information Technology Co ltd
Priority to CN202110916929.9A
Publication of CN113660241A
Application granted
Publication of CN113660241B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433 Vulnerability analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)

Abstract

The invention discloses an automated penetration testing method and framework based on deep reinforcement learning. The method as a whole comprises network environment data collection and preprocessing, construction of a deep reinforcement learning model, and external wrapping of a penetration tool. Specifically, the network environment is scanned with ZoomEye to collect topology information, a state reachable matrix is constructed from the collected data, the model is trained with the Double DQN algorithm, and a penetration tool is used as an external tool to interact with the real environment and feed results back to the Double DQN model, thereby achieving attack path prediction with good performance. Compared with general automated penetration schemes, this technical scheme can effectively improve the efficiency of penetration testing and is a feasible automated penetration scheme.

Description

Automatic penetration test method based on deep reinforcement learning
Technical Field
The invention relates to the field of automated penetration testing in network security, and in particular to a method based on deep reinforcement learning.
Background
With the rapid development of information technology and the continuous expansion of network applications, networks bring great convenience but also expose people to security threats. These threats include viruses, worms, Trojans and the like, which mainly launch attacks through security vulnerabilities in systems and applications. To guard against them, security problems in computer systems must be discovered early and the degree of harm of each vulnerability determined. It is therefore necessary to periodically perform penetration tests on the computer network systems of an enterprise network and to repair the systems according to the test results.
Penetration testing is a method of evaluating the security of a computer network system by simulating the attack methods of a malicious hacker. The process includes actively analyzing any weakness, technical flaw, or vulnerability of the system from a position an attacker might occupy, and conditionally exploiting security vulnerabilities from that position to gain a degree of control over the assets of the target system.
A penetration test is generally divided into seven stages: pre-engagement interaction, information collection, threat modeling, vulnerability analysis, exploitation, post-exploitation, and report generation. The resulting report can be used to quickly and effectively analyze and determine the threats faced by all of an organization's information assets, assess the threat to each asset, balance expenditure against quantified risk, and effectively reduce the organization's security costs.
Penetration testing is mainly carried out by professional security personnel who use various security testing tools against the target system. In practice, however, many security personnel must work together to obtain privileges on the target system. The threat modeling and exploitation stages in particular demand a great deal of effort from security personnel, so they are the stages where automation matters most for improving efficiency.
Many semi-automated penetration testing frameworks exist, but most of them use similar strategies, functions and usage methods, and they require manual assistance to take the characteristics of the target system fully into account and customize the penetration testing strategy. They therefore still depend on extensive security experience.
Therefore, to save labor cost and realize an automated penetration testing system with a unified format, a deep reinforcement learning algorithm is used to make optimal choices during penetration attacks through continuous interaction in the penetration testing stage, finally yielding the most likely penetration attack scheme and improving the efficiency and accuracy of penetration testing.
Disclosure of Invention
To overcome the problems in the related art, the embodiments of the disclosure provide an automated penetration testing method and system based on deep reinforcement learning. The method interacts with the real environment and receives feedback from it, performs well, and can effectively obtain penetration attack paths and schemes.
The embodiment of the invention provides an automated penetration testing method and system based on deep reinforcement learning, comprising the following modules:
A network environment information collection and processing module, which collects information about hosts in the network topology and detects their vulnerabilities, converts the network topology into a state reachable matrix according to the rule base, and inputs the matrix into the Double DQN network.
A deep reinforcement learning module, which takes the state reachable matrix as input, combines deep learning with reinforcement learning, and trains a Double DQN model on the attack paths to generate the most likely attack path.
A vulnerability verification module, which verifies the detected vulnerabilities on the real hosts and feeds the results back to the Double DQN model so as to arrive at the most likely attack path.
The network environment information collection and processing module mainly uses the online scanning tool ZoomEye to collect host information and vulnerabilities.
Further, the collected information is matched against a rule base, and MulVAL then uses the matched vulnerability information or tactical behaviors to build an attack tree over the topology.
Further, the rule base can be divided into vulnerability rules and tactical behavior rules. Vulnerability rules are derived from known vulnerability databases such as CVE, NVD and CNVD, and tactical behavior rules are derived from the specific tactics in ATT&CK published by MITRE.
Furthermore, the attack tree generated by MulVAL gives all possible paths through the network topology. The tree is simplified by breadth-first search (BFS) traversal, and a state reachable matrix is constructed from all discovered possible attack paths. The resulting matrix is well suited as input to the Double DQN model.
The deep reinforcement learning module uses a Double DQN model. It is mainly responsible for iteratively selecting the best attack path given the input state matrix and finally outputting the most likely penetration attack path.
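The conversion from attack tree to state reachable matrix can be illustrated with a minimal sketch. It assumes the MulVAL output has already been parsed into a list of directed edges; the node names and the `reachable_matrix` helper are hypothetical and are not taken from the patent or from MulVAL.

```python
from collections import deque

def reachable_matrix(edges, start):
    """Build a 0/1 state reachable matrix from attack-tree edges via BFS."""
    # Adjacency list of the (hypothetical, pre-parsed) attack tree.
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, [])
    # Breadth-first traversal from the attacker's starting node.
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # Row i, column j is 1 when state j is directly reachable from state i.
    index = {name: i for i, name in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for src, dst in edges:
        if src in index and dst in index:
            matrix[index[src]][index[dst]] = 1
    return order, matrix

# Illustrative topology: two entry hosts that both lead to a database host.
edges = [("attacker", "web"), ("attacker", "mail"), ("web", "db"), ("mail", "db")]
nodes, matrix = reachable_matrix(edges, "attacker")
```

Nodes that BFS never reaches from the starting point are dropped, which is one plausible reading of how the traversal simplifies the tree.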
Preferably, the DQN algorithm approximates Q-Learning with a neural network. Q-Learning is a model-free reinforcement learning technique that can handle stochastic transitions and rewards without a model of the environment. Starting from the current state, it maximizes the expected total return over all subsequent steps, which allows an optimal policy to be selected. Its core is the iterative update of the Q-value function, whose formula is as follows:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$
where $s$ and $a$ are the current state and action, $r$ is the reward, $s'$ is the next state, $\alpha$ is the learning rate and $\gamma$ is the discount factor.
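As a worked instance of the Q-value iteration, the following sketch applies a single tabular update. The state names, action names, and the values of the learning rate and discount factor are illustrative assumptions, not settings from the patent.

```python
def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-Learning update and return the new Q(s, a)."""
    # Value of the best action available in the next state (0 if none is known).
    best_next = max(q[s_next].values()) if q.get(s_next) else 0.0
    # Move Q(s, a) toward the bootstrapped target r + gamma * max Q(s', a').
    q[s][a] += alpha * (r + gamma * best_next - q[s][a])
    return q[s][a]

# Hypothetical two-state table: exploiting from "web" leads to "db".
q = {"web": {"exploit_db": 0.0}, "db": {"stay": 1.0}}
new_value = q_update(q, "web", "exploit_db", r=1.0, s_next="db")
```

Here the target is 1.0 + 0.9 × 1.0 = 1.9, so with a learning rate of 0.5 the entry moves from 0.0 to 0.95.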
Further, Q-Learning computes the Q values of all actions in the next state and directly selects the action with the largest Q value, which yields estimates higher than the true values. The DQN algorithm also uses the max operation, so the same values are used both to select and to evaluate an action, which leads to overestimation.
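Preferably, Double Q-Learning decouples the two steps by using two value functions. Each experience is randomly assigned to update one of the two value functions, so that the network weights $\theta$ and $\theta^{-}$ are each updated using the other's estimates, and overestimation is avoided by separating action selection from action evaluation. The formula is as follows:
$$Y^{\text{DoubleQ}} = r + \gamma \, Q\left(s', \arg\max_{a'} Q(s', a'; \theta); \theta^{-}\right)$$
where the weights $\theta$ select the action and the weights $\theta^{-}$ evaluate it.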
Further, the estimation error of Q-Learning grows as the number of actions m increases, whereas Double Q-Learning is an unbiased estimate whose error does not change appreciably as m increases and stays close to 0.
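The effect of m can be checked with a small simulation. This is a sketch under stated assumptions: every true action value is 0, each estimate carries independent unit Gaussian noise, and the `estimate_bias` helper, the trial count and the seed are illustrative choices rather than anything specified in the patent.

```python
import random

def estimate_bias(m, trials=2000, seed=0):
    """Average bias of the single (max) and double estimators over m actions."""
    rng = random.Random(seed)
    single = 0.0
    double = 0.0
    for _ in range(trials):
        # Two independent noisy estimates of action values whose true value is 0.
        qa = [rng.gauss(0.0, 1.0) for _ in range(m)]
        qb = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # Single estimator: select and evaluate with the same estimate.
        single += max(qa)
        # Double estimator: select with qa, evaluate with qb.
        best = max(range(m), key=qa.__getitem__)
        double += qb[best]
    return single / trials, double / trials

single_2, double_2 = estimate_bias(2)
single_10, double_10 = estimate_bias(10)
```

Under these assumptions the single-estimator bias is positive and larger for 10 actions than for 2, while the double-estimator bias stays near 0 in both cases.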
Preferably, like the DQN algorithm, the Double DQN algorithm has two Q networks with identical structure. Building on DQN, it eliminates overestimation by decoupling the selection of the action for the target Q value from the calculation of the target Q value. The formula is as follows:
$$Y^{\text{DoubleDQN}} = r + \gamma \, Q\left(s', \arg\max_{a'} Q(s', a'; \theta); \theta^{-}\right)$$
where $\theta$ are the parameters of the online network, used to select the action, and $\theta^{-}$ are the parameters of the target network, used to evaluate it.
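The decoupling of selection and evaluation can be made concrete by comparing the two targets on the same inputs. In this sketch the Q-value lists stand in for the outputs of the online and target networks for the next state; the function names and numbers are illustrative assumptions.

```python
def dqn_target(reward, q_target_next, gamma=0.9, done=False):
    """DQN target: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)

def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.9, done=False):
    """Double DQN target: online network selects, target network evaluates."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]

# Hypothetical next-state Q values for three actions.
q_online_next = [1.0, 2.0, 0.5]
q_target_next = [0.8, 1.0, 3.0]
y_dqn = dqn_target(1.0, q_target_next, gamma=0.5)
y_double = double_dqn_target(1.0, q_online_next, q_target_next, gamma=0.5)
```

With these numbers the DQN target is 2.5, driven by the target network's largest (possibly inflated) value, while the Double DQN target is 1.5 because the online network picks action 1 and the target network scores it at 1.0.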
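Further, all parameters of the Q network are updated by gradient backpropagation through the neural network using a mean squared error loss function. The formula is as follows:
$$L(\theta) = \mathbb{E}\left[\left(Y - Q(s, a; \theta)\right)^{2}\right]$$
where $Y$ is the target Q value and $Q(s, a; \theta)$ is the Q value predicted by the network with parameters $\theta$.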
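A minimal numeric sketch of the loss and one gradient step follows. To stay self-contained it replaces the neural network with a linear approximator, which is an assumption made purely for illustration; the feature vectors, targets and learning rate are likewise hypothetical.

```python
def predict(theta, x):
    """Linear stand-in for the Q network: q = theta . x."""
    return sum(t * xi for t, xi in zip(theta, x))

def mse_loss(theta, features, targets):
    """Mean squared error between target Q values and predictions."""
    errors = [(y - predict(theta, x)) ** 2 for x, y in zip(features, targets)]
    return sum(errors) / len(errors)

def sgd_step(theta, features, targets, lr=0.1):
    """One gradient descent step on the mean squared error loss."""
    n = len(targets)
    grad = [0.0] * len(theta)
    for x, y in zip(features, targets):
        residual = y - predict(theta, x)
        for j, xi in enumerate(x):
            # d/d theta_j of (y - q)^2 is -2 * (y - q) * x_j.
            grad[j] += -2.0 * residual * xi / n
    return [t - lr * g for t, g in zip(theta, grad)]

features = [[1.0, 0.0], [0.0, 1.0]]
targets = [1.0, 2.0]
theta = [0.0, 0.0]
loss_before = mse_loss(theta, features, targets)
theta = sgd_step(theta, features, targets)
loss_after = mse_loss(theta, features, targets)
```

In a real Double DQN the same loss is minimized by backpropagating through the convolutional and fully connected layers instead of this closed-form gradient.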
Preferably, the Double DQN model is further optimized by manual tuning; because all of the original parameters were chosen for DQN, tuning them for Double DQN can further reduce overestimation. Through continuous iteration of path selection toward the attack target, an attack path reaching the target computer is finally obtained and can be taken as the optimal attack path.
Preferably, to attack real systems with the automated penetration testing framework, the system must be able to interact with the actual network environment, for example by running commands and exploiting vulnerabilities.
Further, at this stage the system uses the existing penetration testing tool Metasploit to interact with hosts in the actual topology. By sending commands to a real host or exploiting a vulnerability, feedback is obtained from the host and sent to the Double DQN model as a reward.
Further, a wrapper is created for the penetration testing tool so that the output of the trained Double DQN model can be used to send commands to Metasploit, which in turn performs the corresponding operations on the real target system. The results of these actions are received as feedback and used to decide how to continue along a given attack path.
The technical scheme provided by the automated penetration testing method and system based on deep reinforcement learning can have the following beneficial effects. Current network attacks are typically launched through vulnerabilities, and the traditional approaches of hiring security testers or using semi-automated penetration testing tools struggle to find the optimal attack path effectively and quickly. By constructing a deep reinforcement learning model and adding interactive feedback between the penetration testing tool and real hosts, the attack path is continuously updated and reselected until the optimal attack path is obtained.
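The wrapper idea can be sketched as a thin layer that forwards an action to a backend and converts the outcome into a reward. Everything here is hypothetical: the class and field names, the reward values, and the simulated backend, which stands in for the real tool and does not call any Metasploit API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class ActionResult:
    """Outcome of one action against a host (hypothetical structure)."""
    host: str
    action: str
    privileges_gained: bool

class PenTestToolWrapper:
    """Forwards model-chosen actions to a backend and turns results into rewards."""

    def __init__(self, backend: Callable[[str, str], ActionResult],
                 success_reward: float = 1.0, failure_reward: float = -0.1):
        self.backend = backend
        self.success_reward = success_reward
        self.failure_reward = failure_reward

    def step(self, host: str, action: str) -> Tuple[ActionResult, float]:
        # Run the action through the backend, then map the outcome to a reward.
        result = self.backend(host, action)
        reward = self.success_reward if result.privileges_gained else self.failure_reward
        return result, reward

def simulated_backend(host: str, action: str) -> ActionResult:
    """Stand-in for the real tool: only the action named 'exploit' succeeds."""
    return ActionResult(host, action, privileges_gained=(action == "exploit"))

wrapper = PenTestToolWrapper(simulated_backend)
_, reward_ok = wrapper.step("10.0.0.5", "exploit")
_, reward_fail = wrapper.step("10.0.0.5", "scan")
```

Keeping the backend behind a plain callable means the training loop can be exercised against a simulator before it is pointed at a real tool in an authorized test environment.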
Drawings
FIG. 1 is a schematic overall flow chart of the present invention;
FIG. 2 is a schematic diagram of an automated penetration test system based on deep reinforcement learning according to the present invention;
FIG. 3 is a schematic diagram of interaction between the Double DQN model and penetration test of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIG. 1, the invention discloses an automated penetration testing method and system based on deep reinforcement learning, comprising several stages such as information collection on the target topology, information processing, and model construction.
The specific implementation process is as follows:
As shown in FIG. 2, the target network topology to be tested is scanned with ZoomEye, and the API provided by ZoomEye is used to receive information about the hosts in the target network, including the operating system, open ports, protocols in use, and unpatched known vulnerabilities.
Then, a crawler collects vulnerability information from CVE, NVD and CNVD and extracts the useful fields, including vulnerability score and vulnerability type, to build a vulnerability matching rule base. A behavior matching rule base is built by crawling the specific tactics in MITRE ATT&CK and the tools they use.
Then, the MulVAL tool matches the collected host information against the rule bases, and generates and outputs an attack tree reaching the target host. The attack tree is simplified with a breadth-first traversal algorithm and converted into a state reachable matrix that the Double DQN model can accept.
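The host information and the vulnerability matching rule base can be pictured with a small data-structure sketch. The record layout, field names, identifiers such as `VULN-A`, and the score threshold are all hypothetical; they do not reproduce the ZoomEye response schema or real CVE, NVD or CNVD entries.

```python
from dataclasses import dataclass, field

@dataclass
class HostInfo:
    """Host facts named in the text: OS, open ports, protocols, known vulnerabilities."""
    address: str
    os: str
    services: list = field(default_factory=list)     # (port, protocol, product, version)
    known_vulns: list = field(default_factory=list)  # identifiers of unpatched vulnerabilities

def build_rule_base(records, min_score=0.0):
    """Index crawled vulnerability records by (product, version)."""
    rules = {}
    for rec in records:
        if rec["score"] >= min_score:
            rules.setdefault((rec["product"], rec["version"]), []).append(rec)
    return rules

def match_host(host, rules):
    """Return the rules matching a host's services, highest score first."""
    hits = []
    for _port, _protocol, product, version in host.services:
        hits.extend(rules.get((product, version), []))
    return sorted(hits, key=lambda rec: rec["score"], reverse=True)

# Made-up records carrying the two fields the text names: score and type.
records = [
    {"id": "VULN-A", "product": "exampled", "version": "1.0", "score": 9.8, "type": "rce"},
    {"id": "VULN-B", "product": "exampled", "version": "1.0", "score": 5.3, "type": "info-leak"},
    {"id": "VULN-C", "product": "otherd", "version": "2.1", "score": 7.5, "type": "privilege-escalation"},
]
host = HostInfo("10.0.0.5", "linux", services=[(80, "http", "exampled", "1.0")])
matches = match_host(host, build_rule_base(records, min_score=5.0))
matched_ids = [rec["id"] for rec in matches]
```

In the described system the matching itself is performed by MulVAL; this sketch only shows the kind of lookup that such matching implies.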
Then, the Double DQN model is trained. The initialized state reachable matrix is taken as the input of the Double DQN model and is first read in by a 3-layer convolutional neural network (CNN). After convolution and pooling, two fully connected layers apply nonlinear transformations, and the output layer produces a Q value for each action. The optimal path is then selected according to the Q values.
Then, the penetration testing tool Metasploit sends exploit code or commands to the target host along the path given by the Double DQN model, and feedback and a reward are returned to the model according to whether privileges on the target host were obtained.
After the automated penetration finishes, the system outputs the most likely attack path according to the training results, together with the vulnerabilities and commands used from the initial attack point to the target host.
The training process involves several parameters. Parameters to be set in the model include host information parameters, threat scoring criteria, model reward settings and the like. Different initial parameters may lead to different implementations.
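Reading an attack path off the learned Q values can be sketched as a greedy walk over the state reachable matrix. The `greedy_path` helper, the matrix and the Q values below are illustrative assumptions; the patent does not specify how the final path is extracted.

```python
def greedy_path(q, reachable, start, target, max_steps=None):
    """Follow the highest-Q reachable transition from start until target is hit."""
    n = len(q)
    max_steps = max_steps or n
    path, state, visited = [start], start, {start}
    for _ in range(max_steps):
        if state == target:
            return path
        # Only consider states that are directly reachable and not yet visited.
        candidates = [j for j in range(n) if reachable[state][j] and j not in visited]
        if not candidates:
            return None  # dead end: no attack path found
        state = max(candidates, key=lambda j: q[state][j])
        path.append(state)
        visited.add(state)
    return path if state == target else None

# States: 0 = attacker, 1 = web host, 2 = mail host, 3 = database host.
reachable = [[0, 1, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0]]
q = [[0.0, 0.2, 0.7, 0.0], [0.0, 0.0, 0.0, 0.9], [0.0, 0.0, 0.0, 0.8], [0.0, 0.0, 0.0, 0.0]]
path = greedy_path(q, reachable, start=0, target=3)
```

With these values the walk goes from the attacker through the mail host to the database host, because the first step prefers the transition with Q value 0.7 over the one with 0.2.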
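One way to keep such parameters together is a small validated configuration object. The field names and default values below are hypothetical placeholders chosen for illustration; the patent does not disclose concrete settings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    """Illustrative grouping of the parameter kinds named in the text."""
    min_threat_score: float = 7.0   # threat scoring criterion (assumed scale 0 to 10)
    success_reward: float = 1.0     # reward when privileges are obtained
    failure_reward: float = -0.1    # reward when an action fails
    gamma: float = 0.9              # discount factor
    learning_rate: float = 0.001
    episodes: int = 1000

    def validate(self):
        """Reject settings that would make training ill-defined."""
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must be in [0, 1]")
        if self.learning_rate <= 0.0:
            raise ValueError("learning_rate must be positive")
        if self.success_reward <= self.failure_reward:
            raise ValueError("success_reward must exceed failure_reward")
        return self

config = TrainingConfig().validate()
```

Freezing the dataclass keeps a training run from silently changing its own settings, which makes results easier to compare across different initial parameters.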
While the foregoing is directed to embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, and such changes and modifications are also intended to be included within the scope of the invention.

Claims (1)

1. An automated penetration testing system based on deep reinforcement learning, the system comprising: a network environment information collection and processing module, configured to collect information about hosts in the network topology and detect their vulnerabilities, convert the network topology into a state reachable matrix according to a rule base, and input the state reachable matrix into the Double DQN network; a deep reinforcement learning module, configured to take the state reachable matrix as input and train the model by combining deep learning with reinforcement learning, so as to predict the most likely attack path; and a vulnerability verification module, configured to verify the detected vulnerabilities on real hosts and feed the results back to the Double DQN model so as to arrive at the most likely attack path; wherein the vulnerability scanning engine ZoomEye is used to collect information about the network topology environment; MulVAL is used to build an attack tree from the collected information; the attack tree is converted by breadth-first search (BFS) traversal into a state reachable matrix that serves as acceptable input to the Double DQN model; a representative value-function-based method is used, in which the Double DQN model is built by combining a convolutional neural network (CNN) with the Q-learning algorithm of traditional reinforcement learning to obtain the most likely attack path; and the existing penetration testing tool Metasploit is used to interact with the actual network environment, the interaction result being fed back as input to the Double DQN model to determine how to attack along a specific path.
CN202110916929.9A 2021-08-11 2021-08-11 Automatic penetration test method based on deep reinforcement learning Active CN113660241B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110916929.9A CN113660241B (en) 2021-08-11 2021-08-11 Automatic penetration test method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110916929.9A CN113660241B (en) 2021-08-11 2021-08-11 Automatic penetration test method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113660241A (en) 2021-11-16
CN113660241B (en) 2023-05-23

Family

ID=78479468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110916929.9A Active CN113660241B (en) 2021-08-11 2021-08-11 Automatic penetration test method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113660241B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115396156B (en) * 2022-07-29 2024-06-21 中国人民解放军国防科技大学 Vulnerability priority processing method based on deep reinforcement learning
CN116112278A (en) * 2023-02-17 2023-05-12 西安电子科技大学 Q-learning-based network optimal attack path prediction method and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106549950A (en) * 2016-11-01 2017-03-29 南京理工大学 A kind of matrix method for visualizing based on state attacking and defending figure
US11425157B2 (en) * 2018-08-24 2022-08-23 California Institute Of Technology Model based methodology for translating high-level cyber threat descriptions into system-specific actionable defense tactics
CN109407676B (en) * 2018-12-20 2019-08-02 哈尔滨工业大学 The Obstacle Avoidance learnt based on DoubleDQN network and deeply
CN110309658B (en) * 2019-06-27 2021-02-05 暨南大学 Unsafe XSS defense system identification method based on reinforcement learning
CN112221149B (en) * 2020-09-29 2022-07-19 中北大学 Artillery and soldier continuous intelligent combat drilling system based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
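贵重. Research on multi-source data deep security detection technology based on ATT&CK. Telecom Engineering Technics and Standardization, 2020, (No. 10), full text. *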

Also Published As

Publication number Publication date
CN113660241A (en) 2021-11-16

Similar Documents

Publication Publication Date Title
Hu et al. Automated penetration testing using deep reinforcement learning
Maeda et al. Automating post-exploitation with deep reinforcement learning
CN113660241B (en) Automatic penetration test method based on deep reinforcement learning
CN111581645B (en) Iterative attack method of automatic penetration test system based on AI
KR100851521B1 (en) Cyber Attack System for Vulnerability Assessment and Method Thereof
CN111475817B (en) Data collection method of automatic penetration test system based on AI
CN111488587B (en) Automatic penetration test system based on AI
CN111488588B (en) Automatic penetration test method based on AI
CN111475818B (en) Penetration attack method of automatic penetration test system based on AI
Zhou et al. NIG-AP: A new method for automated penetration testing
CN114915475B (en) Method, device, equipment and storage medium for determining attack path
CN111049827A (en) Network system safety protection method, device and related equipment
CN116405246A (en) Vulnerability exploitation chain construction technology based on attack and defense combination
CN115037553B (en) Information security monitoring model construction method and device, information security monitoring model application method and device, and storage medium
Ashtiani et al. A distributed simulation framework for modeling cyber attacks and the evaluation of security measures
Kotenko et al. NETWORK SECURITY EVALUATION BASED ON SIMULATION OF MALFACTOR’S BEHAVIOR
CN116545687A (en) Automatic network simulation attack framework based on attack tree and deep reinforcement learning
CN111488586B (en) Automatic permeation testing system post-permeation method based on AI
CN114386042A (en) Method suitable for deduction of power enterprise network war chess
Zhao et al. A hybrid ranking approach to estimate vulnerability for dynamic attacks
CN115242487B (en) APT attack sample enhancement and detection method based on meta-behavior
Kayacik et al. Automatically evading IDS using GP authored attacks
Conti et al. Bio-inspired security analysis for IoT scenarios
Liu et al. Efficient Defense Decision‐Making Approach for Multistep Attacks Based on the Attack Graph and Game Theory
CN113923007A (en) Safety penetration testing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 571924 Room 301, 3rd floor, building A09, Hainan Ecological Software Park, Laocheng hi tech Industrial Demonstration Zone, Chengmai County, Haikou City, Hainan Province

Applicant after: Jizhi (Hainan) Information Technology Co.,Ltd.

Address before: 571924 Room 301, 3rd floor, building A09, Hainan Ecological Software Park, Laocheng hi tech Industrial Demonstration Zone, Chengmai County, Haikou City, Hainan Province

Applicant before: Zhongdian Jizhi (Hainan) Information Technology Co.,Ltd.

GR01 Patent grant