CN115102705A - Automatic network security detection method based on deep reinforcement learning - Google Patents

Automatic network security detection method based on deep reinforcement learning

Info

Publication number
CN115102705A
Authority
CN
China
Prior art keywords
host
tested
environment
network security
security detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210355346.8A
Other languages
Chinese (zh)
Other versions
CN115102705B (en)
Inventor
张旻
李倩玉
郑敬华
胡淼
李阳
施凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202210355346.8A
Publication of CN115102705A
Application granted
Publication of CN115102705B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Debugging And Monitoring (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention provides an automatic network security detection method based on deep reinforcement learning, comprising the following steps: constructing an environment information acquisition module for scanning environment information from the network under test and the hosts under test; acquiring, by means of the environment information acquisition module, the basic information scanned from the environment and constructing a state matrix of the environment under test that the agent can understand; constructing an agent; determining the agent's behavior strategy based on the acquired state matrix of the environment under test and the reward information of the reward module; and taking a behavior according to the agent's behavior strategy, determining the execution result and impact of that behavior in the environment under test, updating the reward information of the reward module, and guiding the agent to update its neural network parameters. The scheme of the invention effectively achieves automatic network security detection of a network environment, thereby addressing the high cost and high professional skill requirements of manual network security detection.

Description

Automatic network security detection method based on deep reinforcement learning
Technical Field
The invention relates to the field of cyberspace security, and in particular to an automatic network security detection method based on deep reinforcement learning.
Background
At present, more and more data and services are delivered through electronic platforms. This has driven the rapid development of network infrastructure but has also increased the risk of network vulnerabilities, so securing modern network systems and infrastructure has become a vital challenge for the security field. As a security exercise, network security testing (penetration testing, PT) adopts the attacker's perspective to perform authorized simulated attacks on a computer system, and the value of this offensive security assessment method for finding vulnerabilities is undisputed. Sarraute et al. divide network security detection into six main steps: information gathering, attack and penetration, local information gathering, privilege escalation, lateral movement, and footprint clearing; the details of each step are shown in FIG. 1. To date, PT has been performed primarily by trained human testers, and its success depends critically on their expertise.
PT is a time-consuming, labor-intensive, and complex task. It requires expert knowledge of the target system and of the attacks that may be launched against it, relies on a great deal of implicit knowledge that is difficult to formalize, and is prone to human error. In any field, automation is an effective way to save time and resources, and automating the PT process offers a way to address this problem. Many frameworks and tools have been developed to support automated PT, such as Metasploit, Nessus, and Tenable. However, existing tools cannot replace humans in making intelligent decisions and cannot make network security detection intelligent and fully automatic. Automated network security detection therefore remains a challenging task.
With the development of artificial intelligence in various fields, security professionals have begun to implement automated network security detection based on AI. It is therefore important to understand the use of artificial intelligence from the viewpoints of both attack and defense. Reinforcement Learning (RL) is a type of machine learning that learns through exploration of the environment and accumulated experience; an RL agent can adapt to a real-time, continuous environment without a prior data set. These capabilities suggest that RL is well suited to the PT problem.
In 2013, Sarraute et al. used a Partially Observable Markov Decision Process (POMDP) to formalize the PT problem. They also proposed the 4AL decomposition algorithm, which divides a large network into smaller networks according to the network structure and solves the smaller networks one by one with POMDPs. In 2018, Ghanem and Chen proposed an intelligent network security detection method using reinforcement learning, in which the system is modeled as a POMDP and tested with an external POMDP solver. These POMDP-based results support the hypothesis that reinforcement learning can improve the accuracy and reliability of network security detection. However, most of this work is limited to the planning phase and does not cover the full execution phase in a real environment. In addition, in 2014, Durkota et al. proposed an algorithm for computing an optimal attack strategy on an attack graph with action costs and failure probabilities; it converts the optimal path planning problem on the attack graph into a Markov Decision Process (MDP) and generates the optimal attack strategy to guide network security detection. In 2019, Zhou et al. described network security detection as an MDP and proposed a network-information-gain-based attack planning (NIG-AP) algorithm. In 2020, Hu et al. built an automatic network security detection framework based on deep reinforcement learning that automatically finds the optimal attack path for a given topology. In 2021, Zennaro et al. formalized simple CTF challenges as network security detection problems and solved them with model-free reinforcement learning. MDP-based reinforcement learning can perform network security detection through exploration without prior knowledge; however, because real network security detection environments are complex, these attempts remain confined to small-scale scenarios with simple host structures.
Disclosure of Invention
To solve the above technical problems, the invention provides an automatic network security detection method based on deep reinforcement learning, which addresses the low degree of intelligence, low degree of automation, and low practicality of automatic network security detection methods in the prior art.
According to a first aspect of the invention, an automatic network security detection method based on deep reinforcement learning is provided, the method comprising the following steps:
step S1: constructing an environment information acquisition module, which scans the network under test and the hosts under test to discover the following basic information: operating system, open ports, and service information; the discovered information is numbered in order of discovery and stored for subsequent processing and state updates;
step S2: judging whether a preset target has been reached; if so, ending the method; if not, proceeding to step S3; the preset target is to achieve network security detection of a specific target;
step S3: acquiring, by means of the environment information acquisition module, the basic information scanned from the environment; collecting and organizing the number, configuration, and privilege of each host under test and the number of times it has been selected; and constructing a state matrix of the environment under test that the agent can understand;
step S4: constructing an agent, and determining the agent's behavior strategy based on the state matrix of the environment under test and the reward information of the reward module;
step S5: taking a behavior according to the agent's behavior strategy, determining the execution result and impact of that behavior in the environment under test, updating the reward information of the reward module, guiding the agent to update its neural network parameters, and returning to step S2.
Preferably, in step S2:
the preset target is to achieve network security detection of a specific target, which includes performing network security detection on a specific host in a network environment starting from a given starting host, and/or performing network security detection on a single host.
Preferably, in step S3:
the agent scans the environment under test according to the type of network security detection: if the type is network-wide detection, the agent starts from one host under test in the network under test, scans the network information, and determines the potential vulnerabilities of each scanned host based on the feedback information; a detection mechanism is then determined for each host under test based on its potential vulnerabilities; if the type is single-host detection, the single host is taken as the current host under test, its potential vulnerabilities are determined, privileges are configured for the detection mechanism, and the current host under test is probed by selecting the behaviors that correspond to its potential vulnerabilities.
Preferably, the state information structure determines the state matrix of the environment under test; the state matrix serves as the input to the agent and encodes the agent's understanding of the environment;
the state matrix of the environment under test is expressed as follows:
$$[\, h_i \mid p_0(h_i)\ \cdots\ p_m(h_i) \mid \mathrm{privilege}(h_i) \mid \mathrm{times}(h_i) \,]$$
wherein $h_i$ denotes the number of the current host under test, $m$ is the total number of configuration items of the hosts under test, $\mathrm{privilege}(h_i)$ denotes the agent's privilege on the current host under test, $p_{num}(h_i)$ denotes the configuration information owned by host $h_i$, and $\mathrm{times}(h_i)$ denotes the number of times host $h_i$ has been selected.
Preferably, in step S5, updating the reward information of the reward module based on the agent's behavior strategy includes:
if the agent successfully completes a behavior, it obtains a reward with a fixed value of 20;
if the agent obtains the specific privilege, it obtains a reward with a fixed value of 50;
if the agent selects a wrong behavior, it receives a penalty with a fixed value of 50; wrong behaviors include, but are not limited to, unexecutable behaviors and repeated behaviors;
if the agent's behavior fails, it receives a penalty with a fixed value of 20.
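The reward rules above can be summarized in a small lookup. This is only a minimal sketch: the `Outcome` enum, its member names, the choice to classify each step under exactly one outcome, and the convention that a penalty is a negative reward are illustrative assumptions; only the magnitudes 20 and 50 come from the text.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for the result of one agent step."""
    ACTION_SUCCEEDED = "action_succeeded"  # behavior completed successfully
    PRIVILEGE_GAINED = "privilege_gained"  # agent obtained the specific privilege
    WRONG_ACTION = "wrong_action"          # unexecutable or repeated behavior
    ACTION_FAILED = "action_failed"        # behavior was valid but failed


# Magnitudes follow the text; the sign convention (penalty = negative reward) is assumed.
REWARD_TABLE = {
    Outcome.ACTION_SUCCEEDED: 20,
    Outcome.PRIVILEGE_GAINED: 50,
    Outcome.WRONG_ACTION: -50,
    Outcome.ACTION_FAILED: -20,
}


def reward(outcome: Outcome) -> int:
    """Return the scalar reward for a single step outcome."""
    return REWARD_TABLE[outcome]


def episode_return(outcomes: list) -> int:
    """Undiscounted sum of rewards over a sequence of step outcomes."""
    return sum(reward(o) for o in outcomes)
```

With these values a wrong behavior costs as much as obtaining the specific privilege earns, so repeated or unexecutable choices are discouraged as strongly as reaching the goal is encouraged.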
According to a second aspect of the present invention, there is provided an automatic network security detection apparatus based on deep reinforcement learning, the apparatus comprising:
an environment information module, configured to construct an environment information acquisition module, which scans the network under test and the hosts under test to discover the following basic information: operating system, open ports, and service information; the discovered information is numbered in order of discovery and stored for subsequent processing and state updates;
a judging module, configured to judge whether a preset target has been reached, the preset target being to achieve network security detection of a specific target;
a state matrix construction module for the environment under test, configured to acquire, by means of the environment information acquisition module, the basic information scanned from the environment, to collect and organize the number, configuration, and privilege of each host under test and the number of times it has been selected, and to construct a state matrix of the environment under test that the agent can understand;
an agent construction module, configured to construct an agent and to determine the agent's behavior strategy based on the state matrix of the environment under test and the reward information of the reward module;
a reward and punishment module, configured to take a behavior according to the agent's behavior strategy, determine the execution result and impact of that behavior in the environment under test, update the reward information of the reward module, guide the agent to update its neural network parameters, and trigger the judging module.
According to a third aspect of the present invention, there is provided an automated network security detection system based on deep reinforcement learning, including:
a processor for executing a plurality of instructions;
a memory for storing a plurality of instructions;
wherein the plurality of instructions are stored in the memory and are loaded by the processor to execute the method described above.
According to a fourth aspect of the present invention, there is provided a computer readable storage medium having a plurality of instructions stored therein; the plurality of instructions are loaded by a processor to perform the method described above.
According to the scheme of the invention, the method constructs an automated model that conforms more closely to the real network security detection process, and addresses the low degree of intelligence and automation and the low practicality of existing automatic network security detection methods. The method adopts deep reinforcement learning, abstracts and models the actual network security detection process as a Markov decision process, and achieves the following effects: (1) the invention provides a new approach to constructing an automatic network security detection model based on deep reinforcement learning; it formalizes the real network security detection process and makes that process more standardized and easier to understand, so that both professional and non-professional security personnel can clearly understand and carry out network security detection, which effectively saves labor cost; (2) with the method, the states of the network and its hosts can be explored automatically during network security detection and an attack path to the target host can be generated, so that the vulnerabilities can be remediated afterwards and the security of the network and hosts can be strengthened; (3) the automatic network security detection model learned by the method can be applied to similar, simple network environments without retraining the agent, and therefore has high flexibility and transferability.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to implement them in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention, and are provided for illustration purposes. In the drawings:
FIG. 1 illustrates the main steps of network security detection in the prior art;
FIG. 2 is a flowchart of an automated network security detection method based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 3 is a diagram of a Markov decision process according to one embodiment of the present invention;
FIG. 4 is a schematic diagram of an automated network security inspection model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an automated network security detection model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the agent's neural network architecture according to an embodiment of the present invention;
FIG. 7 is a block diagram of an automated network security detection device based on deep reinforcement learning according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the specific embodiments of the present invention and the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
First, a flowchart of an automated network security detection method based on deep reinforcement learning according to an embodiment of the present invention is described with reference to FIG. 2. As shown in FIGS. 2, 4 and 5, the method comprises the following steps:
step S1: constructing an environment information acquisition module, which scans the network under test and the hosts under test to discover the following basic information: operating system, open ports, and service information; the discovered information is numbered in order of discovery and stored for subsequent processing and state updates;
step S2: judging whether a preset target has been reached; if so, ending the method; if not, proceeding to step S3; the preset target is to achieve network security detection of a specific target;
step S3: acquiring, by means of the environment information acquisition module, the basic information scanned from the environment; collecting and organizing the number, configuration, and privilege of each host under test and the number of times it has been selected; and constructing a state matrix of the environment under test that the agent can understand;
step S4: constructing an agent, and determining the agent's behavior strategy based on the state matrix of the environment under test and the reward information of the reward module;
step S5: taking a behavior according to the agent's behavior strategy, determining the execution result and impact of that behavior in the environment under test, updating the reward information of the reward module, guiding the agent to update its neural network parameters, and returning to step S2.
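The control flow of steps S1 to S5 can be sketched as a simple loop. The sketch below is illustrative only: the function `run_detection`, its callable parameters, and the `max_steps` safeguard are hypothetical names introduced here and do not appear in the patent; the scanner, agent, and environment are passed in as stand-in callables.

```python
from typing import Callable, List, Tuple

State = List[float]


def run_detection(
    scan: Callable[[], dict],                             # S1: collect OS, open ports, services
    goal_reached: Callable[[dict], bool],                 # S2: preset-target check
    build_state: Callable[[dict], State],                 # S3: encode environment as a state
    choose_action: Callable[[State], int],                # S4: agent behavior strategy
    execute: Callable[[dict, int], Tuple[dict, float]],   # S5: act, observe result and reward
    learn: Callable[[State, int, float, State], None],    # S5: update network parameters
    max_steps: int = 100,                                 # assumed safeguard, not in the source
) -> int:
    """Run the S1-S5 loop and return the number of behaviors taken."""
    info = scan()                                         # S1
    steps = 0
    while steps < max_steps and not goal_reached(info):   # S2
        state = build_state(info)                         # S3
        action = choose_action(state)                     # S4
        info, r = execute(info, action)                   # S5: environment changes
        learn(state, action, r, build_state(info))        # S5: guide the parameter update
        steps += 1
    return steps


# Toy usage with stand-in callables (not a real scanner or agent).
log = []
steps = run_detection(
    scan=lambda: {"privilege": 0},
    goal_reached=lambda info: info["privilege"] >= 2,
    build_state=lambda info: [float(info["privilege"])],
    choose_action=lambda state: 0,
    execute=lambda info, a: ({"privilege": info["privilege"] + 1}, 20.0),
    learn=lambda s, a, r, s2: log.append((s, a, r, s2)),
)
```

In the toy run the goal is reached after two behaviors, and `log` holds one (state, behavior, reward, next state) tuple per iteration, which is the information step S5 uses to guide the parameter update.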
The invention automates the network security detection process and makes it intelligent on the basis of deep reinforcement learning. It first formally represents the network security detection process in a real environment as a Markov decision process, then constructs the reinforcement learning state, behavior, and reward information based on that formal representation, then builds a deep neural network to fit the agent's Q values according to the scale of the state and behavior information, and finally assembles a complete model to achieve automatic network security detection.
Network security detection is a dynamic, interactive process that includes modules such as information collection, remote attack, local attack, and lateral movement. It needs to be formalized as a Markov decision process so that the problem can be solved with reinforcement learning techniques. However, the topology of a real network environment is complex and contains a massive amount of configuration information, which makes formal representation difficult. The method therefore extracts the information common to real network security detection scenarios, analyzes and summarizes common operations of human experts, and constructs a simulation environment that reproduces the dynamic interaction process.
Network security detection places high demands on the knowledge and skills of security experts. It is a complex, costly task that must be carried out continuously, and an effective way to handle it is to automate the process. The invention targets the need for automatic and intelligent network security detection and builds a model that conforms more closely to real scenarios based on deep reinforcement learning.
The core idea of the invention is first to formalize the network security detection process and represent it as a Markov decision process. A Markov decision process consists of four parts: state, behavior, reward, and probability model. The interaction process is shown in FIG. 3, where S represents the state, A represents the behavior, R represents the reward, and P represents the probability of reaching the next state when a behavior is taken in the current state.
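For reference, the four components can be written in the usual textbook notation. The discount factor $\gamma$ and the return $G_t$ below are standard reinforcement learning conventions and are not named in the source; they are included only to show what the reward signal is accumulated into.

```latex
\mathcal{M} = (S, A, R, P), \qquad
P(s' \mid s, a) = \Pr\left(S_{t+1} = s' \mid S_t = s,\ A_t = a\right), \qquad
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \quad 0 \le \gamma \le 1
```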
In step S2:
the preset target is to achieve network security detection of a specific target, which includes performing network security detection on a specific host in a network environment starting from a given starting host, and/or performing network security detection on a single host.
In step S3:
the state information structure determines the state matrix of the environment under test. The state matrix serves as the input to the agent and encodes the agent's understanding of the environment. The state information should take into account as much of the information in the environment as possible, including the topology of the network under test and the configuration information of the hosts in the network.
The agent scans the environment under test according to the type of network security detection. If the type is network-wide detection, the agent starts from one host under test in the network under test, scans the network information, and determines the potential vulnerabilities of each scanned host based on the feedback information; a detection mechanism is then determined for each host under test based on its potential vulnerabilities. For example, different behavior strategies and different priorities may be used to probe each host under test. If the type is single-host detection, the single host is taken as the current host under test, its potential vulnerabilities are determined, privileges are configured for the detection mechanism, and the current host under test is probed by selecting the behaviors that correspond to its potential vulnerabilities.
Both network-wide detection and single-host detection are consistent with the sequential decision process of a Markov decision process, so the network security detection process is formalized as a sequential decision process.
The state matrix of the environment under test is also a network connectivity state matrix of the environment under test. Its connectivity part is an n × n matrix, where n is the number of hosts under test in the network. The configuration information of a single host under test is represented as a one-dimensional matrix of size m, where m is the total number of configuration items of the hosts under test, and $\mathrm{privilege}(h_i)$ denotes the agent's privilege on the current host under test. The state matrix in this case is represented as follows:
Figure RE-GDA0003819323970000091
Under this definition, the size of the network connectivity information matrix is n(n + m + 1). Whenever the scale of the network or the range of the configuration information changes, the size of the matrix changes as well, so the model may need to be retrained when the network environment changes, and its transferability and practicality are insufficient.
As for network scale, a network may contain anywhere from one host to thousands of hosts, so its range is difficult to bound: if the range is set too large the matrix becomes sparse, and if it is set too small the matrix overflows. For the sake of practicality, the network connectivity information matrix is therefore further modified, and the improved matrix is represented as follows:
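A short calculation illustrates how quickly this first state definition grows. The row breakdown in the comment (n connectivity entries, m configuration entries, and one privilege entry per host) is an assumption inferred from the stated size n(n + m + 1), and the value m = 20 is an arbitrary illustrative choice.

```python
def connectivity_state_size(n_hosts: int, m_config: int) -> int:
    """Entries in the first state definition: n rows of (n + m + 1) values."""
    # Assumed row layout: n connectivity entries + m configuration entries + 1 privilege value.
    return n_hosts * (n_hosts + m_config + 1)


# The input size changes with every change in host count, which is why a
# network trained for one size cannot be reused directly for another.
sizes = {n: connectivity_state_size(n, m_config=20) for n in (5, 50, 500)}
```

Here `sizes` evaluates to `{5: 130, 50: 3550, 500: 260500}`: the state grows quadratically with the number of hosts.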
$$[\, h_i \mid p_0(h_i)\ \cdots\ p_m(h_i) \mid \mathrm{privilege}(h_i) \mid \mathrm{times}(h_i) \,]$$
wherein $h_i$ denotes the number of the current host under test, $p_{num}(h_i)$ denotes the configuration information owned by the host under test, $\mathrm{times}(h_i)$ denotes the number of times the host under test has been selected, and $0 \le num \le m$. For example, 11 operating systems, 12 services, 11 service versions, and 51 common ports are selected as the configuration information of a host under test; the selected common configuration information is shown in Table 1:
Figure RE-GDA0003819323970000092
Table 1. Common host configuration information
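The per-host state vector can be sketched as follows. Table 1 is not reproduced in this text, so the vocabularies below are shortened, hypothetical stand-ins; the binary (present or absent) encoding of each configuration item, the `HostInfo` class, and the numeric privilege levels are likewise assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical, shortened vocabularies; the patent's Table 1 lists the real ones.
OPERATING_SYSTEMS = ["linux", "windows"]
SERVICES = ["ssh", "http", "smb"]
PORTS = [22, 80, 445]


@dataclass
class HostInfo:
    number: int                      # h_i: discovery-order number of the host
    os: str
    services: Set[str] = field(default_factory=set)
    open_ports: Set[int] = field(default_factory=set)
    privilege: int = 0               # privilege(h_i); levels such as 0/1/2 are assumed
    times_selected: int = 0          # times(h_i)


def encode_state(host: HostInfo) -> List[float]:
    """Build [h_i | p_0..p_m | privilege | times] with binary configuration flags."""
    config = (
        [1.0 if host.os == name else 0.0 for name in OPERATING_SYSTEMS]
        + [1.0 if s in host.services else 0.0 for s in SERVICES]
        + [1.0 if p in host.open_ports else 0.0 for p in PORTS]
    )
    return [float(host.number)] + config + [float(host.privilege), float(host.times_selected)]


host = HostInfo(number=1, os="linux", services={"ssh"}, open_ports={22})
state = encode_state(host)
```

Because the vector length depends only on the size of the configuration vocabularies and not on the number of hosts, the same agent input size can be kept when the network grows or shrinks.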
The step S4, wherein:
the intelligent agent determines behaviors based on the output, wherein the behaviors are the output of the intelligent agent and represent the decision made by the intelligent agent aiming at the current environment, for the intelligent agent for reinforcement learning, the intelligent agent cannot generate new behaviors by self, and only can select a certain behavior from a behavior library according to the learned experience, so that the behavior information of experts is fully considered to construct a proper behavior library, and the behaviors at least comprise scanning, local and remote vulnerability exploitation, transverse movement and the like.
As shown in fig. 6, the agent is a neural network model, and the neural network model includes three fully connected layers; the training process of the neural network model comprises the following steps: and updating the parameters of the neural network model through the output of the full connection layer until the precision reaches a preset threshold value. When designing a state information structure and a behavior module, in order to improve the practicability of the state information structure and the behavior module in a real scene, environmental information and available action information in the real scene are fully considered, so that the problem of huge state space and behavior space is inevitable. Therefore, the invention adopts the deep neural network to carry out nonlinear function approximation so as to fit the action value function.
In this embodiment, the reward information of the reward module is initially 0. Determining the behavior strategy of the agent based on the network communication information matrix and the host configuration matrix includes:
analyzing the state of the environment to be tested that is input to the agent; observing the configuration information of the current host to be tested and determining its possible local vulnerabilities; and observing the information of the hosts connected to the current host to be tested and determining the possible remote vulnerabilities and the credential information that can be used to connect to other hosts to be tested.
Accordingly, for the behaviors selectable by the agent, the invention divides network security detection behaviors into three categories: local vulnerability exploitation operations for privilege escalation on a host, remote vulnerability exploitation operations for attacking other hosts, and movement operations for connecting to other hosts. For the common host configuration information, some of the optional behaviors are listed by category in Table 2.
Table 2. Common behavior information
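One way to picture a behavior library organized into the three categories is sketched below. The entries are placeholders, not the contents of Table 2; the names `Category`, `Behavior`, `BEHAVIOR_LIBRARY`, and `executable_behaviors`, and the idea of gating each behavior on a single required configuration item, are assumptions made for illustration.

```python
from enum import Enum
from typing import NamedTuple


class Category(Enum):
    LOCAL_EXPLOIT = "local vulnerability exploitation"
    REMOTE_EXPLOIT = "remote vulnerability exploitation"
    MOVE = "movement to another host"


class Behavior(NamedTuple):
    index: int          # position in the agent's output vector
    name: str           # hypothetical label
    category: Category
    requires: str       # configuration item that must be present (assumed rule)


# Placeholder entries only; the real library would follow Table 2.
BEHAVIOR_LIBRARY = [
    Behavior(0, "local_exploit_example", Category.LOCAL_EXPLOIT, "os:linux"),
    Behavior(1, "remote_exploit_example", Category.REMOTE_EXPLOIT, "service:http"),
    Behavior(2, "connect_with_credential", Category.MOVE, "service:ssh"),
]


def executable_behaviors(configs: set) -> list:
    """Return the library entries whose precondition holds on the host."""
    return [b for b in BEHAVIOR_LIBRARY if b.requires in configs]


candidates = executable_behaviors({"os:linux", "service:ssh"})
```

Because the agent can only pick from a fixed library, each entry keeps a stable index that matches one position of the network output.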
For example, based on the definitions of state and behavior in the automated network security detection model, the input state matrix has size 80 and the output matrix has size 20. A neural network architecture is designed as shown in the figure; it comprises three fully connected layers of sizes 256, 128 and 64, respectively. The value output through the fully connected layers for each behavior is taken as the corresponding Q value, and the parameters are updated according to the formula in figure RE-GDA0003819323970000111 to train the agent.
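The network shape in this example can be sketched in plain Python as a forward pass that maps an 80-element state to 20 Q values. Several points are assumptions rather than statements from the patent: that the three fully connected layers are hidden layers followed by a separate 20-unit output layer, the ReLU activation, the weight initialization, and the epsilon-greedy selection rule. The parameter update formula is not reproduced, so training is omitted.

```python
import random

# Input size, three fully connected layers, output size (output layer assumed).
LAYER_SIZES = [80, 256, 128, 64, 20]


def init_layer(n_in, n_out, rng):
    """Create one dense layer with small uniform weights and zero biases."""
    scale = (1.0 / n_in) ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def init_network(rng):
    return [init_layer(a, b, rng) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]


def forward(network, state):
    """Return one Q value per behavior for the given state vector."""
    x = state
    last = len(network) - 1
    for idx, (weights, biases) in enumerate(network):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if idx < last:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers (assumed)
    return x


def select_behavior(q_values, epsilon, rng):
    """Epsilon-greedy choice over the Q values (assumed exploration rule)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


rng = random.Random(0)
network = init_network(rng)
state = [1.0 if i % 5 == 0 else 0.0 for i in range(80)]
q_values = forward(network, state)
behavior = select_behavior(q_values, epsilon=0.0, rng=rng)
```

With `epsilon=0.0` the agent always takes the behavior with the largest Q value; a positive `epsilon` would occasionally pick a random behavior during training.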
In this embodiment, different network environments have different states and available actions, so a model trained for the current network is difficult to apply to other networks, which lowers the applicability of the constructed automated network security detection model. The factors present in real network security detection scenarios are therefore considered as fully as possible when designing the states and actions, in order to improve the transferability and applicability of the model.
In step S5:
The reward is feedback on the agent's behavior and affects both the correctness and the convergence speed of the agent's decisions. It consists of positive feedback obtained when the agent takes a correct behavior and negative feedback obtained when it takes a wrong behavior; the invention accordingly divides the reward into a positive feedback part and a negative feedback part.
Updating the reward information of the reward module based on the behavior strategy of the agent includes:
if the agent successfully completes a behavior, it obtains a reward with a fixed value of 20;
if the agent obtains a certain privilege, it obtains a reward with a fixed value of 50. In particular, to guide the agent's network security detection toward the target host to be tested, a larger reward with a fixed value of 100 is assigned for the target host to be tested, which encourages the agent to learn an effective strategy;
if the agent selects a wrong behavior, it receives a penalty with a fixed value of 50; wrong behaviors include, but are not limited to, unexecutable behaviors and repeated behaviors;
if a behavior of the agent fails, it receives a penalty with a fixed value of 20.
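The reward rules above can be written as a small lookup table. The fixed values 20, 50, 100, 50 and 20 come from the text; representing penalties as negative numbers, treating each outcome as mutually exclusive, and the names `Outcome`, `REWARD_TABLE`, and `reward_for` are assumptions made for this sketch.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "behavior completed successfully"
    PRIVILEGE_GAINED = "privilege obtained on a host"
    TARGET_REACHED = "privilege obtained on the target host"
    WRONG_BEHAVIOR = "unexecutable or repeated behavior"
    FAILED = "behavior executed but failed"


# Penalties are stored as negative rewards (an assumption about sign).
REWARD_TABLE = {
    Outcome.SUCCESS: 20,
    Outcome.PRIVILEGE_GAINED: 50,
    Outcome.TARGET_REACHED: 100,
    Outcome.WRONG_BEHAVIOR: -50,
    Outcome.FAILED: -20,
}


def reward_for(outcome: Outcome) -> int:
    """Look up the fixed reward or penalty for one behavior outcome."""
    return REWARD_TABLE[outcome]


episode = [Outcome.SUCCESS, Outcome.WRONG_BEHAVIOR,
           Outcome.PRIVILEGE_GAINED, Outcome.TARGET_REACHED]
total = sum(reward_for(o) for o in episode)
```

In this toy episode the single wrong behavior cancels the reward for one obtained privilege, which illustrates why repeated or unexecutable behaviors are discouraged during learning.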
Fig. 7 is a schematic structural diagram of an automated network security detection apparatus based on deep reinforcement learning according to an embodiment of the present invention. As shown in Fig. 7, the apparatus includes:
an environment information module, configured to construct an environment information acquisition module that scans the network to be tested and the hosts to be tested to discover the following basic information: operating system, open ports and service information, which are numbered in discovery order and stored for subsequent processing and state updating;
a judging module, configured to judge whether a preset target has been reached, the preset target being to achieve network security detection of a specific target;
an environment state matrix construction module, configured to obtain, based on the environment information acquisition module, the basic information scanned from the environment, to collect and organize the number, configuration and privilege of each host to be tested and the number of times it has been selected, and to construct a state matrix of the environment to be tested that the agent can understand;
an agent construction module, configured to construct an agent and to determine the behavior strategy of the agent based on the environment state matrix to be tested and the reward information of the reward module;
a reward and punishment module, configured to take a behavior based on the behavior strategy of the agent, determine the execution result and effect of the behavior in the environment to be tested, update the reward information of the reward module, guide the agent to update the neural network parameters, and trigger the judging module.
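The way these modules hand off to each other can be sketched as a loop over steps S1 to S5. The function `run_detection`, its callback parameters, the `max_steps` cap, and the toy stand-ins in the usage part are all hypothetical; the sketch only shows the control flow and performs no scanning or exploitation.

```python
def run_detection(scan, goal_reached, build_state, choose_behavior,
                  execute, learn, max_steps=100):
    """Drive the S1-S5 loop with caller-supplied module callbacks."""
    hosts = scan()                          # S1: environment information module
    history = []
    for _ in range(max_steps):              # step cap is an added safeguard
        if goal_reached(hosts):             # S2: judging module
            break
        state = build_state(hosts)          # S3: state matrix construction
        behavior = choose_behavior(state)   # S4: agent selects a behavior
        reward = execute(hosts, behavior)   # S5: take the behavior, get reward
        learn(state, behavior, reward)      # S5: update network parameters
        history.append((behavior, reward))
    return history


# Toy stand-ins: one behavior immediately reaches the preset target.
hosts_stub = {"privilege_on_target": 0}


def toy_execute(hosts, behavior):
    hosts["privilege_on_target"] = 1
    return 100


history = run_detection(
    scan=lambda: hosts_stub,
    goal_reached=lambda h: h["privilege_on_target"] > 0,
    build_state=lambda h: [h["privilege_on_target"]],
    choose_behavior=lambda s: 0,
    execute=toy_execute,
    learn=lambda s, b, r: None,
)
```

Passing the modules in as callbacks mirrors the modular split of the apparatus: each module can be replaced independently without changing the loop.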
The embodiment of the invention further provides an automatic network security detection system based on deep reinforcement learning, which comprises:
a processor for executing a plurality of instructions;
a memory to store a plurality of instructions;
wherein the plurality of instructions are stored in the memory and are loaded by the processor to execute the method described above.
An embodiment of the invention further provides a computer-readable storage medium in which a plurality of instructions are stored; the plurality of instructions are loaded by a processor to perform the method described above.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
In the several embodiments provided in the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and in actual implementation, there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may also be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a physical machine Server, or a network cloud Server, etc., and needs to install a Windows or Windows Server operating system) to perform some steps of the method according to various embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and any simple modification, equivalent change and modification made to the above embodiment according to the technical spirit of the present invention are still within the scope of the technical solution of the present invention.

Claims (8)

1. An automatic network security detection method based on deep reinforcement learning is characterized by comprising the following steps:
step S1: constructing an environment information acquisition module, which scans the network to be tested and the hosts to be tested to discover the following basic information: operating system, open ports and service information, which are numbered in discovery order and stored for subsequent processing and state updating;
step S2: judging whether a preset target has been reached; if so, ending the method; if not, proceeding to step S3; the preset target being to achieve network security detection of a specific target;
step S3: obtaining, based on the environment information acquisition module, the basic information scanned from the environment; collecting and organizing the number, configuration and privilege of each host to be tested and the number of times it has been selected; and constructing a state matrix of the environment to be tested that the agent can understand;
step S4: constructing an agent; determining a behavior strategy of the agent based on the environment state matrix to be tested and the reward information of the reward module;
step S5: taking a behavior based on the behavior strategy of the agent, determining the execution result and effect of the behavior in the environment to be tested, updating the reward information of the reward module, guiding the agent to update the neural network parameters, and returning to step S2.
2. The method of claim 1, wherein in step S2:
the preset target is to achieve network security detection of a specific target, including network security detection of a specific host in a network environment and/or network security detection of a single host starting from a certain starting host.
3. The method of claim 2, wherein in step S3:
based on the type of network security detection, the agent scans the environment to be tested: if the network security detection type is network security detection, starting from a certain host to be tested in the network to be tested, the information of the network to be tested is scanned, and the potential vulnerabilities of the scanned hosts to be tested are determined based on feedback information; a detection mechanism is determined for each host to be tested based on its potential vulnerabilities; and if the network security detection type is single-host network security detection, the single host is taken as the current host to be tested, its potential vulnerabilities are determined, privileges are configured for the detection mechanism, and the current host to be tested is tested by selecting the behaviors corresponding to its potential vulnerabilities.
4. The method of claim 3, wherein the state information structure determines a state matrix of the environment under test as input to the agent, including the agent's understanding of the environment;
the environment state matrix to be measured is represented as follows:
$[\, h_i \mid p_0(h_i) \dots p_m(h_i) \mid \mathrm{privilege}(h_i) \mid \mathrm{times}(h_i) \,]$
where $h_i$ denotes the number of the current host to be tested, $m$ is the number of all configuration information items of the host to be tested, $\mathrm{privilege}(h_i)$ denotes the privilege of the agent on the current host to be tested, $p_{num}(h_i)$ denotes the configuration information owned by the host to be tested $h_i$, and $\mathrm{times}(h_i)$ denotes the number of times the host to be tested $h_i$ has been selected.
5. The method of claim 4, wherein the step S5, updating the reward information of the reward module based on the behavior policy of the agent, comprises:
if the agent successfully completes a behavior, it obtains a reward with a fixed value of 20;
if the agent obtains a specific privilege, it obtains a reward with a fixed value of 50;
if the agent selects a wrong behavior, it receives a penalty with a fixed value of 50; wrong behaviors include, but are not limited to, unexecutable behaviors and repeated behaviors;
if a behavior of the agent fails, it receives a penalty with a fixed value of 20.
6. An automatic network security detection device based on deep reinforcement learning, which is characterized by comprising:
an environment information module, configured to construct an environment information acquisition module that scans the network to be tested and the hosts to be tested to discover the following basic information: operating system, open ports and service information, which are numbered in discovery order and stored for subsequent processing and state updating;
a judging module, configured to judge whether a preset target has been reached, the preset target being to achieve network security detection of a specific target;
an environment state matrix construction module, configured to obtain, based on the environment information acquisition module, the basic information scanned from the environment, to collect and organize the number, configuration and privilege of each host to be tested and the number of times it has been selected, and to construct a state matrix of the environment to be tested that the agent can understand;
an agent construction module, configured to construct an agent and to determine the behavior strategy of the agent based on the environment state matrix to be tested and the reward information of the reward module;
a reward and punishment module, configured to take a behavior based on the behavior strategy of the agent, determine the execution result and effect of the behavior in the environment to be tested, update the reward information of the reward module, guide the agent to update the neural network parameters, and trigger the judging module.
7. An automated network security detection system based on deep reinforcement learning, comprising:
a processor for executing a plurality of instructions;
a memory to store a plurality of instructions;
wherein the plurality of instructions are stored in the memory and are loaded by the processor to execute the method of any one of claims 1-5.
8. A computer-readable storage medium having stored therein a plurality of instructions; the plurality of instructions being loaded by a processor to perform the method of any one of claims 1 to 5.
CN202210355346.8A 2022-04-02 2022-04-02 Automatic network security detection method based on deep reinforcement learning Active CN115102705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210355346.8A CN115102705B (en) 2022-04-02 2022-04-02 Automatic network security detection method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN115102705A true CN115102705A (en) 2022-09-23
CN115102705B CN115102705B (en) 2023-11-03

Family

ID=83287957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210355346.8A Active CN115102705B (en) 2022-04-02 2022-04-02 Automatic network security detection method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN115102705B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117235742A (en) * 2023-11-13 2023-12-15 中国人民解放军国防科技大学 Intelligent penetration test method and system based on deep reinforcement learning
CN117252111A (en) * 2023-11-15 2023-12-19 中国电建集团贵阳勘测设计研究院有限公司 Active monitoring method for hidden danger and dangerous case area of dyke

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110225019A (en) * 2019-06-04 2019-09-10 腾讯科技(深圳)有限公司 A kind of network security processing method and device
CN112861442A (en) * 2021-03-10 2021-05-28 中国人民解放军国防科技大学 Multi-machine collaborative air combat planning method and system based on deep reinforcement learning
CN113919485A (en) * 2021-10-19 2022-01-11 西安交通大学 Multi-agent reinforcement learning method and system based on dynamic hierarchical communication network
CN114115342A (en) * 2021-11-19 2022-03-01 南京航空航天大学 Unmanned cluster multi-domain cooperation system and method based on conflict processing

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117235742A (en) * 2023-11-13 2023-12-15 中国人民解放军国防科技大学 Intelligent penetration test method and system based on deep reinforcement learning
CN117235742B (en) * 2023-11-13 2024-05-14 中国人民解放军国防科技大学 Intelligent penetration test method and system based on deep reinforcement learning
CN117252111A (en) * 2023-11-15 2023-12-19 中国电建集团贵阳勘测设计研究院有限公司 Active monitoring method for hidden danger and dangerous case area of dyke
CN117252111B (en) * 2023-11-15 2024-02-23 中国电建集团贵阳勘测设计研究院有限公司 Active monitoring method for hidden danger and dangerous case area of dyke

Also Published As

Publication number Publication date
CN115102705B (en) 2023-11-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant