CN116383826A - Binary vulnerability discovery process dynamic optimization method based on deep reinforcement learning - Google Patents
- Publication number
- CN116383826A (application number CN202310302345.1A)
- Authority
- CN
- China
- Prior art keywords
- sample
- state
- mutation
- binary
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
- G06F21/577—Assessing vulnerabilities and evaluating computer system security
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
- G06F21/563—Static detection by source code analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method, based on deep reinforcement learning, for dynamically optimizing the binary vulnerability discovery process. The method comprises the following steps: a binary program input sample used in fuzz testing is taken as the environment state and fed into a deep reinforcement learning model, the environment state representing the input sample as a byte array; the current sample state is converted into a mutated sample state through a mutation strategy, and the input sample state for the next time step is selected based on the execution path information of the binary program sample; a feedback reward is computed for the mutated sample state based on a coverage metric; and whether the mutation strategy is effective is judged from the feedback reward, so that mutation strategy selection is optimized and the binary vulnerability discovery process is dynamically optimized. The invention improves the quality of the samples generated by mutation, thereby improving the efficiency of binary code fuzz testing, helping to find and expose software vulnerabilities, and improving the quality and security of the software.
Description
Technical Field
The invention relates to the technical field of binary code security, and in particular to a method, based on deep reinforcement learning, for dynamically optimizing the binary vulnerability discovery process.
Background
As computer systems gain functionality, their composition grows more complex and their scale larger; software has reached an unprecedented scale, and more and more software security problems are being exposed. Software vulnerabilities are a problem that cannot be ignored, and they may stem from logic flaws that programmers inadvertently leave behind when writing applications. Any latent flaw or human error can cause significant loss to individuals and society, because attackers often exploit such program vulnerabilities to attack and damage systems.
Whether software contains vulnerabilities is a decisive factor in the security of an information system. Although vulnerabilities cannot be avoided entirely over the whole life cycle of a software system, the security of computer software can be ensured to a great extent if vulnerabilities are detected and identified in time and security patches are then released. Accurately detecting and identifying software vulnerabilities, analyzing their severity, cause, and means of exploitation, and releasing security patches in time is therefore an important activity in the computer software industry.
Vulnerability discovery is a common defensive method that aims to detect as many of the vulnerabilities hidden in a program as possible, as quickly as possible, and to repair them in time. Software vulnerability discovery techniques can be broadly divided into two categories according to the object analyzed: the first is source code vulnerability discovery, which targets files whose source code is available; the second is binary code vulnerability discovery, which targets closed-source software. To protect commercial interests and intellectual property, most software vendors do not release the source code of their products, so only binary programs can be obtained. Binary code vulnerability discovery therefore has broad applicability and practical value.
Compared with source code vulnerability detection, binary code vulnerability detection faces several problems: program information is hard to extract directly, assembly code is bulky and hard to analyze, and detection is laborious and time-consuming. With the development of artificial intelligence, machine learning, which trains a model on existing data and then uses the model to make predictions, has matured, and many researchers have applied it to vulnerability analysis and discovery. Compared with traditional methods, a vulnerability detection model based on machine learning can process large-scale data sets, which increases detection speed and reduces detection cost. Because it learns automatically, machine learning can also reduce manual effort. At present, research on machine-learning-based software vulnerability discovery is still immature: for lack of a standard data set and of an effective feature set, existing methods often have high false positive and false negative rates.
Therefore, studying how to use machine learning for software vulnerability discovery, and how to improve its accuracy, is of great significance and promise.
Disclosure of Invention
In view of the above, the invention provides a method, based on deep reinforcement learning, for dynamically optimizing the binary vulnerability discovery process, which realizes dynamic optimization of the mutation strategy and improves the efficiency of accurately locating binary code vulnerabilities.
To achieve the above purpose, the invention adopts the following technical scheme:
A method, based on deep reinforcement learning, for dynamically optimizing the binary vulnerability discovery process comprises the following steps:
taking a binary program input sample used in fuzz testing as the environment state and feeding it into a deep reinforcement learning model, the environment state representing the input sample as a byte array;
converting the current sample state into a mutated sample state through a mutation strategy, and selecting the input sample state for the next time step based on the execution path information of the binary program sample;
computing a feedback reward for the mutated sample state based on a coverage metric;
judging from the feedback reward whether the mutation strategy is effective, optimizing mutation strategy selection, and thereby dynamically optimizing the binary vulnerability discovery process.
Preferably, converting the current sample state into the mutated sample state through the mutation strategy specifically includes:
according to the current sample state $s_t$, selecting a mutation action $a_t$ from the mutation action space by means of the policy function $\pi$, as shown in the following formula:
$$a_t = \pi(s_t)$$
applying the mutation action to the current input sample state $s_t$ to obtain the mutated sample state $s_t'$, as shown in the following formula:
$$s_t' = \mathrm{Mutate}(s_t, a_t)$$
where Mutate() is the mutation function.
Preferably, selecting the input sample state for the next time step based on the execution path information of the binary program sample specifically includes:
setting up a sample queue $Q_s$ and a set $P$ of all path information executed by the existing samples;
if, at time step $t$, the current sample state is $s_t$, the mutated sample state is $s_t'$, and a new sample execution path $p_t$ is produced during execution, then the sample execution path $p_t$ is added to the queue $Q_s$, the set $P$ is updated, and $s_t'$ is taken as the input sample state for the next time step; if no new sample execution path $p_t$ is produced during execution, a sample is selected at random from the valid-sample queue $Q_s$ as the input sample state for the next time step, as shown in the following formula:
$$s_{t+1} = \begin{cases} s_t', & p_t \notin P \\ \mathrm{Random\_choose}(Q_s), & \text{otherwise} \end{cases}$$
where Random_choose() is a random selection function and $P$ is the set of all path information executed by the existing samples.
Preferably, computing the feedback reward for the mutated sample state based on the coverage metric specifically includes:
computing and recording the execution path information corresponding to the mutated sample state $s_t'$, giving the record set $M_t = \mathrm{Execute}(s_t')$;
where Execute() is the execution path recording function, $M_t$ is the record set of execution path information, and each element $m_{i,j}$ of $M_t$ is the number of times the jump edge from basic block $b_i$ to basic block $b_j$ was executed;
judging whether $m_{i,j}$ is greater than 0; if $m_{i,j} > 0$, the jump edge has been executed at least once, and the subset of records satisfying this condition is denoted $M_t'$, as shown in the following formula:
$$M_t' = \{\, m_{i,j} \mid m_{i,j} > 0,\ m_{i,j} \in M_t \,\}$$
from the record subset $M_t'$ and the record set $M_t$, computing the proportion of all jump edges of the target program that are covered when the current sample is executed, and using it as the feedback reward of the current sample, with the following formula:
$$R_t = \frac{\mathrm{size}(M_t')}{\mathrm{size}(M_t)}$$
where $R_t$ is the proportion of all jump edges of the target program that are covered, and size() is the number of elements in a set.
Compared with the prior art, the invention discloses a method, based on deep reinforcement learning, for dynamically optimizing the binary vulnerability discovery process. By improving the traditional fuzz testing process with a deep reinforcement learning model, it reduces the randomness and blindness of input sample mutation, increases the probability of generating valid samples, and improves the quality of the samples generated by mutation. This improves the efficiency of binary code fuzz testing, helps to find and expose software vulnerabilities, and improves the quality and security of the software.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a dynamic optimization method of a binary vulnerability discovery process based on deep reinforcement learning.
FIG. 2 is a block diagram of a dynamic optimization method of a binary vulnerability discovery process based on deep reinforcement learning.
FIG. 3 is a diagram showing dynamic optimization of a sample mutation strategy according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention discloses a method, based on deep reinforcement learning, for dynamically optimizing the binary vulnerability discovery process. As shown in FIG. 1 and FIG. 2, it comprises the following steps:
taking a binary program input sample used in fuzz testing as the environment state and feeding it into a deep reinforcement learning model, the environment state representing the input sample as a byte array;
converting the current sample state into a mutated sample state through a mutation strategy, and selecting the input sample state for the next time step based on the execution path information of the binary program sample;
computing a feedback reward for the mutated sample state based on a coverage metric;
judging from the feedback reward whether the mutation strategy is effective, optimizing mutation strategy selection, and thereby dynamically optimizing the binary vulnerability discovery process.
In this embodiment, an input sample can be characterized in several ways, including constructing the input set space from all substrings of the sample, constructing it from the sample's bit-level data, and so on. In terms of mutation granularity, strings are too coarse and bits are too fine, and neither favors effective mutation of the data. The invention therefore uses a byte array to represent the state $s$ corresponding to the input sample data $D$. To maximize the probability of finding a new path, a subset of the set of all elements of the binary program input sample is set to null, namely [formula omitted in source]. Meanwhile, to make better use of the history of earlier data mutations, the system maintains a valid-sample queue $Q_s$ built from that history, and collects all path information executed by the existing samples in a set $P$. If, at time step $t$, the sample state is $s_t$, the mutated sample state is $s_t'$, and a new execution path $p_t$ is produced during execution, then it is added to the queue $Q_s$, the set $P$ is updated, and $s_t'$ is taken as the state input for the next time step; otherwise a sample is selected at random from the valid-sample queue $Q_s$ as the state input for the next time step, as shown in the following formula:
$$s_{t+1} = \begin{cases} s_t', & p_t \notin P \\ \mathrm{Random\_choose}(Q_s), & \text{otherwise} \end{cases}$$
where Random_choose() is a random selection function.
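The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: a path is modelled as a frozen set of edges, the queue is assumed to hold the mutated samples themselves (since the next state is drawn from it), and the names `select_next_state`, `seen_paths`, and the empty-queue fallback are hypothetical.

```python
import random
from typing import FrozenSet, List, Set, Tuple

# Assumption: an execution path is modelled as the set of (source, target)
# basic-block edges it covers. The source does not fix a representation.
Path = FrozenSet[Tuple[int, int]]


def select_next_state(
    mutated: bytes,
    path: Path,
    queue: List[bytes],
    seen_paths: Set[Path],
    rng: random.Random,
) -> bytes:
    """Return the input sample state for time step t+1."""
    if path not in seen_paths:
        # New path: update P, remember the sample in Q_s, and keep s_t'.
        seen_paths.add(path)
        queue.append(mutated)
        return mutated
    if not queue:
        # Hypothetical fallback for an empty queue, which the source
        # does not discuss.
        return mutated
    # No new path: draw the next state at random from the valid-sample queue.
    return rng.choice(queue)


if __name__ == "__main__":
    rng = random.Random(0)
    queue = [b"seed"]
    seen = {frozenset({(0, 1)})}
    print(select_next_state(b"new", frozenset({(0, 2)}), queue, seen, rng))
    print(select_next_state(b"dup", frozenset({(0, 1)}), queue, seen, rng))
```

Keeping `seen_paths` as a set makes the "is this path new" check a constant-time lookup, which matters when the loop runs once per mutated sample.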
In this embodiment, the core of the fuzz testing procedure is to mutate the sample data so as to obtain new samples that can drive the target program into an abnormal state. Weighing performance against efficiency, the method of the invention summarizes the common data mutation methods into a mutation action space $A$, which is listed in the following table:
Table 1. Mutation action space
According to the current sample state $s_t$, the reinforcement learning model selects a mutation action $a_t$ from the mutation action space by means of the policy function $\pi()$, as shown in the following formula:
$$a_t = \pi(s_t)$$
where $\pi()$ is the policy function, $s_t$ is the current sample state, and $a_t$ is the mutation action.
According to the selected action, the current input data state $s_t$ is mutated so as to explore the environment state space and the mutation action space fully and to obtain the mutated sample state $s_t'$ with higher path coverage, as shown in the following formula:
$$s_t' = \mathrm{Mutate}(s_t, a_t)$$
where Mutate() is the mutation function.
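As a rough illustration of the two formulas above, the following Python sketch pairs a policy with a mutation function. Everything concrete here is an assumption: Table 1 is not reproduced in the source, so the three mutation actions (`flip_bit`, `insert_byte`, `delete_byte`) are hypothetical stand-ins, and the epsilon-greedy rule over a caller-supplied `q_value` function merely stands in for the deep reinforcement learning model.

```python
import random
from typing import Callable, List

Mutation = Callable[[bytearray, random.Random], bytearray]


def flip_bit(state: bytearray, rng: random.Random) -> bytearray:
    """Flip one randomly chosen bit."""
    out = bytearray(state)
    if out:
        i = rng.randrange(len(out))
        out[i] ^= 1 << rng.randrange(8)
    return out


def insert_byte(state: bytearray, rng: random.Random) -> bytearray:
    """Insert one random byte at a random position."""
    out = bytearray(state)
    out.insert(rng.randint(0, len(out)), rng.randrange(256))
    return out


def delete_byte(state: bytearray, rng: random.Random) -> bytearray:
    """Delete one byte, keeping at least one byte in the sample."""
    out = bytearray(state)
    if len(out) > 1:
        del out[rng.randrange(len(out))]
    return out


# Hypothetical action space A; the real Table 1 is missing from the source.
ACTIONS: List[Mutation] = [flip_bit, insert_byte, delete_byte]


def policy(
    state: bytearray,
    q_value: Callable[[bytearray, int], float],
    rng: random.Random,
    epsilon: float = 0.1,
) -> int:
    """a_t = pi(s_t): epsilon-greedy over q_value, a stand-in for the model."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q_value(state, a))


def mutate(state: bytearray, action: int, rng: random.Random) -> bytearray:
    """s_t' = Mutate(s_t, a_t); the input state is left unmodified."""
    return ACTIONS[action](state, rng)


if __name__ == "__main__":
    rng = random.Random(1)
    s_t = bytearray(b"ABCD")
    a_t = policy(s_t, lambda s, a: float(a == 1), rng, epsilon=0.0)
    print(a_t, mutate(s_t, a_t, rng))
```

Each mutation copies its input, so the current state stays available if the mutated sample turns out not to reach a new path.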
In this embodiment, it is noted that in a conventional fuzz testing process the test result is judged by whether a potential vulnerability of the program is triggered, which usually shows up as whether the target program under test is observed to enter an abnormal state such as a crash or a hang. However, triggering an exception in a binary program by fuzz testing is a long and time-consuming process. If this alone is used as the feedback signal from the environment, the mutation strategy is hard to adjust in time, and time and resources are wasted on ineffective sample mutations.
To solve this problem, a coverage metric is used to measure the program execution region covered by the current sample. Samples with higher coverage explore the code execution space of the target program more fully and are therefore more likely to trigger abnormal execution logic, that is, potentially dangerous vulnerabilities of the target program. Common coverage metrics include dedicated coverage, line coverage, basic block coverage, branch coverage, condition coverage, edge coverage, and the like. With preprocessing of the target program such as instrumentation, the coverage of a sample can be obtained immediately after the target program finishes executing. This value changes noticeably during continuous fuzz testing and gives timely feedback on the quality of the current sample, so that the scheduler can adjust its subsequent choice of mutation strategy accordingly.
Compared with other coverage metrics, edge coverage provides relatively more path information, so edge coverage is chosen as the basis for computing the feedback reward.
The execution path information corresponding to the mutated sample state $s_t'$ is computed, written to shared memory, and denoted $M_t = \mathrm{Execute}(s_t')$. Each element $m_{i,j}$ of the record is the number of times the jump edge from one basic block $b_i$ to another basic block $b_j$ was executed. If $m_{i,j} > 0$, the jump edge has been executed at least once, and the subset of records satisfying this condition is denoted $M_t'$, as shown in the following formula:
$$M_t' = \{\, m_{i,j} \mid m_{i,j} > 0,\ m_{i,j} \in M_t \,\}$$
where $m_{i,j}$ is the number of times the jump edge from basic block $b_i$ to $b_j$ was executed.
The feedback reward is the proportion of all jump edges of the target program that are covered when the current sample is executed, computed as follows:
$$R_t = \frac{\mathrm{size}(M_t')}{\mathrm{size}(M_t)}$$
where $R_t$ is the proportion of all jump edges of the target program that are covered, and size() is the number of elements in a set.
In this embodiment, judging whether the mutation strategy is effective according to the feedback reward specifically includes: comparing the feedback reward directly with a preset value; if the feedback reward is greater than the preset value, the mutation strategy is effective, and otherwise it is ineffective.
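The edge-coverage reward and the threshold check described above can be sketched as follows. This is a minimal illustration, assuming the record set is available as a dictionary from edges to hit counts that contains an entry (possibly zero) for every jump edge of the target program; the function names and the threshold value `0.4` are hypothetical, since the source does not give the preset value.

```python
from typing import Dict, Tuple

Edge = Tuple[int, int]  # (i, j): jump from basic block b_i to basic block b_j


def coverage_reward(m_t: Dict[Edge, int]) -> float:
    """R_t = size(M_t') / size(M_t), with M_t' the edges hit at least once."""
    if not m_t:
        return 0.0  # no instrumented edges: treat the reward as zero
    hit_edges = {edge for edge, count in m_t.items() if count > 0}
    return len(hit_edges) / len(m_t)


def is_effective(reward: float, preset: float) -> bool:
    """A mutation strategy counts as effective if the reward exceeds the preset value."""
    return reward > preset


if __name__ == "__main__":
    # Two of the four edges were executed, so the reward is 2 / 4.
    m_t = {(0, 1): 3, (1, 2): 0, (1, 3): 1, (3, 4): 0}
    r_t = coverage_reward(m_t)
    print(r_t)                      # 0.5
    print(is_effective(r_t, 0.4))   # True (0.4 is an illustrative threshold)
```

Note that the hit counts themselves do not affect the reward: an edge executed once contributes exactly as much as an edge executed many times.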
In this embodiment, as shown in FIG. 3, the invention proposes a method for dynamically optimizing the sample mutation strategy according to the correlation characteristics of the sample mutation process. Existing reward functions consider only the branch detection information obtained after the current mutation strategy is executed. To address this, all historical execution path information of the policy function $\pi()$ for mutation actions on the sample state $s_t$ is introduced. The number of times new branches were triggered during mutation of the input sample is counted, and, for the distribution of the mutation strategies that trigger new branches, an average new-branch update rate is defined to measure how the new branches triggered by a mutation strategy are distributed over the whole mutation process. On this basis, a reward function is formed from the total number of new branches triggered by a mutation strategy and its average new-branch update rate. Using all historical execution path information, the reward feedback is adjusted dynamically and more precisely, which realizes dynamic optimization of the sample mutation strategy and improves sample mutation efficiency over the whole vulnerability discovery process.
(1) Reward based on the total number of new branches triggered by a mutation strategy
During continuous mutation of the sample state, a mutation strategy that has triggered more new branches has a stronger ability to optimize the input sample and should be rewarded so that its priority is raised at the next execution. Existing reward functions consider only whether the mutation policy function $\pi()$ optimized the input sample in the current mutation operation and do not consider how many times new branches were triggered. The reward function therefore needs to take into account the total number of times the strategy has triggered new branches during sample mutation.
(2) Reward based on the average new-branch update rate of a mutation strategy
During mutation of the input sample, the total number of new branches in the branch detection results over a mutation strategy's execution history can be used to evaluate its ability to optimize the input sample, but the average new-branch update rate also deserves attention, in particular the effect that the order in which the strategy's updates occur has on this rate. For example, suppose that over 30 mutations two mutation strategies each updated the branch detection result in 10 of them, but one did so in the first 10 mutations and the other in the 10 most recent mutations. The strategy whose updates fall in the 10 most recent mutations has the higher average new-branch update rate, is more likely to update the branch detection result at the next mutation of the input sample, and should be given higher priority.
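The source does not reproduce the formulas for these two reward terms, so the following Python sketch is only one possible reading of the 30-mutation example, not the patent's definition. It assumes each strategy keeps a boolean history (one entry per mutation, `True` when a new branch was found) and uses an exponentially decayed average as a hypothetical recency-sensitive update rate; `recency_weighted_rate` and the `decay` parameter are illustrative names.

```python
from typing import Sequence


def total_new_branches(history: Sequence[bool]) -> int:
    """Total number of mutations in which the strategy triggered a new branch."""
    return sum(history)


def recency_weighted_rate(history: Sequence[bool], decay: float = 0.9) -> float:
    """Hypothetical update rate that weights recent mutations more heavily.

    The most recent entry has weight 1, the one before it `decay`, and so on.
    """
    n = len(history)
    if n == 0:
        return 0.0
    weights = [decay ** (n - 1 - k) for k in range(n)]
    hit_weight = sum(w for w, hit in zip(weights, history) if hit)
    return hit_weight / sum(weights)


if __name__ == "__main__":
    # The example from the text: 30 mutations, 10 updates each.
    early = [True] * 10 + [False] * 20   # updates in the first 10 mutations
    late = [False] * 20 + [True] * 10    # updates in the 10 most recent mutations
    print(total_new_branches(early), total_new_branches(late))
    print(recency_weighted_rate(early) < recency_weighted_rate(late))
```

Both strategies have the same total, so only the rate term separates them, which matches the text's point that the timing of updates should influence priority.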
In this specification, the embodiments are described progressively: each embodiment focuses on its differences from the others, and for parts that are identical or similar the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant details can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (4)
1. A method, based on deep reinforcement learning, for dynamically optimizing the binary vulnerability discovery process, characterized by comprising the following steps:
taking a binary program input sample used in fuzz testing as the environment state and feeding it into a deep reinforcement learning model, the environment state representing the input sample as a byte array;
converting the current sample state into a mutated sample state through a mutation strategy, and selecting the input sample state for the next time step based on the execution path information of the binary program sample;
computing a feedback reward for the mutated sample state based on a coverage metric;
judging from the feedback reward whether the mutation strategy is effective, optimizing mutation strategy selection, and thereby dynamically optimizing the binary vulnerability discovery process.
2. The method for dynamically optimizing the binary vulnerability discovery process based on deep reinforcement learning of claim 1, wherein converting the current sample state into the mutated sample state through the mutation strategy specifically comprises:
according to the current sample state $s_t$, selecting a mutation action $a_t$ from the mutation action space by means of the policy function $\pi$, as shown in the following formula:
$$a_t = \pi(s_t)$$
applying the mutation action to the current input sample state $s_t$ to obtain the mutated sample state $s_t'$, as shown in the following formula:
$$s_t' = \mathrm{Mutate}(s_t, a_t)$$
where Mutate() is the mutation function.
3. The method for dynamically optimizing the binary vulnerability discovery process based on deep reinforcement learning of claim 1, wherein selecting the input sample state for the next time step based on the execution path information of the binary program sample specifically comprises:
setting up a sample queue $Q_s$ and a set $P$ of all path information executed by the existing samples;
if, at time step $t$, the current sample state is $s_t$, the mutated sample state is $s_t'$, and a new sample execution path $p_t$ is produced during execution, then the sample execution path $p_t$ is added to the queue $Q_s$, the set $P$ is updated, and $s_t'$ is taken as the input sample state for the next time step; if no new sample execution path $p_t$ is produced during execution, a sample is selected at random from the valid-sample queue $Q_s$ as the input sample state for the next time step, as shown in the following formula:
$$s_{t+1} = \begin{cases} s_t', & p_t \notin P \\ \mathrm{Random\_choose}(Q_s), & \text{otherwise} \end{cases}$$
where Random_choose() is a random selection function and $P$ is the set of all path information executed by the existing samples.
4. The dynamic optimization method for the binary vulnerability discovery process based on deep reinforcement learning of claim 1, wherein the feedback rewards are calculated for the variant sample states based on the coverage index, and specifically comprise:
calculating and recording the execution path information corresponding to the variant sample state $s_t'$, resulting in the record set $M_t = \mathrm{Execute}(s_t')$;
wherein Execute() is an execution path recording function, $M_t$ is the record set of execution path information, and each element $m_{i,j}$ of $M_t$ represents the number of times the jump edge from basic block $b_i$ to basic block $b_j$ was executed;
judging whether $m_{i,j}$ is greater than 0; if $m_{i,j} > 0$, the jump edge has been executed at least once, and the subset of records satisfying this condition is denoted $M_t'$, as shown in the following formula:
$$M_t' = \{\, m_{i,j} \mid m_{i,j} > 0,\ m_{i,j} \in M_t \,\}$$
according to the record subset $M_t'$ and the record set $M_t$, calculating the proportion of the jump edges executed by the current sample to all jump edges of the target program, which is used as the feedback reward of the current sample, with the specific calculation formula:
$$R_t = \frac{\mathrm{size}(M_t')}{\mathrm{size}(M_t)}$$
wherein $R_t$ represents the proportion of executed jump edges to all jump edges of the target program, and size() represents the number of elements in a set.
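The coverage-based reward of claim 4 can be illustrated with a short Python sketch. It assumes the record set is available as a dictionary that maps every jump edge of the target program, written as a pair of basic-block indices, to its execution count, including edges whose count is zero. The name `coverage_reward` and the choice to return 0.0 for an empty record set are assumptions made for this example only.

```python
def coverage_reward(edge_counts):
    """Return the fraction of jump edges executed at least once.

    edge_counts: dict mapping (i, j) -> m_ij, the number of times the jump
                 edge from basic block b_i to b_j was executed. It is assumed
                 to contain one entry per jump edge of the target program.
    """
    if not edge_counts:
        # Assumption: an empty record set yields zero reward.
        return 0.0
    # Subset of records with m_ij > 0, i.e. edges executed at least once.
    executed = {edge: n for edge, n in edge_counts.items() if n > 0}
    # Ratio of executed edges to all recorded edges.
    return len(executed) / len(edge_counts)


# Example: two of four edges were executed, so the reward is 0.5.
counts = {(0, 1): 3, (1, 2): 0, (1, 3): 1, (3, 4): 0}
reward = coverage_reward(counts)
```

Because the reward depends only on whether an edge was hit, not on how often, a sample that loops many times over the same edges earns no more than one that visits each of them once.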
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310302345.1A CN116383826A (en) | 2023-03-27 | 2023-03-27 | Binary vulnerability discovery process dynamic optimization method based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116383826A | 2023-07-04 |
Family
ID=86972485
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310302345.1A Pending CN116383826A (en) | 2023-03-27 | 2023-03-27 | Binary vulnerability discovery process dynamic optimization method based on deep reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||