CN109597767B - Genetic variation-based fuzzy test case generation method and system - Google Patents


Info

Publication number
CN109597767B
Authority
CN
China
Prior art keywords
test case
data
character string
subset
seed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811554639.9A
Other languages
Chinese (zh)
Other versions
CN109597767A (en)
Inventor
卢凯
周旭
何兴陆
张文喆
王睿伯
王鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201811554639.9A
Publication of CN109597767A
Application granted
Publication of CN109597767B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3684 Test management for test design, e.g. generating new test cases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Abstract

The invention discloses a genetic variation-based fuzz test case generation method and system. The method selects two seed test cases and then, for each data position of the new test case: if the two seeds hold the same data at that position, the data is inherited by the new test case; if the data differ and the data of either seed belongs to a character string comparison set extracted by static analysis of the target binary file, the position is randomly mutated into data from that set; otherwise, the data of one randomly chosen seed is inherited. The invention combines the advantages of generation-based and mutation-based test case generation while avoiding their respective disadvantages, can fuzz the core code of target programs at scale without manual effort, and generates test cases that are more likely to increase path coverage and to trigger crashes.

Description

Genetic variation-based fuzzy test case generation method and system
Technical Field
The invention relates to vulnerability discovery in the computer field, and in particular to a genetic variation-based fuzz test case generation method and system that supply fuzz test cases for a target program undergoing vulnerability discovery.
Background
Test case generation methods fall roughly into two types: generation-based and mutation-based. Existing generation-based methods rely on manually written generation rules, so that the generated test cases conform to the target's input rules and pass the error-checking code of the target program, allowing its core functional code to be fuzzed (fuzzing is an automated software testing technique based on defect injection). However, these methods require extensive manual intervention and therefore incur high labor costs; moreover, because the rules differ between target programs, generation-based methods scale poorly and are unsuited to fuzzing a large number of different target programs. Existing mutation-based methods generate new test cases by randomly mutating existing normal inputs. The generated test cases can reuse information in the normal inputs to pass error-checking code, the method runs without manual effort, and it scales well because only the normal inputs need to be replaced for different programs. However, test cases generated this way can only reach code near the code already exercised by the normal test cases; code that is relatively distant or guarded by strict entry conditions is hard to reach, so the generated test cases can hardly fuzz all of the code of the target program.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the above problems in the prior art, the invention provides a genetic variation-based fuzz test case generation method and system that combine the respective advantages of generation-based and mutation-based test case generation while avoiding their respective disadvantages, can fuzz the core code of target programs at scale without manual effort, and generate test cases that are more likely to increase path coverage and to trigger crashes.
In order to solve the technical problems, the invention adopts the technical scheme that:
A genetic variation-based fuzz test case generation method comprises the following implementation steps:
1) selecting two seed test cases;
2) selecting a data position as the current data position according to the seed length of the new test case;
3) judging whether the data of the two seed test cases at the current data position are the same; if so, jumping to step 4); otherwise, jumping to step 5);
4) inheriting the data at the current data position of the seed test cases to the current data position of the new test case, and jumping to step 7);
5) judging whether the data at the current data position of either seed test case belongs to a preset character string comparison set, the preset character string comparison set being obtained by statically analyzing the target binary file that executes the test cases and extracting the character string data therein; if so, randomly mutating the data at the current data position of the new test case into data from the preset character string comparison set, and jumping to step 7); otherwise, jumping to step 6);
6) randomly selecting one of the seed test cases and inheriting its data at the current data position to the current data position of the new test case;
7) judging whether the whole seed length has been traversed; if not, selecting the next data position as the current data position and jumping to step 3); otherwise, proceeding to the next step;
8) randomly mutating the new test case according to a specified proportion, and outputting the result as the final new test case.
Optionally, the character string comparison set preset in step 5) is obtained by performing static analysis on a target binary file for executing the test case and extracting character string data therein.
Optionally, the character string comparison set comprises a first subset PAC, a second subset PSC and a third subset CSP, obtained by partitioning the character string data according to how densely its positions cluster in the target binary file that executes the test cases; the density of the first subset PAC is higher than that of the second subset PSC and the third subset CSP, and the second subset PSC and the third subset CSP have the same density.
Optionally, the detailed steps of step 5) include:
5.1) judging whether the data at the current data position of either seed test case belongs to the first subset PAC of the character string comparison set; if so, randomly mutating the data at the current data position of the new test case into data from the first subset PAC with a first probability, and otherwise (with the remaining probability) into random data from the second subset PSC or the third subset CSP, then jumping to step 7); if not, jumping to step 5.2);
5.2) judging whether the data at the current data position of either seed test case belongs to the second subset PSC of the character string comparison set; if so, randomly mutating the data at the current data position of the new test case into data from the second subset PSC with a second probability, and otherwise (with the remaining probability) into random data, then jumping to step 7); if not, jumping to step 5.3);
5.3) judging whether the data at the current data position of either seed test case belongs to the third subset CSP of the character string comparison set; if so, randomly mutating the data at the current data position of the new test case into data from the third subset CSP with the second probability, and otherwise (with the remaining probability) into random data, then jumping to step 7); if not, jumping to step 6).
Optionally, the first probability is greater than the second probability.
Optionally, the extracting and generating step of the character string comparison set includes:
s1) extracting all character string data from the target binary file that executes the test cases, recording the position of each character string, and dividing the character strings into character string groups according to the density of their positions; the set of these groups is called P and each group is called p, with p ∈ P;
s2) acquiring the character string comparison information in the target binary file, recording the position of the comparison code wherever character string data is used, and dividing the character strings into character string groups according to the density of the comparison code positions; the set of these groups is called C and each group is called c, with c ∈ C;
s3) performing an AND (intersection) operation on the character string groups of the two sets to obtain the intersection of p and c; the set of such groups is called the first subset PAC and each group is called pac, with pac ∈ PAC and pac = p ∩ c;
s4) performing difference operations on the character string groups of the two sets to obtain the difference of p and c and the difference of c and p; the sets of such groups are called the second subset PSC and the third subset CSP respectively, and each group is called psc or csp, with psc ∈ PSC, csp ∈ CSP, psc = p - c and csp = c - p;
s5) removing from the second subset PSC and the third subset CSP every character string group that has only one element, and combining all of these single character strings into a new fourth subset S, finally obtaining a character string comparison set composed of the first subset PAC, the second subset PSC, the third subset CSP and the fourth subset S.
Optionally, selecting two seed test cases in step 1) specifically means selecting them from a seed set, and the detailed steps of generating the seed set include:
1.1) collecting seed test cases to be used as a training set;
1.2) executing the target binary file on each seed test case and acquiring the path coverage information of each execution;
1.3) randomly combining two seed test cases whose path coverage information is PS1 and PS2 to generate a new combined test case, executing the target binary file on it, and acquiring its path coverage information PN;
1.4) taking the path coverage information PS1 and PS2 of the two seed test cases as input and the difference between the path coverage information PN of their new combined test case and PS1 and PS2 as the basis for classification, constructing a training set from the collected seed test cases and training a machine learning model on it; when the difference between PN and PS1 and PS2 is greater than a threshold, adding the new test case and its path coverage information to the seed set;
1.5) inputting pairs of seed test cases with path coverage information PS1 and PS2 into the trained machine learning model for classification, selecting two seeds from the class whose new combined test case differs most from PS1 and PS2 in path coverage, combining them into a new test case, executing the target binary file on it, and acquiring its path coverage information PN; judging whether the path coverage has increased within a specified time period; if so, jumping to step 1.4); otherwise, determining that generation of the seed set is complete and exiting.
The invention also provides a genetic variation-based fuzz test case generation system comprising a computer device programmed to execute the steps of the genetic variation-based fuzz test case generation method described above.
Compared with the prior art, the invention has the following advantages:
1. The preset character string comparison set is obtained by statically analyzing the target binary file that executes the test cases and extracting the character string data therein. Static analysis of the binary code of the target program yields part of the information embedded in it, including the information the program uses for character string comparison. This information is useful in vulnerability discovery because it is often used in error checking, and purely random test case generation is unlikely to match it by chance. For example, in a browser program the parsing module parses an html file according to its tags; the file must be built strictly from the character strings corresponding to those tags, otherwise the parsing module treats the test case as erroneous text and the core functional modules of the browser are never exercised. Obtaining this information through static analysis is therefore of great value in guiding subsequent test case generation.
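As a rough illustration of the string extraction described above, the following Python sketch scans a binary blob for runs of printable ASCII and records each string with its offset, much as the Unix `strings` utility does. The function name `extract_strings` and the `min_len` cutoff are assumptions made for this sketch; the patent does not specify how strings are recognized, and a real static analysis would also need to locate the comparison code that uses them.

```python
import re
from typing import List, Tuple


def extract_strings(blob: bytes, min_len: int = 4) -> List[Tuple[int, bytes]]:
    """Return (offset, string) pairs for runs of printable ASCII in blob.

    min_len is a hypothetical cutoff that filters out short accidental runs.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(blob)]


if __name__ == "__main__":
    sample = b"\x00\x01<html>\x00ab\x00<body>\xff"
    for offset, text in extract_strings(sample):
        print(offset, text)
```

The recorded offsets are what a later step could use to group strings by how densely their positions cluster.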
2. The processing of the seed pair (the two seed test cases) is also core to the invention; its central idea is derived from genetic recombination in genetics. Data that is identical in both seeds is inherited: as in genetics, genes shared by the father and mother are more likely to be important and should be inherited rather than readily changed. This process simulates inheritance in genetics. Next, for positions where the data differ, it is judged whether the data belongs to the character string comparison class; if so, the data at that position is replaced with a random member of the character string comparison class. This process simulates the targeted modification of genes in genetic engineering. If the data does not belong to the character string comparison class, the data of one of the seeds is selected and inherited. This process simulates the difference in expressed traits between dominant and recessive genes in genetics. Finally, the test case is randomly mutated to improve coverage of the code of the target program. This process simulates gene mutation in genetics. The genetic variation-based fuzz test case generation method therefore combines the advantages of generation-based and mutation-based test case generation while avoiding their respective disadvantages, and can fuzz the core code of target programs at scale without manual effort.
Drawings
FIG. 1 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the principle of genetic variation in the method of the embodiment of the present invention.
Detailed Description
As shown in fig. 1, the implementation steps of the genetic variation-based fuzz test case generation method in this embodiment include:
1) selecting two seed test cases;
2) selecting a data position as the current data position according to the seed length of the new test case;
3) judging whether the data of the two seed test cases at the current data position are the same; if so, jumping to step 4); otherwise, jumping to step 5);
4) inheriting the data at the current data position of the seed test cases to the current data position of the new test case, and jumping to step 7);
5) judging whether the data at the current data position of either seed test case belongs to a preset character string comparison set, the preset character string comparison set being obtained by statically analyzing the target binary file that executes the test cases and extracting the character string data therein; if so, randomly mutating the data at the current data position of the new test case into data from the preset character string comparison set, and jumping to step 7); otherwise, jumping to step 6);
6) randomly selecting one of the seed test cases and inheriting its data at the current data position to the current data position of the new test case;
7) judging whether the whole seed length has been traversed; if not, selecting the next data position as the current data position and jumping to step 3); otherwise, proceeding to the next step;
8) randomly mutating the new test case according to a specified proportion, and outputting the result as the final new test case.
In this embodiment, the preset character string comparison set in step 5) is obtained by statically analyzing the target binary file that executes the test cases and extracting the character string data therein. Static analysis of the target binary file yields information about it that assists subsequent test case generation and settles what values the data should be mutated into during mutation.
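Steps 2) to 7) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the seeds are assumed to be already split into equal-length sequences of byte-string tokens (the patent does not fix the unit of a "data position"), the function name `crossover` is hypothetical, and the random mutation of step 8) is left out.

```python
import random
from typing import List, Sequence, Set


def crossover(seed_a: Sequence[bytes], seed_b: Sequence[bytes],
              compare_set: Set[bytes], rng: random.Random) -> List[bytes]:
    """Build a child test case position by position (steps 2 to 7)."""
    pool = sorted(compare_set)            # fixed order so rng.choice is reproducible
    child: List[bytes] = []
    for a, b in zip(seed_a, seed_b):      # step 7: traverse the seed length
        if a == b:
            child.append(a)               # step 4: inherit data shared by both seeds
        elif pool and (a in compare_set or b in compare_set):
            child.append(rng.choice(pool))    # step 5: mutate into the comparison set
        else:
            child.append(rng.choice((a, b)))  # step 6: inherit from one random seed
    return child


if __name__ == "__main__":
    rng = random.Random(0)
    first = [b"<html>", b"x", b"<b>"]
    second = [b"<html>", b"y", b"zz"]
    print(crossover(first, second, {b"<b>", b"<i>"}, rng))
```

Passing in a seeded `random.Random` keeps runs reproducible, which helps when a generated test case later needs to be regenerated for crash triage.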
In this embodiment, the character string comparison set comprises a first subset PAC, a second subset PSC and a third subset CSP, obtained by partitioning the character string data according to how densely its positions cluster in the target binary file that executes the test cases; the density of the first subset PAC is higher than that of the second subset PSC and the third subset CSP, and the second subset PSC and the third subset CSP have the same density.
In this embodiment, the detailed steps of step 5) include:
5.1) judging whether the data at the current data position of either seed test case belongs to the first subset PAC of the character string comparison set; if so, randomly mutating the data at the current data position of the new test case into data from the first subset PAC with a first probability, and otherwise (with the remaining probability) into random data from the second subset PSC or the third subset CSP, then jumping to step 7); if not, jumping to step 5.2);
5.2) judging whether the data at the current data position of either seed test case belongs to the second subset PSC of the character string comparison set; if so, randomly mutating the data at the current data position of the new test case into data from the second subset PSC with a second probability, and otherwise (with the remaining probability) into random data, then jumping to step 7); if not, jumping to step 5.3);
5.3) judging whether the data at the current data position of either seed test case belongs to the third subset CSP of the character string comparison set; if so, randomly mutating the data at the current data position of the new test case into data from the third subset CSP with the second probability, and otherwise (with the remaining probability) into random data, then jumping to step 7); if not, jumping to step 6).
In this embodiment, the first probability is greater than the second probability.
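The tiered decision of steps 5.1) to 5.3) might look like the following Python sketch. The probability values `p1 = 0.8` and `p2 = 0.5` are placeholders chosen only to satisfy the stated condition that the first probability exceeds the second; the function name `tiered_mutation` and the choice to make "random data" a random byte string of the same length as the first seed's data are likewise assumptions.

```python
import random
from typing import Optional, Set


def tiered_mutation(a: bytes, b: bytes, pac: Set[bytes], psc: Set[bytes],
                    csp: Set[bytes], rng: random.Random,
                    p1: float = 0.8, p2: float = 0.5) -> Optional[bytes]:
    """Return the mutated data, or None when step 6) should handle the position."""

    def pick(group: Set[bytes]) -> bytes:
        return rng.choice(sorted(group))

    def random_data() -> bytes:
        # Assumption: random data has the same length as the first seed's data.
        return bytes(rng.randrange(256) for _ in range(len(a)))

    if a in pac or b in pac:                      # step 5.1
        if rng.random() < p1:
            return pick(pac)
        lower = psc | csp
        return pick(lower) if lower else random_data()
    for subset in (psc, csp):                     # steps 5.2 and 5.3
        if a in subset or b in subset:
            return pick(subset) if rng.random() < p2 else random_data()
    return None                                   # fall through to step 6)


if __name__ == "__main__":
    rng = random.Random(1)
    print(tiered_mutation(b"GET", b"xx", {b"GET"}, {b"POST"}, {b"PUT"}, rng))
```

Returning `None` for the fall-through case lets the caller keep the plain inheritance of step 6) in one place.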
In this embodiment, the step of extracting and generating the character string comparison set includes:
s1) extracting all character string data from the target binary file that executes the test cases, recording the position of each character string, and dividing the character strings into character string groups according to the density of their positions; the set of these groups is called P and each group is called p, with p ∈ P;
s2) acquiring the character string comparison information in the target binary file, recording the position of the comparison code wherever character string data is used, and dividing the character strings into character string groups according to the density of the comparison code positions; the set of these groups is called C and each group is called c, with c ∈ C;
s3) performing an AND (intersection) operation on the character string groups of the two sets to obtain the intersection of p and c; the set of such groups is called the first subset PAC and each group is called pac, with pac ∈ PAC and pac = p ∩ c;
s4) performing difference operations on the character string groups of the two sets to obtain the difference of p and c and the difference of c and p; the sets of such groups are called the second subset PSC and the third subset CSP respectively, and each group is called psc or csp, with psc ∈ PSC, csp ∈ CSP, psc = p - c and csp = c - p;
s5) removing from the second subset PSC and the third subset CSP every character string group that has only one element, and combining all of these single character strings into a new fourth subset S, finally obtaining a character string comparison set composed of the first subset PAC, the second subset PSC, the third subset CSP and the fourth subset S.
The above extraction and generation steps yield four types of sets, PAC, PSC, CSP and S, which are assigned different priority levels: the first subset PAC is the first level, the second subset PSC and the third subset CSP are the second level, and the fourth subset S is the third level. With the sets classified in this way, seed mutation during test case generation can apply operations of different priority according to level.
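One possible reading of steps s1) to s5) is sketched below in Python. Several choices here are assumptions rather than details given in the text: density grouping is approximated by starting a new group whenever the gap to the previous position exceeds a hypothetical `max_gap`; pac groups are formed for every overlapping pair of p and c; and psc and csp are computed against the union of all groups of the other kind, since the text does not say which p is paired with which c.

```python
from typing import Dict, FrozenSet, List, Tuple

Group = FrozenSet[bytes]


def group_by_density(items: List[Tuple[int, bytes]], max_gap: int) -> List[Group]:
    """Cluster strings whose positions lie within max_gap of the previous one."""
    groups: List[Group] = []
    current: List[bytes] = []
    last = None
    for pos, text in sorted(items):
        if last is not None and pos - last > max_gap:
            groups.append(frozenset(current))
            current = []
        current.append(text)
        last = pos
    if current:
        groups.append(frozenset(current))
    return groups


def build_compare_set(P: List[Group], C: List[Group]) -> Dict[str, object]:
    """Derive PAC, PSC, CSP and S from the group sets P and C (steps s3 to s5)."""
    all_p = frozenset().union(*P)
    all_c = frozenset().union(*C)
    pac = [p & c for p in P for c in C if p & c]     # s3: pac = p ∩ c
    psc = [p - all_c for p in P if p - all_c]        # s4: psc = p - c
    csp = [c - all_p for c in C if c - all_p]        # s4: csp = c - p
    singles = set()
    for groups in (psc, csp):                        # s5: collect one-element groups
        for g in groups:
            if len(g) == 1:
                singles |= g
    return {
        "PAC": pac,
        "PSC": [g for g in psc if len(g) > 1],
        "CSP": [g for g in csp if len(g) > 1],
        "S": singles,
    }


if __name__ == "__main__":
    P = group_by_density([(0, b"GET"), (4, b"POST"), (9, b"HEAD")], max_gap=16)
    C = group_by_density([(100, b"GET"), (104, b"POST")], max_gap=16)
    print(build_compare_set(P, C))
```

The returned dictionary keeps the four subsets separate so that a mutation step can consult them in priority order.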
In this embodiment, selecting two seed test cases in step 1) specifically means selecting them from a seed set. A seed set rich in types provides a large amount of rule and format information about the target program, and reasonable use of this information guides the test case generation method toward more effective test cases.
In this embodiment, the seed set is generated by iterative learning with a machine learning model; machine learning makes seed selection more sound and further raises the likelihood that newly generated seeds discover new paths. In this embodiment, the detailed steps of generating the seed set include:
1.1) collecting seed test cases to be used as a training set;
1.2) executing the target binary file on each seed test case and acquiring the path coverage information of each execution;
1.3) randomly combining two seed test cases whose path coverage information is PS1 and PS2 to generate a new combined test case, executing the target binary file on it, and acquiring its path coverage information PN;
1.4) taking the path coverage information PS1 and PS2 of the two seed test cases as input and the difference between the path coverage information PN of their new combined test case and PS1 and PS2 as the basis for classification, constructing a training set from the collected seed test cases and training a machine learning model on it; when the difference between PN and PS1 and PS2 is greater than a threshold, adding the new test case and its path coverage information to the seed set;
1.5) inputting pairs of seed test cases with path coverage information PS1 and PS2 into the trained machine learning model for classification, selecting two seeds from the class whose new combined test case differs most from PS1 and PS2 in path coverage, combining them into a new test case, executing the target binary file on it, and acquiring its path coverage information PN; judging whether the path coverage has increased within a specified time period; if so, jumping to step 1.4); otherwise, determining that generation of the seed set is complete and exiting.
The step of generating the seed set is implemented by a fuzzing tool.
Through continued iterative learning on the generated seed set, the capability of the machine learning model gradually strengthens as the number of runs increases, and the path coverage of test cases generated from the seeds selected by the model improves accordingly.
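The labeling part of steps 1.3) and 1.4) can be illustrated with the Python sketch below. It does not train any model. It assumes that path coverage is represented as a set of integer edge identifiers and that the "difference" between PN and PS1, PS2 is the number of edges in PN that neither parent covered; both are interpretations, not details stated in the text. The callables `combine` and `run_target` are hypothetical stand-ins for the seed combination and the instrumented execution of the target binary.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Coverage = FrozenSet[int]
Seed = Tuple[bytes, Coverage]


def coverage_gain(ps1: Coverage, ps2: Coverage, pn: Coverage) -> int:
    """Number of edges reached by the combined case but by neither parent."""
    return len(pn - (ps1 | ps2))


def label_pairs(pairs: Iterable[Tuple[Seed, Seed]],
                combine: Callable[[bytes, bytes], bytes],
                run_target: Callable[[bytes], Coverage],
                threshold: int):
    """Return (training_rows, new_seeds) for the given seed pairs."""
    rows: List[Tuple[Coverage, Coverage, int]] = []
    new_seeds: List[Seed] = []
    for (s1, ps1), (s2, ps2) in pairs:
        child = combine(s1, s2)              # step 1.3: combine two seeds
        pn = run_target(child)               # step 1.3: execute and record PN
        gain = coverage_gain(ps1, ps2, pn)
        rows.append((ps1, ps2, gain))        # step 1.4: input plus label for training
        if gain > threshold:
            new_seeds.append((child, pn))    # step 1.4: keep productive children
    return rows, new_seeds


if __name__ == "__main__":
    fake_run = lambda data: frozenset(data)  # toy stand-in: byte values act as "edges"
    pairs = [((b"ab", fake_run(b"ab")), (b"cd", fake_run(b"cd")))]
    print(label_pairs(pairs, lambda a, b: a + b"!", fake_run, threshold=0))
```

The rows produced this way are the kind of (PS1, PS2, label) examples a classifier could be trained on; which model to use is left open here, as it is in the text.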
In this embodiment, when seed test cases are collected for the training set in step 1.1), a large number of legal seeds should be collected, including: (1) common normal test cases; for a video processing program, for example, common videos such as TV dramas and music videos should be collected; (2) generated test cases; again taking a video processing program as an example, a video generation program should be used to generate a variety of videos. Furthermore, the collected seeds should be as varied in type as possible; for a video processing program, for example, files in various video formats should be included, such as mp4, rmvb, avi, wma, rm, mpeg, mov, mkv, flv, f4v, m4v, 3gp, dat, ts, mts, vob, etc.
As shown in fig. 2, part A represents the character string comparison set derived from static analysis of the target program, which includes the first subset PAC, the second subset PSC and the third subset CSP. For the two seed test cases, the first seed and the second seed, different operations are subsequently performed depending on whether the data at a given position is the same or different: in the figure, the regions of the first and second seeds not marked with letters are positions where the seed data is the same, and the regions marked with letters are positions where the seed data differs.
Where the data of the two seeds is the same (the regions not marked with letters in fig. 2), the data at that position is relatively stable; it is more likely to be rule or format data and should not be readily changed. As in genetics, genes shared by the father and mother are more likely to be important and should be inherited rather than readily changed. This process simulates inheritance in genetics. Where the data of the two seeds differs (the regions marked with letters in the figure), the data is examined to judge whether it belongs to the character string comparison class, shown in red in the figure. Depending on the type of data, different operations are then applied to these positions when the new test case is generated.
The part marked with the letter B in fig. 2 represents positions where the data of the two seeds belongs to the character string comparison class. In this case a character string can be randomly selected from the character string comparison class and placed into the newly generated test case. Because the data at such a position is mostly character string comparison data, randomly choosing a member of the set can discover new, unknown paths into core code regions. This process simulates the targeted modification of genes in genetic engineering.
The parts marked with the letters D and C in fig. 2 represent positions where the data of the two seeds does not belong to the character string comparison class; in this case one of the two parts (yellow or green in the figure) is randomly selected for inheritance. This process simulates the difference in expressed traits between dominant and recessive genes in genetics. Finally, the generated test case is randomly mutated according to a certain proportion, which simulates gene mutation in genetics. The result is the new test case generated by the method, which can improve path coverage in fuzz testing in a directed manner.
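The final random mutation "according to a certain proportion" could be realized in many ways; the Python sketch below shows one of them. It assumes the proportion means the fraction of bytes to alter and that each chosen byte has a single random bit flipped, which guarantees the byte changes. The function name `mutate_fraction` and both of these choices are illustrative assumptions.

```python
import random


def mutate_fraction(data: bytes, ratio: float, rng: random.Random) -> bytes:
    """Flip one random bit in roughly ratio * len(data) distinct byte positions."""
    out = bytearray(data)
    count = max(0, min(len(out), round(len(out) * ratio)))
    for index in rng.sample(range(len(out)), count):
        out[index] ^= 1 << rng.randrange(8)   # XOR with a single bit always changes the byte
    return bytes(out)


if __name__ == "__main__":
    original = bytes(16)
    mutated = mutate_fraction(original, 0.25, random.Random(0))
    print(sum(x != y for x, y in zip(original, mutated)), "bytes changed")
```

Keeping the ratio small preserves most of the inherited structure, so the mutated test case still has a chance of passing the format checks of the target.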
Because the genetic variation-based fuzzy test case generation method is hereditary in nature, it benefits from a large collection of valid seeds of many types. Rich, high-quality seeds let the method inherit the good data of each seed type more efficiently, which greatly helps raise the code coverage of the fuzzing tool and helps test cases pass the format checks early in the program so that more of the program's core code is executed. In this method the seeds are paired two by two, and different seeds have different execution paths, so the way seeds are selected is also very important.
In summary, the fuzzy test case generation method based on genetic variation of this embodiment borrows the idea of genetic variation from genetics: data at positions where two known test cases agree is inherited, and positions where they differ undergo selective inheritance or random mutation. The parts of the two test cases that are more effective and more stable for the fuzzed program are thus passed on to the next generation, while the less effective, more changeable parts mutate probabilistically, which greatly improves the effectiveness of the test cases. In addition, the present embodiment further provides a genetic variation-based fuzz test case generation system, which includes a computer device programmed to execute the steps of the genetic variation-based fuzz test case generation method according to the present embodiment.
The foregoing describes only preferred embodiments of the invention and does not limit the invention in any form. Although the present invention has been disclosed through the preferred embodiments, they are not intended to limit it. Using the methods and technical content disclosed above, those skilled in the art can make many possible variations and modifications to the technical scheme of the present invention, or modify it into equivalent embodiments, without departing from its scope. Therefore, any simple modification, equivalent change or refinement made to the above embodiments according to the technical spirit of the present invention, provided it does not depart from the content of the technical scheme of the present invention, falls within the protection scope of that technical scheme.

Claims (7)

1. A fuzzy test case generation method based on genetic variation is characterized by comprising the following implementation steps:
1) selecting two seed test cases;
2) selecting a data position as a current data position according to the seed length of the new test case;
3) judging whether the data of the two seed test cases at the current data position are the same, and if so, skipping to step 4); otherwise, skipping to step 5);
4) inheriting the data at the current data position of the seed test cases into the current data position of the new test case;
5) judging whether either of the data at the current data positions of the two seed test cases belongs to a preset character string comparison set, wherein the preset character string comparison set is obtained by performing static analysis on the target binary file that executes the test cases and extracting the character string data therein; if so, randomly mutating the data at the current data position of the new test case into data from the preset character string comparison set, and skipping to step 7); otherwise, skipping to step 6); the character string comparison set comprises a first subset PAC, a second subset PSC and a third subset CSP, divided according to the density of the positions of the character string data in the target binary file, wherein the density of the first subset PAC is higher than that of the second subset PSC and the third subset CSP, and the densities of the second subset PSC and the third subset CSP are the same;
6) randomly selecting data of the current data position of one seed test case to be inherited to the current data position of a new test case;
7) judging whether the whole seed length has been traversed; if not, selecting the next data position as the current data position and skipping to step 3); otherwise, proceeding to the next step;
8) randomly mutating the new test case at a specified rate, and outputting the new test case after the random mutation as the final new test case.
2. The method according to claim 1, wherein the preset character string comparison set in step 5) is obtained by performing static analysis on the target binary file that executes the test cases and extracting the character string data therein.
3. The genetic variation-based fuzzy test case generation method according to claim 1, wherein the detailed step of step 5) comprises:
5.1) judging whether either of the data at the current data positions of the two seed test cases belongs to the first subset PAC of the character string comparison set; if so, randomly mutating the data at the current data position of the new test case into data from the first subset PAC of the preset character string comparison set with a first probability, and otherwise (with the remaining probability) into random data from the second subset PSC or the third subset CSP, and skipping to step 7); if not, skipping to step 5.2);
5.2) judging whether either of the data at the current data positions of the two seed test cases belongs to the second subset PSC of the character string comparison set; if so, randomly mutating the data at the current data position of the new test case into data from the second subset PSC of the preset character string comparison set with a second probability, and otherwise (with the remaining probability) into random data, and skipping to step 7); if not, skipping to step 5.3);
5.3) judging whether either of the data at the current data positions of the two seed test cases belongs to the third subset CSP of the character string comparison set; if so, preferentially randomly mutating the data at the current data position of the new test case into data from the third subset CSP of the preset character string comparison set with the second probability, and otherwise (with the remaining probability) into random data, and skipping to step 7); if not, skipping to step 6).
4. The genetic variation-based fuzz test case generation method according to claim 3, wherein the first probability is greater than the second probability.
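The tiered choice in steps 5.1) to 5.3), together with the constraint that the first probability exceeds the second, can be sketched as below. This is only an illustration under assumptions: the subsets are modelled as flat Python sets of byte strings, the function name `pick_from_subsets` and the default probabilities 0.8 and 0.5 are hypothetical, and the fallback to PAC when PSC and CSP are both empty is an added convenience that the claims do not state.

```python
import random
from typing import AbstractSet, Optional


def pick_from_subsets(a: bytes, b: bytes,
                      pac: AbstractSet[bytes], psc: AbstractSet[bytes],
                      csp: AbstractSet[bytes],
                      p1: float = 0.8, p2: float = 0.5,
                      rng: Optional[random.Random] = None) -> Optional[bytes]:
    """Return replacement data for one position, or None to fall through to step 6)."""
    if not p1 > p2:
        raise ValueError("the first probability must be greater than the second")
    rng = rng or random.Random()
    pair = {a, b}

    def random_data() -> bytes:
        # Random bytes of the longer parent's length (assumed; at least 1 byte).
        return bytes(rng.randrange(256) for _ in range(max(len(a), len(b), 1)))

    if pair & pac:                                    # step 5.1
        others = sorted(psc | csp)
        if rng.random() < p1 or not others:           # assumed fallback if no PSC/CSP data
            return rng.choice(sorted(pac))
        return rng.choice(others)
    if pair & psc:                                    # step 5.2
        return rng.choice(sorted(psc)) if rng.random() < p2 else random_data()
    if pair & csp:                                    # step 5.3
        return rng.choice(sorted(csp)) if rng.random() < p2 else random_data()
    return None                                       # neither value is in the set
```

A caller would treat a `None` result as the signal to inherit one parent's data at random, as in step 6).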
5. The genetic variation-based fuzzy test case generation method according to claim 1, wherein said step of extracting and generating said comparison set of character strings comprises:
s1) extracting all character string data in the target binary file that executes the test cases, recording the positions of the character string data, and dividing the character strings into different character string groups according to the density of their positions, wherein the set of character string groups is called P, each character string group is called p, and p ∈ P;
s2) acquiring character string comparison information in the target binary file, recording the position of the comparison code where each character string is used, and dividing the character strings into different character string groups according to the density of the comparison code positions, wherein the set of character string groups is called C, each character string group is called c, and c ∈ C;
s3) performing an AND operation on the character string groups in the two types of sets to obtain the intersection of p and c, wherein the set of such groups is called the first subset PAC, each character string group is called pac, pac ∈ PAC, and pac = p ∩ c;
s4) performing a difference operation on the character string groups in the two types of sets to obtain the difference of p and c and the difference of c and p, wherein the sets of such groups are called the second subset PSC and the third subset CSP respectively, the character string groups are called psc and csp respectively, psc ∈ PSC, csp ∈ CSP, psc = p − c, and csp = c − p;
s5) removing the character string group with only one element in the second subset PSC and the third subset CSP from the two sets, and combining all the individual character strings into a new fourth subset S, to finally obtain a character string comparison set composed of the first subset PAC, the second subset PSC, the third subset CSP and the fourth subset S.
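Steps s1) to s5) can be sketched as plain set operations. The sketch below rests on several assumptions that the claim leaves open: density grouping is approximated by splitting sorted offsets wherever the gap exceeds a hypothetical `max_gap`, each string has a single data offset and a single comparison-site offset, and the differences in s4) are taken against the union of all groups of the other kind. The function names are illustrative.

```python
from typing import Dict, FrozenSet, List

Group = FrozenSet[bytes]


def group_by_density(offsets: Dict[bytes, int], max_gap: int) -> List[Group]:
    """Split strings into groups wherever consecutive offsets are far apart."""
    groups: List[Group] = []
    current: set = set()
    last = None
    for s, off in sorted(offsets.items(), key=lambda kv: kv[1]):
        if last is not None and off - last > max_gap:
            groups.append(frozenset(current))
            current = set()
        current.add(s)
        last = off
    if current:
        groups.append(frozenset(current))
    return groups


def build_comparison_set(data_offsets: Dict[bytes, int],
                         compare_offsets: Dict[bytes, int],
                         max_gap: int = 64) -> dict:
    P = group_by_density(data_offsets, max_gap)        # s1: groups p in P
    C = group_by_density(compare_offsets, max_gap)     # s2: groups c in C
    all_p = frozenset().union(*P)
    all_c = frozenset().union(*C)
    PAC = [p & c for p in P for c in C if p & c]       # s3: pac = p AND c
    PSC = [p - all_c for p in P if p - all_c]          # s4: psc = p - c
    CSP = [c - all_p for c in C if c - all_p]          # s4: csp = c - p
    # s5: single-element groups leave PSC/CSP and are pooled into S.
    S = frozenset().union(*(g for g in PSC + CSP if len(g) == 1))
    PSC = [g for g in PSC if len(g) > 1]
    CSP = [g for g in CSP if len(g) > 1]
    return {"PAC": PAC, "PSC": PSC, "CSP": CSP, "S": S}
```

In practice the two offset maps would come from static analysis of the target binary; here they are simply passed in as dictionaries.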
6. The genetic variation-based fuzzy test case generation method according to claim 1, wherein the step 1) of selecting two seed test cases specifically means selecting from a seed set, and the detailed step of generating the seed set comprises:
1.1) collecting seed test cases used as training sets;
1.2) executing the seed test cases against the target binary file, and acquiring the path coverage information of each seed test case during execution;
1.3) randomly combining two seed test cases, whose path coverage information is PS1 and PS2, to generate a new combined test case, executing the new combined test case against the target binary file, and acquiring the path coverage information PN of the new combined test case;
1.4) taking the path coverage information PS1 and PS2 of the two seed test cases as input, and the difference between the path coverage information PN of the corresponding new combined test case and PS1 and PS2 as the basis for classification, constructing a training set from the collected seed test cases and training a machine learning model; and, when the difference between the path coverage information PN of the new combined test case and PS1 and PS2 is greater than a threshold value, adding the new test case and its path coverage information to the seed set;
1.5) inputting any two seed test cases, whose path coverage information is PS1 and PS2, into the trained machine learning model for classification; selecting the two seeds classified into the type for which the path coverage information of the new combined test case differs most from PS1 and PS2, and combining them into a new test case; executing the new test case against the target binary file and acquiring the path coverage information PN of the new combined test case; judging whether the path coverage has increased within a specified time length; if so, skipping to step 1.4); otherwise, determining that generation of the seed set is finished and exiting.
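The threshold test in step 1.4) can be illustrated with a small sketch. The claim does not define how the difference between PN and PS1, PS2 is measured; the sketch assumes path coverage is a set of integer edge identifiers and that the difference is the number of edges in PN covered by neither parent. The names `coverage_gain` and `maybe_add_seed` are hypothetical, and the machine learning classifier itself is not modelled.

```python
from typing import FrozenSet, List, Tuple

Coverage = FrozenSet[int]


def coverage_gain(ps1: Coverage, ps2: Coverage, pn: Coverage) -> int:
    """Number of edges the combined test case covers that neither parent covered."""
    return len(pn - (ps1 | ps2))


def maybe_add_seed(seed_set: List[Tuple[bytes, Coverage]], child: bytes,
                   ps1: Coverage, ps2: Coverage, pn: Coverage,
                   threshold: int = 0) -> bool:
    """Append (child, PN) to the seed set when the gain exceeds the threshold."""
    if coverage_gain(ps1, ps2, pn) > threshold:
        seed_set.append((child, pn))
        return True
    return False
```

The same gain value could serve as the class label when building the training set, though how the patent's model bins or uses it is not specified.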
7. A genetic variation-based fuzz test case generation system, comprising a computer device, wherein the computer device is programmed to execute the steps of the genetic variation-based fuzz test case generation method according to any one of claims 1 to 6.
CN201811554639.9A 2018-12-19 2018-12-19 Genetic variation-based fuzzy test case generation method and system Active CN109597767B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811554639.9A CN109597767B (en) 2018-12-19 2018-12-19 Genetic variation-based fuzzy test case generation method and system

Publications (2)

Publication Number Publication Date
CN109597767A CN109597767A (en) 2019-04-09
CN109597767B true CN109597767B (en) 2021-11-12

Family

ID=65963986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811554639.9A Active CN109597767B (en) 2018-12-19 2018-12-19 Genetic variation-based fuzzy test case generation method and system

Country Status (1)

Country Link
CN (1) CN109597767B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant