CN109117364B

CN109117364B - Target-oriented test case generation method and system

Info

Publication number: CN109117364B
Application number: CN201810713776.6A
Authority: CN
Inventors: 李丰; 彭佳琪; 刘丙昌; 许丽丽; 陈宏程; 刘炳宏; 霍玮; 邹维
Original assignee: Institute of Information Engineering of CAS
Current assignee: Institute of Information Engineering of CAS
Priority date: 2018-07-03
Filing date: 2018-07-03
Publication date: 2021-06-15
Anticipated expiration: 2038-07-03
Also published as: CN109117364A

Abstract

The invention provides a target-oriented test case generation method comprising the following steps: calculating the distance from each node on the CFG of the target program to a target, wherein the target is a node or an edge on the CFG; performing directed fuzzing according to the distance and, if an input covers the target, generating a target-oriented test case, otherwise invoking directed symbolic execution; and synchronizing the inputs generated by directed symbolic execution into the queue of the directed fuzzer so that they are mutated preferentially and, if an input covers the target, generating a target-oriented test case. By combining directed fuzzing with directed symbolic execution, the method overcomes the limitation that fuzzing cannot satisfy complex constraints through mutation and also mitigates the scalability problem of symbolic execution. In addition, more efficient guidance strategies, together with an interaction strategy for combining the two techniques, are designed, which improves the efficiency of target-oriented test case generation.

Description

Target-oriented test case generation method and system

Technical Field

The invention relates to the field of software engineering, and in particular to an efficient target-oriented test case generation method and system.

Background

Software testing is an important part of software engineering and a key technology for guaranteeing the quality of computer software. It discovers possible defects in software mainly by constructing test cases and running the system on them. Software testing therefore emphasizes how much of the target program the test cases cover: the higher the coverage, the more thoroughly the program is tested. Meanwhile, how to generate a test case that covers one specified target has become a key branch of software testing research. The technique can be applied in a number of scenarios, such as:

a) Crash reproduction: when a user reports a crash to a vendor, the vendor needs to construct, from the crash report, an input that reproduces the crash; that input must at least cover the statement that triggers the crash.

b) Test case augmentation: when software is updated, new test cases must be constructed to test the updated program completely, so suitable test cases need to be generated with the newly added code as the target.

c) Verification of static analysis results: potentially vulnerable statements reported by a static auditing tool are selected as targets, and corresponding inputs are constructed and run to determine whether the vulnerability really exists.

Existing test case generation techniques include gray-box testing, symbolic execution, taint analysis and the like. Target-oriented test case generation adds a guidance strategy on top of these techniques so that a specified target is covered quickly. The main approaches are:

(1) Directed fuzzing: on top of traditional coverage-oriented fuzzing, a measure of the distance between an input and the target is added, and mutation-related parameters are adjusted according to that measure. However, this strategy does not remove the limitation of fuzzing itself, namely that complex constraints such as magic number checks are hard to satisfy by mutation.

(2) Directed symbolic execution: several strategies are added on top of traditional symbolic execution to improve the efficiency of path exploration, including backward symbolic exploration guided by the function call chain, exploration restricted to the slice from which the target is reachable, and giving higher weight to branches at a shorter distance. However, for programs with complex paths these strategies still cannot overcome the poor scalability of symbolic execution, and the technique also suffers from insufficient support for library functions and limited solver capability.

Techniques that combine fuzzing and symbolic execution to make up for each other's weaknesses already exist, but they still aim at high coverage of the whole program and cannot be applied directly to target-oriented test case generation.

In summary, the existing target-oriented test case generation technologies have respective limitations and cannot quickly generate an input covering a specified target.

Disclosure of Invention

To overcome the defect that existing directed fuzzing cannot satisfy complex constraints, the invention provides a target-oriented test case generation method and system that combine directed fuzzing with directed symbolic execution. This overcomes the limitation that fuzzing cannot satisfy complex constraints through mutation and also mitigates the scalability problem of symbolic execution. In addition, more efficient guidance strategies, together with an interaction strategy for combining the two techniques, are designed, which improves the efficiency of target-oriented test case generation.

The technical scheme adopted by the invention for solving the technical problems is as follows:

a method for generating a target-oriented test case comprises the following steps:

calculating the distance from each node on a CFG (Control Flow Graph) of a target program to a target, wherein the target is a node or an edge on the CFG;

performing directed fuzzing according to the distance, comprising:

adding a distance measure to each input as its fitness function;

selecting the position of the input in the queue according to the distance value, and setting the mutation frequency of the input;

if the input covers the target, generating a target-oriented test case;

if the input does not cover the target and the input distance has not decreased within a period of time, invoking directed symbolic execution;

the directed symbolic execution uses concolic execution and comprises:

sorting all unprocessed inputs by distance, and processing the input closest to the target first;

selecting the missing branch to be explored according to the must-pass node sequence, i.e. the series of nodes that must be passed to reach the target; if the missing branch has a solution, starting symbolic execution on the CFG pruned with the next uncovered must-pass node as the target, so as to explore more paths; otherwise, backtracking to find the conflicting branch and selecting its other, non-conflicting side to explore;

and synchronizing the inputs generated by directed symbolic execution into the queue of the directed fuzzer so that they are mutated preferentially, and if an input covers the target, generating a target-oriented test case.

The method for calculating the distance comprises the following steps:

calculating the distance db1 from any node on the CFG of the current function to a function call site, wherein the call site calls a successor of the current function on the CG (Call Graph);

calculating the distance df between the called function and the target function on the CG, and multiplying it by a coefficient x;

calculating the distance db2 from the function entry to the target on the CFG of the target function;

the distance from the node to the target over the entire target program is db1 + x × df + db2.
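The three-part distance above can be sketched as follows. This is only an illustrative sketch: the graph encoding (adjacency dictionaries), the use of shortest edge counts as the distance, the choice of the minimum over several call sites, and all function and node names are assumptions that do not come from the patent text. The coefficient x defaults to 20, the example value mentioned in the detailed description.

```python
from collections import deque


def bfs_dist(graph, start):
    """Shortest edge counts from start to every node reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def distance_to_target(node, cfg, call_sites, cg, target_func,
                       target_cfg, target_entry, target, x=20):
    """Return db1 + x * df + db2 for a node, or None if the target is unreachable.

    call_sites maps a call-site node of the current function to its callee
    (hypothetical encoding). With several call sites the minimum is taken,
    which is an assumption of this sketch.
    """
    db2 = bfs_dist(target_cfg, target_entry).get(target)
    if db2 is None:
        return None
    from_node = bfs_dist(cfg, node)
    best = None
    for site, callee in call_sites.items():
        if site not in from_node:
            continue  # call site not reachable from this node
        df = bfs_dist(cg, callee).get(target_func)
        if df is None:
            continue  # callee never reaches the target function
        total = from_node[site] + x * df + db2
        if best is None or total < best:
            best = total
    return best


# Toy program: main calls parse at node m3, parse calls check, target is c2 in check.
main_cfg = {"m0": ["m1", "m2"], "m1": ["m3"], "m2": ["m3"], "m3": []}
call_graph = {"main": ["parse"], "parse": ["check"], "check": []}
check_cfg = {"c0": ["c1"], "c1": ["c2"], "c2": []}

d_m0 = distance_to_target("m0", main_cfg, {"m3": "parse"}, call_graph,
                          "check", check_cfg, "c0", "c2")
d_m3 = distance_to_target("m3", main_cfg, {"m3": "parse"}, call_graph,
                          "check", check_cfg, "c0", "c2")
print(d_m0, d_m3)
```

In the toy program, node m0 has db1 = 2, df = 1 and db2 = 2, so its distance is 2 + 20 × 1 + 2 = 24; the large coefficient makes the inter-function part dominate, as the description intends.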

The method for selecting the position of the input in the queue comprises the following steps:

using insertion sort to insert each new input into the part of the queue behind the input currently being mutated, at the position given by its distance;

after each full traversal of the queue, sorting all inputs in the queue as a whole.
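A minimal sketch of this two-level ordering is given below. It assumes the queue is a Python list of (distance, name) pairs and that ties are placed after existing entries; both choices and the function names are illustrative, not taken from the patent.

```python
def insert_after_current(queue, current, new_input):
    """Insert new_input into the tail behind index `current`, ordered by distance."""
    pos = current + 1
    while pos < len(queue) and queue[pos][0] <= new_input[0]:
        pos += 1
    queue.insert(pos, new_input)
    return pos


def resort_after_round(queue):
    """Globally sort the queue by distance once a full traversal has finished."""
    queue.sort(key=lambda item: item[0])


# "b" (index 1) is the input currently being mutated.
queue = [(30, "a"), (10, "b"), (25, "c"), (40, "d")]
insert_after_current(queue, 1, (5, "e"))
insert_after_current(queue, 1, (28, "f"))
after_inserts = list(queue)

resort_after_round(queue)
print(after_inserts)
print(queue)
```

The insertion only orders the part of the queue that has not been visited yet in the current round, so the closest new input is picked next; the full sort at the end of the round restores a global order for the following round.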

The method for setting the mutation frequency of an input comprises the following steps:

normalizing the input distance, and adjusting the mutation frequency according to the normalized input distance d as follows:

when d is 0, the new mutation frequency is 5 times the original mutation frequency;

when d is 1, the new mutation frequency is 0.4 times the original mutation frequency;

when d is 0.5, the mutation frequency is unchanged.

Concolic execution records both the concrete values and the symbolic constraints of an input, attempts to explore the missing branches, i.e. the branch sides not taken in the execution trace of the input, and generates inputs that cover such a branch and more of the paths behind it.

The scope of missing-branch exploration is determined as follows: the farthest must-pass node that the input covers and the next must-pass node that it does not cover are taken as the source node and the destination node, and the set of nodes between the two is computed as the scope of missing-branch exploration.

A target-oriented test case generation system (PRA) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the program comprising instructions for performing the steps of the above method.

A computer-readable storage medium storing a computer program comprising instructions which, when executed by a processor of a server, cause the server to perform the steps of the above-described method.

Compared with the prior art, the method successfully generates more target-oriented test cases in the same time and takes less time on average to generate a test case. Experiments show that the average time of the prior-art tool Driller is 3.7 times that of the method, and the average time of AFLGo is 2.3 times that of the method.

Drawings

FIG. 1 is a flow diagram of target-oriented test case generation according to an embodiment of the present invention.

FIG. 2 is a Venn diagram of the numbers of differing (patched) branches for which PRA of the present invention, Driller and AFLGo successfully generated inputs.

Detailed Description

The foregoing features and advantages of the invention will be apparent from the following more particular description.

(1) The overall idea of the directed fuzzing method in the invention is as follows: by adding a measure of the distance from an input to the target, inputs closer to the target are given a higher mutation weight. The specific strategy comprises the following:

Distance measurement: the invention calculates distances on the CG across functions and on the CFG within each function. For any node, the distance db1 from the node to a function call site is calculated on the CFG of the current function, where the call site is required to call a successor of the current function on the CG; then the distance df between the called function and the target function is calculated on the CG and multiplied by a coefficient (such as 20) to amplify the influence of the inter-function distance; finally, the distance db2 from the function entry to the target is calculated on the CFG of the target function. The sum of the three is the distance from the node to the target over the whole target program.

Adjusting the mutation weight: the invention adjusts the mutation weight in two respects, the order in which inputs are mutated and the mutation frequency of each input. To adjust the mutation order, the invention uses insertion sort: each time a new input is stored, it is inserted, according to its distance, into the part of the queue behind the input currently being mutated, so that the closest input is selected for mutation next. This strategy alone cannot achieve a global ordering of the queue, so the invention also sorts all inputs after each full traversal of the queue, which ensures that the next round again mutates the closest inputs first. To set the mutation frequency, the invention first normalizes the input distance and then adjusts the mutation frequency using the normalized distance. The basic principle is as follows: for an input with d equal to 0 (the minimum), the new mutation frequency is 5 times the original; for an input with d equal to 1 (the maximum), the new mutation frequency is 0.4 times the original; for an input with d equal to 0.5 (the middle value), the mutation frequency remains unchanged. To this end, the invention designs a piecewise function based on exponentiation, in which n is the original mutation frequency, n' is the new mutation frequency, and d is the normalized input distance.
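The exact piecewise function is not reproduced in this text. Purely as an illustration, the sketch below uses one hypothetical exponential piecewise function that passes through the three stated points (5n at d = 0, n at d = 0.5, 0.4n at d = 1); it should not be read as the patented formula. The min-max normalization and the function names are likewise assumptions.

```python
def normalize(distance, d_min, d_max):
    """Min-max normalization to [0, 1]; returns 0.5 when all distances are equal (assumption)."""
    if d_max == d_min:
        return 0.5
    return (distance - d_min) / (d_max - d_min)


def scaled_frequency(n, d):
    """Hypothetical piecewise exponential scaling of the mutation frequency n.

    d <= 0.5: n' = n * 5 ** (1 - 2d)    (5n at d = 0, n at d = 0.5)
    d >  0.5: n' = n * 0.4 ** (2d - 1)  (n at d = 0.5, 0.4n at d = 1)
    """
    if d <= 0.5:
        factor = 5.0 ** (1.0 - 2.0 * d)
    else:
        factor = 0.4 ** (2.0 * d - 1.0)
    return n * factor


for raw in (10, 35, 60):
    d = normalize(raw, 10, 60)
    print(raw, d, scaled_frequency(100, d))
```

Any monotonically decreasing function through the same three points would satisfy the stated principle equally well; the two exponential pieces are chosen here only because the text describes the function as piecewise and exponent-based.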

(2) The invention also uses distance as the measure when combining fuzzing and symbolic execution: when the input distance no longer decreases over a period of time, the program may contain a complex constraint, and symbolic execution is needed to break through the bottleneck. When handing inputs over to symbolic execution, the invention sorts them by distance and processes the closest input first.

(3) The basic method used by directed symbolic execution in the invention is concolic execution: both the concrete values and the symbolic constraints of an input are recorded, the other side of a branch that is not taken in the execution trace of the input (called a missing branch) is explored, and inputs are generated that cover that branch and more of the paths behind it. Not every missing branch is worth exploring for covering the target; the key branch at any moment is the branch closest to the target that the input does not yet satisfy. The invention therefore adds guidance strategies for branch selection and symbolic exploration on top of concolic execution, specifically as follows:

A branch selection strategy guided by the must-pass node sequence: the invention first calculates the series of nodes that must be covered in order to cover the target, i.e. the must-pass node sequence. The farthest must-pass node that the input covers and the next must-pass node that it does not cover are taken as the source node and the destination node, and the set of nodes between the two is calculated as the scope of branch exploration. That is, a missing branch is selected for exploration only if the node it leads to belongs to this scope. During exploration, the invention stays within this scope as well: further symbolic execution is started on the CFG pruned with the next uncovered must-pass node as the target, forming a local path exploration aimed at that node.
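The scope computation can be sketched as a reachability intersection. The interpretation used here, that the "nodes between" source and destination are those reachable from the source that can also reach the destination, is an assumption, as are the CFG encoding and all names; the must-pass sequence is taken as given rather than computed.

```python
def reachable(graph, start):
    """All nodes reachable from start, including start itself."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reverse(graph):
    """Reverse every edge of the graph."""
    rev = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def exploration_scope(cfg, must_pass, covered):
    """Return (source, destination, scope) for the next uncovered must-pass node."""
    last = -1
    for i, node in enumerate(must_pass):
        if node not in covered:
            break
        last = i
    if last < 0 or last + 1 >= len(must_pass):
        return None  # nothing covered yet, or the target is already covered
    source, dest = must_pass[last], must_pass[last + 1]
    scope = reachable(cfg, source) & reachable(reverse(cfg), dest)
    return source, dest, scope


# Toy CFG; the target is node "t" and its must-pass sequence is entry, c, d, t.
cfg = {"entry": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d", "e"],
       "d": ["t"], "e": ["exit"], "t": ["exit"], "exit": []}
result = exploration_scope(cfg, ["entry", "c", "d", "t"],
                           covered={"entry", "a", "c", "e", "exit"})
print(result)
```

In this toy trace the input took entry → a → c → e, so the missing branch c → d falls inside the scope {c, d}, while the missing branch entry → b is ignored because flipping it would not bring the input any closer to the next must-pass node.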

A branch backtracking strategy: if the selected branch has no solution, i.e. the constraints along the path conflict with the constraint added by the selected branch, the invention does not discard the branch. Instead, it backtracks along the path to find the constraint that conflicts with the selected branch and then explores the other side of that branch, which does not conflict with the newly added constraint, thereby bypassing a constraint bottleneck that is hard to satisfy.
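The backtracking idea can be illustrated with a toy example in which the input is a single byte and a brute-force search stands in for the constraint solver. Everything here is hypothetical: the predicate representation, the pairwise conflict test, and the function names are assumptions made for the sketch, and a real implementation would query an SMT solver on the recorded path constraints instead.

```python
def solve(constraints, domain=range(256)):
    """Brute-force stand-in for a constraint solver over one input byte."""
    for value in domain:
        if all(check(value) for check in constraints):
            return value
    return None


def explore_with_backtracking(path, wanted):
    """Try to reach the `wanted` branch; flip a conflicting earlier branch if needed.

    path:   list of (taken, other) predicate pairs in trace order.
    wanted: predicate of the missing branch to be covered.
    Returns (flipped_index, input); flipped_index is None when no flip was needed,
    and the input is None when no solution was found.
    """
    taken = [pair[0] for pair in path]
    value = solve(taken + [wanted])
    if value is not None:
        return None, value
    # No solution: walk back from the most recent branch to find the conflict.
    for i in range(len(path) - 1, -1, -1):
        if solve([path[i][0], wanted]) is not None:
            continue  # this branch does not conflict with the wanted one
        value = solve(taken[:i] + [path[i][1], wanted])
        if value is not None:
            return i, value
    return None, None


# Trace of a toy program: the input took "x < 100", then "x is even".
path = [
    (lambda x: x < 100, lambda x: x >= 100),
    (lambda x: x % 2 == 0, lambda x: x % 2 == 1),
]
direct = explore_with_backtracking(path, lambda x: x == 42)
flipped = explore_with_backtracking(path, lambda x: x == 200)
print(direct, flipped)
```

For the branch x == 200 the path constraints are unsatisfiable, the search backtracks past the non-conflicting parity check to the branch x < 100, and the other side of that branch yields an input that satisfies the wanted constraint.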

For the method of the present invention, an application example is listed below, and fig. 1 is a flow chart thereof, specifically as follows:

The first step: given a program P and a target T (the target is generally a node or an edge on the CFG of the program), the distance between each node on the CFG and the target is first calculated and stored in a file. A directed fuzzing process is then created, which adds a distance measure to each input as its fitness function, selects the position of the input in the queue according to the measured value, sets the mutation frequency according to the measured value, and monitors at run time whether an input covers the target. Once an input covers the target, the target input (namely a target-oriented test case) has been generated and the whole task is complete.

The second step: at regular intervals, it is judged whether the input distance in fuzzing is still converging, i.e. whether the average input distance in the current period is lower than that in the previous period. If it is lower, mutation continues; if the distance no longer decreases, it is considered to have stopped converging and symbolic execution is invoked. When symbolic execution is invoked, all unprocessed inputs are sorted in ascending order of distance and the closest input is processed first (see interaction 1 of FIG. 1).

The third step: during concolic execution of an input, the missing branch to be explored is selected under the guidance of the must-pass node sequence. If the branch has a solution, symbolic execution is started on the pruned CFG to explore more paths; if the branch has no solution, the branch backtracking strategy is applied to find the conflicting branch and explore its other side. Since inputs produced by symbolic execution tend to be closer to the target, synchronizing them into the fuzzing queue generally lets them be mutated sooner and speeds up overall input generation (see interaction 2 of FIG. 1).
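The convergence check of the second step and the hand-over order at interaction 1 can be sketched as below. The representation of a period as a list of input distances, the treatment of an empty period, and the function names are assumptions of this sketch.

```python
from statistics import mean


def should_invoke_symbolic(prev_period, cur_period):
    """True when the mean input distance did not decrease from one period to the next."""
    if not prev_period or not cur_period:
        return False  # not enough data to judge convergence (assumption)
    return mean(cur_period) >= mean(prev_period)


def handover_order(unprocessed):
    """Sort (name, distance) pairs so that the closest input is processed first."""
    return sorted(unprocessed, key=lambda item: item[1])


still_converging = should_invoke_symbolic([30, 28, 26], [25, 22, 20])
stalled = should_invoke_symbolic([25, 22, 20], [23, 23, 23])
order = handover_order([("a", 12), ("b", 3), ("c", 7)])
print(still_converging, stalled, order)
```

While the mean distance keeps falling, fuzzing continues on its own; once a period shows no improvement, the unprocessed inputs are handed to symbolic execution starting with the one closest to the target.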

To illustrate the beneficial effects of the present invention, the following experiments were performed:

The experiment uses 126 programs from the Cyber Grand Challenge (CGC), the cyber defense competition held by DARPA, as the data set; each program has a pre-patch and a post-patch version. Taking the newly added patch code as the target, 254 targets were identified for the test case generation experiment. PRA was compared with AFLGo, a representative of existing directed fuzzing techniques, and Driller, a representative of techniques that combine fuzzing and symbolic execution. Each target was tested for 8 hours, and the results, shown in the Venn diagram of FIG. 2, verify the effect of the guidance and interaction strategies of the invention.

The experimental results are as follows. In FIG. 2, 100 is the number of branches for which all three of PRA, Driller and AFLGo successfully generated an input; 20 is the number of branches for which only PRA succeeded; the 1 directly above is the number of branches for which only PRA failed (i.e. both Driller and AFLGo succeeded); the 1 at the top left is the number of branches for which only Driller succeeded; 16 is the number of branches for which PRA and Driller succeeded and AFLGo failed; 0 is the number of branches for which only AFLGo succeeded; 7 is the number of branches for which PRA and AFLGo succeeded and Driller failed. PRA thus generated test cases for 143 targets within 8 hours, with an average input generation time of 41 minutes 47 seconds; Driller succeeded on 118 targets and AFLGo on only 108. PRA covers almost all the targets on which Driller and AFLGo succeed and additionally generates inputs for 20 targets that neither of them covers. The average time was also measured over the targets on which all three succeeded: PRA takes 17 minutes 16 seconds on average; Driller takes 1 hour 3 minutes 27 seconds, which is 3.7 times that of PRA; AFLGo takes 40 minutes 10 seconds, which is 2.3 times that of PRA. Compared with the existing methods, the directed test case generation method provided by the invention therefore shows a clear improvement in both running efficiency and number of successes.
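As a quick arithmetic check, the per-tool totals and the time ratios quoted above can be recomputed from the Venn regions and the average times. The region keys below are just a convenient encoding of the figures in the text; no data beyond those figures is introduced.

```python
# Venn regions from FIG. 2: the set of tools that succeeded -> number of targets.
regions = {
    ("PRA", "Driller", "AFLGo"): 100,
    ("PRA",): 20,
    ("Driller", "AFLGo"): 1,
    ("Driller",): 1,
    ("PRA", "Driller"): 16,
    ("AFLGo",): 0,
    ("PRA", "AFLGo"): 7,
}

totals = {
    tool: sum(count for tools, count in regions.items() if tool in tools)
    for tool in ("PRA", "Driller", "AFLGo")
}
covered_by_any = sum(regions.values())

# Average times on the targets where all three tools succeeded, in seconds.
pra = 17 * 60 + 16
driller = 1 * 3600 + 3 * 60 + 27
aflgo = 40 * 60 + 10

print(totals, covered_by_any)
print(round(driller / pra, 1), round(aflgo / pra, 1))
```

The regions add up to 143, 118 and 108 successful targets for PRA, Driller and AFLGo respectively, and the ratios round to 3.7 and 2.3, which agrees with the figures stated in the text.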

Claims

1. A method for generating a target-oriented test case comprises the following steps:

calculating the distance from each node on the CFG of the target program to a target, wherein the target is a node or an edge on the CFG;

performing directed fuzzing according to the distance, comprising:

adding a distance measure to each input as its fitness function;

if the input covers the target, generating a target-oriented test case;

the directed symbolic execution uses concolic execution and comprises:

2. The method of claim 1, wherein the distance is calculated by:

calculating the distance db1 from any node on the CFG of the current function to a function call site, wherein the call site calls a successor of the current function on the CG;

3. The method of claim 1, wherein selecting the position of the input in the queue is by:

after each full traversal of the queue, sorting all inputs in the queue as a whole.

4. The method of claim 1, wherein the mutation frequency of the input is set by:

when d is 0.5, the mutation frequency is unchanged.

5. The method of claim 1, wherein the concolic execution records both the concrete values and the symbolic constraints of an input, attempts to explore the missing branches that are not taken in the execution trace of the input, and generates inputs that cover such a branch and more of the paths behind it.

6. The method of claim 1, wherein the scope of the missing-branch exploration is determined as follows: the farthest must-pass node that the input covers and the next must-pass node that it does not cover are taken as the source node and the destination node, and the set of nodes between the two is computed as the scope of the missing-branch exploration.

7. A target-oriented test case generation system comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the program comprising instructions for carrying out the steps of the method of any of claims 1 to 6.

8. A computer-readable storage medium storing a computer program comprising instructions which, when executed by a processor of a server, cause the server to perform the steps of the method of any of claims 1-6.