CN109522221B

CN109522221B - Method and system for improving fuzz testing efficiency

Info

Publication number: CN109522221B
Application number: CN201811257109.8A
Authority: CN
Inventors: Chen Kai (陈恺); Zong Peiyuan (宗珮媛); Liang Ruigang (梁瑞刚)
Original assignee: Institute of Information Engineering of CAS
Current assignee: Institute of Information Engineering of CAS
Priority date: 2018-10-26
Filing date: 2018-10-26
Publication date: 2021-08-31
Anticipated expiration: 2038-10-26
Also published as: CN109522221A

Abstract

The invention relates to a method and a system for improving the efficiency of fuzz testing. The method comprises the following steps: 1) collecting inputs generated by the fuzzer's mutations, together with information on whether each input reaches the target code, as training data; 2) training a deep learning model of input-to-target-code reachability using the training data; 3) using the trained deep learning model to judge whether a new input reaches the target code in the target program; if it is predicted reachable, the new input is delivered to the target program for execution, and if it is predicted unreachable, the new input is discarded. Further, in step 3), if the new input is predicted unreachable, the features of known unreachable inputs are compared with the features of the new input to check whether the prediction is trustworthy; if it is, the input is discarded, otherwise the new input is delivered to the target program for execution. The invention focuses on filtering out useless inputs that cannot reach the target code during fuzz testing; it complements, and can be used alongside, other fuzz testing methods, thereby effectively improving fuzz testing efficiency.

Description

Method and system for improving fuzz testing efficiency

Technical Field

The invention belongs to the technical field of computer software and fuzz testing. It mainly relates to a method and a system for finding software bugs through accelerated fuzz testing, and in particular to a method and a system for accelerated fuzz testing that save target program execution time by filtering out, in advance, test cases that cannot reach the target.

Background

Fuzz testing is one of the most popular software testing techniques for exposing exceptions in computer programs. Typically, a fuzz testing tool generates many program inputs and feeds them to the program; by monitoring for exceptions such as crashes during program execution, it can capture hidden errors or bugs. The goal of fuzz testing is to find ideal program inputs that bypass a series of checks (e.g., format checks), hit defective code and trigger an error. In practice, however, such inputs are very difficult to generate, because only a small percentage of the large number of generated inputs reach the defective code. Some fuzz testing tools therefore aim to increase the code coverage of the target program, hoping to trigger errors by exercising more program branches. Other fuzz testing tools aim to generate inputs that reach specified buggy code in order to trigger specified errors, which greatly helps to expose exceptions in the target program quickly; fuzz testing that tries to hit the target code as often as possible is referred to as targeted (directed) fuzz testing. Targeted fuzz testing typically involves two main steps: first, generating various inputs and gradually mutating them; second, executing the target program and monitoring for exceptions. Most current fuzz testing techniques focus on the first step and rely on executing the target program to determine whether an input can trigger an error.

In fact, without executing the target program, it is difficult to determine whether a new input can reach the buggy code. One theoretically possible solution is symbolic execution, which can extract all conditional statements on the path from the program entry point to the buggy code and then build a model from the constraints in those statements and the associated operations on the input. However, for real programs, symbolic execution is difficult to apply to medium and large software because of path explosion and its limited efficiency.

Because deep learning models fit data well, some studies have attempted to improve the efficiency of fuzz testing with deep learning. Rajpal et al. propose learning the relationship between mutated input positions and coverage in order to mutate inputs toward new paths, and Godefroid et al. similarly generate new mutated inputs that trigger new program paths by learning the grammar of the input file format. Such work relies on new program paths to trigger bugs; in reality, defective code often has more than one condition to satisfy, which a newly generated input rarely satisfies the first time it reaches that code. Angora, in turn, uses lightweight taint analysis to track the effect of input bytes on control flow and reaches target code blocks by generating mutated inputs with a gradient descent search, as used in deep learning. Such work is limited by program analysis techniques such as taint analysis and constraint solving, and has only been able to check vulnerabilities in relatively simple software. Furthermore, none of the above methods avoids generating invalid program inputs (e.g., inputs that fail to trigger a new path or fail to reach the target code), whereas the method of the present invention filters out invalid program inputs to save the fuzzer's program execution time.

Disclosure of Invention

In view of the above, the present invention builds a model that judges whether an input can reach the buggy code and filters out inputs that cannot, so the target program does not need to be executed on invalid inputs. If making a decision with this model takes much less time than actually executing the program, a significant amount of time can be saved during fuzz testing.
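The break-even condition above can be made concrete with a simple cost model. This is a sketch under assumptions that are not in the source. t_exec is the average time to execute the target program on one input, and t_model is the average time for the model (plus feature comparison) to judge one input. The discard fraction is the share of inputs the filter actually drops. The cost of wrongly discarding a reachable input is ignored. Without filtering, each input costs t_exec. With filtering, each input costs t_model + (1 - f) * t_exec, so the expected saving per input is f * t_exec - t_model.

```python
def time_saved_per_input(t_exec: float, t_model: float, discard_fraction: float) -> float:
    """Expected time saved per generated input under a simple, hypothetical cost model.

    Without filtering:  t_exec
    With filtering:     t_model + (1 - discard_fraction) * t_exec
    """
    if not 0.0 <= discard_fraction <= 1.0:
        raise ValueError("discard_fraction must be in [0, 1]")
    return discard_fraction * t_exec - t_model


# Filtering pays off only when the model is cheaper than the executions it avoids.
print(time_saved_per_input(t_exec=0.010, t_model=0.001, discard_fraction=0.5))  # positive
print(time_saved_per_input(t_exec=0.010, t_model=0.008, discard_fraction=0.5))  # negative
```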

The present invention aims to let a targeted fuzzer recognize and discard inputs that fail to reach the target code before the program is actually executed, thereby saving fuzzing time. To avoid heavyweight program analysis (e.g., symbolic execution), the model is built only from the observed reachability of previously executed inputs. Unlike methods that focus mainly on how to generate suitable inputs that reach the buggy code, the present method complements, and is compatible with, other fuzz testing methods rather than replacing them.

To let a model identify inputs that cannot reach the buggy code, the invention draws on the successful use of deep learning in pattern recognition and classifies the reachability of inputs with a deep learning technique. Specifically, the present invention designs a deep learning model relating program inputs to target code reachability. The model learns from a large number of input samples whose reachability is known and, based on the learned features, estimates whether a newly generated input reaches the target code. A negative prediction means that the input cannot trigger the target code, so the input is discarded instead of being executed by the target program, which may save a significant amount of time during fuzz testing. Secondly, because deep learning is a black box, interpretable information is needed to back up the model's estimates. To ensure the accuracy of the model's judgments, the method uses feature comparison to automatically verify what the model has learned and to filter its estimates a second time. Finally, when the model's estimate is inconsistent with the actual execution result, the model must correct itself to reduce the overall false alarm rate. It is therefore necessary to collect the inputs whose estimated reachability does not match their actual reachability, learn from them again, and update the model in real time to keep it reliable.

Specifically, the technical scheme adopted by the invention is as follows:

A method for improving the efficiency of fuzz testing comprises the following steps:

1) collecting inputs generated by the fuzzer's mutations, together with information on whether each input reaches the target code, as training data;

2) training a deep learning model of input-to-target-code reachability using the training data;

3) using the trained deep learning model to judge whether a new input reaches the target code in the target program; if it is reachable, the new input is delivered to the target program for execution, and if it is not reachable, the new input is discarded.

Further, in step 3) the deep learning model makes a prediction for the new input; if the prediction is reachable, the new input is delivered to the target program for execution. If the prediction is unreachable, whether it is trustworthy is checked by comparing whether the features of known unreachable inputs are similar to the features of the new input; if the prediction is trustworthy, the input is discarded, and if not, the new input is delivered to the target program for execution.

Further, the learning goal of the deep learning model in step 2) is to judge whether an input reaches the target code in the target program; the neural network's ability to fit data is used to learn a mapping from inputs to reachability and to compute suitable weights.

Further, in step 2) the training data are first vectorized, an initial deep learning model is then obtained through incremental training, and, according to the model's test results, the model is continuously corrected in real time from the initial model using misjudged data.

Further, the vectorization process includes:

a) normalizing the data length: with n the byte count of the longest program input file, input data shorter than n bytes are padded with zeros, and only the first n bytes of input data longer than n bytes are kept;

b) converting the hexadecimal representation of the program input into decimal, so that each byte b is represented by a number from 0 to 255 and the input data become a vector of length n; reachability of the target code by a program input has only two values, reachable or unreachable, with 0 denoting unreachable and 1 denoting reachable, so the reachability label e corresponding to the input data takes the value 0 or 1.

Further, the incremental training comprises:

a) shuffling the data set so that the data distribution is as uniform as possible and the subsequent learning process does not oscillate;

b) dividing the data set into batches, training the model batch by batch, and running a test after each batch;

c) searching, with a gradient descent algorithm, for the network weights that minimize the error over all data; these weights serve as the input features, learned by the model, that characterize target code reachability.

Further, the features of the new input are obtained by extracting a feature map from the deep neural network, which captures how much each position in the new input contributes to the final prediction; features are likewise extracted from the unreachable inputs in the training data, and the non-repeating features are recorded in a list as the known unreachable-input features for subsequent feature comparison.

Further, after feature comparison, if the prediction is not trustworthy, the new input is delivered to the target program for execution and verification; if the prediction turns out to be wrong, the new input is fed back to the data collection step so that the model is corrected in subsequent training, and otherwise its features are recorded in the list of known unreachable-input features.

A system for improving the efficiency of fuzz testing, comprising:

a data collection module, responsible for collecting inputs generated by the fuzzer's mutations, together with information on whether each input reaches the target code, as training data;

a model construction module, responsible for training a deep learning model of input-to-target-code reachability using the training data;

an input filtering module, which uses the trained deep learning model to judge whether a new input reaches the target code in the target program, delivers the new input to the target program for execution if it is reachable, and discards it if it is not.

Further, the input filtering module uses the deep learning model to make a prediction for the new input; if the prediction is reachable, the new input is delivered to the target program for execution. If the prediction is unreachable, whether it is trustworthy is checked by comparing whether the features of known unreachable inputs are similar to the features of the new input; if the prediction is trustworthy, the input is discarded, and if not, the new input is delivered to the target program for execution.

Unlike existing research that applies deep learning to fuzz testing, the method of the invention focuses on filtering out useless inputs that cannot reach the target code during fuzz testing. This means that it complements, and can be used together with, other fuzz testing methods, thereby effectively improving fuzz testing efficiency.

Drawings

FIG. 1 is a basic block diagram of the process of the present invention;

FIG. 2 is an exemplary diagram of input data vectorization in an embodiment;

FIG. 3 is a diagram of an embodiment of a deep learning model network architecture.

Detailed Description

To make the above objects, features and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

Most current fuzz testing techniques rely on executing the target program to determine whether an input can trigger an error. Unlike these methods, the present invention approaches the problem from another angle: if the fuzzer can determine whether an input can reach the buggy code without executing the target program, inputs that cannot reach the target can be filtered out, further reducing the time overhead of fuzz testing.

Considering that the number of inputs in fuzz testing is typically very large (e.g., millions), the present invention models the program logic from the correspondence between these inputs and the program execution results, and uses that model to learn and filter out inputs that fail to reach the target code. This reduces the complexity of analyzing the target program, avoids the limitations of program analysis, learns the program's logic effectively from data, and helps the fuzzer save target program testing time.

The method of the invention is implemented through the following specific process:

1) Data collection:

Because building a deep learning model requires a certain amount of training data, the inputs generated by the fuzzer's mutations and their target code reachability information are collected at the initial stage of fuzz testing, and this initial data set serves as the model's initial training data. In addition, misjudged data, whose predicted reachability differs from the actual reachability, must also be collected together with their reachability information to provide learning samples for improving the model's accuracy.

2) Model construction:

The learning goal of the model is to determine whether an input reaches the target code in the target program, which can be abstracted as a classification problem. Suitable weights are computed by learning the mapping from inputs to reachability, relying on the neural network's ability to fit data. Meanwhile, since the model's learned weights must be corrected in real time according to misjudged samples, an incremental learning method is adopted that updates the weights with each batch of training data.

3) Input filtering:

When a new input is generated, the model constructed in step 2) first makes a prediction for it; if the prediction is reachable, the input is delivered to the target program for execution. If the prediction is unreachable, whether it is trustworthy is checked by comparing the features of known unreachable inputs with the features of the input. If the prediction is trustworthy, the input is discarded; otherwise, the input is delivered to the target program for execution.

An example is provided below to illustrate the process of assisting the fuzz test in filtering invalid inputs.

Example: the image processing program ImageMagick 7.0.3-8 is the target program and AFL 2.5.2 is the fuzzer. AFL generates image input files and delivers them to the target program for execution; instrumentation code in the target program captures whether each input file reaches the target code, and AFL also monitors the target program for abnormal execution. Line 675 of attribute.c in the target program is taken as the target code; for an input to trigger the vulnerability in the target code, the following control constraints in the conditional statements must all be satisfied:

1. if ((image->type == BilevelType) || (image->type == GrayscaleType) || (image->type == GrayscaleAlphaType))

2. if (IssRGBCompatibleColorspace(image->colorspace) == MagickFalse)

3. for (y = 0; y < (ssize_t) image->rows; y++)

4. for (x = 0; x < (ssize_t) image->columns; x++)

5. if (image->colorspace == CMYKColorspace)

As the constraints above show, for a program input to reach the target code (attribute.c:675), the values of the image->type, image->colorspace, image->rows and image->columns fields must all be constrained at the same time. Because these conditions involve four variables, which correspond to multiple bytes in the program input file, it is difficult to ensure that every mutation lands within the target range without precise guidance. As a result, AFL generates a large number of input files that cannot reach the target code during mutation. The method of the invention is applied as follows:

1) Data collection. Two types of data need to be collected. The first is the file input data generated by AFL's input generator through mutation: because the target program receives its input as a file, AFL's mutated inputs can be stored on disk as files, and the content of each input can be obtained by reading its file path. The second is the reachability of the target code for each file input during execution of the target program: the target code location is instrumented so that, as soon as an input triggers the target code, that input file's path is recorded in memory, which yields the target code reachability of every input file. Before model construction in FIG. 1 is completed, the mutated file inputs generated by AFL in the initial stage of fuzz testing and their reachability data in the target program are collected; after model construction is completed, only the input data whose model prediction does not match the reachability observed in the target program, and the corresponding reachability data, are collected.

2) Model construction. To construct a classification model that can identify the reachability of inputs, the data are first vectorized. An initial deep learning model is then obtained through incremental training and, according to the model's test results, continuously corrected in real time from the initial model using misjudged data.
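The collection policy described above can be sketched as follows: keep every executed input before the model exists, and keep only mispredicted inputs afterwards. The class and method names are hypothetical. Reachability is assumed to arrive as a boolean from the instrumentation, and the input content is read from the file path AFL wrote to disk.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional, Tuple


@dataclass
class DataCollector:
    """Hypothetical collector for (input bytes, reachability label) training samples."""

    model_ready: bool = False
    samples: List[Tuple[bytes, int]] = field(default_factory=list)

    def record(self, input_path: str, reached: bool, predicted: Optional[bool] = None) -> bool:
        """Store one executed input if it is useful for training; return True if stored.

        Before the model is built, every input is kept (initial training set).
        Afterwards, only misjudged inputs, whose prediction differs from the
        reachability observed through instrumentation, are kept for retraining.
        """
        if self.model_ready and predicted is not None and predicted == reached:
            return False  # correct prediction: nothing new to learn
        content = Path(input_path).read_bytes()  # each mutated input is stored as a file
        self.samples.append((content, 1 if reached else 0))
        return True
```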

A) Data vectorization

The deep learning model accepts input data as fixed-length vectors. To vectorize the program input data, the data length is first normalized: with n the byte count of the longest program input file, input data shorter than n bytes are padded with zeros, and only the first n bytes of longer input data are kept. During vectorization, as shown in FIG. 2, the hexadecimal representation of the program input is converted into decimal, so each byte b is represented by a number from 0 to 255 and the input data become a vector of length n. Reachability of the target code by a program input has only two values, reachable or unreachable, denoted by 0 for unreachable and 1 for reachable, so the reachability label e corresponding to the input data takes the value 0 or 1. The vectorized input data and reachability labels are related as follows:

$$\{\, b \mid b \in \{0, 1, \dots, 255\} \,\}^{n} \rightarrow \{\, e \mid e \in \{0, 1\} \,\}$$
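The following is a minimal Python sketch of the length normalization and vectorization above. The function names are hypothetical, and in practice n would be the length of the longest input file seen.

```python
from typing import List


def vectorize(data: bytes, n: int) -> List[int]:
    """Normalize a program input to exactly n bytes and return it as integers 0..255.

    Inputs shorter than n are zero-padded; longer inputs keep only their first n bytes.
    """
    fixed = data[:n] + bytes(max(0, n - len(data)))
    return list(fixed)  # iterating over bytes already yields ints in 0..255


def reachability_label(reached_target: bool) -> int:
    """1 = the input reached the target code, 0 = it did not."""
    return 1 if reached_target else 0


print(vectorize(b"\x89PNG", 6))  # [137, 80, 78, 71, 0, 0]
print(vectorize(b"GIF89a", 3))   # [71, 73, 70]
```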

B) Incremental training of the classification model

The goal of the invention is to train a classification model that can identify whether a program input reaches the target code. Because convolutional neural networks (CNNs) perform classification tasks well and efficiently, a CNN is chosen as the internal network of the classification model. As shown in FIG. 3, an embedding layer serves as the input layer that receives the input data. It is followed by a multi-layer convolution-pooling network and, finally, a fully connected layer that normalizes the network's output and produces the prediction. The learning goal of the classification model is to obtain a function f_c that captures the relationship between a program input and its reachability, i.e., f_c(x) → e. Specifically, a deep learning algorithm computes the weights w and bias of each network layer so as to fit, as closely as possible, a nonlinear function matching the relationship between the input data x and the reachability data e in the training set.
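To make the architecture concrete, here is a dependency-free forward pass of the network: embedding, two convolution-pooling stages, and a fully connected output. The source does not give layer counts, sizes, activations, or kernel widths. All of those choices here (dim=4, channels=4, width=3, pool=2, ReLU, sigmoid output) are assumptions. The weights are random and untrained, and a real implementation would use a deep learning framework rather than pure Python.

```python
import math
import random
from typing import Dict, List


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def conv1d_relu(seq: List[List[float]], kernels, width: int) -> List[List[float]]:
    """Valid 1-D convolution over positions; one output channel per (weights, bias) kernel."""
    out = []
    for t in range(len(seq) - width + 1):
        row = []
        for weights, bias in kernels:
            s = bias
            for k in range(width):
                s += sum(w * v for w, v in zip(weights[k], seq[t + k]))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def max_pool(seq: List[List[float]], size: int) -> List[List[float]]:
    """Non-overlapping max pooling along the position axis."""
    return [[max(col) for col in zip(*seq[i:i + size])]
            for i in range(0, len(seq) - size + 1, size)]


def init_model(n: int, dim: int = 4, channels: int = 4, width: int = 3,
               pool: int = 2, seed: int = 0) -> Dict:
    """Randomly initialized embedding -> 2 x (conv + pool) -> dense network (untrained)."""
    rng = random.Random(seed)

    def rand() -> float:
        return rng.uniform(-0.1, 0.1)

    def kernel(in_dim: int):
        # (weights[width][in_dim], bias) for one output channel
        return [[rand() for _ in range(in_dim)] for _ in range(width)], 0.0

    length = n
    for _ in range(2):  # valid conv shrinks by width - 1, pooling divides by pool
        length = (length - width + 1) // pool
    if length < 1:
        raise ValueError("input length n is too short for this architecture")
    return {
        "embedding": [[rand() for _ in range(dim)] for _ in range(256)],  # one row per byte value
        "conv": [[kernel(dim) for _ in range(channels)],
                 [kernel(channels) for _ in range(channels)]],
        "dense_w": [rand() for _ in range(length * channels)],
        "dense_b": 0.0,
        "width": width,
        "pool": pool,
    }


def predict(model: Dict, x: List[int]) -> float:
    """Return the estimated probability that input vector x reaches the target code."""
    h = [model["embedding"][b] for b in x]  # embedding layer: byte value -> dense vector
    for kernels in model["conv"]:
        h = max_pool(conv1d_relu(h, kernels, model["width"]), model["pool"])
    flat = [v for row in h for v in row]
    return sigmoid(model["dense_b"] + sum(w * v for w, v in zip(model["dense_w"], flat)))
```

A prediction of 0.5 or higher would be read as "reachable". With n = 16, the two stages shrink the sequence to 2 positions, so the dense layer has 2 × channels inputs.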

Meanwhile, to keep the model up to date, the weights of the classification model are updated in real time by incremental training. The specific process is as follows:

First, the data set is shuffled so that the data distribution is as uniform as possible and the subsequent learning process does not oscillate.

Next, the data set is divided into batches of 32 or 64 samples, the model is trained continuously batch by batch, and a test can be run after each batch.

Finally, a gradient descent algorithm searches for the network weights that minimize the error over all data; these weights serve as the input features, learned by the model, that characterize target code reachability.

Through the above operations, a classification model that judges the reachability of program inputs and is updated in real time is obtained.
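The shuffle, batch, and gradient-descent loop can be sketched as follows. To keep the example short and dependency-free, a logistic-regression classifier over the byte vector stands in for the CNN, because the point here is the training loop rather than the model. The learning rate, epoch count, and class name are assumptions. Calling train() again with newly misjudged samples continues from the current weights, which is the incremental update the text describes.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[int], int]  # (byte vector, reachability label 0/1)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class IncrementalClassifier:
    """Logistic-regression stand-in for the CNN, trained with mini-batch gradient descent."""

    def __init__(self, n: int, lr: float = 0.1, seed: int = 0):
        self.w = [0.0] * n
        self.b = 0.0
        self.lr = lr
        self.rng = random.Random(seed)

    def prob(self, x: Sequence[int]) -> float:
        # Bytes are scaled to [0, 1] so a single learning rate suits every position.
        z = self.b + sum(wj * xj / 255.0 for wj, xj in zip(self.w, x))
        return sigmoid(z)

    def accuracy(self, samples: List[Sample]) -> float:
        hits = sum((self.prob(x) >= 0.5) == bool(e) for x, e in samples)
        return hits / len(samples)

    def train(self, samples: List[Sample], batch_size: int = 32, epochs: int = 1,
              test_set: Optional[List[Sample]] = None) -> List[float]:
        """Continue training from the current weights; return test accuracy after each batch."""
        data = list(samples)
        history = []
        for _ in range(epochs):
            self.rng.shuffle(data)                         # a) randomize the order
            for start in range(0, len(data), batch_size):  # b) train batch by batch
                batch = data[start:start + batch_size]
                grad_w = [0.0] * len(self.w)
                grad_b = 0.0
                for x, e in batch:
                    err = self.prob(x) - e                 # d(log-loss)/dz for a sigmoid output
                    for j, xj in enumerate(x):
                        grad_w[j] += err * xj / 255.0
                    grad_b += err
                m = len(batch)                             # c) gradient descent step
                self.w = [wj - self.lr * g / m for wj, g in zip(self.w, grad_w)]
                self.b -= self.lr * grad_b / m
                if test_set:
                    history.append(self.accuracy(test_set))  # test after each batch
        return history
```

Retraining on misjudged samples is just another call such as `clf.train(misjudged_samples)`, which starts from the weights already learned instead of from scratch.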

3) Input filtering. With the classification model obtained by the training above, the reachability of a newly generated program input can be predicted without executing the target program. To check what the model has learned and to explain why its predictions are valid, a comparison of input feature similarity is introduced to filter the model's output a second time.

Table 1 Invalid input filtering rules

Invalid inputs are filtered using the rules in Table 1, specifically:

If the model predicts that the input is reachable, the input is delivered to the target program to run, and the model's prediction is verified against the run result. If the model predicts that the input is unreachable, the input is handed to the feature comparator in FIG. 1 for feature comparison. If the input's features are not similar to any known unreachable-input features, the input may be a reachable input that the model failed to recognize, i.e., a potential misjudgment, so it is delivered to the target program for execution and verification. If the prediction then turns out to be wrong, the input is fed back to the data collector in FIG. 1 to correct the model in subsequent training; otherwise, its features are recorded in the list of known unreachable-input features. If the input's features are similar to known unreachable-input features, the input most likely fails to reach the target code, so it is discarded without being executed by the target program.
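The rules above amount to a small decision function plus a feedback step after execution. This is a hypothetical sketch: the names Action, decide, and after_execution are not from the source, and the similarity test itself is left to the feature comparator.

```python
from enum import Enum
from typing import List, Tuple


class Action(Enum):
    EXECUTE = "execute"                        # predicted reachable: run it, then check the prediction
    EXECUTE_AND_VERIFY = "execute_and_verify"  # predicted unreachable, but features look unfamiliar
    DISCARD = "discard"                        # predicted unreachable and features match known ones


def decide(predicted_reachable: bool, similar_to_known_unreachable: bool) -> Action:
    """Filtering rules: only a corroborated 'unreachable' prediction skips execution."""
    if predicted_reachable:
        return Action.EXECUTE
    if similar_to_known_unreachable:
        return Action.DISCARD
    return Action.EXECUTE_AND_VERIFY


def after_execution(predicted_reachable: bool, actually_reached: bool, sample: object,
                    features: Tuple[int, ...], known_unreachable: List[Tuple[int, ...]],
                    retrain_queue: List[object]) -> None:
    """Feed misjudgments back to the data collector; remember newly confirmed unreachable features."""
    if predicted_reachable != actually_reached:
        retrain_queue.append(sample)        # misjudged: use it to correct the model later
    elif not actually_reached and features not in known_unreachable:
        known_unreachable.append(features)  # correctly predicted unreachable: new known pattern
```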

The feature comparator in FIG. 1 consists of two parts, as follows:

A) Input feature extraction

Assuming that an input's features are the most representative part of its data distribution, the positions in the input vector that strongly influence the reachability prediction can be used as the input's pattern features. A CNN visualization method is used to extract the feature map of the deep neural network, which captures how much each position in the current input contributes to the final prediction; extracting the network's feature map therefore yields the input's feature distribution. The values in the feature map are then discretized so that similar values become identical. For example, values in the range (0.0, 0.5) are all set to 0, and values in the range [0.5, 1.0) are all set to 1. This eliminates tiny differences between features and prevents differences caused by complex computation and floating-point precision inside the network from affecting the subsequent feature comparison.
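The discretization step can be sketched as below. The sketch assumes that the per-position contributions have already been extracted from the feature map, since the source does not specify how the CNN visualization is done. The min-max scaling to [0, 1] is an added assumption so that the 0.5 threshold from the example applies, and a value of exactly 1.0 is mapped to 1.

```python
from typing import List, Sequence, Tuple


def normalize(activations: Sequence[float]) -> List[float]:
    """Min-max scale feature-map values to [0, 1] (assumed preprocessing step)."""
    lo, hi = min(activations), max(activations)
    if hi == lo:
        return [0.0] * len(activations)
    return [(a - lo) / (hi - lo) for a in activations]


def discretize(values: Sequence[float], threshold: float = 0.5) -> Tuple[int, ...]:
    """Map values below the threshold to 0 and the rest to 1."""
    return tuple(0 if v < threshold else 1 for v in values)


print(discretize(normalize([2.0, 4.0, 10.0])))               # (0, 0, 1)
print(discretize([0.12, 0.71]) == discretize([0.15, 0.69]))  # True: tiny differences vanish
```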

B) Input feature similarity comparison

Since the purpose of feature comparison is to confirm whether the current input really is unreachable, it is necessary to check whether its features are similar to those of known unreachable inputs. Before comparison, features are extracted from the unreachable inputs in the training set, and the non-repeating features are entered into a list for subsequent comparison. The distance between two features is measured with the Minkowski distance:

$$D(x, y) = \left( \sum_{i} |x_i - y_i|^{p} \right)^{1/p}$$

where x_i and y_i are the values in the two feature vectors and p = 1 (so D is the Manhattan distance). When the distance is smaller than a certain threshold, the two features are considered similar; otherwise, they are not.
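The second part of the comparator can be sketched as a Minkowski distance (p = 1 by default) plus a list of non-repeating unreachable-input features. The source does not give the threshold value, so the one used below is purely illustrative, as is the class name.

```python
from typing import List, Sequence, Tuple


def minkowski(x: Sequence[float], y: Sequence[float], p: float = 1) -> float:
    """Minkowski distance; p = 1 gives the Manhattan distance used in the embodiment."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


class FeatureComparator:
    """Hypothetical store of known unreachable-input features plus a similarity test."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.known_unreachable: List[Tuple[int, ...]] = []

    def add(self, feature: Tuple[int, ...]) -> None:
        if feature not in self.known_unreachable:  # keep only non-repeating features
            self.known_unreachable.append(feature)

    def is_similar(self, feature: Tuple[int, ...]) -> bool:
        """True if the feature is closer than the threshold to any known unreachable feature."""
        return any(minkowski(feature, k) < self.threshold for k in self.known_unreachable)


comparator = FeatureComparator(threshold=2)  # illustrative threshold
comparator.add((0, 1, 1, 0))
comparator.add((0, 1, 1, 0))                 # duplicate, ignored
print(len(comparator.known_unreachable))     # 1
print(comparator.is_similar((0, 1, 1, 1)))   # True  (distance 1)
print(comparator.is_similar((1, 0, 0, 1)))   # False (distance 4)
```

The alternatives named later in the text (Euclidean, cosine, Hamming) would replace only minkowski(). For example, p = 2 gives the Euclidean distance.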

In summary, the present invention provides a fuzz testing acceleration method focused on filtering invalid inputs. By learning a model of program inputs and target code reachability, the method identifies and filters out inputs that fail to hit the target code without executing the target program, saving target program run time during fuzz testing. The method collects the program inputs generated by mutation during fuzz testing together with their reachability information, builds a learning model to predict the reachability of newly generated inputs, performs similarity analysis with the learned features to ensure the accuracy of the predictions, and finally corrects the model in real time based on misjudgments.

In the model construction of step 2), other neural network models can be used to achieve a similar effect, such as a deep neural network (DNN), a recurrent neural network (RNN) or a generative adversarial network (GAN). In the input feature similarity comparison (part B) of step 3), other distance measures can replace the Minkowski distance for measuring feature similarity, such as the Euclidean distance, cosine distance or Hamming distance.

The above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention; a person skilled in the art may modify the technical solution or substitute equivalents without departing from the spirit and scope of the present invention, and the scope of protection shall be determined by the claims.

Claims

1. A method for improving the efficiency of fuzz testing, characterized by comprising the following steps:

2) training a deep learning model of input-to-target-code reachability using the training data; the learning goal of the deep learning model is to judge whether an input reaches the target code in the target program, and suitable weights are computed by learning the mapping from inputs to reachability, relying on the neural network's ability to fit data;

2. The method according to claim 1, wherein in step 3) the deep learning model makes a prediction for the new input, and if the prediction is reachable, the new input is delivered to the target program for execution; if the prediction is unreachable, whether the prediction is trustworthy is checked by comparing whether the features of known unreachable inputs are similar to the features of the new input; if it is trustworthy, the input is discarded, and if not, the new input is delivered to the target program for execution.

3. The method according to claim 1, wherein step 2) first vectorizes the training data, then obtains an initial deep learning model through incremental training, and, according to the model's test results, continuously corrects the model in real time from the initial deep learning model using misjudged data.

4. The method of claim 3, wherein the vectorization process comprises:

5. The method of claim 3, wherein the incremental training comprises:

6. The method according to claim 2, wherein the features of the new input are obtained by extracting a feature map from the deep neural network, which captures how much each position in the new input contributes to the final prediction; features are likewise extracted from the unreachable inputs in the training data, and the non-repeating features are recorded in a list as the known unreachable-input features for subsequent feature comparison.

7. The method according to claim 2 or 6, characterized in that after feature comparison, if the prediction is not trustworthy, the new input is delivered to the target program for execution and verification; if the prediction is wrong, the new input is fed back to the data collection step so that the model is corrected in subsequent training, and otherwise the input's features are entered into the list of known unreachable-input features.

8. A system for improving the efficiency of fuzz testing, comprising:

a model construction module, responsible for training a deep learning model of input-to-target-code reachability using the training data; the learning goal of the deep learning model is to judge whether an input reaches the target code in the target program, and suitable weights are computed by learning the mapping from inputs to reachability, relying on the neural network's ability to fit data;

9. The system of claim 8, wherein the input filtering module uses the deep learning model to make a prediction for the new input, and if the prediction is reachable, delivers the new input to the target program for execution; if the prediction is unreachable, whether the prediction is trustworthy is checked by comparing whether the features of known unreachable inputs are similar to the features of the new input; if it is trustworthy, the input is discarded, and if not, the new input is delivered to the target program for execution.