CN113434408B - Unit test case sequencing method based on test prediction - Google Patents


Info

Publication number
CN113434408B
CN113434408B CN202110711925.7A
Authority
CN
China
Prior art keywords
output
test
input
test case
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110711925.7A
Other languages
Chinese (zh)
Other versions
CN113434408A (en)
Inventor
刘辉
朱志浩
李亚辉
李光杰
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202110711925.7A priority Critical patent/CN113434408B/en
Publication of CN113434408A publication Critical patent/CN113434408A/en
Application granted granted Critical
Publication of CN113434408B publication Critical patent/CN113434408B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 — Error detection; error correction; monitoring
    • G06F 11/36 — Preventing errors by testing or debugging software
    • G06F 11/3668 — Software testing
    • G06F 11/3672 — Test management
    • G06F 11/3684 — Test management for test design, e.g. generating new test cases
    • G06F 11/3688 — Test management for test execution, e.g. scheduling of test suites
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/08 — Learning methods

Abstract

The invention relates to a unit test case sequencing method based on test prediction (test oracles), and belongs to the technical field of computer software testing. For a given method under test, the invention automatically generates millions of input-output pairs and trains a neural network to simulate the behavior of the method. The actual output of the method under test is then compared with the expected output predicted by the neural network; the larger the distance between the two outputs, the more suspicious the test case. Finally, the top k most suspicious test cases are selected for manual verification. If any of them reveals a defect in the method under test, i.e., its actual output is inconsistent with the expected output, the defect has been found. Compared with the prior art, the invention automatically generates training data from the method under test and automatically derives a unit test oracle from dynamic executions of the method, significantly reducing the workload of manually verifying test cases.

Description

Unit test case sequencing method based on test prediction
Technical Field
The invention relates to a unit test case sequencing method based on test prediction, and belongs to the technical field of computer software testing.
Background
In the software development process, software testing is one of the most important and time-consuming tasks. Without an accurate test oracle, it is impossible to automatically decide whether a test case passes or fails. Test oracles (test predictions) are therefore crucial for automated software testing.
Currently, most automated software testing approaches focus on the automatic generation of test cases (inputs). Automatically verifying the generated test cases is challenging: it is difficult to determine which of them can reveal defects in the method under test. Typically, the actual output of the method under test is compared with the expected output, and a test case passes only if the actual output equals the corresponding expected output. However, the expected output is often hard to obtain, because most test case generation tools cannot generate expected outputs for the system under test. When the number of test cases is large, the problem of automatically verifying them is known as the test oracle problem.
There are many existing approaches to solving or mitigating the test oracle problem. One of the most intuitive is to write a specification of the system under test and derive the expected output for each input from the specification. Such oracles are called specified test oracles. While this approach is intuitive and effective, writing the specification of a software system is itself a challenging task.
Metamorphic relations in metamorphic testing are also often used as an alternative oracle. A metamorphic relation does not explicitly specify the expected output; instead, it states a relationship between the inputs and outputs obtained by running the system multiple times. Metamorphic testing significantly alleviates the test oracle problem because metamorphic relations are easier to specify than full system specifications. However, finding metamorphic relations manually is still quite time-consuming.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and, in order to solve the technical problem that existing test case generation methods cannot generate the expected output of the system under test, provides a unit test case sequencing method based on test prediction.
The innovation of the invention is as follows: for a given method under test, it automatically generates millions of input-output pairs and trains a neural network to simulate the behavior of the method. The actual output of the method under test is compared with the expected output predicted by the neural network; the larger the distance between the two outputs, the more suspicious the test case. Finally, the top k most suspicious test cases are selected for manual verification. If any of them reveals a defect in the method under test, i.e., its actual output is inconsistent with the expected output, the defect has been found.
The invention is realized by the following technical scheme:
a unit test case sequencing method based on test prediction comprises the following steps:
step 1: for the method under test, an input to the method is generated that is on the order of at least millions.
Specifically, according to the type and the rule of input parameters of the tested method, at least one million inputs meeting the requirements of the tested method are randomly generated.
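For illustration only, step 1 might be sketched in Python as follows; the parameter-spec format and the function name are our own assumptions, not part of the patent:

```python
import random
import string

def generate_inputs(param_specs, n=1_000_000, seed=42):
    """Randomly generate n inputs matching the parameter specs of the
    method under test. Each spec is ('int', lo, hi), ('float', lo, hi),
    or ('str', max_len)."""
    rng = random.Random(seed)
    inputs = []
    for _ in range(n):
        args = []
        for spec in param_specs:
            kind = spec[0]
            if kind == 'int':
                args.append(rng.randint(spec[1], spec[2]))
            elif kind == 'float':
                args.append(rng.uniform(spec[1], spec[2]))
            else:  # 'str': random letters up to max_len
                length = rng.randint(0, spec[1])
                args.append(''.join(rng.choice(string.ascii_letters)
                                    for _ in range(length)))
        inputs.append(tuple(args))
    return inputs
```

In practice n would be at least one million, as the patent specifies; a small n is shown in tests only for speed.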
Step 2: call the method under test with the inputs generated in step 1 to obtain the corresponding outputs.
Specifically, feed each input generated in step 1 into the method under test and run it to obtain the corresponding output. The inputs generated in step 1 and the outputs obtained in step 2 together constitute the test cases of the method under test, i.e. <method input, method output> pairs.
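A minimal sketch of how steps 1 and 2 combine into <method input, method output> test cases; the stand-in method `abs_diff` is hypothetical, not from the patent:

```python
def build_test_cases(method, inputs):
    """Run the method under test on every generated input and pair each
    input with the output it produces: <method input, method output>."""
    return [(args, method(*args)) for args in inputs]

# Stand-in method under test (hypothetical example).
def abs_diff(a, b):
    return abs(a - b)

cases = build_test_cases(abs_diff, [(3, 5), (7, 2)])
# cases == [((3, 5), 2), ((7, 2), 5)]
```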
Step 3: initialize a neural network based on the signature of the method under test.
Specifically, initialize the numbers of neurons in the input and output layers of the neural network according to the input and output formats of the method under test.
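One possible way to derive the network's layer sizes from the method's signature, assuming one neuron per encoded input/output value. The hidden-layer sizes and initialization scheme are our assumptions; the patent fixes only the input and output layers:

```python
import numpy as np

def init_network(n_inputs, n_outputs, hidden=(64, 64), seed=0):
    """Initialize weight matrices and biases for a fully connected
    network whose input/output layer sizes are derived from the
    signature of the method under test."""
    rng = np.random.default_rng(seed)
    sizes = (n_inputs, *hidden, n_outputs)
    layers = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        # He initialization for the weights, zeros for the biases
        w = rng.normal(0.0, (2.0 / fan_in) ** 0.5, size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        layers.append((w, b))
    return layers
```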
Step 4: train the neural network.
Specifically, train the initialized neural network with the test cases generated in step 2. If training does not yield a stable and accurate network, the method fails and terminates; otherwise, proceed to the next step.
Step 5: generate new test cases as in steps 1 and 2, feed the inputs of the generated test cases into the trained neural network, and compare the output recorded in each test case (the actual output) with the output of the neural network (the expected output). The loss function of the neural network comes in two variants: one for test cases whose inputs and outputs are all numeric, and one for test cases whose inputs and outputs are non-numeric (characters and character strings). The two loss functions are defined as follows:
Step 5.1: for test cases whose inputs and outputs are numeric, the mean square error (MSE) is used as the loss function.
Step 5.2: for test cases whose inputs and outputs are non-numeric, the following loss function is defined:
$$\mathrm{loss} = \frac{1}{n}\sum_{i=1}^{n} d\!\left(act_i, \widehat{act}_i\right) \tag{1}$$
where loss is the loss function of the model, measuring the difference between the output of the model and the true value; $n$ is the size of the training data; $act_i$ is the actual output of the ith sample of the method under test, and $\widehat{act}_i$ is the expected output of the ith sample.
$$d\!\left(act_i, \widehat{act}_i\right) = \frac{1}{m}\sum_{j=1}^{m} dist\!\left(opt_{i,j}, \widehat{opt}_{i,j}\right) \tag{2}$$
where $d(act_i, \widehat{act}_i)$ represents the distance between the actual output $act_i$ and the expected output $\widehat{act}_i$ of the ith sample of the method under test; $m$ is the number of output parameters of the method; $opt_{i,j}$ represents the actual value of the jth output parameter of the ith sample, and $\widehat{opt}_{i,j}$ is its expected value.
$$dist\!\left(opt_{i,j}, \widehat{opt}_{i,j}\right) = \frac{editDist\!\left(opt_{i,j}, \widehat{opt}_{i,j}\right)}{\left|opt_{i,j}\right| + \left|\widehat{opt}_{i,j}\right|} \tag{3}$$
i.e., the ratio of the edit distance between the actual and expected values of the jth output parameter of the ith sample to the sum of their lengths, where $editDist(\cdot,\cdot)$ is the edit distance between two strings and $|str|$ is the length of the string $str$.
$$act_i = \left\langle opt_{i,1}, \ldots, opt_{i,m} \right\rangle \tag{4}$$
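A plain-Python sketch of the edit-distance loss defined above; the function names are ours, not from the patent:

```python
def edit_distance(a, b):
    """Levenshtein edit distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def dist(actual_param, expected_param):
    """Edit distance normalized by the sum of the string lengths."""
    total = len(actual_param) + len(expected_param)
    return edit_distance(actual_param, expected_param) / total if total else 0.0

def sample_distance(act, exp):
    """Mean normalized distance over the m output parameters of one sample."""
    return sum(dist(a, e) for a, e in zip(act, exp)) / len(act)

def loss(actuals, expecteds):
    """Mean sample distance over the n training samples."""
    return sum(sample_distance(a, e) for a, e in zip(actuals, expecteds)) / len(actuals)
```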
Since neural networks require inputs and outputs to be numeric vectors, both the input and the output of each test case are converted into a vector of numbers. Each character of non-numeric data in a test case is represented by its corresponding ASCII code, and the codes are concatenated into a numeric vector. Finally, the values are normalized to the interval [0, 1].
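The ASCII encoding and normalization described above can be sketched as follows; the handling of numeric values is our simplification, since the patent does not fix how numeric ranges are normalized:

```python
def encode(value):
    """Encode one test-case value as a numeric vector: each character of a
    non-numeric value is mapped to its ASCII code, then scaled by 127
    (the largest ASCII code) into [0, 1]."""
    if isinstance(value, (int, float)):
        return [float(value)]  # numeric values pass through (range normalization omitted)
    return [ord(ch) / 127.0 for ch in str(value)]

def encode_case(values):
    """Concatenate the encodings of all values of a test case."""
    vec = []
    for v in values:
        vec.extend(encode(v))
    return vec
```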
Compared with MSE, this loss function has the following advantages:
First, it allows a uniform evaluation across different networks. MSE reflects the absolute distance between the actual and expected outputs, so its value is strongly affected by the output range. Since the outputs of different networks differ significantly from each other, the proposed loss function replaces the absolute error with a relative error. Second, the proposed loss function uses the edit distance to measure the distance between two strings, which is more accurate and interpretable than embedding-based vector distances.
If the neural network does not converge, it cannot simulate the behavior of the method under test, and in that case the method cannot recommend any test cases for it. If the network converges, its output can be used as the test oracle for unit testing of the method.
Step 6: rank the test cases in descending order of the distance between their actual and expected outputs, and show the top k most suspicious test cases to developers. If any suspicious test case reveals a defect in the method under test, the developer needs to repair the defect.
Specifically, the method computes a suspiciousness score for every automatically generated test case: the score of a test case equals the loss of the network on that test case. The test cases are sorted by score in descending order, and the top k most suspicious ones are shown to developers for inspection. If any of the manually verified test cases reveals a defect in the method under test, the developer should repair the defect and rerun the unit tests on the repaired version of the method.
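The ranking in step 6 can be sketched as follows; `predict` stands in for the trained network and `distance` for the loss-based score, both placeholders here:

```python
def rank_suspicious(test_cases, predict, distance, k=10):
    """Score every test case by the distance between its actual output and
    the network's expected output, then return the top-k most suspicious
    cases in descending order of score."""
    scored = []
    for inp, actual in test_cases:
        expected = predict(inp)  # trained network's prediction
        scored.append((distance(actual, expected), inp, actual, expected))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]
```

For example, with a toy `predict` that doubles its input and an absolute-difference score, the case whose recorded output disagrees most with the prediction is ranked first.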
Advantageous effects
Compared with the prior art, the method of the invention has the following beneficial effects:
1. It automatically generates training data from the method under test and automatically derives a unit test oracle from dynamic executions of the method;
2. It uses the ranked most suspicious test cases suggested by the trained neural network to expose defects in the method under test, significantly reducing the workload of manually verifying test cases;
3. It was evaluated on real defective functions from the Defects4J defect dataset, and the results show that the method is accurate and practical;
4. The proposed neural network accurately simulates 78% of the defective methods in the Defects4J dataset;
5. For 12 of the 14 defective methods tested on Defects4J, the precision of the neural network's ranking is 100%, i.e. every suggested test case reveals the method's defect.
Drawings
Fig. 1 is a schematic diagram of the working principle of the present invention.
Detailed Description
The invention is further illustrated and described in detail below with reference to the figures and examples.
Examples
As shown in FIG. 1, a unit test case sequencing method based on test prediction.
This example details the procedure and effects of the invention when applied to 18 methods from the Defects4J defect library.
Table 1. Hardware environment configuration
Hardware environment | Processor model | Memory | Operating system
Test environment | 3.2 GHz Core i7-8700 | 16 GB | 64-bit Windows 10
Step 1: select suitable methods from the Defects4J defect library for experimental evaluation.
Notably, not all defective methods in the library are suitable for evaluation. Methods of the following kinds were excluded:
1. Methods that fail by throwing exceptions (runtime errors). For such methods, failing and passing test cases can be distinguished simply by catching the exception, so this invention is not needed to distinguish them. These methods account for 38.1% of all methods in the defect library.
2. Methods whose errors are not caused by an incorrect implementation. Errors can have various causes, such as an incorrect implementation, an incorrect specification (requirements), or a misinterpretation of the specification. The method of the invention is designed for developers performing unit testing and aims to identify errors caused by incorrect implementations; methods whose errors have other causes were therefore excluded. For these methods, more than 40% of the test cases reveal the defect, and such a high rate of failing test cases indicates that errors not caused by an incorrect implementation should be easy to discover. A reasonable explanation for such errors is that the developers misunderstood the requirements (or the requirements themselves were wrong). These methods account for 26.4% of all methods in the defect library;
3. Methods that take complex external files as input or output. For example, in the project Compress, the getNextZipEntry() method requires a zip file as input. These methods were excluded because the current implementation cannot automatically generate such complex inputs. They account for 32.5% of all methods in the defect library.
After excluding these methods, 18 defective methods from 8 projects were finally selected for evaluation. Table 2 gives an overview of the selected methods. The inputs and outputs of the 18 selected methods were trained and tested in the hardware environment shown in Table 1.
TABLE 2 selected test methods
(Table 2 was rendered as an image in the original and is not reproduced here.)
Step 2: for each method selected in step 1, specify its inputs and outputs and their corresponding ranges, and generate test cases (inputs) using the method of the invention. In total, 1,100,000 test cases were generated for each method: one million were randomly selected as training samples, and the remaining one hundred thousand were used as test samples.
Step 3: the training data are fed into the proposed model for training. If the loss value of the trained network exceeds β = 0.02, the method fails due to "inaccurate modeling" and the evaluation of that method is terminated. If the method under test can be accurately simulated, the quality of the simulation is verified on the test data. A test case fails if and only if the current (buggy) version of the method and its fixed version (fixed versions of all buggy methods in Defects4J are publicly available) produce different outputs for the same input specified by the test case. Based on this criterion, the performance (precision and recall) of the proposed method was computed, as shown in Table 3:
Table 3. Simulation effect on the methods under test
(Table 3 was rendered as an image in the original and is not reproduced here.)
It can be seen that the proposed neural network can accurately simulate 78% (14/18) of the defective methods, validating the basic assumption of the method.
The model was then applied to the 14 defective methods whose corresponding neural networks converged.
The evaluation results are given in Table 4, which shows how many of the test cases ranked in the top k (k = 1, 10, and 100), sorted in descending order of the loss function, reveal defects.
Table 4. Performance evaluation of the method
(Table 4 was rendered as an image in the original and is not reproduced here.)
The results show that for 12 of the 14 defective methods (M1-M12), the precision@k of the method is 100%, meaning that every suggested suspicious test case reveals a defect. For the first 11 methods (M1-M11), all of the top 100 suggested test cases reveal defects. Such high precision means that a developer can find the error in the method under test by inspecting only a single test case suggested by the method. The method thus reliably finds test cases capable of revealing errors, which may greatly facilitate unit testing.
While the foregoing is directed to the preferred embodiment of the present invention, the invention is not limited to the embodiment and the drawings disclosed herein. Equivalents and modifications that do not depart from the spirit of the disclosure are considered to be within the scope of the invention.

Claims (1)

1. A unit test case sequencing method based on test prediction is characterized by comprising the following steps:
step 1: randomly generating not less than one million inputs meeting the requirements of the tested method according to the type and the rule of input parameters of the tested method;
step 2: inputting the generated result of the step 1 into a tested method, and operating the method to obtain the output corresponding to each input; the input generated in step 1 and the output obtained in step 2 constitute a test case of the method under test, namely < method input, method output >;
step 3: initializing the numbers of neurons in the input and output layers of a neural network according to the input and output formats of the method under test;
step 4: training the initialized neural network with the test cases generated in step 2;
if training does not yield a stable and accurate network, i.e. if the loss value of the trained network is greater than β = 0.02, the method fails and terminates; otherwise, the next step is taken;
step 5: generating new test cases according to steps 1 and 2, feeding the inputs of the generated test cases into the trained neural network, and comparing the output recorded in each test case with the output of the neural network; the loss function of the neural network has two variants, one for test cases whose inputs and outputs are numeric, and one for test cases whose inputs and outputs are non-numeric;
expressing each character of non-numerical data in the test case by corresponding ASCII codes, connecting the digital ASCII codes to form a digital vector, and finally, normalizing the digital value to a [0,1] range interval;
for data of which the input and the output of the test case are numerical values, the mean square error MSE is used as a loss function;
for data where both the input and output of the test case are non-numeric, the following penalty functions are defined:
$$\mathrm{loss} = \frac{1}{n}\sum_{i=1}^{n} d\!\left(act_i, \widehat{act}_i\right) \tag{1}$$
wherein loss is the loss function of the model, measuring the difference between the output of the model and the true value; $n$ is the size of the training data; $act_i$ is the actual output of the ith sample of the method under test, and $\widehat{act}_i$ is the expected output of the ith sample;
$$d\!\left(act_i, \widehat{act}_i\right) = \frac{1}{m}\sum_{j=1}^{m} dist\!\left(opt_{i,j}, \widehat{opt}_{i,j}\right) \tag{2}$$
wherein $d(act_i, \widehat{act}_i)$ represents the distance between the actual output and the expected output of the ith sample of the method under test; $m$ is the number of output parameters of the method; $opt_{i,j}$ represents the actual value of the jth output parameter of the ith sample, and $\widehat{opt}_{i,j}$ is its expected value;
$$dist\!\left(opt_{i,j}, \widehat{opt}_{i,j}\right) = \frac{editDist\!\left(opt_{i,j}, \widehat{opt}_{i,j}\right)}{\left|opt_{i,j}\right| + \left|\widehat{opt}_{i,j}\right|} \tag{3}$$
representing the ratio of the edit distance between the actual and expected values of the jth output parameter of the ith sample to the sum of their lengths, wherein $editDist(\cdot,\cdot)$ is the edit distance between two strings and $|str|$ is the length of the string $str$;
$$act_i = \left\langle opt_{i,1}, \ldots, opt_{i,m} \right\rangle \tag{4}$$
step 6: ranking the test cases according to the distance between the actual output and the expected output of the test cases in a descending order, and displaying the top k most suspicious test cases to developers; if there are suspicious test cases that reveal a defect in the method under test, the developer needs to repair the defect.
CN202110711925.7A 2021-06-25 2021-06-25 Unit test case sequencing method based on test prediction Active CN113434408B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110711925.7A CN113434408B (en) 2021-06-25 2021-06-25 Unit test case sequencing method based on test prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110711925.7A CN113434408B (en) 2021-06-25 2021-06-25 Unit test case sequencing method based on test prediction

Publications (2)

Publication Number Publication Date
CN113434408A CN113434408A (en) 2021-09-24
CN113434408B true CN113434408B (en) 2022-04-08

Family

ID=77754580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110711925.7A Active CN113434408B (en) 2021-06-25 2021-06-25 Unit test case sequencing method based on test prediction

Country Status (1)

Country Link
CN (1) CN113434408B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105843743A (en) * 2016-04-11 2016-08-10 南京邮电大学 Method for verifying correctness of actual output result of special automatic test case
CN111026664A (en) * 2019-12-09 2020-04-17 遵义职业技术学院 Program detection method and detection system based on ANN and application

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10223247B2 (en) * 2016-07-05 2019-03-05 Red Hat, Inc. Generating pseudorandom test items for software testing of an application under test (AUT)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105843743A (en) * 2016-04-11 2016-08-10 南京邮电大学 Method for verifying correctness of actual output result of special automatic test case
CN111026664A (en) * 2019-12-09 2020-04-17 遵义职业技术学院 Program detection method and detection system based on ANN and application

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A deep-learning-based approach to god class detection; Bu Yifan et al.; Computer Software and Computer Applications; 31 May 2019; pp. 1359-1374 *

Also Published As

Publication number Publication date
CN113434408A (en) 2021-09-24

Similar Documents

Publication Publication Date Title
Fioravanti et al. A study on fault-proneness detection of object-oriented systems
US20220300820A1 (en) Ann-based program testing method, testing system and application
CN109408385B (en) A kind of disfigurement discovery method based on mischief rule and classifying feedback
CN114936158A (en) Software defect positioning method based on graph convolution neural network
Kadry A new proposed technique to improve software regression testing cost
CN111782532B (en) Software fault positioning method and system based on network abnormal node analysis
CN116932384A (en) Software defect prediction method based on feature fusion and feature selection
US20070180411A1 (en) Method and apparatus for comparing semiconductor-related technical systems characterized by statistical data
CN112783513B (en) Code risk checking method, device and equipment
Munson et al. Toward a quantifiable definition of software faults
US10152407B1 (en) Optimization of analysis of automated test results
CN113434408B (en) Unit test case sequencing method based on test prediction
CN111880957A (en) Program error positioning method based on random forest model
Chu et al. FAST: a framework for automating statistics-based testing
CN115827353A (en) Fault diagnosis method and device
Zheng et al. A method of optimizing multi-locators based on machine learning
CN113157556A (en) Industry building software defect management method based on selected principal component identification
CN112817863A (en) AI auxiliary automatic test method and system based on AI deep learning
CN111858377A (en) Quality evaluation method and device for test script, electronic device and storage medium
TWI735511B (en) Code submission method and equipment
Mao et al. Extracting the representative failure executions via clustering analysis based on Markov profile model
CN111367789A (en) Static report merging analysis techniques
CN113946506A (en) Software embedded point quality evaluation method and device, computing equipment and storage medium
CN112288079B (en) Graphic neural network model training method, software defect detection method and system
CN114528215A (en) Interactive page testing method and element template generating method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant