Summary of the invention
The problem addressed by the present invention is the following: for coverage-based fault localization (CBFL), identify and handle the coincidentally correct test cases in the original test suite, thereby improving the effectiveness of automated coverage-based fault localization.
The technical solution of the present invention is a test suite optimization method for coverage-based fault localization. For a given test suite T, coincidentally correct test cases are identified by clustering. A coincidentally correct test case is a test case that executes the faulty statement but whose execution result is nevertheless "passed". The identified coincidentally correct test cases are then handled, and the resulting optimized test suite is used for coverage-based fault localization. The method comprises the following steps:
1) Run the test suite T on the target program under test and, while the test cases run, collect the execution profile of each test case. After T has been run, determine from the output whether the execution result of each test case is "passed" or "failed";
2) Cluster the collected execution profiles. The present invention places no restriction on the clustering method;
3) Identify the coincidentally correct test cases: after the execution profiles have been clustered, the test cases are partitioned into several clusters. Every "passed" test case that falls in the same cluster as a "failed" test case is marked as coincidentally correct and added to the set Ticc, which is the set of identified coincidentally correct test cases;
4) Handle the coincidentally correct test cases: the identification result in Ticc is handled using either one of the following two strategies:
41) Filtering strategy: delete the test cases in Ticc from the test suite T;
42) Relabeling strategy: change the execution-result label of the test cases in Ticc from "passed" to "failed";
After the test cases marked as coincidentally correct have been handled as described above, the optimized test suite T' is obtained.
Because the method of identifying coincidentally correct test cases relies on the coverage information of the test cases, it is applicable only to CBFL; however, it is applicable to every fault localization method that belongs to CBFL.
Cluster analysis has already been applied in the field of test case selection, and the present invention introduces the same technique to the problem of coincidental correctness. Although both fall within the scope of test suite optimization, the purpose of the former is to reduce the size of the test suite, which helps the tester find faults sooner and shortens regression testing time. The purpose of the present invention is instead to improve the ability of the test suite to support fault localization, reducing the time the programmer spends searching for the fault and hence the cost of debugging. The two differ in their clustering goals, and the processing after clustering is also different.
Because the test cases in one cluster have similar execution paths, it is reasonable to infer that a passed test case in a cluster may, like the failed test cases in that cluster, have executed the faulty statement, yet did not fail because it did not satisfy all 3 conditions set out in the PIE model (execution, infection, and propagation). Such passed test cases are therefore very likely to be coincidentally correct test cases.
The beneficial effects of the invention are as follows: the quality of the test suite is improved while the size of the original test suite is reduced, and the interference of coincidentally correct test cases with coverage-based fault localization is lessened. This improves the efficiency and accuracy of automated fault localization and saves the programmer time in locating faults.
Embodiment
The present invention applies cluster analysis to the problem of selecting coincidentally correct test cases. The selected test cases are either deleted from the original test suite or have their execution-result labels changed. The original test suite is optimized by one of these 2 methods, and an existing automated fault localization technique is then applied to the optimized test suite, so that the efficiency and accuracy of fault localization are improved.
A given test suite T consists of 2 parts, Tp and Tf. The goal of the present invention is to identify Tcc within Tp. The result of the identification is Ticc, and each test case in Ticc has a high probability of belonging to Tcc, where:
T: the test suite used for a given program
Tp: the set of passed test cases
Tf: the set of failed test cases
Tcc: the set of coincidentally correct test cases
Ticc: the set of test cases identified as coincidentally correct
In practice, the present invention is carried out in 5 steps, as shown in Fig. 1:
1) Run the test suite on the target program under test and, while the test cases run, collect the execution profile of each test case. After the test suite has been run, determine from the output whether the execution result of each test case is "passed" or "failed";
2) Cluster the execution profiles. The clustering method is unrestricted, provided that test cases with similar execution profiles are grouped into the same cluster;
3) Identify the coincidentally correct test cases;
4) Handle the coincidentally correct test cases;
5) Apply an existing automated coverage-based fault localization technique to the optimized test suite.
Existing methods and applications for steps 1), 2), and 5) are available in automated fault localization and cluster analysis, so they are not described again here. The following explanation concentrates on steps 3) and 4).
After the execution profiles have been clustered, the test cases are partitioned into several clusters. Each cluster may contain both passed and failed test cases. The present invention marks every passed test case that falls in the same cluster as a failed test case as a coincidentally correct test case and adds it to the set Ticc. The elements of Ticc are very likely to be coincidentally correct test cases, for the following reasons:
(1) A test case that executes the faulty statement does not necessarily fail, but the converse does not hold: a failed test case must have executed the faulty statement;
(2) Test cases with similar execution profiles are placed in the same cluster, which means that the selected passed test cases have execution paths similar to those of the failed test cases.
In summary, the selected test cases have very probably executed the faulty statement and still produced correct output, which matches the definition of coincidental correctness.
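By way of illustration only, the identification rule described above can be sketched in Python as follows. The input format (one dictionary mapping test identifiers to cluster identifiers, another mapping them to "passed" or "failed") and the name identify_icc are assumptions made for this sketch and are not part of the described implementation.

```python
from collections import defaultdict


def identify_icc(cluster_of, outcome):
    """Return Ticc: passed tests that share a cluster with a failed test.

    cluster_of: dict test id -> cluster id   (hypothetical input format)
    outcome:    dict test id -> "passed" or "failed"
    """
    # Group the test identifiers by the cluster they were assigned to.
    members = defaultdict(list)
    for test_id, cluster_id in cluster_of.items():
        members[cluster_id].append(test_id)

    ticc = set()
    for tests in members.values():
        # Only clusters that contain at least one failed test contribute.
        if any(outcome[t] == "failed" for t in tests):
            ticc.update(t for t in tests if outcome[t] == "passed")
    return ticc


# Hypothetical example: cluster 0 contains a failed test, cluster 1 does not.
cluster_of = {"t1": 0, "t2": 0, "t3": 0, "t4": 1, "t5": 1}
outcome = {"t1": "failed", "t2": "passed", "t3": "passed",
           "t4": "passed", "t5": "passed"}
print(sorted(identify_icc(cluster_of, outcome)))  # ['t2', 't3']
```

Passed test cases in clusters without any failed test case (t4 and t5 here) are left out of Ticc.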
To handle Ticc, the present invention proposes 2 different strategies:
● Filtering strategy. The test cases in Ticc can be deleted from the original test suite, which reduces the interference of these test cases with automated fault localization.
● Relabeling strategy. The result label of the test cases in Ticc can be changed from "passed" to "failed". This raises the suspiciousness of the faulty statement and may move that statement up in the suspiciousness ranking.
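The two strategies can be sketched as follows. This is a minimal illustration that assumes the test results are held in a dictionary from test identifier to label; the function names apply_filtering and apply_relabeling are hypothetical.

```python
def apply_filtering(outcome, ticc):
    """Filtering strategy: drop every test in Ticc from the suite."""
    return {t: label for t, label in outcome.items() if t not in ticc}


def apply_relabeling(outcome, ticc):
    """Relabeling strategy: keep every test, but mark tests in Ticc as failed."""
    return {t: ("failed" if t in ticc else label) for t, label in outcome.items()}


# Hypothetical suite with one failed test and one identified test (t2).
outcome = {"t1": "failed", "t2": "passed", "t3": "passed"}
ticc = {"t2"}

filtered = apply_filtering(outcome, ticc)
relabeled = apply_relabeling(outcome, ticc)
print(filtered)   # {'t1': 'failed', 't3': 'passed'}
print(relabeled)  # {'t1': 'failed', 't2': 'failed', 't3': 'passed'}
```

Both functions return a new dictionary, so the original test suite T stays available for comparison with the optimized suite T'.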
The implementation of the present invention is illustrated below by a specific embodiment.
The Siemens suite is a set of benchmark programs widely used in academic research. Many studies that evaluate software testing and fault-finding methods use it as the target programs for their experiments. Table 1 lists the details of the Siemens programs.
Table 1: Details of the Siemens programs
Taking the Siemens programs as an example, the present invention is implemented as follows:
1. Execute the test cases and collect the execution profiles.
gcov (GNU call-coverage profiler) is used as the tool for collecting execution profiles. To this end, the necessary compile options must be added to the automated script that executes the test cases. gcov provides statement-level coverage information. After a test case ti ∈ T is executed, a .gcov file is produced, which records the number of times each line of code was executed. For a test case ti, its statement coverage profile is pi = <e1, e2, e3, ..., en>, where n is the number of lines of code of the given program and, for each line j, ej = 1 if that line was executed and ej = 0 otherwise.
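As an illustration, a binary coverage profile can be derived from the text of a .gcov file roughly as follows. The sketch assumes the usual "count:line number:source" record layout of gcov output, in which "-" marks a line without executable code and "#####" marks an executable line that was never run. Omitting non-executable lines from the profile is a choice made for this sketch, and the function name coverage_profile and the sample file content are hypothetical.

```python
def coverage_profile(gcov_text):
    """Map each executable source line number to 1 (executed) or 0 (not executed)."""
    profile = {}
    for raw in gcov_text.splitlines():
        parts = raw.split(":", 2)
        if len(parts) < 3:
            continue  # not a "count:line:source" record
        count, line_no = parts[0].strip(), parts[1].strip()
        if not line_no.isdigit() or int(line_no) == 0:
            continue  # header records such as "-: 0:Source:..."
        if count == "-":
            continue  # no executable code on this line (assumption: skip it)
        digits = count.rstrip("*")  # some gcov versions append "*" to a count
        profile[int(line_no)] = 1 if digits.isdigit() and int(digits) > 0 else 0
    return profile


# Hypothetical .gcov content for one test case.
sample = """\
        -:    0:Source:demo.c
        -:    1:#include <stdio.h>
        1:    2:int f(int x) {
        1:    3:    if (x > 0)
        1:    4:        return 1;
    #####:    5:    return 0;
        -:    6:}
"""
profile = coverage_profile(sample)
vector = [profile[line] for line in sorted(profile)]
print(vector)  # [1, 1, 1, 0]
```

The vector is ordered by source line number, so the profiles of different test cases of the same program are directly comparable position by position.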
2. Cluster analysis.
The execution profiles collected in the previous step are the input to the cluster analysis. The profile of each test case is one object to be clustered, so the total number of objects equals the number of test cases in the test suite T. Weka is used as the clustering tool, with the n-dimensional Euclidean distance as the distance function. Given the profiles of 2 test cases, t: <e1, e2, ..., en> and t': <e1', e2', ..., en'>, the distance between these 2 test cases is:
d(t, t') = sqrt((e1 - e1')^2 + (e2 - e2')^2 + ... + (en - en')^2)
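For illustration, the n-dimensional Euclidean distance between two coverage profiles can be computed as in the following sketch. The two example profiles are hypothetical.

```python
import math


def euclidean_distance(t, t_prime):
    """n-dimensional Euclidean distance between two coverage profiles."""
    if len(t) != len(t_prime):
        raise ValueError("profiles must have the same length n")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, t_prime)))


t = [1, 1, 0, 1]
t_prime = [1, 0, 0, 0]
distance = euclidean_distance(t, t_prime)
print(distance)  # the profiles differ in 2 positions, so this is sqrt(2)
```

Because every element is 0 or 1, the squared distance equals the number of statements covered by exactly one of the two test cases.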
Simple K-means is used as the clustering algorithm. It takes the number of clusters as a parameter. In the present invention, this parameter is set according to the size of the test suite T. Let CN denote the number of clusters; then CN = |T| * p, where |T| is the size of the test suite T and 0 < p < 1. Because this embodiment uses the K-means clustering method, which takes the number of clusters as a parameter, the program code must be long enough that the execution profiles of the test cases differ sufficiently from one another; otherwise the configured number of clusters cannot be reached.
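The following sketch shows how CN might be derived and used. It is a plain Lloyd-style k-means written for illustration, standing in for Weka's Simple K-means rather than reproducing it. Rounding CN to the nearest integer with a minimum of 1, and seeding the centroids with the first CN distinct profiles, are assumptions of this sketch.

```python
import math


def cluster_count(suite_size, p):
    """CN = |T| * p, rounded to an integer (rounding rule is an assumption)."""
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    return max(1, round(suite_size * p))


def k_means(profiles, k, max_iter=100):
    """Return one cluster label per profile using Lloyd-style iterations."""
    # Seed with the first k distinct profiles; too few distinct profiles
    # means the requested number of clusters cannot be reached.
    distinct = list(dict.fromkeys(tuple(prof) for prof in profiles))
    if len(distinct) < k:
        raise ValueError("fewer distinct profiles than requested clusters")
    centroids = [list(c) for c in distinct[:k]]
    labels = [-1] * len(profiles)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(prof, centroids[c]))
            for prof in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [prof for prof, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical suite of 5 profiles with p = 40%, giving CN = 2.
profiles = [[1, 1, 0, 0, 0], [0, 0, 0, 1, 1], [1, 1, 1, 0, 0],
            [0, 0, 1, 1, 1], [1, 1, 0, 0, 0]]
cn = cluster_count(len(profiles), 0.4)
labels = k_means(profiles, cn)
print(cn, labels)  # 2 [0, 1, 0, 1, 0]
```

The ValueError raised when there are fewer distinct profiles than clusters corresponds to the condition noted above: if the execution profiles are not diverse enough, the configured number of clusters cannot be reached.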
3. Identification and handling of coincidentally correct test cases.
Identify the coincidentally correct test cases: after the execution profiles have been clustered, the test cases are partitioned into several clusters. Every "passed" test case that falls in the same cluster as a "failed" test case is marked as coincidentally correct and added to the set Ticc, which is the set of identified coincidentally correct test cases;
Handle the coincidentally correct test cases: the identification result in Ticc is handled using either one of the following two strategies:
1) Filtering strategy: delete the test cases in Ticc from the test suite T;
2) Relabeling strategy: change the execution-result label of the test cases in Ticc from "passed" to "failed";
After the test cases marked as coincidentally correct have been handled as described above, the optimized test suite T' is obtained.
The identification and handling described above can be implemented by a program written in Java. The program takes as input the clustering result of step 2 and the execution result of each test case, and selects and handles the coincidentally correct test cases according to the identification and handling method described above.
4. Fault localization.
Tarantula is used as the coverage-based fault localization method. It was the first system to apply a spectrum-based ranking method to the field of software diagnosis. Its input is the reduced and optimized test suite, and its output is a list of suspicious statements ranked in descending order of suspiciousness. The programmer can inspect the corresponding statements one by one according to this list until the faulty statement is found.
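The sketch below illustrates why handling coincidentally correct test cases can help. It uses the commonly cited Tarantula suspiciousness formula, in which the suspiciousness of a statement is the ratio of failed tests covering it divided by the sum of that ratio and the ratio of passed tests covering it. The formula is not stated in this document, and the coverage data, test names, and the choice of statement index 1 as the faulty statement are hypothetical.

```python
def tarantula(coverage, outcome):
    """Return one suspiciousness score per statement.

    coverage: dict test id -> list of 0/1 flags, one per statement
    outcome:  dict test id -> "passed" or "failed"
    """
    total_passed = sum(1 for r in outcome.values() if r == "passed")
    total_failed = sum(1 for r in outcome.values() if r == "failed")
    n = len(next(iter(coverage.values())))
    scores = []
    for s in range(n):
        passed = sum(coverage[t][s] for t in outcome if outcome[t] == "passed")
        failed = sum(coverage[t][s] for t in outcome if outcome[t] == "failed")
        pass_ratio = passed / total_passed if total_passed else 0.0
        fail_ratio = failed / total_failed if total_failed else 0.0
        denom = pass_ratio + fail_ratio
        scores.append(fail_ratio / denom if denom else 0.0)
    return scores


# Statement 1 is the (hypothetical) faulty statement; t2 executes it but passes.
coverage = {"t1": [1, 1, 0], "t2": [1, 1, 0], "t3": [1, 0, 1], "t4": [1, 0, 1]}
original = {"t1": "failed", "t2": "passed", "t3": "passed", "t4": "passed"}
filtered = {t: r for t, r in original.items() if t != "t2"}  # filtering strategy
relabeled = dict(original, t2="failed")                      # relabeling strategy

before = tarantula(coverage, original)
after_filter = tarantula(coverage, filtered)
after_relabel = tarantula(coverage, relabeled)
print(before[1], after_filter[1], after_relabel[1])
```

In this small example the suspiciousness of the faulty statement rises from about 0.75 to 1.0 under either strategy, while the statement executed by every test stays at 0.5.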
Figs. 2-4 show the experimental results of the present invention on the Siemens programs.
Fig. 2 shows the accuracy with which the present invention identifies coincidentally correct test cases. p denotes the ratio used to set the number of clusters, and Fig. 2 (a)-(e) correspond to p = 1%, 2%, 4%, 6%, and 8%, respectively. "FN" and "FP" denote "false negative" and "false positive", respectively. "False negative" is computed as |Tcc - Ticc| / |Tcc|. This metric assesses the ability of the present invention to identify coincidentally correct test cases, that is, whether it identifies as many of them as possible. The smaller this metric, the better. "False positive" is computed as |(Tp - Tcc) ∩ Ticc| / |Tp - Tcc|. This metric assesses the correctness of the identification, that is, whether the present invention mistakes as few other test cases as possible for coincidentally correct test cases. The horizontal axis shows the intervals in which "FN" and "FP" fall, and the vertical axis shows the proportion of program versions that fall into each interval.
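The two metrics follow directly from the set definitions and can be sketched as below. The example sets are hypothetical, and returning 0.0 when a denominator is empty is an assumption of this sketch, since that case is not defined in the text.

```python
def false_negative_rate(tcc, ticc):
    """|Tcc - Ticc| / |Tcc|: share of coincidentally correct tests that were missed."""
    return len(tcc - ticc) / len(tcc) if tcc else 0.0


def false_positive_rate(tp, tcc, ticc):
    """|(Tp - Tcc) ∩ Ticc| / |Tp - Tcc|: share of ordinary passed tests wrongly selected."""
    ordinary_passed = tp - tcc
    return len(ordinary_passed & ticc) / len(ordinary_passed) if ordinary_passed else 0.0


# Hypothetical sets: 8 passed tests, 4 of them coincidentally correct.
tp = {"t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8"}
tcc = {"t1", "t2", "t3", "t4"}
ticc = {"t1", "t2", "t3", "t5"}  # t4 is missed, t5 is wrongly selected

fn = false_negative_rate(tcc, ticc)
fp = false_positive_rate(tp, tcc, ticc)
print(fn, fp)  # 0.25 0.25
```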
Fig. 3 shows the improvement that the present invention brings to coverage-based fault localization when the filtering strategy is used. p denotes the ratio used to set the number of clusters, and Fig. 3 (a)-(e) correspond to p = 1%, 2%, 4%, 6%, and 8%, respectively.
Fig. 4 shows the improvement that the present invention brings to coverage-based fault localization when the relabeling strategy is used. p denotes the ratio used to set the number of clusters, and Fig. 4 (a)-(e) correspond to p = 1%, 2%, 4%, 6%, and 8%, respectively.
Fig. 3 and Fig. 4 use box plots to show the statistics of the T-score reduction for each target program. The T-score reduction is ΔTS = TS - TS', where TS and TS' denote the T-score before and after applying the present invention, respectively. The T-score is a metric for evaluating the effectiveness of a fault localization method. It represents the proportion of the total code that must be examined in order to find the fault. The smaller this metric, the better. The T-score is defined by the following formula:
TS = |Vexamined| / |V| * 100%
Here, |V| denotes the total number of executable statements in the target program, and |Vexamined| denotes the number of lines of code the programmer must examine in order to find the faulty statement. Therefore, the larger the T-score reduction, the greater the improvement in fault localization.
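As a worked illustration of the formula, the following sketch computes the T-score before and after optimization and the resulting reduction. The statement counts used here are hypothetical and are not experimental results.

```python
def t_score(examined, total):
    """TS = |Vexamined| / |V| * 100%, returned as a percentage."""
    if total <= 0:
        raise ValueError("the program must have at least one executable statement")
    return 100.0 * examined / total


# Hypothetical program with 200 executable statements.
ts_before = t_score(50, 200)  # 50 statements examined on the original suite
ts_after = t_score(25, 200)   # 25 statements examined on the optimized suite
reduction = ts_before - ts_after
print(ts_before, ts_after, reduction)  # 25.0 12.5 12.5
```

A positive reduction means fewer statements had to be examined after optimization; a negative value would mean the optimization made localization worse for that program version.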
The upper and lower edges of each box in Fig. 3 and Fig. 4 represent the third quartile and the first quartile, respectively. The horizontal line inside the box represents the median, and the black dot represents the mean. The points above and below the box represent the maximum and minimum, respectively.
For reference, Fig. 3 and Fig. 4 also show the effect of each strategy on CBFL under ideal conditions. The ideal case assumes that all coincidentally correct test cases, and only those, are selected, that is, "false negative" and "false positive" are both 0%.