Summary of the invention
The problem to be solved by the present invention is: for coverage-based fault localization (CBFL), to identify and process the coincidentally correct test cases in the original test suite, thereby improving the effectiveness of automated coverage-based fault localization.
The technical solution of the present invention is a test suite optimization method for coverage-based fault localization. For a given test suite T, coincidentally correct test cases are identified by clustering; a coincidentally correct test case is a test case that executes the faulty statement but whose execution result is nevertheless "passed". The identified coincidentally correct test cases are then processed, and the resulting optimized test suite is used for coverage-based fault localization. The method comprises the following steps:
1) run the test suite T on the target program under test, collecting the execution profile of each test case while it runs; after T has been run, judge from the output whether the execution result of each test case is "passed" or "failed";
2) cluster the collected execution profiles; the present invention places no restriction on the clustering method;
3) identify the coincidentally correct test cases: after the execution profiles are clustered, the test cases are partitioned into several clusters; each "passed" test case that falls in the same cluster as a "failed" test case is marked as coincidentally correct and added to the set Ticc, which is the set of identified coincidentally correct test cases;
4) process the coincidentally correct test cases: the identification result in the set Ticc is processed in either one of the following two ways:
41) filtering strategy: delete the test cases in Ticc from the test suite T;
42) relabeling strategy: change the execution-result label of each test case in Ticc from "passed" to "failed";
After the test cases marked as coincidentally correct have been processed as above, the optimized test suite T' is obtained.
Because the method of the present invention for identifying coincidentally correct test cases relies on the coverage information of the test cases, it is specific to CBFL; however, it is applicable to every fault localization method that belongs to CBFL.
Cluster analysis has already been applied in the field of test case selection, and the present invention introduces the technique to the coincidental correctness problem. Although both belong to test suite optimization, the purpose of the former is to reduce the size of the test suite, which helps testers find faults sooner and shortens regression testing time, whereas the purpose of the present invention is to improve the ability of the test suite to support fault localization and to reduce the time programmers spend searching for faults, thereby reducing debugging cost. The two therefore cluster for different goals, and the processing after clustering also differs.
Because the test cases in one cluster have similar execution paths, it is reasonable to infer that a passed test case in a cluster, like the failed test cases in the same cluster, may have executed the faulty statement, but did not fail because it does not satisfy all three conditions of the PIE model (execution, infection, and propagation). These passed test cases are therefore very likely coincidentally correct.
The beneficial effects of the present invention are as follows: the quality of the test suite is improved while the size of the original test suite is reduced, and the interference of coincidentally correct test cases with coverage-based fault localization is reduced, thereby improving the efficiency and accuracy of automated fault localization and saving programmers the time spent locating faults.
Embodiment
The present invention applies cluster analysis to the problem of selecting coincidentally correct test cases. The selected test cases are either deleted from the original test suite or have their execution-result labels changed. The original test suite is optimized by either of these two methods, and applying an existing automated fault localization technique to the optimized test suite improves the efficiency and accuracy of fault localization.
A given test suite T consists of two parts, Tp and Tf. The goal of the present invention is to identify Tcc within Tp; the result of the identification is Ticc, each test case of which is very likely to belong to Tcc, where:
T: the test suite used for a given program
Tp: the set of passed test cases
Tf: the set of failed test cases
Tcc: the set of coincidentally correct test cases
Ticc: the set of test cases identified as coincidentally correct
In practice, the present invention is carried out in five steps, as shown in Fig. 1:
1) run the test suite on the target program under test, collecting the execution profile of each test case while it runs; after the test suite has been run, judge from the output whether the execution result of each test case is "passed" or "failed";
2) cluster the execution profiles, so that test cases with similar execution profiles are grouped together; the clustering method is not restricted;
3) identify the coincidentally correct test cases;
4) process the coincidentally correct test cases;
5) apply an existing automated coverage-based fault localization technique to the optimized test suite.
Steps 1), 2), and 5) are covered by established methods and applications in automated fault localization and cluster analysis and are not described further here. Steps 3) and 4) are explained in detail below.
After the execution profiles are clustered, the test cases are partitioned into several clusters. Each cluster may contain both passed and failed test cases. The present invention marks each passed test case that falls in the same cluster as a failed test case as coincidentally correct and adds it to the set Ticc. The elements of Ticc are very likely coincidentally correct, for the following reasons:
(1) a test case that executes the faulty statement does not necessarily fail, whereas a failed test case must have executed the faulty statement;
(2) test cases with similar execution profiles are placed in the same cluster, which means that the selected passed test cases have execution paths similar to those of the failed test cases.
In summary, the selected test cases have probably executed the faulty statement yet still produced correct output, which matches the definition of coincidental correctness.
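The identification rule above can be sketched in a few lines. This is a minimal illustration only, not the implementation of the invention; the function name `identify_ticc`, the dictionary-based inputs, and the sample test ids are assumptions introduced for the example.

```python
from collections import defaultdict


def identify_ticc(cluster_of, outcome):
    """Return the passed tests that share a cluster with at least one failed test.

    cluster_of: dict mapping test id -> cluster label (hypothetical input format)
    outcome:    dict mapping test id -> "pass" or "fail"
    """
    members = defaultdict(list)
    for test_id, label in cluster_of.items():
        members[label].append(test_id)

    ticc = set()
    for tests in members.values():
        # Only clusters that contain a failed test contribute to Ticc.
        if any(outcome[t] == "fail" for t in tests):
            ticc.update(t for t in tests if outcome[t] == "pass")
    return ticc


# Hypothetical data: cluster 0 contains the failed test t1, cluster 1 does not.
cluster_of = {"t1": 0, "t2": 0, "t3": 1, "t4": 1, "t5": 0}
outcome = {"t1": "fail", "t2": "pass", "t3": "pass", "t4": "pass", "t5": "pass"}
ticc = identify_ticc(cluster_of, outcome)
```

In this example t2 and t5 are marked, because they were grouped with the failed test t1, while t3 and t4 are left untouched.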
To process Ticc, the present invention proposes two different strategies:
● Filtering strategy. The test cases in Ticc are deleted from the original test suite, thereby reducing their interference with automated fault localization.
● Relabeling strategy. The execution-result label of each test case in Ticc is changed from "passed" to "failed". The suspiciousness of the faulty statement is thereby raised, which may move the statement up in the suspiciousness ranking.
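The two strategies can be illustrated as follows. This is a sketch under assumed data structures (a dictionary from test id to result label); the function names `apply_filter` and `apply_relabel` are hypothetical and do not come from the original text.

```python
def apply_filter(outcome, ticc):
    """Filtering strategy: drop every test in Ticc from the suite."""
    return {t: r for t, r in outcome.items() if t not in ticc}


def apply_relabel(outcome, ticc):
    """Relabeling strategy: keep every test, but mark tests in Ticc as failed."""
    return {t: ("fail" if t in ticc else r) for t, r in outcome.items()}


# Hypothetical suite T and identified set Ticc.
outcome = {"t1": "fail", "t2": "pass", "t3": "pass"}
ticc = {"t2"}

filtered = apply_filter(outcome, ticc)
relabeled = apply_relabel(outcome, ticc)
```

Both functions return a new mapping and leave the original suite unchanged, so the two optimized suites T' can be compared side by side.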
The implementation of the present invention is described below by way of a specific embodiment.
The Siemens suite is a set of benchmark programs widely used in academic research. Many methods for evaluating software testing and fault detection use it as the target programs. Table 1 lists the details of the Siemens programs.
Table 1. Details of the Siemens programs
Taking the Siemens programs as an example, the present invention is implemented as follows:
1. Executing the test cases and collecting execution profiles.
gcov (GNU call-coverage profiler) is used as the tool for collecting execution profiles. To this end, compile options must be added to the automation script that executes the test cases. gcov provides statement-level coverage information. After a test case ti ∈ T is executed, a .gcov file is produced, which records the number of times each line of code was executed. For test case ti, the statement coverage profile is pi = <e1, e2, e3, ..., en>, where n is the number of lines of code in the given program, and ej = 1 if line j was executed and ej = 0 otherwise.
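As an illustration of how a .gcov file can be turned into such a binary profile, the following sketch parses the textual records, which have the form `count:line_number:source`, where `-` marks a line without executable code and `#####` marks a line that was never executed. The function name `coverage_vector` and the sample file content are assumptions for the example; lines without executable code are mapped to 0 here, which is a simplifying choice rather than something the text prescribes.

```python
def coverage_vector(gcov_text):
    """Build a 0/1 statement-coverage profile from the text of a .gcov file."""
    vector = []
    for line in gcov_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) < 3:
            continue  # not a "count:line_number:source" record
        count, lineno = parts[0].strip(), parts[1].strip()
        if not lineno.isdigit() or int(lineno) == 0:
            continue  # line number 0 is the preamble (Source:, Runs:, ...)
        digits = count.rstrip("*")  # some gcov versions append '*' to counts
        executed = digits.isdigit() and int(digits) > 0
        vector.append(1 if executed else 0)
    return vector


# Hypothetical .gcov content for a seven-line program.
sample = """\
        -:    0:Source:demo.c
        -:    1:#include <stdio.h>
        1:    2:int main(void) {
        1:    3:    int x = 0;
        1:    4:    if (x)
    #####:    5:        puts("never");
        1:    6:    return 0;
        -:    7:}
"""
profile = coverage_vector(sample)
```

Running one such conversion per test case yields the set of profiles that serves as input to the clustering step.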
2. Cluster analysis.
The execution profiles collected in the previous step are the input of the cluster analysis. The profile of each test case is one object to be clustered, so the total number of objects equals the number of test cases in the test suite T. Weka is used as the clustering tool, with the n-dimensional Euclidean distance as the distance function. Given the profiles of two test cases, t: <e1, e2, ..., en> and t': <e1', e2', ..., en'>, the distance between the two test cases is:
d(t, t') = sqrt((e1 - e1')^2 + (e2 - e2')^2 + ... + (en - en')^2)
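The n-dimensional Euclidean distance between two coverage profiles can be computed as in the following sketch. The function name `profile_distance` and the sample profiles are illustrative assumptions.

```python
import math


def profile_distance(p, q):
    """Euclidean distance between two equal-length coverage profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same number of lines")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical profiles that differ on exactly two lines.
t = [1, 1, 0, 1]
t_prime = [1, 0, 1, 1]
d = profile_distance(t, t_prime)
```

Because the profiles are binary, the squared distance equals the number of lines on which the two test cases differ.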
Simple K-means is used as the clustering algorithm; it takes the number of clusters as a parameter. In the present invention this parameter is set according to the size of the test suite T. Let CN denote the number of clusters; then CN = |T|*p, where |T| is the size of the test suite T and 0 < p < 1. Because the present embodiment uses K-means, which takes the number of clusters as a parameter, the code must be long enough for the execution profiles of the test cases to differ sufficiently; otherwise the configured number of clusters cannot be reached.
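The following sketch shows how the cluster count can be derived from p and how a basic K-means pass groups binary profiles. It is a plain Lloyd-style K-means written for illustration and is not Weka's SimpleKMeans; the rounding of CN, the deterministic initialisation from the first distinct profiles, and the names `cluster_count` and `kmeans` are all assumptions.

```python
import math


def cluster_count(num_tests, p):
    """CN = |T| * p, rounded and kept at 1 or more (the rounding is an assumption)."""
    return max(1, round(num_tests * p))


def kmeans(profiles, k, iterations=100):
    """Assign each profile a cluster label with a basic Lloyd-style K-means."""
    # Deterministic initialisation: the first k distinct profiles are the centroids.
    centroids = []
    for prof in profiles:
        if list(prof) not in centroids:
            centroids.append(list(prof))
        if len(centroids) == k:
            break
    # With fewer than k distinct profiles, fewer than k clusters are produced.
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(prof, centroids[c]))
            for prof in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(len(centroids)):
            members = [profiles[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical profiles of four test cases; p = 50% gives CN = 2.
profiles = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0], [0, 0, 0, 1]]
k = cluster_count(len(profiles), 0.5)
labels = kmeans(profiles, k)
```

The comment on the initialisation step reflects the remark above: when the profiles are not diverse enough, the requested number of clusters cannot be reached.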
3. Identifying and processing the coincidentally correct test cases.
Identifying the coincidentally correct test cases: after the execution profiles are clustered, the test cases are partitioned into several clusters; each "passed" test case that falls in the same cluster as a "failed" test case is marked as coincidentally correct and added to the set Ticc, which is the set of identified coincidentally correct test cases;
Processing the coincidentally correct test cases: the identification result in the set Ticc is processed in either one of the following two ways:
1) filtering strategy: delete the test cases in Ticc from the test suite T;
2) relabeling strategy: change the execution-result label of each test case in Ticc from "passed" to "failed";
After the test cases marked as coincidentally correct have been processed as above, the optimized test suite T' is obtained.
The identification and processing described above can be implemented by a program written in Java. The program takes the clustering result of step 2 and the execution result of each test case as input, and selects and processes the coincidentally correct test cases according to the identification and processing methods described above.
4. Fault localization.
Tarantula is used as the coverage-based fault localization method. It was the first system to apply spectrum-based ranking to the field of software diagnosis. Its input is the reduced and optimized test suite, and its output is a list of suspicious statements sorted by suspiciousness in descending order. The programmer examines the statements in this list one by one until the faulty statement is found.
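For illustration, the Tarantula suspiciousness of a statement s is the fraction of failed tests that execute s, divided by the sum of that fraction and the fraction of passed tests that execute s. The sketch below computes this score and the resulting ranking on a tiny hypothetical suite, and shows how relabeling one coincidentally correct test raises the score of the faulty statement. The data, the function names, and the tie-breaking by statement index are assumptions.

```python
def tarantula(coverage, outcome):
    """Return one Tarantula suspiciousness score per statement.

    coverage: dict test id -> 0/1 list (one entry per statement)
    outcome:  dict test id -> "pass" or "fail"
    """
    total_pass = sum(1 for r in outcome.values() if r == "pass")
    total_fail = sum(1 for r in outcome.values() if r == "fail")
    num_statements = len(next(iter(coverage.values())))
    scores = []
    for s in range(num_statements):
        passed = sum(1 for t, v in coverage.items() if v[s] and outcome[t] == "pass")
        failed = sum(1 for t, v in coverage.items() if v[s] and outcome[t] == "fail")
        pass_ratio = passed / total_pass if total_pass else 0.0
        fail_ratio = failed / total_fail if total_fail else 0.0
        denom = pass_ratio + fail_ratio
        scores.append(fail_ratio / denom if denom else 0.0)
    return scores


def rank(scores):
    """Statement indices ordered by descending suspiciousness (ties by index)."""
    return sorted(range(len(scores)), key=lambda s: -scores[s])


# Hypothetical suite: statement 1 is faulty; t3 executes it but still passes.
coverage = {"t1": [1, 1, 0], "t2": [1, 0, 1], "t3": [1, 1, 1]}
before = tarantula(coverage, {"t1": "fail", "t2": "pass", "t3": "pass"})
after = tarantula(coverage, {"t1": "fail", "t2": "pass", "t3": "fail"})
ranking = rank(before)
```

In this example the score of statement 1 rises from 2/3 to 1 once t3 is relabeled, which is the effect the relabeling strategy aims for.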
Figs. 2-4 show the experimental results of the present invention on the Siemens programs.
Fig. 2 shows the accuracy with which the present invention identifies coincidentally correct test cases. p denotes the ratio of the number of clusters to the size of the test suite, and Fig. 2(a)-(e) correspond to p = 1%, 2%, 4%, 6%, and 8%, respectively. "FN" and "FP" stand for "false negative" and "false positive", respectively. The false negative rate is computed as |Tcc-Ticc|/|Tcc|; this metric assesses the ability of the present invention to identify coincidentally correct test cases, that is, whether it identifies as many of them as possible. The smaller this metric, the better. The false positive rate is computed as |(Tp-Tcc) ∩ Ticc|/|Tp-Tcc|; this metric assesses the correctness of the identification, that is, whether the present invention mistakes as few other test cases as possible for coincidentally correct ones. The horizontal axis shows the intervals in which "FN" and "FP" fall, and the vertical axis shows the proportion of program versions that fall in each interval.
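The two metrics translate directly into set operations, as the following sketch shows. The function names and the sample sets are hypothetical, and returning 0 for an empty denominator is an assumption made to keep the example total.

```python
def false_negative_rate(tcc, ticc):
    """|Tcc - Ticc| / |Tcc|: share of coincidentally correct tests that were missed."""
    return len(tcc - ticc) / len(tcc) if tcc else 0.0


def false_positive_rate(tp, tcc, ticc):
    """|(Tp - Tcc) ∩ Ticc| / |Tp - Tcc|: share of ordinary passed tests wrongly marked."""
    non_cc = tp - tcc
    return len(non_cc & ticc) / len(non_cc) if non_cc else 0.0


# Hypothetical sets: nine passed tests, four truly coincidentally correct.
tp = {f"t{i}" for i in range(1, 10)}
tcc = {"t1", "t2", "t3", "t4"}
ticc = {"t2", "t3", "t4", "t5"}

fn = false_negative_rate(tcc, ticc)
fp = false_positive_rate(tp, tcc, ticc)
```

Here t1 is missed, giving a false negative rate of 25%, and t5 is wrongly marked among the five ordinary passed tests, giving a false positive rate of 20%.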
Fig. 3 shows the improvement the present invention brings to coverage-based fault localization when the filtering strategy is applied. p denotes the ratio of the number of clusters to the size of the test suite, and Fig. 3(a)-(e) correspond to p = 1%, 2%, 4%, 6%, and 8%, respectively.
Fig. 4 shows the improvement the present invention brings to coverage-based fault localization when the relabeling strategy is applied. p denotes the ratio of the number of clusters to the size of the test suite, and Fig. 4(a)-(e) correspond to p = 1%, 2%, 4%, 6%, and 8%, respectively.
Fig. 3 and Fig. 4 use box plots to show the statistics of the T-score reduction for each target program. The T-score reduction is ΔTS = TS - TS', where TS and TS' denote the T-score before and after applying the present invention, respectively. T-score is an effective metric for evaluating fault localization methods. It represents the proportion of the total code that must be examined in order to find the fault. The smaller this metric, the better. T-score is defined by the following formula:
TS=|Vexamined|/|V|*100%
where |V| is the total number of executable statements in the target program and |Vexamined| is the number of statements the programmer must examine in order to find the faulty statement. Accordingly, the larger the T-score reduction, the greater the improvement in fault localization.
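A worked example of the T-score and its reduction follows. The rankings, the statement count, and the helper names are hypothetical; counting |Vexamined| as the position of the faulty statement in the ranking is an assumption about how the examined statements are counted.

```python
def examined_count(ranking, faulty):
    """Number of statements inspected before the faulty one is reached, inclusive."""
    return ranking.index(faulty) + 1


def t_score(examined, total):
    """TS = |Vexamined| / |V| * 100%, returned as a percentage."""
    return 100.0 * examined / total


# Hypothetical program with 20 executable statements; statement 7 is faulty.
total_statements = 20
ranking_before = [4, 1, 7, 2]  # most suspicious statements on the original suite
ranking_after = [7, 4, 1, 2]   # most suspicious statements on the optimized suite

ts_before = t_score(examined_count(ranking_before, 7), total_statements)
ts_after = t_score(examined_count(ranking_after, 7), total_statements)
reduction = ts_before - ts_after
```

With these numbers the T-score drops from 15% to 5%, a reduction of 10 percentage points.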
The upper and lower edges of each box in Fig. 3 and Fig. 4 represent the third quartile and the first quartile, respectively. The horizontal line inside the box represents the median, and the black dot represents the mean. The points above and below the box represent the maximum and the minimum, respectively.
For reference, Fig. 3 and Fig. 4 also show the effect of each strategy on CBFL under ideal conditions. The ideal case assumes that exactly the coincidentally correct test cases are selected, that is, both the false negative rate and the false positive rate are 0%.