CN108563555A

CN108563555A - Method for predicting fault-inducing code changes based on four-objective optimization

Info

Publication number: CN108563555A
Application number: CN201810021354.2A
Authority: CN
Inventors: 曲豫宾; 李芳�; 陈翔
Original assignee: Nantong Textile Vocational Technology College
Current assignee: Nantong Textile Vocational Technology College
Priority date: 2018-01-10
Filing date: 2018-01-10
Publication date: 2018-09-21
Anticipated expiration: 2038-01-10
Also published as: CN108563555B

Abstract

The invention discloses a method for predicting fault-inducing code changes based on four-objective optimization, belonging to the field of software quality assurance. The method comprises the following steps: (1) A data set for building the code change fault prediction model is collected by mining the version control system and the defect tracking system that host the software project. (2) Based on four optimization objectives, a genetic algorithm constructs multiple models, none of which Pareto-dominates another. The four objectives are: maximizing the number of fault-inducing code changes the method identifies, minimizing the amount of code inspection developers must perform, reducing the number of context switches for developers, and reducing the number of false alarms the method raises for fault-inducing code changes. (3) Because step (2) yields multiple mutually non-dominated models, at prediction time the method can select flexibly among them according to the developer's preference among the objectives.

Description

Method for predicting fault-inducing code changes based on four-objective optimization

Technical field

The invention belongs to the technical field of software quality assurance, and in particular relates to a method for predicting fault-inducing code changes based on four-objective optimization.

Background technology

Software fault prediction mines software history repositories (such as version control systems and defect tracking systems) to build a fault prediction model that identifies potentially faulty program modules in the project under test in advance. Allocating more testing resources to these modules optimizes the distribution of testing resources and thereby effectively improves the quality of the software product.

The present invention focuses on fault prediction for code changes submitted to a version control system. This has the following advantages:

(1) In general, a code change submitted to the version control system by a developer involves few lines of code (typically only a few dozen), so if the change is predicted to contain a fault, repairing that fault is not difficult.

(2) The invention can be deployed in an enterprise's internal version control system. After a user submits a code change, the invention predicts faults promptly and returns the prediction result to the developer, who can then quickly locate and remove the fault while still familiar with the code design.

To maximize the number of fault-inducing code changes the method identifies, minimize the amount of code inspection developers perform, reduce developers' context switches, and reduce the method's false alarms for fault-inducing code changes, all four optimization objectives must be considered together when designing a method for constructing the code change fault prediction model. The present invention arose from this need.

Summary of the invention

Purpose of the invention: The purpose of the invention is to remedy deficiencies in the prior art by providing an effective method for promptly predicting faults in code changes. After a developer submits a code change to the version control system, fault prediction can be performed promptly, helping to locate and remove faults quickly and ultimately improving the quality of the software product. When the model is built, four different optimization objectives are considered simultaneously: maximizing the number of fault-inducing code changes the method identifies, minimizing the amount of code inspection developers perform, reducing developers' context switches, and reducing the method's false alarms for fault-inducing code changes.

Technical solution: The method for predicting fault-inducing code changes based on four-objective optimization according to the present invention comprises the following steps:

(1) A data set for building the code change fault prediction model is collected by mining the version control system and defect tracking system that host the software project. First, all historical code changes of the project are extracted by analyzing the hosting version control system; second, the extracted code changes are measured;

(2) Based on four optimization objectives, a genetic algorithm finally constructs multiple models, none of which Pareto-dominates another;

(3) After the multiple mutually non-dominated models have been constructed, one is flexibly selected from among them according to the developer's preference among the objectives.

Further, the metrics used in step (1) include: (a) the diffusion of the code change; (b) the size of the code change; (c) the purpose of the code change; (d) the history of the code change; (e) the experience of the developers associated with the code change.

Further, in step (2), constructing multiple mutually non-dominated models with a genetic algorithm based on the four optimization objectives comprises the following steps:

2-1) Initialize the population: Logistic regression is used to build the code change fault prediction model. Assuming n metrics are used to measure a code change, the coefficients of the logistic regression model can be represented by a vector w = {w₁, w₂, …, w_n}, each element of which is a real number. Given the coefficient vector w of the model and a code change m_i to be predicted, where v_{i,j} is the value of the j-th metric for that change, the probability that the model predicts the change to contain a fault can be calculated with the following formula:

The threshold is set to 0.5: if the predicted probability is greater than 0.5, the code change is considered likely to introduce a fault; if the predicted probability is less than 0.5, the code change is considered likely to be correctly implemented. The calculation formula is as follows:

Chromosomes in the population are encoded with this vector. At population initialization, N chromosomes are generated at random, with the vector elements of each chromosome assigned random values. The fitness value of each chromosome on each of the four optimization objectives is then calculated:

Optimization objective 1: Maximize the number of fault-inducing code changes identified on the data set; the larger the value, the better. Assuming all historical code changes in the data set form the set M and the candidate solution corresponding to the chromosome is w, the calculation formula is:

where buggy(m_i) indicates whether the code change contains a fault: its value is 1 if it does and 0 otherwise;

Optimization objective 2: Minimize the amount of code inspection; the smaller the value, the better. The calculation formula is:

where LOC(m_i) denotes the number of lines of code involved in the code change;

All code changes in M are sorted in descending order of the probability the model predicts for them on the data set and are inspected in that order; the fitness values of the following two optimization objectives can then be calculated;

Optimization objective 3: Since the distribution of defects in a project under test roughly follows the 80/20 principle, this objective is the number of code changes that must be inspected once 20% of the total code inspection effort has been spent; the smaller the value, the better. Here the total code inspection effort is $\sum_{m_i \in M} \mathrm{LOC}(m_i)$. A higher value means that, for the same amount of code inspected, more code changes must be examined, so developers need to perform more context switches, which affects their development efficiency;

Optimization objective 4: When developers inspect the code changes in the order the model returns them, the number of code changes inspected before the first truly fault-inducing code change is encountered; the smaller the value, the better. A higher value indicates a more serious false-alarm problem in the model, which may undermine developers' confidence and patience.

2-2) Based on the previous population, the crossover operator and the mutation operator are executed in turn to generate new chromosomes. According to the crossover probability, the crossover operator randomly selects two chromosomes from the previous population and crosses them to produce two new chromosomes; according to the mutation probability, the mutation operator randomly selects one chromosome and mutates it to produce one new chromosome;

2-3) The chromosomes of the previous population and the new chromosomes are merged to form a set B, and an NDR value is then calculated for each chromosome in B based on the Pareto dominance relation. The Pareto dominance relation is defined first:

Suppose there are two candidate solutions w_i and w_j. Then w_i Pareto-dominates w_j if and only if, across the four optimization objectives, w_i is no worse than w_j on every objective and strictly better than w_j on at least one objective;

Chromosomes are then selected into the new population according to their NDR values: chromosomes with NDR value 1 are selected first, then those with NDR value 2, and the process ends when the number of selected chromosomes equals N (the population size);

2-4) Steps 2-2 and 2-3 are repeated. Once the specified number of iterations has been reached, all chromosomes in the current population that are not Pareto-dominated by any other chromosome are returned, each corresponding to one model.

Further, the NDR values are calculated as follows: first, all chromosomes in B that are not Pareto-dominated by any other chromosome are identified, their NDR value is set to 1, and they are removed from B. Next, all chromosomes in the remaining set B that are not Pareto-dominated by any other chromosome are identified, their NDR value is set to 2, and they are removed from B. This process is repeated until B is empty.

Beneficial effects: The invention provides an effective method for promptly predicting faults in code changes. After a developer submits a code change to the version control system, fault prediction can be performed promptly, helping to locate and remove faults quickly and ultimately improving the quality of the software product. When the model is built, four different optimization objectives are considered simultaneously: maximizing the number of fault-inducing code changes the method identifies, minimizing the amount of code inspection developers perform, reducing developers' context switches, and reducing the method's false alarms for fault-inducing code changes.

Description of the drawings

Fig. 1 is the overall flow chart of the present invention;

Fig. 2 is a schematic diagram of a specific code change;

Fig. 3 is a schematic diagram of the process of labeling fault-inducing code changes;

Fig. 4 is a screenshot of the data set that the present invention collected for an open source project;

Fig. 5 is a schematic diagram of the execution of the crossover operator of the present invention;

Fig. 6 is a schematic diagram of the execution of the mutation operator of the present invention.

Detailed description of the embodiments

To describe the technical approach of the above invention in more detail, the inventors give the following specific embodiment to demonstrate its technical effect. It should be emphasized that this embodiment is intended to illustrate the invention, not to limit its scope.

Embodiment 1

The overall flow chart of the method for predicting fault-inducing code changes based on four-objective optimization in this embodiment is shown in Fig. 1. The method comprises the following steps:

(1) A data set for building the code change fault prediction model is collected by mining the version control system and defect tracking system that host the software project. First, all historical code changes of the project are extracted by analyzing the hosting version control system. A specific code change is shown in Fig. 2. This code change modifies password.c. A line marked with "+" is a line of code added by the change; a line marked with "-" is a line of code deleted by the change.
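The "+" and "-" markers described above can be counted mechanically. The following is a minimal sketch, assuming the change is available as unified diff text; the function name `count_changed_lines` and the sample diff content are hypothetical and do not come from Fig. 2.

```python
def count_changed_lines(diff_text):
    """Return (added, deleted) line counts for a unified-diff string."""
    added = deleted = 0
    for line in diff_text.splitlines():
        # Skip the file headers ("--- a/file", "+++ b/file"); they are not code.
        # Simplification: a changed line whose own text begins with "--" or "++"
        # would also be skipped by this check.
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            deleted += 1
    return added, deleted


# Hypothetical change to password.c, used only to exercise the function.
example_diff = """\
--- a/password.c
+++ b/password.c
@@ -10,3 +10,4 @@
 int check(const char *pw) {
-    return strlen(pw) > 6;
+    size_t n = strlen(pw);
+    return n > 8;
 }
"""

print(count_changed_lines(example_diff))  # (2, 1)
```

The two counts correspond to the numbers of added and deleted lines that the size metrics of a code change rely on.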

Second, the extracted code changes are measured. The metrics include: (a) The diffusion of the code change, for example the number of subsystems, the number of directories, and the number of files modified by the change. (b) The size of the code change, for example the number of lines of code added, the number of lines deleted, and the number of lines of code in the affected files before the change. (c) The purpose of the code change, that is, whether the change was made to remove a fault. (d) The history of the code change, for example the number of developers involved in the changed code. (e) The experience of the developers associated with the code change. Table 1 lists commonly used metrics and their meanings.
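As an illustration of the diffusion metrics in (a), the sketch below derives three counts from the file paths a change touches. It assumes that a subsystem is the top-level directory of a path and that every path contains at least one directory; both are assumptions made here, not definitions from the source. NS and ND are the column names used in Table 2, while NF is a name assumed here for the file count.

```python
import posixpath


def diffusion_metrics(paths):
    """Return (NS, ND, NF) for the file paths touched by one code change."""
    # Assumption: a "subsystem" is the top-level directory of each path.
    subsystems = {p.split("/")[0] for p in paths}
    directories = {posixpath.dirname(p) for p in paths}
    files = set(paths)
    return len(subsystems), len(directories), len(files)


# Hypothetical change touching three files in two subsystems.
touched = ["auth/password.c", "auth/session.c", "ui/forms/login.c"]
print(diffusion_metrics(touched))  # (2, 2, 3)
```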

Table 1. Categories, names and meanings of common metrics

Finally, by analyzing the defect reports and change logs associated with the code changes, each change can be labeled as either a fault-inducing code change or a correctly implemented code change. Specifically, code changes that fix faults are identified first, mainly by searching the change log for keywords such as Fixed or Bug. It is then confirmed whether the change really fixes a fault, mainly using information in the defect tracking system: the fault ID (for example, Bug 12345) links the code change to the fault in the defect tracking system. Finally, it is determined when the fault was introduced. The diff command is first used to determine the specific code modified by the fault-fixing change, and the annotate command is then used to find the most recent code change that modified that specific code. The labeling process is illustrated in Fig. 3. For example, if the code change from version B to version C fixes a fault, the code after the change (version C) is first compared with the code before the change (version B) using the diff command to obtain the specific code the change involves. The revision history of the file is then searched for the most recent code change that modified this code (the change that introduced the fault), and that change is labeled as a fault-inducing code change.
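The first labeling step, finding candidate fault-fixing changes by keyword and extracting the fault ID, can be sketched as follows. The regular expressions and the function name `candidate_fix` are illustrative assumptions; real change logs use many other conventions, and the later steps (confirming against the defect tracking system, diff and annotate) are not covered here.

```python
import re

# Assumed keyword patterns, modeled on the "Fixed" / "Bug" examples in the text.
FIX_KEYWORDS = re.compile(r"\b(?:fix(?:e[sd])?|bug)\b", re.IGNORECASE)
BUG_ID = re.compile(r"\bbug\s*#?(\d+)\b", re.IGNORECASE)


def candidate_fix(message):
    """Return (is_candidate, bug_ids) for one change-log message."""
    bug_ids = [int(number) for number in BUG_ID.findall(message)]
    return bool(FIX_KEYWORDS.search(message)), bug_ids


print(candidate_fix("Fixed Bug 12345: null pointer in login"))  # (True, [12345])
print(candidate_fix("Refactor logging module"))                 # (False, [])
```

A message that matches only a keyword, with no fault ID, would still need manual or tracker-based confirmation before the change is treated as a fix.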

Measuring and labeling the historical code changes completes the collection of the data set. Fig. 4 is a screenshot of the data set collected for an open source project.

(2) Based on four optimization objectives, a genetic algorithm finally constructs multiple models, none of which Pareto-dominates another:

2-1) Initialize the population. The present invention uses logistic regression to build the code change fault prediction model. Assuming n metrics are used to measure a code change, the coefficients of the logistic regression model can be represented by a vector w = {w₁, w₂, …, w_n}, each element of which is a real number. Given the coefficient vector w of the model and a code change m_i to be predicted, let v_{i,j} be the value of the j-th metric for that change. The probability that the model predicts the change to contain a fault can then be calculated with the following formula.
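The formula itself is not reproduced in this text. A reconstruction from the definitions above, assuming the standard logistic form with no intercept term (w has exactly n elements, one per metric), would read as follows; the symbol P(m_i, w) is a name chosen here for illustration.

```latex
P(m_i, w) = \frac{1}{1 + e^{-\sum_{j=1}^{n} w_j \, v_{i,j}}}
```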

The present invention sets the threshold to 0.5: if the predicted probability is greater than 0.5, the code change is considered likely to introduce a fault; if the predicted probability is less than 0.5, the code change is considered likely to be correctly implemented. The calculation formula is as follows:
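A minimal sketch of the probability and threshold computation is given below. It assumes the standard logistic function applied to the weighted sum of metric values with no intercept term, and it assigns a probability of exactly 0.5 to class 0, which matches the row with probability 0.50 in Table 3. The function names are hypothetical.

```python
import math


def fault_probability(w, v):
    """Logistic probability that a change with metric values v is faulty."""
    z = sum(w_j * v_j for w_j, v_j in zip(w, v))
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict(w, v, threshold=0.5):
    """Return 1 (likely fault-inducing) or 0 (likely correct)."""
    # Assumption: a probability exactly equal to the threshold maps to 0.
    return 1 if fault_probability(w, v) > threshold else 0


print(predict([1.0, 0.5], [2.0, 2.0]))  # weighted sum 3.0 -> class 1
print(predict([0.0, 0.0], [1.0, 2.0]))  # probability 0.5 -> class 0
```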

Chromosomes in the population are encoded with this vector. At population initialization, N chromosomes are generated at random, with the vector elements of each chromosome assigned random values. The fitness value of each chromosome on each of the four optimization objectives is then calculated.
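Random initialization of the population could look like the sketch below. The sampling range [-1, 1] and the function name `init_population` are assumptions; the source only states that the vector elements are assigned at random.

```python
import random


def init_population(pop_size, n_metrics, low=-1.0, high=1.0, seed=None):
    """Create pop_size chromosomes, each a list of n_metrics real coefficients."""
    rng = random.Random(seed)
    return [
        [rng.uniform(low, high) for _ in range(n_metrics)]
        for _ in range(pop_size)
    ]


population = init_population(pop_size=5, n_metrics=3, seed=1)
print(len(population), len(population[0]))  # 5 3
```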

Optimization objective 1: Maximize the number of fault-inducing code changes identified on the data set; the larger the value, the better. Assuming all historical code changes in the data set form the set M and the candidate solution corresponding to the chromosome is w, the calculation formula is:

where buggy(m_i) indicates whether the code change contains a fault: its value is 1 if it does and 0 otherwise.

Optimization objective 2: Minimize the amount of code inspection; the smaller the value, the better. The calculation formula is:

where LOC(m_i) denotes the number of lines of code involved in the code change.
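The two formulas are not reproduced in this text. Forms consistent with the definitions of buggy(m_i) and LOC(m_i) and with the worked example in this embodiment (values 4 and 650 for Table 3) would be as follows. Here predict(m_i, w) is a hypothetical name for the 0/1 class obtained by thresholding the predicted probability at 0.5, and f_1, f_2 are labels chosen for illustration.

```latex
f_1(w) = \sum_{m_i \in M} \mathrm{predict}(m_i, w) \cdot \mathrm{buggy}(m_i)
\qquad
f_2(w) = \sum_{m_i \in M} \mathrm{predict}(m_i, w) \cdot \mathrm{LOC}(m_i)
```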

All code changes in M are sorted in descending order of the probability the model predicts for them on the data set and are inspected in that order; the fitness values of the following two optimization objectives can then be calculated.

Optimization objective 3: Since the distribution of defects in a project under test roughly follows the 80/20 principle, this objective is the number of code changes that must be inspected once 20% of the total code inspection effort has been spent; the smaller the value, the better. Here the total code inspection effort is $\sum_{m_i \in M} \mathrm{LOC}(m_i)$. A higher value means that, for the same amount of code inspected, more code changes must be examined, so developers need to perform more context switches, which affects their development efficiency.

Optimization objective 4: When developers inspect the code changes in the order the model returns them, the number of code changes inspected before the first truly fault-inducing code change is encountered; the smaller the value, the better. A higher value indicates a more serious false-alarm problem in the model, which may undermine developers' confidence and patience.

A simplified example is used to introduce how the four optimization objectives are computed in turn. The number of lines of code involved in a code change is obtained by adding the value of metric LA (lines of code added) to the value of metric LD (lines of code deleted).

Assume the data set is as shown in Table 2 (for ease of analysis, the LA and LD values are added together to give the number of lines of code each change involves) and that the predictions of the model built from some chromosome are as shown in Table 3, in which all code changes are sorted by predicted probability in descending order.

Table 2. Original data set

ID	NS	ND	……	LA+LD	Actual type
1	2	3	……	100	1
2	1	2	……	25	0
3	3	3	……	50	1
4	1	2	……	100	0
5	2	2	……	50	0
6	1	4	……	100	0
7	4	2	……	30	1
8	3	3	……	75	0
9	2	4	……	200	1
10	1	2	……	70	0
11	5	1	……	200	1

Table 3. Data set after sorting

ID	NS	ND	……	LA+LD	Actual type	Predicted probability	Predicted type
10	1	2	……	70	0	0.95	1
7	4	2	……	30	1	0.90	1
1	2	3	……	100	1	0.85	1
8	3	3	……	75	0	0.82	1
2	1	2	……	25	0	0.75	1
11	5	1	……	200	1	0.65	1
3	3	3	……	50	1	0.60	1
4	1	2	……	100	0	0.55	1
5	2	2	……	50	0	0.50	0
6	1	4	……	100	0	0.40	0
9	2	4	……	200	1	0.30	0

From Table 3, the fitness values of the chromosome on the four optimization objectives can be calculated in turn.

The value of optimization objective 1 is (each term is actual type × predicted type, in the row order of Table 3): 0×1 + 1×1 + 1×1 + 0×1 + 0×1 + 1×1 + 1×1 + 0×1 + 0×0 + 0×0 + 1×0 = 4.

The value of optimization objective 2 is: 70 + 30 + 100 + 75 + 25 + 200 + 50 + 100 = 650.

For optimization objective 3, by the time the 3rd code change has been inspected, 20% of the inspection resources have been used (200 lines of code have been inspected, and the total is 1000 lines), so the objective value is 3.

For optimization objective 4, the first truly fault-inducing code change is found when the 2nd code change is inspected, so the objective value is 1.
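The four values above can be reproduced with a short script. This is a sketch under stated assumptions: the row data are taken from Table 3, the function name `four_objectives` is hypothetical, and objective 3 is counted as the number of changes inspected until the cumulative line count first reaches 20% of the total, which is one reading of the description that agrees with the worked example.

```python
# Rows of Table 3 as (ID, LA+LD, actual type, predicted probability).
TABLE_3 = [
    (10, 70, 0, 0.95), (7, 30, 1, 0.90), (1, 100, 1, 0.85), (8, 75, 0, 0.82),
    (2, 25, 0, 0.75), (11, 200, 1, 0.65), (3, 50, 1, 0.60), (4, 100, 0, 0.55),
    (5, 50, 0, 0.50), (6, 100, 0, 0.40), (9, 200, 1, 0.30),
]


def four_objectives(rows, threshold=0.5, effort_percent=20):
    """Return (objective 1, objective 2, objective 3, objective 4)."""
    ranked = sorted(rows, key=lambda row: row[3], reverse=True)
    predicted = [1 if row[3] > threshold else 0 for row in ranked]

    # Objective 1: faulty changes that the model also flags as faulty.
    obj1 = sum(pred * row[2] for pred, row in zip(predicted, ranked))
    # Objective 2: lines of code in all changes flagged as faulty.
    obj2 = sum(pred * row[1] for pred, row in zip(predicted, ranked))

    # Objective 3: changes inspected until 20% of all lines are covered
    # (integer arithmetic avoids floating-point rounding at the boundary).
    total_lines = sum(row[1] for row in ranked)
    inspected_lines = 0
    obj3 = 0
    for row in ranked:
        if inspected_lines * 100 >= effort_percent * total_lines:
            break
        inspected_lines += row[1]
        obj3 += 1

    # Objective 4: changes inspected before the first truly faulty one.
    obj4 = next((i for i, row in enumerate(ranked) if row[2] == 1), len(ranked))
    return obj1, obj2, obj3, obj4


print(four_objectives(TABLE_3))  # (4, 650, 3, 1)
```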

2-2) Based on the previous population, the crossover operator and the mutation operator are executed in turn to generate new chromosomes. According to the crossover probability, the crossover operator randomly selects two chromosomes from the previous population and crosses them to produce two new chromosomes. According to the mutation probability, the mutation operator randomly selects one chromosome and mutates it to produce one new chromosome.

The crossover operator is illustrated in Fig. 5: the crossover takes place between element 3 and element 4 and produces two new chromosomes.

The mutation operator is illustrated in Fig. 6: the value of element 4 is mutated, producing a new chromosome.
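The two operators can be sketched as follows, using a cut between elements 3 and 4 and a mutation of element 4 as in the description of Fig. 5 and Fig. 6. The chromosome values, the mutation range [-1, 1] and the function names are illustrative assumptions; the figures themselves are not reproduced here.

```python
import random


def crossover(parent_a, parent_b, point):
    """Single-point crossover: swap the tails after the first `point` elements."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(chromosome, index, rng, low=-1.0, high=1.0):
    """Return a copy of chromosome with the element at `index` redrawn at random."""
    child = list(chromosome)
    child[index] = rng.uniform(low, high)
    return child


a = [0.1, 0.2, 0.3, 0.4, 0.5]
b = [0.6, 0.7, 0.8, 0.9, 1.0]

# Cut between element 3 and element 4 (1-based), i.e. after index 2.
print(crossover(a, b, 3))  # ([0.1, 0.2, 0.3, 0.9, 1.0], [0.6, 0.7, 0.8, 0.4, 0.5])

# Mutate element 4 (1-based), i.e. index 3; the parent list is left unchanged.
mutant = mutate(a, 3, random.Random(0))
```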

2-3) The chromosomes of the previous population and the new chromosomes are merged to form a set B, and an NDR (non-dominated rank) value is then calculated for each chromosome in B based on the Pareto dominance relation. The Pareto dominance relation is defined first:

Suppose there are two candidate solutions w_i and w_j. Then w_i Pareto-dominates w_j if and only if, across the four optimization objectives, w_i is no worse than w_j on every objective and strictly better than w_j on at least one objective.

The Pareto dominance relation is illustrated with an example. Suppose there are three candidate solutions:

● Candidate solution 1 has values 4, 650, 3 and 1 on the four optimization objectives.

● Candidate solution 2 has values 5, 630, 3 and 1 on the four optimization objectives.

● Candidate solution 3 has values 5, 630, 3 and 2 on the four optimization objectives.

Note that for optimization objective 1 a higher value is better, whereas for optimization objectives 2 to 4 a smaller value is better.

Therefore, by definition, candidate solution 2 Pareto-dominates candidate solution 1, because it has better values on optimization objectives 1 and 2 and equal values on the other objectives.

Candidate solution 3, however, does not Pareto-dominate candidate solution 1: its values on optimization objectives 1 and 2 are better, but its value on optimization objective 4 is worse.
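The dominance check for the three candidate solutions can be written as a short function. This is a minimal sketch; the function name `dominates` and the tuple layout (objective 1 to objective 4) are assumptions made for the example.

```python
def dominates(a, b):
    """True if solution a Pareto-dominates solution b.

    a and b are (obj1, obj2, obj3, obj4); objective 1 is maximized and
    objectives 2 to 4 are minimized, so objective 1 is negated to turn
    every entry into a cost.
    """
    cost_a = (-a[0], a[1], a[2], a[3])
    cost_b = (-b[0], b[1], b[2], b[3])
    no_worse = all(x <= y for x, y in zip(cost_a, cost_b))
    strictly_better = any(x < y for x, y in zip(cost_a, cost_b))
    return no_worse and strictly_better


s1, s2, s3 = (4, 650, 3, 1), (5, 630, 3, 1), (5, 630, 3, 2)
print(dominates(s2, s1))  # True
print(dominates(s3, s1))  # False
```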

The NDR values are calculated as follows: first, all chromosomes in B that are not Pareto-dominated by any other chromosome are identified, their NDR value is set to 1, and they are removed from B. Next, all chromosomes in the remaining set B that are not Pareto-dominated by any other chromosome are identified, their NDR value is set to 2, and they are removed from B. This process is repeated until B is empty.
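The peeling procedure for NDR values can be sketched directly from this description. The function names are hypothetical, and the input is a list of objective tuples in the order objective 1 to objective 4, with objective 1 maximized and the others minimized.

```python
def dominates(a, b):
    """True if a Pareto-dominates b (objective 1 maximized, 2 to 4 minimized)."""
    cost_a = (-a[0], a[1], a[2], a[3])
    cost_b = (-b[0], b[1], b[2], b[3])
    return (all(x <= y for x, y in zip(cost_a, cost_b))
            and any(x < y for x, y in zip(cost_a, cost_b)))


def ndr_values(objectives):
    """Return the NDR value of each solution, in input order."""
    remaining = set(range(len(objectives)))  # indices still in set B
    ranks = {}
    rank = 1
    while remaining:
        # Solutions not dominated by any other solution still in B.
        front = {
            i for i in remaining
            if not any(dominates(objectives[j], objectives[i])
                       for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front  # remove the front from B
        rank += 1
    return [ranks[i] for i in range(len(objectives))]


candidates = [(4, 650, 3, 1), (5, 630, 3, 1), (5, 630, 3, 2)]
print(ndr_values(candidates))  # [2, 1, 2]
```

In this example candidate solution 2 dominates both of the others, so it alone receives NDR value 1; the remaining two do not dominate each other and share NDR value 2.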

Chromosomes are then selected into the new population according to their NDR values: chromosomes with NDR value 1 are selected first, then those with NDR value 2, and the process ends when the number of selected chromosomes equals N.
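Selection by NDR value can be sketched as a sort followed by truncation. The source does not say how to choose among chromosomes of the same NDR value when only some of them fit; this sketch keeps them in their original order, which is an arbitrary assumption. The function name is hypothetical.

```python
def select_next_population(ranks, pop_size):
    """Return the indices of the chromosomes kept for the next population."""
    # sorted() is stable, so chromosomes with equal NDR keep their input order.
    order = sorted(range(len(ranks)), key=lambda i: ranks[i])
    return order[:pop_size]


# NDR values of five chromosomes in set B, with N = 3.
print(select_next_population([2, 1, 2, 1, 3], 3))  # [1, 3, 0]
```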

2-4) Steps 2-2 and 2-3 are repeated. Once the specified number of iterations has been reached, all chromosomes in the current population that are not Pareto-dominated by any other chromosome are returned. Each chromosome corresponds to one model.
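Steps 2-1 to 2-4 can be put together in a compact loop. The sketch below is an illustration under several assumptions, not the patented implementation: `evaluate` is a caller-supplied function returning a tuple of costs that are all minimized (so objective 1 would be negated), the way the crossover and mutation probabilities are applied is one possible reading, ties within an NDR value are broken by input order, and the toy cost function exists only to exercise the loop. At least two genes are required.

```python
import random


def dominates(a, b):
    """True if cost tuple a Pareto-dominates b (every entry is minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def ndr(costs):
    """NDR value of each cost tuple, computed by repeated front removal."""
    remaining, ranks, rank = set(range(len(costs))), {}, 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(costs[j], costs[i])
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return [ranks[i] for i in range(len(costs))]


def evolve(evaluate, n_genes, pop_size=20, generations=30,
           p_cross=0.9, p_mut=0.1, seed=0):
    """Return the non-dominated chromosomes after the given number of iterations."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size // 2):          # step 2-2: crossover
            if rng.random() < p_cross:
                a, b = rng.sample(pop, 2)
                cut = rng.randrange(1, n_genes)
                children += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        for chromosome in pop:                  # step 2-2: mutation
            if rng.random() < p_mut:
                child = list(chromosome)
                child[rng.randrange(n_genes)] = rng.uniform(-1, 1)
                children.append(child)
        merged = pop + children                 # step 2-3: set B
        ranks = ndr([evaluate(c) for c in merged])
        keep = sorted(range(len(merged)), key=lambda i: ranks[i])[:pop_size]
        pop = [merged[i] for i in keep]
    final_ranks = ndr([evaluate(c) for c in pop])
    return [c for c, r in zip(pop, final_ranks) if r == 1]   # step 2-4


def toy_costs(w):
    """Toy four-objective cost function with conflicting goals (illustration only)."""
    return (abs(w[0] - 0.5), abs(w[0] + 0.5), abs(w[1]), abs(w[2]))


front = evolve(toy_costs, n_genes=3)
print(len(front) >= 1)  # True
```

In the method described here, `evaluate` would compute the four optimization objectives of a chromosome on the collected data set instead of the toy costs.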

(3) After step (2), multiple mutually non-dominated models have been constructed, so at prediction time the method can flexibly select among them according to the developer's preference among the objectives. For example, if the developer places more weight on optimization objective 1, a model that performs better on optimization objective 1 is selected from these models for prediction. If the developer places more weight on optimization objective 2, a model that performs better on optimization objective 2 is selected for prediction.
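Preference-based selection can be sketched as picking, from the returned models, the one with the best value on the preferred objective. The model identifiers and objective values below are invented for illustration, and the function name is hypothetical.

```python
def pick_model(front, preferred):
    """Pick the model that is best on the preferred objective (1 to 4).

    front is a list of (model_id, (obj1, obj2, obj3, obj4)); objective 1 is
    maximized and objectives 2 to 4 are minimized.
    """
    if preferred == 1:
        return max(front, key=lambda model: model[1][0])
    return min(front, key=lambda model: model[1][preferred - 1])


# Hypothetical set of three mutually non-dominated models.
pareto_front = [
    ("A", (5, 630, 3, 1)),
    ("B", (6, 700, 4, 0)),
    ("C", (3, 400, 2, 2)),
]

print(pick_model(pareto_front, 1)[0])  # B
print(pick_model(pareto_front, 2)[0])  # C
```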

The invention provides an effective method for promptly predicting faults in code changes. After a developer submits a code change to the version control system, fault prediction can be performed promptly, helping to locate and remove faults quickly and ultimately improving the quality of the software product. When the model is built, four different optimization objectives are considered simultaneously: maximizing the number of fault-inducing code changes the method identifies, minimizing the amount of code inspection developers perform, reducing developers' context switches, and reducing the method's false alarms for fault-inducing code changes.

The above is only a preferred embodiment of the present invention and does not limit the invention in any form. Although the invention has been disclosed above by way of a preferred embodiment, that embodiment is not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the invention, use the technical content disclosed above to make minor changes or modifications that amount to equivalent embodiments. Any simple modification, equivalent change or refinement made to the above embodiment in accordance with the technical essence of the invention, without departing from the content of its technical solution, still falls within the scope of the technical solution of the invention.

Claims

1. A method for predicting fault-inducing code changes based on four-objective optimization, characterized in that it comprises the following steps:

(1) a data set for building the code change fault prediction model is collected by mining the version control system and defect tracking system that host the software project; first, all historical code changes of the project are extracted by analyzing the hosting version control system; second, the extracted code changes are measured;

(2) based on four optimization objectives, a genetic algorithm finally constructs multiple models, none of which Pareto-dominates another;

(3) after the multiple mutually non-dominated models have been constructed, one is flexibly selected from among them according to the developer's preference among the objectives.

2. The method for predicting fault-inducing code changes based on four-objective optimization according to claim 1, characterized in that the metrics used in step (1) include: (a) the diffusion of the code change; (b) the size of the code change; (c) the purpose of the code change; (d) the history of the code change; (e) the experience of the developers associated with the code change.

3. The method for predicting fault-inducing code changes based on four-objective optimization according to claim 1, characterized in that, in step (2), constructing multiple mutually non-dominated models with a genetic algorithm based on the four optimization objectives comprises the following steps:

2-1) initialize the population: logistic regression is used to build the code change fault prediction model; assuming n metrics are used to measure a code change, the coefficients of the logistic regression model can be represented by a vector w = {w₁, w₂, …, w_n}, each element of which is a real number; given the coefficient vector w of the model and a code change m_i to be predicted, where v_{i,j} is the value of the j-th metric for that change, the probability that the model predicts the change to contain a fault can be calculated with the following formula:

the threshold is set to 0.5: if the predicted probability is greater than 0.5, the code change is considered likely to introduce a fault; if the predicted probability is less than 0.5, the code change is considered likely to be correctly implemented; the calculation formula is as follows:

chromosomes in the population are encoded with this vector; at population initialization, N chromosomes are generated at random, with the vector elements of each chromosome assigned random values; the fitness value of each chromosome on each of the four optimization objectives is then calculated:

optimization objective 1: maximize the number of fault-inducing code changes identified on the data set; the larger the value, the better; assuming all historical code changes in the data set form the set M and the candidate solution corresponding to the chromosome is w, the calculation formula is:

all code changes in M are sorted in descending order of the probability the model predicts for them on the data set and are inspected in that order; the fitness values of the following two optimization objectives can then be calculated;

optimization objective 3: since the distribution of defects in a project under test roughly follows the 80/20 principle, this objective is the number of code changes that must be inspected once 20% of the total code inspection effort has been spent; the smaller the value, the better; here the total code inspection effort is $\sum_{m_i \in M} \mathrm{LOC}(m_i)$; a higher value means that, for the same amount of code inspected, more code changes must be examined, so developers need to perform more context switches, which affects their development efficiency;

optimization objective 4: when developers inspect the code changes in the order the model returns them, the number of code changes inspected before the first truly fault-inducing code change is encountered; the smaller the value, the better; a higher value indicates a more serious false-alarm problem in the model, which may undermine developers' confidence and patience;

2-2) based on the previous population, the crossover operator and the mutation operator are executed in turn to generate new chromosomes; according to the crossover probability, the crossover operator randomly selects two chromosomes from the previous population and crosses them to produce two new chromosomes; according to the mutation probability, the mutation operator randomly selects one chromosome and mutates it to produce one new chromosome;

2-3) the chromosomes of the previous population and the new chromosomes are merged to form a set B, and an NDR value is then calculated for each chromosome in B based on the Pareto dominance relation; the Pareto dominance relation is defined first:

suppose there are two candidate solutions w_i and w_j; then w_i Pareto-dominates w_j if and only if, across the four optimization objectives, w_i is no worse than w_j on every objective and strictly better than w_j on at least one objective;

chromosomes are then selected into the new population according to their NDR values: chromosomes with NDR value 1 are selected first, then those with NDR value 2, and the process ends when the number of selected chromosomes equals N (the population size);

2-4) steps 2-2 and 2-3 are repeated; once the specified number of iterations has been reached, all chromosomes in the current population that are not Pareto-dominated by any other chromosome are returned, each corresponding to one model.

4. The method for predicting fault-inducing code changes based on four-objective optimization according to claim 3, characterized in that the NDR values are calculated as follows: first, all chromosomes in B that are not Pareto-dominated by any other chromosome are identified, their NDR value is set to 1, and they are removed from B; next, all chromosomes in the remaining set B that are not Pareto-dominated by any other chromosome are identified, their NDR value is set to 2, and they are removed from B; this process is repeated until B is empty.