CN103810102A - Method and system for predicting software defects - Google Patents

Publication number
CN103810102A
Authority
CN
China
Prior art keywords
svm classifier
classifier device
value
parameters
software
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410056820.2A
Other languages
Chinese (zh)
Inventor
胡昌振
薛静锋
王男帅
单纯
胡晶晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201410056820.2A
Publication of CN103810102A
Legal status: Pending

Abstract

The invention provides a method and system for predicting software defects. The method and system are used for solving the problems that software defect prediction precision is low in the prior art and SVM parameters are difficult to select. The method includes the following steps that firstly, a training data set is acquired, and a software defect prediction model based on an SVM classifier is built; secondly, the optimal metric unit attribute subset of the training data set and optimal values of parameters C and sigma of the SVM classifier are searched for at the same time according to the genetic algorithm, wherein the optimal metric unit attribute subset refers to the attributes capable of independently representing a module corresponding to the training data set, and the optimal values of the parameters C and sigma of the SVM classifier refer to the group of values, capable of determining the optimal classification hyperplane function of the SVM classifier, of the parameters C and sigma; thirdly, the optimal software defect prediction model based on the SVM classifier is acquired according to the acquired optimal metric unit attribute subset and the optimal values of the parameters C and sigma of the SVM classifier; fourthly, software to be tested is subjected to defect prediction according to the acquired optimal software defect prediction model.

Description

A method and system for predicting software defects
Technical field
The invention belongs to the technical field of software security and relates to a method and system for predicting software defects.
Background art
Software defect prediction has a long history of development and remains an active field. Different prediction techniques have different scopes of application, solve different problems, and have different limitations. The applicable defect prediction technique also varies with the specific customer requirements, the environment, the product characteristics, and the organization's historical data.
Static software defect prediction originated early. Early research concentrated on defect prediction based on metrics such as software size, studying the relationship between defects and basic attributes such as software size and complexity in order to predict the number of defects the software may contain. In the early 1990s it was found that defects are neither evenly nor completely randomly distributed in software, and prediction techniques for defect distribution emerged. The present invention belongs to the category of defect distribution prediction techniques.
Among these techniques, defect prediction based on the support vector machine (Support Vector Machine, SVM) has been verified to be a relatively effective method. SVM is a general learning method developed on the basis of statistical learning theory for finite samples, and it effectively handles learning problems involving small samples, high dimensionality, and nonlinearity. SVM is widely used in fields such as software defect prediction, image scanning, and text classification. However, as a relatively new learning machine, it lacks a unified method and standard for parameter selection, which leads to low precision in software defect prediction. Moreover, the prior art attempts to improve defect prediction precision solely by applying different dimensionality-reduction methods to the metric attribute set.
Summary of the invention
The invention provides a method and system for predicting software defects, in order to solve the problems that existing software defect prediction has low precision and that SVM parameters are difficult to select.
A method for predicting software defects comprises the following steps:
Step 1: obtain a training data set, and build a software defect prediction model based on an SVM classifier;
Step 2: use a genetic algorithm to search simultaneously for the optimal metric attribute subset of the training data set and the optimal values of the parameters C and σ of the SVM classifier; wherein the optimal metric attribute subset refers to the attributes that can independently represent the modules corresponding to the training data set, and the optimal values of C and σ refer to the group of values of C and σ that determines the optimal classification hyperplane function of the SVM classifier;
Step 3: according to the obtained optimal metric attribute subset and the optimal values of the parameters C and σ of the SVM classifier, obtain the optimal software defect prediction model based on the SVM classifier;
Step 4: perform defect prediction on the software under test according to the obtained optimal software defect prediction model.
The software defect prediction model based on the SVM classifier in Step 1 is obtained by the following method:
1.1 Binary-encode the metric attributes in the training data set and the parameters C and σ of the SVM classifier, generate initial individuals, and obtain the initial population;
1.2 Decode the initial individuals to obtain the metric attribute subset and the values of the parameters C and σ of the SVM classifier that correspond to the binary code; train the SVM classifier with the obtained metric attribute subset and the values of C and σ, and obtain the defect prediction accuracy of the SVM classifier under that attribute subset and those values of C and σ;
1.3 Calculate the fitness value of each individual from the fitness function and the defect prediction accuracy, and judge whether the fitness value meets the termination condition. If it does, output the attribute subset corresponding to that individual and the corresponding values of C and σ, and obtain the optimal software defect prediction model based on the SVM classifier. Otherwise, perform genetic operations and keep updating the individuals to obtain new populations until the termination condition is met.
The fitness function is: F = 100R - m|N - n|;
where R is the defect prediction accuracy of the SVM classifier, N is the number of metric attributes in the current attribute subset, and n is a constant. m is a weight coefficient that adjusts the balance between pursuing higher classification accuracy and reducing computational cost: a larger m means greater sensitivity to computational cost, while a smaller m means less sensitivity to computational cost and more emphasis on defect prediction accuracy.
The termination condition is that an individual's fitness value has reached a preset value, or that the genetic operations have reached the preset maximum number of generations.
The metric attribute set in the training data set and the parameters C and σ of the SVM classifier are binary-encoded as follows:
The total chromosome length of an initial individual is set to n+x+y bits;
Each of the first n bits corresponds to one metric attribute in the training data set. Each bit takes one of two values, 1 or 0: a value of 1 means the metric attribute is selected, and a value of 0 means it is not selected;
The middle x bits correspond to the parameter C, whose value range is (0, 1000);
The last y bits correspond to the parameter σ, whose value range is (0, 10).
In Step 4, the software under test is predicted with the obtained optimal prediction model as follows: the corresponding attribute data of the software to be predicted is input into the prediction model, which judges whether the attribute data meets the condition for containing a defect. If it does, one label is placed on the output of the SVM classifier; otherwise, a different label is placed on the output of the SVM classifier.
A system for predicting software defects comprises: a prediction model building unit, an optimization unit, and a defect prediction unit;
The prediction model building unit is used to obtain a training data set and build a software defect prediction model based on an SVM classifier;
The optimization unit is used to search simultaneously, with a genetic algorithm, for the optimal metric attribute subset of the training data set and the optimal values of the parameters C and σ of the SVM classifier, and to obtain the optimal software defect prediction model based on the SVM classifier from the obtained optimal metric attribute subset and optimal values of C and σ; wherein the optimal metric attribute subset refers to the attributes that can independently represent the modules corresponding to the training data set, and the optimal values of C and σ refer to the group of values of C and σ that determines the optimal classification hyperplane function of the SVM classifier;
The defect prediction unit is used to perform defect prediction on the software under test according to the obtained optimal software defect prediction model.
The optimization unit comprises: an encoding unit, an SVM classifier training unit, and a fitness evaluation unit;
The encoding unit is used to binary-encode the metric attribute set in the training data set and the parameters C and σ of the SVM classifier, generate initial individuals, and obtain the initial population;
The SVM classifier training unit is used to decode the initial individuals to obtain the metric attribute subset and the values of C and σ that correspond to the binary code, to train the SVM classifier with the obtained attribute subset and values of C and σ, and to obtain the defect prediction accuracy of the SVM classifier under that attribute subset and those values of C and σ;
The fitness evaluation unit is used to calculate the fitness value of each individual from the fitness function and the defect prediction accuracy, and to judge whether the fitness value meets the termination condition. If it does, the unit outputs the attribute subset corresponding to that individual and the corresponding values of C and σ, and obtains the optimal software defect prediction model based on the SVM classifier. Otherwise, genetic operations are performed and the individuals are updated continually to obtain new populations until the termination condition is met.
The fitness function in the fitness evaluation unit is:
F = 100R - m|N - n|;
where R is the defect prediction accuracy of the SVM classifier, N is the number of metric attributes in the current attribute subset, and n is a constant. m is a weight coefficient that adjusts the balance between pursuing higher classification accuracy and reducing computational cost: a larger m means greater sensitivity to computational cost, while a smaller m means less sensitivity to computational cost and more emphasis on defect prediction accuracy.
The termination condition is that an individual's fitness value has reached a preset value, or that the genetic operations have reached the preset maximum number of generations.
The metric attribute set in the training data set and the parameters C and σ of the SVM classifier are binary-encoded as follows:
The total chromosome length of an initial individual is set to n+x+y bits;
Each of the first n bits corresponds to one metric attribute in the training data set. Each bit takes one of two values, 1 or 0: a value of 1 means the metric attribute is selected, and a value of 0 means it is not selected;
The middle x bits correspond to the parameter C, whose value range is (0, 1000);
The last y bits correspond to the parameter σ, whose value range is (0, 10).
The defect prediction unit inputs the corresponding attribute data from the data set of the software to be predicted into the optimal software defect prediction model, which judges whether the attribute data meets the condition for containing a defect. If it does, one label is placed on the output of the SVM classifier; otherwise, a different label is placed on the output of the SVM classifier.
Beneficial effects of the invention:
The invention takes as initial input the metric attributes of each module in the historical data set, together with whether each module has a defect. It then uses a genetic algorithm to select from the metric attribute set: through the continual breeding and evolution of the individuals in the population, it selects the attribute subset that is most valuable for defect prediction while keeping the number of attributes as small as possible, and obtains the optimal SVM parameters for that attribute subset. The software defect prediction model based on the support vector machine is built on this basis. During prediction, the attribute values of the software module to be predicted are input into the model, which yields a predicted value of whether the module has a defect.
The invention can ensure high accuracy of software defect prediction and high precision in optimizing the SVM parameters. At the same time, metric attribute selection and parameter optimization are considered together, which addresses the mutual influence and interdependence that may exist between these two tasks.
Brief description of the drawings
Fig. 1 is a schematic diagram of a method for predicting software defects provided by one embodiment of the invention;
Fig. 2 is a flowchart of a method for predicting software defects provided by another embodiment of the invention;
Fig. 3 is a schematic diagram of the total chromosome length of an individual and the allocation of its bits in one embodiment of the invention;
Fig. 4 is a block diagram of a system for predicting software defects provided by one embodiment of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the invention clearer, embodiments of the invention are described in further detail below with reference to the accompanying drawings.
The technical concept of the invention is as follows. The metric attributes of a software module describe its characteristics fairly comprehensively: the closer the metric values of two modules are, the more similar the modules are in every respect, and the closer their likelihood of containing defects. On this basis, the invention uses the metrics of each module in the historical software project, together with whether each module has a defect, as knowledge data, and uses a support vector machine as the classifier. According to the metrics of the module to be predicted, the module is assigned to the corresponding class, which predicts whether it has a defect.
Too much metric data may have no practical significance for prediction, while increasing computational cost and reducing efficiency. In addition, there is currently no unified, explicit method for selecting SVM parameters; they are mostly obtained manually from experience and extensive experiments. The invention therefore proposes a metric selection and parameter optimization method based on a genetic algorithm. The genetic algorithm selects from the metric attribute set: through the continual breeding and evolution of the individuals in the population, it selects the attribute subset that is most valuable for defect prediction while keeping the number of attributes as small as possible (reducing computational cost), and obtains the optimal SVM parameters for that attribute subset. The software defect prediction model based on the support vector machine is built on this basis, which effectively improves the precision of SVM-based software defect prediction.
A genetic algorithm (Genetic Algorithm, GA) draws on ideas from genetics, combining survival of the fittest with random information exchange, and evolves a population by updating its individuals through operations such as selection, crossover, and mutation. During optimization, the genetic algorithm starts its search from multiple randomly generated points in the solution space at the same time and uses a fitness function to guide the search direction. It is a search technique that can quickly find a globally optimized solution in a complex search space, and it is widely used in fields such as optimization and machine learning.
One embodiment of the invention provides a method for predicting software defects. Fig. 1 is a schematic diagram of this method. Referring to Fig. 1, the method comprises:
Step S100: obtain a training data set, and build a software defect prediction model based on an SVM classifier;
Step S110: use a genetic algorithm to search simultaneously for the optimal metric attribute subset of the training data set and the values of the parameters C and σ of the SVM classifier;
Step S120: according to the obtained optimal metric attribute subset and the values of the parameters C and σ of the SVM classifier, obtain the optimal software defect prediction model based on the SVM classifier;
Step S130: perform defect prediction on the software under test according to the obtained optimal software defect prediction model.
Here, the optimal metric attribute subset refers to the attributes that can independently represent the modules corresponding to the training data set; the optimal values of C and σ refer to the group of values of C and σ that determines the optimal classification hyperplane function of the SVM classifier.
Specifically, obtaining the training data set in this embodiment means taking as the initial input the metric attributes of each module in previous versions of the software, together with whether each module has a defect. This embodiment uses the support vector machine as the tool for building the software defect prediction model. The basic idea of the support vector machine is to solve a quadratic programming problem with the help of a kernel function: the kernel function maps the data into a high-dimensional feature space, which handles data that is not linearly separable. The radial basis function (Radial Basis Function, RBF) kernel has a relatively wide convergence domain and is a relatively ideal basis function for classification, so the invention selects the RBF kernel. The choice of the kernel parameter σ and the penalty factor C is crucial to the performance of the SVM classifier; only with suitable model parameters can the advantages of the SVM be fully realized. In this embodiment, the SVM uses the widely used RBF kernel, and the genetic algorithm (GA) is used to select its kernel parameter σ and penalty factor C.
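For reference, the two parameters can be located in the usual textbook formulation of a soft-margin SVM with an RBF kernel. The patent text does not write these formulas out, so the notation below (x_i, y_i, w, b, ξ_i, φ) and the exact parameterization of σ are assumptions; some libraries instead expose γ = 1/(2σ²).

```latex
% RBF kernel with width parameter \sigma (assumed parameterization)
K(x_i, x_j) = \exp\!\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}} \right)

% Soft-margin SVM primal problem with penalty factor C
\min_{w,\, b,\, \xi} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{l} \xi_i
\quad \text{s.t.} \quad
y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0
```

Under this formulation, a larger C penalizes training errors more heavily, and a smaller σ makes the decision boundary more local, which is why the two values are searched jointly.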
Fig. 2 is a flowchart of a method for predicting software defects provided by another embodiment of the invention. Referring to Fig. 2, the workflow of the method is:
Step S200: obtain a training data set;
Step S201: build a software defect prediction model based on an SVM classifier;
Step S202: encode the metric attributes of the data set and the parameters of the SVM.
Specifically, step S202 comprises: setting the total chromosome length of an initial individual to n+x+y bits. Each of the first n bits corresponds to one metric attribute in the training data set; each bit takes one of two values, 1 or 0, where 1 means the metric attribute is selected and 0 means it is not selected. The middle x bits correspond to the penalty factor C, whose value range is (0, 1000). The last y bits correspond to the parameter σ, whose value range is (0, 10).
Fig. 3 is a schematic diagram of the total chromosome length of an individual and the allocation of its bits in one embodiment of the invention. Referring to Fig. 3, in this embodiment the total chromosome length of an individual is set to n+40 bits. Each of the first n bits corresponds to one metric attribute in the training data set; each bit takes one of two values, 1 or 0, where 1 means the metric attribute is selected and 0 means it is not selected.
n denotes the number of metric attributes in the obtained training data set. Because different data sets contain different numbers of metric attributes, n varies with the data set obtained.
Referring to Fig. 3, the middle 20 bits correspond to the penalty factor C of the SVM classifier, whose value range is (0, 1000). In this embodiment the value of C is accurate to 3 decimal places, which requires about 1000 × 10^3 = 10^6 distinct values; 20 bits provide 2^20 = 1,048,576 values and therefore satisfy this precision.
Referring to Fig. 3, the last 20 bits correspond to the kernel parameter σ, whose value range is (0, 10). In this embodiment the value of σ is accurate to 5 decimal places, which likewise requires about 10 × 10^5 = 10^6 distinct values, so a code length of 20 bits is again sufficient.
It should be understood that in other embodiments of the invention the total chromosome length of an individual is not limited to n+40 bits; it should be determined from the metric attribute set actually contained in the training set and from the required precision of the penalty factor C and the kernel parameter σ.
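The bit allocation described above can be sketched in Python as follows. The patent does not state how a 20-bit segment is converted to a real value, so the linear mapping onto an open interval used here is an assumption, and the names `bits_to_real` and `decode` are hypothetical.

```python
from typing import List, Tuple


def bits_to_real(bits: List[int], low: float, high: float) -> float:
    """Map a bit string linearly onto the open interval (low, high).

    Assumption: the patent does not give a decoding formula; this linear
    scaling is one common choice.
    """
    as_int = int("".join(str(b) for b in bits), 2)
    # Shift by one step so that neither endpoint is ever produced.
    steps = 2 ** len(bits) + 1
    return low + (as_int + 1) * (high - low) / steps


def decode(chromosome: List[int], n: int, x: int = 20, y: int = 20
           ) -> Tuple[List[int], float, float]:
    """Split an (n + x + y)-bit chromosome into (selected indices, C, sigma)."""
    if len(chromosome) != n + x + y:
        raise ValueError("chromosome length must be n + x + y")
    # First n bits: 1 selects the metric attribute at that position.
    selected = [i for i, bit in enumerate(chromosome[:n]) if bit == 1]
    # Middle x bits: penalty factor C in (0, 1000).
    c = bits_to_real(chromosome[n:n + x], 0.0, 1000.0)
    # Last y bits: kernel parameter sigma in (0, 10).
    sigma = bits_to_real(chromosome[n + x:], 0.0, 10.0)
    return selected, c, sigma


if __name__ == "__main__":
    example = [1, 0, 1] + [0] * 20 + [1] * 20
    print(decode(example, n=3))
```

With 20 bits per parameter, adjacent codes differ by roughly 0.001 for C and 0.00001 for σ, which matches the precision stated in this embodiment.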
Step S203: generate the initial individuals and randomly initialize their values. The first n bits of each chromosome are randomly set to 1 or 0, and the values of the penalty factor C and the kernel parameter σ are initialized. During initialization, a value is assigned to every binary-coded bit of each individual.
Step S204: obtain the initial population. In this embodiment the size of the initial population is 100, that is, one population contains 100 individuals.
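Steps S203 and S204 can be sketched as below. The population size of 100 and the 20+20 parameter bits come from this embodiment; the function name `init_population` and the optional `seed` argument are illustrative assumptions.

```python
import random
from typing import List, Optional


def init_population(n: int, x: int = 20, y: int = 20, size: int = 100,
                    seed: Optional[int] = None) -> List[List[int]]:
    """Create `size` random chromosomes of n + x + y bits each.

    Every bit, including the bits that encode C and sigma, is drawn
    uniformly from {0, 1}.
    """
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n + x + y)]
            for _ in range(size)]


if __name__ == "__main__":
    population = init_population(n=21, seed=42)  # n = 21 is an arbitrary example
    print(len(population), len(population[0]))
```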
Step S205: train the SVM with the corresponding attribute subset and parameters.
The initial individuals are decoded to obtain the metric attribute subset and the values of the parameters C and σ of the SVM classifier that correspond to the binary code. The SVM classifier is then trained with the obtained attribute subset and values of C and σ, giving the defect prediction accuracy of the SVM classifier under that attribute subset and those values of C and σ.
Specifically, because each of the first n bits is randomly set to 1 or 0 when an individual is initialized, where 1 selects the metric attribute and 0 does not, decoding keeps only the metric attributes whose bit is 1. Among the n metric attributes of the data set, all attributes whose coded value is 1 form one metric attribute subset. At initialization, the middle 20 bits of the chromosome are assigned a value of C accurate to 3 decimal places, that is, a resolution of 1/1000; at decoding, these 20 binary-coded bits are converted into one concrete value of C. Similarly, the last 20 binary-coded bits are decoded into one concrete value of σ, which is accurate to 5 decimal places, that is, a resolution of 1/100000.
To compute the defect prediction accuracy for a selected metric attribute subset and given values of C and σ, ten-fold cross-validation is used to obtain the performance data (such as prediction accuracy) of the current support vector machine. Specifically, the obtained training data set is divided into 10 parts; 1 part serves as the test set and the remaining 9 parts as the training set, which yields one defect prediction accuracy under the given values of C and σ. This is repeated 10 times, with a different part held out each time, to obtain 10 defect prediction accuracies. The 10 accuracies are averaged, and the average is taken as the defect prediction accuracy of the SVM classifier under the current values of C and σ.
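The ten-fold procedure above can be sketched without any external library as follows. The trainer is passed in as a callable so that an RBF-kernel SVM configured with the decoded C and σ could be plugged in; `majority_trainer` is only a hypothetical stand-in used to make the sketch runnable, and the interleaved fold assignment is an assumption, since the patent does not say how the 10 parts are formed.

```python
import random
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], str]
Trainer = Callable[[List[Sequence[float]], List[str]], Predictor]


def cross_val_accuracy(train_fn: Trainer, rows: List[Sequence[float]],
                       labels: List[str], k: int = 10, seed: int = 0) -> float:
    """Return the mean accuracy over k folds (k = 10 in the embodiment)."""
    if len(rows) < k:
        raise ValueError("need at least k samples")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint parts
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        # Train on the other k - 1 parts, test on the held-out part.
        predict = train_fn([rows[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct = sum(predict(rows[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


def majority_trainer(train_rows: List[Sequence[float]],
                     train_labels: List[str]) -> Predictor:
    """Hypothetical stand-in classifier: always predicts the majority label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda row: majority


if __name__ == "__main__":
    data = [[float(i)] for i in range(20)]
    tags = ["N"] * 15 + ["Y"] * 5
    print(cross_val_accuracy(majority_trainer, data, tags))
```

Passing the trainer as a parameter keeps the validation loop independent of any particular SVM implementation.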
Step S206: obtain the fitness value. The fitness value of each individual is calculated from the fitness function and the defect prediction accuracy. In this embodiment the fitness function is F = 100R - m|N - n|, where R is the defect prediction accuracy calculated in step S205, N is the number of metric attributes in the attribute subset used in S205, and n is a constant. m is a weight coefficient that adjusts the balance between pursuing higher classification accuracy and reducing computational cost: a larger m means greater sensitivity to computational cost, while a smaller m means less sensitivity to computational cost and more emphasis on defect prediction accuracy.
The purpose of evaluating the fitness of individuals is to take into account the excessive computational cost brought by too many metric attributes while pursuing higher defect prediction accuracy for the model. By adjusting the weight coefficient m, the function balances high classification accuracy against reduced computational cost.
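A direct transcription of the fitness function into Python is given below. The formula is taken from the text; the argument names and the example values of m and n are illustrative assumptions, since the patent does not fix them.

```python
def fitness(accuracy: float, num_selected: int, n: int, m: float) -> float:
    """Compute F = 100R - m|N - n|.

    accuracy     -- R, the defect prediction accuracy in [0, 1]
    num_selected -- N, the number of metric attributes in the current subset
    n            -- the constant n from the formula
    m            -- the weight coefficient
    """
    return 100.0 * accuracy - m * abs(num_selected - n)


if __name__ == "__main__":
    # Same accuracy and subset, two different weights (values are examples).
    print(fitness(0.90, num_selected=8, n=10, m=0.5))
    print(fitness(0.90, num_selected=8, n=10, m=5.0))
```

Because R is scaled by 100, a one-percentage-point change in accuracy moves F by 1, which gives a feel for how large m has to be before the attribute-count term starts to dominate.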
Step S207: judge whether the individual's fitness value meets the termination condition; if it does, output the attribute subset corresponding to that individual and the corresponding values of the SVM classifier parameters C and σ. The termination condition is:
the individual's fitness value has reached a preset value, or the genetic operations have reached the maximum number of generations.
In this embodiment the maximum number of generations is 500. The individual's fitness value is the value of F calculated with the fitness function in step S206. When the value of F meets the preset threshold, or the genetic operations have run for 500 generations, the termination condition is deemed to have been reached.
Step S208: obtain the optimal software defect prediction model based on the SVM classifier. If the individual's fitness value in step S207 has reached the termination condition, the metric attribute subset and the values of C and σ corresponding to that individual are determined to be optimal. The optimal classification hyperplane function of the SVM classifier is calculated from these optimal values of C and σ, and the optimal software defect prediction model based on the SVM classifier is obtained from this hyperplane function and the optimal metric attribute subset found for software defect prediction.
Step S211: otherwise, perform genetic operations and keep updating the individuals to obtain new populations until the termination condition is met.
If the individual's fitness value in step S207 has not reached the termination condition, genetic operations are performed on the individuals, which are updated continually to obtain new populations until the termination condition is met. The genetic operations comprise selection, crossover, and mutation.
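The text names selection, crossover, and mutation but does not specify which variants are used. The sketch below assumes tournament selection, single-point crossover, and bit-flip mutation, and the crossover and mutation rates are arbitrary example values; all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[int]


def tournament_select(population: Sequence[Chromosome],
                      fitnesses: Sequence[float],
                      rng: random.Random, k: int = 2) -> Chromosome:
    """Pick k random individuals and return a copy of the fittest one."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return list(population[best])


def single_point_crossover(a: Chromosome, b: Chromosome,
                           rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def bit_flip_mutation(chromosome: Chromosome, rate: float,
                      rng: random.Random) -> Chromosome:
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def next_generation(population: Sequence[Chromosome],
                    fitnesses: Sequence[float], rng: random.Random,
                    crossover_rate: float = 0.8,
                    mutation_rate: float = 0.01) -> List[Chromosome]:
    """Produce a new population of the same size as the current one."""
    offspring: List[Chromosome] = []
    while len(offspring) < len(population):
        p1 = tournament_select(population, fitnesses, rng)
        p2 = tournament_select(population, fitnesses, rng)
        if rng.random() < crossover_rate:
            p1, p2 = single_point_crossover(p1, p2, rng)
        offspring.append(bit_flip_mutation(p1, mutation_rate, rng))
        offspring.append(bit_flip_mutation(p2, mutation_rate, rng))
    return offspring[:len(population)]
```

In the flow of Fig. 2, a call such as `next_generation` would be repeated, re-evaluating fitness after each call, until the termination condition is met.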
After optimization is complete, defect prediction is performed on the software under test with the obtained optimal software defect prediction model, specifically:
Step S209: input the test data set. The optimal SVM model and the optimal attribute subset have been obtained in step S208. For an actual prediction, the metric data of the software module to be predicted is collected for the attributes in the optimal attribute subset, and this metric data is input into the optimal defect prediction model.
Step S210: after the metric data is input into the optimal defect prediction model, a predicted value of whether the module has a defect is obtained. If the module is predicted to contain a defect, one label is placed on the output of the SVM classifier; otherwise, a different label is placed on the output. In this embodiment the letter "N" indicates that the software module has no defect, and the letter "Y" indicates that the software module contains a defect, in which case software testers need to take effective countermeasures and focus testing on that module.
Thus, the present invention builds a software defect prediction model based on a genetic algorithm (GA) and a support vector machine (SVM), searching simultaneously for the optimal metric attribute subset for software defect prediction and the SVM parameters. This addresses the possible interdependence between metric attribute subset selection and SVM parameter selection. The method can also further improve the precision of software defect prediction.
Another embodiment of the present invention provides a system for predicting software defects. Fig. 4 is a block diagram of a system for predicting software defects provided by an embodiment of the present invention. Referring to Fig. 4, the system comprises: a prediction model building unit 410, an optimization unit 420, and a defect prediction unit 430;
The prediction model building unit 410 is configured to acquire a training data set and build a software defect prediction model based on an SVM classifier;
The optimization unit 420 is configured to use a genetic algorithm to search simultaneously for the optimal metric attribute subset of the training data set and the values of the parameters C and σ of the SVM classifier, and to obtain the optimal SVM-based software defect prediction model from the optimal metric attribute subset and the values of C and σ so obtained;
The defect prediction unit 430 is configured to perform defect prediction on the software under test according to the optimal software defect prediction model obtained;
Here, the optimal metric attribute subset refers to those attributes that can independently represent the module corresponding to the training data set; the optimal values of the parameters C and σ refer to the group of values of C and σ that determines the optimal classification hyperplane function of the SVM classifier.
In this embodiment, the optimization unit 420 comprises: a coding unit, an SVM classifier training unit, and a fitness evaluation unit;
The coding unit is configured to binary-encode the metric attribute set of the training data set and the parameters C and σ of the SVM classifier, generate initial individuals, and randomly initialize the values of the initial individuals to obtain an initial population;
The SVM classifier training unit is configured to decode the initial individuals to obtain the metric attribute subset and the values of the parameters C and σ of the SVM classifier that correspond to the binary code, to train the SVM classifier with the metric attribute subset and the values of C and σ so obtained, and to obtain the defect prediction accuracy of the SVM classifier for that metric attribute subset and those values of C and σ;
The fitness evaluation unit is configured to calculate the fitness evaluation value of an individual from the fitness function and the defect prediction accuracy, and to judge whether the individual's fitness evaluation value satisfies the termination condition. If so, it outputs the attribute subset corresponding to that individual and the corresponding values of the SVM classifier parameters C and σ, yielding the optimal SVM-based software defect prediction model; otherwise, genetic operations are performed, repeatedly updating the individuals to obtain a new population, until the termination condition is met;
In this embodiment, the fitness function in the fitness evaluation unit is: F = 100R − m|N − n|, where R is the defect prediction accuracy of the SVM classifier, N is the number of metric attributes in the current attribute subset, and n is a constant. m is a weight coefficient used to adjust the balance between pursuing higher classification accuracy and reducing computational cost: the larger m is, the more sensitive the fitness is to computational cost; the smaller m is, the less sensitive it is to computational cost and the more emphasis is placed on defect prediction accuracy.
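The fitness function can be written directly as a small Python function. The parameter names are illustrative and the sample values are hypothetical. With R = 0.75, N = 8, n = 5 and m = 2, the formula gives 100 × 0.75 − 2 × |8 − 5| = 69.

```python
def fitness(accuracy, num_selected, n_const, m):
    """F = 100R - m|N - n|.

    accuracy      R, defect prediction accuracy in [0, 1]
    num_selected  N, number of metric attributes in the current subset
    n_const       n, the constant from the formula
    m             weight on the attribute-count penalty
    """
    return 100.0 * accuracy - m * abs(num_selected - n_const)


# Same accuracy, different weights: a larger m penalizes the subset size more.
f_small_m = fitness(0.75, 8, 5, 0.5)
f_large_m = fitness(0.75, 8, 5, 2.0)
```

Note that the penalty depends on the distance between N and n in either direction, so a subset with fewer than n attributes is penalized as well.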
In this embodiment, the termination condition is: the individual's fitness evaluation value has reached a preset value, or the genetic operations have reached a preset maximum number of generations.
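A minimal sketch of how these two termination conditions could drive the search loop is given below. `evaluate` and `breed` are hypothetical callables standing in for fitness evaluation and the genetic operations. Checking only the best individual of each generation against the preset value is an assumption.

```python
def should_terminate(best_fitness, generation, target_fitness, max_generations):
    """True once the fitness target or the generation limit has been reached."""
    return best_fitness >= target_fitness or generation >= max_generations


def evolve_until_done(population, evaluate, breed, target_fitness, max_generations):
    """Run the loop and return (best individual, its fitness, generation index)."""
    generation = 0
    while True:
        fitnesses = [evaluate(individual) for individual in population]
        best = max(range(len(population)), key=fitnesses.__getitem__)
        if should_terminate(fitnesses[best], generation,
                            target_fitness, max_generations):
            return population[best], fitnesses[best], generation
        # Not done yet: apply the genetic operations to get a new population.
        population = breed(population, fitnesses)
        generation += 1
```

The generation limit guarantees that the loop ends even when the preset fitness value is never reached.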
In this embodiment, binary-encoding the metric attribute set of the training data set and the parameters C and σ of the SVM classifier comprises:
Let the total length of the chromosome code of an initial individual be n+x+y bits;
Each of the first n bits corresponds to one metric attribute in the training data set, and each bit takes one of two values, 1 or 0: a value of 1 means that the metric attribute is selected, and a value of 0 means that it is not selected;
The middle x bits encode the penalty factor C, where the range of C is (0, 1000);
The last y bits encode the parameter σ, whose range is (0, 10).
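The encoding above can be decoded as in the following Python sketch. The embodiment gives the value ranges but not how a bit pattern is mapped onto them, so the linear scaling used here is an assumption, chosen so that the result stays strictly inside the open interval. The function names are illustrative.

```python
def bits_to_int(bits):
    """Interpret a list of 0/1 values as an unsigned integer, first bit highest."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def scale_to_open_interval(bits, upper):
    """Map a bit list linearly to a value strictly inside (0, upper) (assumed)."""
    return upper * (bits_to_int(bits) + 1) / (2 ** len(bits) + 1)


def decode(chromosome, n, x, y):
    """Split an (n+x+y)-bit chromosome into (attribute mask, C, sigma)."""
    if len(chromosome) != n + x + y:
        raise ValueError("chromosome length must be n + x + y")
    mask = chromosome[:n]                                        # 1 = selected
    c = scale_to_open_interval(chromosome[n:n + x], 1000.0)      # C in (0, 1000)
    sigma = scale_to_open_interval(chromosome[n + x:], 10.0)     # sigma in (0, 10)
    return mask, c, sigma


# Usage: 4 attribute bits, 3 bits for C, 3 bits for sigma (hypothetical sizes).
mask, c, sigma = decode([1, 0, 1, 1, 0, 0, 0, 1, 1, 1], n=4, x=3, y=3)
```

Larger x and y give a finer grid of candidate values for C and σ at the cost of a longer chromosome.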
In this embodiment, the defect prediction unit 430 performing defect prediction on the software under test according to the optimal software defect prediction model obtained comprises:
Inputting the corresponding attribute data from the data set of the software to be predicted into the optimal software defect prediction model and judging whether the attribute data satisfy the condition for containing a defect; if so, one mark is placed on the output of the SVM classifier; otherwise, a different mark is placed on the output of the SVM classifier.
It should be noted that the system for predicting software defects provided by this embodiment of the present invention corresponds to the method for predicting software defects described above. For its specific working process, refer to the foregoing method embodiment; it is not repeated here.
In summary, the present invention uses a genetic algorithm, through the continued breeding and evolution of the individuals in the population, to select the optimal attribute subset that is most valuable for defect prediction while keeping the number of attributes as small as possible, and to obtain the optimal support vector machine parameters for that attribute subset, and builds an SVM-based software defect prediction model on this basis. The beneficial effects of the present invention are as follows. First, it addresses the possible interdependence between metric attribute selection (that is, data dimensionality reduction) and SVM parameter selection. Second, by adjusting the weight parameter in the fitness function, a balance can be struck between defect prediction accuracy and computational cost, which makes the method more flexible in practical use. Finally, the technical solution of the present invention can further improve the accuracy of software defect prediction.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

1. A method for predicting software defects, characterized by comprising the following steps:
Step 1: acquire a training data set and build a software defect prediction model based on an SVM classifier;
Step 2: use a genetic algorithm to search simultaneously for the optimal metric attribute subset of the training data set and the optimal values of the parameters C and σ of the SVM classifier; wherein the optimal metric attribute subset refers to the attributes that can independently represent the module corresponding to the training data set, and the optimal values of the parameters C and σ refer to the group of values of C and σ that determines the optimal classification hyperplane function of the SVM classifier;
Step 3: obtain the optimal SVM-based software defect prediction model from the optimal metric attribute subset and the optimal values of the parameters C and σ of the SVM classifier so obtained;
Step 4: perform defect prediction on the software under test according to the optimal software defect prediction model obtained.
2. The method for predicting software defects as claimed in claim 1, characterized in that the SVM-based software defect prediction model in step 1 is obtained by the following method:
1.1 binary-encode the metric attributes in the training data set and the parameters C and σ of the SVM classifier, generate initial individuals, and obtain an initial population;
1.2 decode the initial individuals to obtain the metric attribute subset and the values of the parameters C and σ of the SVM classifier that correspond to the binary code; train the SVM classifier with the metric attribute subset and the values of C and σ so obtained, and obtain the defect prediction accuracy of the SVM classifier for that metric attribute subset and those values of C and σ;
1.3 calculate the fitness evaluation value of an individual from the fitness function and the defect prediction accuracy, and judge whether the individual's fitness evaluation value satisfies the termination condition; if so, output the attribute subset corresponding to that individual and the corresponding values of the SVM classifier parameters C and σ, and obtain the optimal SVM-based software defect prediction model; otherwise, perform genetic operations, repeatedly updating the individuals to obtain a new population, until the termination condition is met.
3. The method for predicting software defects as claimed in claim 2, characterized in that the fitness function is: F = 100R − m|N − n|;
wherein R is the defect prediction accuracy of the SVM classifier, N is the number of metric attributes in the current attribute subset, and n is a constant; m is a weight coefficient used to adjust the balance between pursuing higher classification accuracy and reducing computational cost: the larger m is, the more sensitive the fitness is to computational cost; the smaller m is, the less sensitive it is to computational cost and the more emphasis is placed on defect prediction accuracy.
4. The method for predicting software defects as claimed in claim 2 or claim 3, characterized in that the termination condition is set as: the individual's fitness evaluation value has reached a preset value, or the genetic operations have reached a preset maximum number of generations.
5. The method for predicting software defects as claimed in claim 2 or claim 3, characterized in that the binary encoding of the metric attribute set of the training data set and the parameters C and σ of the SVM classifier adopts the following method:
set the total length of the chromosome code of an initial individual to n+x+y bits;
each of the first n bits corresponds to one metric attribute in the training data set, and each bit takes one of two values, 1 or 0: a value of 1 means that the metric attribute is selected, and a value of 0 means that it is not selected;
the middle x bits encode the parameter C, where the range of C is (0, 1000);
the last y bits encode the parameter σ, whose range is (0, 10).
6. The method for predicting software defects as claimed in claim 1, 2 or 3, characterized in that, in step 4, predicting the software under test according to the optimal prediction model obtained adopts the following method: the corresponding attribute data of the software to be predicted are input into the prediction model, and it is judged whether the attribute data satisfy the condition for containing a defect; if so, one mark is placed on the output of the SVM classifier; otherwise, a different mark is placed on the output of the SVM classifier.
7. A system for software defect prediction, characterized by comprising: a prediction model building unit, an optimization unit, and a defect prediction unit;
the prediction model building unit is configured to acquire a training data set and build a software defect prediction model based on an SVM classifier;
the optimization unit is configured to use a genetic algorithm to search simultaneously for the optimal metric attribute subset of the training data set and the optimal values of the parameters C and σ of the SVM classifier, and to obtain the optimal SVM-based software defect prediction model from the optimal metric attribute subset and the optimal values of C and σ so obtained; wherein the optimal metric attribute subset refers to those attributes that can independently represent the module corresponding to the training data set, and the optimal values of the parameters C and σ refer to the group of values of C and σ that determines the optimal classification hyperplane function of the SVM classifier;
the defect prediction unit is configured to perform defect prediction on the software under test according to the optimal software defect prediction model obtained.
8. The system for software defect prediction as claimed in claim 7, characterized in that the optimization unit comprises: a coding unit, an SVM classifier training unit, and a fitness evaluation unit;
the coding unit is configured to binary-encode the metric attribute set of the training data set and the parameters C and σ of the SVM classifier, generate initial individuals, and obtain an initial population;
the SVM classifier training unit is configured to decode the initial individuals to obtain the metric attribute subset and the values of the parameters C and σ of the SVM classifier that correspond to the binary code, to train the SVM classifier with the metric attribute subset and the values of C and σ so obtained, and to obtain the defect prediction accuracy of the SVM classifier for that metric attribute subset and those values of C and σ;
the fitness evaluation unit is configured to calculate the fitness evaluation value of an individual from the fitness function and the defect prediction accuracy, and to judge whether the individual's fitness evaluation value satisfies the termination condition; if so, it outputs the attribute subset corresponding to that individual and the corresponding values of the SVM classifier parameters C and σ, yielding the optimal SVM-based software defect prediction model; otherwise, genetic operations are performed, repeatedly updating the individuals to obtain a new population, until the termination condition is met.
9. The system for software defect prediction as claimed in claim 8, characterized in that the fitness function in the fitness evaluation unit is:
F = 100R − m|N − n|;
wherein R is the defect prediction accuracy of the SVM classifier, N is the number of metric attributes in the current attribute subset, and n is a constant; m is a weight coefficient used to adjust the balance between pursuing higher classification accuracy and reducing computational cost: the larger m is, the more sensitive the fitness is to computational cost; the smaller m is, the less sensitive it is to computational cost and the more emphasis is placed on defect prediction accuracy.
10. The system for software defect prediction as claimed in claim 8 or 9, characterized in that the termination condition is set as: the individual's fitness evaluation value has reached a preset value, or the genetic operations have reached a preset maximum number of generations.
11. The system for software defect prediction as claimed in claim 8 or 9, characterized in that the binary encoding of the metric attribute set of the training data set and the parameters C and σ of the SVM classifier adopts the following method:
let the total length of the chromosome code of an initial individual be n+x+y bits;
each of the first n bits corresponds to one metric attribute in the training data set, and each bit takes one of two values, 1 or 0: a value of 1 means that the metric attribute is selected, and a value of 0 means that it is not selected;
the middle x bits encode the parameter C, where the range of C is (0, 1000);
the last y bits encode the parameter σ, whose range is (0, 10).
12. The system for software defect prediction as claimed in claim 8 or 9, characterized in that the defect prediction unit inputs the corresponding attribute data from the data set of the software to be predicted into the optimal software defect prediction model and judges whether the attribute data satisfy the condition for containing a defect; if so, one mark is placed on the output of the SVM classifier; otherwise, a different mark is placed on the output of the SVM classifier.
CN201410056820.2A 2014-02-19 2014-02-19 Method and system for predicting software defects Pending CN103810102A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410056820.2A CN103810102A (en) 2014-02-19 2014-02-19 Method and system for predicting software defects

Publications (1)

Publication Number Publication Date
CN103810102A true CN103810102A (en) 2014-05-21

Family

ID=50706898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410056820.2A Pending CN103810102A (en) 2014-02-19 2014-02-19 Method and system for predicting software defects

Country Status (1)

Country Link
CN (1) CN103810102A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101556553A (en) * 2009-03-27 2009-10-14 中国科学院软件研究所 Defect prediction method and system based on requirement change
CN102063550A (en) * 2011-01-07 2011-05-18 浙江工业大学 Intelligent design system of cold extrusion piece with machine intelligence involved design decision

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
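Jiang Huiyan et al.: "Research on a software defect prediction model based on ACO-SVM", Chinese Journal of Computers (《计算机学报》) *
Cui Zhengbin et al.: "Software reliability prediction model based on a genetically optimized support vector machine", Computer Engineering and Applications (《计算机工程与应用》) *
Wang Pei et al.: "Application of a genetically optimized support vector machine in software defect prediction", Electronic Measurement Technology (《电子测量技术》) *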

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899135A (en) * 2015-05-14 2015-09-09 工业和信息化部电子第五研究所 Software defect prediction method and system
CN104899135B (en) * 2015-05-14 2017-10-20 工业和信息化部电子第五研究所 Software Defects Predict Methods and system
CN105205002B (en) * 2015-10-28 2017-09-29 北京理工大学 A kind of software safety defect based on test job amount finds the modeling method of model
CN105205002A (en) * 2015-10-28 2015-12-30 北京理工大学 Modeling method of software safety defect discovering model based on test workload
CN105608004A (en) * 2015-12-17 2016-05-25 云南大学 CS-ANN-based software failure prediction method
CN106919373A (en) * 2015-12-28 2017-07-04 北京计算机技术及应用研究所 A kind of program code method for evaluating quality
CN105808435A (en) * 2016-03-08 2016-07-27 北京理工大学 Construction method of software defect evaluation model on the basis of complex network
CN106022374A (en) * 2016-05-18 2016-10-12 中国农业银行股份有限公司 Method and device for classifying historical process data
CN106022374B (en) * 2016-05-18 2019-07-09 中国农业银行股份有限公司 The method and device that a kind of pair of history flow data is classified
CN107025503A (en) * 2017-04-18 2017-08-08 武汉大学 Across company software failure prediction method based on transfer learning and defects count information
CN107133179A (en) * 2017-06-06 2017-09-05 中国电力科学研究院 A kind of website failure prediction method based on Bayesian network and its realize system
CN107133179B (en) * 2017-06-06 2020-12-08 中国电力科学研究院 Website defect prediction method based on Bayesian network and implementation system thereof
CN107957946A (en) * 2017-12-01 2018-04-24 北京理工大学 Software Defects Predict Methods based on neighborhood insertion protection algorism support vector machines
CN107957946B (en) * 2017-12-01 2020-10-20 北京理工大学 Software defect prediction method based on neighborhood embedding protection algorithm support vector machine
CN108763096A (en) * 2018-06-06 2018-11-06 北京理工大学 Software Defects Predict Methods based on depth belief network algorithm support vector machines
CN109325543A (en) * 2018-10-10 2019-02-12 南京邮电大学 Software Defects Predict Methods, readable storage medium storing program for executing and terminal
CN109947588A (en) * 2019-02-22 2019-06-28 哈尔滨工业大学 A kind of NAND Flash bit error rate prediction technique based on support vector regression method
CN109947588B (en) * 2019-02-22 2021-01-12 哈尔滨工业大学 NAND Flash bit error rate prediction method based on support vector regression method
CN109933538B (en) * 2019-04-02 2020-04-28 广东石油化工学院 Cost perception-oriented real-time defect prediction model enhancement method
CN109933538A (en) * 2019-04-02 2019-06-25 广东石油化工学院 A kind of real-time bug prediction model enhancing frame towards cost perception
CN110147321A (en) * 2019-04-19 2019-08-20 北京航空航天大学 A kind of recognition methods of the defect high risk module based on software network
CN111400180A (en) * 2020-03-13 2020-07-10 上海海事大学 Software defect prediction method based on feature set division and ensemble learning
CN111400180B (en) * 2020-03-13 2023-03-10 上海海事大学 Software defect prediction method based on feature set division and ensemble learning
CN112463643A (en) * 2020-12-16 2021-03-09 郑州航空工业管理学院 Software quality prediction method
CN112905468A (en) * 2021-02-20 2021-06-04 华南理工大学 Ensemble learning-based software defect prediction method, storage medium and computing device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140521