CN103810101B - Software defect prediction method and software defect prediction system - Google Patents


Publication number
CN103810101B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410056779.9A
Other languages
Chinese (zh)
Other versions
CN103810101A (en
Inventor
胡昌振
单纯
陈博洋
马锐
王勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Application filed by Beijing Institute of Technology BIT
Priority to CN201410056779.9A
Publication of CN103810101A
Application granted
Publication of CN103810101B

Abstract

The present invention provides a software defect prediction method and a software defect prediction system to address the low accuracy of existing software defect prediction. The system comprises a dimensionality reduction unit, an SVM training unit, and a defect prediction unit. The method comprises: Step 1, performing dimensionality reduction on a first training data set according to the locally linear embedding (LLE) algorithm to obtain the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space, thereby obtaining a second training data set composed of these low-dimensional vectors; Step 2, training a support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier, and thus the trained SVM classifier; Step 3, performing defect prediction on the software to be predicted using the trained SVM classifier.

Description

Software defect prediction method and software defect prediction system
Technical field
The present invention relates to the field of software security, and in particular to a software defect prediction method and a software defect prediction system.
Background art
Software defect prediction technology emerged in the 1970s. Its main role is to guide software quality assurance and to provide a valuable reference for balancing software cost. Software defect prediction is broadly divided into dynamic prediction and static prediction; current research concentrates on static prediction, and the present invention belongs to the distribution prediction techniques within static prediction. The support vector machine (Support Vector Machine, abbreviated SVM) is a machine learning method developed on the basis of statistical learning theory. It has many distinctive advantages in small-sample, nonlinear, and high-dimensional pattern recognition, and existing software defect prediction work mainly uses the support vector machine as the tool for building a prediction model. Patents related to software defect prediction mainly include: a defect prediction method and system based on requirement changes (publication number CN200910080742), and a software defect priority prediction method based on an improved support vector machine (publication number CN201210057888).
The prior-art approach has two parts: reducing the dimensionality of the data set and optimizing the support vector machine parameters. The prior art proposes different solutions to these two problems and has achieved some results. However, the dimensionality reduction methods it selects have limitations: the reduced result cannot guarantee the integrity of the original data, nor does it reflect the intrinsic dimensionality well. Because software defect prediction itself operates on data sets, preserving data integrity is important for ensuring accurate predictions.
Summary of the invention
The present invention provides a software defect prediction method and a software defect prediction system to address the low accuracy of existing software defect prediction.
A software defect prediction method comprises the following steps:
Step 1: Perform dimensionality reduction on the first training data set according to the locally linear embedding (LLE) algorithm to obtain the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space, thereby obtaining a second training data set composed of these low-dimensional vectors;
Step 2: Train a support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier, and thus the trained SVM classifier;
Step 3: Perform defect prediction on the software to be predicted using the trained SVM classifier.
In Step 1, the second training data set composed of the low-dimensional vectors is obtained as follows:
1.1 Let the first training data set be {X_1, X_2, ..., X_N}, X_i ∈ R^D, where X_i is a vector in D-dimensional space;
1.2 Compute the K nearest neighbors of each sample point X_i in the first training data set;
1.3 Compute the local reconstruction weight matrix W from the K nearest neighbors of each sample point according to Formula 1;
$$\min_{W}\ \varepsilon(W)=\sum_{i=1}^{N}\Bigl\|X_i-\sum_{j=1}^{N}w_{ij}X_j\Bigr\|^{2}\quad\text{s.t.}\quad\sum_{j=1}^{N}w_{ij}=1 \qquad \text{(Formula 1)}$$
where N is the number of sample points and w_ij is the coefficient with which the j-th neighbor represents the i-th sample point X_i (w_ij = 0 when X_j is not a neighbor of X_i); the coefficients with which all sample points X_i of the first training data set are represented by their neighbors form the local reconstruction weight matrix W;
1.4 Compute the low-dimensional vector Y_i corresponding to each sample point from the obtained local reconstruction weight matrix W and the neighbors of the sample point according to Formula 2;
$$\min_{Y}\ \Phi(Y)=\sum_{i=1}^{N}\Bigl\|Y_i-\sum_{j=1}^{N}w_{ij}Y_j\Bigr\|^{2}=\sum_{i=1}^{N}\sum_{j=1}^{N}M_{ij}\,Y_i^{T}Y_j\quad\text{s.t.}\quad\sum_{i=1}^{N}Y_i=0,\ \ \frac{1}{N}\sum_{i=1}^{N}Y_iY_i^{T}=I \qquad \text{(Formula 2)}$$
where I is the identity matrix and M = (I - W)^T (I - W).
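In Step 2, the trained SVM classifier is obtained as follows:
The optimal separating hyperplane function of the SVM classifier is solved according to Formula 3:
$$\min_{\omega,b,\xi}\ \frac{1}{2}\|\omega\|^{2}+C\sum_{i=1}^{N}\xi_i\quad\text{s.t.}\quad y_i\bigl(\omega^{T}\phi(x_i)+b\bigr)\ge 1-\xi_i,\ \ \xi_i\ge 0,\ \ i=1,\dots,N \qquad \text{(Formula 3)}$$
where ω is the d-dimensional vector orthogonal to the separating hyperplane, b is the bias term, C is the penalty coefficient, ξ_i are the slack variables, y_i ∈ {+1, -1} is the class label of sample x_i, and φ(x) is the mapping associated with the kernel function used by the SVM classifier.
The kernel function is the radial basis kernel function, of the form:
$$K(x_i,x_j)=\exp\Bigl(-\frac{\|x_i-x_j\|^{2}}{2\sigma^{2}}\Bigr) \qquad \text{(Formula 4)}$$
where σ is the width parameter of the radial basis kernel function.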
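The effect of the width parameter σ can be illustrated with a few lines of Python. This is a minimal sketch that assumes the common parameterization exp(-||x - z||^2 / (2σ^2)); the function name `rbf_kernel` is illustrative and not taken from the patent.

```python
import math

def rbf_kernel(x, z, sigma):
    """Radial basis kernel value for two vectors of equal length.

    Assumes the exp(-||x - z||^2 / (2 * sigma^2)) parameterization.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

# The same pair of points looks more similar as sigma grows.
narrow = rbf_kernel((1.0, 0.0), (0.0, 0.0), sigma=0.5)
wide = rbf_kernel((1.0, 0.0), (0.0, 0.0), sigma=2.0)
```

A small σ makes the kernel value fall off quickly with distance, so each training point influences only its immediate surroundings; a large σ spreads that influence out.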
When obtaining the optimal separating hyperplane function of the SVM classifier, the parameter C of the SVM classifier and the kernel parameter σ are optimized using grid search together with ten-fold cross-validation: the pair of values of C and σ that gives the highest SVM classification accuracy is found, and it determines the optimal separating hyperplane function of the SVM classifier.
Optimizing the parameter C of the SVM classifier and the kernel parameter σ using grid search and ten-fold cross-validation includes: selecting values for C and σ by grid search, that is, searching over every combination formed by all values in the value range of C and all values in the value range of σ.
Optimizing the parameter C of the SVM classifier and the kernel parameter σ using grid search and ten-fold cross-validation further includes: for each selected pair of parameters C, σ, obtaining the classification accuracy under that pair by ten-fold cross-validation, and taking the pair with the highest classification accuracy as the optimal parameter values. Ten-fold cross-validation means dividing the second training data set into 10 subsets; 1 subset serves as the test set and the remaining 9 subsets serve as the training set, which yields 1 classification accuracy under the selected pair of parameters. This is repeated 10 times, giving 10 classification accuracies for that pair. The mean of these 10 accuracies is the index for evaluating each pair of parameters; the means of all selected pairs are then compared, and the pair C, σ with the highest mean is taken as the optimal parameter values.
Software defect prediction according to the optimal separating hyperplane function is performed as follows:
First, the data set of the software to be predicted is reduced in dimensionality using the LLE algorithm;
Second, the reduced data set is input into the trained SVM classifier for judgment. If the input data fall into the defect-free region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain no defect and is marked accordingly in the output of the SVM classifier. If the input data fall into the defective region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain a defect and is marked accordingly in the output of the SVM classifier.
A software defect prediction system comprises a dimensionality reduction unit, an SVM training unit, and a defect prediction unit;
The dimensionality reduction unit is configured to perform dimensionality reduction on the first training data set according to the locally linear embedding (LLE) algorithm, obtain the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space, and obtain the second training data set composed of these low-dimensional vectors;
The SVM training unit is configured to train the support vector machine (SVM) classifier on the second training data set, obtain the optimal separating hyperplane function of the SVM classifier, and thus obtain the trained SVM classifier;
The defect prediction unit is configured to perform defect prediction on the software to be predicted using the trained SVM classifier.
Beneficial effects of the present invention:
In the software defect prediction method and software defect prediction system provided by the present invention, first, the locally linear embedding algorithm is used to reduce the dimensionality of the training data set, which keeps the geometric structure of the sample points in the data set unchanged after reduction, so that the reduced data reflect the features of the original data set more completely. Second, the SVM parameter C and the kernel parameter σ are optimized by grid search combined with ten-fold cross-validation; the values of C and σ that give the highest SVM classification accuracy are taken as the optimal parameters, and the optimal separating hyperplane function of the SVM is determined from them. Software defect prediction is then performed with this optimal separating hyperplane function, which achieves the purpose of improving the accuracy of software defect prediction.
Brief description of the drawings
Fig. 1 is a block diagram of a software defect prediction method provided by one embodiment of the present invention;
Fig. 2 is a flowchart of a software defect prediction method provided by another embodiment of the present invention;
Fig. 3 is a block diagram of a software defect prediction system provided by another embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the drawings.
The technical idea of the present invention addresses the limitation of existing dimensionality reduction methods, namely that the reduced result cannot guarantee the integrity of the data and does not reflect the intrinsic dimensionality well. Embodiments of the present invention use the locally linear embedding (LLE) algorithm to reduce the dimensionality of the software defect data set. The algorithm starts from the spatial structure of the sample data and keeps the geometric structure of the data samples unchanged after reduction, so that the reduced data reflect the features of the original data set more completely. Because software defect prediction itself operates on data sets, reflecting the features of the original data more completely is very important for improving the accuracy of the prediction result.
One embodiment of the present invention provides a software defect prediction method. Fig. 1 is a block diagram of the software defect prediction method provided by this embodiment. Referring to Fig. 1, the method comprises:
Step S100: Perform dimensionality reduction on the first training data set according to the locally linear embedding (LLE) algorithm to obtain the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space, thereby obtaining the second training data set composed of these low-dimensional vectors;
Step S110: Train the support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier, and thus the trained SVM classifier;
Step S120: Perform defect prediction on the software to be predicted according to the optimal separating hyperplane function.
In this embodiment, performing dimensionality reduction on the first training data set according to the LLE algorithm, obtaining the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space, and obtaining the second training data set composed of these low-dimensional vectors includes:
Let the first training data set be {X_1, X_2, ..., X_N}, X_i ∈ R^D, where X_i is a vector in D-dimensional space;
Compute the K nearest neighbors of each sample point X_i in the first training data set;
Compute the local reconstruction weight matrix W from the K nearest neighbors of each sample point according to Formula 1;
$$\min_{W}\ \varepsilon(W)=\sum_{i=1}^{N}\Bigl\|X_i-\sum_{j=1}^{N}w_{ij}X_j\Bigr\|^{2}\quad\text{s.t.}\quad\sum_{j=1}^{N}w_{ij}=1 \qquad \text{(Formula 1)}$$
where N is the number of sample points and w_ij is the coefficient with which the j-th neighbor represents the i-th sample point X_i (w_ij = 0 when X_j is not a neighbor of X_i); the coefficients with which all sample points X_i of the first training data set are represented by their neighbors form the local reconstruction weight matrix W;
Compute the low-dimensional vector Y_i corresponding to each sample point from the obtained local reconstruction weight matrix W and the neighbors of the sample point according to Formula 2;
$$\min_{Y}\ \Phi(Y)=\sum_{i=1}^{N}\Bigl\|Y_i-\sum_{j=1}^{N}w_{ij}Y_j\Bigr\|^{2}=\sum_{i=1}^{N}\sum_{j=1}^{N}M_{ij}\,Y_i^{T}Y_j\quad\text{s.t.}\quad\sum_{i=1}^{N}Y_i=0,\ \ \frac{1}{N}\sum_{i=1}^{N}Y_iY_i^{T}=I \qquad \text{(Formula 2)}$$
where I is the identity matrix and M = (I - W)^T (I - W).
In this embodiment, training the support vector machine (SVM) classifier on the second training data set and obtaining the optimal separating hyperplane function of the SVM classifier includes:
The optimal separating hyperplane function of the SVM classifier is solved according to Formula 3:
$$\min_{\omega,b,\xi}\ \frac{1}{2}\|\omega\|^{2}+C\sum_{i=1}^{N}\xi_i\quad\text{s.t.}\quad y_i\bigl(\omega^{T}\phi(x_i)+b\bigr)\ge 1-\xi_i,\ \ \xi_i\ge 0,\ \ i=1,\dots,N \qquad \text{(Formula 3)}$$
where ω is the d-dimensional vector orthogonal to the separating hyperplane, b is the bias term, C is the penalty coefficient, ξ_i are the slack variables, y_i ∈ {+1, -1} is the class label of sample x_i, and φ(x) is the mapping associated with the kernel function used by the SVM classifier.
In this embodiment, the kernel function is the radial basis kernel function, of the form:
$$K(x_i,x_j)=\exp\Bigl(-\frac{\|x_i-x_j\|^{2}}{2\sigma^{2}}\Bigr) \qquad \text{(Formula 4)}$$
where σ is the width parameter of the radial basis kernel function.
Fig. 2 is a flowchart of a software defect prediction method provided by another embodiment of the present invention. Referring to Fig. 2, this embodiment can be divided into three parts. The first part performs dimensionality reduction on the training data set and comprises steps S200 and S210; the second part comprises step S220; the third part comprises step S230.
Step S200: Obtain the first training data set used for software defect prediction;
Step S210: Reduce the dimensionality of the first training data set using the LLE algorithm. The data set used in this embodiment is the NASA MDP software defect data set, which is widely used in software defect prediction research and can be downloaded from the network. It comprises 13 sub-data sets, each recording the metric attributes and a flag for each module of an actual NASA software project, where the flag indicates whether the module is defective. After the first training data set is obtained, dimensionality reduction is performed on it. Specifically, the reduction can be divided into the following steps:
1) Let the first training data set be {X_1, X_2, ..., X_N}, X_i ∈ R^D, where R denotes the real space and D its dimension.
2) Determine the distance between each sample point and every other sample point, computed as d_ij = ||X_i - X_j||. After these distances are computed, the K points at the shortest distance are selected as the neighbors of that sample point;
3) Compute the local reconstruction weight matrix W from the neighbors of each sample point X_i so that the reconstruction error of the sample points is minimized, i.e. solve the optimization problem:
$$\min_{W}\ \varepsilon(W)=\sum_{i=1}^{N}\Bigl\|X_i-\sum_{j=1}^{N}w_{ij}X_j\Bigr\|^{2}\quad\text{s.t.}\quad\sum_{j=1}^{N}w_{ij}=1 \qquad \text{(Formula 1)}$$
where N is the number of sample points and w_ij is the coefficient with which the j-th neighbor represents the i-th sample point (w_ij = 0 when X_j is not a neighbor of X_i); w_ij is also a weight that represents the contribution of the j-th neighbor to the i-th sample point. Concretely, reducing dimensionality with the LLE algorithm represents each sample point of the data set by its K nearest neighbors. Each sample point therefore has K coefficients, one per neighbor; the coefficient of a single neighbor is a specific number, and the K coefficients of a sample point form a coefficient vector. The coefficient vectors of all sample points in the data set form the weight matrix W.
4) Then fix the local reconstruction weight matrix W obtained in the previous step and solve for the low-dimensional vector Y_i corresponding to each sample point X_i according to the objective function:
$$\min_{Y}\ \Phi(Y)=\sum_{i=1}^{N}\Bigl\|Y_i-\sum_{j=1}^{N}w_{ij}Y_j\Bigr\|^{2}=\sum_{i=1}^{N}\sum_{j=1}^{N}M_{ij}\,Y_i^{T}Y_j\quad\text{s.t.}\quad\sum_{i=1}^{N}Y_i=0,\ \ \frac{1}{N}\sum_{i=1}^{N}Y_iY_i^{T}=I \qquad \text{(Formula 2)}$$
where I is an identity matrix and M = (I - W)^T (I - W). The 2nd through (d+1)-th eigenvectors of M, ordered by ascending eigenvalue, are the output. Here d is the dimension of the sample points after reduction, and the final output is a d-dimensional low-dimensional vector for each sample point.
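Why the output consists of eigenvectors of M can be sketched as follows. The derivation assumes the standard LLE constraints (zero-mean embedding with unit covariance) and stacks the low-dimensional vectors as the rows of an N×d matrix Y; this notation is introduced here for illustration.

```latex
\begin{aligned}
\Phi(Y) &= \sum_{i=1}^{N}\Bigl\| Y_i - \sum_{j=1}^{N} w_{ij} Y_j \Bigr\|^{2}
         = \bigl\| (I-W)\,Y \bigr\|_{F}^{2}
         = \operatorname{tr}\bigl( Y^{T}(I-W)^{T}(I-W)\,Y \bigr)
         = \operatorname{tr}\bigl( Y^{T} M\, Y \bigr), \\
\text{subject to } \tfrac{1}{N} Y^{T}Y = I
        &\;\Longrightarrow\; M\,y_k = \lambda_k\, y_k \quad (k = 1,\dots,d),
         \qquad \Phi(Y) = N \sum_{k=1}^{d} \lambda_k , \\
\sum_{j=1}^{N} w_{ij} = 1
        &\;\Longrightarrow\; (I-W)\,\mathbf{1} = 0
         \;\Longrightarrow\; M\,\mathbf{1} = 0 .
\end{aligned}
```

Here y_k denotes the k-th column of Y. The constant vector is therefore an eigenvector of M with eigenvalue 0; it carries no information about the data and is discarded, which is why the eigenvectors numbered 2 through d+1 are kept.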
Through the above 4 steps, the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space is obtained. The second training data set formed by these low-dimensional vectors is then used to train the SVM classifier.
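Steps 2) and 3) can be sketched in plain Python as below. This is a minimal illustration, not the patent's implementation: the function names, the small regularization term `reg`, and the Gaussian-elimination helper are assumptions added so that the local linear system is always solvable.

```python
import math

def k_nearest_neighbors(points, i, k):
    """Indices of the k points closest to points[i] by Euclidean distance."""
    distances = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    distances.sort()
    return [j for _, j in distances[:k]]

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def reconstruction_weights(points, i, neighbors, reg=1e-3):
    """Weights w_ij that best rebuild points[i] from its neighbors and sum to 1."""
    k = len(neighbors)
    diffs = [[a - b for a, b in zip(points[j], points[i])] for j in neighbors]
    # Local Gram matrix: gram[a][b] = (X_a - X_i) . (X_b - X_i)
    gram = [[sum(u * v for u, v in zip(diffs[a], diffs[b])) for b in range(k)]
            for a in range(k)]
    trace = sum(gram[a][a] for a in range(k))
    for a in range(k):
        gram[a][a] += reg * trace if trace > 0 else reg  # keeps the system solvable
    w = solve_linear_system(gram, [1.0] * k)
    total = sum(w)
    return [value / total for value in w]  # enforce the sum-to-one constraint

points = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (5.0, 5.0)]
neighbors = k_nearest_neighbors(points, 0, k=2)
weights = reconstruction_weights(points, 0, neighbors)
```

In the example, the origin lies midway between its two nearest neighbors, so the two weights come out equal. Placing each weight vector into row i of an N×N matrix, with zeros for non-neighbors, gives the matrix W used in step 4).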
The second part trains the SVM classifier using the reduced data set:
Step S220: Input the reduced data set into the SVM classifier, optimize the parameters using grid search combined with ten-fold cross-validation, and train the SVM classifier.
Training the support vector machine (SVM) classifier on the second training data set and obtaining the optimal separating hyperplane function of the SVM classifier specifically includes the following process:
The SVM classifier is trained using the reduced second training data set; the training process solves for the optimal separating hyperplane of the SVM.
Specifically, the problem of training the SVM can be converted into a convex quadratic programming problem:
$$\min_{\omega,b,\xi}\ \frac{1}{2}\|\omega\|^{2}+C\sum_{i=1}^{N}\xi_i\quad\text{s.t.}\quad y_i\bigl(\omega^{T}\phi(x_i)+b\bigr)\ge 1-\xi_i,\ \ \xi_i\ge 0,\ \ i=1,\dots,N \qquad \text{(Formula 3)}$$
where ω is the d-dimensional vector orthogonal to the separating hyperplane, b is the bias term, C is the penalty coefficient, ξ_i are the slack variables, y_i ∈ {+1, -1} is the class label of sample x_i, and φ(x) is the mapping associated with the selected kernel function. The penalty factor C determines how heavily the loss caused by outliers is weighted. Clearly, when the sum of the slack variables of all outliers is fixed, the larger C is, the greater the loss added to the objective function, which means a greater unwillingness to give up these outliers. In the extreme case where C is set to infinity, a single point that deviates even slightly makes the objective value infinite and the problem unsolvable, so the problem degenerates to the hard-margin problem. The value of the slack variable ξ_i indicates how far the corresponding point deviates: the larger the value, the farther the point. The kernel function maps data from the low-dimensional space to a high-dimensional space, so that linearly inseparable data become linearly separable.
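The role of the penalty coefficient C can be made concrete with a small numerical sketch. It assumes a linear feature map (φ(x) = x) purely to keep the example short; the function name `soft_margin_objective` and the sample values are illustrative.

```python
def soft_margin_objective(w, b, C, samples, labels):
    """Return (objective value, slack variables) for a fixed hyperplane (w, b).

    Assumes a linear feature map; each slack is max(0, 1 - y * (w . x + b)).
    """
    margin_term = 0.5 * sum(v * v for v in w)
    slacks = [
        max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
        for x, y in zip(samples, labels)
    ]
    return margin_term + C * sum(slacks), slacks

samples = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]  # the last point is an outlier
labels = [1, -1, -1]
low_c, slacks = soft_margin_objective((1.0, 0.0), 0.0, 1.0, samples, labels)
high_c, _ = soft_margin_objective((1.0, 0.0), 0.0, 10.0, samples, labels)
```

Only the outlier has a nonzero slack, and the same slack costs ten times more when C is ten times larger, which is the sense in which a large C expresses unwillingness to give up outliers.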
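Because the radial basis kernel function has a wide convergence region, this embodiment uses the radial basis kernel function as the kernel function of the SVM classifier. The kernel function has the form:
$$K(x_i,x_j)=\exp\Bigl(-\frac{\|x_i-x_j\|^{2}}{2\sigma^{2}}\Bigr) \qquad \text{(Formula 4)}$$
Lagrange multipliers are introduced, and the above quadratic programming problem is simplified and solved using the standard Lagrangian duality principle, giving a sign decision function:
$$f(x)=\operatorname{sgn}\Bigl(\sum_{i=1}^{N}\alpha_i\,y_i\,K(x_i,x)+b\Bigr) \qquad \text{(Formula 5)}$$
where α_i are the Lagrange multipliers and y_i ∈ {+1, -1} is the class label of training sample x_i.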
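A sign decision function of this kind can be sketched as follows, assuming the standard dual form sgn(Σ α_i y_i K(x_i, x) + b) with a radial basis kernel. The support vectors, multipliers, and bias in the example are made-up values for illustration, not the result of an actual training run.

```python
import math

def rbf_kernel(x, z, sigma):
    """Radial basis kernel, assuming the exp(-||x - z||^2 / (2 sigma^2)) form."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

def decide(x, support_vectors, alphas, labels, b, sigma):
    """Return +1 or -1 according to the sign of the kernel expansion."""
    score = sum(
        alpha * y * rbf_kernel(sv, x, sigma)
        for sv, alpha, y in zip(support_vectors, alphas, labels)
    )
    return 1 if score + b >= 0 else -1

# Hypothetical trained model: one support vector per class.
support_vectors = [(0.0, 0.0), (4.0, 0.0)]
alphas = [1.0, 1.0]
labels = [1, -1]
near_first = decide((0.5, 0.0), support_vectors, alphas, labels, b=0.0, sigma=1.0)
near_second = decide((3.5, 0.0), support_vectors, alphas, labels, b=0.0, sigma=1.0)
```

Each query point takes the label of the support vector whose kernel value dominates the sum, so points near (0, 0) land on the positive side and points near (4, 0) on the negative side.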
To determine the penalty coefficient C of the SVM and the parameter σ of the radial basis kernel function, this embodiment optimizes the parameter C of the SVM classifier and the kernel parameter σ using grid search combined with ten-fold cross-validation: the pair of values of C and σ that gives the highest SVM classification accuracy is found, and it determines the optimal separating hyperplane function of the SVM classifier.
Specifically, in this embodiment the values of the optimal parameters C and σ are determined by grid search. A grid is laid over a previously given range of the two parameters, and every grid point is visited to take values. The value range of C is set to [2^-10, 2^7], the value range of σ is set to [2^-10, 2^3], and the step size of both parameters is 0.1. The search is carried out over every combination formed by all values in the value range of C and all values in the value range of σ.
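The grid itself can be enumerated in a few lines. The sketch below reads the step size of 0.1 as a step on the exponent of 2, which is an assumption: the text does not say whether the step applies to the exponent or to the parameter value. The function names are illustrative.

```python
def exponent_grid(low, high, step=0.1):
    """Values 2**e for e = low, low + step, ..., high (step on the exponent)."""
    count = round((high - low) / step) + 1
    return [2.0 ** (low + i * step) for i in range(count)]

def parameter_grid():
    """All (C, sigma) combinations over the two value ranges."""
    c_values = exponent_grid(-10, 7)      # C in [2^-10, 2^7]
    sigma_values = exponent_grid(-10, 3)  # sigma in [2^-10, 2^3]
    return [(c, sigma) for c in c_values for sigma in sigma_values]

grid = parameter_grid()
```

Under this reading the grid has 171 values of C and 131 values of σ, so every candidate pair is evaluated once in the search that follows.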
In this embodiment, for each selected pair of parameters C, σ, the classification accuracy under that pair is obtained by ten-fold cross-validation, and the pair C, σ with the highest classification accuracy is taken as the optimal parameter values. Ten-fold cross-validation is carried out as follows: the second training data set is divided into 10 subsets; 1 subset serves as the test set and the remaining 9 subsets serve as the training set, which yields 1 classification accuracy under the selected pair of parameters. This is repeated 10 times, giving 10 classification accuracies for that pair. The mean of these 10 accuracies is the index for evaluating each pair of parameters; the means of all selected pairs are then compared, and the pair C, σ with the highest mean is taken as the optimal parameter values.
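The ten-fold procedure and the final selection can be sketched as below. The classifier is abstracted as a `train_fn` callable that returns a prediction function; that interface, the interleaved fold assignment, and all names are assumptions made for illustration, and the example uses a trivial threshold rule in place of an SVM.

```python
def ten_fold_indices(n, folds=10):
    """Split range(n) into `folds` disjoint test subsets (requires n >= folds)."""
    return [list(range(f, n, folds)) for f in range(folds)]

def cross_validated_accuracy(samples, labels, train_fn, folds=10):
    """Mean accuracy over the folds; each subset is the test set exactly once."""
    accuracies = []
    for test_idx in ten_fold_indices(len(samples), folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        predict = train_fn([samples[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct = sum(1 for i in test_idx if predict(samples[i]) == labels[i])
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)

def select_best(grid, samples, labels, make_train_fn):
    """Return the parameter pair with the highest mean cross-validated accuracy."""
    return max(grid, key=lambda params: cross_validated_accuracy(
        samples, labels, make_train_fn(*params)))

# Toy stand-in for an SVM: a fixed threshold rule parameterized by `t`.
samples = list(range(20))
labels = [1 if x >= 10 else -1 for x in samples]
make_train_fn = lambda t, _unused: (lambda xs, ys: (lambda x: 1 if x >= t else -1))
best = select_best([(5, 0.0), (10, 0.0), (15, 0.0)], samples, labels, make_train_fn)
```

In real use, `make_train_fn(C, sigma)` would return a function that fits an SVM with those parameters on the training folds; the grid of candidate pairs is then passed in unchanged.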
After the optimal values of the parameters C and σ are found, the optimal separating hyperplane function of the SVM classifier is determined, and the trained SVM classifier is thus obtained.
The third part performs defect prediction on the software under test using the trained SVM classifier.
Step S230: Perform software defect prediction using the trained SVM classifier;
Specifically, in this embodiment the data set of the software to be predicted is first reduced in dimensionality using the LLE algorithm. If the input data fall into the defect-free region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain no defect and is marked accordingly in the output of the SVM classifier. If the input data fall into the defective region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain a defect and is marked accordingly in the output of the SVM classifier.
In this embodiment, in the output of the SVM classifier, a software module that has a defect is marked with the letter Y, and a software module that has no defect is marked with the letter N.
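The marking step amounts to mapping each decision value to a letter. The sketch below assumes the positive side of the hyperplane is the defective region; the text does not fix which sign corresponds to which region, and the module names are hypothetical.

```python
def label_modules(module_names, decision_values):
    """Mark each module 'Y' (defective) or 'N' (no defect) from its decision value.

    Assumes positive decision values correspond to the defective region.
    """
    return {
        name: ("Y" if value > 0 else "N")
        for name, value in zip(module_names, decision_values)
    }

marks = label_modules(["parser", "scheduler", "logger"], [0.8, -1.2, -0.1])
```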
Thus, the software defect prediction method provided by this embodiment of the present invention uses the locally linear embedding algorithm to reduce the dimensionality of the training data set, which keeps the geometric structure of the sample points in the data set unchanged after reduction, so that the reduced data reflect the features of the original data set more completely.
Another embodiment of the present invention further provides a software defect prediction system. Fig. 3 is a block diagram of the software defect prediction system provided by this embodiment. Referring to Fig. 3, the system 300 comprises a dimensionality reduction unit 310, an SVM training unit 320, and a defect prediction unit 330;
The dimensionality reduction unit 310 is configured to perform dimensionality reduction on the first training data set according to the locally linear embedding (LLE) algorithm, obtain the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space, and obtain the second training data set composed of these low-dimensional vectors;
The SVM training unit 320 is configured to train the support vector machine (SVM) classifier on the second training data set, obtain the optimal separating hyperplane function of the SVM classifier, and thus obtain the trained SVM classifier;
The defect prediction unit 330 is configured to perform defect prediction on the software to be predicted using the trained SVM classifier.
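How the three units fit together can be sketched as a small class. This is an architectural illustration only: the class name, the callable-based interfaces, and the stub reduction and training functions in the example are assumptions, with the real units standing in for LLE and SVM training.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]

@dataclass
class DefectPredictionSystem:
    # Unit 310: maps a data set to its low-dimensional vectors (LLE in the patent).
    reduce_dimension: Callable[[List[Vector]], List[Vector]]
    # Unit 320: trains on reduced vectors and labels, returns a +1/-1 classifier.
    train_classifier: Callable[[List[Vector], List[int]], Callable[[Vector], int]]

    def fit(self, first_training_set, labels):
        second_training_set = self.reduce_dimension(first_training_set)
        self._classifier = self.train_classifier(second_training_set, labels)
        return self

    def predict(self, modules):
        # Unit 330: reduce the data of the software to be predicted, then classify.
        reduced = self.reduce_dimension(modules)
        return ["Y" if self._classifier(v) == 1 else "N" for v in reduced]

# Stubs standing in for LLE and SVM training, for illustration only.
keep_first_column = lambda rows: [row[:1] for row in rows]
threshold_trainer = lambda xs, ys: (lambda v: 1 if v[0] > 0 else -1)

system = DefectPredictionSystem(keep_first_column, threshold_trainer)
system.fit([(1.0, 7.0), (-1.0, 7.0)], [1, -1])
predictions = system.predict([(2.0, 9.0), (-3.0, 9.0)])
```

Passing the units in as callables keeps the same reduction step on both the training path and the prediction path, which matches the requirement that the data of the software to be predicted also go through dimensionality reduction.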
In one embodiment of the present invention, performing dimensionality reduction on the first training data set according to the LLE algorithm, obtaining the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space, and obtaining the second training data set composed of these low-dimensional vectors includes:
Let the first training data set be {X_1, X_2, ..., X_N}, X_i ∈ R^D, where X_i is a vector in D-dimensional space;
Compute the K nearest neighbors of each sample point X_i in the first training data set;
Compute the local reconstruction weight matrix W from the K nearest neighbors of each sample point according to Formula 1;
$$\min_{W}\ \varepsilon(W)=\sum_{i=1}^{N}\Bigl\|X_i-\sum_{j=1}^{N}w_{ij}X_j\Bigr\|^{2}\quad\text{s.t.}\quad\sum_{j=1}^{N}w_{ij}=1 \qquad \text{(Formula 1)}$$
where N is the number of sample points and w_ij is the coefficient with which the j-th neighbor represents the i-th sample point X_i (w_ij = 0 when X_j is not a neighbor of X_i); the coefficients with which all sample points X_i of the first training data set are represented by their neighbors form the local reconstruction weight matrix W of all sample points;
Compute the low-dimensional vector Y_i corresponding to each sample point from the obtained local reconstruction weight matrix W and the neighbors of the sample point according to Formula 2;
$$\min_{Y}\ \Phi(Y)=\sum_{i=1}^{N}\Bigl\|Y_i-\sum_{j=1}^{N}w_{ij}Y_j\Bigr\|^{2}=\sum_{i=1}^{N}\sum_{j=1}^{N}M_{ij}\,Y_i^{T}Y_j\quad\text{s.t.}\quad\sum_{i=1}^{N}Y_i=0,\ \ \frac{1}{N}\sum_{i=1}^{N}Y_iY_i^{T}=I \qquad \text{(Formula 2)}$$
where I is the identity matrix and M = (I - W)^T (I - W).
In one embodiment of the present invention, training the support vector machine (SVM) classifier on the second training data set and obtaining the optimal separating hyperplane function of the SVM classifier includes:
The optimal separating hyperplane function of the SVM classifier is solved according to Formula 3:
$$\min_{\omega,b,\xi}\ \frac{1}{2}\|\omega\|^{2}+C\sum_{i=1}^{N}\xi_i\quad\text{s.t.}\quad y_i\bigl(\omega^{T}\phi(x_i)+b\bigr)\ge 1-\xi_i,\ \ \xi_i\ge 0,\ \ i=1,\dots,N \qquad \text{(Formula 3)}$$
where ω is the d-dimensional vector orthogonal to the separating hyperplane, b is the bias term, C is the penalty coefficient, ξ_i are the slack variables, y_i ∈ {+1, -1} is the class label of sample x_i, and φ(x) is the mapping associated with the kernel function used by the SVM classifier.
In one embodiment of the present invention, the kernel function is the radial basis kernel function, of the form:
$$K(x_i,x_j)=\exp\Bigl(-\frac{\|x_i-x_j\|^{2}}{2\sigma^{2}}\Bigr) \qquad \text{(Formula 4)}$$
where σ is the width parameter of the radial basis kernel function.
In one embodiment of the present invention, the SVM training unit is further configured to optimize the parameter C of the SVM classifier and the kernel parameter σ using grid search combined with ten-fold cross-validation, finding the pair of values of C and σ that gives the highest SVM classification accuracy and thereby determining the optimal separating hyperplane function of the SVM classifier.
In one embodiment of the present invention, optimizing the parameter C of the SVM classifier and the kernel parameter σ using grid search combined with ten-fold cross-validation includes:
Selecting values for the parameters C and σ by grid search, where the value range of C is set to [2^-10, 2^7], the value range of σ is set to [2^-10, 2^3], and the step size of both parameters is 0.1; the search is carried out over every combination formed by all values in the value range of C and all values in the value range of σ.
In one embodiment of the present invention, optimizing the parameter C of the SVM classifier and the kernel parameter σ using grid search combined with ten-fold cross-validation further includes:
For each selected pair of parameters C, σ, obtaining the classification accuracy under that pair by ten-fold cross-validation, and taking the pair with the highest classification accuracy as the optimal parameter values. Ten-fold cross-validation means dividing the second training data set into 10 subsets; 1 subset serves as the test set and the remaining 9 subsets serve as the training set, which yields 1 classification accuracy under the selected pair of parameters. This is repeated 10 times, giving 10 classification accuracies for that pair. The mean of these 10 accuracies is the index for evaluating each pair of parameters; the means of all selected pairs are then compared, and the pair C, σ with the highest mean is taken as the optimal parameter values.
In one embodiment of the invention, performing software defect prediction according to the trained SVM classifier includes:
Dimension-reduction processing is performed on the data set of the software to be predicted using the LLE algorithm;
The dimension-reduced data set is input into the trained SVM classifier for judgment. If the input data fall into the defect-free space determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain no defect and is marked accordingly in the output result of the SVM classifier. If the input data fall into the defective space determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain a defect and is marked accordingly in the output result of the SVM classifier.
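The judgment step can be illustrated with a small sketch of a kernel SVM decision rule. Several things here are assumptions rather than statements of the patent: the kernel is written as exp(-||x - z||^2 / (2σ^2)), the positive side of the hyperplane is taken to be the defective space, and the support vectors, coefficients and bias are hypothetical inputs that a trained classifier would supply.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel; the 2 * sigma**2 scaling is an assumed convention."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision_value(x, support_vectors, coeffs, bias, sigma):
    """f(x) = sum_i coeff_i * K(sv_i, x) + bias, with coeff_i = alpha_i * y_i."""
    return sum(c * rbf_kernel(sv, x, sigma)
               for sv, c in zip(support_vectors, coeffs)) + bias


def label_module(x, support_vectors, coeffs, bias, sigma):
    """Mark a module by the side of the hyperplane its reduced vector falls on."""
    value = decision_value(x, support_vectors, coeffs, bias, sigma)
    # Assumption: positive side = defective, non-positive side = defect-free.
    return "defective" if value > 0 else "defect-free"
```

Each dimension-reduced vector of the software under prediction would be passed to `label_module`, and the returned mark recorded as the classifier output for the corresponding module.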
It should be emphasized that the process by which the software defect prediction system provided by the embodiment of the present invention performs software defect prediction can be summarized as the process of building a prediction model based on the LLE algorithm and the SVM classifier. Building the prediction model mainly involves two modules: the first is dimension-reduction processing and the second is defect prediction. In dimension-reduction processing, the training set used by the SVM classifier needs to be dimension-reduced; likewise, in practical applications, the test data set of the software under test is also reduced using LLE, and the specific prediction is then made according to the dimension-reduced data set and the obtained SVM optimal separating hyperplane function. This ensures that the dimension-reduced data set reflects the data characteristics of the original data more comprehensively, thereby improving the accuracy of software defect prediction.
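The formulas that the claims cite as formula 1, formula 2 and formula 3, together with the form of the radial basis kernel function, are not reproduced in this text. For orientation only, the textbook forms of the LLE and soft-margin SVM problems are written out below with the symbols the claims define. These are assumed standard forms, not a reconstruction of the patent's own formulas; in particular the neighbor notation x_{ij} (the j-th neighbor of x_i), the low-dimensional vectors y_i, the class labels c_i ∈ {-1, +1}, and the 2σ^2 scaling of the kernel are assumptions.

```latex
% LLE local reconstruction weights (textbook form)
\min_{W}\ \varepsilon(W) = \sum_{i=1}^{N} \Bigl\| x_i - \sum_{j=1}^{K} w_{ij}\, x_{ij} \Bigr\|^{2}
\quad \text{s.t.} \quad \sum_{j=1}^{K} w_{ij} = 1

% LLE low-dimensional embedding (textbook form), with M = (I - W)^{T} (I - W)
\min_{Y}\ \Phi(Y) = \sum_{i=1}^{N} \Bigl\| y_i - \sum_{j=1}^{K} w_{ij}\, y_{ij} \Bigr\|^{2}
= \operatorname{tr}\bigl( Y M Y^{T} \bigr)
\quad \text{s.t.} \quad \sum_{i=1}^{N} y_i = 0, \qquad \frac{1}{N} \sum_{i=1}^{N} y_i y_i^{T} = I

% Soft-margin SVM (textbook form)
\min_{\omega,\, b,\, \xi}\ \frac{1}{2} \|\omega\|^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{s.t.} \quad c_i \bigl( \omega^{T} \phi(x_i) + b \bigr) \ge 1 - \xi_i, \qquad \xi_i \ge 0

% Gaussian radial basis kernel (one common scaling of the width parameter)
K(x, x') = \exp\!\left( -\frac{\| x - x' \|^{2}}{2 \sigma^{2}} \right)
```

In the textbook treatment, the embedding problem is solved by taking the eigenvectors of M that belong to its smallest nonzero eigenvalues.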
The software defect prediction system provided by the embodiment of the present invention corresponds to the software defect prediction method described above; for its specific use, refer to the related content in the foregoing method embodiment, which is not repeated here.
In summary, the software defect prediction method and software defect prediction system provided by the embodiments of the present invention use the locally linear embedding (LLE) algorithm to perform dimension reduction on the training dataset, so that the dimension-reduced data reflect the features of the original dataset more completely; the optimal separating hyperplane function of the SVM is then obtained and used to perform software defect prediction, thereby improving the accuracy of software defect prediction.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (6)

1. A software defect prediction method, characterized by comprising the following steps:
Step 1: performing dimension-reduction processing on a first training dataset according to the locally linear embedding (LLE) algorithm, obtaining the low-dimensional vector to which each sample point in the first training dataset is mapped in the low-dimensional space, and obtaining a second training dataset composed of the low-dimensional vectors, wherein the first training dataset is the NASA MDP software defect dataset;
Wherein the second training dataset is obtained as follows:
1.1 Let the first training dataset be {x_1, x_2, ..., x_N}, x_i ∈ R^D, where x_i is a vector in D-dimensional space;
1.2 Compute the K nearest neighbors of each sample point x_i in the first training dataset;
1.3 Compute the local reconstruction weight matrix W according to formula 1, using the K nearest neighbors of each sample point;
Wherein N is the number of sample points, and w_ij is the coefficient with which the i-th sample point x_i is represented by its j-th neighbor; the coefficients with which all sample points x_i in the first training dataset are represented by their neighbors constitute the local reconstruction weight matrix W;
1.4 Compute the low-dimensional vector corresponding to each sample point according to formula 2, from the obtained local reconstruction weight matrix W and the neighbors of the sample points;
Wherein I is the identity matrix and M = (I - W)^T (I - W);
Step 2: training a support vector machine (SVM) classifier according to the second training dataset, obtaining the optimal separating hyperplane function of the SVM classifier, and thereby obtaining a trained SVM classifier;
Wherein the trained SVM classifier is obtained as follows:
The optimal separating hyperplane function of the SVM classifier is solved according to formula 3;
Wherein ω is a d-dimensional vector orthogonal to the separating hyperplane, b is the bias term, C is the penalty coefficient, ξ_i is a slack variable, and φ(x) is the kernel function used by the SVM classifier;
Wherein the kernel function is a radial basis kernel function of the following form:
Wherein σ is the width parameter of the radial basis kernel function;
Step 3: performing defect prediction on the software to be predicted according to the trained SVM classifier;
In obtaining the optimal separating hyperplane function of the SVM classifier as described above, the grid search method and the ten-fold cross-validation method are used to optimize the parameter C and the kernel function parameter σ of the SVM classifier, finding the values of C and σ that give the highest SVM classification accuracy, so as to determine the optimal separating hyperplane function of the SVM classifier;
The above optimization of the parameter C and the kernel function parameter σ of the SVM classifier using the grid search method and the ten-fold cross-validation method includes: for each selected group of parameters C, σ, obtaining the classification accuracy under that group of parameter values, verifying it using the ten-fold cross-validation method, and taking the group of parameters giving the highest classification accuracy as the optimal parameter values; wherein verification using the ten-fold cross-validation method means that the second data set is divided into 10 subsets, 1 subset serves as the test set and the remaining 9 subsets serve as the training set, giving 1 classification accuracy under the selected group of parameters, and this is repeated 10 times; 10 classification accuracies under that group of parameters are obtained, the mean of these 10 classification accuracies is used as the index for evaluating each group of parameters, the mean classification accuracies of all selected groups of parameters are then compared, and the group of parameters C, σ with the highest mean is taken as the optimal parameter values.
2. The software defect prediction method according to claim 1, characterized in that the above optimization of the parameter C and the kernel function parameter σ of the SVM classifier using the grid search method and the ten-fold cross-validation method includes: assigning values to the parameters C and σ using the grid search method; and obtaining and searching all combinations formed from every value in the value interval of C and every value in the value interval of σ.
3. The software defect prediction method according to claim 1 or 2, characterized in that software defect prediction according to the optimal separating hyperplane function uses the following method:
First, dimension-reduction processing is performed on the data set of the software to be predicted using the LLE algorithm;
Second, the dimension-reduced data set is input into the trained SVM classifier for judgment; if the input data fall into the defect-free space determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain no defect and is marked accordingly in the output result of the SVM classifier; if the input data fall into the defective space determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain a defect and is marked accordingly in the output result of the SVM classifier.
4. A software defect prediction system, characterized by comprising: a dimension-reduction processing unit, an SVM training unit and a defect prediction unit;
The dimension-reduction processing unit is configured to perform dimension-reduction processing on a first training dataset according to the locally linear embedding (LLE) algorithm, obtain the low-dimensional vector to which each sample point in the first training dataset is mapped in the low-dimensional space, and obtain a second training dataset composed of the low-dimensional vectors; wherein the first training dataset is the NASA MDP software defect dataset;
Wherein the second training dataset composed of the low-dimensional vectors is obtained as follows:
1.1 Let the first training dataset be {x_1, x_2, ..., x_N}, x_i ∈ R^D, where x_i is a vector in D-dimensional space;
1.2 Compute the K nearest neighbors of each sample point x_i in the first training dataset;
1.3 Compute the local reconstruction weight matrix W according to formula 1, using the K nearest neighbors of each sample point;
Wherein N is the number of sample points, and w_ij is the coefficient with which the i-th sample point x_i is represented by its j-th neighbor; the coefficients with which all sample points x_i in the first training dataset are represented by their neighbors constitute the local reconstruction weight matrix W;
1.4 Compute the low-dimensional vector corresponding to each sample point according to formula 2, from the obtained local reconstruction weight matrix W and the neighbors of the sample points;
Wherein I is the identity matrix and M = (I - W)^T (I - W);
The SVM training unit is configured to train a support vector machine (SVM) classifier according to the second training dataset, obtain the optimal separating hyperplane function of the SVM classifier, and thereby obtain a trained SVM classifier;
Wherein the trained SVM classifier is obtained as follows:
The optimal separating hyperplane function of the SVM classifier is solved according to formula 3;
Wherein ω is a d-dimensional vector orthogonal to the separating hyperplane, b is the bias term, C is the penalty coefficient, ξ_i is a slack variable, and φ(x) is the kernel function used by the SVM classifier;
The kernel function is a radial basis kernel function of the following form:
Wherein σ is the width parameter of the radial basis kernel function;
The defect prediction unit is configured to perform defect prediction on the software to be predicted according to the trained SVM classifier;
In obtaining the optimal separating hyperplane function of the SVM classifier as described above, the grid search method and the ten-fold cross-validation method are used to optimize the parameter C and the kernel function parameter σ of the SVM classifier, finding the values of C and σ that give the highest SVM classification accuracy, so as to determine the optimal separating hyperplane function of the SVM classifier;
The above optimization of the parameter C and the kernel function parameter σ of the SVM classifier using the grid search method and the ten-fold cross-validation method includes: for each selected group of parameters C, σ, obtaining the classification accuracy under that group of parameter values, verifying it using the ten-fold cross-validation method, and taking the group of parameters giving the highest classification accuracy as the optimal parameter values; wherein verification using the ten-fold cross-validation method means that the second data set is divided into 10 subsets, 1 subset serves as the test set and the remaining 9 subsets serve as the training set, giving 1 classification accuracy under the selected group of parameters, and this is repeated 10 times; 10 classification accuracies under that group of parameters are obtained, the mean of these 10 classification accuracies is used as the index for evaluating each group of parameters, the mean classification accuracies of all selected groups of parameters are then compared, and the group of parameters C, σ with the highest mean is taken as the optimal parameter values.
5. The software defect prediction system according to claim 4, characterized in that the above optimization of the parameter C and the kernel function parameter σ of the SVM classifier using the grid search method and the ten-fold cross-validation method includes: assigning values to the parameters C and σ using the grid search method; and obtaining and searching all combinations formed from every value in the value interval of C and every value in the value interval of σ.
6. The software defect prediction system according to claim 4 or 5, characterized in that software defect prediction according to the optimal separating hyperplane function uses the following method:
First, dimension-reduction processing is performed on the data set of the software to be predicted using the LLE algorithm;
Second, the dimension-reduced data set is input into the trained SVM classifier for judgment; if the input data fall into the defect-free space determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain no defect and is marked accordingly in the output result of the SVM classifier; if the input data fall into the defective space determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain a defect and is marked accordingly in the output result of the SVM classifier.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410056779.9A CN103810101B (en) 2014-02-19 2014-02-19 A kind of Software Defects Predict Methods and software defect forecasting system

Publications (2)

Publication Number Publication Date
CN103810101A CN103810101A (en) 2014-05-21
CN103810101B true CN103810101B (en) 2019-02-19

Family

ID=50706897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410056779.9A Active CN103810101B (en) 2014-02-19 2014-02-19 Software defect prediction method and software defect prediction system

Country Status (1)

Country Link
CN (1) CN103810101B (en)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
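Face recognition method based on locally linear embedding and Haar wavelet; Li Weisheng (李伟生), Zhang Qin (张勤); Computer Engineering and Applications (《计算机工程与应用》); 2011-12-31; Vol. 47, No. 4; full text
Research on intrusion detection methods based on data dimension reduction and support vector machines; Xiao Haiming (肖海明); China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库·信息科技辑》); 2011-05-15; Vol. 2011, No. 5; main text pp. 15-38
Research on kernel functions of support vector classification machines; Li Hongying (李红英); China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库·信息科技辑》); 2009-12-15; Vol. 2009, No. 12; main text pp. 16-23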

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant