CN103810101B - Software defect prediction method and software defect prediction system
- Publication number: CN103810101B
- Application number: CN201410056779.9A
- Authority: CN (China)
- Prior art keywords: parameter, SVM classifier, software, training, sample point
- Legal status: Active (the listed status is an assumption and is not a legal conclusion)
Abstract
The present invention provides a software defect prediction method and a software defect prediction system to solve the problem that existing software defect prediction is not sufficiently accurate. The system comprises a dimension reduction unit, an SVM training unit, and a defect prediction unit. The method comprises: Step 1, performing dimension reduction on a first training data set according to the locally linear embedding (LLE) algorithm to obtain the low-dimensional vector to which each sample point in the first training data set is mapped in the low-dimensional space, thereby obtaining a second training data set composed of these low-dimensional vectors; Step 2, training a support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier, and hence a trained SVM classifier; Step 3, performing defect prediction on the software to be predicted using the trained SVM classifier.
Description
Technical field
The present invention relates to the field of software security, and in particular to a software defect prediction method and a software defect prediction system.
Background art
Software defect prediction technology emerged in the 1970s. Its main role is to guide software quality assurance and to provide a valuable reference for balancing software cost. Software defect prediction is broadly divided into dynamic prediction and static prediction. Current research concentrates mainly on static prediction, and the present invention belongs to the defect distribution prediction techniques within static prediction. The support vector machine (SVM) is a machine learning method developed on the basis of statistical learning theory, with many distinctive advantages in solving small-sample, nonlinear, and high-dimensional pattern recognition problems. Existing software defect prediction mainly uses the support vector machine as a tool for building prediction models of software defects. Patents related to software defect prediction mainly include: a defect prediction method and system based on requirement changes (publication number CN200910080742) and a software defect priority prediction method based on an improved support vector machine (publication number CN201210057888).
The prior-art approach consists of two parts: dimension reduction of the data set and optimization of the support vector machine parameters. The prior art proposes different solutions to these two problems and has achieved some results. However, the dimension reduction methods chosen in the prior art have certain limitations: the result after dimension reduction cannot guarantee the integrity of the original data, nor does it best reflect the intrinsic dimensionality. Because software defect prediction is itself an operation on data sets, preserving data integrity is very important for ensuring the accuracy of the prediction results.
Summary of the invention
The present invention provides a software defect prediction method and a software defect prediction system to solve the problem that existing software defect prediction is not sufficiently accurate.
A software defect prediction method comprises the following steps:
Step 1: performing dimension reduction on a first training data set according to the locally linear embedding (LLE) algorithm to obtain the low-dimensional vector to which each sample point in the first training data set is mapped in the low-dimensional space, thereby obtaining a second training data set composed of these low-dimensional vectors;
Step 2: training a support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier, and hence a trained SVM classifier;
Step 3: performing defect prediction on the software to be predicted using the trained SVM classifier.
In Step 1, the second training data set composed of the low-dimensional vectors is obtained as follows:
1.1 Let the first training data set be {X_1, X_2, ..., X_N}, X_i ∈ R^D, where X_i is a vector in D-dimensional space;
1.2 Compute the K nearest neighbors of each sample point X_i in the first training data set;
1.3 Using the K nearest neighbors of each sample point, compute the local reconstruction weight matrix W according to Formula 1;
Formula 1
where N is the number of sample points and w_ij is the coefficient of the j-th neighbor in the representation of the i-th sample point X_i; the coefficients with which all sample points X_i in the first training data set are represented by their neighbors constitute the local reconstruction weight matrix W;
1.4 From the local reconstruction weight matrix W obtained and the neighbors of each sample point, compute the low-dimensional vector corresponding to each sample point according to Formula 2;
Formula 2
where I is the identity matrix and M = (I - W)^T (I - W).
In Step 2, the trained SVM classifier is obtained as follows:
The optimal separating hyperplane function of the SVM classifier is solved according to Formula 3:
Formula 3
where ω is the d-dimensional vector orthogonal to the separating hyperplane, b is the bias term, C is the penalty coefficient, ξ_i are the slack variables, and φ(x) is the feature mapping associated with the kernel function used by the SVM classifier.
The kernel function is the radial basis function (RBF) kernel, of the form:
Formula 4
where σ is the width parameter of the RBF kernel.
In obtaining the optimal separating hyperplane function of the SVM classifier, the grid search method and the ten-fold cross-validation method are used to optimize the parameter C of the SVM classifier and the kernel parameter σ: the pair of values of C and σ that gives the highest SVM classification accuracy is found, and the optimal separating hyperplane function of the SVM classifier is determined accordingly.
Optimizing the parameter C and the kernel parameter σ with the grid search method and the ten-fold cross-validation method includes: selecting values of C and σ by grid search, that is, searching over every combination of a value in the value range of C and a value in the value range of σ.
It further includes: for each selected pair of parameters C, σ, obtaining the classification accuracy under that pair by ten-fold cross-validation, and taking the pair with the highest classification accuracy as the optimal parameter values. Ten-fold cross-validation here means dividing the second data set into 10 subsets, using 1 subset as the test set and the remaining 9 subsets as the training set to obtain one classification accuracy under the selected pair of parameters, and repeating this 10 times. This gives 10 classification accuracies for the pair, and their mean serves as the index for evaluating each pair of parameters. The mean classification accuracies of all selected pairs are then compared, and the pair C, σ with the highest mean is taken as the optimal parameter values.
Software defect prediction according to the optimal separating hyperplane function is carried out as follows:
First, dimension reduction is performed on the data set of the software to be predicted using the LLE algorithm;
Second, the reduced data set is input into the trained SVM classifier for classification. If the input data fall in the defect-free region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain no defect and is marked accordingly in the output of the SVM classifier; if the input data fall in the defective region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain a defect and is marked accordingly in the output of the SVM classifier.
A software defect prediction system comprises a dimension reduction unit, an SVM training unit, and a defect prediction unit;
The dimension reduction unit is configured to perform dimension reduction on the first training data set according to the locally linear embedding (LLE) algorithm, obtain the low-dimensional vector to which each sample point in the first training data set is mapped in the low-dimensional space, and thereby obtain the second training data set composed of these low-dimensional vectors;
The SVM training unit is configured to train the support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier, and hence a trained SVM classifier;
The defect prediction unit is configured to perform defect prediction on the software to be predicted using the trained SVM classifier.
Beneficial effects of the present invention:
In the software defect prediction method and software defect prediction system provided by the invention, first, the locally linear embedding algorithm is used to reduce the dimensionality of the training data set, which keeps the geometric structure of the sample points unchanged after dimension reduction, so that the reduced data reflect the features of the original data set more completely. Second, grid search is used to optimize the SVM parameter C and the kernel parameter σ, combined with ten-fold cross-validation to find the pair of values C, σ that gives the highest SVM classification accuracy. This pair is taken as the optimal parameters, the optimal separating hyperplane function of the SVM is determined from it, and software defect prediction is carried out with that function, thereby improving the accuracy of software defect prediction.
Brief description of the drawings
Fig. 1 is a block diagram of a software defect prediction method provided by one embodiment of the present invention;
Fig. 2 is a flow chart of a software defect prediction method provided by another embodiment of the present invention;
Fig. 3 is a block diagram of a software defect prediction system provided by another embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the invention are described in further detail below with reference to the accompanying drawings.
The technical concept of the invention addresses the limitation of existing dimension reduction methods, namely that the result after dimension reduction does not guarantee the integrity of the data and does not best reflect the intrinsic dimensionality. The embodiments of the present invention use the locally linear embedding (LLE) algorithm to reduce the dimensionality of the software defect data set. The algorithm starts from the spatial structure of the sample data and keeps the geometric structure of the data samples unchanged after dimension reduction, so that the reduced data reflect the features of the original data set more completely. Because software defect prediction is itself an operation on data sets, reflecting the features of the original data more completely is very important for improving the accuracy of the prediction results.
One embodiment of the present invention provides a software defect prediction method. Fig. 1 is a block diagram of the software defect prediction method provided by this embodiment. Referring to Fig. 1, the method includes:
Step S100: performing dimension reduction on the first training data set according to the locally linear embedding (LLE) algorithm to obtain the low-dimensional vector to which each sample point in the first training data set is mapped in the low-dimensional space, thereby obtaining the second training data set composed of these low-dimensional vectors;
Step S110: training the support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier, and hence a trained SVM classifier;
Step S120: performing defect prediction on the software to be predicted according to the optimal separating hyperplane function.
In this embodiment, performing dimension reduction on the first training data set according to the LLE algorithm, obtaining the low-dimensional vector to which each sample point in the first training data set is mapped in the low-dimensional space, and obtaining the second training data set composed of these low-dimensional vectors includes:
Let the first training data set be {X_1, X_2, ..., X_N}, X_i ∈ R^D, where X_i is a vector in D-dimensional space;
Compute the K nearest neighbors of each sample point X_i in the first training data set;
Using the K nearest neighbors of each sample point, compute the local reconstruction weight matrix W according to Formula 1;
Formula 1
where N is the number of sample points and w_ij is the coefficient of the j-th neighbor in the representation of the i-th sample point X_i; the coefficients with which all sample points X_i in the first training data set are represented by their neighbors constitute the local reconstruction weight matrix W;
From the local reconstruction weight matrix W obtained and the neighbors of each sample point, compute the low-dimensional vector corresponding to each sample point according to Formula 2;
Formula 2
where I is the identity matrix and M = (I - W)^T (I - W).
In this embodiment, training the support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier includes:
The optimal separating hyperplane function of the SVM classifier is solved according to Formula 3:
Formula 3
where ω is the d-dimensional vector orthogonal to the separating hyperplane, b is the bias term, C is the penalty coefficient, ξ_i are the slack variables, and φ(x) is the feature mapping associated with the kernel function used by the SVM classifier.
In this embodiment, the kernel function is the radial basis function (RBF) kernel, of the form:
Formula 4
where σ is the width parameter of the RBF kernel.
Fig. 2 is a flow chart of a software defect prediction method provided by another embodiment of the present invention. Referring to Fig. 2, this embodiment can be divided into three parts. The first part performs dimension reduction on the training data set and includes steps S200 and S210; the second part includes step S220; the third part includes step S230.
Step S200: obtaining the first training data set used for software defect prediction;
Step S210: reducing the dimensionality of the first training data set using the LLE algorithm. The data set used in this embodiment is the NASA MDP software defect data set, which is widely used in software defect prediction research and can be downloaded from the Internet. The data set contains 13 sub data sets. Each sub data set records the metric attributes and a flag for each module of an actual NASA software project, where the flag indicates whether the module has a defect. After the first training data set is obtained, dimension reduction is performed on it. Specifically, the dimension reduction can be divided into the following steps:
1) Let the first training data set be {X_1, X_2, ..., X_N}, X_i ∈ R^D, where R denotes the space and D its dimension.
2) Determine the distance between each sample point and every other sample point, computed as d_ij = ||X_i - X_j||; after these distances have been computed, select the K sample points at the shortest distance as the nearest neighbors;
3) From the nearest neighbors of each sample point X_i, compute the local reconstruction weight matrix W so that the reconstruction error of the sample points is minimized, that is, solve the optimization problem:
Formula 1
where N is the number of sample points and w_ij is the coefficient of the j-th neighbor in the representation of the i-th sample point; w_ij is also a weight that represents the contribution of the j-th neighbor to the i-th sample point. Concretely, the LLE algorithm represents each sample point in the data set by its K nearest neighbors. Each sample point therefore has K coefficients, one per neighbor; the coefficient for a single neighbor is a specific number, and the K coefficients of a sample point form a coefficient vector. The coefficient vectors of all sample points in the data set together form the weight matrix W.
4) Then, with the local reconstruction weight matrix W obtained in the previous step held fixed, solve for the low-dimensional vector Y_i corresponding to each sample point X_i according to the objective function:
Formula 2
where I is the identity matrix and M = (I - W)^T (I - W); the eigenvectors of M corresponding to its 2nd through (d+1)-th smallest eigenvalues are the output. Here d is the dimension of the sample points after dimension reduction, and the final output consists of d-dimensional vectors.
Through the above 4 steps, the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space is obtained; the SVM classifier is then trained with the second training data set composed of these low-dimensional vectors.
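Steps 2) and 3) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names are hypothetical, and the small regularisation term `reg` and the sum-to-one normalisation come from the standard LLE formulation rather than from this text.

```python
import math

def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean distance), excluding i."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    return [j for _, j in sorted(dists)[:k]]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def reconstruction_weights(points, i, neighbors, reg=1e-3):
    """Weights w_ij minimising ||X_i - sum_j w_ij X_j||^2 with sum_j w_ij = 1."""
    diffs = [[a - b for a, b in zip(points[j], points[i])] for j in neighbors]
    k = len(neighbors)
    # Local Gram matrix of the neighbors, centred on X_i.
    gram = [[sum(x * y for x, y in zip(diffs[r], diffs[c])) for c in range(k)]
            for r in range(k)]
    trace = sum(gram[r][r] for r in range(k))
    shift = reg * trace if trace > 0 else reg          # assumed regularisation
    for r in range(k):
        gram[r][r] += shift
    w = solve(gram, [1.0] * k)
    total = sum(w)
    return [v / total for v in w]                      # enforce sum-to-one
```

Collecting the weights of every sample point row by row gives W; step 4) then needs the eigenvectors of M = (I - W)^T (I - W), which in practice would come from a numerical linear algebra library.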
Second part: training the SVM classifier with the reduced data set.
Step S220: inputting the reduced data set into the SVM classifier, optimizing the parameters with the grid search method combined with the ten-fold cross-validation method, and training the SVM classifier.
Training the support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier specifically includes the following process:
The SVM classifier is trained with the reduced second training data set; the training process amounts to solving for the optimal separating hyperplane of the SVM.
Specifically, training the SVM can be converted into a convex quadratic programming problem:
Formula 3
where ω is the d-dimensional vector orthogonal to the separating hyperplane, b is the bias term, C is the penalty coefficient, ξ_i are the slack variables, and φ(x) is the feature mapping associated with the selected kernel function. The penalty coefficient C determines how much weight is given to the loss caused by outliers: for a fixed sum of the slack variables of all outliers, the larger C is, the larger the loss in the objective function, which means a greater reluctance to give up those outliers. In the extreme case C is set to infinity; then a single point that deviates even slightly makes the objective value infinite and the problem has no solution, so the formulation degenerates into the hard-margin problem. The value of the slack variable ξ_i indicates how far the corresponding point deviates: the larger the value, the farther the point. The kernel function maps data from the low-dimensional space into a higher-dimensional space, so that a linearly inseparable problem becomes linearly separable.
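Formula 3 is not reproduced in this text. Assuming the standard soft-margin SVM primal problem, which uses exactly the symbols ω, b, C, ξ_i, and φ described above, it would read as follows. The class labels y_i ∈ {+1, -1} are notation introduced here.

```latex
% Formula 3 (assumed standard soft-margin form); y_i in {+1, -1} is the label of sample x_i.
\[
\min_{\omega,\, b,\, \xi}\ \frac{1}{2}\,\|\omega\|^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{s.t.} \quad
y_i \bigl( \omega^{T} \phi(x_i) + b \bigr) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, N .
\]
```

In this form, letting C grow without bound forces every ξ_i to zero, which is the hard-margin case mentioned in the text.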
Because the RBF kernel has a wide convergence domain, this embodiment uses the RBF kernel as the kernel function of the SVM classifier. The kernel function has the form:
Formula 4
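Formula 4 is not reproduced in this text. A common way to write the RBF kernel with width parameter σ is shown below; the factor of 2 in the denominator is an assumption, since some texts write the denominator as σ² alone.

```latex
% Formula 4 (assumed common form of the RBF kernel with width parameter sigma).
\[
K(x_i, x_j) = \exp\!\left( -\frac{\| x_i - x_j \|^{2}}{2\sigma^{2}} \right)
\]
```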
Lagrange multipliers are introduced, and the above quadratic programming problem is simplified and solved using the standard Lagrangian duality principle, which yields a sign discriminant function:
Formula 5
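Formula 5 is not reproduced in this text. Assuming the standard dual solution of the soft-margin SVM, the sign discriminant function takes the form below, where K is the kernel function, α_i are the Lagrange multipliers, and y_i ∈ {+1, -1} are the class labels (the latter two symbols are notation introduced here).

```latex
% Formula 5 (assumed standard form): decision function obtained from the dual problem.
\[
f(x) = \operatorname{sgn}\!\left( \sum_{i=1}^{N} \alpha_i\, y_i\, K(x_i, x) + b \right),
\qquad 0 \le \alpha_i \le C .
\]
```

Only the samples with α_i > 0 (the support vectors) contribute to the sum.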
To determine the penalty coefficient C of the SVM and the parameter σ of the RBF kernel, this embodiment uses the grid search method combined with the ten-fold cross-validation method to optimize the parameter C of the SVM classifier and the kernel parameter σ, finding the pair of values of C and σ that gives the highest SVM classification accuracy and thereby determining the optimal separating hyperplane function of the SVM classifier.
Specifically, in this embodiment the values of the optimal parameters C and σ are determined by the grid search method: the two parameters are laid out on a grid over previously given ranges and every grid point is traversed. The value range of C is set to [2^-10, 2^7], the value range of σ is set to [2^-10, 2^3], and the step size for both parameters is 0.1; every combination of a value in the range of C and a value in the range of σ is searched.
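The grid described above can be enumerated with a few lines of Python. This sketch assumes that the step of 0.1 applies to the exponent of 2, which is the usual convention for SVM grid search but is not stated explicitly here; the function and variable names are hypothetical.

```python
from itertools import product

def exponent_grid(lo, hi, step=0.1):
    """Values 2**e for e = lo, lo + step, ..., hi (both ends included)."""
    count = round((hi - lo) / step)          # number of steps between lo and hi
    return [2.0 ** (lo + i * step) for i in range(count + 1)]

c_values = exponent_grid(-10, 7)             # candidate values for C
sigma_values = exponent_grid(-10, 3)         # candidate values for sigma

# Every (C, sigma) combination is one grid point to evaluate.
grid = list(product(c_values, sigma_values))
```

Under this reading the grid has 171 values of C and 131 values of σ, so each pair must be cheap to score; that is the role of the cross-validation step that follows in the text.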
In this embodiment, for each selected pair of parameters C, σ, the classification accuracy under that pair is obtained by ten-fold cross-validation, and the pair C, σ with the highest classification accuracy is taken as the optimal parameter values. Ten-fold cross-validation is carried out as follows: the second data set is divided into 10 subsets; 1 subset is used as the test set and the remaining 9 subsets as the training set to obtain one classification accuracy under the selected pair of parameters; this is repeated 10 times so that each subset serves once as the test set. This gives 10 classification accuracies for the pair, and their mean serves as the index for evaluating each pair of parameters. The mean classification accuracies of all selected pairs are then compared, and the pair C, σ with the highest mean is taken as the optimal parameter values.
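The ten-fold procedure and the selection of the best pair can be sketched as follows. The sketch is independent of any SVM library: `train_and_predict` and `make_classifier` are hypothetical callables standing in for training an SVM with a given (C, σ) and predicting labels, and the interleaved way of forming the 10 subsets is an assumption.

```python
from statistics import mean

def ten_fold_accuracy(samples, labels, train_and_predict, folds=10):
    """Mean accuracy over `folds` splits; each subset is the test set exactly once.

    Assumes len(samples) >= folds so that no test subset is empty.
    """
    n = len(samples)
    accuracies = []
    for f in range(folds):
        test_idx = list(range(f, n, folds))            # subset f: every folds-th sample
        held_out = set(test_idx)
        train_x = [samples[i] for i in range(n) if i not in held_out]
        train_y = [labels[i] for i in range(n) if i not in held_out]
        predicted = train_and_predict(train_x, train_y, [samples[i] for i in test_idx])
        correct = sum(p == labels[i] for p, i in zip(predicted, test_idx))
        accuracies.append(correct / len(test_idx))
    return mean(accuracies)

def best_parameters(grid, samples, labels, make_classifier):
    """Return the (C, sigma) pair whose mean ten-fold accuracy is highest."""
    return max(grid,
               key=lambda pair: ten_fold_accuracy(samples, labels, make_classifier(*pair)))
```

Here `make_classifier(C, sigma)` is expected to return a function with the signature `train_and_predict(train_x, train_y, test_x)`, so an SVM from any library can be plugged in.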
After the optimal values of C and σ are found, the optimal separating hyperplane function of the SVM classifier is determined, which gives the trained SVM classifier.
Third part: performing defect prediction on the software under test with the trained SVM classifier.
Step S230: performing software defect prediction with the trained SVM classifier;
Specifically, in this embodiment dimension reduction is first performed on the data set of the software to be predicted using the LLE algorithm, and the reduced data are then input into the trained SVM classifier. If the input data fall in the defect-free region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain no defect and is marked accordingly in the output of the SVM classifier; if the input data fall in the defective region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain a defect and is marked accordingly in the output of the SVM classifier.
In this embodiment, in the output of the SVM classifier a software module that has a defect is marked with the letter Y, and a software module that has no defect is marked with the letter N.
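The Y/N marking can be illustrated with a small decision function. This is a sketch under assumptions: the positive side of the hyperplane is taken to be the defective region, the RBF kernel is written with 2σ² in the denominator, and the support vectors, coefficients, and bias are hypothetical inputs that a trained SVM would supply.

```python
import math

def rbf_kernel(x, z, sigma):
    """RBF kernel value exp(-||x - z||^2 / (2 sigma^2)) (assumed normalisation)."""
    return math.exp(-math.dist(x, z) ** 2 / (2 * sigma ** 2))

def predict_label(x, support_vectors, coeffs, bias, sigma):
    """Mark a module 'Y' (defective) or 'N' (defect-free) from the sign of the score.

    coeffs[i] is alpha_i * y_i for support vector i, as produced by SVM training.
    """
    score = sum(c * rbf_kernel(sv, x, sigma)
                for sv, c in zip(support_vectors, coeffs)) + bias
    return "Y" if score > 0 else "N"
```

For example, with support vectors `[(0.0, 0.0), (2.0, 2.0)]` and coefficients `[1.0, -1.0]`, a point near the first support vector is marked Y and a point near the second is marked N.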
Thus, the software defect prediction method provided by this embodiment of the present invention uses the locally linear embedding algorithm to reduce the dimensionality of the training data set, which keeps the geometric structure of the sample points unchanged after dimension reduction, so that the reduced data reflect the features of the original data set more completely.
Another embodiment of the present invention provides a software defect prediction system. Fig. 3 is a block diagram of the software defect prediction system provided by this embodiment. Referring to Fig. 3, the system 300 includes a dimension reduction unit 310, an SVM training unit 320, and a defect prediction unit 330;
The dimension reduction unit 310 is configured to perform dimension reduction on the first training data set according to the locally linear embedding (LLE) algorithm, obtain the low-dimensional vector to which each sample point in the first training data set is mapped in the low-dimensional space, and thereby obtain the second training data set composed of these low-dimensional vectors;
The SVM training unit 320 is configured to train the support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier, and hence a trained SVM classifier;
The defect prediction unit 330 is configured to perform defect prediction on the software to be predicted using the trained SVM classifier.
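The division into three units can be sketched as a small Python class. The class name, method names, and callable signatures are hypothetical; the sketch only shows how the units pass data to one another, with the LLE and SVM steps supplied from outside.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]

@dataclass
class DefectPredictionSystem:
    # Unit 310: maps a data set to its low-dimensional vectors (for example via LLE).
    reduce: Callable[[List[Vector]], List[Vector]]
    # Unit 320: trains on reduced vectors and labels, returns a classifier function.
    train: Callable[[List[Vector], List[str]], Callable[[Vector], str]]

    def fit(self, first_training_set, labels):
        """Build the second training data set and train the classifier on it."""
        second_training_set = self.reduce(first_training_set)
        self.classifier = self.train(second_training_set, labels)
        return self

    def predict(self, modules):
        """Unit 330: reduce the data of the software under test, then classify it."""
        return [self.classifier(v) for v in self.reduce(modules)]
```

Passing the reduction and training steps in as callables keeps the wiring of the three units separate from any particular LLE or SVM implementation.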
In one embodiment of the present invention, performing dimension reduction on the first training data set according to the LLE algorithm, obtaining the low-dimensional vector to which each sample point in the first training data set is mapped in the low-dimensional space, and obtaining the second training data set composed of these low-dimensional vectors includes:
Let the first training data set be {X_1, X_2, ..., X_N}, X_i ∈ R^D, where X_i is a vector in D-dimensional space;
Compute the K nearest neighbors of each sample point X_i in the first training data set;
Using the K nearest neighbors of each sample point, compute the local reconstruction weight matrix W according to Formula 1;
Formula 1
where N is the number of sample points and w_ij is the coefficient of the j-th neighbor in the representation of the i-th sample point X_i; the coefficients with which all sample points X_i in the first training data set are represented by their neighbors constitute the local reconstruction weight matrix W of all sample points;
From the local reconstruction weight matrix W obtained and the neighbors of each sample point, compute the low-dimensional vector corresponding to each sample point according to Formula 2;
Formula 2
where I is the identity matrix and M = (I - W)^T (I - W).
In one embodiment of the present invention, training the support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier includes:
The optimal separating hyperplane function of the SVM classifier is solved according to Formula 3:
Formula 3
where ω is the d-dimensional vector orthogonal to the separating hyperplane, b is the bias term, C is the penalty coefficient, ξ_i are the slack variables, and φ(x) is the feature mapping associated with the kernel function used by the SVM classifier.
In one embodiment of the present invention, the kernel function is the radial basis function (RBF) kernel, of the form:
Formula 4
where σ is the width parameter of the RBF kernel.
In one embodiment of the present invention, the SVM training unit is further configured to optimize the parameter C of the SVM classifier and the kernel parameter σ using the grid search method and the ten-fold cross-validation method, finding the pair of values of C and σ that gives the highest SVM classification accuracy and thereby determining the optimal separating hyperplane function of the SVM classifier.
In one embodiment of the present invention, optimizing the parameter C of the SVM classifier and the kernel parameter σ using the grid search method and the ten-fold cross-validation method includes:
Selecting values of the parameters C and σ by grid search, where the value range of C is set to [2^-10, 2^7], the value range of σ is set to [2^-10, 2^3], and the step size for both parameters is 0.1; every combination of a value in the range of C and a value in the range of σ is searched.
In one embodiment of the present invention, optimizing the parameter C of the SVM classifier and the kernel parameter σ using the grid search method and the ten-fold cross-validation method further includes:
For each selected pair of parameters C, σ, obtaining the classification accuracy under that pair by ten-fold cross-validation, and taking the pair with the highest classification accuracy as the optimal parameter values. Ten-fold cross-validation here means dividing the second data set into 10 subsets, using 1 subset as the test set and the remaining 9 subsets as the training set to obtain one classification accuracy under the selected pair of parameters, and repeating this 10 times. This gives 10 classification accuracies for the pair, and their mean serves as the index for evaluating each pair of parameters. The mean classification accuracies of all selected pairs are then compared, and the pair C, σ with the highest mean is taken as the optimal parameter values.
In one embodiment of the present invention, performing software defect prediction using the trained SVM classifier includes:
Performing dimension reduction on the data set of the software to be predicted using the LLE algorithm;
Inputting the reduced data set into the trained SVM classifier for classification. If the input data fall in the defect-free region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain no defect and is marked accordingly in the output of the SVM classifier; if the input data fall in the defective region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain a defect and is marked accordingly in the output of the SVM classifier.
It should be emphasized that the software defect prediction process carried out by the software defect prediction system provided in the embodiment of the present invention can be summarized as the construction of a prediction model based on the LLE algorithm and an SVM classifier. Building the prediction model mainly involves two modules: the first is dimension reduction and the second is defect prediction. In the dimension reduction module, the training set used by the SVM classifier must be dimension-reduced; likewise, in practical applications, the test data set of the software under test is also reduced with LLE, and the specific prediction is then made from the reduced data set and the SVM optimal separating hyperplane function already obtained. This ensures that the reduced data set reflects the characteristics of the original data more completely, thereby improving the accuracy of software defect prediction.
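The core of the dimension reduction module is the computation of local reconstruction weights: each sample point is approximated by a weighted combination of its K neighbor points, with the weights summing to 1. A minimal pure-Python sketch for a single sample point follows. The function names are hypothetical, and the small regularization term added to the local Gram matrix is a common practical safeguard that this text does not mention; it is an assumption of the sketch.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    def sq_dist(j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=sq_dist)[:k]


def reconstruction_weights(points, i, k, reg=1e-3):
    """Weights that best rebuild points[i] from its k neighbours, summing to 1."""
    nbrs = k_nearest(points, i, k)
    m = len(nbrs)
    diffs = [[a - b for a, b in zip(points[i], points[j])] for j in nbrs]
    # Local Gram matrix G[a][b] = (x_i - x_a) . (x_i - x_b)
    G = [[sum(p * q for p, q in zip(da, db)) for db in diffs] for da in diffs]
    trace = sum(G[a][a] for a in range(m))
    eps = reg * trace if trace > 0 else reg
    for a in range(m):
        G[a][a] += eps  # assumed regularisation; G is singular when k > dimension
    w = solve_linear(G, [1.0] * m)
    total = sum(w)
    return nbrs, [v / total for v in w]
```

Repeating this for every sample point and placing each weight in row i, at the column of the corresponding neighbor, gives the matrix W from which the low-dimensional vectors are then computed.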
The software defect prediction system provided in the embodiment of the present invention corresponds to the software defect prediction method described above; for its specific use, refer to the relevant content of the foregoing method embodiment, which is not repeated here.
In summary, the software defect prediction method and software defect prediction system provided in the embodiments of the present invention use the locally linear embedding algorithm to reduce the dimension of the training data set, so that the reduced data reflect the features of the original data set more completely; the optimal separating hyperplane function of the SVM is then obtained and used to perform software defect prediction, thereby improving the accuracy of software defect prediction.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.
Claims (6)
1. A software defect prediction method, characterized by comprising the following steps:
Step 1: performing dimension reduction on a first training data set according to the locally linear embedding (LLE) algorithm to obtain the low-dimensional vector to which each sample point in the first training data set is mapped in a low-dimensional space, and obtaining a second training data set composed of the low-dimensional vectors, wherein the first training data set is the NASA MDP software defect data set;
Wherein the second training data set is obtained as follows:
1.1 Let the first training data set be {x_1, x_2, ..., x_N}, x_i ∈ R^D, where x_i is a vector in D-dimensional space;
1.2 Calculate the K neighbor points of each sample point x_i in the first training data set;
1.3 Calculate the local reconstruction weight matrix W according to formula 1 using the K neighbor points of each sample point;
Wherein N is the number of sample points and w_ij is the coefficient with which the i-th sample point x_i is represented by its j-th neighbor point; the coefficients with which all sample points x_i in the first training data set are represented by their neighbor points constitute the local reconstruction weight matrix W;
1.4 Calculate the low-dimensional vector corresponding to each sample point according to formula 2, from the obtained local reconstruction weight matrix W and the neighbor points of the sample points;
Wherein I is the identity matrix and M = (I - W)^T (I - W);
Step 2: training a support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier, and thereby obtaining a trained SVM classifier;
Wherein the trained SVM classifier is obtained as follows:
The optimal separating hyperplane function of the SVM classifier is solved according to formula 3;
Wherein ω is the d-dimensional vector orthogonal to the separating hyperplane, b is the bias term, C is the penalty coefficient, ξ_i is the slack variable, and φ(x) is the kernel function used by the SVM classifier;
Wherein the kernel function is a radial basis kernel function;
Wherein σ is the width parameter of the radial basis kernel function;
Step 3: performing defect prediction on the software to be predicted according to the trained SVM classifier;
In obtaining the optimal separating hyperplane function of the SVM classifier, the parameter C and the kernel function parameter σ of the SVM classifier are optimized using the grid search method and the ten-fold cross-validation method to find the values of C and σ that give the highest SVM classification accuracy, thereby determining the optimal separating hyperplane function of the SVM classifier;
Optimizing the parameter C and the kernel function parameter σ of the SVM classifier using the grid search method and the ten-fold cross-validation method comprises: for each selected group of parameters C, σ, obtaining the classification accuracy under that group of parameter values, verified by ten-fold cross-validation, and taking the group of parameters that gives the highest classification accuracy as the optimal parameter values; wherein verification by ten-fold cross-validation means that the second data set is divided into 10 subsets, 1 subset serves as the test set and the remaining 9 subsets serve as the training set, yielding 1 classification accuracy for the selected group of parameters; this is repeated 10 times to obtain 10 classification accuracies for that group of parameters; the mean of these 10 classification accuracies is used as the index for evaluating each group of parameters; the mean classification accuracies of all selected groups are then compared, and the group of parameters C, σ with the highest mean is taken as the optimal parameter values.
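Formulas 1 to 3 and the kernel expression referred to in claim 1 do not appear in this text. For orientation only, the standard textbook forms that agree with the symbols defined in the claim are sketched below. They are an assumption, not a reproduction of the patent's own formulas; in particular the kernel scaling (2σ² versus σ²) and any normalization constraints on the low-dimensional vectors may differ. Here x_{ij} denotes the j-th neighbor point of x_i, y_i the low-dimensional vector of x_i (the columns of Y), t_i ∈ {-1, +1} the class label of sample i, and φ the feature mapping associated with the kernel.

```latex
\begin{aligned}
\text{(1)}\quad & \min_{W}\ \varepsilon(W)=\sum_{i=1}^{N}\Bigl\|x_i-\sum_{j=1}^{K} w_{ij}\,x_{ij}\Bigr\|^{2},
  \qquad \sum_{j=1}^{K} w_{ij}=1 \\
\text{(2)}\quad & \min_{Y}\ \Phi(Y)=\sum_{i=1}^{N}\Bigl\|y_i-\sum_{j=1}^{K} w_{ij}\,y_{ij}\Bigr\|^{2}
  =\operatorname{tr}\bigl(Y M Y^{T}\bigr),
  \qquad M=(I-W)^{T}(I-W) \\
\text{(3)}\quad & \min_{\omega,\,b,\,\xi}\ \tfrac{1}{2}\|\omega\|^{2}+C\sum_{i=1}^{N}\xi_i
  \quad \text{s.t.}\quad t_i\bigl(\omega^{T}\phi(x_i)+b\bigr)\ge 1-\xi_i,\ \ \xi_i\ge 0 \\
& K(x_i,x_j)=\exp\!\Bigl(-\frac{\|x_i-x_j\|^{2}}{2\sigma^{2}}\Bigr)
\end{aligned}
```

In this form, the low-dimensional vectors in (2) are obtained from the eigenvectors of M belonging to its smallest nonzero eigenvalues, which is why step 1.4 needs only W and the neighbor structure.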
2. The software defect prediction method according to claim 1, characterized in that optimizing the parameter C and the kernel function parameter σ of the SVM classifier using the grid search method and the ten-fold cross-validation method comprises: assigning values to the parameters C and σ using the grid search method; and searching over all combinations formed from all values in the value interval of C and all values in the value interval of σ.
3. The software defect prediction method according to claim 1 or 2, characterized in that software defect prediction according to the optimal separating hyperplane function is performed as follows:
First, dimension reduction is performed on the data set of the software to be predicted using the LLE algorithm;
Second, the reduced data set is input into the trained SVM classifier for judgment; if the input data fall into the defect-free space determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain no defect and is marked accordingly in the output of the SVM classifier; if the input data fall into the defective space determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain a defect and is marked accordingly in the output of the SVM classifier.
4. A software defect prediction system, characterized by comprising: a dimension reduction unit, an SVM training unit, and a defect prediction unit;
The dimension reduction unit is configured to perform dimension reduction on a first training data set according to the locally linear embedding (LLE) algorithm to obtain the low-dimensional vector to which each sample point in the first training data set is mapped in a low-dimensional space, and to obtain a second training data set composed of the low-dimensional vectors; wherein the first training data set is the NASA MDP software defect data set;
Wherein the second training data set composed of the low-dimensional vectors is obtained as follows:
1.1 Let the first training data set be {x_1, x_2, ..., x_N}, x_i ∈ R^D, where x_i is a vector in D-dimensional space;
1.2 Calculate the K neighbor points of each sample point x_i in the first training data set;
1.3 Calculate the local reconstruction weight matrix W according to formula 1 using the K neighbor points of each sample point;
Wherein N is the number of sample points and w_ij is the coefficient with which the i-th sample point x_i is represented by its j-th neighbor point; the coefficients with which all sample points x_i in the first training data set are represented by their neighbor points constitute the local reconstruction weight matrix W;
1.4 Calculate the low-dimensional vector corresponding to each sample point according to formula 2, from the obtained local reconstruction weight matrix W and the neighbor points of the sample points;
Wherein I is the identity matrix and M = (I - W)^T (I - W);
The SVM training unit is configured to train a support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier, and thereby obtain a trained SVM classifier;
Wherein the trained SVM classifier is obtained as follows:
The optimal separating hyperplane function of the SVM classifier is solved according to formula 3;
Wherein ω is the d-dimensional vector orthogonal to the separating hyperplane, b is the bias term, C is the penalty coefficient, ξ_i is the slack variable, and φ(x) is the kernel function used by the SVM classifier;
The kernel function is a radial basis kernel function;
Wherein σ is the width parameter of the radial basis kernel function;
The defect prediction unit is configured to perform defect prediction on the software to be predicted according to the trained SVM classifier;
In obtaining the optimal separating hyperplane function of the SVM classifier, the parameter C and the kernel function parameter σ of the SVM classifier are optimized using the grid search method and the ten-fold cross-validation method to find the values of C and σ that give the highest SVM classification accuracy, thereby determining the optimal separating hyperplane function of the SVM classifier;
Optimizing the parameter C and the kernel function parameter σ of the SVM classifier using the grid search method and the ten-fold cross-validation method comprises: for each selected group of parameters C, σ, obtaining the classification accuracy under that group of parameter values, verified by ten-fold cross-validation, and taking the group of parameters that gives the highest classification accuracy as the optimal parameter values; wherein verification by ten-fold cross-validation means that the second data set is divided into 10 subsets, 1 subset serves as the test set and the remaining 9 subsets serve as the training set, yielding 1 classification accuracy for the selected group of parameters; this is repeated 10 times to obtain 10 classification accuracies for that group of parameters; the mean of these 10 classification accuracies is used as the index for evaluating each group of parameters; the mean classification accuracies of all selected groups are then compared, and the group of parameters C, σ with the highest mean is taken as the optimal parameter values.
5. The software defect prediction system according to claim 4, characterized in that optimizing the parameter C and the kernel function parameter σ of the SVM classifier using the grid search method and the ten-fold cross-validation method comprises: assigning values to the parameters C and σ using the grid search method; and searching over all combinations formed from all values in the value interval of C and all values in the value interval of σ.
6. The software defect prediction system according to claim 4 or 5, characterized in that software defect prediction according to the optimal separating hyperplane function is performed as follows:
First, dimension reduction is performed on the data set of the software to be predicted using the LLE algorithm;
Second, the reduced data set is input into the trained SVM classifier for judgment; if the input data fall into the defect-free space determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain no defect and is marked accordingly in the output of the SVM classifier; if the input data fall into the defective space determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain a defect and is marked accordingly in the output of the SVM classifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410056779.9A CN103810101B (en) | 2014-02-19 | 2014-02-19 | A kind of Software Defects Predict Methods and software defect forecasting system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410056779.9A CN103810101B (en) | 2014-02-19 | 2014-02-19 | A kind of Software Defects Predict Methods and software defect forecasting system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103810101A CN103810101A (en) | 2014-05-21 |
CN103810101B true CN103810101B (en) | 2019-02-19 |
Family
ID=50706897
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410056779.9A Active CN103810101B (en) | 2014-02-19 | 2014-02-19 | A kind of Software Defects Predict Methods and software defect forecasting system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103810101B (en) |
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104899135B (en) * | 2015-05-14 | 2017-10-20 | 工业和信息化部电子第五研究所 | Software Defects Predict Methods and system |
CN105205002B (en) * | 2015-10-28 | 2017-09-29 | 北京理工大学 | A kind of software safety defect based on test job amount finds the modeling method of model |
CN105808435A (en) * | 2016-03-08 | 2016-07-27 | 北京理工大学 | Construction method of software defect evaluation model on the basis of complex network |
CN106650828B (en) * | 2017-01-03 | 2020-03-24 | 电子科技大学 | Intelligent terminal security level classification method based on support vector machine |
CN106919505B (en) * | 2017-02-20 | 2019-07-05 | 中国电子产品可靠性与环境试验研究所 | Software Defects Predict Methods and device |
CN107168868B (en) * | 2017-04-01 | 2021-01-19 | 西安交通大学 | Software change defect prediction method based on sampling and ensemble learning |
CN107832209A (en) * | 2017-10-26 | 2018-03-23 | 北京邮电大学 | A kind of Android applied behavior analysis methods based on hybrid detection result |
CN107957946B (en) * | 2017-12-01 | 2020-10-20 | 北京理工大学 | Software defect prediction method based on neighborhood embedding protection algorithm support vector machine |
CN108304316B (en) * | 2017-12-25 | 2021-04-06 | 浙江工业大学 | Software defect prediction method based on collaborative migration |
CN108595495B (en) | 2018-03-15 | 2020-06-23 | 阿里巴巴集团控股有限公司 | Method and device for predicting abnormal sample |
CN108763096A (en) * | 2018-06-06 | 2018-11-06 | 北京理工大学 | Software Defects Predict Methods based on depth belief network algorithm support vector machines |
CN109165160A (en) * | 2018-08-28 | 2019-01-08 | 北京理工大学 | Software defect prediction model design method based on core principle component analysis algorithm |
CN110147321B (en) * | 2019-04-19 | 2020-11-24 | 北京航空航天大学 | Software network-based method for identifying defect high-risk module |
CN111143222A (en) * | 2019-12-30 | 2020-05-12 | 军事科学院系统工程研究院系统总体研究所 | Software evaluation method based on defect prediction |
CN112651424A (en) * | 2020-12-01 | 2021-04-13 | 国网山东省电力公司青岛供电公司 | GIS insulation defect identification method and system based on LLE dimension reduction and chaos algorithm optimization |
CN113204481B (en) * | 2021-04-21 | 2022-03-04 | 武汉大学 | Class imbalance software defect prediction method based on data resampling |
CN113807016A (en) * | 2021-09-22 | 2021-12-17 | 华东理工大学 | Data-driven engineering material ultra-high cycle fatigue life prediction method |
CN114816963B (en) * | 2022-06-28 | 2022-09-20 | 南昌航空大学 | Embedded software quality evaluation method, system, computer and readable storage medium |
-
2014
- 2014-02-19 CN CN201410056779.9A patent/CN103810101B/en active Active
Non-Patent Citations (3)
Title |
---|
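Face recognition method based on locally linear embedding and Haar wavelet; Li Weisheng, Zhang Qin; Computer Engineering and Applications; 20111231; Vol. 47 (No. 4); full text |
Research on intrusion detection methods based on data dimensionality reduction and support vector machines; Xiao Haiming; China Master's Theses Full-text Database, Information Science and Technology; 20110515; Vol. 2011 (No. 5); main text pp. 15-38 |
Research on kernel functions of support vector classification machines; Li Hongying; China Master's Theses Full-text Database, Information Science and Technology; 20091215; Vol. 2009 (No. 12); main text pp. 16-23 |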
Also Published As
Publication number | Publication date |
---|---|
CN103810101A (en) | 2014-05-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103810101B (en) | A kind of Software Defects Predict Methods and software defect forecasting system | |
CN103745273B (en) | Semiconductor fabrication process multi-performance prediction method | |
Alinezhad et al. | Sensitivity analysis of TOPSIS technique: the results of change in the weight of one attribute on the final ranking of alternatives | |
CN116108758B (en) | Landslide susceptibility evaluation method | |
CN106355192A (en) | Support vector machine method based on chaos and grey wolf optimization | |
CN107122327A (en) | The method and training system of a kind of utilization training data training pattern | |
CN108051660A (en) | A kind of transformer fault combined diagnosis method for establishing model and diagnostic method | |
CN107797931A (en) | A kind of method for evaluating software quality and system based on second evaluation | |
CN103559303A (en) | Evaluation and selection method for data mining algorithm | |
CN103957116B (en) | A kind of decision-making technique and system of cloud fault data | |
CN106485348A (en) | A kind of Forecasting Methodology of transaction data and device | |
CN105335619A (en) | Collaborative optimization method applicable to parameter back analysis of high calculation cost numerical calculation model | |
CN109829627A (en) | A kind of safe confidence appraisal procedure of Electrical Power System Dynamic based on integrated study scheme | |
CN106708659A (en) | Filling method for adaptive nearest neighbor missing data | |
CN106156857B (en) | The method and apparatus of the data initialization of variation reasoning | |
Sabzi et al. | Numerical comparison of multi-criteria decision-Making techniques: A simulation of flood management multi-criteria systems | |
Ullah et al. | Adaptive data balancing method using stacking ensemble model and its application to non-technical loss detection in smart grids | |
CN110837952A (en) | Game theory-based power grid new technology equipment selection method and system | |
CN109961160A (en) | A kind of power grid future operation trend predictor method and system based on trend parameter | |
Kim et al. | A simulated annealing algorithm for the creation of synthetic population in activity-based travel demand model | |
Wang et al. | Temperature forecast based on SVM optimized by PSO algorithm | |
CN108830407A (en) | Sensor distribution optimization method under the conditions of multi-state in monitoring structural health conditions | |
CN114139482A (en) | EDA circuit failure analysis method based on depth measurement learning | |
Liu et al. | Personal Credit Evaluation Under the Big Data and Internet Background Based on Group Character | |
CN104572900A (en) | Trait characteristic selection method for crop breeding evaluation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||