CN103810101B

CN103810101B - Software defect prediction method and software defect prediction system

Info

Publication number: CN103810101B
Application number: CN201410056779.9A
Authority: CN
Inventors: 胡昌振; 单纯; 陈博洋; 马锐; 王勇
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2014-02-19
Filing date: 2014-02-19
Publication date: 2019-02-19
Anticipated expiration: 2034-02-19
Also published as: CN103810101A

Abstract

The present invention provides a software defect prediction method and a software defect prediction system, to solve the problem that existing software defect prediction is not sufficiently accurate. The system comprises a dimensionality reduction unit, an SVM training unit and a defect prediction unit. The method comprises: Step 1, performing dimensionality reduction on a first training data set using the locally linear embedding (LLE) algorithm to obtain the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space, the low-dimensional vectors forming a second training data set; Step 2, training a support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier and hence the trained SVM classifier; Step 3, performing defect prediction on the software to be predicted using the trained SVM classifier.

Description

Software defect prediction method and software defect prediction system

Technical field

The present invention relates to the field of software security, and in particular to a software defect prediction method and a software defect prediction system.

Background technique

Software defect prediction emerged in the 1970s. Its main role is to guide software quality assurance and to provide a valuable reference for balancing software cost. Software defect prediction is broadly divided into dynamic prediction and static prediction; current research concentrates mainly on static prediction, and the present invention belongs to the defect distribution prediction techniques within static prediction. The support vector machine (SVM) is a machine learning method developed on the basis of statistical learning theory, with many distinctive advantages in small-sample, nonlinear and high-dimensional pattern recognition problems. Existing software defect prediction mainly uses the SVM as the tool for building the prediction model. Patents related to software defect prediction mainly include: a defect prediction method and system based on requirement changes (publication number CN200910080742), and a software defect priority prediction method based on an improved support vector machine (publication number CN201210057888).

The prior-art approach consists of two parts: dimensionality reduction of the data set and optimization of the SVM parameters. The prior art proposes various solutions to these two problems and has achieved some results. However, the dimensionality reduction methods it selects have certain limitations: the reduced result cannot guarantee the integrity of the original data, nor does it reflect the intrinsic dimension well. Since software defect prediction is itself an operation on data sets, preserving data integrity is very important for the accuracy of the prediction results.

Summary of the invention

The present invention provides a software defect prediction method and a software defect prediction system, to solve the problem that existing software defect prediction is not sufficiently accurate.

A software defect prediction method, comprising the following steps:

Step 1: performing dimensionality reduction on a first training data set using the locally linear embedding (LLE) algorithm to obtain the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space, the low-dimensional vectors forming a second training data set;

Step 2: training a support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier, and hence the trained SVM classifier;

Step 3: performing defect prediction on the software to be predicted using the trained SVM classifier.

The second training data set composed of the low-dimensional vectors in Step 1 is obtained as follows:

1.1 Let the first training data set be {X₁, X₂, ..., X_N}, X_i ∈ R^D, where X_i is a vector in D-dimensional space;

1.2 Compute the K nearest neighbours of each sample point X_i in the first training data set;

1.3 Using the K nearest neighbours of each sample point, compute the local reconstruction weight matrix W according to Formula 1;

Formula 1

where N is the number of sample points and w_ij is the coefficient with which the i-th sample point X_i is represented by its j-th neighbour; the coefficients with which all sample points X_i in the first training data set are represented by their neighbours make up the local reconstruction weight matrix W;

1.4 Using the obtained local reconstruction weight matrix W and the neighbours of the sample points, compute the low-dimensional vector corresponding to each sample point according to Formula 2;

Formula 2

where I is the identity matrix and M = (I-W)^T(I-W).
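The images for Formula 1 and Formula 2 are not reproduced in this text. As a hedged reconstruction only, the standard locally linear embedding objectives that agree with the symbols defined above are sketched below. The neighbour-set notation N(i), the sum-to-one constraint, the centring and scaling constraints on Y, and the convention that Y is the d×N matrix whose columns are the Y_i are assumptions taken from the usual LLE formulation, not from the patent text.

```latex
% Formula 1 (assumed standard form): local reconstruction weights
\min_{W}\ \varepsilon(W) = \sum_{i=1}^{N} \Bigl\| X_i - \sum_{j \in \mathcal{N}(i)} w_{ij} X_j \Bigr\|^{2}
\quad \text{s.t.}\quad \sum_{j \in \mathcal{N}(i)} w_{ij} = 1,\qquad w_{ij} = 0 \ \text{for } j \notin \mathcal{N}(i)

% Formula 2 (assumed standard form): low-dimensional embedding with W held fixed
\min_{Y}\ \varepsilon(Y) = \sum_{i=1}^{N} \Bigl\| Y_i - \sum_{j=1}^{N} w_{ij} Y_j \Bigr\|^{2}
= \operatorname{tr}\bigl( Y M Y^{T} \bigr),
\qquad M = (I - W)^{T}(I - W)
\quad \text{s.t.}\quad \sum_{i=1}^{N} Y_i = 0,\qquad \frac{1}{N}\, Y Y^{T} = I
```

The trace identity follows because the residuals Y_i − Σ_j w_ij Y_j are the columns of Y(I − W)^T, whose squared Frobenius norm is tr(Y (I − W)^T (I − W) Y^T).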

The trained SVM classifier in Step 2 is obtained as follows:

The optimal separating hyperplane function of the SVM classifier is solved according to Formula 3:

Formula 3

where ω is the d-dimensional vector orthogonal to the separating hyperplane, b is the bias term, C is the penalty coefficient, ξ_i is the slack variable, and φ(x) is the kernel function used by the SVM classifier.

The kernel function is the radial basis function (RBF) kernel, of the form:

Formula 4

where σ is the width parameter of the radial basis kernel function.
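The images for Formula 3 and Formula 4 are likewise missing from this text. A plausible reading, given the symbols ω, b, C, ξ_i, φ and σ defined above, is the standard soft-margin SVM primal and a Gaussian RBF kernel, sketched below. The class labels y_i ∈ {+1, −1}, the role of φ as the feature mapping induced by the kernel, and the factor 2 in the kernel denominator are assumptions; other RBF parameterisations (for example dividing by σ² only) are equally common.

```latex
% Formula 3 (assumed standard soft-margin form)
\min_{\omega,\, b,\, \xi}\ \frac{1}{2}\,\|\omega\|^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{s.t.}\quad y_i \bigl( \omega^{T} \varphi(x_i) + b \bigr) \ge 1 - \xi_i,
\qquad \xi_i \ge 0,\quad i = 1, \dots, N

% Formula 4 (one common RBF parameterisation; the factor 2 is an assumption)
K(x_i, x_j) = \exp\!\left( -\frac{\|x_i - x_j\|^{2}}{2\sigma^{2}} \right)
```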

In obtaining the optimal separating hyperplane function of the SVM classifier, the grid search method and ten-fold cross-validation are used to optimize the parameter C of the SVM classifier and the kernel parameter σ, finding the pair of values of C and σ that gives the highest SVM classification accuracy, thereby determining the optimal separating hyperplane function of the SVM classifier.

Optimizing the parameter C and the kernel parameter σ of the SVM classifier with the grid search method and ten-fold cross-validation includes: taking values for C and σ using the grid search method; and searching over all combinations formed from every value in the value interval of C and every value in the value interval of σ.

Optimizing the parameter C and the kernel parameter σ of the SVM classifier with the grid search method and ten-fold cross-validation further includes: for each selected pair of parameters C, σ, obtaining the classification accuracy under that pair by ten-fold cross-validation, and taking the pair that gives the highest classification accuracy as the optimal parameter values. Here, ten-fold cross-validation means dividing the second data set into 10 subsets, using 1 subset as the test set and the remaining 9 subsets as the training set to obtain one classification accuracy under the selected pair of parameters, and repeating this 10 times. The mean of the 10 classification accuracies obtained for a pair is used as the index for evaluating that pair; the means of all selected pairs are then compared, and the pair C, σ with the highest mean is taken as the optimal parameter values.

Software defect prediction according to the optimal separating hyperplane function is carried out as follows:

First, dimensionality reduction is performed on the data set of the software to be predicted using the LLE algorithm;

Second, the reduced data set is input to the trained SVM classifier for judgment. If the input data fall in the defect-free region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain no defect and is marked accordingly in the output of the SVM classifier; if the input data fall in the defective region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain a defect and is marked accordingly in the output of the SVM classifier.

A software defect prediction system, comprising: a dimensionality reduction unit, an SVM training unit and a defect prediction unit;

the dimensionality reduction unit is configured to perform dimensionality reduction on a first training data set using the locally linear embedding (LLE) algorithm, to obtain the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space, the low-dimensional vectors forming a second training data set;

the SVM training unit is configured to train a support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier, and hence the trained SVM classifier;

the defect prediction unit is configured to perform defect prediction on the software to be predicted using the trained SVM classifier.

Beneficial effects of the present invention:

In the software defect prediction method and software defect prediction system provided by the present invention, first, the locally linear embedding algorithm is used to reduce the dimensionality of the training data set, which keeps the geometric structure of the sample points unchanged after reduction, so that the reduced data reflect the features of the original data set more completely. Second, the SVM parameter C and the kernel parameter σ are optimized by a grid search combined with ten-fold cross-validation; the pair of values of C and σ that gives the highest SVM classification accuracy is taken as the optimal parameters, and the optimal separating hyperplane function of the SVM is determined from these optimal parameters. Software defect prediction is then carried out with the optimal separating hyperplane function, thereby improving the accuracy of software defect prediction.

Brief description of the drawings

Fig. 1 is a block diagram of a software defect prediction method provided by one embodiment of the present invention;

Fig. 2 is a flow chart of a software defect prediction method provided by another embodiment of the present invention;

Fig. 3 is a block diagram of a software defect prediction system provided by another embodiment of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

The technical concept of the present invention addresses the limitation of existing dimensionality reduction methods, namely that the reduced result cannot guarantee the integrity of the data and does not reflect the intrinsic dimension well. The embodiments of the present invention use the locally linear embedding (LLE) algorithm to reduce the dimensionality of the software defect data set. The idea of this algorithm starts from the spatial structure of the sample data and keeps the geometric structure of the data samples unchanged after reduction, so that the reduced data reflect the features of the original data set more completely. Since software defect prediction is itself an operation on data sets, reflecting the features of the original data more completely is very important for improving the accuracy of the prediction results.

One embodiment of the present invention provides a software defect prediction method. Fig. 1 is a block diagram of a software defect prediction method provided by one embodiment of the present invention. Referring to Fig. 1, the method includes:

Step S100: performing dimensionality reduction on a first training data set using the locally linear embedding (LLE) algorithm to obtain the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space, the low-dimensional vectors forming a second training data set;

Step S110: training a support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier, and hence the trained SVM classifier;

Step S120: performing defect prediction on the software to be predicted according to the optimal separating hyperplane function.

In this embodiment, performing dimensionality reduction on the first training data set using the locally linear embedding (LLE) algorithm, obtaining the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space, and obtaining the second training data set composed of the low-dimensional vectors includes:

Let the first training data set be {X₁, X₂, ..., X_N}, X_i ∈ R^D, where X_i is a vector in D-dimensional space;

Compute the K nearest neighbours of each sample point X_i in the first training data set;

Using the K nearest neighbours of each sample point, compute the local reconstruction weight matrix W according to Formula 1;

Formula 1

Using the obtained local reconstruction weight matrix W and the neighbours, compute the low-dimensional vector corresponding to each sample point according to Formula 2;

Formula 2

where I is the identity matrix and M = (I-W)^T(I-W).

In this embodiment, training the support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier includes:

Formula 3

In this embodiment, the kernel function is the radial basis function (RBF) kernel, of the form:

Formula 4

where σ is the width parameter of the radial basis kernel function.

Fig. 2 is a flow chart of a software defect prediction method provided by another embodiment of the present invention. Referring to Fig. 2, this embodiment can be divided into three parts. The first part performs dimensionality reduction on the training data set and includes steps S200 and S210; the second part includes step S220; the third part includes step S230.

Step S200: obtaining the first training data set used for software defect prediction;

Step S210: reducing the dimensionality of the first training data set using the LLE algorithm. The data set used in this embodiment is the NASA MDP software defect data set, which is widely used in software defect prediction research and can be downloaded from the network. The data set comprises 13 sub data sets, each recording the metric attributes and a flag bit of every module in an actual NASA software project, where the flag bit indicates whether the module has a defect. After the first training data set is obtained, dimensionality reduction is performed on it. Specifically, the reduction can be divided into the following steps:

1) Let the first training data set be {X₁, X₂, ..., X_N}, X_i ∈ R^D, where R denotes the space and D the dimension.

2) Determine the distance between each sample point and every other sample point, computed as d_ij = ||X_i - X_j||; after these distances have been computed, select the K points with the shortest distance as the neighbours;

3) Compute the local reconstruction weight matrix W from the neighbours of each sample point X_i so that the reconstruction error of the sample points is minimized, i.e. solve the optimization problem:

Formula 1

where N is the number of sample points and w_ij is the coefficient with which the i-th sample point is represented by its j-th neighbour; w_ij is also a weight, representing the contribution of the j-th neighbour to the i-th sample point. Reducing the data set with the LLE algorithm specifically means representing each sample point in the data set by its K nearest neighbours. Each sample point thus has K coefficients, one per neighbour; the coefficient with which a single neighbour represents the sample point is a specific number, and the K coefficients of each sample point form a coefficient vector. The coefficient vectors of all sample points in the data set make up the weight matrix W.

4) Then, with the local reconstruction weight matrix W obtained in the previous step held fixed, solve for the low-dimensional vector Y_i corresponding to each sample point X_i according to the objective function:

Formula 2

where I is the identity matrix and M = (I-W)^T(I-W); the 2nd to (d+1)-th eigenvectors of M (ordered by ascending eigenvalue) are the output. Here d is the dimension of the sample points after reduction, and the final output is given by these d eigenvectors.

Through the above 4 steps, the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space is obtained; the SVM classifier is then trained with the second training data set composed of these low-dimensional vectors.
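Steps 2) and 3) above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the small regularisation term `reg` (commonly added so the local Gram matrix stays invertible) and the pure-Python linear solver are all hypothetical choices, and the eigenvector computation of step 4) is left out.

```python
import math

def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean distance), excluding i."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for t in range(c, n + 1):
                m[r][t] -= f * m[c][t]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][t] * x[t] for t in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def reconstruction_weights(points, i, neighbours, reg=1e-3):
    """Weights w_ij (summing to 1) that best rebuild points[i] from its neighbours."""
    k = len(neighbours)
    diffs = [[a - b for a, b in zip(points[j], points[i])] for j in neighbours]
    # Local Gram matrix: g[a][b] = (X_a - X_i) . (X_b - X_i)
    g = [[sum(u * v for u, v in zip(diffs[a], diffs[b])) for b in range(k)] for a in range(k)]
    trace = sum(g[a][a] for a in range(k))
    for a in range(k):
        # Regularisation is an assumption; it keeps g invertible when K > D.
        g[a][a] += reg * trace if trace > 0 else reg
    w = solve(g, [1.0] * k)
    total = sum(w)
    return [v / total for v in w]  # enforce the sum-to-one constraint
```

For a point lying midway between its two neighbours, the sketch returns equal weights of about 0.5 each, which matches the intuition that each neighbour contributes equally to the reconstruction.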

In the second part, the SVM classifier is trained using the reduced data set:

Step S220: the reduced data set is input to the SVM classifier, the parameters are optimized using the grid search method combined with ten-fold cross-validation, and the SVM classifier is trained.

Training the support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier specifically includes the following process:

The SVM classifier is trained using the reduced second training data set; the training process consists of solving for the optimal separating hyperplane of the SVM.

Specifically, the problem of training the SVM can be converted into a convex quadratic programming problem:

Formula 3

where ω is the d-dimensional vector orthogonal to the separating hyperplane, b is the bias term, C is the penalty coefficient, ξ_i is the slack variable, and φ(x) is the selected kernel function. The penalty factor C determines how much weight is given to the loss caused by outliers. Clearly, when the sum of the slack variables of all outliers is fixed, the larger C is, the greater the loss to the objective function, which means a strong unwillingness to give up these outliers. In the most extreme case C is set to infinity, so that as soon as a single point lies slightly outside, the value of the objective function becomes infinite and the problem has no solution; the problem then degenerates into the hard-margin problem. The value of the slack variable ξ_i indicates how far the corresponding point lies outside: the larger the value, the farther the point. The kernel function maps data from the low-dimensional space to a higher-dimensional space, so that a linearly inseparable problem is converted into a linearly separable one.
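The effect of the penalty coefficient C on the objective can be illustrated with a small sketch. It assumes a linear model (φ taken as the identity mapping), class labels of +1 and −1, and the usual slack definition ξ_i = max(0, 1 − y_i(ω·x_i + b)); the function name `soft_margin_objective` is hypothetical.

```python
def soft_margin_objective(w, b, xs, ys, c):
    """Return (objective, slacks) for a linear soft-margin SVM.

    slack_i   = max(0, 1 - y_i * (w . x_i + b))
    objective = 0.5 * ||w||^2 + C * sum(slack_i)
    """
    slacks = []
    for x, y in zip(xs, ys):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(wi * wi for wi in w) + c * sum(slacks)
    return objective, slacks
```

With the slack values held fixed, raising C from 1 to 10 multiplies the penalty term by ten, which is the behaviour the paragraph above describes: a larger C makes the optimizer less willing to tolerate outliers.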

Since the radial basis kernel function has a wider convergence range, this embodiment uses the radial basis kernel function as the kernel function of the SVM classifier. The form of the kernel function is:

Formula 4
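As a rough illustration of how the width parameter σ enters the kernel, the sketch below evaluates a Gaussian RBF kernel between two vectors. The exact form of Formula 4 is not reproduced in this text, so the denominator 2σ² and the function name `rbf_kernel` are assumptions.

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 * sigma^2)); the factor 2 is an assumption."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))
```

The kernel equals 1 for identical inputs and decays towards 0 as the points move apart; a larger σ makes the decay slower, so more distant samples still count as similar.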

Lagrange multipliers are introduced, and the above quadratic programming problem is simplified and solved using the standard Lagrangian duality principle, yielding a sign discriminant function:

Formula 5
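The image for Formula 5 is not reproduced in this text. The sign discriminant function that the standard Lagrangian dual of a soft-margin SVM yields is sketched below as a hedged reconstruction; the multipliers α_i, the labels y_i ∈ {+1, −1} and the kernel notation K are assumptions drawn from the usual derivation rather than from the patent.

```latex
% Formula 5 (assumed standard form of the SVM decision function)
f(x) = \operatorname{sgn}\!\left( \sum_{i=1}^{N} \alpha_i\, y_i\, K(x_i, x) + b \right),
\qquad 0 \le \alpha_i \le C,\qquad \sum_{i=1}^{N} \alpha_i\, y_i = 0
```

Only the samples with α_i > 0 (the support vectors) contribute to the sum.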

To determine the penalty coefficient C of the SVM and the parameter σ of the radial basis kernel function, this embodiment uses the grid search method combined with ten-fold cross-validation to optimize the parameter C and the kernel parameter σ of the SVM classifier, finding the pair of values of C and σ that gives the highest SVM classification accuracy, thereby determining the optimal separating hyperplane function of the SVM classifier.

Specifically, in this embodiment the optimal values of C and σ are determined using the grid search method: a grid is laid over a preset range for the two parameters and every grid point is traversed to take values. The value interval of C is set to [2^-10, 2⁷] and the value interval of σ to [2^-10, 2³], with a step of 0.1 for both parameters; all combinations formed from every value in the value interval of C and every value in the value interval of σ are searched.
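The grid described above can be enumerated as follows. The text does not say whether the step of 0.1 applies to the parameter itself or to the exponent of 2; this sketch assumes the exponent, which is the common convention for SVM grid searches, and the names `exponent_grid`, `c_values` and `sigma_values` are hypothetical.

```python
from itertools import product

def exponent_grid(lo, hi, step):
    """Values 2**e for e = lo, lo + step, ..., hi (inclusive), computed without float drift."""
    n = int(round((hi - lo) / step))
    return [2.0 ** (lo + i * step) for i in range(n + 1)]

# Assumption: the 0.1 step is taken on the exponent, not on the parameter value.
c_values = exponent_grid(-10, 7, 0.1)       # C in [2^-10, 2^7]
sigma_values = exponent_grid(-10, 3, 0.1)   # sigma in [2^-10, 2^3]
grid = list(product(c_values, sigma_values))  # every (C, sigma) combination
```

Under this reading the search visits 171 values of C and 131 values of σ, that is 22,401 parameter pairs, each of which is then scored by cross-validation.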

In this embodiment, for each selected pair of parameters C, σ, the classification accuracy under that pair is obtained by ten-fold cross-validation, and the pair C, σ that gives the highest classification accuracy is taken as the optimal parameter values. Ten-fold cross-validation is carried out as follows: the second data set is divided into 10 subsets; 1 subset is used as the test set and the remaining 9 subsets as the training set, giving one classification accuracy under the selected pair of parameters; this is repeated 10 times. The mean of the 10 classification accuracies obtained for a pair is used as the index for evaluating that pair; the means of all selected pairs are then compared, and the pair C, σ with the highest mean is taken as the optimal parameter values.
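The ten-fold procedure and the selection of the best pair can be sketched independently of any particular SVM library. In this illustration the training routine is passed in as a callable, so `train_fn`, `make_train_fn` and the interleaved fold assignment are assumptions made for the example; the sketch also assumes the data set has at least 10 samples so that no fold is empty.

```python
def ten_fold_indices(n, folds=10):
    """Split indices 0..n-1 into `folds` disjoint, nearly equal subsets (interleaved)."""
    return [list(range(f, n, folds)) for f in range(folds)]

def cross_val_accuracy(samples, labels, train_fn, folds=10):
    """Mean accuracy over the folds.

    train_fn(train_x, train_y) must return a predict(x) callable.
    Assumes len(samples) >= folds so every test fold is non-empty.
    """
    accuracies = []
    for test_idx in ten_fold_indices(len(samples), folds):
        held_out = set(test_idx)
        train_x = [s for i, s in enumerate(samples) if i not in held_out]
        train_y = [l for i, l in enumerate(labels) if i not in held_out]
        predict = train_fn(train_x, train_y)
        hits = sum(predict(samples[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)

def best_parameters(samples, labels, grid, make_train_fn):
    """Return the (C, sigma) pair from `grid` with the highest mean ten-fold accuracy."""
    return max(grid, key=lambda p: cross_val_accuracy(samples, labels, make_train_fn(*p)))
```

Here `make_train_fn(C, sigma)` would wrap whatever SVM trainer is in use; ties between pairs are resolved in favour of the pair that appears first in the grid.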

After the optimal values of C and σ have been found, the optimal separating hyperplane function of the SVM classifier is determined, and the trained SVM classifier is thereby obtained.

In the third part, defect prediction is performed on the software under test using the trained SVM classifier.

Step S230: performing software defect prediction using the trained SVM classifier;

Specifically, in this embodiment, dimensionality reduction is first performed on the data set of the software to be predicted using the LLE algorithm. If the input data fall in the defect-free region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain no defect and is marked accordingly in the output of the SVM classifier; if the input data fall in the defective region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain a defect and is marked accordingly in the output of the SVM classifier.

In this embodiment, when the output of the SVM classifier is displayed, a software module that has a defect is marked with the letter Y, and a software module that has no defect is marked with the letter N.
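The marking step can be sketched as evaluating a kernel decision function and mapping its sign to Y or N. The sketch assumes that the positive side of the hyperplane is the defective region, that `coeffs` holds the products α_i·y_i for the support vectors, and that the RBF kernel uses the 2σ² denominator; none of these conventions is fixed by the text, and the function names are hypothetical.

```python
import math

def decision_value(x, support_vectors, coeffs, b, sigma):
    """sum_i coeff_i * K(sv_i, x) + b with an RBF kernel; coeff_i stands for alpha_i * y_i."""
    total = b
    for sv, coef in zip(support_vectors, coeffs):
        squared_distance = sum((p - q) ** 2 for p, q in zip(sv, x))
        total += coef * math.exp(-squared_distance / (2.0 * sigma ** 2))
    return total

def mark_module(x, support_vectors, coeffs, b, sigma):
    """Return 'Y' if the module falls on the (assumed) defective side, otherwise 'N'."""
    return "Y" if decision_value(x, support_vectors, coeffs, b, sigma) > 0 else "N"
```

A module whose decision value is exactly zero lies on the hyperplane; this sketch marks it N, which is an arbitrary tie-breaking choice.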

Thus, the software defect prediction method provided by this embodiment of the present invention uses the locally linear embedding algorithm to reduce the dimensionality of the training data set, keeping the geometric structure of the sample points unchanged after reduction, so that the reduced data reflect the features of the original data set more completely.

Another embodiment of the present invention further provides a software defect prediction system. Fig. 3 is a block diagram of a software defect prediction system provided by another embodiment of the present invention. Referring to Fig. 3, the system 300 includes: a dimensionality reduction unit 310, an SVM training unit 320 and a defect prediction unit 330;

the dimensionality reduction unit 310 is configured to perform dimensionality reduction on a first training data set using the locally linear embedding (LLE) algorithm, to obtain the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space, the low-dimensional vectors forming a second training data set;

the SVM training unit 320 is configured to train a support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier, and hence the trained SVM classifier;

the defect prediction unit 330 is configured to perform defect prediction on the software to be predicted using the trained SVM classifier.

In one embodiment of the present invention, performing dimensionality reduction on the first training data set using the locally linear embedding (LLE) algorithm, obtaining the low-dimensional vector to which each sample point of the first training data set is mapped in the low-dimensional space, and obtaining the second training data set composed of the low-dimensional vectors includes:

Formula 1

where N is the number of sample points and w_ij is the coefficient with which the i-th sample point X_i is represented by its j-th neighbour; the coefficients with which all sample points X_i in the first training data set are represented by their neighbours make up the local reconstruction weight matrix W of all sample points;

Formula 2

where I is the identity matrix and M = (I-W)^T(I-W).

In one embodiment of the present invention, training the support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier includes:

Formula 3

In one embodiment of the present invention, the kernel function is the radial basis function (RBF) kernel, of the form:

Formula 4

where σ is the width parameter of the radial basis kernel function.

In one embodiment of the present invention, the SVM training unit is further configured to optimize the parameter C and the kernel parameter σ of the SVM classifier using the grid search method combined with ten-fold cross-validation, finding the pair of values of C and σ that gives the highest SVM classification accuracy, thereby determining the optimal separating hyperplane function of the SVM classifier.

In one embodiment of the present invention, optimizing the parameter C and the kernel parameter σ of the SVM classifier using the grid search method combined with ten-fold cross-validation includes:

Taking values for the parameters C and σ using the grid search method, where the value interval of C is set to [2^-10, 2⁷] and the value interval of σ to [2^-10, 2³], with a step of 0.1 for both parameters; all combinations formed from every value in the value interval of C and every value in the value interval of σ are searched.

In one embodiment of the present invention, optimizing the parameter C and the kernel parameter σ of the SVM classifier using the grid search method combined with ten-fold cross-validation further includes:

For each selected pair of parameters C, σ, obtaining the classification accuracy under that pair by ten-fold cross-validation, and taking the pair that gives the highest classification accuracy as the optimal parameter values. Here, ten-fold cross-validation means dividing the second data set into 10 subsets, using 1 subset as the test set and the remaining 9 subsets as the training set to obtain one classification accuracy under the selected pair of parameters, and repeating this 10 times. The mean of the 10 classification accuracies obtained for a pair is used as the index for evaluating that pair; the means of all selected pairs are then compared, and the pair C, σ with the highest mean is taken as the optimal parameter values.

In one embodiment of the present invention, performing software defect prediction using the trained SVM classifier includes:

Performing dimensionality reduction on the data set of the software to be predicted using the LLE algorithm;

Inputting the reduced data set to the trained SVM classifier for judgment. If the input data fall in the defect-free region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain no defect and is marked accordingly in the output of the SVM classifier; if the input data fall in the defective region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain a defect and is marked accordingly in the output of the SVM classifier.

It is emphasized that this software defect forecasting system provided in an embodiment of the present invention carries out software defect prediction Process may be summarized to be the process of prediction model of the building based on LLE algorithm and SVM classifier.The prediction model building process It include mainly two modules, first is dimension-reduction treatment, and second is failure prediction.Wherein, SVM classifier is used in dimension-reduction treatment Training set need to carry out dimension-reduction treatment, meanwhile, in practical applications, the test data set of software under testing is similarly used Then LLE dimension-reduction treatment carries out specific pre- according to the data set after dimensionality reduction and the SVM optimal separating hyper plane function acquired It surveys.Data set after can guaranteeing dimensionality reduction in this way can more comprehensively embody the data characteristics of initial data, to improve soft The accuracy rate of part failure prediction.

Software defect forecasting system provided in an embodiment of the present invention is opposite with the Software Defects Predict Methods of foregoing description It answers, specific use process is not repeating herein referring to the related content in preceding method embodiment.

In conclusion this Software Defects Predict Methods provided in an embodiment of the present invention and software defect forecasting system, are adopted Dimension-reduction treatment is carried out to training dataset with Local Liner Prediction, the data after dimensionality reduction is enabled more completely to reflect original The various features of beginning data set, and according to the optimal separating hyper plane function of SVM, it is carried out using optimal separating hyper plane function soft Part failure prediction, to achieve the purpose that improve software defect predictablity rate.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A software defect prediction method, characterized by comprising the following steps:

Step 1: performing dimension reduction on a first training data set according to the locally linear embedding (LLE) algorithm to obtain the low-dimensional vector to which each sample point in the first training data set is mapped in the low-dimensional space, thereby obtaining a second training data set composed of the low-dimensional vectors, wherein the first training data set is the NASA MDP software defect data set;

wherein the second training data set is obtained as follows:

where N is the number of sample points and w_ij is the coefficient with which the i-th sample point x_i is represented by its j-th neighboring point; the coefficients with which all sample points x_i in the first training data set are represented by their neighboring points constitute the local reconstruction weight matrix W;

1.4 according to the obtained local reconstruction weight matrix W and the neighboring points of each sample point, calculating the low-dimensional vector corresponding to each sample point according to formula 2;

where I is the identity matrix and M = (I-W)^T(I-W);
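The matrix M defined here can be formed directly from W. The Python sketch below does so for a small hypothetical weight matrix; the helper names and the 3-point example are illustrative assumptions, and each row of W is assumed to sum to 1, which is the usual LLE constraint. In standard LLE the low-dimensional vectors are then read from the eigenvectors of M belonging to its smallest non-zero eigenvalues; that eigen-decomposition step is not shown.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lle_cost_matrix(w):
    """Compute M = (I - W)^T (I - W) for a square weight matrix W."""
    n = len(w)
    diff = [[(1.0 if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]
    return matmul(transpose(diff), diff)


# Hypothetical weights: each point is rebuilt equally from the other two,
# so every row sums to 1 (assumed LLE constraint).
W = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
M = lle_cost_matrix(W)
```

Because each row of W sums to 1, every row of M sums to 0, so the constant vector is an eigenvector with eigenvalue 0; this is why standard LLE discards that eigenvector.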

Step 2: training a support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier, and thereby obtaining the trained SVM classifier;

wherein the trained SVM classifier is obtained as follows:

where ω is the d-dimensional vector orthogonal to the separating hyperplane, b is the bias term, C is the penalty coefficient, ξ_i is the slack variable, and φ(x) is the kernel function used by the SVM classifier;
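The symbols defined here match the standard soft-margin SVM training problem. Assuming class labels y_i in {+1, -1} (an assumption, since the labels are not defined in this passage) and reading φ as the mapping whose inner products the kernel evaluates, that standard problem can be written as follows. It is offered as a reference form, not as the exact formula of the claim.

```latex
\min_{\omega,\, b,\, \xi} \quad \frac{1}{2}\lVert \omega \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\qquad \text{s.t.} \qquad
y_i \bigl( \omega^{T} \varphi(x_i) + b \bigr) \ge 1 - \xi_i, \quad
\xi_i \ge 0, \quad i = 1, \dots, N
```

A larger C penalizes misclassified training points more heavily, which is why C is one of the two parameters tuned later.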

wherein the kernel function is a radial basis kernel function of the form:

where σ is the width parameter of the radial basis kernel function;
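To make the role of the width parameter σ concrete, the following Python sketch evaluates a Gaussian radial basis kernel. The scaling by 2σ² in the denominator is a common convention and is an assumption here; the function name `rbf_kernel` and the sample points are hypothetical.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 * sigma^2)).

    The 2 * sigma^2 scaling is an ASSUMED convention, not taken from the claim.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


same = rbf_kernel((1.0, 2.0), (1.0, 2.0), sigma=0.5)  # identical points
near = rbf_kernel((0.0, 0.0), (0.3, 0.4), sigma=1.0)  # Euclidean distance 0.5
far = rbf_kernel((0.0, 0.0), (3.0, 4.0), sigma=1.0)   # Euclidean distance 5
```

The kernel value is 1 for identical points and decays toward 0 with distance; a larger σ makes the decay slower, so more distant training points influence the decision.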

Step 3: performing defect prediction on the software to be predicted according to the trained SVM classifier;

wherein, in obtaining the optimal separating hyperplane function of the SVM classifier, grid search and ten-fold cross-validation are used to optimize the parameter C and the kernel function parameter σ of the SVM classifier, finding the values of C and σ that give the highest SVM classification accuracy, so as to determine the optimal separating hyperplane function of the SVM classifier;

the optimization of the parameter C and the kernel function parameter σ of the SVM classifier by grid search and ten-fold cross-validation comprises: for each selected pair of parameters C, σ, obtaining the classification accuracy under that pair of values, verifying it by ten-fold cross-validation, and taking the pair with the highest classification accuracy as the optimal parameter values; wherein verifying by ten-fold cross-validation means dividing the second training data set into 10 subsets, using 1 subset as the test set and the remaining 9 subsets as the training set to obtain one classification accuracy under the selected pair of parameters, and repeating this 10 times; the 10 classification accuracies obtained under that pair are averaged, and the mean is used as the index for evaluating each pair of parameters; the mean classification accuracies of all selected pairs are then compared, and the pair C, σ with the highest mean is taken as the optimal parameter values.
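The ten-fold procedure described in this claim can be sketched in Python as follows. This is a minimal sketch under assumptions: the claim does not say how the 10 subsets are formed, so a simple strided split is used; a majority-class baseline stands in for training and scoring an SVM with a given C and σ; and all function names and the toy labels are hypothetical.

```python
def ten_fold_indices(n_samples, n_folds=10):
    """Split indices 0..n_samples-1 into n_folds disjoint subsets (strided split, an assumption)."""
    return [list(range(k, n_samples, n_folds)) for k in range(n_folds)]


def mean_cv_accuracy(samples, labels, train_and_score, n_folds=10):
    """Hold out each subset once, train on the other nine, and average the accuracies."""
    accuracies = []
    for test_idx in ten_fold_indices(len(samples), n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        accuracies.append(train_and_score(
            [samples[i] for i in train_idx], [labels[i] for i in train_idx],
            [samples[i] for i in test_idx], [labels[i] for i in test_idx]))
    return sum(accuracies) / len(accuracies)


def select_best(candidates, mean_accuracy_for):
    """Return the (C, sigma) pair whose mean cross-validated accuracy is highest."""
    return max(candidates, key=mean_accuracy_for)


def majority_baseline(train_x, train_y, test_x, test_y):
    """Stand-in for SVM training: always predict the most frequent training label."""
    majority = max(set(train_y), key=train_y.count)
    return sum(1 for y in test_y if y == majority) / len(test_y)


# Toy data: 20 samples, the last 5 labelled 1 (hypothetical values).
samples = [[float(i)] for i in range(20)]
labels = [0] * 15 + [1] * 5
baseline_mean = mean_cv_accuracy(samples, labels, majority_baseline)
```

In an actual run, `train_and_score` would train the SVM with one candidate pair on the nine training subsets and return its accuracy on the held-out subset, and `select_best` would be called with a function that maps each pair to its mean accuracy.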

2. The software defect prediction method according to claim 1, characterized in that the optimization of the parameter C and the kernel function parameter σ of the SVM classifier by grid search and ten-fold cross-validation comprises: selecting values for the parameters C and σ by grid search, and searching over all combinations formed from all values in the value interval of C and all values in the value interval of σ.
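Enumerating all combinations of candidate values, as this claim describes, amounts to a Cartesian product. The Python sketch below builds such a grid; the powers-of-two spacing and the ranges are assumptions chosen for illustration, since no value intervals are given here, and `parameter_grid` is a hypothetical name.

```python
from itertools import product


def parameter_grid(c_values, sigma_values):
    """Return every (C, sigma) combination of the candidate values."""
    return list(product(c_values, sigma_values))


# Assumed candidate values on a powers-of-two scale, for illustration only.
c_values = [2.0 ** k for k in range(-2, 3)]      # 0.25, 0.5, 1, 2, 4
sigma_values = [2.0 ** k for k in range(-1, 2)]  # 0.5, 1, 2
grid = parameter_grid(c_values, sigma_values)
```

Each pair in `grid` would then be scored by cross-validation, so the cost of the search grows with the product of the two list lengths.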

3. The software defect prediction method according to claim 1 or 2, characterized in that software defect prediction according to the optimal separating hyperplane function is performed as follows:

Second, the dimension-reduced data set is input into the trained SVM classifier for judgment; if the input data fall into the defect-free region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined not to contain a defect and is marked accordingly in the output of the SVM classifier; if the input data fall into the defective region determined by the optimal separating hyperplane function, the software module corresponding to the data is determined to contain a defect and is marked accordingly in the output of the SVM classifier.

4. A software defect prediction system, characterized by comprising: a dimension-reduction processing unit, an SVM training unit, and a defect prediction unit;

the dimension-reduction processing unit is configured to perform dimension reduction on a first training data set according to the locally linear embedding (LLE) algorithm, obtain the low-dimensional vector to which each sample point in the first training data set is mapped in the low-dimensional space, and obtain a second training data set composed of the low-dimensional vectors; wherein the first training data set is the NASA MDP software defect data set;

wherein the second training data set composed of the low-dimensional vectors is obtained as follows:

where I is the identity matrix and M = (I-W)^T(I-W);

the SVM training unit is configured to train a support vector machine (SVM) classifier on the second training data set to obtain the optimal separating hyperplane function of the SVM classifier, and thereby obtain the trained SVM classifier;

wherein the trained SVM classifier is obtained as follows:

the kernel function is a radial basis kernel function of the form:

where σ is the width parameter of the radial basis kernel function;

the defect prediction unit is configured to perform defect prediction on the software to be predicted according to the trained SVM classifier;

5. The software defect prediction system according to claim 4, characterized in that the optimization of the parameter C and the kernel function parameter σ of the SVM classifier by grid search and ten-fold cross-validation comprises: selecting values for the parameters C and σ by grid search, and searching over all combinations formed from all values in the value interval of C and all values in the value interval of σ.

6. The software defect prediction system according to claim 4 or 5, characterized in that software defect prediction according to the optimal separating hyperplane function is performed as follows: