CN109976998A - Software defect prediction method, apparatus, and electronic device
- Publication number
- CN109976998A, CN201711462461.0A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention discloses a software defect prediction method, apparatus, and electronic device. The method comprises: obtaining a feature vector of the software to be predicted; and determining a software defect prediction result for the software to be predicted based on the feature vector and a pre-trained prediction model for software defect prediction, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learner. Because the prediction model is trained with a gradient boosting algorithm and a random forest learner, the software defect prediction results it outputs are more accurate, without a large impact on computational complexity.
Description
Technical field
The present invention relates to the field of software engineering applications, and in particular to a software defect prediction method, apparatus, and device.
Background art
Software defect prediction originated in the 1970s and has remained a very active topic in software engineering ever since. It plays an important role in analyzing software quality and balancing software cost. Software defect prediction analyzes known defects, taking into account factors such as the software's development method, its complexity, and the capabilities of the personnel involved, in order to predict latent defects in existing projects.
Most existing software defect prediction methods use a single algorithm to measure individual indicators of software defects, and do not measure and predict comprehensively across the software's attributes. Current algorithms mainly include support vector machines (SVM), neural networks, Bayesian methods, and logistic regression. Some of these methods have excessive computational complexity and others are not accurate enough, so they do not achieve good prediction performance.
Therefore, how to improve the accuracy of prediction results without significantly increasing computational complexity is one of the technical problems that urgently needs to be solved.
Summary of the invention
Embodiments of the present invention provide a software defect prediction method, apparatus, and electronic device, to solve the problems of high computational complexity and low prediction accuracy in the software defect prediction methods used in the prior art.
In a first aspect, an embodiment of the present invention provides a software defect prediction method, comprising:
obtaining a feature vector of the software to be predicted; and
determining a software defect prediction result for the software to be predicted based on the feature vector and a pre-trained prediction model for software defect prediction, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learner.
In a second aspect, an embodiment of the present invention provides a software defect prediction apparatus, comprising:
an acquiring unit, configured to obtain a feature vector of the software to be predicted; and
a determination unit, configured to determine a software defect prediction result for the software to be predicted based on the feature vector and a pre-trained prediction model for software defect prediction, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learner.
In a third aspect, an embodiment of the present invention provides a non-volatile computer storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the software defect prediction method provided by the present application.
In a fourth aspect, an embodiment of the present invention provides an electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the software defect prediction method provided by the present application.
The beneficial effects of the invention are as follows:
The software defect prediction method, apparatus, and electronic device provided by embodiments of the present invention obtain a feature vector of the software to be predicted, and determine a software defect prediction result for the software to be predicted based on the feature vector and a pre-trained prediction model for software defect prediction, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learner. Because the prediction model is trained with a gradient boosting algorithm and a random forest learner, the software defect prediction results it outputs are more accurate, without a large impact on computational complexity.
Other features and advantages of the present invention will be set forth in the description that follows, and in part will be apparent from the description or may be learned by practicing the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description, the claims, and the drawings.
Brief description of the drawings
The drawings described here provide a further understanding of the present invention and form a part of it. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
Fig. 1a is the first flow diagram of the software defect prediction method provided by Embodiment One of the present invention;
Fig. 1b is the second flow diagram of the software defect prediction method provided by Embodiment One of the present invention;
Fig. 2a is a flow diagram, provided by Embodiment One of the present invention, of training the prediction model for software defect prediction using the gradient boosting algorithm and the random forest learner;
Fig. 2b is a flow diagram, provided by Embodiment One of the present invention, of iteratively training any decision tree;
Fig. 2c is a flow diagram, provided by Embodiment One of the present invention, of determining the prediction accuracy of any one of the training models obtained by training on each batch;
Fig. 3 is a structural diagram of the software defect prediction apparatus provided by Embodiment Two of the present invention;
Fig. 4 is a hardware structure diagram of the electronic device implementing the software defect prediction method, provided by Embodiment Four of the present invention.
Detailed description of embodiments
Embodiments of the present invention provide a software defect prediction method, apparatus, and electronic device, to solve the problems of high computational complexity and low prediction accuracy in the software defect prediction methods used in the prior art.
Preferred embodiments of the present invention are described below with reference to the drawings. It should be understood that the preferred embodiments described here only illustrate and explain the invention and do not limit it, and that, where no conflict arises, the embodiments of the invention and the features in them may be combined with one another.
Embodiment one
Fig. 1a shows a flow diagram of the software defect prediction method provided by Embodiment One of the present invention, which may include the following steps:
S11: Obtain the feature vector of the software to be predicted.
In a specific implementation, the goal is usually to predict the defects that will arise in the software currently being written. During development, the feature vector of the software to be predicted therefore has to be derived from historical data recorded before the current day. For example, to predict the software defects of the second day of development, the prediction is based on the features of the first day of development. From the first day's development results, one can determine the testers and developers who took part, the use-case quantity used, the requirement quantity, and so on.
Preferably, the features obtained for the software to be predicted may include testers, developers, test duration, use-case quantity, requirement quantity, and so on, and the feature vector of the software to be predicted is built from these features. Note that when a tester or developer is used as an element of the feature vector, that tester or developer must be converted from a discrete variable into a continuous variable. For a developer, the continuous value used during software defect prediction can be determined from factors such as the developer's total time in employment, time employed at the company, time spent on the company's projects, and educational background. For example, educational background can first be converted to a score as shown in Table 1, where a doctorate converts to a score of 8.
Table 1
Educational background | Score |
Doctorate | 8 |
Master's | 6 |
Bachelor's | 5 |
Training | 4 |
Specifically, when determining the continuous value for a developer, the developer's time parameters can all be converted to numbers expressed in years: the total time in employment, the time employed at the company, and the time spent on the company's projects become three values in years. For example, a year and a half at the company gives a value of 1.5. Months can of course be used as the unit instead. With these three time parameters converted to numbers and the education score obtained as above, the developer's continuous value is computed by combining the four parameters with weights assigned to them in advance.
Similarly, the continuous value of a tester is determined in the same way as for a developer, which is not repeated here.
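The conversion described above can be sketched as a weighted sum. This is a minimal illustration, not the patent's formula: the source only says the four parameters are combined using pre-assigned weights, so the weight values, the function names `person_value` and `feature_vector`, and the dictionary keys are assumptions introduced here. The education scores follow Table 1.

```python
# Education scores taken from Table 1.
EDUCATION_SCORE = {"doctorate": 8, "master": 6, "bachelor": 5, "training": 4}

# Assumed weights; the source does not state their values.
WEIGHTS = {
    "career_years": 0.3,   # total time in employment
    "company_years": 0.3,  # time employed at the company
    "project_years": 0.2,  # time spent on the company's projects
    "education": 0.2,      # education score from Table 1
}


def person_value(career_years, company_years, project_years, education):
    """Map one developer or tester to a single continuous value."""
    parts = {
        "career_years": career_years,
        "company_years": company_years,
        "project_years": project_years,
        "education": EDUCATION_SCORE[education],
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())


def feature_vector(developer_value, tester_value, test_days, use_cases, requirements):
    """Assemble the features listed in the text into one numeric vector."""
    return [developer_value, tester_value,
            float(test_days), float(use_cases), float(requirements)]


# A developer with 5 years of experience, 1.5 years at the company,
# 1 year on its projects, and a master's degree.
dev = person_value(5.0, 1.5, 1.0, "master")
vec = feature_vector(dev, dev, test_days=3, use_cases=40, requirements=12)
```

A weighted sum keeps the value monotonic in each parameter, which matches the intent of turning a person into an ordered, continuous quantity; any other monotonic combination would serve equally well.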
S12: Based on the feature vector and the pre-trained prediction model for software defect prediction, determine the software defect prediction result of the software to be predicted.
Here, the prediction model is trained based on a gradient boosting algorithm and a random forest learner.
Embodiments of the present invention use a gradient boosting algorithm and a random forest learner to train the prediction model for software defect prediction. First, the random forest algorithm speeds up training of the prediction model and, after training, also provides the features that were important for training it. Second, the gradient boosting algorithm corrects the decision-tree model while the random forest's decision trees are being trained, reducing the residual step by step over the iterations, so that the optimal decision-tree model is finally obtained along the gradient direction in which the residual decreases.
After the feature vector of the software to be predicted is obtained, it is input into the pre-trained prediction model, and the model's output is the software defect prediction result of the software to be predicted.
Preferably, the software defect prediction result may be the defect count and the modules to which the defects belong.
In a specific implementation, again taking prediction of the second day's software defects as an example: the feature vector is built from the developers and testers who took part in development on the first day and from the use-case quantity and requirement quantity of the first day's development. The feature vector is input into the pre-trained prediction model, whose output is the software defect prediction result for the second day, such as the number of defects expected during the second day's development and the module to which each defect belongs. Using the predicted defect count, developers and testers can compare the defects actually found with the predicted count once the second day's development ends. Because the prediction model's output is relatively accurate, the number of defects that occur on the second day is likely to be close to the model's output. If the actual count differs substantially from the predicted count, some defects in the software developed on the second day have probably not been found, and developers or testers need to re-inspect or re-test. Using the predicted defect modules, developers and testers who find an error while verifying the software can focus on those modules, which greatly speeds up finding and verifying defects and reduces the workload of developers and testers to some extent.
In the present invention, the module to which a defect belongs can be understood as the identification information of the module containing the defect.
Preferably, to further reduce the workload of staff and testers, the present invention proposes the process shown in Fig. 1b, comprising the following steps:
S11a: Obtain the actual result of the software to be predicted.
Specifically, in practice, when staff and testers verify the program currently under development, they can record the actual result for the software developed that day. The result includes the actual defect count and the modules to which the actual defects belong. On this basis, the present invention can provide a human-machine interaction interface through which the user imports the actual result for the software.
S12a: For the software to be predicted, compare the defect count in the obtained software defect prediction result with the actual defect count in the actual result.
S13a: If the defect count is determined to be greater than the actual defect count, determine, from all defect modules in the software defect prediction result, the modules that do not match the actual defect modules in the actual result, store them in a list, and present the list to the user.
In a specific implementation, once the imported actual result has been obtained in step S11a, step S12a can be executed, that is, the actual result is compared with the software defect prediction result to determine whether the predicted defect count is greater than the actual defect count. Because the defect prediction method provided by the present invention is relatively accurate, a larger predicted count suggests that some defects in the software have not been found. All defect modules in the software defect prediction result can therefore be compared with the defect modules in the actual result to obtain the defect modules that appear in the prediction result but not in the actual result. The identification information of these modules is written to a list and presented to staff and/or testers, who can use the list to navigate quickly to these modules and find the hidden problems in them. This greatly reduces the workload of staff and testers and speeds up the search for defects.
Specifically, take a software defect prediction result containing 20 defects as an example. If the actual defect count in the actual result is 10, the modules that do not match the defect modules in the actual result can be found from the software defect prediction result. If 10 such modules are found, their latent problems can be located quickly based on the defect modules found, which has a positive effect on the success rate of later software development.
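Steps S12a and S13a amount to a count comparison followed by a set difference. The sketch below shows one way to express that; the function name `modules_to_recheck` and the module identifiers in the example are hypothetical and chosen only for illustration.

```python
def modules_to_recheck(predicted_count, predicted_modules, actual_count, actual_modules):
    """Return predicted defect modules that are absent from the actual result.

    The list is empty unless more defects were predicted than were found,
    which is the condition checked in S13a.
    """
    if predicted_count <= actual_count:
        return []
    actual = set(actual_modules)
    seen = set()
    recheck = []
    for module in predicted_modules:
        # Keep prediction order and skip modules already listed.
        if module not in actual and module not in seen:
            seen.add(module)
            recheck.append(module)
    return recheck


# Hypothetical module identifiers.
recheck_list = modules_to_recheck(
    predicted_count=4,
    predicted_modules=["auth", "billing", "report", "search"],
    actual_count=2,
    actual_modules=["auth", "report"],
)
```

Here `recheck_list` holds the modules that staff and testers would be shown for a second look.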
Note that if some of the defect modules in the actual result are not in the software defect prediction result, the prediction model failed to predict those modules. The factors that affect those modules can then be determined and added to the samples used to train the prediction model, or the values of existing parameters in the samples can be adjusted, further improving the accuracy of the trained prediction model.
In a specific implementation, training the prediction model for software defect prediction using the gradient boosting algorithm and the random forest learner can follow the method shown in Fig. 2a, comprising the following steps:
S21: Obtain a software defect prediction sample set, and divide the obtained software defect prediction samples into a training set and a validation set according to a preset ratio.
In a specific implementation, software defect prediction samples can be obtained from software the company has already developed. For any such software, one can obtain the developers and testers who took part in its development, its test duration, the use-case quantity used during development, and its requirement quantity, and use these parameters as input variables. The defect count and defect modules produced during the software's development are also needed, with the defect count used as the result value. These parameters together form one software defect prediction sample. The software defect prediction sample set is then obtained from all software developed by the company, or from a preset number of software products.
Training the prediction model requires a training set to train the model and a validation set to validate the trained model, so the software defect prediction sample set must be divided after it is obtained. For example, 75% of the samples can form the training set used to train the prediction model, and the remaining 25% can form the validation set used to validate the trained prediction model.
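The 75/25 division can be sketched with the standard library alone. The source only specifies a preset ratio; shuffling before the cut, the fixed `seed`, and the name `split_samples` are assumptions added here so the example is reproducible.

```python
import random


def split_samples(samples, train_ratio=0.75, seed=0):
    """Shuffle the samples and cut them into a training set and a validation set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumed: random assignment to the two sets
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Twenty placeholder samples, identified by index.
train_set, validation_set = split_samples(range(20))
```

With 20 samples and a ratio of 0.75, the training set receives 15 samples and the validation set the remaining 5.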
Preferably, to improve the accuracy of the prediction model and reduce the computational complexity of training, software defect prediction samples can also be obtained as follows:
Based on the raw data recorded during development of software whose defect results are known, screen the raw data according to its relevance to software defect prediction to obtain software defect prediction samples.
In a specific implementation, a larger sample size is likely to give a more accurate prediction model but also higher computational complexity. To keep computational complexity from becoming especially high while still ensuring accurate predictions, the raw data can be screened by its relevance to software defect prediction when the samples are obtained. Relevance can be considered along three dimensions: development project size, personnel capability, and product delivery. Development project size can be measured by the use-case quantity used to develop the software or by the software's requirement quantity. Personnel capability can be measured by the working ability, personal status (educational background), and work experience of the developers or testers. Product delivery can be measured by completion time, which is the test duration.
When screening the raw data, first check whether each developed software product covers all three dimensions, and reject those that do not. The products that cover all three dimensions are then screened further. For example, conditions can be set such as: the requirement quantity is no less than a first value and the use-case quantity is no less than a second value; personnel have a bachelor's degree or above and at least one year at the company; and the completion time is within half a year. The raw data related to these three dimensions, recorded for the products that meet these conditions, is then used as the software defect prediction sample set. This gives the sample data higher reference value and can, to some extent, improve the accuracy of the prediction model trained on the screened samples, so that the model's predictions are in turn more useful.
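The two-stage screening above can be sketched as a filter over project records. Everything structural here is an assumption made for illustration: the `ProjectRecord` type and its field names, one record per software product, and the default threshold values standing in for the unspecified first and second values. The education threshold of 5 corresponds to a bachelor's degree in Table 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectRecord:
    # Project-size dimension
    requirement_count: Optional[int]
    use_case_count: Optional[int]
    # Personnel-capability dimension
    education_score: Optional[int]
    company_years: Optional[float]
    # Product-delivery dimension
    completion_months: Optional[float]


def passes_screening(record, min_requirements=10, min_use_cases=50):
    """Stage 1: reject records missing a dimension. Stage 2: apply the thresholds."""
    values = (record.requirement_count, record.use_case_count,
              record.education_score, record.company_years,
              record.completion_months)
    if any(value is None for value in values):
        return False
    return (record.requirement_count >= min_requirements
            and record.use_case_count >= min_use_cases
            and record.education_score >= 5       # bachelor's or above
            and record.company_years >= 1.0       # at least one year at the company
            and record.completion_months <= 6.0)  # completed within half a year


records = [
    ProjectRecord(12, 80, 6, 2.0, 4.0),    # meets every condition
    ProjectRecord(12, 80, 6, 2.0, None),   # delivery dimension missing
    ProjectRecord(12, 80, 4, 2.0, 4.0),    # education below bachelor's
]
screened = [r for r in records if passes_screening(r)]
```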
S22: Based on the training set and the gradient boosting algorithm, train the random forest learner.
In a specific implementation of step S22, the random forest learner contains multiple decision trees, so each decision tree must be trained with samples from the training set. To this end, when training each decision tree, the embodiment randomly divides the software defect prediction samples in the training set into several batches, each containing the same number of samples, and then trains the decision-tree model with the samples in each batch. Specifically, the following is performed for each batch in the training set: using the software defect prediction samples in the batch and the gradient boosting algorithm, any decision tree in the random forest learner is trained iteratively for a preset number of iterations, yielding the training model for that batch.
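The random division into equally sized batches might look like the following sketch. The source does not say what happens when the training set does not divide evenly; dropping the leftover samples, the fixed `seed`, and the name `make_batches` are assumptions made here.

```python
import random


def make_batches(training_set, num_batches, seed=0):
    """Randomly divide the training set into batches of identical size."""
    shuffled = list(training_set)
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // num_batches
    # Assumed: samples that do not fill a whole batch are left out
    # so that every batch has the same number of samples.
    return [shuffled[i * size:(i + 1) * size] for i in range(num_batches)]


batches = make_batches(range(10), num_batches=3)
```

Each batch would then be used to train one decision tree for the preset number of iterations.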
Specifically, iteratively training any decision tree may follow the process shown in Fig. 2b, comprising the following steps:
S221: For the i-th training iteration, determine the gradient value of the loss function determined in the (i-1)-th iteration with respect to the training model obtained in the (i-1)-th iteration.
In a specific implementation, to obtain the model of the i-th iteration, the gradient boosting algorithm is used to determine the gradient value of the loss function determined in the (i-1)-th iteration with respect to the training model obtained in the (i-1)-th iteration. The gradient value can be positive or negative. A gradient value greater than 0 indicates that the training model from the (i-1)-th iteration needs to be corrected in the negative direction; a gradient value less than 0 indicates that it needs to be corrected in the positive direction. Ideally the gradient value would be close to 0, but in general the computed value is clearly nonzero. The present invention therefore uses the gradient boosting algorithm to bring the gradient value of the final trained model as close to 0 as possible, making the final model more accurate.
S222: Determine the leaf node regions of the decision tree according to the gradient value.
Specifically, a linear search method can be used to determine the leaf node regions of the decision tree.
S223: According to the leaf node regions, determine the gain of each leaf node at which the value of the loss function determined in the (i-1)-th iteration satisfies a preset condition.
Preferably, this specifically comprises: determining, according to the leaf node regions, the gain of each leaf node at which the loss function determined in the (i-1)-th iteration attains its minimum.
S224: Determine the training model of the i-th iteration using the leaf node regions and the gain of each leaf node.
S225: Judge whether the current iteration count has reached the preset number. If not, execute step S226; if so, execute step S227.
S226: Increment the current iteration count i by 1, that is, i = i + 1; then execute step S221.
S227: The process ends.
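Steps S221 to S224 read like the standard gradient boosting update. As a hedged sketch in that standard notation (the symbols $L$ for the loss function, $F_i$ for the training model after iteration $i$, $R_j$ for the leaf node regions, and $\gamma_j$ for the leaf gains are assumptions introduced here and do not appear in the source), with samples $(x_n, y_n)$:

```latex
\begin{align*}
g_n &= \left[\frac{\partial L\bigl(y_n, F(x_n)\bigr)}{\partial F(x_n)}\right]_{F = F_{i-1}}
  && \text{(S221: gradient value)}\\
\{R_j\}_{j=1}^{J} &= \text{leaf regions of a tree fitted to } \{(x_n,\, -g_n)\}
  && \text{(S222: leaf node regions)}\\
\gamma_j &= \arg\min_{\gamma} \sum_{x_n \in R_j} L\bigl(y_n,\, F_{i-1}(x_n) + \gamma\bigr)
  && \text{(S223: gain of each leaf node)}\\
F_i(x) &= F_{i-1}(x) + \sum_{j=1}^{J} \gamma_j\, \mathbf{1}[x \in R_j]
  && \text{(S224: model of the $i$-th iteration)}
\end{align*}
```

For a squared loss $L = \tfrac{1}{2}(y - F)^2$, the gradient is $g_n = F_{i-1}(x_n) - y_n$, so a positive gradient calls for a correction in the negative direction, and $\gamma_j$ is the mean of the residuals $y_n - F_{i-1}(x_n)$ over the samples in $R_j$.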
Preferably, the sample features of each sample in the training set include at least one of the following: defect count, defect module, tester information, developer information, test duration, use-case quantity, and requirement quantity.
Note that the requirement quantity referred to in Embodiment One of the present invention can be understood as the quantity of market demand for the software.
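The iteration in steps S221 to S227 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes a squared loss, uses depth-one trees (stumps) so that each tree has exactly two leaf node regions, and picks the split by exhaustive search. The names `fit_stump` and `train_boosted` are hypothetical.

```python
def fit_stump(X, targets):
    """Pick the (feature, threshold) split whose two leaf means best fit the targets."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, targets) if row[f] <= t]
            right = [r for row, r in zip(X, targets) if row[f] > t]
            if not left or not right:
                continue
            # Under squared loss, the gain that minimizes the loss in a leaf
            # is the mean of the targets that fall in it (S223).
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, f, t, lm, rm)
    return None if best is None else best[1:]


def train_boosted(X, y, rounds=10):
    """Run a preset number of boosting iterations and return a predict function."""
    base = sum(y) / len(y)  # initial model: the mean defect count
    stumps = []

    def predict(row):
        out = base
        for f, t, left_gain, right_gain in stumps:
            out += left_gain if row[f] <= t else right_gain
        return out

    for _ in range(rounds):                                    # S225/S226: loop control
        grads = [predict(row) - yi for row, yi in zip(X, y)]   # S221: gradient value
        stump = fit_stump(X, [-g for g in grads])              # S222/S223: regions, gains
        if stump is None:                                      # no valid split exists
            break
        stumps.append(stump)                                   # S224: updated model
    return predict


# Toy data: one feature, defect count as the target.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.0, 1.0, 3.0, 3.0]
model = train_boosted(X, y, rounds=3)
```

On this toy data the first iteration already splits between 2.0 and 3.0, after which the gradients are zero and later iterations leave the model unchanged.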
S23: Use the validation set to validate the trained random forest learner, obtaining the prediction model for software defect prediction.
In a specific implementation, step S23 specifically includes:
For any one of the training models obtained by training on each batch, perform the operations shown in Fig. 2c:
S231: Validate the training model with all software defect prediction samples in the validation set, obtaining a validation result for each software defect prediction sample.
In a specific implementation, once each decision tree of the random forest has been trained with the gradient boosting algorithm to obtain its final training model, the software defect prediction samples in the validation set are used to validate each decision tree's final training model and obtain validation results. For example, when the final training model of any decision tree is validated with a sample A, the validation result is the predicted defect count for sample A.
S232: According to the actual result and validation result of each software defect prediction sample, determine the prediction accuracy of the training model using a mean squared error function.
Preferably, the validation result may be the predicted defect count of a software defect prediction sample, and the actual result may be the actual defect count of that sample.
In a specific implementation, once the validation result of each software defect prediction sample has been determined, and because the actual defect count of each sample is known, the prediction accuracy of the training model can be determined from the actual defect counts and the predicted defect counts using the mean squared error function.
Similarly, the prediction accuracy of each training model can be determined by the method of steps S231 and S232.
After the prediction accuracy of each training model has been determined, the prediction accuracies are compared and the training model with the highest prediction accuracy (with a mean squared error measure, the one with the smallest error) is chosen as the prediction model for software defect prediction. This yields a software defect prediction model with relatively accurate predictions.
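Steps S231 and S232, followed by the final selection, can be sketched as below. The sketch assumes that the highest prediction accuracy corresponds to the lowest mean squared error on the validation set, and that each training model is a callable mapping a feature vector to a predicted defect count; the names `mean_squared_error` and `select_best` are hypothetical.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted defect counts."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def select_best(models, val_X, val_y):
    """Validate every training model and return the one with the smallest error."""
    scored = []
    for index, model in enumerate(models):
        predictions = [model(x) for x in val_X]                      # S231
        scored.append((mean_squared_error(val_y, predictions), index))  # S232
    best_error, best_index = min(scored)
    return models[best_index], best_error


# Two stand-in training models and a tiny validation set.
val_X = [[1.0], [2.0], [3.0]]
val_y = [2.0, 4.0, 6.0]
candidates = [lambda x: x[0] + 1.0, lambda x: 2.0 * x[0]]
best_model, best_error = select_best(candidates, val_X, val_y)
```

In this example the second candidate reproduces the validation targets exactly, so it is the one selected.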
The software defect prediction method provided by the embodiment of the present invention obtains a feature vector of the software to be predicted, and determines a software defect prediction result for the software to be predicted based on the feature vector and a pre-trained prediction model for software defect prediction, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learner. Because the prediction model is trained with a gradient boosting algorithm and a random forest learner, the software defect prediction results it outputs are more accurate, without a large impact on computational complexity.
Embodiment Two
Based on the same inventive concept, an embodiment of the present invention further provides a software defect prediction device. Because the device solves the problem on a principle similar to that of the software defect prediction method, the implementation of the device may refer to the implementation of the method, and repeated details are omitted.
Fig. 3 is a schematic structural diagram of the software defect prediction device provided by Embodiment Two of the present invention. The device includes an acquiring unit 31 and a determination unit 32, where:
The acquiring unit 31 is configured to obtain the feature vector of the software to be predicted;
The determination unit 32 is configured to determine, based on the feature vector and a prediction model trained in advance for software defect prediction, the software defect prediction result of the software under test, where the prediction model is obtained by training based on a gradient boosting algorithm and a random forest learner.
Preferably, the determination unit 32 is specifically configured to obtain the prediction model for software defect prediction as follows: obtain a software defect prediction sample set, and divide the obtained software defect prediction samples into a training set and a validation set according to a preset ratio; then perform model training on the random forest learner based on the training set and the gradient boosting algorithm, and perform model verification on the trained random forest learner using the validation set to obtain the prediction model for software defect prediction.
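As an illustration of dividing the sample set by a preset ratio, the sketch below shuffles the samples and cuts them into a training set and a validation set. The 0.8 ratio, the shuffling and the fixed seed are assumptions made for the example; the text only says that a preset ratio is used. The function name `split_samples` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_samples(samples: Sequence, train_ratio: float = 0.8,
                  seed: int = 0) -> Tuple[List, List]:
    """Shuffle the samples and split them into training and validation sets."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


samples = list(range(10))  # stand-ins for software defect prediction samples
training_set, validation_set = split_samples(samples, train_ratio=0.8)
```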
Further, the determination unit 32 is specifically configured to randomly divide the software defect prediction samples contained in the training set into several batches, where each batch contains the same number of software defect prediction samples, and to perform the following process for each batch contained in the training set: using the software defect prediction samples contained in the batch and the gradient boosting algorithm, perform a preset number of loop-iteration training rounds on any decision tree in the random forest learner to obtain the training model trained on that batch.
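The random division into equally sized batches might look like the sketch below. The text does not say what happens when the training set size is not a multiple of the batch size; dropping the leftover samples is an assumption made here so that every batch has the same size. The name `split_into_batches` is hypothetical.

```python
import random
from typing import List, Sequence


def split_into_batches(training_set: Sequence, batch_size: int,
                       seed: int = 0) -> List[List]:
    """Randomly divide the training set into batches of identical size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    shuffled = list(training_set)
    random.Random(seed).shuffle(shuffled)
    # Assumption: leftover samples are dropped so that all batches are equal.
    usable = len(shuffled) - len(shuffled) % batch_size
    return [shuffled[i:i + batch_size] for i in range(0, usable, batch_size)]


batches = split_into_batches(list(range(10)), batch_size=3)
```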
Specifically, the determination unit is configured, for the i-th training, to determine the gradient value, at the training model obtained in the (i-1)-th training, of the loss function determined in the (i-1)-th training; to determine, according to the gradient value, the leaf node regions contained in the decision tree; to determine, according to the leaf node regions, the gain of each leaf node at which the value of the loss function determined in the (i-1)-th training satisfies a preset condition; and to determine the training model obtained in the i-th training using the leaf node regions and the gain of each leaf node, where i is an integer between 1 and the preset number of times.
Preferably, the determination unit is specifically configured to determine, according to the leaf node regions, the gain of each leaf node at which the loss function determined in the (i-1)-th training attains its minimum.
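The iteration just described (gradient of the loss, leaf node regions, per-leaf gain that minimizes the loss, updated model) can be sketched for one simple case. The sketch assumes a squared-error loss, a single numeric feature, and a depth-1 tree with two leaves; none of these choices is stated in the text. Under squared error the negative gradient is the residual, and the gain that minimizes the loss in a leaf is the mean residual of that leaf. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float]  # (single feature value, actual defect count)


def _sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty leaf)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_leaf_regions(xs: Sequence[float], residuals: Sequence[float]) -> float:
    """Choose the threshold whose two leaf regions best fit the residuals."""
    best_threshold, best_error = xs[0], float("inf")
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        error = _sse(left) + _sse(right)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold


def boost(samples: List[Sample], rounds: int) -> Callable[[float], float]:
    """Train for a preset number of rounds and return the final model."""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stages: List[Tuple[float, float, float]] = []
    for _ in range(rounds):
        # Negative gradient of 0.5 * (y - F)^2 with respect to F is y - F.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold = fit_leaf_regions(xs, residuals)
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        # Leaf gain minimizing squared loss: mean residual within the leaf.
        left_gain = sum(left) / len(left) if left else 0.0
        right_gain = sum(right) / len(right) if right else 0.0
        stages.append((threshold, left_gain, right_gain))
        predictions = [p + (left_gain if x <= threshold else right_gain)
                       for x, p in zip(xs, predictions)]

    def model(x: float) -> float:
        return base + sum(lg if x <= t else rg for t, lg, rg in stages)

    return model


training_batch = [(1.0, 1.0), (2.0, 1.0), (3.0, 5.0), (4.0, 5.0)]
model = boost(training_batch, rounds=3)
```

With a different loss function, the gradient and the minimizing leaf gain would have to be derived for that loss.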
Further, the determination unit 32 is specifically configured to perform the following operations for each of the training models obtained by training on the batches: verify the training model using all software defect prediction samples in the validation set to obtain a verification result for each software defect prediction sample; determine the prediction accuracy of the training model using a mean squared error function, according to the actual result and the verification result of each software defect prediction sample; and compare the prediction accuracies of the training models, determining the training model with the highest prediction accuracy as the prediction model for software defect prediction.
Preferably, the sample features of each sample in the training set include at least one of the following: defect count, module to which a defect belongs, tester information, developer information, test duration, number of use cases, and number of requirements.
Preferably, the determination unit 32 is specifically configured to screen the raw data recorded during development of software whose software defect results are known, according to the degree of correlation between the raw data and software defect prediction, to obtain the software defect prediction samples.
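One way to picture the screening step is a correlation filter over the recorded fields. The text does not say how the degree of correlation is measured, so the sketch below assumes the absolute Pearson correlation with the known defect count and a cut-off of 0.5. The field names and data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def screen_fields(raw: Dict[str, List[float]], target: str,
                  threshold: float = 0.5) -> List[str]:
    """Keep the fields whose correlation with the target meets the cut-off."""
    return [name for name, values in raw.items()
            if name != target and abs(pearson(values, raw[target])) >= threshold]


# Hypothetical raw records from four finished projects.
raw_records = {
    "use_case_count": [10, 20, 30, 40],
    "test_duration_days": [5, 3, 6, 4],
    "build_id": [7, 7, 7, 7],
    "defect_count": [2, 4, 6, 8],
}
selected = screen_fields(raw_records, target="defect_count")
```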
Preferably, the software defect prediction result includes a defect count and the modules to which defects belong, and the device further includes:
A processing unit, configured to obtain the actual result of the software to be predicted; for the software to be predicted, compare the defect count included in the obtained software defect prediction result with the actual defect count included in the actual result; and, if it is determined that the defect count is greater than the actual defect count, determine, from all defect modules included in the software defect prediction result, the modules that are inconsistent with the actual defect modules included in the actual result, store them in a list, and present the list to the user.
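The comparison performed by the processing unit can be sketched as below. The `DefectResult` type and the module names are hypothetical. The sketch treats a predicted module as inconsistent when it does not appear among the actual defect modules, which is one reading of the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DefectResult:
    defect_count: int
    modules: List[str] = field(default_factory=list)


def modules_to_review(predicted: DefectResult, actual: DefectResult) -> List[str]:
    """List predicted defect modules missing from the actual result.

    The list is only built when the predicted count exceeds the actual count.
    """
    if predicted.defect_count <= actual.defect_count:
        return []
    actual_modules = set(actual.modules)
    return [m for m in predicted.modules if m not in actual_modules]


predicted = DefectResult(5, ["login", "billing", "search"])
actual = DefectResult(3, ["login", "search"])
review_list = modules_to_review(predicted, actual)
```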
For convenience of description, the above parts are described separately as modules (or units) divided by function. Of course, when implementing the present invention, the functions of the modules (or units) may be realized in one or more pieces of software or hardware.
Embodiment Three
Embodiment Three of the present application provides a non-volatile computer storage medium storing computer-executable instructions, which can execute the software defect prediction method in any of the above method embodiments.
Embodiment Four
Fig. 4 is a schematic diagram of the hardware structure of an electronic device that implements the software defect prediction method, provided by Embodiment Four of the present invention. As shown in Fig. 4, the electronic device includes:
One or more processors 410 and a memory 420; Fig. 4 takes one processor 410 as an example.
The electronic device that executes the software defect prediction method may further include an input device 430 and an output device 440.
The processor 410, the memory 420, the input device 430 and the output device 440 may be connected by a bus or in other ways; Fig. 4 takes connection by a bus as an example.
As a non-volatile computer-readable storage medium, the memory 420 can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules/units corresponding to the software defect prediction method in the embodiments of the present application (for example, the acquiring unit 31 and the determination unit 32 shown in Fig. 3). By running the non-volatile software programs, instructions and modules/units stored in the memory 420, the processor 410 executes the various functional applications and data processing of a server or intelligent terminal, that is, it implements the software defect prediction method of the above method embodiments.
The memory 420 may include a program storage area and a data storage area, where the program storage area can store an operating system and an application program required by at least one function, and the data storage area can store data created through use of the software defect prediction device. In addition, the memory 420 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 420 optionally includes memory located remotely from the processor 410, and such remote memory can be connected to the software defect prediction device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 can receive input numeric or character information and generate key signal inputs related to the user settings and function control of the software defect prediction device. The output device 440 may include a display device such as a display screen.
The one or more modules are stored in the memory 420 and, when executed by the one or more processors 410, execute the software defect prediction method in any of the above method embodiments.
The above product can execute the method provided by the embodiments of the present application, and has the functional modules and beneficial effects corresponding to the method. For technical details not described in this embodiment, refer to the method provided by the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: devices of this kind have mobile communication functions and take voice and data communication as their main goal. Such terminals include smart phones (e.g. iPhone), multimedia phones, feature phones and low-end phones.
(2) Ultra-mobile personal computer devices: devices of this kind belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID and UMPC devices, e.g. iPad.
(3) Portable entertainment devices: devices of this kind can display and play multimedia content. They include audio and video players (e.g. iPod), handheld devices, e-books, smart toys and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server comprises a processor, hard disk, memory, system bus and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, the requirements on processing capacity, stability, reliability, security, scalability and manageability are higher.
(5) Other electronic devices with data interaction functions.
Embodiment Five
Embodiment Five of the present application provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is caused to execute the software defect prediction method of any of the above method embodiments of the present application.
The software defect prediction device provided by the embodiments of the present application may be implemented by a computer program. Those skilled in the art should understand that the above module division is only one of many possible module divisions; division into other modules, or no division into modules, falls within the protection scope of the present application as long as the software defect prediction device has the above functions.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.
Claims (20)
1. A software defect prediction method, characterized by comprising:
obtaining a feature vector of software to be predicted; and
determining, based on the feature vector and a prediction model trained in advance for software defect prediction, a software defect prediction result of the software under test, wherein the prediction model is obtained by training based on a gradient boosting algorithm and a random forest learner.
2. The method of claim 1, characterized in that the prediction model for software defect prediction is obtained as follows:
obtaining a software defect prediction sample set, and dividing the obtained software defect prediction samples into a training set and a validation set according to a preset ratio; and
performing model training on the random forest learner based on the training set and the gradient boosting algorithm, and performing model verification on the trained random forest learner using the validation set to obtain the prediction model for software defect prediction.
3. The method of claim 2, characterized in that performing model training on the random forest learner based on the training set and the gradient boosting algorithm specifically comprises:
randomly dividing the software defect prediction samples contained in the training set into several batches, wherein each batch contains the same number of software defect prediction samples; and
performing the following process for each batch contained in the training set: using the software defect prediction samples contained in the batch and the gradient boosting algorithm, performing a preset number of loop-iteration training rounds on any decision tree in the random forest learner to obtain the training model trained on the batch.
4. The method of claim 3, characterized in that using the software defect prediction samples contained in the batch and the gradient boosting algorithm to perform a preset number of loop-iteration training rounds on any decision tree in the random forest learner to obtain the training model trained on the batch specifically comprises:
for the i-th training, determining the gradient value, at the training model obtained in the (i-1)-th training, of the loss function determined in the (i-1)-th training; and
determining, according to the gradient value, the leaf node regions contained in the decision tree; and
determining, according to the leaf node regions, the gain of each leaf node at which the value of the loss function determined in the (i-1)-th training satisfies a preset condition; and
determining the training model obtained in the i-th training using the leaf node regions and the gain of each leaf node;
wherein i is an integer between 1 and the preset number of times.
5. The method of claim 4, characterized in that determining, according to the leaf node regions, the gain of each leaf node at which the value of the loss function determined in the (i-1)-th training satisfies the preset condition specifically comprises:
determining, according to the leaf node regions, the gain of each leaf node at which the loss function determined in the (i-1)-th training attains its minimum.
6. The method of claim 5, characterized in that performing model verification on the trained random forest learner using the validation set to obtain the prediction model for software defect prediction specifically comprises:
performing the following operations for each of the training models obtained by training on the batches:
verifying the training model using all software defect prediction samples in the validation set to obtain a verification result for each software defect prediction sample;
determining the prediction accuracy of the training model using a mean squared error function, according to the actual result and the verification result of each software defect prediction sample; and
comparing the prediction accuracies of the training models, and determining the training model with the highest prediction accuracy as the prediction model for software defect prediction.
7. The method of claim 3, characterized in that the sample features of each sample in the training set include at least one of the following: defect count, module to which a defect belongs, tester information, developer information, test duration, number of use cases, and number of requirements.
8. The method of claim 2, characterized in that obtaining the software defect prediction samples specifically comprises:
screening the raw data recorded during development of software whose software defect results are known, according to the degree of correlation between the raw data and software defect prediction, to obtain the software defect prediction samples.
9. The method of any one of claims 1 to 8, characterized in that the software defect prediction result includes a defect count and the modules to which defects belong; and the method further comprises:
obtaining an actual result of the software to be predicted;
for the software to be predicted, comparing the defect count included in the obtained software defect prediction result with the actual defect count included in the actual result; and
if it is determined that the defect count is greater than the actual defect count, determining, from all defect modules included in the software defect prediction result, the modules that are inconsistent with the actual defect modules included in the actual result, storing them in a list, and presenting the list to the user.
10. A software defect prediction device, characterized by comprising:
an acquiring unit, configured to obtain a feature vector of software to be predicted; and
a determination unit, configured to determine, based on the feature vector and a prediction model trained in advance for software defect prediction, a software defect prediction result of the software under test, wherein the prediction model is obtained by training based on a gradient boosting algorithm and a random forest learner.
11. The device of claim 10, characterized in that
the determination unit is specifically configured to obtain the prediction model for software defect prediction as follows: obtaining a software defect prediction sample set, and dividing the obtained software defect prediction samples into a training set and a validation set according to a preset ratio; and performing model training on the random forest learner based on the training set and the gradient boosting algorithm, and performing model verification on the trained random forest learner using the validation set to obtain the prediction model for software defect prediction.
12. The device of claim 11, characterized in that
the determination unit is specifically configured to randomly divide the software defect prediction samples contained in the training set into several batches, wherein each batch contains the same number of software defect prediction samples; and to perform the following process for each batch contained in the training set: using the software defect prediction samples contained in the batch and the gradient boosting algorithm, performing a preset number of loop-iteration training rounds on any decision tree in the random forest learner to obtain the training model trained on the batch.
13. The device of claim 12, characterized in that
the determination unit is specifically configured, for the i-th training, to determine the gradient value, at the training model obtained in the (i-1)-th training, of the loss function determined in the (i-1)-th training; to determine, according to the gradient value, the leaf node regions contained in the decision tree; to determine, according to the leaf node regions, the gain of each leaf node at which the value of the loss function determined in the (i-1)-th training satisfies a preset condition; and to determine the training model obtained in the i-th training using the leaf node regions and the gain of each leaf node; wherein i is an integer between 1 and the preset number of times.
14. The device of claim 13, characterized in that
the determination unit is specifically configured to determine, according to the leaf node regions, the gain of each leaf node at which the loss function determined in the (i-1)-th training attains its minimum.
15. The device of claim 14, characterized in that
the determination unit is specifically configured to perform the following operations for each of the training models obtained by training on the batches: verifying the training model using all software defect prediction samples in the validation set to obtain a verification result for each software defect prediction sample; determining the prediction accuracy of the training model using a mean squared error function, according to the actual result and the verification result of each software defect prediction sample; and comparing the prediction accuracies of the training models, and determining the training model with the highest prediction accuracy as the prediction model for software defect prediction.
16. The device of claim 14, characterized in that the sample features of each sample in the training set include at least one of the following: defect count, module to which a defect belongs, tester information, developer information, test duration, number of use cases, and number of requirements.
17. The device of claim 11, characterized in that
the determination unit is specifically configured to screen the raw data recorded during development of software whose software defect results are known, according to the degree of correlation between the raw data and software defect prediction, to obtain the software defect prediction samples.
18. The device of any one of claims 10 to 17, characterized in that the software defect prediction result includes a defect count and the modules to which defects belong; and the device further comprises:
a processing unit, configured to obtain an actual result of the software to be predicted; for the software to be predicted, compare the defect count included in the obtained software defect prediction result with the actual defect count included in the actual result; and, if it is determined that the defect count is greater than the actual defect count, determine, from all defect modules included in the software defect prediction result, the modules that are inconsistent with the actual defect modules included in the actual result, store them in a list, and present the list to the user.
19. A non-volatile computer storage medium storing computer-executable instructions, characterized in that the computer-executable instructions are used to execute the method of any one of claims 1 to 9.
20. An electronic device, characterized by comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711462461.0A CN109976998B (en) | 2017-12-28 | 2017-12-28 | Software defect prediction method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711462461.0A CN109976998B (en) | 2017-12-28 | 2017-12-28 | Software defect prediction method and device and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109976998A (en) | 2019-07-05 |
CN109976998B (en) | 2022-06-07 |
Family
ID=67074950
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711462461.0A Active CN109976998B (en) | 2017-12-28 | 2017-12-28 | Software defect prediction method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109976998B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130268469A1 (en) * | 2012-04-06 | 2013-10-10 | Applied Materials, Inc. | Increasing signal to noise ratio for creation of generalized and robust prediction models |
CN103257921A (en) * | 2013-04-16 | 2013-08-21 | 西安电子科技大学 | Improved random forest algorithm based system and method for software fault prediction |
CN105608004A (en) * | 2015-12-17 | 2016-05-25 | 云南大学 | CS-ANN-based software failure prediction method |
Non-Patent Citations (2)
Title |
---|
ISSAM H.LARADJI 等: "Software defect prediction using ensemble learning on selected features", 《INFORMATION AND SOFTWARE TECHNOLOGY》 * |
李巧: "模型融合算法的研究及应用", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110732138A (en) * | 2019-10-17 | 2020-01-31 | 腾讯科技(深圳)有限公司 | Virtual object control method and device, readable storage medium and computer equipment |
CN110732138B (en) * | 2019-10-17 | 2023-09-22 | 腾讯科技(深圳)有限公司 | Virtual object control method, device, readable storage medium and computer equipment |
CN111090585A (en) * | 2019-12-09 | 2020-05-01 | 中国科学院软件研究所 | Crowd-sourcing task closing time automatic prediction method based on crowd-sourcing process |
CN111090585B (en) * | 2019-12-09 | 2021-06-01 | 中国科学院软件研究所 | Crowd-sourcing task closing time automatic prediction method based on crowd-sourcing process |
CN111143222A (en) * | 2019-12-30 | 2020-05-12 | 军事科学院系统工程研究院系统总体研究所 | Software evaluation method based on defect prediction |
WO2021159585A1 (en) * | 2020-02-10 | 2021-08-19 | 北京工业大学 | Dioxin emission concentration prediction method |
CN112416783A (en) * | 2020-11-25 | 2021-02-26 | 武汉联影医疗科技有限公司 | Method, device, equipment and storage medium for determining software quality influence factors |
CN112416783B (en) * | 2020-11-25 | 2022-05-20 | 武汉联影医疗科技有限公司 | Method, device, equipment and storage medium for determining software quality influence factors |
CN112711530A (en) * | 2020-12-28 | 2021-04-27 | 航天信息股份有限公司 | Code risk prediction method and system based on machine learning |
CN112882934A (en) * | 2021-02-24 | 2021-06-01 | 中国工商银行股份有限公司 | Test analysis method and system based on defect growth |
CN112882934B (en) * | 2021-02-24 | 2024-02-13 | 中国工商银行股份有限公司 | Test analysis method and system based on defect growth |
CN117475240A (en) * | 2023-12-26 | 2024-01-30 | 创思(广州)电子科技有限公司 | Vegetable checking method and system based on image recognition |
Also Published As
Publication number | Publication date |
---|---|
CN109976998B (en) | 2022-06-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109976998A (en) | A kind of Software Defects Predict Methods, device and electronic equipment | |
CN108021931A (en) | A kind of data sample label processing method and device | |
CN105912500A (en) | Machine learning model generation method and machine learning model generation device | |
EP3731159A1 (en) | Adaptive multiyear economic planning method for energy systems, microgrid and distributed energy resources | |
CN110472798A (en) | Prediction technique, device and the computer readable storage medium of time series data | |
CN108206027A (en) | A kind of audio quality evaluation method and system | |
CN109783704A (en) | Man-machine mixed answer method, system, device | |
CN106462923A (en) | Predicting social, economic, and learning outcomes | |
CN109697636A (en) | A kind of trade company's recommended method, trade company's recommendation apparatus, electronic equipment and medium | |
CN113361690A (en) | Water quality prediction model training method, water quality prediction device, water quality prediction equipment and medium | |
CN109460503A (en) | Answer input method, device, storage medium and electronic equipment | |
CN110517558A (en) | A kind of piano playing fingering evaluation method and system, storage medium and terminal | |
CN108932646A (en) | User tag verification method, device and electronic equipment based on operator | |
CN110164474A (en) | Voice wakes up automated testing method and system | |
CN113554213A (en) | Natural gas demand prediction method, system, storage medium and equipment | |
CN113408957A (en) | Classroom teaching evaluation method based on combined empowerment method | |
CN109473121A (en) | Speech synthesis quality detecting method and device | |
CN107392217A (en) | Computer implemented information processing method and device | |
CN117493830A (en) | Evaluation of training data quality, and generation method, device and equipment of evaluation model | |
CN110134754A (en) | Operation time prediction technique, device, server and the medium of region point of interest | |
CN111695967A (en) | Method, device, equipment and storage medium for determining quotation | |
CN114968821A (en) | Test data generation method and device based on reinforcement learning | |
CN109670568A (en) | Neural net prediction method and device | |
KR101979427B1 (en) | Apparatus for Education and Assessment of Debt Management Competency and Method Thereof | |
Kundapur et al. | Simulating knowledge worker adoption rate of KMS: An organizational perspective |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||