CN109976998A - Software defect prediction method, apparatus, and electronic device - Google Patents

Software defect prediction method, apparatus, and electronic device

Info

Publication number
CN109976998A
Authority
CN
China
Prior art keywords
software
training
prediction
software defect
defect
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711462461.0A
Other languages
Chinese (zh)
Other versions
CN109976998B (en
Inventor
吴旭
曹晶晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aisino Corp
Original Assignee
Aisino Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aisino Corp
Priority to CN201711462461.0A
Publication of CN109976998A
Application granted
Publication of CN109976998B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/36: Preventing errors by testing or debugging software
    • G06F11/3604: Software analysis for verifying properties of programs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Abstract

The invention discloses a software defect prediction method, apparatus, and electronic device. The method comprises: obtaining a feature vector of the software to be predicted; and determining a software defect prediction result for the software to be predicted based on the feature vector and a pre-trained prediction model for software defect prediction, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learner. Because the prediction model is trained with a gradient boosting algorithm and a random forest learner, the software defect prediction results it outputs are more accurate, without a significant increase in computational complexity.

Description

Software defect prediction method, apparatus, and electronic device
Technical field
The present invention relates to the field of software engineering application technology, and in particular to a software defect prediction method, apparatus, and device.
Background
Software defect prediction originated in the 1970s and has been a very active topic in software engineering ever since. It plays an important role in analyzing software quality and balancing software cost. Software defect prediction analyzes known defects, taking into account factors such as the software's development method, its complexity, and the capability of the personnel involved, in order to predict potential defects in an existing project.
Most existing software defect prediction methods use a single algorithm to measure the individual indicators of software defects and do not perform comprehensive measurement and prediction based on all attributes of the software. Existing algorithms mainly include support vector machines (SVM), neural networks, Bayesian methods, and logistic regression. Some of these methods have excessive computational complexity and others have insufficient accuracy, so none achieves a good prediction effect.
Therefore, how to improve the accuracy of prediction results without significantly increasing computational complexity is one of the technical problems that urgently needs to be solved.
Summary of the invention
Embodiments of the present invention provide a software defect prediction method, apparatus, and electronic device to solve the problem that the software defect prediction methods used in the prior art have high computational complexity and low prediction accuracy.
In a first aspect, an embodiment of the present invention provides a software defect prediction method, comprising:
obtaining a feature vector of the software to be predicted; and
determining a software defect prediction result for the software to be predicted based on the feature vector and a pre-trained prediction model for software defect prediction, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learner.
In a second aspect, an embodiment of the present invention provides a software defect prediction apparatus, comprising:
an obtaining unit, configured to obtain a feature vector of the software to be predicted;
a determining unit, configured to determine a software defect prediction result for the software to be predicted based on the feature vector and a pre-trained prediction model for software defect prediction, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learner.
In a third aspect, an embodiment of the present invention provides a non-volatile computer storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the software defect prediction method provided by the present application.
In a fourth aspect, an embodiment of the present invention provides an electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the software defect prediction method provided by the present application.
Beneficial effects of the present invention:
The software defect prediction method, apparatus, and electronic device provided by the embodiments of the present invention obtain a feature vector of the software to be predicted and, based on the feature vector and a pre-trained prediction model for software defect prediction, determine a software defect prediction result for the software to be predicted, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learner. Because the prediction model is trained with a gradient boosting algorithm and a random forest learner, the software defect prediction results it outputs are more accurate, without a significant increase in computational complexity.
Other features and advantages of the present invention will be set forth in the description that follows, and in part will become apparent from the description or be understood by implementing the invention. The objectives and other advantages of the invention can be realized and obtained through the structures particularly pointed out in the written description, the claims, and the drawings.
Brief description of the drawings
The drawings described here are provided for further understanding of the present invention and form a part of it. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
Fig. 1a is the first schematic flowchart of the software defect prediction method provided by Embodiment One of the present invention;
Fig. 1b is the second schematic flowchart of the software defect prediction method provided by Embodiment One of the present invention;
Fig. 2a is a schematic flowchart, provided by Embodiment One of the present invention, of training the prediction model for software defect prediction using the gradient boosting algorithm and the random forest learner;
Fig. 2b is a schematic flowchart, provided by Embodiment One of the present invention, of iteratively training any one decision tree;
Fig. 2c is a schematic flowchart, provided by Embodiment One of the present invention, of determining the prediction accuracy of any one of the training models obtained from the batches;
Fig. 3 is a schematic structural diagram of the software defect prediction apparatus provided by Embodiment Two of the present invention;
Fig. 4 is a schematic diagram of the hardware structure of an electronic device that implements the software defect prediction method, provided by Embodiment Four of the present invention.
Detailed description of the embodiments
Embodiments of the present invention provide a software defect prediction method, apparatus, and electronic device to solve the problem that the software defect prediction methods used in the prior art have high computational complexity and low prediction accuracy.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it, and that, where no conflict arises, the embodiments of the present invention and the features in those embodiments may be combined with one another.
Embodiment one
As shown in Fig. 1a, the software defect prediction method provided by Embodiment One of the present invention may include the following steps:
S11. Obtain the feature vector of the software to be predicted.
In a specific implementation, the goal is usually to predict the defects that will arise in the software currently being written, so the feature vector of the software to be predicted needs to be obtained from the historical data recorded earlier in the development process, before the current day. For example, to predict the software defects for the second day of development, the prediction is based on the features from the first day of development: from the first day's development results, one can determine the testers and developers who participated, the number of use cases used, the number of requirements, and so on.
Preferably, the features obtained for the software to be predicted may include the testers, the developers, the test duration, the number of use cases, and the number of requirements, and the feature vector is built from these features. Note that when a tester or developer is used as an element of the feature vector, it must be converted from a discrete variable into a continuous variable. For a developer, the continuous value used during software defect prediction can be determined from factors such as how long the developer has been working, how long the developer has been with the company, how long the developer has worked on the company's projects, and the developer's education. For example, education can first be converted to a score as shown in Table 1; a doctorate, for instance, converts to a score of 8.
Table 1
Education     Score
Doctorate     8
Master's      6
Bachelor's    5
Training      4
Specifically, when determining the developer's continuous value, the developer's time parameters can all be converted into numbers in units of years: total working time, time with the company, and time spent on the company's projects become three numbers in years. For example, a year and a half with the company gives 1.5. Months may of course be used as the unit instead. Once these three time parameters have been converted to numbers and the education score has been obtained as above, they are combined with the weights assigned in advance to the four parameters to compute the developer's continuous value.
Similarly, the continuous value of a tester is determined in the same way as that of a developer, and is not described again here.
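The conversion of a developer or tester into a continuous value might be sketched as follows. The education scores come from Table 1; the weight values, the parameter names, and the use of a plain weighted sum are assumptions made for illustration, since the text only says that pre-assigned weights are combined with the four parameters.

```python
# Education scores taken from Table 1.
EDUCATION_SCORE = {"doctorate": 8, "master": 6, "bachelor": 5, "training": 4}

# Hypothetical weights; the source says weights are assigned in advance
# but does not give their values.
WEIGHTS = {
    "career_years": 0.3,
    "company_years": 0.3,
    "project_years": 0.2,
    "education": 0.2,
}


def person_value(career_years, company_years, project_years, education):
    """Combine three durations (in years) and an education score into one number.

    A weighted sum is assumed here; the source does not name the operation.
    """
    params = {
        "career_years": career_years,
        "company_years": company_years,
        "project_years": project_years,
        "education": EDUCATION_SCORE[education],
    }
    return sum(WEIGHTS[name] * value for name, value in params.items())


# A developer with 6 years of experience, 1.5 years at the company,
# 1 year on company projects, and a master's degree.
developer_value = person_value(6.0, 1.5, 1.0, "master")
```

The same function would serve for testers, since the text says their value is determined in the same way.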
S12. Based on the feature vector and the pre-trained prediction model for software defect prediction, determine the software defect prediction result of the software to be predicted.
The prediction model is trained based on a gradient boosting algorithm and a random forest learner.
The embodiment of the present invention uses a gradient boosting algorithm and a random forest learner to train the prediction model for software defect prediction. First, the random forest algorithm speeds up training of the prediction model and, after training, can also report which features were important for training it. Second, the gradient boosting algorithm corrects the decision tree models obtained while training the decision trees of the random forest, reducing the residual step by step over the iterations, so that the optimal decision tree model is finally obtained along the gradient direction in which the residual decreases.
After the feature vector of the software to be predicted is obtained, it is input into the pre-trained prediction model, and the model's output is the software defect prediction result for the software to be predicted.
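Step S12 amounts to assembling the features in a fixed order and passing them to the trained model. The sketch below shows only that interface. The field names, the element order, and the `ConstantModel` placeholder are assumptions for illustration; a real model would be the one trained as described later in this embodiment.

```python
# Order of elements in the feature vector (an assumed convention; the source
# lists the features but not their order).
FEATURE_ORDER = (
    "tester_value",
    "developer_value",
    "test_duration_days",
    "use_case_count",
    "requirement_count",
)


def build_feature_vector(record):
    """Turn one day's development record into a list of floats."""
    return [float(record[name]) for name in FEATURE_ORDER]


class ConstantModel:
    """Placeholder for the pre-trained model; it always predicts the same count."""

    def __init__(self, defect_count):
        self.defect_count = defect_count

    def predict(self, feature_vector):
        return self.defect_count


def predict_defects(model, record):
    """Feed the record's feature vector to any object with a predict() method."""
    return model.predict(build_feature_vector(record))


# Hypothetical first-day record used to predict the second day's defects.
first_day = {
    "tester_value": 3.1,
    "developer_value": 3.65,
    "test_duration_days": 2,
    "use_case_count": 40,
    "requirement_count": 12,
}
predicted_count = predict_defects(ConstantModel(3), first_day)
```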
Preferably, the software defect prediction result may be the number of defects and the modules to which the defects belong.
In a specific implementation, again taking the prediction for the second day of development as an example, the feature vector for predicting the second day's defects is built from the developers and testers who took part in the first day of development and the number of use cases and number of requirements used on the first day. The feature vector is then input into the pre-trained prediction model, and the model's output is the software defect prediction result for the second day of development, for example the number of defects expected to appear on the second day and the module to which each defect belongs. Using the predicted number of defects, developers and testers can compare the defects actually found with the predicted number once the second day of development ends. Because the prediction model's output is relatively accurate, the number of defects that appear on the second day is likely to be close to the number the model outputs. If the actual number differs greatly from the predicted number, some defects in the software developed on the second day have not been found, and the developers or testers need to re-check or re-test. Using the predicted defect modules, developers and testers who find an error while verifying the software can pay particular attention to those modules, which greatly speeds up finding and verifying defects and reduces the workload of developers and testers to some extent.
In the present invention, the module to which a defect belongs can be understood as the identification information of the module containing the defect.
Preferably, to further reduce the workload of staff and testers, the present invention proposes the process shown in Fig. 1b, which includes the following steps:
S11a. Obtain the actual result for the software to be predicted.
Specifically, in practice, when staff and testers verify the program currently under development, they can record the actual result for the software developed that day; the result includes the actual number of defects and the modules to which the actual defects belong. On this basis, the present invention may provide a human-computer interaction interface through which the user can import the actual result for the software.
S12a. For the software to be predicted, compare the number of defects in the software defect prediction result with the actual number of defects in the actual result.
S13a. If the predicted number of defects is greater than the actual number of defects, determine, from all the defect modules in the software defect prediction result, the modules that do not match the actual defect modules in the actual result, store them in a list, and present the list to the user.
In a specific implementation, after the imported actual result is obtained in step S11a, step S12a is executed, that is, the actual result is compared with the software defect prediction result. Suppose the predicted number of defects is found to be greater than the actual number of defects. Because the defect prediction method provided by the present invention is relatively accurate, it can be concluded that some defects in the software have not been found. All the defect modules in the software defect prediction result can therefore be compared with the defect modules in the actual result to obtain the defect modules that appear in the prediction result but not in the actual result. The identification information of these modules is written to a list and shown to the staff and/or testers, who can use the list to locate these modules quickly and then find the hidden problems in them. This greatly reduces the workload of staff and testers and speeds up the search for defects.
As a specific example, suppose the software defect prediction result contains 20 defects. If the actual number of defects in the actual result is 10, the modules that do not match the defect modules in the actual result can be picked out of the software defect prediction result. If 10 such modules are found, the potential problems in these modules can be located quickly from the modules found, which benefits the success rate of later software development.
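The comparison in steps S12a and S13a can be sketched as a count check followed by a set difference over module identifiers. The function name and the sample module names are hypothetical; modules are assumed to be identified by strings.

```python
def modules_to_recheck(predicted_count, predicted_modules, actual_count, actual_modules):
    """Return predicted defect modules that have no matching actual defect module.

    Follows S12a/S13a: the list is produced only when more defects were
    predicted than were actually found.
    """
    if predicted_count <= actual_count:
        return []
    found = set(actual_modules)
    recheck = []
    for module in predicted_modules:
        # Keep the prediction order and skip duplicates.
        if module not in found and module not in recheck:
            recheck.append(module)
    return recheck


# Hypothetical module identifiers.
predicted = ["billing", "login", "report", "export"]
actual = ["login", "export"]
recheck_list = modules_to_recheck(4, predicted, 2, actual)
```

Here `recheck_list` holds the modules that staff and testers would be shown for a second look.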
Note that if some defect modules in the actual result are not in the software defect prediction result, the prediction model failed to predict these modules. The factors that affect these modules can then be determined and added to the samples used to train the prediction model, or the values of existing parameters in the samples can be adjusted, thereby further improving the accuracy of the trained prediction model.
In a specific implementation, the prediction model for software defect prediction can be trained using the gradient boosting algorithm and the random forest learner according to the method shown in Fig. 2a, which includes the following steps:
S21. Obtain a software defect prediction sample set, and divide the obtained software defect prediction samples into a training set and a validation set according to a preset ratio.
In a specific implementation, software defect prediction samples can be obtained from software the company has already developed. For any such software, one can obtain the developers and testers who took part in its development, its test duration, the number of use cases used in developing it, and its number of requirements, and use these parameters as input variables. One also needs the number of defects that arose during its development and the modules to which those defects belong, with the number of defects used as the result value. These parameters together form one software defect prediction sample. The software defect prediction sample set is then obtained from all the software the company has developed, or from a preset number of them.
Training a prediction model requires training it on the training set and then verifying the trained model on the validation set, so once the software defect prediction sample set is obtained it must be divided. For example, 75% of the samples can form the training set used to train the prediction model, and the remaining 25% can form the validation set used to verify the trained model.
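The 75%/25% division in step S21 might look like the sketch below. Shuffling before the split and the fixed seed are assumptions added for reproducibility; the source only specifies a preset ratio.

```python
import random


def split_samples(samples, train_ratio=0.75, seed=0):
    """Divide samples into a training set and a validation set by a preset ratio."""
    shuffled = list(samples)
    # Shuffle a copy so the caller's list is left untouched.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Twenty placeholder samples, identified here only by index.
training_set, validation_set = split_samples(range(20))
```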
Preferably, to improve the accuracy of the prediction model and reduce the computational complexity of training, software defect prediction samples can also be obtained as follows:
Starting from the raw data recorded during development of software whose defect results are known, screen the raw data according to its relevance to software defect prediction to obtain the software defect prediction samples.
In a specific implementation, a larger sample size is likely to yield a more accurate prediction model, but the computational complexity is also higher. To keep computational complexity from becoming too high while still ensuring the accuracy of the model's predictions, the recorded raw data can be screened according to its relevance to software defect prediction when the samples are obtained. Relevance to software defect prediction can be considered along three dimensions: development project size, personnel capability, and product delivery. Development project size can be measured by the number of use cases used in developing the software, or by the software's number of requirements. Personnel capability can be measured by the working ability, personal status (education), and work experience of the developers or testers. Product delivery can be measured by completion time, which is the test duration.
When screening the raw data, first check whether each developed software includes all three dimensions, and discard any software that does not. The software that does include all three dimensions is then screened further. For example, conditions can be set such as: the number of requirements is not less than a first value and the number of use cases is not less than a second value; personnel education is at bachelor's level or above and time with the company is more than one year; and the completion time is within half a year. The software meeting these conditions is obtained, and the data related to the three dimensions in the raw data recorded for that software is used as the software defect prediction sample set. This ensures that the sample data has high reference value, improves to some extent the accuracy of the prediction model trained on the screened samples, and in turn makes the results predicted with that model more useful as a reference.
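The two-stage screening above can be sketched as a completeness check followed by threshold conditions. The record field names and the values passed for the first and second thresholds are hypothetical; the education, company-time, and completion-time conditions follow the example in the text, with the bachelor's score of 5 taken from Table 1. Requiring both project-size fields is a simplification, since the text allows either measure.

```python
# Fields assumed to represent the three dimensions in a raw record.
DIMENSION_FIELDS = {
    "project_size": ("requirement_count", "use_case_count"),
    "personnel": ("education_score", "company_years"),
    "delivery": ("completion_months",),
}


def has_all_dimensions(record):
    """Stage 1: the record must carry data for every dimension."""
    return all(
        record.get(field) is not None
        for fields in DIMENSION_FIELDS.values()
        for field in fields
    )


def meets_conditions(record, min_requirements, min_use_cases):
    """Stage 2: apply the example conditions from the text."""
    return (
        record["requirement_count"] >= min_requirements
        and record["use_case_count"] >= min_use_cases
        and record["education_score"] >= 5      # bachelor's or above
        and record["company_years"] > 1         # more than one year
        and record["completion_months"] <= 6    # within half a year
    )


def screen(records, min_requirements, min_use_cases):
    return [
        r for r in records
        if has_all_dimensions(r) and meets_conditions(r, min_requirements, min_use_cases)
    ]


raw = [
    {"requirement_count": 12, "use_case_count": 40, "education_score": 6,
     "company_years": 2.0, "completion_months": 4},
    {"requirement_count": 15, "use_case_count": 50, "education_score": 5,
     "company_years": 3.0},                      # no delivery data
    {"requirement_count": 20, "use_case_count": 60, "education_score": 8,
     "company_years": 0.5, "completion_months": 3},
]
kept = screen(raw, min_requirements=10, min_use_cases=30)
```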
S22. Based on the training set and the gradient boosting algorithm, perform model training on the random forest learner.
In a specific implementation of step S22, because the random forest learner contains multiple decision trees, each decision tree must be trained separately using the samples in the training set. To this end, when training each decision tree, the embodiment of the present invention randomly divides the software defect prediction samples in the training set into several batches, each containing the same number of samples, and then trains the decision tree model on the samples of each batch. Specifically, the following is performed for each batch in the training set: using the software defect prediction samples in the batch and the gradient boosting algorithm, train any one decision tree in the random forest learner for a preset number of loop iterations to obtain the training model for that batch.
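The random division into equal-sized batches might be sketched as follows. Dropping leftover samples when the training set does not divide evenly is an assumption; the source only requires that every batch hold the same number of samples.

```python
import random


def make_batches(training_set, batch_size, seed=0):
    """Randomly divide the training set into batches of identical size."""
    shuffled = list(training_set)
    random.Random(seed).shuffle(shuffled)
    # Leftover samples that cannot fill a whole batch are dropped (assumption).
    n_batches = len(shuffled) // batch_size
    return [
        shuffled[i * batch_size:(i + 1) * batch_size]
        for i in range(n_batches)
    ]


batches = make_batches(range(10), batch_size=3)
```

Each batch would then be used to train one decision tree for a preset number of iterations.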
Specifically, iteratively training any one decision tree may follow the process shown in Fig. 2b, which includes the following steps:
S221. For the i-th training round, determine the gradient value of the loss function determined in round i-1 with respect to the training model obtained in round i-1.
In a specific implementation, to obtain the model for the i-th round, the gradient boosting algorithm is used to determine the gradient value of the loss function determined in round i-1 with respect to the training model obtained in round i-1. The gradient value may be positive or negative. If it is greater than 0, the training model obtained in round i-1 needs to be corrected in the negative direction; if it is less than 0, the model needs to be corrected in the positive direction. Ideally the gradient value is close to 0, but in general the computed gradient value is clearly non-zero. The present invention therefore uses the gradient boosting algorithm to bring the gradient value of the finally trained model as close to 0 as possible, so that the final model is more accurate.
S222. Determine the leaf node regions of the decision tree according to the gradient value.
Specifically, a linear search can be used to determine the leaf node regions of the decision tree.
S223. According to the leaf node regions, determine the gain of each leaf node when the value of the loss function determined in round i-1 meets a preset condition.
Preferably, this specifically includes:
according to the leaf node regions, determining the gain of each leaf node at which the loss function determined in round i-1 takes its minimum value.
S224. Determine the training model for the i-th round using the leaf node regions and the gain of each leaf node.
S225. Determine whether the current iteration count has reached the preset number. If not, execute step S226; if so, execute step S227.
S226. Increase the current iteration count i by 1, that is, i = i + 1, and then return to step S221.
S227. The process ends.
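The loop S221 to S227 can be illustrated with a deliberately small sketch. It assumes a squared-error loss, a single numeric feature, and two-leaf trees (stumps); none of these choices is stated in the source, and all function names are hypothetical. With squared error, the negative gradient is the residual, and the gain that minimizes the loss in a leaf region is the mean residual in that region.

```python
def fit_stump(xs, residuals):
    """S222/S223: scan thresholds to pick two leaf regions and their gains."""
    mean_all = sum(residuals) / len(residuals)
    # Fallback when the feature has a single distinct value: one region.
    best = (float("inf"), max(xs), mean_all, mean_all)
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        # For squared error, the loss-minimizing leaf gain is the mean residual.
        gain_left = sum(left) / len(left)
        gain_right = sum(right) / len(right)
        error = (sum((r - gain_left) ** 2 for r in left)
                 + sum((r - gain_right) ** 2 for r in right))
        if error < best[0]:
            best = (error, threshold, gain_left, gain_right)
    return best[1:]


def boost(xs, ys, n_rounds):
    """Run the preset number of boosting rounds and return the model."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):                      # S225/S226: loop control
        # S221: negative gradient of 0.5 * (y - F)^2 with respect to F.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, gain_left, gain_right = fit_stump(xs, residuals)
        # S224: the round-i model adds the leaf gains to the round-(i-1) model.
        stumps.append((threshold, gain_left, gain_right))
        predictions = [
            p + (gain_left if x <= threshold else gain_right)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps                            # S227: finished


def predict(model, x):
    base, stumps = model
    return base + sum(gl if x <= t else gr for t, gl, gr in stumps)


# Toy data: one feature value and a defect count per sample.
model = boost([1, 2, 3, 4], [1, 1, 3, 3], n_rounds=2)
```

A positive gradient corresponds to a negative residual here, so the correction is applied in the negative direction, matching the sign rule described in S221.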
Preferably, the sample characteristics of each sample in the training set include at least one of the following: defects count, lack Fall into said module, tester's information, developer's information, test time, use-case quantity and quantity required.
It should be noted that the quantity required that the embodiment of the present invention one is related to can be understood as the demand to software in the market Quantity.
S23. Use the validation set to perform model verification on the trained random forest learner to obtain the prediction model for software defect prediction.
In a specific implementation, step S23 specifically includes:
For each of the training models obtained from the batches, perform the operations shown in Fig. 2c:
S231. Verify the training model using all the software defect prediction samples in the validation set to obtain the verification result for each sample.
In a specific implementation, once each decision tree of the random forest has been trained with the gradient boosting algorithm to obtain its final training model, the final training model of each decision tree is verified using the software defect prediction samples in the validation set to obtain verification results. For example, when the final training model of any decision tree is verified on a sample A, the verification result is the predicted number of defects for sample A.
S232. According to the actual result and verification result of each software defect prediction sample, determine the prediction accuracy of the training model using the mean squared error function.
Preferably, the verification result may be the predicted number of defects for the sample, and the actual result may be the sample's actual number of defects.
In a specific implementation, the actual number of defects of each software defect prediction sample is known. Once the verification result for each sample has been determined, the prediction accuracy of the training model can therefore be determined by applying the mean squared error function to each sample's actual number of defects and the predicted number of defects obtained in verification.
Similarly, the prediction accuracy of every training model can be determined by the method of steps S231 and S232.
After the prediction accuracy of each training model has been determined, the final model for software defect prediction is chosen by comparing the prediction accuracies: the training model with the highest prediction accuracy is taken as the prediction model for software defect prediction. This yields a software defect prediction model with relatively accurate predictions.
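Steps S231 and S232 and the final selection can be sketched as computing a mean squared error per training model and keeping the best one. Treating the lowest mean squared error as the highest prediction accuracy is an assumption, as the source does not say how the error is mapped to accuracy. The models are represented here as plain callables and the data is a toy example.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted defect counts."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def select_best_model(models, validation_features, actual_counts):
    """Score every training model on the validation set and pick the best.

    `models` maps a name to a callable that takes one feature vector and
    returns a predicted defect count.
    """
    scores = {
        name: mean_squared_error(
            actual_counts, [model(x) for x in validation_features]
        )
        for name, model in models.items()
    }
    # Lowest error is taken to mean highest prediction accuracy (assumption).
    best_name = min(scores, key=scores.get)
    return best_name, scores


# Two placeholder training models, one per batch.
candidate_models = {
    "batch_1": lambda x: x[0],
    "batch_2": lambda x: x[0] + 2,
}
best_name, scores = select_best_model(
    candidate_models, [[1], [2], [3]], [1, 2, 4]
)
```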
In the software defect prediction method provided by the embodiment of the present invention, the feature vector of the software to be predicted is obtained, and the software defect prediction result of the software to be predicted is determined based on the feature vector and a pre-trained prediction model for software defect prediction, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learning machine. With the method provided by the present invention, because the model used for software defect prediction is trained with the gradient boosting algorithm and the random forest learning machine, the software defect prediction results output by the trained prediction model are relatively accurate, without a large impact on computational complexity.
Embodiment two
Based on the same inventive concept, an embodiment of the present invention further provides a software defect prediction device. Because the principle by which the device solves the problem is similar to that of the software defect prediction method, the implementation of the device may refer to the implementation of the method, and repeated details are omitted.
Fig. 3 is a schematic structural diagram of the software defect prediction device provided by Embodiment two of the present invention. The device includes an acquiring unit 31 and a determination unit 32, wherein:
the acquiring unit 31 is configured to obtain the feature vector of the software to be predicted;
the determination unit 32 is configured to determine the software defect prediction result of the software to be predicted based on the feature vector and a pre-trained prediction model for software defect prediction, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learning machine.
Preferably, the determination unit 32 is specifically configured to obtain the prediction model for software defect prediction as follows: obtain a software defect prediction sample set, and divide the obtained software defect prediction samples into a training set and a verification set according to a preset ratio; then, based on the training set and the gradient boosting algorithm, perform model training on the random forest learning machine, and use the verification set to perform model verification on the trained random forest learning machine to obtain the prediction model for software defect prediction.
Further, the determination unit 32 is specifically configured to randomly divide the software defect prediction samples in the training set into several batches, wherein each batch contains the same number of software defect prediction samples; and, for each batch in the training set, to perform the following process: using the software defect prediction samples in the batch and the gradient boosting algorithm, perform a preset number of loop iterations of training on any decision tree in the random forest learning machine to obtain the training model trained on that batch.
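The division of samples described above (a split by a preset ratio, followed by random division of the training set into equally sized batches) might be sketched in Python as follows. The function names, the 0.8 ratio, the batch size of 16, and the choice to drop leftover samples that do not fill a whole batch are all assumptions made for illustration; the patent does not specify them.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_ratio(
    samples: Sequence[T], train_ratio: float, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle the samples and split them into (training set, verification set)."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def make_equal_batches(
    training_set: Sequence[T], batch_size: int, seed: int = 0
) -> List[List[T]]:
    """Randomly divide the training set into batches of identical size.

    Assumption: samples left over after the last full batch are dropped,
    so that every batch contains exactly batch_size samples.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    shuffled = list(training_set)
    random.Random(seed).shuffle(shuffled)
    usable = len(shuffled) - len(shuffled) % batch_size
    return [shuffled[i:i + batch_size] for i in range(0, usable, batch_size)]


# Toy data: integers stand in for software defect prediction samples.
samples = list(range(100))
training_set, verification_set = split_by_ratio(samples, train_ratio=0.8)
batches = make_equal_batches(training_set, batch_size=16)
```

Each resulting batch would then be used to train one candidate model, and the verification set is held back for comparing those candidates.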
Specifically, the determination unit is configured to: for the i-th training iteration, determine the gradient value of the loss function determined in the (i-1)-th iteration, evaluated at the training model obtained in the (i-1)-th iteration; determine the leaf node regions of the decision tree according to the gradient value; determine, according to the leaf node regions, the gain of each leaf node at which the value of the loss function determined in the (i-1)-th iteration satisfies a preset condition; and determine the training model obtained in the i-th iteration using the leaf node regions and the gain of each leaf node; wherein i is an integer between 1 and the preset number of iterations.
Preferably, the determination unit is specifically configured to determine, according to the leaf node regions, the gain of each leaf node at which the loss function determined in the (i-1)-th iteration takes its minimum value.
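The iteration described above follows the general shape of gradient boosting with regression trees. The Python sketch below is only an illustration under stated assumptions: a squared-error loss, a single numeric feature, and two-leaf trees (stumps). None of these choices, nor the names `negative_gradient`, `fit_stump`, `boost` and `predict`, are taken from the patent.

```python
from typing import List, Sequence, Tuple

# A stump is (threshold, gain of left leaf, gain of right leaf).
Stump = Tuple[float, float, float]


def negative_gradient(y: Sequence[float], pred: Sequence[float]) -> List[float]:
    """For the assumed loss L = (y - f)^2 / 2, the negative gradient is y - f."""
    return [yi - pi for yi, pi in zip(y, pred)]


def fit_stump(x: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Choose two leaf regions from the gradient values, then set each leaf's
    gain to the mean residual, which minimizes squared error in that region."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        gain_left = sum(left) / len(left)
        gain_right = sum(right) / len(right)
        sse = sum((r - gain_left) ** 2 for r in left)
        sse += sum((r - gain_right) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (threshold, gain_left, gain_right))
    if best is None:  # all feature values identical: a single region
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1]


def boost(x: Sequence[float], y: Sequence[float], rounds: int):
    """Run a preset number of iterations; iteration i uses the model from i-1."""
    init = sum(y) / len(y)
    pred = [init] * len(y)
    stumps: List[Stump] = []
    for _ in range(rounds):
        residuals = negative_gradient(y, pred)   # gradient at the previous model
        threshold, gl, gr = fit_stump(x, residuals)  # leaf regions and gains
        stumps.append((threshold, gl, gr))
        pred = [p + (gl if xi <= threshold else gr) for xi, p in zip(x, pred)]
    return init, stumps


def predict(model, xi: float) -> float:
    init, stumps = model
    return init + sum(gl if xi <= t else gr for t, gl, gr in stumps)


# Toy batch: one feature value per sample, defect count as the target.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 1.0, 3.0, 3.0]
model = boost(x, y, rounds=3)
```

With a squared-error loss the gain that minimizes the loss within a leaf region is simply the mean residual of the samples in that region, which is why `fit_stump` computes means; other loss functions would require a different per-leaf minimization.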
Further, the determination unit 32 is specifically configured to perform the following operations for each training model among the training models obtained from the batches: verify the training model using all software defect prediction samples in the verification set to obtain the verification result corresponding to each software defect prediction sample; determine the prediction accuracy of the training model using a mean square error function, based on the actual result and the verification result of each software defect prediction sample; and compare the prediction accuracies of the training models and take the training model with the highest prediction accuracy as the prediction model for software defect prediction.
Preferably, the sample features of each sample in the training set include at least one of the following: defect count, module to which the defect belongs, tester information, developer information, test duration, number of test cases, and number of requirements.
Preferably, the determination unit 32 is specifically configured to screen the raw data recorded during the development of software whose software defect results are known, according to the degree of correlation between the raw data and software defect prediction, to obtain the software defect prediction samples.
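The patent does not state how the degree of correlation is measured. As one possible reading, the sketch below keeps only those raw data fields whose absolute Pearson correlation with the known defect count reaches a threshold. The use of Pearson correlation, the 0.5 threshold, and the field names are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def screen_fields(
    raw: Dict[str, List[float]], target: str, min_abs_corr: float = 0.5
) -> List[str]:
    """Return the raw data fields whose correlation with the target is strong enough."""
    defects = raw[target]
    return [
        name
        for name, column in raw.items()
        if name != target and abs(pearson(column, defects)) >= min_abs_corr
    ]


# Hypothetical raw data recorded during development, one value per release.
raw_records = {
    "defect_count": [1.0, 2.0, 3.0, 4.0],
    "test_case_count": [10.0, 20.0, 30.0, 40.0],
    "build_id_last_digit": [7.0, 7.0, 7.0, 7.0],
}
selected = screen_fields(raw_records, "defect_count")
```

Fields that survive the screening would then be assembled into the feature vectors of the software defect prediction samples.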
Preferably, the software defect prediction result includes a defect count and the modules to which the defects belong; and the device further includes:
a processing unit, configured to: obtain the actual result of the software to be predicted; for the software to be predicted, compare the defect count in the obtained software defect prediction result with the actual defect count in the actual result; and, if the predicted defect count is determined to be greater than the actual defect count, determine, from all the defect modules in the software defect prediction result, the modules that are inconsistent with the actual defect modules in the actual result, store them in a list, and present the list to the user.
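The comparison performed by the processing unit can be sketched as a set difference. The function name `mismatched_modules` and the module names in the example are hypothetical; the sketch reads "inconsistent" as "predicted to contain defects but absent from the actual defect modules", which is an interpretation rather than something the patent spells out.

```python
from typing import Iterable, List


def mismatched_modules(
    predicted_count: int,
    predicted_modules: Iterable[str],
    actual_count: int,
    actual_modules: Iterable[str],
) -> List[str]:
    """List predicted defect modules that do not appear among the actual ones.

    The list is only built when the predicted defect count exceeds the
    actual defect count; otherwise an empty list is returned.
    """
    if predicted_count <= actual_count:
        return []
    return sorted(set(predicted_modules) - set(actual_modules))


# Hypothetical prediction and actual result for one piece of software.
report = mismatched_modules(
    predicted_count=5,
    predicted_modules={"billing", "auth", "export"},
    actual_count=3,
    actual_modules={"auth"},
)
```

The returned list corresponds to what would be shown to the user, for example so that the over-predicted modules can be reviewed.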
For convenience of description, the above parts are described separately as modules (or units) divided by function. Of course, when implementing the present invention, the functions of the modules (or units) may be implemented in one or more pieces of software or hardware.
Embodiment three
Embodiment three of the present application provides a non-volatile computer storage medium storing computer-executable instructions, which can execute the software defect prediction method in any of the above method embodiments.
Embodiment four
Fig. 4 is a schematic diagram of the hardware structure of the electronic equipment that implements the software defect prediction method, provided by Embodiment four of the present invention. As shown in Fig. 4, the electronic equipment includes:
one or more processors 410 and a memory 420; Fig. 4 takes one processor 410 as an example.
The electronic equipment that executes the software defect prediction method may further include an input device 430 and an output device 440.
The processor 410, the memory 420, the input device 430 and the output device 440 may be connected by a bus or in other ways; Fig. 4 takes connection by a bus as an example.
As a non-volatile computer-readable storage medium, the memory 420 can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules/units corresponding to the software defect prediction method in the embodiments of the present application (for example, the acquiring unit 31 and the determination unit 32 shown in Fig. 3). By running the non-volatile software programs, instructions and modules/units stored in the memory 420, the processor 410 executes the various functional applications and data processing of a server or intelligent terminal, that is, implements the software defect prediction method of the above method embodiments.
The memory 420 may include a program storage area and a data storage area, wherein the program storage area can store an operating system and the application programs required by at least one function, and the data storage area can store data created through use of the software defect prediction device. In addition, the memory 420 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 420 optionally includes memory located remotely from the processor 410, and such remote memory can be connected to the software defect prediction device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 can receive input numeric or character information and generate key signal inputs related to the user settings and function control of the software defect prediction device. The output device 440 may include a display device such as a display screen.
The one or more modules are stored in the memory 420 and, when executed by the one or more processors 410, perform the software defect prediction method in any of the above method embodiments.
The above product can execute the method provided by the embodiments of the present application, and has the functional modules and beneficial effects corresponding to executing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
The electronic equipment of the embodiments of the present application exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: such devices are characterized by mobile communication capability, with voice and data communication as their main purpose. These terminals include smartphones (such as the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing capability, and generally also support mobile Internet access. These terminals include PDA, MID and UMPC devices, such as the iPad.
(3) Portable entertainment devices: such devices can display and play multimedia content. They include audio and video players (such as the iPod), handheld devices, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server comprises a processor, hard disk, memory, system bus and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, the requirements on processing capability, stability, reliability, security, scalability and manageability are higher.
(5) Other electronic devices with data interaction capability.
Embodiment five
Embodiment five of the present application provides a computer program product, wherein the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and the program instructions, when executed by a computer, cause the computer to execute the software defect prediction method of any of the above method embodiments of the present application.
The software defect prediction device provided by the embodiments of the present application can be implemented by a computer program. Those skilled in the art should understand that the above division into modules is only one of many possible divisions; a division into other modules, or no division into modules at all, falls within the protection scope of the present application as long as the software defect prediction device has the above functions.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, equipment (system) and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing equipment to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, the instruction device implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, such that a series of operational steps are executed on the computer or other programmable equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (20)

1. A software defect prediction method, characterized by comprising:
obtaining a feature vector of software to be predicted; and
determining a software defect prediction result of the software to be predicted based on the feature vector and a pre-trained prediction model for software defect prediction, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learning machine.
2. The method according to claim 1, characterized in that the prediction model for software defect prediction is obtained as follows:
obtaining a software defect prediction sample set, and dividing the obtained software defect prediction samples into a training set and a verification set according to a preset ratio; and
performing model training on the random forest learning machine based on the training set and the gradient boosting algorithm, and using the verification set to perform model verification on the trained random forest learning machine to obtain the prediction model for software defect prediction.
3. The method according to claim 2, characterized in that performing model training on the random forest learning machine based on the training set and the gradient boosting algorithm specifically comprises:
randomly dividing the software defect prediction samples in the training set into several batches, wherein each batch contains the same number of software defect prediction samples;
for each batch in the training set, performing the following process: using the software defect prediction samples in the batch and the gradient boosting algorithm, performing a preset number of loop iterations of training on any decision tree in the random forest learning machine to obtain the training model trained on that batch.
4. The method according to claim 3, characterized in that using the software defect prediction samples in the batch and the gradient boosting algorithm to perform a preset number of loop iterations of training on any decision tree in the random forest learning machine to obtain the training model trained on that batch specifically comprises:
for the i-th training iteration, determining the gradient value of the loss function determined in the (i-1)-th iteration, evaluated at the training model obtained in the (i-1)-th iteration; and
determining the leaf node regions of the decision tree according to the gradient value; and
determining, according to the leaf node regions, the gain of each leaf node at which the value of the loss function determined in the (i-1)-th iteration satisfies a preset condition; and
determining the training model obtained in the i-th iteration using the leaf node regions and the gain of each leaf node;
wherein i is an integer between 1 and the preset number of iterations.
5. The method according to claim 4, characterized in that determining, according to the leaf node regions, the gain of each leaf node at which the value of the loss function determined in the (i-1)-th iteration satisfies a preset condition specifically comprises:
determining, according to the leaf node regions, the gain of each leaf node at which the loss function determined in the (i-1)-th iteration takes its minimum value.
6. The method according to claim 5, characterized in that using the verification set to perform model verification on the trained random forest learning machine to obtain the prediction model for software defect prediction specifically comprises:
for each training model among the training models obtained from the batches, performing the following operations:
verifying the training model using all software defect prediction samples in the verification set to obtain the verification result corresponding to each software defect prediction sample;
determining the prediction accuracy of the training model using a mean square error function, based on the actual result and the verification result of each software defect prediction sample; and
comparing the prediction accuracies of the training models and taking the training model with the highest prediction accuracy as the prediction model for software defect prediction.
7. The method according to claim 3, characterized in that the sample features of each sample in the training set include at least one of the following: defect count, module to which the defect belongs, tester information, developer information, test duration, number of test cases, and number of requirements.
8. The method according to claim 2, characterized in that obtaining the software defect prediction samples specifically comprises:
screening the raw data recorded during the development of software whose software defect results are known, according to the degree of correlation between the raw data and software defect prediction, to obtain the software defect prediction samples.
9. The method according to any one of claims 1 to 8, characterized in that the software defect prediction result includes a defect count and the modules to which the defects belong; and the method further comprises:
obtaining the actual result of the software to be predicted;
for the software to be predicted, comparing the defect count in the obtained software defect prediction result with the actual defect count in the actual result;
if the predicted defect count is determined to be greater than the actual defect count, determining, from all the defect modules in the software defect prediction result, the modules that are inconsistent with the actual defect modules in the actual result, storing them in a list, and presenting the list to the user.
10. A software defect prediction device, characterized by comprising:
an acquiring unit, configured to obtain a feature vector of software to be predicted;
a determination unit, configured to determine a software defect prediction result of the software to be predicted based on the feature vector and a pre-trained prediction model for software defect prediction, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learning machine.
11. The device according to claim 10, characterized in that
the determination unit is specifically configured to obtain the prediction model for software defect prediction as follows: obtain a software defect prediction sample set, and divide the obtained software defect prediction samples into a training set and a verification set according to a preset ratio; and, based on the training set and the gradient boosting algorithm, perform model training on the random forest learning machine, and use the verification set to perform model verification on the trained random forest learning machine to obtain the prediction model for software defect prediction.
12. The device according to claim 11, characterized in that
the determination unit is specifically configured to randomly divide the software defect prediction samples in the training set into several batches, wherein each batch contains the same number of software defect prediction samples; and, for each batch in the training set, to perform the following process: using the software defect prediction samples in the batch and the gradient boosting algorithm, perform a preset number of loop iterations of training on any decision tree in the random forest learning machine to obtain the training model trained on that batch.
13. The device according to claim 12, characterized in that
the determination unit is specifically configured to: for the i-th training iteration, determine the gradient value of the loss function determined in the (i-1)-th iteration, evaluated at the training model obtained in the (i-1)-th iteration; determine the leaf node regions of the decision tree according to the gradient value; determine, according to the leaf node regions, the gain of each leaf node at which the value of the loss function determined in the (i-1)-th iteration satisfies a preset condition; and determine the training model obtained in the i-th iteration using the leaf node regions and the gain of each leaf node; wherein i is an integer between 1 and the preset number of iterations.
14. The device according to claim 13, characterized in that
the determination unit is specifically configured to determine, according to the leaf node regions, the gain of each leaf node at which the loss function determined in the (i-1)-th iteration takes its minimum value.
15. The device according to claim 14, characterized in that
the determination unit is specifically configured to perform the following operations for each training model among the training models obtained from the batches: verify the training model using all software defect prediction samples in the verification set to obtain the verification result corresponding to each software defect prediction sample; determine the prediction accuracy of the training model using a mean square error function, based on the actual result and the verification result of each software defect prediction sample; and compare the prediction accuracies of the training models and take the training model with the highest prediction accuracy as the prediction model for software defect prediction.
16. The device according to claim 14, characterized in that the sample features of each sample in the training set include at least one of the following: defect count, module to which the defect belongs, tester information, developer information, test duration, number of test cases, and number of requirements.
17. The device according to claim 11, characterized in that
the determination unit is specifically configured to screen the raw data recorded during the development of software whose software defect results are known, according to the degree of correlation between the raw data and software defect prediction, to obtain the software defect prediction samples.
18. The device according to any one of claims 10 to 17, characterized in that the software defect prediction result includes a defect count and the modules to which the defects belong; and the device further comprises:
a processing unit, configured to: obtain the actual result of the software to be predicted; for the software to be predicted, compare the defect count in the obtained software defect prediction result with the actual defect count in the actual result; and, if the predicted defect count is determined to be greater than the actual defect count, determine, from all the defect modules in the software defect prediction result, the modules that are inconsistent with the actual defect modules in the actual result, store them in a list, and present the list to the user.
19. A non-volatile computer storage medium storing computer-executable instructions, characterized in that the computer-executable instructions are used to execute the method according to any one of claims 1 to 9.
20. Electronic equipment, characterized by comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor is able to execute the method according to any one of claims 1 to 9.
CN201711462461.0A 2017-12-28 2017-12-28 Software defect prediction method and device and electronic equipment Active CN109976998B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711462461.0A CN109976998B (en) 2017-12-28 2017-12-28 Software defect prediction method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN109976998A true CN109976998A (en) 2019-07-05
CN109976998B CN109976998B (en) 2022-06-07

Family

ID=67074950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711462461.0A Active CN109976998B (en) 2017-12-28 2017-12-28 Software defect prediction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN109976998B (en)

Also Published As

Publication number Publication date
CN109976998B (en) 2022-06-07

Similar Documents

Publication Publication Date Title
CN109976998A (en) Software defect prediction method, device and electronic equipment
CN108021931A (en) Data sample label processing method and device
CN105912500A (en) Machine learning model generation method and machine learning model generation device
EP3731159A1 (en) Adaptive multiyear economic planning method for energy systems, microgrid and distributed energy resources
CN110472798A (en) Time series data prediction method, device and computer-readable storage medium
CN108206027A (en) Audio quality evaluation method and system
CN109783704A (en) Man-machine mixed answer method, system, device
CN106462923A (en) Predicting social, economic, and learning outcomes
CN109697636A (en) Merchant recommendation method, merchant recommendation device, electronic equipment and medium
CN113361690A (en) Water quality prediction model training method, water quality prediction device, water quality prediction equipment and medium
CN109460503A (en) Answer input method, device, storage medium and electronic equipment
CN110517558A (en) Piano playing fingering evaluation method and system, storage medium and terminal
CN108932646A (en) User tag verification method, device and electronic equipment based on operator
CN110164474A (en) Voice wake-up automated testing method and system
CN113554213A (en) Natural gas demand prediction method, system, storage medium and equipment
CN113408957A (en) Classroom teaching evaluation method based on combined empowerment method
CN109473121A (en) Speech synthesis quality detecting method and device
CN107392217A (en) Computer implemented information processing method and device
CN117493830A (en) Evaluation of training data quality, and generation method, device and equipment of evaluation model
CN110134754A (en) Operation time prediction method, device, server and medium for regional points of interest
CN111695967A (en) Method, device, equipment and storage medium for determining quotation
CN114968821A (en) Test data generation method and device based on reinforcement learning
CN109670568A (en) Neural net prediction method and device
KR101979427B1 (en) Apparatus for Education and Assessment of Debt Management Competency and Method Thereof
Kundapur et al. Simulating knowledge worker adoption rate of KMS: An organizational perspective

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant