CN109976998B

CN109976998B - Software defect prediction method and device and electronic equipment

Info

Publication number: CN109976998B
Application number: CN201711462461.0A
Authority: CN
Inventors: 吴旭; 曹晶晶
Original assignee: Aisino Corp
Current assignee: Aisino Corp
Priority date: 2017-12-28
Filing date: 2017-12-28
Publication date: 2022-06-07
Anticipated expiration: 2037-12-28
Also published as: CN109976998A

Abstract

The invention discloses a software defect prediction method, a software defect prediction device and electronic equipment. The method comprises: acquiring a feature vector of software to be predicted; and determining a software defect prediction result of the software to be predicted based on the feature vector and a pre-trained prediction model for predicting software defects, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learning machine. Because the prediction model for software defect prediction is trained with the gradient boosting algorithm and the random forest learning machine, the software defect prediction results output by the trained model are more accurate, while the computational complexity is not significantly increased.

Description

Software defect prediction method and device and electronic equipment

Technical Field

The invention relates to the technical field of software engineering, and in particular to a software defect prediction method, a software defect prediction device and electronic equipment.

Background

Software defect prediction technology originated in the 1970s. Since then it has remained a very active topic in software engineering and plays an important role in analyzing software quality and balancing software cost. It predicts the potential defects of an existing project by analyzing known defects in light of the software's development method, complexity, personnel capability and similar factors.

Most existing software defect prediction methods use a single algorithm to measure the various indicators of software defects and do not measure and predict comprehensively from the various attributes of the software. The existing algorithms mainly include the SVM (support vector machine), neural networks, Bayesian methods, logistic regression and the like; these methods suffer from high computational complexity and poor accuracy, and cannot achieve a good prediction effect.

Therefore, how to improve the accuracy of the prediction result without significantly increasing the computational complexity is one of the technical problems to be solved urgently.

Disclosure of Invention

The embodiments of the invention provide a software defect prediction method, a software defect prediction device and electronic equipment, which are used to solve the problems of high computational complexity and low prediction accuracy of the software defect prediction methods adopted in the prior art.

In a first aspect, an embodiment of the present invention provides a software defect prediction method, including:

acquiring a feature vector of software to be predicted; and

determining a software defect prediction result of the software to be predicted based on the feature vector and a pre-trained prediction model for predicting software defects, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learning machine.

In a second aspect, an embodiment of the present invention provides a software defect prediction apparatus, including:

an acquisition unit, configured to acquire a feature vector of software to be predicted; and

a determining unit, configured to determine a software defect prediction result of the software to be predicted based on the feature vector and a pre-trained prediction model for predicting software defects, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learning machine.

In a third aspect, an embodiment of the present invention provides a non-volatile computer storage medium storing computer-executable instructions for performing the software defect prediction method provided in the present application.

In a fourth aspect, an embodiment of the present invention provides an electronic device, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the software defect prediction method provided herein.

The beneficial effects of the invention are as follows:

with the software defect prediction method, device and electronic equipment provided by the embodiments of the invention, a feature vector of the software to be predicted is acquired, and a software defect prediction result of the software to be predicted is determined based on the feature vector and a pre-trained prediction model for predicting software defects, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learning machine. Because the prediction model for software defect prediction is trained with the gradient boosting algorithm and the random forest learning machine, the software defect prediction results output by the trained model are more accurate, while the computational complexity is not significantly increased.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:

FIG. 1a is a first flowchart of a software defect prediction method according to the first embodiment of the present invention;

FIG. 1b is a second flowchart of the software defect prediction method according to the first embodiment of the present invention;

FIG. 2a is a schematic flowchart of training a prediction model for predicting software defects with a gradient boosting algorithm and a random forest learning machine according to the first embodiment of the present invention;

FIG. 2b is a schematic flowchart of the iterative training of any one decision tree according to the first embodiment of the present invention;

FIG. 2c is a schematic flowchart of determining the prediction accuracy of any one of the training models obtained from the batches according to the first embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a software defect prediction apparatus according to a second embodiment of the present invention;

FIG. 4 is a schematic diagram of the hardware structure of an electronic device implementing the software defect prediction method according to a fourth embodiment of the present invention.

Detailed Description

The embodiments of the invention provide a software defect prediction method, a software defect prediction device and electronic equipment, which are used to solve the problems of high computational complexity and low prediction accuracy of the software defect prediction methods adopted in the prior art.

The preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein are merely intended to illustrate and explain the present invention, not to limit it, and that the embodiments of the present invention and the features therein may be combined with each other provided they do not conflict.

Example one

FIG. 1a shows a schematic flowchart of a software defect prediction method according to the first embodiment of the present invention; the method may include the following steps:

S11, acquiring the feature vector of the software to be predicted.

In a specific implementation, the defects of the software currently being written are usually predicted during development, so the feature vector of the software to be predicted needs to be acquired from the historical data recorded before the current day. For example, predicting the software defects on the second day of development requires the characteristics recorded on the first day: the testers and developers who took part in development, the number of use cases, the number of requirements and the like can be determined from the development results of the first day.
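The day-ahead setup can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `DailyRecord` class and its field names are hypothetical, and the tester and developer fields are assumed to already hold the continuous values described in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DailyRecord:
    """Hypothetical record of one development day (field names are illustrative)."""
    tester_score: float      # continuous value for the testers involved
    developer_score: float   # continuous value for the developers involved
    test_duration: float     # test duration, e.g. in hours
    num_use_cases: int       # number of use cases used that day
    num_requirements: int    # number of requirements handled that day


def to_feature_vector(record: DailyRecord) -> List[float]:
    """Flatten one day's record into the feature vector used to predict the next day."""
    return [
        record.tester_score,
        record.developer_score,
        record.test_duration,
        float(record.num_use_cases),
        float(record.num_requirements),
    ]


# Features observed on day 1 form the input for predicting the defects of day 2.
day1 = DailyRecord(tester_score=6.2, developer_score=7.5,
                   test_duration=8.0, num_use_cases=40, num_requirements=12)
x_day2 = to_feature_vector(day1)
```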

Preferably, the acquired features of the software to be predicted may include the testers, the developers, the test duration, the number of use cases, the number of requirements and the like, and the feature vector of the software to be predicted is formed from these features. It should be noted that when testers or developers are used as elements of the feature vector, these discrete variables need to be converted into continuous variables. For a developer, the corresponding continuous value used in software defect prediction can be determined from factors such as the developer's total working time, time at the company, time spent on the company's projects and educational background. For example, the educational background may first be converted into a score as shown in Table 1; a doctorate, for instance, is converted into a score of 8.

TABLE 1

Education level	Score
Doctorate	8
Master's degree	6
Bachelor's degree	5
Associate (junior college) degree	4

Specifically, when a developer is converted into a continuous value for software defect prediction, the developer's time parameters can be uniformly converted into numbers in units of years. For example, the total working time, the time at the company and the time on the company's projects are converted into three values in years, so one and a half years at the company gives the value 1.5; months may of course be used as the unit instead. The three time values and the education score are then combined using weights assigned in advance to the four parameters, and the developer's continuous value is obtained from this combined calculation.

Similarly, the continuous value of a tester is determined in the same way as that of a developer, and the details are not repeated here.
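A minimal sketch of this conversion follows. The education scores come from Table 1; the weighted sum, the weight values in `DEFAULT_WEIGHTS` and the function names are assumptions, because the description only states that the four parameters are combined with weights assigned in advance.

```python
# Table 1: education level -> score
EDUCATION_SCORE = {
    "doctorate": 8,
    "master": 6,
    "bachelor": 5,
    "associate": 4,  # associate (junior college) degree
}

# Hypothetical weights for (total working years, years at the company,
# years on company projects, education score); the source gives no values.
DEFAULT_WEIGHTS = (0.3, 0.2, 0.2, 0.3)


def months_to_years(months: float) -> float:
    """Convert a duration in months to years, e.g. 18 months -> 1.5."""
    return months / 12.0


def personnel_value(years_working: float, years_at_company: float,
                    years_on_projects: float, education: str,
                    weights=DEFAULT_WEIGHTS) -> float:
    """Combine three time parameters (in years) and the education score into a
    single continuous value. A weighted sum is assumed here; the source only
    says the four parameters are combined with pre-assigned weights."""
    w_work, w_company, w_project, w_edu = weights
    return (w_work * years_working
            + w_company * years_at_company
            + w_project * years_on_projects
            + w_edu * EDUCATION_SCORE[education])


# A developer with 5 years of work, 18 months at the company,
# 1 year on its projects and a master's degree.
dev_value = personnel_value(5.0, months_to_years(18), 1.0, "master")
```

The same function can be reused for testers by passing their time parameters and education level.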

S12, determining the software defect prediction result of the software to be predicted based on the feature vector and a pre-trained prediction model for predicting software defects.

The prediction model is trained based on a gradient boosting algorithm and a random forest learning machine.

In the embodiment of the invention, the prediction model for predicting software defects is trained with a gradient boosting algorithm and a random forest learning machine. First, the random forest algorithm speeds up training of the prediction model and, once training is finished, indicates which features are important for the prediction model. Second, when a decision tree of the random forest is trained, the gradient boosting algorithm corrects the decision tree model obtained so far, reducing the residual step by step over the iterations, so that the optimal decision tree model is finally obtained along the gradient direction that reduces the residual.

After the feature vector of the software to be predicted is obtained, it is input into the pre-trained prediction model, and the output is the software defect prediction result of the software to be predicted.

Preferably, the software defect prediction result may include the number of defects and the modules to which the defects belong.

In a specific implementation, to predict the software defects of the second day of development, a feature vector is formed from the developers and testers who took part in development on the first day and the numbers of use cases and requirements used on the first day. The feature vector is input into the pre-trained prediction model, whose output is the software defect prediction result for the second day, such as the number of defects likely to occur on the second day and the modules to which they belong. Based on the predicted number of defects, developers and testers can compare the defects that actually occurred with the predicted number once the second day's development is finished. Because the prediction model is accurate, a number of defects close to the predicted number is likely to occur on the second day; if the actual number differs greatly from the predicted number, the software developed on the second day probably contains defects that have not yet been found, and developers or testers need to check or test it again. Based on the predicted modules to which the defects belong, developers and testers can focus on those modules when they find errors during software verification, which greatly speeds up finding and verifying defects and reduces their workload to some extent.

In the present invention, the module to which a defect belongs can be understood as the identification information of the module containing the defect.

Preferably, to further reduce the workload of staff and testers, the invention proposes the process shown in FIG. 1b, which comprises the following steps:

S11a, acquiring the actual result of the software to be predicted.

Specifically, in practice, when verifying the program currently being developed, staff or testers may record the actual result of the software developed that day, including the actual number of defects and the modules to which the actual defects belong. On this basis, the invention may provide a human-machine interface through which the user can import the actual result of the software.

S12a, comparing the number of defects contained in the obtained software defect prediction result with the actual number of defects contained in the actual result of the software to be predicted.

S13a, if the predicted number of defects is greater than the actual number of defects, determining, from all the modules to which the defects in the software defect prediction result belong, the modules that are not among the modules to which the actual defects in the actual result belong, storing these modules in a list, and displaying the list to the user.

In a specific implementation, after the imported actual result is obtained in step S11a, step S12a may be executed, that is, the actual result is compared with the software defect prediction result to determine whether the predicted number of defects is greater than the actual number. Because the defect prediction method provided by the invention is highly accurate, a larger predicted number suggests that some defects of the software have not yet been found. In that case, all modules to which the predicted defects belong are compared with the modules to which the actual defects belong, and the predicted modules that do not match the actual result are determined. The identification information of these modules is written into the list and displayed to the staff and/or testers, who can then quickly locate the modules from the list and find the potential hazards in them. This greatly reduces the workload of staff and testers and speeds up defect finding.

Specifically, take a software defect prediction result containing 20 defects as an example. If the actual number of defects in the actual result is 10, the modules in the software defect prediction result that do not match the modules to which the actual defects belong can be identified; if 10 such modules are found, the potential hazards of these modules can be located quickly, which helps the success rate of later-stage software development.

It should be noted that if some modules to which the actual defects belong are not in the software defect prediction result, the prediction model failed to predict them. In that case, the factors affecting these modules may be determined and added to the samples used to train the prediction model, or the values of the existing parameters in the samples may be adjusted, so as to further improve the accuracy of the trained prediction model.
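The comparison in steps S12a and S13a, together with the reverse check just described, reduces to set differences over module identifiers. The sketch below assumes module identifiers are strings; the function names and the sample modules are hypothetical.

```python
from typing import List, Sequence


def modules_to_recheck(predicted_count: int, predicted_modules: Sequence[str],
                       actual_count: int, actual_modules: Sequence[str]) -> List[str]:
    """S12a/S13a: when more defects were predicted than actually found, return the
    predicted defect modules that are absent from the actual defect modules."""
    if predicted_count <= actual_count:
        return []
    actual = set(actual_modules)
    listed: List[str] = []
    for module in predicted_modules:
        # keep the prediction order and list each module only once
        if module not in actual and module not in listed:
            listed.append(module)
    return listed


def modules_missed_by_model(predicted_modules: Sequence[str],
                            actual_modules: Sequence[str]) -> List[str]:
    """Modules that actually had defects but were never predicted; their
    influencing factors are candidates to add to the training samples."""
    predicted = set(predicted_modules)
    return sorted({m for m in actual_modules if m not in predicted})


# Hypothetical module identifiers.
to_check = modules_to_recheck(20, ["auth", "billing", "report", "auth"], 10, ["auth", "export"])
missed = modules_missed_by_model(["auth", "billing", "report"], ["auth", "export"])
```

With 20 predicted and 10 actual defects, `modules_to_recheck` returns the predicted modules that still need checking, while `modules_missed_by_model` points to modules whose factors may be added to the training samples.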

In a specific implementation, the prediction model for predicting software defects may be trained with the gradient boosting algorithm and the random forest learning machine according to the method shown in FIG. 2a, which includes the following steps:

S21, acquiring a software defect prediction sample set, and dividing the acquired software defect prediction samples into a training set and a verification set according to a preset ratio.

In a specific implementation, software defect prediction samples can be obtained from the software developed by the company. For example, for any software developed by the company, the developers and testers who took part in its development, its test duration, the number of use cases used in developing it and its number of requirements can be obtained and used as input variables; the number of defects that occurred during its development and the modules to which they belong are also obtained, with the number of defects used as the result value. Together these parameters form one software defect prediction sample. The software defect prediction sample set is then obtained from all software developed by the company, or from a preset number of them.

When the prediction model is trained, it is first trained with the training set, and the trained model is then verified with the verification set. The software defect prediction sample set therefore needs to be divided after it is obtained; for example, 75% of the samples may form the training set used to train the prediction model, and the remaining 25% form the verification set used to verify the trained model.
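A seeded random split in the 75%/25% proportion mentioned above might look like this. Representing a sample as an (input variables, defect count) pair and the `split_samples` name are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], train_ratio: float = 0.75,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the sample set, then split it into a training set and a
    verification set according to a preset ratio (75% / 25% by default)."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must lie strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed for a reproducible split
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Each sample: (input variables, number of defects); the values are made up.
samples = [([float(k), float(k % 7)], k % 5) for k in range(100)]
train_set, verification_set = split_samples(samples)
```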

Preferably, to improve the accuracy of the prediction model and reduce the computational complexity of training, the software defect prediction samples may be obtained as follows:

based on the raw data recorded during the development of software whose defect results are known, screening the raw data according to its degree of correlation with software defect prediction to obtain the software defect prediction samples.

In a specific implementation, a larger sample size gives a more accurate prediction model but also a higher computational complexity. To keep the computational complexity moderate while still ensuring accurate predictions, the raw data can be screened according to its correlation with software defect prediction when the samples are obtained. This correlation can be considered along three dimensions: the size of the software development project, personnel capability, and product submission. Project size can be measured by the number of use cases used to develop the software, the number of requirements of the software and the like; personnel capability can be measured by the working ability, educational background and working experience of the developers or testers; product submission can be measured by the completion time, that is, the test duration.

When the raw data are screened, it can first be checked whether the data of each developed software cover all three dimensions, and software that does not is removed. The remaining software is then screened further; for example, conditions can be set such as the number of requirements being no less than a first value, the number of use cases being no less than a second value, personnel holding an associate (junior college) degree or above, more than one year at the company, and a completion time of less than half a year. The data related to the three dimensions in the raw records of the software that meets the conditions are used as the software defect prediction sample set. This gives the sample data a high reference value, improves to some extent the accuracy of the prediction model trained on the screened sample set, and thus makes the results predicted by the model more useful.
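The screening can be sketched as a two-stage filter: drop records missing any of the three dimensions, then apply the example conditions. `ProjectRecord`, its fields and the default minimums of 10 requirements and 20 use cases are hypothetical; the description leaves the first and second thresholds unspecified.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ProjectRecord:
    """Hypothetical raw record of one piece of software with known defect results."""
    num_requirements: Optional[int]      # project size dimension
    num_use_cases: Optional[int]         # project size dimension
    education_score: Optional[int]       # personnel ability dimension (Table 1 score)
    years_at_company: Optional[float]    # personnel ability dimension
    completion_months: Optional[float]   # product submission dimension


def covers_all_dimensions(r: ProjectRecord) -> bool:
    """True if the record has data for project size, personnel ability and submission."""
    return None not in (r.num_requirements, r.num_use_cases, r.education_score,
                        r.years_at_company, r.completion_months)


def screen_records(records: Iterable[ProjectRecord], min_requirements: int = 10,
                   min_use_cases: int = 20) -> List[ProjectRecord]:
    """Keep only records that cover all three dimensions and meet the example
    conditions; both minimums are hypothetical placeholders."""
    kept = []
    for r in records:
        if not covers_all_dimensions(r):
            continue
        if (r.num_requirements >= min_requirements
                and r.num_use_cases >= min_use_cases
                and r.education_score >= 4        # associate degree or above
                and r.years_at_company > 1.0      # more than one year at the company
                and r.completion_months < 6.0):   # completed within half a year
            kept.append(r)
    return kept


records = [
    ProjectRecord(15, 30, 5, 2.0, 4.0),     # meets every condition
    ProjectRecord(15, 30, 5, 0.5, 4.0),     # less than one year at the company
    ProjectRecord(15, None, 6, 3.0, 2.0),   # missing project size data
]
sample_set = screen_records(records)
```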

S22, performing model training on the random forest learning machine based on the training set and the gradient boosting algorithm.

In a specific implementation, when step S22 is executed, each decision tree of the random forest learning machine needs to be trained separately with the samples in the training set. To this end, the software defect prediction samples in the training set are randomly divided into several batches, each containing the same number of samples, and the decision tree model is trained with the samples of each batch. Specifically, the following process is performed for each batch in the training set: using the software defect prediction samples in the batch and the gradient boosting algorithm, any decision tree in the random forest learning machine is trained iteratively for a preset number of iterations, yielding the training model for that batch.
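Random division into equal-sized batches could be sketched as below. The `make_batches` name and the choice to drop samples that do not fill a complete batch are assumptions.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_batches(train_set: Sequence[T], num_batches: int, seed: int = 0) -> List[List[T]]:
    """Randomly divide the training set into num_batches batches of equal size.
    Leftover samples that do not fill a whole batch are dropped here; that is an
    assumption, since the source only requires every batch to be the same size."""
    if not 1 <= num_batches <= len(train_set):
        raise ValueError("num_batches must be between 1 and len(train_set)")
    shuffled = list(train_set)
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // num_batches
    return [shuffled[b * size:(b + 1) * size] for b in range(num_batches)]


batches = make_batches(list(range(75)), num_batches=5)
```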

Specifically, the iterative training of any decision tree may follow the process shown in FIG. 2b, which includes the following steps:

S221, for the i-th iteration, determining the gradient of the loss function determined in the (i-1)-th iteration with respect to the training model obtained in the (i-1)-th iteration.

In a specific implementation, to obtain the model of the i-th iteration, the gradient boosting algorithm first determines the gradient of the loss function determined in the (i-1)-th iteration at the training model obtained in the (i-1)-th iteration. The gradient may be positive or negative: a gradient greater than 0 indicates that the (i-1)-th training model needs to be corrected in the negative direction, and a gradient less than 0 indicates that it needs to be corrected in the positive direction. Ideally the gradient would be close to 0, but the calculated gradient is generally clearly non-zero, so the gradient boosting algorithm is used to drive the gradient of the finally trained model as close to 0 as possible, making the final model more accurate.
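As a worked form of step S221, assume squared-error loss (the source does not name the loss function). For a sample x_k with actual defect count y_k and the model F_{i-1} from the (i-1)-th iteration, the gradient is:

```latex
L\bigl(y_k, F(x_k)\bigr) = \tfrac{1}{2}\bigl(y_k - F(x_k)\bigr)^2,
\qquad
g_{i,k} = \left.\frac{\partial L\bigl(y_k, F(x_k)\bigr)}{\partial F(x_k)}\right|_{F = F_{i-1}}
        = F_{i-1}(x_k) - y_k .
```

Under this assumption, g_{i,k} > 0 means the (i-1)-th model over-predicts the defect count for that sample and must be corrected downwards, and g_{i,k} < 0 means it under-predicts and must be corrected upwards, which matches the sign rule described above.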

S222, determining the leaf node regions contained in the decision tree according to the gradient.

Specifically, the leaf node regions contained in the decision tree may be determined by a linear search method.

S223, determining, according to the leaf node regions, the gain of each leaf node at which the loss function value determined in the (i-1)-th iteration satisfies a preset condition.

Preferably, determining, according to the leaf node regions, the gain of each leaf node at which the loss function value determined in the (i-1)-th iteration satisfies the preset condition includes:

determining, according to the leaf node regions, the gain of each leaf node at which the loss function determined in the (i-1)-th iteration takes its minimum value.

S224, determining the training model of the i-th iteration from the leaf node regions and the gain of each leaf node.
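Assuming squared-error loss L(y, F) = ½(y − F)² (the source does not name the loss) and writing R_{i,j} for the j-th leaf node region found in step S222, steps S223 and S224 take the standard gradient boosting form for regression trees: the gain of each leaf is the constant that minimizes the loss over that leaf, which for squared error is the mean residual, and the new model adds these gains to the previous one.

```latex
\gamma_{i,j} = \arg\min_{\gamma} \sum_{x_k \in R_{i,j}} L\bigl(y_k,\, F_{i-1}(x_k) + \gamma\bigr)
             = \frac{1}{\lvert R_{i,j} \rvert} \sum_{x_k \in R_{i,j}} \bigl(y_k - F_{i-1}(x_k)\bigr),
\qquad
F_i(x) = F_{i-1}(x) + \sum_{j} \gamma_{i,j}\, \mathbf{1}\bigl(x \in R_{i,j}\bigr).
```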

S225, judging whether the current number of iterations has reached the preset number; if not, executing step S226; if so, executing step S227.

S226, increasing the current iteration number i by 1, that is, i = i + 1, and then returning to step S221.

S227, ending the process.
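Steps S221 to S227 can be illustrated with a small, dependency-free gradient boosting sketch for one batch. It assumes squared-error loss, uses single-split regression trees (stumps) whose leaf node regions are found by exhaustive search rather than the linear search named above, and adds a shrinkage factor `learning_rate`; all of these, and the function names, are illustrative assumptions rather than the patented procedure. How the per-batch models are combined into the random forest is not modeled here.

```python
from typing import List, Optional, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left gain, right gain)


def _best_split(X: Sequence[Sequence[float]],
                target: Sequence[float]) -> Optional[Tuple[int, float]]:
    """S222 helper: choose the (feature, threshold) split whose two leaf regions
    fit `target` with the smallest squared error."""
    best: Optional[Tuple[float, int, float]] = None
    n = len(X)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [target[k] for k in range(n) if X[k][j] <= t]
            right = [target[k] for k in range(n) if X[k][j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t)
    return None if best is None else (best[1], best[2])


def train_batch(X, y, n_iter: int = 50, learning_rate: float = 0.1):
    """Boost regression stumps on one batch for a preset number of iterations."""
    n = len(y)
    f0 = sum(y) / n                      # F_0: constant initial model
    F = [f0] * n                         # current predictions on the batch
    stumps: List[Stump] = []
    for _ in range(n_iter):              # S225-S227: stop after the preset count
        # S221: gradient of 0.5 * (y - F)^2 with respect to F at F_{i-1}
        grad = [F[k] - y[k] for k in range(n)]
        # S222: leaf node regions from a split fitted to the negative gradient
        split = _best_split(X, [-g for g in grad])
        if split is None:                # no usable split: stop early
            break
        j, t = split
        left = [k for k in range(n) if X[k][j] <= t]
        right = [k for k in range(n) if X[k][j] > t]
        # S223: leaf gain minimising the squared loss = mean residual in the leaf
        g_left = learning_rate * sum(y[k] - F[k] for k in left) / len(left)
        g_right = learning_rate * sum(y[k] - F[k] for k in right) / len(right)
        # S224: F_i = F_{i-1} + (shrunken) leaf gains
        for k in left:
            F[k] += g_left
        for k in right:
            F[k] += g_right
        stumps.append((j, t, g_left, g_right))
    return f0, stumps


def predict(model, x: Sequence[float]) -> float:
    f0, stumps = model
    return f0 + sum(gl if x[j] <= t else gr for j, t, gl, gr in stumps)


# Toy batch: one feature (e.g. number of requirements) -> number of defects.
model = train_batch([[1.0], [2.0], [3.0], [4.0]], [1.0, 1.0, 5.0, 5.0])
```

With the shrinkage factor, each iteration closes only part of the remaining residual, so the preset iteration count in S225 controls how closely the batch is fitted.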

Preferably, the sample characteristics of each sample in the training set include at least one of: the number of defects, the module to which the defect belongs, the information of testers, the information of developers, the test duration, the number of use cases and the number of requirements.

It should be noted that the number of requirements referred to in the first embodiment of the present invention may be understood as the market demand for the software.

S23, performing model verification on the trained random forest learning machine with the verification set to obtain the prediction model for predicting software defects.

In specific implementation, when step S23 is executed, the method specifically includes:

for each of the training models obtained from the batches, performing the operations shown in FIG. 2c:

S231, verifying the training model with all the software defect prediction samples in the verification set to obtain a verification result for each sample.

In a specific implementation, after each decision tree of the random forest has been trained with the gradient boosting algorithm to obtain its final training model, the software defect prediction samples in the verification set are used to verify that model and obtain verification results. For example, when the final training model of a decision tree is verified with a sample A, the verification result is the predicted number of defects for sample A.

S232, determining the prediction accuracy of the training model with a mean squared error function from the actual result and the verification result of each software defect prediction sample.

Preferably, the verification result may be the predicted number of defects of the software defect prediction sample, and the actual result may be the actual number of defects of that sample.

In a specific implementation, once the verification result of each software defect prediction sample is determined, and since the actual number of defects of each sample is known, the prediction accuracy of the training model can be determined with the mean squared error function from the actual numbers of defects and the predicted numbers obtained by verification.

Similarly, the prediction accuracy corresponding to each training model can be determined according to the methods of steps S231 and S232.

After the prediction accuracy of each training model is determined, these accuracies are compared when choosing the training model finally used for software defect prediction, and the training model with the highest prediction accuracy, that is, the smallest mean squared error, is determined as the prediction model for software defect prediction. In this way, a software defect prediction model with highly accurate prediction results is obtained.
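A sketch of steps S231, S232 and the final selection, assuming each batch model is a callable that maps a feature vector to a predicted defect count. Since a lower mean squared error means a higher prediction accuracy, the model with the smallest MSE is chosen. The function names and the stand-in models are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of the squared differences between actual and predicted defect counts."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def select_best_model(models: List[Model], X_val: Sequence[Sequence[float]],
                      y_val: Sequence[float]) -> Tuple[int, float]:
    """S231: verify each batch model on every verification sample.
    S232: score it by MSE. Return (index, MSE) of the lowest-MSE model."""
    scores = [mean_squared_error(y_val, [m(x) for x in X_val]) for m in models]
    best = min(range(len(scores)), key=lambda idx: scores[idx])
    return best, scores[best]


# Two stand-in models: one always predicts 0 defects, one uses the first feature.
candidates = [lambda x: 0.0, lambda x: x[0]]
best_index, best_mse = select_best_model(candidates, [[1.0], [2.0], [3.0]], [1.0, 2.0, 3.0])
```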

With the software defect prediction method provided by the embodiment of the invention, the feature vector of the software to be predicted is acquired, and a software defect prediction result of the software to be predicted is determined based on the feature vector and a pre-trained prediction model for predicting software defects, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learning machine. Because the prediction model for software defect prediction is trained with the gradient boosting algorithm and the random forest learning machine, the software defect prediction results output by the trained model are more accurate, while the computational complexity is not significantly increased.

Example two

Based on the same inventive concept, an embodiment of the invention further provides a software defect prediction device. Because the device solves the problem on a principle similar to that of the software defect prediction method, its implementation can refer to that of the method, and repeated details are omitted.

FIG. 3 shows a schematic structural diagram of the software defect prediction apparatus according to the second embodiment of the present invention, which includes an obtaining unit 31 and a determining unit 32, wherein:

an obtaining unit 31, configured to obtain a feature vector of software to be predicted;

the determining unit 32 is configured to determine a software defect prediction result of the software to be predicted based on the feature vector and a pre-trained prediction model for predicting software defects, wherein the prediction model is trained based on a gradient boosting algorithm and a random forest learning machine.
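The two-unit structure of the apparatus can be mirrored by two small classes. This is only an architectural sketch: the class names, the feature names and the stand-in model (a plain sum) are hypothetical, and in practice the determining unit would wrap a model trained as in the first embodiment.

```python
from typing import Callable, Dict, List, Sequence


class ObtainingUnit:
    """Unit 31: builds the feature vector of the software to be predicted."""

    def __init__(self, feature_names: Sequence[str]):
        self.feature_names = list(feature_names)

    def obtain(self, raw: Dict[str, float]) -> List[float]:
        # Order the raw measurements consistently with the training features.
        return [float(raw[name]) for name in self.feature_names]


class DeterminingUnit:
    """Unit 32: feeds the feature vector to a pre-trained prediction model."""

    def __init__(self, model: Callable[[Sequence[float]], float]):
        self.model = model

    def determine(self, features: Sequence[float]) -> float:
        return self.model(features)


# Hypothetical feature names and a stand-in model, for illustration only.
obtaining_unit = ObtainingUnit(["tester", "developer", "test_duration",
                                "use_cases", "requirements"])
determining_unit = DeterminingUnit(lambda v: sum(v))
features = obtaining_unit.obtain({"tester": 6.0, "developer": 7.0, "test_duration": 8.0,
                                  "use_cases": 40, "requirements": 12})
result = determining_unit.determine(features)
```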

Preferably, the determining unit 32 is specifically configured to obtain the prediction model for software defect prediction as follows: acquire a software defect prediction sample set, and divide the acquired software defect prediction samples into a training set and a verification set according to a preset proportion; then, based on the training set and the gradient boosting algorithm, train the random forest learning machine, and verify the trained random forest learning machine with the verification set to obtain the prediction model for predicting software defects.
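As an illustration of the division step above, the following Python sketch shuffles a sample set and splits it into a training set and a verification set. It is a minimal sketch under stated assumptions: the function name split_samples, the default ratio of 0.8, and the fixed random seed are hypothetical choices, since the patent does not specify the preset proportion or how the samples are shuffled.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(
    samples: Sequence[T], train_ratio: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle samples and split them into (training set, verification set).

    train_ratio stands in for the 'preset proportion'; 0.8 is an
    illustrative default (hypothetical), not a value taken from the patent.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


train_set, verification_set = split_samples(list(range(10)), train_ratio=0.8)
print(len(train_set), len(verification_set))
```

Fixing the seed makes the split reproducible, which helps when the verification results of several training models are compared afterwards.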

Further, the determining unit 32 is specifically configured to randomly divide the software defect prediction samples in the training set into a plurality of batches, each batch containing the same number of software defect prediction samples, and to perform the following process for each batch in the training set: using the software defect prediction samples in the batch and the gradient boosting algorithm, perform a preset number of training iterations on any one decision tree in the random forest learning machine, to obtain the training model for that batch.
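The batching step can be sketched as follows. The patent requires every batch to hold the same number of samples but does not say what happens when the size of the training set is not a multiple of the batch size; this sketch assumes the leftover samples are simply dropped. The names make_batches and batch_size are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_batches(samples: Sequence[T], batch_size: int, seed: int = 0) -> List[List[T]]:
    """Randomly partition samples into batches of exactly batch_size items.

    Assumption: leftover samples that cannot fill a whole batch are dropped,
    so that every batch has the same size.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_full = len(shuffled) // batch_size  # number of complete batches
    return [shuffled[k * batch_size:(k + 1) * batch_size] for k in range(n_full)]


for batch in make_batches(range(10), batch_size=3, seed=42):
    print(batch)
```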

Specifically, the determining unit is configured to: for the i-th iteration, determine the gradient value of the loss function with respect to the training model obtained in the (i-1)-th iteration; determine the leaf node regions contained in the decision tree according to the gradient values; determine, according to the leaf node regions, the gain of each leaf node at which the loss function value determined in the (i-1)-th iteration satisfies a preset condition; and determine the training model of the i-th iteration from the leaf node regions and the gain of each leaf node, where i is an integer from 1 to the preset number of iterations.
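The iteration described above follows the general pattern of gradient tree boosting. The Python sketch below is an illustrative instance under several assumptions that the patent does not state: squared-error loss, single-split regression trees (stumps) in place of full decision trees, a constant initial model F_0 equal to the mean target, and a shrinkage factor learning_rate. It does not show how the boosted trees are combined within the random forest learning machine, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# (feature index, threshold, gain of left leaf, gain of right leaf)
Stump = Tuple[int, float, float, float]


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """Pick the split (two leaf regions) that best fits the residuals r.

    Each leaf's gain is the mean residual in that leaf, which minimises
    squared loss within the leaf.
    """
    best = None
    best_sse = float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue  # not a real split
            gl = sum(left) / len(left)
            gr = sum(right) / len(right)
            sse = sum((v - gl) ** 2 for v in left) + sum((v - gr) ** 2 for v in right)
            if sse < best_sse:
                best_sse, best = sse, (j, t, gl, gr)
    if best is None:
        # No feature can split the data: fall back to a single leaf.
        g = sum(r) / len(r)
        return (0, float("inf"), g, g)
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> float:
    j, t, gl, gr = stump
    return gl if row[j] <= t else gr


def gradient_boost(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    n_iter: int = 50,
    learning_rate: float = 0.1,
) -> Callable[[Sequence[float]], float]:
    f0 = sum(y) / len(y)      # initial model F_0: the mean target
    F = [f0] * len(y)         # current predictions F_{i-1}(x) on the training rows
    stumps: List[Stump] = []
    for _ in range(n_iter):   # i = 1 .. preset number of iterations
        # Negative gradient of 0.5 * (y - F)^2 with respect to F, evaluated at F_{i-1}.
        residuals = [yi - Fi for yi, Fi in zip(y, F)]
        stump = fit_stump(X, residuals)  # leaf regions and per-leaf gains
        stumps.append(stump)
        # F_i = F_{i-1} + learning_rate * (gain of the leaf each row falls into)
        F = [Fi + learning_rate * predict_stump(stump, row) for Fi, row in zip(F, X)]

    def model(row: Sequence[float]) -> float:
        return f0 + learning_rate * sum(predict_stump(s, row) for s in stumps)

    return model
```

With squared loss the negative gradient is simply the residual y - F, so each iteration fits the leaf regions to the current residuals and uses the mean residual of each leaf as its gain.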

Preferably, the determining unit is specifically configured to determine, according to the leaf node regions, the gain of each leaf node at which the loss function determined in the (i-1)-th iteration takes its minimum value.
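In standard gradient tree boosting, the minimising gain of each leaf region can be written as below. The patent does not give its loss function; squared-error loss is assumed here to obtain a closed form, and the symbols R_{ij} (the j-th leaf region of the tree built in the i-th iteration) and r_{ik} (the residual of sample k) are introduced for illustration only.

```latex
% i-th iteration, j-th leaf region R_{ij}; residual r_{ik} = y_k - F_{i-1}(x_k)
\gamma_{ij} = \arg\min_{\gamma} \sum_{x_k \in R_{ij}} L\bigl(y_k,\, F_{i-1}(x_k) + \gamma\bigr)

% Squared-error loss L(y, F) = \tfrac{1}{2}(y - F)^2 (assumed):
\frac{\partial}{\partial \gamma} \sum_{x_k \in R_{ij}} \tfrac{1}{2}\bigl(r_{ik} - \gamma\bigr)^2
  = -\sum_{x_k \in R_{ij}} \bigl(r_{ik} - \gamma\bigr) = 0
  \quad\Longrightarrow\quad
  \gamma_{ij} = \frac{1}{\lvert R_{ij} \rvert} \sum_{x_k \in R_{ij}} r_{ik}

% Model after the i-th iteration:
F_i(x) = F_{i-1}(x) + \sum_{j} \gamma_{ij}\, \mathbf{1}\bigl(x \in R_{ij}\bigr)
```

The second derivative with respect to \gamma equals |R_{ij}|, which is positive, so this stationary point is the minimum. In practice the update is often scaled by a learning rate; the patent does not say whether one is used.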

Further, the determining unit 32 is specifically configured to perform the following operations for each training model obtained from the batches: verify the training model with every software defect prediction sample in the verification set to obtain a verification result for each sample; determine the prediction accuracy of the training model with a mean square error function from the actual result and the verification result of each sample, a smaller mean square error indicating a higher prediction accuracy; and compare the prediction accuracies of all training models, determining the training model with the highest prediction accuracy as the prediction model for predicting software defects.
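A minimal sketch of the verification and selection step, assuming each training model is a callable that maps a feature vector to a numeric prediction (for example, a predicted defect count) and that a lower mean square error on the verification set stands for a higher prediction accuracy. The names mean_squared_error and select_best_model are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def select_best_model(
    models: Sequence[Model],
    X_val: Sequence[Sequence[float]],
    y_val: Sequence[float],
) -> Tuple[int, float]:
    """Return (index, MSE) of the model with the lowest verification-set MSE."""
    if not models:
        raise ValueError("no candidate models")
    scores: List[float] = []
    for model in models:
        predictions = [model(row) for row in X_val]  # verification result per sample
        scores.append(mean_squared_error(y_val, predictions))
    best = min(range(len(models)), key=scores.__getitem__)
    return best, scores[best]
```

Ties are resolved in favour of the earliest model in the list, because min returns the first minimal index.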

Preferably, the sample characteristics of each sample in the training set include at least one of: the number of defects, the module to which a defect belongs, tester information, developer information, test duration, the number of test cases, and the number of requirements.
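One possible way to turn the listed sample characteristics into a numeric feature vector is sketched below. The patent lists these characteristics but does not state how categorical items such as module, tester, and developer information are encoded, or which items serve as prediction targets; this sketch assumes simple integer label encoding and places all seven items in one vector. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DefectSample:
    # Hypothetical field names mirroring the characteristics listed above.
    defect_count: int
    module: str
    tester: str
    developer: str
    test_hours: float
    test_case_count: int
    requirement_count: int


def encode(value: str, vocab: Dict[str, int]) -> int:
    """Map a categorical value to a stable integer code (assumed label encoding)."""
    return vocab.setdefault(value, len(vocab))


def to_feature_vector(s: DefectSample, vocabs: Dict[str, Dict[str, int]]) -> List[float]:
    return [
        float(s.defect_count),
        float(encode(s.module, vocabs.setdefault("module", {}))),
        float(encode(s.tester, vocabs.setdefault("tester", {}))),
        float(encode(s.developer, vocabs.setdefault("developer", {}))),
        s.test_hours,
        float(s.test_case_count),
        float(s.requirement_count),
    ]
```

Integer label codes impose an arbitrary order on categories; one-hot encoding avoids that at the cost of a longer vector.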

Preferably, the determining unit 32 is specifically configured to obtain the software defect prediction samples from original data recorded during the development of software whose defect results are already known, by screening the original data according to its correlation with software defect prediction.
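The correlation-based screening can be illustrated with the Pearson correlation coefficient. This is an assumption: the patent names neither the correlation measure nor the threshold, so pearson, screen_fields, and the 0.3 cut-off are hypothetical, and the field names and values in the usage lines are toy data. Each raw field is kept if the absolute value of its correlation with the known defect outcome reaches the threshold.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def screen_fields(
    raw: Dict[str, Sequence[float]], target: Sequence[float], threshold: float = 0.3
) -> List[str]:
    """Keep the raw fields whose |correlation| with the known outcome >= threshold."""
    return [name for name, column in raw.items() if abs(pearson(column, target)) >= threshold]


# Toy data for illustration only.
raw_fields = {
    "test_hours": [1, 2, 3, 4],
    "noise": [1, -1, -1, 1],
    "constant": [5, 5, 5, 5],
}
known_defects = [2, 4, 6, 8]
print(screen_fields(raw_fields, known_defects))  # ['test_hours']
```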

Preferably, the software defect prediction result includes the number of defects and the modules to which the defects belong, and the apparatus further includes:

a processing unit, configured to: acquire the actual result of the software to be predicted; compare, for the software to be predicted, the number of defects in the obtained software defect prediction result with the actual number of defects in the actual result; and, if the predicted number of defects is greater than the actual number, determine, among all modules to which the predicted defects belong, those modules that do not match the modules to which the actual defects belong, store them in a list, and display the list to the user.
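The comparison performed by the processing unit can be sketched as follows. The function name and argument layout are hypothetical, and the module names in the usage lines are toy data; the sketch assumes modules are identified by name strings and returns the list that would be shown to the user, which is empty when the predicted defect count does not exceed the actual count.

```python
from typing import List, Sequence


def mismatched_modules(
    predicted_count: int,
    predicted_modules: Sequence[str],
    actual_count: int,
    actual_modules: Sequence[str],
) -> List[str]:
    """Predicted modules that do not appear among the actual defect modules.

    Only computed when the predicted defect count exceeds the actual count.
    """
    if predicted_count <= actual_count:
        return []
    actual = set(actual_modules)
    result: List[str] = []
    for module in predicted_modules:
        if module not in actual and module not in result:  # first-seen order, no duplicates
            result.append(module)
    return result


for module in mismatched_modules(5, ["billing", "auth", "report", "auth"], 3, ["billing"]):
    print(module)  # the list displayed to the user
```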

For convenience of description, the above parts are separately described as modules (or units) according to functional division. Of course, the functionality of the various modules (or units) may be implemented in the same or in multiple pieces of software or hardware in practicing the invention.

Example three

The third embodiment of the present application provides a non-volatile computer storage medium, where the computer storage medium stores computer-executable instructions, and the computer-executable instructions may execute the software defect prediction method in any of the above method embodiments.

Example four

Fig. 4 is a schematic diagram of the hardware structure of an electronic device implementing the software defect prediction method according to the fourth embodiment of the present invention. As shown in fig. 4, the electronic device includes:

one or more processors 410 and a memory 420; one processor 410 is taken as an example in fig. 4.

The electronic device performing the software defect prediction method may further include: an input device 430 and an output device 440.

The processor 410, the memory 420, the input device 430, and the output device 440 may be connected by a bus or in other ways; connection by a bus is taken as an example in fig. 4.

As a non-volatile computer-readable storage medium, the memory 420 may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules/units corresponding to the software defect prediction method in the embodiments of the present application (e.g., the obtaining unit 31 and the determining unit 32 shown in fig. 3). By running the non-volatile software programs, instructions, and modules/units stored in the memory 420, the processor 410 executes the various functional applications and data processing of the server or smart terminal, that is, implements the software defect prediction method of the above method embodiments.

The memory 420 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through use of the software defect prediction apparatus, and the like. Further, the memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 420 may optionally include memory located remotely from the processor 410, which may be connected to the software defect prediction apparatus via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 430 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the software defect prediction apparatus. The output device 440 may include a display device such as a display screen.

The one or more modules are stored in the memory 420 and, when executed by the one or more processors 410, perform the software defect prediction method of any of the method embodiments described above.

This product can execute the method provided by the embodiments of the present application, and has the functional modules and beneficial effects corresponding to executing that method. For technical details not described in this embodiment, refer to the method provided in the embodiments of the present application.

The electronic device of the embodiments of the present application exists in various forms, including but not limited to:

(1) Mobile communication devices: such devices have mobile communication capabilities and are primarily intended to provide voice and data communications. Such terminals include smartphones (e.g., iPhone), multimedia phones, feature phones, and low-end phones.

(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.

(3) Portable entertainment devices: such devices can display and play multimedia content. They include audio and video players (e.g., iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.

(4) Servers: devices that provide computing services. A server consists of a processor, hard disk, memory, system bus, and so on; its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capacity, stability, reliability, security, scalability, manageability, and the like.

(5) Other electronic devices with data interaction functions.

Example five

A fifth embodiment of the present application provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions, where the program instructions, when executed by a computer, cause the computer to execute any software defect prediction method in the above-mentioned method embodiments of the present application.

The software defect prediction apparatus provided by the embodiments of the present application can be implemented by a computer program. Those skilled in the art should understand that the above module division is only one of many possible divisions; whether the apparatus is divided into other modules or not divided into modules at all, it falls within the scope of the present application as long as it has the above functions.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A method for predicting software defects, comprising:

acquiring a feature vector of software to be predicted; and

determining a software defect prediction result of the software to be predicted based on the feature vector and a pre-trained prediction model for predicting software defects, wherein the prediction model is obtained by training based on a gradient boosting algorithm and a random forest learning machine;

the prediction model for predicting the software defects is obtained according to the following method:

acquiring a software defect prediction sample set, and dividing the acquired software defect prediction samples into a training set and a verification set according to a preset proportion; performing model training on a random forest learning machine based on the training set and a gradient boosting algorithm, and performing model verification on the trained random forest learning machine by using the verification set to obtain the prediction model for predicting software defects;

performing model training on the random forest learning machine based on the training set and the gradient boosting algorithm specifically comprises:

randomly dividing the software defect prediction samples contained in the training set into a plurality of batches, each batch containing the same number of software defect prediction samples; and, for each batch contained in the training set, performing the following process: performing a preset number of training iterations on any one decision tree in the random forest learning machine by using the software defect prediction samples contained in the batch and the gradient boosting algorithm, to obtain a training model for the batch;

performing a preset number of training iterations on any one decision tree in the random forest learning machine by using the software defect prediction samples contained in the batch and the gradient boosting algorithm, to obtain a training model for the batch, specifically comprises:

for the i-th iteration, determining the gradient value of the loss function with respect to the training model obtained in the (i-1)-th iteration; determining the leaf node regions contained in the decision tree according to the gradient value; determining, according to the leaf node regions, the gain of each leaf node at which the loss function value determined in the (i-1)-th iteration meets a preset condition; and determining the training model obtained in the i-th iteration by using the leaf node regions and the gain of each leaf node; wherein i is an integer from 1 to the preset number of times.

2. The method according to claim 1, wherein determining, according to the leaf node regions, the gain of each leaf node at which the loss function value determined in the (i-1)-th iteration meets a preset condition comprises:

determining, according to the leaf node regions, the gain of each corresponding leaf node at which the loss function determined in the (i-1)-th iteration takes its minimum value.

3. The method according to claim 2, wherein performing model verification on the trained random forest learning machine by using the verification set to obtain the prediction model for predicting software defects specifically comprises:

for each of the training models obtained by training on the batches, performing the following operations:

verifying the training model by using all software defect prediction samples in the verification set to respectively obtain verification results corresponding to all the software defect prediction samples;

determining the prediction accuracy of the training model by using a mean square error function according to the actual result and the verification result of each software defect prediction sample; and

comparing the prediction accuracies obtained by the training models, and determining the training model with the highest prediction accuracy as the prediction model for predicting software defects.

4. The method of claim 1, wherein the sample characteristics of each sample in the training set comprise at least one of: the number of defects, the module to which a defect belongs, tester information, developer information, test duration, the number of test cases, and the number of requirements.

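5. The method of claim 1, wherein obtaining the software defect prediction samples comprises:

screening original data recorded during the development of software whose software defect results are known, according to the correlation between the original data and software defect prediction, to obtain the software defect prediction samples.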

6. The method according to any one of claims 1 to 5, wherein the software defect prediction result comprises the number of defects and the modules to which the defects belong, and the method further comprises:

acquiring an actual result of the software to be predicted;

comparing, for the software to be predicted, the number of defects contained in the obtained software defect prediction result with the actual number of defects contained in the actual result; and

if the predicted number of defects is greater than the actual number of defects, determining, from all modules to which the defects contained in the software defect prediction result belong, the modules that are inconsistent with the modules to which the actual defects contained in the actual result belong, storing these modules in a list, and displaying the list to a user.

7. A software defect prediction apparatus, comprising:

an obtaining unit, configured to acquire a feature vector of software to be predicted; and

a determining unit, configured to determine a software defect prediction result of the software to be predicted based on the feature vector and a pre-trained prediction model for predicting software defects, wherein the prediction model is obtained by training based on a gradient boosting algorithm and a random forest learning machine;

the determining unit is specifically configured to obtain the prediction model for predicting software defects as follows: acquiring a software defect prediction sample set, and dividing the acquired software defect prediction samples into a training set and a verification set according to a preset proportion; performing model training on a random forest learning machine based on the training set and a gradient boosting algorithm, and performing model verification on the trained random forest learning machine by using the verification set to obtain the prediction model for predicting software defects;

the determining unit is specifically configured to randomly divide the software defect prediction samples contained in the training set into a plurality of batches, each batch containing the same number of software defect prediction samples, and, for each batch contained in the training set, to perform the following process: performing a preset number of training iterations on any one decision tree in the random forest learning machine by using the software defect prediction samples contained in the batch and the gradient boosting algorithm, to obtain a training model for the batch;

the determining unit is specifically configured to: for the i-th iteration, determine the gradient value of the loss function with respect to the training model obtained in the (i-1)-th iteration; determine the leaf node regions contained in the decision tree according to the gradient values; determine, according to the leaf node regions, the gain of each leaf node at which the loss function value determined in the (i-1)-th iteration meets the preset condition; and determine the training model obtained in the i-th iteration by using the leaf node regions and the gain of each leaf node; wherein i is an integer from 1 to the preset number of times.

8. The apparatus of claim 7,

the determining unit is specifically configured to determine, according to the leaf node regions, the gains of the corresponding leaf nodes at which the loss function determined in the (i-1)-th iteration takes its minimum value.

9. The apparatus of claim 8,

the determining unit is specifically configured to perform the following operations for each of the training models obtained by training on the batches: verifying the training model by using all software defect prediction samples in the verification set to obtain a verification result for each software defect prediction sample; determining the prediction accuracy of the training model by using a mean square error function according to the actual result and the verification result of each software defect prediction sample; and comparing the prediction accuracies obtained by the training models, and determining the training model with the highest prediction accuracy as the prediction model for predicting software defects.

10. The apparatus of claim 7, wherein the sample characteristics of each sample in the training set comprise at least one of: the number of defects, the module to which a defect belongs, tester information, developer information, test duration, the number of test cases, and the number of requirements.

11. The apparatus of claim 7,

the determining unit is specifically configured to screen original data recorded during the development of software whose software defect results are known, according to the correlation between the original data and software defect prediction, so as to obtain the software defect prediction samples.

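12. The apparatus according to any one of claims 7 to 11, wherein the software defect prediction result comprises the number of defects and the modules to which the defects belong, and the apparatus further comprises:

a processing unit, configured to acquire an actual result of the software to be predicted; compare, for the software to be predicted, the number of defects contained in the obtained software defect prediction result with the actual number of defects contained in the actual result; and, if the predicted number of defects is greater than the actual number of defects, determine, from all modules to which the defects contained in the software defect prediction result belong, the modules that are inconsistent with the modules to which the actual defects contained in the actual result belong, store these modules in a list, and display the list to a user.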

13. A non-transitory computer storage medium storing computer-executable instructions for performing the method of any of claims 1 to 6.

14. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 6.