CN114428719A

CN114428719A - K-B-based software defect prediction method and device, electronic equipment and medium

Info

Publication number: CN114428719A
Application number: CN202011077301.6A
Authority: CN
Inventors: 王婷婷
Original assignee: China Petroleum and Chemical Corp; Sinopec Geophysical Research Institute
Current assignee: China Petroleum and Chemical Corp; Sinopec Geophysical Research Institute
Priority date: 2020-10-10
Filing date: 2020-10-10
Publication date: 2022-05-03

Abstract

The application discloses a K-B-based software defect prediction method and device, electronic equipment and a medium. The method can comprise the following steps: collecting software historical defect data, and dividing the software historical defect data into a training data set and a test data set; reducing the dimensions of the metric elements in the training data set to obtain a feature vector; carrying out Bayesian classification regression calculation training according to the dimension-reduced training data set and the feature vector; adjusting the dimension reduction parameters and Bayesian parameters to obtain an optimal model; and, according to the optimal model, reducing the dimensions of the metric elements in the test data set, performing Bayesian classification regression calculation, and predicting the defects of the test data set. The invention addresses the dimensionality problem of the metric elements, improves the accuracy of software defect prediction, provides a new feasible method for software defect prediction, can predict the number of defects of a subsequent software system, provides a reference index for making a software test plan, and allows manpower and time to be planned better.

Description

K-B-based software defect prediction method and device, electronic equipment and medium

Technical Field

The invention relates to the field of software testing and data mining, in particular to a K-B-based software defect prediction method, a K-B-based software defect prediction device, electronic equipment and a K-B-based software defect prediction medium.

Background

Software defect prediction technology has been developing since 1970. As software systems grow in scale and their logic becomes more complex, software defects tend to increase and affect software quality. Because software defect prediction helps testers understand the state and quality of the software and set delivery standards, it has become increasingly important.

At present, software defect prediction methods are divided into static prediction methods and dynamic prediction methods. As software goes through more iterations and more similar software becomes available, predicting the number, type and distribution of defects from historical development data and the number of defects already discovered has become feasible. Research indicates that three factors influence defect prediction: the selection of metric elements, the construction method of the defect prediction model, and the data set. That is, selecting suitable defect-related metric data (number of code lines, number of classes, number of methods, etc.), a suitable prediction model and a suitable data set can effectively improve the performance of defect prediction. The present study is based on the static defect prediction approach described above.

How to find defect-related data in a large amount of historical development data, namely the metric element selection problem, is the primary problem, and it belongs to the field of data mining. Currently, methods such as PCA, LDA, LLE and ICA are mainly used. KPCA (Kernel Principal Component Analysis) is a derivative of PCA. Traditional PCA cannot realize nonlinear projection: PCA dimensionality reduction attempts to find a low-dimensional linear subspace in which the data lie, but the data may be nonlinear. Kernel PCA combines kernel lifting with PCA dimension reduction: the original data are mapped to a high-dimensional space through a kernel function, and dimension reduction is then carried out with the PCA algorithm, so that the operation efficiency is improved.

As for static software defect prediction technology, the available methods include classification, regression and Bayesian methods, as well as neural-network-based methods such as CNN and DNN; this is the problem of prediction model selection. Training a model based on a complex neural network takes longer and therefore places higher demands on machine performance.

Therefore, it is necessary to develop a software defect prediction method, apparatus, electronic device and medium based on KPCA-Bayes.

The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.

Disclosure of Invention

The invention provides a K-B-based software defect prediction method, a K-B-based software defect prediction device, electronic equipment and a medium, which can address the dimensionality problem of the metric elements, improve the accuracy of software defect prediction, provide a new feasible method for software defect prediction, predict the number of defects of a subsequent software system, provide a reference index for making a software test plan, and allow manpower and time to be planned better.

In a first aspect, an embodiment of the present disclosure provides a software defect prediction method based on K-B, including:

collecting software historical defect data, and dividing the software historical defect data into a training data set and a test data set;

reducing dimensions of metric elements in the training data set to obtain a feature vector;

carrying out Bayesian classification regression calculation training according to the training data set subjected to dimension reduction and the feature vector;

adjusting the dimension reduction parameters and Bayesian parameters to obtain an optimal model;

and according to the optimal model, carrying out dimension reduction on the measurement elements in the test data set, carrying out Bayesian classification regression calculation, and predicting the defects of the test data set.
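The steps of the first aspect can be sketched end to end as follows. This is a minimal illustration rather than the patented implementation: it assumes scikit-learn, uses `KernelPCA` for the dimension reduction and `GaussianNB` as one concrete Bayesian classifier, and runs on randomly generated placeholder data with a binary label. The parameter adjustment step is left out here, and all variable names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Placeholder data (assumption): 300 software modules x 20 metric elements,
# with a binary "defective / not defective" label.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = (X[:, 0] + X[:, 1] ** 2 > 1.0).astype(int)

# Divide the historical data into a training set and a test set (7:3).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Reduce dimensions with kernel PCA, then train a Bayesian classifier
# on the reduced training data.
model = make_pipeline(
    KernelPCA(n_components=5, kernel="rbf"),
    GaussianNB(),
)
model.fit(X_train, y_train)

# The reduction fitted on the training set is reused on the test set
# before the Bayesian prediction is made.
y_pred = model.predict(X_test)
print(y_pred[:10])
```

Wrapping both stages in one pipeline ensures that the test data are projected with the reduction learned from the training data only.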

Preferably, performing dimension reduction on the metric elements in the training data set, and obtaining the feature vector includes:

calculating the Euclidean distance value between any two samples in the training data set to obtain a matrix;

performing aggregation processing on the matrix to obtain a symmetric kernel matrix;

and converting the symmetric kernel matrix into a central matrix to obtain the feature vector.

Preferably, converting the symmetric kernel matrix into a central matrix, and obtaining the feature vector includes:

and arranging the eigenvalues of the central matrix in descending order to obtain the eigenvectors corresponding to the first k eigenvalues.
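The sub-steps above follow the usual kernel PCA construction and can be written out as below. This is a sketch under assumptions: the text does not name the kernel used in the aggregation step, so a Gaussian (RBF) kernel with width parameter \gamma is assumed, and the symbols D, K, \tilde{K}, \alpha_j, \lambda_j and z_{ij} are introduced here only for illustration.

```latex
\begin{aligned}
D_{ij} &= \lVert x_i - x_j \rVert_2, \qquad i, j = 1, \dots, n \\
K_{ij} &= \exp\!\left(-\gamma\, D_{ij}^{2}\right)
  \quad \text{(assumed RBF kernel, so } K = K^{\top}\text{)} \\
\tilde{K} &= K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n,
  \qquad (\mathbf{1}_n)_{ij} = \tfrac{1}{n} \\
\tilde{K}\,\alpha_j &= \lambda_j\,\alpha_j,
  \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n, \quad \lVert \alpha_j \rVert_2 = 1 \\
z_{ij} &= \sqrt{\lambda_j}\,(\alpha_j)_i, \qquad j = 1, \dots, k
\end{aligned}
```

Here z_{ij} is the j-th reduced feature of training sample i, and only the k largest eigenvalues are retained.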

Preferably, the optimal model comprises an optimal dimension reduction parameter and an optimal bayesian parameter.

Preferably, according to the optimal model, performing dimension reduction on the metric elements in the test data set, and performing bayesian classification regression calculation, and predicting the defects of the test data set includes:

and performing dimension reduction on the metric elements in the test data set according to the optimal dimension reduction parameters, performing Bayesian classification regression calculation according to the optimal Bayesian parameters, and predicting the defects of the test data set.

Preferably, the method further comprises the following steps:

and comparing the predicted defects and the actual defects of the test data set, and evaluating the optimal model.

Preferably, the data ratio of the training data set to the test data set is 7:3.


in a second aspect, an embodiment of the present disclosure further provides a software defect prediction apparatus based on K-B, including:

the data set dividing module is used for collecting software historical defect data and dividing the software historical defect data into a training data set and a test data set;

the dimension reduction module is used for reducing dimensions of the measurement elements in the training data set to obtain a feature vector;

the training module is used for carrying out Bayesian classification regression calculation training according to the training data set subjected to dimension reduction and the feature vector;

the optimal model establishing module is used for adjusting the dimension reduction parameters and the Bayesian parameters to obtain an optimal model;

and the prediction module is used for reducing the dimension of the measurement element in the test data set according to the optimal model, performing Bayesian classification regression calculation and predicting the defects of the test data set.

Preferably, the method further comprises the following steps:

In a third aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:

a memory storing executable instructions;

a processor executing the executable instructions in the memory to implement the K-B based software defect prediction method.

In a fourth aspect, the disclosed embodiments also provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the K-B based software defect prediction method.

The method and apparatus of the present invention have other features and advantages which will be apparent from or are set forth in detail in the accompanying drawings and the following detailed description, which are incorporated herein, and which together serve to explain certain principles of the invention.

Drawings

The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts.

FIG. 1 shows a flow chart of the steps of a K-B based software defect prediction method according to one embodiment of the present invention.

FIG. 2 shows a schematic diagram of a comparison of a KPCA-Bayes based software defect prediction model and an SVM software defect prediction model according to one embodiment of the present invention.

FIG. 3 shows a block diagram of a K-B based software defect prediction apparatus according to an embodiment of the present invention.

Description of reference numerals:

201. a data set dividing module; 202. a dimension reduction module; 203. a training module; 204. an optimal model establishing module; 205. a prediction module.

Detailed Description

Preferred embodiments of the present invention will be described in more detail below. While the following describes preferred embodiments of the present invention, it should be understood that the present invention may be embodied in various forms and should not be limited by the embodiments set forth herein.

The invention provides a software defect prediction method based on K-B, which comprises the following steps:

collecting software historical defect data, and dividing the software historical defect data into a training data set and a test data set;

reducing dimensions of metric elements in the training data set to obtain a feature vector;

carrying out Bayesian classification regression calculation training according to the training data set and the feature vector after dimension reduction;

adjusting the dimension reduction parameters and Bayesian parameters to obtain an optimal model;

and according to the optimal model, performing dimension reduction on the metric elements in the test data set, performing Bayesian classification regression calculation, and predicting the defects of the test data set.

In one example, performing dimension reduction on the metric elements within the training data set, and obtaining the feature vector comprises:

calculating the Euclidean distance value between any two samples in the training data set to obtain a matrix;

performing aggregation processing on the matrix to obtain a symmetric kernel matrix;

and converting the symmetric kernel matrix into a central matrix to obtain the feature vector.

In one example, converting the symmetric kernel matrix into a central matrix, and obtaining the feature vector comprises:

arranging the eigenvalues of the central matrix in descending order to obtain the eigenvectors corresponding to the first k eigenvalues.

In one example, the optimal model includes optimal dimension reduction parameters and optimal Bayesian parameters.

In one example, performing dimension reduction on the metric elements in the test data set according to the optimal model, performing Bayesian classification regression calculation, and predicting the defects of the test data set comprises:

performing dimension reduction on the metric elements in the test data set according to the optimal dimension reduction parameters, and performing Bayesian classification regression calculation according to the optimal Bayesian parameters to predict the defects of the test data set.

In one example, the method further comprises:

comparing the predicted defects and the actual defects of the test data set, and evaluating the optimal model.

In one example, the data ratio of the training data set to the test data set is 7:3.

Specifically, historical data is collected for the software to be predicted, including wmc (weighted methods per class), dit (depth of inheritance tree), noc (number of direct subclasses of a class), cbo (coupling between objects), Ic (number of inheritance-coupled classes of a class), loc (number of lines of binary code of a class), etc. These data are referred to as metrics; one of them, the number of bugs, is used as the label.

Randomly selecting and dividing the collected software defect related data into a training data set and a testing data set, wherein the data ratio of the training data set to the testing data set is 7:3, the training data set is provided with a label y, and the testing data set does not contain the label.
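The random 7:3 division described above could look as follows. This is a minimal sketch assuming scikit-learn's `train_test_split`; the metric table is random placeholder data, the lowercase column names are illustrative, and the held-out labels `y_test` are kept aside only so that predictions can later be compared with them.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder metric table (assumption): 100 classes x 6 metrics.
rng = np.random.default_rng(42)
metric_names = ["wmc", "dit", "noc", "cbo", "ic", "loc"]
X = rng.integers(0, 50, size=(100, len(metric_names)))

# Label column: number of bugs per class (random placeholder values).
y = rng.integers(0, 4, size=100)

# Random 7:3 division; the model sees y_train during training,
# while x_test is passed to the model without its labels.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=42
)
print(x_train.shape, x_test.shape)
```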

k feature values are obtained using the KPCA (kernel principal component analysis) technique of data mining. First, the Euclidean distance between every pair of samples is calculated to obtain a matrix K. To make the matrix K more aggregated, aggregation processing is applied to it, yielding a symmetric kernel matrix. The symmetric kernel matrix is then converted into a centered kernel matrix, and its eigenvalues are arranged in descending order to obtain the eigenvectors corresponding to the first k eigenvalues. Because the metrics influence the number of bugs from different dimensions, the KPCA technique selects the few features with the greatest influence on the number of bugs as principal components, thereby realizing nonlinear dimension reduction.
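The KPCA procedure described above can be written out by hand in NumPy as a rough sketch. The aggregation step is assumed here to be a Gaussian (RBF) kernel applied to the squared distances, which the text does not confirm; the function name `kpca_fit_transform`, the parameter `gamma` and the random input data are hypothetical.

```python
import numpy as np

def kpca_fit_transform(X, k, gamma):
    """Project the rows of X onto the first k kernel principal components."""
    # 1) Pairwise squared Euclidean distances between all samples.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from rounding

    # 2) Symmetric kernel matrix (RBF kernel is an assumption).
    K = np.exp(-gamma * sq_dists)

    # 3) Center the kernel matrix in the implicit feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # 4) eigh returns eigenvalues in ascending order, so reverse them
    #    and keep the eigenvectors of the k largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    eigvals = eigvals[::-1][:k]
    eigvecs = eigvecs[:, ::-1][:, :k]

    # Reduced features: unit eigenvectors scaled by sqrt(eigenvalue).
    Z = eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
    return Z, eigvals

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))  # placeholder: 50 samples x 20 metrics
Z, top_eigvals = kpca_fit_transform(X, k=5, gamma=1.0 / X.shape[1])
print(Z.shape)
```

This version only transforms the samples it was fitted on; projecting new test samples additionally requires the kernel between test and training samples, centered with the training statistics.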

The feature-processed training set data x_train and the bug-number vector are input into a classification regression algorithm with Bayes as the prediction model, and the conditional probabilities of the different independent features are calculated. The relevant parameters of KPCA and Bayes, including the k value and kernel function type of KPCA and the prior probabilities (priors) of Bayes, are then adjusted: various combinations of k value, kernel function type, prior probability values, etc. are tried through parameter tuning, and the optimal model is determined according to the final results.
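One way to try combinations of the k value, kernel type and priors is a plain loop over candidate settings scored by cross-validation, sketched below. The candidate values, the use of 3-fold cross-validated accuracy as the selection criterion, the placeholder data and `GaussianNB` as the Bayesian model are all assumptions for illustration, not values from the source.

```python
from itertools import product

import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Placeholder training data (assumption): 150 samples x 20 metrics.
rng = np.random.default_rng(1)
x_train = rng.normal(size=(150, 20))
y_train = (x_train[:, 0] * x_train[:, 1] > 0).astype(int)

# Candidate settings (illustrative only).
k_values = [2, 5, 10]
kernels = ["linear", "rbf", "poly"]
prior_options = [None, [0.5, 0.5], [0.7, 0.3]]

best_score, best_params = -np.inf, None
for k, kernel, priors in product(k_values, kernels, prior_options):
    candidate = make_pipeline(
        KernelPCA(n_components=k, kernel=kernel),
        GaussianNB(priors=priors),
    )
    # Mean 3-fold cross-validated accuracy, computed on training data only.
    score = cross_val_score(candidate, x_train, y_train, cv=3).mean()
    if score > best_score:
        best_score = score
        best_params = {"k": k, "kernel": kernel, "priors": priors}

print(best_params, round(best_score, 3))
```

Scoring on folds of the training set keeps the test set untouched until the final evaluation.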

The dimension-reduced test data x_test is input into the optimal model, the conditional probabilities are compared and the maximum is taken, which completes the classification of the test samples and yields the predicted number of bugs y_pred.
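Taking the class with the largest probability can be illustrated on a toy example. The sketch assumes scikit-learn's `GaussianNB`, whose `predict_proba` returns one posterior probability per class; the two-dimensional points stand in for already reduced features and are made up for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy reduced features (assumption): two well-separated groups,
# labelled with bug counts 0 and 2.
x_train = np.array([[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
                    [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]])
y_train = np.array([0, 0, 0, 2, 2, 2])
x_test = np.array([[0.1, 0.0], [3.1, 3.0]])

model = GaussianNB().fit(x_train, y_train)

# One posterior probability per class and test sample.
posteriors = model.predict_proba(x_test)

# The predicted bug count is the class with the largest posterior.
y_pred = model.classes_[np.argmax(posteriors, axis=1)]
print(y_pred)  # -> [0 2]
```

The explicit `argmax` gives the same result as calling `model.predict(x_test)` directly.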

From y_pred and y_test, accuracy_score (the number of samples the model classifies correctly divided by the total number of samples), precision_score, recall_score and f_score (calculated from precision and recall) are computed in order to analyze the accuracy of the algorithm.
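The four scores can be computed with the corresponding functions in scikit-learn's `sklearn.metrics`, as in the following small worked example. The label vectors are invented and binary for simplicity; with multi-class bug counts, an `average` argument such as `"macro"` would have to be chosen, which the source does not specify.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Invented binary labels: 1 = defective, 0 = not defective.
y_test = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]
# Confusion counts for this example: TP = 3, FP = 1, FN = 1, TN = 5.

accuracy = accuracy_score(y_test, y_pred)    # (TP + TN) / total = 8 / 10
precision = precision_score(y_test, y_pred)  # TP / (TP + FP) = 3 / 4
recall = recall_score(y_test, y_pred)        # TP / (TP + FN) = 3 / 4
f_score = f1_score(y_test, y_pred)           # 2PR / (P + R) = 0.75

print(accuracy, precision, recall, f_score)
```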

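The invention also provides a software defect prediction device based on K-B, which comprises:

the data set dividing module is used for collecting software historical defect data and dividing the software historical defect data into a training data set and a test data set;

the dimension reduction module is used for reducing dimensions of the metric elements in the training data set to obtain a feature vector;

the training module is used for carrying out Bayesian classification regression calculation training according to the training data set and the feature vector after dimension reduction;

the optimal model establishing module is used for adjusting the dimension reduction parameters and the Bayesian parameters to obtain an optimal model;

and the prediction module is used for reducing the dimensions of the metric elements in the test data set according to the optimal model, performing Bayesian classification regression calculation and predicting the defects of the test data set.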

In one example, further comprising:

The present invention also provides an electronic device, comprising: a memory storing executable instructions; and the processor executes executable instructions in the memory to realize the K-B-based software defect prediction method.

The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the K-B based software defect prediction method described above.

To facilitate understanding of the scheme of the embodiments of the present invention and its effects, four specific application examples are given below. It will be understood by those skilled in the art that these examples are merely for the purpose of facilitating an understanding of the present invention and that any specific details thereof are not intended to limit the invention in any way.

Example 1

As shown in FIG. 1, the software defect prediction method based on K-B includes: step 101, collecting software historical defect data, and dividing the software historical defect data into a training data set and a test data set; step 102, reducing dimensions of the metric elements in the training data set to obtain a feature vector; step 103, carrying out Bayesian classification regression calculation training according to the training data set and the feature vector after dimension reduction; step 104, adjusting the dimension reduction parameters and Bayesian parameters to obtain an optimal model; and step 105, according to the optimal model, reducing dimensions of the metric elements in the test data set, performing Bayesian classification regression calculation, and predicting the defects of the test data set.

Historical data is collected for the software to be predicted, including wmc (weighted methods per class), dit (depth of inheritance tree), noc (number of direct subclasses of a class), cbo (coupling between objects), Ic (number of inheritance-coupled classes of a class), loc (number of lines of binary code of a class), etc. These data are called metrics. The invention collects 21 metrics, one of which, the number of bugs, is used as the label.

The feature-processed training set data x_train and the bug-number vector are input into a classification regression algorithm with Bayes as the prediction model, and the conditional probabilities of the different independent features are calculated. The relevant parameters of KPCA and Bayes, including the k value and kernel function type of KPCA and the prior probabilities (priors) of Bayes, are adjusted to obtain an optimal model.

FIG. 2 shows a schematic comparison of a KPCA-Bayes based software defect prediction model to an SVM software defect prediction model with the vertical axis representing a score, according to one embodiment of the invention.

FIG. 2 shows that the model score of the present invention is 0.874 and the SVM score is 0.787; the F-scores are 0.534 and 0.293, the precision scores are 0.573 and 0.261, and the recall scores are 0.530 and 0.293, respectively. Therefore, the KPCA-Bayes based software defect prediction model has better accuracy and better avoids false negative and false positive predictions.

Example 2

As shown in fig. 3, the K-B based software defect prediction apparatus includes:

the data set dividing module 201 is used for collecting software historical defect data and dividing the software historical defect data into a training data set and a test data set;

a dimension reduction module 202, which performs dimension reduction on the metric elements in the training data set to obtain feature vectors;

the training module 203 performs Bayesian classification regression calculation training according to the training data set and the feature vector after dimension reduction;

an optimal model establishing module 204 for adjusting the dimension reduction parameters and the Bayesian parameters to obtain an optimal model;

and the prediction module 205 performs dimension reduction on the measurement elements in the test data set according to the optimal model, performs Bayesian classification regression calculation, and predicts the defects of the test data set.

As an alternative, performing dimension reduction on the metric elements in the training data set, and obtaining the feature vector includes:

As an alternative, converting the symmetric kernel matrix into the central matrix, and obtaining the feature vector includes:

As an alternative, the optimal model includes an optimal dimension reduction parameter and an optimal bayesian parameter.

As an alternative, according to the optimal model, performing dimension reduction on the metric elements in the test data set, and performing bayesian classification regression calculation, and predicting the defects of the test data set includes:

As an alternative, the method further comprises the following steps:

As an alternative, the data ratio of the training data set to the test data set is 7:3.

Example 3

The present disclosure provides an electronic device including: a memory storing executable instructions; and the processor runs the executable instructions in the memory to realize the software defect prediction method based on the K-B.

An electronic device according to an embodiment of the present disclosure includes a memory and a processor.

The memory is to store non-transitory computer readable instructions. In particular, the memory may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc.

The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions. In one embodiment of the disclosure, the processor is configured to execute the computer readable instructions stored in the memory.

Those skilled in the art should understand that, in order to solve the technical problem of how to obtain a good user experience, the present embodiment may also include well-known structures such as a communication bus, an interface, and the like, and these well-known structures should also be included in the protection scope of the present disclosure.

For the detailed description of the present embodiment, reference may be made to the corresponding descriptions in the foregoing embodiments, which are not repeated herein.

Example 4

The disclosed embodiments provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the K-B based software defect prediction method.

A computer-readable storage medium according to an embodiment of the present disclosure has non-transitory computer-readable instructions stored thereon. The non-transitory computer readable instructions, when executed by a processor, perform all or a portion of the steps of the methods of the embodiments of the disclosure previously described.

The computer-readable storage media include, but are not limited to: optical storage media (e.g., CD-ROMs and DVDs), magneto-optical storage media (e.g., MOs), magnetic storage media (e.g., magnetic tapes or removable disks), media with built-in rewritable non-volatile memory (e.g., memory cards), and media with built-in ROMs (e.g., ROM cartridges).

It will be appreciated by persons skilled in the art that the above description of embodiments of the invention is intended only to illustrate the benefits of embodiments of the invention and is not intended to limit embodiments of the invention to any examples given.

Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Claims

1. A software defect prediction method based on K-B is characterized by comprising the following steps:

2. The K-B based software bug prediction method of claim 1, wherein dimensionality reduction is performed on the metric elements within the training data set, and obtaining a feature vector comprises:

3. The K-B based software bug prediction method of claim 2, wherein transforming the symmetric kernel matrix into a center matrix, obtaining feature vectors comprises:

4. The K-B based software bug prediction method of claim 1, wherein the optimal model comprises optimal dimension reduction parameters and optimal bayesian parameters.

5. The K-B based software bug prediction method of claim 4, wherein, according to the optimal model, dimension reduction is performed on metric elements in the test data set, and Bayesian classification regression calculation is performed, and predicting bugs in the test data set comprises:

6. The K-B based software bug prediction method of claim 1, further comprising:

7. The K-B based software defect prediction method of claim 1, wherein the data ratio of the training data set to the test data set is 7:3.

8. A software defect prediction device based on K-B is characterized by comprising:

9. An electronic device, characterized in that the electronic device comprises:

a memory storing executable instructions;

a processor executing the executable instructions in the memory to implement the K-B based software defect prediction method of any of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the K-B based software defect prediction method of any one of claims 1-7.