CN114428719A - K-B-based software defect prediction method and device, electronic equipment and medium


Info

Publication number
CN114428719A
Authority
CN
China
Prior art keywords
data set
dimension reduction
test data
software
training
Legal status
Pending
Application number
CN202011077301.6A
Other languages
Chinese (zh)
Inventor
王婷婷 (Wang Tingting)
Current Assignee
China Petroleum and Chemical Corp
Sinopec Geophysical Research Institute
Original Assignee
China Petroleum and Chemical Corp
Sinopec Geophysical Research Institute
Application filed by China Petroleum and Chemical Corp and Sinopec Geophysical Research Institute

Classifications

    • G06F11/3608 Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24155 Bayesian classification

Abstract

The application discloses a K-B-based software defect prediction method and device, electronic equipment and a medium. The method can comprise the following steps: collecting software historical defect data and dividing it into a training data set and a test data set; reducing the dimensionality of the metrics in the training data set to obtain feature vectors; performing Bayesian classification regression training according to the dimension-reduced training data set and the feature vectors; adjusting the dimension reduction parameters and the Bayesian parameters to obtain an optimal model; and, according to the optimal model, reducing the dimensionality of the metrics in the test data set, performing the Bayesian classification regression calculation, and predicting the defects of the test data set. The invention addresses the dimensionality problem of the metrics and improves the accuracy of software defect prediction, providing a new, feasible approach to software defect prediction. It can predict the number of defects in a subsequent software system, which provides a reference for drawing up software test plans and for better planning of manpower and time.

Description

K-B-based software defect prediction method and device, electronic equipment and medium
Technical Field
The invention relates to the fields of software testing and data mining, and in particular to a K-B-based software defect prediction method and device, electronic equipment and a medium.
Background
Software defect prediction technology has been developing since 1970. As software systems grow in scale and their logic becomes more complex, the number of software defects tends to increase and software quality suffers. Because defect prediction helps testers understand the state and quality of the software and set delivery criteria, it has become increasingly important.
At present, software defect prediction methods are divided into static and dynamic prediction methods. As software goes through more iterations and more similar software accumulates, it becomes feasible to predict the number, type and distribution of defects from historical development data and the number of defects already found. Research indicates that three factors influence defect prediction: the selection of metrics, the method used to construct the prediction model, and the data set. That is, by using defect-related metric data (number of lines of code, number of classes, number of methods, etc.), choosing a suitable prediction model and choosing a suitable data set, the performance of defect prediction can be effectively improved. The present study is based on the static defect prediction approach described above.
The primary problem is how to find defect-related data in a large amount of historical development data, i.e. the metric selection problem, which belongs to the field of data mining. Currently, methods such as PCA, LDA, LLE and ICA are mainly used. KPCA (Kernel Principal Component Analysis) is a derivative of PCA. Traditional PCA cannot perform a nonlinear projection: PCA dimensionality reduction tries to find a low-dimensional linear subspace in which the data lie, but the data may be nonlinear. Kernel PCA combines kernel lifting with PCA dimensionality reduction: the original data are mapped into a high-dimensional space through a kernel function, and dimensionality reduction is then performed with the PCA algorithm, which improves operational efficiency.
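The following example illustrates the kernel idea and does not come from the patent. A kernel function returns the inner product of two samples in a higher-dimensional feature space without constructing that space. For the degree-2 homogeneous polynomial kernel k(x, y) = (x·y)² on 2-D inputs, the explicit feature map is φ(x) = (x1², √2·x1·x2, x2²). The sketch below checks that both routes give the same number.

```python
import math

def poly2_kernel(x, y):
    """Degree-2 homogeneous polynomial kernel: k(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2

def phi(x):
    """Explicit 3-D feature map of a 2-D point whose inner product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, 4.0)
via_kernel = poly2_kernel(x, y)        # computed in the original 2-D space
via_mapping = dot(phi(x), phi(y))      # computed after explicit lifting to 3-D
# both give 121 (up to floating-point rounding)
```

KPCA relies on this property. It only ever needs the kernel matrix and never φ itself, which is what makes kernels with very high or infinite-dimensional feature spaces, such as the Gaussian (RBF) kernel, usable.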
For static software defect prediction, the available methods include classification, regression and Bayesian methods, as well as neural-network-based methods such as CNN and DNN; choosing among them is the prediction model selection problem. Models based on complex neural networks take longer to train and therefore place higher demands on machine performance.
Therefore, it is necessary to develop a KPCA-Bayes (K-B) based software defect prediction method and apparatus, electronic device and medium.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The invention provides a K-B-based software defect prediction method and device, electronic equipment and a medium. These address the dimensionality problem of the metrics, improve the accuracy of software defect prediction and provide a new, feasible approach to software defect prediction. They can predict the number of defects in a subsequent software system, thereby providing a reference for formulating software test plans and for better planning of manpower and time.
In a first aspect, an embodiment of the present disclosure provides a software defect prediction method based on K-B, including:
collecting software historical defect data, and dividing the software historical defect data into a training data set and a test data set;
reducing the dimensionality of the metrics in the training data set to obtain feature vectors;
performing Bayesian classification regression training according to the dimension-reduced training data set and the feature vectors;
adjusting the dimension reduction parameters and the Bayesian parameters to obtain an optimal model; and
reducing the dimensionality of the metrics in the test data set according to the optimal model, performing the Bayesian classification regression calculation, and predicting the defects of the test data set.
Preferably, reducing the dimensionality of the metrics in the training data set to obtain the feature vectors includes:
calculating the Euclidean distance between every two samples in the training data set to obtain a matrix;
performing aggregation processing on the matrix to obtain a symmetric kernel matrix; and
converting the symmetric kernel matrix into a centered kernel matrix to obtain the feature vectors.
Preferably, converting the symmetric kernel matrix into a centered kernel matrix to obtain the feature vectors includes:
sorting the eigenvalues of the centered kernel matrix in descending order and taking the eigenvectors corresponding to the first k eigenvalues.
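For reference, the standard kernel PCA formulation of the centering and eigen-decomposition steps is given below. The patent does not state its formulas, so the notation is assumed. K is the n×n kernel matrix of the training samples, and \mathbf{1}_n is the n×n matrix whose entries all equal 1/n. Each α_i is a unit-norm eigenvector of the centered matrix, and z_i(x_j) is the i-th reduced feature of training sample x_j.

```latex
\begin{aligned}
\tilde{K} &= K - \mathbf{1}_n K - K\,\mathbf{1}_n + \mathbf{1}_n K\,\mathbf{1}_n, \\
\tilde{K}\,\alpha_i &= \lambda_i\,\alpha_i, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n, \\
z_i(x_j) &= \frac{1}{\sqrt{\lambda_i}} \sum_{l=1}^{n} \alpha_{i,l}\,\tilde{K}_{lj} = \sqrt{\lambda_i}\,\alpha_{i,j}, \qquad i = 1, \dots, k.
\end{aligned}
```

The division by √λ_i normalizes each principal axis to unit length in feature space. Keeping only i = 1, ..., k is the "first k eigenvalues" selection.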
Preferably, the optimal model comprises optimal dimension reduction parameters and optimal Bayesian parameters.
Preferably, reducing the dimensionality of the metrics in the test data set according to the optimal model, performing the Bayesian classification regression calculation and predicting the defects of the test data set includes:
reducing the dimensionality of the metrics in the test data set according to the optimal dimension reduction parameters, performing the Bayesian classification regression calculation according to the optimal Bayesian parameters, and predicting the defects of the test data set.
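The patent names conditional probabilities of independent features and priors, but not the feature distribution. Assuming the Bayesian classifier is Gaussian naive Bayes, a test sample has dimension-reduced features z = (z_1, ..., z_k). Its prediction is the class c with the largest posterior:

```latex
\hat{y} = \arg\max_{c} \Big[ \log P(c) + \sum_{j=1}^{k} \log \mathcal{N}\big(z_j;\ \mu_{c,j},\ \sigma^2_{c,j}\big) \Big]
```

Here P(c) are the Bayesian priors, and μ_{c,j} and σ²_{c,j} are the mean and variance of feature j over the training samples of class c. Each class is one observed bug count.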
Preferably, the method further comprises:
comparing the predicted defects with the actual defects of the test data set to evaluate the optimal model.
Preferably, the ratio of the training data set to the test data set is 7:3.
In a second aspect, an embodiment of the present disclosure further provides a K-B-based software defect prediction apparatus, including:
a data set dividing module, configured to collect software historical defect data and divide it into a training data set and a test data set;
a dimension reduction module, configured to reduce the dimensionality of the metrics in the training data set to obtain feature vectors;
a training module, configured to perform Bayesian classification regression training according to the dimension-reduced training data set and the feature vectors;
an optimal model establishing module, configured to adjust the dimension reduction parameters and the Bayesian parameters to obtain an optimal model; and
a prediction module, configured to reduce the dimensionality of the metrics in the test data set according to the optimal model, perform the Bayesian classification regression calculation, and predict the defects of the test data set.
Preferably, reducing the dimensionality of the metrics in the training data set to obtain the feature vectors includes:
calculating the Euclidean distance between every two samples in the training data set to obtain a matrix;
performing aggregation processing on the matrix to obtain a symmetric kernel matrix; and
converting the symmetric kernel matrix into a centered kernel matrix to obtain the feature vectors.
Preferably, converting the symmetric kernel matrix into a centered kernel matrix to obtain the feature vectors includes:
sorting the eigenvalues of the centered kernel matrix in descending order and taking the eigenvectors corresponding to the first k eigenvalues.
Preferably, the optimal model comprises optimal dimension reduction parameters and optimal Bayesian parameters.
Preferably, reducing the dimensionality of the metrics in the test data set according to the optimal model, performing the Bayesian classification regression calculation and predicting the defects of the test data set includes:
reducing the dimensionality of the metrics in the test data set according to the optimal dimension reduction parameters, performing the Bayesian classification regression calculation according to the optimal Bayesian parameters, and predicting the defects of the test data set.
Preferably, the apparatus is further configured to perform:
comparing the predicted defects with the actual defects of the test data set to evaluate the optimal model.
Preferably, the ratio of the training data set to the test data set is 7:3.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:
a memory storing executable instructions; and
a processor that executes the executable instructions in the memory to implement the K-B-based software defect prediction method.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the K-B-based software defect prediction method.
The method and apparatus of the present invention have other features and advantages which will be apparent from or are set forth in detail in the accompanying drawings and the following detailed description, which are incorporated herein, and which together serve to explain certain principles of the invention.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts.
FIG. 1 shows a flow chart of the steps of a K-B based software defect prediction method according to one embodiment of the present invention.
FIG. 2 shows a schematic diagram of a comparison of a KPCA-Bayes based software defect prediction model and an SVM software defect prediction model according to one embodiment of the present invention.
FIG. 3 shows a block diagram of a K-B based software defect prediction apparatus according to an embodiment of the present invention.
Description of reference numerals:
201. data set dividing module; 202. dimension reduction module; 203. training module; 204. optimal model establishing module; 205. prediction module.
Detailed Description
Preferred embodiments of the present invention will be described in more detail below. While the following describes preferred embodiments of the present invention, it should be understood that the present invention may be embodied in various forms and should not be limited by the embodiments set forth herein.
The invention provides a software defect prediction method based on K-B, which comprises the following steps:
collecting software historical defect data, and dividing the software historical defect data into a training data set and a test data set;
reducing the dimensionality of the metrics in the training data set to obtain feature vectors;
performing Bayesian classification regression training according to the dimension-reduced training data set and the feature vectors;
adjusting the dimension reduction parameters and the Bayesian parameters to obtain an optimal model; and
reducing the dimensionality of the metrics in the test data set according to the optimal model, performing the Bayesian classification regression calculation, and predicting the defects of the test data set.
In one example, reducing the dimensionality of the metrics in the training data set to obtain the feature vectors comprises:
calculating the Euclidean distance between every two samples in the training data set to obtain a matrix;
performing aggregation processing on the matrix to obtain a symmetric kernel matrix; and
converting the symmetric kernel matrix into a centered kernel matrix to obtain the feature vectors.
In one example, converting the symmetric kernel matrix into a centered kernel matrix to obtain the feature vectors comprises:
sorting the eigenvalues of the centered kernel matrix in descending order and taking the eigenvectors corresponding to the first k eigenvalues.
In one example, the optimal model includes optimal dimension reduction parameters and optimal Bayesian parameters.
In one example, reducing the dimensionality of the metrics in the test data set according to the optimal model, performing the Bayesian classification regression calculation and predicting the defects of the test data set comprises:
reducing the dimensionality of the metrics in the test data set according to the optimal dimension reduction parameters, and performing the Bayesian classification regression calculation according to the optimal Bayesian parameters to predict the defects of the test data set.
In one example, the method further comprises:
comparing the predicted defects with the actual defects of the test data set to evaluate the optimal model.
In one example, the ratio of the training data set to the test data set is 7:3.
Specifically, historical data are collected for the software whose defects are to be predicted. The data include wmc (weighted methods per class), dit (depth of inheritance tree), noc (number of children, i.e. direct subclasses of a class), cbo (coupling between object classes), ic (number of inheritance-coupled classes of a class), loc (number of lines of binary code of a class) and so on. These data are referred to as metrics; one of them, the number of bugs, is used as the label.
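The sketch below shows one way to hold a row of the collected history, with the bug count separated out as the label. It includes only six of the metrics named above. The class and function names (`ClassMetrics`, `to_xy`) are hypothetical, and the sample values are invented for illustration.

```python
from dataclasses import dataclass, fields

@dataclass
class ClassMetrics:
    """One row of historical data for a single class (hypothetical subset of the metrics)."""
    wmc: float  # weighted methods per class
    dit: float  # depth of inheritance tree
    noc: float  # number of children (direct subclasses)
    cbo: float  # coupling between object classes
    ic: float   # number of inheritance-coupled classes
    loc: float  # lines of binary code of the class
    bug: int    # number of bugs: used as the label, not as a feature

# Every field except the label is treated as a feature, in declaration order.
FEATURES = [f.name for f in fields(ClassMetrics) if f.name != "bug"]

def to_xy(rows):
    """Split records into a feature matrix X (list of lists) and a label vector y."""
    X = [[getattr(r, name) for name in FEATURES] for r in rows]
    y = [r.bug for r in rows]
    return X, y

# Invented example rows, for illustration only.
rows = [ClassMetrics(11, 4, 0, 13, 1, 293, 1), ClassMetrics(3, 1, 0, 2, 0, 40, 0)]
X, y = to_xy(rows)
```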
The collected defect-related data are randomly divided into a training data set and a test data set at a ratio of 7:3. The training data set carries the label y. The test data set is input to the model without its label, which is kept aside as y_test for evaluation.
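The following is a minimal sketch of the 7:3 random split. It assumes the data are already in a feature matrix X and label vector y. The function name `split_7_3` and the fixed seed are illustrative choices. scikit-learn's `train_test_split(X, y, test_size=0.3)` performs the same kind of split.

```python
import numpy as np

def split_7_3(X, y, seed=0):
    """Randomly split samples into 70 % training data and 30 % test data."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # shuffled sample indices
    n_train = int(round(0.7 * len(X)))
    train, test = idx[:n_train], idx[n_train:]
    # y_test is returned for later scoring only; the model never trains on it.
    return X[train], X[test], y[train], y[test]

# Toy data: row i is [2i, 2i + 1] and its label is i % 2.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)
x_train, x_test, y_train, y_test = split_7_3(X, y)
```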
The first k eigenvectors are obtained using KPCA (kernel principal component analysis), a data-mining technique. First, the Euclidean distance between every two samples is calculated to obtain a matrix K. To make the kernel matrix K more aggregated, it undergoes aggregation processing, which yields a symmetric kernel matrix. The symmetric kernel matrix is then converted into a centered kernel matrix, its eigenvalues are sorted in descending order, and the eigenvectors corresponding to the first k eigenvalues are taken. Because the metrics influence the number of bugs along different dimensions, KPCA selects the several features with the greatest influence on the bug count as principal components, thereby achieving nonlinear dimensionality reduction.
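The KPCA steps above can be sketched with NumPy as follows. The patent does not name its kernel, so the "aggregation" step is assumed here to be a Gaussian (RBF) kernel, K = exp(-γ·d²), applied to the squared Euclidean distances. `gamma` and the function name are also assumptions. The sketch covers the training set only. Projecting test samples also requires centering their kernel values with the training statistics; scikit-learn's `sklearn.decomposition.KernelPCA` (parameters `n_components`, `kernel`, `gamma`) handles both sets.

```python
import numpy as np

def kpca_fit_transform(X, k, gamma=1.0):
    """Kernel PCA on the training set with an (assumed) RBF kernel; returns n x k features."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # 1. Squared Euclidean distance between every pair of samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # 2. Turn the distances into a symmetric kernel matrix (RBF assumption).
    K = np.exp(-gamma * d2)
    # 3. Center the kernel matrix in feature space.
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # 4. Eigen-decompose; eigh returns ascending eigenvalues, so reverse to descending.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:k]
    lambdas, alphas = eigvals[order], eigvecs[:, order]
    # 5. Projection of training sample j on component i is sqrt(lambda_i) * alpha_ij.
    return alphas * np.sqrt(np.maximum(lambdas, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))           # 20 samples, 4 metrics (random stand-in data)
Z = kpca_fit_transform(X, k=3, gamma=0.5)
```

The columns of `Z` are ordered by decreasing eigenvalue, so the first column carries the most variance in feature space.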
The feature-processed training data x_train and the bug-count vector are input into a Bayesian classification regression algorithm that serves as the prediction model, which calculates the conditional probabilities of the different independent features. The relevant parameters of KPCA and Bayes are then adjusted, including the k value and kernel function type of KPCA and the prior probabilities (priors) of Bayes. Different combinations of k value, kernel function type, prior probabilities and so on are tried through parameter tuning, and the optimal model is determined from the final results.
The dimension-reduced test data x_test are input into the optimal model. The conditional probabilities are compared and the maximum is taken, which completes the classification of the test samples and gives the predicted bug number y_pred.
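The patent does not say which Bayesian model it uses. Its references to conditional probabilities of independent features and to `priors` are consistent with a Gaussian naive Bayes classifier, so the sketch below assumes one. It is loosely modeled on scikit-learn's `GaussianNB`, which takes `priors` and `var_smoothing` parameters. Each bug count is treated as a class label. Training estimates a per-class mean and variance for every feature, and prediction takes the class with the largest posterior, matching the "take the maximum" step above.

```python
import numpy as np

class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes with optional user-supplied class priors (a sketch)."""

    def __init__(self, priors=None, var_smoothing=1e-9):
        self.priors = priors
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)  # e.g. the distinct bug counts
        # Per-class mean and variance of each feature (features assumed independent).
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        eps = self.var_smoothing * X.var(axis=0).max()
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + eps
        if self.priors is None:
            # Default prior: relative class frequency in the training data.
            self.prior_ = np.array([np.mean(y == c) for c in self.classes_])
        else:
            # User priors, one per class in sorted class order.
            self.prior_ = np.asarray(self.priors, dtype=float)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Sum over features of log N(x_j; mean_cj, var_cj), for every sample/class pair.
        log_lik = -0.5 * (
            np.log(2 * np.pi * self.var_)[None, :, :]
            + (X[:, None, :] - self.mean_[None, :, :]) ** 2 / self.var_[None, :, :]
        ).sum(axis=2)
        log_post = np.log(self.prior_)[None, :] + log_lik
        # The class with the largest posterior is the predicted label.
        return self.classes_[np.argmax(log_post, axis=1)]

X = np.array([[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
model = GaussianNaiveBayes().fit(X, y)
y_pred = model.predict([[0.1, 0.0], [5.1, 5.0]])
```

The computation uses log probabilities rather than a product of densities to avoid numerical underflow; the argmax is the same.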
From y_pred and y_test, the following are computed to analyze the accuracy of the algorithm: accuracy_score (the number of samples the model classifies correctly divided by the total number of samples), precision, recall_score and f_score (calculated from precision and recall).
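The following sketches the evaluation step in plain Python. Because bug counts form more than two classes, precision, recall and F1 must be averaged across classes. The patent does not say how, so macro averaging (an unweighted mean over classes) is assumed. This corresponds to `average="macro"` in scikit-learn's `precision_score`, `recall_score` and `f1_score`. A macro-averaged F1 is the mean of the per-class F1 values, so it need not equal the harmonic mean of macro precision and macro recall.

```python
def classification_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (averaging method assumed)."""
    y_true, y_pred = list(y_true), list(y_pred)
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # c predicted as something else
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    m = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / m,
        "recall": sum(recalls) / m,
        "f1": sum(f1s) / m,
    }

y_test = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0]
scores = classification_scores(y_test, y_pred)
```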
The invention also provides a software defect prediction device based on K-B, which comprises:
a data set dividing module, configured to collect software historical defect data and divide it into a training data set and a test data set;
a dimension reduction module, configured to reduce the dimensionality of the metrics in the training data set to obtain feature vectors;
a training module, configured to perform Bayesian classification regression training according to the dimension-reduced training data set and the feature vectors;
an optimal model establishing module, configured to adjust the dimension reduction parameters and the Bayesian parameters to obtain an optimal model; and
a prediction module, configured to reduce the dimensionality of the metrics in the test data set according to the optimal model, perform the Bayesian classification regression calculation, and predict the defects of the test data set.
In one example, reducing the dimensionality of the metrics in the training data set to obtain the feature vectors comprises:
calculating the Euclidean distance between every two samples in the training data set to obtain a matrix;
performing aggregation processing on the matrix to obtain a symmetric kernel matrix; and
converting the symmetric kernel matrix into a centered kernel matrix to obtain the feature vectors.
In one example, converting the symmetric kernel matrix into a centered kernel matrix to obtain the feature vectors comprises:
sorting the eigenvalues of the centered kernel matrix in descending order and taking the eigenvectors corresponding to the first k eigenvalues.
In one example, the optimal model includes optimal dimension reduction parameters and optimal Bayesian parameters.
In one example, reducing the dimensionality of the metrics in the test data set according to the optimal model, performing the Bayesian classification regression calculation and predicting the defects of the test data set comprises:
reducing the dimensionality of the metrics in the test data set according to the optimal dimension reduction parameters, and performing the Bayesian classification regression calculation according to the optimal Bayesian parameters to predict the defects of the test data set.
In one example, the apparatus is further configured to perform:
comparing the predicted defects with the actual defects of the test data set to evaluate the optimal model.
In one example, the ratio of the training data set to the test data set is 7:3.
Specifically, historical data are collected for the software whose defects are to be predicted. The data include wmc (weighted methods per class), dit (depth of inheritance tree), noc (number of children, i.e. direct subclasses of a class), cbo (coupling between object classes), ic (number of inheritance-coupled classes of a class), loc (number of lines of binary code of a class) and so on. These data are referred to as metrics; one of them, the number of bugs, is used as the label.
The collected defect-related data are randomly divided into a training data set and a test data set at a ratio of 7:3. The training data set carries the label y. The test data set is input to the model without its label, which is kept aside as y_test for evaluation.
The first k eigenvectors are obtained using KPCA (kernel principal component analysis), a data-mining technique. First, the Euclidean distance between every two samples is calculated to obtain a matrix K. To make the kernel matrix K more aggregated, it undergoes aggregation processing, which yields a symmetric kernel matrix. The symmetric kernel matrix is then converted into a centered kernel matrix, its eigenvalues are sorted in descending order, and the eigenvectors corresponding to the first k eigenvalues are taken. Because the metrics influence the number of bugs along different dimensions, KPCA selects the several features with the greatest influence on the bug count as principal components, thereby achieving nonlinear dimensionality reduction.
The feature-processed training data x_train and the bug-count vector are input into a Bayesian classification regression algorithm that serves as the prediction model, which calculates the conditional probabilities of the different independent features. The relevant parameters of KPCA and Bayes are then adjusted, including the k value and kernel function type of KPCA and the prior probabilities (priors) of Bayes. Different combinations of k value, kernel function type, prior probabilities and so on are tried through parameter tuning, and the optimal model is determined from the final results.
The dimension-reduced test data x_test are input into the optimal model. The conditional probabilities are compared and the maximum is taken, which completes the classification of the test samples and gives the predicted bug number y_pred.
From y_pred and y_test, the following are computed to analyze the accuracy of the algorithm: accuracy_score (the number of samples the model classifies correctly divided by the total number of samples), precision, recall_score and f_score (calculated from precision and recall).
The present invention also provides an electronic device, comprising: a memory storing executable instructions; and a processor that executes the executable instructions in the memory to implement the K-B-based software defect prediction method.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the K-B based software defect prediction method described above.
To facilitate understanding of the scheme of the embodiments of the present invention and its effects, four specific application examples are given below. Those skilled in the art will understand that these examples are merely intended to facilitate understanding of the present invention, and none of their specific details is intended to limit the invention in any way.
Example 1
FIG. 1 shows a flow chart of the steps of a K-B based software defect prediction method according to one embodiment of the present invention.
As shown in FIG. 1, the K-B-based software defect prediction method includes: step 101, collecting software historical defect data and dividing it into a training data set and a test data set; step 102, reducing the dimensionality of the metrics in the training data set to obtain feature vectors; step 103, performing Bayesian classification regression training according to the dimension-reduced training data set and the feature vectors; step 104, adjusting the dimension reduction parameters and the Bayesian parameters to obtain an optimal model; and step 105, reducing the dimensionality of the metrics in the test data set according to the optimal model, performing the Bayesian classification regression calculation, and predicting the defects of the test data set.
Historical data are collected for the software whose defects are to be predicted. The data include wmc (weighted methods per class), dit (depth of inheritance tree), noc (number of children, i.e. direct subclasses of a class), cbo (coupling between object classes), ic (number of inheritance-coupled classes of a class), loc (number of lines of binary code of a class) and so on. These data are referred to as metrics. The invention collects 21 metrics, one of which, the number of bugs, is used as the label.
The collected defect-related data are randomly divided into a training data set and a test data set at a ratio of 7:3. The training data set carries the label y. The test data set is input to the model without its label, which is kept aside as y_test for evaluation.
The first k eigenvectors are obtained using KPCA (kernel principal component analysis), a data-mining technique. First, the Euclidean distance between every two samples is calculated to obtain a matrix K. To make the kernel matrix K more aggregated, it undergoes aggregation processing, which yields a symmetric kernel matrix. The symmetric kernel matrix is then converted into a centered kernel matrix, its eigenvalues are sorted in descending order, and the eigenvectors corresponding to the first k eigenvalues are taken. Because the metrics influence the number of bugs along different dimensions, KPCA selects the several features with the greatest influence on the bug count as principal components, thereby achieving nonlinear dimensionality reduction.
The feature-processed training data x_train and the bug-count vector are input into a Bayesian classification regression algorithm that serves as the prediction model, which calculates the conditional probabilities of the different independent features. The relevant parameters of KPCA and Bayes, including the k value and kernel function type of KPCA and the prior probabilities (priors) of Bayes, are adjusted to obtain the optimal model.
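The parameter adjustment can be sketched as an exhaustive grid search. The patent does not say how the "final result" of each combination is measured, so this sketch assumes accuracy on a hold-out part of the training data. The candidate kernels (`rbf`, `poly`), the fixed `gamma` and `degree`, and all function names are likewise assumptions. In scikit-learn, the same search could be expressed with `make_pipeline(KernelPCA(), GaussianNB())` and `GridSearchCV` over `kernelpca__n_components`, `kernelpca__kernel` and `gaussiannb__priors`.

```python
import itertools
import numpy as np

def kernel(A, B, kind, gamma=0.5, degree=2):
    """Kernel matrix between rows of A and B; 'rbf' and 'poly' are assumed options."""
    if kind == "rbf":
        d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
        return np.exp(-gamma * np.maximum(d2, 0.0))
    if kind == "poly":
        return (gamma * A @ B.T + 1.0) ** degree
    raise ValueError(f"unknown kernel: {kind}")

def kpca(X_tr, X_te, k, kind):
    """Fit KPCA on X_tr and project both X_tr and X_te onto the top-k components."""
    n, m = len(X_tr), len(X_te)
    K = kernel(X_tr, X_tr, kind)
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    vals, vecs = np.linalg.eigh(Kc)
    top = np.argsort(vals)[::-1][:k]                        # descending eigenvalues
    alphas = vecs[:, top] / np.sqrt(np.maximum(vals[top], 1e-12))
    # Center the test kernel with the training statistics before projecting.
    Kt = kernel(X_te, X_tr, kind)
    one_mn = np.full((m, n), 1.0 / n)
    Ktc = Kt - one_mn @ K - Kt @ one_n + one_mn @ K @ one_n
    return Kc @ alphas, Ktc @ alphas

def gnb_predict(Z_tr, y_tr, Z_te, priors=None):
    """Gaussian naive Bayes: argmax of log prior plus summed per-feature log-likelihoods."""
    classes = np.unique(y_tr)
    mu = np.array([Z_tr[y_tr == c].mean(0) for c in classes])
    var = np.array([Z_tr[y_tr == c].var(0) for c in classes]) + 1e-9
    # priors, if given, must list one probability per class in sorted class order
    prior = np.array([np.mean(y_tr == c) for c in classes]) if priors is None else np.asarray(priors)
    ll = -0.5 * (np.log(2 * np.pi * var)[None] + (Z_te[:, None] - mu[None]) ** 2 / var[None]).sum(2)
    return classes[np.argmax(np.log(prior)[None] + ll, axis=1)]

def grid_search(X, y, ks, kinds, priors_list, seed=0):
    """Try every (k, kernel, priors) combination on a hold-out split of the training data."""
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = int(0.7 * len(X))
    tr, va = idx[:cut], idx[cut:]
    best = None
    for k, kind, priors in itertools.product(ks, kinds, priors_list):
        Z_tr, Z_va = kpca(X[tr], X[va], k, kind)
        acc = np.mean(gnb_predict(Z_tr, y[tr], Z_va, priors) == y[va])
        if best is None or acc > best[0]:
            best = (acc, {"k": k, "kernel": kind, "priors": priors})
    return best

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
acc, params = grid_search(X, y, ks=[1, 2], kinds=["rbf", "poly"], priors_list=[None, [0.5, 0.5]])
```

The winning `params` would then be reused to reduce and classify the real test data.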
The dimension-reduced test data x_test are input into the optimal model. The conditional probabilities are compared and the maximum is taken, which completes the classification of the test samples and gives the predicted bug number y_pred.
From y_pred and y_test, the following are computed to analyze the accuracy of the algorithm: accuracy_score (the number of samples the model classifies correctly divided by the total number of samples), precision, recall_score and f_score (calculated from precision and recall).
FIG. 2 shows a schematic comparison of a KPCA-Bayes based software defect prediction model to an SVM software defect prediction model with the vertical axis representing a score, according to one embodiment of the invention.
As FIG. 2 shows, the score of the model of the present invention is 0.874, versus 0.787 for the SVM model. The F-scores are 0.534 and 0.293, the precision scores 0.573 and 0.261, and the recall scores 0.530 and 0.293, respectively. The KPCA-Bayes based software defect prediction model is therefore more accurate and better avoids false-negative and false-positive predictions.
Example 2
FIG. 3 shows a block diagram of a K-B based software defect prediction apparatus according to an embodiment of the present invention.
As shown in fig. 3, the K-B based software defect prediction apparatus includes:
the data set dividing module 201, configured to collect software historical defect data and divide it into a training data set and a test data set;
the dimension reduction module 202, configured to reduce the dimensionality of the metrics in the training data set to obtain feature vectors;
the training module 203, configured to perform Bayesian classification regression training according to the dimension-reduced training data set and the feature vectors;
the optimal model establishing module 204, configured to adjust the dimension reduction parameters and the Bayesian parameters to obtain an optimal model; and
the prediction module 205, configured to reduce the dimensionality of the metrics in the test data set according to the optimal model, perform the Bayesian classification regression calculation, and predict the defects of the test data set.
Optionally, performing dimension reduction on the metric elements in the training data set to obtain the feature vectors includes:
calculating the Euclidean distance value between any two samples in the training data set to obtain a matrix;
performing aggregation processing on the matrix to obtain a symmetric kernel matrix;
and converting the symmetric kernel matrix into a centered matrix to obtain the feature vectors.
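The three steps above can be sketched with NumPy. The source does not define the "aggregation processing" that turns distances into a kernel matrix. This sketch assumes a Gaussian (RBF) kernel, `exp(-gamma * d^2)`; both that choice and the `gamma` default are assumptions. Centering uses the standard kernel-PCA formula `Kc = K - 1n K - K 1n + 1n K 1n`, where `1n` is the n-by-n matrix whose entries are all `1/n`.

```python
import numpy as np

def centered_rbf_kernel(X, gamma=0.1):
    """Distance matrix -> symmetric kernel matrix -> centered kernel matrix.

    Assumption: a Gaussian (RBF) kernel stands in for the unspecified
    "aggregation processing" step; gamma is a hypothetical default.
    """
    X = np.asarray(X, dtype=float)
    # Step 1: squared Euclidean distance between every pair of samples (n x n).
    sq_norms = np.sum(X ** 2, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    D2 = np.maximum(D2, 0.0)  # clip tiny negative values caused by rounding
    # Step 2: map distances to a symmetric kernel matrix.
    K = np.exp(-gamma * D2)
    # Step 3: center the kernel matrix in feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    return K, Kc
```

After centering, every row and column of `Kc` sums to zero (up to rounding). This property is a convenient sanity check.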
Optionally, converting the symmetric kernel matrix into the centered matrix to obtain the feature vectors includes:
and sorting the eigenvalues of the centered matrix in descending order to obtain the eigenvectors corresponding to the first k eigenvalues.
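The eigen-decomposition step can be sketched as follows. `numpy.linalg.eigh` returns the eigenvalues of a symmetric matrix in ascending order, so they are reversed here to obtain the descending order described above. Dividing each eigenvector by the square root of its eigenvalue is a common KPCA normalization. It is an assumption here, since the source does not state how the eigenvectors are scaled. The function name is hypothetical.

```python
import numpy as np

def top_k_components(Kc, k):
    """Leading k eigenvalues/eigenvectors of a centered kernel matrix and the
    resulting k-dimensional projections of the training samples."""
    eigvals, eigvecs = np.linalg.eigh(Kc)   # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest eigenvalues
    lambdas = eigvals[order]
    # Assumed normalization (requires the k eigenvalues to be positive).
    alphas = eigvecs[:, order] / np.sqrt(lambdas)
    Z_train = Kc @ alphas                   # dimension-reduced training data
    return lambdas, alphas, Z_train
```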
Optionally, the optimal model includes optimal dimension reduction parameters and optimal Bayesian parameters.
Optionally, performing dimension reduction on the metric elements in the test data set according to the optimal model, performing Bayesian classification regression calculation, and predicting the defects of the test data set includes:
and performing dimension reduction on the metric elements in the test data set according to the optimal dimension reduction parameters, and performing Bayesian classification regression calculation according to the optimal Bayesian parameters to predict the defects of the test data set.
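In kernel PCA, applying the optimal dimension reduction parameters to the test set usually means three things. Test samples are compared against the training samples, not against each other. The resulting kernel is centered with training-set statistics. It is then multiplied by the eigenvector coefficients found during training. The following sketch shows that procedure. It assumes an RBF kernel and coefficient matrix `alphas` from the training stage; the source does not spell this step out, and all names are hypothetical.

```python
import numpy as np

def rbf(A, B, gamma):
    """Gaussian kernel between the rows of A and the rows of B (assumed kernel type)."""
    d2 = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kpca_transform(X_test, X_train, alphas, gamma):
    """Project test samples using parameters fixed on the training set."""
    X_test = np.asarray(X_test, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    K_train = rbf(X_train, X_train, gamma)
    K_test = rbf(X_test, X_train, gamma)            # shape (m, n)
    col_mean = K_train.mean(axis=0)                 # training column means
    row_mean = K_test.mean(axis=1, keepdims=True)   # mean over training samples per test row
    # Center with training statistics: K_test - 1m K - K_test 1n + 1m K 1n.
    K_test_c = K_test - col_mean[None, :] - row_mean + K_train.mean()
    return K_test_c @ alphas                        # dimension-reduced x_test
```

Projecting the training samples themselves with this function gives the same result as multiplying the centered training kernel by `alphas`. This checks that training and test data share the same feature space.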
Optionally, the method further includes:
and comparing the predicted defects with the actual defects of the test data set to evaluate the optimal model.
Optionally, the data ratio of the training data set to the test data set is 7:3.
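A 7:3 split can be sketched with the standard library. The fixed `seed` makes the split reproducible. Both the shuffling strategy and the helper name are assumptions, because the source only states the ratio. The return order (x_train, x_test, y_train, y_test) follows the variable names used in this description.

```python
import random

def split_train_test(samples, labels, train_ratio=0.7, seed=42):
    """Shuffle defect records and split them into training and test sets."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)  # fixed seed -> reproducible split
    cut = round(len(idx) * train_ratio)
    train, test = idx[:cut], idx[cut:]
    return ([samples[i] for i in train], [samples[i] for i in test],
            [labels[i] for i in train], [labels[i] for i in test])
```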
Example 3
The present disclosure provides an electronic device including:
a memory storing executable instructions;
and a processor that executes the executable instructions in the memory to implement the K-B based software defect prediction method.
An electronic device according to an embodiment of the present disclosure includes a memory and a processor.
The memory is to store non-transitory computer readable instructions. In particular, the memory may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc.
The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions. In one embodiment of the disclosure, the processor is configured to execute the computer readable instructions stored in the memory.
Those skilled in the art should understand that, in order to solve the technical problem of how to obtain a good user experience, the present embodiment may also include well-known structures such as a communication bus, an interface, and the like, and these well-known structures should also be included in the protection scope of the present disclosure.
For the detailed description of the present embodiment, reference may be made to the corresponding descriptions in the foregoing embodiments, which are not repeated herein.
Example 4
The disclosed embodiments provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the K-B based software defect prediction method.
A computer-readable storage medium according to an embodiment of the present disclosure has non-transitory computer-readable instructions stored thereon. The non-transitory computer readable instructions, when executed by a processor, perform all or a portion of the steps of the methods of the embodiments of the disclosure previously described.
The computer-readable storage media include, but are not limited to: optical storage media (e.g., CD-ROMs and DVDs), magneto-optical storage media (e.g., MOs), magnetic storage media (e.g., magnetic tapes or removable disks), media with built-in rewritable non-volatile memory (e.g., memory cards), and media with built-in ROMs (e.g., ROM cartridges).
It will be appreciated by persons skilled in the art that the above description of embodiments of the invention is intended only to illustrate the benefits of embodiments of the invention and is not intended to limit embodiments of the invention to any examples given.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Claims (10)

1. A software defect prediction method based on K-B is characterized by comprising the following steps:
collecting software historical defect data, and dividing the software historical defect data into a training data set and a test data set;
reducing dimensions of metric elements in the training data set to obtain a feature vector;
carrying out Bayesian classification regression calculation training according to the training data set subjected to dimension reduction and the feature vector;
adjusting the dimension reduction parameters and Bayesian parameters to obtain an optimal model;
and according to the optimal model, carrying out dimension reduction on the metric elements in the test data set, carrying out Bayesian classification regression calculation, and predicting the defects of the test data set.
2. The K-B based software defect prediction method of claim 1, wherein reducing dimensions of the metric elements in the training data set to obtain a feature vector comprises:
calculating the Euclidean distance value between any two samples in the training data set to obtain a matrix;
performing aggregation processing on the matrix to obtain a symmetric kernel matrix;
and converting the symmetric kernel matrix into a centered matrix to obtain a feature vector.
3. The K-B based software defect prediction method of claim 2, wherein converting the symmetric kernel matrix into a centered matrix to obtain the feature vector comprises:
and sorting the eigenvalues of the centered matrix in descending order to obtain the eigenvectors corresponding to the first k eigenvalues.
4. The K-B based software defect prediction method of claim 1, wherein the optimal model comprises optimal dimension reduction parameters and optimal Bayesian parameters.
5. The K-B based software defect prediction method of claim 4, wherein performing dimension reduction on the metric elements in the test data set according to the optimal model, performing Bayesian classification regression calculation, and predicting the defects of the test data set comprises:
and performing dimension reduction on the metric elements in the test data set according to the optimal dimension reduction parameters, performing Bayesian classification regression calculation according to the optimal Bayesian parameters, and predicting the defects of the test data set.
6. The K-B based software defect prediction method of claim 1, further comprising:
and comparing the predicted defects with the actual defects of the test data set to evaluate the optimal model.
7. The K-B based software defect prediction method of claim 1, wherein the data ratio of the training data set to the test data set is 7:3.
8. A software defect prediction device based on K-B is characterized by comprising:
the data set dividing module is used for collecting software historical defect data and dividing the software historical defect data into a training data set and a test data set;
the dimension reduction module is used for reducing dimensions of the metric elements in the training data set to obtain a feature vector;
the training module is used for carrying out Bayesian classification regression calculation training according to the training data set subjected to dimension reduction and the feature vector;
the optimal model establishing module is used for adjusting the dimension reduction parameters and the Bayesian parameters to obtain an optimal model;
and the prediction module is used for performing the following steps:
reducing the dimension of the metric elements in the test data set according to the optimal model;
performing Bayesian classification regression calculation;
predicting the defects of the test data set.
9. An electronic device, characterized in that the electronic device comprises:
a memory storing executable instructions;
a processor executing the executable instructions in the memory to implement the K-B based software defect prediction method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the K-B based software defect prediction method of any one of claims 1-7.
CN202011077301.6A 2020-10-10 2020-10-10 K-B-based software defect prediction method and device, electronic equipment and medium Pending CN114428719A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011077301.6A CN114428719A (en) 2020-10-10 2020-10-10 K-B-based software defect prediction method and device, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN114428719A (en) 2022-05-03

Family

ID=81309707

Country Status (1)

Country Link
CN (1) CN114428719A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115599698A (en) * 2022-11-30 2023-01-13 Beihang University (CN) Software defect prediction method and system based on class association rule

Similar Documents

Publication Publication Date Title
US10649882B2 (en) Automated log analysis and problem solving using intelligent operation and deep learning
CN111274134A (en) Vulnerability identification and prediction method and system based on graph neural network, computer equipment and storage medium
CN109948735B (en) Multi-label classification method, system, device and storage medium
US11269760B2 (en) Systems and methods for automated testing using artificial intelligence techniques
CN113128671B (en) Service demand dynamic prediction method and system based on multi-mode machine learning
CN111309577B (en) Spark-oriented batch application execution time prediction model construction method
US20230385597A1 (en) Multi-granularity perception integrated learning method, device, computer equipment and medium
CN110795736B (en) Malicious android software detection method based on SVM decision tree
CN114781532A (en) Evaluation method and device of machine learning model, computer equipment and medium
CN111292377A (en) Target detection method, target detection device, computer equipment and storage medium
CN111325344A (en) Method and apparatus for evaluating model interpretation tools
US20220164669A1 (en) Automatic machine learning policy network for parametric binary neural networks
CN114428719A (en) K-B-based software defect prediction method and device, electronic equipment and medium
US11593673B2 (en) Systems and methods for identifying influential training data points
CN114428720A (en) Software defect prediction method and device based on P-K, electronic equipment and medium
CN111026661B (en) Comprehensive testing method and system for software usability
KR20210158740A (en) Apparatus and method for clustering validation based on machine learning performance
CN113743594A (en) Network flow prediction model establishing method and device, electronic equipment and storage medium
JP7067634B2 (en) Robust learning device, robust learning method and robust learning program
Zhang et al. Hardware-aware one-shot neural architecture search in coordinate ascent framework
CN113760407A (en) Information processing method, device, equipment and storage medium
US7720771B1 (en) Method of dividing past computing instances into predictable and unpredictable sets and method of predicting computing value
CN112732549A (en) Test program classification method based on cluster analysis
TWI755774B (en) Loss function optimization system, method and the computer-readable record medium
US11687799B1 (en) Integrated machine learning and rules platform for improved accuracy and root cause analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination