CN113837266B - Software defect prediction method based on feature extraction and Stacking ensemble learning - Google Patents

Publication number
CN113837266B
Authority
CN
China
Prior art keywords
data set
defect data
defect
model
historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111106611.0A
Other languages
Chinese (zh)
Other versions
CN113837266A (en)
Inventor
崔梦天
吴克奇
李卫榜
王琳
姜玥
罗洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Minzu University
Original Assignee
Southwest Minzu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Minzu University
Priority to CN202111106611.0A
Publication of CN113837266A
Application granted
Publication of CN113837266B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a software defect prediction method based on feature extraction and Stacking ensemble learning, comprising the following steps: (1) performing feature extraction on the original data set using kernel principal component analysis to obtain a dimension-reduced defect data set DS'; (2) using the collaborative filtering algorithm provided by the invention to recommend a suitable sampling method for new software defect data, and applying the recommended sampling algorithm to the defect data set DS' to handle class imbalance, obtaining a balanced defect data set DS'; (3) clustering the defect data set DS' with the K-Means algorithm and removing outliers that deviate from the mainstream clusters to obtain a defect data set DS'; (4) constructing a software defect prediction model based on Stacking ensemble learning by selecting suitable classifiers for the first-layer base learners and the second-layer meta-learner; (5) comparing the ensemble model with the base models and the mainstream ensemble models on the processed defect data set DS' to verify the performance of the proposed ensemble prediction model. The experimental results show that the proposed KSSDP ensemble prediction model outperforms the base models and the mainstream ensemble models.

Description

Software defect prediction method based on feature extraction and Stacking ensemble learning
Technical Field
The invention relates to the field of software defects, in particular to a software defect prediction method based on feature extraction and Stacking ensemble learning.
Background
Open source software is one of the main trends in the future development of the software industry, and ensuring its quality has long been a central concern. Because open source software is open and shared within communities, its source code often contains many bugs, which greatly increases the cost of defect handling and hinders the adoption of open source software. It is therefore of practical importance to identify and control the factors that introduce defects early in software development, to put effective defect prevention measures in place, to reduce the defect introduction rate, and to ensure software quality. Current mainstream defect prediction techniques locate defective modules using classical machine learning classification algorithms and improved variants of them, and have the following main limitations: (1) Most defect data sets are high-dimensional and contain redundant features. Existing models reduce dimensionality by feature selection, which discards many of the original data features and harms subsequent defect prediction, for example through lower accuracy and a low F-Measure. (2) A sampling method for a software defect data set is currently chosen mostly by hand, based on expert experience and the average performance of the sampling methods, so the choice is inefficient and depends too heavily on expert experience. (3) Software defects are currently predicted mostly with a single prediction model. Because the characteristics of defect data are complex and variable, a single prediction model has inherent limitations and may predict poorly when the data characteristics are complex.
Disclosure of Invention
Technical problem to be solved
To overcome the shortcomings of existing defect prediction methods, the invention provides a software defect prediction method based on feature extraction and Stacking ensemble learning that addresses the problems in the prior art described above.
Technical scheme
A software defect prediction method based on feature extraction and Stacking ensemble learning is characterized by comprising the following steps:
Step 1: Extract features from the original data set. Kernel principal component analysis (KPCA) is applied to the original defect data set DS to reduce its feature dimension to 10, giving the dimension-reduced defect data set DS';
Step 2: Handle class imbalance with the collaborative filtering sampling recommendation method for software defect data provided by the invention. First, rank the sampling methods: the user selects a classification algorithm according to the characteristics of the defect data; the historical defect data are sampled with the mainstream sampling methods; and, with accuracy as the metric, the selected classification algorithm is used to rank the mainstream sampling methods on the historical defect data, giving a performance ranking of the sampling methods. Next, mine data similarity: when the new defect data and the historical defect data belong to the same project, compute the Jaccard similarity coefficient between them and use it as their similarity score; when they belong to different projects, perform feature extraction on both, normalize them, compute the Euclidean distance between them, and use the reciprocal of the Euclidean distance as their similarity score. Finally, make a user-based recommendation: combine the sampling method ranking with the data similarity, use the collaborative filtering algorithm to recommend a sampling method suited to the new software defect data, and apply the recommended sampling algorithm to the defect data set DS' to obtain the balanced defect data set DS';
Step 3: Detect and remove outliers in the defect data set DS'. The defect data set DS' is clustered with the K-Means algorithm, and outliers that deviate from the mainstream clusters are removed to obtain a defect data set DS';
Step 4: Construct a software defect prediction model based on Stacking ensemble learning. Suitable classifiers are selected for the first-layer base learners and the second-layer meta-learner to build a well-performing software defect prediction model (KSSDP);
Step 5: Verify the performance of the KSSDP ensemble prediction model by comparing it with the base models and the mainstream ensemble models on the processed defect data set DS'.
Advantageous effects
The invention provides a software defect prediction method (KSSDP) based on feature extraction and Stacking ensemble learning. Kernel principal component analysis is used to extract features from the defect data set and reduce the correlation among data features. A collaborative filtering sampling recommendation method for software defect data is used to address the class imbalance of the defect data set: the method first computes, under the classification algorithm selected by the user, the prediction accuracy on the training set after processing by each mainstream sampling method, and ranks the sampling methods by that accuracy; it then computes the similarity between the new defect data set and each historical defect data set, using either the Jaccard similarity coefficient or the reciprocal of the Euclidean distance between them; finally, it derives a recommendation score from the ranking score and the similarity value and recommends a suitable sampling method to the user according to that score. The K-Means algorithm then clusters the defect data set according to the numbers of positive and negative samples in the balanced data set, in order to find and remove outliers. A software defect prediction model is built with Stacking ensemble learning, and simulation experiments on several NASA defect data sets show that the model outperforms the base models and the mainstream ensemble models. Consequently, recommending a sampling method for a new data set requires no manual intervention, so a suitable sampling method is selected automatically for the new defect data set. The proposed method also performs well on the false alarm rate and F-Measure metrics and generalizes better than the base models and the mainstream ensemble models.
Drawings
FIG. 1 is a flow diagram of a KSSDP integrated prediction model
FIG. 2 is a flowchart of a collaborative filtering sampling recommendation method for software defect data
FIG. 3 is a diagram of a recommended network architecture containing 3 sets of historical data and 4 sampling methods
FIGS. 4 and 5 are graphs comparing the false alarm rate (Pf) and F-Measure against the base models
FIGS. 6 and 7 are graphs comparing the false alarm rate (Pf) and F-Measure against the optimal mainstream ensemble model
Detailed Description
The invention will now be further described with reference to the following examples and drawings:
The invention provides a software defect prediction method (KSSDP) based on feature extraction and Stacking ensemble learning. The flow chart of the KSSDP ensemble prediction model is shown in FIG. 1, and the technical scheme adopted to solve the technical problem comprises the following:
1. Feature extraction on the raw data set
A nonlinear mapping kernel function maps the original data points from the low-dimensional feature space to a high-dimensional feature space, from which representative features are extracted to characterize the complex structure of the defect data. The core principle is as follows:
Let $x$ be mapped to $u$ by a corresponding function $\rho$, defined as follows:
$$u = \rho(x) \qquad (1)$$
The kernel function maps the data to a corresponding N-dimensional feature space, and the data in the mapped feature space satisfy the following condition:
[Formula (2) is an image in the original publication and is not reproduced in this text.]
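As a minimal sketch of the feature extraction step, the following code projects a data matrix onto its 10 leading kernel principal components. The RBF kernel, the `gamma` default, the function name `rbf_kernel_pca`, and the synthetic data are assumptions made for illustration; the text does not state which kernel or parameters are used. The kernel matrix is centred in the way that is standard for kernel PCA.

```python
import numpy as np

def rbf_kernel_pca(X, n_components=10, gamma=None):
    """Project the rows of X onto the leading kernel principal components."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    if gamma is None:
        gamma = 1.0 / n_features  # assumed default bandwidth, not from the source
    # Pairwise squared Euclidean distances, then the RBF kernel matrix.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    K = np.exp(-gamma * np.maximum(sq_dists, 0.0))
    # Centre the mapped data in feature space without computing rho(x) explicitly.
    one_n = np.full((n_samples, n_samples), 1.0 / n_samples)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # The projection of training sample i on component c is sqrt(lambda_c) * v_c[i].
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))

rng = np.random.default_rng(0)
DS = rng.normal(size=(50, 21))  # synthetic stand-in: 50 modules, 21 metrics
DS_reduced = rbf_kernel_pca(DS, n_components=10)
print(DS_reduced.shape)
```

Working on the kernel matrix rather than on the mapped vectors is what lets the method capture nonlinear structure without ever forming the high-dimensional representation.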
2. Collaborative filtering sampling recommendation method for software defect data
The flow chart of the collaborative filtering sampling recommendation method for software defect data is shown in FIG. 2. The method trains on the historical data sets using ten-fold cross-validation. Let the historical data sets be $OD = \{OD_1, OD_2, …, OD_m\}$. Each data set $OD_i$ is divided into ten parts; each part in turn serves as the test set test, and the remaining nine parts serve as the training set train. Any sampling method $T_j$ from the set of mainstream sampling methods $T = \{T_1, T_2, …, T_n\}$ is applied to the training set train to handle class imbalance, giving a balanced training set BTrain. The user selects a suitable classification algorithm CA from the classification algorithm library $\{CA_1, CA_2, …, CA_p\}$, and CA learns on the balanced training set BTrain to obtain a predictor P. The predictor P is evaluated on the test set test to obtain the corresponding performance metric value accuracy, from which the sampling method ranking score RankScore[i][j] is computed for each historical data set and each sampling method. The invention accumulates RankScore[i][j] using the following formula:
$$\mathrm{RankScore}[i][j] = \mathrm{RankScore}[i][j] + \mathrm{accuracy} \qquad (3)$$
After ten iterations, each using a different part as the test set, the cumulative sum of the performance metric value accuracy for data set $OD_i$ under sampling method $T_j$ is stored in RankScore[i][j]. The invention then averages this cumulative sum using the following formula:
$$\mathrm{RankScore}[i][j] = \mathrm{RankScore}[i][j] / 10 \qquad (4)$$
Finally, the averaged RankScore[i][j] is used as the basis for ranking the sampling methods.
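The accumulation and averaging in formulas (3) and (4) can be sketched as follows. The helper names (`rank_scores`, `no_sampling`, `random_oversample`, `majority_class_accuracy`) and the toy data are hypothetical; the two samplers and the trivial classifier only stand in for the mainstream sampling methods and the user-selected classification algorithm, which the text does not name.

```python
import random

def rank_scores(datasets, samplers, train_and_score, n_folds=10, seed=0):
    """Return RankScore[i][j]: mean accuracy of sampler j on historical data set i."""
    rank = [[0.0] * len(samplers) for _ in datasets]
    for i, data in enumerate(datasets):
        rows = list(data)
        random.Random(seed).shuffle(rows)
        folds = [rows[f::n_folds] for f in range(n_folds)]
        for j, sampler in enumerate(samplers):
            for f in range(n_folds):
                test = folds[f]
                train = [r for g, fold in enumerate(folds) if g != f for r in fold]
                balanced = sampler(train)                   # BTrain
                accuracy = train_and_score(balanced, test)  # predictor P on test
                rank[i][j] += accuracy                      # formula (3)
            rank[i][j] /= n_folds                           # formula (4)
    return rank

def no_sampling(train):
    return list(train)

def random_oversample(train, seed=0):
    """Duplicate minority-class rows until both classes have the same size."""
    pos = [r for r in train if r[1] == 1]
    neg = [r for r in train if r[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(train)
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

def majority_class_accuracy(train, test):
    """Toy classifier: always predict the most frequent training label."""
    ones = sum(label for _, label in train)
    predicted = 1 if ones * 2 > len(train) else 0
    return sum(1 for _, label in test if label == predicted) / len(test)

# Toy historical data set: 50 rows of (features, label), 10 of them defective.
toy = [((k,), 1 if k % 5 == 0 else 0) for k in range(50)]
scores = rank_scores([toy], [no_sampling, random_oversample], majority_class_accuracy)
print(scores)
```

Passing the sampler and the classifier in as functions keeps the ranking loop independent of any particular sampling or learning library.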
The invention computes the similarity between the new defect data set $ND$ and the historical defect data sets $OD = \{OD_1, OD_2, …, OD_m\}$. When the new defect data and the historical defect data belong to the same project, the number of features in the intersection and the number in the union of the two data sets' feature sets are counted, and the quotient of the two is used as the similarity score SimiScore between the new defect data set and the historical defect data set. For each historical defect data set $OD_i$, SimiScore is computed using the following formula, in which the intersection and the union are taken over the feature sets of the two data sets:
$$\mathrm{SimiScore}[i] = \frac{|ND \cap OD_i|}{|ND \cup OD_i|} \qquad (5)$$
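For the same-project case, the quotient of the intersection count and the union count can be sketched as follows. The function name and the feature names are illustrative placeholders, not taken from the source.

```python
def jaccard_similarity(features_new, features_old):
    """Jaccard coefficient between the feature sets of two defect data sets."""
    a, b = set(features_new), set(features_old)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty feature sets
    return len(a & b) / len(union)

# Hypothetical metric names for a new and a historical data set.
nd_features = ["loc", "v(g)", "ev(g)", "branch_count"]
od_features = ["loc", "v(g)", "n", "branch_count", "lOCode"]
score = jaccard_similarity(nd_features, od_features)
print(score)  # 3 shared features out of 6 distinct ones -> 0.5
```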
When the new defect data and the historical defect data belong to different projects, kernel principal component analysis is used to perform feature extraction on the new defect data set $ND$ and the historical defect data sets $OD$, reducing both to 10 dimensions. The invention normalizes each feature $x_k$ ($k = 1, 2, …, 10$) of $ND$ and $OD$ using the following formula:
[Formula (6) is an image in the original publication and is not reproduced in this text.]
Denote the normalized features of the new defect data set $ND$ by $y_k$ and the normalized features of the historical defect data set $OD_i$ by $z_k$. The invention computes the Euclidean distance between $ND$ and $OD_i$ using the following formula:
$$d(ND, OD_i) = \sqrt{\sum_{k=1}^{10} (y_k - z_k)^2} \qquad (7)$$
To keep the similarity within the range 0 to 1, the invention computes the similarity score SimiScore for each historical defect data set $OD_i$ using the following formula:
[Formula (8) is an image in the original publication and is not reproduced in this text.]
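The cross-project case can be sketched as follows. Because the normalization and similarity formulas are not reproduced in this text, two choices here are assumptions: min-max scaling for the normalization, and the mapping 1 / (1 + d), which is one common way to turn a distance into a score that stays between 0 and 1. Summarizing each data set as a single feature vector is also an assumption, and the short 3-dimensional vectors stand in for the 10-dimensional ones.

```python
import math

def min_max_normalize(vectors):
    """Scale each feature column to [0, 1] (assumed normalisation)."""
    columns = list(zip(*vectors))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]  # avoid division by zero
    return [[(v - lo) / s for v, lo, s in zip(vec, lows, spans)] for vec in vectors]

def euclidean_distance(y, z):
    """Square root of the sum of squared per-dimension differences."""
    return math.sqrt(sum((yk - zk) ** 2 for yk, zk in zip(y, z)))

def similarity_from_distance(d):
    """Assumed mapping 1 / (1 + d): equals 1 at d = 0 and decays towards 0."""
    return 1.0 / (1.0 + d)

# One feature vector per data set (assumption): ND first, then two historical sets.
nd = [2.0, 4.0, 6.0]
historical = [[2.0, 4.0, 6.0], [4.0, 8.0, 0.0]]
normalized = min_max_normalize([nd] + historical)
y, zs = normalized[0], normalized[1:]
sims = [similarity_from_distance(euclidean_distance(y, z)) for z in zs]
print(sims)
```

An identical historical data set scores 1.0, and the score shrinks as the normalized feature vectors move apart.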
The invention multiplies each sampling method ranking score by the corresponding similarity score and uses the result as the recommendation score RecScore, then uses Top-N ranking to recommend suitable sampling methods for the new data set. For any sampling method $T_j$ in the sampling method set $T = \{T_1, T_2, …, T_n\}$, based on the $m$ historical defect data sets in $OD = \{OD_1, OD_2, …, OD_m\}$, the invention computes the sampling method recommendation score RecScore[j] using the following formula:
$$\mathrm{RecScore}[j] = \sum_{i=1}^{m} \mathrm{SimiScore}[i] \times \mathrm{RankScore}[i][j] \qquad (9)$$
After the recommendation score RecScore has been computed for each sampling method, the sampling methods are sorted by RecScore to obtain their Top-N ranking, so that a suitable sampling method is recommended automatically for the new software defect data. FIG. 3 is a schematic diagram of a recommendation network built from three historical defect data sets and four sampling methods. It combines the sampling method ranking with the data similarity into a three-layer recommendation network, in which the connection weights between the first and second layers are the similarity scores between data sets and the connection weights between the second and third layers are the ranking scores.
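The recommendation step can be sketched for a network of the same size as the one in FIG. 3, with three historical data sets and four sampling methods. The sketch assumes that the products of similarity score and ranking score are summed over the historical data sets; the function name `recommend` and all numeric values are invented for illustration.

```python
def recommend(simi_score, rank_score, sampler_names, top_n=1):
    """Rank sampling methods by similarity-weighted ranking scores."""
    m, n = len(rank_score), len(sampler_names)
    # Assumed aggregation: sum over historical data sets of SimiScore * RankScore.
    rec = [sum(simi_score[i] * rank_score[i][j] for i in range(m)) for j in range(n)]
    ranked = sorted(zip(sampler_names, rec), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]

simi = [0.5, 0.25, 1.0]         # similarity of the new data set to OD_1..OD_3
rank = [[0.8, 0.6, 0.7, 0.5],   # RankScore[i][j] for T_1..T_4 on OD_1
        [0.4, 0.8, 0.6, 0.2],   # on OD_2
        [0.6, 0.7, 0.9, 0.5]]   # on OD_3
top = recommend(simi, rank, ["T1", "T2", "T3", "T4"], top_n=2)
print(top)
```

With these numbers T3 comes first, because it ranks best on the historical data set most similar to the new one.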
3. Detecting outliers of a defect data set
Based on the principle of minimizing the clustering criterion function, the data are iteratively partitioned into classes that are as compact and as well separated as possible, and outliers that deviate from the mainstream classes are removed. The core principle is as follows:
For $i = 1, 2, …, m$, compute the distance $d_{ij} = \lVert x_i - \mu_j \rVert_2$ between sample $x_i$ and each centroid vector $\mu_j$ ($j = 1, 2, …, k$), assign $x_i$ to the class $\lambda_i$ corresponding to the smallest $d_{ij}$, and then update:
[The formula is an image in the original publication and is not reproduced in this text.]
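A minimal sketch of clustering followed by outlier removal is given here. The text does not state how a sample is judged to deviate from its cluster, so the rule used (distance to the assigned centroid greater than the mean distance plus `z` standard deviations) is an assumption, as are `k = 2`, the deterministic initialization, and the sample points.

```python
import math

def kmeans(points, k, n_iter=100):
    """Plain K-Means with a deterministic, evenly spaced initialisation."""
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each sample joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append([sum(c) / len(members) for c in zip(*members)])
            else:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

def remove_outliers(points, k=2, z=2.0):
    """Drop samples unusually far from their centroid (assumed rule)."""
    labels, centroids = kmeans(points, k)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return [p for p, d in zip(points, dists) if d <= mean + z * std]

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (10.5, 18.0)]  # the last point lies far from both groups
cleaned = remove_outliers(points)
print(len(cleaned))
```

A production implementation would normally use a randomized or k-means++ initialization; the fixed one here only keeps the example reproducible.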
4. Constructing the software defect prediction model based on Stacking ensemble learning
In the Stacking ensemble learning model, the first-layer base learners need to satisfy the following requirements: each must perform sufficiently well, the correlation between them should be as small as possible, and the performance gap between them must not be too large.
According to the characteristics, the KNN model, the random forest model and the Gaussian naive Bayes model are selected as the first-layer base learner. The KNN model is widely applied, and has the characteristics of mature theory, high efficiency of training mode and the like; the random forest model is formed by integrating decision trees as basic models under a Bagging integration framework, and has a good effect in practical application; the Gaussian naive Bayes model can be trained only by a small amount of samples, is good at processing separable binary data, and has the characteristics of high training speed and the like. Since overfitting may occur in the Stacking ensemble learning model, in order to reduce the overfitting, the meta learner at the second layer in the Stacking model should use a simpler model for learning, so the logistic regression model is selected as the meta learner at the second layer.
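The two-layer structure described above can be sketched with scikit-learn's `StackingClassifier`. This is an illustration under stated assumptions, not the patented implementation: the synthetic data, the hyperparameters, and the 5-fold internal cross-validation are all choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for a processed 10-dimensional defect data set.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# First layer: KNN, random forest and Gaussian naive Bayes base learners.
base_learners = [
    ("knn", KNeighborsClassifier()),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gnb", GaussianNB()),
]
# Second layer: a logistic regression meta-learner trained on the
# cross-validated predictions of the base learners.
model = StackingClassifier(
    estimators=base_learners, final_estimator=LogisticRegression(), cv=5)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(model.score(X_test, y_test))
```

Training the meta-learner on out-of-fold predictions rather than on predictions for data the base learners have already seen is what limits the overfitting mentioned above.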
5. Performance verification of KSSDP integrated prediction model
General metrics such as the false alarm rate and F-Measure are compared to analyze the performance of the KSSDP ensemble model against the base models and the mainstream ensemble models. As FIG. 4 shows, KSSDP has a relatively high false alarm rate on data set JM1: the random forest, Gaussian naive Bayes, and logistic regression models all have lower false alarm rates than KSSDP there, with the Gaussian naive Bayes model as much as 18.8% lower. On data set PC4, the false alarm rate of the Gaussian naive Bayes model is also lower than that of KSSDP, by 7.1%. On the remaining 6 data sets, however, KSSDP performs well and reaches or approaches the lowest false alarm rate, so overall its control of the false alarm rate remains satisfactory.
As FIG. 5 shows, the proposed KSSDP model performs well on all 8 data sets. F-Measure is a composite metric that objectively reflects model quality. KSSDP obtains the highest F-Measure on all 8 data sets, which shows that it outperforms each single base classifier (the KNN, random forest, Gaussian naive Bayes, and logistic regression models) and further indicates that the proposed KSSDP model is feasible and effective.
The invention selects the best mainstream ensemble model for comparison: if the KSSDP ensemble prediction model outperforms the best mainstream ensemble model, it outperforms all mainstream ensemble models. On both F-Measure and Pf, the best mainstream ensemble model is the ExtraTrees model. As FIGS. 6 and 7 show, the proposed method maintains a high F-Measure and a low false alarm rate across the 8 data sets. The ExtraTrees model has a higher F-Measure than the KSSDP model on data set PC1, but the KSSDP model has the higher F-Measure on the remaining 7 data sets. For the false alarm rate, the ExtraTrees model is lower than the KSSDP model on data sets JM1 and PC1, but it fluctuates considerably and is less stable: its average false alarm rate is 11.36%, against 9.7% for the KSSDP model. In conclusion, the proposed method performs well, since its overall performance on the 8 data sets is better than that of the base models and the mainstream ensemble models.
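The two metrics used in this comparison can be computed from a confusion matrix as sketched here, assuming the usual definitions: Pf = FP / (FP + TN), and F-Measure as the harmonic mean of precision and recall, with label 1 denoting a defective module. The function names and the labels are invented for the example.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) with label 1 meaning 'defective'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def false_alarm_rate(y_true, y_pred):
    """Pf: share of defect-free modules wrongly flagged as defective."""
    _, fp, _, tn = confusion_counts(y_true, y_pred)
    return fp / (fp + tn) if (fp + tn) else 0.0

def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
pf = false_alarm_rate(y_true, y_pred)
f1 = f_measure(y_true, y_pred)
print(pf, f1)
```

A lower Pf and a higher F-Measure are both better, which is why the two metrics are read together when comparing models.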

Claims (5)

1. A software defect prediction method based on feature extraction and Stacking ensemble learning is characterized by comprising the following steps:
Step 1: extracting features from the original defect data set DS by kernel principal component analysis (KPCA), and reducing the original defect data set DS to 10 dimensions to obtain a dimension-reduced defect data set DS';
Step 2: performing imbalance processing on the defect data set DS': first, ranking the sampling methods, by performing imbalance processing on a historical defect data set with a plurality of sampling methods, the user selecting a classification algorithm according to the characteristics of the defect data set DS', ranking the plurality of sampling methods on the historical defect data set with the classification algorithm, and obtaining a sampling method ranking score RankScore based on the metric accuracy; then, mining data similarity, by calculating, when the defect data set DS' and the historical defect data set belong to the same project, the Jaccard similarity coefficient between the defect data set DS' and the historical defect data set and taking the Jaccard similarity coefficient as the similarity score SimiScore between them, and, when the defect data set DS' and the historical defect data set belong to different projects, performing feature extraction on the historical defect data set, normalizing the defect data set DS' and the historical defect data set, calculating the Euclidean distance between them, and taking the reciprocal of the Euclidean distance as the similarity score SimiScore between them; finally, making a user-based recommendation, by combining the sampling method ranking score RankScore and the similarity score SimiScore, recommending a sampling method for the defect data set DS' with a collaborative filtering algorithm, and performing imbalance processing on the defect data set DS' with the recommended sampling method to obtain an imbalance-processed defect data set DS';
Step 3: detecting and removing outliers in the defect data set DS', by clustering the defect data set DS' with the K-Means algorithm and removing outliers that deviate from most categories to obtain a defect data set DS';
Step 4: constructing a software defect prediction model KSSDP based on Stacking ensemble learning, by selecting a KNN model, a random forest model and a Gaussian naive Bayes model as the first-layer base learners and selecting a logistic regression model as the second-layer meta-learner;
Step 5: verifying the performance of the software defect prediction model KSSDP, by comparing it with the base models and a plurality of ensemble models on the processed defect data set DS'.
2. The software defect prediction method of claim 1, wherein a non-linear mapping kernel function is used to map the original data points in the low-dimensional feature space to the high-dimensional feature space, and further to extract representative features and characterize complex defect data structures.
3. The software defect prediction method of claim 1, wherein step 2 specifically comprises: training on a historical defect data set using ten-fold cross-validation, dividing the historical defect data set into ten parts, taking one of the ten parts as the test set test and the remaining nine parts as the training set train; applying a sampling method $T_j$ from the sampling method set $T = \{T_1, T_2, …, T_n\}$ to the training set train for imbalance processing to obtain an imbalance-processed training set BTrain; the user selecting a suitable classification algorithm CA from a classification algorithm library; learning on the training set BTrain with the classification algorithm CA to obtain a predictor P; evaluating the test set test with the predictor P to obtain the corresponding metric accuracy; and finally taking the average of the accumulated metric values as the sampling method ranking score RankScore;
when the defect data set DS' and the historical defect data set belong to the same project, calculating the similarity between the defect data set DS' and the historical defect data set by counting the intersection and the union of the data set features and taking the quotient of the intersection count and the union count as the similarity score SimiScore of the defect data set DS' and the historical defect data set; when the defect data set DS' and the historical defect data set belong to different projects, performing feature extraction on the historical defect data set by kernel principal component analysis to reduce it to 10 dimensions, normalizing the defect data set DS' and the historical defect data set, calculating the arithmetic square root of the sum of the squared per-dimension feature differences between the defect data set DS' and the historical defect data set, and taking the reciprocal of that arithmetic square root as the similarity score SimiScore of the defect data set DS' and the historical defect data set;
and multiplying the sampling method ranking score RankScore by the corresponding similarity score SimiScore, taking the product as a recommendation score RecScore, and recommending a sampling method for the defect data set DS' by Top-N ranking according to the value of the recommendation score RecScore.
4. The software defect prediction method of claim 1, characterized in that based on the principle of clustering criterion function minimization, the abnormal values deviating from most categories are removed by iteratively dividing the data into different categories.
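One possible reading of claim 4 is sketched below in Python, under stated assumptions. The claim does not name a specific clustering algorithm; here Lloyd's k-means iteration stands in for "minimizing a clustering criterion function" (the within-cluster sum of squared distances), and a point is treated as abnormal when its cluster holds less than a chosen share of the data. The function names, the explicit initial centroids, and the `min_share` threshold are all hypothetical.

```python
import math


def kmeans_labels(points, centroids, iterations=20):
    """Lloyd's algorithm with caller-supplied initial centroids.

    Each assignment/update round never increases the within-cluster
    sum of squared distances, which plays the role of the criterion.
    """
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def remove_outliers(points, centroids, min_share=0.2):
    """Drop points whose cluster holds less than min_share of all points."""
    labels = kmeans_labels(points, centroids)
    counts = {k: labels.count(k) for k in set(labels)}
    return [
        p for p, lab in zip(points, labels)
        if counts[lab] / len(points) >= min_share
    ]
```

With two dense groups and one isolated point that ends up alone in its cluster, `remove_outliers` keeps the two groups and discards the isolated point. Other outlier rules, such as a distance-to-centroid threshold, would fit the claim wording equally well.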
5. The software defect prediction method of claim 1, wherein the performance of the software defect prediction model KSSDP is analyzed against the base model and the plurality of ensemble models by comparing the false alarm rate and the F-Measure index.
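For reference, the two indices named in claim 5 can be computed from confusion-matrix counts as in the following Python sketch. It assumes the conventional definitions (false alarm rate = FP / (FP + TN); F-Measure = harmonic mean of precision and recall), which the claim itself does not spell out; the function names and the choice to return 0.0 for an empty denominator are illustrative.

```python
def false_alarm_rate(fp, tn):
    """Share of non-defective modules wrongly flagged as defective."""
    negatives = fp + tn
    return fp / negatives if negatives else 0.0


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for the defective class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A lower false alarm rate and a higher F-Measure would both indicate a better predictor, so the two indices are read in opposite directions when comparing models.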
CN202111106611.0A 2021-09-22 2021-09-22 Software defect prediction method based on feature extraction and Stacking ensemble learning Active CN113837266B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111106611.0A CN113837266B (en) 2021-09-22 2021-09-22 Software defect prediction method based on feature extraction and Stacking ensemble learning

Publications (2)

Publication Number Publication Date
CN113837266A CN113837266A (en) 2021-12-24
CN113837266B true CN113837266B (en) 2022-05-20

Family

ID=78960344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111106611.0A Active CN113837266B (en) 2021-09-22 2021-09-22 Software defect prediction method based on feature extraction and Stacking ensemble learning

Country Status (1)

Country Link
CN (1) CN113837266B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114706780A (en) * 2022-04-13 2022-07-05 北京理工大学 Software defect prediction method based on Stacking ensemble learning
CN118052813A (en) * 2024-04-12 2024-05-17 深圳特朗达照明股份有限公司 Intelligent detection device and method for LED lamp

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659207A (en) * 2019-09-02 2020-01-07 北京航空航天大学 Heterogeneous cross-project software defect prediction method based on nuclear spectrum mapping migration integration
CN110674865A (en) * 2019-09-20 2020-01-10 燕山大学 Rule learning classifier integration method oriented to software defect class distribution unbalance

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11010385B2 (en) * 2019-10-10 2021-05-18 Sap Se Data security through query refinement

Also Published As

Publication number Publication date
CN113837266A (en) 2021-12-24

Similar Documents

Publication Publication Date Title
Lin et al. Parameter tuning, feature selection and weight assignment of features for case-based reasoning by artificial immune system
CN108985380B (en) Point switch fault identification method based on cluster integration
CN108921604B (en) Advertisement click rate prediction method based on cost-sensitive classifier integration
CN111222332A (en) Commodity recommendation method combining attention network and user emotion
CN113837266B (en) Software defect prediction method based on feature extraction and Stacking ensemble learning
CN110866782B (en) Customer classification method and system and electronic equipment
CN112529638B (en) Service demand dynamic prediction method and system based on user classification and deep learning
CN111583031A (en) Application scoring card model building method based on ensemble learning
CN108681742B (en) Analysis method for analyzing sensitivity of driver driving behavior to vehicle energy consumption
CN111582538A (en) Community value prediction method and system based on graph neural network
CN107016416B (en) Data classification prediction method based on neighborhood rough set and PCA fusion
CN112085525A (en) User network purchasing behavior prediction research method based on hybrid model
CN111338950A (en) Software defect feature selection method based on spectral clustering
Tembusai et al. K-nearest neighbor with k-fold cross validation and analytic hierarchy process on data classification
CN116257759A (en) Structured data intelligent classification grading system of deep neural network model
CN113704389A (en) Data evaluation method and device, computer equipment and storage medium
CN109783633A (en) Data analysis service procedural model recommended method
CN114942974A (en) E-commerce platform commodity user evaluation emotional tendency classification method
CN114519508A (en) Credit risk assessment method based on time sequence deep learning and legal document information
CN111708865A (en) Technology forecasting and patent early warning analysis method based on improved XGboost algorithm
CN106775694A (en) A kind of hierarchy classification method of software merit rating code product
Krishnamoorthy et al. Comparative study of machine learning algorithms for product recommendation based on user experience
CN114358813B (en) Improved advertisement putting method and system based on field matrix factorization machine
Shanthini et al. Advanced Data Mining Enabled Robust Sentiment Analysis on E-Commerce Product Reviews and Recommendation Model
CN110609961A (en) Collaborative filtering recommendation method based on word embedding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant