CN114064459A - Software defect prediction method based on generative adversarial network and ensemble learning - Google Patents

Software defect prediction method based on generative adversarial network and ensemble learning

Publication number
CN114064459A
Authority
CN
China
Prior art keywords
data
training
software defect
software
defect
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111243350.7A
Other languages
Chinese (zh)
Inventor
孟海宁
郑毅
冯锴
朱磊
杨哲
张嘉薇
黑新宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a software defect prediction method based on a generative adversarial network (GAN) and ensemble learning, comprising the following steps: step 1, preprocessing a software defect data set, dividing the preprocessed data set into a training set and a test set, and calculating a resampling rate; step 2, constructing a generative adversarial network model; step 3, training the generative adversarial network on the training set to obtain a trained network; step 4, using the trained network to generate new minority-class defect data according to the resampling rate, yielding a resampled training set; and step 5, constructing a strong software defect classifier with the AdaBoost method and inputting the test set to the trained classifier to obtain the software defect prediction result. The invention addresses the class imbalance of software defect data and improves the accuracy, recall and F-measure of software defect prediction.

Description

Software defect prediction method based on generative adversarial network and ensemble learning
Technical Field
The invention belongs to the technical field of software defect prediction, and particularly relates to a software defect prediction method based on a generative adversarial network and ensemble learning.
Background
With the spread of information technology, software that is in use or under development may contain defects. Finding and locating these defects in time is important for the normal operation of a software system and for completing its functionality. Software Defect Prediction (SDP) aims to find the modules in a software project that are most likely to contain defects, where a module may be a function, a loop body, a class, and so on. Software defect prediction generally proceeds in three steps. First, software modules are labeled as defective or non-defective according to the bug reports of each module in historical software. Second, software defect metrics are used as the features of the defect data on which the prediction models are trained. Finally, after data preprocessing, the data is divided into a training set and a test set, and the software defect prediction model is trained and tested.
Many software defect prediction methods have been proposed, such as support vector machines, decision trees, random forests and AdaBoost. However, these methods only lightly preprocess the original defect data and make their modifications at the algorithm level, without considering problems at the data level. Because defects make up a small proportion of a whole software system, software defect data sets are typical class-imbalanced data; for example, PC1, PC2 and PC3 of the NASA MDP software defect data set are severely imbalanced. Resampling the sample data can rebalance originally imbalanced data, so resampling methods have been introduced into software defect prediction. Typical resampling methods include the Synthetic Minority Over-sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), Random Over-Sampling (ROS) and Random Under-Sampling (RUS). SMOTE and ADASYN reduce class imbalance by artificially synthesizing minority-class samples, but the rule for generating data is specified manually, and a suitable rule is hard to find for real software defect data sets. ROS and RUS directly duplicate or discard samples in the training set of the prediction model, and their effect in practice is poor.
Disclosure of Invention
The invention aims to provide a software defect prediction method based on a generative adversarial network (GAN) and ensemble learning. When handling different software defect data sets, the method adaptively resamples the data through the generative adversarial network to relieve data imbalance, and trains the defect classifier with the AdaBoost ensemble learning method, thereby improving the accuracy, recall and F-measure of software defect prediction.
The technical scheme adopted by the invention is as follows:
the software defect prediction method based on a generative adversarial network and ensemble learning comprises the following steps:
step 1, preprocessing a software defect data set, dividing the preprocessed data set into a training set and a test set, calculating the ratio of non-defect data to defect data in the training set, and recording this ratio as the resampling rate;
step 2, constructing a generative adversarial network model, wherein the model comprises a generator and a discriminator;
step 3, training the generative adversarial network on the training set until the network converges or the set number of iterations is reached, to obtain the trained generative adversarial network;
step 4, using the trained generative adversarial network to generate new minority-class defect data according to the resampling rate calculated in step 1, to obtain the resampled training set;
step 5, training a strong software defect classifier with the AdaBoost ensemble learning method by inputting the training set resampled in step 4 into the classifier; and checking the classifier on the test set to obtain the software defect prediction evaluation indexes.
The invention is further characterized in that:
the loss functions of the discriminator and the generator are given by equations (1) and (2):
$$\mathrm{DLoss} = \mathrm{BCELoss}(x_{true}, y_{true}) + \mathrm{BCELoss}(x_{fake}, y_{fake}) \tag{1}$$
$$\mathrm{GLoss} = \mathrm{BCELoss}(x_{fake}, y_{true}) \tag{2}$$
where DLoss is the loss function of the discriminator and GLoss is the loss function of the generator; both the generator and the discriminator use the Adam optimizer with three hyperparameters: learning rate, Betas and number of iterations; $x_{fake}$, $x_{true}$, $y_{true}$ and $y_{fake}$ are the forged data, the real data, the real-data label and the forged-data label, respectively; BCELoss is the binary cross entropy loss function:
$$\mathrm{BCELoss}(x, y) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log x_i + (1 - y_i)\log(1 - x_i)\right] \tag{3}$$
where $x_i$ and $y_i$ are the i-th software defect sample and its corresponding label, and n is the total number of samples.
The generator in step 2 comprises an initial random noise input layer, a final generated-data output layer, and block structures each consisting of a linear layer, a batch normalization layer and a LeakyReLU activation function. The first block after the noise input layer has no batch normalization layer, and neither does the block immediately before the data output layer. The data output layer uses a Sigmoid activation function. The noise input has dimension noise_dim; it passes through the 3 blocks and finally through the fully connected layer of the output layer, whose output dimension is out_dim;
the discriminator in step 2 comprises an initial data input layer, a final discrimination-result output layer, and block structures each consisting of a linear layer and a LeakyReLU activation function; the input data has dimension out_dim, the same as the data output by the generator; the discrimination-result output layer uses a linear layer and a Sigmoid activation function, so the final output of the discriminator is a value between 0 and 1.
Step 3 comprises the following steps:
step 3.1, normalizing the training set so that its values lie between 0 and 1;
step 3.2, inputting random noise with values in the range 0 to 1 into the generator G to generate forged data, and labeling the forged data as 0; labeling the training data normalized in step 3.1 as 1, and inputting it together with the forged data into the discriminator D, which distinguishes real data from forged data;
step 3.3, repeating step 3.2 until the generative adversarial network converges or the set number of iterations is reached, to obtain the trained generator G and discriminator D.
The specific operation of step 4 is as follows: step 4.1, calculating, from the resampling rate k of step 1, the amount of minority-class software defect data to be generated, inputting random noise in the range 0 to 1 into the generator G of the generative adversarial network trained in step 3, and obtaining the generated minority-class defect data from the generator;
step 4.2, applying to the minority-class defect data generated in step 4.1 the inverse of the normalization performed in step 3.1, and merging the result with the original training data to obtain the resampled training set.
The specific operation of step 5 is as follows:
step 5.1, applying ten-fold cross validation to the training set resampled in step 4: the whole training data set is divided into 10 parts, each part is used in turn as the validation set to check the classifier during training, the process is repeated 10 times, and the weighted average is taken as the training performance index of the classifier; when training is finished, the trained strong software defect classifier C is obtained;
step 5.2, using the strong software defect classifier C of step 5.1 on the test set to obtain its classification results and to check its classification performance on the test set.
In step 5.1, the strong software defect classifier C is trained with the AdaBoost learning method, as shown in equations (6), (7) and (8):
$$e_m = \sum_{i=1}^{n} w_{m,i}\, I\big(C_m(x_i) \neq y_i\big) \tag{6}$$
$$\alpha_m = \frac{1}{2}\ln\frac{1 - e_m}{e_m} \tag{7}$$
$$C(x) = \mathrm{sign}\left(\sum_{m=1}^{M} \alpha_m C_m(x)\right) \tag{8}$$
where $\alpha_m$ is the weight of the m-th decision tree weak classifier, $C_m(x)$ is the m-th decision tree weak classifier, $C(x)$ is the strong classifier obtained by ensemble learning, sign is the sign function applied to the weighted sum of the M weak classifier results, $e_m$ is the classification error rate of the m-th decision tree weak classifier, $w_{m,i}$ is the weight of the i-th training sample in round m, and $I(\cdot)$ is the indicator function.
The beneficial effects of the invention are as follows:
1) The invention provides a software defect prediction method based on a generative adversarial network and ensemble learning. Resampling the software defect data with the generative adversarial network overcomes the shortcomings of manually specified synthetic resampling methods, relieves the class imbalance of software defect data, and improves the accuracy, recall and F-measure of the classifier across different software defect data sets.
2) The method approaches software defect prediction at two levels: at the data level it resamples the data with the generative adversarial network, and at the algorithm level it trains the classifier with an ensemble learning method. Combining these two angles improves classification performance.
Drawings
FIG. 1 is a general flow chart of the software defect prediction method based on a generative adversarial network and ensemble learning according to the present invention;
FIG. 2 is a structure diagram of the generative adversarial network used in the method;
FIG. 3 is a schematic diagram of the structure of the generator and the discriminator used in the method;
FIG. 4 is a line graph of the accuracy of the model on the test set before and after resampling with the generative adversarial network.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a software defect prediction method based on a generative adversarial network and ensemble learning, which comprises the following steps:
step 1, preprocessing a software defect data set, dividing the preprocessed data set into a training set and a test set, calculating the ratio of non-defect data to defect data in the training set, and recording this ratio as the resampling rate;
step 2, constructing a generative adversarial network model, wherein the model comprises a generator and a discriminator;
step 3, training the generative adversarial network on the training set until the network converges or the set number of iterations is reached, to obtain the trained generative adversarial network;
step 4, using the trained generative adversarial network to generate new minority-class defect data according to the resampling rate calculated in step 1, to obtain the resampled training set;
step 5, training a strong software defect classifier with the AdaBoost ensemble learning method by inputting the training set resampled in step 4 into the classifier; and checking the classifier on the test set to obtain the software defect prediction evaluation indexes.
In the step 1:
The defect data set used by the method is derived from the NASA software defect data set and comprises 12 sub data sets. The software defect metrics used include McCabe metrics, Halstead metrics, lines-of-code metrics and CK metrics.
The preprocessing operation removes duplicate data, duplicate attributes and abnormal data. The preprocessed data is then randomly sampled and divided into the training set and the test set.
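Step 1 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: a sample is a `(features, label)` pair with label 1 for defective modules, "abnormal data" is modeled as a missing feature value, removal of duplicate attributes is omitted, the 0.8 training fraction follows the 8:2 split of Example 1, and the resampling rate is taken as non-defect count divided by defect count so that it exceeds 1 on an imbalanced set. The function names are hypothetical.

```python
import random

def preprocess(rows):
    """Drop exact duplicate samples and samples with a missing feature value."""
    seen, clean = set(), []
    for features, label in rows:
        key = (tuple(features), label)
        if key in seen or any(v is None for v in features):
            continue
        seen.add(key)
        clean.append((list(features), label))
    return clean

def split(rows, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into a training set and a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def resampling_rate(train):
    """k = (number of non-defect samples) / (number of defect samples)."""
    defects = sum(1 for _, label in train if label == 1)
    if defects == 0:
        raise ValueError("training set contains no defect samples")
    return (len(train) - defects) / defects
```

Computing the rate on the training set only, after the split, keeps the test set out of every later resampling decision.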
In step 2, the structure of the Generator and the Discriminator is as follows:
The generator comprises an initial random noise input layer, a final generated-data output layer, and block structures each consisting of a linear layer, a batch normalization layer and a LeakyReLU activation function. The first block after the noise input layer and the block immediately before the data output layer have no batch normalization layer. The data output layer uses a Sigmoid activation function, which maps the generated data to the interval from 0 to 1. The noise input has dimension noise_dim; the noise passes through the 3 blocks and finally through the fully connected layer of the output layer, whose output dimension is out_dim.
The discriminator comprises an initial data input layer, a final discrimination-result output layer, and block structures each consisting of a linear layer and a LeakyReLU activation function. The data input layer receives real minority-class defect data and forged data generated by the generator; the input data has dimension out_dim, the same as the data output by the generator. The discrimination-result output layer uses a linear layer and a Sigmoid activation function, so the final output of the discriminator lies in the interval from 0 to 1. The input data first passes through the data input layer, then through the blocks, and finally through the discrimination-result output layer, which yields a discriminator output between 0 and 1.
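The layer layout described above could be written in PyTorch roughly as follows. This is a sketch under assumptions rather than the patented implementation: the hidden widths, the LeakyReLU slope of 0.2, and the use of two discriminator blocks are illustrative choices that the text does not specify; only the block composition, the placement of batch normalization, and the Sigmoid outputs come from the description.

```python
import torch
from torch import nn

def block(in_dim, out_dim, batch_norm):
    """Linear -> (optional BatchNorm) -> LeakyReLU, returned as a list of layers."""
    layers = [nn.Linear(in_dim, out_dim)]
    if batch_norm:
        layers.append(nn.BatchNorm1d(out_dim))
    layers.append(nn.LeakyReLU(0.2))  # slope 0.2 is an assumed value
    return layers

class Generator(nn.Module):
    def __init__(self, noise_dim, out_dim, hidden=(128, 256, 128)):
        super().__init__()
        h1, h2, h3 = hidden  # hidden widths are assumptions
        self.net = nn.Sequential(
            *block(noise_dim, h1, batch_norm=False),  # first block: no batch norm
            *block(h1, h2, batch_norm=True),
            *block(h2, h3, batch_norm=False),         # block before output: no batch norm
            nn.Linear(h3, out_dim),                   # fully connected output layer
            nn.Sigmoid(),                             # maps generated features to (0, 1)
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self, in_dim, hidden=(128, 64)):
        super().__init__()
        h1, h2 = hidden  # number and width of blocks are assumptions
        self.net = nn.Sequential(
            *block(in_dim, h1, batch_norm=False),
            *block(h1, h2, batch_norm=False),
            nn.Linear(h2, 1),
            nn.Sigmoid(),                             # probability that the input is real
        )

    def forward(self, x):
        return self.net(x)
```

For example, `Generator(noise_dim=50, out_dim=21)` maps a batch of 50-dimensional noise vectors to 21 features each; 50 is the noise dimension of Example 1, while 21 is a placeholder for the feature count of one sub data set.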
The loss functions of the discriminator and the generator are given by equations (1) and (2):
$$\mathrm{DLoss} = \mathrm{BCELoss}(x_{true}, y_{true}) + \mathrm{BCELoss}(x_{fake}, y_{fake}) \tag{1}$$
$$\mathrm{GLoss} = \mathrm{BCELoss}(x_{fake}, y_{true}) \tag{2}$$
where DLoss is the loss function of the discriminator and GLoss is the loss function of the generator. Both the generator and the discriminator use the Adam optimizer with three hyperparameters: learning rate, Betas and number of iterations. $x_{fake}$, $x_{true}$, $y_{true}$ and $y_{fake}$ are the forged data, the real data, the real-data label and the forged-data label, respectively. BCELoss is the binary cross entropy loss function:
$$\mathrm{BCELoss}(x, y) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log x_i + (1 - y_i)\log(1 - x_i)\right] \tag{3}$$
where $x_i$ and $y_i$ are the i-th software defect sample and its corresponding label, and n is the total number of samples.
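A small numeric sketch may help in reading the discriminator and generator losses. It assumes, as is usual for GAN training, that the first argument of BCELoss is the discriminator's output for the given samples (a probability between 0 and 1), that real data carries label 1 and forged data label 0, and that a small `eps` clamp is acceptable to avoid `log(0)`; the function names are hypothetical.

```python
import math

def bce_loss(preds, labels, eps=1e-12):
    """Binary cross entropy averaged over the samples."""
    total = 0.0
    for p, y in zip(preds, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(preds)

def d_loss(d_on_real, d_on_fake):
    """Discriminator loss: real samples labeled 1, forged samples labeled 0."""
    return (bce_loss(d_on_real, [1] * len(d_on_real))
            + bce_loss(d_on_fake, [0] * len(d_on_fake)))

def g_loss(d_on_fake):
    """Generator loss: forged samples scored against the 'real' label 1."""
    return bce_loss(d_on_fake, [1] * len(d_on_fake))
```

When the discriminator outputs 0.5 for every sample it cannot tell real from forged, and `d_loss` equals 2·ln 2; the generator loss falls as the discriminator assigns higher scores to forged samples.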
The specific steps of step 3 are as follows:
step 3.1, normalizing the training set so that its values lie between 0 and 1, using equation (4), where $x_{ij}$ is the j-th feature value of the i-th software defect sample, $x_{j\_min}$ is the minimum of the j-th feature, $x_{j\_max}$ is the maximum of the j-th feature, and x is the normalized feature value, a floating point number between 0 and 1;
$$x = \frac{x_{ij} - x_{j\_min}}{x_{j\_max} - x_{j\_min}} \tag{4}$$
step 3.2, inputting random noise with values in the range 0 to 1 into the generator G to generate forged data, and labeling the forged data as 0; labeling the training data normalized in step 3.1 as 1, and inputting it together with the forged data into the discriminator D, which distinguishes real data from forged data;
step 3.3, repeating step 3.2 until the generative adversarial network converges or the set number of iterations is reached, to obtain the trained generator G and discriminator D.
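The alternating updates of steps 3.2 and 3.3 might look like the following PyTorch loop. It is a minimal sketch, not the patented implementation: the tiny two-layer networks stand in for the generator and discriminator described earlier, `real` is assumed to hold minority-class samples already normalized to the range 0 to 1, and a fixed iteration count replaces any convergence test. The default learning rate, Betas and iteration count follow Example 1.

```python
import torch
from torch import nn

def train_gan(real, noise_dim=50, iterations=5000, lr=0.05, betas=(0.5, 0.999), seed=0):
    """Alternate discriminator and generator updates on normalized real samples."""
    torch.manual_seed(seed)
    n, out_dim = real.shape
    # Deliberately small stand-in networks (assumed sizes).
    G = nn.Sequential(nn.Linear(noise_dim, 64), nn.LeakyReLU(0.2),
                      nn.Linear(64, out_dim), nn.Sigmoid())
    D = nn.Sequential(nn.Linear(out_dim, 64), nn.LeakyReLU(0.2),
                      nn.Linear(64, 1), nn.Sigmoid())
    opt_g = torch.optim.Adam(G.parameters(), lr=lr, betas=betas)
    opt_d = torch.optim.Adam(D.parameters(), lr=lr, betas=betas)
    bce = nn.BCELoss()
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)  # labels: real = 1, forged = 0

    for _ in range(iterations):
        # Discriminator step: score real data against 1 and forged data against 0.
        fake = G(torch.rand(n, noise_dim)).detach()  # noise drawn from [0, 1)
        loss_d = bce(D(real), ones) + bce(D(fake), zeros)
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

        # Generator step: try to make the discriminator output 1 on forged data.
        loss_g = bce(D(G(torch.rand(n, noise_dim))), ones)
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()
    return G, D
```

Detaching the forged batch in the discriminator step keeps that update from propagating gradients into the generator, so each network is only changed by its own optimizer.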
The specific operation of step 4 is as follows: step 4.1, calculating, from the resampling rate k obtained in step 1, the amount of minority-class defect data to be generated according to equation (5), inputting random noise in the range 0 to 1 into the generator G of the generative adversarial network trained in step 3, and obtaining the generated minority-class defect data from the generator:
$$N = (k - 1) \times T \tag{5}$$
where N is the number of defect-class samples to be synthesized, k is the resampling rate, and T is the number of minority (defect) class samples in the training set. For N to be positive, k is read as the ratio of non-defect samples to defect samples, so that the merged training set contains equal numbers of both classes;
step 4.2, applying to the minority-class defect data generated in step 4.1 the inverse of the normalization performed in step 3.1, and merging the result with the original training data to obtain the resampled training set.
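The bookkeeping of step 4 can be sketched in plain Python. Assumptions: the resampling rate k is the majority count divided by the minority count, the minimum and maximum used for inverse normalization are taken from the minority-class data the network was trained on, and `generator` is a hypothetical callable that maps one noise vector to one feature vector with values between 0 and 1 (for instance a wrapper around a trained network).

```python
import random

def fit_min_max(rows):
    """Per-feature minimum and maximum."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def normalize(rows, mins, maxs):
    """Min-max normalization to [0, 1]; constant features map to 0."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(r, mins, maxs)] for r in rows]

def denormalize(rows, mins, maxs):
    """Inverse of min-max normalization."""
    return [[v * (hi - lo) + lo for v, lo, hi in zip(r, mins, maxs)] for r in rows]

def num_to_generate(k, minority_count):
    """N = (k - 1) * T, rounded to a whole number of samples."""
    return round((k - 1) * minority_count)

def resample(minority, majority_count, generator, noise_dim=50, seed=0):
    """Return the minority samples plus enough synthetic ones to balance the classes."""
    rng = random.Random(seed)
    k = majority_count / len(minority)  # assumed orientation of the resampling rate
    n_new = num_to_generate(k, len(minority))
    mins, maxs = fit_min_max(minority)
    noise = [[rng.random() for _ in range(noise_dim)] for _ in range(n_new)]
    synthetic = denormalize([generator(z) for z in noise], mins, maxs)
    return minority + synthetic
```

With 2 minority samples and 6 majority samples, k is 3 and N is 4, so the returned minority class has 6 samples and matches the majority class in size.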
The specific operation of step 5 is as follows:
step 5.1, the software defect classifier uses the AdaBoost ensemble learning method: M identical decision tree weak classifiers serve as weak learners, and the final strong software defect classifier is obtained as their weighted combination. The training process is as follows:
the training set resampled in step 4 is used with ten-fold cross validation: the whole training data set is divided into 10 parts, each part is used in turn as the validation set to check the classifier during training, the process is repeated 10 times, and the weighted average is taken as the final performance index of the classifier. When training is finished, the trained software defect classifier C is obtained. Training the final classifier with the AdaBoost learning method is described by equations (6), (7) and (8):
$$e_m = \sum_{i=1}^{n} w_{m,i}\, I\big(C_m(x_i) \neq y_i\big) \tag{6}$$
$$\alpha_m = \frac{1}{2}\ln\frac{1 - e_m}{e_m} \tag{7}$$
$$C(x) = \mathrm{sign}\left(\sum_{m=1}^{M} \alpha_m C_m(x)\right) \tag{8}$$
where $\alpha_m$ is the weight of the m-th decision tree weak classifier, $C_m(x)$ is the m-th decision tree weak classifier, $C(x)$ is the strong classifier obtained by ensemble learning, sign is the sign function applied to the weighted sum of the M weak classifier results, $e_m$ is the classification error rate of the m-th decision tree weak classifier, $w_{m,i}$ is the weight of the i-th training sample in round m, and $I(\cdot)$ is the indicator function;
step 5.2, using the strong classifier C trained in step 5.1, and checking the performance of the trained software defect classifier on the test set with the accuracy, recall and F-measure performance indexes.
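The weighting scheme of AdaBoost can be illustrated with a compact pure-Python version. This sketch makes several assumptions: one-level decision trees (stumps) stand in for the decision tree weak classifiers, class labels are encoded as +1 and -1, the weak classifier weight is the standard half log-odds of its weighted error rate, and sample weights are renormalized after every round. It is meant to show how the error rate, the classifier weight and the sign vote fit together, not to reproduce the patented training code.

```python
import math

def train_stump(X, y, w):
    """Find the single-feature threshold rule with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for sign in (1, -1):
                pred = [sign if x[j] >= t else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def stump_predict(stump, x):
    _, j, t, sign = stump
    return sign if x[j] >= t else -sign

def adaboost_fit(X, y, M=200):
    """Train up to M weighted stumps; returns a list of (alpha_m, stump_m)."""
    n = len(X)
    w = [1.0 / n] * n                        # uniform initial sample weights
    ensemble = []
    for _ in range(M):
        stump = train_stump(X, y, w)
        if stump[0] >= 0.5:                  # no better than chance: stop
            break
        e_m = max(stump[0], 1e-10)           # weighted error rate of this weak classifier
        alpha = 0.5 * math.log((1 - e_m) / e_m)
        ensemble.append((alpha, stump))
        if stump[0] == 0:                    # perfect weak classifier: nothing left to reweight
            break
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Strong classifier: sign of the alpha-weighted vote."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1
```

On the one-dimensional toy set `X = [[0], [1], [2], [3], [4], [5]]` with labels `[1, 1, -1, -1, 1, 1]`, no single stump separates the classes, but three boosted stumps classify every training point correctly.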
In the method provided by the invention, a generative adversarial network is used at the data level to generate software defect data and relieve class imbalance, and an ensemble learning method is then used at the algorithm level. Ten-fold cross validation during training keeps the training process random and sound. Combining the generative adversarial network with the ensemble algorithm improves software defect classification accuracy.
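Ten-fold cross validation as used in training can be sketched as follows. The sketch assumes that "weighted average" means weighting each fold's score by the size of its validation part, and `fit_and_score` is a hypothetical callback that trains a classifier on the given training indices and returns its score on the validation indices.

```python
import random

def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_indices, validation_indices) for each of the n_folds parts."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def cross_validate(n_samples, fit_and_score, n_folds=10):
    """Average the per-fold scores, weighted by validation-fold size."""
    scores, sizes = [], []
    for train, val in k_fold_indices(n_samples, n_folds):
        scores.append(fit_and_score(train, val))
        sizes.append(len(val))
    return sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)
```

Every sample appears in exactly one validation part, so each sample is used for validation once and for training nine times.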
Example 1
Step 1 is executed. The software defect data set used in this embodiment is derived from the NASA software defect data set and comprises 12 sub data sets. The software defect metrics used include McCabe metrics, Halstead metrics, lines-of-code metrics and CK metrics. Each sub data set contains a different number of features, as shown in the feature column of Table 1. Data preprocessing is applied to the original NASA software defect data set to remove duplicate data, duplicate attributes and abnormal data; the preprocessed NASA software defect data set is shown in Table 1.
The data sets of Table 1 are randomly sampled and divided into a training set and a test set in an 8:2 ratio. The numbers of defect and non-defect samples in the training set are counted, and the ratio of non-defect data to defect data is calculated to obtain the resampling rate k, which is used to calculate the amount of data to resample.
Table 1. NASA software defect data set (table image not reproduced in this text)
Step 2 is executed. The generative adversarial network constructed in this embodiment consists of two parts, a generator and a discriminator, whose specific structure is shown in FIG. 3. The noise input dimension noise_dim of the generator is set to 50. The output dimension of the generator and the input dimension of the discriminator, out_dim, are set to the number of features of each of the 12 sub data sets used in this embodiment, which differs between data sets. In the Adam optimizer used by the generator and the discriminator, the learning rate hyperparameter is set to 0.05, the Betas hyperparameter is set to (0.5, 0.999), and the number of iterations is set to 5000.
Steps 3 and 4 are executed.
Step 5 is executed. In this embodiment the number M of weak classifiers used in ensemble learning is 200: 200 decision tree weak classifiers are trained on the training set, the classifier weights are updated, and the weak classifiers are combined by weighting into a strong software defect classifier, which is then used to evaluate the method on the test set.
For the evaluation of the software defect prediction model, since the final prediction is a binary classification result, this embodiment computes the two-class confusion matrix to obtain four evaluation indexes: accuracy, precision, recall and F-measure. The confusion matrix is shown in Table 2.
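Table 2. Two-class confusion matrix
                   Predicted positive     Predicted negative
Actual positive    True Positive (TP)     False Negative (FN)
Actual negative    False Positive (FP)    True Negative (TN)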
According to the true class of a sample and the class predicted by the model, there are four cases: True Positive (TP), False Negative (FN), False Positive (FP) and True Negative (TN). The four evaluation indexes are calculated by equations (9), (10), (11) and (12).
The accuracy is calculated by equation (9).
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \tag{9}$$
The precision is calculated by equation (10).
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{10}$$
The recall is calculated by equation (11).
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{11}$$
The F-measure is calculated by equation (12).
$$F\text{-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{12}$$
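The four evaluation indexes follow directly from the confusion-matrix counts, as the short sketch below shows. It assumes that the defect class is the positive class and is encoded as 1, and it returns 0 for a ratio whose denominator is zero; the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP and TN for a binary classification result."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

def evaluate(tp, fn, fp, tn):
    """Accuracy, precision, recall and F-measure from the four counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}
```

For instance, with 3 defective and 5 non-defective test samples, of which 2 defects are found and 1 non-defect is flagged by mistake, the counts are TP = 2, FN = 1, FP = 1, TN = 4, giving an accuracy of 0.75 and precision, recall and F-measure of 2/3 each.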
For the classification results obtained in this example, the performance of the model on the test set is examined with the performance evaluation indexes of equations (9), (10), (11) and (12). Taking the CM1 data set as an example, FIG. 4 shows the line graphs of the accuracy of the software defect prediction model before and after resampling. The accuracy, precision, recall and F-measure of the classification method on the test set (averaged over the 10 cross-validation runs) are shown in Table 3.
Table 3. Evaluation of the classification method on the test set
Data set   Accuracy   Precision   Recall   F-measure
CM1        78%        87%         86%      0.87
JM1        73%        83%         82%      0.83
KC1        66%        79%         73%      0.76
KC3        81%        85%         92%      0.89
MC1        85%        87%         88%      0.88
MC2        77%        76%         88%      0.81
MW1        82%        90%         89%      0.89
PC1        88%        94%         92%      0.93
PC2        84%        97%         87%      0.87
PC3        85%        91%         91%      0.91
PC4        86%        92%         91%      0.92
PC5        71%        80%         80%      0.80

Claims (7)

1. A software defect prediction method based on a generative adversarial network and ensemble learning, characterized by comprising the following steps:
step 1, preprocessing a software defect data set, dividing the preprocessed data set into a training set and a test set, calculating the ratio of non-defect data to defect data in the training set, and recording this ratio as the resampling rate;
step 2, constructing a generative adversarial network model, wherein the model comprises a generator and a discriminator;
step 3, training the generative adversarial network on the training set until the network converges or the set number of iterations is reached, to obtain the trained generative adversarial network;
step 4, using the trained generative adversarial network to generate new minority-class defect data according to the resampling rate calculated in step 1, to obtain the resampled training set;
step 5, training a strong software defect classifier with the AdaBoost ensemble learning method by inputting the training set resampled in step 4 into the classifier; and checking the classifier on the test set to obtain the software defect prediction evaluation indexes.
2. The software defect prediction method based on a generative adversarial network and ensemble learning of claim 1, wherein the loss functions of the discriminator and the generator are given by equations (1) and (2):
$$\mathrm{DLoss} = \mathrm{BCELoss}(x_{true}, y_{true}) + \mathrm{BCELoss}(x_{fake}, y_{fake}) \tag{1}$$
$$\mathrm{GLoss} = \mathrm{BCELoss}(x_{fake}, y_{true}) \tag{2}$$
where DLoss is the loss function of the discriminator and GLoss is the loss function of the generator; both the generator and the discriminator use the Adam optimizer with three hyperparameters: learning rate, Betas and number of iterations; $x_{fake}$, $x_{true}$, $y_{true}$ and $y_{fake}$ are the forged data, the real data, the real-data label and the forged-data label, respectively; BCELoss is the binary cross entropy loss function:
$$\mathrm{BCELoss}(x, y) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log x_i + (1 - y_i)\log(1 - x_i)\right] \tag{3}$$
where $x_i$ and $y_i$ are the i-th software defect sample and its corresponding label, and n is the total number of samples.
3. The software defect prediction method based on a generative adversarial network and ensemble learning of claim 2, wherein the generator in step 2 comprises an initial random noise input layer, a final generated-data output layer, and block structures each composed of a linear layer, a batch normalization layer, and a LeakyReLU activation function; the batch normalization layer is omitted from the first block structure after the noise input layer and from the block structure immediately preceding the data output layer; the data output layer uses a Sigmoid activation function; the noise input has dimension noise_dim and passes through 3 block structures, and the final output has dimension out_dim;
the discriminator in step 2 comprises an initial data input layer, a final discrimination-result output layer, and block structures each composed of a linear layer and a LeakyReLU activation function; the input data has dimension out_dim, the same as the dimension of the data output by the generator; the discrimination-result output layer uses a linear layer and a Sigmoid activation function, and the final output of the discriminator is a value between 0 and 1.
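The layer ordering described in this claim can be written down as a plain layer plan. The sketch below only lists layers in order; it does not build a trainable network. The hidden widths, the LeakyReLU slope of 0.2, and the number of discriminator blocks are not given in the claim and are hypothetical. The layer names are chosen to match the commonly used `torch.nn` module names (`Linear`, `BatchNorm1d`, `LeakyReLU`, `Sigmoid`), which is an assumption about the framework.

```python
def generator_plan(noise_dim, hidden_dims, out_dim, slope=0.2):
    """Describe the generator as an ordered list of (layer_name, *args) tuples."""
    if len(hidden_dims) != 3:
        raise ValueError("the claim describes exactly 3 block structures")
    plan, in_dim = [], noise_dim
    for i, width in enumerate(hidden_dims):
        plan.append(("Linear", in_dim, width))
        # Batch normalization appears only in the middle block: it is omitted in the
        # first block and in the block immediately preceding the output layer.
        if i == 1:
            plan.append(("BatchNorm1d", width))
        plan.append(("LeakyReLU", slope))
        in_dim = width
    plan.append(("Linear", in_dim, out_dim))  # data output layer
    plan.append(("Sigmoid",))                 # keeps generated features in (0, 1)
    return plan


def discriminator_plan(in_dim, hidden_dims, slope=0.2):
    """Describe the discriminator: Linear + LeakyReLU blocks, then Linear + Sigmoid."""
    plan = []
    for width in hidden_dims:
        plan.append(("Linear", in_dim, width))
        plan.append(("LeakyReLU", slope))
        in_dim = width
    plan.append(("Linear", in_dim, 1))  # single score per sample
    plan.append(("Sigmoid",))           # output between 0 and 1
    return plan


for layer in generator_plan(noise_dim=16, hidden_dims=[32, 64, 32], out_dim=20):
    print(layer)
```

The Sigmoid output of the generator matches the 0-to-1 range of the normalized training data, and the discriminator's input width equals the generator's `out_dim`, as the claim requires.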
4. The software defect prediction method based on a generative adversarial network and ensemble learning of claim 1, wherein step 3 comprises the following steps:
step 3.1, normalizing the training set so that its values lie between 0 and 1;
step 3.2, inputting random noise with values in the range 0 to 1 into the generator G to generate forged data, and labeling the forged data as 0; labeling the training set normalized in step 3.1 as 1, inputting it together with the forged data into the discriminator D, and distinguishing the real data from the forged data;
step 3.3, repeating step 3.2 until the generative adversarial network converges or the set number of iterations is reached, to obtain the trained generator G and discriminator D.
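The data preparation in steps 3.1 and 3.2 can be sketched as below. The claim does not name the normalization method; per-feature min-max scaling is assumed here because it maps values into the stated 0-to-1 range. All function names are hypothetical, and the generator and discriminator themselves are left out.

```python
import random


def fit_min_max(rows):
    """Return per-feature minima and maxima of a list of feature rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def normalize(rows, mins, maxs):
    """Min-max scale every feature into [0, 1]; constant features map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def sample_noise(batch_size, noise_dim, rng):
    """Uniform random noise in [0, 1), used as generator input."""
    return [[rng.random() for _ in range(noise_dim)] for _ in range(batch_size)]


def label_batch(real_rows, fake_rows):
    """Pair real samples with label 1 and forged samples with label 0."""
    return [(r, 1) for r in real_rows] + [(f, 0) for f in fake_rows]


rows = [[10, 0.5], [20, 1.5], [30, 2.5]]  # toy software metrics
mins, maxs = fit_min_max(rows)
real = normalize(rows, mins, maxs)
print(real)  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]

noise = sample_noise(batch_size=3, noise_dim=4, rng=random.Random(0))
```

The stored `mins` and `maxs` are needed again later to map generated samples back to the original feature scale.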
5. The software defect prediction method based on a generative adversarial network and ensemble learning of claim 1, wherein the specific operation of step 4 is as follows: step 4.1, calculating, according to the resampling rate k from step 1, the amount of minority-class software defect data that needs to be generated; inputting random noise with values from 0 to 1 into the generator G of the generative adversarial network trained in step 3, and obtaining the generated minority-class defect data from the generator's output;
step 4.2, applying to the minority-class defect data generated in step 4.1 the inverse of the normalization performed in step 3.1, and merging the result with the original minority-class software defect training data to obtain the resampled training set.
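The claim does not state how the number of samples to generate follows from k. The sketch below assumes the goal is to balance the two classes, so that the number of defect samples after resampling equals the number of non-defect samples, and it assumes min-max normalization was used so that the inverse operation is a simple rescaling. Function names are hypothetical.

```python
def n_to_generate(n_defect, k):
    """Synthetic defect samples needed to balance the classes.

    Assumes k = n_defect / n_non_defect and a target ratio of 1:1.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    n_non_defect = round(n_defect / k)
    return max(n_non_defect - n_defect, 0)


def denormalize(rows, mins, maxs):
    """Invert min-max scaling: map values in [0, 1] back to the original range."""
    return [
        [v * (hi - lo) + lo for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


print(n_to_generate(n_defect=2, k=0.25))  # 6
print(denormalize([[0.5, 0.5]], mins=[10, 0.5], maxs=[30, 2.5]))  # [[20.0, 1.5]]
```

In the first call, 2 defect samples at k = 0.25 imply 8 non-defect samples, so 6 synthetic defect samples would be needed under the balancing assumption.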
6. The software defect prediction method based on a generative adversarial network and ensemble learning of claim 1, wherein the specific operation of step 5 is as follows:
step 5.1, applying ten-fold cross-validation to the training set resampled in step 4: dividing the whole training data set into 10 parts, using each part in turn as the validation set to evaluate the classifier during training, repeating this process 10 times, and finally taking a weighted average to obtain the training performance index of the classifier; when training is complete, a trained strong software defect classifier C is obtained;
step 5.2, applying the strong software defect classifier C from step 5.1 to the test set to obtain its classification results, and evaluating the classification performance of C on the test set.
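The ten-fold procedure in step 5.1 can be sketched as an index generator plus a weighted average. The claim does not say what the weights are; weighting each fold's score by its validation-set size is an assumption made here. The sketch splits indices in order, so shuffling the data beforehand is left to the caller. Function names are hypothetical.

```python
def k_fold_indices(n_samples, n_folds=10):
    """Yield (train_indices, validation_indices); each fold is the validation set once."""
    if n_samples < n_folds:
        raise ValueError("need at least one sample per fold")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder over early folds
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def weighted_average(scores, weights):
    """Average of per-fold scores, weighted (here) by validation-set size."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


folds = list(k_fold_indices(25))
print([len(val) for _, val in folds])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
print(weighted_average([1.0, 0.0], [3, 1]))  # 0.75
```

Each sample appears in exactly one validation set, so every part of the resampled training set is used once to check the classifier.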
7. The software defect prediction method based on a generative adversarial network and ensemble learning of claim 6, wherein the strong software defect classifier C in step 5.1 is trained with the AdaBoost learning method, and the training process that yields the strong classifier is given by formulas (6), (7), and (8):
$$\alpha_m = \frac{1}{2}\ln\frac{1 - e_m}{e_m} \tag{6}$$
$$f(x) = \sum_{m} \alpha_m C_m(x) \tag{7}$$
$$C(x) = \mathrm{sign}\big(f(x)\big) \tag{8}$$
wherein $\alpha_m$ is the weight of the m-th decision tree weak classifier, $C_m(x)$ is the m-th decision tree weak classifier, $C(x)$ is the strong classifier obtained by ensemble learning, sign is the sign function applied to the weighted sum of the weak classifier results, and $e_m$ is the classification error rate of the m-th decision tree weak classifier.
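How the weak classifier weights and the sign function combine can be sketched with the textbook AdaBoost rule, in which the weight of a weak classifier is half the log-odds of its accuracy. Whether the patent's formulas use exactly this form is an assumption. The three decision stumps, their thresholds, and their error rates below are hypothetical values chosen for illustration, with weak classifier outputs encoded as +1 (defect) and -1 (non-defect).

```python
import math


def classifier_weight(error_rate):
    """Textbook AdaBoost weight: alpha_m = 0.5 * ln((1 - e_m) / e_m)."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error rate must lie strictly between 0 and 1")
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


def strong_classify(x, weak_classifiers, alphas):
    """C(x) = sign(sum of alpha_m * C_m(x)); ties are resolved to +1 in this sketch."""
    score = sum(a * c(x) for a, c in zip(alphas, weak_classifiers))
    return 1 if score >= 0 else -1


# Three hypothetical decision stumps over two software metrics.
weak = [
    lambda x: 1 if x[0] > 10 else -1,
    lambda x: 1 if x[1] > 3 else -1,
    lambda x: 1 if x[0] > 50 else -1,
]
errors = [0.2, 0.3, 0.4]  # hypothetical weighted error rates e_m
alphas = [classifier_weight(e) for e in errors]

print(strong_classify([20, 1], weak, alphas))  # 1
print(strong_classify([5, 5], weak, alphas))   # -1
```

In the first call the most accurate stump votes +1 and outweighs the two weaker stumps that vote -1, which shows why the vote is weighted rather than a simple majority.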
CN202111243350.7A 2021-10-25 2021-10-25 Software defect prediction method based on generation countermeasure network and ensemble learning Pending CN114064459A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111243350.7A CN114064459A (en) 2021-10-25 2021-10-25 Software defect prediction method based on generation countermeasure network and ensemble learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111243350.7A CN114064459A (en) 2021-10-25 2021-10-25 Software defect prediction method based on generation countermeasure network and ensemble learning

Publications (1)

Publication Number Publication Date
CN114064459A true CN114064459A (en) 2022-02-18

Family

ID=80235445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111243350.7A Pending CN114064459A (en) 2021-10-25 2021-10-25 Software defect prediction method based on generation countermeasure network and ensemble learning

Country Status (1)

Country Link
CN (1) CN114064459A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114356641A (en) * 2022-03-04 2022-04-15 中南大学 Incremental software defect prediction method, system, equipment and storage medium
CN114706780A (en) * 2022-04-13 2022-07-05 北京理工大学 Software defect prediction method based on Stacking ensemble learning

Similar Documents

Publication Publication Date Title
CN111626336B (en) Subway fault data classification method based on unbalanced data set
CN108228716B (en) SMOTE _ Bagging integrated sewage treatment fault diagnosis method based on weighted extreme learning machine
CN104866578B (en) A kind of imperfect Internet of Things data mixing fill method
CN109739844B (en) Data classification method based on attenuation weight
CN106228980A (en) Data processing method and device
CN114064459A (en) Software defect prediction method based on generation countermeasure network and ensemble learning
CN112364352B (en) Method and system for detecting and recommending interpretable software loopholes
WO2021115186A1 (en) Ann-based program test method and test system, and application
CN112232526B (en) Geological disaster vulnerability evaluation method and system based on integration strategy
CN112924177A (en) Rolling bearing fault diagnosis method for improved deep Q network
CN115510965A (en) Bearing imbalance fault diagnosis method based on generated data fusion
CN110349597A (en) A kind of speech detection method and device
CN112700324A (en) User loan default prediction method based on combination of Catboost and restricted Boltzmann machine
CN111427775A (en) Method level defect positioning method based on Bert model
CN112215696A (en) Personal credit evaluation and interpretation method, device, equipment and storage medium based on time sequence attribution analysis
CN115587543A (en) Federal learning and LSTM-based tool residual life prediction method and system
CN107392217B (en) Computer-implemented information processing method and device
CN109242165A (en) A kind of model training and prediction technique and device based on model training
Zheng et al. A novel imbalanced ensemble learning in software defect predication
CN107305565A (en) Information processor, information processing method and message processing device
CN117472789B (en) Software defect prediction model construction method and device based on ensemble learning
CN109902731A (en) A kind of detection method and device of the performance fault based on support vector machines
Dovbysh et al. Estimation of Informativeness of Recognition Signs at Extreme Information Machine Learning of Knowledge Control System.
CN113283467A (en) Weak supervision picture classification method based on average loss and category-by-category selection
CN108629680A (en) A kind of Risk Identification Method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination