CN111090579A - Software defect prediction method based on Pearson correlation weighting association classification rule - Google Patents
- Publication number
- CN111090579A CN201911114620.7A
- Authority
- CN
- China
- Prior art keywords
- tendency
- defect
- defective
- rule
- class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3608—Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a software defect prediction method based on Pearson correlation weighted association classification rules. The method comprises: extracting the metric element data set of the software under test with a corresponding static code analysis tool; evaluating the correlation between each metric element and each class with a feature selection method based on the Pearson correlation, ranking the correlations, and taking the top 30-50% of metric elements with the highest correlation as the selected metric elements; and substituting the selected metric elements and the corresponding classes into a software defect prediction model based on Pearson correlation weighted association classification rules, which performs the prediction and outputs the prediction result. The method uses a valuable, high-performance, and understandable rule model to reveal the association between defect tendency and features, which improves the performance and understandability of the software defect prediction model and the accuracy of the prediction result.
Description
Technical Field
The invention relates to the technical field of software defect prediction, and in particular to a software defect prediction method based on Pearson correlation weighted association classification rules.
Background
As software grows in size and complexity, ensuring software quality becomes increasingly important. Software defect prediction is one way to improve software quality, and an effective means of reducing the code review burden and improving the allocation of test resources. Commonly used software defect prediction methods include classification, regression, clustering, and association rules. Association rule mining is an algorithm for discovering association relationships hidden in data. Its production-rule form matches the way people reason: a rule is written "if X then C" or X ⇒ C, where X ⊂ I, C ⊂ I, and X ∩ C = ∅. The antecedent X is a set of features (also called items or metric elements), the consequent C can be a set of features or of classes (e.g., positive and negative classes), and I = {I1, I2, ···, Im-1, C} is the item set containing m features. Whether an association rule is useful is typically measured by its support and confidence: the greater the support and confidence, the more useful the rule. Support is the probability that X and C occur together, while confidence is the probability that C occurs given that X occurs. Association-rule-based software defect prediction has drawn interest across the software development life cycle because it helps improve prediction performance and helps explain the association between defect conditions (such as defect tendency, defect type, and workload) and metric elements.
Association-rule-based software defect prediction mainly comprises data preprocessing, association rule model training, and model evaluation. The traditional association rule algorithm Apriori and its variants have been shown to achieve high accuracy. However, software defect prediction data are high-dimensional and class-imbalanced: high dimensionality means that the data set has many features, and class imbalance means that the samples of one class (the majority class) far outnumber those of the other classes (the minority classes). During model training, association rule algorithms therefore tend to generate a large number of majority-class (low-risk, defect-free tendency) rules and to ignore minority-class (high-risk, defect tendency) rules, so the majority class is predicted accurately while prediction performance on the minority class is low.
Most conventional association rule algorithms rely on support and confidence thresholds. If the thresholds are set too high, minority-class rules are difficult to mine and prediction performance on the minority class is low; if they are set too low, too many rules are generated, which eventually leads to overfitting. It is therefore necessary to move beyond the traditional mining and analysis framework that depends only on support and confidence.
In addition, the classical class association rule algorithms (CBA, CMAR, GARC, ECBA, etc.) treat all feature items as equally important and do not consider differences in importance among features. For example, quality models built on real data sets have found that the number of code lines and the number of blank lines affect high-risk modules to different degrees. The importance of metric elements therefore cannot be ignored; otherwise the value of the discovered knowledge suffers. Later, a number of weighted association rule mining methods were proposed that take the importance of individual items into account.
What distinguishes a rule-based software defect prediction model from non-rule-based models is that it considers not only the importance of a single attribute feature but also the importance of an item set. Weighted association classification rules shift the focus of knowledge discovery toward important items instead of expanding all combinations indiscriminately. Attribute features with higher impact are given higher weights, and attribute features with lower impact are given lower weights. High-weight attribute features thus keep a higher priority in the rule set, while low-weight attribute features have a lower priority and are removed in the pruning stage.
Domain-experience weighted association rule algorithms assign feature weights based on prior, subjective experience. This works well on data sets with few features, but on high-dimensional software defect data it cannot guarantee an accurate weight for every feature. Such rules are also subjective to a degree: the rules found tend toward already-known patterns of little value, which hinders the mining of hidden knowledge. Automatic weighted association rule algorithms have therefore attracted considerable attention. However, these algorithms are very sensitive to imbalanced data and are applicable to sparse data but not to dense data, among other problems.
How to better solve the problem of assigning feature weights for imbalanced software defect data, and thereby improve prediction accuracy, is therefore an urgent problem for practitioners in this field.
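The two rule measures described above can be illustrated with a minimal Python sketch. The toy modules, the item names (`loc=high`, `cc=low`), and the class labels are hypothetical and are not taken from the patent; each module is represented as a set of discretized metric items plus its class label.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) = support(X and C) / support(X)."""
    supp_x = support(transactions, antecedent)
    if supp_x == 0:
        return 0.0
    return support(transactions, antecedent | consequent) / supp_x


# Hypothetical toy data: four modules with two discretized metrics each.
modules = [
    frozenset({"loc=high", "cc=high", "defective"}),
    frozenset({"loc=high", "cc=low", "defective"}),
    frozenset({"loc=high", "cc=low", "clean"}),
    frozenset({"loc=low", "cc=low", "clean"}),
]
x = frozenset({"loc=high"})
c = frozenset({"defective"})

print(support(modules, x | c))    # support of the rule "if loc=high then defective"
print(confidence(modules, x, c))  # confidence of the same rule
```

Here the rule "if loc=high then defective" is matched by two of four modules, and two of the three modules with `loc=high` are defective.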
Disclosure of Invention
In view of the above problems, the invention provides a software defect prediction method based on Pearson correlation weighted association classification rules. Using an improved association rule algorithm, the corresponding software defect prediction model addresses the problem that few defect tendency rules are found when mining software defect data.
In a first aspect, an embodiment of the present invention provides a software defect prediction method based on Pearson correlation weighted association classification rules, including:
S1, extracting the metric element data set of the software under test with a corresponding static code analysis tool;
S2, evaluating the correlation between each metric element and each class with a feature selection method based on the Pearson correlation, ranking the correlations, and taking the top 30-50% of metric elements with the highest correlation as the selected metric elements;
S3, substituting the selected metric elements and the corresponding classes into a software defect prediction model based on Pearson correlation weighted association classification rules, performing prediction, and outputting the prediction result.
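Step S2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class label is assumed to be coded as 0/1, features are ranked by the absolute value of the Pearson coefficient, and the metric names and values are hypothetical. The `fraction` parameter stands for the 30-50% range named in the text.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_features(columns, labels, fraction=0.4):
    """Rank features by |r| against the class and keep the top `fraction`."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:keep], scores


# Hypothetical metric columns for six modules; labels: 1 = defect tendency.
columns = {
    "loc": [10, 200, 30, 400, 25, 350],
    "cc": [1, 5, 2, 9, 1, 7],
    "comments": [3, 3, 4, 4, 5, 5],
    "const": [1, 1, 1, 1, 1, 1],
}
labels = [0, 1, 0, 1, 0, 1]

selected, scores = select_top_features(columns, labels, fraction=0.5)
print(selected)
```

With these toy values, `loc` and `cc` track the label closely while `comments` and `const` do not, so the two former metrics are the ones kept.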
In one embodiment, the software defect prediction model based on Pearson correlation weighted association classification rules is constructed by the following steps:
S31, preprocessing the acquired software defect data to obtain a software defect training set and a software defect test set, the software defect training set consisting of a defect tendency training set and a defect-free tendency training set;
S32, setting a minimum weighted support for the defect tendency class and a minimum weighted support for the defect-free tendency class, and constructing a defect tendency association rule set and a defect-free tendency association rule set from the defect tendency training set and the defect-free tendency training set, respectively, with a weighted Apriori algorithm;
S33, sorting the defect tendency association rule set and the defect-free tendency association rule set;
S34, pruning the defect tendency association rule set and the defect-free tendency association rule set together with a conflict rule pruning method and a redundant rule pruning method;
S35, predicting the software module with the optimized defect tendency association rule set and defect-free tendency association rule set.
In one embodiment, the constructing step further comprises:
S36, verifying the software defect prediction model based on Pearson correlation weighted association classification rules with the test set.
In one embodiment, the step S31 includes:
S311, performing a first horizontal division of each data set D by class into a defect tendency data set TS and a defect-free tendency data set FS, where D = TS ∪ FS and TS ∩ FS = ∅;
S312, performing a first vertical division of the defect tendency data set TS and the defect-free tendency data set FS into a defect tendency training set TTS, a defect tendency test set Ttest, a defect-free tendency training set FTS, and a defect-free tendency test set Ftest, where TS = TTS ∪ Ttest, TTS ∩ Ttest = ∅, FS = FTS ∪ Ftest, and FTS ∩ Ftest = ∅; the defect tendency training set TTS and the defect-free tendency training set FTS together form the complete training set;
S313, evaluating the correlation between each feature of the training set and each class with a feature selection method based on the Pearson correlation, ranking the correlations, and taking the top 30-50% of features with the highest correlation as the selected features;
S314, discretizing the training set after feature selection with a 5-bin equal-frequency method, which sorts the numerical values of each feature by magnitude and divides them evenly into five intervals;
S315, performing a second horizontal division of the discretized training set DS into a discretized defective tendency training set FDS and a discretized nondefective training set TDS, where DS = FDS ∪ TDS and FDS ∩ TDS = ∅.
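The equal-frequency discretization of step S314 can be sketched as below. This is a simplified illustration: the function name is hypothetical, and tied values are assigned by sort position, so identical values can land in adjacent intervals, which a production implementation would likely handle differently.

```python
def equal_frequency_bins(values, n_bins=5):
    """Return one bin index (0 .. n_bins-1) per value.

    Values are ranked by magnitude and the ranks are cut into `n_bins`
    groups of (nearly) equal size.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


# Hypothetical metric column with ten modules: two modules fall in each interval.
loc = [7, 1, 9, 3, 5, 10, 2, 8, 4, 6]
print(equal_frequency_bins(loc))  # [3, 0, 4, 1, 2, 4, 0, 3, 1, 2]
```

Each feature would be discretized independently in this way, so that every item in a rule antecedent is a (feature, interval) pair.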
In one embodiment, the step S312 includes:
5-fold cross-validation is used for evaluation: in each round, 4 folds are selected as the training set and 1 fold as the test set; the data are randomized each time and the procedure is repeated 10 times, thereby performing the first vertical division of the defect tendency data set TS and the defect-free tendency data set FS.
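A minimal sketch of this division, assuming each class is shuffled and split into folds separately so that every fold preserves the class ratio. The function name, the `seed` parameter, and the toy module identifiers are illustrative assumptions.

```python
import random


def repeated_class_wise_folds(ts, fs, n_folds=5, n_repeats=10, seed=0):
    """Yield (TTS, Ttest, FTS, Ftest) for each fold of each repetition.

    `ts` and `fs` are the defect tendency and defect-free tendency data sets.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        ts_shuffled, fs_shuffled = ts[:], fs[:]
        rng.shuffle(ts_shuffled)  # data are re-randomized for every repetition
        rng.shuffle(fs_shuffled)
        for fold in range(n_folds):
            ttest = ts_shuffled[fold::n_folds]
            ftest = fs_shuffled[fold::n_folds]
            tts = [m for i, m in enumerate(ts_shuffled) if i % n_folds != fold]
            fts = [m for i, m in enumerate(fs_shuffled) if i % n_folds != fold]
            yield tts, ttest, fts, ftest


# Hypothetical imbalanced data: 10 defect tendency and 40 defect-free modules.
ts = [f"T{i}" for i in range(10)]
fs = [f"F{i}" for i in range(40)]
splits = list(repeated_class_wise_folds(ts, fs))
print(len(splits))  # 50 train/test splits: 5 folds x 10 repetitions
tts, ttest, fts, ftest = splits[0]
print(len(tts), len(ttest), len(fts), len(ftest))  # 8 2 32 8
```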
In one embodiment, the step S32 includes:
S321, calculating the correlation between each selected feature and the class in the discretized training set with the Pearson correlation, wherein a feature is a metric element; the calculation formula is as follows:
wherein n represents the number of features; k indexes the features (k = 0, 1, 2, …, n); class represents the class (defect tendency class or defect-free tendency class); and rank(k) represents the correlation of the k-th feature;
S322, calculating the weight of each selected feature from its correlation, wherein the calculation formula is as follows:
wherein n represents the number of features, k indexes the features (k = 0, 1, 2, …, n), and weight(k) represents the weight of feature k;
S323, calculating the weight of an item set from the feature weights, wherein the calculation formula of the item set weight is as follows:
weight(itemset)=weight(X1)*weight(X2)···weight(Xk)
=weight(item1)*weight(item2)···weight(itemk) (3);
S324, obtaining the item set support as the probability that the items of the feature item set occur together in the training set, calculating the item set weight with formula (3), and calculating the weighted support from the item set support and the item set weight, wherein the calculation formula is as follows:
wsupp(r)=weight(itemset)*support(itemset) (4);
S325, mining the weighted candidate item sets that meet the minimum weighted support of the defect tendency class and those that meet the minimum weighted support of the defect-free tendency class, respectively. Weighted candidate item sets whose weighted support is not less than the minimum are called frequent item sets, and each frequent item set forms a class association rule whose antecedent is a feature set and whose consequent is a class label (defect tendency class or defect-free tendency class). Here, confidence is the probability that the defect tendency class or the defect-free tendency class occurs given that the antecedent occurs. Within the training set of each class, the feature item sets vary but the class label is constant, so the confidence always equals 1. Confidence therefore cannot evaluate the accuracy of a rule in this method, and its influence on the class association rules can be ignored.
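Formulas (3) and (4) and the class-wise mining of step S325 can be sketched as follows. This is an illustration under stated assumptions: feature weights are given as inputs (the weight formula itself is not reproduced here), items are hypothetical (feature, interval) pairs, and candidates are enumerated exhaustively up to `max_len` instead of using the level-wise candidate generation of a real weighted Apriori implementation.

```python
from itertools import combinations
from math import prod


def weighted_support(itemset, transactions, feature_weight):
    """wsupp = weight(itemset) * support(itemset), as in formulas (3) and (4)."""
    supp = sum(1 for t in transactions if itemset <= t) / len(transactions)
    weight = prod(feature_weight[feature] for feature, _ in itemset)
    return weight * supp


def mine_class_rules(transactions, feature_weight, min_wsupp, label, max_len=3):
    """Return (antecedent, label, wsupp) for every weighted frequent item set."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            if len({feature for feature, _ in combo}) < size:
                continue  # an antecedent holds at most one interval per feature
            itemset = frozenset(combo)
            ws = weighted_support(itemset, transactions, feature_weight)
            if ws >= min_wsupp:
                rules.append((itemset, label, ws))
    return rules


# Hypothetical discretized defect tendency training set and feature weights.
defect_training_set = [
    frozenset({("loc", 4), ("cc", 4)}),
    frozenset({("loc", 4), ("cc", 3)}),
    frozenset({("loc", 3), ("cc", 4)}),
]
weights = {"loc": 0.8, "cc": 0.5}

defect_rules = mine_class_rules(defect_training_set, weights, 0.3, "defect tendency")
for antecedent, label, ws in defect_rules:
    print(sorted(antecedent), "=>", label, round(ws, 3))
```

The same function would be called a second time on the defect-free tendency training set with its own minimum weighted support, which is what lets the minority class use a lower threshold than the majority class.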
In one embodiment, the step S33 includes:
the defect tendency association rule set and the defect-free tendency association rule set are each sorted by priority, based on, in order, the weighted support, the length of the antecedent, and the order of generation.
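The three-level sort can be sketched as below. The text names the three keys but not the direction of the antecedent-length key, so the `shorter_first` flag is an assumption; the `Rule` type and its field names are likewise hypothetical.

```python
from typing import FrozenSet, NamedTuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    label: str
    wsupp: float
    order: int  # position at which the rule was generated


def sort_rules(rules, shorter_first=True):
    """Sort by weighted support (descending), antecedent length, generation order."""
    sign = 1 if shorter_first else -1
    return sorted(rules, key=lambda r: (-r.wsupp, sign * len(r.antecedent), r.order))


r1 = Rule(frozenset({"a", "b"}), "defect tendency", 0.4, 0)
r2 = Rule(frozenset({"a"}), "defect tendency", 0.4, 1)
r3 = Rule(frozenset({"c"}), "defect tendency", 0.7, 2)
r4 = Rule(frozenset({"d"}), "defect tendency", 0.4, 3)

print([r.order for r in sort_rules([r1, r2, r3, r4])])  # [2, 1, 3, 0]
```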
In one embodiment, the conflict rule pruning in step S34 is: when the antecedents of two rules have the same features and their consequents belong to different classes, both rules are removed;
the redundant rule pruning in step S34 is: when the antecedents of several rules have an inclusion relation and their consequents belong to the same class, the redundant rule with the smaller weighted support is pruned.
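The two pruning steps can be sketched as follows. The `Rule` type is hypothetical, and the handling of ties (two nested rules with equal weighted support are both kept) is an assumption that the text does not settle.

```python
from typing import FrozenSet, NamedTuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    label: str
    wsupp: float


def prune_conflicts(rules):
    """Drop every rule whose antecedent also appears with a different class."""
    labels_by_antecedent = {}
    for r in rules:
        labels_by_antecedent.setdefault(r.antecedent, set()).add(r.label)
    return [r for r in rules if len(labels_by_antecedent[r.antecedent]) == 1]


def prune_redundant(rules):
    """Drop a rule when a same-class rule with a nested antecedent has higher wsupp."""
    kept = []
    for r in rules:
        dominated = any(
            o.label == r.label
            and (o.antecedent < r.antecedent or r.antecedent < o.antecedent)
            and o.wsupp > r.wsupp
            for o in rules
        )
        if not dominated:
            kept.append(r)
    return kept


a = Rule(frozenset({"loc=4"}), "defect", 0.5)
b = Rule(frozenset({"loc=4", "cc=4"}), "defect", 0.2)   # redundant with a
c = Rule(frozenset({"cc=2"}), "defect", 0.3)            # conflicts with d
d = Rule(frozenset({"cc=2"}), "defect-free", 0.4)
e = Rule(frozenset({"loc=0"}), "defect-free", 0.1)

pruned = prune_redundant(prune_conflicts([a, b, c, d, e]))
print(pruned)
```

In this toy run, `c` and `d` are removed as a conflicting pair and `b` is removed because the shorter rule `a` covers it with a higher weighted support, leaving `a` and `e`.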
In one embodiment, the step S35 includes:
if the sum of the weighted supports of the high-risk defect tendency rules that the software module satisfies is larger than the sum of the weighted supports of the low-risk defect-free tendency rules that it satisfies, the software module is classified into the defect tendency class; otherwise it is classified into the defect-free tendency class. If the software module satisfies none of the rules, it is classified into the defect tendency class, as expressed by the formula:
wherein c represents the defect tendency class or the defect-free tendency class, r_c represents a defect tendency class rule or defect-free tendency class rule that satisfies the condition, and R represents the defect tendency class rule set or the defect-free tendency class rule set.
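The voting rule described in this step can be sketched as below. The rule representation, as (antecedent, weighted support) pairs over hypothetical (feature, interval) items, and the class label strings are illustrative assumptions; a module "satisfies" a rule when it contains every item of the rule antecedent.

```python
def predict(module_items, defect_rules, defect_free_rules):
    """Classify a module by comparing summed weighted supports of matching rules."""
    defect_hits = [ws for antecedent, ws in defect_rules if antecedent <= module_items]
    free_hits = [ws for antecedent, ws in defect_free_rules if antecedent <= module_items]
    if not defect_hits and not free_hits:
        return "defect tendency"  # default class when no rule matches
    if sum(defect_hits) > sum(free_hits):
        return "defect tendency"
    return "defect-free tendency"


# Hypothetical pruned rule sets.
defect_rules = [(frozenset({("loc", 4)}), 0.5), (frozenset({("cc", 4)}), 0.3)]
defect_free_rules = [(frozenset({("loc", 0)}), 0.6), (frozenset({("cc", 0)}), 0.4)]

module_a = frozenset({("loc", 4), ("cc", 0)})  # 0.5 vs 0.4
module_b = frozenset({("loc", 0), ("cc", 4)})  # 0.3 vs 0.6
module_c = frozenset({("loc", 2), ("cc", 2)})  # no rule matches

for module in (module_a, module_b, module_c):
    print(predict(module, defect_rules, defect_free_rules))
```

Defaulting unmatched modules to the defect tendency class favors recall on the minority class, at the cost of some extra false alarms.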
In one embodiment, the step S36 includes:
substituting the test set into the software defect prediction model based on Pearson correlation weighted association classification rules to obtain an evaluation result;
evaluating the classification performance of the evaluation result with G-mean, MCC, and Balance;
the evaluation indexes G-mean, MCC, and Balance are defined as follows:
wherein TP is the number of defective modules classified as defective, FN is the number of defective modules classified as non-defective, FP is the number of non-defective modules classified as defective, and TN is the number of non-defective modules classified as non-defective;
and comparing the evaluation indexes of the software defect prediction model based on Pearson correlation weighted association classification rules with those of prediction models based on classical association rule algorithms.
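The three indexes can be computed from the confusion matrix as sketched below. The formula images are not present in this text, so the code assumes the definitions commonly used in the defect prediction literature: G-mean as the geometric mean of the true positive rate and the true negative rate, MCC as the Matthews correlation coefficient, and Balance as one minus the normalized distance from the ideal point (pf = 0, pd = 1). Both classes are assumed to be present in the test set.

```python
import math


def g_mean(tp, fn, fp, tn):
    """Geometric mean of recall on the defective and non-defective classes."""
    pd = tp / (tp + fn)   # probability of detection (true positive rate)
    tnr = tn / (tn + fp)  # true negative rate
    return math.sqrt(pd * tnr)


def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def balance(tp, fn, fp, tn):
    """1 minus the normalized Euclidean distance to the point (pf=0, pd=1)."""
    pd = tp / (tp + fn)
    pf = fp / (fp + tn)   # probability of false alarm
    return 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)


# Hypothetical confusion matrix for an imbalanced test set.
tp, fn, fp, tn = 8, 2, 5, 35
print(g_mean(tp, fn, fp, tn), mcc(tp, fn, fp, tn), balance(tp, fn, fp, tn))
```

All three indexes account for both classes, which is why they are preferred over plain accuracy on imbalanced defect data.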
The invention provides a software defect prediction method based on Pearson correlation weighted association classification rules. It assigns different weights to the metric elements with a weighting method based on the Pearson correlation; because the weights are calculated from the correlations observed in the samples, the subjectivity of manual weight setting is avoided. The software defect prediction model based on Pearson correlation weighted association classification rules improves the accuracy of the prediction result.
Furthermore, by constructing an association-classification-rule-based software defect prediction framework for imbalanced data, the invention provides an automatic feature weighting method that is insensitive to imbalanced data and combines it with the generation, sorting, pruning, and prediction of association classification rules to form a valuable, high-performance, and understandable rule model. The model reveals the association between defect tendency and metric elements and improves the performance and understandability of the software defect prediction model. The four stages of association rule generation, sorting, pruning, and prediction are all optimized with the weighted support, which improves the accuracy of the prediction result.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of a software defect prediction method based on Pearson correlation weighted association classification rules provided by an embodiment of the present invention;
FIG. 2 is a flowchart of constructing the software defect prediction model based on Pearson correlation weighted association classification rules;
FIG. 3 is a flowchart of the preprocessing provided by an embodiment of the present invention;
FIG. 4 is a flowchart of step S32 provided by an embodiment of the present invention;
FIG. 5 is another flowchart of constructing the software defect prediction model based on Pearson correlation weighted association classification rules provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Referring to FIG. 1, a software defect prediction method based on Pearson correlation weighted association classification rules provided by an embodiment of the present invention includes:
S1, extracting the metric element data set of the software under test with a corresponding static code analysis tool;
S2, evaluating the correlation between each metric element and each class with a feature selection method based on the Pearson correlation, ranking the correlations, and taking the top 30-50% of metric elements with the highest correlation as the selected metric elements;
S3, substituting the selected metric elements and the corresponding classes into a software defect prediction model based on Pearson correlation weighted association classification rules, performing prediction, and outputting the prediction result.
In this embodiment, the static code analysis tool in step S1 is a code analysis tool that scans the program code, without executing the program, using techniques such as lexical analysis, syntax analysis, and control flow analysis, to verify whether the code meets criteria such as standards compliance, security, reliability, and maintainability.
In steps S2-S3, feature selection is performed on the metric element data set based on the Pearson correlation, and the top 30-50% of metric elements with the highest correlation are taken as the selected metric elements. The selected metric elements and the corresponding classes are then substituted into the software defect prediction model based on Pearson correlation weighted association classification rules, which performs the prediction and outputs the prediction result.
Predicting with the software defect prediction model based on Pearson correlation weighted association classification rules addresses the problem that valuable high-risk defect tendency rules cannot be mined when association classification rules are used to construct a software defect prediction model. A multi-support generation mechanism and feature weights are considered during rule generation, which improves the quantity and quality of valuable high-risk defect tendency rules, gives the model higher prediction performance and understandability, and improves the accuracy of the prediction result.
As shown in FIG. 2, in one embodiment, the software defect prediction model based on Pearson correlation weighted association classification rules is constructed by the following steps:
S31, preprocessing the acquired software defect data to obtain a software defect training set and a software defect test set, the software defect training set consisting of a defect tendency training set and a defect-free tendency training set;
S32, setting a minimum weighted support for the defect tendency class and a minimum weighted support for the defect-free tendency class, and constructing a defect tendency association rule set and a defect-free tendency association rule set from the defect tendency training set and the defect-free tendency training set, respectively, with a weighted Apriori algorithm; here, defect tendency can be understood as high-risk defect tendency, and defect-free tendency as low-risk defect tendency;
S33, sorting the defect tendency association rule set and the defect-free tendency association rule set;
S34, pruning the defect tendency association rule set and the defect-free tendency association rule set together with a conflict rule pruning method and a redundant rule pruning method;
S35, predicting the software module with the optimized defect tendency association rule set and defect-free tendency association rule set.
In one embodiment, the constructing step further comprises:
S36, verifying the software defect prediction model based on Pearson correlation weighted association classification rules with a test set.
In this embodiment, association classification rules are used because they offer good prediction performance and understandability, and the method targets the problem that few valuable high-risk defect tendency rules are found under imbalanced data. Preprocessing such as feature selection, data division, and discretization is first performed on the software defect metric element data. An association-rule-based software defect prediction model is then constructed on the preprocessed software defect training set; the model comprises a weighted association rule generation stage, a weighted association rule sorting stage, a weighted association rule pruning stage, and a weighted association rule voting stage. Finally, the software defect prediction model based on Pearson correlation weighted association classification rules is verified and evaluated with the divided test set. The method addresses the problem that valuable high-risk defect tendency rules cannot be mined when association classification rules are used to construct a software defect prediction model: a multi-support generation mechanism and feature weights are considered during rule generation, which improves the quantity and quality of valuable high-risk defect tendency rules and gives the model high prediction performance and understandability.
The method extends and optimizes association classification rules and, to a certain extent, solves the problem that valuable high-risk defect tendency rules are easily ignored.
As shown in FIG. 3, in an embodiment, the step S31 includes:
S311, performing a first horizontal division of each data set D by class into a defect tendency data set TS and a defect-free tendency data set FS, where D = TS ∪ FS and TS ∩ FS = ∅;
S312, performing a first vertical division of the defect tendency data set TS and the defect-free tendency data set FS into a defect tendency training set TTS, a defect tendency test set Ttest, a defect-free tendency training set FTS, and a defect-free tendency test set Ftest, where TS = TTS ∪ Ttest, TTS ∩ Ttest = ∅, FS = FTS ∪ Ftest, and FTS ∩ Ftest = ∅;
S313, evaluating the correlation between each feature of the training set and each class with a feature selection method based on the Pearson correlation, ranking the correlations, and taking the top 30-50% of features with the highest correlation as the selected features;
S314, discretizing the training set after feature selection with a 5-bin equal-frequency method, which sorts the numerical values of each feature by magnitude and divides them evenly into five intervals;
S315, performing a second horizontal division of the discretized training set DS into a discretized defective tendency training set FDS and a discretized nondefective training set TDS, where DS = FDS ∪ TDS and FDS ∩ TDS = ∅.
Wherein, step S312 includes:
5-fold cross-validation is used for evaluation: in each round, 4 folds are selected as the training set and 1 fold as the test set; the data are randomized each time and the procedure is repeated 10 times, thereby performing the first vertical division of the defect tendency data set TS and the defect-free tendency data set FS.
As shown in FIG. 4, in one embodiment, step S32 includes:
S321, calculating the correlation between each selected feature and the class in the discretized training set with the Pearson correlation, wherein the calculation formula is as follows:
wherein n represents the number of features; k indexes the features (k = 0, 1, 2, …, n); class represents the class (defect tendency class or defect-free tendency class); and rank(k) represents the correlation of the k-th feature;
S322, calculating the weight of each selected feature from its correlation, wherein the calculation formula is as follows:
wherein n represents the number of features, k indexes the features (k = 0, 1, 2, …, n), and weight(k) represents the weight of feature k;
S323, calculating the weight of an item set from the feature weights, wherein the calculation formula of the item set weight is as follows:
weight(itemset)=weight(X1)*weight(X2)···weight(Xk)
=weight(item1)*weight(item2)···weight(itemk) (3);
S324, obtaining the item set support as the probability that the items of the feature item set occur together in the training set, calculating the item set weight with formula (3), and calculating the weighted support from the item set support and the item set weight, wherein the calculation formula is as follows:
wsupp(r)=weight(itemset)*support(itemset) (4);
and S325, respectively mining the weighted candidate item meeting the minimum support degree of the defect tendency class and the weighted candidate item meeting the minimum support degree of the defect-free tendency class. And the weighted candidate item sets not less than the minimum support degree are called frequent item sets, and the frequent item sets respectively form a class association rule that the former part is a characteristic set and the latter part is a class label (defect tendency class or defect-free tendency class). Since the confidence of the present embodiment indicates the probability of occurrence of a defective class or a non-defective class in the case where a antecedent condition occurs. In each class of training set, the feature item set is changed, but the class label is constant, that is, the confidence is equal to 1, in this embodiment, the confidence cannot evaluate the accuracy of each rule, and the influence of the confidence on the class association rule can be ignored.
In this embodiment, in steps S321-S322, the Pearson correlation is used to calculate the correlation between each feature and the class; the stronger the correlation between a feature and the class, the greater the weight given to that feature. The correlation is given by formula (1):
where n represents the number of features, k represents a feature (0, 1, 2, …, n), class represents the class (defect tendency class or defect-free tendency class), and rank(k) represents the correlation of the k-th feature.
Since a correlation may be very close to 1 or very close to 0, it cannot be used directly as the weight. Because the weighted support equals the product of the itemset weight and the support, the weighted support shrinks as the rule antecedent contains more items; a large number of rules would then be lost, and only a few rules would be generated even with a low minimum support threshold. To accommodate the correlations of different features, the weight of each feature is set to the mean of that feature's correlation with the class and the correlation of all features with the class, as in formula (2):
Wherein n represents the number of features, k represents the feature (0,1,2, …, n), and weight (k) represents the weight of the feature k;
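As an illustration of steps S321-S322, the following sketch computes the Pearson correlation of each feature with a binary class label and derives one weight per feature. The images of formulas (1) and (2) are not reproduced in this text, so the weight rule used here (the average of a feature's own absolute correlation and the mean absolute correlation over all features) is only one possible reading of the description above and should be treated as an assumption. The function names, the use of the absolute value, and the toy data are likewise hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return cov / (sx * sy)

def feature_weights(columns, labels):
    """columns maps a feature name to its values; labels holds 0/1 classes."""
    rank = {k: abs(pearson(v, labels)) for k, v in columns.items()}
    mean_rank = sum(rank.values()) / len(rank)
    # Assumed reading of formula (2): average of own correlation and the mean.
    return {k: (r + mean_rank) / 2 for k, r in rank.items()}

# Hypothetical toy data: two metrics for four modules, two of them defective.
columns = {"loc": [10, 20, 30, 40], "cbo": [1, 1, 2, 1]}
labels = [0, 0, 1, 1]
weights = feature_weights(columns, labels)
```

Under this reading every weight is pulled toward the average correlation, so a weakly correlated feature does not drive the weight of an itemset close to zero.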
In steps S33-S35, the present invention considers only rules whose consequent is a class label, which greatly reduces the influence of useless rules on the rule generation process. Traditional association rules are usually measured by two indexes, support and confidence. In the rule generation stage, the rules of each class are generated separately by class, so that P(X ∩ C) = P(X), where X represents a feature set and C represents a class label (defective class or non-defective class). According to the support and confidence formulas given below, the confidence of every rule in each class training set is 1; that is, confidence has no effect in the software defect prediction framework, and the weighted support is the only remaining index for measuring rules. Therefore, all weighted frequent itemsets satisfying the set minimum weighted support of the defect tendency class and of the defect-free tendency class are mined, and the weighted frequent itemsets are then combined to generate the weighted association rules.
The itemset weight weight(itemset) is given by formula (3); when the length of the itemset is 1, the itemset weight equals the weight of the item. The weighted support wsupp of a rule is obtained from formula (4). Unlike traditional associative classification rule generation and evaluation, the constant confidence can no longer measure a rule, so the method uses the weighted support of the majority class and the weighted support of the minority class to measure the importance of each rule.
support(X=>C)=P(X∩C)
Where X represents a set of features and C represents a class label (defective or non-defective).
confidence(X=>C)=P(X∩C)/P(X)
Where X represents a set of features and C represents a class label (defective or non-defective).
weight(itemset)=weight(X1)*weight(X2)···weight(Xk)
=weight(item1)*weight(item2)···weight(itemk) (3)
Wherein X_k denotes the k-th feature and weight(X_k) denotes the weight of the k-th feature; in the present invention a feature is equivalent to an item.
wsupp(r)=weight(itemset)*support(itemset) (4)
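Formulas (3) and (4) can be sketched as follows. The representation of an itemset as a tuple of (feature, bin) pairs, the function names, and the toy rows and weights are assumptions made for illustration only.

```python
from math import prod

def itemset_weight(itemset, weights):
    """Formula (3): product of the weights of the features in the itemset."""
    return prod(weights[feature] for feature, _ in itemset)

def support(itemset, rows):
    """Fraction of rows (dicts feature -> bin) containing every item."""
    if not rows:
        return 0.0
    hits = sum(all(row.get(f) == v for f, v in itemset) for row in rows)
    return hits / len(rows)

def weighted_support(itemset, rows, weights):
    """Formula (4): itemset weight multiplied by itemset support."""
    return itemset_weight(itemset, weights) * support(itemset, rows)

# Hypothetical discretized training rows of one class and feature weights.
rows = [
    {"loc": 4, "cbo": 2},
    {"loc": 4, "cbo": 1},
    {"loc": 3, "cbo": 2},
    {"loc": 4, "cbo": 2},
]
weights = {"loc": 0.8, "cbo": 0.5}
single = weighted_support((("loc", 4),), rows, weights)
pair = weighted_support((("loc", 4), ("cbo", 2)), rows, weights)
```

With weights below 1, adding an item lowers both the itemset weight and the support, which is why longer antecedents tend to have a smaller weighted support.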
In one embodiment, the step S33 includes:
the defective tendency association rule set and the non-defective tendency association rule set are each sorted by priority, based on weighted support, antecedent length, and generation order.
In the present embodiment, given two rules R1 and R2, rule R1 has a higher priority than rule R2 when any of the following holds:
the weighted support of R1 is greater than the weighted support of R2;
the weighted supports of R1 and R2 are equal, and the antecedent length (cardinality) of R1 is greater than that of R2;
the weighted supports and the antecedent lengths of R1 and R2 are both equal, and R1 was generated earlier than R2.
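The three-level priority above maps directly onto a composite sort key. The sketch below is a minimal illustration; the `Rule` record and its field names are hypothetical and not part of the original disclosure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # set of (feature, bin) items
    label: str             # class label of the consequent
    wsupp: float           # weighted support
    order: int             # generation index; smaller means generated earlier

def sort_rules(rules):
    """Higher weighted support first, then longer antecedent, then earlier rule."""
    return sorted(rules, key=lambda r: (-r.wsupp, -len(r.antecedent), r.order))

rules = [
    Rule(frozenset({("loc", 4)}), "defective", 0.6, 0),
    Rule(frozenset({("loc", 4), ("cbo", 2)}), "defective", 0.6, 1),
    Rule(frozenset({("rfc", 1)}), "defective", 0.6, 2),
    Rule(frozenset({("wmc", 3)}), "defective", 0.9, 3),
]
ranked = sort_rules(rules)
```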
In one embodiment, the conflict rule pruning in step S34 is: when the antecedents of two rules contain the same features and their consequents belong to different classes, both rules are removed. For such a pair of rules, keeping either rule affects the other, so both rules are eliminated to avoid this bias.
The redundant rule pruning in step S34 is: when the antecedents of several rules are in an inclusion relation and their consequents belong to the same class, the redundant rule with the smaller weighted support is pruned. For two such rules, the rule whose antecedent contains more features is usually regarded as a specific rule, and the rule whose antecedent contains fewer features is regarded as a general rule; specific and general are relative to each other. Once feature importance is taken into account, the weighted itemsets and the weighted support depend not only on frequency of occurrence but also on importance. The weighted support of a specific rule may then be greater than that of the general rule, so specific rules cannot simply be removed and a different pruning criterion is needed: the redundant rule with the smaller weighted support is pruned.
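The two pruning steps can be sketched as below. Rules are represented as (antecedent, label, weighted support) tuples, which is an assumed representation. The description does not say what happens when two nested rules have equal weighted support, so this sketch keeps both in that case, which is also an assumption.

```python
def prune_conflicts(rules):
    """Drop every rule whose antecedent also appears with a different label."""
    labels_by_antecedent = {}
    for antecedent, label, _ in rules:
        labels_by_antecedent.setdefault(antecedent, set()).add(label)
    return [r for r in rules if len(labels_by_antecedent[r[0]]) == 1]

def prune_redundant(rules):
    """Among same-label rules with nested antecedents, drop the lower wsupp."""
    dropped = set()
    for i, (ant_i, label_i, w_i) in enumerate(rules):
        for j, (ant_j, label_j, w_j) in enumerate(rules):
            # ant_i < ant_j tests whether ant_i is a proper subset of ant_j.
            if i != j and label_i == label_j and ant_i < ant_j:
                if w_i < w_j:
                    dropped.add(i)
                elif w_j < w_i:
                    dropped.add(j)
    return [r for k, r in enumerate(rules) if k not in dropped]

a = frozenset({("loc", 4)})
ab = frozenset({("loc", 4), ("cbo", 2)})
c = frozenset({("rfc", 1)})
rules = [
    (a, "defective", 0.6),
    (ab, "defective", 0.2),
    (c, "defective", 0.3),
    (c, "non-defective", 0.4),
]
kept = prune_redundant(prune_conflicts(rules))
```

In the toy data the two rules with antecedent `c` conflict and are both removed, and the specific rule `ab` is pruned because the general rule `a` has the larger weighted support.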
In one embodiment, in step S35:
when the proposed Pearson-correlation-based weighted association classification rules are used to predict defects for a new software module instance, both defect tendency (high-risk) rules and defect-free tendency (low-risk) rules may match the module. Each weighted association classification rule carries only one index, the weighted support, so this embodiment uses the sum of the weighted supports of the matching rules as the decision criterion.
For a module instance, if the sum of the weighted supports of the matching high-risk defect rules is greater than the sum of the weighted supports of the matching low-risk defect-free rules, the module is classified into the defect tendency class; otherwise it is classified into the defect-free tendency class. If no rule matches the module, the module is classified into the defect tendency class, as in the formula:
wherein c represents the defective class or the non-defective class, r_c represents a defective class rule or a non-defective class rule that satisfies the condition, and R represents the defective class rule set or the non-defective class rule set.
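The decision rule described above can be sketched as follows. The formula image is not reproduced in this text, so the code follows the prose only; the rule representation as (antecedent, weighted support) pairs and all names are assumptions.

```python
def matches(antecedent, module):
    """True if the module has every (feature, bin) item of the antecedent."""
    return all(module.get(feature) == value for feature, value in antecedent)

def predict(module, defect_rules, clean_rules):
    """Compare summed weighted supports of the matching rules of each class."""
    defect_hits = [w for a, w in defect_rules if matches(a, module)]
    clean_hits = [w for a, w in clean_rules if matches(a, module)]
    if not defect_hits and not clean_hits:
        return "defective"  # no rule matches: default class per the text
    if sum(defect_hits) > sum(clean_hits):
        return "defective"
    return "non-defective"

# Hypothetical rule sets after sorting and pruning.
defect_rules = [((("loc", 4),), 0.30), ((("loc", 4), ("cbo", 2)), 0.20)]
clean_rules = [((("cbo", 2),), 0.45)]

first = predict({"loc": 4, "cbo": 2}, defect_rules, clean_rules)
second = predict({"loc": 1, "cbo": 2}, defect_rules, clean_rules)
third = predict({"loc": 1, "cbo": 0}, defect_rules, clean_rules)
```

Defaulting to the defect tendency class when nothing matches favors recall on the minority class, at the cost of possible false alarms.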
In one embodiment, step S36 includes:
substituting the test set into a software defect prediction model based on a Pearson correlation weighting correlation classification rule to obtain an evaluation result;
carrying out classification evaluation on the evaluation result by utilizing G-mean, Mcc and Balance;
the evaluation indexes of G-mean, Mcc and Balance are defined as follows:
wherein TP is the number of defective modules classified as defective, FN is the number of defective modules classified as non-defective, FP is the number of non-defective modules classified as defective, and TN is the number of non-defective modules classified as non-defective;
and comparing the evaluation indexes of the software defect prediction model based on the Pearson correlation weighting association classification rule with those of the prediction model based on the classical association rule algorithm.
Verification of the proposed algorithm on the test set shows that all evaluation indexes of the software defect prediction model are improved, as is its ability to identify the minority defect tendency class.
As shown in fig. 5, in a specific embodiment, the software defect prediction model based on the pearson correlation weighting association classification rule is constructed by the following steps:
The data of this embodiment come from the public PROMISE datasets, each of which consists of a number of code features and one class feature and can be downloaded from the tera-PROMISE website (http://openscience.us/repo/defect/mccabehalsted/). In the invention, 9 software defect datasets from object-oriented projects are collected; the number of modules per dataset ranges from 117 to 965, every dataset has more than 100 instances, and the number of features is 21, as detailed in Table 1.
Table 1 description of the software under test
Name of item | Number of modules | Number of code features | Number of defective modules | Rate of defects |
Ant-1.3 | 187 | 21 | 20 | 0.107 |
Ant-1.4 | 178 | 21 | 40 | 0.225 |
Ant-1.5 | 293 | 21 | 32 | 0.109 |
Ant-1.6 | 351 | 21 | 92 | 0.262 |
Ant-1.7 | 745 | 21 | 166 | 0.223 |
Camel-1.0 | 339 | 21 | 13 | 0.038 |
Camel-1.2 | 608 | 21 | 216 | 0.355 |
Camel-1.4 | 872 | 21 | 145 | 0.166 |
Camel-1.6 | 965 | 21 | 188 | 0.195 |
In the present invention, 20 feature metrics and one class label are used, as shown in Table 2.
Table 2 Feature metrics (20) and class label (1)
Firstly, performing feature selection, data division and discretization on each data set; dividing each data set into a defective data subset and a non-defective data subset, randomly dividing the defective data subset and the non-defective data subset into a 4-fold training set and a 1-fold test set respectively, merging the defective data training set and the non-defective data training set into a training set, and merging the defective data test set and the non-defective data test set into a test set. And then, carrying out feature selection on the training set, selecting 30% -50% with a larger ranking value as a final feature subset, and carrying out 5-order equal-frequency discretization on the training set after feature selection. Then, a Pearson correlation method is adopted to automatically acquire the weight of the features, the weight changes along with the change of data, a defective association rule set and a non-defective association rule set are respectively generated by combining a weighted Apriori algorithm, the rule sets are sequenced, pruned and predicted, 10 times of training and testing are carried out, and the final result is averaged.
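The data division and the 5-order equal-frequency discretization described above can be sketched as follows. The sketch assumes that cut points are learned on the training values and then reused for any other value; the description does not spell out this detail, and the function names and toy data are hypothetical.

```python
import random
from bisect import bisect_right

def stratified_split(labels, n_folds=5, test_fold=0, seed=0):
    """Split each class separately: one fold for testing, the rest for training."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            (test if pos % n_folds == test_fold else train).append(i)
    return train, test

def equal_frequency_cuts(values, n_bins=5):
    """Cut points that place roughly the same number of values in each bin."""
    s = sorted(values)
    return [s[len(s) * k // n_bins] for k in range(1, n_bins)]

def discretize(value, cuts):
    """Map a numeric value to a bin index in 0..len(cuts)."""
    return bisect_right(cuts, value)

labels = [1] * 10 + [0] * 40           # 10 defective, 40 non-defective modules
train_idx, test_idx = stratified_split(labels)

cuts = equal_frequency_cuts(list(range(1, 11)))
bins = [discretize(v, cuts) for v in range(1, 11)]
```

Splitting the two classes separately keeps the defect rate of the training and test sets close to that of the full dataset, which matters when the defective class is small.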
Finally, the software defect prediction results are evaluated. The associative classifier CBA, naive Bayes (NB), decision table (DT), random forest (RF), and the PART algorithm are selected as baseline classifiers and compared with the proposed algorithm CWCAR. Because software defect data are imbalanced, G-mean, MCC, and Balance are used as the evaluation indexes of each classifier. The three evaluation indexes are defined as follows:
TP is the number of classifying positive samples (defective classes) as positive samples, FN is the number of classifying positive samples as negative samples (non-defective classes), FP is the number of classifying negative samples as positive samples, TN is the number of classifying negative samples as negative samples.
The three indexes are insensitive to data distribution, and are beneficial to comparison among different software defect prediction model algorithms.
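The formula images for the three indexes are not reproduced in this text. The sketch below uses the definitions commonly found in the defect prediction literature, with PD = TP/(TP+FN) and PF = FP/(FP+TN); whether the patent's formulas match these exactly is an assumption. The confusion-matrix counts are invented for illustration.

```python
from math import sqrt

def rates(tp, fn, fp, tn):
    """Probability of detection (recall) and probability of false alarm."""
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    return pd, pf

def g_mean(tp, fn, fp, tn):
    """Geometric mean of recall on the defective and non-defective classes."""
    pd, pf = rates(tp, fn, fp, tn)
    return sqrt(pd * (1.0 - pf))

def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient; 0.0 when a marginal total is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def balance(tp, fn, fp, tn):
    """One minus the normalized distance from the ideal point PD=1, PF=0."""
    pd, pf = rates(tp, fn, fp, tn)
    return 1.0 - sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / sqrt(2.0)

# Hypothetical confusion matrix: 40 defective and 160 non-defective modules.
scores = (g_mean(30, 10, 20, 140), mcc(30, 10, 20, 140), balance(30, 10, 20, 140))
```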
The distribution of the classification effect obtained in the experiment is shown in the following tables:
TABLE 3 Balance
Dataset | CWCAR | CBA | PART | DT | RF | NB |
ant-1.3 | 0.793 | 0.648 | 0.439 | 0.345 | 0.546 | 0.772 |
ant-1.4 | 0.615 | 0.528 | 0.409 | 0.331 | 0.474 | 0.486 |
ant-1.5 | 0.726 | 0.523 | 0.390 | 0.321 | 0.494 | 0.705 |
ant-1.6 | 0.774 | 0.630 | 0.662 | 0.611 | 0.660 | 0.769 |
ant-1.7 | 0.752 | 0.580 | 0.645 | 0.635 | 0.639 | 0.722 |
camel-1.0 | 0.586 | 0.302 | 0.293 | 0.293 | 0.298 | 0.423 |
camel-1.2 | 0.537 | 0.431 | 0.465 | 0.351 | 0.511 | 0.508 |
camel-1.4 | 0.649 | 0.345 | 0.370 | 0.380 | 0.401 | 0.602 |
camel-1.6 | 0.589 | 0.341 | 0.349 | 0.324 | 0.391 | 0.511 |
Mean value | 0.669 | 0.481 | 0.447 | 0.399 | 0.490 | 0.611 |
Median number | 0.649 | 0.523 | 0.409 | 0.345 | 0.494 | 0.602 |
Ordering (Rank) | 1.00 | 4.22 | 4.39 | 5.61 | 3.56 | 2.22 |
p-value | - | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 |
TABLE 4 MCC
TABLE 5 Gmean
Dataset | CWCAR | CBA | PART | DT | RF | NB |
ant-1.3 | 0.809 | 0.691 | 0.327 | 0.121 | 0.539 | 0.789 |
ant-1.4 | 0.620 | 0.550 | 0.317 | 0.133 | 0.456 | 0.468 |
ant-1.5 | 0.735 | 0.556 | 0.259 | 0.096 | 0.508 | 0.719 |
ant-1.6 | 0.779 | 0.665 | 0.686 | 0.637 | 0.687 | 0.775 |
ant-1.7 | 0.754 | 0.618 | 0.675 | 0.669 | 0.668 | 0.729 |
camel-1.0 | 0.598 | 0.113 | 0.000 | 0.000 | 0.011 | 0.283 |
camel-1.2 | 0.530 | 0.424 | 0.461 | 0.252 | 0.518 | 0.512 |
camel-1.4 | 0.652 | 0.268 | 0.314 | 0.317 | 0.376 | 0.608 |
camel-1.6 | 0.586 | 0.259 | 0.254 | 0.170 | 0.355 | 0.522 |
Mean value | 0.674 | 0.460 | 0.366 | 0.266 | 0.458 | 0.601 |
Median number | 0.652 | 0.55 | 0.317 | 0.17 | 0.508 | 0.608 |
Ordering (Rank) | 1.00 | 4.11 | 4.61 | 5.50 | 3.56 | 2.22 |
p-value | - | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 |
Where W represents a win, T represents a tie, and L represents a loss.
From the tables and the analysis, the proposed software defect prediction method based on the Pearson correlation weighting association classification rule performs well on Balance, MCC, and Gmean:
(1) Compared with the classical association rule algorithm CBA, the proposed CWCAR algorithm improves the mean over the 9 datasets by a relative 39.1% on Balance, 22.7% on MCC, and 46.3% on Gmean; the median increases by a relative 24.1% on Balance and 18.5% on Gmean, and decreases by a relative 8.6% on MCC. Thus, compared with CBA, CWCAR has a relatively higher mean on all three indexes insensitive to class imbalance, and a higher median on Balance and Gmean.
(2) Compared with the traditional rule/tree-based algorithms PART and decision table DT, CWCAR improves the mean over the 9 datasets by at least a relative 49.7% on Balance, 64.2% on MCC, and 84.1% on Gmean; the median improves by at least a relative 58.7% on Balance, 97.5% on MCC, and 105.7% on Gmean. Thus, compared with PART and DT, CWCAR has a relatively higher mean and median on the three indexes insensitive to class imbalance.
(3) Compared with the non-rule-based algorithms naive Bayes NB and random forest RF, CWCAR improves the mean over the 9 datasets by at least a relative 9.5% on Balance, 5.7% on MCC, and 12.2% on Gmean; the median improves by at least a relative 7.8% on Balance, 6.8% on MCC, and 7.2% on Gmean. Thus, compared with NB and RF, CWCAR has a relatively higher mean and median on the three indexes insensitive to class imbalance.
(4) The ranks on the three indexes are compared, where a smaller rank means better prediction performance. On Balance, CWCAR ranks 1, naive Bayes NB is next at 2.22, and decision table DT is last at 5.61; on MCC, CWCAR ranks 1.89, NB is next at 2.44, and DT is last at 4.89; on Gmean, CWCAR ranks 1, NB is next at 2.22, and DT is last at 5.50. Thus, compared with the five baseline classifiers, CWCAR has the smallest rank on the three indexes insensitive to class imbalance and the best prediction performance.
(5) To verify whether the proposed algorithm differs significantly from the other algorithms, the invention uses the nonparametric Wilcoxon signed-rank test at a significance level of 0.05; the samples need not follow a normal distribution. On Balance and Gmean, all p-values are less than 0.05, so CWCAR is significantly better than every baseline classifier; on MCC, CWCAR is significantly better than the other classifiers (p-value less than 0.05) except naive Bayes, from which it does not differ significantly (p-value 0.086 > 0.05). Thus, compared with the five baseline classifiers, CWCAR is significantly better on all three indexes insensitive to class imbalance than the four baselines other than naive Bayes.
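The relative improvements quoted in items (1) to (3) can be re-derived from the "Mean value" row of Table 3. The short sketch below does this for the Balance index only; the dictionary simply copies that row, and the helper name is hypothetical.

```python
# "Mean value" row of Table 3 (Balance), copied from the table above.
balance_mean = {
    "CWCAR": 0.669, "CBA": 0.481, "PART": 0.447,
    "DT": 0.399, "RF": 0.490, "NB": 0.611,
}

def relative_improvement(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0

gains = {
    name: round(relative_improvement(balance_mean["CWCAR"], value), 1)
    for name, value in balance_mean.items()
    if name != "CWCAR"
}
vs_rule_based = min(gains["PART"], gains["DT"])   # "at least" figure in item (2)
vs_non_rule = min(gains["NB"], gains["RF"])       # "at least" figure in item (3)
```

The "at least" figures in items (2) and (3) correspond to the smaller of the two gains within each group of baselines.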
In conclusion, the software defect prediction method based on the Pearson correlation weighting association classification rule has better prediction performance.
The invention provides a software defect prediction method based on the Pearson correlation weighting association classification rule. By constructing an association-classification-rule-based software defect prediction framework for imbalanced data, it provides an automatic feature weighting method that is insensitive to imbalanced data and combines it with the generation, sorting, pruning, and prediction processes of the association classification rules to form a valuable, high-performance, and understandable rule model. This reveals the correlation between defect tendency and features, improves the performance and understandability of the software defect prediction model, and improves the accuracy of the prediction result.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (10)
1. A software defect prediction method based on the Pearson correlation weighting association classification rule, characterized by comprising the following steps:
S1, extracting the metric-element data set of the software under test with a corresponding static code analysis tool;
S2, evaluating the correlation between each metric element and each class by a feature selection method based on the Pearson correlation, ranking the correlations, and taking the top 30%-50% with the largest ranking values as the selected metric elements;
S3, inputting the selected metric elements and the corresponding classes into a software defect prediction model based on the Pearson correlation weighting association classification rule, performing prediction, and outputting the prediction result.
2. The prediction method of claim 1, wherein the step of constructing the software defect prediction model based on the Pearson correlation weighting association classification rule comprises:
S31, preprocessing the acquired software defect data to obtain a software defect training set and a software defect test set; the software defect training set comprises a defect tendency training set and a defect-free tendency training set;
S32, setting a minimum weighted support for the defect tendency class and a minimum weighted support for the defect-free tendency class, and constructing a defect tendency association rule set and a defect-free tendency association rule set from the defect tendency training set and the defect-free tendency training set, respectively, by using a weighted Apriori algorithm;
S33, sorting the defective tendency association rule set and the non-defective tendency association rule set;
S34, performing rule pruning optimization on both the defective tendency association rule set and the non-defective tendency association rule set by using a conflict rule pruning method and a redundant rule pruning method;
S35, predicting the software module by using the optimized defect tendency association rule set and defect-free tendency association rule set.
3. The prediction method of claim 2, wherein the constructing step further comprises:
S36, verifying the software defect prediction model based on the Pearson correlation weighting association classification rule on the test set.
4. The prediction method according to claim 2, wherein the step S31 includes:
S311, performing a first horizontal division of each data set D by class into a defective tendency data set TS and a non-defective tendency data set FS, satisfying D = TS ∪ FS and TS ∩ FS = ∅;
S312, performing a first vertical division of the defective tendency data set TS and the non-defective tendency data set FS into a defective tendency training set TTS, a defective tendency test set Ttest, a non-defective tendency training set FTS, and a non-defective tendency test set Ftest, satisfying TS = TTS ∪ Ttest, TTS ∩ Ttest = ∅, FS = FTS ∪ Ftest, and FTS ∩ Ftest = ∅, wherein the defective tendency training set TTS and the non-defective tendency training set FTS together form the complete training set;
S313, evaluating the correlation between each feature of the training set and each class by a feature selection method based on the Pearson correlation, ranking the correlations, and taking the top 30%-50% with the largest ranking values as the selected features;
S314, discretizing the feature-selected training set by a 5-order equal-frequency method, dividing the values of each numerical feature, in sorted order, into five intervals of equal frequency;
S315, performing a second horizontal division of the discretized training set DS into a discretized defective tendency training set FDS and a discretized non-defective tendency training set TDS, satisfying DS = FDS ∪ TDS and FDS ∩ TDS = ∅.
5. The prediction method according to claim 4, wherein the step S312 comprises:
5-fold cross validation is adopted for evaluation: in each round, 4 folds are used as the training set and 1 fold as the test set; the data are randomized each time and the procedure is repeated 10 times, thereby performing the first vertical division of the defective tendency data set TS and the non-defective tendency data set FS.
6. The prediction method according to claim 4, wherein the step S32 includes:
S321, calculating, by the Pearson correlation, the correlation between each selected feature and the class in the discretized training set, wherein a feature is a metric element; the calculation formula is as follows:
wherein n represents the number of features; k represents a feature (0, 1, 2, …, n); class represents the class, either the defect tendency class or the defect-free tendency class; and rank(k) represents the correlation of the k-th feature;
S322, calculating the weight of each selected feature from its correlation, wherein the calculation formula is as follows:
wherein n represents the number of features, k represents a feature (0, 1, 2, …, n), and weight(k) represents the weight of feature k;
S323, calculating the weight of an itemset from the feature weights, wherein the itemset weight is calculated as follows:
weight(itemset)=weight(X1)*weight(X2)…weight(Xk)
=weight(item1)*weight(item2)…weight(itemk) (3);
S324, obtaining the itemset support by calculating the probability that the items of the feature itemset occur together in the training set, calculating the itemset weight by formula (3), and calculating the weighted support from the itemset support and the itemset weight, wherein the calculation formula is as follows:
wsupp(r)=weight(itemset)*support(itemset) (4);
S325, mining, respectively, the weighted candidate itemsets that satisfy the minimum weighted support of the defect tendency class and those that satisfy the minimum weighted support of the defect-free tendency class.
7. The prediction method according to claim 2, wherein the step S33 includes:
the defective tendency association rule set and the non-defective tendency association rule set are each sorted by priority, based on weighted support, antecedent length, and generation order.
8. The prediction method of claim 2, wherein the conflict rule pruning in step S34 is: when the antecedents of two rules contain the same features and their consequents belong to different classes, both rules are removed;
the redundant rule pruning in step S34 is: when the antecedents of several rules are in an inclusion relation and their consequents belong to the same class, the redundant rule with the smaller weighted support is pruned.
9. The prediction method according to claim 2, wherein the step S35 includes:
if the sum of the weighted supports of the high-risk defect rules matching the software module is greater than the sum of the weighted supports of the low-risk defect-free rules matching the software module, the software module is classified into the defect tendency class; otherwise it is classified into the defect-free tendency class; if no rule matches the software module, the software module is classified into the defect tendency class, as in the formula:
wherein c represents the defective class or the non-defective class, r_c represents a defective class rule or a non-defective class rule that satisfies the condition, and R represents the defective class rule set or the non-defective class rule set.
10. The prediction method according to claim 3, wherein the step S36 includes:
substituting the test set into the software defect prediction model based on the Pearson correlation weighting association classification rule to obtain an evaluation result;
performing classification evaluation on the evaluation result by using G-mean, MCC, and Balance;
the evaluation indexes G-mean, MCC, and Balance are defined as follows:
wherein TP is the number of defective modules classified as defective, FN is the number of defective modules classified as non-defective, FP is the number of non-defective modules classified as defective, and TN is the number of non-defective modules classified as non-defective;
and comparing the evaluation indexes of the software defect prediction model based on the Pearson correlation weighting association classification rule with those of the prediction model based on the classical association rule algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911114620.7A CN111090579B (en) | 2019-11-14 | 2019-11-14 | Software defect prediction method based on Pearson correlation weighting association classification rule |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111090579A (en) | 2020-05-01 |
CN111090579B (en) | 2021-08-31 |
Family
ID=70393674
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911114620.7A Active CN111090579B (en) | 2019-11-14 | 2019-11-14 | Software defect prediction method based on Pearson correlation weighting association classification rule |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111090579B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103713990A (en) * | 2012-09-29 | 2014-04-09 | 西门子公司 | Method and device for predicting defaults of software |
US20170192880A1 (en) * | 2016-01-06 | 2017-07-06 | Hcl Technologies Limited | Defect prediction |
CN107239798A (en) * | 2017-05-24 | 2017-10-10 | 武汉大学 | A kind of feature selection approach of software-oriented defect number prediction |
CN107677741A (en) * | 2017-09-08 | 2018-02-09 | 中国科学技术大学 | A kind of method that the judgement of being similar in kind property of combustion adjuvant is carried out using Pearson product-moment correlation coefficient (PPMC) method |
CN108647138A (en) * | 2018-02-27 | 2018-10-12 | 中国电子科技集团公司电子科学研究院 | A kind of Software Defects Predict Methods, device, storage medium and electronic equipment |
KR20190127411A (en) * | 2018-05-04 | 2019-11-13 | 한국과학기술원 | Method and apparatus for fault localization using code and change metrics |
Non-Patent Citations (2)
Title |
---|
Shihai Wang et al.: "Kernel Spectral Embedding Transfer Ensemble", IEEE Transactions on Software Engineering * |
Gong Lina et al.: "Research Progress on Software Defect Prediction Techniques", Journal of Software (软件学报) * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113204481A (en) * | 2021-04-21 | 2021-08-03 | 武汉大学 | Class imbalance software defect prediction method based on data resampling |
CN113204481B (en) * | 2021-04-21 | 2022-03-04 | 武汉大学 | Class imbalance software defect prediction method based on data resampling |
CN113679394A (en) * | 2021-09-26 | 2021-11-23 | 华东理工大学 | Correlation-based motor imagery lead selection method and device |
CN115545125A (en) * | 2022-11-30 | 2022-12-30 | 北京航空航天大学 | Software defect association rule network pruning method and system |
CN115599698A (en) * | 2022-11-30 | 2023-01-13 | 北京航空航天大学 | Software defect prediction method and system based on class association rule |
CN115599698B (en) * | 2022-11-30 | 2023-03-14 | 北京航空航天大学 | Software defect prediction method and system based on class association rule |
CN115617698A (en) * | 2022-12-15 | 2023-01-17 | 北京航空航天大学 | Software defect measurement element selection method based on association rule network |
Also Published As
Publication number | Publication date |
---|---|
CN111090579B (en) | 2021-08-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111090579B (en) | Software defect prediction method based on Pearson correlation weighting association classification rule | |
Arbin et al. | Comparative analysis between k-means and k-medoids for statistical clustering | |
CN110928764B (en) | Automated evaluation method for crowdsourcing test report of mobile application and computer storage medium | |
US9275344B2 (en) | Computer-implemented system and method for generating a reference set via seed documents | |
CN107292350A (en) | The method for detecting abnormality of large-scale data | |
CN110381079A (en) | Network log method for detecting abnormality is carried out in conjunction with GRU and SVDD | |
CN106250442A (en) | The feature selection approach of a kind of network security data and system | |
Utari et al. | Implementation of data mining for drop-out prediction using random forest method | |
Devika Varshini et al. | Optimized machine learning classification approaches for prediction of autism spectrum disorder | |
CN111338972A (en) | Machine learning-based software defect and complexity incidence relation analysis method | |
CN106529580A (en) | EDSVM-based software defect data association classification method | |
CN111338950A (en) | Software defect feature selection method based on spectral clustering | |
Salim et al. | Time series prediction on college graduation using kNN algorithm | |
CN111950645A (en) | Method for improving class imbalance classification performance by improving random forest | |
Mumtaz et al. | Feature Selection Using Artificial Immune Network: An Approach for Software Defect Prediction. | |
CN115630732A (en) | City operation-oriented enterprise migration big data monitoring and early warning method and device | |
CN113837266A (en) | Software defect prediction method based on feature extraction and Stacking ensemble learning | |
CN113920366A (en) | Comprehensive weighted main data identification method based on machine learning | |
CN113448840A (en) | Software quality evaluation method based on predicted defect rate and fuzzy comprehensive evaluation model | |
Ding et al. | A novel software defect prediction method based on isolation forest | |
Karthik et al. | Defect association and complexity prediction by mining association and clustering rules | |
CN111221704B (en) | Method and system for determining running state of office management application system | |
CN112380224B (en) | Mass big data system for massive heterogeneous multidimensional data acquisition | |
Azzalini et al. | Data Quality and Data Ethics: Towards a Trade-off Evaluation. | |
Zaim et al. | Software Defect Prediction Framework Using Hybrid Software Metric |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||