CN113176998A - Cross-project software defect prediction method based on source selection - Google Patents
- Publication number
- CN113176998A (application CN202110503077.0A)
- Authority
- CN
- China
- Prior art keywords
- source
- dist
- project
- selection
- constructing
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3608—Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Stored Programmes (AREA)
Abstract
The invention provides a cross-project software defect prediction method based on source selection, which comprises the following steps: s1, constructing a data set; s2, constructing a feature selection method set FSelection; s3, obtaining an optimal feature selection method BFmethod; s4, obtaining an optimal feature quantity FThreshold; s5, constructing a source project selection method set SPselection; s6, constructing a cross-project defect prediction method CPSPM based on source selection. The invention provides several source project selection methods, which supply better source projects for subsequent model training and can effectively improve the effectiveness of software defect prediction.
Description
Technical Field
The invention belongs to the technical field of software defect prediction, and particularly relates to a cross-project software defect prediction method based on source selection, which is mainly used to improve data set quality through source project selection and thereby improve software defect prediction results.
Background
During rapid software development, developers may unknowingly introduce errors into the software; these errors are software defects. Software defects are highly hazardous: they can degrade the user experience and software quality, and can even endanger public safety, so latent defects in software need to be discovered as early as possible.
Software defect prediction can help developers anticipate latent defects, and researchers have recently proposed many methods to improve its results. Cross-project defect prediction, however, remains very difficult, mainly because the data distributions of the source project and the target project differ substantially, which leads to poor prediction performance.
Disclosure of Invention
The invention aims to provide a cross-project software defect prediction method based on source selection that improves the accuracy and efficiency of cross-project software defect prediction and effectively helps software developers use the prediction model to reduce defects during software development.
To solve the above technical problem, an embodiment of the present invention provides a cross-project software defect prediction method based on source selection, including the following steps:
s1, constructing a data set;
s2, constructing a feature selection method set FSelection;
s3, obtaining an optimal feature selection method BFmethod;
s4, obtaining an optimal feature quantity FThreshold;
s5, constructing a source project selection method set SPselection;
s6, constructing a cross-project defect prediction method CPSPM based on source selection.
The specific steps of step S1 are:
s1.1, acquiring a set of software projects from an open-source website;
s1.2, constructing a project instance set by taking each class of a project as an instance;
s1.3, constructing the feature set {WMC, DIT, NOC, CBO, RFC, LCOM, LCOM3, NPM, DAM, MOA, MFA, CAM, IC, CBM, AMC, Ca, Ce, Max_CC, Avg_CC, LOC} based on open-source data history records, the syntactic structure of project source code, and source code abstract syntax trees;
wherein WMC is Weighted Methods per Class; DIT is Depth of Inheritance Tree; NOC is Number of Children (subclasses); CBO is Coupling Between Object classes; RFC is Response For a Class; LCOM and LCOM3 are two variants of Lack of Cohesion in Methods; NPM is Number of Public Methods; DAM is Data Access Metric; MOA is Measure Of Aggregation; MFA is Measure of Functional Abstraction; CAM is Cohesion Among Methods of a class; IC is Inheritance Coupling; CBM is Coupling Between Methods; AMC is Average Method Complexity; Ca is afferent coupling; Ce is efferent coupling; Max_CC is the maximum McCabe cyclomatic complexity; Avg_CC is the average McCabe cyclomatic complexity; LOC is the number of lines of code;
s1.4, forming a defect prediction data set DATASET from the instances and features.
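As an illustration of step S1, the sketch below assembles DATASET from per-class metric records. It assumes each class arrives as a Python dict keyed by the 20 metric abbreviations above, plus a hypothetical defect-count field `bug`; labelling a class as defective when that count is positive is likewise an assumption, since the text does not state the labelling rule.

```python
from typing import Dict, List, Sequence, Tuple

# The 20 class-level metrics of step s1.3 (dict keys assumed to match the
# abbreviations, with Max_CC / Avg_CC written without spaces).
FEATURES: List[str] = [
    "WMC", "DIT", "NOC", "CBO", "RFC", "LCOM", "LCOM3", "NPM", "DAM", "MOA",
    "MFA", "CAM", "IC", "CBM", "AMC", "Ca", "Ce", "Max_CC", "Avg_CC", "LOC",
]


def build_dataset(
    rows: Sequence[Dict[str, float]], label_key: str = "bug"
) -> Tuple[List[List[float]], List[int]]:
    """Turn per-class metric records into (feature matrix, defect labels).

    `label_key` is a hypothetical column holding the number of defects of the
    class; a class is labelled defective (1) when that number is > 0.
    """
    X = [[float(row[name]) for name in FEATURES] for row in rows]
    y = [1 if row[label_key] > 0 else 0 for row in rows]
    return X, y
```

Each row of `X` keeps the metric order of s1.3, so column indices stay stable for the feature selection steps that follow.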
The specific steps of step S2 are:
FSelection={RF,CL,GR,IG,OR,SU};
wherein the RF method evaluates an attribute by repeatedly sampling an instance and considering the value of the given attribute for the nearest instances of the same class and of a different class; it can operate on both discrete and continuous class data;
the CL method evaluates an attribute by measuring the correlation between the attribute and the class; nominal attributes are considered value by value, each value being treated as an indicator, and the overall correlation of a nominal attribute is obtained as a weighted average;
the GR method evaluates an attribute by measuring its gain ratio with respect to the class;
the IG method evaluates the weight of an attribute by measuring the information gain of the attribute with respect to the class;
the OR method predicts using the minimum-error attribute and can discretize numeric attributes;
the SU method evaluates an attribute by measuring its symmetrical uncertainty with respect to the class.
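Three of the six evaluators in FSelection (IG, GR and SU) follow standard entropy-based definitions and can be sketched compactly. This is a minimal sketch assuming attributes that are already discretized (numeric metrics would need binning first); RF, CL and OR are not shown, and the code is not claimed to be the implementation used in the invention.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy in bits of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(attr: Sequence[Hashable], cls: Sequence[Hashable]) -> float:
    """H(C | A) for two aligned discrete sequences."""
    n = len(attr)
    total = 0.0
    for a_val, count in Counter(attr).items():
        subset = [c for a, c in zip(attr, cls) if a == a_val]
        total += (count / n) * entropy(subset)
    return total


def info_gain(attr, cls) -> float:
    """IG: H(C) - H(C | A)."""
    return entropy(cls) - conditional_entropy(attr, cls)


def gain_ratio(attr, cls) -> float:
    """GR: information gain normalised by the attribute's own entropy H(A)."""
    h_a = entropy(attr)
    return info_gain(attr, cls) / h_a if h_a > 0 else 0.0


def symmetrical_uncertainty(attr, cls) -> float:
    """SU: 2 * IG / (H(A) + H(C)), which lies in [0, 1]."""
    denom = entropy(attr) + entropy(cls)
    return 2.0 * info_gain(attr, cls) / denom if denom > 0 else 0.0
```

An attribute identical to the class scores 1 under GR and SU, while an attribute independent of the class scores 0 under all three.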
The specific steps of step S3 are:
s3.1, constructing a feature quantity set: the feature quantity starts at $\alpha$ and increases with step size $\beta$ for $\gamma$ steps, giving the feature quantity set $\{\alpha, \alpha+\beta, \alpha+2\beta, \ldots, \alpha+\gamma\beta\}$, where $\alpha+\gamma\beta$ equals the total number of features, 20;
s3.2, selecting a feature selection method fs from the set FSelection;
s3.3, selecting a feature quantity fn from the feature quantity set;
s3.4, training a data set DATASET based on fs, fn and a logistic regression classification algorithm, and obtaining F-measure performance parameters;
s3.5, repeating the steps S3.3 to S3.4 until all the feature quantities are selected;
s3.6, repeating the step S3.2 to the step S3.5 until all the feature selection methods are selected;
and S3.7, obtaining an optimal feature selection method BFmethod by comparing F-measure performance parameters.
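Steps s3.1 to s3.7 amount to a grid search over (feature selection method, feature quantity) pairs. In the sketch below, `evaluate` is a hypothetical callback standing in for "keep the top-fn features ranked by fs, train logistic regression on DATASET, and return the F-measure"; ranking the methods by their mean F-measure across all feature quantities is an assumption, since the text only says that the F-measures are compared.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def feature_counts(alpha: int, beta: int, gamma: int) -> List[int]:
    """Feature quantity set {alpha, alpha + beta, ..., alpha + gamma * beta}."""
    return [alpha + k * beta for k in range(gamma + 1)]


def select_best_method(
    methods: Sequence[str],
    counts: Sequence[int],
    evaluate: Callable[[str, int], float],
) -> Tuple[str, Dict[Tuple[str, int], float]]:
    """Score every (method, count) pair, then return the method with the
    highest mean F-measure together with all collected scores."""
    scores: Dict[Tuple[str, int], float] = {}
    for fs in methods:                           # s3.2 / s3.6: each method
        for fn in counts:                        # s3.3 / s3.5: each count
            scores[(fs, fn)] = evaluate(fs, fn)  # s3.4: hypothetical train + score

    def mean_f(fs: str) -> float:
        return sum(scores[(fs, fn)] for fn in counts) / len(counts)

    return max(methods, key=mean_f), scores      # s3.7: compare F-measures
```

With the embodiment's starting value and step size of 1, `feature_counts(1, 1, 19)` yields the quantities 1 through 20.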
The specific steps of step S4 are:
s4.1, constructing a feature quantity set: the feature quantity starts at $\alpha$ and increases with step size $\beta$ for $\gamma$ steps, giving the feature quantity set $\{\alpha, \alpha+\beta, \alpha+2\beta, \ldots, \alpha+\gamma\beta\}$, where $\alpha+\gamma\beta$ equals the total number of features, 20;
s4.2, setting the feature selection method to BFmethod;
s4.3, selecting a feature quantity fn from the feature quantity set;
s4.4, training a data set DATASET based on BFmethod, fn and a logistic regression classification algorithm, and obtaining F-measure performance parameters;
s4.5, repeating the steps S4.3 to S4.4 until all the feature quantities are selected;
and S4.6, comparing the F-measure performance parameters to obtain the optimal feature quantity FThreshold.
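Step S4 fixes the feature selection method to BFmethod (RF in the embodiment) and searches only over the feature quantity. Reading RF as the Relief family of attribute estimators is an assumption based on its description in step S2. The sketch uses basic two-class Relief (every instance sampled once, a single nearest hit and nearest miss, Manhattan distance on min-max scaled features) rather than full ReliefF, plus a hypothetical helper `choose_threshold` that picks FThreshold from measured F-measures and breaks ties toward fewer features.

```python
from typing import Dict

import numpy as np


def relief_weights(X, y) -> np.ndarray:
    """Basic two-class Relief: reward features that differ on the nearest
    miss (other class) and agree on the nearest hit (same class)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                      # constant features: avoid 0 division
    Xn = (X - X.min(axis=0)) / span            # min-max scale each feature to [0, 1]
    n, d = Xn.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(Xn - Xn[i]).sum(axis=1)  # Manhattan distance to every instance
        dist[i] = np.inf                       # never pick the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += np.abs(Xn[i] - Xn[miss]) - np.abs(Xn[i] - Xn[hit])
    return w / n


def top_features(X, y, f_threshold: int) -> np.ndarray:
    """Indices of the `f_threshold` highest-weighted features."""
    return np.argsort(relief_weights(X, y))[::-1][:f_threshold]


def choose_threshold(f_by_count: Dict[int, float]) -> int:
    """s4.6: feature count with the highest F-measure; ties go to fewer features."""
    return max(sorted(f_by_count), key=lambda fn: f_by_count[fn])
```

Because `max` returns the first maximal element and the counts are visited in ascending order, an F-measure tie resolves to the smaller model.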
The specific steps of step S5 are:
s5.1, constructing the source project selection method mean_log: for a given set of source projects $\{X_1, X_2, \ldots, X_n\}$ ($i = 1, 2, \ldots, n$) and a target project $Y$, let $X_i' = \log(1 + X_i)$ and $Y' = \log(1 + Y)$; $\mathrm{Mean}(X_i')$ is the vector consisting of the means of all feature metric values of $X_i'$; $\mathrm{Dist}(X_i', Y')$ is the Euclidean distance between $X_i'$ and $Y'$, i.e., between $\mathrm{Mean}(X_i')$ and $\mathrm{Mean}(Y')$; if $\mathrm{Dist}(X_j', Y') = \min\{\mathrm{Dist}(X_1', Y'), \mathrm{Dist}(X_2', Y'), \ldots, \mathrm{Dist}(X_n', Y')\}$, then $X_j$ is selected as the source project;
s5.2, constructing the source project selection method std_log: for a given set of source projects $\{X_1, X_2, \ldots, X_n\}$ and a target project $Y$, let $X_i' = \log(1 + X_i)$ and $Y' = \log(1 + Y)$; $\mathrm{Std}(X_i')$ is the vector consisting of the standard deviations of all feature metric values of $X_i'$; $\mathrm{Dist}(X_i', Y')$ is the Euclidean distance between $X_i'$ and $Y'$, i.e., between $\mathrm{Std}(X_i')$ and $\mathrm{Std}(Y')$; if $\mathrm{Dist}(X_j', Y') = \min\{\mathrm{Dist}(X_1', Y'), \mathrm{Dist}(X_2', Y'), \ldots, \mathrm{Dist}(X_n', Y')\}$, then $X_j$ is selected as the source project;
s5.3, constructing the source project selection method median_log: for a given set of source projects $\{X_1, X_2, \ldots, X_n\}$ and a target project $Y$, let $X_i' = \log(1 + X_i)$ and $Y' = \log(1 + Y)$; $\mathrm{Median}(X_i')$ is the vector consisting of the medians of all feature metric values of $X_i'$; $\mathrm{Dist}(X_i', Y')$ is the Euclidean distance between $X_i'$ and $Y'$, i.e., between $\mathrm{Median}(X_i')$ and $\mathrm{Median}(Y')$; if $\mathrm{Dist}(X_j', Y') = \min\{\mathrm{Dist}(X_1', Y'), \mathrm{Dist}(X_2', Y'), \ldots, \mathrm{Dist}(X_n', Y')\}$, then $X_j$ is selected as the source project;
s5.4, constructing the source project selection method median_zscore: for a given set of source projects $\{X_1, X_2, \ldots, X_n\}$ and a target project $Y$, let $X_i' = \mathrm{zscore}(X_i)$ and $Y' = \mathrm{zscore}(Y)$; $\mathrm{Median}(X_i')$ is the vector consisting of the medians of all feature metric values of $X_i'$; $\mathrm{Dist}(X_i', Y')$ is the Euclidean distance between $X_i'$ and $Y'$, i.e., between $\mathrm{Median}(X_i')$ and $\mathrm{Median}(Y')$; if $\mathrm{Dist}(X_j', Y') = \min\{\mathrm{Dist}(X_1', Y'), \mathrm{Dist}(X_2', Y'), \ldots, \mathrm{Dist}(X_n', Y')\}$, then $X_j$ is selected as the source project;
s5.5, constructing the source project selection method TDS: this method selects training data according to the distributional characteristics of the data, and provides two training data selection strategies that use distance as the similarity measure (EM-Clustering and Nearest Neighbor Selection);
s5.6, forming the source project selection method set SPSelection = {mean_log, std_log, median_zscore, TDS}.
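The distance-based strategies of step S5 differ only in the transform (log(1 + x) or z-score) and in the per-feature summary statistic (mean, standard deviation or median). The sketch below assumes each project is a NumPy array of shape (instances, features), applies the z-score within each project separately, and compares summary vectors with Euclidean distance; TDS is not covered, and the helper names are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

import numpy as np

Array = np.ndarray


def log_transform(X: Array) -> Array:
    """X' = log(1 + X), applied element-wise."""
    return np.log1p(X)


def zscore(X: Array) -> Array:
    """Column-wise z-score within one project; constant columns map to 0."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - X.mean(axis=0)) / sd


# Strategy name -> (transform, per-feature summary statistic).
STRATEGIES: Dict[str, Tuple[Callable[[Array], Array], Callable[[Array], Array]]] = {
    "mean_log": (log_transform, lambda A: A.mean(axis=0)),
    "std_log": (log_transform, lambda A: A.std(axis=0)),
    "median_log": (log_transform, lambda A: np.median(A, axis=0)),
    "median_zscore": (zscore, lambda A: np.median(A, axis=0)),
}


def select_source(sources: Sequence[Array], target: Array, strategy: str) -> int:
    """Index j of the source project whose summary vector of X_j' is closest
    (Euclidean) to that of Y', i.e. Dist(X_j', Y') is the minimum."""
    transform, stat = STRATEGIES[strategy]
    t = stat(transform(np.asarray(target, dtype=float)))
    dists = [np.linalg.norm(stat(transform(np.asarray(S, dtype=float))) - t)
             for S in sources]
    return int(np.argmin(dists))
```

Note that the per-project z-score removes scale differences entirely, so median_zscore compares only the shape (skew) of each project's metric distributions.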
The specific steps of step S6 are:
s6.1, selecting a method from the source project selection method set SPselection in the step S5 for testing;
s6.2, with the number of features set to FThreshold, computing the prediction performance obtained when training on the selected source project and predicting the other projects;
s6.3, calculating the average of the prediction results under the same source project selection method;
s6.4, repeating the step S6.1 to the step S6.3 until all the source project selection methods are tested;
s6.5, comparing the average prediction results to obtain the optimal source project selection method;
and S6.6, obtaining a cross-project defect prediction method CPSPM.
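Steps s6.1 to s6.5 average each strategy's results over the target projects and keep the best one. In the sketch below, `run_cpdp` is a hypothetical callback for one cross-project run (choose a source project with the strategy, train on the FThreshold features, score on the target); the score could be the F-measure or the Accuracy.

```python
from statistics import mean
from typing import Callable, Dict, Sequence


def choose_source_strategy(
    strategies: Sequence[str],
    targets: Sequence[str],
    run_cpdp: Callable[[str, str], float],
) -> str:
    """Return the strategy with the highest average score over all targets."""
    averages: Dict[str, float] = {
        s: mean(run_cpdp(s, t) for t in targets)  # s6.2-s6.3: score, then average
        for s in strategies                       # s6.1 / s6.4: every strategy
    }
    return max(averages, key=averages.get)        # s6.5: best average wins
```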
In software defect prediction research, the F-measure is widely used to evaluate the effectiveness of a method. It combines two parameters, Precision and Recall.
Precision is the proportion of modules predicted as defective that are actually defective: $\mathrm{Precision} = TP/(TP+FP)$, where TP is the number of defective modules predicted as defective, TN is the number of non-defective modules predicted as non-defective, FP is the number of non-defective modules predicted as defective, and FN is the number of defective modules predicted as non-defective.
Recall is the proportion of all defective modules that are correctly predicted as defective: $\mathrm{Recall} = TP/(TP+FN)$. The higher the value, the more likely the model is to identify defects correctly, and the more defective modules it finds.
Accuracy is the proportion of correctly classified modules among all modules: $\mathrm{Accuracy} = (TP+TN)/(TP+TN+FP+FN)$. The higher this proportion, the more accurate the classification; the lower it is, the less accurate.
The F-measure combines Precision (P) and Recall (R) as their harmonic mean, $F = 2PR/(P+R)$. Its value lies between 0 and 1, and the higher the value, the better the model performs.
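The four evaluation measures can be computed directly from confusion-matrix counts, treating "defective" as the positive class. Returning 0 when a denominator is zero is an assumption of this sketch; the text does not address that case.

```python
from typing import Dict


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, Recall, Accuracy and F-measure, with 'defective' as positive."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # F-measure: harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f_measure": f_measure}
```

For example, TP = 30, TN = 50, FP = 10 and FN = 10 give Precision = Recall = F-measure = 0.75 and Accuracy = 0.8.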
The technical scheme of the invention has the following beneficial effects:
the cross-project software defect prediction method based on source selection provided by the invention offers several source project selection methods and selects source projects in combination with the corresponding feature selection method, which helps substantially improve software defect prediction results.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a diagram of the construction of the feature selection method set and the source project selection methods in the present invention;
FIG. 3 is an F-measure graph obtained with the six feature selection methods in the present invention;
FIG. 4 is a graph showing the results obtained by using the RF method in the present invention;
FIG. 5 is an Accuracy graph obtained using different source project selection techniques in the present invention;
FIG. 6 is an F-measure graph obtained using different source project selection techniques in the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the present invention provides a cross-project software defect prediction method based on source selection, which is mainly used to improve software defect prediction performance and comprises the following steps:
s1, constructing a data set;
s2, constructing a feature selection method set FSelection;
s3, obtaining an optimal feature selection method BFmethod;
s4, obtaining an optimal feature quantity FThreshold;
s5, constructing a source project selection method set SPselection;
s6, constructing a cross-project defect prediction method CPSPM based on source selection.
Step S1, the specific steps of data set construction are as follows:
taking the PROMISE dataset as an example, projects are selected from it for testing; for each project, the dataset records the name of the data set, the number of defective modules, the total number of modules, the number of module features, and the percentage of defective instances among all instances.
Step S2, the specific steps of constructing the feature selection method set FSelection are as follows:
FSelection={RF,CL,GR,IG,OR,SU}.
the RF method evaluates an attribute by repeatedly sampling an instance and considering the value of the given attribute for the nearest instances of the same class and of a different class. It can handle both discrete and continuous class data;
the CL method evaluates an attribute by measuring the correlation between the attribute and the class. Nominal attributes are considered value by value, each value being treated as an indicator, and the overall correlation of a nominal attribute is obtained as a weighted average;
the GR method evaluates an attribute by measuring its gain ratio with respect to the class;
the IG method evaluates the weight of an attribute by measuring the information gain of the attribute with respect to the class;
the OR method predicts using the minimum-error attribute and can discretize numeric attributes;
the SU method evaluates an attribute by measuring its symmetrical uncertainty with respect to the class.
The construction process is shown in FIG. 2.
Step S3, the specific steps of obtaining the optimal feature selection method BFmethod are as follows:
under the same range of feature numbers, each of the six methods is tested, and the results are shown in FIG. 3. As the figure shows, the performance of the six methods differs considerably when the number of features is small, and is nearly identical when the number of features exceeds 14.
Based on this evaluation of the six feature selection methods, the present invention finally adopts the RF method as the feature selection method.
Step S4, the specific steps of obtaining the optimal feature quantity FThreshold are as follows:
initially, the feature quantity starts at $\alpha = 1$ with step size $\beta = 1$, so the feature quantity set is $\{1, 2, \ldots, 20\}$.
With the feature selection method fixed to RF (BFmethod), a feature quantity fn is selected from the feature quantity set, a model is trained on the data set using RF, fn and the logistic regression classification algorithm, and the Accuracy and F-measure values are obtained. As can be seen from FIG. 4, the F-measure increases with the number of features and eventually approaches 0.3; from 2 features onward, the Accuracy also increases with the number of features and eventually levels off around 0.6.
Step S5, the specific steps of constructing the source project selection method set SPSelection are as follows:
s5.1, constructing the source project selection method mean_log: for a given set of source projects $\{X_1, X_2, \ldots, X_n\}$ ($i = 1, 2, \ldots, n$) and a target project $Y$, let $X_i' = \log(1 + X_i)$ and $Y' = \log(1 + Y)$. $\mathrm{Mean}(X_i')$ is the vector consisting of the means of all feature metric values of $X_i'$. $\mathrm{Dist}(X_i', Y')$ is the Euclidean distance between $X_i'$ and $Y'$, i.e., between $\mathrm{Mean}(X_i')$ and $\mathrm{Mean}(Y')$. If $\mathrm{Dist}(X_j', Y') = \min\{\mathrm{Dist}(X_1', Y'), \mathrm{Dist}(X_2', Y'), \ldots, \mathrm{Dist}(X_n', Y')\}$, then $X_j$ is selected as the source project.
S5.2, constructing the source project selection method std_log: for a given set of source projects $\{X_1, X_2, \ldots, X_n\}$ and a target project $Y$, let $X_i' = \log(1 + X_i)$ and $Y' = \log(1 + Y)$. $\mathrm{Std}(X_i')$ is the vector consisting of the standard deviations of all feature metric values of $X_i'$. $\mathrm{Dist}(X_i', Y')$ is the Euclidean distance between $X_i'$ and $Y'$, i.e., between $\mathrm{Std}(X_i')$ and $\mathrm{Std}(Y')$. If $\mathrm{Dist}(X_j', Y') = \min\{\mathrm{Dist}(X_1', Y'), \mathrm{Dist}(X_2', Y'), \ldots, \mathrm{Dist}(X_n', Y')\}$, then $X_j$ is selected as the source project.
S5.3, constructing the source project selection method median_log: for a given set of source projects $\{X_1, X_2, \ldots, X_n\}$ and a target project $Y$, let $X_i' = \log(1 + X_i)$ and $Y' = \log(1 + Y)$. $\mathrm{Median}(X_i')$ is the vector consisting of the medians of all feature metric values of $X_i'$. $\mathrm{Dist}(X_i', Y')$ is the Euclidean distance between $X_i'$ and $Y'$, i.e., between $\mathrm{Median}(X_i')$ and $\mathrm{Median}(Y')$. If $\mathrm{Dist}(X_j', Y') = \min\{\mathrm{Dist}(X_1', Y'), \mathrm{Dist}(X_2', Y'), \ldots, \mathrm{Dist}(X_n', Y')\}$, then $X_j$ is selected as the source project.
S5.4, constructing the source project selection method median_zscore: for a given set of source projects $\{X_1, X_2, \ldots, X_n\}$ and a target project $Y$, let $X_i' = \mathrm{zscore}(X_i)$ and $Y' = \mathrm{zscore}(Y)$. $\mathrm{Median}(X_i')$ is the vector consisting of the medians of all feature metric values of $X_i'$. $\mathrm{Dist}(X_i', Y')$ is the Euclidean distance between $X_i'$ and $Y'$, i.e., between $\mathrm{Median}(X_i')$ and $\mathrm{Median}(Y')$. If $\mathrm{Dist}(X_j', Y') = \min\{\mathrm{Dist}(X_1', Y'), \mathrm{Dist}(X_2', Y'), \ldots, \mathrm{Dist}(X_n', Y')\}$, then $X_j$ is selected as the source project.
S5.5, constructing the source project selection method TDS: this method selects training data according to the distributional characteristics of the data, and provides two training data selection strategies that use distance as the similarity measure (EM-Clustering and Nearest Neighbor Selection).
S5.6, forming the source project selection method set SPSelection = {mean_log, std_log, median_zscore, TDS}.
Step S6, the specific steps of constructing the cross-project defect prediction method CPSPM based on source selection are as follows:
the Accuracy obtained with the four source project selection methods, including the TDS method, is shown in FIG. 5. When the number of features is small, the Accuracy values of the TDS, std_log and mean_log methods are all lower than those obtained without source selection; as the number of features increases, the Accuracy values of all four methods stabilize and become comparable to those obtained without source selection.
The F-measure values obtained with the four source project selection methods are shown in FIG. 6. When the number of features is less than 9, only the mean_log method yields a higher value than the approach without source selection. As the number of features increases, the value obtained by the TDS method rises gradually and, at 9 features, exceeds the value obtained without source selection for the first time. At 20 features, every method except median_zscore yields a higher F-measure than the approach without source selection, although median_zscore outperforms the other methods when the number of features is between 6 and 12.
The invention provides four different source project selection methods. First, six feature selection methods are compared, and the RF method, which performs best, is obtained; RF is then used in the subsequent source project selection step, where the methods are evaluated with the Accuracy and F-measure indices. Both indices increase with the number of features, and at 20 features every method except median_zscore outperforms the approach without source selection; nevertheless, the experiments indicate that median_zscore is still the best source project selection method.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (7)
1. A cross-project software defect prediction method based on source selection is characterized by comprising the following steps:
s1, constructing a data set;
s2, constructing a feature selection method set FSelection;
s3, obtaining an optimal feature selection method BFmethod;
s4, obtaining an optimal feature quantity FThreshold;
s5, constructing a source project selection method set SPselection;
s6, constructing a cross-project defect prediction method CPSPM based on source selection.
2. The method for cross-project software defect prediction based on source selection as claimed in claim 1, wherein the specific steps of step S1 are:
s1.1, acquiring a set of software projects from an open-source website;
s1.2, constructing a project instance set by taking each class of a project as an instance;
s1.3, constructing the feature set {WMC, DIT, NOC, CBO, RFC, LCOM, LCOM3, NPM, DAM, MOA, MFA, CAM, IC, CBM, AMC, Ca, Ce, Max_CC, Avg_CC, LOC} based on open-source data history records, the syntactic structure of project source code, and source code abstract syntax trees;
wherein WMC is Weighted Methods per Class; DIT is Depth of Inheritance Tree; NOC is Number of Children (subclasses); CBO is Coupling Between Object classes; RFC is Response For a Class; LCOM and LCOM3 are two variants of Lack of Cohesion in Methods; NPM is Number of Public Methods; DAM is Data Access Metric; MOA is Measure Of Aggregation; MFA is Measure of Functional Abstraction; CAM is Cohesion Among Methods of a class; IC is Inheritance Coupling; CBM is Coupling Between Methods; AMC is Average Method Complexity; Ca is afferent coupling; Ce is efferent coupling; Max_CC is the maximum McCabe cyclomatic complexity; Avg_CC is the average McCabe cyclomatic complexity; LOC is the number of lines of code;
s1.4, forming a defect prediction data set DATASET from the instances and features.
3. The method for cross-project software defect prediction based on source selection as claimed in claim 1, wherein the specific steps of step S2 are:
FSelection={RF,CL,GR,IG,OR,SU};
wherein the RF method evaluates an attribute by repeatedly sampling an instance and considering the value of the given attribute for the nearest instances of the same class and of a different class; it can operate on both discrete and continuous class data;
the CL method evaluates an attribute by measuring the correlation between the attribute and the class; nominal attributes are considered value by value, each value being treated as an indicator, and the overall correlation of a nominal attribute is obtained as a weighted average;
the GR method evaluates an attribute by measuring its gain ratio with respect to the class;
the IG method evaluates the weight of an attribute by measuring the information gain of the attribute with respect to the class;
the OR method predicts using the minimum-error attribute and can discretize numeric attributes;
the SU method evaluates an attribute by measuring its symmetrical uncertainty with respect to the class.
4. The method for cross-project software defect prediction based on source selection as claimed in claim 1, wherein the specific steps of step S3 are:
s3.1, constructing a feature quantity set: the feature quantity starts at $\alpha$ and increases with step size $\beta$ for $\gamma$ steps, giving the feature quantity set $\{\alpha, \alpha+\beta, \alpha+2\beta, \ldots, \alpha+\gamma\beta\}$, where $\alpha+\gamma\beta$ equals the total number of features, 20;
s3.2, selecting a feature selection method fs from the set FSelection;
s3.3, selecting a feature quantity fn from the feature quantity set;
s3.4, training a data set DATASET based on fs, fn and a logistic regression classification algorithm, and obtaining F-measure performance parameters;
s3.5, repeating the steps S3.3 to S3.4 until all the feature quantities are selected;
s3.6, repeating the step S3.2 to the step S3.5 until all the feature selection methods are selected;
and S3.7, obtaining an optimal feature selection method BFmethod by comparing F-measure performance parameters.
5. The method for cross-project software defect prediction based on source selection as claimed in claim 1, wherein the specific steps of step S4 are:
s4.1, constructing a feature quantity set: the feature quantity starts at $\alpha$ and increases with step size $\beta$ for $\gamma$ steps, giving the feature quantity set $\{\alpha, \alpha+\beta, \alpha+2\beta, \ldots, \alpha+\gamma\beta\}$, where $\alpha+\gamma\beta$ equals the total number of features, 20;
s4.2, setting the feature selection method to BFmethod;
s4.3, selecting a feature quantity fn from the feature quantity set;
s4.4, training a data set DATASET based on BFmethod, fn and a logistic regression classification algorithm, and obtaining F-measure performance parameters;
s4.5, repeating the steps S4.3 to S4.4 until all the feature quantities are selected;
and S4.6, comparing the F-measure performance parameters to obtain the optimal feature quantity FThreshold.
6. The method for cross-project software defect prediction based on source selection as claimed in claim 1, wherein the specific steps of step S5 are:
s5.1, constructing the source project selection method mean_log: for a given set of source projects $\{X_1, X_2, \ldots, X_n\}$ ($i = 1, 2, \ldots, n$) and a target project $Y$, let $X_i' = \log(1 + X_i)$ and $Y' = \log(1 + Y)$; $\mathrm{Mean}(X_i')$ is the vector consisting of the means of all feature metric values of $X_i'$; $\mathrm{Dist}(X_i', Y')$ is the Euclidean distance between $X_i'$ and $Y'$, i.e., between $\mathrm{Mean}(X_i')$ and $\mathrm{Mean}(Y')$; if $\mathrm{Dist}(X_j', Y') = \min\{\mathrm{Dist}(X_1', Y'), \mathrm{Dist}(X_2', Y'), \ldots, \mathrm{Dist}(X_n', Y')\}$, then $X_j$ is selected as the source project;
s5.2, constructing the source project selection method std_log: for a given set of source projects $\{X_1, X_2, \ldots, X_n\}$ and a target project $Y$, let $X_i' = \log(1 + X_i)$ and $Y' = \log(1 + Y)$; $\mathrm{Std}(X_i')$ is the vector consisting of the standard deviations of all feature metric values of $X_i'$; $\mathrm{Dist}(X_i', Y')$ is the Euclidean distance between $X_i'$ and $Y'$, i.e., between $\mathrm{Std}(X_i')$ and $\mathrm{Std}(Y')$; if $\mathrm{Dist}(X_j', Y') = \min\{\mathrm{Dist}(X_1', Y'), \mathrm{Dist}(X_2', Y'), \ldots, \mathrm{Dist}(X_n', Y')\}$, then $X_j$ is selected as the source project;
s5.3, constructing the source project selection method median_log: for a given set of source projects $\{X_1, X_2, \ldots, X_n\}$ and a target project $Y$, let $X_i' = \log(1 + X_i)$ and $Y' = \log(1 + Y)$; $\mathrm{Median}(X_i')$ is the vector consisting of the medians of all feature metric values of $X_i'$; $\mathrm{Dist}(X_i', Y')$ is the Euclidean distance between $X_i'$ and $Y'$, i.e., between $\mathrm{Median}(X_i')$ and $\mathrm{Median}(Y')$; if $\mathrm{Dist}(X_j', Y') = \min\{\mathrm{Dist}(X_1', Y'), \mathrm{Dist}(X_2', Y'), \ldots, \mathrm{Dist}(X_n', Y')\}$, then $X_j$ is selected as the source project;
s5.4, constructing the source project selection method median_zscore: for a given set of source projects $\{X_1, X_2, \ldots, X_n\}$ and a target project $Y$, let $X_i' = \mathrm{zscore}(X_i)$ and $Y' = \mathrm{zscore}(Y)$; $\mathrm{Median}(X_i')$ is the vector consisting of the medians of all feature metric values of $X_i'$; $\mathrm{Dist}(X_i', Y')$ is the Euclidean distance between $X_i'$ and $Y'$, i.e., between $\mathrm{Median}(X_i')$ and $\mathrm{Median}(Y')$; if $\mathrm{Dist}(X_j', Y') = \min\{\mathrm{Dist}(X_1', Y'), \mathrm{Dist}(X_2', Y'), \ldots, \mathrm{Dist}(X_n', Y')\}$, then $X_j$ is selected as the source project;
s5.5, constructing the source project selection method TDS: this method selects training data according to the distributional characteristics of the data, and provides two training data selection strategies that use distance as the similarity measure;
s5.6, forming the source project selection method set SPSelection = {mean_log, std_log, median_zscore, TDS}.
7. The method for cross-project software defect prediction based on source selection as claimed in claim 1, wherein the specific steps of step S6 are:
s6.1, selecting a method from the source project selection method set SPselection in the step S5 for testing;
s6.2, with the number of features set to FThreshold, computing the prediction performance obtained when training on the selected source project and predicting the other projects;
s6.3, calculating the average of the prediction results under the same source project selection method;
s6.4, repeating the step S6.1 to the step S6.3 until all the source project selection methods are tested;
s6.5, comparing the average prediction results to obtain the optimal source project selection method;
and S6.6, obtaining a cross-project defect prediction method CPSPM.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110503077.0A CN113176998A (en) | 2021-05-10 | 2021-05-10 | Cross-project software defect prediction method based on source selection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110503077.0A CN113176998A (en) | 2021-05-10 | 2021-05-10 | Cross-project software defect prediction method based on source selection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113176998A true CN113176998A (en) | 2021-07-27 |
Family
ID=76928591
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110503077.0A Pending CN113176998A (en) | 2021-05-10 | 2021-05-10 | Cross-project software defect prediction method based on source selection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113176998A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114510431A (en) * | 2022-04-20 | 2022-05-17 | Wuhan University of Technology | Workload-aware intelligent contract defect prediction method, system and equipment |
CN114924962A (en) * | 2022-05-17 | 2022-08-19 | Beihang University | Cross-project software defect prediction data selection method |
CN115269378A (en) * | 2022-06-23 | 2022-11-01 | Nantong University | Cross-project software defect prediction method based on domain feature distribution |
CN115269377A (en) * | 2022-06-23 | 2022-11-01 | Nantong University | Cross-project software defect prediction method based on optimization instance selection |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107391369A (en) * | 2017-07-13 | 2017-11-24 | 武汉大学 | A kind of spanned item mesh failure prediction method based on data screening and data oversampling |
US20190265970A1 (en) * | 2018-02-28 | 2019-08-29 | Fujitsu Limited | Automatic identification of relevant software projects for cross project learning |
CN111581116A (en) * | 2020-06-16 | 2020-08-25 | 江苏师范大学 | Cross-project software defect prediction method based on hierarchical data screening |
CN111966586A (en) * | 2020-08-05 | 2020-11-20 | 南通大学 | Cross-project defect prediction method based on module selection and weight updating |
- 2021-05-10: application CN202110503077.0A filed in CN (published as CN113176998A; active, pending)
Patent Citations (4) (* cited by examiner)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107391369A * | 2017-07-13 | 2017-11-24 | Wuhan University | A cross-project defect prediction method based on data screening and data oversampling |
US20190265970A1 * | 2018-02-28 | 2019-08-29 | Fujitsu Limited | Automatic identification of relevant software projects for cross project learning |
CN111581116A * | 2020-06-16 | 2020-08-25 | Jiangsu Normal University | Cross-project software defect prediction method based on hierarchical data screening |
CN111966586A * | 2020-08-05 | 2020-11-20 | Nantong University | Cross-project defect prediction method based on module selection and weight updating |
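Non-Patent Citations (2) (* cited by examiner)
Title |
---|
WANZHI WEN et al.: "An Empirical Study on Combining Source Selection and Transfer Learning for Cross-Project Defect Prediction", 2019 IEEE 1st International Workshop on Intelligent Bug Fixing (IBF) * |
Wang Liping: "Design and Implementation of an Ensemble Cross-Project Defect Prediction Method Based on Instance Selection", China Master's Theses Full-text Database, Information Science and Technology * |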
Cited By (6) (* cited by examiner)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114510431A * | 2022-04-20 | 2022-05-17 | Wuhan University of Technology | Workload-aware intelligent contract defect prediction method, system and equipment |
CN114924962A * | 2022-05-17 | 2022-08-19 | Beihang University | Cross-project software defect prediction data selection method |
CN114924962B * | 2022-05-17 | 2024-05-31 | Beihang University | Cross-project software defect prediction data selection method |
CN115269378A * | 2022-06-23 | 2022-11-01 | Nantong University | Cross-project software defect prediction method based on domain feature distribution |
CN115269377A * | 2022-06-23 | 2022-11-01 | Nantong University | Cross-project software defect prediction method based on optimization instance selection |
CN115269378B * | 2022-06-23 | 2023-06-09 | Nantong University | Cross-project software defect prediction method based on domain feature distribution |
Similar Documents
Publication | Title |
---|---|
CN113176998A | Cross-project software defect prediction method based on source selection |
Wang et al. | Input feature selection method based on feature set equivalence and mutual information gain maximization |
Wang et al. | Truth discovery via exploiting implications from multi-source data |
US20200257731A1 | Disambiguation of massive graph databases |
CN114564410A | Software defect prediction method based on class level source code similarity |
CN116226103A | Method for detecting government data quality based on FPGrow algorithm |
Gao et al. | Adapting the TopLeaders algorithm for dynamic social networks |
Li et al. | A new density peak clustering algorithm based on cluster fusion strategy |
Yao et al. | An improved clustering algorithm and its application in wechat sports users analysis |
Qinl et al. | Synthesizing privacy preserving entity resolution datasets |
Song et al. | On saving outliers for better clustering over noisy data |
Malik et al. | A comprehensive approach towards data preprocessing techniques & association rules |
Li et al. | A novel approach to remote sensing image retrieval with multi-feature VP-tree indexing and online feature selection |
CN113705920B | Method for generating water data sample set for thermal power plant and terminal equipment |
Wu et al. | Optimization and improvement based on K-Means Cluster algorithm |
Li et al. | Intelligent fuzzy optimization algorithm of data mining based on BP neural network |
Lv et al. | Active learning of three-way decision based on neighborhood entropy |
CN111652384B | Balancing method for data volume distribution and data processing method |
CN109086373B | Method for constructing fair link prediction evaluation system |
Shao et al. | Research on Cross-Company Defect Prediction Method to Improve Software Security |
Shao et al. | A quantitative measurement method of code quality evaluation indicators based on data mining |
Hang et al. | A hierarchical clustering algorithm based on K-means with constraints |
Gong et al. | Diversified and Compatible Web APIs Recommendation in IoT |
CN113723835B | Water consumption evaluation method and terminal equipment for thermal power plant |
Wang et al. | Resisting the edge-type disturbance for link prediction in heterogeneous networks |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |