CN115269377A - Cross-project software defect prediction method based on optimization instance selection - Google Patents

Cross-project software defect prediction method based on optimization instance selection

Info

Publication number
CN115269377A
Authority
CN
China
Prior art keywords
instance
index
constructing
optimization
project
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210717428.2A
Other languages
Chinese (zh)
Other versions
CN115269377B (en)
Inventor
张瑞年
王楚越
王晨宇
尹思文
王超
郭伟琪
文万志
胡彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong University
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong University filed Critical Nantong University
Priority to CN202210717428.2A
Publication of CN115269377A
Application granted
Publication of CN115269377B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3684Test management for test design, e.g. generating new test cases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3688Test management for test execution, e.g. scheduling of test suites

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a cross-project software defect prediction method based on optimized instance selection, which comprises the following steps: S1, constructing a project vector set PVS; S2, constructing a target instance optimization index IPI; S3, constructing a pre-training set TPRED; S4, constructing an optimization index TPOI of the target project; S5, constructing a training set BOD selected based on optimized instances; and S6, constructing the cross-project software defect prediction method BOICP based on optimized instance selection.

Description

Cross-project software defect prediction method based on optimization instance selection
Technical Field
The invention belongs to the technical field of software defect prediction, and particularly relates to a cross-project software defect prediction method based on optimized instance selection.
Background
Researchers predict software defects by using historical data; however, a new system often has insufficient history. One way to solve this problem is to select historical data from other projects, use it to build a defect prediction model, and predict defects in the new project.
For a project with a large amount of data, researchers must consider how to select the instance data best suited to the target project: the better the source instance data matches the target project data, the more accurate the resulting defect prediction model.
Disclosure of Invention
The invention aims to solve the technical problem of providing a cross-project software defect prediction method based on optimized instance selection.
In order to solve the above technical problem, an embodiment of the present invention provides a cross-project software defect prediction method based on optimization instance selection, including the following steps:
s1, constructing a project vector set PVS;
s2, constructing a target instance optimization index IPI;
S3, constructing a pre-training set TPRED;
s4, constructing an optimized index TPOI of the target project;
S5, constructing a training set BOD selected based on optimized instances;
and S6, constructing the cross-project software defect prediction method BOICP based on optimized instance selection.
Wherein, step S1 includes the following steps:
s1.1, acquiring a software project set based on an open source website;
s1.2, constructing a project instance set by taking a project class as an instance;
S1.3, constructing a traditional metric set {WMC, DIT, NOC, CBO, RFC, LCOM, LCOM3, NPM, DAM, MOA, MFA, CAM, IC, CBM, AMC, Ca, Ce, Max_CC, Avg_CC, LOC} based on the open source data history, the project source code syntax structure, and the source code abstract syntax tree, wherein WMC represents the weighted methods per class, DIT the depth of the inheritance tree, NOC the number of children, CBO the coupling between object classes, RFC the response for a class, LCOM and LCOM3 the lack of cohesion in methods, NPM the number of public methods, DAM the data access metric, MOA the measure of aggregation, MFA the measure of functional abstraction, CAM the cohesion among methods of a class, IC the inheritance coupling, CBM the coupling between methods, AMC the average method complexity, Ca the afferent coupling, Ce the efferent coupling, Max_CC the maximum McCabe cyclomatic complexity, Avg_CC the average McCabe cyclomatic complexity, and LOC the number of lines of code;
S1.4, processing all instances in the source project according to the step S1.3 to obtain a source project traditional metric vector set SCPIVS = [instance_1, instance_2, …, instance_i], where i = 1, 2, 3, …, n;
S1.5, processing all instances in the target project according to the step S1.3 to obtain a target project traditional metric vector set TCPIVS = [tradition_value_1, tradition_value_2, …, tradition_value_j], where j = 1, 2, 3, …, m;
S1.6, constructing source project instance labels SLABEL = [stag_1, stag_2, …, stag_i] based on the open source data history, where i = 1, 2, 3, …, n; each label corresponds to an instance in the source project traditional metric vector set SCPIVS of step S1.4;
S1.7, constructing target project instance labels TLABEL = [ttag_1, ttag_2, …, ttag_j] based on the open source data history, where j = 1, 2, 3, …, m; each label corresponds to an instance in the target project traditional metric vector set TCPIVS of step S1.5;
s1.8, constructing a project vector set PVS = { SCPIVS, SLABEL, TCPIVS, TLABEL }.
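The assembly of the project vector set in step S1 can be sketched in Python. This is a minimal illustration, not code from the patent: the function name build_pvs and the (metric_vector, label) row format are assumptions, and each 20-element metric vector is taken to follow the traditional metric set listed above, with label 1 meaning defective and 0 meaning clean.

```python
# Hypothetical sketch of step S1 (names are illustrative, not from the patent).
# Each project is assumed to be parsed into rows of (metric_vector, label),
# where the metric vector follows the traditional metric set {WMC, ..., LOC}.

def build_pvs(source_rows, target_rows):
    """Split each project's rows into metric vectors and defect labels,
    yielding PVS = {SCPIVS, SLABEL, TCPIVS, TLABEL} as in step S1.8."""
    scpivs = [vec for vec, _ in source_rows]  # source metric vectors (S1.4)
    slabel = [tag for _, tag in source_rows]  # source labels (S1.6)
    tcpivs = [vec for vec, _ in target_rows]  # target metric vectors (S1.5)
    tlabel = [tag for _, tag in target_rows]  # target labels (S1.7)
    return {"SCPIVS": scpivs, "SLABEL": slabel,
            "TCPIVS": tcpivs, "TLABEL": tlabel}
```

In practice the rows would come from a PROMISE-style defect data set, one row per class instance.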
Wherein, step S2 includes the following steps:
s2.1, constructing an optimized index empty list IPI and a source instance index list ASI;
s2.2, selecting a target instance vector;
S2.3, if the optimization index list IPI is empty, constructing the global feature vector GFV of the instance training set as the target instance vector of step S2.2; otherwise, GFV is the set of per-metric standard deviations over all instances in the instance training set;
S2.4, using the source instance index list ASI of step S2.1 to select instances from the source project traditional metric vector set SCPIVS of step S1, constructing the candidate source instance library SIL;
s2.5, calculating the Euclidean distance between each instance in the source instance library SIL to be selected and the GFV, and returning an index min-index corresponding to the minimum Euclidean distance;
s2.6, adding the min-index into the IPI in the optimized index list in the step S2.1;
s2.7, deleting the min-index in the ASI;
S2.8, setting the number of source instances selected by each target instance as k, and cyclically executing steps S2.3 to S2.7 until the length of the optimization index list IPI reaches k;
and S2.9, obtaining the target instance optimization index IPI after the step S2.8 is executed.
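The greedy loop of step S2 can be sketched as follows. The patent text is terse about what "the instance training set" contains in step S2.3; the sketch below assumes (an interpretation, not a statement of the patent) that the GFV is recomputed as the per-metric standard deviation over the target vector plus the instances already selected. All names are illustrative.

```python
import math
import statistics

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_ipi(target_vec, scpivs, k):
    """Greedily pick k source-instance indices (the IPI) for one target
    instance, following steps S2.1-S2.9. Assumption: on the first pass the
    GFV is the target vector itself; afterwards it is the per-metric
    standard deviation over the target vector plus the instances selected
    so far."""
    ipi = []                               # optimization index list (S2.1)
    asi = list(range(len(scpivs)))         # remaining source indices (S2.1)
    pool = [target_vec]                    # vectors used to recompute GFV
    while len(ipi) < k and asi:
        if not ipi:
            gfv = target_vec               # S2.3, first iteration
        else:                              # S2.3, later iterations
            gfv = [statistics.pstdev(col) for col in zip(*pool)]
        # S2.4-S2.5: nearest remaining source instance to the GFV
        min_index = min(asi, key=lambda i: euclidean(scpivs[i], gfv))
        ipi.append(min_index)              # S2.6
        asi.remove(min_index)              # S2.7
        pool.append(scpivs[min_index])
    return ipi                             # S2.9
```

The population standard deviation (pstdev) is an arbitrary choice here; the patent does not specify which estimator is meant.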
Wherein, step S3 includes the following steps:
s3.1, executing each instance in the target project according to the step S2 to obtain a target instance optimization index IPI of each target instance;
s3.2, combining and de-duplicating the optimized indexes obtained by each target instance in the step S3.1, and constructing a pre-training set optimized index TIPI;
S3.3, selecting instances from the source project traditional metric vector set SCPIVS of step S1 by using the pre-training set optimization index TIPI obtained in step S3.2, to obtain the instance vector set TPRED-D of the pre-training set;
S3.4, selecting labels from the source project instance labels SLABEL of step S1 by using the pre-training set optimization index TIPI obtained in step S3.2, to obtain the label set TPRED-L of the pre-training set;
s3.5, constructing a pre-training set TPRED = { TPRED-D, TPRED-L }.
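The merge-and-de-duplicate of step S3 can be sketched in a few lines; build_tpred and the dict layout are illustrative names, not from the patent.

```python
def build_tpred(all_ipis, scpivs, slabel):
    """Sketch of step S3: merge the per-target index lists, de-duplicate
    while preserving order (TIPI, step S3.2), then slice out the
    pre-training set TPRED = {TPRED-D, TPRED-L} (steps S3.3-S3.5)."""
    tipi = list(dict.fromkeys(i for ipi in all_ipis for i in ipi))
    return {"TPRED-D": [scpivs[i] for i in tipi],   # S3.3
            "TPRED-L": [slabel[i] for i in tipi]}   # S3.4
```

dict.fromkeys is used because it removes duplicates while keeping first-seen order, which keeps the vector/label alignment deterministic.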
Wherein, step S4 includes the following steps:
s4.1, combining the example vector set TPRED-D of the pre-training set obtained in the step S3 with the label set TPRED-L of the pre-training set according to columns, and placing the label set in the last column;
S4.2, calculating the correlation between each metric and the label column by using the Spearman correlation coefficient to obtain a correlation list CList;
S4.3, taking absolute values of all elements of the correlation list CList in step S4.2, sorting them in descending order, and returning the corresponding feature indexes;
S4.4, setting the number of selected correlation feature indexes as q;
S4.5, selecting from the feature indexes returned in step S4.3 according to the number q of correlation feature indexes in step S4.4, and constructing the source project correlation feature set SPTFS from the obtained feature indexes;
S4.6, constructing the target project correlation feature set TPTFS from the target project traditional metric vector set TCPIVS and the target project instance labels TLABEL of step S1, according to steps S4.1 to S4.5;
S4.7, calculating the Euclidean distance between every source instance in the source project correlation feature set SPTFS and the correlation feature vector of one instance in the target project correlation feature set TPTFS, returning the index list sorted by ascending distance, and, with the number of instances selected from SPTFS set to p, obtaining the p indexes selected for that target instance;
and S4.8, processing all target instances in the target project correlation feature set TPTFS according to step S4.7 to obtain an optimization index set, and de-duplicating it to obtain the optimization index TPOI of the target project.
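Step S4 combines Spearman-based feature ranking with nearest-neighbour instance selection. The sketch below implements both in plain Python; the helper names (spearman, top_q_features, build_tpoi) are assumptions, and the Spearman coefficient is computed in the standard way as the Pearson correlation of average ranks.

```python
def _ranks(xs):
    """Average ranks (ties share their mean rank), 1-based."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den if den else 0.0

def top_q_features(vectors, labels, q):
    """S4.2-S4.5: rank metrics by |Spearman| against the label column."""
    n = len(vectors[0])
    clist = [spearman([v[f] for v in vectors], labels) for f in range(n)]
    return sorted(range(n), key=lambda f: abs(clist[f]), reverse=True)[:q]

def build_tpoi(sptfs, tptfs, p):
    """S4.7-S4.8: for each target instance take the p nearest source
    instances (Euclidean, in the reduced feature space), de-duplicated."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    picked = []
    for t in tptfs:
        order = sorted(range(len(sptfs)), key=lambda i: dist(sptfs[i], t))
        picked.extend(order[:p])
    return list(dict.fromkeys(picked))
```

A constant metric column has zero rank variance; the sketch returns a correlation of 0.0 in that case so such features sort last.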
Wherein, step S5 includes the following steps:
S5.1, selecting from the instance vector set TPRED-D of the pre-training set in step S3 by using the optimization index TPOI of the target project obtained in step S4, to obtain the training feature set BOD-D selected based on optimized instances;
S5.2, selecting from the label set TPRED-L of the pre-training set in step S3 by using the optimization index TPOI of the target project obtained in step S4, to obtain the label set BOD-L selected based on optimized instances;
and S5.3, constructing the training set BOD = {BOD-D, BOD-L} selected based on optimized instances.
Wherein, step S6 comprises the following steps:
s6.1, obtaining an item vector set PVS = { SCPIVS, SLABEL, TCPIVS, TLABEL } through the step S1;
s6.2, obtaining a target instance optimization index IPI through the step S2;
s6.3, obtaining an example vector set TPRED-D of the pre-training set and a label set TPRED-L of the pre-training set through the step S3;
S6.4, obtaining the optimization index TPOI of the target project through the step S4;
S6.5, obtaining the training feature set BOD-D and the label set BOD-L selected based on optimized instances through the step S5;
S6.6, performing model training on the training feature set BOD-D and the label set BOD-L of step S6.5 by using a Logistic classification algorithm;
S6.7, performing defect prediction on the target project traditional metric vector set TCPIVS obtained in step S1 by using the model trained in step S6.6 to obtain a prediction label set PRED_LABEL, and computing the f-score from PRED_LABEL and the target project instance labels TLABEL;
and S6.8, obtaining the cross-project software defect prediction method BOICP based on optimized instance selection.
The technical scheme of the invention has the following beneficial effects:
the invention provides a cross-project software defect prediction method based on optimization case selection.
Drawings
FIG. 1 is a block flow diagram of the present invention;
FIG. 2 is a flow diagram of constructing the training set BOD selected based on optimized instances in the present invention;
FIG. 3 is a graph of selected example numbers at different k in the present invention;
FIG. 4 is a graph of f-score obtained using Logistic at different k's according to the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the present invention provides a cross-project software defect prediction method based on optimized instance selection, comprising the following steps:
s1, constructing a project vector set PVS;
s2, constructing a target instance optimization index IPI;
s3, constructing a pre-training set TPRED;
s4, constructing an optimized index TPOI of the target project;
S5, constructing a training set BOD selected based on optimized instances;
and S6, constructing the cross-project software defect prediction method BOICP based on optimized instance selection.
Step S1, the concrete steps of constructing the project vector set PVS are as follows:
s1.1, acquiring a software project set based on an open source website;
s1.2, constructing a project instance set by taking a project class as an instance;
S1.3, constructing a feature set {WMC, DIT, NOC, CBO, RFC, LCOM, LCOM3, NPM, DAM, MOA, MFA, CAM, IC, CBM, AMC, Ca, Ce, Max_CC, Avg_CC, LOC} based on the open source data history, the project source code syntax structure, and the source code abstract syntax tree, wherein WMC represents the weighted methods per class, DIT the depth of the inheritance tree, NOC the number of children, CBO the coupling between object classes, RFC the response for a class, LCOM and LCOM3 the lack of cohesion in methods, NPM the number of public methods, DAM the data access metric, MOA the measure of aggregation, MFA the measure of functional abstraction, CAM the cohesion among methods of a class, IC the inheritance coupling, CBM the coupling between methods, AMC the average method complexity, Ca the afferent coupling, Ce the efferent coupling, Max_CC the maximum McCabe cyclomatic complexity, Avg_CC the average McCabe cyclomatic complexity, and LOC the number of lines of code.
S1.4, processing all instances in the source project according to the above steps to obtain a source project traditional metric vector set SCPIVS = [tradition_value_1, tradition_value_2, …, tradition_value_i], where i = 1, 2, 3, …, n;
S1.5, processing all instances in the target project according to the same steps to obtain a target project traditional metric vector set TCPIVS = [tradition_value_1, tradition_value_2, …, tradition_value_j], where j = 1, 2, 3, …, m.
S1.6, constructing source project instance labels SLABEL = [stag_1, stag_2, …, stag_i] based on the open source data history, where i = 1, 2, 3, …, n; each label corresponds to an instance in the source project traditional metric vector set SCPIVS;
S1.7, constructing target project instance labels TLABEL = [ttag_1, ttag_2, …, ttag_j] based on the open source data history, where j = 1, 2, 3, …, m; each label corresponds to an instance in the target project traditional metric vector set TCPIVS.
S1.8, constructing a project vector set PVS = { SCPIVS, SLABEL, TCPIVS, TLABEL }.
S2, the specific steps of constructing the target instance optimization index IPI are as follows:
s2.1, constructing an optimized index empty list IPI and a source instance index list ASI.
S2.2, selecting a target instance vector;
And S2.3, if the optimization index list IPI is empty, constructing the global feature vector GFV of the instance training set as the target instance vector of step S2.2; otherwise, GFV is the set of per-metric standard deviations over all instances in the instance training set.
S2.4, using the source instance index list ASI of step S2.1 to select instances from the source project traditional metric vector set SCPIVS of step S1, constructing the candidate source instance library SIL.
S2.5, calculating the Euclidean distance between each instance in the source instance library SIL to be selected and the GFV, and returning the index min-index corresponding to the minimum Euclidean distance.
S2.6, adding the min-index into the IPI in the optimized index list in the step S2.1;
s2.7, deleting the min-index in the ASI.
S2.8, setting the number of source instances selected by each target instance to k = 5, and cyclically executing steps S2.3 to S2.7 until the length of the optimization index list IPI reaches k;
and S2.9, obtaining the target instance optimization index IPI after the step S2.8 is executed.
S3, constructing a pre-training set TPRED specifically comprises the following steps:
s3.1, executing each instance in the target project according to the steps S2.1-S2.9 to obtain a target instance optimization index IPI of each target instance;
and S3.2, combining and de-duplicating the optimized indexes obtained by each target instance in the step S3.1, and constructing a pre-training set optimized index TIPI.
S3.3, selecting instances from the source project traditional metric vector set SCPIVS of step S1.8 by using the pre-training set optimization index TIPI obtained in step S3.2, to obtain the instance vector set TPRED-D of the pre-training set;
and S3.4, selecting labels from the source project instance labels SLABEL of step S1.6 by using the pre-training set optimization index TIPI obtained in step S3.2, to obtain the label set TPRED-L of the pre-training set.
S3.5, constructing a pre-training set TPRED = { TPRED-D, TPRED-L }.
S4, the specific steps of constructing the optimized index TPOI of the target item are as follows:
s4.1, combining the example vector set TPRED-D of the pre-training set obtained in the step S3.5 and the label set TPRED-L of the pre-training set according to columns, and placing the label set in the last column;
and S4.2, calculating the direct correlation between each metric element and the last list of tags by using the spearman to obtain a correlation list CList.
S4.3, sorting all the elements of the correlation list CList in the step S4.2 from big to small after taking absolute values, and returning a feature corresponding index;
s4.4, setting and selecting the number 10 of the correlation characteristic indexes;
and S4.5, selecting the feature index returned in the step S4.3 by using the number of the relevant feature indexes in the step S4.4, and constructing a source item relevance feature set SPTFS by using the obtained feature index.
S4.6, constructing the target project correlation feature set TPTFS from the target project traditional metric vector set TCPIVS of step S1.5 and the target project instance labels TLABEL of step S1.7, according to steps S4.1 to S4.5.
S4.7, calculating the Euclidean distance between every source instance in the source project correlation feature set SPTFS and the correlation feature vector of one instance in the target project correlation feature set TPTFS, returning the index list sorted by ascending distance, and, with the number of instances selected from SPTFS set to p = 2, obtaining the 2 indexes selected for that target instance.
And S4.8, processing all target instances in the target project correlation feature set TPTFS according to step S4.7 to obtain an optimization index set, and de-duplicating it to obtain the optimization index TPOI of the target project.
S5, constructing the training set BOD selected based on optimized instances, with the following specific steps:
S5.1, selecting from the instance vector set TPRED-D of the pre-training set in step S3.3 by using the optimization index TPOI of the target project obtained in step S4, to obtain the training feature set BOD-D selected based on optimized instances.
S5.2, selecting from the label set TPRED-L of the pre-training set in step S3.4 by using the optimization index TPOI of the target project obtained in step S4, to obtain the label set BOD-L selected based on optimized instances.
And S5.3, constructing the training set BOD = {BOD-D, BOD-L} selected based on optimized instances.
A flow chart for constructing a BOD for a training set selected based on an optimization instance is shown in fig. 2.
S6, constructing a cross-project software defect prediction method BOICP based on optimization instance selection, and specifically comprising the following steps:
ivy-2.0 is selected as the source project and synapse-1.2 as the target project. The source project traditional metric vector set SCPIVS and source project instance labels SLABEL are constructed from the source project instances, and the target project traditional metric vector set TCPIVS and target project instance labels TLABEL are constructed from the target project instances.
The target instance optimization index IPI for each choice of the number of selected source instances is obtained by the method for constructing the target instance optimization index defined above.
The instance vector set TPRED-D and the label set TPRED-L of the pre-training set are obtained by the pre-training set construction method defined above.
The optimization index TPOI of the target project is obtained by the method for constructing the target project optimization index defined above.
The number of selected source instances k is set in the range of 1 to 5, and the training set BOD = {BOD-D, BOD-L} selected based on optimized instances is obtained for each k.
A Logistic classifier is used to build a classification model on the training set BOD selected based on optimized instances and to make predictions. Experiments show that the model achieves an f-score of at most 0.343, compared with 0.149 without the instance selection method; the model built with this method therefore outperforms the model built without instance selection, demonstrating the effectiveness of the cross-project software defect prediction method based on optimized instance selection.
The number of selected instances at different k is shown in fig. 3.
The f-score obtained using Logistic at different k is shown in FIG. 4.
According to the method, a global feature vector is constructed for each target instance and used to select instances from the source project. Correlation analysis is then applied within the selected training set, and the correlation features of the instances are used to further select source instances. All selected source instances form the training data set, from which the cross-project defect prediction model is built, achieving a better cross-project defect prediction effect.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (7)

1. A cross-project software defect prediction method based on optimization instance selection is characterized by comprising the following steps:
s1, constructing a project vector set PVS;
s2, constructing a target instance optimization index IPI;
s3, constructing a pre-training set TPRED;
s4, constructing an optimized index TPOI of the target project;
S5, constructing a training set BOD selected based on optimized instances;
and S6, constructing the cross-project software defect prediction method BOICP based on optimized instance selection.
2. The optimization instance selection-based cross-project software defect prediction method according to claim 1, wherein the step S1 comprises the steps of:
s1.1, acquiring a software project set based on an open source website;
s1.2, constructing a project instance set by taking a project class as an instance;
S1.3, constructing a traditional metric set {WMC, DIT, NOC, CBO, RFC, LCOM, LCOM3, NPM, DAM, MOA, MFA, CAM, IC, CBM, AMC, Ca, Ce, Max_CC, Avg_CC, LOC} based on the open source data history, the project source code syntax structure, and the source code abstract syntax tree, wherein WMC represents the weighted methods per class, DIT the depth of the inheritance tree, NOC the number of children, CBO the coupling between object classes, RFC the response for a class, LCOM and LCOM3 the lack of cohesion in methods, NPM the number of public methods, DAM the data access metric, MOA the measure of aggregation, MFA the measure of functional abstraction, CAM the cohesion among methods of a class, IC the inheritance coupling, CBM the coupling between methods, AMC the average method complexity, Ca the afferent coupling, Ce the efferent coupling, Max_CC the maximum McCabe cyclomatic complexity, Avg_CC the average McCabe cyclomatic complexity, and LOC the number of lines of code;
S1.4, processing all instances in the source project according to the step S1.3 to obtain a source project traditional metric vector set SCPIVS = [instance_1, instance_2, …, instance_i], where i = 1, 2, 3, …, n;
S1.5, processing all instances in the target project according to the step S1.3 to obtain a target project traditional metric vector set TCPIVS = [tradition_value_1, tradition_value_2, …, tradition_value_j], where j = 1, 2, 3, …, m;
S1.6, constructing source project instance labels SLABEL = [stag_1, stag_2, …, stag_i] based on the open source data history, where i = 1, 2, 3, …, n; each label corresponds to an instance in the source project traditional metric vector set SCPIVS of step S1.4;
S1.7, constructing target project instance labels TLABEL = [ttag_1, ttag_2, …, ttag_j] based on the open source data history, where j = 1, 2, 3, …, m; each label corresponds to an instance in the target project traditional metric vector set TCPIVS of step S1.5;
s1.8, constructing a project vector set PVS = { SCPIVS, SLABEL, TCPIVS, TLABEL }.
3. The optimization instance selection-based cross-project software defect prediction method according to claim 1, wherein the step S2 comprises the steps of:
s2.1, constructing an optimized index empty list IPI and a source instance index list ASI;
s2.2, selecting a target instance vector;
S2.3, if the optimization index list IPI is empty, constructing the global feature vector GFV of the instance training set as the target instance vector of step S2.2; otherwise, GFV is the set of per-metric standard deviations over all instances in the instance training set;
S2.4, using the source instance index list ASI of step S2.1 to select instances from the source project traditional metric vector set SCPIVS of step S1, constructing the candidate source instance library SIL;
s2.5, calculating the Euclidean distance between each instance in the source instance library SIL to be selected and the GFV, and returning an index min-index corresponding to the minimum Euclidean distance;
s2.6, adding the min-index into the IPI in the optimized index list in the step S2.1;
s2.7, deleting the min-index in the ASI;
S2.8, setting the number of source instances selected by each target instance as k, and cyclically executing steps S2.3 to S2.7 until the length of the optimization index list IPI reaches k;
and S2.9, obtaining the target instance optimization index IPI after the step S2.8 is executed.
4. The optimization instance selection-based cross-project software defect prediction method according to claim 1, wherein the step S3 comprises the steps of:
s3.1, executing each instance in the target project according to the step S2 to obtain a target instance optimization index IPI of each target instance;
s3.2, combining and de-duplicating the optimized indexes obtained by each target instance in the step S3.1, and constructing a pre-training set optimized index TIPI;
S3.3, selecting instances from the source project traditional metric vector set SCPIVS of step S1 by using the pre-training set optimization index TIPI obtained in step S3.2, to obtain the instance vector set TPRED-D of the pre-training set;
S3.4, selecting labels from the source project instance labels SLABEL of step S1 by using the pre-training set optimization index TIPI obtained in step S3.2, to obtain the label set TPRED-L of the pre-training set;
s3.5, constructing a pre-training set TPRED = { TPRED-D, TPRED-L }.
5. The optimization instance selection-based cross-project software defect prediction method according to claim 1, wherein the step S4 comprises the steps of:
s4.1, combining the example vector set TPRED-D of the pre-training set obtained in the step S3 with the label set TPRED-L of the pre-training set according to columns, and placing the label set in the last column;
S4.2, calculating the correlation between each metric and the label column by using the Spearman correlation coefficient to obtain a correlation list CList;
S4.3, taking absolute values of all elements of the correlation list CList in step S4.2, sorting them in descending order, and returning the corresponding feature indexes;
s4.4, setting the number of the selected correlation characteristic indexes as q;
s4.5, selecting the feature index returned in the step S4.3 by using the number q of the relevant feature indexes in the step S4.4, and constructing a source item relevant feature set SPTFS by using the obtained feature index;
s4.6, constructing a target item relevance feature set TPTFS by using the target item traditional metric vector set TCPIVS and the target item instance label TLABEL in the step S1 according to the steps S4.1-S4.5;
s4.7, calculating Euclidean distances of all source examples in the source item relevance feature set SPTFS and one example relevance feature set in the target item relevance feature set TPTFS, returning an index list after the Euclidean distances are sorted from small to large, setting the number of selected examples in the SPTFS as p, and then obtaining p indexes selected by the target examples;
and S4.8, processing all target instances in the target item correlation feature set TPTFS according to the step S4.7 to obtain an optimized index set, and removing the index set to obtain an optimized index TPOI of the target item.
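The two halves of claim 5, Spearman-based feature ranking (S4.2-S4.5) and per-instance Euclidean nearest-neighbour selection (S4.7-S4.8), can be sketched as follows. This is a hypothetical rendering under assumed data; the values of q and p and the function names are illustrative, not taken from the patent.

```python
import numpy as np
from scipy.stats import spearmanr

def select_features(features, labels, q):
    """S4.2-S4.5: rank metric columns by |Spearman rho| with the labels,
    return the indexes of the top-q columns."""
    clist = np.array([spearmanr(features[:, j], labels).correlation
                      for j in range(features.shape[1])])
    # descending order of absolute correlation, keep the first q indexes
    return np.argsort(-np.abs(clist))[:q]

def optimize_indexes(sptfs, tptfs, p):
    """S4.7-S4.8: for each target instance, keep the indexes of its p
    nearest source instances; de-duplicate the union into TPOI."""
    tpoi = set()
    for t in tptfs:
        d = np.linalg.norm(sptfs - t, axis=1)   # Euclidean distances
        tpoi.update(np.argsort(d)[:p].tolist()) # p nearest source indexes
    return np.array(sorted(tpoi))

# Illustrative data: 30 source and 8 target instances with 5 metrics
rng = np.random.default_rng(0)
src = rng.random((30, 5)); src_y = rng.integers(0, 2, 30)
tgt = rng.random((8, 5))

cols = select_features(src, src_y, q=3)          # relevance feature indexes
tpoi = optimize_indexes(src[:, cols], tgt[:, cols], p=4)
print(len(cols))  # → 3
```

Since each of the 8 target instances contributes at most 4 indexes, TPOI here holds at most 32 unique source indexes, usually fewer after de-duplication.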
6. The method of claim 1, wherein step S5 comprises the steps of:
S5.1, selecting instances from the instance vector set TPRED-D of the pre-training set of step S3 using the optimization index TPOI of the target project obtained in step S4 to obtain the training feature set BOD-D selected based on optimization instances;
S5.2, selecting instances from the label set TPRED-L of the pre-training set of step S3 using the optimization index TPOI of the target project obtained in step S4 to obtain the label set BOD-L selected based on optimization instances;
S5.3, constructing the training set BOD = { BOD-D, BOD-L } selected based on optimization instances.
7. The method of claim 1, wherein step S6 comprises the steps of:
S6.1, obtaining the project vector set PVS = { SCPIVS, SLABEL, TCPIVS, TLABEL } through step S1;
S6.2, obtaining the target instance optimization index IPI through step S2;
S6.3, obtaining the instance vector set TPRED-D of the pre-training set and the label set TPRED-L of the pre-training set through step S3;
S6.4, obtaining the optimization index TPOI of the target project through step S4;
S6.5, obtaining the training feature set BOD-D and the label set BOD-L selected based on optimization instances through step S5;
S6.6, training a model on the training feature set BOD-D and the label set BOD-L of step S6.5 using the Logistic classification algorithm;
S6.7, performing defect prediction on the target project traditional metric vector set TCPIVS of step S1 with the model trained in step S6.6 to obtain the prediction label set PRED_LABEL, and computing the f-score from PRED_LABEL and the target project instance labels TLABEL;
S6.8, obtaining the cross-project software defect prediction method BOICP based on optimization instance selection.
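The final training-and-evaluation steps S6.6-S6.7 can be sketched with scikit-learn's logistic regression. This is a minimal illustration under synthetic data, not the patent's implementation; the patent does not specify a library or hyper-parameters, so everything below is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Illustrative stand-ins for the sets named in the claims:
# BOD-D / BOD-L: training features and labels selected in step S5;
# TCPIVS / TLABEL: target project metric vectors and true labels.
rng = np.random.default_rng(1)
bod_d = rng.random((60, 4))
bod_l = (bod_d[:, 0] > 0.5).astype(int)      # synthetic defect labels
tcpivs = rng.random((20, 4))
tlabel = (tcpivs[:, 0] > 0.5).astype(int)

# S6.6: train the Logistic classification model on the selected set
model = LogisticRegression().fit(bod_d, bod_l)

# S6.7: predict labels for the target project and compute the f-score,
# i.e. the harmonic mean 2PR / (P + R) of precision and recall
pred_label = model.predict(tcpivs)
f = f1_score(tlabel, pred_label)
print(pred_label.shape)  # → (20,)
```

Because the target labels TLABEL are only used in S6.7 for scoring, the pipeline remains a genuine cross-project setting: no target label reaches the training step.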
CN202210717428.2A 2022-06-23 2022-06-23 Cross-project software defect prediction method based on optimization instance selection Active CN115269377B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210717428.2A CN115269377B (en) 2022-06-23 2022-06-23 Cross-project software defect prediction method based on optimization instance selection


Publications (2)

Publication Number Publication Date
CN115269377A true CN115269377A (en) 2022-11-01
CN115269377B CN115269377B (en) 2023-07-11

Family

ID=83761872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210717428.2A Active CN115269377B (en) 2022-06-23 2022-06-23 Cross-project software defect prediction method based on optimization instance selection

Country Status (1)

Country Link
CN (1) CN115269377B (en)

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0768617A2 (en) * 1995-10-16 1997-04-16 AT&T Corp. An interleaved segmental method of handwriting recognition
US20080263507A1 (en) * 2007-04-17 2008-10-23 Ching-Pao Chang Action-based in-process software defect prediction software defect prediction techniques based on software development activities
US20100306249A1 (en) * 2009-05-27 2010-12-02 James Hill Social network systems and methods
US20150112903A1 (en) * 2013-02-28 2015-04-23 Huawei Technologies Co., Ltd. Defect prediction method and apparatus
KR101746328B1 (en) * 2016-01-29 2017-06-12 한국과학기술원 Hybrid instance selection method using nearest-neighbor for cross-project defect prediction
CN110008584A (en) * 2019-04-02 2019-07-12 广东石油化工学院 A kind of semi-supervised heterogeneous software failure prediction algorithm based on GitHub
CN111858328A (en) * 2020-07-15 2020-10-30 南通大学 Software defect module severity prediction method based on ordered neural network
CN112346974A (en) * 2020-11-07 2021-02-09 重庆大学 Cross-mobile application program instant defect prediction method based on depth feature embedding
WO2021093140A1 (en) * 2019-11-11 2021-05-20 南京邮电大学 Cross-project software defect prediction method and system thereof
CN113157564A (en) * 2021-03-17 2021-07-23 江苏师范大学 Cross-project defect prediction method based on feature distribution alignment and neighborhood instance selection
CN113176998A (en) * 2021-05-10 2021-07-27 南通大学 Cross-project software defect prediction method based on source selection
CN113268434A (en) * 2021-07-08 2021-08-17 北京邮电大学 Software defect prediction method based on Bayesian model and particle swarm optimization
CN113486902A (en) * 2021-06-29 2021-10-08 南京航空航天大学 Three-dimensional point cloud classification algorithm automatic selection method based on meta-learning
CN114117454A (en) * 2021-12-10 2022-03-01 中国电子科技集团公司第十五研究所 Seed optimization method based on vulnerability prediction model
CN114328221A (en) * 2021-12-28 2022-04-12 以萨技术股份有限公司 Cross-project software defect prediction method and system based on feature and instance migration
CN114529751A (en) * 2021-12-28 2022-05-24 国网四川省电力公司眉山供电公司 Automatic screening method for intelligent identification sample data of power scene
CN114564410A (en) * 2022-03-21 2022-05-31 南通大学 Software defect prediction method based on class level source code similarity
CN114565063A (en) * 2022-03-31 2022-05-31 南通大学 Software defect prediction method based on multi-semantic extractor


Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
CHAO NI et al.: "Revisiting Supervised and Unsupervised Methods for Effort-Aware Cross-Project Defect Prediction", IEEE Transactions on Software Engineering *
PENG HE et al.: "Simplification of Training Data for Cross-Project Defect Prediction", https://doi.org/10.48550/arXiv.1405.0773 *
LI Yong; LIU Zhandong; ZHANG Haijun: "A Survey of Cross-Project Software Defect Prediction Methods", Computer Technology and Development, no. 03
MAO Fagui; LI Biwen; SHEN Beijun: "Cross-Project Software Defect Prediction Based on Instance Transfer", Journal of Frontiers of Computer Science and Technology, no. 01
WANG Xing; HE Peng; CHEN Dan; ZENG Cheng: "Training Data Selection Methods in Cross-Project Defect Prediction", Journal of Computer Applications, no. 11
CHEN Xiang; SHEN Yuxiang; MENG Shaoqing; CUI Zhanqi; JU Xiaolin; WANG Zan: "A Feature Selection Method for Software Defect Prediction Based on Multi-Objective Optimization", Journal of Frontiers of Computer Science and Technology, no. 09
CHEN Xiang; WANG Liping; GU Qing; WANG Zan; NI Chao; LIU Wangshu; WANG Qiuping: "A Survey of Cross-Project Software Defect Prediction Methods", Chinese Journal of Computers, no. 01
CHEN Xiang et al.: "Research on Static Software Defect Prediction Methods", Journal of Software *

Also Published As

Publication number Publication date
CN115269377B (en) 2023-07-11

Similar Documents

Publication Publication Date Title
CN111353030A (en) Knowledge question and answer retrieval method and device based on travel field knowledge graph
CN109117440B (en) Metadata information acquisition method, system and computer readable storage medium
CN102073708A (en) Large-scale uncertain graph database-oriented subgraph query method
US7822700B2 (en) Method for using lengths of data paths in assessing the morphological similarity of sets of data by using equivalence signatures
US8037057B2 (en) Multi-column statistics usage within index selection tools
CN112434024B (en) Relational database-oriented data dictionary generation method, device, equipment and medium
CN113254630B (en) Domain knowledge map recommendation method for global comprehensive observation results
CN114491082A (en) Plan matching method based on network security emergency response knowledge graph feature extraction
CN111813744A (en) File searching method, device, equipment and storage medium
CN109656712B (en) Method and system for extracting GRIB code data
CN114900346A (en) Network security testing method and system based on knowledge graph
CN117033534A (en) Geographic information processing method, device, computer equipment and storage medium
CN115269377A (en) Cross-project software defect prediction method based on optimization instance selection
CN107122412A A rapid matching search method for massive telephone numbers
CN116401212A (en) Personnel file quick searching system based on data analysis
Kitzes et al. macroeco: reproducible ecological pattern analysis in Python
CN115617689A (en) Software defect positioning method based on CNN model and domain features
CN113204676B (en) Compression storage method based on graph structure data
CN112651026B (en) Application version mining method and device with service safety problem
US11775757B2 (en) Automated machine-learning dataset preparation
CN115269378B (en) Cross-project software defect prediction method based on domain feature distribution
Fan et al. New strategy of mass spectrum simulation based on reduced and concentrated knowledge databases
CN117193889B (en) Construction method of code example library and use method of code example library
CN114860595A (en) Instance selection cross-project defect prediction method based on feature correlation analysis
CN115640577B (en) Vulnerability detection method and system for binary Internet of things firmware program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant