CN115269377B - Cross-project software defect prediction method based on optimization instance selection - Google Patents


Info

Publication number
CN115269377B
CN115269377B (application CN202210717428.2A)
Authority
CN
China
Prior art keywords
instance
constructing
optimization
index
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210717428.2A
Other languages
Chinese (zh)
Other versions
CN115269377A (en)
Inventor
张瑞年
王楚越
王晨宇
尹思文
王超
郭伟琪
文万志
胡彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong University
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong University filed Critical Nantong University
Priority to CN202210717428.2A priority Critical patent/CN115269377B/en
Publication of CN115269377A publication Critical patent/CN115269377A/en
Application granted granted Critical
Publication of CN115269377B publication Critical patent/CN115269377B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/36 Preventing errors by testing or debugging software
    • G06F 11/3668 Software testing
    • G06F 11/3672 Test management
    • G06F 11/3684 Test management for test design, e.g. generating new test cases
    • G06F 11/3688 Test management for test execution, e.g. scheduling of test suites

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a cross-project software defect prediction method based on optimization instance selection, comprising the following steps: S1, constructing a project vector set PVS; S2, constructing a target instance optimization index IPI; S3, constructing a pre-training set TPRED; S4, constructing an optimization index TPOI of the target project; S5, constructing a training set BOD selected based on the optimization instances; S6, constructing the cross-project software defect prediction method BOICP based on optimization instance selection. The method realizes source instance selection by constructing a global feature vector for each target instance and then further refines the selection by correlation analysis; the training set constructed in this way selects reliable instance data and achieves a better cross-project defect prediction effect.

Description

Cross-project software defect prediction method based on optimization instance selection
Technical Field
The invention belongs to the technical field of software defect prediction, and particularly relates to a cross-project software defect prediction method based on optimization instance selection, which optimizes the selection of source instances for each target instance vector in the target project and thereby further improves cross-project defect prediction results.
Background
Researchers implement software defect prediction with the help of historical data; for a new system, however, sufficient historical data are often unavailable. One way to solve this problem is to select historical data from other projects, build a defect prediction model from those data, and use it to predict defects in the new project.
For a project with a large amount of data, the researcher must consider how to select the instance data best suited to the target project: the more consistent the source instance data are with the target project data, the more accurate the resulting defect prediction model.
Disclosure of Invention
The invention aims to provide a cross-project software defect prediction method based on optimization instance selection, which realizes source instance selection by constructing a global feature vector for each target instance during the instance selection process and further refines the selection by correlation analysis, thereby achieving a better cross-project defect prediction effect.
In order to solve the above technical problems, an embodiment of the present invention provides a cross-project software defect prediction method based on optimization instance selection, including the following steps:
s1, constructing a project vector set PVS;
s2, constructing a target instance optimization index IPI;
s3, constructing a pre-training set TPRED;
s4, constructing an optimization index TPOI of the target item;
s5, constructing a training set BOD selected based on the optimization examples;
s6, constructing a cross-project software defect prediction method BOICP based on optimization instance selection.
Wherein, step S1 includes the following steps:
s1.1, acquiring a software project set based on an open source website;
s1.2, constructing a project instance set by taking a project class as an instance;
s1.3, constructing the traditional metric set { WMC, DIT, NOC, CBO, RFC, LCOM, LCOM3, NPM, DAM, MOA, MFA, CAM, IC, CBW, AMC, ca, ce, max_CC, avg_CC, LOC } based on the open-source data history, the project source-code syntax structure and the source-code abstract syntax tree, wherein WMC denotes the weighted methods per class, DIT the depth of the inheritance tree, NOC the number of children, CBO the coupling between object classes, RFC the response for a class, LCOM and LCOM3 the lack of cohesion in methods, NPM the number of public methods, DAM the data access metric, MOA the measure of aggregation, MFA the measure of functional abstraction, CAM the cohesion among methods of a class, IC the inheritance coupling, CBW the coupling between methods, AMC the average method complexity, ca the afferent (incoming) coupling, ce the efferent (outgoing) coupling, max_CC the maximum McCabe cyclomatic complexity, avg_CC the average McCabe cyclomatic complexity, and LOC the number of lines of code;
s1.4, processing all the instances in the source project according to step S1.3 to obtain the source project traditional metric vector set SCPIVS = [instance_1, instance_2, …, instance_i], where i = 1, 2, 3, …, n;
s1.5, processing all the instances in the target project according to step S1.3 to obtain the target project traditional metric vector set TCPIVS = [tradition_value_1, tradition_value_2, …, tradition_value_j], where j = 1, 2, 3, …, m;
s1.6, constructing the source project instance labels SLABEL = [stag_1, stag_2, …, stag_i], where i = 1, 2, 3, …, n, based on the open-source data history; each label corresponds to an instance in the source project traditional metric vector set SCPIVS of step S1.4;
s1.7, constructing the target project instance labels TLABEL = [ttag_1, ttag_2, …, ttag_j], where j = 1, 2, 3, …, m, based on the open-source data history; each label corresponds to an instance in the target project traditional metric vector set TCPIVS of step S1.5;
s1.8, build item vector set pvs= { SCPIVS, SLABEL, TCPIVS, TLABEL }.
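As an illustration, the assembly of step S1.8 can be sketched in Python; the helper name build_pvs and the in-memory data layout (metric matrices and 0/1 defect labels already parsed from the open-source data) are our assumptions, not part of the patent:

```python
import numpy as np

def build_pvs(scpivs, slabel, tcpivs, tlabel):
    """Bundle the source/target metric vector sets and their labels
    into the project vector set PVS of step S1.8 (sketch)."""
    scpivs = np.asarray(scpivs, dtype=float)   # n source instances x 20 metrics
    tcpivs = np.asarray(tcpivs, dtype=float)   # m target instances x 20 metrics
    slabel = np.asarray(slabel)                # n defect labels (0/1)
    tlabel = np.asarray(tlabel)                # m defect labels (0/1)
    assert len(scpivs) == len(slabel) and len(tcpivs) == len(tlabel)
    return {"SCPIVS": scpivs, "SLABEL": slabel,
            "TCPIVS": tcpivs, "TLABEL": tlabel}
```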
Wherein, step S2 includes the following steps:
s2.1, constructing a target instance optimization index IPI and a source instance index list ASI;
s2.2, selecting a target instance vector;
s2.3, if the IPI list is empty, taking the target instance vector of step S2.2 as the global feature vector GFV of the instance training set; otherwise, taking GFV as the vector of per-metric standard deviations of all instances in the instance training set;
s2.4, using the source instance index list ASI of step S2.1 to select instances from the source project traditional metric vector set SCPIVS of step S1, constructing the candidate source instance library SIL;
s2.5, calculating Euclidean distance between each instance in the SIL of the source instance library to be selected and the GFV, and returning an index min-index corresponding to the minimum Euclidean distance;
s2.6, adding the min-index into the target instance optimization index IPI of the step S2.1;
s2.7, deleting the min-index in the source instance index list ASI;
s2.8, setting the number of source instances selected for each target instance to k, and executing steps S2.3-S2.7 in a loop until the length of the target instance optimization index IPI reaches k;
s2.9, after the step S2.8 is executed, the target instance optimization index IPI is obtained.
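The selection loop of steps S2.1-S2.9 can be sketched as follows, under one plausible reading of step S2.3 (first iteration: GFV is the target instance vector itself; afterwards: the per-metric standard deviation of the instances selected so far). The function name build_ipi is ours:

```python
import numpy as np

def build_ipi(target_vec, source_matrix, k):
    """Select k source-instance indices for one target instance (step S2)."""
    asi = list(range(len(source_matrix)))  # source instance index list (S2.1)
    ipi = []                               # target instance optimization index
    while len(ipi) < k and asi:
        if not ipi:
            gfv = target_vec                           # S2.3, first pass
        else:
            gfv = np.std(source_matrix[ipi], axis=0)   # S2.3, later passes
        # Euclidean distance from GFV to every remaining source instance (S2.5)
        dists = [np.linalg.norm(source_matrix[i] - gfv) for i in asi]
        min_index = asi[int(np.argmin(dists))]
        ipi.append(min_index)   # S2.6
        asi.remove(min_index)   # S2.7
    return ipi
```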
Wherein, step S3 includes the following steps:
s3.1, executing each instance in the target item according to the step S2 to obtain a target instance optimization index IPI of each target instance;
s3.2, combining and de-duplicating the optimization indexes obtained by each target instance in the step S3.1, and constructing a pre-training set optimization index TIPI;
s3.3, using the pre-training set optimization index TIPI obtained in step S3.2 to select instances from the source project traditional metric vector set SCPIVS of step S1, obtaining the instance vector set TPRED-D of the pre-training set;
s3.4, using the pre-training set optimization index TIPI obtained in step S3.2 to select instances from the source project instance labels SLABEL of step S1, obtaining the label set TPRED-L of the pre-training set;
s3.5, constructing a pre-training set TPRED= { TPRED-D, TPRED-L }.
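Steps S3.1-S3.5 reduce to merging the per-target index lists, de-duplicating them into TIPI, and slicing the source data; a minimal sketch (the helper name build_tpred is ours):

```python
import numpy as np

def build_tpred(scpivs, slabel, ipi_lists):
    """Steps S3.2-S3.5: merge the per-target optimization indexes,
    de-duplicate them into TIPI, and slice out the pre-training set."""
    seen, tipi = set(), []
    for ipi in ipi_lists:            # one IPI list per target instance (S3.1)
        for idx in ipi:
            if idx not in seen:      # de-duplicate, preserving first-seen order
                seen.add(idx)
                tipi.append(idx)
    tpred_d = np.asarray(scpivs)[tipi]   # instance vectors TPRED-D (S3.3)
    tpred_l = np.asarray(slabel)[tipi]   # labels TPRED-L (S3.4)
    return {"TPRED-D": tpred_d, "TPRED-L": tpred_l}, tipi
```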
Wherein, step S4 includes the following steps:
s4.1, combining the instance vector set TPRED-D of the pre-training set obtained in the step S3 with the label set TPRED-L of the pre-training set according to columns, and placing the label set in the last column;
s4.2, calculating the Spearman correlation between each metric and the label column to obtain the correlation list CList;
s4.3, taking the absolute value of every element of the correlation list CList of step S4.2, sorting in descending order, and returning the corresponding feature indexes;
s4.4, setting the number of the indexes of the selected correlation characteristic as q;
s4.5, selecting the feature indexes returned in the step S4.3 by using the number q of the correlation feature indexes in the step S4.4, and constructing a source item correlation feature set SPTFS by using the obtained feature indexes;
s4.6, constructing a target item correlation feature set TPTFS by using the target item traditional metric element vector set TCPIVS and the target item instance label TLABEL in the step S1 according to the steps S4.1-S4.5;
s4.7, calculating the Euclidean distance between every source instance in the source project correlation feature set SPTFS and the correlation feature vector of one instance in the target project correlation feature set TPTFS, returning the index list sorted in ascending order, setting the number of instances selected from the SPTFS to p, and obtaining the p indexes selected for that target instance;
s4.8, processing all target instances in the target item correlation feature set TPTFS according to the step S4.7 to obtain an optimized index set, and de-duplicating the index set to obtain an optimized index TPOI of the target item.
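Steps S4.2-S4.8 can be sketched with a NumPy-only Spearman correlation (average ranks for ties) and a nearest-neighbour union; the helper names are ours and the exact tie-breaking is an assumption:

```python
import numpy as np

def _rank(x):
    # Average ranks (ties share their mean rank), as Spearman requires.
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    sx = np.asarray(x)[order]
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sx[j + 1] == sx[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    # Spearman rho = Pearson correlation of the rank vectors.
    rx, ry = _rank(x), _rank(y)
    rx -= rx.mean(); ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

def top_q_features(tpred_d, tpred_l, q):
    # S4.2-S4.5: rank metrics by |Spearman rho| with the label column.
    rhos = [abs(spearman(tpred_d[:, j], tpred_l))
            for j in range(tpred_d.shape[1])]
    return list(np.argsort(rhos)[::-1][:q])

def build_tpoi(sptfs, tptfs, p):
    # S4.7-S4.8: union of each target instance's p nearest source instances.
    tpoi = set()
    for t in np.asarray(tptfs):
        d = np.linalg.norm(np.asarray(sptfs) - t, axis=1)
        tpoi.update(int(i) for i in np.argsort(d, kind="stable")[:p])
    return sorted(tpoi)
```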
Wherein, step S5 includes the following steps:
s5.1, selecting an instance vector set TPRED-D of the pre-training set in the step S3 by using the optimization index TPOI of the target item obtained in the step S4 to obtain a training feature set BOD-D selected based on an optimization instance;
s5.2, selecting a tag set TPRED-L of the pre-training set in the step S3 by using the optimization index TPOI of the target item obtained in the step S4 to obtain a tag set BOD-L selected based on an optimization example;
s5.3, constructing a training set BOD= { BOD-D, BOD-L } selected based on the optimization example.
Wherein, step S6 includes the following steps:
s6.1, obtaining a project vector set PVS= { SCPIVS, SLABEL, TCPIVS, TLABEL }, through the step S1;
s6.2, obtaining a target instance optimization index IPI through the step S2;
s6.3, obtaining an instance vector set TPRED-D of the pre-training set and a label set TPRED-L of the pre-training set through the step S3;
s6.4, obtaining an optimized index TPOI of the target item through the step S4;
s6.5, obtaining a training feature set BOD-D selected based on the optimization examples and a label set BOD-L selected based on the optimization examples through the step S5;
s6.6, performing model training on the training feature set BOD-D selected based on the optimization examples and the label set BOD-L selected based on the optimization examples in the step S6.5 by using a Logistic classification algorithm;
s6.7, performing defect prediction on the target project traditional metric vector set TCPIVS of step S1 with the model trained in step S6.6 to obtain the prediction label set PRED_LABEL, and computing the f-score from PRED_LABEL and the target project instance labels TLABEL;
s6.8, obtaining a cross-project software defect prediction method BOICP selected based on the optimization examples.
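The f-score of step S6.7 is the standard harmonic mean of precision and recall on the defect-prone class; step S6.6 itself can use any logistic-regression implementation (e.g. scikit-learn's LogisticRegression). A minimal sketch of the f-score computation (the function name is ours):

```python
import numpy as np

def f_score(tlabel, pred_label):
    """Step S6.7: f-score of PRED_LABEL against TLABEL,
    with the defect-prone class encoded as 1."""
    t, p = np.asarray(tlabel), np.asarray(pred_label)
    tp = int(np.sum((p == 1) & (t == 1)))   # true positives
    fp = int(np.sum((p == 1) & (t == 0)))   # false positives
    fn = int(np.sum((p == 0) & (t == 1)))   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```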
The technical scheme of the invention has the following beneficial effects:
the invention provides a cross-project software defect prediction method based on optimization example selection, which comprises the steps of firstly constructing a global feature vector for each target example, using the vector to select examples from source projects, then using correlation analysis in a selected training set, using correlation characteristics of the examples to further select the source examples, forming a training data set by using all the selected source examples, and using the training data set to establish a cross-project defect prediction model, thereby being beneficial to realizing a better cross-project defect prediction effect.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of the training set BOD selected based on the optimization examples in the present invention;
FIG. 3 is a chart of the number of selected instances at different values of k in the present invention;
FIG. 4 is a graph of the f-score obtained with the Logistic classifier at different values of k in the present invention.
Detailed Description
In order to make the technical problems to be solved, the technical solutions and the advantages more apparent, the invention is described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the invention provides a cross-project software defect prediction method based on optimization instance selection, which comprises the following steps:
s1, constructing a project vector set PVS;
s2, constructing a target instance optimization index IPI;
s3, constructing a pre-training set TPRED;
s4, constructing an optimization index TPOI of the target item;
s5, constructing a training set BOD selected based on the optimization examples;
s6, constructing a cross-project software defect prediction method BOICP based on optimization instance selection.
The specific steps of constructing the project vector set PVS are as follows:
s1.1, acquiring a software project set based on an open source website;
s1.2, constructing a project instance set by taking a project class as an instance;
s1.3, constructing the feature set { WMC, DIT, NOC, CBO, RFC, LCOM, LCOM3, NPM, DAM, MOA, MFA, CAM, IC, CBW, AMC, ca, ce, max_CC, avg_CC, LOC } based on the open-source data history, the project source-code syntax structure and the source-code abstract syntax tree, wherein WMC denotes the weighted methods per class, DIT the depth of the inheritance tree, NOC the number of children, CBO the coupling between object classes, RFC the response for a class, LCOM and LCOM3 the lack of cohesion in methods, NPM the number of public methods, DAM the data access metric, MOA the measure of aggregation, MFA the measure of functional abstraction, CAM the cohesion among methods of a class, IC the inheritance coupling, CBW the coupling between methods, AMC the average method complexity, ca the afferent (incoming) coupling, ce the efferent (outgoing) coupling, max_CC the maximum McCabe cyclomatic complexity, avg_CC the average McCabe cyclomatic complexity, and LOC the number of lines of code.
S1.4, processing all the instances in the source project according to the steps above to obtain the source project traditional metric vector set SCPIVS = [tradition_value_1, tradition_value_2, …, tradition_value_i], where i = 1, 2, 3, …, n;
s1.5, processing all the instances in the target project in the same way to obtain the target project traditional metric vector set TCPIVS = [tradition_value_1, tradition_value_2, …, tradition_value_j], where j = 1, 2, 3, …, m.
S1.6, constructing the source project instance labels SLABEL = [stag_1, stag_2, …, stag_i], where i = 1, 2, 3, …, n, based on the open-source data history; each label corresponds to an instance in the source project traditional metric vector set SCPIVS;
s1.7, constructing the target project instance labels TLABEL = [ttag_1, ttag_2, …, ttag_j], where j = 1, 2, 3, …, m, based on the open-source data history; each label corresponds to an instance in the target project traditional metric vector set TCPIVS.
S1.8, build item vector set pvs= { SCPIVS, SLABEL, TCPIVS, TLABEL }.
The specific steps of constructing the target instance optimization index IPI are as follows:
s2.1, constructing a target instance optimization index IPI and a source instance index list ASI.
S2.2, selecting a target instance vector;
S2.3, if the IPI list is empty, taking the target instance vector of step S2.2 as the global feature vector GFV of the instance training set; otherwise, taking GFV as the vector of per-metric standard deviations of all instances in the instance training set.
S2.4, using the source instance index list ASI of step S2.1 to select instances from the source project traditional metric vector set SCPIVS of step S1, constructing the candidate source instance library SIL.
S2.5, calculating Euclidean distance between each instance in the SIL of the source instance library to be selected and the GFV, and returning an index min-index corresponding to the minimum Euclidean distance.
S2.6, adding the min-index into the target instance optimization index IPI of the step S2.1;
s2.7, deleting the min-index in the source instance index list ASI.
S2.8, setting the number of source instances selected for each target instance to 5, and executing steps S2.3-S2.7 in a loop until the length of the target instance optimization index IPI reaches 5;
s2.9, after the step S2.8 is executed, the target instance optimization index IPI is obtained.
The specific steps of constructing the pre-training set TPRED are as follows:
s3.1, executing each instance in the target item according to the steps S2.1-S2.9 to obtain a target instance optimization index IPI of each target instance;
and S3.2, combining and de-duplicating the optimization indexes obtained by each target instance in the step S3.1, and constructing a pre-training set optimization index TIPI.
S3.3, using the pre-training set optimization index TIPI obtained in step S3.2 to select instances from the source project traditional metric vector set SCPIVS of step S1.8, obtaining the instance vector set TPRED-D of the pre-training set;
S3.4, using the pre-training set optimization index TIPI obtained in step S3.2 to select instances from the source project instance labels SLABEL of step S1.6, obtaining the label set TPRED-L of the pre-training set.
S3.5, constructing a pre-training set TPRED= { TPRED-D, TPRED-L }.
The specific steps of constructing the optimization index TPOI of the target item are as follows:
s4.1, combining the instance vector set TPRED-D of the pre-training set obtained in the step S3.5 with the label set TPRED-L of the pre-training set according to columns, and placing the label set in the last column;
S4.2, calculating the Spearman correlation between each metric and the label column to obtain the correlation list CList.
S4.3, taking the absolute value of every element of the correlation list CList of step S4.2, sorting in descending order, and returning the corresponding feature indexes;
s4.4, setting the number of selected correlation feature indexes to 10;
and S4.5, selecting the feature indexes returned in the step S4.3 by using the number of the correlation feature indexes in the step S4.4, and constructing a source item correlation feature set SPTFS by using the obtained feature indexes.
S4.6, constructing a target item correlation feature set TPTFS by using the target item traditional metric element vector set TCPIVS of the step S1.5 and the target item instance label TLABEL of the step S1.7 according to the steps S4.1-S4.5.
S4.7, calculating the Euclidean distance between every source instance in the source project correlation feature set SPTFS and the correlation feature vector of one instance in the target project correlation feature set TPTFS, returning the index list sorted in ascending order, setting the number of instances selected from the SPTFS to 2, and obtaining the 2 indexes selected for that target instance.
S4.8, processing all target instances in the target item correlation feature set TPTFS according to the step S4.7 to obtain an optimized index set, and de-duplicating the index set to obtain an optimized index TPOI of the target item.
Step S5, the specific steps of constructing a training set BOD selected based on the optimization examples are as follows:
s5.1, selecting an instance vector set TPRED-D of the pre-training set in the step S3.3 by using the optimization index TPOI of the target item obtained in the step S4 to obtain a training feature set BOD-D selected based on the optimization instance.
S5.2, selecting the tag set TPRED-L of the pre-training set in the step S3.3 by using the optimization index TPOI of the target item obtained in the step S4, and obtaining the tag set BOD-L selected based on the optimization example.
S5.3, constructing a training set BOD= { BOD-D, BOD-L } selected based on the optimization example.
A flowchart for constructing the training set BOD selected based on the optimization instance is shown in fig. 2.
Step S6, constructing a cross-project software defect prediction method BOICP selected based on an optimization example, wherein the specific steps are as follows:
ivy-2.0 is selected as the source project and synapse-1.2 as the target project. The source project traditional metric vector set SCPIVS and the source project instance labels SLABEL are constructed from the source project instances, and the target project traditional metric vector set TCPIVS and the target project instance labels TLABEL from the target project instances.
The target instance optimization index IPI when the number of different source instances is selected is obtained according to the defined method for constructing the target instance optimization index.
And obtaining an instance vector set TPRED-D of the pre-training set and a label set TPRED-L of the pre-training set according to the method for constructing the pre-training set.
And obtaining the optimized index TPOI of the target item according to the optimized index method for constructing the target item defined above.
The number of source instances selected per target instance, k, is set in the range 1 to 5, and for each k a training set BOD = { BOD-D, BOD-L } selected based on the optimization instances is obtained.
A classification model is built on the training set BOD with a Logistic classifier and used for prediction. Experiments show that the maximum f-score obtained by the model is 0.343, which is greater than the 0.149 obtained without the instance selection method; the model built by the invention therefore outperforms the model built without instance selection, demonstrating the effectiveness of the cross-project software defect prediction method based on optimization instance selection.
The number of examples selected for different k is shown in figure 3.
The f-score obtained using Logistic at different k is shown in FIG. 4.
According to the method, a global feature vector is first constructed for each target instance and used to select instances from the source project; correlation analysis is then applied to the selected training set, and the correlation features of the instances are used to further select source instances. All selected source instances form the training data set, from which the cross-project defect prediction model is built, achieving a better cross-project defect prediction effect.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the present invention.

Claims (1)

1. The cross-project software defect prediction method based on the optimization example selection is characterized by comprising the following steps:
s1, constructing a project vector set PVS;
s2, constructing a target instance optimization index IPI;
s3, constructing a pre-training set TPRED;
s4, constructing an optimization index TPOI of the target item;
s5, constructing a training set BOD selected based on the optimization examples;
s6, constructing a cross-project software defect prediction method BOICP based on optimization instance selection;
step S1 comprises the steps of:
s1.1, acquiring a software project set based on an open source website;
s1.2, constructing a project instance set by taking a project class as an instance;
s1.3, constructing the traditional metric set { WMC, DIT, NOC, CBO, RFC, LCOM, LCOM3, NPM, DAM, MOA, MFA, CAM, IC, CBW, AMC, ca, ce, max_CC, avg_CC, LOC } based on the open-source data history, the project source-code syntax structure and the source-code abstract syntax tree, wherein WMC denotes the weighted methods per class, DIT the depth of the inheritance tree, NOC the number of children, CBO the coupling between object classes, RFC the response for a class, LCOM and LCOM3 the lack of cohesion in methods, NPM the number of public methods, DAM the data access metric, MOA the measure of aggregation, MFA the measure of functional abstraction, CAM the cohesion among methods of a class, IC the inheritance coupling, CBW the coupling between methods, AMC the average method complexity, ca the afferent (incoming) coupling, ce the efferent (outgoing) coupling, max_CC the maximum McCabe cyclomatic complexity, avg_CC the average McCabe cyclomatic complexity, and LOC the number of lines of code;
s1.4, processing all the instances in the source project according to step S1.3 to obtain the source project traditional metric vector set SCPIVS = [instance_1, instance_2, …, instance_i], where i = 1, 2, 3, …, n;
s1.5, processing all the instances in the target project according to step S1.3 to obtain the target project traditional metric vector set TCPIVS = [tradition_value_1, tradition_value_2, …, tradition_value_j], where j = 1, 2, 3, …, m;
s1.6, constructing the source project instance labels SLABEL = [stag_1, stag_2, …, stag_i], where i = 1, 2, 3, …, n, based on the open-source data history; each label corresponds to an instance in the source project traditional metric vector set SCPIVS of step S1.4;
s1.7, constructing the target project instance labels TLABEL = [ttag_1, ttag_2, …, ttag_j], where j = 1, 2, 3, …, m, based on the open-source data history; each label corresponds to an instance in the target project traditional metric vector set TCPIVS of step S1.5;
s1.8, constructing a project vector set PVS= { SCPIVS, SLABEL, TCPIVS, TLABEL };
step S2 includes the steps of:
s2.1, constructing a target instance optimization index IPI and a source instance index list ASI;
s2.2, selecting a target instance vector;
s2.3, if the IPI list is empty, taking the target instance vector of step S2.2 as the global feature vector GFV of the instance training set; otherwise, taking GFV as the vector of per-metric standard deviations of all instances in the instance training set;
s2.4, using the source instance index list ASI of step S2.1 to select instances from the source project traditional metric vector set SCPIVS of step S1, constructing the candidate source instance library SIL;
s2.5, calculating Euclidean distance between each instance in the SIL of the source instance library to be selected and the GFV, and returning an index min-index corresponding to the minimum Euclidean distance;
s2.6, adding the min-index into the target instance optimization index IPI of the step S2.1;
s2.7, deleting the min-index in the source instance index list ASI;
s2.8, setting the number of source instances selected for each target instance to k, and executing steps S2.3-S2.7 in a loop until the length of the target instance optimization index IPI reaches k;
s2.9, after the step S2.8 is executed, obtaining a target instance optimization index IPI;
step S3 includes the steps of:
s3.1, executing each instance in the target item according to the step S2 to obtain a target instance optimization index IPI of each target instance;
s3.2, combining and de-duplicating the optimization indexes obtained by each target instance in the step S3.1, and constructing a pre-training set optimization index TIPI;
s3.3, using the pre-training set optimization index TIPI obtained in step S3.2 to select instances from the source project traditional metric vector set SCPIVS of step S1, obtaining the instance vector set TPRED-D of the pre-training set;
s3.4, using the pre-training set optimization index TIPI obtained in step S3.2 to select instances from the source project instance labels SLABEL of step S1, obtaining the label set TPRED-L of the pre-training set;
s3.5, constructing a pre-training set TPRED= { TPRED-D, TPRED-L };
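Steps S3.2-S3.5 reduce to a union of index lists followed by two slicing operations. A minimal NumPy sketch (identifier names are illustrative, not from the patent):

```python
import numpy as np

def build_pretrain_set(scpivs, slabel, per_target_ipis):
    """S3.2-S3.5: merge and deduplicate the per-target optimization
    indexes TIPI, then slice source vectors and labels with them."""
    tipi = sorted(set(i for ipi in per_target_ipis for i in ipi))  # S3.2
    tpred_d = scpivs[tipi]   # S3.3: pre-training instance vectors
    tpred_l = slabel[tipi]   # S3.4: pre-training labels
    return tpred_d, tpred_l  # S3.5: TPRED = {TPRED-D, TPRED-L}
```

Deduplication matters because nearby target instances tend to select overlapping source instances; the union keeps each source instance at most once in the pre-training set.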
Step S4 includes the following steps:
S4.1, concatenating the instance vector set TPRED-D of the pre-training set obtained in the step S3 with the label set TPRED-L of the pre-training set column-wise, placing the label set in the last column;
S4.2, calculating the Spearman rank correlation between each metric element and the last (label) column to obtain the correlation list CList;
S4.3, taking the absolute value of every element of the correlation list CList of the step S4.2, sorting in descending order, and returning the corresponding feature indexes;
S4.4, setting the number of selected correlation feature indexes to q;
S4.5, taking the top q of the feature indexes returned in the step S4.3, and using the obtained feature indexes to construct the source project correlation feature set SPTFS;
S4.6, constructing the target project correlation feature set TPTFS from the target project traditional metric element vector set TCPIVS and the target project instance label TLABEL of the step S1 according to the steps S4.1-S4.5;
S4.7, calculating the Euclidean distances between all source instances in the source project correlation feature set SPTFS and the correlation features of one instance in the target project correlation feature set TPTFS, sorting in ascending order and returning the index list; setting the number of instances selected from SPTFS to p, obtaining the p indexes selected for that target instance;
S4.8, processing all target instances in the target project correlation feature set TPTFS according to the step S4.7 to obtain an optimization index set, and deduplicating the index set to obtain the optimization index TPOI of the target project;
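The two halves of step S4 can be sketched as follows: Spearman-based feature ranking (S4.2-S4.5) and per-target nearest-instance selection (S4.7-S4.8). This is an illustrative sketch only; Spearman correlation is computed here as Pearson correlation on ranks, which assumes untied feature values, and all names are hypothetical.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation via Pearson on ranks (no tie correction)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean(); ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

def top_q_features(tpred_d, tpred_l, q):
    """S4.2-S4.5: correlate each metric column with the label column,
    sort by absolute correlation, keep the top-q feature indexes."""
    clist = [spearman(tpred_d[:, j], tpred_l) for j in range(tpred_d.shape[1])]
    order = np.argsort(-np.abs(np.array(clist)))   # S4.3: descending |r|
    return order[:q].tolist()                      # S4.4-S4.5

def target_optimized_indexes(sptfs, tptfs, p):
    """S4.7-S4.8: for each target instance, keep the p nearest source
    instances (Euclidean) in the correlation feature space; union + dedup."""
    tpoi = set()
    for t in tptfs:
        d = np.linalg.norm(sptfs - t, axis=1)
        tpoi.update(np.argsort(d)[:p].tolist())
    return sorted(tpoi)
```

Taking the absolute value in S4.3 keeps strongly negatively correlated metrics, which are as predictive of the label as positively correlated ones.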
Step S5 includes the following steps:
S5.1, using the optimization index TPOI of the target project obtained in the step S4 to select from the instance vector set TPRED-D of the pre-training set of the step S3, obtaining the training feature set BOD-D selected based on optimized instances;
S5.2, using the optimization index TPOI of the target project obtained in the step S4 to select from the label set TPRED-L of the pre-training set of the step S3, obtaining the label set BOD-L selected based on optimized instances;
S5.3, constructing the training set BOD = {BOD-D, BOD-L} selected based on optimized instances;
Step S6 includes the following steps:
S6.1, obtaining the project vector set PVS = {SCPIVS, SLABEL, TCPIVS, TLABEL} through the step S1;
S6.2, obtaining the target instance optimization index IPI through the step S2;
S6.3, obtaining the instance vector set TPRED-D of the pre-training set and the label set TPRED-L of the pre-training set through the step S3;
S6.4, obtaining the optimization index TPOI of the target project through the step S4;
S6.5, obtaining the training feature set BOD-D selected based on optimized instances and the label set BOD-L selected based on optimized instances through the step S5;
S6.6, performing model training on the training feature set BOD-D and the label set BOD-L of the step S6.5 using the Logistic classification algorithm;
S6.7, performing defect prediction on the target project traditional metric element vector set TCPIVS of the step S1 using the model trained in the step S6.6 to obtain the prediction label set PRED_LABEL, and calculating the f-score by formula in combination with the target project instance label TLABEL;
S6.8, obtaining the cross-project software defect prediction method BOICP based on optimized instance selection.
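Steps S6.6-S6.7 amount to fitting a Logistic classifier and scoring its target-project predictions. The patent does not state the f-score formula; the sketch below assumes the conventional F1 = 2·precision·recall / (precision + recall), and uses scikit-learn as one possible implementation (names and parameters are illustrative, not from the patent).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def train_and_score(bod_d, bod_l, tcpivs, tlabel):
    """S6.6-S6.7: train a Logistic classifier on the training set BOD
    selected based on optimized instances, predict target-project
    labels PRED_LABEL, and compute the f-score against TLABEL."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(bod_d, bod_l)                    # S6.6: model training
    pred_label = clf.predict(tcpivs)         # S6.7: PRED_LABEL
    return pred_label, f1_score(tlabel, pred_label)
```

F1 is preferred over plain accuracy here because defect data sets are typically imbalanced: a classifier that predicts "no defect" everywhere can score high accuracy but zero F1.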
CN202210717428.2A 2022-06-23 2022-06-23 Cross-project software defect prediction method based on optimization instance selection Active CN115269377B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210717428.2A CN115269377B (en) 2022-06-23 2022-06-23 Cross-project software defect prediction method based on optimization instance selection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210717428.2A CN115269377B (en) 2022-06-23 2022-06-23 Cross-project software defect prediction method based on optimization instance selection

Publications (2)

Publication Number Publication Date
CN115269377A CN115269377A (en) 2022-11-01
CN115269377B true CN115269377B (en) 2023-07-11

Family

ID=83761872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210717428.2A Active CN115269377B (en) 2022-06-23 2022-06-23 Cross-project software defect prediction method based on optimization instance selection

Country Status (1)

Country Link
CN (1) CN115269377B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114328221A (en) * 2021-12-28 2022-04-12 以萨技术股份有限公司 Cross-project software defect prediction method and system based on feature and instance migration
CN114565063A (en) * 2022-03-31 2022-05-31 南通大学 Software defect prediction method based on multi-semantic extractor

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09128490A (en) * 1995-10-16 1997-05-16 Lucent Technol Inc Method and apparatus for recognition of handwritten character
US7856616B2 (en) * 2007-04-17 2010-12-21 National Defense University Action-based in-process software defect prediction software defect prediction techniques based on software development activities
US20100306249A1 (en) * 2009-05-27 2010-12-02 James Hill Social network systems and methods
CN104021264B (en) * 2013-02-28 2017-06-20 华为技术有限公司 A kind of failure prediction method and device
KR101746328B1 (en) * 2016-01-29 2017-06-12 한국과학기술원 Hybrid instance selection method using nearest-neighbor for cross-project defect prediction
CN110008584B (en) * 2019-04-02 2020-11-06 广东石油化工学院 GitHub-based semi-supervised heterogeneous software defect prediction method
CN110825644B (en) * 2019-11-11 2021-06-11 南京邮电大学 Cross-project software defect prediction method and system
CN111858328B (en) * 2020-07-15 2021-11-12 南通大学 Software defect module severity prediction method based on ordered neural network
CN112346974B (en) * 2020-11-07 2023-08-22 重庆大学 Depth feature embedding-based cross-mobile application program instant defect prediction method
CN113157564B (en) * 2021-03-17 2023-11-07 江苏师范大学 Cross-project defect prediction method based on feature distribution alignment and neighborhood instance selection
CN113176998A (en) * 2021-05-10 2021-07-27 南通大学 Cross-project software defect prediction method based on source selection
CN113486902A (en) * 2021-06-29 2021-10-08 南京航空航天大学 Three-dimensional point cloud classification algorithm automatic selection method based on meta-learning
CN113268434B (en) * 2021-07-08 2022-07-26 北京邮电大学 Software defect prediction method based on Bayes model and particle swarm optimization
CN114117454A (en) * 2021-12-10 2022-03-01 中国电子科技集团公司第十五研究所 Seed optimization method based on vulnerability prediction model
CN114529751B (en) * 2021-12-28 2024-06-21 国网四川省电力公司眉山供电公司 Automatic screening method for intelligent identification sample data of power scene
CN114564410A (en) * 2022-03-21 2022-05-31 南通大学 Software defect prediction method based on class level source code similarity


Also Published As

Publication number Publication date
CN115269377A (en) 2022-11-01

Similar Documents

Publication Publication Date Title
CN102243647B (en) Higher-order knowledge is extracted from structural data
CN110110858B (en) Automatic machine learning method based on reinforcement learning
CN110457514A (en) A kind of multi-tag image search method based on depth Hash
CN111325264A (en) Multi-label data classification method based on entropy
CN102073708A (en) Large-scale uncertain graph database-oriented subgraph query method
CN107832458A (en) A kind of file classification method based on depth of nesting network of character level
CN117609470B (en) Question-answering system based on large language model and knowledge graph, construction method thereof and intelligent data management platform
JP2022530447A (en) Chinese word division method based on deep learning, equipment, storage media and computer equipment
CN109753517A (en) A kind of method, apparatus, computer storage medium and the terminal of information inquiry
CN114491082A (en) Plan matching method based on network security emergency response knowledge graph feature extraction
CN115310355A (en) Multi-energy coupling-considered multi-load prediction method and system for comprehensive energy system
EP4363997A1 (en) Using query logs to optimize execution of parametric queries
CN113076089B (en) API (application program interface) completion method based on object type
CN115269377B (en) Cross-project software defect prediction method based on optimization instance selection
CN111753151B (en) Service recommendation method based on Internet user behavior
CN117198427A (en) Molecule generation method and device, electronic equipment and storage medium
CN116861373A (en) Query selectivity estimation method, system, terminal equipment and storage medium
CN115269378B (en) Cross-project software defect prediction method based on domain feature distribution
CN114565063A (en) Software defect prediction method based on multi-semantic extractor
CN110309273A (en) Answering method and device
Mukherjee et al. Frequent item set, sequential pattern mining and sequence prediction: structures and algorithms
CN118471327B (en) Genome prediction method and device based on genotype and environment interaction heterograms
CN114860595A (en) Instance selection cross-project defect prediction method based on feature correlation analysis
CN117193889B (en) Construction method of code example library and use method of code example library
CN114896150A (en) Cross-project defect prediction method based on instance selection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant