CN114328174A - Multi-view software defect prediction method and system based on adversarial learning - Google Patents

Multi-view software defect prediction method and system based on adversarial learning

Info

Publication number
CN114328174A
Authority
CN
China
Prior art keywords
view
software module
network model
static
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111329931.2A
Other languages
Chinese (zh)
Inventor
韩璐
严军荣
潘方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sunwave Communications Co Ltd
Original Assignee
Sunwave Communications Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sunwave Communications Co Ltd filed Critical Sunwave Communications Co Ltd
Priority to CN202111329931.2A priority Critical patent/CN114328174A/en
Publication of CN114328174A publication Critical patent/CN114328174A/en
Pending legal-status Critical Current

Landscapes

  • Stored Programmes (AREA)

Abstract

The invention discloses a multi-view software defect prediction method and system based on adversarial learning. The method comprises the following steps: constructing a first network model from multi-view software module sample data, wherein the first network model is used to construct, through deep metric learning, an inter-view discriminant analysis loss function that distinguishes same-class views from different-class views; constructing a second network model from the multi-view software module sample data, wherein the second network model is used to construct, through adversarial learning, an adversarial loss function that distinguishes the different software module views in a common subspace; constructing a third network model from the first network model and the second network model; and inputting multi-view software test data into the third network model to obtain a prediction result. The invention addresses the poor prediction performance and low prediction accuracy of existing single-view software defect prediction techniques.

Description

Multi-view software defect prediction method and system based on adversarial learning
Technical Field
The invention belongs to the technical field of software defect prediction, and in particular relates to a multi-view software defect prediction method and system based on adversarial learning.
Background
Existing software defect prediction methods generally construct a set of software modules described by metrics, build a prediction model from the historical data of existing software modules, and finally predict the defect proneness of new software modules.
In recent years, with the development of deep neural networks (DNNs), software defect prediction methods based on generative adversarial networks (GANs) have become a new research focus. For example, Chinese patent publication No. CN113419948A, "A deep learning cross-project software defect prediction method based on a GAN network", proposes the following: representing the code of each program module extracted from the target project and the source project as a simplified abstract syntax tree; extracting token vectors by a depth-first traversal of the abstract syntax tree; performing word embedding on the token vectors to obtain a word vector for each word, replacing each token in a token vector with its word vector, and thereby converting the token vector into a numerical vector; training a source encoder and a source classifier with the numerical vectors of the source project as input; taking the numerical vectors of the target project as input and initializing the target encoder with the parameters of the trained source encoder; training the discriminator of the GAN with the output features of the trained source encoder as real data and the output features of the target encoder as fake data; classifying the output features of the target encoder with the trained source classifier; and outputting the classification result.
Chinese patent CN110162475A, "A software defect prediction method based on deep transfer", proposes converting the source code files of the source project and the target project into image files by a visualization method; constructing a deep transfer network; constructing a loss function from the maximum mean discrepancy between the training-sample features and the test-sample features extracted with a self-attention mechanism, together with the cross entropy between the prediction output of the deep transfer network and the ground-truth labels of the samples; and training the deep transfer network until the loss function converges to obtain a software defect prediction model. In use, the source code file under test is converted into an image by the visualization method, the image is input into the software defect prediction model, and the defect prediction result for that source code file is output.
Software defect prediction techniques based on adversarial networks are mainly single-view: the extracted metric attributes are simply concatenated into one sample vector for subsequent feature learning. However, when the metric attributes of a software module sample are extracted, the metrics can be divided, according to whether they are static or dynamic, into a static software module view and a dynamic software module view, and single-view data often lacks the complementary information that multi-view data provides. As a result, existing single-view software defect prediction techniques suffer from poor prediction performance and low prediction accuracy.
At present there is no multi-view software defect prediction technique based on adversarial networks; a multi-view software defect prediction method based on adversarial learning is therefore proposed.
Disclosure of Invention
To solve the above problems, the present invention provides a multi-view software defect prediction method and system based on adversarial learning.
The multi-view software defect prediction method based on adversarial learning according to the invention comprises the following steps:
constructing a first network model from multi-view software module sample data, wherein the first network model is used to construct, through deep metric learning, an inter-view discriminant analysis loss function that distinguishes same-class views from different-class views;
constructing a second network model from the multi-view software module sample data, wherein the second network model is used to construct, through adversarial learning, an adversarial loss function that distinguishes the different software module views in a common subspace;
constructing a third network model from the first network model and the second network model;
and inputting multi-view software test data into the third network model to obtain a prediction result.
Preferably, before the first network model is constructed from the multi-view software module sample data, the method further comprises the following steps:
normalizing the metrics of the multi-view software modules;
and nonlinearly projecting the normalized multi-view software module sample data into a common subspace.
Further preferably, normalizing the metrics of the multi-view software modules comprises the steps of:
representing any software module sample in the software repository as a static software module view composed of static metrics and a dynamic software module view composed of dynamic metrics, wherein the static metrics are attribute information collected after development of the project is complete, and the dynamic metrics are attribute information recorded during development;
and scaling the metrics in the static software module view and the dynamic software module view to the same range, namely the interval [0,1], by min-max normalization.
Further preferably, nonlinearly projecting the normalized multi-view software module sample data into the common subspace comprises the steps of:
extracting the initial features of the static software module view from the static software module view dataset
Figure BDA0003346428100000031
extracting the initial features of the dynamic software module view from the dynamic software module view dataset
Figure BDA0003346428100000032
constructing a parameter-shared two-channel network;
inputting the initial features of the static software module view
Figure BDA0003346428100000033
into a four-layer FNN to obtain the view-specific features of the static software module view
Figure BDA0003346428100000034
inputting the initial features of the dynamic software module view
Figure BDA0003346428100000035
into a four-layer FNN to obtain the view-specific features of the dynamic software module view
Figure BDA0003346428100000036
wherein
Figure BDA0003346428100000037
denotes the four-layer FNN mapping function of the static software module view,
Figure BDA0003346428100000038
denotes the four-layer FNN mapping function of the dynamic software module view, and the two mapping functions
Figure BDA0003346428100000039
share the network parameters θ_FNN.
Preferably, constructing, through deep metric learning, the inter-view discriminant analysis loss function that distinguishes same-class views from different-class views comprises the steps of:
calculating the distance between corresponding sample features of the static software module view and the dynamic software module view in the common subspace
Figure BDA00033464281000000310
wherein S(i) denotes a sample feature of the static software module view and D(j) denotes a sample feature of the dynamic software module view;
using the value 1 to denote that a sample is defective and the value 0 to denote that it is non-defective, wherein samples that are composed of the same metrics and have the same class belong to same-class views, and samples that are composed of the same metrics but have different classes belong to different-class views;
constructing the inter-view discriminant analysis loss function L_G
Figure BDA0003346428100000041
Figure BDA0003346428100000042
where the function h(t) = max(0, t) is the hinge loss function, γ is a preset hyperparameter, τ is a preset positive threshold, l(·) denotes the sample class,
Figure BDA0003346428100000043
denotes an original sample of the static software module view, and
Figure BDA0003346428100000044
denotes an original sample of the dynamic software module view.
Preferably, constructing, through adversarial learning, the adversarial loss function that distinguishes the different software module views in the common subspace comprises the steps of:
constructing, in the common subspace, a static software module view discriminator
Figure BDA0003346428100000045
and a dynamic software module view discriminator
Figure BDA0003346428100000046
taking the view features of the static software module view as real samples and the view features of the dynamic software module view as generated samples, and establishing the adversarial loss function based on the static software module view:
Figure BDA0003346428100000047
wherein P_data is the view features of the static software module view in the common subspace, P_G is the view features of the dynamic software module view in the common subspace,
Figure BDA0003346428100000048
is the network parameter of the static software module view discriminator
Figure BDA0003346428100000049
E_{x~P_data} denotes the expectation over the data distribution of the static software module view, and
Figure BDA00033464281000000410
denotes the expectation over the data distribution of the dynamic software module view;
taking the view features of the dynamic software module view as real samples and the view features of the static software module view as generated samples, and establishing the adversarial loss function based on the dynamic software module view:
Figure BDA00033464281000000411
Figure BDA00033464281000000412
wherein
Figure BDA00033464281000000413
is the network parameter of the dynamic software module view discriminator
Figure BDA00033464281000000414
obtaining the discriminant loss function of the adversarial network from the adversarial loss function based on the static software module view and the adversarial loss function based on the dynamic software module view:
Figure BDA0003346428100000051
Figure BDA0003346428100000052
Preferably, constructing the third network model from the first network model and the second network model comprises combining the inter-view discriminant analysis loss function L_G of the first network model with the discriminant loss function L_D(θ_D) of the adversarial network of the second network model and training with a minimax game strategy, expressed as:
Figure BDA0003346428100000053
Figure BDA0003346428100000054
and
Figure BDA0003346428100000055
and obtaining the parameters of the third network model with a stochastic gradient descent optimization algorithm.
Preferably, inputting the multi-view software test data into the third network model to obtain the prediction result comprises the steps of:
inputting the multi-view software test data into the third network model to obtain sample features;
inputting the sample features into a classifier;
and outputting, by the classifier, a classification result based on the sample features, the classification result being the prediction result.
A computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program, when executed by a processor, causes a computer to perform the above-mentioned method.
A multi-view software defect prediction system based on adversarial learning, comprising:
an input-output device;
a processor;
a memory;
and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the programs causing the computer to perform the above-described method.
The method and system have the following advantages:
(1) Deep metric learning is used to constrain the nonlinear features in the subspace, and an inter-view discriminant analysis loss function is designed so that different samples of same-class views are pulled close together while different samples of different-class views are pushed apart. This improves inter-view discriminant analysis and effectively mines the structural relationships in the inter-view data.
(2) Discriminators are constructed, and an adversarial loss function for distinguishing the different software module views in the common subspace is built through adversarial learning, so that, given a feature projection in the unknown common subspace, static software module view features and dynamic software module view features can be told apart effectively.
(3) The overall network is built from the inter-view discriminant analysis loss function and the adversarial loss function, so the structural relationships in the inter-view data can be mined while the feature structure is preserved. This strengthens the discriminative power of the network model and improves classification performance and prediction accuracy.
Drawings
FIG. 1 is a flowchart of a multi-view software defect prediction method based on adversarial learning according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a multi-view software defect prediction system based on adversarial learning.
Detailed Description
The present invention is described in detail below with reference to specific examples. The following examples will help those skilled in the art to further understand the invention, but do not limit it in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
An embodiment of the multi-view software defect prediction method based on adversarial learning is shown in the flowchart of FIG. 1 and comprises:
constructing a first network model from multi-view software module sample data, wherein the first network model is used to construct, through deep metric learning, an inter-view discriminant analysis loss function that distinguishes same-class views from different-class views;
constructing a second network model from the multi-view software module sample data, wherein the second network model is used to construct, through adversarial learning, an adversarial loss function that distinguishes the different software module views in a common subspace;
constructing a third network model from the first network model and the second network model;
and inputting multi-view software test data into the third network model to obtain a prediction result.
In a preferred embodiment, before the first network model is constructed from the multi-view software module sample data, the method further comprises the following steps:
normalizing the metrics of the multi-view software modules;
and nonlinearly projecting the normalized multi-view software module sample data into a common subspace.
In a preferred embodiment, normalizing the metrics of the multi-view software modules comprises the steps of:
representing any software module sample in the software repository as a static software module view composed of static metrics and a dynamic software module view composed of dynamic metrics, wherein the static metrics are attribute information collected after development of the project is complete, and the dynamic metrics are attribute information recorded during development;
and scaling the metrics in the static software module view and the dynamic software module view to the same range, namely the interval [0,1], by min-max normalization.
In this embodiment, any software module sample v_i in the software repository is represented as
Figure BDA0003346428100000071
that is, as the joint representation of two vector components:
Figure BDA0003346428100000072
is the static software module view of sample v_i, composed of static metrics, and
Figure BDA0003346428100000073
is the dynamic software module view of sample v_i, composed of dynamic metrics. Min-max normalization brings
Figure BDA0003346428100000074
and
Figure BDA0003346428100000075
to the same range, namely the interval [0,1], which normalizes the sample metrics.
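The normalization step can be illustrated with a minimal Python sketch. It applies the usual min-max formula (x - min) / (max - min) to each metric column of a view. The sample values, the function name `min_max_normalize`, and the choice to map a constant metric to 0.0 are assumptions made for illustration and do not come from the patent.

```python
from typing import List


def min_max_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Scale every metric (column) of one view into the interval [0, 1]."""
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in samples:
        scaled = []
        for value, lo, hi in zip(row, lows, highs):
            # Assumption: a constant metric is mapped to 0.0 to avoid dividing by zero.
            scaled.append(0.0 if hi == lo else (value - lo) / (hi - lo))
        normalized.append(scaled)
    return normalized


# Hypothetical views: rows are software module samples, columns are metrics.
static_view = [[120.0, 3.0], [40.0, 1.0], [200.0, 5.0]]    # e.g. LOC, fan-in
dynamic_view = [[7.0, 30.0], [2.0, 0.0], [12.0, 90.0]]     # e.g. revisions, deleted lines

static_norm = min_max_normalize(static_view)
dynamic_norm = min_max_normalize(dynamic_view)
print(static_norm)
print(dynamic_norm)
```

Each view is normalized on its own, so a metric with a large raw range such as lines of code does not dominate a metric with a small range.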
In a preferred embodiment, nonlinearly projecting the normalized multi-view software module sample data into the common subspace comprises the steps of:
extracting the initial features of the static software module view from the static software module view dataset
Figure BDA0003346428100000076
extracting the initial features of the dynamic software module view from the dynamic software module view dataset
Figure BDA0003346428100000077
constructing a parameter-shared two-channel network;
inputting the initial features of the static software module view
Figure BDA0003346428100000078
into a four-layer FNN to obtain the view-specific features of the static software module view
Figure BDA0003346428100000081
inputting the initial features of the dynamic software module view
Figure BDA0003346428100000082
into a four-layer FNN to obtain the view-specific features of the dynamic software module view
Figure BDA0003346428100000083
wherein
Figure BDA0003346428100000084
denotes the four-layer FNN mapping function of the static software module view,
Figure BDA0003346428100000085
denotes the four-layer FNN mapping function of the dynamic software module view, and the two mapping functions
Figure BDA0003346428100000086
share the network parameters θ_FNN.
In this embodiment, from the static software module view dataset
Figure BDA0003346428100000087
the corresponding initial features of the static software module view are extracted and denoted
Figure BDA0003346428100000088
From the dynamic software module view dataset
Figure BDA0003346428100000089
the corresponding initial features of the dynamic software module view are extracted and denoted
Figure BDA00033464281000000810
A parameter-shared two-channel network is constructed, and the shared network parameters are denoted θ_FNN.
From the initial features of the static software module view
Figure BDA00033464281000000811
and the four-layer FNN mapping function of the static software module view
Figure BDA00033464281000000812
the view-specific features of the static software module view are computed:
Figure BDA00033464281000000813
From the initial features of the dynamic software module view
Figure BDA00033464281000000814
and the four-layer FNN mapping function of the dynamic software module view
Figure BDA00033464281000000815
the view-specific features of the dynamic software module view are computed:
Figure BDA00033464281000000816
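A minimal Python sketch of the parameter-shared two-channel projection follows. It reads "FNN" as a plain feedforward network, which is an assumption, and the layer widths, the tanh activation, the random initialization, and the names `init_shared_params` and `fnn_forward` are all hypothetical, since the patent's formulas are available only as images. The sketch also assumes that the initial features of both views have the same dimension so that one parameter set can serve both channels.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def init_shared_params(sizes: List[int], seed: int = 0) -> List[Matrix]:
    """Create one weight matrix per layer (biases omitted for brevity)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def fnn_forward(x: List[float], params: List[Matrix]) -> List[float]:
    """Pass x through every layer, applying tanh after each one."""
    h = x
    for weights in params:
        h = [math.tanh(sum(w * v for w, v in zip(row, h))) for row in weights]
    return h


# Hypothetical sizes: 6 input features, four weight layers, 4-dimensional common subspace.
theta_fnn = init_shared_params([6, 8, 8, 8, 4])

static_initial = [0.2, 0.9, 0.1, 0.5, 0.0, 1.0]
dynamic_initial = [0.7, 0.3, 0.4, 0.0, 0.6, 0.2]

# Both channels use the same function and the same parameter object theta_fnn.
static_specific = fnn_forward(static_initial, theta_fnn)
dynamic_specific = fnn_forward(dynamic_initial, theta_fnn)
print(static_specific)
print(dynamic_specific)
```

Because both channels read the same `theta_fnn`, any update to the parameters affects the static and the dynamic projection alike, which is what places the two views in one common subspace.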
In a preferred embodiment, constructing, through deep metric learning, the inter-view discriminant analysis loss function that distinguishes same-class views from different-class views comprises the steps of:
calculating the distance between corresponding sample features of the static software module view and the dynamic software module view in the common subspace
Figure BDA00033464281000000817
wherein S(i) denotes a sample feature of the static software module view and D(j) denotes a sample feature of the dynamic software module view;
using the value 1 to denote that a sample is defective and the value 0 to denote that it is non-defective;
constructing the inter-view discriminant analysis loss function L_G
Figure BDA00033464281000000818
Figure BDA0003346428100000091
where the function h(t) = max(0, t) is the hinge loss function, γ is a preset hyperparameter, τ is a preset positive threshold, l(·) denotes the sample class,
Figure BDA0003346428100000092
denotes an original sample of the static software module view, and
Figure BDA0003346428100000093
denotes an original sample of the dynamic software module view.
In this embodiment, deep metric learning is used to constrain the nonlinear features in the subspace, and an inter-view discriminant analysis loss function is designed so that different samples of same-class views are compact and different samples of different-class views are far apart, which improves inter-view discriminant analysis. Among the original samples of the static software module view and the dynamic software module view, samples composed of the same metrics and having the same class belong to same-class views, and samples composed of the same metrics but having different classes belong to different-class views.
Using the l2 norm, the distance between any two corresponding sample features of the static software module view and the dynamic software module view in the common subspace is calculated as:
Figure BDA0003346428100000094
In formula (1), S(i) denotes a sample feature of the static software module view, D(j) denotes a sample feature of the dynamic software module view,
Figure BDA0003346428100000095
is the sample feature of the static software module view mapped into the common subspace, and
Figure BDA0003346428100000096
is the sample feature of the dynamic software module view mapped into the common subspace;
the sample class is denoted by l(·) and is either defective (denoted by 1) or non-defective (denoted by 0);
the inter-view discriminant analysis loss function L_G is constructed as:
Figure BDA0003346428100000097
In formula (2), the function h(t) = max(0, t) is the hinge loss function, γ is a preset hyperparameter, τ is a preset positive threshold,
Figure BDA0003346428100000101
denotes an original sample of the static software module view, and
Figure BDA0003346428100000102
denotes an original sample of the dynamic software module view.
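The pull-together, push-apart behaviour described above can be sketched in Python. Formula (2) is available only as an image, so the loss below is an assumed margin form built from the quantities the text names (the l2 distance, the hinge function h(t) = max(0, t), the threshold τ, and the weight γ); it is not the patent's exact formula. Same-class pairs are penalized when they are farther apart than τ, and different-class pairs are penalized, with weight γ, when they are closer than τ.

```python
import math
from typing import List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (l2) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hinge(t: float) -> float:
    """h(t) = max(0, t)."""
    return max(0.0, t)


def discriminant_loss(static_feats: List[List[float]], static_labels: List[int],
                      dynamic_feats: List[List[float]], dynamic_labels: List[int],
                      tau: float, gamma: float) -> float:
    """Assumed margin form: average hinge penalty over all static/dynamic pairs."""
    total = 0.0
    for s, ls in zip(static_feats, static_labels):
        for d, ld in zip(dynamic_feats, dynamic_labels):
            dist = l2_distance(s, d)
            if ls == ld:
                total += hinge(dist - tau)          # same class: should be within tau
            else:
                total += gamma * hinge(tau - dist)  # different class: should be beyond tau
    return total / (len(static_feats) * len(dynamic_feats))


# Hypothetical features in the common subspace; label 1 = defective, 0 = non-defective.
static_feats = [[0.0, 0.0], [3.0, 4.0]]
dynamic_feats = [[0.0, 0.0], [3.0, 4.0]]

well_separated = discriminant_loss(static_feats, [0, 1], dynamic_feats, [0, 1], tau=1.0, gamma=0.5)
badly_mixed = discriminant_loss(static_feats, [0, 1], dynamic_feats, [1, 0], tau=1.0, gamma=0.5)
print(well_separated, badly_mixed)  # 0.0 2.25
```

In the first call the same-class pairs coincide and the different-class pairs are 5 apart, so nothing is penalized. In the second call the labels are swapped, so both kinds of pair violate the threshold and the loss becomes positive.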
In a preferred embodiment, constructing, through adversarial learning, the adversarial loss function that distinguishes the different software module views in the common subspace comprises the steps of:
constructing, in the common subspace, a static software module view discriminator
Figure BDA0003346428100000103
and a dynamic software module view discriminator
Figure BDA0003346428100000104
taking the view features of the static software module view as real samples and the view features of the dynamic software module view as generated samples, and establishing the adversarial loss function based on the static software module view:
Figure BDA0003346428100000105
wherein P_data is the view features of the static software module view in the common subspace, P_G is the view features of the dynamic software module view in the common subspace,
Figure BDA0003346428100000106
is the network parameter of the static software module view discriminator
Figure BDA0003346428100000107
E_{x~P_data} denotes the expectation over the data distribution of the static software module view, and
Figure BDA0003346428100000108
denotes the expectation over the data distribution of the dynamic software module view;
taking the view features of the dynamic software module view as real samples and the view features of the static software module view as generated samples, and establishing the adversarial loss function based on the dynamic software module view:
Figure BDA0003346428100000109
Figure BDA00033464281000001010
wherein
Figure BDA00033464281000001011
is the network parameter of the dynamic software module view discriminator
Figure BDA00033464281000001012
obtaining the discriminant loss function of the adversarial network from the adversarial loss function based on the static software module view and the adversarial loss function based on the dynamic software module view:
Figure BDA00033464281000001013
Figure BDA00033464281000001014
In this embodiment, a static software module view discriminator
Figure BDA00033464281000001015
and a dynamic software module view discriminator
Figure BDA00033464281000001016
are constructed in the common subspace. Given a feature projection in the unknown common subspace, they judge as accurately as possible whether a feature is a static software module view feature or a dynamic software module view feature;
taking the view features of the static software module view as real samples and the view features of the dynamic software module view as generated samples, the adversarial loss function based on the static software module view is established:
Figure BDA0003346428100000111
In formula (3), P_data is the view features of the static software module view in the common subspace, P_G is the view features of the dynamic software module view in the common subspace,
Figure BDA0003346428100000112
is the network parameter of the static software module view discriminator
Figure BDA0003346428100000113
E_{x~P_data} denotes the expectation over the data distribution of the static software module view, and
Figure BDA0003346428100000114
denotes the expectation over the data distribution of the dynamic software module view;
taking the view features of the dynamic software module view as real samples and the view features of the static software module view as generated samples, the adversarial loss function based on the dynamic software module view is established:
Figure BDA0003346428100000115
In formula (4),
Figure BDA0003346428100000116
is the network parameter of the dynamic software module view discriminator
Figure BDA0003346428100000117
Combining formula (3) and formula (4) gives the discriminant loss function of the adversarial network:
Figure BDA0003346428100000118
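The two discriminator losses can be sketched in Python. Formulas (3) to (5) are available only as images, so the sketch assumes the standard GAN discriminator objective, -E[log D(real)] - E[log(1 - D(generated))], for each discriminator and assumes that the combined loss is their sum. The discriminator outputs used here are hypothetical probabilities, not outputs of a trained network.

```python
import math
from typing import Sequence


def discriminator_loss(real_scores: Sequence[float],
                       generated_scores: Sequence[float],
                       eps: float = 1e-12) -> float:
    """Standard GAN discriminator loss (assumed form, see lead-in).

    real_scores:      D(x) for samples treated as real
    generated_scores: D(x) for samples treated as generated
    """
    real_term = sum(math.log(p + eps) for p in real_scores) / len(real_scores)
    generated_term = sum(math.log(1.0 - p + eps) for p in generated_scores) / len(generated_scores)
    return -(real_term + generated_term)


# Hypothetical outputs of the static view discriminator: static features are "real".
ds_on_static = [0.9, 0.8]
ds_on_dynamic = [0.2, 0.1]
# Hypothetical outputs of the dynamic view discriminator: dynamic features are "real".
dd_on_dynamic = [0.7, 0.9]
dd_on_static = [0.3, 0.2]

loss_static = discriminator_loss(ds_on_static, ds_on_dynamic)
loss_dynamic = discriminator_loss(dd_on_dynamic, dd_on_static)
total_discriminator_loss = loss_static + loss_dynamic  # assumed combination of (3) and (4)
print(loss_static, loss_dynamic, total_discriminator_loss)
```

A discriminator that cannot tell the views apart outputs 0.5 everywhere and incurs a loss of 2·ln 2 under this form, so values below that level indicate that the two views are still distinguishable in the common subspace.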
In a preferred embodiment, constructing the third network model from the first network model and the second network model comprises combining the inter-view discriminant analysis loss function L_G of the first network model with the discriminant loss function L_D(θ_D) of the adversarial network of the second network model and training with a minimax game strategy, expressed as:
Figure BDA0003346428100000119
and
Figure BDA00033464281000001110
and obtaining the parameters of the third network model with a stochastic gradient descent optimization algorithm.
In this embodiment, the third network model consists of a generative model and a discriminative model. It combines the corresponding loss functions, namely formula (2) and formula (5), and is trained with a minimax game strategy, expressed as:
Figure BDA00033464281000001111
Figure BDA00033464281000001112
The parameters of the third network model are obtained with a stochastic gradient descent optimization algorithm.
In a preferred embodiment, inputting the multi-view software test data into the third network model to obtain the prediction result comprises the following steps:
inputting the multi-view software test data into the third network model to obtain sample features;
inputting the sample features into a classifier;
and outputting, by the classifier, a classification result based on the sample features, the classification result being the prediction result.
In this embodiment, the multi-view software test data is input into the third network model and the sample features are computed; the sample features are then input into a preset classifier, such as a softmax classifier, which outputs a classification result (defective or non-defective) based on the sample features. This classification result is the prediction result.
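The final classification step can be sketched in Python with a two-class softmax classifier. The linear layer, its weights, and the example feature vector are hypothetical; the patent states only that a preset classifier such as softmax maps the sample features to defective or non-defective.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features: List[float],
            weights: List[List[float]],
            bias: List[float]) -> Tuple[str, List[float]]:
    """Linear layer followed by softmax; index 0 = non-defective, index 1 = defective."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    label = "defective" if probs[1] > probs[0] else "non-defective"
    return label, probs


# Hypothetical sample features from the third network model and hypothetical weights.
sample_features = [1.0, -1.0]
weights = [[0.5, 0.5],    # class 0: non-defective
           [2.0, -1.0]]   # class 1: defective
bias = [0.0, 0.0]

label, probs = predict(sample_features, weights, bias)
print(label, probs)
```

With these values the logits are 0 for non-defective and 3 for defective, so the sample is classified as defective.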
The following describes the advantageous effects of the present invention with reference to specific experiments.
Experiments were conducted on AEEEM, a widely used public benchmark dataset for software defect prediction. Table 1 lists the projects in the AEEEM dataset together with the number of samples, the proportion of defective samples, and the number of metrics for each project.
TABLE 1 AEEEM dataset
Project  Number of samples  Defective samples (%)  Number of metrics
EQ       324                39.81                  61
JDT      997                20.66                  61
LC       691                9.26                   61
ML       1862               13.16                  61
PDE      1497               13.96                  61
The AEEEM dataset builds its set of static metrics, such as LOC (lines of code) and FANIN (fan-in), from attribute information collected after development of a project is complete, and its set of dynamic metrics, such as NREV (number of revisions) and DELETELOC (lines deleted), from attribute information recorded during development. The full names of the projects in the AEEEM dataset are: EQ, Equinox Framework; JDT, Eclipse JDT Core; LC, Apache Lucene; ML, Mylyn; and PDE, Eclipse PDE UI.
In this experiment, two indicators widely used in software defect prediction, the F-measure and the G-measure, are used to evaluate the performance of the model. They are calculated by the following formulas:
F-measure = (2 * pd * precision) / (pd + precision) (8)
G-measure = (2 * pd * specificity) / (pd + specificity) (9)
where recall (pd) is defined as TP/(TP + FN), TP denoting true positives and FN denoting false negatives; precision is defined as TP/(TP + FP), where FP denotes false positives. The G-measure considers both recall and specificity and, as formula (9) shows, is their harmonic mean. Specificity is defined as TN/(TN + FP), where TN denotes true negatives. The larger the F-measure and the G-measure are, the better the defect prediction performance is.
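As a worked illustration of formulas (8) and (9), the following sketch computes both measures from confusion-matrix counts. The function name and the example counts are assumptions chosen for illustration; returning 0.0 when a denominator is zero is also an assumption, not something the text specifies.

```python
def defect_prediction_measures(tp, fp, fn, tn):
    """Return (F-measure, G-measure) from confusion-matrix counts."""
    def ratio(num, den):
        # Assumption: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    pd = ratio(tp, tp + fn)            # recall
    precision = ratio(tp, tp + fp)
    specificity = ratio(tn, tn + fp)

    f_measure = ratio(2 * pd * precision, pd + precision)      # formula (8)
    g_measure = ratio(2 * pd * specificity, pd + specificity)  # formula (9)
    return f_measure, g_measure

# Hypothetical counts: 30 TP, 10 FP, 20 FN, 40 TN.
f_value, g_value = defect_prediction_measures(tp=30, fp=10, fn=20, tn=40)
```

With these counts, pd = 0.6, precision = 0.75, and specificity = 0.8, so the F-measure is 0.9/1.35 and the G-measure is 0.96/1.4.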
To evaluate the performance of the invention, the following methods were chosen for comparison: (1) the Deep Canonical Correlation Analysis (DCCA) method from the document "Multi-view perceptron: a deep model for learning face identity and view representations" (authors Zhou Z et al.); (2) the NN-filter method from the document "On the relative value of cross-company and within-company data for defect prediction" (authors Turhan B et al.); (3) the Multi-view Deep Network (MvDN) method from the document "Multi-view deep network for cross-view classification" (authors Kan M et al.).
To reduce the influence of randomness in sample selection, the experiment was repeated 5 times with random selections. The mean F-measure and G-measure for each test project are reported in Tables 2 and 3. As can be seen from Table 2, the prediction performance of the present method is superior to that of the DCCA, NN-filter, and MvDN methods. The main reasons are as follows: the DCCA method pays little attention to mining discriminative information between views; the NN-filter method concatenates the different views during the experiment and then classifies them, paying little attention to the relationship between views; compared with the MvDN method, the present method uses the adversarial network to extract effective discriminative features of the views while learning features of the software modules, and can therefore deeply mine high-level semantic features of the views. The prediction performance of the present method is thus superior to that of the comparison methods, and it is an effective method for learning software defect features.
TABLE 2 Average F-measure values of the invention and the comparison methods on each project
[Table 2 is an image in the original publication: Figure BDA0003346428100000141]
TABLE 3 Average G-measure values of the invention and the comparison methods on each project
[Table 3 is an image in the original publication: Figure BDA0003346428100000142]
An embodiment of the invention further provides a computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program, when executed by a processor, causes a computer to perform the above-mentioned method.
The structure of the multi-view software defect prediction system based on adversarial learning according to an embodiment of the invention is shown schematically in Fig. 2. The system comprises:
an input-output device;
a processor;
a memory;
and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the programs causing the computer to perform the above-described method.
Of course, those skilled in the art will recognize that the above embodiments only illustrate the present invention and do not limit it, and that changes and modifications to the above embodiments fall within the protection scope of the present invention as long as they remain within the spirit of the present invention.

Claims (10)

1. A multi-view software defect prediction method based on adversarial learning, characterized by comprising the following steps:
constructing a first network model according to multi-view software module sample data, wherein the first network model is used for constructing, through deep metric learning, an inter-view discriminant analysis loss function for distinguishing same-class views from different-class views;
constructing a second network model according to the multi-view software module sample data, wherein the second network model is used for constructing, through adversarial learning, an adversarial loss function for distinguishing different software module views in a common subspace;
constructing a third network model according to the first network model and the second network model;
and inputting multi-view software test data into the third network model to obtain a prediction result.
2. The method of claim 1, further comprising the following steps before constructing the first network model according to the multi-view software module sample data:
normalizing the metric elements of the multi-view software module;
and nonlinearly projecting the normalized multi-view software module sample data to a common subspace.
3. The method of claim 2, wherein normalizing the metric elements of the multi-view software module comprises the steps of:
representing any software module sample in the software repository as a static software module view consisting of static metric elements and a dynamic software module view consisting of dynamic metric elements, wherein the static metric elements represent attribute information collected after development of the project is finished, and the dynamic metric elements represent attribute information recorded during the development process;
and bringing the metric elements in the static software module view and the dynamic software module view to the same scale, namely the interval [0,1], by min-max normalization.
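The min-max normalization step can be sketched as follows. This is an illustrative implementation rather than the one used in the invention; the function name is hypothetical, and mapping a constant-valued metric element to 0.0 is an assumption made to avoid division by zero.

```python
def min_max_normalize(samples):
    """Scale each metric element (column) of `samples` to the interval [0, 1].

    `samples` is a list of rows, one row per software module sample.
    """
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in samples:
        normalized.append([
            # Assumption: a constant column carries no information and maps to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized

# Hypothetical static view with two metric elements per sample.
static_view = [[10, 1], [20, 1], [30, 1]]
static_view_scaled = min_max_normalize(static_view)
```

The same function would be applied separately to the static view and the dynamic view so that both lie in [0,1].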
4. The method of claim 2, wherein nonlinearly projecting the normalized multi-view software module sample data to the common subspace comprises the steps of:
extracting the static software module view initial features [formula image FDA0003346428090000011] from the static software module view data set;
extracting the dynamic software module view initial features [formula image FDA0003346428090000012] from the dynamic software module view data set;
constructing a parameter-shared two-channel network;
inputting the static software module view initial features [formula image FDA0003346428090000021] into a four-layer FNN network to obtain the static software module view-specific features [formula image FDA0003346428090000022];
inputting the dynamic software module view initial features [formula image FDA0003346428090000023] into a four-layer FNN network to obtain the dynamic software module view-specific features [formula image FDA0003346428090000024];
wherein [formula image FDA0003346428090000025] denotes the four-layer FNN network mapping function of the static software module view, [formula image FDA0003346428090000026] denotes the four-layer FNN network mapping function of the dynamic software module view, and [formula image FDA0003346428090000027] share the network parameters θ_FNN.
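The parameter-shared two-channel network can be sketched as a single set of feed-forward parameters applied to both views. The layer widths, the tanh activation, the random initialization, and the reading of "four-layer" as four weight layers are all assumptions made for illustration; the sketch also assumes both views have the same input dimension so that they can share parameters.

```python
import math
import random

def init_fnn(layer_sizes, seed=0):
    """Create weights and biases for a feed-forward network (hypothetical init)."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params

def fnn_forward(params, x):
    """Apply each layer in turn: affine map followed by tanh (assumed activation)."""
    h = x
    for weights, biases in params:
        h = [
            math.tanh(sum(w * v for w, v in zip(row, h)) + b)
            for row, b in zip(weights, biases)
        ]
    return h

# One shared parameter set theta_fnn with four weight layers (sizes are illustrative).
theta_fnn = init_fnn([6, 8, 8, 8, 4])

static_initial = [0.1, 0.4, 0.0, 0.9, 0.3, 0.7]    # hypothetical normalized metrics
dynamic_initial = [0.5, 0.2, 0.8, 0.1, 0.6, 0.0]   # hypothetical normalized metrics

# Both channels use the same parameters, so both outputs lie in one common subspace.
static_specific = fnn_forward(theta_fnn, static_initial)
dynamic_specific = fnn_forward(theta_fnn, dynamic_initial)
```

Sharing one parameter set means that identical inputs in either channel map to identical points in the common subspace.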
5. The method of claim 4, wherein constructing, through deep metric learning, the inter-view discriminant analysis loss function for distinguishing same-class views from different-class views comprises the steps of:
calculating, in the common subspace, the distance [formula image FDA0003346428090000028] between corresponding sample features of the static software module view and the dynamic software module view, wherein S(i) denotes a sample feature of the static software module view and D(j) denotes a sample feature of the dynamic software module view;
wherein a value of 1 indicates that the sample class is defective, and a value of 0 indicates that the sample class is non-defective;
constructing the inter-view discriminant analysis loss function L_G:
[formula image FDA0003346428090000029] and [formula image FDA00033464280900000210] and [formula image FDA00033464280900000211] or [formula image FDA00033464280900000212] or [formula image FDA00033464280900000213] d(S(i), D(j))),
wherein the function h(t) = max(0, t) denotes the hinge loss function, γ is a preset hyperparameter, τ is a preset positive threshold, l(·) denotes the sample class, [formula image FDA00033464280900000214] denotes an original sample of the static software module view, and [formula image FDA00033464280900000215] denotes an original sample of the dynamic software module view.
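The exact form of L_G is given only in the formula images, so the following is not the claimed loss. It is a generic margin-based sketch built from the same ingredients named in the claim (a distance d, the hinge function h(t) = max(0, t), a weight γ, and a threshold τ), under the assumptions that d is the Euclidean distance, that same-class cross-view pairs are penalized when farther apart than τ, and that different-class pairs are penalized, weighted by γ, when closer than τ.

```python
import math

def hinge(t):
    """h(t) = max(0, t)."""
    return max(0.0, t)

def distance(u, v):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def inter_view_margin_loss(static_feats, static_labels,
                           dynamic_feats, dynamic_labels,
                           gamma, tau):
    """Generic contrastive hinge loss over all cross-view pairs (illustrative only)."""
    loss = 0.0
    for s, ls in zip(static_feats, static_labels):
        for d, ld in zip(dynamic_feats, dynamic_labels):
            dist = distance(s, d)
            if ls == ld:
                loss += hinge(dist - tau)           # pull same-class pairs together
            else:
                loss += gamma * hinge(tau - dist)   # push different-class pairs apart
    return loss

# Labels follow the claim: 1 = defective, 0 = non-defective.
same_class = inter_view_margin_loss([[0.0, 0.0]], [1], [[3.0, 4.0]], [1], gamma=2.0, tau=1.0)
diff_class = inter_view_margin_loss([[0.0, 0.0]], [1], [[0.0, 0.5]], [0], gamma=2.0, tau=1.0)
```

In the first call the same-class pair is at distance 5 and exceeds τ by 4; in the second the different-class pair is at distance 0.5, inside the threshold by 0.5, and is weighted by γ = 2.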
6. The method of claim 5, wherein constructing, through adversarial learning, the adversarial loss function for distinguishing different software module views in the common subspace comprises the steps of:
constructing, in the common subspace, a static software module view discriminator [formula image FDA00033464280900000216] and a dynamic software module view discriminator [formula image FDA0003346428090000031];
taking the view features of the static software module view as real samples and the view features of the dynamic software module view as generated samples, and establishing an adversarial loss function based on the static software module view:
[formula image FDA0003346428090000032]
wherein P_data is the view features of the static software module view in the common subspace, P_G is the view features of the dynamic software module view in the common subspace, [formula image FDA0003346428090000033] is the network parameter of the static software module view discriminator [formula image FDA0003346428090000034], E_{x~P_data} is taken over the data distribution of the static software module view, and [formula image FDA0003346428090000035] is taken over the data distribution of the dynamic software module view;
taking the view features of the dynamic software module view as real samples and the view features of the static software module view as generated samples, and establishing an adversarial loss function based on the dynamic software module view:
[formula images FDA0003346428090000036 and FDA0003346428090000037]
wherein [formula image FDA0003346428090000038] is the network parameter of the dynamic software module view discriminator [formula image FDA0003346428090000039];
obtaining the discriminant loss function of the adversarial network from the adversarial loss function based on the static software module view and the adversarial loss function based on the dynamic software module view:
[formula images FDA00033464280900000310 and FDA00033464280900000311]
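The formulas of this claim appear only as images, so the following sketch assumes the standard generative adversarial network value function, E_{x~P_data}[log D(x)] + E_{x~P_G}[log(1 - D(x))], for each of the two discriminators and sums the two values. The logistic discriminator, its parameters, and the feature values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def discriminate(theta, feature):
    """Hypothetical logistic discriminator: probability that `feature` is a real sample."""
    weights, bias = theta
    return sigmoid(sum(w * x for w, x in zip(weights, feature)) + bias)

def adversarial_value(theta, real_feats, generated_feats, eps=1e-12):
    """Standard GAN value: mean log D(real) + mean log(1 - D(generated))."""
    real_term = sum(math.log(discriminate(theta, x) + eps) for x in real_feats) / len(real_feats)
    gen_term = sum(math.log(1.0 - discriminate(theta, x) + eps)
                   for x in generated_feats) / len(generated_feats)
    return real_term + gen_term

# Hypothetical view features in the common subspace.
static_feats = [[0.2, 0.7], [0.4, 0.1]]
dynamic_feats = [[0.9, 0.3], [0.5, 0.5]]

theta_static = ([0.0, 0.0], 0.0)    # untrained static view discriminator
theta_dynamic = ([0.0, 0.0], 0.0)   # untrained dynamic view discriminator

# Static discriminator: static features are real, dynamic features are generated.
value_static = adversarial_value(theta_static, static_feats, dynamic_feats)
# Dynamic discriminator: the roles of the two views are swapped.
value_dynamic = adversarial_value(theta_dynamic, dynamic_feats, static_feats)
total_value = value_static + value_dynamic
```

An untrained discriminator that outputs 0.5 for every input gives a value of 2·log(0.5) per discriminator, which is the value reached when the two views can no longer be told apart.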
7. The method of claim 6, wherein constructing the third network model according to the first network model and the second network model comprises integrating the inter-view discriminant analysis loss function L_G of the first network model with the discriminant loss function L_D of the adversarial network of the second network model, and training with a min-max game strategy, the training being expressed as:
[formula image FDA00033464280900000312]
and
[formula image FDA00033464280900000313]
and obtaining the parameters of the third network model by a stochastic gradient descent optimization algorithm.
8. The method of claim 1, wherein inputting the multi-view software test data into the third network model to obtain the prediction result comprises the steps of:
inputting the multi-view software test data into the third network model to obtain sample features;
inputting the sample features into a classifier;
and outputting, by the classifier, a classification result according to the sample features, the classification result being the prediction result.
9. A computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method according to any one of claims 1-8.
10. A multi-view software defect prediction system based on adversarial learning, comprising:
an input-output device;
a processor;
a memory;
and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the programs causing the computer to perform the method of any of claims 1-8.
CN202111329931.2A 2021-11-10 2021-11-10 Multi-view software defect prediction method and system based on counterstudy Pending CN114328174A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111329931.2A CN114328174A (en) 2021-11-10 2021-11-10 Multi-view software defect prediction method and system based on counterstudy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111329931.2A CN114328174A (en) 2021-11-10 2021-11-10 Multi-view software defect prediction method and system based on counterstudy

Publications (1)

Publication Number Publication Date
CN114328174A true CN114328174A (en) 2022-04-12

Family

ID=81044597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111329931.2A Pending CN114328174A (en) 2021-11-10 2021-11-10 Multi-view software defect prediction method and system based on counterstudy

Country Status (1)

Country Link
CN (1) CN114328174A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117421244A (en) * 2023-11-17 2024-01-19 北京邮电大学 Multi-source cross-project software defect prediction method, device and storage medium
CN117421244B (en) * 2023-11-17 2024-05-24 北京邮电大学 Multi-source cross-project software defect prediction method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination