CN113705616B - Model construction method, software defect prediction method, device and electronic device - Google Patents

Model construction method, software defect prediction method, device and electronic device

Info

Publication number
CN113705616B
Authority
CN
China
Prior art keywords
view
static
software
dynamic
defect
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110873565.0A
Other languages
Chinese (zh)
Other versions
CN113705616A (en)
Inventor
韩璐
严军荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sunwave Communications Co Ltd
Original Assignee
Sunwave Communications Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sunwave Communications Co Ltd
Priority to CN202110873565.0A
Publication of CN113705616A
Application granted
Publication of CN113705616B
Current legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to a model construction method, a software defect prediction method, corresponding devices, and an electronic device. The model construction method comprises: constructing a defect data set from software defect sample data, and forming a static view and a dynamic view based on the defect data set; and performing multi-view feature learning on the static view and the dynamic view with a deep learning network architecture to construct a heterogeneous multi-view software defect prediction model. The application addresses the limited prediction effect and error-proneness of defect prediction models constructed from static data alone: by learning multi-view features from both the static view and the dynamic view, the heterogeneous multi-view software defect prediction model improves the prediction effect and the prediction accuracy.

Description

Model construction method, software defect prediction method, device and electronic device
Technical Field
The present application relates to the technical field of software engineering, and in particular to a model construction method, a software defect prediction method, corresponding devices, and an electronic device.
Background
Owing to the rapid development of computer software and hardware, people's daily lives, down to clothing and food, have changed greatly. Software products such as Taobao, Meituan, and ticker have improved the efficiency of life and production, and people are paying ever more attention to software quality: only software products with higher reliability (that is, with a lower likelihood of latent defects) ultimately win the support and approval of users.
According to the IEEE standard definition of a defect, viewed from inside the product, defects are the errors, faults, and similar problems that arise while a software product is developed or maintained; viewed from outside the product, a defect is the failure or violation of some function that the system is required to perform. Hidden defects inside software can therefore cause unexpected results at run time, in mild cases degrading software quality and in severe cases threatening people's lives and property. Considering the software itself, teamwork, technical problems, and so on, software defects arise mainly from the characteristics of the software product and of its development process, and their existence is unavoidable.
Although defects are difficult to eliminate entirely, they can be analyzed and monitored so as to reduce their number. Software defect prediction is a technique for effectively mining the potential defects that remain undiscovered in software, together with their distribution, and it lets project developers focus on the defective modules. To mine the defective modules, however, one must find the intrinsic properties associated with the external manifestation of defects; these intrinsic properties are software measurement data, namely metric elements.
The current software defect prediction method constructs a defect prediction model by mining static data in a software history warehouse, so as to predict defects of a new program module. Program modules may be arranged as packages, files, classes, functions, etc. according to actual test requirements. When the test resources are sufficient, the technique can be used to check whether each program module is defective; when the test resources are insufficient, the technology can reasonably allocate the resources to discover defects as much as possible. However, different types of features reflect different information of the software module, and a defect prediction model constructed by using static data only cannot fully utilize various types of features, so that the problem that the prediction effect is limited and errors are easy to generate exists.
For the problems in the related art that a defect prediction model constructed from static data has a limited prediction effect and is prone to errors, no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the application provide a model construction method, a software defect prediction method, corresponding devices, and an electronic device, which at least solve the problems in the related art that a defect prediction model constructed from static data has a limited prediction effect and is prone to errors.
In a first aspect, an embodiment of the present application provides a method for constructing a model, including:
Constructing a defect data set according to software defect sample data, and forming a static view and a dynamic view based on the defect data set;
and performing multi-view feature learning on the static view and the dynamic view by using a network architecture of deep learning to construct a heterogeneous multi-view software defect prediction model.
In some of these embodiments, the constructing a defect dataset from software defect sample data and forming a static view and a dynamic view based on the defect dataset includes:
and constructing a defect data set according to the software defect sample data, and extracting all metric meta-features corresponding to the view types of each sample in the defect data set to form the static view and the dynamic view.
In some of these embodiments, the constructing a defect dataset from software defect sample data includes:
mining and analyzing a software history repository, extracting software modules from the repository as samples and labeling each with a category, and setting metric elements that are strongly correlated with the presence of defects in a software module;
and generating, from the category labels and the metric elements, a defect data set $X = \{x_1, x_2, \ldots, x_n\}$, where $n$ is the number of samples and each sample $x_i = [m_1, m_2, \ldots, m_d]$ is a feature vector consisting of $d$ metric elements.
In some of these embodiments, the extracting of all metric element features of each sample in the defect data set corresponding to a view type to form the static view and the dynamic view includes:
extracting, from all metric elements in each sample $x_i$, the metric element features of the static type to form the static view;
and extracting, from all metric elements in each sample $x_i$, the metric element features of the dynamic type to form the dynamic view.
In some of these embodiments, the multi-view feature learning of the static view and the dynamic view by the network architecture of deep learning to construct a heterogeneous multi-view software defect prediction model includes:
preprocessing the static view and the dynamic view by using a normalization algorithm;
extracting high-level semantic features from the preprocessed static view and the preprocessed dynamic view to obtain static high-level semantic features and dynamic high-level semantic features;
mapping the static high-level semantic features and the dynamic high-level semantic features to a common subspace to obtain a transformation instance pair;
and constructing a binary constraint and a triplet constraint according to the transformation instance pair, and performing multi-view feature learning to construct a heterogeneous multi-view software defect prediction model.
In some embodiments, the extracting the high-level semantic features of the preprocessed static view and the preprocessed dynamic view to obtain the static high-level semantic features and the dynamic high-level semantic features includes:
constructing a self-coding network for the normalized static view, and training this deep self-coding network to extract the static high-level semantic features of the normalized static view;
and constructing a self-coding network for the normalized dynamic view, and training this deep self-coding network to extract the dynamic high-level semantic features of the normalized dynamic view.
In some embodiments, mapping the static high-level semantic features and the dynamic high-level semantic features to a common subspace to obtain a transformation instance pair includes:
adopting a four-layer FNN as the feature mapping, and nonlinearly transforming the static high-level semantic features and the dynamic high-level semantic features into a common subspace to obtain a transformation instance pair.
In some embodiments, the constructing a binary constraint and a triplet constraint according to the transformation instance pair, and performing multi-view feature learning to construct a heterogeneous multi-view software defect prediction model includes:
constructing a binary constraint according to the transformation instance pair;
Constructing a triplet constraint according to the transformation instance pair;
constructing a loss function according to the binary constraint and the triplet constraint;
and taking the binary constraint and the triplet constraint as training samples, taking relevant data in the AEEEM data set as test samples, and taking the loss function as the convergence condition, constructing the heterogeneous multi-view software defect prediction model.
In a second aspect, an embodiment of the present application provides a software defect prediction method, including:
Constructing a heterogeneous multi-view software defect prediction model according to the first aspect;
and predicting the software defects of the software module to be analyzed by using the heterogeneous multi-view software defect prediction model to obtain a prediction result of the software module to be analyzed.
In a third aspect, an embodiment of the present application provides a model building apparatus, which is characterized by including a processing module and a learning module;
the processing module is used for constructing a defect data set according to the software defect sample data and forming a static view and a dynamic view based on the defect data set;
And the learning module is used for performing multi-view feature learning on the static view and the dynamic view by utilizing a network architecture of deep learning so as to construct a heterogeneous multi-view software defect prediction model.
In a fourth aspect, an embodiment of the present application provides a software defect prediction apparatus, including a construction module and a prediction module;
the construction module is used for constructing the heterogeneous multi-view software defect prediction model according to the first aspect;
And the prediction module is used for predicting the software defects of the software module to be analyzed by utilizing the heterogeneous multi-view software defect prediction model to obtain a prediction result of the software module to be analyzed.
In a fifth aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the software defect prediction method according to the first aspect.
In a sixth aspect, an embodiment of the present application provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the software defect prediction method as described in the first aspect above.
Compared with the related art, the model construction method, software defect prediction method, devices, and electronic device provided by the embodiments of the application construct a defect data set from software defect sample data, form a static view and a dynamic view based on the defect data set, and then perform multi-view feature learning on the static view and the dynamic view with a deep learning network architecture to construct a heterogeneous multi-view software defect prediction model. This solves the problem that a defect prediction model constructed from static data has a limited prediction effect and is prone to errors, and it improves the prediction effect and the prediction accuracy.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a hardware block diagram of a terminal device of a model building method according to an embodiment of the present application;
FIG. 2 is a flowchart of a model building method according to an embodiment of the present application;
FIG. 3 is a flowchart of step S220 in FIG. 2;
FIG. 4 is a block diagram of a model building apparatus according to an embodiment of the present application;
FIG. 5 is a flowchart of a software defect prediction method according to an embodiment of the present application;
FIG. 6 is a block diagram of a software defect prediction apparatus according to an embodiment of the present application.
In the figures: 210, processing module; 220, learning module; 510, construction module; 520, prediction module.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and embodiments in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments that a person of ordinary skill in the art can obtain from the embodiments provided by the present application without inventive effort are intended to fall within the scope of the present application. Moreover, it should be appreciated that while such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as meaning that the disclosure of this application is insufficient.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by those of ordinary skill in the art that the described embodiments of the application can be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," and similar referents in the context of the application do not limit quantity and may denote the singular or the plural. The terms "comprising," "including," "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to only those steps or elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in connection with the present application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means greater than or equal to two. "And/or" describes an association between associated objects and covers three relationships; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. The terms "first," "second," "third," and the like, as used herein, merely distinguish between similar objects and do not represent a particular ordering of objects.
The method embodiment provided in this embodiment may be executed in a terminal, a computer or a similar computing device. Taking the operation on the terminal as an example, fig. 1 is a block diagram of the hardware structure of the terminal of the model building method according to the embodiment of the present application. As shown in fig. 1, the terminal 10 may include one or more (only one is shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA) and a memory 104 for storing data, and optionally a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and not limiting on the structure of the terminal described above. For example, the terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to a model building method in an embodiment of the present application, and the processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, implement the above-mentioned method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. The specific examples of the network described above may include a wireless network provided by a communication provider of the terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as a NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
It should be noted that, the software defect prediction method embodiment may also be executed in a terminal, a computer, or a similar computing device. The structure may be similar to that of fig. 1 and is not discussed here.
The present embodiment provides a model building method, fig. 2 is a flowchart of the model building method according to the embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps:
Step S210, a defect data set is constructed according to software defect sample data, and a static view and a dynamic view are formed based on the defect data set;
And S220, performing multi-view feature learning on the static view and the dynamic view by utilizing a network architecture of deep learning so as to construct a heterogeneous multi-view software defect prediction model.
It should be noted that the defect data set is constructed from software defect sample data, namely the categories and attributes of the software modules in the software history repository. The software modules are divided in advance into two categories, defective and non-defective; for example, a defective module is labeled 1 and a non-defective module is labeled 0. The attributes of a software module, also called metric elements, may be obtained by measuring the module with methods such as the McCabe metric and the Halstead metric. Each sample vector $x_i$ in the defect data set may then be jointly represented by a static view vector and a dynamic view vector: the set of static view vectors is the static view, and the set of dynamic view vectors is the dynamic view. A deep learning network architecture is then used to perform multi-view feature learning on the static view and the dynamic view (categories and attributes) and to construct the heterogeneous multi-view software defect prediction model.
The above steps solve the problems that a defect prediction model constructed from static data has a limited prediction effect and is prone to errors. Multi-view feature learning on the static view and the dynamic view with a deep learning network architecture takes into account both the effective discriminative features within each view and the correlation between the views; compared with a defect prediction model constructed from static data alone, it can improve the prediction effect and the prediction accuracy.
The specific construction process of the heterogeneous multi-view software defect prediction model is described in detail below.
In some of these embodiments, step S210 includes the following steps;
and constructing a defect data set according to the software defect sample data, and extracting all metric meta-features corresponding to the view types of each sample in the defect data set to form a static view and a dynamic view.
The specific process of constructing the defect data set according to the software defect sample data is as follows:
Step S211, mining and analyzing a software history repository, extracting software modules from the repository as samples and labeling each with a category, and setting metric elements that are strongly correlated with the presence of defects in a software module;
Step S212, generating a defect data set $X = \{x_1, x_2, \ldots, x_n\}$ according to the category labels and the metric elements, where $n$ is the number of samples and each sample $x_i = [m_1, m_2, \ldots, m_d]$, with $m_d$ the $d$-th metric element, is a feature vector consisting of $d$ metric elements.
That is, constructing a defect data set from software defect sample data comprises a sample labeling stage, a feature extraction stage, and a defect data set construction stage.
Sample labeling stage: a software history repository is mined and analyzed, and software modules are extracted from the repository as samples and labeled with a category, 0 (non-defective module) or 1 (defective module).
Feature extraction stage: software defect prediction lets project developers focus on the defective modules, but to mine the defective modules it is necessary to find the intrinsic properties associated with the external manifestation of defects; these are software measurement data, namely metric elements. The attributes of a software module may be obtained by measuring it with the McCabe metric, the Halstead metric, and the like.
Defect data set construction stage: a defect data set is created for the program modules from the metric elements (e.g., McCabe metrics, Halstead metrics). Specifically, let the defect data set be $X = \{x_1, x_2, \ldots, x_n\}$, where $n$ is the number of samples and each sample $x_i = [m_1, m_2, \ldots, m_d]$ is a feature vector consisting of $d$ metric elements. The defect data set is thus the set of samples, each represented by the feature vector of its metric elements.
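The three stages above can be sketched in Python as follows. This is only an illustrative sketch: the `ModuleSample` class, the `build_defect_dataset` helper, the module names, and the metric values are all hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModuleSample:
    """One software module: d metric elements plus a category label."""
    name: str             # hypothetical module identifier
    metrics: List[float]  # x_i = [m_1, ..., m_d]
    label: int            # 1 = defective, 0 = non-defective


def build_defect_dataset(records):
    """Turn (name, metrics, is_defective) records into labelled samples."""
    samples = [
        ModuleSample(name, [float(v) for v in metrics], 1 if defective else 0)
        for name, metrics, defective in records
    ]
    # Every sample must carry the same number d of metric elements.
    if len({len(s.metrics) for s in samples}) > 1:
        raise ValueError("all samples need the same number of metric elements")
    return samples


# Made-up records: (module, [metric values], defective?)
records = [
    ("parser.c", [12, 830.5, 40], True),
    ("utils.c", [3, 120.0, 2], False),
    ("net.c", [8, 410.2, 15], True),
]
dataset = build_defect_dataset(records)
X = [s.metrics for s in dataset]  # defect data set X = {x_1, ..., x_n}
y = [s.label for s in dataset]    # category labels
```

Here `X` plays the role of the defect data set and `y` holds the 0/1 category labels assigned in the sample labeling stage.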
The specific process of extracting all metric meta-features corresponding to the view types from each sample in the defect data set to form the static view and the dynamic view is as follows:
Step S213, extracting, from all metric elements in each sample $x_i$, the metric element features of the static type to form a static view;
Step S214, extracting, from all metric elements in each sample $x_i$, the metric element features of the dynamic type to form a dynamic view.
Specifically, this can be regarded as the multi-view formation stage: from all metric elements in each sample $x_i$, the metric element features of the static type are extracted to form the static view, which can be expressed as the set of static view vectors of all samples.
Likewise, the metric element features of the dynamic type are extracted from all metric elements in each sample $x_i$ to form the dynamic view, which can be expressed as the set of dynamic view vectors of all samples.
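As a minimal sketch of the multi-view formation stage, the split of each sample into a static view vector and a dynamic view vector might look like this in Python. The `METRIC_TYPES` list, which assigns each metric position to a view type, is a hypothetical assumption; the text does not say which metric elements are static and which are dynamic.

```python
# Hypothetical assignment of each metric position to a view type.
METRIC_TYPES = ["static", "static", "dynamic", "static", "dynamic"]


def form_views(X, metric_types):
    """Split every sample x_i into its static and dynamic view vectors."""
    views = {"static": [], "dynamic": []}
    for x in X:
        if len(x) != len(metric_types):
            raise ValueError("sample length must match the metric type list")
        for view_type in views:
            # Keep only the metric elements whose type matches this view.
            views[view_type].append(
                [m for m, t in zip(x, metric_types) if t == view_type]
            )
    return views


X = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
views = form_views(X, METRIC_TYPES)
static_view = views["static"]    # set of static view vectors
dynamic_view = views["dynamic"]  # set of dynamic view vectors
```

The two views keep the sample order, so the i-th static view vector and the i-th dynamic view vector describe the same software module.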
In some of these embodiments, as shown in fig. 3, step S220 includes the following steps;
step S221, preprocessing the static view and the dynamic view by using a normalization algorithm;
Step S222, extracting high-level semantic features from the preprocessed static view and the preprocessed dynamic view to obtain static high-level semantic features and dynamic high-level semantic features;
Step S223, mapping the static high-level semantic features and the dynamic high-level semantic features to a common subspace to obtain a transformation instance pair;
Step S224, constructing a binary constraint and a triplet constraint according to the transformation instance pair, and performing multi-view feature learning to construct a heterogeneous multi-view software defect prediction model.
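The form of the binary and triplet constraints is not given at this point in the text. Purely as an assumption, the sketch below uses two widely used formulations, a contrastive pairwise loss and a hinge-style triplet loss, to show how a loss function could be assembled from both kinds of constraint. The function names, the `margin` value, and the weight `alpha` are hypothetical.

```python
import math


def distance(u, v):
    """Euclidean distance between two vectors in the common subspace."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pair_loss(u, v, same_label, margin=1.0):
    """Binary (pairwise) constraint: pull same-label pairs together and
    push different-label pairs at least `margin` apart (assumed form)."""
    d = distance(u, v)
    return d ** 2 if same_label else max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet constraint: the anchor should be closer to the positive
    than to the negative by at least `margin` (assumed form)."""
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)


def total_loss(pairs, triplets, alpha=1.0):
    """Combine both constraint types; `alpha` is a hypothetical weight."""
    return (sum(pair_loss(*p) for p in pairs)
            + alpha * sum(triplet_loss(*t) for t in triplets))


pairs = [([0.0, 0.0], [3.0, 4.0], True)]
triplets = [([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])]
loss = total_loss(pairs, triplets)
```

In training, such a loss would be minimized over the parameters of the mapping networks until it converges; the actual constraints and loss of the application may differ from this sketch.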
In step S221, the specific process of preprocessing the static view and the dynamic view by using the normalization algorithm is as follows:
The static view and the dynamic view are normalized with a normalization algorithm, which may specifically be min-max normalization, converting all values into the interval [0, 1]. For any metric element $a_i$ in the static view or the dynamic view, the normalized value $a_i'$ is:
$$a_i' = \frac{a_i - \min(a)}{\max(a) - \min(a)}$$
where $a_i$ represents the $i$-th metric element in a sample of the view, $\max(a)$ represents the maximum value of $a$, and $\min(a)$ represents the minimum value of $a$. Preprocessing the metric elements in this way narrows the range of the metric element values in the static view and the dynamic view so that the data are normalized.
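A minimal Python sketch of min-max normalization over one view follows. It assumes that the maximum and minimum are taken per metric element across all samples of the view, and that a metric whose values are all equal is mapped to 0.0; neither choice is stated in the text.

```python
def min_max_normalize(view):
    """Scale every metric column of a view into the interval [0, 1]."""
    columns = list(zip(*view))         # one tuple per metric element
    lows = [min(c) for c in columns]   # min(a) for each metric
    highs = [max(c) for c in columns]  # max(a) for each metric
    normalized = []
    for sample in view:
        row = []
        for a, lo, hi in zip(sample, lows, highs):
            # Assumption: a constant metric (max == min) maps to 0.0
            # to avoid division by zero.
            row.append((a - lo) / (hi - lo) if hi > lo else 0.0)
        normalized.append(row)
    return normalized


static_view = [[2.0, 10.0], [4.0, 10.0], [6.0, 30.0]]
normalized_static = min_max_normalize(static_view)
# normalized_static == [[0.0, 0.0], [0.5, 0.0], [1.0, 1.0]]
```

The same function would be applied separately to the dynamic view, since each view has its own metric elements and value ranges.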
Because each sample in the static view and the dynamic view is a vector of different metric elements, a self-coding network (autoencoder) is used in place of the traditional PCA method to reduce the dimension of each sample, so that the dimensions of the views are kept consistent while the structural information of the samples is preserved as far as possible. Therefore, in step S222, high-level semantic features are extracted from the preprocessed static view and dynamic view, and the static and dynamic high-level semantic features are obtained as follows:
constructing a self-coding network for the normalized static view, and training the deep self-coding network to extract the static high-level semantic features of the normalized static view. Specifically, for the i-th normalized sample in the static view, a self-coding network is constructed for the static view; the output of its coding layer for that sample is denoted $F_i^{(static)}$, and the output of the self-coding network as a whole for the same sample is denoted separately, where $\theta^{(static)}$ represents the parameters of the deep self-coding network of the static view.
Constructing a self-coding network for the normalized dynamic view, and training the deep self-coding network to extract the dynamic high-level semantic features of the normalized dynamic view. Specifically, for the i-th normalized sample in the dynamic view, a self-coding network is constructed for the dynamic view; the output of its coding layer for that sample is denoted $F_i^{(dynamic)}$, and the output of the self-coding network as a whole for the same sample is denoted separately, where $\theta^{(dynamic)}$ represents the parameters of the deep self-coding network of the dynamic view.
Through training of the respective deep self-coding networks, the static high-level semantic features $F^{(static)}$ and the dynamic high-level semantic features $F^{(dynamic)}$ are extracted from the normalized views. As described above, this keeps the dimensions of the views consistent while preserving as much of each sample's structural information as possible.
Most machine learning modeling techniques rest on one assumption: that the data follow the same distribution. However, the static view and the dynamic view of the present application describe the data of a software module from the perspectives of static measurement and dynamic measurement respectively, and a large difference in data distribution exists between the views. Therefore, in step S223, the static high-level semantic features and the dynamic high-level semantic features are mapped to a common subspace, and the transformation instance pair is obtained as follows:
A four-layer FNN is adopted as the feature mapping, and the static high-level semantic features and the dynamic high-level semantic features are nonlinearly transformed into a common subspace to obtain a transformation instance pair.
Specifically, a four-layer FNN is used as the feature mapping, and the static high-level semantic features $F^{(static)}$ and the dynamic high-level semantic features $F^{(dynamic)}$ are nonlinearly transformed into a common subspace. In the common subspace, $F^{(static)}$ is mapped to $F^{(S)} = f^{(S)}(F^{(static)}; \theta_S)$ and $F^{(dynamic)}$ is mapped to $F^{(D)} = f^{(D)}(F^{(dynamic)}; \theta_D)$, where $f^{(S)}(\cdot\,; \theta_S)$ and $f^{(D)}(\cdot\,; \theta_D)$ are mapping functions, $\theta_S$ and $\theta_D$ are tunable parameters, and $F^{(S)}$ and $F^{(D)}$ are referred to as a transformation instance pair in the common subspace. As previously described, the common subspace is learned so as to minimize the distance between the views and reduce the difference in data distribution while maintaining the data structure characteristics of the original views. The transformed features attain maximum correlation, which addresses the problem of data distribution difference.
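The feature mapping into a common subspace can be illustrated with a forward pass through two small networks, one per view. This is only a sketch under several assumptions that the text does not state: FNN is read as a feedforward neural network, "four-layer" is read as four weight layers, and the tanh activation, the layer widths, the random initialization and the input vectors are all hypothetical. In the described method the parameters would be learned by minimizing the loss, which is not shown here.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def init_layers(sizes: Sequence[int], rng: random.Random) -> List[Layer]:
    """Create one randomly initialized (weights, biases) pair per weight layer."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Apply each layer as an affine map followed by tanh (assumed activation)."""
    out = list(x)
    for weights, biases in layers:
        out = [math.tanh(sum(w * v for w, v in zip(row, out)) + b)
               for row, b in zip(weights, biases)]
    return out


rng = random.Random(0)
sizes = [8, 16, 16, 16, 4]            # hypothetical widths: four weight layers
theta_s = init_layers(sizes, rng)     # parameters of the static mapping
theta_d = init_layers(sizes, rng)     # parameters of the dynamic mapping

f_static = [rng.random() for _ in range(8)]    # stand-in for a static high-level feature
f_dynamic = [rng.random() for _ in range(8)]   # stand-in for a dynamic high-level feature

f_s = forward(f_static, theta_s)      # mapped static instance in the common subspace
f_d = forward(f_dynamic, theta_d)     # mapped dynamic instance in the common subspace
print(len(f_s), len(f_d))
```

The two views use separate parameters but share the output dimension, so the mapped instances can be compared directly.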
Step S224, constructing a two-tuple (binary group) constraint and a triplet (ternary group) constraint according to the transformation instance pair, and performing multi-view feature learning to construct a heterogeneous multi-view software defect prediction model, wherein the specific process is as follows:
constructing a two-tuple constraint according to the transformation instance pair;
constructing a triplet constraint according to the transformation instance pair;
constructing a loss function according to the two-tuple constraint and the triplet constraint;
taking the two-tuple constraint and the triplet constraint as training samples, taking the relevant data in the AEEEM dataset as test samples, and taking the loss function as the convergence condition, constructing the heterogeneous multi-view software defect prediction model.
First, a two-tuple constraint is constructed: for the transformation instance pair $(F_i^{(S)}, F_i^{(D)})$ formed by the two different views of the same sample in the common space, the distance between the pair is minimized during feature learning. The distance $L_1$ between the mapping instances $F_i^{(S)}$ and $F_i^{(D)}$ after feature mapping is expressed as:

$$L_1(F_i^{(S)}, F_i^{(D)}) = \left\| f^{(S)}(F_i^{(static)}; \theta_S) - f^{(D)}(F_i^{(dynamic)}; \theta_D) \right\|_2 \quad (1)$$

where $f^{(S)}(\cdot\,; \theta_S)$ and $f^{(D)}(\cdot\,; \theta_D)$ are the mapping functions, and $\theta_S$ and $\theta_D$ are adjustable parameters.
Second, a triplet constraint is constructed to further strengthen the intra-class relationship of the mapping instances, so that in the common space samples of the same class are close together and samples of different classes are far apart. Each triplet consists of $F_i^{(S)}$, the static-view feature vector selected as the anchor; a feature vector from the dynamic view that has the same label as $F_i^{(S)}$; and a feature vector from the dynamic view that has a different label from $F_i^{(S)}$. $L_2$ denotes the distance, after feature mapping, between the anchor $F_i^{(S)}$ and the same-label dynamic-view feature vector; this distance is equation (2).
$L_3$ denotes the distance, after feature mapping, between the anchor $F_i^{(S)}$ and the different-label dynamic-view feature vector; this distance is equation (3).
Third, according to equations (1), (2), and (3), the loss function of the global feature constraint is defined as follows:

$$L = \max(L_3 - \alpha L_1 - \beta L_2) \quad (4)$$

where $\alpha$ and $\beta$ are hyperparameters.
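The three distance terms and the expression inside equation (4) can be sketched as below. This is a hedged illustration, not the patented computation: the Euclidean distance is assumed for all three terms by analogy with equation (1), the vectors are made up, and the function and argument names are hypothetical. The text does not say over what the outer max in equation (4) is taken, so the sketch stops at the inner expression. The values of α and β are the ones reported for the experiments.

```python
import math
from typing import Sequence, Tuple


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean (2-norm) distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def constraint_terms(
    anchor_s: Sequence[float],       # mapped static-view instance (anchor)
    same_sample_d: Sequence[float],  # mapped dynamic-view instance of the same sample
    same_label_d: Sequence[float],   # dynamic-view instance with the same label
    diff_label_d: Sequence[float],   # dynamic-view instance with a different label
    alpha: float,
    beta: float,
) -> Tuple[float, float, float, float]:
    """Return (L1, L2, L3, L3 - alpha*L1 - beta*L2) for one anchor."""
    l1 = euclidean(anchor_s, same_sample_d)
    l2 = euclidean(anchor_s, same_label_d)   # assumed form of equation (2)
    l3 = euclidean(anchor_s, diff_label_d)   # assumed form of equation (3)
    return l1, l2, l3, l3 - alpha * l1 - beta * l2


l1, l2, l3, inner = constraint_terms(
    anchor_s=[0.0, 0.0],
    same_sample_d=[3.0, 4.0],
    same_label_d=[0.0, 1.0],
    diff_label_d=[6.0, 8.0],
    alpha=0.01,
    beta=0.1,
)
print(l1, l2, l3, inner)
```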
TABLE 1
It should be noted that AEEEM is a public defect data set. As shown in Table 1, the metric elements in the AEEEM dataset are divided into two views according to whether they are static metric elements or dynamic metric elements, yielding a static view and a dynamic view, which are used as test samples. The AEEEM dataset consists of the projects Equinox Framework (EQ), Eclipse JDT Core (JDT), Apache Lucene (LC), Mylyn (ML) and Eclipse PDE UI (PDE). It has 61 metrics, including Chidamber and Kemerer (CK) metrics, object-oriented metrics, churn of source code metrics, entropy of source code metrics, etc.
And finally, learning the model according to the training sample, the test sample and the loss function to obtain the heterogeneous multi-view software defect prediction model.
It should be noted that the steps illustrated in the above-described flow or flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order other than that illustrated herein.
The embodiment also provides a model construction device for implementing the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the terms "module," "unit," "sub-unit," and the like may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a block diagram of a model construction apparatus according to an embodiment of the present application, as shown in fig. 4, including: a processing module 210 and a learning module 220;
A processing module 210, configured to construct a defect data set according to the software defect sample data, and form a static view and a dynamic view based on the defect data set;
the learning module 220 is configured to perform multi-view feature learning on the static view and the dynamic view by using the network architecture of deep learning, so as to construct a heterogeneous multi-view software defect prediction model.
The model construction device solves the problems that the defect prediction model constructed by utilizing static data has limited prediction effect and is easy to generate errors, realizes multi-view feature learning on static views and dynamic views by utilizing a network architecture of deep learning, constructs a heterogeneous multi-view software defect prediction model, and improves the prediction effect and the prediction accuracy.
In one embodiment, the processing module 210 is further configured to mine and analyze a software history repository, extract a software module from the software repository as a sample for category marking, and set a metric element that is strongly related to the existence of a defect in the software module;
Generating, according to the category labels and the metric elements, a defect data set $X = \{x_1, x_2, \dots, x_n\}$, wherein n represents the number of samples, and each sample $x_i = [m_1, m_2, \dots, m_d]$ represents a feature vector consisting of d metric elements;
Extracting, according to the static type, all metric element features corresponding to the current view type from all metric elements in each sample $x_i$ to form a static view;
And extracting, according to the dynamic type, all metric element features corresponding to the current view type from all metric elements in each sample $x_i$ to form a dynamic view.
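Forming the two views from a defect data set can be sketched as a simple column selection. The metric names, their assignment to the static or dynamic type, and the sample values below are hypothetical and serve only to illustrate the step; the actual division of metric elements is the one given in Table 1.

```python
from typing import Dict, List, Tuple

# Hypothetical metric elements and their view types, for illustration only.
METRIC_TYPES: Dict[str, str] = {
    "lines_of_code": "static",
    "cyclomatic_complexity": "static",
    "code_churn": "dynamic",
    "change_entropy": "dynamic",
}


def split_views(
    samples: List[Dict[str, float]], metric_types: Dict[str, str]
) -> Tuple[List[List[float]], List[List[float]]]:
    """Split each sample's metric elements into a static view and a dynamic view."""
    static_names = [m for m, t in metric_types.items() if t == "static"]
    dynamic_names = [m for m, t in metric_types.items() if t == "dynamic"]
    static_view = [[s[m] for m in static_names] for s in samples]
    dynamic_view = [[s[m] for m in dynamic_names] for s in samples]
    return static_view, dynamic_view


# Two hypothetical samples x_1 and x_2, each a vector of d = 4 metric elements.
dataset = [
    {"lines_of_code": 120.0, "cyclomatic_complexity": 7.0,
     "code_churn": 15.0, "change_entropy": 0.4},
    {"lines_of_code": 45.0, "cyclomatic_complexity": 2.0,
     "code_churn": 3.0, "change_entropy": 0.1},
]
static_view, dynamic_view = split_views(dataset, METRIC_TYPES)
print(static_view)
print(dynamic_view)
```

Row i of each view belongs to the same sample, which is what later allows the two views of one sample to be paired.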
In one embodiment, the learning module 220 is further configured to pre-process the static view and the dynamic view using a normalization algorithm;
extracting high-level semantic features from the preprocessed static view and the preprocessed dynamic view to obtain static high-level semantic features and dynamic high-level semantic features;
mapping the static high-level semantic features and the dynamic high-level semantic features to a public subspace to obtain a transformation instance pair;
And constructing a binary group constraint and a ternary group constraint according to the transformation example pair, and performing multi-view feature learning to construct a heterogeneous multi-view software defect prediction model.
The above-described respective modules may be functional modules or program modules, and may be implemented by software or hardware. For modules implemented in hardware, the various modules described above may be located in the same processor; or the above modules may be located in different processors in any combination.
The embodiment also provides a software defect prediction method. Fig. 5 is a flowchart of the software defect prediction method according to an embodiment of the present application. As shown in Fig. 5, the flow includes the following steps:
Step S510, constructing a heterogeneous multi-view software defect prediction model by using the model construction method described above;
Step S520, performing software defect prediction on the software module to be analyzed by using the heterogeneous multi-view software defect prediction model to obtain a prediction result of the software module to be analyzed.
The software defect prediction method can improve the prediction effect and the prediction accuracy.
Specifically, four software defect prediction methods are selected for comparison with the method of the present application on the AEEEM dataset: the ManualDown method, the NN-filter method, the SSTCA+ISDA method, and the CKSDL method.
The batch size on all data sets was set to 64, and the hyperparameters α and β were set to 0.01 and 0.1, respectively. The F-measure and G-measure evaluation indexes are used. The F-measure combines pd (the probability of detection, i.e. recall) with precision: F-measure = 2 × pd × precision / (pd + precision). The G-measure considers recall and specificity simultaneously and is their harmonic mean: G-measure = 2 × pd × specificity / (pd + specificity). Specificity is a statistical indicator defined as TN / (TN + FP), where TN represents true negatives and FP represents false positives. A larger F-measure indicates better cross-project defect prediction performance.
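The two evaluation indexes can be computed from confusion-matrix counts as in the following sketch. The function name, the zero-denominator convention (returning 0.0) and the example counts are assumptions made for illustration; the counts are not results from the experiments.

```python
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 for a zero denominator (an assumed convention)."""
    return numerator / denominator if denominator else 0.0


def evaluate(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute pd, precision, specificity, F-measure and G-measure."""
    pd = _ratio(tp, tp + fn)                  # probability of detection (recall)
    precision = _ratio(tp, tp + fp)
    specificity = _ratio(tn, tn + fp)
    f_measure = _ratio(2 * pd * precision, pd + precision)
    g_measure = _ratio(2 * pd * specificity, pd + specificity)
    return {
        "pd": pd,
        "precision": precision,
        "specificity": specificity,
        "f_measure": f_measure,
        "g_measure": g_measure,
    }


# Hypothetical counts: 30 true positives, 10 false positives,
# 50 true negatives, 10 false negatives.
scores = evaluate(tp=30, fp=10, tn=50, fn=10)
print(scores)
```

Both indexes are harmonic means, so either one is low whenever one of its two components is low.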
TABLE 2
Table 2 lists the F-measure and pd of the method of the present application and of the comparison methods for cross-project defect prediction on the AEEEM dataset. As can be seen from Table 2, the cross-project defect prediction performance of the present application is superior to that of the ManualDown, NN-filter, SSTCA+ISDA and CKSDL methods. The present application constructs a view set by performing static measurement and dynamic measurement on a software module; that is, it performs feature learning on the defect data set from the perspective of multi-view learning, taking into account both the discriminative features within each view and the correlation between views. The comparison methods are single-view learning methods and pay little attention to the relationship between views. Therefore, the prediction performance of the present application is superior to that of the comparison methods, and it is an effective software defect feature learning method.
The present embodiment also provides a software defect prediction apparatus, as shown in fig. 6, which includes a construction module 510 and a prediction module 520;
A construction module 510, configured to construct a heterogeneous multi-view software defect prediction model according to the model construction method described above;
and the prediction module 520 is configured to predict the software defects of the software module to be analyzed by using the heterogeneous multi-view software defect prediction model, so as to obtain a prediction result of the software module to be analyzed.
The software defect prediction device can improve the prediction effect and the prediction accuracy.
The present embodiment also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
S1, constructing a defect data set according to software defect sample data, and forming a static view and a dynamic view based on the defect data set;
s2, performing multi-view feature learning on the static view and the dynamic view by using a network architecture of deep learning to construct a heterogeneous multi-view software defect prediction model.
Alternatively, in this embodiment, the above-mentioned processor may be configured to further execute the following steps by a computer program:
S3, constructing a heterogeneous multi-view software defect prediction model by using the model construction method described above;
S4, performing software defect prediction on the software module to be analyzed by using the heterogeneous multi-view software defect prediction model to obtain a prediction result of the software module to be analyzed.
It should be noted that, specific examples in this embodiment may refer to examples described in the foregoing embodiments and alternative implementations, and this embodiment is not repeated herein.
In addition, in combination with the software defect prediction method in the above embodiment, the embodiment of the present application may be implemented by providing a storage medium. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements any one of the model building methods and any one of the software defect prediction methods of the above embodiments.
It should be understood by those skilled in the art that the technical features of the above-described embodiments may be combined in any manner, and for brevity, all of the possible combinations of the technical features of the above-described embodiments are not described, however, they should be considered as being within the scope of the description provided herein, as long as there is no contradiction between the combinations of the technical features.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (12)

1. A model construction method, comprising:
Constructing a defect data set according to software defect sample data, and forming a static view and a dynamic view based on the defect data set;
Performing multi-view feature learning on the static view and the dynamic view by using a deep-learning network architecture to construct a heterogeneous multi-view software defect prediction model, which comprises: preprocessing the static view and the dynamic view by using a normalization algorithm;
extracting high-level semantic features from the preprocessed static view and the preprocessed dynamic view to obtain static high-level semantic features and dynamic high-level semantic features;
mapping the static high-level semantic features and the dynamic high-level semantic features to a public subspace to obtain a transformation instance pair;
And constructing a binary group constraint and a ternary group constraint according to the transformation example pair, and performing multi-view feature learning to construct a heterogeneous multi-view software defect prediction model.
2. The model construction method according to claim 1, wherein constructing a defect dataset from software defect sample data and forming a static view and a dynamic view based on the defect dataset, comprises:
and constructing a defect data set according to the software defect sample data, and extracting all metric meta-features corresponding to the view types of each sample in the defect data set to form the static view and the dynamic view.
3. The model construction method according to claim 2, wherein constructing a defect dataset from software defect sample data comprises:
digging and analyzing a software history warehouse, extracting a software module from the software warehouse as a sample to carry out category marking, and setting a metric element which is strongly related to the defect existence of the software module;
And generating, according to the category labels and the metric elements, a defect data set $X = \{x_1, x_2, \dots, x_n\}$, wherein n represents the number of samples, and each sample $x_i = [m_1, m_2, \dots, m_d]$ represents a feature vector consisting of d metric elements.
4. A model building method according to claim 3, wherein said extracting all metric features of each sample in the defect dataset corresponding to a view type to form the static view and the dynamic view comprises:
extracting all metric element characteristics corresponding to the current view type from all metric elements in each sample x i according to the static type to form the static view;
And extracting all metric element characteristics corresponding to the current view type from all metric elements in each sample x i according to the dynamic type to form the dynamic view.
5. The method for constructing a model according to claim 1, wherein the extracting the high-level semantic features of the preprocessed static view and dynamic view to obtain the static high-level semantic features and dynamic high-level semantic features includes:
constructing a self-coding network for the normalized static view, and learning the depth self-coding network to extract static high-level semantic features of the normalized static view;
And constructing a self-coding network for the normalized dynamic view, and learning the depth self-coding network to extract dynamic high-level semantic features of the normalized dynamic view.
6. The method for constructing a model according to claim 1, wherein mapping the static high-level semantic features and the dynamic high-level semantic features to a common subspace to obtain a transformation instance pair comprises:
And adopting four layers of FNNs as feature mapping, and nonlinear converting the static high-level semantic features and the dynamic high-level semantic features into a common subspace to obtain a transformation instance pair.
7. The method of claim 1, wherein constructing the binary constraint and the triplet constraint from the transformation instance pair and performing multi-view feature learning to construct the heterogeneous multi-view software defect prediction model comprises:
constructing a binary constraint according to the transformation instance pair;
Constructing a triplet constraint according to the transformation instance pair;
constructing a loss function according to the binary group constraint and the ternary group constraint;
And taking the binary group constraint and the ternary group constraint as training samples, taking relevant data in AEEEM data sets as test samples, taking the loss function as a convergence condition, and constructing a heterogeneous multi-view software defect prediction model.
8. A method for predicting a software defect, comprising:
constructing a heterogeneous multi-view software defect prediction model according to any of claims 1 to 7;
and predicting the software defects of the software module to be analyzed by using the heterogeneous multi-view software defect prediction model to obtain a prediction result of the software module to be analyzed.
9. The model construction device is characterized by comprising a processing module and a learning module;
the processing module is used for constructing a defect data set according to the software defect sample data and forming a static view and a dynamic view based on the defect data set;
the learning module is used for performing multi-view feature learning on the static view and the dynamic view by using a network architecture of deep learning so as to construct a heterogeneous multi-view software defect prediction model, and comprises the steps of preprocessing the static view and the dynamic view by using a normalization algorithm; extracting high-level semantic features from the preprocessed static view and the preprocessed dynamic view to obtain static high-level semantic features and dynamic high-level semantic features; mapping the static high-level semantic features and the dynamic high-level semantic features to a public subspace to obtain a transformation instance pair; and constructing a binary group constraint and a ternary group constraint according to the transformation example pair, and performing multi-view feature learning to construct a heterogeneous multi-view software defect prediction model.
10. The software defect prediction device is characterized by comprising a construction module and a prediction module;
The construction module is used for constructing the heterogeneous multi-view software defect prediction model according to any one of claims 1 to 7;
And the prediction module is used for predicting the software defects of the software module to be analyzed by utilizing the heterogeneous multi-view software defect prediction model to obtain a prediction result of the software module to be analyzed.
11. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the software defect prediction method of any one of claims 1 to 8; the processor is further arranged to run the computer program to perform the software defect prediction method as claimed in claim 8.
12. A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the model construction method of any one of claims 1 to 7 at run-time; the computer program is further arranged to perform, when executed, the software defect prediction method as claimed in claim 8.
CN202110873565.0A 2021-07-30 2021-07-30 Model construction method, software defect prediction method, device and electronic device Active CN113705616B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110873565.0A CN113705616B (en) 2021-07-30 2021-07-30 Model construction method, software defect prediction method, device and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110873565.0A CN113705616B (en) 2021-07-30 2021-07-30 Model construction method, software defect prediction method, device and electronic device

Publications (2)

Publication Number Publication Date
CN113705616A (en) 2021-11-26
CN113705616B (en) 2024-05-10

Family

ID=78651147

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110873565.0A Active CN113705616B (en) 2021-07-30 2021-07-30 Model construction method, software defect prediction method, device and electronic device

Country Status (1)

Country Link
CN (1) CN113705616B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116820539B (en) * 2023-08-30 2023-11-10 深圳市秦丝科技有限公司 System software operation maintenance system and method based on Internet

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103472078A (en) * 2013-09-26 2013-12-25 徐州工程学院 Automatic correspondence method of batch defects of multi-view images
US8819617B1 (en) * 2013-09-19 2014-08-26 Fmr Llc System and method for providing access to data in a plurality of software development systems
CN105138913A (en) * 2015-07-24 2015-12-09 四川大学 Malware detection method based on multi-view ensemble learning
CN107885607A (en) * 2017-10-20 2018-04-06 北京航空航天大学 One kind is based on built-in system software multi views hazard model and its modeling method
CN108459955A (en) * 2017-09-29 2018-08-28 重庆大学 Software Defects Predict Methods based on depth autoencoder network
CN108710576A (en) * 2018-05-30 2018-10-26 浙江工业大学 Data set extending method and Software Defects Predict Methods based on isomery migration
CN110659207A (en) * 2019-09-02 2020-01-07 北京航空航天大学 Heterogeneous cross-project software defect prediction method based on nuclear spectrum mapping migration integration
CN112015659A (en) * 2020-09-02 2020-12-01 三维通信股份有限公司 Prediction method and device based on network model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11159547B2 (en) * 2017-08-03 2021-10-26 International Business Machines Corporation Malware clustering approaches based on cognitive computing techniques

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
A survey of deep learning: Platforms, applications and emerging research trends;Hatcher W G等;《IEEE Access》;第6卷;24411-24432 *
Android malware detection via an app similarity graph;Frenklach T等;《Computers & Security》;第109卷;1-16 *
Android恶意软件检测系统的研究与实现;谭炜;《中国优秀硕士学位论文全文数据库信息科技辑》(第5期);I138-202 *
Convolutional neural networks over control flow graphs for software defect prediction;Phan A V等;《2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI)》;45-52 *
基于动静态分析的安卓恶意软件检测;陈锐;《中国优秀硕士学位论文全文数据库信息科技辑》(第7期);I138-78 *
基于深度自编码网络的软件缺陷预测方法;周末等;《计算机工程与科学》;第40卷(第10期);1796-1804 *
基于混合特征的深度自编码器的恶意软件家族分类;谭杨等;《信息网络安全》;第20卷(第12期);72-82 *

Also Published As

Publication number Publication date
CN113705616A (en) 2021-11-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant