CN113705616A - Model construction method, software defect prediction device and electronic device - Google Patents

Model construction method, software defect prediction device and electronic device

Info

Publication number
CN113705616A
Authority
CN
China
Prior art keywords
view
software
static
dynamic
defect
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110873565.0A
Other languages
Chinese (zh)
Other versions
CN113705616B (en)
Inventor
韩璐
严军荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sunwave Communications Co Ltd
Original Assignee
Sunwave Communications Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sunwave Communications Co Ltd
Priority to CN202110873565.0A
Publication of CN113705616A
Application granted
Publication of CN113705616B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to a model construction method, a software defect prediction device and an electronic device. The method comprises the following steps: constructing a defect data set according to software defect sample data, and forming a static view and a dynamic view based on the defect data set; and performing multi-view feature learning on the static view and the dynamic view by using a deep learning network architecture to construct a heterogeneous multi-view software defect prediction model. This addresses the problems that a defect prediction model constructed from static data alone has limited prediction effect and is prone to errors, and it improves the prediction effect and the prediction accuracy.

Description

Model construction method, software defect prediction device and electronic device
Technical Field
The present application relates to the field of software engineering technologies, and in particular, to a model construction method, a software defect prediction apparatus, and an electronic apparatus.
Background
Thanks to the rapid development of computer software and hardware, people's daily lives, from clothing to food, have changed greatly. Software products such as Taobao, Mei Tuo, and ticker have made daily life and production more efficient, and people pay increasing attention to the quality of software products: only products with higher reliability (a lower likelihood of defects) win the support and approval of users.
According to the IEEE standard definition of defects, viewed from inside a product, defects are the various problems, such as errors and faults, that arise during the development or maintenance of a software product; viewed from outside the product, a defect is the failure or violation of some function that the system needs to implement. Hidden defects inside software may therefore cause unexpected results in actual operation, degrading software quality in mild cases and threatening the safety of people's lives and property in severe ones. From the perspectives of the software itself, team work, technical problems, and the like, software defects arise mainly from the characteristics of software products and their development process, and they are unavoidable.
Although defects are difficult to eliminate, they can be analyzed and monitored in order to reduce them. Software defect prediction is a technique that effectively mines the potential defects, and their distribution, that may remain undiscovered in software, allowing project developers to focus on defective modules. To mine defective modules, however, it is necessary to find the intrinsic attributes related to the appearance of defects; these attributes are software measurement data, namely metric elements.
Existing software defect prediction methods build a defect prediction model by mining static data in a software history warehouse, and use it to predict defects in new program modules. Depending on actual test requirements, a program module may be a package, a file, a class, a function, or the like. When test resources are sufficient, the technique can be used to check every program module for defects; when test resources are insufficient, it can be used to allocate resources reasonably so that as many defects as possible are found. However, different types of features reflect different information about software modules, and a defect prediction model built only from static data cannot make full use of the various types of features, so its prediction effect is limited and it is prone to errors.
At present, no effective solution has been proposed for the problems in the related art that a defect prediction model constructed from static data has limited prediction effect and is prone to errors.
Disclosure of Invention
The embodiment of the application provides a model construction method, a software defect prediction device and an electronic device, and aims to at least solve the problems that a defect prediction model constructed by using static data in the related technology has limited prediction effect and is easy to generate errors.
In a first aspect, an embodiment of the present application provides a model building method, including:
constructing a defect data set according to software defect sample data, and forming a static view and a dynamic view based on the defect data set;
and performing multi-view feature learning on the static view and the dynamic view by utilizing a deep learning network architecture to construct a heterogeneous multi-view software defect prediction model.
In some embodiments, the constructing a defect data set according to software defect sample data and forming a static view and a dynamic view based on the defect data set includes:
and constructing a defect data set according to the software defect sample data, and extracting all measurement meta-features corresponding to the view type of each sample in the defect data set to form the static view and the dynamic view.
In some embodiments, the constructing the defect data set according to the software defect sample data includes:
mining and analyzing a software historical warehouse, extracting a software module from the software warehouse as a sample for carrying out category marking, and setting a measurement element which is strongly related to the defects of the software module;
generating a defect data set $X = \{x_1, x_2, \ldots, x_n\}$ according to the class labels and the metric elements, where n denotes the number of samples, and sample $x_i = [m_1, m_2, \ldots, m_d]$ represents a feature vector consisting of d metric elements.
In some embodiments, the extracting all the metric features corresponding to the view type for each sample in the defect data set to form the static view and the dynamic view includes:
for each sample $x_i$, extracting all the metric element features corresponding to the current view type according to the static type to form the static view;
for each sample $x_i$, extracting all the metric element features corresponding to the current view type according to the dynamic type to form the dynamic view.
In some embodiments, the performing, by using the deep learning network architecture, multi-view feature learning on the static view and the dynamic view to construct a heterogeneous multi-view software defect prediction model includes:
preprocessing the static view and the dynamic view by utilizing a normalization algorithm;
performing high-level semantic feature extraction on the preprocessed static view and dynamic view to obtain static high-level semantic features and dynamic high-level semantic features;
mapping the static high-level semantic features and the dynamic high-level semantic features to a public subspace to obtain a transformation example pair;
and constructing a binary constraint and a triple constraint according to the transformation example pair, and performing multi-view feature learning to construct a heterogeneous multi-view software defect prediction model.
In some embodiments, the extracting high-level semantic features from the preprocessed static view and dynamic view to obtain static high-level semantic features and dynamic high-level semantic features includes:
constructing a self-coding network for the normalized static view, and learning a deep self-coding network to extract static high-level semantic features of the normalized static view;
and constructing a self-coding network for the normalized dynamic view, and learning a deep self-coding network to extract the dynamic high-level semantic features of the normalized dynamic view.
In some embodiments, the mapping the static high-level semantic features and the dynamic high-level semantic features to a common subspace to obtain a transformation instance pair includes:
and adopting four layers of FNN as feature mapping, and converting the static high-level semantic features and the dynamic high-level semantic features into a public subspace in a nonlinear manner to obtain a transformation example pair.
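As a rough illustration of this mapping step, the sketch below pushes static and dynamic feature matrices through two separate feed-forward networks that end in the same output dimension. It is only a sketch under stated assumptions: "four layers" is read here as four weight layers, and the layer widths, the tanh activation, the feature dimensions (8 and 5), and the names `init_fnn` and `fnn_forward` are all hypothetical choices, not details given in this application.

```python
import numpy as np

def init_fnn(layer_sizes, seed):
    """Create (weight, bias) pairs for each pair of consecutive layer sizes."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        params.append((W, np.zeros(n_out)))
    return params

def fnn_forward(params, X):
    """Apply every layer with a tanh nonlinearity (assumed activation)."""
    H = X
    for W, b in params:
        H = np.tanh(H @ W + b)
    return H

# Hypothetical sizes: 8 static features and 5 dynamic features,
# both mapped into a 4-dimensional common subspace.
static_fnn = init_fnn([8, 16, 16, 8, 4], seed=0)
dynamic_fnn = init_fnn([5, 16, 16, 8, 4], seed=1)

rng = np.random.default_rng(42)
F_static = rng.random((3, 8))    # stand-in static high-level features
F_dynamic = rng.random((3, 5))   # stand-in dynamic high-level features

Z_static = fnn_forward(static_fnn, F_static)
Z_dynamic = fnn_forward(dynamic_fnn, F_dynamic)
# Row i of Z_static and row i of Z_dynamic form one transformed instance pair.
```

The weights here are untrained; in the described method they would be learned jointly under the binary and triplet constraints.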
In some embodiments, the constructing a binary constraint and a triple constraint according to the transformation instance pair, and performing multi-view feature learning to construct a heterogeneous multi-view software defect prediction model includes:
constructing a binary constraint according to the transformation instance pair;
constructing a triplet constraint according to the transformation instance pair;
constructing a loss function according to the binary constraint and the triple constraint;
and constructing a heterogeneous multi-view software defect prediction model by taking the binary constraint and the triple constraint as training samples, taking relevant data in an AEEEM data set as test samples and taking the loss function as a convergence condition.
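This summary does not spell out the form of the binary and triplet constraints, so the following is only a minimal sketch that assumes the commonly used contrastive (pairwise) form and the margin-based triplet form. The function names, the `margin` and `alpha` parameters, and the way the two terms are summed are illustrative assumptions rather than the formulation of this application.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two points in the common subspace."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pairwise_loss(u, v, same_class, margin=1.0):
    """Pull same-class pairs together; push different-class pairs apart."""
    d = euclidean(u, v)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Require the positive to be closer to the anchor than the negative."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)

def total_loss(pairs, triplets, alpha=1.0, margin=1.0):
    """Sum of the pairwise term and the alpha-weighted triplet term."""
    pair_term = sum(pairwise_loss(u, v, same, margin) for u, v, same in pairs)
    triplet_term = sum(triplet_loss(a, p, n, margin) for a, p, n in triplets)
    return pair_term + alpha * triplet_term

# Hypothetical transformed instances.
pairs = [([0.0, 0.0], [3.0, 4.0], True)]
triplets = [([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])]
loss = total_loss(pairs, triplets, alpha=0.5)
```

In a trainable version, the same quantities would be computed on the network outputs and minimized by gradient descent.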
In a second aspect, an embodiment of the present application provides a software defect prediction method, including:
constructing a heterogeneous multi-view software defect prediction model according to the first aspect;
and performing software defect prediction on the software module to be analyzed by using the heterogeneous multi-view software defect prediction model to obtain a prediction result of the software module to be analyzed.
In a third aspect, an embodiment of the present application provides a model building apparatus, which is characterized by comprising a processing module and a learning module;
the processing module is used for constructing a defect data set according to software defect sample data and forming a static view and a dynamic view based on the defect data set;
the learning module is used for performing multi-view feature learning on the static view and the dynamic view by utilizing a deep learning network architecture to construct a heterogeneous multi-view software defect prediction model.
In a fourth aspect, an embodiment of the present application provides a software defect prediction apparatus, including a construction module and a prediction module;
the building module is used for building the heterogeneous multi-view software defect prediction model according to the first aspect;
the prediction module is used for performing software defect prediction on the software module to be analyzed by using the heterogeneous multi-view software defect prediction model to obtain a prediction result of the software module to be analyzed.
In a fifth aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor, when executing the computer program, implements the software defect prediction method according to the first aspect.
In a sixth aspect, the present application provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the software defect prediction method according to the first aspect.
Compared with the related art, the model construction method, the software defect prediction method, the device and the electronic device provided by the embodiments of the application construct a defect data set according to software defect sample data and form a static view and a dynamic view based on the defect data set, and then perform multi-view feature learning on the static view and the dynamic view by using a deep learning network architecture to construct a heterogeneous multi-view software defect prediction model. This solves the problems that a defect prediction model constructed from static data has limited prediction effect and is prone to errors, and improves the prediction effect and the prediction accuracy.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a terminal for the model building method provided in an embodiment of the present application;
FIG. 2 is a flowchart of a model building method provided by an embodiment of the present application;
FIG. 3 is a flowchart of step S220 in FIG. 2;
FIG. 4 is a block diagram of a model building apparatus according to an embodiment of the present application;
FIG. 5 is a flowchart of a software defect prediction method according to an embodiment of the present application;
FIG. 6 is a block diagram of a software defect prediction apparatus according to an embodiment of the present application.
In the figures: 210, processing module; 220, learning module; 510, construction module; 520, prediction module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless otherwise defined, technical or scientific terms used herein have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words in this application do not denote a limitation of quantity and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The words "connected," "coupled," and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality" herein means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
The method provided by the embodiment can be executed in a terminal, a computer or a similar operation device. Taking an example of the operation on a terminal, fig. 1 is a hardware structure block diagram of the terminal of the model construction method according to the embodiment of the present application. As shown in fig. 1, the terminal 10 may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and optionally may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the terminal. For example, the terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 can be used for storing computer programs, for example, software programs and modules of application software, such as computer programs corresponding to the model building method in the embodiment of the present application, and the processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the terminal 10. In one example, the transmission device 106 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
It should be noted that the software defect prediction method embodiment may also be executed in a terminal, a computer, or a similar computing device. The structure may be similar to that of fig. 1 and will not be discussed herein.
The present embodiment provides a model building method, and fig. 2 is a flowchart of the model building method according to the embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps:
s210, constructing a defect data set according to software defect sample data, and forming a static view and a dynamic view based on the defect data set;
and S220, performing multi-view feature learning on the static view and the dynamic view by using the deep learning network architecture to construct a heterogeneous multi-view software defect prediction model.
The defect data set is constructed from software defect sample data, which consists of the categories and attributes of the software modules in a software history warehouse. Software modules fall into two categories, defective and non-defective; for example, a defective module is labeled 1 and a non-defective module is labeled 0. The attributes of a software module, also referred to as metric elements, can be obtained by measuring the module with methods such as the McCabe metric and the Halstead metric. Each sample vector $x_i$ in the defect data set can then be jointly represented by a static view vector $x_i^{(static)}$ and a dynamic view vector $x_i^{(dynamic)}$. The set of static view vectors is the static view, and the set of dynamic view vectors is the dynamic view. Multi-view feature learning is then performed on the static view and the dynamic view (category and attribute) by using a deep learning network architecture, so as to construct a heterogeneous multi-view software defect prediction model.
Through the steps, the problems that a defect prediction model constructed by using static data has limited prediction effect and is easy to generate errors are solved, multi-view feature learning of a static view and a dynamic view is realized by using a deep learning network architecture, effective identification features in the views are considered, and correlation among the views is also considered; compared with a defect prediction model constructed by using static data, the method can improve the prediction effect and the prediction accuracy.
The specific construction process of the heterogeneous multi-view software defect prediction model is described in detail below.
In some of these embodiments, step S210 includes the following step:
constructing a defect data set according to the software defect sample data, and extracting, for each sample in the defect data set, all the metric element features corresponding to the view type, so as to form a static view and a dynamic view.
The specific process of constructing the defect data set according to the software defect sample data comprises the following steps:
step S211, mining and analyzing a software history warehouse, extracting software modules from the software history warehouse as samples for category labeling, and setting metric elements that are strongly related to the defects of the software modules;
step S212, generating a defect data set $X = \{x_1, x_2, \ldots, x_n\}$ according to the class labels and the metric elements, where n denotes the number of samples, sample $x_i = [m_1, m_2, \ldots, m_d]$ represents a feature vector consisting of d metric elements, and $m_d$ denotes the d-th metric element.
That is, the defect data set construction according to the software defect sample data comprises a sample marking stage, a feature extraction stage and a defect data set construction stage.
Sample labeling stage: the software history warehouse is mined and analyzed, and software modules are extracted from it as samples and given a class label of 0 (indicating a non-defective module) or 1 (indicating a defective module).
Feature extraction stage: software defect prediction lets project developers focus on defective modules, but mining defective modules requires finding the intrinsic attributes related to the appearance of defects; these attributes are software measurement data, namely metric elements. The attributes of a software module can be obtained by measuring it with methods such as the McCabe metric and the Halstead metric.
Defect data set construction stage: a defect data set is established over the program modules based on the metric elements (obtained, for example, by methods such as the McCabe metric and the Halstead metric). Specifically, let the defect data set be $X = \{x_1, x_2, \ldots, x_n\}$, where n denotes the number of samples and sample $x_i = [m_1, m_2, \ldots, m_d]$ represents a feature vector consisting of d metric elements. The defect data set is thus the collection of samples, each represented by its feature vector of metric elements.
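The three stages above can be sketched in a few lines. This is a minimal illustration, not the implementation of this application: the record layout, the `Sample` class, the `build_defect_dataset` helper, the metric names, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    """One software module: d metric values plus a class label."""
    name: str
    metrics: List[float]  # feature vector [m_1, ..., m_d]
    label: int            # 1 = defective module, 0 = non-defective module

def build_defect_dataset(modules, metric_names):
    """Turn raw module records into samples with a fixed metric order."""
    dataset = []
    for module in modules:
        vector = [float(module["metrics"][name]) for name in metric_names]
        label = 1 if module["defective"] else 0
        dataset.append(Sample(module["name"], vector, label))
    return dataset

# Hypothetical metric elements and module records.
metric_names = ["cyclomatic_complexity", "halstead_volume", "lines_of_code"]
modules = [
    {"name": "parser.c", "defective": True,
     "metrics": {"cyclomatic_complexity": 12, "halstead_volume": 840.5,
                 "lines_of_code": 210}},
    {"name": "util.c", "defective": False,
     "metrics": {"cyclomatic_complexity": 3, "halstead_volume": 95.0,
                 "lines_of_code": 40}},
]

dataset = build_defect_dataset(modules, metric_names)
```

Fixing the metric order once in `metric_names` keeps every sample vector aligned, so column j always holds the same metric element.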
The specific process of extracting, for each sample in the defect data set, all the metric element features corresponding to the view type to form the static view and the dynamic view includes the following steps:
step S213, for each sample $x_i$, extracting all the metric element features corresponding to the current view type according to the static type to form a static view;
step S214, for each sample $x_i$, extracting all the metric element features corresponding to the current view type according to the dynamic type to form a dynamic view.
This can be regarded as the multi-view forming stage. For each sample $x_i$, all the metric element features of the static type are extracted to form the static view, which can be represented as $X^{(static)} = \{x_1^{(static)}, x_2^{(static)}, \ldots, x_n^{(static)}\}$. Likewise, for each sample $x_i$, all the metric element features of the dynamic type are extracted to form the dynamic view, which can be represented as $X^{(dynamic)} = \{x_1^{(dynamic)}, x_2^{(dynamic)}, \ldots, x_n^{(dynamic)}\}$.
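The multi-view forming stage amounts to selecting columns by view type. The sketch below assumes each metric element has been tagged as "static" or "dynamic" in advance; the `split_views` helper, the metric names, and in particular which metrics count as dynamic are hypothetical examples, since the text does not enumerate them.

```python
def split_views(samples, metric_names, view_of):
    """Split each sample vector into a static part and a dynamic part."""
    static_idx = [j for j, m in enumerate(metric_names)
                  if view_of[m] == "static"]
    dynamic_idx = [j for j, m in enumerate(metric_names)
                   if view_of[m] == "dynamic"]
    static_view = [[x[j] for j in static_idx] for x in samples]
    dynamic_view = [[x[j] for j in dynamic_idx] for x in samples]
    return static_view, dynamic_view

# Hypothetical metric elements and their assumed view types.
metric_names = ["cyclomatic_complexity", "halstead_volume",
                "change_count", "runtime_failures"]
view_of = {
    "cyclomatic_complexity": "static",
    "halstead_volume": "static",
    "change_count": "dynamic",
    "runtime_failures": "dynamic",
}
samples = [
    [12.0, 840.5, 7.0, 2.0],
    [3.0, 95.0, 1.0, 0.0],
]

static_view, dynamic_view = split_views(samples, metric_names, view_of)
```

Row i of each view belongs to the same sample, which is what later allows the two views to be paired.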
In some of these embodiments, as shown in FIG. 3, step S220 includes the following steps:
step S221, preprocessing the static view and the dynamic view by utilizing a normalization algorithm;
step S222, extracting high-level semantic features of the preprocessed static view and dynamic view to obtain static high-level semantic features and dynamic high-level semantic features;
step S223, mapping the static high-level semantic features and the dynamic high-level semantic features to a public subspace to obtain a transformation example pair;
and S224, constructing a binary constraint and a triple constraint according to the transformation example pair, and performing multi-view feature learning to construct a heterogeneous multi-view software defect prediction model.
In step S221, the specific process of preprocessing the static view and the dynamic view by using the normalization algorithm is as follows:
normalizing the static view and the dynamic view by using a normalization algorithm. Specifically, the normalization algorithm can be a min-max normalization algorithm, which converts all values into the interval [0, 1]. For any metric element $a_i$ in the static view and the dynamic view, the normalized value $a_i'$ is:
$$a_i' = \frac{a_i - \min(a)}{\max(a) - \min(a)}$$
where $a_i$ represents the i-th metric element in a sample of the static view (and likewise for the dynamic view), $\max(a)$ represents the maximum value of a, and $\min(a)$ represents the minimum value of a. Preprocessing the metric elements narrows the range of the various metric element values in the static view and the dynamic view and normalizes the data.
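A small worked example of min-max normalization follows. It assumes the minimum and maximum are taken per metric element across all samples of a view, which is the usual reading but is not stated explicitly; the `min_max_normalize` name and the handling of constant columns (mapped to 0.0) are also assumptions.

```python
def min_max_normalize(view):
    """Scale every metric column of a view into the interval [0, 1]."""
    columns = list(zip(*view))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in view:
        normalized.append([
            # A constant column has no spread; map it to 0.0 (assumption).
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return normalized

# Hypothetical static view with two metric elements per sample.
static_view = [
    [2.0, 100.0],
    [4.0, 300.0],
    [6.0, 200.0],
]
normalized_static = min_max_normalize(static_view)
# → [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

The same function can be applied unchanged to the dynamic view.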
Because each sample in the static view and the dynamic view is a vector set formed by different measurement elements, in order to keep the dimension of each view consistent and maximally preserve the structural information of the sample, a self-coding network is adopted to replace the traditional PCA method to reduce the dimension of each sample in the static view and the dynamic view. Therefore, in step S222, the specific process of extracting the high-level semantic features from the preprocessed static view and dynamic view to obtain the static high-level semantic features and the dynamic high-level semantic features is as follows:
A self-coding network is constructed for the normalized static view, and a deep self-coding network is learned to extract the static high-level semantic features of the normalized static view. Specifically, for the i-th normalized sample in the static view, a self-coding network is constructed for the static view, and the output of the coding layer of this network for that sample is taken as the feature $F_i^{(static)}$ of the sample, where $\theta^{(static)}$ denotes the parameters of the deep self-coding network of the static view.
A self-coding network is likewise constructed for the normalized dynamic view, and a deep self-coding network is learned to extract the dynamic high-level semantic features of the normalized dynamic view. Specifically, for the i-th normalized sample in the dynamic view, a self-coding network is constructed for the dynamic view, and the output of the coding layer of this network for that sample is taken as the feature $F_i^{(dynamic)}$ of the sample, where $\theta^{(dynamic)}$ denotes the parameters of the deep self-coding network of the dynamic view.
Then, through the learning of the respective deep self-coding networks, the static high-level semantic features $F^{(static)}$ and the dynamic high-level semantic features $F^{(dynamic)}$ of the normalized views can be extracted respectively. As described above, this keeps the dimensions of the views consistent while preserving the structural information of the samples to the greatest extent.
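The feature extraction of step S222 can be illustrated with a small self-coding network. This is a minimal sketch under stated assumptions, not the network of the application: it uses a single coding layer instead of a deep network, a tanh activation, a linear decoder, mean squared reconstruction error and plain gradient descent; the function name `train_autoencoder`, the code dimension and the random stand-in data are all hypothetical.

```python
import numpy as np

def train_autoencoder(x, code_dim, epochs=500, lr=0.1, seed=0):
    """Train a one-hidden-layer autoencoder and return its encoder."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = rng.normal(0.0, 0.1, size=(d, code_dim))
    b_enc = np.zeros(code_dim)
    w_dec = rng.normal(0.0, 0.1, size=(code_dim, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        code = np.tanh(x @ w_enc + b_enc)   # coding layer output
        recon = code @ w_dec + b_dec        # linear reconstruction
        err = recon - x
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared reconstruction error.
        g_recon = 2.0 * err / (n * d)
        g_w_dec = code.T @ g_recon
        g_b_dec = g_recon.sum(axis=0)
        g_pre = (g_recon @ w_dec.T) * (1.0 - code ** 2)  # tanh derivative
        g_w_enc = x.T @ g_pre
        g_b_enc = g_pre.sum(axis=0)
        w_enc -= lr * g_w_enc
        b_enc -= lr * g_b_enc
        w_dec -= lr * g_w_dec
        b_dec -= lr * g_b_dec

    def encode(samples):
        """Return the coding layer features of the given samples."""
        return np.tanh(samples @ w_enc + b_enc)

    return encode, losses

# Random stand-ins for normalized views: 40 samples with 6 static
# and 9 dynamic metric elements (hypothetical sizes).
rng = np.random.default_rng(1)
static_view = rng.random((40, 6))
dynamic_view = rng.random((40, 9))

encode_static, static_losses = train_autoencoder(static_view, code_dim=4)
encode_dynamic, dynamic_losses = train_autoencoder(dynamic_view, code_dim=4)
f_static = encode_static(static_view)     # plays the role of F^(static)
f_dynamic = encode_dynamic(dynamic_view)  # plays the role of F^(dynamic)
```

Although the two views start with different numbers of metric elements, both feature matrices end up with the same number of columns, which is the dimension consistency the paragraph above refers to.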
Most machine learning modeling techniques are based on one assumption: that the data follow the same distribution. However, the static view and the dynamic view of the present application describe the data of a software module from the perspectives of static measurement and dynamic measurement respectively, and there is a large difference in data distribution between the views. Therefore, in step S223, the specific process of mapping the static high-level semantic features and the dynamic high-level semantic features to the common subspace to obtain the transformation instance pair is as follows:
A four-layer FNN is adopted as the feature mapping, and the static high-level semantic features and the dynamic high-level semantic features are nonlinearly transformed into a common subspace to obtain a transformation instance pair.
Specifically, in the common subspace, the static high-level semantic features $F^{(static)}$ are mapped to $F^{(S)} = f^{(S)}(F^{(static)}; \theta_S)$ and the dynamic high-level semantic features $F^{(dynamic)}$ are mapped to $F^{(D)} = f^{(D)}(F^{(dynamic)}; \theta_D)$, where $f^{(S)}(F^{(static)}; \theta_S)$ and $f^{(D)}(F^{(dynamic)}; \theta_D)$ are mapping functions, $\theta_S$ and $\theta_D$ are adjustable parameters, and $F^{(S)}$ and $F^{(D)}$ are called a transformation instance pair in the common space. As described above, a common subspace is learned to minimize the distance between the views, which reduces the difference in data distribution while maintaining the data structure characteristics of the original views. The transformed features can also obtain the maximum correlation, thereby solving the problem of data distribution difference well.
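The mapping into the common subspace can be sketched as a forward pass through two small networks, one per view. The sketch reads "FNN" as a feedforward neural network and "four layers" as four weight layers; both readings, the tanh activation, the layer widths and the names `init_fnn` and `fnn_forward` are assumptions made for illustration. The weights are random and untrained here; in the described method they would be the adjustable parameters learned during feature learning.

```python
import numpy as np

def init_fnn(in_dim, hidden_dim, out_dim, rng):
    """Create parameters for a network with four weight layers."""
    sizes = [in_dim, hidden_dim, hidden_dim, hidden_dim, out_dim]
    return [(rng.normal(0.0, 0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def fnn_forward(params, features):
    """Nonlinearly map features into the common subspace."""
    hidden = features
    last = len(params) - 1
    for index, (weight, bias) in enumerate(params):
        hidden = hidden @ weight + bias
        if index < last:
            hidden = np.tanh(hidden)  # nonlinearity on hidden layers only
    return hidden

rng = np.random.default_rng(0)
f_static = rng.random((5, 4))    # stand-in for F^(static)
f_dynamic = rng.random((5, 4))   # stand-in for F^(dynamic)

theta_s = init_fnn(4, 8, 3, rng)  # adjustable parameters of f^(S)
theta_d = init_fnn(4, 8, 3, rng)  # adjustable parameters of f^(D)

mapped_static = fnn_forward(theta_s, f_static)     # F^(S)
mapped_dynamic = fnn_forward(theta_d, f_dynamic)   # F^(D)
```

Row i of `mapped_static` and row i of `mapped_dynamic` together form the transformation instance pair of sample i.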
In step S224, the specific process of constructing a binary constraint and a triple constraint according to the transformation instance pair and performing multi-view feature learning to construct a heterogeneous multi-view software defect prediction model is as follows:
constructing a binary constraint according to the transformation example pair;
constructing a triple constraint according to the transformation example pair;
constructing a loss function according to the binary constraint and the triple constraint;
constructing a heterogeneous multi-view software defect prediction model by taking the binary constraint and the triple constraint as training samples, taking relevant data in the AEEEM data set as test samples and taking the loss function as a convergence condition.
First, a binary constraint is constructed: for the pair of transformation instances $(F_i^{(S)}, F_i^{(D)})$ obtained from the two different views of the same sample in the common space, the distance between the transformation instances is minimized in feature learning. $L_1$ uses the norm to compute the distance between the mapped instances $F_i^{(S)}$ and $F_i^{(D)}$ after feature mapping, expressed as:
$$L_1(F_i^{(S)}, F_i^{(D)}) = \left\| f^{(S)}(F_i^{(static)}; \theta_S) - f^{(D)}(F_i^{(dynamic)}; \theta_D) \right\|_2 \quad (1)$$
where $f^{(S)}(F^{(static)}; \theta_S)$ and $f^{(D)}(F^{(dynamic)}; \theta_D)$ are mapping functions, and $\theta_S$ and $\theta_D$ are adjustable parameters.
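Equation (1) can be evaluated directly once both views have been mapped into the common space. The following is a minimal sketch, assuming the norm in equation (1) is the Euclidean norm and that the mapped features of all samples are stored row by row; the function name `pair_distance` and the numbers are hypothetical.

```python
import numpy as np

def pair_distance(mapped_static, mapped_dynamic):
    """Row-wise Euclidean distance between the two mapped views."""
    return np.linalg.norm(mapped_static - mapped_dynamic, axis=1)

# Two samples, already mapped into a two-dimensional common space.
f_s = np.array([[0.0, 0.0],
                [1.0, 1.0]])
f_d = np.array([[3.0, 4.0],
                [1.0, 1.0]])
l1 = pair_distance(f_s, f_d)  # distances 5.0 and 0.0
```

The second sample already satisfies the binary constraint exactly, while the first still has a distance of 5 that feature learning would try to reduce.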
Second, a triple constraint is constructed to further strengthen the intra-class and inter-class relationships of the mapped instances, that is, samples of the same class are close to each other and samples of different classes are far away from each other in the common space. A triplet is built from three feature vectors: $F_i^{(S)}$, the static view feature vector selected as the anchor point; a view feature vector that comes from the dynamic view and has the same label as $F_i^{(S)}$; and a view feature vector that comes from the dynamic view and has a different label from $F_i^{(S)}$. $L_2$ uses the norm to compute the distance, after feature mapping, between the mapped instance $F_i^{(S)}$ and the dynamic view feature vector with the same label, which is expressed as equation (2). $L_3$ uses the norm to compute the distance, after feature mapping, between the mapped instance $F_i^{(S)}$ and the dynamic view feature vector with a different label, which is expressed as equation (3).
Third, according to equations (1), (2) and (3), the loss function of the global feature structure constraint is defined as follows:
$$L = \max(L_3 - \alpha L_1 - \beta L_2) \quad (4)$$
where $\alpha$ and $\beta$ are hyperparameters.
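The combination of the three distance terms can be sketched as follows. Several points are assumptions made for illustration: all three distances are computed with the Euclidean norm, "max" in equation (4) is read as maximizing the expression over the adjustable parameters, and the value is averaged over the samples of a batch. The function names and the numbers are hypothetical; the default values of alpha and beta are the ones reported for the experiments in this document.

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance along the last axis."""
    return np.linalg.norm(a - b, axis=-1)

def structure_objective(anchor_s, same_sample_d, same_label_d,
                        other_label_d, alpha=0.01, beta=0.1):
    """Average value of L3 - alpha * L1 - beta * L2 over a batch."""
    l1 = euclidean(anchor_s, same_sample_d)  # binary constraint term
    l2 = euclidean(anchor_s, same_label_d)   # anchor vs. same label
    l3 = euclidean(anchor_s, other_label_d)  # anchor vs. different label
    return float(np.mean(l3 - alpha * l1 - beta * l2))

anchor = np.array([[0.0, 0.0]])         # mapped static feature (anchor)
same_sample = np.array([[0.0, 1.0]])    # dynamic feature of the same sample
same_label = np.array([[0.0, 2.0]])     # dynamic feature, same label
other_label = np.array([[3.0, 4.0]])    # dynamic feature, different label

objective = structure_objective(anchor, same_sample, same_label, other_label)
loss_to_minimize = -objective  # form a gradient-based optimizer would use
```

With these numbers the three distances are 1, 2 and 5, so the objective is 5 - 0.01 × 1 - 0.1 × 2 = 4.79. Pushing the objective up pulls same-sample and same-label features together and pushes different-label features apart.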
TABLE 1 (division of the metric elements of the AEEEM dataset into a static view and a dynamic view; the table is an image in the original publication)
It should be noted that the AEEEM dataset is a public defect dataset. As shown in Table 1, the metric elements in the AEEEM dataset are divided into two views according to whether they are static metric elements or dynamic metric elements, that is, a static view and a dynamic view are obtained and used as test samples. The AEEEM dataset consists of the Equinox Framework (EQ), Eclipse JDT Core (JDT), Apache Lucene (LC), Mylyn (ML), and Eclipse PDE UI (PDE) projects. It has 61 metrics, including Chidamber and Kemerer (CK) metrics, object-oriented metrics, churn of source code metrics, entropy of source code metrics, and so on.
And finally, learning the model according to the training sample, the test sample and the loss function to obtain the heterogeneous multi-view software defect prediction model.
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases the steps illustrated or described may be performed in an order different from the one described here.
The present embodiment further provides a model building apparatus, which is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated here. As used hereinafter, the terms "module," "unit," "subunit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a block diagram of a model building apparatus according to an embodiment of the present application, and as shown in fig. 4, the apparatus includes: a processing module 210 and a learning module 220;
the processing module 210 is configured to construct a defect data set according to the software defect sample data, and form a static view and a dynamic view based on the defect data set;
and the learning module 220 is configured to perform multi-view feature learning on the static view and the dynamic view by using a deep learning network architecture to construct a heterogeneous multi-view software defect prediction model.
The model construction device solves the problems that a defect prediction model constructed by utilizing static data is limited in prediction effect and easy to generate errors, realizes multi-view feature learning on a static view and a dynamic view by utilizing a deep learning network architecture, constructs a heterogeneous multi-view software defect prediction model, and improves the prediction effect and the prediction accuracy.
In one embodiment, the processing module 210 is further configured to mine and analyze a software history repository, extract software modules from the software history repository as samples for class marking, and set metric elements that are strongly related to the existence of defects of the software modules;
generating a defect data set $X = \{x_1, x_2, \ldots, x_n\}$ according to the class marks and the metric elements, where n denotes the number of samples, and a sample $x_i = [m_1, m_2, \ldots, m_d]$ represents a feature vector consisting of d metric elements;
for each sample $x_i$, extracting, according to the static type, all the metric element features corresponding to the current view type to form a static view;
for each sample $x_i$, extracting, according to the dynamic type, all the metric element features corresponding to the current view type to form a dynamic view.
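The formation of the two views from the defect data set can be sketched as a column selection. This is a minimal illustration, assuming that the data set X is stored as a matrix with one row per sample and one column per metric element, and that each metric element has been tagged as "static" or "dynamic"; the tags, the values and the function name `split_views` are hypothetical.

```python
import numpy as np

def split_views(samples, metric_types):
    """Split the metric columns of X into a static and a dynamic view."""
    types = np.asarray(metric_types)
    static_view = samples[:, types == "static"]
    dynamic_view = samples[:, types == "dynamic"]
    return static_view, dynamic_view

# Defect data set X with n = 2 samples and d = 5 metric elements.
X = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [6.0, 7.0, 8.0, 9.0, 10.0]])
labels = np.array([0, 1])  # class marks: 0 = defect-free, 1 = defective

# Hypothetical view type of each metric element m_1 ... m_5.
metric_types = ["static", "static", "dynamic", "static", "dynamic"]

static_view, dynamic_view = split_views(X, metric_types)
```

Each sample keeps its row index in both views, so the class marks in `labels` apply to the rows of either view.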
In one embodiment, the learning module 220 is further configured to pre-process the static view and the dynamic view using a normalization algorithm;
extracting high-level semantic features of the preprocessed static view and dynamic view to obtain static high-level semantic features and dynamic high-level semantic features;
mapping the static high-level semantic features and the dynamic high-level semantic features to a public subspace to obtain a transformation example pair;
and constructing a binary constraint and a triple constraint according to the transformation example pair, and performing multi-view feature learning to construct a heterogeneous multi-view software defect prediction model.
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
Fig. 5 is a flowchart of a software defect prediction method according to an embodiment of the present application, and as shown in fig. 5, the flowchart includes the following steps:
step S510, constructing a heterogeneous multi-view software defect prediction model by using the model construction method;
step S520, performing software defect prediction on the software module to be analyzed by using the heterogeneous multi-view software defect prediction model to obtain a prediction result of the software module to be analyzed.
The software defect prediction method can improve the prediction effect and the prediction accuracy.
Specifically, based on the AEEEM dataset, four software defect prediction methods are selected for comparison with the method of the present application: the ManualDown method, the NN-filter method, the SSTCA+ISDA method and the CKSDL method.
The batch size on all datasets was set to 64, and the hyperparameters α and β were set to 0.01 and 0.1, respectively. F-measure and G-measure are used as the evaluation indexes. The F-measure combines pd and precision, i.e., F-measure = 2 × pd × precision / (pd + precision). The G-measure considers both recall (pd) and specificity and is the harmonic mean of the two, i.e., G-measure = 2 × pd × specificity / (pd + specificity). Specificity is a statistical measure defined as TN / (TN + FP), where TN represents the number of true negatives and FP represents the number of false positives. The larger the F-measure, the better the performance of the cross-project defect prediction.
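The two evaluation indexes can be computed from the counts of a confusion matrix. The following is a minimal sketch, assuming pd is the recall TP / (TP + FN), precision is TP / (TP + FP), and all denominators are non-zero; the function name `defect_metrics` and the counts are hypothetical and are not results from this document.

```python
def defect_metrics(tp, fp, tn, fn):
    """Compute pd, precision, specificity, F-measure and G-measure."""
    pd = tp / (tp + fn)            # probability of detection (recall)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    f_measure = 2 * pd * precision / (pd + precision)
    g_measure = 2 * pd * specificity / (pd + specificity)
    return {
        "pd": pd,
        "precision": precision,
        "specificity": specificity,
        "f_measure": f_measure,
        "g_measure": g_measure,
    }

# Hypothetical counts for one prediction run.
scores = defect_metrics(tp=30, fp=10, tn=50, fn=10)
```

For these counts pd and precision are both 0.75, so the F-measure is 0.75; specificity is 50/60, which gives a G-measure of 15/19, about 0.789.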
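TABLE 2 (F-measure and Pd of the present application and the comparison methods for cross-project defect prediction on the AEEEM dataset; the table is an image in the original publication)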
Table 2 lists the F-measure and Pd of the present application and the comparison methods for cross-project defect prediction on the AEEEM dataset. As can be seen from Table 2, the cross-project defect prediction performance of the method of the present application is superior to that of the ManualDown method, the NN-filter method, the SSTCA+ISDA method and the CKSDL method. The method of the present application constructs a view set by performing static measurement and dynamic measurement on a software module, that is, it performs feature learning on the defect data set from the perspective of multi-view learning, considering both the effective discriminative features within each view and the correlation between the views. The comparison methods are single-view learning methods and pay little attention to the relationship between views. Therefore, the prediction performance of the method of the present application is superior to that of the comparison methods, and the method is an effective software defect feature learning method.
The embodiment also provides a software defect prediction apparatus, as shown in fig. 6, the apparatus includes a building module 510 and a prediction module 520;
a building module 510, configured to build a heterogeneous multi-view software defect prediction model by using the model building method described above;
the prediction module 520 is configured to perform software defect prediction on the software module to be analyzed by using the heterogeneous multi-view software defect prediction model, so as to obtain a prediction result of the software module to be analyzed.
The software defect prediction device can improve the prediction effect and the prediction accuracy.
The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, constructing a defect data set according to the software defect sample data, and forming a static view and a dynamic view based on the defect data set;
S2, performing multi-view feature learning on the static view and the dynamic view by using the deep learning network architecture to construct a heterogeneous multi-view software defect prediction model.
Optionally, in this embodiment, the processor may be configured to further execute, by the computer program, the following steps:
S3, constructing a heterogeneous multi-view software defect prediction model by using the model construction method;
S4, performing software defect prediction on the software module to be analyzed by using the heterogeneous multi-view software defect prediction model to obtain a prediction result of the software module to be analyzed.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In addition, in combination with the software defect prediction method in the foregoing embodiment, the embodiment of the present application may provide a storage medium for implementation. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements any of the model construction methods and any of the software defect prediction methods of the above embodiments.
It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.
The above-mentioned embodiments only express several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (13)

1. A method of model construction, comprising:
constructing a defect data set according to software defect sample data, and forming a static view and a dynamic view based on the defect data set;
and performing multi-view feature learning on the static view and the dynamic view by utilizing a deep learning network architecture to construct a heterogeneous multi-view software defect prediction model.
2. The model building method of claim 1, wherein said building a defect data set from software defect sample data and forming a static view and a dynamic view based on said defect data set comprises:
and constructing a defect data set according to the software defect sample data, and extracting all measurement meta-features corresponding to the view type of each sample in the defect data set to form the static view and the dynamic view.
3. The model building method of claim 2, wherein said building a defect data set from software defect sample data comprises:
mining and analyzing a software historical warehouse, extracting a software module from the software warehouse as a sample for carrying out category marking, and setting a measurement element which is strongly related to the defects of the software module;
generating a defect data set $X = \{x_1, x_2, \ldots, x_n\}$ from the class labels and the metric elements, where n denotes the number of samples, and a sample $x_i = [m_1, m_2, \ldots, m_d]$ represents a feature vector consisting of d metric elements.
4. The model building method of claim 3, wherein said extracting all the metric features corresponding to a view type for each sample in the defect dataset to form the static view and the dynamic view comprises:
for each sample $x_i$, extracting, according to the static type, all the metric element features corresponding to the current view type to form the static view;
for each sample $x_i$, extracting, according to the dynamic type, all the metric element features corresponding to the current view type to form the dynamic view.
5. The model building method according to any one of claims 1-4, wherein the performing multi-view feature learning on the static view and the dynamic view by using the deep learning network architecture to build a heterogeneous multi-view software defect prediction model comprises:
preprocessing the static view and the dynamic view by utilizing a normalization algorithm;
performing high-level semantic feature extraction on the preprocessed static view and dynamic view to obtain static high-level semantic features and dynamic high-level semantic features;
mapping the static high-level semantic features and the dynamic high-level semantic features to a public subspace to obtain a transformation example pair;
and constructing a binary constraint and a triple constraint according to the transformation example pair, and performing multi-view feature learning to construct a heterogeneous multi-view software defect prediction model.
6. The model building method according to claim 5, wherein the performing high-level semantic feature extraction on the preprocessed static view and dynamic view to obtain static high-level semantic features and dynamic high-level semantic features comprises:
constructing a self-coding network for the normalized static view, and learning a deep self-coding network to extract static high-level semantic features of the normalized static view;
and constructing a self-coding network for the normalized dynamic view, and learning a deep self-coding network to extract the dynamic high-level semantic features of the normalized dynamic view.
7. The model building method of claim 5, wherein mapping the static high-level semantic features and the dynamic high-level semantic features to a common subspace to obtain a transformation instance pair comprises:
and adopting four layers of FNN as feature mapping, and converting the static high-level semantic features and the dynamic high-level semantic features into a public subspace in a nonlinear manner to obtain a transformation example pair.
8. The model building method of claim 5, wherein the building of the binary constraint and the triple constraint according to the transformation instance pair and the multi-view feature learning to build the heterogeneous multi-view software defect prediction model comprises:
constructing a binary constraint according to the transformation instance pair;
constructing a triplet constraint according to the transformation instance pair;
constructing a loss function according to the binary constraint and the triple constraint;
and constructing a heterogeneous multi-view software defect prediction model by taking the binary constraint and the triple constraint as training samples, taking relevant data in an AEEEM data set as test samples and taking the loss function as a convergence condition.
9. A method for predicting software defects, comprising:
building the heterogeneous multi-view software defect prediction model of any one of claims 1 to 8;
and performing software defect prediction on the software module to be analyzed by using the heterogeneous multi-view software defect prediction model to obtain a prediction result of the software module to be analyzed.
10. A model building device is characterized by comprising a processing module and a learning module;
the processing module is used for constructing a defect data set according to software defect sample data and forming a static view and a dynamic view based on the defect data set;
the learning module is used for performing multi-view feature learning on the static view and the dynamic view by utilizing a deep learning network architecture to construct a heterogeneous multi-view software defect prediction model.
11. The software defect prediction device is characterized by comprising a construction module and a prediction module;
the building module is used for building the heterogeneous multi-view software defect prediction model according to any one of claims 1 to 8;
the prediction module is used for performing software defect prediction on the software module to be analyzed by using the heterogeneous multi-view software defect prediction model to obtain a prediction result of the software module to be analyzed.
12. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and the processor is configured to execute the computer program to perform the model construction method of any one of claims 1 to 8; the processor is further arranged to run the computer program to perform the software defect prediction method as claimed in claim 9.
13. A storage medium having stored thereon a computer program, wherein the computer program is arranged to perform the model construction method of any one of claims 1 to 8 when executed; the computer program is further arranged to perform, when running, the software defect prediction method as claimed in claim 9.
CN202110873565.0A 2021-07-30 2021-07-30 Model construction method, software defect prediction method, device and electronic device Active CN113705616B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110873565.0A CN113705616B (en) 2021-07-30 2021-07-30 Model construction method, software defect prediction method, device and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110873565.0A CN113705616B (en) 2021-07-30 2021-07-30 Model construction method, software defect prediction method, device and electronic device

Publications (2)

Publication Number Publication Date
CN113705616A true CN113705616A (en) 2021-11-26
CN113705616B CN113705616B (en) 2024-05-10

Family

ID=78651147

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110873565.0A Active CN113705616B (en) 2021-07-30 2021-07-30 Model construction method, software defect prediction method, device and electronic device

Country Status (1)

Country Link
CN (1) CN113705616B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116820539A (en) * 2023-08-30 2023-09-29 深圳市秦丝科技有限公司 System software operation maintenance system and method based on Internet

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103472078A (en) * 2013-09-26 2013-12-25 徐州工程学院 Automatic correspondence method of batch defects of multi-view images
US8819617B1 (en) * 2013-09-19 2014-08-26 Fmr Llc System and method for providing access to data in a plurality of software development systems
CN105138913A (en) * 2015-07-24 2015-12-09 四川大学 Malware detection method based on multi-view ensemble learning
CN107885607A (en) * 2017-10-20 2018-04-06 北京航空航天大学 One kind is based on built-in system software multi views hazard model and its modeling method
CN108459955A (en) * 2017-09-29 2018-08-28 重庆大学 Software Defects Predict Methods based on depth autoencoder network
CN108710576A (en) * 2018-05-30 2018-10-26 浙江工业大学 Data set extending method and Software Defects Predict Methods based on isomery migration
US20190044964A1 (en) * 2017-08-03 2019-02-07 International Business Machines Corporation Malware Clustering Approaches Based on Cognitive Computing Techniques
CN110659207A (en) * 2019-09-02 2020-01-07 北京航空航天大学 Heterogeneous cross-project software defect prediction method based on nuclear spectrum mapping migration integration
CN112015659A (en) * 2020-09-02 2020-12-01 三维通信股份有限公司 Prediction method and device based on network model

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
FRENKLACH T等: "Android malware detection via an app similarity graph", 《COMPUTERS & SECURITY》, vol. 109, pages 1 - 16 *
HATCHER W G等: "A survey of deep learning: Platforms, applications and emerging research trends", 《IEEE ACCESS》, vol. 6, pages 24411 - 24432, XP011684251, DOI: 10.1109/ACCESS.2018.2830661 *
PHAN A V等: "Convolutional neural networks over control flow graphs for software defect prediction", 《2017 IEEE 29TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI)》, pages 45 - 52 *
周末等: "基于深度自编码网络的软件缺陷预测方法", 《计算机工程与科学》, vol. 40, no. 10, pages 1796 - 1804 *
谭杨等: "基于混合特征的深度自编码器的恶意软件家族分类", 《信息网络安全》, vol. 20, no. 12, pages 72 - 82 *
谭炜: "Android恶意软件检测系统的研究与实现", 《中国优秀硕士学位论文全文数据库信息科技辑》, no. 5, pages 138 - 202 *
陈锐: "基于动静态分析的安卓恶意软件检测", 《中国优秀硕士学位论文全文数据库信息科技辑》, no. 7, pages 138 - 78 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116820539A (en) * 2023-08-30 2023-09-29 深圳市秦丝科技有限公司 System software operation maintenance system and method based on Internet
CN116820539B (en) * 2023-08-30 2023-11-10 深圳市秦丝科技有限公司 System software operation maintenance system and method based on Internet

Also Published As

Publication number Publication date
CN113705616B (en) 2024-05-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant