CN113705616A - Model construction method, software defect prediction device and electronic device

Info

Publication number: CN113705616A
Application number: CN202110873565.0A
Authority: CN
Inventors: 韩璐; 严军荣
Original assignee: Sunwave Communications Co Ltd
Current assignee: Sunwave Communications Co Ltd
Priority date: 2021-07-30
Filing date: 2021-07-30
Publication date: 2021-11-26
Anticipated expiration: 2041-07-30
Also published as: CN113705616B

Abstract

The application relates to a model construction method, a software defect prediction device and an electronic device. The method comprises the following steps: constructing a defect data set according to software defect sample data, and forming a static view and a dynamic view based on the defect data set; and performing multi-view feature learning on the static view and the dynamic view by using a deep learning network architecture to construct a heterogeneous multi-view software defect prediction model. The method and the device address the problem that a defect prediction model constructed from static data has a limited prediction effect and is prone to errors: because features are learned jointly from the static view and the dynamic view, the prediction effect and the prediction accuracy are improved.

Description

Model construction method, software defect prediction device and electronic device

Technical Field

The present application relates to the field of software engineering technologies, and in particular, to a model construction method, a software defect prediction apparatus, and an electronic apparatus.

Background

Thanks to the rapid development of computer software and hardware, people's daily lives, from clothing to eating habits, have changed greatly. The advent of software products such as Taobao, Mei Tuo, and ticker has increased the efficiency of life and production, and people are paying more attention to the quality of software products: only software products with higher reliability (that is, fewer potential defects) win the support and approval of users.

According to the IEEE standard definition of a defect, viewed from inside a product, defects are the errors, faults, and similar problems that arise during the development or maintenance of a software product; viewed from outside the product, a defect is the failure or violation of some function that the system is required to implement. A defect hidden inside the software may therefore cause unexpected results in actual operation, degrading software quality in mild cases and threatening the safety of people's lives and property in severe cases. Considering the software itself, teamwork, technical problems, and similar factors, software defects arise mainly from the characteristics of software products and of their development process, and they are unavoidable.

Although defects are difficult to eliminate, they can be analyzed and monitored so as to reduce their number. Software defect prediction is a technique for effectively mining the potential defects that remain undiscovered in software, together with their distribution, so that project developers can focus on the defective modules. To mine the defective modules, however, it is necessary to find the intrinsic attributes that are related to the appearance of defects; these attributes are software metric data, namely metric elements.

Existing software defect prediction methods build a defect prediction model by mining static data in a software history repository, and use it to predict defects in new program modules. Depending on the actual test requirements, a program module may be a package, a file, a class, a function, or the like. When test resources are sufficient, the technique can be used to check every program module for defects; when test resources are insufficient, it can be used to allocate them reasonably so that as many defects as possible are found. However, different types of features reflect different information about software modules, and a defect prediction model constructed only from static data cannot make full use of the various types of features, so its prediction effect is limited and it is prone to errors.

At present, no effective solution has been proposed for the problem in the related art that a defect prediction model constructed from static data has a limited prediction effect and is prone to errors.

Disclosure of Invention

The embodiment of the application provides a model construction method, a software defect prediction device and an electronic device, and aims to at least solve the problems that a defect prediction model constructed by using static data in the related technology has limited prediction effect and is easy to generate errors.

In a first aspect, an embodiment of the present application provides a model building method, including:

constructing a defect data set according to software defect sample data, and forming a static view and a dynamic view based on the defect data set;

and performing multi-view feature learning on the static view and the dynamic view by utilizing a deep learning network architecture to construct a heterogeneous multi-view software defect prediction model.

In some embodiments, the constructing a defect data set according to software defect sample data and forming a static view and a dynamic view based on the defect data set includes:

constructing a defect data set according to the software defect sample data, and extracting, for each sample in the defect data set, all the metric element features corresponding to the view type to form the static view and the dynamic view.

In some embodiments, the constructing the defect data set according to the software defect sample data includes:

mining and analyzing a software history repository, extracting software modules from the repository as samples for class labeling, and setting metric elements that are strongly related to the defects of the software modules;

generating a defect data set $X = \{x_1, x_2, \ldots, x_n\}$ according to the class labels and the metric elements, where $n$ denotes the number of samples and sample $x_i = [m_1, m_2, \ldots, m_d]$ represents a feature vector consisting of $d$ metric elements.

In some embodiments, the extracting all the metric features corresponding to the view type for each sample in the defect data set to form the static view and the dynamic view includes:

for each sample $x_i$, extracting, according to the static type, all the metric element features corresponding to the current view type to form the static view;

for each sample $x_i$, extracting, according to the dynamic type, all the metric element features corresponding to the current view type to form the dynamic view.

In some embodiments, the performing, by using the deep learning network architecture, multi-view feature learning on the static view and the dynamic view to construct a heterogeneous multi-view software defect prediction model includes:

preprocessing the static view and the dynamic view by utilizing a normalization algorithm;

performing high-level semantic feature extraction on the preprocessed static view and dynamic view to obtain static high-level semantic features and dynamic high-level semantic features;

mapping the static high-level semantic features and the dynamic high-level semantic features to a common subspace to obtain a transformation instance pair;

and constructing a binary constraint and a triplet constraint according to the transformation instance pair, and performing multi-view feature learning to construct a heterogeneous multi-view software defect prediction model.

In some embodiments, the extracting high-level semantic features from the preprocessed static view and dynamic view to obtain static high-level semantic features and dynamic high-level semantic features includes:

constructing a self-coding network for the normalized static view, and learning a deep self-coding network to extract static high-level semantic features of the normalized static view;

and constructing a self-coding network for the normalized dynamic view, and learning a deep self-coding network to extract the dynamic high-level semantic features of the normalized dynamic view.

In some embodiments, the mapping the static high-level semantic features and the dynamic high-level semantic features to a common subspace to obtain a transformation instance pair includes:

adopting a four-layer FNN as the feature mapping, and nonlinearly transforming the static high-level semantic features and the dynamic high-level semantic features into a common subspace to obtain a transformation instance pair.

In some embodiments, the constructing a binary constraint and a triplet constraint according to the transformation instance pair, and performing multi-view feature learning to construct a heterogeneous multi-view software defect prediction model includes:

constructing a binary constraint according to the transformation instance pair;

constructing a triplet constraint according to the transformation instance pair;

constructing a loss function according to the binary constraint and the triplet constraint;

and constructing a heterogeneous multi-view software defect prediction model by taking the binary constraint and the triplet constraint as training samples, taking relevant data in an AEEEM data set as test samples, and taking the loss function as a convergence condition.

In a second aspect, an embodiment of the present application provides a software defect prediction method, including:

constructing a heterogeneous multi-view software defect prediction model according to the first aspect;

and performing software defect prediction on the software module to be analyzed by using the heterogeneous multi-view software defect prediction model to obtain a prediction result of the software module to be analyzed.

In a third aspect, an embodiment of the present application provides a model building apparatus, which is characterized by comprising a processing module and a learning module;

the processing module is used for constructing a defect data set according to software defect sample data and forming a static view and a dynamic view based on the defect data set;

the learning module is used for performing multi-view feature learning on the static view and the dynamic view by utilizing a deep learning network architecture to construct a heterogeneous multi-view software defect prediction model.

In a fourth aspect, an embodiment of the present application provides a software defect prediction apparatus, including a construction module and a prediction module;

the construction module is used for building the heterogeneous multi-view software defect prediction model according to the first aspect;

the prediction module is used for performing software defect prediction on the software module to be analyzed by using the heterogeneous multi-view software defect prediction model to obtain a prediction result of the software module to be analyzed.

In a fifth aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor, when executing the computer program, implements the software defect prediction method according to the first aspect.

In a sixth aspect, the present application provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the software defect prediction method according to the first aspect.

Compared with the related art, the model construction method, the software defect prediction method, the device and the electronic device provided by the embodiments of the application construct a defect data set according to software defect sample data, form a static view and a dynamic view based on the defect data set, and then perform multi-view feature learning on the static view and the dynamic view by using a deep learning network architecture to construct a heterogeneous multi-view software defect prediction model. This solves the problem that a defect prediction model constructed from static data has a limited prediction effect and is prone to errors, and improves the prediction effect and the prediction accuracy.

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a block diagram of the hardware structure of a terminal that runs the model building method provided in an embodiment of the present application;

FIG. 2 is a flowchart of the model building method provided by an embodiment of the present application;

FIG. 3 is a flowchart of step S220 in FIG. 2;

FIG. 4 is a block diagram of a model building apparatus according to an embodiment of the present application;

FIG. 5 is a flowchart of a software defect prediction method according to an embodiment of the present application;

FIG. 6 is a block diagram of a software defect prediction apparatus according to an embodiment of the present application.

In the figures: 210, processing module; 220, learning module; 510, construction module; 520, prediction module.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.

Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those of ordinary skill in the art will understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments where no conflict arises.

Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words in this application do not denote a limitation of quantity and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The words "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality" herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.

The method provided by the embodiment can be executed in a terminal, a computer or a similar operation device. Taking an example of the operation on a terminal, fig. 1 is a hardware structure block diagram of the terminal of the model construction method according to the embodiment of the present application. As shown in fig. 1, the terminal 10 may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and optionally may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the terminal. For example, the terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.

The memory 104 can be used for storing computer programs, for example, software programs and modules of application software, such as computer programs corresponding to the model building method in the embodiment of the present application, and the processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used to receive or transmit data via a network. A specific example of such a network is a wireless network provided by the communication provider of the terminal 10. In one example, the transmission device 106 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In another example, the transmission device 106 may be a Radio Frequency (RF) module, which communicates with the internet wirelessly.

It should be noted that the software defect prediction method embodiment may also be executed in a terminal, a computer, or a similar computing device. The structure may be similar to that of fig. 1 and will not be discussed herein.

The present embodiment provides a model building method, and fig. 2 is a flowchart of the model building method according to the embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps:

S210, constructing a defect data set according to software defect sample data, and forming a static view and a dynamic view based on the defect data set;

S220, performing multi-view feature learning on the static view and the dynamic view by using a deep learning network architecture to construct a heterogeneous multi-view software defect prediction model.

The defect data set is constructed from software defect sample data, that is, the categories and attributes of the software modules in the software history repository. Software modules fall into two categories, defective and non-defective; for example, a defective module is labeled 1 and a non-defective module is labeled 0. The attributes of a software module, also called metric elements, may be obtained by measuring the module with methods such as McCabe measurement and Halstead measurement. Each sample vector $x_i$ in the defect data set can then be jointly represented by a static view vector and a dynamic view vector. The set of static view vectors is the static view, and the set of dynamic view vectors is the dynamic view. Multi-view feature learning is then performed on the static view and the dynamic view (category and attribute) by using a deep learning network architecture, so as to construct a heterogeneous multi-view software defect prediction model.

Through the steps, the problems that a defect prediction model constructed by using static data has limited prediction effect and is easy to generate errors are solved, multi-view feature learning of a static view and a dynamic view is realized by using a deep learning network architecture, effective identification features in the views are considered, and correlation among the views is also considered; compared with a defect prediction model constructed by using static data, the method can improve the prediction effect and the prediction accuracy.

The specific construction process of the heterogeneous multi-view software defect prediction model is described in detail below.

In some of these embodiments, step S210 includes the following step:

constructing a defect data set according to the software defect sample data, and extracting, for each sample in the defect data set, all the metric element features corresponding to the view type, so as to form a static view and a dynamic view.

The specific process of constructing the defect data set according to the software defect sample data comprises the following steps:

step S211, mining and analyzing a software history repository, extracting software modules from the repository as samples for class labeling, and setting metric elements that are strongly related to the defects of the software modules;

step S212, generating a defect data set $X = \{x_1, x_2, \ldots, x_n\}$ according to the class labels and the metric elements, where $n$ denotes the number of samples and sample $x_i = [m_1, m_2, \ldots, m_d]$, in which $m_d$ represents the d-th metric element. Sample $x_i$ represents a feature vector consisting of $d$ metric elements.

That is, the defect data set construction according to the software defect sample data comprises a sample marking stage, a feature extraction stage and a defect data set construction stage.

Sample marking stage: the software history repository is mined and analyzed, and software modules are extracted from it as samples for class labeling, where the class label is 0 (indicating a non-defective module) or 1 (indicating a defective module).

Feature extraction stage: software defect prediction techniques let project developers focus on defective modules, but to mine the defective modules it is necessary to find the intrinsic attributes that are related to the appearance of defects, which are software metric data, i.e., metric elements. The attributes of a software module can be obtained by measuring the module with methods such as McCabe measurement and Halstead measurement.

Defect data set construction stage: a defect data set is established for the program modules based on the metric elements (obtained, for example, by methods such as McCabe measurement and Halstead measurement). Specifically, assume that the defect data set is $X = \{x_1, x_2, \ldots, x_n\}$, where $n$ denotes the number of samples and sample $x_i = [m_1, m_2, \ldots, m_d]$ represents a feature vector consisting of $d$ metric elements. The defect data set is the collection of the samples and the feature vectors corresponding to their metric elements.
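The three stages above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `SoftwareModule` record, the metric names in `METRIC_NAMES`, and the sample values are all hypothetical, since the source only names McCabe and Halstead measurement as examples of how metric values are obtained.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical metric elements (assumption, not taken from the source).
METRIC_NAMES = ["mccabe_cyclomatic", "halstead_volume", "halstead_effort"]


@dataclass
class SoftwareModule:
    """One module mined from the software history repository (illustrative)."""
    name: str
    metrics: Dict[str, float]
    defective: bool


def build_defect_dataset(
    modules: List[SoftwareModule],
) -> Tuple[List[List[float]], List[int]]:
    """Return (X, y): X[i] = [m_1, ..., m_d], y[i] = 1 if defective else 0."""
    X = [[m.metrics[name] for name in METRIC_NAMES] for m in modules]
    y = [1 if m.defective else 0 for m in modules]
    return X, y


# Made-up sample data for two modules.
modules = [
    SoftwareModule(
        "parser.c",
        {"mccabe_cyclomatic": 14.0, "halstead_volume": 820.0, "halstead_effort": 15300.0},
        True,
    ),
    SoftwareModule(
        "util.c",
        {"mccabe_cyclomatic": 3.0, "halstead_volume": 110.0, "halstead_effort": 900.0},
        False,
    ),
]
X, y = build_defect_dataset(modules)
```

Here `X` plays the role of the defect data set, with one feature vector per module, and `y` holds the class labels from the sample marking stage.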

The specific process of extracting, for each sample in the defect data set, all the metric element features corresponding to the view type to form the static view and the dynamic view is as follows:

step S213, for each sample $x_i$, extracting, according to the static type, all the metric element features corresponding to the current view type to form the static view;

step S214, for each sample $x_i$, extracting, according to the dynamic type, all the metric element features corresponding to the current view type to form the dynamic view.

These two steps can be regarded as the multi-view forming stage. For each sample $x_i$, all the metric element features of the static type are extracted as the static view vector of the sample, and the set of these vectors over all samples is the static view. Likewise, all the metric element features of the dynamic type are extracted as the dynamic view vector of the sample, and the set of these vectors over all samples is the dynamic view.
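As a rough sketch of the multi-view forming stage, the snippet below splits each sample into a static view vector and a dynamic view vector by column. The assignment of metric elements to view types in `VIEW_TYPES` is an assumption made for illustration (the source does not list which metric elements are static and which are dynamic), and the sample values are made up.

```python
from typing import Dict, List, Sequence

# Hypothetical assignment of metric elements to view types (assumption).
VIEW_TYPES: Dict[str, str] = {
    "mccabe_cyclomatic": "static",
    "halstead_volume": "static",
    "num_revisions": "dynamic",
    "num_bug_fixes": "dynamic",
}
METRIC_NAMES: List[str] = list(VIEW_TYPES)  # column order of each sample x_i


def form_view(X: Sequence[Sequence[float]], view_type: str) -> List[List[float]]:
    """Keep, for every sample x_i, only the metric elements of one view type."""
    cols = [j for j, name in enumerate(METRIC_NAMES) if VIEW_TYPES[name] == view_type]
    return [[x[j] for j in cols] for x in X]


# Two made-up samples with four metric elements each.
X = [
    [14.0, 820.0, 27.0, 5.0],
    [3.0, 110.0, 4.0, 0.0],
]
static_view = form_view(X, "static")
dynamic_view = form_view(X, "dynamic")
```

Each row of `static_view` and `dynamic_view` describes the same software module from a different perspective, which is what later allows instances from the two views to be paired.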

In some of the embodiments, as shown in FIG. 3, step S220 includes the following steps:

step S221, preprocessing the static view and the dynamic view by using a normalization algorithm;

step S222, extracting high-level semantic features from the preprocessed static view and dynamic view to obtain static high-level semantic features and dynamic high-level semantic features;

step S223, mapping the static high-level semantic features and the dynamic high-level semantic features to a common subspace to obtain a transformation instance pair;

step S224, constructing a binary constraint and a triplet constraint according to the transformation instance pair, and performing multi-view feature learning to construct a heterogeneous multi-view software defect prediction model.

In step S221, the specific process of preprocessing the static view and the dynamic view by using the normalization algorithm is as follows:

The static view and the dynamic view are normalized by a normalization algorithm. Specifically, the normalization algorithm may be the min-max normalization algorithm, which converts all values into the interval $[0, 1]$. For any metric element $a_i$ in the static view or the dynamic view, the normalized value $a_i'$ is:

$$a_i' = \frac{a_i - \min(a)}{\max(a) - \min(a)}$$

where $a_i$ represents the i-th metric element in a sample of the static view, $\max(a)$ represents the maximum value of $a$, and $\min(a)$ represents the minimum value of $a$. Preprocessing the metric elements narrows the range of the metric element values in the static view and the dynamic view and normalizes the data.
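A minimal sketch of min-max normalization applied column by column to one view is given below. The handling of a constant column (where the maximum equals the minimum) is an assumption added to avoid division by zero; the source does not address that case. The function name and the sample values are illustrative.

```python
from typing import List, Sequence


def min_max_normalize(view: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every metric element (column) of a view into [0, 1]."""
    columns = list(zip(*view))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in view:
        new_row = []
        for a_i, lo, hi in zip(row, lows, highs):
            # Assumption: a constant column (max == min) maps to 0.0.
            new_row.append(0.0 if hi == lo else (a_i - lo) / (hi - lo))
        normalized.append(new_row)
    return normalized


# Made-up static view with three samples and two metric elements.
static_view = [[14.0, 820.0], [3.0, 110.0], [8.5, 465.0]]
normalized_static = min_max_normalize(static_view)
```

The same function would be applied separately to the dynamic view, so that metric elements with very different ranges become comparable.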

Because each sample in the static view and the dynamic view is a vector set formed by different measurement elements, in order to keep the dimension of each view consistent and maximally preserve the structural information of the sample, a self-coding network is adopted to replace the traditional PCA method to reduce the dimension of each sample in the static view and the dynamic view. Therefore, in step S222, the specific process of extracting the high-level semantic features from the preprocessed static view and dynamic view to obtain the static high-level semantic features and the dynamic high-level semantic features is as follows:

constructing a self-coding network for the normalized static view, and learning a deep self-coding network to extract the static high-level semantic features of the normalized static view. Specifically, for the i-th normalized sample in the static view, a self-coding network is constructed for the static view, and the output of the coding layer of that network for the sample is taken as the feature of the sample, where $\theta^{(static)}$ denotes the parameters of the deep self-coding network of the static view.

constructing a self-coding network for the normalized dynamic view, and learning a deep self-coding network to extract the dynamic high-level semantic features of the normalized dynamic view. Specifically, for the i-th normalized sample in the dynamic view, a self-coding network is constructed for the dynamic view, and the output of the coding layer of that network for the sample is taken as the feature of the sample, where $\theta^{(dynamic)}$ denotes the parameters of the deep self-coding network of the dynamic view.

Then, through the learning of the respective deep self-coding networks, the static high-level semantic features $F^{(static)}$ and the dynamic high-level semantic features $F^{(dynamic)}$ of the normalized views can be extracted respectively. In this way, the dimensions of the views are kept consistent while the structural information of the samples is preserved to the greatest extent.
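To make the role of the coding layer concrete, the sketch below trains a very small self-coding network (autoencoder) in pure Python and reads the coding-layer output as the high-level feature. It is a shallow stand-in for the deep self-coding network described above: the single sigmoid coding layer, the linear decoder, the squared reconstruction error, the learning rate, and the sample values are all assumptions made for illustration.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyAutoencoder:
    """One sigmoid coding layer and one linear decoding layer."""

    def __init__(self, n_in: int, n_code: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.W_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_code)]
        self.b_enc = [0.0] * n_code
        self.W_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_code)] for _ in range(n_in)]
        self.b_dec = [0.0] * n_in

    def encode(self, x: Vector) -> Vector:
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W_enc, self.b_enc)]

    def decode(self, h: Vector) -> Vector:
        return [sum(w * hi for w, hi in zip(row, h)) + b
                for row, b in zip(self.W_dec, self.b_dec)]

    def loss(self, data: Matrix) -> float:
        """Mean squared reconstruction error over all samples."""
        total = 0.0
        for x in data:
            x_hat = self.decode(self.encode(x))
            total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
        return total / len(data)

    def train_step(self, x: Vector, lr: float) -> None:
        """One stochastic gradient step on the squared reconstruction error."""
        h = self.encode(x)
        x_hat = self.decode(h)
        err = [2.0 * (a - b) for a, b in zip(x_hat, x)]  # dL/dx_hat
        # Back-propagate through the decoder before its weights change.
        grad_h = [sum(err[i] * self.W_dec[i][j] for i in range(len(x)))
                  for j in range(len(h))]
        grad_z = [g * hj * (1.0 - hj) for g, hj in zip(grad_h, h)]  # sigmoid'
        for i in range(len(x)):
            for j in range(len(h)):
                self.W_dec[i][j] -= lr * err[i] * h[j]
            self.b_dec[i] -= lr * err[i]
        for j in range(len(h)):
            for k in range(len(x)):
                self.W_enc[j][k] -= lr * grad_z[j] * x[k]
            self.b_enc[j] -= lr * grad_z[j]


# Made-up normalized static view: four samples, three metric elements.
normalized_static = [[1.0, 1.0, 0.9], [0.0, 0.1, 0.0], [0.5, 0.5, 0.6], [0.2, 0.1, 0.3]]
ae = TinyAutoencoder(n_in=3, n_code=2)
loss_before = ae.loss(normalized_static)
for _ in range(500):
    for sample in normalized_static:
        ae.train_step(sample, lr=0.1)
loss_after = ae.loss(normalized_static)
F_static = [ae.encode(x) for x in normalized_static]  # coding-layer features
```

A second network of the same coding width would be trained on the dynamic view, which is how views with different numbers of metric elements end up with features of the same dimension.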

Most machine learning modeling techniques rest on one assumption: that the data follow the same distribution. However, the static view and the dynamic view of the present application describe the data of a software module from the perspectives of static measurement and dynamic measurement respectively, and there is a large data distribution difference between the views. Therefore, in step S223, the static high-level semantic features and the dynamic high-level semantic features are mapped to the common subspace to obtain the transformation instance pair, and the specific process is as follows:

Specifically, a four-layer FNN is adopted as the feature mapping to transform the static high-level semantic features $F^{(static)}$ and the dynamic high-level semantic features $F^{(dynamic)}$ nonlinearly into a common subspace. In the common subspace, the static high-level semantic features $F^{(static)}$ are mapped to $F^{(S)} = f^{(S)}(F^{(static)}; \theta_S)$ and the dynamic high-level semantic features $F^{(dynamic)}$ are mapped to $F^{(D)} = f^{(D)}(F^{(dynamic)}; \theta_D)$, where $f^{(S)}(F^{(static)}; \theta_S)$ and $f^{(D)}(F^{(dynamic)}; \theta_D)$ are mapping functions, $\theta_S$ and $\theta_D$ are adjustable parameters, and $F^{(S)}$ and $F^{(D)}$ are referred to as a transformation instance pair in the common space. Learning a common subspace in this way minimizes the distance between the views and reduces the data distribution difference while maintaining the data structure characteristics of the original views. The transformed features also attain the maximum correlation, which effectively resolves the problem of the data distribution difference.
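The mapping into the common subspace can be pictured with the forward pass below. It is only a sketch: reading "four-layer FNN" as four fully connected layers, the tanh activation, the hidden width of 8, the value of `COMMON_DIM`, and the random initialization are all assumptions, and the weights here are untrained.

```python
import math
import random
from typing import List

Vector = List[float]


def make_fnn(layer_sizes: List[int], seed: int) -> List[dict]:
    """Randomly initialise one weight matrix and bias vector per layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        layers.append({
            "W": [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)],
            "b": [0.0] * n_out,
        })
    return layers


def fnn_forward(layers: List[dict], x: Vector) -> Vector:
    """Apply each layer as tanh(W x + b); tanh is an assumed activation."""
    for layer in layers:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(layer["W"], layer["b"])]
    return x


COMMON_DIM = 4  # hypothetical dimension of the common subspace
# One four-layer mapping per view, each with its own parameters.
f_S = make_fnn([2, 8, 8, 8, COMMON_DIM], seed=1)  # parameters theta_S
f_D = make_fnn([2, 8, 8, 8, COMMON_DIM], seed=2)  # parameters theta_D

F_static_i = [0.7, 0.2]   # made-up static high-level feature of sample i
F_dynamic_i = [0.1, 0.9]  # made-up dynamic high-level feature of sample i
F_S_i = fnn_forward(f_S, F_static_i)
F_D_i = fnn_forward(f_D, F_dynamic_i)
```

The two outputs `F_S_i` and `F_D_i` live in the same space and form one transformation instance pair; training would adjust `f_S` and `f_D` so that such pairs move closer together.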

In step S224, the specific process of constructing the binary constraint and the triplet constraint according to the transformation instance pair and performing multi-view feature learning to construct the heterogeneous multi-view software defect prediction model is as follows:

constructing a binary constraint according to the transformation instance pair;

constructing a triplet constraint according to the transformation instance pair;

constructing a loss function according to the binary constraint and the triplet constraint;

and constructing the heterogeneous multi-view software defect prediction model by taking the binary constraint and the triplet constraint as training samples, taking relevant data in the AEEEM data set as test samples, and taking the loss function as the convergence condition.

First, a binary constraint is constructed. For the transformation instance pair $(F_i^{(S)}, F_i^{(D)})$ formed in the common space by the two different views of the same sample, the distance between the two transformation instances is minimized during feature learning. $L_1$ is the distance between the mapped instances $F_i^{(S)}$ and $F_i^{(D)}$, computed with a norm after feature mapping, and is expressed as:

$L_1(F_i^{(S)}, F_i^{(D)}) = \| f^{(S)}(F_i^{(static)}; \theta_S) - f^{(D)}(F_i^{(dynamic)}; \theta_D) \|_2$ (1)

where $f^{(S)}(F^{(static)}; \theta_S)$ and $f^{(D)}(F^{(dynamic)}; \theta_D)$ are mapping functions, and $\theta_S$ and $\theta_D$ are adjustable parameters.
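Equation (1) can be illustrated with a short sketch that computes the 2-norm of the difference between two already-mapped instances. The function name `pair_distance` and the sample vectors are made up for illustration. In the method itself the inputs would be the outputs of the two mapping functions.

```python
import math


def pair_distance(f_s, f_d):
    """Equation (1): 2-norm of the difference between two mapped instances."""
    if len(f_s) != len(f_d):
        raise ValueError("mapped instances must share the common-subspace dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_s, f_d)))


# Hypothetical mapped instances of the same sample i in the common space.
F_S_i = [1.0, 2.0, 2.0]
F_D_i = [1.0, 0.0, 2.0]
print(pair_distance(F_S_i, F_D_i))  # 2.0
```

Minimizing this value over $\theta_S$ and $\theta_D$ pulls the two views of the same sample together in the common space.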

Second, a triple constraint is constructed to further strengthen the intra-class and inter-class relationships of the mapped instances, that is, in the common space samples of the same class lie close to each other and samples of different classes lie far from each other. A triplet is built from three mapped instances (the formula images for the triplet and for its second and third elements are missing from this text):

$F_i^{(S)}$, the static view feature vector selected as the anchor point;

a view feature vector that comes from the dynamic view and has the same label as $F_i^{(S)}$;

a view feature vector that comes from the dynamic view and has a different label from $F_i^{(S)}$.

$L_2$ is the distance between the mapped instance $F_i^{(S)}$ and another element of the triplet, computed with a norm after feature mapping. It is expressed by equation (2), whose formula image is missing from this text.

$L_3$ is the distance between the mapped instance $F_i^{(S)}$ and another element of the triplet, computed with a norm after feature mapping. It is expressed by equation (3), whose formula image is missing from this text.

Third, according to equations (1), (2) and (3), the loss function of the global feature structure constraint is defined as follows:

$L = \max(L_3 - \alpha L_1 - \beta L_2)$ (4)

where $\alpha$ and $\beta$ are hyperparameters.

TABLE 1

It should be noted that the AEEEM dataset is a public defect dataset. As shown in Table 1, the metric elements in the AEEEM dataset are divided into two views, static metric elements and dynamic metric elements, so that a static view and a dynamic view are obtained and used as test samples. The AEEEM dataset consists of the Equinox Framework (EQ), Eclipse JDT Core (JDT), Apache Lucene (LC), Mylyn (ML), and Eclipse PDE UI (PDE) projects. It has 61 metrics, including Chidamber and Kemerer (CK) metrics, object-oriented metrics, churn of source code metrics, entropy of source code metrics, and so on.

Finally, the model is trained according to the training samples, the test samples and the loss function to obtain the heterogeneous multi-view software defect prediction model.

It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.

The present embodiment further provides a model building apparatus, which is used to implement the foregoing embodiments and preferred embodiments. What has already been described is not repeated here. As used hereinafter, the terms "module," "unit," "subunit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

Fig. 4 is a block diagram of a model building apparatus according to an embodiment of the present application, and as shown in fig. 4, the apparatus includes: a processing module 210 and a learning module 220;

the processing module 210 is configured to construct a defect data set according to the software defect sample data, and form a static view and a dynamic view based on the defect data set;

and the learning module 220 is configured to perform multi-view feature learning on the static view and the dynamic view by using a deep learning network architecture to construct a heterogeneous multi-view software defect prediction model.

The model construction device solves the problems that a defect prediction model constructed by utilizing static data is limited in prediction effect and easy to generate errors, realizes multi-view feature learning on a static view and a dynamic view by utilizing a deep learning network architecture, constructs a heterogeneous multi-view software defect prediction model, and improves the prediction effect and the prediction accuracy.

In one embodiment, the processing module 210 is further configured to mine and analyze a software history repository, extract software modules from the software history repository as samples for class marking, and set metric elements that are strongly related to the existence of defects of the software modules;

generating a defect data set $X = \{x_1, x_2, \ldots, x_n\}$ according to the class marks and the metric elements, where $n$ denotes the number of samples, and each sample $x_i = [m_1, m_2, \ldots, m_d]$ represents a feature vector consisting of $d$ metric elements;

for each sample $x_i$, extracting all the metric element features corresponding to the static view type to form a static view;

for each sample $x_i$, extracting all the metric element features corresponding to the dynamic view type to form a dynamic view.
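The view-forming steps above can be sketched as a column split over the defect data set. The function name `split_views`, the type tags, and the numeric values are hypothetical. The assignment of real metric elements to the static or dynamic type is the one given in Table 1 and is not reproduced here.

```python
def split_views(samples, metric_types):
    """Split each sample x_i = [m_1, ..., m_d] into a static and a dynamic view.

    metric_types[j] is "static" or "dynamic" for metric element m_(j+1).
    """
    if any(len(x) != len(metric_types) for x in samples):
        raise ValueError("every sample needs one value per metric element")
    static_idx = [j for j, t in enumerate(metric_types) if t == "static"]
    dynamic_idx = [j for j, t in enumerate(metric_types) if t == "dynamic"]
    static_view = [[x[j] for j in static_idx] for x in samples]
    dynamic_view = [[x[j] for j in dynamic_idx] for x in samples]
    return static_view, dynamic_view


# Made-up data set X with n = 2 samples and d = 4 metric elements.
metric_types = ["static", "static", "dynamic", "dynamic"]
X = [
    [120.0, 7.0, 3.0, 0.4],
    [45.0, 2.0, 0.0, 0.1],
]
static_view, dynamic_view = split_views(X, metric_types)
print(static_view)   # [[120.0, 7.0], [45.0, 2.0]]
print(dynamic_view)  # [[3.0, 0.4], [0.0, 0.1]]
```

Row order is preserved, so row i of each view still describes the same software module, which the later pairing of views depends on.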

In one embodiment, the learning module 220 is further configured to pre-process the static view and the dynamic view using a normalization algorithm;

extracting high-level semantic features of the preprocessed static view and dynamic view to obtain static high-level semantic features and dynamic high-level semantic features;

The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.

Fig. 5 is a flowchart of a software defect prediction method according to an embodiment of the present application. As shown in fig. 5, the flowchart includes the following steps:

step S510, constructing a heterogeneous multi-view software defect prediction model by using the model construction method;

and S520, performing software defect prediction on the software module to be analyzed by using the heterogeneous multi-view software defect prediction model to obtain a prediction result of the software module to be analyzed.

The software defect prediction method can improve the prediction effect and the prediction accuracy.

Specifically, four software defect prediction methods are selected and compared with the method of the present application on the AEEEM data set. The four selected methods are the ManualDown method, the NN-filter method, the SSTCA+ISDA method and the CKSDL method.

The batch size on all datasets was set to 64, and the hyperparameters $\alpha$ and $\beta$ were set to 0.01 and 0.1, respectively. F-measure and G-measure are used as the evaluation indices. The F-measure combines pd (the probability of detection, i.e. recall) with precision: $\text{F-measure} = 2 \times pd \times precision / (pd + precision)$. The G-measure considers both recall and specificity and is their harmonic mean: $\text{G-measure} = 2 \times pd \times specificity / (pd + specificity)$. Specificity is a statistical measure defined as $TN / (TN + FP)$, where TN denotes the number of true negatives and FP the number of false positives. The larger the F-measure, the better the performance of the cross-project defect prediction.
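The two indices can be worked through on a confusion matrix. The sketch below uses the formulas stated above together with the standard definitions pd = TP / (TP + FN) and precision = TP / (TP + FP). The function name `defect_metrics` and the counts are made up for illustration and are not results from the application.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def defect_metrics(tp, fp, tn, fn):
    """Compute pd, precision, specificity, F-measure and G-measure."""
    pd = _ratio(tp, tp + fn)            # probability of detection (recall)
    precision = _ratio(tp, tp + fp)
    specificity = _ratio(tn, tn + fp)
    f_measure = _ratio(2 * pd * precision, pd + precision)
    g_measure = _ratio(2 * pd * specificity, pd + specificity)
    return {
        "pd": pd,
        "precision": precision,
        "specificity": specificity,
        "f_measure": f_measure,
        "g_measure": g_measure,
    }


# Hypothetical counts: 30 defective modules found, 10 missed,
# 10 clean modules flagged by mistake, 50 clean modules passed.
metrics = defect_metrics(tp=30, fp=10, tn=50, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts pd and precision are both 0.75, so the F-measure is 0.75, and the G-measure is 15/19 (about 0.79) because specificity is 5/6.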

TABLE 2

Table 2 lists the F-measure and pd of cross-project defect prediction on the AEEEM dataset for the method of the present application and the comparison methods. As can be seen from Table 2, the cross-project defect prediction performance of the present method is superior to that of the ManualDown method, the NN-filter method, the SSTCA+ISDA method and the CKSDL method. The present method constructs a view set by performing static measurement and dynamic measurement on a software module, that is, it performs feature learning on the defect data set from the perspective of multi-view learning and considers both the effective discriminative features within each view and the correlation between the views. The comparison methods are single-view learning methods and pay little attention to the relationship between views. Therefore, the prediction performance of the present method is superior to that of the comparison methods, and the method is an effective software defect feature learning method.

The embodiment also provides a software defect prediction apparatus, as shown in fig. 6, the apparatus includes a building module 510 and a prediction module 520;

a building module 510, configured to build a heterogeneous multi-view software defect prediction model by using the model building method described above;

the prediction module 520 is configured to perform software defect prediction on the software module to be analyzed by using the heterogeneous multi-view software defect prediction model, so as to obtain a prediction result of the software module to be analyzed.

The software defect prediction device can improve the prediction effect and the prediction accuracy.

The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.

Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

s1, constructing a defect data set according to the software defect sample data, and forming a static view and a dynamic view based on the defect data set;

and S2, performing multi-view feature learning on the static view and the dynamic view by using the deep learning network architecture to construct a heterogeneous multi-view software defect prediction model.

Optionally, in this embodiment, the processor may be configured to further execute, by the computer program, the following steps:

s3, constructing a heterogeneous multi-view software defect prediction model by using the model construction method;

and S4, performing software defect prediction on the software module to be analyzed by using the heterogeneous multi-view software defect prediction model to obtain a prediction result of the software module to be analyzed.

It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.

In addition, in combination with the software defect prediction method in the foregoing embodiments, an embodiment of the present application may provide a storage medium for implementation. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements any of the model construction methods and any of the software defect prediction methods of the above embodiments.

It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.

The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method of model construction, comprising:

2. The model building method of claim 1, wherein said building a defect data set from software defect sample data and forming a static view and a dynamic view based on said defect data set comprises:

3. The model building method of claim 2, wherein said building a defect data set from software defect sample data comprises:

4. The model building method of claim 3, wherein said extracting all the metric features corresponding to a view type for each sample in the defect dataset to form the static view and the dynamic view comprises:

5. The model building method according to any one of claims 1-4, wherein the performing multi-view feature learning on the static view and the dynamic view by using the deep learning network architecture to build a heterogeneous multi-view software defect prediction model comprises:

6. The model building method according to claim 5, wherein the performing high-level semantic feature extraction on the preprocessed static view and dynamic view to obtain static high-level semantic features and dynamic high-level semantic features comprises:

7. The model building method of claim 5, wherein mapping the static high-level semantic features and the dynamic high-level semantic features to a common subspace to obtain a transformation instance pair comprises:

8. The model building method of claim 5, wherein the building of the binary constraint and the triple constraint according to the transformation instance pair and the multi-view feature learning to build the heterogeneous multi-view software defect prediction model comprises:

constructing a binary constraint according to the transformation instance pair;

9. A method for predicting software defects, comprising:

building the heterogeneous multi-view software defect prediction model of any one of claims 1 to 8;

10. A model building device is characterized by comprising a processing module and a learning module;

11. The software defect prediction device is characterized by comprising a construction module and a prediction module;

the building module is used for building the heterogeneous multi-view software defect prediction model according to any one of claims 1 to 8;

12. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and the processor is configured to execute the computer program to perform the model building method of any one of claims 1 to 8; the processor is further configured to execute the computer program to perform the software defect prediction method as claimed in claim 9.

13. A storage medium having stored thereon a computer program, wherein the computer program is arranged to perform the model building method of any one of claims 1 to 8 when executed; the computer program is further arranged to perform, when executed, the software defect prediction method as claimed in claim 9.