CN111240993B - Software defect prediction method based on module dependency graph - Google Patents

Publication number
CN111240993B
Authority
CN
China
Prior art keywords
software
module
defect
defect prediction
dependency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010066087.8A
Other languages
Chinese (zh)
Other versions
CN111240993A (en)
Inventor
Yuan Cangzhou (原仓周)
Ke Xinxin (柯鑫鑫)
Zhan Panpan (詹盼盼)
Qi Zheng (齐征)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianhang Changying (Jiangsu) Technology Co.,Ltd.
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202010066087.8A
Publication of CN111240993A
Application granted
Publication of CN111240993B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3616 Software analysis for verifying properties of programs using software metrics

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The software defect prediction method based on a module dependency graph provided by the disclosure comprises: identifying the defect information of software modules according to the version information of the software to be analyzed; establishing a software module dependency graph according to the dependency relationships among the software modules, and taking developers as nodes in the module dependency graph; extracting internal features of the software modules, extracting the dependency features of each node in the software module dependency graph by network representation learning, combining the internal features and the dependency features into a measurement tuple, and establishing a historical defect library of the software according to the measurement tuples and the defect information of the modules; and training a defect prediction model on the historical defect library to predict defects in subsequent software, wherein the defect prediction model uses dynamic classifier selection based on local optimality, the parameters of the defect prediction model are optimized automatically, and the result of the software module defect prediction model is taken as the defect prediction result of the software to be analyzed. The method can improve the flexibility of constructing network node metrics and improve the effect of software defect prediction.

Description

Software defect prediction method based on module dependency graph
Technical Field
The invention belongs to the technical field of software quality assurance, and particularly relates to a software defect prediction method based on a module dependency graph.
Background
Software defect prediction is an important research topic in software engineering. Metric-based static software defect prediction uses historical data obtained from existing software modules to judge whether a new software module is defective, thereby providing decision support for a software project. Existing software defect prediction research mostly adopts machine learning, and prediction generally comprises the following steps: 1) labeling module categories, where software modules are divided into defective and non-defective modules; 2) extracting module attributes, where the software modules are measured with methods such as McCabe metrics and Halstead metrics to obtain their attributes; 3) building a prediction model, where a machine learning method learns a classifier from the category and attribute information of the software modules; 4) predicting new modules, where the classifier is applied to the attributes of a new software module to judge whether the module contains defects.
Designing metrics that are strongly correlated with software defects is the key to building a high-quality defect prediction model. The more complex the dependency relationships between modules, the more likely defects are to occur, so the network metrics of the modules can be used for defect prediction.
Nachiappan et al. argued that a module with more dependency relationships is more prone to defects. Their contribution was to propose, for the first time, the relationship between network metrics and defects: they extracted network metrics using centrality measures from social network analysis and, by comparison with complexity metrics internal to the module, showed that the network metrics give a better prediction effect.
In one prior patent, the designed measurement tuple is {LOCODE, LOCOM, INS, OUTS, Cluscoe, BetCen}: the first two indexes reflect the complexity inside the network nodes of the module dependency graph, and the last four indexes capture the degree of coupling between nodes, extracted from the module dependency graph. That patent uses a support vector machine algorithm to build the defect prediction model.
Existing research mainly relies on user-defined structural feature metrics (such as degree statistics or centrality metrics) to describe the structural features of nodes; this lack of flexibility makes network node features difficult to extract. Developers also contribute to the introduction of defects, yet little research has taken this into account. To solve these problems, a software defect prediction method based on a module dependency graph is provided.
Disclosure of Invention
In view of this, the embodiments of the present application provide a software defect prediction method based on a module dependency graph in which developers are taken as network nodes in the module dependency graph, which can improve the flexibility of constructing network node metrics and improve the effect of software defect prediction.
According to an aspect of the present disclosure, there is provided a software defect prediction method based on a module dependency graph, the method including:
S1: identifying the defect information of the software modules according to the version information of the software to be analyzed;
S2: establishing a software module dependency graph according to the dependency relationships among the software modules, and taking developers as nodes in the module dependency graph;
S3: extracting internal features of the software modules, extracting the dependency features of each node in the software module dependency graph by network representation learning, combining the internal features and the dependency features into a measurement tuple, and establishing a historical defect library of the software according to the measurement tuples and the defect information of the modules;
S4: training a defect prediction model on the historical defect library to predict subsequent software defects, wherein the software module defect prediction model uses dynamic classifier selection based on local optimality, the parameters of the module defect prediction model are optimized automatically, and the result of the software module defect prediction model is taken as the defect prediction result of the software to be analyzed.
Drawings
FIG. 1 illustrates a flow diagram of a software defect prediction method based on a module dependency graph according to an embodiment of the present disclosure.
FIG. 2 illustrates an overall flow diagram of a software defect prediction method based on a module dependency graph according to an embodiment of the present disclosure.
FIG. 3 illustrates a software module dependency graph of a software defect prediction method based on a module dependency graph according to an embodiment of the present disclosure.
FIG. 4 illustrates a software module dependency graph of a software defect prediction method based on a module dependency graph according to another embodiment of the present disclosure.
FIG. 5 illustrates a defect prediction model of a software defect prediction method based on a module dependency graph according to an embodiment of the present disclosure.
FIG. 6 illustrates a flow diagram of the hyperparameter optimization of the classifiers for a software defect prediction method based on a module dependency graph according to an embodiment of the present disclosure.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, embodiments accompanying the drawings are described in detail below.
FIG. 1 illustrates a flow diagram of a software bug prediction method based on a module dependency graph according to an embodiment of the present disclosure; FIG. 2 illustrates an overall flow diagram of a software bug prediction method based on a module dependency graph according to an embodiment of the present disclosure.
As shown in FIG. 1, the software defect prediction method of the present disclosure includes:
S1: Identifying the defect information of the software modules according to the version information of the software to be analyzed.
As shown in FIG. 2, the defective software files in the C source code version library to be analyzed are identified from the commit and issue information of the C source code version history. The main identification method is as follows: a defect-fixing commit contains one of the keywords 'fixed', 'closed' or 'fix' followed by an issue number; the files changed by these defect-fixing commits are then collected, and the changed files are the files that contain defects.
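The labeling rule above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `#<digits>` issue syntax, the `(message, changed_files)` input shape, and the function name `defective_files` are all assumptions made for the example.

```python
import re

# Keywords named in the text ("fixed", "closed", "fix") followed by an issue
# number. The "#<digits>" form of the issue number is an assumption.
FIX_PATTERN = re.compile(r"\b(fix|fixed|closed)\b\s*:?\s*#(\d+)", re.IGNORECASE)


def defective_files(commits):
    """Return the set of files changed by defect-fixing commits.

    `commits` is an iterable of (message, changed_files) pairs, a
    hypothetical in-memory stand-in for the version library.
    """
    flagged = set()
    for message, changed_files in commits:
        if FIX_PATTERN.search(message):
            flagged.update(changed_files)
    return flagged


commits = [
    ("Fixed #12: null pointer in parser", ["src/parser.c"]),
    ("Add logging helper", ["src/log.c"]),
    ("closed #7 buffer overflow", ["src/buf.c", "src/buf.h"]),
]
labels = defective_files(commits)
```

In practice the commit messages and changed-file lists would come from the version control history; how they are retrieved is outside what the text specifies.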
S2: Establishing a software module dependency graph according to the dependency relationships among the software modules, and taking developers as nodes in the module dependency graph.
FIG. 3 and FIG. 4 show software module dependency graphs of the software defect prediction method based on a module dependency graph according to two embodiments of the present disclosure.
For a C source code software project, the module dependency network MDN is defined as a directed graph MDN = (V, E), where V is the set of all nodes, each node v ∈ V represents a module in the project (a source code file in a C language project), and the edge set E represents the dependencies between modules. The dependency relationship between two modules falls into one of two categories: data dependency and function call dependency. As shown in FIG. 3, the dependency between C and A represented by C -> A is a data dependency, and the dependency between B and A represented by B -> A is a function call dependency.
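One way to hold such a directed graph in memory is sketched below. The class name `ModuleDependencyNetwork`, the edge-kind labels `"data"` and `"call"`, and the dictionary layout are illustrative assumptions; the text only fixes that nodes are modules and edges are one of two dependency kinds.

```python
from collections import defaultdict


class ModuleDependencyNetwork:
    """Directed graph MDN = (V, E) with a dependency kind on each edge."""

    def __init__(self):
        self.nodes = set()
        self.edges = defaultdict(dict)  # source -> {target: kind}

    def add_dependency(self, source, target, kind):
        # The two kinds mirror the text: data dependency and function call.
        if kind not in ("data", "call"):
            raise ValueError(f"unknown dependency kind: {kind}")
        self.nodes.update((source, target))
        self.edges[source][target] = kind

    def successors(self, node):
        """Modules that `node` depends on, in sorted order."""
        return sorted(self.edges.get(node, {}))


# The FIG. 3 example: C -> A is a data dependency, B -> A a function call.
mdn = ModuleDependencyNetwork()
mdn.add_dependency("C", "A", "data")
mdn.add_dependency("B", "A", "call")
```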
Developers can also introduce defects during development, so developers can be added to the module dependency graph as nodes: if a developer has modified a software module, a dependency relationship exists between that developer and that software module. As shown in FIG. 4, developer 1 committed software module A and software module B, and developer 2 committed module C; the software module dependency graph is constructed by adding developers 1 and 2 as nodes on top of the module dependency graph, and other developers can be added as nodes in the same way.
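A small sketch of this extension, using the FIG. 4 example, might look like the following. The `dev:` node prefix, the developer-to-module edge direction, and the function name `add_developers` are assumptions for illustration; the text only says that a developer who modified a module has a dependency relationship with it.

```python
def add_developers(adjacency, commit_log):
    """Return a copy of the module graph with developer nodes added.

    adjacency:  dict mapping each node to the set of nodes it depends on.
    commit_log: iterable of (developer, module) pairs, one per modification.
    """
    graph = {node: set(targets) for node, targets in adjacency.items()}
    for developer, module in commit_log:
        dev_node = f"dev:{developer}"  # prefix keeps developers apart from modules
        graph.setdefault(dev_node, set()).add(module)
        graph.setdefault(module, set())
    return graph


# Module-only graph (B -> A, C -> A) plus the commits described for FIG. 4.
modules = {"A": set(), "B": {"A"}, "C": {"A"}}
commit_log = [("1", "A"), ("1", "B"), ("2", "C")]
graph_with_devs = add_developers(modules, commit_log)
```

Returning a copy rather than mutating the input keeps the module-only graph available for comparison with the developer-augmented one.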
S3: Extracting internal features of the software modules, extracting the dependency features of each node in the software module dependency graph by network representation learning, combining the internal features and the dependency features into a measurement tuple, and establishing a historical defect library of the software according to the measurement tuples and the defect information of the modules.
As shown in FIG. 2, the internal features of a software module may include code scale features and code structure features. The code scale features are measured with the LOC measurement tuple {blank lines, comment lines, total code lines, executable code lines} and the Halstead measurement tuple {sum of all operators and operands, program volume, program length, difficulty, effort, number of operator types, number of operand types, number of operators, number of operands}. The code structure features are measured with the McCabe measurement tuple {cyclomatic complexity, essential complexity}.
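For orientation, the standard Halstead measures can be computed from four counts as sketched below. The mapping from the tuple entries in the text to these standard measures (volume, difficulty, effort) is an assumption, and the function name `halstead` and its argument names are illustrative.

```python
import math


def halstead(n1, n2, N1, N2):
    """Standard Halstead measures.

    n1, n2: number of distinct operators and distinct operands.
    N1, N2: total occurrences of operators and of operands.
    """
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
    }


# Hypothetical counts for one small C file.
metrics = halstead(n1=4, n2=4, N1=8, N2=8)
```

Obtaining the four counts requires tokenizing the C source, which is not shown here.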
As shown in FIG. 2, the node2vec network representation learning method is adopted to extract the dependency features of the nodes in the software module dependency graph. node2vec borrows the word2vec idea from natural language processing. It generates random walk sequences that blend breadth-first and depth-first search strategies, and controls the transition probabilities of the walk with the parameters p and q. The return parameter p controls how likely the walk is to step back to the node it just left, and the in-out parameter q controls whether the walk stays near the previous node (breadth-first behavior) or moves farther away (depth-first behavior), so that different combinations of p and q let the walk sequences capture more homophily or more structural-equivalence information. In one example, p may be set to 0.25, q to 2, the walk length to 7, and the number of walks per node to 80, and the resulting network metric feature vector of each software module has 128 dimensions.
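The walk-generation half of node2vec can be sketched in plain Python as below, using the example settings from the text (p = 0.25, q = 2, walk length 7, 80 walks per node). Treating the dependency graph as undirected, the toy edge list, and the helper names are assumptions; the skip-gram training that turns walks into 128-dimensional vectors is not shown.

```python
import random


def undirected(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def biased_walk(graph, start, length, p, q, rng):
    """One second-order random walk with node2vec-style weights."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(graph[current])
        if not neighbors:
            break
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)   # step back to the previous node
            elif candidate in graph[previous]:
                weights.append(1.0)       # stay at distance 1 from it
            else:
                weights.append(1.0 / q)   # move farther away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, num_walks, length, p, q, seed=0):
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = sorted(graph)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(graph, node, length, p, q, rng))
    return walks


graph = undirected([("B", "A"), ("C", "A"), ("dev:1", "A"),
                    ("dev:1", "B"), ("dev:2", "C")])
walks = generate_walks(graph, num_walks=80, length=7, p=0.25, q=2)
```

The resulting walks play the role of sentences: feeding them to a word2vec-style skip-gram model would yield one embedding vector per node.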
Finally, the LOC measurement tuple, the Halstead measurement tuple, the McCabe measurement tuple and the network measurement tuple are combined into a single measurement tuple, and the historical defect library of the software is established according to the measurement tuples and the defect information of the modules.
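Combining the four tuples amounts to concatenation plus a label, as in the sketch below. With the tuple sizes listed in the text and the 128-dimensional example embedding, each record has 4 + 9 + 2 + 128 = 143 features. The record layout, the name `build_record`, and the zero-valued placeholder inputs are assumptions for illustration only.

```python
def build_record(module, loc, halstead, mccabe, embedding, defective):
    """Concatenate the four tuples into one feature vector with its label."""
    features = list(loc) + list(halstead) + list(mccabe) + list(embedding)
    return {"module": module, "features": features, "defective": bool(defective)}


# Hypothetical historical defect library: one record per software module.
history = [
    build_record("parser.c", [0] * 4, [0] * 9, [0] * 2, [0.0] * 128, True),
    build_record("log.c", [0] * 4, [0] * 9, [0] * 2, [0.0] * 128, False),
]
```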
S4: Training a defect prediction model on the historical defect library to predict subsequent software defects, wherein the software module defect prediction model uses dynamic classifier selection based on local optimality, the parameters of the module defect prediction model are optimized automatically, and the result of the software module defect prediction model is taken as the defect prediction result of the software to be analyzed.
As shown in FIG. 5, for each sample to be tested (software to be analyzed), its k nearest neighbors in the training set are found, and the trained classification algorithm with the best prediction effect on those k neighboring training samples is selected, which realizes dynamic classifier selection. In one example, k can be set to 8, and the base classifiers used for defect prediction are an SVM, naive Bayes, logistic regression, random deep forest and the like; the locally optimal model among them, trained on the software module training set, serves as the software module defect prediction model. Because the data distribution differs between software defect libraries, the hyperparameters of the SVM, naive Bayes, logistic regression, random deep forest and other classifier models are optimized with a genetic algorithm. For example, the hyperparameters of each base classifier are used as genes of the genetic algorithm to form a chromosome, the initial population size can be set to 50, the number of generations can be set to 100, the fitness can be the F-measure of defect prediction, and an elite strategy retains the best offspring. The parameters of the module defect prediction model are thus optimized automatically, and the result of the software module defect prediction model is taken as the defect prediction result of the software to be analyzed.
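The neighbor-based selection in step S4 can be sketched as follows. The two threshold rules are hypothetical stand-ins for already-trained base classifiers such as an SVM or logistic regression, and Euclidean distance for the neighbor search is an assumption; the text does not name a distance measure.

```python
import math


def k_nearest(train_x, query, k):
    """Indices of the k training samples closest to `query`."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    return order[:k]


def select_and_predict(classifiers, train_x, train_y, query, k):
    """Pick the classifier most accurate on the query's neighbors, then predict.

    classifiers: dict mapping a name to a trained predict function
                 (feature tuple -> 0 or 1). Ties go to the first one listed.
    """
    neighbors = k_nearest(train_x, query, k)
    best_name, best_accuracy = None, -1.0
    for name, predict in classifiers.items():
        correct = sum(predict(train_x[i]) == train_y[i] for i in neighbors)
        accuracy = correct / len(neighbors)
        if accuracy > best_accuracy:
            best_name, best_accuracy = name, accuracy
    return best_name, classifiers[best_name](query)


# Toy training set with two regions where different rules work best.
train_x = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
           (10.0, 5.0), (11.0, 6.0), (12.0, 7.0)]
train_y = [0, 0, 1, 0, 0, 1]
classifiers = {
    "size_rule": lambda x: int(x[0] >= 3.0),
    "coupling_rule": lambda x: int(x[1] >= 6.5),
}
choice_a = select_and_predict(classifiers, train_x, train_y, (2.5, 0.0), k=3)
choice_b = select_and_predict(classifiers, train_x, train_y, (11.5, 6.8), k=3)
```

The example uses k = 3 only because the toy set is tiny; the text's example value is k = 8.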
By taking developers as nodes in the module dependency graph and extracting the dependency features of each node through network representation learning, the method can improve the flexibility of constructing network node metrics and improve the effect of software defect prediction.
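The genetic hyperparameter search described for step S4 can be sketched as below, with the example settings from the text (population 50, 100 generations, F-measure fitness, elitism). The single-threshold "classifier", the toy data, the uniform crossover, the mutation rate, and all function names are assumptions made to keep the example self-contained; a real fitness function would train the base classifiers with the chromosome's hyperparameters and score them on held-out modules.

```python
import random


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the defective class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def genetic_search(bounds, fitness, pop_size=50, generations=100,
                   mutation_rate=0.1, seed=0):
    """Maximize `fitness` over chromosomes whose genes lie within `bounds`."""
    rng = random.Random(seed)

    def random_gene(i):
        low, high = bounds[i]
        return rng.uniform(low, high)

    population = [[random_gene(i) for i in range(len(bounds))]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, pop_size // 2)]
        children = [list(ranked[0])]  # elitism: the best chromosome survives
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(mother, father)]
            for i in range(len(child)):
                if rng.random() < mutation_rate:
                    child[i] = random_gene(i)
            children.append(child)
        population = children
    return max(population, key=fitness)


# Toy problem: one hyperparameter, a decision threshold on a single metric.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]


def fitness(chromosome):
    threshold = chromosome[0]
    return f_measure(ys, [int(x >= threshold) for x in xs])


best = genetic_search([(0.0, 10.0)], fitness)
```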
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (1)

1. A software defect prediction method based on a module dependency graph is characterized by comprising the following steps:
S1: identifying the defect information of the software modules according to the version information of the software to be analyzed;
S2: establishing a software module dependency graph according to the dependency relationships among the software modules, and taking developers as nodes in the module dependency graph;
S3: extracting the internal features of the software modules, extracting the dependency features of each node in the software module dependency graph by node2vec network representation learning, and finally obtaining the network measurement tuple of each software module; combining the internal features and the dependency features into a measurement tuple, and establishing a historical defect library of the software according to the measurement tuples and the defect information of the modules;
the internal features comprise code scale features and code structure features; the code scale features are extracted by using the LOC (lines of code) measurement tuple and the Halstead measurement tuple; the code structure features are extracted by using the McCabe measurement tuple;
the LOC measurement tuple, the Halstead measurement tuple, the McCabe measurement tuple and the network measurement tuple are combined together to form the measurement tuple;
S4: training a defect prediction model on the historical defect library to predict subsequent software defects, wherein the software module defect prediction model uses dynamic classifier selection based on local optimality, hyperparameter optimization is carried out on the module defect prediction model by using a genetic algorithm, and the result of the software module defect prediction model is taken as the defect prediction result of the software module to be analyzed;
the dynamic classifier selection is implemented by finding, for each sample to be tested (that is, software to be analyzed), its k nearest neighbors in the training set, and selecting the trained classification algorithm that has the best prediction effect on those k neighboring training samples.
CN202010066087.8A 2020-01-20 2020-01-20 Software defect prediction method based on module dependency graph Active CN111240993B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010066087.8A CN111240993B (en) 2020-01-20 2020-01-20 Software defect prediction method based on module dependency graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010066087.8A CN111240993B (en) 2020-01-20 2020-01-20 Software defect prediction method based on module dependency graph

Publications (2)

Publication Number Publication Date
CN111240993A CN111240993A (en) 2020-06-05
CN111240993B true CN111240993B (en) 2021-05-14

Family

ID=70865774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010066087.8A Active CN111240993B (en) 2020-01-20 2020-01-20 Software defect prediction method based on module dependency graph

Country Status (1)

Country Link
CN (1) CN111240993B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112579463B (en) * 2020-12-25 2024-05-24 大卜科技(北京)有限公司 Solidity intelligent contract-oriented defect prediction method
CN114896138B (en) * 2022-03-31 2023-03-24 西南民族大学 Software defect prediction method based on complex network and graph neural network

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108255733A (en) * 2018-01-30 2018-07-06 北京航空航天大学 A kind of method based on Complex Networks Theory assessment software systems reliability

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101556553B (en) * 2009-03-27 2011-04-06 中国科学院软件研究所 Defect prediction method and system based on requirement change
US20180113799A1 (en) * 2016-10-24 2018-04-26 Ca, Inc. Model generation for model-based application testing
CN110134613B (en) * 2019-05-22 2020-09-08 北京航空航天大学 Software defect data acquisition system based on code semantics and background information

Also Published As

Publication number Publication date
CN111240993A (en) 2020-06-05

Similar Documents

Publication Publication Date Title
US11620574B2 (en) Holistic optimization for accelerating iterative machine learning
Watson et al. A systematic literature review on the use of deep learning in software engineering research
Wang et al. Learning semantic program embeddings with graph interval neural network
Kessentini et al. A cooperative parallel search-based software engineering approach for code-smells detection
Panichella et al. How to effectively use topic models for software engineering tasks? an approach based on genetic algorithms
US11074256B2 (en) Learning optimizer for shared cloud
CN111240993B (en) Software defect prediction method based on module dependency graph
CN111767216B (en) Cross-version depth defect prediction method capable of relieving class overlap problem
Nam et al. Marble: Mining for boilerplate code to identify API usability problems
Maggo et al. A machine learning based efficient software reusability prediction model for java based object oriented software
Unterkalmsteiner et al. Large-scale information retrieval in software engineering-an experience report from industrial application
CN116324810A (en) Potential policy distribution for assumptions in a network
Groß A prediction system for evolutionary testability applied to dynamic execution time analysis
Tinnes et al. Learning domain-specific edit operations from model repositories with frequent subgraph mining
Ataman et al. Transforming large-scale participation data through topic modelling in urban design processes
Jeevanantham et al. Extension of deep learning based feature envy detection for misplaced fields and methods
Guns et al. Declarative heuristic search for pattern set mining
Fan et al. High-frequency keywords to predict defects for android applications
Anjali et al. Moth Flame Optimization Based FCNN for Prediction of Bugs in Software.
Khan et al. Predicting bug inducing source code change patterns
CN117473510B (en) Automatic vulnerability discovery technology based on relationship between graph neural network and vulnerability patch
Navaei et al. Impact of Machine Learning on Software Development Life Cycle.
Xin Usable and Efficient Systems for Machine Learning
Malagón Azpeitia A combinatorial approach for profile guided optimization with metaheuristics
Bragilovski et al. Model-based knowledge searching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Yuan Cangzhou

Inventor after: Ke Xinxin

Inventor after: Zhan Panpan

Inventor after: Qi Zheng

Inventor before: Yuan Cangzhou

Inventor before: Ke Xinxin

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220120

Address after: 215488 No. 301, building 11, phase II, Taicang University Science Park, No. 27, Zigang Road, science and education new town, Taicang City, Suzhou City, Jiangsu Province

Patentee after: Tianhang Changying (Jiangsu) Technology Co.,Ltd.

Address before: 100191 No. 37, Haidian District, Beijing, Xueyuan Road

Patentee before: BEIHANG University