CN109933538B - Cost perception-oriented real-time defect prediction model enhancement method - Google Patents


Info

Publication number
CN109933538B
Authority
CN
China
Prior art keywords
prediction model
cost
model
defect prediction
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910261531.9A
Other languages
Chinese (zh)
Other versions
CN109933538A (en)
Inventor
荆晓远
李志强
陈昊文
黄鹤
彭奕
姚永芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Petrochemical Technology
Original Assignee
Guangdong University of Petrochemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Petrochemical Technology
Priority to CN201910261531.9A
Publication of CN109933538A
Application granted
Publication of CN109933538B
Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Stored Programmes (AREA)

Abstract

The invention provides a cost-aware real-time defect prediction model enhancement framework. The framework converts the original predicted risk value and the cost value into the same dimension, combines the converted values with weights obtained through adaptive learning to form a new predicted risk value, and is then applied to an existing cost-aware real-time defect prediction model to enhance it. By using adaptive learning, the framework makes full use of both the risk value of the original prediction model and the review cost value, addresses the imbalanced data distribution through data conversion, and effectively improves the prediction performance of the model.

Description

Cost perception-oriented real-time defect prediction model enhancement method
Technical Field
The invention relates to an adaptive learning framework that simultaneously considers prediction risk and code review cost, belongs to the field of software defect prediction, and in particular relates to a cost-aware real-time defect prediction model enhancement method.
Background
Software quality assurance underpins the development of high-quality software systems, and software defect prediction helps developers allocate limited quality assurance resources to defective modules first, thereby improving testing and review efficiency.
A cost-aware real-time defect prediction model ranks all software changes by their predicted risk values and their number of modified lines of code, so that changes with higher predicted risk and fewer modified lines are inspected first, which makes the model considerably more useful in practice.
Existing cost-aware real-time defect prediction models can be divided by type into unsupervised models and supervised models.
Representative unsupervised models are LT and CCUM. Yang et al. ranked changes in descending order of the reciprocal of each of 12 change metrics, building 12 unsupervised models; experimental results on 6 open-source projects show that most of these unsupervised models outperform the supervised methods. Liu et al. proposed CCUM, an unsupervised model based on the amount of code changed; results on 6 open-source projects, using code changes of different scales, show that it outperforms all existing supervised and unsupervised models.
Representative supervised models are EALR, OneWay, CBS and MULTI. Kamei et al. proposed EALR, a supervised cost-aware linear regression model; in a study of 6 open-source projects and 5 commercial projects, it detected 35% of defective changes while requiring review of, on average, 20% of the changed code. Fu and Menzies replicated the work of Yang et al. and proposed the supervised model OneWay, which builds on the unsupervised models of Yang et al. and uses labeled training data to select the best change metric automatically; their experiments show that OneWay outperforms most of the unsupervised models. Similarly, after replicating the work of Yang et al., Huang et al. designed CBS, a simple enhanced supervised model that classifies changes first and then sorts them; the results show that CBS is on par with LT on recall and exceeds LT on the other indices. Chen et al. treat cost-sensitive real-time defect prediction as a multi-objective optimization problem that simultaneously maximizes the number of defective changes identified and minimizes the inspection cost. The resulting supervised method, MULTI, outperforms the compared supervised and unsupervised methods on Popt and recall.
Most existing cost-aware real-time defect prediction models use the predicted risk value and the cost value only in a simple way to improve prediction performance: they do not adaptively optimize the weights of the two values, and they do not account for the highly imbalanced distribution of values across code changes.
Disclosure of Invention
To address these shortcomings of existing cost-aware real-time defect prediction models, the invention provides a cost-aware real-time defect prediction model enhancement method that can be applied to such models, adaptively learns the weights of the predicted risk value and the inspection cost value, and converts the imbalanced data. The method comprises the following steps:
Step 1: determine whether the defect prediction model is a supervised model or an unsupervised model. If it is a supervised model, go to step 2; if it is an unsupervised model, go to step 4.
Step 2: build a supervised defect prediction model using the training data, and predict the risk values of the training data and of the test data.
Step 3: from the risk values of the training data obtained in step 2 and the cost values of the training data, automatically learn the optimal weights λ₁ and λ₂ using a genetic algorithm.
Step 4: build an unsupervised defect prediction model using the training data, predict the risk values of the test data, and set λ₁ and λ₂ both to 1.
Step 5: combine the risk value and the cost value of the test data by weighted subtraction to obtain the new risk value of the test data.
Step 6: compute the evaluation indices of the model enhanced by the cost-aware real-time defect prediction model enhancement framework.
Preferably, in step 3, the genetic algorithm uses roulette-wheel selection, single-point crossover and random mutation as its genetic operators. Roulette-wheel selection picks chromosomes with higher fitness with higher probability; single-point crossover places one crossover point at random on the individual's code string and exchanges the parts of the two paired chromosomes at that point; random mutation randomly modifies part of a parent chromosome.
λ₁ and λ₂ are selected according to fitness, where the fitness is either one specified evaluation index or the average of all evaluation indices.
Preferably, in steps 2 and 4, the risk value is computed directly by the prediction formula of the original cost-aware real-time defect prediction model. If the original defect prediction model is the supervised model EALR, the risk value R(x) is computed as:
R(x) = Y(x) / effort(x)
where x is a software change, Y(x) is 1 if the change is defective and 0 otherwise, and effort(x) is the workload required to review the change, represented by the total number of modified lines of code.
If the original defect prediction model is an unsupervised model based on the value M(x) of a given change metric, the risk value R(x) is computed as:
R(x) = 1 / M(x)
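The two risk formulas above can be illustrated with a minimal Python sketch. The function names and the rejection of non-positive inputs are assumptions made for this illustration; the patent does not say how zero effort or zero metric values are handled.

```python
from typing import List, Sequence


def ealr_risk(defective: bool, effort: float) -> float:
    """R(x) = Y(x) / effort(x), with Y(x) = 1 for a defective change, else 0."""
    if effort <= 0:
        # Assumption: a change must modify at least one line of code.
        raise ValueError("effort must be positive")
    return (1.0 if defective else 0.0) / effort


def unsupervised_risk(metric_value: float) -> float:
    """R(x) = 1 / M(x): a smaller metric value gives a higher risk."""
    if metric_value <= 0:
        # Assumption: the metric is strictly positive.
        raise ValueError("metric value must be positive")
    return 1.0 / metric_value


def rank_by_risk(risks: Sequence[float]) -> List[int]:
    """Indices of the changes, ordered from highest to lowest risk."""
    return sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
```

For example, three changes with metric values 10, 2 and 4 get risks 0.1, 0.5 and 0.25, so the second change would be inspected first.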
Preferably, in step 5, building on the existing cost-aware real-time defect prediction model, the enhancement framework first converts the original predicted risk value and the cost value into the same dimension and then combines the converted values with weights to form the new predicted risk value R′(x):
R′(x) = λ₁ · θ(y(x)) − λ₂ · θ(effort(x))
where x is a software change, y(x) is the original risk value (written R(x) above), effort(x) is the workload required to review the change, represented by the total number of modified lines of code, θ(·) is the conversion function, and the weights λ₁ and λ₂ are positive real numbers.
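As a rough illustration of the combination formula, the sketch below computes R′(x) for a list of changes. The patent does not specify the conversion function θ; min-max scaling to [0, 1] is used here purely as an assumed stand-in, and the names `min_max` and `enhanced_risk` are hypothetical.

```python
from typing import Callable, List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Assumed conversion function: scale the values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def enhanced_risk(
    risk: Sequence[float],
    effort: Sequence[float],
    lam1: float = 1.0,
    lam2: float = 1.0,
    theta: Callable[[Sequence[float]], List[float]] = min_max,
) -> List[float]:
    """R'(x) = lam1 * theta(y(x)) - lam2 * theta(effort(x)) for every change."""
    if len(risk) != len(effort):
        raise ValueError("risk and effort must have the same length")
    if lam1 <= 0 or lam2 <= 0:
        raise ValueError("weights must be positive real numbers")
    converted_risk = theta(risk)
    converted_effort = theta(effort)
    return [lam1 * r - lam2 * e for r, e in zip(converted_risk, converted_effort)]
```

With both weights set to 1 (the unsupervised case), a change that has the highest original risk and the lowest effort receives the highest new risk value, which matches the goal of inspecting risky, cheap changes first.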
By using adaptive learning, the cost-aware real-time defect prediction model enhancement framework makes full use of both the risk value of the original prediction model and the review cost value, addresses the imbalanced data distribution through data conversion, and effectively improves the prediction performance of the model.
Drawings
FIG. 1 is the overall flowchart of the cost-aware real-time defect prediction model enhancement method proposed by the invention;
FIG. 2 shows box plots of the results of each evaluation index on the 6 open-source project data sets Bugzilla, Columba, JDT, Platform, Mozilla and PostgreSQL.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in FIG. 1, the invention provides a cost-aware real-time defect prediction model enhancement method comprising the following steps:
Step 1: according to the specific cost-sensitive real-time defect prediction model being enhanced, determine whether it is a supervised model, that is, whether the training data used to train the prediction model carry class labels. If it is a supervised model, go to step 2; if it is an unsupervised model, go to step 4.
Step 2: build the supervised cost-aware real-time defect prediction model using the training data, and predict the risk values of the training data and of the test data; the risk values are computed directly by the prediction formula of the original model.
Step 3: the original real-time defect prediction model uses the predicted risk value and the cost value only in a simple way to improve the prediction result; it does not optimize a weighted combination of the two, which would represent the risk of a software change more reasonably. The cost-aware real-time defect prediction model enhancement framework uses a genetic algorithm to learn the weights λ₁ and λ₂ of the two values automatically, so that both kinds of information are fully used.
The genetic algorithm uses roulette-wheel selection, single-point crossover and random mutation as its genetic operators. Roulette-wheel selection picks a chromosome with higher fitness with higher probability. Crossover operates on the individuals' code strings: single-point crossover places one crossover point at random on the code string, and the two paired chromosomes exchange the parts at that point. Random mutation randomly modifies part of a parent chromosome.
The optimal solution found by the genetic algorithm, selected according to fitness, gives λ₁ and λ₂. The fitness is either one specified evaluation index or the average of all evaluation indices.
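The weight search described above might be sketched as follows. This is an illustrative sketch only: the binary encoding, the search range, the population size, the number of generations and the mutation rate are all assumptions, since the patent names only the three operators. The `fitness` callable is hypothetical; in the method it would return an evaluation index computed on the training data, and it is assumed here to be non-negative.

```python
import random
from typing import Callable, List, Tuple

BITS = 8        # bits per weight (assumed encoding)
LAM_MAX = 4.0   # assumed upper bound of the search range


def decode(chrom: List[int]) -> Tuple[float, float]:
    """Map a bit string of length 2*BITS to two strictly positive weights."""
    def to_weight(bits: List[int]) -> float:
        k = int("".join(str(b) for b in bits), 2)
        return (k + 1) / 2 ** BITS * LAM_MAX
    return to_weight(chrom[:BITS]), to_weight(chrom[BITS:])


def learn_weights(
    fitness: Callable[[float, float], float],
    pop_size: int = 20,
    generations: int = 30,
    p_mut: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (lambda1, lambda2, best fitness) found by a simple GA."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(2 * BITS)] for _ in range(pop_size)]
    best, best_fit = pop[0][:], float("-inf")
    for _ in range(generations):
        fits = [fitness(*decode(c)) for c in pop]
        for chrom, fit in zip(pop, fits):
            if fit > best_fit:
                best, best_fit = chrom[:], fit
        # Roulette-wheel selection: probability proportional to fitness.
        weights = [fit + 1e-9 for fit in fits]
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = rng.choices(pop, weights=weights, k=2)
            # Single-point crossover at a random cut position.
            cut = rng.randint(1, 2 * BITS - 1)
            child = p1[:cut] + p2[cut:]
            # Random mutation: flip each bit with probability p_mut.
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
    lam1, lam2 = decode(best)
    return lam1, lam2, best_fit
```

Fixing the seed makes a run reproducible, which helps when comparing fitness definitions (a single index versus the average of all indices).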
Step 4: build the unsupervised cost-sensitive real-time defect prediction model using the training data and predict the risk values of the test data; the risk values are computed directly by the prediction formula of the original model, and λ₁ and λ₂ are both set to 1.
Step 5: building on the existing cost-aware real-time defect prediction model, the enhancement framework first converts the original predicted risk value and the cost value into the same dimension and then combines the converted values with weights to form the new predicted risk value R′(x):
R′(x) = λ₁ · θ(y(x)) − λ₂ · θ(effort(x))
where x is a software change, y(x) is the original risk value, effort(x) is the workload required to review the change, typically represented by the total number of modified lines of code, θ(·) is the conversion function, and λ₁ and λ₂ are weight parameters that control the relative importance of the converted original risk value and the converted cost value; both weights are positive real numbers.
To verify that the proposed enhancement framework can effectively enhance existing cost-aware real-time defect prediction models, the experiments use a time-sensitive cross-validation setting and compare each model enhanced by the framework with its original on four evaluation indices. The models comprise 4 supervised models (EALR, OneWay, CBS and MULTI), 13 unsupervised models (the 12 unsupervised models proposed by Yang et al. and CCUM), and 5 widely used classifiers (NB, LR, RF, J48 and SVM). For ease of discussion, each enhanced model is named by appending "+" to the name of its original model; for example, the enhanced EALR is denoted EALR+. The four evaluation indices are Popt, recall, precision and F-measure.
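Three of the four indices can be sketched under an effort budget as shown below. The budget of 20% of the total modified lines is an assumption (it mirrors the 20% figure quoted for EALR in the Background), the function name is hypothetical, and Popt is left out because the text does not give its formula.

```python
from typing import Sequence, Tuple


def effort_aware_metrics(
    scores: Sequence[float],
    efforts: Sequence[float],
    labels: Sequence[int],
    budget: float = 0.2,
) -> Tuple[float, float, float]:
    """Recall, precision and F-measure when inspecting by descending score.

    Changes are inspected in ranked order until the next change would
    exceed `budget` times the total effort (assumed stopping rule).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    limit = budget * sum(efforts)
    spent = 0.0
    inspected = []
    for i in order:
        if spent + efforts[i] > limit:
            break
        spent += efforts[i]
        inspected.append(i)
    true_pos = sum(labels[i] for i in inspected)
    total_defects = sum(labels)
    recall = true_pos / total_defects if total_defects else 0.0
    precision = true_pos / len(inspected) if inspected else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, f_measure
```

Any of these values, or their average, could serve as the fitness when learning the weights of a supervised model.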
In addition, statistical significance is tested with the non-parametric Wilcoxon signed-rank test at the 95% confidence level (significance level 0.05), and the results of the multiple comparisons are corrected using the Benjamini-Hochberg (BH) procedure. Effect size is measured with Cliff's delta (δ). To check whether some methods are better than all the others, all methods are also ranked and grouped using two Scott-Knott ESD tests.
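Two of these statistical tools are small enough to sketch directly in pure Python: Cliff's delta and the Benjamini-Hochberg adjustment of p values. The p values themselves are assumed to come from a Wilcoxon signed-rank implementation (for instance `scipy.stats.wilcoxon`), which is not reproduced here; the function names below are illustrative.

```python
from typing import List, Sequence


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """BH-adjusted p values, returned in the order of the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A comparison would then count as significant with a non-negligible effect when its adjusted p value is below 0.05 and |δ| is at least 0.147.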
The box plots of the evaluation index results on the 6 open-source project data sets Bugzilla, Columba, JDT, Platform, Mozilla and PostgreSQL are shown in FIG. 2, where b indicates that the improved method is statistically significantly better than the baseline method (the BH-corrected p value of the Wilcoxon signed-rank test is less than 0.05) and that, according to Cliff's delta, the difference between the two methods is large (|δ| ≥ 0.147); c indicates that the improved method is not statistically significantly better than the baseline method and that the difference between the two methods is small (|δ| < 0.147); and a indicates that the improved method is worse than the baseline method and that the difference between the two methods is large.
The results on the six data sets show that almost all of the improved methods (except CCUM+ and MULTI+) are significantly better than their original methods on Popt and recall. Although not every enhanced method improves on its original in precision, most of the improved methods are better than their originals on F-measure, which takes both recall and precision into account. In general, after being enhanced by the proposed framework, most existing cost-aware real-time defect prediction models achieve better performance on three of the four indices on the six data sets, which demonstrates the effectiveness of the cost-aware real-time defect prediction model enhancement framework.
To compare the prediction results of the improved methods and the baseline methods clearly, the detailed results of EALR+, LT+, AGE+, OneWay+, CBS+ and MULTI+ and of their baseline methods are reported. These baseline methods were selected because they are more representative and achieve better average performance than the other methods. The median value of each index for these methods on the six data sets is given in the following tables:
[Tables not reproduced: median values of each evaluation index for the selected methods on the six data sets (images GDA0002291898270000051 to GDA0002291898270000071 in the original publication).]
where √ and × indicate whether or not the improved method is significantly better than the original method according to the Wilcoxon signed-rank test with Benjamini-Hochberg (BH) correction. AVG is the average performance over the 6 projects. "W/T/L" is the number of projects on which the improved method performs better than, the same as, or worse than the original method.
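The W/T/L entry can be computed as in the short sketch below. The function name and the tolerance used to decide a tie are assumptions, and the inputs are per-project values of one index for the improved and the original method.

```python
from typing import Sequence, Tuple


def win_tie_loss(
    improved: Sequence[float],
    original: Sequence[float],
    tol: float = 1e-9,
) -> Tuple[int, int, int]:
    """Count projects where the improved method is better, equal or worse."""
    if len(improved) != len(original):
        raise ValueError("both methods need one value per project")
    wins = ties = losses = 0
    for new, old in zip(improved, original):
        if abs(new - old) <= tol:   # assumed tie tolerance
            ties += 1
        elif new > old:
            wins += 1
        else:
            losses += 1
    return wins, ties, losses
```

For six projects the three counts always sum to 6, so an entry such as "6/0/0" would mean the improved method wins on every project.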
As can be seen from the tables, the improved methods achieve higher Popt and recall than the original methods, and all improved methods are statistically significantly better (every improved-method result is followed by a √ mark); most Cliff's delta (δ) values indicate large differences. Although most of the improved methods do not perform well on precision, they still achieve better or comparable F-measure compared with the original methods. Averaged over the 6 data sets, and taking Popt as an example, the values of the improved methods are 0.847, 0.832, 0.807, 0.850, 0.790, 0.865 and 0.843, while the values of the corresponding baseline methods are 0.582, 0.712, 0.702, 0.699, 0.608, 0.836 and 0.442, an improvement of 3.42% to 90.96%. Similarly, the improved methods gain 5.92% to 227.74% on recall and 2.48% to 25.55% on F-measure (except for CBS+). This comparison shows that the cost-aware real-time defect prediction model enhancement framework markedly improves cost-sensitive real-time defect prediction and achieves the purpose of the invention.
It should be understood that the above embodiments are only examples given to illustrate the technical solutions of the present invention clearly, and are not intended to limit its specific embodiments. Any modification, equivalent replacement or improvement made within the spirit and principle of the claims of the present invention shall fall within the protection scope of the claims.

Claims (2)

1. A cost-aware real-time defect prediction model enhancement method, adaptable to various existing JIT models, characterized by comprising the following steps:
Step 1: determine whether the defect prediction model is a supervised model or an unsupervised model; if it is a supervised model, go to step 2; if it is an unsupervised model, go to step 4;
Step 2: build a supervised defect prediction model using the training data, and predict the risk values of the training data and of the test data, wherein the risk values are computed directly by the prediction formula of the original cost-aware real-time defect prediction model, and if the original defect prediction model is the supervised model EALR, the risk value y(x) is computed as:
y(x) = Y(x) / effort(x)
where x is a software change, Y(x) is 1 if the change is defective and 0 otherwise, and effort(x) is the workload required to review the change, represented by the total number of modified lines of code;
if the original defect prediction model is an unsupervised model based on the value M(x) of a given change metric, the risk value y(x) is computed as:
y(x) = 1 / M(x);
Step 3: from the risk values of the training data obtained in step 2 and the cost values of the training data, automatically learn the optimal weights λ₁ and λ₂ using a genetic algorithm;
Step 4: build an unsupervised defect prediction model using the training data, predict the risk values of the test data, and set λ₁ and λ₂ both to 1;
Step 5: combine the risk value and the cost value of the test data by weighted subtraction to obtain the new risk value of the test data; specifically, building on the existing cost-aware real-time defect prediction model, the original predicted risk value and the cost value are first converted into the same dimension, and the converted values are then combined with weights to form the new predicted risk value R′(x):
R′(x) = λ₁ · θ(y(x)) − λ₂ · θ(effort(x))
where x is a software change, y(x) is the original risk value, effort(x) is the workload required to review the change, represented by the total number of modified lines of code, θ(·) is the conversion function, and λ₁ and λ₂ are weight parameters, both positive real numbers;
Step 6: compute the evaluation indices of the model enhanced by the cost-aware real-time defect prediction model enhancement framework.
2. The cost-aware real-time defect prediction model enhancement method of claim 1, wherein in step 3 the genetic algorithm uses roulette-wheel selection, single-point crossover and random mutation as its genetic operators; roulette-wheel selection picks chromosomes with higher fitness with higher probability; single-point crossover places one crossover point at random on the individual's code string and exchanges the parts of the two paired chromosomes at that point; and random mutation randomly modifies part of a parent chromosome;
λ₁ and λ₂ are selected according to fitness, where the fitness is either one specified evaluation index or the average of all evaluation indices.
CN201910261531.9A 2019-04-02 2019-04-02 Cost perception-oriented real-time defect prediction model enhancement method Active CN109933538B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910261531.9A CN109933538B (en) 2019-04-02 2019-04-02 Cost perception-oriented real-time defect prediction model enhancement method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910261531.9A CN109933538B (en) 2019-04-02 2019-04-02 Cost perception-oriented real-time defect prediction model enhancement method

Publications (2)

Publication Number Publication Date
CN109933538A (en) 2019-06-25
CN109933538B (en) 2020-04-28

Family

ID=66989104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910261531.9A Active CN109933538B (en) 2019-04-02 2019-04-02 Cost perception-oriented real-time defect prediction model enhancement method

Country Status (1)

Country Link
CN (1) CN109933538B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810102A (en) * 2014-02-19 2014-05-21 北京理工大学 Method and system for predicting software defects
CN106201871A (en) * 2016-06-30 2016-12-07 重庆大学 Based on the Software Defects Predict Methods that cost-sensitive is semi-supervised
CN106991049A (en) * 2017-04-01 2017-07-28 南京邮电大学 A kind of Software Defects Predict Methods and forecasting system
CN107784325A (en) * 2017-10-20 2018-03-09 河北工业大学 Spiral fault diagnosis model based on the fusion of data-driven increment

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Misclassification Cost-Sensitive Software Defect Prediction";Ling Xu 等;《2018 IEEE International Conference on Information Reuse and Integration (IRI)》;20180709;第256-263页 *
"Predicting software defects: A cost-sensitive approach";Miguel E. R. Bezerra 等;《2011 IEEE International Conference on Systems, Man, and Cybernetics》;20111012;第2515-2522页 *
"基于代价敏感学习的软件缺陷预测方法";陆海洋 等;《计算机技术与发展》;20151130;第25卷(第11期);第58-60,66页 *
"类不平衡稀疏重构度量学习软件缺陷预测";史作婷 等;《计算机技术与发展》;20180630;第28卷(第6期);第125-128,136页 *
"跨项目软件缺陷预测方法研究综述";陈翔 等;《计算机学报》;20180131;第41卷(第1期);第254-274页 *

Also Published As

Publication number Publication date
CN109933538A (en) 2019-06-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant