CN108304316A - A kind of Software Defects Predict Methods based on collaboration migration - Google Patents


Publication number
CN108304316A
Authority
CN
China
Prior art keywords
sample
data set
source item
sub
item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711417594.6A
Other languages
Chinese (zh)
Other versions
CN108304316B (en
Inventor
陈晋音
胡可科
杨奕涛
方航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201711417594.6A
Publication of CN108304316A
Application granted
Publication of CN108304316B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Abstract

A software defect prediction method based on collaborative transfer, comprising the following steps: 1) combining four different normalization methods with the TCA transfer learning method to expand each original source project dataset into four new source project datasets of the same size; 2) building a collaborative classifier for the target project using the software defect prediction algorithm based on collaborative transfer; 3) performing defect prediction on new samples to be predicted in the target project. The invention combines four different normalization methods with the TCA transfer learning method to expand the source project datasets and enrich the information they express, generates one sub-classifier for each source project dataset, and adaptively assigns weights to the sub-classifiers with the PSO (particle swarm optimization) algorithm, thereby building a collaborative classifier that predicts defects in the samples under test in the target project.

Description

A software defect prediction method based on collaborative transfer
Technical Field
The invention belongs to the field of software defect prediction algorithms, and in particular relates to a software defect prediction method based on collaborative transfer.
Background
Software defect prediction can be divided into within-project defect prediction and cross-project defect prediction. Within-project defect prediction needs a large number of samples from the project itself (such as files, classes, and functions) that are already known to be defective or not; these serve as a training set from which a machine learning method generates a classifier that then predicts the target samples. Cross-project defect prediction instead predicts defects in the target project from the samples of other related projects. In real development, the target project is often too new, or labels are too costly to obtain, so the target project has too few training samples and cross-project defect prediction is frequently required. In most cross-project defect prediction algorithms, the target project and the source project follow different development processes, so their sample distributions often differ; this is the biggest obstacle to using conventional machine learning algorithms and directly affects prediction performance.
To address the large difference in sample distribution between source and target projects in cross-project defect prediction, transfer learning has been introduced into software defect prediction. Current transfer-learning-based defect prediction algorithms are mainly of two kinds: instance-based and feature-based. The former selects, from the source project, the samples that help prediction on the target project; the latter maps the samples of the source and target projects into the same latent feature space to re-express the features. Both can address the difference in sample distribution between source and target projects. Turhan et al. used the K-nearest-neighbor method to select, for each unlabeled sample in the target project, the ten most similar source project samples as training samples for the prediction model. Similarly to the method proposed by Turhan, Peters et al. also used the nearest-neighbor method to select training samples, but with a different selection strategy. Ma et al. proposed the Transfer Naive Bayes (TNB) method, which reduces the data distribution difference between source and target projects by assigning weights to the samples in the training set and then builds the prediction model from the weighted training samples. Ryul et al. combined Boosting-SVM with a solution to the class imbalance problem to improve on the performance of TNB. Besides these instance-based approaches, Pan et al. proposed a feature-based transfer learning method, Transfer Component Analysis (TCA), which learns a transformation that maps the source and target projects into a latent space in which the distance between them is as small as possible. Building on TCA, Nam et al. observed that different normalization methods strongly affect the transfer result, and therefore designed a set of rules for selecting a suitable normalization method to combine with TCA, proposing the TCA+ method. All of the above transfer learning methods, however, target one-to-one cross-project defect prediction. When it cannot be determined which related source project predicts the target project best, predicting with only one source project greatly wastes the others. How to make effective use of the sample information of the other source projects, that is, multi-source transfer learning, is therefore an important problem.
For multi-source transfer learning, the most effective approach at present is to generate one classifier for each source project and then combine the classifiers to complete the transfer task. Schweikert et al. used a method called Multiple Convex Combination to combine the SVM classifiers generated from each source domain and from the labeled target-domain data. Sun et al. proposed a method that needs no labeled target samples: based on Bayesian learning principles, it assigns weights according to how well each source domain fits the target domain, where the fit is measured by the average Euclidean distance between the k nearest samples of the source and target domains. Yang et al., building on support vector machines (SVM) and combining an adaptive function, proposed an adaptive support vector regression that can be used for online monitoring of the target domain, but the weights of all sub-classifiers are equal. None of these algorithms, however, has been applied to software defect prediction.
In summary, current software defect prediction algorithms face the following problems. Transfer learning is essential to cross-project defect prediction, so how can a transfer learning algorithm make full use of the useful information in the source projects to improve defect prediction on the target project? Different source projects predict the target project with different quality; when it cannot be determined which source project performs best, how can all other related source projects be considered together to improve on one-to-one cross-project defect prediction? To address these problems, a software defect prediction algorithm based on collaborative transfer is proposed here.
Summary of the Invention
Current software defect prediction algorithms face the following problems. Transfer learning is essential to cross-project defect prediction, so how can a transfer learning algorithm make full use of the useful information in the source projects to improve defect prediction on the target project? Different source projects predict the target project with different quality, so how can all other related source projects be considered together to improve on one-to-one cross-project defect prediction? The invention provides a software defect prediction method based on collaborative transfer. It combines four different normalization methods with the TCA transfer learning method to expand the source project datasets and enrich the information they express, generates one sub-classifier for each source project dataset, and adaptively assigns weights to the sub-classifiers with the PSO algorithm, thereby building a collaborative classifier that predicts defects in the samples under test in the target project.
The technical solution adopted by the invention to solve the technical problem is:
A software defect prediction method based on collaborative transfer, the method comprising the following steps:
1) Expand the source project datasets by combining the original source project datasets with four different normalization methods and the TCA transfer learning method, as follows:
1.1) First, divide the labeled target samples in the target project evenly into a target training set and a target test set, both required to contain the same number of defective samples;
1.2) Apply four kinds of normalization to all related source project datasets combined with the target test set, the four methods being min-max normalization, Z-score normalization based on the common mean and standard deviation of the source and target domains, Z-score normalization based on the source-domain mean and standard deviation, and Z-score normalization based on the target-domain mean and standard deviation;
1.3) Call the TCA algorithm to perform transfer learning, with respect to the target test set, on the source project datasets after each of the four normalizations and on the original unprocessed source project dataset, obtaining the expanded new source project datasets and target test sets;
2) Build a collaborative classifier for the target project using the software defect prediction algorithm based on collaborative transfer, as follows:
2.1) For each source project dataset in the expanded datasets, and for the target training set, generate one sub-classifier using a decision tree algorithm from machine learning;
2.2) Adaptively assign a weight to each sub-classifier to obtain a collaborative classifier;
3) Perform defect prediction on new samples to be predicted in the target project, as follows:
3.1) Preprocess the new samples, the preprocessing consisting of normalization and transfer learning;
3.2) Call the trained collaborative classifier to classify each preprocessed new sample and predict whether it contains defects.
Further, step 1.3) proceeds as follows:
1.3.1) Determine the dimension of the re-expressed features obtained after TCA transfer, i.e. the dimension of the latent feature space;
1.3.2) According to the chosen latent-space dimension, determine a transformation by means of a Gaussian kernel function such that, after the source project dataset and the target dataset are transformed from the original feature space into the latent feature space, the distribution difference between them is reduced;
1.3.3) Expand the original N source project datasets and 1 target test set into 5*N source project datasets and the corresponding 4*N+1 target test sets.
Further, step 2.2) proceeds as follows:
2.2.1) First define the collaborative classifier and the objective function:
Definition 1 (collaborative classifier): the classifier obtained by combining all sub-classifiers, weighted according to their respective contributions, is the collaborative classifier. The collaborative classifier classifies a new sample j as follows:
$$Comp(j) = \sum_{i=1}^{M} w_i \cdot Score_i(j) \tag{1}$$
$$Label(j) = \begin{cases} \text{defective}, & Comp(j) > threshold \\ \text{defect-free}, & \text{otherwise} \end{cases} \tag{2}$$
where Score_i(j) is the confidence given by sub-classifier C_i, i.e. the likelihood that sample j is defective, in the range 0 to 1; w_i is the weight of each sub-classifier, representing its contribution to the collaborative classifier; M is the number of sub-classifiers; and threshold is the confidence threshold for judging whether the sample is defective. If Comp(j), the weighted sum of the confidences of all sub-classifiers, exceeds the threshold, the sample is classified as defective; otherwise it is classified as defect-free.
Definition 2 (objective function): the adaptive weight assignment is an optimization process that uses the F-measure as its objective function, computed as:
F=(2 × P × R)/(P+R) (3)
P=TP/ (TP+FP) (4)
R=TP/ (TP+FN) (5)
where TP is the number of true positives, i.e. samples predicted defective that really are defective; FP is the number of false positives, i.e. samples predicted defective that are actually defect-free; and FN is the number of false negatives, i.e. samples predicted defect-free that actually contain defects. From these, P is the precision of the classification, the proportion of samples predicted defective that really are defective; the higher it is, the more accurate the classifier. R is the recall of the classification, the proportion of truly defective samples that are predicted defective; the higher it is, the more defective samples are found. The F-measure is the harmonic mean of precision and recall; the higher it is, the more completely and accurately the collaborative classifier built from this set of weights and threshold finds the defective samples, i.e. the better its prediction performance.
2.2.2) The PSO algorithm is introduced into the adaptive weight assignment. First, a set of particles is generated at random according to the population size to initialize the swarm. One combination of weights and threshold is one solution, and the set of all solutions is represented as a swarm in a search space. The position of a particle is described by a series of coordinate values, each representing one part of a solution, i.e. a weight value or the threshold.
2.2.3) Compute the fitness of each particle. Fitness here is the prediction performance on the target test set of the collaborative classifier formed with this set of weights and threshold, measured by the F-measure.
2.2.4) Update the position and velocity of each particle according to the best position the particle itself has visited and the best position the whole swarm has visited, i.e. the weight assignment and threshold setting at which the F-measure obtained was largest; the velocity represents the distance and direction of the particle's movement.
2.2.5) Return to step 2.2.3) and iterate until the maximum number of iterations is reached, then output the position of the particle with the largest F-measure in the swarm as the optimal weights and threshold.
2.2.6) Build all sub-classifiers into the final collaborative classifier according to the optimal weights and threshold;
In step 1.3), four different normalization methods are each combined with the TCA transfer learning method to expand the source project datasets, enriching the information the source project data express. This is the first time that multiple normalization methods combined with transfer learning have been applied to a software defect prediction algorithm, and the transfer performance is substantially better than that of other methods.
In step 2), the software defect prediction algorithm based on collaborative transfer makes full use of the different information that each source project expresses after processing by the several normalization methods, and so builds more comprehensive sub-classifiers. It adaptively assigns a weight to each sub-classifier according to prediction performance on the target test set and builds the collaborative classifier from them, thereby optimizing multi-source transfer learning and ultimately improving cross-project software defect prediction performance.
In step 3), when predicting defects for each new sample under test in the target project, the sample is first preprocessed together with the other related source projects. The preprocessing consists of the several normalization methods and TCA transfer learning, and provides a new training set for each sub-classifier. The new sample is then classified using the threshold and weights of the trained collaborative classifier, which accomplishes cross-project defect prediction for each target sample under test.
The technical concept of the invention is as follows. The software defect prediction algorithm based on collaborative transfer first combines the TCA algorithm with several normalization methods to fully extract the rich information in the source project datasets and to reduce the data distribution difference between source and target projects, expanding the source project datasets in this way. Next, a sub-classifier is trained with a decision tree algorithm on each dataset in the expanded source project datasets; for the same target sample under test, each sub-classifier gives a confidence that the sample is defective. Then the software defect prediction algorithm based on collaborative transfer is called to obtain a collaborative classifier, which combines the sub-classifiers weighted by their contribution to it. Finally, after the target sample under test has been preprocessed by the combination of normalization methods and TCA, the trained collaborative classifier is called to predict defects.
The beneficial effects of the invention are mainly as follows. Combining the TCA algorithm with several normalization methods reduces the sample distribution difference between source and target projects as far as possible while making full use of all the information the source projects can provide. The newly generated datasets expand the source project datasets. The collaborative classifier obtained by the software defect prediction algorithm based on collaborative transfer can use all related source projects to predict defects in the target project, and is called to perform cross-project defect prediction on the target samples under test. The algorithm was tested on 5 project collections totaling 28 software projects; the results show that it can make full use of all source project information and effectively improves prediction performance.
Brief Description of the Drawings
Fig. 1 is a structure diagram of the software defect prediction method based on collaborative transfer.
Fig. 2 is a flow chart of the software defect prediction method based on collaborative transfer.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1 and Fig. 2, a software defect prediction method based on collaborative transfer comprises the following steps:
1) Combine four different normalization methods with the TCA transfer learning method to expand each original source project dataset into four new source project datasets of the same size, as follows:
1.1) First, divide the labeled target samples in the target project into two parts according to their labels: a target training set and a target test set. The two sets must contain the same number of samples of each label, and both must contain defective samples. The remaining unlabeled samples in the target project serve as the target samples under test;
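The split in step 1.1) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_labeled_target`, the label encoding (1 for defective, 0 for defect-free), the random shuffling, and the choice to drop an odd leftover sample of a class are all assumptions made here.

```python
import random


def split_labeled_target(samples, labels, seed=0):
    """Split labeled target samples into a training half and a test half.

    Each class is shuffled and dealt out evenly, so both halves contain the
    same number of defective (1) and defect-free (0) samples. An odd
    leftover sample of a class is dropped to keep the counts equal
    (an assumption of this sketch).
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        half = len(idx) // 2
        train += [(samples[i], cls) for i in idx[:half]]
        test += [(samples[i], cls) for i in idx[half:2 * half]]
    # Both halves must contain defective samples.
    if not any(y == 1 for _, y in train):
        raise ValueError("need at least two defective samples to split")
    return train, test
```

With three defective and seven defect-free labeled samples, each half receives one defective and three defect-free samples.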
1.2) For all source datasets currently related to the target project, normalize them together with the target test set, using 4 normalization methods:
The first method is min-max normalization, computed as follows:
$$x'_i = \frac{x_i - \min(x)}{\max(x) - \min(x)} \tag{1}$$
The second method is Z-score normalization based on the common mean and standard deviation of the source project and the target project, computed as follows:
$$x'_i = \frac{x_i - mean(x)}{std(x)} \tag{2}$$
The third method is Z-score normalization based on the mean and standard deviation of the source project, computed as follows:
$$x'_i = \frac{x_i - mean(x_{src})}{std(x_{src})} \tag{3}$$
The fourth method is Z-score normalization based on the mean and standard deviation of the target project, computed as follows:
$$x'_i = \frac{x_i - mean(x_{tar})}{std(x_{tar})} \tag{4}$$
where x is the vector of values of one feature over the dataset obtained by merging the source project with the target training set, x_{src} and x_{tar} are the parts of x that come from the source project and the target project respectively, x_i is the value of the i-th sample in x, min() takes the minimum, max() takes the maximum, mean() takes the mean, std() takes the standard deviation, and x'_i is the normalized value of x_i. After the four methods re-express the original data, the information each version carries differs;
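The four normalizations of step 1.2) can be sketched for a single feature column as below. This is an illustrative reading under stated assumptions: the helper names are hypothetical, the population standard deviation (`pstdev`) is used because the text does not say which estimator is meant, and a zero range or zero standard deviation is replaced by 1.0 to avoid division by zero.

```python
from statistics import mean, pstdev


def min_max(x):
    """Min-max normalization of one feature column."""
    lo, hi = min(x), max(x)
    span = (hi - lo) or 1.0  # guard against a constant column (assumption)
    return [(v - lo) / span for v in x]


def z_score(x, ref):
    """Z-score normalization of x using the mean and std of the values in ref."""
    mu, sd = mean(ref), pstdev(ref) or 1.0
    return [(v - mu) / sd for v in x]


def four_normalizations(src, tar):
    """Return the four normalized versions of the merged column src + tar."""
    merged = src + tar
    return {
        "min_max": min_max(merged),            # method 1
        "z_common": z_score(merged, merged),   # method 2: common statistics
        "z_source": z_score(merged, src),      # method 3: source statistics
        "z_target": z_score(merged, tar),      # method 4: target statistics
    }
```

Each call yields four re-expressions of the same column, which is what lets one source project dataset give rise to four additional datasets of the same size.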
1.3) Call the TCA algorithm to perform transfer learning, with respect to the corresponding target test set, on each source project dataset after the above 4 normalizations and on the original unprocessed source project dataset, obtaining new source project datasets and target test sets, as follows:
1.3.1) Determine the dimension of the re-expressed features of the dataset obtained after TCA transfer, i.e. the dimension of the latent feature space, which is set to half the original dimension;
1.3.2) According to the chosen latent-space dimension, determine a transformation by means of a Gaussian kernel function such that, after the source project dataset and the target dataset are transformed from the original feature space into the latent feature space, their maximum mean discrepancy (MMD) is minimized. The maximum mean discrepancy is computed as:
$$MMD(src, tar) = \left\| \frac{1}{n_1} \sum_{i=1}^{n_1} \phi(src_i) - \frac{1}{n_2} \sum_{i=1}^{n_2} \phi(tar_i) \right\|^2 \tag{5}$$
where src is the source project dataset, tar is the target project dataset, n_1 is the number of samples in the source project dataset, n_2 is the number of samples in the target project dataset, src_i is the i-th sample in the source project, tar_i is the i-th sample in the target project, and φ(·) is the mapping into the latent feature space;
1.3.3) Expand the original N source project datasets and 1 target test set into 5*N source project datasets and the corresponding 4*N+1 target test sets;
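The quantity that step 1.3.2) minimizes can be illustrated with a small sketch. The code below does not implement TCA itself (learning the projection is not shown); it only evaluates a squared maximum mean discrepancy between two sample sets through the Gaussian kernel trick, as an assumed reading of the criterion. The function names and the bandwidth `sigma=1.0` are assumptions.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def mmd_squared(src, tar, sigma=1.0):
    """Squared MMD between two sample sets, expanded with the kernel trick.

    ||mean(phi(src)) - mean(phi(tar))||^2
        = mean k(src, src) + mean k(tar, tar) - 2 * mean k(src, tar)
    """
    n1, n2 = len(src), len(tar)
    k_ss = sum(gaussian_kernel(a, b, sigma) for a in src for b in src) / (n1 * n1)
    k_tt = sum(gaussian_kernel(a, b, sigma) for a in tar for b in tar) / (n2 * n2)
    k_st = sum(gaussian_kernel(a, b, sigma) for a in src for b in tar) / (n1 * n2)
    return k_ss + k_tt - 2.0 * k_st
```

A value near zero means the two sample sets look alike under the kernel; a transfer step that works as intended should lower this value for a source project dataset and the target test set.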
2) Build a collaborative classifier for the target project using the software defect prediction algorithm based on collaborative transfer, as follows:
2.1) For each source project dataset in the expanded datasets, and for the target training set, generate one sub-classifier using a decision tree algorithm from machine learning; the decision tree algorithm chosen is the J48 algorithm in WEKA;
2.2) Adaptively assign a weight to each sub-classifier according to the performance of the collaborative classifier, as follows:
2.2.1) First define the collaborative classifier and the objective function:
Definition 1 (collaborative classifier): the classifier obtained by combining all sub-classifiers, weighted according to their respective contributions, is the collaborative classifier. The collaborative classifier classifies a new sample j as follows:
$$Comp(j) = \sum_{i=1}^{M} w_i \cdot Score_i(j) \tag{6}$$
$$Label(j) = \begin{cases} \text{defective}, & Comp(j) > threshold \\ \text{defect-free}, & \text{otherwise} \end{cases} \tag{7}$$
where Score_i(j) is the confidence given by sub-classifier C_i, i.e. the likelihood that sample j is defective, in the range 0 to 1; w_i is the weight of each sub-classifier, representing its contribution to the collaborative classifier; M is the number of sub-classifiers; and threshold is the confidence threshold for judging whether the sample is defective. If Comp(j), the weighted sum of the confidences of all sub-classifiers, exceeds the threshold, the sample is classified as defective; otherwise it is classified as defect-free;
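Definition 1 amounts to a weighted vote over the sub-classifier confidences. A minimal sketch, assuming the confidences Score_i(j) have already been produced by the sub-classifiers; the function names and the 1/0 output encoding are hypothetical.

```python
def comp_score(scores, weights):
    """Weighted sum of sub-classifier confidences, Comp(j)."""
    return sum(w * s for w, s in zip(weights, scores))


def classify(scores, weights, threshold):
    """Return 1 (defective) if Comp(j) exceeds the threshold, else 0."""
    return 1 if comp_score(scores, weights) > threshold else 0


# Three sub-classifiers give confidences for one sample j.
scores = [0.9, 0.2, 0.6]
weights = [0.5, 0.1, 0.4]
print(classify(scores, weights, threshold=0.5))
```

Here the first sub-classifier dominates because of its larger weight, which is the sense in which the combination is weighted by contribution.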
Definition 2 (objective function): the adaptive weight assignment is an optimization process that uses the F-measure as its objective function, computed as:
F=(2 × P × R)/(P+R) (8)
P=TP/ (TP+FP) (9)
R=TP/ (TP+FN) (10)
where TP is the number of true positives, i.e. samples predicted defective that really are defective; FP is the number of false positives, i.e. samples predicted defective that are actually defect-free; and FN is the number of false negatives, i.e. samples predicted defect-free that actually contain defects. From these, P is the precision of the classification, the proportion of samples predicted defective that really are defective; the higher it is, the more accurate the classifier. R is the recall of the classification, the proportion of truly defective samples that are predicted defective; the higher it is, the more defective samples are found. The F-measure is the harmonic mean of precision and recall; the higher it is, the more completely and accurately the collaborative classifier built from this set of weights and threshold finds the defective samples, i.e. the better its prediction performance.
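The objective function of Definition 2 can be computed directly from true and predicted labels. A small sketch, assuming labels are encoded as 1 (defective) and 0 (defect-free); returning 0.0 when a denominator is zero is a convention chosen here, not something the text specifies.

```python
def f_measure(y_true, y_pred):
    """F-measure (harmonic mean of precision and recall) for the defective class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

For example, with true labels [1, 1, 0, 0, 1] and predictions [1, 0, 1, 0, 1] there are TP = 2, FP = 1, FN = 1, so P = R = 2/3 and F = 2/3.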
2.2.2) The PSO algorithm is used to assign weights to the sub-classifiers adaptively: all sub-classifiers together receive a series of weights (w_1, w_2, ..., w_M) and one defect decision threshold, threshold. First set the population size and the maximum number of iterations, then generate a set of particles at random according to the population size to initialize the swarm. One combination of weights and threshold is one solution, and the set of all solutions is represented as a swarm in a search space. The position of a particle is described by a series of coordinate values, each representing one part of a solution, i.e. a weight value or the threshold.
2.2.3) Compute the fitness of each particle. Fitness here is the prediction performance on the target test set of the collaborative classifier formed with this set of weights and threshold, expressed by the F-measure and computed as in Definition 2 of step 2.2.1).
2.2.4) Then update the position and velocity of each particle according to the best position the particle itself has visited and the best position the whole swarm has visited, i.e. the weight assignment and threshold setting at which the F-measure obtained was largest; the velocity represents the distance and direction of the particle's movement.
2.2.5) Return to step 2.2.3) and iterate until the maximum number of iterations is reached, then output the position of the particle with the largest F-measure in the swarm as the optimal weights and threshold.
2.2.6) Build all sub-classifiers into the final collaborative classifier according to the optimal weights and threshold.
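Steps 2.2.2) to 2.2.6) can be sketched as a plain particle swarm search over the weights and the threshold. This is a generic PSO under assumptions the text does not fix: every coordinate is confined to [0, 1], the inertia weight is 0.7, both acceleration coefficients are 1.5, and the sub-classifier confidences on the target test set are precomputed in `score_matrix`. All names are hypothetical.

```python
import random


def f_measure(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2.0 * tp / (2 * tp + fp + fn) if tp else 0.0


def fitness(position, score_matrix, y_true):
    """position = [w_1, ..., w_M, threshold]; score_matrix[j][i] = Score_i(j)."""
    *weights, threshold = position
    y_pred = [1 if sum(w * s for w, s in zip(weights, row)) > threshold else 0
              for row in score_matrix]
    return f_measure(y_true, y_pred)


def pso_weights(score_matrix, y_true, n_particles=20, n_iter=50,
                inertia=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(score_matrix[0]) + 1  # M weights plus one threshold
    # 2.2.2) random initialization of the swarm
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # 2.2.3) initial fitness of each particle
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, score_matrix, y_true) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):  # 2.2.5) iterate up to the maximum count
        for k in range(n_particles):
            for d in range(dim):
                # 2.2.4) move toward the personal and swarm best positions
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (inertia * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] = min(1.0, max(0.0, pos[k][d] + vel[k][d]))
            fit = fitness(pos[k], score_matrix, y_true)
            if fit > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[k][:], fit
    # 2.2.6) best weights, best threshold, and the F-measure they achieve
    return gbest[:-1], gbest[-1], gbest_fit
```

The returned weights and threshold plug into the weighted vote of Definition 1. The F-measure used here is the algebraically equivalent form 2TP / (2TP + FP + FN).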
3) Perform defect prediction on new samples to be predicted in the target project, as follows:
3.1) Preprocess the new samples, the preprocessing consisting of the four normalization methods and TCA transfer learning;
3.2) Call the trained collaborative classifier to classify each preprocessed new sample and predict whether it contains defects.

Claims (3)

1. a kind of Software Defects Predict Methods based on collaboration migration, it is characterised in that:It the described method comprises the following steps:
1) by four kinds of different standardized methods and TCA transfer learnings method in combination with by the expansion of former source item data set Into four new same size source item data sets, process is as follows:
1.1) known class target sample in destination item first, is divided into two parts according to category:Target training set and target Test set, wherein require the similar mark sample number of the two identical and must all contain defective sample, other nothings in destination item The sample of category is as target sample to be tested;
1.2) for current all and relevant set of source data of destination item, combining target test set is standardized, adopts With 4 kinds of standardization processing methods:
First method is maxmin criterion, and computational methods are as follows:
Second method is to be standardized based on the common average value of source item and destination item and the Z-score of standard deviation, is calculated Method is as follows:
The third method is to be standardized based on source item average value and the Z-score of standard deviation, and computational methods are as follows:
Fourth method is to be standardized based on destination item average value and the Z-score of standard deviation, and computational methods are as follows:
Wherein, x represents the vector expression of certain one-dimensional characteristic in the data set after source item merges with target training set, xiIt represents in x The value of i-th of sample, min () are to be minimized, and max () is to be maximized, and mean () is to be averaged, and std () is to take mark Poor, the x of standardi' it is xiIt is normalized treated value;
1.3) Call the TCA algorithm on each of the source project data sets produced by the 4 standardization methods above, and on the original unprocessed source project data set, performing transfer learning against the corresponding target test set to obtain new source project data sets and target test sets;
2) Build a collaborative classifier for the target project using the software defect prediction algorithm based on collaborative migration, as follows:
2.1) For each source project data set in the expanded data sets and for the target training set, generate one sub-classifier using a decision tree algorithm from machine learning; the decision tree algorithm is the J48 algorithm in WEKA;
2.2) Assign an adaptive weight to each sub-classifier according to the performance of the collaborative classifier;
3) Perform defect prediction on new samples to be predicted in the target project, as follows:
3.1) Preprocess the new samples; the preprocessing consists of the four standardization methods and TCA transfer learning;
3.2) Call the trained collaborative classifier to classify each preprocessed new sample and predict whether it contains defects.
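The four standardization methods of step 1.2) can be sketched as follows. This is a minimal illustration for a single feature column, not the patented implementation: the function names (`min_max`, `z_score`, `four_normalisations`) are hypothetical, the population standard deviation is an assumption (the claim does not say which estimator is used), and constant columns are mapped to 0.0 as an assumed convention.

```python
from statistics import mean, pstdev


def min_max(x):
    """Min-max standardization of one feature column to the range [0, 1]."""
    lo, hi = min(x), max(x)
    span = hi - lo
    # Assumed convention: a constant column is mapped to 0.0.
    return [(v - lo) / span if span else 0.0 for v in x]


def z_score(x, ref):
    """Z-score of x using the mean and standard deviation of ref."""
    m, s = mean(ref), pstdev(ref)  # population std is an assumption
    return [(v - m) / s if s else 0.0 for v in x]


def four_normalisations(src, tar):
    """Apply the four methods to one feature; src and tar are value lists."""
    both = src + tar  # merged source and target values of this feature
    return {
        "min_max": min_max(both),         # method 1
        "z_both": z_score(both, both),    # method 2: common mean and std
        "z_source": z_score(both, src),   # method 3: source mean and std
        "z_target": z_score(both, tar),   # method 4: target mean and std
    }


if __name__ == "__main__":
    for name, values in four_normalisations([1.0, 2.0, 3.0], [4.0, 5.0]).items():
        print(name, [round(v, 3) for v in values])
```

Each returned list keeps the source values first and the target values after them, so the two parts can be split again before TCA is applied in step 1.3).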
2. The software defect prediction method based on collaborative migration according to claim 1, characterized in that the process of step 1.3) is as follows:
1.3.1) Determine the feature dimension of the re-expressed data set obtained after TCA migration, i.e. the dimension of the latent feature space, which is set to half of the original dimension;
1.3.2) According to the chosen latent space dimension, determine a transformation by means of a Gaussian kernel function such that, after the source project data set and the target data set are transformed from the original feature space into the latent feature space, the maximum mean discrepancy between the two is minimal; the maximum mean discrepancy is calculated as:
where src is the source project data set, tar is the target project data set, n1 is the number of samples in the source project data set, n2 is the number of samples in the target project data set, src_i is the i-th sample in the source project, and tar_i is the i-th sample in the target project;
1.3.3) Expand the original N source project data sets and 1 target test set into 5*N source project data sets and the corresponding 4*N+1 target test sets.
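The quantity that step 1.3.2) minimizes can be illustrated with a small sketch. It only evaluates a maximum mean discrepancy between two sample sets and does not implement TCA itself. The squared, biased estimator with a Gaussian kernel is an assumed standard form, and the names `gaussian_kernel`, `mmd_squared` and the bandwidth `sigma` are hypothetical.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(src, tar, sigma=1.0):
    """Biased estimate of the squared maximum mean discrepancy.

    src and tar are lists of feature vectors (n1 and n2 samples).
    """
    n1, n2 = len(src), len(tar)
    k_ss = sum(gaussian_kernel(a, b, sigma) for a in src for b in src) / (n1 * n1)
    k_tt = sum(gaussian_kernel(a, b, sigma) for a in tar for b in tar) / (n2 * n2)
    k_st = sum(gaussian_kernel(a, b, sigma) for a in src for b in tar) / (n1 * n2)
    return k_ss + k_tt - 2.0 * k_st


if __name__ == "__main__":
    source = [[0.0, 0.0], [1.0, 0.0]]
    target = [[5.0, 5.0], [6.0, 5.0]]
    print(mmd_squared(source, source))  # identical sets: close to zero
    print(mmd_squared(source, target))  # well-separated sets: clearly positive
```

A value near zero indicates that the two sets look alike under the kernel, which is the state the learned transformation aims for in the latent space.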
3. The software defect prediction method based on collaborative migration according to claim 1 or 2, characterized in that the process of step 2.2) is as follows:
2.2.1) First define the collaborative classifier and the index used to evaluate its quality, the F-measure:
Definition 1: The classifier obtained by combining all sub-classifiers, each weighted according to its own contribution, is the collaborative classifier; for a new sample j, the collaborative classifier classifies in the following manner:
Comp(j) = w_1 × Score_1(j) + w_2 × Score_2(j) + ... + w_M × Score_M(j)
where Score_i(j) is the confidence given by sub-classifier C_i, i.e. the likelihood that sample j is a defective sample, with the confidence lying in the interval from 0 to 1; w_i is the weight of each sub-classifier and expresses the contribution of that sub-classifier to the collaborative classifier; M is the number of sub-classifiers; and threshold is the confidence threshold for judging whether the sample contains defects: if the weighted sum of the confidences of all sub-classifiers, Comp(j), is greater than the threshold, the sample is classified as defective, otherwise as defect-free;
Definition 2: Adaptive weight assignment is an optimization process that uses the F-measure as the objective function, calculated as:
F=(2 × P × R)/(P+R) (8)
P=TP/ (TP+FP) (9)
R=TP/ (TP+FN) (10)
where TP is the number of true positives, i.e. samples predicted as defective that actually contain defects; FP is the number of false positives, i.e. samples predicted as defective that are actually defect-free; FN is the number of false negatives, i.e. samples predicted as defect-free that actually contain defects. From these, P is the precision of the classification, i.e. the proportion of samples predicted as defective that are truly defective; the higher this value, the more precise the classifier. R is the recall of the classification, i.e. the proportion of truly defective samples that are predicted as defective. The F-measure is the harmonic mean of precision and recall;
2.2.2) Use the PSO algorithm for the adaptive weight assignment of the sub-classifiers, assigning to all sub-classifiers a series of weights (w1,w2,..,wn) and one defect decision threshold, threshold. First set the population size and the maximum number of iterations, then randomly generate a series of particles according to the population size to initialize the population. One combination of weights and threshold is one solution, and the set of all solutions is represented as the population in a search space. The position of a particle is described by a series of coordinate values, each of which represents one part of a solution, i.e. a weight value or the threshold;
2.2.3) Calculate the fitness of each particle; the fitness is the prediction performance on the target test set of the collaborative classifier formed with this set of weights and threshold, expressed as the F-measure;
2.2.4) Then update the position and velocity of each particle according to the best position the particle itself has visited and the best position the whole population has visited, i.e. the weight assignment and threshold setting at which the maximum F-measure was obtained; the velocity expresses the distance and direction in which the particle moves;
2.2.5) Return to step 2.2.3) and iterate until the maximum number of iterations is reached; output the position of the particle in the population that obtained the maximum F-measure value as the optimal weights and threshold;
2.2.6) According to the optimal weights and threshold, combine all sub-classifiers into the final collaborative classifier.
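Steps 2.2.1) to 2.2.6) can be sketched together as a small particle swarm search over the weights and the threshold. This is a hedged illustration, not the patented implementation: the inertia and acceleration constants, the clamping of every coordinate to [0, 1], the fixed random seed, and all function names are assumptions, and the sub-classifier confidences are passed in as plain lists rather than produced by J48.

```python
import random


def f_measure(pred, truth):
    """F-measure from boolean predictions and labels (True = defective)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def classify(scores, weights, threshold):
    """scores[j][i] is the confidence of sub-classifier i for sample j."""
    return [sum(w * s for w, s in zip(weights, row)) > threshold
            for row in scores]


def fitness(position, scores, truth):
    """A position holds the M weights followed by the threshold."""
    *weights, threshold = position
    return f_measure(classify(scores, weights, threshold), truth)


def pso(scores, truth, swarm_size=20, iterations=50, seed=0,
        inertia=0.7, c1=1.5, c2=1.5):
    """Search weights and threshold that maximize the F-measure."""
    rng = random.Random(seed)
    dim = len(scores[0]) + 1
    pos = [[rng.random() for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    best_pos = [p[:] for p in pos]                     # per-particle best
    best_fit = [fitness(p, scores, truth) for p in pos]
    g = max(range(swarm_size), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g][:], best_fit[g]         # population best
    for _ in range(iterations):
        for k in range(swarm_size):
            for d in range(dim):
                vel[k][d] = (inertia * vel[k][d]
                             + c1 * rng.random() * (best_pos[k][d] - pos[k][d])
                             + c2 * rng.random() * (g_pos[d] - pos[k][d]))
                # Assumed bound: keep every coordinate inside [0, 1].
                pos[k][d] = min(1.0, max(0.0, pos[k][d] + vel[k][d]))
            fit = fitness(pos[k], scores, truth)
            if fit > best_fit[k]:
                best_fit[k], best_pos[k] = fit, pos[k][:]
                if fit > g_fit:
                    g_fit, g_pos = fit, pos[k][:]
    return g_pos[:-1], g_pos[-1], g_fit


if __name__ == "__main__":
    demo_scores = [[0.9, 0.1], [0.8, 0.9], [0.2, 0.8], [0.1, 0.2]]
    demo_truth = [True, True, False, False]
    print(pso(demo_scores, demo_truth))
```

The returned tuple holds the weights, the threshold, and the F-measure they reach on the supplied samples; in the method described above those samples would be the target test set.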
CN201711417594.6A 2017-12-25 2017-12-25 Software defect prediction method based on collaborative migration Active CN108304316B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711417594.6A CN108304316B (en) 2017-12-25 2017-12-25 Software defect prediction method based on collaborative migration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711417594.6A CN108304316B (en) 2017-12-25 2017-12-25 Software defect prediction method based on collaborative migration

Publications (2)

Publication Number Publication Date
CN108304316A true CN108304316A (en) 2018-07-20
CN108304316B CN108304316B (en) 2021-04-06

Family

ID=62871017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711417594.6A Active CN108304316B (en) 2017-12-25 2017-12-25 Software defect prediction method based on collaborative migration

Country Status (1)

Country Link
CN (1) CN108304316B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102637143A (en) * 2012-03-07 2012-08-15 南京邮电大学 Software defect priority prediction method based on improved support vector machine
CN102663100A (en) * 2012-04-13 2012-09-12 西安电子科技大学 Two-stage hybrid particle swarm optimization clustering method
CN103810101A (en) * 2014-02-19 2014-05-21 北京理工大学 Software defect prediction method and system
KR101746328B1 (en) * 2016-01-29 2017-06-12 한국과학기술원 Hybrid instance selection method using nearest-neighbor for cross-project defect prediction
CN106991047A (en) * 2017-03-27 2017-07-28 中国电力科学研究院 A kind of method and system for being predicted to object-oriented software defect

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
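JAECHANG NAM: "Transfer Defect Learning", 2013 35th International Conference on Software Engineering (ICSE) *
何吉元 et al.: "A semi-supervised ensemble cross-project software defect prediction method", Journal of Software (《软件学报》) *
郝世锦: "Software defect prediction model based on defect stratification and the PSO algorithm", Software (《软件》) *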

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325543A (en) * 2018-10-10 2019-02-12 南京邮电大学 Software Defects Predict Methods, readable storage medium storing program for executing and terminal
CN110825644A (en) * 2019-11-11 2020-02-21 南京邮电大学 Cross-project software defect prediction method and system
CN111131248B (en) * 2019-12-24 2021-09-24 南方电网电力科技股份有限公司 Website application security defect detection model modeling method and defect detection method
CN111131248A (en) * 2019-12-24 2020-05-08 广东电科院能源技术有限责任公司 Website application security defect detection model modeling method and defect detection method
CN111367801A (en) * 2020-02-29 2020-07-03 杭州电子科技大学 Data transformation method for cross-company software defect prediction
CN111881048A (en) * 2020-07-31 2020-11-03 武汉理工大学 Cross-project software aging defect prediction method
CN112347392A (en) * 2020-10-21 2021-02-09 上海淇玥信息技术有限公司 Anti-fraud assessment method and device based on transfer learning and electronic equipment
CN112527670A (en) * 2020-12-18 2021-03-19 武汉理工大学 Method for predicting software aging defects in project based on Active Learning
CN112651950A (en) * 2020-12-30 2021-04-13 珠海碳云智能科技有限公司 Data processing method, sample classification method, model training method and device
CN112651950B (en) * 2020-12-30 2023-09-29 珠海碳云诊断科技有限公司 Data processing method, sample classification method, model training method and device
CN113268434A (en) * 2021-07-08 2021-08-17 北京邮电大学 Software defect prediction method based on Bayesian model and particle swarm optimization
CN113268434B (en) * 2021-07-08 2022-07-26 北京邮电大学 Software defect prediction method based on Bayes model and particle swarm optimization
CN114757305A (en) * 2022-06-13 2022-07-15 华中科技大学 Voltage transformer insulation fault identification method and system based on ensemble learning
CN114757305B (en) * 2022-06-13 2022-09-20 华中科技大学 Voltage transformer insulation fault identification method and system based on ensemble learning
CN117421244A (en) * 2023-11-17 2024-01-19 北京邮电大学 Multi-source cross-project software defect prediction method, device and storage medium

Also Published As

Publication number Publication date
CN108304316B (en) 2021-04-06

Similar Documents

Publication Publication Date Title
CN108304316A (en) A kind of Software Defects Predict Methods based on collaboration migration
Zhao et al. Local binary pattern-based adaptive differential evolution for multimodal optimization problems
CN109816092B (en) Deep neural network training method and device, electronic equipment and storage medium
Stoean et al. Modeling medical decision making by support vector machines, explaining by rules of evolutionary algorithms with feature selection
CN112507996B (en) Face detection method of main sample attention mechanism
CN110059852A (en) A kind of stock yield prediction technique based on improvement random forests algorithm
CN105095494A (en) Method for testing categorical data set
CN108388925A (en) The anti-pattern collapse robust image generation method for generating network is fought based on New Conditions
Zhang et al. Feat: A fairness-enhancing and concept-adapting decision tree classifier
Das et al. An oversampling technique by integrating reverse nearest neighbor in SMOTE: Reverse-SMOTE
Wozniak et al. Designing combining classifier with trained fuser—Analytical and experimental evaluation
CN112819063A (en) Image identification method based on improved Focal loss function
CN113011513B (en) Image big data classification method based on general domain self-adaption
CN111797935B (en) Semi-supervised depth network picture classification method based on group intelligence
CN111445025B (en) Method and device for determining hyper-parameters of business model
Wangli et al. Foxtail Millet ear detection approach based on YOLOv4 and adaptive anchor box adjustment
Cao et al. Miac: Mutual-information classifier with adasyn for imbalanced classification
Nguyen-Thi et al. Transfer AdaBoost SVM for link prediction in newly signed social networks using explicit and PNR features
Vaghela et al. Boost a weak learner to a strong learner using ensemble system approach
Li et al. GADet: A Geometry-Aware X-ray Prohibited Items Detector
Li et al. Study on the Prediction of Imbalanced Bank Customer Churn Based on Generative Adversarial Network
CN110414845A (en) For the methods of risk assessment and device of target transaction
Ren et al. The class overlap model for system log anomaly detection based on ensemble learning
Pacifico et al. Evolutionary elms with alternative treatments for the population out-bounded individuals
Dhivya et al. Weighted particle swarm optimization algorithm for randomized unit testing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant