CN108304316A - Software defect prediction method based on collaborative transfer
- Publication number
- CN108304316A (application CN201711417594.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Abstract
A software defect prediction method based on collaborative transfer, comprising the following steps: 1) combining four different normalization methods with the TCA transfer learning method to expand each original source project data set with four new source project data sets of the same size; 2) building a collaborative classifier for the target project using the software defect prediction algorithm based on collaborative transfer; 3) performing defect prediction on new samples to be predicted in the target project. The invention combines four different normalization methods with the TCA transfer learning method to expand the source project data sets and enrich the information representation of the source project data, generates one sub-classifier for each source project, and uses the PSO algorithm to assign weights to the sub-classifiers adaptively, thereby building a collaborative classifier that performs defect prediction on the samples to be tested in the target project.
Description
Technical field
The invention belongs to the field of software defect prediction algorithms, and specifically relates to a software defect prediction method based on collaborative transfer.
Background
Software defect prediction can be divided into within-project defect prediction and cross-project defect prediction. Within-project defect prediction requires a large number of samples from the project whose defect status is already known, such as files, classes and functions, as the training set; a classifier is generated with a machine learning method and then used to predict the target samples. Cross-project defect prediction instead uses samples from other related projects to predict defects in the target project. In real development, the target project may be too new, or labels may be too costly to obtain, so the target project has too few training samples and cross-project defect prediction is often necessary. In most cross-project defect prediction algorithms, the source and target projects follow different development processes, so their sample distributions often differ; this difference is the biggest obstacle to using conventional machine learning algorithms and directly affects prediction performance.
To address the large difference between the sample distributions of the source and target projects in cross-project defect prediction, transfer learning has been introduced into software defect prediction. Current transfer-learning-based defect prediction algorithms are mainly of two kinds, instance-based and feature-based: the former selects samples from the source project that help prediction on the target project, while the latter maps the samples of the source and target projects into the same latent feature space to re-represent the features. Both can address the difference in sample distribution between the source and target projects. Turhan et al. used the K-nearest-neighbor method to select, for each unlabeled sample in the target project, the ten most similar samples from the source project as training samples for the prediction model. Similar to Turhan's method, Peters et al. also used a nearest-neighbor method to select training samples, but with a different selection strategy. Ma et al. proposed the Transfer Naive Bayes (TNB) method, which reduces the difference in data distribution between the source and target projects by assigning weights to the samples in the training set and then builds the prediction model from the weighted training samples. Ryu et al. combined Boosting-SVM with a solution to the class imbalance problem to improve on the performance of TNB. Besides these instance-based transfer learning methods, Pan et al. proposed a feature-based transfer learning method, Transfer Component Analysis (TCA), which learns a transformation that maps the source and target projects into a latent space in which the distance between the two is as small as possible. Building on TCA, Nam et al. observed that the choice of normalization method strongly affects transfer performance, designed a set of rules for selecting a suitable normalization method to combine with TCA, and proposed the TCA+ method. However, all of the above transfer learning methods address one-to-one cross-project defect prediction. When it cannot be determined which related source project gives the best prediction for the target project, predicting from only one source project wastes the other source projects, so how to make effective use of the sample information of the other source projects, that is, multi-source transfer learning, is a very important problem.
For multi-source transfer learning, the most effective approach at present is to generate a classifier for each source project and then combine the classifiers to complete the transfer task. Schweikert et al. used a method called Multiple Convex Combination to combine the SVM classifiers generated from the labeled data of each source domain and of the target domain. Sun et al. proposed a method that does not require labeled target samples; it is based on Bayesian learning principles and assigns weights according to the degree of fit between each source domain and the target domain, measured as the average Euclidean distance between the k nearest samples of the source domain and the target domain. Yang et al. proposed an adaptive support vector machine, based on the support vector machine (SVM) combined with an adaptive function, which can be used for online detection in the target domain, but the weights of all sub-classifiers are equal. However, none of these algorithms has been applied to software defect prediction.
Overall, current software defect prediction algorithms have the following problems. First, transfer learning is very important for cross-project defect prediction, and the open question is how to make the transfer learning algorithm fully exploit the useful information in the source projects so as to improve defect prediction performance on the target project. Second, different source projects yield different prediction performance on the target project; when it cannot be determined which source project predicts best, the question is how to take all related source projects into account at once so as to improve on one-to-one cross-project defect prediction. To address these problems, a software defect prediction algorithm based on collaborative transfer is proposed here.
Summary of the invention
Current software defect prediction algorithms have the following problems: transfer learning is very important for cross-project defect prediction, and the question is how to make the transfer learning algorithm fully exploit the useful information in the source projects so as to improve defect prediction performance on the target project; and different source projects yield different prediction performance on the target project, so the question is how to take all related source projects into account at once so as to improve on one-to-one cross-project defect prediction. The present invention provides a software defect prediction method based on collaborative transfer, which combines four different normalization methods with the TCA transfer learning method to expand the source project data sets and enrich the information representation of the source project data, generates one sub-classifier for each source project, and uses the PSO algorithm to assign weights to the sub-classifiers adaptively, thereby building a collaborative classifier that performs defect prediction on the samples to be tested in the target project.
The technical solution adopted by the present invention to solve the technical problem is:
A software defect prediction method based on collaborative transfer, the method comprising the following steps:
1) Expand the source project data sets by combining the original source project data sets with the TCA transfer learning method and four different normalization methods, as follows:
1.1) First, divide the labeled target samples in the target project evenly into a target training set and a target test set, which are required to contain the same number of defective samples;
1.2) Apply four kinds of normalization to every related source project data set in combination with the target test set, where the four normalization methods are min-max normalization, Z-score normalization based on the common mean and standard deviation of the source and target domains, Z-score normalization based on the source-domain mean and standard deviation, and Z-score normalization based on the target-domain mean and standard deviation;
1.3) Call the TCA algorithm to perform transfer learning toward the target test set on each of the four normalized source project data sets and on the original, unprocessed source project data set, obtaining the expanded new source project data sets and target test sets;
2) Build a collaborative classifier for the target project using the software defect prediction algorithm based on collaborative transfer, as follows:
2.1) For each source project data set in the expanded data sets, together with the target training set, generate a sub-classifier using the decision tree algorithm from machine learning;
2.2) Adaptively assign a weight to each sub-classifier to obtain a collaborative classifier;
3) Perform defect prediction on new samples to be predicted in the target project, as follows:
3.1) Pre-process each new sample by normalization and transfer learning;
3.2) Call the trained collaborative classifier to classify each pre-processed new sample and predict whether it contains defects.
Further, step 1.3) proceeds as follows:
1.3.1) Determine the dimension of the re-represented features obtained after TCA transfer, i.e. the dimension of the latent feature space;
1.3.2) According to the chosen latent space dimension, determine a transformation by means of a Gaussian kernel function such that, after the source project data set and the target data set are transformed from the original feature space into the latent feature space, the distribution difference between the two is reduced;
1.3.3) Expand the original N source project data sets and 1 target test set into 5*N source project data sets and the corresponding 4*N+1 target test sets.
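As an illustration of the distribution difference that step 1.3.2) aims to reduce, the following Python sketch computes a squared maximum mean discrepancy (the measure used by TCA) between two sample sets with a Gaussian kernel, together with the data set counts of step 1.3.3). It is only a sketch under stated assumptions: the function names, the kernel width `sigma` and the use of the biased estimator are choices made here for illustration, and the code measures the discrepancy rather than learning the TCA transformation itself.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors; sigma is an assumed width."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(src, tar, sigma=1.0):
    """Biased estimate of the squared maximum mean discrepancy between src and tar."""
    n1, n2 = len(src), len(tar)
    k_ss = sum(gaussian_kernel(a, b, sigma) for a in src for b in src) / (n1 * n1)
    k_tt = sum(gaussian_kernel(a, b, sigma) for a in tar for b in tar) / (n2 * n2)
    k_st = sum(gaussian_kernel(a, b, sigma) for a in src for b in tar) / (n1 * n2)
    return k_ss + k_tt - 2.0 * k_st


def expanded_counts(n_sources):
    """Counts from step 1.3.3): 5*N source data sets and 4*N+1 target test sets."""
    return 5 * n_sources, 4 * n_sources + 1


if __name__ == "__main__":
    source = [[0.0, 0.0], [1.0, 1.0]]
    target = [[5.0, 5.0], [6.0, 6.0]]
    print(mmd_squared(source, target))
    print(expanded_counts(3))
```

A smaller value of `mmd_squared` means the two sample sets look more alike under the chosen kernel, which is the direction in which the transformation of step 1.3.2) is meant to move the source and target data.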
Further, step 2.2) proceeds as follows:
2.2.1) First define the collaborative classifier and the objective function:
Definition 1 (collaborative classifier): the classifier obtained by combining all sub-classifiers, each weighted according to its contribution, is the collaborative classifier. The collaborative classifier classifies a new sample j as follows:
Comp(j) = Σ_{i=1..m} w_i × Score_i(j) (1)
j is defective if Comp(j) > threshold, defect-free otherwise (2)
where Score_i(j) is the confidence given by each sub-classifier C_i, i.e. the likelihood that sample j is defective, with values in the interval from 0 to 1; w_i is the weight of each sub-classifier and indicates that sub-classifier's contribution to the collaborative classifier; m is the number of sub-classifiers; and threshold is the confidence threshold for judging whether the sample contains defects. If Comp(j), the sum of the weighted confidences of all sub-classifiers, is greater than the threshold, the sample is classified as defective; otherwise it is classified as defect-free.
Definition 2 (objective function): for this adaptive weight assignment optimization, the F-measure is used as the objective function, computed as:
F=(2 × P × R)/(P+R) (3)
P=TP/ (TP+FP) (4)
R=TP/ (TP+FN) (5)
where TP is the number of true positives, i.e. samples predicted defective that are actually defective; FP is the number of false positives, i.e. samples predicted defective that are actually defect-free; and FN is the number of false negatives, i.e. samples predicted defect-free that are actually defective. From these, P is the precision of the classification, the proportion of samples predicted defective that are actually defective; the higher the value, the more accurate the classifier. R is the recall of the classification, the proportion of actually defective samples that are predicted defective; the higher the value, the more defective samples are found. The F-measure is the harmonic mean of precision and recall; the higher the value, the more defective samples the collaborative classifier built from this set of weights and threshold finds, and the more accurately it finds them, i.e. the better its prediction performance.
2.2.2) The PSO algorithm is introduced into the adaptive weight assignment process. First, a set of particles is generated at random according to the population size to initialize the swarm. Each combination of weights and threshold is one solution, and the set of all solutions is represented as a swarm in a search space. The position of a particle is described by a series of coordinate values, each of which represents one part of a solution, i.e. a weight value or the threshold.
2.2.3) Compute the fitness of each particle; the fitness here is the prediction performance on the target test set of the collaborative classifier formed from this set of weights and threshold, measured by the F-measure.
2.2.4) Update the position and velocity of the particle according to the best position the particle itself has visited and the best position the whole swarm has visited, i.e. the weight assignment and threshold setting that gave the maximum F-measure; the velocity represents the distance and direction of the particle's movement.
2.2.5) Return to step 2.2.3) and iterate until the maximum number of iterations is reached, then output the position of the particle in the swarm with the maximum F-measure value as the optimal weights and threshold.
2.2.6) According to the optimal weights and threshold, combine all sub-classifiers into the final collaborative classifier;
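The decision rule of Definition 1 and the F-measure of Definition 2 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names are hypothetical, labels are assumed to be encoded as 1 (defective) and 0 (defect-free), and returning 0.0 when there are no true positives is a convention chosen here to avoid division by zero.

```python
def comp_score(scores, weights):
    """Weighted sum of the sub-classifier confidences for one sample (Comp(j))."""
    return sum(w * s for w, s in zip(weights, scores))


def classify(scores, weights, threshold):
    """Return 1 (defective) if the weighted sum exceeds the threshold, else 0."""
    return 1 if comp_score(scores, weights) > threshold else 0


def f_measure(y_true, y_pred):
    """F-measure from precision P = TP/(TP+FP) and recall R = TP/(TP+FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # assumed convention: no true positives gives F = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Three sub-classifiers give confidences for one sample.
    print(classify([0.9, 0.2, 0.6], [0.5, 0.2, 0.3], threshold=0.5))
    print(f_measure([1, 1, 0, 0], [1, 0, 1, 0]))
```

With the example weights, the confidences 0.9, 0.2 and 0.6 combine to 0.67, which exceeds the threshold of 0.5, so the sample is classified as defective.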
In step 1.3), four different normalization methods are combined with the TCA transfer learning method to expand the source project data sets and enrich the information representation of the source project data. This is the first time that multiple normalization methods have been combined with transfer learning in a software defect prediction algorithm, and the transfer performance is substantially better than that of other methods.
In step 2), the software defect prediction algorithm based on collaborative transfer is used. The algorithm makes full use of the different information that each source project expresses after processing with the multiple normalization methods to build more comprehensive sub-classifiers, and adaptively assigns a weight to each sub-classifier according to the prediction performance on the target test set. The collaborative classifier is built in this way, which optimizes multi-source transfer learning and ultimately improves cross-project software defect prediction performance.
In step 3), when defect prediction is performed on each new sample to be tested in the target project, the sample is first pre-processed together with the other related source projects, where the pre-processing consists of the multiple normalization methods and TCA transfer learning, so as to provide a new training set for each sub-classifier; the new sample is then classified using the threshold and weights of the trained collaborative classifier, completing cross-project defect prediction for each target sample to be tested.
The technical concept of the present invention is as follows. A software defect prediction algorithm based on collaborative transfer is proposed. The algorithm first combines the TCA algorithm with multiple normalization methods to fully extract the rich information in the source project data sets and reduce the data distribution difference between the source and target projects, and expands the source project data sets in this way. Then, for each data set in the expanded source project data sets, a sub-classifier is trained with the decision tree algorithm; for a given target sample to be tested, each sub-classifier gives a confidence that the sample is defective. Next, the software defect prediction algorithm based on collaborative transfer is called to obtain a collaborative classifier, which combines the sub-classifiers weighted by their contributions to the collaborative classifier. Finally, after the target sample to be tested has been pre-processed by the combination of normalization methods and TCA, the trained collaborative classifier is called to perform defect prediction.
The beneficial effects of the present invention are mainly as follows. Combining the TCA algorithm with multiple normalization methods reduces the difference in sample distribution between the source and target projects while making the fullest possible use of the information the source projects can provide, and the newly generated data sets are added to the source project data sets. The software defect prediction algorithm based on collaborative transfer yields a collaborative classifier that uses all related source projects to predict defects in the target project, and this collaborative classifier is called to perform cross-project defect prediction on the target samples to be tested. The algorithm was tested on 5 project collections totalling 28 software projects; the results show that it makes full use of the information in all source projects and effectively improves prediction performance.
Brief description of the drawings
Fig. 1 is a structural diagram of the software defect prediction method based on collaborative transfer.
Fig. 2 is a flow chart of the software defect prediction method based on collaborative transfer.
Detailed description
The invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1 and Fig. 2, a software defect prediction method based on collaborative transfer comprises the following steps:
1) Combine four different normalization methods with the TCA transfer learning method to expand each original source project data set with four new source project data sets of the same size, as follows:
1.1) First, divide the labeled target samples in the target project into two parts according to their labels: a target training set and a target test set. The two sets are required to contain the same number of samples of each class and must both contain defective samples; the remaining unlabeled samples in the target project serve as the target samples to be tested;
1.2) Normalize all current source data sets related to the target project in combination with the target test set, using four normalization methods:
The first method is min-max normalization, computed as:
x'_i = (x_i - min(x)) / (max(x) - min(x)) (1)
The second method is Z-score normalization based on the common mean and standard deviation of the source and target projects, computed as:
x'_i = (x_i - mean(x)) / std(x) (2)
The third method is Z-score normalization based on the source project mean and standard deviation, computed as:
x'_i = (x_i - mean(x_src)) / std(x_src) (3)
The fourth method is Z-score normalization based on the target project mean and standard deviation, computed as:
x'_i = (x_i - mean(x_tar)) / std(x_tar) (4)
where x is the vector of values of one feature in the data set obtained by merging the source project with the target training set, x_src and x_tar are the parts of x that come from the source project and the target project respectively, x_i is the value of the i-th sample in x, min() takes the minimum, max() takes the maximum, mean() takes the mean, std() takes the standard deviation, and x'_i is the normalized value of x_i. After the four methods re-express the original data, the information each version carries differs;
1.3) Call the TCA algorithm to perform transfer learning, toward the corresponding target test set, on each of the four normalized source project data sets above and on the original, unprocessed source project data set, obtaining new source project data sets and target test sets, as follows:
1.3.1) Determine the dimension of the re-represented features of the data sets obtained after TCA transfer, i.e. the dimension of the latent feature space, and set it to half of the original dimension;
1.3.2) According to the chosen latent space dimension, determine a transformation by means of a Gaussian kernel function such that, after the source project data set and the target data set are transformed from the original feature space into the latent feature space, the maximum mean discrepancy between the two is minimized. The maximum mean discrepancy is computed as:
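MMD(src, tar) = || (1/n1) × Σ_{i=1..n1} φ(src_i) - (1/n2) × Σ_{i=1..n2} φ(tar_i) ||² (5)
where src is the source project data set, tar is the target project data set, n1 is the number of samples in the source project data set, n2 is the number of samples in the target project data set, src_i is the i-th sample in the source project, tar_i is the i-th sample in the target project, and φ is the mapping into the latent feature space;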
1.3.3) Expand the original N source project data sets and 1 target test set into 5*N source project data sets and the corresponding 4*N+1 target test sets;
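The four normalization methods of step 1.2) can be sketched for a single feature column as follows. This is an illustrative sketch rather than the patented implementation: the function names are hypothetical, the min-max bounds are assumed to be taken over the merged source and target values, and the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say which form of standard deviation is used.

```python
from statistics import mean, pstdev


def min_max(values, reference):
    """Scale values using the min and max of the reference values."""
    lo, hi = min(reference), max(reference)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def z_score(values, reference):
    """Standardize values using the mean and standard deviation of the reference."""
    mu, sd = mean(reference), pstdev(reference)
    return [(v - mu) / sd if sd else 0.0 for v in values]


def four_normalizations(src, tar):
    """Return four (source, target) versions of one feature column."""
    merged = src + tar
    return {
        "minmax": (min_max(src, merged), min_max(tar, merged)),
        "z_common": (z_score(src, merged), z_score(tar, merged)),
        "z_source": (z_score(src, src), z_score(tar, src)),
        "z_target": (z_score(src, tar), z_score(tar, tar)),
    }


if __name__ == "__main__":
    versions = four_normalizations([1.0, 2.0, 3.0], [2.0, 4.0])
    for name, (src_norm, tar_norm) in versions.items():
        print(name, src_norm, tar_norm)
```

Each of the four versions, plus the original data, would then be passed through TCA separately, which is where the factor of five in step 1.3.3) comes from.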
2) Build a collaborative classifier for the target project using the software defect prediction algorithm based on collaborative transfer, as follows:
2.1) For each source project data set in the expanded data sets, together with the target training set, generate a sub-classifier using a decision tree algorithm from machine learning; the decision tree algorithm is the J48 algorithm in WEKA;
2.2) Adaptively assign a weight to each sub-classifier according to the performance of the collaborative classifier, as follows:
2.2.1) First define the collaborative classifier and the objective function:
Definition 1 (collaborative classifier): the classifier obtained by combining all sub-classifiers, each weighted according to its contribution, is the collaborative classifier. The collaborative classifier classifies a new sample j as follows:
Comp(j) = Σ_{i=1..m} w_i × Score_i(j) (6)
j is defective if Comp(j) > threshold, defect-free otherwise (7)
where Score_i(j) is the confidence given by each sub-classifier C_i, i.e. the likelihood that sample j is defective, with values in the interval from 0 to 1; w_i is the weight of each sub-classifier and indicates that sub-classifier's contribution to the collaborative classifier; m is the number of sub-classifiers; and threshold is the confidence threshold for judging whether the sample contains defects. If Comp(j), the sum of the weighted confidences of all sub-classifiers, is greater than the threshold, the sample is classified as defective; otherwise it is classified as defect-free;
Definition 2 (objective function): for this adaptive weight assignment optimization, the F-measure is used as the objective function, computed as:
F=(2 × P × R)/(P+R) (8)
P=TP/ (TP+FP) (9)
R=TP/ (TP+FN) (10)
where TP is the number of true positives, i.e. samples predicted defective that are actually defective; FP is the number of false positives, i.e. samples predicted defective that are actually defect-free; and FN is the number of false negatives, i.e. samples predicted defect-free that are actually defective. From these, P is the precision of the classification, the proportion of samples predicted defective that are actually defective; the higher the value, the more accurate the classifier. R is the recall of the classification, the proportion of actually defective samples that are predicted defective; the higher the value, the more defective samples are found. The F-measure is the harmonic mean of precision and recall; the higher the value, the more defective samples the collaborative classifier built from this set of weights and threshold finds, and the more accurately it finds them, i.e. the better its prediction performance.
2.2.2) The PSO algorithm is used to assign weights adaptively to the sub-classifiers: all sub-classifiers are assigned a series of weights (w1,w2,..,wm) and one defect decision threshold, threshold. First set the population size and the maximum number of iterations, then generate a set of particles at random according to the population size to initialize the swarm. Each combination of weights and threshold is one solution, and the set of all solutions is represented as a swarm in a search space. The position of a particle is described by a series of coordinate values, each of which represents one part of a solution, i.e. a weight value or the threshold.
2.2.3) Compute the fitness of each particle; the fitness here is the prediction performance on the target test set of the collaborative classifier formed from this set of weights and threshold, expressed by the F-measure and computed as in Definition 2 of 2.2.1).
2.2.4) Then update the position and velocity of the particle according to the best position the particle itself has visited and the best position the whole swarm has visited, i.e. the weight assignment and threshold setting that gave the maximum F-measure; the velocity represents the distance and direction of the particle's movement.
2.2.5) Return to step 2.2.3) and iterate until the maximum number of iterations is reached, then output the position of the particle in the swarm with the maximum F-measure value as the optimal weights and threshold.
2.2.6) According to the optimal weights and threshold, combine all sub-classifiers into the final collaborative classifier.
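Steps 2.2.2) to 2.2.6) can be sketched as a small particle swarm search in pure Python. This is a hedged sketch under stated assumptions, not the patented implementation: the inertia and acceleration coefficients, the default population size and iteration count, the clamping of every coordinate to the interval from 0 to 1, and all function names are illustrative choices that the text does not specify. Each row of `score_rows` is assumed to hold the sub-classifier confidences for one target test sample.

```python
import random


def f_measure(y_true, y_pred):
    """F-measure with labels 1 (defective) and 0 (defect-free)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fitness(position, score_rows, y_true):
    """F-measure of the collaborative classifier encoded by one particle position."""
    *weights, threshold = position
    y_pred = [1 if sum(w * s for w, s in zip(weights, row)) > threshold else 0
              for row in score_rows]
    return f_measure(y_true, y_pred)


def pso(score_rows, y_true, n_particles=20, n_iter=50,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for weights and a threshold that maximize the F-measure."""
    rng = random.Random(seed)
    dim = len(score_rows[0]) + 1  # one weight per sub-classifier plus the threshold
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, score_rows, y_true) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Assumed search range: keep every coordinate within [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i], score_rows, y_true)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest[:-1], gbest[-1], gbest_fit


if __name__ == "__main__":
    rows = [[0.9, 0.1], [0.8, 0.9], [0.2, 0.8], [0.1, 0.3]]
    labels = [1, 1, 0, 0]
    weights, threshold, best_f = pso(rows, labels)
    print(weights, threshold, best_f)
```

The returned weights and threshold correspond to the particle position with the highest F-measure seen during the search, which is what step 2.2.5) outputs.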
3) Perform defect prediction on new samples to be predicted in the target project, as follows:
3.1) Pre-process each new sample; the pre-processing consists of the four normalization methods and TCA transfer learning;
3.2) Call the trained collaborative classifier to classify each pre-processed new sample and predict whether it contains defects.
Claims (3)
1. A software defect prediction method based on collaborative transfer, characterized in that the method comprises the following steps:
1) Combine four different normalization methods with the TCA transfer learning method to expand each original source project data set with four new source project data sets of the same size, as follows:
1.1) First, divide the labeled target samples in the target project into two parts according to their labels: a target training set and a target test set. The two sets are required to contain the same number of samples of each class and must both contain defective samples; the remaining unlabeled samples in the target project serve as the target samples to be tested;
1.2) Normalize all current source data sets related to the target project in combination with the target test set, using four normalization methods:
The first method is min-max normalization, computed as:
x'_i = (x_i - min(x)) / (max(x) - min(x))
The second method is Z-score normalization based on the common mean and standard deviation of the source and target projects, computed as:
x'_i = (x_i - mean(x)) / std(x)
The third method is Z-score normalization based on the source project mean and standard deviation, computed as:
x'_i = (x_i - mean(x_src)) / std(x_src)
The fourth method is Z-score normalization based on the target project mean and standard deviation, computed as:
x'_i = (x_i - mean(x_tar)) / std(x_tar)
where x is the vector of values of one feature in the data set obtained by merging the source project with the target training set, x_src and x_tar are the parts of x that come from the source project and the target project respectively, x_i is the value of the i-th sample in x, min() takes the minimum, max() takes the maximum, mean() takes the mean, std() takes the standard deviation, and x'_i is the normalized value of x_i;
1.3) call TCA algorithms respectively to the source item data set after above-mentioned 4 kinds of standardizations and the former source item before processing
Mesh data set carries out transfer learning for corresponding target detection collection, obtains new source item data set and target detection
Collection;
2) synergetic classification device is built to destination item using the software defect prediction algorithm based on collaboration migration, process is as follows:
2.1) to after each expansion in data set source item data set and target training set utilize the decision in machine learning
Tree algorithm generates a sub-classifier respectively, and the plan tree algorithm selects the J48 algorithms in WEKA;
2.2) performance for combining synergetic classification device carries out adaptive weighting distribution for each sub-classifier;
3) Perform defect prediction on new samples to be predicted in the target project, as follows:
3.1) Preprocess the new samples; the preprocessing consists of the four normalization methods and TCA transfer learning;
3.2) Call the trained collaborative classifier to classify each preprocessed new sample, predicting whether it contains defects.
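The four normalization methods of claim 1 can be sketched for a single feature as below. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `normalize_variants`, the dictionary keys, the use of the population standard deviation, and the handling of constant features are all choices made for this sketch.

```python
from statistics import mean, pstdev


def normalize_variants(src, tar):
    """Normalize ONE feature four ways.

    src, tar: lists of that feature's values in the source project and the
    target project. Returns {method_name: (normalized_src, normalized_tar)}.
    """
    x = list(src) + list(tar)  # merged feature vector

    # (shift, scale) for each method; pstdev is an assumption of this sketch,
    # the claim text does not say population vs. sample standard deviation.
    params = {
        "minmax": (min(x), max(x) - min(x)),
        "z_merged": (mean(x), pstdev(x)),
        "z_source": (mean(src), pstdev(src)),
        "z_target": (mean(tar), pstdev(tar)),
    }

    out = {}
    for name, (shift, scale) in params.items():
        scale = scale or 1.0  # constant feature: avoid dividing by zero
        out[name] = (
            [(v - shift) / scale for v in src],
            [(v - shift) / scale for v in tar],
        )
    return out


if __name__ == "__main__":
    variants = normalize_variants([1.0, 3.0, 5.0], [2.0, 4.0])
    for name, (s, t) in variants.items():
        print(name, s, t)
```

Together with the unnormalized original, these four variants give the five versions of each source project data set that are then passed to TCA in step 1.3).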
2. The software defect prediction method based on collaborative migration according to claim 1, characterized in that step 1.3) proceeds as follows:
1.3.1) Determine the number of feature dimensions of the re-expressed data set obtained after TCA migration, i.e. set the dimension of the latent feature space to half of the original dimension;
1.3.2) According to the chosen latent space dimension, determine a transformation by means of a Gaussian kernel function such that, after the source project data set and the target data set are mapped from the original feature space into the latent feature space, the maximum mean discrepancy between the two is minimized. In the maximum mean discrepancy calculation, src is the source project data set, tar is the target project data set, n_1 is the number of samples in the source project data set, n_2 is the number of samples in the target project data set, src_i is the i-th sample in the source project, and tar_i is the i-th sample in the target project;
1.3.3) Expand the original N source project data sets and 1 target test set into 5*N source project data sets and the corresponding 4*N+1 target test sets.
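The quantity that step 1.3.2) minimizes can be illustrated with the commonly used biased empirical estimate of the squared maximum mean discrepancy under a Gaussian kernel. The sketch below only measures the discrepancy between two sample sets; it does not perform the TCA projection itself. The function names and the bandwidth parameter `sigma` are assumptions of this sketch and do not come from the patent text.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors a and b."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(src, tar, sigma=1.0):
    """Biased estimate of squared MMD between sample sets src and tar.

    src has n1 samples and tar has n2 samples; each sample is a list of floats.
    """
    n1, n2 = len(src), len(tar)
    k_ss = sum(gaussian_kernel(a, b, sigma) for a in src for b in src) / (n1 * n1)
    k_tt = sum(gaussian_kernel(a, b, sigma) for a in tar for b in tar) / (n2 * n2)
    k_st = sum(gaussian_kernel(a, b, sigma) for a in src for b in tar) / (n1 * n2)
    return k_ss + k_tt - 2.0 * k_st


if __name__ == "__main__":
    source = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1]]
    near = [[0.05, 0.05], [0.15, 0.15]]
    far = [[3.0, 3.0], [3.2, 2.9]]
    print("near:", mmd_squared(source, near))
    print("far: ", mmd_squared(source, far))
```

A target set drawn from a distribution close to the source gives a smaller value than one drawn from far away, which is the property TCA exploits when choosing the latent feature space.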
3. The software defect prediction method based on collaborative migration according to claim 1 or 2, characterized in that step 2.2) proceeds as follows:
2.2.1) First define the collaborative classifier and the F-measure index used to evaluate its quality:
Definition 1: The classifier obtained by combining all sub-classifiers, each emphasized according to its own contribution, is the collaborative classifier. For a new sample j, the collaborative classifier classifies in the following manner:
$$Comp(j) = \sum_{i=1}^{M} w_i \cdot Score_i(j)$$
where Score_i(j) is the confidence given by sub-classifier C_i, i.e. the likelihood that sample j is a defective sample, with the confidence lying in the interval from 0 to 1; w_i is the weight of each sub-classifier and expresses that sub-classifier's contribution to the collaborative classifier; M is the number of sub-classifiers; and threshold is the confidence threshold for deciding whether the sample contains defects. If Comp(j), the weighted sum of the confidences of all sub-classifiers, is greater than the threshold, the sample is classified as defective; otherwise it is classified as defect-free;
Definition 2: For the optimization process of adaptive weight assignment, F-measure is used as the objective function, computed as follows:
F = (2 × P × R)/(P + R) (8)
P = TP/(TP + FP) (9)
R = TP/(TP + FN) (10)
where TP is the number of true positives, i.e. the number of samples predicted as defective that actually contain defects; FP is the number of false positives, i.e. the number of samples predicted as defective that are actually defect-free; and FN is the number of false negatives, i.e. the number of samples predicted as defect-free that actually contain defects. From these, P is the precision of the classification, i.e. the proportion of samples predicted as defective that are actually defective; the higher this value, the more precise the classifier. R is the recall of the classification, i.e. the proportion of actually defective samples that are predicted as defective. F-measure is the harmonic mean of precision and recall;
2.2.2) Use the PSO algorithm to assign weights to the sub-classifiers adaptively: all sub-classifiers are assigned a set of weights (w_1, w_2, ..., w_n) and one defect decision threshold, threshold. First set the population size and the maximum number of iterations, then randomly generate particles according to the population size to initialize the population. One combination of weights and threshold is one solution, and the set of all solutions is represented as the population in a search space. The position of a particle is described by a series of coordinate values, each of which represents one part of a solution, i.e. a weight value or the threshold;
2.2.3) Compute the fitness of each particle; the fitness is the prediction performance on the target test set of the collaborative classifier formed with that set of weights and threshold, expressed as F-measure;
2.2.4) Then update the position and velocity of each particle according to the best position that particle has visited and the best position the whole population has visited, i.e. the weight assignment and threshold setting that gave the maximum F-measure; the velocity represents the distance and direction in which the particle moves;
2.2.5) Return to step 2.2.3) and iterate until the maximum number of iterations is reached, then output the position of the particle in the population with the maximum F-measure value as the optimal weights and threshold;
2.2.6) Using the optimal weights and threshold, combine all sub-classifiers into the final collaborative classifier.
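The weighted vote of Definition 1, the F-measure of Definition 2, and the PSO search of steps 2.2.2) to 2.2.5) can be sketched together as below. This is an illustrative global-best PSO with an inertia term, not the patented implementation: the function names, the inertia and acceleration coefficients, the fixed random seed, and the restriction of every weight and the threshold to the range 0 to 1 are assumptions of this sketch, since the claim text does not specify them.

```python
import random


def comp_predict(scores, weights, threshold):
    """scores[i][j] is the confidence (0..1) of sub-classifier i for sample j.

    Returns 1 (defective) when the weighted sum exceeds threshold, else 0.
    """
    preds = []
    for j in range(len(scores[0])):
        comp = sum(w * s[j] for w, s in zip(weights, scores))
        preds.append(1 if comp > threshold else 0)
    return preds


def f_measure(preds, labels):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pso_weights(scores, labels, swarm=20, iters=50,
                inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for sub-classifier weights and a threshold that maximize F-measure."""
    rng = random.Random(seed)
    dim = len(scores) + 1  # one weight per sub-classifier, plus the threshold

    def fitness(position):
        return f_measure(comp_predict(scores, position[:-1], position[-1]), labels)

    # Random initial population; velocities start at zero.
    pos = [[rng.random() for _ in range(dim)] for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    best_pos = [p[:] for p in pos]            # best position seen by each particle
    best_fit = [fitness(p) for p in pos]
    g = max(range(swarm), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g][:], best_fit[g]  # best position seen by the swarm

    for _ in range(iters):
        for k in range(swarm):
            for d in range(dim):
                vel[k][d] = (inertia * vel[k][d]
                             + c1 * rng.random() * (best_pos[k][d] - pos[k][d])
                             + c2 * rng.random() * (g_pos[d] - pos[k][d]))
                # Clamp to [0, 1]; this bound is an assumption of the sketch.
                pos[k][d] = min(1.0, max(0.0, pos[k][d] + vel[k][d]))
            fit = fitness(pos[k])
            if fit > best_fit[k]:
                best_fit[k], best_pos[k] = fit, pos[k][:]
                if fit > g_fit:
                    g_fit, g_pos = fit, pos[k][:]
    return g_pos[:-1], g_pos[-1], g_fit


if __name__ == "__main__":
    demo_scores = [[0.9, 0.8, 0.2, 0.1], [0.4, 0.6, 0.5, 0.3]]
    demo_labels = [1, 1, 0, 0]
    print(pso_weights(demo_scores, demo_labels))
```

In the method itself the confidences would come from the J48 sub-classifiers and the labels from the target data used for fitness evaluation; the small lists in the demo are made-up values for illustration only.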
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711417594.6A CN108304316B (en) | 2017-12-25 | 2017-12-25 | Software defect prediction method based on collaborative migration |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711417594.6A CN108304316B (en) | 2017-12-25 | 2017-12-25 | Software defect prediction method based on collaborative migration |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108304316A (en) | 2018-07-20 |
CN108304316B (en) | 2021-04-06 |
Family
ID=62871017
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711417594.6A (Active; CN108304316B) | 2017-12-25 | 2017-12-25 | Software defect prediction method based on collaborative migration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108304316B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102637143A (en) * | 2012-03-07 | 2012-08-15 | 南京邮电大学 | Software defect priority prediction method based on improved support vector machine |
CN102663100A (en) * | 2012-04-13 | 2012-09-12 | 西安电子科技大学 | Two-stage hybrid particle swarm optimization clustering method |
CN103810101A (en) * | 2014-02-19 | 2014-05-21 | 北京理工大学 | Software defect prediction method and system |
KR101746328B1 (en) * | 2016-01-29 | 2017-06-12 | 한국과학기술원 | Hybrid instance selection method using nearest-neighbor for cross-project defect prediction |
CN106991047A (en) * | 2017-03-27 | 2017-07-28 | 中国电力科学研究院 | Method and system for predicting object-oriented software defects |
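Non-Patent Citations (3)
Title |
---|
Jaechang Nam: "Transfer Defect Learning", 2013 35th International Conference on Software Engineering (ICSE) * |
He Jiyuan (何吉元) et al.: "A semi-supervised ensemble cross-project software defect prediction method", Journal of Software (《软件学报》) * |
Hao Shijin (郝世锦): "Software defect prediction model based on defect stratification and the PSO algorithm", Software (《软件》) * |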
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109325543A (en) * | 2018-10-10 | 2019-02-12 | 南京邮电大学 | Software defect prediction method, readable storage medium and terminal |
CN110825644A (en) * | 2019-11-11 | 2020-02-21 | 南京邮电大学 | Cross-project software defect prediction method and system |
CN111131248B (en) * | 2019-12-24 | 2021-09-24 | 南方电网电力科技股份有限公司 | Website application security defect detection model modeling method and defect detection method |
CN111131248A (en) * | 2019-12-24 | 2020-05-08 | 广东电科院能源技术有限责任公司 | Website application security defect detection model modeling method and defect detection method |
CN111367801A (en) * | 2020-02-29 | 2020-07-03 | 杭州电子科技大学 | Data transformation method for cross-company software defect prediction |
CN111881048A (en) * | 2020-07-31 | 2020-11-03 | 武汉理工大学 | Cross-project software aging defect prediction method |
CN112347392A (en) * | 2020-10-21 | 2021-02-09 | 上海淇玥信息技术有限公司 | Anti-fraud assessment method and device based on transfer learning and electronic equipment |
CN112527670A (en) * | 2020-12-18 | 2021-03-19 | 武汉理工大学 | Method for predicting software aging defects in project based on Active Learning |
CN112651950A (en) * | 2020-12-30 | 2021-04-13 | 珠海碳云智能科技有限公司 | Data processing method, sample classification method, model training method and device |
CN112651950B (en) * | 2020-12-30 | 2023-09-29 | 珠海碳云诊断科技有限公司 | Data processing method, sample classification method, model training method and device |
CN113268434A (en) * | 2021-07-08 | 2021-08-17 | 北京邮电大学 | Software defect prediction method based on Bayesian model and particle swarm optimization |
CN113268434B (en) * | 2021-07-08 | 2022-07-26 | 北京邮电大学 | Software defect prediction method based on Bayes model and particle swarm optimization |
CN114757305A (en) * | 2022-06-13 | 2022-07-15 | 华中科技大学 | Voltage transformer insulation fault identification method and system based on ensemble learning |
CN114757305B (en) * | 2022-06-13 | 2022-09-20 | 华中科技大学 | Voltage transformer insulation fault identification method and system based on ensemble learning |
CN117421244A (en) * | 2023-11-17 | 2024-01-19 | 北京邮电大学 | Multi-source cross-project software defect prediction method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108304316B (en) | 2021-04-06 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |