CN109656808A - Software defect prediction method based on hybrid active learning strategy - Google Patents
Software defect prediction method based on hybrid active learning strategy
- Publication number
- CN109656808A (application CN201811319619.3A)
- Authority
- CN
- China
- Prior art keywords
- sample
- data
- information entropy
- active learning
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3608—Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a software defect prediction method based on a hybrid active learning strategy. The method uses a cost-sensitive active learning approach in which information entropy and relative entropy work together. Information entropy serves as the quality criterion for samples: samples with higher information entropy are labeled manually, while samples with low information entropy are analyzed further using relative entropy, so that the labeled dataset is expanded more effectively. Experiments show that the invention improves software defect prediction performance and reduces manual labeling cost.
Description
Technical Field
The present invention relates to the field of active learning, and in particular to a software defect prediction method based on a hybrid active learning strategy.
Background
Defective software modules cause operational failures in an enterprise's production process, leading to heavy losses and lower customer satisfaction. Software defect prediction models are used to find defective modules as early as possible during software development; common models include supervised and unsupervised models.
If a software project has abundant historical labeled data, a supervised machine learning model can be built to perform within-project defect prediction, estimating the probability that a module is defective or the number of defects in a module. In actual development, however, if the project is entirely new or has little training data, the enterprise must spend considerable time labeling defective modules. Labeling is specialized work that requires experienced personnel, so building a defect prediction model takes a great deal of time and manpower and raises the cost of software development.
Active learning addresses the sample labeling problem by providing a variety of query strategies. Faced with a large number of unlabeled modules, the enterprise actively selects some samples to label; once labeled manually, these samples are added to the labeled dataset so that a defect prediction model can be built quickly. The selection strategy picks high-quality samples from the defect prediction dataset, the training set is expanded after manual labeling, and other machine learning methods such as dimensionality reduction and feature selection are used in combination to improve defect prediction performance.
The selection strategies in use include common ones such as uncertainty-based information entropy. These studies, however, pay no attention to samples with low information entropy, that is, samples with high certainty. Such samples are usually discarded during an active learning query, and their use is rarely addressed.
Patent No. CN201710271035.2 discloses a multi-label active learning method based on conditionally dependent label sets, which combines sample information entropy and relative entropy to select highly informative samples as the objects of active learning. Although that method uses information entropy and relative entropy together, it adds the relative entropy computation during the information entropy stage, which can harm the running efficiency and effectiveness of the system, and it still does not make good use of low-entropy samples.
Summary of the Invention
To solve the problems of high manual labeling cost and low prediction performance, the present invention provides a software defect prediction method based on a hybrid active learning strategy.
A software defect prediction method based on a hybrid active learning strategy, characterized in that the method uses a cost-sensitive active learning approach in which information entropy and relative entropy cooperate, referred to as the UNCERTAINTYKL model. The UNCERTAINTYKL model uses information entropy as the quality criterion for samples: samples with higher information entropy are chosen from the unlabeled sample data and labeled manually, while relative entropy is used to further analyze samples with low information entropy, further expanding the labeled dataset.
Preferably, the UNCERTAINTYKL model comprises the following steps:
Step 1: compute the information entropy of each unlabeled sample using the information entropy formula;
Step 2: using formula (1), select the sample with the highest information entropy from the unlabeled sample data and hand it to a domain expert for manual labeling; after labeling, add it to the labeled dataset;
Step 3: from the unlabeled samples remaining after step 2, select the sample with the lowest information entropy and evaluate it using the relative entropy calculation;
Step 4: preset a relative entropy threshold. If the relative entropy is below the threshold, add the sample to the labeled dataset, using the predicted label as the pseudo-label of the sample; if the relative entropy is above the threshold, leave the sample unprocessed.
Preferably, the sample with the highest information entropy in step 2 is computed as follows:
$$x_{u,\max} = \arg\max_{x}\left(-\sum_{i} P_\theta(y_i \mid x)\,\log P_\theta(y_i \mid x)\right) \tag{1}$$
where the unlabeled samples are indexed by $i = 1, 2, \ldots, u$; $y_i$ denotes the label value of a candidate class; $x_{u,\max}$ denotes the sample in the unlabeled dataset with the maximum information entropy according to formula (1); and $P_\theta(y_i \mid x)$ denotes the predicted probability, under the data distribution of the labeled dataset, that the sample belongs to class $y_i$.
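The entropy criterion of formula (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the function names `entropy` and `select_extremes` and the sample probabilities are hypothetical, and the natural logarithm is assumed.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_extremes(prob_rows):
    """Return (index of the max-entropy row, index of the min-entropy row)."""
    scores = [entropy(row) for row in prob_rows]
    i_max = max(range(len(scores)), key=scores.__getitem__)
    i_min = min(range(len(scores)), key=scores.__getitem__)
    return i_max, i_min

# Hypothetical predicted probabilities [P(class 0), P(class 1)] for three modules.
predictions = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]
i_max, i_min = select_extremes(predictions)
```

Here the second module is the most uncertain and would go to the expert, while the first is the most certain and becomes the pseudo-labeling candidate.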
Preferably, the relative entropy calculation in step 3 is defined by formulas (2) to (4), whose symbols are as follows. The result is the mean of the relative entropies (KLD) computed over all classification models. $x_{u,\min}$ denotes the sample in the unlabeled dataset with the minimum information entropy according to formula (1). $C$ denotes the number of classifiers in the query committee, and the classifiers are trained on the dynamically updated dataset $D_l$. The committee is $C = \{\theta_1, \ldots, \theta_m\}$; its members represent different classification strategies and can each compute a label for the unlabeled data. $P_C(y_i \mid x)$ denotes the committee's average probability for the candidate label $y_i$. $D(P_{\theta(C)} \,\|\, P_C)$ denotes the relative entropy of classification model $\theta_i$ with respect to the other models.
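For orientation, one common formulation of committee disagreement from the query-by-committee literature matches these symbol definitions. It is given here only as an assumed sketch: the index $c$ over committee members and the notation $\overline{\mathrm{KLD}}$ are introduced for illustration, and the patent's own formulas (2) to (4) may differ in detail.

```latex
% Consensus (average) probability of the committee for label y_i
P_C(y_i \mid x) = \frac{1}{C} \sum_{c=1}^{C} P_{\theta_c}(y_i \mid x)

% Relative entropy (KL divergence) of one member from the consensus
D\left(P_{\theta_c} \,\|\, P_C\right)
  = \sum_{i} P_{\theta_c}(y_i \mid x)\,
    \log \frac{P_{\theta_c}(y_i \mid x)}{P_C(y_i \mid x)}

% Mean relative entropy over all committee members
\overline{\mathrm{KLD}}(x) = \frac{1}{C} \sum_{c=1}^{C} D\left(P_{\theta_c} \,\|\, P_C\right)
```

Under this form, a value near zero means every member predicts almost the same distribution as the consensus, which is the situation in which a pseudo-label is trusted.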
Preferably, the threshold in step 4 is set to the empirical value 0.1; if the mean KLD value satisfies the threshold, $\theta_i$ is used to pseudo-label the sample $x_{u,\min}$.
Preferably, the model is solved with the following staged optimization strategy:
A. System initialization: before the system starts running, take a portion of the samples from the sample pool and hand them to a domain expert for manual labeling; this set is denoted $D_l$. The initial labeled set is drawn by random sampling from the sample set, and $D_l$ is used for the first training of classification model $\theta_1$, which serves as the basis for subsequently classifying unlabeled data;
B. Active selection of unlabeled samples: use $\theta_1$ to predict each unlabeled sample, compute the information entropy of each sample according to the formula, sort the samples, take the sample $x_{u,\max}$ with the maximum information entropy, hand it to a domain expert for manual labeling, and add $x_{u,\max}$ to the labeled dataset $D_l$;
C. Pseudo-labeling of the highest-certainty sample: take the sample $x_{u,\min}$ with the lowest information entropy, compute its relative entropy (KLD) according to the formulas, and compare the KLD with the threshold; if the threshold is satisfied, label $x_{u,\min}$ and add it to the labeled dataset $D_l$;
D. Classification model update: retrain $\theta_1$ on the labeled dataset $D_l$, and repeat the cycle until the termination condition is met.
Beneficial effects:
1. Information entropy and relative entropy are combined in a hybrid active learning query strategy. Using relative entropy to further analyze low-entropy samples makes fuller use of the information the samples contain, which improves defect prediction performance and finds defective modules more quickly.
2. With the hybrid query strategy, an enterprise needs to invest only a relatively small manual labeling cost up front to obtain better defect prediction capability, so cost is better controlled and manpower is saved while the enterprise's needs are still met.
Brief Description of the Drawings
Fig. 1 is a flow diagram of the algorithm of the software defect prediction method based on the hybrid active learning strategy of the present invention.
Fig. 2 shows the AUC on the Equinox dataset under different active learning query strategies.
Fig. 3 shows the AUC on the Eclipse JDT Core dataset under different active learning query strategies.
Fig. 4 shows the AUC on the Apache Lucene dataset under different active learning query strategies.
Fig. 5 shows the AUC on the Mylyn dataset under different active learning query strategies.
Fig. 6 shows the AUC on the Eclipse PDE UI dataset under different active learning query strategies.
Detailed Description of the Embodiments
This section describes in detail the proposed cost-sensitive active learning strategy in which information entropy and relative entropy cooperate (Cost-Effective Entropy Kullback-Leibler-divergence Active Learning), referred to as the UNCERTAINTYKL model. The strategy is applied to the AEEEM dataset, which is commonly used in software defect prediction. By improving the classification model incrementally through the selection strategy, the classifier achieves better classification metrics for the same amount of labeled data.
UNCERTAINTYKL active learning strategy
The UNCERTAINTYKL model of the invention incorporates the idea of co-training into the active learning strategy and builds the model while minimizing the number of labels required. The uncertainty-based strategy selects the sample with the highest information entropy from the unlabeled data for labeling by a domain expert. At the same time, the query committee votes on the sample with the lowest information entropy according to KLD: if the relative entropy computed by the committee for that sample is low (the empirical threshold is currently set to 0.1), the sample is added to the labeled dataset, with the predicted label used as its pseudo-label. The details are as follows:
$$x_{u,\max} = \arg\max_{x}\left(-\sum_{i} P_\theta(y_i \mid x)\,\log P_\theta(y_i \mid x)\right) \tag{1}$$
Define the labeled dataset containing $l$ samples as $D_l = \{x_1, \ldots, x_l\}$ and the unlabeled dataset containing $u$ samples as $D_u = \{x_{l+1}, \ldots, x_{l+u}\}$, where the unlabeled samples are indexed by $i = 1, 2, \ldots, u$. $x_{u,\max}$ denotes the sample in the unlabeled dataset with the maximum information entropy according to (1). $P_\theta(y_i \mid x)$ denotes the predicted probability, under the data distribution of the labeled dataset, that $x_{u,\max}$ belongs to class $y_i$. The argmax expression selects the sample in the unlabeled dataset for which the entropy value is largest.
$x_{u,\min}$ denotes the sample in the unlabeled dataset with the minimum information entropy according to (1). $C$ denotes the number of classifiers in the query committee, and the classifiers are trained on the dynamically updated dataset $D_l$. The committee is $C = \{\theta_1, \ldots, \theta_m\}$; its members represent different classification strategies and can each compute a label for the unlabeled data. $P_C(y_i \mid x)$ denotes the committee's average probability for the candidate label $y_i$. $D(P_{\theta(C)} \,\|\, P_C)$ denotes the relative entropy of classification model $\theta_i$ with respect to the other models, and the mean of the relative entropies (KLD) computed over all classification models is then taken. If this relative entropy is below the threshold, $\theta_i$ is used to pseudo-label the instance.
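The committee check described above can be illustrated as follows. This is a sketch under stated assumptions, not the patent's code: it assumes the mean KL divergence of each member from the averaged (consensus) distribution, a form common in the query-by-committee literature, and it takes the pseudo-label from the consensus distribution for simplicity. The function names are hypothetical.

```python
import math

def consensus(member_probs):
    """Average the class distributions predicted by all committee members."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(m[k] for m in member_probs) / n_members for k in range(n_classes)]

def mean_kld(member_probs):
    """Mean KL divergence of each member's distribution from the consensus."""
    avg = consensus(member_probs)
    total = 0.0
    for probs in member_probs:
        # p > 0 implies the consensus entry is > 0, so the ratio is defined.
        total += sum(p * math.log(p / q) for p, q in zip(probs, avg) if p > 0.0)
    return total / len(member_probs)

def pseudo_label(member_probs, threshold=0.1):
    """Return the consensus class index if disagreement is below threshold, else None."""
    if mean_kld(member_probs) >= threshold:
        return None  # committee disagrees: leave the sample unlabeled
    avg = consensus(member_probs)
    return avg.index(max(avg))

# Hypothetical committee outputs for one low-entropy sample.
agree = [[0.90, 0.10], [0.88, 0.12]]
disagree = [[0.99, 0.01], [0.01, 0.99]]
```

With the empirical threshold 0.1, `pseudo_label(agree)` yields class 0, while `pseudo_label(disagree)` yields `None` and the sample stays in the unlabeled pool.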
The labeling algorithm described above is detailed below.
The model is solved with a staged optimization strategy; the optimization process is as follows:
A System initialization
Before the system starts running, take a portion of the samples from the sample pool and hand them to a domain expert for manual labeling; this set is denoted $D_l$. The initial labeled set is drawn by random sampling from the sample set, and $D_l$ is used for the first training of classification model $\theta_1$, which serves as the basis for subsequently classifying unlabeled data;
B Active selection of unlabeled samples
Use classification model $\theta_1$ to predict each unlabeled sample, compute the information entropy of each sample according to formula (1), sort the samples, take the sample $x_{u,\max}$ with the maximum information entropy, hand it to a domain expert for manual labeling, and add $x_{u,\max}$ to the labeled dataset $D_l$;
C Pseudo-labeling of the highest-certainty sample
Take the sample $x_{u,\min}$ with the lowest information entropy, compute its relative entropy (KLD) according to formulas (2)(3)(4), and compare the KLD with the threshold; if the threshold is satisfied, label $x_{u,\min}$ and add it to the labeled dataset $D_l$;
D Classification model update
Retrain classification model $\theta_1$ on the labeled dataset $D_l$, and repeat the cycle until the termination condition is met.
The entire procedure can be summarized as Algorithm 1.
Algorithm 1: Solving the UNCERTAINTYKL model with the staged strategy
1: Input: initial labeled dataset D_l = {x_1, ..., x_l}, unlabeled dataset D_u = {x_{l+1}, ..., x_{l+u}}, labels y_1, ..., y_l, maximum number of iterations U_max, KLD threshold threshold
2: Output: set of committee classifiers; set of classification performance results
3: initialize the classification models of the query committee on the labeled data
4: while current iteration < maximum number of iterations || not converged do
5: for i <- 1 to U_max do
6: take committee member θ_i and train the model on the labeled dataset
7: compute the class probabilities P for the current x^(i), and from them the information entropy -Σ_i P_θ(y_i|x) log P_θ(y_i|x)
8: hand the sample x^(i) with the maximum information entropy to a domain expert for manual labeling
9: for the sample x^(t) with the minimum information entropy, compute the relative entropy, i.e. the mean KLD, according to formulas (2)(3)(4)
10: D_l = D_l ∪ x^(i); D_u = D_u \ x^(i)
11: if KLD < threshold: D_l = D_l ∪ x^(t); D_u = D_u \ x^(t)
12: i = i + 1
13: end for
14: end while
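The loop of Algorithm 1 can be sketched end to end in plain Python. Everything below is illustrative and assumed: `uncertaintykl`, `centroid_trainer`, and `oracle` are hypothetical names, the two toy one-dimensional classifiers stand in for the real committee members, and the first committee member plays the role of $\theta_1$.

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_kld(rows):
    """Mean KL divergence of each member's distribution from the committee average."""
    avg = [sum(r[k] for r in rows) / len(rows) for k in range(len(rows[0]))]
    return sum(sum(p * math.log(p / q) for p, q in zip(r, avg) if p > 0.0)
               for r in rows) / len(rows)

def uncertaintykl(labeled, unlabeled, oracle, trainers, rounds, threshold=0.1):
    """labeled: list of (x, y); unlabeled: list of x; oracle(x) -> true label;
    trainers: functions train(labeled) -> predict_proba(x) -> [p0, p1, ...]."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        committee = [train(labeled) for train in trainers]  # stage D: retrain
        main = committee[0]                                 # plays theta_1
        scores = [entropy(main(x)) for x in unlabeled]
        # Stage B: the most uncertain sample goes to the expert (oracle).
        i_max = max(range(len(unlabeled)), key=scores.__getitem__)
        x_max = unlabeled.pop(i_max)
        scores.pop(i_max)
        labeled.append((x_max, oracle(x_max)))
        if not unlabeled:
            break
        # Stage C: the most certain sample is pseudo-labeled if the committee agrees.
        i_min = min(range(len(unlabeled)), key=scores.__getitem__)
        rows = [member(unlabeled[i_min]) for member in committee]
        if mean_kld(rows) < threshold:
            x_min = unlabeled.pop(i_min)
            probs = main(x_min)
            labeled.append((x_min, probs.index(max(probs))))
    return labeled, unlabeled

def centroid_trainer(sharpness):
    """Toy 1-D classifier factory: logistic curve around the class-mean midpoint."""
    def train(labeled):
        c0 = [x for x, y in labeled if y == 0]
        c1 = [x for x, y in labeled if y == 1]
        mid = (sum(c0) / len(c0) + sum(c1) / len(c1)) / 2
        def predict_proba(x):
            p1 = 1 / (1 + math.exp(-sharpness * (x - mid)))
            return [1 - p1, p1]
        return predict_proba
    return train

# Hypothetical data: the true label is 1 when x > 5.
result, remaining = uncertaintykl(
    labeled=[(0.0, 0), (10.0, 1)],
    unlabeled=[0.5, 4.9, 9.0, 5.2],
    oracle=lambda x: int(x > 5),
    trainers=[centroid_trainer(1.0), centroid_trainer(2.0)],
    rounds=2,
)
```

In this toy run the expert is asked only about the two samples nearest the decision boundary (4.9 and 5.2), while the two clear-cut samples (0.5 and 9.0) receive pseudo-labels, which is the cost saving the strategy aims for.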
Experimental design
Evaluation subjects
The invention evaluates the effect of the UNCERTAINTYKL query strategy on active learning for software defect prediction using the public AEEEM dataset, which is also used to assess the other learning strategies. AEEEM is widely used in the software defect prediction field as a benchmark dataset for performance comparison. It provides 61 metrics, including software development process metrics, and all 61 are used for classifier modeling in this experiment. Table 1 summarizes the AEEEM dataset.
Table 1. Summary of the AEEEM dataset
Experimental setup
The experiment uses 5×2-fold cross-validation. In each run the data are split by stratified random sampling, with half used as training data and half as test data, so that training and test data do not overlap and the evaluation remains independent. A proportion of the training data is taken out for manual labeling; in this experiment the initial labeled proportion is 30%, and the classification model is trained on this initial labeled set. The remaining 70% serves as unlabeled data, from which samples are selected according to the active learning strategy. The support vector machine (SVM) classifier in the experiment is trained with the RBF kernel implemented by libsvm and default parameters. The UNCERTAINTYKL query strategy uses the random forest classifier RandomForestClassifier implemented in sklearn, trained with default parameters. In the UNCERTAINTYKL strategy, the unlabeled data are processed iteratively: each iteration selects the sample with the highest uncertainty for manual labeling and, at the same time, selects the sample with the highest certainty and evaluates it with KLD, using the empirical threshold 0.1.
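The 5×2 protocol described above can be sketched with the standard library alone. This is an assumed illustration of stratified half-splits, not the experiment's actual code; `stratified_halves` and `five_by_two` are hypothetical names, and the fixed seed is only there to make the sketch repeatable.

```python
import random

def stratified_halves(labels, rng):
    """Split sample indices into two halves that preserve class proportions."""
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    first, second = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = len(indices) // 2
        first.extend(indices[:cut])
        second.extend(indices[cut:])
    return first, second

def five_by_two(labels, seed=0):
    """Yield (train_idx, test_idx) pairs for 5 repetitions of 2-fold validation."""
    rng = random.Random(seed)
    for _ in range(5):
        a, b = stratified_halves(labels, rng)
        yield a, b  # train on the first half, test on the second
        yield b, a  # then swap the roles of the two halves

# Hypothetical imbalanced labels: 8 clean modules, 4 defective modules.
labels = [0] * 8 + [1] * 4
splits = list(five_by_two(labels))
```

Each of the ten resulting pairs keeps the 2:1 class ratio in both halves, and the training and test indices never overlap.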
Evaluation metrics
The AEEEM dataset suffers from class imbalance. The AUC (area under the ROC curve) metric reflects the performance of an active learning query strategy well and is also one of the most commonly used metrics in software defect prediction. It is based on the ROC curve, whose full name is the receiver operating characteristic curve. The confusion matrix of the binary classification model in software defect prediction is shown in Table 2.
Table 2. Confusion matrix
The ROC curve takes the false positive rate (FPR) as the X axis and the true positive rate (TPR) as the Y axis. TPR is the proportion of modules correctly judged defective among all modules that are actually defective. FPR is the proportion of modules wrongly judged defective among all modules that are actually defect-free.
TPR = TP / (TP + FN)
FPR = FP / (TN + FP)
The AUC value is the area under the ROC curve; it ranges from 0 to 1, and the larger the value, the better the model performs.
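The two rates and the AUC can be computed directly from their definitions. The sketch below is illustrative: the function names and the example labels and scores are hypothetical, and the AUC is computed through the equivalent pairwise-ranking form (the fraction of defective/defect-free pairs in which the defective module receives the higher score, with ties counted as one half).

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR) from true labels and hard predictions (1 = defective)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (tn + fp)

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: two defective and three defect-free modules.
y_true = [1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
tpr, fpr = rates(y_true, y_pred)
area = auc(y_true, scores)
```

Because AUC depends only on how the scores rank defective modules against defect-free ones, it is not skewed by the class imbalance noted above.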
Baseline methods
The invention uses the following three active learning query strategies as baselines for comparison with the UNCERTAINTYKL strategy:
(1) Random sampling (random): randomly select a query instance from the unlabeled data, hand it to a domain expert for labeling, and add it to the training set;
(2) Uncertainty sampling based on information entropy (uncertainty): train an SVM classifier on the training set, predict each unlabeled instance, compute its information entropy, and select the instance with the highest uncertainty for labeling;
(3) Query-by-committee active learning (committee): build a query committee from SVM and random forest classifiers, and let the committee select the samples to label. In addition, a model is trained on all of the training data and tested on the test data for comparison with the active learning strategies; its performance can be regarded as approximately the best achievable on the training set.
Experimental results and analysis
The results plotted in Figs. 2 to 6 are summarized in Table 3, as follows:
Table 3. Comparison of AUC values (mean ± standard deviation); the best performance according to a paired t-test at the 95% confidence level is marked in bold
Table 4. Win/tie/loss analysis of the UNCERTAINTYKL model against the other models at different labeling ratios
Together with Tables 3 and 4, Figs. 2 to 6 show how the AUC changes under the different active learning query strategies as the number of labeled instances varies.
Table 3 shows the AUC at labeled sample ratios of 10%, 20%, 30%, 40%, and 50%. Once the labeled ratio exceeds 50%, the UNCERTAINTYKL strategy has already labeled all samples through pseudo-labeling, and no unlabeled samples remain. A paired t-test at the 95% confidence level is used for the statistical analysis, and the best-performing model is marked. Table 4 presents a win/tie/loss analysis of the learning strategies at the different labeling ratios, comparing UNCERTAINTYKL with the committee, random, and uncertainty strategies.
First, as the number of unlabeled samples decreases and labeled samples are added to the labeled dataset, the evaluation metric generally keeps rising, which shows that the active learning strategies are effective. Uncertainty sampling performs well, confirming that it can serve as a baseline strategy in active learning. The query-by-committee strategy shows considerable instability on the AEEEM dataset. In most cases UNCERTAINTYKL performs best, clearly outperforming the other active learning strategies, with a performance improvement of 13% in most cases.
The above embodiments are preferred embodiments of the invention and do not limit its technical solution. Any technical solution that can be realized on the basis of the above embodiments without creative work shall be regarded as falling within the scope of protection of this patent.
Claims (6)
1. A software defect prediction method based on a hybrid active learning strategy, characterized in that the method uses a cost-sensitive active learning approach in which information entropy and relative entropy cooperate, referred to as the UNCERTAINTYKL model; the UNCERTAINTYKL model uses information entropy as the quality criterion for samples, chooses samples with higher information entropy from the unlabeled sample data for manual labeling, and uses relative entropy to further analyze samples with low information entropy, further expanding the labeled dataset.
2. The software defect prediction method based on a hybrid active learning strategy according to claim 1, characterized in that the UNCERTAINTYKL model comprises the following steps:
Step 1: compute the information entropy of each unlabeled sample using the information entropy formula;
Step 2: using formula (1), select the sample with the highest information entropy from the unlabeled sample data and hand it to a domain expert for manual labeling; after labeling, add it to the labeled dataset;
Step 3: from the unlabeled samples remaining after step 2, select the sample with the lowest information entropy and evaluate it using the relative entropy calculation;
Step 4: preset a relative entropy threshold; if the relative entropy is below the threshold, add the sample to the labeled dataset, using the predicted label as the pseudo-label of the sample; if the relative entropy is above the threshold, leave the sample unprocessed.
3. The software defect prediction method based on a hybrid active learning strategy according to claim 2, characterized in that the sample with the highest information entropy in step 2 is computed as follows:
$$x_{u,\max} = \arg\max_{x}\left(-\sum_{i} P_\theta(y_i \mid x)\,\log P_\theta(y_i \mid x)\right) \tag{1}$$
where the unlabeled samples are indexed by $i = 1, 2, \ldots, u$; $y_i$ denotes the label value of a candidate class; $x_{u,\max}$ denotes the sample in the unlabeled dataset with the maximum information entropy according to formula (1); and $P_\theta(y_i \mid x)$ denotes the predicted probability, under the data distribution of the labeled dataset, that the sample belongs to class $y_i$.
4. The software defect prediction method based on a hybrid active learning strategy according to claim 2, characterized in that the relative entropy calculation in step 3 is defined by formulas whose symbols are as follows: the result is the mean of the relative entropies (KLD) computed over all classification models; $x_{u,\min}$ denotes the sample in the unlabeled dataset with the minimum information entropy according to formula (1); $C$ denotes the number of classifiers in the query committee, and the classifiers are trained on the dynamically updated dataset $D_l$; the committee is $C = \{\theta_1, \ldots, \theta_m\}$, whose members represent different classification strategies and can each compute a label for the unlabeled data; $P_C(y_i \mid x)$ denotes the committee's average probability for the candidate label $y_i$; and $D(P_{\theta(C)} \,\|\, P_C)$ denotes the relative entropy of classification model $\theta_i$ with respect to the other models.
5. The software defect prediction method based on a hybrid active learning strategy according to claim 2, characterized in that the threshold in step 4 is set to the empirical value 0.1; if the mean KLD value satisfies the threshold, $\theta_i$ is used to pseudo-label the sample $x_{u,\min}$.
6. The software defect prediction method based on a hybrid active learning strategy according to claim 1, characterized in that the model is solved with the following staged optimization strategy:
A. System initialization: before the system starts running, take a portion of the samples from the sample pool and hand them to a domain expert for manual labeling; this set is denoted $D_l$; the initial labeled set is drawn by random sampling from the sample set, and $D_l$ is used for the first training of classification model $\theta_1$, which serves as the basis for subsequently classifying unlabeled data;
B. Active selection of unlabeled samples: use $\theta_1$ to predict each unlabeled sample, compute the information entropy of each sample according to the formula, sort the samples, take the sample $x_{u,\max}$ with the maximum information entropy, hand it to a domain expert for manual labeling, and add $x_{u,\max}$ to the labeled dataset $D_l$;
C. Pseudo-labeling of the highest-certainty sample: take the sample $x_{u,\min}$ with the lowest information entropy, compute its relative entropy (KLD) according to the formulas, and compare the KLD with the threshold; if the threshold is satisfied, label $x_{u,\min}$ and add it to the labeled dataset $D_l$;
D. Classification model update: retrain $\theta_1$ on the labeled dataset $D_l$, and repeat the cycle until the termination condition is met.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811319619.3A CN109656808B (en) | 2018-11-07 | 2018-11-07 | Software defect prediction method based on hybrid active learning strategy |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811319619.3A CN109656808B (en) | 2018-11-07 | 2018-11-07 | Software defect prediction method based on hybrid active learning strategy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109656808A true CN109656808A (en) | 2019-04-19 |
CN109656808B CN109656808B (en) | 2022-03-11 |
Family
ID=66110556
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811319619.3A Active CN109656808B (en) | 2018-11-07 | 2018-11-07 | Software defect prediction method based on hybrid active learning strategy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109656808B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111353291A (en) * | 2019-12-27 | 2020-06-30 | 北京合力亿捷科技股份有限公司 | Method and system for calculating optimal label set based on complaint work order training text |
CN111400617A (en) * | 2020-06-02 | 2020-07-10 | 四川大学 | Social robot detection data set extension method and system based on active learning |
CN111506504A (en) * | 2020-04-13 | 2020-08-07 | 扬州大学 | Software development process measurement-based software security defect prediction method and device |
CN111914061A (en) * | 2020-07-13 | 2020-11-10 | 上海乐言信息科技有限公司 | Radius-based uncertainty sampling method and system for text classification active learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104166706A (en) * | 2014-08-08 | 2014-11-26 | 苏州大学 | Multi-label classifier constructing method based on cost-sensitive active learning |
CN104899135A (en) * | 2015-05-14 | 2015-09-09 | 工业和信息化部电子第五研究所 | Software defect prediction method and system |
US20180165554A1 (en) * | 2016-12-09 | 2018-06-14 | The Research Foundation For The State University Of New York | Semisupervised autoencoder for sentiment analysis |
Non-Patent Citations (2)
Title |
---|
YU-HANG ZHOU: "Large Margin Distribution Learning", 《IEEE》 * |
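Yang Jie: "Research and Application of a Small-Sample-Based Failure Prediction Model for Combat Systems", 《China Master's Theses Full-text Database, Information Science and Technology》 * |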
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111353291A (en) * | 2019-12-27 | 2020-06-30 | 北京合力亿捷科技股份有限公司 | Method and system for calculating optimal label set based on complaint work order training text |
CN111506504A (en) * | 2020-04-13 | 2020-08-07 | 扬州大学 | Software development process measurement-based software security defect prediction method and device |
CN111506504B (en) * | 2020-04-13 | 2023-04-07 | 扬州大学 | Software development process measurement-based software security defect prediction method and device |
CN111400617A (en) * | 2020-06-02 | 2020-07-10 | 四川大学 | Social robot detection data set extension method and system based on active learning |
CN111914061A (en) * | 2020-07-13 | 2020-11-10 | 上海乐言信息科技有限公司 | Radius-based uncertainty sampling method and system for text classification active learning |
CN111914061B (en) * | 2020-07-13 | 2021-04-16 | 上海乐言科技股份有限公司 | Radius-based uncertainty sampling method and system for text classification active learning |
Also Published As
Publication number | Publication date |
---|---|
CN109656808B (en) | 2022-03-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
He et al. | | An end-to-end steel surface defect detection approach via fusing multiple hierarchical features |
CN109656808A (en) | | Software defect prediction method based on hybrid active learning strategy |
CN110796186A (en) | | Dry and wet garbage identification and classification method based on improved YOLOv3 network |
CN109741332A (en) | | Image segmentation and annotation method based on human-machine collaboration |
CN105975913B (en) | | Road network extraction method based on adaptive cluster learning |
CN109919106B (en) | | Progressive target fine recognition and description method |
NL2029214B1 (en) | | Target re-identification method and system based on non-supervised pyramid similarity learning |
CN105389583A (en) | | Image classifier generation method, and image classification method and device |
CN104331716A (en) | | SVM active learning classification algorithm for large-scale training data |
CN112836739B (en) | | Classification model building method based on dynamic joint distribution alignment and application thereof |
CN109934203A (en) | | Cost-sensitive incremental face recognition method based on information entropy selection |
CN108898225A (en) | | Data annotation method based on human-machine collaborative learning |
CN110263934A (en) | | Artificial intelligence data annotation method and device |
CN111353377A (en) | | Elevator passenger number detection method based on deep learning |
CN104680185A (en) | | Hyperspectral image classification method based on boundary point reclassification |
CN109523514A (en) | | Batch imaging quality assessment method for inverse synthetic aperture radar (ISAR) |
CN116738551B (en) | | Intelligent processing method for acquired data of BIM model |
CN116704208B (en) | | Local interpretable method based on characteristic relation |
CN103093239B (en) | | Graph construction method fusing point-pair and neighborhood information |
CN109409394A (en) | | COP-KMeans method and system based on semi-supervised clustering |
Li et al. | | GADet: A Geometry-Aware X-ray Prohibited Items Detector |
CN107909090A (en) | | Semi-supervised piano score difficulty recognition method based on metric learning |
CN116894113A (en) | | Data security classification method and data security management system based on deep learning |
CN105160336A (en) | | Sigmoid function based face recognition method |
CN112199287B (en) | | Cross-project software defect prediction method based on enhanced hybrid expert model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||