CN109656808A - A software defect prediction method based on a hybrid active learning strategy - Google Patents

A software defect prediction method based on a hybrid active learning strategy

Info

Publication number
CN109656808A
CN109656808A
Authority
CN
China
Prior art keywords
sample
data
information entropy
active learning
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811319619.3A
Other languages
Chinese (zh)
Other versions
CN109656808B (en)
Inventor
曲豫宾 (Qu Yubin)
李芳 (Li Fang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong Textile Vocational Technology College
Original Assignee
Nantong Textile Vocational Technology College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong Textile Vocational Technology College filed Critical Nantong Textile Vocational Technology College
Priority to CN201811319619.3A priority Critical patent/CN109656808B/en
Publication of CN109656808A publication Critical patent/CN109656808A/en
Application granted granted Critical
Publication of CN109656808B publication Critical patent/CN109656808B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 — Error detection; Error correction; Monitoring
    • G06F 11/36 — Preventing errors by testing or debugging software
    • G06F 11/3604 — Software analysis for verifying properties of programs
    • G06F 11/3608 — Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/24 — Classification techniques
    • G06F 18/243 — Classification techniques relating to the number of classes
    • G06F 18/24323 — Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a software defect prediction method based on a hybrid active learning strategy, in which a cost-sensitive information-entropy criterion cooperates with relative entropy in an active learning method. The method uses information entropy as the evaluation index for high-value samples: samples with higher information entropy are labeled manually, while samples with low information entropy are further analyzed with relative entropy, expanding the labeled data set more effectively. Experiments show that the invention improves software defect prediction performance, reduces manual labeling cost, and is more efficient.

Description

A software defect prediction method based on a hybrid active learning strategy
Technical field
The present invention relates to the field of active learning techniques, and more particularly to a software defect prediction method based on a hybrid active learning strategy.
Background art
Defective software modules cause operational failures in enterprise production, leading to heavy losses for the enterprise and reduced customer satisfaction. Software defect prediction models are used to find defective modules as early as possible during software development; common models include supervised models and unsupervised models.
If a software project has abundant historical labeled data, a supervised machine learning model can be built to construct a within-project defect prediction model that estimates the probability that a module is defective or the number of defects in a module. In actual software development, if the project is brand new or its training data is scarce, the enterprise must devote considerable time to labeling defective modules. This labeling work is highly specialized and must be carried out by experienced personnel, so building a software defect prediction model takes a long time, requires more manpower, and raises the cost of software development.
Active learning addresses the sample-labeling problem by providing a variety of query strategies: when facing a large pool of modules to be labeled, the enterprise actively selects certain samples for labeling, and after manual labeling these samples are added to the labeled set so that a software defect prediction model can be established quickly. The selection strategy of active learning is used to pick high-value samples from the defect prediction data set; after manual labeling they expand the training set, and other machine learning techniques such as dimensionality reduction and feature selection can further improve defect prediction performance.
Commonly used selection strategies include uncertainty sampling based on information entropy. However, these studies pay little attention to samples with low information entropy, i.e., samples with high certainty: during each active learning query, low-entropy samples are usually discarded, and their exploitation is rarely considered.
Patent CN201710271035.2 discloses a multi-label active learning method based on a conditional label set, which jointly evaluates sample information entropy and relative entropy and selects the most informative samples as active learning targets. Although that method uses information entropy and relative entropy cooperatively, the relative entropy is computed within the information-entropy stage, which can adversely affect the efficiency and effectiveness of the system; moreover, low-entropy samples are still not well exploited.
Summary of the invention
To address the high cost of manual labeling and low prediction performance, the present invention provides a software defect prediction method based on a hybrid active learning strategy.
A software defect prediction method based on a hybrid active learning strategy, characterized in that the method uses a cost-sensitive information-entropy criterion in cooperation with relative entropy in an active learning method, referred to as the UNCERTAINTYKL model. The UNCERTAINTYKL model uses information entropy as the evaluation index for high-value samples: samples with higher information entropy are selected from the unlabeled data and labeled manually, while relative entropy is used to further analyze low-entropy samples and further expand the labeled data set.
Preferably, the UNCERTAINTYKL model comprises the following steps:
Step 1: compute the information entropy of each unlabeled sample with the information-entropy formula;
Step 2: using formula (1), select the unlabeled sample with the highest information entropy, hand it to a domain expert for manual labeling, and add it to the labeled data set once labeling is complete;
Step 3: from the samples remaining after step 2, screen the unlabeled sample with the lowest information entropy and label it using the relative entropy calculation;
Step 4: preset a relative entropy threshold; if the relative entropy is below the threshold, add the sample to the labeled data set and use the predicted label as its pseudo label; if the relative entropy is above the threshold, discard the sample.
Preferably, the sample with the highest information entropy in step 2 is computed as follows:
x_{u,max} = argmax_x ( − Σ_i P_θ(y_i | x) log P_θ(y_i | x) )    (1)
where i indexes the unlabeled samples (i = 1, 2, ..., u), y_i denotes a candidate class label, x_{u,max} denotes the sample in the unlabeled set with the maximum information entropy according to formula (1), and P_θ(y_i | x) denotes the predicted probability, under the distribution of the labeled data set, that sample x belongs to class y_i.
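For illustration, the following minimal Python sketch computes the prediction entropy of formula (1) for every unlabeled sample and picks the highest-entropy one for manual labeling. It assumes a fitted scikit-learn-style classifier exposing predict_proba and a NumPy feature matrix; it is a sketch of the selection step, not the patent's reference implementation.

import numpy as np

def select_max_entropy(model, X_unlabeled):
    # P_theta(y_i | x) for every unlabeled sample and every class
    proba = model.predict_proba(X_unlabeled)
    # Shannon entropy per sample; the small epsilon guards against log(0)
    entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)
    idx_max = int(np.argmax(entropy))  # index of x_{u,max}, to be labeled by the expert
    return idx_max, entropy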
Preferably, the relative entropy calculation in step 3 uses the following formulas:
KL̄(x) = (1/C) Σ_{c=1..C} D( P_{θ_c} ‖ P_C )    (2)
D( P_{θ_c} ‖ P_C ) = Σ_i P_{θ_c}(y_i | x) log( P_{θ_c}(y_i | x) / P_C(y_i | x) )    (3)
P_C(y_i | x) = (1/C) Σ_{c=1..C} P_{θ_c}(y_i | x)    (4)
where KL̄ denotes the mean of the KLD relative entropies computed over all classification models, x_{u,min} denotes the sample in the unlabeled set with the smallest information entropy according to formula (1), C denotes the number of classifiers on the query committee, the classifiers are trained on the dynamically updated data set D_l, and the classification committee is C = {θ_1, ..., θ_m}; the committee members represent different classification policies and each produces a prediction for the unlabeled data. P_C(y_i | x) denotes the committee's average predicted probability for class label y_i, and D( P_{θ_c} ‖ P_C ) denotes the relative entropy of classification model θ_c with respect to the committee consensus.
Preferably, the threshold in step 4 is set to the empirical value 0.1; if the mean relative entropy KL̄ satisfies the threshold condition, the classifier θ_i is used to assign a pseudo label to the sample x_{u,min}.
Preferably, to solve the model, the following staged optimization strategy is used; the optimization process is as follows:
A. System initialization: before the system starts running, a portion of samples is drawn from the sample pool and handed to a domain expert for manual labeling; this set is denoted D_l. The initial labeled set is obtained by random sampling from the pool. The data set D_l is used for the first training of the classification model θ_1, which serves as the basis for subsequent classification of unlabeled data;
B. Active selection of unlabeled samples: use the classification model θ_1 to predict each unlabeled sample, compute the information entropy of each sample according to the formula, take the sample x_{u,max} with the maximum entropy, hand it to the domain expert for manual labeling, and add x_{u,max} to the labeled data set D_l;
C. Pseudo-labeling of the most certain sample: take the sample x_{u,min} with the minimum information entropy, compute the relative entropy (KLD) according to the formulas, and compare KLD with the threshold; if the threshold condition is satisfied, label x_{u,min} and add it to the labeled data set D_l;
D. Classification model update: retrain the classification model θ_1 on the labeled data set D_l, then repeat the loop until the termination condition is met.
Beneficial effects:
1. Information entropy and relative entropy are fused in a cooperative, hybrid active learning query strategy: by using relative entropy to further analyze low-entropy samples, the information contained in the samples is exploited more fully, improving software defect prediction performance and finding defective software modules more quickly.
2. With the hybrid active learning query strategy, the enterprise only needs a relatively small up-front investment in manual labeling to obtain better defect prediction ability, so that costs can be controlled and manpower saved while still meeting business requirements.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the algorithm of the software defect prediction method based on the hybrid active learning strategy of the present invention.
Fig. 2 shows the AUC of the Equinox data set under different active learning query strategies.
Fig. 3 shows the AUC of the Eclipse JDT Core data set under different active learning query strategies.
Fig. 4 shows the AUC of the Apache Lucene data set under different active learning query strategies.
Fig. 5 shows the AUC of the Mylyn data set under different active learning query strategies.
Fig. 6 shows the AUC of the Eclipse PDE UI data set under different active learning query strategies.
Detailed description of embodiments
This section describes in detail the proposed cost-sensitive active learning strategy that combines information entropy with relative entropy (Cost-Effective Entropy Kullback-Leibler-divergence Active Learning), referred to as the UNCERTAINTYKL model. The strategy is applied to the AEEEM data set commonly used in software defect prediction; by incrementally improving the classification model through the selection strategy, the classifier achieves better classification metrics for the same amount of labeled data.
UNCERTAINTYKL active learning strategy
The UNCERTAINTYKL model of the invention incorporates the idea of co-training into the active learning strategy and completes model construction while minimizing the number of labels. The uncertainty-based active learning strategy selects the sample with the highest information entropy from the unlabeled data for labeling by a domain expert. At the same time, for the unlabeled sample with the lowest information entropy, the query committee votes based on the KL divergence (KLD): if the relative entropy computed by the committee members for that sample is below the currently configured threshold (empirical value 0.1), the sample is added to the labeled data set and the predicted label is used as its pseudo label. The details are as follows.
x_{u,max} = argmax_x ( − Σ_i P_θ(y_i | x) log P_θ(y_i | x) )    (1)
Define the labeled data set containing l samples as D_l = {x_1, ..., x_l} and the unlabeled data set containing u samples as D_u = {x_{l+1}, ..., x_{l+u}}; i indexes the unlabeled samples (i = 1, 2, ..., u); x_{u,max} denotes the sample in the unlabeled set with the maximum information entropy according to (1); P_θ(y_i | x) denotes the predicted probability, under the distribution of the labeled data set, that x_{u,max} belongs to class y_i; and argmax( − Σ_i P_θ(y_i | x) log P_θ(y_i | x) ) selects the unlabeled sample with the largest entropy value.
x_{u,min} denotes the sample in the unlabeled set with the smallest information entropy according to (1); C denotes the number of classifiers on the query committee, whose training set is the dynamically updated D_l. The classification committee is C = {θ_1, ..., θ_m}; its members represent different classification policies and each produces a prediction for the unlabeled data. P_C(y_i | x) denotes the committee's average predicted probability for class label y_i; D( P_{θ_c} ‖ P_C ) denotes the relative entropy of classification model θ_c with respect to the committee consensus; and KL̄ denotes the mean of the relative entropies computed over all classification models according to formulas (2)–(4). If the relative entropy is below the threshold, θ_i is used to pseudo-label the candidate sample.
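A minimal sketch of the committee check in formulas (2)–(4), assuming a list of fitted probabilistic classifiers as the committee and the 0.1 empirical threshold described above; the function names are illustrative, not part of the patent.

import numpy as np

def mean_kl_divergence(committee, x):
    # P_{theta_c}(y_i | x) for each committee member c; x has shape (1, d)
    member_probs = np.stack([clf.predict_proba(x)[0] for clf in committee])
    consensus = member_probs.mean(axis=0)  # P_C(y_i | x), formula (4)
    eps = 1e-12
    kl_each = np.sum(member_probs * np.log((member_probs + eps) / (consensus + eps)), axis=1)  # formula (3)
    return float(kl_each.mean())  # formula (2): average KLD over the committee

def maybe_pseudo_label(committee, x_min, threshold=0.1):
    # pseudo-label the most certain sample only when the committee agrees closely
    if mean_kl_divergence(committee, x_min) <= threshold:
        return int(committee[0].predict(x_min)[0])  # predicted label used as pseudo label
    return None  # relative entropy above the threshold: discard the sample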
The algorithm described above is specified as follows.
To solve the model, a staged optimization strategy is used; the optimization process is as follows:
A. System initialization
Before the system starts running, a portion of samples is drawn from the sample pool and handed to a domain expert for manual labeling; this set is denoted D_l. The initial labeled set is obtained by random sampling from the pool. The data set D_l is used for the first training of the classification model θ_1, which serves as the basis for subsequent classification of unlabeled data.
B. Active selection of unlabeled samples
Use the classification model θ_1 to predict each unlabeled sample, compute the information entropy of each sample according to formula (1), take the sample x_{u,max} with the maximum entropy, hand it to the domain expert for manual labeling, and add x_{u,max} to the labeled data set D_l.
C. Pseudo-labeling of the most certain sample
Take the sample x_{u,min} with the minimum information entropy, compute the relative entropy (KLD) according to formulas (2), (3) and (4), and compare KLD with the threshold; if the threshold condition is satisfied, label x_{u,min} and add it to the labeled data set D_l.
D. Classification model update
Retrain the classification model θ_1 on the labeled data set D_l, then repeat the loop until the termination condition is met.
The entire procedure can be summarized as Algorithm 1.
Algorithm 1: Staged strategy for solving the UNCERTAINTYKL model
1: Input: initial labeled data set D_l = {x_1, ..., x_l}, unlabeled data set D_u = {x_{l+1}, ..., x_{l+u}}, data labels y_1, ..., y_l, maximum number of iterations U_max, KLD threshold threshold
2: Output: classification committee set, classification performance set
3: train the classification models of the query committee on the labeled data
4: while current iteration < maximum iterations and not converged do
5:   for i ← 1 to U_max do
6:     take committee member θ_i and train it on the labeled data set
7:     compute the class probabilities P for the current x^(i) and the corresponding information entropy according to argmax( − Σ_i P_θ(y_i | x) log P_θ(y_i | x) )
8:     hand the sample x^(i) with the maximum information entropy to the domain expert for manual labeling and add it to the labeled data set
9:     for the sample x^(t) with the minimum information entropy, compute the relative entropy, i.e. the average KLD, according to formulas (2), (3) and (4)
10:    D_l = D_l ∪ x^(i); D_u = D_u \ x^(i)
11:    if KLD ≤ threshold: D_l = D_l ∪ x^(t); D_u = D_u \ x^(t)
12:    i = i + 1
13:  end for
14: end while
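A compact Python sketch of Algorithm 1 under stated assumptions: a small committee of differently seeded RandomForestClassifier models stands in for the query committee, oracle() stands in for the domain expert, and the helper functions select_max_entropy and maybe_pseudo_label are the sketches given earlier. It illustrates the control flow only and is not the patent's reference implementation.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def uncertaintykl_loop(X_l, y_l, X_u, oracle, u_max=50, threshold=0.1, members=3):
    def fit_committee():
        # retrain every committee member on the current labeled set D_l
        return [RandomForestClassifier(random_state=s).fit(X_l, y_l) for s in range(members)]

    committee = fit_committee()
    for _ in range(u_max):
        if len(X_u) == 0:  # no unlabeled samples left
            break
        idx_max, entropy = select_max_entropy(committee[0], X_u)
        X_l = np.vstack([X_l, X_u[idx_max]])          # expert labels x_{u,max}
        y_l = np.append(y_l, oracle(X_u[idx_max]))
        idx_min = int(np.argmin(entropy))
        x_min = X_u[idx_min:idx_min + 1]
        pseudo = maybe_pseudo_label(committee, x_min, threshold)
        drop = {idx_max}
        if pseudo is not None:                        # committee agrees: pseudo-label x_{u,min}
            X_l = np.vstack([X_l, x_min])
            y_l = np.append(y_l, pseudo)
            drop.add(idx_min)
        X_u = np.delete(X_u, list(drop), axis=0)
        committee = fit_committee()                   # classification model update
    return committee, X_l, y_l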
Experimental design
Evaluation objects
The present invention analyzes and evaluates the UNCERTAINTYKL query strategy on the public AEEEM data set, which is used to assess the impact of different learning strategies for active learning in software defect prediction. The AEEEM data set is widely used in the software defect prediction field and serves as a benchmark data set for performance comparison. It provides 61 metrics, including software development process measures; in this experiment all 61 metrics are used to build the classifier. Summary information on the AEEEM data set is shown in Table 1.
Table 1. Summary of the AEEEM data set
Experimental setup
The experiments use 5×2-fold cross-validation, with random stratified sampling of the data in every run: half of the data is used as training data and the other half as test data, which prevents overlap between training and test data and keeps the evaluation independent. A certain proportion of the training data is taken out and labeled manually; in this experiment the initial labeling ratio is 30%, and the classification model is trained on this initially labeled set. The remaining 70% of the data serves as unlabeled data, from which samples are selected according to the active learning strategy. The support vector machine (SVM) classifier in the experiments is trained with the RBF kernel and default parameters provided by libsvm. The UNCERTAINTYKL query strategy uses the RandomForestClassifier implemented in sklearn, trained with default parameters. In the UNCERTAINTYKL strategy, the unlabeled data is processed iteratively: each iteration selects the sample with the highest uncertainty for manual labeling and, at the same time, selects the sample with the lowest uncertainty for a further KLD-based judgment, with the judgment threshold set to the empirical value 0.1.
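A minimal sketch of one repetition of this setup, using scikit-learn's SVC in place of the raw libsvm binding; the function name, the seed handling and the probability option are assumptions, while the half/half stratified split and the 30% initial labeling follow the description above.

import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.svm import SVC

def run_one_repetition(X, y, seed=0):
    # one of the 5x2 cross-validation repetitions: a stratified half/half split
    skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)
    train_idx, test_idx = next(skf.split(X, y))
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]

    # 30% of the training half is labeled up front; the remaining 70% is the unlabeled pool
    X_l, X_u, y_l, _ = train_test_split(
        X_train, y_train, train_size=0.30, stratify=y_train, random_state=seed)

    # RBF-kernel SVM with default parameters, as in the experiments
    svm = SVC(kernel="rbf", probability=True).fit(X_l, y_l)
    return svm, (X_l, y_l), X_u, (X_test, y_test)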
Evaluation metrics
The AEEEM data set suffers from class imbalance, so the AUC (area under the ROC curve) is used: it reflects the performance of active learning query strategies well and is one of the most commonly used metrics in software defect prediction. The metric is based on the ROC curve, whose full name is the receiver operating characteristic curve. The confusion matrix of a binary classification model in software defect prediction is shown in Table 2.
Table 2. Confusion matrix
The ROC curve plots the false positive rate (FPR) on the X-axis and the true positive rate (TPR) on the Y-axis. TPR is the proportion of truly defective modules that are correctly judged as defective; FPR is the proportion of truly non-defective modules that are wrongly judged as defective.
TPR = TP / (TP + FN)
FPR = FP / (TN + FP)
The AUC is the area under the ROC curve; its value ranges from 0 to 1, and a larger value indicates better model performance.
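For concreteness, a short sketch computing TPR, FPR and AUC with scikit-learn; the 0.5 decision threshold and the convention that label 1 means a defective module are assumptions for illustration.

import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def defect_metrics(y_true, y_score, threshold=0.5):
    # binarize the defect scores and build the confusion matrix (0 = clean, 1 = defective)
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    tpr = tp / (tp + fn)                  # TPR = TP / (TP + FN)
    fpr = fp / (tn + fp)                  # FPR = FP / (TN + FP)
    auc = roc_auc_score(y_true, y_score)  # area under the ROC curve
    return tpr, fpr, auc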
Baseline methods
The present invention uses the following three active learning query strategies as baselines for comparison with the UNCERTAINTYKL active learning strategy:
(1) Random sampling strategy (random): a query instance is selected at random from the unlabeled data, handed to the domain expert for labeling, and added to the training data set;
(2) Uncertainty sampling strategy based on information entropy (uncertainty): an SVM classifier is trained on the training data set, each unlabeled sample is predicted, its information entropy is computed, and the most uncertain instance is selected for labeling;
(3) Query-by-committee active learning strategy (committee): a query committee built from an SVM and a random forest queries the samples to be labeled (a minimal sketch of such a committee is given after this list). In addition, a model trained on all training data and evaluated on the test data is used for comparison with the other active learning strategies; its training performance can be regarded approximately as that of the model trained on the optimal training data.
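A sketch of the committee baseline in (3), assuming binary 0/1 labels and using vote entropy as the disagreement measure; the disagreement measure and the function name are assumptions and may differ from the exact implementation used in the experiments.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def committee_query(X_l, y_l, X_u):
    # the committee: an SVM and a random forest, both trained on the labeled data
    committee = [
        SVC(kernel="rbf", probability=True, random_state=0).fit(X_l, y_l),
        RandomForestClassifier(random_state=0).fit(X_l, y_l),
    ]
    votes = np.stack([clf.predict(X_u) for clf in committee])  # shape (2, u), labels in {0, 1}
    # vote entropy per unlabeled sample: high entropy means the members disagree
    p_defect = votes.mean(axis=0)
    p = np.stack([1.0 - p_defect, p_defect], axis=1)
    vote_entropy = -np.sum(p * np.log(p + 1e-12), axis=1)
    return int(np.argmax(vote_entropy)), committee  # index of the sample to query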
Experimental results and analysis
The analysis of Figs. 2 to 6 is summarized in Tables 3 and 4, as follows:
Table 3. AUC comparison (mean ± standard deviation); the best performance under a paired t-test at the 95% confidence level is marked in bold
Table 4. Win/tie/loss analysis of the UNCERTAINTYKL model against the other models under different labeling ratios
As shown in Tables 3 and 4, Figs. 2 to 6 illustrate how the AUC of the different active learning query strategies changes as the number of labeled instances varies.
Table 3 shows the AUC values when the labeled sample ratio is 10%, 20%, 30%, 40% and 50%. When the labeled ratio exceeds 50%, the UNCERTAINTYKL active learning strategy has already completed the labeling of all samples using the pseudo-labeling method, leaving no unlabeled samples. Statistical analysis with a paired t-test at the 95% confidence level is used to mark the best-performing model. Table 4 gives a win/tie/loss analysis of the learning strategies under different labeling ratios, comparing the UNCERTAINTYKL learning strategy against the committee, random and uncertainty strategies.
First, we observe that as the unlabeled samples gradually decrease and labeled samples are added to the labeled data set, the evaluation metric generally keeps an upward trend, which shows the effectiveness of the active learning strategies. The uncertainty sampling strategy performs well, confirming that it is a reasonable baseline in the active learning field. The query-by-committee strategy shows considerable instability on the AEEEM data set. In most cases UNCERTAINTYKL performs best, with a clear improvement over the other active learning strategies; in most situations the performance improvement reaches 13%.
The above embodiment is a preferred embodiment of the present invention and does not limit the technical solution of the invention; any technical solution that can be realized on the basis of the above embodiment without creative work shall be regarded as falling within the scope of patent protection of the present invention.

Claims (6)

1. A software defect prediction method based on a hybrid active learning strategy, characterized in that the method uses a cost-sensitive information-entropy criterion in cooperation with relative entropy in an active learning method, referred to as the UNCERTAINTYKL model; the UNCERTAINTYKL model uses information entropy as the evaluation index for high-value samples, selects samples with higher information entropy from the unlabeled sample data for manual labeling, and at the same time uses relative entropy to further analyze low-entropy samples, thereby further expanding the labeled data set.
2. The software defect prediction method based on a hybrid active learning strategy according to claim 1, characterized in that the UNCERTAINTYKL model comprises the following steps:
Step 1: compute the information entropy of each unlabeled sample with the information-entropy formula;
Step 2: using calculation formula (1), select the unlabeled sample with the highest information entropy, hand it to a domain expert for manual labeling, and add it to the labeled data set once labeling is complete;
Step 3: from the samples remaining after step 2, screen the unlabeled sample with the lowest information entropy and label it using the relative entropy calculation;
Step 4: preset a relative entropy threshold; if the relative entropy is below the threshold, add the sample to the labeled data set and use the predicted label as its pseudo label; if the relative entropy is above the threshold, discard the sample.
3. The software defect prediction method based on a hybrid active learning strategy according to claim 2, characterized in that the sample with the highest information entropy in step 2 is computed as follows:
x_{u,max} = argmax_x ( − Σ_i P_θ(y_i | x) log P_θ(y_i | x) )    (1)
where i indexes the unlabeled samples (i = 1, 2, ..., u), y_i denotes a candidate class label, x_{u,max} denotes the sample in the unlabeled set with the maximum information entropy according to formula (1), and P_θ(y_i | x) denotes the predicted probability, under the distribution of the labeled data set, that the sample belongs to class y_i.
4. The software defect prediction method based on a hybrid active learning strategy according to claim 2, characterized in that the relative entropy calculation in step 3 uses the following formulas:
KL̄(x) = (1/C) Σ_{c=1..C} D( P_{θ_c} ‖ P_C )    (2)
D( P_{θ_c} ‖ P_C ) = Σ_i P_{θ_c}(y_i | x) log( P_{θ_c}(y_i | x) / P_C(y_i | x) )    (3)
P_C(y_i | x) = (1/C) Σ_{c=1..C} P_{θ_c}(y_i | x)    (4)
where KL̄ denotes the mean of the KLD relative entropies computed over all classification models, x_{u,min} denotes the sample in the unlabeled set with the smallest information entropy according to formula (1), C denotes the number of classifiers on the query committee, the data set of the classifiers is the dynamically updated D_l, and the classification committee is C = {θ_1, ..., θ_m}; the committee members represent different classification policies and each produces a prediction for the unlabeled data; P_C(y_i | x) denotes the committee's average predicted probability for class label y_i; D( P_{θ_c} ‖ P_C ) denotes the relative entropy of classification model θ_c with respect to the committee consensus.
5. The software defect prediction method based on a hybrid active learning strategy according to claim 2, characterized in that the threshold in step 4 is set to the empirical value 0.1; if the value of KL̄ satisfies the threshold condition, the classifier θ_i is used to assign a pseudo label to the sample x_{u,min}.
6. The software defect prediction method based on a hybrid active learning strategy according to claim 1, characterized in that, to solve the model, the following staged optimization strategy is used; the optimization process is as follows:
A. System initialization: before the system starts running, a portion of samples is drawn from the sample pool and handed to a domain expert for manual labeling; this set is denoted D_l; the initial labeled set is obtained by random sampling from the pool; the data set D_l is used for the first training of the classification model θ_1, which serves as the basis for subsequent classification of unlabeled data;
B. Active selection of unlabeled samples: use the classification model θ_1 to predict each unlabeled sample, compute the information entropy of each sample according to the formula, take the sample x_{u,max} with the maximum entropy, hand it to the domain expert for manual labeling, and add x_{u,max} to the labeled data set D_l;
C. Pseudo-labeling of the most certain sample: take the sample x_{u,min} with the minimum information entropy, compute the relative entropy (KLD) according to the formulas, and compare KLD with the threshold; if the threshold condition is satisfied, label x_{u,min} and add it to the labeled data set D_l;
D. Classification model update: retrain the classification model θ_1 on the labeled data set D_l, then repeat the loop until the termination condition is met.
CN201811319619.3A 2018-11-07 2018-11-07 Software defect prediction method based on hybrid active learning strategy Active CN109656808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811319619.3A CN109656808B (en) 2018-11-07 2018-11-07 Software defect prediction method based on hybrid active learning strategy


Publications (2)

Publication Number Publication Date
CN109656808A true CN109656808A (en) 2019-04-19
CN109656808B CN109656808B (en) 2022-03-11

Family

ID=66110556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811319619.3A Active CN109656808B (en) 2018-11-07 2018-11-07 Software defect prediction method based on hybrid active learning strategy

Country Status (1)

Country Link
CN (1) CN109656808B (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104166706A (en) * 2014-08-08 2014-11-26 苏州大学 Multi-label classifier constructing method based on cost-sensitive active learning
CN104899135A (en) * 2015-05-14 2015-09-09 工业和信息化部电子第五研究所 Software defect prediction method and system
US20180165554A1 (en) * 2016-12-09 2018-06-14 The Research Foundation For The State University Of New York Semisupervised autoencoder for sentiment analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YU-HANG ZHOU: "Large Margin Distribution Learning", IEEE *
杨杰 (Yang Jie): "Research and Application of a Failure Prediction Model for Combat Systems Based on Small Samples", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353291A (en) * 2019-12-27 2020-06-30 北京合力亿捷科技股份有限公司 Method and system for calculating optimal label set based on complaint work order training text
CN111506504A (en) * 2020-04-13 2020-08-07 扬州大学 Software development process measurement-based software security defect prediction method and device
CN111506504B (en) * 2020-04-13 2023-04-07 扬州大学 Software development process measurement-based software security defect prediction method and device
CN111400617A (en) * 2020-06-02 2020-07-10 四川大学 Social robot detection data set extension method and system based on active learning
CN111914061A (en) * 2020-07-13 2020-11-10 上海乐言信息科技有限公司 Radius-based uncertainty sampling method and system for text classification active learning
CN111914061B (en) * 2020-07-13 2021-04-16 上海乐言科技股份有限公司 Radius-based uncertainty sampling method and system for text classification active learning

Also Published As

Publication number Publication date
CN109656808B (en) 2022-03-11

Similar Documents

Publication Publication Date Title
He et al. An end-to-end steel surface defect detection approach via fusing multiple hierarchical features
CN109656808A (en) A kind of Software Defects Predict Methods based on hybrid active learning strategies
CN110796186A (en) Dry and wet garbage identification and classification method based on improved YOLOv3 network
CN109741332A (en) A kind of image segmentation and mask method of man-machine coordination
CN105975913B (en) Road network extraction method based on adaptive cluster learning
CN109919106B (en) Progressive target fine recognition and description method
NL2029214B1 (en) Target re-indentification method and system based on non-supervised pyramid similarity learning
CN105389583A (en) Image classifier generation method, and image classification method and device
CN104331716A (en) SVM active learning classification algorithm for large-scale training data
CN112836739B (en) Classification model building method based on dynamic joint distribution alignment and application thereof
CN109934203A (en) A kind of cost-sensitive increment type face identification method based on comentropy selection
CN108898225A (en) Data mask method based on man-machine coordination study
CN110263934A (en) A kind of artificial intelligence data mask method and device
CN111353377A (en) Elevator passenger number detection method based on deep learning
CN104680185A (en) Hyperspectral image classification method based on boundary point reclassification
CN109523514A (en) To the batch imaging quality assessment method of Inverse Synthetic Aperture Radar ISAR
CN116738551B (en) Intelligent processing method for acquired data of BIM model
CN116704208B (en) Local interpretable method based on characteristic relation
CN103093239B (en) A kind of merged point to neighborhood information build drawing method
CN109409394A (en) A kind of cop-kmeans method and system based on semi-supervised clustering
Li et al. GADet: A Geometry-Aware X-ray Prohibited Items Detector
CN107909090A (en) Learn semi-supervised music-book on pianoforte difficulty recognition methods based on estimating
CN116894113A (en) Data security classification method and data security management system based on deep learning
CN105160336A (en) Sigmoid function based face recognition method
CN112199287B (en) Cross-project software defect prediction method based on enhanced hybrid expert model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant