CN107292330B - Iterative label noise identification algorithm based on double information of supervised learning and semi-supervised learning
- Publication number: CN107292330B (application number CN201710315861.2A)
- Authority: CN (China)
- Prior art keywords: noise, supervised learning, sample, value, label
- Prior art date: 2017-05-02
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/214 — Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/2415 — Pattern recognition; analysing; classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
Abstract
The invention discloses an iterative label noise identification algorithm based on the double information of supervised learning and semi-supervised learning, belonging to the fields of machine learning and data mining. The algorithm combines supervised and semi-supervised learning: the supervised-learning part generates its noise identification result by soft multiple voting; the semi-supervised part labels an unlabeled data set with a classification model trained on the clean data produced by the supervised part, takes the newly labeled data as a training set, and detects noise in the labeled data set with a weighted KNN method to generate a second identification result. The two results are finally combined into the final identification result. The algorithm also works iteratively: the samples examined in each iteration are those remaining after the noise found in the previous iteration has been filtered out. Compared with traditional noise identification algorithms, the method exploits more complementary information and, aided by the iterative scheme, achieves better noise identification accuracy.
Description
Technical Field
The invention relates to the technical field of data mining and machine learning, in particular to an iterative label noise identification algorithm based on double information of supervised learning and semi-supervised learning.
Background
Much of the training data used in practical machine-learning applications is noisy; causes include human error, hardware faults, and errors in the data collection process. The traditional remedy is to clean the source data manually before applying machine-learning algorithms, but manual cleaning is laborious, tedious and time-consuming, and still cannot guarantee fully correct data, which has a non-negligible effect on subsequent algorithm applications. Data noise generally falls into two categories: attribute noise, meaning inaccurate sample attribute values, and class noise, meaning inaccurate sample labels [1]. Class noise has a greater impact than attribute noise.
The processing methods for class noise fall into two lines: designing robust algorithms [2, 3] and designing noise detection algorithms [4, 5, 6, 7]. Robust algorithms improve existing algorithms so that they are less affected by class noise, whereas noise detection algorithms detect and remove noise before the noisy data is used. Of the two, class-noise detection algorithms are the more effective and versatile.
Existing class-noise detection algorithms fall into two main types: those based on supervised learning and those based on semi-supervised learning. The representatives of the supervised type are ensemble-learning algorithms, most notably majority filtering and consensus filtering [7]. In these algorithms, the training data is first randomly divided into subsets, and each subset is then noise-checked individually: the samples in the remaining subsets are used to train multiple classifiers, which vote on each sample's label. Such an algorithm thus has two main steps, sample division and multi-classifier voting. Because the division and the voting are performed only once, these are single-vote label-noise detection methods, which suffer from two defects: the result of a single vote is strongly affected by the particular sample division, and the likelihood of missing noise is high. A later improvement, the multiple-voting class-noise detection method [8], addresses these deficiencies but still misses some noise. On the semi-supervised side, the algorithm of [6] trains a classification model on the known labeled data, labels the unlabeled data, and adds the newly labeled data to the existing labeled set to enlarge the training set; a better classification model can then be trained from the larger training set to detect label noise.
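To make the single-vote scheme concrete, the following is a minimal sketch of majority filtering, assuming scikit-learn-style classifiers and NumPy arrays; the subset count and the three base learners are illustrative defaults rather than values prescribed by [7]:

```python
import numpy as np
from sklearn.base import clone
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def majority_filter(X, y, n_subsets=3, bases=None, seed=None):
    """Single-vote majority filtering: flag a sample as label noise when
    more than half of the classifiers trained on the complementary
    subsets disagree with its given label."""
    rng = np.random.default_rng(seed)
    bases = bases or [GaussianNB(), DecisionTreeClassifier(),
                      KNeighborsClassifier()]
    folds = np.array_split(rng.permutation(len(X)), n_subsets)
    noisy = np.zeros(len(X), dtype=bool)
    for i, fold in enumerate(folds):
        # train every base learner on the union of the other subsets
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        disagree = np.zeros(len(fold), dtype=int)
        for base in bases:
            clf = clone(base).fit(X[train], y[train])
            disagree += clf.predict(X[fold]) != y[fold]
        # majority rule; consensus filtering would instead require
        # disagree == len(bases)
        noisy[fold] = disagree > len(bases) / 2
    return noisy
```

Because the partition and the vote happen once, two runs with different random partitions can flag different samples — the single-vote weakness discussed above.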
Supervised learning alone leaves the hidden information in unlabeled data unexplored and is more likely to miss noise. Semi-supervised learning alone starts from a labeled set that itself contains noise, so the labels it assigns to the unlabeled data inherit that noise; if the accumulated noise grows large, the final classification model can be very poor.
References:
[1] X. Zhu and X. Wu, "Class noise vs. attribute noise: A quantitative study," Artificial Intelligence Review 22(3) (2004) 177-210.
[2] J. Bootkrajang, A. Kaban, "Classification of mislabelled microarrays using robust sparse logistic regression," Bioinformatics 29(7) (2013) 870-877.
[3] J. Saez, M. Galar, J. Luengo, F. Herrera, "A first study on decomposition strategies with data with class noise using decision trees," in: Hybrid Artificial Intelligent Systems, Lecture Notes in Computer Science, vol. 7209, 2012, pp. 25-35.
[4] D.L. Wilson, "Asymptotic properties of nearest neighbor rules using edited data," IEEE Trans. Syst. Man Cybernet. 2(3) (1992) 431-433.
[5] J. Young, J. Ashburner, S. Ourselin, "Wrapper methods to correct mislabeled training data," in: 3rd International Workshop on Pattern Recognition in Neuroimaging, 2013, pp. 170-173.
[6] D. Guan, W. Yuan, et al., "Identifying mislabeled training data with the aid of unlabeled data," Appl. Intell. 35(3) (2011) 345-358.
[7] C.E. Brodley, M.A. Friedl, "Identifying mislabeled training data," J. Artif. Intell. Res. 11 (1999) 131-167.
[8] D. Guan, W. Yuan, T. Ma, et al., "Detecting potential labeling errors for bioinformatics by multiple voting," Knowledge-Based Systems 66 (2014) 28-35.
Disclosure of the Invention
The invention aims to provide an iterative label noise identification algorithm based on the double information of supervised learning and semi-supervised learning. By combining the two kinds of information, with parameters and strategies that can be set according to the actual situation, the algorithm avoids the noise-detection problems of either kind of single information alone, effectively improves identification accuracy, and, through iteration, discovers noise data more thoroughly.
The iterative label noise identification algorithm based on the double information of supervised learning and semi-supervised learning disclosed by the invention comprises the following steps:
step 1) determining algorithm input variables, including a sample set L to be processed and an unlabeled sample set U, a maximum iteration count maxIter, a multiple-voting count numVote, a final noise-judgment voting confidence numFinalConfidence, a random partition count numPartition, a classifier count numClassifier, a per-vote noise identification confidence numConfidence, and a confidence threshold ConfidenceThreshold for judging noise; initializing the multiple-voting counter t = 1, the iteration counter m = 1, and the sample set E = L;
step 2) randomly dividing E into numPartition equally sized subsets E1, E2, ..., EnumPartition, and initializing the parameter i = 1;
step 3) taking the samples in E - Ei (the union of the remaining subsets) as training data, selecting numClassifier different classification algorithms, and training numClassifier different classifiers H1, H2, ..., HnumClassifier;
step 4) using H1, H2, ..., HnumClassifier to classify the samples in the subset Ei, respectively calculating the numConfidence of each sample, and storing the calculation results in a table;
step 5) iteratively executing steps 3) to 4), adding 1 to the value of i after each pass, until i equals numPartition and the iteration stops; after the voting is finished, the numConfidence of all samples has been calculated and stored in the table;
step 6) iteratively executing steps 2) to 5), adding 1 to the value of t after each pass, until t equals numVote, thereby generating numVote tables;
step 7) comprehensively analyzing the numVote tables, aggregating the numConfidence of each sample e to obtain its numFinalConfidence(e), and storing it in one table; initializing a set En and storing in En, as suspicious samples, all samples whose numFinalConfidence(e) is smaller than the predetermined ConfidenceThreshold;
step 8) taking E' = E - En as a training set, generating numClassifier classifiers based on the numClassifier classification algorithms, and labeling the unlabeled sample set U with these classifiers to obtain a labeled sample set;
step 9) taking the data set E as the test set and the labeled data set as the training set, calculating numFinalConfidence(e)' for each sample with a weighted KNN algorithm, and storing it in a table;
step 10) adding, for each sample, its values in the numFinalConfidence(e) table and the numFinalConfidence(e)' table to obtain a final confidence table, and regarding samples whose value is smaller than the specified threshold ConfidenceThreshold as noise; letting the noise detected in the m-th iteration be En_m, and setting E = E - En_m;
step 11) iteratively executing steps 2) to 10), adding 1 to the value of m after each iteration, until En_m is empty or m equals maxIter;
step 12) returning the value E, where E is the clean sample set after the noise has been deleted; the algorithm ends.
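Assembling steps 1) to 12), the skeleton below is a non-authoritative sketch of the whole loop. The helper names are invented for illustration, the unlabeled set U is labeled with a single classifier rather than the numClassifier ensemble of step 8), and the summed-confidence test of step 10) reuses the single ConfidenceThreshold, as the text does. It assumes NumPy arrays, scikit-learn classifiers, and that every class in E also occurs among the pseudo-labels of U:

```python
import numpy as np
from sklearn.base import clone
from sklearn.neighbors import KNeighborsClassifier

def supervised_confidence(X, y, bases, num_partition=3, num_vote=5, seed=None):
    """numFinalConfidence(e): over numVote random partitions, the average
    fraction of complementary-subset classifiers that agree with e's label."""
    rng = np.random.default_rng(seed)
    conf = np.zeros(len(X))
    for _ in range(num_vote):                          # step 6): numVote votes
        folds = np.array_split(rng.permutation(len(X)), num_partition)
        for i, fold in enumerate(folds):               # steps 3)-5)
            train = np.concatenate([f for j, f in enumerate(folds) if j != i])
            agree = np.zeros(len(fold))
            for base in bases:
                clf = clone(base).fit(X[train], y[train])
                agree += clf.predict(X[fold]) == y[fold]
            conf[fold] += agree / len(bases)
    return conf / num_vote

def iterative_dual_filter(X, y, X_u, bases, threshold=0.4, max_iter=100, k=5):
    keep = np.arange(len(X))                           # E starts as L
    for _ in range(max_iter):
        Xe, ye = X[keep], y[keep]
        sup = supervised_confidence(Xe, ye, bases)     # steps 2)-7)
        trusted = sup >= threshold                     # E' = E - En
        labeler = clone(bases[0]).fit(Xe[trusted], ye[trusted])  # step 8)
        y_u = labeler.predict(X_u)                     # pseudo-label U
        knn = KNeighborsClassifier(n_neighbors=k, weights='distance')
        knn.fit(X_u, y_u)                              # step 9)
        proba = knn.predict_proba(Xe)
        # confidence the weighted KNN gives each sample's own label
        semi = proba[np.arange(len(Xe)), np.searchsorted(knn.classes_, ye)]
        noise = (sup + semi) < threshold               # step 10)
        if not noise.any():                            # step 11) stop rule
            break
        keep = keep[~noise]                            # E = E - En_m
    return keep                                        # indices of clean E
```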
Further, in step 3), numClassifier is chosen to be odd; an odd number avoids tied votes. The classification algorithms are one or more of k-nearest neighbors, decision tree, Bayes, neural network and support vector machine. The choice of numClassifier is also affected by the data set: for small sample sets a larger numClassifier should be used to ensure diversity among the classifiers, and when the label noise in the sample set is high a larger numClassifier should likewise be used, since it keeps the per-iteration identification rate high, reduces the number of iterations, and improves efficiency. Conversely, a smaller numClassifier may be chosen when the sample set is larger and its label noise rate is lower; for example, numClassifier may be set to 3.
In another refinement, in step 7), the larger the threshold ConfidenceThreshold, the larger the suspected-noise set En produced by the supervised-learning part, the cleaner the training data E' = E - En used to label the unlabeled set U, the more accurate the resulting labels, and hence the more accurate the subsequent detection of noise in E. However, ConfidenceThreshold must not be too large, or some accurately labeled data in E will be treated as noise, leaving the E' data set too small to train a classification model good enough to label U.
In another refinement, the threshold ConfidenceThreshold in step 7) may take a conventional value such as 0.1, 0.2, 0.3 or 0.4. An optimized value may also be computed from a separate calibration sample, as follows: a) estimate the noise ratio of the data to be processed from prior knowledge; b) add random noise to the verification samples at that ratio; c) traverse the candidate ConfidenceThreshold values and, for each, compute the algorithm's identification accuracy on the noise in the verification samples; d) select the ConfidenceThreshold with the highest identification accuracy.
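A sketch of this calibration procedure follows; `detect_noise` stands for any noise detector that returns a Boolean noise mask at a given threshold, and the candidate grid and default noise ratio are illustrative assumptions:

```python
import numpy as np

def calibrate_threshold(X_val, y_val, detect_noise, noise_ratio=0.2,
                        candidates=(0.1, 0.2, 0.3, 0.4), seed=None):
    """Steps a)-d): inject random label noise at the estimated ratio into
    the verification sample, then keep the threshold that identifies the
    injected noise most accurately."""
    rng = np.random.default_rng(seed)
    y_noisy = y_val.copy()
    flip = rng.choice(len(y_val), int(noise_ratio * len(y_val)), replace=False)
    classes = np.unique(y_val)
    for i in flip:                       # flip to a different random class
        y_noisy[i] = rng.choice(classes[classes != y_val[i]])
    truth = np.zeros(len(y_val), dtype=bool)
    truth[flip] = True
    accs = [np.mean(detect_noise(X_val, y_noisy, threshold=th) == truth)
            for th in candidates]
    return candidates[int(np.argmax(accs))]
```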
The beneficial effects of the invention are as follows. The algorithm uses the combined double information of supervised and semi-supervised learning, so data is no longer judged by a single kind of information: supervised learning passes judgment on the data, semi-supervised learning passes its own judgment, and the two judgments are finally merged into one result. The supervised-learning part identifies noise by multiple voting, randomly reshuffling the samples before each vote to ensure that the votes differ. After the suspicious-noise set En from the supervised part is obtained, part of the suspicious data is first filtered out as E' = E - En; E' is then used to train a classification model that labels the unlabeled set U; the labeled U serves as a training set, and a weighted-KNN classification algorithm tests the data in E to obtain numFinalConfidence(e)' for each sample in E. Finally, the two confidence values numFinalConfidence(e) and numFinalConfidence(e)' are combined to obtain the noise set En_m detected in the m-th iteration and the clean data set E - En_m. In addition, the algorithm identifies iteratively: the samples examined in each iteration are the clean samples output by the previous iteration's noise filtering, so all noise data can be identified more comprehensively and thoroughly. The identification algorithm thus overcomes the low identification accuracy of existing label-noise identification algorithms and ensures high noise-identification accuracy.
Drawings
FIG. 1 is a flow chart of an iterative label noise identification algorithm based on double information of supervised learning and semi-supervised learning according to the present invention.
Detailed Description
The following describes an iterative label noise identification algorithm based on double information of supervised learning and semi-supervised learning in detail with reference to the accompanying drawings.
As shown in FIG. 1, the iterative label noise identification algorithm based on the double information of supervised learning and semi-supervised learning of the present invention includes the following steps:
step 1) determining algorithm input variables, including a sample set L to be processed and an unlabeled sample set U, a maximum iteration count maxIter, a multiple-voting count numVote, a final noise-judgment voting confidence numFinalConfidence, a random partition count numPartition, a classifier count numClassifier, a per-vote noise identification confidence numConfidence, and a confidence threshold ConfidenceThreshold for judging noise; initializing the multiple-voting counter t = 1, the iteration counter m = 1, and the sample set E = L;
step 2) randomly dividing E into numPartition equally sized subsets E1, E2, ..., EnumPartition, and initializing the parameter i = 1;
step 3) taking the samples in E - Ei (the union of the remaining subsets) as training data, selecting numClassifier different classification algorithms, and training numClassifier different classifiers H1, H2, ..., HnumClassifier. numClassifier is chosen to be an odd number such as 3, 5 or 7, although it is not limited to the odd numbers enumerated; the classification algorithms are one or more of k-nearest neighbors, decision tree, Bayes, neural network and support vector machine;
step 4) using H1, H2, ..., HnumClassifier to classify the samples in the subset Ei, respectively calculating the numConfidence of each sample, and storing the calculation results in a table;
step 5) iteratively executing steps 3) to 4), adding 1 to the value of i after each pass, until i equals numPartition and the iteration stops; after the voting is finished, the numConfidence of all samples has been calculated and stored in the table;
step 6) iteratively executing steps 2) to 5), adding 1 to the value of t after each pass, until t equals numVote, thereby generating numVote tables;
step 7) comprehensively analyzing the numVote tables, aggregating the numConfidence of each sample e to obtain its numFinalConfidence(e), and storing it in one table; initializing a set En and storing in En, as suspicious samples, all samples whose numFinalConfidence(e) is smaller than the predetermined ConfidenceThreshold. A larger threshold ConfidenceThreshold yields a larger En and therefore a purer E', but it must not be too large, otherwise the set E' becomes small and a good training model cannot be trained for labeling U. The threshold ConfidenceThreshold = 0.4 is therefore a preferred example; other suitable values may be selected;
step 8) taking E' = E - En as a training set, generating numClassifier classifiers based on the numClassifier classification algorithms, and labeling the unlabeled sample set U with these classifiers to obtain a labeled sample set;
step 9) taking the data set E as the test set and the labeled data set as the training set, calculating numFinalConfidence(e)' for each sample with a weighted KNN algorithm, and storing it in a table. The value of K in the weighted KNN may be 3, 5, 7, 9, etc.; K = 5 is a preferred example, and any other suitable value may be selected (a sketch of this weighted-KNN confidence computation is given after step 12 below);
step 10) adding, for each sample, its values in the numFinalConfidence(e) table and the numFinalConfidence(e)' table to obtain a final confidence table, and regarding samples whose value is smaller than the specified threshold ConfidenceThreshold as noise; letting the noise detected in the m-th iteration be En_m, and setting E = E - En_m;
step 11) iteratively executing steps 2) to 10), adding 1 to the value of m after each iteration, until En_m is empty or m equals maxIter;
step 12) returning the value E, where E is the clean sample set after the noise has been deleted; the algorithm ends.
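Isolating step 9), a minimal sketch of the weighted-KNN confidence computation (assuming scikit-learn's distance-weighted KNN, and that every label occurring in the test set also occurs in the pseudo-labeled training set):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def weighted_knn_confidence(X_train, y_train, X_test, y_test, k=5):
    """numFinalConfidence(e)': the distance-weighted probability that the
    KNN trained on the pseudo-labeled set assigns to each test sample's
    own (possibly noisy) label."""
    knn = KNeighborsClassifier(n_neighbors=k, weights='distance')
    knn.fit(X_train, y_train)
    proba = knn.predict_proba(X_test)              # rows sum to 1
    cols = np.searchsorted(knn.classes_, y_test)   # column of each own label
    return proba[np.arange(len(X_test)), cols]
```

Distance weighting means near neighbours dominate the vote, which is why a moderate K such as 5 suffices here.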
The invention is illustrated below with test results on 2 data sets from the UCI repository, compared against existing label-noise identification algorithms: the currently popular multiple-voting algorithms MFCF and CFMF (see reference [8]) and the semi-supervised algorithms CFAUD and MFAUD (see reference [6]). Because the data in the original UCI repository contains neither label noise nor unlabeled data, for each selected data set a large portion of the samples have their labels removed to serve as the unlabeled set, and noise is artificially added to the remaining labeled data at ratios of 10%, 20%, 30% and 40%. In this example, the performance of a label-noise detection algorithm is measured by its number of misidentification errors. The error count has two parts: noise data misdiagnosed as good data, denoted E1, and good data misdiagnosed as noise, denoted E2. The smaller E1 + E2, the more accurate the algorithm.
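Under this reading of E1 and E2, the error count reduces to two sums over Boolean masks; a trivial illustrative helper:

```python
import numpy as np

def error_counts(detected, truth):
    """detected, truth: Boolean masks marking samples flagged as noise and
    samples that truly carry flipped labels. E1 counts missed noise,
    E2 counts clean samples wrongly flagged; smaller E1 + E2 is better."""
    e1 = int(np.sum(truth & ~detected))
    e2 = int(np.sum(~truth & detected))
    return e1, e2, e1 + e2
```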
TABLE 1 Data sets

Data set | Number of samples | Number of features
---|---|---
Breast | 683 | 9
Credit-screening | 653 | 14
The parameters are set as follows: numPartition = 3, numClassifier = 3 (the three classification algorithms being naive Bayes, decision tree and nearest neighbor), maxIter = 100, numVote = 5, and ConfidenceThreshold = 0.4.
TABLE 2 Breast data set, results at 10% noise ratio
TABLE 3 Breast data set, results at 20% noise ratio
TABLE 4 Breast data set, results at 30% noise ratio
TABLE 5 Breast data set, results at 40% noise ratio
TABLE 6 Credit-screening data set, results at 10% noise ratio
TABLE 7 Credit-screening data set, results at 20% noise ratio
TABLE 8 Credit-screening data set, results at 30% noise ratio
TABLE 9 Credit-screening data set, results at 40% noise ratio
(The numerical results of Tables 2-9 appear as images in the original publication and are not reproduced here.)
As Tables 2-9 show, on the two data sets used in the experiments the proposed algorithm is superior in stability to the two classes of existing algorithms compared, across the different noise ratios.
In summary, the above embodiments are only used for illustrating the technical solutions of the present invention, and are not used for limiting the protection scope of the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall be covered within the scope of the claims of the present invention.
Claims (4)
1. An iterative label noise identification method based on double information of supervised learning and semi-supervised learning is characterized by comprising the following steps:
step 1) determining algorithm input variables, including a sample set L to be processed and an unlabeled sample set U, a maximum iteration count maxIter, a multiple-voting count numVote, a final noise-judgment voting confidence numFinalConfidence, a random partition count numPartition, a classifier count numClassifier, a per-vote noise identification confidence numConfidence, and a confidence threshold ConfidenceThreshold for judging noise; initializing the multiple-voting counter t = 1, the iteration counter m = 1, and the sample set E = L;
step 2) randomly dividing E into numPartition equally sized subsets Ei, where i = 1 : numPartition, and initializing the parameter i = 1;
step 3) taking the samples in E - Ei (the union of the remaining subsets) as training data, selecting numClassifier different classification algorithms, and training numClassifier different classifiers H1, H2, ..., HnumClassifier;
step 4) using H1, H2, ..., HnumClassifier to classify the samples in the subset Ei, respectively calculating the numConfidence of each sample, and storing the calculation results in a table;
step 5) iteratively executing steps 3) to 4), adding 1 to the value of i after each pass, until i equals numPartition and the iteration stops; after the voting is finished, the numConfidence of all samples has been calculated and stored in the table;
in the steps 4) and 5), each element in the table corresponds to a sample in the sample set E to be processed together with the probability numConfidence that the sample is correctly labeled;
step 6) iteratively executing steps 2) to 5), adding 1 to the value of t after each pass, until t equals numVote, thereby generating numVote tables;
step 7) comprehensively analyzing the numVote tables, aggregating the numConfidence of each sample e to obtain its numFinalConfidence(e), and storing it in one table; initializing a set En and storing in En, as suspicious samples, all samples whose numFinalConfidence(e) is smaller than the predetermined ConfidenceThreshold;
the ConfidenceThreshold value in step 7) is selected as a value between 0.1 and 0.4;
step 8) taking E' = E - En as a training set, generating numClassifier classifiers based on the numClassifier classification methods, and labeling the unlabeled sample set U with these classifiers to obtain a labeled sample set;
step 9) taking the data set E as the test set and the labeled data set as the training set, calculating numFinalConfidence(e)' for each sample with a weighted KNN algorithm, and storing it in a table;
step 10) adding, for each sample, its values in the table containing numFinalConfidence(e) and the table containing numFinalConfidence(e)' to obtain a final confidence table, and regarding samples whose value is smaller than the specified threshold ConfidenceThreshold as noise; letting the noise detected in the m-th iteration be En_m, and setting E = E - En_m;
step 11) iteratively executing steps 2) to 10), adding 1 to the value of m after each iteration, until En_m is empty or m equals maxIter;
step 12) returning the value E, where E is the clean sample set after the noise has been deleted; the method ends.
2. The iterative label noise identification method based on double information of supervised learning and semi-supervised learning as claimed in claim 1, wherein: in the step 3), numClassifier is selected as an odd number.
3. The iterative label noise identification method based on double information of supervised learning and semi-supervised learning as claimed in claim 2, wherein: numClassifier is set to 3.
4. The iterative label noise identification method based on double information of supervised learning and semi-supervised learning as claimed in claim 1, wherein: the ConfidenceThreshold value in step 7) is calculated and optimized on an independent calibration sample, specifically: a) estimating the noise ratio of the noise data to be processed from prior knowledge, b) adding random noise to the verification sample, c) traversing the possible ConfidenceThreshold values and calculating, for each value, the identification accuracy of the method on the noise in the verification sample, and d) selecting the ConfidenceThreshold with the highest identification accuracy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710315861.2A CN107292330B (en) | 2017-05-02 | 2017-05-02 | Iterative label noise identification algorithm based on double information of supervised learning and semi-supervised learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107292330A CN107292330A (en) | 2017-10-24 |
CN107292330B (en) | 2021-08-06