CN113869463A - Long tail noise learning method based on cross enhancement matching - Google Patents
Long tail noise learning method based on cross enhancement matching
- Publication number: CN113869463A (application CN202111457536.2A)
- Authority: CN (China)
- Prior art keywords: data, enhancement, noise, cross, sample
- Prior art date: 2021-12-02
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/24 — Pattern recognition; classification techniques
- G06N3/04 — Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
Abstract
The invention discloses a long-tail noise learning method based on cross-enhancement matching, used to solve image classification problems that exhibit long-tail characteristics and noisy labels at the same time. For the noise characteristics of the data, the method screens noise samples by matching the prediction results obtained respectively from weakly enhanced and strongly enhanced data, and introduces a leave-noise-out regularization measure to eliminate the influence of the identified noise samples. For the long-tail characteristics of the data, the method implements a new prediction penalty based on an online prior distribution to avoid bias toward the head classes. The method is simple to implement, flexible in its means, and able to estimate the degree of class fitting in real time; it therefore achieves significant classification improvements on long-tailed data, on noisy data, and on training data exhibiting both characteristics.
Description
Technical Field
The invention relates to the field of image classification, and in particular to a method for classifying images when noisy labels and a long-tailed distribution are present simultaneously.
Background
In recent years, convolutional neural networks (CNNs) have been widely used in the field of computer vision. With a fixed amount of training data, overfitting becomes increasingly prominent as the number of parameters grows, so improving overall performance increasingly depends on accurately labeled data. However, obtaining a large number of accurately labeled samples is often quite expensive. Crowd-sourcing by non-experts or automatic labeling systems is a practical alternative, but it easily leads to mislabeling. Many benchmark datasets, such as ImageNet, CIFAR-10/-100, MNIST and QuickDraw, contain 3% to 10% noisy-label samples. Existing research on noisy labels has generally focused on separating correctly labeled from incorrectly labeled samples while neglecting the distribution of the data. In the real world, data often follows a long-tailed distribution: a few head categories dominate the data set, while the remaining categories contain far fewer samples. Studying how to train a model on a data set that exhibits both a long-tailed distribution and label noise, as shown in fig. 1, is therefore of great practical importance.
Noise label learning has received much attention in recent years and has achieved impressive results. Because a convolutional neural network learns the simple, general patterns of clean data before fitting noisy samples during training, most existing methods adopt a cross entropy loss function to fit the model predictions to the data labels. However, on a data set with a long-tailed distribution, the training data are dominated by the head classes, so the cross entropy loss has difficulty distinguishing correct from incorrect samples of the tail classes. For the long-tailed image classification task, a series of data re-balancing strategies, such as re-weighting and re-sampling, balance the training data based on the number of samples per class. However, in the presence of label noise, the number of samples per class is unknown, and the sample count does not reflect the real-time degree of fit of each class. Based on the above analysis, existing deep neural networks still lack an effective solution for data sets that have both long-tailed features and label noise.
Disclosure of Invention
In order to overcome the shortcomings of the prior art, in which the co-training strategies used for noisy data have difficulty distinguishing correct from incorrect samples of the tail categories on long-tailed data, and the data re-balancing strategies used for long-tailed classification perform poorly on noisy data, the invention adopts the following technical solution:
a long tail noise learning method based on cross enhancement matching comprises the following steps:
step S1: according to the data noise characteristics, a weak data enhancement strategy and a strong data enhancement strategy are respectively applied to each sample image, cross-enhancement matching (cross-augmentation matching) is carried out on the prediction results of the weak enhancement data and the strong enhancement data, and the cross entropy loss is improved into a cross-enhancement matching loss for screening noise samples;
step S2: aiming at the sample data difference caused by the weak and strong data enhancement strategies, a dual-branch batch normalization method is adopted, in which different parameters are used for the feature maps of the weak enhancement data and the strong enhancement data during model training;
step S3: the large-loss samples screened out by cross-enhancement matching are taken as high-confidence noise samples, and a leave-noise-out regularization measure is used to eliminate the negative influence of these high-confidence noise samples on model training;
step S4: aiming at the head-class classification advantage caused by the long-tail features of the data, the class prior probability is evaluated from online predictions so as to truly reflect the degree of class fitting, and a prediction penalty based on the online prior distribution (online prior distribution) is used to smooth the prediction results of the head classes;
step S5: according to the data noise characteristics, a staged training strategy is used: in the warm-up stage, only the cross entropy loss and the online prior distribution loss of the weak enhancement data are calculated; in the formal training stage, the cross-enhancement matching loss and the online prior distribution loss of the weak and strong enhancement data are calculated, and the leave-noise-out regularization measure is added.
Further, in the step S1, given a training data set $D=\{(x_i,\tilde y_i)\}_{i=1}^{N}$ with $N$ sample images and $K$ image classes, where $x_i$ is a sample image and $\tilde y_i$ is a noisy label (i.e. $\tilde y_i$ is not necessarily correct), and $f(x;\theta)$ represents the prediction result of the classification model, where $\theta$ is the network parameters, $f$ is the mapping function, and $f(x;\theta)$ is a vector of dimension $K$, whether $x_i$ is a correctly labeled sample is determined according to the following cross-enhancement matching loss function:

$\ell_{cam}(x_i)=-e_{\tilde y_i}^{T}\log f(A_w(x_i);\theta)-\omega\,e_{\hat y_i}^{T}\log f(A_s(x_i);\theta)$

wherein $A_w(x_i)$ and $A_s(x_i)$ are respectively the weak enhancement data and the strong enhancement data, $\hat y_i=\arg\max_k p_{i,k}$ is the class prediction with the highest confidence for the sample image $x_i$, $p_{i,k}$ denotes the confidence that the $i$-th sample image $x_i$ carries label $k$, $\omega$ represents the weight parameter, $e_y$ is the one-hot vector composed of class $y$, and $T$ is the transpose symbol.
Further, in the step S1, data whose cross-enhancement matching loss is smaller than the OTSU threshold $\tau$ (Otsu's method, an algorithm for determining an image binarization threshold) are recognized as correct data, forming the correct data set $\mathcal{D}_c=\{x_i\mid\ell_{cam}(x_i)<\tau\}$. In the training phase, only the data in the set $\mathcal{D}_c$ are used to calculate the cross-enhancement matching loss, i.e. the total loss is expressed as:

$\mathcal{L}_{cam}=\frac{1}{|\mathcal{D}_c|}\sum_{x_i\in\mathcal{D}_c}\ell_{cam}(x_i)$
further, in the step S2, in order to avoid a negative effect of the sample data difference caused by the weak data enhancement and strong data enhancement strategies on the feature extraction, a dual-branch batch processing standardization method is adopted, specifically, a weak enhancement data difference is subjected to a batch processing standardization processAnd strong enhancement dataCalculating different mean and variance according to exponential moving average accumulation:
WhereinIs a constant number of times, and is,the number of sample images in a batch (batch) is shown, and the normalized output of the batch is、Wherein、Are all characteristic graphs of the middle layer of the neural network,the neural network batches the layer inputs representing weak enhancement inputs,which represents a weakly enhanced mean value, is,the variance of the weak enhancement is indicated,representing a strong enhancement input a neural network batches the layer inputs,which represents the average of the strong enhancement,it is meant that the variance is strongly enhanced,、、、are all learnable radiation parameters.
Further, in the step S2, in the training phase, a separate set of batch normalization parameters is trained for the weak enhancement data and for the strong enhancement data; in the testing phase, only the weak data enhancement strategy and the batch normalization parameters of the weak branch are used, and the batch normalization parameters of the strong branch are discarded.
Further, in step S3, according to the cross-enhancement matching loss, the screened-out noise samples form a high-confidence error data set:

$\mathcal{D}_e=\{x_i\mid\ell_{cam}(x_i)\geq\tau\}$

The screened large-loss samples are taken as high-confidence noise samples, and the network model is regularized through the leave-noise-out regularization measure: for each sample image $x_i$ in the set $\mathcal{D}_e$, which carries the specific class label $\tilde y_i$, the following regularization term constrains the set $\mathcal{D}_e$ to prevent the prediction result from fitting the wrong noise label:

$\mathcal{L}_{lnor}=-\frac{1}{|\mathcal{D}_e|}\sum_{x_i\in\mathcal{D}_e}\log\bigl(1-f_{\tilde y_i}(x_i;\theta)\bigr)$
Further, in step S4, the class prior probability $\pi_k$ of the $k$-th class is evaluated from the online predictions, and the prior probability of each category is dynamically evaluated as:

$\pi_k\leftarrow m'\,\pi_k+(1-m')\,\frac{1}{B}\sum_{i=1}^{B}p_{i,k}$

wherein $m'$ is a constant, and $\pi_k$ is initialized to the ratio of the number of samples of the class to the total number of samples, i.e. $\pi_k^{(0)}=N_k/N$, where $N$ represents the total number of samples in the training data and $N_k$ represents the number of training samples of class $k$.
Further, in the step S4, the prediction penalty based on the online prior distribution (online prior distribution),

$\ell_{opd}(x_i)=\pi^{T}\log f(x_i;\theta)$

is used to smooth the labels according to the prior distribution, so that labels with a higher prior probability receive stronger smoothing, thereby strengthening the optimization of the tail categories, wherein $x_i$ denotes the sample image, $\pi=(\pi_1,\dots,\pi_K)$ denotes the prior probability vector, $f(x;\theta)$ represents the prediction result of the classification model, $\theta$ is the network parameters, and $f$ is the mapping function.
Further, in the step S4, adding the prediction penalty $\ell_{opd}$ based on the online prior distribution to the cross entropy loss function $\ell_{ce}$ gives:

$\ell_{ce}(x_i)+\lambda\,\ell_{opd}(x_i)=-\sum_{k=1}^{K}\bigl(\mathbb{1}[k=\tilde y_i]-\lambda\,\pi_k\bigr)\log p_{i,k}$

wherein $\tilde y_i$ is the noisy label, $p_{i,k}$ denotes the confidence that the $i$-th sample image $x_i$ carries label $k$, $\lambda$ is a constant weighting factor, and $\pi_k$ denotes the prior probability of the $k$-th class.
Further, in the step S5, the training is divided into a warm-up stage and a formal stage, the loss being calculated and the parameters updated in the two stages by the following steps:
step S5.1: in the warm-up stage, the cross entropy loss and the online prior distribution penalty are calculated using the weak enhancement data:

$\mathcal{L}_{warm}=\frac{1}{N}\sum_{x_i\in D}\bigl[\ell_{ce}(A_w(x_i),\tilde y_i)+\lambda\,\ell_{opd}(A_w(x_i))\bigr]$

wherein $\ell_{ce}$ represents the cross entropy loss function, $x_i$ denotes the sample image, $A_w(x_i)$ denotes the weak enhancement data, $\tilde y_i$ denotes the label of the weakly enhanced data, $\ell_{opd}(A_w(x_i))$ represents the prediction penalty of the online prior distribution computed on the weak enhancement data, $\lambda$ is a constant weighting factor, and $D$ is the training data set;
step S5.2: in the formal training stage, the cross-enhancement matching loss $\mathcal{L}_{cam}$, the leave-noise-out regularization loss $\mathcal{L}_{lnor}$ and the online prior distribution prediction penalty term $\mathcal{L}_{opd}$ of the weak and strong enhancement data are combined; with the correct data set $\mathcal{D}_c$ and the high-confidence error data set $\mathcal{D}_e$ screened out, the total loss is expressed as:

$\mathcal{L}=\mathcal{L}_{cam}+\mathcal{L}_{lnor}+\lambda\,\mathcal{L}_{opd}$

and the network parameters are updated using stochastic gradient descent (SGD).
The invention has the advantages and beneficial effects that:
according to the method, according to data noise characteristics, noise samples are screened by matching prediction results respectively obtained by weak enhancement data and strong enhancement data, a noise-rejection regularization measure (leave-noise-out regularization) is introduced to eliminate the influence of the identified noise samples, and a new prediction penalty based on online prior distribution (online prior distribution) is implemented by the method aiming at data long tail characteristics to avoid bias of head types.
Drawings
FIG. 1 is a schematic of a data set with both long tail distribution and tag noise.
Fig. 2 is a flow chart of the method of the present invention.
FIGS. 3a to 3e are graphs of the change in test accuracy of the method of the present invention and of other methods under different symmetric noise rates and imbalance factors.
FIGS. 4a and 4b are graphs of the change in test accuracy of the method of the present invention and of other methods under different asymmetric noise rates and imbalance factors.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples are given by way of illustration and explanation only and are not intended to limit the invention.
As shown in fig. 1 and 2, the long tail noise learning method based on cross-enhancement matching comprises the following steps:

Step one: according to the data noise characteristics, a weak data enhancement strategy and a strong data enhancement strategy are respectively applied to each sample, cross-enhancement matching is carried out on the prediction results of the weak enhancement data and the strong enhancement data, and the cross entropy loss is improved into a cross-enhancement matching loss for screening the noise samples.
The invention uses two data enhancement strategies: weak data enhancement and strong data enhancement. Weak data enhancement (weak augmentation) is implemented as simple random flipping (flip) and cropping (crop), while strong data enhancement (strong augmentation) uses the AutoAugment implementation and adopts a data enhancement policy automatically selected by a search algorithm on ImageNet.
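By way of non-limiting illustration, the two enhancement branches can be sketched with torchvision as follows; the CIFAR-10 image size (32×32), the crop padding and the omission of normalization are assumptions of the sketch rather than requirements of the method:

```python
# Sketch of the weak and strong data enhancement branches (torchvision).
import torchvision.transforms as T

weak_aug = T.Compose([
    T.RandomHorizontalFlip(),              # simple random flip
    T.RandomCrop(32, padding=4),           # simple random crop (CIFAR-10)
    T.ToTensor(),
])

strong_aug = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4),
    # AutoAugment with the policy searched on ImageNet
    T.AutoAugment(T.AutoAugmentPolicy.IMAGENET),
    T.ToTensor(),
])
```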
Given a training data set $D=\{(x_i,\tilde y_i)\}_{i=1}^{N}$ with $N$ sample images and $K$ image class labels, where $x_i$ is a sample image and $\tilde y_i$ is a noisy label (i.e. $\tilde y_i$ is not necessarily correct), the prediction result of the classification model is defined as $f(x;\theta)$, where $\theta$ is the network parameters, $f$ is the mapping function, and $f(x;\theta)$ is a vector of dimension $K$. Whether $x_i$ is a correctly labeled sample is determined according to the following cross-enhancement matching loss function:

$\ell_{cam}(x_i)=-e_{\tilde y_i}^{T}\log f(A_w(x_i);\theta)-\omega\,e_{\hat y_i}^{T}\log f(A_s(x_i);\theta)$

wherein $A_w(x_i)$ and $A_s(x_i)$ are respectively the weakly enhanced and strongly enhanced samples, $\hat y_i=\arg\max_k p_{i,k}$ is the class prediction with the highest confidence for $A_w(x_i)$, $p_{i,k}$ denotes the confidence of label $k$ for the $i$-th sample image, $\omega$ represents the weight parameter, $e_y$ is the one-hot vector composed of class $y$, and $T$ is the transpose symbol.
Data whose cross-enhancement matching loss is smaller than the OTSU threshold $\tau$ (Otsu's method, an algorithm for determining an image binarization threshold) are recognized as correct data, forming the correct data set $\mathcal{D}_c=\{x_i\mid\ell_{cam}(x_i)<\tau\}$. In the training phase, only the data in the set $\mathcal{D}_c$ are used to calculate the cross-enhancement matching loss, i.e. the total loss can be expressed as:

$\mathcal{L}_{cam}=\frac{1}{|\mathcal{D}_c|}\sum_{x_i\in\mathcal{D}_c}\ell_{cam}(x_i)$
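A minimal sketch of this screening step follows, assuming the per-sample $\ell_{cam}$ values have been collected over an epoch; skimage's threshold_otsu is used here as one readily available implementation of Otsu's method:

```python
# Sketch: split samples into the correct set D_c and the high-confidence
# error set D_e by applying an Otsu threshold to the per-sample losses.
import numpy as np
from skimage.filters import threshold_otsu

def split_by_otsu(losses: np.ndarray):
    """losses: 1-D array of per-sample cross-enhancement matching losses."""
    tau = threshold_otsu(losses)            # Otsu binarization threshold
    clean_idx = np.where(losses < tau)[0]   # correct data set D_c
    noisy_idx = np.where(losses >= tau)[0]  # high-confidence error set D_e
    return clean_idx, noisy_idx, tau
```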
Step two: aiming at the sample data difference caused by the weak and strong data enhancement strategies, a dual-branch batch normalization method is adopted, in which different parameters are used for the feature maps of the weak enhancement data and the strong enhancement data during model training.
In order to avoid the negative influence on feature extraction of the sample data difference caused by the weak and strong data enhancement strategies, a dual-branch batch normalization method is adopted. Specifically, during batch normalization, different means and variances are accumulated for the weak enhancement data and the strong enhancement data by exponential moving average (EMA in fig. 2):

$\mu_w\leftarrow m\,\mu_w+(1-m)\,\frac{1}{B}\sum_{i=1}^{B}z_i^{w},\qquad \sigma_w^{2}\leftarrow m\,\sigma_w^{2}+(1-m)\,\frac{1}{B}\sum_{i=1}^{B}\bigl(z_i^{w}-\mu_w\bigr)^{2}$

and likewise $\mu_s$, $\sigma_s^{2}$ for the strong branch, wherein $m$ is a constant and $B$ represents the number of samples in a batch (batch). The normalized outputs of the batch are

$\hat z^{w}=\gamma_w\,\frac{z^{w}-\mu_w}{\sqrt{\sigma_w^{2}+\epsilon}}+\beta_w,\qquad \hat z^{s}=\gamma_s\,\frac{z^{s}-\mu_s}{\sqrt{\sigma_s^{2}+\epsilon}}+\beta_s$

wherein $z^{w}$ and $z^{s}$ are both feature maps of an intermediate layer of the neural network: $z^{w}$ is the batch normalization layer input for the weak enhancement input, $\mu_w$ the weak enhancement mean, $\sigma_w^{2}$ the weak enhancement variance; $z^{s}$ is the batch normalization layer input for the strong enhancement input, $\mu_s$ the strong enhancement mean, $\sigma_s^{2}$ the strong enhancement variance; and $\gamma_w$, $\beta_w$, $\gamma_s$, $\beta_s$ are all learnable affine parameters.
In the training stage, a separate set of batch normalization parameters is trained for the weak enhancement data and for the strong enhancement data; in the testing stage, only the weak enhancement strategy and the batch normalization parameters of the weak branch are used, and the batch normalization parameters of the strong branch are discarded.
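One possible realization of the dual-branch layer is sketched below; the branch flag in the forward pass is an assumption about how the network is wired, and PyTorch's BatchNorm2d already maintains its running statistics by exponential moving average:

```python
# Sketch of dual-branch batch normalization: separate statistics and
# affine parameters per enhancement branch; test time uses the weak branch.
import torch.nn as nn

class DualBatchNorm2d(nn.Module):
    def __init__(self, num_features: int, momentum: float = 0.1):
        super().__init__()
        self.bn_weak = nn.BatchNorm2d(num_features, momentum=momentum)
        self.bn_strong = nn.BatchNorm2d(num_features, momentum=momentum)

    def forward(self, x, branch: str = "weak"):
        if self.training and branch == "strong":
            return self.bn_strong(x)
        # weak branch during training, and the only path at test time,
        # so the strong-branch parameters are effectively discarded
        return self.bn_weak(x)
```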
Step three: regarding the large loss samples screened by the cross-enhancement matching as noise samples with high confidence, a regularization measure (LNOR in fig. 2) for eliminating noise is used to eliminate the negative influence of the samples on the model training.
For the large-loss samples screened by cross-enhancement matching, a high-confidence error data set is defined:

$\mathcal{D}_e=\{x_i\mid\ell_{cam}(x_i)\geq\tau\}$

The screened large-loss samples are regarded as high-confidence noise samples and are used to regularize the network model optimization through the leave-noise-out regularization measure. Specifically, for each sample $x_i$ in the set $\mathcal{D}_e$, assumed to belong to the specific class $\tilde y_i$, the following regularization term constrains the set $\mathcal{D}_e$ to prevent the prediction result from fitting the wrong noise label:

$\mathcal{L}_{lnor}=-\frac{1}{|\mathcal{D}_e|}\sum_{x_i\in\mathcal{D}_e}\log\bigl(1-f_{\tilde y_i}(x_i;\theta)\bigr)$
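A sketch of one way to realize the leave-noise-out term follows, under the assumption, consistent with the formula above, that it suppresses the probability mass assigned to the identified wrong label by maximizing $\log(1-p)$ on that label:

```python
# Sketch of a leave-noise-out regularization (LNOR) loss.
import torch
import torch.nn.functional as F

def lnor_loss(logits: torch.Tensor, noisy_labels: torch.Tensor):
    """logits: [B, K] predictions for samples in D_e; noisy_labels: [B]."""
    probs = F.softmax(logits, dim=1)
    p_wrong = probs.gather(1, noisy_labels.unsqueeze(1)).squeeze(1)
    # push the probability of the identified wrong label toward zero
    return -torch.log(1.0 - p_wrong + 1e-8).mean()
```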
Step four: aiming at the head class classification advantages caused by the long tail features of the data, a new online prior distribution (online prior distribution) -based prediction penalty is implemented to smooth the prediction result of the head class.
Aiming at the head-class classification advantage caused by the long-tail characteristics of the data, the class prior probability $\pi_k$ is evaluated from the online predictions so as to truly reflect the degree of class fitting, and the prior probability of each category is dynamically evaluated as:

$\pi_k\leftarrow m'\,\pi_k+(1-m')\,\frac{1}{B}\sum_{i=1}^{B}p_{i,k}$

wherein $m'$ is a constant, and $\pi_k$ is initialized to the ratio of the number of samples of the class to the total number of samples, i.e. $\pi_k^{(0)}=N_k/N$, where $N$ represents the total number of samples in the training data and $N_k$ represents the number of training samples of class $k$.
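A sketch of the online prior tracker follows; the momentum value is an assumption:

```python
# Sketch: exponential-moving-average estimate of the class prior from the
# model's own batch predictions, initialized from the label counts.
import torch

class OnlinePrior:
    def __init__(self, class_counts, momentum: float = 0.9):
        counts = torch.as_tensor(class_counts, dtype=torch.float)
        self.pi = counts / counts.sum()     # pi_k(0) = N_k / N
        self.m = momentum

    @torch.no_grad()
    def update(self, probs: torch.Tensor):  # probs: [B, K] softmax outputs
        batch_prior = probs.mean(dim=0)     # mean predicted confidence per class
        self.pi = self.m * self.pi + (1 - self.m) * batch_prior
```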
The prediction penalty based on the online prior distribution,

$\ell_{opd}(x_i)=\pi^{T}\log f(x_i;\theta)$

is used to smooth the labels according to the prior distribution, so that labels with a higher prior probability receive stronger smoothing, thereby strengthening the optimization of the tail categories. Adding this prediction penalty to the cross entropy loss function gives $\ell_{ce}(x_i)+\lambda\,\ell_{opd}(x_i)$, wherein $\lambda$ is a constant weighting factor; the loss function can be converted into the following form:

$\ell_{ce}(x_i)+\lambda\,\ell_{opd}(x_i)=-\sum_{k=1}^{K}\bigl(\mathbb{1}[k=\tilde y_i]-\lambda\,\pi_k\bigr)\log p_{i,k}$
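Read as prior-weighted label smoothing, the combined loss can be sketched as follows; the exact combination mirrors the reconstruction above and should be taken as an assumption:

```python
# Sketch: cross entropy plus the online-prior prediction penalty. The
# effective target for class k is 1[k = y] - lam * pi_k, so head classes
# (large pi_k) are smoothed most strongly.
import torch
import torch.nn.functional as F

def ce_with_prior_penalty(logits, labels, pi, lam: float = 0.1):
    log_p = F.log_softmax(logits, dim=1)
    ce = F.nll_loss(log_p, labels)                         # -log p_y
    penalty = (pi * log_p).sum(dim=1).mean()               # pi^T log f(x)
    return ce + lam * penalty
```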
Step five: a phased training strategy is used based on the data noise signature. In the preheating stage, only the cross entropy loss and the online prior distribution loss of the weak enhancement data are calculated; in the formal training stage, the cross enhancement matching loss and the online prior distribution loss of the weak enhancement data and the strong enhancement data are calculated, and a regularization measure for eliminating noise is added.
The training is divided into a warm-up stage and a formal stage, the loss being calculated and the parameters updated in the two stages as follows:
Step 5.1: in the warm-up stage, only the weak enhancement data are used to calculate the cross entropy loss and the online prior distribution penalty, i.e.:

$\mathcal{L}_{warm}=\frac{1}{N}\sum_{x_i\in D}\bigl[\ell_{ce}(A_w(x_i),\tilde y_i)+\lambda\,\ell_{opd}(A_w(x_i))\bigr]$
Step 5.2: in the formal training stage, the cross-enhancement matching loss $\mathcal{L}_{cam}$, the leave-noise-out regularization loss $\mathcal{L}_{lnor}$ and the online prior distribution prediction penalty term $\mathcal{L}_{opd}$ of the weak and strong enhancement data are combined; with the correct data set $\mathcal{D}_c$ and the high-confidence error data set $\mathcal{D}_e$ screened out, the total loss function is defined as:

$\mathcal{L}=\mathcal{L}_{cam}+\mathcal{L}_{lnor}+\lambda\,\mathcal{L}_{opd}$
and the network parameters are updated using stochastic gradient descent (SGD).
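The staged schedule can be wired together as in the following sketch. Here cam_loss_per_sample mirrors the hedged reconstruction of $\ell_{cam}$ above, and the helpers from the earlier sketches (threshold_otsu, lnor_loss, ce_with_prior_penalty, OnlinePrior), together with the (weak image, strong image, label) loader format, are all assumptions rather than a definitive implementation:

```python
# Sketch of one epoch of the two-stage training (warm-up vs. formal stage).
import torch
import torch.nn.functional as F
from skimage.filters import threshold_otsu

def cam_loss_per_sample(logits_w, logits_s, y, omega: float = 1.0):
    # CE of the weak view against the noisy label, plus a weighted CE of the
    # strong view against the weak view's most confident prediction
    y_hat = logits_w.detach().argmax(dim=1)
    return (F.cross_entropy(logits_w, y, reduction="none")
            + omega * F.cross_entropy(logits_s, y_hat, reduction="none"))

def run_epoch(model, loader, prior, opt, warm: bool, lam: float = 0.1):
    model.train()
    for x_w, x_s, y in loader:
        logits_w = model(x_w, branch="weak")
        if warm:  # warm-up: CE + online-prior penalty on weak data only
            loss = ce_with_prior_penalty(logits_w, y, prior.pi, lam)
        else:     # formal stage: matching loss, LNOR and prior penalty
            logits_s = model(x_s, branch="strong")
            l_cam = cam_loss_per_sample(logits_w, logits_s, y)
            tau = threshold_otsu(l_cam.detach().cpu().numpy())
            clean = l_cam < tau
            opd = (prior.pi * (F.log_softmax(logits_w, 1)
                               + F.log_softmax(logits_s, 1))).sum(1).mean()
            loss = lam * opd
            if clean.any():
                loss = loss + l_cam[clean].mean()
            if (~clean).any():
                loss = loss + lnor_loss(logits_s[~clean], y[~clean])
        opt.zero_grad()
        loss.backward()
        opt.step()
        prior.update(F.softmax(logits_w.detach(), dim=1))
```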
In the prediction stage, the sample image to be predicted is input into the model trained with the long-tail noise learning method based on cross-enhancement matching, and the classification result is output.
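A corresponding sketch of the prediction stage (the branch keyword matches the dual-branch module assumed above):

```python
# Sketch of inference: no enhancement, weak-branch BN statistics only.
import torch

@torch.no_grad()
def predict(model, image: torch.Tensor):
    """image: [1, C, H, W] tensor of the sample image to be predicted."""
    model.eval()                         # BN uses the weak running statistics
    logits = model(image, branch="weak")
    return logits.argmax(dim=1)          # classification result
```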
The invention defines the ratio between the number of samples of the category with the most samples and that of the category with the fewest samples as the imbalance factor (imbalance factor) $\rho$, i.e. $\rho=\max_k N_k/\min_k N_k$. The long-tail data distribution used in the experiments of the invention is an exponential decay distribution.
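For instance, an exponentially decaying class profile with imbalance factor $\rho$ can be generated as in this sketch:

```python
# Sketch: per-class sample counts n_k = n_max * rho**(-k / (K - 1)),
# so that n_0 / n_{K-1} = rho (the imbalance factor).
def longtail_counts(n_max: int, num_classes: int, rho: float):
    return [int(n_max * rho ** (-k / (num_classes - 1)))
            for k in range(num_classes)]

# e.g. CIFAR-10 with 5000 images per head class and rho = 100:
# the counts decay exponentially from 5000 down to 5000 / 100 = 50.
```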
For the setting of the noise data, two cases are considered: class-independent noise (class-independent noise) and class-dependent noise (class-dependent noise). Class-independent noise assumes that the mislabeled samples are randomly and uniformly distributed, while class-dependent noise focuses on human labeling errors caused by visual similarity. The invention defines the probability that a sample label is wrong as the noise rate (noise rate) $\eta$. For class-independent noise, each sample is, with probability $\eta$, randomly mislabeled as an arbitrary other category; for class-dependent noise, the labels of every two paired classes are, with probability $\eta$, flipped to the opposite class of the pair.
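The two noise models can be sketched as follows; the pairing dictionary for class-dependent noise (visually similar class pairs) is an assumption to be supplied per data set:

```python
# Sketch: inject class-independent (symmetric) or class-dependent
# (asymmetric) label noise at noise rate eta.
import numpy as np

def symmetric_noise(labels: np.ndarray, num_classes: int, eta: float, seed=0):
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    for i in np.where(rng.random(len(labels)) < eta)[0]:
        others = [k for k in range(num_classes) if k != noisy[i]]
        noisy[i] = rng.choice(others)      # any other category, uniformly
    return noisy

def asymmetric_noise(labels: np.ndarray, pairs: dict, eta: float, seed=0):
    """pairs maps each class to its visually similar partner class."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    for i in np.where(rng.random(len(labels)) < eta)[0]:
        noisy[i] = pairs.get(noisy[i], noisy[i])  # flip to the paired class
    return noisy
```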
The experiments are implemented in the PyTorch framework on the CIFAR-10 data set, using ResNet-32 as the network model and an SGD optimizer with an initial learning rate of 0.05 and a cosine annealing scheduler. Both training stages are set to 100 training epochs with a batch size of 128. All experiments of the invention are trained from scratch.
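Tying the pieces together, the stated setup corresponds to a sketch like the following, where resnet32, loader and prior are placeholders carried over from the earlier sketches:

```python
# Sketch of the experimental setup: SGD (lr = 0.05) with cosine annealing,
# 100 warm-up epochs followed by 100 formal epochs, trained from scratch.
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = resnet32(num_classes=10)          # hypothetical backbone constructor
opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
sched = CosineAnnealingLR(opt, T_max=200)

for epoch in range(200):
    run_epoch(model, loader, prior, opt, warm=(epoch < 100))
    sched.step()
```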
As shown in FIGS. 3a-3e, comparing CE, Co-teaching+, JoCoR, Co-matching and the method of the present invention on the CIFAR-10 data set, using a ResNet-32 network for learning on long-tail distributed noisy samples, the change in test accuracy shows that the accuracy of the method of the present invention is superior to that of the other methods under the given symmetric noise rates and imbalance factors.
As shown in FIGS. 4a and 4b, comparing CE, LDAM, Mixup, MiSLAS and the method of the present invention on the CIFAR-10 data set, using a ResNet-32 network for learning on long-tail distributed noisy samples, the change in test accuracy shows that the accuracy of the method of the present invention is superior to that of the other methods under the given imbalance factors and asymmetric noise rates.
The above examples are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A long tail noise learning method based on cross enhancement matching is characterized by comprising the following steps:
step S1: according to the data noise characteristics, a weak data enhancement strategy and a strong data enhancement strategy are respectively applied to each sample image, and cross-enhancement matching is carried out on the prediction results of the weak enhancement data and the strong enhancement data to screen noise samples;
step S2: aiming at the sample data difference caused by the weak and strong data enhancement strategies, a dual-branch batch normalization method is adopted, in which different parameters are used for the feature maps of the weak enhancement data and the strong enhancement data during model training;
step S3: for the noise samples screened by cross-enhancement matching, a leave-noise-out regularization measure is used to eliminate the negative influence of the noise samples on model training;
step S4: for the head-class classification advantage caused by the long-tail characteristics of the data, the class prior probability is evaluated from online prediction, and a prediction penalty based on the online prior distribution is used to smooth the prediction results of the head classes;
step S5: according to the data noise characteristics, a staged training strategy is used: only the cross entropy loss and the online prior distribution loss of the weak enhancement data are calculated in the warm-up stage; in the formal training stage, the cross-enhancement matching loss and the online prior distribution loss of the weak and strong enhancement data are calculated, and the leave-noise-out regularization measure is added.
2. The method for learning long tail noise based on cross-enhancement matching as claimed in claim 1, wherein in the step S1, given a training data set $D=\{(x_i,\tilde y_i)\}_{i=1}^{N}$ with $N$ sample images and $K$ image classes, where $x_i$ is a sample image, $\tilde y_i$ is a noisy label, and $f(x;\theta)$ represents the prediction result of the classification model, $\theta$ being the network parameters, $f$ the mapping function and $f(x;\theta)$ a vector of dimension $K$, whether $x_i$ is a correctly labeled sample is determined according to the following cross-enhancement matching loss function:

$\ell_{cam}(x_i)=-e_{\tilde y_i}^{T}\log f(A_w(x_i);\theta)-\omega\,e_{\hat y_i}^{T}\log f(A_s(x_i);\theta)$

wherein $A_w(x_i)$ and $A_s(x_i)$ are respectively the weak enhancement data and the strong enhancement data, $\hat y_i=\arg\max_k p_{i,k}$ is the class prediction with the highest confidence for the sample image $x_i$, $p_{i,k}$ denotes the confidence that the $i$-th sample image $x_i$ carries label $k$, $\omega$ represents the weight parameter, $e_y$ is the one-hot vector composed of class $y$, and $T$ is the transpose symbol.
3. The method according to claim 2, wherein in the step S1, data whose cross-enhancement matching loss is smaller than the OTSU threshold $\tau$ are recognized as correct data, forming the correct data set $\mathcal{D}_c=\{x_i\mid\ell_{cam}(x_i)<\tau\}$; in the training phase, only the data in the set $\mathcal{D}_c$ are used to calculate the cross-enhancement matching loss, i.e. the total loss is expressed as:

$\mathcal{L}_{cam}=\frac{1}{|\mathcal{D}_c|}\sum_{x_i\in\mathcal{D}_c}\ell_{cam}(x_i)$
4. The method for learning long tail noise based on cross-enhancement matching as claimed in claim 1, wherein in the step S2, a dual-branch batch normalization method is used, in which different means and variances are accumulated for the weak enhancement data and the strong enhancement data by exponential moving average:

$\mu_w\leftarrow m\,\mu_w+(1-m)\,\frac{1}{B}\sum_{i=1}^{B}z_i^{w},\qquad \sigma_w^{2}\leftarrow m\,\sigma_w^{2}+(1-m)\,\frac{1}{B}\sum_{i=1}^{B}\bigl(z_i^{w}-\mu_w\bigr)^{2}$

and likewise $\mu_s$, $\sigma_s^{2}$ for the strong branch, wherein $m$ is a constant and $B$ represents the number of sample images in a batch; the normalized outputs of the batch are

$\hat z^{w}=\gamma_w\,\frac{z^{w}-\mu_w}{\sqrt{\sigma_w^{2}+\epsilon}}+\beta_w,\qquad \hat z^{s}=\gamma_s\,\frac{z^{s}-\mu_s}{\sqrt{\sigma_s^{2}+\epsilon}}+\beta_s$

wherein $z^{w}$ and $z^{s}$ are both feature maps of an intermediate layer of the neural network, $z^{w}$ being the batch normalization layer input for the weak enhancement input, $\mu_w$ the weak enhancement mean, $\sigma_w^{2}$ the weak enhancement variance, $z^{s}$ the batch normalization layer input for the strong enhancement input, $\mu_s$ the strong enhancement mean, $\sigma_s^{2}$ the strong enhancement variance, and $\gamma_w$, $\beta_w$, $\gamma_s$, $\beta_s$ all learnable affine parameters.
5. The method for learning long tail noise based on cross-enhancement matching as claimed in claim 1, wherein in the step S2, in the training phase, a separate set of batch normalization parameters is trained for the weak enhancement data and for the strong enhancement data; in the testing phase, only the weak data enhancement strategy and the batch normalization parameters of the weak branch are used, and the batch normalization parameters of the strong branch are discarded.
6. The method according to claim 3, wherein in the step S3, according to the cross-enhancement matching loss, the screened noise samples form a high-confidence error data set:

$\mathcal{D}_e=\{x_i\mid\ell_{cam}(x_i)\geq\tau\}$

the screened large-loss samples are taken as high-confidence noise samples, and the network model is regularized through the leave-noise-out regularization measure: for each sample image $x_i$ in the set $\mathcal{D}_e$, which carries the specific class label $\tilde y_i$, the following regularization term constrains the network prediction results on the set $\mathcal{D}_e$:

$\mathcal{L}_{lnor}=-\frac{1}{|\mathcal{D}_e|}\sum_{x_i\in\mathcal{D}_e}\log\bigl(1-f_{\tilde y_i}(x_i;\theta)\bigr)$
7. The method of claim 1, wherein in the step S4, the class prior probability $\pi_k$ of the $k$-th class is evaluated from online prediction, and the prior probability of each category is dynamically evaluated as:

$\pi_k\leftarrow m'\,\pi_k+(1-m')\,\frac{1}{B}\sum_{i=1}^{B}p_{i,k}$

wherein $m'$ is a constant and $\pi_k$ is initialized to $\pi_k^{(0)}=N_k/N$, $N$ being the total number of training samples and $N_k$ the number of training samples of class $k$.
8. The method for learning long tail noise based on cross-enhancement matching as claimed in claim 1, wherein in the step S4, the prediction penalty based on the online prior distribution, used to smooth the labels according to the prior distribution so that labels with a higher prior probability receive stronger smoothing, is given by the following formula:

$\ell_{opd}(x_i)=\pi^{T}\log f(x_i;\theta)$
9. The method for learning long tail noise based on cross-enhancement matching as claimed in claim 8, wherein in the step S4, adding the prediction penalty $\ell_{opd}$ based on the online prior distribution to the cross entropy loss function $\ell_{ce}$ gives:

$\ell_{ce}(x_i)+\lambda\,\ell_{opd}(x_i)=-\sum_{k=1}^{K}\bigl(\mathbb{1}[k=\tilde y_i]-\lambda\,\pi_k\bigr)\log p_{i,k}$
10. The method for learning long tail noise based on cross-enhancement matching according to claim 1, wherein the training in the step S5 is divided into a warm-up stage and a formal stage, the loss being calculated and the parameters updated in the two stages by the following steps:
step S5.1: in the warm-up stage, the cross entropy loss and the online prior distribution penalty are calculated using the weak enhancement data:

$\mathcal{L}_{warm}=\frac{1}{N}\sum_{x_i\in D}\bigl[\ell_{ce}(A_w(x_i),\tilde y_i)+\lambda\,\ell_{opd}(A_w(x_i))\bigr]$

wherein $\ell_{ce}$ represents the cross entropy loss function, $x_i$ denotes the sample image, $A_w(x_i)$ denotes the weak enhancement data, $\tilde y_i$ denotes the label of the weakly enhanced data, $\ell_{opd}(A_w(x_i))$ represents the prediction penalty of the online prior distribution computed on the weak enhancement data, $\lambda$ is a constant weighting factor, and $D$ is the training data set;
step S5.2: in the formal training stage, the cross-enhancement matching loss $\mathcal{L}_{cam}$, the leave-noise-out regularization loss $\mathcal{L}_{lnor}$ and the online prior distribution prediction penalty term $\mathcal{L}_{opd}$ of the weak and strong enhancement data are combined; with the correct data set $\mathcal{D}_c$ and the high-confidence error data set $\mathcal{D}_e$ screened out, the total loss is expressed as:

$\mathcal{L}=\mathcal{L}_{cam}+\mathcal{L}_{lnor}+\lambda\,\mathcal{L}_{opd}$
and the network parameters are updated using stochastic gradient descent.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202111457536.2A | 2021-12-02 | 2021-12-02 | Long tail noise learning method based on cross enhancement matching
Publications (2)
Publication Number | Publication Date |
---|---|
CN113869463A true CN113869463A (en) | 2021-12-31 |
CN113869463B CN113869463B (en) | 2022-04-15 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114863193A (en) * | 2022-07-07 | 2022-08-05 | 之江实验室 | Long-tail learning image classification and training method and device based on mixed batch normalization |
CN115423031A (en) * | 2022-09-20 | 2022-12-02 | 腾讯科技(深圳)有限公司 | Model training method and related device |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113516207A (en) * | 2021-09-10 | 2021-10-19 | 之江实验室 | Long-tail distribution image classification method with noise label |
Non-Patent Citations (3)
Title |
---|
GORKEM ALGAN et al.: "Label Noise Types and Their Effects on Deep Learning", arXiv:2003.10471v1 [cs.CV] *
MUHAMMAD ABDULLAH JAMAL et al.: "Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective", arXiv:2003.10780v1 [cs.CV] *
CHEN Qingqiang et al.: "A label noise filtering method based on data distribution", Proceedings of the 6th CCF Big Data Conference *
Also Published As
Publication number | Publication date |
---|---|
CN113869463B (en) | 2022-04-15 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |