CN113869463A - Long tail noise learning method based on cross enhancement matching - Google Patents

Long tail noise learning method based on cross enhancement matching Download PDF

Info

Publication number
CN113869463A
CN113869463A CN202111457536.2A CN202111457536A CN113869463A CN 113869463 A CN113869463 A CN 113869463A CN 202111457536 A CN202111457536 A CN 202111457536A CN 113869463 A CN113869463 A CN 113869463A
Authority
CN
China
Prior art keywords
data
enhancement
noise
cross
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111457536.2A
Other languages
Chinese (zh)
Other versions
CN113869463B (en
Inventor
程乐超
茅一宁
苏慧
冯尊磊
宋明黎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202111457536.2A priority Critical patent/CN113869463B/en
Publication of CN113869463A publication Critical patent/CN113869463A/en
Application granted granted Critical
Publication of CN113869463B publication Critical patent/CN113869463B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a long-tail noise learning method based on cross enhancement matching, which is used for solving the problem of image classification with long-tail characteristics and noise labels at the same time. According to the data noise characteristics, the method screens noise samples by matching prediction results respectively obtained by weak enhancement data and strong enhancement data, and introduces a noise-removing regularization measure to eliminate the influence of the identified noise samples. For data long-tail features, the method implements a new online prior distribution-based prediction penalty to avoid bias on the head class. The method is simple and convenient to implement, flexible in means and superior in the aspect of obtaining the class fitting degree in real time, so that the method achieves remarkable classification effect improvement on long-tail data, noise data and training data with the characteristics of the long-tail data and the noise data.

Description

Long tail noise learning method based on cross enhancement matching
Technical Field
The invention relates to the field of image classification, in particular to a method for classifying images under the condition that a noise label and long tail distribution exist simultaneously.
Background
In recent years, Convolutional Neural Networks (CNNs) have been widely used in the field of computer vision. In the case of a fixed amount of training data, the overfitting phenomenon is increasingly prominent due to the increase of the number of parameters, and the requirement for accurately labeling data is also increased in order to improve the overall performance. However, obtaining a large number of accurately labeled samples is often quite expensive. In this regard, non-expert crowd-sourcing or systematic tagging is a practical solution, however this easily leads to mislabeling of tags. Many reference datasets, such as ImageNet, CIFAR-10/-100, MNIST, QuickDraw, etc., contain 3% to 10% noise label samples. Existing research on noisy labels has generally focused on splitting correctly labeled and incorrectly labeled samples, but neglecting the distribution of the data. In the real world, data often presents the characteristic of long tail distribution, several main categories in the data set occupy the dominance, and the data of other categories is insufficient in quantity. Therefore, it is very important to study how to train the model on the data set with both long tail distribution and label noise as shown in fig. 1 in practical application.
Noise label learning has received a lot of attention in recent years and also has achieved surprising effects. Because the convolutional neural network can learn a simple general mode of real data before fitting a noise sample in training, most of the existing methods adopt a cross entropy loss function to fit a model prediction result and a data label. However, in a data set with a long tail distribution, since the training data is dominated by the head class, cross entropy loss is difficult to distinguish between correct and incorrect samples of the tail class. For the long-tail image classification task, a series of data re-balancing (re-balancing) based strategies such as re-weighting and resampling balance the training data based on the number of samples of the class. However, in the presence of label noise, the number of samples for each class is unknown, and the number of samples does not reflect the degree of real-time fit of the class. Based on the above analysis, the existing deep neural network CNN still has no effective solution for the data set having both long tail features and tag noise.
Disclosure of Invention
In order to solve the defects of the prior art and avoid the problems that a co-training strategy (co-training) of noise data is difficult to distinguish correct and wrong samples of tail categories on long-tail data, and a data rebalancing strategy of long-tail classification has a poor classification effect on noisy data, the invention adopts the following technical scheme:
a long tail noise learning method based on cross enhancement matching comprises the following steps:
step S1: according to the data noise characteristics, respectively adopting a weak data enhancement strategy and a strong data enhancement strategy for each sample image, carrying out cross-enhancement matching (cross-enhancement matching) on the prediction results of the weak enhancement data and the strong enhancement data, and improving cross entropy loss into cross-enhancement matching loss for screening noise samples;
step S2: aiming at sample data difference caused by weak data enhancement and strong data enhancement strategies, a double-branch batch processing standardization method is adopted, and different parameters are respectively used for model training on feature graphs of the weak enhancement data and the strong enhancement data;
step S3: for the large loss sample screened by cross enhancement matching, the large loss sample is used as a noise sample with high confidence level, and a regularization measure for eliminating noise is used to eliminate the negative influence of the noise sample with high confidence level on model training;
step S4: for the head class classification advantages caused by data long-tail features, the classification prior probability is evaluated from online prediction, so that the class fitting degree is truly reflected, and the prediction penalty based on online prior distribution (online prior distribution) is used for smoothing the prediction result of the head class;
step S5: according to the data noise characteristics, a staged training strategy is used, and only the cross entropy loss and the online prior distribution loss of weak enhancement data are calculated in a preheating stage; in the formal training stage, the cross enhancement matching loss and the online prior distribution loss of the weak enhancement data and the strong enhancement data are calculated, and a regularization measure for eliminating noise is added.
Further, in the step S1, given is
Figure 251904DEST_PATH_IMAGE001
A sample image and
Figure 637886DEST_PATH_IMAGE002
training data set for individual image classes
Figure 11098DEST_PATH_IMAGE003
Figure 875149DEST_PATH_IMAGE004
In order to be the image of the sample,
Figure 548576DEST_PATH_IMAGE005
for noisy labels (i.e. for
Figure 839880DEST_PATH_IMAGE006
Not necessarily correct),
Figure 201853DEST_PATH_IMAGE007
representing the prediction results of the classification model, wherein
Figure 869595DEST_PATH_IMAGE008
As a result of the network parameters,
Figure 131949DEST_PATH_IMAGE009
in order to be a function of the mapping,
Figure 922051DEST_PATH_IMAGE010
the dimension of expression isKIs determined according to the following cross-enhancement matching loss function
Figure 145222DEST_PATH_IMAGE004
Whether it is a correctly labeled sample:
Figure 475709DEST_PATH_IMAGE011
Figure 467936DEST_PATH_IMAGE012
wherein
Figure 989790DEST_PATH_IMAGE013
Figure 965837DEST_PATH_IMAGE014
Respectively weak enhancement data and strong enhancement data,
Figure 834435DEST_PATH_IMAGE015
for sample images
Figure 946748DEST_PATH_IMAGE004
The prediction of the class result with the highest confidence level,
Figure 875390DEST_PATH_IMAGE016
Figure 401049DEST_PATH_IMAGE017
is shown asiAn image of a sample
Figure 948705DEST_PATH_IMAGE004
Is marked with a labelkThe degree of confidence of (a) is,
Figure 541623DEST_PATH_IMAGE018
the weight parameter is represented by a weight value,
Figure 47690DEST_PATH_IMAGE019
is composed of
Figure 591804DEST_PATH_IMAGE006
T is a transposed symbol.
Further, in the step S1, the crossover is enhancedDistribution loss less than OTSU (Otsu), an algorithm for determining image binary segmentation threshold value)
Figure 943151DEST_PATH_IMAGE020
The data of (2) is recognized as correct data, and the correct data set
Figure 889110DEST_PATH_IMAGE021
In the training phase, only the sets are used
Figure 300500DEST_PATH_IMAGE022
The data in (1) calculates the cross-enhancement matching loss, i.e. the total loss is expressed as:
Figure 331910DEST_PATH_IMAGE023
further, in the step S2, in order to avoid a negative effect of the sample data difference caused by the weak data enhancement and strong data enhancement strategies on the feature extraction, a dual-branch batch processing standardization method is adopted, specifically, a weak enhancement data difference is subjected to a batch processing standardization process
Figure 47800DEST_PATH_IMAGE013
And strong enhancement data
Figure 989211DEST_PATH_IMAGE014
Calculating different mean and variance according to exponential moving average accumulation
Figure 696136DEST_PATH_IMAGE024
Figure 355787DEST_PATH_IMAGE025
Figure 173571DEST_PATH_IMAGE026
Wherein
Figure 703909DEST_PATH_IMAGE027
Is a constant number of times, and is,
Figure 581735DEST_PATH_IMAGE028
the number of sample images in a batch (batch) is shown, and the normalized output of the batch is
Figure 292465DEST_PATH_IMAGE029
Figure 789305DEST_PATH_IMAGE030
Wherein
Figure 564363DEST_PATH_IMAGE031
Figure 222877DEST_PATH_IMAGE032
Are all characteristic graphs of the middle layer of the neural network,
Figure 716176DEST_PATH_IMAGE031
the neural network batches the layer inputs representing weak enhancement inputs,
Figure 751128DEST_PATH_IMAGE033
which represents a weakly enhanced mean value, is,
Figure 646271DEST_PATH_IMAGE034
the variance of the weak enhancement is indicated,
Figure 741266DEST_PATH_IMAGE032
representing a strong enhancement input a neural network batches the layer inputs,
Figure 243833DEST_PATH_IMAGE035
which represents the average of the strong enhancement,
Figure 410372DEST_PATH_IMAGE036
it is meant that the variance is strongly enhanced,
Figure 769810DEST_PATH_IMAGE037
Figure 160340DEST_PATH_IMAGE038
Figure 238017DEST_PATH_IMAGE039
Figure 4985DEST_PATH_IMAGE040
are all learnable radiation parameters.
Further, in the step S2, in the training phase, a set of batch processing parameters is trained on the weak enhancement data and the strong enhancement data respectively; in the testing stage, only the weak data enhancement strategy and the batch processing parameters of the weak enhancement data are used, and the batch processing parameters of the strong data enhancement strategy are discarded.
Further, in step S3, according to the loss of the cross-enhancing matching, the filtered noise sample set is used as a high confidence error data set:
Figure 484508DEST_PATH_IMAGE041
wherein the set
Figure 812983DEST_PATH_IMAGE042
Is restricted to
Figure 377957DEST_PATH_IMAGE043
Figure 886298DEST_PATH_IMAGE044
Is a constant;
taking the screened large loss sample as a noise sample with high confidence, regularizing a network model through a noise-removing regularization measure, and aligning a set
Figure 344962DEST_PATH_IMAGE045
Each sample image in (1)
Figure 952660DEST_PATH_IMAGE046
Which belong to a specific class
Figure 395143DEST_PATH_IMAGE047
Using the following regularization term constraint set
Figure 113700DEST_PATH_IMAGE045
Preventing the prediction result from fitting a wrong noise label:
Figure 190984DEST_PATH_IMAGE048
wherein,
Figure 297481DEST_PATH_IMAGE049
is shown asjAn image of a sample
Figure 837046DEST_PATH_IMAGE046
Is marked with a labely j The confidence of (c).
Further, in step S4, the classification prior probability is evaluated from the online prediction
Figure 483928DEST_PATH_IMAGE050
The prior probability dynamic evaluation of each category is:
Figure 792550DEST_PATH_IMAGE051
wherein
Figure 132264DEST_PATH_IMAGE052
Is a constant number of times, and is,
Figure 893547DEST_PATH_IMAGE053
initialisation to the ratio of the number of samples of a particular class to the total number of samples, i.e.
Figure 845585DEST_PATH_IMAGE054
Figure 8713DEST_PATH_IMAGE055
Represents that the number of samples is
Figure 457012DEST_PATH_IMAGE001
In the training data of (2), each class is
Figure 187814DEST_PATH_IMAGE050
Number of training samples of
Figure 442078DEST_PATH_IMAGE056
Further, in the step S4, the penalty of prediction based on the online prior distribution
Figure 459712DEST_PATH_IMAGE057
And is used for smoothing the labels according to prior distribution, so that the labels with higher prior probability are more strongly smoothed, thereby enhancing the optimization of the tail category, and the specific formula is as follows:
Figure 141230DEST_PATH_IMAGE058
wherein,
Figure 205000DEST_PATH_IMAGE004
a representation of the image of the sample is shown,
Figure 872742DEST_PATH_IMAGE053
the probability of a priori is represented and,
Figure 636561DEST_PATH_IMAGE059
representing the prediction results of the classification model, wherein
Figure 364346DEST_PATH_IMAGE008
As a result of the network parameters,
Figure 915413DEST_PATH_IMAGE009
is a mapping function.
Further, in the step S4, the prediction based on the online prior distribution is performedPunishment
Figure 245900DEST_PATH_IMAGE057
Adding a cross entropy loss function
Figure 238127DEST_PATH_IMAGE060
Then obtaining:
Figure 261446DEST_PATH_IMAGE061
wherein
Figure 971913DEST_PATH_IMAGE062
Is a constant weighting coefficient; loss function
Figure 604627DEST_PATH_IMAGE063
Is converted into the following form:
Figure 716939DEST_PATH_IMAGE064
Figure 848843DEST_PATH_IMAGE065
wherein,
Figure 171240DEST_PATH_IMAGE005
in the form of a noisy tag or label,
Figure 718896DEST_PATH_IMAGE017
is shown asiAn image of a sample
Figure 75928DEST_PATH_IMAGE004
Is marked with a labelkThe degree of confidence of (a) is,
Figure 50837DEST_PATH_IMAGE066
is shown as
Figure 361995DEST_PATH_IMAGE050
Prior probability of individual classes.
Further, in the step S5, the training is divided into a preheating stage and a formal stage, and the two stages respectively calculate the loss and update the parameters, including the following steps:
step S5.1, in the preheating stage, cross entropy loss and online prior distribution are calculated by using weak enhancement data:
Figure 713342DEST_PATH_IMAGE067
wherein,
Figure 862564DEST_PATH_IMAGE060
represents a cross-entropy loss function of the entropy of the sample,
Figure 398587DEST_PATH_IMAGE004
a representation of the image of the sample is shown,
Figure 305363DEST_PATH_IMAGE068
indicating that the data is weakly enhanced data,
Figure 585035DEST_PATH_IMAGE006
a tag that represents the weakly enhanced data,
Figure 526446DEST_PATH_IMAGE069
represents the predictive penalty of an online prior distribution computed over weak enhancement data,
Figure 466327DEST_PATH_IMAGE062
in order to be a constant weighting factor,
Figure 188295DEST_PATH_IMAGE070
is a training data set;
step S5.2, in the formal training stage, the cross enhancement matching loss is combined
Figure 147024DEST_PATH_IMAGE071
Noise rejection regularization loss
Figure 67576DEST_PATH_IMAGE072
And an online prior distribution prediction penalty term for weak enhancement data and strong enhancement data
Figure 820768DEST_PATH_IMAGE073
Screening out correct data set
Figure 30032DEST_PATH_IMAGE022
And high confidence error data set
Figure 887392DEST_PATH_IMAGE045
The total loss is functionalized as:
Figure 537816DEST_PATH_IMAGE074
and updates the network parameters using a random gradient descent method (SGD).
The invention has the advantages and beneficial effects that:
according to the method, according to data noise characteristics, noise samples are screened by matching prediction results respectively obtained by weak enhancement data and strong enhancement data, a noise-rejection regularization measure (leave-noise-out regularization) is introduced to eliminate the influence of the identified noise samples, and a new prediction penalty based on online prior distribution (online prior distribution) is implemented by the method aiming at data long tail characteristics to avoid bias of head types.
Drawings
FIG. 1 is a schematic of a data set with both long tail distribution and tag noise.
Fig. 2 is a flow chart of the method of the present invention.
FIG. 3a is a graph of the variation of the test accuracy of other methods of the present invention (
Figure 586544DEST_PATH_IMAGE075
Figure 220788DEST_PATH_IMAGE076
)。
FIG. 3b is the present inventionTest accuracy rate change chart of other methods (A)
Figure 380373DEST_PATH_IMAGE075
Figure 885304DEST_PATH_IMAGE077
)。
FIG. 3c is a graph of the variation of the test accuracy of other methods of the present invention: (
Figure 104933DEST_PATH_IMAGE075
Figure 787325DEST_PATH_IMAGE078
)。
FIG. 3d is a graph of the variation of the test accuracy of other methods of the present invention: (
Figure 360389DEST_PATH_IMAGE075
Figure 375618DEST_PATH_IMAGE079
)。
FIG. 3e is a graph of the variation of the test accuracy of other methods of the present invention: (
Figure 641514DEST_PATH_IMAGE075
Figure 843825DEST_PATH_IMAGE080
)。
FIG. 4a is a graph of the variation of the test accuracy of other methods of the present invention (
Figure 814055DEST_PATH_IMAGE081
Figure 293578DEST_PATH_IMAGE082
)。
FIG. 4b is a graph of the variation of the test accuracy of other methods of the present invention (
Figure 356475DEST_PATH_IMAGE081
Figure 921448DEST_PATH_IMAGE083
)。
FIG. 4c is a graph of the change in test accuracy of other methods of the present invention: (
Figure 960948DEST_PATH_IMAGE081
Figure 826136DEST_PATH_IMAGE084
)。
FIG. 4d is a graph of the change in test accuracy of other methods of the present invention (
Figure 558469DEST_PATH_IMAGE081
Figure 610738DEST_PATH_IMAGE085
)。
FIG. 4e is a graph of the change in test accuracy of other methods of the present invention: (
Figure 657192DEST_PATH_IMAGE081
Figure 734476DEST_PATH_IMAGE086
)。
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
As shown in fig. 1 and 2, a long tail noise learning method based on cross enhanced matching includes the following steps:
the method comprises the following steps: according to the data noise characteristics, a weak data enhancement strategy and a strong data enhancement strategy are respectively adopted for each sample, cross-enhancement matching (cross-entropy matching) is carried out on the prediction results of weak enhancement data and strong enhancement, and cross entropy loss is improved into cross-enhancement matching loss for screening the noise samples.
The invention relates to two data enhancement strategies, namely weak data enhancement and strong data enhancement. The implementation of weak data enhancement (weak augmentation) is simple random flip (flip) and crop (crop), while strong data enhancement (strong augmentation) uses the implementation of AutoAutoAutoAutoaugmentation and adopts a data enhancement strategy automatically selected by a search algorithm on ImageNet.
Is given with
Figure 513076DEST_PATH_IMAGE001
A sample image and
Figure 442855DEST_PATH_IMAGE002
training data set for individual image class labels
Figure 965103DEST_PATH_IMAGE003
Figure 398358DEST_PATH_IMAGE004
In order to be the image of the sample,
Figure 347860DEST_PATH_IMAGE005
for noisy labels (i.e. for
Figure 233776DEST_PATH_IMAGE006
Not necessarily correct), the prediction result of the classification model is defined as
Figure 389076DEST_PATH_IMAGE007
Wherein
Figure 552204DEST_PATH_IMAGE008
As a result of the network parameters,
Figure 797241DEST_PATH_IMAGE087
in order to be a function of the mapping,
Figure 576978DEST_PATH_IMAGE010
the dimension of expression isKIs determined according to the following cross-correlation matching loss function
Figure 565663DEST_PATH_IMAGE004
Whether it is a correctly labeled sample:
Figure 583297DEST_PATH_IMAGE011
Figure 264814DEST_PATH_IMAGE088
wherein
Figure 689DEST_PATH_IMAGE013
Figure 229283DEST_PATH_IMAGE014
Respectively a weakly enhanced and a strongly enhanced sample,
Figure 491637DEST_PATH_IMAGE015
is composed of
Figure 219422DEST_PATH_IMAGE004
The prediction of the class result with the highest confidence level,
Figure 301647DEST_PATH_IMAGE016
Figure 773080DEST_PATH_IMAGE017
is shown asiLabeling of individual sample imageskThe degree of confidence of (a) is,
Figure 889940DEST_PATH_IMAGE018
the weight parameter is represented by a weight value,
Figure 788626DEST_PATH_IMAGE019
is composed of
Figure 125192DEST_PATH_IMAGE006
T is a transposed symbol.
Matching loss of cross enhancement smaller than OTSU (Otsu method, a method for determining image binary segmentation thresholdAlgorithm) threshold value
Figure 134736DEST_PATH_IMAGE020
The data of (2) is recognized as correct data, and the correct data set
Figure 106103DEST_PATH_IMAGE021
In the training phase, only the sets are used
Figure 503586DEST_PATH_IMAGE022
The data in (1) calculates the cross-enhancement matching loss, i.e., the total loss can be expressed as:
Figure 435770DEST_PATH_IMAGE089
step two: aiming at sample data difference caused by weak data enhancement and strong data enhancement strategies, a dual-branch batch normalization method is adopted, and different parameters are respectively used for model training of feature maps of weak enhancement and strong enhancement data.
In order to avoid the negative influence of sample data difference on feature extraction caused by weak data enhancement and strong data enhancement strategies, a double-branch batch processing standardization method is adopted, and specifically, the weak enhancement data is subjected to batch processing standardization in the process of batch processing standardization
Figure 373639DEST_PATH_IMAGE013
And strong enhancement data
Figure 340458DEST_PATH_IMAGE014
Different mean and variance are cumulatively calculated based on exponential moving averages (EMA in FIG. 2), respectively
Figure 961974DEST_PATH_IMAGE024
Figure 912612DEST_PATH_IMAGE090
Figure 388593DEST_PATH_IMAGE091
Wherein
Figure 537815DEST_PATH_IMAGE027
Is a constant number of times, and is,
Figure 683625DEST_PATH_IMAGE028
representing the number of samples in a batch (batch), the normalized output of the batch being
Figure 980614DEST_PATH_IMAGE029
Figure 135652DEST_PATH_IMAGE030
Wherein
Figure 437583DEST_PATH_IMAGE031
Figure 19874DEST_PATH_IMAGE032
Are all characteristic graphs of the middle layer of the neural network,
Figure 741842DEST_PATH_IMAGE031
the neural network batches the layer inputs representing weak enhancement inputs,
Figure 559626DEST_PATH_IMAGE033
which represents a weakly enhanced mean value, is,
Figure 355543DEST_PATH_IMAGE034
the variance of the weak enhancement is indicated,
Figure 233369DEST_PATH_IMAGE032
representing a strong enhancement input a neural network batches the layer inputs,
Figure 114738DEST_PATH_IMAGE035
which represents the average of the strong enhancement,
Figure 438010DEST_PATH_IMAGE036
it is meant that the variance is strongly enhanced,
Figure 213067DEST_PATH_IMAGE037
Figure 402740DEST_PATH_IMAGE038
Figure 896039DEST_PATH_IMAGE039
Figure 258887DEST_PATH_IMAGE040
are all learnable radiation parameters.
In the training stage, respectively training a group of batch processing parameters for the weak enhancement data and the strong enhancement data; in the testing stage, only the weak enhancement strategy and the batch processing parameters of the weak enhancement data are used, and the batch processing parameters of the strong enhancement strategy are discarded.
Step three: regarding the large loss samples screened by the cross-enhancement matching as noise samples with high confidence, a regularization measure (LNOR in fig. 2) for eliminating noise is used to eliminate the negative influence of the samples on the model training.
For large loss samples screened by cross enhancement matching, defining a high-confidence error data set:
Figure 763817DEST_PATH_IMAGE092
wherein the set
Figure 750490DEST_PATH_IMAGE042
Is restricted to
Figure 340872DEST_PATH_IMAGE043
Figure 304148DEST_PATH_IMAGE044
Is a constant.
Large damage of the screenedThe missing samples are regarded as noise samples with high confidence level and are used for regularizing network model optimization through a noise-removing regularization measure, specifically, a set
Figure 929165DEST_PATH_IMAGE045
Each sample in (1)
Figure 319695DEST_PATH_IMAGE046
Assume it belongs to a particular sample class
Figure 397372DEST_PATH_IMAGE047
Using the following regularization term constraint set
Figure 898761DEST_PATH_IMAGE045
Preventing the prediction result from fitting a wrong noise label:
Figure 378284DEST_PATH_IMAGE093
wherein,
Figure 438250DEST_PATH_IMAGE049
is shown asjAn image of a sample
Figure 331120DEST_PATH_IMAGE046
Is marked with a labely j The confidence of (c).
Step four: aiming at the head class classification advantages caused by the long tail features of the data, a new online prior distribution (online prior distribution) -based prediction penalty is implemented to smooth the prediction result of the head class.
Aiming at the head class classification advantages caused by the data long-tail characteristics, the classification prior probability is evaluated from online prediction to truly reflect class fitting degree
Figure 308303DEST_PATH_IMAGE050
The prior probability dynamic evaluation of each category is:
Figure 235808DEST_PATH_IMAGE094
wherein
Figure 843507DEST_PATH_IMAGE052
Is a constant number of times, and is,
Figure 754831DEST_PATH_IMAGE053
initialisation to the ratio of the number of samples of a particular class to the total number of samples, i.e.
Figure 738967DEST_PATH_IMAGE054
Figure 819181DEST_PATH_IMAGE055
Represents that the number of samples is
Figure 597781DEST_PATH_IMAGE001
In the training data of (2), each class is
Figure 996402DEST_PATH_IMAGE050
Number of training samples of
Figure 784229DEST_PATH_IMAGE056
Predictive punishment based on online prior distribution
Figure 217484DEST_PATH_IMAGE057
The method is used for smoothing the label according to prior distribution, so that the label with higher prior probability obtains stronger smoothing, thereby enhancing the optimization of the tail category, and the specific definition is as follows:
Figure 166986DEST_PATH_IMAGE095
wherein,
Figure 318482DEST_PATH_IMAGE057
and (4) showing.
Adding the prediction punishment based on the online prior distribution into a cross entropy loss function to obtain the result
Figure 113262DEST_PATH_IMAGE061
Wherein
Figure 430718DEST_PATH_IMAGE062
For a constant weighting factor, the above-mentioned loss function can be converted into the following form:
Figure 551120DEST_PATH_IMAGE096
Figure 924333DEST_PATH_IMAGE097
wherein,
Figure 788384DEST_PATH_IMAGE066
is shown as
Figure 196231DEST_PATH_IMAGE050
Prior probability of individual classes.
Step five: a phased training strategy is used based on the data noise signature. In the preheating stage, only the cross entropy loss and the online prior distribution loss of the weak enhancement data are calculated; in the formal training stage, the cross enhancement matching loss and the online prior distribution loss of the weak enhancement data and the strong enhancement data are calculated, and a regularization measure for eliminating noise is added.
The training is divided into a preheating stage and a formal stage, and the loss and the parameters are calculated and updated in the following modes respectively:
step 5.1, in the preheating stage, only weak enhancement data is used for calculating cross entropy loss and online prior distribution, namely:
Figure 487535DEST_PATH_IMAGE098
step 5.2, combining cross enhancement matching loss in the formal training stage
Figure 348044DEST_PATH_IMAGE071
Noise rejection regularization loss
Figure 15786DEST_PATH_IMAGE072
And an online prior distribution prediction penalty term for weak enhancement data and strong enhancement data
Figure 779605DEST_PATH_IMAGE073
Screening out correct data set
Figure 241810DEST_PATH_IMAGE022
And high confidence error data set
Figure 792877DEST_PATH_IMAGE045
The total loss function is defined as:
Figure 388943DEST_PATH_IMAGE099
and updates the network parameters using a random gradient descent method (SGD).
And in the prediction stage, the sample image to be predicted is input into a model trained by adopting a long tail noise learning method based on cross enhancement matching, and a classification result is output.
The invention defines the sample number ratio between the most sample number category and the least sample number category as an unbalance factor (imbalance factor)
Figure 381170DEST_PATH_IMAGE100
I.e. by
Figure 404490DEST_PATH_IMAGE101
. The type of long tail data distribution used in the experiments of the present invention is exponential decay distribution.
For the setting of the noise data, there are two cases, i.e., class-independent noise (class-independent noise) and class-dependent noise (class-dependent noise). The category-independent noise assumes that the mislabeled samples are randomly and uniformly distributed, and the category-dependent noise focuses on the phenomenon of artificial labeling error caused by visual similarity. The invention defines the error probability of the sample label as noise rate (noise rate)
Figure 675809DEST_PATH_IMAGE102
. For class independent noise, each class has a label
Figure 809987DEST_PATH_IMAGE102
The probability of (d) is labeled as other arbitrary categories by random errors; for class-dependent noise, there are labels for every two classes
Figure 656720DEST_PATH_IMAGE102
The probability of (d) is labeled as the opposite party category.
The method uses a Pythrch frame to carry out experiments, uses a CIFAR-10 data set, uses ResNet-32 as a network model, and uses an SGD optimizer with an initial learning rate of 0.05 and a cosine annealing scheduler. 100 iterations of training are set in both training phases, batch size 128, parameters
Figure 850941DEST_PATH_IMAGE103
All experiments of the invention are trained from zero.
As shown in FIGS. 3a-e, by CE, Coteaching +, JoCoR, Comatring and the method of the present invention, on CIFAR-10 data set, using ResNet-32 network to perform the test accuracy change of long tail distributed noise sample learning, it can be seen that the accuracy of the method of the present invention is superior to that of other methods, wherein the symmetric noise rate
Figure 783125DEST_PATH_IMAGE075
Imbalance factor
Figure 720994DEST_PATH_IMAGE104
As shown in fig. 4a-e, by CE. LDAM, Mixup, MisLAS and the method of the invention use ResNet-32 network to study the test accuracy rate of long tail distributed noise sample on CIFAR-10 data set, which shows that the accuracy rate of the method of the invention is superior to that of other methods, wherein the imbalance factor
Figure 422234DEST_PATH_IMAGE081
Asymmetric noise ratio
Figure 288821DEST_PATH_IMAGE105
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A long tail noise learning method based on cross enhancement matching is characterized by comprising the following steps:
step S1: according to the data noise characteristics, respectively adopting a weak data enhancement strategy and a strong data enhancement strategy for each sample image, and carrying out cross enhancement matching on the prediction results of the weak enhancement data and the strong enhancement data to screen a noise sample;
step S2: aiming at sample data difference caused by weak data enhancement and strong data enhancement strategies, a double-branch batch processing standardization method is adopted, and different parameters are respectively used for model training on feature graphs of the weak enhancement data and the strong enhancement data;
step S3: for noise samples screened by cross enhancement matching, a noise elimination regularization measure is used to eliminate the negative influence of the noise samples on model training;
step S4: for the head class classification advantages caused by the long tail characteristics of the data, estimating the classification prior probability from online prediction, and based on the prediction penalty of online prior distribution, smoothing the prediction result of the head class;
step S5: according to the data noise characteristics, a staged training strategy is used, and only the cross entropy loss and the online prior distribution loss of weak enhancement data are calculated in a preheating stage; in the formal training stage, the cross enhancement matching loss and the online prior distribution loss of the weak enhancement data and the strong enhancement data are calculated, and a regularization measure for eliminating noise is added.
2. The method for learning long tail noise based on cross-correlation matching as claimed in claim 1, wherein in step S1, given is
Figure 348148DEST_PATH_IMAGE001
A sample image and
Figure 965074DEST_PATH_IMAGE002
training data set for individual image classes
Figure 911033DEST_PATH_IMAGE003
Figure 322423DEST_PATH_IMAGE004
In order to be the image of the sample,
Figure 353833DEST_PATH_IMAGE005
in the form of a noisy tag or label,
Figure 836767DEST_PATH_IMAGE006
representing the prediction results of the classification model, wherein
Figure 512599DEST_PATH_IMAGE007
As a result of the network parameters,
Figure 720989DEST_PATH_IMAGE008
in order to be a function of the mapping,
Figure 115061DEST_PATH_IMAGE009
the dimension of expression isKIs determined according to the following cross-enhancement matching loss function
Figure 932844DEST_PATH_IMAGE004
Whether it is a correctly labeled sample:
Figure 994341DEST_PATH_IMAGE010
Figure 606588DEST_PATH_IMAGE011
wherein
Figure 487956DEST_PATH_IMAGE012
Figure 312693DEST_PATH_IMAGE013
Respectively weak enhancement data and strong enhancement data,
Figure 586286DEST_PATH_IMAGE014
for sample images
Figure 510380DEST_PATH_IMAGE004
The prediction of the class result with the highest confidence level,
Figure 738099DEST_PATH_IMAGE015
Figure 38630DEST_PATH_IMAGE016
is shown asiAn image of a sample
Figure 933774DEST_PATH_IMAGE004
Is marked with a labelkThe degree of confidence of (a) is,
Figure 28769DEST_PATH_IMAGE017
the weight parameter is represented by a weight value,
Figure 947046DEST_PATH_IMAGE018
is composed of
Figure 411788DEST_PATH_IMAGE019
T is a transposed symbol.
3. The method according to claim 2, wherein in step S1, the cross-enhancement matching loss is smaller than the OTSU threshold
Figure 36804DEST_PATH_IMAGE020
The data of (2) is recognized as correct data, and the correct data set
Figure 427334DEST_PATH_IMAGE021
In the training phase, only the sets are used
Figure 770591DEST_PATH_IMAGE022
The data in (1) calculates the cross-enhancement matching loss, i.e. the total loss is expressed as:
Figure 271979DEST_PATH_IMAGE023
4. the method for learning long tail noise based on cross-correlation matching as claimed in claim 1, wherein in step S2, a two-branch batch normalization method is used to normalize the weak correlation data
Figure 751502DEST_PATH_IMAGE012
And strong enhancement data
Figure 781775DEST_PATH_IMAGE013
Calculating different mean and variance according to exponential moving average accumulation
Figure 173180DEST_PATH_IMAGE024
Figure 743839DEST_PATH_IMAGE025
Figure 405764DEST_PATH_IMAGE026
Wherein
Figure 13463DEST_PATH_IMAGE027
Is a constant number of times, and is,
Figure 455946DEST_PATH_IMAGE028
representing the number of sample images in a batch, the normalized output of the batch being
Figure 269443DEST_PATH_IMAGE029
Figure 457979DEST_PATH_IMAGE030
Wherein
Figure 361213DEST_PATH_IMAGE031
Figure 166358DEST_PATH_IMAGE032
Are all characteristic graphs of the middle layer of the neural network,
Figure 16502DEST_PATH_IMAGE031
the neural network batches the layer inputs representing weak enhancement inputs,
Figure 449758DEST_PATH_IMAGE033
which represents a weakly enhanced mean value, is,
Figure 664838DEST_PATH_IMAGE034
the variance of the weak enhancement is indicated,
Figure 314869DEST_PATH_IMAGE032
representing a strong enhancement input a neural network batches the layer inputs,
Figure 703125DEST_PATH_IMAGE035
which represents the average of the strong enhancement,
Figure 866253DEST_PATH_IMAGE036
it is meant that the variance is strongly enhanced,
Figure 642448DEST_PATH_IMAGE037
Figure 218923DEST_PATH_IMAGE038
Figure 82974DEST_PATH_IMAGE039
Figure 992286DEST_PATH_IMAGE040
are all learnable radiation parameters.
5. The method for learning long tail noise based on cross-correlation matching as claimed in claim 1, wherein in the step S2, in the training phase, a set of batch processing parameters is trained on the weak correlation data and the strong correlation data respectively; in the testing stage, only the weak data enhancement strategy and the batch processing parameters of the weak enhancement data are used, and the batch processing parameters of the strong data enhancement strategy are discarded.
6. The method according to claim 3, wherein in step S3, the filtered noise sample set is used as the high confidence error data set according to the loss of the cross-matching enhancement:
Figure 549169DEST_PATH_IMAGE041
wherein the set
Figure 409678DEST_PATH_IMAGE042
Is restricted to
Figure 77420DEST_PATH_IMAGE043
Figure 543036DEST_PATH_IMAGE044
Is a constant;
taking the screened large loss sample as a noise sample with high confidence, regularizing a network model through a noise-removing regularization measure, and aligning a set
Figure 129875DEST_PATH_IMAGE045
Each sample image in (1)
Figure 618625DEST_PATH_IMAGE046
Which belong to a specific class
Figure 471085DEST_PATH_IMAGE047
Using the following regularization term constraint set
Figure 463312DEST_PATH_IMAGE045
Network prediction result of (1):
Figure 689894DEST_PATH_IMAGE048
wherein,
Figure 524995DEST_PATH_IMAGE049
is shown asjAn image of a sample
Figure 268960DEST_PATH_IMAGE046
Is marked with a labely j The confidence of (c).
7. The method of claim 1, wherein in step S4, the class prior probability is evaluated from online prediction and is set as the first step
Figure 771486DEST_PATH_IMAGE050
The prior probability dynamic evaluation of each category is:
Figure 575493DEST_PATH_IMAGE051
wherein
Figure 399355DEST_PATH_IMAGE052
Is a constant number of times, and is,
Figure 274907DEST_PATH_IMAGE053
initialisation to the ratio of the number of samples of a particular class to the total number of samples, i.e.
Figure 241726DEST_PATH_IMAGE054
Figure 872428DEST_PATH_IMAGE055
Represents that the number of samples is
Figure 291908DEST_PATH_IMAGE001
In the training data of (2), each class is
Figure 971151DEST_PATH_IMAGE050
Number of training samples of
Figure 681225DEST_PATH_IMAGE056
8. The method for learning long tail noise based on cross-correlation matching as claimed in claim 1, wherein in step S4, the penalty of prediction based on online prior distribution is penalized
Figure 92614DEST_PATH_IMAGE057
The method is used for smoothing the label according to prior distribution, so that the label with higher prior probability obtains stronger smoothing, and the specific formula is as follows:
Figure 327287DEST_PATH_IMAGE058
wherein,
Figure 341379DEST_PATH_IMAGE004
a representation of the image of the sample is shown,
Figure 548369DEST_PATH_IMAGE053
the probability of a priori is represented and,
Figure 458557DEST_PATH_IMAGE059
representing the prediction results of the classification model, wherein
Figure 478727DEST_PATH_IMAGE007
As a result of the network parameters,
Figure 437456DEST_PATH_IMAGE008
is a mapping function.
9. The method for learning long tail noise based on cross-correlation matching as claimed in claim 8, wherein in step S4, the prediction penalty based on online prior distribution is determined
Figure 295691DEST_PATH_IMAGE057
Adding a cross entropy loss function
Figure 173517DEST_PATH_IMAGE060
Then obtaining:
Figure 913940DEST_PATH_IMAGE061
wherein
Figure 410780DEST_PATH_IMAGE062
Is a constant weighting coefficient; loss function
Figure 684373DEST_PATH_IMAGE063
Is converted into the following form:
Figure 342888DEST_PATH_IMAGE064
Figure 836186DEST_PATH_IMAGE065
wherein,
Figure 871138DEST_PATH_IMAGE005
in the form of a noisy tag or label,
Figure 766282DEST_PATH_IMAGE016
is shown asiAn image of a sample
Figure 923594DEST_PATH_IMAGE004
Is marked with a labelkThe degree of confidence of (a) is,
Figure 779554DEST_PATH_IMAGE066
is shown as
Figure 244296DEST_PATH_IMAGE050
Prior probability of individual classes.
10. The method for learning long tail noise based on cross-correlation matching according to claim 1, wherein the training in step S5 is divided into a preheating stage and a formal stage, and the two stages respectively calculate the loss and update the parameters, including the following steps:
step S5.1, in the preheating stage, cross entropy loss and online prior distribution are calculated by using weak enhancement data:
Figure 603733DEST_PATH_IMAGE067
wherein,
Figure 259842DEST_PATH_IMAGE060
represents a cross-entropy loss function of the entropy of the sample,
Figure 665416DEST_PATH_IMAGE004
a representation of the image of the sample is shown,
Figure 307750DEST_PATH_IMAGE068
indicating that the data is weakly enhanced data,
Figure 177486DEST_PATH_IMAGE019
a tag that represents the weakly enhanced data,
Figure 440714DEST_PATH_IMAGE069
represents the predictive penalty of an online prior distribution computed over weakly enhanced data,
Figure 271267DEST_PATH_IMAGE062
in order to be a constant weighting factor,
Figure 576347DEST_PATH_IMAGE070
is a training data set;
step S5.2, in the formal training stage, the cross enhancement matching loss is combined
Figure 238272DEST_PATH_IMAGE071
Noise rejection regularization loss
Figure 845971DEST_PATH_IMAGE072
And an online prior distribution prediction penalty term for weak enhancement data and strong enhancement data
Figure 491716DEST_PATH_IMAGE073
Screening out correct data set
Figure 836372DEST_PATH_IMAGE022
And high confidence error data set
Figure 556066DEST_PATH_IMAGE045
The total loss is functionalized as:
Figure 662562DEST_PATH_IMAGE074
and updates the network parameters using a random gradient descent method.
CN202111457536.2A 2021-12-02 2021-12-02 Long tail noise learning method based on cross enhancement matching Active CN113869463B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111457536.2A CN113869463B (en) 2021-12-02 2021-12-02 Long tail noise learning method based on cross enhancement matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111457536.2A CN113869463B (en) 2021-12-02 2021-12-02 Long tail noise learning method based on cross enhancement matching

Publications (2)

Publication Number Publication Date
CN113869463A true CN113869463A (en) 2021-12-31
CN113869463B CN113869463B (en) 2022-04-15

Family

ID=78985557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111457536.2A Active CN113869463B (en) 2021-12-02 2021-12-02 Long tail noise learning method based on cross enhancement matching

Country Status (1)

Country Link
CN (1) CN113869463B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114863193A (en) * 2022-07-07 2022-08-05 之江实验室 Long-tail learning image classification and training method and device based on mixed batch normalization
CN115423031A (en) * 2022-09-20 2022-12-02 腾讯科技(深圳)有限公司 Model training method and related device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516207A (en) * 2021-09-10 2021-10-19 之江实验室 Long-tail distribution image classification method with noise label

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516207A (en) * 2021-09-10 2021-10-19 之江实验室 Long-tail distribution image classification method with noise label

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GORKEM ALGAN 等: "《Label Noise Types and Their Effects on Deep Learning》", 《ARXIV:2003.10471V1[CS.CV]》 *
MUHAMMAD ABDULLAH JAMAL 等: "《Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective》", 《ARXIV:2003.10780V1[CS.CV]》 *
陈庆强 等: "《基于数据分布的标签噪声过滤方法》", 《第六届中国计算机学会大数据学术会议》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114863193A (en) * 2022-07-07 2022-08-05 之江实验室 Long-tail learning image classification and training method and device based on mixed batch normalization
CN115423031A (en) * 2022-09-20 2022-12-02 腾讯科技(深圳)有限公司 Model training method and related device

Also Published As

Publication number Publication date
CN113869463B (en) 2022-04-15

Similar Documents

Publication Publication Date Title
CN109101938B (en) Multi-label age estimation method based on convolutional neural network
CN111079847B (en) Remote sensing image automatic labeling method based on deep learning
CN109086799A (en) A kind of crop leaf disease recognition method based on improvement convolutional neural networks model AlexNet
CN112396097B (en) Unsupervised domain self-adaptive visual target detection method based on weighted optimal transmission
CN111239137B (en) Grain quality detection method based on transfer learning and adaptive deep convolution neural network
CN113673482B (en) Cell antinuclear antibody fluorescence recognition method and system based on dynamic label distribution
CN113902944A (en) Model training and scene recognition method, device, equipment and medium
CN108596204B (en) Improved SCDAE-based semi-supervised modulation mode classification model method
CN113869463B (en) Long tail noise learning method based on cross enhancement matching
CN111144462B (en) Unknown individual identification method and device for radar signals
CN116912568A (en) Noise-containing label image recognition method based on self-adaptive class equalization
CN113313179B (en) Noise image classification method based on l2p norm robust least square method
CN117173494B (en) Noise-containing label image recognition method and system based on class balance sample selection
CN117197474B (en) Noise tag learning method based on class equalization and cross combination strategy
CN113657473A (en) Web service classification method based on transfer learning
CN116152612B (en) Long-tail image recognition method and related device
CN114863193B (en) Long-tail learning image classification and training method and device based on mixed batch normalization
CN109376619A (en) A kind of cell detection method
CN116486150A (en) Uncertainty perception-based regression error reduction method for image classification model
CN116152194A (en) Object defect detection method, system, equipment and medium
CN116091835A (en) Regularization combined autonomous training-based field self-adaptive image classification method
CN114743042A (en) Longjing tea quality identification method based on depth features and TrAdaBoost
CN115272688A (en) Small sample learning image classification method based on meta-features
CN111797903B (en) Multi-mode remote sensing image registration method based on data-driven particle swarm optimization
CN113762382A (en) Model training and scene recognition method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant