CN113869463B - Long tail noise learning method based on cross enhancement matching - Google Patents

Long tail noise learning method based on cross enhancement matching

Info

Publication number
CN113869463B
CN113869463B
Authority
CN
China
Prior art keywords
data
enhancement
noise
sample
cross
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111457536.2A
Other languages
Chinese (zh)
Other versions
CN113869463A (en
Inventor
程乐超
茅一宁
苏慧
冯尊磊
宋明黎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202111457536.2A priority Critical patent/CN113869463B/en
Publication of CN113869463A publication Critical patent/CN113869463A/en
Application granted granted Critical
Publication of CN113869463B publication Critical patent/CN113869463B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a long-tail noise learning method based on cross-enhancement matching, which is used to solve image classification problems that exhibit long-tail characteristics and noisy labels at the same time. For the noise characteristics of the data, the method screens noise samples by matching the prediction results obtained from weakly enhanced and strongly enhanced data respectively, and introduces a noise-rejection regularization measure to eliminate the influence of the identified noise samples. For the long-tail characteristics of the data, the method applies a new prediction penalty based on an online prior distribution to avoid bias toward the head classes. The method is simple to implement and flexible, and it estimates the fitting degree of each class in real time, so it achieves a marked improvement in classification performance on long-tail data, noisy data, and training data exhibiting both characteristics.

Description

Long tail noise learning method based on cross enhancement matching
Technical Field
The invention relates to the field of image classification, in particular to a method for classifying images under the condition that a noise label and long tail distribution exist simultaneously.
Background
In recent years, convolutional neural networks (CNNs) have been widely used in the field of computer vision. With a fixed amount of training data, overfitting becomes increasingly prominent as the number of parameters grows, and improving overall performance therefore demands more accurately labeled data. However, obtaining a large number of accurately labeled samples is often quite expensive. Crowd-sourcing by non-experts or automatic system labeling is a practical alternative, but it easily introduces mislabeled samples. Many benchmark data sets, such as ImageNet, CIFAR-10/-100, MNIST and QuickDraw, contain 3% to 10% noisily labeled samples. Existing research on noisy labels has generally focused on separating correctly and incorrectly labeled samples while neglecting the distribution of the data. In the real world, data often follows a long-tail distribution: a few head categories dominate the data set while the remaining categories have too few samples. It is therefore important for practical applications to study how to train a model on a data set that exhibits both a long-tail distribution and label noise, as shown in Fig. 1.
Noisy-label learning has received much attention in recent years and has achieved impressive results. Because a convolutional neural network learns the simple, general patterns of clean data before fitting noisy samples during training, most existing methods use a cross-entropy loss to fit the model predictions to the data labels. However, on a data set with a long-tail distribution the training data is dominated by the head classes, and the cross-entropy loss has difficulty distinguishing correct from incorrect samples of the tail classes. For long-tail image classification, a series of data re-balancing strategies such as re-weighting and re-sampling balance the training data according to the number of samples per class. However, in the presence of label noise the true number of samples per class is unknown, and the raw sample count does not reflect the real-time fitting degree of a class. Based on the above analysis, existing deep convolutional networks still lack an effective solution for data sets that have both long-tail features and label noise.
Disclosure of Invention
In order to overcome the defects of the prior art, namely that the co-training strategies used for noisy data have difficulty distinguishing correct from incorrect samples of tail categories on long-tail data, and that the data re-balancing strategies used for long-tail classification perform poorly on noisy data, the invention adopts the following technical solution:
a long tail noise learning method based on cross enhancement matching comprises the following steps:
step S1: according to the data noise characteristics, respectively adopting a weak data enhancement strategy and a strong data enhancement strategy for each sample image, carrying out cross-enhancement matching (cross-enhancement matching) on the prediction results of the weak enhancement data and the strong enhancement data, and improving cross entropy loss into cross-enhancement matching loss for screening noise samples;
step S2: aiming at the sample data difference caused by the weak and strong data enhancement strategies, a dual-branch batch normalization method is adopted, and different parameters are used for the feature maps of the weakly enhanced data and the strongly enhanced data during model training;
step S3: the large-loss samples screened out by cross-enhancement matching are treated as high-confidence noise samples, and a noise-rejection regularization measure is used to eliminate the negative influence of these high-confidence noise samples on model training;
step S4: for the head-class classification advantage caused by the long-tail features of the data, the classification prior probability is estimated from online predictions so that the fitting degree of each class is truly reflected, and a prediction penalty based on the online prior distribution (online prior distribution) is used to smooth the prediction results of the head classes;
step S5: according to the data noise characteristics, a staged training strategy is used, and only the cross entropy loss and the online prior distribution loss of weak enhancement data are calculated in a preheating stage; in the formal training stage, the cross enhancement matching loss and the online prior distribution loss of the weak enhancement data and the strong enhancement data are calculated, and a regularization measure for eliminating noise is added.
Further, in step S1, given a training data set D = {(x_i, y_i)} containing N sample images and K image classes, where x_i is a sample image and y_i is its noisy label (i.e. y_i is not necessarily correct), the prediction result of the classification model is denoted p(x; θ) = f(x; θ), where θ are the network parameters, f is the mapping function, and p(x; θ) is a K-dimensional prediction vector. Whether x_i is a correctly labeled sample is determined according to a cross-enhancement matching loss L_cm(x_i), defined over the following quantities: the weakly enhanced data x_i^w and the strongly enhanced data x_i^s; the class prediction ŷ_i with the highest confidence for the sample image x_i; the confidence p_k(x_i) that the i-th sample image x_i carries label k; a weight parameter λ; and the one-hot vector Y_i of y_i, with T denoting the transpose.
Further, in step S1, data whose cross-enhancement matching loss is smaller than a threshold τ obtained with the OTSU method (Otsu's algorithm for determining an image binarization segmentation threshold) are recognized as correct data and form the correct data set D_c. In the training phase, the cross-enhancement matching loss is computed only on the data in D_c, and these per-sample losses are summed to give the total cross-enhancement matching loss.
further, in the step S2, in order to avoid a negative effect of the sample data difference caused by the weak data enhancement and strong data enhancement strategies on the feature extraction, a dual-branch batch processing standardization method is adopted, specifically, a weak enhancement data difference is subjected to a batch processing standardization process
Figure 47800DEST_PATH_IMAGE013
And strong enhancement data
Figure 989211DEST_PATH_IMAGE014
Calculating different mean and variance according to exponential moving average accumulation
Figure 696136DEST_PATH_IMAGE024
Figure 355787DEST_PATH_IMAGE025
Figure 173571DEST_PATH_IMAGE026
Wherein
Figure 703909DEST_PATH_IMAGE027
Is a constant number of times, and is,
Figure 581735DEST_PATH_IMAGE028
the number of sample images in a batch (batch) is shown, and the normalized output of the batch is
Figure 292465DEST_PATH_IMAGE029
Figure 789305DEST_PATH_IMAGE030
Wherein
Figure 564363DEST_PATH_IMAGE031
Figure 222877DEST_PATH_IMAGE032
Are all characteristic graphs of the middle layer of the neural network,
Figure 716176DEST_PATH_IMAGE031
the neural network batches the layer inputs representing weak enhancement inputs,
Figure 751128DEST_PATH_IMAGE033
which represents a weakly enhanced mean value, is,
Figure 646271DEST_PATH_IMAGE034
the variance of the weak enhancement is indicated,
Figure 741266DEST_PATH_IMAGE032
representing a strong enhancement input a neural network batches the layer inputs,
Figure 243833DEST_PATH_IMAGE035
which represents the average of the strong enhancement,
Figure 410372DEST_PATH_IMAGE036
it is meant that the variance is strongly enhanced,
Figure 769810DEST_PATH_IMAGE037
Figure 160340DEST_PATH_IMAGE038
Figure 238017DEST_PATH_IMAGE039
Figure 4985DEST_PATH_IMAGE040
are all learnable radiation parameters.
Further, in step S2, in the training phase a separate set of batch normalization parameters is trained for the weakly enhanced data and for the strongly enhanced data; in the testing phase, only the weak data enhancement strategy and the batch normalization parameters of the weakly enhanced data are used, and the batch normalization parameters of the strong data enhancement strategy are discarded.
Further, in step S3, according to the cross-enhancement matching loss, the screened-out noise samples form a high-confidence error data set D_e, whose size |D_e| is restricted by a constant upper bound. The screened large-loss samples are treated as high-confidence noise samples, and the network model is regularized through a noise-rejection regularization measure: for each sample image x_j in D_e, which carries a specific (wrong) label y_j, a regularization term is applied to constrain the network predictions on D_e and prevent them from fitting the wrong noise labels, where p_{y_j}(x_j) denotes the confidence that the j-th sample image x_j carries the label y_j.
Further, in step S4, the classification prior probability is estimated from online predictions; for the k-th class the prior probability π_k is dynamically evaluated as an exponential moving average, where η is a constant and π_k is initialized to the ratio of the number of samples of the class to the total number of samples, i.e. π_k = N_k / N, with N_k denoting the number of training samples of the k-th class in the training data of N samples.
Further, in step S4, the prediction penalty L_pr based on the online prior distribution is used to smooth the labels according to the prior distribution, so that labels with a higher prior probability receive stronger smoothing and the optimization of tail categories is strengthened; in the corresponding formula, x denotes the sample image, π the prior probability, and p(x; θ) = f(x; θ) the prediction result of the classification model, where θ are the network parameters and f is the mapping function.
Further, in step S4, the prediction penalty L_pr based on the online prior distribution is added to the cross-entropy loss function L_ce with a constant weighting coefficient α. The resulting loss L_ce + α·L_pr can be rewritten in a form expressed through the noisy label y, the confidence p_k(x_i) that the i-th sample image x_i carries label k, and the prior probability π_k of the k-th class.
Further, in step S5, training is divided into a preheating (warm-up) stage and a formal stage, which compute the losses and update the parameters as follows:
step S5.1: in the preheating stage, the cross-entropy loss and the online prior distribution penalty are computed on the weakly enhanced data only; the warm-up loss sums, over the training data set D, the cross-entropy loss L_ce between the prediction on the weakly enhanced sample x^w and its label y, plus α times the online-prior prediction penalty L_pr computed on x^w, where α is a constant weighting coefficient;
step S5.2: in the formal training stage, the cross-enhancement matching loss L_cm, the noise-rejection regularization loss, and the online prior distribution prediction penalties of the weakly and strongly enhanced data are combined; the correct data set D_c and the high-confidence error data set D_e are screened out, the total loss is formed from these terms, and the network parameters are updated with stochastic gradient descent (SGD).
The advantages and beneficial effects of the invention are as follows: according to the noise characteristics of the data, noise samples are screened by matching the prediction results obtained from weakly enhanced and strongly enhanced data respectively, and a noise-rejection regularization measure (leave-noise-out regularization) is introduced to eliminate the influence of the identified noise samples; for the long-tail characteristics of the data, a new prediction penalty based on an online prior distribution (online prior distribution) is applied to avoid bias toward the head classes.
Drawings
FIG. 1 is a schematic of a data set with both long tail distribution and tag noise.
Fig. 2 is a flow chart of the method of the present invention.
FIG. 3a to FIG. 3e are graphs of the change in test accuracy of the method of the invention and of other methods under different settings of the symmetric noise rate and the imbalance factor (the specific values are given only in the original drawings).
FIG. 4a to FIG. 4e are graphs of the change in test accuracy of the method of the invention and of other methods under different settings of the imbalance factor and the asymmetric noise rate (the specific values are given only in the original drawings).
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
As shown in fig. 1 and 2, a long tail noise learning method based on cross enhanced matching includes the following steps:
the method comprises the following steps: according to the data noise characteristics, a weak data enhancement strategy and a strong data enhancement strategy are respectively adopted for each sample, cross-enhancement matching (cross-entropy matching) is carried out on the prediction results of weak enhancement data and strong enhancement, and cross entropy loss is improved into cross-enhancement matching loss for screening the noise samples.
The invention involves two data enhancement strategies, namely weak data enhancement and strong data enhancement. Weak data enhancement (weak augmentation) is implemented as simple random flipping and cropping, while strong data enhancement (strong augmentation) uses the AutoAugment implementation and adopts the data enhancement policy automatically selected by a search algorithm on ImageNet.
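For illustration, the following PyTorch/torchvision sketch shows one way the two enhancement branches could be set up; the 32×32 crop size and the two-view dataset wrapper are assumptions, while the use of random flip/crop for the weak branch and of the ImageNet-searched AutoAugment policy for the strong branch follows the description above.

```python
from torchvision import transforms
from torchvision.transforms import AutoAugment, AutoAugmentPolicy
from torch.utils.data import Dataset

# Weak augmentation: simple random crop and horizontal flip (CIFAR-style 32x32 assumed).
weak_aug = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Strong augmentation: AutoAugment with the policy searched on ImageNet, as stated above.
strong_aug = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    AutoAugment(policy=AutoAugmentPolicy.IMAGENET),
    transforms.ToTensor(),
])

class TwoViewDataset(Dataset):
    """Wraps a base dataset so that each item yields a weak view, a strong view,
    the (noisy) label and the sample index."""
    def __init__(self, base):
        self.base = base
    def __len__(self):
        return len(self.base)
    def __getitem__(self, i):
        img, label = self.base[i]
        return weak_aug(img), strong_aug(img), label, i
```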
Given a training data set D = {(x_i, y_i)} containing N sample images and K image class labels, where x_i is a sample image and y_i is its noisy label (i.e. y_i is not necessarily correct), the prediction result of the classification model is defined as p(x; θ) = f(x; θ), where θ are the network parameters, f is the mapping function, and p(x; θ) is a K-dimensional prediction vector. Whether x_i is a correctly labeled sample is determined according to a cross-enhancement matching loss L_cm(x_i), defined over the following quantities: the weakly enhanced and strongly enhanced samples x_i^w and x_i^s; the class prediction ŷ_i with the highest confidence for x_i; the confidence p_k(x_i) that the i-th sample image carries label k; a weight parameter λ; and the one-hot vector Y_i of y_i, with T denoting the transpose.
Data whose cross-enhancement matching loss is smaller than a threshold τ obtained with the OTSU method (Otsu's algorithm for determining an image binarization segmentation threshold) are recognized as correct data and form the correct data set D_c. In the training phase, the cross-enhancement matching loss is computed only on the data in D_c, and these per-sample losses are summed to give the total cross-enhancement matching loss.
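As an illustrative sketch (not the patent's exact implementation), the Otsu threshold can be computed directly on the vector of per-sample cross-enhancement matching losses and then used to split the samples into the correct set and the noise candidates; the function names below are hypothetical.

```python
import numpy as np

def otsu_threshold(losses, bins=256):
    """Otsu's method applied to per-sample loss values instead of pixel intensities."""
    hist, edges = np.histogram(losses, bins=bins)
    prob = hist.astype(np.float64) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    best_t, best_var = centers[0], -1.0
    for k in range(1, bins):
        w0, w1 = prob[:k].sum(), prob[k:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (prob[:k] * centers[:k]).sum() / w0
        mu1 = (prob[k:] * centers[k:]).sum() / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance
        if between_var > best_var:
            best_var, best_t = between_var, centers[k]
    return best_t

def split_by_otsu(per_sample_loss):
    """Samples whose loss falls below the Otsu threshold form the 'correct' set D_c;
    the remaining large-loss samples are candidates for the high-confidence error set."""
    tau = otsu_threshold(per_sample_loss)
    clean_idx = np.where(per_sample_loss < tau)[0]
    noisy_idx = np.where(per_sample_loss >= tau)[0]
    return clean_idx, noisy_idx, tau
```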
step two: aiming at sample data difference caused by weak data enhancement and strong data enhancement strategies, a dual-branch batch normalization method is adopted, and different parameters are respectively used for model training of feature maps of weak enhancement and strong enhancement data.
In order to avoid the negative influence on feature extraction of the sample data difference caused by the weak and strong data enhancement strategies, a dual-branch batch normalization method is adopted. Specifically, during batch normalization, separate means and variances are accumulated by exponential moving average (EMA in Fig. 2) for the weakly enhanced data x^w and the strongly enhanced data x^s, namely μ_w, σ_w² and μ_s, σ_s², where m is a constant (the moving-average momentum) and B denotes the number of samples in a batch. The normalized outputs of the batch are obtained by standardizing the intermediate-layer feature maps h_w and h_s of the neural network with their respective statistics and applying separate affine transformations, where h_w is the batch-normalization layer input for the weakly enhanced input, μ_w and σ_w² are the weakly enhanced mean and variance, h_s is the batch-normalization layer input for the strongly enhanced input, μ_s and σ_s² are the strongly enhanced mean and variance, and γ_w, β_w, γ_s, β_s are all learnable affine parameters.
In the training stage, a separate set of batch normalization parameters is trained for the weakly enhanced data and for the strongly enhanced data; in the testing stage, only the weak enhancement strategy and the batch normalization parameters of the weakly enhanced data are used, and the batch normalization parameters of the strong enhancement strategy are discarded.
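A minimal sketch of the dual-branch batch normalization, assuming PyTorch's built-in BatchNorm2d supplies the exponential-moving-average statistics and learnable affine parameters of each branch; the momentum value and the module/argument names are assumptions.

```python
import torch.nn as nn

class DualBranchBatchNorm2d(nn.Module):
    """Two BatchNorm2d branches with independent running statistics and affine
    parameters: one for weakly augmented batches, one for strongly augmented
    batches.  At test time only the weak branch is used, matching step two."""
    def __init__(self, num_features, momentum=0.1):
        super().__init__()
        self.bn_weak = nn.BatchNorm2d(num_features, momentum=momentum)
        self.bn_strong = nn.BatchNorm2d(num_features, momentum=momentum)

    def forward(self, x, branch="weak"):
        if self.training and branch == "strong":
            return self.bn_strong(x)
        # Weak branch during training, and the only branch used at evaluation time.
        return self.bn_weak(x)
```

In practice each BatchNorm2d layer of the backbone would be replaced by such a module, with the branch flag threaded through the forward pass of the network.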
Step three: regarding the large loss samples screened by the cross-enhancement matching as noise samples with high confidence, a regularization measure (LNOR in fig. 2) for eliminating noise is used to eliminate the negative influence of the samples on the model training.
For the large-loss samples screened out by cross-enhancement matching, a high-confidence error data set D_e is defined, whose size |D_e| is restricted by a constant upper bound. The selected high-loss samples are regarded as high-confidence noise samples, and the optimization of the network model is regularized through the noise-rejection regularization measure. Specifically, for each sample x_j in D_e, which is assumed to belong to a specific (wrong) label class y_j, a regularization term is applied to constrain the network predictions on D_e and prevent them from fitting the wrong noise labels, where p_{y_j}(x_j) denotes the confidence that the j-th sample image x_j carries the label y_j.
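The regularization formula itself appears only as an image in the original text, so the sketch below merely illustrates one plausible form of a leave-noise-out regularizer: it pushes the predicted confidence on the identified wrong label toward zero. Both this specific form and the function name are assumptions.

```python
import torch
import torch.nn.functional as F

def leave_noise_out_regularizer(logits_e, wrong_labels):
    """Hedged sketch: for samples in the high-confidence error set D_e, penalize
    -log(1 - p_{y_j}(x_j)) so the model stops fitting the identified wrong label y_j."""
    probs = F.softmax(logits_e, dim=1)                           # [|D_e|, K]
    p_wrong = probs.gather(1, wrong_labels.view(-1, 1)).squeeze(1)
    return -torch.log((1.0 - p_wrong).clamp_min(1e-6)).mean()
```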
Step four: aiming at the head class classification advantages caused by the long tail features of the data, a new online prior distribution (online prior distribution) -based prediction penalty is implemented to smooth the prediction result of the head class.
Aiming at the head-class classification advantage caused by the long-tail features of the data, the classification prior probability is estimated from online predictions in order to truly reflect the fitting degree of each class; for the k-th class the prior probability π_k is dynamically evaluated as an exponential moving average, where η is a constant and π_k is initialized to the ratio of the number of samples of the class to the total number of samples, i.e. π_k = N_k / N, with N_k denoting the number of training samples of the k-th class in the training data of N samples.
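A sketch of how the online prior could be tracked: the initialization from label counts follows the description above, while the momentum value and the use of hard argmax counts for the batch statistics are assumptions.

```python
import torch

class OnlinePrior:
    """Tracks the per-class prior probability pi_k online.  pi is initialised from
    the (noisy) label counts N_k / N and then updated as an exponential moving
    average of the model's own batch predictions."""
    def __init__(self, class_counts, eta=0.9):
        counts = torch.as_tensor(class_counts, dtype=torch.float32)
        self.pi = counts / counts.sum()          # pi_k = N_k / N
        self.eta = eta

    @torch.no_grad()
    def update(self, logits):
        pred = logits.argmax(dim=1)
        freq = torch.bincount(pred, minlength=self.pi.numel()).float()
        freq = freq / freq.sum().clamp_min(1.0)
        self.pi = self.eta * self.pi + (1.0 - self.eta) * freq
```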
The prediction penalty L_pr based on the online prior distribution is used to smooth the labels according to the prior distribution, so that labels with a higher prior probability receive stronger smoothing and the optimization of tail categories is strengthened. The prediction penalty L_pr is added to the cross-entropy loss function with a constant weighting coefficient α; the resulting loss can be rewritten in a form expressed through the confidence p_k(x_i) of the i-th sample image for label k and the prior probability π_k of the k-th class.
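Since the penalty formula is given only as an image in the original, the sketch below shows one plausible instantiation in which the penalty is the prior-weighted log-probability, so that classes with a larger online prior are penalized (smoothed) more strongly; this specific form and the value of alpha are assumptions.

```python
import torch.nn.functional as F

def prior_penalized_ce(logits, targets, pi, alpha=0.1):
    """Hedged sketch of cross-entropy plus an online-prior prediction penalty.
    ASSUMED penalty form: sum_k pi_k * log p_k(x), which discourages predictions
    biased toward high-prior (head) classes; alpha is the constant weight."""
    log_p = F.log_softmax(logits, dim=1)                    # [B, K]
    ce = F.nll_loss(log_p, targets)                         # -log p_y
    penalty = (pi.unsqueeze(0) * log_p).sum(dim=1).mean()   # sum_k pi_k log p_k
    return ce + alpha * penalty
```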
Step five: a phased training strategy is used based on the data noise signature. In the preheating stage, only the cross entropy loss and the online prior distribution loss of the weak enhancement data are calculated; in the formal training stage, the cross enhancement matching loss and the online prior distribution loss of the weak enhancement data and the strong enhancement data are calculated, and a regularization measure for eliminating noise is added.
The training is divided into a preheating stage and a formal stage, which compute the losses and update the parameters as follows:
Step 5.1: in the preheating stage, only the weakly enhanced data are used to compute the cross-entropy loss and the online prior distribution penalty, summed over the training data set with the constant weighting coefficient α.
Step 5.2: in the formal training stage, the cross-enhancement matching loss, the noise-rejection regularization loss, and the online prior distribution prediction penalties of the weakly and strongly enhanced data are combined; the correct data set D_c and the high-confidence error data set D_e are screened out, the total loss function is formed from these terms, and the network parameters are updated using stochastic gradient descent (SGD).
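A skeleton of the two-stage schedule, reusing the OnlinePrior and prior_penalized_ce sketches above and leaving the exact composition of the formal-stage loss to a caller-supplied function; all names are illustrative assumptions, not the patent's code.

```python
import torch

def run_training(model, loader, prior, warmup_epochs, total_epochs,
                 formal_loss_fn, alpha=0.1, lr=0.05):
    """Two-stage schedule sketch.  `formal_loss_fn(model, batch, prior)` stands in
    for the combined cross-enhancement matching, leave-noise-out and prior-penalty
    loss of step 5.2, whose exact composition is not reproduced here."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(total_epochs):
        for x_w, x_s, y, idx in loader:
            logits_w = model(x_w)
            prior.update(logits_w)           # keep the online prior up to date
            if epoch < warmup_epochs:
                # Preheating stage: cross-entropy + prior penalty on weak data only.
                loss = prior_penalized_ce(logits_w, y, prior.pi, alpha)
            else:
                # Formal stage: matching loss on D_c, noise rejection on D_e,
                # and prior penalties on both the weak and strong views.
                loss = formal_loss_fn(model, (x_w, x_s, y, idx), prior)
            opt.zero_grad()
            loss.backward()
            opt.step()
```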
In the prediction stage, the sample image to be classified is input into a model trained with the cross-enhancement-matching-based long-tail noise learning method, and the classification result is output.
The invention defines the ratio between the number of samples of the class with the most samples and that of the class with the fewest samples as the imbalance factor γ, i.e. γ = N_max / N_min. The type of long-tail data distribution used in the experiments of the invention is an exponential decay distribution.
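A common way to build such an exponentially decayed subset is sketched below under the usual CIFAR-LT convention; the function names are illustrative and this is not necessarily the patent's exact sampling code.

```python
import numpy as np

def longtail_sample_counts(n_max, num_classes, gamma):
    """Exponential-decay class sizes: class k keeps n_max * gamma**(-k/(C-1)) samples,
    so the ratio between the largest and smallest class equals the imbalance factor gamma."""
    return [int(n_max * gamma ** (-k / (num_classes - 1)))
            for k in range(num_classes)]

def make_longtail_indices(labels, n_max, gamma, seed=0):
    """Sub-sample a balanced label array into an exponentially decayed long-tail subset."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    num_classes = int(labels.max()) + 1
    keep = []
    for k, n_k in enumerate(longtail_sample_counts(n_max, num_classes, gamma)):
        idx = np.where(labels == k)[0]
        keep.extend(rng.choice(idx, size=min(n_k, len(idx)), replace=False))
    return np.array(keep)
```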
For the setting of the noise data there are two cases, class-independent noise and class-dependent noise. Class-independent noise assumes that mislabeled samples are randomly and uniformly distributed, while class-dependent noise focuses on human labeling errors caused by visual similarity. The invention defines the error probability of a sample label as the noise rate ρ. For class-independent noise, each sample has probability ρ of being randomly mislabeled as an arbitrary other class; for class-dependent noise, the samples of each pair of classes are labeled as the opposite class of the pair with probability ρ.
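The two noise settings can be injected as in the following sketch; the choice of class pairs for the class-dependent case is left to the caller, and the helper names are illustrative.

```python
import numpy as np

def add_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Class-independent noise: each sample is relabelled, with probability
    noise_rate, to a uniformly random different class."""
    rng = np.random.default_rng(seed)
    noisy = np.array(labels, copy=True)
    flip = rng.random(len(noisy)) < noise_rate
    for i in np.where(flip)[0]:
        choices = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(choices)
    return noisy

def add_pairwise_noise(labels, noise_rate, pairs, seed=0):
    """Class-dependent noise: for each (a, b) in pairs, samples of class a are
    relabelled as b (and b as a) with probability noise_rate."""
    rng = np.random.default_rng(seed)
    noisy = np.array(labels, copy=True)
    swap = {}
    for a, b in pairs:
        swap[a], swap[b] = b, a
    flip = rng.random(len(noisy)) < noise_rate
    for i in np.where(flip)[0]:
        if noisy[i] in swap:
            noisy[i] = swap[noisy[i]]
    return noisy
```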
The experiments are conducted with the PyTorch framework on the CIFAR-10 data set, using ResNet-32 as the network model and an SGD optimizer with an initial learning rate of 0.05 and a cosine annealing scheduler. 100 iterations of training are set in both training stages, with a batch size of 128; the remaining parameter values are given in the original text. All experiments of the invention are trained from scratch.
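A sketch of the stated experimental setup; the momentum and weight-decay values and the ResNet-32 implementation are assumptions not given in this text.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader

def build_training_objects(model, train_set, epochs=100, batch_size=128,
                           lr=0.05, momentum=0.9, weight_decay=5e-4):
    """Optimizer, scheduler and loader matching the stated settings (SGD with an
    initial learning rate of 0.05, cosine annealing, batch size 128, 100 training
    rounds per stage); momentum and weight decay are assumed values, and `model`
    is expected to be a CIFAR-style ResNet-32 provided elsewhere."""
    optimizer = SGD(model.parameters(), lr=lr,
                    momentum=momentum, weight_decay=weight_decay)
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
    loader = DataLoader(train_set, batch_size=batch_size,
                        shuffle=True, num_workers=4)
    return optimizer, scheduler, loader
```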
As shown in FIGS. 3a-3e, CE, Co-teaching+, JoCoR, Co-matching and the method of the present invention are compared on the CIFAR-10 data set with a ResNet-32 network for learning from long-tail distributed noisy samples; the curves of test accuracy show that the accuracy of the method of the invention is superior to that of the other methods, where the symmetric noise rate and the imbalance factor are as indicated in the original figures.
As shown in FIGS. 4a-4e, CE, LDAM, Mixup, MiSLAS and the method of the invention are compared on the CIFAR-10 data set with a ResNet-32 network for learning from long-tail distributed noisy samples; the accuracy of the method of the invention is superior to that of the other methods, where the imbalance factor and the asymmetric noise rate are as indicated in the original figures.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A long tail noise learning method based on cross enhancement matching is characterized by comprising the following steps:
step S1: according to the data noise characteristics, a weak data enhancement strategy and a strong data enhancement strategy are respectively adopted for each sample image;
step S2: aiming at the sample data difference caused by the weak and strong data enhancement strategies, a dual-branch batch normalization method is adopted, and different parameters are used for the feature maps of the weakly enhanced data and the strongly enhanced data during model training;
step S3: carrying out cross enhancement matching on the prediction results of the weak enhancement data and the strong enhancement data to screen a noise sample, and eliminating the negative influence of the noise sample on model training by using a noise elimination regularization measure on the noise sample screened by the cross enhancement matching;
step S4: for the head class classification advantages caused by the long tail characteristics of the data, estimating the classification prior probability from online prediction, and based on the prediction penalty of online prior distribution, smoothing the prediction result of the head class;
step S5: according to the data noise characteristics, a staged training strategy is used, and only the cross entropy loss and the online prior distribution loss of weak enhancement data are calculated in a preheating stage; in the formal training stage, the cross enhancement matching loss and the online prior distribution loss of the weak enhancement data and the strong enhancement data are calculated, and a regularization measure for eliminating noise is added.
2. The long-tail noise learning method based on cross-enhancement matching according to claim 1, wherein in step S3, given a training data set D = {(x_i, y_i)} containing N sample images and K image classes, where x_i is a sample image and y_i is its noisy label, the prediction result of the classification model is denoted p(x; θ) = f(x; θ), where θ are the network parameters, f is the mapping function, and p(x; θ) is a K-dimensional prediction vector; whether x_i is a correctly labeled sample is determined according to a cross-enhancement matching loss defined over the following quantities: the weakly enhanced data x_i^w and the strongly enhanced data x_i^s, the class prediction ŷ_i with the highest confidence for the sample image x_i, the confidence p_k(x_i) of the i-th sample image x_i for the k-th class, a weight parameter λ, and the one-hot vector Y_i of y_i, with T denoting the transpose.
3. The method according to claim 2, wherein in step S1, data whose cross-enhancement matching loss is smaller than a threshold τ obtained with the OTSU method are recognized as correct data and form the correct data set D_c; in the training phase, only the data in D_c are used to compute the cross-enhancement matching loss, and these per-sample losses are summed to give the total loss.
4. The long-tail noise learning method based on cross-enhancement matching according to claim 1, wherein in step S2, a dual-branch batch normalization method is used: separate means and variances are accumulated by exponential moving average for the weakly enhanced data x^w and the strongly enhanced data x^s, namely μ_w, σ_w² and μ_s, σ_s², where m is a constant and B denotes the number of sample images in a batch; the normalized outputs of the batch are obtained by standardizing the intermediate-layer feature maps h_w and h_s of the neural network with their respective statistics and applying separate affine transformations, where h_w is the batch-normalization layer input for the weakly enhanced input, μ_w and σ_w² are the weakly enhanced mean and variance, h_s is the batch-normalization layer input for the strongly enhanced input, μ_s and σ_s² are the strongly enhanced mean and variance, and γ_w, β_w, γ_s, β_s are all learnable affine parameters.
5. The long-tail noise learning method based on cross-enhancement matching according to claim 1, wherein in step S2, in the training phase a separate set of batch normalization parameters is trained for the weakly enhanced data and for the strongly enhanced data; in the testing phase, only the weak data enhancement strategy and the batch normalization parameters of the weakly enhanced data are used, and the batch normalization parameters of the strong data enhancement strategy are discarded.
6. The method according to claim 3, wherein in step S3, according to the cross-enhancement matching loss, the screened-out noise samples form a high-confidence error data set D_e, whose size |D_e| is restricted by a constant upper bound; the screened large-loss samples are treated as high-confidence noise samples, and the network model is regularized through a noise-rejection regularization measure: for each sample image x_j in D_e, which belongs to a specific (wrong) label class y_j, a regularization term is applied to constrain the network prediction results on D_e, where p_{y_j}(x_j) denotes the confidence that the j-th sample image x_j carries the label y_j.
7. The method of claim 3, wherein in step S4, the classification prior probability is estimated from online predictions; for the k-th class the prior probability π_k is dynamically evaluated as an exponential moving average, where η is a constant and π_k is initialized to the ratio of the number of samples of the class to the total number of samples, i.e. π_k = N_k / N, with N_k denoting the number of training samples of the k-th class in the training data of N samples and the N_k satisfying Σ_k N_k = N.
8. The long-tail noise learning method based on cross-enhancement matching according to claim 1, wherein in step S4, the prediction penalty L_pr based on the online prior distribution is used to smooth the labels according to the prior distribution, so that labels with a higher prior probability receive stronger smoothing; in the corresponding formula, x denotes the sample image, π the prior probability, and p(x; θ) = f(x; θ) the prediction result of the classification model, where θ are the network parameters and f is the mapping function.
9. The long-tail noise learning method based on cross-enhancement matching according to claim 8, wherein in step S4, the prediction penalty L_pr based on the online prior distribution is added to the cross-entropy loss function L_ce with a constant weighting coefficient α; the resulting loss can be rewritten in a form expressed through the noisy label y, the confidence p_k(x_i) of the i-th sample image x_i for the k-th class, and the prior probability π_k of the k-th class.
10. The long-tail noise learning method based on cross-enhancement matching according to claim 1, wherein the training in step S5 is divided into a preheating stage and a formal stage, which compute the losses and update the parameters as follows:
step S5.1: in the preheating stage, the cross-entropy loss and the online prior distribution penalty are computed on the weakly enhanced data: the warm-up loss sums, over the training data set D, the cross-entropy loss L_ce between the prediction on the weakly enhanced sample x^w and the label y of the sample data, plus α times the online-prior prediction penalty computed on the weakly enhanced data, where α is a constant weighting coefficient;
step S5.2: in the formal training stage, the cross-enhancement matching loss, the noise-rejection regularization loss, and the online prior distribution prediction penalties of the weakly and strongly enhanced data are combined; the correct data set D_c and the high-confidence error data set D_e are screened out, the total loss function is formed from these terms, and the network parameters are updated using stochastic gradient descent.
CN202111457536.2A 2021-12-02 2021-12-02 Long tail noise learning method based on cross enhancement matching Active CN113869463B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111457536.2A CN113869463B (en) 2021-12-02 2021-12-02 Long tail noise learning method based on cross enhancement matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111457536.2A CN113869463B (en) 2021-12-02 2021-12-02 Long tail noise learning method based on cross enhancement matching

Publications (2)

Publication Number Publication Date
CN113869463A CN113869463A (en) 2021-12-31
CN113869463B true CN113869463B (en) 2022-04-15

Family

ID=78985557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111457536.2A Active CN113869463B (en) 2021-12-02 2021-12-02 Long tail noise learning method based on cross enhancement matching

Country Status (1)

Country Link
CN (1) CN113869463B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114863193B (en) * 2022-07-07 2022-12-02 之江实验室 Long-tail learning image classification and training method and device based on mixed batch normalization

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516207A (en) * 2021-09-10 2021-10-19 之江实验室 Long-tail distribution image classification method with noise label

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516207A (en) * 2021-09-10 2021-10-19 之江实验室 Long-tail distribution image classification method with noise label

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Label Noise Types and Their Effects on Deep Learning"; Gorkem Algan et al.; arXiv:2003.10471v1 [cs.CV]; 20200323; full text *
"Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective"; Muhammad Abdullah Jamal et al.; arXiv:2003.10780v1 [cs.CV]; 20200324; full text *
"Label Noise Filtering Method Based on Data Distribution"; Chen Qingqiang et al.; The 6th CCF Big Data Conference; 20201130; full text *

Also Published As

Publication number Publication date
CN113869463A (en) 2021-12-31

Similar Documents

Publication Publication Date Title
CN107316061B (en) Deep migration learning unbalanced classification integration method
CN109101938B (en) Multi-label age estimation method based on convolutional neural network
CN111079847B (en) Remote sensing image automatic labeling method based on deep learning
CN109815979B (en) Weak label semantic segmentation calibration data generation method and system
CN114283287B (en) Robust field adaptive image learning method based on self-training noise label correction
CN111239137B (en) Grain quality detection method based on transfer learning and adaptive deep convolution neural network
CN113869463B (en) Long tail noise learning method based on cross enhancement matching
CN111144462B (en) Unknown individual identification method and device for radar signals
CN113657449A (en) Traditional Chinese medicine tongue picture greasy classification method containing noise labeling data
CN116385373A (en) Pathological image classification method and system combining stable learning and hybrid enhancement
CN116894985A (en) Semi-supervised image classification method and semi-supervised image classification system
CN116912568A (en) Noise-containing label image recognition method based on self-adaptive class equalization
CN113608223B (en) Single-station Doppler weather radar strong precipitation estimation method based on double-branch double-stage depth model
CN113902944A (en) Model training and scene recognition method, device, equipment and medium
CN111598580A (en) XGboost algorithm-based block chain product detection method, system and device
CN109376619A (en) A kind of cell detection method
CN116486150A (en) Uncertainty perception-based regression error reduction method for image classification model
CN116129185A (en) Fuzzy classification method for tongue-like greasy feature of traditional Chinese medicine based on collaborative updating of data and model
CN114445649A (en) Method for detecting RGB-D single image shadow by multi-scale super-pixel fusion
CN112465821A (en) Multi-scale pest image detection method based on boundary key point perception
CN114863193B (en) Long-tail learning image classification and training method and device based on mixed batch normalization
CN117173494B (en) Noise-containing label image recognition method and system based on class balance sample selection
CN114745231A (en) AI communication signal identification method and device based on block chain
CN116245866B (en) Mobile face tracking method and system
CN113313179B (en) Noise image classification method based on l2p norm robust least square method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant