CN115661549A - Fine-grained classification denoising training method based on prediction confidence - Google Patents
- Publication number
- CN115661549A (application CN202211452486.3A)
- Authority
- CN
- China
- Prior art keywords
- prediction
- training
- sample
- samples
- loss
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a fine-grained classification denoising training method based on prediction confidence, comprising the following steps: S1, all training samples participate in warm-up training, and the most recent prediction results of each sample are recorded as a historical prediction set; S2, a normalized prediction confidence for each sample is generated from the histogram of its historical prediction set; S3, the normalized prediction confidence is used to balance the weights of the sample label and the sample prediction, dynamically correcting the loss value. In the invention, a dynamic loss replaces the ordinary cross-entropy loss to distinguish out-of-distribution noise from other samples, so that out-of-distribution noise can be removed more effectively; when the model is trained on a noisy data set, denoising training is carried out within one framework through loss correction and a global sample selection strategy, and the classification accuracy of the fine-grained visual recognition model is significantly improved.
Description
Technical Field
The invention relates to the technical field of fine-grained image classification, in particular to a fine-grained classification denoising training method based on prediction confidence.
Background
Noise in noisy data sets generally falls into two categories. The first is in-distribution noise: the true label of the sample belongs to the label set of the data set, but the sample is mistakenly annotated with another label from that set. The second is out-of-distribution noise: the true label of the sample is not in the label set of the data set. The image content of an out-of-distribution noise sample is only weakly related, or entirely unrelated, to its assigned label and does not conform to the annotation guidelines. A data set containing both types of noise is called an open-set noisy data set. Noisy data sets collected under natural conditions are almost always open-set; closed-set noisy data sets are comparatively rare.
The research community has proposed various ideas to deal with noise in training data sets. One class of methods is known as loss correction or label correction. The conventional practice of loss correction is to add a correction term to the loss values during neural network training so as to avoid over-fitting in-distribution noise samples. Some methods also correct in-distribution noise by learning a noise transition matrix, but they cannot simultaneously handle out-of-distribution noise correctly, and their effect on large-scale data is not ideal: the true label of an out-of-distribution noise sample is not in the label domain of the data set, so forcibly correcting such a sample with a noise transition matrix cannot yield a meaningful label.
Disclosure of Invention
The invention provides a neural network denoising training method based on prediction confidence, which solves the problem that fine-grained image classification models are difficult to train on noisy data sets.
In order to achieve the purpose, the invention provides the following technical scheme: a fine-grained classification denoising training method based on prediction confidence comprises the following steps:
S1, first, all training samples participate in warm-up training, and the most recent prediction results of each sample are recorded as a historical prediction set;
S2, the normalized prediction confidence of each sample is generated from a histogram built from its historical prediction set, specifically:
S21, calculating, through formula (6.1), a histogram of the prediction labels with respect to the total number of predictions;
S22, inferring the confidence of the correct label from the historical prediction results;
S23, performing a normalization operation on the basis of the entropy;
and S3, using the normalized prediction confidence to balance the weights of the sample label and the sample prediction, and dynamically correcting the loss value.
Further, in S1, the training must first go through several rounds of warm-up training at the beginning. After the warm-up training process is completed, inference is performed on each sample (x_i, y_i) in the training set D, i = 1, ..., N, where N is the number of samples in the data set, and a prediction result is obtained; the inference process obtains Softmax probability distribution vectors from the backbone network output formed by two convolutional neural networks, and the prediction result ŷ_i is then calculated. Denote by H_i the historical prediction sequence of sample image x_i over the most recent |H| rounds of training, and by f_i the prediction confidence of each sample image x_i. The dynamic correction loss based on prediction confidence balances the weights of the one-hot label encoding and the prediction result, and computes the cross entropy with the neural network output to obtain the corrected loss value. During training, the samples with higher prediction confidence are selected to form the training sample set D_t that actually participates in training; the sample set containing N(1-δ) training examples updates the fine-grained image classification neural network model with their corrected loss values.
Further, deep neural networks tend to fit clean and simple samples first and only then begin to adapt to difficult and noisy samples; all training samples in D are therefore used during the first T_w rounds, and a warm-up strategy is applied to train the target neural network; the cross-entropy loss used for this training is given by formula (6.2):
where y_i represents the label of sample x_i; the cross-entropy loss in equation (6.2) updates the neural network during the warm-up stage; p_i in the formula, the output vector of the last softmax layer, is calculated by equation (6.3):
where f(·;θ) denotes the mapping function of the neural network, z_i is the output of the fully-connected layer preceding the last softmax layer, k is the number of classes of the data set, and θ are the network parameters; the inference result ŷ_i corresponding to each sample image x_i is calculated by formula (6.4):
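A plausible reconstruction of formulas (6.2)-(6.4) from the surrounding definitions is given below; the symbols p_i, z_i, f(·;θ) and ŷ_i are assumed names rather than the patent's original notation:

```latex
% Warm-up cross-entropy (6.2), softmax output (6.3) and prediction (6.4); reconstruction sketch.
\[
  \mathcal{L}_{CE} \;=\; -\frac{1}{|D|} \sum_{(x_i,\,y_i)\in D} \log p_i^{\,y_i}
  \qquad (6.2)
\]
\[
  p_i^{\,j} \;=\; \frac{\exp\bigl(z_i^{\,j}\bigr)}{\sum_{c=1}^{k} \exp\bigl(z_i^{\,c}\bigr)},
  \qquad z_i = f(x_i;\theta)\in\mathbb{R}^{k}
  \qquad (6.3)
\]
\[
  \hat{y}_i \;=\; \arg\max_{j\in\{1,\dots,k\}} p_i^{\,j}
  \qquad (6.4)
\]
```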
The predictions of each sample image x_i in the training set over the most recent |H| rounds of training are recorded and updated throughout the whole training process, denoted H_i = {ŷ_i^(T-|H|+1), ..., ŷ_i^(T)}, where ŷ_i^(T) is the label the network predicts for sample image x_i in the T-th (i.e. current) round of training.
Further, after the warm-up training phase is completed, a prediction is performed on the samples in training set D using the trained neural network, and the historical prediction sequence H_i is then established and updated. The PFL loss of all training samples in D is calculated with formula (6.15). After training, a ratio δ (%) is generally used to control the number of discarded samples, i.e. the proportion of samples judged to be out-of-distribution noise in the total number of samples, and the N(1-δ) samples with the smallest PFL loss are selected to form a new training sample set D_t with which the neural network model is updated. After warm-up training, the generation process of the newly selected training sample set D_t for each round of training is shown in formula (6.5):
where v_i indicates whether sample x_i is selected; in the global sample selection stage, the clean samples and the in-distribution noise samples are selected into the training set of the current round by formula (6.5). To avoid the false exclusion of useful samples, the samples with high loss under the prediction-stability metric are excluded only from the training set of the current round of training; in the next round of global sample selection, the normalized prediction confidence f_i of all samples is recalculated and the historical prediction sequence H_i is updated, as shown in formula (6.6):
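A plausible reconstruction of formulas (6.5) and (6.6) follows, where v_i, τ and ℓ_PFL are assumed notation and τ is taken to be the N(1-δ)-th smallest PFL loss of the current round:

```latex
% Global sample selection (6.5) and history update (6.6); reconstruction sketch, notation assumed.
\[
  D_t \;=\; \bigl\{(x_i, y_i)\in D \;:\; v_i = 1\bigr\},
  \qquad
  v_i \;=\; \mathbb{1}\bigl[\ell_{PFL}(x_i) \le \tau\bigr]
  \qquad (6.5)
\]
\[
  H_i \;\leftarrow\; \bigl(H_i \cup \{\hat{y}_i^{\,T}\}\bigr) \setminus \{\hat{y}_i^{\,T-|H|}\},
  \qquad |H_i| \le |H|
  \qquad (6.6)
\]
```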
Further, in S21, since the content of the out-of-distribution noise samples in noisy training data is unrelated to the content of the clean samples and the in-distribution noise samples, the prediction results of the out-of-distribution noise samples keep changing during the early training process. Let h_i^j represent the frequency with which sample x_i is predicted as label j in its historical prediction sequence H_i, where k is the number of categories of the data set; h_i^j can be calculated by equation (6.1):
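A plausible reconstruction of formula (6.1), with the sum running over the entries of H_i, is:

```latex
% Histogram of predicted labels over the prediction history (6.1); reconstruction sketch.
\[
  h_i^{\,j} \;=\; \frac{1}{|H_i|} \sum_{\hat{y}\,\in\, H_i} \mathbb{1}\bigl[\hat{y} = j\bigr],
  \qquad \sum_{j=1}^{k} h_i^{\,j} = 1
  \qquad (6.1)
\]
```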
where ŷ_i is the prediction result of the sample image, |H_i| is the size of its historical prediction result set, i.e. the total number of predictions, and h_i = (h_i^1, ..., h_i^k) is the histogram of the prediction labels with respect to the total number of predictions.
Further, in S22, the frequency with which a label appears in a sample's prediction history is statistically positively correlated with the probability that this label is the true label; the likelihood that such a prediction is the true label is defined as the "confidence" with which the correct label can be inferred from the historical predictions.
The concept of entropy matches this notion of confidence and can be used to express the uncertainty of the prediction results of each sample image x_i, in the form of formula (6.7):
where h_i^y denotes the frequency of the most frequently predicted label y in the prediction history H_i of sample x_i. The histogram shape of the prediction history reflects how uncertain the inference history is about the label attribution; the following formula (6.8) describes the cases in which the uncertainty of the prediction history is greatest and smallest:
where k represents the total number of label classes in the data set and |H_i| represents the length of the prediction history sequence. In commonly used web-image fine-grained classification data sets, the number of label categories is far greater than the length set for the actual historical prediction record, i.e. k >> |H_i|; the maximum uncertainty of the historical prediction can therefore be calculated by the following formula (6.9):
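A plausible reconstruction of formulas (6.7)-(6.9), with U(x_i) as an assumed symbol for the uncertainty, is:

```latex
% Prediction uncertainty (6.7) and its extreme values (6.8)-(6.9); reconstruction sketch.
\[
  U(x_i) \;=\; -\sum_{j=1}^{k} h_i^{\,j} \log h_i^{\,j}
  \qquad (6.7)
\]
\[
  U_{\min} = 0 \;\;(\text{all predictions identical}),
  \qquad
  U_{\max} = \log \min\bigl(k,\, |H_i|\bigr)
  \qquad (6.8)
\]
\[
  k \gg |H_i| \;\;\Longrightarrow\;\; U_{\max} = \log |H_i|
  \qquad (6.9)
\]
```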
Further, in S23, the entropy is inconvenient to use directly: it has a fixed lower bound, but its upper bound differs greatly under different conditions, so a normalization operation needs to be performed on its basis. Given the above method for calculating the maximum historical prediction uncertainty, a normalized historical prediction uncertainty can be defined whose value range is kept constant at [0,1] to facilitate measurement; see formula (6.10):
In summary, the prediction confidence of each input sample can be calculated and updated according to its prediction history; the normalized prediction confidence f_i of sample image x_i is defined as shown in formula (6.11):
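A plausible reconstruction of formulas (6.10) and (6.11), consistent with the definitions above, is:

```latex
% Normalized uncertainty (6.10) and normalized prediction confidence (6.11); reconstruction sketch.
\[
  \widetilde{U}(x_i) \;=\; \frac{U(x_i)}{U_{\max}} \;=\; \frac{U(x_i)}{\log |H_i|} \;\in\; [0, 1]
  \qquad (6.10)
\]
\[
  f_i \;=\; 1 - \widetilde{U}(x_i)
  \qquad (6.11)
\]
```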
Further, in S3, the conventional idea for handling noise samples is to identify them by modeling the distribution of the noise; here, a "loss correction" method is generally adopted instead.
The basic loss function of the neural network adopts cross entropy, and additional targets can be added to it in two basic ways: one is a soft scheme and the other is a hard scheme; the detailed structure of the soft scheme is as follows:
where q is the prediction vector output by the neural network, t is the noisy label vector, L is the total number of label categories, and β is a hyper-parameter whose value lies in the range (0,1). Corresponding to the "soft scheme" is the hard scheme, which changes the regression objective to the maximum a posteriori estimate given q, i.e. the one-hot vector of the label with the maximum value of q; its formula is:
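A plausible reconstruction of the soft-scheme formula (6.12) and the hard-scheme formula (6.13), written in the form of the standard soft and hard bootstrapping losses that the description appears to follow, is:

```latex
% Soft (6.12) and hard (6.13) bootstrapping-style losses; reconstruction sketch.
\[
  \mathcal{L}_{soft} \;=\; -\sum_{i} \sum_{l=1}^{L}
  \bigl[\beta\, t_i^{\,l} + (1-\beta)\, q_i^{\,l}\bigr] \log q_i^{\,l}
  \qquad (6.12)
\]
\[
  \mathcal{L}_{hard} \;=\; -\sum_{i} \sum_{l=1}^{L}
  \bigl[\beta\, t_i^{\,l} + (1-\beta)\, z_i^{\,l}\bigr] \log q_i^{\,l},
  \qquad
  z_i^{\,l} = \mathbb{1}\bigl[l = \arg\max_{c} q_i^{\,c}\bigr]
  \qquad (6.13)
\]
```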
After a suitable loss correction method is selected, the network parameters are updated using a stochastic gradient descent optimization tool.
Further, if the neural network is updated directly with this loss function, the noise samples are quickly over-fitted during training, which degrades performance; it is therefore desirable to compensate the loss value using both the label and the prediction so as to mitigate the network's tendency to fit the noise. With this compensation, equation (6.2) can be rewritten as formula (6.14):
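A plausible reconstruction of formula (6.14), with the static weight λ as an assumed symbol, is:

```latex
% Statically compensated loss (6.14); reconstruction sketch, lambda is an assumed symbol.
\[
  \ell(x_i) \;=\; -\sum_{c=1}^{k}
  \bigl[\lambda\,\mathbb{1}[c = y_i] + (1-\lambda)\,\mathbb{1}[c = \hat{y}_i]\bigr] \log p_i^{\,c}
  \qquad (6.14)
\]
```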
where y_i is the label corresponding to sample image x_i, ŷ_i is its prediction result, and p_i is the output of the last softmax layer defined by equation (6.3); in general, the weighting parameter λ is set to a fixed value, e.g. λ = 0.8, to statically balance the label and the prediction result.
Further, the normalized prediction confidence f_i defined by equation (6.11) is used to dynamically determine the weight relationship between each sample's label and its prediction result, so as to dynamically correct the loss value.
Equation (6.14) can be rewritten into a dynamic loss based on prediction uncertainty by introducing equation (6.11), specifically:
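A plausible form of formula (6.15), obtained by replacing the static weight of (6.14) with the normalized prediction confidence f_i as the surrounding text describes, is:

```latex
% Dynamic PFL loss (6.15); a plausible form, not necessarily the exact one of the patent.
\[
  \ell_{PFL}(x_i) \;=\; -\sum_{c=1}^{k}
  \bigl[(1 - f_i)\,\mathbb{1}[c = y_i] + f_i\,\mathbb{1}[c = \hat{y}_i]\bigr] \log p_i^{\,c}
  \qquad (6.15)
\]
```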
where the normalized prediction confidence f_i is used to dynamically adjust the degree of compensation of the current loss value; equation (6.15) is referred to as the dynamic loss based on prediction confidence, abbreviated as the PFL loss (Prediction Fidelity Loss).
Compared with the prior art, the invention has the beneficial effects that:
In this application, a dynamic loss based on prediction confidence replaces the ordinary cross-entropy loss to distinguish out-of-distribution noise from the other samples (clean samples and in-distribution noise), so that out-of-distribution noise can be removed more effectively; the historical prediction results are analyzed and the loss value is dynamically corrected, realizing correction of in-distribution noise and identification of out-of-distribution noise according to the prediction confidence, so as to alleviate the interference of in-distribution noise with training; when a model is trained on a noisy data set, global sample selection is performed using the prediction confidence, and this strategy is integrated into a simple and effective training framework for fine-grained image classification on noisy data sets, so that denoising training is carried out within one framework through loss correction and a global sample selection strategy, and the classification accuracy of the fine-grained visual recognition model is significantly improved.
Drawings
FIG. 1 is a flowchart of the fine-grained image classification algorithm based on prediction confidence according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, the present invention is a fine-grained classification denoising training method based on prediction confidence. The noisy fine-grained classification denoising algorithm based on prediction confidence is mainly divided into two parts: first, dynamic loss correction based on prediction confidence, and second, global sample selection based on prediction confidence; combining dynamic loss correction and global sample selection in one framework effectively improves the ability to learn a fine-grained image classification model from noisy internet image data sets. In the training phase, the framework of the method can be described simply as follows: define (x_i, y_i) as the i-th sample in the training data set D, where x_i is an image and y_i is its corresponding label. It is known that y_i ∈ {1, ..., k}, where k is the number of classes in the data set and N is the number of samples in the data set. Because of the noise present in the original training set, y_i is not always the true label of x_i. Let ỹ_i denote the true label of sample image x_i: when the sample is clean, y_i = ỹ_i; when the sample is in-distribution noise, it should be corrected to the correct label, i.e. ỹ_i should be used to replace y_i; and when the sample is out-of-distribution noise, it should be discarded from the training set.
The general framework of the algorithm of the present invention is shown in FIG. 1. Training must first go through several rounds of warm-up training. After the warm-up process is completed, inference is performed on each sample (x_i, y_i) of the training set D and a prediction result is obtained; the inference process obtains Softmax probability distribution vectors from the backbone network output consisting of two convolutional neural networks, and the prediction result ŷ_i is then calculated. Denote by H_i the historical prediction sequence of sample image x_i over the most recent |H| rounds of training and by f_i the prediction confidence of each sample image x_i. The dynamic correction loss based on prediction confidence then balances the weights of the one-hot label encoding and the prediction result, and the cross entropy is computed with the neural network output to obtain the corrected loss value; during training, the samples with higher prediction confidence are selected to form the training sample set D_t that actually participates in training. A ratio δ is generally used to control the number of discarded samples, i.e. the proportion of samples judged to be out-of-distribution noise in the total number of samples. Finally, the sample set containing N(1-δ) training examples updates the fine-grained image classification neural network model with its corrected loss values.
In this embodiment, since noisy training data contains out-of-distribution noise samples whose content is independent of that of the clean samples and the in-distribution noise samples, the prediction results of these out-of-distribution noise samples keep changing during the early training process. Consider first the case where only in-distribution noise samples and clean samples exist in the data set, and let h_i^j represent the frequency with which sample x_i is predicted as label j in its historical prediction sequence H_i, where k is the number of categories of the data set; h_i^j can be calculated by formula (6.1):
where ŷ_i is the prediction result of the sample image, |H_i| is the size of its historical prediction result set, i.e. the total number of predictions, and h_i is the histogram of the prediction labels with respect to the total number of predictions.
The frequency with which a sample label appears in the prediction history is statistically positively correlated with the likelihood that the label is a genuine label. The likelihood that a prediction belongs to a true tag is defined as the "confidence" that the correct tag was inferred from historical predictions.
The concept of entropy is consistent with the above notion of confidence and can be used to express the uncertainty of the prediction results of each sample image x_i, as shown in formula (6.7):
The histogram shape of the prediction history reflects the uncertainty of the inference history with respect to the label attribution: the flatter the histogram distribution, the greater the uncertainty; the more concentrated the distribution, the weaker the uncertainty. Equation (6.8) describes the cases in which the uncertainty of the prediction history is greatest and smallest:
where k represents the total number of label classes in the data set and |H_i| represents the length of the prediction history sequence. In commonly used web-image fine-grained classification data sets, the number of label categories (mostly more than one hundred) is far greater than the length of the actual historical prediction record (about 10), i.e. k >> |H_i|. From the above analysis, a method for calculating the maximum uncertainty of the historical prediction can be obtained, as shown in equation (6.9):
The entropy itself is inconvenient to use directly: it has a fixed lower bound, but its upper bound differs greatly under different conditions, so a normalization operation needs to be executed on it. Given the above method for calculating the maximum historical prediction uncertainty, a normalized historical prediction uncertainty can be defined whose value range is kept constant at [0,1], which is convenient to measure; its formula (6.10) is as follows:
In summary, the prediction confidence of each input sample can be calculated and updated according to its prediction history; the normalized prediction confidence f_i of sample image x_i is defined as shown in formula (6.11):
in this embodiment, the conventional idea of processing noise samples is to identify the noise samples by modeling the distribution of the noise. Accurately modeling noise is difficult in many cases, and it is not possible to model noise in a large number of samples with a significant effect. The situation of explicitly raising the distribution of noise or giving reconstruction errors is not always in accordance with the actual situation, and the method is not common in the large-scale data training neural network scene, so that the model-based method is gradually replaced by other methods. A common way to deal with the noisy data training problem is to add terms to the loss function so that the loss function can be less affected by the noise samples under certain conditions, and the above idea is generally called a "loss correction" method.
The idea of implementing loss correction is to dynamically adjust the target implementation of training according to the current neural network state, and a Bootstrapping strategy can be introduced, and the main method is as follows: the prediction target, which is a Convex Combination (constellation Combination) between the wrong tag vector and the result of the current prediction output of the neural network, is dynamically updated in the existing state of the model. As the training process continues to advance, neural networks should be more inclined to trust the current predicted output. Because the correct sample with the dominant scale exists in the training sample set, the network prediction result after training and the error label keep a certain difference, and therefore the method can finally reduce the influence of the incorrectly labeled sample on the training.
Following the common scheme, cross entropy is adopted as the basic loss function of the target neural network, and an additional optimization target is added to it to reflect the current state of the model. Additional targets are generally added in one of two basic ways: directly using the prediction vector output by the neural network, called the soft scheme; or using the prediction result (one-hot label) generated from that prediction vector, called the hard scheme. The detailed structure of the "soft scheme" is shown in equation (6.12):
It can be shown that the final optimization objective of equation (6.12) is equivalent to Softmax regression with a minimum-entropy regularization term, whose function is to make the model more confident in its predicted labels. Corresponding to the "soft scheme" is the hard scheme, which changes the regression objective to the maximum a posteriori estimate given q, i.e. the one-hot vector of the label with the maximum value of q, in the form of equation (6.13):
After a suitable loss correction method is selected, a normal neural network optimization process is executed: data are fed into the neural network in batches, and the network parameters are updated with optimization tools such as stochastic gradient descent. This way of updating the network parameters resembles an EM algorithm: in the expectation stage, a confidence label (corrected label) for each sample is estimated from the convex combination of the original label and the model's predicted label, and in the maximization stage the network parameters are updated so that the model better predicts the labels generated in the previous step.
If the neural network is updated directly with this loss function, it will quickly begin to over-fit the noise samples during training, resulting in degraded performance. It is therefore desirable to adopt the bootstrapping strategy and compensate the loss value using both the label and the prediction, so as to mitigate the network's tendency to fit the noise. With this compensation, equation (6.2) can be rewritten as equation (6.14):
where y_i is the label corresponding to sample image x_i, ŷ_i is its prediction result, and p_i is the output of the last softmax layer defined by equation (6.3).
The normalized prediction confidence f_i defined in equation (6.11) is used to dynamically determine the weight relationship between each sample's label and its prediction result, so as to dynamically correct the loss value. The specific idea is as follows: during training, if the label prediction confidence of a sample image is very high, the sample is with high probability a clean sample or an in-distribution noise sample, and its loss value is corrected to a large extent by the prediction result; conversely, if the prediction history of a sample varies frequently, giving a large prediction uncertainty, the sample is likely a difficult sample or out-of-distribution noise, and its loss value should depend mainly on the original label. In summary, equation (6.14) can be rewritten into the dynamic loss based on prediction uncertainty by introducing equation (6.11), i.e. equation (6.15):
where the normalized prediction confidence f_i dynamically adjusts the degree of compensation of the current loss value; equation (6.15) is referred to as the dynamic loss based on prediction confidence, abbreviated as the PFL loss.
In this embodiment, it is observed that the cross-entropy loss boundaries between in-distribution noise samples, out-of-distribution noise samples and clean samples are not always well defined, so the three cannot be reliably distinguished from one another in every scenario. However, the prediction confidence of out-of-distribution noise samples during training is consistently lower than that of clean samples and in-distribution noise samples, and this phenomenon provides a valuable clue for distinguishing out-of-distribution noise from other samples. The way each sample's prediction changes during training thus suggests a feasible sample selection strategy, namely driving sample selection by measuring the historical prediction confidence. Adopting this strategy is more effective than simply using the cross-entropy loss, and clean samples and in-distribution noise samples can be fully utilized while out-of-distribution noise is identified.
To introduce this strategy smoothly, warm-up training needs to be adopted in the initial stage of the algorithm. Deep neural networks tend to fit clean and simple samples first and only then begin to adapt to difficult and noisy samples; inspired by this observation, all training samples in D are used during the first T_w rounds and a warm-up strategy is first applied to train the target neural network. The warm-up training process does not include any loss correction or sample selection, and the cross-entropy loss used for this training is as shown in formula (6.2):
where y_i represents the label of sample x_i; the cross-entropy loss in equation (6.2) is used to update the neural network only during the warm-up phase, in which p_i, the output vector of the last softmax layer, is calculated by equation (6.3):
where f(·;θ) denotes the mapping function of the neural network and z_i is the output of the fully-connected layer preceding the last softmax layer; the inference result ŷ_i corresponding to each sample image x_i can then be calculated by equation (6.4):
During the whole training process (including warm-up), the predictions of each sample image x_i in the training set over the most recent |H| rounds of training are recorded and updated, denoted H_i. After the warm-up training phase is over, a prediction is performed on all samples of training set D using the trained neural network, and the historical prediction sequence H_i is then established and updated. The PFL loss of all training samples of training set D is calculated with formula (6.15). After a round of training is completed, the N(1-δ) samples with the smallest PFL loss are selected to form a new training sample set D_t with which the neural network model is updated. Rather than selecting samples within each mini-batch, the algorithm selects the samples that participate in training over the whole training set, so as to reduce the influence of unevenly distributed noise across mini-batches. To sum up, after warm-up training, the generation process of the newly selected training sample set D_t for each round of training is shown in formula (6.5):
where v_i indicates whether sample x_i is selected; in the global sample selection phase, the clean samples and the in-distribution noise samples are selected by formula (6.5) into the training set of the current round. To avoid the false exclusion of useful samples, the samples with high loss under the prediction-stability metric are excluded only from the training set of the current round of training; in the next round of global sample selection, the normalized prediction confidence f_i of all samples is recalculated and the historical prediction sequence H_i is updated, as shown in formula (6.6):
the loss correction based on prediction confidence and global sample selection algorithm is as follows:
inputting training sample set D training turnsPreheating training turnsSample noise ratioLength of history recorded。
Predicting the result of each sample after the training of the current roundAdding to historical prediction result sequencesTo the end of (1);
else
using prediction results of current individual samplesReplacing historical predictor sequencesOf the earliest prediction record, guaranteed sequenceThe length is not more than。
end
According to cross entropy lossThe gradient is calculated and the update network is propagated backwards.
else
End;
and (3) outputting: for training the loss of back propagation.
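To make the above procedure concrete, the following is a minimal sketch of the prediction-history bookkeeping, the normalized confidence, the PFL-style loss and the global sample selection. It is written against the plausible reconstructions given earlier; the class and function names (PredictionHistory, pfl_loss, select_training_subset) and the exact loss form are assumptions, not the patent's reference implementation.

```python
import numpy as np
from collections import deque


class PredictionHistory:
    """Keeps the last |H| predicted labels of every training sample."""

    def __init__(self, num_samples, history_len=10, num_classes=200):
        self.histories = [deque(maxlen=history_len) for _ in range(num_samples)]
        self.num_classes = num_classes

    def update(self, sample_idx, predicted_label):
        # Appending to a bounded deque drops the oldest record, cf. the assumed form of (6.6).
        self.histories[sample_idx].append(int(predicted_label))

    def confidence(self, sample_idx):
        """Normalized prediction confidence f_i in [0, 1], cf. the assumed (6.1), (6.7)-(6.11)."""
        h = list(self.histories[sample_idx])
        if len(h) <= 1:
            return 0.0  # no usable history yet: treat the sample as maximally uncertain
        counts = np.bincount(np.asarray(h), minlength=self.num_classes)
        freq = counts[counts > 0] / len(h)           # histogram h_i^j over the history
        uncertainty = -np.sum(freq * np.log(freq))   # entropy U(x_i)
        max_uncertainty = np.log(len(h))             # log|H_i|, since k >> |H_i|
        return float(1.0 - uncertainty / max_uncertainty)


def pfl_loss(softmax_probs, label, predicted_label, f_i):
    """PFL-style dynamic loss: high-confidence samples lean on the prediction,
    low-confidence samples lean on the original label (assumed form of (6.15))."""
    p_label = max(float(softmax_probs[label]), 1e-12)
    p_pred = max(float(softmax_probs[predicted_label]), 1e-12)
    return -((1.0 - f_i) * np.log(p_label) + f_i * np.log(p_pred))


def select_training_subset(losses, drop_ratio=0.2):
    """Global sample selection, cf. the assumed (6.5): keep the N*(1-delta) smallest-loss samples."""
    losses = np.asarray(losses)
    n_keep = int(len(losses) * (1.0 - drop_ratio))
    return np.argsort(losses)[:n_keep]
```

In a training loop, PredictionHistory.update would be called once per sample per round, and select_training_subset once per round after the warm-up phase, before the next round's parameter updates.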
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the described embodiments may still be modified and that some of their features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the invention.
Claims (10)
1. A fine-grained classification denoising training method based on prediction confidence is characterized by comprising the following steps:
S1, first, all training samples participate in warm-up training, and the recent prediction results of each sample are recorded as a historical prediction set;
S2, a normalized prediction confidence of each sample is generated from a histogram built from its historical prediction set, specifically:
S21, calculating a histogram of the prediction labels with respect to the total number of predictions through formula (6.1);
S22, inferring the confidence of the correct label from the historical prediction results;
S23, performing a normalization operation on the basis of the entropy;
and S3, using the normalized prediction confidence to balance the weights of the sample label and the sample prediction, and dynamically correcting the loss value.
2. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 1, wherein in S1 the training first goes through warm-up training, and after the warm-up training process is completed, inference is performed on each sample (x_i, y_i) in the training set D, i = 1, ..., N, where N is the number of samples in the data set, and a prediction result is obtained; the inference process obtains Softmax probability distribution vectors from the backbone network output formed by two convolutional neural networks, and the prediction result ŷ_i is then calculated; H_i denotes the historical prediction sequence of sample image x_i during training and f_i the prediction confidence of each sample image x_i; the dynamic correction loss based on prediction confidence balances the weights of the one-hot label encoding and the prediction result and computes the cross entropy with the neural network output to obtain a corrected loss value.
3. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 2, wherein all training samples in D are used during the first T_w rounds, a warm-up strategy is used to train the target neural network, and the cross-entropy loss formula used for training is formula (6.2):
wherein y_i represents the label of sample x_i, and the cross-entropy loss of equation (6.2) updates the neural network during the warm-up stage;
in the formula, p_i, the output vector of the last softmax layer, is calculated by equation (6.3):
wherein f(·;θ) denotes the mapping function of the neural network, z_i is the output of the fully-connected layer preceding the last softmax layer, k is the number of categories of the data set, and θ are the network parameters; the inference result ŷ_i corresponding to sample image x_i is calculated by the following formula (6.4):
4. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 3, wherein after the warm-up training phase is finished, a prediction is performed on the samples in the training set D using the trained neural network, and the historical prediction sequence H_i is then established and updated; the PFL loss of all training samples in D is calculated with formula (6.15); after training, a ratio δ (%) is used to control the number of discarded samples, i.e. the proportion of samples judged to be out-of-distribution noise in the total number of samples, and the N(1-δ) training samples with the smallest PFL loss are selected to form a new training sample set D_t with which the neural network model is updated; after warm-up training, the generation process of the newly selected training sample set D_t for each round of training is shown in formula (6.5):
wherein v_i indicates whether sample x_i is selected; the clean samples and the in-distribution noise samples are selected by formula (6.5) into the training set of the current round of training in the global sample selection stage; the normalized prediction confidence f_i of all samples is recalculated in the next round of global sample selection and the historical prediction sequence H_i is updated, as shown in formula (6.6):
5. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 3, wherein in S21, let h_i^j represent the frequency with which sample x_i is predicted as label j in its historical prediction sequence H_i, where k is the number of categories of the data set; h_i^j can be calculated by equation (6.1):
6. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 5, wherein in S22, the frequency with which a label occurs in the prediction history is statistically positively correlated with the probability that this label is the true label; the likelihood that such a prediction is the true label is defined as the "confidence" with which the correct label can be inferred from the historical predictions;
the concept of entropy matches this notion of confidence and can be used to express the uncertainty of the prediction results of each sample image x_i, in the form of formula (6.7):
wherein h_i^y denotes the frequency of the most frequently predicted label y in the prediction history H_i of sample x_i; the histogram shape of the prediction history reflects the uncertainty of the inference history about the label attribution, and the following formula describes the cases in which the uncertainty of the prediction history is maximal and minimal:
wherein k represents the total number of label classes in the data set and |H_i| represents the length of the prediction history sequence; in commonly used web-image fine-grained classification data sets, the number of label categories is far greater than the length set for the actual historical prediction record, i.e. k >> |H_i|; therefore the maximum uncertainty of the historical prediction can be calculated by the following formula (6.9):
7. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 6, wherein in S23 a normalization operation is performed on the basis of the entropy; given the method for calculating the maximum historical prediction uncertainty, a normalized historical prediction uncertainty is defined whose value range is kept constant at [0,1], which is convenient to measure; its formula is as follows:
the prediction confidence of each input sample is calculated and updated according to its prediction history; the normalized prediction confidence f_i of sample image x_i is defined as shown in formula (6.11):
8. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 7, wherein in S3 a "loss correction" method is adopted for processing noise samples;
the basic loss function of the neural network adopts cross entropy, and additional targets can be added in two basic ways: one is a soft scheme and the other is a hard scheme; the detailed structure of the soft scheme is given by the formula:
wherein q is the prediction vector output by the neural network, t is the noisy label vector, L is the total number of label classes, and β is a hyper-parameter whose value lies in the range (0,1); corresponding to the "soft scheme" is the hard scheme, which changes the regression objective to the maximum a posteriori estimate given q, i.e. the one-hot vector of the label with the maximum value of q, noted:
after a loss correction method is selected, the network parameters are updated using a stochastic gradient descent optimization tool.
9. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 6, wherein, when the neural network is updated using the loss function and compensation with the label and the prediction is introduced, formula (6.2) can be rewritten as:
10. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 9, wherein the normalized prediction confidence f_i defined by formula (6.11) is used to dynamically determine the weight relationship between each sample label and the prediction result, so as to dynamically correct the loss value;
equation (6.14) is rewritten to a dynamic loss based on prediction uncertainty by introducing equation (6.11), specifically:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211452486.3A CN115661549A (en) | 2022-11-21 | 2022-11-21 | Fine-grained classification denoising training method based on prediction confidence |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211452486.3A CN115661549A (en) | 2022-11-21 | 2022-11-21 | Fine-grained classification denoising training method based on prediction confidence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115661549A true CN115661549A (en) | 2023-01-31 |
Family
ID=85017297
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211452486.3A Pending CN115661549A (en) | 2022-11-21 | 2022-11-21 | Fine-grained classification denoising training method based on prediction confidence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115661549A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111861909A (en) * | 2020-06-29 | 2020-10-30 | 南京理工大学 | Network fine-grained image denoising and classifying method |
CN112232407A (en) * | 2020-10-15 | 2021-01-15 | 杭州迪英加科技有限公司 | Neural network model training method and device for pathological image sample |
CN114190950A (en) * | 2021-11-18 | 2022-03-18 | 电子科技大学 | Intelligent electrocardiogram analysis method and electrocardiograph for containing noise label |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116567719A (en) * | 2023-07-05 | 2023-08-08 | 北京集度科技有限公司 | Data transmission method, vehicle-mounted system, device and storage medium |
CN116567719B (en) * | 2023-07-05 | 2023-11-10 | 北京集度科技有限公司 | Data transmission method, vehicle-mounted system, device and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20230131 |