CN115661549A - Fine-grained classification denoising training method based on prediction confidence - Google Patents
- Publication number
- CN115661549A (application CN202211452486.3A)
- Authority
- CN
- China
- Prior art keywords
- prediction
- training
- sample
- samples
- loss
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a fine-grained classification denoising training method based on prediction confidence, comprising the following steps: S1, all training samples participate in warm-up training, and the most recent prediction results of each sample are recorded as a historical prediction set; S2, a normalized prediction confidence for each sample is generated from the histogram of its historical prediction set; S3, the normalized prediction confidence is used to balance the weights of the sample label and the sample prediction, dynamically correcting the loss value. In the invention, a dynamic loss replaces the ordinary cross-entropy loss to distinguish out-of-distribution noise from other samples, so that out-of-distribution noise can be removed more effectively; when the model is trained on a noisy data set, denoising training is carried out within one framework through loss correction and a global sample selection strategy, and the classification accuracy of the fine-grained visual recognition model is significantly improved.
Description
Technical Field
The invention relates to the technical field of fine-grained image classification, in particular to a fine-grained classification denoising training method based on prediction confidence.
Background
Noise in noisy data sets generally falls into two categories. The first is in-distribution noise: the true label of the sample belongs to the label set of the data set, but the sample is mistakenly annotated with another label from that set. The second is out-of-distribution noise: the true label of the sample is not in the label set of the data set. The image content of an out-of-distribution noise sample is only weakly related, or entirely unrelated, to its assigned label and does not conform to the annotation guidelines. A data set containing both types of noise is called an open-set noisy data set. Noisy data sets collected under natural conditions are almost always open-set; closed-set noisy data sets are comparatively rare.
The research community has proposed various ideas to deal with noise in training data sets. One class of methods is known as loss correction or label correction. The conventional practice of loss correction is to add a correction term to the loss values during neural network training so as to avoid over-fitting in-distribution noise samples. Some methods also correct in-distribution noise by learning a noise transition matrix, but they cannot simultaneously handle out-of-distribution noise correctly, and their effect on large-scale data is not ideal: the true label of an out-of-distribution noise sample is not in the label domain of the data set, so forcibly correcting such a sample with a noise transition matrix cannot yield a meaningful label.
Disclosure of Invention
The invention provides a neural network denoising training method based on prediction confidence, which solves the problem that fine-grained image classification models are difficult to train on noisy data sets.
In order to achieve the purpose, the invention provides the following technical scheme: a fine-grained classification denoising training method based on prediction confidence comprises the following steps:
S1, first, all training samples participate in warm-up training, and the most recent prediction results of each sample are recorded as a historical prediction set;
S2, the normalized prediction confidence of each sample is generated from a histogram built from its historical prediction set, specifically:
S21, calculating, through formula (6.1), a histogram of the prediction labels with respect to the total number of predictions;
S22, inferring the confidence of the correct label from the historical prediction results;
S23, performing a normalization operation on the basis of the entropy;
and S3, using the normalized prediction confidence to balance the weights of the sample label and the sample prediction, and dynamically correcting the loss value.
Further, in S1, the training must first go through several rounds of warm-up training at the beginning. After the warm-up training process is completed, inference is performed on each sample (x_i, y_i) in the training set D, i = 1, ..., N, where N is the number of samples in the data set, and a prediction result is obtained; the inference process obtains Softmax probability distribution vectors from the backbone network output formed by two convolutional neural networks, and the prediction result ŷ_i is then calculated. Denote by H_i the historical prediction sequence of sample image x_i over the most recent |H| rounds of training, and by f_i the prediction confidence of each sample image x_i. The dynamic correction loss based on prediction confidence balances the weights of the one-hot label encoding and the prediction result, and computes the cross entropy with the neural network output to obtain the corrected loss value. During training, the samples with higher prediction confidence are selected to form the training sample set D_t that actually participates in training; the sample set containing N(1-δ) training examples updates the fine-grained image classification neural network model with their corrected loss values.
Further, deep neural networks tend to fit clean and simple samples first and only then begin to adapt to difficult and noisy samples; all training samples in D are therefore used during the first T_w rounds, and a warm-up strategy is applied to train the target neural network; the cross-entropy loss used for this training is given by formula (6.2):
where y_i represents the label of sample x_i; the cross-entropy loss in equation (6.2) updates the neural network during the warm-up stage; p_i in the formula, the output vector of the last softmax layer, is calculated by equation (6.3):
where f(·;θ) denotes the mapping function of the neural network, z_i is the output of the fully-connected layer preceding the last softmax layer, k is the number of classes of the data set, and θ are the network parameters; the inference result ŷ_i corresponding to each sample image x_i is calculated by formula (6.4):
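A plausible reconstruction of formulas (6.2)-(6.4) from the surrounding definitions is given below; the symbols p_i, z_i, f(·;θ) and ŷ_i are assumed names rather than the patent's original notation:

```latex
% Warm-up cross-entropy (6.2), softmax output (6.3) and prediction (6.4); reconstruction sketch.
\[
  \mathcal{L}_{CE} \;=\; -\frac{1}{|D|} \sum_{(x_i,\,y_i)\in D} \log p_i^{\,y_i}
  \qquad (6.2)
\]
\[
  p_i^{\,j} \;=\; \frac{\exp\bigl(z_i^{\,j}\bigr)}{\sum_{c=1}^{k} \exp\bigl(z_i^{\,c}\bigr)},
  \qquad z_i = f(x_i;\theta)\in\mathbb{R}^{k}
  \qquad (6.3)
\]
\[
  \hat{y}_i \;=\; \arg\max_{j\in\{1,\dots,k\}} p_i^{\,j}
  \qquad (6.4)
\]
```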
The predictions of each sample image x_i in the training set over the most recent |H| rounds of training are recorded and updated throughout the whole training process, denoted H_i = {ŷ_i^(T-|H|+1), ..., ŷ_i^(T)}, where ŷ_i^(T) is the label the network predicts for sample image x_i in the T-th (i.e. current) round of training.
Further, after the warm-up training phase is completed, a prediction is performed on the samples in training set D using the trained neural network, and the historical prediction sequence H_i is then established and updated. The PFL loss of all training samples in D is calculated with formula (6.15). After training, a ratio δ (%) is generally used to control the number of discarded samples, i.e. the proportion of samples judged to be out-of-distribution noise in the total number of samples, and the N(1-δ) samples with the smallest PFL loss are selected to form a new training sample set D_t with which the neural network model is updated. After warm-up training, the generation process of the newly selected training sample set D_t for each round of training is shown in formula (6.5):
where v_i indicates whether sample x_i is selected; in the global sample selection stage, the clean samples and the in-distribution noise samples are selected into the training set of the current round by formula (6.5). To avoid the false exclusion of useful samples, the samples with high loss under the prediction-stability metric are excluded only from the training set of the current round of training; in the next round of global sample selection, the normalized prediction confidence f_i of all samples is recalculated and the historical prediction sequence H_i is updated, as shown in formula (6.6):
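A plausible reconstruction of formulas (6.5) and (6.6) follows, where v_i, τ and ℓ_PFL are assumed notation and τ is taken to be the N(1-δ)-th smallest PFL loss of the current round:

```latex
% Global sample selection (6.5) and history update (6.6); reconstruction sketch, notation assumed.
\[
  D_t \;=\; \bigl\{(x_i, y_i)\in D \;:\; v_i = 1\bigr\},
  \qquad
  v_i \;=\; \mathbb{1}\bigl[\ell_{PFL}(x_i) \le \tau\bigr]
  \qquad (6.5)
\]
\[
  H_i \;\leftarrow\; \bigl(H_i \cup \{\hat{y}_i^{\,T}\}\bigr) \setminus \{\hat{y}_i^{\,T-|H|}\},
  \qquad |H_i| \le |H|
  \qquad (6.6)
\]
```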
Further, in S21, since the content of the out-of-distribution noise samples in noisy training data is unrelated to the content of the clean samples and the in-distribution noise samples, the prediction results of the out-of-distribution noise samples keep changing during the early training process. Let h_i^j represent the frequency with which sample x_i is predicted as label j in its historical prediction sequence H_i, where k is the number of categories of the data set; h_i^j can be calculated by equation (6.1):
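A plausible reconstruction of formula (6.1), with the sum running over the entries of H_i, is:

```latex
% Histogram of predicted labels over the prediction history (6.1); reconstruction sketch.
\[
  h_i^{\,j} \;=\; \frac{1}{|H_i|} \sum_{\hat{y}\,\in\, H_i} \mathbb{1}\bigl[\hat{y} = j\bigr],
  \qquad \sum_{j=1}^{k} h_i^{\,j} = 1
  \qquad (6.1)
\]
```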
where ŷ_i is the prediction result of the sample image, |H_i| is the size of its historical prediction result set, i.e. the total number of predictions, and h_i = (h_i^1, ..., h_i^k) is the histogram of the prediction labels with respect to the total number of predictions.
Further, in S22, the frequency with which a label appears in a sample's prediction history is statistically positively correlated with the probability that this label is the true label; the likelihood that such a prediction is the true label is defined as the "confidence" with which the correct label can be inferred from the historical predictions.
The concept of entropy matches this notion of confidence and can be used to express the uncertainty of the prediction results of each sample image x_i, in the form of formula (6.7):
where h_i^y denotes the frequency of the most frequently predicted label y in the prediction history H_i of sample x_i. The histogram shape of the prediction history reflects how uncertain the inference history is about the label attribution; the following formula (6.8) describes the cases in which the uncertainty of the prediction history is greatest and smallest:
where k represents the total number of label classes in the data set and |H_i| represents the length of the prediction history sequence. In commonly used web-image fine-grained classification data sets, the number of label categories is far greater than the length set for the actual historical prediction record, i.e. k >> |H_i|; the maximum uncertainty of the historical prediction can therefore be calculated by the following formula (6.9):
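A plausible reconstruction of formulas (6.7)-(6.9), with U(x_i) as an assumed symbol for the uncertainty, is:

```latex
% Prediction uncertainty (6.7) and its extreme values (6.8)-(6.9); reconstruction sketch.
\[
  U(x_i) \;=\; -\sum_{j=1}^{k} h_i^{\,j} \log h_i^{\,j}
  \qquad (6.7)
\]
\[
  U_{\min} = 0 \;\;(\text{all predictions identical}),
  \qquad
  U_{\max} = \log \min\bigl(k,\, |H_i|\bigr)
  \qquad (6.8)
\]
\[
  k \gg |H_i| \;\;\Longrightarrow\;\; U_{\max} = \log |H_i|
  \qquad (6.9)
\]
```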
Further, in S23, the entropy is inconvenient to use directly: it has a fixed lower bound, but its upper bound differs greatly under different conditions, so a normalization operation needs to be performed on its basis. Given the above method for calculating the maximum historical prediction uncertainty, a normalized historical prediction uncertainty can be defined whose value range is kept constant at [0,1] to facilitate measurement; see formula (6.10):
In summary, the prediction confidence of each input sample can be calculated and updated according to its prediction history; the normalized prediction confidence f_i of sample image x_i is defined as shown in formula (6.11):
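A plausible reconstruction of formulas (6.10) and (6.11), consistent with the definitions above, is:

```latex
% Normalized uncertainty (6.10) and normalized prediction confidence (6.11); reconstruction sketch.
\[
  \widetilde{U}(x_i) \;=\; \frac{U(x_i)}{U_{\max}} \;=\; \frac{U(x_i)}{\log |H_i|} \;\in\; [0, 1]
  \qquad (6.10)
\]
\[
  f_i \;=\; 1 - \widetilde{U}(x_i)
  \qquad (6.11)
\]
```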
Further, in S3, the conventional idea for handling noise samples is to identify them by modeling the distribution of the noise; here, a "loss correction" method is generally adopted instead.
The basic loss function of the neural network adopts cross entropy, and additional targets can be added to it in two basic ways: one is a soft scheme and the other is a hard scheme; the detailed structure of the soft scheme is as follows:
where q is the prediction vector output by the neural network, t is the noisy label vector, L is the total number of label categories, and β is a hyper-parameter whose value lies in the range (0,1). Corresponding to the "soft scheme" is the hard scheme, which changes the regression objective to the maximum a posteriori estimate given q, i.e. the one-hot vector of the label with the maximum value of q; its formula is:
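A plausible reconstruction of the soft-scheme formula (6.12) and the hard-scheme formula (6.13), written in the form of the standard soft and hard bootstrapping losses that the description appears to follow, is:

```latex
% Soft (6.12) and hard (6.13) bootstrapping-style losses; reconstruction sketch.
\[
  \mathcal{L}_{soft} \;=\; -\sum_{i} \sum_{l=1}^{L}
  \bigl[\beta\, t_i^{\,l} + (1-\beta)\, q_i^{\,l}\bigr] \log q_i^{\,l}
  \qquad (6.12)
\]
\[
  \mathcal{L}_{hard} \;=\; -\sum_{i} \sum_{l=1}^{L}
  \bigl[\beta\, t_i^{\,l} + (1-\beta)\, z_i^{\,l}\bigr] \log q_i^{\,l},
  \qquad
  z_i^{\,l} = \mathbb{1}\bigl[l = \arg\max_{c} q_i^{\,c}\bigr]
  \qquad (6.13)
\]
```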
After a suitable loss correction method is selected, the network parameters are updated using a stochastic gradient descent optimization tool.
Further, if the neural network is updated directly with this loss function, the noise samples are quickly over-fitted during training, which degrades performance; it is therefore desirable to compensate the loss value using both the label and the prediction so as to mitigate the network's tendency to fit the noise. With this compensation, equation (6.2) can be rewritten as formula (6.14):
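A plausible reconstruction of formula (6.14), with the static weight λ as an assumed symbol, is:

```latex
% Statically compensated loss (6.14); reconstruction sketch, lambda is an assumed symbol.
\[
  \ell(x_i) \;=\; -\sum_{c=1}^{k}
  \bigl[\lambda\,\mathbb{1}[c = y_i] + (1-\lambda)\,\mathbb{1}[c = \hat{y}_i]\bigr] \log p_i^{\,c}
  \qquad (6.14)
\]
```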
where y_i is the label corresponding to sample image x_i, ŷ_i is its prediction result, and p_i is the output of the last softmax layer defined by equation (6.3); in general, the weighting parameter λ is set to a fixed value, e.g. λ = 0.8, to statically balance the label and the prediction result.
Further, the normalized prediction confidence f_i defined by equation (6.11) is used to dynamically determine the weight relationship between each sample's label and its prediction result, so as to dynamically correct the loss value.
Equation (6.14) can be rewritten into a dynamic loss based on prediction uncertainty by introducing equation (6.11), specifically:
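A plausible form of formula (6.15), obtained by replacing the static weight of (6.14) with the normalized prediction confidence f_i as the surrounding text describes, is:

```latex
% Dynamic PFL loss (6.15); a plausible form, not necessarily the exact one of the patent.
\[
  \ell_{PFL}(x_i) \;=\; -\sum_{c=1}^{k}
  \bigl[(1 - f_i)\,\mathbb{1}[c = y_i] + f_i\,\mathbb{1}[c = \hat{y}_i]\bigr] \log p_i^{\,c}
  \qquad (6.15)
\]
```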
where the normalized prediction confidence f_i is used to dynamically adjust the degree of compensation of the current loss value; equation (6.15) is referred to as the dynamic loss based on prediction confidence, abbreviated as the PFL loss (Prediction Fidelity Loss).
Compared with the prior art, the invention has the beneficial effects that:
In this application, a dynamic loss based on prediction confidence replaces the ordinary cross-entropy loss to distinguish out-of-distribution noise from the other samples (clean samples and in-distribution noise), so that out-of-distribution noise can be removed more effectively; the historical prediction results are analyzed and the loss value is dynamically corrected, realizing correction of in-distribution noise and identification of out-of-distribution noise according to the prediction confidence, so as to alleviate the interference of in-distribution noise with training; when a model is trained on a noisy data set, global sample selection is performed using the prediction confidence, and this strategy is integrated into a simple and effective training framework for fine-grained image classification on noisy data sets, so that denoising training is carried out within one framework through loss correction and a global sample selection strategy, and the classification accuracy of the fine-grained visual recognition model is significantly improved.
Drawings
FIG. 1 is a flowchart of the fine-grained image classification algorithm based on prediction confidence according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, the present invention is a fine-grained classification denoising training method based on prediction confidence. The noisy fine-grained classification denoising algorithm based on prediction confidence is mainly divided into two parts: first, dynamic loss correction based on prediction confidence, and second, global sample selection based on prediction confidence; combining dynamic loss correction and global sample selection in one framework effectively improves the ability to learn a fine-grained image classification model from noisy internet image data sets. In the training phase, the framework of the method can be described simply as follows: define (x_i, y_i) as the i-th sample in the training data set D, where x_i is an image and y_i is its corresponding label. It is known that y_i ∈ {1, ..., k}, where k is the number of classes in the data set and N is the number of samples in the data set. Because of the noise present in the original training set, y_i is not always the true label of x_i. Let ỹ_i denote the true label of sample image x_i: when the sample is clean, y_i = ỹ_i; when the sample is in-distribution noise, it should be corrected to the correct label, i.e. ỹ_i should be used to replace y_i; and when the sample is out-of-distribution noise, it should be discarded from the training set.
The general framework of the algorithm of the present invention is shown in FIG. 1. Training must first go through several rounds of warm-up training. After the warm-up process is completed, inference is performed on each sample (x_i, y_i) of the training set D and a prediction result is obtained; the inference process obtains Softmax probability distribution vectors from the backbone network output consisting of two convolutional neural networks, and the prediction result ŷ_i is then calculated. Denote by H_i the historical prediction sequence of sample image x_i over the most recent |H| rounds of training and by f_i the prediction confidence of each sample image x_i. The dynamic correction loss based on prediction confidence then balances the weights of the one-hot label encoding and the prediction result, and the cross entropy is computed with the neural network output to obtain the corrected loss value; during training, the samples with higher prediction confidence are selected to form the training sample set D_t that actually participates in training. A ratio δ is generally used to control the number of discarded samples, i.e. the proportion of samples judged to be out-of-distribution noise in the total number of samples. Finally, the sample set containing N(1-δ) training examples updates the fine-grained image classification neural network model with its corrected loss values.
In this embodiment, since noisy training data contains out-of-distribution noise samples whose content is independent of that of the clean samples and the in-distribution noise samples, the prediction results of these out-of-distribution noise samples keep changing during the early training process. Consider first the case where only in-distribution noise samples and clean samples exist in the data set, and let h_i^j represent the frequency with which sample x_i is predicted as label j in its historical prediction sequence H_i, where k is the number of categories of the data set; h_i^j can be calculated by formula (6.1):
where ŷ_i is the prediction result of the sample image, |H_i| is the size of its historical prediction result set, i.e. the total number of predictions, and h_i is the histogram of the prediction labels with respect to the total number of predictions.
The frequency with which a sample label appears in the prediction history is statistically positively correlated with the likelihood that the label is a genuine label. The likelihood that a prediction belongs to a true tag is defined as the "confidence" that the correct tag was inferred from historical predictions.
The concept of entropy is consistent with the above notion of confidence and can be used to express the uncertainty of the prediction results of each sample image x_i, as shown in formula (6.7):
The histogram shape of the prediction history reflects the uncertainty of the inference history with respect to the label attribution: the flatter the histogram distribution, the greater the uncertainty; the more concentrated the distribution, the weaker the uncertainty. Equation (6.8) describes the cases in which the uncertainty of the prediction history is greatest and smallest:
where k represents the total number of label classes in the data set and |H_i| represents the length of the prediction history sequence. In commonly used web-image fine-grained classification data sets, the number of label categories (mostly more than one hundred) is far greater than the length of the actual historical prediction record (about 10), i.e. k >> |H_i|. From the above analysis, a method for calculating the maximum uncertainty of the historical prediction can be obtained, as shown in equation (6.9):
The entropy itself is inconvenient to use directly: it has a fixed lower bound, but its upper bound differs greatly under different conditions, so a normalization operation needs to be executed on it. Given the above method for calculating the maximum historical prediction uncertainty, a normalized historical prediction uncertainty can be defined whose value range is kept constant at [0,1], which is convenient to measure; its formula (6.10) is as follows:
In summary, the prediction confidence of each input sample can be calculated and updated according to its prediction history; the normalized prediction confidence f_i of sample image x_i is defined as shown in formula (6.11):
in this embodiment, the conventional idea of processing noise samples is to identify the noise samples by modeling the distribution of the noise. Accurately modeling noise is difficult in many cases, and it is not possible to model noise in a large number of samples with a significant effect. The situation of explicitly raising the distribution of noise or giving reconstruction errors is not always in accordance with the actual situation, and the method is not common in the large-scale data training neural network scene, so that the model-based method is gradually replaced by other methods. A common way to deal with the noisy data training problem is to add terms to the loss function so that the loss function can be less affected by the noise samples under certain conditions, and the above idea is generally called a "loss correction" method.
The idea of implementing loss correction is to dynamically adjust the target implementation of training according to the current neural network state, and a Bootstrapping strategy can be introduced, and the main method is as follows: the prediction target, which is a Convex Combination (constellation Combination) between the wrong tag vector and the result of the current prediction output of the neural network, is dynamically updated in the existing state of the model. As the training process continues to advance, neural networks should be more inclined to trust the current predicted output. Because the correct sample with the dominant scale exists in the training sample set, the network prediction result after training and the error label keep a certain difference, and therefore the method can finally reduce the influence of the incorrectly labeled sample on the training.
Following the common scheme, cross entropy is adopted as the basic loss function of the target neural network, and an additional optimization target is added to it to reflect the current state of the model. Additional targets are generally added in one of two basic ways: directly using the prediction vector output by the neural network, called the soft scheme; or using the prediction result (one-hot label) generated from that prediction vector, called the hard scheme. The detailed structure of the "soft scheme" is shown in equation (6.12):
It can be shown that the final optimization objective of equation (6.12) is equivalent to Softmax regression with a minimum-entropy regularization term, whose function is to make the model more confident in its predicted labels. Corresponding to the "soft scheme" is the hard scheme, which changes the regression objective to the maximum a posteriori estimate given q, i.e. the one-hot vector of the label with the maximum value of q, in the form of equation (6.13):
After a suitable loss correction method is selected, a normal neural network optimization process is executed: data are fed into the neural network in batches, and the network parameters are updated with optimization tools such as stochastic gradient descent. This way of updating the network parameters resembles an EM algorithm: in the expectation stage, a confidence label (corrected label) for each sample is estimated from the convex combination of the original label and the model's predicted label, and in the maximization stage the network parameters are updated so that the model better predicts the labels generated in the previous step.
If the neural network is updated directly with this loss function, it will quickly begin to over-fit the noise samples during training, resulting in degraded performance. It is therefore desirable to adopt the bootstrapping strategy and compensate the loss value using both the label and the prediction, so as to mitigate the network's tendency to fit the noise. With this compensation, equation (6.2) can be rewritten as equation (6.14):
where y_i is the label corresponding to sample image x_i, ŷ_i is its prediction result, and p_i is the output of the last softmax layer defined by equation (6.3).
The normalized prediction confidence f_i defined in equation (6.11) is used to dynamically determine the weight relationship between each sample's label and its prediction result, so as to dynamically correct the loss value. The specific idea is as follows: during training, if the label prediction confidence of a sample image is very high, the sample is with high probability a clean sample or an in-distribution noise sample, and its loss value is corrected to a large extent by the prediction result; conversely, if the prediction history of a sample varies frequently, giving a large prediction uncertainty, the sample is likely a difficult sample or out-of-distribution noise, and its loss value should depend mainly on the original label. In summary, equation (6.14) can be rewritten into the dynamic loss based on prediction uncertainty by introducing equation (6.11), i.e. equation (6.15):
where the normalized prediction confidence f_i dynamically adjusts the degree of compensation of the current loss value; equation (6.15) is referred to as the dynamic loss based on prediction confidence, abbreviated as the PFL loss.
In this embodiment, it is observed that the cross-entropy loss boundaries between in-distribution noise samples, out-of-distribution noise samples and clean samples are not always well defined, so the three cannot be reliably distinguished from one another in every scenario. However, the prediction confidence of out-of-distribution noise samples during training is consistently lower than that of clean samples and in-distribution noise samples, and this phenomenon provides a valuable clue for distinguishing out-of-distribution noise from other samples. The way each sample's prediction changes during training thus suggests a feasible sample selection strategy, namely driving sample selection by measuring the historical prediction confidence. Adopting this strategy is more effective than simply using the cross-entropy loss, and clean samples and in-distribution noise samples can be fully utilized while out-of-distribution noise is identified.
To introduce this strategy smoothly, warm-up training needs to be adopted in the initial stage of the algorithm. Deep neural networks tend to fit clean and simple samples first and only then begin to adapt to difficult and noisy samples; inspired by this observation, all training samples in D are used during the first T_w rounds and a warm-up strategy is first applied to train the target neural network. The warm-up training process does not include any loss correction or sample selection, and the cross-entropy loss used for this training is as shown in formula (6.2):
where y_i represents the label of sample x_i; the cross-entropy loss in equation (6.2) is used to update the neural network only during the warm-up phase, in which p_i, the output vector of the last softmax layer, is calculated by equation (6.3):
where f(·;θ) denotes the mapping function of the neural network and z_i is the output of the fully-connected layer preceding the last softmax layer; the inference result ŷ_i corresponding to each sample image x_i can then be calculated by equation (6.4):
During the whole training process (including warm-up), the predictions of each sample image x_i in the training set over the most recent |H| rounds of training are recorded and updated, denoted H_i. After the warm-up training phase is over, a prediction is performed on all samples of training set D using the trained neural network, and the historical prediction sequence H_i is then established and updated. The PFL loss of all training samples of training set D is calculated with formula (6.15). After a round of training is completed, the N(1-δ) samples with the smallest PFL loss are selected to form a new training sample set D_t with which the neural network model is updated. Rather than selecting samples within each mini-batch, the algorithm selects the samples that participate in training over the whole training set, so as to reduce the influence of unevenly distributed noise across mini-batches. To sum up, after warm-up training, the generation process of the newly selected training sample set D_t for each round of training is shown in formula (6.5):
where v_i indicates whether sample x_i is selected; in the global sample selection phase, the clean samples and the in-distribution noise samples are selected by formula (6.5) into the training set of the current round. To avoid the false exclusion of useful samples, the samples with high loss under the prediction-stability metric are excluded only from the training set of the current round of training; in the next round of global sample selection, the normalized prediction confidence f_i of all samples is recalculated and the historical prediction sequence H_i is updated, as shown in formula (6.6):
the loss correction based on prediction confidence and global sample selection algorithm is as follows:
inputting training sample set D training turnsPreheating training turnsSample noise ratioLength of history recorded。
Predicting the result of each sample after the training of the current roundAdding to historical prediction result sequencesTo the end of (1);
else
using prediction results of current individual samplesReplacing historical predictor sequencesOf the earliest prediction record, guaranteed sequenceThe length is not more than。
end
According to cross entropy lossThe gradient is calculated and the update network is propagated backwards.
else
End;
and (3) outputting: for training the loss of back propagation.
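To make the above procedure concrete, the following is a minimal sketch of the prediction-history bookkeeping, the normalized confidence, the PFL-style loss and the global sample selection. It is written against the plausible reconstructions given earlier; the class and function names (PredictionHistory, pfl_loss, select_training_subset) and the exact loss form are assumptions, not the patent's reference implementation.

```python
import numpy as np
from collections import deque


class PredictionHistory:
    """Keeps the last |H| predicted labels of every training sample."""

    def __init__(self, num_samples, history_len=10, num_classes=200):
        self.histories = [deque(maxlen=history_len) for _ in range(num_samples)]
        self.num_classes = num_classes

    def update(self, sample_idx, predicted_label):
        # Appending to a bounded deque drops the oldest record, cf. the assumed form of (6.6).
        self.histories[sample_idx].append(int(predicted_label))

    def confidence(self, sample_idx):
        """Normalized prediction confidence f_i in [0, 1], cf. the assumed (6.1), (6.7)-(6.11)."""
        h = list(self.histories[sample_idx])
        if len(h) <= 1:
            return 0.0  # no usable history yet: treat the sample as maximally uncertain
        counts = np.bincount(np.asarray(h), minlength=self.num_classes)
        freq = counts[counts > 0] / len(h)           # histogram h_i^j over the history
        uncertainty = -np.sum(freq * np.log(freq))   # entropy U(x_i)
        max_uncertainty = np.log(len(h))             # log|H_i|, since k >> |H_i|
        return float(1.0 - uncertainty / max_uncertainty)


def pfl_loss(softmax_probs, label, predicted_label, f_i):
    """PFL-style dynamic loss: high-confidence samples lean on the prediction,
    low-confidence samples lean on the original label (assumed form of (6.15))."""
    p_label = max(float(softmax_probs[label]), 1e-12)
    p_pred = max(float(softmax_probs[predicted_label]), 1e-12)
    return -((1.0 - f_i) * np.log(p_label) + f_i * np.log(p_pred))


def select_training_subset(losses, drop_ratio=0.2):
    """Global sample selection, cf. the assumed (6.5): keep the N*(1-delta) smallest-loss samples."""
    losses = np.asarray(losses)
    n_keep = int(len(losses) * (1.0 - drop_ratio))
    return np.argsort(losses)[:n_keep]
```

In a training loop, PredictionHistory.update would be called once per sample per round, and select_training_subset once per round after the warm-up phase, before the next round's parameter updates.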
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the described embodiments may still be modified and that some of their features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the invention.
Claims (10)
1. A fine-grained classification denoising training method based on prediction confidence is characterized by comprising the following steps:
S1, first, all training samples participate in warm-up training, and the recent prediction results of each sample are recorded as a historical prediction set;
S2, a normalized prediction confidence of each sample is generated from a histogram built from its historical prediction set, specifically:
S21, calculating a histogram of the prediction labels with respect to the total number of predictions through formula (6.1);
S22, inferring the confidence of the correct label from the historical prediction results;
S23, performing a normalization operation on the basis of the entropy;
and S3, using the normalized prediction confidence to balance the weights of the sample label and the sample prediction, and dynamically correcting the loss value.
2. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 1, wherein in S1 the training first goes through warm-up training, and after the warm-up training process is completed, inference is performed on each sample (x_i, y_i) in the training set D, i = 1, ..., N, where N is the number of samples in the data set, and a prediction result is obtained; the inference process obtains Softmax probability distribution vectors from the backbone network output formed by two convolutional neural networks, and the prediction result ŷ_i is then calculated; H_i denotes the historical prediction sequence of sample image x_i during training and f_i the prediction confidence of each sample image x_i; the dynamic correction loss based on prediction confidence balances the weights of the one-hot label encoding and the prediction result and computes the cross entropy with the neural network output to obtain a corrected loss value.
3. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 2, wherein all training samples in D are used during the first T_w rounds, a warm-up strategy is used to train the target neural network, and the cross-entropy loss formula used for training is formula (6.2):
wherein y_i represents the label of sample x_i, and the cross-entropy loss of equation (6.2) updates the neural network during the warm-up stage;
in the formula, p_i, the output vector of the last softmax layer, is calculated by equation (6.3):
wherein f(·;θ) denotes the mapping function of the neural network, z_i is the output of the fully-connected layer preceding the last softmax layer, k is the number of categories of the data set, and θ are the network parameters; the inference result ŷ_i corresponding to sample image x_i is calculated by the following formula (6.4):
4. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 3, wherein after the warm-up training phase is finished, a prediction is performed on the samples in the training set D using the trained neural network, and the historical prediction sequence H_i is then established and updated; the PFL loss of all training samples in D is calculated with formula (6.15); after training, a ratio δ (%) is used to control the number of discarded samples, i.e. the proportion of samples judged to be out-of-distribution noise in the total number of samples, and the N(1-δ) training samples with the smallest PFL loss are selected to form a new training sample set D_t with which the neural network model is updated; after warm-up training, the generation process of the newly selected training sample set D_t for each round of training is shown in formula (6.5):
wherein v_i indicates whether sample x_i is selected; the clean samples and the in-distribution noise samples are selected by formula (6.5) into the training set of the current round of training in the global sample selection stage; the normalized prediction confidence f_i of all samples is recalculated in the next round of global sample selection and the historical prediction sequence H_i is updated, as shown in formula (6.6):
5. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 3, wherein in S21, let h_i^j represent the frequency with which sample x_i is predicted as label j in its historical prediction sequence H_i, where k is the number of categories of the data set; h_i^j can be calculated by equation (6.1):
6. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 5, wherein in S22, the frequency with which a label occurs in the prediction history is statistically positively correlated with the probability that this label is the true label; the likelihood that such a prediction is the true label is defined as the "confidence" with which the correct label can be inferred from the historical predictions;
the concept of entropy matches this notion of confidence and can be used to express the uncertainty of the prediction results of each sample image x_i, in the form of formula (6.7):
wherein h_i^y denotes the frequency of the most frequently predicted label y in the prediction history H_i of sample x_i; the histogram shape of the prediction history reflects the uncertainty of the inference history about the label attribution, and the following formula describes the cases in which the uncertainty of the prediction history is maximal and minimal:
wherein k represents the total number of label classes in the data set and |H_i| represents the length of the prediction history sequence; in commonly used web-image fine-grained classification data sets, the number of label categories is far greater than the length set for the actual historical prediction record, i.e. k >> |H_i|; therefore the maximum uncertainty of the historical prediction can be calculated by the following formula (6.9):
7. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 6, wherein in S23 a normalization operation is performed on the basis of the entropy; given the method for calculating the maximum historical prediction uncertainty, a normalized historical prediction uncertainty is defined whose value range is kept constant at [0,1], which is convenient to measure; its formula is as follows:
the prediction confidence of each input sample is calculated and updated according to its prediction history; the normalized prediction confidence f_i of sample image x_i is defined as shown in formula (6.11):
8. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 7, wherein in S3 a "loss correction" method is adopted for processing noise samples;
the basic loss function of the neural network adopts cross entropy, and additional targets can be added in two basic ways: one is a soft scheme and the other is a hard scheme; the detailed structure of the soft scheme is given by the formula:
wherein q is the prediction vector output by the neural network, t is the noisy label vector, L is the total number of label classes, and β is a hyper-parameter whose value lies in the range (0,1); corresponding to the "soft scheme" is the hard scheme, which changes the regression objective to the maximum a posteriori estimate given q, i.e. the one-hot vector of the label with the maximum value of q, noted:
after a loss correction method is selected, the network parameters are updated using a stochastic gradient descent optimization tool.
9. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 6, wherein, when the neural network is updated using the loss function and compensation with the label and the prediction is introduced, formula (6.2) can be rewritten as:
10. The fine-grained classification denoising training method based on prediction confidence as claimed in claim 9, wherein the normalized prediction confidence f_i defined by formula (6.11) is used to dynamically determine the weight relationship between each sample label and the prediction result, so as to dynamically correct the loss value;
equation (6.14) is rewritten to a dynamic loss based on prediction uncertainty by introducing equation (6.11), specifically:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211452486.3A CN115661549A (en) | 2022-11-21 | 2022-11-21 | Fine-grained classification denoising training method based on prediction confidence |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211452486.3A CN115661549A (en) | 2022-11-21 | 2022-11-21 | Fine-grained classification denoising training method based on prediction confidence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115661549A true CN115661549A (en) | 2023-01-31 |
Family
ID=85017297
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211452486.3A Pending CN115661549A (en) | 2022-11-21 | 2022-11-21 | Fine-grained classification denoising training method based on prediction confidence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115661549A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111861909A (en) * | 2020-06-29 | 2020-10-30 | 南京理工大学 | Network fine-grained image denoising and classifying method |
CN112232407A (en) * | 2020-10-15 | 2021-01-15 | 杭州迪英加科技有限公司 | Neural network model training method and device for pathological image sample |
CN114190950A (en) * | 2021-11-18 | 2022-03-18 | 电子科技大学 | Intelligent electrocardiogram analysis method and electrocardiograph for containing noise label |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116567719A (en) * | 2023-07-05 | 2023-08-08 | 北京集度科技有限公司 | Data transmission method, vehicle-mounted system, device and storage medium |
CN116567719B (en) * | 2023-07-05 | 2023-11-10 | 北京集度科技有限公司 | Data transmission method, vehicle-mounted system, device and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20230131 |