CN116468938A - Robust image classification method on label noisy data - Google Patents

Robust image classification method on label noisy data

Info

Publication number
CN116468938A
Authority
CN
China
Prior art keywords
sample set
training
model
image
label
Prior art date
Legal status
Pending
Application number
CN202310347228.7A
Other languages
Chinese (zh)
Inventor
吴建鑫
周逸帆
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202310347228.7A
Publication of CN116468938A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Abstract

The invention discloses a robust image classification method on label-noisy data. Two different convolutional neural networks are pre-trained with self-supervised learning to serve as the network branches of an image classification model; the training data set is divided by the network branches into a clean sample set, a closed-world noise sample set and an open-world noise sample set; the network branches are used to correct the labels of the closed-world noise sample set, yielding pseudo labels; the network branches train a model on the clean sample set, the label-corrected closed-world noise sample set and the open-world noise sample set with the back-propagation algorithm; the parameters of the trained model are optimized by mini-batch gradient descent and output; and an obtained image is preprocessed into an image to be processed and input into the trained model, which correspondingly outputs the classification result of the image. The robustness of the model to open-world noise is effectively improved, and a model trained on label-noisy data attains better classification accuracy.

Description

Robust image classification method on label noisy data
Technical Field
The invention relates to a robust image classification method on label-noisy data and belongs to the field of computer vision and to deep learning within computer technology. It is particularly suited to deep convolutional neural networks and noisy-label learning, exhibits strong robustness to label noise, and achieves higher classification accuracy.
Background
Image classification is an important topic in the field of computer vision. Supervised learning is one of the most commonly used paradigms for it: a mapping from data to labels is learned from a labeled dataset. However, because data labeling is expensive, real-world tasks often cannot obtain ground-truth labels for all samples. For example, for pictures collected from the internet, generating labels directly from the descriptive text around each picture is a common and convenient way to build a dataset under limited budget and manpower, but the accuracy of such image labels is not guaranteed, i.e. the labels are noisy. Label noise falls into two main types, distinguished by whether the true label of a mislabeled sample lies in the class space of the dataset: if the true label is still one of the dataset's classes, the noise is closed-world label noise; otherwise it is open-world label noise.
Noisy-label learning targets image classification when the labels in a dataset are not always correct. With the development of deep learning, in particular deep convolutional neural networks, the field has advanced rapidly in recent years. Mainstream algorithms fall into two directions. The first designs loss functions robust to label noise and then applies ordinary supervised learning on the whole training set, for example using a symmetric cross-entropy loss, or treating sample labels as parameters updated by gradient back-propagation so as to correct the noisy labels. Experiments show that such losses are more robust to label noise than the naive cross-entropy loss, but model accuracy remains low and falls short of practical requirements. The second direction designs a criterion to screen correctly labeled samples out of the label-noisy data; the label noise rate of the selected subset is generally lower than that of the original data. Supervised learning is then performed on the selected, relatively correct subset, while the remaining samples are treated as unlabeled and learned with a regularization loss based on consistency of network predictions, giving a semi-supervised scheme overall. Although models trained this way are more accurate, current sample-selection-based noisy-label learning algorithms still have serious limitations in practice, mainly the following: the model is not sensibly pre-trained with self-supervised learning; the supervised loss ignores class imbalance among training samples; the unsupervised loss applies operations such as label sharpening without considering open-world label noise; and the corresponding loss term is not corrected when unlabeled samples are resampled.
These deficiencies can greatly reduce the classification accuracy of the final model. Since real-world label-noisy data generally contains a large amount of open-world noise and highly imbalanced numbers of samples across classes, a more effective image classification method that can better handle real-world label-noisy datasets is highly desirable.
Disclosure of Invention
This section is intended to introduce concepts in a simplified form that are described further in the detailed description below. It is not intended to identify key or essential features of the claimed subject matter, nor to limit the scope of the claimed subject matter.
Aiming at the problems and shortcomings of the prior art, the invention provides a robust image classification method on label-noisy data. Two different convolutional neural network branches are pre-trained with self-supervised learning; since no labels are used, label noise cannot affect the pre-training. On top of the pre-trained models, the two network branches select training samples for each other, which effectively alleviates the accumulation of errors during training. By correcting the unsupervised loss term with a resampling coefficient rather than sharpening the network's predictions, the method avoids over-fitting the unlabeled sample set when the labeled and unlabeled sets differ greatly in scale, and effectively improves the robustness of the model to open-world noise, thereby addressing the problems identified in the background art.
In order to achieve the above purpose, the present invention provides the following technical solutions:
the invention discloses a robust image classification method on label noisy data, which mainly comprises the following steps:
step 1, pre-training two different convolutional neural networks on the basis of self-supervision learning on the existing label noisy data set to serve as network branches of an image classification model;
step 2, dividing a training data set into a clean sample set, a closed world noise sample set and an open world noise sample set according to the network branches;
step 3, using the network branches to respectively correct labels of the closed world noise sample set as pseudo labels of samples in the training data set;
step 4, training a model on the clean sample set, the label-corrected closed-world noise sample set and the open-world noise sample set with the back-propagation algorithm, optimizing the training model parameters by a mini-batch gradient descent algorithm, and outputting;
and step 5, obtaining an image, preprocessing it to output an image to be processed, and inputting the image to be processed into the trained model to correspondingly output the classification result of the image.
Further, in step 1, the pre-training of two different convolutional neural networks as network branches of the image classification model based on self-supervised learning specifically includes the following steps:
step 1.1, randomly initializing the two convolutional neural networks with two different initial values;
step 1.2, respectively using a self-supervision learning algorithm to pretrain the convolutional neural network on the training data set;
and step 1.3, respectively obtaining network branches of the two image classification models by the pre-training output.
Further, in step 2, the training data set is divided into a clean sample set, a closed-world noise sample set and an open-world noise sample set according to the network branches, which specifically includes the following steps:
step 2.1, fitting cross entropy loss values of all samples in the image training set by using a two-component Gaussian mixture model;
step 2.2, dividing the samples with small cross-entropy loss values into the clean sample set, and dividing the remaining samples into a noise sample set;
step 2.3, fitting the prediction confidence of the network branch to the noise sample in the noise sample set by using a two-component Gaussian mixture model;
step 2.4, dividing the noise samples with small prediction confidence into the open world noise sample set;
and step 2.5, dividing the remaining noise samples into the closed-world noise sample set.
Further, in step 4, each of the two network branches trains on the clean sample set, the label-corrected closed-world noise sample set and the open-world noise sample set divided and corrected for it by the other branch.
Further, in step 4, the network branch uses the clean sample set, the closed world noise sample set and the open world noise sample set after label correction in combination with a back propagation algorithm to train a model, and the specific steps are as follows:
step 4.1, merging the clean sample set and the label-corrected closed-world noise sample set into a labeled sample set X, and performing supervised learning on X with a class-balanced cross-entropy loss function to obtain a labeled loss term;
step 4.2, taking the open-world noise sample set as an unlabeled sample set U and learning it without labels using a regularization loss function based on consistency of network predictions, obtaining an unlabeled loss term;
and step 4.3, summing the labeled loss term and the unlabeled loss term, and training a model with the back-propagation algorithm.
Further, in step 4.1, supervised learning is performed with the class-balanced cross-entropy loss function, which corrects the probability value predicted by the network branch for the i-th class to

p_i = n_i exp(η_i) / Σ_j n_j exp(η_j),

where n_i is the number of samples of the i-th class in the labeled sample set X, η_i is the score output by the network branch for the i-th class, and n_j is the number of samples of the j-th class in the labeled sample set X.
Further, in step 4.2, unsupervised learning is performed with the regularization loss function based on consistency of network predictions: the average of the network branch's predicted probabilities on two different views of the same sample is used as the learning target for the noise samples in the unlabeled sample set U, and the distribution of predicted probabilities is not sharpened.
Further, in step 4.1 and step 4.2, when the number of samples |X| of the labeled sample set X is far greater than the number of samples |U| of the unlabeled sample set U, a resampling operation is performed on the unlabeled sample set U until the two sample counts are equal.
Further, the corresponding unlabeled loss term is multiplied by a resampling coefficient μ to prevent over-fitting of the unlabeled sample set U.
Further, in step 5, the image to be processed is input into the trained model, which correspondingly outputs the classification result of the image, with the following specific steps:
step 5.1, inputting the preprocessed image to be processed into the trained model for prediction;
step 5.2, outputting a score vector through each of the two network branches, where each dimension corresponds to a category;
step 5.3, summing the two score vectors;
and step 5.4, outputting the category corresponding to the dimension with the highest score as the model prediction result.
Compared with the prior art, the invention has the beneficial effects that:
the image classification method robust on the label noisy data uses a self-supervision learning technology to pretrain a model, and uses two groups of two-component Gaussian mixture models to fit cross entropy loss and model confidence of all samples respectively. The training set is further divided into a clean sample set, a closed world noise sample set, and an open world noise sample set according to the relative sizes of the samples on these indicators. And then, using the prediction of the network to correct the label of the closed world noise sample set, combining the clean sample set and the closed world noise sample set as marked sample sets, using the open world noise sample set as unmarked sample sets, and using the cross entropy loss of class equalization to supervise and learn the marked sample sets, thereby effectively solving the problem of unbalanced class of the number of samples. Unsupervised learning using regularization loss based on network prediction consistency for unlabeled sample sets, so that the model makes consistent predictions for different views of the same sample. The problems of unbalance of sample number, label noise in the open world, inconsistent scales of marked and unmarked sample sets and the like in model training are solved, and the model trained on the label noisy data based on the method can obtain more excellent classification accuracy.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, provide a further understanding of the application; its other features, objects and advantages will become more apparent from them. The drawings of the illustrative embodiments and their descriptions serve to explain the application and are not to be construed as unduly limiting it.
In the drawings:
fig. 1: a flow block diagram of the main steps in an embodiment of the invention;
fig. 2: a flow block diagram of the steps for dividing the training data set in an embodiment of the invention;
fig. 3: a flow block diagram of the steps by which the network branches train a model on the divided training data set in an embodiment of the invention;
fig. 4: a flow block diagram of the steps for obtaining the image classification result of the output image in an embodiment of the invention.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
The present invention discloses a robust image classification method on label-noisy data; the disclosure is described in detail below with reference to the accompanying drawings and in combination with embodiments.
Referring to fig. 1, the present invention mainly includes the steps of:
step 1, pre-training two different convolutional neural networks on the basis of self-supervision learning on the existing label noisy data set to serve as network branches of an image classification model;
step 2, dividing the training data set into a clean sample set, a closed world noise sample set and an open world noise sample set according to network branches;
step 3, respectively correcting labels of the closed world noise sample set by using network branches to serve as pseudo labels of samples in the training data set;
step 4, training a model on the clean sample set, the label-corrected closed-world noise sample set and the open-world noise sample set with the back-propagation algorithm, and outputting the model after optimizing its parameters by mini-batch gradient descent;
and 5, outputting an image to be processed by preprocessing the obtained image, and inputting the image to be processed into a trained model to correspondingly output a classification result of the image.
In step 1, two different convolutional neural networks are pre-trained as network branches of an image classification model based on self-supervised learning, which specifically comprises the following steps:
step 1.1, randomly initializing the two convolutional neural networks with two different initial values;
step 1.2, respectively using a self-supervision learning algorithm to pretrain the convolutional neural network on a training data set;
and step 1.3, respectively obtaining network branches of the two image classification models by pre-training output.
Specifically, a self-supervised learning algorithm uses an auxiliary task to mine supervisory information from large-scale unsupervised data and trains the network with this constructed supervision, allowing it to learn representations valuable for downstream tasks. That is, the supervision in self-supervised learning requires no manual annotation: the algorithm constructs it automatically from the large-scale unlabeled data and then learns from it. Consequently, only the image data is needed and no labels are used, so the model can be pre-trained without being affected by label noise.
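As an illustrative sketch of step 1 (the description only requires "a self-supervised learning algorithm", so the SimCLR-style contrastive loss below is an assumption, and all helper names are hypothetical), the two branches might be prepared as follows:

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

def make_branch(seed: int):
    # step 1.1: each branch gets its own initial values via a different seed
    torch.manual_seed(seed)
    return resnet18(num_classes=128)  # 128-d projection output for pre-training

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5):
    # SimCLR's NT-Xent loss between two augmented views of the same mini-batch;
    # pre-training would call this repeatedly on the branch's projections.
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float("-inf"))   # a sample is never its own positive
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

branch_a, branch_b = make_branch(seed=0), make_branch(seed=1)
```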
Referring to fig. 2, in step 2, the training data set is divided into a clean sample set, a closed-world noise sample set and an open-world noise sample set according to network branches, and specifically includes the following steps:
step 2.1, fitting cross entropy loss values of all samples in the image training set by using a two-component Gaussian mixture model;
step 2.2, dividing samples with small cross-entropy loss values into clean sample sets, and dividing the remaining samples into noise sample sets;
step 2.3, fitting the prediction confidence of the network branch to the noise sample in the noise sample set by using a two-component Gaussian mixture model;
step 2.4, dividing the noise samples with small prediction confidence into an open world noise sample set;
and step 2.5, dividing the remaining noise samples into a closed-world noise sample set.
Specifically, a Gaussian mixture model can be regarded as a combination of several single Gaussian models: a quantity is modeled precisely by decomposing it into several parts, each following a Gaussian (normal) distribution curve. In a Gaussian mixture model, each Gaussian distribution is called a component; a two-component Gaussian mixture model is therefore a mixture of two Gaussian distributions.
Here, a two-component Gaussian mixture model is first fitted to the cross-entropy loss values of all samples in the image training set, and samples with smaller loss values are assigned to the clean sample set. The division criterion is the posterior probability that a sample belongs to the Gaussian component with the smaller mean: when this posterior exceeds 0.5, the sample is added to the clean sample set, and the remaining samples form the noise sample set. A second two-component Gaussian mixture model is then fitted to the network branch's prediction confidence on the samples in the noise sample set, and samples with smaller confidence are assigned to the open-world noise sample set. The criterion is the same as before: the posterior probability of belonging to the small-mean Gaussian component is computed, noise samples for which it exceeds 0.5 are added to the open-world noise sample set, and the remaining samples are assigned to the closed-world noise sample set. The prediction confidence of a network branch on a noise sample is computed from the negative logarithm of the probability of the most confident class in the network's prediction.
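A minimal sketch of this two-stage division, assuming the per-sample cross-entropy losses and top-class probabilities have already been collected (ce_loss and top_prob are hypothetical inputs):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def low_mean_posterior(scores: np.ndarray) -> np.ndarray:
    # Fit a two-component GMM and return each sample's posterior probability
    # of belonging to the component with the smaller mean (steps 2.1/2.3).
    gmm = GaussianMixture(n_components=2).fit(scores.reshape(-1, 1))
    low = int(np.argmin(gmm.means_.ravel()))
    return gmm.predict_proba(scores.reshape(-1, 1))[:, low]

def split_training_set(ce_loss: np.ndarray, top_prob: np.ndarray):
    # ce_loss: per-sample cross-entropy; top_prob: probability of the most
    # confident class (the patent derives the confidence via its negative log).
    clean = low_mean_posterior(ce_loss) > 0.5                      # steps 2.1-2.2
    noisy = ~clean
    open_world = np.zeros_like(clean)
    open_world[noisy] = low_mean_posterior(top_prob[noisy]) > 0.5  # steps 2.3-2.4
    closed_world = noisy & ~open_world                             # step 2.5
    return clean, closed_world, open_world
```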
In step 3, the network branches are used to correct the labels of the closed-world noise sample set, and the corrected labels serve as pseudo labels of those samples in the training data set. Specifically, the most confident class in the network's prediction for a closed-world noise sample replaces its original label as a pseudo label, and the sample then participates with this label in the subsequent training of the network branches.
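A sketch of the relabeling in step 3; relabel_closed_world and its loader argument are hypothetical names, and the pseudo label is simply the branch's argmax prediction:

```python
import torch

@torch.no_grad()
def relabel_closed_world(branch, closed_world_loader, device="cpu"):
    # step 3: the most confident predicted class replaces the original label
    branch.eval()
    pseudo_labels = []
    for images, _ in closed_world_loader:        # the noisy labels are discarded
        logits = branch(images.to(device))
        pseudo_labels.append(logits.argmax(dim=1).cpu())
    return torch.cat(pseudo_labels)
```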
In step 4, each of the two network branches uses the other branch's division into the clean sample set, the label-corrected closed-world noise sample set and the open-world noise sample set; that is, each branch trains on the data partition produced by the other branch. For example, with network branches A and B, branch A divides the training data set into a clean sample set, a label-corrected closed-world noise sample set and an open-world noise sample set for branch B to train on, and branch B does the same for branch A.
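Reusing the split_training_set sketch above, the mutual division could look like the following, where the loss and probability arrays produced by each branch are assumed inputs:

```python
# Branch A's statistics partition the data that branch B trains on, and vice
# versa, so the two branches do not reinforce each other's mistakes.
sets_for_b = split_training_set(ce_loss_from_a, top_prob_from_a)
sets_for_a = split_training_set(ce_loss_from_b, top_prob_from_b)
```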
Referring to fig. 3, in step 4, the training model of the network branch using the clean sample set, the closed world noise sample set after the label correction and the open world noise sample set is combined with the back propagation algorithm, specifically includes the following steps:
step 4.1, merging the clean sample set and the label-corrected closed-world noise sample set into a labeled sample set X, and performing supervised learning with the class-balanced cross-entropy loss to obtain a labeled loss term;
step 4.2, taking the open-world noise sample set as an unlabeled sample set U and learning it with the regularization loss based on consistency of network predictions to obtain an unlabeled loss term;
and step 4.3, summing the labeled loss term and the unlabeled loss term, and training a model with the back-propagation algorithm.
Specifically, the back-propagation algorithm is based on gradient descent; its learning process consists of a forward pass and a backward pass, which avoids the heavy cost of computing partial derivatives individually and allows any partial derivative to be obtained quickly. The parameters of the training model are then optimized with mini-batch gradient descent and output: mini-batch gradient descent accumulates the gradient over a fixed number of samples before each update, which saves computation time and improves stability, further benefiting the deep learning process.
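A generic sketch of the mini-batch gradient-descent loop that this step relies on; the hyperparameters are illustrative only:

```python
import torch

def fit(model, loader, loss_fn, epochs=10, lr=0.02):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for inputs, targets in loader:              # one mini-batch per update
            opt.zero_grad()
            loss_fn(model(inputs), targets).backward()  # back-propagation
            opt.step()                              # mini-batch gradient-descent step
```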
Further, in step 4.1 the labeled sample set X is learned with the class-balanced cross-entropy loss, whose loss function corrects the probability value predicted by the network branch for the i-th class to

p_i = n_i exp(η_i) / Σ_j n_j exp(η_j),

where n_i is the number of samples of the i-th class in the labeled sample set X, η_i is the score output by the network branch for the i-th class, and n_j is the number of samples of the j-th class in the labeled sample set X.
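In log space, the correction above amounts to adding log n_i to the i-th logit before an ordinary cross-entropy. A minimal sketch, assuming the balanced-softmax reading of the formula:

```python
import torch
import torch.nn.functional as F

def class_balanced_ce(logits: torch.Tensor, targets: torch.Tensor,
                      class_counts: torch.Tensor) -> torch.Tensor:
    # softmax(eta_i + log n_i) == n_i * exp(eta_i) / sum_j n_j * exp(eta_j)
    adjusted = logits + torch.log(class_counts.float())
    return F.cross_entropy(adjusted, targets)
```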
In step 4.2, unsupervised learning is performed with the regularization loss based on consistency of network predictions: the average of the network branch's predicted probabilities on two different views of the same sample serves as the learning target for the noise samples in the unlabeled sample set U, the aim being that the network predicts consistently across different views of the same sample. The distribution of predicted probabilities is not sharpened, because in a training set with open-world noise a sample may belong to no category of the dataset, and sharpening would force the model to make a hard decision on it, which reduces the accuracy of the model.
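A sketch of the unlabeled consistency term; the mean-squared penalty is an assumption (the patent only specifies a consistency-based regularization loss), while the unsharpened mean-of-views target follows the description:

```python
import torch
import torch.nn.functional as F

def consistency_loss(branch, view1: torch.Tensor, view2: torch.Tensor):
    p1 = F.softmax(branch(view1), dim=1)
    p2 = F.softmax(branch(view2), dim=1)
    target = ((p1 + p2) / 2).detach()  # mean prediction; deliberately NOT sharpened
    return ((p1 - target) ** 2).mean() + ((p2 - target) ** 2).mean()
```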
According to step 4.1 and step 4.2, when the number of samples |X| of the labeled sample set X is far greater than the number of samples |U| of the unlabeled sample set U, the unlabeled sample set U is resampled until the two sample counts are equal, i.e. roughly |X|/|U| resampling repetitions, so as not to overly highlight the role of the unlabeled sample set U in training. Furthermore, to prevent the model from over-fitting the unlabeled sample set U, the corresponding unlabeled loss term is multiplied by a resampling coefficient μ.
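An illustrative realization of the resampling and the coefficient μ; taking μ as the reciprocal of the repetition factor is an assumption:

```python
from torch.utils.data import ConcatDataset

def resample_unlabeled(labeled_ds, unlabeled_ds):
    repeats = max(1, len(labeled_ds) // len(unlabeled_ds))  # roughly |X| / |U|
    mu = 1.0 / repeats   # assumed: undo the inflation caused by the repetition
    return ConcatDataset([unlabeled_ds] * repeats), mu

# combined objective of step 4.3, using the sketches above:
# loss = class_balanced_ce(logits, labels, counts) + mu * consistency_loss(...)
```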
Referring to fig. 4, in step 5, the image to be processed is input into the trained model to correspond to the classification result of the output image, and the specific steps are as follows:
step 5.1, inputting the preprocessed image to be processed into the trained model for prediction;
step 5.2, outputting a score vector through each of the two network branches;
step 5.3, summing the two score vectors;
and step 5.4, outputting the category corresponding to the dimension with the highest summed score as the model prediction result.
Specifically, the obtained image is preprocessed and then output as the image to be processed. Preprocessing refers to scaling, cropping, flipping and similar operations that convert the obtained image into a specified size or format; the preprocessed image to be processed is then input into the trained model for prediction. The two network branches each output a score vector in which every dimension corresponds to a category; the two score vectors are summed, and the category corresponding to the dimension with the highest summed score is output.
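A sketch of the inference path of step 5, with preprocessing assumed already done:

```python
import torch

@torch.no_grad()
def classify(branch_a, branch_b, images: torch.Tensor) -> torch.Tensor:
    scores = branch_a(images) + branch_b(images)  # step 5.3: sum the score vectors
    return scores.argmax(dim=1)                   # step 5.4: highest-scoring category
```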
The foregoing description is only of the preferred embodiments of the present disclosure and of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example solutions in which the above features are interchanged with (but not limited to) features of similar function disclosed in the embodiments of the present disclosure.

Claims (10)

1. A method of robust image classification on label noisy data, comprising the steps of:
step 1, pre-training two different convolutional neural networks on the basis of self-supervision learning on the existing label noisy data set to serve as network branches of an image classification model;
step 2, dividing a training data set into a clean sample set, a closed world noise sample set and an open world noise sample set according to the network branches;
step 3, using the network branches to respectively correct labels of the closed world noise sample set as pseudo labels of samples in the training data set;
step 4, training a model on the clean sample set, the label-corrected closed-world noise sample set and the open-world noise sample set with the back-propagation algorithm, optimizing the training model parameters by a mini-batch gradient descent algorithm, and outputting;
and step 5, obtaining an image, preprocessing it to output an image to be processed, and inputting the image to be processed into the trained model to correspondingly output the classification result of the image.
2. The method for classifying images with robustness on noisy label data according to claim 1, wherein in step 1, the two different convolutional neural networks are pre-trained as network branches of the image classification model based on self-supervised learning, specifically comprising the following steps:
step 1.1, randomly initializing the two convolutional neural networks with two different initial values;
step 1.2, respectively using a self-supervision learning algorithm to pretrain the convolutional neural network on the training data set;
and step 1.3, respectively obtaining network branches of the two image classification models by the pre-training output.
3. The method for classifying images as claimed in claim 2, wherein in step 2, the training data set is divided into a clean sample set, a closed-world noise sample set and an open-world noise sample set according to network branches, and the method specifically comprises the following steps:
step 2.1, fitting cross entropy loss values of all samples in the image training set by using a two-component Gaussian mixture model;
step 2.2, dividing the samples with small cross-entropy loss values into the clean sample set, and dividing the remaining samples into a noise sample set;
step 2.3, fitting the prediction confidence of the network branch to the noise sample in the noise sample set by using a two-component Gaussian mixture model;
step 2.4, dividing the noise samples with small prediction confidence into the open world noise sample set;
and step 2.5, dividing the remaining noise samples into the closed-world noise sample set.
4. A method of robust image classification on noisy label data according to claim 3, wherein: in step 4, each of the two network branches trains on the clean sample set, the label-corrected closed-world noise sample set and the open-world noise sample set divided and corrected for it by the other branch.
5. The method for robust image classification on noisy label data according to claim 4, wherein: in step 4, training a model by combining the clean sample set, the closed world noise sample set and the open world noise sample set after label correction by the network branch with a back propagation algorithm, wherein the specific steps are as follows:
step 4.1, merging the clean sample set and the label-corrected closed-world noise sample set into a labeled sample set X, and performing supervised learning on X with a class-balanced cross-entropy loss function to obtain a labeled loss term;
step 4.2, taking the open-world noise sample set as an unlabeled sample set U and learning it without labels using a regularization loss function based on consistency of network predictions, obtaining an unlabeled loss term;
and step 4.3, summing the labeled loss term and the unlabeled loss term, and training a model with the back-propagation algorithm.
6. The method for robust image classification on noisy label data according to claim 5, wherein: in step 4.1, supervised learning is performed with the class-balanced cross-entropy loss function, which corrects the probability value predicted by the network branch for the i-th class to

p_i = n_i exp(η_i) / Σ_j n_j exp(η_j),

where n_i is the number of samples of the i-th class in the labeled sample set X, η_i is the score output by the network branch for the i-th class, and n_j is the number of samples of the j-th class in the labeled sample set X.
7. The method for robust image classification on noisy label data according to claim 6, wherein: in step 4.2, unsupervised learning is performed with the regularization loss function based on consistency of network predictions, the average of the network branch's predicted probabilities on two different views of the same sample is used as the learning target for the noise samples in the unlabeled sample set U, and the distribution of predicted probabilities is not sharpened.
8. The method for robust image classification on noisy label data according to claim 7, wherein: in step 4.1 and step 4.2, when the number of samples |X| of the labeled sample set X is far greater than the number of samples |U| of the unlabeled sample set U, a resampling operation is performed on the unlabeled sample set U until the two sample counts are equal.
9. The method for robust image classification on noisy label data according to claim 8, wherein: the corresponding unlabeled loss term is multiplied by a resampling coefficient μ to prevent over-fitting of the unlabeled sample set U.
10. The method for robust image classification on noisy label data according to claim 9, wherein: in step 5, the image to be processed is input into the trained model, which correspondingly outputs the classification result of the image, with the following specific steps:
step 5.1, inputting the preprocessed image to be processed into the trained model for prediction;
step 5.2, outputting a score vector through each of the two network branches, where each dimension corresponds to a category;
step 5.3, summing the two score vectors;
and step 5.4, outputting the category corresponding to the dimension with the highest score as the model prediction result.
CN202310347228.7A 2023-04-03 2023-04-03 Robust image classification method on label noisy data Pending CN116468938A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310347228.7A CN116468938A (en) 2023-04-03 2023-04-03 Robust image classification method on label noisy data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310347228.7A CN116468938A (en) 2023-04-03 2023-04-03 Robust image classification method on label noisy data

Publications (1)

Publication Number Publication Date
CN116468938A (en) 2023-07-21

Family

ID=87178259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310347228.7A Pending CN116468938A (en) 2023-04-03 2023-04-03 Robust image classification method on label noisy data

Country Status (1)

Country Link
CN (1) CN116468938A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117058493A (en) * 2023-10-13 2023-11-14 之江实验室 Image recognition security defense method and device and computer equipment
CN117058493B (en) * 2023-10-13 2024-02-13 之江实验室 Image recognition security defense method and device and computer equipment
CN117152538A (en) * 2023-10-26 2023-12-01 之江实验室 Image classification method and device based on class prototype cleaning and denoising
CN117152538B (en) * 2023-10-26 2024-04-09 之江实验室 Image classification method and device based on class prototype cleaning and denoising
CN117409262A (en) * 2023-12-14 2024-01-16 厦门瑞为信息技术有限公司 Method for quickly constructing image classification model based on CLIP
CN117409262B (en) * 2023-12-14 2024-03-01 厦门瑞为信息技术有限公司 Method for quickly constructing image classification model based on CLIP

Similar Documents

Publication Publication Date Title
WO2021155706A1 (en) Method and device for training business prediction model by using unbalanced positive and negative samples
CN116468938A (en) Robust image classification method on label noisy data
CN108399428B (en) Triple loss function design method based on trace ratio criterion
CN109993236B (en) One-shot Simese convolutional neural network-based small-sample Manchu matching method
CN111737995B (en) Method, device, equipment and medium for training language model based on multiple word vectors
CN110188195B (en) Text intention recognition method, device and equipment based on deep learning
CN112699247A (en) Knowledge representation learning framework based on multi-class cross entropy contrast completion coding
CN111104513B (en) Short text classification method for question and answer service of game platform user
CN114842267A (en) Image classification method and system based on label noise domain self-adaption
CN112307714A (en) Character style migration method based on double-stage deep network
CN111160553A (en) Novel field self-adaptive learning method
CN112418320B (en) Enterprise association relation identification method, device and storage medium
CN112967088A (en) Marketing activity prediction model structure and prediction method based on knowledge distillation
CN114511480A (en) Underwater image enhancement method based on fractional order convolution neural network
CN113157919A (en) Sentence text aspect level emotion classification method and system
CN113987236B (en) Unsupervised training method and unsupervised training device for visual retrieval model based on graph convolution network
CN116152554A (en) Knowledge-guided small sample image recognition system
CN117033961A (en) Multi-mode image-text classification method for context awareness
CN116756391A (en) Unbalanced graph node neural network classification method based on graph data enhancement
CN110768864A (en) Method and device for generating images in batches through network traffic
CN113592045B (en) Model adaptive text recognition method and system from printed form to handwritten form
Mendonça et al. Adversarial training with informed data selection
CN115577797A (en) Local noise perception-based federated learning optimization method and system
CN112560760B (en) Attention-assisted unsupervised video abstraction system
CN114821184A (en) Long-tail image classification method and system based on balanced complementary entropy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination