CN114881125A - Label noisy image classification method based on graph consistency and semi-supervised model - Google Patents

Label noisy image classification method based on graph consistency and semi-supervised model Download PDF

Info

Publication number
CN114881125A
CN114881125A CN202210433807.9A CN202210433807A CN114881125A CN 114881125 A CN114881125 A CN 114881125A CN 202210433807 A CN202210433807 A CN 202210433807A CN 114881125 A CN114881125 A CN 114881125A
Authority
CN
China
Prior art keywords
model
label
graph
semi
consistency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210433807.9A
Other languages
Chinese (zh)
Inventor
佟子业
惠维
赵鲲
峁子富
郑艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Heshuo Logistics Technology Co ltd
Original Assignee
Xi'an Heshuo Logistics Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Heshuo Logistics Technology Co ltd filed Critical Xi'an Heshuo Logistics Technology Co ltd
Priority to CN202210433807.9A priority Critical patent/CN114881125A/en
Publication of CN114881125A publication Critical patent/CN114881125A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of computer vision and artificial intelligence, and in particular to a method for classifying images with noisy labels based on graph consistency and a semi-supervised model. S1: train on the full noisy-label data in the training set and initialize the model; S2: obtain the distribution of the samples; S3: classify and screen the samples; S4: apply different data augmentations to the images; S5: feed the augmented images into the semi-supervised model for training; S6: perform graph encoding to obtain consistency graphs; S7: jointly optimize the model and update the sample labels; S8: use the semi-supervised model as the inference model. The invention mainly addresses the problem that existing semi-supervised methods trained only on classification consistency suffer from severe confidence bias and memorize a large number of noisy labels, so that errors accumulate and damage the model. The graph-consistency model can effectively resist the memorization of noisy labels and significantly improve model performance.

Description

Label noisy image classification method based on graph consistency and semi-supervised model
Technical Field
The invention relates to the technical field of computer vision and artificial intelligence, in particular to a label noisy image classification method based on graph consistency and a semi-supervised model.
Background
The great success of deep learning today is inseparable from large amounts of correctly labeled data. However, labeling such large-scale datasets usually requires an enormous expenditure of labor, money and time, and some data (e.g., medical images) are difficult even for experts to classify correctly. Moreover, current computer vision research spans many scenarios, and the high cost of dataset collection poses a huge challenge to model training in specific scenarios.
To reduce the excessive cost of collecting accurately labeled data, two low-cost schemes, Crowdsourcing and Web Query, have become the preferred choices of enterprises and large organizations. Crowdsourcing employs a large number of inexpensive workers and is more efficient and cheaper than an expert annotation team, but the consequence is that annotation quality is worse than that of an expert team. Web Query relies on the massive data and powerful retrieval capability of web search engines and can obtain, in a short time, a large amount of user-labeled image data from open platforms such as Google and Baidu. Although both methods remarkably improve the efficiency of image annotation, the resulting annotations inevitably contain a large number of errors. Labels that do not match the true labels (ground-truth labels) are called noisy labels; the labels of such images do not match their true semantic information. In recent years, learning with noisy labels has been widely studied as an important research topic.
Learning from noisy labels means training effectively on a dataset that contains a certain proportion of mislabeled data, eliminating the negative effect of the noisy labels and still completing the target task well; learning from noisy-label data can thus improve the accuracy of the model. On the other hand, a deep neural network, owing to its huge capacity, can fit all training labels, so when trained under noisy supervision the model inevitably fits the noisy labels as well, which greatly harms its generalization. Models tend to memorize noisy labels during training, and the performance of deep learning models is severely impaired as a result. Erroneous labels not only fail to provide useful information but also interfere with the whole task, so the more accurately noisy labels are handled, the better the task is performed. In practical applications, label noise is a common problem in datasets: erroneous labels are always present and often numerous, and the risks they bring are difficult to avoid with a single uniform strategy. How to train effectively with noisy data while eliminating its negative effects is therefore a significant research direction.
The most prominent approach in the field of noisy-label learning at present is to combine a specific sample-selection strategy with a specific semi-supervised learning model, which effectively prevents the model from overfitting the noisy data and further improves generalization performance. The SELF model (Learning to filter noisy labels with self-ensembling) is combined with a semi-supervised learning method to gradually filter out mislabeled examples from the noisy data. By keeping a running-average model called Mean Teacher ("Mean teachers are better role models") as the backbone, it obtains self-ensembled predictions for all training examples and then progressively removes examples whose ensembled predictions are inconsistent with their annotated labels. The method further exploits an unsupervised loss on the instances not included in the selected clean set. The RoCL model (Robust curriculum learning) employs a two-stage learning strategy: supervised training on the selected clean examples, followed by semi-supervised learning on the re-labeled noisy examples through self-supervision; for selection and re-labeling it computes an exponential moving average of the loss over training iterations. The SELFIE model (Refurbishing unclean samples for robust deep learning) is a hybrid of sample selection and loss correction: it corrects the loss of refurbished samples (loss correction) and uses it together with the loss of small-loss samples (sample selection), so more training samples are used to update the deep neural network. However, these semi-supervised approaches trained solely on classification consistency face a very serious confidence-bias problem and memorize a large number of noisy labels; errors accumulate and impair the performance of the model.
Disclosure of Invention
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
the label noisy image classification method based on graph consistency and a semi-supervised model comprises the following steps:
S1: training on the full noisy-label data in the training set, and initializing the model;
S2: obtaining the distribution of the samples from the initialized model of step S1, and then dividing the samples into label-correct samples and label-incorrect samples according to this distribution by adopting a Gaussian mixture model;
S3: classifying and screening the samples obtained in step S2: the label-correct samples keep their labels and serve as labeled images, while the labels of the label-incorrect samples are erased and those samples serve as unlabeled images; both the labeled and unlabeled images are passed into the semi-supervised model;
S4: performing different data augmentations on the labeled and unlabeled images;
S5: sending the augmented labeled and unlabeled images into the semi-supervised model for training;
S6: carrying out graph encoding on the results of the semi-supervised training in step S5 to obtain consistency graphs;
S7: jointly optimizing the model with class-distribution-consistency and graph-consistency regularization, and updating the sample labels;
S8: taking the semi-supervised model constrained in step S7 as the inference model, and inputting the test-set image data into it to obtain the classification results for the noisy-label data.
Preferably, the ResNet50 model training is performed using the full data containing noisy labels; in the first epoch of the model training phase, the label-correct and label-incorrect samples are read; the data is augmented, the augmentation being flipping, cropping and shifting; the augmented data is sent to the ResNet50 models, two ResNet50 models being trained simultaneously; the models output probability distributions, the cross-entropy loss is computed from these distributions and back-propagated so as to train the models, and the model parameters are saved.
Preferably, the probability distribution output by the model is input into a Gaussian mixture model, and the distribution calculation is performed by the Gaussian mixture model, which can be expressed by the following formula:

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)

Two clusters are modeled with two-dimensional Gaussians, so the number of components is K = 2, and the mixing coefficients satisfy:

\sum_{k=1}^{K} \pi_k = 1, \qquad 0 \le \pi_k \le 1

wherein \mathcal{N}(x \mid \mu_k, \Sigma_k) is the k-th component of the mixture model and \pi_k is the weight of component \mathcal{N}(x \mid \mu_k, \Sigma_k).
Preferably, the step S3 includes:
S301, for the probability p(x) obtained from the Gaussian mixture model, if p(x) ≥ 0.5 the label is correct and the original label is saved; if p(x) < 0.5 the label is erroneous and is not saved;
s302, sending the labeled image and the unlabeled image to a semi-supervised model together for training;
preferably, the step S4 includes:
S401, performing a weak augmentation on the labeled images, and augmenting the unlabeled images with both the weak augmentation and the AutoAugment technique, wherein the weak augmentation comprises flipping, cropping and shifting.
S402, simultaneously performing the mixup operation on the labeled and unlabeled images; mixup can be expressed by the following formulas:

\tilde{x} = \lambda x_i + (1 - \lambda) x_j

\tilde{y} = \lambda y_i + (1 - \lambda) y_j

wherein x_i and x_j are input images, which are fused into one picture; y_i and y_j are the labels of the input images in one-hot encoding format, which are fused in the same way; \lambda is the fusion ratio, ranging from 0 to 1.
Preferably, the step S5 includes:
S501, for the augmented labeled images, the labeled branch of the semi-supervised model is trained with a ResNet50 model, using a dual-network structure;
S502, for the augmented unlabeled images, the unlabeled branch of the semi-supervised model is trained with a ResNet50 model, also using a dual-network structure; the weakly augmented batch of unlabeled images produces a probability-distribution result after passing through the ResNet50 model, while the AutoAugment-augmented unlabeled images produce a probability-distribution representation, an embedding representation and a feature-map-level representation.
Preferably, the step S6 includes:
S601, performing autocorrelation computation on the multi-scale results obtained in step S5 from the training on the AutoAugment-augmented unlabeled images: first, the probability-distribution representation C_U and its transposed matrix C_U^T are used for similarity computation, giving the autocorrelation graph A^C:

A^C = \frac{C_U C_U^{\top}}{\tau}

the embedding representation E_U and its transposed matrix E_U^T give the autocorrelation graph A^E:

A^E = \frac{E_U E_U^{\top}}{\tau}

and the feature-map-level representation F_U and its transposed matrix F_U^T give the autocorrelation graph A^F:

A^F = \frac{F_U F_U^{\top}}{\tau}

wherein \tau is a temperature hyperparameter;

S602, normalizing A^C, A^E and A^F:

\tilde{A}^{C}_{ij} = \frac{\exp(A^{C}_{ij})}{\sum_{j=1}^{B} \exp(A^{C}_{ij})}, \qquad \tilde{A}^{E}_{ij} = \frac{\exp(A^{E}_{ij})}{\sum_{j=1}^{B} \exp(A^{E}_{ij})}, \qquad \tilde{A}^{F}_{ij} = \frac{\exp(A^{F}_{ij})}{\sum_{j=1}^{B} \exp(A^{F}_{ij})}

wherein \tilde{A}^{C}_{ij}, \tilde{A}^{E}_{ij} and \tilde{A}^{F}_{ij} represent the degree of similarity between the i-th and j-th samples in the class-distribution, embedding and feature-map dimensions, and B denotes the batch size;

S603, performing the same autocorrelation computation on the probability-distribution representation obtained in step S5 from the other, weakly augmented batch of unlabeled images: the probability-distribution representation C_{U2} and its transposed matrix C_{U2}^T give the autocorrelation graph A^{C2}:

A^{C2} = \frac{C_{U2} C_{U2}^{\top}}{\tau}

which is likewise normalized:

\tilde{A}^{C2}_{ij} = \frac{\exp(A^{C2}_{ij})}{\sum_{j=1}^{B} \exp(A^{C2}_{ij})}

at this point \tilde{A}^{C2} serves as the pseudo-label graph.
Preferably, the step S7 includes:
S701, for the labeled samples, computing their class consistency with the cross-entropy loss \mathcal{L}_x, calculated as follows:

\mathcal{L}_x = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{M} y_{ic} \log \hat{y}_{ic}

wherein y is the ground-truth label in one-hot encoding, \hat{y} is the predicted class probability, N is the number of samples and M is the number of classes;

S702, computing the class consistency of the unlabeled samples with the L2 loss, calculated as follows:

\mathcal{L}_{cat} = \frac{1}{B} \sum_{i=1}^{B} \left\| p_{\theta}\big(y_i \mid \mathrm{Aug}_2(u_i)\big) - p_{\theta}\big(y_i \mid \mathrm{Aug}_1(u_i)\big) \right\|_2^2

wherein p_{\theta}(y_i \mid \mathrm{Aug}_2(u_i)) is the class-distribution probability, after softmax, of the model with parameters \theta on the AutoAugment-augmented unlabeled image, and \mathrm{Aug}_1 denotes the weak augmentation;

S703, computing the graph consistency of the unlabeled samples. The consistency between the probability-distribution graph and the pseudo-label graph is calculated as:

\mathcal{L}_{gc}^{C} = \frac{1}{B^2} \left\| \tilde{A}^{C} - \tilde{A}^{C2} \right\|_2^2

the consistency between the embedding graph and the pseudo-label graph is calculated as:

\mathcal{L}_{gc}^{E} = \frac{1}{B^2} \left\| \tilde{A}^{E} - \tilde{A}^{C2} \right\|_2^2

and the consistency between the feature-map graph and the pseudo-label graph is calculated as:

\mathcal{L}_{gc}^{F} = \frac{1}{B^2} \left\| \tilde{A}^{F} - \tilde{A}^{C2} \right\|_2^2

They are finally added to obtain the graph-consistency regularization term:

\mathcal{L}_{gc} = \mathcal{L}_{gc}^{C} + \mathcal{L}_{gc}^{E} + \mathcal{L}_{gc}^{F}

S704, the final loss is the sum of the above losses:

\mathcal{L} = \mathcal{L}_x + \mathcal{L}_{cat} + \mathcal{L}_{gc}
compared with the prior art, the invention has the following beneficial effects:
the invention provides a label noisy image classification method based on graph consistency and a semi-supervised model. The ResNet50 model training is carried out by using the full data containing the noise label in the first epoch, so that the characteristics of the full data can be learned to obtain better initial parameters of the model, and the training of the subsequent epochs is facilitated. Secondly, the invention adopts a Gaussian Mixture Model (GMM) to divide whether the sample label is correct or not, the label is reserved for the sample with the correct label, otherwise, the label is erased, and then the semi-supervised Model is used for learning the label sample and the sample without the label. The Gaussian mixture model can be accurately divided according to the distribution of the samples, the labels of the samples with wrong labels are erased, and the training is carried out by adopting a semi-supervised learning mode, so that the function of data can be fully exerted, the memory of the model to the wrong labels is reduced, and the model performance is improved. Finally, the present invention designs a graph consistency model. Semi-supervised approaches based on classification consistency training alone face very serious confidence bias problems and remember a large number of noise labels, errors accumulate and impair the performance of the model. The graph consistency model can effectively resist noise label memory and remarkably improve the model performance.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a block diagram of a model for unlabeled exemplar training of a semi-supervised training module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1-2, the label noisy image classification method based on graph consistency and semi-supervised model includes the following steps:
s1, training a ResNet50 model by using the full data containing the noise label;
s101, reading all training set data in the first epoch of the model training stage, wherein the training set data comprises label correct samples and label error samples.
S102, data augmentation is performed, the augmentation referring to weak augmentation operations such as flipping, cropping and shifting.
S103, the augmented data is sent to the ResNet50 models, and two ResNet50 models are trained simultaneously; for the noisy-label learning problem, a dual-network structure can effectively resist the memorization of wrong labels, prevent overfitting, and make the training more stable.
S104, the models output probability distributions; the cross-entropy loss is computed from these distributions and back-propagated, thereby training the models. The cross-entropy loss is calculated as follows:

\mathcal{L}_{ce} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{M} y_{ic} \log \hat{y}_{ic}

wherein y is the ground-truth label in one-hot encoding, \hat{y} is the predicted class probability, N is the number of samples and M is the number of classes.
And S105, saving the model parameters.
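The warm-up of steps S101 to S105 could be organized roughly as below. This is a minimal, non-limiting sketch assuming a standard PyTorch training loop; names such as warmup_epoch and noisy_loader are hypothetical and are not part of the patent.

```python
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms

# Weak augmentation used during warm-up (flipping, cropping/shifting).
weak_aug = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])

def warmup_epoch(models, noisy_loader, device="cuda"):
    """One epoch of cross-entropy training of both networks on the full noisy data."""
    criterion = nn.CrossEntropyLoss()
    optimizers = [torch.optim.SGD(m.parameters(), lr=0.02, momentum=0.9) for m in models]
    for model in models:
        model.train()
    for images, labels in noisy_loader:              # labels may be noisy
        images, labels = images.to(device), labels.to(device)
        for model, opt in zip(models, optimizers):
            loss = criterion(model(images), labels)  # softmax + cross-entropy
            opt.zero_grad()
            loss.backward()                          # back-propagation
            opt.step()

# Dual-network structure: two ResNet50 models trained simultaneously.
nets = [torchvision.models.resnet50(num_classes=10) for _ in range(2)]
# After building noisy_loader with weak_aug and running warmup_epoch(nets, noisy_loader),
# the warm-up parameters can be saved, e.g.:
# torch.save({f"net{i}": n.state_dict() for i, n in enumerate(nets)}, "warmup.pt")
```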
S2, dividing the samples by adopting a Gaussian mixture model to obtain samples with correct labels and samples with wrong labels;
s201, inputting the probability distribution of the model output into a Gaussian mixture model.
S202, the distribution calculation is performed with the Gaussian mixture model, which can be expressed by the following formula:

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)

wherein \mathcal{N}(x \mid \mu_k, \Sigma_k) is called the k-th component of the mixture model. To separate the label-correct samples from the label-incorrect samples, two clusters are modeled with two-dimensional Gaussians, so the number of components is K = 2. \pi_k is the mixing coefficient and satisfies:

\sum_{k=1}^{K} \pi_k = 1, \qquad 0 \le \pi_k \le 1

\pi_k can be regarded as the weight of component \mathcal{N}(x \mid \mu_k, \Sigma_k).
S3, the samples are divided according to the Gaussian mixture model: the labels of the label-correct samples are kept and these samples serve as labeled data, the labels of the label-incorrect samples are erased and these samples serve as unlabeled data, and the labeled and unlabeled data are sent into the semi-supervised model;
S301, for the probability p(x) obtained from the Gaussian mixture model, if p(x) ≥ 0.5 the label is considered correct and the original label is stored; if p(x) < 0.5 the label is considered erroneous and is not saved;
s302, sending the labeled image and the unlabeled image to a semi-supervised model together for training;
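Steps S2 and S3 could be sketched as below, under the assumption, not stated verbatim in the patent, that the statistic fed to the two-component Gaussian mixture model is a two-dimensional per-sample quantity such as the per-sample losses of the two warmed-up networks; the function name split_clean_noisy is hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_clean_noisy(per_sample_stats, threshold=0.5):
    """Fit a 2-component GMM and split samples into label-correct / label-incorrect sets.

    per_sample_stats: (N, 2) array, e.g. the per-sample losses from the two networks.
    """
    gmm = GaussianMixture(n_components=2, reg_covar=5e-4)
    gmm.fit(per_sample_stats)
    # Posterior probability p(x) of the component with the smaller mean
    # (assumed here to correspond to the label-correct samples).
    clean_component = int(np.argmin(gmm.means_.sum(axis=1)))
    p_clean = gmm.predict_proba(per_sample_stats)[:, clean_component]
    labeled_idx = np.where(p_clean >= threshold)[0]    # keep the original labels
    unlabeled_idx = np.where(p_clean < threshold)[0]   # erase the labels
    return labeled_idx, unlabeled_idx, p_clean

# Example with stand-in statistics:
# labeled_idx, unlabeled_idx, p = split_clean_noisy(np.random.rand(50000, 2))
```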
S4, the images are augmented with the AutoAugment technique and the mixup operation is simultaneously performed on the data;
S401, different data augmentations are performed on the labeled and unlabeled data. The labeled images receive a weak augmentation, the weak augmentation referring to operations such as flipping, cropping and shifting. The unlabeled images are augmented with both the weak augmentation and the AutoAugment technique, giving weakly augmented labeled images, weakly augmented unlabeled images and AutoAugment-augmented unlabeled images.
S402, to relieve overfitting to mislabeled images, the mixup operation is applied. Mixup is a data-augmentation method that smooths the discrete sample space and improves smoothness within a neighborhood. Mixup can be expressed by the following formulas:

\tilde{x} = \lambda x_i + (1 - \lambda) x_j

\tilde{y} = \lambda y_i + (1 - \lambda) y_j

wherein x_i and x_j are input images, which are fused into one picture; y_i and y_j are the labels of the input images in one-hot encoding format, which are fused in the same way; \lambda is the fusion ratio, ranging from 0 to 1, and is drawn from a Beta distribution whose two shape parameters \alpha and \beta are taken equal.
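A sketch of the augmentations and the mixup of step S4 is given below; the CIFAR-10 AutoAugment policy and the Beta parameter alpha = 4 are illustrative assumptions rather than values fixed by the patent.

```python
import numpy as np
import torch
from torchvision import transforms

# Weak augmentation (flipping, cropping/shifting) and strong augmentation (AutoAugment).
weak_aug = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])
strong_aug = transforms.Compose([
    transforms.AutoAugment(transforms.AutoAugmentPolicy.CIFAR10),
    transforms.ToTensor(),
])

def mixup(x, y_onehot, alpha=4.0):
    """Fuse a batch with a shuffled copy of itself; labels must be one-hot encoded."""
    lam = np.random.beta(alpha, alpha)     # fusion ratio lambda in [0, 1]
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix
```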
S5, training on the augmented data with the semi-supervised model;
S501, for the augmented labeled images, the labeled branch of the semi-supervised model is trained with a ResNet50 model, using a dual-network structure. The dual-network structure effectively enhances the robustness of the model to noisy labels.
S502, for the augmented unlabeled images, the unlabeled branch of the semi-supervised model is trained with a ResNet50 model, also using a dual-network structure. The weakly augmented batch of unlabeled images produces a probability-distribution result after passing through the ResNet50 model, while the AutoAugment-augmented unlabeled images produce a probability-distribution representation, an embedding representation and a feature-map-level representation. Producing results at more scales provides more levels of information for classification and helps the model, under noisy supervision, avoid memorizing and fitting wrong labels that would otherwise harm its performance.
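One way the three representations of step S502 (class distribution, embedding, feature-map level) could be produced from a single ResNet50 backbone is sketched below; the particular layers chosen for each representation are an assumption made for illustration only.

```python
import torch
import torch.nn as nn
import torchvision

class MultiScaleResNet50(nn.Module):
    """ResNet50 returning the class distribution, an embedding and a feature-map representation."""

    def __init__(self, num_classes=10):
        super().__init__()
        backbone = torchvision.models.resnet50(num_classes=num_classes)
        self.stem = nn.Sequential(*list(backbone.children())[:-2])  # convolutional stages only
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = backbone.fc

    def forward(self, x):
        fmap = self.stem(x)                          # B x 2048 x h x w feature map
        emb = self.pool(fmap).flatten(1)             # B x 2048 embedding
        prob = torch.softmax(self.fc(emb), dim=1)    # B x num_classes class distribution
        return prob, emb, fmap.flatten(1)            # feature-map-level representation
```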
S6, graph encoding is carried out on the results obtained from the semi-supervised training to obtain consistency graphs (see FIG. 2);
S601, autocorrelation computation is performed on the multi-scale results obtained in step S5 by training on the batch of unlabeled images augmented with AutoAugment: first, the probability-distribution representation C_U and its transposed matrix C_U^T are used for similarity computation, giving the autocorrelation graph A^C:

A^C = \frac{C_U C_U^{\top}}{\tau}

the embedding representation E_U and its transposed matrix E_U^T give the autocorrelation graph A^E:

A^E = \frac{E_U E_U^{\top}}{\tau}

and the feature-map-level representation F_U and its transposed matrix F_U^T give the autocorrelation graph A^F:

A^F = \frac{F_U F_U^{\top}}{\tau}

where \tau is a temperature hyperparameter.

S602, A^C, A^E and A^F are normalized:

\tilde{A}^{C}_{ij} = \frac{\exp(A^{C}_{ij})}{\sum_{j=1}^{B} \exp(A^{C}_{ij})}, \qquad \tilde{A}^{E}_{ij} = \frac{\exp(A^{E}_{ij})}{\sum_{j=1}^{B} \exp(A^{E}_{ij})}, \qquad \tilde{A}^{F}_{ij} = \frac{\exp(A^{F}_{ij})}{\sum_{j=1}^{B} \exp(A^{F}_{ij})}

where \tilde{A}^{C}_{ij}, \tilde{A}^{E}_{ij} and \tilde{A}^{F}_{ij} represent the degree of similarity between the i-th and j-th samples in the class-distribution, embedding and feature-map dimensions, and B denotes the batch size.

S603, the same autocorrelation computation is performed on the probability-distribution representation obtained in step S5 from the other, weakly augmented batch of unlabeled images: the probability-distribution representation C_{U2} and its transposed matrix C_{U2}^T give the autocorrelation graph A^{C2}:

A^{C2} = \frac{C_{U2} C_{U2}^{\top}}{\tau}

which is likewise normalized:

\tilde{A}^{C2}_{ij} = \frac{\exp(A^{C2}_{ij})}{\sum_{j=1}^{B} \exp(A^{C2}_{ij})}

At this point, \tilde{A}^{C2} serves as the pseudo-label graph.
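The graph encoding of step S6 could be sketched as follows; the row-wise softmax normalization and the detaching of the pseudo-label graph from the gradient are assumptions consistent with, but not literally dictated by, the description above.

```python
import torch

def autocorrelation_graph(rep, tau=0.5):
    """Similarity of a B x D batch representation with its transpose, row-normalized to B x B."""
    sim = rep @ rep.t() / tau            # autocorrelation graph A = R R^T / tau
    return torch.softmax(sim, dim=1)     # normalized graph, each row sums to 1 over the batch

def consistency_graphs(prob_s, emb_s, fmap_s, prob_w, tau=0.5):
    """Graphs from the strongly augmented batch plus the pseudo-label graph from the weak batch."""
    a_c = autocorrelation_graph(prob_s, tau)       # class-distribution graph
    a_e = autocorrelation_graph(emb_s, tau)        # embedding graph
    a_f = autocorrelation_graph(fmap_s, tau)       # feature-map graph
    with torch.no_grad():
        a_pl = autocorrelation_graph(prob_w, tau)  # pseudo-label graph (no gradient)
    return a_c, a_e, a_f, a_pl
```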
S7, the model is jointly optimized with class-consistency and graph-consistency regularization;
S701, for the labeled samples, their class consistency is computed with the cross-entropy loss \mathcal{L}_x; see S104 for the cross-entropy calculation.

S702, the class consistency of the unlabeled samples is computed with the L2 loss, calculated as follows:

\mathcal{L}_{cat} = \frac{1}{B} \sum_{i=1}^{B} \left\| p_{\theta}\big(y_i \mid \mathrm{Aug}_2(u_i)\big) - p_{\theta}\big(y_i \mid \mathrm{Aug}_1(u_i)\big) \right\|_2^2

where p_{\theta}(y_i \mid \mathrm{Aug}_2(u_i)) is the class-distribution probability, after softmax, of the model with parameters \theta on the AutoAugment-augmented unlabeled image, and \mathrm{Aug}_1 denotes the weak augmentation;

S703, the graph consistency of the unlabeled samples is computed. The consistency between the probability-distribution graph and the pseudo-label graph is calculated as:

\mathcal{L}_{gc}^{C} = \frac{1}{B^2} \left\| \tilde{A}^{C} - \tilde{A}^{C2} \right\|_2^2

the consistency between the embedding graph and the pseudo-label graph is calculated as:

\mathcal{L}_{gc}^{E} = \frac{1}{B^2} \left\| \tilde{A}^{E} - \tilde{A}^{C2} \right\|_2^2

and the consistency between the feature-map graph and the pseudo-label graph is calculated as:

\mathcal{L}_{gc}^{F} = \frac{1}{B^2} \left\| \tilde{A}^{F} - \tilde{A}^{C2} \right\|_2^2

They are finally added to obtain the graph-consistency regularization term:

\mathcal{L}_{gc} = \mathcal{L}_{gc}^{C} + \mathcal{L}_{gc}^{E} + \mathcal{L}_{gc}^{F}

S704, the final loss is the sum of the above losses:

\mathcal{L} = \mathcal{L}_x + \mathcal{L}_{cat} + \mathcal{L}_{gc}
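The joint objective of step S7 could be assembled as in the sketch below; the equal weighting of the three terms follows the statement that the final loss is their sum, and the use of mean-squared error for the graph terms mirrors the L2 formulation above and is an assumption.

```python
import torch
import torch.nn.functional as F

def total_loss(logits_x, targets_x, prob_strong, prob_weak, a_c, a_e, a_f, a_pl):
    """Cross-entropy + L2 class consistency + graph-consistency regularization."""
    loss_x = F.cross_entropy(logits_x, targets_x)            # labeled samples
    loss_cat = F.mse_loss(prob_strong, prob_weak.detach())   # class consistency (L2)
    loss_graph = (F.mse_loss(a_c, a_pl.detach())             # probability graph vs pseudo-label graph
                  + F.mse_loss(a_e, a_pl.detach())           # embedding graph vs pseudo-label graph
                  + F.mse_loss(a_f, a_pl.detach()))          # feature-map graph vs pseudo-label graph
    return loss_x + loss_cat + loss_graph
```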
and S8, inputting the image data of the test set into the semi-supervised model as an inference model to obtain the result of noise label image classification.
Although only the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art, and all changes are encompassed in the scope of the present invention.

Claims (8)

1. The label noisy image classification method based on graph consistency and a semi-supervised model is characterized by comprising the following steps of:
the method comprises the following steps:
S1: training on the full noisy-label data in the training set, and initializing the model;
S2: obtaining the distribution of the samples from the initialized model of step S1, and then dividing the samples into label-correct samples and label-incorrect samples according to this distribution by adopting a Gaussian mixture model;
S3: classifying and screening the samples obtained in step S2: the label-correct samples keep their labels and serve as labeled images, while the labels of the label-incorrect samples are erased and those samples serve as unlabeled images; both the labeled and unlabeled images are passed into the semi-supervised model;
S4: performing different data augmentations on the labeled and unlabeled images;
S5: sending the augmented labeled and unlabeled images into the semi-supervised model for training;
S6: carrying out graph encoding on the results of the semi-supervised training in step S5 to obtain consistency graphs;
S7: jointly optimizing the model with class-distribution-consistency and graph-consistency regularization, and updating the sample labels;
S8: taking the semi-supervised model constrained in step S7 as the inference model, and inputting the test-set image data into it to obtain the classification results for the noisy-label data.
2. The label noisy image classification method based on graph consistency and semi-supervised model according to claim 1, characterized by: training the ResNet50 model using the full data containing noisy labels; reading the label-correct and label-incorrect samples in the first epoch of the model training phase; augmenting the data, the augmentation being flipping, cropping and shifting; sending the augmented data to the ResNet50 models, two ResNet50 models being trained simultaneously; the models output probability distributions, the cross-entropy loss is computed from these distributions and back-propagated, thereby training the models, and the model parameters are saved.
3. The label noisy image classification method based on graph consistency and semi-supervised model according to claim 2, characterized by: inputting the probability distribution output by the model into a Gaussian mixture model, and performing distribution calculation through the Gaussian mixture model, wherein the Gaussian mixture model can be represented by the following formula:
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)

two clusters are modeled with two-dimensional Gaussians, the number of components is K = 2, and the mixing coefficients satisfy:

\sum_{k=1}^{K} \pi_k = 1, \qquad 0 \le \pi_k \le 1

wherein \mathcal{N}(x \mid \mu_k, \Sigma_k) is the k-th component of the mixture model and \pi_k is the weight of component \mathcal{N}(x \mid \mu_k, \Sigma_k).
4. The method for classifying label noisy images based on graph consistency and semi-supervised model according to claim 3, characterized in that: the step S3 includes:
S301, for the probability p(x) obtained from the Gaussian mixture model, if p(x) ≥ 0.5 the label is correct and the original label is saved; if p(x) < 0.5 the label is erroneous and is not saved;
s302, the labeled image and the unlabeled image are sent to a semi-supervised model together for training.
5. The label noisy image classification method based on graph consistency and semi-supervised model according to claim 1, characterized by: the step S4 includes:
S401, performing a weak augmentation on the labeled images, and augmenting the unlabeled images with both the weak augmentation and the AutoAugment technique, wherein the weak augmentation comprises flipping, cropping and shifting;
S402, simultaneously performing the mixup operation on the labeled and unlabeled images; mixup can be expressed by the following formulas:

\tilde{x} = \lambda x_i + (1 - \lambda) x_j

\tilde{y} = \lambda y_i + (1 - \lambda) y_j

wherein x_i and x_j are input images, which are fused into one picture; y_i and y_j are the labels of the input images in one-hot encoding format, which are fused in the same way; \lambda is the fusion ratio, ranging from 0 to 1.
6. The label noisy image classification method based on graph consistency and semi-supervised model according to claim 1, characterized by: the step S5 includes:
S501, for the augmented labeled images, the labeled branch of the semi-supervised model is trained with a ResNet50 model, using a dual-network structure;
S502, for the augmented unlabeled images, the unlabeled branch of the semi-supervised model is trained with a ResNet50 model, also using a dual-network structure; the weakly augmented batch of unlabeled images produces a probability-distribution result after passing through the ResNet50 model, while the AutoAugment-augmented unlabeled images produce a probability-distribution representation, an embedding representation and a feature-map-level representation.
7. The method for classifying label noisy images based on graph consistency and semi-supervised model according to claim 1, characterized in that: the step S6 includes:
S601, performing autocorrelation computation on the multi-scale results obtained in step S5 from the training on the AutoAugment-augmented unlabeled images: first, the probability-distribution representation C_U and its transposed matrix C_U^T are used for similarity computation, giving the autocorrelation graph A^C:

A^C = \frac{C_U C_U^{\top}}{\tau}

the embedding representation E_U and its transposed matrix E_U^T give the autocorrelation graph A^E:

A^E = \frac{E_U E_U^{\top}}{\tau}

and the feature-map-level representation F_U and its transposed matrix F_U^T give the autocorrelation graph A^F:

A^F = \frac{F_U F_U^{\top}}{\tau}

wherein \tau is a temperature hyperparameter;

S602, normalizing A^C, A^E and A^F:

\tilde{A}^{C}_{ij} = \frac{\exp(A^{C}_{ij})}{\sum_{j=1}^{B} \exp(A^{C}_{ij})}, \qquad \tilde{A}^{E}_{ij} = \frac{\exp(A^{E}_{ij})}{\sum_{j=1}^{B} \exp(A^{E}_{ij})}, \qquad \tilde{A}^{F}_{ij} = \frac{\exp(A^{F}_{ij})}{\sum_{j=1}^{B} \exp(A^{F}_{ij})}

wherein \tilde{A}^{C}_{ij}, \tilde{A}^{E}_{ij} and \tilde{A}^{F}_{ij} represent the degree of similarity between the i-th and j-th samples in the class-distribution, embedding and feature-map dimensions, and B denotes the batch size;

S603, performing the same autocorrelation computation on the probability-distribution representation obtained in step S5 from the other, weakly augmented batch of unlabeled images: the probability-distribution representation C_{U2} and its transposed matrix C_{U2}^T give the autocorrelation graph A^{C2}:

A^{C2} = \frac{C_{U2} C_{U2}^{\top}}{\tau}

which is likewise normalized:

\tilde{A}^{C2}_{ij} = \frac{\exp(A^{C2}_{ij})}{\sum_{j=1}^{B} \exp(A^{C2}_{ij})}

at this point \tilde{A}^{C2} serves as the pseudo-label graph.
8. The method for classifying label noisy images based on graph consistency and semi-supervised model according to claim 1, characterized in that: the step S7 includes:
S701, for the labeled samples, computing their class consistency with the cross-entropy loss \mathcal{L}_x, calculated as follows:

\mathcal{L}_x = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{M} y_{ic} \log \hat{y}_{ic}

wherein y is the ground-truth label in one-hot encoding, \hat{y} is the predicted class probability, N is the number of samples and M is the number of classes;

S702, computing the class consistency of the unlabeled samples with the L2 loss, calculated as follows:

\mathcal{L}_{cat} = \frac{1}{B} \sum_{i=1}^{B} \left\| p_{\theta}\big(y_i \mid \mathrm{Aug}_2(u_i)\big) - p_{\theta}\big(y_i \mid \mathrm{Aug}_1(u_i)\big) \right\|_2^2

wherein p_{\theta}(y_i \mid \mathrm{Aug}_2(u_i)) is the class-distribution probability, after softmax, of the model with parameters \theta on the AutoAugment-augmented unlabeled image, and \mathrm{Aug}_1 denotes the weak augmentation;

S703, computing the graph consistency of the unlabeled samples. The consistency between the probability-distribution graph and the pseudo-label graph is calculated as:

\mathcal{L}_{gc}^{C} = \frac{1}{B^2} \left\| \tilde{A}^{C} - \tilde{A}^{C2} \right\|_2^2

the consistency between the embedding graph and the pseudo-label graph is calculated as:

\mathcal{L}_{gc}^{E} = \frac{1}{B^2} \left\| \tilde{A}^{E} - \tilde{A}^{C2} \right\|_2^2

and the consistency between the feature-map graph and the pseudo-label graph is calculated as:

\mathcal{L}_{gc}^{F} = \frac{1}{B^2} \left\| \tilde{A}^{F} - \tilde{A}^{C2} \right\|_2^2

They are finally added to obtain the graph-consistency regularization term:

\mathcal{L}_{gc} = \mathcal{L}_{gc}^{C} + \mathcal{L}_{gc}^{E} + \mathcal{L}_{gc}^{F}

S704, the final loss is the sum of the above losses:

\mathcal{L} = \mathcal{L}_x + \mathcal{L}_{cat} + \mathcal{L}_{gc}
CN202210433807.9A 2022-04-22 2022-04-22 Label noisy image classification method based on graph consistency and semi-supervised model Pending CN114881125A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210433807.9A CN114881125A (en) 2022-04-22 2022-04-22 Label noisy image classification method based on graph consistency and semi-supervised model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210433807.9A CN114881125A (en) 2022-04-22 2022-04-22 Label noisy image classification method based on graph consistency and semi-supervised model

Publications (1)

Publication Number Publication Date
CN114881125A true CN114881125A (en) 2022-08-09

Family

ID=82671011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210433807.9A Pending CN114881125A (en) 2022-04-22 2022-04-22 Label noisy image classification method based on graph consistency and semi-supervised model

Country Status (1)

Country Link
CN (1) CN114881125A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116612537A (en) * 2023-07-21 2023-08-18 武汉理工大学 Semi-supervised action detection method based on background weakening and consistency calculation
CN116612537B (en) * 2023-07-21 2023-10-03 武汉理工大学 Semi-supervised action detection method based on background weakening and consistency calculation
CN117152538A (en) * 2023-10-26 2023-12-01 之江实验室 Image classification method and device based on class prototype cleaning and denoising
CN117152538B (en) * 2023-10-26 2024-04-09 之江实验室 Image classification method and device based on class prototype cleaning and denoising
CN117409262A (en) * 2023-12-14 2024-01-16 厦门瑞为信息技术有限公司 Method for quickly constructing image classification model based on CLIP
CN117409262B (en) * 2023-12-14 2024-03-01 厦门瑞为信息技术有限公司 Method for quickly constructing image classification model based on CLIP

Similar Documents

Publication Publication Date Title
CN111581396B (en) Event graph construction system and method based on multi-dimensional feature fusion and dependency syntax
CN111950594B (en) Unsupervised graph representation learning method and device on large-scale attribute graph based on sub-sampling
CN114881125A (en) Label noisy image classification method based on graph consistency and semi-supervised model
CN113361627A (en) Label perception collaborative training method for graph neural network
CN112487822A (en) Cross-modal retrieval method based on deep learning
CN111582506A (en) Multi-label learning method based on global and local label relation
CN113255822A (en) Double knowledge distillation method for image retrieval
CN114925205B (en) GCN-GRU text classification method based on contrast learning
CN114611617A (en) Depth field self-adaptive image classification method based on prototype network
CN115935991A (en) Multitask model generation method and device, computer equipment and storage medium
CN113254675A (en) Knowledge graph construction method based on self-adaptive few-sample relation extraction
CN114880307A (en) Structured modeling method for knowledge in open education field
CN112668633B (en) Adaptive graph migration learning method based on fine granularity field
CN114048314A (en) Natural language steganalysis method
CN114491289A (en) Social content depression detection method of bidirectional gated convolutional network
CN115019183B (en) Remote sensing image model migration method based on knowledge distillation and image reconstruction
CN116680407A (en) Knowledge graph construction method and device
CN116580243A (en) Cross-domain remote sensing scene classification method for mask image modeling guide domain adaptation
CN116579342A (en) Electric power marketing named entity identification method based on dual-feature combined extraction
CN115239034A (en) Method and system for predicting early defects of wind driven generator blade
Zhang et al. Feature-induced label distribution for learning with noisy labels
CN110188219B (en) Depth-enhanced redundancy-removing hash method for image retrieval
CN116484053B (en) Intelligent data analysis platform
CN114996424B (en) Weak supervision cross-domain question-answer pair generation method based on deep learning
CN113051886B (en) Test question duplicate checking method, device, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination