CN113159294A - Sample selection algorithm based on companion learning
- Publication number: CN113159294A
- Application number: CN202110458211.XA
- Authority: CN (China)
- Prior art keywords: network, samples, networks, peer, predictions
- Prior art date: 2021-04-27
- Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/045: Computing arrangements based on biological models; neural networks; architecture; combinations of networks
- G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/08: Computing arrangements based on biological models; neural networks; learning methods
Abstract
The invention discloses a sample selection algorithm based on peer learning, which comprises the following steps: (1) two deep convolutional neural networks are trained simultaneously; images are input into both networks for forward computation, and each network predicts the image class and calculates its cross entropy loss; (2) each network updates its own parameters using the samples on which the two networks' predictions disagree; (3) from the samples on which the predictions agree, each network selects the samples with low loss values and uses them to update the other network's parameters. In this way, the two peer sub-networks improve the final recognition performance through "self-learning" (updating a network with the samples whose predictions disagree) and "mutual communication" (exchanging low-loss prediction-consistent samples to update each other), effectively alleviating the label noise present in web image datasets. In addition, the invention can effectively select samples beneficial to model training from a dataset containing label noise, and can be widely applied to various scene tasks with unreliable labels.
Description
Technical Field
The invention relates to a robust image recognition method for the situation where the labels of a web image dataset are unreliable, and in particular to a sample selection algorithm based on peer learning (companion learning).
Background
Training image recognition models with web images is attracting more and more researchers. However, before a model can be trained on large amounts of web data, the label noise that web image datasets cannot avoid is the first difficulty to be solved.
Due to the "memorization effect" of deep convolutional neural networks, the noisy labels (i.e., wrong labels) of images are "memorized" during network training, so the model fits the wrong labels and its performance ultimately degrades. Current methods for handling label noise can be divided mainly into the following two categories.
The first category is label (loss) correction methods, which can be further divided into two sub-classes according to whether the object of correction is the label or the loss. The first sub-class corrects the labels of the training data, solving the label noise problem by improving the quality of the original labels; a common approach is to correct wrong labels through a clean-label prediction step, and some extra clean data is sometimes needed to assist model training in this process. The second sub-class compensates for the misleading effect of wrong labels during training by directly correcting the loss, or by correcting the probability distribution used to compute the loss.
The second category is sample selection methods. Intuitively, the simplest way to deal with label noise is to find the noisy data, remove it, and then train the neural network on the remaining data. The difficulty with this type of approach, however, is how to correctly pick out the noisy data without reliable label supervision. Representative algorithms include MentorNet, Decoupling, and Co-teaching. The MentorNet algorithm trains a mentor network to supervise the training of a student network, selecting samples for the student and assigning high weights to the samples whose labels are likely correct. The Decoupling algorithm trains two neural networks simultaneously and updates their parameters only on the samples where their predictions disagree. The Co-teaching algorithm also trains two neural networks simultaneously; during training, each network learns from the low-loss samples selected by the other and updates its parameters accordingly. However, each has its own problems. Without a validation set, MentorNet can only be trained in a self-paced manner and selects samples with a predefined rule, so errors accumulate; Decoupling cannot handle noise explicitly and likewise suffers from error accumulation; in Co-teaching, the two networks gradually converge to each other as training proceeds, so the whole model eventually degenerates into self-paced MentorNet.
Disclosure of Invention
The purpose of the invention is as follows: the invention provides a sample selection algorithm based on peer learning, which trains two networks simultaneously. Among the samples with consistent predictions, the two networks select clean samples for each other and filter out different errors through mutual "communication", avoiding the accumulated-error problem; at the same time, divergence samples are introduced so that the two networks always remain divergent during training, yielding better performance.
The technical scheme is as follows: the sample selection algorithm based on peer learning comprises the following steps:
(1) simultaneously training two deep convolutional neural networks, namely peer networks, which respectively perform class prediction on input samples and calculate cross entropy loss;
(2) constructing the set of samples with inconsistent predictions by judging whether the predictions of the peer networks differ;
(3) constructing the set of samples with consistent predictions by judging whether the predictions of the peer networks are the same, then selecting the samples with low loss values from this set, each network using its selection to update the other network's parameters;
(4) peer network $h_1$ updating its parameters with the prediction-inconsistent samples and with the low-loss samples selected by $h_2$ from the prediction-consistent set, and peer network $h_2$ updating its parameters with the prediction-inconsistent samples and with the low-loss samples selected by $h_1$.
Preferably, in step (1), the two simultaneously trained deep convolutional neural networks are BCNN networks $h_1$ and $h_2$ pre-trained on ImageNet, with random initialization of the fully connected layers guaranteeing the dissimilarity of $h_1$ and $h_2$. The training data input to the networks in one batch is denoted $B=\{(x_i,y_i)\}_{i=1}^{m}$, where $m$ is the batch size. An image $x_i$ and its unreliable label $y_i$ are input to $h_1$ and $h_2$, yielding the corresponding class predictions $\hat{y}_i^{h_1}$ and $\hat{y}_i^{h_2}$ and cross entropy losses $L_{h_1}(x_i,y_i)$ and $L_{h_2}(x_i,y_i)$.
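For illustration, a minimal PyTorch sketch of this forward step is given below; it assumes `h1` and `h2` are two independently initialized classification networks (e.g., BCNNs with ImageNet-pretrained convolutional layers) and is a sketch, not the patent's reference implementation.

```python
import torch
import torch.nn.functional as F

def forward_peers(h1, h2, images, labels):
    """Forward a batch through both peer networks; return per-sample class
    predictions and cross entropy losses (reduction='none' keeps one loss
    value per sample, as needed for the later sample selection)."""
    logits1, logits2 = h1(images), h2(images)
    pred1 = logits1.argmax(dim=1)   # class prediction of h1 for each sample
    pred2 = logits2.argmax(dim=1)   # class prediction of h2 for each sample
    loss1 = F.cross_entropy(logits1, labels, reduction='none')
    loss2 = F.cross_entropy(logits2, labels, reduction='none')
    return pred1, pred2, loss1, loss2
```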
Preferably, in step (2), the set of samples with inconsistent predictions is constructed by judging whether the predictions of the peer networks differ, specifically

$$G_d=\{(x_i,y_i)\in B:\hat{y}_i^{h_1}\neq\hat{y}_i^{h_2}\}$$

where $G_d$ contains the samples for which the two sub-networks predict different labels.
For a dual sub-network architecture, the difference in predictive ability between the sub-networks helps improve overall model performance. The training data are therefore divided according to the consistency of the sub-network predictions, and the prediction-inconsistent training samples $G_d$ form one part of the sample set that ultimately participates in the network update.
Preferably, in step (3), the set of samples with consistent predictions is constructed by judging whether the predictions of the peer networks are the same, specifically

$$G_s=\{(x_i,y_i)\in B:\hat{y}_i^{h_1}=\hat{y}_i^{h_2}\}$$

where $G_s$ contains the samples for which the two sub-networks predict the same label. Subsequently, the samples in $G_s$ are sorted in ascending order of their cross entropy loss values, and each network selects the $(1-d(T))\times 100\%$ of samples with the lowest loss to construct $\bar{G}_s^{h_1}$ and $\bar{G}_s^{h_2}$, expressed as:

$$\bar{G}_s^{h_k}=\underset{G':\,|G'|\geq(1-d(T))\,|G_s|}{\arg\min}\;\sum_{(x_i,y_i)\in G'}L_{h_k}(x_i,y_i),\qquad k\in\{1,2\}$$

where $|G_s|$ is the number of training samples in the set $G_s$, and $d(T)$, which dynamically adjusts the number of samples in the selected subsets, is designed as:

$$d(T)=\xi\cdot\min\!\left(\frac{T}{T_k},\,1\right)$$

where $\xi$ is the preset maximum drop rate and $T_k$ is the number of training epochs required to reach the maximum drop rate.
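For illustration, this drop-rate schedule can be written as a one-line function; the linear ramp from 0 to $\xi$ over the first $T_k$ epochs matches the reconstructed formula above, and the parameter defaults follow the embodiment below ($\xi=0.25$, $T_k=10$).

```python
def drop_rate(T, xi=0.25, Tk=10):
    """d(T): rises linearly from 0 to the maximum drop rate xi over the
    first Tk epochs, then stays at xi."""
    return xi * min(T / Tk, 1.0)
```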
Since the label noise in $G_d$ is not explicitly processed, the invention, in order to reduce the negative influence of label noise while making maximal use of the collected web data for model training, selects samples from $G_s$ as described above and uses the selected subsets as the other part of the sample set that ultimately participates in the network update. Through this exchange process, the two sub-networks, which have different characteristics and discriminative abilities, filter out different errors for each other, preventing the gradient errors produced by label noise during training from gradually accumulating through self-feedback.
Preferably, in step (4), peer network $h_1$ updates its network parameters using the samples in $G_d$ and $\bar{G}_s^{h_2}$, and peer network $h_2$ updates its network parameters using the samples in $G_d$ and $\bar{G}_s^{h_1}$, expressed as:

$$w_{h_1}\leftarrow w_{h_1}-\lambda\,\nabla L_{h_1}\big(G_d\cup\bar{G}_s^{h_2}\big),\qquad w_{h_2}\leftarrow w_{h_2}-\lambda\,\nabla L_{h_2}\big(G_d\cup\bar{G}_s^{h_1}\big)$$

where $w_{h_1}$ and $w_{h_2}$ are the parameters of networks $h_1$ and $h_2$ respectively, $\lambda$ is the learning rate, and $\nabla L_{h_1}$ and $\nabla L_{h_2}$ are the back-propagated gradients.
Beneficial effects: compared with the prior art, the invention has the following notable advantages. (1) By exchanging the low-loss portion of the prediction-consistent samples, the peer networks jointly filter out different errors during the "exchange" training process, avoiding the continual accumulation of training errors caused by label noise. (2) By also training on the prediction-inconsistent samples, on the one hand the "hard" samples among them contribute greatly to learning the model's representation ability; on the other hand, when only prediction-consistent samples are used for "exchange" training, the two network models eventually converge to each other as training proceeds, whereas introducing prediction-inconsistent samples into training maintains the dissimilarity between the peer networks, ultimately improving overall model performance.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is an overall architecture diagram of the present invention;
FIG. 3 is a sample selection schematic of the present invention.
Detailed Description
The present invention will be described in detail with reference to examples.
As shown in fig. 1, the sample selection algorithm based on peer learning includes the following steps:
(1) simultaneously training two deep convolutional neural networks, namely peer networks, which respectively perform class prediction on input samples and calculate cross entropy loss;
As shown in fig. 2, two BCNN networks are trained simultaneously; their convolutional layers are pre-trained on ImageNet, and the dissimilarity of the two networks is ensured by randomly initializing their fully connected layers. Web images are crawled from the Bing image search engine using the class names of the 200 bird categories of the CUB200-2011 dataset, and after filtering, 18388 web images are finally obtained. These web images serve as the training set, and the 5794 test images of CUB200-2011 serve as the test set. The images are input into the two BCNN networks in batches.
Specifically, the preprocessing is as follows: each image is resized so that its short side is 448 pixels while maintaining the aspect ratio, then randomly flipped horizontally, and finally randomly cropped to 448×448. As shown in fig. 2, the preprocessed images are input into the two BCNNs; the training data input to the two BCNNs in one batch is denoted $B=\{(x_i,y_i)\}_{i=1}^{m}$, where $m$ is the batch size. An image $x_i$ and its unreliable label $y_i$ are input to $h_1$ and $h_2$, yielding the corresponding class predictions $\hat{y}_i^{h_1}$ and $\hat{y}_i^{h_2}$ and cross entropy losses $L_{h_1}(x_i,y_i)$ and $L_{h_2}(x_i,y_i)$.
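For illustration, a minimal torchvision sketch of this preprocessing pipeline (standard torchvision transforms; the pipeline object name is illustrative, not from the patent):

```python
from torchvision import transforms

# Resize(448) scales the short side to 448 while keeping the aspect ratio;
# RandomCrop(448) then takes a random 448x448 patch.
train_transform = transforms.Compose([
    transforms.Resize(448),
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(448),
    transforms.ToTensor(),
])
```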
(2) Constructing the set of samples with inconsistent predictions by judging whether the predictions of the peer networks differ;
As shown in fig. 2, the set of samples with inconsistent predictions is constructed according to whether the predictions of the peer networks differ, expressed as:

$$G_d=\{(x_i,y_i)\in B:\hat{y}_i^{h_1}\neq\hat{y}_i^{h_2}\}$$

where $G_d$ contains the samples for which the two sub-networks predict different labels.
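For illustration, given the per-sample predictions of the two networks, this split reduces to a boolean comparison; a minimal PyTorch sketch (function and variable names are illustrative, not from the patent):

```python
import torch

def split_by_agreement(pred1, pred2):
    """Return index tensors for the disagreement set G_d and agreement set G_s."""
    agree = pred1.eq(pred2)
    idx_d = torch.nonzero(~agree, as_tuple=True)[0]  # G_d: predictions differ
    idx_s = torch.nonzero(agree, as_tuple=True)[0]   # G_s: predictions match
    return idx_d, idx_s
```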
(3) Constructing the set of samples with consistent predictions by judging whether the predictions of the peer networks are the same, then selecting the samples with low loss values from this set, each network using its selection to update the other network's parameters;
The set of samples with consistent predictions is constructed by judging whether the predictions of the peer networks are the same, specifically:

$$G_s=\{(x_i,y_i)\in B:\hat{y}_i^{h_1}=\hat{y}_i^{h_2}\}$$

where $G_s$ contains the samples for which the two sub-networks predict the same label. Subsequently, the samples in $G_s$ are sorted in ascending order of their cross entropy loss values, and each network selects the $(1-d(T))\times 100\%$ of samples with the lowest loss to construct $\bar{G}_s^{h_1}$ and $\bar{G}_s^{h_2}$, expressed as:

$$\bar{G}_s^{h_k}=\underset{G':\,|G'|\geq(1-d(T))\,|G_s|}{\arg\min}\;\sum_{(x_i,y_i)\in G'}L_{h_k}(x_i,y_i),\qquad k\in\{1,2\}$$

where $|G_s|$ is the number of training samples in the set $G_s$, and $d(T)$, which dynamically adjusts the number of samples in the selected subsets, is designed as:

$$d(T)=\xi\cdot\min\!\left(\frac{T}{T_k},\,1\right)$$

where $\xi$ is the preset maximum drop rate, set to 0.25, and $T_k$ is the number of training epochs required to reach the maximum drop rate, set to 10.
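For illustration, the low-loss selection within $G_s$ can be sketched as follows, where `losses` holds one network's per-sample cross entropy losses over the agreement set (names are illustrative):

```python
import torch

def select_low_loss(losses, T, xi=0.25, Tk=10):
    """Return indices of the (1 - d(T)) * 100% lowest-loss samples, i.e.
    the subset this network hands to its peer for the parameter update."""
    d = xi * min(T / Tk, 1.0)
    num_keep = int((1.0 - d) * losses.numel())
    return torch.argsort(losses)[:num_keep]   # ascending loss order
```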
(4) Peer network $h_1$ updates its parameters with the prediction-inconsistent samples and with the low-loss samples selected by $h_2$ from the prediction-consistent set, and peer network $h_2$ updates its parameters with the prediction-inconsistent samples and with the low-loss samples selected by $h_1$.
Peer network $h_1$ updates its network parameters using the samples in $G_d$ and $\bar{G}_s^{h_2}$, and peer network $h_2$ updates its network parameters using the samples in $G_d$ and $\bar{G}_s^{h_1}$, expressed as:

$$w_{h_1}\leftarrow w_{h_1}-\lambda\,\nabla L_{h_1}\big(G_d\cup\bar{G}_s^{h_2}\big),\qquad w_{h_2}\leftarrow w_{h_2}-\lambda\,\nabla L_{h_2}\big(G_d\cup\bar{G}_s^{h_1}\big)$$

where $w_{h_1}$ and $w_{h_2}$ are the parameters of networks $h_1$ and $h_2$ respectively, $\lambda$ is the learning rate, and $\nabla L_{h_1}$ and $\nabla L_{h_2}$ are the back-propagated gradients.
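Combining steps (2)-(4), one batch update might be sketched as below. This follows the equations above under simplifying assumptions (the learning rate lives in the optimizers; no handling of an empty update set) and is not the patent's reference implementation.

```python
import torch
import torch.nn.functional as F

def peer_learning_step(h1, h2, opt1, opt2, images, labels, T, xi=0.25, Tk=10):
    """One batch update for both peer networks."""
    logits1, logits2 = h1(images), h2(images)
    loss1 = F.cross_entropy(logits1, labels, reduction='none')
    loss2 = F.cross_entropy(logits2, labels, reduction='none')

    agree = logits1.argmax(1).eq(logits2.argmax(1))
    idx_d = torch.nonzero(~agree, as_tuple=True)[0]          # G_d
    idx_s = torch.nonzero(agree, as_tuple=True)[0]           # G_s

    # each network ranks G_s by ITS OWN loss; its peer trains on that subset
    num_keep = int((1.0 - xi * min(T / Tk, 1.0)) * idx_s.numel())
    sel1 = idx_s[torch.argsort(loss1[idx_s].detach())[:num_keep]]  # chosen by h1
    sel2 = idx_s[torch.argsort(loss2[idx_s].detach())[:num_keep]]  # chosen by h2

    upd1 = torch.cat([idx_d, sel2])   # h1 trains on G_d plus h2's selection
    upd2 = torch.cat([idx_d, sel1])   # h2 trains on G_d plus h1's selection

    opt1.zero_grad()
    loss1[upd1].mean().backward()
    opt1.step()
    opt2.zero_grad()
    loss2[upd2].mean().backward()
    opt2.step()
```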
The hyper-parameters are set as follows: Adam is selected as the optimizer, and a two-stage training strategy is adopted. In the first stage, the parameters of the convolutional layers are frozen and only the fully connected layers are updated; the learning rate and batch size are set to 0.001 and 64 respectively. In the second stage, the parameters of all layers participate in the optimization; the learning rate and batch size are set to 0.0001 and 32 respectively. The first stage is trained for 100 epochs and the second stage for 200 epochs.
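For illustration, the two-stage optimizer setup can be sketched as follows; the attribute names `features` (convolutional layers) and `classifier` (fully connected head) are assumptions about the network object, not names from the patent.

```python
import torch

def make_stage_optimizer(net, stage):
    """Stage 1: freeze conv layers, train the FC head at lr 0.001.
    Stage 2: unfreeze everything and fine-tune at lr 0.0001."""
    if stage == 1:
        for p in net.features.parameters():   # assumed conv-layer container
            p.requires_grad = False
        return torch.optim.Adam(net.classifier.parameters(), lr=1e-3)
    for p in net.parameters():                # stage 2: all layers trainable
        p.requires_grad = True
    return torch.optim.Adam(net.parameters(), lr=1e-4)
```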
After training is finished, the test set is used for evaluation: the test images are input into the trained deep neural networks for image recognition, finally yielding the image classification predictions. The effectiveness of the invention's sample selection is shown in fig. 3.
The invention is compared with the following 5 state-of-the-art label-noise-robust learning algorithms, with Average Classification Accuracy (ACA) adopted as the evaluation metric for recognition; the higher the ACA, the better the recognition performance. The 5 algorithms are:
[1] Malach E, Shalev-Shwartz S. Decoupling "when to update" from "how to update" [C]//Proceedings of the Advances in Neural Information Processing Systems (NeurIPS). 2017: 960–970.

[2] Jiang L, Zhou Z, Leung T, et al. MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels [C]//Proceedings of the International Conference on Machine Learning (ICML). 2309–2318.

[3] Han B, Yao Q, Yu X, et al. Co-teaching: Robust training of deep neural networks with extremely noisy labels [C]//Proceedings of the Advances in Neural Information Processing Systems (NeurIPS). 2018: 8527–8537.

[4] Yu X, Han B, Yao J, et al. How does disagreement help generalization against label corruption? [C]//Proceedings of the International Conference on Machine Learning (ICML). 2019: 7164–7173.

[5] Wei H, Feng L, Chen X, et al. Combating noisy labels by agreement: A joint training method with co-regularization [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2020: 13726–13735.
TABLE 1 Image recognition performance comparison

Method | Backbone network | ACA (%)
---|---|---
Decoupling [1] | BCNN | 70.56
MentorNet [2] | BCNN | 71.16
Co-teaching [3] | BCNN | 73.85
Co-teaching+ [4] | BCNN | 69.91
JoCoR [5] | BCNN | 75.29
The invention | BCNN | 76.48
As can be seen from Table 1, the invention improves the recognition performance of the whole model by training two networks simultaneously, the two peer networks updating each other through "self-learning" (updating a network with the samples whose predictions disagree) and "mutual communication" (exchanging the low-loss prediction-consistent samples to update each other).
Claims (5)
1. A sample selection algorithm based on peer learning, characterized in that: the method comprises the following steps:
(1) training two deep convolutional neural networks simultaneously, namely peer networks $h_1$ and $h_2$, which respectively perform class prediction on input samples and calculate cross entropy loss;
(2) constructing the set of samples with inconsistent predictions by judging whether the predictions of the peer networks differ;
(3) constructing the set of samples with consistent predictions by judging whether the predictions of the peer networks are the same, then selecting the samples with low loss values from this set, each network using its selection to update the other network's parameters;
(4) peer network $h_1$ updating its parameters with the prediction-inconsistent samples and with the low-loss samples selected by $h_2$ from the prediction-consistent set, and peer network $h_2$ updating its parameters with the prediction-inconsistent samples and with the low-loss samples selected by $h_1$.
2. The peer-learning based sample selection algorithm of claim 1, wherein: in step (1), the two simultaneously trained deep convolutional neural networks are BCNN networks $h_1$ and $h_2$ pre-trained on ImageNet, with random initialization of the fully connected layers guaranteeing the dissimilarity of $h_1$ and $h_2$; the training data input to the networks in one batch is denoted $B=\{(x_i,y_i)\}_{i=1}^{m}$, where $m$ is the batch size; an image $x_i$ and its unreliable label $y_i$ are input to $h_1$ and $h_2$, yielding the corresponding class predictions $\hat{y}_i^{h_1}$ and $\hat{y}_i^{h_2}$ and cross entropy losses $L_{h_1}(x_i,y_i)$ and $L_{h_2}(x_i,y_i)$.
3. The peer-learning based sample selection algorithm of claim 1, wherein: in step (2), the set of samples with inconsistent predictions is constructed by judging whether the predictions of the peer networks differ, specifically $G_d=\{(x_i,y_i)\in B:\hat{y}_i^{h_1}\neq\hat{y}_i^{h_2}\}$, where $G_d$ contains the samples for which the two sub-networks predict different labels.
4. The peer-learning based sample selection algorithm of claim 1, wherein: in step (3), the set of samples with consistent predictions is constructed by judging whether the predictions of the peer networks are the same, specifically $G_s=\{(x_i,y_i)\in B:\hat{y}_i^{h_1}=\hat{y}_i^{h_2}\}$, where $G_s$ contains the samples for which the two sub-networks predict the same label; subsequently, the samples in $G_s$ are sorted in ascending order of their cross entropy loss values, and each network selects the $(1-d(T))\times 100\%$ of samples with the lowest loss to construct $\bar{G}_s^{h_1}$ and $\bar{G}_s^{h_2}$, expressed as:

$$\bar{G}_s^{h_k}=\underset{G':\,|G'|\geq(1-d(T))\,|G_s|}{\arg\min}\;\sum_{(x_i,y_i)\in G'}L_{h_k}(x_i,y_i),\qquad k\in\{1,2\}$$

where $|G_s|$ is the number of training samples in the set $G_s$, and $d(T)$, which dynamically adjusts the number of samples in the selected subsets, is designed as $d(T)=\xi\cdot\min(T/T_k,\,1)$, where $\xi$ is the preset maximum drop rate and $T_k$ is the number of training epochs required to reach the maximum drop rate.
5. The peer-learning based sample selection algorithm of claim 1, wherein: in step (4), peer network $h_1$ updates its network parameters using the samples in $G_d$ and $\bar{G}_s^{h_2}$, and peer network $h_2$ updates its network parameters using the samples in $G_d$ and $\bar{G}_s^{h_1}$, expressed as:

$$w_{h_1}\leftarrow w_{h_1}-\lambda\,\nabla L_{h_1}\big(G_d\cup\bar{G}_s^{h_2}\big),\qquad w_{h_2}\leftarrow w_{h_2}-\lambda\,\nabla L_{h_2}\big(G_d\cup\bar{G}_s^{h_1}\big)$$
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110458211.XA | 2021-04-27 | 2021-04-27 | Sample selection algorithm based on companion learning
Publications (1)

Publication Number | Publication Date
---|---
CN113159294A | 2021-07-23
Family ID: 76871185

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202110458211.XA | Sample selection algorithm based on companion learning | 2021-04-27 | 2021-04-27

Country Status (1)

Country | Link
---|---
CN | CN113159294A (en)
Cited By (3)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN114170461A | 2021-12-02 | 2022-03-11 | 匀熵教育科技(无锡)有限公司 | Teacher-student framework image classification method containing noise labels based on feature space reorganization
CN114170461B | 2021-12-02 | 2024-02-27 | 匀熵智能科技(无锡)有限公司 | Noise-containing label image classification method based on feature space reorganization for teacher and student architecture
CN115457337A | 2022-10-29 | 2022-12-09 | 南京理工大学 | Image classification method containing fine-grained noise based on label distribution learning
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20210723