CN113159294A - Sample selection algorithm based on companion learning
- Publication number: CN113159294A
- Application number: CN202110458211.XA
- Authority: CN (China)
- Prior art keywords: network, samples, networks, peer, predictions
- Prior art date: 2021-04-27
- Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/045: Computing arrangements based on biological models; neural networks; architecture; combinations of networks
- G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/08: Computing arrangements based on biological models; neural networks; learning methods
Abstract
The invention discloses a sample selection algorithm based on peer learning, which comprises the following steps: (1) two deep convolutional neural networks are trained simultaneously; images are input into both networks for forward computation, and each network predicts the image class and calculates its cross entropy loss; (2) each network updates its own parameters using the samples on which the two networks' predictions disagree; (3) from the samples on which the predictions agree, each network selects the samples with low loss values and uses them to update the other network's parameters. In this way, the two peer sub-networks improve the final recognition performance through "self-learning" (updating a network with the samples whose predictions disagree) and "mutual communication" (exchanging low-loss prediction-consistent samples to update each other), effectively alleviating the label noise present in web image datasets. In addition, the invention can effectively select samples beneficial to model training from a dataset containing label noise, and can be widely applied to various scene tasks with unreliable labels.
Description
Technical Field
The invention relates to a robust image recognition method for the situation where the labels of a web image dataset are unreliable, and in particular to a sample selection algorithm based on peer learning (companion learning).
Background
Training image recognition models with web images is attracting more and more researchers. However, before a model can be trained on large amounts of web data, the label noise that web image datasets cannot avoid is the first difficulty to be solved.
Due to the "memorization effect" of deep convolutional neural networks, the noisy labels (i.e., wrong labels) of images are "memorized" during network training, so the model fits the wrong labels and its performance ultimately degrades. Current methods for handling label noise can be divided mainly into the following two categories.
The first category is label (loss) correction methods, which can be further divided into two sub-classes according to whether the object of correction is the label or the loss. The first sub-class corrects the labels of the training data, solving the label noise problem by improving the quality of the original labels; a common approach is to correct wrong labels through a clean-label prediction step, and some extra clean data is sometimes needed to assist model training in this process. The second sub-class compensates for the misleading effect of wrong labels during training by directly correcting the loss, or by correcting the probability distribution used to compute the loss.
The second category is sample selection methods. Intuitively, the simplest way to deal with label noise is to find the noisy data, remove it, and then train the neural network on the remaining data. The difficulty with this type of approach, however, is how to correctly pick out the noisy data without reliable label supervision. Representative algorithms include MentorNet, Decoupling, and Co-teaching. The MentorNet algorithm trains a mentor network to supervise the training of a student network, selecting samples for the student and assigning high weights to the samples whose labels are likely correct. The Decoupling algorithm trains two neural networks simultaneously and updates their parameters only on the samples where their predictions disagree. The Co-teaching algorithm also trains two neural networks simultaneously; during training, each network learns from the low-loss samples selected by the other and updates its parameters accordingly. However, each has its own problems. Without a validation set, MentorNet can only be trained in a self-paced manner and selects samples with a predefined rule, so errors accumulate; Decoupling cannot handle noise explicitly and likewise suffers from error accumulation; in Co-teaching, the two networks gradually converge to each other as training proceeds, so the whole model eventually degenerates into self-paced MentorNet.
Disclosure of Invention
The purpose of the invention is as follows: the invention provides a sample selection algorithm based on peer learning, which trains two networks simultaneously. Among the samples with consistent predictions, the two networks select clean samples for each other and filter out different errors through mutual "communication", avoiding the accumulated-error problem; at the same time, divergence samples are introduced so that the two networks always remain divergent during training, yielding better performance.
The technical scheme is as follows: the sample selection algorithm based on peer learning comprises the following steps:
(1) simultaneously training two deep convolutional neural networks, namely peer networks, which respectively perform class prediction on input samples and calculate cross entropy loss;
(2) constructing the set of samples with inconsistent predictions by judging whether the predictions of the peer networks differ;
(3) constructing the set of samples with consistent predictions by judging whether the predictions of the peer networks are the same, then selecting the samples with low loss values from this set, each network using its selection to update the other network's parameters;
(4) peer network $h_1$ updating its parameters with the prediction-inconsistent samples and with the low-loss samples selected by $h_2$ from the prediction-consistent set, and peer network $h_2$ updating its parameters with the prediction-inconsistent samples and with the low-loss samples selected by $h_1$.
Preferably, in step (1), the two simultaneously trained deep convolutional neural networks are BCNN networks $h_1$ and $h_2$ pre-trained on ImageNet, with random initialization of the fully connected layers guaranteeing the dissimilarity of $h_1$ and $h_2$. The training data input to the networks in one batch is denoted $B=\{(x_i,y_i)\}_{i=1}^{m}$, where $m$ is the batch size. An image $x_i$ and its unreliable label $y_i$ are input to $h_1$ and $h_2$, yielding the corresponding class predictions $\hat{y}_i^{h_1}$ and $\hat{y}_i^{h_2}$ and cross entropy losses $L_{h_1}(x_i,y_i)$ and $L_{h_2}(x_i,y_i)$.
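For illustration, a minimal PyTorch sketch of this forward step is given below; it assumes `h1` and `h2` are two independently initialized classification networks (e.g., BCNNs with ImageNet-pretrained convolutional layers) and is a sketch, not the patent's reference implementation.

```python
import torch
import torch.nn.functional as F

def forward_peers(h1, h2, images, labels):
    """Forward a batch through both peer networks; return per-sample class
    predictions and cross entropy losses (reduction='none' keeps one loss
    value per sample, as needed for the later sample selection)."""
    logits1, logits2 = h1(images), h2(images)
    pred1 = logits1.argmax(dim=1)   # class prediction of h1 for each sample
    pred2 = logits2.argmax(dim=1)   # class prediction of h2 for each sample
    loss1 = F.cross_entropy(logits1, labels, reduction='none')
    loss2 = F.cross_entropy(logits2, labels, reduction='none')
    return pred1, pred2, loss1, loss2
```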
Preferably, in step (2), the set of samples with inconsistent predictions is constructed by judging whether the predictions of the peer networks differ, specifically

$$G_d=\{(x_i,y_i)\in B:\hat{y}_i^{h_1}\neq\hat{y}_i^{h_2}\}$$

where $G_d$ contains the samples for which the two sub-networks predict different labels.
For a dual sub-network architecture, the difference in predictive ability between the sub-networks helps improve overall model performance. The training data are therefore divided according to the consistency of the sub-network predictions, and the prediction-inconsistent training samples $G_d$ form one part of the sample set that ultimately participates in the network update.
Preferably, in step (3), the set of samples with consistent predictions is constructed by judging whether the predictions of the peer networks are the same, specifically

$$G_s=\{(x_i,y_i)\in B:\hat{y}_i^{h_1}=\hat{y}_i^{h_2}\}$$

where $G_s$ contains the samples for which the two sub-networks predict the same label. Subsequently, the samples in $G_s$ are sorted in ascending order of their cross entropy loss values, and each network selects the $(1-d(T))\times 100\%$ of samples with the lowest loss to construct $\bar{G}_s^{h_1}$ and $\bar{G}_s^{h_2}$, expressed as:

$$\bar{G}_s^{h_k}=\underset{G':\,|G'|\geq(1-d(T))\,|G_s|}{\arg\min}\;\sum_{(x_i,y_i)\in G'}L_{h_k}(x_i,y_i),\qquad k\in\{1,2\}$$

where $|G_s|$ is the number of training samples in the set $G_s$, and $d(T)$, which dynamically adjusts the number of samples in the selected subsets, is designed as:

$$d(T)=\xi\cdot\min\!\left(\frac{T}{T_k},\,1\right)$$

where $\xi$ is the preset maximum drop rate and $T_k$ is the number of training epochs required to reach the maximum drop rate.
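For illustration, this drop-rate schedule can be written as a one-line function; the linear ramp from 0 to $\xi$ over the first $T_k$ epochs matches the reconstructed formula above, and the parameter defaults follow the embodiment below ($\xi=0.25$, $T_k=10$).

```python
def drop_rate(T, xi=0.25, Tk=10):
    """d(T): rises linearly from 0 to the maximum drop rate xi over the
    first Tk epochs, then stays at xi."""
    return xi * min(T / Tk, 1.0)
```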
Since the label noise in $G_d$ is not explicitly processed, the invention, in order to reduce the negative influence of label noise while making maximal use of the collected web data for model training, selects samples from $G_s$ as described above and uses the selected subsets as the other part of the sample set that ultimately participates in the network update. Through this exchange process, the two sub-networks, which have different characteristics and discriminative abilities, filter out different errors for each other, preventing the gradient errors produced by label noise during training from gradually accumulating through self-feedback.
Preferably, in step (4), peer network $h_1$ updates its network parameters using the samples in $G_d$ and $\bar{G}_s^{h_2}$, and peer network $h_2$ updates its network parameters using the samples in $G_d$ and $\bar{G}_s^{h_1}$, expressed as:

$$w_{h_1}\leftarrow w_{h_1}-\lambda\,\nabla L_{h_1}\big(G_d\cup\bar{G}_s^{h_2}\big),\qquad w_{h_2}\leftarrow w_{h_2}-\lambda\,\nabla L_{h_2}\big(G_d\cup\bar{G}_s^{h_1}\big)$$

where $w_{h_1}$ and $w_{h_2}$ are the parameters of networks $h_1$ and $h_2$ respectively, $\lambda$ is the learning rate, and $\nabla L_{h_1}$ and $\nabla L_{h_2}$ are the back-propagated gradients.
Beneficial effects: compared with the prior art, the invention has the following notable advantages. (1) By exchanging the low-loss portion of the prediction-consistent samples, the peer networks jointly filter out different errors during the "exchange" training process, avoiding the continual accumulation of training errors caused by label noise. (2) By also training on the prediction-inconsistent samples, on the one hand the "hard" samples among them contribute greatly to learning the model's representation ability; on the other hand, when only prediction-consistent samples are used for "exchange" training, the two network models eventually converge to each other as training proceeds, whereas introducing prediction-inconsistent samples into training maintains the dissimilarity between the peer networks, ultimately improving overall model performance.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is an overall architecture diagram of the present invention;
FIG. 3 is a sample selection schematic of the present invention.
Detailed Description
The present invention will be described in detail with reference to examples.
As shown in fig. 1, the sample selection algorithm based on peer learning includes the following steps:
(1) simultaneously training two deep convolutional neural networks, namely peer networks, which respectively perform class prediction on input samples and calculate cross entropy loss;
As shown in fig. 2, two BCNN networks are trained simultaneously; their convolutional layers are pre-trained on ImageNet, and the dissimilarity of the two networks is ensured by randomly initializing their fully connected layers. Web images are crawled from the Bing image search engine using the class names of the 200 bird categories of the CUB200-2011 dataset, and after filtering, 18388 web images are finally obtained. These web images serve as the training set, and the 5794 test images of CUB200-2011 serve as the test set. The images are input into the two BCNN networks in batches.
Specifically, the preprocessing is as follows: each image is resized so that its short side is 448 pixels while maintaining the aspect ratio, then randomly flipped horizontally, and finally randomly cropped to 448×448. As shown in fig. 2, the preprocessed images are input into the two BCNNs; the training data input to the two BCNNs in one batch is denoted $B=\{(x_i,y_i)\}_{i=1}^{m}$, where $m$ is the batch size. An image $x_i$ and its unreliable label $y_i$ are input to $h_1$ and $h_2$, yielding the corresponding class predictions $\hat{y}_i^{h_1}$ and $\hat{y}_i^{h_2}$ and cross entropy losses $L_{h_1}(x_i,y_i)$ and $L_{h_2}(x_i,y_i)$.
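For illustration, a minimal torchvision sketch of this preprocessing pipeline (standard torchvision transforms; the pipeline object name is illustrative, not from the patent):

```python
from torchvision import transforms

# Resize(448) scales the short side to 448 while keeping the aspect ratio;
# RandomCrop(448) then takes a random 448x448 patch.
train_transform = transforms.Compose([
    transforms.Resize(448),
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(448),
    transforms.ToTensor(),
])
```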
(2) Constructing the set of samples with inconsistent predictions by judging whether the predictions of the peer networks differ;
As shown in fig. 2, the set of samples with inconsistent predictions is constructed according to whether the predictions of the peer networks differ, expressed as:

$$G_d=\{(x_i,y_i)\in B:\hat{y}_i^{h_1}\neq\hat{y}_i^{h_2}\}$$

where $G_d$ contains the samples for which the two sub-networks predict different labels.
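For illustration, given the per-sample predictions of the two networks, this split reduces to a boolean comparison; a minimal PyTorch sketch (function and variable names are illustrative, not from the patent):

```python
import torch

def split_by_agreement(pred1, pred2):
    """Return index tensors for the disagreement set G_d and agreement set G_s."""
    agree = pred1.eq(pred2)
    idx_d = torch.nonzero(~agree, as_tuple=True)[0]  # G_d: predictions differ
    idx_s = torch.nonzero(agree, as_tuple=True)[0]   # G_s: predictions match
    return idx_d, idx_s
```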
(3) Constructing the set of samples with consistent predictions by judging whether the predictions of the peer networks are the same, then selecting the samples with low loss values from this set, each network using its selection to update the other network's parameters;
The set of samples with consistent predictions is constructed by judging whether the predictions of the peer networks are the same, specifically:

$$G_s=\{(x_i,y_i)\in B:\hat{y}_i^{h_1}=\hat{y}_i^{h_2}\}$$

where $G_s$ contains the samples for which the two sub-networks predict the same label. Subsequently, the samples in $G_s$ are sorted in ascending order of their cross entropy loss values, and each network selects the $(1-d(T))\times 100\%$ of samples with the lowest loss to construct $\bar{G}_s^{h_1}$ and $\bar{G}_s^{h_2}$, expressed as:

$$\bar{G}_s^{h_k}=\underset{G':\,|G'|\geq(1-d(T))\,|G_s|}{\arg\min}\;\sum_{(x_i,y_i)\in G'}L_{h_k}(x_i,y_i),\qquad k\in\{1,2\}$$

where $|G_s|$ is the number of training samples in the set $G_s$, and $d(T)$, which dynamically adjusts the number of samples in the selected subsets, is designed as:

$$d(T)=\xi\cdot\min\!\left(\frac{T}{T_k},\,1\right)$$

where $\xi$ is the preset maximum drop rate, set to 0.25, and $T_k$ is the number of training epochs required to reach the maximum drop rate, set to 10.
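For illustration, the low-loss selection within $G_s$ can be sketched as follows, where `losses` holds one network's per-sample cross entropy losses over the agreement set (names are illustrative):

```python
import torch

def select_low_loss(losses, T, xi=0.25, Tk=10):
    """Return indices of the (1 - d(T)) * 100% lowest-loss samples, i.e.
    the subset this network hands to its peer for the parameter update."""
    d = xi * min(T / Tk, 1.0)
    num_keep = int((1.0 - d) * losses.numel())
    return torch.argsort(losses)[:num_keep]   # ascending loss order
```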
(4) Peer network $h_1$ updates its parameters with the prediction-inconsistent samples and with the low-loss samples selected by $h_2$ from the prediction-consistent set, and peer network $h_2$ updates its parameters with the prediction-inconsistent samples and with the low-loss samples selected by $h_1$.
Peer network $h_1$ updates its network parameters using the samples in $G_d$ and $\bar{G}_s^{h_2}$, and peer network $h_2$ updates its network parameters using the samples in $G_d$ and $\bar{G}_s^{h_1}$, expressed as:

$$w_{h_1}\leftarrow w_{h_1}-\lambda\,\nabla L_{h_1}\big(G_d\cup\bar{G}_s^{h_2}\big),\qquad w_{h_2}\leftarrow w_{h_2}-\lambda\,\nabla L_{h_2}\big(G_d\cup\bar{G}_s^{h_1}\big)$$

where $w_{h_1}$ and $w_{h_2}$ are the parameters of networks $h_1$ and $h_2$ respectively, $\lambda$ is the learning rate, and $\nabla L_{h_1}$ and $\nabla L_{h_2}$ are the back-propagated gradients.
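Combining steps (2)-(4), one batch update might be sketched as below. This follows the equations above under simplifying assumptions (the learning rate lives in the optimizers; no handling of an empty update set) and is not the patent's reference implementation.

```python
import torch
import torch.nn.functional as F

def peer_learning_step(h1, h2, opt1, opt2, images, labels, T, xi=0.25, Tk=10):
    """One batch update for both peer networks."""
    logits1, logits2 = h1(images), h2(images)
    loss1 = F.cross_entropy(logits1, labels, reduction='none')
    loss2 = F.cross_entropy(logits2, labels, reduction='none')

    agree = logits1.argmax(1).eq(logits2.argmax(1))
    idx_d = torch.nonzero(~agree, as_tuple=True)[0]          # G_d
    idx_s = torch.nonzero(agree, as_tuple=True)[0]           # G_s

    # each network ranks G_s by ITS OWN loss; its peer trains on that subset
    num_keep = int((1.0 - xi * min(T / Tk, 1.0)) * idx_s.numel())
    sel1 = idx_s[torch.argsort(loss1[idx_s].detach())[:num_keep]]  # chosen by h1
    sel2 = idx_s[torch.argsort(loss2[idx_s].detach())[:num_keep]]  # chosen by h2

    upd1 = torch.cat([idx_d, sel2])   # h1 trains on G_d plus h2's selection
    upd2 = torch.cat([idx_d, sel1])   # h2 trains on G_d plus h1's selection

    opt1.zero_grad()
    loss1[upd1].mean().backward()
    opt1.step()
    opt2.zero_grad()
    loss2[upd2].mean().backward()
    opt2.step()
```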
The hyper-parameters are set as follows: Adam is selected as the optimizer, and a two-stage training strategy is adopted. In the first stage, the parameters of the convolutional layers are frozen and only the fully connected layers are updated; the learning rate and batch size are set to 0.001 and 64 respectively. In the second stage, the parameters of all layers participate in the optimization; the learning rate and batch size are set to 0.0001 and 32 respectively. The first stage is trained for 100 epochs and the second stage for 200 epochs.
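For illustration, the two-stage optimizer setup can be sketched as follows; the attribute names `features` (convolutional layers) and `classifier` (fully connected head) are assumptions about the network object, not names from the patent.

```python
import torch

def make_stage_optimizer(net, stage):
    """Stage 1: freeze conv layers, train the FC head at lr 0.001.
    Stage 2: unfreeze everything and fine-tune at lr 0.0001."""
    if stage == 1:
        for p in net.features.parameters():   # assumed conv-layer container
            p.requires_grad = False
        return torch.optim.Adam(net.classifier.parameters(), lr=1e-3)
    for p in net.parameters():                # stage 2: all layers trainable
        p.requires_grad = True
    return torch.optim.Adam(net.parameters(), lr=1e-4)
```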
After training is finished, the test set is used for evaluation: the test images are input into the trained deep neural networks for image recognition, finally yielding the image classification predictions. The effectiveness of the invention's sample selection is shown in fig. 3.
The invention is compared with the following 5 state-of-the-art label-noise-robust learning algorithms, with Average Classification Accuracy (ACA) adopted as the evaluation metric for recognition; the higher the ACA, the better the recognition performance. The 5 algorithms are:
[1] Malach E, Shalev-Shwartz S. Decoupling "when to update" from "how to update" [C]//Proceedings of the Advances in Neural Information Processing Systems (NeurIPS). 2017: 960–970.

[2] Jiang L, Zhou Z, Leung T, et al. MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels [C]//Proceedings of the International Conference on Machine Learning (ICML). 2309–2318.

[3] Han B, Yao Q, Yu X, et al. Co-teaching: Robust training of deep neural networks with extremely noisy labels [C]//Proceedings of the Advances in Neural Information Processing Systems (NeurIPS). 2018: 8527–8537.

[4] Yu X, Han B, Yao J, et al. How does disagreement help generalization against label corruption? [C]//Proceedings of the International Conference on Machine Learning (ICML). 2019: 7164–7173.

[5] Wei H, Feng L, Chen X, et al. Combating noisy labels by agreement: A joint training method with co-regularization [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2020: 13726–13735.
TABLE 1 Image recognition performance comparison

Method | Backbone network | ACA (%)
---|---|---
Decoupling [1] | BCNN | 70.56
MentorNet [2] | BCNN | 71.16
Co-teaching [3] | BCNN | 73.85
Co-teaching+ [4] | BCNN | 69.91
JoCoR [5] | BCNN | 75.29
The invention | BCNN | 76.48
As can be seen from Table 1, the invention improves the recognition performance of the whole model by training two networks simultaneously, the two peer networks updating each other through "self-learning" (updating a network with the samples whose predictions disagree) and "mutual communication" (exchanging the low-loss prediction-consistent samples to update each other).
Claims (5)
1. A sample selection algorithm based on peer learning, characterized in that: the method comprises the following steps:
(1) training two deep convolutional neural networks simultaneously, namely peer networks $h_1$ and $h_2$, which respectively perform class prediction on input samples and calculate cross entropy loss;
(2) constructing the set of samples with inconsistent predictions by judging whether the predictions of the peer networks differ;
(3) constructing the set of samples with consistent predictions by judging whether the predictions of the peer networks are the same, then selecting the samples with low loss values from this set, each network using its selection to update the other network's parameters;
(4) peer network $h_1$ updating its parameters with the prediction-inconsistent samples and with the low-loss samples selected by $h_2$ from the prediction-consistent set, and peer network $h_2$ updating its parameters with the prediction-inconsistent samples and with the low-loss samples selected by $h_1$.
2. The peer-learning based sample selection algorithm of claim 1, wherein: in step (1), the two simultaneously trained deep convolutional neural networks are BCNN networks $h_1$ and $h_2$ pre-trained on ImageNet, with random initialization of the fully connected layers guaranteeing the dissimilarity of $h_1$ and $h_2$; the training data input to the networks in one batch is denoted $B=\{(x_i,y_i)\}_{i=1}^{m}$, where $m$ is the batch size; an image $x_i$ and its unreliable label $y_i$ are input to $h_1$ and $h_2$, yielding the corresponding class predictions $\hat{y}_i^{h_1}$ and $\hat{y}_i^{h_2}$ and cross entropy losses $L_{h_1}(x_i,y_i)$ and $L_{h_2}(x_i,y_i)$.
3. The peer-learning based sample selection algorithm of claim 1, wherein: in step (2), the set of samples with inconsistent predictions is constructed by judging whether the predictions of the peer networks differ, specifically $G_d=\{(x_i,y_i)\in B:\hat{y}_i^{h_1}\neq\hat{y}_i^{h_2}\}$, where $G_d$ contains the samples for which the two sub-networks predict different labels.
4. The peer-learning based sample selection algorithm of claim 1, wherein: in step (3), the set of samples with consistent predictions is constructed by judging whether the predictions of the peer networks are the same, specifically $G_s=\{(x_i,y_i)\in B:\hat{y}_i^{h_1}=\hat{y}_i^{h_2}\}$, where $G_s$ contains the samples for which the two sub-networks predict the same label; subsequently, the samples in $G_s$ are sorted in ascending order of their cross entropy loss values, and each network selects the $(1-d(T))\times 100\%$ of samples with the lowest loss to construct $\bar{G}_s^{h_1}$ and $\bar{G}_s^{h_2}$, expressed as:

$$\bar{G}_s^{h_k}=\underset{G':\,|G'|\geq(1-d(T))\,|G_s|}{\arg\min}\;\sum_{(x_i,y_i)\in G'}L_{h_k}(x_i,y_i),\qquad k\in\{1,2\}$$

where $|G_s|$ is the number of training samples in the set $G_s$, and $d(T)$, which dynamically adjusts the number of samples in the selected subsets, is designed as $d(T)=\xi\cdot\min(T/T_k,\,1)$, where $\xi$ is the preset maximum drop rate and $T_k$ is the number of training epochs required to reach the maximum drop rate.
5. The peer-learning based sample selection algorithm of claim 1, wherein: in step (4), peer network $h_1$ updates its network parameters using the samples in $G_d$ and $\bar{G}_s^{h_2}$, and peer network $h_2$ updates its network parameters using the samples in $G_d$ and $\bar{G}_s^{h_1}$, expressed as:

$$w_{h_1}\leftarrow w_{h_1}-\lambda\,\nabla L_{h_1}\big(G_d\cup\bar{G}_s^{h_2}\big),\qquad w_{h_2}\leftarrow w_{h_2}-\lambda\,\nabla L_{h_2}\big(G_d\cup\bar{G}_s^{h_1}\big)$$
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110458211.XA | 2021-04-27 | 2021-04-27 | Sample selection algorithm based on companion learning
Publications (1)

Publication Number | Publication Date
---|---
CN113159294A | 2021-07-23
Family ID: 76871185

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202110458211.XA | Sample selection algorithm based on companion learning | 2021-04-27 | 2021-04-27

Country Status (1)

Country | Link
---|---
CN | CN113159294A (en)
Cited By (3)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN114170461A | 2021-12-02 | 2022-03-11 | 匀熵教育科技(无锡)有限公司 | Teacher-student framework image classification method containing noise labels based on feature space reorganization
CN114170461B | 2021-12-02 | 2024-02-27 | 匀熵智能科技(无锡)有限公司 | Noise-containing label image classification method based on feature space reorganization for teacher and student architecture
CN115457337A | 2022-10-29 | 2022-12-09 | 南京理工大学 | Image classification method containing fine-grained noise based on label distribution learning
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20210723