CN113688949B - Network image data set denoising method based on dual-network joint label correction - Google Patents

Network image data set denoising method based on dual-network joint label correction

Info

Publication number
CN113688949B
Authority
CN
China
Prior art keywords
deep neural
neural networks
network
training
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111237302.7A
Other languages
Chinese (zh)
Other versions
CN113688949A (en)
Inventor
姚亚洲
孙泽人
陈涛
张传一
沈复民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Code Geek Technology Co ltd
Original Assignee
Nanjing Code Geek Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Code Geek Technology Co ltd filed Critical Nanjing Code Geek Technology Co ltd
Priority to CN202111237302.7A
Publication of CN113688949A
Application granted
Publication of CN113688949B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a network image data set denoising method based on dual-network joint label correction. Two identical deep neural networks are randomly initialized separately and trained on the network data set; each network performs sample selection, and the data are divided into clean samples, internal noise and irrelevant noise according to the selection results. The softmax probabilities of the two deep neural networks are smoothed and their weighted average is taken as the true label of each internal-noise sample to correct the internal noise; the cross-entropy loss is then calculated together with the clean samples and used to update the two deep neural networks respectively. Through the joint training the two deep neural networks become increasingly accurate, and their predictions for an image converge. Compared with a single-network correction method, the accuracy of label correction is significantly improved and the method is more practical.

Description

Network image data set denoising method based on dual-network joint label correction
Technical Field
The invention belongs to the technical field of image data processing, and particularly relates to a network image data set denoising method based on dual-network joint label correction.
Background
The image classification task, as a basic task in computer vision, has wide application prospects and can be roughly divided into coarse-grained and fine-grained image classification. With the development of modern computing, deep neural networks have become the dominant approach to image classification. With the rapid development of the Internet, information platforms generate a large amount of multimedia information every day, which contains a large amount of image data. Compared with manually labelled data sets, web data are abundant and easy to acquire. Some search engines support image retrieval by keyword, so it is easy to obtain a large amount of image data from label text. However, because the accuracy of information on the Internet cannot be guaranteed, if the images retrieved from the web are directly labelled with the retrieval text and assembled into a training set, a large amount of label noise is introduced, and this label noise can seriously degrade the performance of the classifier.
Network image data sets are used to reduce the dependence of fine-grained image classification on finely hand-labelled data sets, but they contain both internal noise and irrelevant noise. Because of the label noise in the training set of a network data set, a deep neural network trained on it can be driven in a wrong learning direction by its random initialization before training or by stochastic gradient descent during learning. At the same time, because a network data set contains internal noise in addition to irrelevant noise, simply discarding all noisy data would also discard the internal noise that, once corrected, could still be used for deep neural network training.
Current research on handling label noise in training data falls mainly into two categories. One is sample selection, which trains the deep neural network mainly on the clean samples selected from the training data. The other is label or loss correction, which corrects mislabelled data through label correction or loss correction. These denoising methods are all based on data sets whose labels are perturbed artificially; for a network data set, however, the noise rate is unknown, the noise has no regular distribution, and irrelevant noise exists, so these methods cannot be applied to network data sets without modification.
Disclosure of Invention
The invention aims to provide a network image data set denoising method based on dual-network joint label correction, so as to solve the above problems.
The invention is mainly realized by the following technical scheme:
a network image data set denoising method based on dual-network joint label correction comprises the following steps:
step S100: acquiring a network data set and dividing it to obtain a training set D;
step S200: using the training set D to pre-train two identical deep neural networks A and B, respectively;
step S300: randomly dividing the training set D into a number of mini-batches and taking the training set D_m of the m-th mini-batch; using the pre-trained deep neural networks A and B to perform sample selection on D_m separately, dividing the samples into clean samples, internal noise samples and irrelevant noise samples according to the selection results, and obtaining the lower-loss sample sets D_A and D_B selected at a discard rate of δ%, respectively;
step S400: smoothing the softmax probabilities of the two deep neural networks and taking the weighted average as the true label of each internal noise sample to correct the internal noise; then calculating the cross-entropy loss together with the clean samples and using it to update the two deep neural networks respectively;
step S500: repeating steps S200-S400 until a set number of iterations is reached, obtaining the trained deep neural networks;
step S600: inputting the image to be detected into the trained deep neural network for prediction and classification, thereby realizing denoising of the image data set.
The mini-batch in the invention is the set of samples used for one parameter update; one pass over the whole data set is called an epoch. Mini-batch and epoch are conventional terms in the field and are not described in detail here.
The invention trains two identical, separately randomly initialized deep neural networks on the network data set, performs sample selection with each of them, and divides the data into clean samples, internal noise and irrelevant noise according to the selection results. The softmax probabilities of the two deep neural networks are smoothed and their weighted average is taken as the true label of each internal-noise sample to correct the internal noise; the cross-entropy loss is then calculated together with the clean samples and used to update the two deep neural networks respectively. Through the joint training the two deep neural networks become increasingly accurate, and their predictions for an image converge.
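For illustration only, one possible shape of this training schedule (pre-training in step S200 followed by the joint epochs of steps S300-S500) is sketched below in PyTorch-style Python; the function names (train_dual_networks, joint_step), the hyper-parameter values and the use of Adam are assumptions of the sketch rather than limitations of the invention, and a concrete joint_step covering steps S401-S403 is sketched after the description of step S400 below.

```python
# Illustrative PyTorch-style sketch of the overall training schedule (steps S200-S500).
# `joint_step` is assumed to implement the per-mini-batch selection and label
# correction of steps S300-S400; a version of it is sketched later in this document.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

def train_dual_networks(train_set, net_a, net_b, joint_step,
                        pretrain_epochs=15, joint_epochs=200,
                        batch_size=64, lr=1e-3):
    """Pre-train two identical networks independently, then run joint correction."""
    opt_a = torch.optim.Adam(net_a.parameters(), lr=lr)
    opt_b = torch.optim.Adam(net_b.parameters(), lr=lr)
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)

    # Step S200: independent pre-training with plain cross-entropy (eqs. 6.1-6.2).
    for _ in range(pretrain_epochs):
        for images, labels in loader:
            for net, opt in ((net_a, opt_a), (net_b, opt_b)):
                loss = F.cross_entropy(net(images), labels)
                opt.zero_grad()
                loss.backward()
                opt.step()

    # Steps S300-S500: joint selection and label correction on every mini-batch D_m,
    # repeated for the set number of epochs.
    for _ in range(joint_epochs):
        for images, labels in loader:
            joint_step(net_a, net_b, opt_a, opt_b, images, labels)
    return net_a, net_b
```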
In order to better implement the present invention, further, the specific steps of step S200 are as follows:
the images in the training set D are denoted x_i, the label of image x_i is y_i, and the true label of image x_i is y_i*, with

y_i ∈ {1, 2, …, k},  i = 1, 2, …, N,

where k is the number of categories of the network data set and N is the number of samples of the training set in the network data set;
the two identical deep neural networks are each pre-trained on the training set D for T_k epochs, using the cross-entropy loss for back propagation and parameter updating; the softmax output of each of the two deep neural networks is

p_θ(s | x_i) = exp(f_s(x_i)) / Σ_{j=1}^{k} exp(f_j(x_i))        (6.1)

where f_s(x_i) is the output, for category s, of the fully connected layer before the softmax layer of the deep neural network, and p_θ(s | x_i) is the output of the softmax layer of the deep neural network (with parameters θ) for category s;

the cross-entropy loss is obtained by accumulating, over the images, the cross entropy between the label of each image and the softmax output of the deep neural network for that image:

L = - Σ_{i=1}^{N} log p_θ(y_i | x_i)        (6.2)
during the pre-training, the two deep neural networks are updated with the cross-entropy loss independently and do not influence each other; because they have different initializations and different stochastic-gradient-descent directions during pre-training, the two deep neural networks end up with different learned behaviour after pre-training, which enables the subsequent joint learning.
In order to better implement the present invention, further, the specific steps of step S300 are as follows:
step S301: after the pre-training is finished, the softmax probabilities of the two deep neural networks are smoothed separately:

p_θ^τ(s | x_i) = exp(f_s(x_i)/τ) / Σ_{j=1}^{k} exp(f_j(x_i)/τ)        (6.3)

where τ is the smoothing coefficient; when τ equals 1, no smoothing of the softmax probabilities is performed, otherwise smoothing is performed;
step S302: in each mini-batch, the cross-entropy loss is calculated with the smoothed softmax outputs and used to perform sample selection on D_m; each network selects its lower-loss samples at a discard rate of δ%, giving D_A and D_B;

the smoothed loss is calculated as

L^τ(x_i) = - log p_θ^τ(y_i | x_i)        (6.4)

and sample selection is then performed as

D_A = argmin_{D' ⊂ D_m, |D'| ≥ (1 - δ%)·N_m} Σ_{x_i ∈ D'} L_A^τ(x_i),  D_B = argmin_{D' ⊂ D_m, |D'| ≥ (1 - δ%)·N_m} Σ_{x_i ∈ D'} L_B^τ(x_i)        (6.5)

where N_m is the number of samples in the m-th mini-batch.
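A minimal sketch of equations (6.3)-(6.5) in PyTorch-style Python is given below; the helper names are illustrative, and the reading of δ as a discard rate (each network keeping the (1 - δ) fraction of its lowest-loss samples) is an assumption consistent with the discard-rate wording used in embodiment 5.

```python
# Sketch of softmax smoothing and low-loss sample selection (eqs. 6.3-6.5).
# Assumption: delta is the discard rate, so each network keeps the (1 - delta)
# fraction of its lowest-loss samples in the current mini-batch.
import torch
import torch.nn.functional as F

def smoothed_softmax(logits: torch.Tensor, tau: float) -> torch.Tensor:
    # Equation (6.3): divide the pre-softmax outputs f_s(x) by the smoothing coefficient.
    return F.softmax(logits / tau, dim=1)

def smoothed_ce(logits: torch.Tensor, labels: torch.Tensor, tau: float) -> torch.Tensor:
    # Equation (6.4): per-sample cross entropy computed on the smoothed probabilities.
    return F.cross_entropy(logits / tau, labels, reduction="none")

def select_low_loss(per_sample_loss: torch.Tensor, delta: float) -> torch.Tensor:
    # Equation (6.5): keep the (1 - delta) lowest-loss samples of the mini-batch.
    n_keep = int(round((1.0 - delta) * per_sample_loss.numel()))
    mask = torch.zeros_like(per_sample_loss, dtype=torch.bool)
    mask[per_sample_loss.argsort()[:n_keep]] = True
    return mask  # True for the samples this network regards as probably clean

# Usage (D_A and D_B as boolean masks over the current mini-batch):
#   sel_a = select_low_loss(smoothed_ce(net_a(x), y, tau), delta)
#   sel_b = select_low_loss(smoothed_ce(net_b(x), y, tau), delta)
```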
In order to better implement the present invention, further, the specific steps of step S400 are as follows:
step S401: if a training sample of the current batch D_m is present in both of the sample sets D_A and D_B selected by the two deep neural networks A and B, the two mutually independent deep neural networks A and B both identify it as a clean sample, so the training sample is treated as clean and its label is used directly to calculate the back-propagation loss:

L_clean = - Σ_{x_i ∈ D_A ∩ D_B} log p_θ^τ(y_i | x_i)        (6.6)
step S402: if a training sample is present in only one of the sample sets D_A and D_B selected by the two deep neural networks A and B, the correct label of the image is estimated from the prediction probabilities of the two deep neural networks A and B for the image, so that label correction is performed when calculating the back-propagation loss; the smoothed softmax outputs of the two deep neural networks A and B are weighted-averaged:

p_avg(s | x_i) = ( p_A^τ(s | x_i) + p_B^τ(s | x_i) ) / 2        (6.7)

where p_A^τ(s | x_i) is the smoothed softmax output of deep neural network A and p_B^τ(s | x_i) is the smoothed softmax output of deep neural network B;

after the softmax probabilities are smoothed, the smoothed softmax probabilities of the two deep neural networks are combined and averaged, and the category with the maximum averaged softmax probability is taken as the final label and used to replace the label of the sample as the corrected label:

ŷ_i = argmax_s p_avg(s | x_i)        (6.8)
step S403: for the samples classified as internal noise after combining the two deep neural networks A and B, the corrected label replaces the original label when calculating the cross-entropy loss:

L_inner = - Σ_{x_i ∈ D_inner} log p_θ^τ(ŷ_i | x_i)        (6.9)

where D_inner is the set of internal-noise samples; combining equation (6.6) and equation (6.9) gives the loss calculated from the clean samples and from the internal noise with corrected labels, which is used to update the parameters of the two deep neural networks A and B respectively:

L = L_clean + L_inner        (6.10).
to better implement the present invention, further, in step S401, if the training sample is not in the training set selected by the two deep neural networks at the same timeD A AndD B if both deep neural networks A, B identify the sample as noise, then the sample is temporarily not used for back propagation in the epoch training; and the sample selection is carried out again in the new epoch training process, and new irrelevant noise samples which are not used for back propagation are determined again according to the selection results of the two deep neural networks A, B.
To better implement the present invention, further, the two deep neural networks A and B are BCNN networks, each composed of VGG16.
The invention provides a network image data set denoising method based on dual-network joint label correction. Because the two deep neural networks are randomly initialized differently and their learning processes do not influence each other during pre-training, the pre-trained deep neural networks give inconsistent predictions for the same training sample. The two deep neural networks each perform sample selection; clean samples, internal noise and irrelevant noise are distinguished according to the selection results; the internal-noise labels are corrected with the labels obtained by weighted-averaging the smoothed softmax probabilities of the two deep neural networks; and the corrected internal noise together with the clean samples is back-propagated to update the parameters of the deep neural networks.
The invention has the beneficial effects that:
(1) in the method, the softmax probabilities of the two deep neural networks are smoothed and their weighted average is used as the true label of each internal-noise sample to correct the internal noise; the cross-entropy loss is then calculated together with the clean samples and used to update the two deep neural networks respectively. Through the joint training the two deep neural networks become increasingly accurate and their predictions for an image converge, which improves the denoising precision of the image data set;
(2) the invention provides a dual-network joint label correction method, which to a certain extent avoids the wrong learning directions caused by random initialization before training or by stochastic gradient descent during learning, and therefore has better practicability;
(3) the method trains two identical, separately randomly initialized deep neural networks on the network data set; compared with a single-network correction method, the accuracy of label correction is significantly improved;
(4) the labels of the internal noise are corrected to the correct labels through label correction, so that the internal noise is recycled, and the method has better practicability.
Drawings
FIG. 1 is a schematic diagram of the principles of the present invention;
fig. 2 is a test accuracy rate variation curve of the invention on three network data sets in example 1.
Detailed Description
Example 1:
a network image data set denoising method based on dual-network joint label correction comprises the following steps:
step S100: acquiring a network data set, and dividing to obtain a training setD
Step S200: using training setsDPre-training the two identical deep neural networks A, B, respectively;
step S300: will train the setDAfter randomly dividing the training set into a plurality of mini-batch, obtaining the training set in the mth mini-batchD m Using the pre-trained deep neural network A, B to separately pair training setsD m Selecting samples, dividing the samples into clean samples, internal noise samples and irrelevant noise samples according to the selection result, and respectively obtaining samples with smaller loss and rejection rate delta percentD A AndD B
step S400: smoothing the softmax probabilities of the two deep neural networks, and taking the result of weighted average as a real label of the internal noise sample to correct the internal noise; then, the cross entropy loss is calculated by combining the clean samples and is respectively used for updating the two deep neural networks,
step S500: repeating the steps 200-400 until the set times are reached to obtain the trained deep neural network;
step S600: and inputting the image to be detected into the trained deep neural network for prediction classification, thereby realizing the denoising of the image data set.
The invention trains two identical, separately randomly initialized deep neural networks on the network data set, performs sample selection with each of them, and divides the data into clean samples, internal noise and irrelevant noise according to the selection results. The softmax probabilities of the two deep neural networks are smoothed and their weighted average is taken as the true label of each internal-noise sample to correct the internal noise; the cross-entropy loss is then calculated together with the clean samples and used to update the two deep neural networks respectively. Through the joint training the two deep neural networks become increasingly accurate, and their predictions for an image converge.
The invention provides a dual-network joint label correction method, which to a certain extent avoids the wrong learning directions caused by random initialization before training or by stochastic gradient descent during learning; the method trains two identical, separately randomly initialized deep neural networks on the network data set, and compared with a single-network correction method the accuracy of label correction is significantly improved.
Example 2:
in this embodiment, optimization is performed on the basis of embodiment 1, and the specific steps of step S200 are as follows:
the images in the training set D are denoted x_i, the label of image x_i is y_i, and the true label of image x_i is y_i*, with

y_i ∈ {1, 2, …, k},  i = 1, 2, …, N,

where k is the number of categories of the network data set and N is the number of samples of the training set in the network data set;
the two identical deep neural networks are each pre-trained on the training set D for T_k epochs, using the cross-entropy loss for back propagation and parameter updating; the softmax output of each of the two deep neural networks is

p_θ(s | x_i) = exp(f_s(x_i)) / Σ_{j=1}^{k} exp(f_j(x_i))        (6.1)

where f_s(x_i) is the output, for category s, of the fully connected layer before the softmax layer of the deep neural network, and p_θ(s | x_i) is the output of the softmax layer of the deep neural network (with parameters θ) for category s;

the cross-entropy loss is obtained by accumulating, over the images, the cross entropy between the label of each image and the softmax output of the deep neural network for that image:

L = - Σ_{i=1}^{N} log p_θ(y_i | x_i)        (6.2)
during the pre-training, the two deep neural networks are updated with the cross-entropy loss independently and do not influence each other; because they have different initializations and different stochastic-gradient-descent directions during pre-training, the two deep neural networks end up with different learned behaviour after pre-training, which enables the subsequent joint learning.
Further, the specific steps of step S300 are as follows:
step S301: after the pre-training is finished, the softmax probabilities of the two deep neural networks are smoothed separately:

p_θ^τ(s | x_i) = exp(f_s(x_i)/τ) / Σ_{j=1}^{k} exp(f_j(x_i)/τ)        (6.3)

where τ is the smoothing coefficient; when τ equals 1, no smoothing of the softmax probabilities is performed, otherwise smoothing is performed;
step S302: in each mini-batch, the cross-entropy loss is calculated with the smoothed softmax outputs and used to perform sample selection on D_m; each network selects its lower-loss samples at a discard rate of δ%, giving D_A and D_B;

the smoothed loss is calculated as

L^τ(x_i) = - log p_θ^τ(y_i | x_i)        (6.4)

and sample selection is then performed as

D_A = argmin_{D' ⊂ D_m, |D'| ≥ (1 - δ%)·N_m} Σ_{x_i ∈ D'} L_A^τ(x_i),  D_B = argmin_{D' ⊂ D_m, |D'| ≥ (1 - δ%)·N_m} Σ_{x_i ∈ D'} L_B^τ(x_i)        (6.5)

where N_m is the number of samples in the m-th mini-batch.
Other parts of this embodiment are the same as embodiment 1, and thus are not described again.
Example 3:
in this embodiment, optimization is performed on the basis of embodiment 1 or 2, and as shown in fig. 1, the specific steps of step S400 are as follows:
step S401: if a training sample of the current batch D_m is present in both of the sample sets D_A and D_B selected by the two deep neural networks A and B, the two mutually independent deep neural networks A and B both identify it as a clean sample, so the training sample is treated as clean and its label is used directly to calculate the back-propagation loss:

L_clean = - Σ_{x_i ∈ D_A ∩ D_B} log p_θ^τ(y_i | x_i)        (6.6)
step S402: if a training sample is present in only one of the sample sets D_A and D_B selected by the two deep neural networks A and B, the correct label of the image is estimated from the prediction probabilities of the two deep neural networks A and B for the image, so that label correction is performed when calculating the back-propagation loss; the smoothed softmax outputs of the two deep neural networks A and B are weighted-averaged:

p_avg(s | x_i) = ( p_A^τ(s | x_i) + p_B^τ(s | x_i) ) / 2        (6.7)

where p_A^τ(s | x_i) is the smoothed softmax output of deep neural network A and p_B^τ(s | x_i) is the smoothed softmax output of deep neural network B;

after the softmax probabilities are smoothed, the smoothed softmax probabilities of the two deep neural networks are combined and averaged, and the category with the maximum averaged softmax probability is taken as the final label and used to replace the label of the sample as the corrected label:

ŷ_i = argmax_s p_avg(s | x_i)        (6.8)
step S403: for the samples classified as internal noise after combining the two deep neural networks A and B, the corrected label replaces the original label when calculating the cross-entropy loss:

L_inner = - Σ_{x_i ∈ D_inner} log p_θ^τ(ŷ_i | x_i)        (6.9)

where D_inner is the set of internal-noise samples; combining equation (6.6) and equation (6.9) gives the loss calculated from the clean samples and from the internal noise with corrected labels, which is used to update the parameters of the two deep neural networks A and B respectively:

L = L_clean + L_inner        (6.10).
further, in step S401, if a training sample is present in neither of the sample sets D_A and D_B selected by the two deep neural networks, i.e. both deep neural networks A and B identify the sample as noise, the sample is temporarily not used for back propagation in the current epoch of training; sample selection is performed again during the training of a new epoch, and the irrelevant-noise samples that are not used for back propagation are re-determined according to the new selection results of the two deep neural networks A and B.
The invention provides a dual-network joint label correction method, which to a certain extent avoids the wrong learning directions caused by random initialization before training or by stochastic gradient descent during learning; the method trains two identical, separately randomly initialized deep neural networks on the network data set, and compared with a single-network correction method the accuracy of label correction is significantly improved; the labels of the internal noise are corrected to the correct labels through label correction, so that the internal noise is recycled, and the method has better practicability.
The rest of this embodiment is the same as embodiment 1 or 2, and therefore, the description thereof is omitted.
Example 4:
A network image data set denoising method based on dual-network joint label correction is provided. As shown in FIG. 1, let the images of the training set D in the network data set be x_i and the label of image x_i be y_i, with y_i ∈ {1, 2, …, k}, where k is the number of categories of the network data set, and i = 1, 2, …, N, where N is the number of samples of the training set in the network data set. Because noise exists in the training set, y_i is not necessarily the correct label of x_i. Let y_i* be the true label of sample x_i; if the label of the sample is clean, then y_i = y_i*. The data set is randomly divided into a number of mini-batches; the training set in the m-th mini-batch is D_m, and N_m is the number of samples in the m-th mini-batch. The basic structure of the method is shown in FIG. 1.
Before dual-network joint label correction is performed, the two deep neural networks are each pre-trained on the network training set for T_k epochs. The two deep neural networks A and B used here are both BCNN networks composed of VGG16 and have the same structure. The two deep neural networks A and B are randomly initialized separately; no label correction is performed during pre-training, and the cross-entropy loss is used directly for back propagation and parameter updating. The softmax output of each of the two deep neural networks is

p_θ(s | x_i) = exp(f_s(x_i)) / Σ_{j=1}^{k} exp(f_j(x_i))        (6.1)

where f_s(x_i) is the output, for category s, of the fully connected layer before the softmax layer of deep neural network A (or B).
The cross-entropy loss is obtained by accumulating, over the images, the cross entropy between the label of each image and the softmax output of the deep neural network for that image:

L = - Σ_{i=1}^{N} log p_θ(y_i | x_i)        (6.2)
During the pre-training, the two deep neural networks are updated with the cross-entropy loss independently and do not influence each other; because they have different initializations and different stochastic-gradient-descent directions during pre-training, the two deep neural networks end up with different learned behaviour after pre-training, and the subsequent joint learning can then be carried out.
Clean samples and noise samples are then separated. After the pre-training is finished, the softmax probabilities of the two deep neural networks are smoothed separately:

p_θ^τ(s | x_i) = exp(f_s(x_i)/τ) / Σ_{j=1}^{k} exp(f_j(x_i)/τ)        (6.3)

where τ is the smoothing coefficient. When τ = 1 no smoothing is performed. Suppose the total number of classes is k = 2 and the un-smoothed softmax probabilities output by the deep neural network for the two class labels of an image are 0.9 and 0.1; if softmax smoothing is performed with τ = 2, the smoothed softmax probabilities of the two labels become 0.75 and 0.25 respectively. After the softmax probabilities are smoothed, the probability gap between different classes becomes smaller, the deep neural network no longer attends only to the single label with the dominant probability, and the probabilities of the other labels are raised.
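The 0.9/0.1 to 0.75/0.25 example can be checked numerically; the short Python check below assumes the smoothing coefficient acts on the pre-softmax outputs as in equation (6.3).

```python
# Numeric check of the smoothing example: two classes with un-smoothed softmax
# probabilities 0.9 and 0.1; applying a smoothing coefficient of 2 to the
# pre-softmax outputs should give approximately 0.75 and 0.25.
import math

p = [0.9, 0.1]
logits = [math.log(v) for v in p]       # any logits whose softmax equals p
tau = 2.0
scaled = [z / tau for z in logits]
denom = sum(math.exp(z) for z in scaled)
smoothed = [math.exp(z) / denom for z in scaled]
print([round(v, 2) for v in smoothed])  # -> [0.75, 0.25]
```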
In each mini-batch, the cross-entropy loss is calculated with the smoothed softmax outputs and used to perform sample selection on D_m; each network selects its lower-loss samples at a discard rate of δ%, giving D_A and D_B. The smoothed loss is calculated according to equation (6.4), and sample selection is then performed according to equation (6.5):

L^τ(x_i) = - log p_θ^τ(y_i | x_i)        (6.4)

D_A = argmin_{D' ⊂ D_m, |D'| ≥ (1 - δ%)·N_m} Σ_{x_i ∈ D'} L_A^τ(x_i),  D_B = argmin_{D' ⊂ D_m, |D'| ≥ (1 - δ%)·N_m} Σ_{x_i ∈ D'} L_B^τ(x_i)        (6.5)
Because of their different initializations and different learning directions, the two identical deep neural networks give different predictions for the same image, and experiments show that combining the selection results of the two deep neural networks for sample selection yields a better classification effect. To increase the probability that the selected samples are indeed clean and to pick out the internal noise, the invention combines the two deep neural networks to categorize the training data and to decide whether to perform label correction. If a training sample of the current batch D_m is present in both of the sample sets D_A and D_B selected by the two deep neural networks, the two mutually independent deep neural networks both identify it as a clean sample, which indicates that the sample is very likely clean. For such samples the label is used directly to calculate the back-propagation loss:

L_clean = - Σ_{x_i ∈ D_A ∩ D_B} log p_θ^τ(y_i | x_i)        (6.6)
If a sample is selected by neither deep neural network A nor deep neural network B, both deep neural networks regard it as noise; to avoid the adverse effect of this noise, the sample is temporarily not used for back propagation in the current epoch of training. At the same time, to avoid wrongly discarding samples because of inaccurate sample selection by the deep neural networks, the sample is excluded from the back-propagation loss only in this epoch: sample selection is performed again during the training of a new epoch, and the irrelevant-noise samples that are not used for back propagation are re-determined according to the new selection results of the two deep neural networks.
If a training sample is present in D_A but not in D_B (or in D_B but not in D_A), the sample is not considered clean by both deep neural networks at the same time, and it is likely to be internal noise whose label is noisy. The correct label of the image can be inferred from the prediction probabilities of the two deep neural networks for the image, so label correction is performed when calculating the back-propagation loss. The invention takes the weighted average of the smoothed softmax outputs of the two deep neural networks:

p_avg(s | x_i) = ( p_A^τ(s | x_i) + p_B^τ(s | x_i) ) / 2        (6.7)

After the softmax probabilities are smoothed, combining and averaging the smoothed softmax probabilities of the two deep neural networks makes the final prediction tend to be more accurate than the prediction of a single network. The category with the maximum averaged softmax probability is taken as the final label and used to replace the label of the sample as the corrected label:

ŷ_i = argmax_s p_avg(s | x_i)        (6.8)
If the image is a clean training sample, the label with the maximum probability in the softmax outputs of both deep neural networks tends to be the clean label, and after the two softmax outputs are summed and averaged the predicted probability of the clean label is even larger. If the image is internal noise, the pre-trained deep neural networks already have a certain discriminative ability, so their predicted softmax probabilities also tend to be higher for the true label than for the noisy label, and the label with the highest averaged softmax probability is most likely the true label of the image; the noisy label is therefore replaced by the true label through label correction. If the image is irrelevant noise, it can be better separated by calculating the cross-entropy loss after softmax smoothing, and it is not used to calculate the back-propagation loss in this round of training. Therefore, for the samples classified as internal noise after combining the two deep neural networks, the corrected label replaces the original label when calculating the cross-entropy loss:

L_inner = - Σ_{x_i ∈ D_inner} log p_θ^τ(ŷ_i | x_i)        (6.9)

Combining equation (6.6) and equation (6.9) gives the loss calculated from the clean samples and from the internal noise after label correction, which is used to update the parameters of the two deep neural networks respectively:

L = L_clean + L_inner        (6.10)
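As a small worked illustration of equations (6.7)-(6.8) (with invented probability values), consider one internal-noise sample with k = 3 classes, where network A still leans towards the noisy label while network B does not; averaging the two smoothed softmax vectors recovers the more plausible class.

```python
# Invented illustration of equations (6.7)-(6.8) for one internal-noise sample with
# k = 3 classes: network A leans towards class 0 (the noisy label), network B towards
# class 2 (the likely true label); the averaged prediction selects class 2.
p_a = [0.50, 0.10, 0.40]   # smoothed softmax output of network A
p_b = [0.20, 0.10, 0.70]   # smoothed softmax output of network B
p_avg = [(a + b) / 2 for a, b in zip(p_a, p_b)]            # eq. (6.7) -> [0.35, 0.10, 0.55]
corrected = max(range(len(p_avg)), key=p_avg.__getitem__)  # eq. (6.8) -> 2
print(p_avg, corrected)
```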
example 5:
A network image data set denoising method based on dual-network joint label correction, i.e. a dual-network joint label correction method (TNLC), is evaluated on network data sets (Web-Birds, Web-Aircrafts, Web-Cars) as shown in FIG. 2, with the average classification accuracy (ACA) as the evaluation metric. The two deep neural network architectures used in this embodiment are both BCNN, composed of two VGG16 deep neural networks trained on ImageNet.
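A rough sketch of such a bilinear CNN (two VGG16 feature streams whose convolutional feature maps are combined by a location-wise outer product and pooled) is given below; the exact layer cut, the signed-square-root and L2 normalisation, and the single-input two-stream arrangement are assumptions of the sketch rather than details taken from this patent.

```python
# Rough sketch of a bilinear CNN (BCNN) built from two VGG16 streams: the convolutional
# feature maps are combined by a location-wise outer product, sum-pooled, passed through
# signed square-root and L2 normalisation, and classified by a fully connected layer.
# Layer choices and normalisation details are assumptions, not taken from the patent.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class BCNN(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        vgg_a = torchvision.models.vgg16(weights="IMAGENET1K_V1")
        vgg_b = torchvision.models.vgg16(weights="IMAGENET1K_V1")
        self.features_a = vgg_a.features[:-1]  # conv feature maps, 512 channels
        self.features_b = vgg_b.features[:-1]
        self.fc = nn.Linear(512 * 512, num_classes)

    def forward(self, x):
        fa = self.features_a(x)                # (B, 512, H, W)
        fb = self.features_b(x)
        b, c, h, w = fa.shape
        fa = fa.view(b, c, h * w)
        fb = fb.view(b, c, h * w)
        bilinear = torch.bmm(fa, fb.transpose(1, 2)) / (h * w)  # outer product, pooled
        bilinear = bilinear.view(b, -1)
        bilinear = torch.sign(bilinear) * torch.sqrt(torch.abs(bilinear) + 1e-10)
        return self.fc(F.normalize(bilinear, dim=1))
```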
The invention adopts a "two-step" training method: in the first step, only the last fully connected layer is updated during training and the parameters of the other layers are fixed; in the second step, the parameters of all layers are updated during training. In the experiments, 100 epochs are trained in each of the two steps, 200 epochs in total. Two NVIDIA GPUs are used; the Adam optimizer is set with a momentum of 0.9, and the batch size in the two steps is set to 64 and 18 respectively. The initial learning rates are set to 0.001 and 0.0001 respectively, and the learning rate is decayed according to the test accuracy: if the test accuracy does not improve within 5 consecutive epochs, the learning rate is halved. The invention uses three hyper-parameters: the number of pre-training epochs T_k, the discard rate δ% and the softmax smoothing coefficient τ. To avoid the two deep neural networks learning the same parameters because of identical initialization, the experiments apply "Kaiming Normal Initialization" to the fully connected layers to ensure that the two deep neural networks possess different learning abilities.
In order to verify the effectiveness of the proposed method, four strong algorithms are selected for comparison experiments, as follows:
(1) BCNN: a weakly supervised fine-grained image classification algorithm that uses only image-level label information; two deep neural networks extract features from the image, the features are combined by an outer product and, after a pooling layer, fed to an SVM classifier for classification, realizing localization and classification of fine-grained images at the same time.
(2) Decoupling: trains two deep neural network models and updates the network parameters only on samples for which the two deep neural networks give different predictions for the same image.
(3) Co-teaching: trains two deep neural network models; each deep neural network selects its low-loss samples and hands them to the other network for training.
(4) SELFIE: corrects the labels of uncertain samples to the prediction label that occurs most often in the prediction history, and performs gradient updates together with clean samples.
BCNN, the basic network architecture used in the experiments, serves as the comparison without denoising, and the other three denoising algorithms (Decoupling, Co-teaching and SELFIE) serve as the denoising comparisons. To ensure the reliability of the comparison experiments, the deep neural networks of the comparison algorithms are all replaced by a BCNN consisting of two VGG16 networks, and training is performed on the network data sets. In the comparison experiments of Co-teaching and SELFIE, the discard rate δ and the number of pre-training epochs T_k are set consistently with the present invention, and the other parameters of SELFIE are consistent with its paper.
Deep neural networks are trained on the training sets of the network data sets with the TNLC method and the comparison algorithms and tested on clean test sets; the resulting test accuracies are shown in Table 1. The experimental parameters are set as follows: for the Web-Birds and Web-Cars data sets, T_k = 15, τ = 1.5, δ = 0.2; for the Web-Aircrafts data set, T_k = 15, τ = 1.5, δ = 0.25. As shown in Table 1, the accuracy of the invention on the Web-Birds and Web-Cars data sets is higher than that of the other methods, and the test accuracy on Web-Aircrafts is only slightly lower than that of Decoupling. FIG. 2 shows the test-accuracy variation curves of the invention during training on the three network data sets; the test accuracy of the invention during training is higher on the Web-Birds and Web-Cars network data sets.
TABLE 1 accuracy (ACA%) of different algorithms on different network data set test sets
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications and equivalent variations of the above embodiments according to the technical spirit of the present invention are included in the scope of the present invention.

Claims (6)

1. A network image data set denoising method based on dual-network joint label correction is characterized by comprising the following steps:
step S100: acquiring a network data set and dividing it to obtain a training set D;
step S200: using the training set D to pre-train two identical deep neural networks A and B, respectively;
step S300: randomly dividing the training set D into a number of mini-batches and taking the training set D_m of the m-th mini-batch; using the pre-trained deep neural networks A and B to perform sample selection on D_m separately, dividing the samples into clean samples, internal noise samples and irrelevant noise samples according to the selection results, and obtaining the lower-loss sample sets D_A and D_B selected at a discard rate of δ%, respectively;
step S400: smoothing the softmax probabilities of the two deep neural networks and taking the weighted average as the true label of each internal noise sample to correct the internal noise; then calculating the cross-entropy loss together with the clean samples and using it to update the two deep neural networks respectively;
step S500: repeating steps S200-S400 until a set number of iterations is reached, obtaining the trained deep neural networks;
step S600: inputting the image to be detected into the trained deep neural network for prediction and classification, thereby realizing denoising of the image data set.
2. The method for denoising network image data set based on dual-network joint label correction according to claim 1, wherein the specific steps of step S200 are as follows:
the images in the training set D are denoted x_i, the label of image x_i is y_i, and the true label of image x_i is y_i*, with

y_i ∈ {1, 2, …, k},  i = 1, 2, …, N,

where k is the number of categories of the network data set and N is the number of samples of the training set in the network data set;

the two identical deep neural networks are each pre-trained on the training set D for T_k epochs, using the cross-entropy loss for back propagation and parameter updating; the softmax output of each of the two deep neural networks is

p_θ(s | x_i) = exp(f_s(x_i)) / Σ_{j=1}^{k} exp(f_j(x_i))        (6.1)

where f_s(x_i) is the output, for category s, of the fully connected layer before the softmax layer of the deep neural network, and p_θ(s | x_i) is the output of the softmax layer of the deep neural network (with parameters θ) for category s;

the cross-entropy loss is obtained by accumulating, over the images, the cross entropy between the label of each image and the softmax output of the deep neural network for that image:

L = - Σ_{i=1}^{N} log p_θ(y_i | x_i)        (6.2)

during the pre-training, the two deep neural networks are updated with the cross-entropy loss independently and do not influence each other; because they have different initializations and different stochastic-gradient-descent directions during pre-training, the two deep neural networks end up with different learned behaviour after pre-training, which enables the subsequent joint learning.
3. The method for denoising network image data set based on dual-network joint label correction according to claim 2, wherein the specific steps of step S300 are as follows:
step S301: after the pre-training is finished, the softmax probabilities of the two deep neural networks are smoothed separately:

p_θ^τ(s | x_i) = exp(f_s(x_i)/τ) / Σ_{j=1}^{k} exp(f_j(x_i)/τ)        (6.3)

where τ is the smoothing coefficient; when τ equals 1, no smoothing of the softmax probabilities is performed, otherwise smoothing is performed;

step S302: in each mini-batch, the cross-entropy loss is calculated with the smoothed softmax outputs and used to perform sample selection on D_m; each network selects its lower-loss samples at a discard rate of δ%, giving D_A and D_B;

the smoothed loss is calculated as

L^τ(x_i) = - log p_θ^τ(y_i | x_i)        (6.4)

and sample selection is then performed as

D_A = argmin_{D' ⊂ D_m, |D'| ≥ (1 - δ%)·N_m} Σ_{x_i ∈ D'} L_A^τ(x_i),  D_B = argmin_{D' ⊂ D_m, |D'| ≥ (1 - δ%)·N_m} Σ_{x_i ∈ D'} L_B^τ(x_i)        (6.5)

where N_m is the number of samples in the m-th mini-batch.
4. The method for denoising network image data set based on dual-network joint label correction according to claim 2 or 3, wherein the specific steps of step S400 are as follows:
step S401: if a training sample of the current batch D_m is present in both of the sample sets D_A and D_B selected by the two deep neural networks A and B, the two mutually independent deep neural networks A and B both identify it as a clean sample, so the training sample is treated as clean and its label is used directly to calculate the back-propagation loss:

L_clean = - Σ_{x_i ∈ D_A ∩ D_B} log p_θ^τ(y_i | x_i)        (6.6)

step S402: if a training sample is present in only one of the sample sets D_A and D_B selected by the two deep neural networks A and B, the correct label of the image is estimated from the prediction probabilities of the two deep neural networks A and B for the image, so that label correction is performed when calculating the back-propagation loss; the smoothed softmax outputs of the two deep neural networks A and B are weighted-averaged:

p_avg(s | x_i) = ( p_A^τ(s | x_i) + p_B^τ(s | x_i) ) / 2        (6.7)

where p_A^τ(s | x_i) is the smoothed softmax output of deep neural network A and p_B^τ(s | x_i) is the smoothed softmax output of deep neural network B;

after the softmax probabilities are smoothed, the smoothed softmax probabilities of the two deep neural networks are combined and averaged, and the category with the maximum averaged softmax probability is taken as the final label and used to replace the label of the sample as the corrected label:

ŷ_i = argmax_s p_avg(s | x_i)        (6.8)

step S403: for the samples classified as internal noise after combining the two deep neural networks A and B, the corrected label replaces the original label when calculating the cross-entropy loss:

L_inner = - Σ_{x_i ∈ D_inner} log p_θ^τ(ŷ_i | x_i)        (6.9)

where D_inner is the set of internal-noise samples; combining equation (6.6) and equation (6.9) gives the loss calculated from the clean samples and from the internal noise with corrected labels, which is used to update the parameters of the two deep neural networks A and B respectively:

L = L_clean + L_inner        (6.10).
5. The method for denoising a network image data set based on dual-network joint label correction as claimed in claim 4, wherein in step S401, if a training sample is present in neither of the sample sets D_A and D_B selected by the two deep neural networks, i.e. both deep neural networks A and B identify the sample as noise, the sample is temporarily not used for back propagation in the current epoch of training; sample selection is performed again during the training of a new epoch, and the irrelevant-noise samples that are not used for back propagation are re-determined according to the new selection results of the two deep neural networks A and B.
6. The method for denoising a network image data set based on dual-network joint label correction according to claim 1, wherein the two deep neural networks A and B are BCNN networks, each composed of VGG16.
CN202111237302.7A 2021-10-25 2021-10-25 Network image data set denoising method based on dual-network joint label correction Active CN113688949B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111237302.7A CN113688949B (en) 2021-10-25 2021-10-25 Network image data set denoising method based on dual-network joint label correction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111237302.7A CN113688949B (en) 2021-10-25 2021-10-25 Network image data set denoising method based on dual-network joint label correction

Publications (2)

Publication Number Publication Date
CN113688949A CN113688949A (en) 2021-11-23
CN113688949B (en) 2022-02-15

Family

ID=78587762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111237302.7A Active CN113688949B (en) 2021-10-25 2021-10-25 Network image data set denoising method based on dual-network joint label correction

Country Status (1)

Country Link
CN (1) CN113688949B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114495027A (en) * 2022-01-11 2022-05-13 北京科技大学 Vehicle type fine-grained identification method and device based on network data
CN115511753B (en) * 2022-11-09 2023-03-31 南京码极客科技有限公司 Network image label denoising method based on dynamic sample selection
CN115618236B (en) * 2022-11-16 2023-03-21 南京码极客科技有限公司 Self-supervision distributed internal and external noise identification algorithm
CN115588124B (en) * 2022-12-13 2023-05-23 南京理工大学 Fine granularity classification denoising training method based on soft label cross entropy tracking
CN116012569B (en) * 2023-03-24 2023-08-15 广东工业大学 Multi-label image recognition method based on deep learning and under noisy data

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416382A (en) * 2018-03-01 2018-08-17 南开大学 One kind is based on iteration sampling and a pair of of modified Web graph of multi-tag as training convolutional neural networks method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426826A (en) * 2015-11-09 2016-03-23 张静 Tag noise correction based crowd-sourced tagging data quality improvement method
US11068753B2 (en) * 2019-06-13 2021-07-20 Visa International Service Association Method, system, and computer program product for generating new items compatible with given items

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416382A (en) * 2018-03-01 2018-08-17 南开大学 One kind is based on iteration sampling and a pair of of modified Web graph of multi-tag as training convolutional neural networks method

Also Published As

Publication number Publication date
CN113688949A (en) 2021-11-23

Similar Documents

Publication Publication Date Title
CN113688949B (en) Network image data set denoising method based on dual-network joint label correction
CN108052862B (en) Age estimation method and device
CN113378959B (en) Zero sample learning method for generating countermeasure network based on semantic error correction
CN111161254A (en) Bone age prediction method
CN113326731A (en) Cross-domain pedestrian re-identification algorithm based on momentum network guidance
CN111861909B (en) Network fine granularity image classification method
US11488309B2 (en) Robust machine learning for imperfect labeled image segmentation
CN113657561A (en) Semi-supervised night image classification method based on multi-task decoupling learning
Gao et al. Label enhancement for label distribution learning via prior knowledge
Wang et al. Neighbor matching for semi-supervised learning
Shi et al. Self-paced resistance learning against overfitting on noisy labels
CN111523404A (en) Partial face recognition method based on convolutional neural network and sparse representation
Xia et al. TCC-net: A two-stage training method with contradictory loss and co-teaching based on meta-learning for learning with noisy labels
CN114882534A (en) Pedestrian re-identification method, system and medium based on counterfactual attention learning
Fang et al. Separating noisy samples from tail classes for long-tailed image classification with label noise
CN114140645A (en) Photographic image aesthetic style classification method based on improved self-supervision feature learning
Sivasubramanian et al. Adaptive mixing of auxiliary losses in supervised learning
CN116958548A (en) Pseudo tag self-distillation semantic segmentation method based on category statistics driving
CN116486150A (en) Uncertainty perception-based regression error reduction method for image classification model
CN115661549A (en) Fine-grained classification denoising training method based on prediction confidence
CN116186384A (en) Article recommendation method and system based on article implicit feature similarity
Zhuang et al. Dygen: Learning from noisy labels via dynamics-enhanced generative modeling
CN114970732A (en) Posterior calibration method and device for classification model, computer equipment and medium
CN115169560A (en) Meta-reinforcement learning method for improving low-resource common-sense reasoning performance
CN108304546B (en) Medical image retrieval method based on content similarity and Softmax classifier

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant