CN108416382B - Web image training convolutional neural network method based on iterative sampling and one-to-many label correction

Info

Publication number: CN108416382B
Application number: CN201810171017.1A
Authority: CN
Inventors: 杨巨峰; 程明明; 孙晓晓; 王恺
Original assignee: Nankai University
Current assignee: Nankai University
Priority date: 2018-03-01
Filing date: 2018-03-01
Publication date: 2022-04-19
Anticipated expiration: 2038-03-01
Also published as: CN108416382A

Abstract

The invention discloses a method for training a convolutional neural network on Web images based on iterative sampling and one-to-many label correction. The method addresses the shortage of data when deep learning is applied to computer vision tasks by gradually adding Web images to the training set. It can serve as an auxiliary processing step for various computer vision tasks. Its key characteristic is that, as training proceeds, the updated model predicts the label confidence of Web images more accurately; high-quality data selected by comparing the predicted labels with the Web labels is then added to the training data, which effectively reduces the noisy data in the Web data and continuously improves the performance of the convolutional neural network. In addition, to account for the complexity and diversity of Web image content, the method adopts a one-to-many label correction strategy during iteration to reduce the influence of hard labels on model training. Based on these steps, the whole model is trained iteratively until the performance of the network stabilizes.

Description

Web image training convolutional neural network method based on iterative sampling and one-to-many label correction

Technical Field

The invention belongs to the technical field of computer vision and relates to a method for training a convolutional neural network with Web images, in particular to a method for training a convolutional neural network with Web images based on iterative sampling and one-to-many label correction.

Background

With the development of deep learning, many classic problems such as object detection, object recognition, tracking, and salient region detection have benefited. However, training a convolutional neural network requires a large amount of data, and manual labeling is time-consuming and labor-intensive; for problems that require professional knowledge, manual labeling is even harder to carry out. In recent years, learning from Web images has attracted increasing attention as one way to address the shortage of training data: it aims to train convolutional neural networks using images that are easily acquired from the Web. Training convolutional neural networks with Web data faces two challenges: i) data collected from the Web contains noisy data; ii) the distribution of Web data differs from that of standard data. If the label of a selected Web image does not accurately reflect the content of the image, model training is adversely affected. Moreover, Web images have complex content, so the data distribution of Web images differs considerably from that of the target data set.

Some work has already been proposed to make use of Web data. In 2016, Jonathan Krause et al. showed in the paper "The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition" (ECCV 2016, vol. 9907, pp. 301-320, Springer) that adding Web data to the training of convolutional neural networks can improve the model. To address the influence of noisy Web data, Phong D. Vo et al., in work published in Computer Vision and Image Understanding in 2017, proposed removing noisy data based on the predictions of a reference model and then using the processed Web data to assist convolutional neural network training; this improves the classification performance of the model. Furthermore, in 2015 Tong Xiao et al. proposed an approach based on a probabilistic graphical model in "Learning from Massive Noisy Labeled Data for Image Classification" (CVPR 2015, pp. 2691-2699). By modeling the relationship among the Web image, the noisy label, and the true label, the model gains a degree of robustness to noisy data. However, these efforts neglect one problem: the model judges noisy data on the basis of the standard data, and the distributions of the standard data and the Web images differ greatly, so a large amount of Web data may be deleted by mistake. If only data consistent with the standard data is kept, the model can hardly learn anything new, and adding Web data loses its value. Sampling of the Web data is therefore crucial.

In recent years, iterative learning strategies have been applied to many machine learning tasks, such as data mining, pattern recognition, and computer vision. Humans acquire knowledge in a simple-to-complex manner; in 2009, Y. Bengio et al. proposed "Curriculum learning" (ICML 2009, pp. 41-48) to describe this way of learning and introduced it into machine learning. In 2010, M. P. Kumar et al. proposed, in "Self-paced learning for latent variable models" (NIPS 2010, pp. 1189-1197), embedding the easy-versus-hard ordering of curriculum learning into the learning objective, and called this self-paced learning, i.e. learning iteratively from easy samples to difficult ones. Notably, this is a widely used learning approach that conforms to human cognitive habits. For example, in 2017 Fan Ma et al. proposed a selection and optimization process based on self-paced learning to improve the traditional co-training algorithm, and Dong et al. designed a new model in "Few-shot object detection" (arXiv:1706.08249) based on the idea of iterative learning, improving object detection results. Inspired by these efforts, we consider that each iteration of progressive iterative learning can be viewed as an optimization of the model weights as new data is added. Our method was designed on the basis of this analysis.

These recent achievements in the field inspired us and provide a solid technical foundation for realizing a method of training a convolutional neural network with Web images based on iterative sampling and one-to-many label correction.

Disclosure of Invention

The technical problem to be solved by the invention is to train a convolutional neural network with Web images: to judge the labels of Web images accurately, reduce the influence of noisy data, and use Web data to train the model more efficiently.

To achieve this purpose, the invention adopts the following technical scheme:

a. a user inputs a Web image data set, and a reference model is used for feature extraction to obtain the label confidences of each image;

b. according to the currently predicted confidences, one-to-many correction and sampling are performed on the labels of the Web image data;

c. training is continued with the Web images sampled above to obtain an updated model, and the above steps are repeated until the performance of the model stabilizes or the sampled data essentially stops changing.

To perform the one-to-many correction in step b, a reference model is first trained on a standard data set, and the outputs of its softmax layer are then extracted as the confidences of the predicted labels of the Web images.
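As an illustration of how softmax outputs can serve as label confidences, the following minimal Python sketch converts raw class scores into confidences and ranks the labels from high to low. The function names `softmax` and `ranked_labels` and the example scores are assumptions made for illustration; they do not appear in the patent, which obtains these values from the softmax layer of the trained network.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ranked_labels(confidences):
    """Return (label_index, confidence) pairs, most confident first."""
    return sorted(enumerate(confidences), key=lambda p: p[1], reverse=True)

# Hypothetical raw scores for three classes (not taken from the patent).
logits = [2.0, 1.0, 0.1]
conf = softmax(logits)
ranking = ranked_labels(conf)
```

The confidences sum to one, so the gap between two ranked labels can be compared directly when deciding whether their confidences are close.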

The beneficial effects of the invention are as follows. The method can be transferred easily to any convolutional neural network model and is suitable for tasks that lack sufficient data and learn with the help of Web images. Whichever model is chosen, only the structure of the model and the training batch size (determined by the model and the available GPU memory) need to be modified. The performance of the finally trained model is greatly improved compared with a model trained only on standard data, and is also clearly improved compared with other methods that use Web images. In general, the method provides a new scheme for training convolutional neural networks with Web images, and it is expected to apply well to many other computer vision classification tasks, helping them train convolutional neural networks more fully and obtain better models.

Drawings

FIG. 1 is a flowchart of a method for training a convolutional neural network based on iterative sampling and one-to-many label correction of a Web image.

FIG. 2 is a schematic diagram of a method for training a convolutional neural network based on iterative sampling and one-to-many label correction of a Web image.

Detailed Description

The invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Referring to FIG. 1, a flowchart of the method for training a convolutional neural network with Web images based on iterative sampling and one-to-many label correction is shown. The steps shown in the figure are as follows:

a. Train the neural network on the given labeled data set, which serves as the reference data, to obtain a reference model. Training is typically carried out with the Caffe framework on an NVIDIA graphics card, whose high degree of parallelism allows the training process to complete quickly. In this step, the convolutional neural network model trained on the standard data set is used as the reference model, which ensures that the distribution of the sampled data does not change greatly, keeps the training process as unaffected as possible by differences in data distribution, and steers the model toward the target task.

b. Extract the softmax-layer output for each Web image to obtain its label confidences. Then rank the predicted labels of the Web image, compare them with the Web label of the image, and correct the labels and sample the Web data using the one-to-many label correction strategy. Compared with traditional one-to-one label correction, this strategy makes full use of the richness and diversity of Web image content. Specifically, the confidences of the predicted labels of a Web image are obtained from the reference convolutional neural network, and the labels are ranked by confidence from high to low. When correcting labels and sampling data, the top-ranked predicted label is first compared with the Web label of the image; if they agree, the sample is adopted. If they do not agree, the top two labels are considered: provided that the difference between the confidences of the first and second labels is small, the second predicted label is compared with the Web label, and if both conditions are met, the image is sampled twice, once under each of the two labels. The same procedure is applied in turn until the top four labels have been considered: whenever the confidences are close and the Web label is among the predicted labels, the sample is labeled with each of those labels and sampled once per label. If none of these conditions is met, the sample is not adopted.

c. Add all sampled samples to the training data set and continue training to obtain a new model. Using the new model as the reference model, perform a new round of data sampling over the whole Web image data set, and repeat the training iteratively until the model performance stabilizes or the sampled data essentially stops changing.
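The one-to-many label correction rule of step b can be sketched in Python as follows. This is one possible reading under stated assumptions, not the patent's implementation: the patent does not quantify what a "small" confidence difference is, so the threshold `max_gap` and its default value are hypothetical, as is the choice to measure the gap between the first and the k-th ranked label. The function name `one_to_many_sample` is also an assumption.

```python
def one_to_many_sample(confidences, web_label, max_k=4, max_gap=0.1):
    """Return the labels under which an image is sampled (empty list = rejected).

    confidences: softmax confidences, indexed by class label.
    web_label:   the label attached to the image on the Web.
    max_k:       how many top-ranked labels to consider (four in the patent).
    max_gap:     ASSUMED closeness threshold; the source gives no value.
    """
    ranked = sorted(range(len(confidences)),
                    key=lambda i: confidences[i], reverse=True)
    top = ranked[:max_k]

    # Case 1: the top prediction agrees with the Web label.
    if top[0] == web_label:
        return [web_label]

    # Cases 2..max_k: widen the window one label at a time.
    for k in range(2, len(top) + 1):
        gap = confidences[top[0]] - confidences[top[k - 1]]
        if gap > max_gap:
            break  # labels are sorted, so the gap can only grow for larger k
        if top[k - 1] == web_label:
            return top[:k]  # sample the image once under each of the k labels

    return []  # no condition met: the image is not sampled

close = one_to_many_sample([0.45, 0.40, 0.10, 0.05], web_label=1)
far = one_to_many_sample([0.90, 0.05, 0.03, 0.02], web_label=1)
```

In the first call the top two confidences are close and the second label matches the Web label, so the image is kept under both labels; in the second call the top prediction dominates and disagrees with the Web label, so the image is rejected.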

In the invention, the convolutional neural network is trained iteratively while the Web images are sampled, yielding both a Web image data set with less noise and a convolutional neural network with improved performance. The two steps reinforce each other and are optimized together, so that Web data is used efficiently.
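The alternation between training and sampling can be sketched as a simple loop. The sketch below is written under assumptions: `train`, `predict`, and `select` are hypothetical callables supplied by the user (for example, a wrapper around a deep learning framework and a one-to-many selection rule), and stopping when the sampled set no longer changes is only one of the two stopping criteria the text mentions.

```python
def iterative_training(standard_data, web_data, train, predict, select,
                       max_rounds=10):
    """Alternate between sampling Web data and continuing training.

    train(data, init)      -> model   (hypothetical; init is the previous model or None)
    predict(model, image)  -> list of label confidences (hypothetical)
    select(conf, web_label)-> labels under which the image is sampled (hypothetical)
    """
    model = train(standard_data, None)  # initial reference model
    previous = None
    for _ in range(max_rounds):
        # Re-sample the WHOLE Web data set with the current reference model.
        sampled = []
        for image, web_label in web_data:
            for label in select(predict(model, image), web_label):
                sampled.append((image, label))
        if sampled == previous:
            break  # sampled data no longer changes
        # Continue training on standard data plus the newly sampled Web data.
        model = train(standard_data + sampled, model)
        previous = sampled
    return model, previous
```

Because every round re-examines the whole Web data set, an image rejected early can still be adopted later once the updated model predicts its label with more confidence.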

FIG. 2 shows a schematic diagram of the method, which visually describes the core problem addressed at each stage of the algorithm, the training process, and the system input and output. FIG. 2 and FIG. 1 have the same meaning but different levels of abstraction; FIG. 2 mainly helps in understanding the individual parts of FIG. 1.

Claims

1. A method for training a convolutional neural network with Web images based on iterative sampling and one-to-many label correction, characterized in that a Web image data set with less noise and a convolutional neural network with improved performance are obtained by iteratively training the convolutional neural network and sampling the Web images, the two steps reinforcing each other and being optimized together so that Web data is used efficiently to train the neural network, the method specifically comprising the following steps:

a. training a convolutional neural network model on a standard data set as the initial reference model; using the convolutional neural network model trained on the standard data set as the reference model ensures that the distribution of the sampled data does not change greatly, keeps the training process as unaffected as possible by differences in data distribution, and steers the model toward the target task during iteration;

b. inputting a Web image to the reference model and extracting the output of the softmax layer as the confidences of the predicted labels of the Web image;

c. obtaining the ranking of the predicted labels from the label prediction confidences of the Web image obtained in step b, then comparing the predicted labels with the Web label of the image, and performing label correction and sampling on the Web data using a one-to-many label correction strategy;

d. adding the sampled Web data obtained in step c to the standard training data set and continuing training to obtain a new model, and repeating steps b, c and d with the new model as the reference model until the performance of the trained model and the sampled data set both stabilize;

wherein ranking the predicted labels of the Web image, comparing them with the Web label of the Web image, and then performing label correction and sampling on the Web data using the one-to-many label correction strategy specifically comprises: obtaining the confidences of the predicted labels of the Web image from the reference convolutional neural network, and ranking the labels by confidence from high to low; when correcting labels and sampling data, first comparing the top-ranked predicted label with the Web label of the Web image, and adopting the sample if they agree; if they do not agree, considering the top two labels, wherein, provided that the difference between the confidences of the first and second labels is small, the second predicted label is compared with the Web label of the Web image, and if both conditions are met, the image is sampled twice, once under each of the two labels; and so on until the top four labels have been considered, wherein if none of the conditions is met the sample is not sampled, and if the confidences are close and the Web label is among the predicted labels, the sample is labeled with each of those labels and sampled once per label.