CN107463953A

CN107463953A - Image classification method and system based on quality embedding in the case of noisy labels

Info

Publication number: CN107463953A
Application number: CN201710599924.1A
Authority: CN
Inventors: 张娅; 姚江超; 王嘉杰; 王延峰
Original assignee: Shanghai Jiao Tong University
Current assignee: Shanghai Media Intelligence Co ltd
Priority date: 2017-07-21
Filing date: 2017-07-21
Publication date: 2017-12-12
Anticipated expiration: 2037-07-21
Also published as: CN107463953B

Abstract

The present invention provides an image classification method and system based on quality embedding in the case of noisy labels, comprising: a web image label collection step; a label quality factor embedding step, in which a label quality factor is introduced into a supervised image classification model to control the generation of predictions for noisy labels and to absorb the error back-propagated from wrong labels, and the optimization objective function that includes the label quality factor is designed by maximizing the log-likelihood function; a network model construction step, in which the optimization objective function is modeled with a deep neural network; a network parameter training step, in which training images and noisy labels are input into the network model, the model is trained jointly end to end with a variant of stochastic gradient descent, and the model parameters are updated at the same time; and an image classification step. The invention jointly models three variables, namely the true label of an image, the label provided by the user, and the quality of the image label, to form supervised learning from noisy labels, and can obtain relatively accurate image classification results.

Description

Method and system for image classification based on quality embedding in the case of noisy labels

Technical Field

The present invention relates to the fields of computer vision and data mining, and in particular to a method and system for learning image labels when the labels contain noise.

Background Art

Image recognition is a fundamental and important task in artificial intelligence, with applications spanning natural science, medicine, industry, and many other fields. With the rapid development of deep learning, image classifiers trained with convolutional neural networks have achieved unprecedented success. However, image classification under the deep learning framework relies on large-scale, high-quality training data, including clear images and accurate labels. Such training data usually comes from manual collection and annotation, which consumes a great deal of manpower and material resources and makes it relatively expensive and inefficient to tackle image recognition problems in new domains.

Owing to the rapid development of network technology and social media, and to people's enthusiasm for online self-media, the Internet holds a vast amount of image data. Image-sharing social platforms such as Flickr and NetEase LOFTER host image data and label information provided by millions of users. If these images and labels could be used to train deep neural network models, the variety and volume of available datasets would increase greatly, helping deep neural networks transfer more quickly to image recognition problems in different domains.

Using images and labels uploaded by Internet users as training data overcomes the limitations of manually labeled data, but it also brings its own problems and challenges. Large manually labeled datasets provide high-quality images with complete labels, so neural network classifiers trained on them achieve high accuracy. In contrast, web images are often of poor quality and user tags are often inaccurate. Training on image-label data that contains a large amount of noise greatly reduces the reliability of the model's predictions. How to make full use of web images and user-provided labels, an inexhaustible data resource, for effective image label learning has therefore received increasing attention.

Traditional methods for learning image labels from noisy labels include designing robust loss functions, statistical queries, and modeling noise characteristics. Some of these methods require a portion of clean labeled data to assist in training the image classifier. Others attempt to model the distribution of the data noise, that is, the noise caused by the difference between the true label of an image and the label provided by the user, but they do not consider the quality of the image or the accuracy of the user-provided label, and their classification performance falls short of expectations.

Summary of the Invention

The purpose of the present invention is to overcome the deficiencies of the prior art by providing an image classification method and system based on quality embedding in the case of noisy labels, so as to solve the problem that the prior art does not consider the quality of the image labels themselves when training an image classifier on images with noisy labels.

According to one aspect of the present invention, an image classification method based on quality embedding in the case of noisy labels is provided, comprising:

Web image label collection step: obtaining a large number of images and user-provided label information from web image-sharing platforms, and filtering and organizing them by the required categories for use in training the image classifier;

Label quality factor embedding step: introducing a label quality factor into a supervised image classification model to control the generation of predictions for noisy labels and to absorb the error back-propagated from wrong labels; and designing, by maximizing the log-likelihood function, the optimization objective function that includes the label quality factor;

Network model construction step: modeling the optimization objective function with a deep neural network to obtain an overall network model comprising four sub-models, namely an encoding model, a sampling model, a decoding model, and a classification model;

Network parameter training step: inputting the training images and noisy labels obtained in the web image label collection step into the overall network model obtained in the network model construction step, jointly training the four sub-models end to end with a variant of stochastic gradient descent, and updating the parameters of the four sub-models at the same time;

Image classification step: inputting a new image to be classified into the trained classification model to obtain a prediction of the true label of the image.

Preferably, the web image label collection step uses web crawler technology to collect the required large number of images and user-annotated labels from image-sharing social networking sites, and filters and organizes the labels and images by the required categories, for example by retaining the images that carry one or more labels among the m target categories.
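The filtering rule described above (retain an image if it carries at least one of the target categories) can be sketched as follows. This is a minimal illustration only: the `TaggedImage` record, the `filter_by_categories` helper, and the sample tags are hypothetical names and data, not part of the disclosed method, and the crawling itself is left out.

```python
from dataclasses import dataclass


@dataclass
class TaggedImage:
    """One crawled record: an image identifier plus its user-supplied tags."""
    image_id: str
    tags: list


def filter_by_categories(records, categories):
    """Keep images that carry at least one target category; drop other tags."""
    wanted = {c.lower() for c in categories}
    kept = []
    for rec in records:
        # Normalise the user tags and keep only those in the target set.
        matched = sorted({t.strip().lower() for t in rec.tags} & wanted)
        if matched:
            kept.append(TaggedImage(rec.image_id, matched))
    return kept


# Hypothetical crawled records (assumed data for illustration).
records = [
    TaggedImage("img_001", ["Cat", "cute", "home"]),
    TaggedImage("img_002", ["holiday", "beach"]),
    TaggedImage("img_003", ["dog", "cat"]),
]
dataset = filter_by_categories(records, ["cat", "dog", "bird"])
```

Here `img_002` is discarded because none of its tags belongs to the target categories, while the other two images keep only their matching tags as noisy labels.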

Preferably, in the label quality factor embedding step, the label quality factor of the image is embedded into an existing supervised image classification model, so that the new optimization objective function is:

where x_m and y_m are, respectively, the pixel set of the m-th image and the corresponding noisy label provided by the user, z_m and s_m are latent variables representing the true label of the image and the label quality, respectively, and M is the total number of images used for training. Because it includes the label quality factor, the new optimization objective function absorbs the adverse effects of wrong labels in the training dataset. Since the gradient of this objective function is difficult to compute, its evidence lower bound (ELBO) is optimized instead, and the reparameterization trick is used to reduce the computation required for training, yielding the final set of optimization objective formulas.

Preferably, in the network model construction step, a deep neural network is used to model each formula in the final set of optimization objective formulas, yielding an overall network model that comprises four sub-models: an encoding model, a sampling model, a decoding model, and a classification model;

The encoding model uses a convolutional neural network to generate a prior prediction of the noisy label from the image content X and, combining it with the noisy label Y, to predict the label quality distribution q(S|X,Y) and the true label distribution q(Z|X,Y).

The sampling model maps the label quality distribution q(S|X,Y) and the true label distribution q(Z|X,Y) generated by the encoding model to concrete values S and Z.

The decoding model uses a neural network whose inputs are the label quality S and the true label Z output by the sampling model, and generates the posterior prediction q(Y|Z,S) of the noisy label.

The classification model uses a convolutional neural network that generates a prediction of the true label Z from the image.

Preferably, in the network parameter training step, the posterior prediction q(Y|Z,S) of the noisy label recovered by the decoding model is used for supervised training: the back-propagated gradients of the encoding, sampling, and decoding models are computed and the parameters of these three sub-models are updated. At the same time, the true label distribution q(Z|X,Y) obtained in the encoding model is used to supervise the training of the classification model: the back-propagated gradient is computed and the parameters of the classification model are updated.

Preferably, in the image classification step, the image to be classified is input into the trained classification model to obtain a prediction of its true label, which gives the classification result for the image.

According to a second aspect of the present invention, an image classification system based on quality embedding in the case of noisy labels is provided, comprising:

Web image label collection module: obtains a large number of images and user-provided label information from web image-sharing platforms, and filters and organizes them by the required categories;

Label quality factor embedding module: introduces a label quality factor into a conventional supervised image classification model to control the generation of predictions for noisy labels and to absorb the error back-propagated from wrong labels, and computes the log-likelihood function of the image classification model as the optimization objective function for training;

Network model construction module: models the optimization objective function with a deep neural network to obtain four sub-models, namely an encoding model, a sampling model, a decoding model, and a classification model;

Network parameter training module: inputs training images and noisy labels into the overall network model, jointly trains the four sub-models end to end with a variant of stochastic gradient descent, and updates the parameters of the four sub-models at the same time;

New image classification task processing module: inputs a new image to be classified into the trained classification model to obtain a prediction of the true label of the image.

Preferably, the label quality factor embedding module embeds the label quality factor of the image into an existing supervised image classification model, so that the new optimization objective function is:

where x_m and y_m are, respectively, the pixel set of the m-th image and the corresponding noisy label provided by the user, z_m and s_m are latent variables representing the true label of the image and the label quality, respectively, and M is the total number of images used for training;

Because it includes the label quality factor, the new optimization objective function absorbs the adverse effects of wrong labels in the training dataset. Since the gradient of the new optimization objective function is difficult to compute, its evidence lower bound is optimized instead, and the reparameterization trick is used to reduce the computation required for training, yielding the final set of optimization objective formulas.

Preferably, the network model construction module uses a deep neural network to model each formula in the final set of optimization objective formulas, yielding four models: an encoding model, a sampling model, a decoding model, and a classification model; wherein:

The encoding model uses a convolutional neural network to generate a prior prediction of the noisy label from the image content X and, combining it with the noisy label Y, to predict the label quality distribution q(S|X,Y) and the true label distribution q(Z|X,Y);

The sampling model maps the label quality distribution q(S|X,Y) and the true label distribution q(Z|X,Y) generated by the encoding model to concrete values S and Z;

The decoding model uses a neural network whose inputs are the label quality S and the true label Z output by the sampling model, and generates the posterior prediction q(Y|Z,S) of the noisy label;

The classification model uses a convolutional neural network that generates a prediction of the true label Z from the image.

Preferably, the network parameter training module uses the posterior prediction q(Y|Z,S) of the noisy label recovered by the decoding model for supervised training: the back-propagated gradients of the encoding, sampling, and decoding models are computed and the parameters of these three sub-models are updated. At the same time, the true label distribution q(Z|X,Y) obtained in the encoding model is used to supervise the training of the classification model: the back-propagated gradient is computed and the parameters of the classification model are updated.

Preferably, the web image label collection module uses web crawler technology to collect the required large number of images and user-annotated labels from image-sharing social networking sites.

The present invention jointly models three variables: the true label of an image, the user-provided label, and the quality of the image label. When training the classifier, it not only predicts the true label of each input image but also infers the quality of the label uploaded by the user, thereby forming supervised learning from noisy labels. Training iterates until convergence, yielding the required image classifier for classifying new images.

Compared with the prior art, the present invention has the following beneficial effects:

By mining image label data on social media in depth and embedding a latent variable that represents label quality into the image classification model, the present invention improves the existing way of learning classifiers from noisy labels. By redesigning the back-propagated error gradient formula and constructing the corresponding neural network model, the true label, the image label quality, and the noisy user-provided label are predicted simultaneously, which effectively absorbs the erroneous back-propagated information caused by label noise during neural network training and helps the image classifier learn correctly.

The present invention helps to use the abundant and cheaply available image label data on social media for training image classifiers, thereby saving the manpower and material resources required for professional annotation while avoiding the effect of label noise on classifier training, and obtaining relatively accurate image classification results.

Brief Description of the Drawings

Other features, objects, and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments made with reference to the following drawings:

Fig. 1 is a flowchart of the method in an embodiment of the present invention;

Fig. 2 shows the supervised image classification model with the label quality factor introduced, according to an embodiment of the present invention;

Fig. 3 is a structural diagram of the modules of the deep neural network model used in an embodiment of the present invention;

Fig. 4 is a block diagram of the system in an embodiment of the present invention.

Detailed Description

The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit it in any form. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, all of which fall within the scope of protection of the present invention.

The present invention jointly considers three variables, namely the true label of an image, the user-provided label, and the quality of the image label, and proposes an image classification technique based on quality embedding in the case of noisy labels. In terms of the overall implementation, it is divided into four parts:

(1) Collection of web image labels;

(2) Label quality factor embedding;

(3) Network model construction and parameter training;

(4) New image classification task processing.

These four parts constitute the image classification method and system of the present invention. For a better understanding of the invention, the implementation of the method and system is described below with reference to embodiments.

Fig. 1 is a flowchart of the classification method provided by this embodiment, in which:

(1) Collection of web image labels;

Images carrying the required classification labels are selected from YFCC100M, a public dataset based on the photo-sharing website Flickr, and web crawler technology is used to collect, download, and organize them, giving M images in total.

(2) Label quality factor embedding;

As shown in Fig. 2, the label quality factor S is introduced into an existing supervised image classification model, and the relationships between the label quality factor and the other variables are established. According to graphical model theory, the log-likelihood function ln P(Y|X) is rewritten as:

where X denotes the set of image contents, Y denotes the set of noisy labels provided by users, x_m and y_m are, respectively, the content of image m and the corresponding noisy label provided by the user, z_m and s_m are latent variables representing the true label of the image and the label quality, respectively, and E denotes expectation.

In this step, the label quality, the true label, and the user-provided label are modeled jointly. This helps to capture the relationship between noisy labels and true labels correctly and reduces the influence of noisy labels on classifier training, which in turn improves the accuracy of an image classifier trained on noisy labels. The objective function to be optimized is derived from this log-likelihood function as follows:

1. Following the idea of variational inference, Jensen's inequality is used to derive the evidence lower bound (ELBO) of the objective function to be optimized. The bound consists of three terms, an expectation term and two relative-entropy terms, which reduces the amount of computation required to train the classifier;

where x_m and y_m are, respectively, the content of image m and the corresponding noisy label provided by the user, z_m and s_m are latent variables representing the true label of the image and the label quality, respectively, D_KL[·||·] denotes relative entropy, and E denotes expectation.
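One way to write the log-likelihood and the three terms of its lower bound, consistent with the symbol definitions above, is sketched below. It is a reconstruction from the surrounding definitions rather than a quotation of the published formulas, and it assumes that the variational posterior factorizes as q(z_m, s_m | x_m, y_m) = q(z_m | x_m, y_m) q(s_m | x_m, y_m) and that the priors on z_m and s_m are independent.

```latex
\ln P(Y \mid X)
  = \sum_{m=1}^{M} \ln \mathbb{E}_{P(z_m \mid x_m)\, P(s_m)}
      \bigl[ P(y_m \mid z_m, s_m) \bigr]
  \;\ge\; \sum_{m=1}^{M} \Bigl(
      \mathbb{E}_{q(z_m, s_m \mid x_m, y_m)} \bigl[ \ln P(y_m \mid z_m, s_m) \bigr]
      - D_{KL}\bigl[ q(z_m \mid x_m, y_m) \,\|\, P(z_m \mid x_m) \bigr]
      - D_{KL}\bigl[ q(s_m \mid x_m, y_m) \,\|\, P(s_m) \bigr]
    \Bigr)
```

The inequality follows from multiplying and dividing by q inside the expectation and applying Jensen's inequality to the concave logarithm.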

2. Explicit expressions are derived for the two relative-entropy terms, assuming that the true label variable follows K-dimensional multinomial distributions q(z_m|x_m,y_m) and P(z_m|x_m), and that the label quality variable, with probability P(s_m), follows a multivariate Gaussian distribution with mean μ(x_m,y_m) and covariance diag(σ(x_m,y_m));
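Under these distributional assumptions both relative-entropy terms have closed forms. The sketch below shows the standard expressions in plain Python. It additionally assumes that the Gaussian term is measured against a standard normal prior N(0, I), which the text does not state, and the function names are illustrative.

```python
import math


def kl_categorical(q, p, eps=1e-12):
    """D_KL[q || p] for two K-dimensional categorical distributions."""
    return sum(
        qk * math.log((qk + eps) / (pk + eps))
        for qk, pk in zip(q, p)
        if qk > 0.0
    )


def kl_diag_gaussian_to_standard_normal(mu, var):
    """D_KL[N(mu, diag(var)) || N(0, I)], summed over dimensions.

    Assumption: the prior on the label quality is a standard normal.
    """
    return 0.5 * sum(v + m * m - 1.0 - math.log(v) for m, v in zip(mu, var))


# Assumed example values: encoder posterior vs. classifier prior over K = 3 classes.
q_z = [0.7, 0.2, 0.1]
p_z = [0.5, 0.3, 0.2]
kl_z = kl_categorical(q_z, p_z)

# Assumed example values for a 2-dimensional label quality variable.
kl_s = kl_diag_gaussian_to_standard_normal([0.3, -0.1], [0.25, 0.04])
```

Both quantities are non-negative and vanish only when the two distributions coincide, which is what lets them act as regularizers in the lower bound.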

3. The reparameterization trick is used to express the true label z_m and the label quality s_m in terms of auxiliary random variables γ_m and ζ_m. This addresses the excessive sampling variance of conventional Monte Carlo sampling and makes classifier training more stable.

The Gumbel-Softmax function is used to express the true label z_m as a function of the Gumbel variable γ_m: z_m = g(γ_m);

The label quality s_m is expressed in terms of the standard normal variable ζ_m ~ N(0,1) as s_m = μ(x_m,y_m) + σ²(x_m,y_m) ⊙ ζ_m,

where μ(x_m,y_m) is the mean of the label quality s_m, and σ²(x_m,y_m) is the variance of the label quality s_m.
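The two mappings can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the disclosed implementation: the `temperature` parameter is a common Gumbel-Softmax hyperparameter that the text does not mention, and the Gaussian sample is scaled by the standard deviation, as in the usual form of the reparameterization trick, whereas the text writes the scaling factor as σ².

```python
import math
import random


def sample_gumbel(rng):
    """Draw one standard Gumbel variable by inverse transform sampling."""
    u = rng.random()
    # Clamp away from 0 and 1 so that both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax(probs, gammas, temperature=1.0):
    """Relaxed one-hot sample z = g(gamma) from class probabilities."""
    logits = [
        (math.log(p + 1e-12) + g) / temperature for p, g in zip(probs, gammas)
    ]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reparameterize_gaussian(mu, std, zetas):
    """s = mu + std * zeta element-wise, with zeta drawn from N(0, 1)."""
    return [m + sd * z for m, sd, z in zip(mu, std, zetas)]


rng = random.Random(0)

# Assumed encoder output over K = 3 classes.
q_z = [0.7, 0.2, 0.1]
gammas = [sample_gumbel(rng) for _ in q_z]
z = gumbel_softmax(q_z, gammas, temperature=0.5)

# Assumed encoder output for a 2-dimensional label quality variable.
mu, std = [0.3, -0.1], [0.5, 0.2]
zetas = [rng.gauss(0.0, 1.0) for _ in mu]
s = reparameterize_gaussian(mu, std, zetas)
```

Because the randomness enters only through `gammas` and `zetas`, the samples `z` and `s` are deterministic functions of the encoder outputs, so gradients can flow back through the sampling step.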

On this basis, an explicit expression for the expectation term in the evidence lower bound is derived, and the resulting bound is used as the objective function to be optimized.

(3) Network model construction and parameter training;

1. As shown in Fig. 3, the evidence lower bound of the objective function to be optimized is modeled as a whole with neural networks. The resulting network model comprises four sub-models: an encoding model, a sampling model, a decoding model, and a classification model. When an image and its noisy label are input into the overall network model, forward propagation proceeds as follows:

a) The encoding model predicts the distributions q(z|x,y) and q(s|x,y) of the true label z and the label quality s from the image content x and the noisy label y.

b) The sampling model draws concrete values of the true label z and the label quality s from the probability distributions q(z|x,y) and q(s|x,y) given by the encoding model.

c) The decoding model feeds the concrete values of the true label z and the label quality s obtained by the sampling model into a neural network to obtain a prediction of the noisy label, and the cross-entropy loss is computed between this prediction and the noisy label supplied with the image.

d) A separate neural network classification model is trained to predict the true label of the image, P(z), and the relative entropy between this prediction and the true label distribution q(z|x,y) given by the encoding model is computed, which couples the classification model to the other three sub-models.
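How the quantities produced in steps a) to d) combine into a single training loss for one image can be sketched as follows. The probability vectors stand in for outputs of the sub-models and are made-up values. The function names and the equal weighting of the terms are assumptions for illustration.

```python
import math


def cross_entropy(pred_probs, label_index, eps=1e-12):
    """Negative log-probability assigned to the observed noisy label."""
    return -math.log(pred_probs[label_index] + eps)


def kl_categorical(q, p, eps=1e-12):
    """D_KL[q || p] for two categorical distributions."""
    return sum(
        qk * math.log((qk + eps) / (pk + eps))
        for qk, pk in zip(q, p)
        if qk > 0.0
    )


def per_sample_loss(decoder_probs, noisy_label, q_z, classifier_probs,
                    kl_quality=0.0):
    """Negative lower bound for one image.

    decoder_probs    : decoder prediction of the noisy label (step c)
    noisy_label      : index of the user-provided label
    q_z              : encoder distribution over the true label (step a)
    classifier_probs : classifier prediction P(z) (step d)
    kl_quality       : relative entropy of the label quality term
    """
    reconstruction = cross_entropy(decoder_probs, noisy_label)
    coupling = kl_categorical(q_z, classifier_probs)
    return reconstruction + coupling + kl_quality


# Assumed sub-model outputs for one image with noisy label index 1.
decoder_probs = [0.1, 0.8, 0.1]
q_z = [0.6, 0.3, 0.1]
loss_agree = per_sample_loss(decoder_probs, 1, q_z, [0.6, 0.3, 0.1])
loss_disagree = per_sample_loss(decoder_probs, 1, q_z, [0.1, 0.3, 0.6])
```

When the classifier agrees with the encoder's estimate of the true label, only the reconstruction term remains; a disagreeing classifier is penalized through the relative-entropy term.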

The encoding model uses a convolutional neural network to generate the prior prediction P(y) of the label from the image content x and, combining it with the noisy label y, to predict the label quality distribution q(s|x,y) and the true label distribution q(z|x,y). The label quality distribution is produced by a contrast layer, and the true label distribution is produced by an addition layer.

The sampling model maps the label quality distribution q(s|x,y) and the true label distribution q(z|x,y) generated by the encoding model to concrete values, namely the label quality s and the true label z. The sampling model uses the reparameterization trick, which reduces the variance of the sampled values and makes model training more stable.

The decoding model uses a neural network whose inputs are the label quality s and the true label z output by the sampling model, and recovers the prediction of the noisy label through a noise layer.

The neural network classification model uses a convolutional neural network whose input is the image content x and whose output is the prediction P(z) of the true label of the image. The relative entropy is computed against the true label distribution q(z|x,y) obtained while running the encoding and decoding models, and is used for supervised training, yielding the required image classifier.

2. The output of the decoding model, namely the predicted noisy label, and the user-provided noisy label y are fed together into the loss layer to compute the loss, and the network parameters of each sub-model are updated by stochastic gradient descent.

3. After multiple rounds of feedback iteration, training is complete once the neural network converges.
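The iterate-until-convergence procedure can be sketched generically as below. The joint update of the four sub-models is abstracted into a hypothetical `step_fn` callback, and a scalar toy problem stands in for the network. The stopping rule (change in mean epoch loss below `tol`) is one common choice and is an assumption, since the text does not specify a convergence criterion.

```python
import random


def train_until_converged(step_fn, data, max_epochs=100, tol=1e-6, seed=0):
    """Run SGD epochs; stop when the mean epoch loss changes by less than tol."""
    rng = random.Random(seed)
    previous = float("inf")
    history = []
    for _ in range(max_epochs):
        order = data[:]
        rng.shuffle(order)  # visit the samples in a fresh random order
        mean_loss = sum(step_fn(sample) for sample in order) / len(order)
        history.append(mean_loss)
        if abs(previous - mean_loss) < tol:
            break
        previous = mean_loss
    return history


# Toy stand-in for the network: a single scalar parameter.
state = {"w": 0.0}


def toy_step(target, lr=0.1):
    """One SGD update on a squared-error loss; returns the loss before the update."""
    error = state["w"] - target
    state["w"] -= lr * 2.0 * error  # gradient of error**2 with respect to w
    return error * error


history = train_until_converged(toy_step, [1.0, 1.0, 1.0, 1.0])
```

In a real setting `step_fn` would run the forward pass through the four sub-models, compute the loss, and apply one gradient update to all of their parameters.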

(4) New image classification task processing;

Using the neural network image classification model trained in (3), a new unlabeled image that needs to be classified is input into the trained classification model to obtain a prediction of the true label of the image.
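At inference time only the classification model is needed. A minimal sketch of turning its output into a label follows; the probability vector and class names are made-up values, and `predict_label` is a hypothetical helper.

```python
def predict_label(class_probs, class_names):
    """Return the most probable class name and its probability."""
    best = max(range(len(class_probs)), key=class_probs.__getitem__)
    return class_names[best], class_probs[best]


# Assumed classifier output for one new image.
label, confidence = predict_label([0.05, 0.85, 0.10], ["bird", "cat", "dog"])
```

The encoding, sampling, and decoding models are used only during training, so classifying a new image costs a single forward pass of the classifier.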

如图4所示，在另一实施例中，对应于上述方法，一种在标签含噪情况下基于质量嵌入的图像分类系统的实施例，包括：As shown in Figure 4, in another embodiment, corresponding to the above method, an embodiment of an image classification system based on quality embedding in the case of label noise, including:

标签质量因子嵌入模块：用于在传统有监督的图像分类模型中引入标签质量因子来控制带噪标签的预测值生成和吸收来自错误标签的误差回传信息，计算整体网络模型(嵌入质量因子之后的图像分类模型)对应的对数似然函数作为训练的优化目标函数；Label quality factor embedding module: used to introduce the label quality factor in the traditional supervised image classification model to control the generation of the predicted value of the noisy label and absorb the error return information from the wrong label, and calculate the overall network model (after embedding the quality factor The corresponding logarithmic likelihood function of the image classification model) is used as the optimization objective function of training;

网络模型构建模块：用于利用深度神经网络对优化目标函数进行建模，分别得到编码模型、采样模型和解码模型和分类模型四个子模型；Network model building block: it is used to use the deep neural network to model the optimization objective function, and obtain four sub-models of encoding model, sampling model, decoding model and classification model respectively;

Network parameter training module: inputs the training images and noisy labels into the overall network model and jointly trains the four sub-models end to end with a variant of stochastic gradient descent, updating the parameters of all four sub-models at the same time;

New image classification task processing module: for a new image to be classified, inputs it into the trained classification model to obtain a prediction of the image's true label.
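The way the four sub-models listed above might be wired together in one forward pass can be sketched as follows. All names here (`QualityEmbeddingModel`, `reparameterize`, the `toy_*` functions) are hypothetical, the sub-models are trivial stand-ins rather than neural networks, and treating the label quality s as a Gaussian variable while passing q(z|x,y) to the decoder as a soft label are assumptions made only for illustration.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]

def reparameterize(mu: Vector, log_var: Vector, rng: random.Random) -> Vector:
    """Sampling model: draw s = mu + sigma * eps with eps ~ N(0, 1).

    Writing the sample this way keeps it a deterministic function of mu and
    sigma, which is what lets an autodiff framework pass gradients through.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

@dataclass
class QualityEmbeddingModel:
    # (x, y) -> q(z|x,y), mean of s, log-variance of s
    encoder: Callable[[Vector, Vector], Tuple[Vector, Vector, Vector]]
    # (z, s) -> predicted noisy label
    decoder: Callable[[Vector, Vector], Vector]
    # x -> predicted true-label distribution
    classifier: Callable[[Vector], Vector]

    def forward(self, x: Vector, y: Vector, rng: random.Random):
        q_z, mu_s, log_var_s = self.encoder(x, y)
        s = reparameterize(mu_s, log_var_s, rng)
        y_hat = self.decoder(q_z, s)
        p_z = self.classifier(x)
        return q_z, s, y_hat, p_z

# Toy stand-ins so the sketch runs without any trained network.
def toy_encoder(x, y):
    return list(y), [0.0], [-40.0]   # trust y; nearly deterministic quality

def toy_decoder(z, s):
    return list(z)                   # ignore quality, echo the label

def toy_classifier(x):
    return [1.0 / 3, 1.0 / 3, 1.0 / 3]

rng = random.Random(0)
model = QualityEmbeddingModel(toy_encoder, toy_decoder, toy_classifier)
q_z, s, y_hat, p_z = model.forward([0.2, 0.4], [0.0, 1.0, 0.0], rng)
```

During training, `y_hat` would be compared with the noisy label y and `p_z` with `q_z`; at prediction time only `classifier` would be called.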

The technical features implemented by the specific modules of the above image classification system based on quality embedding in the case of noisy labels correspond to the respective steps of the image classification method based on quality embedding in the case of noisy labels.

It should be noted that the steps of the method provided by the present invention can be implemented by the corresponding modules, devices, units, and so on of the system, and those skilled in the art can implement the step flow of the method by referring to the technical solution of the system. That is, the embodiments of the system can be understood as preferred examples for implementing the method, and they are not described again here.

Those skilled in the art know that, besides implementing the system provided by the present invention and its devices purely as computer-readable program code, the method steps can be logically programmed so that the system and its devices achieve the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. The system provided by the present invention and its devices can therefore be regarded as a hardware component, and the devices it contains for realizing various functions can be regarded as structures within that hardware component; the devices for realizing various functions can also be regarded both as software modules implementing the method and as structures within the hardware component.

Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to these specific embodiments, and those skilled in the art may make various changes or modifications within the scope of the claims without affecting the substance of the present invention. Where no conflict arises, the embodiments of the present application and the features of the embodiments may be combined with one another arbitrarily.

Claims

1. An image classification method based on quality embedding in the case of noisy labels, characterized by comprising:

Web image label collection step: collecting a large number of images and the user-provided label information from a web image-sharing platform, and filtering and organizing them by the required categories for use in training the image classifier;

Label quality factor embedding step: introducing a label quality factor into a supervised image classification model to control the generation of the predicted noisy label and to absorb the back-propagated error from incorrect labels; and designing, by maximizing the log-likelihood function, the optimization objective function obtained after the label quality factor is added;

Network model construction step: modeling the optimization objective function with deep neural networks to obtain four models, namely an encoding model, a sampling model, a decoding model, and a classification model;

Network parameter training step: inputting the training images and noisy labels obtained in the web image label collection step into the network model obtained in the network model construction step, and jointly training the four models end to end with a variant of stochastic gradient descent while updating the model parameters, to obtain the trained network model;

Image classification step: for a new image to be classified, inputting it into the trained classification model to obtain a prediction of the image's true label, thereby producing the classification result of the image.
2. The image classification method based on quality embedding in the case of noisy labels according to claim 1, characterized in that: in the label quality factor embedding step, the embedding of an image label quality factor is added to an existing supervised image classification model, so that the new optimization objective function is:

$$\ln P(Y \mid X) = \sum_{m=1}^{M} \ln P(y_m \mid x_m) = \sum_{m=1}^{M} \ln \mathbb{E}_{P(z_m \mid x_m),\, P(s_m)}\left[ P(y_m \mid z_m, s_m) \right]$$

where x_m and y_m are, respectively, the pixel set of the m-th image and the noisy label provided by the corresponding user; z_m and s_m are latent variables representing, respectively, the image's true label and the label quality; and M is the total number of images used for training;

Because the label quality factor is added, the new optimization objective function absorbs the adverse effect of incorrect labels in the training data set; meanwhile, since the gradient of the new optimization objective function is difficult to compute, its evidence lower bound is optimized instead to reduce the computing resources required for training, and the reparameterization trick is applied, giving the final joint optimization objective function.
3. The image classification method based on quality embedding in the case of noisy labels according to claim 2, characterized in that: in the network model construction step, the final joint optimization objective function is modeled with deep neural networks to obtain four models: an encoding model, a sampling model, a decoding model, and a classification model; wherein:

The encoding model uses a convolutional neural network to generate a prior prediction of the noisy label from the image content X and, in combination with the noisy label y, to predict the label quality distribution q(S|X,Y) and the true-label distribution q(Z|X,Y);

The sampling model maps the label quality distribution q(S|X,Y) and the true-label distribution q(Z|X,Y) generated by the encoding model to the definite values S and Z;

The decoding model uses a neural network whose inputs are the label quality S and the true label Z output by the sampling model, and generates the posterior prediction q(Y|Z,S) of the noisy label;

The classification model uses a convolutional neural network and generates the prediction of the true label Z from the image.
4. The image classification method based on quality embedding in the case of noisy labels according to claim 3, characterized in that: in the network parameter training step, the posterior prediction q(Y|Z,S) of the noisy label recovered by the decoding model is used for supervised model training, the back-propagated gradients of the encoding model, the sampling model, and the decoding model are computed, and the model parameters are updated; meanwhile, the true-label distribution q(Z|X,Y) obtained in the encoding model is used for supervised training of the classification model, the back-propagated gradients of the neural network are computed, and the model parameters are updated.
5. The image classification method based on quality embedding in the case of noisy labels according to any one of claims 1-4, characterized in that: in the web image label collection step, web crawler technology is used to collect the required large number of images and user-annotated labels from image-sharing social networking sites.
6. An image classification system based on quality embedding in the case of noisy labels, characterized by comprising:

Web image label collection module: collects a large number of images and the user-provided label information from a web image-sharing platform, and filters and organizes them by the required categories;

Label quality factor embedding module: introduces a label quality factor into a conventional supervised image classification model to control the generation of the predicted noisy label and to absorb the back-propagated error from incorrect labels, and computes the log-likelihood function of the image classification model as the optimization objective function for training;

Network model construction module: models the optimization objective function with deep neural networks to obtain four models, namely an encoding model, a sampling model, a decoding model, and a classification model;

Network parameter training module: inputs the training images and noisy labels into the network model and jointly trains the four models end to end with a variant of stochastic gradient descent while updating the model parameters;

New image classification task processing module: for a new image to be classified, inputs it into the trained classification model to obtain a prediction of the image's true label.
7. The image classification system based on quality embedding in the case of noisy labels according to claim 6, characterized in that: the label quality factor embedding module adds the embedding of an image label quality factor to an existing supervised image classification model, so that the new optimization objective function is:

$$\ln P(Y \mid X) = \sum_{m=1}^{M} \ln P(y_m \mid x_m) = \sum_{m=1}^{M} \ln \mathbb{E}_{P(z_m \mid x_m),\, P(s_m)}\left[ P(y_m \mid z_m, s_m) \right]$$

where x_m and y_m are, respectively, the pixel set of the m-th image and the noisy label provided by the corresponding user; z_m and s_m are latent variables representing, respectively, the image's true label and the label quality; and M is the total number of images used for training;

Because the label quality factor is added, the new optimization objective function absorbs the adverse effect of incorrect labels in the training data set; meanwhile, since the gradient of the new optimization objective function is difficult to compute, its evidence lower bound is optimized instead to reduce the computing resources required for training, and the reparameterization trick is applied, giving the final joint optimization objective function.
8. The image classification method based on quality embedding in the case of noisy labels according to claim 7, characterized in that: the network model construction module models the final joint optimization objective function with deep neural networks to obtain four models: an encoding model, a sampling model, a decoding model, and a classification model; wherein:

The encoding model uses a convolutional neural network to generate a prior prediction of the noisy label from the image content X and, in combination with the noisy label y, to predict the label quality distribution q(S|X,Y) and the true-label distribution q(Z|X,Y);

The sampling model maps the label quality distribution q(S|X,Y) and the true-label distribution q(Z|X,Y) generated by the encoding model to the definite values S and Z;

The decoding model uses a neural network whose inputs are the label quality S and the true label Z output by the sampling model, and generates the posterior prediction q(Y|Z,S) of the noisy label;

The classification model uses a convolutional neural network and generates the prediction of the true label Z from the image.
9. The image classification method based on quality embedding in the case of noisy labels according to claim 8, characterized in that: the network parameter training module uses the posterior prediction q(Y|Z,S) of the noisy label recovered by the decoding model for supervised model training, computes the back-propagated gradients of the encoding model, the sampling model, and the decoding model, and updates the model parameters; meanwhile, it uses the true-label distribution q(Z|X,Y) obtained in the encoding model for supervised training of the classification model, computes the back-propagated gradients of the neural network, and updates the model parameters.
10. The image classification system based on quality embedding in the case of noisy labels according to any one of claims 6-7, characterized in that: the web image label collection module uses web crawler technology to collect the required large number of images and user-annotated labels from image-sharing social networking sites.