CN111461159A - Decoupling representation learning algorithm based on similarity constraint

Decoupling representation learning algorithm based on similarity constraint

Info

Publication number
CN111461159A
Authority
CN
China
Prior art keywords
similarity
representation learning
learning algorithm
model
data set
Prior art date
Legal status
Pending
Application number
CN201910598166.0A
Other languages
Chinese (zh)
Inventor
李晓强 (Li Xiaoqiang)
陈亮波 (Chen Liangbo)
Current Assignee
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN201910598166.0A
Publication of CN111461159A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2155 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting, characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/088 - Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a decoupling representation learning algorithm based on similarity constraint, and aims to overcome the shortcomings of InfoGAN, an existing unsupervised representation learning model. The method comprises the following specific steps: step one, preparing a data set; step two, selecting a model structure: in order to generate pictures with better visual quality, a WGAN structure with gradient penalty is adopted; step three, applying similarity constraint to the model: on the basis of the WGAN structure with gradient penalty, constraints are imposed between the similarity of the generated pictures and the factors. Compared with the InfoGAN model, the similarity-constrained generative adversarial network (SCGAN) has a simple structure, since only the similarity constraint needs to be added to the original generative adversarial network. At the same time, SCGAN is more robust and remains stable even on processed data sets. Because SCGAN is an unsupervised representation learning model, expensive labeling work can be avoided, so the application prospect is broad.

Description

Decoupling representation learning algorithm based on similarity constraint
Technical Field
The invention relates to the field of picture representation, in particular to a decoupling representation learning algorithm based on similarity constraint.
Background
In probability and statistics, a generative model is a model that can estimate a probability distribution from training data and randomly generate new observations from that distribution. For a machine learning algorithm to understand the intrinsic rules of the data, the algorithm needs to learn to create, which is why generating data is important. Representation learning is a popular field within generative learning and has attracted much attention from researchers. The efficient representations obtained by representation learning can assist many discriminative tasks in machine learning, such as classification, segmentation and detection. Decoupled representation learning is a sub-branch of representation learning whose goal is to learn factors that control the high-level semantic information of a picture. Supervised models such as CGAN (conditional generative adversarial network) explicitly provide labels so that the factors learn to control the object class. Unsupervised models such as InfoGAN (information-maximizing generative adversarial network) measure the relationship between the factors and the picture representation through mutual information, maximize a lower bound of the mutual information with a variational technique, and thereby let the factors learn to control latent representations of the picture such as illumination and color.
Conditional generative models require labels for representation learning, and in most cases obtaining labels is expensive. Moreover, because the labels prescribe which representations are to be learned, the representations captured by such models are limited, for example to capturing the digit class on a handwritten-digit data set.
InfoGAN is a relatively classical unsupervised representation learning model. Its idea is to maximize the mutual information between the factor and the picture: since the factor controls some representation of the picture, the factor must be closely related to the picture, and mutual information can measure this relation. However, the InfoGAN model is complex: in order to maximize the mutual information it resorts to a variational technique and adds an extra neural network to maximize the lower bound. Moreover, InfoGAN training is unstable, and mode collapse easily occurs on some processed data sets (randomly translated handwritten-digit data sets), which is why related research continues.
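For reference, the quantity that InfoGAN maximizes is the variational lower bound of the mutual information between the factor c and the generated picture G(z, c), which is where the additional neural network Q comes from:

    L_I(G, Q) = E_{c ~ P(c), x ~ G(z, c)}[ log Q(c | x) ] + H(c) <= I(c; G(z, c)),

where Q approximates the posterior distribution of the factor given the picture and H(c) is the entropy of the factor distribution. The similarity constraint proposed below avoids this auxiliary network entirely.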
Disclosure of Invention
An object of the embodiments of the present invention is to provide a decoupling representation learning algorithm based on similarity constraint, so as to solve the problems proposed in the above background art.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
a decoupling representation learning algorithm based on similarity constraint specifically comprises the following steps:
step one, preparing a data set;
step two, selecting a model structure: the training process of WGAN is more stable than that of the original GAN and converges more easily. In order for the discriminator to estimate the Wasserstein distance, the discriminator must satisfy the 1-Lipschitz constraint, whose essence is that the change in the discriminator's output must be no greater than the change in its input. Several techniques can make the discriminator approximately satisfy the 1-Lipschitz constraint, such as weight clipping and gradient penalty; in practice the gradient penalty works far better than weight clipping, so the gradient penalty is preferred (a minimal sketch of the gradient penalty is given after step three);
step three, applying similarity constraint to the model: on the basis of the WGAN structure with gradient penalty, constraints are imposed between the similarity of the generated pictures and the factors. The core idea is that when the factors are the same, the difference between the representations of the generated pictures should be as small as possible; conversely, when the factors are different, the difference should be as large as possible. The similarity constraint thus creates a repulsion between different factors and an attraction between identical factors, so that different factors come to control different representations, which achieves the goal of representation learning.
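As an illustration of step two, the following minimal PyTorch sketch shows a gradient-penalty term that approximately enforces the 1-Lipschitz constraint on the discriminator. The random interpolation between real and generated pictures (assumed to be 4-D tensors of shape batch x channel x height x width) and the coefficient lambda_gp = 10 follow the common WGAN-GP recipe and are illustrative assumptions rather than values fixed by this invention.

import torch


def gradient_penalty(discriminator, real, fake, lambda_gp=10.0):
    """Penalize deviations of the discriminator's gradient norm from 1."""
    real, fake = real.detach(), fake.detach()
    batch_size = real.size(0)
    # Random points on the straight lines between real and generated pictures.
    eps = torch.rand(batch_size, 1, 1, 1, device=real.device)
    interpolated = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = discriminator(interpolated)
    grads = torch.autograd.grad(
        outputs=scores,
        inputs=interpolated,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
    )[0]
    grad_norm = grads.reshape(batch_size, -1).norm(2, dim=1)
    # Push the gradient norm towards 1, which softly enforces the 1-Lipschitz limit.
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()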
As a further scheme of the embodiment of the invention: the data set in step one comprises simple data sets and complex data sets with rich samples, so that the model is provided with samples that are as comprehensive as possible.
As a further scheme of the embodiment of the invention: the simple data sets comprise MNIST, Fashion-MNIST and SVHN, in which the target is centered; the SVHN data set contains colored digit pictures with richer background colors, and several digits may appear in each picture.
As a further scheme of the embodiment of the invention: the complex data sets comprise CIFAR-10 and CelebA; the CIFAR-10 data set contains pictures of real scenes covering 10 object classes, and even objects of the same class can differ greatly.
As a further scheme of the embodiment of the invention: since the difference in representation of the generated picture is not easily obtained, it is proposed to assume that the generated picture is composed of two parts of content and representation, and when the control content is the same, the difference in representation of the generated picture is equivalent to the difference in similarity of the generated picture, thereby converting the constraint from the difference in representation of the generated picture to the difference in similarity of the generated picture.
As a further scheme of the embodiment of the invention: the MNIST and Fashion-MNIST data sets contain gray-scale pictures with a black background and a single centered object.
As a further scheme of the embodiment of the invention: the CelebA data set contains a large number of celebrity faces with rich facial representations.
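The patent only requires that pictures generated from the same factor be similar and pictures generated from different factors be dissimilar. The following PyTorch sketch shows one plausible concrete form of such a constraint; the pixel-space cosine similarity, the margin value and the assumption of a discrete factor c are illustrative choices, not the exact loss defined by the invention.

import torch
import torch.nn.functional as F


def similarity_constraint(fake_images, factors, margin=0.5):
    """Attract pictures generated from the same factor, repel the others."""
    flat = fake_images.reshape(fake_images.size(0), -1)
    # Pairwise similarity of the generated pictures (here: cosine similarity in pixel space).
    sim = F.cosine_similarity(flat.unsqueeze(1), flat.unsqueeze(0), dim=2)
    # Pairwise mask: 1 where two samples share the same discrete factor, 0 otherwise.
    same = (factors.unsqueeze(1) == factors.unsqueeze(0)).float()
    attract = same * (1.0 - sim)                   # same factor: push similarity towards 1
    repel = (1.0 - same) * F.relu(sim - margin)    # different factor: push similarity below the margin
    return (attract + repel).mean()

The attraction term pulls pictures that share a factor together and the repulsion term pushes pictures with different factors apart, which matches the attraction/repulsion behavior described in step three.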
Compared with the prior art, the embodiment of the invention has the beneficial effects that:
Compared with the InfoGAN model, the SCGAN model (similarity-constrained generative adversarial network) has a simple structure: only the similarity constraint needs to be added to the original generative adversarial network. At the same time, SCGAN is more robust and remains stable on processed data sets. Moreover, SCGAN is an unsupervised representation learning model, so expensive labeling work can be avoided and the application prospect is broad.
Drawings
Fig. 1 is a workflow diagram of embodiment 1 in a decoupling representation learning algorithm based on similarity constraints.
Detailed Description
The technical solution of the present patent will be described in further detail with reference to the following embodiments.
Example 1
A decoupling representation learning algorithm based on similarity constraint specifically comprises the following steps:
Step 1, noise z is randomly sampled from a Gaussian distribution, and a factor c is sampled from a multinomial distribution or a uniform distribution; the factor c may be discrete or continuous. The noise z and the factor c are input into the generator to generate a picture.
Step 2, a batch of real pictures x is randomly sampled from the data set and a batch of generated pictures G(z, c) is taken from the output of the generator; both are input into the discriminator to distinguish real from fake.
Step 3, the similarity constraint is applied to the factor c and the generated pictures G(z, c).
Step 4, the parameters of the discriminator and the generator are updated separately using a gradient descent algorithm.
Step 5, repeating steps 1-4 until the generator is able to generate a sufficiently realistic picture and different factors c are able to control different representations of the picture.
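For concreteness, the following is a minimal PyTorch sketch of the training loop of steps 1 to 5. The Generator and Discriminator modules, the optimizer settings, the loss weight lambda_sim and the use of a discrete factor c are illustrative assumptions, and the data loader is assumed to yield (picture, label) pairs whose labels are ignored; gradient_penalty and similarity_constraint refer to the sketches given earlier in this description.

import torch


def train(generator, discriminator, dataloader, n_factors=10, z_dim=62,
          epochs=50, lambda_sim=1.0, device="cpu"):
    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.9))
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4, betas=(0.5, 0.9))
    for _ in range(epochs):                              # step 5: repeat steps 1-4
        for real, _ in dataloader:                       # labels are ignored (unsupervised)
            real = real.to(device)
            b = real.size(0)
            # Step 1: sample noise z from a Gaussian and a (here discrete) factor c.
            z = torch.randn(b, z_dim, device=device)
            c = torch.randint(0, n_factors, (b,), device=device)
            fake = generator(z, c)
            # Step 2: discriminator update with the Wasserstein loss plus gradient penalty.
            d_loss = (discriminator(fake.detach()).mean()
                      - discriminator(real).mean()
                      + gradient_penalty(discriminator, real, fake))
            opt_d.zero_grad()
            d_loss.backward()
            opt_d.step()
            # Steps 3-4: generator update with the adversarial loss plus similarity constraint.
            fake = generator(z, c)
            g_loss = (-discriminator(fake).mean()
                      + lambda_sim * similarity_constraint(fake, c))
            opt_g.zero_grad()
            g_loss.backward()
            opt_g.step()

Training stops once the generator produces sufficiently realistic pictures and different factors c control different representations, as stated in step 5.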
The working principle of the embodiment of the invention is as follows: the method is based on a generative adversarial network and a similarity constraint, and can learn decoupled representations without supervision. Because of the unsupervised learning mode, the pictures in the data set do not need to be labeled in advance. The representation of a picture is a relatively fuzzy concept, and it is very difficult for people to define clear boundaries for it, for example how to divide the brightness levels of a picture. In the unsupervised mode, the latent characteristics of the data set can be analyzed automatically, and changes in the more salient picture representations can be learned.
The model structure of the method is very simple: only the similarity constraint needs to be added to the original generative adversarial network. In contrast, InfoGAN, another unsupervised model for decoupled representation learning, needs to introduce an additional neural network to optimize the lower bound of the mutual information, which increases the complexity of the model. In addition, in terms of training difficulty, the model of the method converges more easily; on some processed data sets (randomly translated handwritten-digit data sets) InfoGAN collapses during training, whereas the model of the method remains very robust.
The model of the method also has good extensibility. Since the similarity constraint acts only between the generated pictures and the factor, any new kind of generative adversarial network can integrate the similarity constraint. The method can therefore generate pictures of higher quality by using such newer models, while the similarity constraint ensures that the factor captures changes in the picture representation.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment contains only a single independent technical solution; this manner of description is merely for clarity, and those skilled in the art should consider the description as a whole, since the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.

Claims (7)

1. A decoupling representation learning algorithm based on similarity constraint is characterized by comprising the following specific steps:
step one, preparing a data set;
step two, selecting a model structure: a WGAN structure with gradient penalty is adopted;
step three, applying similarity constraint to the model: on the basis of the WGAN structure with gradient penalty, constraints are imposed between the similarity of the generated pictures and the factors.
2. The similarity constraint-based decoupled representation learning algorithm of claim 1, wherein the first-step data set comprises a simple data set and a complex data set.
3. The similarity constraint-based decoupled representation learning algorithm of claim 2, wherein the simple dataset comprises MNIST, Fashion-MNIST, and SVHN.
4. The similarity constraint-based decoupled representation learning algorithm according to claim 2 or 3, characterized in that the complex dataset comprises CIFAR-10 and CelebA.
5. The similarity constraint-based decoupled representation learning algorithm of claim 1, wherein the generated picture is composed of two parts, namely content and representation.
6. The decoupled representation learning algorithm based on similarity constraints of claim 3 wherein the MNIST and Fashion-MNIST data sets contain gray scale maps, black backgrounds and objects.
7. The decoupling representation learning algorithm based on similarity constraint according to claim 4, wherein the data set CelebA contains a large number of celebrity faces.
CN201910598166.0A, filed 2019-07-04 (priority date 2019-07-04): Decoupling representation learning algorithm based on similarity constraint. Status: Pending. Published as CN111461159A (en).

Priority Applications (1)

Application Number: CN201910598166.0A; Priority Date: 2019-07-04; Filing Date: 2019-07-04; Title: Decoupling representation learning algorithm based on similarity constraint

Applications Claiming Priority (1)

Application Number: CN201910598166.0A; Priority Date: 2019-07-04; Filing Date: 2019-07-04; Title: Decoupling representation learning algorithm based on similarity constraint

Publications (1)

Publication Number: CN111461159A; Publication Date: 2020-07-28

Family

ID=71679116

Family Applications (1)

Application Number: CN201910598166.0A; Title: Decoupling representation learning algorithm based on similarity constraint; Priority Date: 2019-07-04; Filing Date: 2019-07-04; Status: Pending

Country Status (1)

Country Link
CN (1) CN111461159A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763857A * 2018-05-29 2018-11-06 Zhejiang University of Technology Process soft-sensor modeling method based on a similarity generative adversarial network
CN109086437A * 2018-08-15 2018-12-25 Chongqing University Image retrieval method fusing Faster-RCNN and a Wasserstein autoencoder

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XIAOQIANG LI: "SCGAN: Disentangled Representation Learning by Adding Similarity Constraint on Generative Adversarial Nets" *
Lin Yilun et al.: "The new frontier of artificial intelligence research: generative adversarial networks" (人工智能研究的新前线: 生成式对抗网络) *
Wang Wanliang; Li Zhuorong: "Research progress of generative adversarial networks" (生成式对抗网络研究进展) *

Similar Documents

Publication Publication Date Title
CN108960409B (en) Method and device for generating annotation data and computer-readable storage medium
CN111915540B (en) Rubbing oracle character image augmentation method, rubbing oracle character image augmentation system, computer equipment and medium
CN100550038C (en) Image content recognizing method and recognition system
CN101315663B Natural scene image classification method based on regional latent semantic features
CN111898696A (en) Method, device, medium and equipment for generating pseudo label and label prediction model
CN113449594A (en) Multilayer network combined remote sensing image ground semantic segmentation and area calculation method
CN112132197A (en) Model training method, image processing method, device, computer equipment and storage medium
CN110399518A Visual question answering enhancement method based on graph convolution
Zhao et al. A malware detection method of code texture visualization based on an improved faster RCNN combining transfer learning
CN109726712A (en) Character recognition method, device and storage medium, server
CN111652233A (en) Text verification code automatic identification method for complex background
CN113157800A (en) Identification method for discovering dynamic target in air in real time
CN113435335B (en) Microscopic expression recognition method and device, electronic equipment and storage medium
CN110097616B (en) Combined drawing method and device, terminal equipment and readable storage medium
CN112381082A (en) Table structure reconstruction method based on deep learning
CN116309992A (en) Intelligent meta-universe live person generation method, equipment and storage medium
CN115984653B (en) Construction method of dynamic intelligent container commodity identification model
Aiwan et al. Image spam filtering using convolutional neural networks
CN110738239A (en) search engine user satisfaction evaluation method based on mouse interaction sequence region behavior joint modeling
Ver Hoef et al. A primer on topological data analysis to support image analysis tasks in environmental science
Zhang et al. Multi-weather classification using evolutionary algorithm on efficientnet
CN116777925B (en) Image segmentation domain generalization method based on style migration
CN111461159A (en) Decoupling representation learning algorithm based on similarity constraint
CN116415152A (en) Diffusion model-based self-supervision contrast learning method for human motion recognition
CN116681921A (en) Target labeling method and system based on multi-feature loss function fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200728