CN115690487A - Small sample image generation method
Publication number: CN115690487A (application CN202211230704.9A); Authority: CN (China); Legal status: Pending
Abstract
The invention discloses a small sample image generation method for generating images in data-limited scenes. The method comprises the following steps: randomly sampling from a dynamic Gaussian mixture distribution to obtain a dynamic Gaussian mixture latent code; inputting the dynamic Gaussian mixture latent code into a generation network and enhancing the intermediate features of the generation network, which are obtained by the generation network mapping the latent code, through a hybrid attention mechanism, then inputting the enhanced intermediate features back into the generation network to obtain a generated image set; inputting the generated image set and the real image set into a discrimination network to obtain image discrimination results for both sets; and updating the generation network and the discrimination network according to the image discrimination results and the target optimization functions of the two networks, obtaining the updated generation network and the updated discrimination network.
Description
Technical Field
The invention belongs to the field of computer vision and mainly addresses the problem of image generation in small sample scenes; it is mainly applied in fields such as image editing, generation, expansion and enhancement.
Background
With the continuous development of deep learning, remarkable progress has been made in computer vision, with applications across many fields. Deep image generation models use a deep network to learn and understand the content and distribution of pictures, learning to generate pictures similar to real ones; they are a hot topic in computer vision. Deep image generation models underpin tasks such as image restoration, editing and super-resolution, and can be applied in fields such as film and television media and creative design. However, training a deep image generation model usually requires a large amount of data and computation, which greatly limits the application of generative models in fields with only a small number of pictures, such as medical images and celebrity paintings. Applying generative models to small sample scenes is therefore a direction of great application and research significance: the generated data can realize data expansion in small sample scenes and can assist problems such as small sample classification and segmentation.
When trained on only a small number of pictures, a generative model often overfits and simply memorizes the training data, and cannot produce realistic and diverse pictures. To improve the realism and diversity of images generated in small sample scenes, researchers have proposed various methods to alleviate model overfitting. A direct approach borrows the idea of transfer learning: assume there exists a source domain similar to the small sample data but with abundant data, pre-train on the source domain, and then transfer its knowledge to the small sample target domain to improve the diversity and realism of small sample generation. However, this type of approach has two problems: first, pre-training on the source domain still incurs large computation and acquisition costs; second, when there is a certain deviation between the source domain and the small sample target domain, performance on the target domain actually degrades.
Another type of small sample generation method realizes data expansion through data enhancement techniques such as flipping and translating the small sample data, increasing the training data available to the model. This type of method also presents two problems: first, flipping and translating the original pictures can change the distribution of the original data, misleading the generative model into producing implausible pictures; second, the augmentation essentially operates on the same batch of data and does not change its intrinsic structure, so the model is still prone to overfitting.
Disclosure of Invention
Different from existing small sample image generation methods, the method provided by the invention starts from the data prior and the essence of the data. Based on richer prior information, it can provide more editable attribute assumptions for the model: it designs a dynamic Gaussian mixture latent-space code as the input signal of the generative model, providing more diversified mixture-Gaussian latent codes. Meanwhile, to further ensure the diversity and realism of the generated pictures, the invention designs a hybrid attention enhancement module that enhances the content and layout of the intermediate features in the generation process, ensuring the plausibility and integrity of local content and global layout. By fusing the two modules, the invention constructs a small sample image generation method and obtains excellent results on small sample data from different fields, including cartoon styles, real photos and the like.
The invention aims to provide a small sample image generation method, which improves the fidelity and richness of a generated picture in a small sample scene.
The invention relates to a small sample image generation method that provides more variable and editable attributes for the generative model by introducing dynamic Gaussian mixture latent codes, addressing the insufficient diversity of existing small sample generation methods. Meanwhile, to further improve the realism of the generated pictures, the invention proposes a hybrid attention mechanism that enhances the global layout and local content of the intermediate features in the generation process, effectively retaining their key information. By fusing these methods, the diversity and realism of small sample image generation are effectively improved, and problems such as unstable training and overfitting in small sample scenes are alleviated.
For convenience in describing the present disclosure, definitions of some commonly used terms are first given.
Definition 1: Generative Adversarial Network (GAN): the generative adversarial network is the most common and widely applied deep generative model, and the generation system of the invention is built with it as the base network. A generative adversarial network usually consists of a generation network (Generator, G) and a discrimination network (Discriminator, D): the generation network G maps a latent code (Latent Code) sampled from a specific distribution into a generated picture, the generated picture and real pictures are fed into the discrimination network D together, and the discrimination network learns to distinguish generated pictures from real ones. The training objective function of the generative adversarial network is:
min_G max_D E_{x∼I_real}[log D(x)] + E_{z}[log(1 − D(G(z)))]
In the above formula, I_real represents the real data distribution, G(z) represents the generated data, and log(·) represents the logarithmic loss. D and G respectively denote the discrimination network and the generation network: the discrimination network maximizes the classification loss between generated and real data, while the generation network minimizes the discrimination network's classification loss on the generated data. The two play a game against each other and finally reach an equilibrium state, in which the generation network can produce sufficiently realistic pictures and the discrimination network can no longer tell whether a picture is real or generated.
Definition 2: Latent Code: the latent code is the input to the generation network G, typically a fixed-length vector z randomly sampled from a particular distribution (e.g., Gaussian, uniform). The generation network G learns to map the latent code z to a generated picture.
Definition 3: Attention Mechanism: inspired by the human visual attention mechanism, an attention mechanism learns which contents of an image deserve more attention and increases the weight of the corresponding parts.
Definition 4: Gaussian Mixture Distribution: a Gaussian mixture distribution is a distribution formed by combining N Gaussian distributions. To provide more editable and variable attributes for the generation network, the invention randomly samples the latent code from a Gaussian mixture distribution as the input of the generation network, rather than from a single Gaussian distribution as in the prior art:
p(z) = Σ_{i=1}^{N} w_i · N(z; μ_i, Σ_i),  with Σ_{i=1}^{N} w_i = 1
In the above formula, w_i is the weight of the i-th Gaussian distribution, (μ_i, Σ_i) are respectively the mean and variance of the i-th Gaussian distribution, and z represents a sample from the Gaussian mixture distribution.
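As an illustration of Definition 4 (not part of the patent — the function name is hypothetical, components are assumed diagonal, and NumPy stands in for whatever framework the implementation actually uses), mixture sampling picks a component per draw and then samples from that Gaussian:

```python
import numpy as np

def sample_gmm(weights, means, stds, n, rng=None):
    """Draw n latent vectors from a mixture of N diagonal Gaussians.

    weights: (N,) mixture weights w_i summing to 1
    means:   (N, d) component means mu_i
    stds:    (N, d) component standard deviations sigma_i
    """
    rng = np.random.default_rng(rng)
    # pick a component index per sample according to the mixture weights
    comp = rng.choice(len(weights), size=n, p=weights)
    eps = rng.standard_normal((n, means.shape[1]))
    return means[comp] + stds[comp] * eps

# three components in a 4-dimensional latent space (toy values)
w = np.array([0.5, 0.3, 0.2])
mu = np.zeros((3, 4)); mu[1] += 5.0; mu[2] -= 5.0
sigma = np.ones((3, 4))
z = sample_gmm(w, mu, sigma, n=1000, rng=0)
```

Sampling a component index first and then drawing from that Gaussian is exactly the two-stage generative story the mixture density above describes.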
Definition 5: Reparameterization Trick: because directly sampling the latent code from the N Gaussian distributions makes the neural network non-differentiable, the invention reparameterizes the Gaussian mixture distribution: it first samples δ from a standard Gaussian distribution and then obtains z by reparameterization:
z = μ_i + σ_i·δ
In the above formula, μ_i and σ_i are network-learnable parameters, and δ is randomly sampled from a Gaussian distribution with mean 0 and variance 1, i.e., δ ∼ N(0, 1).
In order to provide more variable and editable attributes for the generation network, the invention further introduces a dynamic regulation factor λ that dynamically regulates the Gaussian components of the latent code:
z = λμ_i + (1 − λ)σ_i·δ
In the above formula, λ is the dynamic regulation factor, μ_i and σ_i are network-learnable parameters, and δ is randomly sampled from a Gaussian distribution with mean 0 and variance 1, i.e., δ ∼ N(0, 1). z denotes the resulting dynamic Gaussian mixture latent code.
Definition 6: Sigmoid activation function: the activation function produces a new output through a nonlinear transformation of the model's input, formally defined as:
g(z) = 1 / (1 + e^{−z})
In the above formula, e denotes the natural base, z is the input to the activation function, and g(z) is the Sigmoid activation output.
Definition 7: Convolution: convolution is a technique for processing an image that exploits the spatial dependence among the pixels of the input image.
Definition 8: Pooling: pooling compresses and reduces the dimensionality of an input feature map according to a chosen rule; common rules include maximum pooling and average pooling.
The invention provides a small sample image generation method, characterized by comprising the following steps:
randomly sampling from a dynamic Gaussian mixture distribution to obtain a dynamic Gaussian mixture latent code, wherein the dynamic Gaussian mixture distribution is a Gaussian mixture distribution with a dynamic regulation factor introduced;
inputting the dynamic Gaussian mixture latent code into a generation network and enhancing the intermediate features of the generation network through a hybrid attention mechanism, wherein the intermediate features are obtained by the generation network mapping the dynamic Gaussian mixture latent code, and inputting the enhanced intermediate features back into the generation network to obtain a generated image set;
inputting the generated image set and the real image set into a discrimination network to obtain image discrimination results for the generated image set and the real image set;
and updating the generation network and the discrimination network according to the image discrimination results and the target optimization functions of the two networks, obtaining the updated generation network and the updated discrimination network.
The small sample image generation method helps solve the overfitting and model collapse caused by limited data in small sample scenes and improves the realism and diversity of the generated pictures.
The invention relates to a small sample image generation method, characterized in that the dynamic Gaussian mixture distribution conforms to the following correspondence:
z = λμ_i + (1 − λ)σ_i·δ
wherein z is the dynamic Gaussian mixture latent code, λ is a dynamic regulation factor that can regulate the Gaussian components of the latent code, μ_i and σ_i are network-learnable parameters, and δ is randomly sampled from a Gaussian distribution with mean 0 and variance 1, i.e., δ ∼ N(0, 1).
The small sample image generation method is characterized in that the hybrid attention mechanism comprises a spatial attention mechanism and a channel attention mechanism. The hybrid attention mechanism enhances the intermediate features of the generation network, which are obtained by the generation network mapping the dynamic Gaussian mixture latent code and contain the global layout and local content information of the generated picture; this enhancement helps improve the realism of the generated picture.
The small sample image generation method is characterized in that the spatial attention mechanism focuses on which spatial positions of a feature map are most important and enhances those positions. The spatial attention mechanism first aggregates channel information using pooling to obtain two 2D feature maps, F^s_avg and F^s_max, representing the maps obtained with average pooling and maximum pooling, respectively. Next, the two feature maps are concatenated and a convolution operation produces the spatial attention map. The spatial attention mechanism is formally described as:
M_s(F) = σ(f^{7×7}([AvgPool(F); MaxPool(F)])) = σ(f^{7×7}([F^s_avg; F^s_max]))
where σ denotes the Sigmoid activation function, AvgPool and MaxPool denote average pooling and maximum pooling respectively, and f^{7×7} denotes a convolution operation with a 7×7 kernel. F is the input feature map, and F^s_avg and F^s_max are the feature maps obtained after average pooling and maximum pooling. Spatial attention focuses on global layout information, and enhancing it benefits the overall plausibility and realism of the generated image.
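A toy NumPy version of the spatial attention map described above, under assumptions not stated in the patent: a single 7×7 kernel over the two pooled maps, zero padding, and no bias; the random kernel weights stand in for learned parameters:

```python
import numpy as np

def spatial_attention(F, kernel=None, rng=None):
    """M_s(F) = sigmoid(conv7x7([AvgPool_c(F); MaxPool_c(F)])) for F of shape (C, H, W).

    Pooling is along the channel axis; the convolution is applied with zero
    padding so the attention map matches the spatial size of F.
    """
    rng = np.random.default_rng(rng)
    avg = F.mean(axis=0)                # (H, W) channel-average map
    mx = F.max(axis=0)                  # (H, W) channel-max map
    stacked = np.stack([avg, mx])       # (2, H, W) concatenated pooled maps
    if kernel is None:                  # random stand-in for a learned 7x7 kernel
        kernel = rng.standard_normal((2, 7, 7)) * 0.1
    H, W = avg.shape
    pad = np.pad(stacked, ((0, 0), (3, 3), (3, 3)))
    out = np.empty((H, W))
    for y in range(H):                  # naive sliding-window convolution
        for x in range(W):
            out[y, x] = np.sum(pad[:, y:y+7, x:x+7] * kernel)
    return 1.0 / (1.0 + np.exp(-out))   # Sigmoid -> per-position weights in (0, 1)

F = np.random.default_rng(0).standard_normal((16, 8, 8))
M = spatial_attention(F, rng=0)
enhanced = F * M[None, :, :]            # broadcast the map over all channels
```

A real implementation would use a framework convolution rather than the explicit loop; the loop is kept here only to make the sliding-window arithmetic visible.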
The small sample image generation method is characterized in that the channel attention mechanism concerns which contents of a feature map are worth attending to. To compute the channel attention map, spatial information is squeezed using average pooling and maximum pooling to obtain two feature vectors, F^c_avg and F^c_max; a shared network — a multi-layer perceptron with one hidden layer — then produces the channel attention map M_c ∈ R^{C×1×1}. The channel attention mechanism is formally described as:
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F^c_avg)) + W_1(W_0(F^c_max)))
where σ denotes the Sigmoid activation function, W_0 and W_1 are shared parameters, AvgPool and MaxPool denote average pooling and maximum pooling respectively, and MLP is the multi-layer perceptron. F^c_avg and F^c_max are the feature vectors obtained with average pooling and maximum pooling. Channel attention focuses on local content information, and enhancing it benefits the local realism and detail fidelity of the generated image.
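A corresponding NumPy sketch of the channel attention map, assuming — as is common but not spelled out in the patent — a reduction ratio in the hidden layer and a ReLU activation; the weights are random stand-ins for learned parameters:

```python
import numpy as np

def channel_attention(F, W0, W1):
    """M_c(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))), MLP weights shared."""
    avg = F.mean(axis=(1, 2))            # (C,) squeeze spatial dims by average pooling
    mx = F.max(axis=(1, 2))              # (C,) squeeze spatial dims by max pooling
    mlp = lambda v: W1 @ np.maximum(W0 @ v, 0.0)   # one hidden layer with ReLU
    logits = mlp(avg) + mlp(mx)
    return 1.0 / (1.0 + np.exp(-logits))  # (C,) per-channel weights in (0, 1)

rng = np.random.default_rng(0)
C, r = 16, 4                              # channels and reduction ratio (assumed)
W0 = rng.standard_normal((C // r, C)) * 0.1   # shared hidden-layer weights
W1 = rng.standard_normal((C, C // r)) * 0.1   # shared output-layer weights
F = rng.standard_normal((C, 8, 8))
weights = channel_attention(F, W0, W1)
enhanced = F * weights[:, None, None]     # rescale each channel of F
```

Sharing W_0 and W_1 between the average-pooled and max-pooled branches is what the "shared parameters" in the formula refer to.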
The small sample image generation method is characterized in that the parameters of the generation network and the discrimination network are updated according to the image discrimination results and the target optimization functions of the two networks, as follows:
substituting the image discrimination results into the target optimization function of the discrimination network and updating the parameters of the discrimination network, wherein the target optimization function of the discrimination network conforms to the following correspondence:
L_D = −E_{x∼I_real}[min(0, −1 + D(x))] − E_{x∼G(z)}[min(0, −1 − D(x))] + L_recons
wherein E denotes expectation, I_real is the real image set, z is the dynamic Gaussian mixture latent code, G(z) is the generated image set, D(x) is the image discrimination result, min(·) denotes minimization, and L_recons is the reconstruction loss, which helps improve the feature-extraction capability of the discrimination network and thus its discrimination capability.
Substituting the image discrimination results into the target optimization function of the generation network and updating the parameters of the generation network, wherein the target optimization function of the generation network conforms to the following correspondence:
L_G = −E_{x∼G(z)}[D(x)]
where E denotes expectation, z is the dynamic Gaussian mixture latent code, x ∼ G(z) is the generated image set, and D(x) is the image discrimination result.
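If the objectives above are read as hinge-style losses — an assumption consistent with the min(·) terms, and ignoring the auxiliary reconstruction term L_recons — they can be sketched in NumPy over a batch of discriminator scores:

```python
import numpy as np

def d_hinge_loss(d_real, d_fake):
    """Discriminator hinge loss (reconstruction term L_recons omitted)."""
    return (-np.minimum(0.0, -1.0 + d_real).mean()
            - np.minimum(0.0, -1.0 - d_fake).mean())

def g_hinge_loss(d_fake):
    """Generator loss: maximize the discriminator's score on generated samples."""
    return -d_fake.mean()

d_real = np.array([1.5, 0.2])   # discriminator outputs on real images (toy values)
d_fake = np.array([-1.2, 0.4])  # discriminator outputs on generated images
dl = d_hinge_loss(d_real, d_fake)
gl = g_hinge_loss(d_fake)
```

Note how the hinge clips the contribution of confidently classified samples: the real score 1.5 and the fake score −1.2 already satisfy their margins and add nothing to L_D.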
The small sample image generation method according to the present invention is characterized in that, after the parameters of the generation network and the discrimination network are updated according to their target optimization functions, the method further includes: obtaining the updated generation network and the updated discrimination network, wherein the updated generation network is used for generating image data for at least one of data enhancement, classification and segmentation.
The small sample image generation method is characterized in that it is implemented by a system comprising a generator and a discriminator, wherein the generator is coupled with the generation network, the discriminator is coupled with the discrimination network, and the system executes the programs of the generator and the discriminator.
The invention has the following beneficial effects: the small sample image generation method takes the dynamic Gaussian mixture latent code as input, providing more editable and variable attributes for the generation network and improving the diversity of generated samples; the hybrid attention mechanism enhances the local content and global layout of the intermediate features during generation, improving the realism of generated samples. Fusing the two reduces the overfitting of the generative model, so that sufficiently realistic and diverse pictures can be generated in small sample scenes. The method is not limited to a specific generative model and can be adaptively embedded into other models, improving the diversity and realism of generated samples and avoiding problems such as mode collapse. The generated pictures can also be used in fields such as image classification and segmentation.
Drawings
Fig. 1 is a flowchart of an overall training process provided by an embodiment of the present invention.
Fig. 2 is an overall framework diagram provided by the embodiment of the present invention.
Fig. 3 is a schematic diagram of the hybrid attention mechanism provided by an embodiment of the present invention.
Fig. 4 is a diagram of the generation effect of the data set of the art painting landscape and the real animal photo provided by the embodiment of the invention.
Fig. 5 is a diagram of the generation effect of the photo data sets of the cartoon face and the real face according to the embodiment of the present invention.
Detailed Description
The following description will specifically describe the embodiments of the present invention with reference to the accompanying drawings, and the system implementation details will be described in detail for the purpose of specifying the implementation process of the present invention. However, these implementation details do not limit the invention to the described embodiments.
The invention relates to a small sample image generation method that uses dynamic Gaussian mixture latent codes as the input of the generation network, providing richer prior information and more editable attribute information. The generation network maps the latent codes into generated pictures; during generation, the intermediate feature representations contain the local content and global layout information of the final picture, and a hybrid attention mechanism enhances this content and layout information before the final picture is produced. The generated pictures and real pictures are input into the discrimination model, which must judge whether a given picture is generated or real. The generation network and the discrimination network are updated through the discrimination loss: the generation network learns to generate pictures as close to the real distribution as possible, while the discrimination network distinguishes real pictures from generated ones as well as possible. The two play against each other, improving with continued training, and finally reach an equilibrium state.
The invention learns and generates pictures with authenticity and diversity on a small sample image data set based on the generation network and the discrimination network. Training procedure see fig. 1, general framework see fig. 2.
The specific implementation process of the invention is as follows:
Step 1: parameter initialization
Initialize the training picture size, the training set P, the batch size, and the number of training iterations T; randomly initialize the generation network G and the discrimination network D;
Step 2: sampling dynamic Gaussian mixture latent codes and data set samples
Randomly sample m latent codes {z_1, …, z_m} from the dynamic Gaussian mixture distribution, and randomly sample m original training pictures {I_1, …, I_m} from the training set P. The dynamic Gaussian mixture distribution is:
z = λμ_i + (1 − λ)σ_i·δ,  δ ∼ N(0, 1)
wherein λ is a dynamic regulation factor that dynamically adjusts the Gaussian components of the mixture latent code, and δ is a vector randomly sampled from a Gaussian distribution with mean 0 and variance 1.
Step 3: preprocess the m original pictures: horizontally flip, randomly crop and normalize them, and express the data in tensor form;
and 4, step 4: inputting the hidden code into a generation network, removing the intermediate feature representation in the middle process of generation, enhancing the content and layout of the intermediate feature representation by using a mixed attention mechanism, continuously using the enhanced feature representation for generating the network, obtaining m generated pictures, and processing the generated pictures into a format { G (z) } which is the same as that of the training pictures 1 ),…,G(z m )};
The mixed attention mechanism in step 4 includes a spatial attention mechanism and a channel attention mechanism, and focuses on the local content and the whole layout information of the feature map, and the flow is shown in fig. 3.
Step 5: input the m generated pictures {G(z_1), …, G(z_m)} and the m real pictures {I_1, …, I_m} into the discrimination network, with real pictures labeled 'real' and generated pictures labeled 'fake', for the discrimination network to distinguish;
Step 6: training the discrimination network
Improve the discrimination capability of the discrimination network by minimizing its target loss function; back-propagate the loss with gradient descent and update the discrimination network parameters;
The loss function of the discrimination network is defined as:
L_D = −E_{x∼I_real}[min(0, −1 + D(x))] − E_{x∼G(z)}[min(0, −1 − D(x))] + L_recons
wherein E denotes expectation, I_real is the real training distribution, z is the dynamic Gaussian mixture latent code, G(z) is a generated sample, min(·) denotes minimization, D(x) is the image discrimination result, and L_recons is the reconstruction loss, which helps improve the feature-extraction capability of the discrimination network and thus its discrimination capability.
Step 7: training the generation network
The generation network continuously generates pictures under the guidance of the discrimination network; the generated pictures must resemble real pictures as closely as possible so as to confuse the discrimination network. The probability of misjudgment by the discrimination network is increased by minimizing the target loss function of the generation network; the loss is back-propagated with gradient descent and the generation network parameters are updated;
The loss function of the generation network is defined as:
L_G = −E_{z}[D(G(z))]
where E denotes expectation, z is the dynamic Gaussian mixture latent code, G(z) is the generated image set, and D(·) is the image discrimination result.
Step 8: check the iteration count. The total number of iterations is set to 50000; repeat steps 2-7 until the termination condition is reached, saving the model parameters every 10000 iterations to finally obtain 5 models. The saved models are used to read the generation network parameters and generate pictures for visual comparison and quantitative index comparison; the generated pictures can also be used for data enhancement to help improve tasks such as classification and segmentation.
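Steps 2-8 can be sketched as a toy training loop. The stub G and D, the tiny iteration counts, and the omission of actual gradient updates are all simplifications; SAVE_EVERY stands in for the 10000-iteration checkpoint interval:

```python
import numpy as np

rng = np.random.default_rng(0)
T, m, d = 30, 4, 8           # iterations, batch size, latent dim (toy values)
SAVE_EVERY = 10              # stands in for the 10000-iteration checkpoint interval
checkpoints = []

def G(z):                    # stub generator: maps latent codes to "images"
    return np.tanh(z)

def D(x):                    # stub discriminator: one scalar score per sample
    return x.mean(axis=1)

for t in range(1, T + 1):
    z = rng.standard_normal((m, d))      # step 2: sample latent codes
    real = rng.standard_normal((m, d))   # step 2: sample a real batch
    fake = G(z)                          # step 4: generate pictures
    d_loss = (-np.minimum(0, -1 + D(real)).mean()
              - np.minimum(0, -1 - D(fake)).mean())  # step 6: discriminator loss
    g_loss = -D(G(z)).mean()             # step 7: generator loss
    # (parameter updates by gradient descent are omitted in this stub)
    if t % SAVE_EVERY == 0:              # step 8: periodic checkpointing
        checkpoints.append(t)
```

With T = 30 and SAVE_EVERY = 10 the loop records three checkpoints, mirroring the 5 checkpoints the patent saves over 50000 iterations.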
Design of experiments
Experimental data set
The experimental data sets are small sample image data sets of different styles, including animation, painting, human faces and scenery, at resolutions of 256×256×3, 512×512×3 and 1024×1024×3. Every data set contains no more than 1000 pictures, so the data are extremely limited; a detailed description is given in table 1. These data sets are highly challenging and of strong application and research significance.
Table 1 introduction to the experimental data set
Comparison algorithm
The invention targets image generation in small sample scenes; the comparison algorithms include StyleGAN2, DiffAug, ADA and FastGAN, the best current methods in limited-sample scenes.
Evaluation index
A common indicator for evaluating the realism and diversity of generated pictures is FID. FID computes the distance between the distributions of real and generated pictures. Following common settings, the real training pictures are chosen as reference pictures, 5000 pictures are generated, and the distance between the two distributions is computed; the smaller the value, the closer the generated pictures are to the real pictures, i.e. the better the performance.
The calculation formula for FID is:
FID = ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2})
wherein μ_r and μ_g denote the feature means of the real pictures and the generated pictures respectively, Σ_r and Σ_g denote their covariance matrices, Tr denotes the trace, and ‖·‖₂ denotes the two-norm.
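Under the simplifying assumption of diagonal covariances (so the matrix square root has a closed form), the FID formula above reduces to a few NumPy lines; this illustrates the formula only, not the full Inception-feature pipeline used in the experiments:

```python
import numpy as np

def fid_diagonal(mu_r, mu_g, var_r, var_g):
    """FID between two Gaussians with diagonal covariances.

    With Sigma_r = diag(var_r) and Sigma_g = diag(var_g), the matrix square
    root (Sigma_r Sigma_g)^{1/2} reduces to diag(sqrt(var_r * var_g)), so
    FID = ||mu_r - mu_g||^2 + sum_i(var_r_i + var_g_i - 2*sqrt(var_r_i*var_g_i)).
    """
    diff = np.sum((mu_r - mu_g) ** 2)
    trace = np.sum(var_r + var_g - 2.0 * np.sqrt(var_r * var_g))
    return diff + trace

mu = np.array([0.5, -1.0]); var = np.array([1.0, 2.0])
same = fid_diagonal(mu, mu, var, var)        # identical distributions -> 0
shifted = fid_diagonal(mu, mu + 1.0, var, var)
```

For full covariance matrices the square root requires an eigendecomposition (e.g. `scipy.linalg.sqrtm`); the diagonal case suffices to see why FID is zero only when the two distributions coincide.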
Results of the experiment
Table 2 experimental results of the present and comparative methods on the 256×256×3 data sets
Table 3 experimental results of the present and comparative methods on the 512×512×3 data sets
Datasets | AnimeFace | ArtPainting | Moongate | Flat | Fauvism |
StyleGAN2 | 152.73 | 74.56 | 288.25 | 285.61 | 181.91 |
DiffAug | 135.85 | 49.25 | 136.12 | 310.14 | 223.58 |
ADA | 59.67 | 46.38 | 149.06 | 248.46 | 201.99 |
FastGAN | 59.38 | 45.08 | 122.29 | 240.24 | 182.14 |
Ours | 53.36 | 44.50 | 112.14 | 200.99 | 176.43 |
Table 4 experimental results of the present and comparative methods on the 1024×1024×3 data sets
Datasets | Pokemon | Skulls | Shells | FFHQ | Flowers |
StyleGAN2 | 190.23 | 127.98 | 241.37 | - | 45.23 |
DiffAug | 62.73 | 124.23 | 151.94 | 48.88 | 37.09 |
ADA | 66.41 | 97.05 | 136.52 | 40.63 | 27.36 |
FastGAN | 57.19 | 130.05 | 155.47 | 47.78 | 25.66 |
Ours | 47.04 | 99.02 | 134.20 | 44.01 | 24.06 |
Tables 2, 3 and 4 show the experimental results of the method of the present invention and the comparison methods on data sets of different resolutions. It can be seen that, in scenes with limited sample size, the present invention generates pictures with higher realism and diversity, demonstrating its effectiveness and superiority for small sample image generation.
Visual analysis
To better investigate the diversity and realism of the invention's results on small sample image data sets, visualization results were generated and collated for different resolutions; see fig. 4 and 5. As the visualizations show, the pictures generated by the method are closer to real pictures, perform well on data sets of different resolutions, and are quite plausible in terms of local content, global layout and other aspects.
In conclusion, the small sample image generation method provided by the invention significantly improves the diversity and realism of generated images in small sample scenes, and its effectiveness and practicality are verified by both quantitative and qualitative analyses. The pictures generated by the method can be used for a wide range of tasks in data-limited scenes, including data enhancement, classification and segmentation. In addition, the invention provides a reference for other related problems in the field; its principles and ideas can be extended to other related application scenes, offering good reference value and a very broad application prospect.
The above description is a specific embodiment of the present invention; the invention is not limited to the above examples. It is obvious to those skilled in the art that the invention can be adapted to various models and can be adjusted and modified according to the specific task. Any modification, replacement or improvement made within the scope of the principles of the present invention should be included in the scope of the claims.
Claims (8)
1. A small sample image generation method, comprising:
randomly sampling from dynamic Gaussian mixture distribution to obtain dynamic Gaussian mixture hidden codes, wherein the dynamic Gaussian mixture distribution is the Gaussian mixture distribution with dynamic regulation factors introduced;
inputting the dynamic Gaussian mixture hidden code into a generation network, enhancing the intermediate feature of the generation network through a mixed attention mechanism, wherein the intermediate feature is obtained by mapping the dynamic Gaussian mixture hidden code by the generation network, and the enhanced intermediate feature is input into the generation network to obtain a generated image set;
inputting the generated image set and the real image set into a discrimination network to obtain image discrimination results of the generated image set and the real image set;
and updating the generation network and the discrimination network according to the image discrimination result and the target optimization functions of the generation network and the discrimination network, to obtain the updated generation network and the updated discrimination network.
2. The small sample image generation method according to claim 1, wherein the dynamic gaussian mixture distribution conforms to the following correspondence:
z = λu_i + (1-λ)σ_iδ

wherein z is the dynamic Gaussian mixture hidden code; λ is the dynamic regulation factor, which adjusts the contribution of the Gaussian components in the dynamic Gaussian mixture hidden code; u_i and σ_i are network-learnable parameters (the mean and standard deviation of the i-th Gaussian component); and δ is a vector randomly sampled from a Gaussian distribution with mean 0 and variance 1, i.e., δ ∼ N(0,1).
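As an illustration, the sampling rule of claim 2 can be sketched in NumPy. The claim does not specify how the component index i is selected, so uniform selection is an assumption here, and `mu`/`sigma` stand in for the learnable parameters u_i and σ_i:

```python
import numpy as np

def sample_dynamic_gmm(mu, sigma, lam, rng=None):
    """Sample a hidden code z = lam * u_i + (1 - lam) * sigma_i * delta.

    mu, sigma : (K, D) arrays standing in for the learnable means and
        standard deviations of the K Gaussian components.
    lam : dynamic regulation factor, blending the component mean with
        the scaled standard-normal noise delta ~ N(0, 1).
    The component index i is chosen uniformly (an assumption; the
    claim does not state how i is selected).
    """
    rng = np.random.default_rng() if rng is None else rng
    i = rng.integers(len(mu))                 # pick a component
    delta = rng.standard_normal(mu.shape[1])  # delta ~ N(0, 1)
    return lam * mu[i] + (1.0 - lam) * sigma[i] * delta
```

Setting λ = 1 collapses the sample onto a component mean, while λ = 0 yields pure scaled Gaussian noise, which is how the factor trades off the learned mixture against fresh diversity.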
3. The small sample image generation method of claim 1, wherein the hybrid attention mechanism comprises a spatial attention mechanism and a channel attention mechanism.
4. The small sample image generation method according to claim 3, wherein the spatial attention mechanism focuses on and enhances the most important part of the content of a feature. The spatial attention mechanism first aggregates channel information using pooling to obtain two 2D feature maps, F_avg^s ∈ R^{1×H×W} and F_max^s ∈ R^{1×H×W}, representing the feature maps obtained using average pooling and maximum pooling, respectively. Next, the two feature maps are concatenated and passed through a convolution operation to obtain the spatial attention map. The spatial attention mechanism is formally described as:

M_s(F) = σ(f^{7×7}([AvgPool(F); MaxPool(F)])) = σ(f^{7×7}([F_avg^s; F_max^s]))

where σ denotes the activation function, AvgPool and MaxPool denote average pooling and maximum pooling respectively, f^{7×7} denotes a convolution operation with a kernel of size 7×7, F is the feature map, and F_avg^s and F_max^s are the feature maps obtained after average pooling and maximum pooling.
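The spatial attention computation described above can be sketched in plain NumPy. This is a naive, loop-based sketch for clarity: the 7×7 kernel and 'same' padding follow the claim, the sigmoid is assumed as the activation σ, and `conv_weight` stands in for weights that would be learned in practice:

```python
import numpy as np

def spatial_attention(feat, conv_weight):
    """CBAM-style spatial attention sketch.

    feat : (C, H, W) feature map F.
    conv_weight : (2, k, k) kernel of a single-output convolution applied
        to the stacked [avg-pooled; max-pooled] maps, 'same' padding.
    Returns the spatial attention map M_s of shape (H, W), in (0, 1).
    """
    avg = feat.mean(axis=0)            # F_avg^s: average over channels
    mx = feat.max(axis=0)              # F_max^s: max over channels
    stacked = np.stack([avg, mx])      # concatenation -> (2, H, W)
    k = conv_weight.shape[-1]
    p = k // 2
    padded = np.pad(stacked, ((0, 0), (p, p), (p, p)))
    H, W = avg.shape
    out = np.empty((H, W))
    for i in range(H):                 # naive 'same' convolution
        for j in range(W):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * conv_weight)
    return 1.0 / (1.0 + np.exp(-out))  # sigmoid activation σ
```

The resulting (H, W) map is broadcast-multiplied against the feature map to emphasize the informative spatial locations.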
5. The small sample image generation method according to claim 3, wherein the channel attention mechanism focuses on what is meaningful in a feature map. To compute the channel attention map, spatial information is first squeezed using average pooling and maximum pooling, resulting in two channel descriptors, F_avg^c ∈ R^{C×1×1} and F_max^c ∈ R^{C×1×1}. Both descriptors are then passed through a shared network to generate the channel attention map M_c ∈ R^{C×1×1}, the shared network being a multi-layer perceptron with one hidden layer. The channel attention mechanism is formally described as:

M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(MLP(F_avg^c) + MLP(F_max^c))

where σ denotes the activation function and MLP denotes the shared multi-layer perceptron.
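The channel attention computation can likewise be sketched in NumPy. The bias-free ReLU MLP is an assumption (the claim only says "a multi-layer perceptron with a hidden layer"), and the sigmoid is assumed as the activation σ:

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """CBAM-style channel attention sketch.

    feat : (C, H, W) feature map F.
    w1 : (C, C_hidden) and w2 : (C_hidden, C) -- weights of the shared
        one-hidden-layer MLP (bias-free, ReLU hidden activation;
        an assumption not fixed by the claim).
    Returns M_c as a (C,) vector, i.e. the squeezed R^{C x 1 x 1} map.
    """
    avg = feat.mean(axis=(1, 2))  # F_avg^c: squeeze space by avg pooling
    mx = feat.max(axis=(1, 2))    # F_max^c: squeeze space by max pooling

    def mlp(v):
        return np.maximum(v @ w1, 0.0) @ w2

    return 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # sigmoid σ
```

The (C,) vector is then broadcast over the spatial dimensions to reweight each channel of the feature map.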
6. The method according to any one of claims 1 to 5, wherein the updating the parameters of the generation network and the discrimination network according to the image discrimination result and the target optimization functions of the generation network and the discrimination network comprises:
substituting the image discrimination result into the target optimization function of the discrimination network, and updating the parameters of the discrimination network, wherein the target optimization function of the discrimination network conforms to the following correspondence:

L_D = -E_{x∼I_real}[min(0, -1 + D(x))] - E_{x̂∼G(z)}[min(0, -1 - D(x̂))] + L_recons

wherein E represents expectation, I_real is the real image set, z is the dynamic Gaussian mixture hidden code, G(z) is the generated image set, D(x) is the image discrimination result, min(·) represents taking the minimum, and L_recons is the reconstruction loss; the reconstruction loss helps to improve the capability of the discrimination network to extract features, thereby improving its discrimination capability.
substituting the image discrimination result into the target optimization function of the generation network, and updating the parameters of the generation network, wherein the target optimization function of the generation network conforms to the following correspondence:

L_G = -E_{x∼G(z)}[D(x)]

where E represents expectation, z is the dynamic Gaussian mixture hidden code, x∼G(z) is the generated image set, and D(x) is the image discrimination result.
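Under the common hinge-loss reading of these objectives (an assumption consistent with the min(·) term and L_recons named in claim 6), both losses can be sketched as follows; L_recons is passed in as a precomputed scalar because its exact form is not given in the claims:

```python
import numpy as np

def discriminator_loss(d_real, d_fake, l_recons=0.0):
    """Hinge-loss objective for the discrimination network.

    d_real / d_fake : arrays of D's outputs on real / generated images.
    l_recons : reconstruction loss, supplied as a precomputed scalar
        (its definition is not specified in the claims).
    """
    loss_real = -np.minimum(0.0, -1.0 + d_real).mean()
    loss_fake = -np.minimum(0.0, -1.0 - d_fake).mean()
    return loss_real + loss_fake + l_recons

def generator_loss(d_fake):
    """L_G = -E_{x~G(z)}[D(x)]: push D's score on generated images up."""
    return -d_fake.mean()
```

Alternating gradient steps on these two losses implement the update of claim 1: the discrimination network is penalized unless real scores exceed +1 and fake scores fall below -1, while the generation network is rewarded for raising the fake scores.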
7. The method of any one of claims 1 to 6, wherein after the updating of the parameters of the generation network and the discrimination network according to the target optimization functions of the generation network and the discrimination network, the method further comprises:

generating image data with the updated generation network, wherein the generated image data is used for at least one of data enhancement, classification and segmentation.
8. A small sample image generation system, comprising a generator coupled to the generation network and a discriminator coupled to the discrimination network, the system being configured to execute programs of the generator and the discriminator so that the system performs the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211230704.9A CN115690487A (en) | 2022-10-09 | 2022-10-09 | Small sample image generation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115690487A true CN115690487A (en) | 2023-02-03 |
Family
ID=85065194
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211230704.9A Pending CN115690487A (en) | 2022-10-09 | 2022-10-09 | Small sample image generation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115690487A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116664450A (en) * | 2023-07-26 | 2023-08-29 | 国网浙江省电力有限公司信息通信分公司 | Diffusion model-based image enhancement method, device, equipment and storage medium |
CN118447211A (en) * | 2023-10-16 | 2024-08-06 | 苏州飞舸数据科技有限公司 | Data preprocessing method and system based on image feature refinement |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Alqahtani et al. | Applications of generative adversarial networks (gans): An updated review | |
Qiu et al. | Semanticadv: Generating adversarial examples via attribute-conditioned image editing | |
CN108875935B (en) | Natural image target material visual characteristic mapping method based on generation countermeasure network | |
US11544880B2 (en) | Generating modified digital images utilizing a global and spatial autoencoder | |
CN115690487A (en) | Small sample image generation method | |
Li et al. | Globally and locally semantic colorization via exemplar-based broad-GAN | |
US20230245351A1 (en) | Image style conversion method and apparatus, electronic device, and storage medium | |
CN111695494A (en) | Three-dimensional point cloud data classification method based on multi-view convolution pooling | |
CN111598968A (en) | Image processing method and device, storage medium and electronic equipment | |
CN111931908B (en) | Face image automatic generation method based on face contour | |
CN112884893A (en) | Cross-view-angle image generation method based on asymmetric convolutional network and attention mechanism | |
Zhu et al. | Pyramid nerf: Frequency guided fast radiance field optimization | |
CN114581356A (en) | Image enhancement model generalization method based on style migration data augmentation | |
CN117576248B (en) | Image generation method and device based on gesture guidance | |
Yuan et al. | Explore double-opponency and skin color for saliency detection | |
AU2023204419A1 (en) | Multidimentional image editing from an input image | |
Padala et al. | Effect of input noise dimension in GANs | |
CN109858543A (en) | The image inferred based on low-rank sparse characterization and relationship can degree of memory prediction technique | |
Sun et al. | Channel attention networks for image translation | |
CN115482557A (en) | Human body image generation method, system, device and storage medium | |
CN117255998A (en) | Unsupervised learning of object representations from video sequences using spatial and temporal attention | |
Li et al. | OT-net: a reusable neural optimal transport solver | |
Li et al. | Neural style transfer based on deep feature synthesis | |
Atone et al. | Generative Adversarial Networks in Computer Vision: A Review of Variants, Applications, Advantages, and Limitations | |
Manisha et al. | Effect of input noise dimension in gans |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |