CN112580782A - Dual-attention Generative Adversarial Network and Image Generation Method Based on Channel Augmentation


Info

Publication number
CN112580782A
Authority
CN
China
Prior art keywords
feature
attention
layer
channel
convolution
Prior art date
Legal status
Granted
Application number
CN202011470128.6A
Other languages
Chinese (zh)
Other versions
CN112580782B (en)
Inventor
罗健旭
岳丹阳
Current Assignee
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date
Filing date
Publication date
Application filed by East China University of Science and Technology
Priority to CN202011470128.6A
Publication of CN112580782A
Application granted
Publication of CN112580782B
Status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a channel-enhancement-based dual-attention generative adversarial network and an image generation method. The network comprises a generator and a discriminator: the generator comprises convolution block one and a dual-attention mechanism module, and the discriminator comprises convolution block two and a dual-attention mechanism module. Convolution block one and convolution block two are each provided with a squeeze-and-excitation operation layer that acquires channel attention through a squeeze-and-excitation operation. The dual-attention mechanism module comprises a position attention unit and a channel attention unit in parallel: the position attention unit establishes relevance among positions based on a self-attention mechanism to obtain a position attention feature, and the channel attention unit establishes dependency among feature channels based on a channel attention mechanism to obtain a channel attention feature; the dual-attention mechanism module fuses the position attention feature and the channel attention feature. The invention improves the generation performance of the generative adversarial network: the generated data distribution is closer to the original data distribution, and the quality of the generated images is better.

Description

Dual-attention Generative Adversarial Network and Image Generation Method Based on Channel Augmentation

Technical Field

The invention relates to the technical field of image generation, and in particular to a channel-enhancement-based dual-attention generative adversarial network and an image generation method.

Background

Generative adversarial network (GAN) techniques are mainly applied to image generation. A GAN model consists of a generator and a discriminator: the generator produces images from an input noise vector or class label, and the discriminator judges whether an image is real or fake. Adversarial training of the two makes the generator learn the real data distribution, so that it can ultimately produce forged images close to real ones. GAN technology is widely used in image enhancement, infrared image generation, medical image generation, image super-resolution reconstruction, and related areas.

At present, the main ways to improve the image quality of GAN models are modifying the network structure, changing the loss function, and establishing feature-space correlations. The self-attention generative adversarial network (SAGAN) uses a self-attention mechanism that captures long-range spatial correlations, making the structure of generated images more reasonable. Building on SAGAN, the BigGAN model adjusts and deepens the GAN structure, improving the network's learning capacity and further raising generation quality.

Although current GANs already have strong generative capacity, with little prior knowledge it remains difficult to generate class-conditional images of complex scenes with long-range correlations from only a noise vector z and a conditional label y. Existing GAN models struggle to generate images with complex structural distributions. The self-attention mechanism enables a GAN to generate images with long-range correlations, but the structure of target objects in the generated images still has defects. Scenes containing multiple target objects are likewise hard to generate, and the quality of generated images needs improvement.

Summary of the Invention

The purpose of the present invention is to address at least some of the above deficiencies by providing a generative adversarial network model that can generate high-quality images for complex scenes, while making the structural distribution of targets in the generated images more reasonable and the images more natural.

To achieve the above object, the present invention provides a channel-enhancement-based dual-attention generative adversarial network, comprising:

a generator and a discriminator; the generator comprises convolution block one and a dual-attention mechanism module; the discriminator comprises convolution block two and a dual-attention mechanism module;

wherein convolution block one and convolution block two are each provided with a squeeze-and-excitation operation layer for acquiring channel attention through a squeeze-and-excitation operation;

the dual-attention mechanism module comprises a position attention unit and a channel attention unit in parallel; the position attention unit establishes relevance among positions based on a self-attention mechanism to obtain a position attention feature, and the channel attention unit establishes dependency among feature channels based on a channel attention mechanism to obtain a channel attention feature; the dual-attention mechanism module fuses the position attention feature and the channel attention feature.

Preferably, the squeeze-and-excitation operation layer is configured to perform the following operations:

Each layer of the feature M is averaged and compressed into one value, giving a one-dimensional vector s:

$$s_n = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} m_n^{i,j}$$

where the feature M has C layers, each of size H×W; $s_n$ denotes the n-th element of the one-dimensional vector s, $m_n$ denotes the n-th layer of the feature M, n = 1, ..., C, and $m_n^{i,j}$ denotes the element of $m_n$ at coordinates (i, j);

The one-dimensional vector s is activated through two nonlinear fully connected layers, giving a weight feature vector w holding the weight ratio of each layer:

$$w = \sigma_2(W_2\,\sigma_1(W_1 s))$$

where $W_1$ and $W_2$ denote the first and second fully connected operations, and $\sigma_1$ and $\sigma_2$ are the ReLU and Sigmoid activation functions, respectively;

The weight feature vector w is multiplied into the corresponding layer of the feature M, giving the recalibrated feature $\tilde{M}$:

$$\tilde{m}_n = w_n \cdot m_n$$

where $w_n$ denotes the n-th element of the weight feature vector w, and $\tilde{m}_n$ denotes the n-th layer of the feature $\tilde{M}$.

Preferably, the squeeze-and-excitation operation layer comprises, connected in sequence, an average pooling layer, a 1×1 convolution layer, a ReLU activation layer, a 1×1 convolution layer and a Sigmoid activation layer; its output is channel-wise multiplied with its input to give the recalibrated feature.

Preferably, convolution block one comprises two linear layers, two batch normalization layers, two ReLU activation layers, two upsampling layers, two 3×3 convolution layers, a 1×1 convolution layer and the squeeze-and-excitation operation layer;

convolution block two comprises two ReLU activation layers, two 3×3 convolution layers, a 1×1 convolution layer, two average pooling layers and the squeeze-and-excitation operation layer.

Preferably, the channel attention unit is configured to perform the following operations:

The feature A is reshaped to give a feature A′, where A has C layers, each of size H×W, and A′ has size C×N with N = H×W;

A′ is multiplied by the transpose of A′ and a softmax is taken, giving a feature map Q of size C×C whose elements $q_{ji}$ are:

$$q_{ji} = \frac{\exp(a'_i \cdot a'^{T}_j)}{\sum_{i=1}^{C} \exp(a'_i \cdot a'^{T}_j)}$$

where i, j = 1, 2, …, C; $a'_i$ is the i-th feature vector of A′ and $a'^{T}_j$ is the j-th feature vector of the transpose of A′;

The feature map Q is multiplied by A′ and the result is reshaped back, giving the channel attention feature T:

$$T_j = \beta \sum_{i=1}^{C} q_{ji}\, a'_i + A_j$$

where $T_j$ denotes the j-th feature vector of the channel attention feature T, j = 1, 2, …, C; β is a learning parameter initialized to 0; and $A_j$ denotes the j-th feature vector of the feature A.

Preferably, the position attention unit is configured to perform the following operations:

The feature A is channel-compressed by a 1×1 convolution f(x) to give a feature B, which is then reshaped to give a feature B′; A has C layers, each of size H×W, the number of layers after compression is $\bar{C}$, B has dimensions $\bar{C} \times H \times W$, and after reshaping the dimensions are $\bar{C} \times N$ with N = H×W;

The feature A is channel-compressed by a 1×1 convolution g(x) to give a feature O, which is then reshaped to give a feature O′; the number of layers after compression is $\bar{C}$, O has dimensions $\bar{C} \times H \times W$, and after reshaping the dimensions are $\bar{C} \times N$;

B′ is multiplied by the transpose of O′ and a softmax is taken, giving a feature map P of size N×N whose elements $p_{ji}$ are:

$$p_{ji} = \frac{\exp(b'_i \cdot o'^{T}_j)}{\sum_{i=1}^{N} \exp(b'_i \cdot o'^{T}_j)}$$

where $b'_i$ denotes the i-th feature vector of B′ and $o'^{T}_j$ denotes the j-th feature vector of the transpose of O′, i, j = 1, 2, …, N;

The feature A is processed by a 1×1 convolution h(x) to give a feature V, which is then reshaped to give a feature V′; the number of layers after this extraction is still C;

The feature V′ is multiplied by the feature map P and the result is reshaped back, giving the position attention feature S:

$$S_j = \alpha \sum_{i=1}^{N} p_{ji}\, v'_i + A_j$$

where $S_j$ denotes the j-th feature vector of the position attention feature S, j = 1, 2, …, N; $v'_i$ denotes the i-th feature vector of V′; α is a learning parameter initialized to 0; and $A_j$ denotes the j-th feature vector of the feature A.

Preferably, the dual-attention mechanism module fuses the position attention feature S and the channel attention feature T through a 3×3 convolution J(x) and a 3×3 convolution K(x):

$$U = J(S) + K(T)$$

where U denotes the resulting fused feature.

Preferably, the generator comprises, connected in sequence, a linear layer, a first convolution block one, a second convolution block one, a dual-attention mechanism module, a third convolution block one, a first activation module and a Tanh layer;

the input of the generator is a noise vector z and a class condition Condition; z follows a normal distribution, and Condition is embedded into every batch normalization layer of each convolution block one; the output of the generator is a forged image;

the discriminator comprises, connected in sequence, a first convolution block two, a dual-attention mechanism module, a second convolution block two, a third convolution block two, a fourth convolution block two, a second activation module, and a linear-transformation and label-embedding layer;

the input of the discriminator is an RGB image and a label y, and its output is the discrimination result for the RGB image.

Preferably, the loss function of the discriminator is:

$$L_D = \mathbb{E}_{(x,y)\sim p_{data}}\big[\max(0,\,1 - D(x,y))\big] + \mathbb{E}_{z\sim p_z,\,y\sim p_{data}}\big[\max(0,\,1 + D(G(z),y))\big]$$

and the loss function of the generator is:

$$L_G = -\,\mathbb{E}_{z\sim p_z,\,y\sim p_{data}}\big[D(G(z), y)\big]$$

where x denotes an image and y its class label; $p_{data}$ is the real-data probability distribution and $p_z$ the noise distribution; z denotes a one-dimensional noise vector; G(z) denotes the mapping performed by the generator and D(x, y) the mapping performed by the discriminator; $\mathbb{E}_{(x,y)\sim p_{data}}$ denotes expectation with (x, y) drawn from $p_{data}$, and $\mathbb{E}_{z\sim p_z,\,y\sim p_{data}}$ denotes expectation with z drawn from $p_z$ and y from $p_{data}$.

The present invention also provides an image generation method, comprising the following steps:

S1. Construct the dual-attention generative adversarial network described in any of the above;

S2. Obtain a training set and input it to the dual-attention generative adversarial network for training;

S3. Generate images using the trained dual-attention generative adversarial network.

The above technical solution of the present invention has the following advantages. The invention provides a channel-enhancement-based dual-attention generative adversarial network and an image generation method that improve on the existing BigGAN model. Adding to the GAN's convolution blocks a squeeze-and-excitation operation layer, which acquires channel attention through a squeeze-and-excitation operation, recalibrates the network's features: feature layers with strong effects are strengthened and useless feature layers are weakened, so the intermediate-layer features become more expressive, the performance of the convolution blocks increases, and the feature learning ability of the GAN model improves. Furthermore, the network adopts a dual-attention mechanism module that contains not only a position attention unit (functionally the same as a self-attention module) but also a channel attention unit. With dual attention, the GAN can simultaneously capture long-range correlation information across positions and across channels, learn the correlations among image feature structures more comprehensively, further strengthen image feature relevance, and improve the quality of generated images.

Brief Description of the Drawings

Figures 1(a) and 1(b) are schematic structural diagrams of a channel-enhancement-based dual-attention generative adversarial network in an embodiment of the present invention, where Figure 1(a) shows the generator structure and Figure 1(b) shows the discriminator structure;

Figure 2 is a schematic flow diagram of the squeeze-and-excitation operation in an embodiment of the present invention;

Figure 3(a) is a schematic structural diagram of convolution block one, and Figure 3(b) of convolution block two, in an embodiment of the present invention;

Figure 4 is a schematic flow diagram of the dual-attention mechanism in an embodiment of the present invention;

Figure 5(a) shows images generated by the BigGAN model; Figure 5(b) shows images generated by a channel-enhancement-based dual-attention generative adversarial network in an embodiment of the present invention.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative work fall within the protection scope of the present invention.

As shown in Figures 1(a) to 4, an embodiment of the present invention provides a channel-enhancement-based dual-attention generative adversarial network that improves on the existing BigGAN model. It comprises a generator and a discriminator: the generator comprises convolution block one and a dual-attention mechanism module, and the discriminator comprises convolution block two and a dual-attention mechanism module; both convolution blocks are convolution blocks of the generative adversarial network.

Convolution block one and convolution block two are each provided with a squeeze-and-excitation operation layer for acquiring channel attention. The invention introduces the squeeze-and-excitation mechanism to improve the residual convolution block ResBlock of the GAN: the squeeze-and-excitation operation layer strengthens feature layers with strong effects and weakens useless feature layers, so that the intermediate-layer features of the network gain expressive power and feature-extraction ability.

The dual-attention mechanism module comprises a position attention unit and a channel attention unit in parallel. The position attention unit establishes relevance among positions based on a self-attention mechanism, giving the position attention feature; the channel attention unit establishes dependency among feature channels based on a channel attention mechanism, giving the channel attention feature. The dual-attention mechanism module fuses the position attention feature S and the channel attention feature T into a feature U rich in correlation information. By introducing dual attention, the GAN can acquire long-range correlation information among positions and among channels, making the structural distribution of target objects in generated images more natural.

Preferably, as shown in Figure 2, the squeeze-and-excitation operation layer performs a squeeze-and-excitation operation that acquires channel attention, comprising the following steps:

Each layer of the feature M input to the squeeze-and-excitation operation layer is averaged and compressed into one value, giving a one-dimensional vector s:

$$s_n = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} m_n^{i,j}$$

where the feature M has C layers, i.e. C channels, each of size H×W, so the resulting one-dimensional vector s has length C; $s_n$ denotes the n-th element of s, $m_n$ the n-th layer of M, n = 1, ..., C, and $m_n^{i,j}$ the element of $m_n$ at coordinates (i, j);

The one-dimensional vector s is activated through two nonlinear fully connected layers, giving a weight feature vector w holding the weight ratio of each layer:

$$w = \sigma_2(W_2\,\sigma_1(W_1 s))$$

where $W_1$ and $W_2$ denote the first and second fully connected operations, and $\sigma_1$ and $\sigma_2$ are the ReLU and Sigmoid activation functions, respectively; the Sigmoid restricts the values of w to the range (0, 1), and the activation functions let the network learn nonlinear relationships;

The weight feature vector w is multiplied into the corresponding layer of the feature M, giving the recalibrated feature $\tilde{M}$:

$$\tilde{m}_n = w_n \cdot m_n$$

where $w_n$ denotes the n-th element of the weight feature vector w, and $\tilde{m}_n$ denotes the n-th layer of the feature $\tilde{M}$.

Preferably, $W_1$ and $W_2$ can be implemented with 1×1 convolutions, giving a squeeze-and-excitation operation layer (SELayer) comprising, connected in sequence, an average pooling layer (Average Pooling), a 1×1 convolution layer (1×1 Conv), a ReLU activation layer (ReLU), a 1×1 convolution layer and a Sigmoid activation layer (Sigmoid); the output of the SELayer is channel-wise multiplied with its input to give the recalibrated feature.
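For concreteness, the following is a minimal PyTorch sketch of such an SE layer; the class name `SELayer` follows the text, while the reduction ratio `reduction=16` is an assumption of this example, not a value fixed by the patent.

```python
import torch
import torch.nn as nn

class SELayer(nn.Module):
    """Squeeze-and-excitation layer: average pooling, 1x1 conv, ReLU,
    1x1 conv, Sigmoid, then channel-wise multiplication with the input."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: (B, C, H, W) -> (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),  # W1
            nn.ReLU(inplace=True),                                      # sigma1
            nn.Conv2d(channels // reduction, channels, kernel_size=1),  # W2
            nn.Sigmoid(),                                               # sigma2: w in (0, 1)
        )

    def forward(self, m: torch.Tensor) -> torch.Tensor:
        w = self.fc(self.pool(m))  # per-channel weight vector w
        return m * w               # recalibrated feature: each m_n scaled by w_n
```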

The present invention embeds the SELayer into the structure of the ResBlock convolution block of the BigGAN model, giving the SEResBlock.

Figure 3(a) is a schematic diagram of the SEResBlock structure in the generator, i.e. SEResBlock(G) in Figure 1(a), which is convolution block one. Further, as shown in Figure 3(a), in a preferred embodiment convolution block one comprises two linear layers (Linear), two batch normalization layers (BatchNorm), two ReLU activation layers, two upsampling layers (Upsample), two 3×3 convolution layers (3×3 Conv), a 1×1 convolution layer (1×1 Conv) and the squeeze-and-excitation operation layer. The class condition Condition input to the generator passes through the two linear layers into the two batch normalization layers, respectively. The first batch normalization layer, the first ReLU activation layer, the second upsampling layer, the first 3×3 convolution layer, the second batch normalization layer, the second ReLU activation layer and the second 3×3 convolution layer are connected in sequence, taking the output of the previous module as input. The second 3×3 convolution layer then connects to the squeeze-and-excitation operation layer, and its output is channel-wise multiplied with the output of the squeeze-and-excitation operation layer, giving the recalibrated feature. The output of the previous module also passes, in sequence, through the first upsampling layer and the 1×1 convolution layer and is summed element-wise with the recalibrated feature, forming the output of this module (i.e. this convolution block one).
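As a minimal sketch of the block of Fig. 3(a) (not the patent's exact implementation), reusing the `SELayer` above: `ConditionalBatchNorm2d` stands in for the class-conditional batch normalization fed by the linear layers (a BigGAN-style construction assumed here), and all names and dimensions are illustrative.

```python
class ConditionalBatchNorm2d(nn.Module):
    """Batch normalization whose affine parameters are produced from the
    class condition by linear layers (assumed BigGAN-style construction)."""
    def __init__(self, channels: int, cond_dim: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels, affine=False)
        self.gain = nn.Linear(cond_dim, channels)
        self.bias = nn.Linear(cond_dim, channels)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        g = self.gain(cond).unsqueeze(-1).unsqueeze(-1)
        b = self.bias(cond).unsqueeze(-1).unsqueeze(-1)
        return self.bn(x) * (1 + g) + b

class SEResBlockG(nn.Module):
    """Generator block of Fig. 3(a): BN -> ReLU -> Upsample -> 3x3 conv ->
    BN -> ReLU -> 3x3 conv -> SE recalibration, summed element-wise with
    an Upsample + 1x1-conv shortcut."""
    def __init__(self, in_ch: int, out_ch: int, cond_dim: int):
        super().__init__()
        self.bn1 = ConditionalBatchNorm2d(in_ch, cond_dim)
        self.bn2 = ConditionalBatchNorm2d(out_ch, cond_dim)
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.short = nn.Conv2d(in_ch, out_ch, 1)
        self.up = nn.Upsample(scale_factor=2)
        self.se = SELayer(out_ch)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        h = self.conv1(self.up(torch.relu(self.bn1(x, cond))))
        h = self.conv2(torch.relu(self.bn2(h, cond)))
        h = self.se(h)                    # recalibrate the learned features
        return h + self.short(self.up(x)) # element-wise sum with the shortcut
```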

Figure 3(b) is a schematic diagram of the SEResBlock structure in the discriminator, i.e. SEResBlock(D) in Figure 1(b), which is convolution block two; compared with convolution block one, it lacks the batch normalization layers. As shown in Figure 3(b), convolution block two comprises two ReLU activation layers, two 3×3 convolution layers, a 1×1 convolution layer, two average pooling layers (Average Pooling) and the squeeze-and-excitation operation layer. The first ReLU activation layer, the first 3×3 convolution layer, the second ReLU activation layer, the second 3×3 convolution layer and the second average pooling layer are connected in sequence, taking the output of the previous module (i.e. the input of this convolution block two) as input. The second average pooling layer then connects to the squeeze-and-excitation operation layer, and its output is channel-wise multiplied with the output of the squeeze-and-excitation operation layer, giving the recalibrated feature. The output of the previous module also passes, in sequence, through the 1×1 convolution layer and the first average pooling layer and is summed element-wise with the recalibrated feature, forming the output of this module (i.e. this convolution block two).
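A matching sketch of the discriminator block of Fig. 3(b), under the same illustrative assumptions and reusing `SELayer`:

```python
class SEResBlockD(nn.Module):
    """Discriminator block of Fig. 3(b): ReLU -> 3x3 conv -> ReLU ->
    3x3 conv -> AvgPool -> SE recalibration, summed element-wise with a
    1x1-conv + AvgPool shortcut; no batch normalization."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.short = nn.Conv2d(in_ch, out_ch, 1)
        self.pool = nn.AvgPool2d(2)
        self.se = SELayer(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv1(torch.relu(x))
        h = self.pool(self.conv2(torch.relu(h)))
        h = self.se(h)                      # recalibrate before the residual sum
        return h + self.pool(self.short(x))
```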

In convolution blocks one and two, the invention places the squeeze-and-excitation operation layer after the second 3×3 convolution layer, recalibrating the features learned by the whole convolution block; this brings a performance gain without adding much network complexity.

Preferably, as shown in Figure 4, in the dual-attention mechanism module of the network, the channel attention unit (part I in Figure 4) performs the following operations:

The feature A input to the dual-attention mechanism module is reshaped, giving a feature A′; A has C layers, each of size H×W, and A′ has size C×N with N = H×W;

A′ is multiplied by the transpose of A′ and a softmax is taken, giving a feature map Q of size C×C whose elements $q_{ji}$ are:

$$q_{ji} = \frac{\exp(a'_i \cdot a'^{T}_j)}{\sum_{i=1}^{C} \exp(a'_i \cdot a'^{T}_j)}$$

where i, j = 1, 2, …, C; $a'_i$ is the i-th feature vector of A′, $a'^{T}_j$ is the j-th feature vector of the transpose of A′, and the superscript "T" denotes transposition;

The feature map Q is multiplied by A′ and the result is reshaped back (inverse reshape), giving the channel attention feature T:

$$T_j = \beta \sum_{i=1}^{C} q_{ji}\, a'_i + A_j$$

where $T_j$ denotes the j-th feature vector of the channel attention feature T; β is a learning parameter initialized to 0; and $A_j$ denotes the j-th feature vector of the feature A, with j = 1, 2, …, C.
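A minimal PyTorch sketch of this channel attention unit, following the formulas above (a batch dimension is added; names are illustrative):

```python
class ChannelAttention(nn.Module):
    """Channel attention: a C x C affinity map Q over the feature layers,
    applied as T_j = beta * sum_i q_ji * a'_i + A_j, beta initialized to 0."""
    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(1))

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        bsz, c, h, w = a.shape
        a_flat = a.view(bsz, c, h * w)                         # reshape A to C x N
        affinity = torch.bmm(a_flat, a_flat.transpose(1, 2))   # A' A'^T: C x C
        q = torch.softmax(affinity, dim=-1)                    # normalize over i
        t = torch.bmm(q, a_flat).view(bsz, c, h, w)            # sum_i q_ji a'_i, un-reshaped
        return self.beta * t + a
```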

Further, as shown in Figure 4, in the dual-attention mechanism module the position attention unit (part II in Figure 4) performs the following operations:

The feature A is channel-compressed by a 1×1 convolution f(x) to give a feature B, which is then reshaped to give a feature B′; A has C layers, each of size H×W, the number of layers (i.e. channels) after compression is $\bar{C}$, B has dimensions $\bar{C} \times H \times W$, and after reshaping the dimensions are $\bar{C} \times N$ with N = H×W;

The feature A is channel-compressed by a 1×1 convolution g(x) to give a feature O, which is then reshaped to give a feature O′; the number of channels after compression is $\bar{C}$, O has dimensions $\bar{C} \times H \times W$, and after reshaping the dimensions are $\bar{C} \times N$;

B′ is multiplied by the transpose of O′ and a softmax is taken, giving a feature map P of size N×N whose elements $p_{ji}$ are:

$$p_{ji} = \frac{\exp(b'_i \cdot o'^{T}_j)}{\sum_{i=1}^{N} \exp(b'_i \cdot o'^{T}_j)}$$

where $b'_i$ denotes the i-th feature vector of B′ and $o'^{T}_j$ denotes the j-th feature vector of the transpose of O′, i, j = 1, 2, …, N;

The feature A is processed by a 1×1 convolution h(x) to give a feature V, which is then reshaped to give a feature V′; the number of layers of V after this extraction is still C, and after reshaping V′ = [v′₁, v′₂, ..., v′_N];

The feature V′ is multiplied by the feature map P and the result is reshaped back (inverse reshape), giving the position attention feature S:

$$S_j = \alpha \sum_{i=1}^{N} p_{ji}\, v'_i + A_j$$

where $S_j$ denotes the j-th feature vector of the position attention feature S, j = 1, 2, …, N; $v'_i$ denotes the i-th feature vector of V′, i = 1, 2, …, N; α is a learning parameter initialized to 0; and $A_j$ denotes the j-th feature vector of the feature A.
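A corresponding sketch of the position attention unit; the channel-compression ratio `k = 8` is an assumption of this example (the compressed layer count $\bar{C}$ is given only in the original figures):

```python
class PositionAttention(nn.Module):
    """Position attention: f and g are 1x1 channel-compressing convolutions,
    h is a 1x1 convolution keeping C channels;
    S_j = alpha * sum_i p_ji * v'_i + A_j, alpha initialized to 0."""
    def __init__(self, channels: int, k: int = 8):
        super().__init__()
        self.f = nn.Conv2d(channels, channels // k, 1)
        self.g = nn.Conv2d(channels, channels // k, 1)
        self.h = nn.Conv2d(channels, channels, 1)
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        bsz, c, hgt, wid = a.shape
        n = hgt * wid
        b_ = self.f(a).view(bsz, -1, n)               # B': C_bar x N
        o_ = self.g(a).view(bsz, -1, n)               # O': C_bar x N
        v_ = self.h(a).view(bsz, c, n)                # V': C x N
        e = torch.bmm(b_.transpose(1, 2), o_)         # E[i, j] = b'_i . o'_j
        p = torch.softmax(e, dim=1)                   # p_ji: normalize over i
        s = torch.bmm(v_, p).view(bsz, c, hgt, wid)   # S_j = sum_i p_ji v'_i
        return self.alpha * s + a
```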

Further, the dual-attention mechanism module fuses the position attention feature S and the channel attention feature T through a 3×3 convolution J(x) and a 3×3 convolution K(x):

$$U = J(S) + K(T)$$

where U denotes the fused feature produced by the dual-attention mechanism module; U carries rich correlation information.
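The two units and the fusion step can then be wrapped as one module (a sketch reusing the `PositionAttention` and `ChannelAttention` classes above):

```python
class DualAttention(nn.Module):
    """Dual-attention module: parallel position and channel attention,
    fused as U = J(S) + K(T) with two 3x3 convolutions."""
    def __init__(self, channels: int):
        super().__init__()
        self.position = PositionAttention(channels)
        self.channel = ChannelAttention()
        self.j = nn.Conv2d(channels, channels, 3, padding=1)
        self.k = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        return self.j(self.position(a)) + self.k(self.channel(a))
```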

As shown in Figures 1(a) and 1(b), in a preferred embodiment the generator comprises, connected in sequence, a linear layer, a first convolution block one, a second convolution block one, a dual-attention mechanism module, a third convolution block one, a first activation module and a Tanh layer. The input of the generator is a noise vector z and a class condition Condition; z follows a normal distribution, and Condition is embedded into every batch normalization layer of each convolution block one. The output of the generator is a forged image.

The discriminator comprises, connected in sequence, a first convolution block two, a dual-attention mechanism module, a second convolution block two, a third convolution block two, a fourth convolution block two, a second activation module, and a linear-transformation and label-embedding layer. The input of the discriminator is a forged image and a label y, and its output is the discrimination result.

Further, as shown in Figure 1(a), the linear layer applies a linear transformation to the noise vector z, which is then reshaped into a 4×4×16ch feature tensor (ch may be set to 64). This tensor is learned through the stacked SEResBlock(G) blocks, processed by the batch normalization (BN), ReLU activation and 3×3 convolution of the first activation module, and activated by the Tanh layer, finally producing a forged image with ch = 3 channels, i.e. an RGB image. The dual-attention mechanism is placed at the middle-to-late feature layers of the network, in the same position as the self-attention mechanism in the BigGAN model.

As shown in Figure 1(b), the discriminator network consists of SEResBlock(D) blocks and the dual-attention mechanism, mirroring the generator structure in reverse; its function is to judge whether the input RGB image with label y is real. The final features are processed by the ReLU activation and global sum pooling of the second activation module, and the judgment of whether the input RGB image is real or fake is made by fusing a linear transformation with the embedded label y.
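Assembling the pieces, a generator along the lines of Fig. 1(a) might look as follows; the noise and condition dimensions and the channel widths are illustrative assumptions, and the discriminator would mirror this structure with `SEResBlockD` blocks:

```python
class Generator(nn.Module):
    """Generator of Fig. 1(a): Linear -> reshape to 4 x 4 x 16ch ->
    SEResBlock(G) x2 -> dual attention -> SEResBlock(G) ->
    BN + ReLU + 3x3 conv -> Tanh."""
    def __init__(self, z_dim: int = 128, cond_dim: int = 128, ch: int = 64):
        super().__init__()
        self.linear = nn.Linear(z_dim, 4 * 4 * 16 * ch)
        self.block1 = SEResBlockG(16 * ch, 8 * ch, cond_dim)
        self.block2 = SEResBlockG(8 * ch, 4 * ch, cond_dim)
        self.attn = DualAttention(4 * ch)   # mid-to-late feature layers
        self.block3 = SEResBlockG(4 * ch, ch, cond_dim)
        self.head = nn.Sequential(
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, z: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        h = self.linear(z).view(z.size(0), -1, 4, 4)  # reshape to 4 x 4 x 16ch
        h = self.block1(h, cond)
        h = self.block2(h, cond)
        h = self.attn(h)
        h = self.block3(h, cond)
        return self.head(h)                           # forged RGB image in [-1, 1]
```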

Preferably, the loss functions of the generator and the discriminator use the hinge loss. The loss function of the discriminator is:

$$L_D = \mathbb{E}_{(x,y)\sim p_{data}}\big[\max(0,\,1 - D(x,y))\big] + \mathbb{E}_{z\sim p_z,\,y\sim p_{data}}\big[\max(0,\,1 + D(G(z),y))\big]$$

and the loss function of the generator is:

$$L_G = -\,\mathbb{E}_{z\sim p_z,\,y\sim p_{data}}\big[D(G(z), y)\big]$$

where x denotes an image and y its class label; $p_{data}$ is the real-data probability distribution and $p_z$ the noise distribution; z denotes a one-dimensional noise vector; G(z) denotes the mapping performed by the generator and D(x, y) the mapping performed by the discriminator; $\mathbb{E}_{(x,y)\sim p_{data}}$ denotes expectation with (x, y) drawn from $p_{data}$, and $\mathbb{E}_{z\sim p_z,\,y\sim p_{data}}$ denotes expectation with z drawn from $p_z$ and y from $p_{data}$.
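A sketch of these hinge losses on discriminator outputs, assuming `d_real = D(x, y)` and `d_fake = D(G(z), y)` are batched scalar scores:

```python
def d_hinge_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """L_D = E[max(0, 1 - D(x, y))] + E[max(0, 1 + D(G(z), y))]."""
    return torch.relu(1.0 - d_real).mean() + torch.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake: torch.Tensor) -> torch.Tensor:
    """L_G = -E[D(G(z), y)]."""
    return -d_fake.mean()
```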

In a preferred embodiment, the learning rate of both the generator and the discriminator is set to 0.0002, with a learning-rate decay strategy at a decay rate of 0.9999. Spectral normalization is applied in the network to adjust the weights during training, making the training of the generative adversarial network more stable. The batch size is Batch = 64 and the total number of iterations is 10,000.

Further, the invention uses mini-batch stochastic gradient descent training: the loss function of the discriminator is trained first, then the generator. Pseudocode of the optimization procedure is shown in Table 1:

Table 1. Optimization procedure of the generative adversarial network

(Table 1 appears as an image in the original publication; it lists the alternating per-batch updates of the discriminator and the generator.)

where $\nabla_{\theta_d}$ denotes the gradient obtained by differentiating the discriminator loss with respect to the network parameters $\theta_d$ of the discriminator model, and $\nabla_{\theta_g}$ denotes the gradient obtained by back-propagating the generator loss with respect to the network parameters $\theta_g$ of the generator model.
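A sketch of this alternating procedure; Adam (with BigGAN-style betas) is used here as the optimizer, and the label y is assumed to be consumed directly by `G` and `D` — both are assumptions of this example, which reuses the hinge-loss helpers above:

```python
import torch.optim as optim

def train(G, D, loader, device, steps: int = 10000, lr: float = 2e-4,
          decay: float = 0.9999, z_dim: int = 128):
    """Alternating mini-batch training: update D on each batch, then G."""
    opt_d = optim.Adam(D.parameters(), lr=lr, betas=(0.0, 0.999))  # assumed betas
    opt_g = optim.Adam(G.parameters(), lr=lr, betas=(0.0, 0.999))
    sched_d = optim.lr_scheduler.ExponentialLR(opt_d, gamma=decay)  # lr decay 0.9999
    sched_g = optim.lr_scheduler.ExponentialLR(opt_g, gamma=decay)
    data = iter(loader)
    for _ in range(steps):
        try:
            x, y = next(data)
        except StopIteration:          # restart the loader between epochs
            data = iter(loader)
            x, y = next(data)
        x, y = x.to(device), y.to(device)
        z = torch.randn(x.size(0), z_dim, device=device)  # z ~ N(0, I)
        # discriminator step: real batch vs. detached fake batch
        opt_d.zero_grad()
        d_hinge_loss(D(x, y), D(G(z, y).detach(), y)).backward()
        opt_d.step()
        # generator step: raise D's score on fresh fakes
        opt_g.zero_grad()
        g_hinge_loss(D(G(z, y), y)).backward()
        opt_g.step()
        sched_d.step()
        sched_g.step()
```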

In particular, to evaluate the effect of the dual-attention generative adversarial network of the present invention, the Fréchet Inception Distance (FID) can be used. When evaluating images with FID, a smaller value indicates better quality of the generated images. To compute FID, feature vectors of the generated images and of the real images are first extracted with an InceptionV3 network; the normal distribution followed by the real data and the normal distribution followed by the generated data are then fitted, and the distance between the two data distributions is computed.
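For reference, a sketch of the final FID computation between the two fitted Gaussians (the InceptionV3 feature extraction is omitted; the closed form below is the standard Fréchet distance):

```python
import numpy as np
from scipy import linalg

def fid(mu_r: np.ndarray, sigma_r: np.ndarray,
        mu_g: np.ndarray, sigma_g: np.ndarray) -> float:
    """FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)),
    where (mu, S) are the mean and covariance of InceptionV3 features."""
    diff = mu_r - mu_g
    covmean = linalg.sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))
```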

Using the public ImageNet dataset, the performance of the existing BigGAN model and of the dual-attention generative adversarial network provided by the invention was compared: several categories of ImageNet were extracted, resized to 128×128 resolution, and evaluated with the FID metric; some generated images are shown in Figures 5(a) and 5(b). The evaluation shows that the BigGAN model achieves an FID of 17.43, while the channel-enhancement-based dual-attention generative adversarial network (SEDA-GAN) provided by the invention achieves an FID of 14.89, an improvement of 14.57%.

The structural improvements brought by the invention are also evident. Figure 5(a) shows samples randomly generated by the BigGAN model and Figure 5(b) samples randomly generated by the SEDA-GAN model. In the BigGAN samples, the organs of the goldfish are misplaced and scenes with several goldfish are disordered, the rim of the coffee cup is not round, and the structures of the truck and the wooden cabin also show defects. In the SEDA-GAN samples, the structural distribution is natural even with several goldfish, the coffee cup is rounder, the overall structure of the truck is more natural, and the architecture of the wooden cabin is straighter, improving the overall visual effect.

In summary, the invention provides a channel-enhancement-based dual-attention generative adversarial network. On the one hand, the invention adds a channel-correlation learning mechanism to the convolution block structure, recalibrating the features, improving the feature learning ability of the model's convolution blocks, and strengthening the feature expression ability of the GAN. On the basis of the ResBlock convolution block of the BigGAN model, a squeeze-and-excitation operation that acquires channel attention is introduced, giving the new convolution block SEResBlock. It has been verified that a GAN built from such channel-enhanced SEResBlock convolution blocks has better generative performance and learns the data distribution faster. On the other hand, again from the channel perspective, the invention adds to the attention part a channel attention mechanism that establishes dependencies among feature channels; in parallel with the self-attention mechanism it forms a dual-attention mechanism module that jointly models the latent structural correlations among features across positions and channels. In practice, network features carry correlation information not only across positions but also across feature channels. After introducing dual attention, the GAN can simultaneously capture long-range correlation information across positions and channels, learn more information about the distribution of data structures, and make the target structure of generated images more reasonable, which is very helpful for improving image generation quality. Compared with the prior art, after applying channel enhancement and the dual-attention mechanism, the total number of network parameters increases only slightly while the generative performance of the GAN improves. The data distribution generated by the invention is closer to the original data distribution, the visual effect is better, the quality of generated images is higher, and the structure of target objects is more normal and natural.

The present invention also provides an image generation method, comprising the following steps:

S1. Construct the dual-attention generative adversarial network described in any of the above;

S2. Obtain a training set and input it to the dual-attention generative adversarial network for training;

S3. Generate images using the trained dual-attention generative adversarial network.

In particular, some preferred embodiments of the present invention also provide a computer device comprising a memory and a processor, the memory storing a computer program; when the processor executes the computer program, it implements the steps of the image generation method of any of the above embodiments.

Other preferred embodiments of the present invention also provide a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the image generation method of any of the above embodiments.

Those of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above image generation method embodiments, which are not repeated here.

Finally, it should be noted that the above embodiments are only intended to illustrate, not limit, the technical solutions of the present invention. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described therein may still be modified, or some of their technical features equivalently replaced, without departing from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A channel-enhanced dual-attention generative adversarial network, characterized in that it comprises a generator and a discriminator; the generator comprises convolution block one and a dual-attention mechanism module; the discriminator comprises convolution block two and a dual-attention mechanism module;
convolution block one and convolution block two are each provided with a squeeze-and-excitation operation layer for acquiring channel attention through a squeeze-and-excitation operation;
the dual-attention mechanism module comprises a position attention unit and a channel attention unit in parallel; the position attention unit establishes relevance among positions based on a self-attention mechanism to obtain a position attention feature, and the channel attention unit establishes dependency among feature channels based on a channel attention mechanism to obtain a channel attention feature; the dual-attention mechanism module fuses the position attention feature and the channel attention feature.
2. The dual-attention generative adversarial network of claim 1, wherein the squeeze-and-excitation operation layer is configured to:
average each layer of the feature M and compress it into one value, giving a one-dimensional vector s:

$$s_n = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} m_n^{i,j}$$

wherein the feature M has C layers, each of size H×W; $s_n$ denotes the n-th element of s, $m_n$ the n-th layer of M, n = 1, ..., C, and $m_n^{i,j}$ the element of $m_n$ at coordinates (i, j);
activate the one-dimensional vector s through two nonlinear fully connected layers, giving a weight feature vector w holding the weight ratio of each layer:

$$w = \sigma_2(W_2\,\sigma_1(W_1 s))$$

wherein $W_1$ and $W_2$ denote the first and second fully connected operations, and $\sigma_1$ and $\sigma_2$ are the ReLU and Sigmoid activation functions, respectively;
multiply the weight feature vector w into the corresponding layer of the feature M, giving the calibrated feature $\tilde{M}$:

$$\tilde{m}_n = w_n \cdot m_n$$

wherein $w_n$ denotes the n-th element of the weight feature vector w, and $\tilde{m}_n$ denotes the n-th layer of the feature $\tilde{M}$.
3. The dual-attention generative adversarial network of claim 2, wherein the squeeze-and-excitation operation layer comprises, connected in sequence, an average pooling layer, a 1×1 convolution layer, a ReLU activation layer, a 1×1 convolution layer and a Sigmoid activation layer, the output being channel-wise multiplied with the input to give the calibrated feature.
4. The dual-attention generative adversarial network of claim 3, wherein
the first convolution block comprises two linear layers, two batch normalization layers, two ReLU activation layers, two upsampling layers, two 3 × 3 convolution layers, one 1 × 1 convolution layer, and the squeeze-and-excitation operation layer;
the second convolution block comprises two ReLU activation layers, two 3 × 3 convolution layers, one 1 × 1 convolution layer, two average pooling layers, and the squeeze-and-excitation operation layer.
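The claim fixes each block's layer inventory but not its wiring. Continuing the sketch above, one plausible arrangement, assuming BigGAN-style residual blocks; the residual topology is an assumption, and plain BatchNorm2d stands in for the class-conditional batch normalization of claim 8 (whose per-class parameters would come from the claim's two linear layers):

```python
class ConvBlock1(nn.Module):
    """Generator up-block sketch: BN -> ReLU -> upsample -> 3x3 conv, then
    BN -> ReLU -> 3x3 conv on the main path; upsample -> 1x1 conv on the
    shortcut; squeeze-and-excitation on the output."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.relu1, self.relu2 = nn.ReLU(), nn.ReLU()
        self.up1, self.up2 = nn.Upsample(scale_factor=2), nn.Upsample(scale_factor=2)
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.skip = nn.Conv2d(in_ch, out_ch, 1)   # 1x1 conv on the shortcut
        self.se = SqueezeExcitation(out_ch)

    def forward(self, x):
        h = self.conv1(self.up1(self.relu1(self.bn1(x))))
        h = self.conv2(self.relu2(self.bn2(h)))
        return self.se(h + self.skip(self.up2(x)))

class ConvBlock2(nn.Module):
    """Discriminator down-block sketch: two ReLU + 3x3 conv stages with average
    pooling on both paths, a 1x1 conv on the shortcut, and SE on the output."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.main = nn.Sequential(
            nn.ReLU(), nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.ReLU(), nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.AvgPool2d(2))
        self.skip = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1), nn.AvgPool2d(2))
        self.se = SqueezeExcitation(out_ch)

    def forward(self, x):
        return self.se(self.main(x) + self.skip(x))
```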
5. The dual-attention generative adversarial network of claim 1, wherein
the channel attention unit is configured to perform the following operations:
recombining the feature A to obtain a feature A'; the feature A has C layers, each of size H × W, and the feature A' has dimensions C × N, where N = H × W;
multiplying the feature A' by the transpose of the feature A' and applying softmax to obtain a feature map Q of size C × C, whose element q_{ji} has the expression:

$$q_{ji} = \frac{\exp\!\left(a'_i \cdot a'^{T}_{j}\right)}{\sum_{i=1}^{C}\exp\!\left(a'_i \cdot a'^{T}_{j}\right)}$$

wherein i, j = 1, 2, …, C; a'_i denotes the i-th feature vector of the feature A', and a'^T_j denotes the j-th feature vector of the transpose of the feature A';
multiplying the feature map Q by the feature A' and performing the inverse recombination to obtain the channel attention feature T, with the expression:

$$T_j = \beta\sum_{i=1}^{C}\left(q_{ji}\, a'_i\right) + a_j$$

wherein T_j denotes the j-th feature vector of the channel attention feature T, j = 1, 2, …, C; β denotes a learning parameter initialized to 0; and a_j denotes the j-th feature vector of the feature A.
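Continuing the PyTorch sketch, a minimal channel attention unit implementing the two expressions above (batch handling and naming are mine; β is the zero-initialized learning parameter):

```python
class ChannelAttention(nn.Module):
    """Channel attention: reshape A to C x N, Q = softmax(A' A'^T) of size C x C,
    then T = beta * (Q A') + A with beta initialized to zero."""
    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(1))  # learning parameter beta

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        b, c, h, w = a.shape
        a_flat = a.view(b, c, h * w)                        # A': C x N
        energy = torch.bmm(a_flat, a_flat.transpose(1, 2))  # A' A'^T: C x C
        q = torch.softmax(energy, dim=-1)                   # q_ji, normalized over i
        t = torch.bmm(q, a_flat).view(b, c, h, w)           # Q A', reshaped back
        return self.beta * t + a                            # T_j = beta * sum + a_j
```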
6. The dual-attention generative adversarial network of claim 5, wherein
the position attention unit is configured to perform the following operations:
compressing the channels of the feature A with a 1 × 1 convolution f(x) to obtain a feature B, and recombining it to obtain a feature B'; the feature A has C layers, each of size H × W; after compression the number of layers is a reduced count C', so the feature B has dimensions C' × H × W and the recombined feature B' has dimensions N × C', where N = H × W;
compressing the channels of the feature A with a 1 × 1 convolution g(x) to obtain a feature O, and recombining it to obtain a feature O'; after compression the number of layers is likewise C', the feature O has dimensions C' × H × W, and the recombined feature O' has dimensions N × C';
multiplying the feature B' by the transpose of the feature O' and applying softmax to obtain a feature map P of size N × N, whose element p_{ji} has the expression:

$$p_{ji} = \frac{\exp\!\left(b'_i \cdot o'^{T}_{j}\right)}{\sum_{i=1}^{N}\exp\!\left(b'_i \cdot o'^{T}_{j}\right)}$$

wherein b'_i denotes the i-th feature vector of the feature B', o'^T_j denotes the j-th feature vector of the transpose of the feature O', and i, j = 1, 2, …, N;
extracting the feature A with a 1 × 1 convolution h(x) to obtain a feature V, and recombining it to obtain a feature V'; the number of layers after extraction is still C, so the feature V' has dimensions C × N;
multiplying the feature V' by the feature map P and performing the inverse recombination to obtain the position attention feature S, with the expression:

$$S_j = \alpha\sum_{i=1}^{N}\left(p_{ji}\, v'_i\right) + a_j$$

wherein S_j denotes the j-th feature vector of the position attention feature S, j = 1, 2, …, N; v'_i denotes the i-th feature vector of the feature V'; α denotes a learning parameter initialized to 0; and a_j denotes the j-th feature vector of the feature A.
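A matching sketch of the position attention unit; the compressed channel count C' is taken as C // 8 here purely as an assumption (the claim leaves the reduction unspecified), and α is the zero-initialized learning parameter:

```python
class PositionAttention(nn.Module):
    """Position attention: f and g compress channels for the query/key features
    B and O, h keeps C channels for V; P = softmax(B' O'^T) of size N x N, then
    S = alpha * (V' P^T) + A with alpha initialized to zero."""
    def __init__(self, channels: int, reduction: int = 8):  # reduction assumed
        super().__init__()
        self.f = nn.Conv2d(channels, channels // reduction, 1)  # f(x)
        self.g = nn.Conv2d(channels, channels // reduction, 1)  # g(x)
        self.h = nn.Conv2d(channels, channels, 1)               # h(x)
        self.alpha = nn.Parameter(torch.zeros(1))               # learning parameter alpha

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        bsz, c, hgt, wid = a.shape
        n = hgt * wid
        b_ = self.f(a).view(bsz, -1, n).transpose(1, 2)  # B': N x C'
        o_ = self.g(a).view(bsz, -1, n)                  # O'^T: C' x N
        p = torch.softmax(torch.bmm(b_, o_), dim=-1)     # P: N x N, p_ji over i
        v_ = self.h(a).view(bsz, c, n)                   # V': C x N
        s = torch.bmm(v_, p.transpose(1, 2)).view(bsz, c, hgt, wid)
        return self.alpha * s + a                        # S_j = alpha * sum + a_j
```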
7. The dual-attention generative adversarial network of claim 6, wherein
the dual-attention mechanism module fuses the position attention feature S and the channel attention feature T through a 3 × 3 convolution J(x) and a 3 × 3 convolution K(x), with the expression:
U = J(S) + K(T)
where U denotes the resulting fused feature.
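Putting the two units together, the fusion of claim 7 is a pair of 3 × 3 convolutions over the parallel branches (continuing the sketches above):

```python
class DualAttention(nn.Module):
    """Dual-attention module: position and channel attention in parallel,
    fused as U = J(S) + K(T) with two 3x3 convolutions."""
    def __init__(self, channels: int):
        super().__init__()
        self.pos = PositionAttention(channels)
        self.chn = ChannelAttention()
        self.j = nn.Conv2d(channels, channels, 3, padding=1)  # J(x)
        self.k = nn.Conv2d(channels, channels, 3, padding=1)  # K(x)

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        return self.j(self.pos(a)) + self.k(self.chn(a))      # U = J(S) + K(T)
```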
8. The dual-attention generative adversarial network of claim 1, wherein
the generator comprises, connected in sequence: a linear layer, two first convolution blocks, a dual-attention mechanism module, a third first convolution block, a first activation module, and a Tanh layer;
the inputs of the generator are a noise vector z and a class condition; the noise vector z obeys a normal distribution, the class condition is embedded into every batch normalization layer of each first convolution block, and the output of the generator is a forged image;
the discriminator comprises, connected in sequence: one second convolution block, a dual-attention mechanism module, three further second convolution blocks, a second activation module, and a linear transformation and label embedding layer;
the inputs of the discriminator are an RGB image and a label y, and the output is the discrimination result for the RGB image.
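A skeleton of the generator layout in this claim, continuing the sketches above. The 4 × 4 seed resolution, channel width, 32 × 32 output size, and the composition of the "first activation module" (BN + ReLU + 3 × 3 conv) are assumptions, and the class condition routed into each batch normalization layer is omitted for brevity:

```python
class Generator(nn.Module):
    """Claim 8 layout: linear -> two ConvBlock1 -> DualAttention -> ConvBlock1
    -> activation module -> Tanh, mapping a noise vector to a fake RGB image."""
    def __init__(self, z_dim: int = 128, ch: int = 256):
        super().__init__()
        self.linear = nn.Linear(z_dim, 4 * 4 * ch)
        self.blocks = nn.ModuleList([ConvBlock1(ch, ch) for _ in range(3)])
        self.attn = DualAttention(ch)
        self.out = nn.Sequential(nn.BatchNorm2d(ch), nn.ReLU(),
                                 nn.Conv2d(ch, 3, 3, padding=1), nn.Tanh())

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        h = self.linear(z).view(z.size(0), -1, 4, 4)  # 4x4 seed (assumed)
        h = self.blocks[1](self.blocks[0](h))         # two first convolution blocks
        h = self.attn(h)                              # dual-attention module
        h = self.blocks[2](h)                         # third first convolution block
        return self.out(h)                            # 32x32 fake RGB image
```

The discriminator would mirror this layout with ConvBlock2: one down-block, the dual-attention module, three further down-blocks, an activation module, and the linear transformation and label embedding (projection) layer.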
9. The dual-attention generative adversarial network of claim 1, wherein
the loss function of the discriminator is expressed as:

$$L_D = \mathbb{E}_{(x,y)\sim p_{data}}\!\left[\max\left(0,\, 1 - D(x,y)\right)\right] + \mathbb{E}_{z\sim p_z,\, y\sim p_{data}}\!\left[\max\left(0,\, 1 + D(G(z),y)\right)\right]$$

the loss function of the generator is expressed as:

$$L_G = -\,\mathbb{E}_{z\sim p_z,\, y\sim p_{data}}\!\left[D(G(z),y)\right]$$

where x denotes an image and y its corresponding class label; p_{data} is the true data distribution and p_z the noise distribution; z denotes a one-dimensional noise vector; G(z) denotes the mapping of the generator and D(x, y) the mapping of the discriminator; $\mathbb{E}_{(x,y)\sim p_{data}}[\cdot]$ denotes the expectation over (x, y) drawn from p_{data}; and $\mathbb{E}_{z\sim p_z,\, y\sim p_{data}}[\cdot]$ denotes the expectation over z drawn from p_z and y drawn from p_{data}.
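Assuming the conditional hinge loss standard in SAGAN-style networks with a projection discriminator (which matches the expectations described above; the original equation images are not recoverable verbatim), a minimal sketch of both losses:

```python
import torch.nn.functional as F

def d_hinge_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """L_D = E[max(0, 1 - D(x, y))] + E[max(0, 1 + D(G(z), y))],
    with the expectations estimated by minibatch means."""
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake: torch.Tensor) -> torch.Tensor:
    """L_G = -E[D(G(z), y)]."""
    return -d_fake.mean()
```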
10. An image generation method, comprising the steps of:
S1, constructing a channel-enhanced dual-attention generative adversarial network according to any one of claims 1 to 9;
S2, acquiring a training set and inputting it into the dual-attention generative adversarial network for training;
and S3, generating images using the trained dual-attention generative adversarial network.
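A minimal training step for S2, assuming conditional call signatures G(z, y) and D(x, y) (the simplified Generator sketch above omits the condition) and externally constructed optimizers:

```python
def train_step(G, D, opt_g, opt_d, real, labels, z_dim=128):
    """One alternating GAN update: discriminator first, then generator."""
    # discriminator update: real pair vs. detached fake pair
    z = torch.randn(real.size(0), z_dim)          # z ~ N(0, I)
    fake = G(z, labels)
    opt_d.zero_grad()
    loss_d = d_hinge_loss(D(real, labels), D(fake.detach(), labels))
    loss_d.backward()
    opt_d.step()
    # generator update: push the discriminator's score on fresh fakes upward
    opt_g.zero_grad()
    z = torch.randn(real.size(0), z_dim)
    loss_g = g_hinge_loss(D(G(z, labels), labels))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```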
CN202011470128.6A 2020-12-14 2020-12-14 Channel-enhanced dual-attention generative adversarial network and image generation method Active CN112580782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011470128.6A CN112580782B (en) 2020-12-14 2020-12-14 Channel-enhanced dual-attention generative adversarial network and image generation method


Publications (2)

Publication Number Publication Date
CN112580782A (en) 2021-03-30
CN112580782B (en) 2024-02-09

Family

ID=75135850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011470128.6A Active CN112580782B (en) 2020-12-14 2020-12-14 Channel-enhanced dual-attention generative adversarial network and image generation method

Country Status (1)

Country Link
CN (1) CN112580782B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020140633A1 (en) * 2019-01-04 2020-07-09 平安科技(深圳)有限公司 Text topic extraction method, apparatus, electronic device, and storage medium
CN111429433A (en) * 2020-03-25 2020-07-17 北京工业大学 A Multi-Exposure Image Fusion Method Based on Attention Generative Adversarial Networks
CN111476717A (en) * 2020-04-07 2020-07-31 西安电子科技大学 Face image super-resolution reconstruction method based on self-attention generative adversarial network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CAO Zhen; YANG Yun; QI Yong; LI Chenghui: "Image inpainting method based on multi-loss constraints and attention blocks", Journal of Shaanxi University of Science & Technology (陕西科技大学学报), no. 03 *
HUANG Hongyu; GU Zifeng: "A text-to-image generative adversarial network based on a self-attention mechanism", Journal of Chongqing University (重庆大学学报), no. 03 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095330A (en) * 2021-04-30 2021-07-09 辽宁工程技术大学 Compressive attention model for semantically segmenting pixel groups
CN113223181A (en) * 2021-06-02 2021-08-06 广东工业大学 Weak texture object pose estimation method
CN113627590A (en) * 2021-07-29 2021-11-09 中汽创智科技有限公司 Attention module and attention mechanism of convolutional neural network and convolutional neural network
CN113344146A (en) * 2021-08-03 2021-09-03 武汉大学 Image classification method and system based on double attention mechanism and electronic equipment
CN113344146B (en) * 2021-08-03 2021-11-02 武汉大学 Image classification method, system and electronic device based on dual attention mechanism
CN113935977A (en) * 2021-10-22 2022-01-14 河北工业大学 Solar cell panel defect generation method based on a generative adversarial network
CN113744265A (en) * 2021-11-02 2021-12-03 成都东方天呈智能科技有限公司 Anomaly detection system, method and storage medium based on a generative adversarial network
CN114842284A (en) * 2022-03-17 2022-08-02 兰州交通大学 Attention mechanism and DCGAN-based steel rail surface defect image expansion method
CN114863413A (en) * 2022-05-06 2022-08-05 上海锡鼎智能科技有限公司 A dashboard recognition method based on super-resolution and key points
CN115099328A (en) * 2022-06-21 2022-09-23 重庆长安新能源汽车科技有限公司 Traffic flow prediction method, system, device and storage medium based on an adversarial network
CN115937994A (en) * 2023-01-06 2023-04-07 南昌大学 Data detection method based on deep learning detection model
CN116385725A (en) * 2023-06-02 2023-07-04 杭州聚秀科技有限公司 Fundus image optic disk and optic cup segmentation method and device and electronic equipment
CN116385725B (en) * 2023-06-02 2023-09-08 杭州聚秀科技有限公司 Fundus image optic disk and optic cup segmentation method and device and electronic equipment
CN117011918A (en) * 2023-08-08 2023-11-07 南京工程学院 Method for constructing human face living body detection model based on linear attention mechanism
CN117011918B (en) * 2023-08-08 2024-03-26 南京工程学院 Method for constructing human face living body detection model based on linear attention mechanism
CN118506553A (en) * 2024-07-17 2024-08-16 西华大学 AIoT anomaly identification method, disaster warning system and road safety system

Also Published As

Publication number Publication date
CN112580782B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN112580782A (en) Dual-attention Generative Adversarial Network and Image Generation Method Based on Channel Augmentation
CN116206185A (en) Lightweight small target detection method based on improved YOLOv7
CN111325579A (en) Advertisement click rate prediction method
CN113779675A (en) Physical-data-driven intelligent shear wall architectural design method and device
CN111968193A (en) Text image generation method based on StackGAN network
CN111582225A (en) A kind of remote sensing image scene classification method and device
CN113379655B (en) Image synthesis method for generating antagonistic network based on dynamic self-attention
CN111562612B (en) Deep learning microseismic event identification method and system based on attention mechanism
CN108985929A (en) Training method, business datum classification processing method and device, electronic equipment
CN116188836A (en) Remote sensing image classification method and device based on space and channel feature extraction
CN110175986A (en) A kind of stereo-picture vision significance detection method based on convolutional neural networks
CN113160057B (en) RPGAN image super-resolution reconstruction method based on generative confrontation network
CN112149662A (en) A Multimodal Fusion Saliency Detection Method Based on Dilated Convolution Blocks
CN110210492A (en) A kind of stereo-picture vision significance detection method based on deep learning
CN111222583B (en) Image steganalysis method based on countermeasure training and critical path extraction
CN116503499A (en) Sketch drawing generation method and system based on cyclic generation countermeasure network
CN112766381A (en) Attribute-guided SAR image generation method under limited sample
CN114943646B (en) Texture-guided gradient weight loss and attention mechanism super-resolution method
Yan et al. Machine learning based framework for rapid forecasting of the crack propagation
CN115496824A (en) Multi-class object-level natural image generation method based on hand drawing
CN118196107B (en) Panoramic image blind quality assessment method and system based on multi-cooperative network assistance
CN119647072A (en) A slope support reliability assessment method and device
CN118470791A (en) Action recognition method and device based on three-dimensional convolutional neural network
CN118608792A (en) A super lightweight image segmentation method and computer device based on Mamba
Li et al. ST2SI: Image Style Transfer via Vision Transformer using Spatial Interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant