CN112580782A - Channel-enhancement-based dual-attention generative adversarial network and image generation method - Google Patents

Channel-enhancement-based dual-attention generative adversarial network and image generation method

Info

Publication number
CN112580782A
CN112580782A (application CN202011470128.6A)
Authority
CN
China
Prior art keywords
attention
feature
layer
characteristic
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011470128.6A
Other languages
Chinese (zh)
Other versions
CN112580782B (en)
Inventor
罗健旭
岳丹阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China University of Science and Technology filed Critical East China University of Science and Technology
Priority to CN202011470128.6A priority Critical patent/CN112580782B/en
Publication of CN112580782A publication Critical patent/CN112580782A/en
Application granted granted Critical
Publication of CN112580782B publication Critical patent/CN112580782B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a channel-enhancement-based dual-attention generative adversarial network and an image generation method. The network comprises a generator and a discriminator: the generator comprises convolution block I and a dual-attention mechanism module, and the discriminator comprises convolution block II and a dual-attention mechanism module. Both convolution block I and convolution block II contain a squeeze-and-excitation (SE) operation layer that acquires channel attention through a squeeze-and-excitation operation. The dual-attention mechanism module comprises a position attention unit and a channel attention unit arranged in parallel: the position attention unit establishes associations between positions based on a self-attention mechanism to obtain a position attention feature, and the channel attention unit establishes dependencies between feature channels based on a channel attention mechanism to obtain a channel attention feature; the dual-attention mechanism module fuses the position attention feature and the channel attention feature. The invention improves the generation performance of the generative adversarial network: the generated data distribution is closer to the original data distribution, and the quality of the generated images is better.

Description

Channel-enhancement-based dual-attention generative adversarial network and image generation method
Technical Field
The invention relates to the technical field of image generation, and in particular to a channel-enhancement-based dual-attention generative adversarial network and an image generation method.
Background
Generative adversarial network (GAN) technology is mainly applied to image generation. A GAN model consists of a generator and a discriminator: the generator produces images from input noise vectors or class labels, while the discriminator distinguishes real images from generated ones. The generator and discriminator are trained adversarially so that the generator learns the real data distribution and can ultimately produce fake images approaching real ones. GAN technology is widely applied to image enhancement, infrared image generation, medical image generation, image super-resolution reconstruction, and similar tasks.
At present, the main approaches to improving the quality of images generated by GAN models are modifying the network structure, changing the loss function, and establishing feature-space correlations. The self-attention generative adversarial network model uses a self-attention mechanism that can capture long-range spatial correlations, making the structure of generated images more reasonable. Building on the self-attention GAN, the BigGAN model adjusts and deepens the network structure, improving the network's learning capacity and further raising generation quality.
Although current GANs have strong generation capability, with little prior knowledge it is difficult to generate images of complex scenes with long-range correlations using only a noise vector z and a conditional label y. Existing GAN models struggle to generate images with complex structural distributions. The self-attention mechanism enables a GAN to generate images with long-range correlations, but it still has shortcomings in generating the structure of target objects within an image. Scenes containing multiple target objects are likewise difficult to generate, and the quality of generated images needs improvement.
Disclosure of Invention
The invention aims to overcome at least some of the above defects and provide a generative adversarial network model that can generate high-quality images of complex scenes, while making the structural distribution of targets in the generated images more reasonable and the images more natural.
To achieve the above object, the present invention provides a channel-enhanced dual-attention generative adversarial network, comprising:
a generator and a discriminator; the generator comprises convolution block I and a dual-attention mechanism module; the discriminator comprises convolution block II and a dual-attention mechanism module;
both convolution block I and convolution block II contain a squeeze-and-excitation (SE) operation layer for acquiring channel attention through a squeeze-and-excitation operation;
the dual-attention mechanism module comprises a position attention unit and a channel attention unit arranged in parallel; the position attention unit establishes associations between positions based on a self-attention mechanism to obtain a position attention feature, and the channel attention unit establishes dependencies between feature channels based on a channel attention mechanism to obtain a channel attention feature; the dual-attention mechanism module fuses the position attention feature and the channel attention feature.
Preferably, the SE operation layer is configured to perform the following operations:
average each layer of the feature M, compressing each layer into a single value to obtain a one-dimensional vector s:

$$s_n = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} m_n^{i,j}$$

where the feature M has C layers, each of size H × W; s_n denotes the n-th element of s; m_n denotes the n-th layer of M, with n = 1, 2, …, C; and m_n^{i,j} denotes the element of m_n at coordinate (i, j);
activate the vector s through a two-layer nonlinear fully connected network to learn a weight feature vector w holding the weight of each layer:

$$w = \sigma_2(W_2\,\sigma_1(W_1 s))$$

where W_1 and W_2 denote the first- and second-layer fully connected operations, and σ_1 and σ_2 are the ReLU and Sigmoid activation functions, respectively;
multiply the weight feature vector w into the corresponding layers of the feature M to obtain the recalibrated feature M̃:

$$\tilde{m}_n = w_n \cdot m_n$$

where w_n is the n-th element of w and m̃_n denotes the n-th layer of M̃.
Preferably, the SE operation layer comprises an average pooling layer, a 1 × 1 convolution layer, a ReLU activation layer, a second 1 × 1 convolution layer, and a Sigmoid activation layer connected in sequence; its output is channel-multiplied with its input to obtain the recalibrated feature.
Preferably, convolution block I comprises two linear layers, two batch normalization layers, two ReLU activation layers, two upsampling layers, two 3 × 3 convolution layers, one 1 × 1 convolution layer, and the SE operation layer;
convolution block II comprises two ReLU activation layers, two 3 × 3 convolution layers, one 1 × 1 convolution layer, two average pooling layers, and the SE operation layer.
Preferably, the channel attention unit is configured to perform the following operations:
reshape the feature A to obtain a feature A′; A has C layers, each of size H × W, and A′ has size C × N with N = H × W;
multiply A′ by the transpose of A′ and apply softmax to obtain a feature map Q of size C × C, whose element q_{ji} is:

$$q_{ji} = \frac{\exp(a'_i \cdot a'^{\mathrm T}_j)}{\sum_{i=1}^{C}\exp(a'_i \cdot a'^{\mathrm T}_j)}$$

where i, j = 1, 2, …, C; a'_i is the i-th feature vector of A′ and a'^T_j is the j-th feature vector of the transpose of A′;
multiply the feature map Q by A′ and reshape back to obtain the channel attention feature T:

$$T_j = \beta \sum_{i=1}^{C} (q_{ji}\, a'_i) + a_j$$

where T_j denotes the j-th feature vector of T, j = 1, 2, …, C; β is a learnable parameter initialized to 0; and a_j is the j-th feature vector of A.
Preferably, the position attention unit is configured to perform the following operations:
compress the channels of the feature A with a 1 × 1 convolution f(x) to obtain a feature B, and reshape it to obtain a feature B′; where A has C layers, each of size H × W, the compressed channel count is C′, B has dimensions C′ × H × W, and B′ has dimensions C′ × N with N = H × W;
compress the channels of A with a 1 × 1 convolution g(x) to obtain a feature O, and reshape it to obtain a feature O′; the compressed channel count is C′, O has dimensions C′ × H × W, and O′ has dimensions C′ × N;
multiply B′ by the transpose of O′ and apply softmax to obtain a feature map P of size N × N, whose element p_{ji} is:

$$p_{ji} = \frac{\exp(b'_i \cdot o'^{\mathrm T}_j)}{\sum_{i=1}^{N}\exp(b'_i \cdot o'^{\mathrm T}_j)}$$

where b'_i denotes the i-th feature vector of B′, o'^T_j denotes the j-th feature vector of the transpose of O′, and i, j = 1, 2, …, N;
extract the feature A with a 1 × 1 convolution h(x) to obtain a feature V, whose number of layers remains C, and reshape it to obtain a feature V′;
multiply V′ by the feature map P and reshape back to obtain the position attention feature S:

$$S_j = \alpha \sum_{i=1}^{N} (p_{ji}\, v'_i) + a_j$$

where S_j denotes the j-th feature vector of S, j = 1, 2, …, N; v'_i denotes the i-th feature vector of V′; α is a learnable parameter initialized to 0; and a_j is the j-th feature vector of A.
Preferably, the dual-attention mechanism module fuses the position attention feature S and the channel attention feature T through a 3 × 3 convolution J(x) and a 3 × 3 convolution K(x):

U = J(S) + K(T)

where U denotes the resulting fused feature.
Preferably, the generator comprises a linear layer, a first convolution block I, a second convolution block I, a dual-attention mechanism module, a third convolution block I, a first activation module, and a Tanh layer connected in sequence;
the inputs of the generator are a noise vector z, which obeys a normal distribution, and a class Condition, which is embedded into each batch normalization layer of every convolution block I; the output of the generator is a fake image;
the discriminator comprises a first convolution block II, a dual-attention mechanism module, a second convolution block II, a third convolution block II, a fourth convolution block II, a second activation module, and a linear-transformation-and-label-embedding layer connected in sequence;
the inputs of the discriminator are an RGB image and a label y, and the output is the discrimination result for the RGB image.
Preferably, the loss function of the discriminator is:

$$L_D = \mathbb{E}_{(x,y)\sim p_{data}}[\max(0,\,1 - D(x,y))] + \mathbb{E}_{z\sim p_z,\,y\sim p_{data}}[\max(0,\,1 + D(G(z),y))]$$

and the loss function of the generator is:

$$L_G = -\mathbb{E}_{z\sim p_z,\,y\sim p_{data}}[D(G(z),y)]$$

where x denotes an image and y its class label; p_data is the real data probability distribution and p_z the noise probability distribution; z denotes a one-dimensional noise vector; G(z) denotes the mapping of the generator and D(x, y) the mapping of the discriminator; the first expectation is taken over (x, y) drawn from p_data, and the second over z drawn from p_z and y drawn from p_data.
The invention also provides an image generation method, comprising the following steps:
S1, constructing a dual-attention generative adversarial network as described in any of the above;
S2, acquiring a training set and feeding it to the dual-attention generative adversarial network for training;
S3, generating images with the trained dual-attention generative adversarial network.
The technical scheme of the invention has the following advantages. The invention provides a channel-enhanced dual-attention generative adversarial network and an image generation method; the proposed network improves on the existing BigGAN model. A squeeze-and-excitation operation layer, which acquires channel attention through a squeeze-and-excitation operation, is added to the convolution blocks of the generative adversarial network. It can recalibrate the network's features, strengthening the more useful feature layers and weakening the useless ones, so that the intermediate-layer features of the network have greater expressive power, the performance of the convolution blocks is improved, and the feature-learning capability of the GAN model is enhanced. Moreover, the network adopts a dual-attention mechanism module containing not only a position attention unit (functionally equivalent to a self-attention module) but also a channel attention unit. With the dual-attention mechanism, the GAN can simultaneously capture long-range correlations across positions and channels, learn the correlations among image feature structures more comprehensively, further strengthen feature relevance, and improve the quality of generated images.
Drawings
Fig. 1(a) and 1(b) are schematic diagrams of the structure of the channel-enhanced dual-attention generative adversarial network in an embodiment of the present invention, where fig. 1(a) shows the generator structure and fig. 1(b) shows the discriminator structure;
Fig. 2 is a schematic flow chart of the squeeze-and-excitation operation in an embodiment of the present invention;
Fig. 3(a) is a schematic diagram of the structure of convolution block I, and fig. 3(b) of convolution block II, in an embodiment of the present invention;
Fig. 4 is a schematic flow chart of the dual-attention mechanism in an embodiment of the present invention;
Fig. 5(a) shows images generated by the BigGAN model; fig. 5(b) shows images generated by the channel-enhanced dual-attention generative adversarial network in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1(a) to 4, an embodiment of the present invention provides a channel-enhanced dual-attention generative adversarial network, improved on the basis of the existing BigGAN model. It comprises a generator and a discriminator: the generator comprises convolution block I and a dual-attention mechanism module, and the discriminator comprises convolution block II and a dual-attention mechanism module, where convolution block I and convolution block II are both convolution blocks of the generative adversarial network.
Both convolution block I and convolution block II contain a squeeze-and-excitation (SE) operation layer for acquiring channel attention through a squeeze-and-excitation operation. The invention introduces the squeeze-and-excitation mechanism to improve the residual convolution block ResBlock of the adversarial network: the SE operation layer can strengthen the more useful layers among the features and weaken the useless ones, so that the intermediate-layer features of the network have greater expressive and feature-extraction capability.
The dual-attention mechanism module comprises a position attention unit and a channel attention unit in parallel. The position attention unit establishes associations between positions based on a self-attention mechanism to obtain a position attention feature S, and the channel attention unit establishes dependencies between feature channels based on a channel attention mechanism to obtain a channel attention feature T; the dual-attention mechanism module fuses S and T to obtain a feature U rich in correlation information. The invention introduces the dual-attention mechanism (Dual Attention) so that the generative adversarial network can acquire long-range correlation information across both positions and channels, making the structural distribution of target objects in generated images more natural.
Preferably, as shown in fig. 2, in the dual-attention generative adversarial network, the SE operation layer is configured to perform a squeeze-and-excitation operation capable of acquiring channel attention, comprising the following steps:
average each layer of the feature M input to the SE operation layer, compressing each layer into a single value to obtain a one-dimensional vector s:

$$s_n = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} m_n^{i,j}$$

where the feature M has C layers (i.e., C channels), each of size H × W, so the resulting vector s has length C; s_n denotes the n-th element of s; m_n denotes the n-th layer of M, with n = 1, 2, …, C; and m_n^{i,j} denotes the element of m_n at coordinate (i, j);
activate the vector s through a two-layer nonlinear fully connected network to learn a weight feature vector w holding the weight of each layer:

$$w = \sigma_2(W_2\,\sigma_1(W_1 s))$$

where W_1 and W_2 denote the first- and second-layer fully connected operations, and σ_1 and σ_2 are the ReLU and Sigmoid activation functions, respectively; Sigmoid limits the values of w to the range (0, 1), and the activation functions provide the nonlinearity learned by the neural network;
multiply the weight feature vector w into the corresponding layers of the feature M to obtain the recalibrated feature M̃:

$$\tilde{m}_n = w_n \cdot m_n$$

where w_n is the n-th element of w and m̃_n denotes the n-th layer of M̃.
Preferably, W_1 and W_2 can be implemented with 1 × 1 convolutions. The resulting SE layer (SELayer) comprises an average pooling layer (Average Pooling), a 1 × 1 convolution layer (1 × 1 Conv), a ReLU activation layer (ReLU), a second 1 × 1 convolution layer, and a Sigmoid activation layer (Sigmoid) connected in sequence; the output of the SE layer is channel-multiplied with its input to obtain the recalibrated feature.
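As an illustration, such an SE layer can be sketched in PyTorch as follows; this is a minimal sketch, and the reduction ratio of 16 is an assumption borrowed from common SENet practice rather than a value stated in the specification:

```python
import torch
import torch.nn as nn

class SELayer(nn.Module):
    """Squeeze-and-excitation layer: average pooling, 1x1 convolution, ReLU,
    1x1 convolution, Sigmoid; the output is channel-multiplied with the input."""

    def __init__(self, channels: int, reduction: int = 16):  # reduction is assumed
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: (C, H, W) -> (C, 1, 1)
        self.excite = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),  # limits each channel weight to (0, 1)
        )

    def forward(self, m: torch.Tensor) -> torch.Tensor:
        w = self.excite(self.pool(m))  # weight vector w, one value per channel
        return m * w                   # recalibrated feature
```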
The invention embeds the SE operation layer (SELayer) into the structure of the convolution block ResBlock of the BigGAN model to obtain SEResBlock.
Fig. 3(a) is a schematic diagram of the SEResBlock structure in the generator, i.e., SEResBlock(G) in fig. 1(a), i.e., convolution block I. As shown in fig. 3(a), in a preferred embodiment, convolution block I comprises two linear layers (Linear), two batch normalization layers (BatchNorm), two ReLU activation layers, two upsampling layers (Upsample), two 3 × 3 convolution layers (3 × 3 Conv), one 1 × 1 convolution layer (1 × 1 Conv), and the SE operation layer. The class Condition input to the generator is fed into the two batch normalization layers through the two linear layers. The first batch normalization layer, the first ReLU activation layer, the second upsampling layer, the first 3 × 3 convolution layer, the second batch normalization layer, the second ReLU activation layer, and the second 3 × 3 convolution layer are connected in sequence, with the output of the previous module as input. The second 3 × 3 convolution layer is connected to the SE operation layer, and the outputs of the second 3 × 3 convolution layer and the SE operation layer are channel-multiplied to obtain the recalibrated feature. The output of the previous module also passes through the first upsampling layer and the 1 × 1 convolution layer in sequence, and is then summed element-wise with the recalibrated feature to form the output of the current module (i.e., convolution block I).
Fig. 3(b) is a schematic diagram of the SEResBlock structure in the discriminator, i.e., SEResBlock(D) in fig. 1(b), i.e., convolution block II, which lacks the batch normalization layers of convolution block I. As shown in fig. 3(b), convolution block II comprises two ReLU activation layers (ReLU), two 3 × 3 convolution layers (3 × 3 Conv), one 1 × 1 convolution layer (1 × 1 Conv), two average pooling layers (Average Pooling), and the SE operation layer. The first ReLU activation layer, the first 3 × 3 convolution layer, the second ReLU activation layer, the second 3 × 3 convolution layer, and the second average pooling layer are connected in sequence, with the output of the previous module (i.e., the input of convolution block II) as input. The second average pooling layer is connected to the SE operation layer, and their outputs are channel-multiplied to obtain the recalibrated feature. The output of the previous module also passes through the 1 × 1 convolution layer and the first average pooling layer in sequence, and is then summed element-wise with the recalibrated feature to form the output of the current module (i.e., convolution block II).
In both convolution block I and convolution block II, applying the SE operation layer after the second 3 × 3 convolution layer recalibrates the features learned by the whole convolution block, improving performance without unduly increasing the complexity of the network.
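A sketch of convolution block I built around the SELayer above may clarify the wiring; plain batch normalization stands in for the class-conditional batch normalization described in the specification, an assumption made to keep the example short:

```python
import torch
import torch.nn as nn

class SEResBlockG(nn.Module):
    """Generator block: BN -> ReLU -> Upsample -> 3x3 conv -> BN -> ReLU ->
    3x3 conv -> SELayer, summed with an Upsample -> 1x1 conv shortcut."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.skip = nn.Conv2d(in_ch, out_ch, 1)   # 1x1 conv on the shortcut path
        self.up = nn.Upsample(scale_factor=2)
        self.se = SELayer(out_ch)                 # SELayer sketched above
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv1(self.up(self.relu(self.bn1(x))))
        h = self.se(self.conv2(self.relu(self.bn2(h))))  # recalibrate the features
        return h + self.skip(self.up(x))                 # element-wise sum
```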
Preferably, as shown in fig. 4, in the dual-attention mechanism module of the dual-attention generative adversarial network, the channel attention unit (part I in fig. 4) is configured to perform the following operations:
reshape (Reshape) the feature A input to the dual-attention mechanism module to obtain a feature A′; A has C layers, each of size H × W, and A′ has size C × N with N = H × W;
multiply A′ by the transpose of A′ and apply softmax to obtain a feature map Q of size C × C, whose element q_{ji} is:

$$q_{ji} = \frac{\exp(a'_i \cdot a'^{\mathrm T}_j)}{\sum_{i=1}^{C}\exp(a'_i \cdot a'^{\mathrm T}_j)}$$

where i, j = 1, 2, …, C; a'_i is the i-th feature vector of A′; a'^T_j is the j-th feature vector of the transpose of A′; and the superscript "T" denotes transposition;
multiply the feature map Q by A′ and reshape back (inverse Reshape) to obtain the channel attention feature T:

$$T_j = \beta \sum_{i=1}^{C} (q_{ji}\, a'_i) + a_j$$

where T_j denotes the j-th feature vector of T; β is a learnable parameter initialized to 0; and a_j is the j-th feature vector of A, with j = 1, 2, …, C.
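A minimal PyTorch sketch of this channel attention unit, with the learnable scale β initialized to 0 as described:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: softmax over the C x C similarity matrix of the
    reshaped feature, scaled by a zero-initialized learnable beta."""

    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(1))  # beta initialized to 0

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        bsz, c, h, w = a.shape
        a_flat = a.view(bsz, c, h * w)                      # A': C x N, N = H*W
        energy = torch.bmm(a_flat, a_flat.transpose(1, 2))  # C x C channel similarities
        q = torch.softmax(energy, dim=-1)                   # feature map Q
        t = torch.bmm(q, a_flat).view(bsz, c, h, w)         # weighted sum of channels
        return self.beta * t + a                            # residual, as in T_j
```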
Further, as shown in fig. 4, in the dual-attention mechanism module, the position attention unit (part II in fig. 4) is configured to perform the following operations:
compress the channels of the feature A with a 1 × 1 convolution f(x) to obtain a feature B, and reshape (Reshape) it to obtain a feature B′; A has C layers, each of size H × W; the compressed channel count is C′, B has dimensions C′ × H × W, and B′ has dimensions C′ × N with N = H × W;
compress the channels of A with a 1 × 1 convolution g(x) to obtain a feature O, and reshape it to obtain a feature O′; the compressed channel count is C′, O has dimensions C′ × H × W, and O′ has dimensions C′ × N;
multiply B′ by the transpose of O′ and apply softmax to obtain a feature map P of size N × N, whose element p_{ji} is:

$$p_{ji} = \frac{\exp(b'_i \cdot o'^{\mathrm T}_j)}{\sum_{i=1}^{N}\exp(b'_i \cdot o'^{\mathrm T}_j)}$$

where b'_i denotes the i-th feature vector of B′, o'^T_j denotes the j-th feature vector of the transpose of O′, and i, j = 1, 2, …, N;
extract the feature A with a 1 × 1 convolution h(x) to obtain a feature V, whose number of layers remains C, and reshape it to obtain V′ = [v'_1, v'_2, …, v'_N];
multiply V′ by the feature map P and reshape back (inverse Reshape) to obtain the position attention feature S:

$$S_j = \alpha \sum_{i=1}^{N} (p_{ji}\, v'_i) + a_j$$

where S_j denotes the j-th feature vector of S, j = 1, 2, …, N; v'_i denotes the i-th feature vector of V′, i = 1, 2, …, N; α is a learnable parameter initialized to 0; and a_j is the j-th feature vector of A.
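A corresponding PyTorch sketch of the position attention unit; the compression ratio C′ = C/8 is an assumption (a common self-attention choice, since the specification leaves C′ unspecified):

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    """Position attention: f and g compress channels with 1x1 convolutions,
    h keeps all C channels; softmax over the N x N position-similarity map P."""

    def __init__(self, channels: int, ratio: int = 8):  # ratio is assumed
        super().__init__()
        self.f = nn.Conv2d(channels, channels // ratio, 1)
        self.g = nn.Conv2d(channels, channels // ratio, 1)
        self.h = nn.Conv2d(channels, channels, 1)
        self.alpha = nn.Parameter(torch.zeros(1))  # alpha initialized to 0

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        bsz, c, hgt, wid = a.shape
        n = hgt * wid
        b = self.f(a).view(bsz, -1, n)                              # B': C' x N
        o = self.g(a).view(bsz, -1, n)                              # O': C' x N
        v = self.h(a).view(bsz, c, n)                               # V': C x N
        p = torch.softmax(torch.bmm(b.transpose(1, 2), o), dim=-1)  # P: N x N
        s = torch.bmm(v, p.transpose(1, 2)).view(bsz, c, hgt, wid)  # attend positions
        return self.alpha * s + a                                   # residual, as in S_j
```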
Further, the dual-attention mechanism module fuses the position attention feature S and the channel attention feature T through a 3 × 3 convolution J(x) and a 3 × 3 convolution K(x):

U = J(S) + K(T)

where U denotes the fused feature obtained by the dual-attention mechanism module; U is rich in correlation information.
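Putting the two parallel units together, a sketch of the dual-attention mechanism module using the classes above:

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Dual-attention module: parallel position and channel attention branches
    fused by two 3x3 convolutions, U = J(S) + K(T)."""

    def __init__(self, channels: int):
        super().__init__()
        self.position = PositionAttention(channels)  # sketched above
        self.channel = ChannelAttention()            # sketched above
        self.j = nn.Conv2d(channels, channels, 3, padding=1)
        self.k = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        s = self.position(a)           # position attention feature S
        t = self.channel(a)            # channel attention feature T
        return self.j(s) + self.k(t)   # fused feature U
```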
As shown in fig. 1(a) and 1(b), in a preferred embodiment, the generator comprises a linear layer, a first convolution block I, a second convolution block I, a dual-attention mechanism module, a third convolution block I, a first activation module, and a Tanh layer connected in sequence. The inputs of the generator are a noise vector z, which obeys a normal distribution, and a class Condition, which is embedded into each batch normalization layer of every convolution block I; the output of the generator is a fake image.
The discriminator comprises a first convolution block II, a dual-attention mechanism module, a second convolution block II, a third convolution block II, a fourth convolution block II, a second activation module, and a linear-transformation-and-label-embedding layer connected in sequence. The inputs of the discriminator are the fake image and a label y, and the output is the discrimination result.
Further, as shown in fig. 1(a), the linear layer performs a linear computation on the noise vector z, which is reshaped into a feature tensor of 4 × 4 × 16ch, where ch may be set to 64. The tensor then passes through the SEResBlock(G) blocks; the first activation module applies batch normalization (BN), ReLU activation, and a 3 × 3 convolution; and the Tanh layer activates the result, finally producing a fake RGB image with 3 channels. The dual-attention mechanism is placed in the middle-to-rear layers of the network features, at the same position as the self-attention mechanism in the BigGAN model.
As shown in fig. 1(b), the discriminator network consists of SEResBlock(D) blocks and a dual-attention mechanism, mirroring the structure of the generator; its function is to judge whether the input RGB image with label y is real. The second activation module applies ReLU activation and global sum pooling (Global Sum Pooling) to the final features, which are then fused with the embedded label y through a linear transformation to judge whether the input RGB image is real or fake.
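A simplified sketch of this generator pipeline using the blocks above; the noise dimension, the channel widths (16ch down to 2ch), and the use of plain rather than class-conditional batch normalization are assumptions for illustration, and with three upsampling blocks the output here is 32 × 32 rather than the 128 × 128 used in the experiments:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Generator sketch: linear layer reshaped to 4x4x16ch, SEResBlock(G)
    stages with the dual-attention module between the second and third,
    then BN -> ReLU -> 3x3 conv -> Tanh to an RGB image."""

    def __init__(self, z_dim: int = 128, ch: int = 64):
        super().__init__()
        self.ch = ch
        self.linear = nn.Linear(z_dim, 4 * 4 * 16 * ch)
        self.block1 = SEResBlockG(16 * ch, 8 * ch)
        self.block2 = SEResBlockG(8 * ch, 4 * ch)
        self.attn = DualAttention(4 * ch)   # middle-to-rear placement
        self.block3 = SEResBlockG(4 * ch, 2 * ch)
        self.out = nn.Sequential(
            nn.BatchNorm2d(2 * ch), nn.ReLU(inplace=True),
            nn.Conv2d(2 * ch, 3, 3, padding=1), nn.Tanh(),  # 3-channel RGB output
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        h = self.linear(z).view(-1, 16 * self.ch, 4, 4)  # reshape to 4x4x16ch
        h = self.block3(self.attn(self.block2(self.block1(h))))
        return self.out(h)
```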
Preferably, the loss functions of the generator and the discriminator use the hinge loss (Hinge Loss). The loss function of the discriminator is:

$$L_D = \mathbb{E}_{(x,y)\sim p_{data}}[\max(0,\,1 - D(x,y))] + \mathbb{E}_{z\sim p_z,\,y\sim p_{data}}[\max(0,\,1 + D(G(z),y))]$$

and the loss function of the generator is:

$$L_G = -\mathbb{E}_{z\sim p_z,\,y\sim p_{data}}[D(G(z),y)]$$

where x denotes an image and y its class label; p_data is the real data probability distribution and p_z the noise probability distribution; z denotes a one-dimensional noise vector; G(z) denotes the mapping of the generator and D(x, y) the mapping of the discriminator; the first expectation is taken over (x, y) drawn from p_data, and the second over z drawn from p_z and y drawn from p_data.
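These two losses translate directly into code; a sketch in PyTorch, where d_real and d_fake are the discriminator outputs on real and generated batches:

```python
import torch
import torch.nn.functional as F

def d_hinge_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """Discriminator hinge loss: E[max(0, 1 - D(x, y))] + E[max(0, 1 + D(G(z), y))]."""
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake: torch.Tensor) -> torch.Tensor:
    """Generator hinge loss: -E[D(G(z), y)]."""
    return -d_fake.mean()
```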
In a preferred embodiment, the learning rate of both the generator and the discriminator is set to 0.0002, with a learning-rate decay strategy using a decay rate of 0.9999. Spectral normalization is applied to adjust the network weights during training, making the training of the generative adversarial network more stable. The training batch size is 64, and the total number of iterations is 10000.
Furthermore, the invention adopts mini-batch stochastic gradient descent training: the discriminator loss function is trained first, and then the generator. Pseudo code for this optimization procedure is given in Table 1 (rendered as an image in the original specification), where ∇_{θ_d} denotes the gradient of the discriminator, obtained by differentiating the discriminator loss, θ_d denotes the network parameters of the discriminator model, ∇_{θ_g} denotes the gradient of the generator, obtained by differentiating the generator loss, and θ_g denotes the network parameters of the generator model.
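A minimal sketch of this training schedule follows. The Adam optimizer is an assumption (the specification states only the learning rate and decay); class conditioning is omitted for brevity, and spectral normalization would be applied by wrapping the convolution layers with torch.nn.utils.spectral_norm:

```python
import torch
from torch.optim import Adam

def train(G, D, loader, z_dim=128, iters=10000, device="cuda"):
    """Alternating updates: one discriminator step, then one generator step."""
    G, D = G.to(device), D.to(device)
    opt_d, opt_g = Adam(D.parameters(), lr=2e-4), Adam(G.parameters(), lr=2e-4)
    sched_d = torch.optim.lr_scheduler.ExponentialLR(opt_d, gamma=0.9999)
    sched_g = torch.optim.lr_scheduler.ExponentialLR(opt_g, gamma=0.9999)
    batches = iter(loader)
    for _ in range(iters):
        try:
            x, _ = next(batches)
        except StopIteration:          # restart the loader when exhausted
            batches = iter(loader)
            x, _ = next(batches)
        x = x.to(device)
        z = torch.randn(x.size(0), z_dim, device=device)  # z ~ N(0, I)
        opt_d.zero_grad()              # 1) train the discriminator loss first
        d_hinge_loss(D(x), D(G(z).detach())).backward()
        opt_d.step()
        sched_d.step()
        opt_g.zero_grad()              # 2) then train the generator
        g_hinge_loss(D(G(z))).backward()
        opt_g.step()
        sched_g.step()
```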
In particular, to quantitatively assess the effectiveness of the inventive dual-attention generative adversarial network, the Fréchet Inception Distance (FID) evaluation index may be used; the smaller the FID, the better the quality of the generated images. In the FID calculation, feature vectors are first extracted from the generated images and the real images with an Inception V3 network; the normal distribution obeyed by the real data and the normal distribution obeyed by the generated data are then estimated, and the distance between the two data distributions is computed.
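Assuming the Inception V3 feature matrices (one row per image) have already been extracted, the distance between the two fitted Gaussians can be sketched as:

```python
import numpy as np
from scipy import linalg

def fid(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to real and generated features."""
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)   # matrix square root of the product
    if np.iscomplexobj(covmean):
        covmean = covmean.real              # discard numerical imaginary parts
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))
```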
The invention uses the ImageNet public dataset to compare the performance of the existing BigGAN model and the proposed dual-attention generative adversarial network. Part of the ImageNet classes were extracted and resized to 128 × 128 resolution, and the FID index was used for evaluation; some generated results are shown in figs. 5(a) and 5(b). The evaluation shows that the BigGAN model obtains an FID of 17.43, while the proposed channel-enhanced dual-attention generative adversarial network (SEDA-GAN) obtains an FID of 14.89, an improvement of 14.57%.
Moreover, the structural improvement is evident. Fig. 5(a) shows samples randomly generated by the BigGAN model, and fig. 5(b) shows samples randomly generated by the SEDA-GAN model. In the BigGAN samples, the positions of the goldfish's organs are inaccurate and scenes with several goldfish are disordered, the rims of the coffee cups are not round, and the structures of the trucks and wooden houses have defects. In the SEDA-GAN samples, the structural distribution in the multi-goldfish scenes is more natural, the coffee cups are rounder, the overall structure of the trucks is more natural, and the wooden houses are built straighter, improving the overall visual effect.
In summary, in the channel-enhanced dual-attention generative adversarial network provided by the invention, on one hand a channel-association learning mechanism is added to the convolution-block structure: the features are recalibrated, the feature-learning capability of the model's convolution blocks is improved, and the feature-expression capability of the generative adversarial network is enhanced. A squeeze-and-excitation operation capable of acquiring channel attention is introduced into the convolution block ResBlock of the BigGAN model, yielding the new convolution block SEResBlock. Verification shows that a GAN built from the channel-enhanced SEResBlock has better generation performance and learns the data distribution faster. On the other hand, from the channel perspective, the invention adds to the attention part a channel attention mechanism that can establish dependencies between feature channels, constructs it in parallel with the self-attention mechanism into a dual-attention mechanism module, and jointly models the latent structural correlations of features across positions and channels. In practice, network features carry correlation information not only between positions but also across feature channels. After the dual-attention mechanism is introduced, the GAN can simultaneously capture long-range correlation information across positions and channels, so it learns more information about the structural distribution of the data, the target structures of generated images become more reasonable, and image generation quality benefits considerably. Compared with the prior art, the technical scheme of the invention increases the total number of network parameters only slightly after the channel enhancement and the dual-attention mechanism are used, while the generation performance improves: the generated data distribution is closer to the original data distribution, the visual effect is better, the quality of generated images is higher, and the structure of target objects is more regular and natural.
The invention also provides an image generation method, comprising the following steps:
S1, constructing a dual-attention generative adversarial network as described in any of the above;
S2, acquiring a training set and feeding it to the dual-attention generative adversarial network for training;
S3, generating images with the trained dual-attention generative adversarial network.
In particular, in some preferred embodiments of the present invention, there is also provided a computer device, including a memory and a processor, the memory storing a computer program, and the processor implementing the steps of the image generation method in any one of the above embodiments when executing the computer program.
In other preferred embodiments of the present invention, a computer-readable storage medium is further provided, on which a computer program is stored, which when executed by a processor implements the steps of the image generation method described in any of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the method according to the above embodiments may be implemented by a computer program, which may be stored in a non-volatile computer-readable storage medium, and the computer program may include the processes of the embodiments of the image generation method, and will not be described again here.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A channel-enhanced dual-attention generative adversarial network, characterized in that it comprises a generator and a discriminator; the generator comprises convolution block I and a dual-attention mechanism module; the discriminator comprises convolution block II and a dual-attention mechanism module;
both convolution block I and convolution block II contain a squeeze-and-excitation (SE) operation layer for acquiring channel attention through a squeeze-and-excitation operation;
the dual-attention mechanism module comprises a position attention unit and a channel attention unit arranged in parallel; the position attention unit establishes associations between positions based on a self-attention mechanism to obtain a position attention feature, and the channel attention unit establishes dependencies between feature channels based on a channel attention mechanism to obtain a channel attention feature; the dual-attention mechanism module fuses the position attention feature and the channel attention feature.
2. The dual-attention generative adversarial network of claim 1, wherein the SE operation layer is configured to perform the following operations:
average each layer of the feature M, compressing each layer into a single value to obtain a one-dimensional vector s:

$$s_n = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} m_n^{i,j}$$

where the feature M has C layers, each of size H × W; s_n denotes the n-th element of s; m_n denotes the n-th layer of M, with n = 1, 2, …, C; and m_n^{i,j} denotes the element of m_n at coordinate (i, j);
activate the vector s through a two-layer nonlinear fully connected network to learn a weight feature vector w holding the weight of each layer:

$$w = \sigma_2(W_2\,\sigma_1(W_1 s))$$

where W_1 and W_2 denote the first- and second-layer fully connected operations, and σ_1 and σ_2 are the ReLU and Sigmoid activation functions, respectively;
multiply the weight feature vector w into the corresponding layers of the feature M to obtain the recalibrated feature M̃:

$$\tilde{m}_n = w_n \cdot m_n$$

where w_n is the n-th element of w and m̃_n denotes the n-th layer of M̃.
3. The dual-attention generative adversarial network of claim 2, wherein the SE operation layer comprises an average pooling layer, a 1 × 1 convolution layer, a ReLU activation layer, a second 1 × 1 convolution layer, and a Sigmoid activation layer connected in sequence; its output is channel-multiplied with its input to obtain the recalibrated feature.
4. The dual-attention generative adversarial network of claim 3, wherein convolution block I comprises two linear layers, two batch normalization layers, two ReLU activation layers, two upsampling layers, two 3 × 3 convolution layers, one 1 × 1 convolution layer, and the SE operation layer; and convolution block II comprises two ReLU activation layers, two 3 × 3 convolution layers, one 1 × 1 convolution layer, two average pooling layers, and the SE operation layer.
5. The dual-attention generative adversarial network of claim 1, wherein the channel attention unit is configured to perform the following operations:
reshape the feature A to obtain a feature A′; A has C layers, each of size H × W, and A′ has size C × N with N = H × W;
multiply A′ by the transpose of A′ and apply softmax to obtain a feature map Q of size C × C, whose element q_{ji} is:

$$q_{ji} = \frac{\exp(a'_i \cdot a'^{\mathrm T}_j)}{\sum_{i=1}^{C}\exp(a'_i \cdot a'^{\mathrm T}_j)}$$

where i, j = 1, 2, …, C; a'_i is the i-th feature vector of A′ and a'^T_j is the j-th feature vector of the transpose of A′;
multiply the feature map Q by A′ and reshape back to obtain the channel attention feature T:

$$T_j = \beta \sum_{i=1}^{C} (q_{ji}\, a'_i) + a_j$$

where T_j denotes the j-th feature vector of T, j = 1, 2, …, C; β is a learnable parameter initialized to 0; and a_j is the j-th feature vector of A.
6. The dual-attention generative adversarial network of claim 5, wherein the position attention unit is configured to perform the following operations:
compress the channels of the feature A with a 1 × 1 convolution f(x) to obtain a feature B, and reshape it to obtain a feature B′; where A has C layers, each of size H × W, the compressed channel count is C′, B has dimensions C′ × H × W, and B′ has dimensions C′ × N with N = H × W;
compress the channels of A with a 1 × 1 convolution g(x) to obtain a feature O, and reshape it to obtain a feature O′; the compressed channel count is C′, O has dimensions C′ × H × W, and O′ has dimensions C′ × N;
multiply B′ by the transpose of O′ and apply softmax to obtain a feature map P of size N × N, whose element p_{ji} is:

$$p_{ji} = \frac{\exp(b'_i \cdot o'^{\mathrm T}_j)}{\sum_{i=1}^{N}\exp(b'_i \cdot o'^{\mathrm T}_j)}$$

where b'_i denotes the i-th feature vector of B′, o'^T_j denotes the j-th feature vector of the transpose of O′, and i, j = 1, 2, …, N;
extract the feature A with a 1 × 1 convolution h(x) to obtain a feature V, whose number of layers remains C, and reshape it to obtain a feature V′;
multiply V′ by the feature map P and reshape back to obtain the position attention feature S:

$$S_j = \alpha \sum_{i=1}^{N} (p_{ji}\, v'_i) + a_j$$

where S_j denotes the j-th feature vector of S, j = 1, 2, …, N; v'_i denotes the i-th feature vector of V′; α is a learnable parameter initialized to 0; and a_j is the j-th feature vector of A.
7. The dual-attention generative adversarial network of claim 6, wherein the dual-attention mechanism module fuses the position attention feature S and the channel attention feature T through a 3 × 3 convolution J(x) and a 3 × 3 convolution K(x):

U = J(S) + K(T)

where U denotes the resulting fused feature.
8. The dual-attention generative adversarial network of claim 1, wherein the generator comprises a linear layer, a first convolution block I, a second convolution block I, a dual-attention mechanism module, a third convolution block I, a first activation module, and a Tanh layer connected in sequence;
the inputs of the generator are a noise vector z, which obeys a normal distribution, and a class Condition, which is embedded into each batch normalization layer of every convolution block I; the output of the generator is a fake image;
the discriminator comprises a first convolution block II, a dual-attention mechanism module, a second convolution block II, a third convolution block II, a fourth convolution block II, a second activation module, and a linear-transformation-and-label-embedding layer connected in sequence;
the inputs of the discriminator are an RGB image and a label y, and the output is the discrimination result for the RGB image.
9. The dual-attention generative adversarial network of claim 1, wherein the loss function of the discriminator is:

$$L_D = \mathbb{E}_{(x,y)\sim p_{data}}[\max(0,\,1 - D(x,y))] + \mathbb{E}_{z\sim p_z,\,y\sim p_{data}}[\max(0,\,1 + D(G(z),y))]$$

and the loss function of the generator is:

$$L_G = -\mathbb{E}_{z\sim p_z,\,y\sim p_{data}}[D(G(z),y)]$$

where x denotes an image and y its class label; p_data is the real data probability distribution and p_z the noise probability distribution; z denotes a one-dimensional noise vector; G(z) denotes the mapping of the generator and D(x, y) the mapping of the discriminator; the first expectation is taken over (x, y) drawn from p_data, and the second over z drawn from p_z and y drawn from p_data.
10. An image generation method, comprising the following steps:
S1, constructing a dual-attention generative adversarial network as claimed in any one of claims 1 to 9;
S2, acquiring a training set and feeding it to the dual-attention generative adversarial network for training;
S3, generating images with the trained dual-attention generative adversarial network.
CN202011470128.6A 2020-12-14 2020-12-14 Channel-enhanced dual-attention generative adversarial network and image generation method Active CN112580782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011470128.6A CN112580782B (en) Channel-enhanced dual-attention generative adversarial network and image generation method

Publications (2)

Publication Number Publication Date
CN112580782A true CN112580782A (en) 2021-03-30
CN112580782B CN112580782B (en) 2024-02-09

Family

ID=75135850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011470128.6A Active CN112580782B (en) Channel-enhanced dual-attention generative adversarial network and image generation method

Country Status (1)

Country Link
CN (1) CN112580782B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020140633A1 (en) * 2019-01-04 2020-07-09 平安科技(深圳)有限公司 Text topic extraction method, apparatus, electronic device, and storage medium
CN111429433A (en) * 2020-03-25 2020-07-17 北京工业大学 Multi-exposure image fusion method based on attention generation countermeasure network
CN111476717A (en) * 2020-04-07 2020-07-31 西安电子科技大学 Face image super-resolution reconstruction method based on self-attention generation countermeasure network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
曹真; 杨云; 齐勇; 李程辉: "Image inpainting method based on multi-loss constraints and attention blocks" (基于多损失约束与注意力块的图像修复方法), Journal of Shaanxi University of Science & Technology, no. 03
黄宏宇; 谷子丰: "A text-to-image generative adversarial network based on a self-attention mechanism" (一种基于自注意力机制的文本图像生成对抗网络), Journal of Chongqing University, no. 03

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095330A (en) * 2021-04-30 2021-07-09 辽宁工程技术大学 Compressive attention model for semantically segmenting pixel groups
CN113223181A (en) * 2021-06-02 2021-08-06 广东工业大学 Weak texture object pose estimation method
CN113627590A (en) * 2021-07-29 2021-11-09 中汽创智科技有限公司 Attention module and attention mechanism of convolutional neural network and convolutional neural network
CN113344146A (en) * 2021-08-03 2021-09-03 武汉大学 Image classification method and system based on double attention mechanism and electronic equipment
CN113344146B (en) * 2021-08-03 2021-11-02 武汉大学 Image classification method and system based on double attention mechanism and electronic equipment
CN113935977A (en) * 2021-10-22 2022-01-14 河北工业大学 Solar cell panel defect generation method based on generation countermeasure network
CN113744265A (en) * 2021-11-02 2021-12-03 成都东方天呈智能科技有限公司 Anomaly detection system, method and storage medium based on generation countermeasure network
CN115937994A (en) * 2023-01-06 2023-04-07 南昌大学 Data detection method based on deep learning detection model
CN116385725A (en) * 2023-06-02 2023-07-04 杭州聚秀科技有限公司 Fundus image optic disk and optic cup segmentation method and device and electronic equipment
CN116385725B (en) * 2023-06-02 2023-09-08 杭州聚秀科技有限公司 Fundus image optic disk and optic cup segmentation method and device and electronic equipment
CN117011918A (en) * 2023-08-08 2023-11-07 南京工程学院 Method for constructing human face living body detection model based on linear attention mechanism
CN117011918B (en) * 2023-08-08 2024-03-26 南京工程学院 Method for constructing human face living body detection model based on linear attention mechanism
CN118506553A (en) * 2024-07-17 2024-08-16 西华大学 AIoT anomaly identification method, disaster early warning system and road safety system

Also Published As

Publication number Publication date
CN112580782B (en) 2024-02-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant