CN112580782A - Channel enhancement-based double-attention generation countermeasure network and image generation method - Google Patents
- Publication number
- CN112580782A (application CN202011470128.6A)
- Authority
- CN
- China
- Prior art keywords: attention, feature, layer, characteristic, channel
- Prior art date: 2020-12-14
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06N3/045 — Combinations of networks (computing arrangements based on biological models; neural networks; architecture)
- G06N3/08 — Learning methods (neural networks)
- G06T1/00 — General purpose image data processing
- Y02T10/40 — Engine management systems (climate change mitigation technologies related to transportation)
Abstract
The invention relates to a channel enhancement-based dual-attention generative adversarial network and an image generation method. The network comprises a generator and a discriminator: the generator comprises convolution block I and a dual-attention mechanism module, and the discriminator comprises convolution block II and a dual-attention mechanism module. Convolution block I and convolution block II are each provided with a compression-activation (squeeze-and-excitation) layer that acquires channel attention through a compression-activation operation. The dual-attention mechanism module comprises a position attention unit and a channel attention unit in parallel: the position attention unit establishes relevance among positions based on a self-attention mechanism to obtain a position attention feature, and the channel attention unit establishes dependency among feature channels based on a channel attention mechanism to obtain a channel attention feature; the dual-attention mechanism module then fuses the position attention feature and the channel attention feature. The invention improves the generation performance of the generative adversarial network: the generated data distribution is closer to the original data distribution, and the quality of the generated images is better.
Description
Technical Field
The invention relates to the technical field of image generation, and in particular to a channel enhancement-based dual-attention generative adversarial network and an image generation method.
Background
Generative adversarial network (GAN) techniques are mainly applied in the image generation direction. A GAN model consists of a generator and a discriminator: the generator produces images from input noise vectors or class labels, and the discriminator distinguishes real images from fake ones. The generator and the discriminator are trained adversarially, so that the generator learns the real data distribution and can finally produce fake images that approach real ones. GAN techniques are widely applied to image enhancement, infrared image generation, medical image generation, image super-resolution reconstruction, and other directions.
At present, the main methods for improving the quality of images generated by GAN models are modifying the network structure, changing the loss function, and establishing feature-space correlation. The self-attention generative adversarial network model uses a self-attention mechanism that captures long-range spatial correlation, making the structure of generated images more reasonable. Building on the self-attention generative adversarial network, the BigGAN model adjusts and deepens the network structure, improving the network's learning capability and further improving generation quality.
Although current GANs have strong generation capability, with little prior knowledge it is difficult to generate images of complex scenes with long-range relevance from only a noise vector z and a condition label y. Existing GAN models struggle to generate images with complex structural distributions. The self-attention mechanism enables a GAN to generate images with long-range correlations, but it still has shortcomings in generating the structure of target objects in an image. Images containing scenes with multiple target objects are likewise difficult to generate, and the quality of generated images needs to be improved.
Disclosure of Invention
The invention aims to overcome at least some of the above defects and provides a generative adversarial network model that can generate high-quality images of complex scenes, while making the structural distribution of targets in the generated images more reasonable and the images more natural.
In order to achieve the above object, the present invention provides a channel-enhanced dual-attention generative adversarial network, comprising:
a generator and a discriminator; the generator comprises convolution block I and a dual-attention mechanism module; the discriminator comprises convolution block II and a dual-attention mechanism module;
convolution block I and convolution block II are each provided with a compression-activation layer that acquires channel attention through a compression-activation operation;
the dual-attention mechanism module comprises a position attention unit and a channel attention unit in parallel; the position attention unit establishes relevance among positions based on a self-attention mechanism to obtain a position attention feature, and the channel attention unit establishes dependency among feature channels based on a channel attention mechanism to obtain a channel attention feature; the dual-attention mechanism module fuses the position attention feature and the channel attention feature.
Preferably, the compression-activation layer is configured to perform the following operations:
averaging each layer of the feature M and compressing it to a single value to obtain a one-dimensional vector s, with the expression:

s_n = (1/(H×W)) · Σ_{i=1..H} Σ_{j=1..W} m^n_{i,j}

where the feature M has C layers, each of size H×W; s_n denotes the n-th element of the one-dimensional vector s; m_n denotes the n-th layer of the feature M, n = 1, 2, …, C; and m^n_{i,j} denotes the element of m_n at coordinate (i, j);
activating the one-dimensional vector s through two-layer nonlinear fully connected network learning to obtain a weight feature vector w that holds the weight of each layer, with the expression:
w = σ_2(W_2 · σ_1(W_1 · s))
where W_1 and W_2 denote the first- and second-layer fully connected operations, respectively, and σ_1 and σ_2 denote the ReLU and Sigmoid activation functions, respectively;
multiplying the weight feature vector w into the corresponding layer of the feature M to obtain the calibrated feature M̃, with the expression:

m̃_n = w_n · m_n

where w_n is the n-th element of the weight feature vector w, and m̃_n denotes the n-th layer of the feature M̃.
Preferably, the compression-activation layer comprises an average pooling layer, a 1×1 convolution layer, a ReLU activation layer, a second 1×1 convolution layer, and a Sigmoid activation layer connected in sequence; its output is channel-wise multiplied with its input to obtain the calibrated feature.
Preferably, convolution block I comprises two linear layers, two batch normalization layers, two ReLU activation layers, two upsampling layers, two 3×3 convolution layers, a 1×1 convolution layer, and the compression-activation layer;
convolution block II comprises two ReLU activation layers, two 3×3 convolution layers, a 1×1 convolution layer, two average pooling layers, and the compression-activation layer.
Preferably, the channel attention unit is configured to perform the following operations:
reshaping (recombining) the feature A to obtain a feature A′; the feature A has C layers, each of size H×W, and the feature A′ has size C×N, where N = H×W;
applying softmax to the product of the feature A′ and the transpose of the feature A′ to obtain a feature map Q of size C×C, whose element q_{ji} has the expression:

q_{ji} = exp(a′_i · a′_j) / Σ_{k=1..C} exp(a′_k · a′_j)

where i, j = 1, 2, …, C; a′_i is the i-th feature vector of the feature A′, and a′ᵀ_j is the j-th feature vector of the transpose of the feature A′;
multiplying the feature map Q with the feature A′ and reshaping back (inverse recombination) to obtain the channel attention feature T, with the expression:

T_j = β · Σ_{i=1..C} (q_{ji} · a′_i) + a_j

where T_j denotes the j-th feature vector of the channel attention feature T, j = 1, 2, …, C; β denotes a learnable parameter initialized to 0; and a_j denotes the j-th feature vector of the feature A.
Preferably, the position attention unit is configured to perform the following operations:
performing channel compression on the feature A with a 1×1 convolution f(x) to obtain a feature B, and reshaping to obtain a feature B′; the feature A has C layers, each of size H×W; after compression the number of layers is C′ (C′ < C), the feature B has dimensions C′×H×W, and after reshaping B′ has dimensions C′×N, where N = H×W;
performing channel compression on the feature A with a 1×1 convolution g(x) to obtain a feature O, and reshaping to obtain a feature O′; after compression the number of layers is C′, the feature O has dimensions C′×H×W, and after reshaping O′ has dimensions C′×N;
Multiplying the feature B 'by the transpose of the feature O' to obtain softmax, and obtaining a feature map P, wherein the size of the feature map P is NxN, and an element P in the feature map PjiThe expression is as follows:
wherein, b'iThe ith feature vector representing feature B',a transposed jth feature vector representing feature O', i, j ═ 1,2, …, N;
extracting the feature A with a 1×1 convolution h(x) to obtain a feature V, and reshaping to obtain a feature V′; the number of layers after extraction is still C;
multiplying the feature V′ with the feature map P and reshaping back to obtain the position attention feature S, with the expression:

S_j = α · Σ_{i=1..N} (p_{ji} · v′_i) + a_j

where S_j denotes the j-th feature vector of the position attention feature S, j = 1, 2, …, N; v′_i denotes the i-th feature vector of the feature V′; α denotes a learnable parameter initialized to 0; and a_j denotes the j-th feature vector of the feature A.
Preferably, the dual-attention mechanism module fuses the position attention feature S and the channel attention feature T through a 3×3 convolution J(x) and a 3×3 convolution K(x), with the expression:
U=J(S)+K(T)
where U represents the resulting fusion feature.
Preferably, the generator comprises a linear layer, a first convolution block I, a second convolution block I, a dual-attention mechanism module, a third convolution block I, a first activation module, and a Tanh layer connected in sequence;
the input of the generator is a noise vector z and a class Condition; the noise vector z obeys a normal distribution, the class Condition is embedded into each batch normalization layer of each convolution block I, and the output of the generator is a forged image;
the discriminator comprises a first convolution block II, a dual-attention mechanism module, a second convolution block II, a third convolution block II, a fourth convolution block II, a second activation module, and a linear-transformation-and-label-embedding layer connected in sequence;
the input of the discriminator is an RGB image and a label y, and the output is a discrimination result for the RGB image.
Preferably, the loss function of the discriminator is:

L_D = − E_{(x,y)∼p_data}[min(0, −1 + D(x, y))] − E_{z∼p_z, y∼p_data}[min(0, −1 − D(G(z), y))]

and the loss function of the generator is:

L_G = − E_{z∼p_z, y∼p_data}[D(G(z), y)]

where x denotes an image and y its corresponding class label; p_data is the real-data probability distribution and p_z the noise probability distribution; z denotes a one-dimensional noise vector; G(z) denotes the mapping process of the generator and D(x, y) the mapping process of the discriminator; E_{(x,y)∼p_data} denotes the expectation over (x, y) obeying p_data, and E_{z∼p_z, y∼p_data} denotes the expectation over z obeying p_z and y obeying p_data.
The invention also provides an image generation method, comprising the following steps:
S1, configuring the dual-attention generative adversarial network described in any one of the above;
S2, acquiring a training set and inputting it into the dual-attention generative adversarial network for training;
S3, generating an image with the trained dual-attention generative adversarial network.
The technical scheme of the invention has the following advantages. The invention provides a channel-enhanced dual-attention generative adversarial network and an image generation method. The proposed network improves on the existing BigGAN model: a compression-activation layer that acquires channel attention through a compression-activation operation is added to the convolution blocks of the generative adversarial network, so that the features can be recalibrated; that is, feature layers with stronger effect are enhanced while useless feature layers are weakened, giving the intermediate-layer features of the network stronger expressive capability, improving the performance of the convolution blocks, and improving the feature learning capability of the model. Moreover, the network adopts a dual-attention mechanism module that comprises not only a position attention unit (with the same function as a self-attention module) but also a channel attention unit. With the dual-attention mechanism, the generative adversarial network can simultaneously capture long-range correlation information across positions and channels, learn the correlations among image feature structures more comprehensively, further strengthen the relevance of image features, and improve the quality of the generated images.
Drawings
Fig. 1(a) and 1(b) are schematic diagrams of the channel enhancement-based dual-attention generative adversarial network structure in an embodiment of the present invention, where Fig. 1(a) shows the generator structure and Fig. 1(b) shows the discriminator structure;
FIG. 2 is a schematic flow chart of the compression-activation operation in an embodiment of the present invention;
FIG. 3(a) is a schematic diagram of the structure of convolution block I, and FIG. 3(b) of convolution block II, in an embodiment of the present invention;
FIG. 4 is a schematic flow chart of the dual-attention mechanism in an embodiment of the present invention;
FIG. 5(a) shows images generated by the BigGAN model; FIG. 5(b) shows images generated by the channel enhancement-based dual-attention generative adversarial network in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As shown in Figs. 1(a) to 4, an embodiment of the present invention provides a channel enhancement-based dual-attention generative adversarial network that improves on the existing BigGAN model. It comprises a generator and a discriminator: the generator comprises convolution block I and a dual-attention mechanism module; the discriminator comprises convolution block II and a dual-attention mechanism module, where convolution block I and convolution block II are both convolution blocks of the generative adversarial network.
Convolution block I and convolution block II are each provided with a compression-activation layer that acquires channel attention through a compression-activation operation. The invention introduces a compression-activation mechanism to improve the residual-structure convolution block ResBlock of the adversarial network: the compression-activation layer enhances the feature layers with stronger effect and weakens useless feature layers, so that the intermediate-layer features of the network have stronger expressive and feature-extraction capability.
The dual-attention mechanism module comprises a position attention unit and a channel attention unit in parallel. The position attention unit establishes relevance among positions based on a self-attention mechanism to obtain a position attention feature, and the channel attention unit establishes dependency among feature channels based on a channel attention mechanism to obtain a channel attention feature; the dual-attention mechanism module fuses the position attention feature S and the channel attention feature T into a feature U rich in correlation information. The introduction of the dual-attention mechanism lets the generative adversarial network acquire long-range correlation information between positions and channels, making the structural distribution of target objects in the generated images more natural.
Preferably, as shown in Fig. 2, in the dual-attention generative adversarial network the compression-activation layer is configured to perform a Squeeze-and-Excitation (compression-activation) operation that acquires channel attention, comprising the following steps:
averaging each layer of the feature M input to the compression-activation layer and compressing it to a single value to obtain a one-dimensional vector s, with the expression:

s_n = (1/(H×W)) · Σ_{i=1..H} Σ_{j=1..W} m^n_{i,j}

where the feature M has C layers (i.e. the number of channels is C), each of size H×W, and the obtained one-dimensional vector s has length C; s_n denotes the n-th element of the one-dimensional vector s; m_n denotes the n-th layer of the feature M, n = 1, 2, …, C; and m^n_{i,j} denotes the element of m_n at coordinate (i, j);
activating the one-dimensional vector s through two-layer nonlinear fully connected network learning to obtain a weight feature vector w that holds the weight of each layer, with the expression:

w = σ_2(W_2 · σ_1(W_1 · s))

where W_1 and W_2 denote the first- and second-layer fully connected operations, and σ_1 and σ_2 denote the ReLU and Sigmoid activation functions, respectively; the Sigmoid limits the values of w to the range (0, 1), and the activation functions provide the nonlinearity learned by the neural network;
multiplying the weight feature vector w into the corresponding layer of the feature M to obtain the calibrated feature M̃, with the expression:

m̃_n = w_n · m_n

where w_n is the n-th element of the weight feature vector w, and m̃_n denotes the n-th layer of the feature M̃.
Preferably, W_1 and W_2 can be implemented with 1×1 convolutions. The resulting compression-activation layer (SELayer) comprises an Average Pooling layer, a 1×1 convolution layer (1×1 Conv), a ReLU activation layer (ReLU), a second 1×1 convolution layer, and a Sigmoid activation layer (Sigmoid) connected in sequence; the output and the input of the compression-activation layer are channel-wise multiplied to obtain the calibrated feature.
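For illustration only (this is a sketch, not the patented implementation), the SELayer just described can be written in PyTorch as follows; the reduction ratio of the intermediate channel width is an assumed hyperparameter, since the text does not specify it:

```python
import torch
import torch.nn as nn

class SELayer(nn.Module):
    """Compression-activation (Squeeze-and-Excitation) layer sketch:
    average-pool each channel to one value, learn per-channel weights w
    through two 1x1 convolutions with ReLU/Sigmoid, and channel-wise
    multiply w back into the input feature M."""
    def __init__(self, channels: int, reduction: int = 16):  # reduction is assumed
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: C x H x W -> C x 1 x 1
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),  # W1
            nn.ReLU(inplace=True),                                      # sigma1
            nn.Conv2d(channels // reduction, channels, kernel_size=1),  # W2
            nn.Sigmoid(),                                               # sigma2: w in (0, 1)
        )

    def forward(self, m: torch.Tensor) -> torch.Tensor:
        s = self.pool(m)   # one-dimensional vector s (one value per channel)
        w = self.fc(s)     # weight feature vector w
        return m * w       # calibrated feature: channel multiplication
```

For example, `SELayer(256)(torch.randn(1, 256, 8, 8))` returns a recalibrated tensor of the same shape.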
In the invention, the compression-activation layer SELayer is embedded into the convolution block ResBlock structure of the BigGAN model to obtain SEResBlock.
Fig. 3(a) is a schematic diagram of the SEResBlock structure in the generator, i.e. SEResBlock(G) in Fig. 1(a), i.e. convolution block I. As shown in Fig. 3(a), in a preferred embodiment convolution block I comprises two Linear layers, two batch normalization layers (BatchNorm), two ReLU activation layers, two upsampling layers (Upsample), two 3×3 convolution layers (3×3 Conv), a 1×1 convolution layer (1×1 Conv), and the compression-activation layer. The class Condition input to the generator is fed into the two batch normalization layers through the two linear layers. The first batch normalization layer, the first ReLU activation layer, the second upsampling layer, the first 3×3 convolution layer, the second batch normalization layer, the second ReLU activation layer, and the second 3×3 convolution layer are connected in sequence, the input of the first batch normalization layer being the output of the previous module. The second 3×3 convolution layer is connected to the compression-activation layer, and the output of the second 3×3 convolution layer is channel-wise multiplied with the output of the compression-activation layer to obtain the calibrated feature. The output of the previous module passes through the first upsampling layer and the 1×1 convolution layer in sequence and is then summed element-wise with the calibrated feature as the output of the current module (i.e. convolution block I).
Fig. 3(b) is a schematic diagram of the SEResBlock structure in the discriminator, i.e. SEResBlock(D) in Fig. 1(b), i.e. convolution block II, which lacks the batch normalization layers of convolution block I. As shown in Fig. 3(b), convolution block II comprises two ReLU activation layers (ReLU), two 3×3 convolution layers (3×3 Conv), a 1×1 convolution layer (1×1 Conv), two Average Pooling layers, and the compression-activation layer. The first ReLU activation layer, the first 3×3 convolution layer, the second ReLU activation layer, the second 3×3 convolution layer, and the second average pooling layer are connected in sequence, their input being the output of the previous module (i.e. the input of convolution block II). The second average pooling layer is connected to the compression-activation layer, and the output of the second average pooling layer is channel-wise multiplied with the output of the compression-activation layer to obtain the calibrated feature. The output of the previous module passes through the 1×1 convolution layer and the first average pooling layer in sequence and is then summed element-wise with the calibrated feature as the output of the current module (i.e. convolution block II).
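A hedged sketch of the generator-side block wiring just described, reusing the SELayer above. The class-conditional batch normalization of the real block is simplified to plain BatchNorm2d here, so the two condition-embedding linear layers are omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEResBlockG(nn.Module):
    """Convolution block I sketch: BN-ReLU-Upsample-3x3Conv, then BN-ReLU-3x3Conv,
    SE recalibration after the second 3x3 convolution, and an
    Upsample + 1x1 convolution shortcut summed element-wise."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_ch)    # conditional BN in the patent
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.se = SELayer(out_ch)
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = F.interpolate(F.relu(self.bn1(x)), scale_factor=2)  # second upsampling layer
        h = self.conv1(h)
        h = self.conv2(F.relu(self.bn2(h)))
        h = self.se(h)                                          # calibrated feature
        x = self.shortcut(F.interpolate(x, scale_factor=2))     # first upsampling + 1x1 conv
        return x + h                                            # element-wise sum
```

The discriminator block II would analogously drop the batch normalization and replace the upsampling layers with average pooling.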
In convolution block I and convolution block II, applying the compression-activation layer after the second 3×3 convolution layer recalibrates the features learned by the whole convolution block, improving performance without excessively increasing network complexity.
Preferably, as shown in Fig. 4, in the dual-attention mechanism module of the dual-attention generative adversarial network, the channel attention unit (part I in Fig. 4) is configured to perform the following operations:
reshaping (Reshape) the feature A input to the dual-attention mechanism module to obtain a feature A′; the feature A has C layers, each of size H×W, and the feature A′ has size C×N, where N = H×W;
applying softmax to the product of the feature A′ and the transpose of the feature A′ to obtain a feature map Q of size C×C, whose element q_{ji} has the expression:

q_{ji} = exp(a′_i · a′_j) / Σ_{k=1..C} exp(a′_k · a′_j)

where i, j = 1, 2, …, C; a′_i is the i-th feature vector of the feature A′, a′ᵀ_j is the j-th feature vector of the transpose of the feature A′, and the superscript "T" denotes the transpose;
multiplying the feature map Q with the feature A′ and reshaping back (inverse Reshape) to obtain the channel attention feature T, with the expression:

T_j = β · Σ_{i=1..C} (q_{ji} · a′_i) + a_j

where T_j denotes the j-th feature vector of the channel attention feature T; β denotes a learnable parameter initialized to 0; and a_j denotes the j-th feature vector of the feature A, j = 1, 2, …, C.
Further, as shown in Fig. 4, in the dual-attention mechanism module the position attention unit (part II in Fig. 4) is configured to perform the following operations:
performing channel compression on the feature A with a 1×1 convolution f(x) to obtain a feature B, and reshaping (Reshape) to obtain a feature B′; the feature A has C layers, each of size H×W; after compression the number of layers, i.e. the number of channels, is C′ (C′ < C), the feature B has dimensions C′×H×W, and after reshaping B′ has dimensions C′×N, where N = H×W;
performing channel compression on the feature A with a 1×1 convolution g(x) to obtain a feature O, and reshaping (Reshape) to obtain a feature O′; after compression the number of channels is C′, the feature O has dimensions C′×H×W, and after reshaping O′ has dimensions C′×N;
Multiplying the feature B 'by the transpose of the feature O' to obtain softmax, and obtaining a feature map P, wherein the size of the feature map P is NxN, and an element P in the feature map PjiThe expression is as follows:
wherein, b'iThe ith feature vector representing feature B',a transposed jth feature vector representing feature O', i, j ═ 1,2, …, N;
extracting the feature A with a 1×1 convolution h(x) to obtain a feature V, whose number of layers is still C, and reshaping (Reshape) to obtain a feature V′ = [v′_1, v′_2, …, v′_N];
multiplying the feature V′ with the feature map P and reshaping back (inverse Reshape) to obtain the position attention feature S, with the expression:

S_j = α · Σ_{i=1..N} (p_{ji} · v′_i) + a_j

where S_j denotes the j-th feature vector of the position attention feature S, j = 1, 2, …, N; v′_i denotes the i-th feature vector of the feature V′, i = 1, 2, …, N; α denotes a learnable parameter initialized to 0; and a_j denotes the j-th feature vector of the feature A.
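A matching sketch of the position attention unit; the compression ratio C/8 for f and g is an assumption, since the compressed channel count C′ is not legible in the text:

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    """Position attention unit sketch: B' = reshape(f(A)), O' = reshape(g(A)),
    V' = reshape(h(A)); P = softmax over positions of B'^T O';
    S = alpha * (V' P) + A, with a learnable alpha initialized to 0."""
    def __init__(self, channels: int, reduction: int = 8):  # C' = C/8 assumed
        super().__init__()
        self.f = nn.Conv2d(channels, channels // reduction, 1)
        self.g = nn.Conv2d(channels, channels // reduction, 1)
        self.h = nn.Conv2d(channels, channels, 1)
        self.alpha = nn.Parameter(torch.zeros(1))  # learning parameter, init 0

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        bsz, c, hgt, wid = a.shape
        n = hgt * wid
        b_ = self.f(a).view(bsz, -1, n)              # B': C' x N
        o_ = self.g(a).view(bsz, -1, n)              # O': C' x N
        v_ = self.h(a).view(bsz, c, n)               # V': C  x N
        energy = torch.bmm(b_.transpose(1, 2), o_)   # [i, j] = b'_i . o'_j
        p = torch.softmax(energy, dim=1)             # p_ji: normalize over i
        s = torch.bmm(v_, p).view(bsz, c, hgt, wid)  # S_j = sum_i p_ji v'_i
        return self.alpha * s + a                    # position attention feature S
```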
Further, the dual-attention mechanism module fuses the position attention feature S and the channel attention feature T through a 3×3 convolution J(x) and a 3×3 convolution K(x), with the expression:
U=J(S)+K(T)
where U represents the fusion feature obtained by the dual-attention mechanism module; U is rich in correlation information.
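Combining the two units with the fusion convolutions J and K gives a minimal dual-attention module sketch:

```python
import torch.nn as nn

class DualAttention(nn.Module):
    """Dual-attention mechanism module sketch: parallel position and channel
    attention on the same input A, fused as U = J(S) + K(T) with 3x3 convs."""
    def __init__(self, channels: int):
        super().__init__()
        self.position = PositionAttention(channels)
        self.channel = ChannelAttention()
        self.J = nn.Conv2d(channels, channels, 3, padding=1)
        self.K = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, a):
        return self.J(self.position(a)) + self.K(self.channel(a))  # U = J(S) + K(T)
```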
As shown in Figs. 1(a) and 1(b), in a preferred embodiment the generator comprises a linear layer, a first convolution block I, a second convolution block I, a dual-attention mechanism module, a third convolution block I, a first activation module, and a Tanh layer connected in sequence. The input of the generator is a noise vector z and a class Condition; the noise vector z obeys a normal distribution, the class Condition is embedded into each batch normalization layer of each convolution block I, and the output of the generator is a forged image.
The discriminator comprises a first convolution block II, a dual-attention mechanism module, a second convolution block II, a third convolution block II, a fourth convolution block II, a second activation module, and a linear-transformation-and-label-embedding layer connected in sequence. The input of the discriminator is the forged image and the label y, and the output is the discrimination result.
Further, as shown in Fig. 1(a), the linear layer applies a linear mapping to the noise vector z and reshapes the result into a 4×4×16ch feature tensor, where ch may be set to 64. The features are then learned through the stacked SEResBlock(G) layers; the first activation module applies Batch Normalization (BN), ReLU activation, and a 3×3 convolution; and the Tanh layer finally produces a forged image with 3 channels, i.e. an RGB image. The dual-attention mechanism is placed in the middle-to-rear feature layers of the network, at the same position as the self-attention mechanism in the BigGAN model.
As shown in Fig. 1(b), the discriminator network consists of SEResBlock(D) blocks and a dual-attention mechanism, mirroring the structure of the generator; its function is to discriminate whether the input RGB image with label y is real. The final features are processed by the second activation module with ReLU activation and Global Sum Pooling, then combined with the embedded label y through a linear transformation to judge whether the input RGB image is real or fake.
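Putting the pieces together, here is a top-level sketch of the generator wiring in Fig. 1(a), reusing the modules above; the noise dimension and the per-block channel widths are assumptions, and the class-conditioning path is omitted as before:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Generator sketch per Fig. 1(a): Linear -> reshape to 4x4x16ch ->
    two SEResBlockG -> DualAttention -> SEResBlockG -> BN/ReLU/3x3Conv -> Tanh."""
    def __init__(self, z_dim: int = 120, ch: int = 64):  # z_dim assumed; ch = 64 per the text
        super().__init__()
        self.linear = nn.Linear(z_dim, 4 * 4 * 16 * ch)
        self.block1 = SEResBlockG(16 * ch, 8 * ch)  # per-block widths are assumptions
        self.block2 = SEResBlockG(8 * ch, 4 * ch)
        self.attn = DualAttention(4 * ch)
        self.block3 = SEResBlockG(4 * ch, 2 * ch)
        self.head = nn.Sequential(
            nn.BatchNorm2d(2 * ch), nn.ReLU(inplace=True),
            nn.Conv2d(2 * ch, 3, 3, padding=1), nn.Tanh(),  # forged RGB image
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        h = self.linear(z).view(z.size(0), -1, 4, 4)  # 4 x 4 x 16ch feature tensor
        h = self.block3(self.attn(self.block2(self.block1(h))))
        return self.head(h)
```

The discriminator would mirror this with SEResBlock(D) blocks, the dual-attention module, global sum pooling, and label embedding.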
Preferably, the loss functions of the generator and the discriminator use the Hinge Loss. The loss function of the discriminator is:

L_D = − E_{(x,y)∼p_data}[min(0, −1 + D(x, y))] − E_{z∼p_z, y∼p_data}[min(0, −1 − D(G(z), y))]

The loss function of the generator is:

L_G = − E_{z∼p_z, y∼p_data}[D(G(z), y)]

where x denotes an image and y its corresponding class label; p_data is the real-data probability distribution and p_z the noise probability distribution; z denotes a one-dimensional noise vector; G(z) denotes the mapping process of the generator and D(x, y) the mapping process of the discriminator; E_{(x,y)∼p_data} denotes the expectation over (x, y) obeying p_data, and E_{z∼p_z, y∼p_data} denotes the expectation over z obeying p_z and y obeying p_data.
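These hinge losses can be written compactly using the identity −min(0, −1 + t) = max(0, 1 − t); in this sketch, `d_real = D(x, y)` and `d_fake = D(G(z), y)` are the discriminator scores:

```python
import torch
import torch.nn.functional as F

def d_hinge_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """Discriminator hinge loss: push real scores above +1, fake scores below -1."""
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake: torch.Tensor) -> torch.Tensor:
    """Generator hinge loss: raise the discriminator score on generated images."""
    return -d_fake.mean()
```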
In a preferred embodiment, the learning rate of the generator and the discriminator is set to 0.0002 with a learning-rate decay strategy at a decay rate of 0.9999. Spectral normalization is adopted to adjust the weights during network training, making the training of the generative adversarial network more stable. The training batch size is 64, and the total number of iterations is 10000.
Furthermore, the invention adopts mini-batch stochastic gradient descent training: the loss function of the discriminator is trained first, and then the generator. The pseudo code of the specific optimization process is shown in Table 1:
TABLE 1. Optimization procedure of the generative adversarial network
where ∇θ_d denotes the gradient of the discriminator, obtained by differentiating the discriminator loss, and θ_d denotes the network parameters of the discriminator model; ∇θ_g denotes the gradient of the generator, obtained by differentiating the generator loss, and θ_g denotes the network parameters of the generator model.
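Since Table 1 itself is not legible here, the following is a hypothetical sketch of the alternating update it describes (discriminator step first, then generator); the Adam optimizer is an assumption, using the stated learning rate of 0.0002, and `G`, `D`, `loader`, and `z_dim` are assumed to exist per the sketches above:

```python
import torch

g_opt = torch.optim.Adam(G.parameters(), lr=2e-4)  # optimizer choice assumed
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4)

for x, y in loader:                    # mini-batch of real images and labels
    z = torch.randn(x.size(0), z_dim)  # noise vector z ~ N(0, 1)
    fake = G(z)                        # class conditioning omitted in this sketch
    # discriminator step: update theta_d from the discriminator loss gradient
    d_opt.zero_grad()
    d_loss = d_hinge_loss(D(x, y), D(fake.detach(), y))
    d_loss.backward()
    d_opt.step()
    # generator step: update theta_g from the generator loss gradient
    g_opt.zero_grad()
    g_loss = g_hinge_loss(D(G(z), y))
    g_loss.backward()
    g_opt.step()
```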
In particular, to quantitatively assess the effectiveness of the dual-attention generative adversarial network of the invention, the Fréchet Inception Distance (FID) evaluation index may be used. When evaluating images with the FID index, the smaller the index, the better the quality of the generated images. In the FID calculation, feature vectors are first extracted from the generated images and the real images with an Inception V3 network; the normal distribution obeyed by the real data and the normal distribution obeyed by the generated data are then fitted respectively; and finally the distance between the two data distributions is computed.
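Writing μ_r, Σ_r for the mean and covariance fitted to the real-image Inception features and μ_g, Σ_g for those of the generated images, the distance computed in this last step is the standard Fréchet distance between two Gaussians:

```latex
\mathrm{FID} = \left\lVert \mu_r - \mu_g \right\rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```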
The invention uses the ImageNet public dataset to compare the performance of the existing BigGAN model and the proposed dual-attention generative adversarial network: part of the ImageNet classes are extracted and resized to 128×128 resolution, and the FID index is used for evaluation; some generated-image results are shown in Figs. 5(a) and 5(b). The evaluation shows that the BigGAN model obtains an FID of 17.43, while the channel enhancement-based dual-attention generative adversarial network (SEDA-GAN) of the invention obtains an FID of 14.89, an improvement of 14.57%.
Moreover, the structural improvement brought by the invention is evident. Fig. 5(a) shows samples randomly generated by the BigGAN model, and Fig. 5(b) shows samples randomly generated by the SEDA-GAN model. In the BigGAN samples, the positions of the goldfish's organ structures are inaccurate, scenes with several goldfish are disordered, the rims of the coffee cups are not round, and the structures of the trucks and wooden houses show defects. In the SEDA-GAN samples, the structural distribution in the multi-goldfish scene is more natural, the coffee cups are rounder, the overall structure of the trucks is more natural, the building structure of the wooden houses is straighter, and the overall visual effect is improved.
In summary, in the channel-enhanced dual-attention generative adversarial network provided by the invention, on one hand a channel-association learning mechanism is added to the convolution block structure: the features are recalibrated, the feature learning capability of the model's convolution blocks is improved, and the feature expression capability of the generative adversarial network is enhanced. A Squeeze-and-Excitation (compression-activation) operation that acquires channel attention is introduced into the convolution block ResBlock structure of the BigGAN model, yielding the new convolution block SEResBlock. Verification shows that a generative adversarial network built from the channel-enhanced SEResBlock convolution blocks has better generation performance and learns the data distribution faster. On the other hand, from the channel perspective the invention adds to the attention part a channel attention mechanism that establishes dependencies between feature channels, constructs it in parallel with the self-attention mechanism into a dual-attention mechanism module, and jointly models the latent structural relevance of features across positions and channels. In practice, network features carry correlation information not only between positions but also across feature channels. After the dual-attention mechanism is introduced, the generative adversarial network can therefore capture long-range correlation information over positions and channels simultaneously, learn more information about the data structure distribution, and make the target structures of the generated images more reasonable, which is very helpful for improving image generation quality. Compared with the prior art, the technical scheme of the invention only slightly increases the total parameter count of the network after the channel enhancement and the dual-attention mechanism are applied, while improving the generation performance: the generated data distribution is closer to the original data distribution, the visual effect is better, the quality of the generated images is higher, and the structure of the target objects is more regular and natural.
The invention also provides an image generation method, comprising the following steps:
S1, configuring the dual-attention generative adversarial network described in any one of the above;
S2, acquiring a training set and inputting it into the dual-attention generative adversarial network for training;
S3, generating an image with the trained dual-attention generative adversarial network.
In particular, in some preferred embodiments of the present invention, there is also provided a computer device, including a memory and a processor, the memory storing a computer program, and the processor implementing the steps of the image generation method in any one of the above embodiments when executing the computer program.
In other preferred embodiments of the present invention, a computer-readable storage medium is further provided, on which a computer program is stored, which when executed by a processor implements the steps of the image generation method described in any of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the method according to the above embodiments may be implemented by a computer program, which may be stored in a non-volatile computer-readable storage medium, and the computer program may include the processes of the embodiments of the image generation method, and will not be described again here.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A channel-enhanced dual-attention generative adversarial network, characterized in that it comprises a generator and a discriminator; the generator comprises convolution block I and a dual-attention mechanism module; the discriminator comprises convolution block II and a dual-attention mechanism module;
convolution block I and convolution block II are each provided with a compression-activation layer that acquires channel attention through a compression-activation operation;
the dual-attention mechanism module comprises a position attention unit and a channel attention unit in parallel; the position attention unit establishes relevance among positions based on a self-attention mechanism to obtain a position attention feature, and the channel attention unit establishes dependency among feature channels based on a channel attention mechanism to obtain a channel attention feature; the dual-attention mechanism module fuses the position attention feature and the channel attention feature.
2. The dual-attention generative adversarial network of claim 1, wherein
the compression-activation layer is configured to perform the following operations:
averaging each layer of the feature M and compressing it to a single value to obtain a one-dimensional vector s, with the expression:

s_n = (1/(H×W)) · Σ_{i=1..H} Σ_{j=1..W} m^n_{i,j}

where the feature M has C layers, each of size H×W; s_n denotes the n-th element of the one-dimensional vector s; m_n denotes the n-th layer of the feature M, n = 1, 2, …, C; and m^n_{i,j} denotes the element of m_n at coordinate (i, j);
activating the one-dimensional vector s through two-layer nonlinear fully connected network learning to obtain a weight feature vector w that holds the weight of each layer, with the expression:

w = σ_2(W_2 · σ_1(W_1 · s))

where W_1 and W_2 denote the first- and second-layer fully connected operations, respectively, and σ_1 and σ_2 denote the ReLU and Sigmoid activation functions, respectively;
multiplying the weight feature vector w into the corresponding layer of the feature M to obtain the calibrated feature M̃, with the expression:

m̃_n = w_n · m_n

where w_n is the n-th element of the weight feature vector w, and m̃_n denotes the n-th layer of the feature M̃.
3. The dual-attention generative adversarial network of claim 2, wherein
the compression-activation layer comprises an average pooling layer, a 1×1 convolution layer, a ReLU activation layer, a second 1×1 convolution layer, and a Sigmoid activation layer connected in sequence, the output being channel-wise multiplied with the input to obtain the calibrated feature.
4. The dual-attention generative adversarial network of claim 3, wherein
convolution block I comprises two linear layers, two batch normalization layers, two ReLU activation layers, two upsampling layers, two 3×3 convolution layers, a 1×1 convolution layer, and the compression-activation layer;
convolution block II comprises two ReLU activation layers, two 3×3 convolution layers, a 1×1 convolution layer, two average pooling layers, and the compression-activation layer.
5. The dual-attention generative adversarial network of claim 1, wherein
the channel attention unit is configured to perform the following operations:
reshaping the feature A to obtain a feature A′; the feature A has C layers, each of size H×W, and the feature A′ has size C×N, where N = H×W;
applying softmax to the product of the feature A′ and the transpose of the feature A′ to obtain a feature map Q of size C×C, whose element q_{ji} has the expression:

q_{ji} = exp(a′_i · a′_j) / Σ_{k=1..C} exp(a′_k · a′_j)

where i, j = 1, 2, …, C; a′_i is the i-th feature vector of the feature A′, and a′ᵀ_j is the j-th feature vector of the transpose of the feature A′;
multiplying the feature map Q with the feature A′ and reshaping back to obtain the channel attention feature T, with the expression:

T_j = β · Σ_{i=1..C} (q_{ji} · a′_i) + a_j

where T_j denotes the j-th feature vector of the channel attention feature T, j = 1, 2, …, C; β denotes a learnable parameter initialized to 0; and a_j denotes the j-th feature vector of the feature A.
6. The dual-attention generative adversarial network of claim 5, wherein
the position attention unit is configured to perform the following operations:
performing channel compression on the feature A with a 1×1 convolution f(x) to obtain a feature B, and reshaping to obtain a feature B′; the feature A has C layers, each of size H×W; after compression the number of layers is C′ (C′ < C), the feature B has dimensions C′×H×W, and after reshaping B′ has dimensions C′×N, where N = H×W;
performing channel compression on the feature A with a 1×1 convolution g(x) to obtain a feature O, and reshaping to obtain a feature O′; after compression the number of layers is C′, the feature O has dimensions C′×H×W, and after reshaping O′ has dimensions C′×N;
applying softmax to the product of the feature B′ and the transpose of the feature O′ to obtain a feature map P of size N×N, whose element p_{ji} has the expression:

p_{ji} = exp(b′_i · o′_j) / Σ_{k=1..N} exp(b′_k · o′_j)

where b′_i denotes the i-th feature vector of the feature B′, o′ᵀ_j denotes the j-th feature vector of the transpose of the feature O′, and i, j = 1, 2, …, N;
extracting the feature A with a 1×1 convolution h(x) to obtain a feature V, whose number of layers is still C, and reshaping to obtain a feature V′;
multiplying the feature V′ with the feature map P and reshaping back to obtain the position attention feature S, with the expression:

S_j = α · Σ_{i=1..N} (p_{ji} · v′_i) + a_j

where S_j denotes the j-th feature vector of the position attention feature S, j = 1, 2, …, N; v′_i denotes the i-th feature vector of the feature V′; α denotes a learnable parameter initialized to 0; and a_j denotes the j-th feature vector of the feature A.
7. The dual-attention generative adversarial network of claim 6, wherein
the dual-attention mechanism module fuses the position attention feature S and the channel attention feature T through a 3×3 convolution J(x) and a 3×3 convolution K(x), with the expression:
U=J(S)+K(T)
where U represents the resulting fusion feature.
8. The dual-attention generative adversarial network of claim 1, wherein
the generator comprises a linear layer, a first convolution block I, a second convolution block I, a dual-attention mechanism module, a third convolution block I, a first activation module, and a Tanh layer connected in sequence;
the input of the generator is a noise vector z and a class Condition; the noise vector z obeys a normal distribution, the class Condition is embedded into each batch normalization layer of each convolution block I, and the output of the generator is a forged image;
the discriminator comprises a first convolution block II, a dual-attention mechanism module, a second convolution block II, a third convolution block II, a fourth convolution block II, a second activation module, and a linear-transformation-and-label-embedding layer connected in sequence;
the input of the discriminator is an RGB image and a label y, and the output is a discrimination result for the RGB image.
9. The dual-attention generative adversarial network of claim 1, wherein
the loss function of the discriminator is:

L_D = − E_{(x,y)∼p_data}[min(0, −1 + D(x, y))] − E_{z∼p_z, y∼p_data}[min(0, −1 − D(G(z), y))]

and the loss function of the generator is:

L_G = − E_{z∼p_z, y∼p_data}[D(G(z), y)]

where x denotes an image and y its corresponding class label; p_data is the real-data probability distribution and p_z the noise probability distribution; z denotes a one-dimensional noise vector; G(z) denotes the mapping process of the generator and D(x, y) the mapping process of the discriminator; E_{(x,y)∼p_data} denotes the expectation over (x, y) obeying p_data, and E_{z∼p_z, y∼p_data} denotes the expectation over z obeying p_z and y obeying p_data.
10. An image generation method, characterized by comprising the following steps:
S1, constructing the dual-attention generative adversarial network of any one of claims 1 to 9;
S2, acquiring a training set and inputting it into the dual-attention generative adversarial network for training;
S3, generating an image with the trained dual-attention generative adversarial network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011470128.6A CN112580782B (en) | 2020-12-14 | 2020-12-14 | Channel-enhanced dual-attention generation countermeasure network and image generation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112580782A true CN112580782A (en) | 2021-03-30 |
CN112580782B CN112580782B (en) | 2024-02-09 |
Family
ID=75135850
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011470128.6A Active CN112580782B (en) | 2020-12-14 | 2020-12-14 | Channel-enhanced dual-attention generation countermeasure network and image generation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112580782B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020140633A1 (en) * | 2019-01-04 | 2020-07-09 | 平安科技(深圳)有限公司 | Text topic extraction method, apparatus, electronic device, and storage medium |
CN111429433A (en) * | 2020-03-25 | 2020-07-17 | 北京工业大学 | Multi-exposure image fusion method based on attention generation countermeasure network |
CN111476717A (en) * | 2020-04-07 | 2020-07-31 | 西安电子科技大学 | Face image super-resolution reconstruction method based on self-attention generation countermeasure network |
Non-Patent Citations (2)

Title |
---|
Cao Zhen; Yang Yun; Qi Yong; Li Chenghui: "Image inpainting method based on multi-loss constraints and attention blocks", Journal of Shaanxi University of Science & Technology, No. 03 |
Huang Hongyu; Gu Zifeng: "A text-image generative adversarial network based on a self-attention mechanism", Journal of Chongqing University, No. 03 |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113095330A (en) * | 2021-04-30 | 2021-07-09 | 辽宁工程技术大学 | Compressive attention model for semantically segmenting pixel groups |
CN113223181A (en) * | 2021-06-02 | 2021-08-06 | 广东工业大学 | Weak texture object pose estimation method |
CN113627590A (en) * | 2021-07-29 | 2021-11-09 | 中汽创智科技有限公司 | Attention module and attention mechanism of convolutional neural network and convolutional neural network |
CN113344146A (en) * | 2021-08-03 | 2021-09-03 | 武汉大学 | Image classification method and system based on double attention mechanism and electronic equipment |
CN113344146B (en) * | 2021-08-03 | 2021-11-02 | 武汉大学 | Image classification method and system based on double attention mechanism and electronic equipment |
CN113935977A (en) * | 2021-10-22 | 2022-01-14 | 河北工业大学 | Solar cell panel defect generation method based on generation countermeasure network |
CN113744265A (en) * | 2021-11-02 | 2021-12-03 | 成都东方天呈智能科技有限公司 | Anomaly detection system, method and storage medium based on generation countermeasure network |
CN115937994A (en) * | 2023-01-06 | 2023-04-07 | 南昌大学 | Data detection method based on deep learning detection model |
CN116385725A (en) * | 2023-06-02 | 2023-07-04 | 杭州聚秀科技有限公司 | Fundus image optic disk and optic cup segmentation method and device and electronic equipment |
CN116385725B (en) * | 2023-06-02 | 2023-09-08 | 杭州聚秀科技有限公司 | Fundus image optic disk and optic cup segmentation method and device and electronic equipment |
CN117011918A (en) * | 2023-08-08 | 2023-11-07 | 南京工程学院 | Method for constructing human face living body detection model based on linear attention mechanism |
CN117011918B (en) * | 2023-08-08 | 2024-03-26 | 南京工程学院 | Method for constructing human face living body detection model based on linear attention mechanism |
CN118506553A (en) * | 2024-07-17 | 2024-08-16 | 西华大学 | AIoT anomaly identification method, disaster early warning system and road safety system |
Also Published As
Publication number | Publication date |
---|---|
CN112580782B (en) | 2024-02-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112580782A (en) | Channel enhancement-based double-attention generation countermeasure network and image generation method | |
CN111259930B (en) | General target detection method of self-adaptive attention guidance mechanism | |
US11450066B2 (en) | 3D reconstruction method based on deep learning | |
CN110443143B (en) | Multi-branch convolutional neural network fused remote sensing image scene classification method | |
CN109522857B (en) | People number estimation method based on generation type confrontation network model | |
CN113779675B (en) | Physical-data driven intelligent shear wall building structure design method and device | |
CN111476717A (en) | Face image super-resolution reconstruction method based on self-attention generation countermeasure network | |
CN113688723A (en) | Infrared image pedestrian target detection method based on improved YOLOv5 | |
CN108596329A (en) | Threedimensional model sorting technique based on end-to-end Deep integrating learning network | |
CN112766279B (en) | Image feature extraction method based on combined attention mechanism | |
CN114758288B (en) | Power distribution network engineering safety control detection method and device | |
CN112818764B (en) | Low-resolution image facial expression recognition method based on feature reconstruction model | |
CN113642621A (en) | Zero sample image classification method based on generation countermeasure network | |
CN115966010A (en) | Expression recognition method based on attention and multi-scale feature fusion | |
CN115222998B (en) | Image classification method | |
CN110390107A (en) | Hereafter relationship detection method, device and computer equipment based on artificial intelligence | |
CN106056059A (en) | Multidirectional SLGS characteristic description and performance cloud weight fusion face recognition method | |
CN113569805A (en) | Action recognition method and device, electronic equipment and storage medium | |
CN111222583B (en) | Image steganalysis method based on countermeasure training and critical path extraction | |
CN111371611B (en) | Weighted network community discovery method and device based on deep learning | |
CN114187506A (en) | Remote sensing image scene classification method of viewpoint-aware dynamic routing capsule network | |
CN113420833A (en) | Visual question-answering method and device based on question semantic mapping | |
CN117671261A (en) | Passive domain noise perception domain self-adaptive segmentation method for remote sensing image | |
CN108985385A (en) | Based on the quick Weakly supervised object detection method for generating confrontation study | |
CN116596915A (en) | Blind image quality evaluation method based on multi-scale characteristics and long-distance dependence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |