CN112580782B - Channel-enhanced dual-attention generative adversarial network and image generation method - Google Patents

Channel-enhanced dual-attention generative adversarial network and image generation method

Info

Publication number: CN112580782B
Authority: CN (China)
Prior art keywords: feature, attention, layer, convolution, channel
Legal status: Active (granted)
Application number: CN202011470128.6A
Other languages: Chinese (zh)
Other versions: CN112580782A
Inventors: 罗健旭 (Luo Jianxu), 岳丹阳 (Yue Danyang)
Current Assignee: East China University of Science and Technology
Original Assignee: East China University of Science and Technology
Application filed by East China University of Science and Technology
Priority to CN202011470128.6A
Publication of CN112580782A
Application granted; publication of CN112580782B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/045 Architecture, e.g. interconnection topology; combinations of networks
    • G06N3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/40 Engine management systems (road transport; internal combustion engine based vehicles)

Abstract

The invention relates to a channel-enhanced dual-attention generative adversarial network and an image generation method. The network comprises a generator and a discriminator: the generator comprises a first convolution block and a dual-attention mechanism module, and the discriminator comprises a second convolution block and a dual-attention mechanism module. The first and second convolution blocks are each provided with a squeeze-and-excitation (SE) layer that acquires channel attention through a squeeze-and-excitation operation. The dual-attention mechanism module comprises a position attention unit and a channel attention unit in parallel: the position attention unit establishes correlations among positions based on a self-attention mechanism to obtain a position attention feature, and the channel attention unit establishes dependencies among feature channels based on a channel attention mechanism to obtain a channel attention feature; the dual-attention mechanism module fuses the position attention feature and the channel attention feature. The invention improves the generative performance of the adversarial network: the generated data distribution is closer to the original data distribution, and the quality of the generated images is better.

Description

Channel-enhanced dual-attention generative adversarial network and image generation method
Technical Field
The invention relates to the technical field of image generation, and in particular to a channel-enhanced dual-attention generative adversarial network and an image generation method.
Background
Generative adversarial network (GAN) techniques are mainly applied to image generation. A GAN model consists of a generator and a discriminator: the generator produces images from an input noise vector or class label, while the discriminator judges whether an image is real or fake. Adversarial training between the two allows the generator to learn the real data distribution, so that it can eventually forge images approaching real ones. GAN techniques are widely applied to image enhancement, infrared image generation, medical image generation, image super-resolution reconstruction and similar tasks.
At present, the main methods for improving the image quality of GAN models include modifying the network structure, changing the loss function and establishing correlations in feature space. The self-attention GAN model uses a self-attention mechanism capable of capturing long-range spatial correlations, so that the structure of the generated images is more reasonable. The BigGAN model adjusts and deepens the structure of the GAN on the basis of the self-attention GAN, improving the network's learning capacity and further improving generation quality.
Although current GANs already have strong generative capability, with little prior knowledge it remains difficult to generate images of complex scenes with wide-range correlations for a given category using only the noise vector z and the condition label y. Existing GAN models struggle to generate images with complex structural distributions. The self-attention mechanism, while enabling a GAN to generate images with long-range correlations, still has drawbacks when generating the structure of target objects in images. It is likewise difficult to generate images of scenes containing multiple objects, and the quality of the generated images needs improvement.
Disclosure of Invention
The invention addresses at least some of these defects by providing a GAN model capable of generating high-quality images of complex scenes, in which the structural distribution of targets in the generated images is more reasonable and the images look more natural.
To achieve the above object, the present invention provides a channel-enhanced dual-attention generative adversarial network, including:
a generator and a discriminator; the generator comprises a first convolution block and a dual-attention mechanism module; the discriminator comprises a second convolution block and a dual-attention mechanism module;
the first and second convolution blocks are each provided with a squeeze-and-excitation (SE) layer that acquires channel attention through a squeeze-and-excitation operation;
the dual-attention mechanism module comprises a position attention unit and a channel attention unit in parallel; the position attention unit establishes correlations among positions based on a self-attention mechanism to obtain a position attention feature, and the channel attention unit establishes dependencies among feature channels based on a channel attention mechanism to obtain a channel attention feature; the dual-attention mechanism module fuses the position attention feature and the channel attention feature.
Preferably, the SE layer is configured to perform the following operations:
average each layer of the feature M, compressing each layer into a single value to obtain a one-dimensional vector s, with the expression:
s_n = (1/(H×W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} m^n_{i,j}
where the number of layers of the feature M is C, each layer has size H×W, s_n denotes the n-th element of the one-dimensional vector s, m_n denotes the n-th layer of the feature M, n = 1, …, C, and m^n_{i,j} denotes the element of m_n at coordinate (i, j);
activate the one-dimensional vector s through a two-layer nonlinear fully connected network to learn a weight feature vector w holding the weight proportion of each layer, with the expression:
w = σ_2(W_2 σ_1(W_1 s))
where W_1 and W_2 denote the first and second fully connected operations, and σ_1 and σ_2 are the ReLU and Sigmoid activation functions respectively;
multiply the weight feature vector w into the corresponding layers of the feature M to obtain a calibrated feature M̃, with the expression:
m̃_n = w_n × m_n
where w_n denotes the n-th element of the weight feature vector w and m̃_n denotes the n-th layer of the feature M̃.
Preferably, the SE layer comprises an average pooling layer, a 1×1 convolution layer, a ReLU activation layer, a second 1×1 convolution layer and a Sigmoid activation layer connected in sequence; the output of the SE layer is multiplied channel-wise with its input to obtain the calibrated feature.
Preferably, the first convolution block comprises two linear layers, two batch normalization layers, two ReLU activation layers, two upsampling layers, two 3×3 convolution layers, a 1×1 convolution layer, and the SE layer;
the second convolution block comprises two ReLU activation layers, two 3×3 convolution layers, a 1×1 convolution layer, two average pooling layers, and the SE layer.
Preferably, the channel attention unit is configured to perform the following operations:
recombine the feature A to obtain a feature A′; the number of layers of the feature A is C, each layer has size H×W, and the feature A′ has size C×N, with N = H×W;
multiply the feature A′ by the transpose of A′ and apply a softmax to obtain a feature map Q of dimension C×C, in which the element q_ji has the expression:
q_ji = exp(A′_i · A′ᵀ_j) / Σ_{i=1}^{C} exp(A′_i · A′ᵀ_j)
where i, j = 1, 2, …, C, A′_i is the i-th feature vector of the feature A′, and A′ᵀ_j is the j-th feature vector of the transpose of A′;
multiply the feature map Q with the feature A′ and perform the inverse recombination to obtain the channel attention feature T, with the expression:
T_j = β Σ_{i=1}^{C} (q_ji A′_i) + A_j
where T_j denotes the j-th feature vector of the channel attention feature T, j = 1, 2, …, C, β denotes a learning parameter initialized to 0, and A_j denotes the j-th feature vector of the feature A.
Preferably, the position attention unit is configured to perform the following operations:
channel-compress the feature A with a 1×1 convolution f(x) to obtain a feature B, then recombine it to obtain a feature B′; the number of layers of the feature A is C, each layer has size H×W, the number of layers after compression is C′ (C′ < C), the feature B has dimension C′×H×W, and the dimension after recombination is C′×N, with N = H×W;
channel-compress the feature A with a 1×1 convolution g(x) to obtain a feature O, then recombine it to obtain a feature O′; the number of layers after compression is C′, the feature O has dimension C′×H×W, and the dimension after recombination is C′×N;
multiply the feature B′ by the transpose of the feature O′ and apply a softmax to obtain a feature map P of dimension N×N, in which the element p_ji has the expression:
p_ji = exp(b′_i · o′ᵀ_j) / Σ_{i=1}^{N} exp(b′_i · o′ᵀ_j)
where b′_i denotes the i-th feature vector of the feature B′, o′ᵀ_j denotes the j-th feature vector of the transpose of the feature O′, and i, j = 1, 2, …, N;
extract the feature A with a 1×1 convolution h(x) to obtain a feature V, then recombine it to obtain a feature V′; the number of layers after extraction is still C;
multiply the feature V′ with the feature map P and perform the inverse recombination to obtain the position attention feature S, with the expression:
S_j = α Σ_{i=1}^{N} (p_ji v′_i) + A_j
where S_j denotes the j-th feature vector of the position attention feature S, j = 1, 2, …, N, v′_i denotes the i-th feature vector of the feature V′, α denotes a learning parameter initialized to 0, and A_j denotes the j-th feature vector of the feature A.
Preferably, the dual-attention mechanism module fuses the position attention feature S and the channel attention feature T with a 3×3 convolution J(x) function and a 3×3 convolution K(x) function, expressed as:
U = J(S) + K(T)
where U denotes the resulting fusion feature.
Preferably, the generator comprises, connected in sequence: a linear layer, a first and a second first-type convolution block, a dual-attention mechanism module, a third first-type convolution block, a first activation module and a Tanh layer;
the input of the generator is a noise vector z and a class condition; the noise vector z follows a normal distribution, the class condition is embedded into each batch normalization layer of each first-type convolution block, and the output of the generator is a fake image;
the discriminator comprises, connected in sequence: a first second-type convolution block, a dual-attention mechanism module, the second, third and fourth second-type convolution blocks, a second activation module, and a linear transformation and label embedding layer;
the input of the discriminator is an RGB image and a label y, and the output is the discrimination result for the RGB image.
Preferably, the loss function of the discriminator is:
L_D = E_{(x,y)~p_data}[max(0, 1 − D(x, y))] + E_{z~p_z, y~p_data}[max(0, 1 + D(G(z), y))]
and the loss function of the generator is:
L_G = −E_{z~p_z, y~p_data}[D(G(z), y)]
where x denotes an image, y is the corresponding class label, p_data is the real data probability distribution, p_z is the noise probability distribution, z denotes a one-dimensional noise vector, G(z) denotes the mapping performed by the generator, D(x, y) denotes the mapping performed by the discriminator, E_{(x,y)~p_data}[·] denotes an expectation over (x, y) obeying p_data, and E_{z~p_z, y~p_data}[·] denotes an expectation over z obeying p_z and y obeying p_data.
The invention also provides an image generation method, comprising the following steps:
S1, constructing a dual-attention generative adversarial network as described in any of the above;
S2, acquiring a training set and inputting it into the dual-attention GAN for training;
S3, generating images with the trained dual-attention GAN.
The technical scheme of the invention has the following advantages. The invention provides a channel-enhanced dual-attention generative adversarial network and an image generation method. The dual-attention GAN improves on the existing BigGAN model: an SE layer acquiring channel attention through a squeeze-and-excitation operation is added to the convolution blocks of the GAN. It recalibrates the features of the network, i.e. strengthens the feature layers that contribute more and weakens the useless ones, so that the intermediate-layer features of the network are more expressive, the performance of the convolution blocks is improved, and the feature learning capability of the GAN model is enhanced. In addition, the dual-attention GAN adopts a dual-attention mechanism module, which comprises not only a position attention unit (whose function is the same as that of a self-attention mechanism module) but also a channel attention unit.
Drawings
FIGS. 1(a) and 1(b) are schematic diagrams of the channel-enhanced dual-attention generative adversarial network in embodiments of the present invention, where FIG. 1(a) shows the generator structure and FIG. 1(b) shows the discriminator structure;
FIG. 2 is a schematic diagram of the squeeze-and-excitation operation flow in an embodiment of the present invention;
FIG. 3(a) is a schematic diagram of the first convolution block, and FIG. 3(b) of the second convolution block, in embodiments of the present invention;
FIG. 4 is a flow chart of the dual-attention mechanism in an embodiment of the invention;
FIG. 5(a) shows images generated by the BigGAN model; FIG. 5(b) shows images generated by the channel-enhanced dual-attention GAN in an embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, but not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
As shown in FIGS. 1(a) to 4, an embodiment of the present invention provides a channel-enhanced dual-attention generative adversarial network, improved on the basis of the existing BigGAN model. It includes a generator and a discriminator: the generator includes a first convolution block and a dual-attention mechanism module; the discriminator includes a second convolution block and a dual-attention mechanism module. Both the first and second convolution blocks are convolution blocks of the GAN.
The first and second convolution blocks are each provided with an SE layer that acquires channel attention through a squeeze-and-excitation operation. The invention introduces a squeeze-and-excitation mechanism to improve the residual-structure convolution block (ResBlock) of the GAN; the SE layer strengthens the feature layers that contribute more and weakens the useless ones, so that the intermediate-layer features of the network have stronger expression and feature extraction capability.
The dual-attention mechanism module includes a position attention unit and a channel attention unit in parallel. The position attention unit establishes correlations among positions based on a self-attention mechanism to obtain the position attention feature S, and the channel attention unit establishes dependencies among feature channels based on a channel attention mechanism to obtain the channel attention feature T. The dual-attention mechanism module fuses S and T to obtain a feature U rich in correlation information. The introduced dual-attention mechanism lets the GAN acquire correlation information across a large range of positions and channels, so that the structural distribution of target objects in the generated images is more natural.
Preferably, as shown in FIG. 2, in the dual-attention GAN the SE layer is configured to perform a squeeze-and-excitation operation that acquires channel attention, comprising the following steps:
Average each layer of the feature M input into the SE layer, compressing each layer into a single value to obtain a one-dimensional vector s, with the expression:
s_n = (1/(H×W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} m^n_{i,j}
where the number of layers (i.e. channels) of the feature M is C, each layer has size H×W, the obtained one-dimensional vector s has length C, s_n denotes the n-th element of s, m_n denotes the n-th layer of the feature M, n = 1, …, C, and m^n_{i,j} denotes the element of m_n at coordinate (i, j).
Activate the one-dimensional vector s through a two-layer nonlinear fully connected network to learn a weight feature vector w holding the weight proportion of each layer, with the expression:
w = σ_2(W_2 σ_1(W_1 s))
where W_1 and W_2 denote the first and second fully connected operations, and σ_1 and σ_2 are the ReLU and Sigmoid activation functions respectively; the Sigmoid limits the values of w to the range (0, 1), and the activation functions are what allow the neural network to learn a nonlinear relationship.
Multiply the weight feature vector w into the corresponding layers of the feature M to obtain a calibrated feature M̃, with the expression:
m̃_n = w_n × m_n
where w_n denotes the n-th element of the weight feature vector w and m̃_n denotes the n-th layer of the feature M̃.
Preferably, W_1 and W_2 may be implemented with 1×1 convolutions. The SE layer (SELayer) then comprises an average pooling layer (Average Pooling), a 1×1 convolution layer (1×1 Conv), a ReLU activation layer (ReLU), a second 1×1 convolution layer and a Sigmoid activation layer (Sigmoid) connected in sequence; the output of the SE layer is multiplied channel-wise with its input to obtain the calibrated feature.
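As an illustration, a minimal PyTorch sketch of such an SELayer is given below. This is not the patent's code; the reduction ratio r of the two 1×1 convolutions is an assumption, since the patent does not state the hidden width (r = 16 is the common SENet default).

```python
# Minimal sketch of the SELayer described above (assumption: reduction ratio r).
import torch
import torch.nn as nn

class SELayer(nn.Module):
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)      # average pooling: each layer -> one value
        self.excite = nn.Sequential(
            nn.Conv2d(channels, channels // r, 1),  # first fully connected operation W1 as 1x1 conv
            nn.ReLU(inplace=True),                  # sigma_1
            nn.Conv2d(channels // r, channels, 1),  # second fully connected operation W2 as 1x1 conv
            nn.Sigmoid(),                           # sigma_2, limits the weights to (0, 1)
        )

    def forward(self, m: torch.Tensor) -> torch.Tensor:
        s = self.squeeze(m)   # one-dimensional vector s, shape (batch, C, 1, 1)
        w = self.excite(s)    # weight feature vector w
        return m * w          # channel-wise multiplication gives the calibrated feature
```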
The invention embeds the SE layer (SELayer) in the structure of the convolution block ResBlock of the BigGAN model to obtain SEResBlock.
FIG. 3(a) is a schematic diagram of the SEResBlock structure in the generator, i.e. SEResBlock(G) in FIG. 1(a), i.e. the first convolution block. As shown in FIG. 3(a), in the preferred embodiment the first convolution block includes two linear layers (Linear), two batch normalization layers (BatchNorm), two ReLU activation layers, two upsampling layers (Upsample), two 3×3 convolution layers (3×3 Conv), a 1×1 convolution layer (1×1 Conv) and an SE layer. The class condition input to the generator is fed into the two batch normalization layers through the two linear layers. The first batch normalization layer, the first ReLU activation layer, the second upsampling layer, the first 3×3 convolution layer, the second batch normalization layer, the second ReLU activation layer and the second 3×3 convolution layer are connected in sequence, taking as input the output of the previous module. The second 3×3 convolution layer is connected to the SE layer, and the output of the second 3×3 convolution layer is channel-multiplied with the output of the SE layer to obtain the calibrated feature. The output of the previous module also passes through the first upsampling layer and the 1×1 convolution layer in sequence, and is then summed element-wise with the calibrated feature to give the output of the block (i.e. the first convolution block).
FIG. 3(b) is a schematic diagram of the SEResBlock structure in the discriminator, i.e. SEResBlock(D) in FIG. 1(b), i.e. the second convolution block, which lacks the batch normalization layers. As shown in FIG. 3(b), the second convolution block includes two ReLU activation layers (ReLU), two 3×3 convolution layers (3×3 Conv), a 1×1 convolution layer (1×1 Conv), two average pooling layers (Average Pooling) and an SE layer. The first ReLU activation layer, the first 3×3 convolution layer, the second ReLU activation layer, the second 3×3 convolution layer and the second average pooling layer are connected in sequence, taking as input the output of the previous module (i.e. the input of the second convolution block). The second average pooling layer is connected to the SE layer, and its output is channel-multiplied with the output of the SE layer to obtain the calibrated feature. The input also passes through the 1×1 convolution layer and the first average pooling layer in sequence, and is then summed element-wise with the calibrated feature to give the output of the block (i.e. the second convolution block).
In the first and second convolution blocks, the SE layer is applied after the second 3×3 convolution layer, so that the features learned by the whole convolution block are recalibrated; this improves performance without excessively increasing the complexity of the network.
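To make the data flow concrete, here is a hedged sketch of the discriminator-side block following the layer ordering of FIG. 3(b); it reuses the SELayer sketched above, and the channel counts are illustrative assumptions rather than the patent's values.

```python
# Sketch of SEResBlock(D) per FIG. 3(b); SELayer is the class sketched above.
import torch
import torch.nn as nn

class SEResBlockD(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.main = nn.Sequential(
            nn.ReLU(),                                # first ReLU activation layer
            nn.Conv2d(in_ch, out_ch, 3, padding=1),   # first 3x3 convolution layer
            nn.ReLU(),                                # second ReLU activation layer
            nn.Conv2d(out_ch, out_ch, 3, padding=1),  # second 3x3 convolution layer
            nn.AvgPool2d(2),                          # second average pooling layer
            SELayer(out_ch),                          # recalibrates the features of the block
        )
        # Shortcut branch: the 1x1 convolution layer, then the first average pooling layer.
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1),
            nn.AvgPool2d(2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise sum of the calibrated feature and the shortcut output.
        return self.main(x) + self.shortcut(x)
```

The generator-side block SEResBlock(G) would be analogous, with upsampling instead of pooling and the class condition embedded into its batch normalization layers.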
Preferably, as shown in FIG. 4, in the dual-attention mechanism module of the dual-attention GAN, the channel attention unit (part I in FIG. 4) is configured to perform the following operations:
Recombine (Reshape) the feature A input into the dual-attention mechanism module to obtain a feature A′; the number of layers of the feature A is C, each layer has size H×W, and the feature A′ has size C×N, with N = H×W.
Multiply the feature A′ by the transpose of A′ and apply a softmax to obtain a feature map Q of dimension C×C, in which the element q_ji has the expression:
q_ji = exp(A′_i · A′ᵀ_j) / Σ_{i=1}^{C} exp(A′_i · A′ᵀ_j)
where i, j = 1, 2, …, C, A′_i is the i-th feature vector of the feature A′, A′ᵀ_j is the j-th feature vector of the transpose of A′, and the superscript "T" denotes the transpose.
Multiply the feature map Q with the feature A′ and perform the inverse recombination (inverse Reshape) to obtain the channel attention feature T, with the expression:
T_j = β Σ_{i=1}^{C} (q_ji A′_i) + A_j
where T_j denotes the j-th feature vector of the channel attention feature T, j = 1, 2, …, C, β denotes a learning parameter initialized to 0, and A_j denotes the j-th feature vector of the feature A.
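A minimal sketch of this channel attention unit, written directly from the equations above (β is a learned scalar initialized to zero):

```python
# Sketch of the channel attention unit: Q = softmax(A' A'^T), T_j = beta * sum_i q_ji A'_i + A_j.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(1))  # learning parameter beta, initialized to 0

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        bsz, c, h, w = a.shape
        a_prime = a.view(bsz, c, h * w)                       # recombine A into A' (C x N)
        energy = torch.bmm(a_prime, a_prime.transpose(1, 2))  # A' x A'^T, shape C x C
        q = F.softmax(energy, dim=-1)                         # feature map Q
        t = torch.bmm(q, a_prime).view(bsz, c, h, w)          # Q x A', inverse recombination
        return self.beta * t + a                              # channel attention feature T
```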
Further, as shown in FIG. 4, in the dual-attention mechanism module the position attention unit (part II in FIG. 4) is configured to perform the following operations:
Channel-compress the feature A with a 1×1 convolution f(x) to obtain a feature B, then recombine (Reshape) it to obtain a feature B′. The number of layers of the feature A is C and each layer has size H×W; the number of layers (channels) after compression is C′ (C′ < C), the feature B has dimension C′×H×W, and the dimension after recombination is C′×N, with N = H×W.
Channel-compress the feature A with a 1×1 convolution g(x) to obtain a feature O, then recombine (Reshape) it to obtain a feature O′; the number of channels after compression is C′, the feature O has dimension C′×H×W, and the dimension after recombination is C′×N.
Multiply the feature B′ by the transpose of the feature O′ and apply a softmax to obtain a feature map P of dimension N×N, in which the element p_ji has the expression:
p_ji = exp(b′_i · o′ᵀ_j) / Σ_{i=1}^{N} exp(b′_i · o′ᵀ_j)
where b′_i denotes the i-th feature vector of the feature B′, o′ᵀ_j denotes the j-th feature vector of the transpose of the feature O′, and i, j = 1, 2, …, N.
Extract the feature A with a 1×1 convolution h(x) to obtain a feature V, then recombine (Reshape) it to obtain a feature V′; the number of layers of the extracted feature V is still C, and after recombination V′ = [v′_1, v′_2, …, v′_N].
Multiply the feature V′ with the feature map P and perform the inverse recombination (inverse Reshape) to obtain the position attention feature S, with the expression:
S_j = α Σ_{i=1}^{N} (p_ji v′_i) + A_j
where S_j denotes the j-th feature vector of the position attention feature S, j = 1, 2, …, N, v′_i denotes the i-th feature vector of the feature V′, i = 1, 2, …, N, α denotes a learning parameter initialized to 0, and A_j denotes the j-th feature vector of the feature A.
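A matching sketch of the position attention unit; the compression ratio C′ = C/8 is an assumption borrowed from the usual self-attention design, since the patent leaves C′ unspecified:

```python
# Sketch of the position attention unit: P = softmax(B'^T O'), S = alpha * (V' P^T) + A.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):  # assumption: C' = C / 8
        super().__init__()
        c_prime = channels // reduction
        self.f = nn.Conv2d(channels, c_prime, 1)   # 1x1 convolution f(x) -> feature B
        self.g = nn.Conv2d(channels, c_prime, 1)   # 1x1 convolution g(x) -> feature O
        self.h = nn.Conv2d(channels, channels, 1)  # 1x1 convolution h(x) -> feature V
        self.alpha = nn.Parameter(torch.zeros(1))  # learning parameter alpha, initialized to 0

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        bsz, c, hgt, wdt = a.shape
        n = hgt * wdt
        b_prime = self.f(a).view(bsz, -1, n)  # B': C' x N
        o_prime = self.g(a).view(bsz, -1, n)  # O': C' x N
        v_prime = self.h(a).view(bsz, c, n)   # V': C  x N
        energy = torch.bmm(b_prime.transpose(1, 2), o_prime)  # N x N position affinities
        p = F.softmax(energy, dim=-1)                         # feature map P
        s = torch.bmm(v_prime, p.transpose(1, 2))             # weighted sum over positions
        return self.alpha * s.view(bsz, c, hgt, wdt) + a      # position attention feature S
```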
Further, the dual-attention mechanism module fuses the position attention feature S and the channel attention feature T with a 3×3 convolution J(x) function and a 3×3 convolution K(x) function, expressed as:
U = J(S) + K(T)
where U denotes the fusion feature obtained by the dual-attention mechanism module; U is rich in correlation information.
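Putting the two units together, the fusion step U = J(S) + K(T) can be sketched as follows, reusing the two attention classes above:

```python
# Sketch of the dual-attention mechanism module; J(x) and K(x) are 3x3 convolutions.
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.position = PositionAttention(channels)           # as sketched above
        self.channel = ChannelAttention()                     # as sketched above
        self.j = nn.Conv2d(channels, channels, 3, padding=1)  # 3x3 convolution J(x)
        self.k = nn.Conv2d(channels, channels, 3, padding=1)  # 3x3 convolution K(x)

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        return self.j(self.position(a)) + self.k(self.channel(a))  # fusion feature U
```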
As shown in FIGS. 1(a) and 1(b), in a preferred embodiment the generator comprises, connected in sequence: a linear layer, a first and a second first-type convolution block, a dual-attention mechanism module, a third first-type convolution block, a first activation module and a Tanh layer. The input of the generator is a noise vector z and a class condition; the noise vector z follows a normal distribution, the class condition is embedded into each batch normalization layer of each first-type convolution block, and the output of the generator is a fake image.
The discriminator comprises, connected in sequence: a first second-type convolution block, a dual-attention mechanism module, the second, third and fourth second-type convolution blocks, a second activation module, and a linear transformation and label embedding layer. The input of the discriminator is the fake image and the label y, and the output is the discrimination result.
Further, as shown in FIG. 1(a), the linear layer performs a linear computation on the noise vector z, which is then reshaped into a 4×4×16ch feature tensor, where ch may be set to 64. The feature is then learned through the SEResBlock(G) layers, processed by the first activation module with batch normalization (BN), ReLU activation and a 3×3 convolution, and activated through the Tanh layer, finally generating a fake image with channel number ch = 3, i.e. an RGB image. The dual-attention mechanism is placed in the middle-to-late feature layers of the network, in the same position as the self-attention mechanism in the BigGAN model.
As shown in FIG. 1(b), the network structure of the discriminator is composed of SEResBlock(D) blocks and a dual-attention mechanism, mirroring the structure of the generator; its function is to discriminate whether the input RGB image with label y is real. The final features are processed by the ReLU activation and global sum pooling (Global sum pooling) of the second activation module, and then a linear transformation fused with the embedded label y is used to judge whether the input RGB image is real or fake.
Preferably, the loss functions of the generator and the discriminator use a hinge loss. The loss function of the discriminator is:
L_D = E_{(x,y)~p_data}[max(0, 1 − D(x, y))] + E_{z~p_z, y~p_data}[max(0, 1 + D(G(z), y))]
The loss function of the generator is:
L_G = −E_{z~p_z, y~p_data}[D(G(z), y)]
where x denotes an image, y is the corresponding class label, p_data is the real data probability distribution, p_z is the noise probability distribution, z denotes a one-dimensional noise vector, G(z) denotes the mapping performed by the generator, D(x, y) denotes the mapping performed by the discriminator, E_{(x,y)~p_data}[·] denotes an expectation over (x, y) obeying p_data, and E_{z~p_z, y~p_data}[·] denotes an expectation over z obeying p_z and y obeying p_data.
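The two hinge losses translate directly into code; a minimal sketch, assuming the discriminator returns a raw (unbounded) score:

```python
# Sketch of the hinge losses above; d_real = D(x, y), d_fake = D(G(z), y).
import torch
import torch.nn.functional as F

def d_hinge_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    # L_D = E[max(0, 1 - D(x, y))] + E[max(0, 1 + D(G(z), y))]
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake: torch.Tensor) -> torch.Tensor:
    # L_G = -E[D(G(z), y)]
    return -d_fake.mean()
```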
In a preferred embodiment, the invention sets the learning rate of both the generator and the discriminator to 0.0002 and adopts a learning-rate decay strategy with a decay rate of 0.9999. Spectral normalization is used to adjust the weights of the network during training, which makes the training of the GAN more stable. The number of training samples per batch is Batch = 64, and the total number of iterations is 10000.
Further, the invention trains with mini-batch stochastic gradient descent: the loss function of the discriminator is trained first, and the generator afterwards. The pseudocode of the specific optimization procedure is shown in Table 1.
Table 1: optimization procedure for the generative adversarial network (pseudocode not reproduced in this text)
Here the discriminator gradient is derived from the discriminator loss with respect to the network parameters θ_d of the discriminator model, and the generator gradient is derived from the generator loss with respect to the network parameters θ_g of the generator model.
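Since the pseudocode of Table 1 did not survive in this text, the following is a reconstruction of the alternating procedure under stated assumptions: Adam as the mini-batch optimizer, the hyperparameters quoted above, a 128-dimensional noise vector, and the hinge losses sketched earlier; `generator`, `discriminator` and `loader` are hypothetical objects, not the patent's code.

```python
# Hedged reconstruction of the Table 1 training loop (not the patent's verbatim procedure).
import torch

lr, decay, num_iters, z_dim = 2e-4, 0.9999, 10000, 128  # z_dim is an assumption
opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr)
opt_g = torch.optim.Adam(generator.parameters(), lr=lr)
sched_d = torch.optim.lr_scheduler.ExponentialLR(opt_d, gamma=decay)
sched_g = torch.optim.lr_scheduler.ExponentialLR(opt_g, gamma=decay)

for step, (x, y) in zip(range(num_iters), loader):  # Batch = 64 image/label pairs per step
    # 1) Update the discriminator parameters theta_d first, as the text prescribes.
    z = torch.randn(x.size(0), z_dim)
    fake = generator(z, y).detach()                 # block gradients into the generator
    loss_d = d_hinge_loss(discriminator(x, y), discriminator(fake, y))
    opt_d.zero_grad()
    loss_d.backward()                               # gradient w.r.t. theta_d
    opt_d.step()

    # 2) Then update the generator parameters theta_g.
    z = torch.randn(x.size(0), z_dim)
    loss_g = g_hinge_loss(discriminator(generator(z, y), y))
    opt_g.zero_grad()
    loss_g.backward()                               # gradient w.r.t. theta_g
    opt_g.step()

    sched_d.step()
    sched_g.step()
```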
In particular, in order to quantitatively evaluate the effect of the dual-attention GAN of the invention, the Fréchet Inception Distance (FID) evaluation index may be used. A smaller FID indicates better quality of the generated images. In the FID calculation, feature vectors are first extracted from the generated images and the real images with an Inception V3 network; the normal distributions obeyed by the real data and by the generated data are then estimated respectively, and the distance between the two data distributions is computed.
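The described FID computation can be sketched as follows; the feature matrices are assumed to have already been extracted with an Inception V3 network.

```python
# Sketch of the FID: ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)).
import numpy as np
from scipy import linalg

def frechet_inception_distance(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    """FID between two feature sets, each of shape (num_samples, feature_dim)."""
    mu_r, mu_g = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_g = np.cov(fake_feats, rowvar=False)
    covmean = linalg.sqrtm(sigma_r @ sigma_g)  # matrix square root of the product
    if np.iscomplexobj(covmean):               # drop tiny imaginary numerical noise
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))
```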
The invention uses the ImageNet public dataset to compare the performance of the existing BigGAN model and the proposed dual-attention GAN. Some categories of the ImageNet dataset were extracted, the resolution was adjusted to 128×128, and the FID index was used for evaluation; some generated image results are shown in FIGS. 5(a) and 5(b). The evaluation shows that the BigGAN model obtains an FID of 17.43, while the channel-enhanced dual-attention GAN (SEDA-GAN) of the invention obtains an FID of 14.89, an improvement of 14.57%.
In addition, the structural improvement brought by the invention is evident. Comparing samples generated at random by the BigGAN model with samples generated at random by the SEDA-GAN model, it can be seen that in the BigGAN samples the organ structure of a goldfish is inaccurate, a scene with several goldfish is disordered, the rim of a coffee cup is out of round, and the structures of a truck and a wooden house have defects. In the SEDA-GAN samples, the structural distribution remains natural even with several goldfish, the coffee cup is round, the overall structure of the truck is natural, the structure of the wooden house is straight, and the overall visual effect is improved.
In summary, the invention provides a channel-enhanced dual-attention generative adversarial network. On the one hand, it adds a channel-correlation learning mechanism to the convolution block structure, recalibrates the features, improves the feature learning capability of the model's convolution blocks and strengthens the feature expression capability of the GAN. A Squeeze-and-Excitation operation that acquires channel attention is introduced into the convolution block (ResBlock) structure of the BigGAN model, giving a new convolution block, SEResBlock. A GAN built from the channel-enhanced SEResBlock blocks has better generative performance and learns the data distribution faster. On the other hand, the invention approaches the attention part from the channel perspective and adds a channel attention mechanism that establishes dependencies among feature channels, constructing a dual-attention mechanism module in which this mechanism runs in parallel with the self-attention mechanism and models the latent structural correlations of the data from both positions and channels. In practice, network features carry not only relations between positions but also relational information across feature channels. After the dual-attention mechanism is introduced, the GAN can therefore capture correlation information across a large range of positions and channels simultaneously, learning more information about the structural distribution of the data; this makes the target structures of the generated images more reasonable and greatly helps image generation quality. Compared with the prior art, the channel enhancement and the dual-attention mechanism only slightly increase the total number of network parameters while improving the generative performance of the GAN: the generated data distribution is closer to the original data distribution, the visual effect is better, the quality of the generated images is higher, and the structure of the target objects is more regular and natural.
The invention also provides an image generation method, comprising the following steps:
S1, constructing the dual-attention generative adversarial network described in any of the above;
S2, acquiring a training set and inputting it into the dual-attention GAN for training;
S3, generating images with the trained dual-attention GAN.
In particular, in some preferred embodiments of the present invention there is also provided a computer device comprising a memory and a processor, the memory storing a computer program and the processor implementing the steps of the image generation method of any of the above embodiments when executing the computer program.
In other preferred embodiments of the present invention there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the image generation method described in any of the above embodiments.
Those skilled in the art will appreciate that all or part of the above-described embodiment methods may be accomplished by instructing the relevant hardware through a computer program; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the above-described image generation method embodiments, which are not repeated here.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some technical features can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. An image generation method, characterized by comprising the following steps:
S1, constructing a dual-attention generative adversarial network;
S2, acquiring a training set and inputting it into the dual-attention generative adversarial network for training;
S3, generating images with the trained dual-attention generative adversarial network;
wherein the dual-attention generative adversarial network includes a generator and a discriminator for image generation; the generator comprises a first convolution block and a dual-attention mechanism module, and the output of the generator is a fake image; the discriminator comprises a second convolution block and a dual-attention mechanism module, and is used for discriminating whether an image is real or fake;
the first and second convolution blocks are each provided with a squeeze-and-excitation (SE) layer that acquires channel attention through a squeeze-and-excitation operation;
the dual-attention mechanism module comprises a position attention unit and a channel attention unit in parallel; the position attention unit establishes correlations among positions based on a self-attention mechanism to obtain a position attention feature, and the channel attention unit establishes dependencies among feature channels based on a channel attention mechanism to obtain a channel attention feature; the dual-attention mechanism module fuses the position attention feature and the channel attention feature;
in the dual-attention generative adversarial network, the channel attention unit is configured to perform the following operations:
recombine the feature A to obtain a feature A′; the number of layers of the feature A is C, each layer has size H×W, and the feature A′ has size C×N, with N = H×W;
multiply the feature A′ by the transpose of A′ and apply a softmax to obtain a feature map Q of dimension C×C, in which the element q_ji has the expression:
q_ji = exp(A′_i · A′ᵀ_j) / Σ_{i=1}^{C} exp(A′_i · A′ᵀ_j)
where i, j = 1, 2, …, C, A′_i is the i-th feature vector of the feature A′, and A′ᵀ_j is the j-th feature vector of the transpose of A′;
multiply the feature map Q with the feature A′ and perform the inverse recombination to obtain the channel attention feature T, with the expression:
T_j = β Σ_{i=1}^{C} (q_ji A′_i) + A_j
where T_j denotes the j-th feature vector of the channel attention feature T, j = 1, 2, …, C, β denotes a learning parameter initialized to 0, and A_j denotes the j-th feature vector of the feature A;
in the dual-attention generative adversarial network, the position attention unit is configured to perform the following operations:
channel-compress the feature A with a 1×1 convolution f(x) to obtain a feature B, then recombine it to obtain a feature B′; the number of layers of the feature A is C, each layer has size H×W, the number of layers after compression is C′ (C′ < C), the feature B has dimension C′×H×W, and the dimension after recombination is C′×N;
channel-compress the feature A with a 1×1 convolution g(x) to obtain a feature O, then recombine it to obtain a feature O′; the number of layers after compression is C′, the feature O has dimension C′×H×W, and the dimension after recombination is C′×N;
multiply the feature B′ by the transpose of the feature O′ and apply a softmax to obtain a feature map P of dimension N×N, in which the element p_ji has the expression:
p_ji = exp(b′_i · o′ᵀ_j) / Σ_{i=1}^{N} exp(b′_i · o′ᵀ_j)
where b′_i denotes the i-th feature vector of the feature B′, o′ᵀ_j denotes the j-th feature vector of the transpose of the feature O′, and i, j = 1, 2, …, N;
extract the feature A with a 1×1 convolution h(x) to obtain a feature V, then recombine it to obtain a feature V′; the number of layers after extraction is still C;
multiply the feature V′ with the feature map P and perform the inverse recombination to obtain the position attention feature S, with the expression:
S_j = α Σ_{i=1}^{N} (p_ji v′_i) + A_j
where S_j denotes the j-th feature vector of the position attention feature S, j = 1, 2, …, N, v′_i denotes the i-th feature vector of the feature V′, α denotes a learning parameter initialized to 0, and A_j denotes the j-th feature vector of the feature A;
in the dual-attention generative adversarial network, the dual-attention mechanism module fuses the position attention feature S and the channel attention feature T with a 3×3 convolution J(x) function and a 3×3 convolution K(x) function, expressed as:
U = J(S) + K(T)
where U denotes the resulting fusion feature.
2. The image generation method according to claim 1, wherein in the dual-attention generative adversarial network the SE layer is configured to perform the following operations:
average each layer of the feature M, compressing each layer into a single value to obtain a one-dimensional vector s, with the expression:
s_n = (1/(H×W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} m^n_{i,j}
where the number of layers of the feature M is C, each layer has size H×W, s_n denotes the n-th element of the one-dimensional vector s, m_n denotes the n-th layer of the feature M, n = 1, …, C, and m^n_{i,j} denotes the element of m_n at coordinate (i, j);
activate the one-dimensional vector s through a two-layer nonlinear fully connected network to learn a weight feature vector w holding the weight proportion of each layer, with the expression:
w = σ_2(W_2 σ_1(W_1 s))
where W_1 and W_2 denote the first and second fully connected operations, and σ_1 and σ_2 are the ReLU and Sigmoid activation functions respectively;
multiply the weight feature vector w into the corresponding layers of the feature M to obtain a calibrated feature M̃, with the expression:
m̃_n = w_n × m_n
where w_n denotes the n-th element of the weight feature vector w and m̃_n denotes the n-th layer of the feature M̃.
3. The image generation method according to claim 2, wherein in the dual-attention generative adversarial network the SE layer comprises an average pooling layer, a 1×1 convolution layer, a ReLU activation layer, a second 1×1 convolution layer and a Sigmoid activation layer connected in sequence, the output being multiplied channel-wise with the input to obtain the calibrated feature.
4. The image generation method according to claim 3, wherein in the dual-attention generative adversarial network the first convolution block comprises two linear layers, two batch normalization layers, two ReLU activation layers, two upsampling layers, two 3×3 convolution layers, a 1×1 convolution layer and the SE layer; and
the second convolution block comprises two ReLU activation layers, two 3×3 convolution layers, a 1×1 convolution layer, two average pooling layers and the SE layer.
5. The image generation method according to claim 1, wherein in the dual-attention generative adversarial network the generator comprises, connected in sequence: a linear layer, a first and a second first-type convolution block, a dual-attention mechanism module, a third first-type convolution block, a first activation module and a Tanh layer;
the input of the generator is a noise vector z and a class condition; the noise vector z follows a normal distribution, the class condition is embedded into each batch normalization layer of each first-type convolution block, and the output of the generator is a fake image;
the discriminator comprises, connected in sequence: a first second-type convolution block, a dual-attention mechanism module, the second, third and fourth second-type convolution blocks, a second activation module, and a linear transformation and label embedding layer;
the input of the discriminator is an RGB image and a label y, and the output is the discrimination result for the RGB image.
6. The image generation method according to claim 1, wherein in the dual-attention generative adversarial network the loss function of the discriminator is:
L_D = E_{(x,y)~p_data}[max(0, 1 − D(x, y))] + E_{z~p_z, y~p_data}[max(0, 1 + D(G(z), y))]
and the loss function of the generator is:
L_G = −E_{z~p_z, y~p_data}[D(G(z), y)]
where x denotes an image, y is the corresponding class label, p_data is the real data probability distribution, p_z is the noise probability distribution, z denotes a one-dimensional noise vector, G(z) denotes the mapping performed by the generator, D(x, y) denotes the mapping performed by the discriminator, E_{(x,y)~p_data}[·] denotes an expectation over (x, y) obeying p_data, and E_{z~p_z, y~p_data}[·] denotes an expectation over z obeying p_z and y obeying p_data.
CN202011470128.6A 2020-12-14 2020-12-14 Channel-enhanced dual-attention generative adversarial network and image generation method Active CN112580782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011470128.6A CN112580782B (en) 2020-12-14 2020-12-14 Channel-enhanced dual-attention generative adversarial network and image generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011470128.6A CN112580782B (en) 2020-12-14 2020-12-14 Channel-enhanced dual-attention generative adversarial network and image generation method

Publications (2)

Publication Number Publication Date
CN112580782A CN112580782A (en) 2021-03-30
CN112580782B (en) 2024-02-09

Family

ID=75135850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011470128.6A Active CN112580782B (en) 2020-12-14 2020-12-14 Channel-enhanced dual-attention generative adversarial network and image generation method

Country Status (1)

Country Link
CN (1) CN112580782B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095330A (en) * 2021-04-30 2021-07-09 Liaoning Technical University Compressive attention model for semantically segmenting pixel groups
CN113223181B (en) * 2021-06-02 2022-12-23 Guangdong University of Technology Weak texture object pose estimation method
CN113627590A (en) * 2021-07-29 2021-11-09 China Automotive Innovation Corporation Attention module and attention mechanism of convolutional neural network and convolutional neural network
CN113344146B (en) * 2021-08-03 2021-11-02 Wuhan University Image classification method and system based on double attention mechanism and electronic equipment
CN113744265B (en) * 2021-11-02 2022-02-01 Chengdu Dongfang Tiancheng Intelligent Technology Co., Ltd. Anomaly detection system, method and storage medium based on generation countermeasure network
CN115937994B (en) * 2023-01-06 2023-05-30 Nanchang University Data detection method based on deep learning detection model
CN116385725B (en) * 2023-06-02 2023-09-08 Hangzhou Juxiu Technology Co., Ltd. Fundus image optic disk and optic cup segmentation method and device and electronic equipment
CN117011918B (en) * 2023-08-08 2024-03-26 Nanjing Institute of Technology Method for constructing human face living body detection model based on linear attention mechanism

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020140633A1 (en) * 2019-01-04 2020-07-09 Ping An Technology (Shenzhen) Co., Ltd. Text topic extraction method, apparatus, electronic device, and storage medium
CN111429433A (en) * 2020-03-25 2020-07-17 Beijing University of Technology Multi-exposure image fusion method based on attention generation countermeasure network
CN111476717A (en) * 2020-04-07 2020-07-31 Xidian University Face image super-resolution reconstruction method based on self-attention generation countermeasure network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A text-to-image generative adversarial network based on a self-attention mechanism; Huang Hongyu (黄宏宇), Gu Zifeng (谷子丰); Journal of Chongqing University, issue (03); full text *
An image inpainting method based on multi-loss constraints and attention blocks; Cao Zhen (曹真), Yang Yun (杨云), Qi Yong (齐勇), Li Chenghui (李程辉); Journal of Shaanxi University of Science and Technology, issue (03); full text *

Also Published As

Publication number Publication date
CN112580782A (en) 2021-03-30

Similar Documents

Publication number and title
CN112580782B (en) Channel-enhanced dual-attention generative adversarial network and image generation method
CN110516085B (en) Image text mutual retrieval method based on bidirectional attention
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN110706302B (en) System and method for synthesizing images by text
CN108596329A (en) Threedimensional model sorting technique based on end-to-end Deep integrating learning network
CN112818764B (en) Low-resolution image facial expression recognition method based on feature reconstruction model
CN111861945B (en) Text-guided image restoration method and system
CN110728629A (en) Image set enhancement method for resisting attack
CN113642621A (en) Zero sample image classification method based on generation countermeasure network
CN110852935A (en) Image processing method for human face image changing with age
CN106056059A (en) Multidirectional SLGS characteristic description and performance cloud weight fusion face recognition method
CN110390107A (en) Hereafter relationship detection method, device and computer equipment based on artificial intelligence
CN114004333A (en) Oversampling method for generating countermeasure network based on multiple false classes
CN103077408A (en) Method for converting seabed sonar image into acoustic substrate classification based on wavelet neutral network
CN111222583B (en) Image steganalysis method based on countermeasure training and critical path extraction
CN113837229B (en) Knowledge-driven text-to-image generation method
CN113298689B (en) Large-capacity image steganography method
CN112766381B (en) Attribute-guided SAR image generation method under limited sample
CN113420833A (en) Visual question-answering method and device based on question semantic mapping
CN105046286A (en) Supervision multi-view feature selection method based on automatic generation of view and unit with l1 and l2 norm minimization
CN110503157B (en) Image steganalysis method of multitask convolution neural network based on fine-grained image
CN110210562B (en) Image classification method based on depth network and sparse Fisher vector
CN115222998B (en) Image classification method
CN114782403A (en) Pneumonia image detection method and device based on mixed space and inter-channel attention
CN114581789A (en) Hyperspectral image classification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant