CN114612589A

CN114612589A - Application of an attention-based stable generative adversarial network in style transfer

Info

Publication number: CN114612589A
Application number: CN202210250457.2A
Authority: CN
Inventors: 李庚隆; 徐蔚鸿; 张康
Original assignee: Changsha University of Science and Technology
Current assignee: Changsha University of Science and Technology
Priority date: 2022-03-15
Filing date: 2022-03-15
Publication date: 2022-06-10

Abstract

An attention-based stable generative adversarial network (ASGAN) is provided, which effectively enlarges the receptive field. The ASGAN training process is stabilized by using instance normalization and the AdaBelief optimizer, and the network is applied to image style transfer. Theoretical analysis shows that ASGAN has a lower computational cost and fewer total parameters than CycleGAN. Qualitative and quantitative analysis of the experiments shows that, compared with CycleGAN and AGGAN, ASGAN achieves a better and more stable image style transfer result and performs better on the PSNR, SSIM, LPIPS, and FID metrics.

Description

Application of an attention-based stable generative adversarial network in style transfer

Technical Field

The invention relates to the field of style transfer, and in particular to the application of an attention-based stable generative adversarial network in style transfer.

Background

In recent years, with the development of deep learning, the generative adversarial network (GAN) model, proposed by Ian Goodfellow while he was studying at the Université de Montréal, has been continuously improved. While researching generative models, Goodfellow hoped to generate pictures by simulating the way the human brain thinks, but the quality of the generated pictures was never ideal: the images were blurred and unclear. He came to doubt the approach of using a single traditional neural network and proposed a completely new idea, namely using two neural networks simultaneously in a game-like adversarial relationship. This was the original idea of the generative adversarial network. Today this game-theoretic idea is applied in many areas, such as images, speech, and network security.

With the development of GANs, researchers have analyzed them from different perspectives. Arjovsky et al. analyzed some of the problems of the original GANs and proposed research directions based on these problems, which laid a guiding foundation for the subsequent development of GANs. Kurach et al. analyzed the regularization methods, normalization methods, and network structures used in GANs, some of which are redundant, and, by adopting the latest loss functions and network structures, obtained better results than conventional GANs. Bau et al. demonstrated a number of practical applications of their analysis framework, from comparing the internal representations of different layers, models, and data sets, to improving GANs by locating and removing the units that cause artifacts, thereby enabling interactive control of objects in a scene. Lucic et al. analyzed the best-performing GAN models in the field and found that their experimental results did not differ greatly; they further proposed optimizing from the perspective of computing resources, opening a new path for the development of GANs. Arora et al. analyzed the assumptions behind GANs and found that the objective function cannot solve the problems of mode collapse and of learning meaningless features. Mescheder et al. demonstrated that discontinuous distributions in the training samples are one cause of the non-convergence of unregularized GANs, and also analyzed several recently proposed regularization methods for stabilizing training. Nagarajan et al. analyzed the convergence of GAN training from the perspective of training dynamics.

The GAN variants proposed so far fall into two main categories. The first category modifies the network structure, for example using a convolutional neural network when processing images or a recurrent neural network when processing time series data. The second category modifies the loss function, which can make the learning of the generator more stable.

A generative adversarial network comprises two network structures, a generator and a discriminator, which can be combined to form a GAN or used separately. GANs are therefore highly adaptable, and GANs with different network structures have been proposed for different fields. The original GAN uses fully connected neural networks and can only process relatively simple image data sets such as MNIST, CIFAR-10, and the Toronto Face Dataset; it performs poorly on complex image types and high-resolution image data sets. Traditional GANs are also prone to unstable training, so although the GAN was an innovation among generative models, it has a number of defects. Because the original GAN uses fully connected networks, it has many parameters; to solve this problem, CNN-based GAN models were proposed. The deep convolutional generative adversarial network (DCGAN) replaces the fully connected layers of the original GAN with a CNN, which greatly reduces the amount of computation, and modifies the network structure, achieving better results than the original GAN. However, the model still has problems: for example, training still becomes unstable as training time increases. To widen the range of applications of GANs, the conditional GAN (CGAN) was introduced, which extends the original GAN into a conditional model and constrains the output of the network by adding extra conditions. Although the results shown for CGAN are very basic, they demonstrate the potential of conditional adversarial networks and show great application prospects.

Although the previously proposed GANs have been successful in theory, researchers have found that many problems still occur when training them, the most significant of which is the extreme instability of training. One way to stabilize the training process is to start from the loss function. For example, the Wasserstein GAN (WGAN) stabilizes training by introducing the Wasserstein distance, which is smoother than the KL divergence and the JS divergence, so that in theory it better solves the vanishing-gradient problem and stabilizes the training process more effectively. However, the way WGAN constrains the gradient of the discriminator is not reasonable, so WGAN-GP uses a gradient penalty to make gradient updates smooth, that is, to satisfy the 1-Lipschitz condition, which addresses vanishing and exploding gradients during training. RSGAN can produce more stable, higher-quality data than other GAN variants. A standard RSGAN with gradient penalty generates data of better quality than WGAN-GP and reduces by 400% the time needed to reach the level of the best network architecture among contemporaneous GAN variants. Compared with GANs and LSGAN, RSGANs can generate reasonably high-resolution images from a very small sample, and the image quality is significantly better than that of WGAN-GP.

The original GAN has been improved in both network structure and loss function, so that training stability and training results have improved greatly. Meanwhile, applications of GANs have also developed continuously, and style transfer is a major hotspot among them. The most pioneering work applied CGAN to image style transfer, offering conditional adversarial networks as a general solution to the image-to-image translation problem. This model can learn features of the input image and transfer them to the output image: for example, it can turn a night scene of a place into a day scene, or convert a label map into a realistic image. However, the model requires paired data sets, that is, a daytime picture of a place must be matched with a night picture of the same location, so acquiring the data set is often difficult. To solve this problem, CycleGAN was proposed. By learning a cycle consistency loss, this network model achieves good style transfer without paired data.

Nowadays most GANs use CNNs, but a CNN captures only local spatial information, and its receptive field is not large enough to cover the whole image. This makes it difficult to learn data sets with many categories and may also shift key parts of an image, for example placing facial features in the wrong positions when generating human faces. The attention-based GAN SAGAN was therefore proposed. It combines Non-local Neural Networks with GANs so that the network obtains a larger receptive field without reducing computational efficiency; the greatest advantage of the model is that it handles geometric structure well. When a Non-local block captures the correlation of a given position, it computes the global correlation between that position and all positions in the picture, which increases the amount of computation. To solve this problem while maintaining accuracy, GCNet establishes a unified three-step general framework for global context modeling. It is a lightweight model that captures global information effectively.

Currently, common optimizers can be roughly divided into two types: adaptive methods (such as Adam) and accelerated schemes (such as stochastic gradient descent, SGD, with momentum). For many models (such as convolutional neural networks), adaptive methods generally converge faster than SGD but generalize worse. For complex cases such as GANs, adaptive methods are often used by default because of their stability. The AdaBelief optimizer adjusts the step size according to its "belief" in the current gradient direction, taking the exponential moving average (EMA) of the noisy gradients as the prediction of the next gradient. If the observed gradient deviates significantly from the prediction, the current observation is not trusted and a smaller step is taken; if the observed gradient is close to the prediction, the current observation is trusted and a larger step is taken. Experiments show that the optimizer offers three advantages at once: the fast convergence of adaptive methods, good generalization, and training stability.
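The step-size behaviour described above can be illustrated with a minimal scalar sketch. It assumes the commonly published form of the AdaBelief update (EMA of gradients `m`, EMA of the squared deviation `(g - m)^2`, bias correction); the hyperparameter defaults and the helper `last_step_size` are illustrative assumptions, not values taken from this document.

```python
import math

def adabelief_step(theta, grad, m, s, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-16):
    """One AdaBelief update for a scalar parameter; returns (theta, m, s)."""
    # EMA of gradients: acts as the prediction of the next gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    # EMA of the squared deviation between observed and predicted gradient.
    s = beta2 * s + (1.0 - beta2) * (grad - m) ** 2 + eps
    # Bias correction for the zero-initialised moving averages.
    m_hat = m / (1.0 - beta1 ** t)
    s_hat = s / (1.0 - beta2 ** t)
    # Small deviation (high "belief") gives a large step, and vice versa.
    theta = theta - lr * m_hat / (math.sqrt(s_hat) + eps)
    return theta, m, s

def last_step_size(grads, lr=1e-3):
    """Hypothetical helper: size of the final update for a gradient sequence."""
    theta, m, s, step = 0.0, 0.0, 0.0, 0.0
    for t, g in enumerate(grads, start=1):
        new_theta, m, s = adabelief_step(theta, g, m, s, t, lr=lr)
        step = abs(new_theta - theta)
        theta = new_theta
    return step

steady = last_step_size([1.0] * 20)        # gradients agree with the prediction
noisy = last_step_size([3.0, -1.0] * 10)   # same mean gradient, large deviations
print(steady, noisy)
```

With consistent gradients the final step should come out larger than with noisy gradients of the same mean, which is the "trust the observation, take a larger step" behaviour.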

Research on CycleGAN is increasing and its transfer results keep improving. However, some problems remain: first, CycleGAN's transfer results are unsatisfactory when geometric changes are involved; second, the quality of the generated images still has room for improvement.

Disclosure of Invention

The invention aims to solve the problem of poor transfer results in the field of style transfer, and provides an attention-based stable generative adversarial network.

The purpose of the invention can be achieved by the following technical scheme:

an application of an attention-based stable generative adversarial network in style transfer, comprising the following steps:

1) selecting a data set for transfer from the official style transfer data sets, in order to realize transfer between the two domains in the data set;

2) inputting the sample data of the first domain into a first generator to generate the transferred second-domain images;

3) passing the generated second-domain images to a discriminator to obtain a discrimination result, and calculating the adversarial loss of the first-domain transfer;

4) inputting the generated second-domain images into a second generator to generate first-domain images, and calculating the cycle consistency loss of the first domain;

5) performing the same steps on the second-domain images, and calculating the adversarial loss and the cycle consistency loss of the second domain;

6) adding all the losses to obtain the total loss;

7) fixing the parameters of the discriminators so that no gradient descent is performed on them, and back-propagating and updating the parameters of the generators;

8) enabling gradient descent for the discriminators, and back-propagating and updating their parameters;

9) optimizing the model through repeated iteration to finally obtain a trained model;

10) inputting the test set images into the trained model to obtain the test results;

11) evaluating the test results of the trained model with the PSNR, SSIM, LPIPS, and FID metrics, and outputting the measured results.

In step 3), the adversarial loss is calculated as follows:

for the mapping function G: X → Y and its discriminator D_Y, the adversarial loss is expressed as:

$$L_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))] \tag{1}$$

for the mapping function F: Y → X and its discriminator D_X, the adversarial loss is expressed as:

$$L_{GAN}(F, D_X, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}[\log D_X(x)] + \mathbb{E}_{y \sim p_{data}(y)}[\log(1 - D_X(F(y)))] \tag{2}$$

G in formula (1) denotes the generator from domain X to domain Y, and D_Y denotes the discriminator of domain Y;

F in formula (2) denotes the generator from domain Y to domain X, and D_X denotes the discriminator of domain X;

in formulas (1) and (2), $x \sim p_{data}(x)$ and $y \sim p_{data}(y)$ denote training examples of domain X and domain Y, respectively;

in step 4), the cycle consistency loss is calculated as follows:

$$L_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\lVert F(G(x)) - x \rVert_1] + \mathbb{E}_{y \sim p_{data}(y)}[\lVert G(F(y)) - y \rVert_1] \tag{3}$$

in step 6), the total loss is calculated as follows:

$$L(G, F, D_X, D_Y) = L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X) + \lambda L_{cyc}(G, F) \tag{4}$$

in formula (4), λ is the weight of the cycle consistency loss and represents the correlation between the two domains;

in step 9), the model is optimized with the following objective:

$$G^*, F^* = \arg\min_{G, F} \max_{D_X, D_Y} L(G, F, D_X, D_Y) \tag{5}$$

in formula (5), G^* and F^* are the generators under the optimal condition;
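The loss terms of steps 3) to 6) can be sketched numerically as follows. This is a minimal illustration that assumes the standard CycleGAN forms of the adversarial loss (log-likelihood form) and the cycle consistency loss (L1 norm). The function names, the toy inputs, and the default `lam=10.0` are assumptions made for illustration, since the document does not state a value for λ.

```python
import math

def adversarial_loss(d_real, d_fake):
    """Mean log D(real) plus mean log(1 - D(fake)).

    d_real, d_fake: lists of discriminator outputs in the open interval (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def l1_norm(a, b):
    """L1 distance between two flattened images of equal length."""
    return sum(abs(u - v) for u, v in zip(a, b))

def cycle_loss(x, x_rec, y, y_rec):
    """||F(G(x)) - x||_1 + ||G(F(y)) - y||_1 for one sample per domain."""
    return l1_norm(x_rec, x) + l1_norm(y_rec, y)

def total_loss(loss_gan_xy, loss_gan_yx, loss_cyc, lam=10.0):
    """Sum of both adversarial losses and the weighted cycle loss."""
    return loss_gan_xy + loss_gan_yx + lam * loss_cyc

# Toy values: an undecided discriminator outputs 0.5 everywhere.
gan_xy = adversarial_loss([0.5, 0.5], [0.5, 0.5])
gan_yx = adversarial_loss([0.5], [0.5])
cyc = cycle_loss([0.0, 1.0], [0.1, 0.9], [1.0, 1.0], [1.0, 0.5])
print(total_loss(gan_xy, gan_yx, cyc))
```

In an actual training loop the generators are updated to minimize this total while the discriminators are updated to maximize their adversarial terms, which is why steps 7) and 8) alternate between freezing and unfreezing the discriminators.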

The beneficial effects of the invention are as follows:

First, addressing the style transfer problem, the invention provides an attention-based stable generative adversarial network that effectively improves the image transfer result.

Second, the invention uses an attention mechanism to perform secondary feature extraction on the input features.

Third, the invention uses a sub-pixel convolution mechanism to improve up-sampling in the model, thereby improving the image transfer result.

Fourth, a spectral normalization mechanism is used to stabilize the model training process.

Fifth, a weight coefficient is added to the attention map extracted by the attention mechanism, and the proportion of attention in the network is adjusted dynamically by modifying this coefficient.

Sixth, the AdaBelief optimizer is used to further optimize the model and improve the generation result.
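To make the sub-pixel convolution mentioned above more concrete, the following sketch shows only its rearrangement step (often called pixel shuffle), which follows a convolution that outputs C·r² channels. The channel ordering used here is the common convention and is an assumption; the document does not specify one. The function name and the nested-list tensor layout are illustrative.

```python
def pixel_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r].

    Each group of r*r input channels supplies the r x r block of output
    pixels for one output channel, so spatial resolution grows by r.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(h * r):
            for j in range(w * r):
                # Offset inside the r x r block selects the source channel.
                src = c * r * r + (i % r) * r + (j % r)
                out[c][i][j] = x[src][i // r][j // r]
    return out

# Four 1x1 channels become one 2x2 channel.
features = [[[1]], [[2]], [[3]], [[4]]]
upsampled = pixel_shuffle(features, 2)
print(upsampled)  # [[[1, 2], [3, 4]]]
```

Unlike a transposed convolution, this step has no learnable parameters of its own; all learning happens in the preceding convolution.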

Drawings

FIG. 1 is a diagram of the network model of an embodiment of the present invention.

FIG. 2 is a diagram of the attention mechanism network model used in an embodiment of the present invention.

FIG. 3 is a diagram of the generator network model of an embodiment of the present invention.

FIG. 4 is a diagram of the discriminator network model of an embodiment of the present invention.

FIG. 5 compares the results of an embodiment of the present invention on the apple2orange dataset with those of CycleGAN and AGGAN.

FIG. 6 compares the results of an embodiment of the present invention on the horse2zebra dataset with those of CycleGAN and AGGAN.

Detailed Description

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Examples

In an application of an attention-based stable generative adversarial network in style transfer, images of two domains are taken from a data set, and a generator and a discriminator augmented with an attention mechanism extract features from the input images and are trained on them, realizing transfer between the two domains. The network model is shown in FIG. 1, the attention mechanism used in the model in FIG. 2, the generator network model in FIG. 3, and the discriminator network model in FIG. 4. The specific steps are as follows:

1. selection of data sets

We selected horse2zebra and apple2orange from the official style transfer data sets. The horse2zebra training set includes 1067 horse images and 1334 zebra images, and its test set includes 120 horse images and 140 zebra images. The apple2orange training set includes 995 apple images and 1019 orange images, and its test set includes 266 apple images and 248 orange images.

2. The new generator extracts input image features and performs the transfer

Nine residual blocks are used in the new generator. To further extract image features in the generator, an attention mechanism is used to extract features a second time. The generator network model is shown in FIG. 3.

3. The new discriminator discriminates the transfer result of the generator

To improve the accuracy of the discriminator, an attention mechanism is also added to the discriminator to further extract image features on the basis of the original feature map. The discriminator network model is shown in FIG. 4.

4. Training model

The total loss was calculated and back-propagated, and the trained model was obtained after 200 iterations in the experimental environment of Table 1.

Table 1 Experimental environment

Name	Configuration
Operating system	Ubuntu 18.04
GPU	NVIDIA GeForce RTX 2080 Ti
CPU	Intel Xeon Processor (Skylake, IBRS), 2
RAM	16 GB
GPU libraries	CUDA 10.2, cuDNN 7.6
Deep learning framework	PyTorch

5. Test model

The test set images in the data set are input into the generator of the trained model to obtain the test results. The trained model is then evaluated with the PSNR, SSIM, LPIPS, and FID metrics to measure the transfer result.

6. Description of evaluation index

We evaluated our test results using PSNR, SSIM, LPIPS, FID evaluation indices.

PSNR stands for peak signal-to-noise ratio and is defined via the mean square error (MSE). Given a current image G and a reference image H of the same size m × n, the MSE and the PSNR are defined as:

$$MSE = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} [G(i, j) - H(i, j)]^2 \tag{6}$$

$$PSNR = 10 \cdot \log_{10}\left(\frac{MAX^2}{MSE}\right) \tag{7}$$

In equations (6) and (7), MAX is the maximum possible pixel value of the picture, MSE is the mean square error between the current image G and the reference image H, and m and n are the height and width of the image, respectively.
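As a worked illustration of the MSE and PSNR definitions, the sketch below computes both for small grayscale images stored as nested lists. The function names and the default `max_val=255.0` (8-bit images) are assumptions for this example.

```python
import math

def mse(g, h):
    """Mean square error between two m x n images given as nested lists."""
    m, n = len(g), len(g[0])
    total = sum((g[i][j] - h[i][j]) ** 2 for i in range(m) for j in range(n))
    return total / (m * n)

def psnr(g, h, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(g, h)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / err)

current = [[10, 20], [30, 40]]
reference = [[12, 18], [32, 38]]
print(mse(current, reference))   # 4.0
print(psnr(current, reference))  # about 42.11 dB
```

A higher PSNR means the transferred image is numerically closer to the reference image.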

SSIM is an index for measuring the similarity of two images. It is computed from the luminance, contrast, and structure of samples x and y. The more similar the two images, the higher the SSIM value.
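The combination of luminance, contrast, and structure can be sketched with the commonly used closed form of SSIM. This example is a simplification: it computes a single global value over flattened images, whereas practical implementations average SSIM over local windows. The constants `0.01` and `0.03` are the customary stabilizers and are assumptions here, as is the function name.

```python
def ssim_global(x, y, max_val=255.0):
    """Single-window SSIM over two flattened images of equal length."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small constants keep the ratio stable when means or variances are near zero.
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

image = [0, 255, 0, 255]
inverted = [255, 0, 255, 0]
print(ssim_global(image, image))     # identical images score 1
print(ssim_global(image, inverted))  # structurally opposite images score below 0
```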

LPIPS is a newer evaluation metric that simulates human perception. Its calculation relies on a VGG network, through which deep features of different structures and tasks in the pictures are extracted.

FID estimates the distributions of the real images and the generated images in the feature space of a deep neural network and calculates the distance between the two distributions. The result correlates closely with human perception: the smaller the FID value, the closer the generated images are to the real images.
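For reference, the distance used by FID is usually written as the Fréchet distance between two Gaussians fitted to the deep features. The symbols below are assumptions introduced for this illustration: $\mu_r, \Sigma_r$ are the mean and covariance of the features of the real images, and $\mu_g, \Sigma_g$ those of the generated images.

$$FID = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)$$

The first term compares the feature means and the second compares the feature covariances, so the value is zero only when both fitted Gaussians coincide.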

7. Description of comparative model

ASGAN is compared with CycleGAN and AGGAN.

CycleGAN is used for image style transfer between unpaired data. It uses a cycle consistency loss to learn not only the mapping from the source domain to the target domain but also the reverse mapping from the target domain to the source domain.

AGGAN addresses the difficulty that traditional unsupervised image style transfer techniques have in focusing on a single object or multiple objects without changing the background of the picture. It does so by introducing an unsupervised attention mechanism that is trained adversarially together with the generator and the discriminator.

8. Computational cost analysis

It is assumed that the input of the ASGAN generator and discriminator has dimension H × W × C, that the input feature map of a given computing unit has dimension h × w × c, and that the processed output feature map has dimension h′ × w′ × c′. The number of executions of each simple operation in every basic computing unit involved is listed in Table 2.

Table 2 Number of executions of each simple operation in the basic computing units contained in ASGAN

Basic computing unit	S_×	S_÷	S_+	S_−	S_{>,<,≥,≤,==,≠}	S_=
Convolution	ck²h′w′c′	0	ck²h′w′c′+h′w′c′	0	0	h′w′c′
ReLU	0	0	0	0	hwc	h′w′c′
Tanh	6hwc	hwc	13hwc	hwc	0	h′w′c′
Softmax	hwc	hwc	4hwc−1	hwc	hwc−1	h′w′c′
Adding	0	0	hwc	0	0	h′w′c′
Mapping	0	0	0	0	0	h′w′c′
Matmuling	hwc	0	h(w−1)c	0	0	h′w′c′
Arranging	0	0	0	0	0	h′w′c′

In Table 2, S_×, S_÷, S_+, S_−, and S_= denote the number of executions of the simple operations of multiplication, division, addition, subtraction, and assignment, respectively, and S_{>,<,≥,≤,==,≠} denotes the number of executions of the comparison operations greater than, less than, greater than or equal to, less than or equal to, equal to, and not equal to. The Convolution computing unit represents convolution or transposed convolution, and k is the convolution kernel size. The ReLU, Tanh, and Softmax computing units are activation functions, which can be decomposed into combinations of several simple operations. The Adding, Mapping, Matmuling, and Arranging computing units represent matrix addition, matrix mapping, matrix multiplication, and the sequential arrangement of row vectors into a three-dimensional matrix, respectively. The Tanh and Softmax activation functions use an approximation of e^x that requires one multiplication, three additions, and two assignments, with an average error in the range 0.02 to 0.04.
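The per-unit counts can be turned into numbers once the feature-map dimensions are known. The sketch below does this for the Convolution unit only, using the expressions listed for it in the operation-count table (multiplications ck²h′w′c′, additions ck²h′w′c′ + h′w′c′, assignments h′w′c′, no divisions). The function name and the example layer sizes are hypothetical.

```python
def conv_op_counts(c, k, h_out, w_out, c_out):
    """Simple-operation counts for one convolution unit.

    c: input channels, k: kernel size,
    h_out, w_out, c_out: output feature map dimensions (h', w', c').
    """
    outputs = h_out * w_out * c_out
    mul = c * k * k * outputs          # one multiply per kernel weight per output
    return {
        "mul": mul,
        "div": 0,
        "add": mul + outputs,          # accumulation plus one bias add per output
        "assign": outputs,             # one write per output element
    }

# Hypothetical layer: 3 input channels, 3x3 kernel, 4x4x8 output feature map.
counts = conv_op_counts(c=3, k=3, h_out=4, w_out=4, c_out=8)
print(counts)  # {'mul': 3456, 'div': 0, 'add': 3584, 'assign': 128}
```

Summing such counts over every unit in the generator and discriminator is how the per-model totals in the following tables are accumulated.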

According to FIG. 3 and FIG. 4, when the dimension of the model input is fixed, the dimensions of the input and output feature maps of each basic computing unit are also fixed. Table 3 is then obtained by accumulating, based on Table 2 and these fixed input and output dimensions, the actual number of executions of the simple operations in each basic computing unit of ASGAN.

TABLE 3 Total number of executions of each simple operation in ASGAN

Using the same counting method as for ASGAN, the computational cost of CycleGAN is obtained as shown in Table 4.

Table 4 total number of executions of each simple operation in CycleGAN.

Simple operation	Number of executions
S_×	13068HWC + 1794376HW
S_÷	2HWC
S_+	13084HWC + 1795656.03125HW
S_−	2HWC
S_{>,<,≥,≤,==,≠}	776HW
S_=	20HWC + 2128.03125HW

Because multiplication and division have similar instruction cycles, the computation costs of both are similar, as are the computation costs of addition, subtraction, comparison, and assignment. From tables 3 and 4, it can be seen that:

the computational cost of ASGAN multiplication and division is:

$$Cost_{ASGAN} = 13070HWC + 1499784.25HW + 131072 \tag{8}$$

the computational cost of ASGAN addition, subtraction, comparison and assignment is:

the computational cost of CycleGAN multiplication and division is:

$$Cost_{CycleGAN} = 13070HWC + 1794376HW \tag{10}$$

the computational cost of CycleGAN addition, subtraction, comparison and assignment, obtained by summing the corresponding rows of Table 4, is:

$$13106HWC + 1798560.0625HW \tag{11}$$

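from equations (8) and (10), the relative reduction in the computational cost of ASGAN compared with CycleGAN when considering multiplication and division is:

$$\frac{Cost_{CycleGAN} - Cost_{ASGAN}}{Cost_{CycleGAN}} = \frac{294591.75HW - 131072}{13070HWC + 1794376HW} \tag{12}$$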

from equations (9) and (11), the reduced computational cost of ASGAN compared to cycleGAN when considering addition, subtraction, comparison and assignment is:

where H ≥ 1, W ≥ 1, and C ≥ 1. As can be seen from equations (12) and (13), compared with CycleGAN, ASGAN reduces the computational cost of multiplication and division by between 0 and 16.30%, and the computational cost of addition, subtraction, comparison, and assignment by between 0 and 16.27%. Therefore, the running time of ASGAN is less than that of CycleGAN.
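The stated upper bound for multiplication and division can be checked numerically from equations (8) and (10). In the sketch below the function names and the 256 × 256 single-channel input are assumptions chosen for illustration; only the coefficients come from the equations.

```python
def mul_div_cost_asgan(h, w, c):
    """Equation (8): multiplication and division count for ASGAN."""
    return 13070 * h * w * c + 1499784.25 * h * w + 131072

def mul_div_cost_cyclegan(h, w, c):
    """Equation (10): multiplication and division count for CycleGAN."""
    return 13070 * h * w * c + 1794376 * h * w

def relative_reduction(h, w, c):
    """Fraction of CycleGAN's multiply/divide cost that ASGAN saves."""
    base = mul_div_cost_cyclegan(h, w, c)
    return (base - mul_div_cost_asgan(h, w, c)) / base

# Hypothetical input sizes: 256 x 256 with 1 and 3 channels.
print(round(100 * relative_reduction(256, 256, 1), 2))  # 16.3
print(round(100 * relative_reduction(256, 256, 3), 2))
```

The saving approaches 16.30% for large single-channel inputs and shrinks as the channel count C grows, because the shared 13070HWC term then dominates both costs.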

9. Analysis of total amount of parameters

The number of parameters required by a convolution basic computing unit is k²·c·c′ + c′, where k is the size of the convolution kernel, c is the number of channels of the convolution input feature map, and c′ is the number of channels of the convolution output feature map [43]. Thus, the total numbers of parameters of ASGAN and CycleGAN are:

$$P_{ASGAN} = 14592C + 27773058 \tag{14}$$

$$P_{CycleGAN} = 14592C + 28241920 \tag{15}$$

from equations (14) and (15), one can obtain:

$$\frac{P_{CycleGAN} - P_{ASGAN}}{P_{CycleGAN}} = \frac{468862}{14592C + 28241920} \tag{16}$$

where C ≥ 1. As can be seen from equation (16), the reduction in the number of parameters of ASGAN compared with CycleGAN is in the range of 0 to 1.66%. ASGAN thus occupies less computer memory than CycleGAN.
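The parameter formula and the stated 1.66% bound can be reproduced with a few lines. The function names and the example layer (3 × 3 kernel, 64 to 128 channels) are assumptions for illustration; the totals come from equations (14) and (15).

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of one convolution: k^2 * c * c' + c'."""
    return k * k * c_in * c_out + c_out

def params_asgan(c):
    """Equation (14): total parameters of ASGAN for C input channels."""
    return 14592 * c + 27773058

def params_cyclegan(c):
    """Equation (15): total parameters of CycleGAN for C input channels."""
    return 14592 * c + 28241920

def param_reduction(c):
    """Fraction of CycleGAN's parameters that ASGAN saves."""
    return (params_cyclegan(c) - params_asgan(c)) / params_cyclegan(c)

print(conv_params(3, 64, 128))                 # 73856
print(round(100 * param_reduction(1), 2))      # 1.66
print(round(100 * param_reduction(3), 2))      # 1.66
```

Because the difference between the two totals is a constant 468862 parameters, the relative saving is largest at C = 1 and decreases slowly as C grows.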

10. Description of qualitative comparison

The results of the models on the apple2orange dataset are shown in FIG. 5. For the apple-to-orange transfer, in the first row the results of both CycleGAN and AGGAN show blurring artifacts of varying degrees, while ASGAN shows no artifacts. In the second row, CycleGAN fails to transform the lower right of the apple completely, and AGGAN produces a watermark at the same position. For the orange-to-apple transfer, in the third row both CycleGAN and AGGAN destroy the background of the input image, while ASGAN transfers the orange to an apple well. In the fourth row, both CycleGAN and AGGAN create artifacts that degrade the transfer result, while ASGAN achieves a better transfer between the two domains.

The test results of the models on horse2zebra are shown in FIG. 6. For the horse-to-zebra transfer, in the first row the CycleGAN result has artifacts on the horse's tail and head and changes the background, and the AGGAN result has rough contours, whereas ASGAN keeps the background color unchanged and shows no artifacts or blurred contours. In the second row, the CycleGAN result shows obvious artifacts near the horse's tail, and the AGGAN result shows artifacts and changes the color of the horse, whereas the ASGAN result shows no artifacts and does not change the color of the horse. For the zebra-to-horse transfer, in the third row the results of CycleGAN and AGGAN are transparent at the position of the head, while ASGAN still retains the head features of the zebra and transfers the features of the remaining parts. In the fourth row, CycleGAN changes the color of the horse's ears and AGGAN changes the color of the horse, while ASGAN transfers better.

11. Description of quantitative comparison

The model provided by the invention is compared with two models, CycleGAN and AGGAN. Test results are obtained on the test sets of the apple2orange and horse2zebra data sets and are then evaluated with the PSNR, SSIM, LPIPS, and FID metrics, taking the average value of each. The evaluation results on the apple2orange dataset are shown in Table 5, and the evaluation results on the horse2zebra dataset are shown in Table 6. Comparison of the four metrics shows that the model provided by the invention improves the image transfer result.

Table 5 Average PSNR, average SSIM, average LPIPS, and average FID of CycleGAN, AGGAN, and ASGAN on the apple2orange dataset

Table 6 Average PSNR, average SSIM, average LPIPS, and average FID of CycleGAN, AGGAN, and ASGAN on the horse2zebra dataset

The above embodiments describe in detail the application of the attention-based stable generative adversarial network of the present invention in style transfer, and are only intended to help in understanding the proposed method and the core idea of the present invention.

Claims

1. An application of an attention-based stable generative adversarial network in style transfer, characterized by comprising the following steps:

(1) selecting a data set for transfer from the official style transfer data sets;

(2) forward propagation: inputting the sample data sets of the two domains into a new generator, and applying convolution, the attention mechanism, residual blocks, and sub-pixel convolution to obtain the transferred generated images;

(3) back propagation: first, fixing the parameters of the new discriminator so that no gradient descent is performed on them, and back-propagating and updating the parameters of the new generator; then, enabling gradient descent for the new discriminator, and back-propagating and updating its parameters;

(4) testing the trained model using the PSNR, SSIM, LPIPS, and FID evaluation metrics, the parameter count, the video memory consumed, and the training time, to measure the transfer result.

2. The application of an attention-based stable generative adversarial network in style transfer according to claim 1, characterized in that

for the attention mechanism of the new generator in step (2), a GC Block attention mechanism is added to the improved CycleGAN generator and the improved discriminator, so that the receptive field is enlarged, the improved CycleGAN generator can capture more spatial information, and the model obtains better results when processing geometric images.

3. The application of an attention-based stable generative adversarial network in style transfer according to claim 1, characterized in that

for the up-sampling mechanism of the new generator in step (2), the transposed convolution in the CycleGAN generator is replaced with sub-pixel convolution, so that the generation result of the model is better.

4. The application of an attention-based stable generative adversarial network in style transfer according to claim 1, characterized in that

for the normalization mechanism of the new generator in step (2), spectral normalization is added to the generator and the discriminator of CycleGAN to stabilize the training process of the model.

5. The application of an attention-based stable generative adversarial network in style transfer according to claim 1, characterized in that

for the attention mechanism of the new generator in step (2), a weight coefficient is added to the attention map extracted by the attention mechanism, and the proportion of attention in the network is adjusted dynamically by modifying the weight coefficient.

6. The application of an attention-based stable generative adversarial network in style transfer according to claim 1, characterized in that

the optimizer in step (3) is the AdaBelief optimizer, which further optimizes the model and improves the generation result.