CN114612589A - Application of stable generation countermeasure network in style migration based on attention mechanism - Google Patents
- Publication number: CN114612589A
- Application number: CN202210250457.2A
- Authority: CN (China)
- Prior art keywords
- attention
- generator
- network
- cyclegan
- model
- Legal status: Pending
Classifications
- G06T11/60—Editing figures and text; Combining figures or text
- G06N3/047—Probabilistic or stochastic networks
- G06N3/048—Activation functions
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06N3/088—Non-supervised learning, e.g. competitive learning
- G06T11/001—Texturing; Colouring; Generation of texture or colour
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
Abstract
An attention-based stable generative adversarial network, ASGAN, is provided, which effectively enlarges the receptive field. The training process of ASGAN is stabilized by instance normalization and the AdaBelief optimizer, and the network is applied to image style transfer. Theoretical analysis shows that ASGAN requires less computation and fewer parameters in total than CycleGAN. Qualitative and quantitative experimental analysis shows that, compared with CycleGAN and AGGAN, ASGAN achieves better and more stable image style transfer and performs better on the PSNR, SSIM, LPIPS and FID evaluation metrics.
Description
Technical Field
The invention relates to the field of style transfer, and in particular to the application of an attention-based stable generative adversarial network in style transfer.
Background
In recent years, with the development of deep learning, the generative adversarial network (GAN), proposed by Ian Goodfellow while he was a PhD student at the University of Montreal, has been continually improved. While researching generative models, Goodfellow hoped to generate pictures by imitating the way the human brain thinks, but the generated pictures were consistently unsatisfactory: blurred and unclear. Doubting the approach of using a conventional neural network, he proposed an entirely new idea: two neural networks trained simultaneously in a game-like, adversarial relationship, which is the original idea behind GANs. Today this adversarial approach is applied in many areas, such as images, speech and network security.
With the development of GANs, researchers have analyzed them from different perspectives. Arjovsky et al. analyzed several problems of the original GANs and proposed research directions based on them, laying the groundwork for the subsequent development of GANs. Kurach et al. analyzed GAN architectures together with a number of regularization and normalization methods, improved them using the latest loss functions and network structures, and obtained better results than conventional GANs. Bau et al. demonstrated a number of practical applications enabled by their framework, ranging from comparing internal representations across different layers, models and datasets, to improving GANs by locating and removing artifact-causing units, to interactively manipulating objects in a scene. Lucic et al. analyzed state-of-the-art GAN models and found that their experimental results were roughly on par, and further proposed optimizing from the perspective of computational budget, opening up a new direction for GAN development. Arora et al. analyzed the assumptions underlying GANs and found that their objective function cannot prevent mode collapse or the learning of meaningless features. Mescheder et al. showed that data distributions that are not absolutely continuous are one cause of the non-convergence of unregularized GANs, and also analyzed several recently proposed regularization methods for stabilizing training. Nagarajan et al. analyzed the convergence of GAN training from a dynamical-systems perspective.
The proposed GAN variants fall into two main categories. The first category modifies the network structure, for example using convolutional neural networks to process images or recurrent neural networks to process time-series data; the second category modifies the loss function, which can make the learning of the generator more stable.
A GAN consists of two networks, a generator and a discriminator, which can be combined to form a GAN or used separately. This gives GANs strong adaptability, and GANs with different network structures have been proposed for different fields. The original GAN uses fully connected neural networks and can only handle relatively simple image datasets such as MNIST, CIFAR-10 and the Toronto Face Dataset; it performs poorly on complex and high-resolution image datasets. Conventional GANs also often suffer from unstable training. Thus, although GANs were an innovation in generative modeling, they have a number of shortcomings. Because the original GAN uses fully connected networks, it has many parameters; to address this, CNN-based GAN models were proposed. The deep convolutional generative adversarial network (DCGAN) replaces the fully connected layers of the original GAN with CNNs and modifies the network structure, which greatly reduces the amount of computation and gives better results than the original GAN. However, DCGAN still has problems; for example, training becomes unstable as training time increases. To broaden the range of applications of GANs, the conditional GAN (CGAN) was proposed, which extends the original GAN into a conditional model and constrains the network output by adding extra conditions. Although the results shown for CGAN are very basic, it demonstrated the potential of conditional adversarial networks and showed broad application prospects.
Although the GANs proposed earlier were theoretically successful, researchers found that many problems still arise during GAN training, the most significant being extreme training instability. One way to stabilize training is to start from the loss function. For example, the pioneering WGAN (Wasserstein GAN) stabilizes training by introducing the Wasserstein distance. This distance is smoother than the KL divergence and the JS divergence, so in theory it better solves the vanishing-gradient problem and stabilizes training more effectively. However, the way WGAN constrains the discriminator (by clipping its weights) is not ideal, so WGAN-GP instead uses a gradient penalty that lets the gradient update smoothly, i.e., enforces the 1-Lipschitz condition, solving the problem of vanishing and exploding gradients during training. RSGAN (relativistic standard GAN) can produce more stable, higher-quality data than other GAN variants. Standard RSGAN with a gradient penalty generates data of better quality than WGAN-GP and reduces the time required by the best-performing architecture among contemporary GAN variants by 400%. Compared with standard GANs and LSGAN, RSGAN can generate reasonably high-resolution images from a very small sample, and the image quality is significantly better than that of WGAN-GP.
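The smoothness argument for the Wasserstein distance can be seen in a toy case. The plain-Python sketch below (an illustration, not part of the patent; the helper names `wasserstein_1d` and `js_divergence` are hypothetical) compares the 1-Wasserstein distance and the Jensen-Shannon divergence between a "real" point mass at 0 and a "generated" point mass at θ.

```python
import math

def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D empirical samples.
    In one dimension the optimal transport plan pairs the sorted samples."""
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete
    distributions given as {support_point: probability} dicts."""
    support = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}

    def kl(a, b):
        # Points where a has zero mass contribute nothing to KL(a || b).
        return sum(pa * math.log(pa / b[k]) for k, pa in a.items() if pa > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Real data: a point mass at 0. Generated data: a point mass at theta.
for theta in (0.5, 2.0, 5.0):
    w1 = wasserstein_1d([0.0] * 4, [theta] * 4)
    js = js_divergence({0.0: 1.0}, {theta: 1.0})
    print(f"theta={theta}: W1={w1:.3f}, JS={js:.3f}")
```

W1 equals θ and shrinks steadily as the generated mass moves toward the real one, whereas JS stays at log 2 ≈ 0.693 for every θ > 0, so it gives the generator no signal about which direction to move.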
Improvements to both the network structure and the loss function have greatly improved the training stability and results of the original GAN. At the same time, GAN applications keep developing, and style transfer is a major hotspot among them. The most pioneering work applied CGAN to image style transfer, providing a conditional adversarial network as a general solution to the image-to-image translation problem. This model learns features from the input image and transfers them to the output image; for example, it can turn a nighttime picture of a place into a daytime one, or convert a label map into a realistic image. However, it requires paired datasets: a daytime picture of a place must be matched with a nighttime picture of the same place, so such datasets are often difficult to acquire. To solve this problem, CycleGAN was proposed; by learning with a cycle-consistency loss, it achieves good style transfer without paired data.
Nowadays most GANs are built on CNNs, but a CNN captures only local spatial information, and its receptive field is not large enough to cover the whole image. This makes many kinds of datasets hard to learn and can misplace key parts of an image, for example putting the facial features in the wrong positions when generating faces. The self-attention GAN (SAGAN) was therefore proposed, which combines non-local neural networks with GANs so that the network obtains a larger receptive field without reducing computational efficiency; its greatest advantage is that it handles geometric structure well. However, when a non-local block captures the dependencies of a given position, it computes the correlation between that position and every position in the image, which increases the computation. To solve this while maintaining accuracy, GCNet establishes a unified three-step general framework for global context modeling; it is a lightweight model that captures global information effectively.
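The non-local (self-attention) idea behind SAGAN can be sketched in plain Python for N spatial positions with C channels. To keep the sketch short, the query, key and value projections (1×1 convolutions in SAGAN, with queries and keys reduced to C/8 channels) are assumed to be identity maps, and `gamma` stands in for SAGAN's learnable scale that blends the attention output into the input. This illustrates the general technique, not the ASGAN attention module of FIG. 2.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features, gamma):
    """features: list of N positions, each a list of C channel values.
    Returns gamma * attention_output + features (residual blend, as in SAGAN).
    Query/key/value projections are assumed to be identity for brevity."""
    n = len(features)
    out = []
    for i in range(n):
        # Similarity of position i to every position j: a global receptive field.
        scores = [sum(a * b for a, b in zip(features[i], features[j])) for j in range(n)]
        weights = softmax(scores)
        # Attention output: weighted sum of the values at all positions.
        attended = [sum(weights[j] * features[j][c] for j in range(n))
                    for c in range(len(features[i]))]
        out.append([gamma * a + x for a, x in zip(attended, features[i])])
    return out

feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(feats, gamma=0.0))  # → [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(feats, gamma=0.5))
```

Every position attends to every other position, so the cost grows with N², which is the overhead that GCNet's lighter global-context design aims to reduce.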
Commonly used optimizers can be roughly divided into two types: adaptive methods (such as Adam) and accelerated schemes (such as stochastic gradient descent (SGD) with momentum). For many models, such as convolutional neural networks, adaptive methods generally converge faster than SGD but generalize worse; for complex settings such as GANs, adaptive methods are often the default because of their stability. The AdaBelief optimizer adjusts its step size according to its "belief" in the current gradient direction, using the exponential moving average (EMA) of the noisy gradients as the prediction of the next gradient. If the observed gradient deviates greatly from the prediction, the current observation is distrusted and a small step is taken; if the observed gradient is close to the prediction, it is trusted and a large step is taken. Experiments show that this optimizer combines three advantages: the fast convergence of adaptive methods, good generalization, and training stability.
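The AdaBelief update described above can be sketched for a single scalar parameter, following the published AdaBelief update rule with bias correction. The default hyperparameters below are assumptions (the patent does not state the values it uses), and in practice an existing PyTorch implementation would be used instead of this sketch.

```python
import math

class AdaBelief:
    """Scalar AdaBelief optimizer (illustrative sketch)."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # EMA of gradients: the prediction of the next gradient
        self.s = 0.0  # EMA of (g - m)^2: how far gradients stray from the prediction
        self.t = 0

    def step(self, theta, grad):
        """Return the updated parameter after one AdaBelief step."""
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.s = self.beta2 * self.s + (1 - self.beta2) * (grad - self.m) ** 2 + self.eps
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias correction
        s_hat = self.s / (1 - self.beta2 ** self.t)
        # Gradient close to the prediction -> small s_hat -> large step ("belief").
        # Gradient far from the prediction -> large s_hat -> small step.
        return theta - self.lr * m_hat / (math.sqrt(s_hat) + self.eps)

# Consistent gradients earn larger steps than gradients that keep flipping sign.
steady, noisy = AdaBelief(lr=0.1), AdaBelief(lr=0.1)
x_s = x_n = 0.0
for t in range(10):
    prev_s, prev_n = x_s, x_n
    x_s = steady.step(x_s, 1.0)
    x_n = noisy.step(x_n, 1.0 if t % 2 == 0 else -1.0)
print("last step, steady gradient:", abs(x_s - prev_s))
print("last step, sign-flipping gradient:", abs(x_n - prev_n))
```

The only difference from Adam is the second-moment term: Adam tracks g², while AdaBelief tracks (g − m)², the squared surprise relative to its own prediction.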
Although research on CycleGAN keeps growing and its transfer quality keeps improving, some problems remain: first, CycleGAN's transfer results on geometric shapes are unsatisfactory; second, the quality of the generated images still has room for improvement.
Disclosure of Invention
The invention aims to solve the problem of poor transfer quality in style transfer, and provides an attention-based stable generative adversarial network.
The purpose of the invention can be achieved by the following technical solution:
an application of an attention-based stable generative adversarial network in style transfer, comprising the following steps:
1) selecting a dataset from the official style transfer datasets, to realize transfer between the two domains of the dataset;
2) inputting the sample images of the first domain into the first generator to generate transferred second-domain images;
3) feeding the generated second-domain images into the discriminator to obtain discrimination results, and calculating the adversarial loss of the first-domain transfer;
4) inputting the generated second-domain images into the second generator to regenerate first-domain images, and calculating the cycle-consistency loss of the first domain;
5) performing the same steps for the second-domain images, and calculating the adversarial loss and the cycle-consistency loss of the second domain;
6) adding all losses to obtain the total loss;
7) fixing the discriminator parameters so that no gradient descent is performed on the discriminator, and performing backpropagation and parameter updates on the generators;
8) enabling gradient descent on the discriminator, and performing backpropagation and parameter updates on it;
9) optimizing the model through repeated iteration to obtain the trained model;
10) inputting the test set images into the trained model to obtain test results;
11) evaluating the trained model on the test results with the PSNR, SSIM, LPIPS and FID metrics, and outputting the measurements.
In step 3), the adversarial loss is calculated as follows.
For the mapping function G: X → Y and its discriminator D_Y, the adversarial loss is:
$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}\left[\log D_Y(y)\right] + \mathbb{E}_{x \sim p_{data}(x)}\left[\log\left(1 - D_Y(G(x))\right)\right] \quad (1)$$
For the mapping function F: Y → X and its discriminator D_X, the adversarial loss is:
$$\mathcal{L}_{GAN}(F, D_X, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D_X(x)\right] + \mathbb{E}_{y \sim p_{data}(y)}\left[\log\left(1 - D_X(F(y))\right)\right] \quad (2)$$
In Eq. (1), G denotes the generator from domain X to domain Y and D_Y the discriminator of domain Y; in Eq. (2), F denotes the generator from domain Y to domain X and D_X the discriminator of domain X. In Eqs. (1) and (2), x and y denote training samples of domain X and domain Y, respectively.
In step 4), the cycle-consistency loss is calculated as:
$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y \sim p_{data}(y)}\left[\lVert G(F(y)) - y \rVert_1\right] \quad (3)$$
In step 6), the total loss is calculated as:
$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F) \quad (4)$$
In Eq. (4), λ controls the weight of the cycle-consistency loss relative to the two adversarial losses.
In step 9), the model is optimized with the objective:
$$G^*, F^* = \arg\min_{G, F} \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y) \quad (5)$$
In Eq. (5), G* and F* are the optimal generators.
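The loss terms of Eqs. (1) to (4) can be sketched in plain Python on toy numbers, with discriminator outputs given as probabilities and images flattened to lists. The batch means stand in for the expectations, the L1 term is averaged per pixel, and λ = 10 is an assumed value (the CycleGAN paper's default), since the patent does not state λ. The original CycleGAN implementation replaces the log loss with a least-squares loss in practice; the log form is used here to match the equations.

```python
import math

def adversarial_loss(d_real, d_fake):
    """Eq. (1)/(2): mean log D(real) + mean log(1 - D(fake)).
    d_real: discriminator outputs (probabilities in (0, 1)) on real target-domain images.
    d_fake: discriminator outputs on translated images, e.g. D_Y(G(x))."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def l1_distance(a, b):
    """Mean absolute difference between two flattened images."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)

def cycle_consistency_loss(x, x_rec, y, y_rec):
    """Eq. (3): x_rec = F(G(x)) and y_rec = G(F(y)) should reproduce x and y."""
    return l1_distance(x_rec, x) + l1_distance(y_rec, y)

def total_loss(loss_gan_g, loss_gan_f, loss_cyc, lam=10.0):
    """Eq. (4). lam = 10 is an assumed value (the CycleGAN paper's default)."""
    return loss_gan_g + loss_gan_f + lam * loss_cyc

# Toy numbers: a maximally unsure discriminator and imperfect reconstructions.
l_g = adversarial_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
l_f = adversarial_loss(d_real=[0.5], d_fake=[0.5])
l_cyc = cycle_consistency_loss(x=[0.0, 1.0], x_rec=[0.0, 1.0], y=[1.0, 1.0], y_rec=[0.5, 1.5])
print(total_loss(l_g, l_f, l_cyc))
```

Per Eq. (5), the generators are updated to decrease this total while the discriminators are updated to increase the adversarial terms, which is why steps 7) and 8) alternate between freezing and updating the discriminator.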
The beneficial effects of the invention are as follows:
First, to address the style transfer problem, the invention provides an attention-based stable generative adversarial network that effectively improves the transfer quality of images.
Second, the invention uses an attention mechanism to perform secondary feature extraction on the input features.
Third, the invention uses sub-pixel convolution to improve the upsampling of the model, thereby improving image transfer quality.
Fourth, a spectral normalization mechanism is used to stabilize model training.
Fifth, a weight coefficient is applied to the attention map extracted by the attention mechanism, so that the proportion of attention in the network can be adjusted dynamically by modifying this coefficient.
Sixth, the AdaBelief optimizer is used to further optimize the model and improve generation quality.
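Two of the components listed above can be sketched in plain Python. `pixel_shuffle` shows the rearrangement step of sub-pixel convolution: a convolution first produces C·r² channels at low resolution, and these are interleaved into C channels at r times the resolution (the same index mapping as PyTorch's `torch.nn.PixelShuffle`). `spectral_normalize` divides a weight matrix by its largest singular value, estimated by power iteration, which is the core of spectral normalization. Both are illustrative sketches: the patent does not specify the upscaling factor r, the number of power iterations, or where these are applied, and practical spectral normalization keeps the vector `u` across training steps and runs one iteration per step.

```python
import math

def pixel_shuffle(x, r):
    """x: list of C*r*r channels, each an H x W nested list.
    Returns C channels of size (H*r) x (W*r)."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    assert c_in % (r * r) == 0, "channel count must be divisible by r*r"
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]  # channel holding sub-pixel offset (i, j)
                for y in range(h):
                    for z in range(w):
                        out[c][y * r + i][z * r + j] = src[y][z]
    return out

def spectral_normalize(weight, n_iter=20):
    """Return (weight / sigma, sigma), where sigma is the largest singular value
    of `weight` (a list of rows), estimated by power iteration from u = ones."""
    rows, cols = len(weight), len(weight[0])
    u = [1.0] * rows
    for _ in range(n_iter):
        v = [sum(weight[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # W^T u
        nv = math.sqrt(sum(t * t for t in v))
        v = [t / nv for t in v]
        u = [sum(weight[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # W v
        nu = math.sqrt(sum(t * t for t in u))
        u = [t / nu for t in u]
    sigma = sum(u[i] * weight[i][j] * v[j] for i in range(rows) for j in range(cols))
    return [[x / sigma for x in row] for row in weight], sigma

print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))  # → [[[1, 2], [3, 4]]]
print(spectral_normalize([[3.0, 0.0], [0.0, 1.0]])[1])  # close to 3.0
```

Dividing by σ makes the layer's largest singular value 1, which bounds the layer's Lipschitz constant; this is how spectral normalization stabilizes training.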
Drawings
FIG. 1 is a diagram of the network model of an embodiment of the present invention.
FIG. 2 is a diagram of the attention mechanism network model used in an embodiment of the present invention.
FIG. 3 is a diagram of the generator network model of an embodiment of the present invention.
FIG. 4 is a diagram of the discriminator network model of an embodiment of the present invention.
FIG. 5 compares the results of an embodiment of the present invention with those of CycleGAN and AGGAN on the apple2orange dataset.
FIG. 6 compares the results of an embodiment of the present invention with those of CycleGAN and AGGAN on the horse2zebra dataset.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Examples
In this application of an attention-based stable generative adversarial network to style transfer, images of two domains are taken from a dataset, and a generator and a discriminator augmented with an attention mechanism extract features from the input images and are trained on them, achieving transfer between the two domains. The network model is shown in FIG. 1, the attention mechanism used in the model in FIG. 2, the generator in FIG. 3, and the discriminator in FIG. 4. The specific steps are as follows:
1. Selection of datasets
We selected horse2zebra and apple2orange from the official style transfer datasets. The horse2zebra training set contains 1067 horse images and 1334 zebra images, and its test set contains 120 horse images and 140 zebra images. The apple2orange training set contains 995 apple images and 1019 orange images, and its test set contains 266 apple images and 248 orange images.
2. The new generator extracts input image features and performs the transfer
The new generator uses 9 residual blocks. To further extract image features in the generator, an attention mechanism re-extracts the features; the network model is shown in FIG. 3.
3. The new discriminator judges the generator's transfer results
To improve the discrimination accuracy of the discriminator, an attention mechanism is also added to it to further extract image features on top of the original feature map; the discriminator network model is shown in FIG. 4.
4. Training model
The total loss is calculated and backpropagated, and the trained model is obtained after 200 iterations in the experimental environment of Table 1.
Table 1 Experimental environment
Item | Configuration |
Operating system | Ubuntu 18.04 |
GPU | NVIDIA GeForce RTX 2080 Ti |
CPU | Intel Xeon Processor (Skylake, IBRS), 2 |
RAM | 16GB |
GPU libraries | CUDA 10.2, cuDNN 7.6 |
Deep learning framework | PyTorch |
5. Test model
The test set images of the dataset are input into the generator of the trained model to obtain the test results. The trained model is evaluated with the PSNR, SSIM, LPIPS and FID metrics to measure the transfer results.
6. Description of evaluation metrics
We evaluated our test results with the PSNR, SSIM, LPIPS and FID metrics.
PSNR (peak signal-to-noise ratio) is defined via the mean squared error (MSE). Given two images G and H of the same size m × n, the MSE and PSNR are defined as:
$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[G(i,j) - H(i,j)\right]^2 \quad (6)$$
$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right) \quad (7)$$
In Eqs. (6) and (7), MAX is the maximum possible pixel value of the image, MSE is the mean squared error between the current image G and the reference image H, and m and n are the height and width of the image, respectively. A higher PSNR indicates a smaller difference between the two images.
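Eqs. (6) and (7) translate directly into a short plain-Python sketch. MAX = 255 is an assumption for 8-bit images; the patent does not state the bit depth.

```python
import math

def mse(g, h):
    """Eq. (6): mean squared error of two m x n images given as nested lists."""
    m, n = len(g), len(g[0])
    return sum((g[i][j] - h[i][j]) ** 2 for i in range(m) for j in range(n)) / (m * n)

def psnr(g, h, max_val=255.0):
    """Eq. (7): PSNR in dB. max_val = 255 assumes 8-bit images."""
    err = mse(g, h)
    if err == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_val ** 2 / err)

ref = [[0, 0], [0, 0]]
out = [[0, 0], [0, 255]]
print(round(psnr(out, ref), 2))  # MSE = 255**2 / 4, so PSNR = 10*log10(4) ≈ 6.02 dB
```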
SSIM is a metric of the similarity between two images. It compares samples x and y in terms of luminance, contrast and structure; the more similar the two images, the higher the SSIM value.
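The patent gives no SSIM formula, so the sketch below uses the standard SSIM expression with the usual constants C1 = (0.01·L)² and C2 = (0.03·L)², where L is the data range (assumed 255). As a simplification, it computes a single global window over the whole image, whereas standard SSIM evaluates the same expression over local windows (commonly 11×11 Gaussian) and averages the results.

```python
def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM of two equally sized grayscale images given as flat
    lists of pixel values. Standard SSIM computes this over local windows
    (e.g. 11x11 Gaussian) and averages the results."""
    n = len(x)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2  # stabilizing constants
    mx, my = sum(x) / n, sum(y) / n                            # luminance (means)
    vx = sum((a - mx) * (a - mx) for a in x) / n               # contrast (variances)
    vy = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n   # structure (covariance)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))

img = [10.0, 20.0, 30.0, 40.0]
print(ssim_global(img, img))                       # 1.0 for identical images
print(ssim_global(img, [40.0, 30.0, 20.0, 10.0]))  # same pixels, reversed structure
```

The reversed image has the same mean and variance but negative covariance, so its SSIM drops below zero even though its pixel histogram is identical.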
LPIPS is a more recent metric that imitates human perception. Its computation relies on a VGG network, through which deep features of the images at different levels are extracted; a lower LPIPS indicates that two images are perceptually more similar.
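The distance step of LPIPS can be sketched in plain Python, assuming the deep features have already been extracted. At each layer, the channel vector at every spatial position is unit-normalized, the difference is scaled by per-channel weights, squared, averaged over positions, and summed over layers. In the real metric the features come from a pretrained network (VGG here) and the channel weights are learned from human judgments; the inputs below are hypothetical, and the published LPIPS implementation would be used in practice.

```python
import math

def unit_normalize(vec, eps=1e-10):
    """Scale a channel vector to unit L2 norm (eps avoids division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]

def lpips_like_distance(feats_x, feats_y, channel_weights):
    """LPIPS-style distance from precomputed deep features.
    feats_x, feats_y: one entry per network layer; each entry is a list of
    spatial positions, each position a list of channel activations.
    channel_weights: one list of non-negative per-channel weights per layer."""
    total = 0.0
    for fx, fy, w in zip(feats_x, feats_y, channel_weights):
        layer_sum = 0.0
        for px, py in zip(fx, fy):
            nx, ny = unit_normalize(px), unit_normalize(py)
            # Weighted squared difference of the normalized activations.
            layer_sum += sum((wc * (a - b)) ** 2 for wc, a, b in zip(w, nx, ny))
        total += layer_sum / len(fx)  # average over the spatial positions of this layer
    return total

# Hypothetical one-layer, one-position features: orthogonal activations.
print(lpips_like_distance([[[1.0, 0.0]]], [[[0.0, 1.0]]], [[1.0, 1.0]]))
```

Because of the channel normalization, features that differ only in overall magnitude are treated as nearly identical.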
FID estimates the distributions of the real images and the generated images in the feature space of a deep neural network and computes the distance between the two distributions. Its results agree closely with human perception; the smaller the FID, the closer the generated images are to the real images.
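For reference, FID is usually computed with the standard formula below, which the patent does not reproduce. Here μ_r, Σ_r and μ_g, Σ_g are the mean and covariance of deep features (commonly Inception-v3 activations) of the real and generated images, each modeled as a Gaussian:

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)$$

In one dimension the trace term reduces to (σ_r − σ_g)², so FID becomes (μ_r − μ_g)² + (σ_r − σ_g)², which is zero only when the two Gaussians coincide.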
7. Description of comparison models
ASGAN is compared with CycleGAN and AGGAN.
CycleGAN performs image style transfer between unpaired data; using a cycle-consistency loss, it learns not only the mapping from the source domain to the target domain but also the reverse mapping from the target domain back to the source domain.
AGGAN addresses the difficulty that traditional unsupervised image style transfer methods have in focusing on a single object or multiple objects without changing the background of the picture, by introducing an unsupervised attention mechanism that is trained adversarially together with the generator and the discriminator.
8. Computational cost analysis
Assume that the input to the ASGAN generator and discriminator has dimensions H × W × C, that the input feature map of a given computing unit has dimensions h × w × c, and that its output feature map has dimensions h′ × w′ × c′. The numbers of executions of all basic computing units and simple operations involved are listed in Table 2.
TABLE 2 Number of simple-operation executions of the basic computing units in ASGAN
Basic computing unit | S× | S÷ | S+ | S- | S>,<,≥,≤,==,≠ | S= |
Convolution | ck²h′w′c′ | 0 | ck²h′w′c′+h′w′c′ | 0 | 0 | h′w′c′ |
ReLU | 0 | 0 | 0 | 0 | hwc | h′w′c′ |
Tanh | 6hwc | hwc | 13hwc | hwc | 0 | h′w′c′ |
Softmax | hwc | hwc | 4hwc-1 | hwc | hwc-1 | h′w′c′ |
Adding | 0 | 0 | hwc | 0 | 0 | h′w′c′ |
Mapping | 0 | 0 | 0 | 0 | 0 | h′w′c′ |
Matmuling | hwc | 0 | h(w-1)c | 0 | 0 | h′w′c′ |
Arranging | 0 | 0 | 0 | 0 | 0 | h′w′c′ |
In Table 2, S×, S÷, S+, S- and S= denote the number of executions of the simple operations multiplication, division, addition, subtraction and assignment, respectively, and S>,<,≥,≤,==,≠ denotes the number of executions of the comparison operations (greater than, less than, greater than or equal to, less than or equal to, equal to, not equal to). The Convolution unit represents convolution or transposed convolution, and k is the convolution kernel size. The ReLU, Tanh and Softmax units represent activation functions, each of which can be decomposed into a combination of simple operations. The Adding, Mapping, Matmuling and Arranging units represent matrix addition, matrix mapping, matrix multiplication, and arranging row vectors in order into a three-dimensional matrix, respectively. The Tanh and Softmax activation functions use an approximation of e^x that requires one multiplication, three additions and two assignments, with an average error of 0.02 to 0.04.
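The Convolution row of Table 2 can be turned into a small counting helper. The layer used in the usage line (a 7×7 convolution from 3 to 64 channels with a 256 × 256 output) is hypothetical and chosen only to show the scale of the counts.

```python
def conv_op_counts(c, k, h_out, w_out, c_out):
    """Simple-operation counts of one Convolution unit, following Table 2.
    c: input channels, k: kernel size, (h_out, w_out, c_out): output shape."""
    outputs = h_out * w_out * c_out
    return {
        "mul": c * k * k * outputs,            # S*  = c*k^2*h'w'c'
        "div": 0,                              # S/
        "add": c * k * k * outputs + outputs,  # S+  = c*k^2*h'w'c' + h'w'c'
        "sub": 0,                              # S-
        "cmp": 0,                              # comparisons
        "assign": outputs,                     # S=  = h'w'c'
    }

# Hypothetical layer: 7x7 convolution, 3 -> 64 channels, 256 x 256 output.
print(conv_op_counts(c=3, k=7, h_out=256, w_out=256, c_out=64)["mul"])  # 616562688
```

Summing such counts over every unit of a network, with each unit's fixed input and output shapes, is how the per-model totals in Tables 3 and 4 are obtained.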
According to FIGs. 3 and 4, once the dimensions of the model input are fixed, the dimensions of the input and output feature maps of every basic computing unit are also fixed. Table 3 is then obtained by accumulating, over all basic computing units in ASGAN, the actual number of executions of each simple operation, based on Table 2 and the fixed input and output dimensions of each unit.
TABLE 3 Total number of executions of each simple operation in ASGAN
Counting the computational cost of CycleGAN in the same way as for ASGAN gives Table 4.
Table 4 total number of executions of each simple operation in CycleGAN.
Simple operation | Number of executions |
S× | 13068HWC+1794376HW |
S÷ | 2HWC |
S+ | 13084HWC+1795656.03125HW |
S- | 2HWC |
S>,<,≥,≤,==,≠ | 776HW |
S= | 20HWC+2128.03125HW |
Because multiplication and division have similar instruction cycles, their computational costs are similar; likewise, addition, subtraction, comparison and assignment have similar costs. From Tables 3 and 4:
the computational cost of ASGAN for multiplication and division is:
$$\mathrm{Cost}_{ASGAN} = 13070HWC + 1499784.25HW + 131072 \quad (8)$$
the computational cost of ASGAN for addition, subtraction, comparison and assignment is:
the computational cost of CycleGAN for multiplication and division is:
$$\mathrm{Cost}_{CycleGAN} = 13070HWC + 1794376HW \quad (10)$$
the computational cost of CycleGAN for addition, subtraction, comparison and assignment is:
$$\mathrm{Cost}'_{CycleGAN} = 13106HWC + 1798560.0625HW \quad (11)$$
from Eqs. (8) and (10), the relative reduction in the multiplication and division cost of ASGAN compared with CycleGAN is:
$$\frac{\mathrm{Cost}_{CycleGAN} - \mathrm{Cost}_{ASGAN}}{\mathrm{Cost}_{CycleGAN}} = \frac{294591.75HW - 131072}{13070HWC + 1794376HW} \quad (12)$$
from Eqs. (9) and (11), the relative reduction in the addition, subtraction, comparison and assignment cost of ASGAN compared with CycleGAN is:
where H ≥ 1, W ≥ 1 and C ≥ 1. From Eqs. (12) and (13), compared with CycleGAN, ASGAN reduces the multiplication and division cost by 0 to 16.30% and the addition, subtraction, comparison and assignment cost by 0 to 16.27%. (In Eq. (12), the reduction approaches 294591.75/1807446 ≈ 16.30% when C = 1 and HW is large, and approaches 0 as C grows.) Therefore, ASGAN runs faster than CycleGAN.
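Eqs. (8), (10) and (12) can be evaluated directly to see how the multiplication and division saving depends on the input size. The 256 × 256 × 3 input in the first usage line is an assumption, since the text does not fix H, W and C here.

```python
def mul_div_cost_asgan(h, w, c):
    """Eq. (8): multiplication/division cost of ASGAN for an H x W x C input."""
    return 13070 * h * w * c + 1499784.25 * h * w + 131072

def mul_div_cost_cyclegan(h, w, c):
    """Eq. (10): multiplication/division cost of CycleGAN."""
    return 13070 * h * w * c + 1794376 * h * w

def mul_div_reduction(h, w, c):
    """Eq. (12): relative saving of ASGAN over CycleGAN for multiplication/division."""
    cyclegan = mul_div_cost_cyclegan(h, w, c)
    return (cyclegan - mul_div_cost_asgan(h, w, c)) / cyclegan

# Assumed 256 x 256 RGB input.
print(f"{mul_div_reduction(256, 256, 3):.2%}")        # → 16.07%
# The saving approaches its upper bound when C = 1 and H * W is large.
print(f"{mul_div_reduction(10_000, 10_000, 1):.2%}")  # → 16.30%
```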
9. Analysis of the total number of parameters
The number of parameters of a convolutional basic calculation unit is $k^2 \cdot c \cdot c' + c'$, where $k$ is the size of the convolution kernel, $c$ is the number of channels of the convolution's input feature map, and $c'$ is the number of channels of its output feature map [43]. The total numbers of parameters of ASGAN and CycleGAN are therefore:
$$P_{\mathrm{ASGAN}} = 14592C + 27773058 \tag{14}$$
$$P_{\mathrm{CycleGAN}} = 14592C + 28241920 \tag{15}$$
from equations (14) and (15), one obtains:
$$\frac{P_{\mathrm{CycleGAN}} - P_{\mathrm{ASGAN}}}{P_{\mathrm{CycleGAN}}} = \frac{468862}{14592C + 28241920} \tag{16}$$
where C ≥ 1. Equation (16) shows that ASGAN has 0 to 1.66% fewer parameters than CycleGAN, so ASGAN occupies less computer memory than CycleGAN.
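The per-layer parameter formula and the totals in equations (14) and (15) can be checked the same way. The sketch below counts the parameters of a single convolution (the + c' term is one bias per output channel) and evaluates the relative saving of ASGAN from the two totals; the function names are hypothetical.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Parameters of one k x k convolution: k^2 * c_in * c_out weights, plus c_out biases."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def params_asgan(c):
    return 14592 * c + 27773058  # equation (14)


def params_cyclegan(c):
    return 14592 * c + 28241920  # equation (15)


def param_reduction(c):
    """Relative parameter saving of ASGAN over CycleGAN for C input channels."""
    base = params_cyclegan(c)
    return (base - params_asgan(c)) / base


if __name__ == "__main__":
    print(conv_params(3, 64, 128))  # one 3x3 convolution, 64 -> 128 channels
    print(params_cyclegan(3) - params_asgan(3))
    print(f"{100 * param_reduction(3):.2f}%")
```

The absolute difference between the two totals does not depend on C, so the relative saving is largest for C = 1 and decreases as C grows.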
10. Description of qualitative comparison
The results of the models on the apple2orange dataset are shown in fig. 5. For the apple-to-orange conversion, the first row shows that both CycleGAN and AGGAN produce blurring artifacts of varying degree, while ASGAN produces none. In the second row, CycleGAN does not completely convert the lower-right part of the apple, and AGGAN produces a watermark in the same position. For the orange-to-apple conversion, in the third row both CycleGAN and AGGAN damage the background of the input image, whereas ASGAN migrates the orange to an apple well. In the fourth row, both CycleGAN and AGGAN produce artifacts that degrade the migration result, while ASGAN achieves a better migration between the two domains.
The test results of the models on horse2zebra are shown in fig. 6. For the horse-to-zebra conversion, in the first row CycleGAN produces artifacts on the tail and head of the horse and changes the background, and AGGAN produces rough contours, whereas ASGAN keeps the background color unchanged and shows neither artifacts nor blurred contours. In the second row, CycleGAN produces obvious artifacts near the horse's tail and AGGAN's result changes the color of the horse, whereas ASGAN's result shows no artifacts and leaves the horse's color unchanged. For the zebra-to-horse conversion, in the third row CycleGAN and AGGAN render the head region of the horse transparent, while ASGAN retains the features of the zebra's head and migrates the features of the remaining regions. In the fourth row, CycleGAN changes the color of the horse's ears and AGGAN changes the color of the horse, while ASGAN gives a better migration result.
11. Description of quantitative comparison
The model of the present invention is compared with two models, CycleGAN and AGGAN. Test results are generated on the test sets of the apple2orange and horse2zebra datasets, evaluated with the PSNR, SSIM, LPIPS and FID metrics, and averaged. The evaluation results on the apple2orange dataset are shown in Table 2, and those on the horse2zebra dataset in Table 3. The comparison of the four metrics shows that the proposed model improves the image migration effect.
TABLE 2 Average PSNR, SSIM, LPIPS and FID of CycleGAN, AGGAN and ASGAN on the apple2orange dataset
TABLE 3 Average PSNR, SSIM, LPIPS and FID of CycleGAN, AGGAN and ASGAN on the horse2zebra dataset
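The averaging protocol can be sketched for PSNR, the simplest of the four metrics; SSIM needs windowed local statistics, and LPIPS and FID require pretrained networks, so they are not reproduced here. The source does not state which image each generated image is compared against, so `reference` is a generic placeholder. 8-bit images (peak value 255) are assumed, and the function names are hypothetical.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio between two equally sized, flattened images."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_metric(metric, pairs):
    """Average a per-image metric over the (reference, generated) pairs of a test set."""
    scores = [metric(ref, gen) for ref, gen in pairs]
    return sum(scores) / len(scores)
```

For example, `mean_metric(psnr, test_pairs)` gives one averaged PSNR value per model and dataset, which is the kind of entry reported in Tables 2 and 3.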
The above embodiments describe in detail the application of the attention-based stable generative adversarial network of the present invention in style migration; they are intended only to help in understanding the proposed method and the core idea of the present invention.
Claims (6)
1. An application of an attention-based stable generative adversarial network in style migration, characterized by comprising the following steps:
(1) selecting the datasets to be migrated from the official style-migration datasets;
(2) forward propagation: inputting the sample data of the two domains into a new generator, which applies convolution, the attention mechanism, residual blocks and sub-pixel convolution to obtain the migrated generated image;
(3) back propagation: first, the parameters of the new discriminator are fixed so that no gradient descent is applied to them, and the new generator is back-propagated and its parameters are updated; then gradient descent is enabled for the new discriminator, which is back-propagated and its parameters are updated;
(4) testing the trained model with the PSNR, SSIM, LPIPS and FID evaluation metrics, the number of parameters, the GPU memory consumed and the training time, to measure the migration result.
2. The application of the attention-based stable generative adversarial network in style migration according to claim 1,
wherein the attention mechanism of the new generator in step (2) adds the GC Block (global context block) attention mechanism to the improved CycleGAN generator and the improved discriminator, enlarging the receptive field so that the improved CycleGAN generator captures more spatial information and the model performs better when processing geometric images.
3. The application of the attention-based stable generative adversarial network in style migration according to claim 1,
wherein, in the up-sampling mechanism of the new generator in step (2), the transposed convolution in the CycleGAN generator is replaced by sub-pixel convolution, which improves the generation result of the model.
4. The application of the attention-based stable generative adversarial network in style migration according to claim 1,
wherein, in the normalization mechanism of the new generator in step (2), spectral normalization is added to the generator and the discriminator of CycleGAN to stabilize the training process of the model.
5. The application of the attention-based stable generative adversarial network in style migration according to claim 1,
wherein, in the attention mechanism of the new generator in step (2), a weight coefficient is applied to the attention map extracted by the attention mechanism, and the proportion of attention in the network is adjusted dynamically by modifying this weight coefficient.
6. The application of the attention-based stable generative adversarial network in style migration according to claim 1,
wherein the optimizer used in step (3) is the AdaBelief optimizer, which further optimizes the model and improves the generation effect.
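Claims 2 and 5 name a GC Block whose attention output is scaled by a weight coefficient, but do not give layer details. The pure-Python sketch below follows the commonly described GC Block structure (softmax attention pooling over all positions, a bottleneck transform with layer normalization and ReLU, then a broadcast residual addition). Placing the coefficient `gamma` on the transformed context just before the addition is an assumption about where the claimed weight coefficient sits; biases, learnable LayerNorm parameters and batching are omitted, and all names are hypothetical.

```python
import math


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _layer_norm(v, eps=1e-5):
    # LayerNorm without learnable scale/shift (simplification).
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def gc_block(features, w_key, w_down, w_up, gamma=1.0):
    """Global-context attention over N positions with C channels each.

    features: N lists of C values (a flattened H x W feature map)
    w_key:    C weights of the 1x1 convolution giving one attention logit per position
    w_down:   (C/r) x C bottleneck weights; w_up: C x (C/r) expansion weights
    gamma:    weight coefficient on the attention branch (assumed placement)
    Returns the fused features and the attention map over positions.
    """
    logits = [sum(k * x for k, x in zip(w_key, f)) for f in features]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    attention = [e / total for e in exps]
    channels = len(features[0])
    # Context modeling: attention-weighted sum of the features over all positions.
    context = [sum(a * f[ch] for a, f in zip(attention, features)) for ch in range(channels)]
    # Transform: 1x1 conv (C -> C/r), LayerNorm, ReLU, 1x1 conv (C/r -> C).
    hidden = [max(0.0, h) for h in _layer_norm(_matvec(w_down, context))]
    delta = _matvec(w_up, hidden)
    # Fusion: add the weighted global context to every position.
    fused = [[x + gamma * d for x, d in zip(f, delta)] for f in features]
    return fused, attention
```

With `gamma = 0` the block reduces to the identity, so the coefficient directly controls how much of the attention branch enters the network.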
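Claim 3 replaces the transposed convolution with sub-pixel convolution. A sub-pixel convolution is an ordinary convolution that outputs C·r² channels, followed by a periodic rearrangement ("pixel shuffle") of those channels into a C-channel map that is r times larger in each spatial direction. The sketch shows only the rearrangement, on nested Python lists; the channel ordering is assumed to follow the convention of PyTorch's `torch.nn.PixelShuffle`, and the function name is hypothetical.

```python
def pixel_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] nested list into a [C][H*r][W*r] nested list."""
    channels = len(x)
    if channels % (r * r):
        raise ValueError("channel count must be divisible by r * r")
    c_out = channels // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # vertical offset inside each r x r block
            for j in range(r):      # horizontal offset inside each r x r block
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for z in range(w):
                        out[c][y * r + i][z * r + j] = src[y][z]
    return out
```

Because every output pixel comes from a learned convolution at low resolution, this up-sampling avoids the uneven kernel overlap of transposed convolution.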
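Claim 4 adds spectral normalization to the generator and discriminator. Spectral normalization divides a weight matrix by an estimate of its largest singular value, usually obtained with a few steps of power iteration whose vector `u` is carried over between training steps. The sketch below applies this to a 2-D weight matrix (a convolution kernel would first be reshaped to out_channels × in_channels·k·k); it is an illustration, not the authors' code, and the function names are hypothetical.

```python
import math
import random


def _normalize(v, eps=1e-12):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _matvec_t(m, u):
    # Computes m^T u.
    return [sum(m[i][j] * u[i] for i in range(len(m))) for j in range(len(m[0]))]


def spectral_normalize(weight, n_iters=1, u=None, seed=0):
    """Return (weight / sigma, sigma, u); sigma estimates the largest singular value."""
    if n_iters < 1:
        raise ValueError("at least one power iteration is required")
    if u is None:
        rng = random.Random(seed)
        u = _normalize([rng.gauss(0.0, 1.0) for _ in weight])
    for _ in range(n_iters):
        v = _normalize(_matvec_t(weight, u))
        u = _normalize(_matvec(weight, v))
    sigma = sum(a * b for a, b in zip(u, _matvec(weight, v)))
    return [[x / sigma for x in row] for row in weight], sigma, u
```

In training, the returned `u` would be passed back in at the next step so that a single iteration per step is enough to track sigma as the weights change.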
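Claim 6 uses the AdaBelief optimizer. As generally described, AdaBelief differs from Adam in its second-moment term: it tracks the exponential moving average of (g_t - m_t)², the deviation of the gradient from its own moving average, rather than of g_t². The sketch below implements one bias-corrected AdaBelief step on a list of scalar parameters; the default hyperparameters are assumptions and may not match the values used in the patent or in any particular library, and the function names are hypothetical.

```python
import math


def init_state(n):
    """Optimizer state for n scalar parameters."""
    return {"t": 0, "m": [0.0] * n, "s": [0.0] * n}


def adabelief_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-16):
    """Return updated parameters after one AdaBelief step (state is updated in place)."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m = beta1 * state["m"][i] + (1 - beta1) * g
        # "Belief" term: squared deviation of the gradient from its moving average.
        s = beta2 * state["s"][i] + (1 - beta2) * (g - m) ** 2 + eps
        state["m"][i], state["s"][i] = m, s
        m_hat = m / (1 - beta1 ** t)  # bias correction
        s_hat = s / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(s_hat) + eps))
    return new_params
```

For example, minimizing (x - 3)² from x = 0 with `lr=0.1` moves x toward 3; when successive gradients agree with their moving average, s stays small and the steps grow, which is the adaptive behavior AdaBelief is designed for.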
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210250457.2A CN114612589A (en) | 2022-03-15 | 2022-03-15 | Application of stable generation countermeasure network in style migration based on attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210250457.2A CN114612589A (en) | 2022-03-15 | 2022-03-15 | Application of stable generation countermeasure network in style migration based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114612589A true CN114612589A (en) | 2022-06-10 |
Family
ID=81863751
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210250457.2A Pending CN114612589A (en) | 2022-03-15 | 2022-03-15 | Application of stable generation countermeasure network in style migration based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114612589A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116958468A (en) * | 2023-07-05 | 2023-10-27 | 中国科学院地理科学与资源研究所 | Mountain snow environment simulation method and system based on SCycleGAN |
CN117994122A (en) * | 2024-01-31 | 2024-05-07 | 哈尔滨工业大学(威海) | Image style migration method based on cyclic generation countermeasure network |
2022-03-15: CN202210250457.2A (CN114612589A), CN, active, pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113673307B (en) | Lightweight video action recognition method | |
Cheng et al. | Cspn++: Learning context and resource aware convolutional spatial propagation networks for depth completion | |
CN114612589A (en) | Application of stable generation countermeasure network in style migration based on attention mechanism | |
CN115311187B (en) | Hyperspectral fusion imaging method, system and medium based on internal and external prior | |
CN112837224A (en) | Super-resolution image reconstruction method based on convolutional neural network | |
CN117078510B (en) | Single image super-resolution reconstruction method of potential features | |
CN111025385B (en) | Seismic data reconstruction method based on low rank and sparse constraint | |
CN111986085A (en) | Image super-resolution method based on depth feedback attention network system | |
Xu et al. | AutoSegNet: An automated neural network for image segmentation | |
CN117994708B (en) | Human body video generation method based on time sequence consistent hidden space guiding diffusion model | |
CN116109689A (en) | Edge-preserving stereo matching method based on guide optimization aggregation | |
CN114037770B (en) | Image generation method of attention mechanism based on discrete Fourier transform | |
CN114598833A (en) | Video frame interpolation method based on spatio-temporal joint attention | |
CN118247414A (en) | Small sample image reconstruction method based on combined diffusion texture constraint nerve radiation field | |
CN114037600A (en) | New cycleGAN style migration network based on new attention mechanism | |
CN107330912B (en) | Target tracking method based on sparse representation of multi-feature fusion | |
CN116309014A (en) | Image style migration method, model, device, electronic equipment and storage medium | |
CN116883524A (en) | Image generation model training, image generation method and device and computer equipment | |
CN113706650A (en) | Image generation method based on attention mechanism and flow model | |
Ding et al. | MSEConv: A Unified Warping Framework for Video Frame Interpolation | |
CN113095328A (en) | Self-training-based semantic segmentation method guided by Gini index | |
You et al. | Generative neural fields by mixtures of neural implicit functions | |
CN114881843B (en) | Fluid artistic control method based on deep learning | |
Zhang et al. | Stochastic reconstruction of porous media based on attention mechanisms and multi-stage generative adversarial network | |
Yu | Reconstruction of compressive sensed (CS) images with deep equilibrium model (DEQ) based on iterative shrinkage-thresholding algorithm (ISTA) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||