CN114037600A - New cycleGAN style migration network based on new attention mechanism - Google Patents

New cycleGAN style migration network based on new attention mechanism

Info

Publication number
CN114037600A
CN114037600A (application CN202111180291.3A)
Authority
CN
China
Prior art keywords
new
image
cyclegan
attention mechanism
conv
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111180291.3A
Other languages
Chinese (zh)
Inventor
李庚隆 (Li Genglong)
徐蔚鸿 (Xu Weihong)
胡雪梅 (Hu Xuemei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha University of Science and Technology
Original Assignee
Changsha University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha University of Science and Technology filed Critical Changsha University of Science and Technology
Priority to CN202111180291.3A priority Critical patent/CN114037600A/en
Publication of CN114037600A publication Critical patent/CN114037600A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Image-to-Image translation maps features of a source domain image into a target domain image so that the target domain image acquires the style of the source domain. However, most Image-to-Image network models cannot extract source-domain features well, which leads to unsatisfactory migration results. Many improved methods have been proposed in recent years, but the migration effect is still limited, and as network models deepen, the excessive video memory consumed by the growing number of parameters becomes increasingly problematic; ideally the image migration effect should improve while the parameter count is reduced. Therefore, an Attention-CycleGAN model is proposed, which is built on a new lightweight attention mechanism and improves the image migration effect while using fewer parameters. Experiments show that, compared with the CycleGAN model, the parameter count is reduced by 13.8M, the consumed video memory is reduced by 6.9G, and the model performs better on the PSNR, SSIM, LPIPS and FID evaluation indices.

Description

New cycleGAN style migration network based on new attention mechanism
Technical Field
The invention relates to the field of style migration, in particular to a new cycleGAN style migration network based on a new attention mechanism.
Background
I2IT (Image-to-Image Translation) is a class of vision and graphics problems whose goal is to learn the mapping from input images to output images from a training set. Many computer vision problems can be regarded as I2IT problems: image super-resolution can be seen as migrating from low resolution to high resolution, image colorization as mapping the features of a grayscale image to a color image, and image synthesis as mapping label features to real pictures; I2IT is also widely used in domain adaptation and data augmentation.
Research on I2IT has developed rapidly since Isola et al. proposed using CGAN as a general solution to the I2IT problem. CGAN-based I2IT learns the mapping from input images to output images and simultaneously learns the loss function used to train that mapping, which makes the model very effective for tasks with highly structured graphical output. However, this method still has the following problems: the quality of the resulting images is not high, and the input images need to be paired, which is very difficult for the I2IT task because matched data sets are hard to acquire. In recent years researchers have therefore improved it from different angles and the results have kept improving; the improvements fall mainly into three categories: improvements based on the loss function, improvements based on the latent space, and improvements based on the network model.
Zhu et al. proposed CycleGAN, which solves the problem that I2IT could only use paired data sets by introducing a cycle consistency loss: while learning the source-domain-to-target-domain mapping, it also learns the reverse mapping from the target domain to the source domain. Likewise, DualGAN and DiscoGAN perform unsupervised I2IT on unpaired data sets, differing only in the loss functions they use. Although feature mapping between unpaired data sets can be achieved with a cycle consistency loss, problems remain. CUT argues that the cycle consistency constraint is too strict and therefore introduces contrastive learning between unpaired images, using a patch-based multi-layer contrastive loss to maximize the mutual information between corresponding patches of the input and output images. DCLGAN argues that using a single embedding for two different domains, as CUT does, may not capture the domain gap effectively and limits performance; it therefore exploits contrastive learning further with a dual-learning scheme to avoid the drawbacks of cycle consistency. These studies improve the loss function and address the data-set mismatch of I2IT and the defects of the cycle consistency loss, but there is still room to improve the extraction of image features during migration.
In addition, improvements based on the latent space have further improved the effect of I2IT. UNIT is an I2IT network built on CoGAN and a shared-latent-space assumption: a pair of images from different domains can be mapped to the same latent representation. Some I2IT methods further assume that the latent space of an image can be decomposed into a content space and a style space, which allows multi-modal outputs to be generated. Huang et al. proposed a multi-modal unsupervised I2IT framework (MUNIT) with two latent representations, style and content; to translate an image to another domain, its content code is combined with different style representations sampled from the target domain. Similarly, Lee et al. introduced diverse image-to-image translation (DRIT) based on disentangled representations of unpaired data, which decomposes the latent space into two parts: a content space shared across domains, and a domain-specific attribute space that produces different outputs for the same content, thereby modeling different variations of the same content.
The I2IT task can also be advanced by improving the network. StarGAN proposes a novel and scalable method that translates images across multiple domains with only one model, solving the problem that a separate model had to be built for every pair of image domains when converting among more than two domains. StarGANv2 builds on StarGAN and uses a single generator to produce diverse images of multiple domains, translating an image of one domain into different images of a target domain and supporting conversion to multiple target domains. NICE-GAN proposes a new I2IT network that reuses the discriminator for encoding and develops a decoupled paradigm for efficient training. Compared with previous networks, NICE-GAN needs no separate encoding component; the plugged-in encoder is trained directly by the adversarial loss and, with a multi-scale discriminator, is more efficient and more informative. These network-model improvements increase migration quality, but as the models grow, their parameter counts and floating-point operation counts increase accordingly.
The I2IT task can also be improved by extracting features further inside the network. An attention mechanism can extract additional information from the feature map, so applying attention to I2IT can yield a better migration effect. For example, Attention-GAN, SPA-GAN and AGGAN all improve migration performance by applying attention to I2IT.
In summary, although the I2IT task has achieved significant results in both single-domain and multi-domain migration through these different improvements, some problems remain: first, the migration effect can still be improved, and second, models have too many parameters and take too long to train. The invention therefore proposes a new attention mechanism and applies it to the I2IT task to extract high-level features of the source domain image, improving the network's migration effect. The generator and the discriminator are also optimized, reducing the number of parameters and shortening the training time while preserving the migration quality.
Disclosure of Invention
The invention aims to solve the problems of poor migration effect and excessive model parameter quantity in the field of style migration, and provides a new cycleGAN style migration network based on a new attention mechanism.
The purpose of the invention can be realized by the following technical scheme:
a new cycleGAN style migration network based on a new attention mechanism, comprising the steps of:
1) selecting a data set for migration from the style migration official data set to realize the migration of two domains in the data set;
2) inputting the sample data set of the first domain into a first generator to generate a second domain image after migration;
3) transmitting the generated second domain image into a discriminator to obtain a discrimination result, and calculating to obtain the countermeasure loss of the first domain migration;
4) inputting the generated second domain image into a second generator, generating an image of the first domain, and calculating the cycle consistency loss of the first domain;
5) the same steps are carried out on the second domain image, and the confrontation loss of the second domain and the cycle consistency loss of the second domain are calculated;
6) adding all losses to obtain a total loss;
7) fixing the parameters of the discriminator so that it does not receive gradient updates, and back-propagating and updating the parameters of the generator (see the sketch after this list);
8) re-enabling gradients for the discriminator, and back-propagating and updating its parameters;
9) optimizing the model through continuous iteration to finally obtain a trained model;
10) inputting the test set image into the trained model to obtain a test result;
11) and testing the trained model by adopting PSNR, SSIM, LPIPS and FID evaluation indexes, parameter quantity, consumed video memory and training time for the test result, and measuring and outputting the result.
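Before the individual loss terms are detailed below, the alternating update in steps 7) and 8) can be sketched in PyTorch (the deep learning framework used in the experiments). This is only an illustrative sketch: the helper set_requires_grad, the function names and the optimizer arguments are assumptions, not the exact implementation of the invention.

```python
def set_requires_grad(nets, flag):
    # Illustrative helper: freeze or unfreeze the parameters of the given networks.
    for net in nets:
        for p in net.parameters():
            p.requires_grad = flag

def train_step(real_x, real_y, G, F, D_X, D_Y, opt_G, opt_D, total_loss_fn, d_loss_fn):
    # Step 7): update the generators while the discriminators are frozen.
    set_requires_grad([D_X, D_Y], False)
    opt_G.zero_grad()
    loss_G = total_loss_fn(G, F, D_X, D_Y, real_x, real_y)   # total loss of equation (4)
    loss_G.backward()
    opt_G.step()

    # Step 8): re-enable discriminator gradients and update the discriminators.
    set_requires_grad([D_X, D_Y], True)
    opt_D.zero_grad()
    loss_D = d_loss_fn(G, F, D_X, D_Y, real_x, real_y)
    loss_D.backward()
    opt_D.step()
    return loss_G.item(), loss_D.item()
```

Repeating this step over the training set for 200 epochs, as described in the training section below, yields the trained model.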
In the step 3), the adversarial (countermeasure) loss is calculated as follows:
For the mapping function G: X → Y and its discriminator D_Y, the adversarial loss is expressed as:
L_GAN(G, D_Y, X, Y) = E_{y~p_data(y)}[log D_Y(y)] + E_{x~p_data(x)}[log(1 - D_Y(G(x)))]   (1)
For the mapping function F: Y → X and its discriminator D_X, the adversarial loss is expressed as:
L_GAN(F, D_X, Y, X) = E_{x~p_data(x)}[log D_X(x)] + E_{y~p_data(y)}[log(1 - D_X(F(y)))]   (2)
In formula (1), G denotes the generator of domain X → domain Y and D_Y denotes the discriminator of the Y domain;
in formula (2), F denotes the generator of domain Y → domain X and D_X denotes the discriminator of the X domain;
in formulas (1) and (2), x ~ p_data(x) and y ~ p_data(y) denote training examples of the X domain and the Y domain, respectively.
In the step 4), the cycle consistency loss is calculated as follows:
L_cyc(G, F) = E_{x~p_data(x)}[||F(G(x)) - x||_1] + E_{y~p_data(y)}[||G(F(y)) - y||_1]   (3)
In the step 6), the total loss is calculated as follows:
L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + λ·L_cyc(G, F)   (4)
In formula (4), λ controls the relative importance of the cycle consistency term with respect to the adversarial terms;
In the step 9), the model is optimized with the following objective:
G*, F* = arg min_{G,F} max_{D_X,D_Y} L(G, F, D_X, D_Y)   (5)
In formula (5), G* and F* are the generators under the optimal condition.
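Losses (1)-(4) can be written compactly in PyTorch. This is a hedged sketch: the binary-cross-entropy-on-logits formulation and the non-saturating generator term are common practical substitutes for the literal min-max objective (5), and λ = 10 is the usual CycleGAN default rather than a value stated in this document.

```python
import torch
import torch.nn.functional as Fnn

def d_adv_loss(D, real, fake):
    # Discriminator side of equations (1)/(2): maximize E[log D(real)] + E[log(1 - D(fake))],
    # written as minimizing binary cross-entropy on the discriminator logits.
    pred_real, pred_fake = D(real), D(fake.detach())
    return (Fnn.binary_cross_entropy_with_logits(pred_real, torch.ones_like(pred_real)) +
            Fnn.binary_cross_entropy_with_logits(pred_fake, torch.zeros_like(pred_fake)))

def g_adv_loss(D, fake):
    # Generator side, in the usual non-saturating form (an assumption; the patent only
    # states the min-max objective (5) and does not prescribe this reformulation).
    pred_fake = D(fake)
    return Fnn.binary_cross_entropy_with_logits(pred_fake, torch.ones_like(pred_fake))

def cycle_loss(G, F, real_x, real_y):
    # Equation (3): L1 reconstruction error in both directions.
    return (torch.mean(torch.abs(F(G(real_x)) - real_x)) +
            torch.mean(torch.abs(G(F(real_y)) - real_y)))

def generator_total_loss(G, F, D_X, D_Y, real_x, real_y, lam=10.0):
    # Equation (4) from the generators' point of view; lam (λ) = 10 is an assumed default.
    return (g_adv_loss(D_Y, G(real_x)) + g_adv_loss(D_X, F(real_y)) +
            lam * cycle_loss(G, F, real_x, real_y))
```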
the invention has the beneficial effects that:
the invention provides a new cycleGAN style migration network based on a new attention mechanism based on style migration problems, and the network improves the migration effect of images while reducing the number of model parameters.
Secondly, the invention adopts a new attention mechanism to carry out the second time of feature extraction on the input features.
And thirdly, the original cycleGAN model generator is optimized, and the generator parameter quantity is reduced.
And fourthly, the invention adopts a new discriminator to improve the discrimination result of the migration effect of the generator, thereby promoting the optimization of the model.
Drawings
FIG. 1 is a diagram of an example generator network model of the present invention.
FIG. 2 is a diagram of a new attention mechanism network model used in an example of the present invention.
FIG. 3 is a diagram of an example arbiter network model of the present invention.
FIG. 4 is a graph comparing the effect of an example of the invention on the horse2zebra dataset with the effects of cycleGAN and AGGAN.
FIG. 5 is a graph comparing the effect of an example of the invention on the apple2orange dataset with the effects of cycleGAN and AGGAN.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Examples
A new CycleGAN style migration network based on a new attention mechanism extracts images of two domains from a data set and uses a generator and a discriminator equipped with the new attention mechanism to extract the features of the input images and train on them, realizing migration between the two domains. The network model of the generator is shown in fig. 1, the network model of the new attention mechanism is shown in fig. 2, and the network model of the new discriminator is shown in fig. 3. The specific steps are as follows:
1. selection of data sets
We selected horse2zebra and apple2orange from the official style migration data sets. The horse2zebra training set includes 1067 horse images and 1334 zebra images, and its test set includes 120 horse images and 140 zebra images. The apple2orange training set includes 995 apple images and 1019 orange images, and its test set includes 266 apple images and 248 orange images.
2. New generator extracts input image features and migrates
Using 3 residual blocks in the new generator reduces the number of model parameters. To further extract image features in the generator, a new attention mechanism is used for a second round of feature extraction; its network model diagram is shown in fig. 2. The new attention mechanism extracts the feature map in the following specific steps:
Assuming the input feature map of the attention mechanism has size H × W × C, it is expressed mathematically as:
F_in = (f_ijk)_{H×W×C}   (6)
s.t. i = 1,2,...,H; j = 1,2,...,W; k = 1,2,...,C.
In formula (6), F_in is the input feature map and f_ijk is the value of F_in at the point in the k-th channel, i-th row and j-th column.
The first step: map the input feature map F_in into a two-dimensional matrix to obtain S_{C×HW×1} = (s_khp)_{C×HW×1}, expressed mathematically as:
s_khp = f_ijk, with i = ⌈h/W⌉ and j = h - (i - 1)·W   (7)
s.t. h = 1,2,...,HW; k = 1,2,...,C; p = 1.
In formula (7), S denotes the two-dimensional matrix after the mapping, and s_khp is the value of S at the point in the p-th channel, k-th row and h-th column: the k-th channel of F_in at the ⌈h/W⌉-th row and (h - (⌈h/W⌉ - 1)·W)-th column is mapped to s_khp; · denotes multiplication of two numbers.
The second step: a row-vector convolution kernel of undetermined parameters, Conv_0 = W_{1×(3·3·C)×1} = (w_nmp)_{1×(3·3·C)×1}, is multiplied with the column vectors G_mnp to form a matrix D_{H×W×1} = (d_ijp)_{H×W×1}, expressed mathematically as:
d_ijp = Σ_{m=1}^{3·3·C} w_nmp · G_mnp   (8)
G_mnp = [(f_(i-1)(j-1)k)_C, (f_(i-1)jk)_C, (f_(i-1)(j+1)k)_C, (f_i(j-1)k)_C, (f_ijk)_C, (f_i(j+1)k)_C, (f_(i+1)(j-1)k)_C, (f_(i+1)jk)_C, (f_(i+1)(j+1)k)_C]^T   (9)
s.t. i = 1,2,...,H; j = 1,2,...,W; k = 1,2,...,C; m = 1,2,...,3·3·C; n = 1; p = 1.
In formula (8), w_nmp is the value of Conv_0 at the point in the p-th channel, n-th row and m-th column; G_mnp, given by formula (9), is the vectorized 3×3×C neighborhood of position (i, j) in the input feature map. D denotes the matrix obtained after the convolution, d_ijp is the value of D at the point in the p-th channel, i-th row and j-th column, · denotes multiplication of two numbers, and T denotes matrix transposition.
The third step: map the feature matrix D into a column vector to obtain D'_{HW×1×1} = (d'_hnp)_{HW×1×1}, expressed mathematically as:
d'_hnp = d_ijp, with i = ⌈h/W⌉ and j = h - (i - 1)·W   (10)
s.t. h = 1,2,...,HW; n = 1; p = 1.
In formula (10), D' denotes the column vector obtained by the mapping, and d'_hnp is the value of D' at the point in the p-th channel, h-th row and n-th column: the p-th channel of D at the ⌈h/W⌉-th row and (h - (⌈h/W⌉ - 1)·W)-th column is mapped to d'_hnp; · denotes multiplication of two numbers.
The fourth step: apply a softmax transformation to each element d'_hnp of the feature vector D' to obtain D''_{HW×1×1} = (d''_hnp)_{HW×1×1}, expressed mathematically as:
d''_hnp = exp(d'_hnp) / Σ_{t=1}^{HW} exp(d'_tnp)   (11)
s.t. h = 1,2,...,HW; t = 1,2,...,HW; n = 1; p = 1.
In formula (11), D'' denotes the feature vector after the softmax transformation, and d''_hnp is the value of D'' at the point in the p-th channel, h-th row and n-th column.
The fifth step: matrix S and matrix D'' are multiplied to obtain the matrix U_{C×1×1} = (u_knp)_{C×1×1}, expressed mathematically as:
u_knp = Σ_{h=1}^{HW} s_khp · d''_hnp, i.e. U = S ⊗ D''   (12)
s.t. k = 1,2,...,C; n = 1; p = 1.
In formula (12), U denotes the result of multiplying the matrix S by the matrix D'', u_knp is the value at the point in the p-th channel, k-th row and n-th column, and ⊗ denotes matrix multiplication.
The sixth step: construct C convolution kernels Conv_1, Conv_2, ..., Conv_C, each containing C undetermined parameters (C² parameters in total), and convolve each with U, i.e. Conv_k ⊗ U, generating C numbers that form a column vector V_{C×1×1} = (v_knp)_{C×1×1}, expressed mathematically as:
v_knp = Conv_k ⊗ U   (13)
s.t. k = 1,2,...,C.
In formula (13), V denotes the feature vector obtained by the convolution, v_knp is the value at the point in the p-th channel, k-th row and n-th column of V, Conv_k denotes the k-th convolution kernel, and ⊗ denotes matrix multiplication.
The seventh step: apply a LayerNorm transformation to each element in V to obtain V'_{C×1×1} = (v'_knp)_{C×1×1}, expressed mathematically as:
v'_knp = γ · (v_knp - μ) / sqrt(σ² + ε) + β   (14)
s.t. k = 1,2,...,C; n = 1; p = 1.
In formula (14), V' denotes the feature vector after the LayerNorm transformation and v'_knp is the value at the point in the p-th channel, k-th row and n-th column of V'. γ denotes the weight coefficient, μ denotes the mean of the feature vector V, σ² denotes the variance of the feature vector V, ε denotes a very small value, β denotes the bias coefficient, and · denotes multiplication of two numbers.
The eighth step: apply a ReLU transformation to each element in V' to obtain V''_{C×1×1} = (v''_knp)_{C×1×1}, expressed mathematically as:
v''_knp = max(0, v'_knp)   (15)
s.t. k = 1,2,...,C; n = 1; p = 1.
In formula (15), V'' denotes the feature vector after the ReLU transformation, and v''_knp is the value at the point in the p-th channel, k-th row and n-th column of V''.
The ninth step: construct C convolution kernels Conv'_1, Conv'_2, ..., Conv'_C, each containing C undetermined parameters (C² parameters in total), and convolve each with V'', i.e. Conv'_k ⊗ V'', generating C numbers that form a column vector Q_{C×1×1} = (q_knp)_{C×1×1}, expressed mathematically as:
q_knp = Conv'_k ⊗ V''   (16)
s.t. k = 1,2,...,C; n = 1; p = 1.
In formula (16), Q denotes the feature vector after this convolution operation, q_knp is the value at the point in the p-th channel, k-th row and n-th column of Q, Conv'_k denotes the k-th convolution kernel, and ⊗ denotes matrix multiplication.
The tenth step: add the feature vector U to the feature vector Q to obtain Q'_{C×1×1} = (q'_knp)_{C×1×1}, expressed mathematically as:
q'_knp = q_knp + u_knp   (17)
s.t. k = 1,2,...,C; n = 1; p = 1.
In formula (17), Q' denotes the result of the addition of the feature vectors, and q'_knp is the value at the point in the p-th channel, k-th row and n-th column of Q'.
The eleventh step: arrange H·W copies of the vector (Q')^T in sequence to form a three-dimensional matrix Q''_{H×W×C} = (q''_ijk)_{H×W×C}, expressed mathematically as:
q''_ijk = (q'_knp)^T   (18)
s.t. i = 1,2,...,H; j = 1,2,...,W; k = 1,2,...,C; n = 1; p = 1.
In formula (18), Q'' denotes the three-dimensional matrix after the sequential arrangement, q''_ijk is the value at the point in the k-th channel, i-th row and j-th column of Q'', and T denotes matrix transposition.
The twelfth step: add the three-dimensional matrix Q'' and the input feature map F_in element-wise to obtain the output feature map F_out = (o_ijk)_{H×W×C}, expressed mathematically as:
o_ijk = q''_ijk + f_ijk   (19)
s.t. i = 1,2,...,H; j = 1,2,...,W; k = 1,2,...,C.
In formula (19), F_out denotes the result of adding the two feature maps, and o_ijk is the value of F_out at the point in the k-th channel, i-th row and j-th column.
The general formula: with F_in = (f_ijk)_{H×W×C} as the input feature map of the attention mechanism and F_out = (o_ijk)_{H×W×C} as its output feature map, the overall mathematical expression is:
F_out = F_in + Arrange( U + W_v2 ⊗ ReLU( LN( W_v1 ⊗ U ) ) )   (20)
s.t. h = 1,2,...,HW; h' = 1,2,...,HW.
In formula (20), U = (u_k)_{C×1} with u_k = Σ_{h=1}^{HW} s_kh1 · exp(d'_h11) / Σ_{h'=1}^{HW} exp(d'_h'11), s.t. k = 1,2,...,C.
W_v2 is a C × C two-dimensional matrix formed by C convolution kernels as column vectors, i.e. W_v2 = [Conv'_1, Conv'_2, ..., Conv'_C]^T, where Conv'_k is the k-th convolution kernel containing C undetermined parameters.
ReLU denotes the ReLU activation function, i.e. ReLU(A) = (max(0, a_ij)), where A denotes a two-dimensional matrix and a_ij is a value in A.
LN denotes LayerNorm normalization, i.e. LN(B) = γ·(b_ij - μ)/sqrt(σ² + ε) + β, where B is a two-dimensional matrix, b_ij is a value in the B matrix, μ is the mean of the B matrix, σ² is the variance of the B matrix, ε is a small number, γ is the weight coefficient, and β is the bias coefficient.
W_v1 is a C × C two-dimensional matrix formed by C convolution kernels as column vectors, i.e. W_v1 = [Conv_1, Conv_2, ..., Conv_C]^T, where Conv_k is the k-th convolution kernel containing C undetermined parameters.
d'_h''11 = Conv_0 ⊗ G_h''   (21)
s.t. h'' = 1,2,...,HW.
In formula (21), Conv_0 is the row-vector convolution kernel of undetermined parameters, G_h'' is the vectorized 3×3×C neighborhood of the h''-th spatial position (formula (9)), and ⊗ denotes matrix multiplication.
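The twelve steps above (equations (6)-(20)) amount to a lightweight, global-context-style channel attention. A minimal PyTorch sketch is given below: conv0 corresponds to Conv_0, the two 1×1 convolutions to W_v1 and W_v2, and the bias-free convolutions and batch handling are assumptions not specified in the text.

```python
import torch
import torch.nn as nn

class NewAttention(nn.Module):
    """Sketch of the attention mechanism described by equations (6)-(20)."""

    def __init__(self, channels):
        super().__init__()
        # Conv_0: one 3x3xC kernel producing a single-channel spatial map (equations (8)-(9)).
        self.conv0 = nn.Conv2d(channels, 1, kernel_size=3, padding=1, bias=False)
        # W_v1, W_v2: C kernels with C parameters each, i.e. 1x1 convolutions (equations (13), (16)).
        self.wv1 = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.wv2 = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        # LayerNorm over the C-dimensional context vector (equation (14)).
        self.ln = nn.LayerNorm([channels, 1, 1])

    def forward(self, x):
        b, c, h, w = x.shape
        s = x.view(b, c, h * w)                          # S: C x HW (equation (7))
        d = self.conv0(x).view(b, 1, h * w)              # D': 1 x HW (equations (8)-(10))
        attn = torch.softmax(d, dim=-1)                  # D'': softmax over spatial positions (11)
        u = torch.bmm(s, attn.transpose(1, 2))           # U = S x D'': C x 1 (equation (12))
        u = u.view(b, c, 1, 1)
        q = self.wv2(torch.relu(self.ln(self.wv1(u))))   # W_v2 ReLU(LN(W_v1 U)) (equations (13)-(16))
        q = q + u                                        # inner residual of step 10 (equation (17))
        return x + q                                     # broadcast over HxW and add, steps 11-12
```

In this sketch the module adds roughly 2C² + 11C parameters per insertion point, which is consistent with the lightweight character of the mechanism claimed above.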
3. The discriminator discriminates the migration result of the generator
In order to improve the discrimination accuracy of the discriminator, the residual idea is adopted in the discriminator, so that further feature extraction is performed on top of the original features of the feature map; the network model diagram of the discriminator is shown in fig. 3.
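Since fig. 3 is not reproduced here, the residual idea in the discriminator can be illustrated with a minimal PyTorch block. The kernel sizes, normalization and activation slope below are assumptions and not the patented layer configuration.

```python
import torch.nn as nn

class ResidualDiscBlock(nn.Module):
    """Sketch of a discriminator down-sampling block with a residual shortcut."""

    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
            nn.InstanceNorm2d(out_ch),
            nn.LeakyReLU(0.2, inplace=True),
        )
        # 1x1 projection so the shortcut matches the body in channels and spatial size.
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride)

    def forward(self, x):
        # Features are extracted on top of the original feature map, as described above.
        return self.body(x) + self.skip(x)
```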
4. Mathematical commonality analysis of models
Taking CycleGAN as a reference, carrying out mathematical universality analysis on the network model, wherein the analysis steps are as follows:
(1) input image size commonality analysis
Assuming that the size of the input image is H × W × C, the image size is changed as shown in fig. 1, and although upsampling and downsampling are performed, the size of the image is finally restored to H × W × C. Our model is therefore valid for arbitrary size input image migration.
(2) Calculated volume commonality analysis
Assume that the input image size of an arbitrary block in our model is H × W × C and the output image size is H' × W' × C'. All kinds of blocks in the network structure and the computational cost of each block are listed in Table 1, where S_×, S_÷, S_+, S_-, S_{>,<,≥,≤,==,≠} and S_= are respectively the numbers of multiply, divide, add, subtract, compare and assign operations, and k is the size of the convolution kernel in the convolution operation. Convolution denotes the convolution operation; ReLU, Leaky_ReLU, tanh and softmax denote the corresponding activation functions; adding denotes matrix addition; mapping denotes matrix mapping; multiplying denotes two-dimensional matrix multiplication (in which case C = 1); and arranging denotes the sequential arrangement of vectors into a three-dimensional tensor.
A second-order Taylor expansion of tanh is used, expressed as:
tanh(x) ≈ x - x³/3   (22)
According to formula (22), after one tanh transformation each value needs to undergo two multiplications, one division and one subtraction.
A second-order Taylor expansion is likewise applied to softmax, whose definition is:
softmax(d'_h) = exp(d'_h) / Σ_{t=1}^{HW} exp(d'_t)   (23)
In formula (23), HW is a constant; the denominator is computed only once per softmax transformation and is therefore ignored, so after a softmax transformation each value undergoes HW additions and one division.
The computational cost of the model is measured in two ways: the first considers only the complex operations (multiplication and division); the second additionally considers the simple operations (addition, subtraction, comparison and assignment). Assume the input image size is H × W × C, with H, W, C ≥ 1. For the CycleGAN network, the computation considering only complex operations is:
S_CycleGAN = (6532C + 897720.0156)HW
Considering the simple operations of the CycleGAN network as well, the computation is:
(formula image not reproduced in the source)
For our model, the computation considering only complex operations is:
S_Attention-CycleGAN = (6340C + 462198.2656)HW + 271488
Considering both complex and simple operations, the computation is:
(formula image not reproduced in the source)
we analyzed the amount of computation we model reduces over cycleGAN from considering both complex and simple computations, respectively.
When only the case of complex operations is considered, the reduced amount of computation is:
Figure BDA0003296821990000073
as can be seen from equation (24), τ increases as HW increases and C decreases. Conversely, as HW decreases, C increases and τ decreases. However, no matter how much HW and C are taken, when only complex operation is considered, the reduction of the model is more than 2.94 percent of calculation amount and less than 48.18 percent compared with the CycleGAN model.
When both complex and simple operations are considered, the reduced proportion of computation, denoted τ', is:
(formula image not reproduced in the source)   (25)
As HW and C change, τ' follows the same trend as τ. However, no matter what values HW and C take, when complex and simple operations are both considered our model reduces the computation by more than 2.94% and less than 48.15% compared with the CycleGAN model.
TABLE 1 model calculation cost
(3) Parameter quantity commonality analysis
We counted the parameters of CycleGAN and of our model, with the following results:
P_CycleGAN = (7296C + 14120960) × 2
P_Attention-CycleGAN = (6528C + 7216771) × 2
The number of parameters our model saves compared with CycleGAN is:
P = P_CycleGAN - P_Attention-CycleGAN = 1536C + 13808378 ≥ 13809914   (26)
As can be seen from formula (26), P increases as C increases; but regardless of the input image size, our model has at least 13.8M fewer parameters than CycleGAN.
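The closed-form counts above can be checked with a few lines of Python; the helper names are illustrative, and the formulas are taken from the parameter expressions and from equation (24) as reconstructed above.

```python
def cyclegan_params(c):
    # P_CycleGAN from the analysis above (the factor 2 covers both translation directions).
    return (7296 * c + 14120960) * 2

def attention_cyclegan_params(c):
    # P_Attention-CycleGAN from the analysis above.
    return (6528 * c + 7216771) * 2

def complex_op_reduction(h, w, c):
    # tau from equation (24): relative reduction in multiply/divide operations.
    s_cycle = (6532 * c + 897720.0156) * h * w
    s_attn = (6340 * c + 462198.2656) * h * w + 271488
    return (s_cycle - s_attn) / s_cycle

if __name__ == "__main__":
    print(cyclegan_params(3) - attention_cyclegan_params(3))   # 13812986, i.e. about 13.8M fewer parameters
    print(round(complex_op_reduction(256, 256, 3), 4))         # about 0.4754, within the 2.94%-48.18% range
```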
5. Training model
The total loss was calculated and back-propagated, and the trained model was obtained after 200 iterations in the experimental environment of table 2.
Table 2. Experimental environment
Name: Configuration
Operating system: Ubuntu 18.04
GPU: NVIDIA GeForce RTX 2080Ti
CPU: Intel Xeon processor (Skylake, IBRS), ×2
RAM: 16GB
GPU libraries: CUDA 10.2, cuDNN 7.6
Deep learning framework: PyTorch
6. Test model
And inputting the test set images in the data set into a generator of the trained model to obtain a test result. And testing the trained model by adopting PSNR, SSIM, LPIPS and FID evaluation indexes, parameter quantity, consumed video memory and training time, and measuring a migration result.
7. Description of evaluation index
We evaluated our test results using PSNR, SSIM, LPIPS, FID evaluation indices.
PSNR, the peak signal-to-noise ratio, is defined through the mean square error (MSE). Given two images of the same size m × n, an image G and a noisy reference image H, MSE and PSNR are defined as:
MSE = (1/(m·n)) Σ_{i=1}^{m} Σ_{j=1}^{n} (G(i, j) - H(i, j))²   (27)
PSNR = 10 · log10(MAX² / MSE)   (28)
In formulas (27) and (28), MAX is the maximum possible pixel value of the picture, MSE is the mean square error between the current image G and the reference image H, and m and n are the height and width of the image, respectively.
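As a concrete illustration of equations (27) and (28), a NumPy implementation of PSNR might look as follows; MAX = 255 assumes 8-bit images and is a choice, not a value stated in the text.

```python
import numpy as np

def psnr(g, h, max_val=255.0):
    """Peak signal-to-noise ratio between image G and reference H, per equations (27)-(28)."""
    mse = np.mean((g.astype(np.float64) - h.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```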
SSIM is an index for measuring the similarity of two images. It measures similarity based on the luminance, contrast and structure of samples x and y; the more similar two pictures are, the higher the SSIM value.
LPIPS is a newer perceptual metric that simulates human perception; its calculation relies on a VGG network, which extracts deep features of the picture across different structures and tasks.
FID measures the similarity of two sets of images in terms of the statistics of computer-vision features of the raw images; it measures the distance between the feature vectors of real images and generated images.
8. Description of comparative model
Attention-CycleGAN was compared to CycleGAN and AGGAN.
CycleGAN is an I2IT for unpaired data, using a cyclic consistency penalty to learn not only the source domain to target domain mapping, but also the reverse mapping from target domain to source domain.
AGGAN addresses the difficulty that conventional unsupervised I2IT techniques have in focusing attention on individual objects without changing the background or other objects in the picture, by introducing an unsupervised attention mechanism that is trained adversarially together with the generator and the discriminator.
9. Description of qualitative comparison
The test results of CycleGAN, AGGAN and the proposed Attention-CycleGAN model on the horse2zebra dataset are shown in fig. 4. In the horse-to-zebra direction, the first-row results of CycleGAN and AGGAN show artifacts of different degrees, while Attention-CycleGAN avoids them. In the second row, CycleGAN does not completely transfer the horse into a zebra and the AGGAN result has too many artifacts and is blurry, while Attention-CycleGAN completes the transfer and keeps the image sharp. In the zebra-to-horse direction, the third-row results of CycleGAN and AGGAN show over-migration of different degrees, causing loss of information about the horse, while Attention-CycleGAN avoids the over-migration problem. In the fourth row, the migration of CycleGAN and AGGAN is incomplete and stripe features remain, while the migration effect of Attention-CycleGAN is clearly better than both.
The test results of cycleGAN, AGGAN and the proposed Attention-cycleGAN model on the apple2orange dataset are shown in fig. 5. In the apple-to-orange direction, the hands in the CycleGAN and AGGAN results are damaged to different degrees and part of the content is still not transferred, while Attention-CycleGAN transfers the apple features to oranges without altering the hands. In the second row, CycleGAN does not achieve the image migration and AGGAN changes the color of the onion in the background, while Attention-CycleGAN migrates the image without changing the background. In the orange-to-apple direction, artifacts appear in both CycleGAN and AGGAN in the third row. In the fourth row, the color migration of CycleGAN is incomplete and AGGAN shows artifacts in the lower left corner of the picture. Compared with both, Attention-CycleGAN achieves the image migration while ensuring that no artifacts occur.
10. Description of quantitative comparison
The proposed Attention-CycleGAN model is compared with the CycleGAN and AGGAN models. Test results are obtained on the test sets of the horse2zebra and apple2orange datasets, evaluated with the PSNR, SSIM, LPIPS and FID indices, and averaged. The index results on the horse2zebra dataset are shown in Table 3, and those on the apple2orange dataset in Table 4. The comparison of these indices shows that the proposed Attention-CycleGAN model improves the image migration effect.
TABLE 3 Average PSNR, average SSIM, average LPIPS and average FID of cycleGAN, AGGAN and Attention-cycleGAN on the horse->zebra dataset
TABLE 4 Average PSNR, average SSIM, average LPIPS and average FID of cycleGAN, AGGAN and Attention-cycleGAN on the apple->orange dataset
The parameter quantities and running times of the three models are counted on the two datasets; the consumption results on the horse2zebra dataset are shown in Table 6, and those on the apple2orange dataset in Table 7.
TABLE 6 Parameter quantity, consumed video memory, and running time of cycleGAN, AGGAN and Attention-cycleGAN on the horse->zebra dataset
TABLE 7 Parameter quantity, consumed video memory, and running time of cycleGAN, AGGAN and Attention-cycleGAN on the apple2orange dataset
The above embodiments describe in detail a specific implementation of a new CycleGAN style migration network based on a new attention mechanism, and the above embodiments are only used to help understand the proposed method and core idea of the present invention.

Claims (4)

1. A new cycleGAN style migration network based on a new attention mechanism, comprising the steps of:
(1) selecting a data set for migration from the style migration official data set;
(2) forward propagation: inputting the sample data sets of the two domains into a new generator, and performing convolution, the new attention mechanism, residual blocks and transposed-convolution operations to obtain the migrated generated images;
(3) back propagation: first, the parameters of the new discriminator are fixed so that they do not receive gradient updates, and the new generator is back-propagated and its parameters are updated; then, gradients are re-enabled for the new discriminator, which is back-propagated and its parameters are updated;
(4) and testing the trained model by adopting PSNR, SSIM, LPIPS and FID evaluation indexes, parameter quantity, consumed video memory and training time, and measuring a migration result.
2. The new CycleGAN style migration network based on a new attention mechanism as claimed in claim 1, wherein the new generator in step (2) uses the new attention mechanism. With F_in = (f_ijk)_{H×W×C} as the input feature map of the attention mechanism and F_out = (o_ijk)_{H×W×C} as its output feature map, the output feature map can be regarded as the result of a further feature extraction performed on the input features. The mathematical expression is:
F_out = F_in + Arrange( U + W_v2 ⊗ ReLU( LN( W_v1 ⊗ U ) ) )
s.t. h = 1,2,...,HW; h' = 1,2,...,HW,
where U = (u_k)_{C×1} with u_k = Σ_{h=1}^{HW} s_kh1 · exp(d'_h11) / Σ_{h'=1}^{HW} exp(d'_h'11), s.t. k = 1,2,...,C.
W_v2 is a C × C two-dimensional matrix formed by C convolution kernels as column vectors, i.e. W_v2 = [Conv'_1, Conv'_2, ..., Conv'_C]^T, where Conv'_k is the k-th convolution kernel containing C undetermined parameters.
ReLU denotes the ReLU activation function, i.e. ReLU(A) = (max(0, a_ij)), where A denotes a two-dimensional matrix and a_ij is a value in A.
LN denotes LayerNorm normalization, i.e. LN(B) = γ·(b_ij - μ)/sqrt(σ² + ε) + β, where B is a two-dimensional matrix, b_ij is a value in the B matrix, μ is the mean of the B matrix, σ² is the variance of the B matrix, ε is a small number, γ is the weight coefficient, and β is the bias coefficient.
W_v1 is a C × C two-dimensional matrix formed by C convolution kernels as column vectors, i.e. W_v1 = [Conv_1, Conv_2, ..., Conv_C]^T, where Conv_k is the k-th convolution kernel containing C undetermined parameters.
d'_h''11 = Conv_0 ⊗ G_h'', s.t. h'' = 1,2,...,HW, where Conv_0 is the row-vector convolution kernel of undetermined parameters, G_h'' is the vectorized 3×3×C neighborhood of the h''-th spatial position, and ⊗ denotes matrix multiplication.
3. The new CycleGAN style migration network based on a new attention mechanism as claimed in claim 1, wherein in step (2) the new generator, after using the new attention mechanism, extracts image features at a better level, so that a good migration effect is obtained while the parameter count is reduced by using 3 residual blocks, avoiding the excessive parameters and gradient explosion caused by using 9 residual blocks.
4. The new CycleGAN style migration network based on a new attention mechanism as claimed in claim 1, wherein the new discriminator in step (3) adopts the residual idea in its convolutions, so that features are extracted on the basis of the original feature map of the image, preventing the discriminator from failing to extract image features well due to vanishing gradients in an overly deep network.
CN202111180291.3A 2021-10-11 2021-10-11 New cycleGAN style migration network based on new attention mechanism Pending CN114037600A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111180291.3A CN114037600A (en) 2021-10-11 2021-10-11 New cycleGAN style migration network based on new attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111180291.3A CN114037600A (en) 2021-10-11 2021-10-11 New cycleGAN style migration network based on new attention mechanism

Publications (1)

Publication Number Publication Date
CN114037600A true CN114037600A (en) 2022-02-11

Family

ID=80141033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111180291.3A Pending CN114037600A (en) 2021-10-11 2021-10-11 New cycleGAN style migration network based on new attention mechanism

Country Status (1)

Country Link
CN (1) CN114037600A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021115159A1 (en) * 2019-12-09 2021-06-17 中兴通讯股份有限公司 Character recognition network model training method, character recognition method, apparatuses, terminal, and computer storage medium therefor
CN111696027A (en) * 2020-05-20 2020-09-22 电子科技大学 Multi-modal image style migration method based on adaptive attention mechanism
CN112070658A (en) * 2020-08-25 2020-12-11 西安理工大学 Chinese character font style migration method based on deep learning
CN112767519A (en) * 2020-12-30 2021-05-07 电子科技大学 Controllable expression generation method combined with style migration

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李鹰 (Li Ying); 徐蔚鸿 (Xu Weihong); 唐良荣 (Tang Liangrong): "Fuzzy associative memory network with parameterized aggregation operators" (带参数聚合算子的模糊联想记忆网络), Control Theory & Applications (控制理论与应用), no. 11, 15 November 2010 (2010-11-15) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116958468A (en) * 2023-07-05 2023-10-27 中国科学院地理科学与资源研究所 Mountain snow environment simulation method and system based on SCycleGAN
CN118115862A (en) * 2024-04-30 2024-05-31 南京信息工程大学 Face image tampering anomaly detection method, device and medium
CN118115862B (en) * 2024-04-30 2024-07-05 南京信息工程大学 Face image tampering anomaly detection method, device and medium

Similar Documents

Publication Publication Date Title
Wang et al. Mixed transformer u-net for medical image segmentation
Lee et al. Monocular depth estimation using relative depth maps
Huang et al. Joint-sparse-blocks and low-rank representation for hyperspectral unmixing
Miao et al. Low-rank quaternion tensor completion for recovering color videos and images
Bishop et al. A hierarchical latent variable model for data visualization
Liu et al. TransUNet+: Redesigning the skip connection to enhance features in medical image segmentation
Wen et al. Image recovery via transform learning and low-rank modeling: The power of complementary regularizers
Luo et al. Bayesian MRI reconstruction with joint uncertainty estimation using diffusion models
Zhang et al. Adaptive importance learning for improving lightweight image super-resolution network
CN114037600A (en) New cycleGAN style migration network based on new attention mechanism
Zhang et al. Robust regularized singular value decomposition with application to mortality data
Wang et al. General solution to reduce the point spread function effect in subpixel mapping
Guo et al. Deep attentive wasserstein generative adversarial networks for MRI reconstruction with recurrent context-awareness
Ma et al. A super-resolution convolutional-neural-network-based approach for subpixel mapping of hyperspectral images
CN115760814A (en) Remote sensing image fusion method and system based on double-coupling deep neural network
Paul et al. Dimensionality reduction of hyperspectral image using signal entropy and spatial information in genetic algorithm with discrete wavelet transformation
Zhang et al. Graph convolutional networks-based super-resolution land cover mapping
Shao et al. Iviu-net: Implicit variable iterative unrolling network for hyperspectral sparse unmixing
CN114612589A (en) Application of stable generation countermeasure network in style migration based on attention mechanism
Cao et al. Unsupervised multi-level spatio-spectral fusion transformer for hyperspectral image super-resolution
Liu et al. Sparse and dense hybrid representation via subspace modeling for dynamic MRI
Guo et al. Hypercomplex low rank reconstruction for nmr spectroscopy
Li et al. Enhanced transformer encoder and hybrid cascaded upsampler for medical image segmentation
Sipilä et al. Nonlinear blind source separation exploiting spatial nonstationarity
Cao et al. Hierarchical neural architecture search with adaptive global–local feature learning for Magnetic Resonance Image reconstruction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination