CN114037600A - New cycleGAN style migration network based on new attention mechanism - Google Patents
- Publication number: CN114037600A
- Authority: CN (China)
- Prior art keywords: new, image, cyclegan, attention mechanism, conv
- Prior art date: 2021-10-11
- Legal status: Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
Image-to-Image translation maps the features of a source-domain image into a target-domain image, so that the target-domain image acquires the style of the source domain. However, most network models that perform Image-to-Image translation cannot extract the source-domain features well, so the migration result is unsatisfactory. Many improved methods have appeared in recent years, but the migration effect is still poor, and as network models deepen, the video memory consumed by their growing parameter counts becomes more and more of a problem; ideally the image migration effect should improve while the parameters are reduced. An Attention-CycleGAN model is therefore proposed, built on a new lightweight attention mechanism, which improves the image migration effect while using fewer parameters. Experiments show that, compared with the CycleGAN model, the parameter count is reduced by 13.8M and the consumed video memory by 6.9G, and the model performs better on the PSNR, SSIM, LPIPS, and FID evaluation indices.
Description
Technical Field
The invention relates to the field of style migration, and in particular to a new CycleGAN style migration network based on a new attention mechanism.
Background
I2IT (Image-to-Image Translation) is a class of vision and graphics problems whose goal is to learn the mapping from input images to output images from a training set. Many computer vision problems can be cast as I2IT problems: image super-resolution can be regarded as migrating from low resolution to high resolution, image colorization as mapping the features of grayscale images to color images, and image synthesis as mapping label features to real pictures. I2IT also has wide application in domain adaptation and data augmentation.
The study of I2IT has developed rapidly since Isola et al. proposed CGAN as a general solution to the I2IT problem. CGAN-based I2IT learns the mapping from input images to output images while also learning the loss function used to train that mapping, making the model very effective for tasks with highly structured graphical output. However, the method still has the following problems: the resulting image quality is not high, and the input images need to be paired, which is very difficult for the I2IT task, where matched datasets are hard to acquire. In recent years the method has therefore been improved from different angles, and the effect keeps improving; the improvements fall mainly into three categories: improvements based on the loss function, improvements based on the latent space, and improvements based on the network model.
Zhu et al. proposed CycleGAN, which solves the problem that I2IT could only use paired datasets by introducing a cycle-consistency loss: while learning the source-to-target mapping, it also learns the reverse target-to-source mapping. Likewise, DualGAN and DiscoGAN are I2IT methods that use unsupervised learning on unpaired datasets, differing only in the loss functions used. Although cycle-consistency loss enables feature mapping between unpaired datasets, some problems remain. CUT argues that the cycle-consistency constraint is too strict and therefore introduces contrastive learning between unpaired images, employing a patch-based, multi-layer loss to maximize the mutual information between corresponding patches of the input and output images. DCLGAN argues that using a single embedding for two different domains, as CUT does, may not effectively capture the domain gap, limiting performance; it therefore exploits contrastive learning further with a dual-learning scheme to avoid the drawbacks of cycle consistency. These works improve on the loss function and address both the dataset-mismatch problem of I2IT and the drawbacks of cycle-consistency loss, but there is still room for improvement in image feature extraction during migration.
In addition, latent-space-based improvements have further advanced the effect of I2IT. UNIT is an I2IT network built on CoGAN and a shared-latent-space assumption, namely that a pair of images from different domains can be mapped to the same latent representation. Some I2IT methods assume that the latent space of an image can be decomposed into a content space and a style space, which allows the generation of multi-modal outputs. Huang et al. proposed MUNIT, a multi-modal unsupervised I2IT framework with two latent representations, style and content; to translate an image to another domain, its content code is combined with different style representations sampled from the target domain. Similarly, Lee et al. introduced DRIT, diverse image-to-image translation based on disentangled representations for unpaired data, which decomposes the latent space into two: a domain-invariant content space that shares information, and a domain-specific attribute space that generates different outputs for the same content, thereby modeling different variations of the same content.
Improving the network itself can also advance the I2IT task. StarGAN proposed a novel and scalable method that translates images across multiple domains with only one model, solving the problem that a separate model had to be built for each pair of image domains when converting among more than two domains. StarGANv2 builds on StarGAN with a single generator that produces diverse images across multiple domains, translating an image of one domain into different images of a target domain and supporting conversion to multiple target domains. NICE-GAN proposed a new I2IT network that reuses the discriminator for encoding and develops a decoupled paradigm for efficient training. Unlike previous networks, NICE-GAN needs no separate encoding component; its plug-in encoder is trained directly by the adversarial loss and becomes more informative and more effective when a multi-scale discriminator is applied. These network-model improvements raise the migration effect, but as the models gradually grow, their parameter counts and floating-point operation counts increase accordingly.
The I2IT task can also be refined by extracting features further inside the network. An attention mechanism can further extract the features in a feature map, so applying attention to I2IT can achieve a better migration effect. For example, Attention-GAN, SPA-GAN, and AGGAN all improved migration performance by applying attention to I2IT.
In summary, although the I2IT task has achieved significant results in both single-domain and multi-domain migration through these different improvements, some problems remain: first, the migration effect can still be improved; second, the models have too many parameters and train too slowly. The invention therefore proposes a new attention mechanism and applies it to the I2IT task, extracting the high-level features of the source-domain image so as to improve the migration effect of the network. The generator and the discriminator are also optimized, reducing the number of parameters while preserving the migration quality and shortening the training time.
Disclosure of Invention
The invention aims to solve the problems of poor migration effect and excessive model parameters in the field of style migration, and provides a new CycleGAN style migration network based on a new attention mechanism.
The purpose of the invention can be realized by the following technical scheme:
a new cycleGAN style migration network based on a new attention mechanism, comprising the steps of:
1) selecting a dataset for migration from the official style-migration datasets, to realize migration between the two domains in the dataset;
2) inputting the sample dataset of the first domain into the first generator to generate the migrated second-domain image;
3) passing the generated second-domain image into a discriminator to obtain a discrimination result, and calculating the adversarial loss of the first-domain migration;
4) inputting the generated second-domain image into the second generator to generate an image of the first domain, and calculating the cycle-consistency loss of the first domain;
5) carrying out the same steps on the second-domain image, and calculating the adversarial loss and the cycle-consistency loss of the second domain;
6) adding all losses to obtain the total loss;
7) fixing the parameters of the discriminator so that no gradient descent is performed on them, while back-propagating and updating the parameters of the generator;
8) allowing the gradient of the discriminator to descend, and performing back propagation and parameter updating;
9) optimizing the model through continuous iteration to finally obtain a trained model;
10) inputting the test-set images into the trained model to obtain test results;
11) evaluating the trained model on the test results with the PSNR, SSIM, LPIPS, and FID evaluation indices as well as parameter quantity, consumed video memory, and training time, and outputting the measured results.
In step 3), the adversarial loss is calculated as follows:
For the mapping function G: X → Y and its discriminator $D_Y$, the adversarial loss is expressed as:
$$L_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y\sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x\sim p_{data}(x)}[\log(1 - D_Y(G(x)))] \quad (1)$$
For the mapping function F: Y → X and its discriminator $D_X$, the adversarial loss is expressed as:
$$L_{GAN}(F, D_X, Y, X) = \mathbb{E}_{x\sim p_{data}(x)}[\log D_X(x)] + \mathbb{E}_{y\sim p_{data}(y)}[\log(1 - D_X(F(y)))] \quad (2)$$
In formula (1), G denotes the generator of domain X → domain Y and $D_Y$ the discriminator of the Y domain; in formula (2), F denotes the generator of domain Y → domain X and $D_X$ the discriminator of the X domain. In formulas (1) and (2), $x\sim p_{data}(x)$ and $y\sim p_{data}(y)$ denote training examples of the X domain and the Y domain, respectively.
In step 4), the cycle-consistency loss is calculated as follows:
$$L_{cyc}(G, F) = \mathbb{E}_{x\sim p_{data}(x)}\big[\lVert F(G(x)) - x\rVert_1\big] + \mathbb{E}_{y\sim p_{data}(y)}\big[\lVert G(F(y)) - y\rVert_1\big] \quad (3)$$
In step 6), the total loss is calculated as follows:
$$L(G, F, D_X, D_Y) = L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X) + \lambda L_{cyc}(G, F) \quad (4)$$
In formula (4), λ controls the relative importance of the cycle-consistency loss with respect to the adversarial losses.
In step 9), the model is optimized; the specific optimization objective is:
$$G^*, F^* = \arg\min_{G,F}\max_{D_X, D_Y} L(G, F, D_X, D_Y) \quad (5)$$
In formula (5), $G^*$ and $F^*$ denote the generators under the optimal condition.
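For concreteness, the following is a minimal PyTorch sketch of one training iteration implementing steps 2)–8) and losses (1)–(5). The generator and discriminator modules (G, F, D_X, D_Y), the optimizers, the learning rate, and the use of binary cross-entropy to realize the log terms are illustrative assumptions, not the patented implementation.

```python
import itertools
import torch
import torch.nn.functional as nnF

# G: X->Y, F: Y->X, D_X, D_Y are assumed nn.Module instances.
opt_G = torch.optim.Adam(itertools.chain(G.parameters(), F.parameters()), lr=2e-4)
opt_D = torch.optim.Adam(itertools.chain(D_X.parameters(), D_Y.parameters()), lr=2e-4)

def d_loss(D, real, fake):
    # log terms of equations (1)-(2), realized with binary cross-entropy
    r, f = D(real), D(fake)
    return (nnF.binary_cross_entropy_with_logits(r, torch.ones_like(r)) +
            nnF.binary_cross_entropy_with_logits(f, torch.zeros_like(f)))

def train_step(real_x, real_y, lam=10.0):   # lam = lambda in equation (4)
    fake_y, fake_x = G(real_x), F(real_y)    # steps 2) and 5)

    # Steps 3)-7): freeze the discriminators, update the generators.
    for p in itertools.chain(D_X.parameters(), D_Y.parameters()):
        p.requires_grad_(False)
    sy, sx = D_Y(fake_y), D_X(fake_x)
    loss_gan = (nnF.binary_cross_entropy_with_logits(sy, torch.ones_like(sy)) +
                nnF.binary_cross_entropy_with_logits(sx, torch.ones_like(sx)))
    loss_cyc = (nnF.l1_loss(F(fake_y), real_x) +   # equation (3)
                nnF.l1_loss(G(fake_x), real_y))
    loss_total = loss_gan + lam * loss_cyc          # equation (4)
    opt_G.zero_grad(); loss_total.backward(); opt_G.step()

    # Step 8): unfreeze the discriminators and update them.
    for p in itertools.chain(D_X.parameters(), D_Y.parameters()):
        p.requires_grad_(True)
    loss_D = (d_loss(D_Y, real_y, fake_y.detach()) +
              d_loss(D_X, real_x, fake_x.detach()))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()
```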
the invention has the beneficial effects that:
the invention provides a new cycleGAN style migration network based on a new attention mechanism based on style migration problems, and the network improves the migration effect of images while reducing the number of model parameters.
Secondly, the invention adopts a new attention mechanism to perform a second round of feature extraction on the input features.
And thirdly, the original cycleGAN model generator is optimized, and the generator parameter quantity is reduced.
And fourthly, the invention adopts a new discriminator to improve the discrimination result of the migration effect of the generator, thereby promoting the optimization of the model.
Drawings
FIG. 1 is a diagram of an example generator network model of the present invention.
FIG. 2 is a diagram of a new attention mechanism network model used in an example of the present invention.
FIG. 3 is a diagram of the discriminator network model of an example of the present invention.
FIG. 4 is a graph comparing the effect of an example of the invention on the horse2zebra dataset with the effects of CycleGAN and AGGAN.
FIG. 5 is a graph comparing the effect of an example of the invention on the apple2orange dataset with the effects of CycleGAN and AGGAN.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Examples
A new CycleGAN style migration network based on a new attention mechanism extracts images of two domains from the dataset and adopts a generator and a discriminator equipped with the new attention mechanism to extract features from the input images and train on them, realizing migration between the two domains. The generator network model is shown in fig. 1, the new attention mechanism network model in fig. 2, and the new discriminator network model in fig. 3. The specific steps are as follows:
1. selection of data sets
We selected horse2zebra and apple2orange from the official style-migration datasets. The horse2zebra training set includes 1067 horse images and 1334 zebra images; its test set includes 120 horse images and 140 zebra images. The apple2orange training set includes 995 apple images and 1019 orange images; its test set includes 266 apple images and 248 orange images.
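For illustration, the following is a minimal sketch of loading the two unpaired training sets in PyTorch; the flat trainA/trainB directory layout of the public horse2zebra release and the transform chain are assumptions.

```python
import glob
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class UnpairedImages(Dataset):
    """Flat folder of images for one domain (directory layout is an assumption)."""
    def __init__(self, root, tfm):
        self.files = sorted(glob.glob(root + "/*.jpg"))
        self.tfm = tfm
    def __len__(self):
        return len(self.files)
    def __getitem__(self, i):
        return self.tfm(Image.open(self.files[i]).convert("RGB"))

tfm = transforms.Compose([
    transforms.Resize(286),
    transforms.RandomCrop(256),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# Domain A (horses) and domain B (zebras) are sampled independently:
# the datasets are unpaired.
loader_A = DataLoader(UnpairedImages("horse2zebra/trainA", tfm), batch_size=1, shuffle=True)
loader_B = DataLoader(UnpairedImages("horse2zebra/trainB", tfm), batch_size=1, shuffle=True)
```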
2. The new generator extracts input-image features and migrates them
Using 3 residual blocks in the new generator reduces the number of model parameters. To extract the image features in the generator further, a new attention mechanism is used for renewed feature extraction; its network model diagram is shown in fig. 2. The specific steps by which the new attention mechanism extracts the feature map are as follows:
Assume the input feature map of the attention mechanism has size H × W × C; expressed mathematically:
$$F_{in} = (f_{ijk})_{H\times W\times C} \quad (6)$$
s.t. i = 1, 2, ..., H; j = 1, 2, ..., W; k = 1, 2, ..., C.
In formula (6), $F_{in}$ is the input feature map and $f_{ijk}$ is the value of $F_{in}$ at the i-th row, j-th column, and k-th channel.
The first step: map the input feature map $F_{in}$ into a two-dimensional matrix to obtain $S_{C\times HW\times 1} = (s_{khp})_{C\times HW\times 1}$; expressed mathematically:
$$s_{khp} = f_{ijk}, \quad h = (i-1)\cdot W + j \quad (7)$$
s.t. h = 1, 2, ..., HW; k = 1, 2, ..., C; p = 1.
In formula (7), S denotes the two-dimensional matrix after mapping and $s_{khp}$ is the value of S at the k-th row, h-th column, and p-th channel, obtained by mapping the k-th channel at row i, column j of $F_{in}$; · denotes multiplication of two numbers.
The second step: multiply a row-vector convolution kernel of undetermined parameters $Conv_0 = (w_{nmp})_{1\times(3\cdot 3\cdot C)\times 1}$ with the column vectors $G_{mnp}$ to form the matrix $D_{H\times W\times 1} = (d_{ijp})_{H\times W\times 1}$; expressed mathematically:
$$d_{ijp} = \sum_{m=1}^{3\cdot 3\cdot C} w_{nmp}\cdot G_{mnp} \quad (8)$$
$$G_{mnp} = [(f_{(i-1)(j-1)k})_C, (f_{(i-1)jk})_C, (f_{(i-1)(j+1)k})_C, (f_{i(j-1)k})_C, (f_{ijk})_C, (f_{i(j+1)k})_C, (f_{(i+1)(j-1)k})_C, (f_{(i+1)jk})_C, (f_{(i+1)(j+1)k})_C]^T \quad (9)$$
s.t. i = 1, 2, ..., H; j = 1, 2, ..., W; k = 1, 2, ..., C; m = 1, 2, ..., 3·3·C; n = 1; p = 1.
In formula (8), $w_{nmp}$ is the value of $Conv_0$ at the n-th row, m-th column, and p-th channel; $G_{mnp}$, given by formula (9), is the column vector of values in the 3 × 3 × C neighborhood of point (i, j) in the input feature map. D denotes the matrix obtained after the convolution, $d_{ijp}$ is the value of D at the i-th row, j-th column, and p-th channel, · denotes multiplication of two numbers, and T denotes matrix transposition.
The third step: map the feature matrix D into a column vector to obtain $D'_{HW\times 1\times 1} = (d'_{hnp})_{HW\times 1\times 1}$; expressed mathematically:
$$d'_{hnp} = d_{ijp}, \quad h = (i-1)\cdot W + j \quad (10)$$
s.t. h = 1, 2, ..., HW; n = 1; p = 1.
In formula (10), D' denotes the column vector obtained by the mapping and $d'_{hnp}$ is the value of D' at the h-th row, n-th column, and p-th channel, obtained by mapping the p-th channel at row i, column j of D; · denotes multiplication of two numbers.
The fourth step: apply a softmax transformation to each element $d'_{hnp}$ of the feature vector D' to obtain $D''_{HW\times 1\times 1} = (d''_{hnp})_{HW\times 1\times 1}$; expressed mathematically:
$$d''_{hnp} = \frac{e^{d'_{hnp}}}{\sum_{t=1}^{HW} e^{d'_{tnp}}} \quad (11)$$
s.t. h = 1, 2, ..., HW; t = 1, 2, ..., HW; n = 1; p = 1.
In formula (11), D'' denotes the feature vector after the softmax transformation and $d''_{hnp}$ is the value of D'' at the h-th row, n-th column, and p-th channel.
The fifth step: perform matrix multiplication of matrix S and matrix D'' to obtain the matrix $U_{C\times 1\times 1} = (u_{knp})_{C\times 1\times 1}$; expressed mathematically:
$$u_{knp} = \sum_{h=1}^{HW} s_{khp}\cdot d''_{hnp} \quad (12)$$
s.t. k = 1, 2, ..., C; n = 1; p = 1.
In formula (12), U denotes the result of multiplying the matrix S by the matrix D'' and $u_{knp}$ is the value of U at the k-th row, n-th column, and p-th channel.
The sixth step: construct C convolution kernels $Conv_1, Conv_2, ..., Conv_C$, each containing C undetermined parameters (C² parameters in total), and convolve each with U, generating C numbers that form the column vector $V_{C\times 1\times 1} = (v_{knp})_{C\times 1\times 1}$; expressed mathematically:
$$v_{knp} = Conv_k \otimes U \quad (13)$$
s.t. k = 1, 2, ..., C.
In formula (13), V denotes the feature vector obtained by the convolution, $v_{knp}$ is the value of V at the k-th row, n-th column, and p-th channel, $Conv_k$ denotes the k-th convolution kernel, and ⊗ denotes matrix multiplication.
The seventh step: apply a LayerNorm transformation to each element of V to obtain $V'_{C\times 1\times 1} = (v'_{knp})_{C\times 1\times 1}$; expressed mathematically:
$$v'_{knp} = \gamma\cdot\frac{v_{knp} - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta \quad (14)$$
s.t. k = 1, 2, ..., C; n = 1; p = 1.
In formula (14), V' denotes the feature vector after the LayerNorm transformation and $v'_{knp}$ is the value of V' at the k-th row, n-th column, and p-th channel; γ denotes the weight coefficient, μ the mean of the feature vector V, σ² the variance of the feature vector V, ε a very small value, β the bias coefficient, and · the multiplication of two numbers.
The eighth step: apply a ReLU transformation to each element of V' to obtain $V''_{C\times 1\times 1} = (v''_{knp})_{C\times 1\times 1}$; expressed mathematically:
$$v''_{knp} = \max(0, v'_{knp}) \quad (15)$$
s.t. k = 1, 2, ..., C; n = 1; p = 1.
In formula (15), V'' denotes the feature vector after the ReLU transformation and $v''_{knp}$ is the value of V'' at the k-th row, n-th column, and p-th channel.
The ninth step: construct C convolution kernels $Conv'_1, Conv'_2, ..., Conv'_C$, each containing C undetermined parameters (C² parameters in total), and convolve each with V'', generating C numbers that form the column vector $Q_{C\times 1\times 1} = (q_{knp})_{C\times 1\times 1}$; expressed mathematically:
$$q_{knp} = Conv'_k \otimes V'' \quad (16)$$
s.t. k = 1, 2, ..., C; n = 1; p = 1.
In formula (16), Q denotes the feature vector after the convolution operation, $q_{knp}$ is the value of Q at the k-th row, n-th column, and p-th channel, $Conv'_k$ denotes the k-th convolution kernel, and ⊗ denotes matrix multiplication.
The tenth step: add the feature vector U to the feature vector Q to obtain $Q'_{C\times 1\times 1} = (q'_{knp})_{C\times 1\times 1}$; expressed mathematically:
$$q'_{knp} = q_{knp} + u_{knp} \quad (17)$$
s.t. k = 1, 2, ..., C; n = 1; p = 1.
In formula (17), Q' denotes the result of the feature-vector addition and $q'_{knp}$ is the value of Q' at the k-th row, n-th column, and p-th channel.
The eleventh step: arrange H × W copies of the vector $(Q')^T$ in sequence to form a three-dimensional matrix, obtaining $Q''_{H\times W\times C} = (q''_{ijk})_{H\times W\times C}$; expressed mathematically:
$$q''_{ijk} = (q'_{knp})^T \quad (18)$$
s.t. i = 1, 2, ..., H; j = 1, 2, ..., W; k = 1, 2, ..., C; n = 1; p = 1.
In formula (18), Q'' denotes the three-dimensional matrix after the sequential arrangement, $q''_{ijk}$ is the value of Q'' at the i-th row, j-th column, and k-th channel, and T denotes matrix transposition.
The twelfth step: add the three-dimensional feature map Q'' element-wise to the input feature map $F_{in}$ to obtain the output feature map $F_{out} = (o_{ijk})_{H\times W\times C}$; expressed mathematically:
$$o_{ijk} = q''_{ijk} + f_{ijk} \quad (19)$$
s.t. i = 1, 2, ..., H; j = 1, 2, ..., W; k = 1, 2, ..., C.
In formula (19), $F_{out}$ denotes the result of adding the two feature maps and $o_{ijk}$ is the value of $F_{out}$ at the i-th row, j-th column, and k-th channel.
The general formula: taking $F_{in} = (f_{ijk})_{H\times W\times C}$ as the input feature map of the attention mechanism and $F_{out} = (o_{ijk})_{H\times W\times C}$ as the output feature map of the attention mechanism, the overall mathematical formula is expressed as:
$$F_{out} = F_{in} + \mathrm{arrange}\Big(\big(U + W_{v2}\otimes \mathrm{ReLU}(\mathrm{LN}(W_{v1}\otimes U))\big)^T\Big) \quad (20)$$
s.t. h = 1, 2, ..., HW; h' = 1, 2, ..., HW, where $U = S\otimes \mathrm{softmax}(D')$, s.t. k = 1, 2, ..., C.
$W_{v2}$ is a C × C two-dimensional matrix formed by C convolution kernels as column vectors, i.e. $W_{v2} = [Conv'_1, Conv'_2, ..., Conv'_C]^T$, where $Conv'_k$ is the k-th convolution kernel containing C undetermined parameters.
ReLU denotes the ReLU activation function, i.e. $\mathrm{ReLU}(A) = (\max(0, a_{ij}))$, where A denotes a two-dimensional matrix and $a_{ij}$ is a value in A.
LN denotes LayerNorm normalization, i.e. $\mathrm{LN}(B) = \gamma\cdot\frac{B - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$, where B is a two-dimensional matrix, $b_{ij}$ is a value in the B matrix, μ is the mean of the B matrix, σ² is the variance of the B matrix, ε is a small number, γ is the weight coefficient, and β is the bias coefficient.
$W_{v1}$ is a C × C two-dimensional matrix formed by C convolution kernels as column vectors, i.e. $W_{v1} = [Conv_1, Conv_2, ..., Conv_C]^T$, where $Conv_k$ is the k-th convolution kernel containing C undetermined parameters.
The vector D' is obtained from the input feature map by
$$d'_{h''np} = Conv_0\otimes G_{h''} \quad (21)$$
s.t. h'' = 1, 2, ..., HW.
In formula (21), $Conv_0$ is the row-vector convolution kernel of undetermined parameters, $G_{h''}$ is the neighborhood column vector of formula (9) at the spatial position indexed by h'', and ⊗ denotes matrix multiplication.
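Read end to end, steps one through twelve amount to a lightweight channel-attention block with a learned, softmax-weighted spatial pooling. The following PyTorch sketch mirrors equations (6)–(21); the module name, the realization of $Conv_0$, $W_{v1}$, and $W_{v2}$ as nn.Conv2d layers, and the LayerNorm shape are assumptions for illustration, not the patented implementation.

```python
import torch
import torch.nn as nn

class NewAttention(nn.Module):
    """Sketch of the attention mechanism of steps 1-12 / equations (6)-(21)."""
    def __init__(self, channels):
        super().__init__()
        # Conv0: a 3x3xC kernel producing a single-channel spatial map (step 2)
        self.conv0 = nn.Conv2d(channels, 1, kernel_size=3, padding=1, bias=False)
        # Wv1, Wv2: C kernels with C parameters each, i.e. 1x1 convolutions (steps 6, 9)
        self.wv1 = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.wv2 = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.ln = nn.LayerNorm([channels, 1, 1])  # step 7: LayerNorm with gamma, beta
        self.relu = nn.ReLU(inplace=True)         # step 8
        self.softmax = nn.Softmax(dim=1)          # step 4: softmax over HW positions

    def forward(self, x):                         # x: (B, C, H, W), i.e. F_in
        b, c, h, w = x.shape
        s = x.view(b, c, h * w)                   # step 1: S, shape (B, C, HW)
        d = self.conv0(x).view(b, h * w, 1)       # steps 2-3: D', shape (B, HW, 1)
        d = self.softmax(d)                       # step 4: D''
        u = torch.bmm(s, d).view(b, c, 1, 1)      # step 5: U = S x D'', (B, C, 1, 1)
        v = self.relu(self.ln(self.wv1(u)))       # steps 6-8: V''
        q = self.wv2(v) + u                       # steps 9-10: Q' = Q + U
        return x + q                              # steps 11-12: broadcast add to F_in
```

The broadcast in the final line realizes the "arrange" of steps eleven and twelve: the channel vector Q' is implicitly replicated over all H × W positions before the residual addition.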
3. The discriminator discriminates the migration result of the generator
To improve the discrimination accuracy of the discriminator, the residual idea is adopted in the discriminator, so that further feature extraction is performed on the basis of the original features of the feature map; the discriminator network model is shown in fig. 3.
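The patent does not spell out the discriminator layout beyond fig. 3, but the residual idea described above can be sketched as follows; the channel count, kernel size, and LeakyReLU slope are assumptions.

```python
import torch.nn as nn

class ResidualDiscBlock(nn.Module):
    """Sketch: a residual connection inside a discriminator block (assumed layout)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        out = self.act(self.conv1(x))
        out = self.conv2(out)
        return self.act(out + x)  # the skip preserves the original feature map
```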
4. Mathematical generality analysis of the model
Taking CycleGAN as the baseline, a mathematical generality analysis of the network model is carried out in the following steps:
(1) Input image size generality analysis
Assume the input image size is H × W × C. The image size changes as shown in fig. 1: although upsampling and downsampling are performed, the image is finally restored to size H × W × C. Our model is therefore valid for migrating input images of arbitrary size.
(2) Computation generality analysis
Assume the input image size of an arbitrary block in our model is H × W × C and the output image size is H' × W' × C'. All kinds of blocks in the network structure and the computational cost of each block are listed in Table 1, where $S_×$, $S_÷$, $S_+$, $S_-$, $S_{>,<,\ge,\le,==,\ne}$, and $S_=$ are the numbers of multiply, divide, add, subtract, compare, and assignment operations, respectively, and k is the size of the convolution kernel in the convolution operation. "convolution" denotes the convolution operation; ReLU, leaky_ReLU, tanh, and softmax denote the corresponding activation functions; "adding" denotes matrix addition; "mapping" denotes matrix mapping; "multiplying" denotes two-dimensional matrix multiplication (here with C = 1); and "arranging" denotes sequentially arranging vectors into a three-dimensional tensor.
Applying a Taylor expansion to tanh gives:
$$\tanh(x) \approx x - \frac{x^3}{3} \quad (22)$$
By equation (22), each value undergoes two multiplications, one division, and one subtraction per tanh transformation.
A second-order Taylor expansion is likewise applied to the exponentials in softmax, $e^{x} \approx 1 + x + \frac{x^{2}}{2}$, giving:
$$\mathrm{softmax}(d'_{h}) = \frac{e^{d'_{h}}}{\sum_{t=1}^{HW} e^{d'_{t}}} \quad (23)$$
In equation (23), HW is a constant, and the per-matrix terms are computed only once when softmax is performed, so their cost is ignored. Therefore, after a softmax transformation, each value undergoes HW additions and one division.
The computation of the model is measured in two ways: the first counts only the complex operations of multiplication and division; the second counts not only the complex operations but also the simple operations of addition, subtraction, comparison, and assignment. Assume the input image size is H × W × C, where H, W, and C are each at least 1. For the CycleGAN network, the computational cost counting only complex operations is:
$$S_{CycleGAN} = (6532C + 897720.0156)HW$$
Counting the simple operations of the CycleGAN network as well as the complex ones, the cost is:
for our model, the computational complexity of the complex operation is:
SAttention-CycleGAN=(6340C+462198.2656)HW+271488
considering both complex and simple operations, the calculation amount is:
we analyzed the amount of computation we model reduces over cycleGAN from considering both complex and simple computations, respectively.
When only the case of complex operations is considered, the reduced amount of computation is:
as can be seen from equation (24), τ increases as HW increases and C decreases. Conversely, as HW decreases, C increases and τ decreases. However, no matter how much HW and C are taken, when only complex operation is considered, the reduction of the model is more than 2.94 percent of calculation amount and less than 48.18 percent compared with the CycleGAN model.
When both complex and simple operations are considered, the reduced proportion of computation τ' is defined analogously by equation (25). Its trend with HW and C is the same as that of τ; however, whatever the values of HW and C, when both complex and simple operations are considered, our model reduces the computation by more than 2.94% and less than 48.15% compared with the CycleGAN model.
TABLE 1 model calculation cost
(3) Parameter quantity generality analysis
We counted the parameters of CycleGAN and our model, and the results were as follows:
PCycleGAN=(7296C+14120960)*2
PAttention-CycleGAN=(6528C+7216771)*2
the amount of parameters that we model reduced compared to CycleGAN was:
P=PCycleGAN-PAttention-CycleGAN=1536C+13808378≥13809914 (26)
as can be seen from equation (26), P increases with increasing C. But regardless of the image input size, our model is reduced by at least a parameter amount of 13.8M compared to CycleGAN.
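These parameter counts can be checked empirically; a minimal sketch, assuming the generators and discriminators are available as PyTorch modules (the names are placeholders):

```python
def count_params(model):
    # total number of learnable parameters in a module
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# The *2 in the formulas above reflects the two mapping directions (X->Y and Y->X).
total = count_params(G) + count_params(F) + count_params(D_X) + count_params(D_Y)
print(f"{total / 1e6:.1f}M parameters")
```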
5. Training the model
The total loss is calculated and back-propagated; the trained model is obtained after 200 iterations in the experimental environment of Table 2.
Table 2 Experimental environment
Name | Configuration
---|---
Operating system | Ubuntu 18.04
GPU | NVIDIA GeForce RTX 2080 Ti
CPU | Intel Xeon processor (Skylake, IBRS), 2
RAM | 16 GB
GPU libraries | CUDA 10.2, cuDNN 7.6
Deep learning framework | PyTorch
6. Testing the model
The test-set images of the dataset are input into the generator of the trained model to obtain the test results. The trained model is evaluated with the PSNR, SSIM, LPIPS, and FID evaluation indices as well as parameter quantity, consumed video memory, and training time to measure the migration result.
7. Description of evaluation index
We evaluated our test results using PSNR, SSIM, LPIPS, FID evaluation indices.
PSNR, the peak signal-to-noise ratio, is defined via the mean square error (MSE). Given two m × n images, an image G and a noisy image H, MSE and PSNR are defined as:
$$MSE = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\big[G(i,j) - H(i,j)\big]^2 \quad (27)$$
$$PSNR = 10\cdot\log_{10}\!\left(\frac{MAX^2}{MSE}\right) \quad (28)$$
In equations (27) and (28), MAX is the maximum possible pixel value of the picture, MSE is the mean square error between the current image G and the reference image H, and m and n are the height and width of the images, respectively.
SSIM is an index for measuring the similarity of two images; it compares the luminance, contrast, and structure between samples x and y. The more similar two pictures are, the higher the SSIM value.
LPIPS is a newer index that simulates human perception; its calculation relies on a VGG network, through which deep features of different structures and tasks in the picture are extracted.
FID measures the similarity between two groups of images from the statistical perspective of computer-vision features; it is the distance between the feature vectors of the real images and the generated images.
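As a worked illustration of equations (27) and (28), a minimal NumPy sketch of the PSNR computation, assuming 8-bit images:

```python
import numpy as np

def psnr(g, h, max_val=255.0):
    # Equation (27): mean square error over all m*n pixels
    mse = np.mean((g.astype(np.float64) - h.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    # Equation (28): PSNR in decibels
    return 10.0 * np.log10(max_val ** 2 / mse)
```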
8. Description of comparative model
Attention-CycleGAN was compared to CycleGAN and AGGAN.
CycleGAN is an I2IT method for unpaired data; it uses a cycle-consistency loss to learn not only the source-to-target mapping but also the reverse mapping from the target domain back to the source domain.
AGGAN addresses the difficulty conventional unsupervised I2IT techniques have in focusing attention on a single object without changing the background or the other objects in the picture; it introduces an unsupervised attention mechanism that is trained adversarially together with the generator and the discriminator.
9. Description of qualitative comparison
The test results of CycleGAN, AGGAN, and the proposed Attention-CycleGAN on the horse2zebra dataset are shown in fig. 4. In the horse-to-zebra migration, the first-row conversion results of CycleGAN and AGGAN show artifacts of varying degree, which Attention-CycleGAN avoids. In the second row, CycleGAN does not completely transfer the horse into a zebra and the AGGAN result is blurry with too many artifacts, while Attention-CycleGAN completes the transfer while keeping the image sharp. In the zebra-to-horse migration, the third-row results of CycleGAN and AGGAN show over-migration of different degrees, losing information of the horse, whereas Attention-CycleGAN avoids the over-migration problem. In the fourth row, the migration of CycleGAN and AGGAN is incomplete and stripe features remain, while the migration effect of Attention-CycleGAN is clearly better than both.
The test results of CycleGAN, AGGAN, and the proposed Attention-CycleGAN on the apple2orange dataset are shown in fig. 5. In the apple-to-orange migration, the hands in the CycleGAN and AGGAN results are damaged to different degrees and part of the content is still not transferred, while Attention-CycleGAN transfers the apple features to oranges while leaving the hands unchanged. In the second row, CycleGAN does not achieve the migration at all, and AGGAN changes the color of the onion in the background, while Attention-CycleGAN migrates the image without changing the background. In the orange-to-apple migration, artifacts appear in both CycleGAN and AGGAN in the third row. In the fourth row, the color migration of CycleGAN is incomplete and AGGAN shows artifacts in the lower-left corner of the picture. Compared with both methods, Attention-CycleGAN achieves the migration while ensuring that no artifacts occur.
10. Description of quantitative comparison
The proposed Attention-CycleGAN model is compared with the CycleGAN and AGGAN models. Test results are obtained on the test sets of the horse2zebra and apple2orange datasets, evaluated with the PSNR, SSIM, LPIPS, and FID indices, and averaged. The index results on the horse2zebra dataset are shown in Table 3, and those on the apple2orange dataset in Table 4. The comparison of these indices shows that the proposed Attention-CycleGAN improves the image migration effect.
TABLE 3 Average PSNR, average SSIM, average LPIPS, and average FID of CycleGAN, AGGAN, and Attention-CycleGAN on the horse→zebra dataset
TABLE 4 Average PSNR, average SSIM, average LPIPS, and average FID of CycleGAN, AGGAN, and Attention-CycleGAN on the apple→orange dataset
The parameter quantities, consumed video memory, and running times of the three models are counted on the two datasets; the consumption results on the horse2zebra dataset are shown in Table 6, and those on the apple2orange dataset in Table 7.
TABLE 6 Parameter quantity, consumed video memory, and running time of CycleGAN, AGGAN, and Attention-CycleGAN on the horse→zebra dataset
TABLE 7 Parameter quantity, consumed video memory, and running time of CycleGAN, AGGAN, and Attention-CycleGAN on the apple→orange dataset
The above embodiments describe in detail a specific implementation of the new CycleGAN style migration network based on a new attention mechanism; they are intended only to help understand the proposed method and the core idea of the invention.
Claims (4)
1. A new CycleGAN style migration network based on a new attention mechanism, comprising the steps of:
(1) selecting a dataset for migration from the official style-migration datasets;
(2) forward propagation: inputting the sample datasets of the two domains into the new generator and performing convolution, the new attention mechanism, residual, and transposed-convolution operations to obtain the migrated generated image;
(3) back propagation: first fixing the parameters of the new discriminator so that no gradient descent is performed on them, and back-propagating and updating the parameters of the new generator; then allowing the gradient of the new discriminator to descend, and performing back propagation and parameter updating;
(4) evaluating the trained model with the PSNR, SSIM, LPIPS, and FID evaluation indices as well as parameter quantity, consumed video memory, and training time, and measuring the migration result.
2. The new CycleGAN style migration network based on new attention mechanism as claimed in claim 1,
The new attention mechanism of the new generator in step (2) takes $F_{in} = (f_{ijk})_{H\times W\times C}$ as the input feature map of the attention mechanism and $F_{out} = (o_{ijk})_{H\times W\times C}$ as the output feature map of the attention mechanism; the final output feature map can be regarded as the result of a renewed feature extraction on the input features. The mathematical formula is expressed as:
$$F_{out} = F_{in} + \mathrm{arrange}\Big(\big(U + W_{v2}\otimes \mathrm{ReLU}(\mathrm{LN}(W_{v1}\otimes U))\big)^T\Big)$$
s.t. h = 1, 2, ..., HW; h' = 1, 2, ..., HW, where $U = S\otimes \mathrm{softmax}(D')$, s.t. k = 1, 2, ..., C.
$W_{v2}$ is a C × C two-dimensional matrix formed by C convolution kernels as column vectors, i.e. $W_{v2} = [Conv'_1, Conv'_2, ..., Conv'_C]^T$, where $Conv'_k$ is the k-th convolution kernel containing C undetermined parameters.
ReLU denotes the ReLU activation function, i.e. $\mathrm{ReLU}(A) = (\max(0, a_{ij}))$, where A denotes a two-dimensional matrix and $a_{ij}$ is a value in A.
LN denotes LayerNorm normalization, i.e. $\mathrm{LN}(B) = \gamma\cdot\frac{B - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$, where B is a two-dimensional matrix, $b_{ij}$ is a value in the B matrix, μ is the mean of the B matrix, σ² is the variance of the B matrix, ε is a small number, γ is the weight coefficient, and β is the bias coefficient.
$W_{v1}$ is a C × C two-dimensional matrix formed by C convolution kernels as column vectors, i.e. $W_{v1} = [Conv_1, Conv_2, ..., Conv_C]^T$, where $Conv_k$ is the k-th convolution kernel containing C undetermined parameters.
D' is the column vector obtained from the input feature map through the row-vector convolution kernel $Conv_0$ of undetermined parameters, s.t. h'' = 1, 2, ..., HW, where ⊗ denotes matrix multiplication.
3. The new CycleGAN style migration network based on new attention mechanism as claimed in claim 1,
In step (2), after using the new attention mechanism the new generator can better extract hierarchical image features, so a good migration effect can be obtained while using 3 residual blocks to reduce the parameter quantity, avoiding the excessive parameters and gradient explosion caused by using 9 residual blocks.
4. The new CycleGAN style migration network based on new attention mechanism as claimed in claim 1,
The new discriminator in step (3) adopts the residual idea in its convolutions, so that feature extraction can be carried out on the basis of the original feature map of the image, preventing the gradient disappearance caused by an over-deep network from keeping the discriminator from extracting image features well.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111180291.3A CN114037600A (en) | 2021-10-11 | 2021-10-11 | New cycleGAN style migration network based on new attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111180291.3A CN114037600A (en) | 2021-10-11 | 2021-10-11 | New cycleGAN style migration network based on new attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114037600A true CN114037600A (en) | 2022-02-11 |
Family
ID=80141033
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111180291.3A Pending CN114037600A (en) | 2021-10-11 | 2021-10-11 | New cycleGAN style migration network based on new attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114037600A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021115159A1 (en) * | 2019-12-09 | 2021-06-17 | 中兴通讯股份有限公司 | Character recognition network model training method, character recognition method, apparatuses, terminal, and computer storage medium therefor |
CN111696027A (en) * | 2020-05-20 | 2020-09-22 | 电子科技大学 | Multi-modal image style migration method based on adaptive attention mechanism |
CN112070658A (en) * | 2020-08-25 | 2020-12-11 | 西安理工大学 | Chinese character font style migration method based on deep learning |
CN112767519A (en) * | 2020-12-30 | 2021-05-07 | 电子科技大学 | Controllable expression generation method combined with style migration |
Non-Patent Citations (1)
Title |
---|
李鹰;徐蔚鸿;唐良荣;: "带参数聚合算子的模糊联想记忆网络", 控制理论与应用, no. 11, 15 November 2010 (2010-11-15) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116958468A (en) * | 2023-07-05 | 2023-10-27 | 中国科学院地理科学与资源研究所 | Mountain snow environment simulation method and system based on SCycleGAN |
CN118115862A (en) * | 2024-04-30 | 2024-05-31 | 南京信息工程大学 | Face image tampering anomaly detection method, device and medium |
CN118115862B (en) * | 2024-04-30 | 2024-07-05 | 南京信息工程大学 | Face image tampering anomaly detection method, device and medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |