CN114037600A - New cycleGAN style migration network based on new attention mechanism - Google Patents
- Publication number: CN114037600A
- Authority: CN (China)
- Prior art keywords: new, image, cyclegan, attention mechanism, conv
- Prior art date: 2021-10-11
- Legal status: Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
Image-to-Image translation maps the features of a source-domain image into a target-domain image, so that the target-domain image acquires the style of the source domain. However, most network models that perform Image-to-Image translation cannot extract the source-domain features well, so the migration result is unsatisfactory. Many improved methods have appeared in recent years, but the migration effect is still poor, and as network models deepen, the video memory consumed by their growing parameter counts becomes more and more of a problem; ideally the image migration effect should improve while the parameters are reduced. An Attention-CycleGAN model is therefore proposed, built on a new lightweight attention mechanism, which improves the image migration effect while using fewer parameters. Experiments show that, compared with the CycleGAN model, the parameter count is reduced by 13.8M and the consumed video memory by 6.9G, and the model performs better on the PSNR, SSIM, LPIPS, and FID evaluation indices.
Description
Technical Field
The invention relates to the field of style migration, and in particular to a new CycleGAN style migration network based on a new attention mechanism.
Background
I2IT (Image-to-Image Translation) is a class of vision and graphics problems whose goal is to learn the mapping from input images to output images from a training set. Many computer vision problems can be cast as I2IT problems: image super-resolution can be regarded as migrating from low resolution to high resolution, image colorization as mapping the features of grayscale images to color images, and image synthesis as mapping label features to real pictures. I2IT also has wide application in domain adaptation and data augmentation.
The study of I2IT has developed rapidly since Isola et al. proposed CGAN as a general solution to the I2IT problem. CGAN-based I2IT learns the mapping from input images to output images while also learning the loss function used to train that mapping, making the model very effective for tasks with highly structured graphical output. However, the method still has the following problems: the resulting image quality is not high, and the input images need to be paired, which is very difficult for the I2IT task, where matched datasets are hard to acquire. In recent years the method has therefore been improved from different angles, and the effect keeps improving; the improvements fall mainly into three categories: improvements based on the loss function, improvements based on the latent space, and improvements based on the network model.
Zhu et al. proposed CycleGAN, which solves the problem that I2IT could only use paired datasets by introducing a cycle-consistency loss: while learning the source-to-target mapping, it also learns the reverse target-to-source mapping. Likewise, DualGAN and DiscoGAN are I2IT methods that use unsupervised learning on unpaired datasets, differing only in the loss functions used. Although cycle-consistency loss enables feature mapping between unpaired datasets, some problems remain. CUT argues that the cycle-consistency constraint is too strict and therefore introduces contrastive learning between unpaired images, employing a patch-based, multi-layer loss to maximize the mutual information between corresponding patches of the input and output images. DCLGAN argues that using a single embedding for two different domains, as CUT does, may not effectively capture the domain gap, limiting performance; it therefore exploits contrastive learning further with a dual-learning scheme to avoid the drawbacks of cycle consistency. These works improve on the loss function and address both the dataset-mismatch problem of I2IT and the drawbacks of cycle-consistency loss, but there is still room for improvement in image feature extraction during migration.
In addition, latent-space-based improvements have further advanced the effect of I2IT. UNIT is an I2IT network built on CoGAN and a shared-latent-space assumption, namely that a pair of images from different domains can be mapped to the same latent representation. Some I2IT methods assume that the latent space of an image can be decomposed into a content space and a style space, which allows the generation of multi-modal outputs. Huang et al. proposed MUNIT, a multi-modal unsupervised I2IT framework with two latent representations, style and content; to translate an image to another domain, its content code is combined with different style representations sampled from the target domain. Similarly, Lee et al. introduced DRIT, diverse image-to-image translation based on disentangled representations for unpaired data, which decomposes the latent space into two: a domain-invariant content space that shares information, and a domain-specific attribute space that generates different outputs for the same content, thereby modeling different variations of the same content.
Improving the network itself can also advance the I2IT task. StarGAN proposed a novel and scalable method that translates images across multiple domains with only one model, solving the problem that a separate model had to be built for each pair of image domains when converting among more than two domains. StarGANv2 builds on StarGAN with a single generator that produces diverse images across multiple domains, translating an image of one domain into different images of a target domain and supporting conversion to multiple target domains. NICE-GAN proposed a new I2IT network that reuses the discriminator for encoding and develops a decoupled paradigm for efficient training. Unlike previous networks, NICE-GAN needs no separate encoding component; its plug-in encoder is trained directly by the adversarial loss and becomes more informative and more effective when a multi-scale discriminator is applied. These network-model improvements raise the migration effect, but as the models gradually grow, their parameter counts and floating-point operation counts increase accordingly.
The I2IT task can also be refined by extracting features further inside the network. An attention mechanism can further extract the features in a feature map, so applying attention to I2IT can achieve a better migration effect. For example, Attention-GAN, SPA-GAN, and AGGAN all improved migration performance by applying attention to I2IT.
In summary, although the I2IT task has achieved significant results in both single-domain and multi-domain migration through these different improvements, some problems remain: first, the migration effect can still be improved; second, the models have too many parameters and train too slowly. The invention therefore proposes a new attention mechanism and applies it to the I2IT task, extracting the high-level features of the source-domain image so as to improve the migration effect of the network. The generator and the discriminator are also optimized, reducing the number of parameters while preserving the migration quality and shortening the training time.
Disclosure of Invention
The invention aims to solve the problems of poor migration effect and excessive model parameters in the field of style migration, and provides a new CycleGAN style migration network based on a new attention mechanism.
The purpose of the invention can be realized by the following technical scheme:
a new cycleGAN style migration network based on a new attention mechanism, comprising the steps of:
1) selecting a dataset for migration from the official style-migration datasets, to realize migration between the two domains in the dataset;
2) inputting the sample dataset of the first domain into the first generator to generate the migrated second-domain image;
3) passing the generated second-domain image into a discriminator to obtain a discrimination result, and calculating the adversarial loss of the first-domain migration;
4) inputting the generated second-domain image into the second generator to generate an image of the first domain, and calculating the cycle-consistency loss of the first domain;
5) carrying out the same steps on the second-domain image, and calculating the adversarial loss and the cycle-consistency loss of the second domain;
6) adding all losses to obtain the total loss;
7) fixing the parameters of the discriminator so that no gradient descent is performed on them, while back-propagating and updating the parameters of the generator;
8) allowing the gradient of the discriminator to descend, and performing back propagation and parameter updating;
9) optimizing the model through continuous iteration to finally obtain a trained model;
10) inputting the test-set images into the trained model to obtain test results;
11) evaluating the trained model on the test results with the PSNR, SSIM, LPIPS, and FID evaluation indices as well as parameter quantity, consumed video memory, and training time, and outputting the measured results.
In step 3), the adversarial loss is calculated as follows:
For the mapping function G: X → Y and its discriminator $D_Y$, the adversarial loss is expressed as:
$$L_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y\sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x\sim p_{data}(x)}[\log(1 - D_Y(G(x)))] \quad (1)$$
For the mapping function F: Y → X and its discriminator $D_X$, the adversarial loss is expressed as:
$$L_{GAN}(F, D_X, Y, X) = \mathbb{E}_{x\sim p_{data}(x)}[\log D_X(x)] + \mathbb{E}_{y\sim p_{data}(y)}[\log(1 - D_X(F(y)))] \quad (2)$$
In formula (1), G denotes the generator of domain X → domain Y and $D_Y$ the discriminator of the Y domain; in formula (2), F denotes the generator of domain Y → domain X and $D_X$ the discriminator of the X domain. In formulas (1) and (2), $x\sim p_{data}(x)$ and $y\sim p_{data}(y)$ denote training examples of the X domain and the Y domain, respectively.
In step 4), the cycle-consistency loss is calculated as follows:
$$L_{cyc}(G, F) = \mathbb{E}_{x\sim p_{data}(x)}\big[\lVert F(G(x)) - x\rVert_1\big] + \mathbb{E}_{y\sim p_{data}(y)}\big[\lVert G(F(y)) - y\rVert_1\big] \quad (3)$$
In step 6), the total loss is calculated as follows:
$$L(G, F, D_X, D_Y) = L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X) + \lambda L_{cyc}(G, F) \quad (4)$$
In formula (4), λ controls the relative importance of the cycle-consistency loss with respect to the adversarial losses.
In step 9), the model is optimized; the specific optimization objective is:
$$G^*, F^* = \arg\min_{G,F}\max_{D_X, D_Y} L(G, F, D_X, D_Y) \quad (5)$$
In formula (5), $G^*$ and $F^*$ denote the generators under the optimal condition.
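For concreteness, the following is a minimal PyTorch sketch of one training iteration implementing steps 2)–8) and losses (1)–(5). The generator and discriminator modules (G, F, D_X, D_Y), the optimizers, the learning rate, and the use of binary cross-entropy to realize the log terms are illustrative assumptions, not the patented implementation.

```python
import itertools
import torch
import torch.nn.functional as nnF

# G: X->Y, F: Y->X, D_X, D_Y are assumed nn.Module instances.
opt_G = torch.optim.Adam(itertools.chain(G.parameters(), F.parameters()), lr=2e-4)
opt_D = torch.optim.Adam(itertools.chain(D_X.parameters(), D_Y.parameters()), lr=2e-4)

def d_loss(D, real, fake):
    # log terms of equations (1)-(2), realized with binary cross-entropy
    r, f = D(real), D(fake)
    return (nnF.binary_cross_entropy_with_logits(r, torch.ones_like(r)) +
            nnF.binary_cross_entropy_with_logits(f, torch.zeros_like(f)))

def train_step(real_x, real_y, lam=10.0):   # lam = lambda in equation (4)
    fake_y, fake_x = G(real_x), F(real_y)    # steps 2) and 5)

    # Steps 3)-7): freeze the discriminators, update the generators.
    for p in itertools.chain(D_X.parameters(), D_Y.parameters()):
        p.requires_grad_(False)
    sy, sx = D_Y(fake_y), D_X(fake_x)
    loss_gan = (nnF.binary_cross_entropy_with_logits(sy, torch.ones_like(sy)) +
                nnF.binary_cross_entropy_with_logits(sx, torch.ones_like(sx)))
    loss_cyc = (nnF.l1_loss(F(fake_y), real_x) +   # equation (3)
                nnF.l1_loss(G(fake_x), real_y))
    loss_total = loss_gan + lam * loss_cyc          # equation (4)
    opt_G.zero_grad(); loss_total.backward(); opt_G.step()

    # Step 8): unfreeze the discriminators and update them.
    for p in itertools.chain(D_X.parameters(), D_Y.parameters()):
        p.requires_grad_(True)
    loss_D = (d_loss(D_Y, real_y, fake_y.detach()) +
              d_loss(D_X, real_x, fake_x.detach()))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()
```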
the invention has the beneficial effects that:
the invention provides a new cycleGAN style migration network based on a new attention mechanism based on style migration problems, and the network improves the migration effect of images while reducing the number of model parameters.
Secondly, the invention adopts a new attention mechanism to perform a second round of feature extraction on the input features.
And thirdly, the original cycleGAN model generator is optimized, and the generator parameter quantity is reduced.
And fourthly, the invention adopts a new discriminator to improve the discrimination result of the migration effect of the generator, thereby promoting the optimization of the model.
Drawings
FIG. 1 is a diagram of an example generator network model of the present invention.
FIG. 2 is a diagram of a new attention mechanism network model used in an example of the present invention.
FIG. 3 is a diagram of the discriminator network model of an example of the present invention.
FIG. 4 is a graph comparing the effect of an example of the invention on the horse2zebra dataset with the effects of CycleGAN and AGGAN.
FIG. 5 is a graph comparing the effect of an example of the invention on the apple2orange dataset with the effects of CycleGAN and AGGAN.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Examples
A new CycleGAN style migration network based on a new attention mechanism extracts images of two domains from the dataset and adopts a generator and a discriminator equipped with the new attention mechanism to extract features from the input images and train on them, realizing migration between the two domains. The generator network model is shown in fig. 1, the new attention mechanism network model in fig. 2, and the new discriminator network model in fig. 3. The specific steps are as follows:
1. selection of data sets
We selected horse2zebra and apple2orange from the official style-migration datasets. The horse2zebra training set includes 1067 horse images and 1334 zebra images; its test set includes 120 horse images and 140 zebra images. The apple2orange training set includes 995 apple images and 1019 orange images; its test set includes 266 apple images and 248 orange images.
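For illustration, the following is a minimal sketch of loading the two unpaired training sets in PyTorch; the flat trainA/trainB directory layout of the public horse2zebra release and the transform chain are assumptions.

```python
import glob
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class UnpairedImages(Dataset):
    """Flat folder of images for one domain (directory layout is an assumption)."""
    def __init__(self, root, tfm):
        self.files = sorted(glob.glob(root + "/*.jpg"))
        self.tfm = tfm
    def __len__(self):
        return len(self.files)
    def __getitem__(self, i):
        return self.tfm(Image.open(self.files[i]).convert("RGB"))

tfm = transforms.Compose([
    transforms.Resize(286),
    transforms.RandomCrop(256),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# Domain A (horses) and domain B (zebras) are sampled independently:
# the datasets are unpaired.
loader_A = DataLoader(UnpairedImages("horse2zebra/trainA", tfm), batch_size=1, shuffle=True)
loader_B = DataLoader(UnpairedImages("horse2zebra/trainB", tfm), batch_size=1, shuffle=True)
```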
2. The new generator extracts input-image features and migrates them
Using 3 residual blocks in the new generator reduces the number of model parameters. To extract the image features in the generator further, a new attention mechanism is used for renewed feature extraction; its network model diagram is shown in fig. 2. The specific steps by which the new attention mechanism extracts the feature map are as follows:
Assume the input feature map of the attention mechanism has size H × W × C; expressed mathematically:
$$F_{in} = (f_{ijk})_{H\times W\times C} \quad (6)$$
s.t. i = 1, 2, ..., H; j = 1, 2, ..., W; k = 1, 2, ..., C.
In formula (6), $F_{in}$ is the input feature map and $f_{ijk}$ is the value of $F_{in}$ at the i-th row, j-th column, and k-th channel.
The first step: map the input feature map $F_{in}$ into a two-dimensional matrix to obtain $S_{C\times HW\times 1} = (s_{khp})_{C\times HW\times 1}$; expressed mathematically:
$$s_{khp} = f_{ijk}, \quad h = (i-1)\cdot W + j \quad (7)$$
s.t. h = 1, 2, ..., HW; k = 1, 2, ..., C; p = 1.
In formula (7), S denotes the two-dimensional matrix after mapping and $s_{khp}$ is the value of S at the k-th row, h-th column, and p-th channel, obtained by mapping the k-th channel at row i, column j of $F_{in}$; · denotes multiplication of two numbers.
The second step: multiply a row-vector convolution kernel of undetermined parameters $Conv_0 = (w_{nmp})_{1\times(3\cdot 3\cdot C)\times 1}$ with the column vectors $G_{mnp}$ to form the matrix $D_{H\times W\times 1} = (d_{ijp})_{H\times W\times 1}$; expressed mathematically:
$$d_{ijp} = \sum_{m=1}^{3\cdot 3\cdot C} w_{nmp}\cdot G_{mnp} \quad (8)$$
$$G_{mnp} = [(f_{(i-1)(j-1)k})_C, (f_{(i-1)jk})_C, (f_{(i-1)(j+1)k})_C, (f_{i(j-1)k})_C, (f_{ijk})_C, (f_{i(j+1)k})_C, (f_{(i+1)(j-1)k})_C, (f_{(i+1)jk})_C, (f_{(i+1)(j+1)k})_C]^T \quad (9)$$
s.t. i = 1, 2, ..., H; j = 1, 2, ..., W; k = 1, 2, ..., C; m = 1, 2, ..., 3·3·C; n = 1; p = 1.
In formula (8), $w_{nmp}$ is the value of $Conv_0$ at the n-th row, m-th column, and p-th channel; $G_{mnp}$, given by formula (9), is the column vector of values in the 3 × 3 × C neighborhood of point (i, j) in the input feature map. D denotes the matrix obtained after the convolution, $d_{ijp}$ is the value of D at the i-th row, j-th column, and p-th channel, · denotes multiplication of two numbers, and T denotes matrix transposition.
The third step: map the feature matrix D into a column vector to obtain $D'_{HW\times 1\times 1} = (d'_{hnp})_{HW\times 1\times 1}$; expressed mathematically:
$$d'_{hnp} = d_{ijp}, \quad h = (i-1)\cdot W + j \quad (10)$$
s.t. h = 1, 2, ..., HW; n = 1; p = 1.
In formula (10), D' denotes the column vector obtained by the mapping and $d'_{hnp}$ is the value of D' at the h-th row, n-th column, and p-th channel, obtained by mapping the p-th channel at row i, column j of D; · denotes multiplication of two numbers.
The fourth step: apply a softmax transformation to each element $d'_{hnp}$ of the feature vector D' to obtain $D''_{HW\times 1\times 1} = (d''_{hnp})_{HW\times 1\times 1}$; expressed mathematically:
$$d''_{hnp} = \frac{e^{d'_{hnp}}}{\sum_{t=1}^{HW} e^{d'_{tnp}}} \quad (11)$$
s.t. h = 1, 2, ..., HW; t = 1, 2, ..., HW; n = 1; p = 1.
In formula (11), D'' denotes the feature vector after the softmax transformation and $d''_{hnp}$ is the value of D'' at the h-th row, n-th column, and p-th channel.
The fifth step: perform matrix multiplication of matrix S and matrix D'' to obtain the matrix $U_{C\times 1\times 1} = (u_{knp})_{C\times 1\times 1}$; expressed mathematically:
$$u_{knp} = \sum_{h=1}^{HW} s_{khp}\cdot d''_{hnp} \quad (12)$$
s.t. k = 1, 2, ..., C; n = 1; p = 1.
In formula (12), U denotes the result of multiplying the matrix S by the matrix D'' and $u_{knp}$ is the value of U at the k-th row, n-th column, and p-th channel.
The sixth step: construct C convolution kernels $Conv_1, Conv_2, ..., Conv_C$, each containing C undetermined parameters (C² parameters in total), and convolve each with U, generating C numbers that form the column vector $V_{C\times 1\times 1} = (v_{knp})_{C\times 1\times 1}$; expressed mathematically:
$$v_{knp} = Conv_k \otimes U \quad (13)$$
s.t. k = 1, 2, ..., C.
In formula (13), V denotes the feature vector obtained by the convolution, $v_{knp}$ is the value of V at the k-th row, n-th column, and p-th channel, $Conv_k$ denotes the k-th convolution kernel, and ⊗ denotes matrix multiplication.
The seventh step: apply a LayerNorm transformation to each element of V to obtain $V'_{C\times 1\times 1} = (v'_{knp})_{C\times 1\times 1}$; expressed mathematically:
$$v'_{knp} = \gamma\cdot\frac{v_{knp} - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta \quad (14)$$
s.t. k = 1, 2, ..., C; n = 1; p = 1.
In formula (14), V' denotes the feature vector after the LayerNorm transformation and $v'_{knp}$ is the value of V' at the k-th row, n-th column, and p-th channel; γ denotes the weight coefficient, μ the mean of the feature vector V, σ² the variance of the feature vector V, ε a very small value, β the bias coefficient, and · the multiplication of two numbers.
The eighth step: apply a ReLU transformation to each element of V' to obtain $V''_{C\times 1\times 1} = (v''_{knp})_{C\times 1\times 1}$; expressed mathematically:
$$v''_{knp} = \max(0, v'_{knp}) \quad (15)$$
s.t. k = 1, 2, ..., C; n = 1; p = 1.
In formula (15), V'' denotes the feature vector after the ReLU transformation and $v''_{knp}$ is the value of V'' at the k-th row, n-th column, and p-th channel.
The ninth step: construct C convolution kernels $Conv'_1, Conv'_2, ..., Conv'_C$, each containing C undetermined parameters (C² parameters in total), and convolve each with V'', generating C numbers that form the column vector $Q_{C\times 1\times 1} = (q_{knp})_{C\times 1\times 1}$; expressed mathematically:
$$q_{knp} = Conv'_k \otimes V'' \quad (16)$$
s.t. k = 1, 2, ..., C; n = 1; p = 1.
In formula (16), Q denotes the feature vector after the convolution operation, $q_{knp}$ is the value of Q at the k-th row, n-th column, and p-th channel, $Conv'_k$ denotes the k-th convolution kernel, and ⊗ denotes matrix multiplication.
The tenth step: add the feature vector U to the feature vector Q to obtain $Q'_{C\times 1\times 1} = (q'_{knp})_{C\times 1\times 1}$; expressed mathematically:
$$q'_{knp} = q_{knp} + u_{knp} \quad (17)$$
s.t. k = 1, 2, ..., C; n = 1; p = 1.
In formula (17), Q' denotes the result of the feature-vector addition and $q'_{knp}$ is the value of Q' at the k-th row, n-th column, and p-th channel.
The eleventh step: arrange H × W copies of the vector $(Q')^T$ in sequence to form a three-dimensional matrix, obtaining $Q''_{H\times W\times C} = (q''_{ijk})_{H\times W\times C}$; expressed mathematically:
$$q''_{ijk} = (q'_{knp})^T \quad (18)$$
s.t. i = 1, 2, ..., H; j = 1, 2, ..., W; k = 1, 2, ..., C; n = 1; p = 1.
In formula (18), Q'' denotes the three-dimensional matrix after the sequential arrangement, $q''_{ijk}$ is the value of Q'' at the i-th row, j-th column, and k-th channel, and T denotes matrix transposition.
The twelfth step: add the three-dimensional feature map Q'' element-wise to the input feature map $F_{in}$ to obtain the output feature map $F_{out} = (o_{ijk})_{H\times W\times C}$; expressed mathematically:
$$o_{ijk} = q''_{ijk} + f_{ijk} \quad (19)$$
s.t. i = 1, 2, ..., H; j = 1, 2, ..., W; k = 1, 2, ..., C.
In formula (19), $F_{out}$ denotes the result of adding the two feature maps and $o_{ijk}$ is the value of $F_{out}$ at the i-th row, j-th column, and k-th channel.
The general formula: taking $F_{in} = (f_{ijk})_{H\times W\times C}$ as the input feature map of the attention mechanism and $F_{out} = (o_{ijk})_{H\times W\times C}$ as the output feature map of the attention mechanism, the overall mathematical formula is expressed as:
$$F_{out} = F_{in} + \mathrm{arrange}\Big(\big(U + W_{v2}\otimes \mathrm{ReLU}(\mathrm{LN}(W_{v1}\otimes U))\big)^T\Big) \quad (20)$$
s.t. h = 1, 2, ..., HW; h' = 1, 2, ..., HW, where $U = S\otimes \mathrm{softmax}(D')$, s.t. k = 1, 2, ..., C.
$W_{v2}$ is a C × C two-dimensional matrix formed by C convolution kernels as column vectors, i.e. $W_{v2} = [Conv'_1, Conv'_2, ..., Conv'_C]^T$, where $Conv'_k$ is the k-th convolution kernel containing C undetermined parameters.
ReLU denotes the ReLU activation function, i.e. $\mathrm{ReLU}(A) = (\max(0, a_{ij}))$, where A denotes a two-dimensional matrix and $a_{ij}$ is a value in A.
LN denotes LayerNorm normalization, i.e. $\mathrm{LN}(B) = \gamma\cdot\frac{B - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$, where B is a two-dimensional matrix, $b_{ij}$ is a value in the B matrix, μ is the mean of the B matrix, σ² is the variance of the B matrix, ε is a small number, γ is the weight coefficient, and β is the bias coefficient.
$W_{v1}$ is a C × C two-dimensional matrix formed by C convolution kernels as column vectors, i.e. $W_{v1} = [Conv_1, Conv_2, ..., Conv_C]^T$, where $Conv_k$ is the k-th convolution kernel containing C undetermined parameters.
The vector D' is obtained from the input feature map by
$$d'_{h''np} = Conv_0\otimes G_{h''} \quad (21)$$
s.t. h'' = 1, 2, ..., HW.
In formula (21), $Conv_0$ is the row-vector convolution kernel of undetermined parameters, $G_{h''}$ is the neighborhood column vector of formula (9) at the spatial position indexed by h'', and ⊗ denotes matrix multiplication.
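Read end to end, steps one through twelve amount to a lightweight channel-attention block with a learned, softmax-weighted spatial pooling. The following PyTorch sketch mirrors equations (6)–(21); the module name, the realization of $Conv_0$, $W_{v1}$, and $W_{v2}$ as nn.Conv2d layers, and the LayerNorm shape are assumptions for illustration, not the patented implementation.

```python
import torch
import torch.nn as nn

class NewAttention(nn.Module):
    """Sketch of the attention mechanism of steps 1-12 / equations (6)-(21)."""
    def __init__(self, channels):
        super().__init__()
        # Conv0: a 3x3xC kernel producing a single-channel spatial map (step 2)
        self.conv0 = nn.Conv2d(channels, 1, kernel_size=3, padding=1, bias=False)
        # Wv1, Wv2: C kernels with C parameters each, i.e. 1x1 convolutions (steps 6, 9)
        self.wv1 = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.wv2 = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.ln = nn.LayerNorm([channels, 1, 1])  # step 7: LayerNorm with gamma, beta
        self.relu = nn.ReLU(inplace=True)         # step 8
        self.softmax = nn.Softmax(dim=1)          # step 4: softmax over HW positions

    def forward(self, x):                         # x: (B, C, H, W), i.e. F_in
        b, c, h, w = x.shape
        s = x.view(b, c, h * w)                   # step 1: S, shape (B, C, HW)
        d = self.conv0(x).view(b, h * w, 1)       # steps 2-3: D', shape (B, HW, 1)
        d = self.softmax(d)                       # step 4: D''
        u = torch.bmm(s, d).view(b, c, 1, 1)      # step 5: U = S x D'', (B, C, 1, 1)
        v = self.relu(self.ln(self.wv1(u)))       # steps 6-8: V''
        q = self.wv2(v) + u                       # steps 9-10: Q' = Q + U
        return x + q                              # steps 11-12: broadcast add to F_in
```

The broadcast in the final line realizes the "arrange" of steps eleven and twelve: the channel vector Q' is implicitly replicated over all H × W positions before the residual addition.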
3. The discriminator discriminates the migration result of the generator
To improve the discrimination accuracy of the discriminator, the residual idea is adopted in the discriminator, so that further feature extraction is performed on the basis of the original features of the feature map; the discriminator network model is shown in fig. 3.
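The patent does not spell out the discriminator layout beyond fig. 3, but the residual idea described above can be sketched as follows; the channel count, kernel size, and LeakyReLU slope are assumptions.

```python
import torch.nn as nn

class ResidualDiscBlock(nn.Module):
    """Sketch: a residual connection inside a discriminator block (assumed layout)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        out = self.act(self.conv1(x))
        out = self.conv2(out)
        return self.act(out + x)  # the skip preserves the original feature map
```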
4. Mathematical generality analysis of the model
Taking CycleGAN as the baseline, a mathematical generality analysis of the network model is carried out in the following steps:
(1) Input image size generality analysis
Assume the input image size is H × W × C. The image size changes as shown in fig. 1: although upsampling and downsampling are performed, the image is finally restored to size H × W × C. Our model is therefore valid for migrating input images of arbitrary size.
(2) Computation generality analysis
Assume the input image size of an arbitrary block in our model is H × W × C and the output image size is H' × W' × C'. All kinds of blocks in the network structure and the computational cost of each block are listed in Table 1, where $S_×$, $S_÷$, $S_+$, $S_-$, $S_{>,<,\ge,\le,==,\ne}$, and $S_=$ are the numbers of multiply, divide, add, subtract, compare, and assignment operations, respectively, and k is the size of the convolution kernel in the convolution operation. "convolution" denotes the convolution operation; ReLU, leaky_ReLU, tanh, and softmax denote the corresponding activation functions; "adding" denotes matrix addition; "mapping" denotes matrix mapping; "multiplying" denotes two-dimensional matrix multiplication (here with C = 1); and "arranging" denotes sequentially arranging vectors into a three-dimensional tensor.
Applying a Taylor expansion to tanh gives:
$$\tanh(x) \approx x - \frac{x^3}{3} \quad (22)$$
By equation (22), each value undergoes two multiplications, one division, and one subtraction per tanh transformation.
A second-order Taylor expansion is likewise applied to the exponentials in softmax, $e^{x} \approx 1 + x + \frac{x^{2}}{2}$, giving:
$$\mathrm{softmax}(d'_{h}) = \frac{e^{d'_{h}}}{\sum_{t=1}^{HW} e^{d'_{t}}} \quad (23)$$
In equation (23), HW is a constant, and the per-matrix terms are computed only once when softmax is performed, so their cost is ignored. Therefore, after a softmax transformation, each value undergoes HW additions and one division.
The computation of the model is measured in two ways: the first counts only the complex operations of multiplication and division; the second counts not only the complex operations but also the simple operations of addition, subtraction, comparison, and assignment. Assume the input image size is H × W × C, where H, W, and C are each at least 1. For the CycleGAN network, the computational cost counting only complex operations is:
$$S_{CycleGAN} = (6532C + 897720.0156)HW$$
Counting the simple operations of the CycleGAN network as well as the complex ones, the cost is:
for our model, the computational complexity of the complex operation is:
SAttention-CycleGAN=(6340C+462198.2656)HW+271488
considering both complex and simple operations, the calculation amount is:
we analyzed the amount of computation we model reduces over cycleGAN from considering both complex and simple computations, respectively.
When only the case of complex operations is considered, the reduced amount of computation is:
as can be seen from equation (24), τ increases as HW increases and C decreases. Conversely, as HW decreases, C increases and τ decreases. However, no matter how much HW and C are taken, when only complex operation is considered, the reduction of the model is more than 2.94 percent of calculation amount and less than 48.18 percent compared with the CycleGAN model.
When both complex and simple operations are considered, the reduced proportion of computation τ' is defined analogously by equation (25). Its trend with HW and C is the same as that of τ; however, whatever the values of HW and C, when both complex and simple operations are considered, our model reduces the computation by more than 2.94% and less than 48.15% compared with the CycleGAN model.
TABLE 1 model calculation cost
(3) Parameter quantity generality analysis
We counted the parameters of CycleGAN and our model, and the results were as follows:
PCycleGAN=(7296C+14120960)*2
PAttention-CycleGAN=(6528C+7216771)*2
the amount of parameters that we model reduced compared to CycleGAN was:
P=PCycleGAN-PAttention-CycleGAN=1536C+13808378≥13809914 (26)
as can be seen from equation (26), P increases with increasing C. But regardless of the image input size, our model is reduced by at least a parameter amount of 13.8M compared to CycleGAN.
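These parameter counts can be checked empirically; a minimal sketch, assuming the generators and discriminators are available as PyTorch modules (the names are placeholders):

```python
def count_params(model):
    # total number of learnable parameters in a module
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# The *2 in the formulas above reflects the two mapping directions (X->Y and Y->X).
total = count_params(G) + count_params(F) + count_params(D_X) + count_params(D_Y)
print(f"{total / 1e6:.1f}M parameters")
```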
5. Training the model
The total loss is calculated and back-propagated; the trained model is obtained after 200 iterations in the experimental environment of Table 2.
Table 2 Experimental environment
Name | Configuration
---|---
Operating system | Ubuntu 18.04
GPU | NVIDIA GeForce RTX 2080 Ti
CPU | Intel Xeon processor (Skylake, IBRS), 2
RAM | 16 GB
GPU libraries | CUDA 10.2, cuDNN 7.6
Deep learning framework | PyTorch
6. Testing the model
The test-set images of the dataset are input into the generator of the trained model to obtain the test results. The trained model is evaluated with the PSNR, SSIM, LPIPS, and FID evaluation indices as well as parameter quantity, consumed video memory, and training time to measure the migration result.
7. Description of evaluation index
We evaluated our test results using PSNR, SSIM, LPIPS, FID evaluation indices.
PSNR, the peak signal-to-noise ratio, is defined via the mean square error (MSE). Given two m × n images, an image G and a noisy image H, MSE and PSNR are defined as:
$$MSE = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\big[G(i,j) - H(i,j)\big]^2 \quad (27)$$
$$PSNR = 10\cdot\log_{10}\!\left(\frac{MAX^2}{MSE}\right) \quad (28)$$
In equations (27) and (28), MAX is the maximum possible pixel value of the picture, MSE is the mean square error between the current image G and the reference image H, and m and n are the height and width of the images, respectively.
SSIM is an index for measuring the similarity of two images; it compares the luminance, contrast, and structure between samples x and y. The more similar two pictures are, the higher the SSIM value.
LPIPS is a newer index that simulates human perception; its calculation relies on a VGG network, through which deep features of different structures and tasks in the picture are extracted.
FID measures the similarity between two groups of images from the statistical perspective of computer-vision features; it is the distance between the feature vectors of the real images and the generated images.
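As a worked illustration of equations (27) and (28), a minimal NumPy sketch of the PSNR computation, assuming 8-bit images:

```python
import numpy as np

def psnr(g, h, max_val=255.0):
    # Equation (27): mean square error over all m*n pixels
    mse = np.mean((g.astype(np.float64) - h.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    # Equation (28): PSNR in decibels
    return 10.0 * np.log10(max_val ** 2 / mse)
```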
8. Description of comparative model
Attention-CycleGAN was compared to CycleGAN and AGGAN.
CycleGAN is an I2IT method for unpaired data; it uses a cycle-consistency loss to learn not only the source-to-target mapping but also the reverse mapping from the target domain back to the source domain.
AGGAN addresses the difficulty conventional unsupervised I2IT techniques have in focusing attention on a single object without changing the background or the other objects in the picture; it introduces an unsupervised attention mechanism that is trained adversarially together with the generator and the discriminator.
9. Description of qualitative comparison
The test results of CycleGAN, AGGAN, and the proposed Attention-CycleGAN on the horse2zebra dataset are shown in fig. 4. In the horse-to-zebra migration, the first-row conversion results of CycleGAN and AGGAN show artifacts of varying degree, which Attention-CycleGAN avoids. In the second row, CycleGAN does not completely transfer the horse into a zebra and the AGGAN result is blurry with too many artifacts, while Attention-CycleGAN completes the transfer while keeping the image sharp. In the zebra-to-horse migration, the third-row results of CycleGAN and AGGAN show over-migration of different degrees, losing information of the horse, whereas Attention-CycleGAN avoids the over-migration problem. In the fourth row, the migration of CycleGAN and AGGAN is incomplete and stripe features remain, while the migration effect of Attention-CycleGAN is clearly better than both.
The test results of CycleGAN, AGGAN, and the proposed Attention-CycleGAN on the apple2orange dataset are shown in fig. 5. In the apple-to-orange migration, the hands in the CycleGAN and AGGAN results are damaged to different degrees and part of the content is still not transferred, while Attention-CycleGAN transfers the apple features to oranges while leaving the hands unchanged. In the second row, CycleGAN does not achieve the migration at all, and AGGAN changes the color of the onion in the background, while Attention-CycleGAN migrates the image without changing the background. In the orange-to-apple migration, artifacts appear in both CycleGAN and AGGAN in the third row. In the fourth row, the color migration of CycleGAN is incomplete and AGGAN shows artifacts in the lower-left corner of the picture. Compared with both methods, Attention-CycleGAN achieves the migration while ensuring that no artifacts occur.
10. Description of quantitative comparison
The proposed Attention-CycleGAN model is compared with the CycleGAN and AGGAN models. Test results are obtained on the test sets of the horse2zebra and apple2orange datasets, evaluated with the PSNR, SSIM, LPIPS, and FID indices, and averaged. The index results on the horse2zebra dataset are shown in Table 3, and those on the apple2orange dataset in Table 4. The comparison of these indices shows that the proposed Attention-CycleGAN improves the image migration effect.
TABLE 3 Average PSNR, average SSIM, average LPIPS, and average FID of CycleGAN, AGGAN, and Attention-CycleGAN on the horse→zebra dataset
TABLE 4 Average PSNR, average SSIM, average LPIPS, and average FID of CycleGAN, AGGAN, and Attention-CycleGAN on the apple→orange dataset
The parameter quantities, consumed video memory, and running times of the three models are counted on the two datasets; the consumption results on the horse2zebra dataset are shown in Table 6, and those on the apple2orange dataset in Table 7.
TABLE 6 Parameter quantity, consumed video memory, and running time of CycleGAN, AGGAN, and Attention-CycleGAN on the horse→zebra dataset
TABLE 7 Parameter quantity, consumed video memory, and running time of CycleGAN, AGGAN, and Attention-CycleGAN on the apple→orange dataset
The above embodiments describe in detail a specific implementation of the new CycleGAN style migration network based on a new attention mechanism; they are intended only to help understand the proposed method and the core idea of the invention.
Claims (4)
1. A new CycleGAN style migration network based on a new attention mechanism, comprising the steps of:
(1) selecting a dataset for migration from the official style-migration datasets;
(2) forward propagation: inputting the sample datasets of the two domains into the new generator and performing convolution, the new attention mechanism, residual, and transposed-convolution operations to obtain the migrated generated image;
(3) back propagation: first fixing the parameters of the new discriminator so that no gradient descent is performed on them, and back-propagating and updating the parameters of the new generator; then allowing the gradient of the new discriminator to descend, and performing back propagation and parameter updating;
(4) evaluating the trained model with the PSNR, SSIM, LPIPS, and FID evaluation indices as well as parameter quantity, consumed video memory, and training time, and measuring the migration result.
2. The new CycleGAN style migration network based on new attention mechanism as claimed in claim 1,
The new attention mechanism of the new generator in step (2) takes $F_{in} = (f_{ijk})_{H\times W\times C}$ as the input feature map of the attention mechanism and $F_{out} = (o_{ijk})_{H\times W\times C}$ as the output feature map of the attention mechanism; the final output feature map can be regarded as the result of a renewed feature extraction on the input features. The mathematical formula is expressed as:
$$F_{out} = F_{in} + \mathrm{arrange}\Big(\big(U + W_{v2}\otimes \mathrm{ReLU}(\mathrm{LN}(W_{v1}\otimes U))\big)^T\Big)$$
s.t. h = 1, 2, ..., HW; h' = 1, 2, ..., HW, where $U = S\otimes \mathrm{softmax}(D')$, s.t. k = 1, 2, ..., C.
$W_{v2}$ is a C × C two-dimensional matrix formed by C convolution kernels as column vectors, i.e. $W_{v2} = [Conv'_1, Conv'_2, ..., Conv'_C]^T$, where $Conv'_k$ is the k-th convolution kernel containing C undetermined parameters.
ReLU denotes the ReLU activation function, i.e. $\mathrm{ReLU}(A) = (\max(0, a_{ij}))$, where A denotes a two-dimensional matrix and $a_{ij}$ is a value in A.
LN denotes LayerNorm normalization, i.e. $\mathrm{LN}(B) = \gamma\cdot\frac{B - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$, where B is a two-dimensional matrix, $b_{ij}$ is a value in the B matrix, μ is the mean of the B matrix, σ² is the variance of the B matrix, ε is a small number, γ is the weight coefficient, and β is the bias coefficient.
$W_{v1}$ is a C × C two-dimensional matrix formed by C convolution kernels as column vectors, i.e. $W_{v1} = [Conv_1, Conv_2, ..., Conv_C]^T$, where $Conv_k$ is the k-th convolution kernel containing C undetermined parameters.
D' is the column vector obtained from the input feature map through the row-vector convolution kernel $Conv_0$ of undetermined parameters, s.t. h'' = 1, 2, ..., HW, where ⊗ denotes matrix multiplication.
3. The new CycleGAN style migration network based on new attention mechanism as claimed in claim 1,
In step (2), after using the new attention mechanism the new generator can better extract hierarchical image features, so a good migration effect can be obtained while using 3 residual blocks to reduce the parameter quantity, avoiding the excessive parameters and gradient explosion caused by using 9 residual blocks.
4. The new CycleGAN style migration network based on new attention mechanism as claimed in claim 1,
The new discriminator in step (3) adopts the residual idea in its convolutions, so that feature extraction can be carried out on the basis of the original feature map of the image, preventing the gradient disappearance caused by an over-deep network from keeping the discriminator from extracting image features well.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111180291.3A CN114037600A (en) | 2021-10-11 | 2021-10-11 | New cycleGAN style migration network based on new attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111180291.3A CN114037600A (en) | 2021-10-11 | 2021-10-11 | New cycleGAN style migration network based on new attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114037600A true CN114037600A (en) | 2022-02-11 |
Family
ID=80141033
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111180291.3A Pending CN114037600A (en) | 2021-10-11 | 2021-10-11 | New cycleGAN style migration network based on new attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114037600A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021115159A1 (en) * | 2019-12-09 | 2021-06-17 | 中兴通讯股份有限公司 | Character recognition network model training method, character recognition method, apparatuses, terminal, and computer storage medium therefor |
CN111696027A (en) * | 2020-05-20 | 2020-09-22 | 电子科技大学 | Multi-modal image style migration method based on adaptive attention mechanism |
CN112070658A (en) * | 2020-08-25 | 2020-12-11 | 西安理工大学 | Chinese character font style migration method based on deep learning |
CN112767519A (en) * | 2020-12-30 | 2021-05-07 | 电子科技大学 | Controllable expression generation method combined with style migration |
Non-Patent Citations (1)
Title |
---|
李鹰;徐蔚鸿;唐良荣;: "带参数聚合算子的模糊联想记忆网络", 控制理论与应用, no. 11, 15 November 2010 (2010-11-15) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116958468A (en) * | 2023-07-05 | 2023-10-27 | 中国科学院地理科学与资源研究所 | Mountain snow environment simulation method and system based on SCycleGAN |
CN118115862A (en) * | 2024-04-30 | 2024-05-31 | 南京信息工程大学 | Face image tampering anomaly detection method, device and medium |
CN118115862B (en) * | 2024-04-30 | 2024-07-05 | 南京信息工程大学 | Face image tampering anomaly detection method, device and medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |