CN112163605A - Multi-domain image translation method based on attention generative adversarial network - Google Patents

Multi-domain image translation method based on attention generative adversarial network

Info

Publication number
CN112163605A
CN112163605A (application CN202010976851.5A)
Authority
CN
China
Prior art keywords
image
discriminator
attention
domain
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010976851.5A
Other languages
Chinese (zh)
Inventor
Zhang Youcai (张友彩)
Shao Mingwen (邵明文)
Yu Fa (禹发)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China
Priority to CN202010976851.5A
Publication of CN112163605A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Image translation is the mapping of images from one domain to another. The task currently faces three challenging problems: 1) insufficient flexibility in handling multi-domain translation; 2) the inability to focus only on the regions to be converted while leaving other, irrelevant attributes unchanged; 3) a tendency to produce blurred image artifacts. The invention addresses these limitations with a novel multi-domain image translation method. For problem 2), the invention embeds an attention module in the generator and the discriminator, so that the model can apply larger weight coefficients to the most important regions during image translation, according to the attention map obtained by an auxiliary classifier. For problem 3), the invention abandons the traditional discriminator structure and adopts a Patch discriminator, which lets the discriminator pay more attention to details in the image, thereby improving the quality of the generated images.

Description

Multi-domain image translation method based on attention generative adversarial network
Technical Field
The invention relates to deep-learning style transfer and image translation. It is implemented with the PyTorch deep learning framework; the main development environment is PyTorch 1.1, Python 3.5, and CUDA 10.0.
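A quick check of the stated environment can be done as follows (the exact check is an illustration, not part of the invention):

```python
import torch

print(torch.__version__)         # expected to start with '1.1' for PyTorch 1.1
print(torch.version.cuda)        # expected '10.0'
print(torch.cuda.is_available()) # True if the CUDA device is usable
```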
Background
Before generative adversarial networks were proposed, deep learning focused mainly on rich hierarchical models for representing the probability distributions of the data encountered in applications, such as natural images, audio waveforms containing speech, and the symbols of natural-language corpora. The generative adversarial network (GAN), proposed by Ian Goodfellow in 2014, broke this situation: once introduced, GAN attracted wide attention and became one of the hottest models in deep learning. A generative adversarial network can be understood in two parts. "Generative" means that the model learns from data such as pictures and language, much like a human brain, and can then automatically generate similar data; for example, after learning from pictures of cats, the model can generate cat pictures by itself. "Adversarial", as the name implies, refers to a contest between two parties, so the framework necessarily involves two networks pitted against each other. The two networks in a GAN are the generator network (G) and the discriminator network (D), and their respective roles are as follows:
• G is the generator network. It receives random noise z (that is, randomly generated data) and generates a picture from this noise, learning the data distribution as well as possible; the generated picture is denoted G(z).
• D is the discriminator network. It judges whether a picture is real or generated: it receives a picture x and outputs D(x), where D(x) = 1 if x is a real picture and D(x) = 0 if x is a generated fake picture.
The advent of GAN revolutionized the field of image generation, and many variants of GAN have since been proposed. The following are relatively representative derivatives of GAN.
(1) CGAN. The advantages of the generative adversarial network are beyond doubt: gradients are obtained through back-propagation alone, no Markov chain is needed, no complex inference is required during learning, and various factors and the relationships among them can easily be incorporated into the model. However, such a generative model is unconstrained, so there is no way to control what data it generates. CGAN constrains the model with additional conditioning information, which can guide the data-generation process. This additional information may be a class label, hints for image inpainting, or information from other modalities. Compared with the original GAN, CGAN adds the conditioning constraint to both the discriminator and the generator, so that image generation is no longer unsupervised and purposeless.
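A sketch of how CGAN-style conditioning can be wired in PyTorch; the embedding size and the concatenation scheme are assumptions for illustration. The point is that both the generator input and the discriminator input carry the label.

```python
import torch
import torch.nn as nn

n_classes, z_dim, batch = 10, 100, 8
label_embed = nn.Embedding(n_classes, n_classes)  # label -> dense vector

z = torch.randn(batch, z_dim)                     # random noise
y = torch.randint(0, n_classes, (batch,))         # conditioning labels
g_input = torch.cat([z, label_embed(y)], dim=1)   # conditioned generator input

x = torch.randn(batch, 28 * 28)                   # a (flattened) image batch
d_input = torch.cat([x, label_embed(y)], dim=1)   # conditioned discriminator input
```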
(2) DCGAN. Through a set of architectural constraints, DCGAN carried the achievements of convolutional networks over from supervised learning to unsupervised learning, and its strong performance made it a leading candidate for unsupervised representation learning. Given the large volumes of unlabeled data being collected, how to learn reusable feature representations from them has long been an active research area. One can learn from a practically unlimited number of unlabeled images and videos, and once a good intermediate representation is obtained, it can be used in different supervised learning studies or tasks. DCGAN follows this idea: it trains generative adversarial networks (GANs), then reuses parts of the generator and discriminator networks as feature extractors for different supervised tasks, and it proposes a set of constraints on GAN topology that keep training stable under most settings and prevent the generator output from becoming meaningless.
(3) InfoGAN. InfoGAN maximizes the mutual information between latent variables and the observed data. Concretely, InfoGAN successfully separated writing style from digit shape on the MNIST dataset, pose from lighting in 3D-rendered images, and the background digits from the central digit on the SVHN dataset. It also discovered visual concepts in the CelebA face dataset, including hair style, the presence of glasses, and facial expression. In the original GAN, the input received by the generator is a single unstructured continuous noise vector, which is uninterpretable: there is no way to control a particular dimension so as to generate specific image information, and the noise is simply fitted as a whole. Analyzing the MNIST dataset, the digits can be decomposed along several dimensions, each representing a different characteristic such as digit identity, stroke thickness, and slant, but in the original GAN no single dimension can be varied to make the generator produce images with a specific attribute. InfoGAN improves on this by decomposing the single continuous input noise into two parts: the original incompressible noise z, and latent codes whose different dimensions represent different characteristics.
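The decomposition can be sketched as follows; the code sizes (62 noise dimensions, one 10-way categorical code, two continuous codes) follow the common MNIST setup and are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

batch = 8
z = torch.randn(batch, 62)                                     # incompressible noise
c_cat = F.one_hot(torch.randint(0, 10, (batch,)), 10).float()  # categorical code, e.g. digit identity
c_cont = torch.rand(batch, 2) * 2 - 1                          # continuous codes, e.g. thickness and slant
g_input = torch.cat([z, c_cat, c_cont], dim=1)                 # generator input, shape (batch, 74)
```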
Disclosure of Invention
A multi-domain image translation method based on an attention generative adversarial network adds an attention module to the adversarial generative network, building on existing work, to realize multi-domain image translation. Since existing image translation methods cannot make the network attend more to the regions that matter most to the translation process, the invention embeds an attention module in the generator and the discriminator. The attention module integrates an auxiliary classifier for generating the attention map, so that the model can apply larger weight coefficients to the most important regions during image translation according to the attention map obtained by the auxiliary classifier. To generate clearer and more natural translated images, the invention abandons the traditional discriminator structure and adopts a Patch discriminator. A traditional discriminator takes a whole image as input for discrimination and therefore inevitably neglects the details of the image; by dividing an image into several patches of the same size, the Patch discriminator can pay more attention to the details, thereby improving the quality of the generated images. The method mainly comprises the following steps:
Step (1), randomly sampling a batch from the data set, and preprocessing the original-domain labels to obtain the target-domain labels;
Step (2), concatenating the image from step (1) with the target-domain label along the channel dimension (channel-level concat), and feeding the result into an encoder to extract features;
Step (3), feeding the feature map extracted in step (2) into an auxiliary classifier to obtain the corresponding importance weights, and multiplying these attention weights with the corresponding feature map to obtain the attention map;
Step (4), feeding the attention map obtained in step (3) into a decoder for decoding, finally generating the output image;
Step (5), feeding the generated output image into a discriminator for discrimination. The discriminator has two functions: first, it judges whether the input image is real or fake according to its distribution; second, it classifies the input image according to its features and outputs a classification label, which should be the same as the target class label of the image. The discriminator is optimized according to formulas (1), (2) and (3) below.
\mathcal{L}_{adv} = \mathbb{E}_{x}\big[\log D_{src}(x)\big] + \mathbb{E}_{x,c}\big[\log\big(1 - D_{src}(G(x,c))\big)\big]    (1)
\mathcal{L}_{cls}^{r} = \mathbb{E}_{x,c'}\big[-\log D_{cls}(c' \mid x)\big]    (2)
\mathcal{L}_{cls}^{f} = \mathbb{E}_{x,c}\big[-\log D_{cls}(c \mid G(x,c))\big]    (3)
where x is the input image, c the target-domain label, c' the original-domain label, D_{src} the real/fake output of the discriminator, and D_{cls} its domain-classification output.
Step (6), feeding the fake target-domain image generated in step (5), together with its original-domain label, into the same generator to reconstruct the image. The reconstructed image should have the same image characteristics as the original input image, which gives formula (4) below.
\mathcal{L}_{rec} = \mathbb{E}_{x,c,c'}\big[\lVert x - G(G(x,c), c') \rVert_{1}\big]    (4)
Step (7), iteratively optimizing the network according to the loss functions; an illustrative PyTorch sketch of these objectives is given below.
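The following sketch covers steps (5) to (7) under the loss formulas (1) to (4). The interfaces of G and D, the binary-cross-entropy form of the adversarial loss, and the weights lambda_cls and lambda_rec are assumptions that follow common practice for this family of models, not the exact implementation of the invention.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, x_real, c_org, c_trg, lambda_cls=1.0, lambda_rec=10.0):
    # D is assumed to return (per-patch real/fake logits, domain-classification logits);
    # c_org / c_trg are the original- and target-domain labels in index form.

    # --- discriminator objectives, formulas (1) and (2) ---
    out_src, out_cls = D(x_real)
    d_loss_real = F.binary_cross_entropy_with_logits(
        out_src, torch.ones_like(out_src))               # real term of formula (1)
    d_loss_cls = F.cross_entropy(out_cls, c_org)         # formula (2): classify real images

    x_fake = G(x_real, c_trg).detach()                   # steps (2)-(4) happen inside G
    out_src_fake, _ = D(x_fake)
    d_loss_fake = F.binary_cross_entropy_with_logits(
        out_src_fake, torch.zeros_like(out_src_fake))    # fake term of formula (1)
    d_loss = d_loss_real + d_loss_fake + lambda_cls * d_loss_cls

    # --- generator objectives, formulas (1), (3) and (4) ---
    x_fake = G(x_real, c_trg)
    out_src_fake, out_cls_fake = D(x_fake)
    g_loss_adv = F.binary_cross_entropy_with_logits(
        out_src_fake, torch.ones_like(out_src_fake))     # fool the Patch discriminator
    g_loss_cls = F.cross_entropy(out_cls_fake, c_trg)    # formula (3): fakes match the target label
    x_rec = G(x_fake, c_org)                             # step (6): reconstruct with the original label
    g_loss_rec = torch.mean(torch.abs(x_real - x_rec))   # formula (4): L1 reconstruction loss
    g_loss = g_loss_adv + lambda_cls * g_loss_cls + lambda_rec * g_loss_rec
    return d_loss, g_loss
```

In a full training loop, d_loss and g_loss would be minimized by two separate optimizers in alternation, as is usual for adversarial training.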
Drawings
FIG. 1 is a schematic diagram of the architecture of the present invention. The whole network comprises two generators, G_t and G_r, and a discriminator D. The translation generator G_t translates the original-domain image according to the target-domain label, and the reconstruction generator G_r reconstructs the converted image using the original-domain label. Using two different generators to handle the two tasks permits many different network architecture designs.
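The discriminator D is the Patch discriminator described above. A minimal sketch, with all layer sizes assumed, is given below; instead of one scalar for the whole image it outputs a grid of real/fake scores, one per local patch, together with domain-classification logits.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_ch=3, n_domains=5, feat=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_ch, feat, 4, 2, 1), nn.LeakyReLU(0.01),
            nn.Conv2d(feat, feat * 2, 4, 2, 1), nn.LeakyReLU(0.01),
        )
        self.src = nn.Conv2d(feat * 2, 1, 3, 1, 1)          # one real/fake logit per patch
        self.cls = nn.Conv2d(feat * 2, n_domains, 3, 1, 1)  # domain logits

    def forward(self, x):
        h = self.trunk(x)
        out_src = self.src(h)                   # (B, 1, H/4, W/4): per-patch scores
        out_cls = self.cls(h).mean(dim=(2, 3))  # (B, n_domains): pooled domain logits
        return out_src, out_cls
```

Because every spatial position of out_src sees only a local receptive field, the adversarial loss is applied patch by patch, which pushes the generator to get local details right.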
A block diagram of the generator is shown in FIG. 2. As the figure shows, the generator consists of three parts: an encoding part, an attention part, and a decoding part. The generator adopts a U-Net structure, which ensures that other, irrelevant attributes are preserved to the greatest extent during conversion. The encoder receives the original picture and the target-domain label as input and extracts features; the attention module feeds the feature map extracted by the encoder into a classifier to obtain the corresponding importance weights, and finally multiplies the importance weights with the feature map to obtain the final attention map. The decoder is responsible for decoding the attention map to generate the final output image.
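A minimal sketch of this encoder-attention-decoder generator follows. The layer sizes, the CAM-style use of the auxiliary classifier's weight vectors as channel importance weights, and the omission of the U-Net skip connections are all simplifying assumptions for illustration.

```python
import torch
import torch.nn as nn

class AttentionGenerator(nn.Module):
    def __init__(self, in_ch=3, n_domains=5, feat=64):
        super().__init__()
        # encoder: the image is concatenated with the target-domain label channels (step 2)
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch + n_domains, feat, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(feat, feat * 2, 4, 2, 1), nn.ReLU(),
        )
        self.aux_cls = nn.Linear(feat * 2, n_domains)  # auxiliary classifier
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(feat, in_ch, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, x, c):
        # spatially replicate the target label and concatenate channel-wise
        c_map = c.view(c.size(0), c.size(1), 1, 1).expand(-1, -1, x.size(2), x.size(3))
        h = self.enc(torch.cat([x, c_map], dim=1))
        # channel importance weights taken from the classifier's weight matrix (step 3)
        w = self.aux_cls.weight.mean(dim=0)
        attn = h * w.view(1, -1, 1, 1)  # weighted feature map: the attention map
        return self.dec(attn)           # step 4: decode the attention map into the output image
```

Usage: for a batch x of shape (B, 3, H, W) and one-hot target labels c of shape (B, n_domains), AttentionGenerator()(x, c) returns a translated image batch of shape (B, 3, H, W).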
Detailed Description
A multi-domain image translation method based on an attention generative adversarial network adds an attention module to the adversarial generative network, building on existing work, to realize multi-domain image translation. Since existing image translation methods cannot make the network attend more to the regions that matter most to the translation process, the invention embeds an attention module in the generator and the discriminator. The attention module integrates an auxiliary classifier for generating the attention map, so that the model can apply larger weight coefficients to the most important regions during image translation according to the attention map obtained by the auxiliary classifier. To generate clearer and more natural translated images, the invention abandons the traditional discriminator structure and adopts a Patch discriminator. A traditional discriminator takes a whole image as input for discrimination and therefore inevitably neglects the details of the image; by dividing an image into several patches of the same size, the Patch discriminator can pay more attention to the details, thereby improving the quality of the generated images. The method mainly comprises the following steps:
Step (1), randomly sampling a batch from the data set, and preprocessing the original-domain labels to obtain the target-domain labels;
Step (2), concatenating the image from step (1) with the target-domain label along the channel dimension (channel-level concat), and feeding the result into an encoder to extract features;
Step (3), feeding the feature map extracted in step (2) into an auxiliary classifier to obtain the corresponding importance weights, and multiplying these attention weights with the corresponding feature map to obtain the attention map;
Step (4), feeding the attention map obtained in step (3) into a decoder for decoding, finally generating the output image;
Step (5), feeding the generated output image into a discriminator for discrimination. The discriminator has two functions: first, it judges whether the input image is real or fake according to its distribution; second, it classifies the input image according to its features and outputs a classification label, which should be the same as the target class label of the image. The discriminator is optimized according to formulas (1), (2) and (3) below.
\mathcal{L}_{adv} = \mathbb{E}_{x}\big[\log D_{src}(x)\big] + \mathbb{E}_{x,c}\big[\log\big(1 - D_{src}(G(x,c))\big)\big]    (1)
\mathcal{L}_{cls}^{r} = \mathbb{E}_{x,c'}\big[-\log D_{cls}(c' \mid x)\big]    (2)
\mathcal{L}_{cls}^{f} = \mathbb{E}_{x,c}\big[-\log D_{cls}(c \mid G(x,c))\big]    (3)
where x is the input image, c the target-domain label, c' the original-domain label, D_{src} the real/fake output of the discriminator, and D_{cls} its domain-classification output.
Step (6), feeding the fake target-domain image generated in step (5), together with its original-domain label, into the same generator to reconstruct the image. The reconstructed image should have the same image characteristics as the original input image, which gives formula (4) below.
\mathcal{L}_{rec} = \mathbb{E}_{x,c,c'}\big[\lVert x - G(G(x,c), c') \rVert_{1}\big]    (4)
Step (7), iteratively optimizing the network according to the loss functions.
Step (8), after the face segmentation is finished, applying histogram matching to each part and then obtaining the loss functions of the three parts, which can be expressed as formula (2).
Step (9), extracting the content of the image using a VGG16 network pre-trained on the ImageNet dataset, as shown in formula (4).
Step (10), iteratively optimizing the generator according to the loss function.

Claims (1)

1. A multi-domain image translation method based on an attention generative adversarial network, characterized in that an attention module is added to the adversarial generative network, building on existing work, to realize multi-domain image translation. Since existing image translation methods cannot make the network attend more to the regions that matter most to the translation process, the invention embeds an attention module in the generator and the discriminator. The attention module integrates an auxiliary classifier for generating the attention map, so that the model can apply larger weight coefficients to the most important regions during image translation according to the attention map obtained by the auxiliary classifier. To generate clearer and more natural translated images, the invention abandons the traditional discriminator structure and adopts a Patch discriminator. A traditional discriminator takes a whole image as input for discrimination and therefore inevitably neglects the details of the image; by dividing an image into several patches of the same size, the Patch discriminator can pay more attention to the details, thereby improving the quality of the generated images.
The method mainly comprises the following steps:
Step (1), randomly sampling a batch from the data set, and preprocessing the original-domain labels to obtain the target-domain labels;
Step (2), concatenating the image from step (1) with the target-domain label along the channel dimension (channel-level concat), and feeding the result into an encoder to extract features;
Step (3), feeding the feature map extracted in step (2) into an auxiliary classifier to obtain the corresponding importance weights, and multiplying these attention weights with the corresponding feature map to obtain the attention map;
Step (4), feeding the attention map obtained in step (3) into a decoder for decoding, finally generating the output image;
Step (5), feeding the generated output image into a discriminator for discrimination. The discriminator has two functions: first, it judges whether the input image is real or fake according to its distribution; second, it classifies the input image according to its features and outputs a classification label.
The classification label output by the discriminator should be the same as the target class label of the image, and the discriminator is optimized according to formulas (1), (2) and (3) below.
\mathcal{L}_{adv} = \mathbb{E}_{x}\big[\log D_{src}(x)\big] + \mathbb{E}_{x,c}\big[\log\big(1 - D_{src}(G(x,c))\big)\big]    (1)
\mathcal{L}_{cls}^{r} = \mathbb{E}_{x,c'}\big[-\log D_{cls}(c' \mid x)\big]    (2)
\mathcal{L}_{cls}^{f} = \mathbb{E}_{x,c}\big[-\log D_{cls}(c \mid G(x,c))\big]    (3)
where x is the input image, c the target-domain label, c' the original-domain label, D_{src} the real/fake output of the discriminator, and D_{cls} its domain-classification output.
Step (6), feeding the fake target-domain image generated in step (5), together with its original-domain label, into the same generator to reconstruct the image. The reconstructed image should have the same image characteristics as the original input image, which gives formula (4) below.
\mathcal{L}_{rec} = \mathbb{E}_{x,c,c'}\big[\lVert x - G(G(x,c), c') \rVert_{1}\big]    (4)
Step (7), iteratively optimizing the network according to the loss functions.
CN202010976851.5A 2020-09-17 2020-09-17 Multi-domain image translation method based on attention generative adversarial network Pending CN112163605A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010976851.5A CN112163605A (en) 2020-09-17 2020-09-17 Multi-domain image translation method based on attention generative adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010976851.5A CN112163605A (en) 2020-09-17 2020-09-17 Multi-domain image translation method based on attention generative adversarial network

Publications (1)

Publication Number Publication Date
CN112163605A (en) 2021-01-01

Family

ID=73859158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010976851.5A Pending CN112163605A (en) 2020-09-17 2020-09-17 Multi-domain image translation method based on attention generative adversarial network

Country Status (1)

Country Link
CN (1) CN112163605A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115841589A (en) * 2022-11-08 2023-03-24 Henan University Unsupervised image translation method based on generative self-attention mechanism
CN116958468A (en) * 2023-07-05 2023-10-27 Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences Mountain snow environment simulation method and system based on SCycleGAN


Similar Documents

Publication Publication Date Title
CN108596265B (en) Video generation model based on text description information and generation countermeasure network
CN112887698B (en) High-quality face voice driving method based on nerve radiation field
CN109919830B (en) Method for restoring image with reference eye based on aesthetic evaluation
CN109815826B (en) Method and device for generating face attribute model
CN110555896B (en) Image generation method and device and storage medium
CN110570481A (en) calligraphy word stock automatic repairing method and system based on style migration
CN111160452A (en) Multi-modal network rumor detection method based on pre-training language model
CN113807265B (en) Diversified human face image synthesis method and system
CN110909680A (en) Facial expression recognition method and device, electronic equipment and storage medium
CN112036276A (en) Artificial intelligent video question-answering method
CN112633288B (en) Face sketch generation method based on painting brush touch guidance
CN113837366A (en) Multi-style font generation method
CN109711411B (en) Image segmentation and identification method based on capsule neurons
CN110232564A (en) A kind of traffic accident law automatic decision method based on multi-modal data
CN117058266B (en) Handwriting word generation method based on skeleton and outline
CN112163605A (en) 2021-01-01 Multi-domain image translation method based on attention generative adversarial network
CN116311483B (en) Micro-expression recognition method based on local facial area reconstruction and memory contrast learning
CN116797868A (en) Text image generation method and diffusion generation model training method
CN116129013A (en) Method, device and storage medium for generating virtual person animation video
CN112330759A (en) Face attribute editing method based on generation countermeasure network
CN117496567A (en) Facial expression recognition method and system based on feature enhancement
CN113658285B (en) Method for generating face photo to artistic sketch
CN116127959A (en) Image mood mining and mood conversion Chinese ancient poems method based on deep learning
CN113722536B (en) Video description method based on bilinear adaptive feature interaction and target perception
CN115346259A (en) Multi-granularity academic emotion recognition method combined with context information

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210101
