CN109166102A - Image-to-image translation method based on a discriminative region proposal adversarial network - Google Patents
Image-to-image translation method based on a discriminative region proposal adversarial network
- Publication number: CN109166102A
- Application number: CN201810820240.4A
- Authority: CN (China)
- Prior art keywords: image, real, loss, probability, area
- Prior art date: 2018-07-24
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications (all under G—PHYSICS; G06—COMPUTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL)
- G06T 7/0002 — Image analysis; inspection of images, e.g. flaw detection
- G06T 3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T 7/11 — Segmentation; edge detection: region-based segmentation
- G06T 2207/20081 — Indexing scheme: training; learning
- G06T 2207/20084 — Indexing scheme: artificial neural networks [ANN]
- G06T 2207/30168 — Indexing scheme: image quality inspection
- G06T 2207/30181 — Indexing scheme: Earth observation
Abstract
The present invention provides an image-to-image translation method based on a discriminative region proposal adversarial network. A semantic segmentation map of a real image is input to the generator to produce a first image; the first image is fed to the patch discriminator, which predicts a score map; a sliding window locates the image patch with the most obvious artifacts in the score map, and this patch is mapped back into the first image to obtain the discriminative region; the discriminative region is used to mask the real image, yielding a masked fake image; the real image and the masked fake image are input to the corrector, which judges whether its input is real or fake; and the generator, guided by the corrector's feedback, produces images ever closer to the real image. The invention can synthesize high-quality images with high resolution, realistic detail and fewer artifacts.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to an image-to-image translation method based on a discriminative region proposal adversarial network.
Background
From the perspective of human visual perception, we usually judge a synthesized image to be fake because it contains local artifacts. Even if it looks realistic at first glance, we can still tell real from fake after gazing at it for only about 1000 ms. Humans perceive a real scene from coarse structure to fine detail: we typically grasp the global layout of a scene while focusing on the details of an object and understanding how it relates to its surroundings.
Many efforts have been made to develop automated image translation systems. The straightforward approach is to optimize an L1 or L2 loss in pixel space, but both suffer from blurring. Some work therefore adds an adversarial loss to produce images that are sharp in both the spatial and spectral dimensions. Beyond the GAN loss, perceptual losses have been applied to image-to-image translation tasks, but these depend on pre-trained deep models and their training data sets. Despite the variety of losses for measuring the difference between a real image and a generated image, image-to-image translation with GANs still suffers from artifacts and uneven color distribution, and struggles to generate realistic high-resolution pictures because of the high-dimensional distributions involved.
Disclosure of Invention
The invention provides an image-to-image translation method based on a discriminative region proposal adversarial network, aiming to solve technical problems of image-to-image translation in the prior art, such as artifacts, unbalanced color distribution and low resolution of the translated images.
An image-to-image translation method based on a discriminative region proposal adversarial network, where the network comprises a generator, a patch discriminator and a corrector, the method comprising the following steps:
S1: inputting a semantic segmentation map of a real image into the generator to generate a first image;
S2: inputting the first image into the patch discriminator, which predicts a score map;
S3: finding the image patch with the most obvious artifacts in the score map using a sliding window, and mapping that patch back into the first image to obtain the discriminative region in the first image;
S4: masking the real image with the discriminative region to obtain a masked fake image;
S5: inputting the real image and the masked fake image into the corrector, which judges whether its input is real or fake;
S6: the generator generating an image closer to the real image according to the correction from the corrector.
Further, given an input of resolution w_i × w_i that the patch discriminator processes into a score map of size w_s × w_s, if a discriminative region of size w^* × w^* is desired, the size of the sliding window on the score map is w × w, where w = w^* × w_s / w_i.
Further, the discriminative region d_r is the region mapped out by the discriminative region proposal network, namely: d_r = F_DRPnet(x_c^*, y_c^*, w^*), where (x_c^*, y_c^*) = τ·(x_c, y_c); here (x_c^*, y_c^*) are the center coordinates of the discriminative region, τ is the scale between the first image and the score map, and (x_c, y_c) are the center coordinates of the selected window on the score map.
Further, the total objective function of the discriminative region proposal adversarial network is:
L(G, D_p, R) = (1 − λ)·L_D(G, D_p) + λ·L_R(G, R) + L_L1(G), where L_D(G, D_p) is the loss of the patch discriminator, L_R(G, R) is the loss of the corrector, and L_L1(G) is the L1 loss.
Further, the loss of the patch discriminator is:
L_D(G, D_p) = E_y[log D_p(x, y)] + E_{x,z}[log(1 − D_p(x, G(x, z)))],
where D_p is the patch discriminator, x is the semantic segmentation map of the real image, y is the real image, and G(x, z) is the first image. E_y[log D_p(x, y)] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the real image y into the patch discriminator; the label of the real image is 1, and the loss between this probability and 1 is calculated. E_{x,z}[log(1 − D_p(x, G(x, z)))] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the first image G(x, z) into the patch discriminator; the label of the fake image is 0, and the loss between this probability and 0 is calculated.
Further, the loss of the corrector is:
L_R(G, R) = E_y[log R(x, y)] + E_{x,z}[log(1 − R(x, y_mask))] + α·E_{x,y,δ}[(‖∇_x R(x + δ, y)‖_2 − 1)²],
where R is the corrector, y_mask is the masked fake image, α is a hyper-parameter, δ is random noise added to the semantic segmentation map x of the real image, and ∇_x denotes the gradient with respect to x. E_y[log R(x, y)] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the real image y into the corrector; the label of the real image is 1, and the loss between this probability and 1 is calculated. E_{x,z}[log(1 − R(x, y_mask))] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the masked fake image y_mask into the corrector; the label of the fake image is 0, and the loss between this probability and 0 is calculated. The last term is a regularization added to the corrector as a gradient penalty.
Further, the L1 loss is:
L_L1(G) = β·E_{x,y,z}[‖y − G(x, z)‖_1] + γ·E_{x,y,z}[‖y_r − F_DRPnet(G(x, z))‖_1],
where d_r is the discriminative region, y_r is the region on the real image corresponding to the discriminative region on the first image G(x, z), F_DRPnet(G(x, z)) is the discriminative region on the first image G(x, z), β and γ are hyper-parameters, and ‖·‖_1 is the L1 norm, i.e. the sum of the absolute values of the differences between two elements. E_{x,y,z}[‖y − G(x, z)‖_1] is the loss between the real image y and the first image G(x, z), and E_{x,y,z}[‖y_r − F_DRPnet(G(x, z))‖_1] is the loss, computed in the same way, between the region y_r on the real image and the discriminative region F_DRPnet(G(x, z)) on the first image.
The invention provides an image-to-image translation method based on a discriminative region proposal adversarial network, which has the following advantage: the method can synthesize high-quality images with high resolution, realistic detail and fewer artifacts.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without inventive exercise.
FIG. 1(a) is a schematic diagram of a real image;
FIG. 1(b) is the corresponding semantic segmentation map;
FIG. 2 is a schematic flow chart of the method according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating how training DRPAN improves the quality of the synthesized image, according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the score maps of the first image and the real image according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating why DRPAN is necessary for high-quality image-to-image translation, according to an embodiment of the present application;
FIG. 6 is a comparison of qualitative results of DRPAN with discriminative regions of different sizes against ID-CGAN, according to an embodiment of the present invention;
FIG. 7 is a comparison of qualitative results of DRPAN against Pix2Pix for translating real images to semantic labels, in an embodiment of the present application;
FIG. 8 is a comparison of translation results from aerial photographs to maps and from maps to aerial photographs for DRPAN against Pix2Pix, in an embodiment of the present application;
FIG. 9 is a comparison of abstract-image-to-real-image results for DRPAN against CRN and Pix2Pix, in an embodiment of the present application;
FIG. 10 is a comparison of edge-to-real and sketch-to-real translation results for DRPAN of the present application against other methods.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. It is noted that examples of the described embodiments are illustrated in the accompanying drawings, where like reference numerals refer to the same or similar components or components having the same or similar functions throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
It should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Examples
The embodiment of the application is the preferred embodiment of the application.
The embodiment of the present application provides a discriminative region proposal adversarial network (DRPAN) for high-quality image-to-image translation. The network comprises a generator, a patch discriminator and a corrector, where the patch discriminator adopts a PatchGAN Markovian discriminator, and the discriminative region it proposes is used to mask the real image and produce the masked fake image.
As shown in fig. 2, the method comprises the steps of:
S1: inputting a semantic segmentation map of a real image into the generator to generate a first image;
the image semantic segmentation is that a machine automatically segments and identifies the content in an image, and the semantic segmentation image is an image obtained by performing image semantic segmentation on a real image. As shown in fig. 1, fig. 1(a) is a real image, and fig. 1(b) is a semantic segmentation map corresponding to the real image.
The first image is a fake (synthesized) image.
S2: inputting the first image into the patch discriminator, which predicts a score map;
The score map is predicted by the patch discriminator as follows:
The first image input to the patch discriminator is divided into a grid of patches, for example a 10 × 10 grid as shown in fig. 4, and the patch discriminator's neural network produces a matrix of values over this grid. Each value in the matrix indicates how real or fake the corresponding patch looks: the more realistic the patch, the closer its score is to 1; the more fake, the closer to 0. A higher overall score map therefore indicates a more realistic first image, and a lower one a more fake first image.
S3: finding the image patch with the most obvious artifacts in the score map using a sliding window, and mapping that patch back into the first image to obtain the discriminative region in the first image;
S4: masking the real image with the discriminative region to obtain a masked fake image;
S5: inputting the real image and the masked fake image into the corrector, which judges whether its input is real or fake;
S6: the generator generating an image closer to the real image according to the correction from the corrector.
Fig. 3 shows how the quality of the synthesized image is improved. As DRPAN training proceeds, the discriminative regions (right) of the masked fake images keep changing, so the quality of the synthesized image (left) improves while the score maps (first and last columns) become brighter, i.e. higher scoring. Although after many training passes it becomes hard to distinguish synthesized samples from real ones, the DRPAN of this embodiment can still keep refining the generator's details through continuous corrections to achieve high-quality results.
The patch discriminator is first used to generate a meaningful score map, and its application is not limited to image synthesis. Fig. 4 shows the score maps output by a pre-trained PatchGAN patch discriminator for images of different quality (fake and real). The first column, Input, is the semantic segmentation map of the real image; the second column, Fake, is the generated first image; the third column, Score map, is the score map of that first image; the fourth column, Real, is the real image; and the fifth column, Score map, is the score map of the real image. The score map of a fake sample with obvious artifacts and shape deformation has darker regions and lower scores, whereas the score map of a real sample is brighter with higher scores. From this visualization it follows that selecting the darkest part of the score map is an excellent way of locating fake regions, which is what the discriminative region proposal does.
Based on fig. 4, in the embodiment of the present application, given an input of resolution w_i × w_i that the patch discriminator processes into a score map of size w_s × w_s, if a discriminative region of size w^* × w^* is desired, the size of the sliding window on the score map is first computed as w × w, where w = w^* × w_s / w_i.
According to the center coordinates (x_c^*, y_c^*) of the discriminative region and its side length w^*, the region that the patch discriminator considers the most fake is extracted from the synthesized first image; this is the discriminative region. The discriminative region d_r is the region mapped out by the discriminative region proposal network, namely: d_r = F_DRPnet(x_c^*, y_c^*, w^*), where (x_c^*, y_c^*) = τ·(x_c, y_c); here (x_c^*, y_c^*) are the center coordinates of the discriminative region, τ is the scale between the first image and the score map, and (x_c, y_c) are the center coordinates of the selected window on the score map.
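A minimal NumPy sketch of how such a sliding-window proposal might be implemented is given below. The function name, the use of the lowest mean window score as the "most fake" criterion, and the assumption τ = w_i / w_s are illustrative choices, not details fixed by the patent.

```python
import numpy as np

def propose_discriminative_region(score_map, w_i, w_star):
    """Slide a window over the w_s x w_s score map, pick the window with the
    lowest mean score (most fake), and map its center back to the w_i x w_i
    first image. Returns a (x0, y0, w, h) box and the window center."""
    w_s = score_map.shape[0]
    w = max(1, round(w_star * w_s / w_i))   # window size on the score map
    tau = w_i / w_s                         # scale between image and score map (assumption)

    best_score, best_center = np.inf, (0, 0)
    for yc in range(w // 2, w_s - (w - 1) // 2):
        for xc in range(w // 2, w_s - (w - 1) // 2):
            window = score_map[yc - w // 2: yc - w // 2 + w,
                               xc - w // 2: xc - w // 2 + w]
            if window.mean() < best_score:
                best_score, best_center = window.mean(), (xc, yc)

    # d_r = F_DRPnet(x_c^*, y_c^*, w^*): a w^* x w^* box centred at (x_c^*, y_c^*)
    xc_star, yc_star = tau * best_center[0], tau * best_center[1]
    x0 = int(np.clip(xc_star - w_star // 2, 0, w_i - w_star))
    y0 = int(np.clip(yc_star - w_star // 2, 0, w_i - w_star))
    return (x0, y0, w_star, w_star), best_center
```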
To realize high-quality image-to-image translation, it is not enough to optimize each local region independently; the relation between the fake discriminative region and the surrounding real area must also be emphasized, since only then can the fake region be connected to its real context and provide a correction signal for the generator. The embodiment of the present application therefore masks the corresponding real image with the fake discriminative region to generate the masked fake image, and then designs a CNN (convolutional neural network) corrector that distinguishes the real image from this masked fake image, thereby driving the generator toward synthesizing high-quality images.
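For illustration, a minimal PyTorch sketch of such a masking operation follows, assuming the mask simply pastes the discriminative region of the generated image onto a copy of the real image; the function name and tensor layout are assumptions.

```python
import torch

def mask_real_with_fake_region(real_img, fake_img, region):
    """Sketch of the masking operation M(.): copy the real image and replace the
    discriminative region with the corresponding patch of the generated image,
    producing the masked fake image y_mask. Tensors are (B, C, H, W); `region`
    is (x0, y0, w, h) in image coordinates."""
    x0, y0, w, h = region
    y_mask = real_img.clone()
    y_mask[:, :, y0:y0 + h, x0:x0 + w] = fake_img[:, :, y0:y0 + h, x0:x0 + w]
    return y_mask
```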
For the image-to-image translation task, we want not only to generate vivid samples but also to realize diversity through different conditional inputs. The discriminative region proposal adversarial network provided by the invention comprises a generator G, a patch discriminator D_p and a corrector R. The generator G is optimized through the patch discriminator D_p, the corrector R and an L1 loss, so the total objective function is:
L(G, D_p, R) = (1 − λ)·L_D(G, D_p) + λ·L_R(G, R) + L_L1(G), where L_D(G, D_p) is the loss of the patch discriminator, L_R(G, R) is the loss of the corrector, and L_L1(G) is the L1 loss.
Native generative adversarial networks (GANs) suffer from instability and mode collapse, so the embodiments of the present application improve GAN training. To train DRPAN stably with high-diversity synthesis capability, the embodiments of the present application adapt the DRAGAN-style gradient-penalty objective as the loss of the corrector R and train the patch discriminator with the original GAN objective.
The loss of the patch discriminator is:
L_D(G, D_p) = E_y[log D_p(x, y)] + E_{x,z}[log(1 − D_p(x, G(x, z)))],
where D_p is the patch discriminator, x is the semantic segmentation map of the real image, y is the real image, and G(x, z) is the first image. E_y[log D_p(x, y)] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the real image y into the patch discriminator; the label of the real image is 1, and the loss between this probability and 1 is calculated. E_{x,z}[log(1 − D_p(x, G(x, z)))] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the first image G(x, z) into the patch discriminator; the label of the fake image is 0, and the loss between this probability and 0 is calculated. The goal of the discriminator in a GAN is to push the score of a real image toward 1 and the score of a fake image produced by the generator G toward 0; thus the closer D_p(x, y) is to 1 and the closer D_p(x, G(x, z)) is to 0, the better.
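In practice the two expectation terms are commonly implemented as binary cross-entropy against labels 1 (real) and 0 (fake). The sketch below assumes the `PatchDiscriminator` shape from the earlier snippet and is only an illustration of this standard formulation.

```python
import torch
import torch.nn.functional as F

def patch_discriminator_loss(d_p, x, y, g_xz):
    """Sketch of L_D(G, D_p): binary cross-entropy of the score maps against
    all-ones labels for (x, y) and all-zeros labels for (x, G(x, z))."""
    real_scores = d_p(x, y)                 # per-patch probabilities in (0, 1)
    fake_scores = d_p(x, g_xz.detach())     # detach: do not update G in this step
    real_loss = F.binary_cross_entropy(real_scores, torch.ones_like(real_scores))
    fake_loss = F.binary_cross_entropy(fake_scores, torch.zeros_like(fake_scores))
    return real_loss + fake_loss
```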
For the corrector R, in order to distinguish between the real image and the very similar masked fake image y_mask = M(G(x, z)), where M(·) denotes the masking operation that pastes the discriminative region of G(x, z) onto the real image, the embodiment of the present application adds a regularization term as a penalty.
The loss of the corrector is:
L_R(G, R) = E_y[log R(x, y)] + E_{x,z}[log(1 − R(x, y_mask))] + α·E_{x,y,δ}[(‖∇_x R(x + δ, y)‖_2 − 1)²],
where R is the corrector, y_mask is the masked fake image, α is a hyper-parameter, δ is random noise added to the semantic segmentation map x of the real image, and ∇_x denotes the gradient with respect to x. E_y[log R(x, y)] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the real image y into the corrector; the label of the real image is 1, and the loss between this probability and 1 is calculated. E_{x,z}[log(1 − R(x, y_mask))] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the masked fake image y_mask into the corrector; the label of the fake image is 0, and the loss between this probability and 0 is calculated. The last term is a regularization added to the corrector as a gradient penalty.
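The sketch below illustrates one way such a corrector loss could be implemented: BCE terms for the real and masked fake inputs plus a DRAGAN-style gradient penalty around noise-perturbed inputs. The exact penalty form, the noise scale and the call signature `reviser(x, img)` are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def corrector_loss(reviser, x, y, y_mask, alpha=10.0):
    """Sketch of L_R(G, R): adversarial BCE terms plus a gradient penalty that
    pushes the corrector toward being 1-Lipschitz around perturbed inputs."""
    real_p = reviser(x, y)
    fake_p = reviser(x, y_mask.detach())
    gan_loss = (F.binary_cross_entropy(real_p, torch.ones_like(real_p)) +
                F.binary_cross_entropy(fake_p, torch.zeros_like(fake_p)))

    # Gradient penalty: perturb x with random noise delta and push the gradient norm toward 1
    delta = 0.5 * x.std() * torch.rand_like(x)
    x_hat = (x + delta).requires_grad_(True)
    out = reviser(x_hat, y)
    grads = torch.autograd.grad(out.sum(), x_hat, create_graph=True)[0]
    penalty = ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
    return gan_loss + alpha * penalty
```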
Previous studies have found it beneficial to combine GAN objectives with traditional losses such as the L2 and L1 distances. Since the L1 loss produces less blurring than the L2 loss, an additional L1 loss over both the entire output image and the local discriminative region is used to regularize the generator:
L_L1(G) = β·E_{x,y,z}[‖y − G(x, z)‖_1] + γ·E_{x,y,z}[‖y_r − F_DRPnet(G(x, z))‖_1],
where d_r is the discriminative region, y_r is the region on the real image corresponding to the discriminative region on the first image G(x, z), F_DRPnet(G(x, z)) is the discriminative region on the first image G(x, z), β and γ are hyper-parameters, and ‖·‖_1 is the L1 norm, i.e. the sum of the absolute values of the differences between two elements. E_{x,y,z}[‖y − G(x, z)‖_1] is the loss between the real image y and the first image G(x, z), and E_{x,y,z}[‖y_r − F_DRPnet(G(x, z))‖_1] is the loss, computed in the same way, between the region y_r on the real image and the discriminative region F_DRPnet(G(x, z)) on the first image.
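A minimal sketch of this combined global/regional L1 term follows; the default values of beta and gamma are placeholders, not the values used in the patent.

```python
import torch
import torch.nn.functional as F

def l1_loss(y, g_xz, region, beta=100.0, gamma=1.0):
    """Sketch of L_L1(G): a global L1 term over the whole image plus a weighted
    L1 term restricted to the discriminative region (x0, y0, w, h)."""
    x0, y0, w, h = region
    global_term = F.l1_loss(g_xz, y)
    region_term = F.l1_loss(g_xz[:, :, y0:y0 + h, x0:x0 + w],
                            y[:, :, y0:y0 + h, x0:x0 + w])
    return beta * global_term + gamma * region_term
```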
The architecture of the discriminative region proposal adversarial network provided by the invention is as follows. For the generator G, an architecture that has proven convincing for single-image super-resolution is used: convolutional layers for down-sampling, deconvolutional layers for up-sampling, and 9 residual blocks for task learning, with batch normalization and ReLU activation in each layer. The patch discriminator is implemented mainly as a 70 × 70 PatchGAN. The corrector is a discriminator modified from DCGAN with a global view over its whole input. At the end of both the discriminator and the corrector, a Sigmoid activation outputs the probability.
The training process of the embodiment of the present application is as follows: the DRPAN parameters are learned by minimizing L(G, D_p, R) using mini-batch SGD with the Adam optimizer. To learn the difference between masked fake images and real images, the corrector R adds a gradient penalty as a regularization scheme that forces the corrector to be 1-Lipschitz continuous in x. Experiments found that setting the hyper-parameter α to 10 works robustly on various data sets; the number of corrector steps is set to 1, and the mini-batch size to 1-4 depending on the task.
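The following highly simplified training-loop sketch ties together the sketches above. The optimizer hyper-parameters (Adam with betas (0.5, 0.999)), the fixed region size of 64, the value of λ and the exact update order are illustrative assumptions rather than the patent's settings.

```python
import torch

def train_drpan(generator, patch_d, reviser, loader, epochs=1, lam=0.5, lr=2e-4):
    """Sketch of alternating DRPAN training; `loader` yields (x = seg map, y = real image)."""
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(patch_d.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_r = torch.optim.Adam(reviser.parameters(), lr=lr, betas=(0.5, 0.999))

    for _ in range(epochs):
        for x, y in loader:
            fake = generator(x)
            score_map = patch_d(x, fake)
            region, _ = propose_discriminative_region(
                score_map[0, 0].detach().cpu().numpy(), w_i=x.shape[-1], w_star=64)
            y_mask = mask_real_with_fake_region(y, fake, region)

            # 1) update the patch discriminator and the corrector
            opt_d.zero_grad()
            patch_discriminator_loss(patch_d, x, y, fake).backward()
            opt_d.step()
            opt_r.zero_grad()
            corrector_loss(reviser, x, y, y_mask).backward()
            opt_r.step()

            # 2) update the generator against both critics plus the L1 term
            opt_g.zero_grad()
            g_adv_d = -torch.log(patch_d(x, fake) + 1e-8).mean()
            g_adv_r = -torch.log(reviser(x, mask_real_with_fake_region(y, fake, region)) + 1e-8).mean()
            g_loss = (1 - lam) * g_adv_d + lam * g_adv_r + l1_loss(y, fake, region)
            g_loss.backward()
            opt_g.step()
```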
In order to evaluate the performance of the method provided by the invention in image-to-image translation, the embodiment deploys translation task experiments on different levels, and compares the method provided by the invention with the prior art. For different tasks, different assessment metrics were used, including human perception studies and automated quantitative measurements.
1) Evaluation index
Image quality evaluation. PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity Index Measure) and VIF (Visual Information Fidelity) are among the most popular evaluation criteria for low-level computer vision tasks such as deblurring, dehazing and image restoration. Therefore, for the de-raining and satellite-map-to-map tasks, this embodiment uses PSNR, SSIM, VIF and RECO (Relative Edge Coherence) to verify the performance of the results.
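The PSNR and SSIM measurements can be reproduced with off-the-shelf tools; below is a small Python sketch using scikit-image (the library choice and the uint8 data range are assumptions; VIF and RECO have no scikit-image implementation and are omitted here).

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_quality(reference, restored):
    """Sketch of the PSNR / SSIM part of the evaluation for uint8 RGB arrays.
    Requires scikit-image >= 0.19 for the channel_axis argument."""
    psnr = peak_signal_noise_ratio(reference, restored, data_range=255)
    ssim = structural_similarity(reference, restored, channel_axis=-1, data_range=255)
    return psnr, ssim
```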
Image segmentation evaluation metrics. In this embodiment, the real-image-to-semantic-label task on the Cityscapes dataset is evaluated using the standard metrics of the Cityscapes benchmark, including per-pixel accuracy, per-class accuracy, and class IOU (Intersection over Union).
Amazon Mechanical Turk (AMT). AMT is used as a gold-standard metric for assessing the realism of a synthesized image in many tasks; this embodiment uses it as the evaluation metric for the semantic-label-to-real-image and map-to-satellite-map translation tasks.
FCN-8s score. As an automatic quantitative measurement using an off-the-shelf classifier: if the generated images are realistic, a classifier trained on real images will also classify the synthetic images correctly. This embodiment uses the FCN-8s score to evaluate the semantic-label-to-real-image task on the Cityscapes dataset.
2) Discriminative region proposal adversarial network (DRPAN)
To investigate how DRPAN corrects the synthesized image under different losses between the proposed (discriminative) region and the corresponding real region, this example designed an ablation experiment. Starting from a pre-trained PatchGAN, training continues along three branches: continuing to train with PatchGAN alone; continuing to train with PatchGAN plus an L1 loss between the discriminative region and the real region; and continuing to train with PatchGAN plus the corrector.
The patch discriminator (PatchD for short) effectively finds the most fake or most real regions in an image (see fig. 4), but it has difficulty improving the details of these regions because PatchD struggles to capture high-dimensional distributions. The invention therefore proposes the discriminative region proposal network DRPnet (built on top of PatchD) and designs a corrector to gradually eliminate visual artifacts, turning the task into a lower-dimensional estimation problem. This can be viewed as a "top-down" process, unlike other gradual "bottom-up" image generation methods. Fig. 5 illustrates why the DRPAN proposed by this embodiment is necessary for high-quality image-to-image translation: continuing to train PatchD alone, even with an L1 loss, does little to reduce artifacts. DRPAN with the L1 loss can smooth the artifacts but does not sharpen details, while DRPAN with the corrector exceeds the performance of PatchD and produces fewer visual artifacts. The corrector and the L1 loss combine to reduce artifacts that PatchD ignores. This example also finds that the fake-mask operation improves the coherence of the whole image in some samples (e.g., the connection between a door and a wall). Therefore, DRPAN with the fake mask is used in the following experiments.
3) Low level translation
First, the model of the present invention is applied to two low-level translation tasks that only involve changes of appearance, e.g., de-raining, without any need to alter the content and texture of the input samples. Therefore, λ in the total objective function is set to 1, and image synthesis is driven by the corrector alone.
Single-image de-raining. Fig. 6 shows qualitative results of the DRPAN of the present invention with discriminative regions of different sizes compared with ID-CGAN (Image De-raining Conditional Generative Adversarial Network). DRPAN performs better than ID-CGAN: it is not only more effective at removing rain but also produces more vivid colors and clearer details. Table 1 shows the corresponding quantitative results evaluated by the PSNR, SSIM, VIF and RECO indices, with the best results (in bold) achieved by the DRPAN of the present invention.
4) Real to abstract translation
The DRPAN proposed by the present invention is applied to two real-to-abstract translation tasks, which require many-to-one abstraction capability.
Real image to semantic labels. For the real-to-semantic-label task, this embodiment tested the DRPAN model on the two most commonly used data sets: Cityscapes and Facades. Fig. 7 shows the qualitative results of the DRPAN of the present invention for translating real images to semantic labels, compared with Pix2Pix on the Cityscapes dataset; DRPAN synthesizes results closer to the ground truth than Pix2Pix, and the quantitative results in table 2 support this conclusion in terms of per-pixel accuracy, per-class accuracy and class IOU.
Satellite map to map. The present embodiment also applies DRPAN to the aerial-photograph-to-map task, with experiments carried out on paired images at 512 × 512 resolution. The top row of fig. 8 shows qualitative results of DRPAN compared with Pix2Pix, indicating that DRPAN correctly translates the highway in the aerial photograph into the orange line on the map, whereas Pix2Pix does not.
Map to satellite map. Conversely to the satellite-map-to-map task, the present embodiment also tested DRPAN on the map-to-satellite-map task; the qualitative results in the bottom row of fig. 8 clearly show that DRPAN synthesizes aerial photographs of higher quality than Pix2Pix.
5) Abstract to real translation
In addition, this embodiment demonstrates several abstract-to-real tasks of the DRPAN proposed by the present invention, which require one-to-many translation: semantic labels to photos, maps to satellite maps, edges to photos, and sketches to photos.
Semantic labels to photo. For the semantic-labels-to-photo task, the translation model aims to synthesize a real-world image from semantic labels. CGAN (conditional generative adversarial network) based approaches fail to capture real-world details and suffer from distortion and blurring, while CNN (convolutional neural network) based methods (e.g., CRN) can synthesize high-resolution but smooth, unrealistic results. Figure 9 shows a qualitative comparison, from which it can be seen that the DRPAN of the present invention synthesizes the most realistic, high-quality results (clearer, less distorted, high resolution) compared with Pix2Pix and CRN.
Evaluating GANs remains a challenging problem; many efforts use off-the-shelf classifiers as an automatic measure of the synthesized images. Table 3 shows the segmentation performance evaluated with the FCN-8s model: the DRPAN of the present invention exceeds Pix2Pix by 10% in per-pixel accuracy and also achieves the highest per-class accuracy and class IOU.
Human perception verification. This embodiment evaluates the abstract-to-real tasks of semantic labels to photos and maps to satellite maps through AMT. For the realism study, the embodiment follows a perceptual study protocol, collecting data for each algorithm from 30 participants; each participant saw a sample for 1000 milliseconds. The embodiment also compares the realism of images synthesized by different algorithms. Table 4 shows that images synthesized by DRPAN are judged real more often than the prior art (DRPAN 18.2% > CRN 9.4% > Pix2Pix 3.3%), and in direct comparison the images synthesized by DRPAN are judged more realistic than those of Pix2Pix and CRN 95.2% and 75.7% of the time, respectively. Table 5 shows the comparison on the map-to-satellite-map task: the DRPAN of the present invention fooled participants at rates exceeding Pix2Pix by 18.7% and CycleGAN by 26.8%, respectively.
Edge-to-photo and sketch-to-photo. For the edge-to-photo and sketch-to-photo tasks, previous work often encountered two problems: first, when the input edges are sparse, artifacts and artificial color distributions are easily produced in those regions; second, it is difficult to handle unusual inputs such as sketches. This example tests the DRPAN model of the present invention on the UT Zappos50K dataset and a handbag dataset. Figure 10 shows that the model of the present invention handles both of these problems well.
The above embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, it should be understood that the above embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (7)
1. An image-to-image translation method based on a discriminative region proposal adversarial network, characterized in that the network comprises a generator, a patch discriminator and a corrector, and the method comprises the following steps:
S1: inputting a semantic segmentation map of a real image into the generator to generate a first image;
S2: inputting the first image into the patch discriminator, which predicts a score map;
S3: finding the image patch with the most obvious artifacts in the score map using a sliding window, and mapping that patch back into the first image to obtain the discriminative region in the first image;
S4: masking the real image with the discriminative region to obtain a masked fake image;
S5: inputting the real image and the masked fake image into the corrector, which judges whether its input is real or fake;
S6: the generator generating an image closer to the real image according to the correction from the corrector.
2. The method of claim 1, wherein, given an input of resolution w_i × w_i that the patch discriminator processes into a score map of size w_s × w_s, if a discriminative region of size w^* × w^* is desired, the size of the sliding window on the score map is w × w, where w = w^* × w_s / w_i.
3. The image-to-image translation method based on the discriminative region proposal adversarial network as claimed in claim 2, wherein the discriminative region d_r is the region mapped out by the discriminative region proposal network, namely: d_r = F_DRPnet(x_c^*, y_c^*, w^*), where (x_c^*, y_c^*) = τ·(x_c, y_c); here (x_c^*, y_c^*) are the center coordinates of the discriminative region, τ is the scale between the first image and the score map, and (x_c, y_c) are the center coordinates of the selected window on the score map.
4. The image-to-image translation method based on the discriminative region proposal adversarial network as claimed in claim 1, wherein the total objective function of the network is:
L(G, D_p, R) = (1 − λ)·L_D(G, D_p) + λ·L_R(G, R) + L_L1(G), where L_D(G, D_p) is the loss of the patch discriminator, L_R(G, R) is the loss of the corrector, and L_L1(G) is the L1 loss.
5. The image-to-image translation method based on the discriminative region proposal adversarial network of claim 4, wherein the loss of the patch discriminator is:
L_D(G, D_p) = E_y[log D_p(x, y)] + E_{x,z}[log(1 − D_p(x, G(x, z)))],
where D_p is the patch discriminator, x is the semantic segmentation map of the real image, y is the real image, and G(x, z) is the first image. E_y[log D_p(x, y)] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the real image y into the patch discriminator; the label of the real image is 1, and the loss between this probability and 1 is calculated. E_{x,z}[log(1 − D_p(x, G(x, z)))] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the first image G(x, z) into the patch discriminator; the label of the fake image is 0, and the loss between this probability and 0 is calculated.
6. The image-to-image translation method based on the discriminative region proposal adversarial network of claim 4, wherein the loss of the corrector is:
L_R(G, R) = E_y[log R(x, y)] + E_{x,z}[log(1 − R(x, y_mask))] + α·E_{x,y,δ}[(‖∇_x R(x + δ, y)‖_2 − 1)²],
where R is the corrector, y_mask is the masked fake image, α is a hyper-parameter, δ is random noise added to the semantic segmentation map x of the real image, and ∇_x denotes the gradient with respect to x. E_y[log R(x, y)] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the real image y into the corrector; the label of the real image is 1, and the loss between this probability and 1 is calculated. E_{x,z}[log(1 − R(x, y_mask))] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the masked fake image y_mask into the corrector; the label of the fake image is 0, and the loss between this probability and 0 is calculated. The last term is a regularization added to the corrector as a gradient penalty.
7. The image-to-image translation method based on the discriminative region proposal adversarial network of claim 4, wherein the L1 loss is:
L_L1(G) = β·E_{x,y,z}[‖y − G(x, z)‖_1] + γ·E_{x,y,z}[‖y_r − F_DRPnet(G(x, z))‖_1],
where d_r is the discriminative region, y_r is the region on the real image corresponding to the discriminative region on the first image G(x, z), F_DRPnet(G(x, z)) is the discriminative region on the first image G(x, z), β and γ are hyper-parameters, and ‖·‖_1 is the L1 norm, i.e. the sum of the absolute values of the differences between two elements. E_{x,y,z}[‖y − G(x, z)‖_1] is the loss between the real image y and the first image G(x, z), and E_{x,y,z}[‖y_r − F_DRPnet(G(x, z))‖_1] is the loss, computed in the same way, between the region y_r on the real image and the discriminative region F_DRPnet(G(x, z)) on the first image.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810820240.4A | 2018-07-24 | 2018-07-24 | Image-to-image translation method based on a discriminative region proposal adversarial network |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810820240.4A | 2018-07-24 | 2018-07-24 | Image-to-image translation method based on a discriminative region proposal adversarial network |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN109166102A | 2019-01-08 |
Family
ID=64898347
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810820240.4A (Pending) | Image-to-image translation method based on a discriminative region proposal adversarial network | 2018-07-24 | 2018-07-24 |

Country Status (1)

| Country | Link |
|---|---|
| CN (1) | CN109166102A (en) |
Application events
- 2018-07-24: Application CN201810820240.4A filed in China (CN); published as CN109166102A; status: Pending
Patent Citations (5)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN103699527A (en) * | 2013-12-20 | 2014-04-02 | Image translation system and method |
| US20150199792A1 (en) * | 2014-01-13 | 2015-07-16 | System and method for obtaining super image resolution through optical image translation |
| CN106951919A (en) * | 2017-03-02 | 2017-07-14 | Flow monitoring implementation method based on an adversarial generation network |
| CN107123151A (en) * | 2017-04-28 | 2017-09-01 | Image transformation method based on a variational autoencoder and a generative adversarial network |
| CN108009628A (en) * | 2017-10-30 | 2018-05-08 | Anomaly detection method based on a generative adversarial network |
Non-Patent Citations (1)

| Title |
|---|
| Chao Wang et al., "Discriminative Region Proposal Adversarial Networks for High-Quality Image-to-Image Translation", arXiv:1711.09554v2 |
Cited By (21)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN109889849A (en) * | 2019-01-30 | 2019-06-14 | Video generation method, device, medium and equipment |
| CN110163813A (en) * | 2019-04-16 | 2019-08-23 | Image rain removing method and device, readable storage medium and terminal device |
| CN110163813B (en) * | 2019-04-16 | 2022-02-01 | Image rain removing method and device, readable storage medium and terminal equipment |
| CN110210422B (en) * | 2019-06-05 | 2021-04-27 | Ship ISAR image identification method based on optical image assistance |
| CN110210422A (en) * | 2019-06-05 | 2019-09-06 | Ship ISAR image identification method based on optical image assistance |
| US11151703B2 (en) | 2019-09-12 | 2021-10-19 | Artifact removal in medical imaging |
| CN110705328A (en) * | 2019-09-27 | 2020-01-17 | Method for acquiring power data based on two-dimensional code image |
| CN110868598A (en) * | 2019-10-17 | 2020-03-06 | Video content replacement method and system based on countermeasure generation network |
| CN110868598B (en) * | 2019-10-17 | 2021-06-22 | Video content replacement method and system based on countermeasure generation network |
| CN111340716A (en) * | 2019-11-20 | 2020-06-26 | Image deblurring method for improving dual-discrimination countermeasure network model |
| CN113139893A (en) * | 2020-01-20 | 2021-07-20 | Image translation model construction method and device and image translation method and device |
| CN113139893B (en) * | 2020-01-20 | 2023-10-03 | Image translation model construction method and device and image translation method and device |
| CN111539439A (en) * | 2020-04-30 | 2020-08-14 | Image semantic segmentation method |
| CN112381839A (en) * | 2020-11-14 | 2021-02-19 | Breast cancer pathological image HE cancer nest segmentation method based on deep learning |
| CN112330542A (en) * | 2020-11-18 | 2021-02-05 | Image reconstruction system and method based on CRCSAN network |
| CN112330542B (en) * | 2020-11-18 | 2022-05-03 | Image reconstruction system and method based on CRCSAN network |
| CN112330569A (en) * | 2020-11-27 | 2021-02-05 | Model training method, text denoising method, device, equipment and storage medium |
| CN113012071A (en) * | 2021-03-30 | 2021-06-22 | Image out-of-focus deblurring method based on depth perception network |
| CN113012071B (en) * | 2021-03-30 | 2023-01-06 | Image out-of-focus deblurring method based on depth perception network |
| CN113962885A (en) * | 2021-10-14 | 2022-01-21 | Image highlight processing method based on improved CycleGAN |
| CN113962885B (en) * | 2021-10-14 | 2024-05-28 | Image highlight processing method based on improved CycleGAN |
Legal Events

| Code | Title | Description |
|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190108 |