CN109166102A - Image-to-image translation method based on a discriminative region proposal adversarial network - Google Patents
Image-to-image translation method based on a discriminative region proposal adversarial network
- Publication number: CN109166102A
- Application number: CN201810820240.4A
- Authority: CN (China)
- Prior art keywords: image, real, loss, probability, area
- Prior art date: 2018-07-24
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications (all under G—PHYSICS; G06—COMPUTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL)
- G06T 7/0002 — Image analysis; inspection of images, e.g. flaw detection
- G06T 3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T 7/11 — Segmentation; edge detection: region-based segmentation
- G06T 2207/20081 — Indexing scheme: training; learning
- G06T 2207/20084 — Indexing scheme: artificial neural networks [ANN]
- G06T 2207/30168 — Indexing scheme: image quality inspection
- G06T 2207/30181 — Indexing scheme: Earth observation
Abstract
The present invention provides an image-to-image translation method based on a discriminative region proposal adversarial network. A semantic segmentation map of a real image is input to the generator to produce a first image; the first image is fed to the patch discriminator, which predicts a score map; a sliding window locates the image patch with the most obvious artifacts in the score map, and this patch is mapped back into the first image to obtain the discriminative region; the discriminative region is used to mask the real image, yielding a masked fake image; the real image and the masked fake image are input to the corrector, which judges whether its input is real or fake; and the generator, guided by the corrector's feedback, produces images ever closer to the real image. The invention can synthesize high-quality images with high resolution, realistic detail and fewer artifacts.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to an image-to-image translation method based on a discriminative region proposal adversarial network.
Background
From the perspective of human visual perception, we usually judge a synthesized image to be fake because it contains local artifacts. Even if it looks realistic at first glance, we can still tell real from fake after gazing at it for only about 1000 ms. Humans perceive a real scene from coarse structure to fine detail: we typically grasp the global layout of a scene while focusing on the details of an object and understanding how it relates to its surroundings.
Many efforts have been made to develop automated image translation systems. The straightforward approach is to optimize an L1 or L2 loss in pixel space, but both suffer from blurring. Some work therefore adds an adversarial loss to produce images that are sharp in both the spatial and spectral dimensions. Beyond the GAN loss, perceptual losses have been applied to image-to-image translation tasks, but these depend on pre-trained deep models and their training data sets. Despite the variety of losses for measuring the difference between a real image and a generated image, image-to-image translation with GANs still suffers from artifacts and uneven color distribution, and struggles to generate realistic high-resolution pictures because of the high-dimensional distributions involved.
Disclosure of Invention
The invention provides an image-to-image translation method based on a discriminative region proposal adversarial network, aiming to solve technical problems of image-to-image translation in the prior art, such as artifacts, unbalanced color distribution and low resolution of the translated images.
An image-to-image translation method based on a discriminative region proposal adversarial network, where the network comprises a generator, a patch discriminator and a corrector, the method comprising the following steps:
S1: inputting a semantic segmentation map of a real image into the generator to generate a first image;
S2: inputting the first image into the patch discriminator, which predicts a score map;
S3: finding the image patch with the most obvious artifacts in the score map using a sliding window, and mapping that patch back into the first image to obtain the discriminative region in the first image;
S4: masking the real image with the discriminative region to obtain a masked fake image;
S5: inputting the real image and the masked fake image into the corrector, which judges whether its input is real or fake;
S6: the generator generating an image closer to the real image according to the correction from the corrector.
Further, given an input of resolution w_i × w_i that the patch discriminator processes into a score map of size w_s × w_s, if a discriminative region of size w^* × w^* is desired, the size of the sliding window on the score map is w × w, where w = w^* × w_s / w_i.
Further, the discriminative region d_r is the region mapped out by the discriminative region proposal network, namely: d_r = F_DRPnet(x_c^*, y_c^*, w^*), where (x_c^*, y_c^*) = τ·(x_c, y_c); here (x_c^*, y_c^*) are the center coordinates of the discriminative region, τ is the scale between the first image and the score map, and (x_c, y_c) are the center coordinates of the selected window on the score map.
Further, the total objective function of the discriminative region proposal adversarial network is:
L(G, D_p, R) = (1 − λ)·L_D(G, D_p) + λ·L_R(G, R) + L_L1(G), where L_D(G, D_p) is the loss of the patch discriminator, L_R(G, R) is the loss of the corrector, and L_L1(G) is the L1 loss.
Further, the loss of the patch discriminator is:
L_D(G, D_p) = E_y[log D_p(x, y)] + E_{x,z}[log(1 − D_p(x, G(x, z)))],
where D_p is the patch discriminator, x is the semantic segmentation map of the real image, y is the real image, and G(x, z) is the first image. E_y[log D_p(x, y)] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the real image y into the patch discriminator; the label of the real image is 1, and the loss between this probability and 1 is calculated. E_{x,z}[log(1 − D_p(x, G(x, z)))] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the first image G(x, z) into the patch discriminator; the label of the fake image is 0, and the loss between this probability and 0 is calculated.
Further, the loss of the corrector is:
L_R(G, R) = E_y[log R(x, y)] + E_{x,z}[log(1 − R(x, y_mask))] + α·E_{x,y,δ}[(‖∇_x R(x + δ, y)‖_2 − 1)²],
where R is the corrector, y_mask is the masked fake image, α is a hyper-parameter, δ is random noise added to the semantic segmentation map x of the real image, and ∇_x denotes the gradient with respect to x. E_y[log R(x, y)] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the real image y into the corrector; the label of the real image is 1, and the loss between this probability and 1 is calculated. E_{x,z}[log(1 − R(x, y_mask))] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the masked fake image y_mask into the corrector; the label of the fake image is 0, and the loss between this probability and 0 is calculated. The last term is a regularization added to the corrector as a gradient penalty.
Further, the L1 loss is:
L_L1(G) = β·E_{x,y,z}[‖y − G(x, z)‖_1] + γ·E_{x,y,z}[‖y_r − F_DRPnet(G(x, z))‖_1],
where d_r is the discriminative region, y_r is the region on the real image corresponding to the discriminative region on the first image G(x, z), F_DRPnet(G(x, z)) is the discriminative region on the first image G(x, z), β and γ are hyper-parameters, and ‖·‖_1 is the L1 norm, i.e. the sum of the absolute values of the differences between two elements. E_{x,y,z}[‖y − G(x, z)‖_1] is the loss between the real image y and the first image G(x, z), and E_{x,y,z}[‖y_r − F_DRPnet(G(x, z))‖_1] is the loss, computed in the same way, between the region y_r on the real image and the discriminative region F_DRPnet(G(x, z)) on the first image.
The invention provides an image-to-image translation method based on a discriminative region proposal adversarial network, which has the following advantage: the method can synthesize high-quality images with high resolution, realistic detail and fewer artifacts.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without inventive exercise.
FIG. 1(a) is a schematic diagram of a real image;
FIG. 1(b) is the corresponding semantic segmentation map;
FIG. 2 is a schematic flow chart of the method according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating how training DRPAN improves the quality of the synthesized image, according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the score maps of the first image and the real image according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating why DRPAN is necessary for high-quality image-to-image translation, according to an embodiment of the present application;
FIG. 6 is a comparison of qualitative results of DRPAN with discriminative regions of different sizes against ID-CGAN, according to an embodiment of the present invention;
FIG. 7 is a comparison of qualitative results of DRPAN against Pix2Pix for translating real images to semantic labels, in an embodiment of the present application;
FIG. 8 is a comparison of translation results from aerial photographs to maps and from maps to aerial photographs for DRPAN against Pix2Pix, in an embodiment of the present application;
FIG. 9 is a comparison of abstract-image-to-real-image results for DRPAN against CRN and Pix2Pix, in an embodiment of the present application;
FIG. 10 is a comparison of edge-to-real and sketch-to-real translation results for DRPAN of the present application against other methods.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. It is noted that examples of the described embodiments are illustrated in the accompanying drawings, where like reference numerals refer to the same or similar components or components having the same or similar functions throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
It should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Examples
The embodiment of the application is the preferred embodiment of the application.
The embodiment of the present application provides a discriminative region proposal adversarial network (DRPAN) for high-quality image-to-image translation. The network comprises a generator, a patch discriminator and a corrector, where the patch discriminator adopts a PatchGAN Markovian discriminator, and the discriminative region it proposes is used to mask the real image and produce the masked fake image.
As shown in fig. 2, the method comprises the steps of:
S1: inputting a semantic segmentation map of a real image into the generator to generate a first image;
the image semantic segmentation is that a machine automatically segments and identifies the content in an image, and the semantic segmentation image is an image obtained by performing image semantic segmentation on a real image. As shown in fig. 1, fig. 1(a) is a real image, and fig. 1(b) is a semantic segmentation map corresponding to the real image.
The first image is a fake (synthesized) image.
S2: inputting the first image into the patch discriminator, which predicts a score map;
The score map is predicted by the patch discriminator as follows:
The first image input to the patch discriminator is divided into a grid of patches, for example a 10 × 10 grid as shown in fig. 4, and the patch discriminator's neural network produces a matrix of values over this grid. Each value in the matrix indicates how real or fake the corresponding patch looks: the more realistic the patch, the closer its score is to 1; the more fake, the closer to 0. A higher overall score map therefore indicates a more realistic first image, and a lower one a more fake first image.
S3: finding the image patch with the most obvious artifacts in the score map using a sliding window, and mapping that patch back into the first image to obtain the discriminative region in the first image;
S4: masking the real image with the discriminative region to obtain a masked fake image;
S5: inputting the real image and the masked fake image into the corrector, which judges whether its input is real or fake;
S6: the generator generating an image closer to the real image according to the correction from the corrector.
Fig. 3 shows how the quality of the synthesized image is improved. As DRPAN training proceeds, the discriminative regions (right) of the masked fake images keep changing, so the quality of the synthesized image (left) improves while the score maps (first and last columns) become brighter, i.e. higher scoring. Although after many training passes it becomes hard to distinguish synthesized samples from real ones, the DRPAN of this embodiment can still keep refining the generator's details through continuous corrections to achieve high-quality results.
The patch discriminator is first used to generate a meaningful score map, and its application is not limited to image synthesis. Fig. 4 shows the score maps output by a pre-trained PatchGAN patch discriminator for images of different quality (fake and real). The first column, Input, is the semantic segmentation map of the real image; the second column, Fake, is the generated first image; the third column, Score map, is the score map of that first image; the fourth column, Real, is the real image; and the fifth column, Score map, is the score map of the real image. The score map of a fake sample with obvious artifacts and shape deformation has darker regions and lower scores, whereas the score map of a real sample is brighter with higher scores. From this visualization it follows that selecting the darkest part of the score map is an excellent way of locating fake regions, which is what the discriminative region proposal does.
Based on fig. 4, in the embodiment of the present application, given an input of resolution w_i × w_i that the patch discriminator processes into a score map of size w_s × w_s, if a discriminative region of size w^* × w^* is desired, the size of the sliding window on the score map is first computed as w × w, where w = w^* × w_s / w_i.
According to the center coordinates (x_c^*, y_c^*) of the discriminative region and its side length w^*, the region that the patch discriminator considers the most fake is extracted from the synthesized first image; this is the discriminative region. The discriminative region d_r is the region mapped out by the discriminative region proposal network, namely: d_r = F_DRPnet(x_c^*, y_c^*, w^*), where (x_c^*, y_c^*) = τ·(x_c, y_c); here (x_c^*, y_c^*) are the center coordinates of the discriminative region, τ is the scale between the first image and the score map, and (x_c, y_c) are the center coordinates of the selected window on the score map.
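A minimal NumPy sketch of how such a sliding-window proposal might be implemented is given below. The function name, the use of the lowest mean window score as the "most fake" criterion, and the assumption τ = w_i / w_s are illustrative choices, not details fixed by the patent.

```python
import numpy as np

def propose_discriminative_region(score_map, w_i, w_star):
    """Slide a window over the w_s x w_s score map, pick the window with the
    lowest mean score (most fake), and map its center back to the w_i x w_i
    first image. Returns a (x0, y0, w, h) box and the window center."""
    w_s = score_map.shape[0]
    w = max(1, round(w_star * w_s / w_i))   # window size on the score map
    tau = w_i / w_s                         # scale between image and score map (assumption)

    best_score, best_center = np.inf, (0, 0)
    for yc in range(w // 2, w_s - (w - 1) // 2):
        for xc in range(w // 2, w_s - (w - 1) // 2):
            window = score_map[yc - w // 2: yc - w // 2 + w,
                               xc - w // 2: xc - w // 2 + w]
            if window.mean() < best_score:
                best_score, best_center = window.mean(), (xc, yc)

    # d_r = F_DRPnet(x_c^*, y_c^*, w^*): a w^* x w^* box centred at (x_c^*, y_c^*)
    xc_star, yc_star = tau * best_center[0], tau * best_center[1]
    x0 = int(np.clip(xc_star - w_star // 2, 0, w_i - w_star))
    y0 = int(np.clip(yc_star - w_star // 2, 0, w_i - w_star))
    return (x0, y0, w_star, w_star), best_center
```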
To realize high-quality image-to-image translation, it is not enough to optimize each local region independently; the relation between the fake discriminative region and the surrounding real area must also be emphasized, since only then can the fake region be connected to its real context and provide a correction signal for the generator. The embodiment of the present application therefore masks the corresponding real image with the fake discriminative region to generate the masked fake image, and then designs a CNN (convolutional neural network) corrector that distinguishes the real image from this masked fake image, thereby driving the generator toward synthesizing high-quality images.
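For illustration, a minimal PyTorch sketch of such a masking operation follows, assuming the mask simply pastes the discriminative region of the generated image onto a copy of the real image; the function name and tensor layout are assumptions.

```python
import torch

def mask_real_with_fake_region(real_img, fake_img, region):
    """Sketch of the masking operation M(.): copy the real image and replace the
    discriminative region with the corresponding patch of the generated image,
    producing the masked fake image y_mask. Tensors are (B, C, H, W); `region`
    is (x0, y0, w, h) in image coordinates."""
    x0, y0, w, h = region
    y_mask = real_img.clone()
    y_mask[:, :, y0:y0 + h, x0:x0 + w] = fake_img[:, :, y0:y0 + h, x0:x0 + w]
    return y_mask
```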
For the image-to-image translation task, we want not only to generate vivid samples but also to realize diversity through different conditional inputs. The discriminative region proposal adversarial network provided by the invention comprises a generator G, a patch discriminator D_p and a corrector R. The generator G is optimized through the patch discriminator D_p, the corrector R and an L1 loss, so the total objective function is:
L(G, D_p, R) = (1 − λ)·L_D(G, D_p) + λ·L_R(G, R) + L_L1(G), where L_D(G, D_p) is the loss of the patch discriminator, L_R(G, R) is the loss of the corrector, and L_L1(G) is the L1 loss.
Native generative adversarial networks (GANs) suffer from instability and mode collapse, so the embodiments of the present application improve GAN training. To train DRPAN stably with high-diversity synthesis capability, the embodiments of the present application adapt the DRAGAN-style gradient-penalty objective as the loss of the corrector R and train the patch discriminator with the original GAN objective.
The loss of the patch discriminator is:
L_D(G, D_p) = E_y[log D_p(x, y)] + E_{x,z}[log(1 − D_p(x, G(x, z)))],
where D_p is the patch discriminator, x is the semantic segmentation map of the real image, y is the real image, and G(x, z) is the first image. E_y[log D_p(x, y)] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the real image y into the patch discriminator; the label of the real image is 1, and the loss between this probability and 1 is calculated. E_{x,z}[log(1 − D_p(x, G(x, z)))] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the first image G(x, z) into the patch discriminator; the label of the fake image is 0, and the loss between this probability and 0 is calculated. The goal of the discriminator in a GAN is to push the score of a real image toward 1 and the score of a fake image produced by the generator G toward 0; thus the closer D_p(x, y) is to 1 and the closer D_p(x, G(x, z)) is to 0, the better.
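In practice the two expectation terms are commonly implemented as binary cross-entropy against labels 1 (real) and 0 (fake). The sketch below assumes the `PatchDiscriminator` shape from the earlier snippet and is only an illustration of this standard formulation.

```python
import torch
import torch.nn.functional as F

def patch_discriminator_loss(d_p, x, y, g_xz):
    """Sketch of L_D(G, D_p): binary cross-entropy of the score maps against
    all-ones labels for (x, y) and all-zeros labels for (x, G(x, z))."""
    real_scores = d_p(x, y)                 # per-patch probabilities in (0, 1)
    fake_scores = d_p(x, g_xz.detach())     # detach: do not update G in this step
    real_loss = F.binary_cross_entropy(real_scores, torch.ones_like(real_scores))
    fake_loss = F.binary_cross_entropy(fake_scores, torch.zeros_like(fake_scores))
    return real_loss + fake_loss
```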
For the corrector R, in order to distinguish between the real image and the very similar masked fake image y_mask = M(G(x, z)), where M(·) denotes the masking operation that pastes the discriminative region of G(x, z) onto the real image, the embodiment of the present application adds a regularization term as a penalty.
The loss of the corrector is:
L_R(G, R) = E_y[log R(x, y)] + E_{x,z}[log(1 − R(x, y_mask))] + α·E_{x,y,δ}[(‖∇_x R(x + δ, y)‖_2 − 1)²],
where R is the corrector, y_mask is the masked fake image, α is a hyper-parameter, δ is random noise added to the semantic segmentation map x of the real image, and ∇_x denotes the gradient with respect to x. E_y[log R(x, y)] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the real image y into the corrector; the label of the real image is 1, and the loss between this probability and 1 is calculated. E_{x,z}[log(1 − R(x, y_mask))] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the masked fake image y_mask into the corrector; the label of the fake image is 0, and the loss between this probability and 0 is calculated. The last term is a regularization added to the corrector as a gradient penalty.
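The sketch below illustrates one way such a corrector loss could be implemented: BCE terms for the real and masked fake inputs plus a DRAGAN-style gradient penalty around noise-perturbed inputs. The exact penalty form, the noise scale and the call signature `reviser(x, img)` are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def corrector_loss(reviser, x, y, y_mask, alpha=10.0):
    """Sketch of L_R(G, R): adversarial BCE terms plus a gradient penalty that
    pushes the corrector toward being 1-Lipschitz around perturbed inputs."""
    real_p = reviser(x, y)
    fake_p = reviser(x, y_mask.detach())
    gan_loss = (F.binary_cross_entropy(real_p, torch.ones_like(real_p)) +
                F.binary_cross_entropy(fake_p, torch.zeros_like(fake_p)))

    # Gradient penalty: perturb x with random noise delta and push the gradient norm toward 1
    delta = 0.5 * x.std() * torch.rand_like(x)
    x_hat = (x + delta).requires_grad_(True)
    out = reviser(x_hat, y)
    grads = torch.autograd.grad(out.sum(), x_hat, create_graph=True)[0]
    penalty = ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
    return gan_loss + alpha * penalty
```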
Previous studies have found it beneficial to combine GAN objectives with traditional losses such as the L2 and L1 distances. Since the L1 loss produces less blurring than the L2 loss, an additional L1 loss over both the entire output image and the local discriminative region is used to regularize the generator:
L_L1(G) = β·E_{x,y,z}[‖y − G(x, z)‖_1] + γ·E_{x,y,z}[‖y_r − F_DRPnet(G(x, z))‖_1],
where d_r is the discriminative region, y_r is the region on the real image corresponding to the discriminative region on the first image G(x, z), F_DRPnet(G(x, z)) is the discriminative region on the first image G(x, z), β and γ are hyper-parameters, and ‖·‖_1 is the L1 norm, i.e. the sum of the absolute values of the differences between two elements. E_{x,y,z}[‖y − G(x, z)‖_1] is the loss between the real image y and the first image G(x, z), and E_{x,y,z}[‖y_r − F_DRPnet(G(x, z))‖_1] is the loss, computed in the same way, between the region y_r on the real image and the discriminative region F_DRPnet(G(x, z)) on the first image.
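A minimal sketch of this combined global/regional L1 term follows; the default values of beta and gamma are placeholders, not the values used in the patent.

```python
import torch
import torch.nn.functional as F

def l1_loss(y, g_xz, region, beta=100.0, gamma=1.0):
    """Sketch of L_L1(G): a global L1 term over the whole image plus a weighted
    L1 term restricted to the discriminative region (x0, y0, w, h)."""
    x0, y0, w, h = region
    global_term = F.l1_loss(g_xz, y)
    region_term = F.l1_loss(g_xz[:, :, y0:y0 + h, x0:x0 + w],
                            y[:, :, y0:y0 + h, x0:x0 + w])
    return beta * global_term + gamma * region_term
```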
The architecture of the discriminative region proposal adversarial network provided by the invention is as follows. For the generator G, an architecture that has proven convincing for single-image super-resolution is used: convolutional layers for down-sampling, deconvolutional layers for up-sampling, and 9 residual blocks for task learning, with batch normalization and ReLU activation in each layer. The patch discriminator is implemented mainly as a 70 × 70 PatchGAN. The corrector is a discriminator modified from DCGAN with a global view over its whole input. At the end of both the discriminator and the corrector, a Sigmoid activation outputs the probability.
The training process of the embodiment of the present application is as follows: the DRPAN parameters are learned by minimizing L(G, D_p, R) using mini-batch SGD with the Adam optimizer. To learn the difference between masked fake images and real images, the corrector R adds a gradient penalty as a regularization scheme that forces the corrector to be 1-Lipschitz continuous in x. Experiments found that setting the hyper-parameter α to 10 works robustly on various data sets; the number of corrector steps is set to 1, and the mini-batch size to 1-4 depending on the task.
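The following highly simplified training-loop sketch ties together the sketches above. The optimizer hyper-parameters (Adam with betas (0.5, 0.999)), the fixed region size of 64, the value of λ and the exact update order are illustrative assumptions rather than the patent's settings.

```python
import torch

def train_drpan(generator, patch_d, reviser, loader, epochs=1, lam=0.5, lr=2e-4):
    """Sketch of alternating DRPAN training; `loader` yields (x = seg map, y = real image)."""
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(patch_d.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_r = torch.optim.Adam(reviser.parameters(), lr=lr, betas=(0.5, 0.999))

    for _ in range(epochs):
        for x, y in loader:
            fake = generator(x)
            score_map = patch_d(x, fake)
            region, _ = propose_discriminative_region(
                score_map[0, 0].detach().cpu().numpy(), w_i=x.shape[-1], w_star=64)
            y_mask = mask_real_with_fake_region(y, fake, region)

            # 1) update the patch discriminator and the corrector
            opt_d.zero_grad()
            patch_discriminator_loss(patch_d, x, y, fake).backward()
            opt_d.step()
            opt_r.zero_grad()
            corrector_loss(reviser, x, y, y_mask).backward()
            opt_r.step()

            # 2) update the generator against both critics plus the L1 term
            opt_g.zero_grad()
            g_adv_d = -torch.log(patch_d(x, fake) + 1e-8).mean()
            g_adv_r = -torch.log(reviser(x, mask_real_with_fake_region(y, fake, region)) + 1e-8).mean()
            g_loss = (1 - lam) * g_adv_d + lam * g_adv_r + l1_loss(y, fake, region)
            g_loss.backward()
            opt_g.step()
```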
In order to evaluate the performance of the method provided by the invention in image-to-image translation, the embodiment deploys translation task experiments on different levels, and compares the method provided by the invention with the prior art. For different tasks, different assessment metrics were used, including human perception studies and automated quantitative measurements.
1) Evaluation index
Image quality evaluation. PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity Index Measure) and VIF (Visual Information Fidelity) are among the most popular evaluation criteria for low-level computer vision tasks such as deblurring, dehazing and image restoration. Therefore, for the de-raining and satellite-map-to-map tasks, this embodiment uses PSNR, SSIM, VIF and RECO (Relative Edge Coherence) to verify the performance of the results.
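The PSNR and SSIM measurements can be reproduced with off-the-shelf tools; below is a small Python sketch using scikit-image (the library choice and the uint8 data range are assumptions; VIF and RECO have no scikit-image implementation and are omitted here).

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_quality(reference, restored):
    """Sketch of the PSNR / SSIM part of the evaluation for uint8 RGB arrays.
    Requires scikit-image >= 0.19 for the channel_axis argument."""
    psnr = peak_signal_noise_ratio(reference, restored, data_range=255)
    ssim = structural_similarity(reference, restored, channel_axis=-1, data_range=255)
    return psnr, ssim
```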
Image segmentation evaluation metrics. In this embodiment, the real-image-to-semantic-label task on the Cityscapes dataset is evaluated using the standard metrics of the Cityscapes benchmark, including per-pixel accuracy, per-class accuracy, and class IOU (Intersection over Union).
Amazon Mechanical Turk (AMT). AMT is used as a gold-standard metric for assessing the realism of a synthesized image in many tasks; this embodiment uses it as the evaluation metric for the semantic-label-to-real-image and map-to-satellite-map translation tasks.
FCN-8s score. As an automatic quantitative measurement using an off-the-shelf classifier: if the generated images are realistic, a classifier trained on real images will also classify the synthetic images correctly. This embodiment uses the FCN-8s score to evaluate the semantic-label-to-real-image task on the Cityscapes dataset.
2) Discriminative region proposal adversarial network (DRPAN)
To investigate how DRPAN corrects the synthesized image under different losses between the proposed (discriminative) region and the corresponding real region, this example designed an ablation experiment. Starting from a pre-trained PatchGAN, training continues along three branches: continuing to train with PatchGAN alone; continuing to train with PatchGAN plus an L1 loss between the discriminative region and the real region; and continuing to train with PatchGAN plus the corrector.
The patch discriminator (PatchD for short) effectively finds the most fake or most real regions in an image (see fig. 4), but it has difficulty improving the details of these regions because PatchD struggles to capture high-dimensional distributions. The invention therefore proposes the discriminative region proposal network DRPnet (built on top of PatchD) and designs a corrector to gradually eliminate visual artifacts, turning the task into a lower-dimensional estimation problem. This can be viewed as a "top-down" process, unlike other gradual "bottom-up" image generation methods. Fig. 5 illustrates why the DRPAN proposed by this embodiment is necessary for high-quality image-to-image translation: continuing to train PatchD alone, even with an L1 loss, does little to reduce artifacts. DRPAN with the L1 loss can smooth the artifacts but does not sharpen details, while DRPAN with the corrector exceeds the performance of PatchD and produces fewer visual artifacts. The corrector and the L1 loss combine to reduce artifacts that PatchD ignores. This example also finds that the fake-mask operation improves the coherence of the whole image in some samples (e.g., the connection between a door and a wall). Therefore, DRPAN with the fake mask is used in the following experiments.
3) Low level translation
First, the model of the present invention is applied to two low-level translation tasks that only involve changes of appearance, e.g., de-raining, without any need to alter the content and texture of the input samples. Therefore, λ in the total objective function is set to 1, and image synthesis is driven by the corrector alone.
Single-image de-raining. Fig. 6 shows qualitative results of the DRPAN of the present invention with discriminative regions of different sizes compared with ID-CGAN (Image De-raining Conditional Generative Adversarial Network). DRPAN performs better than ID-CGAN: it is not only more effective at removing rain but also produces more vivid colors and clearer details. Table 1 shows the corresponding quantitative results evaluated by the PSNR, SSIM, VIF and RECO indices, with the best results (in bold) achieved by the DRPAN of the present invention.
4) Real to abstract translation
The DRPAN proposed by the present invention is applied to two real-to-abstract translation tasks, which require many-to-one abstraction capability.
Real image to semantic labels. For the real-to-semantic-label task, this embodiment tested the DRPAN model on the two most commonly used data sets: Cityscapes and Facades. Fig. 7 shows the qualitative results of the DRPAN of the present invention for translating real images to semantic labels, compared with Pix2Pix on the Cityscapes dataset; DRPAN synthesizes results closer to the ground truth than Pix2Pix, and the quantitative results in table 2 support this conclusion in terms of per-pixel accuracy, per-class accuracy and class IOU.
Satellite map to map. The present embodiment also applies DRPAN to the aerial-photograph-to-map task, with experiments carried out on paired images at 512 × 512 resolution. The top row of fig. 8 shows qualitative results of DRPAN compared with Pix2Pix, indicating that DRPAN correctly translates the highway in the aerial photograph into the orange line on the map, whereas Pix2Pix does not.
Map to satellite map. Conversely to the satellite-map-to-map task, the present embodiment also tested DRPAN on the map-to-satellite-map task; the qualitative results in the bottom row of fig. 8 clearly show that DRPAN synthesizes aerial photographs of higher quality than Pix2Pix.
5) Abstract to real translation
In addition, this embodiment demonstrates several abstract-to-real tasks of the DRPAN proposed by the present invention, which require one-to-many translation: semantic labels to photos, maps to satellite maps, edges to photos, and sketches to photos.
Semantic labels to photo. For the semantic-labels-to-photo task, the translation model aims to synthesize a real-world image from semantic labels. CGAN (conditional generative adversarial network) based approaches fail to capture real-world details and suffer from distortion and blurring, while CNN (convolutional neural network) based methods (e.g., CRN) can synthesize high-resolution but smooth, unrealistic results. Figure 9 shows a qualitative comparison, from which it can be seen that the DRPAN of the present invention synthesizes the most realistic, high-quality results (clearer, less distorted, high resolution) compared with Pix2Pix and CRN.
Evaluating GANs remains a challenging problem; many efforts use off-the-shelf classifiers as an automatic measure of the synthesized images. Table 3 shows the segmentation performance evaluated with the FCN-8s model: the DRPAN of the present invention exceeds Pix2Pix by 10% in per-pixel accuracy and also achieves the highest per-class accuracy and class IOU.
Human perception verification. This embodiment evaluates the abstract-to-real tasks of semantic labels to photos and maps to satellite maps through AMT. For the realism study, the embodiment follows a perceptual study protocol, collecting data for each algorithm from 30 participants; each participant saw a sample for 1000 milliseconds. The embodiment also compares the realism of images synthesized by different algorithms. Table 4 shows that images synthesized by DRPAN are judged real more often than the prior art (DRPAN 18.2% > CRN 9.4% > Pix2Pix 3.3%), and in direct comparison the images synthesized by DRPAN are judged more realistic than those of Pix2Pix and CRN 95.2% and 75.7% of the time, respectively. Table 5 shows the comparison on the map-to-satellite-map task: the DRPAN of the present invention fooled participants at rates exceeding Pix2Pix by 18.7% and CycleGAN by 26.8%, respectively.
Edge-to-photo and sketch-to-photo. For the edge-to-photo and sketch-to-photo tasks, previous work often encountered two problems: first, when the input edges are sparse, artifacts and artificial color distributions are easily produced in those regions; second, it is difficult to handle unusual inputs such as sketches. This example tests the DRPAN model of the present invention on the UT Zappos50K dataset and a handbag dataset. Figure 10 shows that the model of the present invention handles both of these problems well.
The above embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, it should be understood that the above embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (7)
1. An image-to-image translation method based on a discriminative region proposal adversarial network, characterized in that the network comprises a generator, a patch discriminator and a corrector, and the method comprises the following steps:
S1: inputting a semantic segmentation map of a real image into the generator to generate a first image;
S2: inputting the first image into the patch discriminator, which predicts a score map;
S3: finding the image patch with the most obvious artifacts in the score map using a sliding window, and mapping that patch back into the first image to obtain the discriminative region in the first image;
S4: masking the real image with the discriminative region to obtain a masked fake image;
S5: inputting the real image and the masked fake image into the corrector, which judges whether its input is real or fake;
S6: the generator generating an image closer to the real image according to the correction from the corrector.
2. The method of claim 1, wherein, given an input of resolution w_i × w_i that the patch discriminator processes into a score map of size w_s × w_s, if a discriminative region of size w^* × w^* is desired, the size of the sliding window on the score map is w × w, where w = w^* × w_s / w_i.
3. The image-to-image translation method based on the discriminative region proposal adversarial network as claimed in claim 2, wherein the discriminative region d_r is the region mapped out by the discriminative region proposal network, namely: d_r = F_DRPnet(x_c^*, y_c^*, w^*), where (x_c^*, y_c^*) = τ·(x_c, y_c); here (x_c^*, y_c^*) are the center coordinates of the discriminative region, τ is the scale between the first image and the score map, and (x_c, y_c) are the center coordinates of the selected window on the score map.
4. The image-to-image translation method based on the discriminative region proposal adversarial network as claimed in claim 1, wherein the total objective function of the network is:
L(G, D_p, R) = (1 − λ)·L_D(G, D_p) + λ·L_R(G, R) + L_L1(G), where L_D(G, D_p) is the loss of the patch discriminator, L_R(G, R) is the loss of the corrector, and L_L1(G) is the L1 loss.
5. The image-to-image translation method based on the discriminative region proposal adversarial network of claim 4, wherein the loss of the patch discriminator is:
L_D(G, D_p) = E_y[log D_p(x, y)] + E_{x,z}[log(1 − D_p(x, G(x, z)))],
where D_p is the patch discriminator, x is the semantic segmentation map of the real image, y is the real image, and G(x, z) is the first image. E_y[log D_p(x, y)] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the real image y into the patch discriminator; the label of the real image is 1, and the loss between this probability and 1 is calculated. E_{x,z}[log(1 − D_p(x, G(x, z)))] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the first image G(x, z) into the patch discriminator; the label of the fake image is 0, and the loss between this probability and 0 is calculated.
6. The image-to-image translation method based on the discriminative region proposal adversarial network of claim 4, wherein the loss of the corrector is:
L_R(G, R) = E_y[log R(x, y)] + E_{x,z}[log(1 − R(x, y_mask))] + α·E_{x,y,δ}[(‖∇_x R(x + δ, y)‖_2 − 1)²],
where R is the corrector, y_mask is the masked fake image, α is a hyper-parameter, δ is random noise added to the semantic segmentation map x of the real image, and ∇_x denotes the gradient with respect to x. E_y[log R(x, y)] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the real image y into the corrector; the label of the real image is 1, and the loss between this probability and 1 is calculated. E_{x,z}[log(1 − R(x, y_mask))] denotes the probability (between 0 and 1) obtained by inputting the semantic segmentation map x and the masked fake image y_mask into the corrector; the label of the fake image is 0, and the loss between this probability and 0 is calculated. The last term is a regularization added to the corrector as a gradient penalty.
7. The image-to-image translation method based on the discriminative region proposal adversarial network of claim 4, wherein the L1 loss is:
L_L1(G) = β·E_{x,y,z}[‖y − G(x, z)‖_1] + γ·E_{x,y,z}[‖y_r − F_DRPnet(G(x, z))‖_1],
where d_r is the discriminative region, y_r is the region on the real image corresponding to the discriminative region on the first image G(x, z), F_DRPnet(G(x, z)) is the discriminative region on the first image G(x, z), β and γ are hyper-parameters, and ‖·‖_1 is the L1 norm, i.e. the sum of the absolute values of the differences between two elements. E_{x,y,z}[‖y − G(x, z)‖_1] is the loss between the real image y and the first image G(x, z), and E_{x,y,z}[‖y_r − F_DRPnet(G(x, z))‖_1] is the loss, computed in the same way, between the region y_r on the real image and the discriminative region F_DRPnet(G(x, z)) on the first image.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810820240.4A | 2018-07-24 | 2018-07-24 | Image-to-image translation method based on a discriminative region proposal adversarial network |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810820240.4A | 2018-07-24 | 2018-07-24 | Image-to-image translation method based on a discriminative region proposal adversarial network |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN109166102A | 2019-01-08 |
Family
ID=64898347
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810820240.4A (Pending) | Image-to-image translation method based on a discriminative region proposal adversarial network | 2018-07-24 | 2018-07-24 |

Country Status (1)

| Country | Link |
|---|---|
| CN (1) | CN109166102A (en) |
Application events
- 2018-07-24: Application CN201810820240.4A filed in China (CN); published as CN109166102A; status: Pending
Patent Citations (5)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN103699527A (en) * | 2013-12-20 | 2014-04-02 | Image translation system and method |
| US20150199792A1 (en) * | 2014-01-13 | 2015-07-16 | System and method for obtaining super image resolution through optical image translation |
| CN106951919A (en) * | 2017-03-02 | 2017-07-14 | Flow monitoring implementation method based on an adversarial generation network |
| CN107123151A (en) * | 2017-04-28 | 2017-09-01 | Image transformation method based on a variational autoencoder and a generative adversarial network |
| CN108009628A (en) * | 2017-10-30 | 2018-05-08 | Anomaly detection method based on a generative adversarial network |
Non-Patent Citations (1)

| Title |
|---|
| Chao Wang et al., "Discriminative Region Proposal Adversarial Networks for High-Quality Image-to-Image Translation", arXiv:1711.09554v2 |
Cited By (21)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN109889849A (en) * | 2019-01-30 | 2019-06-14 | Video generation method, device, medium and equipment |
| CN110163813A (en) * | 2019-04-16 | 2019-08-23 | Image rain removing method and device, readable storage medium and terminal device |
| CN110163813B (en) * | 2019-04-16 | 2022-02-01 | Image rain removing method and device, readable storage medium and terminal equipment |
| CN110210422B (en) * | 2019-06-05 | 2021-04-27 | Ship ISAR image identification method based on optical image assistance |
| CN110210422A (en) * | 2019-06-05 | 2019-09-06 | Ship ISAR image identification method based on optical image assistance |
| US11151703B2 (en) | 2019-09-12 | 2021-10-19 | Artifact removal in medical imaging |
| CN110705328A (en) * | 2019-09-27 | 2020-01-17 | Method for acquiring power data based on two-dimensional code image |
| CN110868598A (en) * | 2019-10-17 | 2020-03-06 | Video content replacement method and system based on countermeasure generation network |
| CN110868598B (en) * | 2019-10-17 | 2021-06-22 | Video content replacement method and system based on countermeasure generation network |
| CN111340716A (en) * | 2019-11-20 | 2020-06-26 | Image deblurring method for improving dual-discrimination countermeasure network model |
| CN113139893A (en) * | 2020-01-20 | 2021-07-20 | Image translation model construction method and device and image translation method and device |
| CN113139893B (en) * | 2020-01-20 | 2023-10-03 | Image translation model construction method and device and image translation method and device |
| CN111539439A (en) * | 2020-04-30 | 2020-08-14 | Image semantic segmentation method |
| CN112381839A (en) * | 2020-11-14 | 2021-02-19 | Breast cancer pathological image HE cancer nest segmentation method based on deep learning |
| CN112330542A (en) * | 2020-11-18 | 2021-02-05 | Image reconstruction system and method based on CRCSAN network |
| CN112330542B (en) * | 2020-11-18 | 2022-05-03 | Image reconstruction system and method based on CRCSAN network |
| CN112330569A (en) * | 2020-11-27 | 2021-02-05 | Model training method, text denoising method, device, equipment and storage medium |
| CN113012071A (en) * | 2021-03-30 | 2021-06-22 | Image out-of-focus deblurring method based on depth perception network |
| CN113012071B (en) * | 2021-03-30 | 2023-01-06 | Image out-of-focus deblurring method based on depth perception network |
| CN113962885A (en) * | 2021-10-14 | 2022-01-21 | Image highlight processing method based on improved CycleGAN |
| CN113962885B (en) * | 2021-10-14 | 2024-05-28 | Image highlight processing method based on improved CycleGAN |
Legal Events

| Code | Title | Description |
|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190108 |