CN112541864A - Image restoration method based on multi-scale generation type confrontation network model
- Publication number
- CN112541864A, CN202011021917.1A
- Authority
- CN
- China
- Prior art keywords
- image
- discriminator
- generator
- network
- loss
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G06T5/77—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20076—Probabilistic image processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention belongs to the technical field of image restoration and discloses an image restoration method and system based on a multi-scale generative adversarial network (GAN) model. A deep generative adversarial restoration model consisting of a generator and adversarial discriminators is constructed, and the missing content is synthesized from random noise using a reconstruction loss and an adversarial loss. The discriminator network structure is improved by proposing a multi-scale discriminator structure, which is trained adversarially to restore the image. The restored image is post-processed with Poisson blending. The advantages of the GAN-based restoration algorithm and the quality of the restored images are then verified. The method builds a multi-scale generative adversarial restoration model and synthesizes the missing content from random noise using a reconstruction loss and multiple adversarial losses; following the idea of WGAN, it adopts the EM (Earth Mover's) distance to fit the data distribution, which improves both network stability and restoration quality.
Description
Technical Field
The invention belongs to the technical field of image restoration, and particularly relates to an image restoration method based on a multi-scale generative adversarial network model.
Background
With the rapid development of deep learning in computer vision, research on image editing and image generation has achieved significant success. Image restoration (inpainting) is a current research hotspot in deep learning and is of practical importance in everyday life. Existing image restoration methods still have various problems, and their results are often not visually satisfactory.
Image inpainting is a classical problem in graphics: a region of some size is missing at some position in an image and must be recovered from other information so that a viewer cannot tell which part was repaired. As shown in fig. 8, the missing regions of the two images contain a cup and flowers respectively, and a person can easily complete them from the surrounding content. Because different people would repair an image differently, the restoration process must follow principles such as structural similarity, texture consistency and structure priority. For a computer, however, the task is extremely difficult because the problem has no unique solution. How to use other information to assist restoration, and how to judge whether a result is realistic enough, are the questions researchers focus on.
Current image restoration algorithms fall into three main directions, and the invention is mainly concerned with restoration algorithms based on deep learning. Early inpainting methods, such as that of Bertalmio et al., use diffusion equations to iteratively propagate low-level features from known regions into unknown regions along the mask boundary. Although they perform well, they are limited to small, uniform regions. Introducing texture synthesis further improved the results. Zoran and Weiss recover images with missing pixels by learning a patch prior. In recent years, convolutional neural networks (CNNs) have greatly improved performance on tasks such as image classification, object detection and semantic segmentation. Ren et al. learned a convolutional network and greatly improved restoration performance through an efficient patch matching algorithm. This approach performs well when a similar patch can be found, but it is likely to fail when the dataset does not contain enough data to fill the unknown region, because in object restoration each part may be unique and no plausible patch for the missing region may exist. Although external databases can alleviate this problem, patch matching then requires learning a high-level representation of the specific object class. Wright et al. treat image inpainting as the task of recovering a sparse signal from the input: by solving a sparse linear system, an image can be restored from a corrupted input. However, such algorithms require the image to be highly structured, whereas the goal of image inpainting is to restore images without such strict constraints. Vincent et al. introduced the denoising autoencoder, which learns to reconstruct a clean signal from a corrupted input.
Dosovitskiy et al. demonstrated that object images can be reconstructed by inverting deep convolutional network features with a decoder network. Kingma et al. proposed variational autoencoders (VAEs), which impose a prior on the latent units so that images can be generated by sampling from or interpolating between latent units. However, images generated by VAEs are often blurry because the training objective is based on a pixel-level Gaussian likelihood.
Larsen et al. improved the VAE by adding an adversarially trained discriminator from a generative adversarial network and showed that more realistic images can be generated. The work closest to this invention is the Context Encoder of Pathak et al., which uses an autoencoder to combine visual representation learning with image restoration. However, the images restored by this method are not ideal in some cases: the restored region is inconsistent with the rest of the picture, and the results are poor at the edges of the restored region. In 2017, Yang et al. proposed a multi-scale neural patch synthesis method based on joint optimization of image content and texture constraints. It preserves contextual structure and produces high-frequency details by matching and adapting patches to the most similar mid-layer feature correlations, achieving the best restoration accuracy for high-resolution images at that time. Gao et al. studied the weakness of traditional "fixed" models and proposed an on-demand learning algorithm for training image restoration models with deep convolutional neural networks; the main idea is to use a feedback mechanism to generate the training examples the model needs most, so as to learn a model that generalizes across difficulty levels. To address the problems of the Context Encoder model, Iizuka et al. of Waseda University extended the design to two discriminators: trained global and local context discriminators distinguish real images from restored images, so that the network generates images that are both locally and globally consistent.
Liu et al. argued that existing deep-learning-based inpainting methods apply standard convolutions to the corrupted image, with filter responses conditioned on both the valid pixels and the substitute values (typically the mean) in the missing regions, which often leads to artifacts such as color discrepancy and blurring. They proposed partial convolutions, which can inpaint arbitrary off-center, irregular regions. In 2018, Yan et al. proposed the "Shift-Net" model for filling missing regions of any shape with sharp structures and fine textures. The encoder features of the known region are shifted to serve as an estimate of the missing part, and a guidance loss on the decoder features minimizes the distance between the decoder features after the fully connected layer and the ground-truth encoder features of the missing part. Under this constraint, the decoder features of the missing region can guide the shifting of the encoder features in the known region.
In summary, researchers at home and abroad have proposed many image restoration methods, but most of them produce results of low precision and leave considerable room for improvement. To address the low accuracy, visually inconsistent results and unstable training of existing methods, the invention uses a multi-scale generative adversarial network model to obtain restored images with high precision, high accuracy and strong visual consistency.
Through the above analysis, the problems and defects of the prior art are as follows:
(1) Early image inpainting methods use diffusion equations to iteratively propagate low-level features of known regions along the mask boundary into unknown regions. Although they perform well, they are limited to small, uniform regions.
(2) Existing image inpainting methods based on convolutional neural networks perform well when a similar patch can be found, but are likely to fail when the dataset does not contain enough data to fill the unknown region.
(3) Some methods treat image inpainting as the task of recovering a sparse signal from the input: by solving a sparse linear system, an image can be restored from a corrupted input. However, such algorithms require the image to be highly structured.
(4) A variational autoencoder imposes a prior on the latent units, so that an image can be generated by sampling from or interpolating between latent units. However, images generated by VAEs are often blurry because the training objective is based on a pixel-level Gaussian likelihood.
The difficulty in solving the above problems and defects is:
When the damaged area of the image is large, the restoration result is poor, and the global and local consistency of the restored image cannot be maintained, so the restored image lacks integrity.
The significance of solving the problems and the defects is as follows:
The stability of image restoration is improved, restored images with high precision, high accuracy and strong visual consistency are obtained, and the restoration quality is improved.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an image restoration method based on a multi-scale generative adversarial network model.
The invention is realized as follows: an image restoration method based on a multi-scale generative adversarial network model comprises the following steps:
Step one, constructing a deep generative adversarial restoration model consisting of a generator and adversarial discriminators, and synthesizing the missing content from random noise using a reconstruction loss and an adversarial loss.
Step two, improving the discriminator network structure by proposing a multi-scale discriminator structure on the basis of the global discriminator and the local discriminator, and training the multi-scale discriminators adversarially on images of different resolutions to restore the image.
Step three, using dilated convolution in the generator, and post-processing the restored image with Poisson blending.
Step four, verifying the advantages of the GAN-based image restoration algorithm and the quality of the restored images on the CelebA, ImageNet and Places2 datasets.
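The Poisson blending used for post-processing in step three can be illustrated with a minimal sketch. The patent does not give its implementation, so the function name `poisson_blend`, the Jacobi solver, the iteration count and the restriction to a single-channel image are all assumptions made for illustration. The idea is to keep the gradients of the generated content inside the hole while forcing the values at the hole boundary to agree with the original image, which suppresses visible seams.

```python
import numpy as np

def poisson_blend(source, target, mask, iters=500):
    """Blend `source` into `target` inside `mask` by solving the discrete
    Poisson equation with Jacobi iterations (hypothetical helper).

    source, target: 2-D float arrays of equal shape (single channel).
    mask: 2-D boolean array, True inside the hole; assumed not to touch
          the image border.
    """
    src = source.astype(np.float64)
    out = target.astype(np.float64).copy()
    inside = mask.astype(bool)
    out[inside] = src[inside]  # start from the generated content

    # Guidance field: discrete Laplacian of the source (4*s - 4 neighbours).
    lap = 4.0 * src
    lap[1:-1, 1:-1] -= (src[:-2, 1:-1] + src[2:, 1:-1]
                        + src[1:-1, :-2] + src[1:-1, 2:])

    for _ in range(iters):
        # Sum of the four neighbours of the current estimate.
        nb = np.zeros_like(out)
        nb[1:-1, 1:-1] = (out[:-2, 1:-1] + out[2:, 1:-1]
                          + out[1:-1, :-2] + out[1:-1, 2:])
        # Jacobi update only inside the hole; outside pixels stay fixed
        # and act as the Dirichlet boundary condition.
        out[inside] = (nb[inside] + lap[inside]) / 4.0
    return out
```

If the generated content differs from the original only by a constant brightness offset, the blended result converges to the original image inside the hole, because the offset does not change the gradients.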
Further, in step one, the multi-scale adversarial network model comprises a generator network for image restoration and four additional discriminator networks that assist training: two multi-scale discriminator networks, a global discriminator network and a local discriminator network.
Further, in step one, the generator uses a convolutional autoencoder as the generator model, i.e. a standard encoder-decoder architecture. The encoder takes an image with missing regions as input and produces a latent feature representation of the image through convolution operations. The decoder uses this latent representation to restore the original resolution through transposed convolutions, producing the image content of the missing region. Unlike the original GAN model, which starts directly from a noise vector, the hidden representation obtained from the encoder captures more of the variations and relationships between the unknown and known regions, and it is then fed to the decoder to generate the content. The intermediate layers use dilated convolution, so that each output pixel is computed from a larger input area without additional parameters or computation. Compared with a standard convolutional layer, a dilated convolution computes each output pixel under the influence of a larger pixel area of the input image; without dilated convolution only a small pixel area would be used, and less context information would be available for image synthesis. The generator uses a standard autoencoder network with dilated convolutional layers added; that is, two convolutional layers are removed from the middle of the generator network. In the generator architecture table, the columns from left to right give the layer type, the kernel size, the amount of kernel zero padding, the stride and the number of output channels.
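The claim that dilated convolution enlarges the input area seen by each output pixel at no extra parameter cost can be checked with a small receptive-field calculation. The following is a minimal sketch; the helper name `receptive_field` and the layer stacks are hypothetical, since the generator's layer table is not reproduced in this text.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of stacked conv layers.

    layers: list of (kernel_size, stride, dilation) tuples, first layer first.
    """
    rf, jump = 1, 1  # jump = spacing of adjacent outputs measured in input pixels
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf

# Four 3x3 layers with stride 1: the same number of weights in both stacks.
standard = [(3, 1, 1)] * 4
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4), (3, 1, 8)]  # assumed dilation rates

print(receptive_field(standard))  # 9
print(receptive_field(dilated))   # 31
```

With identical kernel sizes, the dilated stack sees a 31-pixel-wide window instead of 9, which is the extra context the paragraph above refers to.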
Further, in step one, the discriminators are based on convolutional neural networks and compress the image into a small feature vector. The prediction is the probability that the image is real.
First, a local discriminator determines whether the synthesized content of the missing region is real. This helps the network generate the missing content and encourages the generated objects to be semantically valid. Because the local discriminator is local, its limitations are also apparent: the local discriminator loss can neither regularize the global structure of a face nor ensure consistency between the inside and outside of the missing region's boundary. As a result, the pixel values of the restored picture are noticeably inconsistent along the boundary of the restored region. Because of this limitation, a second network structure, the global discriminator, is introduced to judge the realism of the image as a whole.
Finally, a multi-scale discriminator network structure is proposed. The basic idea is to down-sample the real and synthesized images by factors of 2 and 4 and to train two discriminators that distinguish the real image from the restored image at these two scales. The generator's restoration process is thus strictly supervised by two discriminator networks that take images of different resolutions as input. The two multi-scale discriminators have an architecture similar to the global discriminator but different receptive fields. Compared with using only the global discriminator, training with the global discriminator combined with the multi-scale discriminators guides the generator to produce restored pictures with stronger global consistency and finer details, so that the restoration of the whole picture is visually more plausible. Adding the two multi-scale discriminators to the network yields better restored pictures.
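The inputs to the two multi-scale discriminators can be sketched as follows. The patent states only the down-sampling factors (2 and 4); average pooling as the down-sampling operator and the function names `downsample` and `multi_scale_inputs` are assumptions for illustration.

```python
import numpy as np

def downsample(image, factor):
    """Down-sample an H x W (x C) image by average pooling over factor x factor blocks."""
    h, w = image.shape[:2]
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the factor")
    blocks = image.reshape(h // factor, factor, w // factor, factor, *image.shape[2:])
    return blocks.mean(axis=(1, 3))

def multi_scale_inputs(image):
    """Return the images fed to the two multi-scale discriminators (factors 2 and 4)."""
    return downsample(image, 2), downsample(image, 4)

restored = np.random.default_rng(0).random((256, 256, 3))
half, quarter = multi_scale_inputs(restored)
print(half.shape, quarter.shape)  # (128, 128, 3) (64, 64, 3)
```

The same function would be applied to both the real image and the generator output, so that each discriminator compares the two at its own resolution.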
The last two fully connected layers are removed from the global discriminator and the local discriminator in the model, and the rest of their structure is unchanged. The network architectures of the global, local and multi-scale discriminators are shown in table 2. From left to right, the columns give the layer type, the kernel size, the stride and the number of output channels. a, b, c and d are respectively
Further, in step one, the loss functions are modeled as follows:
First, a reconstruction loss is introduced for the generator. It is responsible for capturing the structural information of the missing region and keeping it consistent with the context; it is the L2 distance between the pixels of the restored image and the original image, where z denotes the noise mask.
When only the reconstruction loss is used, the restored content is observed to be blurry and overly smooth. The reason is that the L2 loss penalizes outliers heavily, which encourages the network to average over multiple hypotheses in order to avoid large penalties. By using a discriminator, an adversarial loss is introduced; it reflects how well the generator can fool the discriminator and how well the discriminator can tell real from fake. The adversarial loss is based on the GAN loss: an adversarial discriminator model is learned to provide a loss gradient for the generator model. The adversarial discriminator makes predictions on both the samples produced by the generator and real samples and tries to distinguish them, while the generator tries to confuse the discriminator by producing samples that look as "real" as possible.
In the adversarial loss, Pdata(x) and Pz(z) denote the distributions of the real data x and the noise variable z, respectively. The network is optimized by minimizing the generator loss and maximizing the discriminator loss.
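The formula images for these two losses did not survive in this text. A form consistent with the description, written with assumed symbols (G for the generator, D for a discriminator, and \mathcal{L}_r for the reconstruction loss), would be the squared L2 distance and the standard GAN objective:

```latex
\mathcal{L}_{r} = \left\lVert x - G(z) \right\rVert_2^2

\min_{G}\,\max_{D}\;
  \mathbb{E}_{x \sim P_{data}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim P_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

This is a reconstruction under stated assumptions rather than the patent's exact notation; in particular, whether the reconstruction loss is restricted to the missing region is not recoverable from the text.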
Further, the Wasserstein distance is used as the optimization objective to train the GAN. Specifically, the sigmoid in the last layer of the discriminator is removed, the loss functions of the generator and the discriminator do not take the logarithm, and after each update of the discriminator parameters their absolute values are truncated so that they do not exceed a fixed constant, i.e. weight clipping.
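The three WGAN changes described here (no sigmoid, no logarithm, parameter truncation) can be sketched in a few lines. This is a minimal NumPy illustration, not the patent's implementation: the function names are hypothetical, the discriminator scores are passed in as plain arrays, and the clipping constant 0.01 is an assumed value that the patent does not state.

```python
import numpy as np

def critic_loss(scores_real, scores_fake):
    """WGAN discriminator (critic) loss: raw scores, no sigmoid and no log.
    Minimizing it pushes real scores up and generated scores down."""
    return float(np.mean(scores_fake) - np.mean(scores_real))

def generator_adv_loss(scores_fake):
    """WGAN adversarial loss for the generator: raise the scores of generated samples."""
    return float(-np.mean(scores_fake))

def clip_weights(params, c=0.01):
    """Truncate every discriminator parameter to [-c, c] after an update.
    c = 0.01 is an assumption, not a value given in the patent."""
    return [np.clip(p, -c, c) for p in params]

params = [np.array([-0.5, 0.005, 0.2])]
print(clip_weights(params))
```

Truncating the parameters themselves (rather than the gradients) is what keeps the discriminator approximately Lipschitz, which the Wasserstein formulation requires.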
The four discriminator networks share the same loss function definition. The only difference is that the local discriminator provides the training loss gradient only for the missing region, whereas the global discriminator and the multi-scale discriminators back-propagate loss gradients over the whole image at different resolutions. The inputs of the discriminators are defined as follows.
The input of the local discriminator is the restored part of the image output by the generator and the corresponding part of the real image.
The input of the first multi-scale discriminator is the image output by the generator and the real image, each down-sampled by a factor of 2.
The input of the second multi-scale discriminator is the image output by the generator and the real image, each down-sampled by a factor of 4.
The overall loss function for optimizing the entire network combines the losses above. λ1, λ2, λ3 and λ4 are the weights of the different losses and balance their influence on the overall loss function; their specific values need to be set manually during the experiments.
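The formula for the overall loss is missing from this text. Assuming that the four weights multiply the adversarial losses of the four discriminators and that the reconstruction loss enters unweighted, a plausible form is:

```latex
\mathcal{L}
  = \mathcal{L}_{r}
  + \lambda_{1}\,\mathcal{L}_{adv}^{local}
  + \lambda_{2}\,\mathcal{L}_{adv}^{global}
  + \lambda_{3}\,\mathcal{L}_{adv}^{ms,2}
  + \lambda_{4}\,\mathcal{L}_{adv}^{ms,4}
```

The symbol names and the assignment of each weight to a particular discriminator are assumptions; the text only establishes that there are four weights balancing the different losses.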
Further, in step two, the training process is divided into three stages. First, the generator network is trained with the reconstruction loss only, so that the generator can produce blurry restored content; this stage involves neither adversarial training nor adversarial losses. Second, with the generator network from the first stage held fixed, all discriminator networks are trained, each being updated with its adversarial loss. The last stage trains the generator and all discriminators jointly and adversarially. Each stage prepares for the next, which greatly improves the effectiveness and efficiency of network training. Training is carried out by back-propagation.
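The three-stage schedule can be expressed as a small helper that decides, for a given training step, which networks are updated and whether adversarial losses are used. This is a sketch under assumptions: the function name `stage_for_step` and the stage lengths are hypothetical, since the patent does not state how long each stage lasts.

```python
def stage_for_step(step, gen_steps, disc_steps):
    """Map a training step to the stage described above.

    gen_steps:  number of steps in stage 1 (generator only, reconstruction loss).
    disc_steps: number of steps in stage 2 (discriminators only, generator fixed).
    All later steps belong to stage 3 (joint adversarial training).
    """
    if step < gen_steps:
        return {"stage": 1, "train_generator": True,
                "train_discriminators": False, "adversarial_loss": False}
    if step < gen_steps + disc_steps:
        return {"stage": 2, "train_generator": False,
                "train_discriminators": True, "adversarial_loss": True}
    return {"stage": 3, "train_generator": True,
            "train_discriminators": True, "adversarial_loss": True}

# Hypothetical stage lengths, for illustration only.
for step in (0, 1000, 1500):
    print(step, stage_for_step(step, gen_steps=1000, disc_steps=500))
```

A training loop would query this helper at each step and skip the optimizer update for any network whose flag is False.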
For training with the adversarial losses, the default hyper-parameters are used: λ1, λ2, λ3 and λ4 are all set to 0.001. Images are resized and cropped to 256 × 256 and used as input images. For the missing region, the central square region of the input image is set to 0; this missing part covers approximately 1/4 of the image. The input of the global discriminator is the full 256 × 256 image, the input of the local discriminator is the 128 × 128 restored region, and the inputs of the two multi-scale discriminators are the full image at 128 × 128 and 64 × 64, respectively.
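The sizes in this paragraph are mutually consistent: a centered 128 × 128 square covers exactly 1/4 of a 256 × 256 image, and it is also the patch seen by the local discriminator. A minimal sketch of the masking and cropping follows; the function names `mask_center` and `local_patch` are hypothetical.

```python
import numpy as np

def mask_center(image):
    """Zero out the central square covering 1/4 of the image area.
    Returns the masked image and the boolean hole mask."""
    h, w = image.shape[:2]
    top, left = h // 4, w // 4
    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + h // 2, left:left + w // 2] = True
    masked = image.copy()
    masked[mask] = 0
    return masked, mask

def local_patch(image, mask):
    """Crop the bounding box of the hole: the input of the local discriminator."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

image = np.ones((256, 256, 3))
masked, mask = mask_center(image)
print(mask.mean())                     # 0.25
print(local_patch(image, mask).shape)  # (128, 128, 3)
```

The generator receives `masked`, the global discriminator the full image, and the local discriminator the crop returned by `local_patch`.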
Another object of the present invention is to provide a restoration system for implementing the image restoration method based on the multi-scale generative adversarial network model, wherein the image restoration system comprises:
a deep generative adversarial restoration model building module, which constructs a deep generative adversarial restoration model consisting of a generator and adversarial discriminators, and synthesizes the missing content from random noise using a reconstruction loss and an adversarial loss;
an image restoration module, which improves the discriminator network structure by proposing a multi-scale discriminator structure on the basis of the global discriminator and the local discriminator, and restores the image by training the multi-scale discriminators adversarially on images of different resolutions;
an image post-processing module, which uses dilated convolution in the generator and post-processes the restored image with Poisson blending;
and a restoration verification module, which verifies the advantages of the GAN-based image restoration algorithm and the quality of the restored images on the CelebA, ImageNet and Places2 datasets.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
step one, constructing a deep generative adversarial restoration model consisting of a generator and adversarial discriminators, and synthesizing the missing content from random noise using a reconstruction loss and an adversarial loss;
step two, improving the discriminator network structure by proposing a multi-scale discriminator structure on the basis of the global discriminator and the local discriminator, and training the multi-scale discriminators adversarially on images of different resolutions to restore the image;
step three, using dilated convolution in the generator, and post-processing the restored image with Poisson blending;
and step four, verifying the advantages of the GAN-based image restoration algorithm and the quality of the restored images on the CelebA, ImageNet and Places2 datasets.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
step one, constructing a deep generative adversarial restoration model consisting of a generator and adversarial discriminators, and synthesizing the missing content from random noise using a reconstruction loss and an adversarial loss;
step two, improving the discriminator network structure by proposing a multi-scale discriminator structure on the basis of the global discriminator and the local discriminator, and training the multi-scale discriminators adversarially on images of different resolutions to restore the image;
step three, using dilated convolution in the generator, and post-processing the restored image with Poisson blending;
and step four, verifying the advantages of the GAN-based image restoration algorithm and the quality of the restored images on the CelebA, ImageNet and Places2 datasets.
Combining all the technical solutions above, the advantages and positive effects of the invention are as follows. The invention provides an image restoration method based on a multi-scale generative adversarial network model. It proposes a multi-scale generative adversarial restoration model consisting of a generator and several adversarial discriminators, and synthesizes the missing content from random noise using a reconstruction loss and multiple adversarial losses. Following the idea of WGAN, it adopts the EM distance to fit the data distribution, which improves network stability and restoration quality. Finally, the method is verified on the CelebA dataset: subjective and objective evaluations show that the proposed algorithm achieves better restoration performance than current image restoration methods. Corresponding training and testing on the ImageNet and Places2 datasets show that the algorithm can be applied to the restoration of many types of pictures with good results. The method is of significance in fields such as facial restoration for public security and criminal investigation, image scaling, removal of redundant objects, lossy image compression and biomedical imaging.
Comparative technical and experimental effects. Two indexes are used:
The first index is the peak signal-to-noise ratio (PSNR), an objective standard for evaluating images that measures the pixel difference between the real image and the repaired image; a larger value indicates less distortion.
Quantitative experimental results on PSNR
The second index is the structural similarity index (SSIM), which evaluates the structural similarity between two images; its value is a number between 0 and 1, and a larger value represents a smaller difference between the repaired image and the real image.
Quantitative experimental results on SSIM
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings described below are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an image restoration method based on a multi-scale generative adversarial network model according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a generative adversarial network model according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a multi-scale discriminator model according to an embodiment of the invention.
Fig. 4 is a schematic diagram of a network architecture according to an embodiment of the present invention.
Fig. 5 is a comparison of the repair results of different models according to an embodiment of the present invention;
in the figure: (a) is the original image; (b) is the image with the missing region; (c) is the CE result; (d) is the GLCIC result; (e) is the result of the algorithm provided by the present invention.
Fig. 6 is a schematic diagram of a repair result on the ImageNet dataset according to the embodiment of the present invention.
FIG. 7 is a diagram illustrating the repair result on the Places2 data set according to an embodiment of the present invention.
Fig. 8 is a schematic diagram of the repair results for two different pictures according to an embodiment of the present invention;
in the figure: (a) is the original picture; (b) is the picture with the missing region; (c) is the repaired picture.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the invention provides an image restoration method based on a multi-scale generative adversarial network model, which is described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the image restoration method based on a multi-scale generative adversarial network model provided by an embodiment of the present invention includes the following steps:
S101, constructing a deep generative adversarial restoration model consisting of a generator and an adversarial discriminator, and synthesizing the missing content from random noise using a reconstruction loss and an adversarial loss.
S102, improving the network structure of the discriminator: a multi-scale discriminator structure is proposed on the basis of a global discriminator and a local discriminator, and the multi-scale discriminators are adversarially trained on images of different resolutions to repair the images.
S103, using dilated convolution in the generator, and post-processing the repaired image with the Poisson blending method.
S104, verifying the advantages of the image restoration algorithm based on the generative adversarial network model, and its restoration effect, on the CelebA, ImageNet and Places2 data sets.
The present invention will be further described with reference to the following examples.
1. Summary of the invention
First, a deep generative adversarial restoration model composed of a generator and an adversarial discriminator is proposed, and the missing content is synthesized from random noise using a reconstruction loss and an adversarial loss. Second, a multi-scale discriminator structure is proposed, and image restoration is performed through adversarial training on images of different resolutions. Third, the generator uses dilated convolution to reduce the information lost when the image is down-sampled, and the repaired image is post-processed with the widely used Poisson blending method. Finally, the advantages of the algorithm and its image restoration effect are demonstrated through experiments.
2. Related work
The generative adversarial network (GAN) model proposed by Goodfellow et al. in 2014 is a milestone in the development of deep learning. GANs address the blurriness of pictures generated by a traditional VAE, achieve striking results, and can in theory generate large numbers of sharp pictures.
GAN is mainly inspired by the zero-sum game of game theory. The whole network consists of two mutually adversarial networks, the generation network G (generator) and the discrimination network D (discriminator), as shown in Fig. 2. Through the continuing game between G and D, G learns the distribution of the real data; when the adversarial network is used to generate images, G can generate realistic images from noise after sufficient training. The main functions of G and D are as follows. G is a generative network: it takes random noise z as input and generates from it a fake picture G(z) intended to deceive D. D is a discrimination network that judges whether a picture is "real". Its input is a picture, which may be a real picture from the data set or a picture generated by G, and its output is the probability that the input picture is real: an output of 1 means D judges the picture to be real, and an output of 0 means D judges that it cannot be real (i.e. that it was generated by G). During training, the goal of G is to generate fake images realistic enough to deceive D, and the goal of D is to distinguish the fake images generated by G from real images as well as possible. The training of G and D thus forms a dynamic game that ideally ends in an equilibrium state, the Nash equilibrium. In this ideal state G generates pictures realistic enough that D can hardly tell whether a picture generated by G is real, i.e. its output probability is 0.5. The result is a generative network model G that can be used to generate pictures.
One of the main problems of GAN is the instability of the learning process, for example failure of the network to converge and vanishing or exploding gradients, which has prompted a great deal of research. The Wasserstein GAN (WGAN) proposed by Arjovsky et al. improves GAN from the perspective of the loss function; with the improved loss function, WGAN obtains good results even with fully connected layers, and the problem of unstable training is alleviated. Gulrajani et al. built on Wasserstein GAN by improving how the continuity constraint is enforced, which addresses vanishing and exploding gradients during training and speeds up convergence. The LSGAN model proposed by Mao et al. replaces the GAN loss function with a least-squares loss function, which also alleviates unstable training, poor quality of generated images and insufficient diversity.
With the rapid development of GAN in recent years, the demands on the resolution of GAN-generated pictures have kept increasing. Another problem with GAN is that the network down-samples images in the pooling process to extract low-dimensional features, so that much key information in the images is lost; the discriminator then finds it easier to tell real images from fake ones, and the gradients no longer indicate a correct optimization direction. How to make effective use of the features extracted by each layer of the neural network, extracting the low-dimensional features of the image while minimizing the loss caused by down-sampling, is a focus of current research. In 2016 Yu et al. proposed dilated convolution, which enlarges the receptive field while keeping the feature size unchanged during convolution, effectively reducing the information lost to down-sampling in conventional convolution, and applied it to image processing. The "pix2pixHD" model proposed by Wang et al. uses conditional generative adversarial networks (conditional GANs) to synthesize high-resolution realistic images, and uses a multi-scale generator-discriminator structure to improve picture quality and resolution while keeping training stable. Fig. 3 shows a schematic diagram of the multi-scale discriminator model: the discriminators have the same network structure but operate at different picture scales, and are referred to as D1, D2 and D3. Specifically, the real and the synthesized high-resolution images are each down-sampled, and D1, D2 and D3 are then trained to distinguish real images from synthesized images at three different scales.
The work of the invention is based on the "Context Encoder" method proposed by Pathak et al. and the "Globally and Locally Consistent Image Completion" method proposed by Iizuka et al. GAN was originally intended for training generative models with convolutional neural networks. The generator is trained with the aid of a discriminator, which distinguishes whether an image was generated by the generator or is real; the generator is trained to fool the discriminator while the discriminator is being updated. By combining a mean squared error (MSE) loss with the GAN loss, an image restoration network can be trained that avoids the blur common when the MSE loss is used alone. Using only this approach, however, can make network training unstable. The present invention avoids this problem by replacing the traditional GAN loss with the WGAN loss, which uses the EM distance to measure the difference between data distributions, by not training a purely generative model, and by tuning the learning process to prioritize stability. In addition, the framework and the training process are extensively optimized for the image restoration problem. In particular, instead of a single discriminator, multiple discriminators are used, including multi-scale discriminators similar to those in the "pix2pixHD" model, to improve visual quality.
3. Multi-scale countermeasure network model
This section introduces the multi-scale adversarial network model. It comprises one generation network for image restoration and four additional discriminator networks that assist training, namely two multi-scale discriminator networks, a global discriminator network and a local discriminator network, so that the whole network can be trained to perform the image restoration task well. During training, the discriminators are trained to determine whether or not an image has been repaired, while the generator is trained to fool all the discriminators. Only when all the networks are trained together can the generator repair a wide variety of images realistically. The network architecture is shown in Fig. 4.
3.1 generators
A convolutional autoencoder is used as the generator model, i.e. a standard encoder-decoder architecture. The encoder takes an image with missing regions as input and generates a latent feature representation of the image through convolution operations. The decoder uses this latent feature representation to restore the original resolution through transposed convolution operations, producing the image content of the missing region. Unlike the original GAN model, which starts directly from a noise vector, the hidden representation obtained from the encoder captures more of the variations and relationships between the unknown and known regions, and is then fed to the decoder to generate the content. The intermediate layers use dilated convolution, which lets each output pixel be computed from a larger input area without additional parameters or computation; compared with a standard convolutional layer, each output pixel of the dilated convolution network is influenced by a larger pixel area of the input image. Without dilated convolution, each output pixel would depend on only a small pixel area, and less context information would be available for synthesizing the image. The generator thus uses a standard autoencoder network with dilated convolutional layers added; specifically, it is the generator network introduced in the existing literature with two of its middle convolutional layers removed. The network architecture is shown in Table 1, whose columns are, from left to right, the network layer type, the convolution kernel size, the amount of zero padding of the convolution kernel, the stride and the number of output channels of the layer.
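The effect of dilation on the receptive field can be illustrated with a small one-dimensional sketch. The helper names `dilated_conv1d` and `receptive_field` are illustrative and not taken from the source; the actual generator operates on 2-D feature maps with learned kernels.

```python
def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one output of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D cross-correlation whose kernel taps are `dilation` samples apart.

    The number of weights stays len(kernel); only the spacing of the taps changes.
    """
    span = receptive_field(len(kernel), dilation)
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


signal = list(range(10))
kernel = [1, 1, 1]
standard = dilated_conv1d(signal, kernel, dilation=1)  # each output sees 3 inputs
dilated = dilated_conv1d(signal, kernel, dilation=2)   # each output sees a span of 5
```

With the same three weights, a dilation of 2 widens the span of each output from 3 to 5 input samples, which is the property the paragraph above relies on.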
Table 1 Generator network architecture
3.2 discriminator
By training the generator, the pixels of the missing region can be filled in with a small reconstruction loss. Using the generator alone, however, does not ensure that the filled area is visually consistent: the pixels generated for the missing region are very blurred, and only its general shape is captured. To obtain a more realistic result, a global discriminator, a local discriminator and multi-scale discriminators are added as binary classifiers that distinguish whether an image is real or repaired. The discriminators help the network improve the quality of the repair result, since a well-trained discriminator is not fooled by unrealistic images. Each discriminator is based on a convolutional neural network that compresses the image into a small feature vector, from which it predicts the probability that the image is real.
First, a local discriminator determines whether the synthesized content of the missing region is real. It helps the network generate the missing content and encourages the generated objects to be semantically valid. Because it is local, however, its limitations are also apparent: the local discriminator loss can neither regularize the global structure of a face nor ensure consistency across the inner and outer edges of the missing region. As a result, the pixel values of the repaired picture are visibly inconsistent along the boundary of the repaired area.
Because of the limitations of the local discriminator, another network structure, the global discriminator, is introduced to judge the fidelity of the image as a whole. The basic idea is that the content generated for the repaired area should be not only realistic but also consistent with its context. A network with a global discriminator greatly alleviates the inconsistency problem and further improves the repaired picture, making it look more real.
Finally, a multi-scale discriminator network structure is proposed. The basic idea is to down-sample the real and the synthesized images by factors of 2 and 4, and to train two discriminators to distinguish the real image from the restored image at these two scales. The generator's repair process is thus strictly supervised by two discriminator networks that take images of different resolutions as input; the two multi-scale discriminators have architectures similar to that of the global discriminator but different receptive fields. Compared with using the global discriminator alone, training with the multi-scale discriminators guides the generator to produce repaired pictures with stronger global consistency and finer details, so that the overall repair looks more plausible. Adding the two multi-scale discriminators to the network therefore yields better restored pictures.
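A minimal sketch of how the inputs of the two multi-scale discriminators could be prepared is given below. It assumes average pooling as the down-sampling operator, which the source does not specify, and the function name `avg_pool_downsample` is hypothetical.

```python
def avg_pool_downsample(image, factor):
    """Down-sample a 2-D image (list of rows) by averaging factor x factor blocks."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by the down-sampling factor")
    out = []
    for r in range(0, height, factor):
        row = []
        for c in range(0, width, factor):
            block = [image[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# A 4 x 4 toy image; the same call is applied to real and restored images alike.
toy = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
half = avg_pool_downsample(toy, 2)     # input scale of one multi-scale discriminator
quarter = avg_pool_downsample(toy, 4)  # input scale of the other
```

Applied to a 256 × 256 image, the two calls give 128 × 128 and 64 × 64 inputs.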
The last two fully connected layers are removed from the global discriminator and the local discriminator in the model, and the rest of their structure is unchanged. The network architectures of the global discriminator, the local discriminator and the multi-scale discriminators are shown in Table 2, whose columns are, from left to right, the network layer type, the convolution kernel size, the stride and the number of output channels of the layer. a, b, c and d are respectively
TABLE 2 Multi-Scale discriminator architecture
3.3 loss function
There are generally many plausible ways to fill in a missing image region consistently with its context, and this behavior can be modeled by a loss function. A reconstruction loss is therefore first introduced for the generator. It is responsible for capturing the structural information of the missing region and keeping it consistent with the context, and is the L2 distance between the pixels of the restored image and those of the original image, where z is the noise mask:
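As a rough illustration of such a reconstruction loss, the sketch below computes a mean squared L2 distance restricted to the masked (missing) pixels. Restricting the loss to the mask and averaging over the masked pixels are assumptions made here for illustration; the function name `reconstruction_loss` is hypothetical and the exact form used by the invention may differ.

```python
def reconstruction_loss(restored, original, mask):
    """Mean squared L2 distance over the pixels where mask == 1 (missing region).

    All three arguments are 2-D lists of equal size; mask holds 0 or 1.
    """
    total, count = 0.0, 0
    for restored_row, original_row, mask_row in zip(restored, original, mask):
        for p, q, m in zip(restored_row, original_row, mask_row):
            if m:
                total += (p - q) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    return total / count


restored = [[1, 2], [3, 4]]
original = [[1, 0], [0, 4]]
mask = [[0, 1], [1, 0]]  # only the two off-diagonal pixels are 'missing'
loss = reconstruction_loss(restored, original, mask)
```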
However, using only this loss, the restored image content tends to be blurred and over-smoothed. The reason is that the L2 loss penalizes outliers heavily, which encourages the network to average smoothly over multiple hypotheses to avoid large penalties. An adversarial loss is therefore introduced through the discriminator; it reflects how well the generator can fool the discriminator and how well the discriminator can tell real from fake. The adversarial loss is based on the GAN loss. To learn a generative model of the data distribution, the GAN learns an adversarial discriminator model that provides the loss gradient for the generator model. The adversarial discriminator makes predictions on both the samples generated by the generator and real samples and tries to distinguish them, while the generator tries to confuse the discriminator by generating samples that are as "real" as possible.
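For reference, the standard GAN minimax objective of Goodfellow et al., written with a generator G and a discriminator D, has the form below. It is given as the textbook formulation, on the assumption that this is the objective the passage refers to.

```latex
\min_{G}\max_{D} V(D, G) =
  \mathbb{E}_{x \sim P_{data}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim P_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```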
where Pdata(x) and Pz(z) denote the distributions of the real data x and the noise variable z, respectively. The network is optimized by minimizing the generator loss and maximizing the discriminator loss.
The JS divergence underlying the cross-entropy loss of the traditional GAN is not well suited to measuring the distance between the generated and the real data distributions, and training a GAN by optimizing the JS divergence can fail to provide a correct optimization direction. WGAN therefore proposes using the Wasserstein distance (also called the Earth-Mover distance) as the optimization objective for training a GAN. Concretely, the sigmoid in the last layer of the discriminator is removed, the loss functions of the generator and the discriminator do not take the logarithm, and after each update of the discriminator parameters their absolute values are clipped so as not to exceed a fixed constant, i.e. weight clipping. The algorithm of the present invention does not use the traditional GAN objective function but uses this approach:
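The three changes just listed can be sketched in a few lines. The function names are illustrative, the scores stand for raw discriminator outputs (no sigmoid), and the clipping constant 0.01 is an assumed default rather than a value taken from the source.

```python
def critic_loss(real_scores, fake_scores):
    """WGAN discriminator (critic) loss: mean fake score minus mean real score.

    No logarithm is taken; minimizing this widens the gap between real and fake scores.
    """
    mean_real = sum(real_scores) / len(real_scores)
    mean_fake = sum(fake_scores) / len(fake_scores)
    return mean_fake - mean_real


def generator_loss(fake_scores):
    """WGAN generator loss: the negated mean score of generated samples."""
    return -sum(fake_scores) / len(fake_scores)


def clip_weights(weights, c=0.01):
    """Clamp every discriminator parameter to [-c, c] after an update (c is assumed)."""
    return [max(-c, min(c, w)) for w in weights]


d_loss = critic_loss(real_scores=[1.0, 3.0], fake_scores=[0.0, 1.0])
g_loss = generator_loss(fake_scores=[0.0, 1.0])
clipped = clip_weights([0.5, -0.02, 0.005])
```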
The loss functions of the four discriminator networks are defined in the same way. The only difference is that the local discriminator provides the training loss gradient only for the missing region, whereas the global discriminator and the multi-scale discriminators back-propagate the loss gradient over the entire image at different resolutions. The discriminator losses are defined as:
where the input of the local discriminator is the repaired part of the image output by the generator and the corresponding part of the real image.
where the input of one multi-scale discriminator is the image output by the generator and the real image, each down-sampled by a factor of 2.
where the input of the other multi-scale discriminator is the image output by the generator and the real image, each down-sampled by a factor of 4.
In summary, the total loss function for the whole network optimization is defined as:
λ1, λ2, λ3 and λ4 are the weights of the different losses, used to balance their influence on the overall loss function; their specific values need to be set manually during the experiments.
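One plausible way to combine the terms is a weighted sum, sketched below. The assumption that the reconstruction loss enters with weight 1 and that each λ multiplies one of the four adversarial losses is made here for illustration; the function name `total_loss` is hypothetical.

```python
def total_loss(reconstruction, adversarial_losses, lambdas):
    """Reconstruction loss plus a weighted sum of the adversarial losses.

    adversarial_losses and lambdas are paired one-to-one (four of each here).
    """
    if len(adversarial_losses) != len(lambdas):
        raise ValueError("one weight is needed per adversarial loss")
    weighted = sum(lam * adv for lam, adv in zip(lambdas, adversarial_losses))
    return reconstruction + weighted


# Illustrative loss values only; the weights follow the 0.001 setting in the text.
loss = total_loss(0.5, [1.0, 2.0, 3.0, 4.0], [0.001] * 4)
```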
4. Training
The work of the invention is implemented as a deep convolutional adversarial neural network. To train the network effectively, the training process is divided into three stages. In the first stage, the generator network is trained with the reconstruction loss only, so that the generator produces blurred repair content; this stage involves no adversarial training and no adversarial loss. In the second stage, all the discriminator networks are trained using the generator obtained in the first stage, and all the discriminators are updated with the adversarial loss. In the last stage, the generator and all the discriminators are trained jointly and adversarially. Each stage prepares for the next, which greatly improves the effectiveness and efficiency of network training. Training is carried out by back-propagation.
When training with the adversarial loss, the discriminators must be prevented from being too strong at the beginning of training. Default hyperparameters (e.g. the learning rate) are used, and λ1, λ2, λ3 and λ4 are all set to 0.001. For training, the images are resized and cropped to 256 × 256 and used as input. For the missing region, the pixels of the central square region of the image are set to 0; this missing part covers approximately 1/4 of the image. The input of the global discriminator is the full 256 × 256 image, the input of the local discriminator is the 128 × 128 repaired area, and the inputs of the two multi-scale discriminators are the full image at 128 × 128 and 64 × 64, respectively. The network model can fill in missing regions plausibly, but the generated region sometimes shows color inconsistency with the surrounding region. To avoid this, a simple post-processing step blends the repaired area with the colors of the surrounding pixels; specifically, the present invention uses Poisson image blending.
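The masking and cropping described above can be sketched as follows. The helper names are hypothetical, and the 128 × 128 hole size is inferred from the stated local discriminator input and the 1/4 coverage.

```python
def center_mask(size=256, hole=128):
    """Binary mask with 1 inside the central hole x hole square and 0 elsewhere."""
    start = (size - hole) // 2
    return [
        [1 if start <= r < start + hole and start <= c < start + hole else 0
         for c in range(size)]
        for r in range(size)
    ]


def apply_mask(image, mask):
    """Set the masked pixels of the image to 0, leaving the rest unchanged."""
    return [[0 if m else p for p, m in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)]


def crop_center(image, hole=128):
    """Cut out the central square, i.e. the region seen by the local discriminator."""
    start = (len(image) - hole) // 2
    return [row[start:start + hole] for row in image[start:start + hole]]


mask = center_mask()
covered = sum(map(sum, mask)) / (256 * 256)  # fraction of the image that is missing
```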
5. Results and analysis of the experiments
The multi-scale generative adversarial network model is trained on 100000 images taken from the CelebA data set, of which 80000 are used for training and 20000 for testing; the data set includes a wide variety of face images. The batch size is set to 32. The generator network is first trained for 20000 iterations, the discriminators are then trained for 10000 iterations, and finally the whole network is trained jointly for 70000 iterations. The hardware is: CPU Intel i7-8700; GPU RTX 2080 Ti (11 GB); memory DDR4-3000, 32 GB. The code runs under the PyTorch deep learning framework, and training the whole network takes about 5 days.
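To put the iteration counts in perspective, the short calculation below converts them into passes over the 80000 training images at batch size 32. The stage labels are illustrative names, and the calculation assumes one batch per iteration.

```python
TRAIN_IMAGES = 80_000
BATCH_SIZE = 32

# Iterations per training stage, as stated in the text.
STAGES = {
    "generator_only": 20_000,
    "discriminators_only": 10_000,
    "joint_adversarial": 70_000,
}


def epochs(iterations, batch_size=BATCH_SIZE, n_images=TRAIN_IMAGES):
    """Number of passes over the training set, assuming one batch per iteration."""
    return iterations * batch_size / n_images


epochs_per_stage = {name: epochs(n) for name, n in STAGES.items()}
total_epochs = sum(epochs_per_stage.values())
```

Under that assumption the three stages correspond to 8, 4 and 28 passes over the training set, 40 in total.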
The experimental results were compared with those of the "Context Encoders" method, which uses a single discriminator acting on the repair area, and the "Globally and Locally Consistent Image Completion" method, which uses a generator and two discriminators. For a fair comparison, these models were retrained for the same number of iterations. The results are shown in Fig. 5.
In each test image the network automatically masks the area in the middle of the image, since the middle usually contains the important components of the face (e.g. eyes, mouth, eyebrows, hair, nose). The four rows show the repair results for four different test images. The first column (a) shows the four original images. The second column (b) shows the images with the mask applied. The third column (c) shows the repair results of the "Context Encoders" network; because this structure has no notion of global consistency, its results show obvious global inconsistency and the repaired area is very blurred, so it does not meet the requirements of the image restoration task.
The fourth column (d) shows the repair results of the "Globally and Locally Consistent Image Completion" method, which uses a global discriminator and a local discriminator. By introducing the adversarial loss, this network repairs the image more plausibly: the local discriminator acts on the missing region so that it is repaired successfully, and the global discriminator acts on the whole image, penalizing global inconsistency and forcing the network to generate a globally consistent image, so that obvious edge differences are eliminated and the repair result is better. The fifth column (e) shows the repair results of the algorithm proposed by the present invention, which uses the WGAN loss to make the training of the whole adversarial network more stable and adds multi-scale discriminators that are trained together with the global and local discriminators. Compared with (d), (e) shows some improvement in the details of the restoration, the image is more coherent as a whole, and the restoration effect is better.
Besides the visual effect, the invention also carries out quantitative evaluation on the CelebA test data set by using the PSNR and the SSIM, and the two indexes are calculated between the repair result obtained by different methods and the original face image.
The first index is peak signal-to-noise ratio (PSNR), an objective criterion for evaluating images, which directly measures the difference in pixel values, with larger values indicating less distortion. Assuming that the two input images are X and Y, respectively, the calculation formula is as follows:
where MSE denotes the mean squared error between the restored image X and the real image Y, H and W denote the height and width of the image, and n denotes the number of bits per pixel, generally 8, i.e. 256 gray levels per pixel. The results are shown in Table 3.
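A minimal sketch of the PSNR computation, using the standard definition PSNR = 10 · log10((2^n − 1)^2 / MSE) with the symbols defined above, is given below. The function names are illustrative, and the images are plain 2-D lists of gray levels.

```python
import math


def mse(x, y):
    """Mean squared error between two H x W images given as 2-D lists."""
    height, width = len(x), len(x[0])
    squared = sum((x[i][j] - y[i][j]) ** 2
                  for i in range(height) for j in range(width))
    return squared / (height * width)


def psnr(x, y, n_bits=8):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    error = mse(x, y)
    if error == 0:
        return math.inf
    peak = 2 ** n_bits - 1  # 255 for 8-bit images
    return 10 * math.log10(peak ** 2 / error)


real = [[100, 100], [100, 100]]
restored = [[110, 90], [110, 90]]  # every pixel is off by 10 gray levels, so MSE = 100
value = psnr(restored, real)
```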
TABLE 3 results of quantitative experiments on PSNR
The second index is a Structural Similarity Index (SSIM), which is an index for measuring the similarity between two images, and is a number between 0 and 1, and a larger value represents a smaller difference between a repaired image and a real image, i.e., the image quality is better. When the two images are identical, their value is 1. Assuming that the two input images are X and Y, respectively, the calculation formula is as follows:
where μX and μY denote the means of X and Y, σX and σY denote the standard deviations of X and Y, σXY denotes the covariance of X and Y, and c1 and c2 are constants that keep the denominator from being 0. The calculation results are shown in Table 4.
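The sketch below evaluates the standard SSIM expression, (2·μX·μY + c1)(2·σXY + c2) / ((μX² + μY² + c1)(σX² + σY² + c2)), once over the whole image. Several choices are assumptions made for illustration: a single global window rather than local sliding windows, population (divide-by-N) statistics, and the commonly used constants c1 = (0.01·L)² and c2 = (0.03·L)² with L = 2^n − 1. The function name `ssim_global` is hypothetical.

```python
def ssim_global(x, y, n_bits=8, k1=0.01, k2=0.03):
    """SSIM computed once over the whole image (2-D lists of gray levels)."""
    xs = [p for row in x for p in row]
    ys = [p for row in y for p in row]
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    var_x = sum((p - mu_x) ** 2 for p in xs) / n
    var_y = sum((q - mu_y) ** 2 for q in ys) / n
    cov_xy = sum((p - mu_x) * (q - mu_y) for p, q in zip(xs, ys)) / n
    peak = 2 ** n_bits - 1
    c1 = (k1 * peak) ** 2  # stabilizes the mean (luminance) term
    c2 = (k2 * peak) ** 2  # stabilizes the variance and covariance term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


real = [[10, 20], [30, 40]]
restored = [[10, 20], [30, 50]]
same = ssim_global(real, real)
different = ssim_global(restored, real)
```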
TABLE 4 results of quantitative experiments on SSIM
In addition, to show that the proposed algorithm is suitable for restoring various types of images, the model was also trained on 50000 images from the ImageNet data set and on 50000 images from the Places2 data set. The training method is the same as for the CelebA data set. The experimental results are shown in Fig. 6 and Fig. 7, respectively, and indicate that the model also performs well on the ImageNet and Places2 data sets.
In summary, the invention analyzes the shortcomings of existing algorithms, introduces the principle of the generative adversarial network and applies it to image restoration. It provides a multi-scale generative adversarial restoration model consisting of a generator and several adversarial discriminators, and synthesizes the missing content from random noise using a reconstruction loss and several adversarial losses. Following the idea of WGAN, the EM distance is used to model the data distribution, which improves both network stability and the restoration effect. Verification on the CelebA data set, using subjective and objective evaluation methods, shows that the proposed image restoration algorithm based on the multi-scale generative adversarial network achieves higher restoration performance than current image restoration methods. Corresponding training and testing on the ImageNet and Places2 data sets show that the algorithm can be applied to the restoration of various types of pictures with good results.
The above description only illustrates the present invention and is not to be construed as limiting its scope, which is intended to cover all modifications, equivalents and improvements within the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. An image restoration method based on a multi-scale generative adversarial network model, characterized in that the method comprises the following steps:
constructing a deep generative adversarial restoration model consisting of a generator and an adversarial discriminator, and synthesizing the missing content from random noise using a reconstruction loss and an adversarial loss;
improving the network structure of the discriminator, providing a multi-scale discriminator structure on the basis of a global discriminator and a local discriminator, and adversarially training the multi-scale discriminators on images of different resolutions to repair the images;
using dilated convolution in the generator, and post-processing the repaired image with the Poisson blending method;
verifying the advantages of the image restoration algorithm based on the generative adversarial network model, and its restoration effect, on the CelebA, ImageNet and Places2 data sets.
2. The method as claimed in claim 1, wherein the multi-scale generative adversarial network model comprises one generation network for image restoration and four additional discriminator networks that assist training, namely two multi-scale discriminator networks, a global discriminator network and a local discriminator network.
3. The method as claimed in claim 1, wherein a convolutional autoencoder, namely a standard encoder-decoder structure, is used as the generator model; the encoder takes an image with a missing region as input and generates a latent feature representation of the image through convolution operations;
the decoder uses the latent feature representation to restore the original resolution through transposed convolution operations, generating the image content of the missing region; unlike the original GAN model, which starts directly from a noise vector, the hidden representation obtained from the encoder captures more of the variations and relationships between the unknown and known regions, and is then input to the decoder to generate the content; the intermediate layers use dilated convolution, which allows each output pixel to be computed from a larger input area without additional parameters or computation, so that, compared with a standard convolutional layer, each output pixel is influenced by a larger pixel area of the input image; the generator uses a standard autoencoder network with dilated convolutional layers added, namely the generator network with two of its middle convolutional layers removed; from left to right are listed the network layer type, the convolution kernel size, the amount of zero padding of the convolution kernel, the stride and the number of output channels of the layer.
4. The image inpainting method based on the multi-scale generative adversarial network model as claimed in claim 1, wherein each discriminator is based on a convolutional neural network that compresses the image into a small feature vector and predicts the probability that the image is real;
first, a local discriminator determines whether the synthesized content of the missing region is real, which helps the network generate the missing content and encourages the generated objects to be semantically valid;
because the local discriminator only sees a local region, another network structure, named the global discriminator, is introduced to judge the fidelity of the image as a whole;
finally, a multi-scale discriminator network structure is provided; the basic idea is to down-sample the real and synthesized images by factors of 2 and 4, respectively, and to train two discriminators that distinguish real images from restored images at the two different scales; the generator's repair process is thus strictly constrained by two discriminator networks that take images of different resolutions as input; the two multi-scale discriminators and the global discriminator have similar architectures but receptive fields of different sizes;
the last two fully connected layers are removed from the global discriminator and the local discriminator in the model, and the other structures remain unchanged; the layer type, convolution kernel size, stride and number of output channels of each layer are listed in order from left to right; a, b, c and d are respectively
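The multi-scale inputs described in this claim can be sketched as follows. The claim does not state which down-sampling operator is used, so average pooling is an assumption here, and the helper names `downsample` and `multiscale_inputs` are hypothetical.

```python
import numpy as np


def downsample(image: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool an (H, W, C) image by an integer factor (assumed operator)."""
    h, w, c = image.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by factor")
    blocks = image.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


def multiscale_inputs(image: np.ndarray):
    """Return the 2x and 4x down-sampled copies fed to the two multi-scale discriminators."""
    return downsample(image, 2), downsample(image, 4)


image = np.ones((256, 256, 3))
half, quarter = multiscale_inputs(image)  # shapes (128, 128, 3) and (64, 64, 3)
```

The same function would be applied to both the real image and the generator output, so that each multi-scale discriminator compares the two at a single resolution.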
5. The image inpainting method based on the multi-scale generative adversarial network model as claimed in claim 1, wherein the loss functions are modeled as follows:
first, a reconstruction loss is introduced for the generator; it is responsible for capturing the structural information of the missing region and keeping it consistent with the context, and is the L2 distance between the pixels of the restored image and those of the original image, where z is the noise mask:
however, when only this loss is used, the restored image content is observed to be blurry and overly smooth, because the L2 loss penalizes outliers heavily and therefore encourages the network to average over multiple plausible hypotheses to avoid large penalties; an adversarial loss is therefore introduced by means of a discriminator, reflecting how far the generator can fool the discriminator and how well the discriminator distinguishes real from fake; the adversarial loss is based on the GAN loss, in which an adversarial discriminator model is learned to provide a loss gradient for the generator model; the adversarial discriminator makes predictions on both the samples produced by the generator and the real samples and tries to tell them apart, while the generator tries to confuse the discriminator by producing samples that look real;
wherein P_data(x) and P_z(z) denote the distributions of the real data x and the noise variable z, respectively; the network is optimized by minimizing the generator loss and maximizing the discriminator loss.
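For reference, the two losses named in this claim are commonly written as shown below. This is a sketch in standard GAN notation rather than the patent's own typeset equations: the symbols $G$, $D$, $L_{rec}$ and $L_{adv}$ are assumed names, and the exact form of the reconstruction term (for example, whether it is restricted to the masked region) is an assumption.

```latex
% Reconstruction loss: squared L2 distance between original image x and restored image G(z)
L_{rec} = \lVert x - G(z) \rVert_2^2

% Standard GAN minimax objective over generator G and discriminator D
\min_G \max_D \; L_{adv}(G, D)
  = \mathbb{E}_{x \sim P_{data}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim P_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The first expectation rewards the discriminator for scoring real samples highly, and the second rewards it for rejecting generated samples; the generator is trained to reduce the second term.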
6. The image inpainting method based on the multi-scale generative adversarial network model as claimed in claim 1, wherein the Wasserstein distance is used as the optimization method for training the GAN: the sigmoid in the last layer is removed; the losses of the generator and the discriminator do not take the logarithm; and each time the discriminator parameters are updated, their absolute values are truncated to no more than a fixed constant (weight clipping):
wherein L denotes the set of 1-Lipschitz functions;
the loss functions of the four discriminator networks are defined in the same way; the only difference is that the local discriminator provides a training loss gradient only for the missing region, whereas the global discriminator and the multi-scale discriminators back-propagate loss gradients over the whole image at different resolutions; the discriminators are defined as:
wherein the input of the local discriminator is the repaired part of the image output by the generator and the corresponding part of the real image;
wherein the input of the first multi-scale discriminator is the generator output image and the real image, each down-sampled by a factor of 2;
wherein the input of the second multi-scale discriminator is the generator output image and the real image, each down-sampled by a factor of 4;
the overall loss function for optimizing the entire network is defined as:
wherein λ1, λ2, λ3 and λ4 are the weights of the different losses, used to balance their influence on the overall loss function; their specific values must be set manually during experiments.
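The Wasserstein-style training described in this claim can be sketched numerically as follows. This follows the standard WGAN formulation and is not taken from the patent's own equations: the function names, the clipping constant `c=0.01`, and the way the four adversarial terms are summed with the reconstruction loss are all assumptions.

```python
import numpy as np


def critic_loss(scores_real, scores_fake) -> float:
    """Loss minimized by a Wasserstein critic: E[D(fake)] - E[D(real)].

    The scores are raw (no sigmoid) and no logarithm is taken.
    """
    return float(np.mean(scores_fake) - np.mean(scores_real))


def generator_adv_loss(scores_fake) -> float:
    """Adversarial loss minimized by the generator: -E[D(fake)]."""
    return float(-np.mean(scores_fake))


def clip_weights(params, c: float = 0.01):
    """Clip every critic parameter to [-c, c] after an update (assumed constant c)."""
    return [np.clip(p, -c, c) for p in params]


def total_generator_loss(rec_loss: float, adv_losses, lambdas) -> float:
    """Assumed combination: reconstruction loss plus weighted adversarial losses."""
    if len(adv_losses) != len(lambdas):
        raise ValueError("one weight is needed per adversarial loss")
    return rec_loss + sum(lam * adv for lam, adv in zip(lambdas, adv_losses))


d_loss = critic_loss([1.0, 3.0], [0.0, 2.0])
g_loss = generator_adv_loss([0.0, 2.0])
clipped = clip_weights([np.array([-1.0, 0.005, 1.0])], c=0.01)
total = total_generator_loss(1.0, [1.0, 2.0, 3.0, 4.0], [0.001] * 4)
```

In this sketch the four entries of `adv_losses` stand for the global, local and two multi-scale discriminators; the order in which λ1 to λ4 map onto them is not stated in the text and is left open.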
7. The image inpainting method based on the multi-scale generative adversarial network model as claimed in claim 1, wherein the training process is divided into three stages; first, the generator network is trained using the reconstruction loss only, so that the generator produces blurry repair content, and this stage includes neither adversarial training nor adversarial loss; second, using the generator network obtained from the first stage, all discriminator networks are trained, and all discriminators are updated using the adversarial loss; in the last stage, the generator and all discriminators are trained jointly and adversarially, and the training process is completed through back-propagation;
when training with the adversarial losses, default hyper-parameters are used and λ1, λ2, λ3 and λ4 are all set to 0.001; training images are resized and cropped to 256 × 256 to serve as input images; for the missing region, the central square region of the image is set to 0, forming the missing part of the image and covering approximately 1/4 of the image; the input to the global discriminator is the full 256 × 256 image, the input to the local discriminator is the 128 × 128 repaired region, and the inputs to the two multi-scale discriminators are full images of 128 × 128 and 64 × 64, respectively.
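The input preparation in this claim (a 256 × 256 image with a central 128 × 128 square set to 0, and the matching crop for the local discriminator) can be sketched as follows. The helper names `center_mask`, `apply_mask` and `crop_hole` are hypothetical, and a square image is assumed.

```python
import numpy as np


def center_mask(size: int = 256, hole: int = 128) -> np.ndarray:
    """Binary mask: 1 inside the central missing square, 0 elsewhere."""
    mask = np.zeros((size, size), dtype=np.float32)
    start = (size - hole) // 2
    mask[start : start + hole, start : start + hole] = 1.0
    return mask


def apply_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Set the masked (missing) pixels of an (H, W, C) image to 0."""
    return image * (1.0 - mask[..., None])


def crop_hole(image: np.ndarray, hole: int = 128) -> np.ndarray:
    """Crop the central square that is passed to the local discriminator."""
    start = (image.shape[0] - hole) // 2
    return image[start : start + hole, start : start + hole]


image = np.ones((256, 256, 3), dtype=np.float32)
mask = center_mask()
masked = apply_mask(image, mask)  # generator input
local_patch = crop_hole(image)    # local discriminator input, 128 x 128
```

A 128 × 128 hole in a 256 × 256 image covers exactly one quarter of the pixels, which matches the "approximately 1/4" stated in the claim.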
8. A restoration system for implementing the image restoration method based on the multi-scale generative adversarial network model according to any one of claims 1 to 7, characterized in that the image restoration system based on the multi-scale generative adversarial network model comprises:
a deep generative adversarial inpainting model building module, used for building a deep generative adversarial inpainting model consisting of a generator and an adversarial discriminator, and for synthesizing the missing content from random noise using the reconstruction loss and the adversarial loss;
an image restoration module, used for improving the network structure of the discriminator, providing a multi-scale discriminator structure on the basis of the global discriminator and the local discriminator, and restoring the image by adversarially training the multi-scale discriminators with images of different resolutions;
an image post-processing module, which uses dilated convolution in the generator and post-processes the repaired image using Poisson blending;
and an image repair module, which verifies the advantages of the image restoration algorithm based on the generative adversarial network model and its restoration quality on the CelebA, ImageNet and Places2 datasets.
9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:
step one, constructing a deep generative adversarial inpainting model consisting of a generator and an adversarial discriminator, and synthesizing the missing content from random noise using the reconstruction loss and the adversarial loss;
step two, improving the network structure of the discriminator, providing a multi-scale discriminator structure on the basis of a global discriminator and a local discriminator, and adversarially training the multi-scale discriminators with images of different resolutions to repair the image;
step three, using dilated convolution in the generator, and post-processing the repaired image using Poisson blending;
and step four, verifying the advantages of the image restoration algorithm based on the generative adversarial network model and its restoration quality on the CelebA, ImageNet and Places2 datasets.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
step one, constructing a deep generative adversarial inpainting model consisting of a generator and an adversarial discriminator, and synthesizing the missing content from random noise using the reconstruction loss and the adversarial loss;
step two, improving the network structure of the discriminator, providing a multi-scale discriminator structure on the basis of a global discriminator and a local discriminator, and adversarially training the multi-scale discriminators with images of different resolutions to repair the image;
step three, using dilated convolution in the generator, and post-processing the repaired image using Poisson blending;
and step four, verifying the advantages of the image restoration algorithm based on the generative adversarial network model and its restoration quality on the CelebA, ImageNet and Places2 datasets.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011021917.1A CN112541864A (en) | 2020-09-25 | 2020-09-25 | Image restoration method based on multi-scale generation type confrontation network model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112541864A (en) | 2021-03-23 |
Family
ID=75013872
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011021917.1A Pending CN112541864A (en) | 2020-09-25 | 2020-09-25 | Image restoration method based on multi-scale generation type confrontation network model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112541864A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109146784A (en) * | 2018-07-27 | 2019-01-04 | 徐州工程学院 | A kind of image super-resolution rebuilding method based on multiple dimensioned generation confrontation network |
CN110222628A (en) * | 2019-06-03 | 2019-09-10 | 电子科技大学 | A kind of face restorative procedure based on production confrontation network |
CN110223259A (en) * | 2019-06-14 | 2019-09-10 | 华北电力大学(保定) | A kind of road traffic fuzzy image enhancement method based on production confrontation network |
CN110378844A (en) * | 2019-06-14 | 2019-10-25 | 杭州电子科技大学 | Motion blur method is gone based on the multiple dimensioned Image Blind for generating confrontation network is recycled |
AU2020100274A4 (en) * | 2020-02-25 | 2020-03-26 | Huang, Shuying DR | A Multi-Scale Feature Fusion Network based on GANs for Haze Removal |
US20200234402A1 (en) * | 2019-01-18 | 2020-07-23 | Ramot At Tel-Aviv University Ltd. | Method and system for end-to-end image processing |
2020: 2020-09-25 CN CN202011021917.1A patent/CN112541864A/en active Pending
Non-Patent Citations (1)
Title |
---|
李克文等: "多尺度生成式对抗网络图像修复算法", 《计算机科学与探索》 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113240613A (en) * | 2021-06-07 | 2021-08-10 | 北京航空航天大学 | Image restoration method based on edge information reconstruction |
CN113345077B (en) * | 2021-06-10 | 2024-03-15 | 西北大学 | PIFU and 3D-GAN-based improved three-dimensional model reconstruction method for Qin cavity character |
CN113345077A (en) * | 2021-06-10 | 2021-09-03 | 西北大学 | Method for reconstructing three-dimensional model of Qin cavity figure improved based on PIFu and 3D-GAN |
CN113362311A (en) * | 2021-06-10 | 2021-09-07 | 山东大学 | Deep generation network assisted functional full crown prosthesis form generation method |
CN113837953A (en) * | 2021-06-11 | 2021-12-24 | 西安工业大学 | Image restoration method based on generation countermeasure network |
CN113837953B (en) * | 2021-06-11 | 2024-04-12 | 西安工业大学 | Image restoration method based on generation countermeasure network |
CN113469177A (en) * | 2021-06-30 | 2021-10-01 | 河海大学 | Drainage pipeline defect detection method and system based on deep learning |
CN113469177B (en) * | 2021-06-30 | 2024-04-26 | 河海大学 | Deep learning-based drainage pipeline defect detection method and system |
CN113256541A (en) * | 2021-07-16 | 2021-08-13 | 四川泓宝润业工程技术有限公司 | Method for removing water mist from drilling platform monitoring picture by machine learning |
CN114359300A (en) * | 2022-03-18 | 2022-04-15 | 成都数之联科技股份有限公司 | Method, device and system for optimizing image segmentation model and storage medium |
CN114359300B (en) * | 2022-03-18 | 2022-06-28 | 成都数之联科技股份有限公司 | Optimization method, device and system of image segmentation model and storage medium |
CN115115783A (en) * | 2022-07-08 | 2022-09-27 | 西南石油大学 | Digital core construction method and system for simulating shale matrix nano-micron pores |
CN115115783B (en) * | 2022-07-08 | 2023-08-15 | 西南石油大学 | Digital rock core construction method and system for simulating shale matrix nano-micro pores |
WO2024022485A1 (en) * | 2022-07-29 | 2024-02-01 | 中国人民解放军总医院第一医学中心 | Computer angiography imaging synthesis method based on multi-scale discrimination |
CN115689910A (en) * | 2022-09-07 | 2023-02-03 | 江苏济远医疗科技有限公司 | Image restoration method for processing multi-scale noise |
CN116320459B (en) * | 2023-01-08 | 2024-01-23 | 南阳理工学院 | Computer network communication data processing method and system based on artificial intelligence |
CN116320459A (en) * | 2023-01-08 | 2023-06-23 | 南阳理工学院 | Computer network communication data processing method and system based on artificial intelligence |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112541864A (en) | Image restoration method based on multi-scale generation type confrontation network model | |
CN111047516B (en) | Image processing method, image processing device, computer equipment and storage medium | |
CN108520503B (en) | Face defect image restoration method based on self-encoder and generation countermeasure network | |
Liu et al. | Robust single image super-resolution via deep networks with sparse prior | |
Liu et al. | Learning converged propagations with deep prior ensemble for image enhancement | |
Zhang et al. | Adaptive residual networks for high-quality image restoration | |
EP3951702A1 (en) | Method for training image processing model, image processing method, network device, and storage medium | |
Kolesnikov et al. | PixelCNN models with auxiliary variables for natural image modeling | |
Tuzel et al. | Global-local face upsampling network | |
CN106408550A (en) | Improved self-adaptive multi-dictionary learning image super-resolution reconstruction method | |
CN112132959A (en) | Digital rock core image processing method and device, computer equipment and storage medium | |
Zhao et al. | Image super-resolution via adaptive sparse representation | |
CN110706303A (en) | Face image generation method based on GANs | |
CN114339409A (en) | Video processing method, video processing device, computer equipment and storage medium | |
CN113658040A (en) | Face super-resolution method based on prior information and attention fusion mechanism | |
CN115457568B (en) | Historical document image noise reduction method and system based on generation countermeasure network | |
Shiri et al. | Identity-preserving face recovery from stylized portraits | |
CN112801914A (en) | Two-stage image restoration method based on texture structure perception | |
Liu et al. | Survey on gan‐based face hallucination with its model development | |
Uddin et al. | A perceptually inspired new blind image denoising method using $L_{1}$ and perceptual loss | |
CN116645569A (en) | Infrared image colorization method and system based on generation countermeasure network | |
CN117151990B (en) | Image defogging method based on self-attention coding and decoding | |
CN116523985B (en) | Structure and texture feature guided double-encoder image restoration method | |
US20230110393A1 (en) | System and method for image transformation | |
CN114862699B (en) | Face repairing method, device and storage medium based on generation countermeasure network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210323 |