CN112541864A

CN112541864A - Image restoration method based on a multi-scale generative adversarial network model

Info

Publication number: CN112541864A
Application number: CN202011021917.1A
Authority: CN
Inventors: 邵明文; 张文龙; 宋晓霞
Original assignee: Shandong To Letter Information Science And Technology Ltd; China University of Petroleum East China
Current assignee: Shandong To Letter Information Science And Technology Ltd; China University of Petroleum East China
Priority date: 2020-09-25
Filing date: 2020-09-25
Publication date: 2021-03-23

Abstract

The invention belongs to the technical field of image restoration and discloses an image restoration method and system based on a multi-scale generative adversarial network model. A deep generative adversarial inpainting model consisting of a generator and an adversarial discriminator is constructed, and the missing content is synthesized from random noise using a reconstruction loss and an adversarial loss. The network structure of the discriminator is improved by proposing a multi-scale discriminator structure, which is trained adversarially to repair the image. The repaired image is post-processed with Poisson blending. Experiments then verify the advantages of the GAN-based image inpainting algorithm and the quality of the restored images. The method synthesizes the missing content from random noise through a multi-scale generative adversarial inpainting model trained with a reconstruction loss and multiple adversarial losses. Following the idea of WGAN, it uses the Earth Mover's (EM) distance to model the data distribution, which improves network stability and the restoration quality.

Description

Image restoration method based on a multi-scale generative adversarial network model

Technical Field

The invention belongs to the technical field of image restoration, and in particular relates to an image restoration method based on a multi-scale generative adversarial network model.

Background

Deep learning has developed rapidly in computer vision, and research on image editing and image generation has achieved significant success. Image restoration (inpainting) is a current research hotspot in deep learning and has considerable practical value in everyday life. However, existing image restoration methods suffer from various problems, and their visual results often fall short of what users require.

Image inpainting is a classic problem in graphics. A region of a certain size is missing at some position in an image, and the task is to recover it from the remaining information so that people cannot tell which part has been repaired. As shown in fig. 8, the missing regions of the two images contain a cup and flowers, respectively, and people can easily complete the images from the surrounding content. Different people would repair an image differently, so the restoration process must follow principles such as structural similarity, texture consistency and structure priority. The task is nevertheless extremely difficult for computers, because the problem has no unique solution. Researchers are chiefly concerned with two questions: how to use other information to assist the restoration, and how to judge whether the result is realistic enough.

Current image inpainting algorithms fall into three main directions, and this invention addresses the deep-learning-based direction. Early methods, such as that of Bertalmio et al., use diffusion equations to iteratively propagate low-level features of the known region along the mask boundary into the unknown region. They perform well but are limited to small, uniform regions. Introducing texture synthesis further improved the results. Zoran and Weiss recover images with missing pixels by learning a patch prior. In recent years, convolutional neural networks (CNNs) have greatly improved performance on tasks such as classification, object detection and semantic segmentation. Ren et al. trained a convolutional network that, combined with an efficient patch-matching algorithm, greatly improves inpainting performance. This approach works well when a similar patch can be found. However, it is likely to fail when the dataset does not contain enough data to fill the unknown region. In object inpainting each part may be unique, so no trustworthy patch for the missing region may exist. External databases can alleviate this problem, but patch matching then requires learning a high-level representation of the specific object class. Wright et al. treat image inpainting as recovering a sparse signal from the input: solving a sparse linear system repairs an image from a corrupted input. However, such algorithms require the image to be highly structured, whereas the goal of image inpainting is for algorithms to complete images without such strict constraints. Vincent et al. introduced the denoising autoencoder, which learns to reconstruct a clean signal from a corrupted input.
Dosovitskiy et al. demonstrated that object images can be reconstructed by inverting deep convolutional network features through a decoder network. Kingma et al. proposed variational autoencoders (VAEs), which impose a prior on the latent variables so that images can be generated by sampling from or interpolating between latent codes. However, VAE-generated images are often blurry because the training objective is based on a pixel-wise Gaussian likelihood.

Larsen et al. improved the VAE by adding an adversarially trained discriminator taken from generative adversarial networks, and showed that it generates more realistic images. The work closest to this invention is the Context Encoder proposed by Pathak et al., which uses an autoencoder to jointly learn visual representations and perform image inpainting. Its results are unsatisfactory in some cases: the repaired region is inconsistent with the rest of the image, and quality is poor at the edges of the repaired region. In 2017, Yang et al. proposed a multi-scale neural patch synthesis method based on the joint optimization of image content and texture constraints. It preserves the contextual structure and also produces high-frequency details by matching and adapting patches to the most similar mid-layer features. At the time, this achieved state-of-the-art inpainting accuracy for high-resolution images. Gao et al. studied the weaknesses of traditional "fixed" models and proposed an on-demand learning algorithm for training deep CNN inpainting models. Its main idea is to use a feedback mechanism to generate the training examples the model needs most, which yields a model that generalizes across difficulty levels. To address the problems of the Context Encoder model, Iizuka et al. of Waseda University extended the design to two discriminators. Trained global and local context discriminators distinguish real images from repaired ones, so the network generates images that are both locally and globally consistent.
Liu et al. pointed out that existing deep-learning-based inpainting methods apply standard convolutional networks to the corrupted image. The convolutional filter responses are then conditioned on both valid pixels and the substitute values (typically the mean) in the missing regions, which often causes artifacts such as color discrepancy and blurriness. They proposed partial convolutions, which can inpaint arbitrary irregular, non-centered regions. In 2018, Yan et al. proposed the "Shift-Net" model for filling missing regions of any shape with sharp structures and fine textures. The encoder features of the known region are shifted to serve as an estimate of the missing part. A guidance loss on the decoder features minimizes the distance between the decoder features after the fully connected layer and the ground-truth encoder features of the missing part. With this constraint, the decoder features of the missing region can guide the shifting of the encoder features from the known region.

In summary, researchers in China and abroad have proposed many image inpainting methods, but most still produce results of limited accuracy and leave considerable room for improvement. Existing methods suffer from low accuracy of the repaired results, visually inconsistent repairs and unstable training. To address these problems, the invention uses a multi-scale generative adversarial network model to obtain repaired images with high precision, high accuracy and strong visual consistency.

Based on the above analysis, the problems and defects of the prior art are as follows:

(1) Early image inpainting methods use diffusion equations to iteratively propagate low-level features of the known region along the mask boundary into the unknown region. They perform well but are limited to small, uniform regions.

(2) Existing CNN-based inpainting methods perform well when a similar patch can be found, but are likely to fail when the dataset does not contain enough data to fill the unknown region.

(3) Some methods treat image inpainting as recovering a sparse signal from the input and repair a corrupted image by solving a sparse linear system. These methods require the image to be highly structured.

(4) Variational autoencoders impose a prior on the latent variables, so images can be generated by sampling from or interpolating between latent codes. However, VAE-generated images are often blurry because the training objective is based on a pixel-wise Gaussian likelihood.

The difficulty in solving the above problems and defects is as follows:

When the damaged area of the image is large, the repair quality is poor. The global and local consistency of the repaired image cannot be maintained, so the repaired image lacks integrity.

The significance of solving the above problems and defects is as follows:

Solving them makes image restoration more stable and yields repaired images with high precision, high accuracy and strong visual consistency, improving overall restoration quality.

Disclosure of Invention

To address the problems in the prior art, the invention provides an image restoration method based on a multi-scale generative adversarial network model.

The invention is implemented as follows. An image restoration method based on a multi-scale generative adversarial network model comprises the following steps:

Step one: construct a deep generative adversarial inpainting model consisting of a generator and an adversarial discriminator, and synthesize the missing content from random noise using a reconstruction loss and an adversarial loss.

Step two: improve the network structure of the discriminator by proposing a multi-scale discriminator structure on top of the global and local discriminators. Repair images by training the multi-scale discriminators adversarially on images of different resolutions.

Step three: use dilated convolution in the generator, and post-process the repaired image with Poisson blending.

Step four: verify the advantages of the GAN-based image inpainting algorithm and the restoration quality on the CelebA, ImageNet and Places2 datasets.

Further, in step one, the multi-scale adversarial network model comprises one generator network for image restoration and four additional discriminator networks that assist training. These are two multi-scale discriminators, a global discriminator and a local discriminator.

Further, in step one, the generator uses a convolutional autoencoder as the generator model, that is, a standard encoder-decoder architecture. The encoder takes an image with missing regions as input and produces a latent feature representation of the image through convolutions. The decoder uses this latent representation to restore the original resolution through transposed convolutions, producing the content of the missing region. The original GAN model starts directly from a noise vector. By contrast, the latent representation obtained from the encoder captures more of the variations and relationships between the unknown and known regions, and is then fed to the decoder to generate the content. The intermediate layers use dilated convolutions, so each output pixel is computed from a larger input area without extra parameters or computation. Compared with a standard convolutional layer, a dilated convolutional network computes each output pixel under the influence of a larger pixel area of the input image. Without dilated convolution, each output pixel would depend only on a small pixel area, which limits how much contextual information the network can use for image synthesis. The generator is a standard autoencoder network with dilated convolutional layers added, and two convolutional layers in the middle of the generator network are removed. For each layer of the generator, the layer type, convolution kernel size, number of zero-padding pixels, stride and number of output channels are listed from left to right.

Further, in step one, each discriminator is a convolutional neural network that compresses the image into a small feature vector and predicts the probability that the image is real.

First, a local discriminator judges whether the synthesized content of the missing region is realistic. This helps the network generate the missing content and encourages the generated objects to be semantically valid. Because the local discriminator only sees a local region, its limitations are also obvious. The local discriminator loss can neither regularize the global structure of a face nor ensure consistency between the inside and outside edges of the missing region. As a result, the pixel values of the repaired picture are noticeably inconsistent along the boundary of the repaired region. To overcome this limitation, a second network, the global discriminator, is introduced to judge the realism of the image as a whole.

Finally, a multi-scale discriminator network structure is proposed. The basic idea is to downsample the real and synthesized images by factors of 2 and 4 and to train two discriminators that distinguish real from restored images at these two scales. The generator's repair process is thus tightly constrained by two discriminator networks whose inputs have different resolutions. The two multi-scale discriminators have an architecture similar to that of the global discriminator but different receptive fields. Training with the multi-scale discriminators in addition to the global discriminator guides the generator toward repaired pictures with stronger global consistency and finer details than training with the global discriminator alone, so the repair of the whole picture looks more plausible. Adding the two multi-scale discriminators to the network therefore yields better restored pictures.

The last two fully connected layers of the global and local discriminators in the model are removed, and the rest of their structure is unchanged. The network architectures of the global, local and multi-scale discriminators are shown in table 2, where each row lists, from left to right, the layer type, convolution kernel size, stride and number of output channels. In table 2, a, b, c and d are respectively

Further, in step one, the loss functions are modeled as follows:

First, a reconstruction loss is introduced for the generator. It captures the structural information of the missing region and keeps it consistent with the context. The loss is the L₂ distance between the pixels of the restored image and the original image, where z denotes the mask:

However, when only the reconstruction loss is used, the restored content tends to be blurry and over-smoothed. The L₂ loss penalizes outliers heavily, so the network learns to average smoothly over the plausible hypotheses to avoid large penalties. A discriminator is therefore used to introduce an adversarial loss. This loss measures how well the generator can fool the discriminator and how well the discriminator can tell real from fake. The adversarial loss is based on the GAN loss: an adversarial discriminator model is learned that supplies loss gradients to the generator model. The adversarial discriminator makes predictions on both the samples produced by the generator and real samples and tries to tell them apart. Meanwhile, the generator tries to confuse the discriminator by producing samples that are as "real" as possible:

where P_data(x) and P_z(z) denote the distributions of the real data x and the noise variable z, respectively. The network is optimized by minimizing the generator loss and maximizing the discriminator loss.

Further, the GAN is trained with the Wasserstein distance as the optimization objective. Specifically, the sigmoid is removed from the last layer of the discriminator, and the losses of the generator and the discriminator no longer take the logarithm. After every update of the discriminator parameters, their absolute values are clipped so that they do not exceed a fixed constant, i.e. weight clipping:

where the function class in the formula is the set of 1-Lipschitz functions.
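The Wasserstein distance being estimated is W(P_r, P_g) = sup over 1-Lipschitz f of E_{x∼P_r}[f(x)] − E_{x∼P_g}[f(x)]. A minimal sketch of the resulting losses and of weight clipping follows, operating on lists of raw critic scores. The clipping constant c = 0.01 is the default from the original WGAN paper and is an assumption here, as are the function names.

```python
def critic_loss(real_scores, fake_scores):
    """WGAN critic loss on raw scores (no sigmoid, no logarithm).

    The critic maximizes mean(f(real)) - mean(f(fake)), an estimate of the
    Wasserstein distance, so the quantity to minimize is its negative.
    """
    mean_real = sum(real_scores) / len(real_scores)
    mean_fake = sum(fake_scores) / len(fake_scores)
    return -(mean_real - mean_fake)


def generator_loss(fake_scores):
    """The generator tries to raise the critic's scores on its samples."""
    return -sum(fake_scores) / len(fake_scores)


def clip_weights(weights, c=0.01):
    """Weight clipping: clamp every critic parameter to [-c, c] after each update."""
    return [max(-c, min(c, w)) for w in weights]


print(critic_loss([1.0, 3.0], [0.0, 2.0]))  # -1.0
print(clip_weights([0.5, -0.5, 0.005]))     # [0.01, -0.01, 0.005]
```

Clipping is a crude way to keep the critic approximately Lipschitz. It acts on the parameters, not on the gradients.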

The loss functions of the four discriminator networks are defined in the same way, with one difference. The local discriminator provides training loss gradients only for the missing region. The global discriminator and the multi-scale discriminators back-propagate loss gradients over the entire image at different resolutions. The discriminators are defined as follows:

The input of the local discriminator is the repaired region of the generator's output image and the corresponding region of the real image.

The input of the global discriminator is the generator's output image and the real image.

The input of the first multi-scale discriminator is the generator's output image and the real image, each downsampled by a factor of 2.

The input of the second multi-scale discriminator is the generator's output image and the real image, each downsampled by a factor of 4.

The overall loss function optimized by the whole network is defined as:

where λ₁, λ₂, λ₃ and λ₄ are the weights of the different losses and balance their influence on the overall loss function. Their specific values are set manually during experiments.
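One plausible way to write the overall objective is sketched below. Two details are assumptions: the pairing of each weight with a particular discriminator (λ₁ local, λ₂ global, λ₃ and λ₄ for the 2× and 4× multi-scale discriminators) and the unit weight on the reconstruction loss. The text only states that λ₁ to λ₄ weight the different losses.

```latex
\mathcal{L} = \mathcal{L}_{rec}
  + \lambda_1 \mathcal{L}_{adv}^{local}
  + \lambda_2 \mathcal{L}_{adv}^{global}
  + \lambda_3 \mathcal{L}_{adv}^{\times 2}
  + \lambda_4 \mathcal{L}_{adv}^{\times 4}
```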

Further, in step two, training is divided into three stages. First, the generator network is trained with the reconstruction loss only, which lets it produce blurry repair content. This stage involves no adversarial training and no adversarial loss. Second, all discriminator networks are trained against the generator obtained in the first stage, and each discriminator is updated with the adversarial loss. The last stage trains the generator and all discriminators jointly and adversarially. Each stage prepares for the next, which greatly improves the effectiveness and efficiency of network training. All stages are optimized by back-propagation.
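The three-stage schedule can be summarized as a function of the iteration count. In this sketch the stage lengths `t_g` and `t_d` and the network labels such as `D_local` are hypothetical, because the text does not give iteration counts.

```python
# Hypothetical labels for the four discriminators described in the text.
ALL_DISCRIMINATORS = ("D_local", "D_global", "D_scale2", "D_scale4")


def training_stage(iteration, t_g, t_d):
    """Return (stage, networks updated, losses used) for a given iteration.

    t_g: iterations of stage 1 (generator only, reconstruction loss only).
    t_d: iterations of stage 2 (discriminators only, stage-1 generator frozen).
    Afterwards, stage 3 trains the generator and all discriminators jointly.
    """
    if iteration < t_g:
        return 1, ("G",), ("reconstruction",)
    if iteration < t_g + t_d:
        return 2, ALL_DISCRIMINATORS, ("adversarial",)
    return 3, ("G",) + ALL_DISCRIMINATORS, ("reconstruction", "adversarial")


for it in (0, 99, 100, 109, 110):
    stage, nets, losses = training_stage(it, t_g=100, t_d=10)
    print(it, stage, nets, losses)
```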

For training with the adversarial losses, the default hyperparameters are used: λ₁, λ₂, λ₃ and λ₄ are all set to 0.001. Images are resized and cropped to 256 × 256 to form the input images. The input pixels of a central square region of each image are set to 0. This square is the missing part of the image and covers about 1/4 of it. The global discriminator takes the full 256 × 256 image as input, and the local discriminator takes the 128 × 128 repaired region. The two multi-scale discriminators take full images of 128 × 128 and 64 × 64, respectively.
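The input layout described above can be sketched as follows. The sketch builds a 256 × 256 mask whose central 128 × 128 square is missing (exactly one quarter of the pixels), the zeroed generator input, and the crop given to the local discriminator. Images are plain lists of rows, and the helper names are hypothetical.

```python
def center_mask(size=256, hole=128):
    """Binary mask: 1 inside the central square hole, 0 on known pixels."""
    top = (size - hole) // 2
    return [[1 if top <= r < top + hole and top <= c < top + hole else 0
             for c in range(size)] for r in range(size)]


def apply_mask(image, mask):
    """Generator input: missing pixels set to 0, known pixels kept."""
    return [[0 if m else p for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def crop_hole(image, hole=128):
    """Central hole x hole region, i.e. the repaired area seen by the local discriminator."""
    top = (len(image) - hole) // 2
    return [row[top:top + hole] for row in image[top:top + hole]]


mask = center_mask()
print(sum(map(sum, mask)) / (256 * 256))  # 0.25, i.e. a quarter of the image
```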

Another object of the invention is to provide a restoration system that implements the image restoration method based on the multi-scale generative adversarial network model. The system comprises:

a deep generative adversarial inpainting model construction module, configured to construct a deep generative adversarial inpainting model consisting of a generator and an adversarial discriminator, and to synthesize the missing content from random noise using a reconstruction loss and an adversarial loss;

an image restoration module, configured to improve the network structure of the discriminator by providing a multi-scale discriminator structure on top of the global and local discriminators, and to restore images by training the multi-scale discriminators adversarially on images of different resolutions;

an image post-processing module, configured to use dilated convolution in the generator and to post-process the repaired image with Poisson blending;

and an image restoration verification module, configured to verify the advantages of the GAN-based image inpainting algorithm and the restoration quality on the CelebA, ImageNet and Places2 datasets.

A further object of the invention is to provide a computer device comprising a memory and a processor. The memory stores a computer program which, when executed by the processor, causes the processor to perform the following steps:

step one, constructing a deep generative adversarial inpainting model consisting of a generator and an adversarial discriminator, and synthesizing the missing content from random noise using a reconstruction loss and an adversarial loss;

step two, improving the network structure of the discriminator by providing a multi-scale discriminator structure on top of the global and local discriminators, and repairing images by training the multi-scale discriminators adversarially on images of different resolutions;

step three, using dilated convolution in the generator, and post-processing the repaired image with Poisson blending;

Another object of the invention is to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:

step four, verifying the advantages of the GAN-based image inpainting algorithm and the restoration quality on the CelebA, ImageNet and Places2 datasets.

Taken together, the above technical solutions give the invention the following advantages and positive effects. The invention provides an image restoration method based on a multi-scale generative adversarial network model. A multi-scale generative adversarial inpainting model, consisting of a generator and several adversarial discriminators, synthesizes the missing content from random noise using a reconstruction loss and multiple adversarial losses. Following the idea of WGAN, the EM distance is used to model the data distribution, which improves network stability and restoration quality. Experiments on the CelebA dataset, assessed with both subjective and objective evaluation, show that the proposed algorithm achieves higher restoration performance than current image inpainting methods. Corresponding training and testing on the ImageNet and Places2 datasets show that the algorithm applies to many types of pictures with good results. The method is therefore of great significance in fields such as facial restoration for public-security criminal investigation, image scaling, removal of unwanted objects, lossy image compression and biomedical imaging.

Comparison of technical and experimental effects. The following evaluation metrics are used:

The first metric is the Peak Signal-to-Noise Ratio (PSNR), an objective image quality criterion that measures the pixel-level difference between the real image and the repaired image. A larger value indicates less distortion.
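PSNR is computed as 10·log10(MAX² / MSE). MAX is the largest possible pixel value (255 for 8-bit images), and MSE is the mean squared error between the two images. The sketch below assumes 8-bit pixel values given as flat lists. It is illustrative only, not the evaluation code used for the reported results.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """PSNR in dB between two equally sized images given as flat pixel lists."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(max_value ** 2 / mse)


print(round(psnr([10, 20, 30], [11, 21, 31]), 2))  # 48.13 (MSE = 1)
```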

Quantitative experimental results on PSNR

The second metric is the Structural Similarity Index (SSIM), which evaluates the structural similarity between two images. Its value is at most 1 and in practice usually lies between 0 and 1. A larger value indicates a smaller difference between the repaired image and the real image.
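A simplified SSIM that treats the whole image as a single window is sketched below, using the usual constants C1 = (0.01·MAX)² and C2 = (0.03·MAX)². Standard SSIM averages the same statistic over local windows (commonly 11 × 11 Gaussian). This global version is therefore an assumption made to keep the example short, and it will not reproduce library values exactly.

```python
def ssim_global(x, y, max_value=255.0):
    """Single-window SSIM between two images given as flat pixel lists."""
    n = len(x)
    c1 = (0.01 * max_value) ** 2  # stabilizes the luminance term
    c2 = (0.03 * max_value) ** 2  # stabilizes the contrast/structure term
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


print(ssim_global([10, 50, 90, 200], [10, 50, 90, 200]))  # identical images
print(ssim_global([0, 255, 0, 255], [255, 0, 255, 0]))    # inverted pattern: below 0
```

The second call shows why "between 0 and 1" is only the usual range: anti-correlated images give a negative value.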

Quantitative experimental results on SSIM

Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. These drawings show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the image inpainting method based on the multi-scale generative adversarial network model according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of the generative adversarial network model according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of a multi-scale discriminator model according to an embodiment of the invention.

Fig. 4 is a schematic diagram of a network architecture according to an embodiment of the present invention.

Fig. 5 compares the repair results of different models according to an embodiment of the present invention.

In the figure: (a) is the original image; (b) is the image with the missing region; (c) is the CE (Context Encoder) result; (d) is the GLCIC result; (e) is the result of the algorithm provided by the present invention.

Fig. 6 is a schematic diagram of a repair result on the ImageNet dataset according to the embodiment of the present invention.

FIG. 7 is a diagram illustrating the repair result on the Places2 data set according to an embodiment of the present invention.

Fig. 8 shows the repair results for two different pictures according to an embodiment of the present invention.

In the figure: (a) is the original picture; (b) is the picture with the missing region; (c) is the repaired picture.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

To address the problems in the prior art, the invention provides an image restoration method based on a multi-scale generative adversarial network model, which is described in detail below with reference to the accompanying drawings.

As shown in fig. 1, the image restoration method based on the multi-scale generative adversarial network model provided by an embodiment of the present invention includes the following steps:

S101, constructing a deep generative adversarial inpainting model consisting of a generator and an adversarial discriminator, and synthesizing the missing content from random noise using a reconstruction loss and an adversarial loss.

S102, improving the network structure of the discriminator by proposing a multi-scale discriminator structure on top of the global and local discriminators, and repairing images by training the multi-scale discriminators adversarially on images of different resolutions.

S103, using dilated convolution in the generator, and post-processing the repaired image with Poisson blending.

S104, verifying the advantages of the GAN-based image inpainting algorithm and the restoration quality on the CelebA, ImageNet and Places2 datasets.

The present invention will be further described with reference to the following examples.

1. Summary of the invention

First, a deep generative adversarial inpainting model consisting of a generator and an adversarial discriminator is proposed. It synthesizes the missing content from random noise using a reconstruction loss and an adversarial loss. Second, a multi-scale discriminator structure is proposed, and images are repaired through adversarial training on images of different resolutions. Third, the generator uses dilated convolution to reduce the information lost when the image is downsampled, and the repaired image is post-processed with the widely used Poisson blending method. Finally, experiments demonstrate the advantages of the algorithm and its restoration quality.
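Poisson blending keeps the gradients of the generated content while forcing the filled region to agree with the known pixels at its boundary. The 1-D sketch below illustrates the idea by solving the discrete Poisson equation with Gauss-Seidel iterations. A practical implementation works in 2-D with the four-neighbour Laplacian, and the function name and iteration count here are assumptions.

```python
def poisson_blend_1d(target, source, start, end, iterations=500):
    """Gauss-Seidel solution of the 1-D discrete Poisson equation

        f[i-1] - 2 f[i] + f[i+1] = source[i-1] - 2 source[i] + source[i+1]

    for start <= i < end, with f equal to target outside [start, end)
    (Dirichlet boundary). Requires 1 <= start < end <= len(target) - 1.
    """
    if not 1 <= start < end <= len(target) - 1:
        raise ValueError("region must leave one known pixel on each side")
    f = list(target)
    f[start:end] = source[start:end]  # initial guess: the generated content
    for _ in range(iterations):
        for i in range(start, end):
            guidance = source[i - 1] - 2 * source[i] + source[i + 1]
            f[i] = (f[i - 1] + f[i + 1] - guidance) / 2
    return f


# Known pixels sit at level 10; the generated patch has a bump of height 5.
target = [10.0] * 6
source = [0.0, 0.0, 5.0, 5.0, 0.0, 0.0]
print([round(v, 6) for v in poisson_blend_1d(target, source, 2, 4)])
# [10.0, 10.0, 15.0, 15.0, 10.0, 10.0]
```

The bump keeps its height of 5 but is shifted to sit on the surrounding level of 10. This is the seam-removing behavior that makes Poisson blending useful as a post-processing step.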

2. Related work

The generative adversarial network (GAN), proposed by Goodfellow et al. in 2014, was a milestone in the development of deep learning. GANs overcame the blurriness of images generated by traditional VAEs, achieved striking results, and can in principle generate large numbers of sharp images.

GAN is mainly inspired by the zero-sum game in game theory. The network consists of two mutually adversarial networks, the generator G and the discriminator D, as shown in fig. 2. Through the continual game between G and D, G learns the distribution of the real data. When the adversarial network is used for image generation, G can generate realistic images from noise after sufficient training. The roles of G and D are as follows. G is the generative network: it takes random noise z as input and generates a fake picture G(z) intended to fool D. D is the discriminative network that judges whether a picture is "real". Its input is a picture, either a real picture from the dataset or a picture generated by G. Its output is the probability that the input is real: an output of 1 means D considers the picture real, and an output of 0 means D considers it fake (i.e. generated by G). During training, G aims to generate fake images realistic enough to fool D, while D aims to distinguish G's fake images from real images. The training of G and D thus forms a dynamic game that eventually reaches an equilibrium, the Nash equilibrium. In the ideal outcome, G generates pictures realistic enough that D cannot tell whether they are real, so D outputs 0.5. The result is a generative model G that can be used to generate pictures.
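The equilibrium described above can be made concrete. For a fixed G, the discriminator that maximizes the GAN value at a point x is D*(x) = p_data(x) / (p_data(x) + p_g(x)) (Goodfellow et al., 2014). When G has learned the data distribution, D* equals 0.5 everywhere and the value falls to −log 4. The sketch below only evaluates these formulas for given density values and does not train a network; the function names are illustrative.

```python
import math


def optimal_discriminator(p_data, p_g):
    """Best response of D at a point x for a fixed G: p_data / (p_data + p_g)."""
    return p_data / (p_data + p_g)


def value(d_real, d_fake):
    """Per-sample GAN value log D(x) + log(1 - D(G(z))); D maximizes it, G minimizes it."""
    return math.log(d_real) + math.log(1.0 - d_fake)


# At equilibrium the generator matches the data density, so D outputs 0.5.
d = optimal_discriminator(0.2, 0.2)
print(d)            # 0.5
print(value(d, d))  # -log(4), about -1.386
```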

One of the main problems of GANs is instability during learning, such as failure to converge and vanishing or exploding gradients, which has motivated a great deal of research. The Wasserstein GAN (WGAN) proposed by Arjovsky et al. improves the GAN from the perspective of the loss function; with the improved loss, WGAN achieves good results even with a network built only from fully connected layers, and it resolves the problem of unstable training. Gulrajani et al. improved on WGAN by replacing its weight-clipping Lipschitz constraint with a gradient penalty (WGAN-GP), which addresses vanishing and exploding gradients during training and speeds up convergence. The LSGAN model proposed by Mao et al. replaces the GAN loss with a least-squares loss, which also alleviates unstable training, poor image quality, and insufficient diversity of the generated images.
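The loss modifications mentioned above can be compared in a short sketch. It implements LSGAN's least-squares losses with the common 0/1 target coding and the WGAN-GP gradient penalty λ·E[(‖∇D(x̂)‖₂ − 1)²] evaluated at random interpolates between real and fake samples. To stay dependency-free, the input gradient is approximated by central finite differences instead of automatic differentiation, and the critics in the example are hypothetical toy functions on 2-D points; λ = 10 is the value commonly used with WGAN-GP.

```python
import math
import random

def lsgan_losses(d_real, d_fake):
    """LSGAN losses with targets 1 for real and 0 for fake; D outputs raw scores.

    Returns (discriminator_loss, generator_loss).
    """
    d_loss = (0.5 * sum((r - 1.0) ** 2 for r in d_real) / len(d_real)
              + 0.5 * sum(f ** 2 for f in d_fake) / len(d_fake))
    g_loss = 0.5 * sum((f - 1.0) ** 2 for f in d_fake) / len(d_fake)
    return d_loss, g_loss

def numerical_grad(critic, x, h=1e-5):
    """Central-difference approximation of the gradient of critic at point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((critic(up) - critic(down)) / (2 * h))
    return grad

def gradient_penalty(critic, real, fake, lam=10.0, rng=random):
    """WGAN-GP term: lam * mean over pairs of (||grad critic(x_hat)|| - 1)^2."""
    total = 0.0
    for x_r, x_f in zip(real, fake):
        eps = rng.random()
        x_hat = [eps * a + (1 - eps) * b for a, b in zip(x_r, x_f)]  # random interpolate
        g = numerical_grad(critic, x_hat)
        total += (math.sqrt(sum(v * v for v in g)) - 1.0) ** 2
    return lam * total / len(real)

real = [[1.0, 2.0], [0.5, -1.0]]
fake = [[0.0, 0.0], [2.0, 1.0]]
print(lsgan_losses([0.9, 1.1], [0.2, -0.1]))
# Gradient norm 1 everywhere, so the penalty is close to 0.
print(gradient_penalty(lambda x: 0.6 * x[0] + 0.8 * x[1], real, fake))
# Gradient norm 2 everywhere, so the penalty is close to lam * (2 - 1)^2.
print(gradient_penalty(lambda x: 2.0 * x[0], real, fake))
```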

With the rapid development of GANs in recent years, the demand for higher-resolution generated images keeps growing. Another problem with GANs is that the network downsamples images through pooling to extract low-dimensional features, which loses much of the key information in the image; this makes it easier for the discriminator to tell real from fake, so its gradients no longer point in a useful optimization direction. How to make effective use of the features extracted at each layer of a neural network, fully extracting low-dimensional features while minimizing the loss caused by downsampling, is an active research topic. In 2016, Yu et al. proposed dilated convolution, which enlarges the receptive field while keeping the feature map size unchanged, effectively reducing the information loss caused by downsampling in conventional convolutional networks, and applied it to image processing. The pix2pixHD model proposed by Wang et al. uses conditional generative adversarial networks (conditional GANs) to synthesize high-resolution photorealistic images, and its multi-scale generator-discriminator architecture improves image quality and resolution while keeping training stable. Fig. 3 shows a schematic of the multi-scale discriminators, which share the same network structure but operate at different image scales. These discriminators are referred to as D1, D2 and D3. Specifically, the real and synthesized high-resolution images are each downsampled, and D1, D2 and D3 are then trained to distinguish real images from synthesized images at three different scales.
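The multi-scale input preparation can be sketched in plain Python. Here an image is a list of rows and each coarser scale is produced by non-overlapping 2×2 average pooling; the exact resampling filter used by pix2pixHD and by this invention is not specified in the text, so the pooling choice is an assumption. The same function would be applied separately to the real and the synthesized image.

```python
def avg_pool2x(img):
    """Halve the resolution of a 2-D image (list of rows) by averaging 2x2 blocks."""
    h, w = len(img), len(img[0])
    return [[(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]

def image_pyramid(img, num_scales=3):
    """Return [full, 1/2, 1/4, ...] resolution copies, one per discriminator D1, D2, D3."""
    scales = [img]
    for _ in range(num_scales - 1):
        scales.append(avg_pool2x(scales[-1]))
    return scales

img = [[float(8 * i + j) for j in range(8)] for i in range(8)]  # toy 8x8 image
for k, scale in enumerate(image_pyramid(img), start=1):
    print(f"D{k} input: {len(scale)}x{len(scale[0])}")
```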

The work of the invention builds on the Context Encoder method proposed by Pathak et al. and the Globally and Locally Consistent Image Completion method proposed by Iizuka et al. The original purpose of GANs is to train a generative model, implemented as a convolutional neural network, with the help of a discriminator that distinguishes whether an image was generated by the generator or is real. The generator is trained to fool the discriminator, while the discriminator is updated at the same time. By combining a mean squared error (MSE) loss with a GAN loss, an image restoration network can be trained that avoids the blurring common when the MSE loss is used alone. However, this approach on its own can make network training unstable. The present invention avoids this problem by replacing the traditional GAN loss with the WGAN loss, which uses the EM distance to measure the difference between data distributions; rather than training a pure generative model, the learning process is tuned to prioritize stability. In addition, the framework and training process are optimized specifically for the image restoration problem. In particular, instead of a single discriminator, multiple discriminators are used, including multi-scale discriminators similar to those in the pix2pixHD model, to improve visual quality.

3. Multi-scale adversarial network model

This section introduces the multi-scale adversarial network model of the invention. It comprises a generator network for image restoration and four auxiliary discriminator networks used during training: two multi-scale discriminators, a global discriminator and a local discriminator, which together allow the whole network to be trained to perform image restoration well. During training, the discriminators are trained to determine whether an image has been successfully repaired, while the generator is trained to fool all the discriminators. Only when all the networks are trained together can the generator actually repair a wide variety of images. The network architecture is shown in Fig. 4.

3.1 Generator

A convolutional autoencoder is used as the generator model, i.e., a standard encoder-decoder architecture. The encoder takes an image with missing regions as input and produces a latent feature representation of the image through convolution operations. The decoder uses this latent representation to restore the original resolution through transposed convolution operations, producing the image content of the missing region. Unlike the original GAN model, which starts directly from a noise vector, the latent representation obtained from the encoder captures more of the variation in, and relationships between, the unknown and known regions, and is then fed to the decoder to generate the content. The intermediate layers use dilated convolutions, which let each output pixel be computed from a larger input area without additional parameters or computation. Compared with standard convolutional layers, a network with dilated convolutions computes each output pixel under the influence of a larger region of the input image; without dilated convolution, each output pixel depends on only a small pixel region, and the network cannot use as much context information to synthesize the image. The generator is therefore a standard autoencoder network with dilated convolutional layers added; compared with the generator network described in the prior literature, the two convolutional layers in the middle are removed. The network architecture is shown in Table 1, whose columns list, from left to right, the layer type, kernel size, zero padding, stride, and number of output channels of each layer.
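The receptive-field argument can be checked numerically. The sketch below computes the receptive field of a stack of stride-1 convolutions and implements a 1-D dilated convolution (in the cross-correlation form used by deep learning frameworks) with zero padding chosen so that the output length equals the input length. The dilation rates 2, 4, 8, 16 follow the completion network of Iizuka et al. and are only an assumption here, since the layer settings of Table 1 are not reproduced in this text.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions with the given dilation rates."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def dilated_conv1d(x, weights, dilation):
    """1-D dilated convolution (cross-correlation) with 'same' zero padding.

    Assumes an odd kernel size k. The taps are `dilation` samples apart, so the
    kernel spans dilation*(k-1)+1 inputs while still having only k weights.
    """
    k = len(weights)
    pad = dilation * (k - 1) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(weights[t] * padded[i + t * dilation] for t in range(k))
            for i in range(len(x))]

# Four 3x3 layers: same parameter count, very different context.
print(receptive_field(3, [1, 1, 1, 1]))   # standard convolutions
print(receptive_field(3, [2, 4, 8, 16]))  # dilated convolutions

impulse = [0.0] * 9
impulse[4] = 1.0
print(dilated_conv1d(impulse, [1.0, 1.0, 1.0], dilation=2))
```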

Table 1 Generator network architecture

3.2 Discriminators

Training the generator alone can fill the missing region with pixels that have a small reconstruction loss, but it cannot ensure that the filled area is visually consistent: the generated missing region is very blurry and captures only the rough shape of the missing content. To obtain more realistic results, a global discriminator, a local discriminator and multi-scale discriminators are added as binary classifiers that distinguish real images from repaired ones. The discriminators help the network improve the quality of the repair results, since a well-trained discriminator is not fooled by unrealistic images. Each discriminator is a convolutional neural network that compresses the image into a small feature vector, from which it predicts the probability that the image is real.

First, the local discriminator judges whether the synthesized content of the missing region is realistic. This helps the network generate the missing content and encourages the generated objects to be semantically valid. Because the local discriminator sees only the missing region, its limitations are also apparent: its loss can neither regularize the global structure of a face nor ensure consistency between the inner and outer edges of the missing region. As a result, the pixel values of the repaired image are noticeably inconsistent along the boundary of the repaired area.

Because of this limitation, a second network, the global discriminator, is introduced to judge the realism of the image as a whole. The basic idea is that the content generated in the repaired area should be not only realistic but also consistent with its context. Adding the global discriminator greatly alleviates the inconsistency problem, further improves the repaired images, and makes them more realistic.

Finally, a multi-scale discriminator structure is proposed: the real and synthesized images are downsampled by factors of 2 and 4, and two discriminators are trained to distinguish the real image from the restored image at these two scales. The generator's repair process is thus tightly constrained by two discriminator networks that take inputs of different resolutions; the two multi-scale discriminators have architectures similar to the global discriminator but different receptive fields. Compared with using the global discriminator alone, training with the combined multi-scale discriminators guides the generator to produce repaired images with stronger global consistency and finer detail, so that the repair of the whole image looks more plausible. Adding the two multi-scale discriminators to the network therefore yields better restored images.
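Why discriminators with similar architectures but different input scales have different receptive fields can be seen with the standard receptive-field recurrence. The layer list below is a hypothetical PatchGAN-style stack (4×4 kernels with strides 2, 2, 2, 1, 1, similar to the discriminators of pix2pixHD), not the actual layers of Table 2; the 256, 128 and 64 input sizes come from the training setup of this invention.

```python
def conv_stack_receptive_field(layers):
    """Receptive field, in input pixels, of a stack of (kernel_size, stride) conv layers."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent output units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]  # hypothetical discriminator body
rf = conv_stack_receptive_field(layers)
for name, factor in [("global", 1), ("multi-scale x2", 2), ("multi-scale x4", 4)]:
    size = 256 // factor  # input resolution seen by this discriminator
    # One output unit covers rf pixels of the downsampled input, i.e. rf * factor
    # pixels of the original 256x256 image.
    print(f"{name}: {size}x{size} input, covers {rf * factor}px of the original image")
```

In this hypothetical configuration a unit of the 4× discriminator spans more than the full 256-pixel width, so it judges global structure, while the full-resolution discriminator judges local texture.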

TABLE 2 Multi-Scale discriminator architecture

3.3 Loss function

There are generally many plausible ways to fill a missing image region that are consistent with its context, and this behavior can be modeled through the loss function. A reconstruction loss, an L₂ loss between the generator output and the original image, is therefore first introduced for the generator. However, when only this loss is used, the restored content tends to be blurred and over-smoothed. This is because the L₂ loss penalizes outliers heavily, which encourages the network to average smoothly across the many plausible completions to avoid large penalties.

A discriminator is therefore used to introduce an adversarial loss, which reflects how well the generator can fool the discriminator and how well the discriminator can tell real from fake. The adversarial loss is based on the GAN loss. To learn a generative model of the data distribution, a GAN learns an adversarial discriminator model D that provides loss gradients for the generator model G. The discriminator D takes both generated and real samples and tries to distinguish them, while the generator G tries to confuse the discriminator by producing samples that are as realistic as possible:

$$\min_G \max_D \; \mathbb{E}_{x \sim P_{data}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim P_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$

where P_data(x) and P_z(z) denote the distributions of the real data x and the noise variable z, respectively. The network is optimized by minimizing the generator loss and maximizing the discriminator loss.
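The blurring argument can be reproduced with a toy example. Assume (hypothetically) that the same hole has two equally plausible completions, an all-dark patch and an all-bright patch. Searching over constant predictions for the one with the lowest expected L₂ reconstruction loss returns their average, a gray patch that matches neither completion; this is the over-smoothing that the adversarial loss is meant to counter.

```python
def l2_loss(pred, target):
    """Mean squared (L2) reconstruction loss over the pixels of the missing region."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Two equally plausible ground-truth fillings of one 3-pixel hole.
completions = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]

def expected_l2(pred):
    """Average L2 loss of one prediction over all plausible completions."""
    return sum(l2_loss(pred, c) for c in completions) / len(completions)

candidates = [[v / 10] * 3 for v in range(11)]  # constant patches 0.0, 0.1, ..., 1.0
best = min(candidates, key=expected_l2)
print(best, expected_l2(best))
```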

The cross-entropy (Jensen-Shannon divergence) loss of the traditional GAN is not well suited to measuring the distance between the generated and real data distributions; training a GAN by optimizing the JS divergence can fail to provide a correct optimization direction. WGAN therefore proposes training the GAN with the Wasserstein distance (also called the Earth-Mover distance). Specifically, the sigmoid in the last layer of the discriminator is removed; the generator and discriminator losses do not take the logarithm; and each time the discriminator parameters are updated, their absolute values are truncated so as not to exceed a fixed constant (weight clipping). Instead of the traditional GAN objective, the algorithm of the present invention uses:

$$\min_G \max_{D \in \mathcal{D}} \; \mathbb{E}_{x \sim P_{data}(x)}\big[D(x)\big] - \mathbb{E}_{z \sim P_z(z)}\big[D(G(z))\big]$$

where $\mathcal{D}$ is the set of 1-Lipschitz functions.
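The WGAN recipe can be sketched as follows. The critic and generator losses are the negated Wasserstein objective without sigmoid or logarithm, and weight clipping keeps every critic parameter in [−c, c] (c = 0.01 is the default from the WGAN paper; the invention's value is not stated). A toy comparison shows why the EM distance is preferred: for two non-overlapping distributions the JS divergence stays at log 2 no matter how far apart they are, while the EM distance shrinks as the generated samples approach the real ones. The 1-D samples are hypothetical.

```python
import math

def critic_loss(d_real, d_fake):
    """Loss minimized by the critic D: -(E[D(x)] - E[D(G(z))])."""
    return -(sum(d_real) / len(d_real) - sum(d_fake) / len(d_fake))

def generator_loss(d_fake):
    """Loss minimized by the generator G: -E[D(G(z))]."""
    return -sum(d_fake) / len(d_fake)

def clip_weights(weights, c=0.01):
    """Weight clipping applied after every critic update: keep each parameter in [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]

def em_distance_1d(xs, ys):
    """Earth-Mover (W1) distance between two equal-size 1-D samples (sorted matching)."""
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

real = [0.0, 0.0, 0.0]
for theta in (4.0, 1.0, 0.25):
    fake = [theta] * 3
    # All real mass at 0, all fake mass at theta: as two-bin histograms these are
    # [1, 0] and [0, 1] for every theta != 0, so JS stays at log 2.
    print(theta, em_distance_1d(real, fake), js_divergence([1.0, 0.0], [0.0, 1.0]))
print(clip_weights([0.5, -0.2, 0.003]))
```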

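All four discriminator networks use the same form of loss function; they differ only in their inputs. The local discriminator takes the repaired region of the generator's output image and the corresponding region of the real image, so it provides training gradients only for the missing region. The global discriminator takes the generator's output image and the real image. The first multi-scale discriminator takes the generator's output image and the real image, each downsampled by a factor of 2. The second multi-scale discriminator takes the generator's output image and the real image, each downsampled by a factor of 4. The global and multi-scale discriminators thus backpropagate loss gradients over the whole image at different resolutions.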
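How one 256 × 256 image (real or repaired) is turned into the inputs of the four discriminators can be sketched as follows. The central 128 × 128 hole position and the 256/128/64 sizes follow the training setup of the invention; average pooling as the downsampling method and the function names are assumptions for illustration.

```python
def center_box(size=256, hole=128):
    """(top, left, bottom, right) of the central square hole."""
    start = (size - hole) // 2
    return start, start, start + hole, start + hole

def crop(img, box):
    """Cut the box out of a 2-D image given as a list of rows."""
    top, left, bottom, right = box
    return [row[left:right] for row in img[top:bottom]]

def downsample(img, factor):
    """Average-pool by an integer factor (assumed resampling method)."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[i * factor + a][j * factor + b]
                 for a in range(factor) for b in range(factor)) / factor ** 2
             for j in range(w)] for i in range(h)]

def discriminator_inputs(img, box):
    """What each of the four discriminators sees for one (real or repaired) image."""
    return {
        "local": crop(img, box),             # repaired region only
        "global": img,                       # full image
        "multiscale_x2": downsample(img, 2),
        "multiscale_x4": downsample(img, 4),
    }

img = [[0.0] * 256 for _ in range(256)]
for name, x in discriminator_inputs(img, center_box()).items():
    print(name, len(x), "x", len(x[0]))
```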

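In summary, the total loss function optimized by the whole network combines the reconstruction loss with the adversarial losses of the four discriminators, which are weighted by λ₁, λ₂, λ₃ and λ₄ to balance the influence of each loss on the total.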
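A minimal sketch of the weighted total loss follows. The text gives only the four weights λ₁ to λ₄ (all 0.001 in training); the assignment of λ₁ to λ₄ to the local, global, 2× and 4× discriminators and the unit weight on the reconstruction loss are assumptions.

```python
def total_loss(l_rec, l_local, l_global, l_ms2, l_ms4,
               lambdas=(0.001, 0.001, 0.001, 0.001)):
    """Reconstruction loss plus weighted adversarial losses of the four discriminators.

    Assumed mapping: lambda1 -> local, lambda2 -> global,
    lambda3 -> 2x multi-scale, lambda4 -> 4x multi-scale.
    """
    lam1, lam2, lam3, lam4 = lambdas
    return l_rec + lam1 * l_local + lam2 * l_global + lam3 * l_ms2 + lam4 * l_ms4

# With weights of 0.001 the adversarial terms act as a small correction to l_rec.
print(total_loss(0.05, 1.2, 0.8, 0.9, 1.1))
```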

4. Training

The invention is implemented as a deep convolutional adversarial neural network. To train the network effectively, the training process is divided into three stages. First, the generator network is trained with the reconstruction loss only; at this stage the generator produces blurry repair content, and no adversarial training or adversarial loss is involved. Second, using the generator obtained in the first stage, all discriminator networks are trained and updated with the adversarial loss. In the last stage, the generator and all discriminators are trained jointly and adversarially through backpropagation. This schedule prevents the discriminators from being too strong at the beginning of training. Default hyperparameters (e.g., the learning rate) are used when training with the adversarial loss, and λ₁, λ₂, λ₃ and λ₄ are all set to 0.001. Images are resized and cropped to 256 × 256 to serve as input images. For the missing region, the input of the central square region of the image is set to 0; this is the missing part, covering about 1/4 of the image. The global discriminator takes the full 256 × 256 image, the local discriminator takes the 128 × 128 repaired region, and the two multi-scale discriminators take full images of 128 × 128 and 64 × 64, respectively. The network model can plausibly fill in missing regions, but the generated region is sometimes inconsistent in color with the surrounding area. To avoid this, a simple post-processing step blends the repaired area with the colors of the surrounding pixels; specifically, the invention uses Poisson image blending for this post-processing.
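The Poisson blending step can be sketched as a small solver for the discrete Poisson equation of Pérez et al. (2003): inside the repaired region the result keeps the gradients of the generator's output, while its boundary values come from the surrounding pixels, so a color offset is spread smoothly across the region. This Gauss-Seidel version on a single-channel image, the function name, and the iteration count are illustrative assumptions; library implementations such as OpenCV's `cv2.seamlessClone` handle color images and are considerably faster.

```python
def poisson_blend(target, source, mask, iters=500):
    """Gradient-domain (Poisson) blending of one channel by Gauss-Seidel iteration.

    target: image whose pixels surround the region (provides boundary values).
    source: image providing guidance gradients inside the region (repaired content).
    mask:   True where pixels are solved for; must not touch the image border.
    """
    h, w = len(target), len(target[0])
    f = [row[:] for row in target]
    for _ in range(iters):
        for i in range(h):
            for j in range(w):
                if not mask[i][j]:
                    continue
                total = 0.0
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    # Neighbour: current estimate inside, fixed target value outside.
                    total += f[ni][nj]
                    # Guidance: source gradient across this edge.
                    total += source[i][j] - source[ni][nj]
                f[i][j] = total / 4.0
    return f

# 6x6 ramp image whose 4x4 centre was zeroed (missing) and "repaired" with a
# colour offset of +5: blending removes the offset but keeps the ramp's gradients.
n = 6
mask = [[0 < i < n - 1 and 0 < j < n - 1 for j in range(n)] for i in range(n)]
target = [[0.0 if mask[i][j] else float(i + j) for j in range(n)] for i in range(n)]
source = [[float(i + j) + 5.0 for j in range(n)] for i in range(n)]
result = poisson_blend(target, source, mask)
print([round(v, 3) for v in result[2]])
```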

5. Experimental results and analysis

The multi-scale generative adversarial network model is trained on 100,000 images from the CelebA dataset, which covers a wide variety of face images: 80,000 images are used for training and 20,000 for testing, with a batch size of 32. The generator network is first trained for 20,000 iterations, the discriminators are then trained for 10,000 iterations, and finally the whole network is trained jointly for 70,000 iterations. The hardware is an Intel i7-8700 CPU, an RTX 2080 Ti (11 GB) GPU, and 32 GB of DDR4-3000 memory. The code runs under the PyTorch deep learning framework, and training the whole network takes about 5 days.
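The three-stage schedule with the iteration counts reported above can be written as a small lookup. The stage labels and the function are hypothetical, and the sketch only decides which networks are updated at a given iteration, not how the updates are computed.

```python
# (stage description, iterations, update generator?, update discriminators?)
STAGES = [
    ("stage 1: generator with reconstruction loss only", 20000, True, False),
    ("stage 2: discriminators with adversarial loss", 10000, False, True),
    ("stage 3: joint adversarial training", 70000, True, True),
]

def stage_at(iteration, stages=STAGES):
    """Return (name, update_generator, update_discriminators) for a 0-based iteration."""
    start = 0
    for name, n_iters, update_g, update_d in stages:
        if iteration < start + n_iters:
            return name, update_g, update_d
        start += n_iters
    raise ValueError(f"iteration {iteration} is past the end of the schedule")

total_iters = sum(n for _, n, _, _ in STAGES)
batch_size, train_images = 32, 80000
# Total iterations, and how many passes over the training set that batch size implies.
print(total_iters, total_iters * batch_size / train_images)
for it in (0, 20000, 30000):
    print(it, stage_at(it))
```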

The experimental results are compared with those of the Context Encoders method, which uses a single discriminator acting on the repaired region, and the Globally and Locally Consistent Image Completion method, which uses one generator and two discriminators. For a fair comparison, these models were retrained for the same number of iterations; the results are shown in Fig. 5.

In each test image, the central area of the image is masked, since the important components of a face (e.g., eyes, mouth, eyebrows, hair, nose) are usually located there. The four rows show the repair results for four different test images. Column a shows the four original, complete images. Column b shows the images with the mask applied. Column c shows the results of the Context Encoders network; because this architecture has no notion of global consistency, its results show obvious global inconsistency and the repaired regions are very blurry, which does not meet the requirements of the image restoration task.

Column d shows the results of the Globally and Locally Consistent Image Completion method, which uses a global and a local discriminator. Introducing the adversarial loss lets this network repair images more plausibly: the local discriminator acts on the missing region so that it is completed successfully, and the global discriminator acts on the whole image, penalizing global inconsistency in the repaired image and forcing the network to generate globally consistent images. This removes obvious edge artifacts and improves the results. Column e shows the results of the proposed algorithm, which uses the WGAN loss to make training of the whole adversarial network more stable and adds multi-scale discriminators trained together with the global and local discriminators. Compared with d, the results in e show somewhat better detail, higher overall image coherence, and a better repair effect overall.

In addition to visual comparison, the methods are evaluated quantitatively on the CelebA test set with PSNR and SSIM, both computed between the repair result of each method and the original face image.

The first metric is the peak signal-to-noise ratio (PSNR), an objective image quality criterion that directly measures differences in pixel values; larger values indicate less distortion. For two input images X and Y, it is computed as:

$$\mathrm{MSE} = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\big(X(i,j) - Y(i,j)\big)^2, \qquad \mathrm{PSNR} = 10\,\log_{10}\frac{(2^n - 1)^2}{\mathrm{MSE}}$$

where MSE is the mean squared error between the restored image X and the real image Y, H and W are the height and width of the image, and n is the number of bits per pixel, generally 8 (i.e., 256 gray levels). The results are shown in Table 3.
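A direct implementation of the PSNR definition for n-bit grayscale images, using the standard formula PSNR = 10·log₁₀((2ⁿ − 1)² / MSE), is sketched below; images are plain lists of rows, the function name is hypothetical, and identical images (MSE = 0) are reported as infinite PSNR.

```python
import math

def psnr(x, y, bits=8):
    """Peak signal-to-noise ratio in dB between two equal-size grayscale images."""
    h, w = len(x), len(x[0])
    mse = sum((x[i][j] - y[i][j]) ** 2 for i in range(h) for j in range(w)) / (h * w)
    if mse == 0:
        return math.inf  # identical images: no distortion
    peak = 2 ** bits - 1  # 255 for 8-bit images
    return 10 * math.log10(peak ** 2 / mse)

restored = [[100, 102], [98, 101]]
original = [[100, 100], [100, 100]]
print(round(psnr(restored, original), 2))
```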

TABLE 3 results of quantitative experiments on PSNR

The second metric is the structural similarity index (SSIM), which measures the similarity between two images. Its value is at most 1 and in practice usually lies between 0 and 1; a larger value means a smaller difference between the repaired image and the real image, i.e., better image quality, and two identical images have an SSIM of 1. For two input images X and Y, it is computed as:

$$\mathrm{SSIM}(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}$$

where μ_X and μ_Y are the means of X and Y, σ_X and σ_Y are their standard deviations, σ_XY is the covariance of X and Y, and c₁ and c₂ are constants that keep the denominator from being 0. The results are shown in Table 4.
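The SSIM formula can be evaluated directly from these statistics. This sketch computes it once over the whole image with population (1/N) statistics; the usual practice (Wang et al., 2004) averages it over local windows, typically an 11 × 11 Gaussian window, so the numbers will differ from library implementations. The constants c₁ = (0.01·L)² and c₂ = (0.03·L)² with L = 2ⁿ − 1 are the customary choices and are assumed here, since the text does not give their values.

```python
def ssim_global(x, y, bits=8, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-size grayscale images (lists of rows)."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    var_x = sum((a - mu_x) ** 2 for a in xs) / n
    var_y = sum((b - mu_y) ** 2 for b in ys) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(xs, ys)) / n
    dynamic_range = 2 ** bits - 1
    c1 = (k1 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

checker = [[0, 255], [255, 0]]
inverted = [[255, 0], [0, 255]]
print(ssim_global(checker, checker))   # identical images
print(ssim_global(checker, inverted))  # anti-correlated images give a negative value
```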

TABLE 4 results of quantitative experiments on SSIM

In addition, to show that the proposed algorithm is applicable to many types of images, the model was also trained on 50,000 images from the ImageNet dataset and, separately, on 50,000 images from the Places2 dataset, using the same training procedure as for CelebA. The results, shown in Fig. 6 and Fig. 7 respectively, indicate that the model also performs well on the ImageNet and Places2 datasets.

In summary, the invention analyzes the shortcomings of existing algorithms, introduces the principle of generative adversarial networks, and applies it to image restoration. It proposes a multi-scale generative adversarial restoration model consisting of a generator and multiple adversarial discriminators, which synthesizes missing content from random noise using a reconstruction loss and multiple adversarial losses; following the idea of WGAN, it uses the EM distance to model the data distribution, which improves network stability and the quality of the restored images. Finally, subjective and objective evaluations on the CelebA dataset show that the image restoration algorithm based on the multi-scale generative adversarial network achieves higher restoration performance than current image restoration methods, and corresponding training and testing on the ImageNet and Places2 datasets show that the algorithm can be applied to many types of images with good results.

The above description only illustrates the present invention and should not be construed as limiting its scope; the invention is intended to cover all modifications, equivalents and improvements within its spirit and scope as defined by the appended claims.

Claims

1. An image restoration method based on a multi-scale generative adversarial network model, characterized in that the method comprises the following steps:

constructing a deep generative adversarial restoration model consisting of a generator and an adversarial discriminator, and synthesizing missing content from random noise using a reconstruction loss and an adversarial loss;

improving the network structure of the discriminator by providing a multi-scale discriminator structure in addition to a global discriminator and a local discriminator, and performing adversarial training of the multi-scale discriminators with images of different resolutions to repair images;

using dilated convolution in the generator, and post-processing the repaired image with Poisson blending;

verifying the advantages of the image restoration algorithm based on the generative adversarial network model, and its restoration effect, on the CelebA, ImageNet and Places2 datasets.

2. The method according to claim 1, wherein the multi-scale generative adversarial network model comprises a generator network for image restoration and four auxiliary discriminator networks used during training, namely two multi-scale discriminator networks, a global discriminator network and a local discriminator network.

3. The method according to claim 1, wherein the generator uses a convolutional autoencoder as the generator model, namely a standard encoder-decoder structure; the encoder takes an image with a missing region as input and generates a latent feature representation of the image through convolution operations;

the decoder uses the latent feature representation to restore the original resolution through transposed convolution operations, generating the image content of the missing region; unlike the original GAN model, which starts directly from a noise vector, the latent representation obtained from the encoder captures more of the variation in, and relationships between, the unknown and known regions, and is then fed to the decoder to generate the content; the intermediate layers use dilated convolution, which allows each output pixel to be computed from a larger input area without additional parameters or computation, so that, compared with a standard convolutional layer, the dilated convolutional network computes each output pixel under the influence of a larger pixel region of the input image; the generator uses a standard autoencoder network with dilated convolutional layers added, and two convolutional layers in the middle of the generator network are removed; the table columns list, from left to right, the layer type, kernel size, zero padding, stride and number of output channels of each layer.

4. The image restoration method based on the multi-scale generative adversarial network model according to claim 1, wherein each discriminator is a convolutional neural network that compresses the image into a small feature vector and predicts the probability that the image is real;

first, a local discriminator determines whether the synthesized content of the missing region is real, which helps the network generate the missing content and encourages the generated objects to be semantically valid;

because of the locality of the local discriminator, a second network structure, the global discriminator, is introduced to judge the realism of the image as a whole;

finally, a multi-scale discriminator structure is provided, whose basic idea is to downsample the real and synthesized images by factors of 2 and 4, respectively, and to train two discriminators to distinguish the real image from the restored image at these two scales; the generator's repair process is thus tightly constrained by two discriminator networks that take inputs of different resolutions, and the two multi-scale discriminators have architectures similar to the global discriminator but receptive fields of different sizes;

the last two fully connected layers are removed from the global discriminator and the local discriminator of the model, and the other structures are kept unchanged; the table columns list, from left to right, the layer type, kernel size, stride and number of output channels of each layer; a, b, c and d are respectively

5. The image restoration method based on the multi-scale generative adversarial network model according to claim 1, wherein the loss function is modeled as follows:

a reconstruction loss is first introduced for the generator; however, when only this loss is used, the restored image content is observed to be blurred and over-smoothed, because the L₂ loss penalizes outliers heavily and encourages the network to average smoothly across the various plausible completions to avoid large penalties; a discriminator is therefore used to introduce an adversarial loss, which reflects how well the generator can fool the discriminator and how well the discriminator can distinguish real from fake; the adversarial loss is based on the GAN loss, in which an adversarial discriminator model D is learned to provide loss gradients for the generator model G; the discriminator D takes both generated samples and real samples as input and tries to distinguish them, while the generator G tries to confuse the discriminator by generating realistic samples:

$$\min_G \max_D \; \mathbb{E}_{x \sim P_{data}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim P_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$

wherein P_data(x) and P_z(z) denote the distributions of the real data x and the noise variable z, respectively; the network is optimized by minimizing the generator loss and maximizing the discriminator loss.

6. The image restoration method based on the multi-scale generative adversarial network model according to claim 1, wherein the GAN is trained using the Wasserstein distance as the optimization criterion: the sigmoid is removed from the last layer of the discriminator; the generator and discriminator losses do not take the logarithm; and each time the discriminator parameters are updated, their absolute values are truncated so as not to exceed a fixed constant (weight clipping):

$$\min_G \max_{D \in \mathcal{D}} \; \mathbb{E}_{x \sim P_{data}(x)}\big[D(x)\big] - \mathbb{E}_{z \sim P_z(z)}\big[D(G(z))\big]$$

wherein $\mathcal{D}$ is the set of 1-Lipschitz functions;

the four discriminator networks use the same definition of the loss function; the only difference is that the local discriminator provides training gradients only for the missing region, while the global discriminator and the multi-scale discriminators backpropagate loss gradients over the whole image at different resolutions; the discriminators are defined as follows:

the local discriminator takes as input the repaired region of the generator's output image and the corresponding region of the real image;

the global discriminator takes as input the generator's output image and the real image;

the first multi-scale discriminator takes as input the generator's output image and the real image, each downsampled by a factor of 2;

the second multi-scale discriminator takes as input the generator's output image and the real image, each downsampled by a factor of 4;

the overall loss function for optimizing the entire network combines the losses above, wherein λ₁, λ₂, λ₃ and λ₄ are the weights of the different losses, used to balance their influence on the overall loss function, and their specific values are set manually during experiments.

7. The image restoration method based on the multi-scale generative adversarial network model according to claim 1, wherein the training process is divided into three stages: first, the generator network is trained with the reconstruction loss, at which stage the generator produces blurry repair content and no adversarial training or adversarial loss is involved; second, using the generator network obtained in the first stage, all discriminator networks are trained and updated with the adversarial loss; in the last stage, the generator and all discriminators are trained jointly and adversarially, with training completed through backpropagation;

when training with the adversarial loss, default hyperparameters are used and λ₁, λ₂, λ₃ and λ₄ are all set to 0.001; images are resized and cropped to 256 × 256 to serve as input images; for the missing region, the input of the central square region of the image is set to 0, i.e., the missing portion, covering about 1/4 of the image; the global discriminator takes the full 256 × 256 image, the local discriminator takes the 128 × 128 repaired region, and the two multi-scale discriminators take full images of 128 × 128 and 64 × 64, respectively.

8. An image restoration system for implementing the image restoration method based on the multi-scale generative adversarial network model according to any one of claims 1 to 7, characterized in that the image restoration system based on the multi-scale generative adversarial network model comprises:

9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:

10. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of: