CN114862679A - Single-image super-resolution reconstruction method based on residual error generation countermeasure network - Google Patents


Publication number
CN114862679A
CN114862679A
Authority
CN
China
Prior art keywords: layer, network, image, resolution, resolution image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210499131.3A
Other languages
Chinese (zh)
Inventor
杨旭广
杨欣
李恒锐
朱义天
樊江锋
周大可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics

Classifications

    • G06T 3/4053 — Super resolution, i.e. output image resolution higher than sensor resolution
    • G06N 3/045 — Combinations of networks
    • G06N 3/084 — Backpropagation, e.g. using gradient descent
    • G06T 3/4046 — Scaling the whole image or part thereof using neural networks

Abstract

The invention discloses a single-image super-resolution reconstruction method based on a residual generation countermeasure network. First, a generation network is established, and a low-resolution image is input into the generation network to obtain a generated high-resolution image. The generated high-resolution image and the real high-resolution image corresponding to the low-resolution image are then input together into a discrimination network model, and the difference between the two images is calculated through a perceptual loss function. The generation network and the discrimination network are then trained simultaneously until the loss of the generated high-resolution image relative to the real high-resolution image is less than or equal to a preset threshold, yielding the trained generation network. Finally, a low-resolution image whose resolution needs to be improved is input into the trained generation network to obtain the reconstructed high-resolution image. The invention overcomes the defect of the prior art that the difference between the input image and the output image cannot be reflected perceptually, and improves the accuracy of the model and the quality of the generated images.

Description

Single-image super-resolution reconstruction method based on residual error generation countermeasure network
Technical Field
The invention relates to the technical field of image enhancement, in particular to a single-image super-resolution reconstruction method based on a residual error generation countermeasure network.
Background
Super-resolution reconstruction is a classic application in the field of computer vision, with wide use in monitoring equipment, microscopic imaging, video coding and communication, video restoration, satellite remote sensing, digital high-definition imaging, medical image processing and more. In super-resolution reconstruction, one or more frames are reconstructed into a higher-resolution image or video by analysing the digital image signals and applying software algorithms. Image style migration renders a content image as a painting in the style of a style image: given a style image and a content image as inputs, the output picture must stay close to the input image in its semantics while approaching the target picture in style, colour and texture. Style migration is important for understanding both images and pictorial representation.
Both image super-resolution reconstruction and stylization can be regarded as image processing and transformation problems. One approach trains a feedforward convolutional neural network in a supervised manner, with a loss function representing the difference between the output and input images; the network uses a pixel-by-pixel difference as the loss function. A trained network then needs only a single feedforward pass, but the drawback is that a pixel-wise difference loss cannot reflect the difference between the input and output images perceptually. A second approach establishes a perceptual loss function: high-level image features are extracted from a trained CNN to compute the difference, and super-resolution reconstruction is realised by minimising this loss; the synthesised and stylised images are of high quality. The drawback is that training is very slow and requires a long iterative optimisation process.
Current super-resolution reconstruction still faces a problem that is difficult to solve: a one-to-many relationship can exist between a low-resolution image (LR) and its converted high-resolution images (HR), and this uncertainty grows as the super-resolution factor increases.
Disclosure of Invention
The invention aims to solve the technical problem of providing a single-image super-resolution reconstruction method for generating a countermeasure network based on residual errors aiming at the defects involved in the background technology.
The invention adopts the following technical scheme for solving the technical problems:
the single-image super-resolution reconstruction method for generating the countermeasure network based on the residual error comprises the following steps:
step 1), establishing a generating network;
step 2), inputting the low-resolution image into a generation network to obtain a generated high-resolution image;
step 3), inputting the generated high-resolution image and the real high-resolution image corresponding to the low-resolution image into a discrimination network model together, and calculating the difference between the two images through a perception loss function;
step 4), training the generation network and the discrimination network simultaneously, so that the loss of the generated high-resolution image relative to the real high-resolution image is less than or equal to a preset threshold value, and obtaining the generated network after training;
and 5), inputting the low-resolution image of which the resolution needs to be improved into the trained generation network to obtain a reconstructed high-resolution image.
As a further optimization scheme of the single-image super-resolution reconstruction method for generating the countermeasure network based on the residual error, the generation network in the step 1) comprises a preprocessing layer, a core residual error network and an up-sampling layer;
the pretreatment layer comprises a first convolution layer, a second convolution layer and a relu activation layer, wherein the first convolution layer and the second convolution layer are alternated, and the depths of the first convolution layer and the second convolution layer are 64 and 256 respectively; the size of the first convolution layer convolution kernel is 9 multiplied by 9, and the size of the first convolution layer convolution kernel is 3 multiplied by 3;
the core residual error network comprises 16 residual error blocks, and the residual error blocks adopt a structure of a BN (batch normalization) layer + a convolutional layer of 64 feature maps + a relu active layer + a BN layer + a convolutional layer of 64 feature maps + a relu active layer;
the up-sampling layer performs up-sampling by using two sub-pixel convolution layers, so that the resolution of an input image is improved; and a convolution layer with the depth of 256 is respectively added before the two sub-pixel convolution layers, a relu activation layer is respectively added after the two sub-pixel convolution layers, and the image is reconstructed and amplified step by step so as to avoid the loss of image details caused by continuous amplification of the image.
As a further optimization scheme of the single-image super-resolution reconstruction method based on the residual generation countermeasure network, the discrimination network described in step 3) stacks, in order, convolutional layers of 32, 32, 64, 64, 128, 128, 256, 256, 512 and 512 feature maps, the stride alternating between 1 and 2 (stride 1 for the first layer of each pair, stride 2 for the second), with each convolutional layer followed by a LeakyReLU activation layer.
As a further optimization scheme of the single-image super-resolution reconstruction method for generating the countermeasure network based on the residual error, the perception loss function in the step 3) comprises pixel-level MAE loss, VGG loss and discriminator loss;
the pixel level MAE loss is calculated directly using L1-loss;
the VGG loss is

$$ l_{VGG} = \frac{1}{W_{i,j}H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \Big( \phi_{i,j}\big(I^{HR}\big)_{x,y} - \phi_{i,j}\big(G_{\theta_G}\big(I^{LR}\big)\big)_{x,y} \Big)^2 $$

where $G_{\theta_G}$ denotes the generation network, $I^{HR}$ and $I^{LR}$ denote the high-resolution image and the low-resolution image respectively, $\phi_{i,j}$ denotes the feature map of the $j$-th convolutional layer before the $i$-th pooling layer of the VGG19 network, and $W_{i,j}$ and $H_{i,j}$ are the width and height of that feature map;

the discriminator loss is

$$ l_{D} = \frac{1}{N} \sum_{n=1}^{N} -\log D_{\theta_D}\big(G_{\theta_G}\big(I^{LR}\big)\big) $$

where $D_{\theta_D}$ denotes the discriminator and $N$ is the batch size.
As a further optimization scheme of the single-image super-resolution reconstruction method based on the residual generation countermeasure network, in step 4) the training set uses the DIV2K data set; each image in the training set is transformed to 384×384 as the high-resolution input and to 96×96 as the low-resolution input; the training set comprises 800 training images; during training the batch size is 16 and the number of iterations is 100000, with the network updated by back-propagation every 1000 training iterations; the perceptual loss is reduced by RMSprop optimization with a learning rate of 1e-4 and no dropout.
Compared with the prior art, the technical scheme adopted by the invention has the following technical effects:
the invention overcomes the defect that the difference between the input image and the output image cannot be perceptually reflected in the prior art, and improves the precision of the model and the quality of the generated image.
Drawings
Fig. 1 is a schematic view of the general structure of the present invention.
Fig. 2 is a schematic diagram of a residual error generation network according to the present invention.
Fig. 3 is a schematic diagram of the residual block structure according to the present invention.
Fig. 4 is a schematic diagram of a discrimination network structure according to the present invention.
Detailed Description
The technical scheme of the invention is further explained in detail by combining the attached drawings:
the present invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. In the drawings, components are exaggerated for clarity.
As shown in fig. 1, the invention discloses a single-image super-resolution reconstruction method for generating a countermeasure network based on residual errors, which comprises the following steps:
step 1), establishing a generating network;
the generation network comprises a preprocessing layer, a core residual error network and an up-sampling layer;
the pretreatment layer comprises a first convolution layer, a second convolution layer and a relu activation layer, wherein the first convolution layer and the second convolution layer are alternated, and the depths of the first convolution layer and the second convolution layer are 64 and 256 respectively; the size of the first convolution layer convolution kernel is 9 multiplied by 9, and the size of the first convolution layer convolution kernel is 3 multiplied by 3;
the core residual network comprises 16 residual blocks, each residual block adopting the structure BN (batch normalization) layer + convolutional layer of 64 feature maps + ReLU activation layer + BN layer + convolutional layer of 64 feature maps + ReLU activation layer;
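The residual block described above can be sketched in PyTorch as follows. This is a minimal illustration, not the patented implementation: the 3×3 kernel size and the identity skip connection are assumptions, since the text lists only the layer ordering.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One of the 16 residual blocks: BN + 64-map conv + ReLU, twice,
    followed by an identity skip connection (skip is an assumption)."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(channels),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Residual connection: output keeps the input's shape.
        return x + self.body(x)
```

Stacking 16 such blocks with shared channel depth keeps feature-map shapes constant, so the blocks can be chained with `nn.Sequential`.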
the up-sampling layer performs up-sampling with two sub-pixel convolutional layers to raise the resolution of the input image; a convolutional layer of depth 256 is added before each of the two sub-pixel convolutional layers and a ReLU activation layer after each, so that the image is reconstructed and enlarged step by step, avoiding the loss of image detail that continuous one-step enlargement would cause;
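One ×2 up-sampling stage of the kind described above — a depth-256 convolution before the sub-pixel layer, a ReLU after it — can be sketched as follows; two such stages give the overall ×4 enlargement. PyTorch's `PixelShuffle` is the usual realisation of sub-pixel convolution; the kernel size and padding are assumptions.

```python
import torch
import torch.nn as nn

def upsample_stage(in_channels=64):
    """One x2 stage: 256-deep conv -> sub-pixel (pixel shuffle) -> ReLU.
    PixelShuffle(2) turns 256 maps into 64 maps at twice the spatial size."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 256, kernel_size=3, padding=1),
        nn.PixelShuffle(2),
        nn.ReLU(inplace=True),
    )
```

Enlarging in two ×2 stages rather than one ×4 step matches the patent's stated aim of avoiding detail loss from continuous enlargement.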
step 2), inputting the low-resolution image into a generation network to obtain a generated high-resolution image;
step 3), inputting the generated high-resolution image and the real high-resolution image corresponding to the low-resolution image into a discrimination network model together, and calculating the difference between the two images through a perception loss function;
the discrimination network stacks, in order, convolutional layers of 32, 32, 64, 64, 128, 128, 256, 256, 512 and 512 feature maps, the stride alternating between 1 and 2 (stride 1 for the first layer of each pair, stride 2 for the second), with each convolutional layer followed by a LeakyReLU activation layer;
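The discrimination network above can be sketched as follows. The conv stack follows the stated feature-map progression and stride pattern; the final pooling and dense head producing a real/fake probability is an assumption, since the patent text stops at the convolutional layers.

```python
import torch
import torch.nn as nn

def make_discriminator():
    """Conv layers of 32,32,64,64,128,128,256,256,512,512 feature maps,
    stride alternating 1 and 2, each followed by LeakyReLU; the
    pool + linear + sigmoid head is an assumed classification head."""
    cfg = [32, 32, 64, 64, 128, 128, 256, 256, 512, 512]
    layers, in_ch = [], 3
    for i, out_ch in enumerate(cfg):
        stride = 1 if i % 2 == 0 else 2   # pairs: stride 1 then stride 2
        layers += [
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
        ]
        in_ch = out_ch
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(),
               nn.Linear(512, 1), nn.Sigmoid()]
    return nn.Sequential(*layers)
```

With five stride-2 layers the spatial size shrinks by 32×, so a 96×96 LR-scale input reaches the head as a 3×3 map before pooling.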
the perception loss function comprises pixel-level MAE loss, VGG loss and discriminator loss;
the pixel level MAE loss is calculated directly using L1-loss;
the VGG loss is

$$ l_{VGG} = \frac{1}{W_{i,j}H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \Big( \phi_{i,j}\big(I^{HR}\big)_{x,y} - \phi_{i,j}\big(G_{\theta_G}\big(I^{LR}\big)\big)_{x,y} \Big)^2 $$

where $G_{\theta_G}$ denotes the generation network, $I^{HR}$ and $I^{LR}$ denote the high-resolution image and the low-resolution image respectively, $\phi_{i,j}$ denotes the feature map of the $j$-th convolutional layer before the $i$-th pooling layer of the VGG19 network, and $W_{i,j}$ and $H_{i,j}$ are the width and height of that feature map;

the discriminator loss is

$$ l_{D} = \frac{1}{N} \sum_{n=1}^{N} -\log D_{\theta_D}\big(G_{\theta_G}\big(I^{LR}\big)\big) $$

where $D_{\theta_D}$ denotes the discriminator and $N$ is the batch size;
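Combining the three terms above gives a perceptual loss of the following shape. This is a sketch: the relative weights `w_vgg` and `w_adv` are assumptions not given in the patent, and `features` stands in for $\phi_{i,j}$ (in practice a truncated pretrained VGG19).

```python
import torch
import torch.nn.functional as F

def perceptual_loss(sr, hr, disc_out, features, w_vgg=1.0, w_adv=1e-3):
    """Pixel-level MAE (L1) + VGG feature-space squared error normalised
    by the feature-map size + the -log D adversarial term averaged over
    the batch. `features`, `w_vgg` and `w_adv` are illustrative stand-ins."""
    l_mae = F.l1_loss(sr, hr)                 # pixel-level MAE via L1-loss
    l_vgg = F.mse_loss(features(sr), features(hr))  # ~ (1/WH) sum of squares
    l_adv = (-torch.log(disc_out + 1e-8)).mean()    # -log D(G(I_LR)), batch mean
    return l_mae + w_vgg * l_vgg + w_adv * l_adv
```

`F.mse_loss` averages over all feature-map elements, which matches the $1/(W_{i,j}H_{i,j})$ normalisation up to the batch and channel dimensions.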
step 4), training the generation network and the discrimination network simultaneously, so that the loss of the generated high-resolution image relative to the real high-resolution image is less than or equal to a preset threshold value, and obtaining the generated network after training;
the training set uses the DIV2K data set; each image in the training set is transformed to 384×384 as the high-resolution input and to 96×96 as the low-resolution input; the training set comprises 800 training images; during training the batch size is 16 and the number of iterations is 100000, with the network updated by back-propagation every 1000 training iterations; the perceptual loss is reduced by RMSprop optimization with a learning rate of 1e-4 and no dropout;
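The optimiser configuration above can be set up as follows. Using one RMSprop instance per network is an assumption; the patent only states that the two networks are trained simultaneously.

```python
import torch

def make_optimizers(gen, disc, lr=1e-4):
    """RMSprop with learning rate 1e-4, as stated in the training recipe;
    no dropout appears anywhere in either model."""
    opt_g = torch.optim.RMSprop(gen.parameters(), lr=lr)
    opt_d = torch.optim.RMSprop(disc.parameters(), lr=lr)
    return opt_g, opt_d
```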
and 5), inputting the low-resolution image of which the resolution needs to be improved into the trained generation network to obtain a reconstructed high-resolution image.
FIG. 1 shows the overall structure of the invention, which comprises a generation network and a countermeasure network (used to define a series of loss functions); the main structure of the generation network combines a deep residual network with an SRGAN network. The weights of the countermeasure network are the parameters that map the input picture to the output image, and each loss function computes a scalar value measuring the difference between the output image and the target image. The image network is trained with Adam so that the weighted sum of the series of loss functions keeps decreasing.
Fig. 2 is a schematic diagram of the generation network; down-sampling and up-sampling inside the network are performed with strided convolutions. The first and last convolutional layers use 9×9 kernels, while all remaining layers use 3×3 kernels. The network down-samples with two stride-2 convolutional layers, then passes through five residual blocks, and finally up-samples with two stride-1/2 (fractionally strided) deconvolution layers.
Fig. 3 is a schematic diagram of the residual block structure, which is motivated by the "degradation" problem: ResNet can learn to turn redundant blocks into identity mappings without hurting performance and has a certain depth-adaptive capacity, making it possible to train networks with more layers and improving network performance. In the invention, 15 ResNet-based residual modules are used in the feedforward convolutional neural network.
Fig. 4 is a schematic diagram of the discrimination network, whose core is based on the VGG19 network; it is used to discriminate whether a picture is a true high-resolution image or a super-resolution image, and the difference between the two is measured through the perceptual loss function and an MSE loss function.
Examples are as follows: the training set uses the DIV2K data set; each image in the training set is transformed to 384×384 as the HR input and to 96×96 as the LR input, for a total of 800 training images. Set5, Set14 and BSD100 are adopted as the verification sets. The ×4 super-resolution reconstruction is completed with the trained model, and the feature loss is minimised on features extracted at the conv2_2 layer of VGG19; during training the batch size is 16, the number of iterations is 100000, RMSprop optimisation is used with a learning rate of 1e-4 and no dropout. The super-resolution performance of the trained generation network is verified on the verification sets.
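The 384×384 HR / 96×96 LR pairing above can be sketched as a toy data-preparation step. The centre crop and the average-pool down-scaling are assumptions kept dependency-free for illustration; a real pipeline would typically use bicubic resampling, and the input is assumed to be an H×W×C array.

```python
import numpy as np

def make_lr_hr_pair(img, hr_size=384, scale=4):
    """Centre-crop to hr_size x hr_size as the HR target, then reduce by
    `scale` via block averaging to get the LR input (96x96 for x4)."""
    h, w = img.shape[:2]
    top, left = (h - hr_size) // 2, (w - hr_size) // 2
    hr = img[top:top + hr_size, left:left + hr_size]
    # Block-average each scale x scale patch down to one LR pixel.
    lr = hr.reshape(hr_size // scale, scale,
                    hr_size // scale, scale, -1).mean(axis=(1, 3))
    return lr, hr
```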
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only illustrative of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. The single-image super-resolution reconstruction method for generating the countermeasure network based on the residual error is characterized by comprising the following steps of:
step 1), establishing a generating network;
step 2), inputting the low-resolution image into a generation network to obtain a generated high-resolution image;
step 3), inputting the generated high-resolution image and the real high-resolution image corresponding to the low-resolution image into a discrimination network model together, and calculating the difference between the two images through a perception loss function;
step 4), training the generation network and the discrimination network simultaneously, so that the loss of the generated high-resolution image relative to the real high-resolution image is less than or equal to a preset threshold value, and obtaining the generated network after training;
and 5), inputting the low-resolution image of which the resolution needs to be improved into the trained generation network to obtain a reconstructed high-resolution image.
2. The single-picture super-resolution reconstruction method based on residual generation countermeasure network of claim 1, wherein the generation network in step 1) comprises a preprocessing layer, a core residual network and an upsampling layer;
the pretreatment layer comprises a first convolution layer, a second convolution layer and a relu activation layer, wherein the first convolution layer and the second convolution layer are alternated, and the depths of the first convolution layer and the second convolution layer are 64 and 256 respectively; the size of the first convolution layer convolution kernel is 9 multiplied by 9, and the size of the first convolution layer convolution kernel is 3 multiplied by 3;
the core residual error network comprises 16 residual error blocks, and the residual error blocks adopt a structure of a BN (batch normalization) layer + a convolutional layer of 64 feature maps + a relu active layer + a BN layer + a convolutional layer of 64 feature maps + a relu active layer;
the up-sampling layer performs up-sampling by using two sub-pixel convolution layers, so that the resolution of an input image is improved; and a convolution layer with the depth of 256 is respectively added before the two sub-pixel convolution layers, a relu activation layer is respectively added after the two sub-pixel convolution layers, and the image is reconstructed and amplified step by step so as to avoid the loss of image details caused by continuous amplification of the image.
3. The single-image super-resolution reconstruction method based on residual generation countermeasure network of claim 2, wherein the discrimination network described in step 3) stacks, in order, convolutional layers of 32, 32, 64, 64, 128, 128, 256, 256, 512 and 512 feature maps, the stride alternating between 1 and 2 (stride 1 for the first layer of each pair, stride 2 for the second), with each convolutional layer followed by a LeakyReLU activation layer.
4. The single-map super-resolution reconstruction method based on residual generation countermeasure network of claim 3, wherein the perceptual loss function in step 3) comprises pixel-level MAE loss, VGG loss, discriminator loss;
the pixel level MAE loss is calculated directly using L1-loss;
the VGG loss is

$$ l_{VGG} = \frac{1}{W_{i,j}H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \Big( \phi_{i,j}\big(I^{HR}\big)_{x,y} - \phi_{i,j}\big(G_{\theta_G}\big(I^{LR}\big)\big)_{x,y} \Big)^2 $$

where $G_{\theta_G}$ denotes the generation network, $I^{HR}$ and $I^{LR}$ denote the high-resolution image and the low-resolution image respectively, $\phi_{i,j}$ denotes the feature map of the $j$-th convolutional layer before the $i$-th pooling layer of the VGG19 network, and $W_{i,j}$ and $H_{i,j}$ are the width and height of that feature map;

the discriminator loss is

$$ l_{D} = \frac{1}{N} \sum_{n=1}^{N} -\log D_{\theta_D}\big(G_{\theta_G}\big(I^{LR}\big)\big) $$

where $D_{\theta_D}$ denotes the discriminator and $N$ is the batch size.
5. The single-image super-resolution reconstruction method based on residual generation countermeasure network of claim 4, wherein in step 4) the training set uses the DIV2K data set; each image in the training set is transformed to 384×384 as the high-resolution input and to 96×96 as the low-resolution input; the training set comprises 800 training images; during training the batch size is 16 and the number of iterations is 100000, with the network updated by back-propagation every 1000 training iterations; the perceptual loss is reduced by RMSprop optimization with a learning rate of 1e-4 and no dropout.
CN202210499131.3A 2022-05-09 2022-05-09 Single-image super-resolution reconstruction method based on residual error generation countermeasure network Pending CN114862679A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210499131.3A CN114862679A (en) 2022-05-09 2022-05-09 Single-image super-resolution reconstruction method based on residual error generation countermeasure network


Publications (1)

Publication Number Publication Date
CN114862679A 2022-08-05

Family

ID=82636754


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170399A (en) * 2022-09-08 2022-10-11 中国人民解放军国防科技大学 Multi-target scene image resolution improving method, device, equipment and medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination