CN112037131A - Single-image super-resolution reconstruction method based on a generative adversarial network - Google Patents
- Publication number: CN112037131A
- Application number: CN202010894651.5A
- Authority: CN (China)
- Prior art keywords: image, network, module, definition, adversarial network
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06N3/045: Combinations of networks
- G06N3/048: Activation functions
- G06T3/4046: Scaling of whole images or parts thereof using neural networks
Abstract
The invention relates to a single-image super-resolution reconstruction method based on a generative adversarial network, comprising the following steps. S1: establish an image database. S2: construct a generative adversarial network comprising a generator network module, a discriminator module and a loss-calculation module. S3: input the high-definition/low-definition image pairs of the training set and the validation set into the generative adversarial network and train it iteratively to obtain a trained network. S4: input the low-resolution images of the test set into the trained network and output the reconstructed high-definition images. Compared with the prior art, images reconstructed by the method achieve a better peak signal-to-noise ratio and structural similarity than baseline methods, and artifact generation is greatly reduced, so the restored images contain more of the high-frequency detail of the original images and look more real and natural.
Description
Technical Field
The invention relates to the field of digital image processing, and in particular to a single-image super-resolution reconstruction method based on a generative adversarial network.
Background
Single-image super-resolution (SISR) is a fundamental low-level vision problem that is receiving increasing attention from the research community. In practical vision tasks, owing to the uneven quality of the associated electronic devices, weather disturbances and other unknown factors, the resolution of the captured images often falls short of what these tasks require, and the needed information cannot be recovered from such low-resolution images. At the same time, it is often difficult to re-acquire pictures containing the important information. Image super-resolution reconstruction arises from these needs.
Much prior work has focused on improving the quality of reconstructed images. Chinese patent CN202010221409.1 discloses a neural-network-based super-resolution image reconstruction method and apparatus in which a receptive-field fusion unit and a channel-information fusion unit extract and fuse the features of the input image to improve reconstruction quality. Compared with traditional linear interpolation, bicubic interpolation and the like, that method trains faster, runs more efficiently and is more robust. However, it tends to output overly smooth results lacking high-frequency detail, fails to produce natural-looking images, and suffers from artifacts.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and to provide a single-image super-resolution reconstruction method based on a generative adversarial network.
The purpose of the invention is realized by the following technical scheme:
A single-image super-resolution reconstruction method based on a generative adversarial network comprises the following steps:
s1: establishing an image database, the image database comprising a plurality of high-definition/low-definition image pairs, each pair consisting of an original high-definition image and a low-resolution image obtained by down-sampling it; the image pairs in the database are divided into a training set, a validation set and a test set;
s2: constructing a generative adversarial network comprising a generator network module, a discriminator module and a loss-calculation module;
s3: inputting the high-definition/low-definition image pairs of the training set and the validation set into the generative adversarial network and training it iteratively to obtain a trained network, wherein during the iterative training the parameters of the discriminator module are fixed while the generator network module is trained, and the parameters of the generator network module are fixed while the discriminator module is trained;
s4: inputting the low-resolution images of the test set into the trained generative adversarial network and outputting the reconstructed high-definition images.
Preferably, the generator network module comprises a shallow-feature-extraction network, a residual dense block, a dense-feature-fusion unit and an up-sampling network connected in sequence.
Preferably, the shallow-feature-extraction network comprises two 3x3 convolutional layers that perform shallow feature extraction on the input low-resolution image.
Preferably, the dense-feature-fusion unit comprises a Concat layer, a 1x1 convolutional layer and a 3x3 convolutional layer connected in sequence.
Preferably, the residual dense block comprises a plurality of RDB modules, and the output F_i of the i-th RDB module is

F_i = H_RDB,i(F_{i-1}) = H_RDB,i(H_RDB,i-1(... H_RDB,1(F_0) ...))

where H_RDB,i denotes the i-th RDB module, whose input is the result computed by the first i-1 RDB modules, and F_0 is the output of the shallow-feature-extraction network.
Preferably, each RDB module comprises six 3x3 convolutional layers, each activated with a Leaky ReLU function.
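A minimal sketch of this activation; the negative slope alpha = 0.2 is an assumption carried over from the discriminator described later, as the slope used inside the RDB modules is not stated:

```python
def leaky_relu(x, alpha=0.2):
    """Leaky ReLU: passes positive inputs unchanged and scales
    negative inputs by a small slope alpha instead of zeroing them."""
    return x if x > 0 else alpha * x

# Positive inputs are unchanged; negative inputs keep a small gradient,
# which helps avoid "dead" units during GAN training.
print(leaky_relu(3.0))    # 3.0
print(leaky_relu(-1.0))   # -0.2
```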
Preferably, the dense-feature-fusion unit comprises a Concat layer, a 1x1 convolutional layer and a 3x3 convolutional layer connected in sequence, and is expressed as

F_GF = H_GFF(F_{-1}, F_0, F_1, ..., F_36)

where H_GFF is the dense-feature-fusion function, F_GF is the output of the unit, [F_1, ..., F_36] are the feature maps produced by all RDB modules in the residual dense block, spliced into a whole along the channel dimension, and F_{-1}, F_0 are the outputs of the two convolutional layers of the shallow-feature-extraction network.
Preferably, in the up-sampling network, the image is first enlarged to 2 times its original size by one nearest-neighbor interpolation, and then to 4 times its original size by a trained sub-pixel convolutional layer.
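The first stage of this up-sampling scheme can be sketched as plain pixel replication on a toy single-channel image (real implementations operate on multi-channel feature tensors):

```python
def nearest_neighbor_x2(img):
    """2x nearest-neighbor upsampling: every pixel of the H x W input
    (a list of lists) is replicated into a 2 x 2 block of the output."""
    out = []
    for row in img:
        stretched = [p for p in row for _ in range(2)]  # widen each row
        out.append(stretched)
        out.append(list(stretched))                     # duplicate the row
    return out

img = [[1, 2],
       [3, 4]]
print(nearest_neighbor_x2(img))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```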
Preferably, the overall loss function L_G in the loss-calculation module is

L_G = L_per + lambda * L_con + eta * L_adv

where L_per is the perceptual loss, L_con the content loss, L_adv the adversarial loss, and lambda, eta are weighting coefficients.
Preferably, the perceptual loss L_per is calculated as

L_per = (1 / (W_ij * H_ij)) * sum_{a=1..W_ij} sum_{b=1..H_ij} ( phi_ij(I^HR)_{a,b} - phi_ij(G(I^LR))_{a,b} )^2

where W_ij and H_ij are the width and height of the feature map obtained after the j-th convolution before the i-th max-pooling layer of the discriminator module, a and b are the horizontal and vertical coordinates of a pixel in the current feature map, phi_ij is that feature map, I^HR is the high-definition image, I^LR the low-resolution image and G the entire generator network.
Preferably, the content loss L_con is calculated as

L_con = L_MSE(theta) = (1/N) * sum_{k=1..N} || I_k^HR - G_theta(I_k^LR) ||^2

where L_MSE(theta) denotes the MSE loss, G_theta denotes the mapping from a low-resolution image to a high-resolution image learned by the generative adversarial network with parameters theta, I_k^HR is the k-th original high-definition image, I_k^LR the k-th low-resolution image, and N the number of training samples.
Preferably, the adversarial loss L_adv is calculated as

L_adv = E_{x~p_G}[D(x)] - E_{x~p_data}[D(x)] + lambda * E_{x~p_penalty}[ ( ||grad_x D(x)||_2 - 1 )^2 ]

where E_{x~p_G} denotes the expectation over x sampled from the generated data, E_{x~p_data} over x sampled from the real data, E_{x~p_penalty} over x sampled along the gradient-penalty interpolates, lambda is the penalty-term weight, D(x) is the output of the discriminator module, and grad_x D(x) is the gradient of the discriminator with respect to its input.
Preferably, the adversarial objective function of the generator network module and the discriminator module in the generative adversarial network is

min_G max_D  E_{x~p_data}[D(x)] - E_{z~p_z}[D(G(z))] - lambda * E_{x~p_penalty}[ ( ||grad_x D(x)||_2 - 1 )^2 ]

where D denotes the discriminator module, G the generator network, z the random noise, conforming to a Gaussian distribution, that is input to the generator, D(x) the discriminator's prediction for an original image, D(G(z)) its prediction for a generated image, E_{x~p_data} the expectation over x sampled from the real data, z is sampled from the random prior distribution, and lambda is the penalty-term weight.
Compared with the prior art, the invention has the following advantages:
(1) The method performs image super-resolution with a generative adversarial network, exploiting its powerful image-generation capacity. Compared with traditional interpolation-, reconstruction- and learning-based super-resolution methods, it pays more attention to the semantic information of the whole image and the high-frequency detail of the original image, and the reconstructed images achieve a better peak signal-to-noise ratio and structural similarity than baseline methods;
(2) Residual learning effectively alleviates the gradient vanishing caused by increasing network depth, so the network can be made deeper while retaining good performance and efficiency. Dense connections strengthen feature propagation, encourage feature reuse and reduce the parameter count, and the adopted RDB module removes the BN layer of DenseNet, greatly reducing artifact generation;
(3) Unlike existing super-resolution networks that use a single up-sampling mode (e.g. only bilinear interpolation, transposed convolution or sub-pixel convolution), the method alternates nearest-neighbor interpolation with sub-pixel convolution: nearest-neighbor interpolation realizes a spatial transformation, while sub-pixel convolution realizes a depth-to-space transformation. Alternating the two promotes information interaction between space and depth, and the use of sub-pixel convolution reduces parameter and time complexity;
(4) The invention adopts adjusted perceptual, content and adversarial losses as the objective function for training the generative adversarial network, so the restored image contains more of the original high-frequency detail and looks more real and natural.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic structural diagram of the generator network module of the present invention;
FIG. 3 is a schematic structural diagram of the discriminator module of the present invention;
FIG. 4 is a comparison graph of the reconstructed image of the present invention with an original image, a Bicubic algorithm reconstructed image, and an SRCNN algorithm reconstructed image.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. Note that the following description of embodiments is merely illustrative; the present invention is not limited to the applications or uses described, nor to the following embodiments.
Examples
A single-image super-resolution reconstruction method based on a generative adversarial network, as shown in FIG. 1, comprises the following steps:
S1: establishing an image database, the image database comprising a plurality of high-definition/low-definition image pairs, each pair consisting of an original high-definition image and a low-resolution image obtained by down-sampling it; the image pairs in the database are divided into a training set, a validation set and a test set.
In this embodiment, several groups of corresponding images with different resolutions are established as the image database, using the DIV2K data set: the training set contains 800 high-definition images, and the validation and test sets each contain 100 high-definition images. Each high-definition image is down-sampled to obtain a high-definition/low-definition image pair for adversarial training. To expand the data, the original images are randomly flipped and rotated by 90 degrees to enrich the samples.
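The pair construction and augmentation described above can be sketched as follows; the point-sampling down-sampler and the 0.5 flip probability are illustrative assumptions, since the embodiment only states that images are down-sampled, randomly flipped and rotated by 90 degrees:

```python
import random

def downsample(img, factor=4):
    """Toy x4 down-sampling by keeping every factor-th pixel (the patent
    does not specify the down-sampling kernel; bicubic is common)."""
    return [row[::factor] for row in img[::factor]]

def rot90(img):
    """Rotate a 2-D image (list of lists) 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]

def augment(img):
    """Random horizontal flip and random 90-degree rotation."""
    if random.random() < 0.5:
        img = [row[::-1] for row in img]   # horizontal flip
    if random.random() < 0.5:
        img = rot90(img)
    return img

hr = [[r * 8 + c for c in range(8)] for r in range(8)]
lr = downsample(hr)                 # the LR half of an HR-LR training pair
print(len(lr), len(lr[0]))          # 2 2
```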
S2: constructing a generative adversarial network comprising a generator network module, a discriminator module and a loss-calculation module.
Specifically, as shown in fig. 2, the generator network module comprises a shallow-feature-extraction network, a residual dense block, a dense-feature-fusion unit and an up-sampling network connected in sequence.
Further, the shallow-feature-extraction network comprises two 3x3 convolutional layers used to perform shallow feature extraction on the low-resolution image. The features extracted by the first shallow convolution are used for further shallow feature extraction and for global residual learning:

F_{-1} = H_Conv1(I^LR)

The features extracted by the second shallow convolution serve as the input of the RDB modules:

F_0 = H_Conv2(F_{-1})

where H_Conv1 and H_Conv2 denote the first and second convolutions of the shallow-feature-extraction stage, F_{-1} and F_0 are the outputs of the two convolutional layers, and I^LR is the low-resolution image.
Further, the residual dense block comprises a plurality of RDB modules. The state of the previous RDB module is passed to every layer of the current RDB module, realizing a contiguous-memory mechanism. The output F_i of the i-th RDB module is

F_i = H_RDB,i(F_{i-1}) = H_RDB,i(H_RDB,i-1(... H_RDB,1(F_0) ...))

where H_RDB,i denotes the i-th RDB module, whose input is the result computed by the first i-1 RDB modules, and F_i, F_{i-1} are the outputs of the corresponding RDB modules.
Each RDB module comprises six 3x3 convolutional layers, each activated with a Leaky ReLU function. In this embodiment, the generator network uses 36 RDB modules in total.
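The chained formula F_i = H_RDB,i(F_{i-1}) is plain function composition; a schematic sketch with stand-in modules (real RDB modules apply convolutions with dense connections and a local residual, not list operations):

```python
def make_rdb(i):
    """Stand-in for H_RDB,i: a real RDB applies six 3x3 convolutions with
    Leaky ReLU activations; here each module is reduced to a labelled
    transformation so the sequential chaining is visible."""
    def rdb(x):
        return x + [f"RDB{i}"]   # record that module i processed the state
    return rdb

modules = [make_rdb(i) for i in range(1, 4)]  # 3 modules instead of 36

f = ["F0"]                   # shallow-feature output F_0
for rdb in modules:          # F_i = H_RDB,i(F_{i-1})
    f = rdb(f)
print(f)                     # ['F0', 'RDB1', 'RDB2', 'RDB3']
```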
The RDB modules make full use of all hierarchical features of the original low-resolution image. Each residual dense block contains densely connected layers and local feature fusion (LFF) with local residual learning (LRL). The residual dense blocks also support contiguous memory between blocks: the output of one block has direct access to each layer of the next, so state is passed on continuously. Within a block, every convolutional layer has access to all subsequent layers, passing on the information that needs to be preserved. The states of all preceding layers of the previous block and the current block are connected, and local feature fusion adaptively preserves the information to extract local dense features. After the multi-level local dense features are extracted, global feature fusion (GFF) is further performed to adaptively retain the hierarchical features in a global manner.
Further, the dense-feature-fusion unit comprises a Concat layer, a 1x1 convolutional layer and a 3x3 convolutional layer connected in sequence. Functionally it performs global feature fusion and global residual learning, making full use of the features of all preceding layers, and is expressed as

F_GF = H_GFF(F_{-1}, F_0, F_1, ..., F_36)

where H_GFF is the dense-feature-fusion function and F_GF is the output of the unit. Specifically, the Concat layer fuses, in concatenation mode, the state features of all convolutional layers of the preceding RDB modules and the current RDB module; (F_{-1}, F_0, F_1, ..., F_36) are the feature maps of the individual RDB modules (and the two shallow convolutions) spliced into a whole along the channel dimension. H_GFF then applies the 1x1 and 3x3 convolutional layers: the 1x1 layer adaptively fuses the features of the different levels, and the 3x3 layer further extracts features before global residual learning, which can be expressed as

F_DF = F_{-1} + F_GF

where F_DF is the input to the up-sampling network.
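At a single spatial position, the fusion reduces to channel concatenation followed by a weighted sum (which is what a 1x1 convolution computes per pixel); a toy sketch under that simplification, with the trailing 3x3 convolution omitted and uniform weights chosen arbitrarily:

```python
def concat_channels(feature_maps):
    """Concat layer: stack per-module feature vectors along the channel axis."""
    out = []
    for fm in feature_maps:
        out.extend(fm)
    return out

def conv1x1(channels, weights):
    """A 1x1 convolution at one pixel is a weighted sum of the input
    channels (one output channel, bias omitted)."""
    return sum(c * w for c, w in zip(channels, weights))

# Toy single-pixel features from F_-1, F_0 and two RDB outputs.
f_minus1, f0, f1, f2 = [1.0], [2.0], [3.0], [4.0]
fused = concat_channels([f_minus1, f0, f1, f2])        # [1.0, 2.0, 3.0, 4.0]
f_gf = conv1x1(fused, weights=[0.25, 0.25, 0.25, 0.25])
f_df = f_minus1[0] + f_gf          # global residual: F_DF = F_-1 + F_GF
print(f_gf, f_df)                  # 2.5 3.5
```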
In the up-sampling network, the image is first enlarged to 2 times its original size by one nearest-neighbor interpolation, and then to 4 times its original size by a trained sub-pixel convolutional layer.
The reconstruction process is expressed as

I^SR = G(I^LR)

where G denotes the whole generative model and I^SR the reconstructed image.
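The core rearrangement performed by the sub-pixel convolution layer (depth-to-space) can be sketched on plain lists; the convolution that produces the r^2 channels is omitted:

```python
def pixel_shuffle(channels, r=2):
    """Depth-to-space: rearrange r*r channels of size H x W into a single
    channel of size rH x rW, as a sub-pixel convolution layer does."""
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0] * (w * r) for _ in range(h * r)]
    for c, chan in enumerate(channels):      # channel index c = dy*r + dx
        dy, dx = divmod(c, r)
        for y in range(h):
            for x in range(w):
                out[y * r + dy][x * r + dx] = chan[y][x]
    return out

# Four 1x1 channels become one 2x2 output block.
chans = [[[1]], [[2]], [[3]], [[4]]]
print(pixel_shuffle(chans))   # [[1, 2], [3, 4]]
```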
Specifically, the structure of the discriminator module is shown in fig. 3. In this embodiment the discriminator is based on VGG19. It comprises 16 convolutional layers, each followed by a Leaky ReLU (alpha = 0.2) activation. After two dense (fully connected) layers, the pixel mean of each feature map produced by the feature-extraction stage is computed; all the resulting values are linearly fused and activated by a sigmoid function, and the discriminator finally outputs the classification result for the input sample image.
The discriminator module takes the classic VGG19 network as its basic framework and can be regarded as two stages: feature extraction and linear classification. The feature-extraction stage comprises 16 convolutional layers, each followed by a Leaky ReLU activation. To avoid gradient vanishing, a BN layer is used after every convolution in the stage except the first. The discriminator must judge whether the input sample image is real or generated; when it can no longer distinguish images from the generator from the original high-definition data, the generator and the discriminator are considered optimal.
In particular, the overall loss function L_G in the loss-calculation module is

L_G = L_per + lambda * L_con + eta * L_adv

where L_per is the perceptual loss, L_con the content loss, L_adv the adversarial loss, and lambda, eta are weighting coefficients.
The perceptual loss is defined on the activation layers of a pre-trained deep network (VGG19), minimizing the distance between two activation features. The present invention uses the features before the activation layer, which overcomes two drawbacks of the original design. First, activated features are very sparse, especially after very deep networks; sparse activations provide weak supervision and lead to poor performance. Second, studies have shown that using activated features yields reconstructed intensities inconsistent with the real image. The perceptual loss L_per is calculated as

L_per = (1 / (W_ij * H_ij)) * sum_{a=1..W_ij} sum_{b=1..H_ij} ( phi_ij(I^HR)_{a,b} - phi_ij(G(I^LR))_{a,b} )^2

where W_ij and H_ij are the width and height of the feature map obtained after the j-th convolution before the i-th max-pooling layer of the discriminator module, a and b are the horizontal and vertical coordinates of a pixel in the current feature map, phi_ij is that feature map, I^HR is the high-definition image, I^LR the low-resolution image and G the entire generator network.
The method adopts the MSE (mean squared error) loss as the content loss of the model, responsible for minimizing the squared difference between corresponding pixels of the reconstructed image SR and the original high-definition image HR. Reducing the distance between pixel distributions effectively ensures the accuracy of the reconstructed image information and yields a higher peak signal-to-noise ratio. The content loss L_con is calculated as

L_con = L_MSE(theta) = (1/N) * sum_{k=1..N} || I_k^HR - G_theta(I_k^LR) ||^2

where L_MSE(theta) denotes the MSE loss, G_theta denotes the mapping from a low-resolution image to a high-resolution image learned by the generative adversarial network with parameters theta, I_k^HR is the k-th original high-definition image, I_k^LR the k-th low-resolution image, and N the number of training samples.
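A direct sketch of this loss, treating each image as a flat list of pixels; the per-image squared norm is averaged over the N pairs, matching the formula as written (per-pixel normalization, which many implementations add, is omitted):

```python
def content_loss_mse(hr_images, sr_images):
    """L_con = (1/N) * sum_k || I_k^HR - G(I_k^LR) ||^2 over N image pairs,
    with each image flattened to a list of pixel values."""
    n = len(hr_images)
    total = 0.0
    for hr, sr in zip(hr_images, sr_images):
        total += sum((h - s) ** 2 for h, s in zip(hr, sr))
    return total / n

hr_batch = [[10.0, 20.0], [30.0, 40.0]]
sr_batch = [[11.0, 19.0], [30.0, 42.0]]
print(content_loss_mse(hr_batch, sr_batch))   # 3.0
```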
Training a GAN is an adversarial process between the generator and the discriminator: after the generator outputs an SR image, the discriminator judges whether the image came from the generator or from an original high-definition image. The method adopts the adversarial loss proposed in WGAN, using the Wasserstein distance to measure the difference between the generated distribution and the real data distribution, which avoids the mode-collapse phenomenon of GANs and makes training more stable. The adversarial loss L_adv is calculated as

L_adv = E_{x~p_G}[D(x)] - E_{x~p_data}[D(x)] + lambda * E_{x~p_penalty}[ ( ||grad_x D(x)||_2 - 1 )^2 ]

where E_{x~p_G} denotes the expectation over x sampled from the generated data, E_{x~p_data} over x sampled from the real data, E_{x~p_penalty} over x sampled along the gradient-penalty interpolates, lambda is the penalty-term weight, D(x) is the output of the discriminator module, and grad_x D(x) is the gradient of the discriminator with respect to its input.
S3: inputting the high-definition/low-definition image pairs of the training set and the validation set into the generative adversarial network and training it iteratively to obtain a trained network; during the iterative training, the parameters of the discriminator module are fixed while the generator network module is trained, and the parameters of the generator network module are fixed while the discriminator module is trained.
Specifically, the adversarial objective function of the generator network module and the discriminator module in the generative adversarial network is

min_G max_D  E_{x~p_data}[D(x)] - E_{z~p_z}[D(G(z))] - lambda * E_{x~p_penalty}[ ( ||grad_x D(x)||_2 - 1 )^2 ]

where D denotes the discriminator module, G the generator network, z the random noise, conforming to a Gaussian distribution, that is input to the generator, D(x) the discriminator's prediction for an original image, D(G(z)) its prediction for a generated image, E_{x~p_data} the expectation over x sampled from the real data, z is sampled from the random prior distribution, and lambda is the penalty-term weight.
During training, the overall loss function L_G is recorded and observed, and the generator parameters are fixed when it converges. The number of training epochs and the learning rate are adjusted reasonably to prevent over-fitting from impairing the model's generalization ability.
In the training process, the generator network module extracts features from the low-resolution image and reconstructs it to obtain a reconstructed image. The original high-definition image and the reconstructed image serve as the input data of the discriminator module, which obtains the classification probability of a sample through a final sigmoid function. The discriminator and the generator are trained in turn: when training the generator network module, the discriminator parameters are fixed and the generator parameters are optimized; when training the discriminator module, the generator parameters are fixed and the discriminator parameters are optimized, until the loss function converges and training is finished.
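The alternating scheme can be sketched as a schedule of update steps (the one-discriminator-step-per-generator-step ratio is an assumption; the patent does not fix it):

```python
def train_gan(g_step, d_step, epochs=3, d_iters=1):
    """Alternating schedule: train the discriminator with the generator
    frozen, then the generator with the discriminator frozen.
    g_step/d_step stand in for one optimizer update of each module."""
    log = []
    for epoch in range(epochs):
        for _ in range(d_iters):
            d_step()                 # generator parameters are fixed here
            log.append("D")
        g_step()                     # discriminator parameters are fixed here
        log.append("G")
    return log

schedule = train_gan(g_step=lambda: None, d_step=lambda: None)
print(schedule)   # ['D', 'G', 'D', 'G', 'D', 'G']
```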
S4: inputting the low-resolution images of the test set into the trained generative adversarial network and outputting the reconstructed high-definition images.
The method improves the adversarial loss on the basis of the WGAN-GP theory by penalizing the input gradient of the discriminator module. The optimized adversarial loss trains the image-oriented generative adversarial network stably with almost no hyper-parameter tuning, generates higher-quality samples at a faster convergence rate, and provides stronger and more effective supervision for model training.
In this embodiment, to objectively evaluate the quality of the reconstructed image SR, the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) are used as evaluation criteria of image quality. PSNR is the most common and most widely used objective image-quality index; it is based on the error between corresponding pixel points, i.e. error-sensitive image-quality evaluation. PSNR is measured in dB, and larger values indicate less distortion:

PSNR = 10 * log10( MAX^2 / MSE )

where MAX is the maximum possible pixel value (255 for 8-bit images) and MSE is the mean squared error between the two images.

SSIM measures image similarity in terms of luminance, contrast and structure. Its value lies in [0, 1], and larger values indicate less image distortion:

SSIM(X, Y) = ( (2 * mu_X * mu_Y + C1) * (2 * sigma_XY + C2) ) / ( (mu_X^2 + mu_Y^2 + C1) * (sigma_X^2 + sigma_Y^2 + C2) )

where mu_X and mu_Y are the means of images X and Y, sigma_X^2 and sigma_Y^2 their variances, sigma_XY the covariance of X and Y, and C1, C2 are constants.
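Both metrics can be computed directly from the formulas above; the SSIM below uses global image statistics in a single window, whereas standard implementations average over local windows, so its values will differ from library results:

```python
import math

def psnr(x, y, max_val=255.0):
    """PSNR = 10 * log10(MAX^2 / MSE) between two flattened images."""
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    return float("inf") if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)

def ssim(x, y, c1=6.5025, c2=58.5225):
    """Global-statistics SSIM (one window over the whole image);
    C1 = (0.01*255)^2 and C2 = (0.03*255)^2 are the usual constants."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

a = [10.0, 20.0, 30.0, 40.0]
b = [v + 10.0 for v in a]          # uniform offset of 10 grey levels
print(round(psnr(a, b), 2))        # 28.13
print(ssim(a, a))                  # 1.0
```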
The experimental data are recorded. Experimental environment of this embodiment: Windows 10, a Tesla P100 graphics card, and the deep-learning framework TensorFlow 2.0; the comparison algorithms are Bicubic (bicubic interpolation) and SRCNN (super-resolution convolutional neural network). The peak signal-to-noise ratio and structural-similarity values of the Bicubic model, the SRCNN model and the model of the invention on the test pictures are shown in the following table.
Method | PSNR↑ | SSIM↑
Bicubic | 23.98 | 0.5387
SRCNN | 24.37 | 0.5508
The invention | 25.46 | 0.5673
The reconstruction results are shown in fig. 4, where (a) is the original high-definition image HR; (b) is the reconstruction result of the Bicubic algorithm; (c) is the reconstruction result of the SRCNN algorithm; and (d) is the reconstruction result of the algorithm of the invention.
Both the PSNR and SSIM results of the invention on the test pictures are superior to those of the Bicubic and SRCNN algorithms.
The above embodiments are merely examples and do not limit the scope of the present invention. These embodiments may be implemented in other various manners, and various omissions, substitutions, and changes may be made without departing from the technical spirit of the present invention.
Claims (10)
1. A single-image super-resolution reconstruction method based on a generation countermeasure network is characterized by comprising the following steps:
S1: establishing an image database, wherein the image database comprises a plurality of high-definition/low-definition image pairs, each pair comprising an original high-definition image and a low-resolution image obtained by down-sampling the original high-definition image, and the high-definition/low-definition image pairs in the image database are divided into a training set, a verification set and a test set;
S2: constructing a generative countermeasure network, wherein the generative countermeasure network comprises a generation network module, a discrimination module and a loss calculation module;
S3: inputting the high-definition/low-definition image pairs in the training set and the verification set into the generative countermeasure network and performing iterative training to obtain a trained generative countermeasure network, wherein during the iterative training the parameters of the discrimination module are fixed while the generation network module is trained, and the parameters of the generation network module are fixed while the discrimination module is trained;
S4: inputting the original high-definition images in the test set into the trained generative countermeasure network, and outputting reconstructed images.
2. The single-image super-resolution reconstruction method based on the generative countermeasure network of claim 1, wherein the generative network module comprises a shallow feature extraction network, a residual dense block, a dense feature fusion unit and an upsampling network, which are connected in sequence.
3. The single-image super-resolution reconstruction method based on the generative countermeasure network of claim 2, wherein the shallow feature extraction network comprises two 3x3 shallow convolution layers, and performs shallow feature extraction on the input low-resolution image.
4. The single-image super-resolution reconstruction method based on the generative countermeasure network of claim 2, wherein the residual dense block comprises a plurality of RDB modules, and the output F_i of the i-th RDB module is:
F_i = H_RDB,i(F_i−1) = H_RDB,i(H_RDB,i−1(…H_RDB,1(F_0)…))
wherein H_RDB,i represents the i-th RDB module, whose input is the calculation result of the first i−1 RDB modules, and F_0 is the output of the shallow feature extraction network.
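The chained mapping above can be wired up as a short sketch (purely illustrative: each placeholder callable stands in for a real residual dense block, and four modules are used instead of the patent's full count):

```python
import numpy as np

# Sketch of the chained computation F_i = H_RDB,i(F_{i-1}): module i consumes
# the result of the first i-1 modules.
def make_rdb(i):
    def rdb(f):
        return f + i          # placeholder for conv layers + dense connections
    return rdb

f0 = np.zeros(3)              # shallow-feature output F_0 (placeholder)
features = [f0]
for i in range(1, 5):         # 4 RDBs for illustration
    features.append(make_rdb(i)(features[-1]))
print(features[-1])           # F_4 = F_0 + 1 + 2 + 3 + 4
```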
5. The single-image super-resolution reconstruction method based on the generative countermeasure network of claim 2, wherein the dense feature fusion unit comprises a Concat layer, a 1x1 convolutional layer and a 3x3 convolutional layer connected in sequence, and is represented as:
F_GF = H_GFF([F_−1, F_0, F_1, …, F_36])
wherein H_GFF is the dense feature fusion function, F_GF is the output of the dense feature fusion unit, [F_1, …, F_36] denotes the feature maps generated by all RDB modules in the residual dense block spliced into a whole along the channel dimension, and F_−1 and F_0 are the output results of the two convolutional layers in the shallow feature extraction network, respectively.
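The fusion step reduces to a channel concatenation followed by a per-pixel linear map (the 1x1 convolution). A NumPy shape sketch, with illustrative sizes (6 maps instead of 38) and the trailing 3x3 convolution omitted:

```python
import numpy as np

# Dense feature fusion sketch: concatenate feature maps on the channel axis,
# then fuse with a 1x1 convolution back to a fixed channel count.
h, w, c, n_maps = 8, 8, 4, 6            # illustrative sizes
maps = [np.random.default_rng(i).normal(size=(h, w, c)) for i in range(n_maps)]

concat = np.concatenate(maps, axis=-1)   # Concat layer: (8, 8, 24)

# A 1x1 convolution is a per-pixel linear map over channels.
w1x1 = np.random.default_rng(42).normal(size=(n_maps * c, c))
fused = concat @ w1x1                    # (8, 8, 4)
print(concat.shape, fused.shape)
```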
6. The single-image super-resolution reconstruction method based on the generative countermeasure network of claim 2, wherein the upsampling network first uses nearest-neighbour interpolation to enlarge the image to 2 times its original size, and then uses a trained sub-pixel convolution layer to enlarge the image to 4 times its original size.
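The two upsampling stages can be sketched in NumPy (single-channel toy input; the learned convolution that would produce the 4 sub-pixel channels is replaced here by simple channel stacking, so the shapes are the point, not the values):

```python
import numpy as np

# Stage 1: nearest-neighbour 2x. Stage 2: sub-pixel (pixel-shuffle) 2x.
def nearest_2x(img):                     # (h, w) -> (2h, 2w)
    return img.repeat(2, axis=0).repeat(2, axis=1)

def pixel_shuffle_2x(feat):              # (h, w, 4) -> (2h, 2w)
    h, w, _ = feat.shape
    return feat.reshape(h, w, 2, 2).transpose(0, 2, 1, 3).reshape(2 * h, 2 * w)

lr = np.arange(4, dtype=np.float64).reshape(2, 2)
x2 = nearest_2x(lr)                               # (4, 4)
x4 = pixel_shuffle_2x(np.stack([x2] * 4, -1))     # (8, 8)
print(x2.shape, x4.shape)
```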
7. The single-image super-resolution reconstruction method based on the generative countermeasure network of claim 1, wherein the overall loss function L_G in the loss calculation module is given by:
L_G = L_per + λ·L_con + η·L_adv
wherein L_G is the overall loss function, L_per is the perceptual loss, L_con is the content loss, and L_adv is the adversarial loss.
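A purely numeric illustration of how the loss calculation module combines the three terms (the loss values and the weights λ and η below are invented for illustration; the patent does not fix them in this claim):

```python
# Overall generator loss L_G = L_per + lam * L_con + eta * L_adv.
l_per, l_con, l_adv = 0.25, 0.10, 0.02   # hypothetical per-batch loss values
lam, eta = 1.0, 5e-3                     # assumed weights
l_g = l_per + lam * l_con + eta * l_adv
print(round(l_g, 4))                     # 0.25 + 0.10 + 0.0001 = 0.3501
```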
8. The single-image super-resolution reconstruction method based on the generative countermeasure network of claim 7, wherein the perceptual loss L_per is calculated as:

L_per = 1/(W_ij · H_ij) · Σ_a Σ_b (φ_ij(I^HR)_(a,b) − φ_ij(G(I^LR))_(a,b))²

wherein W_ij is the width and H_ij is the height of the feature map obtained after the j-th convolution and before the i-th max-pooling layer in the discrimination module, a and b are the horizontal and vertical coordinates of a pixel point in the current feature map, φ_ij is that feature map, I^HR is the high-definition image, I^LR is the low-resolution image, and G is the entire generation network.
9. The single-image super-resolution reconstruction method based on the generative countermeasure network of claim 7, wherein the content loss L_con is calculated as:
10. The single-image super-resolution reconstruction method based on the generative countermeasure network of claim 7, wherein the adversarial loss L_adv is calculated as:

L_adv = E_(x~P_g)[D(x)] − E_(x~P_r)[D(x)] + λ · E_(x~penalty)[(‖∇_x D(x)‖₂ − 1)²]

wherein E_(x~P_g) denotes the expectation over x sampled from the generated-data distribution, E_(x~P_r) denotes the expectation over x sampled from the real-data distribution, E_(x~penalty) denotes the expectation over x sampled from the gradient-penalty distribution, λ is the penalty term weight, D(x) is the output of the discrimination module, and ∇_x D(x) is the gradient of the discrimination module with respect to its input.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010894651.5A CN112037131A (en) | 2020-08-31 | 2020-08-31 | Single-image super-resolution reconstruction method based on generation countermeasure network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112037131A true CN112037131A (en) | 2020-12-04 |
Family
ID=73586553
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010894651.5A Pending CN112037131A (en) | 2020-08-31 | 2020-08-31 | Single-image super-resolution reconstruction method based on generation countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112037131A (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112419159A (en) * | 2020-12-07 | 2021-02-26 | 上海互联网软件集团有限公司 | Character image super-resolution reconstruction system and method |
CN112446828A (en) * | 2021-01-29 | 2021-03-05 | 成都东方天呈智能科技有限公司 | Thermal imaging super-resolution reconstruction method fusing visible image gradient information |
CN112561799A (en) * | 2020-12-21 | 2021-03-26 | 江西师范大学 | Infrared image super-resolution reconstruction method |
CN112634134A (en) * | 2020-12-09 | 2021-04-09 | 株洲菲斯罗克光电技术有限公司 | Image processing method and system in optical fiber ring winding process and computer storage medium |
CN112734646A (en) * | 2021-01-19 | 2021-04-30 | 青岛大学 | Image super-resolution reconstruction method based on characteristic channel division |
CN112734649A (en) * | 2021-01-06 | 2021-04-30 | 暨南大学 | Image degradation method and system based on lightweight neural network |
CN113222910A (en) * | 2021-04-25 | 2021-08-06 | 南京邮电大学 | Method and device for extracting characteristic points of X-ray head shadow measurement image based on perception loss |
CN113298900A (en) * | 2021-04-30 | 2021-08-24 | 北京航空航天大学 | Processing method based on low signal-to-noise ratio PET image |
CN113344110A (en) * | 2021-06-26 | 2021-09-03 | 浙江理工大学 | Fuzzy image classification method based on super-resolution reconstruction |
CN113435384A (en) * | 2021-07-07 | 2021-09-24 | 中国人民解放军国防科技大学 | Target detection method, device and equipment for medium-low resolution optical remote sensing image |
CN113762277A (en) * | 2021-09-09 | 2021-12-07 | 东北大学 | Multi-band infrared image fusion method based on Cascade-GAN |
CN113781311A (en) * | 2021-10-10 | 2021-12-10 | 北京工业大学 | Image super-resolution reconstruction method based on generation countermeasure network |
CN114677281A (en) * | 2022-04-12 | 2022-06-28 | 西南石油大学 | FIB-SEM super-resolution algorithm based on generation countermeasure network |
CN114723611A (en) * | 2022-06-10 | 2022-07-08 | 季华实验室 | Image reconstruction model training method, reconstruction method, device, equipment and medium |
CN114972332A (en) * | 2022-07-15 | 2022-08-30 | 南京林业大学 | Bamboo laminated wood crack detection method based on image super-resolution reconstruction network |
CN115511748A (en) * | 2022-09-30 | 2022-12-23 | 北京航星永志科技有限公司 | Image high-definition processing method and device and electronic equipment |
CN115936983A (en) * | 2022-11-01 | 2023-04-07 | 青岛哈尔滨工程大学创新发展中心 | Method and device for super-resolution of nuclear magnetic image based on style migration and computer storage medium |
CN117808717A (en) * | 2024-01-05 | 2024-04-02 | 武汉工程大学 | Space-borne TDI CCD camera platform tremor imaging geometric distortion correction method based on generation countermeasure network |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109509152A (en) * | 2018-12-29 | 2019-03-22 | 大连海事大学 | A kind of image super-resolution rebuilding method of the generation confrontation network based on Fusion Features |
CN109978762A (en) * | 2019-02-27 | 2019-07-05 | 南京信息工程大学 | A kind of super resolution ratio reconstruction method generating confrontation network based on condition |
CN110136063A (en) * | 2019-05-13 | 2019-08-16 | 南京信息工程大学 | A kind of single image super resolution ratio reconstruction method generating confrontation network based on condition |
US20190373293A1 (en) * | 2019-08-19 | 2019-12-05 | Intel Corporation | Visual quality optimized video compression |
CN110599401A (en) * | 2019-08-19 | 2019-12-20 | 中国科学院电子学研究所 | Remote sensing image super-resolution reconstruction method, processing device and readable storage medium |
CN111583113A (en) * | 2020-04-30 | 2020-08-25 | 电子科技大学 | Infrared image super-resolution reconstruction method based on generation countermeasure network |
Non-Patent Citations (5)
Title |
---|
TAIZHANG SHANG et al.: "Perceptual Extreme Super Resolution Network with Receptive Field Block", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) *
YULUN ZHANG et al.: "Residual Dense Network for Image Super-Resolution", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
YUNING JIANG et al.: "Generative Adversarial Network for Image Super-Resolution Combining Texture Loss" *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112037131A (en) | Single-image super-resolution reconstruction method based on generation countermeasure network | |
CN110033410B (en) | Image reconstruction model training method, image super-resolution reconstruction method and device | |
CN111583109B (en) | Image super-resolution method based on generation of countermeasure network | |
CN112507997B (en) | Face super-resolution system based on multi-scale convolution and receptive field feature fusion | |
CN110570353B (en) | Super-resolution reconstruction method for generating single image of countermeasure network by dense connection | |
CN112001847A (en) | Method for generating high-quality image by relatively generating antagonistic super-resolution reconstruction model | |
CN111105352B (en) | Super-resolution image reconstruction method, system, computer equipment and storage medium | |
CN111192200A (en) | Image super-resolution reconstruction method based on fusion attention mechanism residual error network | |
CN110599401A (en) | Remote sensing image super-resolution reconstruction method, processing device and readable storage medium | |
CN110136063A (en) | A kind of single image super resolution ratio reconstruction method generating confrontation network based on condition | |
Kim et al. | Deep residual network with enhanced upscaling module for super-resolution | |
CN111275637A (en) | Non-uniform motion blurred image self-adaptive restoration method based on attention model | |
CN113284051B (en) | Face super-resolution method based on frequency decomposition multi-attention machine system | |
CN112580473B (en) | Video super-resolution reconstruction method integrating motion characteristics | |
CN112270644A (en) | Face super-resolution method based on spatial feature transformation and cross-scale feature integration | |
CN112949636B (en) | License plate super-resolution recognition method, system and computer readable medium | |
CN112699844A (en) | Image super-resolution method based on multi-scale residual error level dense connection network | |
Zheng et al. | Self-normalizing generative adversarial network for super-resolution reconstruction of SAR images | |
CN116168067B (en) | Supervised multi-modal light field depth estimation method based on deep learning | |
CN117635428A (en) | Super-resolution reconstruction method for lung CT image | |
CN115880158A (en) | Blind image super-resolution reconstruction method and system based on variational self-coding | |
CN114331913B (en) | Motion blurred image restoration method based on residual attention block | |
Luo et al. | Bi-GANs-ST for perceptual image super-resolution | |
CN117575915A (en) | Image super-resolution reconstruction method, terminal equipment and storage medium | |
Xiang et al. | Boosting high-level vision with joint compression artifacts reduction and super-resolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20201204 |