CN113781311A - Image super-resolution reconstruction method based on generative adversarial network - Google Patents
- Publication number: CN113781311A
- Application number: CN202111178655.4A
- Authority: CN (China)
- Prior art keywords: image, network, resolution, adversarial network
- Prior art date: 2021-10-10
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/253: Pattern recognition; fusion techniques of extracted features
- G06N3/045: Neural networks; combinations of networks
- G06N3/08: Neural networks; learning methods
- G06T2207/30168: Image quality inspection
Abstract
The invention discloses an image super-resolution reconstruction method based on a generative adversarial network, comprising the following steps: firstly, establishing an image database; secondly, constructing a generative adversarial network comprising a generator network, a discriminator network and a loss calculation module; thirdly, inputting the high-resolution/low-resolution image pairs in the training set into the generative adversarial network for iterative training, with verification performed on the validation set, finally obtaining a trained generative adversarial network; fourthly, inputting the original low-resolution images in the test set into the trained generative adversarial network and outputting the reconstructed high-resolution images. By incorporating a self-attention mechanism, the method takes both the local and global characteristics of the image into account, can reconstruct images that are more realistic and have sharper textures, and the reconstructed images achieve higher peak signal-to-noise ratio and structural similarity.
Description
Technical Field
The invention relates to the technical field of image super-resolution, and in particular to an image super-resolution reconstruction method based on a generative adversarial network.
Background
With technological progress and the expansion of application fields, people's requirements for image quality have risen greatly. To meet the demand for high-quality images, improving image resolution has become a research hotspot in the field of image processing. Remote sensing technology began to rise in the 1960s. In this technology, a sensor sensitive to electromagnetic waves collects electromagnetic wave information, which is processed to obtain a remote sensing image. Compared with images obtained by ordinary cameras, remote sensing images offer a large detection range, fewer ground-based limitations, and a large amount of acquired information. Remote sensing images are therefore widely used in fields such as object matching and detection, land cover classification, urban economic level assessment, and resource exploration.
However, during remote sensing imaging, factors such as long-distance imaging, atmospheric turbulence, transmission noise and motion blur cause the quality and spatial resolution of remote sensing images to be relatively poor, lower than those of natural images. Furthermore, the ground objects in remote sensing images often occur at different scales, so that objects and their surroundings are coupled to each other in the joint distribution of image patterns. The most direct and effective way to obtain high-resolution remote sensing images is to improve the hardware of the imaging equipment. However, improving image resolution through better hardware is costly and technically difficult, and is therefore impractical.
The emergence of deep learning algorithms has brought a great revolution to image super-resolution reconstruction. Although existing image super-resolution reconstruction techniques can address the low resolution of remote sensing images, further research is needed on how to reconstruct high-resolution remote sensing images given their complex backgrounds, wide fields of view, generally small targets, blurry recovered images, and the loss of high-frequency texture information from the original image. It is therefore necessary to improve the quality of the reconstructed image.
Disclosure of Invention
The invention provides an image super-resolution reconstruction method based on a generative adversarial network. It takes the generative adversarial network as the overall algorithmic framework and introduces a self-attention mechanism, generating each part of the image with reference to the whole image rather than only a local region. By exploiting global feature information, it better reconstructs the texture details of the super-resolution image and improves the reconstruction result; the reconstructed image achieves higher peak signal-to-noise ratio and structural similarity.
In order to achieve this purpose, the invention adopts the following technical scheme:
An image super-resolution reconstruction method based on a generative adversarial network, comprising the following steps:
firstly, establishing an image database, wherein the image database comprises a plurality of high-resolution/low-resolution image pairs, each pair consisting of an original high-resolution image and an original low-resolution image obtained by downsampling the original high-resolution image, and the image pairs in the image database are divided into a training set, a validation set and a test set;
secondly, constructing a generative adversarial network, wherein the generative adversarial network comprises a generator network, a discriminator network and a loss calculation module;
thirdly, inputting the high-resolution/low-resolution image pairs in the training set into the generative adversarial network for iterative training, wherein the parameters of the discriminator network are fixed while the generator network is trained and the parameters of the generator network are fixed while the discriminator network is trained, and verifying on the validation set, finally obtaining a trained generative adversarial network;
and fourthly, inputting the original low-resolution images in the test set into the trained generative adversarial network and outputting the reconstructed high-resolution images.
As a further improvement of the technical scheme: in step 1, the original high-resolution image is downsampled to obtain the original low-resolution image; the downsampling method is bicubic interpolation, in which each new pixel value is computed as a weighted sum over 16 neighboring pixels of the image.
As a further improvement of the technical scheme: in step 2, the generator network comprises a shallow feature extraction network, a self-attention residual module, a feature fusion unit and an upsampling network connected in sequence.
As a further improvement of the technical scheme: in step 2, the shallow feature extraction network comprises three convolution kernels of different sizes, 3x3, 5x5 and 7x7, which perform shallow feature extraction on the input image; the image is convolved through three receptive fields of different sizes, and the resulting feature maps together with the original image are fed into a Concat layer and then into a 1x1 convolutional layer.
As a further improvement of the technical scheme: in step 2, the self-attention residual module comprises a plurality of ARDB modules; each ARDB module adopts two convolutional layers and a self-attention layer, each convolutional layer is followed by an instance normalization layer, the activation function is Parametric ReLU, and a skip connection is added.
As a further improvement of the technical scheme: in step 2, the feature fusion unit comprises a Concat layer, a 1x1 convolutional layer and a 3x3 convolutional layer connected in sequence, and the dense feature fusion unit is expressed as:

$$F_{DF} = H_{DFF}(F_0, F_1, \dots, F_{28})$$

where $H_{DFF}$ is the dense feature fusion function, $F_{DF}$ is the output of the dense feature fusion unit, $[F_1, \dots, F_{28}]$ are the feature maps produced by the ARDB modules in the self-attention residual module, concatenated along the channel dimension to form a whole, and $F_0$ is the output of the shallow feature extraction network.
As a further improvement of the technical scheme: in step 2, two sub-pixel convolution layers in the upsampling network restore the image to 4 times its original size.
As a further improvement of the technical scheme: in step 2, the overall generator loss function $L_G$ in the loss calculation module is given by:

$$L_G = L_{per} + \lambda L_{adv}$$

where $L_G$ is the overall loss function of the network, $L_{per}$ is the perceptual loss, $L_{adv}$ is the adversarial loss, and $\lambda$ is a weighting coefficient.
As a further improvement of the technical scheme: in step 2, the perceptual loss $L_{per}$ is calculated as:

$$L_{per} = \frac{1}{W_{ij} H_{ij}} \sum_{x=1}^{W_{ij}} \sum_{y=1}^{H_{ij}} \left( \phi_{ij}(I^{HR})_{x,y} - \phi_{ij}\big(G(I^{LR})\big)_{x,y} \right)^2$$

where $W_{ij}$ and $H_{ij}$ denote the dimensions of the feature map in the VGG network, $\phi_{ij}$ denotes the feature map obtained after the j-th convolution (before activation) and before the i-th max-pooling layer in the VGG network, $I^{HR}$ is the high-resolution image, $I^{LR}$ is the low-resolution image, and $G$ is the entire generator network; $x$ and $y$ represent the pixel coordinates of the image in the width and height directions, respectively.
As a further improvement of the technical scheme: in step 2, the adversarial loss $L_{adv}$ is calculated as:

$$L_{adv} = \mathbb{E}_{x \sim P_g}[D(x)] - \mathbb{E}_{x \sim P_r}[D(x)]$$

where $\mathbb{E}_{x \sim P_g}$ denotes the expectation over $x$ sampled from the generated data distribution, $\mathbb{E}_{x \sim P_r}$ denotes the expectation over $x$ sampled from the real data distribution, and $D(x)$ is the output of the discriminator network.
Compared with the prior art, the invention has the following advantages:
(1) An improved self-attention residual block is introduced to form the main body of the generator network on the basis of a residual convolutional network. The introduced self-attention mechanism captures the global features of the image in a single step, makes better use of global feature information to handle long-range and multi-level dependencies within the image, and coordinates the generation of distant and nearby details at every position in the image.
(2) Shallow multi-scale features are extracted using convolution kernels of different sizes, and the features extracted by the different kernels are fused, so that the information in the original low-resolution image is fully utilized.
(3) During training, the Wasserstein distance is used to optimize the adversarial training, so that the distance between the real distribution and the generated distribution is still reflected even when the two distributions do not overlap; this provides a meaningful gradient and improves the stability of model training.
The foregoing is only an overview of the technical solution of the present invention. In order that the technical solution may be understood more clearly and implemented in accordance with the contents of the description, a detailed description is given below with reference to preferred embodiments of the present invention and the accompanying drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention without limiting it. In the drawings:
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic structural diagram of the generator network of the present invention;
FIG. 3 is a schematic structural diagram of the discriminator network of the present invention;
FIG. 4 compares the image reconstructed by the present invention with the Bicubic-reconstructed image, the SRCNN-reconstructed image, and the original high-resolution image.
Detailed Description
The principles and features of this invention are described below in conjunction with the accompanying drawings, which are provided by way of illustration only and are not intended to limit the scope of the invention. Advantages and features of the present invention will become apparent from the following description and the claims. It should be noted that the drawings are in a very simplified form and not to precise scale, serving only to facilitate a convenient and clear description of the embodiments of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
As shown in FIG. 1, in an embodiment of the present invention, an image super-resolution reconstruction method based on a generative adversarial network comprises the following steps:
firstly, establishing an image database, wherein the image database comprises a plurality of high-resolution/low-resolution image pairs, each pair consisting of an original high-resolution image and an original low-resolution image obtained by downsampling the original high-resolution image, and the image pairs in the image database are divided into a training set, a validation set and a test set;
In this embodiment, multiple sets of corresponding images of different resolutions are established as the image database. The NWPU-RESISC45 dataset is used: the training set contains 25000 high-resolution images, and the validation set and the test set each contain 2500 high-resolution images. Each high-resolution image is downsampled to obtain a high-resolution/low-resolution image pair for training the generative adversarial network.
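To make this pair-generation step concrete, the following is a minimal sketch of producing LR counterparts by bicubic downsampling at the 4x scale used by the upsampling network described below; the file paths and the `make_lr` helper name are illustrative assumptions, not part of the patent.

```python
from pathlib import Path
from PIL import Image

SCALE = 4  # matches the 4x upsampling network described below

def make_lr(hr_path: Path, lr_dir: Path) -> None:
    """Downsample one HR image by bicubic interpolation to create its LR pair."""
    hr = Image.open(hr_path).convert("RGB")
    w, h = hr.size
    # Bicubic interpolation: each output pixel is a weighted sum over a
    # 4x4 (16-pixel) neighborhood of the source image.
    lr = hr.resize((w // SCALE, h // SCALE), Image.BICUBIC)
    lr.save(lr_dir / hr_path.name)

# Illustrative usage over a directory of HR images (paths assumed):
# for p in Path("NWPU-RESISC45/train_hr").glob("*.png"):
#     make_lr(p, Path("NWPU-RESISC45/train_lr"))
```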
secondly, constructing a generative adversarial network, wherein the generative adversarial network comprises a generator network, a discriminator network and a loss calculation module;
specifically, as shown in fig. 2, the network generation module includes a shallow feature extraction network, a self-attention residual block, a feature fusion unit, and an upsampling network, which are connected in sequence.
Further, the shallow feature extraction network comprises three convolution kernels of different sizes, 3x3, 5x5 and 7x7, which perform shallow feature extraction on the input image; the image is convolved through three receptive fields of different sizes, and the resulting feature maps together with the original image are fed into a Concat layer and then into a 1x1 convolutional layer.
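A minimal PyTorch sketch of this shallow multi-scale extraction stage is given below; the patent does not state the channel widths, so the 64-channel branches and the class name are assumptions:

```python
import torch
import torch.nn as nn

class ShallowFeatureExtractor(nn.Module):
    """Multi-scale shallow feature extraction: 3x3, 5x5 and 7x7 branches,
    concatenated with the input image, then fused by a 1x1 convolution."""
    def __init__(self, in_ch: int = 3, branch_ch: int = 64, out_ch: int = 64):
        super().__init__()
        self.conv3 = nn.Conv2d(in_ch, branch_ch, 3, padding=1)
        self.conv5 = nn.Conv2d(in_ch, branch_ch, 5, padding=2)
        self.conv7 = nn.Conv2d(in_ch, branch_ch, 7, padding=3)
        # Concat layer output: three branches plus the original image
        self.fuse = nn.Conv2d(3 * branch_ch + in_ch, out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.conv3(x), self.conv5(x), self.conv7(x), x], dim=1)
        return self.fuse(feats)
```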
Further, the self-attention residual module comprises a plurality of ARDB modules, and a contiguous memory mechanism is implemented by passing the state of the preceding ARDB module to each layer of the current ARDB module.
Each ARDB module adopts two 3x3 convolutional layers, each followed by an instance normalization layer, together with a self-attention layer; the activation function is Parametric ReLU, and a skip connection is added. In this embodiment, the generator network uses 28 ARDB modules in total.
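The sketch below shows one plausible form of the ARDB block under these constraints: two 3x3 convolutions, each followed by instance normalization and PReLU, then a self-attention layer, with a skip connection around the whole block. The patent does not spell out the internals of its self-attention layer, so the SAGAN-style formulation used here is an assumption:

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """SAGAN-style self-attention over spatial positions (assumed form)."""
    def __init__(self, ch: int):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned blending weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # B x HW x C'
        k = self.k(x).flatten(2)                   # B x C' x HW
        attn = torch.softmax(q @ k, dim=-1)        # B x HW x HW attention map
        v = self.v(x).flatten(2)                   # B x C x HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

class ARDB(nn.Module):
    """Attention residual block: two 3x3 convs + InstanceNorm + PReLU,
    a self-attention layer, and a skip connection."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.PReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.PReLU(),
            SelfAttention(ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)  # skip connection
```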
Further, the feature fusion unit comprises a Concat layer, a 1x1 convolutional layer and a 3x3 convolutional layer connected in sequence, and the dense feature fusion unit is expressed as:

$$F_{DF} = H_{DFF}(F_0, F_1, \dots, F_{28})$$

where $H_{DFF}$ is the dense feature fusion function, $F_{DF}$ is the output of the dense feature fusion unit, $[F_1, \dots, F_{28}]$ are the feature maps produced by the ARDB modules in the self-attention residual module, concatenated along the channel dimension to form a whole, and $F_0$ is the output of the shallow feature extraction network.
Specifically, the state features of all the convolutional layers in the preceding and current ARDB modules are fused by concatenation using a Concat layer. $H_{DFF}$ denotes the operation of the 1x1 and 3x3 convolutional layers: the 1x1 convolutional layer adaptively fuses the series of features from different layers, and the 3x3 convolutional layer is introduced to further extract features before global residual learning is performed.
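Under the same assumed channel width, a minimal sketch of this dense feature fusion unit (Concat, 1x1 adaptive fusion, 3x3 extraction, then global residual learning against $F_0$) could look like this:

```python
from typing import List
import torch
import torch.nn as nn

class DenseFeatureFusion(nn.Module):
    """Fuses F_0 and the 28 ARDB outputs: Concat -> 1x1 conv -> 3x3 conv,
    followed by a global residual connection to F_0."""
    def __init__(self, ch: int = 64, n_blocks: int = 28):
        super().__init__()
        self.fuse1x1 = nn.Conv2d((n_blocks + 1) * ch, ch, 1)  # adaptive fusion
        self.conv3x3 = nn.Conv2d(ch, ch, 3, padding=1)        # further extraction

    def forward(self, f0: torch.Tensor, block_outs: List[torch.Tensor]) -> torch.Tensor:
        f_df = self.conv3x3(self.fuse1x1(torch.cat([f0] + block_outs, dim=1)))
        return f0 + f_df  # global residual learning
```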
Further, the two sub-pixel convolution layers in the upsampling network restore the image to 4 times its original size.
The reconstruction process is represented as:
$$I^{SR} = G(I^{LR})$$

where $G$ denotes the entire generator model, $I^{SR}$ denotes the reconstructed image, and $I^{LR}$ denotes the input low-resolution image.
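A minimal sketch of the two-stage sub-pixel upsampling head described above, each PixelShuffle stage enlarging by 2x for 4x in total; the final 3x3 reconstruction convolution and the channel width are assumptions:

```python
import torch
import torch.nn as nn

class Upsampler(nn.Module):
    """Two sub-pixel convolution stages, each 2x, for 4x total upscaling."""
    def __init__(self, ch: int = 64, out_ch: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, 4 * ch, 3, padding=1), nn.PixelShuffle(2), nn.PReLU(),
            nn.Conv2d(ch, 4 * ch, 3, padding=1), nn.PixelShuffle(2), nn.PReLU(),
            nn.Conv2d(ch, out_ch, 3, padding=1),  # final reconstruction layer (assumed)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)
```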
Specifically, as shown in FIG. 3, this embodiment bases the discriminator network on the classic VGG-19 architecture. The discriminator network comprises 8 convolutional layers with LeakyReLU as the activation function; as the network deepens, the number of feature channels increases from 64 to 512 while the feature map size is continuously reduced.
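A minimal sketch of such a VGG-style discriminator follows: 8 convolutional layers growing from 64 to 512 channels with LeakyReLU, using stride-2 layers to shrink the feature maps. Because the patent adopts the WGAN loss, the head below outputs an unbounded critic score rather than a sigmoid probability; that detail, and the exact stride pattern, are inferences on our part:

```python
import torch
import torch.nn as nn

def disc_block(in_ch: int, out_ch: int, stride: int):
    return [nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
            nn.LeakyReLU(0.2, inplace=True)]

class Discriminator(nn.Module):
    """VGG-style critic: 8 conv layers, 64 -> 512 channels, shrinking spatial size."""
    def __init__(self, in_ch: int = 3):
        super().__init__()
        layers = disc_block(in_ch, 64, 1)
        cfg = [(64, 64, 2), (64, 128, 1), (128, 128, 2), (128, 256, 1),
               (256, 256, 2), (256, 512, 1), (512, 512, 2)]
        for i, o, s in cfg:
            layers += disc_block(i, o, s)
        self.features = nn.Sequential(*layers)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(512, 1))  # unbounded WGAN score

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))
```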
Specifically, the overall generator loss function $L_G$ in the loss calculation module is given by:

$$L_G = L_{per} + \lambda L_{adv}$$

where $L_G$ is the overall loss function of the network, $L_{per}$ is the perceptual loss, $L_{adv}$ is the adversarial loss, and $\lambda$ is a weighting coefficient.
In this embodiment, in order to generate images with more accurate brightness and realistic texture, the VGG-based perceptual loss $L_{per}$ is computed from the feature-layer information before activation rather than after activation. $L_{per}$ is defined on the activation layers of a pre-trained deep network and minimizes the distance between the features of the two images. The perceptual loss $L_{per}$ is calculated as:

$$L_{per} = \frac{1}{W_{ij} H_{ij}} \sum_{x=1}^{W_{ij}} \sum_{y=1}^{H_{ij}} \left( \phi_{ij}(I^{HR})_{x,y} - \phi_{ij}\big(G(I^{LR})\big)_{x,y} \right)^2$$

where $W_{ij}$ and $H_{ij}$ denote the dimensions of the feature map in the VGG network, $\phi_{ij}$ denotes the feature map obtained after the j-th convolution (before activation) and before the i-th max-pooling layer in the VGG network, $I^{HR}$ is the high-resolution image, $I^{LR}$ is the low-resolution image, and $G$ is the entire generator network.
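A sketch of this pre-activation VGG perceptual loss in PyTorch follows; the specific layer index (the last convolution of VGG-19 before the fifth max-pooling, taken before its ReLU) is an assumption, since the patent identifies the layer only generically:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class PerceptualLoss(nn.Module):
    """MSE between pre-activation VGG-19 features of SR and HR images."""
    def __init__(self, layer_idx: int = 34):  # conv5_4, before its ReLU (assumed)
        super().__init__()
        # Slice ends at the conv layer itself, i.e. BEFORE its activation.
        # Assumes inputs are already normalized to VGG input statistics.
        self.features = vgg19(weights="IMAGENET1K_V1").features[:layer_idx + 1].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)
        self.mse = nn.MSELoss()

    def forward(self, sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
        return self.mse(self.features(sr), self.features(hr))
```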
The method relies on the adversarial game between the generator network and the discriminator network: after the generator outputs an HR image, the discriminator network gives the probability that the image is real or fake. In order to maximize the probability that the reconstructed image passes the discriminator, the invention adopts the adversarial loss proposed in WGAN in place of the adversarial loss of the original GAN model, which prevents mode collapse and guides the training process. The adversarial loss $L_{adv}$ is calculated as:

$$L_{adv} = \mathbb{E}_{x \sim P_g}[D(x)] - \mathbb{E}_{x \sim P_r}[D(x)]$$

where $\mathbb{E}_{x \sim P_g}$ denotes the expectation over $x$ sampled from the generated data distribution, $\mathbb{E}_{x \sim P_r}$ denotes the expectation over $x$ sampled from the real data distribution, and $D(x)$ is the output of the discriminator network.
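A sketch of the resulting training objectives, combining the generator loss $L_G = L_{per} + \lambda L_{adv}$ with the WGAN critic objective; the value of $\lambda$ is not given in the patent, so the one below is a placeholder:

```python
import torch

def generator_loss(perceptual_loss, discriminator, sr, hr, lam: float = 1e-3):
    """L_G = L_per + lambda * L_adv, with the WGAN generator term -E[D(G(LR))]."""
    l_per = perceptual_loss(sr, hr)
    l_adv = -discriminator(sr).mean()   # generator maximizes the critic score
    return l_per + lam * l_adv

def critic_loss(discriminator, sr, hr):
    """WGAN critic objective: E_{x~Pg}[D(x)] - E_{x~Pr}[D(x)]."""
    return discriminator(sr.detach()).mean() - discriminator(hr).mean()
```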
thirdly, inputting the high-resolution/low-resolution image pairs in the training set into the generative adversarial network for iterative training, wherein the parameters of the discriminator network are fixed while the generator network is trained and the parameters of the generator network are fixed while the discriminator network is trained, and verifying on the validation set, finally obtaining a trained generative adversarial network;
specifically, a low-resolution image is input into a generation network to output a reconstructed image, the reconstructed image and an original high-resolution image are input into a discrimination network, alternate training is performed alternately to generate the network and the discrimination network, when a network module is generated by training, parameters of a fixed discrimination module are trained, when the discrimination module is trained, parameters of the fixed generation network module are trained, in order to enable the model to be better trained, a better reconstruction effect is finally obtained, and specific experimental hyper-parameters set in the embodiment are as follows: the learning rates of the generator and the discriminator are both 0.0001, the iteration times are 300K times, the learning rate is attenuated after 50K times, the blocksize is 32, and the RMSProp is selected by the optimization algorithm.
and fourthly, inputting the original low-resolution images in the test set into the trained generative adversarial network and outputting the reconstructed high-resolution images.
In this embodiment, in order to objectively evaluate the quality of the reconstructed images, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), the two metrics most commonly used in image super-resolution reconstruction, are used to compare and evaluate the reconstructed images.
PSNR essentially measures the similarity between the original image and the reconstructed image by computing the error between their pixels. PSNR is calculated as:

$$PSNR = 10 \log_{10} \frac{(2^n - 1)^2}{MSE}$$

where $n$ is the number of bits per sample value; the unit of peak signal-to-noise ratio is dB, and a larger value indicates a smaller error between the pixels of the original and reconstructed images, i.e. a better reconstruction. $MSE$ denotes the mean square error; for an image of size $M \times N$, it is computed as:

$$MSE = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( I^{HR}_{ij} - I^{SR}_{ij} \right)^2$$

where $I^{HR}_{ij}$ and $I^{SR}_{ij}$ are the pixel values at coordinates $(i, j)$ in the original image and the reconstructed image, respectively.
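These two formulas translate directly into a few lines of NumPy; a minimal sketch assuming 8-bit images (n = 8):

```python
import numpy as np

def psnr(hr: np.ndarray, sr: np.ndarray, n_bits: int = 8) -> float:
    """PSNR in dB between the original (HR) and reconstructed (SR) images."""
    mse = np.mean((hr.astype(np.float64) - sr.astype(np.float64)) ** 2)
    peak = (2 ** n_bits - 1) ** 2
    return 10 * np.log10(peak / mse)
```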
SSIM expresses the similarity between the structure of the processed image and that of the original image; the closer the value is to 1, the better the generated image. SSIM is calculated as:

$$SSIM(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$

where $\mu_x$ is the mean of $x$, $\mu_y$ is the mean of $y$, $\sigma_x^2$ is the variance of $x$, $\sigma_y^2$ is the variance of $y$, $\sigma_{xy}$ is the covariance of $x$ and $y$, and $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ are constants used to maintain stability. $L$ is the dynamic range of the pixels, and $k_1$ and $k_2$ are constants set to 0.01 and 0.03, respectively.
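A corresponding single-window SSIM sketch follows; practical implementations usually evaluate this formula over local Gaussian windows and average the results, which this simplified version omits:

```python
import numpy as np

def ssim(x: np.ndarray, y: np.ndarray, L: float = 255.0,
         k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window SSIM between images x and y with pixel dynamic range L."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```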
Experimental data were recorded. The experimental environment of this embodiment is: Windows 10, an RTX 2060 graphics card, and the PyTorch deep learning framework; the comparison algorithms are Bicubic (bicubic interpolation) and SRCNN (super-resolution convolutional neural network). The peak signal-to-noise ratio and structural similarity of the Bicubic model, the SRCNN model and the model of the present invention on the test images are shown in the following table:
the evaluation results of the invention on PSNR and SSIM are higher than that of Bicubic algorithm and SRCNN algorithm. The specific reconstruction result in this example is shown in fig. 4, where (a) is an original high-resolution image, (b) is a Bicubic algorithm reconstructed image, (c) is an SRCNN algorithm reconstructed image, and (d) is a reconstructed image according to the present invention.
The foregoing is merely a preferred embodiment of the invention and is not intended to limit the invention in any way; the present invention may be readily implemented by those of ordinary skill in the art as illustrated in the accompanying drawings and described above. However, those skilled in the art should appreciate that they may use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the scope of the invention as defined by the appended claims; any equivalent changes, modifications and evolutions of the above embodiments made according to the substantive technology of the present invention remain within the protection scope of the technical solution of the present invention.
Claims (10)
1. An image super-resolution reconstruction method based on a generative adversarial network, characterized by comprising the following steps:
firstly, establishing an image database, wherein the image database comprises a plurality of high-resolution/low-resolution image pairs, each pair consisting of an original high-resolution image and an original low-resolution image obtained by downsampling the original high-resolution image, and the image pairs in the image database are divided into a training set, a validation set and a test set;
secondly, constructing a generative adversarial network, wherein the generative adversarial network comprises a generator network, a discriminator network and a loss calculation module;
thirdly, inputting the high-resolution/low-resolution image pairs in the training set into the generative adversarial network for iterative training, wherein the parameters of the discriminator network are fixed while the generator network is trained and the parameters of the generator network are fixed while the discriminator network is trained, and verifying on the validation set, finally obtaining a trained generative adversarial network;
and fourthly, inputting the original low-resolution images in the test set into the trained generative adversarial network and outputting the reconstructed high-resolution images.
2. The image super-resolution reconstruction method based on a generative adversarial network of claim 1, wherein in step 1 the original high-resolution image is downsampled to obtain the original low-resolution image; the downsampling method is bicubic interpolation, in which each new pixel value is obtained as a weighted sum over 16 neighboring pixels of the image.
3. The image super-resolution reconstruction method based on a generative adversarial network of claim 1, wherein the generator network in step 2 comprises a shallow feature extraction network, a self-attention residual module, a feature fusion unit and an upsampling network connected in sequence.
4. The image super-resolution reconstruction method based on a generative adversarial network of claim 2, wherein the shallow feature extraction network comprises three convolution kernels of different sizes, 3x3, 5x5 and 7x7, which perform shallow feature extraction on the input image; the image is convolved through three receptive fields of different sizes, and the resulting feature maps together with the original image are fed into a Concat layer and then into a 1x1 convolutional layer.
5. The image super-resolution reconstruction method based on a generative adversarial network of claim 2, wherein the self-attention residual module comprises a plurality of ARDB modules; each ARDB module adopts two convolutional layers and a self-attention layer, each convolutional layer is followed by an instance normalization layer, the activation function is Parametric ReLU, and a skip connection is added.
6. The image super-resolution reconstruction method based on a generative adversarial network of claim 2, wherein the feature fusion unit comprises a Concat layer, a 1x1 convolutional layer and a 3x3 convolutional layer connected in sequence, and the dense feature fusion unit is expressed as:

$$F_{DF} = H_{DFF}(F_0, F_1, \dots, F_{28})$$

where $H_{DFF}$ is the dense feature fusion function, $F_{DF}$ is the output of the dense feature fusion unit, $[F_1, \dots, F_{28}]$ are the feature maps produced by the ARDB modules in the self-attention residual module, concatenated along the channel dimension to form a whole, and $F_0$ is the output of the shallow feature extraction network.
7. The image super-resolution reconstruction method based on a generative adversarial network of claim 2, wherein the two sub-pixel convolution layers in the upsampling network restore the image to 4 times its original size.
8. The image super-resolution reconstruction method based on a generative adversarial network of claim 1, wherein the overall generator loss function $L_G$ in the loss calculation module is given by:

$$L_G = L_{per} + \lambda L_{adv}$$

where $L_G$ is the overall loss function of the network, $L_{per}$ is the perceptual loss, $L_{adv}$ is the adversarial loss, and $\lambda$ is a weighting coefficient.
9. The image super-resolution reconstruction method based on a generative adversarial network of claim 8, wherein the perceptual loss $L_{per}$ is calculated as:

$$L_{per} = \frac{1}{W_{ij} H_{ij}} \sum_{x=1}^{W_{ij}} \sum_{y=1}^{H_{ij}} \left( \phi_{ij}(I^{HR})_{x,y} - \phi_{ij}\big(G(I^{LR})\big)_{x,y} \right)^2$$

where $W_{ij}$ and $H_{ij}$ denote the dimensions of the feature map in the VGG network, $\phi_{ij}$ denotes the feature map obtained after the j-th convolution and before the i-th max-pooling layer in the VGG network, $I^{HR}$ is the high-resolution image, $I^{LR}$ is the low-resolution image, and $G$ is the entire generator network; $x$ and $y$ represent the pixel coordinates of the image in the width and height directions, respectively.
10. The image super-resolution reconstruction method based on a generative adversarial network of claim 8, wherein the adversarial loss $L_{adv}$ is calculated as:

$$L_{adv} = \mathbb{E}_{x \sim P_g}[D(x)] - \mathbb{E}_{x \sim P_r}[D(x)]$$

where $\mathbb{E}_{x \sim P_g}$ denotes the expectation over $x$ sampled from the generated data distribution, $\mathbb{E}_{x \sim P_r}$ denotes the expectation over $x$ sampled from the real data distribution, and $D(x)$ is the output of the discriminator network.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111178655.4A | 2021-10-10 | 2021-10-10 | Image super-resolution reconstruction method based on generative adversarial network |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111178655.4A | 2021-10-10 | 2021-10-10 | Image super-resolution reconstruction method based on generative adversarial network |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN113781311A | 2021-12-10 |
Family
ID=78855140

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202111178655.4A (Pending) | Image super-resolution reconstruction method based on generative adversarial network | 2021-10-10 | 2021-10-10 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN113781311A |
Patent Citations (2)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN111311538A | 2019-12-28 | 2020-06-19 | Multi-scale lightweight road pavement detection method based on convolutional neural network |
| CN112037131A | 2020-08-31 | 2020-12-04 | Single-image super-resolution reconstruction method based on generative adversarial network |
Non-Patent Citations (1)

| Title |
|---|
| Wang Xuesong et al., "Image super-resolution reconstruction based on self-attention generative adversarial networks", Control and Decision, pages 1-9 |
Cited By (3)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN114547017A | 2022-04-27 | 2022-05-27 | Meteorological big data fusion method based on deep learning |
| CN115063293A | 2022-05-31 | 2022-09-16 | Rock microscopic image super-resolution reconstruction method using a generative adversarial network |
| CN115063293B | 2022-05-31 | 2024-05-31 | Rock microscopic image super-resolution reconstruction method using a generative adversarial network |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |