CN111861886A - Image super-resolution reconstruction method based on multi-scale feedback network

Info

Publication number: CN111861886A (application); CN111861886B (grant)
Application number: CN202010682515.XA
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 陈晓 (Chen Xiao), 孙超文 (Sun Chaowen)
Applicant and current assignee: Nanjing University of Information Science and Technology
Priority and filing date: 2020-07-15
Publication date: 2020-10-30 (CN111861886A); grant date: 2023-08-08 (CN111861886B)
Family ID: 72983037
Legal status: Granted; Active

Classifications

    • G06T 3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution (G: Physics; G06T: Image data processing or generation, in general)
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting (G06F: Electric digital data processing; G06F 18/00: Pattern recognition)
    • G06N 3/045: Combinations of networks (G06N: Computing arrangements based on specific computational models; G06N 3/04: Neural network architecture)
    • G06T 3/4007: Scaling of whole images or parts thereof based on interpolation, e.g. bilinear interpolation
    • Y02T 10/40: Engine management systems (Y02T: Climate change mitigation technologies related to transportation)


Abstract

The invention relates to an image super-resolution reconstruction method based on a multi-scale feedback network, comprising the following steps: (1) establishing an image data set; (2) extracting features from the input image, then recursively realizing the low-resolution to high-resolution feature mapping with multi-scale up-projection and down-projection units to obtain high-resolution feature maps at different depths, computing a residual image from the high-resolution feature maps by convolution, and finally adding the residual image to the interpolated low-resolution image to obtain the output image; (3) training the multi-scale feedback network on the data set to generate a trained network model; (4) inputting the low-resolution image to be processed into the trained network to obtain the output high-resolution image. The method can train networks of different depths and extend them to other magnification factors with only small parameter adjustments, which saves training cost; it can realize magnification by larger factors and improves the peak signal-to-noise ratio and structural similarity of the reconstructed image.

Description

Image super-resolution reconstruction method based on multi-scale feedback network
Technical Field
The invention relates to an image super-resolution reconstruction method based on a multi-scale feedback network, and belongs to the fields of computer vision and deep learning.
Background
Super-resolution (SR) reconstruction is an important image processing technique in the field of computer vision, widely applied in medical imaging, security monitoring, remote sensing image quality improvement, image compression and target detection. Image super-resolution reconstruction aims to establish a suitable model that converts a low-resolution (LR) image into the corresponding high-resolution (HR) image. Since a given LR input corresponds to multiple possible HR images, SR reconstruction is a challenging ill-posed inverse problem.
Currently, the proposed SR reconstruction methods fall into three major categories: interpolation-based methods, reconstruction-based methods and learning-based methods. Among them, deep-learning-based SR methods have attracted much attention in recent years with their superior reconstruction performance. SRCNN, the pioneering work of deep-learning-based SR, fully demonstrated the superiority of convolutional neural networks, and many subsequent works proposed a series of CNN-based SR methods built on the SRCNN architecture. Depth is an important factor, as it provides a network with a larger receptive field and more context information; however, increasing depth easily causes two problems: gradient vanishing/explosion and a sharp growth in the number of network parameters.
To address the gradient problem, researchers proposed residual learning, which successfully trains deeper networks, and some networks introduce dense connections to alleviate gradient vanishing and encourage feature reuse; to reduce the parameters, researchers proposed recursive learning with weight sharing. Thanks to these mechanisms, many networks construct ever deeper and more complex structures to obtain higher evaluation indexes. However, investigation reveals that many networks have the following problems:
First, many SR methods achieve high performance with deep networks but neglect the training difficulty of such networks, so they require a huge training set and considerable training skill and time.
Second, most SR methods learn hierarchical feature representations directly from the LR input and map them to the output space in a feed-forward manner; such one-way mapping relies only on the limited features in the LR image. Moreover, many feed-forward networks that require preprocessing operations fit only a single magnification factor, and the complicated operations needed to shift to other magnification factors make them extremely inflexible.
Disclosure of Invention
To solve the problems in the prior art, the invention provides an image super-resolution reconstruction method based on a multi-scale feedback network, comprising the following steps:
Step one, establishing a data set using an image degradation model;
Step two, constructing a multi-scale feedback network, wherein the multi-scale feedback network comprises an image feature extraction module, an image feature mapping module and a high-resolution image calculation module;
Step 2.1, image feature extraction;
The LR image $I_{LR}$ input to the network is fed into the feature extraction module $f_0$, which generates an initial LR feature map $L_0$:

$$L_0 = f_0(I_{LR})$$

Let conv(f, n) denote a convolution layer with kernel size f and n channels. In the above formula, $f_0$ consists of two convolution layers, conv(3, $n_0$) and conv(1, n), where $n_0$ denotes the number of channels of the initial low-resolution feature extraction layer and n denotes the number of input channels of the feature mapping module. First, conv(3, $n_0$) generates shallow features containing low-resolution image information from the input; then conv(1, n) reduces the number of channels from $n_0$ to n;
Step 2.2, image feature mapping;
The low-resolution feature map $L_{g-1}$ is fed into the recursive feedback module, which generates the high-resolution feature map $H_g$:

$$H_g = f_{MPG}^{g}(L_{g-1}), \quad g = 1, 2, \ldots, G$$

where G denotes the number of multi-scale projection groups, i.e. the number of recursions, and $f_{MPG}^{g}$ denotes the feature mapping of the multi-scale projection group in the g-th recursion. When g = 1, the initial feature map $L_0$ is the input of the first multi-scale projection group; when g > 1, the LR feature map $L_{g-1}$ produced by the previous multi-scale projection group is the current input;
Step 2.3, high-resolution image calculation;
A residual image is computed by depth-concatenating the multiple HR feature maps, according to the following formula:

$$I_{Res} = f_{RM}([H_1, H_2, \ldots, H_G])$$

where $[H_1, H_2, \ldots, H_G]$ denotes the depth concatenation of the HR feature maps, $f_{RM}$ denotes a conv(3, 3) operation, and $I_{Res}$ is the residual image.
The image obtained by interpolating the LR image is added to the residual image $I_{Res}$ to obtain the reconstructed high-resolution image $I_{SR}$:

$$I_{SR} = I_{Res} + f_{US}(I_{LR})$$

where $f_{US}$ denotes the interpolation operation.
Step three, training a multi-scale feedback network;
Step four, reconstructing the image.
The technical scheme is further designed as follows: the data set in step one is established with the image degradation model as follows. Given that $I_{LR}$ denotes an LR image and $I_{HR}$ the corresponding HR image, the degradation process is expressed as:

$$I_{LR} = D(I_{HR}; s)$$

where D denotes the degradation mapping that generates an LR image from an HR image; the degradation is modeled as a single downsampling operation:

$$D(I_{HR}; s) = (I_{HR}) \downarrow_s$$

where $\downarrow_s$ denotes the downsampling operation with magnification s, s being the scale factor.
The interpolation algorithm is a bilinear interpolation algorithm or a bicubic interpolation algorithm.
The loss function for training the multi-scale feedback network in step three is:

$$L(x) = \frac{1}{m} \sum_{i=1}^{m} \left\| I_{HR}^{(i)} - I_{SR}^{(i)} \right\|_1$$

where x is the set of weight parameters and bias parameters, i denotes the index of a training image over the whole training process, and m denotes the number of training images.
The invention has the following beneficial effects:
The modular end-to-end architecture can flexibly train networks of different depths and extend them to other magnification factors with only small parameter adjustments, which greatly saves training cost; it can successfully realize magnification by a larger factor (8x) and improves the peak signal-to-noise ratio and structural similarity of the reconstructed image. Compared with other convolutional-neural-network-based methods, the method can also alleviate the influence of ringing and checkerboard artifacts, predict more high-frequency details and suppress over-smoothed components, so that the reconstructed image has clearer and sharper edge features and is closer to the real high-resolution image.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a block diagram of a multi-scale feedback network;
FIG. 3 is a block diagram of the multi-scale up-projection unit in the network;
FIG. 4 is a block diagram of the multi-scale down-projection unit in the network.
Detailed Description
The invention is described in detail below with reference to the figures and the specific embodiments.
Examples
As shown in fig. 1, the image super-resolution reconstruction method based on the multi-scale feedback network of the present embodiment includes the following steps:
Step 1, establishing a data set using an image degradation model.
Let $I_{LR}$ denote an LR image and $I_{HR}$ the corresponding HR image; the degradation process is expressed as:

$$I_{LR} = D(I_{HR}; s) \qquad (1)$$

The degradation mapping D generates an LR image from an HR image and is modeled as a single downsampling operation:

$$D(I_{HR}; s) = (I_{HR}) \downarrow_s \qquad (2)$$

where $\downarrow_s$ denotes the downsampling operation with magnification s, s being the scale factor.
This embodiment uses bicubic interpolation with antialiasing as the downsampling operation, taking m training images from DIV2K as the training set. Set5, Set14, Urban100, BSD100 and Manga109 are chosen as standard test sets and are downsampled by factors of 2, 3, 4 and 8 using the bicubic interpolation algorithm.
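As a concrete illustration of this data-preparation step, the following is a minimal sketch using Pillow's antialiased bicubic resize; the file name and the dictionary layout are illustrative assumptions, not details fixed by the patent.

```python
from PIL import Image

def degrade(hr, s):
    """I_LR = (I_HR) downsampled by scale factor s, as in equation (2)."""
    w, h = hr.size
    return hr.resize((w // s, h // s), resample=Image.BICUBIC)

# Build x2/x3/x4/x8 LR inputs from one HR training image
hr = Image.open("0001.png")  # one DIV2K image; path illustrative
lr_by_scale = {s: degrade(hr, s) for s in (2, 3, 4, 8)}
```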
Step 2, constructing a multi-scale feedback network; the network structure is shown in fig. 2, and includes the following steps:
Step 2.1, extracting image features.
The initial LR image $I_{LR}$ is fed into the feature extraction module $f_0$, which generates an initial LR feature map $L_0$:

$$L_0 = f_0(I_{LR}) \qquad (3)$$

Let conv(f, n) denote a convolution layer with kernel size f and n channels. Here $f_0$ consists of two convolution layers, conv(3, $n_0$) and conv(1, n), where $n_0$ denotes the number of channels of the initial LR image feature extraction layer and n denotes the number of input channels of the feature mapping module. First, conv(3, $n_0$) generates shallow features containing LR image information from the input; then conv(1, n) reduces the number of channels from $n_0$ to n.
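A minimal PyTorch sketch of the extraction module $f_0$ follows; the channel counts n0 = 256 and n = 64 are illustrative assumptions, since this passage does not fix them, and the PReLU activations follow the convention stated in step 3 below.

```python
import torch
import torch.nn as nn

class FeatureExtraction(nn.Module):
    """f0: conv(3, n0) for shallow LR features, then conv(1, n) to reduce channels."""
    def __init__(self, in_channels=3, n0=256, n=64):
        super().__init__()
        self.conv3 = nn.Conv2d(in_channels, n0, kernel_size=3, padding=1)
        self.conv1 = nn.Conv2d(n0, n, kernel_size=1)  # n0 -> n channels
        self.act1, self.act2 = nn.PReLU(), nn.PReLU()

    def forward(self, x):                      # x: I_LR, shape (B, 3, H, W)
        shallow = self.act1(self.conv3(x))
        return self.act2(self.conv1(shallow))  # L_0, shape (B, n, H, W)
```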
Step 2.2, image feature mapping;
using multi-scale up-projection unitsAnd forming a projection group by the projection units under multiple scales to realize low-resolution and high-resolution feature mapping in a recursion manner, so as to obtain high-resolution feature maps at different depths. Low resolution feature map Lg-1Input recursive feedback module to generate high resolution feature map Hg
Figure BDA0002586350550000041
Wherein G represents the number of multi-scale projection groups, i.e. the number of recursions;
Figure BDA0002586350550000042
representing the feature mapping process for the set of multi-scale projections in the g-th recursion. When g is equal to 1, the initial feature map L is represented0As an input to the first multi-scale projection group, when g is greater than 1, it indicates the LR feature map L to be produced by the previous multi-scale projection groupg-1As the current input.
Figure BDA0002586350550000043
The operations include two operations, mapping the LR characteristic to the HR characteristic and the HR characteristic to the LR characteristic, the structures of which are shown in fig. 3 and 4.
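The following PyTorch sketch expresses equation (4) as one projection group, assuming UpProjection and DownProjection modules like the ones sketched after the six-step descriptions below; all class names are illustrative.

```python
import torch.nn as nn

class ProjectionGroup(nn.Module):
    """One multi-scale projection group: LR -> HR (up-projection),
    then HR -> LR (down-projection), as in equation (4)."""
    def __init__(self, n=64):
        super().__init__()
        self.up = UpProjection(n)      # produces H_g from L_{g-1}; sketched below
        self.down = DownProjection(n)  # produces L_g from H_g; sketched below

    def forward(self, L_prev):
        H_g = self.up(L_prev)
        return H_g, self.down(H_g)     # L_g is fed back as the next input
```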
The multi-scale up-projection unit maps the LR feature into the HR feature (the structure is shown in fig. 3) through the following six steps:

(1): The LR feature map $L_{g-1}$ calculated in the previous cycle is taken as input, and deconvolutions with different kernel sizes, $\mathrm{Deconv}_1(k_1, n)$ and $\mathrm{Deconv}_2(k_2, n)$, perform an upsampling operation on the two branches to obtain two HR feature maps $H_1^g$ and $H_2^g$:

$$H_1^g = \mathrm{Deconv}_1(L_{g-1}) \qquad (5)$$

$$H_2^g = \mathrm{Deconv}_2(L_{g-1}) \qquad (6)$$

where $k_1$ and $k_2$ denote the deconvolution kernel sizes and n denotes the number of channels.

(2): The HR feature maps $H_1^g$ and $H_2^g$ are concatenated, and convolutions with different kernel sizes, $\mathrm{Conv}_1(k_1, 2n)$ and $\mathrm{Conv}_2(k_2, 2n)$, perform a downsampling operation on the two branches and generate two LR feature maps $L_1^g$ and $L_2^g$:

$$L_1^g = \mathrm{Conv}_1([H_1^g, H_2^g]) \qquad (7)$$

$$L_2^g = \mathrm{Conv}_2([H_1^g, H_2^g]) \qquad (8)$$

The number of channels of each branch is changed from n to 2n.

(3): The LR feature maps $L_1^g$ and $L_2^g$ are concatenated, and a 1 × 1 convolution $C_u$ is used for pooling and dimensionality reduction, mapping $L_1^g$ and $L_2^g$ to an LR feature map $\tilde{L}^g$:

$$\tilde{L}^g = C_u([L_1^g, L_2^g]) \qquad (9)$$

where $C_u$ denotes Conv(1, n), and the number of channels of each branch is changed from 2n to n. All 1 × 1 convolutions add a nonlinear excitation to the representation learned by the previous layer.

(4): The residual $e_L^g$ between the input LR feature map $L_{g-1}$ and the reconstructed LR feature map $\tilde{L}^g$ is computed:

$$e_L^g = L_{g-1} - \tilde{L}^g \qquad (10)$$

(5): Deconvolutions with different kernel sizes, $\mathrm{Deconv}_1(k_1, n)$ and $\mathrm{Deconv}_2(k_2, n)$, each perform an upsampling operation on the residual $e_L^g$, mapping the residual of the LR features into the HR space and generating two new HR residual features $R_1^g$ and $R_2^g$:

$$R_1^g = \mathrm{Deconv}_1(e_L^g) \qquad (11)$$

$$R_2^g = \mathrm{Deconv}_2(e_L^g) \qquad (12)$$

The number of channels of each branch remains n.

(6): The HR residual features $R_1^g$ and $R_2^g$ are concatenated, superposed on the concatenated HR features from step (2), and the final HR feature map $H_g$ of the up-projection unit is output through a 1 × 1 convolution $C_h$:

$$H_g = C_h([R_1^g, R_2^g] + [H_1^g, H_2^g]) \qquad (13)$$

where $C_h$ denotes Conv(1, n); the total number of channels after the addition is 2n, and Conv(1, n) reduces the number of output channels to n, keeping it the same as the number of input channels.
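A hedged PyTorch sketch of the six steps follows. The kernel sizes ($k_1$ = 6, $k_2$ = 8), the stride of 2 and the paddings are illustrative values suited to x2 magnification; the patent adjusts kernel sizes and strides per magnification (see step 3 of the embodiment), and the 4n-to-n reduction in $C_u$ is an assumption needed to keep the channel counts consistent.

```python
import torch
import torch.nn as nn

def deconv(cin, cout, k, stride=2):
    # Padding chosen so the output is exactly `stride` times larger.
    return nn.Sequential(
        nn.ConvTranspose2d(cin, cout, k, stride=stride, padding=(k - stride) // 2),
        nn.PReLU())

def conv(cin, cout, k, stride=2):
    # Padding chosen so the output is exactly `stride` times smaller.
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, stride=stride, padding=(k - stride) // 2),
        nn.PReLU())

class UpProjection(nn.Module):
    def __init__(self, n=64, k1=6, k2=8):
        super().__init__()
        self.deconv1, self.deconv2 = deconv(n, n, k1), deconv(n, n, k2)          # step (1)
        self.conv1, self.conv2 = conv(2 * n, 2 * n, k1), conv(2 * n, 2 * n, k2)  # step (2)
        self.c_u = nn.Sequential(nn.Conv2d(4 * n, n, 1), nn.PReLU())             # step (3)
        self.res_deconv1 = deconv(n, n, k1)                                      # step (5)
        self.res_deconv2 = deconv(n, n, k2)
        self.c_h = nn.Sequential(nn.Conv2d(2 * n, n, 1), nn.PReLU())             # step (6)

    def forward(self, L_prev):
        H1, H2 = self.deconv1(L_prev), self.deconv2(L_prev)    # (5), (6)
        H_cat = torch.cat([H1, H2], dim=1)                     # 2n channels
        L1, L2 = self.conv1(H_cat), self.conv2(H_cat)          # (7), (8)
        L_tilde = self.c_u(torch.cat([L1, L2], dim=1))         # (9): fuse to n channels
        e_L = L_prev - L_tilde                                 # (10)
        R1, R2 = self.res_deconv1(e_L), self.res_deconv2(e_L)  # (11), (12)
        return self.c_h(torch.cat([R1, R2], dim=1) + H_cat)    # (13)
```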
The multi-scale down-projection unit maps the HR feature into the LR feature (the structure is shown in fig. 4) through the following six steps:

(1): The HR feature map $H_g$ output by the multi-scale up-projection unit is taken as input, and convolutions with different kernel sizes, $\mathrm{Conv}_1(k_1, n)$ and $\mathrm{Conv}_2(k_2, n)$, perform a downsampling operation on the two branches to obtain two LR feature maps $L_1^g$ and $L_2^g$:

$$L_1^g = \mathrm{Conv}_1(H_g) \qquad (14)$$

$$L_2^g = \mathrm{Conv}_2(H_g) \qquad (15)$$

(2): The LR feature maps $L_1^g$ and $L_2^g$ are concatenated, and deconvolutions with different kernel sizes, $\mathrm{Deconv}_1(k_1, 2n)$ and $\mathrm{Deconv}_2(k_2, 2n)$, perform an upsampling operation on the two branches and generate two HR feature maps $H_1^g$ and $H_2^g$:

$$H_1^g = \mathrm{Deconv}_1([L_1^g, L_2^g]) \qquad (16)$$

$$H_2^g = \mathrm{Deconv}_2([L_1^g, L_2^g]) \qquad (17)$$

The number of channels of each branch is changed from n to 2n.

(3): The HR feature maps $H_1^g$ and $H_2^g$ are concatenated, and an HR feature map $\tilde{H}^g$ is obtained through a 1 × 1 convolution $C_d$:

$$\tilde{H}^g = C_d([H_1^g, H_2^g]) \qquad (18)$$

where $C_d$ denotes Conv(1, n), and the number of channels of each branch is changed from 2n to n.

(4): The residual $e_H^g$ between the input HR feature map $H_g$ and the reconstructed HR feature map $\tilde{H}^g$ is computed:

$$e_H^g = H_g - \tilde{H}^g \qquad (19)$$

(5): Convolutions with different kernel sizes, $\mathrm{Conv}_1(k_1, n)$ and $\mathrm{Conv}_2(k_2, n)$, each perform a downsampling operation on the residual $e_H^g$, mapping the residual of the HR features into the LR space and generating two new LR residual features $R_1^g$ and $R_2^g$:

$$R_1^g = \mathrm{Conv}_1(e_H^g) \qquad (20)$$

$$R_2^g = \mathrm{Conv}_2(e_H^g) \qquad (21)$$

The number of channels of each branch remains n.

(6): The LR residual features $R_1^g$ and $R_2^g$ are concatenated, superposed on the concatenated LR features from step (2), and the final LR feature map $L_g$ of the down-projection unit is output through a 1 × 1 convolution $C_l$:

$$L_g = C_l([R_1^g, R_2^g] + [L_1^g, L_2^g]) \qquad (22)$$

where $C_l$ denotes Conv(1, n); the total number of channels after the addition is 2n, and Conv(1, n) reduces the number of output channels to n, keeping it the same as the number of input channels.
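The down-projection unit mirrors the up-projection unit; the sketch below reuses the conv/deconv helpers from the previous sketch, with the same illustrative kernel sizes and the same assumed 4n-to-n reduction in $C_d$.

```python
class DownProjection(nn.Module):
    def __init__(self, n=64, k1=6, k2=8):
        super().__init__()
        self.conv1, self.conv2 = conv(n, n, k1), conv(n, n, k2)          # step (1)
        self.deconv1 = deconv(2 * n, 2 * n, k1)                          # step (2)
        self.deconv2 = deconv(2 * n, 2 * n, k2)
        self.c_d = nn.Sequential(nn.Conv2d(4 * n, n, 1), nn.PReLU())     # step (3)
        self.res_conv1, self.res_conv2 = conv(n, n, k1), conv(n, n, k2)  # step (5)
        self.c_l = nn.Sequential(nn.Conv2d(2 * n, n, 1), nn.PReLU())     # step (6)

    def forward(self, H_g):
        L1, L2 = self.conv1(H_g), self.conv2(H_g)            # (14), (15)
        L_cat = torch.cat([L1, L2], dim=1)                   # 2n channels
        H1, H2 = self.deconv1(L_cat), self.deconv2(L_cat)    # (16), (17)
        H_tilde = self.c_d(torch.cat([H1, H2], dim=1))       # (18): fuse to n channels
        e_H = H_g - H_tilde                                  # (19)
        R1, R2 = self.res_conv1(e_H), self.res_conv2(e_H)    # (20), (21)
        return self.c_l(torch.cat([R1, R2], dim=1) + L_cat)  # (22)
```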
Step 2.3, calculating a high-resolution image;
calculating a residual image by the following formula through depth cascading of the multiple high-resolution feature maps;
IRes=fRM([H1,H2,…,Hg]) (23)
wherein [ H ]1,H2,…,Hg]Representing a deep cascade of multiple HR feature maps, fRMRepresenting the conv (3,3) operation, a series of cascaded HR feature maps are input into conv (3,3) to generate a residual image IRes
Interpolating the low resolution image to obtain an image and a residual image IResAdding to generate a reconstructed high resolution image ISR
ISR=IRes+fUS(ILR) (24)
Wherein f isUSThe interpolation upsampling operation is represented by a bilinear interpolation algorithm, and a bicubic interpolation algorithm or other interpolation algorithms can also be used.
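Putting the pieces together, the following sketch implements equations (23) and (24) on top of the x2-oriented modules sketched above; the number of projection groups G = 6 is an illustrative assumption.

```python
import torch.nn.functional as F

class MultiScaleFeedbackNet(nn.Module):
    def __init__(self, n=64, G=6, scale=2):
        super().__init__()
        self.scale = scale
        self.f0 = FeatureExtraction(n=n)
        self.groups = nn.ModuleList(ProjectionGroup(n) for _ in range(G))
        # f_RM: the reconstruction layer, the only layer without PReLU
        self.f_rm = nn.Conv2d(G * n, 3, kernel_size=3, padding=1)

    def forward(self, I_lr):
        L = self.f0(I_lr)
        hr_maps = []
        for group in self.groups:                      # g = 1 .. G
            H, L = group(L)
            hr_maps.append(H)
        I_res = self.f_rm(torch.cat(hr_maps, dim=1))   # equation (23)
        I_up = F.interpolate(I_lr, scale_factor=self.scale,
                             mode="bilinear", align_corners=False)  # f_US
        return I_res + I_up                            # equation (24)
```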
Step 3, training a multi-scale feedback network;
The batch size of the network is set to 16, and data augmentation is performed using rotation and flipping. LR images and the corresponding HR images of different sizes are input according to the magnification factor. Adam is used to optimize the network parameters, with a momentum factor of 0.9 and a weight decay of 0.0001. The initial learning rate is set to 0.0001, and the learning rate is halved every 200 iterations.
Different kernel sizes and paddings are designed in each branch of the multi-scale projection units, and the kernel sizes and strides are adjusted according to the corresponding magnification. Both input and output use the RGB channels of the color image. PReLU is used as the activation function after all convolution and deconvolution layers, except the reconstruction layer at the end of the network. The network is trained with the image data set of step 1 according to the procedure of step 2 until the cost loss falls to a set value and the training reaches the maximum number of iterations. The L1 function is used as the loss function, with the expression:

$$L(x) = \frac{1}{m} \sum_{i=1}^{m} \left\| I_{HR}^{(i)} - I_{SR}^{(i)} \right\|_1 \qquad (25)$$

where x is the set of weight parameters and bias parameters, i denotes the index of a training image over the whole training process, and m denotes the number of training images.
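A sketch of this training setup follows; the data loader is assumed to yield batches of 16 augmented LR/HR pairs.

```python
import torch

model = MultiScaleFeedbackNet(scale=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), weight_decay=1e-4)
# Halve the learning rate every 200 iterations
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)
l1_loss = torch.nn.L1Loss()

for lr_batch, hr_batch in loader:   # loader: assumed LR/HR pair iterator
    sr = model(lr_batch)
    loss = l1_loss(sr, hr_batch)    # equation (25)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```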
Step 4, reconstructing an image;
The low-resolution image to be processed is input into the trained network to obtain the output high-resolution image.
The peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) are used as evaluation indexes to evaluate the model performance on the 5 standard test sets Set5, Set14, Urban100, BSD100 and Manga109; all tests use the Y channel.
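For reference, a minimal sketch of Y-channel PSNR follows; the BT.601 luma conversion is the common convention for such evaluations, and the border crop is an assumption (often set to the scale factor), not a detail fixed by the patent.

```python
import numpy as np

def rgb_to_y(img):
    """BT.601 luma from an RGB array with values in [0, 1], shape (H, W, 3)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return (16.0 + 65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr_y(sr, hr, border=2):
    y_sr = rgb_to_y(sr)[border:-border, border:-border]
    y_hr = rgb_to_y(hr)[border:-border, border:-border]
    mse = np.mean((y_sr - y_hr) ** 2)
    return 10.0 * np.log10(1.0 / mse)
```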
In order to verify the effectiveness and reliability of the method, it is compared with a number of existing reconstruction methods at different magnification factors: with 21 currently available state-of-the-art methods at low magnification (x2, x3, x4), and, since many models are not suitable for high magnification (x8), with 12 state-of-the-art methods at x8. For x2 magnification, the method achieves the best peak signal-to-noise results on the five benchmark data sets; for x3, x4 and x8 magnification, both the peak signal-to-noise ratio and the structural similarity of the method are superior to all the other models. The advantage becomes relatively more pronounced as the magnification factor increases, especially at x8, demonstrating the effectiveness of the method in dealing with high magnification. On the five data sets the method attains higher objective evaluation indexes in terms of peak signal-to-noise ratio and structural similarity, which shows that the method is good not only at reconstructing regular artificial patterns but also at reconstructing irregular natural patterns. The method adapts to varied scene characteristics and yields remarkable super-resolution reconstruction results for images with different characteristics.
The multi-scale feedback network of this embodiment uses only m (800) training images from DIV2K and, with this relatively small training set, still achieves superior reconstruction performance at 8x magnification compared with other existing methods. By combining multi-scale convolution with a feedback mechanism, the method can learn rich hierarchical feature representations at multiple context scales, capture image features of different scales, and refine low-level representations with high-level features, so as to better express the interrelation between HR and LR images. Besides combining high-level and low-level information, local and global information are fused through global residual learning and local residual feedback, further improving the quality of the reconstructed image. In addition, the modular end-to-end architecture allows networks of different depths to be trained flexibly and extended arbitrarily to other magnification factors with only small parameter adjustments. The method effectively alleviates the influence of ringing and checkerboard artifacts and shows excellent reconstruction performance compared with many current state-of-the-art methods; its advantage is particularly evident at the high magnifications that many methods handle poorly.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the technical principle of the present invention, and these improvements and modifications should also be considered within the protection scope of the present invention, including but not limited to applying the method and its improvements and modifications to other image processing tasks such as image classification, detection, denoising and enhancement.

Claims (6)

1. An image super-resolution reconstruction method based on a multi-scale feedback network, characterized by comprising the following steps:
Step one, establishing an image data set using an image degradation model;
Step two, constructing a multi-scale feedback network, wherein the multi-scale feedback network comprises an image feature extraction module, an image feature mapping module and a high-resolution image calculation module;
Step 2.1, image feature extraction;
The low-resolution image $I_{LR}$ input to the network is fed into the feature extraction module $f_0$, which generates an initial low-resolution feature map $L_0$:

$$L_0 = f_0(I_{LR})$$

Let conv(f, n) denote a convolution layer with kernel size f and n channels; in the above formula, $f_0$ consists of two convolution layers, conv(3, $n_0$) and conv(1, n), where $n_0$ denotes the number of channels of the initial low-resolution feature extraction layer and n denotes the number of input channels of the feature mapping module; first conv(3, $n_0$) generates shallow features containing low-resolution image information from the input, then conv(1, n) reduces the number of channels from $n_0$ to n;
Step 2.2, image feature mapping;
A projection group is formed from a multi-scale up-projection unit and a multi-scale down-projection unit to recursively realize the low-resolution to high-resolution feature mapping, obtaining high-resolution feature maps at different depths; the low-resolution feature map $L_{g-1}$ is fed into the recursive feedback module, which generates the high-resolution feature map $H_g$:

$$H_g = f_{MPG}^{g}(L_{g-1}), \quad g = 1, 2, \ldots, G$$

where G denotes the number of multi-scale projection groups, i.e. the number of recursions, and $f_{MPG}^{g}$ denotes the feature mapping of the multi-scale projection group in the g-th recursion; when g = 1, the initial feature map $L_0$ is the input of the first multi-scale projection group; when g > 1, the LR feature map $L_{g-1}$ produced by the previous multi-scale projection group is the current input; $f_{MPG}^{g}$ comprises two operations, mapping the LR feature to the HR feature and mapping the HR feature to the LR feature;
Step 2.3, high-resolution image calculation;
A residual image is computed by depth-concatenating the multiple high-resolution feature maps, according to the following formula:

$$I_{Res} = f_{RM}([H_1, H_2, \ldots, H_G])$$

where $[H_1, H_2, \ldots, H_G]$ denotes the depth concatenation of the high-resolution feature maps, $f_{RM}$ denotes a conv(3, 3) operation, and $I_{Res}$ is the residual image;
The image obtained by interpolating the low-resolution image is added to the residual image $I_{Res}$ to generate the reconstructed high-resolution image $I_{SR}$:

$$I_{SR} = I_{Res} + f_{US}(I_{LR})$$

where $f_{US}$ denotes the interpolation operation;
Step three, training the multi-scale feedback network;
Step four, image reconstruction;
The low-resolution image to be processed is input into the trained network to obtain the output high-resolution image.
2. The image super-resolution reconstruction method based on the multi-scale feedback network as claimed in claim 1, wherein the data set in step one is established with the image degradation model as follows:
given that $I_{LR}$ denotes a low-resolution image and $I_{HR}$ the corresponding high-resolution image, the degradation process is expressed as:

$$I_{LR} = D(I_{HR}; s)$$

the degradation mapping D, which generates a low-resolution image from a high-resolution image, is modeled as a single downsampling operation:

$$D(I_{HR}; s) = (I_{HR}) \downarrow_s$$

where $\downarrow_s$ denotes the downsampling operation with magnification s, s being the scale factor.
3. The image super-resolution reconstruction method based on the multi-scale feedback network as claimed in claim 1, wherein in the image feature mapping process of step 2.2, the process of mapping the LR feature into the HR feature is:

(1): the LR feature map $L_{g-1}$ calculated in the previous cycle is taken as input, and deconvolutions with different kernel sizes, $\mathrm{Deconv}_1(k_1, n)$ and $\mathrm{Deconv}_2(k_2, n)$, perform an upsampling operation on the two branches to obtain two HR feature maps $H_1^g$ and $H_2^g$:

$$H_1^g = \mathrm{Deconv}_1(L_{g-1})$$

$$H_2^g = \mathrm{Deconv}_2(L_{g-1})$$

where $k_1$ and $k_2$ denote the deconvolution kernel sizes and n denotes the number of channels;

(2): the HR feature maps $H_1^g$ and $H_2^g$ are concatenated, and convolutions with different kernel sizes, $\mathrm{Conv}_1(k_1, 2n)$ and $\mathrm{Conv}_2(k_2, 2n)$, perform a downsampling operation on the two branches and generate two LR feature maps $L_1^g$ and $L_2^g$:

$$L_1^g = \mathrm{Conv}_1([H_1^g, H_2^g])$$

$$L_2^g = \mathrm{Conv}_2([H_1^g, H_2^g])$$

the number of channels of each branch is changed from n to 2n;

(3): the LR feature maps $L_1^g$ and $L_2^g$ are concatenated, and a 1 × 1 convolution $C_u$ is used for pooling and dimensionality reduction, mapping $L_1^g$ and $L_2^g$ to an LR feature map $\tilde{L}^g$:

$$\tilde{L}^g = C_u([L_1^g, L_2^g])$$

where $C_u$ denotes Conv(1, n), and the number of channels of each branch is changed from 2n to n; all 1 × 1 convolutions add a nonlinear excitation to the representation learned by the previous layer;

(4): the residual $e_L^g$ between the input LR feature map $L_{g-1}$ and the reconstructed LR feature map $\tilde{L}^g$ is computed:

$$e_L^g = L_{g-1} - \tilde{L}^g$$

(5): deconvolutions with different kernel sizes, $\mathrm{Deconv}_1(k_1, n)$ and $\mathrm{Deconv}_2(k_2, n)$, each perform an upsampling operation on the residual $e_L^g$, mapping the residual of the LR features into the HR space and generating two new HR residual features $R_1^g$ and $R_2^g$:

$$R_1^g = \mathrm{Deconv}_1(e_L^g)$$

$$R_2^g = \mathrm{Deconv}_2(e_L^g)$$

the number of channels of each branch remains n;

(6): the HR residual features $R_1^g$ and $R_2^g$ are concatenated, superposed on the concatenated HR features from step (2), and the final HR feature map $H_g$ of the up-projection unit is output through a 1 × 1 convolution $C_h$:

$$H_g = C_h([R_1^g, R_2^g] + [H_1^g, H_2^g])$$

where $C_h$ denotes Conv(1, n); the total number of channels after the addition is 2n, and Conv(1, n) reduces the number of output channels to n, keeping it the same as the number of input channels.
4. The image super-resolution reconstruction method based on the multi-scale feedback network as claimed in claim 1, wherein in the image feature mapping process of step 2.2, the process of mapping the HR feature into the LR feature is:

(1): the HR feature map $H_g$ output by the multi-scale up-projection unit is taken as input, and convolutions with different kernel sizes, $\mathrm{Conv}_1(k_1, n)$ and $\mathrm{Conv}_2(k_2, n)$, perform a downsampling operation on the two branches to obtain two LR feature maps $L_1^g$ and $L_2^g$:

$$L_1^g = \mathrm{Conv}_1(H_g)$$

$$L_2^g = \mathrm{Conv}_2(H_g)$$

(2): the LR feature maps $L_1^g$ and $L_2^g$ are concatenated, and deconvolutions with different kernel sizes, $\mathrm{Deconv}_1(k_1, 2n)$ and $\mathrm{Deconv}_2(k_2, 2n)$, perform an upsampling operation on the two branches and generate two HR feature maps $H_1^g$ and $H_2^g$:

$$H_1^g = \mathrm{Deconv}_1([L_1^g, L_2^g])$$

$$H_2^g = \mathrm{Deconv}_2([L_1^g, L_2^g])$$

the number of channels of each branch is changed from n to 2n;

(3): the HR feature maps $H_1^g$ and $H_2^g$ are concatenated, and an HR feature map $\tilde{H}^g$ is obtained through a 1 × 1 convolution $C_d$:

$$\tilde{H}^g = C_d([H_1^g, H_2^g])$$

where $C_d$ denotes Conv(1, n), and the number of channels of each branch is changed from 2n to n;

(4): the residual $e_H^g$ between the input HR feature map $H_g$ and the reconstructed HR feature map $\tilde{H}^g$ is computed:

$$e_H^g = H_g - \tilde{H}^g$$

(5): convolutions with different kernel sizes, $\mathrm{Conv}_1(k_1, n)$ and $\mathrm{Conv}_2(k_2, n)$, each perform a downsampling operation on the residual $e_H^g$, mapping the residual of the HR features into the LR space and generating two new LR residual features $R_1^g$ and $R_2^g$:

$$R_1^g = \mathrm{Conv}_1(e_H^g)$$

$$R_2^g = \mathrm{Conv}_2(e_H^g)$$

the number of channels of each branch remains n;

(6): the LR residual features $R_1^g$ and $R_2^g$ are concatenated, superposed on the concatenated LR features from step (2), and the final LR feature map $L_g$ of the down-projection unit is output through a 1 × 1 convolution $C_l$:

$$L_g = C_l([R_1^g, R_2^g] + [L_1^g, L_2^g])$$

where $C_l$ denotes Conv(1, n); the total number of channels after the addition is 2n, and Conv(1, n) reduces the number of output channels to n, keeping it the same as the number of input channels.
5. The image super-resolution reconstruction method based on the multi-scale feedback network as claimed in claim 3, wherein: the interpolation algorithm is a bilinear interpolation algorithm or a bicubic interpolation algorithm, and other interpolation algorithms can also be used.
6. The image super-resolution reconstruction method based on the multi-scale feedback network as claimed in claim 1, wherein: the loss function for training the multi-scale feedback network in the third step is as follows:
$$L(x) = \frac{1}{m} \sum_{i=1}^{m} \left\| I_{HR}^{(i)} - I_{SR}^{(i)} \right\|_1$$

where x is the set of weight parameters and bias parameters, i denotes the index of a training image over the whole training process, and m denotes the number of training images.




Legal Events

    • PB01: Publication
    • SE01: Entry into force of request for substantive examination
    • GR01: Patent grant