CN111161150B

CN111161150B - Image super-resolution reconstruction method based on multi-scale attention cascade network

Info

Publication number: CN111161150B
Application number: CN201911392155.3A
Authority: CN
Inventors: 付利华; 李宗刚; 张博; 陈辉; 赵茹
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2019-12-30
Filing date: 2019-12-30
Publication date: 2023-06-23
Anticipated expiration: 2039-12-30
Also published as: CN111161150A

Abstract

The invention provides an image super-resolution reconstruction method based on a multi-scale attention cascade network, which comprises the steps of firstly, extracting shallow layer features of a low-resolution image by using convolution operation; then, inputting the shallow features into a feature extraction subnet to obtain cascading features; further, the cascade features pass through a convolution layer with a convolution kernel of 1 to obtain optimized features; inputting the optimized features into an image deep learning up-sampling module to obtain a reconstructed image

At the same time, for low resolution image I _LR Obtaining a reconstructed image by adopting a bicubic linear interpolation algorithm

Finally, the image is reconstructed

And

fusion to obtain final high-resolution reconstructed image I _SR . The method is suitable for super-resolution reconstruction of the image, and the obtained reconstructed image has high definition, more true texture and good sensory effect.

Description

Image super-resolution reconstruction method based on multi-scale attention cascade network

Technical Field

The invention belongs to the field of image restoration, relates to an image super-resolution reconstruction method, and in particular relates to an image super-resolution reconstruction method based on a multi-scale attention cascade network.

Background

Single image super resolution reconstruction (SISR) has recently received a lot of attention. In general, the purpose of SISR is to produce a visual High Resolution (HR) output from a Low Resolution (LR) input. However, the whole process is completely irreversible, as there are several solutions for the mapping between LR and HR. Therefore, a large number of image super-resolution reconstruction (SR) methods have been proposed, from early interpolation-based methods and model-based methods, to more recently depth-learning-based methods.

Interpolation-based methods are simple and fast, but cannot be applied more widely because of poor image quality. For more flexible SR methods, more advanced model-based methods and sparse matrix methods are proposed by exploiting strong image priors, such as non-local similarity, which, while flexible to produce relatively high quality HR images, still have some drawbacks: 1) Such methods often involve a time-consuming optimization process; 2) Reconstruction performance may rapidly degrade when image statistics are biased from images.

Convolutional Neural Networks (CNNs) have now been shown to provide significant performance in SISR problems. However, the conventional SR model has the following problems: 1) The characteristics are not utilized enough: most of the methods blindly increase the depth of the network to improve the performance of the network, but ignore the image feature characteristics of the LR. As the depth of the network increases, the information gradually disappears during transmission. How to fully utilize these features is critical to the network in reconstructing high quality images. 2) SR image detail loss: using an interpolation-amplified LR image as input will increase computational complexity while not favoring the learning of final image details. Therefore, recent methods are more focused on magnifying LR images. However, the effect of the SR image cannot be improved by merely enlarging the SR image with a single network structure.

In order to solve the problems, the invention provides a novel image super-resolution reconstruction method based on deep learning.

Disclosure of Invention

The invention aims to solve the problems that: in the existing super-resolution reconstruction method based on deep learning, most methods blindly increase the depth of a network to improve the performance of the network, but neglect the characteristics of fully utilizing an LR image; and as the depth of the network increases, the characteristic information gradually disappears in the transmission process; the LR image amplified by adopting the interpolation mode is used as the input of the network, so that the calculation complexity is increased, and the learning of the network on the image details is not facilitated. The novel super-resolution reconstruction method based on the deep learning needs to be provided, and the look and feel and the robustness of the image after super-resolution reconstruction are improved.

In order to solve the above problems, the present invention provides an image super-resolution reconstruction method based on a multi-scale attention cascade network, wherein a multi-scale attention block group with a U-shaped structure is used for extracting features of an LR image, and a combination mode of interpolation reconstruction and network reconstruction based on deep learning is adopted for performing super-resolution reconstruction on the LR image, comprising the following steps:

1) Will low resolution image I _LR As input to a multi-scale attention cascade network, pair I _LR Performing convolution operation to extract shallow layer feature F ₀ ；

2) Will shallow layer feature F ₀ Inputting a feature extraction sub-network formed by n multi-scale attention blocks, cascading the features output by each multi-scale attention block in the sub-network to obtain cascading features F _c ；

3) Will cascade feature F _c The number of parameters is reduced by a convolution layer with a convolution kernel of 1, and an optimized feature vector F is obtained _d Optimized feature vector F _d The training and the feature extraction of the data can be more effectively and intuitively carried out;

4) Feature vector F to be optimized _d In the input image deep learning up-sampling module, a reconstructed image is obtained

5) Pair I _LR Obtaining a reconstructed image by adopting an interpolation algorithm

Will->

And->

Fusion is carried out to obtain a final reconstructed image I _SR 。

As a further preferred mode, the obtaining cascade features in step 2) is specifically:

2.1 To shallow layer feature F) ₀ Inputting into a feature extraction subnet composed of n multi-scale attention blocks to obtain n features F respectively _i ，i＝1,2,3,…,n。

For the ith multi-scale attention block, the input is the feature output F of the previous multi-scale attention block _i-1 The output is characterized by F _i 。

Each multi-scale attention block consists of a U-shaped structure module, a bottleneck layer structure module and a residual error module.

2.1.1 For the U-shaped structure module in the ith multiscale attention block, the U-shaped structure module is formed by non-local mean, 3×3 convolution, 5×5 convolution, 7×7 convolution, attention mechanism, 5×5 convolution, 3×3 convolution, and non-local mean series; furthermore, a Concat layer is included between the two 3 x 3 convolutions; two 5 x 5 convolutions contain a Concat layer between them. The characteristic input being the output characteristic F of the preceding multiscale attention block _i-1 After the U-shaped structural module is processed, the characteristic F is obtained _i,0 ；

For the characteristic input F of the non-local mean value, firstly, the input characteristic F is respectively input into three parallel convolution layers to obtain three characteristics F _x ,F _y ,F _z Then, carrying out feature fusion on the three features through Concat operation, and inputting the fused features into a subsequent convolution layer to obtain a feature F _w Finally, feature F _w And adding the non-local mean value with the input characteristic F point by point to obtain the output characteristic of the non-local mean value.

For input feature F of the attention mechanism, first, feature F is input to the global pooling layer extraction channel information descriptor M _avg Then, the channel information descriptor M _avg And (3) inputting the obtained product into two subsequent convolution layers for further processing to obtain M, and finally multiplying the M with the characteristic F channel by channel to obtain the output characteristic of the attention mechanism.

2.1.2 For the bottleneck layer structure module in the ith multi-scale attention block, the bottleneck layer structure module is composed of two bottleneck layersAnd the two are connected in series. The input of the input is the output characteristic F of the U-shaped structure module in the multi-scale attention block _i,0 After the bottleneck layer structure module is processed, the characteristic F is obtained _i,2 ；

2.1.3 For the residual block in the ith multi-scale attention block, the residual block adds the output features of the previous multi-scale attention block and the output features of the bottleneck layer structure point by point. The input of which is the output characteristic F of the preceding multiscale attention block _i-1 And the output characteristics F of the bottleneck layer structure module in the multi-scale attention block _i,2 After processing by a residual error module, obtaining a characteristic F _i ；

2.2 Feature F) output for n multi-scale attention blocks _i I=1, 2,3, …, n, using the Concat join operation, gives a cascade feature F _c ：

F _c ＝Concat(F ₁ ,F ₂ ,...,F _n )

Where Concat (-) represents the operation of concatenating the features of the n multi-scale attention block outputs.

As a further preferred mode, step 3) is specifically:

3.1 To cascade feature F) _c Inputting into a convolution layer with a convolution kernel of 1, reducing the number of parameters, and obtaining an optimized characteristic F _d :

F _d ＝Conv _1×1 (F _c )

Wherein Conv _1×1 (. Cndot.) represents a convolution operation with a convolution kernel of 1.

As a further preferred mode, step 4) the obtaining of the reconstructed image of the image deep learning upsampling module

The image deep learning up-sampling module consists of a convolution layer with a convolution kernel of 3 and a sub-pixel convolution layer. Optimized feature F _d Obtaining a reconstructed image through an image deep learning up-sampling module

The specific process of (2) is as follows:

4.1 Using a convolution layer with a convolution kernel of 3 for the optimized feature F _d Rearranging to obtain feature F _e ：

F _e ＝Conv _3×3 (F _d )

Wherein Conv _3×3 (. Cndot.) represents a convolution operation with a convolution kernel of 3.

4.2 To rearranged features F) _e Is input into a sub-pixel convolution layer, amplified to a corresponding scale, and a reconstructed image is obtained

Wherein H is _Sp (. Cndot.) represents a subpixel convolution operation.

As a further preferred mode, step 5) said obtaining a final reconstructed image I _SR The method comprises the following specific steps of:

5.1 For low resolution image I) _LR Obtaining an interpolated reconstructed image by using a bicubic linear interpolation algorithm

5.2 A reconstructed image obtained by the image deep learning up-sampling module)

And reconstructed image interpolated by bicubic linear interpolation algorithm +.>

Fusing to obtain a final reconstructed image I _SR ：

Although the super-resolution reconstruction method using the interpolation algorithm is high in reconstruction speed, redundant information is added, and the super-resolution reconstruction effect is poor; the image super-resolution reconstruction method based on the deep learning is lack of a reasonable instruction in the reconstruction process, so that part of detail information in the reconstructed image is lost. The reconstruction result of the interpolation algorithm is used as the guidance of the reconstruction process of the image super-resolution reconstruction method based on the deep learning, so that the effect of image super-resolution reconstruction can be improved, and redundant information generated by the interpolation algorithm can be removed.

The invention provides an image super-resolution reconstruction method of a multi-scale attention cascade network, which comprises the steps of firstly extracting shallow layer features of a low-resolution image by using convolution operation; then, inputting the shallow features into a feature extraction subnet to obtain cascading features; further, the cascade features pass through a convolution layer with a convolution kernel of 1 to obtain an optimized feature vector; inputting the optimized feature vector into an image deep learning up-sampling module to obtain a reconstructed image

At the same time, for low resolution image I _LR Obtaining a reconstructed image by means of an interpolation algorithm>

Finally, the reconstructed image->

And->

Fusion to obtain final high-resolution reconstructed image I _SR . By the method, the problem of low detail definition of the reconstructed image by the existing image super-resolution reconstruction method is solved, and the appearance is improved; the method also solves the problem that the existing image super-resolution reconstruction algorithm based on deep learning cannot fully extract the low-resolution image features. The invention is suitable for super-resolution reconstruction of images and uses the inventionThe super-resolution reconstruction of the image is performed obviously, the obtained reconstructed image has high definition, more true texture and good perception effect.

Advantageous effects

Firstly, the invention adopts a group of multi-scale attention blocks to extract the characteristics of the low-resolution image, and can fully utilize the detail information of the low-resolution image; and secondly, realizing super-resolution reconstruction by adopting a mode of combining interpolation reconstruction and network reconstruction based on deep learning, and improving the reconstruction effect of the reconstructed image.

Drawings

FIG. 1 is a flow chart of an image super-resolution reconstruction method based on a multi-scale attention cascade network;

FIG. 2 is a network structure diagram of an image super-resolution method based on a multi-scale attention cascade network of the invention;

FIG. 3 is a block diagram of a multi-scale attention module of the present invention;

Detailed Description

Finally, the reconstructed image->

And->

Fusion to obtain the final productHigh resolution reconstructed image I _SR . The method is suitable for super-resolution reconstruction of the image, and the obtained high-resolution image has high definition, more real texture and good sensory effect by using the method to reconstruct the super-resolution.

As shown in fig. 1, the present invention includes the steps of:

1) Will low resolution image I _LR As input to the multiscale attention cascade network, a convolution operation is used from the low resolution image I _LR Extracting shallow layer characteristic F ₀ ：

F ₀ ＝H _sf (I _LR )

Wherein H is _sf () Representing a convolution operation.

2) Will shallow layer feature F ₀ Inputting a feature extraction subnet composed of a group of multi-scale attention blocks, cascading the features output by each multi-scale attention block in the subnet by using Cancat operation to obtain cascading features F _c ；

2.1.1 For the U-shaped structure module in the ith multiscale attention block, the module is formed by non-local mean, 3×3 convolution, 5×5 convolution, 7×7 convolution, attention mechanism, 5×5 convolution, 3×3 convolution, non-local mean series; furthermore, a Concat layer is included between the two 3 x 3 convolutions; two 5 x 5 convolutions contain a Concat layer between them. The input of which is the output characteristic F of the preceding multiscale attention block _i-1 The output of the U-shaped structural module being characteristic F _i,0 ；

Will feature F _i-1 Input to the U-shaped structure module to obtain the characteristic F _i,0 ：

F _i,0 ＝H _u (F _i-1 )

Wherein H is _u Representing the extraction of feature operations using U-shaped structural modules.

2.1.2 For the bottleneck layer structure module in the ith multi-scale attention block, the bottleneck layer structure module is formed by connecting two bottleneck layers in series. The input of the input is the output characteristic F of the U-shaped structure module in the multi-scale attention block _i,0 The output of the bottleneck layer structure module is characterized by F _i,2 ；

Each bottleneck layer structure module consists of two bottleneck layers. The feature input of the first bottleneck layer is the feature output F of the previous multi-scale attention block _i-1 The characteristic output is F _i,1 ：

F _i,1 ＝H _b (F _i-1 )

Wherein H is _b (. Cndot.) represents the first bottleneck layer operation.

Further extracting the characteristic F from the U-shaped structure module _i,0 And the output F of the first bottleneck layer _i,1 Inputting to a second bottleneck layer for detail fusion to obtain a feature F _i,2 ：

F _i,2 ＝H _c (F _i,0 ,F _i,1 )

Wherein H is _c (. Cndot.) represents a second bottleneck layer operation.

2.1.3 For the residual block in the ith multi-scale attention block, the block adds the output features of the previous multi-scale attention block and the output features of the bottleneck layer structure point by point. The input of which is the output characteristic F of the preceding multiscale attention block _i-1 And the output characteristics F of the bottleneck layer structure module in the multi-scale attention block _i,2 The output of the residual module is characteristic F _i ；

Output feature F of previous multiscale attention block _i-1 And feature F _i,2 Input to a residual error module to obtain a characteristic F _i ：

F _i ＝F _i-1 +F _i,2

2.2 Feature F) output for n multi-scale attention blocks _i I=1, 2,3, …, n, using a Concat connectionOperation to obtain cascade characteristic F _c ：

F _c ＝Concat(F ₁ ,F ₂ ,...,F _n )

F _d ＝Conv _1×1 (F _c )

F _e ＝Conv _3×3 (F _d )

Wherein H is _Sp (. Cndot.) represents a subpixel convolutional layer operation.

Will->

And->

Fusion is carried out to obtain a final reconstructed image I _SR 。

Fusing to obtain a final reconstructed image I _SR ：

Wherein I is _SR And (5) obtaining a final image super-resolution reconstruction result.

The invention has wide application in the field of image restoration, such as throwing large-size photo billboards, reducing image transmission pressure, enlarging thumbnail and the like. The present invention will be described in detail below with reference to the accompanying drawings.

1) Will low resolution image I _LR As input to a multi-scale attention cascade network, pair I _LR Performing convolution operationExtracted shallow features F ₀ ；

2) Will shallow layer feature F ₀ Inputting a feature extraction sub-network formed by a group of multi-scale attention blocks, cascading the features output by each multi-scale attention block in the sub-network to obtain cascading features F _c ；

3) Will cascade feature F _c The number of parameters is reduced by a convolution layer with a convolution kernel of 1, and an optimized feature vector F is obtained _d ；

5) For low resolution image I _LR Obtaining a reconstructed image by adopting a bicubic linear interpolation algorithm

Will->

And->

Fusion is carried out to obtain a final reconstructed image I _SR 。

The method was implemented based on the PyTorch deep learning framework under NVIDIA GeForce GTX 1080Ti and Ubuntu16.04 bit operating systems.

The invention provides an image super-resolution reconstruction method based on a multi-scale attention cascade network. The method is suitable for super-resolution reconstruction of the image, the super-resolution reconstruction is carried out by using the method, the obtained reconstructed image has high definition, more true texture and good sensory effect.

Claims

1. The image super-resolution reconstruction method based on the multi-scale attention cascade network is characterized by comprising the following steps of:

step 1) image I of low resolution _LR As a plurality ofInput of scale attention cascade network, for I _LR Performing a convolution operation to extract shallow features F ₀ ；

Step 2) shallow layer feature F ₀ Inputting a feature extraction subnet composed of n multi-scale attention blocks, cascading the features output by each multi-scale attention block in the subnet to obtain cascading features F _c The method comprises the steps of carrying out a first treatment on the surface of the Each multi-scale attention block consists of a U-shaped structure module, a bottleneck layer structure module and a residual error module;

step 3) feature F will cascade _c The optimized feature vector F is obtained by a convolution layer with a convolution kernel of 1 _d Optimized feature vector F _d The training and feature extraction of the data can be more effectively and intuitively carried out, and the method is specifically shown as follows:

F _d ＝Conv _1×1 (F _c )

wherein Conv _1×1 (g) A convolution operation representing a convolution kernel of 1;

step 4) optimizing feature vector F _d In the input image deep learning up-sampling module, a reconstructed image is obtained

Step 5) for low resolution image I _LR Obtaining a reconstructed image by adopting a bicubic linear interpolation algorithm

Will->

And->

Fusion is carried out to obtain a final reconstructed image I _SR The method is characterized by comprising the following steps:

2. the image super-resolution reconstruction method based on the multi-scale attention cascade network according to claim 1, wherein the method comprises the following steps of: the feature extraction sub-network described in the step 2 is composed of n multi-scale attention blocks, wherein the ith multi-scale attention block has the feature output F input as the previous multi-scale attention block _i-1 The output is characterized by F _i ；

For the U-shaped structural module in the ith multi-scale attention block, the input is the output characteristic F of the previous multi-scale attention block _i-1 After the U-shaped structural module is processed, the characteristic F is obtained _i,0 ；

For the bottleneck layer structure module in the ith multi-scale attention block, the bottleneck layer structure module is formed by connecting two bottleneck layers in series; output feature F of the previous multiscale attention block _i-1 Inputting a first bottleneck layer, inputting the output of the first bottleneck layer into a second bottleneck layer, and simultaneously receiving the output characteristics F of the U-shaped structure module in the multi-scale attention block by the second bottleneck layer _i,0 The output characteristic F of the bottleneck layer structure module is obtained through the process _i,2 ；

The residual modules in the ith multi-scale attention block are specifically: output feature F of the previous multi-scale attention block _i-1 And output features F of bottleneck layer structure _i,2 Adding point by point to obtain feature F _i 。

3. The image super-resolution reconstruction method based on the multi-scale attention cascade network according to claim 2, wherein the method comprises the following steps of:

the U-shaped structure module is formed by serial connection of a non-local mean value, a 3X 3 convolution, a 5X 5 convolution, a 7X 7 convolution, an attention mechanism, a 5X 5 convolution, a 3X 3 convolution and a non-local mean value; wherein a Concat layer is contained between the two 3X 3 convolutions; a Concat layer is arranged between the two 5X 5 convolution convolutions; the first 5 x 5 convolved input is the sum of the first non-local mean output and the first 3 x 3 convolved output, the 7 x 7 convolved input is the sum of the first 5 x 5 convolved output and the first non-local mean output, the second 5 x 5 convolved input is the Concat fusion of the first 5 x 5 convolved output and the attention mechanism output, and the second 3 x 3 convolved input is the Concat fusion of the first 3 x 3 convolved output and the second 5 x 5 convolved output.

4. A method for reconstructing an image super-resolution based on a multi-scale attention cascade network according to claim 3, wherein: the non-local mean value is formed by a convolution with three parallel convolution kernels of 1 and a convolution series for feature fusion;

5. A method for reconstructing an image super-resolution based on a multi-scale attention cascade network according to claim 3, wherein: the attention mechanism is formed by sequentially connecting a global pooling layer and two convolutions;

6. The image super-resolution reconstruction method based on the multi-scale attention cascade network according to claim 2, wherein the method comprises the following steps of: the bottleneck layer is formed by connecting two convolution layers in series.

7. The image super-resolution reconstruction method based on the multi-scale attention cascade network according to claim 1, wherein the method comprises the following steps of: the cascade operation described in step 2 is specifically represented as follows:

F _c ＝Concat(F ₁ ,F ₂ ,...,F _n )

wherein Concat (g) represents an operation of concatenating the characteristics of the n multi-scale attention block outputs, F _i I=1, 2,3, …, n denotes the characteristics of the n multi-scale attention block outputs.

8. The method for reconstructing an image super-resolution based on a multi-scale attention cascade network as recited in claim 1, wherein said obtaining a reconstructed image of step 4) is performed by

The specific steps of (a) are as follows:

F _e ＝Conv _3×3 (F _d )

Wherein Conv _3×3 (g) A convolution operation representing a convolution kernel of 3;

Wherein H is _Sp (g) Representing a sub-pixel convolution operation.