CN113538244B - Lightweight super-resolution reconstruction method based on self-adaptive weight learning - Google Patents
Classifications
- G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06N3/045 — Combinations of networks
- G06N3/048 — Activation functions
- G06T3/4007 — Scaling of whole images or parts thereof based on interpolation, e.g. bilinear interpolation
- G06T3/4046 — Scaling of whole images or parts thereof using neural networks
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application discloses a lightweight super-resolution reconstruction method based on adaptive weight learning, comprising the following steps: extracting shallow features of the input image with a feature extraction network; extracting deep features from the shallow features through a nonlinear mapping network containing an adaptive weight distribution mechanism, which uses the information extracted by an attention branch and a non-attention branch, dynamically assigns weights to the two branches through an adaptive weight fusion branch, and splits and fuses the two branches through convolution layers; and reconstructing a high-resolution image from the deep features through a reconstruction network. The method greatly reduces the parameter count of the two branches and combines them effectively with the adaptive weight distribution mechanism, reducing the number of network parameters while improving reconstruction performance.
Description
Technical Field
The application relates to the technical fields of deep learning and computer vision, and in particular to a lightweight super-resolution reconstruction technique based on adaptive weight learning.
Background
Image super-resolution reconstruction (SR) is an important research direction in the field of computer vision, aiming at restoring a corresponding high-resolution (HR) image from a given low-resolution (LR) image.
In recent years, image super-resolution reconstruction methods based on deep convolutional neural networks (CNNs) have achieved good results and become one of the important research directions in the field. Dong et al first applied deep learning to super-resolution reconstruction in 2014, proposing the deep-CNN-based method SRCNN. After this, Kim et al proposed a very deep convolutional network (VDSR) for image super-resolution reconstruction, which deepens the network with a global skip connection, adding the upsampled low-resolution image element-wise to the reconstructed output and improving network performance. The Laplacian pyramid super-resolution network (LapSRN) proposed by Lai et al uses progressive upsampling and progressive residual prediction to address both speed and accuracy. Lim et al proposed the enhanced deep super-resolution network (EDSR) and its multi-scale variant (MDSR), which remove the batch normalization (BN) layers commonly used in earlier methods, giving the information in the network a more flexible range and greatly improving performance. Drawing on the densely connected network (DenseNet) of Huang et al, Zhang et al proposed the residual dense network (RDN), which improves performance while reducing the parameter count; however, dense connections have high time complexity, leading to excessively long inference times.
The attention mechanism is a way of biasing the allocation of computing resources toward the most meaningful representations in the input. In recent years, attention mechanisms have been successfully applied to deep convolutional super-resolution networks, directing the network's computation toward feature regions that contain more information. Attention mechanisms are mainly divided into channel attention and spatial attention. Representative of channel attention is the residual channel attention network (RCAN) proposed by Zhang et al, which first applied channel attention to super-resolution reconstruction, adaptively adjusting channel features through the interdependence among channels; RCAN's reconstruction results exceed EDSR in accuracy and visual effect, but the method extracts only first-order image features, ignores higher-order features, and cannot capture information outside a local region. To address this, Dai et al proposed the second-order attention network (SAN), which adaptively refines inter-channel features using second-order feature statistics; this second-order channel attention focuses more on useful high-frequency information and improves the network's discriminative capability. The residual non-local attention network (RNAN) proposed by Zhang et al uses non-local modules to exploit spatial correlations across entire feature maps, achieving a better reconstruction effect. The holistic attention network (HAN) proposed by Niu et al combines both attention mechanisms to capture more useful information and learn the correlations of information among different depths, channels and positions.
The above methods significantly improve reconstruction performance, but as network parameters keep growing, the time and space complexity of the networks increases, so these methods cannot be applied in lightweight scenarios such as mobile devices. Addressing this problem, the cascading residual network (CARN) proposed by Ahn et al uses a layer-by-layer, block-by-block, multi-level connection structure to transfer information efficiently, but its reconstruction performance drops considerably along with the parameter count. Hui et al proposed the information distillation network (IDN), which aggregates current information with locally skip-connected information through a channel splitting strategy, achieving good performance with a small number of parameters. Later, Hui et al proposed the information multi-distillation network (IMDN), further improving IDN with an information refinement module that reuses the channel splitting strategy to extract fine-grained image features. IMDN performs well in peak signal-to-noise ratio and test time, but its parameter count is larger than those of most lightweight reconstruction networks (e.g., VDSR, IDN, MemNet).
To further reduce network size, Zhao et al proposed the pixel attention network (PAN), which obtains good reconstruction results with a very small number of parameters; however, its structure contains many attention modules, requires carefully tuned hyperparameters and training strategies, and has reduced representational capability, so its results trained on the same dataset are slightly inferior to other methods. Chen et al proposed a reconstruction network based on an attention-in-attention mechanism, which improves reconstruction capability but increases the parameter count from 261K to 1063K. In summary, existing image super-resolution reconstruction methods generally suffer from overly large network models.
Disclosure of Invention
To remedy the above drawbacks of the prior art, the present application aims to provide a lightweight super-resolution reconstruction method based on adaptive weight learning, which reduces the parameter count of the reconstruction model while improving image reconstruction quality.
The aim of the application is achieved by the following technical scheme.
A lightweight super-resolution reconstruction method based on adaptive weight learning comprises the following steps:
step 1, obtaining an image to be processed;
step 2, extracting shallow features of the input image by using a feature extraction network;
step 3, processing the shallow image features through a nonlinear mapping network with an adaptive weight distribution mechanism, wherein the mechanism uses the information extracted by an attention branch and a non-attention branch, dynamically assigns weights to the two branches through an adaptive weight fusion branch, and splits and fuses the two branches through convolution layers to obtain deep image features;
and 4, extracting the deep features of the image through a reconstruction network to obtain a reconstructed high-resolution image.
Preferably, the nonlinear mapping network comprises a plurality of serially connected adaptive weight modules. The shallow image features pass through these modules in sequence, continuously acquiring attention-weighted features: each adaptive weight module learns the feature weight of each branch, adjusts the features with these weights, and outputs them to the next module, until the features have been adjusted by all adaptive weight modules, yielding the deep image features.
Preferably, the adaptive weight module acquires the attention information of the features using an attention branch; acquires the local non-attention information of the features using a non-attention branch; and fuses the non-attention branch and the attention branch using the adaptive weight fusion branch to acquire the weight fusion information.
Preferably, acquiring the attention information of the features using the attention branch comprises:
reducing the number of channels of the input information to half with a 1×1 convolution layer; feeding the result into a 3×3 convolution layer with a pixel-level attention mechanism, which assigns different weights to different channels; and performing feature mapping with a 3×3 convolution layer to obtain the final attention information.
Preferably, acquiring the local non-attention information of the features using the non-attention branch comprises:
reducing the number of channels of the input information to half with a 1×1 convolution layer, and performing feature mapping with only one 3×3 convolution layer to obtain the final local non-attention information.
Preferably, fusing the non-attention branch and the attention branch using the adaptive weight fusion branch comprises:
averaging the input information with global average pooling to obtain global information, and learning the feature weight of each feature channel through a fully connected layer while reducing parameters;
further learning the feature weight of each feature channel through a ReLU function and a second fully connected layer; after a Softmax function, using an adaptive weight layer to generate normalized adaptive weights λ1 and λ2 for the attention information and the local non-attention information, respectively;
performing channel reorganization on the previously obtained attention information and local non-attention information with two 1×1 convolution layers, multiplying each by its corresponding adaptive weight, and adding the corresponding elements;
the final features are obtained by adding the 1 x 1 convolutional layer output to the original input information.
Preferably, the reconstruction network includes an upsampling module, a 3×3 convolution layer, and a bilinear interpolation connection layer.
Preferably, in the reconstruction network, the deep image features pass sequentially through two serially connected upsampling modules and a 3×3 convolution layer to obtain the reconstruction features, which are finally added element-wise to the output features of the bilinear interpolation connection layer to obtain the final reconstructed high-resolution image.
Preferably, the bilinear interpolation connection layer takes the input image directly, applies a bilinear interpolation operation, and adds the result to the reconstruction features.
Preferably, the super-resolution reconstruction method includes a loss function operation, the loss function adopting the L1 loss.
Due to the adoption of the technical scheme, the application has the following beneficial effects:
the application adopts a nonlinear mapping network with a plurality of stacked self-adaptive weight modules, and each module can extract the characteristic information of different levels. Meanwhile, the application designs a self-adaptive weight distribution mechanism with low parameter number, the self-adaptive weight distribution mechanism fully utilizes the information extracted by the two branches of attention and non-attention, enhances high contribution degree information and suppresses redundant information, utilizes the self-adaptive weight fusion branch to distribute the weights of the two branches in a dynamic mode, simultaneously splits and fuses the two branches through a specific convolution layer, greatly reduces the parameter number of the attention branch and the non-attention branch, better combines with the self-adaptive weight distribution mechanism, and reduces the parameter number of the network while improving the network reconstruction performance.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and constitute a part of this specification, illustrate the application without limiting it in any way. In the drawings:
FIG. 1 is a flow chart of a lightweight super-resolution reconstruction network based on adaptive weight learning;
FIG. 2 is a schematic diagram of a lightweight super-resolution reconstruction network based on adaptive weight learning according to the present application;
FIG. 3 is a flow chart of the adaptive weight module architecture of the present application;
FIG. 4 is a schematic diagram of an adaptive weighting module structure according to the present application;
FIGS. 5(a)-(f) compare the effect of the method of the present application with other reconstruction methods after 2-fold reconstruction of img_58060 in the BSD100 dataset; FIG. 5(a) shows the original image; FIGS. 5(b)-(e) show other methods; FIG. 5(f) shows the method of the present application;
FIGS. 6(a)-(f) compare the effect of the method of the present application with other reconstruction methods after 3-fold reconstruction of img_062 in the Urban100 dataset; FIG. 6(a) shows the original image; FIGS. 6(b)-(e) show other methods; FIG. 6(f) shows the method of the present application;
FIGS. 7(a)-(f) compare the effect of the method of the present application with other reconstruction methods after 4-fold reconstruction of img_093 in the Urban100 dataset; FIG. 7(a) shows the original image; FIGS. 7(b)-(e) show other methods; FIG. 7(f) shows the method of the present application.
Detailed Description
The present application will now be described in detail with reference to the drawings and specific embodiments. The exemplary embodiments and descriptions are provided to illustrate the application and are not intended to limit it.
Referring to fig. 1, the lightweight super-resolution reconstruction method based on adaptive weight learning provided by the application comprises the following steps:
step S1, obtaining an image to be processed.
Step S2, extracting the shallow image features of the input image using a feature extraction network.
The feature extraction network extracts features from the low-resolution image using only one 3×3 convolution layer. This process can be formulated as

x_shallow = f_shallow(I_LR),  (1)

where f_shallow(·) denotes a convolution layer with kernel size 3×3 that extracts features from the input low-resolution image I_LR, and x_shallow is the output of the feature extraction module.
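As a minimal illustration of equation (1), the shallow feature extraction amounts to a single 3×3 convolution. The pure-Python, single-channel sketch below is an assumption for demonstration (the actual network operates on multi-channel feature maps with learned kernels):

```python
def conv3x3(img, kernel, bias=0.0):
    """Single-channel 3x3 convolution with zero padding and stride 1,
    standing in for f_shallow in equation (1)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = bias
            for ky in range(3):
                for kx in range(3):
                    iy, ix = y + ky - 1, x + kx - 1
                    if 0 <= iy < h and 0 <= ix < w:  # zero padding
                        acc += img[iy][ix] * kernel[ky][kx]
            out[y][x] = acc
    return out

# Demonstration with an identity kernel: x_shallow equals the input I_LR.
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
i_lr = [[float(x + y) for x in range(4)] for y in range(4)]
x_shallow = conv3x3(i_lr, identity)
```

In the real network the kernel weights are learned during training rather than fixed as here.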
Step S3, processing the shallow image features through a nonlinear mapping network containing adaptive weight learning: the adaptive weight distribution mechanism uses the information extracted by the attention and non-attention branches, dynamically assigns weights to the two branches through an adaptive weight fusion branch, and splits and fuses the two branches through convolution layers to obtain the deep image features.
As shown in fig. 2, the feature extraction network contains one convolution layer, the nonlinear mapping network comprises several serially connected adaptive weight modules (AWBs), and the reconstruction network comprises pixel-attention upsampling modules. The shallow image features pass through the adaptive weight modules in sequence, continuously acquiring attention-weighted features: each adaptive weight module learns the feature weight of each branch, adjusts the features with these weights, and outputs them to the next module, until the features have been adjusted by all adaptive weight modules, yielding the deep image features.
Further, the process of extracting the deep image features can be expressed as

x_n = f_AWB_n(x_{n-1}),  n = 1, 2, …, N,  with x_0 = x_shallow,  (2)

where x_n is the feature map output by the nth adaptive weight module, f_AWB_n(·) denotes the nth adaptive weight module, and x_shallow is the output of the feature extraction module.
Referring to fig. 3, in step 3 the deep image features are obtained through the following steps:
step S310, using the attention branch to acquire the attention information of the feature, referring to fig. 3 and 4, the steps are as follows:
the number of channels of the input information is first reduced to half using a 1 x 1 convolutional layer. And then input into a 3 x 3 convolution layer added with a pixel level attention mechanism (PA), wherein the pixel level attention mechanism can allocate different weights to different channels, so that the correlation between the channels of the feature map is improved better. Finally, a 3X 3 convolution layer is used for carrying out feature mapping to obtain the final attention information x n ′。
Step S320, acquiring the local non-attention information of the features using the non-attention branch, as follows:
First, the number of channels of the input information is reduced to half using a 1×1 convolution layer; then feature mapping is performed using only one 3×3 convolution layer to obtain the final local non-attention information x_n″.
Step S330, fusing the non-attention branch and the attention branch using the adaptive weight fusion branch to acquire the weight fusion information, as follows:
the input information is first averaged using a global averaging pooling method to obtain global information. And then, through a full connection layer, the parameters are reduced, and the characteristic weight of each characteristic channel is learned. And then, further learning the characteristic weight of each characteristic channel through a ReLU function and a full connection layer. After the characteristic weight value is subjected to a Softmax function, an adaptive weight layer is used for respectively generating standard adaptive weight lambda for the attention information and the local non-attention information 1 And lambda (lambda) 2 . And then, using 2 1 multiplied by 1 convolution layers to carry out channel recombination on the attention information and the local non-attention information obtained before, multiplying the attention information and the local non-attention information by corresponding adaptive weights respectively, and adding corresponding elements. Finally, the output of the 1 multiplied by 1 convolution layer is used to be added with the initial input information element by element to obtain the final deep image characteristic x n 。
Step S4, passing the deep image features through a reconstruction network to obtain the reconstructed high-resolution image.
First, two serially connected upsampling modules and a 3×3 convolution layer reconstruct the deep image features, producing the reconstruction features. Then a bilinear interpolation connection layer applies bilinear interpolation to the input image and adds the result element-wise to the reconstruction features, yielding the final reconstructed high-resolution image.
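The bilinear interpolation connection layer can be sketched as a naive single-channel upscaler. The direct `Y / scale` coordinate mapping used here is one of several common alignment conventions and is an assumption, since the patent does not specify one:

```python
def bilinear_upscale(img, scale):
    """Bilinear upscaling sketch for the skip connection: the enlarged
    low-resolution input is later added element-wise to the
    reconstruction features."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (w * scale) for _ in range(h * scale)]
    for Y in range(h * scale):
        for X in range(w * scale):
            # Map the output pixel to fractional source coordinates.
            fy = min(Y / scale, h - 1)
            fx = min(X / scale, w - 1)
            y0, x0 = int(fy), int(fx)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = fy - y0, fx - x0
            # Interpolate horizontally on the two neighboring rows,
            # then vertically between those results.
            top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
            bot = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
            out[Y][X] = top * (1 - dy) + bot * dy
    return out
```

Because bilinear interpolation is parameter-free, this skip path adds no weights to the network while still carrying the low-frequency content of the input to the output.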
Reconstructing the super-resolution image further involves a loss function operation, for which the L1 loss is adopted.
To better illustrate the effectiveness of the present application, the method was tested on standard test datasets and compared with several lightweight super-resolution models: DRRN, IDN, CARN, IMDN and PAN. The comparative experiments are as follows:
In the experiments, DIV2K, which contains 800 high-quality RGB training images, is used as the training dataset. The high-resolution images are downsampled with bicubic interpolation in MATLAB, and the data are augmented with 90°, 180° and 270° rotations and horizontal flipping. For testing, four datasets are used: Set5, Set14, BSD100 and Urban100. Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) serve as objective quality assessment indices; all values are computed on the Y channel of the YCbCr color space.
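The PSNR index can be computed as below; this sketch takes two equally sized single-channel (e.g. Y-channel) images as nested lists with a peak value of 255, whereas full evaluation pipelines typically also crop image borders and report SSIM alongside:

```python
import math

def psnr(ref, out, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and a
    reconstructed image; higher is better, identical images give infinity."""
    sse, n = 0.0, 0
    for ref_row, out_row in zip(ref, out):
        for r, o in zip(ref_row, out_row):
            sse += (r - o) ** 2
            n += 1
    mse = sse / n
    if mse == 0.0:
        return float("inf")
    return 10.0 * math.log10(peak * peak / mse)
```

A 0.03 dB gain on this logarithmic scale, as reported below for ×2 on Set5, corresponds to a small but consistent reduction in mean squared error.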
Objective quality assessments of the results at different magnifications on the Set5, Set14, BSD100 and Urban100 datasets are shown in Table 1, where ×2, ×3 and ×4 denote super-resolution reconstruction at 2-fold, 3-fold and 4-fold magnification, respectively.
TABLE 1. Average PSNR/SSIM of different super-resolution reconstruction methods on the Set5, Set14, BSD100 and Urban100 datasets
Note: bold numbers indicate the best result; underlined numbers indicate the second best.
As the table shows, the method of the present application outperforms the other methods on most datasets; in particular, the peak signal-to-noise ratio of the reconstruction results at ×2 and ×3 on the Set5 dataset improves by 0.03 dB and 0.06 dB, respectively, over the second-ranked method. Although the reconstruction results at ×4 remain essentially on par with or slightly above the second-best method, the parameter count is only 65% of its parameters, demonstrating that the method achieves a better balance between performance and network size.
Figs. 5(b)-(f) show the results of 2-fold super-resolution reconstruction of image 58060 in the BSD100 dataset by different methods; fig. 5(a) is the true high-resolution image. CARN, IMDN and the other methods fail to reconstruct the third group of stripes; PAN reconstructs them, but the reconstruction of the present method is the clearest.
Figs. 6(b)-(f) show the results of 3-fold super-resolution reconstruction of image 062 in Urban100 by different methods; fig. 6(a) is the true high-resolution image. The other methods incorrectly reconstruct the vertical edges of the window as lateral edges, whereas the present method correctly reconstructs the edge profile of the window.
Figs. 7(b)-(f) show the results of 4-fold super-resolution reconstruction of image 093 in Urban100 by different methods; fig. 7(a) is the true high-resolution image. The pictures reconstructed by the other methods mistakenly turn the horizontal stripes into vertical stripes, while the present method reconstructs the stripes most accurately, very close to the original picture.
The application is not limited to the above embodiments. Based on the technical solution disclosed herein, a person skilled in the art may, without creative effort, make substitutions and modifications to some technical features according to the disclosed technical content, and all such substitutions and modifications fall within the protection scope of the application.
Claims (8)
1. A lightweight super-resolution reconstruction method based on adaptive weight learning, characterized by comprising the following steps:
step 1, obtaining an image to be processed;
step 2, extracting shallow features of the input image by using a feature extraction network;
step 3, processing the shallow image features through a nonlinear mapping network with an adaptive weight distribution mechanism, wherein the mechanism uses the information extracted by an attention branch and a non-attention branch, dynamically assigns weights to the two branches through an adaptive weight fusion branch, and splits and fuses the two branches through convolution layers to obtain deep image features;
the adaptive weight module acquires the attention information of the features using the attention branch; acquires the local non-attention information of the features using the non-attention branch; and fuses the non-attention branch and the attention branch using the adaptive weight fusion branch to acquire the weight fusion information;
fusing the non-attention branch and the attention branch using the adaptive weight fusion branch comprises:
averaging the input information by global average pooling to obtain global information, and learning the feature weight of each feature channel through a fully connected layer while reducing parameters;
further learning the feature weight of each feature channel through a ReLU function and a fully connected layer; after a Softmax function, using an adaptive weight layer to generate standard adaptive weights for the attention information and the local non-attention information, respectively;
performing channel recombination on the previously obtained attention information and local non-attention information using two 1×1 convolution layers, multiplying each by its corresponding adaptive weight, and adding the corresponding elements;
adding the output of the 1×1 convolution layer to the initial input information to obtain the final feature;
and 4, extracting the deep features of the image through a reconstruction network to obtain a reconstructed high-resolution image.
2. The lightweight super-resolution reconstruction method based on adaptive weight learning according to claim 1, wherein the nonlinear mapping network comprises a plurality of adaptive weight modules connected in series; the shallow image features pass through the adaptive weight modules in sequence, each adaptive weight module learning the feature weight of each branch, adjusting the features with the learned weights, and outputting them to the next adaptive weight module, until all adaptive weight modules have adjusted the features, thereby obtaining the deep image features.
3. The lightweight super-resolution reconstruction method based on adaptive weight learning according to claim 1, wherein acquiring the attention information of the feature using the attention branch comprises:
reducing the number of channels of the input information to half using a 1×1 convolution layer, inputting the result into a 3×3 convolution layer with a pixel-level attention mechanism, the pixel-level attention mechanism assigning different weights to different channels, and performing feature mapping with a 3×3 convolution layer to obtain the final attention information.
4. The lightweight super-resolution reconstruction method based on adaptive weight learning according to claim 1, wherein acquiring the local non-attention information of the feature using the non-attention branch comprises:
reducing the number of channels of the input information to half using a 1×1 convolution layer, and performing feature mapping using only one 3×3 convolution layer to obtain the final local non-attention information.
5. The lightweight super-resolution reconstruction method based on adaptive weight learning according to claim 1, wherein the reconstruction network comprises an upsampling module, a 3×3 convolution layer and a bilinear interpolation connection layer.
6. The lightweight super-resolution reconstruction method based on adaptive weight learning according to claim 5, wherein in the reconstruction network, the deep image features sequentially pass through two upsampling modules and a 3×3 convolution layer connected in series to obtain reconstruction features, and finally the reconstruction features are added element-wise to the output features of the bilinear interpolation connection layer to obtain the final reconstructed high-resolution image.
7. The lightweight super-resolution reconstruction method based on adaptive weight learning according to claim 5, wherein the bilinear interpolation connection layer directly acquires the input image and adds it to the reconstruction features after a bilinear interpolation operation.
8. The lightweight super-resolution reconstruction method based on adaptive weight learning according to claim 1, wherein the super-resolution reconstruction method uses a loss function, the loss function being the L1 loss.
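The adaptive weight module described in claims 1, 3 and 4 can be illustrated with a minimal PyTorch sketch. This is not the patented implementation: all class and variable names, layer widths, and the reduction ratio in the fusion branch are illustrative assumptions, since the claims do not fix exact dimensions.

```python
# Hypothetical sketch of the adaptive weight module (claims 1, 3, 4).
# Layer names and sizes are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAttention(nn.Module):
    """Pixel-level attention: a 1x1 conv + sigmoid yields a per-pixel,
    per-channel weight map that rescales the feature (claim 3)."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return x * torch.sigmoid(self.conv(x))

class AdaptiveWeightModule(nn.Module):
    def __init__(self, channels):
        super().__init__()
        mid = channels // 2
        # Attention branch (claim 3): 1x1 conv halves channels, then a 3x3
        # conv with pixel-level attention, then a 3x3 conv for mapping.
        self.att_reduce = nn.Conv2d(channels, mid, 1)
        self.att_conv = nn.Conv2d(mid, mid, 3, padding=1)
        self.att_pa = PixelAttention(mid)
        self.att_map = nn.Conv2d(mid, mid, 3, padding=1)
        # Non-attention branch (claim 4): 1x1 conv halves channels,
        # followed by a single 3x3 conv.
        self.plain_reduce = nn.Conv2d(channels, mid, 1)
        self.plain_conv = nn.Conv2d(mid, mid, 3, padding=1)
        # Fusion branch (claim 1): GAP -> FC -> ReLU -> FC -> Softmax gives
        # one adaptive weight per branch.
        self.fc1 = nn.Linear(channels, channels // 4)
        self.fc2 = nn.Linear(channels // 4, 2)
        # Two 1x1 convs recombine the channels of each branch before weighting.
        self.shuffle_att = nn.Conv2d(mid, mid, 1)
        self.shuffle_plain = nn.Conv2d(mid, mid, 1)
        self.out_conv = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        att = self.att_map(self.att_pa(self.att_conv(self.att_reduce(x))))
        plain = self.plain_conv(self.plain_reduce(x))
        # Global average pooling of the input gives the global descriptor.
        g = F.adaptive_avg_pool2d(x, 1).flatten(1)
        w = torch.softmax(self.fc2(F.relu(self.fc1(g))), dim=1)  # (N, 2)
        w_att = w[:, 0].view(-1, 1, 1, 1)
        w_plain = w[:, 1].view(-1, 1, 1, 1)
        fused = self.shuffle_att(att) * w_att + self.shuffle_plain(plain) * w_plain
        # Residual connection: add the 1x1 conv output to the initial input.
        return self.out_conv(fused) + x
```

The module is shape-preserving, so several of them can be chained in series as in claim 2 to form the nonlinear mapping network.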
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110839570.XA CN113538244B (en) | 2021-07-23 | 2021-07-23 | Lightweight super-resolution reconstruction method based on self-adaptive weight learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113538244A CN113538244A (en) | 2021-10-22 |
CN113538244B true CN113538244B (en) | 2023-09-01 |
Family
ID=78121311
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110839570.XA Active CN113538244B (en) | 2021-07-23 | 2021-07-23 | Lightweight super-resolution reconstruction method based on self-adaptive weight learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113538244B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111192200A (en) * | 2020-01-02 | 2020-05-22 | 南京邮电大学 | Image super-resolution reconstruction method based on fusion attention mechanism residual error network |
WO2020238558A1 (en) * | 2019-05-24 | 2020-12-03 | 鹏城实验室 | Image super-resolution method and system |
CN112991167A (en) * | 2021-01-27 | 2021-06-18 | 广东工业大学 | Aerial image super-resolution reconstruction method based on layered feature fusion network |
Non-Patent Citations (1)
Title |
---|
Tao Zhuang; Liao Xiaodong; Shen Jianghong. Image super-resolution reconstruction algorithm with a dual-path feedback network. Computer Systems & Applications (计算机系统应用). 2020, (04), full text. * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113240580B (en) | Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation | |
CN108830796B (en) | Hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss | |
CN107123089B (en) | Remote sensing image super-resolution reconstruction method and system based on depth convolution network | |
CN109087273B (en) | Image restoration method, storage medium and system based on enhanced neural network | |
CN111242846B (en) | Fine-grained scale image super-resolution method based on non-local enhancement network | |
CN112750082A (en) | Face super-resolution method and system based on fusion attention mechanism | |
CN113436076B (en) | Image super-resolution reconstruction method with characteristics gradually fused and electronic equipment | |
CN111951164B (en) | Image super-resolution reconstruction network structure and image reconstruction effect analysis method | |
CN113837946B (en) | Lightweight image super-resolution reconstruction method based on progressive distillation network | |
CN112884650B (en) | Image mixing super-resolution method based on self-adaptive texture distillation | |
CN112991227A (en) | Weak light image enhancement method and device based on U-net + + network | |
CN115100039B (en) | Lightweight image super-resolution reconstruction method based on deep learning | |
CN113920043A (en) | Double-current remote sensing image fusion method based on residual channel attention mechanism | |
CN116485934A (en) | Infrared image colorization method based on CNN and ViT | |
CN116205830A (en) | Remote sensing image fusion method based on combination of supervised learning and unsupervised learning | |
CN116468605A (en) | Video super-resolution reconstruction method based on time-space layered mask attention fusion | |
CN115526779A (en) | Infrared image super-resolution reconstruction method based on dynamic attention mechanism | |
CN111461978A (en) | Attention mechanism-based resolution-by-resolution enhanced image super-resolution restoration method | |
CN113421187B (en) | Super-resolution reconstruction method, system, storage medium and equipment | |
CN110930308A (en) | Structure searching method of image super-resolution generation network | |
CN112200719B (en) | Image processing method, electronic device, and readable storage medium | |
CN113850721A (en) | Single image super-resolution reconstruction method, device and equipment and readable storage medium | |
CN114359044A (en) | Image super-resolution system based on reference image | |
CN111640061B (en) | Self-adaptive image super-resolution system | |
CN114830168A (en) | Image reconstruction method, electronic device, and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||