CN111583107A - Image super-resolution reconstruction method and system based on attention mechanism - Google Patents


Info

Publication number
CN111583107A
Authority
CN
China
Prior art keywords: information, module, image, super, attention
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number
CN202010257518.9A
Other languages
Chinese (zh)
Inventor
陈沅涛
张艺兴
陈曦
张建明
刘林武
陶家俊
张浩鹏
文司昊
谷科
余飞
Current Assignee
Changsha University of Science and Technology
Original Assignee
Changsha University of Science and Technology
Application filed by Changsha University of Science and Technology
Priority to CN202010257518.9A
Publication of CN111583107A

Classifications

    • G06T 3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06N 3/045 — Neural networks: combinations of networks
    • G06N 3/08 — Neural networks: learning methods
    • G06T 5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 2207/20081 — Training; learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/20221 — Image fusion; image merging


Abstract

The invention discloses an image super-resolution reconstruction method and system based on an attention mechanism, comprising the following steps: extracting preliminary image feature information from a low-resolution image; progressively extracting the preliminary image feature information through a plurality of connected information extraction modules to obtain depth image feature information; and reconstructing a super-resolution image from the depth image feature information. The image information is extracted by information extraction modules based on the attention mechanism, which improves the correlation among channels and fully fuses the feature map information of each channel. This improves the definition and visual effect of the reconstructed super-resolution image and effectively recovers more details such as contour textures; finally, the reconstruction module reconstructs high-resolution images at different scales from this information.

Description

Image super-resolution reconstruction method and system based on attention mechanism
Technical Field
The invention relates to the technical field of image super-resolution reconstruction, in particular to an attention mechanism-based image super-resolution reconstruction method and system.
Background
Super-Resolution (SR) reconstruction was proposed by Harris et al. in the early 1960s. According to the number of low-resolution images required, reconstruction can be divided into single-image and multi-image reconstruction; single-image super-resolution reconstruction uses an algorithm to reconstruct a low-resolution image into a visually pleasing high-resolution image.
In recent years, super-resolution reconstruction methods based on Convolutional Neural Networks (CNNs) have achieved good results. In 2014, Dong et al. first introduced a three-layer convolutional neural network for this task and proposed the convolutional-neural-network-based super-resolution reconstruction model SRCNN. In 2016, Kim et al. proposed VDSR, an accurate super-resolution reconstruction model using a very deep convolutional network, which increased the depth of the network to 20 layers; to control the number of model parameters, Kim et al. also proposed the deep recursive convolutional network model DRCN, using recursive supervision and skip connections, which significantly improved upon SRCNN. Tai et al. proposed the Deep Recursive Residual Network (DRRN), which employs a weight-sharing strategy to alleviate the huge parameter requirements of very deep networks. In 2018, the Information Distillation Network (IDN) was proposed as a deep but compact convolutional network that reconstructs high-resolution images directly from the original low-resolution images. To improve super-resolution quality, deepening and widening the network has become a design trend, but merely increasing the number of layers causes problems such as a large amount of computation, high memory consumption and long training time, making such methods unsuitable for mobile and embedded vision applications. In addition, the above convolutional-network-based methods treat all channel features equally and lack flexibility in handling different types of information, such as high-frequency and low-frequency information. As a result, existing image super-resolution reconstruction methods commonly suffer from blurry reconstructed output images, poor visual effect and long running time.
Disclosure of Invention
To solve the above problems, an object of the present invention is to provide an attention-mechanism-based image super-resolution reconstruction method and system that improve the definition and visual effect of the reconstructed output image.
The technical scheme adopted by the invention for solving the problems is as follows:
in the first aspect of the invention, an attention mechanism-based image super-resolution reconstruction method extracts preliminary image characteristic information in a low-resolution image; extracting the preliminary image feature information in a multi-dimensional way through a plurality of connected information extraction modules to obtain depth image feature information; reconstructing a super-resolution image according to the depth image characteristic information; the information extraction module comprises a plurality of groups of convolution modules and attention modules which are arranged at intervals, and the primary image feature information passes through the convolution modules to obtain first feature information containing a plurality of feature channels; and the first characteristic information executes an attention mechanism through the attention module, learns the characteristic weight of each characteristic channel, and adjusts the first characteristic information by using the characteristic weight to obtain the input of the next convolution module.
The image super-resolution reconstruction method based on the attention mechanism at least has the following beneficial effects: by arranging the information extraction module based on the attention mechanism, the correlation among channels is improved, the characteristic diagram information of each channel is fully fused, and the definition and the visual effect of the reconstructed super-resolution image are improved.
Further, the attention mechanism includes: averaging input information by using a global average pooling method to obtain global information; using a gating unit consisting of two fully-connected layers to learn the feature weight of each feature channel from the global information; and adjusting the input information by using the characteristic weight to obtain output information. The global average pooling averages the information of all points in the space into a value, can shield the distribution information in the space, and can better focus on the correlation among channels; a gating mechanism formed by two layers of full connection fuses characteristic diagram information of each channel, obtains a characteristic diagram weighted value by utilizing the correlation between the channels, and realizes self-adaptive adjustment of channel characteristics.
Further, the information extraction module comprises a local shallow layer network for extracting second feature information and a local deep layer network for extracting third feature information; in the information extraction module, the second feature information and the third feature information are processed according to the following operations:
R_k = C(S(P_1^k, 1/s), F_{k-1})

where F_{k-1} is the input of the current information extraction module, C and S denote the concatenation operation and the slice operation, respectively, P_1^k is the output of the local shallow network in the k-th information extraction module, and R_k is the second feature information;

P_k = P_2^k + R_k = C_b(S(P_1^k, 1 - 1/s)) + C(S(P_1^k, 1/s), F_{k-1})

where P_2^k and C_b are, respectively, the output of the local deep network and the stacked convolution operation of the local deep network in the k-th information extraction module, and P_k is the third feature information.
After each feature extraction, the feature map information of the channels is associated and fused, which keeps the feature map weights meaningful. The third feature information combines the first feature information, the second feature information and the local deep network features, so its content is richer and more effective.
Further, the local shallow layer network comprises a first convolution module, a first attention module, a second convolution module, a second attention module, a third convolution module and a third attention module which are connected in sequence, wherein the characteristic dimension of the first convolution module is 48, the characteristic dimension of the second convolution module is 32, and the characteristic dimension of the third convolution module is 64.
Further, the local deep network includes a fourth convolution module, a fourth attention module, a fifth convolution module, a fifth attention module, a sixth convolution module, and a sixth attention module, which are connected in sequence, where a characteristic dimension of the fourth convolution module is 64, a characteristic dimension of the fifth convolution module is 48, and a characteristic dimension of the sixth convolution module is 80.
Further, the information extraction module is further provided with an information compression module, the information compression module is provided with a compression convolution module for reducing dimensions, and the characteristic dimension of the compression convolution module is 64. The information compression module can compress redundant information of the features in the third feature information, and increase feature map information of the third feature information, so that the outline details are clearer.
Further, the reconstructed super-resolution image adopts a sub-pixel convolution layer reconstruction method, and the depth image characteristic information is processed according to the following operations:
I_SR = H_REC(H_n(F_{n-1})) + U(I_LR)

where H_REC and U denote the reconstruction operation and the bicubic interpolation operation, respectively, I_SR is the final output result, I_LR is the original input, and H_n is the n-th information extraction function.
Therefore, the low-resolution images are reconstructed into high-resolution images with different scales by utilizing the information learned by the information extraction module.
Further, the reconstructing the super-resolution image further includes a loss function operation, where the loss function adopts a mean absolute error mode, and the final output result is processed according to the following operations:
L(θ) = (1/N) Σ_{i=1}^{N} ‖ I_SR^i − I_HR^i ‖_1

where N is the number of training samples and I_HR^i denotes the i-th ground-truth high-resolution image.
in a second aspect of the present invention, an attention-based image super-resolution reconstruction system includes: the characteristic extraction module is used for extracting the preliminary image characteristic information in the low-resolution image; the information extraction modules are used for extracting the preliminary image feature information in a multi-dimensional mode to obtain depth image feature information; the reconstruction module is used for reconstructing a super-resolution image according to the depth image characteristic information; the information extraction module comprises a plurality of groups of convolution modules arranged at intervals and an attention module, and the attention module executes an attention mechanism to adjust output information passing through the convolution modules.
The image super-resolution reconstruction system based on the attention mechanism at least has the following beneficial effects: by arranging the information extraction module based on the attention mechanism, the correlation among channels is improved, the characteristic diagram information of each channel is fully fused, and the definition and the visual effect of the reconstructed super-resolution image are improved.
In a third aspect of the present invention, a storage medium is characterized in that the storage medium stores computer-executable instructions for causing a computer to execute the attention-based image super-resolution reconstruction method as described above.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a flowchart of an image super-resolution reconstruction method based on an attention mechanism according to an embodiment of the present invention;
FIG. 2 is a flow chart of the attention mechanism of FIG. 1;
FIG. 3 is a block diagram of the information extraction module of FIG. 1;
FIG. 4 is a block diagram of an image super-resolution reconstruction system based on an attention mechanism according to an embodiment of the present invention;
FIG. 5 is a block diagram of the local deep network and information compression module of FIG. 4;
FIG. 6 is a block diagram of the attention module of FIG. 4;
FIG. 7 is a comparison of image reconstruction results on the Set5 data set between the present invention and the comparison methods;
FIG. 8 is a comparison of image reconstruction results on the Urban100 data set between the present invention and the comparison methods.
Detailed Description
Referring to fig. 1 and 3, an embodiment of the present invention provides an image super-resolution reconstruction method based on an attention mechanism, including step S110 of extracting preliminary image feature information in a low-resolution image; step S120, gradually extracting preliminary image characteristic information through a plurality of connected information extraction modules 200 to obtain depth image characteristic information; and S130, reconstructing a super-resolution image according to the depth image characteristic information. The information extraction module 200 comprises a plurality of groups of convolution modules 210 and attention modules 220 which are arranged at intervals, and the preliminary image feature information is subjected to the convolution modules 210 to obtain first feature information containing a plurality of feature channels; the first feature information is processed by the attention module 220 to execute an attention mechanism, learn a feature weight of each feature channel, and adjust the first feature information by using the feature weight to obtain an input of a next convolution module.
The image information is extracted by the plurality of connected information extraction modules 200, and an attention mechanism is executed, so that the correlation among channels is improved, the characteristic diagram information of each channel is fully fused, and the definition and the visual effect of the reconstructed super-resolution image are improved.
In this embodiment, features are extracted from the original image by two stacked 3 × 3 convolutions with a feature dimension of 64, as shown in the following formula:

F_0 = H_FE(I_LR)

where I_LR denotes the original input, H_FE denotes the feature extraction function, and F_0 denotes the extracted features, which serve as the input of the next step.
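As an illustration only, the sketch below mirrors this feature-extraction step in NumPy: two stacked 3 × 3 "same" convolutions producing 64 feature channels, with random placeholder weights and a naive loop implementation (no claim to match the patented network's actual weights or framework):

```python
import numpy as np

def conv3x3(x, w):
    """Naive 3x3 'same' convolution: x is (C_in, H, W), w is (C_out, C_in, 3, 3)."""
    c_in, h, wd = x.shape
    c_out = w.shape[0]
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))      # zero padding keeps the H x W size
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for i in range(h):
            for j in range(wd):
                out[o, i, j] = np.sum(xp[:, i:i + 3, j:j + 3] * w[o])
    return out

def extract_features(img, w1, w2):
    """F_0 = H_FE(I_LR): two stacked 3x3 convolutions."""
    return conv3x3(conv3x3(img, w1), w2)

rng = np.random.default_rng(0)
i_lr = rng.standard_normal((3, 8, 8))             # toy 8x8 RGB low-resolution patch
w1 = rng.standard_normal((64, 3, 3, 3)) * 0.1     # 3 -> 64 feature channels
w2 = rng.standard_normal((64, 64, 3, 3)) * 0.01   # 64 -> 64 feature channels
f0 = extract_features(i_lr, w1, w2)
print(f0.shape)                                   # (64, 8, 8)
```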
In addition, in the embodiment, the number of the information extraction modules 200 is 4, and the process of extracting the image information is shown in the following formula:
F_k = H_k(F_{k-1}), k = 1, 2, …, n

where H_k denotes the k-th information extraction function, and F_{k-1} and F_k denote the input and output of the k-th information extraction module, respectively.
Referring to fig. 2 and 6, another embodiment, performing an attention mechanism includes: step S121, averaging input information by using a global average pooling method to obtain global information; step S122, learning the feature weight of each feature channel from global information by using a gate control unit formed by two fully-connected layers 222; and S123, adjusting the input information by using the characteristic weight to obtain output information.
In the present embodiment, an attention mechanism is executed, and the specific flow is as follows:
and step S121, averaging the input information by using a global average pooling method to obtain global information. By using Global Average Pooling (GAP) 221 to average the information of all points in space to a value, the distribution information in space can be masked, and the correlation between channels can be better concerned. In this embodiment, X ═ X1,…,xc,…,xC]As input, the size of the C feature maps is H × W, the result after global average pooling presents global information z among the C feature maps, and the global information z of the C-th feature mapcAs shown in the following equation:
Figure BDA0002437961230000081
wherein xc(i, j) is the C-th feature map xcValue of position (i, j), HGAPRepresenting a global average pooling function, such global information being helpful in expressing the entire image;
Step S122: learn the feature weight of each feature channel from the global information using a gating unit composed of two fully-connected layers 222. The two fully-connected layers form a gating mechanism; the gating unit s is computed as

s = g(W_2 δ(W_1 z))

where g and δ denote the Sigmoid function and the ReLU function, respectively, and W_1 and W_2 are the fully-connected layer 222 operations. W_1 has output dimension C/r, where r is a scaling factor (r = 16 in this embodiment); reducing the number of channels in this way reduces the amount of computation. The result then passes through a ReLU layer 223, which leaves the dimension unchanged. W_2 has output dimension C, so the output is 1 × C, and s is finally obtained through the Sigmoid function. s is a weight that characterizes the C feature maps; because it is learned through the fully-connected layers 222 and the non-linear layers, end-to-end training is possible, and the two fully-connected layers 222 serve to fuse the feature map information of each channel.
Step S123: adjust the input information with the feature weight to obtain the output information. Finally, the obtained feature map weight s is used to rescale the input x_c, as shown in the following formula:

x̃_c = s_c · x_c

where s_c and x_c are the scaling factor and the feature map of the c-th channel. The feature map attention mechanism thus adaptively adjusts the channel features and enhances the representation capability of the network.
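Steps S121–S123 together form a squeeze-and-excitation-style channel attention. A minimal NumPy sketch of the three steps, with random placeholder weights and the scaling factor r = 16 from this embodiment:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def channel_attention(x, w1, w2):
    """Channel attention on x of shape (C, H, W): GAP -> FC -> ReLU -> FC -> Sigmoid -> rescale."""
    z = x.mean(axis=(1, 2))                    # S121: global average pooling -> (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # S122: gating unit of two FC layers
    return x * s[:, None, None]                # S123: rescale each channel by its weight

rng = np.random.default_rng(1)
C, r = 64, 16                                  # r = 16 reduces channels to C/r = 4
x = rng.standard_normal((C, 8, 8))
w1 = rng.standard_normal((C // r, C)) * 0.1    # W_1: C -> C/r
w2 = rng.standard_normal((C, C // r)) * 0.1    # W_2: C/r -> C
y = channel_attention(x, w1, w2)
print(y.shape)                                 # (64, 8, 8)
```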
Referring to fig. 3 and 5, another embodiment, the information extraction module 200 includes a local shallow network 400 for extracting second feature information and a local deep network 500 for extracting third feature information; in the information extraction module 200, the second feature information and the third feature information are processed as follows:
R_k = C(S(P_1^k, 1/s), F_{k-1})

where F_{k-1} is the input of the current information extraction module 200, C is the concatenation operation 510, S is the slice operation 520, P_1^k is the output of the local shallow network in the k-th information extraction module 200, and R_k is the second feature information.

P_k = P_2^k + R_k = C_b(S(P_1^k, 1 - 1/s)) + C(S(P_1^k, 1/s), F_{k-1})

where P_2^k and C_b are, respectively, the output of the local deep network and the stacked convolution operation of the local deep network in the k-th information extraction module 200, and P_k is the third feature information.
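The two formulas can be sketched as follows, assuming (this is our reading, not spelled out in the text) that the slice operation S(P, 1/s) keeps the first 1/s fraction of the channels and S(P, 1 − 1/s) keeps the rest; the slice ratio s = 4 is a hypothetical value:

```python
import numpy as np

def slice_front(p, frac):
    """S(P, frac): assumed to keep the first frac of P's channels."""
    k = int(round(p.shape[0] * frac))
    return p[:k]

def slice_rest(p, frac):
    """S(P, 1 - frac): the remaining channels, fed to the local deep network C_b."""
    k = int(round(p.shape[0] * frac))
    return p[k:]

s = 4                                        # hypothetical slice ratio
rng = np.random.default_rng(2)
f_prev = rng.standard_normal((64, 8, 8))     # F_{k-1}: input of the k-th module
p1 = rng.standard_normal((64, 8, 8))         # P1^k: local shallow network output
# R_k = C(S(P1^k, 1/s), F_{k-1}): concatenate along the channel axis
r_k = np.concatenate([slice_front(p1, 1.0 / s), f_prev], axis=0)
deep_in = slice_rest(p1, 1.0 / s)            # S(P1^k, 1 - 1/s), input to C_b(.)
print(r_k.shape, deep_in.shape)              # (80, 8, 8) (48, 8, 8)
```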
In another embodiment, the local shallow network 400 includes a first convolution module, a first attention module, a second convolution module, a second attention module, a third convolution module, and a third attention module connected in sequence, wherein the characteristic dimension of the first convolution module is 48, the characteristic dimension of the second convolution module is 32, and the characteristic dimension of the third convolution module is 64.
In another embodiment, the local deep network 500 includes a fourth convolution module, a fourth attention module, a fifth convolution module, a fifth attention module, a sixth convolution module, and a sixth attention module, which are connected in sequence, wherein the feature dimension of the fourth convolution module is 64, the feature dimension of the fifth convolution module is 48, and the feature dimension of the sixth convolution module is 80.
In another embodiment, the information extraction module 200 is further provided with an information compression module 230, and the information compression module 230 is provided with a compression convolution module for dimension reduction, and the characteristic dimension of the compression convolution module is 64. The information compression module 230 can compress redundant information of the features in the third feature information, and increase feature map information of the third feature information, so that the detail of the outline is clearer.
In another embodiment, the reconstruction of the super-resolution image adopts a sub-pixel convolution layer ESPCN reconstruction method, and the depth image characteristic information is processed according to the following operations:
I_SR = H_REC(H_n(F_{n-1})) + U(I_LR)

where H_REC and U denote the reconstruction operation and the bicubic interpolation operation, respectively, I_SR is the final output result, I_LR is the original input, and H_n is the n-th information extraction function.
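A NumPy sketch of the sub-pixel (ESPCN-style) rearrangement, which turns an r²-fold channel expansion into an r-fold spatial upscaling, plus the global skip connection of the formula above; for brevity the bicubic interpolation U is replaced here by nearest-neighbor upsampling, so that stub is an assumption, not the patent's operation:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel rearrangement: (C * r^2, H, W) -> (C, H * r, W * r)."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)      # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

def upsample_nn(img, r):
    """Stand-in for U(I_LR): nearest-neighbor instead of true bicubic interpolation."""
    return img.repeat(r, axis=1).repeat(r, axis=2)

r = 2
rng = np.random.default_rng(3)
feat = rng.standard_normal((3 * r * r, 8, 8))          # H_n(F_{n-1}) mapped to 3 * r^2 channels
i_lr = rng.standard_normal((3, 8, 8))
i_sr = pixel_shuffle(feat, r) + upsample_nn(i_lr, r)   # I_SR = H_REC(.) + U(I_LR)
print(i_sr.shape)                                      # (3, 16, 16)
```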
In another embodiment, reconstructing the super-resolution image further includes a loss function operation, where the loss function is implemented by using an average absolute error, and the final output result is processed according to the following operations:
L(θ) = (1/N) Σ_{i=1}^{N} ‖ I_SR^i − I_HR^i ‖_1

where N is the number of training samples and I_HR^i denotes the i-th ground-truth high-resolution image.
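The mean absolute error named above is a one-liner; a toy sketch with values chosen so the average is exact in floating point:

```python
import numpy as np

def l1_loss(sr, hr):
    """Mean absolute error between reconstructed and ground-truth images."""
    return float(np.mean(np.abs(sr - hr)))

sr = np.array([[0.25, 0.75], [0.5, 0.0]])   # toy reconstructed patch
hr = np.array([[0.0, 1.0], [0.5, 0.5]])     # toy ground-truth patch
print(l1_loss(sr, hr))                      # 0.25
```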
Referring to fig. 4, another embodiment, an attention-based image super-resolution reconstruction system includes: a feature extraction module 100, configured to extract preliminary image feature information in the low-resolution image; a plurality of information extraction modules 200, configured to extract the preliminary image feature information in a multi-dimensional manner to obtain depth image feature information; and a reconstruction module 300, configured to reconstruct the super-resolution image according to the depth image feature information.
The present embodiment extracts features from an original low-resolution image through a feature extraction block, and then adaptively adjusts feature channel information by an information extraction module 200 including a plurality of convolution modules 210 and an attention module 220, thereby enhancing the expressive power of the features. Through comparison of experimental data, the method can effectively recover more details such as contour texture and the like, and finally the reconstruction module reconstructs high-resolution images with different scales according to the information.
To better illustrate the effectiveness of the present invention, experiments were conducted on benchmark test data sets and the results were compared with several state-of-the-art super-resolution models. The comparative experiments are as follows:
The experiments use four widely used benchmark test data sets: Set5, Set14, Urban100 and Manga109. The data are augmented in three ways: rotating the picture by 90°, 180° and 270°; flipping the picture horizontally; and scaling the image by factors of 0.9, 0.8, 0.7 and 0.6. The data in the benchmark test data sets are downsampled and then reconstructed at different scales, and the reconstruction results are compared qualitatively and quantitatively with other state-of-the-art super-resolution models, including SRCNN, VDSR, DRCN, DRRN and IDN. Table 1 shows the quantitative comparison of the different image super-resolution models at the ×2, ×3 and ×4 scales using the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM); higher values of both metrics indicate a better reconstruction.
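The rotation and flip augmentations listed above can be sketched as follows (the 0.9/0.8/0.7/0.6 rescaling is omitted, since it needs an interpolation routine):

```python
import numpy as np

def augment(img):
    """Return the original plus the rotated and horizontally flipped variants."""
    variants = [img]
    for k in (1, 2, 3):                 # 90, 180 and 270 degree rotations
        variants.append(np.rot90(img, k))
    variants.append(img[:, ::-1])       # horizontal flip
    return variants

img = np.arange(12.0).reshape(3, 4)     # toy single-channel image
aug = augment(img)
print(len(aug))                         # 5
```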
Table 1
(Table 1, showing the PSNR/SSIM values, appears as an image in the original document and is not reproduced here.)
As can be seen from the data in Table 1, the PSNR and SSIM of the proposed method on the Set5, B100, Urban100 and Manga109 data sets are mostly superior to those of the other super-resolution models, which fully demonstrates the effectiveness of the attention mechanism in image super-resolution reconstruction.
Table 2 shows the quantitative running-time comparison of the image super-resolution models at the ×2, ×3 and ×4 scales, in seconds; the compared super-resolution models are again SRCNN, VDSR, DRCN, DRRN and IDN.
Table 2
(Table 2, showing the running times in seconds, appears as an image in the original document and is not reproduced here.)
As can be seen from the data in Table 2, the running time of the proposed method on the Set5, B100, Urban100 and Manga109 data sets is mostly lower than that of the other super-resolution models, which fully demonstrates the efficiency of attention-based image super-resolution reconstruction.
Fig. 7 and 8 show examples of reconstruction results on the Set5 and Urban100 data sets at the ×4 scale; the compared super-resolution models are again SRCNN, VDSR, DRCN, DRRN and IDN. Compared with these methods, the proposed method generates clearer edges and richer details and has a better visual effect.
The above are only preferred embodiments of the present invention, and the present invention is not limited to them; any technical solution that achieves the technical effects of the present invention by the same means shall fall within the protection scope of the present invention.

Claims (10)

1. An image super-resolution reconstruction method based on an attention mechanism is characterized by comprising the following steps:
extracting preliminary image characteristic information in the low-resolution image;
extracting the preliminary image feature information in a multi-dimensional way through a plurality of connected information extraction modules to obtain depth image feature information;
reconstructing a super-resolution image according to the depth image characteristic information;
the information extraction module comprises a plurality of groups of convolution modules and attention modules which are arranged at intervals, and the preliminary image feature information passes through the convolution modules to obtain first feature information containing a plurality of feature channels; and the first characteristic information executes an attention mechanism through the attention module, learns the characteristic weight of each characteristic channel, and adjusts the first characteristic information by using the characteristic weight to obtain the input of the next convolution module.
2. The method for image super-resolution reconstruction based on attention mechanism according to claim 1, wherein the attention mechanism comprises:
averaging input information by using a global average pooling method to obtain global information;
using a gating unit consisting of two fully-connected layers to learn the feature weight of each feature channel from the global information;
and adjusting the input information by using the characteristic weight to obtain output information.
3. The method for image super-resolution reconstruction based on attention mechanism according to claim 2, wherein the information extraction module comprises a local shallow network for extracting second feature information and a local deep network for extracting third feature information; in the information extraction module, the second feature information and the third feature information are obtained by the following operations:
R_k = C(S(P_1^k, 1/s), F_{k-1})
wherein F_{k-1} is the input of the current information extraction module, C and S denote the concatenation operation and the slice operation, respectively, P_1^k is the output of the local shallow network in the k-th information extraction module, and R_k is the second feature information;
P_k = P_2^k + R_k = C_b(S(P_1^k, 1-1/s)) + C(S(P_1^k, 1/s), F_{k-1})
wherein P_2^k and C_b are, respectively, the output of the local deep network and the stacked convolution operation of the local deep network in the k-th information extraction module, and P_k is the third feature information.
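Read literally, the two formulas split the shallow output P_1^k along the channel axis: a 1/s slice is concatenated with the block input F_{k-1} to form R_k, while the remaining 1-1/s slice passes through the stacked convolutions C_b before the residual sum. A minimal NumPy sketch of this channel bookkeeping (my reading, not the patent's code); `conv_stack` is a hypothetical stand-in for C_b:

```python
import numpy as np

def extraction_block(p1_k, f_prev, s, conv_stack):
    """Illustrative R_k / P_k computation from claim 3 (channel-first tensors)."""
    c = p1_k.shape[0]
    keep = c // s                                    # channels in the 1/s slice
    shallow, deep_in = p1_k[:keep], p1_k[keep:]      # S(P1^k, 1/s), S(P1^k, 1-1/s)
    r_k = np.concatenate([shallow, f_prev], axis=0)  # R_k = C(S(P1^k, 1/s), F_{k-1})
    p_k = conv_stack(deep_in) + r_k                  # P_k = Cb(...) + R_k
    return r_k, p_k
```

Note that `conv_stack` must emit the same number of channels as R_k for the residual addition to be well-defined.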
4. The method for image super-resolution reconstruction based on attention mechanism according to claim 3, wherein the local shallow network comprises a first convolution module, a first attention module, a second convolution module, a second attention module, a third convolution module and a third attention module connected in sequence, wherein the feature dimension of the first convolution module is 48, the feature dimension of the second convolution module is 32, and the feature dimension of the third convolution module is 64.
5. The method for image super-resolution reconstruction based on attention mechanism according to claim 4, wherein the local deep network comprises a fourth convolution module, a fourth attention module, a fifth convolution module, a fifth attention module, a sixth convolution module and a sixth attention module connected in sequence, wherein the feature dimension of the fourth convolution module is 64, the feature dimension of the fifth convolution module is 48, and the feature dimension of the sixth convolution module is 80.
6. The method for image super-resolution reconstruction based on attention mechanism, wherein the information extraction module is further provided with an information compression module, the information compression module comprises a compression convolution module for dimension reduction, and the feature dimension of the compression convolution module is 64.
7. The method for image super-resolution reconstruction based on attention mechanism according to claim 6, wherein the super-resolution image is reconstructed by a sub-pixel convolution layer reconstruction method, and the depth image feature information is processed by the following operation:
I_SR = H_REC(H_n(F_{n-1})) + U(I_LR)
wherein H_REC and U denote the reconstruction operation and the bicubic interpolation operation, respectively, I_SR is the final output result, I_LR is the original input, and H_n is the n-th information extraction function.
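Sub-pixel convolution rearranges r² feature channels into an r-times larger spatial grid (the PixelShuffle operation). Below is a NumPy sketch of just that rearrangement, under the usual channel-first convention; the bicubic skip term U(I_LR) from the formula is not included.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r*r, H, W) features into a (C, H*r, W*r) image."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)        # split channels into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)      # interleave: (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)
```

For r = 2, the four channel values at one spatial position become the 2x2 block of output pixels at that position, which is what lets the network upscale without interpolation in the feature path.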
8. The method for image super-resolution reconstruction based on attention mechanism according to claim 7, wherein reconstructing the super-resolution image further comprises a loss function operation, the loss function being the mean absolute error, and the final output result is processed by the following operation:
L(Θ) = (1/N) Σ_{i=1}^{N} || I_SR^i - I_HR^i ||_1
wherein N is the number of training samples and I_HR^i is the i-th ground-truth high-resolution image.
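Claim 8 names the loss only as the mean absolute error. A minimal NumPy sketch of a generic L1 training objective over a batch of (SR, HR) image pairs, for illustration only:

```python
import numpy as np

def mae_loss(sr_images, hr_images):
    """Mean absolute error averaged over a batch of (SR, HR) image pairs."""
    n = len(sr_images)
    return sum(np.abs(sr - hr).mean() for sr, hr in zip(sr_images, hr_images)) / n
```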
9. A system applying the image super-resolution reconstruction method based on attention mechanism according to any one of claims 1 to 8, comprising:
the characteristic extraction module is used for extracting the preliminary image characteristic information in the low-resolution image;
the information extraction modules are used for extracting the preliminary image feature information in a multi-dimensional mode to obtain depth image feature information;
the reconstruction module is used for reconstructing a super-resolution image according to the depth image characteristic information;
the information extraction module comprises a plurality of groups of alternately arranged convolution modules and attention modules, and the attention module executes an attention mechanism to adjust the output information passing through the convolution modules.
10. A storage medium, characterized in that it stores computer-executable instructions for causing a computer to perform the image super-resolution reconstruction method based on attention mechanism according to any one of claims 1 to 8.
CN202010257518.9A 2020-04-03 2020-04-03 Image super-resolution reconstruction method and system based on attention mechanism Pending CN111583107A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010257518.9A CN111583107A (en) 2020-04-03 2020-04-03 Image super-resolution reconstruction method and system based on attention mechanism

Publications (1)

Publication Number Publication Date
CN111583107A true CN111583107A (en) 2020-08-25

Family

ID=72122501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010257518.9A Pending CN111583107A (en) 2020-04-03 2020-04-03 Image super-resolution reconstruction method and system based on attention mechanism

Country Status (1)

Country Link
CN (1) CN111583107A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104123707A * 2014-08-07 2014-10-29 Chongqing University Local rank prior based single-image super-resolution reconstruction method
US20180211157A1 * 2017-01-25 2018-07-26 Boe Technology Group Co., Ltd. Image processing method and device
WO2018216629A1 * 2017-05-22 2018-11-29 Canon Inc. Information processing device, information processing method, and program
CN108921789A * 2018-06-20 2018-11-30 North China Electric Power University Super-resolution image reconstruction method based on recursive residual network
CN109859106A * 2019-01-28 2019-06-07 Guilin University of Electronic Technology Image super-resolution reconstruction method based on high-order fusion network with self-attention
CN109886871A * 2019-01-07 2019-06-14 Academy of Broadcasting Science, State Administration of Press, Publication, Radio, Film and Television Image super-resolution method based on channel attention mechanism and multi-layer feature fusion
CN109903228A * 2019-02-28 2019-06-18 Hefei University of Technology Image super-resolution reconstruction method based on convolutional neural network
WO2019192588A1 * 2018-04-04 2019-10-10 Huawei Technologies Co., Ltd. Image super resolution method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
WANG S: "Semi-coupled Dictionary Learning with Applications to Image Super-Resolution and Photo-sketch Synthesis", 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
ZHONG CHENG et al.: "A 3D point cloud object recognition method based on an attention mechanism", Computer Technology and Development *
CHEN YUANTAO: "Research on image segmentation application of a novel Bayesian network model in hybrid space", Computer Engineering and Science *
LU TIAN: "Image super-resolution reconstruction based on feature map attention mechanism", Computer Engineering *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112164011A (en) * 2020-10-12 2021-01-01 桂林电子科技大学 Motion image deblurring method based on self-adaptive residual error and recursive cross attention
CN112164011B (en) * 2020-10-12 2023-02-28 桂林电子科技大学 Motion image deblurring method based on self-adaptive residual error and recursive cross attention
CN112200724A (en) * 2020-10-22 2021-01-08 长沙理工大学 Single-image super-resolution reconstruction system and method based on feedback mechanism
CN113409191A (en) * 2021-06-02 2021-09-17 广东工业大学 Lightweight image super-resolution method and system based on attention feedback mechanism
CN113393382A (en) * 2021-08-16 2021-09-14 四川省人工智能研究院(宜宾) Binocular picture super-resolution reconstruction method based on multi-dimensional parallax prior
CN114004784A (en) * 2021-08-27 2022-02-01 西安市第三医院 Method for detecting bone condition based on CT image and electronic equipment
CN114004784B (en) * 2021-08-27 2022-06-03 西安市第三医院 Method for detecting bone condition based on CT image and electronic equipment
CN113822805A (en) * 2021-10-13 2021-12-21 柚皮(重庆)科技有限公司 Image super-resolution reconstruction method and Chinese medicinal plant leaf disease diagnosis method and equipment

Similar Documents

Publication Publication Date Title
CN111583107A (en) Image super-resolution reconstruction method and system based on attention mechanism
CN109118431B (en) Video super-resolution reconstruction method based on multiple memories and mixed losses
CN108830790B (en) Rapid video super-resolution reconstruction method based on simplified convolutional neural network
CN111242846B (en) Fine-grained scale image super-resolution method based on non-local enhancement network
CN116664400A (en) Video high space-time resolution signal processing method
CN112884650B (en) Image mixing super-resolution method based on self-adaptive texture distillation
CN112580473A (en) Motion feature fused video super-resolution reconstruction method
Hui et al. Two-stage convolutional network for image super-resolution
CN113837946A (en) Lightweight image super-resolution reconstruction method based on progressive distillation network
CN115829876A (en) Real degraded image blind restoration method based on cross attention mechanism
CN115526779A (en) Infrared image super-resolution reconstruction method based on dynamic attention mechanism
Qin et al. Difficulty-aware image super resolution via deep adaptive dual-network
CN115953294A (en) Single-image super-resolution reconstruction method based on shallow channel separation and aggregation
CN112188217B (en) JPEG compressed image decompression effect removing method combining DCT domain and pixel domain learning
CN112150356A (en) Single compressed image super-resolution reconstruction method based on cascade framework
Liu et al. From coarse to fine: Hierarchical pixel integration for lightweight image super-resolution
CN113362239A (en) Deep learning image restoration method based on feature interaction
CN110956669A (en) Image compression coding method and system
CN113077403B (en) Color image reconstruction method based on local data block tensor enhancement technology
CN114820354A (en) Traditional image compression and enhancement method based on reversible tone mapping network
CN113674151A (en) Image super-resolution reconstruction method based on deep neural network
CN108629737B (en) Method for improving JPEG format image space resolution
Niu et al. Lightweight and accurate single image super-resolution with channel segregation network
Jiang et al. Image interpolation model based on packet losing network
CN112261415B (en) Image compression coding method based on overfitting convolution self-coding network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200825