CN115082317B - Image super-resolution reconstruction method for attention mechanism enhancement - Google Patents


Info

Publication number: CN115082317B
Application number: CN202210812384.1A
Granted version of: CN115082317A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: module, characteristic, characteristic diagram, image, extraction module
Legal status: Active
Inventors: 陈超, 黄金煜, 赵彬, 黄豪
Original and current assignee: Sichuan University of Science and Engineering
Application filed by: Sichuan University of Science and Engineering

Classifications

    • G06T 3/4053 — Super resolution, i.e. output image resolution higher than sensor resolution (under G06T 3/40, scaling the whole image or part thereof)
    • G06N 3/08 — Learning methods (under G06N 3/02, neural networks)
    • G06T 7/337 — Image registration using feature-based methods involving reference images or patches
    • G06T 2207/20081 — Training; Learning
    • G06T 2207/20084 — Artificial neural networks [ANN]

Abstract

The invention discloses an attention-mechanism-enhanced image super-resolution reconstruction method, which comprises the steps of constructing a convolutional neural network, training the neural network, extracting features of an original image with a head feature extraction module, extracting features sequentially with each filtering feature extraction module, inputting the post-stage feature map into an up-sampling module, and reconstructing and outputting, by the up-sampling module, a reconstructed image whose resolution is higher than that of the original image. The two spatial calibration modules cooperate and complement each other, so that the filtering feature extraction module can extract different types of feature information relatively accurately. The filtering feature extraction modules are connected end to end along the depth direction of the network, so that repeated image information is filtered out at different levels, more effective image information is retained in the post-stage feature map, and high-frequency detail features can be reconstructed in the high-resolution image.

Description

Image super-resolution reconstruction method for attention mechanism enhancement
Technical Field
The invention belongs to the technical field of deep learning and image reconstruction, and particularly relates to an attention mechanism enhanced image super-resolution reconstruction method.
Background
Single-Image Super-Resolution (SISR) is an important direction in the field of image reconstruction. A computer runs a corresponding algorithm to reconstruct the high-resolution image corresponding to an existing low-resolution image, thereby restoring partially missing details and improving the image resolution. With this technology, the bandwidth required for image transmission can be reduced, and the accuracy of remote-sensing observation and of locating lesion tissue can be improved.
The attention mechanism is one of the important factors that allow human beings to perceive the external environment efficiently, and it was later applied in various fields of artificial intelligence to selectively focus on required high-value information and suppress unnecessary features. Although some current models use the attention mechanism for single-image super-resolution reconstruction, they basically apply existing attention mechanisms directly, without adapting them to the characteristics of image super-resolution reconstruction, so model performance remains low.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an attention-mechanism-enhanced image super-resolution reconstruction method, which adapts the network structure and the attention mechanism to the characteristics of image super-resolution reconstruction so as to improve the performance of the model.
In order to achieve the above purpose, the solution adopted by the invention is as follows: an attention mechanism enhanced image super-resolution reconstruction method comprises the following steps:
s100, constructing a convolutional neural network for super-resolution reconstruction of the image on a computer; the convolutional neural network comprises a header feature extraction module, a filterability feature extraction module and an up-sampling module, wherein the header feature extraction module is arranged at the front end of the convolutional neural network, the plurality of filterability feature extraction modules connected in sequence are arranged in the middle of the convolutional neural network, and the up-sampling module is arranged at the tail of the convolutional neural network;
the processing process of the feature map inside the filtering feature extraction module is represented by the following mathematical model:
E1 = ρ1(f1k3(P_n))
E2 = ρ2(f1k1(P_n))
E3 = f2k1([E1, E2]) ⊗ fcA1(f2k1([E1, E2]))
E4 = ρ3(f3k51(E1))
E5 = ρ4(f3k52(E2))
E6 = f4k1([E3, E4, E5])
E7 = ρ6(f6k3(ρ5(f5k3(E6))))
P_{n+1} = E7 ⊗ fcA(E6)
wherein P_n represents the feature map input into the filtering feature extraction module; f1k1, f2k1 and f4k1 each represent a convolution operation processing layer with a convolution kernel size of 1*1; f1k3, f5k3 and f6k3 each represent a convolution operation processing layer with a convolution kernel size of 3*3; f3k51 and f3k52 each represent a convolution operation processing layer with a convolution kernel size of 5*5; ρ1, ρ2, ρ3, ρ4, ρ5 and ρ6 each represent the activation function ReLU; [·] represents the splicing operation of the feature maps therein in the channel direction; fcA1 represents a preset spatial calibration module; fcA represents a post-modulation spatial calibration module; ⊗ represents the element-wise product; and P_{n+1} represents the feature map output by the filtering feature extraction module;
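To make the data flow of the mathematical model above concrete, the following is a minimal NumPy sketch of one filtering feature extraction module. It is an illustration under stated assumptions, not the patented implementation: all weights are random stand-ins, the channel count is reduced to 4, and the exact forms of E3 and P_{n+1} (published only as equation images in the source) are assumed to be Em ⊗ fcA1(Em) with Em = f2k1([E1, E2]), and E7 ⊗ fcA(E6), respectively.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, w):
    """Stride-1, same-padding convolution. x: (Cin, H, W); w: (Cout, Cin, k, k)."""
    cout, cin, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, H, W = x.shape
    out = np.zeros((cout, H, W))
    for i in range(k):
        for j in range(k):
            out += np.einsum('oc,chw->ohw', w[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out

relu = lambda t: np.maximum(t, 0.0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

def spatial_calib(x, w1x1, use_var=True):
    """Channel-direction mean pooling plus variance (preset module) or maximum
    (post-modulation module) pooling, spliced and fused by a 1*1 convolution,
    then sigmoid-activated into an (H, W) calibration map."""
    mean = x.mean(axis=0, keepdims=True)
    other = x.var(axis=0, keepdims=True) if use_var else x.max(axis=0, keepdims=True)
    return sigmoid(conv2d(np.concatenate([mean, other]), w1x1))

def filtering_block(p_n, c=4):
    k = lambda cout, cin, ks: rng.normal(0.0, 0.1, (cout, cin, ks, ks))
    e1 = relu(conv2d(p_n, k(c, c, 3)))                         # f1k3
    e2 = relu(conv2d(p_n, k(c, c, 1)))                         # f1k1
    em = conv2d(np.concatenate([e1, e2]), k(c, 2 * c, 1))      # f2k1 splice + reduce
    e3 = em * spatial_calib(em, k(1, 2, 1), use_var=True)      # assumed E3 = Em (x) fcA1(Em)
    e4 = relu(conv2d(e1, k(c, c, 5)))                          # f3k51
    e5 = relu(conv2d(e2, k(c, c, 5)))                          # f3k52
    e6 = conv2d(np.concatenate([e3, e4, e5]), k(c, 3 * c, 1))  # f4k1 splice + reduce
    e7 = relu(conv2d(relu(conv2d(e6, k(c, c, 3))), k(c, c, 3)))  # f5k3 then f6k3
    return e7 * spatial_calib(e6, k(1, 2, 1), use_var=False)   # assumed P_{n+1} = E7 (x) fcA(E6)
```

The output keeps the input's channel count and spatial size, so the blocks can be chained end to end along the depth of the network as the method requires.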
s200, acquiring a training image data set, and training the convolutional neural network by using the training image data set;
s300, acquiring an original image to be reconstructed, inputting the original image into the trained convolutional neural network, and outputting the original image after the characteristic extraction of the original image by the head characteristic extraction module to obtain a preceding stage characteristic diagram;
s400, each filtering characteristic extraction module sequentially receives a characteristic graph output by an upstream end of the filtering characteristic extraction module as input, and outputs the processed characteristic graph to a downstream end of the filtering characteristic extraction module until a last filtering characteristic extraction module outputs a post-stage characteristic graph;
s500, inputting the post-stage feature map into the up-sampling module, and then reconstructing and outputting a reconstructed image with the resolution being higher than that of the original image by the up-sampling module.
Further, the processing procedure of the feature map inside the preset spatial calibration module is represented by the following mathematical model:
KX=[ASJ(Em),ASV(Em)]
KT1=ξ1(Sf1(KX))
wherein Em represents the feature map input into the preset spatial calibration module, i.e. the feature map obtained after the f2k1([E1, E2]) operation; ASJ and ASV respectively represent global average pooling and variance pooling of the feature map in the channel direction; [·] represents the splicing operation of the feature maps therein in the channel direction; ξ1 represents the activation function sigmoid; Sf1 represents a convolution operation processing layer with a convolution kernel size of 1*1; and KT1 represents the preset spatial calibration map output by the preset spatial calibration module.
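Since Sf1 maps the two pooled channels to one, the preset spatial calibration reduces to a per-pixel weighted sum followed by a sigmoid. A minimal NumPy sketch, with hypothetical 1*1-convolution parameters `w` and `b` standing in for the trained weights:

```python
import numpy as np

def preset_spatial_calibration(em, w=(1.0, 1.0), b=0.0):
    """KX = [ASJ(Em), ASV(Em)]; KT1 = xi1(Sf1(KX)).
    em: feature map of shape (C, H, W); w, b: assumed 1*1-conv parameters."""
    asj = em.mean(axis=0)              # ASJ: global average pooling over channels
    asv = em.var(axis=0)               # ASV: variance pooling over channels
    sf1 = w[0] * asj + w[1] * asv + b  # a 1*1 conv on 2 channels is a weighted sum
    return 1.0 / (1.0 + np.exp(-sf1))  # KT1: (H, W) map with values in (0, 1)

kt1 = preset_spatial_calibration(np.ones((3, 4, 4)))
```

Per-pixel variance is high where the channels disagree, which is why this map is biased toward local high-frequency structure.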
Further, the processing procedure of the feature map inside the post-modulation spatial calibration module is represented by the following mathematical model:
KY=[ASJ(E6),AAT(E6)]
KT2=ξ2(Sf2(KY))
wherein E6 represents the feature map input into the post-modulation spatial calibration module; ASJ and AAT respectively represent global average pooling and maximum pooling of the feature map in the channel direction; [·] represents the splicing operation of the feature maps therein in the channel direction; ξ2 represents the activation function sigmoid; Sf2 represents a convolution operation processing layer with a convolution kernel size of 1*1; and KT2 represents the post-modulation spatial calibration map output by the post-modulation spatial calibration module.
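The post-modulation module has the same shape as the preset one, only swapping variance pooling for maximum pooling. A matching NumPy sketch, again with hypothetical 1*1-convolution parameters:

```python
import numpy as np

def post_modulation_spatial_calibration(e6, w=(1.0, 1.0), b=0.0):
    """KY = [ASJ(E6), AAT(E6)]; KT2 = xi2(Sf2(KY)).
    e6: feature map of shape (C, H, W); w, b: assumed 1*1-conv parameters."""
    asj = e6.mean(axis=0)              # ASJ: global average pooling over channels
    aat = e6.max(axis=0)               # AAT: maximum pooling over channels
    sf2 = w[0] * asj + w[1] * aat + b
    return 1.0 / (1.0 + np.exp(-sf2))  # KT2: (H, W) post-modulation calibration map

kt2 = post_modulation_spatial_calibration(np.zeros((3, 2, 2)))
```

Per-pixel channel maxima respond to strong activations anywhere in the channel stack, which matches the stated bias toward global foreground information.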
Furthermore, an internal modulation connection is arranged in the filtering feature extraction module; the preset spatial calibration map is transmitted through the internal modulation connection to the output end of the post-modulation spatial calibration module, the post-modulation spatial calibration map is added to the preset spatial calibration map to obtain a comprehensive spatial calibration map, and the comprehensive spatial calibration map is then used to calibrate the feature map E7.
Furthermore, a full-hierarchy channel calibration module is arranged in the convolutional neural network; the E3 feature map and the E6 feature map in each filtering feature extraction module are added to obtain a blended feature map, the blended feature map of each filtering feature extraction module is input into the full-hierarchy channel calibration module, and the channel calibration map output by the full-hierarchy channel calibration module is used for calibrating each channel of the post-stage feature map.
Further, the processing procedure of the feature map inside the full-hierarchy channel calibration module is represented by the following mathematical model:
QM1 = λ1(CAP(H1)) + λ2(CAP(H2)) + … + λn(CAP(Hn))
QM2 = ξc(CEP(Cf1a([H1, H2, …, Hn])))
KD = QM1 ⊗ QM2
wherein H1, H2, …, Hn respectively represent the blended feature maps output by the filtering feature extraction modules, which are input into the full-hierarchy channel calibration module; CEP and CAP respectively represent global average pooling and maximum pooling of the feature map in the spatial direction; λ1, λ2, …, λn each represent a nonlinear mapping unit; [·] represents the splicing operation of the feature maps therein in the channel direction; Cf1a represents a convolution operation processing layer with a convolution kernel size of 1*1; ξc represents the activation function sigmoid; ⊗ represents the element-wise product; and KD represents the channel calibration map output by the full-hierarchy channel calibration module.
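The two pooling paths and their element-wise product can be sketched in NumPy as follows. All weights are random stand-ins and the channel count is reduced for illustration; only the shapes and the QM1/QM2/KD data flow follow the model above.

```python
import numpy as np

rng = np.random.default_rng(1)

def nonlinear_mapping(v, w_down, w_up):
    """FC -> ReLU -> FC -> sigmoid, the unit shape described for FIG. 8
    (random stand-in weights)."""
    h = np.maximum(w_down @ v, 0.0)
    return 1.0 / (1.0 + np.exp(-(w_up @ h)))

def full_hierarchy_channel_calibration(hs, c=64, r=16):
    """hs: list of n blended feature maps, each of shape (c, H, W)."""
    n = len(hs)
    # Path 1: CAP (spatial maximum pooling) per level, one nonlinear mapping
    # unit per level, element-wise summation -> QM1
    qm1 = np.zeros(c)
    for h in hs:
        cap = h.max(axis=(1, 2))
        qm1 += nonlinear_mapping(cap,
                                 rng.normal(0.0, 0.1, (r, c)),
                                 rng.normal(0.0, 0.1, (c, r)))
    # Path 2: splice all levels, Cf1a 1*1-conv dimensionality reduction,
    # CEP (spatial global average pooling), sigmoid -> QM2
    spliced = np.concatenate(hs, axis=0)
    cf1a = rng.normal(0.0, 0.1, (c, n * c))   # 1*1 conv acts as channel mixing
    fused = np.einsum('oc,chw->ohw', cf1a, spliced)
    qm2 = 1.0 / (1.0 + np.exp(-fused.mean(axis=(1, 2))))
    return qm1 * qm2                          # KD: per-channel calibration vector
```

KD is a per-channel vector, so calibrating the post-stage feature map means scaling each of its channels by the corresponding element of KD.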
Furthermore, a front-back long connection is arranged in the convolutional neural network; the preceding-stage feature map is transmitted directly through the front-back long connection to the front end of the up-sampling module, the post-stage feature map is first added to the preceding-stage feature map to obtain a comprehensive feature map, and the up-sampling module then takes the comprehensive feature map as input.
The invention has the beneficial effects that:
(1) In the filtering feature extraction module, the convolution kernels of the two convolution operations at the front end have sizes 1*1 and 3*3 respectively, so feature information of small fields of view is obtained; the feature maps E1 and E2 are then spliced and dimension-reduced, in which process closely spaced repeated image information can be filtered out. On the basis of E1 and E2, 5*5 convolution operations are then applied; superposed on the preceding 1*1 and 3*3 convolutions, these are equivalent to kernel sizes of 5*5 and 7*7, so feature information of relatively large fields of view is captured in the feature maps E4 and E5. The feature maps E3, E4 and E5 are then spliced and dimension-reduced, in which process widely spaced repeated image information can be filtered out while effective image information of different scales is fused. The filtering feature extraction modules are connected end to end along the depth direction of the network, so that repeated image information is filtered out at different levels, more effective image information is retained in the post-stage feature map, and high-frequency detail features can be reconstructed in the high-resolution image;
(2) A deep convolutional neural network extracts feature information of different levels from the image by convolution operations superposed in the depth direction. Because super-resolution reconstruction is pixel-level reconstruction of the image, the extracted features should contain a large amount of near-pixel-level information rather than too much abstract information spanning a large range. The method therefore adopts a cross-scale spatial calibration mechanism: the post-modulation spatial calibration map generated from the feature map E6 strengthens the high-frequency information in the feature map E7, which avoids over-suppressing local feature information in E7 during calibration and improves the super-resolution reconstruction effect;
(3) The preset spatial calibration module internally adopts average pooling and variance pooling and is biased toward strengthening high-frequency information in local regions of the image, while the post-modulation spatial calibration module internally adopts average pooling and maximum pooling and is biased toward strengthening global foreground information. The two spatial calibration modules cooperate and complement each other, so that the filtering feature extraction module can extract different types of feature information relatively accurately;
(4) An internal modulation connection is arranged in the filtering feature extraction module: the post-modulation spatial calibration map is added to the preset spatial calibration map to obtain a comprehensive spatial calibration map, which is used to calibrate the feature map E7. The internal modulation connection thus integrates the attention of the two spatial calibration modules; compared with the post-modulation spatial calibration map alone, the comprehensive spatial calibration map covers receptive fields of more scales and has a more comprehensive and accurate calibration effect;
(5) In the full-hierarchy channel calibration module, along one path, maximum pooling and a nonlinear mapping operation are applied to the blended feature map output by each filtering feature extraction module, and the resulting vectors are added to obtain the vector QM1, so that important channel-direction information in the feature maps of different levels can be finely mined. Along the other path, the blended feature maps output by the filtering feature extraction modules are spliced and fused with dimensionality reduction, then globally average-pooled in the spatial direction and activated to obtain the vector QM2; calibrating with QM2 is effective at suppressing the interference of low-frequency information in the image. Finally, the element-wise product of QM1 and QM2 gives the channel calibration map. When this map calibrates the post-stage feature map, necessary background features and important foreground features are both taken into account, so the feature information input into the up-sampling module is more comprehensive and effective, and the quality of the reconstructed image output by the network is further improved.
Drawings
FIG. 1 is a schematic diagram of a convolutional neural network structure in embodiment 1;
FIG. 2 is a schematic diagram of the internal structure of a filtering feature extraction module in embodiment 1;
FIG. 3 is a schematic diagram of the convolutional neural network structure in embodiment 2;
FIG. 4 is a schematic diagram of the internal structure of the filtering feature extraction module in embodiment 3;
FIG. 5 is a schematic diagram of the internal structure of the preset spatial calibration module according to the present invention;
FIG. 6 is a schematic diagram of an internal structure of the post-modulation spatial calibration module according to the present invention;
FIG. 7 is a schematic diagram of the internal structure of the full-hierarchy channel calibration module according to the present invention;
FIG. 8 is a schematic diagram of an internal structure of a non-linear mapping unit according to the present invention;
FIG. 9 is a schematic diagram of the internal structure of the upsampling module of the present invention;
in the drawings:
1-original image, 2-head feature extraction module, 3-filtering feature extraction module, 31-internal modulation connection, 32-nonlinear mapping unit, 4-up-sampling module, 5-preset spatial calibration module, 6-post-modulation spatial calibration module, 7-full-hierarchy channel calibration module, 8-front-back long connection, 9-reconstructed image.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
Example 1:
the invention provides an attention mechanism enhanced image super-resolution reconstruction method, the overall structure of a network is shown in figure 1, a head feature extraction module 2 is realized by adopting a conventional convolution operation processing layer, the convolution kernel size is 3*3, the head feature extraction module 2 takes an original image 1 as input, and the head feature extraction module 2 performs convolution operation and then outputs to obtain a preceding stage feature image. The number of the filtering characteristic extraction modules 3 is 4, the internal operation flow of the filtering characteristic extraction modules is shown in fig. 2, the first filtering characteristic extraction module 3 takes the front-stage characteristic diagram as input, and the later filtering characteristic extraction modules 3 take the characteristic diagram output by the filtering characteristic extraction module 3 at the upper part of the module as input in sequence. The filtering characteristic extraction module 3 is provided with an internal tone connection 31, the preset space calibration graph is transmitted to the output end of the post-tone space calibration module 6 through the internal tone connection 31, and the post-tone space calibration graph is added with the preset space calibration graph to obtain a comprehensive space calibration graph. The internal operation flow of the preset space calibration module 5 is shown in fig. 5, and the internal operation flow of the post-regulation space calibration module 6 is shown in fig. 6. The up-sampling module 4 is implemented by using the existing algorithm, and the internal operation flow thereof is as shown in fig. 9, and includes a 3*3 convolution operation processing layer, a sub-pixel convolution layer and another 3*3 convolution operation processing layer which are sequentially arranged.
In practical implementation, according to the model structure designed above, the convolutional neural network is constructed on a computer using Python programming. The public DIV2K data set is obtained as the training image data set, the public data sets BSDS100 and Manga109 serve as test image data sets, and the corresponding low-resolution images are obtained by bicubic down-sampling of the original high-definition images. Some operational details of training the convolutional neural network are shown in the following table:
[Table of training details — published as an image in the original document; not reproduced]
In this embodiment, the preceding-stage feature map, the feature maps P_n and P_{n+1}, and the feature maps E1, E2, E3, E4, E5, E6 and E7 all have 64 channels, and their length and width are equal to those of the original image 1. The role of f2k1 and f4k1 is to reduce the number of channels of the feature map. Inside the preset spatial calibration module 5, global average pooling and variance pooling yield two two-dimensional matrices with a channel number of 1; after splicing, Sf1 convolution dimensionality reduction and ξ1 activation, the preset spatial calibration map is obtained. The operation flow inside the post-modulation spatial calibration module 6 is similar to that of the preset spatial calibration module 5.
The internal operation flow of the full-hierarchy channel calibration module 7 is shown in FIG. 7. The module contains nonlinear mapping units 32, whose operation flow is shown in FIG. 8: each nonlinear mapping unit 32 comprises an upper fully-connected layer, a ReLU activation layer, a lower fully-connected layer and a sigmoid activation function connected in sequence, where the upper fully-connected layer has 64 input nodes and 16 output nodes, the lower fully-connected layer has 16 input nodes and 64 output nodes, and the resulting vector QM1 contains 64 elements. For the Cf1a convolution, the number of output feature map channels is 64; after global average pooling in the spatial direction and activation by the ξc function, a vector QM2 of length 64 is generated. The element-wise product of QM1 and QM2 gives the channel calibration map, which is likewise a vector of length 64. To avoid the vanishing-gradient problem, the convolutional neural network of this embodiment is further provided with a front-back long connection 8: the feature map obtained after the full-hierarchy channel calibration module 7 calibrates the post-stage feature map is added to the preceding-stage feature map and then input into the up-sampling module 4.
In order to illustrate the effect of the invention in super-resolution reconstruction, this embodiment also uses some existing advanced algorithms for comparison experiments. The control-group models are CSAR and MDCN; their training process and data sets are identical to those of the invention, and the test results on the test sets are as follows:
[Test results table — published as an image in the original document; not reproduced]
Analysis of the test results shows that, at up-sampling factors of 2 and 4, the quality of the reconstructed image 9 produced by the method of embodiment 1 is clearly superior to that of the CSAR and MDCN models.
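Reconstruction quality in comparisons of this kind is conventionally reported as PSNR (often alongside SSIM); since the patent's result tables are published only as images, no numbers are reproduced here, but the PSNR metric itself can be sketched as:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio between a reference and a reconstructed image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Higher PSNR means the reconstruction is closer to the ground-truth high-resolution image; the `peak` parameter is the maximum possible pixel value (255 for 8-bit images).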
Example 2:
Example 2 is another comparative experiment based on example 1: on the basis of the convolutional neural network of example 1, the full-hierarchy channel calibration module 7 is removed, while the filtering feature extraction module 3 and the up-sampling module 4 remain the same as in example 1; the structure of the convolutional neural network in example 2 is shown in FIG. 3. The network of example 2 was trained with the same training procedure as example 1 and then tested, with the following results:
[Test results table — published as an image in the original document; not reproduced]
Analysis of the test results shows that, at up-sampling factors of 2 and 4, model performance decreases to a certain extent after the full-hierarchy channel calibration module 7 is removed, which conversely proves that the full-hierarchy channel calibration module 7 effectively improves image reconstruction quality.
Example 3:
Example 3 is another comparative experiment based on example 1: on the basis of the convolutional neural network of example 1, the internal modulation connection 31 is removed, while the rest of the network remains consistent with example 1; the structure of the filtering feature extraction module 3 in example 3 is shown in FIG. 4. The network of example 3 was trained with the same training procedure as example 1 and then tested, with the following results:
[Test results table — published as an image in the original document; not reproduced]
Analysis of the test results shows that, at up-sampling factors of 2 and 4, model performance also decreases after the internal modulation connection 31 is removed, which proves that providing the internal modulation connection 31 improves image reconstruction quality.
The above embodiments only express specific implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, and all such changes and modifications fall within the scope of the invention.

Claims (7)

1. An attention mechanism enhanced image super-resolution reconstruction method is characterized by comprising the following steps: the method comprises the following steps:
s100, constructing a convolutional neural network for super-resolution reconstruction of the image on a computer; the convolutional neural network comprises a header feature extraction module, a filterability feature extraction module and an up-sampling module, wherein the header feature extraction module is arranged at the front end of the convolutional neural network, the plurality of filterability feature extraction modules connected in sequence are arranged in the middle of the convolutional neural network, and the up-sampling module is arranged at the tail of the convolutional neural network;
the processing process of the feature map inside the filtering feature extraction module is represented by the following mathematical model:
E1 = ρ1(f1k3(P_n))
E2 = ρ2(f1k1(P_n))
E3 = f2k1([E1, E2]) ⊗ fcA1(f2k1([E1, E2]))
E4 = ρ3(f3k51(E1))
E5 = ρ4(f3k52(E2))
E6 = f4k1([E3, E4, E5])
E7 = ρ6(f6k3(ρ5(f5k3(E6))))
P_{n+1} = E7 ⊗ fcA(E6)
wherein P_n represents the feature map input into the filtering feature extraction module; f1k1, f2k1 and f4k1 each represent a convolution operation processing layer with a convolution kernel size of 1*1; f1k3, f5k3 and f6k3 each represent a convolution operation processing layer with a convolution kernel size of 3*3; f3k51 and f3k52 each represent a convolution operation processing layer with a convolution kernel size of 5*5; ρ1, ρ2, ρ3, ρ4, ρ5 and ρ6 each represent the activation function ReLU; [·] represents the splicing operation of the feature maps therein in the channel direction; fcA1 represents a preset spatial calibration module; fcA represents a post-modulation spatial calibration module; ⊗ represents the element-wise product; and P_{n+1} represents the feature map output by the filtering feature extraction module;
s200, acquiring a training image data set, and training the convolutional neural network by using the training image data set;
s300, acquiring an original image to be reconstructed, inputting the original image into the trained convolutional neural network, and outputting the original image after the characteristic extraction of the original image by the head characteristic extraction module to obtain a preceding stage characteristic diagram;
s400, each filtering characteristic extraction module sequentially receives a characteristic graph output by an upstream end of the filtering characteristic extraction module as input, and outputs the processed characteristic graph to a downstream end of the filtering characteristic extraction module until a last filtering characteristic extraction module outputs a post-stage characteristic graph;
s500, inputting the post-stage feature map into the up-sampling module, and then reconstructing and outputting a reconstructed image with the resolution being greater than that of the original image by the up-sampling module.
2. The method for reconstructing image super resolution by enhancing attention mechanism as claimed in claim 1, wherein: the processing process of the characteristic diagram inside the preset space calibration module is represented as the following mathematical model:
KX=[ASJ(Em),ASV(Em)]
KT1=ξ1(Sf1(KX))
wherein Em represents the feature map input into the preset spatial calibration module; ASJ and ASV respectively represent global average pooling and variance pooling of the feature map in the channel direction; [·] represents the splicing operation of the feature maps therein in the channel direction; ξ1 represents the activation function sigmoid; Sf1 represents a convolution operation processing layer with a convolution kernel size of 1*1; and KT1 represents the preset spatial calibration map output by the preset spatial calibration module.
3. The attention-mechanism-enhanced image super-resolution reconstruction method as claimed in claim 2, wherein the processing of the feature map inside the post-modulation spatial calibration module is expressed by the following mathematical model:
KY=[ASJ(E6),AAT(E6)]
KT2=ξ2(Sf2(KY))
wherein E6 represents the feature map input into the post-modulation spatial calibration module, ASJ and AAT respectively represent global average pooling and maximum pooling of the feature map in the channel direction, [·] represents the splicing operation on feature maps in the channel direction, ξ2 represents the sigmoid activation function, Sf2 represents a convolution operation processing layer with a convolution kernel size of 1×1, and KT2 represents the post-modulation spatial calibration map output by the post-modulation spatial calibration module.
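Claims 2 and 3 describe the same pool–splice–convolve–sigmoid recipe, differing only in the second channel-direction pooling (variance for KT1, maximum for KT2). A minimal NumPy sketch under that reading, with an illustrative 1×1-convolution weight vector in place of the patent's learned parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_calibration(E, conv_w, second_pool):
    # E: feature map of shape (C, H, W); pooling runs along the channel axis
    pools = {"var": lambda x: x.var(axis=0, keepdims=True),   # ASV (claim 2)
             "max": lambda x: x.max(axis=0, keepdims=True)}   # AAT (claim 3)
    KX = np.concatenate([E.mean(axis=0, keepdims=True),       # ASJ
                         pools[second_pool](E)], axis=0)      # [.] -> (2, H, W)
    # a 1x1 convolution is a per-pixel linear map over the 2 pooled channels
    return sigmoid(np.tensordot(conv_w, KX, axes=1))          # (H, W), in (0, 1)

E = np.arange(24.0).reshape(2, 3, 4)
KT1 = spatial_calibration(E, conv_w=np.array([0.5, 0.5]), second_pool="var")
KT2 = spatial_calibration(E, conv_w=np.array([0.5, 0.5]), second_pool="max")
```

Each output is a single-channel map with values in (0, 1), suitable for element-wise multiplication against a feature map of the same spatial size.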
4. The attention-mechanism-enhanced image super-resolution reconstruction method as claimed in claim 3, wherein an internal skip connection is provided inside the filtering feature extraction module; the preset spatial calibration map is transferred through the internal skip connection to the output end of the post-modulation spatial calibration module, where it is added to the post-modulation spatial calibration map to obtain an integrated spatial calibration map, and the integrated spatial calibration map is used to calibrate the feature map E7.
5. The attention-mechanism-enhanced image super-resolution reconstruction method as claimed in claim 1, wherein a full-hierarchy channel calibration module is provided in the convolutional neural network; in each filtering feature extraction module, the feature map E3 and the feature map E6 are added to obtain a blend feature map; the blend feature map of each filtering feature extraction module is input into the full-hierarchy channel calibration module, and the channel calibration map output by the full-hierarchy channel calibration module is used to calibrate each channel of the post-stage feature map.
6. The attention-mechanism-enhanced image super-resolution reconstruction method as claimed in claim 5, wherein the processing of the feature map inside the full-hierarchy channel calibration module is expressed by the following mathematical model:
QM1 = λ1(CAP(H1)) + λ2(CAP(H2)) + ··· + λn(CAP(Hn))
QM2 = ξc(CEP(Cf1a([H1, H2, ···, Hn])))
KD = QM1 ⊗ QM2
wherein H1, H2, …, Hn respectively represent the blend feature maps output by the filtering feature extraction modules and input into the full-hierarchy channel calibration module; CEP and CAP respectively represent global average pooling and maximum pooling of the feature map in the spatial direction; λ1, λ2, …, λn each represent a nonlinear mapping unit; [·] represents the splicing operation on feature maps in the channel direction; Cf1a represents a convolution operation processing layer with a convolution kernel size of 1×1; ξc represents the sigmoid activation function; ⊗ represents the element-wise product; and KD represents the channel calibration map output by the full-hierarchy channel calibration module.
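A hedged NumPy sketch of this model. The nonlinear mapping units λi and the 1×1-convolution weights of Cf1a are learned parameters in the patent; here they are replaced by a ReLU placeholder and uniform weights purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_calibration(Hs, lambdas, conv_w):
    # Hs: list of n blend feature maps, each of shape (C, H, W)
    # CAP: global maximum pooling in the spatial direction -> vector of length C
    QM1 = sum(lam(H.max(axis=(1, 2))) for lam, H in zip(lambdas, Hs))
    cat = np.concatenate(Hs, axis=0)            # [.] splice -> (n*C, H, W)
    mixed = np.tensordot(conv_w, cat, axes=1)   # Cf1a: 1x1 conv, n*C -> C channels
    QM2 = sigmoid(mixed.mean(axis=(1, 2)))      # CEP then sigmoid -> (C,)
    return QM1 * QM2                            # KD: element-wise product, one weight per channel

n, C = 3, 4
Hs = [np.full((C, 2, 2), float(i + 1)) for i in range(n)]
relu = lambda v: np.maximum(v, 0.0)             # placeholder nonlinear mapping unit
KD = channel_calibration(Hs, lambdas=[relu] * n,
                         conv_w=np.ones((C, n * C)) / (n * C))
```

KD has one scalar per channel of the post-stage feature map, so calibration is a per-channel rescaling.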
7. The attention-mechanism-enhanced image super-resolution reconstruction method as claimed in claim 1, wherein a surrounding long connection is provided in the convolutional neural network; the preceding-stage feature map is also transferred directly through the surrounding long connection to the front end of the up-sampling module; the post-stage feature map is first added to the preceding-stage feature map to obtain a comprehensive feature map, and the up-sampling module then takes the comprehensive feature map as its input.
CN202210812384.1A 2022-07-11 2022-07-11 Image super-resolution reconstruction method for attention mechanism enhancement Active CN115082317B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210812384.1A CN115082317B (en) 2022-07-11 2022-07-11 Image super-resolution reconstruction method for attention mechanism enhancement


Publications (2)

Publication Number Publication Date
CN115082317A CN115082317A (en) 2022-09-20
CN115082317B true CN115082317B (en) 2023-04-07

Family

ID=83260069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210812384.1A Active CN115082317B (en) 2022-07-11 2022-07-11 Image super-resolution reconstruction method for attention mechanism enhancement

Country Status (1)

Country Link
CN (1) CN115082317B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115546031B (en) * 2022-12-01 2023-03-24 运易通科技有限公司 Image enhancement method and device for warehouse ceiling inspection
CN117152162B (en) * 2023-11-01 2023-12-26 贵州健易测科技有限公司 Image processing method, device and storage medium for food sorting

Citations (1)

Publication number Priority date Publication date Assignee Title
CN111047515A (en) * 2019-12-29 2020-04-21 兰州理工大学 Cavity convolution neural network image super-resolution reconstruction method based on attention mechanism

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN109118428B (en) * 2018-06-07 2023-05-19 西安电子科技大学 Image super-resolution reconstruction method based on feature enhancement
CN109903228B (en) * 2019-02-28 2023-03-24 合肥工业大学 Image super-resolution reconstruction method based on convolutional neural network
CN111461983B (en) * 2020-03-31 2023-09-19 华中科技大学鄂州工业技术研究院 Image super-resolution reconstruction model and method based on different frequency information
CN113516592A (en) * 2020-04-10 2021-10-19 阿里巴巴集团控股有限公司 Image processing method, model training method, device and equipment
CN113362223B (en) * 2021-05-25 2022-06-24 重庆邮电大学 Image super-resolution reconstruction method based on attention mechanism and two-channel network

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN111047515A (en) * 2019-12-29 2020-04-21 兰州理工大学 Cavity convolution neural network image super-resolution reconstruction method based on attention mechanism

Also Published As

Publication number Publication date
CN115082317A (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN115082317B (en) Image super-resolution reconstruction method for attention mechanism enhancement
CN108734659B (en) Sub-pixel convolution image super-resolution reconstruction method based on multi-scale label
CN108550115B (en) Image super-resolution reconstruction method
CN108460726B (en) Magnetic resonance image super-resolution reconstruction method based on enhanced recursive residual network
KR20190138107A (en) Method for processing interior computed tomography image using artificial neural network and apparatus therefor
CN108492249B (en) Single-frame super-resolution reconstruction method based on small convolution recurrent neural network
CN112270644A (en) Face super-resolution method based on spatial feature transformation and cross-scale feature integration
CN112801904B (en) Hybrid degraded image enhancement method based on convolutional neural network
Luo et al. Lattice network for lightweight image restoration
CN116051428B (en) Deep learning-based combined denoising and superdivision low-illumination image enhancement method
CN111951164B (en) Image super-resolution reconstruction network structure and image reconstruction effect analysis method
CN112669214B (en) Fuzzy image super-resolution reconstruction method based on alternating direction multiplier algorithm
CN108828483A (en) A kind of collapse threshold iterative reconstruction algorithm of magnetic resonance image
CN112270646B (en) Super-resolution enhancement method based on residual dense jump network
CN111986092B (en) Dual-network-based image super-resolution reconstruction method and system
CN111487573B (en) Enhanced residual error cascade network model for magnetic resonance undersampling imaging
CN110047038B (en) Single-image super-resolution reconstruction method based on hierarchical progressive network
CN114037624B (en) Image enhancement method and device for classifying diabetic nephropathy
CN115170410A (en) Image enhancement method and device integrating wavelet transformation and attention mechanism
CN106981046B (en) Single image super resolution ratio reconstruction method based on multi-gradient constrained regression
Fu et al. Multistage supervised contrastive learning for hybrid-degraded image restoration
CN112150356A (en) Single compressed image super-resolution reconstruction method based on cascade framework
CN113379606A (en) Face super-resolution method based on pre-training generation model
CN116957964A (en) Small sample image generation method and system based on diffusion model
Cosmo et al. Multiple sequential regularized extreme learning machines for single image super resolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant