CN115170392A - Single-image super-resolution algorithm based on attention mechanism - Google Patents
- Publication number
- CN115170392A (application CN202210719954.2A)
- Authority
- CN
- China
- Prior art keywords
- attention
- output
- module
- input
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a single-image super-resolution algorithm based on an attention mechanism. It proposes a new multi-scale attention residual block (MSAB), improves the residual-in-residual network design framework, and introduces inter-layer attention. On the basis of these two innovations, a new multi-scale holistic attention network (MSHAN) is proposed. The specific characteristics of the MSAB are as follows: (1) a channel attention mechanism and a spatial attention mechanism are introduced into an ordinary residual block with a two-branch learning strategy: the two attention mechanisms operate on separate branches, whose outputs are finally concatenated and fused by a 1x1 convolutional layer; (2) on this basis, multi-scale convolution is introduced: 3x3 and 5x5 convolution blocks extract features on two parallel branches, again concatenated and fused by a 1x1 convolutional layer. Measured jointly on model performance and parameter count, the proposed MSHAN network achieves remarkable results.
Description
Technical Field
The invention relates to the super-resolution problem in image processing, and in particular to a super-resolution method based on a convolutional neural network.
Background
With the continuous progress of deep learning, the powerful computing and representation capabilities of convolutional neural networks (CNNs) have made them dominant in computer vision, gradually replacing traditional learning methods. A CNN can not only extract shallow features of an image, such as background and contours, through its convolution kernels, but can also extract features layer by layer through stacked convolutions, recovering the high-frequency details of the image.
Super-resolution (SR) is a typical ill-posed problem in image processing: from an input low-resolution (LR) image, a suitable mapping must be found that outputs the high-resolution (HR) image at the enlarged scale. To solve this problem, many learning-based methods have been proposed that learn the mapping between LR and HR image pairs. Learning-based SR methods fall into two main classes: traditional methods (such as bicubic interpolation) and deep learning methods. Although traditional methods can implicitly learn some prior knowledge of the image through probability theory, they cannot learn deeper mapping features, so the generated high-resolution images lack the necessary high-frequency details. Deep-learning-based methods, by contrast, rely on the powerful computation and feature-representation capabilities of convolutional neural networks to extract image features at a deeper level. Their design involves two main aspects. (1) The choice of SR framework: the four existing frameworks are pre-upsampling SR, post-upsampling SR, progressive upsampling SR, and iterative up-and-down sampling SR, of which the post-upsampling framework has become mainstream by virtue of reducing the number of model parameters and the computational complexity. (2) The specific network structure: deep convolutional networks have by now produced many classic and efficient structural designs, such as residual learning, recursive learning, and dense connections. Different designs affect the performance and parameter count of the network model differently, and a suitable network structure must be selected through continued experimentation.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a new multi-scale holistic attention network (MSHAN) based on a convolutional neural network, addressing the fact that previous SR networks treat information from different convolutional layers, different channels, and different spatial locations equally.
To solve this problem, the invention adopts the following technical scheme: an attention-based single-image super-resolution algorithm, shown in fig. 1, comprising a multi-scale attention residual block (MSAB) and an improved residual-in-residual (IRIR) network design framework into which inter-layer attention is introduced. The method comprises three sequential steps: a shallow feature extraction operation, an intermediate feature mapping operation, and an upsampling operation.
1) Shallow feature extraction operation. Let $X$ and $Y_{SR}$ denote the input and the output of the whole network, respectively. For an input low-resolution picture $X$, a 3x3 convolutional layer extracts the initial shallow features:

$$F_{IFE} = S_{IFENet}(X)$$

where $S_{IFENet}$ denotes the function of the shallow feature extraction module. The extracted shallow features $F_{IFE}$ are fed into the subsequent feature mapping stage as its initial input and are also used for global feature learning.
2) Intermediate feature mapping operation. The input of the intermediate feature mapping is the shallow feature $F_{IFE}$ obtained by the shallow feature extraction operation; the basic unit of this stage is a new multi-scale attention residual block, whose main structure is shown in fig. 2.

Let the input of the multi-scale attention residual block be $H_0$. The input first passes through two parallel convolution modules, one 3x3 and one 5x5, to generate the corresponding outputs:

$$H_{m3} = W_3^2\,\delta(W_3^1 H_0 + b_3^1) + b_3^2$$
$$H_{m5} = W_5^2\,\delta(W_5^1 H_0 + b_5^1) + b_5^2$$

where $W_3^1$ and $b_3^1$ denote the weights and biases of the first convolutional layer of the 3x3 module, and $W_3^2$ and $b_3^2$ the weights and biases of its second convolutional layer; likewise, $W_5^1, b_5^1$ and $W_5^2, b_5^2$ denote the weights and biases of the first and second convolutional layers of the 5x5 module. $\delta$ denotes the ReLU activation function, and $H_{m3}$ and $H_{m5}$ are the outputs of the 3x3 and 5x5 modules, respectively.
After obtaining the 3x3 module output $H_{m3}$ and the 5x5 module output $H_{m5}$, a cascade module fuses the features convolved at the two scales, and a 1x1 convolutional layer adjusts the dimensionality so the result can be sent to subsequent modules for further feature extraction:

$$H_{c1} = W_{c1}[H_{m3}, H_{m5}] + b_{c1}$$

where $H_{c1}$ denotes the output of the first cascade module, $[\cdot]$ denotes the concatenation operation, and $W_{c1}$ and $b_{c1}$ are the weights and biases of the 1x1 convolutional layer in the first cascade module.
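To make the two-branch multi-scale structure concrete, the following PyTorch sketch implements the parallel 3x3/5x5 branches and the first cascade module as described above. It is a minimal illustration under stated assumptions, not the patented implementation; the class name MultiScaleBranch and the default feature width of 64 (the value given later in the experiments) are assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleBranch(nn.Module):
    """Parallel 3x3 / 5x5 convolution branches fused by a 1x1 cascade layer."""
    def __init__(self, n_feats: int = 64):
        super().__init__()
        # Two stacked convolutions per branch, with ReLU (delta) in between.
        self.branch3 = nn.Sequential(
            nn.Conv2d(n_feats, n_feats, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(n_feats, n_feats, 3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(n_feats, n_feats, 5, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(n_feats, n_feats, 5, padding=2),
        )
        # 1x1 convolution restores the channel count after concatenation.
        self.fuse = nn.Conv2d(2 * n_feats, n_feats, 1)

    def forward(self, h0: torch.Tensor) -> torch.Tensor:
        h_m3 = self.branch3(h0)                            # H_m3
        h_m5 = self.branch5(h0)                            # H_m5
        return self.fuse(torch.cat([h_m3, h_m5], dim=1))   # first cascade, H_c1
```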
Given the output of the first intermediate cascade module, attention is applied to further weight the channels and spatial locations rich in important features. Two parallel branches are designed for this purpose: one branch generates a weight vector of size Cx1x1 through a channel attention mechanism to adjust the feature values of each channel; the other branch uses a spatial attention mechanism to generate a weight map of size 1xHxW to adjust the feature values at each spatial position within the channels. Through these parallel branches, the network exploits the correlations between channels and between spatial positions to extract more effective feature representations, improving performance. Define the input features as $F \in \mathbb{R}^{C \times H \times W}$, containing C feature maps each of size HxW.
The channel attention branch proceeds as shown in the channel attention module of fig. 2. First, a global average pooling layer generates a channel-wise statistic $\mu \in \mathbb{R}^{C \times 1 \times 1}$; the pooling layer acts on each feature channel individually, so the c-th element of $\mu$ can be expressed as:

$$\mu_c = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} F_c(i,j)$$

where $F_c(i,j)$ denotes the pixel value of the c-th channel at position $(i,j)$. The statistic $\mu$ is then passed through two convolutional layers and their activation functions:

$$\alpha = \sigma\big(W_2\,\delta(W_1 \mu + b_1) + b_2\big)$$

where $W_1$ and $b_1$ are the weights and biases of the first convolutional layer, which changes the number of channels by a scaling ratio $\gamma$; likewise, the convolutional layer with parameters $W_2$ and $b_2$ restores the original number of channels. $\sigma$ and $\delta$ denote the sigmoid and ReLU activation functions, respectively.
The per-channel attention weight $\alpha$ is thus squashed to a value between 0 and 1 by the sigmoid activation $\sigma$ and used to rescale the input features. After obtaining the channel attention coefficients $\alpha$, they are multiplied element-wise with the original input features to obtain the final output of the channel attention branch:

$$H_{CA} = F_{CA}(\alpha, F) = \alpha \cdot F$$

where $H_{CA}$ denotes the final output of the channel attention module and $F_{CA}$ denotes the per-channel multiplication of each channel feature with its corresponding channel weight.
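A corresponding sketch of the channel attention branch follows: global average pooling, a channel-reducing 1x1 convolution, ReLU, a channel-restoring 1x1 convolution, and a sigmoid gate. The reduction ratio (16 here) is an assumed value; the text names the scaling ratio only as $\gamma$.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, n_feats: int = 64, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                # mu in R^{Cx1x1}
        self.body = nn.Sequential(
            nn.Conv2d(n_feats, n_feats // reduction, 1),   # shrink channels
            nn.ReLU(inplace=True),
            nn.Conv2d(n_feats // reduction, n_feats, 1),   # restore channels
            nn.Sigmoid(),                                  # alpha in (0, 1)
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        alpha = self.body(self.pool(f))    # per-channel weights, Cx1x1
        return f * alpha                   # H_CA: rescale each channel
```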
The output features $H_{c1}$ of the first cascade module are also input into the parallel spatial attention branch for spatial attention adjustment of the features. As shown in the spatial attention module of fig. 2, this module has one fewer global average pooling layer than the channel attention module, because it does not need to compress global spatial information into a per-channel statistical descriptor. The rest of the process is similar to channel attention; the spatial attention mask coefficients are given by:

$$\beta = \sigma\big(W_{s2}\,\delta(W_{s1} F + b_{s1}) + b_{s2}\big)$$

where $\sigma$ and $\delta$ denote the sigmoid and ReLU activation functions, respectively; the first 1x1 convolutional layer, with weights $W_{s1}$ and biases $b_{s1}$, generates per-channel feature maps, and the second 1x1 convolutional layer, with weights $W_{s2}$ and biases $b_{s2}$, combines them into a single attention map. The sigmoid function $\sigma$ normalizes the mapping into the range 0-1 to obtain the adaptive spatial attention mask $\beta$; the scaling ratio $\gamma$ of the convolutional layers governs the change in dimensionality. After obtaining the spatial attention mask, it is multiplied element-wise at each spatial position with the input features to obtain the final output of the spatial attention branch:

$$H_{SA} = F_{SA}(\beta, F) = \beta \cdot F$$

where $H_{SA}$ denotes the final output of the spatial attention module and $F_{SA}$ denotes the per-position multiplication of each spatial feature with its corresponding spatial weight.
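The spatial attention branch can be sketched in the same way; note the absence of the pooling layer, so the 1x1 convolutions operate on the full HxW maps and collapse to a single-channel mask. The intermediate channel count is again an assumption.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, n_feats: int = 64, reduction: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_feats, n_feats // reduction, 1),   # per-channel feature maps
            nn.ReLU(inplace=True),
            nn.Conv2d(n_feats // reduction, 1, 1),         # combine into one map
            nn.Sigmoid(),                                  # beta in (0, 1), 1xHxW
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        beta = self.body(f)                # spatial mask
        return f * beta                    # H_SA: rescale each spatial position
```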
After obtaining the channel attention output $H_{CA}$ and the spatial attention output $H_{SA}$, both are fed into a second cascade module that fuses the two modules' features; a 1x1 convolutional layer changes the number of feature channels to ease transmission between blocks, giving the output of the second cascade module. A residual operation then adds the MSAB input to obtain the output of the whole MSAB:

$$H_o = W_{c2}[H_{CA}, H_{SA}] + b_{c2} + H_0$$

where $H_o$ denotes the final output of the MSAB, $[\cdot]$ denotes the concatenation operation, and $W_{c2}$ and $b_{c2}$ are the weights and biases of the 1x1 convolutional layer in the second cascade block. To make better use of the features rich in low-frequency information, a short skip connection is introduced into the residual block; it not only lets the main branch of the network learn the residual information but also avoids the vanishing-gradient problem during network training.
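Putting the pieces together, the following is a hedged sketch of one complete MSAB: the multi-scale front end, parallel channel and spatial attention on the first cascade output, a second 1x1 cascade, and the short skip connection back to the block input. It reuses the illustrative classes from the sketches above and is not the patented implementation.

```python
import torch
import torch.nn as nn

class MSAB(nn.Module):
    """Multi-scale attention residual block (illustrative sketch)."""
    def __init__(self, n_feats: int = 64):
        super().__init__()
        self.multi_scale = MultiScaleBranch(n_feats)
        self.ca = ChannelAttention(n_feats)
        self.sa = SpatialAttention(n_feats)
        self.fuse = nn.Conv2d(2 * n_feats, n_feats, 1)     # second cascade module

    def forward(self, h0: torch.Tensor) -> torch.Tensor:
        h_c1 = self.multi_scale(h0)                        # first cascade output
        h_ca, h_sa = self.ca(h_c1), self.sa(h_c1)          # parallel attention
        h = self.fuse(torch.cat([h_ca, h_sa], dim=1))
        return h + h0                                      # short skip connection
```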
The intermediate feature mapping operation adopts a residual-in-residual network design framework: the outer residual consists mainly of a series of stacked attention groups plus global residual learning, while the inner residual consists of a series of stacked attention blocks plus local residual learning. This framework avoids the difficulty of training, and the lack of significant performance gains, that come from simply stacking long chains of residual blocks. If the outer residual contains N attention groups (AG), and the input and output of the n-th group are denoted $AG_{n-1}$ and $AG_n$, then:

$$AG_n = H_n(AG_{n-1}) = H_n(H_{n-1}(\cdots(H_1(AG_0))\cdots))$$
where $H_n$ is the operation of the n-th attention group and $AG_0$ is the input of the first group, i.e. the output $F_{IFE}$ of the shallow feature module. Each attention group is stacked from a series of MSAB blocks, but simply stacking residual blocks does not effectively reuse the features of earlier blocks, so dense connections are introduced between the attention blocks: the input of every intermediate block is the concatenation of the outputs of all preceding attention blocks. The MSAB thus combines dense connections, local feature fusion, and local residual learning into a contiguous memory mechanism: by passing the features of previous MSABs to the current MSAB, a persistent memory storage mechanism is realized. Let $F_{m-1}$ and $F_m$ denote the input and output of the m-th MSAB, each with $G_0$ feature maps; the output of the m-th MSAB can then be expressed as:

$$F_m = M_m([F_{m-1}, F_{m-2}, \ldots, F_1])$$

where $M_m$ denotes the function of the m-th MSAB and $[F_{m-1}, F_{m-2}, \ldots, F_1]$ denotes the concatenation of the outputs of MSAB blocks 1 through m-1. Because each preceding MSAB and each layer has a direct connection to all subsequent layers, this design not only preserves the feed-forward nature of the network but also extracts locally dense features.
Local feature fusion adaptively fuses the states of all convolutional layers in the preceding MSABs with the current MSAB. As shown by the AG in fig. 1, the feature maps of the (m-1)-th MSAB are introduced directly into the m-th MSAB in series, so reducing the number of features is essential. On the other hand, inspired by MemNet, a 1x1 convolutional layer is introduced to adaptively control the output information. This operation, named local feature fusion (LFF), is given by:

$$F_{n,LF} = H_{LFF}^{n}([AG_{n-1}, F_1, \ldots, F_M])$$

where $F_{n,LF}$ denotes the fused output of the MSAB features, $H_{LFF}^{n}$ denotes the function of the 1x1 conv layer in the n-th AG, and $[AG_{n-1}, F_1, \ldots, F_M]$ denotes the concatenation of the previous AG's input with the M MSAB outputs in the current AG. Without LFF, a very deep dense network would be difficult to train as the growth rate G grows.
Local residual learning is also introduced into the AG to further improve the flow of features through the network; the final output of the n-th AG can be expressed as:

$$AG_n = AG_{n-1} + F_{n,LF}$$

It should be noted that local residual learning further improves the network's representational capability and thus its performance. The final attention group output $AG_N$ is then fed into a 3x3 convolutional layer to produce the final output $F_N$ of the whole intermediate feature-mapping convolution:

$$F_N = \mathrm{Conv}(AG_N)$$

where Conv denotes the last convolutional layer. The resulting output $F_N$, the output of the subsequent inter-layer attention module, and the initial input features $F_{IFE}$ are sent together to the next module, as in the attention group sketch below.
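The dense connections, LFF, and local residual learning of one attention group might be arranged as in the following sketch, which reuses the MSAB class above. The per-block 1x1 adapter convolutions that keep the channel count constant under dense concatenation are an assumption: the patent does not state how the growing concatenated input is reduced before each MSAB.

```python
import torch
import torch.nn as nn

class AttentionGroup(nn.Module):
    """One AG: densely connected MSABs + 1x1 LFF + local residual learning."""
    def __init__(self, n_feats: int = 64, n_blocks: int = 12):
        super().__init__()
        # Assumed 1x1 adapters: block m sees the concatenation of the group
        # input and all previous block outputs, reduced back to n_feats.
        self.adapters = nn.ModuleList(
            [nn.Conv2d((i + 1) * n_feats, n_feats, 1) for i in range(n_blocks)]
        )
        self.blocks = nn.ModuleList([MSAB(n_feats) for _ in range(n_blocks)])
        # Local feature fusion over [AG_{n-1}, F_1, ..., F_M].
        self.lff = nn.Conv2d((n_blocks + 1) * n_feats, n_feats, 1)

    def forward(self, ag_in: torch.Tensor) -> torch.Tensor:
        states = [ag_in]
        for adapt, block in zip(self.adapters, self.blocks):
            states.append(block(adapt(torch.cat(states, dim=1))))
        f_lf = self.lff(torch.cat(states, dim=1))   # F_{n,LF}
        return ag_in + f_lf                          # local residual learning
```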
After the intermediate features of the series of attention groups have been extracted, the model introduces a layer attention module (LA, shown in fig. 3), whose input is the concatenation of the features of every intermediate attention group:

$$F_{LA} = H_{LA}([AG_1, AG_2, \ldots, AG_N])$$

where $H_{LA}$ denotes the inter-layer attention function, whose input is the output of all intermediate layers, so that all feature information from the preceding layers can be fully exploited; $F_{LA}$ denotes the output of the inter-layer attention module; and $[AG_1, AG_2, \ldots, AG_N]$ denotes the concatenation of attention groups 1 through N.
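The patent does not spell out the internals of the inter-layer attention module here, so the sketch below follows one common formulation of layer attention (as in holistic attention networks): the N group outputs are treated as a sequence, an NxN correlation matrix is computed and softmax-normalized, and each layer's features are re-weighted accordingly. Treat every detail of this block as an assumption rather than the patented design.

```python
import torch
import torch.nn as nn

class LayerAttention(nn.Module):
    """Assumed inter-layer attention over stacked attention-group outputs."""
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(1.0))   # learnable residual scale

    def forward(self, groups: torch.Tensor) -> torch.Tensor:
        # groups: (B, N, C, H, W) -- the N attention-group outputs stacked.
        b, n, c, h, w = groups.shape
        flat = groups.view(b, n, -1)                               # (B, N, C*H*W)
        corr = torch.softmax(flat @ flat.transpose(1, 2), dim=-1)  # (B, N, N)
        out = (corr @ flat).view(b, n, c, h, w)                    # re-weight layers
        return self.scale * out + groups                           # residual form
```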
After obtaining the intermediate convolutional output $F_N$ and the inter-layer attention output $F_{LA}$, a long skip connection (LSC) is introduced to enhance the stability of network training; through residual learning the LSC also yields better network performance. That is, the initial shallow features $F_{IFE}$ are added element-wise to the other two terms, so the final output of the whole intermediate feature-mapping module can be expressed as:

$$F_{MF} = F_{IFE} + F_N + F_{LA}$$

where $F_{MF}$ denotes the final output of the intermediate feature mapping step.
3) Upsampling operation. The input of the upsampling operation is the output $F_{MF}$ of the preceding intermediate feature mapping operation. Sub-pixel convolution is then used as the final upsampling module: it converts scaling by a given magnification factor into upsampling by pixel rearrangement, aggregating the low-resolution feature maps while mapping the features into a high-dimensional space to reconstruct the HR image. The whole process is:

$$Y_{SR} = U_{\uparrow}(F_{MF}) = U_{\uparrow}(F_{IFE} + F_N + F_{LA})$$

where $U_{\uparrow}$ denotes the sub-pixel convolution operation and $Y_{SR}$ is the reconstructed SR result. In addition, the long skip connection introduced above stabilizes the training of the proposed deep network, and the sub-pixel upsampling block takes $F_{IFE} + F_N + F_{LA}$ as its input.
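A sketch of the tail and the overall forward pass: the long skip connection sums $F_{IFE}$, $F_N$ and $F_{LA}$, and a sub-pixel (PixelShuffle) block performs the final upsampling. The group count N = 6 and block count M = 12 follow the values given later in the description; the 1x1 fusion of the layer-attention output back to the base channel count, and everything else about the arrangement, are assumptions. It reuses the illustrative classes from the earlier sketches.

```python
import torch
import torch.nn as nn

class MSHAN(nn.Module):
    """End-to-end sketch of the proposed network (illustrative only)."""
    def __init__(self, scale: int = 4, n_feats: int = 64,
                 n_groups: int = 6, n_blocks: int = 12):
        super().__init__()
        self.head = nn.Conv2d(3, n_feats, 3, padding=1)       # shallow features
        self.groups = nn.ModuleList(
            [AttentionGroup(n_feats, n_blocks) for _ in range(n_groups)]
        )
        self.body_tail = nn.Conv2d(n_feats, n_feats, 3, padding=1)
        self.la = LayerAttention()
        self.la_fuse = nn.Conv2d(n_groups * n_feats, n_feats, 1)  # assumed fusion
        self.upsample = nn.Sequential(                        # sub-pixel block
            nn.Conv2d(n_feats, n_feats * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(n_feats, 3, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f_ife = self.head(x)                                  # F_IFE
        ag, ag_outs = f_ife, []
        for group in self.groups:
            ag = group(ag)
            ag_outs.append(ag)
        f_n = self.body_tail(ag)                              # F_N
        stacked = torch.stack(ag_outs, dim=1)                 # (B, N, C, H, W)
        f_la = self.la_fuse(self.la(stacked).flatten(1, 2))   # F_LA
        return self.upsample(f_ife + f_n + f_la)              # Y_SR
```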
Using the MSAB as the basic building block of the network, the multi-scale convolutions exploit image features at different scales, while the channel and spatial attention mechanisms focus the network's attention on the channels and spatial positions richer in high-frequency information.
The method adopts the IRIR network structure design: multiple MSAB blocks can be stacked, image feature representations can be extracted more fully, and the vanishing- and exploding-gradient problems commonly encountered during model training are mitigated.
The invention adopts convolutional layers at two scales, 3x3 and 5x5, in a two-branch network design: the input features enter a 3x3 convolution module and a 5x5 convolution module respectively, extracting image features at both scales. A channel attention mechanism exploits the correlations among different channels at the per-channel level, and a spatial attention mechanism exploits the correlations among different spatial positions within the channels.
According to the invention, an inter-layer attention module is introduced into the outer residual; through it the correlations among different layers can be exploited, so the network assigns different attention weights to the features of different layers and automatically improves the representational power of the extracted features. A densely connected design is introduced into the inner residual: dense connections let the current MSAB take the concatenated outputs of all previous MSABs as input, improving the flow of feature information at every level and hence the network's representational capability.
The image features obtained by the 3x3 and 5x5 convolutions are concatenated, and a 1x1 convolutional layer adjusts the number of channels so the resulting multi-scale features can enter the next module. The image features obtained by the channel attention module and the spatial attention module are likewise concatenated; a final 1x1 convolutional layer adjusts the number of channels output by the MSAB block, which is then added element-wise to the features input at the start of the MSAB for residual learning, avoiding the vanishing-gradient problem and improving the network's feature extraction capability.
Advantageous effects: compared with prior high-performing SR models (EDSR, RDN, etc.), the invention improves the PSNR/SSIM metrics at the x2, x3, and x4 magnification factors alike, as shown in Table 1. Figs. 4-5 show qualitatively that the reconstructed HR images are also more faithful, with clearer textures, than those of other models. Fig. 6 shows that the invention achieves good results on a combined measure of model performance and parameter count.
Drawings
Fig. 1 is a block diagram of the method.
Fig. 2 is a diagram illustrating a multi-scale attention residual block in the method.
FIG. 3 is a schematic diagram of an inter-layer attention module in the method.
Fig. 4 is a qualitative comparison of this method with other methods at x3 upscaling on the Urban100 test set.
Fig. 5 is a qualitative comparison of this method with other methods at x4 upscaling on the Manga109 test set.
Fig. 6 is a comprehensive comparison of performance and model parameters for this and other methods at x4 upscaling on the Urban100 test set.
Detailed Description
The method is a multi-scale holistic attention residual network based on an attention mechanism. It addresses single-image super-resolution: given a blurry low-resolution image as input, the trained network generates a high-resolution image with a clear texture structure.
The present invention was studied comparatively on the widely used datasets Set5, Set14, BSD100, Urban100, and Manga109, which contain 5, 14, 100, 100, and 109 images, respectively. Set5, Set14, and BSD100 contain natural-scene images; Urban100 is composed of urban-scene images with many details in different frequency bands; and Manga109 is composed of Japanese comic images with many fine structures. The invention trains the proposed MSHAN model on the 800 high-quality training images of DIV2K, with data augmentation on these training images consisting of random horizontal flips and random 90° rotations.
The present invention uses peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as model evaluation metrics; higher PSNR and SSIM values indicate better quality of the high-resolution images produced by the model. As is common in SISR, all metrics are computed on the luminance channel of the image after removing pixels near the image boundary.
Meanwhile, the low-resolution images are generated from the corresponding high-resolution images by bicubic downsampling at the given scale, and all images are preprocessed by subtracting the mean RGB value of the DIV2K dataset. The low-resolution training inputs are 48x48 color patches randomly cropped from the DIV2K LR images, with the mini-batch size set to 16, i.e. 16 patches are trained at a time. The experiments train the model with the Adam optimizer ($\beta_1 = 0.9$, $\beta_2 = 0.99$, $\epsilon = 10^{-8}$); the initial learning rate is set to $10^{-4}$ and is halved every $2 \times 10^5$ iterations. The invention uses the PyTorch framework to train and test the network model on an NVIDIA GTX 1080Ti GPU. In the MSHAN network the number of features in all convolutional layers is set to 64; the convolution kernel sizes are only 1x1, 3x3, and 5x5, with 1x1 kernels generally used after concatenation operations and 5x5 kernels appearing only in the multi-scale attention blocks. The number of attention groups N is set to 6, and the number of multi-scale attention blocks M is set to 12.
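A hedged sketch of this training setup follows, reusing the MSHAN sketch above: Adam with the stated betas and epsilon, a step schedule that halves the learning rate every 2x10^5 iterations, and 48x48 LR crops in batches of 16. The L1 loss and the train_loader are assumptions; the passage above does not name the loss function or the data pipeline.

```python
import torch

model = MSHAN(scale=2).cuda()                         # assumes a CUDA device
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.99), eps=1e-8)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200_000,
                                            gamma=0.5)   # halve every 2e5 iters
criterion = torch.nn.L1Loss()                         # assumed loss choice

# train_loader is assumed to yield batches of 16 paired (LR 48x48, HR) crops.
for lr_patch, hr_patch in train_loader:
    sr = model(lr_patch.cuda())
    loss = criterion(sr, hr_patch.cuda())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```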
Table 1 summarizes the quantitative results of the proposed MSHAN network at three magnification factors (x2, x3, x4) on five benchmark datasets (Set5, Set14, BSD100, Urban100, Manga109) under the bicubic degradation model. The table shows that, compared with EDSR, MSHAN is more than 0.1 dB higher on the five benchmark datasets at every scale; although the MSHAN network is not as deep as EDSR, the improved residual-in-residual structure raises performance hierarchically, which is more conducive to rapid gains. Compared with MSRN, although that network also uses multi-scale convolution blocks, it lacks channel and spatial attention as well as inter-layer attention, so it treats every layer, every channel within a layer, and every position within a channel equally; unable to focus on the more important layers, channels, and spatial positions, its performance cannot improve further, and its results differ considerably from MSHAN's. Similarly, RDN uses dense connections within its residual blocks and concatenation between blocks, but for lack of channel-spatial attention and inter-layer attention its results are slightly worse than MSHAN's. On Manga109 in particular, a dataset of animation images composed of fine textures, whether training can focus on the most important fine-texture features matters greatly: at the x2, x3, and x4 scale factors, RDN trails MSHAN by about 0.1 dB on Manga109. As illustrated in figs. 4-6, the invention achieves significant results on a combined measure of model performance and parameter count compared with other classic SR models.
Table 1.
Claims (9)
1. A single-image super-resolution algorithm based on an attention mechanism is characterized in that: the method comprises three steps of shallow feature extraction operation, intermediate feature mapping operation and up-sampling operation;
1) Shallow feature extraction operation: let $X$ and $Y_{SR}$ denote the input and the output of the whole network, respectively; for an input low-resolution picture $X$, a 3x3 convolutional layer extracts the initial shallow features:

$$F_{IFE} = S_{IFENet}(X)$$

where $S_{IFENet}$ denotes the function of the shallow feature extraction module; the extracted shallow features $F_{IFE}$ are sent to the subsequent feature mapping part as its initial input and are also used for global feature learning;
2) Intermediate feature mapping operation: the input of the intermediate feature mapping is the shallow feature $F_{IFE}$ obtained by the shallow feature extraction operation, and the basic unit of the operation is a multi-scale attention residual block;

let the input of the multi-scale attention residual block be $H_0$; the input first passes through two parallel convolution modules, one 3x3 and one 5x5, to generate the corresponding outputs:

$$H_{m3} = W_3^2\,\delta(W_3^1 H_0 + b_3^1) + b_3^2$$
$$H_{m5} = W_5^2\,\delta(W_5^1 H_0 + b_5^1) + b_5^2$$

where $W_3^1$ and $b_3^1$ denote the weights and biases of the first convolutional layer of the 3x3 module, and $W_3^2$ and $b_3^2$ the weights and biases of its second convolutional layer; likewise, $W_5^1, b_5^1$ and $W_5^2, b_5^2$ denote the weights and biases of the first and second convolutional layers of the 5x5 module; $\delta$ denotes the ReLU activation function, and $H_{m3}$ and $H_{m5}$ are the outputs of the 3x3 and 5x5 modules, respectively;

after obtaining the 3x3 module output $H_{m3}$ and the 5x5 module output $H_{m5}$, a cascade module fuses the features convolved at the two scales, and a 1x1 convolutional layer adjusts the dimensionality for further feature extraction by subsequent modules:

$$H_{c1} = W_{c1}[H_{m3}, H_{m5}] + b_{c1}$$

where $H_{c1}$ denotes the output of the first cascade module, $[\cdot]$ denotes the concatenation operation, and $W_{c1}$ and $b_{c1}$ are the weights and biases of the 1x1 convolutional layer in the first cascade module;
3) Upsampling operation: the input of the upsampling operation is the output $F_{MF}$ of the preceding intermediate feature mapping operation; sub-pixel convolution is then used as the final upsampling module, converting scaling by a given magnification factor into upsampling by pixel rearrangement; the sub-pixel convolution operation aggregates the low-resolution feature maps while mapping the features into a high-dimensional space to reconstruct the HR image; the whole process is:

$$Y_{SR} = U_{\uparrow}(F_{MF}) = U_{\uparrow}(F_{IFE} + F_N + F_{LA})$$

where $U_{\uparrow}$ denotes the sub-pixel convolution operation and $Y_{SR}$ is the reconstructed SR result; in addition, a long skip connection is introduced to stabilize the training of the proposed deep network, and the sub-pixel upsampling block takes $F_{IFE} + F_N + F_{LA}$ as its input.
2. The single-image super-resolution algorithm based on an attention mechanism of claim 1, characterized in that: in step 2), given the output of the first intermediate cascade module, an attention mechanism is applied to further strengthen the weights of the channels and spatial positions rich in important features; two parallel branches are designed for this purpose: one branch generates a weight vector of size Cx1x1 through a channel attention mechanism to adjust the feature values of each channel, and the other branch uses a spatial attention mechanism to generate a weight map of size 1xHxW to adjust the feature values at each spatial position within the channels; through these parallel branches the network exploits the correlations between channels and between spatial positions to extract more effective feature representations, improving performance; the input features are defined as $F \in \mathbb{R}^{C \times H \times W}$, containing C feature maps each of size HxW.
3. The single-image super-resolution algorithm based on an attention mechanism of claim 2, characterized in that: in step 2), the channel attention branch proceeds as follows: first, a global average pooling layer generates a channel-wise statistic $\mu \in \mathbb{R}^{C \times 1 \times 1}$; the pooling layer acts on each feature channel individually, so the c-th element of $\mu$ is expressed as:

$$\mu_c = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} F_c(i,j)$$

where $F_c(i,j)$ denotes the pixel value of the c-th channel at position $(i,j)$; the statistic $\mu$ is then passed through two convolutional layers and their activation functions:

$$\alpha = \sigma\big(W_2\,\delta(W_1 \mu + b_1) + b_2\big)$$

where $W_1$ and $b_1$ are the weights and biases of the first convolutional layer, which changes the number of channels by a scaling ratio $\gamma$; likewise, the convolutional layer with parameters $W_2$ and $b_2$ restores the original number of channels; $\sigma$ and $\delta$ denote the sigmoid and ReLU activation functions, respectively;

the per-channel attention weight $\alpha$ is squashed to a value between 0 and 1 by the sigmoid activation $\sigma$ and used to rescale the input features; after obtaining the channel attention coefficients $\alpha$, they are multiplied element-wise with the original input features to obtain the final output of the channel attention branch:

$$H_{CA} = F_{CA}(\alpha, F) = \alpha \cdot F$$

where $H_{CA}$ denotes the final output of the channel attention module and $F_{CA}$ denotes the per-channel multiplication of each channel feature with its corresponding channel weight;
the output features $H_{c1}$ of the first cascade module are also input into the parallel spatial attention branch for spatial attention adjustment of the features; the spatial attention module has one fewer global average pooling layer than the channel attention module, because it does not need to compress global spatial information into a per-channel statistical descriptor; the rest of the process is similar to channel attention, and the spatial attention mask coefficients are given by:

$$\beta = \sigma\big(W_{s2}\,\delta(W_{s1} F + b_{s1}) + b_{s2}\big)$$

where $\sigma$ and $\delta$ denote the sigmoid and ReLU activation functions, respectively; the first 1x1 convolutional layer, with weights $W_{s1}$ and biases $b_{s1}$, generates per-channel feature maps, and the second 1x1 convolutional layer, with weights $W_{s2}$ and biases $b_{s2}$, combines them into a single attention map; the sigmoid function $\sigma$ normalizes the mapping into the range 0-1 to obtain the adaptive spatial attention mask $\beta$; the scaling ratio $\gamma$ of the convolutional layers governs the change in dimensionality; after obtaining the spatial attention mask, it is multiplied element-wise at each spatial position with the input features to obtain the final output of the spatial attention branch:

$$H_{SA} = F_{SA}(\beta, F) = \beta \cdot F$$

where $H_{SA}$ denotes the final output of the spatial attention module and $F_{SA}$ denotes the per-position multiplication of each spatial feature with its corresponding spatial weight.
4. The single-image super-resolution algorithm based on an attention mechanism of claim 3, characterized in that: in step 2), after obtaining the channel attention output $H_{CA}$ and the spatial attention output $H_{SA}$, both are fed into a second cascade module that fuses the two modules' features, and a 1x1 convolutional layer changes the number of feature channels to ease transmission between blocks, giving the output of the second cascade module; a residual operation then adds the MSAB input to obtain the output of the whole MSAB:

$$H_o = W_{c2}[H_{CA}, H_{SA}] + b_{c2} + H_0$$

where $H_o$ denotes the final output of the MSAB, $[\cdot]$ denotes the concatenation operation, and $W_{c2}$ and $b_{c2}$ are the weights and biases of the 1x1 convolutional layer in the second cascade block; to make better use of the features rich in low-frequency information, a short skip connection is introduced into the residual block, which not only lets the main branch of the network learn the residual information but also avoids the vanishing-gradient problem during network training.
5. The single-image super-resolution algorithm based on an attention mechanism of claim 4, characterized in that: in step 2), the intermediate feature mapping operation adopts a residual-in-residual network design framework, i.e. the outer residual consists mainly of a series of stacked attention groups plus global residual learning, while the inner residual consists of a series of stacked attention blocks plus local residual learning; if the outer residual contains N attention groups (AG) and the input and output of the n-th group are denoted $AG_{n-1}$ and $AG_n$, then:

$$AG_n = H_n(AG_{n-1}) = H_n(H_{n-1}(\cdots(H_1(AG_0))\cdots))$$

where $H_n$ is the operation of the n-th attention group; $AG_0$ is the input of the first attention group, i.e. the output $F_{IFE}$ of the shallow feature module; each attention group is stacked from a series of MSAB blocks, but simply stacking residual blocks does not effectively reuse the features of earlier blocks, so dense connections are introduced between the attention blocks, i.e. the input of every intermediate block is the concatenation of the outputs of all preceding attention blocks; the MSAB combines dense connections, local feature fusion, and local residual learning into a contiguous memory mechanism: by passing the features of previous MSABs to the current MSAB, a persistent memory storage mechanism is realized; let $F_{m-1}$ and $F_m$ denote the input and output of the m-th MSAB, each with $G_0$ feature maps; the output of the m-th MSAB is then expressed as:

$$F_m = M_m([F_{m-1}, F_{m-2}, \ldots, F_1])$$

where $M_m$ denotes the function of the m-th MSAB and $[F_{m-1}, F_{m-2}, \ldots, F_1]$ denotes the concatenation of the outputs of MSAB blocks 1 through m-1; because each preceding MSAB and each layer has a direct connection to all subsequent layers, the design not only preserves the feed-forward nature of the network but also extracts locally dense features.
6. The single-image super-resolution algorithm based on an attention mechanism of claim 5, characterized in that: in step 2), local feature fusion adaptively fuses the states of all convolutional layers in the preceding MSABs with the current MSAB; the feature maps of the (m-1)-th MSAB are introduced directly into the m-th MSAB in series, so reducing the number of features is of great importance; on the other hand, inspired by MemNet, a 1x1 convolutional layer is introduced to adaptively control the output information; this operation, named local feature fusion (LFF), is given by:

$$F_{n,LF} = H_{LFF}^{n}([AG_{n-1}, F_1, \ldots, F_M])$$

where $F_{n,LF}$ denotes the fused output of the MSAB features, $H_{LFF}^{n}$ denotes the function of the 1x1 conv layer in the n-th AG, and $[AG_{n-1}, F_1, \ldots, F_M]$ denotes the concatenation of the previous AG's input with the M MSAB outputs in the current AG; without LFF, a very deep dense network would be difficult to train as the growth rate G grows.
7. The single-image super-resolution algorithm based on an attention mechanism of claim 6, characterized in that: in step 2), local residual learning is also introduced into the AG to further improve the flow of features through the network, and the final output of the n-th AG is expressed as:

$$AG_n = AG_{n-1} + F_{n,LF}$$

local residual learning further improves the network's representational capability and thus its performance; the final attention group output $AG_N$ is then fed into a 3x3 convolutional layer to produce the final output $F_N$ of the whole intermediate feature-mapping convolution:

$$F_N = \mathrm{Conv}(AG_N)$$

where Conv denotes the last convolutional layer; the resulting output $F_N$, the output of the subsequent inter-layer attention module, and the initial input features $F_{IFE}$ are sent together to the next module.
8. The single-image super-resolution algorithm based on an attention mechanism of claim 7, characterized in that: in step 2), after extracting the intermediate features of the series of attention groups, the model introduces a layer attention module whose input is the concatenation of the features of every intermediate attention group:

$$F_{LA} = H_{LA}([AG_1, AG_2, \ldots, AG_N])$$

where $H_{LA}$ denotes the inter-layer attention function, whose input is the output of all intermediate layers, so that all feature information from the preceding layers can be fully exploited; $F_{LA}$ denotes the output of the inter-layer attention module; and $[AG_1, AG_2, \ldots, AG_N]$ denotes the concatenation of attention groups 1 through N.
9. The single-image super-resolution algorithm based on an attention mechanism of claim 8, characterized in that: in step 2), after obtaining the intermediate convolutional output $F_N$ and the inter-layer attention output $F_{LA}$, a long skip connection (LSC) is introduced to enhance the stability of network training; through residual learning the LSC also yields better network performance, i.e. the initial shallow features $F_{IFE}$ are added element-wise to the other two terms, so the final output of the whole intermediate feature mapping module is expressed as:

$$F_{MF} = F_{IFE} + F_N + F_{LA}$$

where $F_{MF}$ denotes the final output of the intermediate feature mapping step.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210719954.2A CN115170392A (en) | 2022-06-23 | 2022-06-23 | Single-image super-resolution algorithm based on attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210719954.2A CN115170392A (en) | 2022-06-23 | 2022-06-23 | Single-image super-resolution algorithm based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115170392A true CN115170392A (en) | 2022-10-11 |
Family
ID=83486727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210719954.2A Pending CN115170392A (en) | 2022-06-23 | 2022-06-23 | Single-image super-resolution algorithm based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115170392A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116594061A (en) * | 2023-07-18 | 2023-08-15 | 吉林大学 | Seismic data denoising method based on multi-scale U-shaped attention network |
CN116594061B (en) * | 2023-07-18 | 2023-09-22 | 吉林大学 | Seismic data denoising method based on multi-scale U-shaped attention network |
CN117522682A (en) * | 2023-12-04 | 2024-02-06 | 无锡日联科技股份有限公司 | Method, device, equipment and medium for reconstructing resolution of radiographic image |
CN117522682B (en) * | 2023-12-04 | 2024-08-13 | 无锡日联科技股份有限公司 | Method, device, equipment and medium for reconstructing resolution of radiographic image |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112330542B (en) | Image reconstruction system and method based on CRCSAN network | |
CN110033410B (en) | Image reconstruction model training method, image super-resolution reconstruction method and device | |
Hui et al. | Fast and accurate single image super-resolution via information distillation network | |
CN111192200A (en) | Image super-resolution reconstruction method based on fusion attention mechanism residual error network | |
CN112102177B (en) | Image deblurring method based on compression and excitation mechanism neural network | |
Luo et al. | Lattice network for lightweight image restoration | |
CN115170392A (en) | Single-image super-resolution algorithm based on attention mechanism | |
CN107123089A (en) | Remote sensing images super-resolution reconstruction method and system based on depth convolutional network | |
CN111951164B (en) | Image super-resolution reconstruction network structure and image reconstruction effect analysis method | |
CN112862689A (en) | Image super-resolution reconstruction method and system | |
CN112801904B (en) | Hybrid degraded image enhancement method based on convolutional neural network | |
CN113837946B (en) | Lightweight image super-resolution reconstruction method based on progressive distillation network | |
CN112561799A (en) | Infrared image super-resolution reconstruction method | |
CN114841856A (en) | Image super-pixel reconstruction method of dense connection network based on depth residual channel space attention | |
CN113962882B (en) | JPEG image compression artifact eliminating method based on controllable pyramid wavelet network | |
CN112819705B (en) | Real image denoising method based on mesh structure and long-distance correlation | |
CN117575915B (en) | Image super-resolution reconstruction method, terminal equipment and storage medium | |
CN116091313A (en) | Image super-resolution network model and reconstruction method | |
CN116777745A (en) | Image super-resolution reconstruction method based on sparse self-adaptive clustering | |
CN115187455A (en) | Lightweight super-resolution reconstruction model and system for compressed image | |
CN113139899A (en) | Design method of high-quality light-weight super-resolution reconstruction network model | |
CN116485654A (en) | Lightweight single-image super-resolution reconstruction method combining convolutional neural network and transducer | |
CN110111252A (en) | Single image super-resolution method based on projection matrix | |
CN112070676B (en) | Picture super-resolution reconstruction method of double-channel multi-perception convolutional neural network | |
CN113344786B (en) | Video transcoding method, device, medium and equipment based on geometric generation model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||