CN115222601A - Image super-resolution reconstruction model and method based on residual mixed attention network - Google Patents
Image super-resolution reconstruction model and method based on residual mixed attention network
- Publication number
- CN115222601A (application number CN202210940743.1A)
- Authority
- CN
- China
- Prior art keywords
- module
- residual
- attention
- resolution
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention provides an image super-resolution reconstruction model and method based on a residual mixed attention network, comprising a shallow feature extraction module, a deep feature extraction module, and a reconstruction module. The shallow feature extraction module extracts shallow features from the low-resolution image; the deep feature extraction module, formed by several cascaded residual separation mixed attention groups and a global residual connection, performs feature extraction and fusion on the shallow features to obtain deep features; the reconstruction module upsamples the deep features with sub-pixel convolution to obtain a higher-resolution image. The residual separation mixed attention module splits the feature map by channel separation and processes the halves in parallel in two branch modules, fusing the local features extracted by the residual triple attention module with the global features extracted by the efficient Swin Transformer module to obtain rich high- and low-frequency information. In this way, images with richer details are obtained and super-resolution reconstruction with higher precision is achieved.
Description
Technical Field
The invention relates to the technical field of computer vision and image processing, in particular to an image super-resolution reconstruction model and method based on a residual mixed attention network.
Background
In the field of electronic image applications, high-resolution images are often desired. High resolution means a high pixel density in the image, providing detail that is essential in many practical applications. For example, high-resolution medical images help physicians make correct diagnoses; similar objects are easily distinguished in high-resolution satellite images; and the performance of pattern recognition in computer vision is greatly enhanced when high-resolution images are available.
Due to cost and technical limitations, however, the images obtained in most cases fall short of the desired resolution. The image is subject to various degradations during acquisition, such as optical blur caused by defocusing and diffraction in digital imaging, motion blur caused by limited shutter speed, aliasing determined by the density of the sensing units, and random noise introduced in the photoreceptor or during transmission; these factors degrade the quality of the generated image. It is therefore highly desirable to find ways to enhance the achievable resolution.
Image super-resolution reconstruction, as a post-processing technique, can enhance the resolution of an image without increasing hardware cost. It aims to algorithmically restore a high-resolution image from a given low-resolution image. Current methods fall into three main categories: interpolation-based, reconstruction-based, and learning-based super-resolution. Interpolation methods enlarge the image by inserting new pixels around the original ones and assigning them values, thereby restoring the image content and improving the apparent resolution; they are computationally simple and easy to understand and implement, but their results suffer from ringing artifacts and severe loss of high-frequency information. Reconstruction methods build an observation model of the image acquisition process and then achieve super-resolution by solving the inverse problem of that model; they improve the recovery of details, but their performance degrades as the scale factor increases, and they are time-consuming. Deep-learning-based super-resolution recovers high-frequency information through an end-to-end mapping learned between low-resolution and high-resolution image patches, and achieves good reconstruction results.
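The interpolation-based approach described above can be illustrated with a minimal sketch; nearest-neighbor interpolation is shown here as the simplest case (bicubic follows the same pattern with a smoother kernel). This is illustrative NumPy code, not part of the claimed invention.

```python
import numpy as np

def nearest_neighbor_upscale(img: np.ndarray, scale: int) -> np.ndarray:
    """Upscale an H x W (or H x W x C) image by an integer factor
    by repeating each pixel `scale` times along both spatial axes."""
    up = np.repeat(img, scale, axis=0)   # duplicate rows
    up = np.repeat(up, scale, axis=1)    # duplicate columns
    return up

lr = np.arange(4).reshape(2, 2)          # a tiny 2x2 "image"
sr = nearest_neighbor_upscale(lr, 2)     # 4x4 upscaled image
```

Because every new pixel merely copies a neighbor, no high-frequency content is recovered — exactly the limitation of interpolation methods noted above.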
Mainstream algorithms currently tend to design very deep network architectures that require long training; as the network deepens, training becomes harder and demands more skill. Meanwhile, the low-resolution input contains abundant low-frequency information that is treated equally across channels, which hinders the convolutional neural network from learning more high-frequency information. In addition, current convolutional super-resolution networks do not fully exploit features at multiple scales, limiting the network's learning capability, and models built from convolutions alone cannot fully exploit the self-similarity within an image to capture its long-range dependencies. It is therefore necessary to address these problems and reconstruct high-quality images.
Disclosure of Invention
In view of this, the present invention provides an image super-resolution reconstruction model and method based on a residual hybrid attention network, so as to recover more texture details and improve the image super-resolution reconstruction accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme: the image super-resolution reconstruction model based on the residual mixed attention network comprises a shallow layer feature extraction module, a deep layer feature extraction module and a reconstruction module: the shallow feature extraction module is composed of a 3 x 3 convolutional layer, extracts shallow features of the low-resolution input image by using the characteristic that the convolutional layer is good at extracting the features, and specifically operates as follows:
M_0 = H_SF(I_LR)
where H_SF(·) denotes the shallow feature extraction module, I_LR the input low-resolution image, and M_0 the shallow feature map;
the deep feature extraction module consists of n cascaded residual separation mixed attention groups and a 3×3 convolution, and extracts high-level features from the shallow features; the process is expressed as:
M_i = H^i_RSHAG(M_{i-1}), i = 1, …, n
M_DF = f_{3×3}(M_n)
where H^i_RSHAG(·) denotes the i-th residual separation mixed attention group, n the number of groups, M_{i-1}, M_i, M_n the intermediate feature maps of the groups, f_{3×3}(·) a convolution with kernel size 3×3, and M_DF the deep feature map.
The reconstruction module is composed of a sub-pixel convolutional layer and a 3×3 convolution; the sub-pixel convolutional layer upsamples the deep features extracted by the deep feature extraction module, rearranging the information flow into a feature map at the specified upsampling factor. The process is described as follows:
I_SR = H_UP(M_DF + M_0) + Bicubic(I_LR)
where I_SR denotes the high-resolution image after super-resolution reconstruction, H_UP(·) the reconstruction module, and Bicubic(·) bicubic interpolation of the low-resolution image to the target resolution.
In a preferred embodiment, the multi-residual separation mixed attention set is composed of a plurality of residual separation mixed attention modules and a 3 × 3 convolutional layer.
In a preferred embodiment, the residual separation mixed attention module comprises two 1 × 1 convolution layers, a residual triple attention module, an efficient Swin Transformer module, residual concatenation, channel splitting and splicing operations, and the specific calculation formula expression is as follows:
X_1, X_2 = H_SPL(f_{1×1}(X))
Y_1 = RTAB(X_1)
Y_2 = ESTB(X_2)
Z = f_{1×1}(H_CAT(Y_1, Y_2)) + X
where X denotes the input features of the residual separation mixed attention module, f_{1×1}(·) a 1×1 convolutional layer, H_SPL(·) the channel splitting operation, X_1 and X_2 the split feature maps, RTAB(·) the residual triple attention module and Y_1 its output features, ESTB(·) the efficient Swin Transformer module and Y_2 its output features, H_CAT(·) the channel concatenation operation, and Z the output features of the residual separation mixed attention module.
In a preferred embodiment, the residual triple attention module is composed of two 3 × 3 convolutional layers, a RELU activation function, a residual connection, and a triple attention module, and the specific formula is as follows:
X_O = F_TAM(f_{3×3}(RELU(f_{3×3}(X_I)))) + X_I
where X_I denotes the input features of the residual triple attention module, X_O its output features, F_TAM(·) the triple attention module, and RELU(·) the ReLU activation function; the triple attention module is composed of two cross-dimension interaction modules and a spatial attention module, and its output averages the three branches:
Y = (X̂_1 + X̂_2 + X̂_3) / 3
where X̂_1 and X̂_2 denote the output features of the two cross-dimension interaction modules, X̂_3 the output features of the spatial attention module, and Y the output features of the triple attention module.
In a preferred embodiment, the cross-dimension interaction module comprises a 7×7 convolutional layer, a dimension permutation operation, a Z-Pool layer, a channel concatenation operation, a maximum pooling operation, and an average pooling operation; the specific calculation formulas are as follows:
X'_1 = H_PER(X_1)
Z-Pool(X) = H_CAT(H_MP(X), H_AP(X))
X''_1 = Z-Pool(X'_1)
X̂_1 = H_PER(X'_1 ⊙ δ(H_IN(f_{7×7}(X''_1))))
where X'_1 denotes the result of the dimension permutation operation, X''_1 the result of the Z-Pool layer, H_PER(·) the dimension permutation operation, Z-Pool(·) the Z-Pool layer, H_CAT(·) concatenation along a given dimension of the input feature maps, H_MP(·) and H_AP(·) the maximum and average pooling operations along a given dimension, f_{7×7}(·) a convolution with kernel size 7×7, H_IN(·) the instance normalization operation, δ(·) the Sigmoid function, ⊙ the channel-wise multiplication operation, and X̂_1 the output features of the cross-dimension interaction module.
In a preferred embodiment, the spatial attention module comprises a 7×7 convolutional layer, a Z-Pool layer, and an instance normalization operation; the specific calculation formula is:
X̂_3 = X_3 ⊙ δ(H_IN(f_{7×7}(Z-Pool(X_3))))
where X_3 and X̂_3 denote the input and output features of the spatial attention module.
In a preferred embodiment, the efficient Swin Transformer module consists of two layer normalization operations, a moving-window self-attention calculation module, a locally-enhanced feed-forward network, and residual connections; its calculation is as follows:
Q = LN(W_Q X), K = LN(W_K X), V = LN(W_V X)
Attention(Q, K, V) = SoftMax(QK^T / √d) V
X' = SW-MSA(LN(X)) + X
X̂ = LeFF(LN(X')) + X'
where W_Q, W_K, W_V denote the transformation matrices computing Q, K, V; LN(·) the layer normalization operation; Q, K, V the query, key, and value matrices; SoftMax(·) the SoftMax function; SW-MSA(·) the moving-window self-attention module; LeFF(·) the locally-enhanced multi-layer perceptron module; X̂ the output features of the efficient Swin Transformer module; d the dimension of the K matrix; and Attention the self-attention computation.
The invention also provides an image super-resolution reconstruction method based on the residual mixed attention network, which adopts the image super-resolution reconstruction model based on the residual mixed attention network and comprises the following steps:
Step S1: establishing a training set according to the image degradation model, obtaining N low-resolution images I_LR and the N corresponding ground-truth high-resolution images I_HR, where N is an integer greater than 1;
Step S2: inputting the low-resolution image into the shallow feature extraction module to extract the shallow features of the image;
Step S3: inputting the shallow features into the deep feature extraction module to extract deep features;
Step S4: inputting the deep features into the reconstruction module, where sub-pixel convolution completes the upsampling and the final high-resolution image is reconstructed;
Step S5: optimizing the image super-resolution reconstruction model through a loss function; the loss function is the average L1 error between the N reconstructed high-resolution images and the corresponding ground-truth high-resolution images:
L = (1/N) Σ_{i=1}^{N} L1(I^i_SR, I^i_HR)
where L1(·,·) denotes the L1 loss function.
Compared with the prior art, the invention has the following beneficial effects: it combines a convolutional neural network and a Transformer, adopting a channel separation technique to split the feature map and feed the halves into two branch modules for parallel processing. Through the residual separation mixed attention module, the local features extracted by the convolution-based triple attention module are fused with the global features extracted by the Transformer-based efficient Swin Transformer module to obtain rich high- and low-frequency information. In this way, images with richer details are obtained and super-resolution reconstruction with higher precision is achieved.
Drawings
FIG. 1 is a diagram of a residual separation hybrid attention network architecture in a preferred embodiment of the present invention;
FIG. 2 is a diagram of a residual separation mixed attention group configuration in a preferred embodiment of the present invention;
FIG. 3 is a block diagram of a residual separation hybrid attention module in a preferred embodiment of the present invention;
FIG. 4 is a diagram of a residual triple attention module in a preferred embodiment of the present invention;
FIG. 5 is a block diagram of a high efficiency Swin Transformer module in a preferred embodiment of the present invention;
fig. 6 is a flowchart illustrating an image super-resolution reconstruction method according to a preferred embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application; as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1 to 6, the present embodiment provides an image super-resolution reconstruction model based on a residual separation hybrid attention network. The model includes a shallow feature extraction module, a deep feature extraction module, and a reconstruction module. The shallow feature extraction module extracts shallow features from the low-resolution image; the deep feature extraction module, formed by several cascaded residual separation mixed attention groups and a global residual connection, performs feature extraction and fusion on the shallow features to obtain deep features; the reconstruction module upsamples the deep features with the sub-pixel convolutional layer to obtain a higher-resolution image. The residual separation mixed attention module splits the feature map by channel separation and processes the halves in parallel in two branch modules, fusing the local features extracted by the residual triple attention module with the global features extracted by the efficient Swin Transformer module to obtain rich high- and low-frequency information.
Step 1, performing shallow feature extraction on an input image, specifically:
by utilizing the characteristic that the convolutional layer is good at extracting the features, a 3 x 3 convolutional layer is used for extracting the shallow features of the low-resolution input image, and the specific operation is as follows:
M_0 = H_SF(I_LR)
where H_SF(·) denotes the shallow feature extraction module, I_LR the input low-resolution image, and M_0 the shallow feature map;
and 2, performing feature extraction and feature fusion on the shallow features through a deep feature extraction module formed by connecting a plurality of cascade residual separation mixed attention groups and global residuals, and obtaining deep features:
step 2.1, as shown in fig. 2, by inputting the shallow feature into the deep feature extraction module formed by connecting n cascaded residual separation mixed attention groups and the global residual, feature extraction and feature fusion can be performed on the shallow feature to obtain richer and deeper features, where the cascaded residual separation mixed attention group includes m cascaded residual separation mixed attention modules, where n =6 and m =6 in this example. The specific calculation is represented by the following formula:
M_i = H^i_RSHAG(M_{i-1}), i = 1, …, n
M_DF = f_{3×3}(M_n)
where H^i_RSHAG(·) denotes the i-th residual separation mixed attention group, n the number of groups, M_{i-1}, M_i, M_n the intermediate feature maps of the groups, f_{3×3}(·) a convolution with kernel size 3×3, and M_DF the deep feature map.
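The cascade of residual groups followed by a 3×3 convolution reduces to a simple functional composition. In the NumPy sketch below the learned groups and the convolution are hypothetical stand-in functions; only the data flow of the deep feature extraction module is illustrated, not the trained network.

```python
import numpy as np

def deep_feature_extraction(m0, groups, conv3x3):
    """Chain n residual-separation mixed-attention groups, then a 3x3 conv.
    `groups` and `conv3x3` are stand-ins for the learned modules."""
    m = m0
    for g in groups:          # M_i = H_i(M_{i-1})
        m = g(m)
    return conv3x3(m)         # M_DF = f_3x3(M_n)

# toy stand-ins: each "group" adds a small residual; the "conv" is identity
groups = [lambda x: x + 0.1 * x for _ in range(6)]   # n = 6 groups
m0 = np.ones((4, 4))                                  # shallow features M_0
m_df = deep_feature_extraction(m0, groups, lambda x: x)
fused = m_df + m0   # global residual: the reconstruction module gets M_DF + M_0
```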
Fig. 3 is a block diagram of a residual separation mixed attention module comprising 1 residual triple attention module, 1 high efficiency Swin Transformer module, two 1 × 1 convolution layers and channel splitting and joining operations, whose calculation is represented by the following formula:
X_1, X_2 = H_SPL(f_{1×1}(X))
Y_1 = RTAB(X_1)
Y_2 = ESTB(X_2)
Z = f_{1×1}(H_CAT(Y_1, Y_2)) + X
where X denotes the input features of the residual separation mixed attention module, f_{1×1}(·) a 1×1 convolutional layer, H_SPL(·) the channel splitting operation, X_1 and X_2 the split feature maps, RTAB(·) the residual triple attention module and Y_1 its output features, ESTB(·) the efficient Swin Transformer module and Y_2 its output features, H_CAT(·) the channel concatenation operation, and Z the output features of the residual separation mixed attention module.
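The split / two-branch / concatenate / residual data flow of the residual separation mixed attention module can be sketched as follows. The two learned branches (RTAB and ESTB) are replaced by hypothetical scaling functions, and the 1×1 channel-mixing convolutions before the split and after the concatenation are omitted for brevity.

```python
import numpy as np

def rshab_forward(x, rtab, estb):
    """x: (C, H, W) feature map. Split channels in half, route the halves
    through the local and global branches, concatenate, add the residual."""
    c = x.shape[0]
    x1, x2 = x[: c // 2], x[c // 2 :]        # H_SPL: channel split
    y1 = rtab(x1)                             # local branch (RTAB)
    y2 = estb(x2)                             # global branch (ESTB)
    z = np.concatenate([y1, y2], axis=0)      # H_CAT: channel concatenation
    return z + x                              # residual connection

x = np.random.rand(8, 4, 4)
z = rshab_forward(x, rtab=lambda t: t * 2.0, estb=lambda t: t * 3.0)
```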
Fig. 4 is a block diagram of the residual triple attention module, which contains two 3×3 convolutional layers, the RELU activation function, the triple attention module, an average pooling operation, and a residual connection. By exploiting the correlation among the different dimensions of the feature map, the residual triple attention module makes full use of the CNN's strength at extracting inherent features, allowing the network to learn richer image information. Its calculation is represented by the following formula:
X_O = F_TAM(f_{3×3}(RELU(f_{3×3}(X_I)))) + X_I
where X_I denotes the input features of the residual triple attention module, X_O its output features, F_TAM(·) the triple attention module, and RELU(·) the ReLU activation function. The triple attention module is composed of two cross-dimension interaction modules and a spatial attention module, and its output averages the three branches:
Y = (X̂_1 + X̂_2 + X̂_3) / 3
where X̂_1 and X̂_2 denote the output features of the two cross-dimension interaction modules, X̂_3 the output features of the spatial attention module, and Y the output features of the triple attention module. The cross-dimension interaction module comprises a 7×7 convolutional layer, a dimension permutation operation, a Z-Pool layer, a channel concatenation operation, a maximum pooling operation, and an average pooling operation. The specific calculation formulas are as follows:
X'_1 = H_PER(X_1)
Z-Pool(X) = H_CAT(H_MP(X), H_AP(X))
X''_1 = Z-Pool(X'_1)
X̂_1 = H_PER(X'_1 ⊙ δ(H_IN(f_{7×7}(X''_1))))
where X'_1 denotes the result of the dimension permutation operation, X''_1 the result of the Z-Pool layer, H_PER(·) the dimension permutation operation, Z-Pool(·) the Z-Pool layer, H_CAT(·) concatenation along a given dimension of the input feature maps, H_MP(·) and H_AP(·) the maximum and average pooling operations along a given dimension, f_{7×7}(·) a convolution with kernel size 7×7, H_IN(·) the instance normalization operation, δ(·) the Sigmoid function, ⊙ the channel-wise multiplication operation, and X̂_1 the output features of the cross-dimension interaction module.
The spatial attention module comprises a 7×7 convolutional layer, a Z-Pool layer, and an instance normalization operation; the specific calculation formula is:
X̂_3 = X_3 ⊙ δ(H_IN(f_{7×7}(Z-Pool(X_3))))
where X_3 and X̂_3 denote the input and output features of the spatial attention module.
Fig. 5 is a block diagram of an efficient Swin Transformer module that includes two layer normalization operations, a moving window self-attention module, a local enhanced multi-layer perceptron module, and residual join operations. Its calculation is represented by the following formula:
Q = LN(W_Q X), K = LN(W_K X), V = LN(W_V X)
Attention(Q, K, V) = SoftMax(QK^T / √d) V
X' = SW-MSA(LN(X)) + X
X̂ = LeFF(LN(X')) + X'
where W_Q, W_K, W_V denote the transformation matrices computing Q, K, V; d the dimension of the K matrix; Attention the self-attention computation; LN(·) the layer normalization operation; Q, K, V the query, key, and value matrices; SoftMax(·) the SoftMax function; SW-MSA(·) the moving-window self-attention module; LeFF(·) the locally-enhanced multi-layer perceptron module; and X̂ the output features of the efficient Swin Transformer module.
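The self-attention computation Attention(Q, K, V) = SoftMax(QK^T/√d)V at the core of the efficient Swin Transformer module can be sketched in NumPy. This is a single head with no windowing, shifting, or learned projections — only the scaled dot-product step itself.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stabilized
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q, k, v):
    """q, k, v: (N, d) token matrices. Scaled dot-product attention."""
    d = k.shape[-1]
    scores = q @ k.T / np.sqrt(d)          # (N, N) pairwise similarities
    return softmax(scores, axis=-1) @ v    # attention-weighted sum of values

n, d = 16, 32   # e.g. 16 tokens of one local window, embedding dim 32
q, k, v = (np.random.randn(n, d) for _ in range(3))
out = self_attention(q, k, v)
```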
And 3, reconstructing the deep characteristic diagram, and reconstructing the characteristic diagram predicted by up-sampling into a high-resolution image, wherein the expression is as follows:
I_SR = H_UP(M_DF + M_0) + Bicubic(I_LR)
where I_SR denotes the high-resolution image after super-resolution reconstruction, H_UP(·) the reconstruction module, and Bicubic(·) bicubic interpolation of the low-resolution image to the target resolution.
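The upsampling step of the sub-pixel convolutional layer is a fixed rearrangement of r²·C channels into an r-times-larger spatial grid (the pixel-shuffle operation). A NumPy sketch of that rearrangement:

```python
import numpy as np

def pixel_shuffle(x, r):
    """x: (C*r*r, H, W) -> (C, H*r, W*r), the sub-pixel rearrangement."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)       # split out the r x r sub-pixel factors
    x = x.transpose(0, 3, 1, 4, 2)     # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)  # interleave sub-pixels into the grid

feat = np.random.rand(3 * 4, 8, 8)     # deep features for r = 2, C = 3
img = pixel_shuffle(feat, 2)           # (3, 16, 16) upsampled map
```

In the full model this rearrangement follows a convolution that produces the r²·C channels, and the bicubic-upsampled input is added on top as the global skip.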
The image super-resolution reconstruction method applied to the image super-resolution reconstruction model comprises the following specific contents.
S1, establishing a training set according to the image degradation model, obtaining N low-resolution images I_LR and the N corresponding ground-truth high-resolution images I_HR, where N is an integer greater than 1;
S2, inputting the low-resolution image into the shallow feature extraction module to extract the shallow features of the image;
S3, inputting the shallow features into the deep feature extraction module to extract deep features;
S4, inputting the deep features into the reconstruction module, where sub-pixel convolution completes the upsampling and the final high-resolution image is reconstructed;
s5, optimizing the image super-resolution reconstruction model through a loss function, wherein the loss function uses an average L1 error between the N reconstructed high-resolution images and the corresponding real high-resolution images, and the expression is as follows:
wherein L1 represents the L1 loss function.
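The mean-L1 training objective of step S5 is, in NumPy:

```python
import numpy as np

def l1_loss(sr_batch, hr_batch):
    """Mean absolute error between N reconstructed images and their
    ground-truth high-resolution counterparts (the training loss)."""
    return np.mean(np.abs(sr_batch - hr_batch))

sr = np.zeros((4, 3, 8, 8))       # N = 4 reconstructed images (toy values)
hr = np.full((4, 3, 8, 8), 0.5)   # corresponding ground truth
loss = l1_loss(sr, hr)
```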
In order to better illustrate the effectiveness of the present invention, the examples of the present invention also employ a comparative experiment to compare the reconstruction effect.
Specifically, in the embodiment of the present invention, the 800 high-resolution images of DIV2K are used as the training set, and Set5, Set14, B100, Urban100, and Manga109 are used as test sets. The corresponding low-resolution images are obtained by bicubic downsampling of the original high-resolution images.
After constructing the training set, the model was trained and tested with the PyTorch framework. Low-resolution training images were cropped into 64×64 patches, with 48 patches randomly sampled per batch, and training ran for 500 epochs. Network parameters were optimized with the Adam gradient descent method, with the Adam optimizer parameters set to β1 = 0.9, β2 = 0.999, and ε = 10^-8. The learning rate was initialized to 2×10^-4 and halved after epochs 250, 400, 425, 450, and 475. The number of RSHABs was set to 36 and the number of channels to 180. Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) were used to evaluate model performance on the five benchmark datasets Set5, Set14, B100, Urban100, and Manga109; 11 representative image super-resolution reconstruction methods were selected for the comparison experiment, with results shown in Table 1, where RSHAN denotes the method proposed by the invention.
TABLE 1 average PSNR and SSIM value comparison across 5 test sets
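The training configuration described above (Adam with the stated betas, initial learning rate 2 × 10⁻⁴, halved at the listed epochs) can be sketched in PyTorch as follows; the stand-in model is a placeholder, not the RSHAN network itself:

```python
import torch

# A stand-in module; the actual RSHAN network is not reproduced here.
model = torch.nn.Conv2d(3, 180, kernel_size=3, padding=1)

# Adam with beta1 = 0.9, beta2 = 0.999, eps = 1e-8 and initial lr = 2e-4,
# as described in the embodiment.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4,
                             betas=(0.9, 0.999), eps=1e-8)

# Halve the learning rate after the 250th, 400th, 425th, 450th, 475th epoch.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[250, 400, 425, 450, 475], gamma=0.5)

for epoch in range(500):
    optimizer.step()      # the per-epoch parameter updates would go here
    scheduler.step()

final_lr = optimizer.param_groups[0]["lr"]  # 2e-4 halved five times
```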
In summary, compared with the prior art, the present invention has the following advantages and effects:
(1) The embodiment of the invention adopts the residual separation mixed attention module to combine the local modeling capability of the convolutional-neural-network-based residual triple attention module with the non-local modeling capability of the efficient Swin Transformer module. On the premise that the number of parameters remains similar to that of the current best-performing reconstruction model, the information relations among different dimensions of the image are effectively exploited, the ability of the super-resolution reconstruction model to extract high-frequency information is significantly improved, and the captured long-distance dependencies make the extracted features richer.
(2) The embodiment of the invention adopts a structure in which cascaded residuals are embedded within a global residual, so that the network can bypass the low-frequency information in the low-resolution input, learn more high-frequency residual information, and acquire rich detail features. A high-quality high-resolution reconstructed image can thus be obtained without building a very deep network.
Claims (8)
1. The image super-resolution reconstruction model based on the residual mixed attention network is characterized by comprising a shallow feature extraction module, a deep feature extraction module and a reconstruction module: the shallow feature extraction module is composed of a 3 × 3 convolutional layer and, exploiting the convolutional layer's strength at feature extraction, extracts shallow features from the low-resolution input image; the specific operation is:

M_0 = H_SF(I_LR)

wherein H_SF(·) represents the shallow feature extraction module, I_LR represents the input low-resolution image, and M_0 represents the shallow feature map;
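A minimal PyTorch sketch of the shallow feature extraction H_SF; the 180 output channels follow the channel setting stated in the embodiment, and the tensor names are illustrative:

```python
import torch
import torch.nn as nn

# H_SF as a single 3x3 convolution mapping the RGB input to the feature
# width used elsewhere in the document (180 channels).
shallow_extract = nn.Conv2d(in_channels=3, out_channels=180,
                            kernel_size=3, padding=1)

i_lr = torch.randn(1, 3, 64, 64)   # a low-resolution input I_LR
m0 = shallow_extract(i_lr)         # shallow feature map M_0, same spatial size
```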
the deep feature extraction module consists of multiple residual separation mixed attention groups and a 3 × 3 convolution, and extracts high-level features from the shallow features; the process is expressed as follows:

M_i = H_i(M_{i-1}), i = 1, 2, …, n
M_DF = f_3×3(M_n)

wherein H_i(·) denotes the i-th multiple residual separation mixed attention group, n denotes the number of multiple residual separation mixed attention groups, M_{i-1}, M_i and M_n denote intermediate feature maps of the multiple residual separation mixed attention groups, f_3×3(·) denotes a convolution operation with a kernel size of 3 × 3, and M_DF denotes the deep feature map;
the reconstruction module is composed of a sub-pixel convolution layer and a 3 × 3 convolution; the sub-pixel convolution layer up-samples the deep features extracted by the deep feature extraction module, reshaping the information flow into a feature map at the specified up-sampling ratio; the process is described as follows:

I_SR = H_UP(M_DF + M_0) + Bicubic(I_LR)

wherein I_SR represents the high-resolution image after super-resolution reconstruction, H_UP(·) denotes the reconstruction module, and Bicubic(·) denotes bicubic interpolation of the low-resolution image to the target resolution.
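The reconstruction step can be sketched in PyTorch with `nn.PixelShuffle` as the sub-pixel convolution; the exact layer order and the ×4 scale are illustrative assumptions, not claim limitations:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

scale, channels = 4, 180   # x4 upscaling; channel width from the embodiment

# H_UP sketched as: 3x3 conv expanding to 3*scale^2 channels, PixelShuffle
# (sub-pixel convolution) rearranging them into a scale-times larger RGB map,
# then a final 3x3 conv.
reconstruct = nn.Sequential(
    nn.Conv2d(channels, 3 * scale ** 2, kernel_size=3, padding=1),
    nn.PixelShuffle(scale),
    nn.Conv2d(3, 3, kernel_size=3, padding=1),
)

i_lr = torch.randn(1, 3, 48, 48)
deep_plus_shallow = torch.randn(1, channels, 48, 48)   # stands in for M_DF + M_0
i_sr = reconstruct(deep_plus_shallow) + F.interpolate(
    i_lr, scale_factor=scale, mode="bicubic", align_corners=False)
```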
2. The residual mixed attention network-based image super-resolution reconstruction model of claim 1, wherein the multiple residual separation mixed attention group is composed of a plurality of residual separation mixed attention modules and a 3 × 3 convolutional layer.
3. The residual mixed attention network-based image super-resolution reconstruction model of claim 2, wherein the residual separation mixed attention module comprises two 1 × 1 convolution layers, a residual triple attention module, an efficient Swin Transformer module, a residual connection, and channel splitting and splicing operations; the specific calculation formulas are as follows:

X_1, X_2 = H_SPL(f_1×1(X))
Y_1 = RTAB(X_1)
Y_2 = ESTB(X_2)
Z = f_1×1(H_CAT(Y_1, Y_2)) + X

wherein X represents the input features of the residual separation mixed attention module, f_1×1(·) represents a 1 × 1 convolutional layer, H_SPL(·) represents the channel splitting operation, X_1 and X_2 represent the feature maps after splitting, RTAB(·) represents the operation of the residual triple attention module, Y_1 represents the output features of the residual triple attention module, ESTB(·) represents the operation of the efficient Swin Transformer module, Y_2 represents the output features of the efficient Swin Transformer module, H_CAT(·) denotes the channel splicing operation, and Z denotes the output features of the residual separation mixed attention module.
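The split/merge wiring of this module can be sketched as follows; the two attention branches are replaced by identity placeholders (the real RTAB and ESTB are not reproduced), so only the 1 × 1 convolutions, channel split/concat, and residual connection are shown:

```python
import torch
import torch.nn as nn

class ResidualSeparationMixedAttention(nn.Module):
    """Wiring skeleton of the residual separation mixed attention module."""

    def __init__(self, channels: int = 180):
        super().__init__()
        self.pre = nn.Conv2d(channels, channels, kernel_size=1)
        self.post = nn.Conv2d(channels, channels, kernel_size=1)
        self.rtab = nn.Identity()   # placeholder: residual triple attention
        self.estb = nn.Identity()   # placeholder: efficient Swin Transformer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.chunk(self.pre(x), 2, dim=1)          # H_SPL
        y1, y2 = self.rtab(x1), self.estb(x2)
        return self.post(torch.cat([y1, y2], dim=1)) + x     # H_CAT + residual

block = ResidualSeparationMixedAttention(channels=8)
z = block(torch.randn(2, 8, 16, 16))   # output keeps the input shape
```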
4. The residual mixed attention network-based image super-resolution reconstruction model of claim 3, wherein the residual triple attention module is composed of two 3 × 3 convolution layers, a ReLU activation function, a residual connection, and a triple attention module; the specific formula is as follows:

X_O = F_TAM(f_3×3(ReLU(f_3×3(X_I)))) + X_I

wherein X_I represents the input features of the residual triple attention module, X_O represents the output features of the residual triple attention module, F_TAM(·) represents the operation of the triple attention module, and ReLU(·) represents the ReLU activation function; the triple attention module is composed of two cross-dimension interaction modules and a spatial attention module, and its output is the average of the three branch outputs:

F_TAM(X) = (1/3)(F_CD1(X) + F_CD2(X) + F_SA(X))

wherein F_CD1(·) and F_CD2(·) denote the two cross-dimension interaction modules and F_SA(·) denotes the spatial attention module.
5. The residual mixed attention network-based image super-resolution reconstruction model of claim 4, wherein the cross-dimension interaction module comprises a 7 × 7 convolutional layer, a dimension permutation operation, a Z-Pool layer, a channel connection operation, a maximum pooling operation and an average pooling operation; the specific calculation formulas are as follows:

X'_1 = H_PER(X_1)
Z-Pool(X) = H_CAT(H_MP(X), H_AP(X))
X''_1 = Z-Pool(X'_1)
Y = H_PER(X'_1 · δ(H_IN(f_7×7(X''_1))))

wherein X'_1 denotes the result of the dimension permutation operation, X''_1 denotes the result of the Z-Pool layer operation, Y denotes the output of the cross-dimension interaction module, H_PER(·) denotes the dimension permutation operation, Z-Pool(·) denotes the operation of the Z-Pool layer, H_CAT(·) denotes the operation of connecting the given input sequence of feature maps along a particular dimension, H_MP(·) and H_AP(·) denote the maximum pooling operation and the average pooling operation along a particular dimension, respectively, f_7×7(·) denotes a convolution operation with a convolution kernel size of 7 × 7, H_IN(·) denotes the instance normalization operation, δ(·) denotes the Sigmoid function, and · denotes the channel-wise multiplication operation.
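The Z-Pool operation defined above (channel connection of max- and average-pooled maps along a dimension) has a direct PyTorch sketch:

```python
import torch

def z_pool(x: torch.Tensor) -> torch.Tensor:
    """Z-Pool: concatenate max-pooling and average-pooling along dim 1,
    compressing that dimension to two channels."""
    return torch.cat([x.max(dim=1, keepdim=True).values,
                      x.mean(dim=1, keepdim=True)], dim=1)

x = torch.randn(2, 16, 32, 32)
pooled = z_pool(x)   # the 16 channels collapse to 2: (2, 2, 32, 32)
```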
6. The residual mixed attention network-based image super-resolution reconstruction model of claim 4, wherein the spatial attention module comprises a 7 × 7 convolutional layer, a Z-Pool layer and an instance normalization operation; the specific calculation formula is as follows:

X_SA = X · δ(H_IN(f_7×7(Z-Pool(X))))

wherein X denotes the input features of the spatial attention module, X_SA denotes its output features, and the remaining symbols are as defined in claim 5.
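A sketch of the spatial attention branch built from the listed components (Z-Pool, 7 × 7 convolution, instance normalization, Sigmoid gate); the exact composition is inferred from those components, not prescribed by the claim:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Z-Pool -> 7x7 conv -> instance norm -> Sigmoid gate, multiplied
    back onto the input (composition assumed from the listed components)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.norm = nn.InstanceNorm2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pooled = torch.cat([x.max(dim=1, keepdim=True).values,
                            x.mean(dim=1, keepdim=True)], dim=1)  # Z-Pool
        gate = torch.sigmoid(self.norm(self.conv(pooled)))        # delta(H_IN(f_7x7(.)))
        return x * gate

sa = SpatialAttention()
out = sa(torch.randn(1, 16, 24, 24))   # attention preserves the input shape
```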
7. The residual mixed attention network-based image super-resolution reconstruction model of claim 3, wherein the efficient Swin Transformer module is composed of two layer normalizations, a self-attention calculation module based on moving windows, a local feature extraction feed-forward network and residual connections;
the calculation formulas of the efficient Swin Transformer module are as follows:

Q = LN(W_Q X)
K = LN(W_K X)
V = LN(W_V X)
Attention(Q, K, V) = SoftMax(QKᵀ / √d) V
X' = SW-MSA(X) + X
X_ESTB = LeFF(LN(X')) + X'

wherein W_Q, W_K, W_V denote the transformation matrices computing Q, K, V, LN(·) denotes the layer normalization operation, Q, K, V denote the query, key and value matrices, SoftMax(·) denotes the SoftMax function, SW-MSA(·) denotes the moving-window self-attention module, LeFF(·) denotes the locally enhanced multi-layer perceptron module, X_ESTB denotes the output features of the efficient Swin Transformer module, d is the dimension of the K matrix, and Attention(·) denotes the self-attention calculation.
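The core of the self-attention calculation, SoftMax(QKᵀ/√d)V, can be sketched directly; the window size and embedding dimension below are arbitrary toy values:

```python
import torch

def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention: SoftMax(Q K^T / sqrt(d)) V."""
    d = k.shape[-1]
    scores = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    return scores @ v

# Toy window of 64 tokens (e.g. an 8x8 window) with embedding dimension 32.
q = k = torch.randn(1, 64, 32)
v = torch.randn(1, 64, 32)
out = attention(q, k, v)   # one output vector per query token
```

With a single key, the SoftMax weight is exactly 1, so the output reproduces the value vector unchanged.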
8. The image super-resolution reconstruction method based on the residual mixed attention network is characterized in that the image super-resolution reconstruction model based on the residual mixed attention network of any one of claims 1 to 7 is adopted, and the method comprises the following steps:
step S1: establishing a training set according to the image degradation model to obtain N low-resolution images I_LR and the N real high-resolution images I_HR corresponding to the low-resolution images I_LR; wherein N is an integer greater than 1;
step S2: inputting the low-resolution image into the shallow feature extraction module to extract shallow features of the image;
step S3: inputting the shallow features into the deep feature extraction module to extract deep features;
step S4: inputting the deep features into the reconstruction module, performing sub-pixel convolution to complete the up-sampling, and reconstructing the final high-resolution image;
step S5: optimizing the image super-resolution reconstruction model through a loss function, wherein the loss function uses the average L1 error between the N reconstructed high-resolution images and the corresponding real high-resolution images, and the expression is as follows:

L1 = (1/N) Σ_{i=1}^{N} ‖ I_SR^(i) − I_HR^(i) ‖₁

wherein L1 represents the L1 loss function, I_SR^(i) represents the i-th reconstructed high-resolution image, and I_HR^(i) represents the corresponding real high-resolution image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210940743.1A CN115222601A (en) | 2022-08-06 | 2022-08-06 | Image super-resolution reconstruction model and method based on residual mixed attention network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115222601A true CN115222601A (en) | 2022-10-21 |
Family
ID=83615969
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210940743.1A Pending CN115222601A (en) | 2022-08-06 | 2022-08-06 | Image super-resolution reconstruction model and method based on residual mixed attention network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115222601A (en) |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115761248A (en) * | 2022-12-22 | 2023-03-07 | 深圳大学 | Image processing method, device, equipment and storage medium |
CN116071243A (en) * | 2023-03-27 | 2023-05-05 | 江西师范大学 | Infrared image super-resolution reconstruction method based on edge enhancement |
CN116385265B (en) * | 2023-04-06 | 2023-10-17 | 北京交通大学 | Training method and device for image super-resolution network |
CN116385265A (en) * | 2023-04-06 | 2023-07-04 | 北京交通大学 | Training method and device for image super-resolution network |
CN116664397B (en) * | 2023-04-19 | 2023-11-10 | 太原理工大学 | TransSR-Net structured image super-resolution reconstruction method |
CN116664397A (en) * | 2023-04-19 | 2023-08-29 | 太原理工大学 | TransSR-Net structured image super-resolution reconstruction method |
CN116402692B (en) * | 2023-06-07 | 2023-08-18 | 江西财经大学 | Depth map super-resolution reconstruction method and system based on asymmetric cross attention |
CN116402692A (en) * | 2023-06-07 | 2023-07-07 | 江西财经大学 | Depth map super-resolution reconstruction method and system based on asymmetric cross attention |
CN116523759B (en) * | 2023-07-04 | 2023-09-05 | 江西财经大学 | Image super-resolution reconstruction method and system based on frequency decomposition and restarting mechanism |
CN116523759A (en) * | 2023-07-04 | 2023-08-01 | 江西财经大学 | Image super-resolution reconstruction method and system based on frequency decomposition and restarting mechanism |
CN117132472A (en) * | 2023-10-08 | 2023-11-28 | 兰州理工大学 | Forward-backward separable self-attention-based image super-resolution reconstruction method |
CN117132472B (en) * | 2023-10-08 | 2024-05-31 | 兰州理工大学 | Forward-backward separable self-attention-based image super-resolution reconstruction method |
CN117173025B (en) * | 2023-11-01 | 2024-03-01 | 华侨大学 | Single-frame image super-resolution method and system based on cross-layer mixed attention transducer |
CN117173025A (en) * | 2023-11-01 | 2023-12-05 | 华侨大学 | Single-frame image super-resolution method and system based on cross-layer mixed attention transducer |
CN117237197A (en) * | 2023-11-08 | 2023-12-15 | 华侨大学 | Image super-resolution method and device based on cross attention mechanism and Swin-transducer |
CN117237197B (en) * | 2023-11-08 | 2024-03-01 | 华侨大学 | Image super-resolution method and device based on cross attention mechanism |
CN117422614B (en) * | 2023-12-19 | 2024-03-12 | 华侨大学 | Single-frame image super-resolution method and device based on hybrid feature interaction transducer |
CN117422614A (en) * | 2023-12-19 | 2024-01-19 | 华侨大学 | Single-frame image super-resolution method and device based on hybrid feature interaction transducer |
CN117495680A (en) * | 2024-01-02 | 2024-02-02 | 华侨大学 | Multi-contrast nuclear magnetic resonance image super-resolution method based on feature fusion transducer |
CN117495680B (en) * | 2024-01-02 | 2024-05-24 | 华侨大学 | Multi-contrast nuclear magnetic resonance image super-resolution method based on feature fusion transducer |
CN117575915A (en) * | 2024-01-16 | 2024-02-20 | 闽南师范大学 | Image super-resolution reconstruction method, terminal equipment and storage medium |
CN117934289A (en) * | 2024-03-25 | 2024-04-26 | 山东师范大学 | System and method for integrating MRI super-resolution and synthesis tasks |
CN118052717A (en) * | 2024-04-15 | 2024-05-17 | 北京数慧时空信息技术有限公司 | Training method of image superdivision model and image superdivision method |
CN118097321A (en) * | 2024-04-29 | 2024-05-28 | 济南大学 | Vehicle image enhancement method and system based on CNN and transducer |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115222601A (en) | Image super-resolution reconstruction model and method based on residual mixed attention network | |
Fang et al. | A hybrid network of cnn and transformer for lightweight image super-resolution | |
CN113362223B (en) | Image super-resolution reconstruction method based on attention mechanism and two-channel network | |
CN111861961A (en) | Multi-scale residual error fusion model for single image super-resolution and restoration method thereof | |
CN111311490A (en) | Video super-resolution reconstruction method based on multi-frame fusion optical flow | |
CN110136062B (en) | Super-resolution reconstruction method combining semantic segmentation | |
CN111105352A (en) | Super-resolution image reconstruction method, system, computer device and storage medium | |
CN112365403B (en) | Video super-resolution recovery method based on deep learning and adjacent frames | |
Luo et al. | Lattice network for lightweight image restoration | |
CN115358932B (en) | Multi-scale feature fusion face super-resolution reconstruction method and system | |
CN109949217B (en) | Video super-resolution reconstruction method based on residual learning and implicit motion compensation | |
CN112949636B (en) | License plate super-resolution recognition method, system and computer readable medium | |
CN115496658A (en) | Lightweight image super-resolution reconstruction method based on double attention mechanism | |
CN112699844A (en) | Image super-resolution method based on multi-scale residual error level dense connection network | |
Yang et al. | License plate image super-resolution based on convolutional neural network | |
Chen et al. | Remote sensing image super-resolution via residual aggregation and split attentional fusion network | |
CN114926337A (en) | Single image super-resolution reconstruction method and system based on CNN and Transformer hybrid network | |
CN115526779A (en) | Infrared image super-resolution reconstruction method based on dynamic attention mechanism | |
Fan et al. | Global sensing and measurements reuse for image compressed sensing | |
CN109615576B (en) | Single-frame image super-resolution reconstruction method based on cascade regression basis learning | |
CN114022356A (en) | River course flow water level remote sensing image super-resolution method and system based on wavelet domain | |
CN113379606A (en) | Face super-resolution method based on pre-training generation model | |
CN116385265B (en) | Training method and device for image super-resolution network | |
Liu et al. | A densely connected face super-resolution network based on attention mechanism | |
CN115330631A (en) | Multi-scale fusion defogging method based on stacked hourglass network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||