CN115222601A - Image super-resolution reconstruction model and method based on residual mixed attention network - Google Patents

Image super-resolution reconstruction model and method based on residual mixed attention network

Info

Publication number
CN115222601A
CN115222601A (Application CN202210940743.1A)
Authority
CN
China
Prior art keywords: module, residual, attention, resolution, representing
Prior art date
Legal status: Pending
Application number
CN202210940743.1A
Other languages
Chinese (zh)
Inventor
黄峰
郑伟煌
沈英
吴靖
陈丽琼
Current Assignee
"belt And Road" Spatial Information Corridor Haisi Research Institute
Fuzhou University
Original Assignee
"belt And Road" Spatial Information Corridor Haisi Research Institute
Fuzhou University
Priority date: 2022-08-06
Filing date: 2022-08-06
Publication date: 2022-10-21
Application filed by "belt And Road" Spatial Information Corridor Haisi Research Institute and Fuzhou University
Priority to CN202210940743.1A
Publication of CN115222601A

Classifications

    • G06T 3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06N 3/02 — Neural networks; G06N 3/08 — Learning methods
    • G06V 10/40 — Extraction of image or video features
    • G06V 10/82 — Arrangements for image or video recognition or understanding using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an image super-resolution reconstruction model and method based on a residual mixed attention network. The model comprises a shallow feature extraction module, a deep feature extraction module and a reconstruction module. The shallow feature extraction module extracts shallow features from the low-resolution image; the deep feature extraction module, formed by connecting cascaded residual separation mixed attention groups with a global residual, performs feature extraction and fusion on the shallow features to obtain deep features; the reconstruction module upsamples the deep features with sub-pixel convolution to obtain an image of higher resolution. The residual separation mixed attention module splits the feature map with a channel separation technique and feeds the two parts into two branch modules in parallel, fusing the local features extracted by the residual triple attention module with the global features extracted by the efficient Swin Transformer module to obtain rich high-frequency and low-frequency information. In this way, an image with richer details can be obtained and higher-precision super-resolution reconstruction can be achieved.

Description

Image super-resolution reconstruction model and method based on residual mixed attention network
Technical Field
The invention relates to the technical field of computer vision and image processing, in particular to an image super-resolution reconstruction model and method based on a residual mixed attention network.
Background
In the field of electronic image applications, it is often desirable to obtain high-resolution images. High resolution means a high pixel density in the image, which provides detail that is essential in many practical applications. For example, high-resolution medical images are very helpful for physicians in making correct diagnoses; similar objects are more easily distinguished from one another in high-resolution satellite images; and the performance of pattern recognition in computer vision is greatly enhanced when high-resolution images are available.
Due to cost and technical limitations, the pictures obtained in most cases do not reach the desired resolution. In addition, the image is subject to various constraints during acquisition, such as optical blur caused by defocusing and diffraction in the digital imaging process, motion blur caused by limited shutter speed, aliasing effects influenced by the density of the sensing units, and random noise in the photoreceptor or during image transmission, all of which degrade the quality of the generated image. It is therefore highly necessary to find a way to enhance the current resolution level.
Image super-resolution reconstruction, as a post-processing technique, can enhance the resolution of an image without increasing hardware cost. Its goal is to algorithmically restore a high-resolution image from a given low-resolution image. Current image super-resolution reconstruction methods can be divided into three main categories: interpolation-based, reconstruction-based, and learning-based. Interpolation methods insert new pixels around the original pixels of the image to increase its size and assign values to the inserted pixels, thereby restoring the image content and improving the apparent resolution. Such methods are computationally simple and easy to understand and implement, but the reconstruction results can suffer from ringing artifacts and severe loss of high-frequency information. Reconstruction-based methods establish an observation model of the image acquisition process and then achieve super-resolution by solving the inverse problem of that model. They improve on the recovery of details, but their performance degrades as the scale factor increases, and they are time-consuming. Deep-learning-based super-resolution reconstruction recovers high-frequency information by learning an end-to-end mapping between low-resolution and high-resolution image patches and obtains good reconstruction results.
At present, mainstream algorithms usually design very deep network architectures, which require long training times; as the network grows deeper, training becomes more difficult and more training tricks are needed. Meanwhile, the low-resolution input contains abundant low-frequency information that is treated equally across channels, which prevents the convolutional neural network from learning more high-frequency information. In addition, current convolutional neural networks for super-resolution do not fully exploit features at multiple scales, which limits the learning capability of the network. Moreover, models built only from convolutional layers cannot fully exploit the self-similarity inside the image to capture long-range dependencies. It is therefore necessary to address these problems in order to reconstruct high-quality images.
Disclosure of Invention
In view of this, the present invention provides an image super-resolution reconstruction model and method based on a residual hybrid attention network, so as to recover more texture details and improve the image super-resolution reconstruction accuracy.
In order to achieve this purpose, the invention adopts the following technical scheme: the image super-resolution reconstruction model based on the residual mixed attention network comprises a shallow feature extraction module, a deep feature extraction module and a reconstruction module. The shallow feature extraction module is composed of a 3 × 3 convolutional layer and exploits the convolutional layer's strength at feature extraction to extract shallow features from the low-resolution input image:
M_0 = H_{SF}(I_{LR})
where H_{SF}(·) denotes the shallow feature extraction module, I_{LR} denotes the input low-resolution image, and M_0 denotes the shallow feature map;
the deep feature extraction module consists of multiple residual separation mixed attention groups and a 3 × 3 convolution, and extracts high-level features from the shallow features. The process is expressed as follows:
M_i = H^i_{RSHAG}(M_{i-1}), i = 1, 2, …, n
M_{DF} = f_{3×3}(M_n)
where H^i_{RSHAG}(·) denotes the i-th residual separation mixed attention group, n denotes the number of residual separation mixed attention groups, M_{i-1}, M_i and M_n denote the intermediate feature maps of the residual separation mixed attention groups, f_{3×3}(·) denotes a convolution operation with a 3 × 3 kernel, and M_{DF} denotes the deep feature map.
The reconstruction module is composed of a sub-pixel convolutional layer and a 3 × 3 convolution. The sub-pixel convolutional layer upsamples the deep features extracted by the deep feature extraction module, reshaping the information flow into a feature map at the specified upsampling factor. The process is described as follows:
I_{SR} = H_{UP}(M_{DF} + M_0) + Bicubic(I_{LR})
where I_{SR} denotes the high-resolution image after super-resolution reconstruction, H_{UP}(·) denotes the reconstruction module, and Bicubic(·) denotes bicubic interpolation of the low-resolution image to the target resolution.
In a preferred embodiment, the multiple residual separation mixed attention group is composed of a plurality of residual separation mixed attention modules and a 3 × 3 convolutional layer.
In a preferred embodiment, the residual separation mixed attention module comprises two 1 × 1 convolutional layers, a residual triple attention module, an efficient Swin Transformer module, a residual connection, and channel splitting and concatenation operations. The specific calculation formulas are as follows:
X_1, X_2 = H_{SPL}(f_{1×1}(X))
Y_1 = RTAB(X_1)
Y_2 = ESTB(X_2)
Z = f_{1×1}(H_{CAT}(Y_1, Y_2)) + X
where X denotes the input feature of the residual separation mixed attention module, f_{1×1}(·) denotes a 1 × 1 convolutional layer, H_{SPL}(·) denotes the channel splitting operation, X_1 and X_2 denote the split feature maps, RTAB(·) denotes the residual triple attention module and Y_1 its output feature, ESTB(·) denotes the efficient Swin Transformer module and Y_2 its output feature, H_{CAT}(·) denotes the channel concatenation operation, and Z denotes the output feature of the residual separation mixed attention module.
In a preferred embodiment, the residual triple attention module is composed of two 3 × 3 convolutional layers, a ReLU activation function, a residual connection and a triple attention module. The specific formula is as follows:
X_O = F_{TAM}(f_{3×3}(ReLU(f_{3×3}(X_I)))) + X_I
where X_I denotes the input feature of the residual triple attention module, X_O denotes its output feature, F_{TAM}(·) denotes the triple attention module, and ReLU(·) denotes the ReLU activation function; the triple attention module is composed of two cross-dimension interaction modules and a spatial attention module, and is computed as follows:
Y = (X̂_1 + X̂_2 + X̂_3) / 3
where X̂_1 and X̂_2 denote the output features of the two cross-dimension interaction modules, X̂_3 denotes the output feature of the spatial attention module, and Y denotes the output feature of the triple attention module.
In a preferred embodiment, the cross-dimension interaction module comprises a 7 × 7 convolutional layer, a dimension permutation operation, a Z-Pool layer, a channel concatenation operation, a maximum pooling operation and an average pooling operation; the specific calculation formulas are as follows:
X′_1 = H_{PER}(X_1)
Z-Pool(X) = H_{CAT}(H_{MP}(X), H_{AP}(X))
X″_1 = Z-Pool(X′_1)
X̂_1 = H_{PER}(δ(H_{IN}(f_{7×7}(X″_1))) · X′_1)
where X′_1 denotes the result of the dimension permutation operation, X″_1 denotes the result of the Z-Pool layer, H_{PER}(·) denotes the dimension permutation operation, Z-Pool(·) denotes the operation of the Z-Pool layer, H_{CAT}(·) denotes concatenation along a given dimension of the input feature maps, H_{MP}(·) and H_{AP}(·) denote the maximum pooling operation and the average pooling operation along that dimension respectively, f_{7×7}(·) denotes a convolution operation with a 7 × 7 kernel, H_{IN}(·) denotes the instance normalization operation, δ(·) denotes the Sigmoid function, · denotes the channel-wise multiplication operation, and X̂_1 denotes the output feature of the cross-dimension interaction module.
In a preferred embodiment, the spatial attention module comprises a 7 × 7 convolutional layer, a Z-Pool layer and an instance normalization operation; the specific calculation formula is as follows:
X̂_3 = δ(H_{IN}(f_{7×7}(Z-Pool(X_3)))) · X_3
where X_3 and X̂_3 denote the input feature and the output feature of the spatial attention module, respectively.
In a preferred embodiment, the efficient Swin Transformer module consists of two layer normalization operations, a self-attention calculation module based on moving windows, a local feature extraction feed-forward network and residual connections;
the calculation formulas of the efficient Swin Transformer module are as follows:
Q = LN(W_Q X)
K = LN(W_K X)
V = LN(W_V X)
Attention(Q, K, V) = SoftMax(QK^T / √d) V
X′ = SW-MSA(X) + X
X̂ = LeFF(LN(X′)) + X′
where W_Q, W_K and W_V denote the transformation matrices used to compute Q, K and V, LN(·) denotes the layer normalization operation, Q, K and V denote the query, key and value matrices respectively, SoftMax(·) denotes the SoftMax function, SW-MSA(·) denotes the moving-window self-attention module, LeFF(·) denotes the locally enhanced multi-layer perceptron module, X′ denotes the intermediate feature after the moving-window self-attention, X̂ denotes the output feature of the efficient Swin Transformer module, d is the dimension of the K matrix, and Attention(·, ·, ·) denotes the self-attention computation.
The invention also provides an image super-resolution reconstruction method based on the residual mixed attention network, which adopts the image super-resolution reconstruction model based on the residual mixed attention network and comprises the following steps:
Step S1: establishing a training set according to the image degradation model to obtain N low-resolution images I_{LR} and the N corresponding real high-resolution images I_{HR}, where N is an integer greater than 1;
Step S2: inputting the low-resolution images into the shallow feature extraction module to extract shallow features of the images;
Step S3: inputting the shallow features into the deep feature extraction module to extract deep features;
Step S4: inputting the deep features into the reconstruction module, performing sub-pixel convolution to complete the up-sampling processing, and reconstructing the final high-resolution image;
Step S5: optimizing the image super-resolution reconstruction model through a loss function, where the loss function uses the average L1 error between the N reconstructed high-resolution images and the corresponding real high-resolution images, and is expressed as follows:
L_1 = (1/N) Σ_{i=1}^{N} ‖ I^i_{SR} − I^i_{HR} ‖_1
where L_1 denotes the L1 loss function, and I^i_{SR} and I^i_{HR} denote the i-th reconstructed high-resolution image and the corresponding real high-resolution image, respectively.
Compared with the prior art, the invention has the following beneficial effects: the invention combines a convolutional neural network and a Transformer, adopts a channel separation technique, splits the feature map, and feeds the two parts into two branch modules in parallel. Through the residual separation mixed attention module, the local features extracted by the convolutional-neural-network-based triple attention module are fused with the global features extracted by the Transformer-based efficient Swin Transformer module to obtain rich high-frequency and low-frequency information. In this way, images with richer details can be obtained, and super-resolution reconstruction with higher precision is achieved.
Drawings
FIG. 1 is a diagram of a residual separation hybrid attention network architecture in a preferred embodiment of the present invention;
FIG. 2 is a diagram of a residual separation mixed attention group configuration in a preferred embodiment of the present invention;
FIG. 3 is a block diagram of a residual separation hybrid attention module in a preferred embodiment of the present invention;
FIG. 4 is a diagram of a residual triple attention module in a preferred embodiment of the present invention;
FIG. 5 is a block diagram of a high efficiency Swin Transformer module in a preferred embodiment of the present invention;
fig. 6 is a flowchart illustrating an image super-resolution reconstruction method according to a preferred embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application; as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in figs. 1 to 6, the present embodiment provides an image super-resolution reconstruction model based on a residual separation mixed attention network. The model includes a shallow feature extraction module, a deep feature extraction module and a reconstruction module. The shallow feature extraction module extracts shallow features from the low-resolution image; the deep feature extraction module, formed by connecting several cascaded residual separation mixed attention groups with a global residual, performs feature extraction and fusion on the shallow features to obtain deep features; the reconstruction module upsamples the deep features with a sub-pixel convolutional layer to obtain an image of higher resolution. The residual separation mixed attention module splits the feature map with a channel separation technique and feeds the two parts into two branch modules in parallel, fusing the local features extracted by the residual triple attention module with the global features extracted by the efficient Swin Transformer module to obtain rich high-frequency and low-frequency information.
Step 1, performing shallow feature extraction on an input image, specifically:
by utilizing the characteristic that the convolutional layer is good at extracting the features, a 3 x 3 convolutional layer is used for extracting the shallow features of the low-resolution input image, and the specific operation is as follows:
M 0 =H SF (I LR )
wherein H SF (. Represents a shallow feature extraction Module, I) LR Representing an input low resolution image, M 0 Representing a shallow feature map;
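A minimal PyTorch-style sketch of this step (the class name and interface are illustrative assumptions; the 180-channel feature width follows the configuration given later in this embodiment):

```python
import torch
import torch.nn as nn

class ShallowFeatureExtractor(nn.Module):
    """H_SF: a single 3x3 convolution mapping the RGB input to the feature width."""
    def __init__(self, in_channels: int = 3, num_features: int = 180):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, num_features, kernel_size=3, padding=1)

    def forward(self, i_lr: torch.Tensor) -> torch.Tensor:
        # M_0 = H_SF(I_LR)
        return self.conv(i_lr)
```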
Step 2: perform feature extraction and feature fusion on the shallow features through a deep feature extraction module formed by connecting several cascaded residual separation mixed attention groups with a global residual, obtaining deep features.
Step 2.1: as shown in fig. 2, by inputting the shallow features into the deep feature extraction module formed by connecting n cascaded residual separation mixed attention groups with the global residual, feature extraction and feature fusion can be performed on the shallow features to obtain richer and deeper features. Each cascaded residual separation mixed attention group includes m cascaded residual separation mixed attention modules, where n = 6 and m = 6 in this example. The specific calculation is represented by the following formulas:
M_i = H^i_{RSHAG}(M_{i-1}), i = 1, 2, …, n
M_{DF} = f_{3×3}(M_n)
where H^i_{RSHAG}(·) denotes the i-th residual separation mixed attention group, n denotes the number of residual separation mixed attention groups, M_{i-1}, M_i and M_n denote the intermediate feature maps of the residual separation mixed attention groups, f_{3×3}(·) denotes a convolution operation with a 3 × 3 kernel, and M_{DF} denotes the deep feature map.
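A minimal PyTorch-style sketch of this cascade (class names, the group-level residual connection, and the injected block constructor are illustrative assumptions; a concrete residual separation mixed attention block is sketched after the description of fig. 3 below):

```python
import torch
import torch.nn as nn

class ResidualSeparationHybridAttentionGroup(nn.Module):
    """One group: m cascaded blocks, a 3x3 convolution, and a group-level residual connection."""
    def __init__(self, block_ctor, channels: int = 180, num_blocks: int = 6):
        super().__init__()
        self.blocks = nn.Sequential(*[block_ctor(channels) for _ in range(num_blocks)])
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(self.blocks(x)) + x

class DeepFeatureExtractor(nn.Module):
    """n cascaded groups followed by a 3x3 convolution: M_DF = f_3x3(M_n)."""
    def __init__(self, block_ctor, channels: int = 180,
                 num_groups: int = 6, blocks_per_group: int = 6):
        super().__init__()
        self.groups = nn.Sequential(
            *[ResidualSeparationHybridAttentionGroup(block_ctor, channels, blocks_per_group)
              for _ in range(num_groups)])
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, m_0: torch.Tensor) -> torch.Tensor:
        return self.conv(self.groups(m_0))
```

With n = 6 groups of m = 6 blocks, this yields the 36 blocks mentioned in the training configuration described later.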
Fig. 3 is a block diagram of a residual separation mixed attention module comprising 1 residual triple attention module, 1 high efficiency Swin Transformer module, two 1 × 1 convolution layers and channel splitting and joining operations, whose calculation is represented by the following formula:
X_1, X_2 = H_{SPL}(f_{1×1}(X))
Y_1 = RTAB(X_1)
Y_2 = ESTB(X_2)
Z = f_{1×1}(H_{CAT}(Y_1, Y_2)) + X
where X denotes the input feature of the residual separation mixed attention module, f_{1×1}(·) denotes a 1 × 1 convolutional layer, H_{SPL}(·) denotes the channel splitting operation, X_1 and X_2 denote the split feature maps, RTAB(·) denotes the residual triple attention module and Y_1 its output feature, ESTB(·) denotes the efficient Swin Transformer module and Y_2 its output feature, H_{CAT}(·) denotes the channel concatenation operation, and Z denotes the output feature of the residual separation mixed attention module.
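A minimal PyTorch-style sketch of this block (the class name and the dependency-injected branch modules are illustrative assumptions; concrete branch sketches follow the descriptions of fig. 4 and fig. 5 below):

```python
import torch
import torch.nn as nn

class ResidualSeparationHybridAttentionBlock(nn.Module):
    """RSHAB: 1x1 conv -> channel split -> parallel RTAB / ESTB branches -> concat -> 1x1 conv -> residual."""
    def __init__(self, rtab: nn.Module, estb: nn.Module, channels: int = 180):
        super().__init__()
        self.pre = nn.Conv2d(channels, channels, kernel_size=1)
        self.rtab = rtab   # local branch (convolutional triple attention)
        self.estb = estb   # global branch (efficient Swin Transformer)
        self.post = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # X_1, X_2 = H_SPL(f_1x1(X))
        x1, x2 = torch.chunk(self.pre(x), chunks=2, dim=1)
        # Z = f_1x1(H_CAT(RTAB(X_1), ESTB(X_2))) + X
        return self.post(torch.cat([self.rtab(x1), self.estb(x2)], dim=1)) + x
```

Passing the two branches in as constructor arguments keeps the sketch self-contained; for example, `ResidualSeparationHybridAttentionBlock(nn.Identity(), nn.Identity())` already runs as a smoke test, with each branch operating on 90 of the 180 channels.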
Fig. 4 is a block diagram of a residual triple attention module containing two 3 × 3 convolutional layers, a ReLU activation function, a triple attention module, an average pooling operation, and a residual connection. The residual triple attention module makes full use of the correlations among different dimensions of the feature map and of the CNN's strength at extracting intrinsic features, so that the network can learn richer image information. Its calculation is represented by the following formula:
X_O = F_{TAM}(f_{3×3}(ReLU(f_{3×3}(X_I)))) + X_I
where X_I denotes the input feature of the residual triple attention module, X_O denotes its output feature, F_{TAM}(·) denotes the triple attention module, and ReLU(·) denotes the ReLU activation function. The triple attention module is composed of two cross-dimension interaction modules and a spatial attention module, and is computed as follows:
Y = (X̂_1 + X̂_2 + X̂_3) / 3
where X̂_1 and X̂_2 denote the output features of the two cross-dimension interaction modules, X̂_3 denotes the output feature of the spatial attention module, and Y denotes the output feature of the triple attention module. The cross-dimension interaction module comprises a 7 × 7 convolutional layer, a dimension permutation operation, a Z-Pool layer, a channel concatenation operation, a maximum pooling operation, and an average pooling operation. The specific calculation formulas are as follows:
X′_1 = H_{PER}(X_1)
Z-Pool(X) = H_{CAT}(H_{MP}(X), H_{AP}(X))
X″_1 = Z-Pool(X′_1)
X̂_1 = H_{PER}(δ(H_{IN}(f_{7×7}(X″_1))) · X′_1)
where X′_1 denotes the result of the dimension permutation operation, X″_1 denotes the result of the Z-Pool layer, H_{PER}(·) denotes the dimension permutation operation, Z-Pool(·) denotes the operation of the Z-Pool layer, H_{CAT}(·) denotes concatenation along a given dimension of the input feature maps, H_{MP}(·) and H_{AP}(·) denote the maximum pooling operation and the average pooling operation along that dimension respectively, f_{7×7}(·) denotes a convolution operation with a 7 × 7 kernel, H_{IN}(·) denotes the instance normalization operation, δ(·) denotes the Sigmoid function, · denotes the channel-wise multiplication operation, and X̂_1 denotes the output feature of the cross-dimension interaction module.
The spatial attention module comprises a 7 × 7 convolutional layer, a Z-Pool layer and an instance normalization operation. The specific calculation formula is as follows:
X̂_3 = δ(H_{IN}(f_{7×7}(Z-Pool(X_3)))) · X_3
where X_3 and X̂_3 denote the input feature and the output feature of the spatial attention module, respectively.
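A minimal PyTorch-style sketch of the local branch (class names, the 90-channel width of the split branch, and the exact permutation order of the two cross-dimension branches are illustrative assumptions):

```python
import torch
import torch.nn as nn

def z_pool(x: torch.Tensor) -> torch.Tensor:
    # Z-Pool: concatenate max pooling and average pooling along dimension 1.
    return torch.cat([x.amax(dim=1, keepdim=True), x.mean(dim=1, keepdim=True)], dim=1)

class AttentionGate(nn.Module):
    """Z-Pool -> 7x7 convolution -> instance normalization -> Sigmoid, applied as a gate."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.norm = nn.InstanceNorm2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.norm(self.conv(z_pool(x))))

class TripleAttention(nn.Module):
    """Two cross-dimension interaction branches (built from permutations) plus a spatial branch."""
    def __init__(self):
        super().__init__()
        self.gate_ch = AttentionGate()  # channel-height interaction
        self.gate_cw = AttentionGate()  # channel-width interaction
        self.gate_hw = AttentionGate()  # spatial attention in the original layout

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Branch 1: permute (N, C, H, W) -> (N, H, C, W), gate, permute back.
        x1 = self.gate_ch(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)
        # Branch 2: permute (N, C, H, W) -> (N, W, H, C), gate, permute back.
        x2 = self.gate_cw(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        # Branch 3: spatial attention, no permutation.
        x3 = self.gate_hw(x)
        # Y = (X1 + X2 + X3) / 3
        return (x1 + x2 + x3) / 3.0

class ResidualTripleAttentionBlock(nn.Module):
    """RTAB: conv3x3 -> ReLU -> conv3x3 -> triple attention, plus a residual connection."""
    def __init__(self, channels: int = 90):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.tam = TripleAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # X_O = F_TAM(f_3x3(ReLU(f_3x3(X_I)))) + X_I
        return self.tam(self.body(x)) + x
```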
Fig. 5 is a block diagram of the efficient Swin Transformer module, which includes two layer normalization operations, a moving-window self-attention module, a locally enhanced multi-layer perceptron module, and residual connections. Its calculation is represented by the following formulas:
Q = LN(W_Q X)
K = LN(W_K X)
V = LN(W_V X)
Attention(Q, K, V) = SoftMax(QK^T / √d) V
X′ = SW-MSA(X) + X
X̂ = LeFF(LN(X′)) + X′
where W_Q, W_K and W_V denote the transformation matrices used to compute Q, K and V, d denotes the dimension of the K matrix, Attention(·, ·, ·) denotes the self-attention computation, LN(·) denotes the layer normalization operation, Q, K and V denote the query, key and value matrices, SoftMax(·) denotes the SoftMax function, SW-MSA(·) denotes the moving-window self-attention module, LeFF(·) denotes the locally enhanced multi-layer perceptron module, X′ denotes the intermediate feature after the moving-window self-attention, and X̂ denotes the output feature of the efficient Swin Transformer module.
Step 3: reconstruct the deep feature map, up-sampling the predicted feature map into a high-resolution image. The expression is as follows:
I_{SR} = H_{UP}(M_{DF} + M_0) + Bicubic(I_{LR})
where I_{SR} denotes the high-resolution image after super-resolution reconstruction, H_{UP}(·) denotes the reconstruction module, and Bicubic(·) denotes bicubic interpolation of the low-resolution image to the target resolution.
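A minimal PyTorch-style sketch of the reconstruction step (class and function names, the single-stage ×4 PixelShuffle, and the 180-channel width are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Upsampler(nn.Module):
    """H_UP: sub-pixel (PixelShuffle) upsampling followed by a 3x3 convolution back to RGB."""
    def __init__(self, num_features: int = 180, scale: int = 4, out_channels: int = 3):
        super().__init__()
        self.expand = nn.Conv2d(num_features, num_features * scale * scale, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)
        self.to_rgb = nn.Conv2d(num_features, out_channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.to_rgb(self.shuffle(self.expand(x)))

def reconstruct(m_df: torch.Tensor, m_0: torch.Tensor, i_lr: torch.Tensor,
                upsampler: Upsampler, scale: int = 4) -> torch.Tensor:
    # I_SR = H_UP(M_DF + M_0) + Bicubic(I_LR)
    bicubic = F.interpolate(i_lr, scale_factor=scale, mode="bicubic", align_corners=False)
    return upsampler(m_df + m_0) + bicubic
```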
The image super-resolution reconstruction method applied to the image super-resolution reconstruction model comprises the following specific contents.
S1: establish a training set according to the image degradation model to obtain N low-resolution images I_{LR} and the N corresponding real high-resolution images I_{HR}, where N is an integer greater than 1;
S2: input the low-resolution images into the shallow feature extraction module to extract shallow features of the images;
S3: input the shallow features into the deep feature extraction module to extract deep features;
S4: input the deep features into the reconstruction module, perform sub-pixel convolution to complete the up-sampling processing, and reconstruct the final high-resolution image;
S5: optimize the image super-resolution reconstruction model through a loss function, where the loss function uses the average L1 error between the N reconstructed high-resolution images and the corresponding real high-resolution images:
L_1 = (1/N) Σ_{i=1}^{N} ‖ I^i_{SR} − I^i_{HR} ‖_1
where L_1 denotes the L1 loss function, and I^i_{SR} and I^i_{HR} denote the i-th reconstructed high-resolution image and the corresponding real high-resolution image, respectively.
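A minimal PyTorch-style sketch of one optimization step under this loss (the function name and the assumption that `model` maps I_LR directly to I_SR are illustrative):

```python
import torch
import torch.nn as nn

def training_step(model: nn.Module, optimizer: torch.optim.Optimizer,
                  i_lr: torch.Tensor, i_hr: torch.Tensor) -> float:
    """One optimization step using the mean L1 error between reconstructed and real HR images."""
    optimizer.zero_grad()
    i_sr = model(i_lr)                          # steps S2-S4: shallow, deep, reconstruction
    loss = torch.mean(torch.abs(i_sr - i_hr))   # step S5: average L1 error over the batch
    loss.backward()
    optimizer.step()
    return loss.item()
```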
In order to better illustrate the effectiveness of the present invention, the examples of the present invention also employ a comparative experiment to compare the reconstruction effect.
Specifically, in the embodiment of the present invention, the 800 high-resolution images of DIV2K are used as the training set, and Set5, Set14, B100, Urban100 and Manga109 are used as the test sets. The original high-resolution images are bicubically down-sampled to obtain the corresponding low-resolution images.
After the training set is constructed, training and testing of the model are performed on the PyTorch framework. Low-resolution images in the training set are cropped, and 48 randomly sampled 64 × 64 image patches are input at each iteration; 500 epochs are trained. Optimization of the network parameters is achieved with the Adam gradient descent method, where the parameters of the Adam optimizer are set to β1 = 0.9, β2 = 0.999 and ε = 10^-8. The learning rate is initially set to 2 × 10^-4 and is halved after the 250th, 400th, 425th, 450th and 475th epochs. The number of RSHABs is set to 36 and the number of channels to 180. Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are used to evaluate model performance. The performance of the model is tested on the five benchmark datasets Set5, Set14, B100, Urban100 and Manga109; 11 representative image super-resolution reconstruction methods are selected for the comparison experiment and compared with the results of the invention. The experimental results are shown in Table 1, where RSHAN is the method provided by the invention.
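A sketch of this training configuration in PyTorch (the `model` and `train_loader` objects are assumed to be built elsewhere, for example from the module sketches above and a DIV2K patch dataset; only the hyper-parameters reported in the text are taken from the embodiment):

```python
import torch

# `model` and `train_loader` are placeholders assumed to exist:
#   model        - the full reconstruction network (shallow + deep + upsampling modules)
#   train_loader - yields batches of 48 paired 64x64 LR crops and the matching HR crops
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.999), eps=1e-8)
# Halve the learning rate after epochs 250, 400, 425, 450 and 475 (500 epochs in total).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[250, 400, 425, 450, 475], gamma=0.5)

for epoch in range(500):
    for i_lr, i_hr in train_loader:
        training_step(model, optimizer, i_lr, i_hr)  # L1 step sketched above
    scheduler.step()
```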
TABLE 1 average PSNR and SSIM value comparison across 5 test sets
[Table 1 values are provided as an image in the original publication.]
In summary, compared with the prior art, the present invention has the following advantages and effects:
(1) The embodiment of the invention adopts the residual separation mixed attention module to combine the local modeling capability of the convolutional-neural-network-based residual triple attention module with the non-local modeling capability of the efficient Swin Transformer module. While keeping the number of parameters similar to that of the best-performing reconstruction models at present, the information relations among different dimensions of the image are effectively utilized, the ability of the super-resolution reconstruction model to extract high-frequency information is remarkably improved, and the captured long-range dependencies make the extracted features richer.
(2) The embodiment of the invention adopts a structure in which cascaded residuals are embedded within a global residual, so that the network can bypass the low-frequency information in the low-resolution input, learn more high-frequency residual information, and acquire rich detail features. A very deep network does not need to be built, and a well-reconstructed high-resolution image can still be obtained.

Claims (8)

1. The image super-resolution reconstruction model based on the residual mixed attention network is characterized by comprising a shallow feature extraction module, a deep feature extraction module and a reconstruction module: the shallow feature extraction module is composed of a 3 × 3 convolutional layer and exploits the convolutional layer's strength at feature extraction to extract shallow features from the low-resolution input image:
M_0 = H_{SF}(I_{LR})
where H_{SF}(·) denotes the shallow feature extraction module, I_{LR} denotes the input low-resolution image, and M_0 denotes the shallow feature map;
the deep feature extraction module consists of multiple residual separation mixed attention groups and a 3 × 3 convolution, and extracts high-level features from the shallow features. The process is expressed as follows:
M_i = H^i_{RSHAG}(M_{i-1}), i = 1, 2, …, n
M_{DF} = f_{3×3}(M_n)
where H^i_{RSHAG}(·) denotes the i-th residual separation mixed attention group, n denotes the number of residual separation mixed attention groups, M_{i-1}, M_i and M_n denote the intermediate feature maps of the residual separation mixed attention groups, f_{3×3}(·) denotes a convolution operation with a 3 × 3 kernel, and M_{DF} denotes the deep feature map;
the reconstruction module is composed of a sub-pixel convolutional layer and a 3 × 3 convolution. The sub-pixel convolutional layer upsamples the deep features extracted by the deep feature extraction module, reshaping the information flow into a feature map at the specified upsampling factor. The process is described as follows:
I_{SR} = H_{UP}(M_{DF} + M_0) + Bicubic(I_{LR})
where I_{SR} denotes the high-resolution image after super-resolution reconstruction, H_{UP}(·) denotes the reconstruction module, and Bicubic(·) denotes bicubic interpolation of the low-resolution image to the target resolution.
2. The residual mixed attention network-based image super-resolution reconstruction model of claim 1, wherein the multiple residual separation mixed attention group is composed of a plurality of residual separation mixed attention modules and a 3 × 3 convolutional layer.
3. The residual mixed attention network-based image super-resolution reconstruction model of claim 2, wherein the residual separation mixed attention module comprises two 1 × 1 convolutional layers, a residual triple attention module, an efficient Swin Transformer module, a residual connection, and channel splitting and concatenation operations, and the specific calculation formulas are as follows:
X_1, X_2 = H_{SPL}(f_{1×1}(X))
Y_1 = RTAB(X_1)
Y_2 = ESTB(X_2)
Z = f_{1×1}(H_{CAT}(Y_1, Y_2)) + X
where X denotes the input feature of the residual separation mixed attention module, f_{1×1}(·) denotes a 1 × 1 convolutional layer, H_{SPL}(·) denotes the channel splitting operation, X_1 and X_2 denote the split feature maps, RTAB(·) denotes the residual triple attention module and Y_1 its output feature, ESTB(·) denotes the efficient Swin Transformer module and Y_2 its output feature, H_{CAT}(·) denotes the channel concatenation operation, and Z denotes the output feature of the residual separation mixed attention module.
4. The residual mixed attention network-based image super-resolution reconstruction model according to claim 3, wherein the residual triple attention module is composed of two 3 × 3 convolutional layers, a ReLU activation function, a residual connection and a triple attention module, and the specific formula is as follows:
X_O = F_{TAM}(f_{3×3}(ReLU(f_{3×3}(X_I)))) + X_I
where X_I denotes the input feature of the residual triple attention module, X_O denotes its output feature, F_{TAM}(·) denotes the triple attention module, and ReLU(·) denotes the ReLU activation function; the triple attention module is composed of two cross-dimension interaction modules and a spatial attention module, and is computed as follows:
Y = (X̂_1 + X̂_2 + X̂_3) / 3
where X̂_1 and X̂_2 denote the output features of the two cross-dimension interaction modules, X̂_3 denotes the output feature of the spatial attention module, and Y denotes the output feature of the triple attention module.
5. The residual hybrid attention network-based image super-resolution reconstruction model of claim 4, wherein the cross-dimension interaction module comprises a 7 x 7 convolutional layer, a dimension permutation operation, a Z-Pool layer, a channel connection operation, a maximum pooling operation and an average pooling operation; the specific calculation formula is as follows:
X′_1 = H_{PER}(X_1)
Z-Pool(X) = H_{CAT}(H_{MP}(X), H_{AP}(X))
X″_1 = Z-Pool(X′_1)
X̂_1 = H_{PER}(δ(H_{IN}(f_{7×7}(X″_1))) · X′_1)
where X′_1 denotes the result of the dimension permutation operation, X″_1 denotes the result of the Z-Pool layer, H_{PER}(·) denotes the dimension permutation operation, Z-Pool(·) denotes the operation of the Z-Pool layer, H_{CAT}(·) denotes concatenation along a given dimension of the input feature maps, H_{MP}(·) and H_{AP}(·) denote the maximum pooling operation and the average pooling operation along that dimension respectively, f_{7×7}(·) denotes a convolution operation with a 7 × 7 kernel, H_{IN}(·) denotes the instance normalization operation, δ(·) denotes the Sigmoid function, · denotes the channel-wise multiplication operation, and X̂_1 denotes the output feature of the cross-dimension interaction module.
6. The residual mixed attention network-based image super-resolution reconstruction model of claim 4, wherein the spatial attention module comprises a 7 × 7 convolutional layer, a Z-Pool layer and an instance normalization operation, and the specific calculation formula is as follows:
X̂_3 = δ(H_{IN}(f_{7×7}(Z-Pool(X_3)))) · X_3
where X_3 and X̂_3 denote the input feature and the output feature of the spatial attention module, respectively.
7. The residual mixed attention network-based image super-resolution reconstruction model of claim 3, wherein the efficient Swin Transformer module is composed of two layer normalization operations, a self-attention calculation module based on moving windows, a local feature extraction feed-forward network and residual connections;
the calculation formulas of the efficient Swin Transformer module are as follows:
Q = LN(W_Q X)
K = LN(W_K X)
V = LN(W_V X)
Attention(Q, K, V) = SoftMax(QK^T / √d) V
X′ = SW-MSA(X) + X
X̂ = LeFF(LN(X′)) + X′
where W_Q, W_K and W_V denote the transformation matrices used to compute Q, K and V, LN(·) denotes the layer normalization operation, Q, K and V denote the query, key and value matrices, SoftMax(·) denotes the SoftMax function, SW-MSA(·) denotes the moving-window self-attention module, LeFF(·) denotes the locally enhanced multi-layer perceptron module, X′ denotes the intermediate feature after the moving-window self-attention, X̂ denotes the output feature of the efficient Swin Transformer module, d is the dimension of the K matrix, and Attention(·, ·, ·) denotes the self-attention computation.
8. The image super-resolution reconstruction method based on the residual mixed attention network is characterized in that the image super-resolution reconstruction model based on the residual mixed attention network of any one of claims 1 to 7 is adopted, and the method comprises the following steps:
Step S1: establishing a training set according to the image degradation model to obtain N low-resolution images I_{LR} and the N corresponding real high-resolution images I_{HR}, where N is an integer greater than 1;
Step S2: inputting the low-resolution images into the shallow feature extraction module to extract shallow features of the images;
Step S3: inputting the shallow features into the deep feature extraction module to extract deep features;
Step S4: inputting the deep features into the reconstruction module, performing sub-pixel convolution to complete the up-sampling processing, and reconstructing the final high-resolution image;
Step S5: optimizing the image super-resolution reconstruction model through a loss function, where the loss function uses the average L1 error between the N reconstructed high-resolution images and the corresponding real high-resolution images, and is expressed as follows:
L_1 = (1/N) Σ_{i=1}^{N} ‖ I^i_{SR} − I^i_{HR} ‖_1
where L_1 denotes the L1 loss function, and I^i_{SR} and I^i_{HR} denote the i-th reconstructed high-resolution image and the corresponding real high-resolution image, respectively.
CN202210940743.1A — priority date 2022-08-06, filing date 2022-08-06 — Image super-resolution reconstruction model and method based on residual mixed attention network — Pending — published as CN115222601A (en)

Priority Applications (1)

Application number: CN202210940743.1A (CN) · Priority date: 2022-08-06 · Filing date: 2022-08-06 · Title: Image super-resolution reconstruction model and method based on residual mixed attention network

Applications Claiming Priority (1)

Application number: CN202210940743.1A (CN) · Priority date: 2022-08-06 · Filing date: 2022-08-06 · Title: Image super-resolution reconstruction model and method based on residual mixed attention network

Publications (1)

Publication number: CN115222601A · Publication date: 2022-10-21

Family

ID=83615969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210940743.1A Pending CN115222601A (en) 2022-08-06 2022-08-06 Image super-resolution reconstruction model and method based on residual mixed attention network

Country Status (1)

Country Link
CN (1) CN115222601A (en)


Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115761248A (en) * 2022-12-22 2023-03-07 深圳大学 Image processing method, device, equipment and storage medium
CN116071243A (en) * 2023-03-27 2023-05-05 江西师范大学 Infrared image super-resolution reconstruction method based on edge enhancement
CN116385265B (en) * 2023-04-06 2023-10-17 北京交通大学 Training method and device for image super-resolution network
CN116385265A (en) * 2023-04-06 2023-07-04 北京交通大学 Training method and device for image super-resolution network
CN116664397B (en) * 2023-04-19 2023-11-10 太原理工大学 TransSR-Net structured image super-resolution reconstruction method
CN116664397A (en) * 2023-04-19 2023-08-29 太原理工大学 TransSR-Net structured image super-resolution reconstruction method
CN116402692B (en) * 2023-06-07 2023-08-18 江西财经大学 Depth map super-resolution reconstruction method and system based on asymmetric cross attention
CN116402692A (en) * 2023-06-07 2023-07-07 江西财经大学 Depth map super-resolution reconstruction method and system based on asymmetric cross attention
CN116523759B (en) * 2023-07-04 2023-09-05 江西财经大学 Image super-resolution reconstruction method and system based on frequency decomposition and restarting mechanism
CN116523759A (en) * 2023-07-04 2023-08-01 江西财经大学 Image super-resolution reconstruction method and system based on frequency decomposition and restarting mechanism
CN117132472A (en) * 2023-10-08 2023-11-28 兰州理工大学 Forward-backward separable self-attention-based image super-resolution reconstruction method
CN117132472B (en) * 2023-10-08 2024-05-31 兰州理工大学 Forward-backward separable self-attention-based image super-resolution reconstruction method
CN117173025B (en) * 2023-11-01 2024-03-01 华侨大学 Single-frame image super-resolution method and system based on cross-layer mixed attention transducer
CN117173025A (en) * 2023-11-01 2023-12-05 华侨大学 Single-frame image super-resolution method and system based on cross-layer mixed attention transducer
CN117237197A (en) * 2023-11-08 2023-12-15 华侨大学 Image super-resolution method and device based on cross attention mechanism and Swin-transducer
CN117237197B (en) * 2023-11-08 2024-03-01 华侨大学 Image super-resolution method and device based on cross attention mechanism
CN117422614B (en) * 2023-12-19 2024-03-12 华侨大学 Single-frame image super-resolution method and device based on hybrid feature interaction transducer
CN117422614A (en) * 2023-12-19 2024-01-19 华侨大学 Single-frame image super-resolution method and device based on hybrid feature interaction transducer
CN117495680A (en) * 2024-01-02 2024-02-02 华侨大学 Multi-contrast nuclear magnetic resonance image super-resolution method based on feature fusion transducer
CN117495680B (en) * 2024-01-02 2024-05-24 华侨大学 Multi-contrast nuclear magnetic resonance image super-resolution method based on feature fusion transducer
CN117575915A (en) * 2024-01-16 2024-02-20 闽南师范大学 Image super-resolution reconstruction method, terminal equipment and storage medium
CN117934289A (en) * 2024-03-25 2024-04-26 山东师范大学 System and method for integrating MRI super-resolution and synthesis tasks
CN118052717A (en) * 2024-04-15 2024-05-17 北京数慧时空信息技术有限公司 Training method of image superdivision model and image superdivision method
CN118097321A (en) * 2024-04-29 2024-05-28 济南大学 Vehicle image enhancement method and system based on CNN and transducer


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination