CN116797461A - Binocular image super-resolution reconstruction method based on multi-level enhanced attention mechanism
- Publication number: CN116797461A
- Application number: CN202310853109.9A
- Authority: CN (China)
- Priority date: 2023-07-12
- Publication date: 2023-09-22
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Description
Technical field
The present invention relates to the technical field of binocular image super-resolution, and in particular to a binocular image super-resolution reconstruction method based on a multi-level enhanced attention mechanism.
Background
A straightforward way to achieve binocular image super-resolution is to apply a single-image super-resolution algorithm to the left and right images separately. Attention mechanisms are an important research direction in deep learning, and in recent years a number of high-performing single-image super-resolution algorithms have emerged, such as RCAN built on channel attention, PAN built on pixel attention, SwinIR built on Transformer self-attention, and MAN built on multi-scale large-kernel attention. However, reconstructing the two views independently with single-image methods exploits only intra-image self-similarity to recover details and ignores the additional information available across views, i.e., cross-view similarity, which limits further improvement of super-resolution performance. Fully exploiting cross-view information can therefore help reconstruct higher-quality super-resolved images, because one view may contain information about the same scene region that complements the other view. Driven by practical demand, various super-resolution reconstruction techniques have been proposed, and binocular image super-resolution has become a research foundation in many fields, giving it great application value. This has led to PASSRnet built on parallax attention, iPASSR built on a bidirectional parallax attention mechanism, SwinFSR built on Transformer self-attention, and CVHSSR built on a large-kernel convolutional attention mechanism.
Although many attempts have been made to combine multiple attention mechanisms with binocular image super-resolution in order to extract more features both within and across views, most binocular super-resolution methods remain unsatisfactory at recovering natural textures and edge details, and this is still an open problem. How to exploit the inter-view dependencies among different attention features, on the basis of multiple attention mechanisms, to reconstruct super-resolved binocular images therefore requires further exploration.
Summary of the invention
In view of this, the purpose of the present invention is to provide a binocular image super-resolution reconstruction method based on a multi-level enhanced attention mechanism that makes full use of the feature information within each view for fusion, and applies a frequency-domain loss function to the frequency domain to strengthen the retention of low-frequency information and the overall image structure, so that the super-resolved binocular images are restored with better quality and with clearer textures and edge details.
To achieve the above objectives, the present invention adopts the following technical solution: a binocular image super-resolution reconstruction method based on a multi-level enhanced attention mechanism, comprising the following steps:
Step S1: build a binocular image training set. The binocular image super-resolution dataset is divided into a training set and a test set, and the low-resolution images are generated by bicubic downsampling. In the training phase, the generated low-resolution images are cropped into small patches, the corresponding high-resolution images are cropped accordingly, and these patches are randomly flipped horizontally and vertically to augment the training data.
Step S2: build and train a binocular super-resolution reconstruction network model based on the multi-level enhanced attention mechanism. The network takes a pair of low-resolution RGB binocular images as input and generates super-resolved binocular images.
Step S3: construct the loss function. The L1 loss function is combined with a frequency-domain loss function to strengthen supervision in the high-level feature space and constrain the training of the network.
Step S4: set the training parameters and train the network.
Step S5: test the network performance. Low-resolution binocular image pairs are used as test samples and fed into the network model trained in the previous step to obtain super-resolved binocular image pairs, and the super-resolution effect is verified with objective evaluation metrics and visual comparisons.
In a preferred embodiment, the binocular super-resolution reconstruction network model based on the multi-level enhanced attention mechanism contains two weight-sharing branches for the left and right views. In each weight-sharing branch, hybrid attention information extraction modules are stacked to extract the intra-view channel and spatial features of the left and right images, and a binocular interactive view attention module is used to capture the globally corresponding information and cross-view information extracted from the left and right images. The model is divided into three parts: intra-view feature extraction, interactive-view feature fusion, and binocular image reconstruction.
In a preferred embodiment, step S2 specifically includes the following steps:
Step S21: intra-view feature extraction. In the feature extraction stage, the input binocular images are first passed through a 3×3 convolutional layer to extract shallow features and generate high-dimensional features, where C is the number of feature channels. The high-dimensional features are then fed into stacked multi-attention enhancement blocks for intra-view feature extraction, to obtain more local features and interaction information and recover more accurate texture details. Each multi-attention enhancement block contains a hybrid attention information extraction module and a binocular interactive view attention module.
The hybrid attention information extraction module is the basic module of the left and right branches of the network; it extracts in-view features more deeply by capturing long-range and local dependencies. It consists of two sequentially connected modules: the first is a simplified channel and spatial information extraction module, and the second is a feed-forward network module with residual information aggregation. The computation of the two parts is as follows:
In the first module, after layer normalization, a 1×1 convolutional layer expands the channels of the input feature map, and the resulting output is passed through a 3×3 depth-wise convolution to capture the local context of each channel; a cross-activation structure A unit (CAS-A) is then used to further learn an effective representation of the spatial context. The next step is the simplified channel-spatial attention module, which makes full use of the channel attention and spatial attention mechanisms. Given the original input, average pooling and a 1×1 convolution are first applied to learn the inter-channel relationships of the feature map, realizing global spatial information aggregation and channel information interaction and outputting the simplified channel attention feature X₁. The spatial information of the feature map is then aggregated by average pooling and max pooling, and a simplified spatial attention map is obtained by combining a 3×3 convolution with a Sigmoid function. Finally, the element-wise product of the input feature map and the output of the Sigmoid layer is taken as the output X₂ of the simplified spatial attention module. In the corresponding formulation, W_C(·) and H_AP(·) denote the 1×1 convolution and average pooling operations, H_AP,1(·) and H_MP,1(·) denote average pooling and max pooling along the first dimension, H_cat(·) denotes concatenation along dimension 1, σ(·) denotes the Sigmoid activation function, and ⊙ denotes element-wise multiplication.
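For illustration only, the following is a minimal PyTorch sketch of the simplified channel-spatial attention described above; the module name, layer widths and exact ordering of operations are assumptions and may differ from the patented implementation.

```python
# Minimal sketch of the simplified channel-spatial attention (assumed layout).
import torch
import torch.nn as nn

class SimplifiedChannelSpatialAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Channel branch: global average pooling + 1x1 conv to model inter-channel relations.
        self.channel_pool = nn.AdaptiveAvgPool2d(1)
        self.channel_conv = nn.Conv2d(channels, channels, kernel_size=1)
        # Spatial branch: 3x3 conv over pooled maps + Sigmoid gives the spatial attention map.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=3, padding=1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Simplified channel attention: X1 = Conv1x1(AvgPool(X)) ⊙ X
        x1 = x * self.channel_conv(self.channel_pool(x))
        # Aggregate spatial information by average / max pooling along the channel
        # dimension, then concatenate along dim 1.
        avg_map = torch.mean(x1, dim=1, keepdim=True)
        max_map, _ = torch.max(x1, dim=1, keepdim=True)
        attn = self.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        # Simplified spatial attention output: X2 = sigma(...) ⊙ X1
        return x1 * attn
```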
After the simplified channel and spatial information extraction module, a 1×1 convolution maps the feature channels back to produce adaptive feature refinement, which gives the result of the first module. In the second module, after normalizing the output of the previous module, a residual information aggregation feed-forward network containing a cross-activation structure B unit (CAS-B) is used to improve local context awareness. Specifically, given an input tensor X′, a 1×1 convolutional layer first expands X′ to a higher dimension X′₁, where k is the expansion ratio; next, a 3×3 depth-wise convolutional layer encodes the information of the neighboring pixel positions of X′₁, and the CAS-B unit is used as the activation function of the depth-wise convolutional layer, halving the number of feature channels in its output; finally, a 1×1 convolutional layer remaps the result back to the initial input dimension, giving X′₂.
In this process, W_C(·) and W_D(·) denote the 1×1 convolution and the 3×3 depth-wise convolution applied after layer normalization, and CAS-B(·) denotes the B unit of the cross-activation structure.
Finally, as in the first module, the input of this module is added to the output of the convolutional layer to give the final result.
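For illustration, a minimal PyTorch sketch of the residual information aggregation feed-forward network is given below. Because the internal structure of the CAS-B unit is not fully specified here, a channel-halving gate is used only as a stand-in for it, and all names and widths are assumptions.

```python
# Minimal sketch of the residual information aggregation feed-forward network (assumed layout).
import torch
import torch.nn as nn

class ChannelHalvingGate(nn.Module):
    """Stand-in for the CAS-B unit: splits channels in half and multiplies,
    so the output has half the input channels as described in the text."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = x.chunk(2, dim=1)
        return a * b

class ResidualAggregationFFN(nn.Module):
    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        hidden = channels * expansion
        self.norm = nn.GroupNorm(1, channels)         # layer-norm-like normalization
        self.expand = nn.Conv2d(channels, hidden, 1)  # 1x1 conv, expand by the ratio k
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)  # 3x3 depth-wise
        self.gate = ChannelHalvingGate()              # channel-halving activation (CAS-B stand-in)
        self.project = nn.Conv2d(hidden // 2, channels, 1)  # 1x1 conv back to the input dimension

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.expand(self.norm(x))
        y = self.dwconv(y)
        y = self.gate(y)
        # Residual connection: add the module input to the projected output.
        return x + self.project(y)
```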
Step S22: cross-view feature fusion. A binocular interactive view attention module is used after the hybrid attention information extraction modules of the left and right branches. It takes the binocular features generated in the previous step as input, performs bidirectional cross-view interaction, and generates interaction features that are fused with the input features of each view. Specifically, given the input binocular view features, layer normalization and a 1×1 convolution are applied to obtain the binocular features; channel weights are then generated by performing a fast 1D convolution of size k, where k is determined adaptively from the channel dimension C, and the channel weights are multiplied element-wise with the binocular features to obtain the aggregated features. In this computation, a k×k convolutional layer and the max pooling operation H_MP(·) are used, and ⊙ denotes element-wise multiplication.
By computing the attention matrix only once, F_R→L and F_L→R are generated simultaneously. Finally, the interacted cross-view information is fused with the intra-view information F_L and F_R by element-wise addition, where the query matrix is the projection of the features within the source view (e.g., the left view) and the key matrix is the projection of the features within the target view (e.g., the right view).
Here γ_L and γ_R are trainable channel scaling factors initialized to zero to stabilize training, and the output projection is a 1×1 convolution.
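For illustration, the following is a minimal PyTorch sketch of a binocular cross-view attention of the kind described above, with the attention computed along the width (epipolar) dimension of the rectified stereo pair. The exact projection layout and the rule mapping the channel dimension C to the 1D kernel size k are assumptions.

```python
# Minimal sketch of binocular interactive (cross-view) attention (assumed layout).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    # Adaptively map the channel dimension C to an odd 1D kernel size k.
    k = int(abs((math.log2(channels) + b) / gamma))
    return k if k % 2 else k + 1

class CrossViewAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.norm_l = nn.GroupNorm(1, channels)
        self.norm_r = nn.GroupNorm(1, channels)
        self.proj_l = nn.Conv2d(channels, channels, 1)  # 1x1 conv -> query (left / source view)
        self.proj_r = nn.Conv2d(channels, channels, 1)  # 1x1 conv -> key (right / target view)
        k = eca_kernel_size(channels)
        self.channel_conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)  # fast 1D conv
        self.out_l = nn.Conv2d(channels, channels, 1)
        self.out_r = nn.Conv2d(channels, channels, 1)
        # Trainable channel scaling, initialized to zero to stabilize training.
        self.gamma_l = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.gamma_r = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def _channel_weight(self, x: torch.Tensor) -> torch.Tensor:
        w = F.adaptive_avg_pool2d(x, 1).squeeze(-1).transpose(1, 2)          # (B, 1, C)
        w = torch.sigmoid(self.channel_conv(w)).transpose(1, 2).unsqueeze(-1)
        return x * w  # aggregated feature: element-wise channel re-weighting

    def forward(self, f_l: torch.Tensor, f_r: torch.Tensor):
        q = self._channel_weight(self.proj_l(self.norm_l(f_l)))  # (B, C, H, W)
        k = self._channel_weight(self.proj_r(self.norm_r(f_r)))
        b, c, h, w = q.shape
        q = q.permute(0, 2, 3, 1)                       # (B, H, W, C)
        k = k.permute(0, 2, 3, 1)
        # One attention matrix along the width dimension serves both directions.
        attn = q @ k.transpose(-1, -2) / math.sqrt(c)   # (B, H, W, W)
        f_r2l = torch.softmax(attn, dim=-1) @ k         # right -> left (k also serves as value here)
        f_l2r = torch.softmax(attn.transpose(-1, -2), dim=-1) @ q  # left -> right
        f_r2l = f_r2l.permute(0, 3, 1, 2)
        f_l2r = f_l2r.permute(0, 3, 1, 2)
        # Fuse interaction features with intra-view features by element-wise addition.
        out_l = f_l + self.gamma_l * self.out_l(f_r2l)
        out_r = f_r + self.gamma_r * self.out_r(f_l2r)
        return out_l, out_r
```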
Step S23: binocular image reconstruction. After fusion feature extraction, the features are output to a 3×3 convolutional layer and a spatial attention enhancement module; a pixel-shuffle (pixel reorganization) operation then upsamples the output features to the high-resolution size, and a global residual path exploits the input binocular image information to further improve super-resolution performance, recovering the super-resolved images of the left and right views.
In the corresponding formulation, H_C(·), H_E(·), H_P(·) and H_↑(·) denote the convolution operation, the enhanced spatial attention module, pixel shuffle, and bilinear-interpolation upsampling, respectively.
The enhanced spatial attention module sends the given input to a 1×1 convolutional layer, where W_C(·) is a 1×1 convolution used to reduce the channel size of the input features. The block then uses strided convolution and strided max-pooling layers to reduce the spatial size. After a group of convolutions, upsampling based on bilinear interpolation is performed to restore the spatial size of the extracted features; combined with a residual connection, the features are further processed by a 1×1 convolutional layer that restores the channel size. Finally, the attention matrix is generated by a Sigmoid function and multiplied with the original input feature X″.
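For illustration, the following is a minimal PyTorch sketch of the reconstruction stage described above: an enhanced spatial attention block followed by pixel shuffle and a global bilinear residual path. Channel sizes, pooling parameters and the number of inner convolutions are assumptions.

```python
# Minimal sketch of enhanced spatial attention + pixel-shuffle reconstruction (assumed layout).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnhancedSpatialAttention(nn.Module):
    def __init__(self, channels: int, reduced: int = 16):
        super().__init__()
        self.reduce = nn.Conv2d(channels, reduced, 1)       # 1x1 conv, shrink channels
        self.stride_conv = nn.Conv2d(reduced, reduced, 3, stride=2, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=7, stride=3)    # strided max pooling
        self.conv_group = nn.Sequential(
            nn.Conv2d(reduced, reduced, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(reduced, reduced, 3, padding=1),
        )
        self.expand = nn.Conv2d(reduced, channels, 1)         # 1x1 conv, restore channels
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.reduce(x)
        z = self.conv_group(self.pool(self.stride_conv(y)))
        # Bilinear upsampling restores the spatial size; residual connection with y.
        z = F.interpolate(z, size=y.shape[-2:], mode='bilinear', align_corners=False)
        attn = self.sigmoid(self.expand(z + y))
        return x * attn

class Reconstruction(nn.Module):
    def __init__(self, channels: int, scale: int = 4):
        super().__init__()
        self.scale = scale
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.esa = EnhancedSpatialAttention(channels)
        self.to_rgb = nn.Conv2d(channels, 3 * scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)                  # pixel reorganization

    def forward(self, feat: torch.Tensor, lr_image: torch.Tensor) -> torch.Tensor:
        x = self.esa(self.conv(feat))
        sr = self.shuffle(self.to_rgb(x))
        # Global residual path: add the bilinearly upsampled low-resolution input.
        return sr + F.interpolate(lr_image, scale_factor=self.scale, mode='bilinear',
                                  align_corners=False)
```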
In a preferred embodiment, in step S3, the total loss is written as:
L = L_SR + λ·L_FFT,  (10)
where L_SR and L_FFT denote the L1 reconstruction loss and the frequency-domain loss (a frequency Charbonnier loss) respectively, and λ is a hyperparameter that controls the frequency Charbonnier loss; λ is set to 0.1 in all experiments.
SR reconstruction loss: the SR reconstruction loss is essentially an L1 loss function. It is computed as the pixel-wise L1 distance between the super-resolved left and right images generated by the model and their corresponding high-resolution ground-truth images, which avoids overly smooth textures and yields a higher PSNR.
Frequency-domain loss: a frequency Charbonnier loss is introduced as the frequency-domain loss. It is computed on the fast Fourier transforms FFT(·) of the super-resolved and ground-truth images, with the constant ε set experimentally to 10⁻³.
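For illustration, a minimal PyTorch sketch of the training objective described above is given below; the function names are illustrative assumptions.

```python
# Minimal sketch of the total loss: L = L_SR + lambda * L_FFT, lambda = 0.1.
import torch

def sr_reconstruction_loss(sr_l, sr_r, hr_l, hr_r):
    # Pixel-wise L1 distance between super-resolved and ground-truth views.
    return (sr_l - hr_l).abs().mean() + (sr_r - hr_r).abs().mean()

def frequency_charbonnier_loss(sr, hr, eps: float = 1e-3):
    # Charbonnier penalty on the difference of the 2D FFTs (real and imaginary parts).
    diff = torch.fft.fft2(sr) - torch.fft.fft2(hr)
    return torch.sqrt(diff.real ** 2 + diff.imag ** 2 + eps ** 2).mean()

def total_loss(sr_l, sr_r, hr_l, hr_r, lam: float = 0.1):
    l_sr = sr_reconstruction_loss(sr_l, sr_r, hr_l, hr_r)
    l_fft = frequency_charbonnier_loss(sr_l, hr_l) + frequency_charbonnier_loss(sr_r, hr_r)
    return l_sr + lam * l_fft
```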
In a preferred embodiment, in step S4, AdamW is used for optimization with β₁ = 0.9, β₂ = 0.9 and zero weight decay by default; the learning rate is initially set to 1×10⁻³ and reduced to 1×10⁻⁷ with a cosine annealing schedule; the model is trained on 30×90 patches; during training, each batch of 32 samples is evenly distributed over 8 parts, and training runs for 2×10⁵ iterations.
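For illustration, the optimizer and learning-rate schedule described above can be set up as in the following sketch; the model object and data loader are assumed to be defined elsewhere.

```python
# Minimal sketch of the optimizer / scheduler setup described in step S4.
import torch

def build_optimizer_and_scheduler(model, total_iters: int = 200_000):
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3,
                                  betas=(0.9, 0.9), weight_decay=0.0)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=total_iters, eta_min=1e-7)
    return optimizer, scheduler

# Typical usage inside the training loop (sketch):
# for step, (lr_l, lr_r, hr_l, hr_r) in zip(range(total_iters), loader):
#     sr_l, sr_r = model(lr_l, lr_r)
#     loss = total_loss(sr_l, sr_r, hr_l, hr_r)
#     optimizer.zero_grad(); loss.backward(); optimizer.step(); scheduler.step()
```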
Compared with the prior art, the present invention has the following beneficial effects. The present invention provides a binocular image super-resolution reconstruction method based on a multi-level enhanced attention mechanism. By constructing a network model based on a multi-level enhanced attention mechanism that integrates multiple attention mechanisms, the interaction between intra-view information and cross-view information is enhanced comprehensively and efficiently, the super-resolution information that is not fully exploited in the left and right views of binocular images is better extracted, and the receptive field is enlarged while the amount of computation is reduced. A new cross-attention module with an efficient channel attention mechanism achieves a good balance between effectiveness and efficient interaction. The fusion of channel features and spatial features propagates important information forward through the long-range dependencies between features, which effectively improves robustness and generalization, improves the super-resolution quality at certain edges, restores the natural texture of the image better, and obtains better super-resolution results with less computation.
Description of the drawings
Figure 1 is a flow chart of the binocular image super-resolution reconstruction method based on a multi-level enhanced attention mechanism according to a preferred embodiment of the present invention;
Figure 2 is a schematic diagram of the binocular super-resolution network structure according to a preferred embodiment of the present invention;
Figure 3 is a schematic diagram of the hybrid attention information extraction module according to a preferred embodiment of the present invention;
Figure 4 is a schematic diagram of the simplified channel-spatial attention module according to a preferred embodiment of the present invention;
Figure 5 is a schematic diagram of the cross-activation structure according to a preferred embodiment of the present invention;
Figure 6 is a schematic diagram of the binocular interactive view attention module according to a preferred embodiment of the present invention;
Figure 7 shows binocular image super-resolution results presented in a preferred embodiment of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings and embodiments.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present application. Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this application belongs.
It should also be noted that the terms used herein are only for describing specific embodiments and are not intended to limit the exemplary embodiments according to the present application. As used herein, the singular forms are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that when the terms "comprises" and/or "includes" are used in this specification, they indicate the presence of features, steps, operations, devices, components and/or combinations thereof.
The present invention provides a binocular image super-resolution method based on a multi-level enhanced attention mechanism, which includes the following five steps:
First, build a binocular image training set. An existing public binocular image super-resolution dataset is used as the training samples, and the low-resolution images are generated by bicubic downsampling. During the training phase, the generated low-resolution images are cropped into small patches, the corresponding high-resolution images are cropped accordingly, and these patches are randomly flipped horizontally and vertically to augment the training data.
Second, design the network structure. The overall network consists of three parts: intra-view feature extraction, interactive-view feature fusion, and binocular image reconstruction. In the intra-view feature extraction stage, the input binocular images are first passed through a convolutional layer to extract shallow features and generate high-dimensional features; the high-dimensional features are then fed into stacked multi-attention enhancement blocks for intra-view feature extraction, to obtain more local features and interaction information and recover more accurate texture details. In the cross-view feature fusion stage, in order to capture the cross information between the left and right views, a binocular interactive view attention module is used after the hybrid attention information extraction modules of the left and right branches; it takes the binocular features generated by the hybrid attention information extraction module as input, performs bidirectional cross-view interaction, and generates interaction features that are fused with the input features of each view. In the binocular image reconstruction stage, after fusion feature extraction, the features are output to a convolutional layer and a spatial attention enhancement module, and finally a pixel-shuffle operation upsamples the output feature tensor to recover the super-resolved left and right view images.
Third, construct the loss function. In order to enhance the texture details of the binocular images and maintain disparity consistency between viewpoints, the present invention combines an L1 reconstruction loss function with a frequency-domain loss function to strengthen supervision in the high-level feature space and constrain the training of the network. The total loss can be written as:
L = L_SR + λ·L_FFT,
where L_SR and L_FFT denote the L1 reconstruction loss function and the frequency Charbonnier loss function respectively, and λ is a hyperparameter that controls the frequency Charbonnier loss function, set to 0.1 based on prior experience.
Fourth, set the training parameters and train the network. An appropriate optimizer is selected, and parameters such as the loss function, learning rate, maximum number of iterations and batch size are set; the network is trained until training is complete and the final network weight model is obtained.
Fifth, test the network performance. Low-resolution binocular image pairs are used as test samples and fed into the network model trained in the previous step to obtain super-resolved binocular image pairs, and the super-resolution effect is verified with objective evaluation metrics and visual comparisons.
Figure 1 is a flow chart of the method of the present invention. Super-resolution processing of binocular images is carried out according to the following detailed steps.
Step S1: build a binocular image training set. An existing public binocular image super-resolution dataset is divided into a training set and a test set, and the low-resolution images are generated by bicubic downsampling. During the training phase, the generated low-resolution images are cropped into small patches, the corresponding high-resolution images are cropped accordingly, and these patches are randomly flipped horizontally and vertically to augment the training data.
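For illustration, a minimal sketch of the data preparation described in step S1 is given below; the 30×90 patch size follows the value mentioned later in this description, while the ×4 scale factor and the function name are assumptions.

```python
# Minimal sketch of the training-pair generation: bicubic downsampling, aligned
# random cropping, and random horizontal / vertical flips.
import random
import torch
import torch.nn.functional as F

def make_training_pair(hr_left: torch.Tensor, hr_right: torch.Tensor,
                       scale: int = 4, patch: int = 30):
    """hr_left / hr_right: (3, H, W) tensors of one high-resolution stereo pair
    (assumed large enough for a 30x90 low-resolution crop)."""
    # Bicubic downsampling generates the low-resolution views.
    lr_left = F.interpolate(hr_left[None], scale_factor=1 / scale, mode='bicubic',
                            align_corners=False)[0]
    lr_right = F.interpolate(hr_right[None], scale_factor=1 / scale, mode='bicubic',
                             align_corners=False)[0]
    # Random crop of a 30x90 LR patch and the corresponding HR patch.
    _, h, w = lr_left.shape
    y = random.randint(0, h - patch)
    x = random.randint(0, w - 3 * patch)
    lr_l = lr_left[:, y:y + patch, x:x + 3 * patch]
    lr_r = lr_right[:, y:y + patch, x:x + 3 * patch]
    hr_l = hr_left[:, y * scale:(y + patch) * scale, x * scale:(x + 3 * patch) * scale]
    hr_r = hr_right[:, y * scale:(y + patch) * scale, x * scale:(x + 3 * patch) * scale]
    # Random horizontal and vertical flips augment the training data.
    # (For rectified stereo pairs, a horizontal flip is often paired with swapping
    # the two views; that detail is omitted here for brevity.)
    if random.random() < 0.5:
        lr_l, lr_r, hr_l, hr_r = lr_l.flip(-1), lr_r.flip(-1), hr_l.flip(-1), hr_r.flip(-1)
    if random.random() < 0.5:
        lr_l, lr_r, hr_l, hr_r = lr_l.flip(-2), lr_r.flip(-2), hr_l.flip(-2), hr_r.flip(-2)
    return lr_l, lr_r, hr_l, hr_r
```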
Step S2: build and train the binocular super-resolution reconstruction network model based on the multi-level enhanced attention mechanism. As shown in Figure 2, the network takes a pair of low-resolution RGB binocular images as input and generates super-resolved binocular images. Specifically, the network contains two weight-sharing branches for the left and right views. In each weight-sharing branch, hybrid attention information extraction modules are stacked to extract the intra-view channel and spatial features of the left and right images, and the binocular interactive view attention module is used to capture the globally corresponding information and cross-view information extracted from the left and right images. In general, the network can be divided into three parts: intra-view feature extraction, interactive-view feature fusion, and binocular image reconstruction.
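For illustration, a high-level sketch of the two-branch, weight-sharing pipeline is given below; plain convolutions stand in for the multi-attention enhancement blocks (whose internals are described in the following steps), so all names and widths are assumptions.

```python
# High-level sketch of the weight-sharing two-branch pipeline (placeholder blocks).
import torch
import torch.nn as nn

class MultiAttentionEnhancedSSR(nn.Module):
    def __init__(self, channels: int = 48, num_blocks: int = 16, scale: int = 4):
        super().__init__()
        self.shallow = nn.Conv2d(3, channels, 3, padding=1)   # shared shallow feature extractor
        # Placeholder for the stacked multi-attention enhancement blocks
        # (hybrid attention extraction + binocular interactive view attention).
        self.blocks = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_blocks)])
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))
        self.scale = scale

    def forward(self, lr_l: torch.Tensor, lr_r: torch.Tensor):
        f_l, f_r = self.shallow(lr_l), self.shallow(lr_r)     # weights shared across views
        for blk in self.blocks:
            f_l, f_r = blk(f_l), blk(f_r)  # placeholder; real blocks also exchange cross-view info
        up = nn.functional.interpolate
        sr_l = self.upsample(f_l) + up(lr_l, scale_factor=self.scale, mode='bilinear',
                                       align_corners=False)
        sr_r = self.upsample(f_r) + up(lr_r, scale_factor=self.scale, mode='bilinear',
                                       align_corners=False)
        return sr_l, sr_r
```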
Step S2.1: intra-view feature extraction. In the feature extraction stage, the input binocular images are first passed through a 3×3 convolutional layer to extract shallow features and generate high-dimensional features, where C is the number of feature channels. The high-dimensional features are then fed into stacked multi-attention enhancement blocks for intra-view feature extraction, to obtain more local features and interaction information and recover more accurate texture details. Each multi-attention enhancement block contains a hybrid attention information extraction module and a binocular interactive view attention module.
As shown in Figure 3, the hybrid attention information extraction module is the basic module of the left and right branches of the network, and it extracts in-view features more deeply by capturing long-range and local dependencies. It consists of two sequentially connected modules: the first is a simplified channel and spatial information extraction module, and the second is a feed-forward network module with residual information aggregation. The computation of the two parts is as follows.
In the first module, after layer normalization, a 1×1 convolutional layer expands the channels of the input feature map, and the resulting output is passed through a 3×3 depth-wise convolution to capture the local context of each channel. A cross-activation structure A unit (as shown in Figure 4) is then used to further learn an effective representation of the spatial context. The next step is the simplified channel-spatial attention module, as shown in Figure 5, which makes full use of the channel attention and spatial attention mechanisms to filter out less useful information. Given the original input, average pooling and a 1×1 convolution are first applied to learn the inter-channel relationships of the feature map, realizing global spatial information aggregation and channel information interaction and outputting the simplified channel attention feature. The spatial information of the feature map is then aggregated by average pooling and max pooling, and a simplified spatial attention map is obtained by combining a 3×3 convolution with a Sigmoid function. Finally, the element-wise product of the input feature map and the output of the Sigmoid layer is taken as the output of the simplified spatial attention module. In the corresponding formulation, W_C(·) and H_AP(·) denote the 1×1 convolution and average pooling operations, H_AP,1(·) and H_MP,1(·) denote average pooling and max pooling along the first dimension, H_cat(·) denotes concatenation along dimension 1, σ(·) denotes the Sigmoid activation function, and ⊙ denotes element-wise multiplication.
After the simplified channel and spatial information extraction module, a 1×1 convolution maps the feature channels back to produce adaptive feature refinement, which gives the result of the first module.
In the second module, after normalizing the output of the previous module, a residual information aggregation feed-forward network containing a cross-activation structure B unit (shown in Figure 4) is used to improve local context awareness. Specifically, given an input tensor X′, a 1×1 convolutional layer first expands X′ to a higher dimension X′₁, where k is the expansion ratio. Next, a 3×3 depth-wise convolutional layer encodes the information of the neighboring pixel positions of X′₁, and the CAS-B unit is used as the activation function of the depth-wise convolutional layer, halving the number of feature channels in its output. Finally, a 1×1 convolutional layer remaps the result back to the initial input dimension, giving X′₂. In this process, W_C(·) and W_D(·) denote the 1×1 convolution and the 3×3 depth-wise convolution applied after layer normalization, and CAS-B(·) denotes the B unit of the cross-activation structure.
Finally, as in the first module, the input of this module is added to the output of the convolutional layer to give the final result.
Step S2.2: cross-view feature fusion. As shown in Figure 6, in order to capture the cross information between the left and right views, a binocular interactive view attention module is used after the hybrid attention information extraction modules of the left and right branches. It takes the binocular features generated in the previous step as input, performs bidirectional cross-view interaction, and generates interaction features that are fused with the input features of each view. Specifically, given the input binocular view features, layer normalization and a 1×1 convolution are applied to obtain the binocular features. Channel weights are then generated by performing a fast 1D convolution of size k, where k is determined adaptively from the channel dimension C, and the channel weights are multiplied element-wise with the binocular features to obtain the aggregated features. In this computation, a k×k convolutional layer and the max pooling operation are used, and ⊙ denotes element-wise multiplication.
By computing the attention matrix only once, F_R→L and F_L→R are generated simultaneously. Finally, the interacted cross-view information is fused with the intra-view information F_L and F_R by element-wise addition, where the query matrix is the projection of the features within the source view (e.g., the left view) and the key matrix is the projection of the features within the target view (e.g., the right view). Here γ_L and γ_R are trainable channel scaling factors initialized to zero to stabilize training, and the output projection is a 1×1 convolution.
Step S2.3: binocular image reconstruction. After fusion feature extraction, the features are output to a 3×3 convolutional layer and a spatial attention enhancement module, and a pixel-shuffle operation then upsamples the output features to the high-resolution size. In addition, in order to reduce the burden of feature extraction, a global residual path is used in this part to exploit the input binocular image information and further improve super-resolution performance, recovering the super-resolved images of the left and right views.
In the corresponding formulation, H_C(·), H_E(·), H_P(·) and H_↑(·) denote the convolution operation, the enhanced spatial attention module, pixel shuffle, and bilinear-interpolation upsampling, respectively.
The enhanced spatial attention module sends the given input to a 1×1 convolutional layer, where W_C(·) is a 1×1 convolution used to reduce the channel size of the input features. The block then uses strided convolution and strided max-pooling layers to reduce the spatial size. After a group of convolutions, upsampling based on bilinear interpolation is performed to restore the spatial size of the extracted features. Combined with a residual connection, the features are further processed by a 1×1 convolutional layer that restores the channel size. Finally, the attention matrix is generated by a Sigmoid function and multiplied with the original input feature X″.
Step S3: construct the loss function. In order to enhance the texture details of the binocular images and maintain disparity consistency between viewpoints, the present invention combines the L1 loss function with a frequency-domain loss function to strengthen supervision in the high-level feature space and constrain the training of the network. The total loss can be written as:
L = L_SR + λ·L_FFT,  (10)
where L_SR and L_FFT denote the L1 reconstruction loss and the frequency-domain loss (a frequency Charbonnier loss) respectively, and λ is a hyperparameter that controls the frequency Charbonnier loss; λ is set to 0.1 in all experiments.
SR reconstruction loss. The SR reconstruction loss is essentially an L1 loss function. To achieve faster convergence, the present invention uses the pixel-wise L1 distance between the super-resolved left and right images generated by the model and their corresponding high-resolution ground-truth images, which avoids overly smooth textures and thereby obtains a higher PSNR.
Frequency-domain loss. To better restore the high-frequency details in the image super-resolution task, the present invention introduces a frequency Charbonnier loss as the frequency-domain loss. It is computed on the fast Fourier transforms FFT(·) of the super-resolved and ground-truth images, with the constant ε set experimentally to 10⁻³.
Step S4: set the training parameters and train the network. AdamW is used for optimization with β₁ = 0.9, β₂ = 0.9 and zero weight decay by default. The learning rate is initially set to 1×10⁻³ and reduced to 1×10⁻⁷ with a cosine annealing schedule. The model is trained on 30×90 patches. During training, each batch of 32 samples is evenly distributed over 8 parts, and training runs for 2×10⁵ iterations.
Step S5: test the network performance. Low-resolution binocular image pairs are used as test samples and fed into the network model trained in the previous step to obtain super-resolved binocular image pairs, and the super-resolution effect is verified with objective evaluation metrics and visual comparisons.
To demonstrate the super-resolution effect, the experiments compare bicubic interpolation (Bicubic), an existing single-image super-resolution method (EDSR), and binocular image super-resolution methods (PASSRnet, SRResNet+SAM, iPASSR and NAFSSR-L).
The binocular super-resolution method proposed in the embodiment of the present invention is verified from both qualitative and quantitative perspectives.
2.1 Qualitative experimental results
In the embodiment of the present invention, super-resolution is performed on the images in the test set and the results are compared with those obtained by other methods. Figure 7 shows the super-resolution results. In each image in Figure 7, the box in the lower right corner shows the complete image, and the remainder is a local enlargement of one region of that image. Compared with other super-resolution methods, the method proposed in the embodiment of the present invention recovers more details related to edges and textures, which verifies that the embodiment of the present invention performs well on the binocular image super-resolution task.
2.2 Quantitative analysis
In the embodiment of the present invention, a quantitative error analysis is performed on the super-resolution results of 112 pairs of binocular images in the test set; the compared methods include bicubic interpolation (Bicubic), EDSR, and the PASSRnet, SRResNet+SAM, iPASSR and NAFSSR-L methods. Objective quality evaluation refers to quantitatively evaluating the target image through fixed mathematical formulas and judging image quality from the computed values. At present, the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) are the main objective evaluation metrics. The PSNR between an image I and an image K is computed as
PSNR = 10·log₁₀( MAX_I² / MSE ),  MSE = (1/(H·W)) Σₓ Σ_y ( I(x,y) − K(x,y) )²,
where H and W denote the height and width of images I and K, I(x,y) is the pixel value of image I at position (x,y), MAX_I = 2ᵇ − 1 is the peak pixel value of the image, and b is the number of bits per pixel, with b = 8 generally used in natural image processing. PSNR is measured in decibels (dB) and its value usually lies between 20 and 40; the larger the value, the smaller the pixel difference between the reconstructed image and the ground-truth image, which indicates better performance of the super-resolution model.
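For illustration, this PSNR can be computed as in the following sketch, assuming images normalized to [0, 1] so that the peak value is 1.

```python
# Minimal PSNR computation (peak = 1.0 for images in [0, 1], or 255 for 8-bit data).
import torch

def psnr(img: torch.Tensor, ref: torch.Tensor, peak: float = 1.0) -> float:
    mse = torch.mean((img - ref) ** 2)
    return float(10.0 * torch.log10(peak ** 2 / mse))
```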
Given a ground-truth image I and a reconstructed image K, SSIM is computed as
SSIM(I, K) = ( (2·μ_I·μ_K + c₁)(2·σ_IK + c₂) ) / ( (μ_I² + μ_K² + c₁)(σ_I² + σ_K² + c₂) ),
where μ_I and μ_K are the pixel means of I and K, σ_I² and σ_K² are the variances of I and K, σ_IK is the covariance of I and K, and c₁ and c₂ are small constants that stabilize the division. The value of SSIM lies between 0 and 1; the closer the value is to 1, the higher the overall similarity of the two images. SSIM is usually used together with PSNR as an objective quality evaluation metric.
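For illustration, a global (single-window) form of this SSIM can be computed as in the following sketch; practical implementations usually compute SSIM over local Gaussian windows and average the resulting map, and the constants follow the standard choice for a given dynamic range.

```python
# Minimal global SSIM computation (single window over the whole image).
import torch

def ssim_global(img: torch.Tensor, ref: torch.Tensor, data_range: float = 1.0) -> float:
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_i, mu_k = img.mean(), ref.mean()
    var_i, var_k = img.var(unbiased=False), ref.var(unbiased=False)
    cov = ((img - mu_i) * (ref - mu_k)).mean()
    return float(((2 * mu_i * mu_k + c1) * (2 * cov + c2)) /
                 ((mu_i ** 2 + mu_k ** 2 + c1) * (var_i + var_k + c2)))
```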
After testing the images in the test set with the different methods and averaging the results, the experimental results are as shown in Table 1.
Table 1. PSNR and SSIM of different super-resolution methods on the test set
As can be seen from the results in Table 1, the binocular image super-resolution method proposed in the embodiment of the present invention achieves an average peak signal-to-noise ratio of 24.21 dB and a structural similarity of 0.7633. Compared with other neural-network-based super-resolution methods, these values show that the super-resolution method proposed in the embodiment of the present invention achieves better results on the test set, and that exploiting the mapping relationship between the two views of the binocular images improves the super-resolution effect.
Figure 7 compares the reconstruction results of the present invention with the super-resolution reconstruction results of bicubic interpolation (Bicubic), an existing single-image super-resolution method (EDSR), and binocular image super-resolution methods (PASSRnet, SRResNet+SAM, iPASSR and NAFSSR-L). The present invention displays the edge lines relatively clearly, the background is clearly distinguished from the textured subject, and the super-resolution results are good.