CN116797461A - Binocular image super-resolution reconstruction method based on multi-level enhanced attention mechanism

Binocular image super-resolution reconstruction method based on multi-level enhanced attention mechanism

Info

Publication number
CN116797461A
CN202310853109.9A CN116797461A
Authority
CN
China
Prior art keywords
resolution
binocular
attention
module
view
Prior art date
Legal status
Pending
Application number
CN202310853109.9A
Other languages
Chinese (zh)
Inventor
吴靖
罗文武
黄峰
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN202310853109.9A
Publication of CN116797461A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a binocular image super-resolution reconstruction method based on a multi-level enhanced attention mechanism. During neural network training, multiple attention mechanisms are used for feature enhancement, the feature information within each view is fully exploited for fusion, and a frequency-domain loss function constrains the result in the frequency domain, strengthening the preservation of low-frequency information and of the overall image structure, so that the super-resolved binocular images are restored with better quality and with clearer textures and edge details.

Description

Binocular image super-resolution reconstruction method based on multi-level enhanced attention mechanism

Technical Field

The present invention relates to the technical field of binocular image super-resolution, and in particular to a binocular image super-resolution reconstruction method based on a multi-level enhanced attention mechanism.

Background Art

A very straightforward way to achieve binocular image super-resolution is to apply a single-image super-resolution algorithm to the left and right images separately. The attention mechanism is an important research direction in deep learning, and in the past few years several single-image super-resolution algorithms with superior performance have emerged, such as RCAN built on channel attention, PAN built on a pixel attention mechanism, SwinIR built on the Transformer self-attention mechanism, and MAN built on multi-scale large-kernel attention. However, reconstructing the two views of a binocular pair independently with single-image methods exploits only the self-similarity within each image to recover details and ignores the additional information that can be exploited across views, i.e., cross-view similarity, which limits further improvement of super-resolution performance. Fully exploiting cross-view information can therefore help reconstruct higher-quality super-resolved images, because one view may carry complementary information about the same scene region relative to the other view. Driven by practical demand, various super-resolution reconstruction techniques adapted to the times have been proposed, and binocular image super-resolution reconstruction has been adopted as a research foundation in many fields, so it has great value for applied research. This has led to PASSRnet built on parallax attention, iPASSR built on a bidirectional parallax attention mechanism, SwinFSR built on the Transformer self-attention mechanism, CVHSSR built on a large-kernel convolutional attention mechanism, and so on.

Although many attempts have been made to combine multiple attention mechanisms with binocular image super-resolution in order to extract more features from within each view and across views, most binocular super-resolution methods remain unsatisfactory at recovering the natural textures and edge details of images, and this is still an open problem. Therefore, how to effectively exploit the inter-view dependencies among different attention features, based on multiple attention mechanisms, to reconstruct super-resolved binocular images requires further exploration.

Summary of the Invention

In view of this, the purpose of the present invention is to provide a binocular image super-resolution reconstruction method based on a multi-level enhanced attention mechanism, which makes full use of the feature information within the views for fusion processing, and uses a frequency-domain loss function to constrain the frequency domain, strengthening the preservation of low-frequency information and the overall image structure, so that the super-resolved binocular images are restored with better quality and with clearer textures and edge details.

To achieve the above objective, the present invention adopts the following technical solution: a binocular image super-resolution reconstruction method based on a multi-level enhanced attention mechanism, comprising the following steps:

Step S1: establish a binocular image training set. Divide a binocular image super-resolution data set into a training set and a test set; the low-resolution images are generated by bicubic downsampling. In the training phase, the generated low-resolution images are cropped into small patches, the corresponding high-resolution images are cropped accordingly, and these patches are randomly flipped horizontally and vertically to augment the training data.

Step S2: establish and train a binocular super-resolution reconstruction network model based on a multi-level enhanced attention mechanism. The network takes a pair of low-resolution RGB binocular images as input and generates super-resolved binocular images.

Step S3: construct the loss function. An L1 loss function combined with a frequency-domain loss function is used to strengthen supervision in the high-level feature space and constrain the training of the network.

Step S4: set the training parameters and train the network.

Step S5: test network performance. Low-resolution binocular image pairs are used as test samples and fed into the network model trained in the previous step to obtain super-resolved binocular image pairs, and the super-resolution quality is checked with objective evaluation metrics and visual comparison.

In a preferred embodiment, the binocular super-resolution reconstruction network model based on the multi-level enhanced attention mechanism contains two weight-sharing left and right network branches. In each weight-sharing branch, hybrid attention information extraction modules are stacked to extract the intra-view channel and spatial features of the left and right images, and a binocular cross-view attention module is used to capture the globally corresponding information and cross-view information extracted from the left and right binocular images. The model is divided into three parts: intra-view feature extraction, cross-view feature fusion and binocular image reconstruction.

In a preferred embodiment, step S2 specifically includes the following steps:

Step S21: intra-view feature extraction. In the feature extraction stage, the input binocular images are first fed into a 3×3 convolutional layer to extract shallow features and generate high-dimensional features, where C is the number of feature channels. The high-dimensional features are then fed into stacked multi-attention enhancement blocks for intra-view feature extraction, in order to obtain more local features and interaction information and recover more accurate texture details. Each multi-attention enhancement block contains a hybrid attention information extraction module and a binocular cross-view attention module.

The hybrid attention information extraction module is the basic module of the left and right branches of the network; it extracts features within the view more deeply by capturing long-range and local dependencies. The hybrid attention information extraction module consists of two sequentially connected modules: the first is a simplified channel and spatial information extraction module, and the second is a residual information aggregation feed-forward network module. The two parts are computed as follows:

In the first module, after layer normalization, a 1×1 convolutional layer is used to expand the channels of the input feature map, and the resulting output is passed through a 3×3 depth-wise convolution to capture the local context of each channel; a cross-activation structure A unit is then used to further learn an effective representation of the spatial context. The next step is the simplified channel-spatial attention module, which makes full use of the channel attention mechanism and the spatial attention mechanism. Given the original input, average pooling and a 1×1 convolution are first used to learn the inter-channel relationships of the feature map of the given input image, realizing global spatial information aggregation and channel information interaction, and outputting the simplified channel attention feature X1. The spatial information of the feature map is then aggregated by average pooling and max pooling, and a simplified spatial attention map is obtained by combining a 3×3 convolution with a Sigmoid function. Finally, the element-wise product of the input feature map and the Sigmoid output is taken as the output X2 of the simplified spatial attention module. The simplified channel-spatial attention module is expressed as:

where WC(·) and HAP(·) denote the 1×1 convolution and average pooling operations respectively, HAP,1(·) and HMP,1(·) denote average pooling and max pooling along the first dimension, Hcat(·) denotes concatenation along dimension 1, σ(·) denotes the Sigmoid activation function, and ⊙ denotes element-wise multiplication;

After the simplified channel and spatial information extraction module, a 1×1 convolution is applied to the feature-map channels as an inverse transformation to produce adaptive feature refinement, giving the result of the first module. In the second module, after the output of the previous module is normalized, a residual information aggregation feed-forward network containing a cross-activation structure B unit is used to improve local context awareness. Specifically, given an input tensor X′, a 1×1 convolutional layer is first used to expand X′ to a higher dimension X′1, where k is the expansion ratio; next, a 3×3 depth-wise convolutional layer encodes the information of the adjacent pixel positions of X′1, the CAS-B unit is then used as the activation function of the depth-wise convolutional layer, and the number of feature channels is halved at the output; finally, a 1×1 convolutional layer remaps the result X′2 back to the initial input dimension.

The above process is expressed as:

where the operations are layer normalization, the 1×1 convolution WC(·) and the 3×3 depth-wise convolution WD(·), and CAS.B(·) denotes the B unit of the cross-activation structure;

Finally, as in the previous module, the input of this module and the output of the convolutional layer are added together as the final result.

Step S22: cross-view feature fusion. A binocular cross-view attention module is used after the hybrid attention information extraction modules of the left and right branches. The binocular cross-view attention module takes the binocular features generated in the previous step as input, performs bidirectional cross-view interaction, and generates interaction features that are fused with the input features of the view. Specifically, given the input binocular view features, layer normalization and a 1×1 convolution operation are applied to obtain the binocular features; a fast 1D convolution of size k is then performed to generate channel weights, where k is adaptively determined from a mapping of the channel dimension C, and the channel weights are multiplied element-wise with the binocular features to obtain the aggregated features, as follows:

where the two operators denote the k×k convolutional layer and the max pooling operation HMP(·) respectively, and ⊙ denotes element-wise multiplication;

By computing the attention matrix once, FR→L and FL→R are generated simultaneously. Finally, the interacted cross-view information is fused with the intra-view information FL and FR by element-wise addition, where the query matrix is projected from features within the source view (e.g., the left view) and the key matrix is projected from features within the target view (e.g., the right view); this is expressed as follows:

where γL and γR are trainable channel scaling factors initialized to zero to stabilize training, and the remaining operator is a 1×1 convolution;

Step S23: binocular image reconstruction. After fused-feature extraction, the features are output to a 3×3 convolutional layer and a spatial attention enhancement module; finally, a pixel-shuffle operation is used to upsample the output features to the high-resolution size, and a global residual path exploits the input binocular image information to further improve super-resolution performance, recovering the super-resolved left and right view images.

where HC(·), HE(·), HP(·) and H(·) denote the convolution operation, the enhanced spatial attention module, the pixel-shuffle operation and the bilinear-interpolation upsampling operation respectively;

The enhanced spatial attention module sends the given input to a 1×1 convolutional layer, where WC(·) is a 1×1 convolution that reduces the channel size of the input features; the block then uses strided convolution and strided max-pooling layers to reduce the spatial size; after a group of convolutions, bilinear-interpolation-based upsampling is performed to restore the spatial size in order to extract features; combined with a residual connection, the features are further processed by a 1×1 convolutional layer to restore the channel size; finally, an attention matrix is generated by a Sigmoid function and multiplied with the original input feature X″.

In a preferred embodiment, in step S3, the total loss is written as:

L = LSR + λLFFT,    (10)

where LSR and LFFT denote the L1 reconstruction loss and the frequency-domain loss (frequency Charbonnier loss) respectively, and λ is a hyperparameter controlling the frequency Charbonnier loss; λ is set to 0.1 in all experiments;

SR reconstruction loss: the SR reconstruction loss is essentially an L1 loss function; the pixel-level L1 distance between the super-resolved and ground-truth binocular images is used, which favors a higher PSNR; it is expressed as follows:

where the first pair of terms are the super-resolved left and right images generated by the model and the second pair are their corresponding high-resolution images;

Frequency-domain loss: a frequency Charbonnier loss is introduced as the frequency-domain loss; it is expressed as follows:

where the constant ε is empirically set to 10⁻³ and FFT(·) denotes the fast Fourier transform.

In a preferred embodiment, in step S4, AdamW is used for optimization, with β1 = 0.9, β2 = 0.9 and the weight decay set to 0 by default; the learning rate is initially set to 1×10⁻³ and decayed to 1×10⁻⁷ by a cosine annealing strategy; the model is trained on 30×90 patches; during training, each batch of 32 samples is evenly distributed over 8 parts, and training runs for 2×10⁵ iterations.

Compared with the prior art, the present invention has the following beneficial effects. The present invention provides a binocular image super-resolution reconstruction method based on a multi-level enhanced attention mechanism. By constructing a network model based on a multi-level enhanced attention mechanism that integrates multiple attention mechanisms, the method comprehensively and efficiently enhances the interaction between intra-view information and cross-view information, better extracts the super-resolution information in the left and right views of binocular images that would otherwise not be fully exploited, and enlarges the receptive field while reducing the amount of computation. A new cross-attention module built on an efficient channel attention mechanism achieves a good balance of efficient interaction. The fusion of channel features and spatial features propagates important information forward through the long-range dependencies between features, which effectively improves the robustness and generalization ability of the model, improves the super-resolution quality at certain edges, better restores the natural texture of images, and obtains better super-resolution results with less computation.

Brief Description of the Drawings

Figure 1 is a flow chart of the binocular image super-resolution reconstruction method based on a multi-level enhanced attention mechanism according to a preferred embodiment of the present invention;

Figure 2 is a schematic diagram of the binocular super-resolution image network structure according to a preferred embodiment of the present invention;

Figure 3 is a schematic diagram of the hybrid attention information extraction module according to a preferred embodiment of the present invention;

Figure 4 is a schematic diagram of the simplified channel-spatial attention module according to a preferred embodiment of the present invention;

Figure 5 is a schematic diagram of the cross-activation structure according to a preferred embodiment of the present invention;

Figure 6 is a schematic diagram of the binocular cross-view attention module according to a preferred embodiment of the present invention;

Figure 7 shows binocular image super-resolution results in a preferred embodiment of the present invention.

Detailed Description

The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present application. Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this application belongs.

It should be noted that the terms used herein are only for describing specific embodiments and are not intended to limit the exemplary embodiments according to the present application; as used herein, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms; furthermore, it should also be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components and/or combinations thereof.

The present invention provides a binocular image super-resolution method based on a multi-level enhanced attention mechanism, which includes the following steps:

First, establish a binocular image training set. An existing public binocular image super-resolution data set is used as training samples, and the low-resolution images are generated by bicubic downsampling. During the training phase, the generated low-resolution images are cropped into small patches, the corresponding high-resolution images are cropped accordingly, and these patches are randomly flipped horizontally and vertically to augment the training data.

Second, design the network structure. The overall network consists of three parts: intra-view feature extraction, cross-view feature fusion and binocular image reconstruction. In the intra-view feature extraction stage, the input binocular images are first fed into a convolutional layer to extract shallow features and generate high-dimensional features. The high-dimensional features are then fed into stacked multi-attention enhancement blocks for intra-view feature extraction, in order to obtain more local features and interaction information and recover more accurate texture details. In the cross-view feature fusion stage, in order to capture the cross information between the left and right views, a binocular cross-view attention module is used after the hybrid attention information extraction modules of the left and right branches. The binocular cross-view attention module uses the binocular features generated by the hybrid attention information extraction module in the previous step as input, performs bidirectional cross-view interaction, and generates interaction features fused with the input features of the view. In the binocular image reconstruction stage, after fused-feature extraction, the features are output to a convolutional layer and a spatial attention enhancement module, and finally a pixel-shuffle operation upsamples the output feature tensor to recover the super-resolved left and right view images.

Third, construct the loss function. In order to enhance the texture details of binocular images and maintain disparity consistency between viewpoints, the present invention combines an L1 reconstruction loss function with a frequency-domain loss function to strengthen supervision in the high-level feature space and constrain the training of the network. The total loss can be written as:

L = LSR + λLFFT,

where LSR and LFFT denote the L1 reconstruction loss function and the frequency Charbonnier loss function respectively, and λ is a hyperparameter controlling the frequency Charbonnier loss, set to 0.1 based on past experience.

Fourth, set the training parameters and train the network. Select an appropriate optimizer and set parameters such as the loss function, learning rate, maximum number of iterations and batch size; train the network until training is complete and the final network weight model is obtained.

Fifth, test network performance. Low-resolution binocular image pairs are used as test samples and fed into the network model trained in the previous step to obtain super-resolved binocular image pairs; the super-resolution quality is checked with objective evaluation metrics and visual comparison.

Figure 1 is a flow chart of the method of the present invention. Super-resolution processing of binocular images is performed according to the following detailed steps:

Step S1: establish a binocular image training set. An existing public binocular image super-resolution data set is divided into a training set and a test set, and the low-resolution images are generated by bicubic downsampling. During the training phase, the generated low-resolution images are cropped into small patches, the corresponding high-resolution images are cropped accordingly, and these patches are randomly flipped horizontally and vertically to augment the training data.
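
By way of illustration only, the following Python sketch shows one way this data-preparation step (bicubic downsampling, paired patch cropping and random flips) could be realized; the function name, scale factor and patch size are assumptions for the example and are not prescribed by the patent.

```python
import random
import torch
import torch.nn.functional as F

def make_training_pair(hr_left: torch.Tensor, hr_right: torch.Tensor,
                       scale: int = 4, patch: int = 32):
    """Sketch: build one augmented LR/HR pair from HR binocular tensors (C, H, W)."""
    hr = torch.stack([hr_left, hr_right])                      # (2, C, H, W)
    # Low-resolution images are generated by bicubic downsampling
    lr = F.interpolate(hr, scale_factor=1.0 / scale, mode='bicubic',
                       align_corners=False).clamp(0, 1)
    # Crop a random LR patch and the corresponding HR patch
    _, _, h, w = lr.shape
    y, x = random.randint(0, h - patch), random.randint(0, w - patch)
    lr_patch = lr[:, :, y:y + patch, x:x + patch]
    hr_patch = hr[:, :, y * scale:(y + patch) * scale, x * scale:(x + patch) * scale]
    # Random horizontal / vertical flips, applied consistently to both views
    if random.random() < 0.5:
        lr_patch, hr_patch = lr_patch.flip(-1), hr_patch.flip(-1)
    if random.random() < 0.5:
        lr_patch, hr_patch = lr_patch.flip(-2), hr_patch.flip(-2)
    return lr_patch, hr_patch
```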

Step S2: establish and train a binocular super-resolution reconstruction network model based on a multi-level enhanced attention mechanism. As shown in Figure 2, the network takes a pair of low-resolution RGB binocular images as input and generates super-resolved binocular images. Specifically, the network contains two weight-sharing left and right branches. In each weight-sharing branch, hybrid attention information extraction modules are stacked to extract the intra-view channel and spatial features of the left and right images. A binocular cross-view attention module is used to capture the globally corresponding information and cross-view information extracted from the left and right binocular images. In general, the model can be divided into three parts: intra-view feature extraction, cross-view feature fusion and binocular image reconstruction.

Step S2.1: intra-view feature extraction. In the feature extraction stage, the input binocular images are first fed into a 3×3 convolutional layer to extract shallow features and generate high-dimensional features, where C is the number of feature channels. The high-dimensional features are then fed into stacked multi-attention enhancement blocks for intra-view feature extraction, in order to obtain more local features and interaction information and recover more accurate texture details. Each multi-attention enhancement block contains a hybrid attention information extraction module and a binocular cross-view attention module.

As shown in Figure 3, the hybrid attention information extraction module is the basic module of the left and right branches of the network; it can extract features within the view more deeply by capturing long-range and local dependencies. The hybrid attention information extraction module consists of two sequentially connected modules: the first is a simplified channel and spatial information extraction module, and the second is a residual information aggregation feed-forward network module. The two parts are computed as follows:

In the first module, after layer normalization, a 1×1 convolutional layer is used to expand the channels of the input feature map, and the resulting output is passed through a 3×3 depth-wise convolution to capture the local context of each channel. A cross-activation structure A unit (shown in Figure 4) is then used to further learn an effective representation of the spatial context. The next step is the simplified channel-spatial attention module, shown in Figure 5, which makes full use of the channel attention mechanism and the spatial attention mechanism to filter out less useful information. Given the original input, average pooling and a 1×1 convolution are first used to learn the inter-channel relationships of the feature map of the given input image, realizing global spatial information aggregation and channel information interaction, and outputting the simplified channel attention feature. The spatial information of the feature map is then aggregated by average pooling and max pooling, and a simplified spatial attention map is obtained by combining a 3×3 convolution with a Sigmoid function. Finally, the element-wise product of the input feature map and the Sigmoid output is taken as the output of the simplified spatial attention module. The simplified channel-spatial attention module can be expressed as:

where WC(·) and HAP(·) denote the 1×1 convolution and average pooling operations respectively, HAP,1(·) and HMP,1(·) denote average pooling and max pooling along the first dimension, Hcat(·) denotes concatenation along dimension 1, σ(·) denotes the Sigmoid activation function, and ⊙ denotes element-wise multiplication.
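
For readability, a minimal PyTorch-style sketch of the simplified channel-spatial attention described above is given below; the class and variable names are illustrative assumptions, and the details follow the textual description rather than the figures, which are not reproduced here.

```python
import torch
import torch.nn as nn

class SimplifiedChannelSpatialAttention(nn.Module):
    """Minimal sketch: channel attention via global average pooling + 1x1 conv,
    followed by spatial attention via pooled maps + 3x3 conv + Sigmoid."""
    def __init__(self, channels: int):
        super().__init__()
        # Channel branch: global average pooling followed by a 1x1 convolution
        self.channel_fc = nn.Conv2d(channels, channels, kernel_size=1)
        # Spatial branch: 3x3 conv over concatenated average/max maps
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Simplified channel attention: aggregate global spatial information
        w = self.channel_fc(x.mean(dim=(2, 3), keepdim=True))
        x1 = x * w
        # Simplified spatial attention: average/max pooling along the channel dimension
        avg_map = x1.mean(dim=1, keepdim=True)
        max_map = x1.max(dim=1, keepdim=True).values
        attn = torch.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        x2 = x1 * attn  # element-wise product with the spatial attention map
        return x2
```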

After the simplified channel and spatial information extraction module, a 1×1 convolution is applied to the feature-map channels as an inverse transformation to produce adaptive feature refinement, giving the result of the first module.

In the second module, after the output of the previous module is normalized, a residual information aggregation feed-forward network containing a cross-activation structure B unit (shown in Figure 4) is used to improve local context awareness. Specifically, given an input tensor X′, a 1×1 convolutional layer is first used to expand X′ to a higher dimension, where k is the expansion ratio. Next, a 3×3 depth-wise convolutional layer encodes the information of the adjacent pixel positions of X′1, the CAS-B unit is then used as the activation function of the depth-wise convolutional layer, and the number of feature channels is halved at the output. Finally, a 1×1 convolutional layer remaps the result X′2 back to the initial input dimension. The above process can be expressed as:

where the operations are layer normalization, the 1×1 convolution WC(·) and the 3×3 depth-wise convolution WD(·), and CAS.B(·) denotes the B unit of the cross-activation structure.
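
The following is a minimal sketch of the residual information aggregation feed-forward network as described in the text; since the exact form of the CAS-B unit is only defined in the figures, a simple channel-halving gate is used here as a stand-in, which is an assumption.

```python
import torch
import torch.nn as nn

class ResidualAggregationFFN(nn.Module):
    """Minimal sketch: LayerNorm -> 1x1 expansion -> 3x3 depth-wise conv ->
    channel-halving gate (stand-in for the CAS-B unit) -> 1x1 projection,
    with a residual connection around the whole block."""
    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        hidden = channels * expansion
        self.norm = nn.GroupNorm(1, channels)            # channel-wise LayerNorm stand-in
        self.expand = nn.Conv2d(channels, hidden, 1)     # expand to k * C channels
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.project = nn.Conv2d(hidden // 2, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.dwconv(self.expand(self.norm(x)))
        # Stand-in gating: split channels in half and gate, so the activation
        # output has half as many channels as its input (assumed behavior)
        a, b = y.chunk(2, dim=1)
        y = a * torch.sigmoid(b)
        return x + self.project(y)                       # residual connection
```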

Finally, as in the previous module, the input of this module and the output of the convolutional layer are added together as the final result.

Step S2.2: cross-view feature fusion. As shown in Figure 6, in order to capture the cross information between the left and right views, a binocular cross-view attention module is used after the hybrid attention information extraction modules of the left and right branches. The binocular cross-view attention module takes the binocular features generated in the previous step as input, performs bidirectional cross-view interaction, and generates interaction features that are fused with the input features of the view. Specifically, given the input binocular view features, layer normalization and a 1×1 convolution operation are applied to obtain the binocular features. A fast 1D convolution of size k is then performed to generate channel weights, where k is adaptively determined from a mapping of the channel dimension C, and the channel weights are multiplied element-wise with the binocular features to obtain the aggregated features, as follows:

where the two operators denote the k×k convolutional layer and the max pooling operation respectively, and ⊙ denotes element-wise multiplication.

By computing the attention matrix once, FR→L and FL→R are generated simultaneously. Finally, the interacted cross-view information is fused with the intra-view information FL and FR by element-wise addition, where the query matrix is projected from features within the source view (e.g., the left view) and the key matrix is projected from features within the target view (e.g., the right view). This can be expressed as follows:

where γL and γR are trainable channel scaling factors initialized to zero to stabilize training, and the remaining operator is a 1×1 convolution.
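
A simplified sketch of the bidirectional cross-view attention is shown below. It assumes the attention matrix is computed per image row along the horizontal dimension, which is a common choice for stereo attention modules but is an assumption here; the module and projection names are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossViewAttention(nn.Module):
    """Minimal sketch: one attention matrix between left (query) and right (key)
    features yields both right-to-left and left-to-right interaction features,
    which are added back with zero-initialized, trainable channel scales."""
    def __init__(self, channels: int):
        super().__init__()
        self.norm_l = nn.GroupNorm(1, channels)
        self.norm_r = nn.GroupNorm(1, channels)
        self.q_proj = nn.Conv2d(channels, channels, 1)   # query from the source view
        self.k_proj = nn.Conv2d(channels, channels, 1)   # key from the target view
        self.v_l = nn.Conv2d(channels, channels, 1)
        self.v_r = nn.Conv2d(channels, channels, 1)
        self.gamma_l = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.gamma_r = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, f_l: torch.Tensor, f_r: torch.Tensor):
        b, c, h, w = f_l.shape
        q = self.q_proj(self.norm_l(f_l)).permute(0, 2, 3, 1)   # (B, H, W, C)
        k = self.k_proj(self.norm_r(f_r)).permute(0, 2, 1, 3)   # (B, H, C, W)
        v_l = self.v_l(f_l).permute(0, 2, 3, 1)
        v_r = self.v_r(f_r).permute(0, 2, 3, 1)
        # One attention matrix per image row; softmax in both directions
        attn = torch.matmul(q, k) / c ** 0.5                    # (B, H, W, W)
        f_r2l = torch.matmul(F.softmax(attn, dim=-1), v_r)      # right -> left
        f_l2r = torch.matmul(F.softmax(attn.transpose(-1, -2), dim=-1), v_l)  # left -> right
        f_r2l = f_r2l.permute(0, 3, 1, 2)
        f_l2r = f_l2r.permute(0, 3, 1, 2)
        # Fuse by element-wise addition with zero-initialized channel scales
        return f_l + self.gamma_l * f_r2l, f_r + self.gamma_r * f_l2r
```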

Step S2.3: binocular image reconstruction. After fused-feature extraction, the features are output to a 3×3 convolutional layer and a spatial attention enhancement module, and finally a pixel-shuffle operation upsamples the output features to the high-resolution size. In addition, to reduce the burden on feature extraction, a global residual path is used in this part to exploit the input binocular image information and further improve super-resolution performance, recovering the super-resolved left and right view images.

where HC(·), HE(·), HP(·) and H(·) denote the convolution operation, the enhanced spatial attention module, the pixel-shuffle operation and the bilinear-interpolation upsampling operation respectively.

The enhanced spatial attention module sends the given input to a 1×1 convolutional layer, where WC(·) is a 1×1 convolution that reduces the channel size of the input features. The block then uses strided convolution and strided max-pooling layers to reduce the spatial size. After a group of convolutions, bilinear-interpolation-based upsampling is performed to restore the spatial size in order to extract features. Combined with a residual connection, the features are further processed by a 1×1 convolutional layer to restore the channel size. Finally, an attention matrix is generated by a Sigmoid function and multiplied with the original input feature X″.
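
The reconstruction stage can be sketched as follows; the enhanced spatial attention module is replaced by a placeholder, and the upscaling factor is an assumed example value rather than one fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconstructionHead(nn.Module):
    """Minimal sketch of the reconstruction stage: 3x3 convolution, a placeholder
    spatial-attention enhancement, pixel-shuffle upsampling, and a global residual
    path that adds the bilinearly upsampled low-resolution input."""
    def __init__(self, channels: int, scale: int = 4):
        super().__init__()
        self.scale = scale
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.esa = nn.Identity()    # stand-in for the enhanced spatial attention module
        self.to_rgb = nn.Conv2d(channels, 3 * scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, feat: torch.Tensor, lr_image: torch.Tensor) -> torch.Tensor:
        x = self.esa(self.conv(feat))
        sr = self.shuffle(self.to_rgb(x))                           # upsample features
        base = F.interpolate(lr_image, scale_factor=self.scale,
                             mode='bilinear', align_corners=False)  # global residual path
        return sr + base
```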

Step S3: construct the loss function. In order to enhance the texture details of binocular images and maintain disparity consistency between viewpoints, the present invention combines an L1 loss function with a frequency-domain loss function to strengthen supervision in the high-level feature space and constrain the training of the network. The total loss can be written as:

L = LSR + λLFFT,    (10)

where LSR and LFFT denote the L1 reconstruction loss and the frequency-domain loss (frequency Charbonnier loss) respectively, and λ is a hyperparameter controlling the frequency Charbonnier loss. The parameter λ is set to 0.1 in all experiments.

SR reconstruction loss. The SR reconstruction loss is essentially an L1 loss function. To achieve faster convergence, the present invention uses the pixel-level L1 distance between the super-resolved and ground-truth binocular images, which avoids overly smooth textures and thereby yields a higher PSNR. It can be expressed as follows:

where the first pair of terms are the super-resolved left and right images generated by the model and the second pair are their corresponding high-resolution images.

Frequency-domain loss. To better recover high-frequency details in the image super-resolution task, the present invention introduces a frequency Charbonnier loss as the frequency-domain loss. It can be expressed as follows:

where the constant ε is empirically set to 10⁻³ and FFT(·) denotes the fast Fourier transform.
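
Since the loss formulas themselves appear only as figures, the following sketch shows one plausible reading of the combined loss: a pixel-level L1 term plus a Charbonnier penalty on the FFT difference, weighted by λ = 0.1 and ε = 10⁻³ as stated above. The exact form of the frequency Charbonnier loss in the patent may differ.

```python
import torch
import torch.nn.functional as F

def total_loss(sr_l, sr_r, hr_l, hr_r, lam=0.1, eps=1e-3):
    """Sketch of L = L_SR + lambda * L_FFT for a binocular pair."""
    # L1 reconstruction loss over both views
    l_sr = F.l1_loss(sr_l, hr_l) + F.l1_loss(sr_r, hr_r)

    # Frequency-domain Charbonnier loss over both views (one plausible form)
    def fft_charbonnier(sr, hr):
        diff = torch.fft.fft2(sr) - torch.fft.fft2(hr)
        return torch.mean(torch.sqrt(diff.real ** 2 + diff.imag ** 2 + eps ** 2))

    l_fft = fft_charbonnier(sr_l, hr_l) + fft_charbonnier(sr_r, hr_r)
    return l_sr + lam * l_fft
```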

Step S4: set the training parameters and train the network. AdamW is used for optimization, with β1 = 0.9, β2 = 0.9 and the weight decay set to 0 by default. The learning rate is initially set to 1×10⁻³ and decayed to 1×10⁻⁷ by a cosine annealing strategy. The model is trained on 30×90 patches. During training, each batch of 32 samples is evenly distributed over 8 parts, and training runs for 2×10⁵ iterations.
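
A minimal sketch of this training configuration (AdamW with β1 = β2 = 0.9, zero weight decay, cosine annealing from 1×10⁻³ to 1×10⁻⁷ over 2×10⁵ iterations) is given below; `net` is a placeholder standing in for the full network, and the loop body is abbreviated.

```python
import torch

# `net` is a placeholder for the full binocular super-resolution model
net = torch.nn.Conv2d(3, 3, 3, padding=1)
optimizer = torch.optim.AdamW(net.parameters(), lr=1e-3,
                              betas=(0.9, 0.9), weight_decay=0.0)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=200_000, eta_min=1e-7)   # decay 1e-3 -> 1e-7 over 2x10^5 steps

for step in range(200_000):
    # ... forward pass, compute total_loss(...), loss.backward() ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```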

Step S5: test network performance. Low-resolution binocular image pairs are used as test samples and fed into the network model trained in the previous step to obtain super-resolved binocular image pairs; the super-resolution quality is checked with objective evaluation metrics and visual comparison.

To demonstrate the super-resolution effect, the experiments compare bicubic interpolation (Bicubic), an existing single-image super-resolution method (EDSR) and binocular image super-resolution methods (PASSRnet, SRResNet+SAM, iPASSR and NAFSSR-L).

The binocular super-resolution method proposed in the embodiment of the present invention is validated from both qualitative and quantitative perspectives.

2.1 Qualitative experimental results

In the embodiment of the present invention, super-resolution is performed on the images in the test set and the results are compared with those obtained by other methods. Figure 7 shows the super-resolution results. In each image of Figure 7, the box in the lower-right corner shows the complete image, and the rest is a local enlargement of a region of that image. It can be seen that, compared with other super-resolution methods, the method proposed in the embodiment of the present invention recovers more edge- and texture-related details, which verifies that the embodiment of the present invention performs well on the binocular image super-resolution task.

2.2 Quantitative analysis

In the embodiment of the present invention, a quantitative error analysis is carried out on the super-resolution results of 112 binocular image pairs in the test set. The compared methods include bicubic interpolation, EDSR, PASSRnet, SRResNet+SAM, iPASSR and NAFSSR-L. Objective quality evaluation refers to quantitatively computing a score for the target image with fixed mathematical formulas and evaluating image quality from the computed value. At present, the peak signal-to-noise ratio PSNR (Peak Signal-to-Noise Ratio) and the structural similarity measure SSIM (Structural Similarity Index Measure) are the main objective evaluation metrics. The PSNR between an image I and an image K is computed as follows:

where H and W denote the height and width of images I and K respectively, the summed terms are the pixel values at position (x, y) in each image, the peak term is the maximum pixel value of the image, and b is the number of binary bits per pixel; in natural image processing, b = 8 is generally used. PSNR is measured in decibels (dB) and usually takes values between 20 and 40; a larger value indicates a smaller pixel-wise difference between the reconstructed image and the label image, and hence better performance of the super-resolution model.
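
As an illustration, the standard PSNR computation described above can be written as follows, assuming 8-bit images (peak value 255):

```python
import numpy as np

def psnr(img_i: np.ndarray, img_k: np.ndarray, bits: int = 8) -> float:
    """Standard PSNR between a reference image I and a reconstruction K."""
    peak = 2 ** bits - 1                      # pixel peak value, 255 for 8-bit images
    mse = np.mean((img_i.astype(np.float64) - img_k.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')                   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```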

Given a label image I and a reconstructed image K, SSIM is computed as follows:

where μI and μK are the pixel means of I and K respectively, σI and σK are the variances of I and K respectively, and σIK is the covariance of I and K. SSIM takes values between 0 and 1, and a value closer to 1 indicates a higher overall similarity between the two images. SSIM is usually used together with PSNR as an objective quality evaluation metric.
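
Likewise, a global SSIM following the standard definition can be sketched as below; the stabilizing constants C1 and C2 are the conventional choices and are an assumption here, since the formula figure is not reproduced in the text:

```python
import numpy as np

def ssim(img_i: np.ndarray, img_k: np.ndarray, bits: int = 8) -> float:
    """Global SSIM between images I and K, using the usual stabilizing constants."""
    peak = 2 ** bits - 1
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    i = img_i.astype(np.float64)
    k = img_k.astype(np.float64)
    mu_i, mu_k = i.mean(), k.mean()
    var_i, var_k = i.var(), k.var()
    cov_ik = ((i - mu_i) * (k - mu_k)).mean()
    return ((2 * mu_i * mu_k + c1) * (2 * cov_ik + c2) /
            ((mu_i ** 2 + mu_k ** 2 + c1) * (var_i + var_k + c2)))
```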

After the images in the test set are processed with the different methods and the scores are averaged, the experimental results are as shown in Table 1:

Table 1. Comparison of PSNR and SSIM of different super-resolution methods on the test set

As can be seen from the results in Table 1, the binocular image super-resolution method proposed in the embodiment of the present invention achieves an average peak signal-to-noise ratio of 24.21 dB and a structural similarity of 0.7633. Compared with other super-resolution methods based on neural networks, these values show that the method proposed in the embodiment of the present invention obtains better results on the test set, and that exploiting the mapping relationship between the binocular views improves the super-resolution quality.

Figure 7 compares the reconstruction results of the present invention with the super-resolution reconstruction results of bicubic interpolation (Bicubic), an existing single-image super-resolution method (EDSR) and binocular image super-resolution methods (PASSRnet, SRResNet+SAM, iPASSR and NAFSSR-L). The present invention displays edge lines relatively clearly, the background is clearly distinguished from the textured subject, and the super-resolution results are good.

Claims (5)

1. A binocular image super-resolution reconstruction method based on a multi-level enhanced attention mechanism, characterized by comprising the following steps:
step S1, establishing a binocular image training set; dividing a binocular image super-resolution data set into a training set and a test set, wherein the low-resolution images are generated by bicubic downsampling; in the training phase, the generated low-resolution images are cropped into small patches, the corresponding high-resolution images are cropped accordingly, and the patches are randomly flipped horizontally and vertically to augment the training data;
step S2, establishing and training a binocular super-resolution reconstruction network model based on a multi-level enhanced attention mechanism; the network takes a pair of low-resolution RGB binocular images as input to generate super-resolved binocular images;
step S3, constructing a loss function; an L1 loss function combined with a frequency-domain loss function is used to strengthen the supervision of the high-level feature space and constrain the training of the network;
step S4, setting training parameters and performing network training;
step S5, testing network performance; taking low-resolution binocular image pairs as test samples, inputting them into the network model trained in the previous step to obtain super-resolved binocular image pairs, and checking the super-resolution effect of the binocular images by comparison using objective evaluation metrics and visual effects.
2. The binocular image super-resolution reconstruction method based on a multi-level enhanced attention mechanism according to claim 1, wherein the binocular super-resolution reconstruction network model based on the multi-level enhanced attention mechanism comprises two weight-sharing left and right network branches; in each weight-sharing branch, hybrid attention information extraction modules are stacked to extract intra-view channel and spatial features of the left and right images; a binocular cross-view attention module is used to capture the globally corresponding information and cross-view information extracted from the left and right binocular images; the method comprises three parts: intra-view feature extraction, cross-view feature fusion and binocular image reconstruction.
3. The binocular image super-resolution reconstruction method based on a multi-level enhanced attention mechanism according to claim 2, wherein step S2 specifically comprises the following steps:
step S21, intra-view feature extraction; in the feature extraction stage, the input binocular images are first fed into a 3×3 convolutional layer to extract shallow features and generate high-dimensional features, where C is the number of feature channels; the high-dimensional features are then fed into stacked multi-attention enhancement blocks for intra-view feature extraction, so as to obtain more local features and interaction information and restore more accurate texture details; each multi-attention enhancement block comprises a hybrid attention information extraction module and a binocular cross-view attention module;
the hybrid attention information extraction module is the basic module of the left and right branches of the network, and extracts features within the view more deeply by capturing long-range and local dependencies; the hybrid attention information extraction module consists of two sequentially connected modules; the first is a simplified channel and spatial information extraction module, and the second is a residual information aggregation feed-forward network module; the two parts are computed as follows:
in the first module, after layer normalization, a 1×1 convolutional layer is used to expand the channels of the input feature map, and the resulting output is passed through a 3×3 depth-wise convolution to capture the local context of each channel; a cross-activation structure A unit is then used to further learn an effective representation of the spatial context; the next step is the simplified channel-spatial attention module, which makes full use of the channel attention mechanism and the spatial attention mechanism: given the original input, average pooling and a 1×1 convolution are first used to learn the inter-channel relationships of the feature map of the given input image, realizing global spatial information aggregation and channel information interaction, and outputting the simplified channel attention feature X1; the spatial information of the feature map is aggregated by average pooling and max pooling, and a simplified spatial attention map is obtained by combining a 3×3 convolution with a Sigmoid function; finally, the element-wise product of the input feature map and the Sigmoid output is taken as the output X2 of the simplified spatial attention module; the simplified channel-spatial attention module is expressed as:
wherein WC(·) and HAP(·) denote the 1×1 convolution and average pooling operations respectively, HAP,1(·) and HMP,1(·) denote average pooling and max pooling along the first dimension, Hcat(·) denotes concatenation along dimension 1, σ(·) denotes the Sigmoid activation function, and ⊙ denotes element-wise multiplication;
after the simplified channel and spatial information extraction module, a 1×1 convolution inverse transformation is performed on the feature-map channels to produce adaptive feature refinement, obtaining the result of the first module;
in the second module, after the output of the previous module is normalized, a residual information aggregation feed-forward network containing a cross-activation structure B unit is used to improve local context awareness; specifically, given an input tensor X′, a 1×1 convolutional layer is first used to expand X′ to a higher dimension X′1, wherein k is the expansion ratio; next, a 3×3 depth-wise convolutional layer encodes the information of the adjacent pixel positions of X′1, the CAS-B unit is then used as the activation function of the depth-wise convolutional layer, and the number of feature channels is halved at the output; finally, a 1×1 convolutional layer remaps the result back to the initial input dimension X′2;
The above process is expressed as:
wherein the operations are layer normalization, the 1×1 convolution WC(·) and the 3×3 depth-wise convolution WD(·), and CAS.B(·) denotes the B unit of the cross-activation structure;
finally, as in the previous module, the input of this module and the output of the convolutional layer are added as the final result;
step S22, cross view feature fusion, wherein a binocular interaction view attention module is used after a left and right branch mixed attention information extraction module; the binocular interaction view attention module uses the binocular characteristics generated in the previous step as input to perform bidirectional cross-view interaction and generates interaction characteristics fused with the view input characteristics; specifically, given an input binocular view characteristicThe binocular feature is obtained by layer normalization and 1X 1 convolution operation>And->Wherein->Is a 1 x 1 convolution; then, a channel weight ++is generated by performing a fast 1D convolution of size k>Wherein k is adaptively determined by mapping of the channel dimension C and the channel weight is multiplied element by the binocular feature to obtain the aggregate feature +.>The following is shown:
in the method, in the process of the application,H MP (. Cndot.) represents the k x k convolutional layer, max pooling operation, respectively, Θ represents element-by-element multiplication;
by computing the cross-attention matrices, $F_{R\to L}$ and $F_{L\to R}$ are generated simultaneously; finally, the interactive cross-view information and the intra-view information $F_L$, $F_R$ are fused together by element-wise addition, where the query matrix $Q$ is projected from the features of the source view (e.g., the left view) and the key and value matrices $K$, $V$ are projected from the features of the target view (e.g., the right view); the expression is as follows:

$F_{R\to L} = \mathrm{softmax}\!\left(\frac{Q_L K_R^{\top}}{\sqrt{C}}\right) V_R, \quad F_{L\to R} = \mathrm{softmax}\!\left(\frac{Q_R K_L^{\top}}{\sqrt{C}}\right) V_L$

$X_L^{out} = F_L + \gamma_L\, W_C(F_{R\to L}), \quad X_R^{out} = F_R + \gamma_R\, W_C(F_{L\to R})$
where $\gamma_L$ and $\gamma_R$ are trainable channel scaling factors initialized to zero to stabilize training, and $W_C(\cdot)$ is a 1×1 convolution;
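A hedged sketch of the binocular interaction view attention module follows. The width-wise (epipolar) arrangement of the cross-attention, the ECA-style rule for the 1D kernel size k, the Sigmoid on the channel weights, and all class and variable names are assumptions for illustration only.

```python
import math
import torch
import torch.nn as nn


class BinocularCrossAttention(nn.Module):
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        self.norm_l = nn.GroupNorm(1, channels)        # stand-in for LayerNorm
        self.norm_r = nn.GroupNorm(1, channels)
        self.proj_l = nn.Conv2d(channels, channels, 1)
        self.proj_r = nn.Conv2d(channels, channels, 1)
        # Kernel size k adaptively determined from the channel dimension C (ECA rule, assumed).
        k = int(abs((math.log2(channels) + b) / gamma))
        k = k if k % 2 else k + 1
        self.eca = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.out_l = nn.Conv2d(channels, channels, 1)
        self.out_r = nn.Conv2d(channels, channels, 1)
        # Trainable channel scaling, initialized to zero to stabilize training.
        self.gamma_l = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.gamma_r = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def _channel_weight(self, f: torch.Tensor) -> torch.Tensor:
        # Max-pool spatially, then a fast 1D convolution over the channel dimension.
        w = f.amax(dim=(2, 3))                         # (B, C)
        w = self.eca(w.unsqueeze(1)).squeeze(1)        # (B, C)
        return f * w.sigmoid()[:, :, None, None]

    def forward(self, x_l: torch.Tensor, x_r: torch.Tensor):
        f_l = self.proj_l(self.norm_l(x_l))
        f_r = self.proj_r(self.norm_r(x_r))
        t_l, t_r = self._channel_weight(f_l), self._channel_weight(f_r)
        b, c, h, w = t_l.shape
        # Attention along the width (epipolar) dimension.
        q_l = t_l.permute(0, 2, 3, 1).reshape(b * h, w, c)     # left queries
        k_r = t_r.permute(0, 2, 1, 3).reshape(b * h, c, w)     # right keys/values
        attn_rl = torch.softmax(q_l @ k_r / math.sqrt(c), dim=-1)
        f_r2l = (attn_rl @ k_r.transpose(1, 2)).reshape(b, h, w, c).permute(0, 3, 1, 2)
        attn_lr = torch.softmax(k_r.transpose(1, 2) @ q_l.transpose(1, 2) / math.sqrt(c), dim=-1)
        f_l2r = (attn_lr @ q_l).reshape(b, h, w, c).permute(0, 3, 1, 2)
        # Fuse cross-view and intra-view information by element-wise addition.
        out_l = f_l + self.gamma_l * self.out_l(f_r2l)
        out_r = f_r + self.gamma_r * self.out_r(f_l2r)
        return out_l, out_r


if __name__ == "__main__":
    m = BinocularCrossAttention(channels=64)
    o_l, o_r = m(torch.randn(1, 64, 30, 90), torch.randn(1, 64, 30, 90))
    print(o_l.shape, o_r.shape)
```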
Step S23, binocular image reconstruction: after the fused features are extracted, they are fed to a 3×3 convolutional layer and an enhanced spatial attention module; finally, a pixel-shuffle operation upsamples the output features to the high-resolution size, and a global residual path exploits the input binocular images to further improve super-resolution performance, recovering the left- and right-view super-resolved images:

$I_L^{SR} = H_P\big(H_E(H_C(F_L))\big) + H_{\uparrow}(I_L^{LR}), \quad I_R^{SR} = H_P\big(H_E(H_C(F_R))\big) + H_{\uparrow}(I_R^{LR})$
where $H_C(\cdot)$, $H_E(\cdot)$, $H_P(\cdot)$, and $H_{\uparrow}(\cdot)$ denote the convolution operation, the enhanced spatial attention module, the pixel-shuffle upsampling operation, and bilinear interpolation upsampling, respectively;
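The sketch below illustrates this reconstruction step. The extra channel-expanding convolution before the pixel shuffle is an assumption needed to produce a 3-channel image at the target scale; the enhanced spatial attention module (described next) is passed in as a sub-module, with an identity placeholder used here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReconstructionHead(nn.Module):
    def __init__(self, channels: int, scale: int, esa: nn.Module = None):
        super().__init__()
        self.scale = scale
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)             # H_C
        self.esa = esa if esa is not None else nn.Identity()                # H_E (placeholder)
        self.to_rgb = nn.Conv2d(channels, 3 * scale * scale, 3, padding=1)  # assumed expansion
        self.shuffle = nn.PixelShuffle(scale)                               # H_P

    def forward(self, feat: torch.Tensor, lr_img: torch.Tensor) -> torch.Tensor:
        x = self.esa(self.conv(feat))
        sr = self.shuffle(self.to_rgb(x))
        # Global residual path: bilinearly upsampled low-resolution input.
        up = F.interpolate(lr_img, scale_factor=self.scale,
                           mode="bilinear", align_corners=False)
        return sr + up


if __name__ == "__main__":
    head = ReconstructionHead(channels=64, scale=4)
    sr = head(torch.randn(1, 64, 30, 90), torch.randn(1, 3, 30, 90))
    print(sr.shape)  # torch.Size([1, 3, 120, 360])
```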
the enhanced spatial attention module sends the given input $X'' \in \mathbb{R}^{H\times W\times C}$ to a 1×1 convolutional layer to obtain $X''_1 = W_C(X'')$, where $W_C(\cdot)$ is a 1×1 convolution that reduces the channel dimension of the input features; the block then uses a strided convolution and a strided max pooling layer to reduce the spatial size; after a set of convolutions used to extract features, bilinear interpolation upsampling recovers the spatial size; combined with a residual connection, the features are further processed and a 1×1 convolutional layer recovers the channel dimension; finally, an attention matrix is generated by a Sigmoid function and multiplied with the original input feature $X''$.
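A sketch of the enhanced spatial attention block is given below, under the assumptions that the strided convolution uses stride 2, the strided max pooling uses a 7x7 kernel with stride 3, and the channel reduction ratio is 4; none of these values are stated in the text and the class name ESA is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ESA(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = channels // reduction
        self.reduce = nn.Conv2d(channels, mid, 1)                  # 1x1: shrink channels
        self.stride_conv = nn.Conv2d(mid, mid, 3, stride=2)        # strided convolution
        self.pool = nn.MaxPool2d(kernel_size=7, stride=3)          # strided max pooling
        self.group = nn.Sequential(                                # a set of 3x3 convolutions
            nn.Conv2d(mid, mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1),
        )
        self.skip = nn.Conv2d(mid, mid, 1)                         # residual branch
        self.expand = nn.Conv2d(mid, channels, 1)                  # 1x1: recover channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        r = self.reduce(x)
        y = self.group(self.pool(self.stride_conv(r)))
        # Bilinear upsampling back to the input spatial size.
        y = F.interpolate(y, size=x.shape[-2:], mode="bilinear", align_corners=False)
        attn = torch.sigmoid(self.expand(y + self.skip(r)))
        return x * attn


if __name__ == "__main__":
    esa = ESA(channels=64)
    print(esa(torch.randn(1, 64, 30, 90)).shape)  # torch.Size([1, 64, 30, 90])
```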
4. The binocular image super-resolution reconstruction method based on the multi-level enhanced attention mechanism of claim 1, wherein in step S3 the total loss is written as:
$L = L_{SR} + \lambda L_{FFT}$,
where $L_{SR}$ and $L_{FFT}$ denote the $L_1$ reconstruction loss and the frequency-domain loss (frequency Charbonnier loss), respectively, and $\lambda$ is a hyperparameter controlling the weight of the frequency Charbonnier loss; $\lambda$ is set to 0.1 in all experiments;
SR reconstruction loss: the SR reconstruction loss is essentially an $L_1$ loss function; the pixel-level $L_1$ distance between the super-resolved and ground-truth binocular images is used to obtain a high PSNR; the expression is as follows:

$L_{SR} = \big\|I_L^{SR} - I_L^{HR}\big\|_1 + \big\|I_R^{SR} - I_R^{HR}\big\|_1$
where $I_L^{SR}$ and $I_R^{SR}$ are the super-resolved left and right images generated by the model, respectively, and $I_L^{HR}$ and $I_R^{HR}$ are the corresponding high-resolution ground-truth images;
Frequency-domain loss: a frequency Charbonnier loss is introduced as the frequency-domain loss; the expression is as follows:

$L_{FFT} = \sqrt{\big\|\mathrm{FFT}(I_L^{SR}) - \mathrm{FFT}(I_L^{HR})\big\|^2 + \varepsilon^2} + \sqrt{\big\|\mathrm{FFT}(I_R^{SR}) - \mathrm{FFT}(I_R^{HR})\big\|^2 + \varepsilon^2}$
where the constant $\varepsilon$ is empirically set to $10^{-3}$, and $\mathrm{FFT}(\cdot)$ denotes the fast Fourier transform.
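The two losses and their combination can be sketched as follows; the exact reduction (a mean over pixels and a sum over the two views) is an assumption, and the function names are illustrative.

```python
import torch


def sr_reconstruction_loss(sr_l, sr_r, hr_l, hr_r):
    """Pixel-level L1 distance between super-resolved and ground-truth binocular images."""
    return (sr_l - hr_l).abs().mean() + (sr_r - hr_r).abs().mean()


def frequency_charbonnier_loss(sr_l, sr_r, hr_l, hr_r, eps: float = 1e-3):
    """Charbonnier loss between the 2D FFTs of the super-resolved and ground-truth images."""
    loss = 0.0
    for sr, hr in ((sr_l, hr_l), (sr_r, hr_r)):
        diff = torch.fft.fft2(sr) - torch.fft.fft2(hr)
        loss = loss + torch.sqrt(diff.abs() ** 2 + eps ** 2).mean()
    return loss


def total_loss(sr_l, sr_r, hr_l, hr_r, lam: float = 0.1):
    # L = L_SR + lambda * L_FFT
    return (sr_reconstruction_loss(sr_l, sr_r, hr_l, hr_r)
            + lam * frequency_charbonnier_loss(sr_l, sr_r, hr_l, hr_r))


if __name__ == "__main__":
    sr_l, sr_r = torch.rand(1, 3, 120, 360), torch.rand(1, 3, 120, 360)
    hr_l, hr_r = torch.rand(1, 3, 120, 360), torch.rand(1, 3, 120, 360)
    print(total_loss(sr_l, sr_r, hr_l, hr_r).item())
```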
5. The binocular image super-resolution reconstruction method based on the multi-level enhanced attention mechanism of claim 1, wherein in step S4 the AdamW optimizer is adopted, with $\beta_1 = 0.9$, $\beta_2 = 0.9$, and the weight decay left at its default value of 0; the learning rate is initially set to $1\times10^{-3}$ and reduced to $1\times10^{-7}$ by a cosine annealing strategy; the model is trained on 30×90 patches; during training, each batch of 32 samples is evenly distributed over 8 parts, and training runs for $2\times10^{5}$ iterations.
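A minimal sketch of this training configuration with PyTorch's AdamW optimizer and cosine annealing scheduler follows; the model, the data, and the loss are placeholders, and only the hyperparameters stated in the claim are taken from the text.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)           # placeholder for the binocular SR network
total_iters = 200_000                            # 2 x 10^5 iterations

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3,
                              betas=(0.9, 0.9), weight_decay=0.0)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                       T_max=total_iters, eta_min=1e-7)

for step in range(3):                            # replace with the real data loop
    lr_batch = torch.rand(32, 3, 30, 90)         # batch of 32 low-resolution 30x90 patches
    sr = model(lr_batch)
    loss = (sr - torch.rand_like(sr)).abs().mean()   # stand-in for the total loss L
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```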
CN202310853109.9A 2023-07-12 2023-07-12 Binocular image super-resolution reconstruction method based on multi-level enhanced attention mechanism Pending CN116797461A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310853109.9A CN116797461A (en) 2023-07-12 2023-07-12 Binocular image super-resolution reconstruction method based on multi-level enhanced attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310853109.9A CN116797461A (en) 2023-07-12 2023-07-12 Binocular image super-resolution reconstruction method based on multi-level enhanced attention mechanism

Publications (1)

Publication Number Publication Date
CN116797461A true CN116797461A (en) 2023-09-22

Family

ID=88036723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310853109.9A Pending CN116797461A (en) 2023-07-12 2023-07-12 Binocular image super-resolution reconstruction method based on multi-level enhanced attention mechanism

Country Status (1)

Country Link
CN (1) CN116797461A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274316A (en) * 2023-10-31 2023-12-22 广东省水利水电科学研究院 A method, device, equipment and storage medium for estimating river surface flow velocity
CN117274316B (en) * 2023-10-31 2024-05-03 广东省水利水电科学研究院 River surface flow velocity estimation method, device, equipment and storage medium
CN117788296A (en) * 2024-02-23 2024-03-29 北京理工大学 Super-resolution reconstruction method of infrared remote sensing images based on heterogeneous combined deep network
CN117788296B (en) * 2024-02-23 2024-05-07 北京理工大学 Infrared remote sensing image super-resolution reconstruction method based on heterogeneous combined depth network
CN118212476A (en) * 2024-05-20 2024-06-18 山东云海国创云计算装备产业创新中心有限公司 Image classification method, product and storage medium
CN118608389A (en) * 2024-05-24 2024-09-06 四川新视创伟超高清科技有限公司 Real-time dynamic super-resolution image reconstruction method and reconstruction system
CN118297808A (en) * 2024-06-06 2024-07-05 山东大学 Binocular image super-resolution reconstruction method and system based on parallax guidance
CN118297808B (en) * 2024-06-06 2024-08-13 山东大学 Binocular image super-resolution reconstruction method and system based on parallax guidance

Similar Documents

Publication Publication Date Title
CN116797461A (en) Binocular image super-resolution reconstruction method based on multi-level enhanced attention mechanism
CN109671022B (en) A super-resolution method for image texture enhancement based on deep feature translation network
Wen et al. Image recovery via transform learning and low-rank modeling: The power of complementary regularizers
CN110738605A (en) Image denoising method, system, device and medium based on transfer learning
CN115409733A (en) Low-dose CT image noise reduction method based on image enhancement and diffusion model
CN107358576A (en) Depth map super resolution ratio reconstruction method based on convolutional neural networks
CN103077506B (en) In conjunction with local and non-local adaptive denoising method
CN113706386A (en) Super-resolution reconstruction method based on attention mechanism
CN112819737A (en) Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution
CN113362241B (en) Depth map denoising method combining high-low frequency decomposition and two-stage fusion strategy
CN117726540A (en) An image denoising method that enhances gated Transformer
CN115393186A (en) Face image super-resolution reconstruction method, system, device and medium
CN115311144A (en) Wavelet domain-based standard flow super-resolution image reconstruction method
Chen et al. Attentional coarse-and-fine generative adversarial networks for image inpainting
CN116630964A (en) A Food Image Segmentation Method Based on Discrete Wavelet Attention Network
CN113627487A (en) Super-resolution reconstruction method based on deep attention mechanism
Yuan et al. Unsupervised real image super-resolution via knowledge distillation network
Kim et al. Infrared and visible image fusion using a guiding network to leverage perceptual similarity
Xu et al. Depth map super-resolution via joint local gradient and nonlocal structural regularizations
Zou et al. EDCNN: A novel network for image denoising
CN115293966A (en) Face image reconstruction method and device and storage medium
CN113486928B (en) Multi-view image alignment method based on rational polynomial model differentiable tensor expression
Li et al. High fidelity single image blind deblur via GAN
Feng et al. Research on Low Resolution Digital Image Reconstruction Method Based on Rational Function Model.
CN111489306A (en) Image denoising method based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination