CN111833386A - A Pyramid Binocular Stereo Matching Method Based on Multi-scale Information and Attention Mechanism - Google Patents


Info

Publication number
CN111833386A
Authority
CN
China
Prior art keywords
attention mechanism
information
pyramid
network
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010707918.5A
Other languages
Chinese (zh)
Inventor
郑秋梅
温阳
王风华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China
Priority to CN202010707918.5A
Publication of CN111833386A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a pyramid binocular stereo matching method based on multi-scale information and an attention mechanism, and belongs to the field of binocular vision within computer vision. The method first applies a channel attention mechanism with convolution kernels of different sizes to obtain inter-pixel information from the original images, then uses an atrous spatial pyramid module to enlarge the receptive field and capture multi-scale information, and finally computes disparity, and from it depth information, using a three-dimensional channel attention mechanism and stacked separable encoder-decoder structures. Depth tests on rectified binocular images show that the algorithm improves matching accuracy while reducing the model's parameter count and computation and shortening the running time.

Description

A Pyramid Binocular Stereo Matching Method Based on Multi-scale Information and Attention Mechanism

Technical Field

The invention relates to a pyramid binocular stereo matching method based on multi-scale information and an attention mechanism, and belongs to the field of binocular vision within computer vision.

Background

As an active research topic in computer vision, binocular stereo matching is widely used in 3D reconstruction, autonomous driving, and mobile robotics. For a set of rectified stereo images captured by a binocular camera, stereo matching amounts to computing the disparity of every pixel in the image. Stereo matching algorithms fall into two broad categories: traditional algorithms and methods based on convolutional neural networks. Reliance on hand-crafted features has limited the development of traditional stereo matching methods. With the rise of deep learning, convolutional neural networks have shown strong computational and feature-extraction capabilities, so current research focuses mainly on neural-network-based methods. It nevertheless remains difficult to improve the network's information-extraction ability enough to obtain accurate disparity maps in ill-posed regions (such as weakly textured regions and reflective surfaces).
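The link between the per-pixel disparity mentioned above and depth is the standard pinhole relation for a rectified pair, Z = f · B / d. The sketch below illustrates it; the focal length and baseline are hypothetical values chosen for the example and do not come from the patent.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres from disparity in pixels for a rectified stereo pair."""
    if disparity_px <= 0:
        # No finite depth can be recovered; treat the point as infinitely far.
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Hypothetical camera: 720 px focal length, 0.54 m baseline (assumed values).
depth = disparity_to_depth(disparity_px=36.0, focal_px=720.0, baseline_m=0.54)
```

Because depth is inversely proportional to disparity, a fixed disparity error costs more depth accuracy for distant points than for near ones.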

Summary of the Invention

To address these problems, the invention proposes a pyramid binocular stereo matching method based on multi-scale information and an attention mechanism. The method consists of three modules: an adaptive feature extraction module, a context information extraction module, and a disparity calculation module. To achieve this, the technical solution of the invention is as follows:

A pyramid binocular stereo matching method based on multi-scale information and an attention mechanism, comprising the following steps:

Step 1: Obtain the binocular images after parameter correction (rectification);

Step 2: Use the adaptive feature extraction module to obtain weighted channel-wise features. The module is built on residual network blocks and adds a channel attention module with multiple convolution kernels, which lets the network obtain features at different scales and increases the weight given to features rich in useful information, benefiting subsequent matching accuracy. A convolutional layer processes the globally pooled features to improve the network's learning ability, and the PReLU function is used as the activation function to retain more detail;

Step 3: Use an atrous spatial pyramid pooling structure, with four convolution branches of different dilation rates and a global average pooling layer, as the context information extraction module. It obtains multi-scale context information and global context information from the image as image-level information, improving the network's accuracy in ill-posed regions;

Step 4: Fuse the features from Step 1 with the features from Step 2 to construct the matching cost volume. In the disparity calculation module, three-dimensional separable convolutions process the constructed cost volume to compute depth information. The disparity calculation module retains only the connections between the encoder-decoder structures; inside each encoder-decoder module there are no skip connections other than the added channel attention module, which effectively reduces the number of network parameters. To ensure that matching accuracy is not lost as the parameters are reduced, a three-dimensional channel attention mechanism is also added to the disparity calculation module.
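The steps above do not say how the features are fused into the matching cost volume. A common choice in pyramid stereo networks is to concatenate each left-image feature with the right-image feature shifted by every candidate disparity. The sketch below assumes that scheme for a single scanline; `build_cost_volume` and its arguments are hypothetical names for illustration, not the patent's implementation.

```python
def build_cost_volume(left, right, max_disp):
    """Concatenation cost volume for one scanline (assumed fusion scheme).

    left, right: lists of per-pixel feature vectors (lists of floats).
    Returns volume[d][x] = left[x] + right[x - d], zero-filled where the
    shifted right pixel would fall outside the image.
    """
    width = len(left)
    channels = len(left[0])
    volume = []
    for d in range(max_disp):
        row = []
        for x in range(width):
            if x - d >= 0:
                row.append(left[x] + right[x - d])  # list concatenation
            else:
                row.append([0.0] * (2 * channels))
        volume.append(row)
    return volume


left = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
right = [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]
vol = build_cost_volume(left, right, max_disp=2)
```

For full images the same construction yields a 4D tensor (channels × disparity × height × width), which is why the disparity calculation module operates with three-dimensional convolutions.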

Beneficial effects:

The invention proposes a new end-to-end stereo matching network. An adaptive feature extraction module and a multi-scale information extraction module together extract local and global features. A disparity calculation module built from 3D depthwise convolutions and a 3D channel attention mechanism increases the width of the network, which helps recover image details and thereby improves matching accuracy in ill-posed regions.

Description of Drawings

Fig. 1 is a flow chart of the algorithm of the invention.

Fig. 2 is a schematic diagram of the feature extraction module of the invention.

Fig. 3 is a schematic diagram of the disparity calculation module of the invention.

Detailed Description

The invention discloses a pyramid binocular stereo matching method based on a multi-scale context attention mechanism. The method preserves the detail information between image pixels while taking full account of the context information in the image. It also uses a disparity calculation module, based on separable convolution and a channel attention mechanism, to compute depth information and obtain the disparity map. The method is further described below:

The method proposes an adaptive feature extraction module for binocular stereo matching. A fully connected layer first reduces and compresses the dimensionality, and the channel attention mechanism processes the compressed features to generate the corresponding per-channel weights. To give the network better learning ability, convolutional layers are used in place of the fully connected layers. Because the PReLU function applies a linear operation to negative inputs, it avoids the dying-neuron problem that the ReLU function has for negative inputs. PReLU is therefore chosen as the activation function to retain more feature information.
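A minimal sketch of the squeeze, weight, and rescale pattern described above, under several assumptions: the weights are hand-picked rather than learned, the multi-kernel branches are omitted, and the 1×1 convolution on the pooled 1×1 map is written as a plain matrix product. The function names are hypothetical.

```python
import math


def prelu(x, a=0.25):
    """PReLU: identity for x >= 0, slope a (learned in practice) for x < 0."""
    return x if x >= 0 else a * x


def channel_attention(features, w_down, w_up, a=0.25):
    """features: list of channels, each a flat list of pixel values.

    w_down is (C_reduced x C) and w_up is (C x C_reduced); both stand in
    for 1x1 convolutions applied to the globally pooled features.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    pooled = [sum(ch) / len(ch) for ch in features]
    # Reduce, apply PReLU, expand, then squash to (0, 1) with a sigmoid.
    hidden = [prelu(sum(w * p for w, p in zip(row, pooled)), a) for row in w_down]
    weights = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
               for row in w_up]
    # Rescale every channel by its attention weight.
    scaled = [[v * wt for v in ch] for ch, wt in zip(features, weights)]
    return scaled, weights


features = [[1.0, 3.0], [-2.0, -2.0]]
scaled, weights = channel_attention(features, w_down=[[0.5, 1.0]],
                                    w_up=[[1.0], [-1.0]])
```

In this example the hidden activation is negative (-1.0 before PReLU). ReLU would zero it and give both channels the same weight of 0.5, whereas PReLU keeps a scaled negative value, so the two channels receive different weights.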

Improving stereo matching accuracy also requires better multi-scale feature extraction, so the context information extraction module is built on an atrous spatial pyramid structure: four convolution branches with different dilation rates plus a global average pooling layer. This design captures information at different scales on the four branches, while the global average pooling layer captures image-level information.
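To illustrate why different dilation rates yield different scales, the sketch below computes the span covered by a dilated kernel and runs a one-dimensional analogue of the four-branch-plus-pooling structure. The patent does not state the dilation rates; the rates `[1, 2, 4, 8]`, the 3-tap kernel, and the function names are assumptions for the example.

```python
def effective_kernel(k, rate):
    """Span in pixels covered by a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (rate - 1)


def dilated_conv1d(signal, kernel, rate):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation)."""
    half = (len(kernel) // 2) * rate
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - half + j * rate
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def aspp_1d(signal, kernel, rates):
    """One branch per dilation rate plus a global-average-pooling branch."""
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    mean = sum(signal) / len(signal)
    branches.append([mean] * len(signal))  # image-level value, broadcast back
    return branches


rates = [1, 2, 4, 8]  # hypothetical rates, not taken from the patent
spans = [effective_kernel(3, r) for r in rates]
out = aspp_1d([1.0] * 10, [1.0, 1.0, 1.0], rates)
```

Each branch uses the same three weights, so the receptive field grows with the rate at no extra parameter cost; the pooling branch contributes a single global value per channel.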

As the dimensionality and the number of network layers increase, the disparity calculation module generates a large number of parameters, which increases the network's computation time. To reduce network parameters and save computation time, the disparity calculation module retains only the connections between the encoder-decoder structures; inside each encoder-decoder module there are no skip connections other than the added channel attention module. The disparity calculation units are built from three-dimensional separable convolutions. Adding a three-dimensional channel attention mechanism to the disparity calculation module ensures that the network's matching accuracy does not drop as the parameters are reduced.
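The parameter saving from separable convolution can be checked with simple arithmetic. The sketch below compares a standard 3D convolution with a depthwise-plus-pointwise factorisation, ignoring biases. The channel count (32) and kernel size (3) are assumed for illustration and are not given in the patent.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a standard k x k x k 3D convolution (bias ignored)."""
    return c_in * c_out * k ** 3


def separable_conv3d_params(c_in, c_out, k):
    """Weights in a depthwise k^3 convolution followed by a 1x1x1 pointwise one."""
    depthwise = c_in * k ** 3   # one k^3 filter per input channel
    pointwise = c_in * c_out    # 1x1x1 convolution that mixes channels
    return depthwise + pointwise


standard = conv3d_params(32, 32, 3)
separable = separable_conv3d_params(32, 32, 3)
ratio = standard / separable
```

With these assumed sizes the standard layer has 27,648 weights and the separable one 1,888, roughly a 14.6-fold reduction per layer.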

Claims (3)

1. A pyramid binocular stereo matching method based on multi-scale information and an attention mechanism, characterized by comprising the following steps:

Step 1: Obtain the binocular images after parameter correction (rectification);

Step 2: Use the adaptive feature extraction module to obtain weighted channel-wise features. The adaptive feature extraction module sets convolution kernels of different sizes in the same network layer so that the network obtains features at different scales, uses a convolutional layer to process the globally pooled features to improve the network's learning ability, and uses the PReLU function as the activation function to retain more detail;

Step 3: Use an atrous spatial pyramid pooling structure, with four convolution branches of different dilation rates and a global average pooling layer, as the context information extraction module to obtain multi-scale context information and global context information from the image as image-level information;

Step 4: Fuse the features from Step 1 with the features from Step 2 to construct the matching cost volume. In the disparity calculation module, three-dimensional separable convolutions and a three-dimensional channel attention mechanism process the constructed cost volume to compute depth information, reducing the network's parameters while maintaining matching accuracy. The disparity calculation module retains only the connections between the encoder-decoder structures; inside each encoder-decoder module there are no skip connections other than the added channel attention module, which effectively reduces the model's parameter count.

2. The pyramid binocular stereo matching method based on a multi-scale context attention mechanism according to claim 1, characterized in that an adaptive feature extraction module and an image-based information extraction module are designed to extract local features and global features.

3. The pyramid binocular stereo matching method based on a multi-scale context attention mechanism according to claim 1, characterized in that the disparity calculation module is built from 3D depthwise convolutions and a 3D channel attention mechanism, and the model has no skip connections inside the encoder-decoder modules, which both increases the network width, helping to recover image details and thereby improving matching accuracy in ill-posed regions, and reduces the model's parameter count.
CN202010707918.5A 2020-07-22 2020-07-22 A Pyramid Binocular Stereo Matching Method Based on Multi-scale Information and Attention Mechanism Pending CN111833386A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010707918.5A CN111833386A (en) 2020-07-22 2020-07-22 A Pyramid Binocular Stereo Matching Method Based on Multi-scale Information and Attention Mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010707918.5A CN111833386A (en) 2020-07-22 2020-07-22 A Pyramid Binocular Stereo Matching Method Based on Multi-scale Information and Attention Mechanism

Publications (1)

Publication Number Publication Date
CN111833386A (en) 2020-10-27

Family

ID=72924570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010707918.5A Pending CN111833386A (en) 2020-07-22 2020-07-22 A Pyramid Binocular Stereo Matching Method Based on Multi-scale Information and Attention Mechanism

Country Status (1)

Country Link
CN (1) CN111833386A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991422A (en) * 2021-04-27 2021-06-18 杭州云智声智能科技有限公司 Stereo matching method and system based on void space pyramid pooling
CN113222904A (en) * 2021-04-21 2021-08-06 重庆邮电大学 Concrete pavement crack detection method for improving PoolNet network structure
CN114445480A (en) * 2022-01-26 2022-05-06 安徽大学 Transformer-based thermal infrared image stereo matching method and device
CN115375930A (en) * 2022-10-26 2022-11-22 中国航发四川燃气涡轮研究院 Stereo matching network and stereo matching method based on multi-scale information
CN115965578A (en) * 2022-11-09 2023-04-14 国网江西省电力有限公司超高压分公司 A binocular stereo matching detection method and device based on channel attention mechanism
CN116128946A (en) * 2022-12-09 2023-05-16 东南大学 Binocular infrared depth estimation method based on edge guiding and attention mechanism
CN118570492A (en) * 2024-07-25 2024-08-30 长春工程学院 Deep stereo matching method based on PSMNet optimized feature extraction

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222904A (en) * 2021-04-21 2021-08-06 重庆邮电大学 Concrete pavement crack detection method for improving PoolNet network structure
CN112991422A (en) * 2021-04-27 2021-06-18 杭州云智声智能科技有限公司 Stereo matching method and system based on void space pyramid pooling
CN114445480A (en) * 2022-01-26 2022-05-06 安徽大学 Transformer-based thermal infrared image stereo matching method and device
CN115375930A (en) * 2022-10-26 2022-11-22 中国航发四川燃气涡轮研究院 Stereo matching network and stereo matching method based on multi-scale information
CN115375930B (en) * 2022-10-26 2023-05-05 中国航发四川燃气涡轮研究院 Three-dimensional matching network and three-dimensional matching method based on multi-scale information
CN115965578A (en) * 2022-11-09 2023-04-14 国网江西省电力有限公司超高压分公司 A binocular stereo matching detection method and device based on channel attention mechanism
CN115965578B (en) * 2022-11-09 2025-06-13 国网江西省电力有限公司超高压分公司 A binocular stereo matching detection method and device based on channel attention mechanism
CN116128946A (en) * 2022-12-09 2023-05-16 东南大学 Binocular infrared depth estimation method based on edge guiding and attention mechanism
CN116128946B (en) * 2022-12-09 2024-02-09 东南大学 A binocular infrared depth estimation method based on edge guidance and attention mechanism
CN118570492A (en) * 2024-07-25 2024-08-30 长春工程学院 Deep stereo matching method based on PSMNet optimized feature extraction
CN118570492B (en) * 2024-07-25 2024-11-22 长春工程学院 Depth stereo matching method based on PSMNet optimized feature extraction

Similar Documents

Publication Publication Date Title
CN111833386A (en) A Pyramid Binocular Stereo Matching Method Based on Multi-scale Information and Attention Mechanism
CN112150521B (en) Image stereo matching method based on PSMNet optimization
CN109635882B (en) A salient object detection method based on multi-scale convolutional feature extraction and fusion
CN111582483B (en) Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism
CN109005398B (en) A Disparity Matching Method of Stereo Image Based on Convolutional Neural Network
CN107909150B (en) Method and system for online training of CNN based on block-wise stochastic gradient descent
CN113066089B (en) A Real-time Image Semantic Segmentation Method Based on Attention Guidance Mechanism
CN109472819A (en) A Binocular Disparity Estimation Method Based on Cascaded Geometric Context Neural Networks
CN110674742B (en) Remote sensing image road extraction method based on DLinkNet
CN113592026A (en) Binocular vision stereo matching method based on void volume and cascade cost volume
CN111553296B (en) A Binary Neural Network Stereo Vision Matching Method Based on FPGA
CN113902807B (en) A 3D reconstruction method for electronic components based on semi-supervised learning
CN110956655B (en) Dense depth estimation method based on monocular image
CN110348299A (en) The recognition methods of three-dimension object
CN116188695A (en) Construction method of three-dimensional hand gesture model and three-dimensional hand gesture estimation method
CN113298097A (en) Feature point extraction method and device based on convolutional neural network and storage medium
CN117576402A (en) A multi-scale aggregation Transformer remote sensing image semantic segmentation method based on deep learning
CN115376209B (en) A 3D human pose estimation method based on deep learning
CN117475182A (en) Stereo matching method based on multi-feature aggregation
CN113239771A (en) Attitude estimation method, system and application thereof
CN111462211B (en) A Binocular Disparity Calculation Method Based on Convolutional Neural Network
CN119516115A (en) A three-dimensional human body point cloud completion method, system, medium and device based on Transformer
CN114913604A (en) A gesture recognition method based on two-stage pooling S2E module
CN117765273B (en) Real-time stereo matching method based on multi-scale and multi-class cost volumes
CN118485783A (en) Multi-view 3D reconstruction method and system based on visual center and implicit attention

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201027