CN111833386A

CN111833386A - A Pyramid Binocular Stereo Matching Method Based on Multi-scale Information and Attention Mechanism

Info

Publication number: CN111833386A
Application number: CN202010707918.5A
Authority: CN
Inventors: 郑秋梅; 温阳; 王风华
Original assignee: China University of Petroleum (East China)
Current assignee: China University of Petroleum (East China)
Priority date: 2020-07-22
Filing date: 2020-07-22
Publication date: 2020-10-27

Abstract

The invention relates to a pyramid binocular stereo matching method based on multi-scale information and an attention mechanism, and belongs to the field of binocular vision in computer vision. The invention first applies a channel attention mechanism with convolution kernels of different sizes to extract inter-pixel information from the original images. It then uses an atrous spatial pyramid pooling module to enlarge the receptive field and obtain multi-scale information. Finally, it computes disparity, and from it depth information, using a three-dimensional channel attention mechanism and stacked separable encoder-decoder structures. Depth tests on rectified binocular images show that the algorithm not only improves matching accuracy but also reduces the number of parameters and the computational cost of the model, shortening the running time.

Description

A Pyramid Binocular Stereo Matching Method Based on Multi-scale Information and Attention Mechanism

Technical Field

The invention relates to a pyramid binocular stereo matching method based on multi-scale information and an attention mechanism, and belongs to the field of binocular vision in computer vision.

Background

Binocular stereo matching is a research hotspot in computer vision, with wide applications in 3D reconstruction, autonomous driving, and mobile robotics. Given a pair of rectified stereo images captured by a binocular camera, stereo matching amounts to computing the disparity of every pixel in the image. Stereo matching algorithms generally fall into two categories: traditional algorithms and methods based on convolutional neural networks. Traditional methods are limited by their reliance on hand-crafted features. With the development of deep learning, convolutional neural networks have shown strong computational and feature-extraction capabilities, so current research focuses mainly on neural-network-based methods. However, it remains difficult to improve the network's ability to extract information and to obtain accurate disparity maps in ill-posed regions (such as weakly textured areas and reflective surfaces).
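Because depth, rather than disparity, is usually the quantity of interest, it may help to recall the standard relation for a rectified pair: Z = f·B/d, where f is the focal length in pixels, B the baseline, and d the disparity. The sketch below is a minimal NumPy illustration of that conversion, not part of the patented method; the function name `depth_from_disparity` and the camera values are hypothetical.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (pixels) into a depth map (metres).

    Uses Z = f * B / d for a rectified stereo pair. Pixels whose disparity
    is (near) zero have no finite depth and are returned as inf.
    """
    d = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(d, np.inf)
    valid = d > eps
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth

# Hypothetical camera: focal length 700 px, baseline 0.5 m.
disp = np.array([[35.0, 70.0], [0.0, 7.0]])
print(depth_from_disparity(disp, focal_px=700.0, baseline_m=0.5))
```

Halving the disparity doubles the depth, which is why small disparity errors on distant objects translate into large depth errors.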

Summary of the Invention

To address the above problems, the invention proposes a pyramid binocular stereo matching method based on multi-scale information and an attention mechanism. The method consists of three modules: an adaptive feature extraction module, a context information extraction module, and a disparity calculation module. To achieve this object, the technical solution of the invention is as follows:

A pyramid binocular stereo matching method based on multi-scale information and an attention mechanism, comprising the following steps:

Step 1: Obtain the binocular images rectified using the camera parameters;

Step 2: Use the adaptive feature extraction module to obtain channel-weighted features. The adaptive feature extraction module is built on residual network blocks, with an added channel attention module that uses multiple convolution kernels. This lets the network obtain features at different scales and increases the weight given to features rich in useful information, which helps improve subsequent matching accuracy. A convolutional layer processes the globally pooled features to improve the network's learning ability, and PReLU is used as the activation function to retain more detail;
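The step above does not fix kernel sizes, channel counts, or exactly how the pooled descriptor is convolved, so the following is only a minimal NumPy sketch under stated assumptions. It assumes two depthwise branches with hypothetical 3×3 and 5×5 kernels whose outputs are summed, global average pooling, a 1-D convolution across channels standing in for the "convolution layer" on the pooled features, PReLU, a sigmoid gate, and a residual (identity) path. All function names are hypothetical.

```python
import numpy as np

def prelu(x, a=0.25):
    # PReLU: identity for x >= 0, slope a for x < 0 (a is learned in
    # training; fixed here for illustration).
    return np.where(x >= 0, x, a * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def depthwise_conv2d(x, k):
    """'Same'-padded depthwise 2-D convolution (cross-correlation, as in CNNs).
    x: (C, H, W) feature map; k: (C, kh, kw) kernels with odd kh and kw."""
    C, H, W = x.shape
    _, kh, kw = k.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            out += k[:, i, j][:, None, None] * xp[:, i:i + H, j:j + W]
    return out

def multi_kernel_channel_attention(x, kernels, w1d, a=0.25):
    """Residual block with a multi-kernel channel attention gate (sketch)."""
    # 1) Parallel branches with different kernel sizes, fused by summation.
    u = sum(depthwise_conv2d(x, k) for k in kernels)
    # 2) Global average pooling: one descriptor per channel.
    z = u.mean(axis=(1, 2))
    # 3) A convolution over the pooled descriptor (1-D, across channels)
    #    instead of a fully connected layer, then PReLU and a sigmoid gate.
    w = sigmoid(prelu(np.convolve(z, w1d, mode="same"), a))
    # 4) Reweight channels and add the residual (identity) path.
    return x + u * w[:, None, None], w

rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
x = rng.standard_normal((C, H, W))
kernels = [0.1 * rng.standard_normal((C, 3, 3)), 0.1 * rng.standard_normal((C, 5, 5))]
y, weights = multi_kernel_channel_attention(x, kernels, w1d=np.array([0.2, 0.6, 0.2]))
print(y.shape, weights.shape)  # (8, 16, 16) (8,)
```

Each channel weight lies in (0, 1), so informative channels can be emphasised relative to others. A real implementation would learn the kernels, the 1-D weights, and the PReLU slope.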

Step 3: Use an atrous spatial pyramid pooling (ASPP) structure, consisting of four convolution branches with different dilation rates and a global average pooling layer, as the context information extraction module. This module obtains the multi-scale and global context information of the image as image-level information, thereby improving the network's accuracy in ill-posed regions;
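As a rough illustration of the ASPP structure in Step 3, the NumPy sketch below runs four dilated 3×3 branches and one global-average-pooling branch on the same feature map and concatenates the results along the channel axis. The text does not give the dilation rates or channel counts, so the rates (1, 2, 4, 8), the depthwise (per-channel) branches, and the helper names are hypothetical simplifications.

```python
import numpy as np

def dilated_depthwise_conv2d(x, k, d):
    """'Same'-padded depthwise conv with dilation rate d.
    x: (C, H, W); k: (C, kh, kw) with odd kh, kw. Kernel taps are d pixels apart."""
    C, H, W = x.shape
    _, kh, kw = k.shape
    ph, pw = d * (kh // 2), d * (kw // 2)
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            out += k[:, i, j][:, None, None] * xp[:, i * d:i * d + H, j * d:j * d + W]
    return out

def aspp(x, kernels, rates):
    """Four dilated branches plus an image-level (global average pooling)
    branch, concatenated along the channel axis -> (5C, H, W)."""
    C, H, W = x.shape
    branches = [dilated_depthwise_conv2d(x, k, d) for k, d in zip(kernels, rates)]
    # Image-level context: per-channel global mean, broadcast back to H x W.
    branches.append(np.broadcast_to(x.mean(axis=(1, 2), keepdims=True), (C, H, W)))
    return np.concatenate(branches, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 32, 32))
kernels = [0.1 * rng.standard_normal((4, 3, 3)) for _ in range(4)]
out = aspp(x, kernels, rates=(1, 2, 4, 8))  # hypothetical dilation rates
print(out.shape)  # (20, 32, 32)
```

Every branch keeps the spatial size H×W, which is what allows the multi-scale and image-level outputs to be stacked channel-wise.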

Step 4: Fuse the features from Step 1 with the features from Step 2 to construct a matching cost volume. In the disparity calculation module, process this cost volume with three-dimensional separable convolutions to compute depth information. The disparity calculation module retains only the links between the individual encoder-decoder structures; apart from the added channel attention modules, there are no other skip connections inside the encoder-decoder modules, which effectively reduces the number of network parameters. To ensure that matching accuracy is not lost as the parameters are reduced, a three-dimensional channel attention mechanism is also added to the disparity calculation module.
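The text does not say how the matching cost volume is assembled. A common choice in 3D-convolution stereo networks is to concatenate, for every candidate disparity d, the left feature at column x with the right feature at column x − d. The NumPy sketch below assumes that concatenation scheme. The name `concat_cost_volume` and the synthetic features are hypothetical, and `feat_l`/`feat_r` stand for whatever fused left/right features the network produces.

```python
import numpy as np

def concat_cost_volume(feat_l, feat_r, max_disp):
    """Build a (2C, D, H, W) cost volume by concatenating left features with
    right features shifted by each candidate disparity d = 0..max_disp-1.
    Left pixel (y, x) is paired with right pixel (y, x - d); columns x < d
    have no partner and stay zero."""
    C, H, W = feat_l.shape
    vol = np.zeros((2 * C, max_disp, H, W), dtype=feat_l.dtype)
    for d in range(max_disp):
        vol[:C, d, :, d:] = feat_l[:, :, d:]
        vol[C:, d, :, d:] = feat_r[:, :, :W - d]
    return vol

rng = np.random.default_rng(0)
feat_r = rng.standard_normal((4, 8, 16)).astype(np.float32)
feat_l = np.zeros_like(feat_r)
feat_l[:, :, 3:] = feat_r[:, :, :-3]  # synthetic pair with true disparity 3
vol = concat_cost_volume(feat_l, feat_r, max_disp=6)
print(vol.shape)  # (8, 6, 8, 16)
# At the true disparity the two halves agree on every valid column.
print(np.allclose(vol[:4, 3, :, 3:], vol[4:, 3, :, 3:]))  # True
```

The resulting 4-D volume (channels × disparity × height × width) is what the 3D separable convolutions then aggregate.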

Beneficial effects:

The invention proposes a new end-to-end stereo matching network. The adaptive feature extraction module and the multi-scale information extraction module together extract both local and global features. The disparity calculation module is built from 3D depthwise convolutions and a 3D channel attention mechanism. This design increases the width of the network, which helps recover image details and thereby improves matching accuracy in ill-posed regions.

Description of Drawings

Fig. 1 is a flowchart of the algorithm of the invention;

Fig. 2 is a schematic diagram of the feature extraction module of the invention;

Fig. 3 is a schematic diagram of the disparity calculation module of the invention.

Detailed Description

The invention discloses a pyramid binocular stereo matching method based on a multi-scale context attention mechanism. The method fully considers the context information in the image while preserving the detail between image pixels. A disparity calculation module based on separable convolutions and a channel attention mechanism computes depth information to obtain the disparity map. The method is described further below:

The method proposes an adaptive feature extraction module for binocular stereo matching. First, a fully connected layer compresses the features to a lower dimension, and the channel attention mechanism processes the compressed features to generate the corresponding channel-wise weights. To give the network better learning ability, convolutional layers are used in place of the fully connected layers. PReLU is linear in the negative range (with a learnable slope) rather than zero, so it avoids the "dying neuron" problem that ReLU suffers for negative inputs. PReLU is therefore chosen as the activation function to retain more feature information;
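To make the ReLU/PReLU argument concrete, the small NumPy sketch below compares the two activations and their gradients on a few inputs. It is a generic illustration of PReLU, not code from the patent. The slope value 0.25 is a hypothetical initial value, and the helper names are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_grad(x):
    # dReLU/dx is 0 for every negative input, so a unit stuck there stops learning.
    return (x > 0).astype(float)

def prelu(x, a):
    # PReLU: x for x >= 0, a * x for x < 0 (a is a learnable slope).
    return np.where(x >= 0, x, a * x)

def prelu_grads(x, a):
    """Return (dPReLU/dx, dPReLU/da)."""
    dx = np.where(x >= 0, 1.0, a)
    da = np.where(x >= 0, 0.0, x)
    return dx, da

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(x), relu_grad(x))
print(prelu(x, 0.25), *prelu_grads(x, 0.25))
```

For the two negative inputs, the ReLU gradient is exactly zero. PReLU instead passes a gradient of `a` (0.25 here) with respect to the input and a non-zero gradient with respect to `a`, so both the unit and its slope keep being updated.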

Improving stereo matching accuracy also requires a stronger ability to extract multi-scale features. The context information extraction module is therefore built with an atrous spatial pyramid structure, consisting of four convolution branches with different dilation rates and one global average pooling layer. With this design, the four branches capture information at different scales while the global average pooling layer captures image-level information;
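The reason dilation helps is simple arithmetic: a k×k kernel with dilation rate d spans k + (k − 1)(d − 1) pixels per side while still using only k×k weights. The short sketch below tabulates this for a 3×3 kernel. The four rates are hypothetical (the document does not list them) and are chosen only to show how quickly the covered window grows.

```python
def effective_kernel_size(k, d):
    """Side length of the window covered by a k x k kernel with dilation d.
    The number of weights stays k * k regardless of d."""
    return k + (k - 1) * (d - 1)

# Hypothetical dilation rates for the four branches (not given in the text).
for d in (6, 12, 18, 24):
    size = effective_kernel_size(3, d)
    print(f"rate {d:2d}: 3x3 kernel covers {size}x{size} pixels with 9 weights")
```

With these rates the four branches see windows of 13, 25, 37, and 49 pixels, compared with 3 pixels for an undilated kernel, at no extra parameter cost.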

As the dimensionality and the number of network layers increase, the disparity calculation module generates a large number of parameters, which increases the network's computation time. To reduce network parameters and save computation time, the disparity calculation module retains only the links between the individual encoder-decoder structures. Apart from the added channel attention modules, there are no other skip connections inside the encoder-decoder modules, and the disparity calculation units are built from three-dimensional separable convolutions. A three-dimensional channel attention mechanism is added to the disparity calculation module so that matching accuracy does not drop as the number of parameters is reduced.
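The parameter saving from separable 3D convolutions can be checked with a quick count. The sketch assumes "separable" means depthwise-separable, that is, a per-channel k×k×k depthwise convolution followed by a 1×1×1 pointwise convolution, consistent with the "3D depthwise convolution" wording elsewhere in the document. Biases are ignored, and the channel count of 32 is hypothetical.

```python
def conv3d_params(c_in, c_out, k):
    """Weights of a standard k x k x k 3-D convolution (bias ignored)."""
    return k ** 3 * c_in * c_out

def separable_conv3d_params(c_in, c_out, k):
    """Depthwise k x k x k conv (one filter per input channel) followed by a
    1 x 1 x 1 pointwise conv that mixes channels (bias ignored)."""
    return k ** 3 * c_in + c_in * c_out

c = 32  # hypothetical channel count of the cost-volume features
std = conv3d_params(c, c, 3)
sep = separable_conv3d_params(c, c, 3)
print(std, sep, round(std / sep, 1))  # 27648 1888 14.6
```

Under these assumptions, each separable 3D layer needs roughly 15 times fewer weights than a standard one. This is the kind of reduction the 3D channel attention is meant to offset in accuracy.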

Claims

1. A pyramid binocular stereo matching method based on multi-scale information and an attention mechanism, characterized in that it comprises the following steps:

Step 1: Obtain the binocular images rectified using the camera parameters;

Step 2: Use an adaptive feature extraction module to obtain channel-weighted features. By setting convolution kernels of different sizes in the same network layer, the adaptive feature extraction module enables the network to obtain features at different scales. A convolutional layer processes the globally pooled features to improve the network's learning ability, and PReLU is used as the activation function to retain more detailed information;

Step 3: Use an atrous spatial pyramid pooling structure, consisting of four convolution branches with different dilation rates and a global average pooling layer, as the context information extraction module to obtain the multi-scale and global context information of the image as image-level information;

Step 4: Fuse the features from Step 1 with the features from Step 2 to construct a matching cost volume. In the disparity calculation module, process the constructed cost volume with three-dimensional separable convolutions and a three-dimensional channel attention mechanism to compute depth information, reducing the network parameters while preserving matching accuracy. The disparity calculation module retains only the links between the individual encoder-decoder structures; apart from the added channel attention modules, there are no other skip connections inside the encoder-decoder modules, which effectively reduces the computational parameters of the model.

2. The pyramid binocular stereo matching method based on the multi-scale context attention mechanism according to claim 1, characterized in that an adaptive feature extraction module and an image-based information extraction module are designed to extract local and global features.

3. The pyramid binocular stereo matching method based on the multi-scale context attention mechanism according to claim 1, characterized in that 3D depthwise convolutions and a 3D channel attention mechanism are used to construct the disparity calculation module, and the model has no skip connections inside the encoder-decoder modules. This not only increases the width of the network but also helps restore image details, thereby improving matching accuracy in ill-posed regions while reducing the computational parameters of the model.