WO2022089077A1 - Real-time binocular stereo matching method based on an adaptive candidate disparity prediction network (一种基于自适应候选视差预测网络的实时双目立体匹配方法) - Google Patents

Real-time binocular stereo matching method based on an adaptive candidate disparity prediction network (一种基于自适应候选视差预测网络的实时双目立体匹配方法)

Info

Publication number: WO2022089077A1 (PCT/CN2021/118609)
Authority: WO (WIPO PCT)
Prior art keywords: disparity, map, parallax, estimation, real
Other languages: English (en), French (fr)
Inventors: 张旭翀, 孙宏滨, 戴赫, 赵永利, 郑南宁
Original Assignee: 西安交通大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 西安交通大学
Publication of WO2022089077A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20228Disparity calculation for image-based rendering
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • The invention belongs to the technical field of computer vision, and in particular relates to a real-time binocular stereo matching method based on an adaptive candidate disparity prediction network.
  • Binocular stereo vision systems are widely used in robot navigation, intelligent monitoring, autonomous driving, and other fields. Accurate and fast binocular stereo matching is therefore of great significance for the real-time deployment of stereo vision systems on mobile devices.
  • Binocular stereo matching based on deep learning has benefited from continuous innovation in neural network models, and its accuracy has improved significantly.
  • However, current high-precision stereo matching networks usually require large amounts of memory and computing resources, which makes existing methods difficult to apply on resource-constrained mobile platforms.
  • An end-to-end binocular stereo matching network mainly comprises the steps of feature extraction, cost volume construction, matching cost aggregation, and disparity regression/optimization.
  • The matching cost aggregation step plays a decisive role in the model's computation speed and resource consumption, so reasonable optimization of this step is the key to lightweight network design.
  • Existing methods mainly adopt a coarse-to-fine disparity estimation strategy to greatly reduce the computational complexity of the cost aggregation step.
  • Such a method first performs a full-disparity-range search at a small resolution to obtain a coarse disparity estimate, then upsamples it stage by stage and refines the coarse estimate at larger resolutions with a very small number of disparity offsets, so the computation speed is significantly improved.
  • However, existing methods all use fixed offsets to provide the candidate disparities for the fine estimation stage, which restricts the candidate values to a small local range around the coarse estimate. The disparity correction therefore cannot satisfy the needs of different targets in different scenes, and the disparity-map quality of existing methods is relatively poor.
  • In addition, to improve the estimation results to some extent, existing coarse-to-fine methods usually adopt multi-stage (generally ≥ 3 stages) processing to obtain more accurate disparity.
  • However, as the number of stages increases, the computation speed drops significantly.
  • In summary, existing lightweight binocular stereo matching networks that use the coarse-to-fine strategy still struggle to meet the real-time stereo vision requirements of mobile devices in terms of both accuracy and speed.
  • The purpose of the present invention is to propose a real-time binocular stereo matching method based on an adaptive candidate disparity prediction network to overcome the shortcomings of the prior art.
  • The present invention uses the coarse disparity estimate and the original image information to dynamically predict, for each pixel, the disparity offsets required in the fine estimation stage, thereby adapting to the differing disparity-correction ranges required by different target objects.
  • Owing to the effectiveness of this prediction, the present invention designs a two-stage processing structure to improve both the accuracy and the speed of the binocular stereo matching network.
  • The present invention adopts the following technical solution:
  • A real-time binocular stereo matching method based on an adaptive candidate disparity prediction network, comprising:
  • First, multi-scale feature extraction is performed on the rectified stereo image pair using 2D convolutions to obtain high- and low-resolution feature maps. Then, in the first stage, the disparity is coarsely estimated from the low-resolution feature maps. Next, the coarse disparity map and the left image are used for dynamic offset prediction, and the predicted offsets are added to the coarse estimate to generate adaptive candidate disparities. The second-stage disparity estimation uses the adaptive candidate disparities and the high-resolution feature maps to construct a compact matching cost volume, which is regularized and then subjected to disparity regression to obtain a finely estimated disparity. Finally, a disparity refinement module hierarchically upsamples the fine disparity map to obtain a full-size disparity map.
  • In a further improvement of the present invention, during feature extraction, a series of 2D convolutions first downsamples the input image progressively to 1/2, 1/4, 1/8 and 1/16 resolution, and deeper features are then extracted from the 1/4 and 1/16 feature maps.
  • In a further improvement, in the first-stage disparity estimation, the extracted 1/16 feature maps are combined by misaligned (shift-and-concatenate) splicing to obtain the complete matching cost volume; stacked 3D convolutions regularize this volume into the aggregated matching cost, which is regressed to obtain the coarse disparity map.
  • In a further improvement, the dynamic offset prediction (DOP) predicts dynamic candidate disparity offsets from the coarse disparity map and the left image, and adds them to the coarse disparity map to generate the adaptive candidate disparities.
  • The DOP uses the coarse disparity estimate and the left image to predict the dynamic offsets, expressed as:

    Δ_p^n = f_DOP(d̃_p, I_1p), n = 1, ..., N-1

    where Δ_p^n denotes the n-th disparity offset of pixel p, I_1p denotes the value of pixel p in the left image, and d̃_p denotes the first-stage coarse disparity estimate at pixel p. The specific process is: first, the coarse disparity map and the left image are bilinearly interpolated to 1/4 resolution and concatenated along the channel dimension; the resulting tensor is passed through one convolution to obtain a C_DOP-dimensional representation, and then through 4 residual blocks with stride 1 to obtain offsets of size (N-1) × H/4 × W/4, where N is the total number of offsets and H and W are the height and width of the input image. Adding these offsets, together with a zero tensor, to the coarse disparity map yields the adaptive candidate disparities dc_p:

    dc_p = { d̃_p + Δ_p^n | n = 0, ..., N-1 }, where Δ_p^0 = 0
  • In a further improvement, the second-stage disparity estimation uses the adaptive candidate disparities to warp the 1/4-resolution right feature map, i.e., each pixel of the right feature map is displaced according to its candidate disparities; the result is concatenated with the left feature map to obtain a compact matching cost volume. After this volume is regularized, disparity regression yields a fine disparity estimate at 1/4 resolution.
  • In a further improvement, during disparity refinement, cascaded residual blocks hierarchically predict disparity residuals from the fine disparity estimate and the left image; the residuals are added to the disparity to obtain a refined disparity map, which is upsampled to full size.
  • The Adam optimization method is used to minimize the SmoothL1Loss objective:

    L(d̂, d) = (1/M) Σ_{i=1}^{M} smooth_L1(d̂_i - d_i), with smooth_L1(x) = 0.5x^2 if |x| < 1 and |x| - 0.5 otherwise

    where d̂_i is the predicted disparity of pixel i, d_i is its ground-truth disparity, and M is the number of labeled pixels.
  • The present invention has the following beneficial effects:
  • The present invention proposes a real-time binocular stereo matching method based on an adaptive candidate disparity prediction network. The proposed DOP predicts dynamic offsets to replace the constant offsets of existing methods.
  • These offsets are added to the coarse disparity estimate to generate adaptive candidate disparities, which adapt to the differing disparity-correction ranges required at different image positions and can recover the fine structural information lost in the coarse estimation stage, significantly improving disparity-map quality.
  • Owing to the effectiveness of the DOP, the present invention does not require multi-stage processing as in existing methods. The two-stage coarse-to-fine structure therefore greatly improves accuracy while doubling the speed of the original methods.
  • Fig. 1 is the overall framework of the real-time binocular stereo matching method based on the adaptive candidate disparity prediction network of the present invention;
  • Fig. 2 is a schematic diagram of the feature extraction network of the present invention;
  • Fig. 3 is a schematic diagram of dynamic offset prediction and adaptive candidate disparity generation according to the present invention;
  • Fig. 4 is a visualization of the dynamic offsets of the DOP: Fig. 4(a) shows the dynamic candidate disparity offsets, and Fig. 4(b) shows the offset histogram;
  • Fig. 5 is a schematic diagram of the disparity refinement module of the present invention.
  • When a layer/element is referred to as being "on" another layer/element, it can be directly on the other layer/element, or intervening layers/elements may be present between them.
  • In addition, if a layer/element is located "on" another layer/element in one orientation, the layer/element can be located "under" the other layer/element when the orientation is reversed.
  • The present invention provides a real-time binocular stereo matching method based on an adaptive candidate disparity network.
  • The method includes five steps: feature extraction, first-stage disparity estimation, dynamic offset prediction (DOP), second-stage disparity estimation, and disparity refinement.
  • Fig. 1 is a schematic diagram of the overall framework of the present invention.
  • The input to the neural network model for the binocular stereo matching task is the matched image pair I_1 and I_2, and the output is the dense disparity map D of the target image I_1.
  • The network learns a function (model) f satisfying the relation f(I_1, I_2) = D.
  • Specifically, the network first extracts high-dimensional feature information F_1 and F_2 for matching cost computation from the rectified original input images I_1 and I_2, then uses F_1 and F_2 to construct a 3D matching cost volume and perform cost aggregation, and finally regresses the dense disparity map D.
  • The overall model of the present invention mainly comprises five modules: feature extraction f_1, first-stage disparity estimation f_2, DOP f_3, second-stage disparity estimation f_4, and disparity refinement f_5.
  • Feature extraction f_1 uses a series of 2D convolution operations to learn the 1/4- and 1/16-resolution feature representations of I_1 and I_2, namely F^{1/4} and F^{1/16}; this process can be expressed as (F^{1/4}, F^{1/16}) = f_1(I).
  • Specifically, the present invention uses three convolutions with downsampling rates of 2, 1 and 2, one residual block, and one convolution operation to transform the original input image I_1 into a high-dimensional feature map of size 2C × H/4 × W/4, where H and W denote the height and width of the input image, respectively, and C is a constant controlling the number of feature extraction channels.
  • First-stage disparity estimation f_2: this module mainly comprises three parts: constructing the complete matching cost volume, cost aggregation, and disparity computation.
  • The complete matching cost volume is constructed as follows: for each disparity, the 1/16 right feature map is shifted leftwards along the width direction by the corresponding number of disparity units and then concatenated with the target (left) feature map along the channel dimension.
  • This misaligned concatenation yields an initial matching cost volume of size 16C × D/16 × H/16 × W/16, where D denotes the maximum disparity value.
  • Six cascaded standard 3D convolutions regularize this volume into a matching cost of size 1 × D/16 × H/16 × W/16.
  • Soft argmin is then used to regress the cost into a coarse disparity estimate: d̃ = Σ_{d=0}^{D_max} d · softmax(-c_d), where c_d denotes the matching cost at disparity d and D_max denotes the maximum disparity at this resolution (a code sketch of this regression is given after this list).
  • DOP f_3: the DOP dynamically predicts the disparity offsets of each pixel from the f_2 coarse disparity result and the left image, as expressed by the DOP relation above.
  • The coarse disparity map and the left image are bilinearly interpolated to 1/4 resolution and concatenated along the channel dimension; one convolution then produces a C_DOP-dimensional representation, and 4 residual blocks with stride 1 produce offsets of size (N-1) × H/4 × W/4, where N is the total number of offsets. The dynamic offsets and their statistical histogram are shown in Fig. 4. Adding these offsets, together with a zero tensor, to the coarse disparity map yields the adaptive candidate disparities dc_p = { d̃_p + Δ_p^n | n = 0, ..., N-1 }.
  • Second-stage disparity estimation f_4: this module is similar to f_2 and mainly comprises three parts: constructing the compact matching cost volume, cost aggregation, and disparity computation.
  • The present invention uses the dc_p obtained by f_3 to warp the 1/4-resolution right feature map, i.e., each pixel of the right feature map is displaced according to its candidate disparities; the warped map is then concatenated with the 1/4-resolution left feature map along the channel dimension to form an initial matching cost volume of size 4C × D/4 × H/4 × W/4, which is regularized and finally regressed with soft argmin over the candidates: d̂_p = Σ_{n=0}^{N-1} dc_p^n · softmax_n(-c_p^n), where c_p^n is the matching cost of the n-th candidate at pixel p.
  • Since the DOP can predict accurate candidate disparities, the present invention is designed as a two-stage coarse-to-fine structure for accurate and fast disparity estimation.
  • Disparity refinement f_5: as shown in Fig. 5, after the 1/4-resolution disparity is obtained, the present invention performs two-level refinement and upsampling. Specifically, the disparity is first concatenated with the 1/4-resolution left image and convolved into a tensor of size 32 × H/4 × W/4; this tensor then passes through residual blocks with dilation rates of 1, 2, 4, 8, 1 and 1 and one 2D convolution to obtain a disparity residual r_1 of size 1 × H/4 × W/4, which is added to the disparity to obtain the refined result at 1/4 resolution. After this result is upsampled to 1/2 resolution, the above process is repeated to obtain the refined result r_2 at 1/2 resolution. Finally, the 1/2 refined disparity map is upsampled to full resolution to obtain the final disparity result.
  • The present invention uses the SmoothL1Loss function defined above as the optimization objective.
  • During training, the present invention adds output disparity maps after the first convolution of the first and second stages for more effective supervision; the loss function is the sum of the SmoothL1 losses over all supervised disparity outputs.
  • The present invention selects the Adam optimizer to update the model parameters.
  • The present invention performs pre-training on the FlyingThings3D, Driving and Monkaa datasets according to the above process, and then uses the pre-trained model for transfer training on KITTI 2012 or KITTI 2015. At this point, the model optimization is complete and online inference tasks can be performed.
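As an illustrative aside (not part of the original patent text), the soft-argmin regression named in the bullets above can be written as a short PyTorch function. The function name and the optional per-pixel disparity-values argument are our own additions; when `disp_values` is given, the same function covers the second stage, where the hypotheses are the adaptive candidates dc_p rather than the integers 0..D-1:

```python
import torch
import torch.nn.functional as F

def soft_argmin(cost, disp_values=None):
    """Differentiable disparity regression over a cost volume.

    cost:        (B, D, H, W) matching cost; lower cost = better match.
    disp_values: optional (B, D, H, W) per-pixel disparity hypotheses. If
                 None, the hypotheses are the integers 0..D-1 (first stage);
                 the second stage would pass the adaptive candidates dc_p.
    Returns:     (B, H, W) expected disparity.
    """
    prob = F.softmax(-cost, dim=1)  # turn costs into matching probabilities
    if disp_values is None:
        d = cost.shape[1]
        disp_values = torch.arange(d, device=cost.device,
                                   dtype=cost.dtype).view(1, d, 1, 1)
    return (prob * disp_values).sum(dim=1)  # expectation over hypotheses
```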

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The present invention discloses a real-time binocular stereo matching method based on an adaptive candidate disparity prediction network. The method first performs multi-scale feature extraction on a stereo image pair with a 2D convolutional neural network to obtain high- and low-resolution feature maps. Then, the first-stage disparity estimation uses the low-resolution feature maps to compute a coarse disparity estimate. After adaptive candidate disparities are predicted from the coarse estimate and the left image, the second-stage disparity estimation uses these predictions and the high-resolution feature maps to perform fine disparity estimation. Finally, the disparity map is hierarchically refined to obtain a full-size disparity map. Compared with existing coarse-to-fine stereo matching neural networks, the present invention can predict more accurate dynamic offsets for the fine disparity estimation stage, satisfying the differing disparity-correction needs of the various targets in an image. Owing to the effectiveness of this dynamic prediction, the present invention designs a two-stage processing structure that substantially improves the accuracy and speed of a real-time binocular stereo matching network.

Description

Real-time binocular stereo matching method based on an adaptive candidate disparity prediction network

TECHNICAL FIELD

The present invention belongs to the technical field of computer vision, and in particular relates to a real-time binocular stereo matching method based on an adaptive candidate disparity prediction network.

BACKGROUND

Binocular stereo vision systems are widely used in robot navigation, intelligent monitoring, autonomous driving, and other fields. Accurate and fast binocular stereo matching is therefore of great significance for the real-time deployment of stereo vision systems on mobile devices. In recent years, binocular stereo matching based on deep learning has benefited from continuous innovation in neural network models, and its accuracy has improved significantly. However, current high-precision stereo matching networks usually require large amounts of memory and computing resources, which makes existing methods difficult to apply on resource-constrained mobile platforms.

An end-to-end binocular stereo matching network mainly comprises the steps of feature extraction, cost volume construction, matching cost aggregation, and disparity regression/optimization. Among these, the matching cost aggregation step plays a decisive role in the model's computation speed and resource consumption, so reasonable optimization of this step is the key to lightweight network design. Existing methods mainly adopt a coarse-to-fine disparity estimation strategy to greatly reduce the computational complexity of the cost aggregation step. Specifically, such a method first performs a full-disparity-range search at a small resolution to obtain a coarse disparity estimate, then upsamples it stage by stage and refines the coarse disparity at larger resolutions with a very small number of disparity offsets, so the computation speed is significantly improved. However, existing methods all provide the candidate disparities for the fine estimation stage as fixed offsets, which restricts the candidate values to a small local range around the coarse estimate; the disparity correction therefore cannot satisfy the actual needs of different targets in different scenes, and the disparity-map quality of existing methods is relatively poor. In addition, to improve the estimation results to some extent, existing coarse-to-fine methods usually adopt multi-stage (generally ≥ 3 stages) processing to obtain more accurate disparity. However, as the number of stages increases, the computation speed drops significantly. In summary, existing lightweight binocular stereo matching networks adopting the coarse-to-fine strategy still struggle to meet the real-time stereo vision requirements of mobile devices in terms of accuracy and speed.
SUMMARY

The purpose of the present invention is to propose a real-time binocular stereo matching method based on an adaptive candidate disparity prediction network to overcome the shortcomings of the prior art. The present invention uses the coarse disparity estimate and the original image information to dynamically predict, for each pixel, the disparity offsets required in the fine estimation stage, thereby adapting to the differing disparity-correction ranges required by different target objects. Moreover, owing to the effectiveness of this method, the present invention designs a two-stage processing structure to improve both the accuracy and the speed of the binocular stereo matching network.

To achieve the above purpose, the present invention adopts the following technical solution:

A real-time binocular stereo matching method based on an adaptive candidate disparity prediction network, the method comprising:

First, multi-scale feature extraction is performed on the rectified stereo image pair using 2D convolutions to obtain high- and low-resolution feature maps. Then, in the first stage, the disparity is coarsely estimated from the low-resolution feature maps. Next, the coarse disparity map and the left image are used for dynamic offset prediction, and the predicted offsets are added to the coarse estimate to generate adaptive candidate disparities. The second-stage disparity estimation uses the adaptive candidate disparities and the high-resolution feature maps to construct a compact matching cost volume, which is regularized and then subjected to disparity regression to obtain a finely estimated disparity. Finally, a disparity refinement module hierarchically upsamples the fine disparity map to obtain the full-size disparity map.

In a further improvement of the present invention, during feature extraction, a series of 2D convolutions first downsamples the input image progressively to 1/2, 1/4, 1/8 and 1/16 resolution, and deeper features are then extracted from the 1/4 and 1/16 feature maps.

In a further improvement of the present invention, the first-stage disparity estimation uses the extracted 1/16 feature maps for misaligned concatenation to obtain the complete matching cost volume; stacked 3D convolutions regularize the cost volume to obtain the aggregated matching cost, which is regressed to obtain the coarse disparity map.

In a further improvement of the present invention, the dynamic offset prediction (DOP) predicts dynamic candidate disparity offsets from the coarse disparity map and the left image, and adds them to the coarse disparity map to generate the adaptive candidate disparities.
In a further improvement of the present invention, the DOP uses the coarse disparity estimate and the left image to predict the dynamic offsets and thereby obtain the adaptive candidate disparities, expressed as:

Δ_p^n = f_DOP(d̃_p, I_1p), n = 1, ..., N-1

where Δ_p^n denotes the n-th disparity offset of pixel p, I_1p denotes the value of pixel p in the left image, and d̃_p denotes the first-stage coarse disparity estimate at pixel p. The DOP is implemented with a series of 2D convolutions. The specific process is: first, the coarse disparity map and the left image are bilinearly interpolated to 1/4 resolution and concatenated along the channel dimension; the resulting tensor is passed through one convolution to obtain a C_DOP-dimensional representation, and then through 4 residual blocks with stride 1 to obtain offsets of size (N-1) × H/4 × W/4, where N is the total number of offsets and H and W are the height and width of the input image. Adding these offsets, together with a zero tensor, to the coarse disparity map yields the adaptive candidate disparities dc_p:

dc_p = { d̃_p + Δ_p^n | n = 0, ..., N-1 }, where Δ_p^0 = 0
In a further improvement of the present invention, the second-stage disparity estimation uses the adaptive candidate disparities to warp the 1/4-resolution right feature map, i.e., each pixel of the right feature map is displaced according to its candidate disparities; the warped map is then concatenated with the left feature map to obtain a compact matching cost volume. After this cost volume is regularized, disparity regression yields a fine disparity estimate at 1/4 resolution.

In a further improvement of the present invention, during disparity refinement, cascaded residual blocks hierarchically predict disparity residuals from the fine disparity estimate and the left image; the residuals are added to the disparity to obtain a refined disparity map, which is upsampled to obtain the full-size disparity.
After the disparity map is obtained, the Adam optimization method is used to minimize the SmoothL1Loss objective, whose formula is:

L(d̂, d) = (1/M) Σ_{i=1}^{M} smooth_L1(d̂_i - d_i)

smooth_L1(x) = 0.5 x^2, if |x| < 1; |x| - 0.5, otherwise

where d̂_i is the predicted disparity of pixel i, d_i is the ground-truth disparity of pixel i, and M is the number of labeled pixels. Once the optimized model is obtained, online inference can be performed.
Compared with the prior art, the present invention has the following beneficial effects:

The present invention proposes a real-time binocular stereo matching method based on an adaptive candidate disparity prediction network. The proposed DOP predicts dynamic offsets to replace the constant offsets of existing methods; these offsets are added to the coarse disparity estimate to generate adaptive candidate disparities, which adapt to the differing disparity-correction ranges required at different image positions and can recover the fine structural information lost in the coarse estimation stage, significantly improving disparity-map quality.

Furthermore, owing to the effectiveness of the DOP, the present invention does not need multi-stage processing operations similar to those of existing methods. The present invention therefore designs a two-stage coarse-to-fine processing structure, which greatly improves accuracy while doubling the speed of the original methods.
BRIEF DESCRIPTION OF THE DRAWINGS

To explain the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art may obtain other related drawings from them without creative effort.

Fig. 1 is the overall framework of the real-time binocular stereo matching method based on the adaptive candidate disparity prediction network of the present invention;

Fig. 2 is a schematic diagram of the feature extraction network of the present invention;

Fig. 3 is a schematic diagram of dynamic offset prediction and adaptive candidate disparity generation according to the present invention;

Fig. 4 is a visualization of the dynamic offsets of the DOP: Fig. 4(a) shows the dynamic candidate disparity offsets, and Fig. 4(b) shows the offset histogram;

Fig. 5 is a schematic diagram of the disparity refinement module of the present invention.
DETAILED DESCRIPTION

To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention, and they are not intended to limit the scope of the present disclosure. In the following description, descriptions of well-known structures and techniques are omitted to avoid unnecessarily obscuring the disclosed concepts. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

The accompanying drawings show various structural schematics according to the disclosed embodiments of the present invention. The figures are not drawn to scale: certain details are enlarged, and some details may be omitted, for clarity of presentation. The shapes of the various regions and layers shown in the figures, and their relative sizes and positional relationships, are only exemplary; in practice there may be deviations due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers with different shapes, sizes and relative positions as actually required.

In the context of the present disclosure, when a layer/element is referred to as being "on" another layer/element, it can be directly on the other layer/element, or intervening layers/elements may be present between them. In addition, if a layer/element is located "on" another layer/element in one orientation, the layer/element can be located "under" the other layer/element when the orientation is reversed.

It should be noted that the terms "first", "second", etc. in the specification, claims and drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in orders other than those illustrated or described here. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product or device.

The present invention is described in further detail below with reference to the drawings and embodiments.
As shown in Figs. 1-5, after conventional data preprocessing operations such as shuffling, cropping and normalization of the original input images, the present invention provides a real-time binocular stereo matching method based on an adaptive candidate disparity network. The method includes five steps: feature extraction, first-stage disparity estimation, dynamic offset prediction (DOP), second-stage disparity estimation, and disparity refinement:

1) Fig. 1 is a schematic diagram of the overall framework of the present invention. The input to the neural network model for the binocular stereo matching task is the matched image pair I_1 and I_2, and the output is the dense disparity map D of the target image I_1. The network learns a function (model) f satisfying the following relation:

f(I_1, I_2) = D

Specifically, the network first extracts high-dimensional feature information F_1 and F_2 for matching cost computation from the rectified original input images I_1 and I_2, then uses F_1 and F_2 to construct a 3D matching cost volume and perform cost aggregation, and finally regresses the dense disparity map D. As shown in Fig. 1, the overall model of the present invention mainly comprises five modules: feature extraction f_1, first-stage disparity estimation f_2, DOP f_3, second-stage disparity estimation f_4, and disparity refinement f_5.
2) Feature extraction f_1: f_1 uses a series of 2D convolution operations to learn the 1/4- and 1/16-resolution feature representations of I_1 and I_2, namely F_1^{1/4}, F_1^{1/16}, F_2^{1/4} and F_2^{1/16}. This process can be expressed as:

(F_1^{1/4}, F_1^{1/16}) = f_1(I_1)

(F_2^{1/4}, F_2^{1/16}) = f_1(I_2)

First, the present invention uses three convolutions with downsampling rates of 2, 1 and 2, one residual block, and one convolution operation to transform the original input image I_1 into the high-dimensional feature map F_1^{1/4} of size 2C × H/4 × W/4, where H and W denote the height and width of the input image, respectively, and C is a constant controlling the number of feature extraction channels. Then, two combinations of a 2x-downsampling convolution plus a residual block, followed by one residual block and one convolution operation, extract the feature F_1^{1/16} of size 8C × H/16 × W/16. The feature extraction networks of I_1 and I_2 share weights, and the feature extraction process of I_2 is identical to the above.
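As an illustrative aside (not part of the original patent text), a minimal PyTorch sketch of such a feature extractor follows. The stride pattern (2, 1, 2), the block counts, and the output sizes 2C × H/4 × W/4 and 8C × H/16 × W/16 follow the text; the 3x3 kernels, the batch normalization, the default C = 8, and the exact form of `ResBlock` are assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(cin, cout, stride=1):
    """3x3 convolution + batch norm + ReLU (kernel size is an assumption)."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True))

class ResBlock(nn.Module):
    """Stride-1 residual block with two 3x3 convolutions."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            conv_bn_relu(c, c),
            nn.Conv2d(c, c, 3, padding=1, bias=False),
            nn.BatchNorm2d(c))
    def forward(self, x):
        return F.relu(x + self.body(x))

class FeatureExtractor(nn.Module):
    """3 x H x W image -> (2C x H/4 x W/4, 8C x H/16 x W/16) features."""
    def __init__(self, C=8):
        super().__init__()
        # three convs with downsampling rates 2, 1, 2, a residual block, a conv
        self.to_quarter = nn.Sequential(
            conv_bn_relu(3, C, 2), conv_bn_relu(C, C, 1),
            conv_bn_relu(C, 2 * C, 2),
            ResBlock(2 * C), conv_bn_relu(2 * C, 2 * C))
        # two (2x-downsampling conv + residual block) pairs, then one residual
        # block and one conv
        self.to_sixteenth = nn.Sequential(
            conv_bn_relu(2 * C, 4 * C, 2), ResBlock(4 * C),
            conv_bn_relu(4 * C, 8 * C, 2), ResBlock(8 * C),
            ResBlock(8 * C), conv_bn_relu(8 * C, 8 * C))
    def forward(self, img):
        f4 = self.to_quarter(img)     # F^{1/4}: 2C x H/4 x W/4
        f16 = self.to_sixteenth(f4)   # F^{1/16}: 8C x H/16 x W/16
        return f4, f16
```

The same module would be applied to both I_1 and I_2, which matches the weight sharing described above.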
3) First-stage disparity estimation f_2: this module mainly comprises three parts: constructing the complete matching cost volume, cost aggregation, and disparity computation. The complete matching cost volume is constructed as follows: for each disparity, F_2^{1/16} is shifted leftwards along the width direction by the corresponding number of disparity units and then concatenated with the target (left) feature map along the channel dimension. Through this misaligned concatenation, an initial matching cost volume of size 16C × D/16 × H/16 × W/16 is constructed, where D denotes the maximum disparity value. Six cascaded standard 3D convolutions regularize this initial volume into a matching cost volume of size 1 × D/16 × H/16 × W/16. Finally, soft argmin is used to regress this cost volume into a coarse disparity estimate:

d̃ = Σ_{d=0}^{D_max} d · softmax(-c_d)

where c_d denotes the matching cost at the corresponding disparity d, and D_max denotes the maximum disparity at this resolution.
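As an illustrative aside (not part of the original patent text), a sketch of the misaligned-concatenation construction; zero-filling the positions shifted out of view is an assumption:

```python
import torch

def build_full_cost_volume(f_left, f_right, num_disp):
    """Complete matching cost volume by misaligned concatenation.

    f_left, f_right: (B, C, H, W) 1/16-resolution feature maps.
    num_disp:        number of disparity hypotheses at this scale (D/16).
    Returns:         (B, 2C, num_disp, H, W) initial cost volume.
    """
    B, C, H, W = f_left.shape
    volume = f_left.new_zeros(B, 2 * C, num_disp, H, W)  # zero padding assumed
    for d in range(num_disp):
        volume[:, :C, d] = f_left
        if d == 0:
            volume[:, C:, d] = f_right
        else:
            # shift the right features so that right-image pixel (x - d)
            # is concatenated with left-image pixel x
            volume[:, C:, d, :, d:] = f_right[:, :, :, :-d]
    return volume
```

The six cascaded 3D convolutions that reduce this 5-D volume to a single cost channel could be stacked from `torch.nn.Conv3d` layers in the same spirit, and the regression step can reuse the `soft_argmin` sketch given earlier.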
4) DOP f_3: the DOP dynamically predicts the disparity offsets of each pixel from the coarse disparity result of f_2 and the left image. Specifically, this can be expressed as:

Δ_p^n = f_DOP(d̃_p, I_1p), n = 1, ..., N-1

where Δ_p^n denotes the n-th disparity offset of pixel p, I_1p denotes the value of pixel p in the left image, and d̃_p denotes the first-stage coarse disparity estimate at pixel p. The present invention uses a series of 2D convolutions to implement the DOP function. The specific operation is shown in Fig. 3: first, the coarse disparity map and the left image are bilinearly interpolated to 1/4 resolution and concatenated along the channel dimension; the resulting tensor is passed through one convolution to obtain a C_DOP-dimensional representation, and then through 4 residual blocks with stride 1 to obtain offsets of size (N-1) × H/4 × W/4, where N is the total number of offsets. The dynamic offsets and their statistical histogram are shown in Fig. 4. Adding these offsets, together with a zero tensor, to the coarse disparity map yields the adaptive candidate disparities dc_p:

dc_p = { d̃_p + Δ_p^n | n = 0, ..., N-1 }, where Δ_p^0 = 0
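As an illustrative aside (not part of the original patent text), a minimal sketch of the DOP head and candidate generation; the channel width `c_dop`, the default offset count, the plain residual blocks, and the x4 rescaling of disparity values when moving from 1/16 to 1/4 resolution are our assumptions beyond what the text states:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DOP(nn.Module):
    """Dynamic Offset Prediction: predicts N-1 per-pixel disparity offsets
    from the coarse disparity map and the left image, then forms the
    adaptive candidate disparities dc_p."""
    def __init__(self, c_dop=32, n_offsets=8):
        super().__init__()
        self.entry = nn.Sequential(  # 3 image channels + 1 disparity channel
            nn.Conv2d(4, c_dop, 3, padding=1), nn.ReLU(inplace=True))
        self.res_blocks = nn.ModuleList([nn.Sequential(
            nn.Conv2d(c_dop, c_dop, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_dop, c_dop, 3, padding=1)) for _ in range(4)])
        self.head = nn.Conv2d(c_dop, n_offsets - 1, 3, padding=1)

    def forward(self, coarse_disp, left_img):
        # coarse_disp: (B, 1, H/16, W/16); left_img: (B, 3, H, W).
        # Interpolate both to 1/4 resolution; disparity magnitudes are
        # rescaled by 4 so they stay in 1/4-resolution pixels (assumption).
        h, w = left_img.shape[-2] // 4, left_img.shape[-1] // 4
        d = 4.0 * F.interpolate(coarse_disp, (h, w), mode="bilinear",
                                align_corners=False)
        img = F.interpolate(left_img, (h, w), mode="bilinear",
                            align_corners=False)
        x = self.entry(torch.cat([d, img], dim=1))  # C_DOP-dim representation
        for blk in self.res_blocks:                 # 4 stride-1 residual blocks
            x = F.relu(x + blk(x))
        offsets = self.head(x)                      # (B, N-1, H/4, W/4)
        zero = torch.zeros_like(offsets[:, :1])     # the fixed zero offset
        return d + torch.cat([zero, offsets], dim=1)  # dc_p: (B, N, H/4, W/4)
```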
5) Second-stage disparity estimation f_4: this module is similar to f_2 and mainly comprises three parts: constructing the compact matching cost volume, cost aggregation, and disparity computation. The present invention uses the dc_p obtained by f_3 to warp the 1/4-resolution right feature map F_2^{1/4}, i.e., each pixel of the right feature map is displaced according to its candidate disparities; the warped map is then concatenated with the 1/4-resolution left feature map F_1^{1/4} along the channel dimension to form an initial matching cost volume of size 4C × D/4 × H/4 × W/4. This initial volume is regularized to obtain the aggregated cost volume, and finally soft argmin is used to regress the cost over the candidates:

d̂_p = Σ_{n=0}^{N-1} dc_p^n · softmax_n(-c_p^n)

where c_p^n denotes the matching cost of the corresponding candidate disparity dc_p^n at pixel p.
Since the DOP can predict more accurate candidate disparities, the present invention is designed as a two-stage coarse-to-fine structure for accurate and fast disparity estimation.
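As an illustrative aside (not part of the original patent text), a sketch of the per-pixel warping that builds the compact cost volume; bilinear sampling of fractional candidate positions via `grid_sample` and zero padding outside the image are assumptions:

```python
import torch
import torch.nn.functional as F

def build_compact_cost_volume(f_left, f_right, candidates):
    """Compact cost volume: warp the right features by each per-pixel
    candidate disparity and concatenate with the left features.

    f_left, f_right: (B, C, H, W) 1/4-resolution feature maps.
    candidates:      (B, N, H, W) adaptive candidate disparities dc_p.
    Returns:         (B, 2C, N, H, W) compact matching cost volume.
    """
    B, C, H, W = f_left.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, device=f_left.device, dtype=f_left.dtype),
        torch.arange(W, device=f_left.device, dtype=f_left.dtype),
        indexing="ij")
    slices = []
    for n in range(candidates.shape[1]):
        # sample the right image at x - dc_p; fractional positions are
        # handled by bilinear interpolation inside grid_sample
        x_src = xs.unsqueeze(0) - candidates[:, n]
        grid = torch.stack(
            [2.0 * x_src / (W - 1) - 1.0,
             2.0 * ys.unsqueeze(0).expand(B, -1, -1) / (H - 1) - 1.0],
            dim=-1)
        warped = F.grid_sample(f_right, grid, mode="bilinear",
                               padding_mode="zeros", align_corners=True)
        slices.append(torch.cat([f_left, warped], dim=1))  # (B, 2C, H, W)
    return torch.stack(slices, dim=2)
```

The disparity regression over this volume can then reuse the earlier `soft_argmin` sketch with `disp_values=candidates`.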
6) Disparity refinement f_5: as shown in Fig. 5, after the 1/4-resolution disparity is obtained, the present invention performs two-level refinement and upsampling on it. Specifically, the present invention first concatenates the disparity with the 1/4-resolution left image and, after a convolution, forms a tensor of size 32 × H/4 × W/4; this tensor then passes through residual blocks with dilation rates of 1, 2, 4, 8, 1 and 1, respectively, and one 2D convolution to obtain a disparity residual r_1 of size 1 × H/4 × W/4. Adding r_1 to the disparity yields the refined disparity at 1/4 resolution; this result is upsampled to 1/2 resolution and the above process is repeated to obtain the refined disparity r_2 at 1/2 resolution. Finally, the 1/2-resolution refined disparity map is upsampled to full resolution to obtain the final disparity result.
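As an illustrative aside (not part of the original patent text), a sketch of one refinement level; the dilation sequence (1, 2, 4, 8, 1, 1) and the 32-channel width follow the text, while kernel sizes and the residual-block form are assumptions. The same module is applied once at 1/4 and once at 1/2 resolution; when the disparity is upsampled between levels, its values would also be doubled so that they remain expressed in pixels at the new resolution (our assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineLevel(nn.Module):
    """One refinement level: predict a disparity residual from the current
    disparity and the left image, and add it back."""
    def __init__(self, c=32, dilations=(1, 2, 4, 8, 1, 1)):
        super().__init__()
        self.entry = nn.Sequential(  # disparity (1) + image (3) channels
            nn.Conv2d(4, c, 3, padding=1), nn.ReLU(inplace=True))
        self.blocks = nn.ModuleList([nn.Sequential(
            nn.Conv2d(c, c, 3, padding=r, dilation=r), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=r, dilation=r)) for r in dilations])
        self.head = nn.Conv2d(c, 1, 3, padding=1)  # 1 x H x W residual

    def forward(self, disp, img):
        x = self.entry(torch.cat([disp, img], dim=1))  # 32 x H x W tensor
        for blk in self.blocks:                        # dilated residual blocks
            x = F.relu(x + blk(x))
        return disp + self.head(x)                     # refined disparity
```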
To make the back-propagated gradients vary more smoothly with the error and be more robust to outliers, the present invention uses the SmoothL1Loss function as the optimization objective, whose formula is:

L(d̂, d) = (1/M) Σ_{i=1}^{M} smooth_L1(d̂_i - d_i)

smooth_L1(x) = 0.5 x^2, if |x| < 1; |x| - 0.5, otherwise

where d̂_i is the predicted disparity of pixel i, d_i is the ground-truth disparity of pixel i, and M is the number of labeled pixels.
During the training phase, the present invention adds output disparity maps after the first convolution of the first and second stages for more effective supervision; the loss function is computed as the sum of the SmoothL1 losses over all supervised disparity outputs:

L_total = Σ_k λ_k L(d̂^(k), d)

where the sum runs over the intermediate and final disparity outputs, and λ_k is the loss weight of the k-th output.
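As an illustrative aside (not part of the original patent text), this supervision scheme as code; equal loss weights and the masking of invalid ground-truth pixels are assumptions, while the SmoothL1 form follows the formulas above:

```python
import torch
import torch.nn.functional as F

def stereo_loss(pred_disps, gt_disp, weights=None, max_disp=192):
    """Sum of SmoothL1 losses over all supervised disparity outputs.

    pred_disps: list of (B, H, W) disparity predictions (intermediate outputs
                assumed already upsampled to the ground-truth resolution).
    gt_disp:    (B, H, W) ground-truth disparity; pixels <= 0 are invalid.
    weights:    per-output weights; equal weighting if None (an assumption).
    Assumes at least one valid ground-truth pixel in the batch.
    """
    if weights is None:
        weights = [1.0] * len(pred_disps)
    mask = (gt_disp > 0) & (gt_disp < max_disp)  # valid-pixel mask
    total = gt_disp.new_zeros(())
    for w, pred in zip(weights, pred_disps):
        total = total + w * F.smooth_l1_loss(pred[mask], gt_disp[mask])
    return total
```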
To speed up learning convergence and avoid falling into local optima, the present invention selects the Adam optimizer to update the model parameters. The present invention performs pre-training on the FlyingThings3D, Driving and Monkaa datasets according to the above process, and then uses the pre-trained model for transfer training on KITTI 2012 or KITTI 2015. At this point, the model optimization is complete and online inference tasks can be performed.

The above content is only intended to illustrate the technical idea of the present invention and cannot be used to limit its protection scope; any modification made on the basis of the technical solution in accordance with the technical idea proposed by the present invention falls within the protection scope of the claims of the present invention.

Claims (7)

  1. A real-time binocular stereo matching method based on an adaptive candidate disparity prediction network, characterized in that the method comprises:
    first, performing multi-scale feature extraction on the rectified stereo image pair using 2D convolutions to obtain high- and low-resolution feature maps; then, in a first stage, coarsely estimating the disparity from the low-resolution feature maps; subsequently using the coarse disparity map and the left image for dynamic offset prediction, the predicted offsets being added to the coarse estimate to generate adaptive candidate disparities; in a second-stage disparity estimation, using the adaptive candidate disparities and the high-resolution feature maps to construct a compact matching cost volume, which is regularized and then subjected to disparity regression to obtain a finely estimated disparity; and finally, hierarchically upsampling the fine disparity map in a disparity refinement module to obtain a full-size disparity map.
  2. The real-time binocular stereo matching method based on an adaptive candidate disparity prediction network according to claim 1, characterized in that, during feature extraction, a series of 2D convolutions first downsamples the input image progressively to 1/2, 1/4, 1/8 and 1/16 resolution, and deeper features are then extracted from the 1/4 and 1/16 feature maps.
  3. The real-time binocular stereo matching method based on an adaptive candidate disparity prediction network according to claim 2, characterized in that the first-stage disparity estimation uses the extracted 1/16 feature maps for misaligned concatenation to obtain the complete matching cost volume; stacked 3D convolutions regularize the cost volume to obtain the aggregated matching cost volume, which is regressed to obtain the coarse disparity map.
  4. The real-time binocular stereo matching method based on an adaptive candidate disparity prediction network according to claim 3, characterized in that the dynamic offset prediction (DOP) predicts dynamic candidate disparity offsets from the coarse disparity map and the left image, and adds them to the coarse disparity map to generate the adaptive candidate disparities.
  5. The real-time binocular stereo matching method based on an adaptive candidate disparity prediction network according to claim 4, characterized in that the DOP uses the coarse disparity estimate and the left image to predict the dynamic offsets and thereby obtain the adaptive candidate disparities, expressed as:

    Δ_p^n = f_DOP(d̃_p, I_1p), n = 1, ..., N-1

    where Δ_p^n denotes the n-th disparity offset of pixel p, I_1p denotes the value of pixel p in the left image, and d̃_p denotes the first-stage coarse disparity estimate at pixel p; the DOP is implemented with a series of 2D convolutions, the specific process being: first, the coarse disparity map and the left image are bilinearly interpolated to 1/4 resolution and concatenated along the channel dimension; the resulting tensor is passed through one convolution to obtain a C_DOP-dimensional representation, and then through 4 residual blocks with stride 1 to obtain offsets of size (N-1) × H/4 × W/4, where N is the total number of offsets and H and W are the height and width of the input image; adding these offsets, together with a zero tensor, to the coarse disparity map yields the adaptive candidate disparities dc_p:

    dc_p = { d̃_p + Δ_p^n | n = 0, ..., N-1 }, where Δ_p^0 = 0.
  6. The real-time binocular stereo matching method based on an adaptive candidate disparity prediction network according to claim 5, characterized in that the second-stage disparity estimation uses the adaptive candidate disparities to warp the 1/4 right feature map, i.e., each pixel of the right feature map is displaced according to its candidate disparities; the warped map is then concatenated with the left feature map to obtain a compact matching cost volume, and after this cost volume is regularized, disparity regression is performed to obtain a fine disparity estimate at 1/4 resolution.
  7. The real-time binocular stereo matching method based on an adaptive candidate disparity prediction network according to claim 6, characterized in that, during disparity refinement, cascaded residual blocks hierarchically predict disparity residuals from the fine disparity estimate and the left image; the residuals are added to the disparity to obtain a refined disparity map, which is upsampled to obtain the full-size disparity;
    after the disparity map is obtained, the Adam optimization method is used to minimize the SmoothL1Loss objective, whose formula is:

    L(d̂, d) = (1/M) Σ_{i=1}^{M} smooth_L1(d̂_i - d_i)

    smooth_L1(x) = 0.5 x^2, if |x| < 1; |x| - 0.5, otherwise

    where d̂_i is the predicted disparity of pixel i, d_i is the ground-truth disparity of pixel i, and M is the number of labeled pixels; once the optimized model is obtained, online inference can be performed.
PCT/CN2021/118609 2020-10-28 2021-09-15 Real-time binocular stereo matching method based on an adaptive candidate disparity prediction network WO2022089077A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011176728.1A CN112435282B (zh) 2020-10-28 2020-10-28 Real-time binocular stereo matching method based on an adaptive candidate disparity prediction network
CN202011176728.1 2020-10-28

Publications (1)

Publication Number Publication Date
WO2022089077A1 (zh) 2022-05-05

Family

ID=74696379

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/118609 WO2022089077A1 (zh) 2020-10-28 2021-09-15 Real-time binocular stereo matching method based on an adaptive candidate disparity prediction network

Country Status (2)

Country Link
CN (1) CN112435282B (zh)
WO (1) WO2022089077A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115209122A (zh) * 2022-07-26 2022-10-18 福州大学 Multi-agent-based method and system for enhancing the visual comfort of stereo images
CN116824307A (zh) * 2023-08-29 2023-09-29 深圳市万物云科技有限公司 Image annotation method and apparatus based on the SAM model, and related media
CN117409058A (zh) * 2023-12-14 2024-01-16 浙江优众新材料科技有限公司 Self-supervised matching-cost estimation method for depth estimation
CN117422750A (zh) * 2023-10-30 2024-01-19 河南送变电建设有限公司 Real-time scene distance perception method and apparatus, electronic device and storage medium
CN117593350A (zh) * 2024-01-18 2024-02-23 泉州装备制造研究所 Binocular stereo matching method and system for UAV power transmission line inspection
CN117747056A (zh) * 2024-02-19 2024-03-22 遂宁市中心医院 Preoperative image estimation method, apparatus, device and storage medium for minimally invasive surgery

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112435282B (zh) 2020-10-28 2023-09-12 西安交通大学 Real-time binocular stereo matching method based on an adaptive candidate disparity prediction network
CN112991422A (zh) 2021-04-27 2021-06-18 杭州云智声智能科技有限公司 Stereo matching method and system based on atrous spatial pyramid pooling
CN113658277B (zh) 2021-08-25 2022-11-11 北京百度网讯科技有限公司 Stereo matching method, model training method, related apparatus and electronic device
CN114155303B (zh) 2022-02-09 2022-06-17 北京中科慧眼科技有限公司 Parametric stereo matching method and system based on a binocular camera
CN116740162B (zh) 2023-08-14 2023-11-14 东莞市爱培科技术有限公司 Stereo matching method based on multi-scale cost volumes and computer storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109410266A (zh) * 2018-09-18 2019-03-01 合肥工业大学 Stereo matching algorithm based on the quad-mode Census transform and discrete disparity search
US20190304117A1 (en) * 2018-03-30 2019-10-03 Samsung Electronics Co., Ltd. Hardware disparity evaluation for stereo matching
CN110427968A (zh) * 2019-06-28 2019-11-08 武汉大学 Binocular stereo matching method based on detail enhancement
CN110533712A (zh) * 2019-08-26 2019-12-03 北京工业大学 Binocular stereo matching method based on a convolutional neural network
CN112435282A (zh) * 2020-10-28 2021-03-02 西安交通大学 Real-time binocular stereo matching method based on an adaptive candidate disparity prediction network

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8385630B2 (en) * 2010-01-05 2013-02-26 Sri International System and method of processing stereo images
CN106525004A (zh) * 2016-11-09 2017-03-22 人加智能机器人技术(北京)有限公司 Binocular stereo vision system and depth measurement method
CN106780442B (zh) * 2016-11-30 2019-12-24 成都通甲优博科技有限责任公司 Stereo matching method and system
CN109472819B (zh) * 2018-09-06 2021-12-28 杭州电子科技大学 Binocular disparity estimation method based on a cascaded geometric context neural network
CN111402129B (zh) * 2020-02-21 2022-03-01 西安交通大学 Binocular stereo matching method based on a joint-upsampling convolutional neural network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190304117A1 (en) * 2018-03-30 2019-10-03 Samsung Electronics Co., Ltd. Hardware disparity evaluation for stereo matching
CN109410266A (zh) * 2018-09-18 2019-03-01 合肥工业大学 Stereo matching algorithm based on the quad-mode Census transform and discrete disparity search
CN110427968A (zh) * 2019-06-28 2019-11-08 武汉大学 Binocular stereo matching method based on detail enhancement
CN110533712A (zh) * 2019-08-26 2019-12-03 北京工业大学 Binocular stereo matching method based on a convolutional neural network
CN112435282A (zh) * 2020-10-28 2021-03-02 西安交通大学 Real-time binocular stereo matching method based on an adaptive candidate disparity prediction network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang, Yan; Lai, Zihang; Huang, Gao; Wang, Brian H.; van der Maaten, Laurens; Campbell, Mark; Weinberger, Kilian Q.: "Anytime Stereo Image Depth Estimation on Mobile Devices", 2019 International Conference on Robotics and Automation (ICRA), IEEE, 20 May 2019, pages 5893-5900, XP033593992, DOI: 10.1109/ICRA.2019.8794003 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115209122A (zh) * 2022-07-26 2022-10-18 福州大学 Multi-agent-based method and system for enhancing the visual comfort of stereo images
CN115209122B (zh) * 2022-07-26 2023-07-07 福州大学 Multi-agent-based method and system for enhancing the visual comfort of stereo images
CN116824307A (zh) * 2023-08-29 2023-09-29 深圳市万物云科技有限公司 Image annotation method and apparatus based on the SAM model, and related media
CN116824307B (zh) * 2023-08-29 2024-01-02 深圳市万物云科技有限公司 Image annotation method and apparatus based on the SAM model, and related media
CN117422750A (zh) * 2023-10-30 2024-01-19 河南送变电建设有限公司 Real-time scene distance perception method and apparatus, electronic device and storage medium
CN117409058A (zh) * 2023-12-14 2024-01-16 浙江优众新材料科技有限公司 Self-supervised matching-cost estimation method for depth estimation
CN117409058B (zh) * 2023-12-14 2024-03-26 浙江优众新材料科技有限公司 Self-supervised matching-cost estimation method for depth estimation
CN117593350A (zh) * 2024-01-18 2024-02-23 泉州装备制造研究所 Binocular stereo matching method and system for UAV power transmission line inspection
CN117747056A (zh) * 2024-02-19 2024-03-22 遂宁市中心医院 Preoperative image estimation method, apparatus, device and storage medium for minimally invasive surgery

Also Published As

Publication number Publication date
CN112435282B (zh) 2023-09-12
CN112435282A (zh) 2021-03-02

Similar Documents

Publication Publication Date Title
WO2022089077A1 (zh) Real-time binocular stereo matching method based on an adaptive candidate disparity prediction network
CN111402129B (zh) Binocular stereo matching method based on a joint-upsampling convolutional neural network
CN109377530B (zh) Binocular depth estimation method based on a deep neural network
US20210042954A1 (en) Binocular matching method and apparatus, device and storage medium
CN111652899A (zh) Video object segmentation method based on spatio-temporal part graphs
CN110569875B (zh) Deep neural network object detection method based on feature reuse
CN111259945B (zh) Binocular disparity estimation method incorporating attention maps
CN113592026B (zh) Binocular stereo matching method based on dilated convolutions and cascaded cost volumes
CN109005398B (zh) Stereo image disparity matching method based on a convolutional neural network
CN113744311A (zh) Siamese neural network moving-object tracking method based on fully connected attention modules
CN110569851A (zh) Real-time semantic segmentation method with gated multi-layer fusion
CN113344869A (zh) Real-time stereo matching method and apparatus for driving environments based on candidate disparities
CN115641285A (zh) Binocular stereo matching method based on dense multi-scale information fusion
CN117058456A (zh) Visual object tracking method based on a multi-phase attention mechanism
CN113763446A (zh) Stereo matching method based on guidance information
CN113313176A (zh) Point cloud analysis method based on dynamic graph convolutional neural networks
CN114677417A (zh) Optimization method for online self-rectification and self-supervised disparity estimation in stereo vision
Yi et al. An Effective Lightweight Crowd Counting Method Based on an Encoder-Decoder Network for the Internet of Video Things
CN117036699A (zh) Point cloud segmentation method based on a Transformer neural network
CN116934796A (zh) Visual object tracking method based on a Siamese residual attention aggregation network
CN110766732A (zh) Robust single-camera depth map estimation method
Zhang et al. Dyna-depthformer: Multi-frame transformer for self-supervised depth estimation in dynamic scenes
CN116051752A (zh) Binocular stereo matching algorithm based on a multi-scale feature-fusion dilated-convolution ResNet
CN115797557A (zh) Self-supervised 3D scene flow estimation method based on a graph attention network
CN111553921B (zh) Real-time semantic segmentation method based on a channel-information-sharing residual module

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21884807

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21884807

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09.10.2023)