WO2021138992A1 - Disparity estimation optimization method based on upsampling and exact rematching - Google Patents

Disparity estimation optimization method based on upsampling and exact rematching

Info

Publication number
WO2021138992A1
WO2021138992A1 (PCT/CN2020/077961; CN2020077961W)
Authority
WO
WIPO (PCT)
Prior art keywords
map
disparity
resolution
feature
propagation
Prior art date
Application number
PCT/CN2020/077961
Other languages
English (en)
French (fr)
Inventor
仲维
张宏
李豪杰
王智慧
刘日升
樊鑫
罗钟铉
李胜全
Original Assignee
大连理工大学
鹏城实验室
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 大连理工大学, 鹏城实验室 filed Critical 大连理工大学
Priority to US17/604,588 priority Critical patent/US12008779B2/en
Publication of WO2021138992A1 publication Critical patent/WO2021138992A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4076Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution using the original low-resolution images to iteratively correct the high-resolution images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/12Details of acquisition arrangements; Constructional details thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the invention belongs to the field of image processing and computer vision, and relates to a coarse-to-fine binocular disparity estimation method based on supervised learning, and in particular to a disparity estimation optimization method based on upsampling and precise rematching.
  • Binocular depth estimation uses two calibrated left and right views, and obtains the corresponding disparity value according to the relative position of each pixel between different views. According to the camera imaging model, the disparity is restored to the depth information of the image.
  • the existing binocular depth estimation methods are mainly divided into traditional methods and deep learning methods.
  • the traditional method is divided into local algorithm and global algorithm.
  • the local algorithm uses the similarity of neighboring pixels in the window to match.
  • the global method constructs an energy function, including the matching cost of the pixel itself and the constraint relationship between different pixels, and obtains the final disparity map by minimizing the energy function.
  • traditional methods are time-consuming and not very accurate; in particular, the mismatch error is high in textureless and occluded areas.
  • the deep learning method is to learn the disparity map of the left and right views through the neural network end-to-end.
  • the basic framework mainly includes feature extraction, construction of cost maps, disparity aggregation, and optimization of disparity.
  • the left and right views are input into the network, left and right feature maps are obtained through the feature extraction network, and matching is then performed under different disparities to obtain a low-resolution cost map.
  • the aggregation and optimization part takes one of two forms: the first optimizes the low-resolution cost map, gradually restores it to the original resolution, and finally computes the disparity map with soft argmin; the second first obtains a low-resolution disparity map from the low-resolution cost map and then gradually upsamples and optimizes this disparity map to obtain the final original-resolution disparity map.
  • the present invention proposes a coarse-to-fine exact rematching method: geometric constraints are reintroduced, and the disparity map and the left and right images obtained at low resolution are used to perform a matching within a small disparity range, which narrows the range of the disparity map and improves the generalization ability of the network.
  • the present invention also proposes a propagation-based up-sampling method: the left feature map of the corresponding resolution is used to learn the relative weights between each pixel and the pixels in its neighborhood, and the confidence obtained from left-right reconstruction consistency and these weights are propagated on the up-sampled disparity map, so that context information is better exploited during up-sampling of the disparity map and the erroneous filling caused by interpolation up-sampling is reduced.
  • the present invention aims to overcome the shortcomings of existing deep learning methods and provides a disparity estimation optimization method based on upsampling and exact rematching: exact rematching is performed over a small range in the refinement part of the network, and previous up-sampling methods for disparity or cost maps, such as nearest-neighbor and bilinear interpolation, are improved by learning a propagation-based up-sampling with the network, so that the disparity map can recover accurate disparity values during up-sampling.
  • the specific plan includes the following steps:
  • the first step is to extract distinguishable features
  • the second step is to match the initial cost and optimize the cost map to obtain a low-resolution initial disparity map
  • the low-resolution initial disparity map is propagated up-sampling and accurate re-matching methods to obtain a higher-resolution disparity map, and this process is repeated until the original resolution is restored;
  • the lowest-resolution initial disparity map D n+1 is first interpolated and up-sampled to obtain a coarsely matched disparity map D′ n .
  • the disparity map obtained at this time is only obtained by numerical interpolation, and does not refer to any structural information of the original image.
  • using the original right view I_r and the coarsely matched disparity map D′_n, the left view is reconstructed and denoted Î_l; the error between the reconstructed left view Î_l and the real left view I_l is then computed to obtain the confidence map M_c:
  • f c (.) represents the operation of copying and shifting to change the size
  • k represents the size of the neighborhood window
  • s represents the dilation rate of the sampling window
  • the receptive field is (2s+1)², and each position obtains a k*k confidence vector representing the confidence of the pixels in the k*k neighborhood window around that pixel
  • the module takes the left feature map of the corresponding resolution as input and learns a weight vector at each position, which represents the relative relationship between the neighboring pixels and the center pixel; the greater the weight, the greater the influence of that neighboring pixel on the center pixel; this weight is denoted W_relative
  • k represents the size of the neighborhood window
  • ⁇ relative represents the relative relationship network model
  • the present invention proposes a coarse-to-fine exact rematching method that reintroduces geometric constraints in the disparity refinement process: the disparity map obtained at low resolution and the left and right images are used to perform a matching within a small disparity range, narrowing the range of the disparity map and improving the generalization ability of the network.
  • the present invention proposes a method for propagating up-sampling using context relationships.
  • up-sampling is performed by combining context relationships with the confidence of the current coarse disparity, which solves the edge-destruction problem of existing up-sampling methods and yields a higher-resolution disparity map with finer edges.
  • Figure 1 is the overall flow chart of the program
  • Figure 2 is a flow chart of the propagation up-sampling module
  • Figure 3 is a flow chart of exact rematching.
  • the present invention is a disparity refinement strategy based on a coarse-to-fine disparity estimation framework; it performs end-to-end disparity map prediction on the input left and right views and, without introducing additional tasks, uses the propagation up-sampling method and the exact rematching method proposed in this application to predict an accurate disparity map.
  • the specific implementation plan is as follows:
  • the first step is to extract distinguishable features
  • the second step is to match the initial cost and optimize the cost map to obtain a low-resolution initial disparity
  • <> represents element-wise subtraction of the corresponding positions of the feature vectors
  • d is equal to {0, 1, 2, ..., D_max}
  • D_max is the maximum disparity during matching, so the size of the final cost map is [H/8, W/8, D_max/8, f].
  • the hourglass network is composed of convolutional layers with different strides, and the cost map output by the hourglass network is regressed through the soft argmin layer into a rough 1/8-resolution disparity map, denoted D_3.
  • the low-resolution initial disparity enters the optimization network to obtain high-resolution fine disparity
  • the obtained disparity map at the lowest resolution passes through the propagation up-sampling module and the precise re-matching module to obtain a higher resolution disparity map, and this process is repeated until the original resolution is restored.
  • D_3 is first up-sampled by interpolation to obtain the coarsely matched disparity map D′_2; the disparity map obtained at this point comes purely from numerical interpolation, does not refer to any structural information of the original image, and cannot recover the information lost by down-sampling, so the error rate of D′_2 is relatively high; the disparity map D′_2 therefore needs to be optimized with the propagation strategy.
  • using the original right view I_r and the up-sampled disparity map D′_2, the left view is reconstructed and denoted Î_l, where f_w(.) is the warping function; the error between the reconstructed left view Î_l and the real left view I_l is then computed to obtain the confidence map M_c:
  • normalization(.) is a normalization operation that normalizes the difference to (0,1); the probability value at each point of the confidence map M_c represents the credibility of the disparity value of that pixel; the confidence map is copied and shifted into a confidence map group of size [H/8, W/8, k*k], denoted M_cg
  • f c (.) represents the operation of copying and shifting to change the size
  • k represents the size of the neighborhood window
  • s represents the dilation rate of the sampling window.
  • the receptive field is (2s+1) 2
  • a k*k confidence vector can be obtained, which represents the confidence of the pixel in the k*k neighborhood window around the pixel.
  • a relative relationship network module is proposed.
  • the module inputs the left feature map of the corresponding resolution.
  • a weight vector is learned at each position, which represents the relative relationship between the neighboring pixels and the center pixel; the greater the weight, the greater the influence of that neighboring pixel on the center pixel. For example, the relative relationship between a pixel inside an object and its neighboring pixels is relatively strong, so the weight is larger; conversely, if a neighboring pixel lies on an edge, its weight for that pixel is smaller.
  • each different picture can learn different weights, so that the disparity value of the pixel can be updated according to the different weights of surrounding pixels during propagation.
  • this differs from a conventional neural network, in which convolution kernels with the same weights are used to optimize the disparity map for all inputs.
  • this module is composed of three convolutional layers with dilation rates of {1, 2, 3}, respectively.
  • the left feature map is input and weights of size [H/8, W/8, k*k] are output, denoted W_relative
  • k represents the size of the neighborhood window
  • ⁇ relative represents the relative relationship network model
  • <,> represents the dot product operation
  • f c (.) represents the copy translation resize operation
  • the propagation up-sampling module outputs, from the low-resolution D_{n+1}, a propagation-based higher-resolution disparity map D_n^p
  • the exact rematching module performs re-matching over a small range on D_n^p.
  • N represents the number of image pixels
  • ||.||_2 represents the L2 distance; the final loss function is composed of the sum of the two loss functions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Image Processing (AREA)

Abstract

The present invention discloses a disparity estimation optimization method based on upsampling and exact rematching: exact rematching is performed over a small range in the refinement part of the network, and the up-sampling methods previously applied to disparity or cost maps, such as nearest-neighbor and bilinear interpolation, are improved by learning a propagation-based up-sampling with the network itself, so that the disparity map can recover accurate disparity values during up-sampling.

Description

Disparity estimation optimization method based on upsampling and exact rematching
Technical Field
The present invention belongs to the fields of image processing and computer vision, relates to a coarse-to-fine binocular disparity estimation method based on supervised learning, and in particular to a disparity estimation optimization method based on upsampling and exact rematching.
Background Art
Binocular depth estimation uses two calibrated left and right views and obtains the disparity value of each pixel from its relative position between the two views; according to the camera imaging model, the disparity is then converted into the depth information of the image. Existing binocular depth estimation methods fall mainly into traditional methods and deep learning methods.
Traditional methods are divided into local and global algorithms. Local algorithms match using the similarity of neighboring pixels within a window. Global methods construct an energy function, comprising the matching cost of each pixel itself and the constraint relations between different pixels, and obtain the final disparity map by minimizing this energy function. Traditional methods are time-consuming and not very accurate; in particular, the mismatch error is high in textureless and occluded regions.
Deep learning methods learn the disparity map of the left and right views end to end through a neural network; the basic framework mainly comprises feature extraction, cost map construction, disparity aggregation and disparity refinement. The left and right views are fed into the network, left and right feature maps are obtained through a feature extraction network, and matching is then performed under different disparities to obtain a low-resolution cost map. The aggregation and refinement part takes one of two forms: the first optimizes the low-resolution cost map, gradually restores it to the original resolution and finally computes the disparity map with soft argmin; the second first obtains a low-resolution disparity map from the low-resolution cost map and then gradually up-samples and refines this disparity map to obtain the final original-resolution disparity map. To meet the computational and speed requirements of the network, matching usually has to be performed on low-resolution feature maps, which causes small objects to be lost during down-sampling. Subsequent refinement modules do not consider the loss of small objects; they regenerate them from scratch through supervision, without introducing geometric priors, which leads to missing details and poor generalization of the network. Current up-sampling methods are mostly based on nearest-neighbor, bilinear or trilinear interpolation; such interpolation does not follow the distribution of the disparity map, produces inconsistent disparities on objects facing the imaging plane, and destroys the disparity discontinuities at object edges.
The present invention proposes a coarse-to-fine exact rematching method that reintroduces geometric constraints during disparity refinement: the disparity map and the left and right images obtained at low resolution are used to perform a further matching within a small disparity range, which narrows the range of the disparity map and improves the generalization ability of the network. The present invention also proposes a propagation-based up-sampling method: the left feature map of the corresponding resolution is used to learn relative-relationship weights between each pixel and the pixels in its neighborhood, and the confidence obtained from left-right reconstruction consistency and these weights are propagated on the up-sampled disparity map, so that context information is better exploited during up-sampling of the disparity map and the erroneous filling caused by interpolation up-sampling is reduced.
Summary of the Invention
The present invention aims to overcome the shortcomings of existing deep learning methods and provides a disparity estimation optimization method based on upsampling and exact rematching: exact rematching is performed over a small range in the refinement part of the network, and previous up-sampling methods for disparity or cost maps, such as nearest-neighbor and bilinear interpolation, are improved by learning a propagation-based up-sampling with the network, so that the disparity map can recover accurate disparity values during up-sampling.
The specific scheme comprises the following steps.
A disparity estimation optimization method based on upsampling and exact rematching, characterized by comprising the following steps:
Step 1: extract discriminative features;
Step 2: perform initial cost matching and cost map optimization to obtain a low-resolution initial disparity map;
Step 3: pass the low-resolution initial disparity map through the propagation upsampling method and the exact rematching method to obtain a disparity map of the next higher resolution, and repeat this process until the original resolution is restored;
3.1 The propagation upsampling method
The lowest-resolution initial disparity map D_{n+1} is first upsampled by interpolation to obtain a coarsely matched disparity map D′_n. The disparity map obtained at this point comes purely from numerical interpolation and does not refer to any structural information of the original image. Using the original right view I_r and the coarsely matched disparity map D′_n, the left view is reconstructed and denoted Î_l. The error between the reconstructed left view Î_l and the real left view I_l is then computed to obtain the confidence map M_c (formula (2)), where normalization(.) is a normalization operation that normalizes the difference to (0, 1); the probability value at each point of the confidence map M_c represents the credibility of the disparity value of that pixel. The confidence map is copied and shifted into a confidence map group, denoted M_cg:
M_cg = f_c(M_c, k, s)    (3)
where f_c(.) denotes the copy-and-shift resize operation, k denotes the neighborhood window size, and s denotes the dilation rate of the sampling window; the receptive field is (2s+1)², and each position obtains a k*k confidence vector representing the confidence of the pixels in the k*k neighborhood window around that pixel.
A relative relationship network module takes the left feature map of the corresponding resolution as input and learns a weight vector at every position, representing the relative relationship between the neighboring pixels and the center pixel; the larger the weight, the greater the influence of that neighboring pixel on the center pixel. This weight is denoted W_relative:
W_relative = θ_relative(F_l, k)    (4)
where k denotes the neighborhood window size and θ_relative denotes the relative relationship network model.
Propagation is performed with the coarsely matched disparity map D′_n, the confidence map group M_cg and the relative relationship weights W_relative to obtain the propagated disparity map:
D_n^p = <softmax(W_relative * M_cg), f_c(D′_n, k, s)>    (5)
where D_n^p denotes the propagated disparity map, <,> denotes the dot product operation, f_c(.) denotes the copy-and-shift resize operation, and softmax(W_relative * M_cg) represents the support of the surrounding pixels for the center pixel during propagation, obtained by multiplying the confidence of the surrounding pixels with the relative relationship weights.
This propagation process is then repeated with different window dilation rates, so that the disparity map is propagated and refined over different receptive fields. This completes the propagation upsampling process from D_{n+1} to D_n^p.
3.2 The exact rematching method
First, according to D_n^p, the right feature map of the corresponding resolution in the feature list is used to reconstruct the left feature map, denoted F̂_l. A rematching is performed between the reconstructed left feature map F̂_l and the original left feature map F_l within the small disparity range d = [-d_0, d_0], yielding a cost map; the cost map is then optimized by an hourglass network and the disparity is regressed, giving an offset map Δ that represents the offset with respect to D_n^p. Adding the two gives the final disparity map D_n of the refinement network:
D_n = D_n^p + Δ    (6)
The processes of 3.1 and 3.2 are repeated iteratively until the original resolution is restored, yielding the final high-accuracy disparity map.
The beneficial effects of the present invention are:
1) The present invention proposes a coarse-to-fine exact rematching method that reintroduces geometric constraints during disparity refinement: the disparity map and the left and right images obtained at low resolution are used to perform a further matching within a small disparity range, which narrows the range of the disparity map and improves the generalization ability of the network.
2) The present invention proposes a method of propagation upsampling using context relationships: during disparity refinement, upsampling is performed by combining context relationships with the confidence of the current coarse disparity, which solves the edge-destruction problem of existing upsampling methods and yields a higher-resolution disparity map with finer edges.
Brief Description of the Drawings
Figure 1 is the overall flow chart of the scheme;
Figure 2 is the flow chart of the propagation upsampling module;
Figure 3 is the flow chart of exact rematching.
Detailed Description of the Embodiments
The present invention is a disparity refinement strategy based on a coarse-to-fine disparity estimation framework. It performs end-to-end disparity map prediction on the input left and right views and, without introducing additional tasks, uses the propagation upsampling method and the exact rematching method proposed in this application to predict an accurate disparity map. The specific embodiment is as follows.
The overall network flow of the scheme is shown in Figure 1, and the specific operations are as follows:
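Before the step-by-step description, the following Python/PyTorch-style sketch summarizes how the components interact. It is a minimal illustration only, not the patented implementation; the module names (feature_net, init_matcher, propagate_upsample, rematch) and their signatures are hypothetical placeholders for the steps detailed below.

```python
def estimate_disparity(left, right, feature_net, init_matcher,
                       propagate_upsample, rematch):
    """Coarse-to-fine sketch of the overall scheme (hypothetical module names;
    signatures are illustrative only)."""
    feats_l = feature_net(left)    # step 1: feature list [F0, F1, F2, F3], F3 = 1/8 resolution
    feats_r = feature_net(right)

    disp = init_matcher(feats_l[3], feats_r[3])   # step 2: initial disparity D3 at 1/8 resolution

    for n in (2, 1, 0):                           # step 3: refine at 1/4, 1/2, full resolution
        disp_p = propagate_upsample(disp, left, right, feats_l[n])   # section 3.1
        disp = rematch(disp_p, feats_l[n], feats_r[n])               # section 3.2
    return disp
```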
Step 1: extract discriminative features.
Features are extracted from the two views (left and right) fed into the network. Compared with matching on the raw gray values of the original images, matching with feature vectors copes better with changes in illumination and appearance, and the extracted feature vectors describe the image information in more detail and more comprehensively, which helps matching. A simple CNN is used for feature extraction; it consists of four cascaded parts (each part contains three different convolutional layers for feature extraction). The four sub-parts produce left and right feature maps F_0 to F_3 at different resolutions (the subscript denotes the downsampling factor; for example, F_3 denotes the 1/8-resolution feature map), and each feature vector f has 32 dimensions. The four feature maps of different resolutions are stored in a feature list, which serves as the input to the subsequent refinement network; matching is then performed on the smallest-resolution feature map, i.e. F_3 at 1/8 resolution.
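A minimal PyTorch sketch of such a four-stage feature extractor is given below. The exact layer widths, strides and normalization layers are not specified in the text and are assumed here.

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Four cascaded stages, each with three 3x3 conv layers; stage i outputs F_i
    at 1/2^i resolution with 32-channel features (layer details assumed)."""
    def __init__(self, fdim=32):
        super().__init__()
        def stage(cin, cout, stride):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, stride=stride, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))
        self.stages = nn.ModuleList([
            stage(3,    fdim, 1),   # F0: full resolution
            stage(fdim, fdim, 2),   # F1: 1/2 resolution
            stage(fdim, fdim, 2),   # F2: 1/4 resolution
            stage(fdim, fdim, 2)])  # F3: 1/8 resolution

    def forward(self, img):
        feats, x = [], img
        for s in self.stages:
            x = s(x)
            feats.append(x)
        return feats                # feature list [F0, F1, F2, F3]
```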
Step 2: initial cost matching and cost map optimization to obtain the low-resolution initial disparity.
For the 1/8-resolution left and right feature maps, let f_l(x, y) and f_r(x, y) denote the feature vectors at a point of the image and C denote the cost map; the cost map is formed as follows (formula (1)):
C(x, y, d) = <f_l(x, y) - f_r(x - d, y)>    (1)
where <> denotes element-wise subtraction of the corresponding positions of the feature vectors, d ∈ {0, 1, 2, ..., D_max}, and D_max is the maximum disparity during matching, so the size of the resulting cost map is [H/8, W/8, D_max/8, f].
After the 1/8-resolution cost map is obtained, it is optimized with an hourglass network; the hourglass network consists of convolutional layers with different strides, and the cost map output by the hourglass network is passed through a soft argmin layer to regress a rough 1/8-resolution disparity map, denoted D_3.
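The following sketch illustrates the difference-based cost map of formula (1) and a soft-argmin regression of D_3. The soft-argmin form (softmax over negated costs) and the zero-filling at image borders are common choices assumed here; the hourglass aggregation network itself is left abstract.

```python
import torch
import torch.nn.functional as F

def build_cost_volume(feat_l, feat_r, max_disp):
    """Formula (1): C(x, y, d) = f_l(x, y) - f_r(x - d, y).
    feat_l, feat_r: [B, C, H/8, W/8]; returns [B, C, max_disp, H/8, W/8].
    Positions where x - d falls outside the image are left at zero (assumption)."""
    B, C, H, W = feat_l.shape
    cost = feat_l.new_zeros(B, C, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            cost[:, :, d] = feat_l - feat_r
        else:
            cost[:, :, d, :, d:] = feat_l[:, :, :, d:] - feat_r[:, :, :, :-d]
    return cost

def soft_argmin(cost):
    """Regress a disparity map from an aggregated cost volume [B, max_disp, H, W]
    output by the hourglass network (common soft-argmin form, an assumption)."""
    prob = F.softmax(-cost, dim=1)
    disps = torch.arange(cost.size(1), device=cost.device, dtype=cost.dtype)
    return (prob * disps.view(1, -1, 1, 1)).sum(dim=1, keepdim=True)   # [B, 1, H, W]
```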
Step 3: the low-resolution initial disparity enters the refinement network to obtain the high-resolution fine disparity.
The disparity map obtained at the lowest resolution passes through the propagation upsampling module and the exact rematching module to obtain a disparity map of the next higher resolution, and this process is repeated until the original resolution is restored.
The specific flow is shown in Figures 2 and 3.
The specific steps are as follows (taking the single iteration from D_3 to D_2 as an example):
3.1 Propagation upsampling method
D_3 is first upsampled by interpolation to obtain the coarsely matched disparity map D′_2. The disparity map obtained at this point comes purely from numerical interpolation, does not refer to any structural information of the original images, and cannot recover the information lost by downsampling, so the error rate of D′_2 is relatively high. The disparity map D′_2 therefore needs to be refined with the propagation strategy. Using the original right view I_r and the upsampled disparity map D′_2, the left view is reconstructed and denoted Î_l, where f_w(.) is the warping function. The error between the reconstructed left view Î_l and the real left view I_l is then computed to obtain the confidence map M_c (formula (2)): normalization(.) is a normalization operation that normalizes the difference to (0, 1), and the probability value at each point of the confidence map M_c represents the credibility of the disparity value of that pixel. The confidence map is copied and shifted into a confidence map group of size [H/8, W/8, k*k], denoted M_cg:
M_cg = f_c(M_c, k, s)    (3)
where f_c(.) denotes the copy-and-shift resize operation, k denotes the neighborhood window size, and s denotes the dilation rate of the sampling window (the receptive field is (2s+1)²). Each position obtains a k*k confidence vector, representing the confidence of the pixels in the k*k neighborhood window around that pixel.
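A sketch of the warping function f_w, the confidence map M_c and the copy-and-shift operation f_c follows. Formula (2) is not reproduced in the text, so the mapping of the reconstruction error into a confidence value is an assumption (an exponential of the photometric error is used here as one plausible normalization into (0, 1]).

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(x_r, disp):
    """f_w: reconstruct the left image (or feature map) by sampling the right one
    at x - d(x, y). x_r: [B, C, H, W]; disp: [B, 1, H, W], positive disparities."""
    B, _, H, W = x_r.shape
    ys, xs = torch.meshgrid(torch.arange(H, device=x_r.device),
                            torch.arange(W, device=x_r.device), indexing='ij')
    xs = xs.unsqueeze(0).float() - disp.squeeze(1)              # sample at x - d
    ys = ys.unsqueeze(0).float().expand_as(xs)
    grid = torch.stack((2.0 * xs / (W - 1) - 1.0,               # normalize to [-1, 1]
                        2.0 * ys / (H - 1) - 1.0), dim=-1)
    return F.grid_sample(x_r, grid, mode='bilinear',
                         padding_mode='border', align_corners=True)

def confidence_map(img_l, img_r, disp_coarse):
    """M_c: credibility of the coarse disparity from the left-right reconstruction
    error; the exact normalization of formula (2) is assumed here."""
    recon_l = warp_right_to_left(img_r, disp_coarse)            # reconstructed left view
    err = (recon_l - img_l).abs().mean(dim=1, keepdim=True)     # photometric error
    return torch.exp(-err)                                      # maps error into (0, 1]

def copy_shift(x, k, s):
    """f_c: gather the k*k neighbourhood (dilation s, k odd) of every pixel of a
    single-channel map x: [B, 1, H, W] -> [B, k*k, H, W]."""
    B, _, H, W = x.shape
    pad = s * (k // 2)
    cols = F.unfold(x, kernel_size=k, dilation=s, padding=pad)  # [B, k*k, H*W]
    return cols.view(B, k * k, H, W)
```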
A relative relationship network module is proposed. The module takes the left feature map of the corresponding resolution as input and can learn a weight vector at every position, representing the relative relationship between the neighboring pixels and the center pixel: the larger the weight, the greater the influence of that neighboring pixel on the center pixel. For example, the relative relationship between a pixel inside an object and its neighboring pixels is relatively strong, so the weights are larger; conversely, if a neighboring pixel lies on an edge, its weight for that pixel is smaller. Through this module, different weights can be learned for every different image, so that during propagation the disparity value of a pixel is updated according to the different weights of the surrounding pixels, instead of, as in a conventional neural network, optimizing the disparity map with convolution kernels whose weights are the same for all inputs.
The module consists of three convolutional layers with dilation rates of {1, 2, 3}; it takes the left feature map as input and outputs weights of size [H/8, W/8, k*k], denoted W_relative:
W_relative = θ_relative(F_l, k)    (4)
where k denotes the neighborhood window size and θ_relative denotes the relative relationship network model.
Propagation is then performed with the coarse disparity map D′_2 obtained by upsampling in the previous step, the confidence map group M_cg and the relative relationship weights W_relative, giving the refined disparity map D_2^p (p: propagate). The propagation is computed as:
D_2^p = <softmax(W_relative * M_cg), f_c(D′_2, k, s)>    (5)
where D_2^p denotes the propagated disparity map, <,> denotes the dot product operation, f_c(.) denotes the copy-and-shift resize operation, and softmax(W_relative * M_cg) represents the support of the surrounding pixels for the center pixel during propagation, obtained by multiplying the confidence of the surrounding pixels with the relative relationship weights. This propagation is then repeated three times with window dilation rates s = 1, 2, 3, so that the disparity map is propagated and refined over different receptive fields. This completes the propagation upsampling process from D_{n+1} to D_n^p.
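The relative relationship network and the propagation of formula (5) can be sketched as follows, reusing confidence_map and copy_shift from the previous sketch. The intermediate channel width of the dilated convolutions, the resizing of the images to the disparity resolution, and the doubling of disparity values after 2x upsampling are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class RelativeRelationNet(nn.Module):
    """theta_relative of formula (4): three conv layers with dilation rates 1, 2, 3
    mapping the left feature map to a k*k relative-weight vector per pixel."""
    def __init__(self, fdim=32, k=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(fdim, fdim, 3, padding=1, dilation=1), nn.ReLU(inplace=True),
            nn.Conv2d(fdim, fdim, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(fdim, k * k, 3, padding=3, dilation=3))

    def forward(self, feat_l):
        return self.net(feat_l)                       # W_relative: [B, k*k, H, W]

def propagate_upsample(disp_low, img_l, img_r, feat_l, relative_net, k=3):
    """Propagation upsampling of section 3.1 (a sketch): upsample D_{n+1} to D'_n,
    build M_c, then repeat the propagation of formula (5) with s = 1, 2, 3.
    img_l/img_r are assumed to be resized to the resolution of the disparity map."""
    disp = F.interpolate(disp_low, scale_factor=2, mode='bilinear',
                         align_corners=False) * 2.0   # D'_n (value scaling assumed)
    m_c = confidence_map(img_l, img_r, disp)          # formula (2), assumed form
    w_rel = relative_net(feat_l)                      # formula (4)
    for s in (1, 2, 3):                               # propagate over different receptive fields
        support = F.softmax(w_rel * copy_shift(m_c, k, s), dim=1)
        disp = (support * copy_shift(disp, k, s)).sum(dim=1, keepdim=True)  # formula (5)
    return disp                                       # D_n^p
```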
3.2 Exact rematching method
The propagation upsampling module outputs, from the low-resolution D_{n+1}, the propagation-based higher-resolution disparity map D_n^p. The exact rematching module then performs rematching over a small range on D_n^p. First, according to D_n^p, the right feature map of the corresponding resolution in the feature list is used to reconstruct the left feature map, denoted F̂_l. A rematching is performed between the reconstructed left feature map F̂_l and the original left feature map F_l within the small disparity range d = [-2, 2], yielding a cost map of size [H/4, W/4, 5, f] (taking the 1/4-resolution level as an example). The cost map is then optimized by an hourglass network and the disparity is regressed, which gives an offset map Δ representing the offset with respect to D_n^p; adding the two gives the final disparity map D_n of the refinement network:
D_n = D_n^p + Δ    (6)
The processes of 3.1 and 3.2 are repeated iteratively until the original resolution is restored, yielding the final high-accuracy disparity map.
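A sketch of the exact rematching step follows, reusing warp_right_to_left from the earlier sketch. The hourglass aggregation network is left abstract, the channel dimension of the cost map is collapsed for brevity, and regressing the offset Δ with a soft argmin over the residual range [-d0, d0] is an assumption consistent with the description.

```python
import torch
import torch.nn.functional as F

def shift_x(x, d):
    """Shift a [B, C, H, W] tensor so that output(x) = input(x - d),
    with border replication at the image boundary."""
    if d == 0:
        return x
    if d > 0:
        return F.pad(x, (d, 0, 0, 0), mode='replicate')[:, :, :, :-d]
    return F.pad(x, (0, -d, 0, 0), mode='replicate')[:, :, :, -d:]

def exact_rematch(disp_p, feat_l, feat_r, hourglass, d0=2):
    """Section 3.2 (a sketch): reconstruct the left feature map from the right one
    with D_n^p, rematch within d = [-d0, d0], aggregate the cost with an hourglass
    network (abstract) and regress the offset map Delta."""
    feat_l_rec = warp_right_to_left(feat_r, disp_p)   # reconstructed left feature map

    costs = []
    for d in range(-d0, d0 + 1):                      # small residual disparity range
        diff = feat_l - shift_x(feat_l_rec, d)        # difference-based cost, as in formula (1)
        costs.append(diff.abs().mean(dim=1))          # channel dim collapsed for brevity
    cost = torch.stack(costs, dim=1)                  # [B, 2*d0+1, H, W]

    prob = F.softmax(-hourglass(cost), dim=1)         # aggregate, then soft argmin
    offsets = torch.arange(-d0, d0 + 1, device=cost.device,
                           dtype=cost.dtype).view(1, -1, 1, 1)
    delta = (prob * offsets).sum(dim=1, keepdim=True) # offset map Delta
    return disp_p + delta                             # formula (6): D_n = D_n^p + Delta
```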
4. Loss functions
Two loss functions are used when training the network of this scheme: a smoothness loss is applied to the disparity map D^p output by the propagation upsampling module, and the output of the exact rematching module is supervised with the disparity labels downsampled to the corresponding resolution; the two terms are given by formulas (7) and (8). In formula (7), N denotes the number of image pixels, and the formula involves the gradient of the disparity map and the gradient of the edge map of the original image. In formula (8), the supervision signal is the disparity label of the corresponding resolution, and ||.||_2 denotes the L2 distance. The final loss function is the sum of the two loss functions.
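Since formulas (7) and (8) are not reproduced in the text, the following sketch uses a common edge-aware smoothness term and a mean per-pixel L2 supervision term as plausible readings of the description; they are assumptions, not the exact patented formulas.

```python
import torch

def smoothness_loss(disp_p, img_l):
    """Formula (7), assumed form: penalize disparity gradients, down-weighted
    where the original image has strong edges."""
    dx_d = (disp_p[:, :, :, 1:] - disp_p[:, :, :, :-1]).abs()
    dy_d = (disp_p[:, :, 1:, :] - disp_p[:, :, :-1, :]).abs()
    dx_i = (img_l[:, :, :, 1:] - img_l[:, :, :, :-1]).abs().mean(dim=1, keepdim=True)
    dy_i = (img_l[:, :, 1:, :] - img_l[:, :, :-1, :]).abs().mean(dim=1, keepdim=True)
    return (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()

def rematch_loss(disp, disp_gt):
    """Formula (8), assumed form: mean per-pixel L2 distance between the rematched
    disparity and the label downsampled to the same resolution ((1/N) * sum ||.||_2)."""
    return (disp - disp_gt).pow(2).sum(dim=1).sqrt().mean()

# The final training loss is the sum of the two terms (applied at each scale of the
# coarse-to-fine loop), e.g.:
# loss = smoothness_loss(disp_p, img_l) + rematch_loss(disp_n, disp_gt_n)
```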

Claims (3)

  1. A disparity estimation optimization method based on upsampling and exact rematching, characterized by comprising the following steps:
    Step 1: extracting discriminative features;
    Step 2: performing initial cost matching and cost map optimization to obtain a low-resolution initial disparity map;
    Step 3: passing the low-resolution initial disparity map through a propagation upsampling method and an exact rematching method to obtain a disparity map of the next higher resolution, and repeating this process until the original resolution is restored;
    3.1 the propagation upsampling method
    the lowest-resolution initial disparity map D_{n+1} is first upsampled by interpolation to obtain a coarsely matched disparity map D′_n; the disparity map obtained at this point comes purely from numerical interpolation and does not refer to any structural information of the original image; using the original right view I_r and the coarsely matched disparity map D′_n, the left view is reconstructed and denoted Î_l;
    the error between the reconstructed left view Î_l and the real left view I_l is then computed to obtain the confidence map M_c (formula (2)), where normalization(.) is a normalization operation that normalizes the difference to (0, 1) and the probability value at each point of the confidence map M_c represents the credibility of the disparity value of that pixel; the confidence map is copied and shifted into a confidence map group, denoted M_cg:
    M_cg = f_c(M_c, k, s)  (3)
    where f_c(.) denotes the copy-and-shift resize operation, k denotes the neighborhood window size, and s denotes the dilation rate of the sampling window; the receptive field is (2s+1)², and each position obtains a k*k confidence vector representing the confidence of the pixels in the k*k neighborhood window around that pixel;
    through a relative relationship network module, which takes the left feature map of the corresponding resolution as input, a weight vector is learned at every position, representing the relative relationship between the neighboring pixels and the center pixel; the greater the weight, the greater the influence of that neighboring pixel on the center pixel; this weight is denoted W_relative:
    W_relative = θ_relative(F_l, k)  (4)
    where k denotes the neighborhood window size and θ_relative denotes the relative relationship network model;
    propagation is performed with the coarsely matched disparity map D′_n, the confidence map group M_cg and the relative relationship weights W_relative to obtain the propagated disparity map:
    D_n^p = <softmax(W_relative * M_cg), f_c(D′_n, k, s)>  (5)
    where D_n^p denotes the propagated disparity map, <,> denotes the dot product operation, f_c(.) denotes the copy-and-shift resize operation, and softmax(W_relative * M_cg) represents the support of the surrounding pixels for the center pixel during propagation, obtained by multiplying the confidence of the surrounding pixels with the relative relationship weights;
    this propagation process is then repeated with different window dilation rates, so that the disparity map is propagated and refined over different receptive fields; this completes the propagation upsampling process from D_{n+1} to D_n^p;
    3.2 the exact rematching method
    first, according to D_n^p, the right feature map of the corresponding resolution in the feature list is used to reconstruct the left feature map, denoted F̂_l; a rematching is performed between the reconstructed left feature map F̂_l and the original left feature map F_l within the small disparity range d = [-d_0, d_0] to obtain a cost map; the cost map is then optimized by an hourglass network and the disparity is regressed, giving an offset map Δ that represents the offset with respect to D_n^p; adding the two gives the final disparity map D_n of the refinement network:
    D_n = D_n^p + Δ  (6)
    the processes of 3.1 and 3.2 are repeated iteratively until the original resolution is restored, yielding the final high-accuracy disparity map.
  2. The disparity estimation optimization method based on upsampling and exact rematching according to claim 1, characterized in that, in step 1, features are extracted from the left and right views fed into the network, the feature maps of different resolutions are stored in a feature list, and matching is then performed on the feature map of the smallest resolution.
  3. The disparity estimation optimization method based on upsampling and exact rematching according to claim 1, characterized in that, in step 2, the lowest-resolution left and right feature maps are used, f_l(x, y) and f_r(x, y) denote the feature vectors at a point of the image, C denotes the cost map, and the cost map is formed as follows:
    C(x, y, d) = <f_l(x, y) - f_r(x - d, y)>  (1)
    where <> denotes element-wise subtraction of the corresponding positions of the feature vectors, d ∈ {0, 1, 2, ..., D_max}, and D_max is the maximum disparity during matching; after the lowest-resolution cost map is obtained, it is optimized with an hourglass network; the hourglass network consists of convolutional layers with different strides, and the cost map output by the hourglass network is passed through a soft argmin layer to regress a lowest-resolution initial disparity map, denoted D_{n+1}.
PCT/CN2020/077961 2020-01-10 2020-03-05 Disparity estimation optimization method based on upsampling and exact rematching WO2021138992A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/604,588 US12008779B2 (en) 2020-01-10 2020-03-05 Disparity estimation optimization method based on upsampling and exact rematching

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010028308.2A CN111242999B (zh) 2020-01-10 2020-01-10 Disparity estimation optimization method based on upsampling and exact rematching
CN202010028308.2 2020-01-10

Publications (1)

Publication Number Publication Date
WO2021138992A1 true WO2021138992A1 (zh) 2021-07-15

Family

ID=70872416

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/077961 WO2021138992A1 (zh) 2020-01-10 2020-03-05 Disparity estimation optimization method based on upsampling and exact rematching

Country Status (2)

Country Link
CN (1) CN111242999B (zh)
WO (1) WO2021138992A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117333758A (zh) * 2023-12-01 2024-01-02 博创联动科技股份有限公司 Field route recognition system based on big data analysis

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270701B (zh) * 2020-10-26 2023-09-12 湖北汽车工业学院 Disparity prediction method, system and storage medium based on grouping distance network
CN113313740B (zh) * 2021-05-17 2023-01-31 北京航空航天大学 Joint learning method for disparity map and surface normal vector based on plane continuity

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110176722A1 (en) * 2010-01-05 2011-07-21 Mikhail Sizintsev System and method of processing stereo images
CN110427968A (zh) * 2019-06-28 2019-11-08 武汉大学 Binocular stereo matching method based on detail enhancement
CN110533712A (zh) * 2019-08-26 2019-12-03 北京工业大学 Binocular stereo matching method based on convolutional neural network

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930530B (zh) * 2012-09-26 2015-06-17 苏州工业职业技术学院 Stereo matching method for dual-viewpoint images
KR101989133B1 (ko) * 2018-06-12 2019-09-30 중앙대학교 산학협력단 Real-time stereo matching apparatus and matching method using common support regions in stereo images
CN109887008B (zh) * 2018-08-31 2022-09-13 河海大学常州校区 Disparity stereo matching method, apparatus and device based on forward-backward smoothing and O(1) complexity
CN109472819B (zh) * 2018-09-06 2021-12-28 杭州电子科技大学 Binocular disparity estimation method based on cascaded geometric context neural network
CN109410266A (zh) * 2018-09-18 2019-03-01 合肥工业大学 Stereo matching algorithm based on four-mode Census transform and discrete disparity search

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110176722A1 (en) * 2010-01-05 2011-07-21 Mikhail Sizintsev System and method of processing stereo images
CN110427968A (zh) * 2019-06-28 2019-11-08 武汉大学 Binocular stereo matching method based on detail enhancement
CN110533712A (zh) * 2019-08-26 2019-12-03 北京工业大学 Binocular stereo matching method based on convolutional neural network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117333758A (zh) * 2023-12-01 2024-01-02 博创联动科技股份有限公司 Field route recognition system based on big data analysis
CN117333758B (zh) * 2023-12-01 2024-02-13 博创联动科技股份有限公司 Field route recognition system based on big data analysis

Also Published As

Publication number Publication date
CN111242999B (zh) 2022-09-20
US20220198694A1 (en) 2022-06-23
CN111242999A (zh) 2020-06-05

Similar Documents

Publication Publication Date Title
WO2021138992A1 (zh) Disparity estimation optimization method based on upsampling and exact rematching
Wang et al. Mvster: Epipolar transformer for efficient multi-view stereo
CN110119780B (zh) Hyperspectral image super-resolution reconstruction method based on generative adversarial network
CN110189253B (zh) Image super-resolution reconstruction method based on improved generative adversarial network
CN110020989B (zh) Depth image super-resolution reconstruction method based on deep learning
CN112396607B (zh) Street view image semantic segmentation method with deformable convolution fusion enhancement
CN111626308B (zh) Real-time optical flow estimation method based on lightweight convolutional neural network
CN110349087B (zh) High-quality mesh generation method for RGB-D images based on adaptive convolution
CN114066729A (zh) Face super-resolution reconstruction method capable of recovering identity information
CN116934592A (zh) Image stitching method, system, device and medium based on deep learning
WO2024032331A1 (zh) Image processing method and apparatus, electronic device, and storage medium
CN115511705A (zh) Image super-resolution reconstruction method based on deformable residual convolutional neural network
CN113421186A (zh) Apparatus and method for unsupervised video super-resolution using generative adversarial networks
CN116823610A (zh) Underwater image super-resolution generation method and system based on deep learning
Chen et al. Recovering fine details for neural implicit surface reconstruction
CN115330935A (zh) Three-dimensional reconstruction method and system based on deep learning
CN113538527B (zh) Efficient and lightweight optical flow estimation method, storage medium and apparatus
CN115330601A (zh) Multi-scale cultural relic point cloud super-resolution method and system
CN111382845B (zh) Template reconstruction method based on self-attention mechanism
CN115272066A (zh) Image super-resolution reconstruction method based on progressive restoration of detail information
US12008779B2 (en) Disparity estimation optimization method based on upsampling and exact rematching
CN114140322A (zh) Attention-guided interpolation method and low-latency semantic segmentation method
Yang et al. Reference-based Image Super-Resolution by Dual-Variational AutoEncoder
CN117853340B (zh) Remote sensing video super-resolution reconstruction method based on unidirectional convolutional network and degradation modeling
CN116363382B (zh) Dual-band image feature point search and matching method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20911530

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20911530

Country of ref document: EP

Kind code of ref document: A1