WO2021138992A1 - Disparity estimation optimization method based on upsampling and exact rematching
- Publication number
- WO2021138992A1 (PCT/CN2020/077961)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- map
- disparity
- resolution
- feature
- propagation
- Prior art date
Classifications
- G06T7/55 — Depth or shape recovery from multiple images
- G06T7/593 — Depth or shape recovery from stereo images
- G06T3/4076 — Scaling of whole images or parts thereof based on super-resolution, using the original low-resolution images to iteratively correct the high-resolution images
- G06V10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06V10/12 — Details of acquisition arrangements; constructional details thereof
- G06V10/24 — Aligning, centring, orientation detection or correction of the image
- G06V10/40 — Extraction of image or video features
- G06V10/443 — Local feature extraction by matching or filtering
- G06T2207/20016 — Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; pyramid transform
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
Definitions
- The invention belongs to the field of image processing and computer vision, relates to a coarse-to-fine binocular disparity estimation method based on supervised learning, and in particular to a disparity estimation optimization method based on upsampling and exact rematching.
- Binocular depth estimation uses two calibrated left and right views and obtains the corresponding disparity value from the relative position of each pixel between the views; according to the camera imaging model, the disparity is then restored to the depth information of the image.
- Existing binocular depth estimation methods are mainly divided into traditional methods and deep learning methods.
- Traditional methods are divided into local and global algorithms.
- Local algorithms match using the similarity of neighboring pixels within a window.
- Global methods construct an energy function, comprising the matching cost of each pixel and the constraints between different pixels, and obtain the final disparity map by minimizing the energy function.
- Traditional methods run too slowly and are not very accurate; the mismatch error is especially high in textureless and occluded regions.
- Deep learning methods learn the disparity map of the left and right views end to end with a neural network.
- The basic framework mainly includes feature extraction, cost-map construction, disparity aggregation, and disparity optimization.
- The left and right views are input into the network and passed through the feature extraction network to obtain left and right feature maps, which are then matched under different disparities to obtain a low-resolution cost map.
- The aggregation and optimization part follows one of two approaches: the first optimizes the low-resolution cost map, gradually restores it to the original resolution, and finally computes the disparity map with soft argmin; the second first obtains a low-resolution disparity map from the low-resolution cost map and then gradually upsamples and optimizes it into the final full-resolution disparity map.
- The present invention proposes a coarse-to-fine exact rematching method.
- Geometric constraints are reintroduced: the disparity map and the left and right images obtained at low resolution are used to perform a matching within a small disparity range, which refines the range of the disparity map and improves the generalization ability of the network.
- The present invention also proposes a propagation-based upsampling method: the left feature map of the corresponding resolution is used to learn the relative weight of each pixel and the pixels in its neighborhood, and the confidence obtained from left-right reconstruction consistency together with these weights is propagated over the upsampled disparity map, so that the upsampling of the disparity map better incorporates context information and reduces the erroneous filling caused by interpolation.
- The present invention aims to overcome the shortcomings of existing deep learning methods and provides a disparity estimation optimization method based on upsampling and exact rematching: exact rematching is performed within a small range in the optimization part of the network, while previous upsampling methods for disparity maps or cost maps, such as nearest-neighbor and bilinear interpolation, are improved by learning a propagation-based upsampling with the network, so that the disparity map can recover accurate disparity values during upsampling.
- The specific scheme includes the following steps:
- Step 1: extract discriminative features.
- Step 2: perform initial cost matching and cost-map optimization to obtain a low-resolution initial disparity map.
- Step 3: pass the low-resolution initial disparity map through the propagation upsampling method and the exact rematching method to obtain a disparity map one resolution level higher, and repeat this process until the original resolution is restored.
- The lowest-resolution initial disparity map D_{n+1} is first upsampled by interpolation to obtain a coarsely matched disparity map D′_n.
- The disparity map obtained at this point comes purely from numerical interpolation and does not reference any structural information of the original images.
- The left view is reconstructed from the original right view I_r according to the coarsely matched disparity map D′_n; the error between the reconstructed left view and the real left view I_l is then computed to obtain the confidence map M_c.
- f_c(.) represents the copy-and-shift resizing operation.
- k represents the size of the neighborhood window.
- s represents the dilation rate of the sampling window.
- The receptive field is (2s+1)², and each position obtains a k×k confidence vector representing the confidence of the pixels in the k×k neighborhood window around that pixel.
- The relative-relation module takes as input the left feature map of the corresponding resolution and learns a weight vector at each position, representing the relative relation between the neighboring pixels and the center pixel; the larger the weight, the larger the influence of that neighboring pixel on the center pixel. The weight is denoted W_relative,
- where k represents the size of the neighborhood window,
- and θ_relative represents the relative-relation network model.
- The present invention proposes a coarse-to-fine exact rematching method that reintroduces geometric constraints during disparity optimization: the low-resolution disparity map and the left and right images are used to perform a matching within a small disparity range, which refines the range of the disparity map and improves the generalization ability of the network.
- The present invention also proposes a method of propagation upsampling using context relations.
- Upsampling combines context relations with the confidence of the current coarse disparity, which solves the edge destruction of existing upsampling methods and yields a higher-resolution disparity map with finer edges.
- Figure 1 is the overall flow chart of the program
- Figure 2 is a flow chart of the propagation up-sampling module
- Figure 3 is a flow chart of exact rematching.
- Based on a disparity optimization strategy within a coarse-to-fine disparity estimation framework, the present invention performs end-to-end disparity map prediction on the input left and right views; without introducing additional tasks, the propagation upsampling method and exact rematching method proposed in this application predict an accurate disparity map.
- The specific implementation is as follows:
- Step 1: extract discriminative features.
- Step 2: perform initial cost matching and cost-map optimization to obtain the low-resolution initial disparity.
- <> represents element-wise subtraction of the corresponding positions of the feature vectors.
- d ranges over {0, 1, 2, …, D_max}.
- D_max is the maximum disparity during matching, so the final cost map has size [H/8, W/8, D_max/8, f].
- The hourglass network is composed of convolutional layers with different strides; the cost map output by the hourglass network is regressed through the soft argmin layer into a coarse 1/8-resolution disparity map, denoted D_3.
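The soft argmin regression named above can be sketched as follows. This is an illustrative sketch, not part of the patent text: the function name `soft_argmin` and the use of NumPy are assumptions. A softmax over the negated matching costs turns the disparity axis of the cost volume into a probability distribution, and the expected disparity index gives a sub-pixel estimate:

```python
import numpy as np

def soft_argmin(cost):
    """Regress disparity from a cost array whose last axis indexes disparity.

    Low cost -> high probability; the result is the probability-weighted
    average of the disparity indices (differentiable, sub-pixel).
    """
    neg = -cost
    p = np.exp(neg - neg.max(axis=-1, keepdims=True))  # stable softmax
    p /= p.sum(axis=-1, keepdims=True)
    d = np.arange(cost.shape[-1], dtype=float)
    return (p * d).sum(axis=-1)
```

When one disparity candidate has a clearly lowest cost, the distribution concentrates there, so the regressed value lies close to that integer disparity.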
- The low-resolution initial disparity enters the optimization network to obtain a high-resolution fine disparity.
- The disparity map obtained at the lowest resolution passes through the propagation upsampling module and the exact rematching module to obtain a disparity map one resolution level higher; this process is repeated until the original resolution is restored.
- D_3 is first upsampled by interpolation to obtain the coarsely matched disparity map D′_2; the disparity map obtained at this point comes purely from numerical interpolation, references no structural information of the original images, and cannot recover the information lost to downsampling, so the resulting D′_2 has a high error rate. The disparity map D′_2 therefore needs to be optimized with the propagation strategy.
- The left view is reconstructed from the original right view I_r according to the upsampled disparity map D′_2, where f_w(.) is the warping function; the error between the reconstructed left view and the real left view I_l is then computed to obtain the confidence map M_c.
- normalization(.) is a normalization operation that normalizes the difference into (0,1); the probability value at each point of the confidence map M_c represents the credibility of that pixel's disparity value. The confidence map is copied and shifted into a confidence map group of size [H/8, W/8, k*k], denoted M_cg.
- f_c(.) represents the copy-and-shift resizing operation.
- k represents the size of the neighborhood window.
- s represents the dilation rate of the sampling window.
- The receptive field is (2s+1)².
- Each position obtains a k×k confidence vector representing the confidence of the pixels in the k×k neighborhood window around that pixel.
- A relative-relation network module is proposed.
- The module takes as input the left feature map of the corresponding resolution.
- A weight vector is learned at each position, representing the relative relation between the neighboring pixels and the center pixel; the larger the weight, the larger the influence of that neighboring pixel on the center pixel. For example, a pixel inside an object has a relatively strong relation to its neighbors, so the weight is large; conversely, if a neighboring pixel lies on an edge, its weight for that pixel is small.
- A different set of weights can be learned for each image, so that during propagation the disparity value of a pixel is updated according to the different weights of its surrounding pixels,
- rather than optimizing the disparity map with convolution kernels of identical weights for every input, as in a conventional neural network.
- This module is composed of three convolutional layers with dilation rates {1, 2, 3}.
- The left feature map is input and weights of size [H/8, W/8, k*k] are output, denoted W_relative.
- k represents the size of the neighborhood window.
- θ_relative represents the relative-relation network model.
- <,> represents the dot product operation.
- f_c(.) represents the copy-and-shift resize operation.
- The propagation upsampling module outputs a propagation-based high-resolution disparity map from the low-resolution D_{n+1}.
- The exact rematching module then performs rematching within a small range on this map.
- N represents the number of image pixels.
- The subscript 2 represents the L2 distance; the final loss function is the sum of two loss functions.
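The loss just described can be sketched as follows. This is an assumption-laden illustration (the loss formulas in this excerpt were figures and are not reproduced): each term is taken to be a mean per-pixel L2 distance over the N pixels, and the final loss adds the terms for the two supervised disparity outputs; all function names are illustrative:

```python
import numpy as np

def l2_loss(pred, gt):
    """Mean per-pixel L2 distance over the N image pixels.

    For scalar per-pixel disparity the L2 norm reduces to |pred - gt|.
    """
    return np.abs(pred - gt).mean()

def total_loss(d_init, d_final, gt):
    # final loss: sum of the two supervised loss terms
    return l2_loss(d_init, gt) + l2_loss(d_final, gt)
```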
Abstract
The invention discloses a disparity estimation optimization method based on upsampling and exact rematching: exact rematching is performed within a small range in the optimization part of the network, while previous upsampling methods for disparity maps or cost maps, such as nearest-neighbor and bilinear interpolation, are improved by learning a propagation-based upsampling with the network, so that the disparity map can recover accurate disparity values during upsampling.
Description
The invention belongs to the field of image processing and computer vision, relates to a coarse-to-fine binocular disparity estimation method based on supervised learning, and in particular to a disparity estimation optimization method based on upsampling and exact rematching.
Binocular depth estimation obtains, from two calibrated left and right views, the disparity value corresponding to each pixel according to its relative position between the views; according to the camera imaging model, the disparity is then restored to the depth information of the image. Existing binocular depth estimation methods fall mainly into traditional methods and deep learning methods.

Traditional methods are divided into local and global algorithms. Local algorithms match using the similarity of neighboring pixels within a window. Global methods construct an energy function, comprising the matching cost of each pixel and the constraints between different pixels, and obtain the final disparity map by minimizing this energy function. Traditional methods run too slowly and are not very accurate; the mismatch error is especially high in textureless and occluded regions.

Deep learning methods learn the disparity map of the left and right views end to end with a neural network. The basic framework includes feature extraction, cost-map construction, disparity aggregation, and disparity optimization. The left and right views are fed into the network; the feature extraction network produces left and right feature maps, which are matched under different disparities to obtain a low-resolution cost map. The aggregation and optimization part follows one of two approaches: the first optimizes the low-resolution cost map, gradually restores it to the original resolution, and finally computes the disparity map with soft argmin; the second first obtains a low-resolution disparity map from the low-resolution cost map and then gradually upsamples and optimizes it into the final full-resolution disparity map. To satisfy the computation and speed requirements of the network, matching often has to be performed on low-resolution feature maps, which causes small objects to be lost during downsampling. Subsequent optimization modules do not account for these missing small objects: they regenerate them from scratch through supervision without introducing geometric priors, leading to missing detail and poor network generalization. The upsampling methods currently in use are mostly nearest-neighbor, bilinear, and trilinear sampling. Such interpolation does not match the distribution of a disparity map: it causes inconsistent disparities for objects facing the imaging plane and destroys the disparity discontinuities at object edges.
The invention proposes a coarse-to-fine exact rematching method that reintroduces geometric constraints during disparity optimization: the disparity map and the left and right images obtained at low resolution are used to perform a matching within a small disparity range, which refines the range of the disparity map and improves the generalization ability of the network. The invention also proposes a propagation-based upsampling method: the left feature map of the corresponding resolution is used to learn the relative-relation weight of each pixel and the pixels in its neighborhood, and the confidence obtained from left-right reconstruction consistency together with these weights is propagated over the upsampled disparity map, so that the upsampling of the disparity map better incorporates context information and reduces the erroneous filling caused by interpolation.
Summary of the invention

The invention aims to overcome the shortcomings of existing deep learning methods and provides a disparity estimation optimization method based on upsampling and exact rematching: exact rematching is performed within a small range in the optimization part of the network, while previous upsampling methods for disparity maps or cost maps, such as nearest-neighbor and bilinear interpolation, are improved by learning a propagation-based upsampling with the network, so that the disparity map can recover accurate disparity values during upsampling.

The specific scheme includes the following steps:

A disparity estimation optimization method based on upsampling and exact rematching, characterized by comprising the following steps:

Step 1: extract discriminative features;

Step 2: perform initial cost matching and cost-map optimization to obtain a low-resolution initial disparity map;

Step 3: pass the low-resolution initial disparity map through the propagation upsampling method and the exact rematching method to obtain a disparity map one resolution level higher, and repeat this process until the original resolution is restored.
3.1 The propagation upsampling method

The lowest-resolution initial disparity map D_{n+1} is first upsampled by interpolation to obtain a coarsely matched disparity map D′_n. The disparity map obtained at this point comes purely from numerical interpolation and does not reference any structural information of the original images. The left view is reconstructed from the original right view I_r according to the coarsely matched disparity map D′_n; the error between the reconstructed left view and the real left view I_l is then computed to obtain the confidence map M_c:

normalization(.) is a normalization operation that normalizes the difference into (0,1); the probability value at each point of the confidence map M_c represents the credibility of that pixel's disparity value. The confidence map is copied and shifted into a confidence map group, denoted M_cg:

M_cg = f_c(M_c, k, s)    (3)

where f_c(.) denotes the copy-and-shift resizing operation, k the neighborhood window size, and s the dilation rate of the sampling window; the receptive field is (2s+1)², and each position obtains a k×k confidence vector representing the confidence of the pixels in the k×k neighborhood window around that pixel.

A relative-relation network module takes as input the left feature map of the corresponding resolution and learns, at every position, a weight vector representing the relative relation between the neighboring pixels and the center pixel; the larger the weight, the larger the influence of that neighboring pixel on the center pixel. This weight is denoted W_relative,

where k is the neighborhood window size and θ_relative denotes the relative-relation network model.

Propagation is then performed with the coarsely matched disparity map D′_n, the confidence map group M_cg, and the relative-relation weights W_relative, yielding the propagated disparity map. The propagation is computed as follows:

where the result denotes the propagated disparity map, <,> the dot product operation, f_c(.) the copy-and-shift resize operation, and softmax(W_relative*M_cg) the support that the surrounding pixels lend to the center pixel during propagation, obtained by multiplying the confidences of the surrounding pixels with the relative-relation weights.
3.2 The exact rematching method

First, the left feature map is reconstructed from the right feature map of the corresponding resolution in the feature list according to the propagated disparity map. Using the reconstructed left feature map and the original left feature map, one rematching is performed within the small disparity range d=[-d_0, d_0] to obtain a cost map; the cost map is then optimized by an hourglass network and disparity is regressed, yielding an offset map Δ that represents the magnitude of the offset. Adding the two gives the final disparity map D_n of the optimization network.

Steps 3.1 and 3.2 are iterated until the original resolution is restored, giving the final high-precision disparity map.
The beneficial effects of the invention are:

1) The invention proposes a coarse-to-fine exact rematching method that reintroduces geometric constraints during disparity optimization: the disparity map and the left and right images obtained at low resolution are used to perform a matching within a small disparity range, which refines the range of the disparity map and improves the generalization ability of the network.

2) The invention proposes a method of propagation upsampling using context relations: during disparity optimization, upsampling combines context relations with the confidence of the current coarse disparity, which solves the edge destruction of existing upsampling methods and yields a higher-resolution disparity map with finer edges.
Figure 1 is the overall flow chart of the scheme;

Figure 2 is the flow chart of the propagation upsampling module;

Figure 3 is the flow chart of exact rematching.

Based on a disparity optimization strategy within a coarse-to-fine disparity estimation framework, the invention performs end-to-end disparity map prediction on the input left and right views; without introducing additional tasks, the propagation upsampling method and exact rematching method proposed in this application predict an accurate disparity map. The overall network flow is shown in Figure 1; the specific implementation is as follows:
Step 1: extract discriminative features.

Features are extracted from the two input views. Compared with matching on the raw gray values of the original images, matching with feature vectors copes better with changes in illumination and appearance; the extracted feature vectors describe the image information in more detail and more comprehensively, which helps matching. A simple CNN performs feature extraction; it consists of four cascaded parts (each containing three different convolutional layers), producing left and right feature maps F_0 to F_3 at different resolutions (the subscript denotes the downsampling factor, e.g. F_3 denotes the 1/8-resolution feature map). Each feature vector f has 32 dimensions. The four feature maps at different resolutions are stored in a feature list as input for the later optimization network, and matching is then performed at the smallest resolution, i.e. on F_3, the 1/8-resolution feature map.
Step 2: perform initial cost matching and cost-map optimization to obtain the low-resolution initial disparity.

C(x,y,d) = <f_l(x,y) − f_r(x−d,y)>    (1)

<> denotes element-wise subtraction of the corresponding positions of the feature vectors; d ranges over {0, 1, 2, …, D_max}, where D_max is the maximum disparity during matching, so the final cost map has size [H/8, W/8, D_max/8, f].

After the 1/8-resolution cost map is obtained, it is optimized with an hourglass network composed of convolutional layers with different strides; the cost map output by the hourglass network is regressed through the soft argmin layer into a coarse 1/8-resolution disparity map, denoted D_3.
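The difference-based cost map of formula (1) can be sketched as follows. This is an illustrative NumPy sketch, not the patent's implementation: the helper name `build_cost_volume` is an assumption, and the volume is built densely rather than on 1/8-resolution features:

```python
import numpy as np

def build_cost_volume(feat_l, feat_r, max_disp):
    """Difference-based cost volume: C(x, y, d) = f_l(x, y) - f_r(x - d, y).

    feat_l, feat_r: [H, W, F] left/right feature maps.
    Returns a cost volume of shape [H, W, max_disp + 1, F].
    """
    h, w, f = feat_l.shape
    cost = np.zeros((h, w, max_disp + 1, f), dtype=feat_l.dtype)
    for d in range(max_disp + 1):
        # shift the right features by d; columns x < d have no match and stay zero
        cost[:, d:, d, :] = feat_l[:, d:, :] - feat_r[:, : w - d, :]
    return cost
```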
Step 3: feed the low-resolution initial disparity into the optimization network to obtain a high-resolution fine disparity.

The disparity map obtained at the lowest resolution passes through the propagation upsampling module and the exact rematching module to obtain a disparity map one resolution level higher; this process is repeated until the original resolution is restored.

The specific flow is shown in Figures 2 and 3.

The specific steps are as follows (taking the single iteration from D_3 to D_2 as an example):
3.1 The propagation upsampling method

D_3 is first upsampled by interpolation to obtain the coarsely matched disparity map D′_2. The disparity map obtained at this point comes purely from numerical interpolation, references no structural information of the original images, and cannot recover the information lost to downsampling, so the resulting D′_2 has a high error rate; the disparity map D′_2 therefore needs to be optimized with the propagation strategy. The left view is reconstructed from the original right view I_r according to the upsampled disparity map D′_2, where f_w(.) is the warping function; the error between the reconstructed left view and the real left view I_l is then computed to obtain the confidence map M_c:

normalization(.) is a normalization operation that normalizes the difference into (0,1); the probability value at each point of the confidence map M_c represents the credibility of that pixel's disparity value. The confidence map is copied and shifted into a confidence map group of size [H/8, W/8, k*k], denoted M_cg:

M_cg = f_c(M_c, k, s)    (3)

where f_c(.) denotes the copy-and-shift resizing operation, k the neighborhood window size, and s the dilation rate of the sampling window (the receptive field is (2s+1)²). Each position obtains a k×k confidence vector representing the confidence of the pixels in the k×k neighborhood window around that pixel.
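As a hedged illustration of the confidence map and the copy-and-shift operation f_c(.): the patent's exact normalization (formula (2)) is a figure not reproduced in this excerpt, so a min-max normalization of the absolute reconstruction error stands in for it, and the helper names are assumptions:

```python
import numpy as np

def confidence_map(recon_l, real_l):
    """Confidence from left-right reconstruction consistency.

    The per-pixel error between the reconstructed and real left view is
    normalized into (0, 1); a small error means a credible disparity.
    """
    err = np.abs(recon_l - real_l)
    err = (err - err.min()) / (err.max() - err.min() + 1e-8)  # normalization(.)
    return 1.0 - err  # small error -> high confidence

def shift_group(m, k, s):
    """f_c(.): copy and shift M_c into a [H, W, k*k] group.

    Each channel holds the confidence of one neighbor in a k x k window at
    dilation rate s (zero padding at the borders).
    """
    h, w = m.shape
    r = (k // 2) * s
    padded = np.zeros((h + 2 * r, w + 2 * r))
    padded[r:h + r, r:w + r] = m
    return np.stack(
        [padded[i * s:i * s + h, j * s:j * s + w]
         for i in range(k) for j in range(k)], axis=-1)
```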
A relative-relation network module is proposed. The module takes as input the left feature map of the corresponding resolution and learns, at every position, a weight vector representing the relative relation between the neighboring pixels and the center pixel: the larger the weight, the larger the influence of that neighboring pixel on the center pixel. For example, a pixel inside an object has a relatively strong relation to its neighboring pixels, so the weight is large; conversely, if a neighboring pixel lies on an edge, its weight for that pixel is small. Through this module, a different set of weights can be learned for every image, so that during propagation the disparity value of each pixel is updated according to the different weights of its surrounding pixels, rather than, as in a conventional neural network, optimizing the disparity map with convolution kernels of identical weights for every input.

The module consists of three convolutional layers with dilation rates {1, 2, 3}; it takes the left feature map as input and outputs weights of size [H/8, W/8, k*k], denoted W_relative,

where k is the neighborhood window size and θ_relative denotes the relative-relation network model.

In the propagation formula, the result denotes the propagated disparity map, <,> the dot product operation, f_c(.) the copy-and-shift resize operation, and softmax(W_relative*M_cg) the support that the surrounding pixels lend to the center pixel during propagation, obtained by multiplying the confidences of the surrounding pixels with the relative-relation weights. This propagation is then repeated three times with window dilation rates s = 1, 2, 3, so that the disparity map is propagated and optimized over different receptive fields. This completes the propagation upsampling from D_{n+1} to the propagated disparity map.
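The propagation step can be sketched as follows, assuming the shifted copies of D′_n, the weights W_relative, and the confidence group M_cg are already laid out as [H, W, k*k] arrays (the exact propagation formula in this excerpt is a figure); the function names are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def propagate(d_coarse_group, w_relative, m_cg):
    """One propagation step over the upsampled disparity map.

    d_coarse_group: [H, W, k*k] shifted copies of the coarse disparity D'_n
                    (f_c(.) applied to the disparity map).
    w_relative:     [H, W, k*k] relative-relation weights.
    m_cg:           [H, W, k*k] confidence map group.
    The support of each neighbor is softmax(W_relative * M_cg); the new
    disparity is the support-weighted sum (dot product) of the neighbors.
    """
    support = softmax(w_relative * m_cg, axis=-1)
    return (support * d_coarse_group).sum(axis=-1)
```

With uniform weights and confidences, the update reduces to averaging the neighborhood; in practice the learned weights concentrate support on high-confidence, strongly related neighbors.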
3.2 The exact rematching method

The propagation upsampling module outputs a propagation-based high-resolution disparity map from the low-resolution D_{n+1}. The exact rematching module performs a small-range rematching on this map. First, the left feature map is reconstructed from the right feature map of the corresponding resolution in the feature list according to the propagated disparity map. Using the reconstructed left feature map and the original left feature map, one rematching is performed within the small disparity range d=[-2, 2], yielding a cost map of size [H/4, W/4, 5, f] (taking this resolution as an example); the cost map is then optimized with an hourglass network and disparity is regressed, yielding an offset map Δ that represents the magnitude of the offset. Adding the two gives the final disparity map D_n of the optimization network.

Steps 3.1 and 3.2 are iterated until the original resolution is restored, giving the final high-precision disparity map.
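The small-range rematching can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions: a nearest-neighbor warp stands in for the warping function f_w(.), and `rematch_cost_volume` is an assumed helper name. For each candidate offset d in [-d_0, d_0], the right features are warped with the propagated disparity plus d and compared against the left features:

```python
import numpy as np

def rematch_cost_volume(feat_l, feat_r, disp, d0=2):
    """Residual cost volume for exact rematching over d in [-d0, d0].

    feat_l, feat_r: [H, W, F] left/right feature maps.
    disp:           [H, W] propagated disparity map.
    Returns a cost volume of shape [H, W, 2*d0 + 1, F]; the offset map
    delta regressed from it is added to the propagated disparity.
    """
    h, w, f = feat_l.shape
    xs = np.arange(w)
    cost = np.zeros((h, w, 2 * d0 + 1, f), dtype=feat_l.dtype)
    for i, d in enumerate(range(-d0, d0 + 1)):
        # warp right features with disparity + candidate offset d
        # (nearest-neighbor sampling, clamped at the image borders)
        src = np.clip(np.rint(xs[None, :] - disp - d).astype(int), 0, w - 1)
        recon = np.take_along_axis(feat_r, src[:, :, None], axis=1)
        cost[:, :, i, :] = feat_l - recon
    return cost
```

If the propagated disparity is already correct, the zero-offset slice of the volume vanishes, so the regressed offset Δ is near zero.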
4. Loss function
Claims (3)
- 1. A disparity estimation optimization method based on upsampling and exact rematching, characterized by comprising the following steps: Step 1: extract discriminative features; Step 2: perform initial cost matching and cost-map optimization to obtain a low-resolution initial disparity map; Step 3: pass the low-resolution initial disparity map through the propagation upsampling method and the exact rematching method to obtain a disparity map one resolution level higher, and repeat this process until the original resolution is restored. 3.1 The propagation upsampling method: the lowest-resolution initial disparity map D_{n+1} is first upsampled by interpolation to obtain a coarsely matched disparity map D′_n; the disparity map obtained at this point comes purely from numerical interpolation and references no structural information of the original images; the left view is reconstructed from the original right view I_r according to the coarsely matched disparity map D′_n, and the error between the reconstructed left view and the real left view I_l is computed to obtain the confidence map M_c; normalization(.) is a normalization operation that normalizes the difference into (0,1), and the probability value at each point of M_c represents the credibility of that pixel's disparity value; the confidence map is copied and shifted into a confidence map group, denoted M_cg, M_cg = f_c(M_c, k, s) (3), where f_c(.) denotes the copy-and-shift resizing operation, k the neighborhood window size, and s the dilation rate of the sampling window; the receptive field is (2s+1)², and each position obtains a k×k confidence vector representing the confidence of the pixels in the k×k neighborhood window around that pixel; a relative-relation network module takes as input the left feature map of the corresponding resolution and learns at every position a weight vector representing the relative relation between the neighboring pixels and the center pixel, where a larger weight means a larger influence of that neighboring pixel on the center pixel; this weight is denoted W_relative; propagation is performed with the coarsely matched disparity map D′_n, the confidence map group M_cg, and the relative-relation weights W_relative, yielding the propagated disparity map, where <,> denotes the dot product operation, f_c(.) the copy-and-shift resize operation, and softmax(W_relative*M_cg) the support of the surrounding pixels for the center pixel during propagation, obtained by multiplying the confidences of the surrounding pixels with the relative-relation weights. 3.2 The exact rematching method: first, the left feature map is reconstructed from the right feature map of the corresponding resolution in the feature list according to the propagated disparity map; using the reconstructed left feature map and the original left feature map, one rematching is performed within the small disparity range d=[-d_0, d_0] to obtain a cost map; the cost map is optimized by an hourglass network and disparity is regressed, yielding an offset map Δ representing the magnitude of the offset; adding the two gives the final disparity map D_n of the optimization network; steps 3.1 and 3.2 are iterated until the original resolution is restored, giving the final high-precision disparity map.
- 2. The disparity estimation optimization method based on upsampling and exact rematching according to claim 1, characterized in that, in Step 2, the lowest-resolution left and right feature maps are used; f_l(x,y) and f_r(x,y) denote the feature vectors at a point of the image, C denotes the cost map, and the cost map is formed as follows: C(x,y,d) = <f_l(x,y) − f_r(x−d,y)> (1), where <> denotes element-wise subtraction of the corresponding positions of the feature vectors, d ranges over {0, 1, 2, …, D_max}, and D_max is the maximum disparity during matching; after the lowest-resolution cost map is obtained, it is optimized with an hourglass network composed of convolutional layers with different strides, and the cost map output by the hourglass network is regressed through the soft argmin layer into a lowest-resolution initial disparity map, denoted D_{n+1}.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/604,588 US12008779B2 (en) | 2020-01-10 | 2020-03-05 | Disparity estimation optimization method based on upsampling and exact rematching |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010028308.2A CN111242999B (en) | 2020-01-10 | 2020-01-10 | Disparity estimation optimization method based on upsampling and exact rematching |
CN202010028308.2 | 2020-01-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021138992A1 (zh) | 2021-07-15 |
Family
ID=70872416
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/077961 WO2021138992A1 (en) | 2020-01-10 | 2020-03-05 | Disparity estimation optimization method based on upsampling and exact rematching |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111242999B (zh) |
WO (1) | WO2021138992A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117333758A (zh) * | 2023-12-01 | 2024-01-02 | 博创联动科技股份有限公司 | Field route recognition system based on big data analysis |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112270701B (zh) * | 2020-10-26 | 2023-09-12 | 湖北汽车工业学院 | Disparity prediction method, system and storage medium based on grouped-distance network |
CN113313740B (zh) * | 2021-05-17 | 2023-01-31 | 北京航空航天大学 | Joint learning method for disparity maps and surface normal vectors based on planar continuity |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110176722A1 (en) * | 2010-01-05 | 2011-07-21 | Mikhail Sizintsev | System and method of processing stereo images |
CN110427968A (zh) * | 2019-06-28 | 2019-11-08 | 武汉大学 | Binocular stereo matching method based on detail enhancement |
CN110533712A (zh) * | 2019-08-26 | 2019-12-03 | 北京工业大学 | Binocular stereo matching method based on a convolutional neural network |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102930530B (zh) * | 2012-09-26 | 2015-06-17 | 苏州工业职业技术学院 | Stereo matching method for dual-viewpoint images |
KR101989133B1 (ko) * | 2018-06-12 | 2019-09-30 | 중앙대학교 산학협력단 | Real-time stereo matching apparatus and method using common support regions in stereo images |
CN109887008B (zh) * | 2018-08-31 | 2022-09-13 | 河海大学常州校区 | Disparity stereo matching method, apparatus and device based on forward-backward smoothing and O(1) complexity |
CN109472819B (zh) * | 2018-09-06 | 2021-12-28 | 杭州电子科技大学 | Binocular disparity estimation method based on a cascaded geometric context neural network |
CN109410266A (zh) * | 2018-09-18 | 2019-03-01 | 合肥工业大学 | Stereo matching algorithm based on four-mode Census transform and discrete disparity search |
- 2020-01-10: CN application CN202010028308.2A filed; granted as patent CN111242999B (status: Active)
- 2020-03-05: PCT application PCT/CN2020/077961 filed as WO2021138992A1 (Application Filing)
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117333758A (zh) * | 2023-12-01 | 2024-01-02 | 博创联动科技股份有限公司 | Field route recognition system based on big data analysis |
CN117333758B (zh) | 2023-12-01 | 2024-02-13 | 博创联动科技股份有限公司 | Field route recognition system based on big data analysis |
Also Published As
Publication number | Publication date |
---|---|
CN111242999B (zh) | 2022-09-20 |
US20220198694A1 (en) | 2022-06-23 |
CN111242999A (zh) | 2020-06-05 |
Legal Events

Code | Title | Details |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20911530; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 20911530; Country of ref document: EP; Kind code of ref document: A1 |