CN103002309B - Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera - Google Patents

Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera

Info

Publication number
CN103002309B
Authority
CN
China
Prior art keywords
dynamic
depth
pixels
frame
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210360976.0A
Other languages
Chinese (zh)
Other versions
CN103002309A (en)
Inventor
章国锋 (Guofeng Zhang)
鲍虎军 (Hujun Bao)
姜翰青 (Hanqing Jiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Shangtang Technology Development Co Ltd
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201210360976.0A priority Critical patent/CN103002309B/en
Publication of CN103002309A publication Critical patent/CN103002309A/en
Application granted granted Critical
Publication of CN103002309B publication Critical patent/CN103002309B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The invention discloses a method for spatio-temporally consistent depth recovery from dynamic scene videos captured by multiple synchronized cameras. Multi-view geometry combined with DAISY feature descriptors is used to stereo-match the synchronized multi-view frames at each time instant, yielding an initial depth map for every instant of the multi-view video. A dynamic probability map is then computed for each frame and used to divide its pixels into dynamic and static pixels, which are refined with different spatio-temporally consistent depth optimizations: static pixels are optimized with bundle optimization, combining the color and geometric consistency constraints of multiple adjacent time instants; for dynamic pixels, the color and geometric consistency constraints between corresponding pixels of the multiple cameras at multiple adjacent time instants are accumulated, and the dynamic depth values at each instant are optimized for spatio-temporal consistency. The invention has high application value in fields such as 3D stereoscopic imaging, 3D animation, augmented reality, and motion capture.

Description

Method for Spatio-Temporally Consistent Depth Recovery of Dynamic Scene Videos Captured by Multiple Synchronized Cameras

Technical Field

The present invention relates to stereo matching and depth recovery methods, and in particular to a method for spatio-temporally consistent depth recovery from dynamic scene videos captured by multiple synchronized cameras.

Background Art

Dense depth recovery from video is one of the fundamental techniques of mid-level computer vision, with extremely important applications in 3D modeling, 3D imaging, augmented reality, motion capture, and many other fields. These applications usually require the recovered depth to be highly accurate and spatio-temporally consistent.

The difficulty of dense depth recovery from video lies in recovering depth values that are highly accurate and spatio-temporally consistent for both the static and the dynamic objects in the scene. Although existing depth recovery techniques for static scenes can already recover fairly accurate depth information, the real world is full of moving objects, and for the dynamic objects contained in a video it is hard for existing methods to achieve high accuracy and consistency in both time and space. These methods usually require a relatively large number of fixed, synchronized cameras to capture the scene; at each time instant the synchronized multi-view frames are stereo-matched with multi-view geometry to recover the depth at that instant. Such a capture setup is mainly used for dynamic scenes inside a laboratory and imposes many restrictions in practical shooting. In addition, when optimizing depth over time, existing methods usually use optical flow to find corresponding pixels in frames at different time instants and then fit the depth values or 3D positions of the corresponding points with a line or curve to estimate the depth of the pixels in the current frame. Such temporal 3D smoothing only makes the depths of temporally corresponding pixels more consistent and cannot produce truly accurate depth values; moreover, because optical flow estimation is often unreliable, the depth optimization problem for dynamic points becomes even more complicated and difficult to solve.

Existing video depth recovery methods fall mainly into two categories:

1. Temporally consistent depth recovery for monocular static-scene video

A typical method of this type was proposed by Zhang in 2009: G. Zhang, J. Jia, T.-T. Wong, and H. Bao. Consistent depth maps recovery from a video sequence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6):974-988, 2009. This method first initializes the depth of each frame with traditional multi-view geometry and then, in the temporal domain, uses bundle optimization to aggregate geometric and color consistency over multiple time instants to optimize the depth of the current frame. It recovers high-accuracy depth maps for static scenes; for scenes containing dynamic objects, however, it cannot recover the depth values of the dynamic objects.

2. Depth recovery for multi-view dynamic-scene video

Typical methods of this type are Zitnick's method: C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski. High-quality video view interpolation using a layered representation. ACM Transactions on Graphics, 23:600-608, August 2004; Larsen's method: E. S. Larsen, P. Mordohai, M. Pollefeys, and H. Fuchs. Temporally consistent reconstruction from multiple video streams using enhanced belief propagation. In ICCV, pages 1-8, 2007; and Lei's method: C. Lei, X. D. Chen, and Y. H. Yang. A new multi-view spacetime-consistent depth recovery framework for free viewpoint video rendering. In ICCV, pages 1570-1577, 2009. All of these methods recover depth maps from the multi-view frames synchronized at the same instant and require a relatively large number of fixed, synchronized cameras to capture the dynamic scene, which makes them unsuitable for practical outdoor shooting. Larsen's and Lei's methods optimize depth with spatio-temporal energy optimization and temporal 3D smoothing, respectively, which makes them insufficiently robust to handle cases where optical flow estimation fails badly.

Step 1) of the method for spatio-temporally consistent depth recovery of dynamic scene videos captured by multiple synchronized cameras uses the DAISY feature descriptor proposed by Tola: E. Tola, V. Lepetit, and P. Fua. Daisy: An efficient dense descriptor applied to wide-baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5):815-830, 2010.

Steps 1) and 2) of the method use the Mean-shift technique proposed by Comaniciu: D. Comaniciu, P. Meer, and S. Member. Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:603-619, 2002.

Step 2) of the method uses the Grabcut technique proposed by Rother: C. Rother, V. Kolmogorov, and A. Blake. "Grabcut": interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23:309-314, August 2004.

Steps 1), 2), and 3) of the method use the energy-equation optimization technique proposed by Felzenszwalb: P. F. Felzenszwalb and D. P. Huttenlocher. Efficient belief propagation for early vision. International Journal of Computer Vision, 70(1):41-54, 2006.

Summary of the Invention

The object of the present invention is to overcome the deficiencies of the prior art by providing a method for spatio-temporally consistent depth recovery from dynamic scene videos captured by multiple synchronized cameras.

The steps of the method for spatio-temporally consistent depth recovery of dynamic scene videos captured by multiple synchronized cameras are as follows:

1) Using multi-view geometry combined with DAISY feature vectors, perform stereo matching on the multi-view frames at each time instant to obtain the initial depth map of the multi-view video at that instant;

2) Using the initial depth maps obtained in step 1), compute a dynamic probability map for each frame of the multi-view video, and use the dynamic probability map to divide the pixels of each frame into dynamic pixels and static pixels;

3) For the dynamic and static pixels divided in step 2), perform spatio-temporally consistent depth optimization with different optimization methods. For static pixels, use bundle optimization, combining the color and geometric consistency constraints of multiple adjacent time instants; for dynamic pixels, accumulate the color and geometric consistency constraint information between corresponding pixels of the multiple cameras at multiple adjacent time instants, and optimize the dynamic depth values at each instant for spatio-temporal consistency.

Step 1) is as follows:

(1) Using multi-view geometry combined with DAISY feature descriptors, perform stereo matching on the multi-view frames at the same time instant, and solve for the initial depth map of each frame through the following energy optimization equation:

$$E_D(D_m^t; \hat{I}^{(t)}) = E_d(D_m^t; \hat{I}^{(t)}) + E_s(D_m^t)$$

where $\hat{I}^{(t)}$ denotes the M synchronized multi-view frames at time t, $I_m^t$ denotes the frame of the m-th video at time t, and $D_m^t$ denotes the depth map of the m-th video at time t; $E_d(D_m^t; \hat{I}^{(t)})$ is the data term, measuring the DAISY feature similarity between the pixels of $I_m^t$ and their projections, computed from $D_m^t$, into the remaining frames of $\hat{I}^{(t)}$; it is computed as:

$$E_d(D_m^t; \hat{I}^{(t)}) = \sum_{x_m^t} \frac{\sum_{m' \neq m} L_d\bigl(x_m^t, D_m^t(x_m^t); I_m^t, I_{m'}^t\bigr)}{M-1}$$

where $L_d$ is a penalty function that estimates the DAISY feature similarity of corresponding pixels, each pixel being represented by its DAISY feature descriptor, and $x_{m'}^t$ is the projection of $x_m^t$ into $I_{m'}^t$ obtained with the depth $D_m^t(x_m^t)$; $E_s(D_m^t)$ is the smoothness term, measuring the smoothness of depth between adjacent pixels x and y; it is computed as:

$$E_s(D_m^t) = \lambda \sum_x \sum_{y \in N(x)} \min\bigl\{\,|D_m^t(x) - D_m^t(y)|,\ \eta\,\bigr\}$$

where the smoothing weight λ is 0.008 and the truncation value η of the depth difference is 3;
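
As an illustration of these two terms, the following sketch evaluates, for one candidate depth map, the data cost from precomputed DAISY descriptor maps and the truncated-L1 smoothness cost. It is a minimal sketch under stated assumptions, not the patented implementation: the helper names, the descriptor distance used for $L_d$, and the assumption that descriptors at the projected positions have already been sampled into per-view arrays are all illustrative.

```python
import numpy as np

def daisy_penalty(desc_ref, desc_proj):
    # Assumed form of L_d: dissimilarity between a pixel's DAISY descriptor and
    # the descriptor sampled at its projection in another view (lower = similar).
    return np.linalg.norm(desc_ref - desc_proj, axis=-1)

def data_cost(desc_ref, projected_descs):
    # E_d for one candidate depth map: average DAISY penalty over the M-1 other
    # synchronized views; `projected_descs` is a list of HxWxK arrays holding
    # the descriptors already sampled at the projected positions.
    return np.mean([daisy_penalty(desc_ref, d) for d in projected_descs], axis=0)

def smoothness_cost(depth, lam=0.008, eta=3.0):
    # E_s: truncated L1 difference between 4-connected neighbouring depths,
    # with the smoothing weight 0.008 and truncation 3 quoted in the text.
    dx = np.minimum(np.abs(np.diff(depth, axis=1)), eta)
    dy = np.minimum(np.abs(np.diff(depth, axis=0)), eta)
    return lam * (dx.sum() + dy.sum())
```

In the patent the total energy $E_D$ is minimized per frame with belief propagation (Felzenszwalb and Huttenlocher); the sketch only evaluates the two terms for a given depth map.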

(2) Use the consistency of the initial depths of the multi-view frames in 3D space to determine whether each pixel of each frame is visible in the remaining cameras at the same time instant, thereby obtaining pairwise visibility maps between the cameras at that instant. The visibility map is computed as:

$$V_{m \to m'}^t(x_m^t) = \begin{cases} 1, & \bigl|D_{m \to m'}^t(x_m^t) - D_{m'}^t(x_{m'}^t)\bigr| \le \delta_d \\ 0, & \bigl|D_{m \to m'}^t(x_m^t) - D_{m'}^t(x_{m'}^t)\bigr| > \delta_d \end{cases}$$

where $V_{m \to m'}^t(x_m^t)$ indicates whether $x_m^t$ is visible in $I_{m'}^t$ (1 means visible, 0 means invisible); $\delta_d$ is the depth-difference threshold, and $D_{m \to m'}^t(x_m^t)$ is obtained by projecting $x_m^t$ onto $I_{m'}^t$ using $D_m^t(x_m^t)$. Using the resulting visibility maps, an overall visibility is computed for each pixel $x_m^t$: it is 0 if $x_m^t$ is invisible in all the remaining video frames at time t, and 1 otherwise;
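
A minimal sketch of this visibility test is given below. It assumes the warped depth $D_{m \to m'}^t$ and the depth of view m' sampled at the projected positions have already been computed; the function names and array layout are illustrative assumptions.

```python
import numpy as np

def visibility_map(depth_warped, depth_at_proj, delta_d):
    # V_{m->m'}^t: a pixel of view m is marked visible in view m' when the depth
    # it predicts at its projection agrees with view m''s own depth map there.
    return (np.abs(depth_warped - depth_at_proj) <= delta_d).astype(np.uint8)

def overall_visibility(pairwise_maps):
    # Overall visibility at time t: 0 only when the pixel is invisible in every
    # one of the remaining views, 1 otherwise.
    return (np.stack(pairwise_maps, axis=0).sum(axis=0) > 0).astype(np.uint8)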

(3) Re-initialize the depth map of each frame using the visibility maps just obtained, comparing and estimating the DAISY feature similarity only at the visible pixels. In addition, because the initial depth values of the invisible (occluded) pixels may be erroneous, segment each frame with Mean-shift; for each segment, fit a plane with parameters [a, b, c] to the depths of the visible pixels, and use the fitted plane to redefine the data term of the invisible pixels:

$$E_d(x_m^t, D_m^t) = \sum_{x_m^t} \frac{\sigma_d}{\sigma_d + \bigl|a x + b y + c - D_m^t(x_m^t)\bigr|}$$

where $\sigma_d$ controls the sensitivity of the data term to the distance between the depth value and the fitted plane, and x and y are the coordinates of the pixel $x_m^t$; energy optimization with the redefined data term then corrects the erroneous depth values of the occluded pixels.
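
The plane fitting for one Mean-shift segment can be sketched as a least-squares problem, as below. This is an illustrative sketch: the segment masks, the choice of solver, and the value of $\sigma_d$ are assumptions, not details fixed by the patent.

```python
import numpy as np

def fit_segment_plane(depth, xs, ys, visible_mask):
    # Fit D(x, y) ~ a*x + b*y + c over the visible pixels of one segment
    # by linear least squares; returns the plane parameters [a, b, c].
    A = np.stack([xs[visible_mask], ys[visible_mask],
                  np.ones(visible_mask.sum())], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, depth[visible_mask], rcond=None)
    return coeffs  # [a, b, c]

def plane_data_term(depth_candidate, xs, ys, plane, sigma_d=1.0):
    # Redefined data term for occluded pixels: largest when the candidate depth
    # lies on the fitted plane (sigma_d = 1.0 is a placeholder value).
    a, b, c = plane
    return sigma_d / (sigma_d + np.abs(a * xs + b * ys + c - depth_candidate))
```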

Step 2) is as follows:

(1) For each pixel of each frame, project it to the frames at the remaining time instants using its initial depth, compare the geometric and color consistency between the pixel in the current frame and its corresponding positions in those frames, and count the proportion of the remaining frames in which the depth value and color value are consistent as the probability that the pixel belongs to a dynamic object, thereby obtaining the dynamic probability map of each frame; it is computed as:

$$P_d(x_m^t) = \frac{\bigl|\{(m', t') \in N(m,t) \mid C_{m \to m'}^{t \to t'}(x_m^t) = \mathrm{dynamic}\}\bigr|}{|N(m,t)|}$$

where the heuristic function $C_{m \to m'}^{t \to t'}$ is used to judge whether the geometry and color of $x_m^t$ are consistent in the remaining frames: first the depth difference between $x_m^t$ and its corresponding position is compared; if the depth value at the corresponding position is not similar to the depth of $x_m^t$, the geometry is considered inconsistent; if the depth values are similar, their color values are compared, and if the colors are similar the color values are considered consistent, otherwise they are considered inconsistent. The proportion of the remaining frames having consistent depth and color values is counted as the probability that the pixel belongs to a dynamic object;
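
The per-frame check and the resulting probability can be sketched as below. The mapping from the consistency test to the "dynamic" label is only implicit in the text, so the mapping used here (an inconsistent geometry or color comparison votes "dynamic") is an assumption of this sketch, as are the tolerance values.

```python
import numpy as np

def consistency_vote(depth_ref, depth_at_proj, color_ref, color_at_proj,
                     depth_tol=0.05, color_tol=30.0):
    # Heuristic C_{m->m'}^{t->t'}: first compare depths at the reprojected
    # position, then colors; here an inconsistent result votes "dynamic".
    # (depth_tol / color_tol are placeholder tolerances, not patent values.)
    if abs(depth_ref - depth_at_proj) > depth_tol:
        return "dynamic"                       # geometry inconsistent
    color_diff = np.abs(np.asarray(color_ref, float)
                        - np.asarray(color_at_proj, float)).sum()
    return "dynamic" if color_diff > color_tol else "static"

def dynamic_probability(votes):
    # P_d(x): fraction of the neighbouring (view, time) frames N(m, t)
    # whose vote is "dynamic".
    return sum(v == "dynamic" for v in votes) / max(len(votes), 1)
```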

(2) Binarize the dynamic probability map with a threshold $\eta_p$ of 0.4 to obtain the initial dynamic/static segmentation of each frame. Over-segment each frame with Mean-shift (that is, segment it at fine granularity); for each segment, compute the proportion of pixels that are dynamic after binarization, and if the proportion exceeds 0.5 mark all pixels of the segment as dynamic, otherwise mark them as static. This adjusts the boundaries of the binarized segmentation and removes noise;
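
A minimal sketch of this region-voting refinement is given below; it assumes a precomputed over-segmentation label map (e.g., from a Mean-shift implementation) and uses the thresholds 0.4 and 0.5 quoted in the text.

```python
import numpy as np

def refine_by_segments(prob_map, segment_labels, eta_p=0.4, region_ratio=0.5):
    # Binarize the dynamic probability map, then let each over-segmented
    # region vote: the whole region becomes dynamic when more than half of
    # its binarized pixels are dynamic.
    binary = prob_map > eta_p
    refined = np.zeros_like(binary)
    for seg_id in np.unique(segment_labels):
        mask = segment_labels == seg_id
        refined[mask] = binary[mask].mean() > region_ratio
    return refined
```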

(3) Using the coordinate offsets of corresponding pixels between images at consecutive time instants, track the pixels of each frame to the adjacent frames of the same video to find the corresponding pixels, and count the proportion of frames in which the corresponding pixel's segmentation label is dynamic; from this the temporal dynamic probability of the pixel is computed as:

$$P_d'(x_m^t) = \frac{\bigl|\{t' \in N(t) \mid S_m^{t'}\bigl(x_m^t + O_m^{t \to t'}(x_m^t)\bigr) = \mathrm{dynamic}\}\bigr|}{|N(t)|}$$

where $O_m^{t \to t'}(x_m^t)$ denotes the optical-flow offset of $x_m^t$ from time t to time t', $S_m^{t'}$ denotes the dynamic/static segmentation label of the corresponding pixel at time t', and N(t) denotes the 5 consecutive adjacent frames before and after t. Using the temporal dynamic probability, the dynamic/static segmentation of each frame is optimized through the following energy optimization equation:

$$E_S(S_m^t; P_d', I_m^t) = E_d(S_m^t; P_d') + E_s(S_m^t; I_m^t)$$

where $S_m^t$ denotes the dynamic/static segmentation of frame t of video m; the data term $E_d$ is defined as:

$$E_d(S_m^t; P_d') = \sum_{x_m^t} e_d\bigl(S_m^t(x_m^t)\bigr)$$

$$e_d\bigl(S_m^t(x_m^t)\bigr) = \begin{cases} -\log\bigl(1 - P_d'(x_m^t)\bigr), & S_m^t(x_m^t) = \mathrm{static} \\ -\log\bigl(P_d'(x_m^t)\bigr), & S_m^t(x_m^t) = \mathrm{dynamic} \end{cases}$$

The smoothness term $E_s$ encourages the segmentation boundary to coincide with image boundaries as closely as possible; it is defined as:

$$E_s(S_m^t; I_m^t) = \lambda \sum_x \sum_{y \in N(x)} \frac{\bigl|S_m^t(x) - S_m^t(y)\bigr|}{1 + \bigl\|I_m^t(x) - I_m^t(y)\bigr\|^2}$$

The energy-optimized dynamic/static segmentation is then further refined with the Grabcut segmentation technique, which removes jagged artifacts along the segmentation boundary and yields the final temporally consistent dynamic/static division.
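
The temporal dynamic probability and the two terms of this segmentation energy can be sketched as follows; the flow-field layout, the clipping of the log terms, the smoothing weight, and the neighbourhood radius are illustrative assumptions (the full energy would then be minimized with belief propagation or graph cuts, and the result refined with Grabcut).

```python
import numpy as np

def temporal_dynamic_probability(seg_maps, flows, x, y, t, radius=5):
    # P'_d(x): track the pixel with optical flow to the neighbouring frames of
    # the same video and count how often it lands on a "dynamic" label (== 1).
    votes = []
    for tp in range(t - radius, t + radius + 1):
        if tp == t or tp < 0 or tp >= len(seg_maps):
            continue
        dx, dy = flows[(t, tp)][y, x]                     # offset O_m^{t->t'}
        xp, yp = int(round(x + dx)), int(round(y + dy))
        h, w = seg_maps[tp].shape
        if 0 <= xp < w and 0 <= yp < h:
            votes.append(seg_maps[tp][yp, xp] == 1)
    return float(np.mean(votes)) if votes else 0.0

def seg_data_cost(p_dyn, eps=1e-6):
    # e_d: per-pixel cost of labelling static (channel 0) or dynamic (channel 1).
    p = np.clip(p_dyn, eps, 1.0 - eps)
    return np.stack([-np.log(1.0 - p), -np.log(p)], axis=0)

def seg_smooth_weight(image, lam=1.0):
    # Pairwise weight for horizontal neighbours: label changes are cheap where
    # the image gradient is strong (vertical neighbours handled analogously).
    # lam is a placeholder smoothing weight, not a value given in the text.
    diff = np.linalg.norm(np.diff(image.astype(float), axis=1), axis=-1)
    return lam / (1.0 + diff ** 2)
```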

Step 3) is as follows:

(1) For static pixels, use bundle optimization to accumulate the color and geometric consistency constraint information between the pixels of the current frame and the corresponding pixels in multiple adjacent frames of the multi-view video, and optimize the static depth values of the current frame accordingly;

(2) For a dynamic pixel $x_m^t$, let its candidate depth be d. First project it with d into the video m' at the same time t to obtain the corresponding pixel $x_{m'}^t$, and compare the color and geometric consistency of $x_m^t$ and $x_{m'}^t$, computed as:

$$L_g(x_m^t, x_{m'}^t) = p_c(x_m^t, x_{m'}^t)\; p_g(x_m^t, x_{m'}^t)$$

where $p_c$ estimates the color consistency of $x_m^t$ and $x_{m'}^t$; it is computed as:

$$p_c(x_m^t, x_{m'}^t) = \frac{\sigma_c}{\sigma_c + \bigl\|I_m^t(x_m^t) - I_{m'}^t(x_{m'}^t)\bigr\|_1}$$

$\sigma_c$ controls the sensitivity to color differences;

$p_g$ estimates the geometric consistency of $x_m^t$ and $x_{m'}^t$; it is computed as:

$$p_g(x_m^t, x_{m'}^t) = \frac{\sigma_g}{\sigma_g + d_g\bigl(x_m^t, x_{m'}^t; D_m^t, D_{m'}^t\bigr)}$$

$\sigma_g$ controls the sensitivity to depth differences. The symmetric projection error function $d_g$ projects $x_m^t$ into the video m' at the same time t and computes the distance between this projection and $x_{m'}^t$, likewise projects $x_{m'}^t$ into the video m at time t and computes the distance between that projection and $x_m^t$, and then takes the average of the two distances;
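
These two robust likelihood terms can be sketched as below. The values of $\sigma_c$ and $\sigma_g$ are not specified in this passage, so the defaults here are placeholders, and the symmetric reprojection error is assumed to have been computed from the two depth maps beforehand.

```python
import numpy as np

def color_consistency(color_ref, color_proj, sigma_c=10.0):
    # p_c: robust colour agreement between a pixel and its projection
    # (L1 colour distance; sigma_c = 10.0 is a placeholder).
    diff = np.abs(np.asarray(color_ref, float) - np.asarray(color_proj, float)).sum()
    return sigma_c / (sigma_c + diff)

def geometric_consistency(sym_reproj_error, sigma_g=1.0):
    # p_g: robust geometric agreement from the symmetric reprojection error d_g
    # (average of the two cross-projection distances; sigma_g is a placeholder).
    return sigma_g / (sigma_g + sym_reproj_error)

def pair_likelihood(color_ref, color_proj, sym_reproj_error,
                    sigma_c=10.0, sigma_g=1.0):
    # L_g = p_c * p_g for one candidate depth and one other view.
    return (color_consistency(color_ref, color_proj, sigma_c)
            * geometric_consistency(sym_reproj_error, sigma_g))
```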

Next, optical flow is used to track $x_m^t$ and $x_{m'}^t$ to an adjacent time t', giving the corresponding pixels $\hat{x}_m^{t'}$ and $\hat{x}_{m'}^{t'}$, whose color and geometric consistency is compared in the same way:

$$L_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'}) = p_c(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})\; p_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})$$

The color and geometric consistency estimates of multiple adjacent time instants are accumulated, and the data term of the energy equation for optimizing the depth of dynamic pixels is redefined accordingly:

$$E_d'(D_m^t; \hat{I}, \hat{D}) = \sum_{x_m^t} \left( 1 - \frac{\sum_{t' \in N(t)} \sum_{m' \neq m} L_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})}{(M-1)\,|N(t)|} \right)$$

The energy optimization equation is then solved with the redefined data term, optimizing the depth values of the dynamic pixels of each frame in the spatio-temporal domain.
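
For one dynamic pixel, the accumulation of these likelihoods into the redefined data term can be sketched as follows; the winner-take-all fallback at the end is only for illustration, since the patent minimizes the full energy (data plus smoothness) with belief propagation.

```python
import numpy as np

def dynamic_data_cost(likelihoods, num_views, num_neighbor_times):
    # E'_d contribution for one pixel and one candidate depth d: one minus the
    # average L_g over the other M-1 views and the |N(t)| neighbouring times.
    return 1.0 - np.sum(likelihoods) / ((num_views - 1) * num_neighbor_times)

def best_candidate_depth(costs_per_candidate, depth_candidates):
    # Illustrative winner-take-all over the sampled candidate depths.
    return depth_candidates[int(np.argmin(costs_per_candidate))]
```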

For the dynamic objects contained in a video scene, existing depth recovery methods struggle to achieve high accuracy and spatio-temporal consistency. They usually require a relatively large number of fixed, synchronized cameras to capture the scene, a shooting setup that is mainly used for dynamic scenes inside a laboratory and imposes many restrictions in practice. The method for spatio-temporally consistent depth recovery of dynamic scene videos captured by multiple synchronized cameras proposed by the present invention can recover an accurate depth map at every time instant for both the dynamic and the static objects in a multi-view video, while also keeping the depth maps highly consistent across time instants. The method allows the cameras to move freely and independently and can handle dynamic scenes captured by a small number of cameras (as few as two), which makes it more practical in real shooting.

Brief Description of the Drawings

Fig. 1 is a flowchart of the method for spatio-temporally consistent depth recovery of dynamic scene videos captured by multiple synchronized cameras;

Fig. 2(a) is one frame of a video sequence;

Fig. 2(b) is the frame synchronized with Fig. 2(a);

Fig. 2(c) is the initial depth map of Fig. 2(a);

Fig. 2(d) is the visibility map estimated from Fig. 2(a) and Fig. 2(b);

Fig. 2(e) is the initial depth map corrected by plane fitting using Fig. 2(d);

Fig. 3(a) is the dynamic probability map of Fig. 2(a);

Fig. 3(b) is the dynamic/static segmentation obtained from Fig. 3(a) after binarization followed by boundary adjustment and denoising with Mean-shift segmentation;

Fig. 3(c) is the segmentation after temporal optimization;

Fig. 3(d) is the segmentation refined with the Grabcut technique;

Fig. 3(e) shows close-ups of the boxed regions of Figs. 3(a)-(d);

Fig. 4(a) is one frame of a video sequence;

Fig. 4(b) is the dynamic/static segmentation of Fig. 4(a);

Fig. 4(c) is the depth map of Fig. 4(a) after spatio-temporal consistency optimization;

Fig. 4(d) shows close-ups of the boxed regions of Fig. 4(a) and Fig. 4(c);

Fig. 4(e) is another frame of the video sequence;

Fig. 4(f) is the depth map of Fig. 4(e) after spatio-temporal consistency optimization;

Fig. 4(g) is the 3D scene model reconstructed from Fig. 4(f), with texture mapping applied;

Fig. 5 is a schematic diagram of the spatio-temporally consistent depth optimization.

Detailed Description of the Embodiments

The steps of the method for spatio-temporally consistent depth recovery of dynamic scene videos captured by multiple synchronized cameras are as follows:

1) Using multi-view geometry combined with DAISY feature vectors, perform stereo matching on the multi-view frames at each time instant to obtain the initial depth map of the multi-view video at that instant;

2) Using the initial depth maps obtained in step 1), compute a dynamic probability map for each frame of the multi-view video, and use the dynamic probability map to divide the pixels of each frame into dynamic pixels and static pixels;

3) For the dynamic and static pixels divided in step 2), perform spatio-temporally consistent depth optimization with different optimization methods. For static pixels, use bundle optimization, combining the color and geometric consistency constraints of multiple adjacent time instants; for dynamic pixels, accumulate the color and geometric consistency constraint information between corresponding pixels of the multiple cameras at multiple adjacent time instants, and optimize the dynamic depth values at each instant for spatio-temporal consistency.

Step 1) is as follows:

(1) Using multi-view geometry combined with DAISY feature descriptors, perform stereo matching on the multi-view frames at the same time instant, and solve for the initial depth map of each frame through the following energy optimization equation:

$$E_D(D_m^t; \hat{I}^{(t)}) = E_d(D_m^t; \hat{I}^{(t)}) + E_s(D_m^t)$$

where $\hat{I}^{(t)}$ denotes the M synchronized multi-view frames at time t, $I_m^t$ denotes the frame of the m-th video at time t, and $D_m^t$ denotes the depth map of the m-th video at time t; $E_d(D_m^t; \hat{I}^{(t)})$ is the data term, measuring the DAISY feature similarity between the pixels of $I_m^t$ and their projections, computed from $D_m^t$, into the remaining frames of $\hat{I}^{(t)}$; it is computed as:

$$E_d(D_m^t; \hat{I}^{(t)}) = \sum_{x_m^t} \frac{\sum_{m' \neq m} L_d\bigl(x_m^t, D_m^t(x_m^t); I_m^t, I_{m'}^t\bigr)}{M-1}$$

where $L_d$ is a penalty function that estimates the DAISY feature similarity of corresponding pixels, each pixel being represented by its DAISY feature descriptor, and $x_{m'}^t$ is the projection of $x_m^t$ into $I_{m'}^t$ obtained with the depth $D_m^t(x_m^t)$; $E_s(D_m^t)$ is the smoothness term, measuring the smoothness of depth between adjacent pixels x and y; it is computed as:

$$E_s(D_m^t) = \lambda \sum_x \sum_{y \in N(x)} \min\bigl\{\,|D_m^t(x) - D_m^t(y)|,\ \eta\,\bigr\}$$

where the smoothing weight λ is 0.008 and the truncation value η of the depth difference is 3;

(2) Use the consistency of the initial depths of the multi-view frames in 3D space to determine whether each pixel of each frame is visible in the remaining cameras at the same time instant, thereby obtaining pairwise visibility maps between the cameras at that instant. The visibility map is computed as:

$$V_{m \to m'}^t(x_m^t) = \begin{cases} 1, & \bigl|D_{m \to m'}^t(x_m^t) - D_{m'}^t(x_{m'}^t)\bigr| \le \delta_d \\ 0, & \bigl|D_{m \to m'}^t(x_m^t) - D_{m'}^t(x_{m'}^t)\bigr| > \delta_d \end{cases}$$

where $V_{m \to m'}^t(x_m^t)$ indicates whether $x_m^t$ is visible in $I_{m'}^t$ (1 means visible, 0 means invisible); $\delta_d$ is the depth-difference threshold, and $D_{m \to m'}^t(x_m^t)$ is obtained by projecting $x_m^t$ onto $I_{m'}^t$ using $D_m^t(x_m^t)$. Using the resulting visibility maps, an overall visibility is computed for each pixel $x_m^t$: it is 0 if $x_m^t$ is invisible in all the remaining video frames at time t, and 1 otherwise;

(3) Re-initialize the depth map of each frame using the visibility maps just obtained, comparing and estimating the DAISY feature similarity only at the visible pixels. In addition, because the initial depth values of the invisible (occluded) pixels may be erroneous, segment each frame with Mean-shift; for each segment, fit a plane with parameters [a, b, c] to the depths of the visible pixels, and use the fitted plane to redefine the data term of the invisible pixels:

$$E_d(x_m^t, D_m^t) = \sum_{x_m^t} \frac{\sigma_d}{\sigma_d + \bigl|a x + b y + c - D_m^t(x_m^t)\bigr|}$$

where $\sigma_d$ controls the sensitivity of the data term to the distance between the depth value and the fitted plane, and x and y are the coordinates of the pixel $x_m^t$; energy optimization with the redefined data term then corrects the erroneous depth values of the occluded pixels.

Step 2) is as follows:

(1) For each pixel of each frame, project it to the frames at the remaining time instants using its initial depth, compare the geometric and color consistency between the pixel in the current frame and its corresponding positions in those frames, and count the proportion of the remaining frames in which the depth value and color value are consistent as the probability that the pixel belongs to a dynamic object, thereby obtaining the dynamic probability map of each frame; it is computed as:

$$P_d(x_m^t) = \frac{\bigl|\{(m', t') \in N(m,t) \mid C_{m \to m'}^{t \to t'}(x_m^t) = \mathrm{dynamic}\}\bigr|}{|N(m,t)|}$$

where the heuristic function $C_{m \to m'}^{t \to t'}$ is used to judge whether the geometry and color of $x_m^t$ are consistent in the remaining frames: first the depth difference between $x_m^t$ and its corresponding position is compared; if the depth value at the corresponding position is not similar to the depth of $x_m^t$, the geometry is considered inconsistent; if the depth values are similar, their color values are compared, and if the colors are similar the color values are considered consistent, otherwise they are considered inconsistent. The proportion of the remaining frames having consistent depth and color values is counted as the probability that the pixel belongs to a dynamic object;

(2) Binarize the dynamic probability map with a threshold $\eta_p$ of 0.4 to obtain the initial dynamic/static segmentation of each frame. Over-segment each frame with Mean-shift (that is, segment it at fine granularity); for each segment, compute the proportion of pixels that are dynamic after binarization, and if the proportion exceeds 0.5 mark all pixels of the segment as dynamic, otherwise mark them as static. This adjusts the boundaries of the binarized segmentation and removes noise;

(3) Using the coordinate offsets of corresponding pixels between images at consecutive time instants, track the pixels of each frame to the adjacent frames of the same video to find the corresponding pixels, and count the proportion of frames in which the corresponding pixel's segmentation label is dynamic; from this the temporal dynamic probability of the pixel is computed as:

$$P_d'(x_m^t) = \frac{\bigl|\{t' \in N(t) \mid S_m^{t'}\bigl(x_m^t + O_m^{t \to t'}(x_m^t)\bigr) = \mathrm{dynamic}\}\bigr|}{|N(t)|}$$

where $O_m^{t \to t'}(x_m^t)$ denotes the optical-flow offset of $x_m^t$ from time t to time t', $S_m^{t'}$ denotes the dynamic/static segmentation label of the corresponding pixel at time t', and N(t) denotes the 5 consecutive adjacent frames before and after t. Using the temporal dynamic probability, the dynamic/static segmentation of each frame is optimized through the following energy optimization equation:

$$E_S(S_m^t; P_d', I_m^t) = E_d(S_m^t; P_d') + E_s(S_m^t; I_m^t)$$

where $S_m^t$ denotes the dynamic/static segmentation of frame t of video m; the data term $E_d$ is defined as:

$$E_d(S_m^t; P_d') = \sum_{x_m^t} e_d\bigl(S_m^t(x_m^t)\bigr)$$

$$e_d\bigl(S_m^t(x_m^t)\bigr) = \begin{cases} -\log\bigl(1 - P_d'(x_m^t)\bigr), & S_m^t(x_m^t) = \mathrm{static} \\ -\log\bigl(P_d'(x_m^t)\bigr), & S_m^t(x_m^t) = \mathrm{dynamic} \end{cases}$$

The smoothness term $E_s$ encourages the segmentation boundary to coincide with image boundaries as closely as possible; it is defined as:

$$E_s(S_m^t; I_m^t) = \lambda \sum_x \sum_{y \in N(x)} \frac{\bigl|S_m^t(x) - S_m^t(y)\bigr|}{1 + \bigl\|I_m^t(x) - I_m^t(y)\bigr\|^2}$$

The energy-optimized dynamic/static segmentation is then further refined with the Grabcut segmentation technique, which removes jagged artifacts along the segmentation boundary and yields the final temporally consistent dynamic/static division.

Step 3) is as follows:

(1) For static pixels, use bundle optimization to accumulate the color and geometric consistency constraint information between the pixels of the current frame and the corresponding pixels in multiple adjacent frames of the multi-view video, and optimize the static depth values of the current frame accordingly;

(2) For a dynamic pixel $x_m^t$, let its candidate depth be d. First project it with d into the video m' at the same time t to obtain the corresponding pixel $x_{m'}^t$, and compare the color and geometric consistency of $x_m^t$ and $x_{m'}^t$, computed as:

$$L_g(x_m^t, x_{m'}^t) = p_c(x_m^t, x_{m'}^t)\; p_g(x_m^t, x_{m'}^t)$$

where $p_c$ estimates the color consistency of $x_m^t$ and $x_{m'}^t$; it is computed as:

$$p_c(x_m^t, x_{m'}^t) = \frac{\sigma_c}{\sigma_c + \bigl\|I_m^t(x_m^t) - I_{m'}^t(x_{m'}^t)\bigr\|_1}$$

$\sigma_c$ controls the sensitivity to color differences;

$p_g$ estimates the geometric consistency of $x_m^t$ and $x_{m'}^t$; it is computed as:

$$p_g(x_m^t, x_{m'}^t) = \frac{\sigma_g}{\sigma_g + d_g\bigl(x_m^t, x_{m'}^t; D_m^t, D_{m'}^t\bigr)}$$

$\sigma_g$ controls the sensitivity to depth differences. The symmetric projection error function $d_g$ projects $x_m^t$ into the video m' at the same time t and computes the distance between this projection and $x_{m'}^t$, likewise projects $x_{m'}^t$ into the video m at time t and computes the distance between that projection and $x_m^t$, and then takes the average of the two distances;

Next, optical flow is used to track $x_m^t$ and $x_{m'}^t$ to an adjacent time t', giving the corresponding pixels $\hat{x}_m^{t'}$ and $\hat{x}_{m'}^{t'}$, whose color and geometric consistency is compared in the same way:

$$L_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'}) = p_c(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})\; p_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})$$

The color and geometric consistency estimates of multiple adjacent time instants are accumulated, and the data term of the energy equation for optimizing the depth of dynamic pixels is redefined accordingly:

$$E_d'(D_m^t; \hat{I}, \hat{D}) = \sum_{x_m^t} \left( 1 - \frac{\sum_{t' \in N(t)} \sum_{m' \neq m} L_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})}{(M-1)\,|N(t)|} \right)$$

The energy optimization equation is then solved with the redefined data term, optimizing the depth values of the dynamic pixels of each frame in the spatio-temporal domain.

Example

As shown in Fig. 1, the steps of the method for spatio-temporally consistent depth recovery of dynamic scene videos captured by multiple synchronized cameras are as follows:

1) Using multi-view geometry combined with DAISY feature vectors, perform stereo matching on the multi-view frames at each time instant to obtain the initial depth map of the multi-view video at that instant;

2) Using the initial depth maps obtained in step 1), compute a dynamic probability map for each frame of the multi-view video, and use the dynamic probability map to classify the pixels of each frame as dynamic or static;

3) For the dynamic and static pixels divided in step 2), perform spatio-temporally consistent depth optimization with different optimization methods. For static points, use bundle optimization, combining the color and geometric consistency constraints of multiple adjacent time instants; for dynamic points, accumulate the color and geometric consistency constraint information between corresponding pixels of the multiple cameras at multiple adjacent time instants, and optimize the dynamic depth values at each instant for spatio-temporal consistency.

Step 1) proceeds as follows:

(1) Using multi-view geometry combined with DAISY feature descriptors, perform stereo matching on the binocular frames at the same time instant shown in Fig. 2(a) and Fig. 2(b), and solve for the initial depth map of each frame through the energy optimization equation, as shown in Fig. 2(c);

(2) Use the consistency of the initial depths of the multi-view frames in 3D space to determine whether each pixel of each frame is visible in the remaining cameras at the same time instant, thereby obtaining pairwise visibility maps between the cameras at that instant, as shown in Fig. 2(d);

(3) Re-initialize the depth map of each frame using the visibility maps just obtained, comparing and estimating the DAISY feature similarity only at the visible pixels. In addition, when the initial depth values of invisible pixels are erroneous, segment each frame with Mean-shift; for each segment, fit a plane to the depths of the visible pixels and use the fitted plane to fill in and correct the depth values of the invisible pixels, as shown in Fig. 2(e);

Step 2) proceeds as follows:

(1) For each pixel of each frame, project it to the frames at the remaining time instants using its initial depth, compare the geometric and color consistency between the pixel in the current frame and its corresponding positions in those frames, and count the proportion of the remaining frames in which the depth value and color value are consistent as the probability that the pixel belongs to a dynamic object, thereby obtaining the dynamic probability map of each frame, as shown in Fig. 3(a);

(2) Binarize the dynamic probability map to obtain the initial dynamic/static segmentation of each frame. Over-segment each frame with Mean-shift (that is, segment it at fine granularity); for each segment, compute the proportion of pixels that are dynamic after binarization, and if the proportion exceeds 0.5 mark all pixels of the segment as dynamic, otherwise mark them as static. This adjusts the boundaries of the binarized segmentation and removes noise, as shown in Fig. 3(b);

(3) Using the coordinate offsets of corresponding pixels between images at consecutive time instants, track the pixels of each frame to the adjacent frames of the same video to find the corresponding pixels, count the proportion of frames in which the corresponding pixel's segmentation label is dynamic, compute from this the temporal dynamic probability of each pixel, and optimize the dynamic/static segmentation of each frame through the energy optimization equation, as shown in Fig. 3(c). The result of Fig. 3(c) is then further refined with the Grabcut segmentation technique, which removes jagged artifacts along the segmentation boundary and yields the final temporally consistent dynamic/static division, as shown in Fig. 3(d);

Step 3) proceeds as follows:

(1) For static points, use bundle optimization to accumulate the color and geometric consistency constraint information between the pixels of the current frame and the corresponding pixels in multiple adjacent frames of the multi-view video, and optimize the static depth values of the current frame accordingly;

(2) The spatio-temporally consistent depth optimization for dynamic points is illustrated in Fig. 5. Let the candidate depth of a pixel be d. First project the pixel with d into the video m' at the same time t to obtain the corresponding pixel, and compare the color and geometric consistency of the two pixels. Next, use optical flow to track both pixels to an adjacent time t' to obtain the corresponding pixels there, and compare their color and geometric consistency. The color and geometric consistency estimates of multiple adjacent time instants are accumulated, and the dynamic pixel depth values of each frame are optimized in the spatio-temporal domain with the energy optimization equation, giving depth maps that are consistent over the spatio-temporal domain, as shown in Fig. 4(c) and Fig. 4(f).
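
The symmetric projection error $d_g$ used in this comparison can be sketched with a standard pinhole model as below. It is an illustrative sketch only: the camera convention (x_cam = R·X + t, depth taken as the camera-space z) and the helper names are assumptions, not details given by the patent.

```python
import numpy as np

def project(K, R, t, X):
    # Pinhole projection of a world-space 3D point X into pixel coordinates.
    x_cam = R @ X + t
    x_img = K @ x_cam
    return x_img[:2] / x_img[2], x_cam[2]          # (pixel, depth)

def backproject(K, R, t, pixel, depth):
    # Lift a pixel with known depth back to world coordinates.
    x_cam = depth * (np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0]))
    return R.T @ (x_cam - t)

def symmetric_projection_error(cam_m, cam_mp, x_m, d_m, x_mp, d_mp):
    # d_g: project x_m (with its depth) into view m' and measure the distance
    # to x_mp; project x_mp (with its depth) into view m and measure the
    # distance to x_m; return the average of the two distances.
    K_m, R_m, t_m = cam_m
    K_mp, R_mp, t_mp = cam_mp
    X_m = backproject(K_m, R_m, t_m, x_m, d_m)
    X_mp = backproject(K_mp, R_mp, t_mp, x_mp, d_mp)
    proj_in_mp, _ = project(K_mp, R_mp, t_mp, X_m)
    proj_in_m, _ = project(K_m, R_m, t_m, X_mp)
    err1 = np.linalg.norm(proj_in_mp - np.asarray(x_mp, float))
    err2 = np.linalg.norm(proj_in_m - np.asarray(x_m, float))
    return 0.5 * (err1 + err2)
```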

Claims (4)

1.一种对于多目同步摄像机拍摄的动态场景视频的时空一致性深度恢复的方法,其特征在于它的步骤如下:  1. a method for the spatio-temporal consistency depth restoration of the dynamic scene video of multi-eye synchronous camera shooting, it is characterized in that its steps are as follows: 1)利用多视图几何方法结合DAISY特征向量,对于同一时刻的多目视频帧进行立体匹配,得到多目视频每一时刻的初始化深度图;  1) Using the multi-view geometry method combined with the DAISY feature vector, perform stereo matching on the multi-view video frames at the same moment, and obtain the initial depth map of each moment of the multi-view video; 2)利用步骤1)得到的初始化深度图对于多目视频的每一帧图像计算动态概率图,并利用动态概率图对每帧图像进行动态像素点和静态像素点的划分;  2) Using the initialization depth map obtained in step 1) to calculate a dynamic probability map for each frame of multi-view video, and use the dynamic probability map to divide each frame of images into dynamic pixels and static pixels; 3)对于步骤2)所划分的动态像素点和静态像素点,利用不同的优化方法进行时空一致性的深度优化,对于静态像素点,利用bundle optimization方法结合多个相邻时刻的颜色和几何一致性约束进行优化;对于动态像素点,统计多个相邻时刻的多目摄像机之间对应像素点的颜色和几何一致性约束信息,由此对每一时刻动态深度值进行时空一致性优化。  3) For the dynamic pixels and static pixels divided in step 2), use different optimization methods to optimize the depth of space-time consistency. For static pixels, use the bundle optimization method to combine the color and geometric consistency of multiple adjacent moments For dynamic pixels, the color and geometric consistency constraint information of corresponding pixels between multi-cameras at multiple adjacent moments are counted, so as to optimize the temporal and spatial consistency of dynamic depth values at each moment. the 2.根据权利要求1中所述的一种对于多目同步摄像机拍摄的动态场景视频的时空一致性深度恢复的方法,其特征在于所述的步骤1)为:  2. according to a kind of method for the space-time consistency depth recovery of the dynamic scene video of multi-purpose synchronous camera shooting according to claim 1, it is characterized in that described step 1) is: (1)利用多视图几何方法结合DAISY特征描述符,对于同一时刻的多目视频帧进行立体匹配,通过如下能量优化方程式求解每一时刻图像帧的初始化深度图:  (1) Use the multi-view geometry method combined with the DAISY feature descriptor to perform stereo matching on the multi-view video frames at the same time, and solve the initialization depth map of the image frame at each time through the following energy optimization equation: 其中表示在t时刻的M个多目同步视频帧,表示第m个视频的t时刻的图像帧,表示第m个视频的t时刻的深度图;是数据项,表示中像素点与根据计算的中其余图像帧投影点之间的DAISY特征相似度,其计算公式如下:  in Represents M multi-view synchronous video frames at time t, Indicates the image frame at time t of the mth video, Indicates the depth map at time t of the mth video; is a data item, representing Middle pixel and according to computational The DAISY feature similarity between the projection points of the remaining image frames in , the calculation formula is as follows: 其中是用来估计对应像素的DAISY特征相似度的惩罚函数,表示像素点的DAISY特征描述符,利用 投影至中的投影位置;是平滑项,表示相邻像素x、y之间的深度平滑程度,其计算公式如下:  in is a penalty function used to estimate the DAISY feature similarity of the corresponding pixel, represent pixels The DAISY feature descriptor, yes use Project to The projection position in ; Is a smoothing item, indicating the degree of depth smoothness between adjacent pixels x, y, and its calculation formula is as follows: 其中平滑权重λ为0.008,深度差的截断值η为3;  Among them, the smoothing weight λ is 0.008, and the cut-off value η of the depth difference is 3; (2)利用多目视频帧的初始化深度在3D空间中的一致性来判断每帧图像中的每个像素点在同一时刻其余摄像机中是否可见,从而得到同一时刻多个摄像机两两之间的可视性图;可视性图的计算公式如下:  (2) Use the consistency of the initialization depth of the multi-view video frame in 3D space to judge whether each pixel in each frame of image is visible in other cameras at the same time, so as to obtain the distance between two cameras at the 
same time Visibility map; the formula for calculating the visibility map is as follows: 其中表示中是否可见,1表示可见,0表示不可见;δd是深度差异的阈值,是通过利用投影至上计算得到的;利用所得到的可视性图,对每个像素计算总体可视性如果在t时刻所有其余视频帧中均不可见,则为0,否则为1;  in express exist Whether it is visible in , 1 means visible, 0 means invisible; δ d is the threshold of depth difference, is by using Will Project to Calculated above; using the obtained visibility map, for each pixel Calculate overall visibility if is invisible in all remaining video frames at time t, then is 0, otherwise is 1; (3)结合所求得的可视性图重新初始化每帧图像的深度图,DAISY特征相似度仅在可见的像素格点进行比较估计;并且,当的像素点的初始化深度值出现错误的情况下,利用Mean-shift技术对每帧图像进行分割,对于每个分割区域,利用的像素点的深度来拟合参数为[a,b,c]的平面,利用拟合的平面重新定义的像素点的数据项:  (3) Re-initialize the depth map of each frame image in combination with the obtained visibility map, and the DAISY feature similarity is only compared and estimated at the visible pixel grid points; and, when When the initial depth value of the pixel is wrong, the Mean-shift technology is used to segment each frame of the image. For each segmented area, the The depth of the pixel points to fit the plane with parameters [a,b,c], and use the fitted plane to redefine The data item of the pixel point: 其中σd用来控制数据项对于深度值与拟合平面的距离差的敏感度,x和y是像素点的坐标值;利用重新定义的数据项进行能量优化,从而纠正被遮挡像素点的错误深度值 。  Where σ d is used to control the sensitivity of the data item to the distance difference between the depth value and the fitting plane, and x and y are pixel points The coordinate value of ; use the redefined data item to optimize the energy, so as to correct the wrong depth value of the occluded pixel. 3.根据权利要求1中所述的一种对于多目同步摄像机拍摄的动态场景视频的时空一致性深度恢复的方法,其特征在于所述的步骤2)为:  3. according to a kind of method for the space-time consistency depth recovery of the dynamic scene video of multi-purpose synchronous camera shooting according to claim 1, it is characterized in that described step 2) is: (1)对于每帧图像中的像素点,利用初始化深度将其投影至其余时刻帧,比较像素点在当前时刻帧与其余时刻帧上的对应位置的几何与颜色的一致性,统计深度值和颜色值具有一致性的其余时刻帧数目所占的比例值,作为像素点属于动态物体的概率值,从而得到每帧图像的动态概率图,其计算公式如下:  (1) For the pixels in each frame of image, use the initialization depth Project it to the rest of the time frame, compare the geometric and color consistency of the corresponding position of the pixel point on the current time frame and the rest of the time frame, and count the proportion of the number of frames at the other time when the depth value and color value are consistent , as the probability value that the pixel belongs to the dynamic object, so as to obtain the dynamic probability map of each frame image, the calculation formula is as follows: 其中启发式函数用来判断在其余帧上几何和颜色是否一致;首先比较与对应位置的深度值差异,如果上的深度值与的深度不相似,则认为几何不一致,如果的深度值相似,则比较其颜色值,如果颜色相似,则认为的颜色值一致,否则认为颜色不一致;统计具有深度值和颜色值一致性的其余时刻帧数目所占的比例,作为像素点属于动态物体的概率值;  where the heuristic function used to judge in the remaining frames Whether the geometry and color are consistent; first compare with the corresponding position The difference in depth values, if exist The depth value on the The depths of are not similar, the geometry is considered inconsistent, if and have similar depth values, compare their color values, and if the colors are similar, consider and The color values of the pixels are consistent, otherwise the colors are considered to be inconsistent; the proportion of the number of frames at other times with consistent depth values and color values is counted as the probability value that the pixel belongs to a dynamic object; 
(2)将动态概率图利用大小为0.4的阈值ηp进行二值化得到每帧图像的初始动态/静态分割图;利用Mean-shift技术对每帧图像进行over-segmentation,即粒度小的图像分割,对于每个分割区域统计二值化后的动态像素点数目的比例值,如果比例值大于0.5,则将整个分割区域的像素点标记为动态,否则标记为静态,由此对二值化分割图进行边界调整和去噪;  (2) Binarize the dynamic probability map with a threshold η p of 0.4 to obtain the initial dynamic/static segmentation map of each frame of image; use Mean-shift technology to perform over-segmentation on each frame of image, that is, an image with a small granularity Segmentation. For each segmented area, the ratio value of the number of dynamic pixels after binarization is counted. If the ratio value is greater than 0.5, the pixels in the entire segmented area are marked as dynamic, otherwise they are marked as static, and the binarization is segmented. Figure for boundary adjustment and denoising; (3)利用连续时刻图像之间对应像素点的坐标偏移量,将每帧图像的像素点跟踪至同一视频中的相邻时刻帧寻找对应像素点,统计对应像素点分割标记为动态的帧数目所占的比例,由此计算像素点的时域动态概率,其计算公式如下:  (3) Utilize the coordinate offset of corresponding pixels between images at consecutive moments, track the pixels of each frame of images to adjacent frames in the same video to find the corresponding pixels, and count the corresponding pixels to segment and mark as dynamic frames The proportion of the number, from which the temporal dynamic probability of the pixel is calculated, and the calculation formula is as follows: 其中表示从t至t′时刻的光流偏移量,表示在t′时刻对应像素点的动态/静态分割标记,N(t)表示t前后连续5个相邻时刻帧;利用时域动态概率,通过如下能量优化方程式优化每一时刻图像帧的动态/静态分割图:  in express The optical flow offset from time t to t′, express The dynamic/static segmentation mark corresponding to the pixel at time t′, N(t) represents 5 consecutive adjacent time frames before and after t; using the dynamic probability in time domain, optimize the dynamic/static image frame at each time through the following energy optimization equation Split graph: 其中表示视频m在第t帧的动态/静态分割图;数据项Ed的定义如下:  in Represents the dynamic/static segmentation map of the tth frame of video m; the definition of data item E d is as follows: 平滑项Es促使分割边界与图像边界尽可能一致,其定义如下:  The smoothing term E s promotes the segmentation boundary to be as consistent as possible with the image boundary, which is defined as follows: 对于经能量优化后的动态/静态分割图,利用Grabcut分割技术进行进一步优化,除去分割边界上的毛刺,得到最终时序上一致动态/静态划分。  For the energy-optimized dynamic/static segmentation graph, the Grabcut segmentation technology is used to further optimize, remove the burrs on the segmentation boundary, and obtain a consistent dynamic/static division in the final timing. the 4.根据权利要求1中所述的一种对于多目同步摄像机拍摄的动态场景视频的时空一致性深度恢复的方法,其特征在于所述的步骤3)为:  4. 
4. The method for time-space consistent depth recovery of dynamic scene videos shot by multi-view synchronous cameras according to claim 1, wherein said step 3) is:

(1) For static pixels, the bundle optimization method is used to gather the color- and geometric-consistency constraints between the pixels of the current frame and the corresponding pixels in multiple adjacent frames of the multi-view videos, and the static depth values at the current time instant are optimized accordingly;

(2) For a dynamic pixel, let its candidate depth be d. The pixel is first projected according to d onto the video m′ of another camera at the same time t to obtain the corresponding pixel, and the color and geometric consistency of the two pixels is compared: the color-consistency term measures the similarity of the two color values, with σc controlling the sensitivity to color differences, and the geometric-consistency term measures the symmetric projection error, with σg controlling the sensitivity to depth differences. The symmetric projection-error function dg projects the pixel onto video m′ at the same time t and computes the distance between the projected position and the corresponding pixel, likewise projects the corresponding pixel back onto video m at time t and computes the distance to the original pixel, and takes the average of the two distances. Next, optical flow is used to track both pixels to an adjacent time t′ to obtain their corresponding pixels, and the color and geometric consistency of these corresponding pixels is compared in the same way. The color- and geometric-consistency estimates of multiple adjacent time instants are accumulated, which redefines the data term of the energy equation for dynamic-pixel depth optimization (see the sketch following this claim). Solving the energy optimization equation with the redefined data term optimizes the depth values of the dynamic pixels of each frame in the spatio-temporal domain.
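The spatio-temporal consistency measure used for dynamic pixels in step 3)(2) can be sketched as follows. The Gaussian forms of the color and geometric terms are assumptions made for illustration (the exact formulas of the claim are not reproduced in this text), and the projection helpers, function names and default parameters are hypothetical.

```python
import numpy as np

def color_consistency(c1, c2, sigma_c=10.0):
    """Color-similarity score in [0, 1]; a Gaussian form is assumed here."""
    diff = np.linalg.norm(np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float))
    return float(np.exp(-diff * diff / (2.0 * sigma_c ** 2)))

def geometric_consistency(sym_err, sigma_g=2.0):
    """Score of the symmetric projection error in pixels; Gaussian form assumed."""
    return float(np.exp(-sym_err * sym_err / (2.0 * sigma_g ** 2)))

def symmetric_projection_error(x_m, x_mp, project_m_to_mp, project_mp_to_m):
    """Average of the forward and backward reprojection distances.

    x_m, x_mp        : 2D pixel positions in views m and m' (np arrays)
    project_m_to_mp  : maps a pixel of view m into view m' for the candidate
                       depth d (hypothetical helper)
    project_mp_to_m  : maps a pixel of view m' back into view m
    """
    d_fwd = np.linalg.norm(project_m_to_mp(x_m) - x_mp)
    d_bwd = np.linalg.norm(project_mp_to_m(x_mp) - x_m)
    return 0.5 * (d_fwd + d_bwd)

def dynamic_data_term(observations, sigma_c=10.0, sigma_g=2.0):
    """Accumulate color/geometric agreement over several views and adjacent
    time instants; `observations` is a list of (c_ref, c_other, sym_err)."""
    score = 0.0
    for c_ref, c_other, sym_err in observations:
        score += (color_consistency(c_ref, c_other, sigma_c)
                  * geometric_consistency(sym_err, sigma_g))
    return score / max(len(observations), 1)
```

A depth sweep would evaluate `dynamic_data_term` for each candidate depth d of a dynamic pixel, feeding it the same-time correspondences across cameras and the optical-flow correspondences at adjacent times, and keep the depth with the best accumulated score.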
CN201210360976.0A 2012-09-25 2012-09-25 Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera Active CN103002309B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210360976.0A CN103002309B (en) 2012-09-25 2012-09-25 Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210360976.0A CN103002309B (en) 2012-09-25 2012-09-25 Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera

Publications (2)

Publication Number Publication Date
CN103002309A CN103002309A (en) 2013-03-27
CN103002309B true CN103002309B (en) 2014-12-24

Family

ID=47930367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210360976.0A Active CN103002309B (en) 2012-09-25 2012-09-25 Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera

Country Status (1)

Country Link
CN (1) CN103002309B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6369461B2 (en) * 2013-05-27 2018-08-08 ソニー株式会社 Image processing apparatus, image processing method, and program
CN104899855A (en) * 2014-03-06 2015-09-09 株式会社日立制作所 Three-dimensional obstacle detection method and apparatus
EP3007130A1 (en) 2014-10-08 2016-04-13 Thomson Licensing Method and apparatus for generating superpixel clusters
CN106296696B (en) * 2016-08-12 2019-05-24 深圳市利众信息科技有限公司 The processing method and image capture device of color of image consistency
CN106887015B (en) * 2017-01-19 2019-06-11 华中科技大学 An unconstrained multi-camera image matching method based on spatiotemporal consistency
CN107507236B (en) * 2017-09-04 2018-08-03 北京建筑大学 The progressive space-time restriction alignment schemes of level and device
CN108322730A (en) * 2018-03-09 2018-07-24 嘀拍信息科技南通有限公司 A kind of panorama depth camera system acquiring 360 degree of scene structures
CN109410145B (en) * 2018-11-01 2020-12-18 北京达佳互联信息技术有限公司 Time sequence smoothing method and device and electronic equipment
CN110782490B (en) * 2019-09-24 2022-07-05 武汉大学 Video depth map estimation method and device with space-time consistency
CN112738423B (en) * 2021-01-19 2022-02-25 深圳市前海手绘科技文化有限公司 Method and device for exporting animation video

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101945299A (en) * 2010-07-09 2011-01-12 清华大学 Camera-equipment-array based dynamic scene depth restoring method
CN102074020A (en) * 2010-12-31 2011-05-25 浙江大学 Method for performing multi-body depth recovery and segmentation on video

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101945299A (en) * 2010-07-09 2011-01-12 清华大学 Camera-equipment-array based dynamic scene depth restoring method
CN102074020A (en) * 2010-12-31 2011-05-25 浙江大学 Method for performing multi-body depth recovery and segmentation on video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Consistent Depth Maps Recovery from a Video Sequence; Guofeng Zhang et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2009-06-30; 974-988 *
Implementation method of extended depth based on energy minimization; Jiang Xiaohong et al.; Journal of Image and Graphics; 2006-12-31; 1854-1858 *

Also Published As

Publication number Publication date
CN103002309A (en) 2013-03-27

Similar Documents

Publication Publication Date Title
CN103002309B (en) Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera
Liu et al. Robust dynamic radiance fields
CN112435325B (en) VI-SLAM and depth estimation network-based unmanned aerial vehicle scene density reconstruction method
CN105654492B (en) Robust real-time three-dimensional method for reconstructing based on consumer level camera
CN102750711B (en) A kind of binocular video depth map calculating method based on Iamge Segmentation and estimation
CN110490919B (en) Monocular vision depth estimation method based on deep neural network
CN102903096B (en) Monocular video based object depth extraction method
Li et al. Markerless shape and motion capture from multiview video sequences
CN110490928A (en) A kind of camera Attitude estimation method based on deep neural network
CN102074020B (en) Method for performing multi-body depth recovery and segmentation on video
CN107833270A (en) Real-time object dimensional method for reconstructing based on depth camera
EP2595116A1 (en) Method for generating depth maps for converting moving 2d images to 3d
CN108038905A (en) A kind of Object reconstruction method based on super-pixel
Ramirez et al. Open challenges in deep stereo: the booster dataset
Tung et al. Complete multi-view reconstruction of dynamic scenes from probabilistic fusion of narrow and wide baseline stereo
CN104869387A (en) Method for acquiring binocular image maximum parallax based on optical flow method
EP3563346A1 (en) Method and device for joint segmentation and 3d reconstruction of a scene
CN103049929A (en) Multi-camera dynamic scene 3D (three-dimensional) rebuilding method based on joint optimization
CN106651943B (en) It is a kind of based on the light-field camera depth estimation method for blocking geometry complementation model
KR101125061B1 (en) A Method For Transforming 2D Video To 3D Video By Using LDI Method
Li et al. Deep learning based monocular depth prediction: Datasets, methods and applications
Birchfield et al. Correspondence as energy-based segmentation
Wang et al. Example-based video stereolization with foreground segmentation and depth propagation
Liu et al. Disparity Estimation in Stereo Sequences using Scene Flow.
Lipski et al. High resolution image correspondences for video Post-Production

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20210707

Address after: Room 288-8, 857 Shixin North Road, Ningwei Street, Xiaoshan District, Hangzhou City, Zhejiang Province

Patentee after: ZHEJIANG SHANGTANG TECHNOLOGY DEVELOPMENT Co.,Ltd.

Address before: 310027 No. 38, Zhejiang Road, Hangzhou, Zhejiang, Xihu District

Patentee before: ZHEJIANG University