CN102446366B - Time-space jointed multi-view video interpolation and three-dimensional modeling method
- Publication number: CN102446366B (application CN201110271761A)
- Authority
- CN
- China
- Prior art keywords
- interpolation
- frame
- camera
- dimensional
- angle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Processing Or Creating Images (AREA)
Abstract
The invention belongs to the technical field of computer multimedia. To provide a simple and practical multi-view video interpolation and three-dimensional modeling method, the invention adopts the following technical solution: a space-time joint multi-view video interpolation and three-dimensional modeling method, in which a multi-camera array is grouped at intervals and a three-dimensional model of the scene is reconstructed at every moment, comprising the following steps: 1) interpolate the uncaptured frames between two captured frames; 2) obtain images of the views not captured at a given moment by a model-assisted weighting method; 3) compute the accumulated energy spectra and extract key points; 4) describe the extracted key points with shape contexts and solve the matching by the Hungarian method; 5) obtain the final interpolated frame by solving a Poisson-editing optimization problem; 6) reconstruct and render the three-dimensional model of the scene. The invention is mainly applied to multi-view video processing and three-dimensional modeling.
Description
Technical Field
The invention belongs to the technical field of computer multimedia and specifically relates to a space-time joint multi-view video interpolation and three-dimensional modeling method.
Background Art
The acquisition, processing and communication of single-channel video have long since achieved key technical breakthroughs; the technology is mature and widely used in broadcast television, Internet video, intelligent transportation and other fields. However, traditional single-camera acquisition cannot convey depth, stereoscopy, or an all-around (view-variable) perception of an object. Multi-channel video acquisition and scene reconstruction based on multi-camera systems can deliver such an all-around visual experience, and related research has been a hotspot since the mid-1990s. Real-time acquisition and reconstruction of three-dimensional scenes with multi-camera systems is widely applied in free-viewpoint video, virtual reality, immersive video conferencing, film entertainment, stereoscopic video and motion analysis. Many well-known universities and research institutes, including Stanford, MIT, Carnegie Mellon, Columbia University, Mitsubishi Electric, Microsoft Research and the Max Planck Institute for Informatics, have built multi-camera acquisition systems for scene geometry capture, motion analysis and stereoscopic production. At present, multi-camera acquisition and reconstruction still struggles to deliver satisfactory results because of difficulties in camera setup and synchronization, storage and transmission, high-dimensional data processing, and high-speed motion capture. For capturing high-speed motion, one approach is to use multiple high-speed cameras, but these are expensive and have limited storage. Another is to use many inexpensive low-frame-rate cameras, grouped sensibly so that cameras within a group sample simultaneously while different groups sample in an interleaved fashion; this yields sparsely sampled space-time information from which a high frame rate is reconstructed by interpolation. Stanford University (Wilburn B, Joshi N, Vaish V, et al. High-speed videography using a dense camera array. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 2004. 294-301.) achieved single-view high-speed scene reproduction with a dense light-field array of 52 cameras running at 30 fps. That result, however, is limited to a single view: multi-view images cannot be obtained at every instant, so a three-dimensional model cannot be reconstructed at every instant. The first inventor of the present invention previously proposed a method for modeling high-speed moving objects with a ring-shaped low-frame-rate camera array (ZL200810103684.2) to achieve full-view three-dimensional reconstruction, but that method performs interpolation and reconstruction simply by intersecting visual-hull models, so its results are mediocre and not robust. Although existing video interpolation and image fusion methods can be used to obtain the uncaptured multi-view video, their results exhibit blurred or unsmooth regions.
Summary of the Invention
To overcome the deficiencies of the prior art and provide a simple and practical multi-view video interpolation and three-dimensional modeling method, the technical solution adopted by the present invention is a space-time joint multi-view video interpolation and three-dimensional modeling method that groups a multi-camera array at intervals: given n cameras with a frame rate of f frames per second, divide them at uniform intervals into m groups, where n and m are positive integers and n is an integer multiple of m. Cameras within a group capture synchronously, yielding videos of n/m views at each capture instant, while different groups capture interleaved at intervals of 1/(f·m) seconds, yielding videos at different instants. The proposed space-time joint multi-view video interpolation and three-dimensional modeling method then produces videos of all n views at every instant, from which a three-dimensional model of the scene is reconstructed at every instant. The method comprises the following steps (a timing sketch of the grouping appears after the step list):
1) For each camera, use an optical flow method to compute the forward and backward optical flow between two adjacent captured frames, and interpolate the uncaptured frames between them, i.e. the temporal interpolation frames.
2) For each capture instant, obtain the images of the views not captured at that instant, i.e. the spatial interpolation frames, by a model-assisted weighting method.
3) Compute the accumulated energy spectra, in the dual-tree discrete wavelet domain, of the temporal interpolation frame from step 1) and the spatial interpolation frame from step 2), and extract key points.
4) Describe the extracted key points with shape contexts, cast the shape-context key-point matching problem as a square assignment, i.e. weighted bipartite graph matching, problem, and solve it by the Hungarian method.
5) Obtain the final interpolated frame by solving a Poisson-editing optimization problem.
6) At each instant, reconstruct and render a three-dimensional model of the scene by a multi-view stereo method, using the images of all views, both captured and interpolated.
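As an illustration (not part of the original patent), the following minimal Python sketch computes each camera's trigger offset under this interleaved grouping; the round-robin assignment of cameras to groups and the function name are assumptions.

```python
def trigger_offsets(n, f, m):
    """Trigger start time in seconds for each of n cameras running at
    f frames/s, divided at uniform intervals into m groups. Group g
    fires g/(f*m) s after group 0, so the array as a whole samples the
    scene at f*m Hz while every instant is covered by n/m views."""
    assert n % m == 0, "n must be an integer multiple of m"
    # Round-robin assignment spreads the members of each group evenly
    # around the ring (an assumption consistent with 'uniform intervals').
    return {cam: (cam % m) / (f * m) for cam in range(n)}

# Embodiment values: 20 cameras at 30 fps in 4 groups give offsets
# cycling through 0, 1/120, 2/120 and 3/120 seconds.
print(trigger_offsets(20, 30.0, 4))
```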
The model-assisted weighting method specifically comprises the following steps:
21) Extract silhouette maps of the three-dimensional object from the captured view images by simple differencing or blue-screen segmentation.
22) Using the silhouette maps computed in step 21), reconstruct a rough three-dimensional model, namely the visual hull, by the EPVH method.
23) For each uncaptured view i, perform weighted interpolation using the images of its two nearest captured views j and k, with weights computed by equation (1), which survives only as an image in the source (a plausible reconstruction is given below). Here Θ and Φ are two constant angles denoting, respectively, the maximum allowed angle between camera viewing rays and the maximum allowed angle between a three-dimensional point's normal and a camera viewing ray; θ1 is the angle between viewing rays ri and rj, and θ2 the angle between viewing rays ri and rk; φ1 is the angle between the normal at the three-dimensional point p and viewing ray rj, and φ2 the angle between the normal at p and viewing ray rk; p is the intersection of the viewing ray through a given pixel of view i with the three-dimensional model.
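The exact form of equation (1) is not recoverable from the text. A plausible reconstruction, assuming each captured view's weight falls off linearly to zero as either angle reaches its allowed maximum (an assumption, not the patent's formula), is:

$$w_j=\Bigl(1-\frac{\theta_1}{\Theta}\Bigr)\Bigl(1-\frac{\phi_1}{\Phi}\Bigr),\qquad w_k=\Bigl(1-\frac{\theta_2}{\Theta}\Bigr)\Bigl(1-\frac{\phi_2}{\Phi}\Bigr)\tag{1}$$

with the interpolated pixel of view i then taken as the normalized blend $(w_j I_j + w_k I_k)/(w_j + w_k)$ of the corresponding pixels $I_j$ and $I_k$.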
The accumulated energy spectra in the dual-tree discrete wavelet domain are computed, and key points extracted, by the following steps:
31) Apply the dual-tree discrete wavelet transform to the spatial and temporal interpolation frames, decomposing each into S scales.
32) For the real part and the imaginary part separately, compute the key-point energy spectrum {Ms}1≤s≤S at each scale; the key-point energy at each pixel is computed from the six subband coefficients {c1, …, c6} of that pixel in the real or imaginary part, with parameters α and β adjusting the relative importance of the scales in the accumulated energy spectrum (the per-pixel formula survives only as an image in the source).
33) Interpolate each energy spectrum from step 32) up to the original image size with a two-dimensional Gaussian kernel; the interpolated spectrum at scale s is denoted gs(Ms).
34) Compute the accumulated energy spectra Ar and Ai of the real part and the imaginary part from the interpolated spectra, and combine them into the final accumulated energy spectrum (both combining formulas survive only as images in the source).
35) Extract the key points of the accumulated energy spectrum from step 34) by the SIFT method.
The matched shape-context key points serve as boundary constraints, and the final interpolated frame is obtained by solving the following optimization problem:

Δf|Ω = div v,  s.t.  f|∂Ω = f̃|∂Ω

where f is the unknown frame to be interpolated, Δ is the Laplacian operator, v = (u, v) is the gradient vector field of the temporal interpolation frame, div v is the divergence of v, f̃ is the spatial interpolation frame, ∂Ω is the boundary of the closed set Ω, "s.t." means "subject to", |Ω denotes restriction to the closed set Ω, and |∂Ω denotes restriction to its boundary.
Features and effects of the method of the present invention:
The method avoids the need for expensive high-speed cameras and the unsmooth results of existing video interpolation and image fusion methods. Through space-time sampling, space-time interpolation and space-time optimization, it achieves high-frame-rate multi-view video recovery and three-dimensional scene reconstruction under low-frame-rate camera acquisition. It has the following features:
1. The procedure is simple and easy to implement.
2. Space-time sampling and interpolation for non-planar camera systems. The system of low-frame-rate cameras is grouped sensibly so that each group is synchronized and evenly distributed. Different groups capture the dynamic scene interleaved in time. At each sampling instant, the images of the cameras that did not sample are interpolated spatially by the weighting method and temporally by the bidirectional optical flow method.
3. A shape-context optimization method based on the dual-tree discrete wavelet transform. The optimization of the space-time information is cast as an image Poisson-editing problem. Exploiting the shift invariance and directional selectivity of the dual-tree discrete wavelet transform, key points of interest near edges (high-frequency information) are extracted and then matched by the shape-context method to serve as boundary constraints.
The invention can achieve temporally dense three-dimensional reconstruction of dynamic scenes with a low-frame-rate camera system. The proposed method scales well: multi-view video recovery and dynamic three-dimensional scene reconstruction at higher temporal resolution can be obtained simply by adding more cameras or using cameras with higher frame rates.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow chart of the space-time joint multi-view video interpolation and three-dimensional modeling method of an embodiment of the present invention;
Fig. 2 shows an uncaptured frame of sequence 1 recovered with the proposed method and with two other methods;
Fig. 3 shows the visual hull model of sequence 1 and the three-dimensional model reconstructed with the proposed method;
Fig. 4 shows the dynamic three-dimensional model of sequence 2 reconstructed with the proposed method.
Detailed Description of the Embodiments
The present invention casts space-time joint multi-view video interpolation as an image Poisson-editing problem, in which key-point extraction and matching exploit the directionality of the dual-tree discrete wavelet transform (DDWT) and the robustness of the shape context, achieving high-frame-rate, high-quality multi-view video interpolation. The results feature good interpolation quality, high accuracy and accurate, complete reconstructed three-dimensional models, and are obtainable under the conditions of a low-frame-rate camera array with interleaved group sampling.
The space-time joint multi-view video interpolation and three-dimensional modeling method of the present invention is characterized as follows:
The multi-camera array is grouped at intervals (given n cameras with a frame rate of f frames per second, divided at uniform intervals into m groups, where n and m are positive integers and n is an integer multiple of m). Cameras within a group capture synchronously, yielding videos of n/m views at the same instant, while different groups capture interleaved at intervals of 1/(f·m) seconds, yielding videos at different instants. The proposed method produces videos of all n views at every instant, from which a three-dimensional model of the scene is reconstructed at every instant. The specific method comprises the following steps:
1) For each camera, use the optical flow method of Brox et al. (Brox T, Bruhn A, Papenberg N, et al. High accuracy optical flow estimation based on a theory for warping. Proceedings of European Conference on Computer Vision, volume 3024, 2004. 25-36.) to compute the forward and backward optical flow between two adjacent captured frames, and interpolate the m−1 frames between them. The temporal interpolation proceeds as follows (a code sketch follows these sub-steps):
11) Assume the motion path is linear, i.e. a pixel's position along the motion path in the frame to be interpolated is proportional to that frame's relative position between the two nearest captured frames; from the forward and backward flow, compute the forward-flow and backward-flow interpolations of the corresponding pixels of the frame to be interpolated.
12) For each pixel of the frame to be interpolated, take the average of its forward-flow and backward-flow interpolations as the final estimate.
13) Fill the unassigned pixel holes in the frame to be interpolated by eight-neighborhood filtering.
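A minimal Python sketch of sub-steps 11) to 13) is given below; it is an illustration, not the patent's implementation. OpenCV's Farneback flow stands in for the Brox method, forward splatting realizes the linear-motion assumption, and a normalized box filter realizes the eight-neighborhood hole filling; all parameter values are illustrative.

```python
import cv2
import numpy as np

def interpolate_frame(f0, f1, t):
    """Estimate the uncaptured frame at relative time t in (0, 1) between
    two grayscale captures f0 and f1 (sub-steps 11-13)."""
    fwd = cv2.calcOpticalFlowFarneback(f0, f1, None, 0.5, 4, 21, 5, 7, 1.5, 0)
    bwd = cv2.calcOpticalFlowFarneback(f1, f0, None, 0.5, 4, 21, 5, 7, 1.5, 0)
    h, w = f0.shape
    acc = np.zeros((h, w), np.float32)   # splatted intensities
    cnt = np.zeros((h, w), np.float32)   # contributions per target pixel
    ys, xs = np.mgrid[0:h, 0:w]
    # Sub-steps 11-12: splat each source pixel part-way along its (assumed
    # linear) motion path; forward and backward estimates average out.
    for src, flow, s in ((f0, fwd, t), (f1, bwd, 1.0 - t)):
        xt = np.clip(np.rint(xs + s * flow[..., 0]).astype(int), 0, w - 1)
        yt = np.clip(np.rint(ys + s * flow[..., 1]).astype(int), 0, h - 1)
        np.add.at(acc, (yt, xt), src.astype(np.float32))
        np.add.at(cnt, (yt, xt), 1.0)
    out = np.where(cnt > 0, acc / np.maximum(cnt, 1.0), 0.0)
    # Sub-step 13: fill holes with the average of their valid 8-neighbors.
    valid = (cnt > 0).astype(np.float32)
    fill = cv2.blur(out * valid, (3, 3)) / np.maximum(cv2.blur(valid, (3, 3)), 1e-6)
    out = np.where(valid > 0, out, fill)
    return np.clip(out, 0, 255).astype(f0.dtype)
```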
2) For each capture instant, obtain the images of the views not captured at that instant by the model-assisted weighting method, as follows:
21) Extract silhouette maps of the three-dimensional object from the captured view images by simple differencing or blue-screen segmentation.
22) Using the silhouette maps computed in step 21), reconstruct a rough three-dimensional model, namely the visual hull, by the EPVH method (Exact Polyhedral Visual Hulls; Franco J-S, Boyer E. Exact polyhedral visual hulls. Proceedings of British Machine Vision Conference, 2003. 329-338.).
23) For each uncaptured view i, perform weighted interpolation using the images of its two nearest captured views j and k, with weights computed by equation (1) as above, where Θ and Φ are two constant angles denoting, respectively, the maximum allowed angle between camera viewing rays and the maximum allowed angle between a three-dimensional point's normal and a camera viewing ray; θ1 is the angle between viewing rays ri and rj, and θ2 the angle between viewing rays ri and rk; φ1 is the angle between the normal at the three-dimensional point p and viewing ray rj, and φ2 the angle between the normal at p and viewing ray rk; p is the intersection of the viewing ray through a given pixel of view i with the three-dimensional model.
3) Compute the accumulated energy spectra, in the dual-tree wavelet transform domain, of the temporal interpolation frames from step 1) and the spatial interpolation frames from step 2), and extract key points, as follows (a code sketch follows these sub-steps):
31) Apply the dual-tree discrete wavelet transform to the spatial and temporal interpolation frames, decomposing each into S scales.
32) For the real part and the imaginary part separately, compute the key-point energy spectrum {Ms}1≤s≤S at each scale; the key-point energy at each pixel is computed from the six subband coefficients {c1, …, c6} of that pixel in the real or imaginary part, with parameters α and β adjusting the relative importance of the scales in the accumulated energy spectrum (the per-pixel formula survives only as an image in the source).
33) Interpolate each energy spectrum from step 32) up to the original image size with a two-dimensional Gaussian kernel; the interpolated spectrum at scale s is denoted gs(Ms).
34) Compute the accumulated energy spectra Ar and Ai of the real part and the imaginary part from the interpolated spectra, and combine them into the final accumulated energy spectrum (both combining formulas survive only as images in the source).
35) Extract the key points of the accumulated energy spectrum from step 34) by the SIFT (Scale-Invariant Feature Transform) method.
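The following Python sketch of sub-steps 31) to 35) uses the `dtcwt` package. Since the patent's per-pixel and cross-scale formulas survive only as images, the combination rules here are explicit assumptions: the per-pixel energy is taken as α·(Σi |ci|)^β, the upsampled spectra are summed over scales, and the real- and imaginary-part spectra are averaged; cubic-spline upsampling stands in for the two-dimensional Gaussian kernel.

```python
import numpy as np
import dtcwt                      # dual-tree complex wavelet transform package
from scipy.ndimage import zoom

def accumulated_energy_spectrum(img, S=3, alpha=1.0, beta=0.5):
    """Accumulated key-point energy spectrum of one interpolation frame over
    S dual-tree DWT scales (image sides assumed divisible by 2**S)."""
    pyramid = dtcwt.Transform2d().forward(img.astype(float), nlevels=S)
    acc_r = np.zeros(img.shape)
    acc_i = np.zeros(img.shape)
    for hp in pyramid.highpasses:              # hp: (h, w, 6) complex subbands
        m_r = alpha * np.abs(hp.real).sum(axis=2) ** beta   # assumed M_s, real
        m_i = alpha * np.abs(hp.imag).sum(axis=2) ** beta   # assumed M_s, imag
        factor = (img.shape[0] / m_r.shape[0], img.shape[1] / m_r.shape[1])
        acc_r += zoom(m_r, factor, order=3)    # g_s(M_s) at full image size
        acc_i += zoom(m_i, factor, order=3)
    # Assumed combination of real- and imaginary-part spectra; key points are
    # then extracted from the result with SIFT (sub-step 35), e.g. by running
    # cv2.SIFT_create().detect() on the spectrum normalized to uint8.
    return 0.5 * (acc_r + acc_i)
```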
4) Describe the extracted key points with shape contexts, cast the shape-context key-point matching problem as a square assignment (weighted bipartite matching) problem, and solve it by the Hungarian method; a sketch of this matching step follows.
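The sketch below illustrates this step with log-polar shape-context histograms (after Belongie et al.) compared by a chi-squared cost, the assignment being solved by `scipy.optimize.linear_sum_assignment`, a Hungarian-style solver; bin counts and radial limits are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def shape_contexts(pts, n_r=5, n_t=12):
    """Log-polar shape-context histogram for each 2-D keypoint."""
    pts = np.asarray(pts, float)
    n = len(pts)
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    d /= d[d > 0].mean()                         # scale normalization
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    desc = np.zeros((n, n_r * n_t))
    for i in range(n):
        o = np.flatnonzero(np.arange(n) != i)    # all other keypoints
        rel = pts[o] - pts[i]
        rb = np.clip(np.digitize(d[i, o], r_edges) - 1, 0, n_r - 1)
        tb = ((np.arctan2(rel[:, 1], rel[:, 0]) + np.pi)
              / (2 * np.pi) * n_t).astype(int) % n_t
        np.add.at(desc[i], rb * n_t + tb, 1.0)   # fill log-polar bins
        desc[i] /= max(desc[i].sum(), 1e-9)
    return desc

def match_keypoints(pts_a, pts_b):
    """Match two keypoint sets: chi-squared shape-context cost, solved as a
    weighted bipartite (square assignment) problem by the Hungarian method."""
    ha, hb = shape_contexts(pts_a), shape_contexts(pts_b)
    cost = 0.5 * ((ha[:, None] - hb[None, :]) ** 2
                  / (ha[:, None] + hb[None, :] + 1e-9)).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))
```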
5) Obtain the final interpolated frame by solving the following optimization problem (a solver sketch follows):

Δf|Ω = div v,  s.t.  f|∂Ω = f̃|∂Ω

where f is the unknown frame to be interpolated, Δ is the Laplacian operator, v = (u, v) is the gradient vector field of the temporal interpolation frame, div v is the divergence of v, f̃ is the spatial interpolation frame, and ∂Ω is the boundary of the closed set Ω.
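A minimal dense-grid Python sketch of this Poisson step, assuming Ω is the full image interior with Dirichlet values taken from the spatial interpolation frame on the border (the key-point constraints from step 4) are omitted for brevity):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def poisson_interpolate(f_spatial, vx, vy):
    """Solve  Δf = div v  on the interior, with f = f_spatial on the border.
    f_spatial: HxW spatial interpolation frame (float);
    vx, vy:    gradient field of the temporal interpolation frame."""
    h, w = f_spatial.shape
    div_v = np.zeros((h, w))
    div_v[:, 1:] += vx[:, 1:] - vx[:, :-1]      # d(vx)/dx, backward difference
    div_v[1:, :] += vy[1:, :] - vy[:-1, :]      # d(vy)/dy
    n = h * w
    idx = np.arange(n).reshape(h, w)
    A = sp.lil_matrix((n, n))
    b = np.empty(n)
    for y in range(h):
        for x in range(w):
            k = idx[y, x]
            if x in (0, w - 1) or y in (0, h - 1):
                A[k, k] = 1.0                   # Dirichlet boundary row
                b[k] = f_spatial[y, x]
            else:
                A[k, k] = -4.0                  # 5-point Laplacian stencil
                A[k, idx[y, x - 1]] = A[k, idx[y, x + 1]] = 1.0
                A[k, idx[y - 1, x]] = A[k, idx[y + 1, x]] = 1.0
                b[k] = div_v[y, x]
    return spla.spsolve(A.tocsr(), b).reshape(h, w)
```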
6) At each instant, reconstruct and render a three-dimensional model of the scene by a multi-view stereo method, using the images of all views (captured and interpolated).
The present invention proposes a space-time joint multi-view video interpolation and three-dimensional modeling method, described in detail below with reference to the drawings and an embodiment.
The system embodiment realizing the method is structured as follows: 20 cameras with a frame rate of 30 frames per second are distributed in a ring around the scene to be captured. The multi-camera array is divided at uniform intervals into 4 groups; cameras within a group capture synchronously, yielding videos of 5 views at the same instant, while different groups capture interleaved at intervals of 1/120 second, yielding videos at different instants. The proposed space-time joint multi-view video interpolation and three-dimensional modeling method produces videos of all 20 views at every instant, from which a three-dimensional model of the scene is reconstructed at every instant. As shown in Fig. 1, the flow of the method of this embodiment comprises the following steps:
1) For each camera, use the optical flow method of Brox et al. (cited above) to compute the forward and backward optical flow between two adjacent captured frames, and interpolate the 3 frames between them, as follows:
11) Assume the motion path is linear, i.e. a pixel's position along the motion path in the frame to be interpolated is proportional to that frame's relative position between the two nearest captured frames; from the forward and backward flow, compute the forward-flow and backward-flow interpolations of the corresponding pixels of the frame to be interpolated.
12) For each pixel of the frame to be interpolated, take the average of its forward-flow and backward-flow interpolations as the final estimate.
13) Fill the unassigned pixel holes in the frame to be interpolated by eight-neighborhood filtering.
2) For each capture instant, obtain the images of the views not captured at that instant by the model-assisted weighting method, as follows:
21) Extract silhouette maps of the three-dimensional object from the captured view images by simple differencing or blue-screen segmentation.
22) Using the silhouette maps computed in step 21), reconstruct a rough three-dimensional model, namely the visual hull, by the EPVH method (cited above).
23) For each uncaptured view i, perform weighted interpolation using the images of its two nearest captured views j and k, with weights computed by equation (1), where Θ = 80° and Φ = 70° are the two constant angles denoting, respectively, the maximum allowed angle between camera viewing rays and the maximum allowed angle between a three-dimensional point's normal and a camera viewing ray; θ1 is the angle between viewing rays ri and rj, and θ2 the angle between viewing rays ri and rk; φ1 is the angle between the normal at the three-dimensional point p and viewing ray rj, and φ2 the angle between the normal at p and viewing ray rk; p is the intersection of the viewing ray through a given pixel of view i with the three-dimensional model.
3) Compute the accumulated energy spectra, in the dual-tree wavelet transform domain, of the temporal interpolation frames from step 1) and the spatial interpolation frames from step 2), and extract key points, as follows:
31) Apply the dual-tree discrete wavelet transform to the spatial and temporal interpolation frames, decomposing each into S = 3 scales.
32) For the real part and the imaginary part separately, compute the key-point energy spectrum {Ms}1≤s≤3 at each scale; the key-point energy at each pixel is computed from the six subband coefficients {c1, …, c6} of that pixel in the real or imaginary part, with parameters α and β adjusting the relative importance of the scales in the accumulated energy spectrum; here α = 1 (the value of β appears only as an image in the source).
33) Interpolate each energy spectrum from step 32) up to the original image size with a two-dimensional Gaussian kernel; the interpolated spectrum at scale s is denoted gs(Ms).
34) Compute the accumulated energy spectra Ar and Ai of the real part and the imaginary part from the interpolated spectra, and combine them into the final accumulated energy spectrum (both combining formulas survive only as images in the source).
35) Extract the key points of the accumulated energy spectrum from step 34) by the SIFT (Scale-Invariant Feature Transform) method.
4) Describe the extracted key points with shape contexts, cast the shape-context key-point matching problem as a square assignment (weighted bipartite matching) problem, and solve it by the Hungarian method.
5) Obtain the final interpolated frame by solving the following optimization problem:

Δf|Ω = div v,  s.t.  f|∂Ω = f̃|∂Ω

where f is the unknown frame to be interpolated, Δ is the Laplacian operator, v = (u, v) is the gradient vector field of the temporal interpolation frame, div v is the divergence of v, f̃ is the spatial interpolation frame, and ∂Ω is the boundary of the closed set Ω.
The final optimized interpolation result of this embodiment on sequence 1 and its comparison with other methods are shown in Fig. 2: (a) the interpolated frame obtained with a wavelet-based fusion method (X. Luo, J. Zhang, and Q. Dai, "A classification-based image fusion scheme using wavelet transform," in Proc. SPIE 8064, no. 806400, 2011.); (b) the interpolated frame obtained with a bidimensional empirical mode decomposition method (Y. Zheng and Z. Qin, "Region-based image fusion method using bidimensional empirical mode decomposition," Journal of Electronic Imaging, vol. 18, no. 1, p. 013008, 2009.); (c) the interpolated frame obtained with the method of the present invention.
6) At each instant, reconstruct and render a three-dimensional model of the scene by a multi-view stereo method, using the images of all views (captured and interpolated).
Fig. 3 shows the three-dimensional model of sequence 1 reconstructed with the proposed method: (a) the visual hull model; (b) the model reconstructed with the method of the present invention; the models are rendered with normal maps. Fig. 4 shows the dynamic three-dimensional model of sequence 2 reconstructed with the proposed method: the first image gathers the models of all instants, and the subsequent images show the modeling result at each instant.
Claims (4)
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN 201110271761 CN102446366B (en) | 2011-09-14 | 2011-09-14 | Time-space jointed multi-view video interpolation and three-dimensional modeling method
Publications (2)

Publication Number | Publication Date
---|---
CN102446366A (en) | 2012-05-09
CN102446366B (en) | 2013-06-19
Family

ID=46008840

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN 201110271761 (Expired - Fee Related) CN102446366B (en) | Time-space jointed multi-view video interpolation and three-dimensional modeling method | 2011-09-14 | 2011-09-14

Country Status (1)

Country | Link
---|---
CN (1) | CN102446366B (en)
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103903300A (en) * | 2012-12-31 | 2014-07-02 | 博世汽车部件(苏州)有限公司 | Object surface height reconstructing method, object surface height reconstructing system, optical character extracting method and optical character extracting system |
US10085008B2 (en) * | 2013-09-11 | 2018-09-25 | Sony Corporation | Image processing apparatus and method |
CN104766304B (en) * | 2015-02-26 | 2017-12-05 | 浙江工业大学 | A kind of blood vessel method for registering based on multisequencing medical image |
CN106844620B (en) * | 2017-01-19 | 2020-05-12 | 天津大学 | View-based feature matching three-dimensional model retrieval method |
US10311630B2 (en) * | 2017-05-31 | 2019-06-04 | Verizon Patent And Licensing Inc. | Methods and systems for rendering frames of a virtual scene from different vantage points based on a virtual entity description frame of the virtual scene |
CN107901424B (en) * | 2017-12-15 | 2024-07-26 | 北京中睿华信信息技术有限公司 | Image acquisition modeling system |
CN108806259B (en) * | 2018-01-15 | 2021-02-12 | 江苏壹鼎崮机电科技有限公司 | BIM-based traffic control model construction and labeling method |
CN108833785B (en) * | 2018-07-03 | 2020-07-03 | 清华-伯克利深圳学院筹备办公室 | Fusion method and device of multi-view images, computer equipment and storage medium |
CN109242950B (en) * | 2018-07-11 | 2023-05-02 | 天津大学 | Multi-view human dynamic three-dimensional reconstruction method under multi-person tight interaction scene |
CN111797269A (en) * | 2020-07-21 | 2020-10-20 | 天津理工大学 | Multi-view 3D model retrieval method based on multi-level view association convolutional network |
CN112819945B (en) * | 2021-01-26 | 2022-10-04 | 北京航空航天大学 | Fluid reconstruction method based on sparse viewpoint video |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH03160575A (en) * | 1989-11-20 | 1991-07-10 | Toshiba Corp | Picture display device |
CN101271579B (en) * | 2008-04-10 | 2010-06-16 | 清华大学 | Modeling of High-Speed Moving Objects Using Ring Low Frame Rate Camera Array |
CN101271582B (en) * | 2008-04-10 | 2010-06-16 | 清华大学 | 3D reconstruction method based on multi-view 2D images combined with SIFT algorithm |
TWI492188B (en) * | 2008-12-25 | 2015-07-11 | Univ Nat Chiao Tung | Method for automatic detection and tracking of multiple targets with multiple cameras and system therefor |
CN101615304A (en) * | 2009-07-31 | 2009-12-30 | 深圳先进技术研究院 | A method for generating robust visual shells |
CN101833786B (en) * | 2010-04-06 | 2011-12-28 | 清华大学 | Method and system for capturing and rebuilding three-dimensional model |
- 2011-09-14: application CN 201110271761 filed; granted as CN102446366B (en); current status: not active (Expired - Fee Related)
Also Published As

Publication number | Publication date
---|---
CN102446366A (en) | 2012-05-09
Similar Documents

Publication | Publication Date | Title
---|---|---
CN102446366B (en) | 2013-06-19 | Time-space jointed multi-view video interpolation and three-dimensional modeling method
CN108074218B (en) | | Image super-resolution method and device based on light field acquisition device
Wang et al. | | End-to-end view synthesis for light field imaging with pseudo 4DCNN
Hua et al. | | Holopix50k: A large-scale in-the-wild stereo image dataset
EP3216216B1 (en) | | Methods and systems for multi-view high-speed motion capture
CN104365092A (en) | | Method and apparatus for fusion of images
CN103236082A (en) | | Quasi-three dimensional reconstruction method for acquiring two-dimensional videos of static scenes
KR102658359B1 (en) | | Method for the synthesis of intermediate views of a light field, system for the synthesis of intermediate views of a light field, and method for the compression of a light field
CN107240147B (en) | | Image rendering method and system
CN109447919A (en) | | In conjunction with the light field super resolution ratio reconstruction method of multi-angle of view and semantic textural characteristics
CN110113593B (en) | | A Wide Baseline Multi-View Video Synthesis Method Based on Convolutional Neural Networks
CN106056622B (en) | | Multi-view depth video restoration method based on Kinect camera
CN116664782A (en) | | Neural radiation field three-dimensional reconstruction method based on fusion voxels
CN108230223A (en) | | Light field angle super-resolution rate method and device based on convolutional neural networks
US8717418B1 (en) | | Real time 3D imaging for remote surveillance
CN109819158B (en) | | Video stabilization method based on light field imaging
CN101662695B (en) | | Method and device for acquiring virtual viewport
CN109949354A (en) | | A light field depth information estimation method based on fully convolutional neural network
CN104217412B (en) | | An airborne super-resolution image reconstruction device and reconstruction method
Knorr et al. | | Stereoscopic 3D from 2D video with super-resolution capability
CN109302600B (en) | | Three-dimensional scene shooting device
Schmeing et al. | | Depth image based rendering: A faithful approach for the disocclusion problem
Lu et al. | | A survey on multiview video synthesis and editing
Adhikarla et al. | | Real-time adaptive content retargeting for live multi-view capture and light field display
CN108615221A (en) | | Light field angle super-resolution rate method and device based on the two-dimentional epipolar plane figure of shearing
Legal Events

- C06 / PB01: Publication
- C10 / SE01: Entry into substantive examination (entry into force of request for substantive examination)
- C14 / GR01: Grant of patent or utility model
- TR01: Transfer of patent right, effective date of registration 2020-07-03. Patentee before: Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin 300072. Patentee after: Beijing Youke Nuclear Power Technology Development Co., Ltd., Room 411, Block A, Zhizao Street, Zhongguancun, No. 45 Chengfu Road, Haidian District, Beijing 100080.
- TR01: Transfer of patent right, effective date of registration 2020-10-10. Patentee before: Beijing Youke Nuclear Power Technology Development Co., Ltd. Patentee after: Beijing Lingyunguang Technology Group Co., Ltd., Room 701, 7th Floor, Building 7, Yard 13, Cuihu South Ring Road, Haidian District, Beijing 100094.
- CP01: Change in the name or title of a patent holder, same address: Beijing Lingyunguang Technology Group Co., Ltd. renamed Lingyunguang Technology Co., Ltd.
- TR01: Transfer of patent right, effective date of registration 2021-01-13. Patentee before: Lingyunguang Technology Co., Ltd. Patentee after: Shenzhen Lingyun Shixun Technology Co., Ltd., Room 1101, 11th Floor, Building 2, Zone C, Nanshan Zhiyuan, Nanshan District, Shenzhen, Guangdong 518000.
- CF01: Termination of patent right due to non-payment of annual fee. Granted publication date: 2013-06-19.