CN102446366B - Time-space jointed multi-view video interpolation and three-dimensional modeling method
- Publication number: CN102446366B (application CN201110271761A)
- Authority
- CN
- China
- Prior art keywords
- interpolation
- frame
- camera
- dimensional
- angle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Processing Or Creating Images (AREA)
Abstract
The invention belongs to the technical field of computer multimedia. To provide a simple and practical multi-view video interpolation and three-dimensional modeling method, the invention adopts the following technical solution: a space-time joint multi-view video interpolation and three-dimensional modeling method, in which a multi-camera array is grouped at intervals and a three-dimensional model of the scene is reconstructed at every moment, comprising the following steps: 1) interpolate the uncaptured frames between two captured frames; 2) obtain images of the views not captured at a given moment by a model-assisted weighting method; 3) compute the accumulated energy spectra and extract key points; 4) describe the extracted key points with shape contexts and solve the matching by the Hungarian method; 5) obtain the final interpolated frame by solving a Poisson-editing optimization problem; 6) reconstruct and render the three-dimensional model of the scene. The invention is mainly applied to multi-view video processing and three-dimensional modeling.
Description
Technical Field
The invention belongs to the technical field of computer multimedia and specifically relates to a space-time joint multi-view video interpolation and three-dimensional modeling method.
Background Art
The acquisition, processing and communication of single-channel video have long since achieved key technical breakthroughs; the technology is mature and widely used in broadcast television, Internet video, intelligent transportation and other fields. However, traditional single-camera acquisition cannot convey depth, stereoscopy, or an all-around (view-variable) perception of an object. Multi-channel video acquisition and scene reconstruction based on multi-camera systems can deliver such an all-around visual experience, and related research has been a hotspot since the mid-1990s. Real-time acquisition and reconstruction of three-dimensional scenes with multi-camera systems is widely applied in free-viewpoint video, virtual reality, immersive video conferencing, film entertainment, stereoscopic video and motion analysis. Many well-known universities and research institutes, including Stanford, MIT, Carnegie Mellon, Columbia University, Mitsubishi Electric, Microsoft Research and the Max Planck Institute for Informatics, have built multi-camera acquisition systems for scene geometry capture, motion analysis and stereoscopic production. At present, multi-camera acquisition and reconstruction still struggles to deliver satisfactory results because of difficulties in camera setup and synchronization, storage and transmission, high-dimensional data processing, and high-speed motion capture. For capturing high-speed motion, one approach is to use multiple high-speed cameras, but these are expensive and have limited storage. Another is to use many inexpensive low-frame-rate cameras, grouped sensibly so that cameras within a group sample simultaneously while different groups sample in an interleaved fashion; this yields sparsely sampled space-time information from which a high frame rate is reconstructed by interpolation. Stanford University (Wilburn B, Joshi N, Vaish V, et al. High-speed videography using a dense camera array. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 2004. 294-301.) achieved single-view high-speed scene reproduction with a dense light-field array of 52 cameras running at 30 fps. That result, however, is limited to a single view: multi-view images cannot be obtained at every instant, so a three-dimensional model cannot be reconstructed at every instant. The first inventor of the present invention previously proposed a method for modeling high-speed moving objects with a ring-shaped low-frame-rate camera array (ZL200810103684.2) to achieve full-view three-dimensional reconstruction, but that method performs interpolation and reconstruction simply by intersecting visual-hull models, so its results are mediocre and not robust. Although existing video interpolation and image fusion methods can be used to obtain the uncaptured multi-view video, their results exhibit blurred or unsmooth regions.
Summary of the Invention
To overcome the deficiencies of the prior art and provide a simple and practical multi-view video interpolation and three-dimensional modeling method, the technical solution adopted by the present invention is a space-time joint multi-view video interpolation and three-dimensional modeling method that groups a multi-camera array at intervals: given n cameras with a frame rate of f frames per second, divide them at uniform intervals into m groups, where n and m are positive integers and n is an integer multiple of m. Cameras within a group capture synchronously, yielding videos of n/m views at each capture instant, while different groups capture interleaved at intervals of 1/(f·m) seconds, yielding videos at different instants. The proposed space-time joint multi-view video interpolation and three-dimensional modeling method then produces videos of all n views at every instant, from which a three-dimensional model of the scene is reconstructed at every instant. The method comprises the following steps (a timing sketch of the grouping appears after the step list):
1) For each camera, use an optical flow method to compute the forward and backward optical flow between two adjacent captured frames, and interpolate the uncaptured frames between them, i.e. the temporal interpolation frames.
2) For each capture instant, obtain the images of the views not captured at that instant, i.e. the spatial interpolation frames, by a model-assisted weighting method.
3) Compute the accumulated energy spectra, in the dual-tree discrete wavelet domain, of the temporal interpolation frame from step 1) and the spatial interpolation frame from step 2), and extract key points.
4) Describe the extracted key points with shape contexts, cast the shape-context key-point matching problem as a square assignment, i.e. weighted bipartite graph matching, problem, and solve it by the Hungarian method.
5) Obtain the final interpolated frame by solving a Poisson-editing optimization problem.
6) At each instant, reconstruct and render a three-dimensional model of the scene by a multi-view stereo method, using the images of all views, both captured and interpolated.
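As an illustration (not part of the original patent), the following minimal Python sketch computes each camera's trigger offset under this interleaved grouping; the round-robin assignment of cameras to groups and the function name are assumptions.

```python
def trigger_offsets(n, f, m):
    """Trigger start time in seconds for each of n cameras running at
    f frames/s, divided at uniform intervals into m groups. Group g
    fires g/(f*m) s after group 0, so the array as a whole samples the
    scene at f*m Hz while every instant is covered by n/m views."""
    assert n % m == 0, "n must be an integer multiple of m"
    # Round-robin assignment spreads the members of each group evenly
    # around the ring (an assumption consistent with 'uniform intervals').
    return {cam: (cam % m) / (f * m) for cam in range(n)}

# Embodiment values: 20 cameras at 30 fps in 4 groups give offsets
# cycling through 0, 1/120, 2/120 and 3/120 seconds.
print(trigger_offsets(20, 30.0, 4))
```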
The model-assisted weighting method specifically comprises the following steps:
21) Extract silhouette maps of the three-dimensional object from the captured view images by simple differencing or blue-screen segmentation.
22) Using the silhouette maps computed in step 21), reconstruct a rough three-dimensional model, namely the visual hull, by the EPVH method.
23) For each uncaptured view i, perform weighted interpolation using the images of its two nearest captured views j and k, with weights computed by equation (1), which survives only as an image in the source (a plausible reconstruction is given below). Here Θ and Φ are two constant angles denoting, respectively, the maximum allowed angle between camera viewing rays and the maximum allowed angle between a three-dimensional point's normal and a camera viewing ray; θ1 is the angle between viewing rays ri and rj, and θ2 the angle between viewing rays ri and rk; φ1 is the angle between the normal at the three-dimensional point p and viewing ray rj, and φ2 the angle between the normal at p and viewing ray rk; p is the intersection of the viewing ray through a given pixel of view i with the three-dimensional model.
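The exact form of equation (1) is not recoverable from the text. A plausible reconstruction, assuming each captured view's weight falls off linearly to zero as either angle reaches its allowed maximum (an assumption, not the patent's formula), is:

$$w_j=\Bigl(1-\frac{\theta_1}{\Theta}\Bigr)\Bigl(1-\frac{\phi_1}{\Phi}\Bigr),\qquad w_k=\Bigl(1-\frac{\theta_2}{\Theta}\Bigr)\Bigl(1-\frac{\phi_2}{\Phi}\Bigr)\tag{1}$$

with the interpolated pixel of view i then taken as the normalized blend $(w_j I_j + w_k I_k)/(w_j + w_k)$ of the corresponding pixels $I_j$ and $I_k$.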
The accumulated energy spectra in the dual-tree discrete wavelet domain are computed, and key points extracted, by the following steps:
31) Apply the dual-tree discrete wavelet transform to the spatial and temporal interpolation frames, decomposing each into S scales.
32) For the real part and the imaginary part separately, compute the key-point energy spectrum {Ms}1≤s≤S at each scale; the key-point energy at each pixel is computed from the six subband coefficients {c1, …, c6} of that pixel in the real or imaginary part, with parameters α and β adjusting the relative importance of the scales in the accumulated energy spectrum (the per-pixel formula survives only as an image in the source).
33) Interpolate each energy spectrum from step 32) up to the original image size with a two-dimensional Gaussian kernel; the interpolated spectrum at scale s is denoted gs(Ms).
34) Compute the accumulated energy spectra Ar and Ai of the real part and the imaginary part from the interpolated spectra, and combine them into the final accumulated energy spectrum (both combining formulas survive only as images in the source).
35) Extract the key points of the accumulated energy spectrum from step 34) by the SIFT method.
The matched shape-context key points serve as boundary constraints, and the final interpolated frame is obtained by solving the following optimization problem:

Δf|Ω = div v,  s.t.  f|∂Ω = f̃|∂Ω

where f is the unknown frame to be interpolated, Δ is the Laplacian operator, v = (u, v) is the gradient vector field of the temporal interpolation frame, div v is the divergence of v, f̃ is the spatial interpolation frame, ∂Ω is the boundary of the closed set Ω, "s.t." means "subject to", |Ω denotes restriction to the closed set Ω, and |∂Ω denotes restriction to its boundary.
Features and effects of the method of the present invention:
The method avoids the need for expensive high-speed cameras and the unsmooth results of existing video interpolation and image fusion methods. Through space-time sampling, space-time interpolation and space-time optimization, it achieves high-frame-rate multi-view video recovery and three-dimensional scene reconstruction under low-frame-rate camera acquisition. It has the following features:
1. The procedure is simple and easy to implement.
2. Space-time sampling and interpolation for non-planar camera systems. The system of low-frame-rate cameras is grouped sensibly so that each group is synchronized and evenly distributed. Different groups capture the dynamic scene interleaved in time. At each sampling instant, the images of the cameras that did not sample are interpolated spatially by the weighting method and temporally by the bidirectional optical flow method.
3. A shape-context optimization method based on the dual-tree discrete wavelet transform. The optimization of the space-time information is cast as an image Poisson-editing problem. Exploiting the shift invariance and directional selectivity of the dual-tree discrete wavelet transform, key points of interest near edges (high-frequency information) are extracted and then matched by the shape-context method to serve as boundary constraints.
The invention can achieve temporally dense three-dimensional reconstruction of dynamic scenes with a low-frame-rate camera system. The proposed method scales well: multi-view video recovery and dynamic three-dimensional scene reconstruction at higher temporal resolution can be obtained simply by adding more cameras or using cameras with higher frame rates.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow chart of the space-time joint multi-view video interpolation and three-dimensional modeling method of an embodiment of the present invention;
Fig. 2 shows an uncaptured frame of sequence 1 recovered with the proposed method and with two other methods;
Fig. 3 shows the visual hull model of sequence 1 and the three-dimensional model reconstructed with the proposed method;
Fig. 4 shows the dynamic three-dimensional model of sequence 2 reconstructed with the proposed method.
Detailed Description of the Embodiments
The present invention casts space-time joint multi-view video interpolation as an image Poisson-editing problem, in which key-point extraction and matching exploit the directionality of the dual-tree discrete wavelet transform (DDWT) and the robustness of the shape context, achieving high-frame-rate, high-quality multi-view video interpolation. The results feature good interpolation quality, high accuracy and accurate, complete reconstructed three-dimensional models, and are obtainable under the conditions of a low-frame-rate camera array with interleaved group sampling.
The space-time joint multi-view video interpolation and three-dimensional modeling method of the present invention is characterized as follows:
The multi-camera array is grouped at intervals (given n cameras with a frame rate of f frames per second, divided at uniform intervals into m groups, where n and m are positive integers and n is an integer multiple of m). Cameras within a group capture synchronously, yielding videos of n/m views at the same instant, while different groups capture interleaved at intervals of 1/(f·m) seconds, yielding videos at different instants. The proposed method produces videos of all n views at every instant, from which a three-dimensional model of the scene is reconstructed at every instant. The specific method comprises the following steps:
1) For each camera, use the optical flow method of Brox et al. (Brox T, Bruhn A, Papenberg N, et al. High accuracy optical flow estimation based on a theory for warping. Proceedings of European Conference on Computer Vision, volume 3024, 2004. 25-36.) to compute the forward and backward optical flow between two adjacent captured frames, and interpolate the m−1 frames between them. The temporal interpolation proceeds as follows (a code sketch follows these sub-steps):
11) Assume the motion path is linear, i.e. a pixel's position along the motion path in the frame to be interpolated is proportional to that frame's relative position between the two nearest captured frames; from the forward and backward flow, compute the forward-flow and backward-flow interpolations of the corresponding pixels of the frame to be interpolated.
12) For each pixel of the frame to be interpolated, take the average of its forward-flow and backward-flow interpolations as the final estimate.
13) Fill the unassigned pixel holes in the frame to be interpolated by eight-neighborhood filtering.
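A minimal Python sketch of sub-steps 11) to 13) is given below; it is an illustration, not the patent's implementation. OpenCV's Farneback flow stands in for the Brox method, forward splatting realizes the linear-motion assumption, and a normalized box filter realizes the eight-neighborhood hole filling; all parameter values are illustrative.

```python
import cv2
import numpy as np

def interpolate_frame(f0, f1, t):
    """Estimate the uncaptured frame at relative time t in (0, 1) between
    two grayscale captures f0 and f1 (sub-steps 11-13)."""
    fwd = cv2.calcOpticalFlowFarneback(f0, f1, None, 0.5, 4, 21, 5, 7, 1.5, 0)
    bwd = cv2.calcOpticalFlowFarneback(f1, f0, None, 0.5, 4, 21, 5, 7, 1.5, 0)
    h, w = f0.shape
    acc = np.zeros((h, w), np.float32)   # splatted intensities
    cnt = np.zeros((h, w), np.float32)   # contributions per target pixel
    ys, xs = np.mgrid[0:h, 0:w]
    # Sub-steps 11-12: splat each source pixel part-way along its (assumed
    # linear) motion path; forward and backward estimates average out.
    for src, flow, s in ((f0, fwd, t), (f1, bwd, 1.0 - t)):
        xt = np.clip(np.rint(xs + s * flow[..., 0]).astype(int), 0, w - 1)
        yt = np.clip(np.rint(ys + s * flow[..., 1]).astype(int), 0, h - 1)
        np.add.at(acc, (yt, xt), src.astype(np.float32))
        np.add.at(cnt, (yt, xt), 1.0)
    out = np.where(cnt > 0, acc / np.maximum(cnt, 1.0), 0.0)
    # Sub-step 13: fill holes with the average of their valid 8-neighbors.
    valid = (cnt > 0).astype(np.float32)
    fill = cv2.blur(out * valid, (3, 3)) / np.maximum(cv2.blur(valid, (3, 3)), 1e-6)
    out = np.where(valid > 0, out, fill)
    return np.clip(out, 0, 255).astype(f0.dtype)
```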
2) For each capture instant, obtain the images of the views not captured at that instant by the model-assisted weighting method, as follows:
21) Extract silhouette maps of the three-dimensional object from the captured view images by simple differencing or blue-screen segmentation.
22) Using the silhouette maps computed in step 21), reconstruct a rough three-dimensional model, namely the visual hull, by the EPVH method (Exact Polyhedral Visual Hulls; Franco J-S, Boyer E. Exact polyhedral visual hulls. Proceedings of British Machine Vision Conference, 2003. 329-338.).
23) For each uncaptured view i, perform weighted interpolation using the images of its two nearest captured views j and k, with weights computed by equation (1) as above, where Θ and Φ are two constant angles denoting, respectively, the maximum allowed angle between camera viewing rays and the maximum allowed angle between a three-dimensional point's normal and a camera viewing ray; θ1 is the angle between viewing rays ri and rj, and θ2 the angle between viewing rays ri and rk; φ1 is the angle between the normal at the three-dimensional point p and viewing ray rj, and φ2 the angle between the normal at p and viewing ray rk; p is the intersection of the viewing ray through a given pixel of view i with the three-dimensional model.
3) Compute the accumulated energy spectra, in the dual-tree wavelet transform domain, of the temporal interpolation frames from step 1) and the spatial interpolation frames from step 2), and extract key points, as follows (a code sketch follows these sub-steps):
31) Apply the dual-tree discrete wavelet transform to the spatial and temporal interpolation frames, decomposing each into S scales.
32) For the real part and the imaginary part separately, compute the key-point energy spectrum {Ms}1≤s≤S at each scale; the key-point energy at each pixel is computed from the six subband coefficients {c1, …, c6} of that pixel in the real or imaginary part, with parameters α and β adjusting the relative importance of the scales in the accumulated energy spectrum (the per-pixel formula survives only as an image in the source).
33) Interpolate each energy spectrum from step 32) up to the original image size with a two-dimensional Gaussian kernel; the interpolated spectrum at scale s is denoted gs(Ms).
34) Compute the accumulated energy spectra Ar and Ai of the real part and the imaginary part from the interpolated spectra, and combine them into the final accumulated energy spectrum (both combining formulas survive only as images in the source).
35) Extract the key points of the accumulated energy spectrum from step 34) by the SIFT (Scale-Invariant Feature Transform) method.
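The following Python sketch of sub-steps 31) to 35) uses the `dtcwt` package. Since the patent's per-pixel and cross-scale formulas survive only as images, the combination rules here are explicit assumptions: the per-pixel energy is taken as α·(Σi |ci|)^β, the upsampled spectra are summed over scales, and the real- and imaginary-part spectra are averaged; cubic-spline upsampling stands in for the two-dimensional Gaussian kernel.

```python
import numpy as np
import dtcwt                      # dual-tree complex wavelet transform package
from scipy.ndimage import zoom

def accumulated_energy_spectrum(img, S=3, alpha=1.0, beta=0.5):
    """Accumulated key-point energy spectrum of one interpolation frame over
    S dual-tree DWT scales (image sides assumed divisible by 2**S)."""
    pyramid = dtcwt.Transform2d().forward(img.astype(float), nlevels=S)
    acc_r = np.zeros(img.shape)
    acc_i = np.zeros(img.shape)
    for hp in pyramid.highpasses:              # hp: (h, w, 6) complex subbands
        m_r = alpha * np.abs(hp.real).sum(axis=2) ** beta   # assumed M_s, real
        m_i = alpha * np.abs(hp.imag).sum(axis=2) ** beta   # assumed M_s, imag
        factor = (img.shape[0] / m_r.shape[0], img.shape[1] / m_r.shape[1])
        acc_r += zoom(m_r, factor, order=3)    # g_s(M_s) at full image size
        acc_i += zoom(m_i, factor, order=3)
    # Assumed combination of real- and imaginary-part spectra; key points are
    # then extracted from the result with SIFT (sub-step 35), e.g. by running
    # cv2.SIFT_create().detect() on the spectrum normalized to uint8.
    return 0.5 * (acc_r + acc_i)
```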
4) Describe the extracted key points with shape contexts, cast the shape-context key-point matching problem as a square assignment (weighted bipartite matching) problem, and solve it by the Hungarian method; a sketch of this matching step follows.
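The sketch below illustrates this step with log-polar shape-context histograms (after Belongie et al.) compared by a chi-squared cost, the assignment being solved by `scipy.optimize.linear_sum_assignment`, a Hungarian-style solver; bin counts and radial limits are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def shape_contexts(pts, n_r=5, n_t=12):
    """Log-polar shape-context histogram for each 2-D keypoint."""
    pts = np.asarray(pts, float)
    n = len(pts)
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    d /= d[d > 0].mean()                         # scale normalization
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    desc = np.zeros((n, n_r * n_t))
    for i in range(n):
        o = np.flatnonzero(np.arange(n) != i)    # all other keypoints
        rel = pts[o] - pts[i]
        rb = np.clip(np.digitize(d[i, o], r_edges) - 1, 0, n_r - 1)
        tb = ((np.arctan2(rel[:, 1], rel[:, 0]) + np.pi)
              / (2 * np.pi) * n_t).astype(int) % n_t
        np.add.at(desc[i], rb * n_t + tb, 1.0)   # fill log-polar bins
        desc[i] /= max(desc[i].sum(), 1e-9)
    return desc

def match_keypoints(pts_a, pts_b):
    """Match two keypoint sets: chi-squared shape-context cost, solved as a
    weighted bipartite (square assignment) problem by the Hungarian method."""
    ha, hb = shape_contexts(pts_a), shape_contexts(pts_b)
    cost = 0.5 * ((ha[:, None] - hb[None, :]) ** 2
                  / (ha[:, None] + hb[None, :] + 1e-9)).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))
```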
5) Obtain the final interpolated frame by solving the following optimization problem (a solver sketch follows):

Δf|Ω = div v,  s.t.  f|∂Ω = f̃|∂Ω

where f is the unknown frame to be interpolated, Δ is the Laplacian operator, v = (u, v) is the gradient vector field of the temporal interpolation frame, div v is the divergence of v, f̃ is the spatial interpolation frame, and ∂Ω is the boundary of the closed set Ω.
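A minimal dense-grid Python sketch of this Poisson step, assuming Ω is the full image interior with Dirichlet values taken from the spatial interpolation frame on the border (the key-point constraints from step 4) are omitted for brevity):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def poisson_interpolate(f_spatial, vx, vy):
    """Solve  Δf = div v  on the interior, with f = f_spatial on the border.
    f_spatial: HxW spatial interpolation frame (float);
    vx, vy:    gradient field of the temporal interpolation frame."""
    h, w = f_spatial.shape
    div_v = np.zeros((h, w))
    div_v[:, 1:] += vx[:, 1:] - vx[:, :-1]      # d(vx)/dx, backward difference
    div_v[1:, :] += vy[1:, :] - vy[:-1, :]      # d(vy)/dy
    n = h * w
    idx = np.arange(n).reshape(h, w)
    A = sp.lil_matrix((n, n))
    b = np.empty(n)
    for y in range(h):
        for x in range(w):
            k = idx[y, x]
            if x in (0, w - 1) or y in (0, h - 1):
                A[k, k] = 1.0                   # Dirichlet boundary row
                b[k] = f_spatial[y, x]
            else:
                A[k, k] = -4.0                  # 5-point Laplacian stencil
                A[k, idx[y, x - 1]] = A[k, idx[y, x + 1]] = 1.0
                A[k, idx[y - 1, x]] = A[k, idx[y + 1, x]] = 1.0
                b[k] = div_v[y, x]
    return spla.spsolve(A.tocsr(), b).reshape(h, w)
```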
6) At each instant, reconstruct and render a three-dimensional model of the scene by a multi-view stereo method, using the images of all views (captured and interpolated).
The present invention proposes a space-time joint multi-view video interpolation and three-dimensional modeling method, described in detail below with reference to the drawings and an embodiment.
The system embodiment realizing the method is structured as follows: 20 cameras with a frame rate of 30 frames per second are distributed in a ring around the scene to be captured. The multi-camera array is divided at uniform intervals into 4 groups; cameras within a group capture synchronously, yielding videos of 5 views at the same instant, while different groups capture interleaved at intervals of 1/120 second, yielding videos at different instants. The proposed space-time joint multi-view video interpolation and three-dimensional modeling method produces videos of all 20 views at every instant, from which a three-dimensional model of the scene is reconstructed at every instant. As shown in Fig. 1, the flow of the method of this embodiment comprises the following steps:
1) For each camera, use the optical flow method of Brox et al. (cited above) to compute the forward and backward optical flow between two adjacent captured frames, and interpolate the 3 frames between them, as follows:
11) Assume the motion path is linear, i.e. a pixel's position along the motion path in the frame to be interpolated is proportional to that frame's relative position between the two nearest captured frames; from the forward and backward flow, compute the forward-flow and backward-flow interpolations of the corresponding pixels of the frame to be interpolated.
12) For each pixel of the frame to be interpolated, take the average of its forward-flow and backward-flow interpolations as the final estimate.
13) Fill the unassigned pixel holes in the frame to be interpolated by eight-neighborhood filtering.
2) For each capture instant, obtain the images of the views not captured at that instant by the model-assisted weighting method, as follows:
21) Extract silhouette maps of the three-dimensional object from the captured view images by simple differencing or blue-screen segmentation.
22) Using the silhouette maps computed in step 21), reconstruct a rough three-dimensional model, namely the visual hull, by the EPVH method (cited above).
23) For each uncaptured view i, perform weighted interpolation using the images of its two nearest captured views j and k, with weights computed by equation (1), where Θ = 80° and Φ = 70° are the two constant angles denoting, respectively, the maximum allowed angle between camera viewing rays and the maximum allowed angle between a three-dimensional point's normal and a camera viewing ray; θ1 is the angle between viewing rays ri and rj, and θ2 the angle between viewing rays ri and rk; φ1 is the angle between the normal at the three-dimensional point p and viewing ray rj, and φ2 the angle between the normal at p and viewing ray rk; p is the intersection of the viewing ray through a given pixel of view i with the three-dimensional model.
3) Compute the accumulated energy spectra, in the dual-tree wavelet transform domain, of the temporal interpolation frames from step 1) and the spatial interpolation frames from step 2), and extract key points, as follows:
31) Apply the dual-tree discrete wavelet transform to the spatial and temporal interpolation frames, decomposing each into S = 3 scales.
32) For the real part and the imaginary part separately, compute the key-point energy spectrum {Ms}1≤s≤3 at each scale; the key-point energy at each pixel is computed from the six subband coefficients {c1, …, c6} of that pixel in the real or imaginary part, with parameters α and β adjusting the relative importance of the scales in the accumulated energy spectrum; here α = 1 (the value of β appears only as an image in the source).
33) Interpolate each energy spectrum from step 32) up to the original image size with a two-dimensional Gaussian kernel; the interpolated spectrum at scale s is denoted gs(Ms).
34) Compute the accumulated energy spectra Ar and Ai of the real part and the imaginary part from the interpolated spectra, and combine them into the final accumulated energy spectrum (both combining formulas survive only as images in the source).
35) Extract the key points of the accumulated energy spectrum from step 34) by the SIFT (Scale-Invariant Feature Transform) method.
4) Describe the extracted key points with shape contexts, cast the shape-context key-point matching problem as a square assignment (weighted bipartite matching) problem, and solve it by the Hungarian method.
5) Obtain the final interpolated frame by solving the following optimization problem:

Δf|Ω = div v,  s.t.  f|∂Ω = f̃|∂Ω

where f is the unknown frame to be interpolated, Δ is the Laplacian operator, v = (u, v) is the gradient vector field of the temporal interpolation frame, div v is the divergence of v, f̃ is the spatial interpolation frame, and ∂Ω is the boundary of the closed set Ω.
The final optimized interpolation result of this embodiment on sequence 1 and its comparison with other methods are shown in Fig. 2: (a) the interpolated frame obtained with a wavelet-based fusion method (X. Luo, J. Zhang, and Q. Dai, "A classification-based image fusion scheme using wavelet transform," in Proc. SPIE 8064, no. 806400, 2011.); (b) the interpolated frame obtained with a bidimensional empirical mode decomposition method (Y. Zheng and Z. Qin, "Region-based image fusion method using bidimensional empirical mode decomposition," Journal of Electronic Imaging, vol. 18, no. 1, p. 013008, 2009.); (c) the interpolated frame obtained with the method of the present invention.
6) At each instant, reconstruct and render a three-dimensional model of the scene by a multi-view stereo method, using the images of all views (captured and interpolated).
Fig. 3 shows the three-dimensional model of sequence 1 reconstructed with the proposed method: (a) the visual hull model; (b) the model reconstructed with the method of the present invention; the models are rendered with normal maps. Fig. 4 shows the dynamic three-dimensional model of sequence 2 reconstructed with the proposed method: the first image gathers the models of all instants, and the subsequent images show the modeling result at each instant.
Claims (4)
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN 201110271761 CN102446366B (en) | 2011-09-14 | 2011-09-14 | Time-space jointed multi-view video interpolation and three-dimensional modeling method
Publications (2)

Publication Number | Publication Date
---|---
CN102446366A (en) | 2012-05-09
CN102446366B (en) | 2013-06-19
Family

ID=46008840

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN 201110271761 (Expired - Fee Related) CN102446366B (en) | Time-space jointed multi-view video interpolation and three-dimensional modeling method | 2011-09-14 | 2011-09-14

Country Status (1)

Country | Link
---|---
CN (1) | CN102446366B (en)
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103903300A (en) * | 2012-12-31 | 2014-07-02 | 博世汽车部件(苏州)有限公司 | Object surface height reconstructing method, object surface height reconstructing system, optical character extracting method and optical character extracting system |
US10085008B2 (en) * | 2013-09-11 | 2018-09-25 | Sony Corporation | Image processing apparatus and method |
CN104766304B (en) * | 2015-02-26 | 2017-12-05 | 浙江工业大学 | A kind of blood vessel method for registering based on multisequencing medical image |
CN106844620B (en) * | 2017-01-19 | 2020-05-12 | 天津大学 | View-based feature matching three-dimensional model retrieval method |
US10311630B2 (en) * | 2017-05-31 | 2019-06-04 | Verizon Patent And Licensing Inc. | Methods and systems for rendering frames of a virtual scene from different vantage points based on a virtual entity description frame of the virtual scene |
CN107901424B (en) * | 2017-12-15 | 2024-07-26 | 北京中睿华信信息技术有限公司 | Image acquisition modeling system |
CN108806259B (en) * | 2018-01-15 | 2021-02-12 | 江苏壹鼎崮机电科技有限公司 | BIM-based traffic control model construction and labeling method |
CN108833785B (en) * | 2018-07-03 | 2020-07-03 | 清华-伯克利深圳学院筹备办公室 | Fusion method and device of multi-view images, computer equipment and storage medium |
CN109242950B (en) * | 2018-07-11 | 2023-05-02 | 天津大学 | Multi-view human dynamic three-dimensional reconstruction method under multi-person tight interaction scene |
CN111797269A (en) * | 2020-07-21 | 2020-10-20 | 天津理工大学 | Multi-view 3D model retrieval method based on multi-level view association convolutional network |
CN112819945B (en) * | 2021-01-26 | 2022-10-04 | 北京航空航天大学 | Fluid reconstruction method based on sparse viewpoint video |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH03160575A (en) * | 1989-11-20 | 1991-07-10 | Toshiba Corp | Picture display device |
CN101271579B (en) * | 2008-04-10 | 2010-06-16 | 清华大学 | Modeling of High-Speed Moving Objects Using Ring Low Frame Rate Camera Array |
CN101271582B (en) * | 2008-04-10 | 2010-06-16 | 清华大学 | 3D reconstruction method based on multi-view 2D images combined with SIFT algorithm |
TWI492188B (en) * | 2008-12-25 | 2015-07-11 | Univ Nat Chiao Tung | Method for automatic detection and tracking of multiple targets with multiple cameras and system therefor |
CN101615304A (en) * | 2009-07-31 | 2009-12-30 | 深圳先进技术研究院 | A method for generating robust visual shells |
CN101833786B (en) * | 2010-04-06 | 2011-12-28 | 清华大学 | Method and system for capturing and rebuilding three-dimensional model |
- 2011-09-14: application CN 201110271761 filed; granted as CN102446366B (en); current status: not active (Expired - Fee Related)
Also Published As

Publication number | Publication date
---|---
CN102446366A (en) | 2012-05-09
Similar Documents

Publication | Publication Date | Title
---|---|---
CN102446366B (en) | 2013-06-19 | Time-space jointed multi-view video interpolation and three-dimensional modeling method
CN108074218B (en) | | Image super-resolution method and device based on light field acquisition device
Wang et al. | | End-to-end view synthesis for light field imaging with pseudo 4DCNN
Hua et al. | | Holopix50k: A large-scale in-the-wild stereo image dataset
EP3216216B1 (en) | | Methods and systems for multi-view high-speed motion capture
CN104365092A (en) | | Method and apparatus for fusion of images
CN103236082A (en) | | Quasi-three dimensional reconstruction method for acquiring two-dimensional videos of static scenes
KR102658359B1 (en) | | Method for the synthesis of intermediate views of a light field, system for the synthesis of intermediate views of a light field, and method for the compression of a light field
CN107240147B (en) | | Image rendering method and system
CN109447919A (en) | | In conjunction with the light field super resolution ratio reconstruction method of multi-angle of view and semantic textural characteristics
CN110113593B (en) | | A Wide Baseline Multi-View Video Synthesis Method Based on Convolutional Neural Networks
CN106056622B (en) | | Multi-view depth video restoration method based on Kinect camera
CN116664782A (en) | | Neural radiation field three-dimensional reconstruction method based on fusion voxels
CN108230223A (en) | | Light field angle super-resolution rate method and device based on convolutional neural networks
US8717418B1 (en) | | Real time 3D imaging for remote surveillance
CN109819158B (en) | | Video stabilization method based on light field imaging
CN101662695B (en) | | Method and device for acquiring virtual viewport
CN109949354A (en) | | A light field depth information estimation method based on fully convolutional neural network
CN104217412B (en) | | An airborne super-resolution image reconstruction device and reconstruction method
Knorr et al. | | Stereoscopic 3D from 2D video with super-resolution capability
CN109302600B (en) | | Three-dimensional scene shooting device
Schmeing et al. | | Depth image based rendering: A faithful approach for the disocclusion problem
Lu et al. | | A survey on multiview video synthesis and editing
Adhikarla et al. | | Real-time adaptive content retargeting for live multi-view capture and light field display
CN108615221A (en) | | Light field angle super-resolution rate method and device based on the two-dimentional epipolar plane figure of shearing
Legal Events

- C06 / PB01: Publication
- C10 / SE01: Entry into substantive examination (entry into force of request for substantive examination)
- C14 / GR01: Grant of patent or utility model
- TR01: Transfer of patent right, effective date of registration 2020-07-03. Patentee before: Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin 300072. Patentee after: Beijing Youke Nuclear Power Technology Development Co., Ltd., Room 411, Block A, Zhizao Street, Zhongguancun, No. 45 Chengfu Road, Haidian District, Beijing 100080.
- TR01: Transfer of patent right, effective date of registration 2020-10-10. Patentee before: Beijing Youke Nuclear Power Technology Development Co., Ltd. Patentee after: Beijing Lingyunguang Technology Group Co., Ltd., Room 701, 7th Floor, Building 7, Yard 13, Cuihu South Ring Road, Haidian District, Beijing 100094.
- CP01: Change in the name or title of a patent holder, same address: Beijing Lingyunguang Technology Group Co., Ltd. renamed Lingyunguang Technology Co., Ltd.
- TR01: Transfer of patent right, effective date of registration 2021-01-13. Patentee before: Lingyunguang Technology Co., Ltd. Patentee after: Shenzhen Lingyun Shixun Technology Co., Ltd., Room 1101, 11th Floor, Building 2, Zone C, Nanshan Zhiyuan, Nanshan District, Shenzhen, Guangdong 518000.
- CF01: Termination of patent right due to non-payment of annual fee. Granted publication date: 2013-06-19.