CN103002309B - Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera - Google Patents

Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera

Info

Publication number
CN103002309B
Authority
CN
China
Prior art keywords
dynamic
depth
pixels
frame
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210360976.0A
Other languages
Chinese (zh)
Other versions
CN103002309A (en)
Inventor
章国锋 (Guofeng Zhang)
鲍虎军 (Hujun Bao)
姜翰青 (Hanqing Jiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Shangtang Technology Development Co Ltd
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201210360976.0A priority Critical patent/CN103002309B/en
Publication of CN103002309A publication Critical patent/CN103002309A/en
Application granted granted Critical
Publication of CN103002309B publication Critical patent/CN103002309B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The invention discloses a method for spatio-temporally consistent depth recovery from dynamic scene videos captured by multiple synchronized cameras. Multi-view geometry combined with DAISY feature descriptors is used to stereo-match the synchronized multi-view frames at each time instant, yielding an initial depth map for every instant of the multi-view video. A dynamic probability map is then computed for each frame and used to divide its pixels into dynamic and static pixels, which are refined with different spatio-temporally consistent depth optimizations: static pixels are optimized with bundle optimization, combining the color and geometric consistency constraints of multiple adjacent time instants; for dynamic pixels, the color and geometric consistency constraints between corresponding pixels of the multiple cameras at multiple adjacent time instants are accumulated, and the dynamic depth values at each instant are optimized for spatio-temporal consistency. The invention has high application value in fields such as 3D stereoscopic imaging, 3D animation, augmented reality, and motion capture.

Description

Method for Spatio-Temporally Consistent Depth Recovery of Dynamic Scene Videos Captured by Multiple Synchronized Cameras

Technical Field

The present invention relates to stereo matching and depth recovery methods, and in particular to a method for spatio-temporally consistent depth recovery from dynamic scene videos captured by multiple synchronized cameras.

Background Art

Dense depth recovery from video is one of the fundamental techniques of mid-level computer vision, with extremely important applications in 3D modeling, 3D imaging, augmented reality, motion capture, and many other fields. These applications usually require the recovered depth to be highly accurate and spatio-temporally consistent.

The difficulty of dense depth recovery from video lies in recovering depth values that are highly accurate and spatio-temporally consistent for both the static and the dynamic objects in the scene. Although existing depth recovery techniques for static scenes can already recover fairly accurate depth information, the real world is full of moving objects, and for the dynamic objects contained in a video it is hard for existing methods to achieve high accuracy and consistency in both time and space. These methods usually require a relatively large number of fixed, synchronized cameras to capture the scene; at each time instant the synchronized multi-view frames are stereo-matched with multi-view geometry to recover the depth at that instant. Such a capture setup is mainly used for dynamic scenes inside a laboratory and imposes many restrictions in practical shooting. In addition, when optimizing depth over time, existing methods usually use optical flow to find corresponding pixels in frames at different time instants and then fit the depth values or 3D positions of the corresponding points with a line or curve to estimate the depth of the pixels in the current frame. Such temporal 3D smoothing only makes the depths of temporally corresponding pixels more consistent and cannot produce truly accurate depth values; moreover, because optical flow estimation is often unreliable, the depth optimization problem for dynamic points becomes even more complicated and difficult to solve.

Existing video depth recovery methods fall mainly into two categories:

1. Temporally consistent depth recovery for monocular static-scene video

A typical method of this type was proposed by Zhang in 2009: G. Zhang, J. Jia, T.-T. Wong, and H. Bao. Consistent depth maps recovery from a video sequence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6):974-988, 2009. This method first initializes the depth of each frame with traditional multi-view geometry and then, in the temporal domain, uses bundle optimization to aggregate geometric and color consistency over multiple time instants to optimize the depth of the current frame. It recovers high-accuracy depth maps for static scenes; for scenes containing dynamic objects, however, it cannot recover the depth values of the dynamic objects.

2. Depth recovery for multi-view dynamic-scene video

Typical methods of this type are Zitnick's method: C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski. High-quality video view interpolation using a layered representation. ACM Transactions on Graphics, 23:600-608, August 2004; Larsen's method: E. S. Larsen, P. Mordohai, M. Pollefeys, and H. Fuchs. Temporally consistent reconstruction from multiple video streams using enhanced belief propagation. In ICCV, pages 1-8, 2007; and Lei's method: C. Lei, X. D. Chen, and Y. H. Yang. A new multi-view spacetime-consistent depth recovery framework for free viewpoint video rendering. In ICCV, pages 1570-1577, 2009. All of these methods recover depth maps from the multi-view frames synchronized at the same instant and require a relatively large number of fixed, synchronized cameras to capture the dynamic scene, which makes them unsuitable for practical outdoor shooting. Larsen's and Lei's methods optimize depth with spatio-temporal energy optimization and temporal 3D smoothing, respectively, which makes them insufficiently robust to handle cases where optical flow estimation fails badly.

Step 1) of the method for spatio-temporally consistent depth recovery of dynamic scene videos captured by multiple synchronized cameras uses the DAISY feature descriptor proposed by Tola: E. Tola, V. Lepetit, and P. Fua. Daisy: An efficient dense descriptor applied to wide-baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5):815-830, 2010.

Steps 1) and 2) of the method use the Mean-shift technique proposed by Comaniciu: D. Comaniciu, P. Meer, and S. Member. Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:603-619, 2002.

Step 2) of the method uses the Grabcut technique proposed by Rother: C. Rother, V. Kolmogorov, and A. Blake. "Grabcut": interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23:309-314, August 2004.

Steps 1), 2), and 3) of the method use the energy-equation optimization technique proposed by Felzenszwalb: P. F. Felzenszwalb and D. P. Huttenlocher. Efficient belief propagation for early vision. International Journal of Computer Vision, 70(1):41-54, 2006.

Summary of the Invention

The object of the present invention is to overcome the deficiencies of the prior art by providing a method for spatio-temporally consistent depth recovery from dynamic scene videos captured by multiple synchronized cameras.

The steps of the method for spatio-temporally consistent depth recovery of dynamic scene videos captured by multiple synchronized cameras are as follows:

1) Using multi-view geometry combined with DAISY feature vectors, perform stereo matching on the multi-view frames at each time instant to obtain the initial depth map of the multi-view video at that instant;

2) Using the initial depth maps obtained in step 1), compute a dynamic probability map for each frame of the multi-view video, and use the dynamic probability map to divide the pixels of each frame into dynamic pixels and static pixels;

3) For the dynamic and static pixels divided in step 2), perform spatio-temporally consistent depth optimization with different optimization methods. For static pixels, use bundle optimization, combining the color and geometric consistency constraints of multiple adjacent time instants; for dynamic pixels, accumulate the color and geometric consistency constraint information between corresponding pixels of the multiple cameras at multiple adjacent time instants, and optimize the dynamic depth values at each instant for spatio-temporal consistency.

Step 1) is as follows:

(1) Using multi-view geometry combined with DAISY feature descriptors, perform stereo matching on the multi-view frames at the same time instant, and solve for the initial depth map of each frame through the following energy optimization equation:

$$E_D(D_m^t; \hat{I}^{(t)}) = E_d(D_m^t; \hat{I}^{(t)}) + E_s(D_m^t)$$

where $\hat{I}^{(t)}$ denotes the M synchronized multi-view frames at time t, $I_m^t$ denotes the frame of the m-th video at time t, and $D_m^t$ denotes the depth map of the m-th video at time t; $E_d(D_m^t; \hat{I}^{(t)})$ is the data term, measuring the DAISY feature similarity between the pixels of $I_m^t$ and their projections, computed from $D_m^t$, into the remaining frames of $\hat{I}^{(t)}$; it is computed as:

$$E_d(D_m^t; \hat{I}^{(t)}) = \sum_{x_m^t} \frac{\sum_{m' \neq m} L_d\bigl(x_m^t, D_m^t(x_m^t); I_m^t, I_{m'}^t\bigr)}{M-1}$$

where $L_d$ is a penalty function that estimates the DAISY feature similarity of corresponding pixels, each pixel being represented by its DAISY feature descriptor, and $x_{m'}^t$ is the projection of $x_m^t$ into $I_{m'}^t$ obtained with the depth $D_m^t(x_m^t)$; $E_s(D_m^t)$ is the smoothness term, measuring the smoothness of depth between adjacent pixels x and y; it is computed as:

$$E_s(D_m^t) = \lambda \sum_x \sum_{y \in N(x)} \min\bigl\{\,|D_m^t(x) - D_m^t(y)|,\ \eta\,\bigr\}$$

where the smoothing weight λ is 0.008 and the truncation value η of the depth difference is 3;
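
As an illustration of these two terms, the following sketch evaluates, for one candidate depth map, the data cost from precomputed DAISY descriptor maps and the truncated-L1 smoothness cost. It is a minimal sketch under stated assumptions, not the patented implementation: the helper names, the descriptor distance used for $L_d$, and the assumption that descriptors at the projected positions have already been sampled into per-view arrays are all illustrative.

```python
import numpy as np

def daisy_penalty(desc_ref, desc_proj):
    # Assumed form of L_d: dissimilarity between a pixel's DAISY descriptor and
    # the descriptor sampled at its projection in another view (lower = similar).
    return np.linalg.norm(desc_ref - desc_proj, axis=-1)

def data_cost(desc_ref, projected_descs):
    # E_d for one candidate depth map: average DAISY penalty over the M-1 other
    # synchronized views; `projected_descs` is a list of HxWxK arrays holding
    # the descriptors already sampled at the projected positions.
    return np.mean([daisy_penalty(desc_ref, d) for d in projected_descs], axis=0)

def smoothness_cost(depth, lam=0.008, eta=3.0):
    # E_s: truncated L1 difference between 4-connected neighbouring depths,
    # with the smoothing weight 0.008 and truncation 3 quoted in the text.
    dx = np.minimum(np.abs(np.diff(depth, axis=1)), eta)
    dy = np.minimum(np.abs(np.diff(depth, axis=0)), eta)
    return lam * (dx.sum() + dy.sum())
```

In the patent the total energy $E_D$ is minimized per frame with belief propagation (Felzenszwalb and Huttenlocher); the sketch only evaluates the two terms for a given depth map.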

(2) Use the consistency of the initial depths of the multi-view frames in 3D space to determine whether each pixel of each frame is visible in the remaining cameras at the same time instant, thereby obtaining pairwise visibility maps between the cameras at that instant. The visibility map is computed as:

$$V_{m \to m'}^t(x_m^t) = \begin{cases} 1, & \bigl|D_{m \to m'}^t(x_m^t) - D_{m'}^t(x_{m'}^t)\bigr| \le \delta_d \\ 0, & \bigl|D_{m \to m'}^t(x_m^t) - D_{m'}^t(x_{m'}^t)\bigr| > \delta_d \end{cases}$$

where $V_{m \to m'}^t(x_m^t)$ indicates whether $x_m^t$ is visible in $I_{m'}^t$ (1 means visible, 0 means invisible); $\delta_d$ is the depth-difference threshold, and $D_{m \to m'}^t(x_m^t)$ is obtained by projecting $x_m^t$ onto $I_{m'}^t$ using $D_m^t(x_m^t)$. Using the resulting visibility maps, an overall visibility is computed for each pixel $x_m^t$: it is 0 if $x_m^t$ is invisible in all the remaining video frames at time t, and 1 otherwise;
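
A minimal sketch of this visibility test is given below. It assumes the warped depth $D_{m \to m'}^t$ and the depth of view m' sampled at the projected positions have already been computed; the function names and array layout are illustrative assumptions.

```python
import numpy as np

def visibility_map(depth_warped, depth_at_proj, delta_d):
    # V_{m->m'}^t: a pixel of view m is marked visible in view m' when the depth
    # it predicts at its projection agrees with view m''s own depth map there.
    return (np.abs(depth_warped - depth_at_proj) <= delta_d).astype(np.uint8)

def overall_visibility(pairwise_maps):
    # Overall visibility at time t: 0 only when the pixel is invisible in every
    # one of the remaining views, 1 otherwise.
    return (np.stack(pairwise_maps, axis=0).sum(axis=0) > 0).astype(np.uint8)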

(3) Re-initialize the depth map of each frame using the visibility maps just obtained, comparing and estimating the DAISY feature similarity only at the visible pixels. In addition, because the initial depth values of the invisible (occluded) pixels may be erroneous, segment each frame with Mean-shift; for each segment, fit a plane with parameters [a, b, c] to the depths of the visible pixels, and use the fitted plane to redefine the data term of the invisible pixels:

$$E_d(x_m^t, D_m^t) = \sum_{x_m^t} \frac{\sigma_d}{\sigma_d + \bigl|a x + b y + c - D_m^t(x_m^t)\bigr|}$$

where $\sigma_d$ controls the sensitivity of the data term to the distance between the depth value and the fitted plane, and x and y are the coordinates of the pixel $x_m^t$; energy optimization with the redefined data term then corrects the erroneous depth values of the occluded pixels.
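
The plane fitting for one Mean-shift segment can be sketched as a least-squares problem, as below. This is an illustrative sketch: the segment masks, the choice of solver, and the value of $\sigma_d$ are assumptions, not details fixed by the patent.

```python
import numpy as np

def fit_segment_plane(depth, xs, ys, visible_mask):
    # Fit D(x, y) ~ a*x + b*y + c over the visible pixels of one segment
    # by linear least squares; returns the plane parameters [a, b, c].
    A = np.stack([xs[visible_mask], ys[visible_mask],
                  np.ones(visible_mask.sum())], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, depth[visible_mask], rcond=None)
    return coeffs  # [a, b, c]

def plane_data_term(depth_candidate, xs, ys, plane, sigma_d=1.0):
    # Redefined data term for occluded pixels: largest when the candidate depth
    # lies on the fitted plane (sigma_d = 1.0 is a placeholder value).
    a, b, c = plane
    return sigma_d / (sigma_d + np.abs(a * xs + b * ys + c - depth_candidate))
```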

Step 2) is as follows:

(1) For each pixel of each frame, project it to the frames at the remaining time instants using its initial depth, compare the geometric and color consistency between the pixel in the current frame and its corresponding positions in those frames, and count the proportion of the remaining frames in which the depth value and color value are consistent as the probability that the pixel belongs to a dynamic object, thereby obtaining the dynamic probability map of each frame; it is computed as:

$$P_d(x_m^t) = \frac{\bigl|\{(m', t') \in N(m,t) \mid C_{m \to m'}^{t \to t'}(x_m^t) = \mathrm{dynamic}\}\bigr|}{|N(m,t)|}$$

where the heuristic function $C_{m \to m'}^{t \to t'}$ is used to judge whether the geometry and color of $x_m^t$ are consistent in the remaining frames: first the depth difference between $x_m^t$ and its corresponding position is compared; if the depth value at the corresponding position is not similar to the depth of $x_m^t$, the geometry is considered inconsistent; if the depth values are similar, their color values are compared, and if the colors are similar the color values are considered consistent, otherwise they are considered inconsistent. The proportion of the remaining frames having consistent depth and color values is counted as the probability that the pixel belongs to a dynamic object;
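
The per-frame check and the resulting probability can be sketched as below. The mapping from the consistency test to the "dynamic" label is only implicit in the text, so the mapping used here (an inconsistent geometry or color comparison votes "dynamic") is an assumption of this sketch, as are the tolerance values.

```python
import numpy as np

def consistency_vote(depth_ref, depth_at_proj, color_ref, color_at_proj,
                     depth_tol=0.05, color_tol=30.0):
    # Heuristic C_{m->m'}^{t->t'}: first compare depths at the reprojected
    # position, then colors; here an inconsistent result votes "dynamic".
    # (depth_tol / color_tol are placeholder tolerances, not patent values.)
    if abs(depth_ref - depth_at_proj) > depth_tol:
        return "dynamic"                       # geometry inconsistent
    color_diff = np.abs(np.asarray(color_ref, float)
                        - np.asarray(color_at_proj, float)).sum()
    return "dynamic" if color_diff > color_tol else "static"

def dynamic_probability(votes):
    # P_d(x): fraction of the neighbouring (view, time) frames N(m, t)
    # whose vote is "dynamic".
    return sum(v == "dynamic" for v in votes) / max(len(votes), 1)
```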

(2) Binarize the dynamic probability map with a threshold $\eta_p$ of 0.4 to obtain the initial dynamic/static segmentation of each frame. Over-segment each frame with Mean-shift (that is, segment it at fine granularity); for each segment, compute the proportion of pixels that are dynamic after binarization, and if the proportion exceeds 0.5 mark all pixels of the segment as dynamic, otherwise mark them as static. This adjusts the boundaries of the binarized segmentation and removes noise;
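
A minimal sketch of this region-voting refinement is given below; it assumes a precomputed over-segmentation label map (e.g., from a Mean-shift implementation) and uses the thresholds 0.4 and 0.5 quoted in the text.

```python
import numpy as np

def refine_by_segments(prob_map, segment_labels, eta_p=0.4, region_ratio=0.5):
    # Binarize the dynamic probability map, then let each over-segmented
    # region vote: the whole region becomes dynamic when more than half of
    # its binarized pixels are dynamic.
    binary = prob_map > eta_p
    refined = np.zeros_like(binary)
    for seg_id in np.unique(segment_labels):
        mask = segment_labels == seg_id
        refined[mask] = binary[mask].mean() > region_ratio
    return refined
```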

(3) Using the coordinate offsets of corresponding pixels between images at consecutive time instants, track the pixels of each frame to the adjacent frames of the same video to find the corresponding pixels, and count the proportion of frames in which the corresponding pixel's segmentation label is dynamic; from this the temporal dynamic probability of the pixel is computed as:

$$P_d'(x_m^t) = \frac{\bigl|\{t' \in N(t) \mid S_m^{t'}\bigl(x_m^t + O_m^{t \to t'}(x_m^t)\bigr) = \mathrm{dynamic}\}\bigr|}{|N(t)|}$$

where $O_m^{t \to t'}(x_m^t)$ denotes the optical-flow offset of $x_m^t$ from time t to time t', $S_m^{t'}$ denotes the dynamic/static segmentation label of the corresponding pixel at time t', and N(t) denotes the 5 consecutive adjacent frames before and after t. Using the temporal dynamic probability, the dynamic/static segmentation of each frame is optimized through the following energy optimization equation:

$$E_S(S_m^t; P_d', I_m^t) = E_d(S_m^t; P_d') + E_s(S_m^t; I_m^t)$$

where $S_m^t$ denotes the dynamic/static segmentation of frame t of video m; the data term $E_d$ is defined as:

$$E_d(S_m^t; P_d') = \sum_{x_m^t} e_d\bigl(S_m^t(x_m^t)\bigr)$$

$$e_d\bigl(S_m^t(x_m^t)\bigr) = \begin{cases} -\log\bigl(1 - P_d'(x_m^t)\bigr), & S_m^t(x_m^t) = \mathrm{static} \\ -\log\bigl(P_d'(x_m^t)\bigr), & S_m^t(x_m^t) = \mathrm{dynamic} \end{cases}$$

The smoothness term $E_s$ encourages the segmentation boundary to coincide with image boundaries as closely as possible; it is defined as:

$$E_s(S_m^t; I_m^t) = \lambda \sum_x \sum_{y \in N(x)} \frac{\bigl|S_m^t(x) - S_m^t(y)\bigr|}{1 + \bigl\|I_m^t(x) - I_m^t(y)\bigr\|^2}$$

The energy-optimized dynamic/static segmentation is then further refined with the Grabcut segmentation technique, which removes jagged artifacts along the segmentation boundary and yields the final temporally consistent dynamic/static division.
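
The temporal dynamic probability and the two terms of this segmentation energy can be sketched as follows; the flow-field layout, the clipping of the log terms, the smoothing weight, and the neighbourhood radius are illustrative assumptions (the full energy would then be minimized with belief propagation or graph cuts, and the result refined with Grabcut).

```python
import numpy as np

def temporal_dynamic_probability(seg_maps, flows, x, y, t, radius=5):
    # P'_d(x): track the pixel with optical flow to the neighbouring frames of
    # the same video and count how often it lands on a "dynamic" label (== 1).
    votes = []
    for tp in range(t - radius, t + radius + 1):
        if tp == t or tp < 0 or tp >= len(seg_maps):
            continue
        dx, dy = flows[(t, tp)][y, x]                     # offset O_m^{t->t'}
        xp, yp = int(round(x + dx)), int(round(y + dy))
        h, w = seg_maps[tp].shape
        if 0 <= xp < w and 0 <= yp < h:
            votes.append(seg_maps[tp][yp, xp] == 1)
    return float(np.mean(votes)) if votes else 0.0

def seg_data_cost(p_dyn, eps=1e-6):
    # e_d: per-pixel cost of labelling static (channel 0) or dynamic (channel 1).
    p = np.clip(p_dyn, eps, 1.0 - eps)
    return np.stack([-np.log(1.0 - p), -np.log(p)], axis=0)

def seg_smooth_weight(image, lam=1.0):
    # Pairwise weight for horizontal neighbours: label changes are cheap where
    # the image gradient is strong (vertical neighbours handled analogously).
    # lam is a placeholder smoothing weight, not a value given in the text.
    diff = np.linalg.norm(np.diff(image.astype(float), axis=1), axis=-1)
    return lam / (1.0 + diff ** 2)
```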

Step 3) is as follows:

(1) For static pixels, use bundle optimization to accumulate the color and geometric consistency constraint information between the pixels of the current frame and the corresponding pixels in multiple adjacent frames of the multi-view video, and optimize the static depth values of the current frame accordingly;

(2) For a dynamic pixel $x_m^t$, let its candidate depth be d. First project it with d into the video m' at the same time t to obtain the corresponding pixel $x_{m'}^t$, and compare the color and geometric consistency of $x_m^t$ and $x_{m'}^t$, computed as:

$$L_g(x_m^t, x_{m'}^t) = p_c(x_m^t, x_{m'}^t)\; p_g(x_m^t, x_{m'}^t)$$

where $p_c$ estimates the color consistency of $x_m^t$ and $x_{m'}^t$; it is computed as:

$$p_c(x_m^t, x_{m'}^t) = \frac{\sigma_c}{\sigma_c + \bigl\|I_m^t(x_m^t) - I_{m'}^t(x_{m'}^t)\bigr\|_1}$$

$\sigma_c$ controls the sensitivity to color differences;

$p_g$ estimates the geometric consistency of $x_m^t$ and $x_{m'}^t$; it is computed as:

$$p_g(x_m^t, x_{m'}^t) = \frac{\sigma_g}{\sigma_g + d_g\bigl(x_m^t, x_{m'}^t; D_m^t, D_{m'}^t\bigr)}$$

$\sigma_g$ controls the sensitivity to depth differences. The symmetric projection error function $d_g$ projects $x_m^t$ into the video m' at the same time t and computes the distance between this projection and $x_{m'}^t$, likewise projects $x_{m'}^t$ into the video m at time t and computes the distance between that projection and $x_m^t$, and then takes the average of the two distances;
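
These two robust likelihood terms can be sketched as below. The values of $\sigma_c$ and $\sigma_g$ are not specified in this passage, so the defaults here are placeholders, and the symmetric reprojection error is assumed to have been computed from the two depth maps beforehand.

```python
import numpy as np

def color_consistency(color_ref, color_proj, sigma_c=10.0):
    # p_c: robust colour agreement between a pixel and its projection
    # (L1 colour distance; sigma_c = 10.0 is a placeholder).
    diff = np.abs(np.asarray(color_ref, float) - np.asarray(color_proj, float)).sum()
    return sigma_c / (sigma_c + diff)

def geometric_consistency(sym_reproj_error, sigma_g=1.0):
    # p_g: robust geometric agreement from the symmetric reprojection error d_g
    # (average of the two cross-projection distances; sigma_g is a placeholder).
    return sigma_g / (sigma_g + sym_reproj_error)

def pair_likelihood(color_ref, color_proj, sym_reproj_error,
                    sigma_c=10.0, sigma_g=1.0):
    # L_g = p_c * p_g for one candidate depth and one other view.
    return (color_consistency(color_ref, color_proj, sigma_c)
            * geometric_consistency(sym_reproj_error, sigma_g))
```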

Next, optical flow is used to track $x_m^t$ and $x_{m'}^t$ to an adjacent time t', giving the corresponding pixels $\hat{x}_m^{t'}$ and $\hat{x}_{m'}^{t'}$, whose color and geometric consistency is compared in the same way:

$$L_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'}) = p_c(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})\; p_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})$$

The color and geometric consistency estimates of multiple adjacent time instants are accumulated, and the data term of the energy equation for optimizing the depth of dynamic pixels is redefined accordingly:

$$E_d'(D_m^t; \hat{I}, \hat{D}) = \sum_{x_m^t} \left( 1 - \frac{\sum_{t' \in N(t)} \sum_{m' \neq m} L_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})}{(M-1)\,|N(t)|} \right)$$

The energy optimization equation is then solved with the redefined data term, optimizing the depth values of the dynamic pixels of each frame in the spatio-temporal domain.
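
For one dynamic pixel, the accumulation of these likelihoods into the redefined data term can be sketched as follows; the winner-take-all fallback at the end is only for illustration, since the patent minimizes the full energy (data plus smoothness) with belief propagation.

```python
import numpy as np

def dynamic_data_cost(likelihoods, num_views, num_neighbor_times):
    # E'_d contribution for one pixel and one candidate depth d: one minus the
    # average L_g over the other M-1 views and the |N(t)| neighbouring times.
    return 1.0 - np.sum(likelihoods) / ((num_views - 1) * num_neighbor_times)

def best_candidate_depth(costs_per_candidate, depth_candidates):
    # Illustrative winner-take-all over the sampled candidate depths.
    return depth_candidates[int(np.argmin(costs_per_candidate))]
```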

For the dynamic objects contained in a video scene, existing depth recovery methods struggle to achieve high accuracy and spatio-temporal consistency. They usually require a relatively large number of fixed, synchronized cameras to capture the scene, a shooting setup that is mainly used for dynamic scenes inside a laboratory and imposes many restrictions in practice. The method for spatio-temporally consistent depth recovery of dynamic scene videos captured by multiple synchronized cameras proposed by the present invention can recover an accurate depth map at every time instant for both the dynamic and the static objects in a multi-view video, while also keeping the depth maps highly consistent across time instants. The method allows the cameras to move freely and independently and can handle dynamic scenes captured by a small number of cameras (as few as two), which makes it more practical in real shooting.

Brief Description of the Drawings

Fig. 1 is a flowchart of the method for spatio-temporally consistent depth recovery of dynamic scene videos captured by multiple synchronized cameras;

Fig. 2(a) is one frame of a video sequence;

Fig. 2(b) is the frame synchronized with Fig. 2(a);

Fig. 2(c) is the initial depth map of Fig. 2(a);

Fig. 2(d) is the visibility map estimated from Fig. 2(a) and Fig. 2(b);

Fig. 2(e) is the initial depth map corrected by plane fitting using Fig. 2(d);

Fig. 3(a) is the dynamic probability map of Fig. 2(a);

Fig. 3(b) is the dynamic/static segmentation obtained from Fig. 3(a) after binarization followed by boundary adjustment and denoising with Mean-shift segmentation;

Fig. 3(c) is the segmentation after temporal optimization;

Fig. 3(d) is the segmentation refined with the Grabcut technique;

Fig. 3(e) shows close-ups of the boxed regions of Figs. 3(a)-(d);

Fig. 4(a) is one frame of a video sequence;

Fig. 4(b) is the dynamic/static segmentation of Fig. 4(a);

Fig. 4(c) is the depth map of Fig. 4(a) after spatio-temporal consistency optimization;

Fig. 4(d) shows close-ups of the boxed regions of Fig. 4(a) and Fig. 4(c);

Fig. 4(e) is another frame of the video sequence;

Fig. 4(f) is the depth map of Fig. 4(e) after spatio-temporal consistency optimization;

Fig. 4(g) is the 3D scene model reconstructed from Fig. 4(f), with texture mapping applied;

Fig. 5 is a schematic diagram of the spatio-temporally consistent depth optimization.

Detailed Description of the Embodiments

The steps of the method for spatio-temporally consistent depth recovery of dynamic scene videos captured by multiple synchronized cameras are as follows:

1) Using multi-view geometry combined with DAISY feature vectors, perform stereo matching on the multi-view frames at each time instant to obtain the initial depth map of the multi-view video at that instant;

2) Using the initial depth maps obtained in step 1), compute a dynamic probability map for each frame of the multi-view video, and use the dynamic probability map to divide the pixels of each frame into dynamic pixels and static pixels;

3) For the dynamic and static pixels divided in step 2), perform spatio-temporally consistent depth optimization with different optimization methods. For static pixels, use bundle optimization, combining the color and geometric consistency constraints of multiple adjacent time instants; for dynamic pixels, accumulate the color and geometric consistency constraint information between corresponding pixels of the multiple cameras at multiple adjacent time instants, and optimize the dynamic depth values at each instant for spatio-temporal consistency.

Step 1) is as follows:

(1) Using multi-view geometry combined with DAISY feature descriptors, perform stereo matching on the multi-view frames at the same time instant, and solve for the initial depth map of each frame through the following energy optimization equation:

$$E_D(D_m^t; \hat{I}^{(t)}) = E_d(D_m^t; \hat{I}^{(t)}) + E_s(D_m^t)$$

where $\hat{I}^{(t)}$ denotes the M synchronized multi-view frames at time t, $I_m^t$ denotes the frame of the m-th video at time t, and $D_m^t$ denotes the depth map of the m-th video at time t; $E_d(D_m^t; \hat{I}^{(t)})$ is the data term, measuring the DAISY feature similarity between the pixels of $I_m^t$ and their projections, computed from $D_m^t$, into the remaining frames of $\hat{I}^{(t)}$; it is computed as:

$$E_d(D_m^t; \hat{I}^{(t)}) = \sum_{x_m^t} \frac{\sum_{m' \neq m} L_d\bigl(x_m^t, D_m^t(x_m^t); I_m^t, I_{m'}^t\bigr)}{M-1}$$

where $L_d$ is a penalty function that estimates the DAISY feature similarity of corresponding pixels, each pixel being represented by its DAISY feature descriptor, and $x_{m'}^t$ is the projection of $x_m^t$ into $I_{m'}^t$ obtained with the depth $D_m^t(x_m^t)$; $E_s(D_m^t)$ is the smoothness term, measuring the smoothness of depth between adjacent pixels x and y; it is computed as:

$$E_s(D_m^t) = \lambda \sum_x \sum_{y \in N(x)} \min\bigl\{\,|D_m^t(x) - D_m^t(y)|,\ \eta\,\bigr\}$$

where the smoothing weight λ is 0.008 and the truncation value η of the depth difference is 3;

(2) Use the consistency of the initial depths of the multi-view frames in 3D space to determine whether each pixel of each frame is visible in the remaining cameras at the same time instant, thereby obtaining pairwise visibility maps between the cameras at that instant. The visibility map is computed as:

$$V_{m \to m'}^t(x_m^t) = \begin{cases} 1, & \bigl|D_{m \to m'}^t(x_m^t) - D_{m'}^t(x_{m'}^t)\bigr| \le \delta_d \\ 0, & \bigl|D_{m \to m'}^t(x_m^t) - D_{m'}^t(x_{m'}^t)\bigr| > \delta_d \end{cases}$$

where $V_{m \to m'}^t(x_m^t)$ indicates whether $x_m^t$ is visible in $I_{m'}^t$ (1 means visible, 0 means invisible); $\delta_d$ is the depth-difference threshold, and $D_{m \to m'}^t(x_m^t)$ is obtained by projecting $x_m^t$ onto $I_{m'}^t$ using $D_m^t(x_m^t)$. Using the resulting visibility maps, an overall visibility is computed for each pixel $x_m^t$: it is 0 if $x_m^t$ is invisible in all the remaining video frames at time t, and 1 otherwise;

(3) Re-initialize the depth map of each frame using the visibility maps just obtained, comparing and estimating the DAISY feature similarity only at the visible pixels. In addition, because the initial depth values of the invisible (occluded) pixels may be erroneous, segment each frame with Mean-shift; for each segment, fit a plane with parameters [a, b, c] to the depths of the visible pixels, and use the fitted plane to redefine the data term of the invisible pixels:

$$E_d(x_m^t, D_m^t) = \sum_{x_m^t} \frac{\sigma_d}{\sigma_d + \bigl|a x + b y + c - D_m^t(x_m^t)\bigr|}$$

where $\sigma_d$ controls the sensitivity of the data term to the distance between the depth value and the fitted plane, and x and y are the coordinates of the pixel $x_m^t$; energy optimization with the redefined data term then corrects the erroneous depth values of the occluded pixels.

Step 2) is as follows:

(1) For each pixel of each frame, project it to the frames at the remaining time instants using its initial depth, compare the geometric and color consistency between the pixel in the current frame and its corresponding positions in those frames, and count the proportion of the remaining frames in which the depth value and color value are consistent as the probability that the pixel belongs to a dynamic object, thereby obtaining the dynamic probability map of each frame; it is computed as:

$$P_d(x_m^t) = \frac{\bigl|\{(m', t') \in N(m,t) \mid C_{m \to m'}^{t \to t'}(x_m^t) = \mathrm{dynamic}\}\bigr|}{|N(m,t)|}$$

where the heuristic function $C_{m \to m'}^{t \to t'}$ is used to judge whether the geometry and color of $x_m^t$ are consistent in the remaining frames: first the depth difference between $x_m^t$ and its corresponding position is compared; if the depth value at the corresponding position is not similar to the depth of $x_m^t$, the geometry is considered inconsistent; if the depth values are similar, their color values are compared, and if the colors are similar the color values are considered consistent, otherwise they are considered inconsistent. The proportion of the remaining frames having consistent depth and color values is counted as the probability that the pixel belongs to a dynamic object;

(2) Binarize the dynamic probability map with a threshold $\eta_p$ of 0.4 to obtain the initial dynamic/static segmentation of each frame. Over-segment each frame with Mean-shift (that is, segment it at fine granularity); for each segment, compute the proportion of pixels that are dynamic after binarization, and if the proportion exceeds 0.5 mark all pixels of the segment as dynamic, otherwise mark them as static. This adjusts the boundaries of the binarized segmentation and removes noise;

(3) Using the coordinate offsets of corresponding pixels between images at consecutive time instants, track the pixels of each frame to the adjacent frames of the same video to find the corresponding pixels, and count the proportion of frames in which the corresponding pixel's segmentation label is dynamic; from this the temporal dynamic probability of the pixel is computed as:

$$P_d'(x_m^t) = \frac{\bigl|\{t' \in N(t) \mid S_m^{t'}\bigl(x_m^t + O_m^{t \to t'}(x_m^t)\bigr) = \mathrm{dynamic}\}\bigr|}{|N(t)|}$$

where $O_m^{t \to t'}(x_m^t)$ denotes the optical-flow offset of $x_m^t$ from time t to time t', $S_m^{t'}$ denotes the dynamic/static segmentation label of the corresponding pixel at time t', and N(t) denotes the 5 consecutive adjacent frames before and after t. Using the temporal dynamic probability, the dynamic/static segmentation of each frame is optimized through the following energy optimization equation:

$$E_S(S_m^t; P_d', I_m^t) = E_d(S_m^t; P_d') + E_s(S_m^t; I_m^t)$$

where $S_m^t$ denotes the dynamic/static segmentation of frame t of video m; the data term $E_d$ is defined as:

$$E_d(S_m^t; P_d') = \sum_{x_m^t} e_d\bigl(S_m^t(x_m^t)\bigr)$$

$$e_d\bigl(S_m^t(x_m^t)\bigr) = \begin{cases} -\log\bigl(1 - P_d'(x_m^t)\bigr), & S_m^t(x_m^t) = \mathrm{static} \\ -\log\bigl(P_d'(x_m^t)\bigr), & S_m^t(x_m^t) = \mathrm{dynamic} \end{cases}$$

The smoothness term $E_s$ encourages the segmentation boundary to coincide with image boundaries as closely as possible; it is defined as:

$$E_s(S_m^t; I_m^t) = \lambda \sum_x \sum_{y \in N(x)} \frac{\bigl|S_m^t(x) - S_m^t(y)\bigr|}{1 + \bigl\|I_m^t(x) - I_m^t(y)\bigr\|^2}$$

The energy-optimized dynamic/static segmentation is then further refined with the Grabcut segmentation technique, which removes jagged artifacts along the segmentation boundary and yields the final temporally consistent dynamic/static division.

Step 3) is as follows:

(1) For static pixels, use bundle optimization to accumulate the color and geometric consistency constraint information between the pixels of the current frame and the corresponding pixels in multiple adjacent frames of the multi-view video, and optimize the static depth values of the current frame accordingly;

(2) For a dynamic pixel $x_m^t$, let its candidate depth be d. First project it with d into the video m' at the same time t to obtain the corresponding pixel $x_{m'}^t$, and compare the color and geometric consistency of $x_m^t$ and $x_{m'}^t$, computed as:

$$L_g(x_m^t, x_{m'}^t) = p_c(x_m^t, x_{m'}^t)\; p_g(x_m^t, x_{m'}^t)$$

where $p_c$ estimates the color consistency of $x_m^t$ and $x_{m'}^t$; it is computed as:

$$p_c(x_m^t, x_{m'}^t) = \frac{\sigma_c}{\sigma_c + \bigl\|I_m^t(x_m^t) - I_{m'}^t(x_{m'}^t)\bigr\|_1}$$

$\sigma_c$ controls the sensitivity to color differences;

$p_g$ estimates the geometric consistency of $x_m^t$ and $x_{m'}^t$; it is computed as:

$$p_g(x_m^t, x_{m'}^t) = \frac{\sigma_g}{\sigma_g + d_g\bigl(x_m^t, x_{m'}^t; D_m^t, D_{m'}^t\bigr)}$$

$\sigma_g$ controls the sensitivity to depth differences. The symmetric projection error function $d_g$ projects $x_m^t$ into the video m' at the same time t and computes the distance between this projection and $x_{m'}^t$, likewise projects $x_{m'}^t$ into the video m at time t and computes the distance between that projection and $x_m^t$, and then takes the average of the two distances;

Next, optical flow is used to track $x_m^t$ and $x_{m'}^t$ to an adjacent time t', giving the corresponding pixels $\hat{x}_m^{t'}$ and $\hat{x}_{m'}^{t'}$, whose color and geometric consistency is compared in the same way:

$$L_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'}) = p_c(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})\; p_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})$$

The color and geometric consistency estimates of multiple adjacent time instants are accumulated, and the data term of the energy equation for optimizing the depth of dynamic pixels is redefined accordingly:

$$E_d'(D_m^t; \hat{I}, \hat{D}) = \sum_{x_m^t} \left( 1 - \frac{\sum_{t' \in N(t)} \sum_{m' \neq m} L_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})}{(M-1)\,|N(t)|} \right)$$

The energy optimization equation is then solved with the redefined data term, optimizing the depth values of the dynamic pixels of each frame in the spatio-temporal domain.

Example

As shown in Fig. 1, the steps of the method for spatio-temporally consistent depth recovery of dynamic scene videos captured by multiple synchronized cameras are as follows:

1) Using multi-view geometry combined with DAISY feature vectors, perform stereo matching on the multi-view frames at each time instant to obtain the initial depth map of the multi-view video at that instant;

2) Using the initial depth maps obtained in step 1), compute a dynamic probability map for each frame of the multi-view video, and use the dynamic probability map to classify the pixels of each frame as dynamic or static;

3) For the dynamic and static pixels divided in step 2), perform spatio-temporally consistent depth optimization with different optimization methods. For static points, use bundle optimization, combining the color and geometric consistency constraints of multiple adjacent time instants; for dynamic points, accumulate the color and geometric consistency constraint information between corresponding pixels of the multiple cameras at multiple adjacent time instants, and optimize the dynamic depth values at each instant for spatio-temporal consistency.

Step 1) proceeds as follows:

(1) Using multi-view geometry combined with DAISY feature descriptors, perform stereo matching on the binocular frames at the same time instant shown in Fig. 2(a) and Fig. 2(b), and solve for the initial depth map of each frame through the energy optimization equation, as shown in Fig. 2(c);

(2) Use the consistency of the initial depths of the multi-view frames in 3D space to determine whether each pixel of each frame is visible in the remaining cameras at the same time instant, thereby obtaining pairwise visibility maps between the cameras at that instant, as shown in Fig. 2(d);

(3) Re-initialize the depth map of each frame using the visibility maps just obtained, comparing and estimating the DAISY feature similarity only at the visible pixels. In addition, when the initial depth values of invisible pixels are erroneous, segment each frame with Mean-shift; for each segment, fit a plane to the depths of the visible pixels and use the fitted plane to fill in and correct the depth values of the invisible pixels, as shown in Fig. 2(e);

Step 2) proceeds as follows:

(1) For each pixel of each frame, project it to the frames at the remaining time instants using its initial depth, compare the geometric and color consistency between the pixel in the current frame and its corresponding positions in those frames, and count the proportion of the remaining frames in which the depth value and color value are consistent as the probability that the pixel belongs to a dynamic object, thereby obtaining the dynamic probability map of each frame, as shown in Fig. 3(a);

(2) Binarize the dynamic probability map to obtain the initial dynamic/static segmentation of each frame. Over-segment each frame with Mean-shift (that is, segment it at fine granularity); for each segment, compute the proportion of pixels that are dynamic after binarization, and if the proportion exceeds 0.5 mark all pixels of the segment as dynamic, otherwise mark them as static. This adjusts the boundaries of the binarized segmentation and removes noise, as shown in Fig. 3(b);

(3) Using the coordinate offsets of corresponding pixels between images at consecutive time instants, track the pixels of each frame to the adjacent frames of the same video to find the corresponding pixels, count the proportion of frames in which the corresponding pixel's segmentation label is dynamic, compute from this the temporal dynamic probability of each pixel, and optimize the dynamic/static segmentation of each frame through the energy optimization equation, as shown in Fig. 3(c). The result of Fig. 3(c) is then further refined with the Grabcut segmentation technique, which removes jagged artifacts along the segmentation boundary and yields the final temporally consistent dynamic/static division, as shown in Fig. 3(d);

Step 3) proceeds as follows:

(1) For static points, use bundle optimization to accumulate the color and geometric consistency constraint information between the pixels of the current frame and the corresponding pixels in multiple adjacent frames of the multi-view video, and optimize the static depth values of the current frame accordingly;

(2) The spatio-temporally consistent depth optimization for dynamic points is illustrated in Fig. 5. Let the candidate depth of a pixel be d. First project the pixel with d into the video m' at the same time t to obtain the corresponding pixel, and compare the color and geometric consistency of the two pixels. Next, use optical flow to track both pixels to an adjacent time t' to obtain the corresponding pixels there, and compare their color and geometric consistency. The color and geometric consistency estimates of multiple adjacent time instants are accumulated, and the dynamic pixel depth values of each frame are optimized in the spatio-temporal domain with the energy optimization equation, giving depth maps that are consistent over the spatio-temporal domain, as shown in Fig. 4(c) and Fig. 4(f).
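
The symmetric projection error $d_g$ used in this comparison can be sketched with a standard pinhole model as below. It is an illustrative sketch only: the camera convention (x_cam = R·X + t, depth taken as the camera-space z) and the helper names are assumptions, not details given by the patent.

```python
import numpy as np

def project(K, R, t, X):
    # Pinhole projection of a world-space 3D point X into pixel coordinates.
    x_cam = R @ X + t
    x_img = K @ x_cam
    return x_img[:2] / x_img[2], x_cam[2]          # (pixel, depth)

def backproject(K, R, t, pixel, depth):
    # Lift a pixel with known depth back to world coordinates.
    x_cam = depth * (np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0]))
    return R.T @ (x_cam - t)

def symmetric_projection_error(cam_m, cam_mp, x_m, d_m, x_mp, d_mp):
    # d_g: project x_m (with its depth) into view m' and measure the distance
    # to x_mp; project x_mp (with its depth) into view m and measure the
    # distance to x_m; return the average of the two distances.
    K_m, R_m, t_m = cam_m
    K_mp, R_mp, t_mp = cam_mp
    X_m = backproject(K_m, R_m, t_m, x_m, d_m)
    X_mp = backproject(K_mp, R_mp, t_mp, x_mp, d_mp)
    proj_in_mp, _ = project(K_mp, R_mp, t_mp, X_m)
    proj_in_m, _ = project(K_m, R_m, t_m, X_mp)
    err1 = np.linalg.norm(proj_in_mp - np.asarray(x_mp, float))
    err2 = np.linalg.norm(proj_in_m - np.asarray(x_m, float))
    return 0.5 * (err1 + err2)
```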

Claims (4)

1.一种对于多目同步摄像机拍摄的动态场景视频的时空一致性深度恢复的方法,其特征在于它的步骤如下:  1. a method for the spatio-temporal consistency depth restoration of the dynamic scene video of multi-eye synchronous camera shooting, it is characterized in that its steps are as follows: 1)利用多视图几何方法结合DAISY特征向量,对于同一时刻的多目视频帧进行立体匹配,得到多目视频每一时刻的初始化深度图;  1) Using the multi-view geometry method combined with the DAISY feature vector, perform stereo matching on the multi-view video frames at the same moment, and obtain the initial depth map of each moment of the multi-view video; 2)利用步骤1)得到的初始化深度图对于多目视频的每一帧图像计算动态概率图,并利用动态概率图对每帧图像进行动态像素点和静态像素点的划分;  2) Using the initialization depth map obtained in step 1) to calculate a dynamic probability map for each frame of multi-view video, and use the dynamic probability map to divide each frame of images into dynamic pixels and static pixels; 3)对于步骤2)所划分的动态像素点和静态像素点,利用不同的优化方法进行时空一致性的深度优化,对于静态像素点,利用bundle optimization方法结合多个相邻时刻的颜色和几何一致性约束进行优化;对于动态像素点,统计多个相邻时刻的多目摄像机之间对应像素点的颜色和几何一致性约束信息,由此对每一时刻动态深度值进行时空一致性优化。  3) For the dynamic pixels and static pixels divided in step 2), use different optimization methods to optimize the depth of space-time consistency. For static pixels, use the bundle optimization method to combine the color and geometric consistency of multiple adjacent moments For dynamic pixels, the color and geometric consistency constraint information of corresponding pixels between multi-cameras at multiple adjacent moments are counted, so as to optimize the temporal and spatial consistency of dynamic depth values at each moment. the 2.根据权利要求1中所述的一种对于多目同步摄像机拍摄的动态场景视频的时空一致性深度恢复的方法,其特征在于所述的步骤1)为:  2. according to a kind of method for the space-time consistency depth recovery of the dynamic scene video of multi-purpose synchronous camera shooting according to claim 1, it is characterized in that described step 1) is: (1)利用多视图几何方法结合DAISY特征描述符,对于同一时刻的多目视频帧进行立体匹配,通过如下能量优化方程式求解每一时刻图像帧的初始化深度图:  (1) Use the multi-view geometry method combined with the DAISY feature descriptor to perform stereo matching on the multi-view video frames at the same time, and solve the initialization depth map of the image frame at each time through the following energy optimization equation: 其中表示在t时刻的M个多目同步视频帧,表示第m个视频的t时刻的图像帧,表示第m个视频的t时刻的深度图;是数据项,表示中像素点与根据计算的中其余图像帧投影点之间的DAISY特征相似度,其计算公式如下:  in Represents M multi-view synchronous video frames at time t, Indicates the image frame at time t of the mth video, Indicates the depth map at time t of the mth video; is a data item, representing Middle pixel and according to computational The DAISY feature similarity between the projection points of the remaining image frames in , the calculation formula is as follows: 其中是用来估计对应像素的DAISY特征相似度的惩罚函数,表示像素点的DAISY特征描述符,利用 投影至中的投影位置;是平滑项,表示相邻像素x、y之间的深度平滑程度,其计算公式如下:  in is a penalty function used to estimate the DAISY feature similarity of the corresponding pixel, represent pixels The DAISY feature descriptor, yes use Project to The projection position in ; Is a smoothing item, indicating the degree of depth smoothness between adjacent pixels x, y, and its calculation formula is as follows: 其中平滑权重λ为0.008,深度差的截断值η为3;  Among them, the smoothing weight λ is 0.008, and the cut-off value η of the depth difference is 3; (2)利用多目视频帧的初始化深度在3D空间中的一致性来判断每帧图像中的每个像素点在同一时刻其余摄像机中是否可见,从而得到同一时刻多个摄像机两两之间的可视性图;可视性图的计算公式如下:  (2) Use the consistency of the initialization depth of the multi-view video frame in 3D space to judge whether each pixel in each frame of image is visible in other cameras at the same time, so as to obtain the distance between two cameras at the 
same time Visibility map; the formula for calculating the visibility map is as follows: 其中表示中是否可见,1表示可见,0表示不可见;δd是深度差异的阈值,是通过利用投影至上计算得到的;利用所得到的可视性图,对每个像素计算总体可视性如果在t时刻所有其余视频帧中均不可见,则为0,否则为1;  in express exist Whether it is visible in , 1 means visible, 0 means invisible; δ d is the threshold of depth difference, is by using Will Project to Calculated above; using the obtained visibility map, for each pixel Calculate overall visibility if is invisible in all remaining video frames at time t, then is 0, otherwise is 1; (3)结合所求得的可视性图重新初始化每帧图像的深度图,DAISY特征相似度仅在可见的像素格点进行比较估计;并且,当的像素点的初始化深度值出现错误的情况下,利用Mean-shift技术对每帧图像进行分割,对于每个分割区域,利用的像素点的深度来拟合参数为[a,b,c]的平面,利用拟合的平面重新定义的像素点的数据项:  (3) Re-initialize the depth map of each frame image in combination with the obtained visibility map, and the DAISY feature similarity is only compared and estimated at the visible pixel grid points; and, when When the initial depth value of the pixel is wrong, the Mean-shift technology is used to segment each frame of the image. For each segmented area, the The depth of the pixel points to fit the plane with parameters [a,b,c], and use the fitted plane to redefine The data item of the pixel point: 其中σd用来控制数据项对于深度值与拟合平面的距离差的敏感度,x和y是像素点的坐标值;利用重新定义的数据项进行能量优化,从而纠正被遮挡像素点的错误深度值 。  Where σ d is used to control the sensitivity of the data item to the distance difference between the depth value and the fitting plane, and x and y are pixel points The coordinate value of ; use the redefined data item to optimize the energy, so as to correct the wrong depth value of the occluded pixel. 3.根据权利要求1中所述的一种对于多目同步摄像机拍摄的动态场景视频的时空一致性深度恢复的方法,其特征在于所述的步骤2)为:  3. according to a kind of method for the space-time consistency depth recovery of the dynamic scene video of multi-purpose synchronous camera shooting according to claim 1, it is characterized in that described step 2) is: (1)对于每帧图像中的像素点,利用初始化深度将其投影至其余时刻帧,比较像素点在当前时刻帧与其余时刻帧上的对应位置的几何与颜色的一致性,统计深度值和颜色值具有一致性的其余时刻帧数目所占的比例值,作为像素点属于动态物体的概率值,从而得到每帧图像的动态概率图,其计算公式如下:  (1) For the pixels in each frame of image, use the initialization depth Project it to the rest of the time frame, compare the geometric and color consistency of the corresponding position of the pixel point on the current time frame and the rest of the time frame, and count the proportion of the number of frames at the other time when the depth value and color value are consistent , as the probability value that the pixel belongs to the dynamic object, so as to obtain the dynamic probability map of each frame image, the calculation formula is as follows: 其中启发式函数用来判断在其余帧上几何和颜色是否一致;首先比较与对应位置的深度值差异,如果上的深度值与的深度不相似,则认为几何不一致,如果的深度值相似,则比较其颜色值,如果颜色相似,则认为的颜色值一致,否则认为颜色不一致;统计具有深度值和颜色值一致性的其余时刻帧数目所占的比例,作为像素点属于动态物体的概率值;  where the heuristic function used to judge in the remaining frames Whether the geometry and color are consistent; first compare with the corresponding position The difference in depth values, if exist The depth value on the The depths of are not similar, the geometry is considered inconsistent, if and have similar depth values, compare their color values, and if the colors are similar, consider and The color values of the pixels are consistent, otherwise the colors are considered to be inconsistent; the proportion of the number of frames at other times with consistent depth values and color values is counted as the probability value that the pixel belongs to a dynamic object; 
(2)将动态概率图利用大小为0.4的阈值ηp进行二值化得到每帧图像的初始动态/静态分割图;利用Mean-shift技术对每帧图像进行over-segmentation,即粒度小的图像分割,对于每个分割区域统计二值化后的动态像素点数目的比例值,如果比例值大于0.5,则将整个分割区域的像素点标记为动态,否则标记为静态,由此对二值化分割图进行边界调整和去噪;  (2) Binarize the dynamic probability map with a threshold η p of 0.4 to obtain the initial dynamic/static segmentation map of each frame of image; use Mean-shift technology to perform over-segmentation on each frame of image, that is, an image with a small granularity Segmentation. For each segmented area, the ratio value of the number of dynamic pixels after binarization is counted. If the ratio value is greater than 0.5, the pixels in the entire segmented area are marked as dynamic, otherwise they are marked as static, and the binarization is segmented. Figure for boundary adjustment and denoising; (3)利用连续时刻图像之间对应像素点的坐标偏移量,将每帧图像的像素点跟踪至同一视频中的相邻时刻帧寻找对应像素点,统计对应像素点分割标记为动态的帧数目所占的比例,由此计算像素点的时域动态概率,其计算公式如下:  (3) Utilize the coordinate offset of corresponding pixels between images at consecutive moments, track the pixels of each frame of images to adjacent frames in the same video to find the corresponding pixels, and count the corresponding pixels to segment and mark as dynamic frames The proportion of the number, from which the temporal dynamic probability of the pixel is calculated, and the calculation formula is as follows: 其中表示从t至t′时刻的光流偏移量,表示在t′时刻对应像素点的动态/静态分割标记,N(t)表示t前后连续5个相邻时刻帧;利用时域动态概率,通过如下能量优化方程式优化每一时刻图像帧的动态/静态分割图:  in express The optical flow offset from time t to t′, express The dynamic/static segmentation mark corresponding to the pixel at time t′, N(t) represents 5 consecutive adjacent time frames before and after t; using the dynamic probability in time domain, optimize the dynamic/static image frame at each time through the following energy optimization equation Split graph: 其中表示视频m在第t帧的动态/静态分割图;数据项Ed的定义如下:  in Represents the dynamic/static segmentation map of the tth frame of video m; the definition of data item E d is as follows: 平滑项Es促使分割边界与图像边界尽可能一致,其定义如下:  The smoothing term E s promotes the segmentation boundary to be as consistent as possible with the image boundary, which is defined as follows: 对于经能量优化后的动态/静态分割图,利用Grabcut分割技术进行进一步优化,除去分割边界上的毛刺,得到最终时序上一致动态/静态划分。  For the energy-optimized dynamic/static segmentation graph, the Grabcut segmentation technology is used to further optimize, remove the burrs on the segmentation boundary, and obtain a consistent dynamic/static division in the final timing. the 4.根据权利要求1中所述的一种对于多目同步摄像机拍摄的动态场景视频的时空一致性深度恢复的方法,其特征在于所述的步骤3)为:  4. 
4. The method for time-space consistent depth recovery of dynamic scene videos shot by multi-view synchronous cameras according to claim 1, wherein said step 3) is:

(1) For static pixels, the bundle optimization method is used to gather the color- and geometric-consistency constraints between the pixels of the current frame and the corresponding pixels in multiple adjacent frames of the multi-view videos, and the static depth values at the current time instant are optimized accordingly;

(2) For a dynamic pixel, let its candidate depth be d. The pixel is first projected according to d onto the video m′ of another camera at the same time t to obtain the corresponding pixel, and the color and geometric consistency of the two pixels is compared: the color-consistency term measures the similarity of the two color values, with σc controlling the sensitivity to color differences, and the geometric-consistency term measures the symmetric projection error, with σg controlling the sensitivity to depth differences. The symmetric projection-error function dg projects the pixel onto video m′ at the same time t and computes the distance between the projected position and the corresponding pixel, likewise projects the corresponding pixel back onto video m at time t and computes the distance to the original pixel, and takes the average of the two distances. Next, optical flow is used to track both pixels to an adjacent time t′ to obtain their corresponding pixels, and the color and geometric consistency of these corresponding pixels is compared in the same way. The color- and geometric-consistency estimates of multiple adjacent time instants are accumulated, which redefines the data term of the energy equation for dynamic-pixel depth optimization (see the sketch following this claim). Solving the energy optimization equation with the redefined data term optimizes the depth values of the dynamic pixels of each frame in the spatio-temporal domain.
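The spatio-temporal consistency measure used for dynamic pixels in step 3)(2) can be sketched as follows. The Gaussian forms of the color and geometric terms are assumptions made for illustration (the exact formulas of the claim are not reproduced in this text), and the projection helpers, function names and default parameters are hypothetical.

```python
import numpy as np

def color_consistency(c1, c2, sigma_c=10.0):
    """Color-similarity score in [0, 1]; a Gaussian form is assumed here."""
    diff = np.linalg.norm(np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float))
    return float(np.exp(-diff * diff / (2.0 * sigma_c ** 2)))

def geometric_consistency(sym_err, sigma_g=2.0):
    """Score of the symmetric projection error in pixels; Gaussian form assumed."""
    return float(np.exp(-sym_err * sym_err / (2.0 * sigma_g ** 2)))

def symmetric_projection_error(x_m, x_mp, project_m_to_mp, project_mp_to_m):
    """Average of the forward and backward reprojection distances.

    x_m, x_mp        : 2D pixel positions in views m and m' (np arrays)
    project_m_to_mp  : maps a pixel of view m into view m' for the candidate
                       depth d (hypothetical helper)
    project_mp_to_m  : maps a pixel of view m' back into view m
    """
    d_fwd = np.linalg.norm(project_m_to_mp(x_m) - x_mp)
    d_bwd = np.linalg.norm(project_mp_to_m(x_mp) - x_m)
    return 0.5 * (d_fwd + d_bwd)

def dynamic_data_term(observations, sigma_c=10.0, sigma_g=2.0):
    """Accumulate color/geometric agreement over several views and adjacent
    time instants; `observations` is a list of (c_ref, c_other, sym_err)."""
    score = 0.0
    for c_ref, c_other, sym_err in observations:
        score += (color_consistency(c_ref, c_other, sigma_c)
                  * geometric_consistency(sym_err, sigma_g))
    return score / max(len(observations), 1)
```

A depth sweep would evaluate `dynamic_data_term` for each candidate depth d of a dynamic pixel, feeding it the same-time correspondences across cameras and the optical-flow correspondences at adjacent times, and keep the depth with the best accumulated score.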
CN201210360976.0A 2012-09-25 2012-09-25 Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera Active CN103002309B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210360976.0A CN103002309B (en) 2012-09-25 2012-09-25 Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210360976.0A CN103002309B (en) 2012-09-25 2012-09-25 Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera

Publications (2)

Publication Number Publication Date
CN103002309A CN103002309A (en) 2013-03-27
CN103002309B true CN103002309B (en) 2014-12-24

Family

ID=47930367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210360976.0A Active CN103002309B (en) 2012-09-25 2012-09-25 Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera

Country Status (1)

Country Link
CN (1) CN103002309B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6369461B2 (en) * 2013-05-27 2018-08-08 ソニー株式会社 Image processing apparatus, image processing method, and program
CN104899855A (en) * 2014-03-06 2015-09-09 株式会社日立制作所 Three-dimensional obstacle detection method and apparatus
EP3007130A1 (en) 2014-10-08 2016-04-13 Thomson Licensing Method and apparatus for generating superpixel clusters
CN106296696B (en) * 2016-08-12 2019-05-24 深圳市利众信息科技有限公司 The processing method and image capture device of color of image consistency
CN106887015B (en) * 2017-01-19 2019-06-11 华中科技大学 An unconstrained multi-camera image matching method based on spatiotemporal consistency
CN107507236B (en) * 2017-09-04 2018-08-03 北京建筑大学 The progressive space-time restriction alignment schemes of level and device
CN108322730A (en) * 2018-03-09 2018-07-24 嘀拍信息科技南通有限公司 A kind of panorama depth camera system acquiring 360 degree of scene structures
CN109410145B (en) * 2018-11-01 2020-12-18 北京达佳互联信息技术有限公司 Time sequence smoothing method and device and electronic equipment
CN110782490B (en) * 2019-09-24 2022-07-05 武汉大学 Video depth map estimation method and device with space-time consistency
CN112738423B (en) * 2021-01-19 2022-02-25 深圳市前海手绘科技文化有限公司 Method and device for exporting animation video

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101945299A (en) * 2010-07-09 2011-01-12 清华大学 Camera-equipment-array based dynamic scene depth restoring method
CN102074020A (en) * 2010-12-31 2011-05-25 浙江大学 Method for performing multi-body depth recovery and segmentation on video

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101945299A (en) * 2010-07-09 2011-01-12 清华大学 Camera-equipment-array based dynamic scene depth restoring method
CN102074020A (en) * 2010-12-31 2011-05-25 浙江大学 Method for performing multi-body depth recovery and segmentation on video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Consistent Depth Maps Recovery from a Video Sequence; Guofeng Zhang et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2009-06-30; 974-988 *
Implementation method of extended depth based on energy minimization; Jiang Xiaohong et al.; Journal of Image and Graphics; 2006-12-31; 1854-1858 *

Also Published As

Publication number Publication date
CN103002309A (en) 2013-03-27

Similar Documents

Publication Publication Date Title
CN103002309B (en) Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera
Liu et al. Robust dynamic radiance fields
CN112435325B (en) VI-SLAM and depth estimation network-based unmanned aerial vehicle scene density reconstruction method
CN105654492B (en) Robust real-time three-dimensional method for reconstructing based on consumer level camera
CN102750711B (en) A kind of binocular video depth map calculating method based on Iamge Segmentation and estimation
CN110490919B (en) Monocular vision depth estimation method based on deep neural network
CN102903096B (en) Monocular video based object depth extraction method
Li et al. Markerless shape and motion capture from multiview video sequences
CN110490928A (en) A kind of camera Attitude estimation method based on deep neural network
CN102074020B (en) Method for performing multi-body depth recovery and segmentation on video
CN107833270A (en) Real-time object dimensional method for reconstructing based on depth camera
EP2595116A1 (en) Method for generating depth maps for converting moving 2d images to 3d
CN108038905A (en) A kind of Object reconstruction method based on super-pixel
Ramirez et al. Open challenges in deep stereo: the booster dataset
Tung et al. Complete multi-view reconstruction of dynamic scenes from probabilistic fusion of narrow and wide baseline stereo
CN104869387A (en) Method for acquiring binocular image maximum parallax based on optical flow method
EP3563346A1 (en) Method and device for joint segmentation and 3d reconstruction of a scene
CN103049929A (en) Multi-camera dynamic scene 3D (three-dimensional) rebuilding method based on joint optimization
CN106651943B (en) It is a kind of based on the light-field camera depth estimation method for blocking geometry complementation model
KR101125061B1 (en) A Method For Transforming 2D Video To 3D Video By Using LDI Method
Li et al. Deep learning based monocular depth prediction: Datasets, methods and applications
Birchfield et al. Correspondence as energy-based segmentation
Wang et al. Example-based video stereolization with foreground segmentation and depth propagation
Liu et al. Disparity Estimation in Stereo Sequences using Scene Flow.
Lipski et al. High resolution image correspondences for video Post-Production

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20210707

Address after: Room 288-8, 857 Shixin North Road, Ningwei Street, Xiaoshan District, Hangzhou City, Zhejiang Province

Patentee after: ZHEJIANG SHANGTANG TECHNOLOGY DEVELOPMENT Co.,Ltd.

Address before: 310027 No. 38, Zhejiang Road, Hangzhou, Zhejiang, Xihu District

Patentee before: ZHEJIANG University