CN103002309B - Depth recovery method with spatio-temporal consistency for dynamic scene videos shot by multi-view synchronized cameras


Info

Publication number
CN103002309B
CN103002309B
Authority
CN
China
Prior art keywords
depth
dynamic
pixel
video
consistency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210360976.0A
Other languages
Chinese (zh)
Other versions
CN103002309A (en)
Inventor
章国锋
鲍虎军
姜翰青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Shangtang Technology Development Co Ltd
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201210360976.0A
Publication of CN103002309A
Application granted
Publication of CN103002309B
Legal status: Active


Landscapes

  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a spatio-temporally consistent depth recovery method for dynamic scene videos shot by multiple synchronized cameras. Multi-view geometry is combined with DAISY feature vectors, and stereo matching is performed across the synchronized multi-view frames of each time instant to obtain an initial depth map for every view at every moment. A dynamic probability map is then computed for each frame of the multi-view video, each frame's pixels are divided into dynamic and static points according to this map, and the two classes are refined with different spatio-temporally consistent depth optimization schemes: for static points, a bundle optimization method combines the color and geometric consistency constraints of multiple adjacent moments; for dynamic points, the color and geometric consistency constraints of corresponding pixels across the synchronized cameras at multiple adjacent moments are accumulated, and the dynamic depth values of each moment are optimized for spatio-temporal consistency. The method has high application value in fields such as 3D film, 3D animation, augmented reality, and motion capture.

Description

Method for spatio-temporally consistent depth recovery of dynamic scene videos shot by multiple synchronized cameras
Technical field
The present invention relates to stereo matching and depth recovery methods, and in particular to a spatio-temporally consistent depth recovery method for dynamic scene videos shot by multiple synchronized cameras.
Background technology
Dense depth recovery from video is one of the basic technologies of mid-level computer vision, with important applications in fields such as 3D modeling, 3D film, augmented reality, and motion capture. These applications usually require the recovered depth to be highly accurate and spatio-temporally consistent.
The difficulty of dense video depth recovery is to recover depth values of high precision and spatio-temporal consistency for both the static and the dynamic objects in the scene. Although existing depth recovery techniques for static scenes can already recover fairly accurate depth information, the natural world is full of moving objects, and for the dynamic objects contained in a video scene, existing methods struggle to reach high precision and consistency over the spatio-temporal domain. These methods usually require the scene to be captured by a larger number of fixed, synchronized cameras, and apply multi-view geometry to stereo-match the synchronized multi-view frames of each instant, recovering each moment's depth information independently. Such capture setups are mostly confined to laboratory recordings of dynamic scenes and impose many restrictions on practical shooting. Moreover, when existing methods optimize depth over time, they usually use optical flow to find corresponding pixels in frames at different times, and then fit the depth values or 3D positions of those corresponding points with lines or curves to estimate the depth of the current frame's pixels. Such temporal 3D smoothing can only make the depths of corresponding pixels more consistent over time; it cannot optimize toward the true depth values. At the same time, because optical flow estimation is generally not robust, the depth optimization problem for dynamic points becomes even more complex and difficult to solve.
Existing video depth recovery methods fall mainly into two categories:
1. Temporally consistent depth recovery for monocular static-scene videos
A typical method of this class was proposed by Zhang et al. in 2009: G. Zhang, J. Jia, T.-T. Wong, and H. Bao. Consistent depth maps recovery from a video sequence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6):974-988, 2009. The method first initializes the depth of every frame using traditional multi-view geometry, and then, in the temporal domain, uses bundle optimization to combine the geometric and color consistency of multiple moments to optimize the depth of the current frame. The method can recover highly accurate depth maps for static scenes; for scenes containing dynamic objects, however, it cannot recover the depth values of the dynamic objects.
2. Depth recovery for multi-view dynamic-scene videos
Typical methods of this class are those of Zitnick et al.: C.L. Zitnick, S.B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski. High-quality video view interpolation using a layered representation. ACM Transactions on Graphics, 23:600-608, August 2004; of Larsen et al.: E.S. Larsen, P. Mordohai, M. Pollefeys, and H. Fuchs. Temporally consistent reconstruction from multiple video streams using enhanced belief propagation. In ICCV, pages 1-8, 2007; and of Lei et al.: C. Lei, X.D. Chen, and Y.H. Yang. A new multi-view spacetime-consistent depth recovery framework for free viewpoint video rendering. In ICCV, pages 1570-1577, 2009. These methods all recover depth maps from the synchronized multi-view frames of a single instant, and require the dynamic scene to be shot by a larger number of fixed, synchronized cameras, making them unsuitable for practical outdoor capture. The methods of Larsen and Lei optimize the depth values with spatio-temporal energy optimization and temporal 3D smoothing respectively; their reliance on optical flow makes them insufficiently robust, so they cannot handle cases where optical flow estimation produces gross errors.
Step 1) of the method uses the DAISY feature descriptor proposed by Tola et al.: E. Tola, V. Lepetit, and P. Fua. Daisy: An efficient dense descriptor applied to wide-baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5):815-830, 2010.
Steps 1) and 2) of the method use the Mean-shift technique proposed by Comaniciu et al.: D. Comaniciu, P. Meer, and S. Member. Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:603-619, 2002.
Step 2) of the method uses the Grabcut technique proposed by Rother et al.: C. Rother, V. Kolmogorov, and A. Blake. "Grabcut": interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23:309-314, August 2004.
Steps 1), 2), and 3) of the method use the energy equation optimization technique proposed by Felzenszwalb et al.: P.F. Felzenszwalb and D.P. Huttenlocher. Efficient belief propagation for early vision. International Journal of Computer Vision, 70(1):41-54, 2006.
Summary of the invention
The object of the invention is to address the deficiencies of the prior art by providing a spatio-temporally consistent depth recovery method for dynamic scene videos shot by multiple synchronized cameras.
The steps of the method are as follows:
1) Combine multi-view geometry with DAISY feature vectors and stereo-match the synchronized multi-view video frames, obtaining an initial depth map for each moment of the multi-view video;
2) Use the initial depth maps obtained in step 1) to compute a dynamic probability map for each frame of the multi-view video, and use the dynamic probability map to divide each frame's pixels into dynamic and static points;
3) For the dynamic and static pixels divided in step 2), apply different spatio-temporally consistent depth optimization schemes: for static pixels, use bundle optimization to combine the color and geometric consistency constraints of multiple adjacent moments; for dynamic pixels, accumulate the color and geometric consistency constraint information of corresponding pixels across the synchronized cameras at multiple adjacent moments, and thereby optimize the dynamic depth values of each moment for spatio-temporal consistency.
Step 1) comprises:
(1) Combine multi-view geometry with the DAISY feature descriptor and stereo-match the synchronized multi-view video frames, solving the following energy optimization equation for the initial depth map of each frame:
$$E_D(D_m^t;\hat{I}^{(t)}) = E_d(D_m^t;\hat{I}^{(t)}) + E_s(D_m^t)$$
where Î^(t) denotes the M synchronized multi-view frames at time t, I_m^t the frame of video m at time t, and D_m^t the depth map of video m at time t; E_d is the data term, which measures the DAISY feature similarity between each pixel of I_m^t and its projections, computed from D_m^t, into the remaining frames:
$$E_d(D_m^t;\hat{I}^{(t)}) = \sum_{x_m^t}\sum_{m'\neq m}\frac{L_d(x_m^t, D_m^t(x_m^t); I_m^t, I_{m'}^t)}{M-1}$$
where L_d is the penalty function measuring the DAISY feature similarity of corresponding pixels: it compares the DAISY descriptor of pixel x_m^t with that of the position obtained by projecting x_m^t into I_{m'}^t using D_m^t(x_m^t); E_s is the smoothness term, which measures the depth smoothness between neighboring pixels x and y:
$$E_s(D_m^t) = \lambda\sum_x\sum_{y\in N(x)} \min\{|D_m^t(x) - D_m^t(y)|,\ \eta\}$$
where the smoothing weight λ is 0.008 and the depth-difference truncation value η is 3;
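The two energy terms above can be sketched numerically. The following is a minimal numpy illustration (function names and the per-view cost-map representation are illustrative, not from the patent), assuming a 4-neighborhood for the smoothness term and precomputed per-view DAISY penalty maps L_d for the data term:

```python
import numpy as np

def smoothness_energy(depth, lam=0.008, eta=3.0):
    """Truncated-linear smoothness term E_s: sum over 4-neighbor pairs
    of lam * min(|D(x) - D(y)|, eta); lam and eta follow the text."""
    dx = np.minimum(np.abs(np.diff(depth, axis=1)), eta)  # horizontal pairs
    dy = np.minimum(np.abs(np.diff(depth, axis=0)), eta)  # vertical pairs
    return lam * (dx.sum() + dy.sum())

def data_energy(per_view_costs):
    """Data term E_d: average over the M-1 other views of the per-pixel
    DAISY penalty maps L_d, then summed over all pixels."""
    return np.mean(per_view_costs, axis=0).sum()
```

A flat depth map incurs zero smoothness cost, while a depth step larger than η is penalized only up to the truncation value, which keeps depth discontinuities from being over-smoothed.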
(2) Use the 3D-space consistency of the initial depths of the multi-view frames to judge whether each pixel of every frame is visible in the remaining cameras at the same instant, thereby obtaining pairwise visibility maps between the cameras of that instant. The visibility map is computed as:
$$V_{m\to m'}^t(x_m^t) = \begin{cases} 1 & |D_{m\to m'}^t(x_m^t) - D_{m'}^t(x_{m'}^t)| \le \delta_d \\ 0 & |D_{m\to m'}^t(x_m^t) - D_{m'}^t(x_{m'}^t)| > \delta_d \end{cases}$$
where V_{m→m'}^t(x_m^t) indicates whether x_m^t is visible in I_{m'}^t (1 visible, 0 invisible); δ_d is the depth-difference threshold, and D_{m→m'}^t(x_m^t) is computed by projecting x_m^t onto I_{m'}^t using D_m^t. From the visibility maps, an overall visibility is computed for each pixel: if a pixel is invisible in all the remaining frames at time t, its overall visibility is 0, otherwise 1;
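A minimal sketch of the visibility test, assuming the cross-projected depth map D_{m→m'}^t has already been computed (the value of δ_d is an illustrative choice; the patent does not state it):

```python
import numpy as np

def visibility(d_proj, d_ref, delta_d=0.5):
    """V_{m->m'}: a pixel of view m is visible in view m' when the depth
    it projects to agrees with view m''s own depth within delta_d."""
    return (np.abs(d_proj - d_ref) <= delta_d).astype(np.uint8)

def overall_visibility(vis_maps):
    """Overall visibility: 0 only if the pixel is invisible in ALL the
    other views at time t, else 1 (vis_maps stacked along axis 0)."""
    return (np.max(vis_maps, axis=0) > 0).astype(np.uint8)
```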
(3) Combine the visibility maps just obtained to re-initialize the depth map of every frame, evaluating the DAISY feature similarity only at visible pixels. Furthermore, where pixels' initial depth values are erroneous, segment each frame with the Mean-shift technique and, for each segmented region, fit a plane with parameters [a, b, c] to the depths of its pixels, then use the fitted plane to redefine the data term of the pixels:
$$E_d(x_m^t, D_m^t) = \sum_{x_m^t}\frac{\sigma_d}{\sigma_d + |ax + by + c - D_m^t(x_m^t)|}$$
where σ_d controls the sensitivity of the data term to the difference between the depth value and the fitted plane, and x and y are the pixel coordinates. Re-run the energy optimization with the redefined data term, thereby correcting the erroneous depth values of occluded pixels.
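The plane fitting and the redefined data term can be sketched as follows (a least-squares fit is an assumption here; the patent only says a plane [a, b, c] is fitted to the segment's depths):

```python
import numpy as np

def fit_depth_plane(xs, ys, depths):
    """Least-squares fit of d ~ a*x + b*y + c over one Mean-shift segment."""
    A = np.column_stack([xs, ys, np.ones_like(xs, dtype=float)])
    (a, b, c), *_ = np.linalg.lstsq(A, depths, rcond=None)
    return a, b, c

def plane_data_term(x, y, d, plane, sigma_d=1.0):
    """Redefined data term sigma_d / (sigma_d + |a*x + b*y + c - d|);
    equals 1 when the depth lies exactly on the fitted plane."""
    a, b, c = plane
    return sigma_d / (sigma_d + abs(a * x + b * y + c - d))
```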
Step 2) comprises:
(1) For each pixel of every frame, project it, using its initial depth, onto the frames of the remaining instants, and compare the geometric and color consistency between the pixel in the current frame and its corresponding positions in those frames. The fraction of the other frames in which the correspondence is judged dynamic is used as the pixel's probability of belonging to a dynamic object, yielding the dynamic probability map of every frame:
$$P_d(x_m^t) = \frac{\sum_{(m',t')\in N(m,t)} \left[\,C_{m\to m'}^{t\to t'}(x_m^t) = \text{dynamic}\,\right]}{|N(m,t)|}$$
where the heuristic function C_{m→m'}^{t→t'} judges whether x_m^t is geometrically and photometrically consistent with the other frames. First its depth is compared with the depth at its corresponding position: if the depth value at the projection is dissimilar to the projected depth, the geometry is judged inconsistent; if the depth values are similar, the color values are then compared; if the colors are similar, the correspondence is judged consistent, otherwise the color is inconsistent. The fraction of the other frames that are inconsistent in depth or color is used as the pixel's probability of belonging to a dynamic object;
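The heuristic C and the resulting dynamic probability can be sketched as follows (the thresholds delta_d and delta_c are illustrative; the patent leaves the similarity tests unspecified):

```python
import numpy as np

def classify(d_proj, d_ref, c_src, c_ref, delta_d=0.5, delta_c=30.0):
    """Heuristic C: 'dynamic' if the projected depth disagrees with the
    target frame's depth, or depths agree but colors do not; else 'static'."""
    if abs(d_proj - d_ref) > delta_d:
        return "dynamic"  # geometry inconsistent
    diff = np.abs(np.asarray(c_src, float) - np.asarray(c_ref, float))
    if diff.max() > delta_c:
        return "dynamic"  # geometry consistent but color inconsistent
    return "static"

def dynamic_probability(labels):
    """P_d: fraction of neighboring (view, time) frames labeled 'dynamic'."""
    return sum(lab == "dynamic" for lab in labels) / len(labels)
```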
(2) Binarize the dynamic probability map with a threshold η_p of 0.4 to obtain an initial dynamic/static segmentation map for every frame. Over-segment every frame with the Mean-shift technique (i.e., a fine-granularity segmentation); for each segmented region, compute the fraction of pixels marked dynamic after binarization; if the fraction exceeds 0.5, mark the whole region's pixels as dynamic, otherwise as static, thereby performing boundary adjustment and denoising on the binarized segmentation map;
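The region-wise majority vote above can be sketched as follows (segment_labels stands for the Mean-shift over-segmentation, which is assumed precomputed and not reproduced here):

```python
import numpy as np

def refine_by_segments(binary_mask, segment_labels, ratio=0.5):
    """Snap the thresholded dynamic/static mask to over-segmentation
    regions: if more than `ratio` of a segment's pixels are dynamic,
    the whole segment becomes dynamic, otherwise static."""
    out = np.zeros_like(binary_mask)
    for seg in np.unique(segment_labels):
        sel = segment_labels == seg
        out[sel] = 1 if binary_mask[sel].mean() > ratio else 0
    return out
```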
(3) Use the coordinate offsets of corresponding pixels between adjacent frames (optical flow) to track each pixel of every frame to the adjacent frames of the same video and find its corresponding pixels, tally the fraction of frames in which the corresponding pixel is marked dynamic, and thereby compute the pixel's temporal dynamic probability:
$$P_d'(x_m^t) = \frac{\sum_{t'\in N(t)} \left[\,S_m^{t'}(x_m^t + O_m^{t\to t'}(x_m^t)) = \text{dynamic}\,\right]}{|N(t)|}$$
where O_m^{t→t'} denotes the optical-flow offset from time t to time t', S_m^{t'} the dynamic/static segmentation label of the corresponding pixel at time t', and N(t) the 5 consecutive adjacent frames before and after t. Using the temporal dynamic probability, the dynamic/static segmentation map of each frame is optimized by the following energy equation:
$$E_S(S_m^t; P_d', I_m^t) = E_d(S_m^t; P_d') + E_s(S_m^t; I_m^t)$$
where S_m^t denotes the dynamic/static segmentation map of video m at frame t; the data term E_d is defined as:
$$E_d(S_m^t; P_d') = \sum_{x_m^t} e_d(S_m^t(x_m^t))$$
$$e_d(S_m^t(x_m^t)) = \begin{cases} -\log(1 - P_d'(x_m^t)) & S_m^t(x_m^t) = \text{static} \\ -\log(P_d'(x_m^t)) & S_m^t(x_m^t) = \text{dynamic} \end{cases}$$
The smoothness term E_s encourages the segmentation boundary to coincide with image boundaries as much as possible; it is defined as:
$$E_s(S_m^t; I_m^t) = \lambda\sum_x\sum_{y\in N(x)}\frac{|S_m^t(x) - S_m^t(y)|}{1 + \|I_m^t(x) - I_m^t(y)\|^2}$$
The energy-optimized dynamic/static segmentation map is then refined further with the Grabcut technique, removing burrs along the segmentation boundary and yielding a final temporally consistent dynamic/static division;
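The segmentation energy's two terms can be sketched per pixel and per neighbor pair as follows (the clipping of P'_d away from 0 and 1 is an added guard against log(0), not from the patent):

```python
import numpy as np

def seg_data_cost(p_dyn, label):
    """e_d: negative log-likelihood of a label under the temporal
    dynamic probability P'_d."""
    p = np.clip(p_dyn, 1e-6, 1 - 1e-6)  # avoid log(0)
    return -np.log(p) if label == "dynamic" else -np.log(1 - p)

def seg_smooth_cost(label_x, label_y, color_x, color_y, lam=1.0):
    """Pairwise term: penalize label changes, discounted across strong
    image edges by 1 / (1 + ||I(x) - I(y)||^2)."""
    diff = np.linalg.norm(np.asarray(color_x, float) - np.asarray(color_y, float))
    return lam * abs(label_x - label_y) / (1.0 + diff ** 2)
```

A pixel with high P'_d is cheap to label dynamic and expensive to label static, while label changes between similarly colored neighbors are penalized most, which pulls the segmentation boundary toward image edges.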
Step 3) comprises:
(1) For static pixels, use bundle optimization to accumulate the color and geometric consistency constraint information between the pixel in the current frame and its corresponding pixels in multiple adjacent frames of the multi-view video, thereby optimizing the static depth values of the current moment;
(2) For a dynamic pixel x_m^t, suppose its candidate depth is d. First project x_m^t according to d into video m' at the same instant t, obtain the corresponding pixel x_{m'}^t, and compare their color and geometric consistency:
$$L_g(x_m^t, x_{m'}^t) = p_c(x_m^t, x_{m'}^t)\, p_g(x_m^t, x_{m'}^t)$$
where p_c measures the color consistency:
$$p_c(x_m^t, x_{m'}^t) = \frac{\sigma_c}{\sigma_c + \|I_m^t(x_m^t) - I_{m'}^t(x_{m'}^t)\|_1}$$
σ_c controls the sensitivity to color differences;
p_g measures the geometric consistency:
$$p_g(x_m^t, x_{m'}^t) = \frac{\sigma_g}{\sigma_g + d_g(x_m^t, x_{m'}^t; D_m^t, D_{m'}^t)}$$
σ_g controls the sensitivity to depth differences. The symmetric projection error function d_g projects x_m^t into video m' at time t and computes the distance between the projected position and x_{m'}^t, likewise projects x_{m'}^t into video m at time t and computes the distance between that projection and x_m^t, and then averages the two distances;
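The consistency measures p_c and p_g and the combined likelihood L_g can be sketched as follows (the values of σ_c and σ_g are illustrative, and the cross-projections are passed in as precomputed 2D points):

```python
import numpy as np

def color_consistency(c1, c2, sigma_c=10.0):
    """p_c = sigma_c / (sigma_c + ||I(x1) - I(x2)||_1)."""
    diff = np.abs(np.asarray(c1, float) - np.asarray(c2, float)).sum()
    return sigma_c / (sigma_c + diff)

def symmetric_error(p1_to_2, p2, p2_to_1, p1):
    """d_g: average of the two cross-projection distances."""
    d12 = np.linalg.norm(np.asarray(p1_to_2, float) - np.asarray(p2, float))
    d21 = np.linalg.norm(np.asarray(p2_to_1, float) - np.asarray(p1, float))
    return 0.5 * (d12 + d21)

def geometry_consistency(sym_err, sigma_g=1.0):
    """p_g = sigma_g / (sigma_g + d_g)."""
    return sigma_g / (sigma_g + sym_err)

def pair_likelihood(c1, c2, sym_err, sigma_c=10.0, sigma_g=1.0):
    """L_g = p_c * p_g; equals 1 for a perfectly consistent pair."""
    return color_consistency(c1, c2, sigma_c) * geometry_consistency(sym_err, sigma_g)
```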
Next, use optical flow to track x_m^t and x_{m'}^t to the adjacent moment t', obtain the corresponding pixels x̂_m^t' and x̂_{m'}^t', and compare their color and geometric consistency:
$$L_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'}) = p_c(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})\, p_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})$$
Accumulate the color and geometric consistency measures of multiple adjacent moments, and thereby redefine the data term of the energy equation for dynamic-pixel depth optimization:
$$E_d'(D_m^t; \hat{I}, \hat{D}) = \sum_{x_m^t}\left(1 - \frac{\sum_{t'\in N(t)}\sum_{m'\neq m} L_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})}{(M-1)\,|N(t)|}\right)$$
Solve the energy optimization equation with the redefined data term, thereby optimizing the dynamic-pixel depth values of every frame over the spatio-temporal domain.
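The accumulation of L_g over views and neighboring times into the redefined data term can be sketched as follows, with a per-pixel winner-take-all depth selection for illustration (the patent solves a full energy with belief propagation rather than winner-take-all):

```python
import numpy as np

def dynamic_data_term(likelihoods, M):
    """E'_d contribution for one pixel and one candidate depth d:
    1 minus the average of L_g over the (M-1) other views and the
    |N(t)| neighboring times; likelihoods has shape (|N(t)|, M-1)."""
    L = np.asarray(likelihoods, float)
    return 1.0 - L.sum() / ((M - 1) * L.shape[0])

def best_depth(candidates, likelihoods_per_depth, M):
    """Pick the candidate depth whose accumulated cost is lowest."""
    costs = [dynamic_data_term(L, M) for L in likelihoods_per_depth]
    return candidates[int(np.argmin(costs))]
```

A candidate depth that projects consistently (L_g near 1) in most views and moments drives the data cost toward 0, so temporally accumulated evidence, not a single instant, decides the dynamic pixel's depth.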
For the dynamic objects contained in a video scene, existing depth recovery methods struggle to reach high precision and spatio-temporal consistency; they usually require the scene to be captured by a larger number of fixed, synchronized cameras, a capture mode mostly confined to laboratory recordings of dynamic scenes and subject to many restrictions in practical shooting. The spatio-temporally consistent depth recovery method for dynamic scene videos shot by multiple synchronized cameras proposed by the present invention can recover accurate depth maps at every moment for both the dynamic and the static objects in a multi-view video, while also keeping the depth maps highly consistent across moments. The method allows the cameras to move freely and independently, and works with dynamic scenes shot by fewer cameras (as few as 2), which is more practical for real capture.
Brief description of the drawings
Fig. 1 is the flowchart of the spatio-temporally consistent depth recovery method for dynamic scene videos shot by multiple synchronized cameras;
Fig. 2(a) is one frame of a video sequence;
Fig. 2(b) is a frame synchronized with Fig. 2(a);
Fig. 2(c) is the initial depth map of Fig. 2(a);
Fig. 2(d) is the visibility map estimated from Fig. 2(a) and Fig. 2(b);
Fig. 2(e) is the initial depth map of Fig. 2(a) after plane-fitting correction using Fig. 2(d);
Fig. 3(a) is the dynamic probability map of Fig. 2(a);
Fig. 3(b) is Fig. 3(a) after binarization, with boundary adjustment and denoising via Mean-shift segmentation;
Fig. 3(c) is the segmentation map after temporal optimization;
Fig. 3(d) is the segmentation map refined with the Grabcut technique;
Fig. 3(e) shows enlarged views of the boxed regions in Figs. 3(a-d);
Fig. 4(a) is one frame of a video sequence;
Fig. 4(b) is the dynamic/static segmentation map of Fig. 4(a);
Fig. 4(c) is the depth map of Fig. 4(a) after spatio-temporally consistent optimization;
Fig. 4(d) shows enlarged views of the boxed regions in Fig. 4(a) and Fig. 4(c);
Fig. 4(e) is another frame of the video sequence;
Fig. 4(f) is the depth map of Fig. 4(e) after spatio-temporally consistent optimization;
Fig. 4(g) is the 3D scene model reconstructed from Fig. 4(f), shown with texture mapping;
Fig. 5 is a schematic diagram of the spatio-temporally consistent depth optimization.
Embodiment
Step for the method for the space-time consistency depth recovery of the dynamic scene video of many orders synchronization camera shooting is as follows:
1) utilize multi-view geometry methods combining DAISY characteristic vector, the many orders frame of video for synchronization carries out Stereo matching, obtains the initialization depth map in how visual each moment of frequency;
2) the initialization depth map utilizing step 1) to obtain calculates dynamic probability figure for each two field picture of how visual frequency, and utilizes dynamic probability figure to carry out the division of dynamic pixel point and static pixels point to every two field picture;
3) for step 2) the dynamic pixel point that divides and static pixels point, different optimization methods is utilized to carry out the depth optimization of space-time consistency, for static pixels point, the color of the multiple adjacent moment of bundle optimization methods combining and Geometrical consistency constraint is utilized to be optimized; For dynamic pixel point, between the multi-lens camera adding up multiple adjacent moment, the color of corresponding pixel points and Geometrical consistency constraint information, carry out space-time consistency optimization to each moment dynamic depth value thus.
Described step 1) is:
(1) utilize multi-view geometry methods combining DAISY feature descriptor, the many orders frame of video for synchronization carries out Stereo matching, is solved the initialization depth map of each time chart picture frame by following energy-optimised equation:
E D ( D m t ; I ^ ( t ) ) = E d ( D m t ; I ^ ( t ) ) + E s ( D m t )
Wherein represent M many orders synchronized video frames in t, represent the picture frame of the t of m video, represent the depth map of the t of m video; be data item, represent middle pixel and basis calculate dAISY characteristic similarity between middle remaining image frame subpoint, its computing formula is as follows:
E d ( D m t ; I ^ ( t ) ) = Σ x m t Σ m ′ ≠ m L d ( x m t , D m t ( x m t ) ; I m t , I m ′ t ) M - 1
Wherein be used to the penalty of the DAISY characteristic similarity estimating respective pixel, represent pixel dAISY feature descriptor, be utilize be projected to in projected position; be level and smooth item, represent the depth smooth degree between neighbor x, y, its computing formula is as follows:
E s ( D m t ) = λ Σ x Σ y ∈ N ( x ) min { | D m t ( x ) - D m t ( y ) | , η }
Wherein smoothing weights λ is 0.008, and the cutoff value η of depth difference is 3;
(2) utilize the initialization degree of depth of many orders frame of video consistency in the 3 d space whether visible in all the other video cameras of synchronization to judge each pixel in every two field picture, thus obtain the multiple video camera of synchronization visuality figure between any two; The computing formula of visual figure is as follows:
V m → m ′ t ( x m t ) = 1 | D m → m ′ t ( x m t ) - D m ′ t ( x m ′ t ) | ≤ δ d 0 | D m → m ′ t ( x m t ) - D m ′ t ( x m ′ t ) | > δ d
Wherein represent ? in whether visible, 1 represent visible, 0 expression invisible; δ dthe threshold value of depth difference, by utilizing will be projected to on to calculate; Utilize the visuality figure obtained, to each pixel calculated population is visual if all invisible in t all the other frame of video all, then be 0, otherwise be 1;
(3) combine the depth map that the visual figure tried to achieve reinitializes every two field picture, DAISY characteristic similarity only compares estimation at visible pixel lattice point; Further, when pixel initialization depth value occur mistake when, utilize Mean-shift technology to split every two field picture, for each cut zone, utilize the degree of depth of pixel carry out the plane that fitting parameter is [a, b, c], utilize the plane of matching to redefine the data item of pixel:
E d ( x m t , D m t ) = Σ x m t σ d σ d + | ax + by + c - D m t ( x m t ) |
Wherein σ dbe used for the susceptibility of control data item for the range difference of depth value and fit Plane, x and y is pixel coordinate figure; Utilize the data item redefined to carry out energy-optimised, thus correct the wrong depth value of the pixel that is blocked;
Described step 2) be:
(1) for the pixel in every two field picture, the initialization degree of depth is utilized be projected to all the other moment frames, the geometry of the correspondence position of compared pixels point on current time frame and all the other moment frames and consistency of colour, statistics depth value and color value all the other ratio values shared by moment frame number consistent, the probable value of dynamic object is belonged to as pixel, thus obtain the dynamic probability figure of every two field picture, its computing formula is as follows:
P d ( x m t ) = Σ ( m ′ , t ′ ) ∈ N ( m , t ) C m → m ′ t → t ′ ( x m t ) = dynamic | N ( m , t ) |
Wherein heuristic function be used for judging at all the other frames whether upper geometry is consistent with color; First compare with correspondence position depth value difference, if ? on depth value with the degree of depth dissimilar, then think that geometry is inconsistent, if with depth value similar, then compare its color value, if color similarity, then think with color value consistent, otherwise think that color is inconsistent; Statistics has depth value and color value all the other ratios shared by moment frame number conforming, belongs to the probable value of dynamic object as pixel;
(2) being desired to make money or profit by dynamic probability by size is the threshold value η of 0.4 pcarry out initially dynamically/static segmentation figure that binaryzation obtains every two field picture; Mean-shift technology is utilized to carry out over-segmentation to every two field picture, namely the Iamge Segmentation that granularity is little, for the ratio value that the dynamic pixel after each cut zone statistics binaryzation is counted out, if ratio value is greater than 0.5, then the pixel of whole cut zone is labeled as dynamically, otherwise be labeled as static state, thus boundary adjustment and denoising carried out to binarization segmentation figure;
(3) the coordinate offset amount of corresponding pixel points between consecutive hours needle drawing picture is utilized, the adjacent moment frame tracked to by the pixel of every two field picture in same video finds corresponding pixel points, the ratio of statistics corresponding pixel points dividing mark shared by dynamic frame number, the time domain dynamic probability of calculating pixel point thus, its computing formula is as follows:
P d ′ ( x m t ) = Σ t ′ ∈ N ( t ) S m t ′ ( x m t + O m t → t ′ ( x m t ) ) = dynamic | N ( t ) |
Wherein represent from t to t ' the light stream side-play amount in moment, represent at dynamic/static dividing mark of t ' moment corresponding pixel points, N (t) represents continuous 5 adjacent moment frames before and after t; Utilize time domain dynamic probability, optimized dynamic/static segmentation figure of each time chart picture frame by following energy-optimised equation:
E S ( S m t ; P d ′ , I m t ) = E d ( S m t ; P d ′ ) + E s ( S m t ; I m t )
Wherein represent dynamic/static segmentation figure of video m at t frame; Data item E dbe defined as follows:
E d ( S m t ; P d ′ ) = Σ x m t e d ( S m t ( x m t ) )
e d ( S m t ( x m t ) ) = - log ( 1 - P d ′ ( x m t ) ) S m t ( x m t ) = static - log ( P d ′ ( x m t ) ) S m t ( x m t ) = dynamic
Level and smooth item E simpel partitioning boundary and image boundary consistent as far as possible, it is defined as follows:
E s ( S m t ; I m t ) = λ Σ x Σ y ∈ N ( x ) | S m t ( x ) - S m t ( y ) | 1 + | | I m t ( x ) - I m t ( y ) | | 2
For dynamic/static segmentation figure after energy-optimised, utilize Grabcut cutting techniques to optimize further, the burr on removing partitioning boundary, obtain dynamically consistent/static division in final sequential;
Described step 3) is:
(1) for static pixels point, utilize the color on bundle optimization method statistic current time frame pixel and the multiple adjacent moment frame of how visual frequency between corresponding pixel points and Geometrical consistency constraint information, thus current time static depth value is optimized;
(2) for dynamic pixel point suppose that its candidate's degree of depth is d, be first projected to the video m ' of synchronization t according to d, obtain corresponding pixel points relatively with color and Geometrical consistency, its computing formula is as follows:
L g ( x m t , x m ′ t ) = p c ( x m t , x m ′ t ) p g ( x m t , x m ′ t )
Wherein estimate with colour consistency, its computing formula is as follows:
p c ( x m t , x m ′ t ) = σ c σ c + | | I m t ( x m t ) - I m ′ t ( x m ′ t ) | | 1
σ ccontrol the susceptibility of color distortion,
estimate with geometrical consistency, its computing formula is as follows:
p g ( x m t , x m ′ t ) = σ w σ g + d g ( x m t , x m ′ t ; D m t , D m ′ t )
σ gthe susceptibility of controlling depth difference, symmetrical projection error computing function d gwill be projected to the video m ' of synchronization t projected position and calculate its with distance, calculate simultaneously the projected position being projected to t m video with distance, then calculate both average distance;
Next, optical flow is used to track $x_m^t$ and $x_{m'}^t$ to an adjacent time t', obtaining the corresponding pixels $\hat{x}_m^{t'}$ and $\hat{x}_{m'}^{t'}$, whose color and geometric consistency is compared; the computing formula is as follows:
$$L_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'}) = p_c(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})\, p_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})$$
The color and geometric-consistency measures of multiple adjacent times are accumulated, and the energy-equation data term for dynamic-pixel depth optimization is redefined accordingly:
$$E_d'(D_m^t; \hat{I}, \hat{D}) = \sum_{x_m^t} \left( 1 - \frac{\sum_{t' \in N(t)} \sum_{m' \neq m} L_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})}{(M-1)\,|N(t)|} \right)$$
The redefined data term is used to solve the energy-optimization equation, thereby optimizing the dynamic-pixel depth values of every frame over the spatio-temporal domain.
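A minimal sketch of the redefined data term and a brute-force candidate-depth search follows. The normalization by $(M-1)\,|N(t)|$ mirrors the equation above; the candidate sampling and the `likelihoods_for` scoring callback are assumptions for illustration:

```python
import numpy as np

def dynamic_data_cost(likelihoods, M, n_adjacent):
    """Per-pixel data cost: 1 - sum of L_g over views m' != m and times
    t' in N(t), normalized by (M - 1) * |N(t)|."""
    return 1.0 - sum(likelihoods) / ((M - 1) * n_adjacent)

def best_candidate_depth(candidates, likelihoods_for, M, n_adjacent):
    """Pick the candidate depth d with the lowest data cost.
    likelihoods_for(d) returns the accumulated L_g values for depth d."""
    costs = [dynamic_data_cost(likelihoods_for(d), M, n_adjacent)
             for d in candidates]
    return candidates[int(np.argmin(costs))]
```

In the patent the data term enters a full energy optimization with a smoothness term; the per-pixel winner-take-all search above only illustrates how the accumulated consistency scores rank candidate depths.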
Embodiment
As shown in Figure 1, the method for spatio-temporal consistency depth recovery of dynamic-scene videos shot by multiple synchronized cameras comprises the following steps:
1) Combining multi-view geometry with DAISY feature vectors, stereo matching is performed on the multi-view video frames of the same time instant to obtain an initial depth map of the multi-view video at each time;
2) The initial depth maps obtained in step 1) are used to compute a dynamic-probability map for each frame of the multi-view video, and each frame's pixels are classified as dynamic or static according to the dynamic-probability map;
3) For the dynamic and static pixels divided in step 2), different optimization methods are used for spatio-temporal consistency depth optimization: for static pixels, the bundle optimization method is used together with the color and geometric-consistency constraints of multiple adjacent times; for dynamic pixels, the color and geometric-consistency constraint information of corresponding pixels across the synchronized multi-view cameras at multiple adjacent times is accumulated, and the dynamic depth values at each time are thereby optimized for spatio-temporal consistency.
Said step 1) is:
(1) Combining multi-view geometry with the DAISY feature descriptor, stereo matching is performed on synchronized binocular video frames such as those shown in Fig. 2(a) and Fig. 2(b), and the initial depth map of each time's image frame is solved by an energy-optimization equation, as shown in Fig. 2(c);
(2) The consistency of the initial depths of the multi-view video frames in 3D space is used to judge whether each pixel of every frame is visible in the remaining synchronized cameras, thereby obtaining pairwise visibility maps between the synchronized cameras, as shown in Fig. 2(d);
(3) The depth map of every frame is re-initialized with the obtained visibility maps, with the DAISY feature similarity evaluated only against views where the pixel is visible. Further, since the initial depth values of invisible (occluded) pixels may be erroneous, every frame is segmented with the Mean-shift technique; for each segment, a plane is fitted to the depths of its visible pixels, and the fitted plane is used to fill in and correct the depth values of the invisible pixels, as shown in Fig. 2(e);
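The plane fitting used to correct occluded pixels in step 1(3) can be sketched as a least-squares fit of d = a·x + b·y + c over a segment's visible pixels; the Mean-shift segmentation itself is assumed to be given:

```python
import numpy as np

def fit_depth_plane(xs, ys, depths):
    """Least-squares plane parameters [a, b, c] with d = a*x + b*y + c,
    fitted over the visible pixels of one segment."""
    A = np.stack([np.asarray(xs, float),
                  np.asarray(ys, float),
                  np.ones(len(xs))], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(depths, float), rcond=None)
    return coeffs

def fill_occluded_depths(coeffs, xs, ys):
    """Depth from the fitted plane at the occluded pixel coordinates."""
    a, b, c = coeffs
    return a * np.asarray(xs, float) + b * np.asarray(ys, float) + c
```

With at least three non-collinear visible pixels the plane is determined, and occluded pixels in the same segment inherit the plane's depth.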
Said step 2) is:
(1) Each pixel of every frame is projected with its initial depth to the frames of the remaining times, and the geometric and color consistency between the pixel on the current-time frame and the corresponding positions on the remaining frames is compared; the fraction of remaining frames on which both the depth value and the color value are consistent is counted and used to derive the probability that the pixel belongs to a dynamic object, thereby obtaining the dynamic-probability map of every frame, as shown in Fig. 3(a);
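A per-pixel sketch of the dynamic probability in step 2(1). The patent counts the fraction of remaining-time frames on which depth and colour agree; taking the dynamic probability as the complement of that fraction is an assumption about the exact mapping:

```python
import numpy as np

def dynamic_probability(depth_ok, color_ok):
    """depth_ok, color_ok: booleans, one per remaining-time frame, True when
    the projected depth / colour agrees there. A frame votes 'consistent'
    only when both agree; the dynamic probability is 1 - consistent fraction."""
    consistent = np.asarray(depth_ok, bool) & np.asarray(color_ok, bool)
    return 1.0 - consistent.mean()
```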
(2) The dynamic-probability map is binarized to obtain the initial dynamic/static segmentation of every frame; every frame is then over-segmented with the Mean-shift technique, i.e., image segmentation at a fine granularity; for each segment the fraction of pixels labeled dynamic after binarization is counted, and if this fraction exceeds 0.5 the pixels of the whole segment are labeled dynamic, otherwise static, thereby adjusting the boundaries of the binarized segmentation and removing noise, as shown in Fig. 3(b);
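The over-segmentation majority vote of step 2(2) can be sketched as follows; the regions are given as boolean masks (assumed to come from Mean-shift), and the 0.5 threshold is the one stated above:

```python
import numpy as np

def region_majority_vote(labels, region_masks):
    """Relabel each over-segmented region as a whole: dynamic (1) if more
    than half of its pixels were binarized as dynamic, static (0) otherwise."""
    out = labels.copy()
    for mask in region_masks:
        out[mask] = 1 if labels[mask].mean() > 0.5 else 0
    return out
```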
(3) Using the coordinate offsets of corresponding pixels between adjacent-time images, the pixels of every frame are tracked to the adjacent-time frames of the same video to find corresponding pixels; the fraction of adjacent frames on which the corresponding pixel is labeled dynamic is counted, giving the temporal dynamic probability of the pixel, and the dynamic/static segmentation of each time's frame is optimized by an energy-optimization equation, as shown in Fig. 3(c). The result of Fig. 3(c) is further refined with the GrabCut segmentation technique to remove burrs on the segmentation boundary, yielding the final temporally consistent dynamic/static partition, as shown in Fig. 3(d);
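The temporal dynamic probability of step 2(3) can be sketched by tracking each pixel with precomputed optical-flow offsets and averaging the dynamic labels found at the corresponding pixels. The flow representation (a dict of integer offset fields) and the window size are illustrative assumptions:

```python
import numpy as np

def temporal_dynamic_probability(seg_maps, flows, t, half_window=5):
    """seg_maps: list of HxW binary dynamic/static maps (1 = dynamic);
    flows: dict (t, t') -> HxWx2 integer pixel offsets (dx, dy);
    returns the HxW fraction of tracked neighbour frames labelled dynamic."""
    H, W = seg_maps[t].shape
    ys, xs = np.mgrid[0:H, 0:W]
    window = [u for u in range(t - half_window, t + half_window + 1)
              if u != t and 0 <= u < len(seg_maps)]
    acc = np.zeros((H, W))
    for u in window:
        off = flows[(t, u)]
        x2 = np.clip(xs + off[..., 0], 0, W - 1)   # tracked column
        y2 = np.clip(ys + off[..., 1], 0, H - 1)   # tracked row
        acc += seg_maps[u][y2, x2]
    return acc / len(window)
```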
Said step 3) is:
(1) For static pixels, the bundle optimization method is used to gather the color and geometric-consistency constraint information between each pixel on the current-time frame and the corresponding pixels on multiple adjacent-time frames of the multi-view video, thereby optimizing the static depth values at the current time;
(2) The spatio-temporal consistency depth optimization for dynamic pixels is shown in Figure 5. Suppose the candidate depth of pixel $x_m^t$ is d. First, $x_m^t$ is projected according to d into video m' at the same time t, and the color and geometric consistency of $x_m^t$ and the corresponding pixel $x_{m'}^t$ is compared; next, optical flow is used to track $x_m^t$ and $x_{m'}^t$ to an adjacent time t', and the color and geometric consistency of the corresponding pixels $\hat{x}_m^{t'}$ and $\hat{x}_{m'}^{t'}$ is compared; the color and geometric-consistency measures of multiple adjacent times are accumulated, and the dynamic-pixel depth values of every frame are optimized over the spatio-temporal domain with the energy-optimization equation, giving depth maps that are consistent over the spatio-temporal domain, as shown in Fig. 4(c) and Fig. 4(f).

Claims (4)

1. A method for spatio-temporal consistency depth recovery of dynamic-scene videos shot by multiple synchronized cameras, characterized in that its steps are as follows:
1) Combining multi-view geometry with DAISY feature vectors, stereo matching is performed on the multi-view video frames of the same time instant to obtain an initial depth map of the multi-view video at each time;
2) The initial depth maps obtained in step 1) are used to compute a dynamic-probability map for each frame of the multi-view video, and the pixels of every frame are divided into dynamic pixels and static pixels according to the dynamic-probability map;
3) For the dynamic pixels and static pixels divided in step 2), different optimization methods are used for spatio-temporal consistency depth optimization: for static pixels, the bundle optimization method is used together with the color and geometric-consistency constraints of multiple adjacent times; for dynamic pixels, the color and geometric-consistency constraint information of corresponding pixels across the synchronized multi-view cameras at multiple adjacent times is accumulated, and the dynamic depth values at each time are thereby optimized for spatio-temporal consistency.
2. The method for spatio-temporal consistency depth recovery of dynamic-scene videos shot by multiple synchronized cameras according to claim 1, characterized in that said step 1) is:
(1) Combining multi-view geometry with the DAISY feature descriptor, stereo matching is performed on the multi-view video frames of the same time instant, and the initial depth map of each time's image frame is solved by the following energy-optimization equation:
where $\hat{I}^t$ denotes the M synchronized video frames at time t, $I_m^t$ denotes the image frame of the m-th video at time t, and $D_m^t$ denotes the depth map of the m-th video at time t; the data term $E_d$ measures the DAISY feature similarity between each pixel of $I_m^t$ and its projections, computed from $D_m^t$, onto the remaining image frames of time t; its computing formula is as follows:
where the penalty function estimates the DAISY feature similarity of corresponding pixels from their DAISY feature descriptors, the corresponding position being obtained by projecting the pixel, with its depth, into the remaining frames; the smoothness term $E_s$ measures the depth smoothness between neighboring pixels x and y; its computing formula is as follows:
where the smoothing weight λ is 0.008 and the truncation threshold η of the depth difference is 3;
(2) The consistency of the initial depths of the multi-view video frames in 3D space is used to judge whether each pixel of every frame is visible in the remaining synchronized cameras, thereby obtaining pairwise visibility maps between the synchronized cameras; the computing formula of the visibility map is as follows:
where the visibility value indicates whether the pixel is visible in the other view, 1 meaning visible and 0 invisible; $\delta_d$ is the threshold on the depth difference, which is computed by projecting the pixel with its depth onto the other view; using the obtained visibility maps, an overall visibility is computed for each pixel: if the pixel is invisible in all the remaining video frames of time t, its overall visibility is 0, otherwise it is 1;
(3) The depth map of every frame is re-initialized with the obtained visibility maps, with the DAISY feature similarity evaluated only against views where the pixel is visible; further, since the initial depth values of occluded pixels may be erroneous, every frame is segmented with the Mean-shift technique; for each segment, a plane with parameters [a, b, c] is fitted to the depths of its visible pixels, and the fitted plane is used to redefine the data term of the pixels:
where $\sigma_d$ controls the sensitivity of the data term to the difference between the depth value and the fitted plane, and x and y are the pixel coordinates; the redefined data term is used in the energy optimization, thereby correcting the erroneous depth values of occluded pixels.
3. The method for spatio-temporal consistency depth recovery of dynamic-scene videos shot by multiple synchronized cameras according to claim 1, characterized in that said step 2) is:
(1) Each pixel of every frame is projected with its initial depth to the frames of the remaining times, and the geometric and color consistency between the pixel on the current-time frame and the corresponding positions on the remaining frames is compared; the fraction of remaining frames on which both the depth value and the color value are consistent is counted and used to derive the probability that the pixel belongs to a dynamic object, thereby obtaining the dynamic-probability map of every frame; its computing formula is as follows:
where a heuristic function judges whether the pixel is geometrically and photometrically consistent on the remaining frames: first, the depth difference between the pixel and the corresponding position is compared; if the depth value at the corresponding position is dissimilar to the projected depth, the geometry is judged inconsistent; if the depth values are similar, the color values are compared, and if the colors are similar the pixel is judged color-consistent, otherwise color-inconsistent; the fraction of remaining frames on which both the depth value and the color value are consistent is counted and used to derive the probability that the pixel belongs to a dynamic object;
(2) The dynamic-probability map is binarized with a threshold $\eta_p$ of 0.4 to obtain the initial dynamic/static segmentation of every frame; every frame is then over-segmented with the Mean-shift technique, i.e., image segmentation at a fine granularity; for each segment the fraction of pixels labeled dynamic after binarization is counted, and if this fraction exceeds 0.5 the pixels of the whole segment are labeled dynamic, otherwise static, thereby adjusting the boundaries of the binarized segmentation and removing noise;
(3) Using the coordinate offsets of corresponding pixels between adjacent-time images, the pixels of every frame are tracked to the adjacent-time frames of the same video to find corresponding pixels, and the fraction of adjacent frames on which the corresponding pixel is labeled dynamic is counted, giving the temporal dynamic probability of the pixel; its computing formula is as follows:
where the optical-flow offset from time t to time t' is used to locate the corresponding pixel, whose dynamic/static segmentation label at time t' is read, and N(t) denotes the 5 consecutive adjacent-time frames before and after t; using the temporal dynamic probability, the dynamic/static segmentation of each time's frame is optimized by the following energy-optimization equation:
where $S_m^t$ denotes the dynamic/static segmentation of video m at frame t; the data term $E_d$ is defined as follows:

$$E_d(S_m^t(x_m^t)) = \begin{cases} -\log\left(1 - P'_d(x_m^t)\right), & S_m^t(x_m^t) = \text{static} \\ -\log\left(P'_d(x_m^t)\right), & S_m^t(x_m^t) = \text{dynamic} \end{cases}$$

The smoothness term $E_s$ encourages the segmentation boundaries to coincide with image edges as far as possible; it is defined as follows:

$$E_s(S_m^t; I_m^t) = \lambda \sum_{x} \sum_{y \in N(x)} \frac{|S_m^t(x) - S_m^t(y)|}{1 + \|I_m^t(x) - I_m^t(y)\|^2}$$

For the dynamic/static segmentation obtained after energy optimization, the GrabCut segmentation technique is applied to refine it further, removing burrs on the segmentation boundary and yielding the final temporally consistent dynamic/static partition.
4. The method for spatio-temporal consistency depth recovery of dynamic-scene videos shot by multiple synchronized cameras according to claim 1, characterized in that said step 3) is:
(1) For static pixels, the bundle optimization method is used to gather the color and geometric-consistency constraint information between each pixel on the current-time frame and the corresponding pixels on multiple adjacent-time frames of the multi-view video, thereby optimizing the static depth values at the current time;
(2) For a dynamic pixel $x_m^t$, suppose its candidate depth is d; first, $x_m^t$ is projected according to d into video m' at the same time t to obtain the corresponding pixel $x_{m'}^t$, and the color and geometric consistency of $x_m^t$ and $x_{m'}^t$ is compared; its computing formula is as follows:

$$L_g(x_m^t, x_{m'}^t) = p_c(x_m^t, x_{m'}^t)\, p_g(x_m^t, x_{m'}^t)$$

where $p_c(x_m^t, x_{m'}^t)$ measures the color consistency; its computing formula is as follows:

$$p_c(x_m^t, x_{m'}^t) = \frac{\sigma_c}{\sigma_c + \|I_m^t(x_m^t) - I_{m'}^t(x_{m'}^t)\|_1}$$

where $\sigma_c$ controls the sensitivity to color differences; $p_g(x_m^t, x_{m'}^t)$ measures the geometric consistency; its computing formula is as follows:

$$p_g(x_m^t, x_{m'}^t) = \frac{\sigma_g}{\sigma_g + d_g(x_m^t, x_{m'}^t; D_m^t, D_{m'}^t)}$$

where $\sigma_g$ controls the sensitivity to depth differences, and the symmetric projection error function $d_g$ projects $x_m^t$ into video m' at the same time t and computes the distance between the projected position and $x_{m'}^t$, likewise projects $x_{m'}^t$ into video m at time t and computes the distance between that projected position and $x_m^t$, and then takes the average of the two distances;
Next, optical flow is used to track $x_m^t$ and $x_{m'}^t$ to an adjacent time t', obtaining the corresponding pixels $\hat{x}_m^{t'}$ and $\hat{x}_{m'}^{t'}$, whose color and geometric consistency is compared; its computing formula is as follows:

$$L_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'}) = p_c(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})\, p_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})$$

The color and geometric-consistency measures of multiple adjacent times are accumulated, and the energy-equation data term for dynamic-pixel depth optimization is redefined accordingly:

$$E_d'(D_m^t; \hat{I}, \hat{D}) = \sum_{x_m^t} \left( 1 - \frac{\sum_{t' \in N(t)} \sum_{m' \neq m} L_g(\hat{x}_m^{t'}, \hat{x}_{m'}^{t'})}{(M-1)\,|N(t)|} \right)$$

The redefined data term is used to solve the energy-optimization equation, thereby optimizing the dynamic-pixel depth values of every frame over the spatio-temporal domain.
CN201210360976.0A 2012-09-25 2012-09-25 Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera Active CN103002309B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210360976.0A CN103002309B (en) 2012-09-25 2012-09-25 Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera


Publications (2)

Publication Number Publication Date
CN103002309A CN103002309A (en) 2013-03-27
CN103002309B true CN103002309B (en) 2014-12-24

Family

ID=47930367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210360976.0A Active CN103002309B (en) 2012-09-25 2012-09-25 Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera

Country Status (1)

Country Link
CN (1) CN103002309B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105229706B (en) * 2013-05-27 2018-04-24 索尼公司 Image processing apparatus, image processing method and program
CN104899855A (en) * 2014-03-06 2015-09-09 株式会社日立制作所 Three-dimensional obstacle detection method and apparatus
EP3007130A1 (en) 2014-10-08 2016-04-13 Thomson Licensing Method and apparatus for generating superpixel clusters
CN106296696B (en) * 2016-08-12 2019-05-24 深圳市利众信息科技有限公司 The processing method and image capture device of color of image consistency
CN106887015B (en) * 2017-01-19 2019-06-11 华中科技大学 It is a kind of based on space-time consistency without constraint polyphaser picture matching process
CN107507236B (en) * 2017-09-04 2018-08-03 北京建筑大学 The progressive space-time restriction alignment schemes of level and device
CN108322730A (en) * 2018-03-09 2018-07-24 嘀拍信息科技南通有限公司 A kind of panorama depth camera system acquiring 360 degree of scene structures
CN109410145B (en) * 2018-11-01 2020-12-18 北京达佳互联信息技术有限公司 Time sequence smoothing method and device and electronic equipment
CN110782490B (en) * 2019-09-24 2022-07-05 武汉大学 Video depth map estimation method and device with space-time consistency
CN112738423B (en) * 2021-01-19 2022-02-25 深圳市前海手绘科技文化有限公司 Method and device for exporting animation video

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101945299A (en) * 2010-07-09 2011-01-12 清华大学 Camera-equipment-array based dynamic scene depth restoring method
CN102074020A (en) * 2010-12-31 2011-05-25 浙江大学 Method for performing multi-body depth recovery and segmentation on video


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Consistent Depth Maps Recovery from a Video Sequence; Guofeng Zhang et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2009-06-30; 974-988 *
An implementation method for extended depth based on energy minimization; Jiang Xiaohong et al.; Journal of Image and Graphics; 2006-12-31; 1854-1858 *



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210707

Address after: Room 288-8, 857 Shixin North Road, ningwei street, Xiaoshan District, Hangzhou City, Zhejiang Province

Patentee after: ZHEJIANG SHANGTANG TECHNOLOGY DEVELOPMENT Co.,Ltd.

Address before: 310027 No. 38, Zhejiang Road, Hangzhou, Zhejiang, Xihu District

Patentee before: ZHEJIANG University

TR01 Transfer of patent right