CN103002309B - Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera - Google Patents
- Publication number: CN103002309B (application CN201210360976.0A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Landscapes
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a depth recovery method with time-space consistency for dynamic scene videos shot by multiple synchronized cameras. Multi-view geometry is combined with DAISY feature vectors, and stereo matching is performed on the multi-view video frames of the same time instant to obtain an initial depth map of the multi-view video at each time instant; a dynamic probability map is then computed for each frame of the multi-view video, each frame is divided into dynamic and static pixels according to the dynamic probability map, and different optimization methods are used for the time-space consistent depth optimization of the two kinds of pixels. For static pixels, the bundle optimization method is used, combining the color and geometric consistency constraints of multiple adjacent time instants; for dynamic pixels, the color and geometric consistency constraint information of corresponding pixels among the synchronized cameras at multiple adjacent time instants is accumulated, and time-space consistency optimization is performed on the dynamic depth values at each time instant. The method has high application value in fields such as 3D video, 3D animation, augmented reality, and motion capture.
Description
Technical field
The present invention relates to stereo matching and depth recovery methods, and in particular to a method for time-space consistent depth recovery from dynamic scene videos shot by multiple synchronized cameras.
Background art
Dense depth recovery from video is one of the fundamental technologies of mid-level computer vision, with important applications in fields such as 3D modeling, 3D video, augmented reality, and motion capture. These applications usually require the recovered depth to have high accuracy and time-space consistency.
The difficulty of dense video depth recovery lies in obtaining depth values of high precision and time-space consistency for both the static and the dynamic objects in a scene. Although existing depth recovery techniques for static scenes can already recover depth information with fairly high precision, the natural world is full of moving objects, and for the dynamic objects contained in a video scene, existing depth recovery methods have difficulty reaching high precision together with consistency in the time-space domain. These methods usually require a larger number of fixed, synchronized cameras to capture the scene and use multi-view geometry to perform stereo matching on the synchronized multi-view video frames, thereby recovering the depth information of each time instant. Such capture setups are mostly used for shooting dynamic scenes in the laboratory and impose many restrictions in actual shooting. In addition, when optimizing depth over time, existing methods usually use optical flow to find corresponding pixels in frames at different time instants, and then linearly or curvilinearly smooth the depth values of the corresponding points or their 3D positions to estimate the depth of the current frame's pixels. Such temporal 3D smoothing can only make the depths of corresponding pixels more consistent over time; it cannot optimize toward the true, accurate depth values. Meanwhile, because optical flow estimation is widely non-robust, the depth optimization problem for dynamic points becomes even more complex and hard to solve.
Existing video depth recovery methods mainly fall into two classes:
1. Temporally consistent depth recovery for monocular static scene video
A typical method of this class was proposed by Zhang et al. in 2009: G. Zhang, J. Jia, T.-T. Wong, and H. Bao. Consistent depth maps recovery from a video sequence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6):974-988, 2009. The method first initializes the depth of every frame with traditional multi-view geometry, then uses the geometric and color consistency of multiple time instants in a bundle optimization framework to optimize the depth of the current frame in the time domain. The method can recover high-accuracy depth maps for static scenes; for scenes containing dynamic objects, however, it cannot recover the depth values of the dynamic objects.
2. Depth recovery for multi-view dynamic scene video
Typical methods of this class are the method of Zitnick: C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski. High-quality video view interpolation using a layered representation. ACM Transactions on Graphics, 23:600-608, August 2004; the method of Larsen: E. S. Larsen, P. Mordohai, M. Pollefeys, and H. Fuchs. Temporally consistent reconstruction from multiple video streams using enhanced belief propagation. In ICCV, pages 1-8, 2007; and the method of Lei: C. Lei, X. D. Chen, and Y. H. Yang. A new multi-view spacetime-consistent depth recovery framework for free viewpoint video rendering. In ICCV, pages 1570-1577, 2009. These methods all recover depth maps from synchronized multi-view frames of the same time instant and require a large number of fixed, synchronized cameras to shoot the dynamic scene, making them unsuitable for actual outdoor shooting. The methods of Larsen and Lei optimize the depth values with energy optimization over the time-space domain and with temporal 3D smoothing, respectively; this makes them insufficiently robust, as they cannot handle the cases where optical flow estimation produces gross errors.
Step 1) of the method for time-space consistent depth recovery of dynamic scene videos shot by multiple synchronized cameras uses the DAISY feature descriptor proposed by Tola: E. Tola, V. Lepetit, and P. Fua. Daisy: An efficient dense descriptor applied to wide-baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5):815-830, 2010.
Steps 1) and 2) of the method use the Mean-shift technique proposed by Comaniciu: D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:603-619, 2002.
Step 2) of the method uses the Grabcut technique proposed by Rother: C. Rother, V. Kolmogorov, and A. Blake. "GrabCut": interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23:309-314, August 2004.
Steps 1), 2), and 3) of the method use the energy-equation optimization technique proposed by Felzenszwalb: P. F. Felzenszwalb and D. P. Huttenlocher. Efficient belief propagation for early vision. International Journal of Computer Vision, 70(1):41-54, 2006.
Summary of the invention
The object of the invention is to address the above deficiencies of the prior art by providing a method for time-space consistent depth recovery of dynamic scene videos shot by multiple synchronized cameras.
The steps of the method for time-space consistent depth recovery of dynamic scene videos shot by multiple synchronized cameras are as follows:
1) Using multi-view geometry combined with DAISY feature vectors, perform stereo matching on the multi-view video frames of the same time instant to obtain the initial depth map of the multi-view video at each time instant;
2) Using the initial depth maps obtained in step 1), compute a dynamic probability map for each frame of the multi-view video, and use the dynamic probability map to divide every frame into dynamic pixels and static pixels;
3) For the dynamic and static pixels divided in step 2), apply different time-space consistent depth optimizations: for static pixels, use the bundle optimization method, combining the color and geometric consistency constraints of multiple adjacent time instants; for dynamic pixels, accumulate the color and geometric consistency constraint information of corresponding pixels among the multi-view cameras at multiple adjacent time instants, and thereby perform time-space consistency optimization on the dynamic depth values at each time instant.
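The three steps above can be sketched as a pipeline. The following is an illustrative stand-in, not the patent's implementation: every function body is a toy placeholder and all names are hypothetical.

```python
# Illustrative stand-in for the three-stage pipeline; every function body is a
# toy placeholder and all names are hypothetical, not the patent's code.

def initialize_depths(frames):
    # Stage 1 stand-in: one (3 x 4) depth map per view per time instant.
    return {key: [[1.0] * 4 for _ in range(3)] for key in frames}

def classify_pixels(depths):
    # Stage 2 stand-in: mark every pixel static (dynamic probability below 0.4).
    return {key: [[False] * 4 for _ in range(3)] for key in depths}

def bundle_optimize(d):
    return d  # stand-in for static-pixel bundle optimization

def spacetime_optimize(d):
    return d  # stand-in for dynamic-pixel space-time optimization

def optimize_depths(depths, dynamic_masks):
    # Stage 3: static and dynamic pixels take different optimization branches.
    refined = {}
    for key, depth in depths.items():
        mask = dynamic_masks[key]
        refined[key] = [
            [spacetime_optimize(d) if dyn else bundle_optimize(d)
             for d, dyn in zip(drow, mrow)]
            for drow, mrow in zip(depth, mask)
        ]
    return refined

frames = {("m0", 0): None, ("m1", 0): None}  # two synchronized views at t = 0
result = optimize_depths(initialize_depths(frames),
                         classify_pixels(initialize_depths(frames)))
```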
Described step 1) is:
(1) Using multi-view geometry combined with DAISY feature descriptors, perform stereo matching on the synchronized multi-view video frames, and solve the following energy-optimization equation for the initial depth map of each frame:

E(D_t^m) = Σ_x E_d(x; D_t^m) + Σ_{(x,y)∈N} E_s(x, y)

where Î_t = {I_t^1, ..., I_t^M} denotes the M synchronized multi-view video frames at time t, I_t^m denotes the frame of the m-th video at time t, and D_t^m denotes the depth map of the m-th video at time t. E_d is the data term, measuring the DAISY feature similarity between a pixel x of I_t^m and its projections into the remaining frames of Î_t computed from D_t^m:

E_d(x; D_t^m) = Σ_{m'≠m} p(f(x), f(l_{m→m'}(x, D_t^m(x))))

where p(·,·) is the penalty function used to evaluate the DAISY feature similarity of corresponding pixels, f(x) denotes the DAISY feature descriptor of pixel x, and l_{m→m'}(x, d) is the position obtained by projecting x into I_t^{m'} using depth d. E_s is the smoothness term, expressing the depth smoothness between neighboring pixels x and y:

E_s(x, y) = λ · min(|D_t^m(x) − D_t^m(y)|, η)

where the smoothing weight λ is 0.008 and the truncation threshold η of the depth difference is 3.
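The smoothness term with the stated constants (λ = 0.008, cutoff at η = 3) can be illustrated with a minimal sketch; the truncated-linear form and the 4-connected neighborhood are assumptions consistent with the description.

```python
# Minimal sketch of a truncated-linear smoothness term:
# lambda * min(|D(x) - D(y)|, eta), summed over 4-connected neighbor pairs.
# lambda = 0.008 and eta = 3 follow the values given in the text.

LAM, ETA = 0.008, 3.0

def smoothness_energy(depth):
    h, w = len(depth), len(depth[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:  # right neighbor
                total += LAM * min(abs(depth[i][j] - depth[i][j + 1]), ETA)
            if i + 1 < h:  # bottom neighbor
                total += LAM * min(abs(depth[i][j] - depth[i + 1][j]), ETA)
    return total

# A flat depth map incurs zero smoothness cost; a sharp depth jump is
# truncated at eta, so outliers do not dominate the energy.
flat = [[5.0, 5.0], [5.0, 5.0]]
jump = [[5.0, 50.0], [5.0, 50.0]]  # two horizontal edges hit the cutoff
```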
(2) Using the consistency of the initial depths of the multi-view frames in 3D space, judge for each pixel of every frame whether it is visible in the remaining synchronized cameras, thereby obtaining pairwise visibility maps between the synchronized cameras. The visibility map is computed as

V_{m→m'}(x) = 1 if |D_t^{m'}(x') − d'| < δ_d, and 0 otherwise,

where x' and d' are the position and depth obtained by projecting x onto I_t^{m'} using D_t^m, V_{m→m'}(x) indicates whether pixel x of I_t^m is visible in I_t^{m'} (1 means visible, 0 invisible), and δ_d is the threshold on the depth difference. Using the visibility maps, compute an overall visibility V(x) for each pixel x: if x is invisible in all the remaining video frames at time t, then V(x) = 0; otherwise V(x) = 1.
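A minimal sketch of the visibility test under a depth-difference threshold δ_d; the scalar depths stand in for the real projection geometry, and δ_d = 0.5 is an arbitrary illustrative value.

```python
# Sketch of the pairwise visibility test: project pixel x into another view
# with its depth; if the depth stored at the projected position differs from
# the projected depth by more than delta_d, x is judged occluded in that view.
# Scalar depths stand in for the real camera projection; delta_d is illustrative.

DELTA_D = 0.5

def visible(d_projected, depth_in_other_view, delta_d=DELTA_D):
    # V = 1 means visible, 0 means invisible (occluded).
    return 1 if abs(depth_in_other_view - d_projected) < delta_d else 0

def overall_visibility(d_projected, other_view_depths):
    # V(x) = 0 only if x is invisible in all remaining views at time t.
    flags = [visible(d_projected, d) for d in other_view_depths]
    return 1 if any(flags) else 0
```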
(3) Combine the computed visibility maps to re-initialize the depth map of every frame, evaluating the DAISY feature similarity only against the views in which a pixel is visible. Further, since the initial depth values of invisible (occluded) pixels may be erroneous, segment every frame with the Mean-shift technique; for each segmented region, fit a plane with parameters [a, b, c] to the depths of the visible pixels, and use the fitted plane to redefine the data term of the invisible pixels: for a pixel at coordinates (x, y) with depth value d,

E_d'(x, y) = (d − (a·x + b·y + c))² / σ_d²,

where σ_d controls the sensitivity of the data term to the difference between the depth value and the fitted plane. Perform the energy optimization with the redefined data term, thereby correcting the erroneous depth values of the occluded pixels.
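The plane-fitting correction can be sketched with a small least-squares fit; the quadratic scoring form and all helper names are assumptions for illustration, not the patent's formula.

```python
# Sketch of the occlusion fix-up: fit a plane d = a*x + b*y + c to the depths
# of the visible pixels of a segment by least squares, then score a candidate
# depth against the plane (sigma_d controls the sensitivity, as in the text).

def fit_plane(points):
    """points: list of (x, y, d). Returns (a, b, c) minimizing squared error."""
    # Build the 3x3 normal equations A^T A p = A^T d, augmented with A^T d.
    m = [[0.0] * 4 for _ in range(3)]
    for x, y, d in points:
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                m[i][j] += row[i] * row[j]
            m[i][3] += row[i] * d
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col and m[col][col]:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def plane_data_cost(x, y, d, plane, sigma_d=1.0):
    # One plausible quadratic form; the patent's own formula is not reproduced.
    a, b, c = plane
    return ((d - (a * x + b * y + c)) / sigma_d) ** 2

visible_pts = [(0, 0, 1.0), (1, 0, 3.0), (0, 1, 4.0), (1, 1, 6.0)]
a, b, c = fit_plane(visible_pts)  # points lie on d = 2x + 3y + 1
```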
Described step 2) is:
(1) For each pixel in every frame, project it to the frames of the remaining time instants using its initial depth, compare the geometric and color consistency between the pixel on the current frame and its corresponding positions on the remaining frames, and count the ratio, over the remaining frames, of frames on which both the depth value and the color value are consistent; from this ratio the probability that the pixel belongs to a dynamic object is obtained (a pixel that stays consistent across time is likely static), thereby producing the dynamic probability map of every frame. The heuristic function used in the computation judges whether pixel x is geometrically and photometrically consistent on the frame at another time t': first the depth difference between x and the corresponding position x_{t'} is compared; if the depth value at x_{t'} is dissimilar to the depth of x, the geometry is judged inconsistent. If the depth values are similar, the color values are compared; if the colors are similar, x and x_{t'} are judged color-consistent, otherwise color-inconsistent.
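A toy sketch of the per-pixel dynamic-probability test, with geometry checked before color as described; the tolerances are illustrative, and taking the fraction of inconsistent frames as the dynamic probability is one plausible reading of the text.

```python
# Sketch of the dynamic probability for one pixel: it is checked against its
# corresponding position in each remaining frame; geometry first, then color.
# DEPTH_TOL and COLOR_TOL are illustrative thresholds, not the patent's values.

DEPTH_TOL, COLOR_TOL = 0.5, 10.0

def consistent(depth_ref, color_ref, depth_obs, color_obs):
    # Geometric check first; only if the depths agree is color compared.
    if abs(depth_obs - depth_ref) >= DEPTH_TOL:
        return False
    return abs(color_obs - color_ref) < COLOR_TOL

def dynamic_probability(depth_ref, color_ref, observations):
    """observations: (depth, color) at the corresponding position in each
    remaining frame. More inconsistent frames -> more likely dynamic."""
    bad = sum(0 if consistent(depth_ref, color_ref, d, c) else 1
              for d, c in observations)
    return bad / len(observations)
```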
(2) Binarize the dynamic probability map with a threshold η_p of 0.4 to obtain an initial dynamic/static segmentation map for every frame. Then over-segment every frame with the Mean-shift technique (i.e., a fine-granularity image segmentation); for each segmented region, count the ratio of dynamic pixels after binarization: if the ratio exceeds 0.5, mark all pixels of the region as dynamic, otherwise mark them as static, thereby performing boundary adjustment and denoising on the binarized segmentation map.
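The binarization-plus-region-voting clean-up maps directly to code; the integer region ids below stand in for the output of Mean-shift over-segmentation.

```python
# Sketch of the boundary clean-up: threshold the dynamic probability at
# eta_p = 0.4, then let each over-segmented region vote; if more than half of
# its pixels were binarized as dynamic, the whole region becomes dynamic.
# Region ids stand in for Mean-shift over-segmentation output.

ETA_P = 0.4

def clean_segmentation(prob_map, region_map):
    binary = [[p > ETA_P for p in row] for row in prob_map]
    # Tally (total, dynamic) votes per region id.
    counts, dynamic = {}, {}
    for brow, rrow in zip(binary, region_map):
        for b, r in zip(brow, rrow):
            total, dyn = counts.get(r, (0, 0))
            counts[r] = (total + 1, dyn + (1 if b else 0))
    for r, (total, dyn) in counts.items():
        dynamic[r] = dyn / total > 0.5
    return [[dynamic[r] for r in rrow] for rrow in region_map]

prob_map = [[0.9, 0.8, 0.1],
            [0.7, 0.2, 0.0]]
region_map = [[0, 0, 1],
              [0, 1, 1]]  # region 0 votes dynamic, region 1 votes static
```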
(3) Using the coordinate offsets of corresponding pixels between temporally adjacent frames, track each pixel of every frame to the adjacent frames of the same video to find its corresponding pixels, count the ratio of frames on which the corresponding pixel is labeled dynamic, and thereby compute the temporal dynamic probability of the pixel:

P_t(x) = (1/|N(t)|) Σ_{t'∈N(t)} S_{t'}(x + u_{t→t'}(x)),

where u_{t→t'}(x) denotes the optical-flow offset of x from time t to time t', S_{t'}(·) denotes the dynamic/static segmentation label of the corresponding pixel at time t', and N(t) denotes the 5 consecutive adjacent frames before and after t. Using the temporal dynamic probability, optimize the dynamic/static segmentation map of every frame with an energy-optimization equation over the segmentation S_t^m of video m at frame t, in which the data term E_d penalizes labels that disagree with the temporal dynamic probability, and the smoothness term E_s makes the segmentation boundary coincide with image boundaries as far as possible. For the energy-optimized dynamic/static segmentation map, apply the Grabcut segmentation technique for further refinement, removing the burrs on the segmentation boundary and obtaining the final temporally consistent dynamic/static division.
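The temporal dynamic probability over flow-tracked labels can be sketched as follows; 1-D positions and integer offsets stand in for 2-D optical flow, and all names are illustrative.

```python
# Sketch of the temporal dynamic probability: follow a pixel through adjacent
# frames via optical-flow offsets and average the dynamic/static labels found
# at the tracked positions. 1-D positions stand in for 2-D optical flow.

def temporal_dynamic_probability(x, flows, labels):
    """flows[t']  : flow offset of the pixel from time t to time t'
       labels[t'] : dict position -> label at t' (1 = dynamic, 0 = static)"""
    votes = []
    for t_prime, offset in flows.items():
        pos = x + offset              # tracked position at time t'
        votes.append(labels[t_prime][pos])
    return sum(votes) / len(votes)

# Pixel at position 2, tracked to two adjacent frames: labeled dynamic in one
# frame and static in the other, so the temporal probability is 0.5.
flows = {1: 1, 2: 2}
labels = {1: {3: 1}, 2: {4: 0}}
```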
Described step 3) is:
(1) For static pixels, use the bundle optimization method to gather the color and geometric consistency constraint information between a pixel on the current frame and its corresponding pixels on multiple adjacent frames of the multi-view video, thereby optimizing the static depth values of the current time instant.
(2) For a dynamic pixel x, suppose its candidate depth is d. First project x according to d into video m' at the same time t to obtain the corresponding pixel x', and compare the color and geometric consistency of x and x' using a color-consistency measure p_c, whose sensitivity to color differences is controlled by σ_c, and a geometric-consistency measure p_g, whose sensitivity to depth differences is controlled by σ_g. The symmetric projection error function d_g used by p_g projects x into video m' at time t and computes the distance between the projected position and x'; it simultaneously projects x' into video m at time t and computes the distance between that projected position and x; it then takes the average of the two distances.
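The symmetric projection error d_g admits a direct sketch; the 1-D linear "projections" are toy stand-ins for the real cameras, and the Gaussian falloff in geometric_consistency is an assumption (the text only says σ_g controls the sensitivity).

```python
from math import exp

# Sketch of the symmetric projection error d_g: project x into the other view
# and measure the distance to the matched pixel x'; project x' back and measure
# the distance to x; average the two. The linear "cameras" are toy stand-ins.

def symmetric_projection_error(x, x_prime, project_to_other, project_back):
    d_forward = abs(project_to_other(x) - x_prime)
    d_backward = abs(project_back(x_prime) - x)
    return 0.5 * (d_forward + d_backward)

def geometric_consistency(error, sigma_g=1.0):
    # Gaussian falloff is an assumption; sigma_g controls the sensitivity.
    return exp(-(error / sigma_g) ** 2)

# A perfectly matched pair has zero error and maximal geometric consistency.
err_match = symmetric_projection_error(3.0, 4.0, lambda v: v + 1, lambda v: v - 1)
err_off = symmetric_projection_error(3.0, 5.0, lambda v: v + 1, lambda v: v - 1)
```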
Next, use optical flow to track x and x' to an adjacent time instant t', obtaining the corresponding pixels y and y', and compare the color and geometric consistency of y and y' in the same way. The color and geometric consistency measures of multiple adjacent time instants are accumulated, and the energy-equation data term for dynamic-pixel depth optimization is redefined from the accumulated measures; solving the energy optimization with this redefined data term optimizes the depth values of the dynamic pixels of every frame over the time-space domain.
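Accumulating per-correspondence consistency into a data term can be sketched as follows; the product-of-scores-then-sum aggregation and all names are illustrative assumptions, showing only how a candidate depth with higher accumulated consistency wins.

```python
# Sketch of the redefined data term for a dynamic pixel: for a candidate depth,
# accumulate color * geometry consistency scores over the other views at time t
# and over flow-tracked correspondences at adjacent times. The aggregation
# (sum of products) is one plausible illustrative choice, not the patent's.

def dynamic_data_term(candidate_depth, comparisons):
    """comparisons: list of (color_score, geometry_score) in [0, 1], one per
    other-view or adjacent-time correspondence of the pixel."""
    # Higher accumulated consistency -> lower energy for this candidate depth.
    score = sum(c * g for c, g in comparisons)
    return -score

# The candidate depth whose correspondences agree best gets the lowest energy.
def comparisons_for(d):
    return [(0.9, 0.8)] if d == 2.0 else [(0.1, 0.1)]

best = min([1.0, 2.0, 3.0], key=lambda d: dynamic_data_term(d, comparisons_for(d)))
```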
For the dynamic objects contained in a video scene, existing depth recovery methods have difficulty reaching high precision together with consistency in the time-space domain; they usually require many fixed, synchronized cameras to capture the scene, a setup mostly confined to laboratory shooting of dynamic scenes and subject to many restrictions in actual shooting. The method of time-space consistent depth recovery for dynamic scene videos shot by multiple synchronized cameras proposed by the present invention can recover accurate depth maps of each time instant for both the dynamic and the static objects in a multi-view video, while keeping the depth maps highly consistent across time instants. The method allows the multi-view cameras to move freely and independently and works with dynamic scenes shot by as few as two cameras, making it more practical for actual shooting.
Brief description of the drawings
Fig. 1 is the flow chart of the method of time-space consistent depth recovery for dynamic scene videos shot by multiple synchronized cameras;
Fig. 2(a) is a frame of a video sequence;
Fig. 2(b) is a frame synchronized with Fig. 2(a);
Fig. 2(c) is the initial depth map of Fig. 2(a);
Fig. 2(d) is the visibility map estimated from Fig. 2(a) and Fig. 2(b);
Fig. 2(e) is the initial depth map after plane-fitting correction using Fig. 2(d);
Fig. 3(a) is the dynamic probability map of Fig. 2(a);
Fig. 3(b) is the dynamic/static segmentation map obtained from Fig. 3(a) by binarization followed by Mean-shift-based boundary adjustment and denoising;
Fig. 3(c) is the segmentation map after temporal optimization;
Fig. 3(d) is the segmentation map after refinement with the Grabcut technique;
Fig. 3(e) is a partial enlargement of the boxed regions in Figs. 3(a)-(d);
Fig. 4(a) is a frame of a video sequence;
Fig. 4(b) is the dynamic/static segmentation map of Fig. 4(a);
Fig. 4(c) is the depth map of Fig. 4(a) after time-space consistency optimization;
Fig. 4(d) is a partial enlargement of the boxed regions in Fig. 4(a) and Fig. 4(c);
Fig. 4(e) is another frame of the video sequence;
Fig. 4(f) is the depth map result of Fig. 4(e) after time-space consistency optimization;
Fig. 4(g) is the result of the 3D scene model reconstructed from Fig. 4(f) with texture mapping;
Fig. 5 is the schematic diagram of the time-space consistent depth optimization.
Embodiment
As shown in Figure 1, the step for the method for the space-time consistency depth recovery of the dynamic scene video of many orders synchronization camera shooting is as follows:
1) utilize multi-view geometry methods combining DAISY characteristic vector, the many orders frame of video for synchronization carries out Stereo matching, obtains the initialization depth map in how visual each moment of frequency;
2) the initialized depth map utilizing step 1) to obtain calculates dynamic probability figure for each two field picture of how visual frequency, and utilizes dynamic probability figure to carry out dynamically/the classification of static state to the pixel of every two field picture;
3) for step 2) the dynamic and static state pixel that divides, different optimization methods is utilized to carry out the depth optimization of space-time consistency, for static point, the color of the multiple adjacent moment of bundle optimization methods combining and Geometrical consistency constraint is utilized to be optimized; For dynamic point, between the multi-lens camera adding up multiple adjacent moment, the color of corresponding pixel points and Geometrical consistency constraint information, carry out space-time consistency optimization to each moment dynamic depth value thus.
Described step 1) is:
(1) Using multi-view geometry combined with the DAISY feature descriptor, perform stereo matching on the binocular video frames of the same time instant, such as those shown in Fig. 2(a) and Fig. 2(b), and solve an energy optimization equation to obtain the initial depth map of each frame, as shown in Fig. 2(c);
(2) Using the 3D-space consistency of the initial depths of the multi-view video frames, judge whether each pixel of every frame is visible in the remaining cameras of the same time instant, thereby obtaining pairwise visibility maps between the cameras of that instant, as shown in Fig. 2(d);
(3) Combine the computed visibility maps to re-initialize the depth map of every frame; the DAISY feature similarity is evaluated only at visible pixels. Further, since the initial depth values of invisible pixels may be wrong, segment every frame with the Mean-shift technique; for each segment, fit a plane to the depths of the visible pixels, and use the fitted plane to fill in and correct the depth values of the invisible pixels, as shown in Fig. 2(e);
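The plane-fitting correction of sub-step (3) can be sketched as a least-squares fit per segment. This is an illustrative version, not the patent's exact implementation; the segment's pixel coordinates and visible depths are assumed inputs.

```python
import numpy as np

def fit_depth_plane(xs, ys, depths):
    # Least-squares fit of d = a*x + b*y + c to the depths of the
    # visible pixels inside one Mean-shift segment.
    A = np.column_stack([xs, ys, np.ones(len(xs))])
    coeffs, *_ = np.linalg.lstsq(A, depths, rcond=None)
    return coeffs  # [a, b, c]

def fill_occluded_depths(xs, ys, coeffs):
    # Evaluate the fitted plane at the occluded pixel coordinates.
    a, b, c = coeffs
    return a * np.asarray(xs, float) + b * np.asarray(ys, float) + c
```

A segment whose visible depths lie exactly on a plane reproduces that plane, so occluded pixels inside the segment receive depths consistent with their neighbors.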
Described step 2) is:
(1) For each pixel of every frame, project it to the remaining time instants using its initial depth, and compare the geometric and color consistency between the pixel on the current frame and its corresponding positions on the remaining frames; count the ratio of remaining frames on which both the depth value and the color value are consistent, and use it to obtain the probability that the pixel belongs to a dynamic object, thus forming the dynamic probability map of every frame, as shown in Fig. 3(a);
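A minimal sketch of the per-pixel probability in sub-step (1). Taking the dynamic probability as the fraction of remaining frames on which the correspondence is inconsistent is one plausible reading of the accumulation described above (a static point seen with a correct depth should agree on most frames); the consistency flags are assumed to come from the depth and color comparison.

```python
def dynamic_probability(consistency_flags):
    # consistency_flags[i] is True when both depth and color of the
    # pixel agree with its correspondence on the i-th remaining frame.
    if not consistency_flags:
        return 0.0
    inconsistent = sum(1 for f in consistency_flags if not f)
    return inconsistent / len(consistency_flags)
```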
(2) Binarize the dynamic probability map to obtain the initial dynamic/static segmentation map of every frame. Over-segment every frame with the Mean-shift technique, i.e. a fine-granularity image segmentation; for each segment, count the ratio of pixels binarized as dynamic; if the ratio exceeds 0.5, mark all pixels of the segment as dynamic, otherwise as static, thereby performing boundary adjustment and denoising on the binarized segmentation map, as shown in Fig. 3(b);
(3) Using the coordinate offsets of corresponding pixels between adjacent-time frames, track the pixels of every frame to the adjacent frames of the same video and find the corresponding pixels; count the ratio of frames on which the corresponding pixel is labeled dynamic, compute therefrom the temporal dynamic probability of each pixel, and optimize the dynamic/static segmentation map of each frame by an energy optimization equation, as shown in Fig. 3(c). The result of Fig. 3(c) is further refined with the Grabcut segmentation technique, removing burrs on the segmentation boundary and yielding the final temporally consistent dynamic/static classification, as shown in Fig. 3(d);
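The region-level voting of sub-step (2) can be sketched as follows; the over-segmentation labels are assumed to come from Mean-shift, and the 0.5 majority threshold is the one stated above.

```python
import numpy as np

def region_vote(binary_dynamic, segments):
    # Re-label each over-segmented region: dynamic when more than half
    # of its pixels were binarized as dynamic, static otherwise.
    out = np.zeros_like(binary_dynamic, dtype=bool)
    for label in np.unique(segments):
        mask = segments == label
        out[mask] = binary_dynamic[mask].mean() > 0.5
    return out
```

Because every pixel of a segment receives the segment's majority label, isolated misclassified pixels are removed and the dynamic/static boundary snaps to segment boundaries.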
Described step 3) is:
(1) For static pixels, use bundle optimization to accumulate the color and geometric consistency constraint information between the pixel on the current frame and its corresponding pixels on multiple adjacent frames of the multi-view video, thereby optimizing the static depth values of the current time instant;
(2) The space-time-consistent depth optimization method for dynamic pixels is shown in Figure 5. Suppose the candidate depth of a pixel x_t^m is d. First, x_t^m is projected according to d into the video m' of the same time instant t, yielding the corresponding pixel x_t^m'; the color and geometric consistency of x_t^m and x_t^m' are compared. Next, optical flow is used to track x_t^m and x_t^m' to an adjacent time instant t', yielding the corresponding pixels x_{t'}^m and x_{t'}^m'; the color and geometric consistency of x_{t'}^m and x_{t'}^m' are compared. The color and geometric consistency estimates of multiple adjacent time instants are accumulated, and the energy optimization equation is used to optimize the dynamic pixel depth values of every frame over the space-time domain, obtaining depth maps that are consistent over the space-time domain, as shown in Fig. 4(c) and Fig. 4(f).
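The accumulation for one candidate depth d can be sketched schematically. The per-instant consistency scores are assumed to lie in [0, 1] with 1 meaning fully consistent, and the simple additive penalty is an illustrative stand-in for the patent's energy data term, not its exact form.

```python
def dynamic_depth_cost(color_scores, geometry_scores):
    # Accumulate color and geometric consistency over the adjacent
    # time instants; lower cost means a better candidate depth.
    pairs = list(zip(color_scores, geometry_scores))
    if not pairs:
        return 0.0
    return sum((1.0 - c) + (1.0 - g) for c, g in pairs) / len(pairs)

def best_candidate_depth(candidates, cost_of):
    # Pick the candidate depth with the smallest accumulated cost.
    return min(candidates, key=cost_of)
```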
Claims (4)
1. A method for space-time-consistent depth recovery of dynamic-scene videos shot by multi-view synchronous cameras, characterized in that its steps are as follows:
1) using multi-view geometry combined with DAISY feature vectors, perform stereo matching on the multi-view video frames of the same time instant to obtain an initial depth map for each time instant of the multi-view video;
2) using the initial depth maps obtained in step 1), compute a dynamic probability map for each frame of the multi-view video, and use the dynamic probability map to classify the pixels of every frame into dynamic pixels and static pixels;
3) for the dynamic pixels and static pixels classified in step 2), apply different space-time-consistent depth optimization methods: for static pixels, use bundle optimization combined with the color and geometric consistency constraints of multiple adjacent time instants; for dynamic pixels, accumulate the color and geometric consistency constraint information of corresponding pixels across the multi-view cameras over multiple adjacent time instants, and thereby optimize the dynamic depth values at each time instant.
2. The method for space-time-consistent depth recovery of dynamic-scene videos shot by multi-view synchronous cameras according to claim 1, characterized in that said step 1) is:
(1) Using multi-view geometry combined with the DAISY feature descriptor, perform stereo matching on the multi-view video frames of the same time instant, and solve the initial depth map of each frame by an energy optimization equation of the form
E(D_t^m) = Σ_x E_d(x) + Σ_{(x,y) neighbors} E_s(x, y),
wherein I_t denotes the M multi-view synchronized video frames at time t, I_t^m denotes the frame of the m-th video at time t, and D_t^m denotes the depth map of the m-th video at time t. E_d is the data term: for each pixel x in I_t^m it accumulates, over the remaining frames, a penalty function that estimates the DAISY feature similarity between the DAISY feature descriptor of x and the descriptor at the position obtained by projecting x into each remaining frame using D_t^m. E_s is the smoothness term, expressing the depth smoothness between neighboring pixels x and y, computed as
E_s(x, y) = λ · min(|D_t^m(x) − D_t^m(y)|, η),
wherein the smoothing weight λ is 0.008 and the cutoff value η of the depth difference is 3;
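With the stated constants, the smoothness term can be written as a truncated-linear penalty; the truncated form is an assumption consistent with the cutoff η described above.

```python
def smoothness_term(d_x, d_y, lam=0.008, eta=3.0):
    # Penalty grows linearly with the neighboring depth difference
    # and is truncated at the cutoff eta, so a single large depth
    # discontinuity is not penalized without bound.
    return lam * min(abs(d_x - d_y), eta)
```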
(2) Using the 3D-space consistency of the initial depths of the multi-view video frames, judge whether each pixel of every frame is visible in the remaining cameras of the same time instant, thereby obtaining pairwise visibility maps between the cameras. The visibility value of a pixel x of video m with respect to video m' indicates whether x is visible in I_t^m', where 1 means visible and 0 means invisible; it is computed by projecting x into I_t^m' using D_t^m and testing whether the depth difference at the projected position is below the threshold δ_d. Using the obtained visibility maps, an overall visibility V(x) is computed for each pixel x: if x is invisible in all the remaining frames of time t, then V(x) is 0, otherwise V(x) is 1;
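The visibility test of sub-step (2) can be sketched as follows; the per-view depth differences at the projected positions are assumed inputs, and the value of δ_d here is illustrative.

```python
def pairwise_visibility(depth_diffs, delta_d=0.1):
    # 1 when the depth at the projected position in that view agrees
    # within the threshold delta_d (illustrative value), else 0.
    return [1 if abs(dd) < delta_d else 0 for dd in depth_diffs]

def overall_visibility(flags):
    # Overall visibility V(x): 0 only when the pixel is invisible in
    # every remaining frame of the same time instant.
    return 0 if sum(flags) == 0 else 1
```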
(3) Combine the computed visibility maps to re-initialize the depth map of every frame; the DAISY feature similarity is evaluated only at visible pixels. Further, since the initial depth values of invisible pixels may be wrong, segment every frame with the Mean-shift technique; for each segment, fit a plane with parameters [a, b, c] to the depths of the visible pixels, and use the fitted plane to redefine the data term of the invisible pixels as a penalty on the distance between the depth value and the fitted plane, wherein σ_d controls the sensitivity of the data term to the distance between the depth value and the fitted plane, and x and y are the coordinates of the pixel. The redefined data term is then used in the energy optimization, thereby correcting the wrong depth values of the occluded pixels.
3. The method for space-time-consistent depth recovery of dynamic-scene videos shot by multi-view synchronous cameras according to claim 1, characterized in that said step 2) is:
(1) For each pixel of every frame, project it to the remaining time instants using its initial depth, and compare the geometric and color consistency between the pixel on the current frame and its corresponding positions on the remaining frames; count the ratio of remaining frames on which both the depth value and the color value are consistent, and use it to obtain the probability that the pixel belongs to a dynamic object, thus forming the dynamic probability map of every frame. A heuristic function is used to judge whether the pixel is geometrically and photometrically consistent on a remaining frame: first, the depth value of the pixel is compared with the depth value at its corresponding position; if the depths are dissimilar, the geometry is judged inconsistent; if the depths are similar, the color values are compared, and if the colors are similar the pair is judged color-consistent, otherwise color-inconsistent. The ratio of remaining frames having both depth and color consistency is counted and used to obtain the probability that the pixel belongs to a dynamic object;
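The two-stage heuristic above (depth first, color only if depth agrees) can be sketched as follows; both tolerance values are illustrative assumptions, not constants from the patent.

```python
def consistent(d_ref, d_cor, c_ref, c_cor, tol_d=0.1, tol_c=10.0):
    # Depth is compared first; colors are compared only when the
    # depths already agree within tol_d (both tolerances assumed).
    if abs(d_ref - d_cor) >= tol_d:
        return False  # geometry inconsistent
    return abs(c_ref - c_cor) < tol_c
```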
(2) Binarize the dynamic probability map with a threshold η_p of 0.4 to obtain the initial dynamic/static segmentation map of every frame. Over-segment every frame with the Mean-shift technique, i.e. a fine-granularity image segmentation; for each segment, count the ratio of pixels binarized as dynamic; if the ratio exceeds 0.5, mark all pixels of the segment as dynamic, otherwise as static, thereby performing boundary adjustment and denoising on the binarized segmentation map;
(3) Using the coordinate offsets of corresponding pixels between adjacent-time frames, track the pixels of every frame to the adjacent frames of the same video and find the corresponding pixels; the ratio of frames on which the corresponding pixel is labeled dynamic gives the temporal dynamic probability of the pixel, wherein the optical-flow offset of a pixel from time t to time t' determines its corresponding pixel at time t', the dynamic/static label of that corresponding pixel is accumulated, and N(t) denotes the 5 consecutive adjacent frames before and after time t. Using the temporal dynamic probability, the dynamic/static segmentation map of video m at frame t is optimized by an energy optimization equation, whose data term E_d is defined from the temporal dynamic probability, and whose smoothness term E_s makes the segmentation boundary coincide with image boundaries as far as possible. The energy-optimized dynamic/static segmentation map is further refined with the Grabcut segmentation technique, removing burrs on the segmentation boundary and yielding the final temporally consistent dynamic/static classification.
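The temporal dynamic probability of sub-step (3) reduces to a label ratio over the tracked correspondences; a minimal sketch, assuming the optical-flow tracking has already produced one dynamic/static label per frame of N(t):

```python
def temporal_dynamic_probability(labels):
    # labels[i] is 1 when the optical-flow correspondence in the i-th
    # adjacent frame of N(t) is marked dynamic, else 0.
    return sum(labels) / len(labels) if labels else 0.0
```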
4. The method for space-time-consistent depth recovery of dynamic-scene videos shot by multi-view synchronous cameras according to claim 1, characterized in that said step 3) is:
(1) For static pixels, use bundle optimization to accumulate the color and geometric consistency constraint information between the pixel on the current frame and its corresponding pixels on multiple adjacent frames of the multi-view video, thereby optimizing the static depth values of the current time instant;
(2) For a dynamic pixel x_t^m, suppose its candidate depth is d. First project x_t^m according to d into the video m' of the same time instant t to obtain the corresponding pixel x_t^m', and compare the color and geometric consistency of x_t^m and x_t^m'. The color consistency term estimates the color similarity of x_t^m and x_t^m', wherein σ_c controls the sensitivity to color differences. The geometric consistency term estimates the geometric agreement of x_t^m and x_t^m', wherein σ_g controls the sensitivity to depth differences; the symmetric projection error function d_g projects x_t^m into the video m' of time t and computes the distance between the projected position and x_t^m', simultaneously projects x_t^m' into the video m of time t and computes the distance between the projected position and x_t^m, and then takes the average of the two distances.
Next, use optical flow to track x_t^m and x_t^m' to an adjacent time instant t' to obtain the corresponding pixels x_{t'}^m and x_{t'}^m', and compare the color and geometric consistency of x_{t'}^m and x_{t'}^m' in the same way. Accumulate the color and geometric consistency estimates of the multiple adjacent time instants, thereby redefining the data term of the energy equation for dynamic-pixel depth optimization; solve the energy optimization equation with the redefined data term, thereby optimizing the depth values of the dynamic pixels of every frame over the space-time domain.
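The symmetric projection error d_g of claim 4 can be sketched directly from its description: it averages the forward and backward reprojection distances. The 2D projected positions are assumed already computed from the camera geometry.

```python
import math

def symmetric_projection_error(proj_in_mp, x_mp, proj_in_m, x_m):
    # Forward: x_t^m projected into view m' vs its correspondence
    # x_t^m'.  Backward: x_t^m' projected back into view m vs x_t^m.
    # d_g is the average of the two distances.
    d_fwd = math.dist(proj_in_mp, x_mp)
    d_bwd = math.dist(proj_in_m, x_m)
    return 0.5 * (d_fwd + d_bwd)
```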
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210360976.0A CN103002309B (en) | 2012-09-25 | 2012-09-25 | Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103002309A CN103002309A (en) | 2013-03-27 |
CN103002309B true CN103002309B (en) | 2014-12-24 |
Family
ID=47930367
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210360976.0A Active CN103002309B (en) | 2012-09-25 | 2012-09-25 | Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103002309B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105229706B (en) * | 2013-05-27 | 2018-04-24 | 索尼公司 | Image processing apparatus, image processing method and program |
CN104899855A (en) * | 2014-03-06 | 2015-09-09 | 株式会社日立制作所 | Three-dimensional obstacle detection method and apparatus |
EP3007130A1 (en) | 2014-10-08 | 2016-04-13 | Thomson Licensing | Method and apparatus for generating superpixel clusters |
CN106296696B (en) * | 2016-08-12 | 2019-05-24 | 深圳市利众信息科技有限公司 | The processing method and image capture device of color of image consistency |
CN106887015B (en) * | 2017-01-19 | 2019-06-11 | 华中科技大学 | It is a kind of based on space-time consistency without constraint polyphaser picture matching process |
CN107507236B (en) * | 2017-09-04 | 2018-08-03 | 北京建筑大学 | The progressive space-time restriction alignment schemes of level and device |
CN108322730A (en) * | 2018-03-09 | 2018-07-24 | 嘀拍信息科技南通有限公司 | A kind of panorama depth camera system acquiring 360 degree of scene structures |
CN109410145B (en) * | 2018-11-01 | 2020-12-18 | 北京达佳互联信息技术有限公司 | Time sequence smoothing method and device and electronic equipment |
CN110782490B (en) * | 2019-09-24 | 2022-07-05 | 武汉大学 | Video depth map estimation method and device with space-time consistency |
CN112738423B (en) * | 2021-01-19 | 2022-02-25 | 深圳市前海手绘科技文化有限公司 | Method and device for exporting animation video |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101945299A (en) * | 2010-07-09 | 2011-01-12 | 清华大学 | Camera-equipment-array based dynamic scene depth restoring method |
CN102074020A (en) * | 2010-12-31 | 2011-05-25 | 浙江大学 | Method for performing multi-body depth recovery and segmentation on video |
Non-Patent Citations (2)
Title |
---|
Consistent Depth Maps Recovery from a Video Sequence; Guofeng Zhang et al.; IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE; 20090630; 974-988 * |
Implementation of extended depth based on energy minimization; Jiang Xiaohong et al.; Journal of Image and Graphics; 20061231; 1854-1858 * |
Also Published As
Publication number | Publication date |
---|---|
CN103002309A (en) | 2013-03-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103002309B (en) | Depth recovery method for time-space consistency of dynamic scene videos shot by multi-view synchronous camera | |
CN102903096B (en) | Monocular video based object depth extraction method | |
Russell et al. | Video pop-up: Monocular 3d reconstruction of dynamic scenes | |
Zhang et al. | Semantic segmentation of urban scenes using dense depth maps | |
CN103400409B (en) | A kind of coverage 3D method for visualizing based on photographic head attitude Fast estimation | |
CN102750711B (en) | A kind of binocular video depth map calculating method based on Iamge Segmentation and estimation | |
CN109242950B (en) | Multi-view human dynamic three-dimensional reconstruction method under multi-person tight interaction scene | |
CN102074020B (en) | Method for performing multi-body depth recovery and segmentation on video | |
CN104616286A (en) | Fast semi-automatic multi-view depth restoring method | |
Tung et al. | Complete multi-view reconstruction of dynamic scenes from probabilistic fusion of narrow and wide baseline stereo | |
CN103049929A (en) | Multi-camera dynamic scene 3D (three-dimensional) rebuilding method based on joint optimization | |
CN101765019A (en) | Stereo matching algorithm for motion blur and illumination change image | |
KR101125061B1 (en) | A Method For Transforming 2D Video To 3D Video By Using LDI Method | |
Tran et al. | View synthesis based on conditional random fields and graph cuts | |
CN102724530B (en) | Three-dimensional method for plane videos based on feedback control | |
CN107578419A (en) | A kind of stereo-picture dividing method based on uniformity contours extract | |
Doulamis et al. | Unsupervised semantic object segmentation of stereoscopic video sequences | |
Lei et al. | A new multiview spacetime-consistent depth recovery framework for free viewpoint video rendering | |
Liu et al. | Disparity Estimation in Stereo Sequences using Scene Flow. | |
Nam et al. | Improved depth estimation algorithm via superpixel segmentation and graph-cut | |
Abdein et al. | Self-supervised learning of optical flow, depth, camera pose and rigidity segmentation with occlusion handling | |
Turetken et al. | Temporally consistent layer depth ordering via pixel voting for pseudo 3D representation | |
Li et al. | 3D building extraction with semi-global matching from stereo pair worldview-2 satellite imageries | |
Kim et al. | Accurate ground-truth depth image generation via overfit training of point cloud registration using local frame sets | |
Aung | Computing the Three Dimensional Depth Measurement by the Multi Stereo Images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right |
Effective date of registration: 20210707
Address after: Room 288-8, 857 Shixin North Road, ningwei street, Xiaoshan District, Hangzhou City, Zhejiang Province
Patentee after: ZHEJIANG SHANGTANG TECHNOLOGY DEVELOPMENT Co.,Ltd.
Address before: 310027 No. 38, Zhejiang Road, Hangzhou, Zhejiang, Xihu District
Patentee before: ZHEJIANG University