CN100592338C - Multi-visual angle video image depth detecting method and depth estimating method - Google Patents

Multi-visual angle video image depth detecting method and depth estimating method

Info

Publication number
CN100592338C
Authority
CN
China
Prior art keywords
search
pixel
depth
depth value
camera
Prior art date
Application number
CN200810300330A
Other languages
Chinese (zh)
Other versions
CN101231754A (en
Inventor
张小云
乔治L.杨
Original Assignee
四川虹微技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 四川虹微技术有限公司 filed Critical 四川虹微技术有限公司
Priority to CN200810300330A priority Critical patent/CN100592338C/en
Publication of CN101231754A publication Critical patent/CN101231754A/en
Application granted granted Critical
Publication of CN100592338C publication Critical patent/CN100592338C/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images

Abstract

The invention relates to multi-view video image processing and provides an adaptive method for determining the search step size, together with a depth estimation method based on the adaptive step size. In the multi-view video image depth search method, the search step size of each step within the depth search range is adjusted dynamically according to the current depth value, so that every step corresponds to the same pixel search precision. In the multi-view video image depth estimation method, which uses depth-based view synthesis and block-matching depth search, the search step size of each step is likewise adjusted dynamically according to the current depth value within the depth search range. The technical solution is applicable to multi-view video depth search and depth estimation. Compared with a fixed search step size, the adaptive step size gives better depth search performance: the absolute difference between synthesized image blocks and reference image blocks is small, there are few estimation errors, and the amount of computation, i.e. the number of depth search steps, is low.

Description

Multi-view video image depth search method and depth estimation method

Technical field

The present invention relates to multi-view video image processing technology.

Background technology

In recent years, researchers have gradually recognized that future advanced three-dimensional television and free viewpoint video (FVV) systems should use computer vision, video processing, and depth-image-based scene synthesis to decouple video acquisition from the display settings, so that the viewing angle is not constrained by the orientation of the cameras that captured the video, thereby providing a high degree of flexibility, interactivity, and operability. The European 3D television project adopted a video-plus-depth data format (C. Fehn, "Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV," in Proc. SPIE Conf. Stereoscopic Displays and Virtual Reality Systems XI, vol. 5291, CA, U.S.A., Jan. 2004, pp. 93-104), in which every image pixel has a corresponding depth value. With depth-image-based rendering (DIBR), the decoder at the receiving end generates a stereo image pair according to the display settings and the viewing angle, so that the viewing angle is independent of the orientation of the capturing cameras. A JVT meeting proposal of April 2007 (A. Smolic and K. Mueller, et al., "Multi-View Video plus Depth (MVD) Format for Advanced 3D Video Systems", ISO/IEC JTC1/SC29/WG11, Doc. JVT-W100, San Jose, USA, April 2007) generalized video plus depth to multi-view video and proposed the multi-view video plus depth (MVD) coded data format. MVD meets an essential requirement of advanced 3D video and free viewpoint video applications, namely that the decoder can generate views at a continuum of viewpoints within a certain range rather than a limited number of discrete views. The MVD scheme was therefore adopted by JVT and identified as the direction for future development.

Obtaining the depth information of a scene from two or more views taken at different viewpoints has therefore become one of the important problems in multi-view video processing.

The current depth search approach uses a fixed search step size (uniform depth grid) within a fixed search range. With a fixed step size, if the step is chosen so that it corresponds to an offset of 1 pixel at a small depth value, then at a large depth value the pixel offset corresponding to this step is less than 1 pixel. Assuming that a projection falling on a non-integer pixel position is assigned to the nearest-neighbor pixel, the depth search then hits the same pixel at several different depth values, i.e. repeated search occurs. Conversely, if the step is chosen to correspond to an offset of 1 pixel at a large depth value, then at a small depth value the corresponding pixel offset exceeds 1 pixel, so two adjacent depth values map to two non-adjacent pixels, some pixels are skipped, and the search is incomplete. Thus, although N pixels were expected to be searched within the search range [z_min, z_max], repeated or missed pixels mean that fewer than N effective search points are actually obtained. To ensure that the search range contains all possible values of the true scene depth, the range is usually set large, and to ensure a certain search precision the step is set small. This greatly increases the number of search steps and the corresponding computation, and because of missed and repeated searches the search result is still poor.
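The effect described above can be illustrated with a small sketch. It assumes a rectified (parallel) camera pair, for which the disparity is d = fB/z; the constants `FB`, `Z_MIN`, and `Z_MAX` are made-up values chosen only for illustration and do not come from the patent.

```python
# Toy rectified (parallel) stereo pair: disparity d = FB / z.
# FB (focal length in pixels times baseline), Z_MIN and Z_MAX are made-up values.
FB = 2000.0
Z_MIN, Z_MAX = 20.0, 100.0


def disparity(z):
    """Disparity in pixels of a point at depth z."""
    return FB / z


def pixel_shift(z, dz):
    """Pixel offset in the reference view when depth goes from z to z + dz."""
    return disparity(z) - disparity(z + dz)


# Fixed step chosen so that one step equals exactly 1 pixel at the nearest depth.
dz_fixed = Z_MIN ** 2 / (FB - Z_MIN)

near_shift = pixel_shift(Z_MIN, dz_fixed)  # offset per step at small depth
far_shift = pixel_shift(Z_MAX, dz_fixed)   # offset per step at large depth

# Walk the uniform depth grid and record the nearest-neighbour pixel hit each time.
depths = []
z = Z_MIN
while z <= Z_MAX:
    depths.append(z)
    z += dz_fixed
visited_pixels = {round(disparity(z)) for z in depths}

print(f"offset per step: {near_shift:.3f} px near, {far_shift:.3f} px far")
print(f"{len(depths)} depth samples hit only {len(visited_pixels)} distinct pixels")
```

With these toy numbers the uniform grid evaluates far more depth samples than there are distinct pixels along the search line, which is the repeated-search problem.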

Many studies and algorithms related to depth estimation exist, but most of them first perform disparity estimation on a rectified, parallel stereo image pair and then compute depth from the relationship between disparity and depth. For example, in a parallel camera system there is only horizontal disparity between the two images; disparity is first estimated using feature-based or block-matching methods, and depth is then computed from the inverse proportionality between depth and disparity. For a non-parallel camera system, the depth map of the original view is obtained only after a series of processing steps such as image pair rectification, disparity matching, depth calculation, and inverse rectification. Depth estimation of this kind is in essence disparity estimation, and its performance is determined mainly by the disparity estimation algorithm. Disparity estimation, or stereo matching, is a classical problem in computer vision. Despite a large body of research and results, the matching ambiguity or uncertainty caused by lack of texture or by occlusion means that disparity matching remains an active and difficult research topic in computer vision.

In 2006, a JVT meeting proposal (S. Yea, J. Oh, S. Ince, E. Martinian and A. Vetro, "Report on Core Experiment CE3 of Multiview Coding", ISO/IEC JTC1/SC29/WG11, Doc. JVT-T123, Klagenfurt, Austria, July 2006) proposed using the camera intrinsic and extrinsic parameters and depth-based view synthesis to search, with a given search step size within a specified depth search range, for the depth that minimizes the error between the synthesized view and the actual view, and to take that depth as the estimate. M. Okutomi et al. proposed a stereo matching method for multiple-baseline stereo systems, which uses the inverse relationship between depth and disparity to convert disparity estimation into a depth-solving problem and eliminates the ambiguity problem in disparity matching (M. Okutomi and K. Kanade, "A multiple-baseline stereo", IEEE Trans. on Pattern Analysis and Machine Intelligence 15(4): 353-363, 1993). N. Kim et al. proposed performing depth search, matching, and view synthesis directly in range/depth space (N. Kim, M. Trivedi and H. Ishiguro, "Generalized multiple baseline stereo and direct view synthesis using range-space search, match, and render", International Journal of Computer Vision 47(1/2/3): 131-148, 2002). The depth search is carried out directly in depth space, no disparity matching is needed, image rectification is handled implicitly during the depth search, and the depth value is continuous, so its precision is not limited by image pixel resolution as a disparity vector is. In practice, however, a depth search range and search step size must be specified and the optimum found according to some cost function, and whether the range and step size are appropriate is critical to estimation performance.

In disparity matching, the disparity search range can usually be determined intuitively from the image properties. In depth search, especially in a non-parallel camera system, the relationship between depth change and image pixel offset is not obvious, so the search range is hard to determine reasonably.

Determining a suitable depth search interval and step size for given multi-view images is therefore the key to estimating depth information effectively.

JVT-WO59 (S. Yea and A. Vetro, "Report of CE6 on View Synthesis Prediction", ISO/IEC JTC1/SC29/WG11, Doc. JVT-WO59, San Jose, USA, April 2007) proposes using matched feature point pairs of two views: from several candidate sets of depth search minimum, maximum, and step size, the set that minimizes the error between matched feature points is chosen as the depth search range and step size. This method requires the KLT (Kanade-Lucas-Tomasi) algorithm (C. Tomasi and T. Kanade, "Detection and tracking of point features", Technical Report CMU-CS-91-132, Carnegie Mellon University, 1991) for feature extraction and matching, and its performance depends on the correctness of the feature matching.

M. Okutomi et al. and N. Kim et al. mention using, as the search step size, the depth change that corresponds to a 1-pixel offset in the reference view with the longest baseline, which guarantees that the pixel offset in all other reference views is less than 1 pixel.

Both of the above methods use a fixed search step size and do not adapt the step size to changes in image content or scene.

Summary of the invention

The technical problem to be solved by the invention is to provide an adaptive method for determining the search step size that avoids repeated and missed pixel searches. The invention also provides a depth estimation method based on the adaptive search step size.

The technical solution adopted by the invention to solve the above problem is a multi-view video image depth search method, characterized in that the search step size of each step within the depth search range is adjusted dynamically according to the current depth value: the smaller the current depth value, the smaller the step size, and the larger the current depth value, the larger the step size, so that the step size of every step corresponds to the same pixel search precision. The pixel search precision equals the length of the pixel offset vector of each search step. It may be sub-pixel precision, such as 1/2 pixel or 1/4 pixel, or integer-pixel precision, such as one pixel or two pixels. The search step size equals the depth change corresponding to the pixel offset vector of each search step;

Using the relationship between the depth change and the pixel offset vector, determining the depth search range and search step size is converted into determining the pixel search range and pixel search precision;

The search step size in the target view is determined, according to the perspective projection principle of computer vision and the principle of depth-based view synthesis, from the current depth value in the target view, the pixel offset vector in the reference view, and the intrinsic and extrinsic camera parameters of the views; the step size of each step in the target view corresponds to a pixel offset vector of equal length in the reference view. The target view is the image whose depth currently needs to be estimated, and the reference views are the other images in the multi-view video system. The reference view can be selected automatically during the depth search or specified by the user;

The search step size is given by:

$$\Delta z = \frac{\left(z\,b_3^T P + c_3^T \Delta t_r\right)^2 \|\Delta P_r\|^2}{\Delta P_r^T\left(b_3^T P\; C_r \Delta t_r - c_3^T \Delta t_r\; B_r P\right) - \left(z\,b_3^T P + c_3^T \Delta t_r\right)\left(b_3^T P\right)\|\Delta P_r\|^2}$$

where $P$ is the pixel in the target view whose depth is to be estimated, $z$ is the current depth value of pixel $P$, $\Delta z$ is the depth change of pixel $P$, i.e. the search step size, and $\Delta P_r$ is the pixel offset vector in reference view $r$ corresponding to the depth change $\Delta z$ of pixel $P$ in the target view, with $\|\Delta P_r\|^2 = \Delta P_r^T \Delta P_r$. $B_r = A_r R_r^{-1} R A^{-1}$ and $C_r = A_r R_r^{-1}$ are 3×3 matrices, and $\Delta t_r = t - t_r$ is a three-dimensional vector. $R$ is the three-dimensional rotation matrix of the target view's camera coordinate system relative to the world coordinate system, $t$ is the translation vector of the target view's camera coordinate system relative to the world coordinate system, and $A$ is the intrinsic parameter matrix of the target view's camera. $R_r$, $t_r$, and $A_r$ are the corresponding rotation matrix, translation vector, and intrinsic parameter matrix of the reference view. $b_3$ and $c_3$ are the third row vectors of $B_r$ and $C_r$ respectively. For a parallel camera system, the depth change is proportional to the square of the current depth value.

The pixel offset vector in the reference view satisfies the epipolar constraint between the target view and the reference view: $\Delta P_r^T \left(C_r \Delta t_r \times B_r\right) P = 0$. Two mutually opposite pixel offset vectors $\Delta P_r$ satisfy this constraint, corresponding to the direction of increasing depth and the direction of decreasing depth respectively. The depth change corresponding to the offset vector in the increasing-depth direction is larger than the depth change corresponding to the offset vector in the decreasing-depth direction.

In the multi-view video image depth estimation method, which uses depth-based view synthesis and block-matching depth search, the depth search range and search step size of the target view are determined by the pixel search range and pixel search precision of the reference view. Within the depth search range, the step size of each step is adjusted dynamically according to the current depth value: the smaller the current depth value, the smaller the step size, and the larger the current depth value, the larger the step size, so that the step size of every step corresponds to the same pixel search precision. Depth-based view synthesis means that, given a pixel of the target view and its depth value, the pixel is back-projected to a point in three-dimensional scene space using the intrinsic and extrinsic camera parameters of the target and reference views, and this point is then re-projected onto the image plane of the reference view, yielding the synthesized view of the target view at the reference viewpoint. The depth search based on depth-based view synthesis and block matching performs view synthesis with the current depth value and computes the error between the pixel block of the synthesized view and the pixel block of the reference view; the depth value with the smallest error is taken as the depth estimate for the target view;

The depth search step size is determined from the current depth value in the target view, the pixel offset vector in the reference view, and the intrinsic and extrinsic camera parameters of the views; the step size of each step in the target view corresponds to a pixel offset vector of equal length in the reference view;

The multi-view video image depth estimation method comprises the following steps:

Step 1: estimate the initial depth search value $z_k$, $k = 0$, in the target view;

Step 2: determine the pixel search range and pixel search precision in the reference view that correspond to the depth search, and obtain the pixel offset vector $\Delta P_r$ in the reference view from the pixel search precision;

Step 3: from the current depth value $z_k$ and the pixel offset vector $\Delta P_r$, obtain the corresponding depth change $\Delta z$, which is the search step size of the next step;

Step 4: perform view synthesis using the current depth value $z_k$, and compute the error $e_k$ between the pixel block of the synthesized view and the pixel block of the reference view; then update the current depth value, $z_k = z_k + \Delta z$, $k = k + 1$;

Step 5: determine whether the given pixel search range has been exceeded; if so, go to step 6, otherwise return to step 3;

Step 6: take as the estimate the depth value corresponding to the smallest of the errors $e_k$ ($k = 0, \ldots, N-1$, where $N$ is the total number of search steps).
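The steps above can be sketched as a short loop. This is a minimal illustration, not the patent's implementation: it assumes a rectified (parallel) camera pair, where one step of `pixel_step` pixels in the reference view corresponds to the depth change dz = z²·δ / (fB − z·δ), and it takes the matching error as a caller-supplied function. The names `adaptive_depth_search`, `cost`, `fb`, and the toy numbers are all hypothetical.

```python
def adaptive_depth_search(cost, z0, fb, pixel_step, pixel_range):
    """Search depths from z0, advancing one `pixel_step` of reference-view offset per step.

    cost(z) returns the matching error e_k at depth z (hypothetical callback).
    fb is focal length (pixels) times baseline; this form holds for rectified cameras only.
    Returns (best_depth, visited_depths).
    """
    z = z0                                    # step 1: initial depth value
    visited, errors = [], []
    n_steps = int(pixel_range / pixel_step) + 1   # step 2: pixel range and precision
    for _ in range(n_steps):                  # step 5: stop once the pixel range is covered
        visited.append(z)
        errors.append(cost(z))                # step 4: matching error at the current depth
        denom = fb - z * pixel_step
        if denom <= 0:                        # next step would be unbounded; stop searching
            break
        dz = z * z * pixel_step / denom       # step 3: depth change for one pixel_step
        z += dz                               # update the current depth value
    best = min(range(len(errors)), key=errors.__getitem__)   # step 6: smallest error
    return visited[best], visited


# Toy usage: the error is the disparity distance to a made-up true depth of 50.
FB = 2000.0
TRUE_DEPTH = 50.0
best_z, visited = adaptive_depth_search(
    lambda z: abs(FB / z - FB / TRUE_DEPTH),
    z0=20.0, fb=FB, pixel_step=1.0, pixel_range=80.0,
)
print(best_z, len(visited))
```

Each visited depth lands one pixel further along the search line than the previous one, so no pixel is skipped or visited twice in this toy setting.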

The search step size is given by:

$$\Delta z = \frac{\left(z\,b_3^T P + c_3^T \Delta t_r\right)^2 \|\Delta P_r\|^2}{\Delta P_r^T\left(b_3^T P\; C_r \Delta t_r - c_3^T \Delta t_r\; B_r P\right) - \left(z\,b_3^T P + c_3^T \Delta t_r\right)\left(b_3^T P\right)\|\Delta P_r\|^2}$$

where $P$ is the pixel in the target view whose depth is to be estimated, $z$ is the current depth value of pixel $P$, $\Delta z$ is the depth change of pixel $P$, i.e. the search step size, and $\Delta P_r$ is the pixel offset vector in reference view $r$ corresponding to the depth change $\Delta z$ of pixel $P$ in the target view, with $\|\Delta P_r\|^2 = \Delta P_r^T \Delta P_r$. $B_r = A_r R_r^{-1} R A^{-1}$ and $C_r = A_r R_r^{-1}$ are 3×3 matrices, and $\Delta t_r = t - t_r$ is a three-dimensional vector. $R$ is the three-dimensional rotation matrix of the target view's camera coordinate system relative to the world coordinate system, $t$ is the translation vector of the target view's camera coordinate system relative to the world coordinate system, and $A$ is the intrinsic parameter matrix of the target view's camera. $R_r$ is the three-dimensional rotation matrix of the reference view's camera coordinate system relative to the world coordinate system, $t_r$ is the translation vector of the reference view's camera coordinate system relative to the world coordinate system, and $A_r$ is the intrinsic parameter matrix of the reference view's camera. $b_3$ and $c_3$ are the third row vectors of $B_r$ and $C_r$ respectively. For a parallel camera system, the depth change is proportional to the square of the current depth value. The pixel offset vector in the reference view satisfies the epipolar constraint between the target view and the reference view:

$$\Delta P_r^T \left(C_r \Delta t_r \times B_r\right) P = 0.$$
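As a hedged illustration, the step-size formula can be written as a small function over plain Python lists. The function name `depth_step`, the helper names, and the toy camera values (focal length 1000 pixels, baseline 2, identity rotations) are assumptions made for this sketch; the matrices $B_r$ and $C_r$ are passed in already computed.

```python
def dot(a, b):
    """Inner product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(M, v):
    """Product of a 3x3 matrix (list of rows) and a 3-vector."""
    return [dot(row, v) for row in M]


def depth_step(z, P, dP, B, C, dt):
    """Depth change dz that moves the reference-view projection of P by dP.

    z: current depth; P: homogeneous pixel [u, v, 1]; dP: offset [du, dv, 0];
    B, C: the 3x3 matrices B_r and C_r; dt: the vector t - t_r.
    """
    beta = dot(B[2], P)        # b3^T P
    gamma = dot(C[2], dt)      # c3^T dt
    D = z * beta + gamma       # z b3^T P + c3^T dt
    Ct = matvec(C, dt)
    BP = matvec(B, P)
    w = [beta * c - gamma * b for c, b in zip(Ct, BP)]   # b3^T P C dt - c3^T dt B P
    n2 = dot(dP, dP)           # squared length of the pixel offset
    return D * D * n2 / (dot(dP, w) - D * beta * n2)


# Toy rectified pair: identical intrinsics, identity rotations, so B = I and C = A.
A = [[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
dz = depth_step(20.0, [100.0, 50.0, 1.0], [1.0, 0.0, 0.0], I3, A, [2.0, 0.0, 0.0])
print(dz)
```

In this rectified special case the general expression reduces to z²·δ / (fB − z·δ) with fB = 2000 and δ = 1, which gives a quick sanity check on the implementation.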

The beneficial effects of the invention are that depth search with the adaptive step size produces neither missed nor repeated pixel searches; in depth estimation, the absolute difference between the synthesized image block and the reference image block is small, estimation errors are few, and the amount of computation, i.e. the number of depth search steps, is low.

Description of drawings

Fig. 1 is a schematic diagram of the coordinate systems in a multi-view video system;

Fig. 2 is a schematic diagram of depth-based view synthesis;

Fig. 3(a) is the view at the initial time of the video sequence of the 7th camera in the Uli test sequence;

Fig. 3(b) is the view at the initial time of the video sequence of the 7th camera in the Uli test sequence;

Fig. 3(c) is a partial schematic diagram of Fig. 2(a); the 16 regions shown cover the image region from pixel [527, 430] to pixel [590, 493];

Fig. 4 is a schematic diagram of the relationship between the depth change and the square of the depth value;

Fig. 5 is a schematic diagram of the depth change and the pixel offset vector according to the invention;

Fig. 6 is a schematic diagram of missed pixel search when the depth value is small;

Fig. 7 is a schematic diagram of repeated pixel search when the depth value is large;

Fig. 8 is a schematic diagram of the adaptive adjustment of the depth search step size according to the invention;

Fig. 9 is a schematic diagram of the distribution of pixels searched with the adaptive variable step size of the invention;

Fig. 10 is a schematic diagram comparing the depth search performance of the fixed search step size and the adaptive step size of the invention.

Embodiment

The invention proposes an adaptive method for determining the depth search step size. Using the intrinsic and extrinsic camera parameters and the perspective projection relationship, it first derives the relationship among a pixel's depth value, the depth change, and the resulting pixel offset of the projected point in the synthesized view. Based on the derived relation between the depth change and the corresponding pixel offset, determining the depth search range is converted into determining the pixel search range; the pixel offset has an intuitive meaning in the image and is easy to choose reasonably. Then, using the relationship between pixel offset and depth value, namely that the larger the depth value, the smaller the pixel offset caused by the same depth change, the search step size is adjusted dynamically so that every step corresponds to the same pixel search precision. This avoids repeated and missed pixel searches and thereby improves search efficiency and performance. The invention also proposes a simple and effective initial depth estimation method: it solves for the convergence point of the camera optical axes in a convergent camera system and treats this point as a representative point of the scene, which gives a rough estimate of the scene depth.

Three types of coordinate system are usually needed in multi-view video to describe the scene and its image position: the world coordinate system in which the scene resides, the camera coordinate system, and the pixel coordinate system, as shown in Fig. 1. The camera coordinate system has its origin at the camera center, the optical axis as its z axis, and its xy plane parallel to the image plane; the pixel coordinate system has its origin at the top-left corner of the image, with horizontal and vertical coordinates u and v.

Let the position of the camera coordinate system $o_i$-$x_i y_i z_i$ of camera $c_i$ ($i = 1, \ldots, m$) relative to the world coordinate system $o$-$xyz$ be represented by a three-dimensional rotation matrix $R_i$ and a translation vector $t_i$, where $m$ is the number of cameras. If a point in the scene has the coordinate vector $p = [x, y, z]$ in the world coordinate system and the coordinate vector $p_i = [x_i, y_i, z_i]$ in the camera coordinate system $o_i$-$x_i y_i z_i$, then by spatial geometry and coordinate transformation:

$$p = R_i p_i + t_i \qquad (1)$$

According to the perspective projection principle of computer vision, the coordinate $p_i$ in the camera coordinate system and its homogeneous pixel coordinate $P_i = [u_i, v_i, 1]$ on the image plane satisfy:

$$z_i P_i = A_i p_i \qquad (2)$$

where $A_i$ is the intrinsic parameter matrix of camera $c_i$, which mainly comprises parameters such as the camera focal length, the image center, and the distortion coefficient.
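Formulas (1) and (2) can be exercised with a few lines of plain Python. This is only a sketch under stated assumptions: the intrinsic matrix is taken to have the usual third row [0, 0, 1], and the function names and numeric values below are invented for illustration.

```python
def dot(a, b):
    """Inner product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def world_to_camera(p, R, t):
    """Invert formula (1): p_i = R^T (p - t), using R^{-1} = R^T for a rotation matrix."""
    d = [pj - tj for pj, tj in zip(p, t)]
    # Row k of R^T is column k of R.
    return [sum(R[j][k] * d[j] for j in range(3)) for k in range(3)]


def project(A, p_cam):
    """Formula (2): z_i P_i = A_i p_i. Returns (homogeneous pixel [u, v, 1], depth z_i)."""
    q = [dot(row, p_cam) for row in A]
    return [q[0] / q[2], q[1] / q[2], 1.0], q[2]


# Made-up camera: rotated 90 degrees about the z axis and shifted along x.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t = [1.0, 0.0, 0.0]
A = [[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]]

p_world = [-1.0, 1.0, 5.0]
p_cam = world_to_camera(p_world, R, t)
pixel, depth = project(A, p_cam)
print(p_cam, pixel, depth)
```

The depth returned by `project` is the z coordinate in the camera coordinate system, which is the quantity searched over in the rest of the description.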

The invention performs block-matching depth search in depth space: using the intrinsic and extrinsic camera parameters and depth-based view synthesis, it searches within the depth search range, one search step at a time, for the depth value that minimizes the error between a pixel block of the synthesized view and the corresponding pixel block of the actual reference view, and takes this depth value as the depth estimate for the pixel of the target view. The target view and target viewpoint are the image whose depth currently needs to be estimated and its viewpoint; the reference views and reference viewpoints are the other images and viewpoints in the multi-view video system. The reference view and reference viewpoint can be selected automatically during the depth search or specified by the user.

When the depth value of a pixel in a view is given, the pixel can be back-projected into scene space using the intrinsic and extrinsic camera parameters to obtain a spatial point, and this point can then be re-projected onto the image plane of the desired viewing direction to obtain the synthesized view at that viewpoint. This is depth-based view synthesis, as shown in Fig. 2.

Consider the case of two views, with view 1 the target view and view 2 the reference view. Let pixel $P_1$ in view 1 have depth $z_1$ in the coordinate system of its camera $c_1$, and let its corresponding pixel in view 2 be $P_2$, with depth $z_2$ in the coordinate system of camera $c_2$. From formulas (1) and (2):

$$z_1 R_1 A_1^{-1} P_1 + t_1 = z_2 R_2 A_2^{-1} P_2 + t_2 \qquad (3)$$

From formula (3):

$$A_2 R_2^{-1}\left(z_1 R_1 A_1^{-1} P_1 + t_1 - t_2\right) = z_2 P_2 \qquad (4)$$

For convenience, define:

$$C = A_2 R_2^{-1}, \quad B = A_2 R_2^{-1} R_1 A_1^{-1} = C R_1 A_1^{-1}, \quad t = t_1 - t_2 \qquad (5)$$

Then formula (4) becomes:

$$z_1 B P_1 + C t = z_2 P_2 \qquad (6)$$

where $B$ and $C$ are 3×3 matrices and $t$ is the translation vector between the two cameras. Because $P_1$ and $P_2$ are homogeneous coordinates, $z_2$ can be eliminated from (6), giving the homogeneous pixel coordinates in view 2 of pixel $P_1$:

$$P_2 = \frac{z_2 P_2}{z_2} = \frac{z_1 B P_1 + C t}{z_1 b_3^T P_1 + c_3^T t} \triangleq f_2(z_1, P_1) \qquad (7)$$

where $b_3$ and $c_3$ are the third row vectors of the matrices $B$ and $C$ respectively.

Formula (7) shows that, when the intrinsic and extrinsic parameters of cameras $c_1$ and $c_2$ are known, the pixel position in view 2 is a function of the pixel in view 1 and its depth value. Formula (7) can therefore be used to synthesize view 1 at reference viewpoint 2.
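A minimal sketch of the mapping $f_2$ in formula (7) follows. It assumes the matrices $B$ and $C$ of formula (5) have already been computed; the function name `warp_pixel` and the toy rectified camera values (focal length 1000 pixels, baseline 2) are assumptions for illustration.

```python
def matvec(M, v):
    """Product of a 3x3 matrix (list of rows) and a 3-vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def warp_pixel(z, P1, B, C, t):
    """Formula (7): homogeneous pixel in view 2 of pixel P1 = [u, v, 1] at depth z."""
    BP = matvec(B, P1)
    Ct = matvec(C, t)
    num = [z * b + c for b, c in zip(BP, Ct)]   # z B P1 + C t
    # The third component of num equals z b3^T P1 + c3^T t, the denominator of (7).
    return [n / num[2] for n in num]


# Toy rectified pair: B is the identity and C equals the shared intrinsic matrix.
A = [[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2 = warp_pixel(20.0, [100.0, 50.0, 1.0], I3, A, [2.0, 0.0, 0.0])
print(P2)
```

For this rectified toy case the warp reduces to a purely horizontal shift of fB/z pixels, as expected for parallel cameras.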

For pixel $P_1$ in view 1 at a given depth $z$, back-projection and re-projection give its pixel $P_2$ in the synthesized view 2 at the viewpoint of camera $c_2$, $P_2 = f_2(z, P_1)$. Under the common computer-vision assumption that the pixels corresponding to the same scene point in views from different viewpoints have the same luminance and chrominance values, the luminance/chrominance value of pixel $P_2$ in the synthesized view 2 for pixel $P_1$ of view 1 at depth $z$ is:

$$\mathrm{Synthesized\_I}_2(P_2) = \mathrm{Synthesized\_I}_2(f_2(z, P_1)) = I_1(P_1) \qquad (8)$$

where $I_1$ is view 1, $I_2$ is view 2, and $\mathrm{Synthesized\_I}_2$ is the synthesized view 2 of view 1 at reference camera viewpoint 2. The above uses a camera system of two cameras as an example; the same principle applies to a camera system of $m$ cameras.

Suppose the pixels $P_j$ in a local window $W$ centered on pixel $P$ have the same scene depth value. Then the sum of absolute differences within window $W$ between the synthesized view 2 of view 1 at viewpoint 2 and the reference view 2 actually captured by camera 2 is:

$$\mathrm{SAD}(z, P) = \sum_{P_j \in W} \left\| \mathrm{Synthesized\_I}_2(f_2(z, P_j)) - I_2(f_2(z, P_j)) \right\| = \sum_{P_j \in W} \left\| I_1(P_j) - I_2(f_2(z, P_j)) \right\| \qquad (9)$$

Because the synthesized view 2 is computed with the camera parameters of reference view 2, the synthesized view 2 at the true scene depth should in theory have the same luminance and chrominance values as reference view 2. Solving for the depth value of view 1 at pixel $P$ can therefore be converted into the following problem:

$$\min_{z \in \{\text{depth range}\}} \mathrm{SAD}(z, P) \qquad (10)$$

That is, within the given depth search range (depth range), the depth $z$ that minimizes the absolute difference between the synthesized view and the reference view is taken as the final depth estimate.
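The minimization in (10) can be sketched on a tiny synthetic example. Everything here is assumed for illustration: a rectified camera pair (so the warp is a horizontal shift of fB/z), nearest-neighbor rounding, a made-up intensity pattern, and a hand-picked list of candidate depths rather than the adaptive grid.

```python
WIDTH, HEIGHT = 20, 5
FB = 8.0           # hypothetical focal length (pixels) times baseline
TRUE_DEPTH = 2.0   # depth used to build the toy reference view (disparity 4)


def texture(u, v):
    """Arbitrary pattern standing in for image intensity."""
    return (7 * u * u + 3 * u + 5 * v) % 31


# View 1, and view 2 built as view 1 shifted right by the true disparity.
shift = round(FB / TRUE_DEPTH)
I1 = [[texture(u, v) for u in range(WIDTH)] for v in range(HEIGHT)]
I2 = [[texture(u - shift, v) if u >= shift else 0 for u in range(WIDTH)]
      for v in range(HEIGHT)]


def f2(z, u, v):
    """Rectified special case of the warp: horizontal offset of FB / z pixels."""
    return round(u + FB / z), v


def sad(z, u0, v0, half=1):
    """Formula (9): SAD over a (2*half+1)^2 window centred on (u0, v0) in view 1."""
    total = 0
    for v in range(v0 - half, v0 + half + 1):
        for u in range(u0 - half, u0 + half + 1):
            u2, v2 = f2(z, u, v)
            total += abs(I1[v][u] - I2[v2][u2])
    return total


# Formula (10): keep the candidate depth with the smallest SAD.
candidates = [1.0, 2.0, 4.0, 8.0]
best_depth = min(candidates, key=lambda z: sad(z, 5, 2))
print(best_depth, [sad(z, 5, 2) for z in candidates])
```

The window is kept away from the image border so that every warped position stays inside the toy image; a real implementation would need explicit bounds handling.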

This method searches directly in depth space. It needs no disparity matching, image rectification is handled implicitly during the depth search, and the depth value is continuous, so its precision is not limited by image pixel resolution as a disparity vector is.

Know that from formula (7) under the known situation of camera internal and external parameter, the pixel of synthetic view 2 is the functions about pixel in the view 1 and depth value thereof.If the depth value changes delta z of the pixel P1 correspondence in the view 1, then its pixel coordinate in synthetic view 2 is:

P 2 ′ = ( z 1 + Δz ) BP 1 + Ct ( z 1 + Δz ) b 3 T P 1 + c 3 T t - - - ( 11 )

So, the pixel P in the view 1 1Depth value changes delta z cause that its pixel-shift vector in synthetic view 2 is:

ΔP = P 2 - P 2 ′ = Δu Δv 0 = z 1 BP 1 + Ct z 1 b 3 T P 1 + c 3 T t - ( z 1 + Δz ) BP 1 + Ct ( z 1 + Δz ) b 3 T P 1 + c 3 T t - - - ( 12 )

From formula (12), the relation between the depth change $\Delta z$ of a pixel in view 1 and the corresponding pixel offset vector $\Delta P$ in synthesized view 2 is:

$$\Delta z \left( b_3^T P_1\, C t - c_3^T t\, B P_1 - (z_1 b_3^T P_1 + c_3^T t)\, b_3^T P_1\, \Delta P \right) = (z_1 b_3^T P_1 + c_3^T t)^2\, \Delta P \qquad (13)$$

Left-multiplying both sides of (13) by $\Delta P^T$ gives:

$$\Delta z = \frac{(z_1 b_3^T P_1 + c_3^T t)^2\, \|\Delta P\|^2}{\Delta P^T \left( b_3^T P_1\, C t - c_3^T t\, B P_1 \right) - (z_1 b_3^T P_1 + c_3^T t)(b_3^T P_1)\, \|\Delta P\|^2} \qquad (14)$$

where $\|\Delta P\|^2 = \Delta P^T \Delta P$ is the squared norm of the pixel offset vector $\Delta P$. Hence, when the camera parameters are known, (14) gives the depth change $\Delta z$ that corresponds to pixel offset vector $\Delta P$ at depth $z_1$.
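Formula (14) can be checked numerically. The sketch below, in Python with NumPy, is only an illustration: the function names `depth_step` and `project` are hypothetical, and $P_1$ is assumed to be the homogeneous pixel $[u, v, 1]$, with $b_3^T$ and $c_3^T$ taken as the third rows of $B$ and $C$.

```python
import numpy as np

def project(z, P1, B, C, t):
    """Formula (11) with dz = 0: pixel in the synthesized view at depth z."""
    B, C = np.asarray(B, float), np.asarray(C, float)
    P1, t = np.asarray(P1, float), np.asarray(t, float)
    return (z * (B @ P1) + C @ t) / (z * (B[2] @ P1) + C[2] @ t)

def depth_step(z1, P1, dP, B, C, t):
    """Formula (14): depth change for pixel offset dP = [du, dv, 0] at depth z1."""
    B, C = np.asarray(B, float), np.asarray(C, float)
    P1, dP, t = (np.asarray(x, float) for x in (P1, dP, t))
    a = B[2] @ P1            # b3^T P1
    c = C[2] @ t             # c3^T t
    D = z1 * a + c           # z1 b3^T P1 + c3^T t
    n2 = dP @ dP             # ||dP||^2
    return D**2 * n2 / (dP @ (a * (C @ t) - c * (B @ P1)) - D * a * n2)
```

Feeding `depth_step` the offset produced by `project` for a known depth change should return that depth change, which is a convenient consistency check on an implementation.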

In addition, formula (6) shows that corresponding pixels of the two views satisfy the following epipolar constraint equations:

$$P_2^T (Ct \times B) P_1 = 0 \qquad (15)$$

$$P_2'^T (Ct \times B) P_1 = 0 \qquad (16)$$

where $\times$ denotes the vector cross product. Subtracting (16) from (15) shows that the pixel offset vector $\Delta P$ must also satisfy the epipolar constraint:

$$\Delta P^T (Ct \times B) P_1 = 0 \qquad (17)$$

Given the camera parameters and pixel $P_1$, formula (17) is a homogeneous linear equation in the two components $\Delta u$ and $\Delta v$ of the pixel offset vector $\Delta P$.
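Because (17) is a single homogeneous equation in $\Delta u$ and $\Delta v$, fixing the length of $\Delta P$ leaves exactly two opposite solutions. A minimal sketch follows; the function name `epipolar_offsets` is hypothetical, and $(Ct \times B)P_1$ is read as the cross product of the vectors $Ct$ and $BP_1$.

```python
import numpy as np

def epipolar_offsets(P1, B, C, t, length=1.0):
    """Two opposite solutions [du, dv, 0] of (17) with norm `length`."""
    B, C = np.asarray(B, float), np.asarray(C, float)
    P1, t = np.asarray(P1, float), np.asarray(t, float)
    line = np.cross(C @ t, B @ P1)          # (Ct x B) P1
    # du * line[0] + dv * line[1] = 0  ->  direction (-line[1], line[0])
    direction = np.array([-line[1], line[0], 0.0])
    # degenerate if line[0] == line[1] == 0 (no unique direction)
    direction *= length / np.linalg.norm(direction)
    return direction, -direction
```

The sketch does not decide which of the two vectors corresponds to increasing depth; that follows from the sign of the depth change obtained when each is substituted into (14).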

For a parallel camera system, the disparity $d$ of a scene point between the two views is inversely proportional to its depth, i.e.

$$d = \frac{fB}{z} \qquad (18)$$

where $d$ and $z$ are the disparity and the depth, and $f$ and $B$ are the focal length and the baseline length of the cameras. When the depth value of pixel $P_1$ in view 1 changes from $z_1$ to $z_2$, the pixel offset of its projection in synthesized view 2 is

$$\|\Delta P\| = |d_1 - d_2| = \frac{fB\,|z_1 - z_2|}{z_1 z_2} \approx \frac{fB\,|\Delta z|}{z_1^2} \qquad (19)$$

According to (19), the pixel offset is proportional to the depth change and inversely proportional to the square of the depth value. For the same pixel offset, the larger the depth value, the larger the corresponding depth change; the smaller the depth value, the smaller the corresponding depth change. For a converging camera system, formula (12) shows that a similar approximate relation holds among the depth change, the pixel offset and the depth value when the angle between the two cameras is not very large.

To verify this conclusion, the Uli test sequence shown in Fig. 3 is used. This multi-view video data is provided by the Heinrich-Hertz-Institut (HHI), Germany, and can be downloaded from https://www.3dtv-research.org/3dav_CfP_FhG_HHI/. The sequence was captured by 8 cameras in a converging arrangement, with video format 1024x768 at 25 fps; here the initial-time views of the sequences of cameras 7 and 8 are used. With the parameters of cameras 7 and 8, the relation among depth change, squared depth value and pixel offset is computed from formulas (14) and (17) at pixel P=[526, 429] of view 7 (Fig. 3(a)), which corresponds to the button on the right of the shirt collar. Given a unit pixel offset vector satisfying the epipolar constraint (17), i.e. $\|\Delta P\| = 1$, the depth changes corresponding to different depth values are computed from (14). Their relation is shown in Fig. 4, where the abscissa is the squared depth value and the ordinate is the depth change. Fig. 4 shows that, for a fixed pixel offset in the synthesized view, the depth change is approximately linear in the squared depth value. This means that, at different depth values, the same depth change of a pixel in view 1 causes different pixel offsets in the synthesized view.

Note that because (17) is a homogeneous linear equation in the pixel offset vector $\Delta P$, there are two mutually opposite solutions $\Delta P^+$ and $\Delta P^-$. Substituting them into (14) gives one positive and one negative depth change, $\Delta z^+$ and $\Delta z^-$; that is, $\Delta P^+$ and $\Delta P^-$ are the pixel offset vectors caused by an increase and a decrease of the depth value, respectively. From the preceding analysis, for a fixed pixel offset the depth change is approximately proportional to the squared depth value, so the depth changes corresponding to two pixel offset vectors of equal magnitude and opposite direction are not equal: the depth decrease $|\Delta z^-|$ is smaller than the depth increase $|\Delta z^+|$, as shown in Fig. 5. For example, for pixel P=[526, 429] in Uli view 7 (Fig. 3(a)) at depth value 3172 mm with a pixel offset of 64 pixels, i.e. $\|\Delta P\| = 64$, formulas (14) and (17) give depth changes $\Delta z^+ = 930$ and $\Delta z^- = -593$ for the two opposite pixel offset vectors.
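The asymmetry can be made explicit in the simpler parallel-camera case, using only relation (18). The derivation below is an illustrative sketch under that assumption; $\Delta$ denotes the pixel offset $\|\Delta P\|$ and $fB > z_1 \Delta$ is assumed. The Uli cameras converge, so it is not expected to reproduce the figures quoted above exactly.

$$\frac{fB}{z_1 + \Delta z^+} = \frac{fB}{z_1} - \Delta \;\Rightarrow\; \Delta z^+ = \frac{z_1^2\, \Delta}{fB - z_1 \Delta}, \qquad \frac{fB}{z_1 + \Delta z^-} = \frac{fB}{z_1} + \Delta \;\Rightarrow\; \Delta z^- = -\frac{z_1^2\, \Delta}{fB + z_1 \Delta}$$

Both expressions reduce to $z_1^2 \Delta / (fB)$ in magnitude when $z_1 \Delta \ll fB$, in agreement with (19), and the denominators show directly that $|\Delta z^-| < |\Delta z^+|$.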

From the above analysis, for the same pixel offset the depth change is approximately proportional to the square of the depth value. Consequently, with a fixed search step, if the step corresponds to an offset of 1 pixel at a small depth value, then at a large depth value the same step corresponds to less than 1 pixel. Assuming that a projection onto a non-integer position is assigned to the nearest pixel, the depth search then hits the same pixel at several different depth values, i.e. pixels are searched repeatedly. Conversely, if the step corresponds to an offset of 1 pixel at a large depth value, then at a small depth value it corresponds to more than 1 pixel, so two adjacent depth values hit two non-adjacent pixels; some pixels are skipped and the search is incomplete. Thus, although N pixels were expected to be searched within the search range $[z_{min}, z_{max}]$, repeated and missed pixels mean that fewer than N effective search points are actually visited.

For example, for pixel P=[526, 429] in Uli view 7, a depth search is carried out over the range [2000, 4500] with a fixed step of 10 mm. As shown in Fig. 6, at small depth values pixels are skipped: the pixel found at depth 2090 has u coordinate 661, while the pixel found at depth 2080 has u coordinate 663, so the pixel in between is never searched. As shown in Fig. 7, at large depth values pixels are repeated: the two different depth values 4450 and 4460 both hit the same pixel with u coordinate 437. Since a step of 10 mm corresponds to a search precision of 1 pixel in the neighbourhood of the true depth value 3170, 250 different pixels were expected to be searched in the range [2000, 4500]; because of the missed and repeated pixels, however, only 200 pixels were actually searched.
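The effect can be reproduced qualitatively with a toy model. The sketch below assumes a parallel camera pair, so that the projected column is simply the disparity $fB/z$ of (18) rounded to the nearest pixel; the function name `fixed_step_coverage` and the value of `fB` are hypothetical, chosen only so that 10 mm corresponds to about 1 pixel near depth 3170. It is not a reconstruction of the Uli experiment and will not reproduce its exact counts.

```python
def fixed_step_coverage(fB, z_min, z_max, step):
    """Count repeated and skipped pixels for a fixed depth step (parallel model)."""
    n = int(round((z_max - z_min) / step)) + 1
    depths = [z_min + k * step for k in range(n)]
    pixels = [round(fB / z) for z in depths]      # nearest pixel, d = fB / z
    distinct = len(set(pixels))
    repeated = len(pixels) - distinct             # samples that hit an already searched pixel
    skipped = (max(pixels) - min(pixels) + 1) - distinct   # pixels in range never hit
    return len(depths), distinct, repeated, skipped

# assumed fB: 3170**2 / 10, i.e. 10 mm per pixel near depth 3170
samples, distinct, repeated, skipped = fixed_step_coverage(3170.0**2 / 10.0, 2000, 4500, 10)
print(samples, distinct, repeated, skipped)
```

In this model the skipped pixels occur at the small-depth end of the range and the repeated pixels at the large-depth end, which is the behaviour described for Fig. 6 and Fig. 7.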

To make the search step correspond to the same pixel search precision in the reference view throughout the depth search, i.e. to make each step correspond to a fixed pixel offset in the reference view, the search step must be adjusted dynamically according to the relation between depth change and depth value, and the corresponding search range determined. Suppose the initial search depth of pixel $P_1$ in view 1 is $z_0$. Formula (14) then readily gives the depth change $\Delta z$ in view 1 that corresponds, at depth $z_0$, to pixel offset $\Delta P$ in reference view 2. When the initial depth $z_0$ does not differ greatly from the true depth, the offset between the true corresponding pixel of $P_1$ in reference view 2 and the pixel obtained at depth $z_0$ is usually confined to a certain range. The following describes how, within a pixel search range N, the search step is determined adaptively from the depth value so that each step always corresponds to a fixed pixel offset.

Given pixel $P_1$ and the camera parameters, the epipolar constraint (17) on the pixel offset vector readily yields the two mutually opposite offset vectors $\Delta P^+$ and $\Delta P^-$ for a pixel offset $\|\Delta P\|$. The two corresponding depth changes $\Delta z^+$ and $\Delta z^-$ are then computed from (14) and used as the search steps $\Delta z_1$ and $\Delta z_{-1}$ toward larger and smaller depth, respectively, as shown in Fig. 8:

$$z_{-1} = z_0 + \Delta z_{-1}, \qquad z_1 = z_0 + \Delta z_1 \qquad (20)$$

Next, at depth $z_{-1}$ with offset vector $\Delta P^-$, (14) gives the corresponding depth change $\Delta z_{-2}$; at depth $z_1$ with offset vector $\Delta P^+$, (14) gives the corresponding depth change $\Delta z_2$. These are taken as the next search steps, i.e.

$$z_{-2} = z_{-1} + \Delta z_{-2}, \qquad z_2 = z_1 + \Delta z_2 \qquad (21)$$

By analogy, the search depths and steps of the n-th step are:

$$z_{-n} = z_{-(n-1)} + \Delta z_{-n}, \qquad z_n = z_{n-1} + \Delta z_n \qquad (22)$$

where the number of search steps n is determined by the pixel search range N and the search precision $\Delta$, i.e. n satisfies $n\Delta \le N$.

Thus, once the search range and the initial depth $z_0$ are fixed, the above method yields a variable search step that adapts to the depth value, so that the pixel search precision stays the same throughout the depth search and the repeated or missed pixels of a fixed search step are avoided. Because the depth search range is the accumulation of the search steps, it also adapts to the depth value: as the depth value grows, the depth search range corresponding to a given pixel offset $\|\Delta P\|$ grows accordingly, and as the depth value shrinks, it shrinks accordingly. In addition, the depth search precision can be controlled conveniently through the pixel precision $\Delta$: $\Delta = 1$ corresponds to a search precision of one pixel and $\Delta = 1/2$ to half-pixel precision.
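The recursion (20) to (22) can be sketched as follows. The function names and the callable `step_fn(z, sign)` are hypothetical; in the general case `step_fn` would evaluate (14) with $\Delta P^+$ or $\Delta P^-$, while the example `parallel_step` substitutes the closed form that follows from (18) for a parallel camera pair, which is an assumption made only to keep the sketch self-contained.

```python
def adaptive_depth_samples(z0, n_steps, step_fn):
    """Depths z_{-n}, ..., z_{-1}, z_0, z_1, ..., z_n following (20)-(22)."""
    up, down = [z0], [z0]
    for _ in range(n_steps):
        up.append(up[-1] + step_fn(up[-1], +1))        # z_k = z_{k-1} + dz_k
        down.append(down[-1] + step_fn(down[-1], -1))  # z_{-k} = z_{-(k-1)} + dz_{-k}
    return down[:0:-1] + up                            # ascending order of depth

def parallel_step(fB, delta=1.0):
    """Step for a parallel pair (assumed model): one step shifts fB / z by delta pixels."""
    def step(z, sign):
        return sign * z * z * delta / (fB - sign * z * delta)
    return step

# assumed values: 32-pixel search range at 1-pixel precision around z0 = 3170
samples = adaptive_depth_samples(3170.0, 32, parallel_step(3170.0**2 / 10.0))
```

With `n_steps = 32` and `delta = 1` the list holds 65 depths, and consecutive depths differ by exactly one pixel of disparity in this model, so no pixel is repeated or skipped.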

With the relation (14) between depth change and pixel offset vector, the depth search step can therefore be determined by choosing the pixel search precision, and determining the depth search range likewise reduces to determining the corresponding pixel offset. Choosing the pixel offset and search precision is similar to choosing the search range and precision in disparity estimation: it has an intuitive meaning and is easy to do, and the depth search range and step can be determined dynamically by adjusting the pixel offset and search precision according to image content or application requirements.

Depth estimation requires an initial depth value $z_0$, and the quality of this initial value affects the performance and result of the depth search. When the deviation of $z_0$ from the true depth is small, a small pixel offset, i.e. a small search range, can be used, which reduces the amount of searching and increases search speed. When the deviation is large, a relatively large pixel offset must be used to guarantee that the true depth is found, so the computation is larger. A poor initial depth can be compensated by a large search range and a fine search step, but a good initial depth allows a small search range and a suitable step, improving the efficiency and performance of the depth search. Estimating and determining the initial depth is therefore also very important in depth estimation.

Determining the initial depth for the images of a video sequence falls into two cases: the image at the initial time and the subsequent images. For the initial-time image there are again two cases: the first pixel and the other pixels. For the first pixel, no depth search has yet been performed on any pixel, so no scene depth information is available as an initial value; an approximate scene depth must be obtained from information such as image features and camera parameters. For the other pixels, the initial depth can be determined from the depth estimates of neighbouring pixels in the image. For subsequent images, the depth values of the images of the same view are strongly correlated: the depth of the static background is unchanged and only the depth of a few moving regions changes, so the depth value at the same pixel position in the previous image can be used as the initial value. The key to determining the initial depth is therefore to obtain scene depth information for the initial-time image, so as to provide a good initial depth for the first pixel.

In multi-view video, the differences between the images of different views, or the position information of the cameras, usually contain information about scene depth. For the two cases of converging and parallel camera systems, the following describes how to obtain an initial estimate of scene depth from camera parameters or image information when no depth information is known.

The main purpose of multi-view video is to capture the same scene from several angles, so the cameras are usually placed on an arc with their optical axes converging at one point, i.e. a converging system. In practice the optical axes may not converge strictly at one point, but a point nearest to all camera optical axes can always be found, and this point is regarded as the convergence point. The convergence point usually lies where the scene is and can be regarded as a representative point of the scene, so by finding its position an approximate scene depth can be obtained and used as the initial value of the depth search.

Let the coordinates of the convergence point in the world coordinate system be $M_c = [x_c, y_c, z_c]$. This point lies on the optical axis of every camera, so in each camera coordinate system, whose z axis is the optical axis, it can be expressed as:

$$\begin{aligned} M_1 &= [0, 0, z_{r1}] \\ M_2 &= [0, 0, z_{r2}] \\ &\;\;\vdots \\ M_m &= [0, 0, z_{rm}] \end{aligned} \qquad (23)$$

where $z_{ri}$ is the depth of the convergence point in the coordinate system of camera $c_i$. From the relation between world coordinates and camera coordinates:

$$\begin{aligned} M_c &= R_1 M_1 + t_1 \\ M_c &= R_2 M_2 + t_2 \\ &\;\;\vdots \\ M_c &= R_m M_m + t_m \end{aligned} \qquad (24)$$

Eliminating $M_c$ gives:

$$\begin{aligned} R_1 [0, 0, z_{r1}] + t_1 &= R_2 [0, 0, z_{r2}] + t_2 \\ R_1 [0, 0, z_{r1}] + t_1 &= R_3 [0, 0, z_{r3}] + t_3 \\ &\;\;\vdots \\ R_1 [0, 0, z_{r1}] + t_1 &= R_m [0, 0, z_{rm}] + t_m \end{aligned} \qquad (25)$$

Formula (25) is a system of $3(m-1)$ linear equations in the depths $z_{r1}, z_{r2}, \ldots, z_{rm}$, where $t_1, \ldots, t_m$ are the translation vectors of the camera coordinate systems $C_1, \ldots, C_m$ relative to the world coordinate system, $R_1, \ldots, R_m$ are the three-dimensional rotation matrices of the camera coordinate systems $C_1, \ldots, C_m$ relative to the world coordinate system, and m is the number of cameras. Solving system (25) by linear least squares gives the depth values $z_{r1}, z_{r2}, \ldots, z_{rm}$ of the convergence point in each camera coordinate system. These are approximate values of the scene depth and can be used as initial depth values in the depth search.
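A minimal sketch of this least-squares solution is given below, using `numpy.linalg.lstsq`. The function name `convergence_depths` is hypothetical. The sketch relies on the observation that $R_i [0, 0, z_{ri}]^T$ equals $z_{ri}$ times the third column of $R_i$, so each row block of (25) reads $z_{r1} r_1 - z_{ri} r_i = t_i - t_1$.

```python
import numpy as np

def convergence_depths(Rs, ts):
    """Least-squares solution of (25): depths z_r1..z_rm of the convergence point."""
    m = len(Rs)
    # third column of R_i: direction of the optical axis of camera i in world coordinates
    axes = [np.asarray(R, float)[:, 2] for R in Rs]
    A = np.zeros((3 * (m - 1), m))
    b = np.zeros(3 * (m - 1))
    for i in range(1, m):
        rows = slice(3 * (i - 1), 3 * i)
        A[rows, 0] = axes[0]        # coefficient of z_r1
        A[rows, i] = -axes[i]       # coefficient of z_ri
        b[rows] = np.asarray(ts[i], float) - np.asarray(ts[0], float)  # t_i - t_1
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    return z
```

When the optical axes do not meet exactly, the least-squares solution gives the depths that best satisfy all equations of (25) simultaneously, which matches the role of the approximate convergence point described above.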

A parallel camera system has no convergence point, so the above method cannot be used to obtain depth information. In this case, however, disparity and depth satisfy the simple inverse relation (18), so depth information can be obtained by computing the global disparity between the two views.

The global disparity can be defined as the pixel offset that minimizes the absolute difference between the two views, i.e. it is obtained as follows:

$$g_x = \arg\min_x \left[ \frac{\sum_{(i,j) \in R} \left\| I_1(i, j) - I_2(i + x, j) \right\|}{R} \right] \qquad (26)$$

where R is the number of pixels in the overlapping region of views 1 and 2. Since the global disparity need not be estimated accurately, the search unit of the pixel offset x in formula (26) can be set fairly large, e.g. 8 or 16 pixels, which greatly reduces the computation. Once the global disparity is obtained, the initial depth follows from the inverse relation (18) between depth and disparity.
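The coarse search of (26) can be sketched as follows. The function name `global_disparity` and its parameters are hypothetical; the index $i$ of (26) is assumed to be the horizontal (column) coordinate, and the images are assumed to be grey-level arrays of equal size.

```python
import numpy as np

def global_disparity(I1, I2, max_shift, unit=8):
    """Formula (26): shift x minimising the mean absolute difference over the overlap."""
    best_x, best_cost = 0, np.inf
    w = I1.shape[1]
    for x in range(-max_shift, max_shift + 1, unit):   # coarse search unit, e.g. 8 pixels
        # columns i of I1 for which column i + x exists in I2
        lo, hi = max(0, -x), min(w, w - x)
        if hi <= lo:
            continue
        diff = I1[:, lo:hi].astype(float) - I2[:, lo + x:hi + x].astype(float)
        cost = np.abs(diff).mean()    # sum divided by R, the number of overlapping pixels
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x
```

Given the returned shift `g`, the initial depth would then follow from (18) as the product of focal length and baseline divided by `abs(g)`, provided `g` is non-zero.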

The Uli video sequence parameter file provides a scene point: the highlight on the left side of the glasses, with real world coordinates [35.07, 433.93, -1189.78] (unit: mm). The coordinates and true depth of this scene point in each camera coordinate system follow from the relation (1) between world and camera coordinates. Using the above method for finding the convergence point of two cameras, i.e. solving the linear system (25), the depth of the convergence point in the coordinate systems of camera 7 and camera 8 is obtained; the results are shown in Table 1. Visually, the depth of field of the Uli scene varies little, and the initial depth estimates in Table 1 differ little from the true depth of the scene point, which shows that the initial depth estimate is reasonable and effective and provides a good initial value for depth estimation.

(coordinate unit: mm)     Depth in camera 7 coordinate system    Depth in camera 8 coordinate system
True depth value          3076.2                                 3163.7
Initial depth estimate    2871.2                                 2955.7

Table 1

Fig. 3(c) shows the 64x64 image region of the Uli view from pixel [527,430] to [590,493]. Depth search is performed, with fixed steps and with the adaptive step, on the pixels sampled every 15 pixels in this region, 16 pixels in total. The fixed-step search is run three times over the fixed search range [2000, 5000], with steps 20, 35 and 50. For the adaptive search step, the initial depth is 2817, the pixel offset is set to 32 pixels, the search precision is 1 pixel, and the initial depth of each subsequent pixel is set to the depth estimate of a neighbouring pixel. Using the adaptive step determination method of the invention, the search steps for pixel [527,430] are obtained for each unit-pixel step within a search range of 32 pixels from the initial search pixel, as listed in Table 2; the pixels searched with these steps are shown in Fig. 9. Table 2 shows that the steps in the depth-decreasing direction are negative, and that their absolute value decreases gradually as the pixel offset increases and the depth decreases, whereas the steps in the depth-increasing direction are positive and their absolute value increases gradually as the pixel offset and the depth increase. Fig. 9 shows that when the depth search uses the variable steps of Table 2, the corresponding pixel search precision remains constant at 1 pixel.

Pixel offset   Step (depth increasing)   Step (depth decreasing)   Pixel offset   Step (depth increasing)   Step (depth decreasing)
1              11.4877                   -11.2503                  17             12.8909                   -10.1000
2              11.5686                   -11.1728                  18             12.9870                   -10.0340
3              11.6502                   -11.0960                  19             13.0842                   -9.9687
4              11.7328                   -11.0201                  20             13.1824                   -9.9041
5              11.8162                   -10.9450                  21             13.2818                   -9.8400
6              11.9005                   -10.8706                  22             13.3823                   -9.7766
7              11.9858                   -10.7969                  23             13.4840                   -9.7138
8              12.0719                   -10.7240                  24             13.5868                   -9.6515
9              12.1590                   -10.6519                  25             13.6908                   -9.5899
10             12.2470                   -10.5805                  26             13.7961                   -9.5289
11             12.3360                   -10.5098                  27             13.9025                   -9.4684
12             12.4260                   -10.4397                  28             14.0101                   -9.4086
13             12.5169                   -10.3704                  29             14.1191                   -9.3493
14             12.6089                   -10.3018                  30             14.2293                   -9.2905
15             12.7018                   -10.2339                  31             14.3407                   -9.2323
16             12.7958                   -10.1666                  32             14.4535                   -9.1746

Table 2

Depth estimation is performed with the block-matching-based method, and the depth search results are shown in Fig. 10. Each point in the figure represents the absolute difference between the synthesized block at the depth found by the search and the actual block; a smaller value generally indicates a more accurate depth estimate. With a fixed search step, a smaller step means a higher search precision and hence a better depth estimate: the absolute difference at the depth found with step 20 mm is smaller than that with step 35 mm, which in turn is smaller than that with step 50 mm. The depth found with the adaptive search step is nevertheless the best, with the smallest absolute difference.

Depth estimation is performed on the 16 pixels of the image region from pixel [527,430] to [590,493] shown in Fig. 3(c), using the adaptive search step and the fixed steps 20, 35 and 50. Table 3 lists the depth estimates, the numbers of depth searches and the numbers of erroneous estimates for the adaptive and fixed search steps; boxed entries in the table are erroneous. Table 3 shows that with the adaptive step all 16 pixels are assigned the correct depth value, whereas the fixed steps produce erroneous depth estimates. This is because the affected pixels lie in a texture-poor region, and within the large fixed search range the point of minimum absolute difference does not correspond to the correct pixel. With the adaptive search step, the initial value is determined from neighbouring pixels, so the pixel offset can be set small; the search is confined to a relatively small local range, which lowers the probability of hitting a wrong pixel and ensures a degree of depth smoothness. The results in Table 3 show that the adaptive-step search needs fewer searches and yields no erroneous estimates, while the fixed-step search needs more searches and still yields erroneous estimates. For example, the adaptive depth search with a 32-pixel offset only needs to search 64 depth values, whereas the fixed 20 mm step needs to search 150 depth values in the range [2000, 5000].

Table 3

From the results of Table 3 and Fig. 10 it is concluded that the adaptive search step outperforms the fixed search step in depth search: the absolute difference between the image block synthesized with the estimated depth and the reference image block is small, erroneous estimates are few, and the computation, i.e. the number of depth searches, is low.

Claims (17)

1. A depth search method for multi-view video images, characterized in that the search step of each step within the depth search range is adjusted dynamically according to the current depth value: the smaller the current depth value, the smaller the search step used, and the larger the current depth value, the larger the search step used, so that the search step of every step corresponds to the same pixel search precision; the pixel search precision equals the length of the pixel offset vector of each search step, and is a sub-pixel precision or an integer-pixel precision; the search step equals the depth change corresponding to the pixel offset vector of each search step.
2. The depth search method for multi-view video images according to claim 1, characterized in that, according to the relation between depth change and pixel offset vector, the determination of the depth search range and the search step is converted into the determination of a pixel search range and the pixel search precision.
3. The depth search method for multi-view video images according to claim 2, characterized in that the search step is determined from the current depth value, the pixel offset vector and the intrinsic and extrinsic camera parameters, according to the perspective projection principle of computer vision and the principle of depth-based view synthesis.
4. The depth search method for multi-view video images according to claim 3, characterized in that the search step is obtained from the following formula:
$$\Delta z = \frac{(z\, b_3^T P + c_3^T \Delta t_r)^2\, \|\Delta P_r\|^2}{\Delta P_r^T \left( b_3^T P\, C_r \Delta t_r - c_3^T \Delta t_r\, B_r P \right) - (z\, b_3^T P + c_3^T \Delta t_r)(b_3^T P)\, \|\Delta P_r\|^2}$$
where: $P$ is the pixel of the target view whose depth is to be estimated; $z$ is the current depth value of pixel $P$; $\Delta z$ is the depth change of pixel $P$, i.e. the search step; $\Delta P_r$ is the pixel offset vector in reference view r corresponding to the depth change $\Delta z$ of pixel $P$ in the target view, with $\|\Delta P_r\|^2 = \Delta P_r^T \Delta P_r$; $B_r = A_r R_r^{-1} R A^{-1}$ and $C_r = A_r R_r^{-1}$ are 3×3 matrices, and $\Delta t_r = t - t_r$ is a three-dimensional vector; $R$ is the three-dimensional rotation matrix of the camera coordinate system of the target viewpoint relative to the world coordinate system; $t$ is the translation vector of the camera coordinate system of the target viewpoint relative to the world coordinate system; $A$ is the camera intrinsic parameter matrix of the target viewpoint; $R_r$ is the three-dimensional rotation matrix of the camera coordinate system of the reference viewpoint relative to the world coordinate system; $t_r$ is the translation vector of the camera coordinate system of the reference viewpoint relative to the world coordinate system; $A_r$ is the camera intrinsic parameter matrix of the reference viewpoint; $b_3$ and $c_3$ are the third row vectors of the matrices $B_r$ and $C_r$, respectively.
5. The depth search method for multi-view video images according to claim 4, characterized in that the pixel offset vector in the reference view satisfies the epipolar constraint equation of the target viewpoint and the reference viewpoint: $\Delta P_r^T (C_r \Delta t_r \times B_r) P = 0$.
6. The depth search method for multi-view video images according to claim 5, characterized in that two mutually opposite pixel offset vectors satisfy the epipolar constraint equation, the two pixel offset vectors corresponding to the depth-increasing direction and the depth-decreasing direction, respectively; the depth change corresponding to the offset vector of the depth-increasing direction is greater than the depth change corresponding to the offset vector of the depth-decreasing direction.
7. The depth search method for multi-view video images according to claim 4, characterized in that, in a parallel camera system, the depth change is proportional to the square of the current depth value.
8. A depth estimation method for multi-view video images, characterized in that, in a depth search using depth-based view synthesis and block matching, the depth search range and the search step of the target view are determined by the pixel search range and the pixel search precision of the reference view; within the depth search range, the search step of each step is adjusted dynamically according to the current depth value: the smaller the current depth value, the smaller the search step used, and the larger the current depth value, the larger the search step used, so that the search step of every step corresponds to the same pixel search precision; the depth search using depth-based view synthesis and block matching specifically comprises performing view synthesis with the current depth value, computing the error between the pixel block of the synthesized view and the pixel block of the reference view, and taking the depth value with the least error as the depth estimate of the target view; the pixel search precision equals the length of the pixel offset vector of each search step, and the search step equals the depth change corresponding to the pixel offset vector of each search step.
9. The depth estimation method for multi-view video images according to claim 8, characterized by comprising the following steps:
Step 1: estimate the depth search initial value $z_{k=0}$ in the target view;
Step 2: determine the pixel search range and the pixel search precision in the reference view that correspond to the depth search, and obtain the pixel offset vector $\Delta P_r$ in the reference view from the pixel search precision;
Step 3: from the current depth value $z_k$ and the pixel offset vector $\Delta P_r$, obtain the corresponding depth change $\Delta z_k$, the depth change $\Delta z_k$ being the search step of the next step;
Step 4: perform view synthesis with the current depth value $z_k$, and compute the error $e_k$ between the pixel block of the synthesized view and the pixel block of the reference view;
Step 4: update the current depth value $z_k = z_k + \Delta z_k$; $k = k + 1$;
Step 5: judge whether the given pixel search range has been exceeded; if so, go to step 6; if not, go to step 3;
Step 6: take as the estimate the depth value corresponding to the least of the errors $e_k$, $k = 0, \ldots, N-1$, where N is the total number of search steps.
10. The depth estimation method for multi-view video images according to claim 9, characterized in that the error $e_k$ is the absolute difference or the squared difference between the pixel block of the synthesized view and the pixel block of the reference view.
11. The depth estimation method for multi-view video images according to claim 9, characterized in that, for a converging camera system, in step 1 the depth at the convergence point of the converging camera system is used as the depth search initial value $z_0$ in the target view.
12. The depth estimation method for multi-view video images according to claim 11, characterized in that the convergence point of the converging camera system is obtained by solving the following system of linear equations:
$R[0,0,z_0] + t = R_1[0,0,z_{r1}] + t_1$
$R[0,0,z_0] + t = R_2[0,0,z_{r2}] + t_2$
……
$R[0,0,z_0] + t = R_m[0,0,z_{rm}] + t_m$
where $z_0$ is the depth value of the convergence point in the camera coordinate system of the target view; $t$ is the translation vector of the camera coordinate system of the target view relative to the world coordinate system; $R$ is the three-dimensional rotation matrix of the camera coordinate system of the target view relative to the world coordinate system; $z_{ri}$, $i = 1, \ldots, m$, is the depth value of the convergence point in the camera coordinate system of reference view $i$; $t_i$, $i = 1, \ldots, m$, is the translation vector of the camera coordinate system of reference view $i$ relative to the world coordinate system; $R_i$, $i = 1, \ldots, m$, is the three-dimensional rotation matrix of the camera coordinate system of reference view $i$ relative to the world coordinate system; and $m$ is the number of reference views.
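One possible way to solve this system numerically is a least-squares fit, sketched below under stated assumptions: $[0,0,z]$ is read as a column vector, so that $R[0,0,z]$ equals $z$ times the third column of $R$, and the overdetermined system is solved through its normal equations. The helper names and the two-camera example are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def convergence_depths(R, t, refs):
    """Least-squares solution [z0, z_r1, ..., z_rm]; refs is a list of (R_i, t_i)."""
    m = len(refs)
    rows, rhs = [], []
    for i, (R_i, t_i) in enumerate(refs):
        for axis in range(3):
            row = [0.0] * (m + 1)
            row[0] = R[axis][2]           # coefficient of z0
            row[i + 1] = -R_i[axis][2]    # coefficient of z_ri
            rows.append(row)
            rhs.append(t_i[axis] - t[axis])
    # Normal equations (A^T A) x = A^T b of the overdetermined system.
    n = m + 1
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(n)] for p in range(n)]
    atb = [sum(r[p] * v for r, v in zip(rows, rhs)) for p in range(n)]
    return solve_linear(ata, atb)


# Assumed example: target camera at the origin looking along +z, one reference
# camera at x = 1 rotated about the y axis so that it looks at the point (0, 0, 5).
theta = math.atan2(-1.0, 5.0)
R_target = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R_ref = [[math.cos(theta), 0.0, math.sin(theta)],
         [0.0, 1.0, 0.0],
         [-math.sin(theta), 0.0, math.cos(theta)]]
z0, z_r1 = convergence_depths(R_target, [0.0, 0.0, 0.0], [(R_ref, [1.0, 0.0, 0.0])])
```

With noisy calibration data the optical axes generally do not meet in a single point, which is why a least-squares solution is used in this sketch rather than an exact solve.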
13. The depth estimation method for multi-view video images according to claim 9, characterized in that, for a parallel camera system, in said step 1 the initial depth search value $z_0$ is obtained from the inverse proportionality between global disparity and depth: $z_0 = \frac{fB}{d}$, where $z_0$ is the initial depth value, $d$ is the global disparity, $f$ is the focal length of the camera, and $B$ is the baseline length of the cameras.
14. The depth estimation method for multi-view video images according to claim 13, characterized in that the global disparity is the pixel offset vector for which the absolute difference between the translated reference view and the target view is smallest.
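A minimal sketch of the parallel-camera initialisation follows. It assumes grey-value images stored as lists of rows, a purely horizontal and non-negative shift, and a mean absolute difference taken over the overlapping columns; these choices, the shift direction, and the function names are assumptions for illustration. The largest shift tried must be smaller than the image width.

```python
def global_disparity(target, reference, max_shift):
    """Return the horizontal shift of the reference view with the smallest
    mean absolute difference to the target view over the overlapping columns."""
    width = len(target[0])
    best_shift, best_cost = 0, float("inf")
    for shift in range(max_shift + 1):
        total, count = 0, 0
        for row_t, row_r in zip(target, reference):
            for x in range(width - shift):
                total += abs(row_t[x] - row_r[x + shift])
                count += 1
        cost = total / count
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift


def initial_depth(focal_length, baseline, disparity):
    """Initial depth z0 = f * B / d; the disparity must be non-zero."""
    return focal_length * baseline / disparity


target_view = [[0, 0, 9, 0, 0, 0]]
reference_view = [[0, 0, 0, 0, 9, 0]]
d = global_disparity(target_view, reference_view, max_shift=3)
z0 = initial_depth(focal_length=1000.0, baseline=0.1, disparity=d)
```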
15. The depth estimation method for multi-view video images according to claim 9, characterized in that the depth change value $\Delta z_k$ is obtained from the following formula:
$$\Delta z_k = \frac{(z\, b_3^T P + c_3^T \Delta t_r)^2 \, \|\Delta P_r\|^2}{\Delta P_r^T \left( b_3^T P \, C_r \Delta t_r - c_3^T \Delta t_r \, B_r P \right) - (z\, b_3^T P + c_3^T \Delta t_r)(b_3^T P)\, \|\Delta P_r\|^2}$$
where: $P$ is the pixel in the target view whose depth is to be estimated; $z$ is the current depth value of pixel $P$; $\Delta z_k$ is the depth change value of pixel $P$, that is, the search step size; $\Delta P_r$ is the pixel offset vector in reference view $r$ that corresponds to the depth change value $\Delta z_k$ of pixel $P$ in the target view; $\|\Delta P_r\|^2 = \Delta P_r^T \Delta P_r$; $B_r = A_r R_r^{-1} R A^{-1}$ and $C_r = A_r R_r^{-1}$ are 3 × 3 matrices; and $\Delta t_r = t - t_r$ is a three-dimensional vector. Here $R$ is the three-dimensional rotation matrix of the camera coordinate system of the target view relative to the world coordinate system; $t$ is the translation vector of the camera coordinate system of the target view relative to the world coordinate system; $A$ is the intrinsic camera parameter matrix of the target view; $R_r$ is the three-dimensional rotation matrix of the camera coordinate system of the reference view relative to the world coordinate system; $t_r$ is the translation vector of the camera coordinate system of the reference view relative to the world coordinate system; $A_r$ is the intrinsic camera parameter matrix of the reference view; and $b_3$ and $c_3$ are the third row vectors of the matrices $B_r$ and $C_r$, respectively.
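The formula can be transcribed directly, as in the sketch below. The projection $P_r(z) = (z B_r P + C_r \Delta t_r) / (z\, b_3^T P + c_3^T \Delta t_r)$ used for the check follows from the definitions of $B_r$, $C_r$ and $\Delta t_r$. In this sketch the formula as printed recovers the depth change exactly when the offset is taken as $\Delta P_r = P_r(z) - P_r(z + \Delta z)$; that sign convention is an assumption inferred from the algebra, not something the claim states. The matrices and vectors in the usage part are arbitrary illustrative values, not calibration data.

```python
def mat_vec(m, v):
    """Product of a 3 x 3 matrix (list of rows) and a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(z, P, B_r, C_r, dt_r):
    """Homogeneous pixel (w = 1) in reference view r for target pixel P at depth z."""
    bp, ct = mat_vec(B_r, P), mat_vec(C_r, dt_r)
    n = [z * bp[i] + ct[i] for i in range(3)]
    return [n[0] / n[2], n[1] / n[2], 1.0]


def depth_step(z, P, dP_r, B_r, C_r, dt_r):
    """Depth change for the pixel offset dP_r, transcribing the formula term by term."""
    bp, ct = mat_vec(B_r, P), mat_vec(C_r, dt_r)
    b = dot(B_r[2], P)       # b_3^T P
    c = dot(C_r[2], dt_r)    # c_3^T (delta t_r)
    d = z * b + c            # z b_3^T P + c_3^T (delta t_r)
    norm2 = dot(dP_r, dP_r)  # squared length of the pixel offset
    v = [b * ct[i] - c * bp[i] for i in range(3)]
    return d * d * norm2 / (dot(dP_r, v) - d * b * norm2)


# Arbitrary illustrative values (assumptions).
B_r = [[1.0, 0.0, 20.0], [0.0, 1.0, -5.0], [0.001, 0.0, 0.99]]
C_r = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
dt_r = [-0.2, 0.01, 0.05]
P = [100.0, 50.0, 1.0]
z, dz_true = 3.0, 0.25

p_now = project(z, P, B_r, C_r, dt_r)
p_next = project(z + dz_true, P, B_r, C_r, dt_r)
dP_r = [p_now[i] - p_next[i] for i in range(3)]   # assumed sign convention
dz = depth_step(z, P, dP_r, B_r, C_r, dt_r)
```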
16. The depth estimation method for multi-view video images according to claim 15, characterized in that the pixel offset vector $\Delta P_r$ in the reference view satisfies the epipolar constraint equation of the target view and the reference view:
$$\Delta P_r^T \, (C_r \Delta t_r \times B_r) \, P = 0.$$
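The constraint can be checked numerically, as sketched below. The sketch assumes that $(C_r \Delta t_r \times B_r) P$ means the cross product of $C_r \Delta t_r$ with $B_r P$, and it reuses the projection $P_r(z) = (z B_r P + C_r \Delta t_r) / (z\, b_3^T P + c_3^T \Delta t_r)$ that follows from the definitions in claim 15. All numeric values and function names are illustrative assumptions.

```python
def mat_vec(m, v):
    """Product of a 3 x 3 matrix (list of rows) and a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def project(z, P, B_r, C_r, dt_r):
    """Homogeneous pixel (w = 1) in reference view r for target pixel P at depth z."""
    bp, ct = mat_vec(B_r, P), mat_vec(C_r, dt_r)
    n = [z * bp[i] + ct[i] for i in range(3)]
    return [n[0] / n[2], n[1] / n[2], 1.0]


def epipolar_residual(dP_r, P, B_r, C_r, dt_r):
    """Left-hand side of the constraint; zero for offsets along the epipolar line."""
    return dot(dP_r, cross(mat_vec(C_r, dt_r), mat_vec(B_r, P)))


# Arbitrary illustrative values (assumptions).
B_r = [[1.0, 0.0, 20.0], [0.0, 1.0, -5.0], [0.001, 0.0, 0.99]]
C_r = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
dt_r = [-0.2, 0.01, 0.05]
P = [100.0, 50.0, 1.0]

p_a = project(3.0, P, B_r, C_r, dt_r)
p_b = project(3.25, P, B_r, C_r, dt_r)
along = [p_b[i] - p_a[i] for i in range(3)]   # offset caused by a depth change
across = [-along[1], along[0], 0.0]           # offset perpendicular to it
res_along = epipolar_residual(along, P, B_r, C_r, dt_r)
res_across = epipolar_residual(across, P, B_r, C_r, dt_r)
```

An offset produced by changing the depth gives a residual that is zero up to rounding, while the perpendicular offset does not.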
17. The depth estimation method for multi-view video images according to claim 16, characterized in that there exist two mutually opposite pixel offset vectors that satisfy the epipolar constraint equation, the two pixel offset vectors corresponding respectively to the direction of increasing depth value and the direction of decreasing depth value; for these two mutually opposite pixel offset vectors, the depth change value corresponding to the offset vector in the direction of increasing depth is greater than the depth change value corresponding to the offset vector in the direction of decreasing depth.
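The asymmetry can be illustrated with the simplest case, a parallel camera pair in which the disparity equals $fB/z$ (an assumption made for this sketch; the claim itself covers the general geometry). Moving the matched pixel by the same amount in the two opposite directions then yields depth steps of different size. The function name and numeric values are hypothetical.

```python
def depth_steps_parallel(z, focal_times_baseline, pixel_step):
    """Return (increase, decrease): magnitudes of the depth change when the
    disparity f*B/z shrinks or grows by pixel_step pixels.

    Requires focal_times_baseline > pixel_step * z, i.e. a current disparity
    larger than pixel_step.
    """
    fb = focal_times_baseline
    increase = pixel_step * z * z / (fb - pixel_step * z)   # disparity shrinks
    decrease = pixel_step * z * z / (fb + pixel_step * z)   # disparity grows
    return increase, decrease


inc, dec = depth_steps_parallel(z=2.0, focal_times_baseline=100.0, pixel_step=1.0)
```

In this example the current disparity is 50 pixels; one pixel less corresponds to a depth of 100/49, one pixel more to a depth of 100/51, so the step toward larger depth is the longer of the two.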
CN200810300330A 2008-02-03 2008-02-03 Multi-visual angle video image depth detecting method and depth estimating method CN100592338C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810300330A CN100592338C (en) 2008-02-03 2008-02-03 Multi-visual angle video image depth detecting method and depth estimating method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200810300330A CN100592338C (en) 2008-02-03 2008-02-03 Multi-visual angle video image depth detecting method and depth estimating method
PCT/CN2008/072141 WO2009097714A1 (en) 2008-02-03 2008-08-26 Depth searching method and depth estimating method for multi-viewing angle video image

Publications (2)

Publication Number Publication Date
CN101231754A CN101231754A (en) 2008-07-30
CN100592338C true CN100592338C (en) 2010-02-24

Family

ID=39898199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810300330A CN100592338C (en) 2008-02-03 2008-02-03 Multi-visual angle video image depth detecting method and depth estimating method

Country Status (2)

Country Link
CN (1) CN100592338C (en)
WO (1) WO2009097714A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9524556B2 (en) 2014-05-20 2016-12-20 Nokia Technologies Oy Method, apparatus and computer program product for depth estimation

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100592338C (en) * 2008-02-03 2010-02-24 四川虹微技术有限公司 Multi-visual angle video image depth detecting method and depth estimating method
JP5713624B2 (en) * 2009-11-12 2015-05-07 キヤノン株式会社 3D measurement method
CN101710423B (en) * 2009-12-07 2012-01-04 青岛海信网络科技股份有限公司 Matching search method for stereo image
KR101640404B1 (en) * 2010-09-20 2016-07-18 엘지전자 주식회사 Mobile terminal and operation control method thereof
JP6565188B2 (en) 2014-02-28 2019-08-28 株式会社リコー Parallax value deriving apparatus, device control system, moving body, robot, parallax value deriving method, and program
TWI528783B (en) * 2014-07-21 2016-04-01 由田新技股份有限公司 Methods and systems for generating depth images and related computer products

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0871144A2 (en) * 1997-04-11 1998-10-14 Nec Corporation Maximum flow method for stereo correspondence
CN1522542A (en) * 2001-07-06 2004-08-18 皇家菲利浦电子有限公司 Methods of and units for motion or depth estimation and image processing apparatus provided with such motion estimation unit
CN1851752A (en) * 2006-03-30 2006-10-25 东南大学 Dual video camera calibrating method for three-dimensional reconfiguration system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6384859B1 (en) * 1995-03-29 2002-05-07 Sanyo Electric Co., Ltd. Methods for creating an image for a three-dimensional display, for calculating depth information and for image processing using the depth information
US6606406B1 (en) * 2000-05-04 2003-08-12 Microsoft Corporation System and method for progressive stereo matching of digital images
CN100592338C (en) * 2008-02-03 2010-02-24 四川虹微技术有限公司 Multi-visual angle video image depth detecting method and depth estimating method

Also Published As

Publication number Publication date
CN101231754A (en) 2008-07-30
WO2009097714A1 (en) 2009-08-13

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100224

Termination date: 20160203