CN101231754A - Multi-view video image depth search method and depth estimation method - Google Patents


Info

Publication number
CN101231754A
CN101231754A, CNA2008103003307A, CN200810300330A
Authority
CN
China
Prior art keywords: search, pixel, depth, depth value, view
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2008103003307A
Other languages
Chinese (zh)
Other versions
CN100592338C (en)
Inventor
Zhang Xiaoyun (张小云)
George L. Yang (乔治·L·杨)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Hongwei Technology Co Ltd
Original Assignee
Sichuan Hongwei Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Hongwei Technology Co Ltd filed Critical Sichuan Hongwei Technology Co Ltd
Priority to CN200810300330A (granted as CN100592338C)
Publication of CN101231754A
Priority to PCT/CN2008/072141 (WO2009097714A1)
Application granted
Publication of CN100592338C
Status: Expired - Fee Related
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/50: Depth or shape recovery
    • G06T7/55: Depth or shape recovery from multiple images
    • G06T7/593: Depth or shape recovery from multiple images from stereo images
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10004: Still image; Photographic image
    • G06T2207/10012: Stereo images

Abstract

The invention relates to multi-view video image processing, and provides an adaptive method for determining the depth search step size together with a depth estimation method based on the adaptive step size. In the multi-view video image depth search method, the step size of each step is dynamically adjusted within the depth search range according to the current depth value, so that every step corresponds to the same pixel search precision. In the multi-view video image depth estimation method, which combines depth-based view synthesis with block-matching depth search, the step size of each step is likewise adjusted dynamically according to the current depth value within the depth search range. The proposed technical solution is applicable to multi-view video depth search and depth estimation. Its depth search performance is higher than that of a fixed search step: the absolute difference between synthesized image blocks and reference image blocks is small, mis-estimates are few, and the computation load, i.e. the number of depth searches, is low.

Description

Multi-view video image depth search method and depth estimation method
Technical field
The present invention relates to multi-view video image processing technology.
Background art
In recent years, researchers have gradually recognized that future advanced three-dimensional television and free viewpoint video (FVV, Free Viewpoint Video) systems should use computer vision, video processing, and depth-image-based scene synthesis to decouple acquisition from display, i.e. the viewing angle should not be restricted to the camera positions used to capture the video, thereby providing a high degree of flexibility, interactivity, and operability. The European 3D television project adopted the video-plus-depth data format (C. Fehn, "Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV," in Proc. SPIE Conf. Stereoscopic Displays and Virtual Reality Systems XI, vol. 5291, CA, USA, Jan. 2004, pp. 93-104), in which each pixel of the image has a corresponding depth value. Using depth-image-based rendering (DIBR), the decoder at the receiving end generates a stereo image pair according to the display setup and the viewing angle, so that the viewing angle is not restricted to the camera positions used to capture the video. An April 2007 JVT meeting proposal (A. Smolic and K. Mueller, et al., "Multi-View Video plus Depth (MVD) Format for Advanced 3D Video Systems," ISO/IEC JTC1/SC29/WG11, Doc. JVT-W100, San Jose, USA, April 2007) generalized video-plus-depth to multi-view video, proposing the multi-view coded data format MVD (Multi-view Video plus Depth). Because MVD satisfies an essential requirement of advanced 3D video and free viewpoint video applications, namely that the decoder can generate views at continuously varying viewpoints within a certain range rather than at a limited number of discrete viewpoints, the MVD scheme was adopted by JVT and identified as the direction of future development.
Thus, how to obtain the depth information of a scene from two or more views taken at different viewing angles has become one of the major issues in multi-view video processing.
The current depth search approach uses a fixed search step (a uniform depth grid) within a fixed search range. With a fixed step, if the step size is chosen to correspond to an offset of one pixel at a small depth value, then at large depth values the pixel offset corresponding to that step will be less than one pixel. If, when a point projects to a non-integer pixel position under a given depth value, the nearest pixel is taken as the projection point, the depth search will then find the same pixel at several different depth values, i.e. repeated searching occurs. Conversely, if the step size corresponds to an offset of one pixel at a large depth value, then at small depth values the corresponding pixel offset will exceed one pixel: two adjacent depth values will find two non-adjacent pixels, so some pixels are skipped and the search is incomplete. As a result, although one expects to search N pixels within the range [z_min, z_max], the repeated or missed searches mean that fewer than N effective search points are actually visited. To guarantee that the search range covers all possible values of the true scene depth, the range is usually set large, and to guarantee a certain search precision the step size is set small; this greatly increases the number of searches and the corresponding computation, and because of the missed and repeated searches the search results are still poor.
To date there has been a great deal of research on depth estimation algorithms, but most of them first perform disparity estimation on rectified, parallel stereo image pairs and then compute depth from the relation between disparity and depth. For example, in a parallel camera system there is only horizontal disparity between the two images; disparity is first estimated using feature-based or block-matching methods, and depth is then computed from the inverse proportionality between depth and disparity. For a non-parallel camera system, a series of operations such as image rectification, disparity matching, depth computation, and de-rectification is needed to obtain the depth map of the original view. In essence, such depth estimation is disparity estimation, and its performance is mainly determined by the disparity estimation algorithm. As is well known, disparity estimation, or stereo matching, is a classical problem in computer vision; despite numerous studies and results, the matching ambiguity and uncertainty caused by lack of texture or by occlusion keep disparity matching a research focus and a difficulty in computer vision.
In 2006, a JVT meeting proposal (S. Yea, J. Oh, S. Ince, E. Martinian and A. Vetro, "Report on Core Experiment CE3 of Multiview Coding," ISO/IEC JTC1/SC29/WG11, Doc. JVT-T123, Klagenfurt, Austria, July 2006) proposed using the camera intrinsic and extrinsic parameters and depth-based view synthesis to search, with a given step size within a specified depth range, for the depth that minimizes the error between the synthesized view and the actual view, taking that depth as the estimate. M. Okutomi et al. proposed the multiple-baseline stereo matching method (M. Okutomi and T. Kanade, "A multiple-baseline stereo," IEEE Trans. on Pattern Analysis and Machine Intelligence 15(4): 353-363, 1993), which uses the inverse relation between depth and disparity to convert disparity estimation into a depth solving problem, eliminating a hard uncertainty in disparity matching. N. Kim et al. proposed performing depth search, matching, and view synthesis directly in range/depth space (N. Kim, M. Trivedi and H. Ishiguro, "Generalized multiple baseline stereo and direct view synthesis using range-space search, match, and render," International Journal of Computer Vision 47(1/2/3): 131-148, 2002): the depth search is carried out directly in depth space, no disparity matching is needed, image rectification is accomplished within the depth search itself, and the depth value is continuous, so its precision is not limited by the image pixel resolution as a disparity vector is. In an actual solution, however, a depth search range and step size must be specified and an optimum sought under some cost function, and whether the values of the range and step are suitable is crucial to the estimation performance.
In disparity matching, the disparity search range is usually determined intuitively from image properties. In depth search, and particularly in a non-parallel camera system, the relation between depth change and image pixel offset is not apparent, so the search range is difficult to determine reasonably.
Thus, how to determine a suitable depth search interval and step size for given multi-view views becomes the key to effective depth estimation.
JVT-W059 (S. Yea and A. Vetro, "Report of CE6 on View Synthesis Prediction," ISO/IEC JTC1/SC29/WG11, Doc. JVT-W059, San Jose, USA, April 2007) proposed using matched feature-point pairs between two views: from several candidate groups of depth search minimum, maximum, and step size, the group that minimizes the error between the matched feature points is chosen as the depth search range and step size. This method relies on the KLT (Kanade-Lucas-Tomasi) algorithm (C. Tomasi and T. Kanade, "Detection and tracking of point features," Technical Report CMU-CS-91-132, Carnegie Mellon University, 1991) for feature extraction and matching, so its performance depends on the correctness of the feature matching.
M. Okutomi et al. and N. Kim et al. mention using, as the step size, the depth change corresponding to a one-pixel offset in the reference view with the longest baseline, thereby guaranteeing that the pixel offset in every other reference view is less than one pixel.
Both of the above approaches use a fixed search step; neither adapts the step size to the image content or the scene.
Summary of the invention
The technical problem to be solved by this invention is to propose an adaptive method for determining the search step size that avoids repeated or missed pixel searches. In addition, the invention proposes a depth estimation method based on the adaptive search step.
The technical solution adopted by the present invention to solve the above technical problem is a multi-view video image depth search method, characterized in that within the depth search range the step size of each step is dynamically adjusted according to the current depth value: the smaller the current depth value, the smaller the step size; the larger the current depth value, the larger the step size, so that every step corresponds to the same pixel search precision.
According to the relation between the depth change and the pixel offset vector, the determination of the depth search range and step size is converted into the determination of a pixel search range and a pixel search precision. The pixel search precision equals the length of the pixel offset vector of each search step; it can be sub-pixel precision, such as half a pixel or a quarter pixel, or whole-pixel precision, such as one or two pixels. The search step size equals the depth change corresponding to the pixel offset vector of each search step.
The search step size of the target view is determined by the current depth value, the pixel offset vector in the reference view, and the camera intrinsic and extrinsic parameters of the views; each step in the target view corresponds in the reference view to a pixel offset vector of equal length. The target view is the image whose depth currently needs to be estimated, and the reference views are the other images in the multi-view video system. The reference view can be selected automatically during the depth search or specified by the user.
The search step size is obtained from the following formula:
$$\Delta z = \frac{(z\,b_3^T P + c_3^T \Delta t_r)^2\,\|\Delta P_r\|^2}{\Delta P_r^T\,(b_3^T P\;C_r \Delta t_r - c_3^T \Delta t_r\;B_r P) - (z\,b_3^T P + c_3^T \Delta t_r)(b_3^T P)\,\|\Delta P_r\|^2}$$
where P is a pixel in the target view whose depth is to be estimated; z is the current depth value of pixel P; Δz is the depth change of pixel P, i.e. the search step size; ΔP_r is the pixel offset vector in reference view r corresponding to the depth change Δz of pixel P in the target view, with ‖ΔP_r‖² = ΔP_rᵀΔP_r; B_r = A_r R_r⁻¹ R A⁻¹ and C_r = A_r R_r⁻¹ are 3×3 matrices, and Δt_r = t − t_r is a three-dimensional vector. Here R is the rotation matrix of the target view's camera coordinate system with respect to the world coordinate system; t is the translation vector of the target view's camera coordinate system with respect to the world coordinate system; A is the camera intrinsic parameter matrix of the target view; R_r, t_r, and A_r are the corresponding rotation matrix, translation vector, and intrinsic parameter matrix of the reference view; and b₃ᵀ and c₃ᵀ are the third row vectors of B_r and C_r respectively. For a parallel camera system, the depth change is proportional to the square of the current depth value.
The pixel offset vector in the reference view satisfies the epipolar constraint equation of the target view and the reference view:
$$\Delta P_r^T\,(C_r \Delta t_r \times B_r)\,P = 0$$
where P is the pixel in the target view and ΔP_r is the pixel offset vector in the reference view. Two mutually opposite pixel offset vectors ΔP_r satisfy this epipolar constraint equation; they correspond respectively to the direction of increasing depth and the direction of decreasing depth. The depth change corresponding to the offset vector in the increasing-depth direction is larger than that corresponding to the offset vector in the decreasing-depth direction.
In the depth estimation method for multi-view video images, which uses depth-based view synthesis and block-matching depth search, the depth search range and step size of the target view are determined by the pixel search range and pixel search precision in the reference view. Within the depth search range, the step size of each step is dynamically adjusted according to the current depth value: the smaller the current depth value, the smaller the step size; the larger the current depth value, the larger the step size, so that every step corresponds to the same pixel search precision.
The depth search step size is determined by the current depth value, the pixel offset vector in the reference view, and the camera intrinsic and extrinsic parameters of the views; each step in the target view corresponds in the reference view to a pixel offset vector of equal length.
Depth-based view synthesis means: given a pixel of the target view and its depth value, back-project the pixel to a point in the three-dimensional scene space using the camera intrinsic and extrinsic parameters of the target and reference viewing angles, then re-project that space point onto the image plane of the reference viewing angle, obtaining the synthesized view of the target view at that reference viewing angle.
The block-matching depth search based on depth-based view synthesis is specifically: synthesize a view using the current depth value, and compute the error between a pixel block of the synthesized view and the corresponding pixel block of the reference view; the depth value with the smallest error is taken as the depth estimate for the target view.
The depth estimation method for multi-view video images specifically comprises the following steps:
Step 1: estimate the initial depth value z_0 in the target view; set k = 0.
Step 2: determine the pixel search range and pixel search precision in the reference view corresponding to the depth search, and obtain the pixel offset vector ΔP_r in the reference view from the pixel search precision.
Step 3: from the current depth value z_k and the pixel offset vector ΔP_r, compute the corresponding depth change Δz; this depth change Δz is the next step size.
Step 4: synthesize a view using the current depth value z_k, and compute the error e_k between the pixel block of the synthesized view and that of the reference view.
Step 5: update the current depth value, z_{k+1} = z_k + Δz; k = k + 1.
Step 6: check whether the given pixel search range has been exceeded; if so, go to step 7; if not, go to step 3.
Step 7: take as the estimate the depth value with the smallest error among e_k (k = 0, ..., N−1, where N is the total number of search steps). The search step size is obtained from the following formula:
$$\Delta z = \frac{(z\,b_3^T P + c_3^T \Delta t_r)^2\,\|\Delta P_r\|^2}{\Delta P_r^T\,(b_3^T P\;C_r \Delta t_r - c_3^T \Delta t_r\;B_r P) - (z\,b_3^T P + c_3^T \Delta t_r)(b_3^T P)\,\|\Delta P_r\|^2}$$
where P is a pixel in the target view whose depth is to be estimated; z is the current depth value of pixel P; Δz is the depth change of pixel P, i.e. the search step size; ΔP_r is the pixel offset vector in reference view r corresponding to the depth change Δz of pixel P in the target view, with ‖ΔP_r‖² = ΔP_rᵀΔP_r; B_r = A_r R_r⁻¹ R A⁻¹ and C_r = A_r R_r⁻¹ are 3×3 matrices, and Δt_r = t − t_r is a three-dimensional vector. Here R is the rotation matrix of the target view's camera coordinate system with respect to the world coordinate system; t is the translation vector of the target view's camera coordinate system with respect to the world coordinate system; A is the camera intrinsic parameter matrix of the target view; R_r, t_r, and A_r are the corresponding rotation matrix, translation vector, and intrinsic parameter matrix of the reference view; and b₃ᵀ and c₃ᵀ are the third row vectors of B_r and C_r respectively. For a parallel camera system, the depth change is proportional to the square of the current depth value. The pixel offset vector in the reference view satisfies the epipolar constraint equation of the target view and the reference view:
$$\Delta P_r^T\,(C_r \Delta t_r \times B_r)\,P = 0$$
where P is the pixel in the target view and ΔP_r is the pixel offset vector in the reference view.
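A minimal sketch of the seven steps above in Python, assuming two hypothetical helpers that are not part of the patent: `step_size(z, dP)` evaluates the step-size formula above, and `block_error(z, P)` synthesizes the view at depth z and returns the block error e_k:

```python
import numpy as np

def estimate_depth(P, z0, dP_plus, dP_minus, n_steps, step_size, block_error):
    """Sketch of steps 1-7; helpers `step_size` and `block_error` are assumed."""
    candidates = [(block_error(z0, P), z0)]       # steps 1-2 done by the caller
    for dP in (dP_plus, dP_minus):                # the two opposite epipolar directions
        z = z0
        for _ in range(n_steps):                  # step 6: stay inside the pixel range
            z += step_size(z, dP)                 # steps 3 and 5: adaptive update
            candidates.append((block_error(z, P), z))   # step 4: block error e_k
    return min(candidates)[1]                     # step 7: depth of least error
```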
The beneficial effect of the invention is that depth search with an adaptive step size produces no missed or repeated pixel searches; in depth estimation the absolute difference between the synthesized image block and the reference image block is small, mis-estimates are few, and the computation load, i.e. the number of depth searches, is low.
Description of drawings
Fig. 1 is a schematic diagram of the coordinate systems in a multi-view video system;
Fig. 2 is a schematic diagram of depth-based view synthesis;
Fig. 3(a) is the view at the initial instant of the video sequence of the 7th camera in the Uli test sequence;
Fig. 3(b) is the view at the initial instant of the video sequence of the 8th camera in the Uli test sequence;
Fig. 3(c) is a partial view of Fig. 3(a); the 16 marked points shown lie in the image region from pixel [527, 430] to [590, 493];
Fig. 4 is a schematic diagram of the relation between the depth change and the square of the depth value;
Fig. 5 is a schematic diagram of the depth changes and pixel offset vectors of the present invention;
Fig. 6 is a schematic diagram of pixels missed by the search when the depth value is small;
Fig. 7 is a schematic diagram of repeated pixel searches when the depth value is large;
Fig. 8 is a schematic diagram of the adaptive adjustment of the depth search step size according to the present invention;
Fig. 9 is a schematic diagram of the distribution of pixels found using the adaptive variable step size of the present invention;
Fig. 10 is a schematic diagram of the depth search performance of fixed search steps and of the adaptive step of the present invention.
Embodiments
The present invention proposes an adaptive method for determining the depth search step size. Using the camera intrinsic and extrinsic parameters and the perspective projection relation, it first derives the relation among a pixel's depth value, the depth change, and the pixel offset that the depth change causes for the projected point in the synthesized view. According to the derived relation between depth change and pixel offset, determining the depth search range is converted into determining a pixel search range; the pixel offset has an intuitive meaning in the image and is easy to determine reasonably. And according to the relation between pixel offset and depth value, namely that the larger the depth value, the smaller the pixel offset caused by the same depth change, the step size is adjusted dynamically so that every step corresponds to the same pixel search precision, avoiding repeated or missed pixel searches and thus improving search efficiency and performance. In addition, the invention proposes a simple and effective initial depth estimation method: it solves for the convergence point of the camera optical axes in a converging camera system and regards this point as a representative scene point, thereby obtaining a rough estimate of the scene depth.
In multi-view video, three types of coordinate system are usually needed to describe the scene and its image positions: the world coordinate system of the scene, the camera coordinate systems, and the pixel coordinate systems, as shown in Fig. 1. A camera coordinate system has the camera center as origin and the optical axis as the z axis, with the xy plane parallel to the image plane; a pixel coordinate system has the top-left corner of the image as origin, with horizontal and vertical coordinates u and v.
Let the camera coordinate system o_i-x_iy_iz_i of camera c_i (i = 1, ..., m, where m is the number of cameras) have its position relative to the world coordinate system o-xyz expressed by the rotation matrix R_i and translation vector t_i. A scene point with coordinate vector p = [x, y, z]ᵀ in the world coordinate system and coordinate vector p_i = [x_i, y_i, z_i]ᵀ in camera coordinate system o_i-x_iy_iz_i then satisfies, by spatial geometry and coordinate transformation:
$$p = R_i p_i + t_i \qquad (1)$$
According to the perspective projection principle of computer vision, the coordinate p_i in the camera coordinate system and its homogeneous pixel coordinate P_i = [u_i, v_i, 1]ᵀ on the image plane satisfy:
$$z_i P_i = A_i p_i \qquad (2)$$
where A_i is the intrinsic parameter matrix of camera c_i, which mainly contains parameters such as the camera focal length, principal point, and distortion coefficients.
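As an illustration of relations (1) and (2), a small NumPy sketch of back projection and re-projection (the function names are mine, not the patent's):

```python
import numpy as np

def backproject(P, z, A, R, t):
    """Pixel P = [u, v, 1] at depth z -> world point, via Eq. (2) then Eq. (1)."""
    p_cam = z * np.linalg.inv(A) @ P   # Eq. (2): p_i = z_i * A_i^{-1} P_i
    return R @ p_cam + t               # Eq. (1): camera coordinates -> world

def project(p_world, A, R, t):
    """World point -> homogeneous pixel [u, v, 1] and its depth in the camera."""
    p_cam = np.linalg.inv(R) @ (p_world - t)  # invert Eq. (1)
    zP = A @ p_cam                            # Eq. (2): the product z_i * P_i
    return zP / zP[2], zP[2]                  # normalize, and report depth z_i
```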
The present invention performs block-matching depth search in depth space: using the camera intrinsic and extrinsic parameters and depth-based view synthesis, it searches the depth range with the given step sizes for the depth value that minimizes the error between a pixel block of the synthesized view and the corresponding pixel block of the actual reference view, and takes that depth value as the depth estimate for the pixel of the target view. The target view and target viewing angle are the image currently requiring depth estimation and its viewing angle; the reference views and reference viewing angles are the other images and viewing angles in the multi-view video system. They can be selected automatically during the depth search or specified by the user.
When the depth value of a pixel in a view is given, the pixel can be back-projected, using the camera intrinsic and extrinsic parameters, to a point in scene space, and that space point can then be re-projected onto the image plane of the desired viewing direction, yielding the synthesized view at that viewing angle. This is the depth-based view synthesis technique, shown in Fig. 2.
Consider the case of two views; let view 1 be the target view and view 2 a reference view. A pixel P₁ in view 1 has depth value z₁ in its camera c₁ coordinate system; its corresponding pixel in view 2 is P₂, with depth value z₂ in the camera c₂ coordinate system. From formulas (1) and (2) one derives:
$$z_1 R_1 A_1^{-1} P_1 + t_1 = z_2 R_2 A_2^{-1} P_2 + t_2 \qquad (3)$$
From formula (3):
$$A_2 R_2^{-1}\,(z_1 R_1 A_1^{-1} P_1 + t_1 - t_2) = z_2 P_2 \qquad (4)$$
For convenience of notation, write:
$$C = A_2 R_2^{-1}, \qquad B = A_2 R_2^{-1} R_1 A_1^{-1} = C R_1 A_1^{-1}, \qquad t = t_1 - t_2 \qquad (5)$$
Then formula (4) becomes:
$$z_1 B P_1 + C t = z_2 P_2 \qquad (6)$$
where B and C are 3×3 matrices and t is the translation vector between the two cameras. Since P₁ and P₂ are homogeneous coordinates, z₂ can be eliminated from (6), giving the homogeneous pixel coordinates of pixel P₁ in view 2:
$$P_2 = \frac{z_2 P_2}{z_2} = \frac{z_1 B P_1 + C t}{z_1 b_3^T P_1 + c_3^T t} \;\triangleq\; f_2(z_1, P_1) \qquad (7)$$
where b₃ᵀ and c₃ᵀ are the third row vectors of matrices B and C.
Formula (7) shows that, with the intrinsic and extrinsic parameters of cameras c₁ and c₂ known, a pixel of view 2 is a function of the pixel in view 1 and its depth value. Formula (7) is used to synthesize view 1 at reference viewing angle 2.
For pixel P₁ in view 1 with given depth z, back projection and re-projection yield its pixel P₂ in the synthesized view 2 at the viewing angle of camera c₂, P₂ = f₂(z, P₁). Under the common computer-vision assumption that corresponding pixels of the same scene point in views from different angles have identical luminance and chrominance values, the pixel P₂ in the synthesized view 2 corresponding to pixel P₁ of view 1 under depth value z has luminance/chrominance value:
$$\mathrm{Synthesized}\_I_2(P_2) = \mathrm{Synthesized}\_I_2(f_2(z, P_1)) = I_1(P_1) \qquad (8)$$
where I₁ is view 1, I₂ is view 2, and Synthesized_I₂ is the view synthesized from view 1 at reference camera viewing angle 2. The above description takes a camera system of two cameras as an example; the same principle applies to a camera system composed of m cameras.
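Formulas (5) and (7) translate directly into code; a hedged sketch (function names are illustrative):

```python
import numpy as np

def synthesis_matrices(A1, R1, t1, A2, R2, t2):
    """Eq. (5): the matrices B, C and vector t for warping view 1 into view 2."""
    C = A2 @ np.linalg.inv(R2)
    B = C @ R1 @ np.linalg.inv(A1)
    return B, C, t1 - t2

def warp_pixel(P1, z1, B, C, t):
    """Eq. (7): P2 = (z1*B*P1 + C*t) / (z1*b3^T P1 + c3^T t)."""
    zP2 = z1 * (B @ P1) + C @ t   # numerator; its third entry equals the denominator
    return zP2 / zP2[2]
```

Per Eq. (8), writing I₁(P₁) at the (rounded) position `warp_pixel` returns, for every pixel of view 1, yields the synthesized view.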
Assume the pixels P_j in a local window W centered on pixel P have the same scene depth value. The absolute difference, within window W, between the synthesized view 2 of view 1 and the actual reference view captured by camera 2 at viewing angle 2 is then:
$$\mathrm{SAD}(z, P) = \sum_{P_j \in W} \|\mathrm{Synthesized}\_I_2(f(z, P_j)) - I_2(f(z, P_j))\| = \sum_{P_j \in W} \|I_1(P_j) - I_2(f(z, P_j))\| \qquad (9)$$
Since the synthesized view 2 is computed using the camera parameters of reference view 2, in theory the synthesized view 2 under the true scene depth value has the same luminance/chrominance values as reference view 2. Solving for the depth value of view 1 at pixel P can therefore be converted into the following problem:
$$\min_{z \in \{\text{depth range}\}} \mathrm{SAD}(z, P) \qquad (10)$$
i.e. within the given depth search range, the depth z minimizing the absolute difference between the synthesized view and the reference view is taken as the final depth estimate.
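A sketch of Eq. (9) and the minimization (10) over a candidate depth list, reusing `warp_pixel` from the sketch above (image-boundary checks and sub-pixel interpolation omitted):

```python
import numpy as np

def block_sad(z, P, I1, I2, B, C, t, half=3):
    """Eq. (9): SAD over a (2*half+1)^2 window W, all pixels assumed at depth z."""
    u, v = int(P[0]), int(P[1])
    total = 0.0
    for dv in range(-half, half + 1):
        for du in range(-half, half + 1):
            Pj = np.array([u + du, v + dv, 1.0])
            P2 = warp_pixel(Pj, z, B, C, t)                  # f(z, Pj) of Eq. (7)
            u2, v2 = int(round(P2[0])), int(round(P2[1]))    # nearest-neighbor pixel
            total += abs(float(I1[v + dv, u + du]) - float(I2[v2, u2]))
    return total

# Eq. (10): keep the candidate depth with the smallest block error.
# z_best = min(candidate_depths, key=lambda z: block_sad(z, P, I1, I2, B, C, t))
```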
This method of searching directly in depth space needs no disparity matching; image rectification is accomplished within the depth search itself, and the depth value is continuous, so its precision is not limited by the image pixel resolution as a disparity vector is.
From formula (7), with the camera intrinsic and extrinsic parameters known, a pixel of the synthesized view 2 is a function of the pixel in view 1 and its depth value. If the depth value of pixel P₁ in view 1 changes by Δz, its pixel coordinate in the synthesized view 2 becomes:
$$P_2' = \frac{(z_1 + \Delta z)\,B P_1 + C t}{(z_1 + \Delta z)\,b_3^T P_1 + c_3^T t} \qquad (11)$$
Thus the depth change Δz of pixel P₁ in view 1 causes the pixel offset vector in the synthesized view 2:
$$\Delta P = P_2 - P_2' = \begin{bmatrix} \Delta u \\ \Delta v \\ 0 \end{bmatrix} = \frac{z_1 B P_1 + C t}{z_1 b_3^T P_1 + c_3^T t} - \frac{(z_1 + \Delta z)\,B P_1 + C t}{(z_1 + \Delta z)\,b_3^T P_1 + c_3^T t} \qquad (12)$$
From formula (12), the relation between the depth change Δz of a pixel in view 1 and the corresponding pixel offset vector ΔP in the synthesized view 2 is derived as:
$$\Delta z\,\big(b_3^T P_1\;C t - c_3^T t\;B P_1 - (z_1 b_3^T P_1 + c_3^T t)\,b_3^T P_1\,\Delta P\big) = (z_1 b_3^T P_1 + c_3^T t)^2\,\Delta P \qquad (13)$$
Premultiplying both sides of (13) by ΔPᵀ gives:
$$\Delta z = \frac{(z_1 b_3^T P_1 + c_3^T t)^2\,\|\Delta P\|^2}{\Delta P^T\,(b_3^T P_1\;C t - c_3^T t\;B P_1) - (z_1 b_3^T P_1 + c_3^T t)(b_3^T P_1)\,\|\Delta P\|^2} \qquad (14)$$
where ‖ΔP‖² = ΔPᵀΔP is the squared norm of the pixel offset vector ΔP. Thus, when the camera parameters are known, the depth change Δz corresponding to a pixel offset vector ΔP at depth z₁ can be obtained from (14).
In addition, from formula (6), the corresponding pixels of the two views satisfy the following epipolar constraint equations:
$$P_2^T\,(C t \times B)\,P_1 = 0 \qquad (15)$$
$$P_2'^T\,(C t \times B)\,P_1 = 0 \qquad (16)$$
where × is the vector cross product. Subtracting (16) from (15) shows that the pixel offset vector ΔP also satisfies the epipolar constraint equation:
$$\Delta P^T\,(C t \times B)\,P_1 = 0 \qquad (17)$$
With the camera parameters and pixel P₁ given, formula (17) is a homogeneous linear equation in the two components Δu and Δv of the pixel offset vector ΔP.
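Formulas (14) and (17) translate into code directly; a hedged sketch, with B, C, t as in Eq. (5):

```python
import numpy as np

def epipolar_offsets(P1, B, C, t, length=1.0):
    """Eq. (17): the two opposite pixel-offset vectors [du, dv, 0] of a given
    length along the epipolar line in the synthesized view."""
    e = np.cross(C @ t, B @ P1)          # coefficients of (Ct x B) P1
    d = np.array([e[1], -e[0], 0.0])     # satisfies du*e[0] + dv*e[1] = 0
    d *= length / np.linalg.norm(d[:2])
    return d, -d                         # Delta-P plus and Delta-P minus

def depth_step(z1, P1, dP, B, C, t):
    """Eq. (14): the depth change whose projection moves by dP at depth z1."""
    b3P1, c3t = B[2] @ P1, C[2] @ t      # scalars b3^T P1 and c3^T t
    s = z1 * b3P1 + c3t
    num = s * s * (dP @ dP)
    den = dP @ (b3P1 * (C @ t) - c3t * (B @ P1)) - s * b3P1 * (dP @ dP)
    return num / den
```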
For a parallel camera system, the disparity d of a scene point between the two views is inversely proportional to its depth, i.e.
$$d = \frac{f B}{z} \qquad (18)$$
where d and z are the disparity and depth, and f and B are the focal length and baseline length of the cameras. When the depth value of pixel P₁ in view 1 changes from z₁ to z₂, the pixel offset of its corresponding projected point in the synthesized view 2 is
$$\|\Delta P\| = |d_1 - d_2| = \frac{f B\,|z_1 - z_2|}{z_1 z_2} \approx \frac{f B\,|\Delta z|}{z_1^2} \qquad (19)$$
From (19), the pixel offset is proportional to the depth change and inversely proportional to the square of the depth value. For the same pixel offset, the larger the current depth value, the larger the corresponding depth change; the smaller the depth value, the smaller the depth change. For a converging camera system, when the angle between the two cameras is not very large, formula (12) shows that an approximate relation of the same kind holds among depth change, pixel offset, and depth value.
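For intuition, a worked example with hypothetical numbers (not taken from the patent): if $fB = 10^5$ mm·pixels, then by (19) a one-pixel offset at $z_1 = 3000$ mm corresponds to $\Delta z \approx z_1^2/(fB) = 9{\times}10^6 / 10^5 = 90$ mm, whereas at $z_1 = 2000$ mm the same one-pixel offset corresponds to only $4{\times}10^6 / 10^5 = 40$ mm; a fixed depth step therefore cannot hold the pixel precision constant.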
To verify this conclusion, we use the parameters of the 7th and 8th cameras of the Uli test sequence shown in Fig. 3 (the multi-view video data are provided by the Heinrich-Hertz-Institut (HHI) of Germany and can be downloaded from https://www.3dtv-research.org/3dav_CfP_FhG_HHI/; the sequence was shot by 8 video cameras arranged in a converging configuration, video format 1024x768, 25 fps; this description uses the views at the initial instant of the video sequences of the 7th and 8th cameras). For the pixel P = [526, 429] in view 7 (Fig. 3(a)), corresponding to the clasp on the right of the shirt collar, the relation among depth change, squared depth value, and pixel offset is computed according to formulas (14) and (17). Given a unit pixel offset vector satisfying the epipolar constraint (17), i.e. ‖ΔP‖ = 1, the depth changes corresponding to different depth values are computed from (14); the relation between them is shown in Fig. 4, where the abscissa is the square of the depth value and the ordinate is the depth change. Fig. 4 shows that when the pixel offset in the synthesized view is held fixed, the depth change is approximately linear in the square of the depth value; this means that at different depth values the same depth change of a pixel in view 1 causes different pixel offsets in the synthesized view.
Note that since (17) is a homogeneous linear equation in the pixel offset vector ΔP, there are two mutually opposite solutions ΔP₊ and ΔP₋; substituting them into (14) yields one positive and one negative depth change, Δz₊ and Δz₋, i.e. ΔP₊ and ΔP₋ correspond respectively to the pixel offsets caused by increasing and decreasing the depth value. From the preceding analysis, for a fixed pixel offset the depth change is approximately proportional to the square of the depth value, so the depth changes corresponding to the two opposite pixel offset vectors differ in magnitude: the decrease |Δz₋| is smaller than the increase |Δz₊|, as shown in Fig. 5. For example, for the pixel P = [526, 429] in Uli view 7 (Fig. 3(a)), a depth value of 3172 mm, and a pixel offset of 64 pixels, i.e. ‖ΔP‖ = 64, the depth changes corresponding to the two opposite pixel offset vectors, computed from (14) and (17), are Δz₊ = 930 and Δz₋ = −593.
According to the above analysis, under the same pixel offset, the depth change is approximately proportional to the square of the depth value. So when a fixed search step is used, if the step size corresponds to a one-pixel offset at a small depth value, then at large depth values the pixel offset of that step is less than one pixel, and if projections to non-integer positions are rounded to the nearest pixel, the search finds the same pixel at several different depth values: repeated search. Conversely, if the step corresponds to a one-pixel offset at a large depth value, then at small depth values the offset exceeds one pixel and two adjacent depth values find two non-adjacent pixels, so some pixels are skipped: the search is incomplete. One therefore expects to search N pixels within the range [z_min, z_max] but, owing to repeated and missed searches, actually visits fewer than N effective search points.
For example, for the pixel P = [526, 429] in Uli view 7 we performed a depth search in the range [2000, 4500] with a fixed step of 10 mm. As shown in Fig. 6, when the depth value is small, the u coordinate of the pixel found at depth 2090 is 661, while that found at depth 2080 is 663: a pixel in between is skipped and never searched. And as shown in Fig. 7, when the depth value is large, the same pixel with u coordinate 437 is found at the two different depth values 4450 and 4460: the pixel is searched repeatedly. Since the 10 mm step corresponds to a one-pixel search precision only near the true depth value of about 3170, we would expect to search 250 different pixels in [2000, 4500], but because pixels were missed and searched repeatedly, actual computation shows only 200 pixels were searched.
To make the step size correspond, throughout the depth search, to the same pixel search precision in the reference view, i.e. always to a fixed one-pixel offset in the reference view, the step size must be adjusted dynamically according to the relation between depth change and depth value, and the corresponding search range determined. Suppose the initial search depth of pixel P₁ in view 1 is z₀; then the depth change Δz in view 1 corresponding to a pixel offset ΔP in reference view 2 at depth z₀ is readily obtained from formula (14). When the initial depth z₀ does not differ greatly from the true depth, the pixel offset between the true corresponding pixel of P₁ in reference view 2 and the pixel obtained at depth z₀ is usually confined to a certain range. The following describes how, within a pixel search range of N, the step size is determined adaptively from the depth value so that each step always corresponds to a fixed one-pixel offset.
Given pixel P₁ and the camera parameters, the epipolar constraint equation (17) is easily solved for the two mutually opposite offset vectors ΔP₊ and ΔP₋ of pixel offset ‖ΔP‖; the two corresponding depth changes Δz₊ and Δz₋ are then computed from (14) and used as the step sizes toward the next depth values in the two directions, as shown in Fig. 8:
$$z_{-1} = z_0 + \Delta z_{-1}, \qquad z_1 = z_0 + \Delta z_{+1} \qquad (20)$$
Continuing, at depth z₋₁ with offset vector ΔP₋, formula (14) gives the corresponding depth change Δz₋₂, and at depth z₁ with offset vector ΔP₊ it gives Δz₊₂; these are used as the next step sizes:
$$z_{-2} = z_{-1} + \Delta z_{-2}, \qquad z_2 = z_1 + \Delta z_{+2} \qquad (21)$$
By analogy, the search depth and step size at the n-th step are:
$$z_{-n} = z_{-(n-1)} + \Delta z_{-n}, \qquad z_n = z_{n-1} + \Delta z_{+n} \qquad (22)$$
where the number of search steps n is determined by the pixel search range N and the search precision Δ, i.e. n satisfies nΔ ≤ N.
Thus, once the search range and the initial depth value z₀ are determined, the above method yields a variable step size that adapts to the depth value, so that the same pixel search precision is maintained throughout the depth search, overcoming the repeated and missed pixel searches of a fixed step. Since the depth search range is obtained by accumulating the step sizes, it too adapts to the depth value: when the depth value grows, the depth search range corresponding to the same pixel offset ‖ΔP‖ grows accordingly, and when the depth value shrinks, it shrinks accordingly. Moreover, the depth search precision is easily controlled through the pixel precision Δ: Δ = 1 corresponds to unit-pixel search precision and Δ = 1/2 to half-pixel search precision.
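Using `epipolar_offsets` and `depth_step` from the earlier sketch, the recursion (20)-(22) that generates the kind of step-size table shown in Table 2 below can be sketched as:

```python
def adaptive_depth_grid(z0, P1, B, C, t, N, precision=1.0):
    """Eqs. (20)-(22): depth samples z_-n ... z_0 ... z_n, each step moving the
    projection by exactly `precision` pixels; N is the pixel search range."""
    dP_plus, dP_minus = epipolar_offsets(P1, B, C, t, length=precision)
    grid = [z0]
    for dP in (dP_plus, dP_minus):           # walk each direction separately
        z = z0
        for _ in range(int(N / precision)):  # n steps with n * precision <= N
            z += depth_step(z, P1, dP, B, C, t)  # Eq. (14) re-evaluated at each z
            grid.append(z)
    return sorted(grid)
```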
Thus, given the relation between depth change and pixel offset vector, formula (14), the corresponding depth search step can be determined by choosing the pixel search precision, and determining the depth search range likewise becomes determining the corresponding pixel offset. Determining the pixel offset and search precision is analogous to determining the search range and precision in disparity estimation: it has intuitive meaning and is easy to do, and the corresponding depth search range and step size can be determined dynamically, according to the image content or the application's demands, by adjusting the pixel offset and search precision.
The depth estimation process needs a given initial depth value z₀, and the quality of this initial value affects the depth search performance and results. When the deviation of z₀ from the true depth is small, a small pixel offset, i.e. a small search range, can be used, reducing the search volume and speeding up the search; when the deviation is large, a relatively large pixel offset must be used to guarantee that the true depth is found, so the computation is larger. Although a poor initial depth can be compensated by a wide search range and a high-precision step, a good initial depth allows a small search range and suitable steps, improving the efficiency and performance of the depth search. The estimation and determination of the initial depth is therefore also very important in the depth estimation process.
Determining the initial depth for a video sequence divides into two cases: the image at the initial instant, and subsequent images. For the initial image it divides again into the first pixel and the other pixels. For the first pixel, no depth search has yet been performed for any pixel, so no scene depth information is known; a probable value of the scene depth must be obtained as the initial value from information such as image characteristics and camera parameters. For the other pixels, the initial depth can be determined from the depth estimates of neighboring pixels in the image. For subsequent images, the depth values of video images from the same viewing angle are strongly correlated: the depth of stationary background regions is unchanged and only the depth of a few moving regions changes, so the depth value at the same pixel position in the previous image can be used as the initial value. The key to initializing the depth is thus to obtain scene depth information for the initial image, providing a good initial depth for the first pixel.
In multi-view video, the differences between images of different views and the position information of the cameras usually contain information about the scene depth. For the two cases of converging and parallel camera systems, the following describes how to make an initial estimate of the scene depth from the camera parameters or image information when no depth information is known.
The main goal of multi-view video is to capture the same scene from several angles, so the cameras are usually placed in an arc with their optical axes converging at a point: a converging system. In practice the cameras may not converge strictly at one point, but a point nearest to all camera optical axes can always be found and regarded as the convergence point. The convergence point usually lies where the scene is and can be considered a representative point of the scene, so solving for its position gives a probable value of the scene depth to use as the initial value in the depth search.
Let the coordinates of the convergence point in the world coordinate system be M_c = [x_c, y_c, z_c]ᵀ. The point lies on the optical axis of each camera, so in each camera coordinate system, whose z axis is the optical axis, it can be written:
$$M_1 = [0, 0, z_{r1}]^T, \quad M_2 = [0, 0, z_{r2}]^T, \quad \ldots, \quad M_m = [0, 0, z_{rm}]^T \qquad (23)$$
where z_{ri} is the depth of the convergence point in the coordinate system of camera c_i. From the relation between world and camera coordinates:
$$M_c = R_1 M_1 + t_1, \quad M_c = R_2 M_2 + t_2, \quad \ldots, \quad M_c = R_m M_m + t_m \qquad (24)$$
Eliminating M_c gives:
$$R_1 [0, 0, z_{r1}]^T + t_1 = R_i [0, 0, z_{ri}]^T + t_i, \qquad i = 2, \ldots, m \qquad (25)$$
Formula (25) is a system of 3(m−1) linear equations in the depths z_{r1}, z_{r2}, ..., z_{rm}. Solving (25) by linear least squares yields the depth values z_{r1}, z_{r2}, ..., z_{rm} of the convergence point in each camera coordinate system; they are probable values of the scene depth and can be used as initial depth values in the depth search.
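A sketch of the least-squares solve of (25); note that R_i[0, 0, z_{ri}]ᵀ = z_{ri} times the third column of R_i:

```python
import numpy as np

def convergence_depths(Rs, ts):
    """Least-squares solution of Eq. (25) for the convergence-point depths
    z_r1..z_rm; Rs and ts are the cameras' rotation matrices and translations."""
    m = len(Rs)
    axes = [R[:, 2] for R in Rs]        # R_i @ [0,0,1]^T: each optical axis
    A = np.zeros((3 * (m - 1), m))
    b = np.zeros(3 * (m - 1))
    for i in range(1, m):
        rows = slice(3 * (i - 1), 3 * i)
        A[rows, 0] = axes[0]            # + z_r1 * (third column of R_1)
        A[rows, i] = -axes[i]           # - z_ri * (third column of R_i)
        b[rows] = ts[i] - ts[0]         # = t_i - t_1
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    return z                            # rough initial depth per camera
```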
A parallel camera system has no convergence point, so depth information cannot be found by the above method; but disparity and depth then obey the simple inverse relation (18), so depth information can be obtained by computing the global disparity between the two views.
The global disparity is defined as the pixel offset that minimizes the absolute difference between the two views, found as follows:
$$g_x = \min_x \left[ \frac{\sum_{i,j \in R} \|I_1(i, j) - I_2(i + x, j)\|}{R} \right] \qquad (26)$$
where R is the number of pixels in the overlap region of views 1 and 2. Since the accuracy demanded of the global disparity estimate is low, the search unit of the pixel offset x in (26) can be set fairly large, such as 8 or 16 pixels, significantly reducing the computation. Once the global disparity is found, an initial depth value is obtained from the inverse relation (18) between depth and disparity.
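A sketch of the coarse search of Eq. (26); the shift direction between the two views is an assumption of the sketch:

```python
import numpy as np

def global_disparity(I1, I2, max_shift, unit=16):
    """Eq. (26): coarse horizontal shift minimizing the mean absolute
    difference over the overlap region of the two views."""
    h, w = I1.shape
    best_err, best_x = np.inf, 0
    for x in range(0, max_shift + 1, unit):   # coarse unit, e.g. 8 or 16 pixels
        diff = np.abs(I1[:, x:].astype(float) - I2[:, :w - x].astype(float))
        err = diff.mean()                     # the sum over R, divided by R
        if err < best_err:
            best_err, best_x = err, x
    return best_x                             # then z0 ~ f*B / g_x by Eq. (18)
```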
Using a scene point given in the Uli sequence's parameter document, namely the high-brightness point to the left of the glasses with real world coordinates [35.07, 433.93, −1189.78] (in mm), the coordinates and true depth of this scene point in each camera coordinate system are obtained from the relation (1) between world and camera coordinates. Using the above method of finding the convergence point of two cameras, i.e. solving the system of linear equations (25), the depth values of the convergence point in the coordinate systems of cameras 7 and 8 are obtained; the results are shown in Table 1. Human observation suggests the depth of field of the Uli scene varies little, and the initial depth estimates in Table 1 differ little from the true depth of the scene point, showing that the initial depth estimate is fairly effective and reasonable and provides a good initial value for depth estimation.
(coordinates in mm)            Depth in camera 7 coordinate system   Depth in camera 8 coordinate system
True depth value               3076.2                                3163.7
Initial depth estimate         2871.2                                2955.7

Table 1
The Uli view shown in Fig. 3(c) is the 64x64 image region from pixel [527, 430] to [590, 493]. In this region, depth searches were carried out with fixed step sizes and with the adaptive step size for pixels taken at intervals of 15 pixels, 16 pixels in all. Three fixed-step searches were performed in the fixed range [2000, 5000], with step sizes of 20, 35, and 50. For the adaptive search step, the initial depth is 2817, the pixel offset is set to 32 pixels, the search precision is 1 pixel, and the initial depth of each later pixel is set to the depth estimate of its neighboring pixel. With the adaptive step-size determination method of the present invention, the step sizes corresponding to a search precision of one pixel, within the search range of 32 pixels on either side of the initial search pixel, are obtained for pixel [527, 430], as listed in Table 2; the pixels visited with these step sizes are shown in Fig. 9. Table 2 shows that the steps in the decreasing-depth direction are negative, and their absolute value gradually decreases as the pixel offset grows and the depth value falls; the steps in the increasing-depth direction are positive, and their absolute value gradually increases as the pixel offset and the depth value grow. Fig. 9 shows that when the depth search uses the variable step sizes of Table 2, the corresponding pixel search precision is held constant at one pixel throughout.
Pixel offset   Step (increasing depth)   Step (decreasing depth)   Pixel offset   Step (increasing depth)   Step (decreasing depth)
1 11.4877 -11.2503 17 12.8909 -10.1000
2 11.5686 -11.1728 18 12.9870 -10.0340
3 11.6502 -11.0960 19 13.0842 -9.9687
4 11.7328 -11.0201 20 13.1824 -9.9041
5 11.8162 -10.9450 21 13.2818 -9.8400
6 11.9005 -10.8706 22 13.3823 -9.7766
7 11.9858 -10.7969 23 13.4840 -9.7138
8 12.0719 -10.7240 24 13.5868 -9.6515
9 12.1590 -10.6519 25 13.6908 -9.5899
10 12.2470 -10.5805 26 13.7961 -9.5289
11 12.3360 -10.5098 27 13.9025 -9.4684
12 12.4260 -10.4397 28 14.0101 -9.4086
13 12.5169 -10.3704 29 14.1191 -9.3493
14 12.6089 -10.3018 30 14.2293 -9.2905
15 12.7018 -10.2339 31 14.3407 -9.2323
16 12.7958 -10.1666 32 14.4535 -9.1746
Table 2
Depth estimation was performed using the block-matching method; the depth search results are shown in Fig. 10, where each point represents the absolute difference between the block synthesized under the depth value found by the search and the actual block; the smaller this value, generally the more accurate the depth estimate. Among the fixed search steps, a smaller step means higher search precision and a better depth estimate: the absolute difference under the depth found with a 20 mm step is less than that with 35 mm, and that with 35 mm is less than that with 50 mm. But the depth values found with the adaptive search step give the best result, with the smallest absolute differences.
For the 16 pixels of the image region from pixel [527, 430] to [590, 493] shown in Fig. 3(c), depth estimation was carried out with the adaptive search step and with fixed steps of 20, 35, and 50. Table 3 shows that with the adaptive search step all 16 pixels found the correct depth value, while with the fixed steps there are erroneous depth estimates. This is because several of these pixels lie in a region lacking texture, and within a wide fixed search range the minimum-absolute-difference point found does not correspond to the correct pixel. With the adaptive step, because the initial value is determined from neighbor information, the pixel offset can be set small, i.e. the search proceeds in a relatively small neighborhood, reducing the probability of finding a wrong pixel and ensuring a certain smoothness of the depth. Table 3 lists the depth estimates, number of depth searches, and number of erroneous estimates when using the adaptive and the fixed search steps; the framed entries are erroneous. The results show that the adaptive search step needs few searches and makes no erroneous estimates, whereas the fixed steps need many searches and still make erroneous estimates. For example, the adaptive depth search with a 32-pixel offset needs to search only 64 depth values, while the 20 mm fixed step must search 150 depth values in the range [2000, 5000].
Table 3 (reproduced in the original only as an image: depth estimates, number of depth searches, and number of erroneous estimates for the adaptive and fixed search steps)
From the results of Table 3 and Fig. 10 it is concluded that the adaptive search step outperforms the fixed search step in depth search: the absolute difference between the image block synthesized with the estimated depth and the reference image block is small, erroneous estimates are few, and the computation, i.e. the number of depth searches, is small.

Claims (21)

1. A multi-view video image depth search method, characterized in that within the depth search range the step size of each step is dynamically adjusted according to the current depth value: the smaller the current depth value, the smaller the step size; the larger the current depth value, the larger the step size, so that every step corresponds to the same pixel search precision.
2. The multi-view video image depth search method according to claim 1, characterized in that, based on the relation between the depth change and the pixel offset vector, the determination of the depth search range and search step is converted into the determination of the pixel search range and pixel search precision.
3. The multi-view video image depth search method according to claim 2, characterized in that the pixel search precision equals the length of each pixel offset vector in the search, and the pixel search precision is sub-pixel precision or integer-pixel precision.
4. The multi-view video image depth search method according to claim 2, characterized in that the search step equals the depth change corresponding to each pixel offset vector in the search.
5. The multi-view video image depth search method according to claim 2, characterized in that the search step is determined by the current depth value, the pixel offset vector, and the camera intrinsic and extrinsic parameters.
6. The multi-view video image depth search method according to claim 5, characterized in that the search step is obtained from the following formula:
$$\Delta z = \frac{\left(z\,b_3^{T}P + c_3^{T}\Delta t_r\right)^{2}\,\lVert\Delta P_r\rVert^{2}}{\Delta P_r^{T}\left(b_3^{T}P\;C_r\Delta t_r - c_3^{T}\Delta t_r\;B_r P\right) - \left(z\,b_3^{T}P + c_3^{T}\Delta t_r\right)\left(b_3^{T}P\right)\lVert\Delta P_r\rVert^{2}}$$
where P is the pixel awaiting depth estimation in the target view; z is the current depth value of pixel P; Δz, the depth change of pixel P, is the search step; ΔP_r is the pixel offset vector in reference view r corresponding to the depth change Δz of pixel P in the target view; ‖ΔP_r‖² = ΔP_rᵀΔP_r; B_r = A_rR_r⁻¹RA⁻¹ and C_r = A_rR_r⁻¹ are 3×3 matrices; Δt_r = t − t_r is a 3-vector. Here R is the rotation matrix of the target view's camera coordinate system with respect to the world coordinate system; t is the translation vector of the target view's camera coordinate system with respect to the world coordinate system; A is the camera intrinsic matrix of the target view; R_r, t_r and A_r are the rotation matrix, translation vector and camera intrinsic matrix of the reference view; b_3ᵀ and c_3ᵀ are the third rows of B_r and C_r, respectively.
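For illustration only, a minimal numpy sketch of this step-size formula follows; the function name and calling convention are assumptions, with P taken as a homogeneous pixel [u, v, 1] and ΔP_r as a homogeneous offset [Δu, Δv, 0]:

```python
import numpy as np

def search_step(z, P, dPr, A, R, t, Ar, Rr, tr):
    """Depth change (search step) dz for one pixel offset dPr in the reference view.

    z          : current depth of pixel P in the target view
    P          : homogeneous pixel [u, v, 1] of the target view
    dPr        : homogeneous pixel offset [du, dv, 0] in the reference view
    A, R, t    : intrinsics, rotation, translation of the target view
    Ar, Rr, tr : the same quantities for the reference view
    """
    Br = Ar @ np.linalg.inv(Rr) @ R @ np.linalg.inv(A)  # B_r = A_r R_r^-1 R A^-1
    Cr = Ar @ np.linalg.inv(Rr)                         # C_r = A_r R_r^-1
    dt = t - tr                                         # dt_r = t - t_r
    b3P = Br[2] @ P                                     # b_3^T P   (third row of B_r)
    c3dt = Cr[2] @ dt                                   # c_3^T dt_r (third row of C_r)
    w = z * b3P + c3dt
    n2 = dPr @ dPr                                      # ||dP_r||^2
    num = w * w * n2
    den = dPr @ (b3P * (Cr @ dt) - c3dt * (Br @ P)) - w * b3P * n2
    return num / den
```

A negative return value simply means that the chosen offset direction corresponds to decreasing depth (cf. claim 8).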
7. The multi-view video image depth search method according to claim 6, characterized in that the pixel offset vector in the reference view satisfies the epipolar constraint equation of the target view and the reference view:
$$\Delta P_r^{T}\,(C_r\Delta t_r \times B_r)\,P = 0,$$
where P is the pixel in the target view and ΔP_r is the pixel offset vector in the reference view.
8. The multi-view video image depth search method according to claim 7, characterized in that there exist two mutually opposite pixel offset vectors satisfying the epipolar constraint equation, corresponding respectively to the direction of increasing depth value and the direction of decreasing depth value; the depth change corresponding to the offset vector of the increasing direction is greater than the depth change corresponding to the offset vector of the decreasing direction.
9. The multi-view video image depth search method according to claim 6, characterized in that, in a parallel camera system, the depth change is directly proportional to the square of the current depth value.
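One way to see this proportionality is through the standard disparity-depth relation of a parallel rig (a supporting derivation, not part of the claim text): since d = fB/z, a pixel-precision change Δd produces

$$\Delta z \approx \frac{dz}{dd}\,\Delta d = -\frac{fB}{d^{2}}\,\Delta d = -\frac{z^{2}}{fB}\,\Delta d,$$

so for a fixed pixel search precision |Δd| the step |Δz| grows as the square of the current depth.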
10. A depth estimation method for multi-view video images, characterized in that, in the depth-based view synthesis and the block-matching-based depth search, the depth search range and the search step of the target view are determined by the pixel search range and the pixel search precision of the reference view; within the depth search range, the search step of each step is dynamically adjusted according to the current depth value: the smaller the current depth value, the smaller the search step adopted; the larger the current depth value, the larger the search step adopted, so that the search step of each step corresponds to the same pixel search precision.
11. The depth estimation method for multi-view video images according to claim 10, characterized in that the pixel search precision equals the length of each pixel offset vector in the search, and the search step equals the depth change corresponding to each pixel offset vector in the search.
12. The depth estimation method for multi-view video images according to claim 10, characterized in that the depth-based view synthesis and the block-matching-based depth search specifically comprise: synthesizing a view using the current depth value, and computing the error between the pixel block of the synthesized view and the pixel block of the reference view; the depth value corresponding to the least error is taken as the depth estimate of the target view.
13. The depth estimation method for multi-view video images according to claim 12, characterized in that it comprises the following steps:
Step 1: estimate the initial depth search value z₀ of the target view (k = 0);
Step 2: determine the pixel search range and pixel search precision in the reference view that correspond to the depth search, and obtain the pixel offset vector ΔP_r in the reference view from the pixel search precision;
Step 3: from the current depth value z_k and the pixel offset vector ΔP_r, obtain the corresponding depth change Δz; this depth change Δz is the search step for the next step;
Step 4: synthesize a view using the current depth value z_k, compute the error e_k between the pixel block of the synthesized view and the pixel block of the reference view, then update the current depth value z_{k+1} = z_k + Δz and set k = k + 1;
Step 5: judge whether the given pixel search range has been exceeded; if so, go to step 6; if not, go to step 3;
Step 6: among the errors e_k (k = 0, …, N−1, where N is the total number of search steps), take the depth value corresponding to the least error as the estimate.
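Read as an algorithm, steps 1–6 are a single scan over pixel offsets with the depth updated by the adaptive step. The sketch below is illustrative only; block_error and search_step are hypothetical callables standing in for the view synthesis of step 4 and the Δz formula of claim 6:

```python
def estimate_depth(z0, n_steps, block_error, search_step):
    """Adaptive-step depth search following steps 1-6 above (sketch only).

    z0          : initial depth estimate (step 1)
    n_steps     : N, fixed by the pixel search range and precision (step 2)
    block_error : callable z -> error e_k between synthesized and reference block
    search_step : callable z -> depth change dz for one pixel offset (step 3)
    """
    z = z0
    errors = []                             # (e_k, z_k) for k = 0 .. N-1
    for _ in range(n_steps):
        errors.append((block_error(z), z))  # step 4: synthesize view, score block
        z += search_step(z)                 # step 4: z_{k+1} = z_k + dz
    return min(errors)[1]                   # step 6: depth with the least error
```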
14. The depth estimation method for multi-view video images according to claim 13, characterized in that the error e_k is the absolute difference or the squared difference between the pixel block of the synthesized view and the pixel block of the reference view.
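A concrete reading of this error, as either a sum of absolute differences (SAD) or of squared differences (SSD); the helper below is a hypothetical sketch, not taken from the patent text:

```python
import numpy as np

def block_error(synth_block, ref_block, squared=False):
    """SAD (absolute difference) or SSD (squared difference) between two blocks."""
    d = synth_block.astype(np.float64) - ref_block.astype(np.float64)
    return float(np.sum(d * d) if squared else np.sum(np.abs(d)))
```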
15. The depth estimation method for multi-view video images according to claim 13, characterized in that, for a converging camera system, in said step 1 the depth at the convergence point of the converging camera system is taken as the initial depth search value z₀ in the target view.
16. The depth estimation method for multi-view video images according to claim 15, characterized in that the convergence point of the converging camera system is obtained by solving the following linear system:
$$R\,[0,0,z_0]^{T} + t = R_i\,[0,0,z_{ri}]^{T} + t_i,\qquad i = 1,2,\dots,m$$
where z₀ is the depth value of the convergence point in the camera coordinate system of the target view, z_{ri} (i = 1, …, m) is the depth value of the convergence point in the camera coordinate system of reference view i, and m is the number of reference views.
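Since R[0,0,z₀]ᵀ reduces to z₀ times the third column of R, the system is linear in the unknowns z₀, z_{r1}, …, z_{rm} and can be solved in least squares. A sketch follows; the function name and interface are assumptions:

```python
import numpy as np

def convergence_depth(R, t, R_refs, t_refs):
    """Least-squares z0 for the convergence point of a converging camera rig.

    Solves  R [0,0,z0]^T + t = R_i [0,0,z_ri]^T + t_i  for z0 and each z_ri.
    R, t           : target-view rotation (3x3) and translation (3,)
    R_refs, t_refs : lists of reference-view rotations and translations
    """
    m = len(R_refs)
    A = np.zeros((3 * m, 1 + m))
    b = np.zeros(3 * m)
    for i, (Ri, ti) in enumerate(zip(R_refs, t_refs)):
        rows = slice(3 * i, 3 * i + 3)
        A[rows, 0] = R[:, 2]          # z0 multiplies the third column of R
        A[rows, 1 + i] = -Ri[:, 2]    # z_ri multiplies the third column of R_i
        b[rows] = ti - t
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[0]                        # z0: depth of the convergence point
```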
17. The depth estimation method for multi-view video images according to claim 13, characterized in that, for a parallel camera system, in said step 1 the initial depth search value z₀ is obtained from the inverse proportionality between the global disparity and the depth:
$$z_0 = \frac{fB}{d}$$
where z₀ is the initial depth, d is the global disparity, f is the focal length of the cameras, and B is the baseline length of the cameras.
18. The depth estimation method for multi-view video images according to claim 17, characterized in that the global disparity is the pixel offset for which the absolute difference between the translated reference view and the target view is minimal.
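Putting claims 17 and 18 together, a rough sketch of this initialization for a parallel rig, assuming horizontally displaced views, an integer disparity scan, and a positive shift of the target relative to the reference (all names hypothetical):

```python
import numpy as np

def initial_depth(target, ref, f, B, max_shift=128):
    """z0 = f*B/d, with d the global disparity minimizing the mean
    absolute difference between the shifted reference and the target."""
    best_d, best_err = 1, np.inf
    for d in range(1, max_shift):           # translate the reference view by d pixels
        err = np.mean(np.abs(target[:, d:].astype(np.float64)
                             - ref[:, :-d].astype(np.float64)))
        if err < best_err:
            best_d, best_err = d, err
    return f * B / best_d                   # depth is inversely proportional to disparity
```

Using the mean rather than the sum of absolute differences keeps shifts with different overlap areas comparable.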
19. The depth estimation method for multi-view video images according to claim 13, characterized in that the depth change Δz is obtained from the following formula:
$$\Delta z = \frac{\left(z\,b_3^{T}P + c_3^{T}\Delta t_r\right)^{2}\,\lVert\Delta P_r\rVert^{2}}{\Delta P_r^{T}\left(b_3^{T}P\;C_r\Delta t_r - c_3^{T}\Delta t_r\;B_r P\right) - \left(z\,b_3^{T}P + c_3^{T}\Delta t_r\right)\left(b_3^{T}P\right)\lVert\Delta P_r\rVert^{2}}$$
where P is the pixel awaiting depth estimation in the target view; z is the current depth value of pixel P; Δz, the depth change of pixel P, is the search step; ΔP_r is the pixel offset vector in reference view r corresponding to the depth change Δz of pixel P in the target view; ‖ΔP_r‖² = ΔP_rᵀΔP_r; B_r = A_rR_r⁻¹RA⁻¹ and C_r = A_rR_r⁻¹ are 3×3 matrices; Δt_r = t − t_r is a 3-vector. Here R is the rotation matrix of the target view's camera coordinate system with respect to the world coordinate system; t is the translation vector of the target view's camera coordinate system with respect to the world coordinate system; A is the camera intrinsic matrix of the target view; R_r, t_r and A_r are the rotation matrix, translation vector and camera intrinsic matrix of the reference view; b_3ᵀ and c_3ᵀ are the third rows of B_r and C_r, respectively.
20. The depth estimation method for multi-view video images according to claim 19, characterized in that the pixel offset vector ΔP_r in the reference view satisfies the epipolar constraint equation of the target view and the reference view: $$\Delta P_r^{T}\,(C_r\Delta t_r \times B_r)\,P = 0,$$ where P is the pixel in the target view and ΔP_r is the pixel offset vector in the reference view.
21. The depth estimation method for multi-view video images according to claim 20, characterized in that there exist two mutually opposite pixel offset vectors satisfying the epipolar constraint equation, corresponding respectively to the direction of increasing depth value and the direction of decreasing depth value; the depth change corresponding to the offset vector of the increasing direction is greater than the depth change corresponding to the offset vector of the decreasing direction.
CN200810300330A 2008-02-03 2008-02-03 Multi-visual angle video image depth detecting method and depth estimating method Expired - Fee Related CN100592338C (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN200810300330A CN100592338C (en) 2008-02-03 2008-02-03 Multi-visual angle video image depth detecting method and depth estimating method
PCT/CN2008/072141 WO2009097714A1 (en) 2008-02-03 2008-08-26 Depth searching method and depth estimating method for multi-viewing angle video image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200810300330A CN100592338C (en) 2008-02-03 2008-02-03 Multi-visual angle video image depth detecting method and depth estimating method

Publications (2)

Publication Number Publication Date
CN101231754A true CN101231754A (en) 2008-07-30
CN100592338C CN100592338C (en) 2010-02-24

Family

ID=39898199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810300330A Expired - Fee Related CN100592338C (en) 2008-02-03 2008-02-03 Multi-visual angle video image depth detecting method and depth estimating method

Country Status (2)

Country Link
CN (1) CN100592338C (en)
WO (1) WO2009097714A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9524556B2 (en) 2014-05-20 2016-12-20 Nokia Technologies Oy Method, apparatus and computer program product for depth estimation
CN110162098A (en) * 2019-07-03 2019-08-23 安徽理工大学 A kind of mining unmanned plane
CN111179327B (en) * 2019-12-30 2023-04-25 青岛联合创智科技有限公司 Depth map calculation method
CN113643414B (en) * 2020-05-11 2024-02-06 北京达佳互联信息技术有限公司 Three-dimensional image generation method and device, electronic equipment and storage medium
CN113240785B (en) * 2021-04-13 2024-03-29 西安电子科技大学 Multi-camera combined rapid ray tracing method, system and application
CN113344998B (en) * 2021-06-25 2022-04-29 北京市商汤科技开发有限公司 Depth detection method and device, computer equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6384859B1 (en) * 1995-03-29 2002-05-07 Sanyo Electric Co., Ltd. Methods for creating an image for a three-dimensional display, for calculating depth information and for image processing using the depth information
US6046763A (en) * 1997-04-11 2000-04-04 Nec Research Institute, Inc. Maximum flow method for stereo correspondence
US6606406B1 (en) * 2000-05-04 2003-08-12 Microsoft Corporation System and method for progressive stereo matching of digital images
KR20030029933A (en) * 2001-07-06 2003-04-16 코닌클리케 필립스 일렉트로닉스 엔.브이. Methods of and units for motion or depth estimation and image processing apparatus provided with such motion estimation unit
CN1851752A (en) * 2006-03-30 2006-10-25 东南大学 Dual video camera calibrating method for three-dimensional reconfiguration system
CN100592338C (en) * 2008-02-03 2010-02-24 四川虹微技术有限公司 Multi-visual angle video image depth detecting method and depth estimating method

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009097714A1 (en) * 2008-02-03 2009-08-13 Panovasic Technology Co., Ltd. Depth searching method and depth estimating method for multi-viewing angle video image
CN102667854B (en) * 2009-11-12 2016-04-20 佳能株式会社 Method for three-dimensional measurement
CN102667854A (en) * 2009-11-12 2012-09-12 佳能株式会社 Three-dimensional measurement method
US9418435B2 (en) 2009-11-12 2016-08-16 Canon Kabushiki Kaisha Three-dimensional measurement method
CN101710423B (en) * 2009-12-07 2012-01-04 青岛海信网络科技股份有限公司 Matching search method for stereo image
CN102411474A (en) * 2010-09-20 2012-04-11 Lg电子株式会社 Mobile terminal and method of controlling operation of the same
US9456205B2 (en) 2010-09-20 2016-09-27 Lg Electronics Inc. Mobile terminal and method of controlling the operation of the mobile terminal
US9747524B2 (en) 2014-02-28 2017-08-29 Ricoh Company, Ltd. Disparity value deriving device, equipment control system, movable apparatus, and robot
EP2924655A3 (en) * 2014-02-28 2016-01-06 Ricoh Company, Ltd. Disparity value deriving device, equipment control system, movable apparatus, robot, disparity value deriving method, and computer-readable storage medium
CN104883553A (en) * 2014-02-28 2015-09-02 株式会社理光 Disparity Value Deriving Device, Equipment Control System, Movable Apparatus, Robot, Disparity Value Deriving Method, And Computer-readable Storage Medium
CN105279736A (en) * 2014-07-21 2016-01-27 由田新技股份有限公司 Method and system for generating depth image
CN109325992A (en) * 2018-10-19 2019-02-12 珠海金山网络游戏科技有限公司 Image drawing method and device calculate equipment and storage medium
CN109325992B (en) * 2018-10-19 2023-07-04 珠海金山数字网络科技有限公司 Image drawing method and device, computing device and storage medium
CN111476835B (en) * 2020-05-21 2021-08-10 中国科学院自动化研究所 Unsupervised depth prediction method, system and device for consistency of multi-view images
CN111476835A (en) * 2020-05-21 2020-07-31 中国科学院自动化研究所 Unsupervised depth prediction method, system and device for consistency of multi-view images
CN113112551A (en) * 2021-04-21 2021-07-13 阿波罗智联(北京)科技有限公司 Camera parameter determination method and device, road side equipment and cloud control platform
CN113112551B (en) * 2021-04-21 2023-12-19 阿波罗智联(北京)科技有限公司 Camera parameter determining method and device, road side equipment and cloud control platform
CN113486928A (en) * 2021-06-16 2021-10-08 武汉大学 Multi-view image alignment method based on rational polynomial model differentiable tensor expression
CN113486928B (en) * 2021-06-16 2022-04-12 武汉大学 Multi-view image alignment method based on rational polynomial model differentiable tensor expression
CN113538318A (en) * 2021-08-24 2021-10-22 北京奇艺世纪科技有限公司 Image processing method, image processing device, terminal device and readable storage medium
CN113538318B (en) * 2021-08-24 2023-12-15 北京奇艺世纪科技有限公司 Image processing method, device, terminal equipment and readable storage medium

Also Published As

Publication number Publication date
WO2009097714A1 (en) 2009-08-13
CN100592338C (en) 2010-02-24

Similar Documents

Publication Publication Date Title
CN100592338C (en) Multi-visual angle video image depth detecting method and depth estimating method
Wang et al. Mvdepthnet: Real-time multiview depth estimation neural network
CN106920259B (en) positioning method and system
CN103796004B (en) A kind of binocular depth cognitive method of initiating structure light
CN101933335B (en) Method and system for converting 2d image data to stereoscopic image data
US8311089B2 (en) Multi-view video compression coding method and apparatus
Kang et al. An efficient image rectification method for parallel multi-camera arrangement
US20020106120A1 (en) Method of analyzing in real time the correspondence of image characteristics in corresponding video images
CN102254348B (en) Virtual viewpoint mapping method based o adaptive disparity estimation
US20070081716A1 (en) 3D image processing apparatus and method
CN101222647B (en) Scene global depth estimation method for multi-vision angle video image
Zhang et al. Stereoscopic video synthesis from a monocular video
CN102985949B (en) Background pixel is used to expand the multi views rendering apparatus with the preferential Block-matching of background and method
CN104427324A (en) Parallax error calculation method and three-dimensional matching device thereof
Hsu et al. Spatio-temporally consistent view synthesis from video-plus-depth data with global optimization
Yang et al. Automatic and continuous projector display surface estimation using everyday imagery
JP2000253422A (en) Method for generating three-dimensionall image from two-dimensional image
Chen et al. Calibration for high-definition camera rigs with marker chessboard
Stathopoulou et al. Multi view stereo with semantic priors
Gao et al. Design of signal processing pipeline for stereoscopic cameras
Yang et al. Automatic and continuous projector display surface calibration using every-day imagery
Knorr et al. An image-based rendering (ibr) approach for realistic stereo view synthesis of tv broadcast based on structure from motion
Wang et al. Block-based depth maps interpolation for efficient multiview content generation
Kurz et al. Bundle adjustment for stereoscopic 3d
Gurrieri et al. Efficient panoramic sampling of real-world environments for image-based stereoscopic telepresence

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100224

Termination date: 20160203