A depth extraction method fusing motion information and geometric information
Technical field
The present invention relates to depth extraction methods for converting two-dimensional video into three-dimensional video, and in particular to a depth extraction method that fuses motion information and geometric information.
Background art
Since the invention of television in the 1940s, the medium has undergone two revolutions: first from black-and-white to color television, then from analog to digital television. 3DTV will be the third revolution in television technology after digital television.
3DTV display cannot do without suitable content. Conventional two-dimensional video sources cannot be used directly by three-dimensional display systems, so 3D content satisfying the display requirements must be produced. One approach is to shoot 3D video directly with a stereoscopic camera, but this is expensive. Another is to find a suitable algorithm that converts original 2D video into 3D video usable for stereoscopic display. Given the large body of existing 2D video, research on 2D-to-3D conversion technology is of great practical significance: it can supply abundant material for stereoscopic display while greatly reducing the cost of content production.
The 2D-to-3D conversion process mainly comprises:
(1) using various depth cues to produce a dense depth map sequence from the original 2D video;
(2) using DIBR (depth-image-based rendering) to reconstruct multiple 2D video streams from one 2D video stream and its corresponding depth map sequence;
(3) using an image composition algorithm to synthesize the generated streams into one 3D video stream, which is fed to a 3D display device for display.
Accurately producing a dense depth map for DIBR reconstruction is the first step of the whole 2D-to-3D conversion process, and it is also the core subject of 2D-to-3D conversion research.
By analyzing human physiological and psycho-visual perception, researchers have identified many depth cues: object motion, scene geometry, object surface texture and shape, shading, object edge information, and the focus/defocus state of the capturing camera. Any single depth cue can only handle one class of 2D video; no universally applicable depth cue exists. However, because most videos contain moving foreground objects, object motion information combined with the temporal correlation of video can effectively produce a dense depth map, so motion information generally serves as the principal depth cue in depth map generation. On the other hand, in certain scenes the objects do not move, and other depth cues must substitute for the motion cue. Whether indoors or outdoors, scenes usually contain abundant geometric information, such as an outdoor road extending into the distance or the regular wall lines of a room. Scene geometry therefore complements object motion information well in depth map generation, especially when the scene is static.
Most existing depth map generation algorithms use a single depth cue to handle a single class of video, whereas real video often contains multiple depth cues. Fusing these cues to produce one depth map effectively improves depth map quality.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by fusing multiple depth cues, providing a depth extraction method that fuses motion information and geometric information.
The depth extraction method fusing motion information and geometric information comprises the following steps:
(1) build a statistical background model of the two-dimensional video to be converted and, through moving-object detection, separate the static background part from the moving foreground part;
(2) binarize the separated static background and moving foreground images, then apply median filtering and mathematical morphology filtering;
(3) from the background image obtained by statistical background modeling, produce a geometric depth map using a method based on scene geometric information;
(4) find matches between temporally adjacent frames of the original two-dimensional video to be converted, obtaining the motion vectors of the moving foreground objects, and convert them into motion amplitudes;
(5) apply a linear transform to the motion amplitude according to the position of the moving foreground part, obtaining a motion depth map;
(6) fuse the motion depth map and the geometric depth map, then apply Gaussian filtering to obtain the final depth map, which is used for the representation of the three-dimensional video.
Building the statistical background model of the two-dimensional video to be converted and separating the static background part from the moving foreground part by moving-object detection comprises the steps:
(a) capture N consecutive frames I_f(x, y) from the two-dimensional video file to be converted, scan them in time and space, and take the mean of the N pixel values at each pixel coordinate as the value of the background image B(x, y) at that position:

B(x, y) = (1/N)·Σ_{f=1}^{N} I_f(x, y)

(b) subtract the background image B(x, y) obtained in step (a) from each frame I_f(x, y) of the two-dimensional video to be converted, and determine the foreground points by comparison with a preset threshold th: a pixel (x, y) is a foreground point when |I_f(x, y) − B(x, y)| > th.
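Steps (a) and (b) can be sketched in a few lines of Python. This is an illustrative sketch only; the function names and the NumPy formulation are ours, not part of the claimed method:

```python
import numpy as np

def background_model(frames):
    # B(x, y): mean over the N frames at each pixel position
    return np.mean(frames, axis=0)

def foreground_mask(frame, background, th):
    # a pixel is foreground where |I_f(x, y) - B(x, y)| > th
    return np.abs(frame.astype(float) - background) > th
```

With a stack of N frames of shape (N, H, W), `background_model` yields the H × W background image and `foreground_mask` yields a boolean foreground map for any frame.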
Binarizing the separated static background and moving foreground images and applying median filtering and mathematical morphology filtering comprises the steps:
(c) using the foreground/background decision made in step (b), produce a binary image A_f(x, y) for each frame I_f(x, y) of the video, in which the value 0 represents background and the value 255 represents foreground, that is, A_f(x, y) = 255 if (x, y) is a foreground point and A_f(x, y) = 0 otherwise;
(d) apply a median filter with a 3 × 3 window to the binary image obtained in step (c) to eliminate background noise;
(e) apply the classical opening and closing operations of mathematical morphology filtering to the median-filtered image from step (d) to eliminate small active regions and holes in the foreground image; a 3 × 3 square structuring element is used for the erosion and dilation that implement the opening and closing operations.
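A sketch of steps (c)-(e), using SciPy's ndimage routines for the median filter and the morphological opening/closing. The choice of SciPy is an illustrative implementation decision, not mandated by the method:

```python
import numpy as np
from scipy import ndimage

def clean_mask(fg):
    """Steps (c)-(e): binarize to 0/255, 3x3 median filter, then morphological
    opening and closing with a 3x3 square structuring element."""
    binary = np.where(fg, 255, 0).astype(np.uint8)
    filtered = ndimage.median_filter(binary, size=3)   # remove isolated noise
    square = np.ones((3, 3), dtype=bool)               # erosion/dilation element
    opened = ndimage.binary_opening(filtered > 0, structure=square)
    closed = ndimage.binary_closing(opened, structure=square)
    return np.where(closed, 255, 0).astype(np.uint8)
```

An isolated noise pixel disappears in the median step, while a compact foreground region survives opening and closing with only its thin protrusions and small holes removed.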
Producing a geometric depth map from the background image obtained by statistical background modeling, using a method based on scene geometric information, comprises the steps:
(f) perform edge detection on the luminance component B_y(x, y) of the background image obtained in step (a) with the Sobel operator, obtaining a horizontal gradient map S_x(x, y) and a vertical gradient map S_y(x, y); add the two maps to obtain the gradient map S(x, y) and compare it with a threshold Th to obtain the binarized background edge map. The threshold Th is computed as:

Th = α·[S(x, y)_max − S(x, y)_min] + S(x, y)_min

where α is a weighting coefficient with value between 0 and 1, S(x, y)_max is the maximum pixel value of the gradient map, and S(x, y)_min is the minimum pixel value of the gradient map;
(g) apply the classical Hough transform of image processing to the binarized edge map obtained in step (f) to extract its main straight lines; AND the result with the original binarized edge map to extract the vanishing lines of the background, and take the midpoint of the region where vanishing-line intersections occur with the greatest probability as the vanishing point;
(h) take the vanishing point obtained in step (g) as the deepest point of the background, and let the depth deepen gradually in arithmetic steps of 2 along the vanishing lines toward the vanishing point, obtaining the background geometric depth map G(x, y).
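Steps (f) and (h) can be illustrated as follows. The radial depth ramp and the convention that the vanishing point takes gray value 0 (deepest) are our simplifying assumptions; the claimed method deepens the depth along the extracted vanishing lines rather than radially:

```python
import numpy as np

def binarize_edges(S, alpha):
    # step (f): Th = alpha * (S_max - S_min) + S_min
    th = alpha * (S.max() - S.min()) + S.min()
    return S > th

def geometric_depth(h, w, vanishing_point):
    """Step (h) sketched as a radial ramp: the vanishing point is the deepest
    background point (gray value 0 under the convention assumed here), and
    the depth value grows with distance from it."""
    vy, vx = vanishing_point
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(ys - vy, xs - vx)
    return (255 * dist / dist.max()).astype(np.uint8)
```

With α near 1 only the strongest edges survive binarization, which is why the choice of α trades edge clarity against detail clutter.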
Finding matches between temporally adjacent frames of the original two-dimensional video to be converted, obtaining the motion vectors of the moving foreground objects, and converting them into motion amplitudes comprises the steps:
(i) scan the current picture frame I_f(x, y) of the two-dimensional video to be converted; according to the filtered foreground/background separation binary image obtained in step (e), if the current pixel is a foreground point, search for its best-matching pixel in the previous picture frame I_{f−1}(x, y). To improve matching accuracy, the matching cost is computed over a W × W neighborhood window centered on the current pixel. Let the match search range S_{N×N} be of size N × N, let u and v be the horizontal and vertical offsets of the candidate pixel in the previous frame relative to the pixel in the current frame, and let i and j be the horizontal and vertical offsets within the W × W neighborhood centered on the current pixel. The matching cost is defined as:

C(x, y; u, v) = Σ_{i,j} |I_f(x + i, y + j) − I_{f−1}(x + u + i, y + v + j)|

(x, y) ∈ foreground, (u, v) ∈ S_{N×N}, f = 1, 2, 3, …
Traverse every candidate pixel in the search range of the current pixel, compute the corresponding matching cost, and find the horizontal and vertical offsets with the smallest matching cost; these are taken as the motion vector of the current pixel, with horizontal component MV_x and vertical component MV_y, formulated as:

C_min(x, y; MV_x, MV_y) = Min[C(x, y; u, v)];
(j) let the horizontal component of the motion vector of each foreground pixel obtained in step (i) be MV_x(x, y) and the vertical component be MV_y(x, y); the motion amplitude is defined as:

MV(x, y) = √(MV_x(x, y)² + MV_y(x, y)²).
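The full-search matching of step (i) can be sketched as follows. The helper below is hypothetical (small default window and search range, SAD cost as defined above); a real implementation would vectorize the loops:

```python
import numpy as np

def motion_vector(cur, prev, y, x, w=1, n=2):
    """Full search over a (2n+1)x(2n+1) range S: the SAD matching cost is
    summed over a (2w+1)x(2w+1) neighbourhood window centred on (x, y), and
    the offset (u, v) with minimum cost is the motion vector (MV_x, MV_y)."""
    best_cost, best_mv = float("inf"), (0, 0)
    for v in range(-n, n + 1):
        for u in range(-n, n + 1):
            cost = 0.0
            for j in range(-w, w + 1):
                for i in range(-w, w + 1):
                    cost += abs(float(cur[y + j, x + i])
                                - float(prev[y + v + j, x + u + i]))
            if cost < best_cost:
                best_cost, best_mv = cost, (u, v)
    return best_mv
```

For a scene shifted one pixel to the right, the minimum-cost offset points one pixel to the left into the previous frame, i.e. (MV_x, MV_y) = (−1, 0).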
Applying a linear transform to the motion amplitude according to the position of the moving foreground part to obtain the motion depth map comprises the step:
(k) apply a linear transform and a downward rounding (floor) operation to the motion amplitude obtained in step (j), ensuring that the value of each foreground pixel of the motion depth map is an integer in the range [a, b]; the linear transform maps the smallest and largest motion amplitudes to a and b respectively:

M(x, y) = ⌊a + (b − a)·(MV(x, y) − MV_min)/(MV_max − MV_min)⌋

where the lower bound a and upper bound b of the linear transform both take values between 0 and 255; a takes the minimum depth value of the background geometric depth map region occluded by the moving foreground, and b takes the geometric depth value of the background at the lowest point of the moving foreground part.
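A sketch of step (k); the min-max normalization below is one plausible reading of the linear transform into [a, b], and the function name is ours:

```python
import numpy as np

def motion_depth(amplitude, a, b):
    """Map motion amplitudes linearly into [a, b] and round down, yielding
    integer motion-depth values for the foreground pixels."""
    lo, hi = amplitude.min(), amplitude.max()
    if hi == lo:                 # uniform motion: flat depth at the lower bound
        return np.full(amplitude.shape, a, dtype=int)
    return np.floor(a + (b - a) * (amplitude - lo) / (hi - lo)).astype(int)
```

Taking a and b from the geometric depths of the background around the foreground object, as the step prescribes, keeps the motion depth range consistent with the surrounding geometric depth map.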
Fusing the motion depth map and the geometric depth map and applying Gaussian filtering to obtain the final depth map comprises the steps:
(l) according to the filtered foreground/background separation binary image A(x, y) obtained in step (e), fuse the motion depth map M(x, y) obtained in step (k) with the geometric depth map G(x, y) obtained in step (h) to obtain the fused depth map D(x, y); the fusion formula is defined as: D(x, y) = M(x, y) where A(x, y) = 255, and D(x, y) = G(x, y) where A(x, y) = 0;
(m) apply Gaussian filtering to the fused depth map obtained in step (l) to obtain the final depth map, which is used for the representation of the three-dimensional video.
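Steps (l) and (m) reduce to a masked selection followed by a Gaussian blur. The sketch below assumes SciPy's `gaussian_filter` as the smoothing implementation (an illustrative choice; the method only requires Gaussian filtering):

```python
import numpy as np
from scipy import ndimage

def fuse_depth(A, M, G):
    # step (l): foreground pixels (A == 255) take the motion depth M,
    # background pixels take the geometric depth G
    return np.where(A == 255, M, G)

def final_depth(A, M, G, sigma=1.0):
    # step (m): Gaussian filtering smooths the fused map for DIBR
    return ndimage.gaussian_filter(fuse_depth(A, M, G).astype(float),
                                   sigma=sigma)
```

Since Gaussian filtering is a convex averaging, the smoothed map stays within the value range of the fused map while softening the foreground/background depth transitions.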
The present invention applies to producing depth maps from uncompressed video files with a dynamic foreground and a static background. Previous methods that produce depth maps from object motion information are either block-based, which causes blocking artifacts at object edges in the resulting depth map, or pixel-based, which smooths object edges better than block-based methods but suffers strong background noise from even slight changes in illumination. By combining moving-object detection with pixel-level depth map generation from object motion information, the present invention smooths object edges well while suppressing background noise, and computes motion vectors only for the moving foreground objects, greatly reducing the amount of computation. Furthermore, the invention overcomes the limitation of conventional techniques that use only a single depth cue to produce the depth map: by fusing motion information and scene geometric information, it widens the scope of application and improves depth map quality.
Description of drawings
Fig. 1 is the flow chart of fused depth map generation;
Fig. 2 (a) is a schematic diagram of a point in the image space for the Hough transform;
Fig. 2 (b) is a schematic diagram of a sinusoid in the parameter space for the Hough transform;
Fig. 2 (c) is a schematic diagram of a straight line in the image space for the Hough transform;
Fig. 2 (d) is a schematic diagram of multiple sinusoids in the parameter space intersecting at one point for the Hough transform;
Fig. 3 is a schematic diagram of motion estimation during motion vector computation;
Fig. 4 is a frame capture of the Hall Monitor video;
Fig. 5 is the background image obtained by modeling the Hall Monitor video;
Fig. 6 is the filtered foreground/background separation binary map corresponding to the video capture of Fig. 4;
Fig. 7 is the geometric depth map of the Fig. 5 background image;
Fig. 8 is the motion depth map corresponding to the video capture of Fig. 4;
Fig. 9 is the final depth map obtained by fusing the geometric depth map of Fig. 7 with the motion depth map of Fig. 8 and applying Gaussian filtering.
Embodiment
The depth extraction method fusing motion information and geometric information comprises the following steps (the overall flow chart is shown in Fig. 1):
(1) build a statistical background model of the two-dimensional video to be converted and, through moving-object detection, separate the static background part from the moving foreground part;
(2) binarize the separated static background and moving foreground images, then apply median filtering and mathematical morphology filtering;
(3) from the background image obtained by statistical background modeling, produce a geometric depth map using a method based on scene geometric information;
(4) find matches between temporally adjacent frames of the original two-dimensional video to be converted, obtaining the motion vectors of the moving foreground objects, and convert them into motion amplitudes;
(5) apply a linear transform to the motion amplitude according to the position of the moving foreground part, obtaining a motion depth map;
(6) fuse the motion depth map and the geometric depth map, then apply Gaussian filtering to obtain the final depth map, which is used for the representation of the three-dimensional video.
Building the statistical background model of the two-dimensional video to be converted and separating the static background part from the moving foreground part by moving-object detection comprises the steps:
(a) capture N consecutive frames I_f(x, y) from the two-dimensional video file to be converted, scan them in time and space, and take the mean of the N pixel values at each pixel coordinate as the value of the background image B(x, y) at that position:

B(x, y) = (1/N)·Σ_{f=1}^{N} I_f(x, y)

(b) subtract the background image B(x, y) obtained in step (a) from each frame I_f(x, y) of the two-dimensional video to be converted, and determine the foreground points by comparison with a preset threshold th: a pixel (x, y) is a foreground point when |I_f(x, y) − B(x, y)| > th.
Binarizing the separated static background and moving foreground images and applying median filtering and mathematical morphology filtering comprises the steps:
(c) using the foreground/background decision made in step (b), produce a binary image A_f(x, y) for each frame I_f(x, y) of the video, in which the value 0 represents background and the value 255 represents foreground, that is, A_f(x, y) = 255 if (x, y) is a foreground point and A_f(x, y) = 0 otherwise;
(d) apply a median filter with a 3 × 3 window to the binary image obtained in step (c) to eliminate background noise;
Median filtering of an image generally uses a sliding window containing an odd number of points, replacing the gray value at the window center with the median of the gray values in the window. A 3 × 3 window is adopted here: although a larger filter window removes noise more effectively, it over-smooths the image and erases detail in the moving regions, making subsequent processing more difficult.
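The sliding-window median just described can be written out directly. This is an illustrative pure-NumPy version operating on interior pixels only; border handling is omitted for brevity:

```python
import numpy as np

def median3x3(img):
    """3x3 median filter: each interior pixel is replaced by the median of
    the nine values in the sliding window centred on it."""
    out = img.copy()
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = np.median(img[y - 1:y + 2, x - 1:x + 2])
    return out
```

A single isolated noise pixel is outvoted by its eight neighbors and removed, which is exactly the behavior wanted on the binarized foreground mask.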
(e) apply the classical opening and closing operations of mathematical morphology filtering to the median-filtered image from step (d) to eliminate small active regions and holes in the foreground image; a 3 × 3 square structuring element is used for the erosion and dilation that implement the opening and closing operations.
Producing a geometric depth map from the background image obtained by statistical background modeling, using a method based on scene geometric information, comprises the steps:
(f) perform edge detection on the luminance component B_y(x, y) of the background image obtained in step (a) with the Sobel operator, obtaining a horizontal gradient map S_x(x, y) and a vertical gradient map S_y(x, y); add the two maps to obtain the gradient map S(x, y) and compare it with a threshold Th to obtain the binarized background edge map. The threshold Th is computed as:

Th = α·[S(x, y)_max − S(x, y)_min] + S(x, y)_min

where α is a weighting coefficient with value between 0 and 1, S(x, y)_max is the maximum pixel value of the gradient map, and S(x, y)_min is the minimum pixel value of the gradient map;
The smaller α is, the clearer the edges and details of the image; conversely, the larger α is, the blurrier they become. Clearer edges favor the extraction of the vanishing lines, while overly clear details work against it.
(g) apply the classical Hough transform of image processing to the binarized edge map obtained in step (f) to extract its main straight lines; AND the result with the original binarized edge map to extract the vanishing lines of the background, and take the midpoint of the region where vanishing-line intersections occur with the greatest probability as the vanishing point;
In essence, the Hough transform exploits the duality between points and lines. A point in the image space (Fig. 2 (a)) corresponds to a sinusoid in the (r, θ) parameter space (Fig. 2 (b)), and a straight line in the (x, y) image space (Fig. 2 (c)) corresponds to a point in the (r, θ) parameter space (Fig. 2 (d)). Since a straight line can be regarded as a set of points, a line in the image space corresponds to a family of sinusoids in the parameter space, and the coordinates of their common intersection are exactly the parameters of that line in the image space.
In implementation, the number of curves passing through each point of the parameter space is accumulated; if an accumulated value exceeds a threshold, the corresponding intersection coordinates in the parameter space are taken to characterize a straight line in the image space.
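The voting scheme just described can be sketched as follows. This is illustrative only; the accumulator resolution and vote threshold are arbitrary choices of ours:

```python
import numpy as np

def hough_lines(edge_points, n_theta=180, vote_threshold=4):
    """Each edge point (y, x) votes along its sinusoid r = x*cos(t) + y*sin(t)
    in the (r, theta) parameter space; accumulator cells whose vote count
    reaches the threshold correspond to straight lines in the image."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    r_max = int(np.ceil(max(np.hypot(y, x) for y, x in edge_points))) + 1
    acc = np.zeros((2 * r_max + 1, n_theta), dtype=int)
    for y, x in edge_points:
        r = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[r + r_max, np.arange(n_theta)] += 1
    rows, cols = np.where(acc >= vote_threshold)
    return [(r - r_max, thetas[t]) for r, t in zip(rows, cols)]
```

Four collinear points on the horizontal line y = 2 all vote for the cell (r = 2, θ = π/2), so that cell reaches the threshold and the line is detected.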
(h) take the vanishing point obtained in step (g) as the deepest point of the background, and let the depth deepen gradually in arithmetic steps of 2 along the vanishing lines toward the vanishing point, obtaining the background geometric depth map G(x, y).
Finding matches between temporally adjacent frames of the original two-dimensional video to be converted, obtaining the motion vectors of the moving foreground objects, and converting them into motion amplitudes comprises the steps:
(i) scan the current picture frame I_f(x, y) of the two-dimensional video to be converted; according to the filtered foreground/background separation binary image obtained in step (e), if the current pixel is a foreground point, search for its best-matching pixel in the previous picture frame I_{f−1}(x, y). To improve matching accuracy, the matching cost is computed over a W × W neighborhood window centered on the current pixel, as shown in Fig. 3. Let the match search range S_{N×N} be of size N × N, let u and v be the horizontal and vertical offsets of the candidate pixel in the previous frame relative to the pixel in the current frame, and let i and j be the horizontal and vertical offsets within the W × W neighborhood centered on the current pixel. The matching cost is defined as:

C(x, y; u, v) = Σ_{i,j} |I_f(x + i, y + j) − I_{f−1}(x + u + i, y + v + j)|

(x, y) ∈ foreground, (u, v) ∈ S_{N×N}, f = 1, 2, 3, …
Traverse every candidate pixel in the search range of the current pixel, compute the corresponding matching cost, and find the horizontal and vertical offsets with the smallest matching cost; these are taken as the motion vector of the current pixel, with horizontal component MV_x and vertical component MV_y, formulated as:

C_min(x, y; MV_x, MV_y) = Min[C(x, y; u, v)];
(j) let the horizontal component of the motion vector of each foreground pixel obtained in step (i) be MV_x(x, y) and the vertical component be MV_y(x, y); the motion amplitude is defined as:

MV(x, y) = √(MV_x(x, y)² + MV_y(x, y)²).
Applying a linear transform to the motion amplitude according to the position of the moving foreground part to obtain the motion depth map comprises the step:
(k) apply a linear transform and a downward rounding (floor) operation to the motion amplitude obtained in step (j), ensuring that the value of each foreground pixel of the motion depth map is an integer in the range [a, b]; the linear transform maps the smallest and largest motion amplitudes to a and b respectively:

M(x, y) = ⌊a + (b − a)·(MV(x, y) − MV_min)/(MV_max − MV_min)⌋

where the lower bound a and upper bound b of the linear transform both take values between 0 and 255; a takes the minimum depth value of the background geometric depth map region occluded by the moving foreground, and b takes the geometric depth value of the background at the lowest point of the moving foreground part.
In three-dimensional space, objects at different depths moving at the same speed show different motion amplitudes in the two-dimensional image plane: objects at small depth (near the camera) show large amplitudes, and vice versa, so motion amplitude can be used to describe object depth. Since the motion depth map is a gray-scale map, the computed motion amplitude must first be linearly transformed and rounded down before it can describe the object depth. In addition, so that the motion depth map blends better into the geometric depth map, the bounds of the linear transform range are taken from the background depth values around the moving foreground object.
Fusing the motion depth map and the geometric depth map and applying Gaussian filtering to obtain the final depth map comprises the steps:
(l) according to the filtered foreground/background separation binary image A(x, y) obtained in step (e), fuse the motion depth map M(x, y) obtained in step (k) with the geometric depth map G(x, y) obtained in step (h) to obtain the fused depth map D(x, y); the fusion formula is defined as: D(x, y) = M(x, y) where A(x, y) = 255, and D(x, y) = G(x, y) where A(x, y) = 0;
(m) apply Gaussian filtering to the fused depth map obtained in step (l) to obtain the final depth map, which is used for the representation of the three-dimensional video.
DIBR virtual-view reconstruction requires a relatively smooth input depth map; therefore, the fused depth map obtained from the motion depth map and the geometric depth map is Gaussian-filtered once.
Embodiment:
(1) The Hall Monitor test stream, with an image resolution of 352 × 288, serves as the 2D video file from which the depth map is to be produced. Fig. 4 is a frame capture of the Hall Monitor video.
(2) Build the statistical background model and produce the background image. Fig. 5 is the background image obtained by modeling the Hall Monitor video.
(3) Detect the moving foreground of the Hall Monitor video by comparison with the background image, binarize the separated foreground and background, and apply median filtering and mathematical morphology filtering. Fig. 6 is the filtered foreground/background separation binary map corresponding to the video capture of Fig. 4.
(4) Produce the geometric depth map from the background image obtained by background modeling, using the method based on scene geometric information. Fig. 7 is the geometric depth map of the Fig. 5 background image.
(5) Compute the motion vectors of the foreground part of the Hall Monitor video file, convert them into motion amplitudes, and apply the linear transform according to the position of the foreground objects to obtain the motion depth map. Fig. 8 is the motion depth map corresponding to the video capture of Fig. 4.
(6) Fuse the motion depth map and the geometric depth map and apply Gaussian filtering to obtain the final depth map, used for the representation of the three-dimensional video. Fig. 9 is the final depth map obtained by fusing the geometric depth map of Fig. 7 with the motion depth map of Fig. 8 and applying Gaussian filtering.