CN102271268A - Multi-viewpoint three-dimensional video depth sequence generation method and device - Google Patents

Multi-viewpoint three-dimensional video depth sequence generation method and device

Info

Publication number
CN102271268A
Authority
CN
China
Prior art keywords
depth
viewpoint
pixel
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011102274356A
Other languages
Chinese (zh)
Other versions
CN102271268B (en)
Inventor
季向阳
刘琼
戴琼海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority application: CN 201110227435
Publication of application CN102271268A
Application granted
Publication of granted patent CN102271268B
Legal status: Active

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a method and a device for generating depth sequences of multi-viewpoint three-dimensional video. The method comprises the following steps: providing a multi-viewpoint three-dimensional video sequence, wherein the number of viewpoints of the sequence is N, N being an integer not less than 2; constructing a Bayesian model of the sequence according to the sequence, and determining, according to the model, a depth acquisition mode for each pixel in each frame; assigning depths to the corresponding pixels according to the depth acquisition modes to obtain a depth image of each frame, wherein the depth acquisition modes comprise temporal prediction, inter-view prediction and a depth computation method; and integrating all the depth images according to the viewpoints of the sequence and the image acquisition times to obtain a depth sequence. The method of the embodiments of the invention obtains high-accuracy depth maps at low computational complexity and effectively improves computational efficiency; the device has a simple structure and is easy to implement.

Description

Method and device for generating depth sequences of multi-viewpoint three-dimensional video
Technical field
The present invention relates to the fields of computer vision and video processing, and in particular to a method and a device for generating depth sequences of multi-viewpoint three-dimensional video.
Background art
Multi-viewpoint three-dimensional video is a current research focus in computer vision, graphics and image/video processing, and is widely used in film and television production, cultural relic protection, military simulation and other fields. Multi-viewpoint three-dimensional video data consist of multidimensional information such as color, texture, motion and depth. The presence of depth information gives multi-viewpoint three-dimensional video a strong sense of reality and immersion. However, obtaining high-accuracy dense depth maps of dynamic scenes currently requires enormous computational resources, which limits the wide application and development of multi-viewpoint three-dimensional video systems that need depth information. How to obtain dense depth map sequences efficiently is therefore a key problem to be solved urgently.
Disparity estimation is a process of stereo matching and is the prerequisite for multi-viewpoint three-dimensional video depth calculation. Current stereo matching methods fall into two classes: local window-based methods and global methods that solve an energy function. Window-based local methods have low computational complexity and high efficiency, but produce larger disparity errors and lower matching accuracy, making them unsuitable for applications with high accuracy requirements; energy-based global methods, which can obtain more accurate disparity values, have therefore received increasingly wide attention. The energy function in a global method is generally the sum of two terms: a data term, which measures the correspondence between individual primitives, and a smoothness term, which penalizes disparity jumps between adjacent primitives. Once the energy function has been constructed, an optimization algorithm can be used to minimize (or maximize) it and obtain the optimal disparity. The most successful optimization algorithms currently used in stereo matching are Graph Cut (GC) and Belief Propagation (BP).
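Although the patent does not write this energy function out, in the stereo-matching literature it typically takes the standard Markov random field form; as a hedged illustration in LaTeX notation:

    E(d) = \sum_{p} D_p(d_p) + \sum_{(p,q) \in \mathcal{N}} V_{p,q}(d_p, d_q)

where d assigns a disparity d_p to each pixel p, the data term D_p measures how well d_p agrees with the observed images, and the smoothness term V_{p,q} penalizes disparity jumps between neighboring pixels p and q in the neighborhood system \mathcal{N}.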
To obtain high-accuracy multi-viewpoint three-dimensional video depth sequences, existing methods usually apply global optimization and reach the optimal solution through many iterations. However, these methods not only have high computational complexity but also place high demands on hardware, raising hardware costs.
Summary of the invention
The present invention aims to solve at least one of the technical problems described above.
To this end, one object of the present invention is to propose a method for generating depth sequences of multi-viewpoint three-dimensional video. By effectively pre-judging how depth information can be obtained, the method reduces the amount of computation and improves computational efficiency. In addition, the method consumes few computational resources, places low demands on hardware, and saves hardware cost.
Another object of the present invention is to propose a device for generating depth sequences of multi-viewpoint three-dimensional video. By effectively pre-judging how depth information can be obtained, the device reduces the amount of computation, improves computational efficiency, consumes few computational resources, places low demands on hardware, and saves hardware cost. Moreover, the device has a simple structure and is easy to implement.
To achieve these goals, the method for generating a depth sequence of multi-viewpoint three-dimensional video proposed by embodiments of the first aspect of the present invention comprises the following steps: providing a multi-viewpoint three-dimensional video sequence, wherein the number of viewpoints of the sequence is N, N being an integer not less than 2; constructing a Bayesian model of the sequence according to the sequence, and determining, according to the Bayesian model, a depth acquisition mode for each pixel in each frame of the sequence; assigning depths to the corresponding pixels according to the depth acquisition modes to obtain a depth image of each frame, wherein the depth acquisition modes comprise a temporal prediction method, an inter-view prediction method and a depth computation method; and integrating all the depth images according to the viewpoints of the sequence and the acquisition times of the frames, to obtain the depth sequence corresponding to the sequence.
According to the method of the embodiments of the invention, the color video information of the multi-viewpoint three-dimensional video sequence provides correlation information over time and between viewpoints; combined with a dynamically updated Bayesian probability model, this allows the depth information to be effectively pre-judged, so that the number of global optimization iterations and the computational complexity are reduced. Furthermore, the depth maps obtained with the method preserve the numerical accuracy of the depth sequence while markedly reducing computational complexity and computation time, improving computational efficiency; and since the method occupies fewer computational resources than conventional methods, it places lower demands on hardware and thus reduces hardware cost.
In addition, the method for generating depth sequences of multi-viewpoint three-dimensional video according to the present invention may have the following additional technical features:
In one embodiment of the invention, the Bayesian model comprises: a temporal Bayesian model, established according to the acquisition times of the frames under the same viewpoint of the sequence; and an inter-view Bayesian model, established according to the relation among the N images of the sequence captured at the same moment from different viewpoints.
According to one embodiment of the present invention, the temporal Bayesian model is as follows:

P(T | f_t(x, y)) = P(f_t(x, y) | T) P(T) / [ P(f_t(x, y) | T) P(T) + P(f_t(x, y) | T̄) P(T̄) ]    (Formula 1)

P(T̄ | f_t(x, y)) = P(f_t(x, y) | T̄) P(T̄) / [ P(f_t(x, y) | T̄) P(T̄) + P(f_t(x, y) | T) P(T) ]    (Formula 2)

wherein T is the event that the depth value of pixel I_t(x, y) in frame I of the multi-viewpoint three-dimensional video sequence can be obtained by the temporal prediction method, P(T) is the prior probability of event T, T̄ is the complementary event that the depth value of I_t(x, y) cannot be obtained by the temporal prediction method, P(T̄) is the prior probability of T̄, f_t(x, y) is the temporal feature parameter of pixel I_t(x, y), (x, y) are the coordinates of pixel I_t(x, y) in frame I, t is the acquisition time of frame I, P(f_t(x, y) | T) is the distribution of f_t(x, y) when event T occurs, and P(f_t(x, y) | T̄) is the distribution of f_t(x, y) when event T̄ occurs;

the inter-view Bayesian model is as follows:

P(S | f_s(x, y)) = P(f_s(x, y) | S) P(S) / [ P(f_s(x, y) | S) P(S) + P(f_s(x, y) | S̄) P(S̄) ]    (Formula 3)

P(S̄ | f_s(x, y)) = P(f_s(x, y) | S̄) P(S̄) / [ P(f_s(x, y) | S̄) P(S̄) + P(f_s(x, y) | S) P(S) ]    (Formula 4)

wherein S is the event that the depth value of I_t(x, y) can be obtained by the inter-view prediction method, P(S) is the prior probability of event S, S̄ is the complementary event that the depth value of I_t(x, y) cannot be obtained by the inter-view prediction method, P(S̄) is the prior probability of S̄, f_s(x, y) is the inter-view feature parameter of pixel I_t(x, y), P(f_s(x, y) | S) is the distribution of f_s(x, y) when event S occurs, and P(f_s(x, y) | S̄) is the distribution of f_s(x, y) when event S̄ occurs.
According to one embodiment of the present invention, the temporal feature parameter f_t(x, y) of pixel I_t(x, y) is computed as follows:

f_t(x, y) = I_{t-1}(x, y) − I_t(x, y)    (Formula 5)

wherein I_{t-1}(x, y) is the pixel with coordinates (x, y) in frame I_{t-1}.
According to one embodiment of the present invention, the inter-view feature parameter f_s(x, y) of pixel I_t(x, y) is computed as follows:

f_s(x, y) = σ(B_t)    (Formula 6)

wherein B_t is the element block of size (2m+1) × (2m+1) centered on ΔI_t(x, y), m is a positive integer greater than or equal to 1, σ(·) is the standard deviation operator, and ΔI_t(x, y) is the absolute difference between pixel I_t(x, y) of the current viewpoint and the corresponding pixel I′_t(x, y) of an adjacent viewpoint,

wherein the number of elements in block B_t equals the number of pixels contained in the (2m+1) × (2m+1) block centered at (x, y) in I_t, the elements correspond one-to-one with those pixels, and the value of each element is the disparity value of its corresponding pixel.
According to one embodiment of the present invention, it is judged according to Formula 1 and Formula 2 whether the following inequality holds:

P(f_t | T) P(T) > P(f_t | T̄) P(T̄)    (Formula 7)

If Formula 7 holds, the depth value of pixel I_t(x, y) is calculated with the temporal prediction method.

It is judged according to Formula 3 and Formula 4 whether the following inequality holds:

P(f_s | S) P(S) > P(f_s | S̄) P(S̄)    (Formula 8)

If Formula 8 holds, the depth value of pixel I_t(x, y) is calculated with the inter-view prediction method. By pre-judging the depth acquisition mode in this way, the number of iterations of the global optimization method can be reduced effectively, further lowering computational complexity.
According to one embodiment of the present invention, if neither Formula 7 nor Formula 8 holds, the depth value of pixel I_t(x, y) is obtained with the depth computation method. Again, pre-judging the depth acquisition mode effectively reduces the number of global optimization iterations and further lowers computational complexity.
According to one embodiment of the present invention, calculating the depth value of pixel I_t(x, y) with the temporal prediction method further comprises: obtaining, in frame I_{t-1}, the pixel I_{t-1}(x′, y′) corresponding to I_t(x, y); and assigning the depth value of pixel I_{t-1}(x′, y′) to pixel I_t(x, y).
According to one embodiment of the present invention, calculating the depth value of pixel I_t(x, y) with the inter-view prediction method further comprises: obtaining, in frame I′_t of the same moment but a different viewpoint, the pixel I′_t(x′, y′) corresponding to I_t(x, y); and assigning the depth value of pixel I′_t(x′, y′) to pixel I_t(x, y).
In one embodiment of the invention, the depth computation method comprises at least one of the Graph Cut image segmentation algorithm, the Belief Propagation algorithm and Dynamic Programming.
The device for generating a depth sequence of multi-viewpoint three-dimensional video according to embodiments of the second aspect of the present invention comprises: an image extraction module for providing a multi-viewpoint three-dimensional video sequence, wherein the number of viewpoints of the sequence is N, N being an integer not less than 2; a judgment module for constructing a Bayesian model of the sequence according to the sequence and determining, according to the Bayesian model, a depth acquisition mode for each pixel in each frame of the sequence; a computation module for assigning depths to the corresponding pixels according to the depth acquisition modes to obtain the depth image of each frame, wherein the depth acquisition modes comprise a temporal prediction method, an inter-view prediction method and a depth computation method; and an integration module for integrating all the depth images according to the viewpoints of the sequence and the acquisition times of the frames, to obtain the depth sequence corresponding to the sequence.
According to the device of the embodiments of the invention, the color video information of the multi-viewpoint three-dimensional video sequence provides correlation information over time and between viewpoints; combined with a dynamically updated Bayesian probability model, this allows the depth information to be effectively pre-judged, so that the number of global optimization iterations and the computational complexity are reduced. Furthermore, the depth maps obtained with the device preserve the numerical accuracy of the depth sequence while markedly reducing computational complexity and computation time, improving computational efficiency; and since the device occupies fewer computational resources than conventional methods, it places lower demands on hardware and thus reduces hardware cost. Moreover, the device has a simple structure and is easy to implement.
In addition, the device for generating depth sequences of multi-viewpoint three-dimensional video according to the present invention may have the following additional technical features:
In one embodiment of the invention, the Bayesian model comprises: a temporal Bayesian model, established according to the acquisition times of the frames under the same viewpoint of the sequence; and an inter-view Bayesian model, established according to the relation among the N images of the sequence captured at the same moment from different viewpoints.
According to one embodiment of the present invention, the temporal Bayesian model is:

P(T | f_t(x, y)) = P(f_t(x, y) | T) P(T) / [ P(f_t(x, y) | T) P(T) + P(f_t(x, y) | T̄) P(T̄) ]    (Formula 1)

P(T̄ | f_t(x, y)) = P(f_t(x, y) | T̄) P(T̄) / [ P(f_t(x, y) | T̄) P(T̄) + P(f_t(x, y) | T) P(T) ]    (Formula 2)

wherein T is the event that the depth value of pixel I_t(x, y) in frame I of the multi-viewpoint three-dimensional video sequence can be obtained by the temporal prediction method, P(T) is the prior probability of event T, T̄ is the complementary event that the depth value of I_t(x, y) cannot be obtained by the temporal prediction method, P(T̄) is the prior probability of T̄, f_t(x, y) is the temporal feature parameter of pixel I_t(x, y), (x, y) are the coordinates of pixel I_t(x, y) in frame I, t is the acquisition time of frame I, P(f_t(x, y) | T) is the distribution of f_t(x, y) when event T occurs, and P(f_t(x, y) | T̄) is the distribution of f_t(x, y) when event T̄ occurs;

the inter-view Bayesian model is:

P(S | f_s(x, y)) = P(f_s(x, y) | S) P(S) / [ P(f_s(x, y) | S) P(S) + P(f_s(x, y) | S̄) P(S̄) ]    (Formula 3)

P(S̄ | f_s(x, y)) = P(f_s(x, y) | S̄) P(S̄) / [ P(f_s(x, y) | S̄) P(S̄) + P(f_s(x, y) | S) P(S) ]    (Formula 4)

wherein S is the event that the depth value of I_t(x, y) can be obtained by the inter-view prediction method, P(S) is the prior probability of event S, S̄ is the complementary event that the depth value of I_t(x, y) cannot be obtained by the inter-view prediction method, P(S̄) is the prior probability of S̄, f_s(x, y) is the inter-view feature parameter of pixel I_t(x, y), P(f_s(x, y) | S) is the distribution of f_s(x, y) when event S occurs, and P(f_s(x, y) | S̄) is the distribution of f_s(x, y) when event S̄ occurs.
According to one embodiment of the present invention, the temporal feature parameter f_t(x, y) of pixel I_t(x, y) is computed as follows:

f_t(x, y) = I_{t-1}(x, y) − I_t(x, y)    (Formula 5)

wherein I_{t-1}(x, y) is the pixel with coordinates (x, y) in frame I_{t-1};

and the inter-view feature parameter f_s(x, y) of pixel I_t(x, y) is computed as follows:

f_s(x, y) = σ(B_t)    (Formula 6)

wherein B_t is the element block of size (2m+1) × (2m+1) centered on ΔI_t(x, y), m is a positive integer greater than or equal to 1, σ(·) is the standard deviation operator, and ΔI_t(x, y) is the absolute difference between pixel I_t(x, y) of the current viewpoint and the corresponding pixel I′_t(x, y) of an adjacent viewpoint,

wherein the number of elements in block B_t equals the number of pixels contained in the (2m+1) × (2m+1) block centered at (x, y) in I_t, the elements correspond one-to-one with those pixels, and the value of each element is the disparity value of its corresponding pixel.
According to one embodiment of the present invention, it is judged according to Formula 1 and Formula 2 whether the inequality P(f_t | T) P(T) > P(f_t | T̄) P(T̄) (Formula 7) holds; if it holds, the depth value of pixel I_t(x, y) is calculated with the temporal prediction method. It is judged according to Formula 3 and Formula 4 whether the inequality P(f_s | S) P(S) > P(f_s | S̄) P(S̄) (Formula 8) holds; if it holds, the depth value of pixel I_t(x, y) is calculated with the inter-view prediction method.
According to one embodiment of the present invention, if neither Formula 7 nor Formula 8 holds, the depth value of pixel I_t(x, y) is obtained with the depth computation method.
According to one embodiment of the present invention, the temporal prediction method calculates the depth value of pixel I_t(x, y) by obtaining, in frame I_{t-1}, the pixel I_{t-1}(x′, y′) corresponding to I_t(x, y) and assigning the depth value of I_{t-1}(x′, y′) to I_t(x, y); the inter-view prediction method calculates the depth value of I_t(x, y) by obtaining, in frame I′_t of the same moment but a different viewpoint, the pixel I′_t(x′, y′) corresponding to I_t(x, y) and assigning the depth value of I′_t(x′, y′) to I_t(x, y).
In one embodiment of the present invention, the computation module computes depth values with the depth computation method, wherein the depth computation method comprises at least one of the Graph Cut image segmentation algorithm, the Belief Propagation algorithm and Dynamic Programming.
Additional aspects and advantages of the present invention are given in part in the following description; in part they will become obvious from the description, or may be learned through practice of the invention.
Description of drawings
The above and/or additional aspects and advantages of the present invention will become obvious and easy to understand from the following description of embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow chart of the method for generating depth sequences of multi-viewpoint three-dimensional video according to an embodiment of the invention;
Fig. 2A-2B compare the depth maps obtained with a conventional method and with the method of one embodiment of the invention, respectively;
Fig. 3 compares the computation times of the two methods of Fig. 2A-2B;
Fig. 4A-4B compare the depth maps obtained with a conventional method and with the method of another embodiment of the invention, respectively;
Fig. 5 compares the computation times of the two methods of Fig. 4A-4B; and
Fig. 6 is a structural diagram of the device for generating depth sequences of multi-viewpoint three-dimensional video according to an embodiment of the invention.
Embodiment
Embodiments of the invention are described in detail below; examples of the embodiments are shown in the drawings, in which identical or similar reference numbers throughout denote identical or similar elements, or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary, are only used to explain the present invention, and cannot be interpreted as limiting the present invention.
In the description of the invention, it should be understood that terms indicating orientation or position, such as "center", "longitudinal", "lateral", "upper", "lower", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer", are based on the orientations or positional relations shown in the drawings, are only intended to facilitate and simplify the description of the invention, and do not indicate or imply that the devices or elements referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be interpreted as limiting the invention. In addition, the terms "first" and "second" are used for descriptive purposes only and cannot be interpreted as indicating or implying relative importance.
It should be noted that, unless otherwise clearly specified and limited, the terms "installed", "linked" and "connected" should be understood broadly: a connection may, for example, be fixed, detachable or integral; mechanical or electrical; direct, indirect through an intermediary, or internal between two elements. For those of ordinary skill in the art, the specific meanings of the above terms in the present invention can be understood according to the specific situation.
It should be noted that the method and device of the embodiments of the invention first calculate the depth map of one frame of the multi-viewpoint three-dimensional video sequence with an existing depth computation method; this frame's depth image serves as the initial depth image, thereby providing the necessary starting point for the temporal and inter-view computation methods involved in the present invention.
The method for generating depth sequences of multi-viewpoint three-dimensional video according to an embodiment of the invention is first described below in conjunction with Figs. 1-5.
As shown in Fig. 1, the method comprises the following steps.
Step S101: provide a multi-viewpoint three-dimensional video sequence, wherein the number of viewpoints of the sequence is N, N being an integer not less than 2. For example, the multi-viewpoint three-dimensional video sequence may be captured by N video cameras or N cameras, or generated with dedicated computer tools. To ensure that the multi-viewpoint video can be viewed stereoscopically, the number of capture viewpoints should be at least 2.
Step S102: construct the Bayesian model of the multi-viewpoint three-dimensional video sequence according to the sequence, and determine, according to the Bayesian model, the depth acquisition mode of each pixel in each frame of the sequence.
In some examples of the invention, the Bayesian model comprises a temporal Bayesian model and an inter-view Bayesian model.
The temporal Bayesian model is established according to the acquisition times of the frames under the same viewpoint of the sequence. A viewpoint corresponds to the continuous image sequence captured by the same video camera or camera, for example a sequence of 150 to 180 images.
Correspondingly, in another embodiment of the invention, the temporal Bayesian model is established as follows:

P(T | f_t(x, y)) = P(f_t(x, y) | T) P(T) / [ P(f_t(x, y) | T) P(T) + P(f_t(x, y) | T̄) P(T̄) ]    (Formula 1)

P(T̄ | f_t(x, y)) = P(f_t(x, y) | T̄) P(T̄) / [ P(f_t(x, y) | T̄) P(T̄) + P(f_t(x, y) | T) P(T) ]    (Formula 2)

For Formula 1 and Formula 2, the meanings of the letters and symbols are as follows: T is the event that the depth value of pixel I_t(x, y) in frame I of the multi-viewpoint three-dimensional video sequence can be obtained by the temporal prediction method; P(T) is the prior probability of event T; T̄ is the complementary event that the depth value of I_t(x, y) cannot be obtained by the temporal prediction method, and P(T̄) is its prior probability; f_t(x, y) is the temporal feature parameter of pixel I_t(x, y), where (x, y) are the coordinates of the pixel in frame I and t is the acquisition time of frame I; P(f_t(x, y) | T) is the distribution of f_t(x, y) when event T occurs, and P(f_t(x, y) | T̄) is the distribution of f_t(x, y) when event T̄ occurs.
The inter-view Bayesian model is established according to the relation among the N images of the sequence captured at the same moment from different viewpoints. The multi-viewpoint three-dimensional video sequence is captured by N cameras, so at each moment N viewpoint images of the scene are collected.
Correspondingly, the inter-view Bayesian model can, for example, be established as follows:

P(S | f_s(x, y)) = P(f_s(x, y) | S) P(S) / [ P(f_s(x, y) | S) P(S) + P(f_s(x, y) | S̄) P(S̄) ]    (Formula 3)

P(S̄ | f_s(x, y)) = P(f_s(x, y) | S̄) P(S̄) / [ P(f_s(x, y) | S̄) P(S̄) + P(f_s(x, y) | S) P(S) ]    (Formula 4)

wherein S is the event that the depth value of I_t(x, y) can be obtained by the inter-view prediction method, and P(S) is its prior probability; S̄ is the complementary event that the depth value of I_t(x, y) cannot be obtained by the inter-view prediction method, and P(S̄) is its prior probability; f_s(x, y) is the inter-view feature parameter of pixel I_t(x, y); P(f_s(x, y) | S) is the distribution of f_s(x, y) when event S occurs, and P(f_s(x, y) | S̄) is the distribution of f_s(x, y) when event S̄ occurs.
Thus, by establishing the inter-view and temporal Bayesian models, the probability distributions P(T | f_t(x, y)) and P(S | f_s(x, y)) can be predicted.
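As an illustrative sketch only (not part of the patent text), the posteriors of Formula 1 and Formula 3 can be evaluated per pixel once the conditional distributions and priors have been estimated; the Gaussian likelihoods and the prior value below are assumptions made for the example:

    import numpy as np

    def posterior(feature, pdf_event, pdf_not_event, prior):
        """Bayes posterior of Formula 1 / Formula 3: P(event | feature)."""
        num = pdf_event(feature) * prior
        den = num + pdf_not_event(feature) * (1.0 - prior)
        return num / den

    # Hypothetical zero-mean Gaussian likelihoods: a narrow one when the event
    # holds (small feature value), a broad one when it does not.
    def gaussian(sigma):
        return lambda f: np.exp(-f**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

    p_T = posterior(2.0, gaussian(3.0), gaussian(20.0), prior=0.5)  # P(T | f_t)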
In addition, in some embodiments of the invention, the temporal feature parameter f_t(x, y) of pixel I_t(x, y) can be obtained as follows:

f_t(x, y) = I_{t-1}(x, y) − I_t(x, y)    (Formula 5)

wherein I_{t-1}(x, y) is the pixel with coordinates (x, y) in frame I_{t-1}. The present invention is not limited to this, however; the parameter can also be computed in other ways.
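For illustration, a minimal sketch of Formula 5 on grayscale frames, assuming the frames are numpy arrays of equal shape (the function name is ours, not the patent's):

    import numpy as np

    def temporal_feature(frame_prev: np.ndarray, frame_cur: np.ndarray) -> np.ndarray:
        """Formula 5: f_t(x, y) = I_{t-1}(x, y) - I_t(x, y), for every pixel at once."""
        return frame_prev.astype(np.float64) - frame_cur.astype(np.float64)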
Moreover, the inter-view feature parameter f_s(x, y) of pixel I_t(x, y) can be obtained as follows:

f_s(x, y) = σ(B_t)    (Formula 6)

wherein B_t is the element block of size (2m+1) × (2m+1) centered on ΔI_t(x, y), m is a positive integer greater than or equal to 1, σ(·) is the standard deviation operator, and ΔI_t(x, y) is the absolute difference between pixel I_t(x, y) of the current viewpoint and the corresponding pixel I′_t(x, y) of an adjacent viewpoint. The number of elements in block B_t equals the number of pixels in the (2m+1) × (2m+1) block centered at (x, y) in I_t; the elements correspond one-to-one with those pixels, and the value of each element is the disparity value of its corresponding pixel. In this way, the occurrence probabilities of the different events are calculated from the computed feature parameters, and the event with the larger probability is selected as the decision result, which effectively avoids the risk of misjudgment and reduces the error rate, as described in the following steps.
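Again as a sketch under assumptions (the disparity map is available as a numpy array; the function name is ours), Formula 6 reduces to the standard deviation of the local disparity block:

    import numpy as np

    def interview_feature(disparity: np.ndarray, x: int, y: int, m: int = 1) -> float:
        """Formula 6: f_s(x, y) = sigma(B_t), the standard deviation of the
        (2m+1) x (2m+1) disparity block centered on pixel (x, y)."""
        block = disparity[max(y - m, 0): y + m + 1, max(x - m, 0): x + m + 1]
        return float(np.std(block))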
Step S103: assign depths to the corresponding pixels according to the determined depth acquisition modes to obtain the depth image of each frame, wherein the depth acquisition modes comprise the temporal prediction method, the inter-view prediction method and the depth computation method.
As a concrete example, the depth value is first judged with the computations of Formula 1, Formula 2 and Formula 5 of the embodiments above: if the inequality P(f_t | T) P(T) > P(f_t | T̄) P(T̄) (Formula 7) holds, the depth value of pixel I_t(x, y) is calculated with the temporal prediction method.
If Formula 7 does not hold, the judgment continues with the computations of Formula 3, Formula 4 and Formula 6 of the embodiments above: it is judged whether the inequality P(f_s | S) P(S) > P(f_s | S̄) P(S̄) (Formula 8) holds; if it holds, the depth value of pixel I_t(x, y) is calculated with the inter-view prediction method.
Otherwise, if neither Formula 7 nor Formula 8 holds, the depth value is computed with an existing depth computation method.
Further, the depth computation method may, for example, use one or several of the Graph Cut image segmentation algorithm, the Belief Propagation algorithm and Dynamic Programming.
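A minimal sketch of this three-way decision for one pixel follows; the Likelihoods container and its field names are our assumptions, and the "compute" branch stands for whichever existing Graph Cut / Belief Propagation / Dynamic Programming solver is used:

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Likelihoods:
        """Estimated conditional densities and priors (all assumed for the sketch)."""
        p_ft_T: Callable[[float], float]
        p_ft_notT: Callable[[float], float]
        p_fs_S: Callable[[float], float]
        p_fs_notS: Callable[[float], float]
        P_T: float
        P_S: float

    def depth_mode(f_t: float, f_s: float, lik: Likelihoods) -> str:
        """Three-way decision of step S103 via Formula 7 and Formula 8."""
        if lik.p_ft_T(f_t) * lik.P_T > lik.p_ft_notT(f_t) * (1 - lik.P_T):   # Formula 7
            return "temporal"
        if lik.p_fs_S(f_s) * lik.P_S > lik.p_fs_notS(f_s) * (1 - lik.P_S):   # Formula 8
            return "inter-view"
        return "compute"  # fall back to Graph Cut / Belief Propagation / DP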
In addition, the embodiments of the invention give one specific implementation of the temporal prediction method and of the inter-view prediction method, as follows.
The temporal prediction method comprises steps (1) and (2):
(1) obtain, in frame I_{t-1}, the pixel I_{t-1}(x′, y′) corresponding to I_t(x, y);
(2) assign the depth value of pixel I_{t-1}(x′, y′) to pixel I_t(x, y).
Of course, the embodiments of the invention are not limited to this; the depth can also be calculated in other ways.
The inter-view prediction method comprises steps (3) and (4):
(3) obtain, in frame I′_t of the same moment but a different viewpoint than frame I_t, the pixel I′_t(x′, y′) corresponding to I_t(x, y);
(4) assign the depth value of pixel I′_t(x′, y′) to pixel I_t(x, y).
It should be noted that, in steps (1) and (3), the pixels I_{t-1}(x′, y′) and I′_t(x′, y′) may be obtained, for example, with existing optical flow, motion estimation, window matching or feature point tracking methods; the embodiments of the invention place no restriction on which method is used.
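Sketching both prediction modes under the assumption that a dense correspondence map (from optical flow or any of the matching methods above) is already available; the array layout and function name are ours:

    import numpy as np

    def propagate_depth(src_depth: np.ndarray, corr_x: np.ndarray,
                        corr_y: np.ndarray) -> np.ndarray:
        """Steps (1)-(2) / (3)-(4): copy the depth of the corresponding pixel
        (x', y') in the source frame (the previous frame, or the same-moment
        adjacent view) into each pixel (x, y) of the current frame."""
        return src_depth[corr_y, corr_x]  # fancy indexing: one lookup per pixel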
In this way, by repeatedly traversing the above steps S101, S102 and S103 over every pixel of every frame, the depth image of each frame is obtained.
Step S104: integrate all the depth images according to the viewpoints of the multi-viewpoint three-dimensional video sequence and the acquisition times of the frames, to obtain the depth sequence corresponding to the sequence.
Specifically, the depth maps obtained are integrated by viewpoint and in temporal order until the required sequence is complete, yielding the depth sequence of the multi-viewpoint three-dimensional video.
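Step S104 amounts to indexing the per-frame depth maps by (viewpoint, time); a sketch with an assumed dictionary layout:

    def build_depth_sequence(depth_maps, n_views, n_frames):
        """Step S104: order per-frame depth maps by viewpoint and acquisition
        time. `depth_maps` maps (view, t) to one depth image (assumed layout)."""
        return [[depth_maps[(v, t)] for t in range(n_frames)]
                for v in range(n_views)]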
Fig. 2A-2B compare the depth maps obtained with a conventional method and with the method of one embodiment of the invention: Fig. 2A shows the depth map of a certain frame, here the 180th frame of the multi-viewpoint video sequence, obtained with the conventional method, and Fig. 2B shows the depth map of the same frame obtained with the method of the invention. As can be seen from the figures, the depth map obtained with the method of the invention has the advantage of high accuracy. Fig. 3 compares the computation times of the two methods of Fig. 2A-2B; as can be seen from Fig. 3, the method of the invention (310) takes noticeably less time than the conventional method (320), improving computational efficiency.
The comparison is repeated for the image of the 190th frame: Fig. 4A-4B compare the depth maps obtained with the conventional method and with the method of another embodiment of the invention, and Fig. 5 compares the computation times of the two methods of Fig. 4A-4B. Similarly, the method of the invention (510) takes less time than the conventional method (520). The method of the invention thus clearly reduces computation time while preserving accuracy.
According to the method of the embodiments of the invention, the color video information of the multi-viewpoint three-dimensional video sequence provides correlation information over time and between viewpoints; combined with a dynamically updated Bayesian probability model, this allows the depth information to be effectively pre-judged, so that the number of global optimization iterations and the computational complexity are reduced. Furthermore, the depth maps obtained with the method preserve the numerical accuracy of the depth sequence while markedly reducing computational complexity and computation time, improving computational efficiency; and since the method occupies fewer computational resources than conventional methods, it places lower demands on hardware and thus reduces hardware cost.
The device for generating depth sequences of multi-viewpoint three-dimensional video according to an embodiment of the invention is described below in conjunction with Fig. 6.
As shown in Fig. 6, the device 600 for generating a depth sequence of multi-viewpoint three-dimensional video according to an embodiment of the invention comprises: an image extraction module 610, a judgment module 620, a computation module 630 and an integration module 640.
Specifically, the image extraction module 610 provides a multi-viewpoint three-dimensional video sequence, wherein the number of viewpoints of the sequence is N, N being an integer not less than 2. For example, the sequence may be captured by N video cameras or N cameras, or generated with dedicated computer tools. To ensure that the multi-viewpoint video can be viewed stereoscopically, the number of capture viewpoints should be at least 2.
The judgment module 620 constructs the Bayesian model of the sequence according to the sequence and determines, according to the Bayesian model, the depth acquisition mode of each pixel in each frame of the sequence. In some examples of the invention, the Bayesian model comprises a temporal Bayesian model and an inter-view Bayesian model.
The temporal Bayesian model is established according to the acquisition times of the frames under the same viewpoint of the sequence. A viewpoint corresponds to the continuous image sequence captured by the same video camera or camera, for example a sequence of 150 to 180 images.
Correspondingly, in another embodiment of the invention, the temporal Bayesian model is established as follows:

P(T | f_t(x, y)) = P(f_t(x, y) | T) P(T) / [ P(f_t(x, y) | T) P(T) + P(f_t(x, y) | T̄) P(T̄) ]    (Formula 1)

P(T̄ | f_t(x, y)) = P(f_t(x, y) | T̄) P(T̄) / [ P(f_t(x, y) | T̄) P(T̄) + P(f_t(x, y) | T) P(T) ]    (Formula 2)

For Formula 1 and Formula 2, the meanings of the letters and symbols are as follows: T is the event that the depth value of pixel I_t(x, y) in frame I of the multi-viewpoint three-dimensional video sequence can be obtained by the temporal prediction method; P(T) is the prior probability of event T; T̄ is the complementary event that the depth value of I_t(x, y) cannot be obtained by the temporal prediction method, and P(T̄) is its prior probability; f_t(x, y) is the temporal feature parameter of pixel I_t(x, y), where (x, y) are the coordinates of the pixel in frame I and t is the acquisition time of frame I; P(f_t(x, y) | T) is the distribution of f_t(x, y) when event T occurs, and P(f_t(x, y) | T̄) is the distribution of f_t(x, y) when event T̄ occurs.
The inter-view Bayesian model is established according to the relation among the N images of the sequence captured at the same moment from different viewpoints. The multi-viewpoint three-dimensional video sequence is captured by N cameras, so at each moment N viewpoint images of the scene are collected.
Correspondingly, the inter-view Bayesian model can, for example, be established as follows:

P(S | f_s(x, y)) = P(f_s(x, y) | S) P(S) / [ P(f_s(x, y) | S) P(S) + P(f_s(x, y) | S̄) P(S̄) ]    (Formula 3)

P(S̄ | f_s(x, y)) = P(f_s(x, y) | S̄) P(S̄) / [ P(f_s(x, y) | S̄) P(S̄) + P(f_s(x, y) | S) P(S) ]    (Formula 4)

wherein S is the event that the depth value of I_t(x, y) can be obtained by the inter-view prediction method, and P(S) is its prior probability; S̄ is the complementary event that the depth value of I_t(x, y) cannot be obtained by the inter-view prediction method, and P(S̄) is its prior probability; f_s(x, y) is the inter-view feature parameter of pixel I_t(x, y); P(f_s(x, y) | S) is the distribution of f_s(x, y) when event S occurs, and P(f_s(x, y) | S̄) is the distribution of f_s(x, y) when event S̄ occurs.
Thus, by establishing the inter-view and temporal Bayesian models, the probability distributions P(T | f_t(x, y)) and P(S | f_s(x, y)) can be predicted.
In addition, in some embodiments of the invention, the temporal feature parameter f_t(x, y) of pixel I_t(x, y) can be obtained as follows:

f_t(x, y) = I_{t-1}(x, y) − I_t(x, y)    (Formula 5)

wherein I_{t-1}(x, y) is the pixel with coordinates (x, y) in frame I_{t-1}. The present invention is not limited to this, however; the parameter can also be computed in other ways.
Moreover, the inter-view feature parameter f_s(x, y) of pixel I_t(x, y) can be obtained as follows:

f_s(x, y) = σ(B_t)    (Formula 6)

wherein B_t is the element block of size (2m+1) × (2m+1) centered on ΔI_t(x, y), m is a positive integer greater than or equal to 1, σ(·) is the standard deviation operator, and ΔI_t(x, y) is the absolute difference between pixel I_t(x, y) of the current viewpoint and the corresponding pixel I′_t(x, y) of an adjacent viewpoint. The number of elements in block B_t equals the number of pixels in the (2m+1) × (2m+1) block centered at (x, y) in I_t, the elements correspond one-to-one with those pixels, and the value of each element is the disparity value of its corresponding pixel. In this way, the occurrence probabilities of the different events are calculated from the computed feature parameters, and the event with the larger probability is selected as the decision result, which effectively avoids the risk of misjudgment and reduces the error rate, as described in connection with the computation module 630.
The computation module 630 assigns depths to the corresponding pixels according to the determined depth acquisition modes to obtain the depth image of each frame, wherein the depth acquisition modes comprise the temporal prediction method, the inter-view prediction method and the depth computation method. As a concrete example, the depth value is first judged with the computations of Formula 1, Formula 2 and Formula 5 of the embodiments above: if Formula 7 holds, the depth value of pixel I_t(x, y) is calculated with the temporal prediction method.
If Formula 7 does not hold, the judgment continues with the computations of Formula 3, Formula 4 and Formula 6 of the embodiments above: it is judged whether Formula 8 holds; if it holds, the depth value of pixel I_t(x, y) is calculated with the inter-view prediction method.
Otherwise, if neither Formula 7 nor Formula 8 holds, the depth value is computed with an existing depth computation method.
Further, the depth computation method may, for example, use one or several of the Graph Cut image segmentation algorithm, the Belief Propagation algorithm and Dynamic Programming.
In addition, the embodiments of the invention give one specific implementation of the temporal prediction method and of the inter-view prediction method, as follows.
The temporal prediction method comprises steps (1) and (2):
(1) obtain, in frame I_{t-1}, the pixel I_{t-1}(x′, y′) corresponding to I_t(x, y);
(2) assign the depth value of pixel I_{t-1}(x′, y′) to pixel I_t(x, y).
Of course, the embodiments of the invention are not limited to this; the depth can also be calculated in other ways.
The inter-view prediction method comprises steps (3) and (4):
(3) obtain, in frame I′_t of the same moment but a different viewpoint than frame I_t, the pixel I′_t(x′, y′) corresponding to I_t(x, y);
(4) assign the depth value of pixel I′_t(x′, y′) to pixel I_t(x, y).
It should be noted that, in steps (1) and (3), the pixels I_{t-1}(x′, y′) and I′_t(x′, y′) may be obtained, for example, with existing optical flow, motion estimation, window matching or feature point tracking methods; the embodiments of the invention place no restriction on which method is used.
In this way, by repeatedly traversing the above steps over every pixel of every frame, the depth image of each frame is obtained.
The integration module 640 integrates all the depth images according to the viewpoints of the multi-viewpoint three-dimensional video sequence and the acquisition times of the frames, to obtain the depth sequence corresponding to the sequence.
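As an illustrative sketch only (method names beyond the module labels of Fig. 6 are our assumptions), the four modules of device 600 can be mirrored in code as a simple pipeline:

    class DepthSequenceGenerator:
        """Device 600: image extraction 610 -> judgment 620 -> computation 630
        -> integration 640 (a structural sketch, not the patented hardware)."""

        def __init__(self, extractor, judge, computer, integrator):
            self.extractor, self.judge = extractor, judge
            self.computer, self.integrator = computer, integrator

        def run(self):
            sequence = self.extractor.provide_sequence()               # module 610
            modes = self.judge.decide_modes(sequence)                  # module 620
            depth_maps = self.computer.assign_depths(sequence, modes)  # module 630
            return self.integrator.integrate(depth_maps)               # module 640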
According to the device of the embodiments of the invention, the color video information of the multi-viewpoint three-dimensional video sequence provides correlation information over time and between viewpoints; combined with a dynamically updated Bayesian probability model, this allows the depth information to be effectively pre-judged, so that the number of global optimization iterations and the computational complexity are reduced. Furthermore, the depth maps obtained with the device preserve the numerical accuracy of the depth sequence while markedly reducing computational complexity and computation time, improving computational efficiency; and since the device occupies fewer computational resources than conventional methods, it places lower demands on hardware and thus reduces hardware cost. Moreover, the device has a simple structure and is easy to implement.
In the description of this specification, reference terms such as "an embodiment", "some embodiments", "an example", "a concrete example" or "some examples" mean that a specific feature, structure, material or characteristic described in conjunction with the embodiment or example is contained in at least one embodiment or example of the present invention. In this specification, schematic statements of the above terms do not necessarily refer to the same embodiment or example, and the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the invention have been illustrated and described, those of ordinary skill in the art will appreciate that multiple variations, modifications, replacements and alterations can be made to these embodiments without departing from the principle and aim of the present invention; the scope of the invention is defined by the claims and their equivalents.

Claims (18)

1. A method for generating a depth sequence of multi-viewpoint three-dimensional video, characterized by comprising the following steps:
providing a multi-viewpoint three-dimensional video sequence, wherein the number of viewpoints of the sequence is N, N being an integer not less than 2;
constructing a Bayesian model of the sequence according to the sequence, and determining, according to the Bayesian model, a depth acquisition mode for each pixel in each frame of the sequence;
assigning depths to the corresponding pixels according to the depth acquisition modes to obtain a depth image of each frame, wherein the depth acquisition modes comprise a temporal prediction method, an inter-view prediction method and a depth computation method; and
integrating all the depth images according to the viewpoints of the sequence and the acquisition times of the frames, to obtain the depth sequence corresponding to the sequence.
2. The depth sequence generation method according to claim 1, characterized in that the Bayesian model comprises:
a temporal Bayesian model, established according to the acquisition times of the frames under the same viewpoint of the multi-viewpoint three-dimensional video sequence; and
an inter-view Bayesian model, established according to the relation among the N images of the sequence captured at the same moment from different viewpoints.
3. The depth sequence generation method according to claim 2, characterized in that the temporal Bayesian model is as follows:

P(T | f_t(x, y)) = P(f_t(x, y) | T) P(T) / [ P(f_t(x, y) | T) P(T) + P(f_t(x, y) | T̄) P(T̄) ]    (Formula 1)

P(T̄ | f_t(x, y)) = P(f_t(x, y) | T̄) P(T̄) / [ P(f_t(x, y) | T̄) P(T̄) + P(f_t(x, y) | T) P(T) ]    (Formula 2)

wherein T is the event that the depth value of pixel I_t(x, y) in frame I of the multi-viewpoint three-dimensional video sequence can be obtained by the temporal prediction method, P(T) is the prior probability of event T, T̄ is the complementary event that the depth value of I_t(x, y) cannot be obtained by the temporal prediction method, P(T̄) is the prior probability of T̄, f_t(x, y) is the temporal feature parameter of pixel I_t(x, y), (x, y) are the coordinates of pixel I_t(x, y) in frame I, t is the acquisition time of frame I, P(f_t(x, y) | T) is the distribution of f_t(x, y) when event T occurs, and P(f_t(x, y) | T̄) is the distribution of f_t(x, y) when event T̄ occurs;

and the inter-view Bayesian model is as follows:

P(S | f_s(x, y)) = P(f_s(x, y) | S) P(S) / [ P(f_s(x, y) | S) P(S) + P(f_s(x, y) | S̄) P(S̄) ]    (Formula 3)

P(S̄ | f_s(x, y)) = P(f_s(x, y) | S̄) P(S̄) / [ P(f_s(x, y) | S̄) P(S̄) + P(f_s(x, y) | S) P(S) ]    (Formula 4)

wherein S is the event that the depth value of I_t(x, y) can be obtained by the inter-view prediction method, P(S) is the prior probability of event S, S̄ is the complementary event that the depth value of I_t(x, y) cannot be obtained by the inter-view prediction method, P(S̄) is the prior probability of S̄, f_s(x, y) is the inter-view feature parameter of pixel I_t(x, y), P(f_s(x, y) | S) is the distribution of f_s(x, y) when event S occurs, and P(f_s(x, y) | S̄) is the distribution of f_s(x, y) when event S̄ occurs.
4. The depth sequence generation method according to claim 3, characterized in that the temporal feature parameter f_t(x,y) of pixel I_t(x,y) is computed as follows:

f_t(x,y) = I_{t-1}(x,y) - I_t(x,y) (Formula 5)

wherein I_{t-1}(x,y) is the pixel with coordinate (x,y) in frame I_{t-1}.
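A one-line sketch of Formula 5, under the assumption that the frames are grayscale numpy arrays (the array names are illustrative):

    import numpy as np

    def temporal_feature(frame_prev, frame_cur):
        # Formula 5: f_t(x, y) = I_{t-1}(x, y) - I_t(x, y), computed for every
        # pixel at once; signed differences need a signed dtype.
        return frame_prev.astype(np.int32) - frame_cur.astype(np.int32)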
5. The depth sequence generation method according to claim 3, characterized in that the inter-viewpoint feature parameter f_s(x,y) of pixel I_t(x,y) is computed as follows:

f_s(x,y) = \sigma(B) (Formula 6)

wherein B is the element block of size (2m+1) × (2m+1) centered on \Delta I_t(x,y), m is a positive integer greater than or equal to 1, \sigma(\cdot) is the standard-deviation operator, and \Delta I_t(x,y) is the absolute difference between the pixel I_t(x,y) of the current viewpoint and the corresponding pixel I'_t(x,y) of the adjacent viewpoint;
wherein the number of elements in the element block B equals the number of pixels contained in the block of size (2m+1) × (2m+1) centered on (x,y) in I_t, the elements correspond one to one with those pixels, and the value of each element is the disparity value of its corresponding pixel.
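A sketch of Formula 6, assuming a per-pixel disparity map for the current viewpoint is available and indexing rows by y and columns by x; the window clipping at image borders is an added assumption, not part of the claim:

    import numpy as np

    def interview_feature(disparity, x, y, m=1):
        # Formula 6: f_s(x, y) = sigma(B), with B the (2m+1) x (2m+1) block of
        # disparity values around pixel (x, y).
        h, w = disparity.shape
        block = disparity[max(0, y - m):min(h, y + m + 1),
                          max(0, x - m):min(w, x + m + 1)]
        return float(np.std(block))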
6. The depth sequence generation method according to claim 3, characterized in that it is judged according to Formula 1 and Formula 2 whether the following formula holds:

P(f_t \mid T)\,P(T) > P(f_t \mid \bar{T})\,P(\bar{T}) (Formula 7)

and if Formula 7 holds, the depth value of pixel I_t(x,y) is calculated by the time-domain prediction method;
it is judged according to Formula 3 and Formula 4 whether the following formula holds:

P(f_s \mid S)\,P(S) > P(f_s \mid \bar{S})\,P(\bar{S}) (Formula 8)

and if Formula 8 holds, the depth value of pixel I_t(x,y) is calculated by the inter-viewpoint prediction method.
7. The depth sequence generation method according to claim 6, characterized in that if neither Formula 7 nor Formula 8 holds, the depth value of pixel I_t(x,y) is obtained by the depth computing method.
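The comparisons of Formulas 7 and 8, together with the fallback of claim 7, reduce to the sketch below; the likelihood and prior inputs are assumptions carried over from the sketch after claim 3, and the complement priors use P(\bar{T}) = 1 - P(T) and P(\bar{S}) = 1 - P(S):

    def choose_depth_source(lik_t, lik_t_comp, prior_t, lik_s, lik_s_comp, prior_s):
        # Formula 7: prefer time-domain prediction when its evidence dominates.
        if lik_t * prior_t > lik_t_comp * (1.0 - prior_t):
            return "time-domain prediction"
        # Formula 8: otherwise try inter-viewpoint prediction.
        if lik_s * prior_s > lik_s_comp * (1.0 - prior_s):
            return "inter-viewpoint prediction"
        # Claim 7: fall back to the depth computing method.
        return "depth computation"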
8. The depth sequence generation method according to claim 6, characterized in that calculating the depth value of pixel I_t(x,y) by the time-domain prediction method further comprises:
obtaining, in frame I_{t-1}, the pixel I_{t-1}(x',y') corresponding to I_t(x,y); and
giving the depth value of pixel I_{t-1}(x',y') to pixel I_t(x,y).
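A sketch of this assignment, assuming the correspondence is located by a motion vector (dx, dy); the claim does not fix how the corresponding pixel is found, so the vector is an illustrative input:

    def temporal_predict(depth_prev, depth_cur, x, y, dx=0, dy=0):
        # Give pixel I_t(x, y) the depth of its correspondence I_{t-1}(x', y'),
        # with (x', y') = (x + dx, y + dy); arrays are indexed [row, col].
        depth_cur[y, x] = depth_prev[y + dy, x + dx]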
9. The depth sequence generation method according to claim 6, characterized in that calculating the depth value of pixel I_t(x,y) by the inter-viewpoint prediction method further comprises:
obtaining, in the frame I'_t of the same instant and a different viewpoint as frame I_t, the pixel I'_t(x',y') corresponding to I_t(x,y); and
giving the depth value of pixel I'_t(x',y') to pixel I_t(x,y).
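A sketch of the same assignment across viewpoints, assuming rectified views so that the correspondence is a horizontal disparity shift d; the rectification is an assumption of this example, not part of the claim:

    def interview_predict(depth_neighbor, depth_cur, x, y, d):
        # Give pixel I_t(x, y) the depth of its correspondence I'_t(x', y'),
        # with (x', y') = (x - d, y) for disparity d between rectified views.
        depth_cur[y, x] = depth_neighbor[y, x - d]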
10. The depth sequence generation method according to claim 1, characterized in that the depth computing method comprises at least one of:
the image segmentation algorithm Graph Cut, belief propagation (Belief Propagation), and dynamic programming (Dynamic Program).
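The claim names global optimizers; as a much simpler stand-in, the sketch below computes a dense disparity map by winner-take-all block matching, which illustrates only the matching cost that Graph Cut, Belief Propagation, or Dynamic Programming would optimize globally:

    import numpy as np

    def wta_disparity(left, right, max_disp=32, m=2):
        # Winner-take-all block matching over rectified grayscale views: for
        # each pixel, pick the disparity whose (2m+1) x (2m+1) patch has the
        # smallest sum of absolute differences.
        h, w = left.shape
        left = left.astype(np.int32)
        right = right.astype(np.int32)
        disp = np.zeros((h, w), dtype=np.int32)
        for y in range(m, h - m):
            for x in range(m + max_disp, w - m):
                patch = left[y - m:y + m + 1, x - m:x + m + 1]
                costs = [np.abs(patch - right[y - m:y + m + 1,
                                              x - d - m:x - d + m + 1]).sum()
                         for d in range(max_disp + 1)]
                disp[y, x] = int(np.argmin(costs))
        return disp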
11. A depth sequence generation device for multi-viewpoint three-dimensional video, characterized in that it comprises:
an image extraction module, used to provide a multi-viewpoint three-dimensional video sequence, wherein the number of viewpoints of the multi-viewpoint three-dimensional video sequence is N, N being an integer not less than 2;
a judge module, used to construct the Bayesian model of the multi-viewpoint three-dimensional video sequence according to the multi-viewpoint three-dimensional video sequence, and to determine, according to the Bayesian model, the depth acquisition way of each pixel in each frame image of the multi-viewpoint three-dimensional video sequence;
a computing module, used to perform depth assignment on the corresponding pixels according to the depth acquisition way so as to obtain the depth image of each frame image, wherein the depth acquisition ways comprise the time-domain prediction method, the inter-viewpoint prediction method, and the depth computing method; and
an integration module, used to integrate all the depth images according to each viewpoint of the multi-viewpoint three-dimensional video sequence and the acquisition time of each frame image, so as to obtain the depth sequence corresponding to the multi-viewpoint three-dimensional video sequence.
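A structural sketch of how the four modules could hand data to one another; the frame dictionary keys and the injected callables are hypothetical, chosen only to show the data flow of claim 11, not a fixed implementation:

    def generate_depth_sequence(frames, judge, predict_temporal,
                                predict_interview, compute_depth, integrate):
        # frames come from the image extraction module; the judge module picks
        # a depth acquisition way per pixel; the computing module assigns the
        # depth; the integration module orders depth images by view and time.
        depth_images = []
        for frame in frames:
            depth = {}
            for (x, y) in frame["pixels"]:
                way = judge(frame, x, y)
                if way == "temporal":
                    depth[(x, y)] = predict_temporal(frame, x, y)
                elif way == "interview":
                    depth[(x, y)] = predict_interview(frame, x, y)
                else:
                    depth[(x, y)] = compute_depth(frame, x, y)
            depth_images.append((frame["viewpoint"], frame["time"], depth))
        return integrate(depth_images)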
12. The depth sequence generation device according to claim 11, characterized in that the Bayesian model comprises:
a time-domain Bayesian model, established according to the acquisition time of each frame image under a same viewpoint of the multi-viewpoint three-dimensional video sequence; and
an inter-viewpoint Bayesian model, established according to the relation among the N images of the multi-viewpoint three-dimensional video sequence captured at the same instant from different viewpoints.
13. The depth sequence generation device according to claim 12, characterized in that the time-domain Bayesian model is:

P(T \mid f_t(x,y)) = \frac{P(f_t(x,y) \mid T)\,P(T)}{P(f_t(x,y) \mid T)\,P(T) + P(f_t(x,y) \mid \bar{T})\,P(\bar{T})} (Formula 1)

P(\bar{T} \mid f_t(x,y)) = \frac{P(f_t(x,y) \mid \bar{T})\,P(\bar{T})}{P(f_t(x,y) \mid \bar{T})\,P(\bar{T}) + P(f_t(x,y) \mid T)\,P(T)} (Formula 2)

wherein T is the event that the depth value of pixel I_t(x,y) in frame I of the multi-viewpoint three-dimensional video sequence can be obtained by the time-domain prediction method, and P(T) is the prior probability of event T; \bar{T} is the complementary event that the depth value of I_t(x,y) cannot be obtained by the time-domain prediction method, and P(\bar{T}) is the prior probability of event \bar{T}; f_t(x,y) is the temporal feature parameter of pixel I_t(x,y), wherein (x,y) is the coordinate of pixel I_t(x,y) in frame I and t is the acquisition time of frame I; P(f_t(x,y) \mid T) is the probability distribution of f_t(x,y) when event T occurs, and P(f_t(x,y) \mid \bar{T}) is the probability distribution of f_t(x,y) when event \bar{T} occurs;
the inter-viewpoint Bayesian model is:

P(S \mid f_s(x,y)) = \frac{P(f_s(x,y) \mid S)\,P(S)}{P(f_s(x,y) \mid S)\,P(S) + P(f_s(x,y) \mid \bar{S})\,P(\bar{S})} (Formula 3)

P(\bar{S} \mid f_s(x,y)) = \frac{P(f_s(x,y) \mid \bar{S})\,P(\bar{S})}{P(f_s(x,y) \mid \bar{S})\,P(\bar{S}) + P(f_s(x,y) \mid S)\,P(S)} (Formula 4)

wherein S is the event that the depth value of I_t(x,y) can be obtained by the inter-viewpoint prediction method, and P(S) is the prior probability of event S; \bar{S} is the complementary event that the depth value of I_t(x,y) cannot be obtained by the inter-viewpoint prediction method, and P(\bar{S}) is the prior probability of event \bar{S}; f_s(x,y) is the inter-viewpoint feature parameter of pixel I_t(x,y); P(f_s(x,y) \mid S) is the probability distribution of f_s(x,y) when event S occurs, and P(f_s(x,y) \mid \bar{S}) is the probability distribution of f_s(x,y) when event \bar{S} occurs.
14. The depth sequence generation device according to claim 13, characterized in that the temporal feature parameter f_t(x,y) of pixel I_t(x,y) is computed as follows:

f_t(x,y) = I_{t-1}(x,y) - I_t(x,y) (Formula 5)

wherein I_{t-1}(x,y) is the pixel with coordinate (x,y) in frame I_{t-1};
the inter-viewpoint feature parameter f_s(x,y) of pixel I_t(x,y) is computed as follows:

f_s(x,y) = \sigma(B) (Formula 6)

wherein B is the element block of size (2m+1) × (2m+1) centered on \Delta I_t(x,y), m is a positive integer greater than or equal to 1, \sigma(\cdot) is the standard-deviation operator, and \Delta I_t(x,y) is the absolute difference between the pixel I_t(x,y) of the current viewpoint and the corresponding pixel I'_t(x,y) of the adjacent viewpoint;
the number of elements in the element block B equals the number of pixels contained in the block of size (2m+1) × (2m+1) centered on (x,y) in I_t, the elements correspond one to one with those pixels, and the value of each element is the disparity value of its corresponding pixel.
15. The depth sequence generation device according to claim 13, characterized in that it is judged according to Formula 1 and Formula 2 whether the following formula holds:

P(f_t \mid T)\,P(T) > P(f_t \mid \bar{T})\,P(\bar{T}) (Formula 7)

and if Formula 7 holds, the depth value of pixel I_t(x,y) is calculated by the time-domain prediction method;
it is judged according to Formula 3 and Formula 4 whether the following formula holds:

P(f_s \mid S)\,P(S) > P(f_s \mid \bar{S})\,P(\bar{S}) (Formula 8)

and if Formula 8 holds, the depth value of pixel I_t(x,y) is calculated by the inter-viewpoint prediction method.
16. The depth sequence generation device according to claim 15, characterized in that if neither Formula 7 nor Formula 8 holds, the depth value of pixel I_t(x,y) is obtained by the depth computing method.
17. The depth sequence generation device according to claim 15, characterized in that calculating the depth value of pixel I_t(x,y) by the time-domain prediction method comprises obtaining, in frame I_{t-1}, the pixel I_{t-1}(x',y') corresponding to I_t(x,y), and giving the depth value of pixel I_{t-1}(x',y') to pixel I_t(x,y);
calculating the depth value of pixel I_t(x,y) by the inter-viewpoint prediction method comprises obtaining, in the frame I'_t of the same instant and a different viewpoint as frame I_t, the pixel I'_t(x',y') corresponding to I_t(x,y), and giving the depth value of pixel I'_t(x',y') to pixel I_t(x,y).
18. The depth sequence generation device according to claim 11, characterized in that the computing module computes depth values by the depth computing method, wherein the depth computing method comprises at least one of:
the image segmentation algorithm Graph Cut, belief propagation (Belief Propagation), and dynamic programming (Dynamic Program).
CN 201110227435 2011-08-09 2011-08-09 Multi-viewpoint three-dimensional video depth sequence generation method and device Active CN102271268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110227435 CN102271268B (en) 2011-08-09 2011-08-09 Multi-viewpoint three-dimensional video depth sequence generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110227435 CN102271268B (en) 2011-08-09 2011-08-09 Multi-viewpoint three-dimensional video depth sequence generation method and device

Publications (2)

Publication Number Publication Date
CN102271268A true CN102271268A (en) 2011-12-07
CN102271268B CN102271268B (en) 2013-04-10

Family

ID=45053406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110227435 Active CN102271268B (en) 2011-08-09 2011-08-09 Multi-viewpoint three-dimensional video depth sequence generation method and device

Country Status (1)

Country Link
CN (1) CN102271268B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107452033A (en) * 2017-08-17 2017-12-08 万维云视(上海)数码科技有限公司 A kind of method and apparatus for generating depth map
CN116437205A (en) * 2023-06-02 2023-07-14 华中科技大学 Depth of field expansion method and system for multi-view multi-focal length imaging

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060056679A1 (en) * 2003-01-17 2006-03-16 Koninklijke Philips Electronics, N.V. Full depth map acquisition
JP4052331B2 * 2003-06-20 2008-02-27 Nippon Telegraph and Telephone Corp. Virtual viewpoint image generation method, three-dimensional image display method and apparatus
CN101271578A (en) * 2008-04-10 2008-09-24 清华大学 Depth sequence generation method of technology for converting plane video into stereo video
US20080260288A1 (en) * 2004-02-03 2008-10-23 Koninklijke Philips Electronic, N.V. Creating a Depth Map
CN101631257A (en) * 2009-08-06 2010-01-20 中兴通讯股份有限公司 Method and device for realizing three-dimensional playing of two-dimensional video code stream
CN101938668A (en) * 2010-09-10 2011-01-05 中国科学院自动化研究所 Method for three-dimensional reconstruction of multilevel lens multi-view scene

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060056679A1 (en) * 2003-01-17 2006-03-16 Koninklijke Philips Electronics, N.V. Full depth map acquisition
JP4052331B2 * 2003-06-20 2008-02-27 Nippon Telegraph and Telephone Corp. Virtual viewpoint image generation method, three-dimensional image display method and apparatus
US20080260288A1 (en) * 2004-02-03 2008-10-23 Koninklijke Philips Electronic, N.V. Creating a Depth Map
CN101271578A (en) * 2008-04-10 2008-09-24 清华大学 Depth sequence generation method of technology for converting plane video into stereo video
CN101631257A (en) * 2009-08-06 2010-01-20 中兴通讯股份有限公司 Method and device for realizing three-dimensional playing of two-dimensional video code stream
CN101938668A (en) * 2010-09-10 2011-01-05 中国科学院自动化研究所 Method for three-dimensional reconstruction of multilevel lens multi-view scene

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SI YUEHOU, et al.: "Fast depth video coding method based on inter-viewpoint correlation", 2010 Asia-Pacific Conference on Information Theory, 12 November 2010 (2010-11-12) *
YANG HAITAO, et al.: "Depth-feature-based image region segmentation and region disparity estimation applied to multi-view video coding", Acta Optica Sinica, vol. 28, no. 06, 15 June 2008 (2008-06-15) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107452033A (en) * 2017-08-17 2017-12-08 万维云视(上海)数码科技有限公司 A kind of method and apparatus for generating depth map
CN107452033B (en) * 2017-08-17 2020-03-06 万维云视(上海)数码科技有限公司 Method and device for generating depth map
CN116437205A (en) * 2023-06-02 2023-07-14 华中科技大学 Depth of field expansion method and system for multi-view multi-focal length imaging
CN116437205B (en) * 2023-06-02 2023-08-11 华中科技大学 Depth of field expansion method and system for multi-view multi-focal length imaging

Also Published As

Publication number Publication date
CN102271268B (en) 2013-04-10

Similar Documents

Publication Publication Date Title
CN102939764B (en) Image processor, image display apparatus, and imaging device
CN106815594A (en) Solid matching method and device
CN102378032A (en) System, apparatus, and method for displaying 3-dimensional image and location tracking device
CN102136136B (en) Luminosity insensitivity stereo matching method based on self-adapting Census conversion
CN103458261B (en) Video scene variation detection method based on stereoscopic vision
CN101933335A (en) Method and system for converting 2d image data to stereoscopic image data
CN102300112A (en) Display device and control method of display device
CN101790103B (en) Parallax calculation method and device
KR100560464B1 (en) Multi-view display system with viewpoint adaptation
CN105069804A (en) Three-dimensional model scanning reconstruction method based on smartphone
CN102547350B (en) Method for synthesizing virtual viewpoints based on gradient optical flow algorithm and three-dimensional display device
CN108076208B (en) Display processing method and device and terminal
CN103679739A (en) Virtual view generating method based on shielding region detection
CN103150729A (en) Virtual view rendering method
WO2012117706A1 (en) Video processing device, video processing method, program
CN103702103A (en) Optical grating three-dimensional printing image synthetic method based on binocular camera
CN109917419A (en) A kind of depth fill-in congestion system and method based on laser radar and image
CN104200453A (en) Parallax image correcting method based on image segmentation and credibility
CN104778697A (en) Three-dimensional tracking method and system based on fast positioning of image dimension and area
CN103024419A (en) Video image processing method and system
CN102271268B (en) Multi-viewpoint three-dimensional video depth sequence generation method and device
CN103400393A (en) Image matching method and system
CN105335934A (en) Disparity map calculating method and apparatus
CN114881841A (en) Image generation method and device
CN103945206B (en) A kind of stereo-picture synthesis system compared based on similar frame

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant