CN103248906A - Method and system for acquiring depth map of binocular stereo video sequence - Google Patents


Info

Publication number
CN103248906A
Authority
CN
China
Prior art keywords
pixel
weights
module
image
zone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013101347151A
Other languages
Chinese (zh)
Other versions
CN103248906B (en
Inventor
王好谦
杜成立
张永兵
戴琼海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Tsinghua University
Original Assignee
Shenzhen Graduate School Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Tsinghua University filed Critical Shenzhen Graduate School Tsinghua University
Priority to CN201310134715.1A priority Critical patent/CN103248906B/en
Publication of CN103248906A publication Critical patent/CN103248906A/en
Priority to HK13110791.5A priority patent/HK1183577A1/en
Application granted granted Critical
Publication of CN103248906B publication Critical patent/CN103248906B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a method and a system for acquiring a depth map of a binocular stereo video sequence. The method comprises the following steps: clustering the first of two images, converting each cluster into a connected region by region growing, and recording the average five-dimensional coordinates of each region and the adjacency information between regions; receiving manual marks input by an operator to label the foreground and background of the first image, and calculating the connection weights between regions and the mark weights of each region; taking the region connection weights and mark weights as input and invoking the GraphCut algorithm to segment the first image into foreground and background; and taking the second of the two images as the reference image and, according to the segmentation result, calculating the disparity map of the foreground with a local adaptive-weight stereo matching algorithm and the disparity map of the background with a Rank-transform stereo matching algorithm, then converting the disparity maps into depth maps. The system is a system that executes the method. The method and the system give accurate results at low computational complexity.

Description

Method and system for acquiring a depth map of a binocular stereo video sequence
Technical field
The invention belongs to the field of computer image processing, and in particular relates to a method and a system for acquiring a depth map of a binocular stereo video sequence.
Background technology
With the rapid development of modern society, the demand for entertainment keeps growing. Audiences watching films and television no longer expect only high-definition colour pictures but also a convincing three-dimensional effect. Research on and applications of stereo video have therefore become a current hot topic, and the related technical problems are regarded as urgent ones to solve.
Stereo video is divided into conventional binocular stereo video and multi-view video. Binocular stereo video consists of two video sequences. Although it can give viewers a stereoscopic impression, the effect is rather rigid: corresponding glasses must be worn, the viewpoint is fixed, and the result is far from the stereoscopic perception of real life. Multi-view video, in contrast, can satisfy the demand for naked-eye viewing; moreover, because of its multiple viewpoints, viewers watching from different angles see different video sequences and hence different perspectives of the scene, which comes much closer to a real three-dimensional impression.
Although multi-view video has advantages that binocular stereo video lacks, it is also much harder to capture, and obtaining high-quality multi-view video has become a key research question. Multi-view video can currently be converted directly from 2D video, but because so much information is missing, the resulting multi-view sequences have very poor depth perception. Binocular video, by comparison, is relatively simple to capture and relatively rich in information, so converting binocular video into multi-view video is an effective approach.
The basic idea of converting binocular stereo video into multi-view video is: first obtain depth information from the binocular video, then synthesise the multi-channel video sequences from the obtained depth maps by virtual view synthesis. The most critical part is how to obtain accurate depth information. The main present-day method is stereo matching: the offset (disparity) between corresponding pixels of the left and right video images at the same moment is searched for, and the depth is obtained from it, depth being inversely proportional to disparity.
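The inverse relation between depth and disparity mentioned above can be made concrete for a rectified stereo pair with the standard pinhole relation Z = f * B / d. The sketch below uses an illustrative focal length and baseline; neither value is taken from the patent.

```python
# Disparity-to-depth conversion for a rectified stereo pair.
# f (focal length in pixels) and B (baseline in metres) are illustrative
# values, not parameters from the patent.

def disparity_to_depth(disparity, focal_length_px, baseline_m):
    """Depth Z = f * B / d; depth is inversely proportional to disparity."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity

# A pixel that shifts 40 px between views, with f = 800 px and B = 0.1 m:
depth = disparity_to_depth(40.0, 800.0, 0.1)
```

Because of the reciprocal, a fixed disparity error causes a much larger depth error for distant (small-disparity) points than for near ones, which is one reason the foreground is matched with the more accurate algorithm below.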
Stereo matching algorithms fall mainly into two classes, global stereo matching algorithms and local stereo matching algorithms, and both share a common weakness: situations in which information is naturally missing, such as occlusions, are difficult to handle.
Summary of the invention
The technical problem to be solved by the invention is to provide a method and a system for acquiring a depth map of a binocular stereo video sequence that reduce the complexity of the acquisition process while guaranteeing accuracy.
The technical scheme of the invention is realised by the following technical means:
As shown in Fig. 1, a method for acquiring a depth map of a binocular stereo video sequence comprises the following steps:
S100) Pre-processing step: read in the two images of the same time point and perform clustering on the first image; then convert each cluster into a connected region by region growing, and record the average five-dimensional coordinates of each region and the adjacency information between regions.
Because the two images of the same time point are strongly correlated, the segmentation operation only needs to process one of the two images; the other serves as the reference image for the subsequent stereo matching algorithm. The first image in this step may be either the left or the right image.
S200) Foreground extraction step, comprising: S210) weight calculation step: receive the manual marks input by the operator for labelling the foreground and background of the first image, and calculate the region connection weights, which express the connectivity between regions, and the mark weights, which express the correlation between each region and the manual marks; S220) image segmentation step: take the region connection weights and mark weights as input and invoke the GraphCut algorithm to segment the first image into a foreground part and a background part.
S300) Depth map acquisition step: take the second of the two images as the reference image and, according to the segmentation result of the foreground extraction step, calculate the disparity map of the foreground part with a local adaptive-weight stereo matching algorithm and the disparity map of the background part with a Rank-transform stereo matching algorithm; then convert the two disparity maps into depth maps.
Preferably, the method also comprises: S400) post-processing step: for each pixel of the foreground part, correct the depth value as follows: choose a depth-map correction window and, according to the result of the pre-processing step, update the depth value of the pixel to be corrected with the mean depth value of all pixels in the correction window that belong to the same region as that pixel.
Preferably, the clustering in the pre-processing step comprises the following steps:
S110) denoise the image to be processed;
S120) apply the K-means clustering algorithm to cluster the first image according to the similarity of the five-dimensional coordinates, each pixel being assigned to the class of the cluster centre at minimum five-dimensional distance from it.
Preferably, step S210) comprises the following steps:
S211) According to the inter-region adjacency information obtained in step S100), set the region connection weight of non-adjacent regions to 0; for any adjacent regions a and b, the region connection weight is
W_ab = 1/D_ab
where D_ab = sqrt((R_a - R_b)^2 + (G_a - G_b)^2 + (B_a - B_b)^2), and (R_a, G_a, B_a) and (R_b, G_b, B_b) are the mean values of the RGB colour components over the pixels of region a and region b respectively;
S212) receive the foreground mark points s and background mark points t input by the operator;
S213) For each region k of the image, calculate its foreground mark weight foreW_ks and background mark weight backW_kt, where:
foreW_ks = 1/foreD_ks
foreD_ks is the minimum five-dimensional distance between region k and the foreground mark points s;
backW_kt = 1/backD_kt
backD_kt is the minimum five-dimensional distance between region k and the background mark points t.
Preferably, step S400) comprises the following steps:
S401) read the current pixel;
S402) judge whether the current pixel belongs to the foreground part; if not, return to step S401) to process the next pixel, otherwise go on to the next step;
S403) choose a matrix centred on the current pixel as the depth-map correction window and, according to the result of the pre-processing step, define the pixels in the correction window that do not belong to the same region as the current pixel as invalid pixels, and the remaining pixels as valid pixels;
S404) update the depth value of the current pixel to the mean depth value of all valid pixels in the depth-map correction window;
S405) check whether all pixels have been corrected; if not, go on to the next pixel; if so, finish the correction.
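The correction loop of steps S401)-S405) can be sketched as follows. The array layout, function name, and default window radius are illustrative assumptions, not details from the patent.

```python
# A minimal sketch of the depth-correction post-processing (steps S401-S405),
# assuming depth values, region labels, and a foreground mask are given as
# 2-D lists of equal shape. `win` is the correction-window radius.

def correct_depth(depth, regions, foreground, win=1):
    """For each foreground pixel, replace its depth with the mean depth of
    the window pixels that belong to the same region (the valid pixels)."""
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for i in range(h):
        for j in range(w):
            if not foreground[i][j]:
                continue  # S402: only foreground pixels are corrected
            vals = []
            for m in range(max(0, i - win), min(h, i + win + 1)):
                for n in range(max(0, j - win), min(w, j + win + 1)):
                    if regions[m][n] == regions[i][j]:  # S403: valid pixel
                        vals.append(depth[m][n])
            out[i][j] = sum(vals) / len(vals)  # S404: mean of valid pixels
    return out
```

Writing the result to a separate output array keeps each correction based on the original depth values, so the scan order of S405 does not affect the result.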
A system for acquiring a depth map of a binocular stereo video sequence is characterised by comprising:
a pre-processing module (201), comprising a clustering module and a region growing module, the clustering module being used to read in the two images of the same time point and perform clustering on the first image, and the region growing module being used to convert each cluster into a connected region by region growing and to record the average five-dimensional coordinates of each region and the adjacency information between regions;
a foreground extraction module (202), comprising a weight calculation module and an image segmentation module, the weight calculation module being used to receive the manual marks input by the operator for labelling the foreground and background of the first image and to calculate the region connection weights, which express the connectivity between regions, and the mark weights, which express the correlation between each region and the manual marks, and the image segmentation module being used to take the region connection weights and mark weights as input and invoke the GraphCut algorithm to segment the first image into a foreground part and a background part;
a depth acquisition module (203), used to take the second of the two images as the reference image and, according to the segmentation result of the foreground extraction module, calculate the disparity map of the foreground part with a local adaptive-weight stereo matching algorithm and the disparity map of the background part with a Rank-transform stereo matching algorithm, and then convert the two disparity maps into depth maps.
Preferably, the system also comprises: a post-processing module (204), used to correct the depth value of each pixel of the foreground part as follows: choose a depth-map correction window and, according to the result of the pre-processing module, update the depth value of the pixel to be corrected with the mean depth value of all pixels in the correction window that belong to the same region as that pixel.
Preferably, the clustering module comprises: a denoising module, used to denoise the image to be processed; and a K-means module, used to apply the K-means clustering algorithm to cluster the first image according to the similarity of the five-dimensional coordinates, each pixel being assigned to the class of the cluster centre at minimum five-dimensional distance from it.
Preferably, the weight calculation module comprises:
a region connection weight calculation module, used to set, according to the inter-region adjacency information obtained by the pre-processing module, the region connection weight of non-adjacent regions to 0, and for any adjacent regions a and b the region connection weight to
W_ab = 1/D_ab
where D_ab = sqrt((R_a - R_b)^2 + (G_a - G_b)^2 + (B_a - B_b)^2), and (R_a, G_a, B_a) and (R_b, G_b, B_b) are the mean values of the RGB colour components over the pixels of region a and region b respectively;
a mark input module, used to receive the foreground mark points s and background mark points t input by the operator;
a mark weight calculation module, used to calculate, for each region k of the image, its foreground mark weight foreW_ks and background mark weight backW_kt, where:
foreW_ks = 1/foreD_ks
foreD_ks is the minimum five-dimensional distance between region k and the foreground mark points s;
backW_kt = 1/backD_kt
backD_kt is the minimum five-dimensional distance between region k and the background mark points t.
Preferably, the post-processing module comprises:
a reading module, used to read the current pixel;
a judging module, used to judge whether the current pixel belongs to the foreground part; if not, the reading module processes the next pixel, otherwise the next module takes over;
a valid-pixel selection module, used to choose a matrix centred on the current pixel as the depth-map correction window and, according to the result of the pre-processing module, to define the pixels in the correction window that do not belong to the same region as the current pixel as invalid pixels, and the remaining pixels as valid pixels;
a depth value update module, used to update the depth value of the current pixel to the mean depth value of all valid pixels in the depth-map correction window;
a stop judging module, used to check whether all pixels have been corrected; if not, the next pixel is processed; if so, the correction finishes.
Compared with the prior art, the invention first segments the foreground from the background, then calculates the depth values of the foreground part with a local adaptive-weight stereo matching algorithm, so that the depth information of the foreground, which matters more to the viewer, is more accurate, and calculates the depth values of the background part with the fast Rank-transform stereo matching algorithm, so that the complexity of the overall algorithm is reduced. Moreover, when segmenting foreground from background, the image is first partitioned by clustering and region growing; the user then interacts with the system by manually marking the foreground and background, and accurate segmentation is achieved through the region connectivity and the correlation between each region and the manual marks, which guarantees good results in the subsequent depth calculation.
In a preferred scheme, the depth values of the foreground pixels are further corrected using the clustering and region growing results of the pre-processing step, which brings the foreground depth values closer to the objective depth of the scene.
In a preferred scheme, the image is denoised before clustering, which weakens the influence of noise on the clustering algorithm, and the K-means clustering algorithm clusters the pixels by their five-dimensional distance, which guarantees the quality of the clustering.
Description of drawings
Fig. 1 is a flow chart of the technical scheme of the invention.
Fig. 2 is a schematic diagram of the module structure of the technical scheme of the invention.
Fig. 3 is a flow chart of the pre-processing module.
Fig. 4 is a flow chart of the foreground extraction module.
Fig. 5 is a flow chart of the depth map acquisition module.
Fig. 6 is a flow chart of the post-processing module.
Embodiment
The invention is further described below with reference to the accompanying drawings and in combination with preferred embodiments.
The present embodiment is a method and a system for acquiring a depth map of a binocular stereo video sequence. Fig. 2 is the module structure diagram of the system, which mainly comprises the following four modules:
Pretreatment module 201;
Foreground extracting module 202;
Depth map acquisition module 203; And
Post-processing module 204.
The pre-processing module is used to read in the left and right images and denoise the right image; a clustering algorithm then clusters similar pixels of the image and merges them into regions, and the region information is recorded. In this implementation the K-means algorithm is used for the clustering.
The flow of the pre-processing module, referring to Fig. 3, specifically comprises:
Image reading step 301: read in the left and right images of the current time point.
Image denoising step 302: the purpose is to weaken the influence of image noise on the clustering algorithm. In this example a Gaussian filter is used for the denoising. The left and right images of the same moment have very strong spatial correlation; for the image segmentation operation only the right image needs to be processed, the left image serving as the reference image for the subsequent stereo matching (the left image could equally be processed, with the right image then serving as the reference).
Steps 303-306 below perform the image clustering, in this example with the K-means clustering algorithm. The basic computation is based on the five-dimensional image coordinates (x, y, r, g, b), where (x, y) is the pixel position and (r, g, b) are the pixel's components in the RGB colour space. Clustering is performed according to the similarity of the five-dimensional coordinates, measured by the similarity variable D(x, y, r, g, b).
Initial cluster centre step 303: in this example the image is divided into rectangular blocks of fixed width and height, and the mean five-dimensional coordinates of all pixels of each block are used as the initial cluster centres. For convenience, C(x, y, r, g, b) denotes the position and colour-space values of a cluster centre; X_C and Y_C are the horizontal and vertical coordinates of the centre, and R_C, G_C and B_C its red, green and blue components. All formulas below use analogous symbols for the five-dimensional coordinates, which will not be repeated.
Step 304, clustering pixels by their five-dimensional coordinates: each pixel of the image is processed in turn. For the current pixel, the five-dimensional distance to each cluster centre within its search range is calculated, and the pixel belongs to the class of the centre with the smallest distance. Let the current pixel be P(x, y, r, g, b) and the corresponding cluster centre be C(x, y, r, g, b); the concrete computation is as follows:
Colour space distance:
D_pc_color = sqrt((R_p - R_c)^2 + (G_p - G_c)^2 + (B_p - B_c)^2)
Position space distance:
D_pc_position = sqrt((X_p - X_c)^2 + (Y_p - Y_c)^2)
Five-dimensional distance between the pixel and the cluster centre:
D(x, y, r, g, b) = sqrt(D_pc_color^2 + D_pc_position^2)
By calculating the five-dimensional distance of the pixel to the different cluster centres, the centre with the minimum distance minD(x, y, r, g, b) is chosen as the clustering result; that is, the pixel is assigned to the class of that cluster centre.
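The assignment rule of steps 303-304 can be sketched as follows. The tuple layout (x, y, r, g, b) follows the text; the helper names are illustrative.

```python
import math

# A sketch of the five-dimensional cluster assignment of steps 303-304:
# each pixel (x, y, r, g, b) goes to the cluster centre with the smallest
# combined colour/position distance.

def five_dim_distance(p, c):
    """D = sqrt(D_color^2 + D_position^2) for pixel p and centre c,
    both given as (x, y, r, g, b) tuples."""
    d_color = math.sqrt((p[2] - c[2])**2 + (p[3] - c[3])**2 + (p[4] - c[4])**2)
    d_pos = math.sqrt((p[0] - c[0])**2 + (p[1] - c[1])**2)
    return math.sqrt(d_color**2 + d_pos**2)

def assign_to_cluster(pixel, centres):
    """Index of the centre at minimal five-dimensional distance."""
    return min(range(len(centres)),
               key=lambda k: five_dim_distance(pixel, centres[k]))
```

Because position and colour enter the same Euclidean distance, the relative scale of the two terms implicitly balances spatial compactness against colour homogeneity of the clusters.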
Cluster centre update step 305: collect all pixels of each class, calculate the mean five-dimensional coordinates of these pixels and use them as the new cluster centre; count the number of completed iterations iter_num.
Step 306, judging whether the clustering has finished: the concrete method is as follows. Compute the sum sumD over all image pixels of the minimum five-dimensional distance minD(x, y, r, g, b) to their cluster centre:
sumD = Σ minD(x, y, r, g, b)
Let the distance sum obtained in the current iteration be sumD_current and that of the previous iteration be sumD_previous. The clustering finishes when either of the following holds:
sumD_previous - sumD_current ≤ T
iter_num > max_iter
where T is a given threshold (determined by the image content; it is preferably set to (1.2~2) × sumD_previous/iter_num), iter_num is the current iteration count, and max_iter is the preset maximum iteration count (usually set to 5~10). If the termination condition does not hold, return to step 304 and continue iterating; if it holds, enter region growing step 307.
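The termination test of step 306 reduces to a small predicate; the parameter names below follow the text, the function name is illustrative.

```python
# The loop-termination test of step 306: clustering stops when the summed
# minimum distance stops improving by more than T, or when the iteration
# budget is exhausted.

def clustering_finished(sum_d_previous, sum_d_current, iter_num,
                        threshold, max_iter):
    return (sum_d_previous - sum_d_current <= threshold) or (iter_num > max_iter)
```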
Region growing step 307: first convert each cluster into connected regions with a four-neighbourhood region growing algorithm, and count the number of pixels of each region. If the pixel count of a region is greater than a given upper threshold, set two or more cluster centres and call the K-means clustering algorithm again to split the region into two or more sub-regions; if the pixel count of a region is smaller than a given lower threshold, merge the region into its nearest neighbouring region in the five-dimensional space.
Finally, record the average five-dimensional coordinates of each region and the inter-region adjacency information (that is, the information expressing whether two regions are adjacent).
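The four-neighbourhood region growing of step 307 amounts to connected-component labelling of the cluster map; a flood-fill sketch is below. The splitting of over-sized regions and merging of under-sized ones described above is omitted for brevity, and the function name is illustrative.

```python
from collections import deque

# A sketch of step 307's region growing with 4-connectivity: pixels with the
# same cluster label are merged into connected regions by flood fill.

def grow_regions(labels):
    """labels: 2-D list of cluster ids. Returns a 2-D list of region ids
    such that each region is 4-connected and has a single cluster label."""
    h, w = len(labels), len(labels[0])
    region = [[-1] * w for _ in range(h)]
    next_id = 0
    for i in range(h):
        for j in range(w):
            if region[i][j] != -1:
                continue
            q = deque([(i, j)])
            region[i][j] = next_id
            while q:
                y, x = q.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and region[ny][nx] == -1
                            and labels[ny][nx] == labels[i][j]):
                        region[ny][nx] = next_id
                        q.append((ny, nx))
            next_id += 1
    return region
```

Note that two spatially separate patches of the same cluster end up as distinct regions, which is exactly the point of converting clusters into connected regions before building the adjacency graph.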
The foreground extraction module is used to receive the manual marks input by the operator for labelling the foreground and background of the first image and, according to these marks, to calculate the weights of each region, comprising the region connection weights, which express the connectivity between regions, and the mark weights, which express the correlation between each region and the manual marks; it is also used to take the calculated weights as input and invoke the GraphCut algorithm to segment the current image into a foreground part and a background part and extract the foreground region. The flow of the foreground extraction module, referring to Fig. 4, specifically comprises:
Region connection weight calculation step 401: according to step 307, spatially non-adjacent regions are given no connection weight, i.e. their connection weight is set to 0; for spatially adjacent regions, the colour correlation is taken into account and the connection weight is calculated as follows (taking adjacent regions a and b as an example):
Pixel colour difference value:
D_ab = sqrt((R_a - R_b)^2 + (G_a - G_b)^2 + (B_a - B_b)^2)
Connection weight between regions a and b:
W_ab = 1/D_ab
where (R_a, G_a, B_a) and (R_b, G_b, B_b) are the mean values of the RGB colour components over the pixels of regions a and b respectively.
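A sketch of step 401 follows. The reciprocal form 1/D_ab is an assumption where the published formula image is unreadable; it mirrors the mark-weight formulas foreW = 1/foreD given later in the text. The function name and the infinite weight for identical colours are likewise illustrative.

```python
import math

# A sketch of step 401's region connection weights: non-adjacent regions get
# weight 0; adjacent regions get a weight that grows as their mean RGB
# colours get closer (assumed reciprocal of the colour difference).

def connection_weight(mean_rgb_a, mean_rgb_b, adjacent):
    if not adjacent:
        return 0.0
    d_ab = math.sqrt(sum((x - y) ** 2 for x, y in zip(mean_rgb_a, mean_rgb_b)))
    return 1.0 / d_ab if d_ab > 0 else float("inf")
```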
Manual mark input step 402: receive the manual marks input by the operator. The marking method provided in this example is that the operator marks foreground objects with the left mouse button and background objects with the right mouse button. If the number of foreground mark points or background mark points is greater than a preset cluster threshold cluster_num, the K-means algorithm clusters them into cluster_num classes and the cluster centre of each class is taken as the final mark point; to preserve the validity of the manual intervention, no clustering is performed if the number of mark points is less than or equal to cluster_num.
Mark weight calculation step 403: for each region of the image, calculate its weights with respect to the foreground marks s and the background marks t.
Calculating the foreground mark weight means calculating the five-dimensional distance between the current region k and the foreground mark points s; the concrete computation is as follows:
Colour space distance:
D_ks_color = sqrt((R_k - R_s)^2 + (G_k - G_s)^2 + (B_k - B_s)^2)
Position space distance:
D_ks_position = sqrt((X_k - X_s)^2 + (Y_k - Y_s)^2)
Five-dimensional distance between the current region and a foreground mark point:
D_ks = sqrt(D_ks_color^2 + D_ks_position^2)
foreD_ks = min D_ks
foreW_ks = 1/foreD_ks
where (X_k, Y_k, R_k, G_k, B_k) and (X_s, Y_s, R_s, G_s, B_s) are the five-dimensional coordinates of region k and foreground mark point s respectively; D_ks_color is the colour space distance between region k and foreground mark point s; D_ks_position is their position distance; D_ks is their five-dimensional distance; foreD_ks is the minimum five-dimensional distance between region k and the foreground marks; and foreW_ks is the foreground mark weight of region k.
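The mark-weight computation of step 403 can be sketched as one helper; it applies unchanged to the background marks by passing the background mark points instead. The function name is illustrative.

```python
import math

# A sketch of step 403: the mark weight of region k is the reciprocal of its
# minimum five-dimensional distance to any mark point of the given set.
# Coordinates are (x, y, r, g, b) tuples.

def mark_weight(region_coord, mark_points):
    """foreW_k = 1 / min_s D_ks over all mark points s (same formula for
    the background weight with background mark points)."""
    def dist(p, c):
        d_color = math.sqrt(sum((p[i] - c[i]) ** 2 for i in (2, 3, 4)))
        d_pos = math.sqrt((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2)
        return math.sqrt(d_color ** 2 + d_pos ** 2)
    fore_d = min(dist(region_coord, s) for s in mark_points)
    return 1.0 / fore_d if fore_d > 0 else float("inf")
```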
The background mark weight is computed analogously to the foreground mark weight; for the current region k and a background mark point t the computation is as follows:
D_kt_color = sqrt((R_k - R_t)^2 + (G_k - G_t)^2 + (B_k - B_t)^2)
D_kt_position = sqrt((X_k - X_t)^2 + (Y_k - Y_t)^2)
D_kt = sqrt(D_kt_color^2 + D_kt_position^2)
backD_kt = min D_kt
backW_kt = 1/backD_kt
where (X_k, Y_k, R_k, G_k, B_k) and (X_t, Y_t, R_t, G_t, B_t) are the five-dimensional coordinates of region k and background mark point t respectively; D_kt_color is the colour space distance between region k and background mark point t; D_kt_position is their position distance; D_kt is their five-dimensional distance; backD_kt is the minimum five-dimensional distance between region k and the background marks; and backW_kt is the background mark weight of region k.
GraphCut segmentation step 404: take the region connection weights and mark weights as input parameters and call the GraphCut algorithm to obtain the region segmentation result.
Step 405, judging whether the segmentation result is satisfactory: in this example, the operator judges whether the segmentation result separates the foreground and background of the image with sufficient accuracy, i.e. inspects the segmentation result. If it is not satisfactory, return to step 402, add manual marks again and repeat the region segmentation; if it is satisfactory, enter step 501 to carry out the subsequent stereo matching and depth-map acquisition stage.
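The GraphCut call of step 404 is an s-t min-cut on a graph whose nodes are the regions: connection weights become inter-region capacities and mark weights become capacities to the foreground source and background sink. The sketch below uses a plain Edmonds-Karp max-flow as a self-contained stand-in for the GraphCut library call; graph encoding and function names are illustrative.

```python
from collections import deque

# A self-contained sketch of the segmentation of step 404: the min cut of
# the region graph separates source-side (foreground) from sink-side
# (background) regions. Edmonds-Karp max-flow, adjacency-matrix capacities.

def min_cut_foreground(n_regions, conn, fore_w, back_w):
    """conn: {(a, b): connection weight}; fore_w/back_w: per-region mark
    weights. Returns the set of region ids on the foreground side."""
    s, t = n_regions, n_regions + 1
    cap = [[0.0] * (n_regions + 2) for _ in range(n_regions + 2)]
    for (a, b), wgt in conn.items():     # undirected region edges
        cap[a][b] += wgt
        cap[b][a] += wgt
    for k in range(n_regions):           # terminal edges from mark weights
        cap[s][k] += fore_w[k]
        cap[k][t] += back_w[k]
    while True:                          # augment along BFS shortest paths
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in range(n_regions + 2):
                if v not in parent and cap[u][v] > 1e-12:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            break
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
    seen, q = {s}, deque([s])            # residual reachability = cut side
    while q:
        u = q.popleft()
        for v in range(n_regions + 2):
            if v not in seen and cap[u][v] > 1e-12:
                seen.add(v)
                q.append(v)
    return {k for k in range(n_regions) if k in seen}
```

Large connection weights (similar adjacent regions) make the cut avoid separating them, while large mark weights anchor marked regions firmly to their terminal, which is exactly the behaviour the weight definitions above aim for.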
The depth map acquisition module processes the foreground part and the background part with different stereo matching algorithms according to the region segmentation result, obtains the disparity maps and converts them into depth maps. Depth and disparity are inversely proportional, so a disparity map obtained by stereo matching can be converted into a depth map. The process of obtaining the disparity maps by stereo matching is described in detail below.
For convenience of description, in this example P_L denotes the left image read in and P_R the right image; P_L(i, j) denotes the pixel at row i, column j of the left image, and P_R(i, j) the pixel at row i, column j of the right image. For the current pixel P_L(i, j) of the left image, a disparity search range is set (DSR = 20), i.e. the search region is the set of right-image pixels P_R(i, j-d) on the same horizontal line, with d ∈ [0, DSR]. For each reference point P_R(i, j-d) with d ∈ [0, DSR] in turn, the matching cost Cost_d with respect to P_L(i, j) is calculated; the reference point with the minimum matching cost Cost_d is chosen as the best match, and the corresponding disparity d of that point is the disparity value of the current pixel P_L(i, j). The concrete steps are shown in Fig. 5:
Matching method selection step 501: judge, according to the result of the foreground extraction module above, whether the current pixel belongs to the foreground part or the background part; if the pixel belongs to the foreground part, enter step 502, otherwise enter step 505.
If the pixel is judged to belong to the foreground region, the more accurate stereo matching algorithm, i.e. the adaptive-weight matching algorithm, is used; the computation is given in steps 502-504 below:
Adaptive support window determination step 502: choose a support window (the target window) of size W × W centred on the current pixel P_L(i, j), the parameter W being chosen as an odd number in the range 27~37. For each pixel in the window, its correlation with P_L(i, j) is calculated from brightness, colour and distance information and used as its weight. The weight of the pixel P_L(i+m, j+n) in the target window is denoted Ω_L(p, q), where p denotes the centre pixel P_L(i, j) and q denotes the pixel P_L(i+m, j+n).
Weight calculation step 503: the original image is median-filtered to remove noise interference, with a filter window of 5 × 5 or 3 × 3 (the basic unit being pixels). Note: calculating the weight Ω_L(p, q) requires both the colour difference and the distance information; the more similar the colours and the nearer the distance, the larger the weight. To reduce the influence of noise, all colour information used in the weight calculation is taken from the median-filtered image; the median filter is used only for the weight calculation, and the matching itself still processes the original image.
Colour similarity: D_colour = sqrt((R_p − R_q)² + (G_p − G_q)² + (B_p − B_q)²), where R, G and B denote the RGB colour components of pixels p and q.
Distance: D_distance = sqrt((X_p − X_q)² + (Y_p − Y_q)²), where X and Y denote the horizontal and vertical coordinates of the pixel, respectively.
Weight: Ω_L(p, q) = exp[−(D_colour/γ_C + D_distance/γ_D)], where γ_C ∈ (6, 12) and γ_D ∈ (18, 28).
The calculation of the matching cost must consider the target window and the reference window (the support window centred on P_R(i, j−d)) simultaneously. To obtain more accurate weights, the weights of the pixels in the target window and the weights of the pixels in the reference window are both needed, each computed separately from the information of its own window, yielding Ω_L(p_L, q_L) and Ω_R(p_R, q_R).
For a pixel in the support window, if it does not lie in the foreground region it is excluded. This is done by multiplying the weight by the foreground mark value: the mark value is 1 for foreground pixels and 0 for background pixels, so the multiplication removes background pixels. Excluding background pixels that fall inside the support window makes the result more accurate.
In summary, the final weight formula is:
Ω(q_L, q_R) = Ω_L(p_L, q_L) × Ω_R(p_R, q_R) × P_Segment
where Ω_L(p_L, q_L) and Ω_R(p_R, q_R) are the weight matrices of the target window and the reference window, and P_Segment is a matrix indicating whether each pixel in the current window belongs to the foreground or the background: if the corresponding pixel is foreground, the value at the corresponding position of P_Segment is 1, otherwise 0;
Absolute difference (AD) calculation step 504: the formula is as follows:
Cost_AD(p, q) = |R_p − R_q| + |G_p − G_q| + |B_p − B_q|;
where R, G and B denote the RGB colour components of the pixels. The matching cost aggregation formula is as follows:
Cost_SAD(d) = [ Σ_{m=−W/2}^{W/2} Σ_{n=−W/2}^{W/2} Ω(q_L(i+m, j+n), q_R(i+m, j−d+n)) × Cost_AD(q_L(i+m, j+n), q_R(i+m, j−d+n)) ] / [ Σ_{m=−W/2}^{W/2} Σ_{n=−W/2}^{W/2} Ω(q_L(i+m, j+n), q_R(i+m, j−d+n)) ]
where:
Ω(q_L(i+m, j+n), q_R(i+m, j−d+n)) = Ω_L(p_L(i, j), q_L(i+m, j+n)) × Ω_R(p_R(i, j−d), q_R(i+m, j−d+n)) × P_Segment(i+m, j+n)
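As a concrete illustration of steps 502~504, the adaptive-weight cost can be sketched in Python as follows. This is a simplified sketch under stated assumptions: square windows, no image-border handling, and the hypothetical function names `support_weights` and `sad_cost` (not from the patent):

```python
import numpy as np

def support_weights(img, fi, fj, half, gamma_c=9.0, gamma_d=23.0):
    """Adaptive support weights for a (2*half+1)^2 window centred at (fi, fj).

    img is an H x W x 3 float RGB array (ideally median-filtered, as in
    step 503).  gamma_c and gamma_d fall inside the ranges (6, 12) and
    (18, 28) given in the text."""
    win = img[fi - half:fi + half + 1, fj - half:fj + half + 1].astype(float)
    centre = img[fi, fj].astype(float)
    d_colour = np.sqrt(((win - centre) ** 2).sum(axis=2))  # colour distance
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    d_dist = np.sqrt(ys ** 2 + xs ** 2)                    # spatial distance
    return np.exp(-(d_colour / gamma_c + d_dist / gamma_d))

def sad_cost(left, right, seg, i, j, d, half):
    """Weighted AD cost of step 504: target window in the left image,
    reference window in the right image, masked by the foreground
    segmentation matrix seg (1 = foreground, 0 = background)."""
    w = (support_weights(left, i, j, half)
         * support_weights(right, i, j - d, half)
         * seg[i - half:i + half + 1, j - half:j + half + 1])
    tw = left[i - half:i + half + 1, j - half:j + half + 1].astype(float)
    rw = right[i - half:i + half + 1, j - d - half:j - d + half + 1].astype(float)
    ad = np.abs(tw - rw).sum(axis=2)                       # |dR|+|dG|+|dB|
    return (w * ad).sum() / w.sum()
```

Identical windows at the true disparity give a cost of zero, which the winner-take-all selection of step 508 then picks.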
If the current pixel is a background pixel, the Rank-transform-based stereo matching algorithm is applied; the computation proceeds as steps 505~507 below:
Step 505: centred on the current pixel P_L(i, j), choose a support window (target window) of size X × Y, where X and Y are odd numbers in the range 17~25 and need not be equal.
Step 506: first compute the brightness difference Diff, i.e. subtract the brightness value of the centre pixel from the brightness value of each pixel in the support window, and quantize the difference into 5 levels to obtain the Rank matrix; the computation is as follows:
Rank = −2 if Diff < −v;  −1 if −v ≤ Diff < −u;  0 if −u ≤ Diff < u;  1 if u ≤ Diff < v;  2 if Diff ≥ v
where u and v are threshold parameters, with u = 2, 3 or 4 and v = 8, 9 or 10.
Compute the Rank matrices of the target window and the reference window respectively, obtaining RankL(i, j) and RankR(i, j−d), both matrices of size X × Y.
Compute RankCost(m, n) = 0 if RankL(m, n) = RankR(m, n), and 1 if RankL(m, n) ≠ RankR(m, n), where m and n are the index variables used in the cost aggregation.
Step 507: compute the matching cost Cost_RT(d):
Cost_RT(d) = Σ_{m=−X/2}^{X/2} Σ_{n=−Y/2}^{Y/2} RankCost(m, n)
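The Rank-transform cost of steps 505~507 can be sketched similarly (a sketch under simplifying assumptions: single-channel brightness input, no border handling; `rank_matrix` and `rank_cost` are illustrative names):

```python
import numpy as np

def rank_matrix(img_gray, ci, cj, hx, hy, u=3, v=9):
    """Five-level Rank matrix of step 506 for a (2*hx+1) x (2*hy+1)
    window centred at (ci, cj); u and v are the thresholds of the text."""
    win = img_gray[ci - hx:ci + hx + 1, cj - hy:cj + hy + 1].astype(int)
    diff = win - int(img_gray[ci, cj])
    rank = np.zeros_like(diff)          # band -u <= Diff < u stays 0
    rank[diff < -v] = -2
    rank[(diff >= -v) & (diff < -u)] = -1
    rank[(diff >= u) & (diff < v)] = 1
    rank[diff >= v] = 2
    return rank

def rank_cost(left_gray, right_gray, i, j, d, hx, hy):
    """Cost_RT(d) of step 507: the number of positions where the two
    Rank matrices disagree."""
    rl = rank_matrix(left_gray, i, j, hx, hy)
    rr = rank_matrix(right_gray, i, j - d, hx, hy)
    return int((rl != rr).sum())
```

Because only the sign pattern of brightness differences is compared, the cost is robust to the radiometric differences typical of background regions.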
Step 508: from the matching costs obtained in step 504 or step 507, choose the disparity value corresponding to the smallest matching cost as the disparity value of the pixel;
Step 509: check whether all pixels of the current image have been processed; if so, go to step 510, otherwise return to step 501 to process the next pixel;
Step 510: convert the obtained disparity map into a depth map;
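The text does not spell out the disparity-to-depth conversion of step 510; for a rectified binocular rig the standard relation depth = f·B/d (focal length f in pixels, baseline B) is commonly used. A sketch under that assumption, with an illustrative function name:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline, eps=1e-6):
    """Convert a disparity map to a depth map via depth = f * B / d.
    eps guards against zero disparity (which would mean infinite depth)."""
    disp = np.asarray(disparity, dtype=float)
    return focal_px * baseline / np.maximum(disp, eps)
```

For instance, with f = 1000 px and B = 0.1 m, a disparity of 20 px maps to a depth of 5 m.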
Post-processing module: used, according to the foreground information extracted by the foreground extraction module 202 and the initial depth map obtained by the depth map acquisition module 203, to further correct the disparity map of the foreground part so as to obtain more reliable foreground information and make the depth information of the entire image more reliable. The concrete steps are as follows:
For convenience, suppose the obtained depth map is denoted Depth, where Depth(i, j) is the depth value at row i, column j of the depth map. Each pixel of the depth map is processed in turn.
Step 601: read the current pixel;
Step 602: judge whether the current pixel belongs to the foreground region; if not, return to step 601 to process the next pixel, otherwise go to step 603 to select the corresponding depth map correcting window;
Step 603: choose the depth map correcting window, generally a matrix centred on the current depth point, of size N × N with N an odd number in the range 7~15. According to the clustering result of the pre-processing module, judge whether each pixel in the correcting window belongs to the same class as the current pixel; if not, remove that support pixel (i.e. define it as an invalid pixel). This information is recorded in an N × N matrix: a support pixel belonging to the same class (i.e. a valid pixel) takes the value 1 at the corresponding position, otherwise 0;
Step 604: correct the current pixel with all the valid pixels of the correcting window determined in step 603: the depth value of the current pixel is set to the mean of the depth values of all valid support pixels. Invalid pixels are excluded from the mean computation by multiplying each pixel's depth value by its position value in the 0/1 matrix.
Step 605: check whether all pixels have been processed; if not, move on to the next pixel; if so, the whole procedure is finished.
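The correction loop of steps 601~605 can be sketched as follows (illustrative names: `labels` stands for the cluster label map from the pre-processing module and `fg_mask` for the foreground segmentation; image borders are skipped for brevity):

```python
import numpy as np

def correct_foreground_depth(depth, labels, fg_mask, n=9):
    """Steps 601-605: for every foreground pixel, replace its depth by the
    mean depth of the valid pixels in an n x n correcting window, where a
    pixel is valid if it carries the same cluster label as the centre."""
    half = n // 2
    out = depth.astype(float).copy()
    h, w = depth.shape
    for i in range(half, h - half):
        for j in range(half, w - half):
            if not fg_mask[i, j]:
                continue  # background pixels are left untouched
            win = depth[i - half:i + half + 1, j - half:j + half + 1]
            valid = (labels[i - half:i + half + 1,
                            j - half:j + half + 1] == labels[i, j])
            # multiplying depth by the 0/1 validity matrix rejects the
            # invalid pixels from the mean, as in step 604
            out[i, j] = (win * valid).sum() / valid.sum()
    return out
```

The centre pixel is always valid (it matches its own label), so the denominator is never zero.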
The above describes the present invention further in conjunction with specific preferred embodiments, but the specific implementation of the invention shall not be regarded as limited to these descriptions. For those skilled in the art, equivalent substitutions or obvious modifications made without departing from the inventive concept, and having the same performance or use, shall all be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A depth map acquisition method for a binocular stereo video sequence, characterized in that it comprises the following steps:
S100) pre-processing step: read in two images of the same time point, and perform clustering on the first image; then use region growing to convert each cluster into a connected region, and record the average five-dimensional coordinates of each region and the adjacency information between regions;
S200) foreground extraction step, comprising: S210) weight calculation step: receive the manual marks input by the operator for marking the foreground and background of the first image, and compute the region connection weights, which represent the connectivity between regions, and the mark weights, which represent the correlation between each region and the manual marks; S220) image segmentation step: taking the region connection weights and mark weights as input, invoke the GraphCut algorithm to segment the first image into a foreground part and a background part;
S300) depth map obtaining step: taking the second image of the two images as the reference image, according to the segmentation result of the foreground extraction step, compute the disparity map of the foreground part with the local adaptive-weight stereo matching algorithm and the disparity map of the background part with the Rank-transform stereo matching algorithm, and then convert the disparity maps of the foreground part and the background part into depth maps.
2. The depth map acquisition method for a binocular stereo video sequence according to claim 1, characterized in that it further comprises:
S400) post-processing step: for the pixels of the foreground part, perform depth value correction with the following steps: choose a depth map correcting window and, according to the result of the pre-processing step, update the depth value of the pixel to be corrected with the mean depth value of all pixels in the correcting window that belong to the same region as the pixel to be corrected.
3. The depth map acquisition method for a binocular stereo video sequence according to claim 1 or 2, characterized in that the clustering in the pre-processing step comprises the following steps:
S110) denoise the image to be processed;
S120) use the K-means clustering algorithm to cluster the first image according to the similarity of the five-dimensional space coordinates, attributing each pixel to the class of the cluster centre with the minimum five-dimensional space distance to it.
4. The depth map acquisition method for a binocular stereo video sequence according to claim 1 or 2, characterized in that step S210) comprises the following steps:
S211) according to the inter-region adjacency information obtained in step S100), set the region connection weight between non-adjacent regions to 0; for any adjacent regions a and b, the region connection weight is
Figure FDA00003064261400011
wherein D_ab = sqrt((R_a − R_b)² + (G_a − G_b)² + (B_a − B_b)²), and (R_a, G_a, B_a) and (R_b, G_b, B_b) are respectively the means of the RGB colour components over the pixels of region a and region b;
S212) receive the foreground mark points s and background mark points t input by the operator;
S213) for each region k of the image, compute its foreground mark weight foreW_ks and background mark weight backW_kt, wherein:
Figure FDA00003064261400021
foreD_ks is the minimum five-dimensional space distance between region k and the foreground mark point s;
Figure FDA00003064261400022
backD_kt is the minimum five-dimensional space distance between region k and the background mark point t.
5. The depth map acquisition method for a binocular stereo video sequence according to claim 2, characterized in that step S400) comprises the following steps:
S401) read the current pixel;
S402) judge whether the current pixel belongs to the foreground part; if not, return to step S401) to process the next pixel, otherwise proceed to the next step;
S403) choose a matrix centred on the current pixel as the depth map correcting window; according to the result of the pre-processing step, define the pixels in the correcting window that do not belong to the same region as the current pixel as invalid pixels, and the remaining pixels as valid pixels;
S404) update the depth value of the current pixel to the mean of the depth values of all valid pixels in the depth map correcting window;
S405) check whether all pixels have been corrected; if not, move on to the next pixel; if so, finish the correction.
6. A depth map acquisition system for a binocular stereo video sequence, characterized in that it comprises:
a pre-processing module (201), comprising a clustering module and a region growing module, the clustering module being used to read in two images of the same time point and perform clustering on the first image, and the region growing module being used to convert each cluster into a connected region by region growing and to record the average five-dimensional coordinates of each region and the adjacency information between regions;
a foreground extraction module (202), comprising a weight calculation module and an image segmentation module, the weight calculation module being used to receive the manual marks input by the operator for marking the foreground and background of the first image and to compute the region connection weights, which represent the connectivity between regions, and the mark weights, which represent the correlation between each region and the manual marks, and the image segmentation module being used to take the region connection weights and mark weights as input and invoke the GraphCut algorithm to segment the first image into a foreground part and a background part;
a depth acquisition module (203), used to take the second image of the two images as the reference image and, according to the segmentation result of the foreground extraction, compute the disparity map of the foreground part with the local adaptive-weight stereo matching algorithm and the disparity map of the background part with the Rank-transform stereo matching algorithm, and then convert the disparity maps of the foreground part and the background part into depth maps.
7. The depth map acquisition system for a binocular stereo video sequence according to claim 6, characterized in that it further comprises:
a post-processing module (204), used, for the pixels of the foreground part, to perform depth value correction with the following steps: choose a depth map correcting window and, according to the result of the pre-processing module, update the depth value of the pixel to be corrected with the mean depth value of all pixels in the correcting window that belong to the same region as the pixel to be corrected.
8. The depth map acquisition system for a binocular stereo video sequence according to claim 6 or 7, characterized in that the clustering module comprises:
a denoising module, used to denoise the image to be processed;
a K-means module, used to cluster the first image with the K-means clustering algorithm according to the similarity of the five-dimensional space coordinates, attributing each pixel to the class of the cluster centre with the minimum five-dimensional space distance to it.
9. The depth map acquisition system for a binocular stereo video sequence according to claim 6 or 7, characterized in that the weight calculation module comprises:
a region connection weight calculation module, used, according to the inter-region adjacency information obtained by the pre-processing module, to set the region connection weight between non-adjacent regions to 0; for any adjacent regions a and b, the region connection weight is
Figure FDA00003064261400031
wherein D_ab = sqrt((R_a − R_b)² + (G_a − G_b)² + (B_a − B_b)²), and (R_a, G_a, B_a) and (R_b, G_b, B_b) are respectively the means of the RGB colour components over the pixels of region a and region b;
a mark input module, used to receive the foreground mark points s and background mark points t input by the operator;
a mark weight calculation module, used, for each region k of the image, to compute its foreground mark weight foreW_ks and background mark weight backW_kt, wherein:
foreD_ks is the minimum five-dimensional space distance between region k and the foreground mark point s;
Figure FDA00003064261400034
backD_kt is the minimum five-dimensional space distance between region k and the background mark point t.
10. The depth map acquisition system for a binocular stereo video sequence according to claim 7, characterized in that the post-processing module comprises:
a reading module, used to read the current pixel;
a judging module, used to judge whether the current pixel belongs to the foreground part; if not, the reading module processes the next pixel, otherwise processing passes to the next module;
a valid-pixel selection module, used to choose a matrix centred on the current pixel as the depth map correcting window and, according to the result of the pre-processing module, to define the pixels in the correcting window that do not belong to the same region as the current pixel as invalid pixels and the remaining pixels as valid pixels;
a depth value update module, used to update the depth value of the current pixel to the mean of the depth values of all valid pixels in the depth map correcting window;
a stop judging module, used to check whether all pixels have been corrected; if not, the next pixel is processed; if so, the correction is finished.
CN201310134715.1A 2013-04-17 2013-04-17 Method and system for acquiring depth map of binocular stereo video sequence Active CN103248906B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201310134715.1A CN103248906B (en) 2013-04-17 2013-04-17 Method and system for acquiring depth map of binocular stereo video sequence
HK13110791.5A HK1183577A1 (en) 2013-04-17 2013-09-20 An attainment method of the depth image of binocular stereo video sequences and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310134715.1A CN103248906B (en) 2013-04-17 2013-04-17 Method and system for acquiring depth map of binocular stereo video sequence

Publications (2)

Publication Number Publication Date
CN103248906A true CN103248906A (en) 2013-08-14
CN103248906B CN103248906B (en) 2015-02-18

Family

ID=48928094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310134715.1A Active CN103248906B (en) 2013-04-17 2013-04-17 Method and system for acquiring depth map of binocular stereo video sequence

Country Status (2)

Country Link
CN (1) CN103248906B (en)
HK (1) HK1183577A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473743A (en) * 2013-09-12 2013-12-25 清华大学深圳研究生院 Method for obtaining image depth information
CN103888749A (en) * 2014-04-03 2014-06-25 清华大学深圳研究生院 Method for converting double-view video into multi-view video
CN103996206A (en) * 2014-02-24 2014-08-20 航天恒星科技有限公司 GraphCut-based interactive target extraction method in complicated background remote-sensing image
CN104463183A (en) * 2013-09-13 2015-03-25 株式会社理光 Cluster center selecting method and system
CN104639933A (en) * 2015-01-07 2015-05-20 前海艾道隆科技(深圳)有限公司 Real-time acquisition method and real-time acquisition system for depth maps of three-dimensional views
CN105025193A (en) * 2014-04-29 2015-11-04 钰创科技股份有限公司 Portable stereo scanner and method for generating stereo scanning result of corresponding object
CN105282375A (en) * 2014-07-24 2016-01-27 钰创科技股份有限公司 Attached Stereo Scanning Module
CN106991370A (en) * 2017-02-28 2017-07-28 中科唯实科技(北京)有限公司 Pedestrian retrieval method based on color and depth
CN107481250A (en) * 2017-08-30 2017-12-15 吉林大学 A kind of image partition method and its evaluation method and image interfusion method
CN109215044A (en) * 2017-06-30 2019-01-15 京东方科技集团股份有限公司 Image processing method and system, storage medium and mobile system
CN110263825A (en) * 2019-05-30 2019-09-20 湖南大学 Data clustering method, device, computer equipment and storage medium
CN110335389A (en) * 2019-07-01 2019-10-15 上海商汤临港智能科技有限公司 Car door unlocking method and device, system, vehicle, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877128A (en) * 2009-12-23 2010-11-03 中国科学院自动化研究所 Method for segmenting different objects in three-dimensional scene
CN102263979A (en) * 2011-08-05 2011-11-30 清华大学 Depth map generation method and device for plane video three-dimensional conversion
CN102622768A (en) * 2012-03-14 2012-08-01 清华大学 Depth-map gaining method of plane videos


Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473743B (en) * 2013-09-12 2016-03-02 清华大学深圳研究生院 A kind of method obtaining image depth information
CN103473743A (en) * 2013-09-12 2013-12-25 清华大学深圳研究生院 Method for obtaining image depth information
CN104463183B (en) * 2013-09-13 2017-10-10 株式会社理光 Cluster centre choosing method and system
CN104463183A (en) * 2013-09-13 2015-03-25 株式会社理光 Cluster center selecting method and system
CN103996206B (en) * 2014-02-24 2017-01-11 航天恒星科技有限公司 GraphCut-based interactive target extraction method in complicated background remote-sensing image
CN103996206A (en) * 2014-02-24 2014-08-20 航天恒星科技有限公司 GraphCut-based interactive target extraction method in complicated background remote-sensing image
CN103888749B (en) * 2014-04-03 2016-07-27 清华大学深圳研究生院 A kind of method of the many visual frequencies of binocular video conversion
CN103888749A (en) * 2014-04-03 2014-06-25 清华大学深圳研究生院 Method for converting double-view video into multi-view video
CN105025193A (en) * 2014-04-29 2015-11-04 钰创科技股份有限公司 Portable stereo scanner and method for generating stereo scanning result of corresponding object
CN105025193B (en) * 2014-04-29 2020-02-07 钰立微电子股份有限公司 Portable stereo scanner and method for generating stereo scanning result of corresponding object
CN105282375A (en) * 2014-07-24 2016-01-27 钰创科技股份有限公司 Attached Stereo Scanning Module
CN105282375B (en) * 2014-07-24 2019-12-31 钰立微电子股份有限公司 Attached stereo scanning module
CN104639933A (en) * 2015-01-07 2015-05-20 前海艾道隆科技(深圳)有限公司 Real-time acquisition method and real-time acquisition system for depth maps of three-dimensional views
CN106991370A (en) * 2017-02-28 2017-07-28 中科唯实科技(北京)有限公司 Pedestrian retrieval method based on color and depth
CN106991370B (en) * 2017-02-28 2020-07-31 中科唯实科技(北京)有限公司 Pedestrian retrieval method based on color and depth
CN109215044A (en) * 2017-06-30 2019-01-15 京东方科技集团股份有限公司 Image processing method and system, storage medium and mobile system
CN109215044B (en) * 2017-06-30 2020-12-15 京东方科技集团股份有限公司 Image processing method and system, storage medium, and mobile system
CN107481250A (en) * 2017-08-30 2017-12-15 吉林大学 A kind of image partition method and its evaluation method and image interfusion method
CN110263825A (en) * 2019-05-30 2019-09-20 湖南大学 Data clustering method, device, computer equipment and storage medium
CN110263825B (en) * 2019-05-30 2022-05-10 湖南大学 Data clustering method and device, computer equipment and storage medium
CN110335389A (en) * 2019-07-01 2019-10-15 上海商汤临港智能科技有限公司 Car door unlocking method and device, system, vehicle, electronic equipment and storage medium

Also Published As

Publication number Publication date
HK1183577A1 (en) 2013-12-27
CN103248906B (en) 2015-02-18

Similar Documents

Publication Publication Date Title
CN103248906B (en) Method and system for acquiring depth map of binocular stereo video sequence
CN104574375B (en) Image significance detection method combining color and depth information
CN102098526A (en) Depth map calculating method and device
CN102930296B An image recognition method and device
CN101443817B (en) Method and device for determining correspondence, preferably for the three-dimensional reconstruction of a scene
CN102665086B (en) Method for obtaining parallax by using region-based local stereo matching
CN101610425B (en) Method for evaluating stereo image quality and device
CN104756491A (en) Depth map generation from a monoscopic image based on combined depth cues
CN110189294B (en) RGB-D image significance detection method based on depth reliability analysis
CN101651772A (en) Method for extracting video interested region based on visual attention
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN104994375A (en) Three-dimensional image quality objective evaluation method based on three-dimensional visual saliency
CN101877143A (en) Three-dimensional scene reconstruction method of two-dimensional image group
CN103136748B A feature-map-based objective quality evaluation method for stereo images
CN104574404A (en) Three-dimensional image relocation method
CN109493373B (en) Stereo matching method based on binocular stereo vision
CN107689060A (en) Visual processing method, device and the equipment of view-based access control model processing of destination object
CN103260043A (en) Binocular stereo image matching method and system based on learning
CN108021857B (en) Building detection method based on unmanned aerial vehicle aerial image sequence depth recovery
CN102223545B (en) Rapid multi-view video color correction method
CN104144339B (en) A kind of matter based on Human Perception is fallen with reference to objective evaluation method for quality of stereo images
CN111641822A (en) Method for evaluating quality of repositioning stereo image
CN105898279B An objective quality evaluation method for stereo images
CN107909611A A method for extracting curvature features of space curves using differential geometry
CN105138979A Method for detecting the head of a moving human body based on stereo vision

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1183577

Country of ref document: HK

C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1183577

Country of ref document: HK

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20130814

Assignee: JIANGSU ORIGINAL FORCE COMPUTER ANIMATION PRODUCTION CO., LTD.

Assignor: Graduate School at Shenzhen, Tsinghua University

Contract record no.: 2016440020012

Denomination of invention: Method and system for acquiring depth map of binocular stereo video sequence

Granted publication date: 20150218

License type: Exclusive License

Record date: 20160308

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model