CN102999901B

CN102999901B - Based on the processing method after the Online Video segmentation of depth transducer and system

Info

Publication number: CN102999901B
Application number: CN201210395366.4A
Authority: CN
Inventors: 黄美玉; 陈益强; 纪雯
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2012-10-17
Filing date: 2012-10-17
Publication date: 2016-06-29
Anticipated expiration: 2032-10-17
Also published as: CN102999901A

Abstract

The invention discloses the processing method after a kind of Online Video based on depth transducer is split and system, the method includes: step 1, depth transducer extracts feature based on the depth image of frame of video and its correspondence, described feature is carried out frame of video prospect, background segment, obtains bianry image；Step 2, detects the prospect hole in this binary segmentation image and fills, obtain the bianry image after prospect holes filling；Step 3, carries out border optimization to the bianry image after this prospect holes filling, obtains the bianry image after optimizing；Step 4, merges by the bianry image after this optimization with virtual background and described frame of video, generates virtual reality fusion image.The invention solves the Online Video based on depth transducer to be segmented in the discontinuous place of the degree of depth and easily make mistakes and prior art exists the problem attended to one thing and lose sight of another in accuracy and real-time, it is provided that the high-quality Online Video based on depth transducer of a kind of requirement of real time splits post-processing approach and virtual reality fusion system.

Description

Based on the processing method after the Online Video segmentation of depth transducer and system

Technical field

The present invention relates to video content analysis, image procossing and computer vision field, particularly relate to the processing method after a kind of Online Video based on depth transducer is split and system.

Background technology

Along with the development of general fit calculation technology and video coding technique and wideband network technology, exchange and the communication jointly of the people being positioned at different place on the internet by long-distance video realization become the new focus of 21 century, and show wide application prospect.Except the application category of traditional Executive Council and office meeting, the mutual range of application of long-distance video has spread over tele-medicine, long-distance education, remote business meeting and law etc. field.Recent years, long-distance video is mutual progressively to providing immersion to experience development, purpose is that the person of letting on has body to face the sensation of wonderland, but current long-distance video there is also a lot of problem, and not as " face-to-face " exchange nature, one of them important problem is exactly that exchange possesses serious spatial separation sense, owing to video interactive user is in different locus, therefore the natural fusion of multiple scene can not be realized by simple image mosaic technology, thus session people can not be allowed to feel, all session members are all among same session space, produce the estrangement of soul.Real-time high-quality Online Video cutting techniques is by foreground extraction accurately, and can realize the user of different physical location just as being in same virtual session space in conjunction with virtual reality fusion technology.

Online Video refers to the prospect (mostly being human body) extracted in Online Video, its objective is to separate object (i.e. prospect) interested for user with the other parts (i.e. background) in frame of video, prospect is carried out special handling, as background is replaced and virtual reality fusion.So-called virtual reality fusion, refers to and the prospect of extraction and virtual scene is fused in a unified three dimensions.In order to obtain high-quality syncretizing effect, the exact boundary of object have to be obtained for the dividing method of foreground extraction, the alpha value of the result available pixel of foreground extraction represents, alpha value is background equal to 0 expression pixel, and alpha value is prospect equal to 1 expression pixel.For soft segmentation or stingy figure, the successive value between alpha value desirable 0 to 1.

Owing to Online Video segmentation can not have user mutual, and speed and the robustness of algorithm are had higher requirement by it, therefore up to the present also in very primary conceptual phase.In order to reach in real time, Online Video dividing method is difficult to use for reference the technology of image and video matting, and input picture can only be carried out binary segmentation frame by frame.A kind of binary segmentation method is the energy minimization problem that segmentation problem is converted into a markov random file, and uses figure cutting algorithm to solve rapidly.In order to obtain high-quality binary segmentation, traditional method is to extract the multiple feature that can be used for distinguishing foreground and background, and these features is dissolved in existing image segmentation framework.Conventional feature has distribution of color, image border, Background etc..In addition, Online Video segmentation can also adopt the feature that some are new, such as depth information.Owing to depth information is for the robustness of illumination variation, therefore, utilize depth information helpful to improving segmentation result.But, the real-time deep image obtained due to depth transducer is easily made mistakes at the discontinuous place of the degree of depth, and highly unstable, thus causing that flicker occurs near border in segmentation result.

In order to obtain good visual effect, improving the foreground segmentation result based on depth information, a kind of method is to merge the multiple extraneous information extracted in coloured image upon splitting to improve segmentation precision.Owing to front, background border place are exactly often the discontinuous place of the degree of depth, therefore there is serious mistake segmentation.A kind of post-processing approach [1] is to utilize the local color model of boundary pixel and boundary model to estimate its mixing alpha value, owing to the method can be adaptively adjusted the width of borderline region, make sharpness of border but not stiff, ensure that the accuracy of single frames segmentation result and slickness to a certain extent.But the method does not utilize the sequential segmentation result of frame of video, it is difficult to ensure the concordance of Video segmentation sequential.Adopt the method scratching figure can also border be optimized, but the method speed of service scratching figure is relatively slow, is difficult to meet the requirement of real-time of Online Video segmentation.

To sum up, it can be seen that the method splitting post processing at present is difficult to reach a balance in accuracy and on spending in real time, it is difficult to meet the requirement of Online Video segmentation.

Summary of the invention

It is an object of the invention to: solve to be segmented in, for the Online Video based on depth transducer, the phenomenon easily made mistakes in border, and there is the problem attended to one thing and lose sight of another in accuracy and real-time in prior art, thus providing the high-quality Online Video based on depth transducer of a kind of requirement of real time to split post-processing approach and a set of virtual reality fusion system.

For achieving the above object, the present invention proposes the processing method after a kind of Online Video based on depth transducer is split, including:

Step 1, depth transducer extracts feature based on the depth image of frame of video and its correspondence, described feature is carried out frame of video prospect, background segment, obtains bianry image, and in this bianry image, 0 represents that pixel is background, and 1 represents that pixel is prospect；

Step 2, detects the prospect hole in this bianry image and fills, obtain the bianry image after prospect holes filling；

Step 3, carries out border optimization to the bianry image after this prospect holes filling, obtains the bianry image after optimizing；

Step 4, merges by the bianry image after this optimization with virtual background and described frame of video, generates virtual reality fusion image.

Described step 2 includes:

Step 201, carries out contour detecting to described bianry image, and note profile number is Num, initializes profile enumerator n=1；

Step 202, it is judged that whether n is less than or equal to described profile number Num, when n is less than or equal to described profile number Num, performs step 203；Otherwise perform step 3；

Step 203, carries out labelling to the interior zone of the n-th profile, and remembers that this profile and internal inclusion region thereof are a hole；

Step 204, the number of the non-zero depth pixel of the region correspondence position of statistics described hole；

Step 205, it is judged that whether described number is zero, performs step 206 during non-zero, otherwise perform step 211；

Step 206, calculates hole contour edge and weight thereof；

Step 207, calculates perforated context similarity and weight thereof；

Step 208, is weighted described hole contour edge and weight, described hole regional background similarity and weight thereof, obtains hole context similarity；

Step 209, it is judged that whether described hole context similarity is less than given threshold value T^b, less than time, perform step 210, otherwise perform step 211；

Step 210, carries out prospect and fills the bianry image after obtaining described prospect holes filling described hole；

Step 211, performs to add an operation to profile enumerator n, and returns step 202.

Threshold value T in described step 209^bIt is 0.3.

Described step 3 includes:

Step 301, obtains the prospect of bianry image after described prospect holes filling, background border transitional region；

Step 302, calculates the local alpha value of each pixel in described border transition region, and alpha value herein has used for reference the definition in soft segmentation or stingy figure, for reflecting the synthesis situation of foreground and background.Successive value between alpha value desirable 0 to 1, it is more similar to background that alpha value more levels off to 0 expression pixel, and it is more similar to prospect that alpha value more levels off to 1 expression pixel；

Step 303, calculates the probability of motion of the relative front cross frame of each pixel in described border transition region；

Step 304, with probability of motion for weights, calculates the sequential segmentation result of each pixel in described border transition region and the weighted sum of local alpha value, obtains mixing alpha value；

Step 305, it is judged that whether described mixing alpha value is more than given threshold value T^f, more than time, the pixel value of correspondence position on bianry image is set to 1, represents that this pixel is prospect；Otherwise the pixel value of correspondence position on bianry image is set to 0, represents that this pixel is background；

Step 306, obtains the bianry image after border optimizes according to described step 305.

Threshold value T in described step 305^fIt is 0.5.

Present invention also offers the process system after a kind of Online Video based on depth transducer is split, including:

Before Online Video, background segment module, the depth image based on frame of video and its correspondence extracts feature, described feature is carried out frame of video prospect, background segment obtains bianry image；

Check packing module, for the prospect hole in this binary segmentation image is detected and fill the bianry image after obtaining prospect holes filling；

Optimization process module, for the bianry image after this prospect holes filling carries out border optimization, obtains the bianry image after optimizing；

Virtual reality fusion module, for the bianry image after described optimization merges virtual background and described frame of video, generates virtual reality fusion synthetic video.

Described inspection packing module includes:

Profile detection module, for described bianry image is carried out contour detecting, note profile number is Num, initializes profile enumerator n=1；

Whether first judge module, be used for the profile number judging the n-th profile less than or equal to described profile number Num, when n is less than or equal to described profile number Num, performs hole marks module, otherwise performs optimization process module；

Hole marks module, for the interior zone of described n-th profile carries out labelling, and remembers that this profile and internal inclusion region thereof are a hole；

Statistical module, for adding up the number of the non-zero depth pixel of the region correspondence position of described hole；

Second judge module, is used for judging whether number is zero, performs computing module during non-zero, otherwise performs to add a module；

First computing module, for calculating hole contour edge and weight, calculating perforated context similarity and weight thereof, and utilize described hole contour edge and weight, described hole regional background similarity and weight thereof to be weighted, obtain hole context similarity；

3rd judge module, for judging that whether described hole context similarity is less than given threshold value T^bIf, less than, then enter packing module, otherwise enter and add a module；

Packing module, carries out prospect and fills the bianry image after obtaining prospect holes filling described hole；

Add a module, for performing to add an operation to profile enumerator n, and return judge module 1.

Wherein, the threshold value T in described 3rd judge module^bIt is 0.3.

Described optimization process module includes:

Acquisition module, is used for the prospect of bianry image after obtaining described prospect holes filling, background border transitional region；

Second computing module, for calculating the probability of motion of the relative front cross frame of each pixel in the local alpha value of each pixel in described border transition region, described border transition region, and with probability of motion for weights, calculate the sequential segmentation result of each pixel in described border transition region and the weighted sum of local alpha value, obtain mixing alpha value, alpha value herein has used for reference the definition in soft segmentation or stingy figure, for reflecting the synthesis situation of foreground and background.Successive value between alpha value desirable 0 to 1, it is more similar to background that alpha value more levels off to 0 expression pixel, and it is more similar to prospect that alpha value more levels off to 1 expression pixel；

Before, background judge module, for judging that whether described mixing alpha value is more than given threshold value T^f, more than time, the pixel value of correspondence position on bianry image is put 1, represents that this pixel is prospect；Otherwise the pixel value of correspondence position on bianry image is set to 0, represents that this pixel is background；

Bianry image obtains module, for obtaining the bianry image after border optimizes according to described foreground and background.

Wherein, threshold value T before described, in background judge module^fIt is 0.5.

The beneficial effects of the present invention is: the present invention has fully excavated the reason that the Online Video segmentation result based on depth transducer is made mistakes, the hole being mistaken for background for the prospect caused due to degree of depth loss gives a kind of prospect cavity detection filling algorithm, and for due to front, background border depth estimation is inaccurate or loses the mistake point phenomenon that causes and gives and a kind of merge sequential, color, the border optimized algorithm of border and movable information, in conjunction with above two algorithm, the present invention efficiently can improve the quality of the Online Video segmentation result based on depth transducer in real time, and can guarantee that segmentation result concordance along time shaft, avoid video flashes.The prospect cavity detection algorithm of the present invention had both considered the essential reason of generation prospect hole, it is also contemplated that the essential attribute of prospect hole, namely contour edge should be as far as possible little, and context similarity also should be as far as possible low, it is thus possible to the hole of identification prospect well, wiping out background hole.Additionally, the border optimized algorithm of the present invention have employed various features, on the basis of the local alpha value estimated by local color model and boundary model, merge sequential segmentation result further, and using Weighted Edges frame difference with as probability of motion, local alpha value and sequential segmentation result are carried out Decision fusion such that it is able to ensure the temporal consistency of segmentation result.

Describe the present invention below in conjunction with the drawings and specific embodiments, but not as a limitation of the invention.

Accompanying drawing explanation

Fig. 1 is the process flow figure after the Online Video segmentation of the present invention；

Fig. 2 is the process system schematic after the Online Video segmentation of the present invention；

Fig. 3 is based on the Online Video segmentation result exemplary plot of depth transducer；

Fig. 4 is the flow chart of virtual reality fusion system approach；

Fig. 5 is based on the Online Video segmentation result post processing block flow diagram of depth transducer.

Detailed description of the invention

Recent years, due to the volume miniaturization gradually of depth transducer, cost price is also being gradually lowered, and therefore uses the depth information that depth transducer directly obtains to assist Video segmentation to become pratical and feasible.Depth information, for the robustness of illumination variation and dynamic shadow itself, will improve the quality of image segmentation.Fig. 3 is based on the Online Video segmentation result exemplary plot of a certain frame of video that Kinect depth transducer adopts the scene cut application programming interfaces in OpenNI to obtain, wherein Fig. 3 (a) is frame of video, Fig. 3 (b) is the depth image corresponding with frame of video obtained by depth transducer, Fig. 3 (c) is based on the prospect that Range Image Segmentation goes out, Fig. 3 (d) amplification shows the segmentation result of marked region in Fig. 3 (c), can be seen that from Fig. 3 (c), even if the Online Video segmentation based on depth transducer can also obtain good segmentation result at complex scene, but the mistake at boundary divides phenomenon ratio more serious, and inside prospect, there is also some prospect holes being mistaken for background.Occur that the depth information obtained based on depth transducer that has its source in of above-mentioned phenomenon is easily made mistakes at the discontinuous place of the degree of depth or loses.

In order to solve the problems referred to above, the invention provides the processing method after a kind of Online Video based on depth transducer is split, for improving the Online Video segmentation result based on depth transducer.Fig. 1 is the process flow figure after the Online Video segmentation of the present invention.A kind of based on the processing method after the Online Video segmentation of depth transducer as shown in Figure 1, including:

The hole that the present invention is mistaken for background first against the prospect caused because of degree of depth loss gives a kind of distinguished number based on contour edge and region background color similarity, and it is correctly filled to prospect.Then provide the border optimized algorithm that the sequential of a kind of high-efficiency high-quality is consistent, merge sequential, color, border and movable information and recalculate the border pixel values of the segmentation result based on depth image, eliminate border point phenomenon by mistake.This post-processing approach is embedded in a set of virtual reality fusion system by the present invention further, it is achieved immersion long-distance video is mutual.

Further, described step 2 includes:

Step 203, carries out labelling to the inside inclusion region of the n-th profile, and remembers that this profile and internal inclusion region thereof are a hole；

Step 206, calculates hole contour edge and weight thereof；

Step 207, calculates perforated context similarity and weight thereof；

Step 210, carries out prospect at described hole and fills the bianry image after obtaining prospect holes filling；

Step 211, performs to add an operation to profile enumerator n, and returns step 202；

Described step 3 includes:

Step 302, calculate the local alpha value of each pixel in described border transition region, alpha value herein has used for reference the definition in soft segmentation or stingy figure, for reflecting the synthesis situation of foreground and background, successive value between alpha value desirable 0 to 1, it is more similar to background that alpha value more levels off to 0 expression pixel, and it is more similar to prospect that alpha value more levels off to 1 expression pixel；

Step 306, obtains the bianry image after border optimizes according to the foreground and background of described step 305.

Similarly to the prior art, flow chart as shown in Figure 4, is divided into 3 key steps to virtual reality fusion system in the present invention: before video, background segment, segmentation post processing, virtual reality fusion.Wherein before video, background segment be based on depth transducer and realize, concrete implementation method is referred to the existing Online Video cutting techniques based on depth information, is not described in detail herein.Also had a lot of relevant research about virtual reality fusion, concrete implementation method is referred to these achievements in research.Present invention primarily contemplates that the Online Video segmentation result based on depth transducer occurs lose, because of the degree of depth, the prospect hole being mistaken for background and the border caused because depth estimation is inaccurate point phenomenon and prior art by mistake are difficult in accuracy and real-time to reach the present Research that balances, provide a kind of method splitting post processing, and it is described in detail.The input of this post processing is depth image and the initial binary segmentation result of frame of video and correspondence thereof, in addition to keep temporal consistency, this input also includes sequential frame of video and sequential segmentation result, and output is the bianry image through post processing.Post processing includes prospect cavity detection and filling and front, background border optimization, as it is shown in figure 5, wherein, bianry image is as checking and the input of profile detection module in packing module for concrete flow chart；Depth image is as checking and the input of statistical module in packing module；Frame of video is as the input checking the computing module optimizing module with packing module and border, for calculating hole contour edge and weight, calculating perforated context similarity and weight thereof, and for calculating the probability of motion of the relative front cross frame of each pixel in the local alpha value of each pixel in described border transition region, described border transition region；Sequential frame of video optimizes the input of the computing module of module as border, for calculating the probability of motion of the relative front cross frame of each pixel in described border transition region；Sequential local alpha value image optimizes the input of the computing module of module as border, for calculating the mixing alpha value of each pixel in described border transition region..

First the present invention introduces prospect cavity detection and filling.Shown in rectangular area (the upper left region in Fig. 3 (d)) in Fig. 3 (c), when the hair of prospect (target person) hangs down loosely on the shoulders, there is the phenomenon that the degree of depth is lost in the intersection at hair Yu shoulder, forming prospect and be mistaken for the hole of background, the appearance of these holes will largely effect on the precision of segmentation.When these prospect holes are sufficiently small time, generally can adopt mathematical morphological operation, namely dilation operation is utilized to realize hole repairing, but, when prospect hole is bigger, use the dilation operation of aforesaid big mask structure when filling up prospect hole, it is likely that to cause and carry out the background area in prospect filling by mistake.Owing to the action of standing akimbo of target person also can be internally formed hole in prospect, shown in the rectangular area (the mid portion region in Fig. 3 (d)) in Fig. 3 (c), so the institute's hole within prospect broadly can not being filled.The characteristic of the depth data that the present invention obtains according to depth transducer and the intrinsic propesties of prospect hole, give the distinguished number of a kind of prospect hole.

The present invention first passes through profile algorithm and finds all profiles in binary segmentation image, then the inside inclusion region of each profile carries out labelling successively, and each profile and interior zone thereof are designated as a hole Φ.For each hole, first the present invention travels through the depth data of statistics position, perforated, judge whether that the degree of depth is not the pixel of 0, if existed, then this hole is made without filling, because this hole is not due to what degree of depth disappearance produced, otherwise this hole becomes candidate's prospect hole, and it is further differentiated by the context similarity that the weighted sum based on contour edge and region background color similarity calculates, if the context similarity of candidate's prospect hole is less than threshold value T^b, then it is prospect by this candidate's prospect holes filling in bianry image.

One by one the feature utilized in above-mentioned prospect hole distinguished number is described explanation below.

Owing to the contour edge of prospect hole is typically small, the contour edge of background hole is relatively big, before therefore contour edge is used as one-dimensional differentiation, the feature of background hole.Gradient is usually used in calculating the edge of pixel, but owing to dividing phenomenon serious based on the Video segmentation of the depth information mistake on border, even the background hole therefore within prospect, its profile is also not necessarily the border of real prospect and background, mean that the gradient of contour pixel might not be big, thus before will be unable to distinguish based on the gradient of profile, background hole.Based on above-mentioned consideration, the invention provides the approximate data at a kind of edge, specifically, the present embodiment adopts sharpness of border degree to estimate the edge of pixel.First frame of video converted to gray level image by the present embodiment and it is carried out gaussian filtering, then greyscale color space being divided into L=32 color sub-spaces B^l(l=1,2 ..., L).Note N_p(L_s) be the neighborhood window size of pixel p it is L_sNeighborhood, then N_p(Ls) foreground pixel and background pixel must be comprised in, because p is the pixel on profile simultaneously.Note N_p(L_s) in foreground pixel and background pixel sample set respectivelyWithIf N_p(L_s) in color sub-spaces B^lComprise sample set simultaneouslyWithIn element, then it is assumed that it is ambiguous.Note N_pFor the sample set that all ambiguous color sub-spaces compriseWithThe sum of middle color card, then the sharpness of border degree of pixel p is:

γ_{p} = 1 - \frac{N_{p}}{L_{s}^{2}},

Thus the edge e of pixel p_p=γ_p, the edge of whole profile can be obtained by the mean value calculation at the edge of all contour pixels.Computing formula is as follows:

e_{Φ^{c}} = \frac{1}{M} \underset{p &Element; Φ^{c}}{Σ} e_{p},

Wherein Φ^cBeing the profile of Φ, M is hole profile Φ^cOn the total number of pixel.

Owing to the regional background similarity of prospect hole is low, the regional background similarity of background hole is high, before therefore regional background similarity is also used as one-dimensional differentiation, the feature of background hole.In order to calculate the context similarity of perforated, it is necessary first to the background color of pixel is modeled.It not absolute rest due to scene, the such as interference factor such as illumination variation and dynamic shadow all can occur, it is therefore desirable to adopt the model of real-time update that background color is modeled.The present invention adopts accumulation difference color histogram that the background color of each pixel is modeled.For each pixel in frame of video, and if only if, and it is labeled as background in the initial segmentation of t, and when its degree of depth is not 0, the color of this pixel is referred to as background color, and carries out accumulation histogram modeling.In the present embodiment, it is 32 sub-blocks that greyscale color space is divided evenly, and the difference color histogram of t pixel p is:

H_{p} (t) = [h_{p}^{1} (t), h_{p}^{2} (t), . . ., h_{p}^{L} (t)],

L=32

WhereinCharacterize the distribution of color of t pixel p at the l color block B^lIn frequency, computing formula is:

h_{p}^{l} (t) = β * h_{p}^{l} (t - 1) + δ (l_{p} (t) = l,

a_{p}^{b} (t) = 0,

d_p(t)≠0)

Wherein β=0.95, is used for weakening historical background color for effect current, background color model.The effect of δ (.) function is when parameter is true value expression formula, and functional value is 1, is otherwise 0.L_p(t),d_pT () represents the pixel p color sub-spaces label in t, initial binary dividing mark and depth value respectively.According to formulaDifference color histogram H to above-mentioned each pixel p_pT () performs normalization operation after, the similarity of each candidate's hole and difference color histogram can be calculated by following formula and obtain:

Wherein Φ^rBeing the interior zone of hole Φ, N is perforated Φ^rThe total number of pixel.

In the present embodiment, use contour edgeWith regional background similarityWeighted sum p_ΦCalculate the similarity of hole and background.Owing to the reliability of the pure and fresh degree in border based on color block and the color complexity of regional area are closely coupled, thereforeWeight w_cDepend on the number of neighborhood non-zero color subspace, can be calculated by following formula and obtain:

w_{c} = 1 - \frac{1}{M} (\frac{Σ_{p &Element; Φ^{c}} N_{p}^{n}}{L}),

WhereinCharacterize the number of non-zero color block in the neighborhood of pixel p.With w_cCalculating the same, background pixelWeight w_rRelevant to the confidence level of background color model, scene that and if only if exists a small amount of illumination variation, when namely the number of the non-zero color subspace in the background color model of perforated is less, background color model is only reliably.Therefore, w_rCan calculate according to the following formula:

w_{r} = 1 - \frac{1}{N} \underset{p &Element; Φ^{r}}{Σ} (\frac{Σ_{l = 1}^{L} δ (h_{p}^{l} (t) &NotEqual; 0)}{L})

Thus the similarity p of hole and background_ΦCan be obtained by following formula estimation:

p_{Φ} = w * {lh}_{Φ^{r}} + (1 - w) * e_{Φ^{c}},

Wherein,Work as p_ΦLess than given empirical value T^bTime, this area filling is prospect by the present embodiment.What deserves to be explained is, T in the present embodiment^bIt is set to 0.3.

It is described below and the present invention is directed to owing to the border of foreground and background occurs that the mistake that the degree of depth is lost or depth estimation mistake causes splits the border optimized algorithm that the sequential provided is consistent.Due to the intersection in foreground and background, the degree of depth is discrete, and therefore these intersections often lose depth information, occurs that prospect is mistaken for the phenomenon of background.On the other hand, even if obtaining depth information at intersection, also often occur estimating inaccurate situation, it is easy to occur being mistaken for background the phenomenon of prospect.Can significantly find out that the depth information that prospect head border is lost is maximum from Fig. 3 (c) rectangular area (the bottom left section region Fig. 3 (d)), misjudgment phenomenon is serious, additionally, then there is the phenomenon that background is significantly judged to prospect by mistake in hand edge or garment edge in prospect, in Fig. 3 (c) shown in rectangular area (right areas in Fig. 3 (d)).Although the mistake frame by frame caused by above-mentioned reason is smaller, but flicker can be caused in video, have a strong impact on visual effect.In order to eliminate these segmentations by mistake, it is necessary to utilize the color image information of corresponding depth image that the pixel of the juncture area of foreground and background is re-started labelling.In the present embodiment, adopt the thought scratching figure, first calculate the alpha value of juncture area, then again through certain threshold value quantizing alpha value, it is achieved the binary segmentation of borderline region.

The present embodiment adopts the method proposed in the first trifle of the second section of paper " Online Video segmentation real-time post-treatment " be designated as method [1], find the pixel near border.Concrete implementation method is: the set of note boundary pixel is Ω, then Ω can be defined by following formula:

Ω(L_e)={p|τ₀< s_p<τ₁},

s_{p} = \frac{1}{L_{e}^{2}} \underset{q &Element; N_{p} (L_{e})}{Σ} a_{q}^{s},

Wherein N_p(L_e) be the window size of pixel p it is L_e×L_eNeighborhood；For pixel q binary segmentation label after prospect holes filling；S_pMeansigma methods for pixel binary segmentation labels all in the neighborhood of pixel p；Ω (L_e) for the collection of pixels in boundaries on either side one belt-like zone, the width in this region is by parameter τ₀, τ₁Control, 0 < τ₀<τ₁< 1.

The depth information obtained at the discontinuous place of the degree of depth due to depth transducer is highly unstable, even if prospect remains stationary, its depth information also there will be very big fluctuation, so that the segmentation result based on depth information is also rather unstable.Therefore, the information being based only upon present frame removes to estimate the alpha value of pending pixel, and when prospect is static, the alpha value of front and back two frame is also likely to be different, can not eliminate interframe flicker.In order to keep the concordance of segmentation result, it is possible to use time sequence information revises the alpha value of present frame.Further, before and after certain pixel, the motion of two frames is more little, and segmentation result should be more similar.Based on above-mentioned analysis, present embodiments provide a kind of border optimized algorithm based on sequential, color, border and movable information.Local color model that this algorithm proposes first by existing method [1] and boundary function calculate the local alpha value of each pending pixel, then a kind of simple estimation method is adopted, estimate the probability of motion figure of present frame and adjacent front cross frame, then using probability of motion as weights, the weighted sum mixing alpha value as pending pixel of local alpha value and sequential alpha value is asked for.

One by one the feature utilized in above-mentioned border optimized algorithm is described explanation below.

The present embodiment adopts the weighted sum of color alpha value and border alpha value to calculate the local alpha value of pending pixel.Wherein color alpha valueIt is based on the local color model calculating of pixel.If N_p(L_b) be the window size of pixel p it is L_bNeighborhood, N_p(L_s) in foreground pixel and background pixel sample set respectivelyWithThen color alpha valueCan be calculated by following formula:

α_{p}^{c} = \frac{P (c_{p} | M_{p}^{F})}{P (c_{p} | M_{p}^{F}) + P (c_{p} | M_{p}^{B})},

Wherein c_pIt is the RGB color of pixel p,WithBeing foreground color similarity and the background color similarity of pixel p respectively, wherein color model is uniform piecemeal mixed Gauss model.

Due to prospect, background color very close to region, color alpha value has obvious mistake, causes that wide translucent area and border are rough.In this case, it should defer to the result of binary segmentation.Existing method [1] uses one or four bound of parameter functions to calculate the border alpha value of pixel, and intuitively, a pixel distance border is more remote, then it should be more low with the similarity of prospect, and border alpha value also should be less.Specifically, border alpha value is calculated by following formula:

Parameter δ_p,a_p,b_p,c_pCalculation procedure be referred to existing method [1].

In the present embodiment, the weight of color alpha value depends on the definition on border, and the weight of border alpha value depends on the error rate of binary segmentation, and the computing formula of local alpha value is as follows:

a_{p}^{l} = (1 - w_{p}) α_{p}^{c} + w_{p} α_{p}^{b},

Wherein

w_{p} = \frac{w_{p}^{b}}{w_{p}^{b} + w_{p}^{c}}, w_{p}^{b} = \frac{1}{9} \underset{q &Element; N_{p} (3)}{Σ} | α_{q}^{c} - α_{q}^{s} |,

w_{p}^{c} = γ_{p} .

Owing to frame of video is caught continuously, therefore having certain relatedness between frame and frame, segmentation result there is also certain dependence.Intuitively, if pixel front and back two frame remains stationary, then its classification should keep consistent.Therefore the sequential alpha value seriality at present frame can be weighed according to the probability of motion of pixel.Owing to the estimation process of sequential local alpha value have employed local color model and local frontier properties, the motion summation of the neighborhood of each pixel is therefore used more can accurately to reflect the motion of a pixel.The present embodiment adopts the Weighted Edges frame difference of neighborhood of pixels and calculates the probability of motion of pixel.Assume that pixel p probability of motion from t-1 moment to t is defined asThen its computing formula is as follows:

p_{p}^{m} (t - 1) = \frac{\underset{q &Element; N_{p} (L_{s})}{Σ} f_{q} (t - 1) * e_{q} (t)}{\underset{q &Element; N_{p} (L_{s})}{Σ} e_{q} (t)},

Wherein f_q(t-1) it is that pixel q is poor relative to the frame in t-1 moment in t, in order to as probability Estimation, frame difference herein is that the frame after normalization is poor, e_qT () is then the gradient of current time pixel q.In order to remove noise, frame difference is not directly the gray level image of front and back two frame is made difference to obtain, but front and back two two field picture after carrying out Gaussian smoothing is made difference and obtains, and namely the computing formula of frame difference is as follows:

f_p(t-1)=Norm(|G(g_p(t))-G(g_p(t-1)) |),

Wherein Norm (.) represents normalized function, and G (.) represents the gaussian kernel function that yardstick is 0.8, g_p(t) and g_p(t-1) color value of respectively t and t-1 moment pixel p.The edge of image is then obtain by carrying out convolution with the first derivative of Gaussian function, it may be assumed that

e_{p} (t) = | &dtri; G (g_{p} (t)) |,

The present embodiment adopt same method calculate pixel p in the t probability of motion relative to the t-2 momentWhat deserves to be explained is, pixel p is in the t probability of motion relative to tIt is 0.

The present embodiment uses the weighted sum of local alpha value and sequential alpha value to estimate the mixing alpha value of each pending pixel, and wherein weight coefficient is determined by probability of motion, and the specific formula for calculation of mixing alpha value is as follows:

a_{p}^{h} (t) = {\overset{\cdot}{p}}_{p} (t - 2) a_{p}^{l} (t - 2) + {\overset{\cdot}{p}}_{p} (t - 1) a_{p}^{l} (t - 1) + {\overset{\cdot}{p}}_{p} (t) a_{p}^{l} (t),

Wherein WithIt is pixel p office's alpha value in t-2 moment, t-1 moment and t respectively, It is the weight coefficient after normalization, it may be assumed that

{\overset{\cdot}{p}}_{p} (t - 2) = \frac{p_{p} (t - 2)}{p_{p} (t - 2) + p_{p} (t - 1) + p_{p} (t)}

{\overset{\cdot}{p}}_{p} (t - 1) = \frac{p_{p} (t - 1)}{p_{p} (t - 2) + p_{p} (t - 1) + p_{p} (t)}

{\overset{\cdot}{p}}_{p} (t) = \frac{p_{p} (t)}{p_{p} (t - 2) + p_{p} (t - 1) + p_{p} (t)},

Wherein

p_{p} (t - 2) = 1 - p_{p}^{m} (t - 2),

p_{p} (t - 1) = 1 - p_{p}^{m} (t - 1),

p_{p} (t) = 1 - p_{p}^{m} (t)

It is the pixel p probability of motion in t relative t-2 moment, t-1 moment and t respectively, the computing formula of above-mentioned mixing alpha value shows that pixel p is more big relative to the probability of motion in a certain moment in t, and the alpha value in a certain moment is more little in the seriality of t.

The present invention also proposes the process system after a kind of Online Video based on depth transducer is split, as in figure 2 it is shown, Fig. 2 is the process system schematic after the Online Video segmentation of the present invention, this system includes:

Check packing module, for the prospect hole in this bianry image is detected and fill the bianry image after obtaining prospect holes filling；

Described inspection packing module includes:

Hole marks module, the inside inclusion region for described n-th profile is carried out carries out labelling, and remembers that this profile and internal inclusion region thereof are a hole；

Packing module, fills the bianry image after obtaining prospect holes filling for described hole carries out prospect；

Described optimization process module includes:

Second computing module, for calculating the local alpha value of each pixel in described border transition region, the probability of motion of the relative front cross frame of each pixel in described border transition region, and with probability of motion for weights, calculate the sequential segmentation result of each pixel in described border transition region and the weighted sum of local alpha value, obtain mixing alpha value, wherein alpha value has used for reference the definition in soft segmentation or stingy figure, for reflecting the synthesis situation of foreground and background, successive value between alpha value desirable 0 to 1, it is more similar to background that alpha value more levels off to 0 expression pixel, it is more similar to prospect that alpha value more levels off to 1 expression pixel；

Before, background judge module, for judging that whether described mixing alpha value is more than given threshold value T^f, more than time, the pixel of correspondence position on bianry image is set to 1, represents that this pixel is prospect；Otherwise the pixel of correspondence position on bianry image is set to 0, represents that this pixel is background；

Similarly to the prior art, flow chart as shown in Figure 4, is divided into 3 key steps to virtual reality fusion system in the present invention: before video, background segment, segmentation post processing, virtual reality fusion.Wherein before video, background segment be based on depth transducer and realize, concrete implementation method is referred to the existing Online Video cutting techniques based on depth information, is not described in detail herein.Also had a lot of relevant research about virtual reality fusion, concrete implementation method is referred to these achievements in research.Present invention primarily contemplates that the Online Video segmentation result based on depth transducer occurs lose, because of the degree of depth, the prospect hole being mistaken for background and the border caused because depth estimation is inaccurate point phenomenon and prior art by mistake are difficult in accuracy and real-time to reach the present Research that balances, provide a kind of method splitting post processing, and it is described in detail.The input of this post processing is depth image and the initial binary segmentation result of frame of video and correspondence thereof, in addition to keep temporal consistency, the input of this process also includes sequential frame of video and sequential segmentation result, and output is the bianry image through post processing.This process comprises prospect cavity detection and filling and front, background border optimization, and concrete flow chart is as shown in Figure 5.

The present invention first passes through profile algorithm and finds all profiles in binary segmentation image, and the inside inclusion region then successively each profile carried out carries out labelling, and each profile and interior zone thereof are designated as a hole Φ.For each hole, first the present invention travels through the depth data of statistics position, perforated, judge whether that the degree of depth is not the pixel of 0, if existed, then this hole is made without filling, because this hole is not due to what degree of depth disappearance produced, otherwise this hole becomes candidate's prospect hole, and it is further differentiated by the context similarity that the weighted sum based on contour edge and region background color similarity calculates, if the context similarity of candidate's prospect hole is less than threshold value T^b, then it is prospect by this candidate's prospect holes filling in bianry image.

Owing to the contour edge of prospect hole is typically small, the contour edge of background hole is relatively big, before therefore contour edge is used as one-dimensional differentiation, the feature of background hole.Gradient is usually used in calculating the edge of pixel, but owing to dividing phenomenon serious based on the Video segmentation of the depth information mistake on border, even the background hole therefore within prospect, its profile is also not necessarily the border of real prospect and background, mean that the gradient of contour pixel might not be big, thus before will be unable to distinguish based on the gradient of profile, background hole.Based on above-mentioned consideration, the invention provides the approximate data at a kind of edge, specifically, the present embodiment adopts sharpness of border degree to estimate the edge of pixel.First frame of video converted to gray level image by the present embodiment and it is carried out gaussian filtering, then greyscale color space being divided into L=32 color sub-spaces B^l(l=1,2 ..., L).Note N_p(L_s) be the neighborhood window size of pixel p it is L_sNeighborhood, then N_p(L_s) in must comprise foreground pixel and background pixel simultaneously because p is the pixel on profile.Note N_p(L_s) in foreground pixel and background pixel sample set respectivelyWithIf N_p(L_s) in color sub-spaces B^lComprise sample set simultaneouslyWithIn element, then it is assumed that it is ambiguous.Note N_pFor the sample set that all ambiguous color sub-spaces compriseWithThe sum of middle color card, then the sharpness of border degree of pixel p is:

γ_{p} = 1 - \frac{N_{p}}{L_{s}^{2}},

e_{Φ^{c}} = \frac{1}{M} \underset{p &Element; Φ^{c}}{Σ} e_{p},

H_{p} (t) = [h_{p}^{1} (t), h_{p}^{2} (t), . . ., h_{p}^{L} (t)],

L=32

h_{p}^{l} (t) = β * h_{p}^{l} (t - 1) + δ (l_{p} (t) = l,

a_{p}^{b} (t) = 0,

d_p(t)≠0)

w_{c} = 1 - \frac{1}{M} (\frac{Σ_{p &Element; Φ^{c}} N_{p}^{n}}{L}),

w_{r} = 1 - \frac{1}{N} \underset{p &Element; Φ^{r}}{Σ} (\frac{Σ_{l = 1}^{L} δ (h_{p}^{l} (t) &NotEqual; 0)}{L})

p_{Φ} = w * {lh}_{Φ^{r}} + (1 - w) * e_{Φ^{c}},

It is described below and the present invention is directed to owing to the border of foreground and background occurs that the mistake that the degree of depth is lost or depth estimation mistake causes splits the border optimized algorithm that the sequential provided is consistent.Due to the intersection in foreground and background, the degree of depth is discrete, and therefore these intersections often lose depth information, occurs that prospect is mistaken for the phenomenon of background.On the other hand, even if obtaining depth information at intersection, also often occur estimating inaccurate situation, it is easy to occur being mistaken for background the phenomenon of prospect.Can significantly find out that the depth information that prospect head border is lost is maximum from Fig. 3 (c) rectangular area (the bottom left section region Fig. 3 (d)), misjudgment phenomenon is serious, additionally, then there is the phenomenon that background is significantly judged to prospect by mistake in hand edge or garment edge in prospect, in Fig. 3 (c) shown in rectangular area (right areas in Fig. 3 (d)).Although the mistake frame by frame caused by above-mentioned reason is smaller, but flicker can be caused in video, have a strong impact on visual effect.In order to eliminate these segmentations by mistake, it is necessary to utilize the color image information of corresponding depth image that the pixel of the juncture area of foreground and background is re-started labelling.In the present embodiment, adopt the thought scratching figure, first calculate the alpha value of juncture area, then again through certain threshold value quantizing alpha value, realizing the binary segmentation of borderline region, wherein alpha value has used for reference the definition in soft segmentation or stingy figure, for reflecting the synthesis situation of foreground and background.Successive value between alpha value desirable 0 to 1, it is more similar to background that alpha value more levels off to 0 expression pixel, and it is more similar to prospect that alpha value more levels off to 1 expression pixel.

Ω(L_e)={p|τ₀<s_p<τ1},

s_{p} = \frac{1}{L_{e}^{2}} \underset{q &Element; N_{p} (L_{e})}{Σ} a_{q}^{s}

Wherein N_p(L_e) be the window size of pixel p it is L_e×L_eNeighborhood；For pixel q binary segmentation label after prospect holes filling；S_pMeansigma methods for pixel binary segmentation labels all in the neighborhood of pixel p；Ω (L_e) for the collection of pixels in boundaries on either side one belt-like zone, the width in this region is by parameter τ₀, τ₁Control, 0 < τ₀<τ₁<1。

α_{p}^{c} = \frac{P (c_{p} | M_{p}^{F})}{P (c_{p} | M_{p}^{F}) + P (c_{p} | M_{p}^{B})},

a_{p}^{b} = \frac{a_{p}}{1 + e^{(c_{p} - s_{p}) / δ_{p}}} + b_{p},

a_{p}^{l} = (1 - w_{p}) α_{p}^{c} + w_{p} α_{p}^{b},

Wherein

w_{p} = \frac{w_{p}^{b}}{w_{p}^{b} + w_{p}^{c}},

w_{p}^{b} = \frac{1}{9} \underset{q &Element; N_{p} (3)}{Σ} | α_{q}^{c} - α_{q}^{s} |, w_{p}^{c} = γ_{p} .

Owing to frame of video is caught continuously, therefore having certain relatedness between frame and frame, segmentation result there is also certain dependence.Intuitively, if pixel front and back two frame remains stationary, then its classification should keep consistent.Therefore the sequential local alpha value seriality at present frame can be weighed according to the probability of motion of pixel.Owing to the estimation process of sequential local alpha value have employed local color model and local frontier properties, the motion summation of the neighborhood of each pixel is therefore used more can accurately to reflect the motion of a pixel.The present embodiment adopts the Weighted Edges frame difference of neighborhood of pixels and calculates the probability of motion of pixel.Assume that pixel p probability of motion from t-1 moment to t is defined asThen its computing formula is as follows:

p_{p}^{m} (t - 1) = \frac{\underset{q &Element; N_{p} (L_{s})}{Σ} f_{q} (t - 1) * e_{q} (t)}{\underset{q &Element; N_{p} (L_{s})}{Σ} e_{q} (t)},

Wherein f_p(t-1) it is that pixel q is poor relative to the frame in t-1 moment in t, in order to as probability Estimation, frame difference herein is that the frame after normalization is poor, e_qT () is then the gradient of current time pixel q.In order to remove noise, frame difference is not directly the gray level image of front and back two frame is made difference to obtain, but front and back two two field picture after carrying out Gaussian smoothing is made difference and obtains, and namely the computing formula of frame difference is as follows:

f_p(t-1)=Norm(|G(g_p(t))-G(g_p(t-1)) |),

e_{p} (t) = | &dtri; G (g_{p} (t)) |,

a_{p}^{h} (t) = {\overset{\cdot}{p}}_{p} (t - 2) a_{p}^{l} (t - 2) + {\overset{\cdot}{p}}_{p} (t - 1) a_{p}^{l} (t - 1) + {\overset{\cdot}{p}}_{p} (t) a_{p}^{l} (t),

{\overset{\cdot}{p}}_{p} (t - 2) = \frac{p_{p} (t - 2)}{p_{p} (t - 2) + p_{p} (t - 1) + p_{p} (t)}

{\overset{\cdot}{p}}_{p} (t - 1) = \frac{p_{p} (t - 1)}{p_{p} (t - 2) + p_{p} (t - 1) + p_{p} (t)}

{\overset{\cdot}{p}}_{p} (t) = \frac{p_{p} (t)}{p_{p} (t - 2) + p_{p} (t - 1) + p_{p} (t)},

Wherein

p_{p} (t - 2) = 1 - p_{p}^{m} (t - 2),

p_{p} (t - 1) = 1 - p_{p}^{m} (t - 1),

p_{p} (t) = 1 - p_{p}^{m} (t)

Owing to prospect cavity detection provided by the invention and filling algorithm have taken into full account the intrinsic propesties of essence and the prospect hole producing prospect hole, therefore can the hole of identification prospect well, wiping out background hole, and because the calculating process of algorithm is very simple, therefore speed is very fast.On the other hand, border provided by the invention optimized algorithm has merged various features, both background color will be caused to overflow at front, background similar area as stingy nomography, became to obscure without making border as emergence algorithm, simultaneously, being different from existing method [1], the present invention has merged time sequence information, can eliminate the segmentation result upheaval caused because depth estimation is inaccurate well so that the phenomenon of virtual reality fusion video flashes.The more important thing is, the feature adopted due to border provided by the invention optimized algorithm has quick calculation method, therefore, it is possible to meet real-time demand.

Certainly; the present invention also can have other various embodiments; when without departing substantially from present invention spirit and essence thereof, those of ordinary skill in the art can make various corresponding change and deformation according to the present invention, but these change accordingly and deform the protection domain that all should belong to the claims in the present invention.

Claims

1. one kind based on depth transducer Online Video split after processing method, it is characterised in that including:

Step 1, depth transducer extracts feature based on the depth image of frame of video and its correspondence, described feature is carried out frame of video prospect, background segment, obtains bianry image；

Step 2, detects the prospect hole in this bianry image and fills, obtain the bianry image of prospect holes filling；

Step 3, carries out border optimization to the bianry image of this prospect holes filling, obtains the bianry image after optimizing, and wherein said step 3 includes:

Step 301, obtains the prospect of bianry image of described prospect holes filling, background border transitional region；

Step 302, calculate the local alpha value of each pixel in described border transition region, wherein alpha value is for reflecting the synthesis situation of foreground and background, alpha value takes the successive value between 0 to 1, it is more similar to background that alpha value more levels off to 0 expression pixel, and it is more similar to prospect that alpha value more levels off to 1 expression pixel；

Step 305, it is judged that whether described mixing alpha value is more than given threshold value T^f, more than time, the pixel of correspondence position on the bianry image of described prospect holes filling is set to 1, represents that this pixel is prospect；Otherwise the pixel of correspondence position on the bianry image of described prospect holes filling is set to 0, represents that this pixel is background；

Step 306, obtains the bianry image after border optimizes；

Step 4, merges virtual background and described frame of video according to the bianry image after this optimization, generates virtual reality fusion image.

2. the processing method after Online Video segmentation as claimed in claim 1, it is characterised in that described step 2 includes:

Step 206, calculates hole contour edge and weight thereof；

Step 207, calculates perforated context similarity and weight thereof；

Step 210, carries out prospect at described hole and fills the bianry image obtaining prospect holes filling；

3. the processing method after Online Video segmentation as claimed in claim 2, it is characterised in that the threshold value T in described step 209^bIt is 0.3.

4. the processing method after Online Video segmentation as claimed in claim 1, it is characterised in that threshold value T in described step 305^fIt is 0.5.

5. one kind based on depth transducer Online Video split after process system, it is characterised in that including:

Optimization process module, for the bianry image after this prospect holes filling is carried out border optimization, obtaining the bianry image after optimizing, wherein said optimization process module includes acquisition module, the second computing module, front, background judge module, bianry image acquisition module:

Described acquisition module, for obtaining the prospect of bianry image of described prospect holes filling, background border transitional region；

Described second computing module, for calculating the local alpha value of each pixel in described border transition region, the probability of motion of the relative front cross frame of each pixel in described border transition region, and with probability of motion for weights, calculate the sequential segmentation result of each pixel in described border transition region and the weighted sum of local alpha value, obtain mixing alpha value, wherein alpha value is for reflecting the synthesis situation of foreground and background, alpha value takes the successive value between 0 to 1, it is more similar to background that alpha value more levels off to 0 expression pixel, it is more similar to prospect that alpha value more levels off to 1 expression pixel；

Before described, background judge module, for judging that whether described mixing alpha value is more than given threshold value T^f, more than time, the pixel value of correspondence position on the bianry image of described prospect holes filling is set to 1, represents that this pixel is prospect；Otherwise the pixel value of correspondence position on the bianry image of described prospect holes filling is set to 0, represents that this pixel is background；

Described bianry image obtains module, for obtaining the bianry image after border optimizes according to described foreground and background；

Virtual reality fusion module, for merging virtual background and described frame of video according to the bianry image after described optimization, generates virtual reality fusion synthetic video.

6. the process system after Online Video segmentation as claimed in claim 5, it is characterised in that described inspection packing module includes:

Hole marks module, for the inside inclusion region of described n-th profile carries out labelling, and remembers that this profile and internal inclusion region thereof are a hole；

Second judge module, is used for judging whether number is zero, performs the first computing module during non-zero, otherwise performs to add a module；

Described first computing module, for calculating hole contour edge and weight, calculating perforated context similarity and weight thereof, and utilize described hole contour edge and weight, described hole regional background similarity and weight thereof to be weighted, obtain hole context similarity；

Add a module, for performing to add an operation to profile enumerator n, and return the first judge module.

7. the process system after Online Video segmentation as claimed in claim 6, it is characterised in that the threshold value T in described 3rd judge module^bIt is 0.3.

8. the process system after Online Video segmentation as claimed in claim 5, it is characterised in that threshold value T before described, in background judge module^fIt is 0.5.