CN102999901B - Depth-sensor-based online video segmentation post-processing method and system - Google Patents

Depth-sensor-based online video segmentation post-processing method and system

Info

Publication number
CN102999901B
Authority
CN
China
Prior art keywords
foreground
pixel
binary image
background
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210395366.4A
Other languages
Chinese (zh)
Other versions
CN102999901A (en)
Inventor
黄美玉 (Huang Meiyu)
陈益强 (Chen Yiqiang)
纪雯 (Ji Wen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201210395366.4A priority Critical patent/CN102999901B/en
Publication of CN102999901A publication Critical patent/CN102999901A/en
Application granted granted Critical
Publication of CN102999901B publication Critical patent/CN102999901B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Abstract

The invention discloses a depth-sensor-based online video segmentation post-processing method and system. The method includes: step 1, extracting features from a video frame and the corresponding depth image captured by a depth sensor, and performing foreground/background segmentation of the video frame on these features to obtain a binary image; step 2, detecting foreground holes in this binary segmentation image and filling them, to obtain a binary image after foreground hole filling; step 3, performing boundary optimization on the hole-filled binary image, to obtain an optimized binary image; step 4, fusing the optimized binary image with a virtual background and the video frame, to generate a virtual-real fusion image. The invention solves the tendency of depth-sensor-based online video segmentation to fail at depth discontinuities, and the prior art's trade-off between accuracy and real-time performance, providing a high-quality, real-time post-processing method for depth-sensor-based online video segmentation and a virtual-real fusion system.

Description

Depth-sensor-based online video segmentation post-processing method and system
Technical field
The present invention relates to the fields of video content analysis, image processing, and computer vision, and in particular to a post-processing method and system for depth-sensor-based online video segmentation.
Background technology
With the development of ubiquitous computing, video coding, and broadband network technologies, remote video communication among people in different places over the Internet has become a new focus of the 21st century and shows broad application prospects. Beyond traditional boardroom and office meetings, remote video interaction has spread to telemedicine, distance education, remote business meetings, law, and other fields. In recent years, remote video interaction has gradually moved toward providing an immersive experience, the aim being to give participants the feeling of being present in the same place. Current remote video systems, however, still have many problems and fall short of the naturalness of face-to-face communication. One important problem is a strong sense of spatial separation: because the users are in different physical locations, simple image stitching cannot naturally fuse the multiple scenes, so participants do not feel that all session members share the same session space, which produces a psychological barrier. Real-time, high-quality online video segmentation, through accurate foreground extraction combined with virtual-real fusion, can make users in different physical locations appear to share the same virtual session space.
Online video segmentation refers to extracting the foreground (usually a person) from online video. Its purpose is to separate the object of interest to the user (the foreground) from the rest of the video frame (the background), so that the foreground can be specially processed, e.g. by background replacement and virtual-real fusion. So-called virtual-real fusion means merging the extracted foreground and a virtual scene into one unified three-dimensional space. To obtain a high-quality fusion result, the segmentation method used for foreground extraction must recover the exact boundary of the object. The result of foreground extraction can be represented by a per-pixel alpha value: alpha equal to 0 means the pixel is background, and alpha equal to 1 means the pixel is foreground. For soft segmentation or matting, alpha may take continuous values between 0 and 1.
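The alpha-value convention just described is standard alpha compositing. A minimal sketch (our own illustration, not from the patent) of how alpha blends foreground over background:

```python
import numpy as np

# Alpha compositing as described above: alpha = 1 -> foreground,
# alpha = 0 -> background, fractional values blend the two (soft segmentation).
def composite(alpha, foreground, background):
    """Blend foreground over background: C = alpha*F + (1 - alpha)*B."""
    a = alpha[..., np.newaxis]          # broadcast alpha over color channels
    return a * foreground + (1.0 - a) * background

# Tiny 1x3 image: pure background, pure foreground, 50/50 mix.
alpha = np.array([[0.0, 1.0, 0.5]])
fg = np.full((1, 3, 3), 200.0)          # constant gray foreground
bg = np.full((1, 3, 3), 100.0)          # constant gray background
out = composite(alpha, fg, bg)
print(out[0, 0], out[0, 1], out[0, 2])  # -> [100. ...], [200. ...], [150. ...]
```

The half-alpha pixel lands exactly between foreground and background, which is what makes fractional alpha values useful for smooth boundaries.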
Because online video segmentation allows no user interaction and places high demands on the speed and robustness of the algorithm, it is still at a rather early research stage. To run in real time, online video segmentation can hardly borrow techniques from image and video matting, and can only perform binary segmentation of the input frames one by one. One binary segmentation approach converts the segmentation problem into an energy minimization over a Markov random field and solves it quickly with a graph-cut algorithm. To obtain high-quality binary segmentation, the traditional approach extracts multiple features that can distinguish foreground from background and incorporates them into an existing image segmentation framework; commonly used features include color distribution, image edges, and the background image. Online video segmentation can also adopt newer features such as depth information. Because depth is robust to illumination changes, using it helps improve the segmentation result. However, the real-time depth images produced by depth sensors are error-prone at depth discontinuities and highly unstable, which causes the segmentation result to flicker near the boundary.
To obtain a good visual effect and improve depth-based foreground segmentation, one approach fuses various extra information extracted from the color image after segmentation to improve precision. Since the foreground/background boundary is often exactly where depth is discontinuous, serious mis-segmentation occurs there. One post-processing method [1] estimates a mixed alpha value for boundary pixels using local color models and a boundary model; because it adaptively adjusts the width of the boundary region, the boundary stays sharp but not stiff, guaranteeing the accuracy and smoothness of the single-frame segmentation result to some extent. However, that method does not use the temporal segmentation results of previous frames, so it can hardly guarantee temporal consistency of the video segmentation. Matting methods can also be used to optimize the boundary, but they run slowly and can hardly meet the real-time requirement of online video segmentation.
In summary, current segmentation post-processing methods can hardly strike a balance between accuracy and real-time cost, and thus hardly meet the requirements of online video segmentation.
Summary of the invention
The object of the present invention is to solve the boundary errors of depth-sensor-based online video segmentation and the prior art's trade-off between accuracy and real-time performance, thereby providing a high-quality, real-time post-processing method for depth-sensor-based online video segmentation and a virtual-real fusion system.
To achieve the above object, the present invention proposes a post-processing method for depth-sensor-based online video segmentation, comprising:
Step 1: extracting features from a video frame and the corresponding depth image captured by a depth sensor, and performing foreground/background segmentation of the video frame on these features to obtain a binary image, in which 0 means the pixel is background and 1 means the pixel is foreground;
Step 2: detecting foreground holes in this binary image and filling them, to obtain a binary image after foreground hole filling;
Step 3: performing boundary optimization on the hole-filled binary image, to obtain an optimized binary image;
Step 4: fusing the optimized binary image with a virtual background and the video frame, to generate a virtual-real fusion image.
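The four steps above can be sketched end to end as follows. This is a hedged, simplified illustration: the function names and the depth-range segmentation rule are our own assumptions, not the patent's, and step 3 (boundary optimization) is elided for brevity.

```python
import numpy as np

def segment_by_depth(depth, near, far):
    # Step 1 stand-in: mark pixels with valid depth inside [near, far] as foreground.
    return ((depth >= near) & (depth <= far)).astype(np.uint8)

def fill_small_holes(mask):
    # Step 2 stand-in: flip isolated background pixels fully surrounded by foreground.
    out = mask.copy()
    h, w = mask.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if mask[y, x] == 0 and mask[y-1:y+2, x-1:x+2].sum() == 8:
                out[y, x] = 1
    return out

def composite(mask, frame, virtual_bg):
    # Step 4: paste foreground pixels onto the virtual background.
    return np.where(mask[..., None] == 1, frame, virtual_bg)

depth = np.array([[5, 5, 5],
                  [5, 0, 5],     # depth dropout inside the person
                  [5, 5, 5]])
mask = segment_by_depth(depth, 1, 10)
mask = fill_small_holes(mask)
print(int(mask[1, 1]))  # -> 1 (the dropout hole was filled as foreground)
```

The patent's actual hole filling (steps 201-211) is contour-based and conditional; this pipeline only shows how the four stages chain together.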
Said step 2 comprises:
Step 201: performing contour detection on the binary image, denoting the number of contours Num, and initializing a contour counter n = 1;
Step 202: judging whether n is less than or equal to Num; if so, performing step 203, otherwise performing step 3;
Step 203: marking the interior region of the n-th contour, and recording the contour together with its enclosed region as a hole;
Step 204: counting the number of nonzero depth pixels at the positions corresponding to the hole region;
Step 205: judging whether this count is zero; if it is zero, performing step 206, otherwise performing step 211;
Step 206: computing the hole contour edge strength and its weight;
Step 207: computing the hole-region background similarity and its weight;
Step 208: combining the weighted hole contour edge strength and the weighted hole-region background similarity to obtain the hole background similarity;
Step 209: judging whether the hole background similarity is less than a given threshold T_b; if so, performing step 210, otherwise performing step 211;
Step 210: filling the hole as foreground, to obtain the binary image after foreground hole filling;
Step 211: incrementing the contour counter n by one, and returning to step 202.
The threshold T_b in step 209 is 0.3.
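A minimal sketch of steps 201-211, under stated simplifications: instead of the per-contour loop, background regions not reachable from the image border are treated as candidate holes, kept only if their depth is entirely zero (steps 204-205) and their background similarity falls below T_b (step 209). The background similarity is passed in as a precomputed number here; its computation is described later in the text.

```python
import numpy as np
from collections import deque

T_B = 0.3  # threshold from the text

def find_holes(mask):
    """Background regions not connected to the image border (candidate holes)."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    # Flood-fill background from the border; unreached background pixels are holes.
    q = deque((y, x) for y in range(h) for x in range(w)
              if (y in (0, h - 1) or x in (0, w - 1)) and mask[y, x] == 0)
    for y, x in q:
        seen[y, x] = True
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] == 0 and not seen[ny, nx]:
                seen[ny, nx] = True
                q.append((ny, nx))
    return (mask == 0) & ~seen

def fill_foreground_holes(mask, depth, similarity):
    holes = find_holes(mask)
    out = mask.copy()
    if holes.any():
        # Depth-dropout regions only (steps 204-205), and low background similarity.
        if depth[holes].max() == 0 and similarity < T_B:
            out[holes] = 1
    return out

mask = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=np.uint8)
depth = np.array([[5, 5, 5], [5, 0, 5], [5, 5, 5]])
filled = fill_foreground_holes(mask, depth, similarity=0.1)
print(int(filled[1, 1]))  # -> 1
```

With a high background similarity (e.g. the arms-akimbo gap), the same hole would be left as background.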
Said step 3 comprises:
Step 301: obtaining the foreground/background boundary transition region of the hole-filled binary image;
Step 302: computing the local alpha value of each pixel in the boundary transition region. The alpha value here borrows its definition from soft segmentation and matting and reflects how foreground and background are composited: it takes continuous values between 0 and 1, values closer to 0 meaning the pixel is more similar to the background and values closer to 1 meaning it is more similar to the foreground;
Step 303: computing the motion probability of each pixel in the boundary transition region relative to the previous two frames;
Step 304: using the motion probability as a weight, computing the weighted sum of each boundary-transition pixel's temporal segmentation result and local alpha value, to obtain a mixed alpha value;
Step 305: judging whether the mixed alpha value is greater than a given threshold T_f; if so, setting the pixel at the corresponding position of the binary image to 1, meaning the pixel is foreground; otherwise setting it to 0, meaning the pixel is background;
Step 306: obtaining the boundary-optimized binary image according to step 305.
The threshold T_f in step 305 is 0.5.
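Steps 304-305 can be sketched as follows. One assumption is made explicit: the motion probability weights the local alpha and (1 - motion) weights the temporal label, which is one plausible reading of the weighted sum in step 304.

```python
import numpy as np

T_F = 0.5  # threshold from step 305

def refine_border(alpha_local, prev_label, motion_prob):
    """Blend local alpha with the temporal (previous-frame) label, weighted by
    motion probability (step 304), then threshold at T_F (step 305).
    Static pixels trust history; moving pixels trust the local alpha."""
    alpha_mix = motion_prob * alpha_local + (1.0 - motion_prob) * prev_label
    return (alpha_mix > T_F).astype(np.uint8)

# A static pixel (motion 0.1) keeps its previous foreground label even though
# the noisy local alpha says background; a moving pixel follows the local alpha.
alpha_local = np.array([0.2, 0.9])
prev_label  = np.array([1.0, 0.0])
motion_prob = np.array([0.1, 0.9])
labels = refine_border(alpha_local, prev_label, motion_prob)
print(labels)  # -> [1 1]
```

This decision-level blend is what keeps static boundary pixels from flickering between frames while still letting moving regions update.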
The present invention also provides a post-processing system for depth-sensor-based online video segmentation, comprising:
an online video foreground/background segmentation module, for extracting features from a video frame and the corresponding depth image and performing foreground/background segmentation of the video frame on these features to obtain a binary image;
a detection and filling module, for detecting foreground holes in this binary segmentation image and filling them to obtain a binary image after foreground hole filling;
an optimization processing module, for performing boundary optimization on the hole-filled binary image to obtain an optimized binary image;
a virtual-real fusion module, for fusing the optimized binary image with a virtual background and the video frame to generate a virtual-real fusion composite video.
Said detection and filling module comprises:
a contour detection module, for performing contour detection on the binary image, denoting the number of contours Num, and initializing a contour counter n = 1;
a first judging module, for judging whether n is less than or equal to Num; when it is, executing the hole marking module, otherwise executing the optimization processing module;
a hole marking module, for marking the interior region of the n-th contour and recording the contour together with its enclosed region as a hole;
a statistics module, for counting the number of nonzero depth pixels at the positions corresponding to the hole region;
a second judging module, for judging whether this count is zero; if it is zero, executing the first computing module, otherwise executing the increment module;
a first computing module, for computing the hole contour edge strength and its weight, computing the hole-region background similarity and its weight, and combining them by weighted sum to obtain the hole background similarity;
a third judging module, for judging whether the hole background similarity is less than a given threshold T_b; if so, entering the filling module, otherwise entering the increment module;
a filling module, for filling the hole as foreground to obtain the binary image after foreground hole filling;
an increment module, for incrementing the contour counter n by one and returning to the first judging module.
The threshold T_b in the third judging module is 0.3.
Said optimization processing module comprises:
an acquisition module, for obtaining the foreground/background boundary transition region of the hole-filled binary image;
a second computing module, for computing the local alpha value of each pixel in the boundary transition region and the motion probability of each such pixel relative to the previous two frames, and, using the motion probability as a weight, computing the weighted sum of each pixel's temporal segmentation result and local alpha value to obtain a mixed alpha value. The alpha value here borrows its definition from soft segmentation and matting and reflects how foreground and background are composited: it takes continuous values between 0 and 1, values closer to 0 meaning the pixel is more similar to the background and values closer to 1 meaning it is more similar to the foreground;
a foreground/background judging module, for judging whether the mixed alpha value is greater than a given threshold T_f; if so, setting the pixel at the corresponding position of the binary image to 1, meaning the pixel is foreground; otherwise setting it to 0, meaning the pixel is background;
a binary image obtaining module, for obtaining the boundary-optimized binary image according to the foreground/background labels.
Wherein, threshold value T before described, in background judge modulefIt is 0.5.
The beneficial effects of the present invention are as follows. The invention thoroughly analyzes why depth-sensor-based online video segmentation fails: for the holes where foreground is mistaken for background due to depth loss, it provides a foreground hole detection and filling algorithm; for the mis-segmentation caused by inaccurate or missing depth at the foreground/background boundary, it provides a boundary optimization algorithm fusing temporal, color, edge, and motion information. Combining the two algorithms, the invention efficiently improves the quality of depth-sensor-based online video segmentation in real time, guarantees the consistency of the segmentation result along the time axis, and avoids video flicker. The foreground hole detection algorithm considers both the root cause of foreground holes and their essential attributes, namely that their contour edge strength should be as small as possible and their background similarity as low as possible, so it can recognize foreground holes well while excluding background holes. In addition, the boundary optimization algorithm employs multiple features: on top of the local alpha value estimated by local color models and a boundary model, it further fuses the temporal segmentation result, and uses an edge-weighted frame difference as the motion probability for a decision-level fusion of the local alpha value and the temporal segmentation result, thereby guaranteeing the temporal consistency of the segmentation result.
The present invention is described below with reference to the drawings and specific embodiments, which are not to be construed as limiting the invention.
Brief description of the drawings
Fig. 1 is a flow chart of the post-processing method after online video segmentation of the present invention;
Fig. 2 is a schematic diagram of the post-processing system after online video segmentation of the present invention;
Fig. 3 is an example of an online video segmentation result based on a depth sensor;
Fig. 4 is a flow chart of the virtual-real fusion system method;
Fig. 5 is a block flow diagram of the post-processing of a depth-sensor-based online video segmentation result.
Detailed description of the invention
In recent years, depth sensors have gradually become smaller and cheaper, so using the depth information obtained directly from a depth sensor to assist video segmentation has become practical. Because depth is inherently robust to illumination changes and dynamic shadows, it improves the quality of image segmentation. Fig. 3 shows an example online video segmentation result for a video frame, obtained with a Kinect depth sensor using the scene segmentation API in OpenNI: Fig. 3(a) is the video frame; Fig. 3(b) is the corresponding depth image from the depth sensor; Fig. 3(c) is the foreground segmented from the depth image; and Fig. 3(d) magnifies the marked regions of Fig. 3(c). As can be seen from Fig. 3(c), depth-sensor-based online video segmentation can obtain a good result even in complex scenes, but mis-segmentation at the boundary is fairly serious, and some foreground holes mistaken for background also exist inside the foreground. The root cause of these phenomena is that the depth information obtained by the depth sensor is error-prone or missing at depth discontinuities.
To solve the above problems, the invention provides a post-processing method for depth-sensor-based online video segmentation, used to improve the depth-sensor-based online video segmentation result. Fig. 1 is a flow chart of the post-processing method of the present invention. As shown in Fig. 1, the method comprises:
Step 1: extracting features from a video frame and the corresponding depth image captured by a depth sensor, and performing foreground/background segmentation of the video frame on these features to obtain a binary image, in which 0 means the pixel is background and 1 means the pixel is foreground;
Step 2: detecting foreground holes in this binary image and filling them, to obtain a binary image after foreground hole filling;
Step 3: performing boundary optimization on the hole-filled binary image, to obtain an optimized binary image;
Step 4: fusing the optimized binary image with a virtual background and the video frame, to generate a virtual-real fusion image.
For the holes where foreground is mistaken for background because of depth loss, the invention first provides a discrimination algorithm based on contour edge strength and region background color similarity, and correctly fills such holes as foreground. It then provides an efficient, high-quality, temporally consistent boundary optimization algorithm that fuses temporal, color, edge, and motion information to recompute the boundary pixel values of the depth-based segmentation result and eliminate boundary mis-segmentation. The invention further embeds this post-processing into a virtual-real fusion system to realize immersive remote video interaction.
Further, said step 2 comprises:
Step 201: performing contour detection on the binary image, denoting the number of contours Num, and initializing a contour counter n = 1;
Step 202: judging whether n is less than or equal to Num; if so, performing step 203, otherwise performing step 3;
Step 203: marking the interior region of the n-th contour, and recording the contour together with its enclosed region as a hole;
Step 204: counting the number of nonzero depth pixels at the positions corresponding to the hole region;
Step 205: judging whether this count is zero; if it is zero, performing step 206, otherwise performing step 211;
Step 206: computing the hole contour edge strength and its weight;
Step 207: computing the hole-region background similarity and its weight;
Step 208: combining the weighted hole contour edge strength and the weighted hole-region background similarity to obtain the hole background similarity;
Step 209: judging whether the hole background similarity is less than a given threshold T_b; if so, performing step 210, otherwise performing step 211;
Step 210: filling the hole as foreground, to obtain the binary image after foreground hole filling;
Step 211: incrementing the contour counter n by one, and returning to step 202.
Said step 3 comprises:
Step 301: obtaining the foreground/background boundary transition region of the hole-filled binary image;
Step 302: computing the local alpha value of each pixel in the boundary transition region. The alpha value here borrows its definition from soft segmentation and matting and reflects how foreground and background are composited: it takes continuous values between 0 and 1, values closer to 0 meaning the pixel is more similar to the background and values closer to 1 meaning it is more similar to the foreground;
Step 303: computing the motion probability of each pixel in the boundary transition region relative to the previous two frames;
Step 304: using the motion probability as a weight, computing the weighted sum of each boundary-transition pixel's temporal segmentation result and local alpha value, to obtain a mixed alpha value;
Step 305: judging whether the mixed alpha value is greater than a given threshold T_f; if so, setting the pixel at the corresponding position of the binary image to 1, meaning the pixel is foreground; otherwise setting it to 0, meaning the pixel is background;
Step 306: obtaining the boundary-optimized binary image according to the foreground/background labels of step 305.
Similar to the prior art, the virtual-real fusion system of the present invention is divided into three main steps, as shown in the flow chart of Fig. 4: video foreground/background segmentation, segmentation post-processing, and virtual-real fusion. The video foreground/background segmentation is realized with a depth sensor; its concrete implementation can follow existing depth-based online video segmentation techniques and is not detailed here. Virtual-real fusion has also been studied extensively, and its concrete implementation can follow those research results. The present invention mainly addresses the foreground holes mistaken for background due to depth loss, the boundary mis-segmentation caused by inaccurate depth estimation, and the prior art's difficulty in balancing accuracy and real-time performance, and provides and details a segmentation post-processing method. The input of this post-processing is the video frame, its corresponding depth image, and the initial binary segmentation result; to maintain temporal consistency, the input also includes the temporal (previous) video frames and their segmentation results; the output is the post-processed binary image. The post-processing comprises foreground hole detection and filling and foreground/background boundary optimization; the concrete flow chart is shown in Fig. 5. In it, the binary image is the input of the contour detection module of the detection and filling module; the depth image is the input of the statistics module of the detection and filling module; the video frame is the input of the computing modules of the detection and filling module and the boundary optimization module, used for computing the hole contour edge strength and its weight, the hole-region background similarity and its weight, the local alpha value of each pixel in the boundary transition region, and the motion probability of each such pixel relative to the previous two frames; the temporal video frames are the input of the computing module of the boundary optimization module, used for computing the motion probability of each pixel in the boundary transition region relative to the previous two frames; and the temporal local-alpha images are the input of the computing module of the boundary optimization module, used for computing the mixed alpha value of each pixel in the boundary transition region.
Foreground hole detection and filling is introduced first. As the rectangular region in Fig. 3(c) (the upper-left region in Fig. 3(d)) shows, when the hair of the foreground person hangs down on the shoulders, depth is lost where the hair meets the shoulder, forming holes where foreground is mistaken for background; these holes greatly affect segmentation precision. When such foreground holes are small enough, mathematical morphology, namely dilation, can usually repair them; but when a foreground hole is large, filling it with a dilation using a large structuring element is likely to wrongly fill genuine background regions as foreground. Since the arms-akimbo pose of the person also forms a hole inside the foreground, as the rectangular region in Fig. 3(c) (the middle region in Fig. 3(d)) shows, not all holes inside the foreground can be filled indiscriminately. Based on the characteristics of the depth data obtained by the depth sensor and the intrinsic attributes of foreground holes, the present invention provides a discrimination algorithm for foreground holes.
The invention first finds all contours in the binary segmentation image with a contour-tracing algorithm, then marks the interior region of each contour in turn, recording each contour together with its interior region as a hole Φ. For each hole, the invention first traverses the depth data at the hole positions and checks whether any pixel has nonzero depth; if so, the hole needs no filling, because it was not produced by depth loss. Otherwise the hole becomes a candidate foreground hole and is further discriminated by a background similarity computed as the weighted sum of its contour edge strength and its region background color similarity: if the background similarity of a candidate foreground hole is less than the threshold T_b, the candidate is filled as foreground in the binary image.
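The discrimination rule can be sketched as follows; the equal weights w_e = w_s = 0.5 are an assumption of ours, since the text says the weights are computed but does not fix them here.

```python
T_B = 0.3  # threshold from the text; w_e and w_s below are assumed values

def hole_background_score(contour_edge, region_bg_similarity, w_e=0.5, w_s=0.5):
    """Weighted sum (step 208): weak contour edges and low background similarity
    both indicate a true foreground hole caused by depth dropout."""
    return w_e * contour_edge + w_s * region_bg_similarity

def is_foreground_hole(contour_edge, region_bg_similarity):
    return hole_background_score(contour_edge, region_bg_similarity) < T_B

# Depth-dropout hole: soft contour edge, region looks nothing like the background.
print(is_foreground_hole(0.1, 0.2))   # -> True  (fill it as foreground)
# Genuine background hole (e.g. the arms-akimbo gap): strong edge, background-like.
print(is_foreground_hole(0.6, 0.8))   # -> False (leave it as background)
```

Both features push in the same direction, which is why a single threshold on their weighted sum suffices.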
The features used in the above foreground hole discrimination algorithm are explained one by one below.
Since the contour edge strength of a foreground hole is typically small while that of a background hole is large, contour edge strength serves as one discriminating feature between foreground and background holes. Gradients are commonly used to measure pixel edges, but because depth-based video segmentation mis-segments the boundary badly, even for a background hole inside the foreground the contour is not necessarily the real foreground/background boundary, so the gradient of a contour pixel is not necessarily large, and gradients alone cannot distinguish foreground from background holes. For this reason, the invention provides an approximate edge measure; specifically, this embodiment uses a boundary sharpness measure to estimate the edge strength of a pixel. The video frame is first converted to a gray-level image and Gaussian-filtered, and the gray color space is divided into $L = 32$ color sub-spaces $B_l$ $(l = 1, 2, \ldots, L)$. Let $N_p(L_s)$ denote the neighborhood of pixel $p$ with window size $L_s$; since $p$ is a contour pixel, $N_p(L_s)$ necessarily contains both foreground and background pixels. Denote the sets of foreground and background samples in $N_p(L_s)$ by $S_p^F$ and $S_p^B$ respectively. If a color sub-space $B_l$ contains elements of both $S_p^F$ and $S_p^B$ simultaneously, it is called ambiguous. Let $N_p$ be the total number of color samples of $S_p^F$ and $S_p^B$ contained in all ambiguous color sub-spaces; then the boundary sharpness of pixel $p$ is:
γ_p = 1 − N_p / L_s²,
The edge of pixel p is thus e_p = γ_p, and the edge of the whole contour is computed as the mean of the edges of all contour pixels:
e_Φc = (1/M) · Σ_{p∈Φc} e_p,
where Φc is the contour of Φ and M is the total number of pixels on the hole contour Φc.
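Under the definitions above, the border sharpness and the contour edge can be sketched as follows. The mapping of 8-bit gray values into L bins and the square-window handling are assumptions of this sketch:

```python
def border_sharpness(gray, mask, y, x, Ls=3, L=32):
    """gamma_p = 1 - N_p / Ls^2: one minus the fraction of neighborhood
    samples falling in 'ambiguous' gray bins, i.e. bins that hold both
    foreground (mask == 1) and background (mask == 0) samples."""
    half = Ls // 2
    fg_bins, bg_bins, samples = set(), set(), []
    for ny in range(y - half, y + half + 1):
        for nx in range(x - half, x + half + 1):
            b = gray[ny][nx] * L // 256   # quantize gray into L subspaces
            samples.append(b)
            (fg_bins if mask[ny][nx] == 1 else bg_bins).add(b)
    ambiguous = fg_bins & bg_bins
    n_ambiguous = sum(1 for b in samples if b in ambiguous)
    return 1.0 - n_ambiguous / (Ls * Ls)

def contour_edge(gray, mask, contour, Ls=3, L=32):
    """e_Phi_c: mean sharpness-based edge over all contour pixels."""
    return sum(border_sharpness(gray, mask, y, x, Ls, L)
               for y, x in contour) / len(contour)
```

A crisp border (foreground and background grays in disjoint bins) yields sharpness 1, while a flat region where every bin is ambiguous yields 0, which is the behavior the discrimination relies on.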
Since the regional background similarity of a foreground hole is low while that of a background hole is high, the regional background similarity is used as a second feature for discriminating foreground from background holes. To compute the background similarity of a hole region, the background color of each pixel must first be modeled. Because the scene is not absolutely static—interference such as illumination changes and dynamic shadows occurs—the background color must be modeled with a model that is updated in real time. The invention models the background color of each pixel with an accumulated difference color histogram. For each pixel of the video frame, if and only if it is labeled background in the initial segmentation at time t and its depth is nonzero, its color is treated as background color and accumulated into the histogram. In this embodiment the gray color space is evenly divided into 32 sub-blocks, and the difference color histogram of pixel p at time t is:
H_p(t) = [h_p^1(t), h_p^2(t), …, h_p^L(t)], L = 32,
where h_p^l(t) characterizes the frequency with which the color of pixel p falls in the l-th color block B_l at time t, computed as:
h_p^l(t) = β · h_p^l(t−1) + δ( l_p(t) = l, a_p^b(t) = 0, d_p(t) ≠ 0 ),
where β = 0.95 attenuates the contribution of historical background color to the current background color model, and the function δ(·) equals 1 when its argument expression is true and 0 otherwise. l_p(t), a_p^b(t) and d_p(t) denote the color subspace label, the initial binary segmentation label and the depth value of pixel p at time t, respectively. After the difference color histogram H_p(t) of each pixel p is normalized, the similarity between each candidate hole and the background color model is obtained by averaging, over the hole region, the histogram frequency of each pixel's current color:
lh_Φr = (1/N) · Σ_{p∈Φr} h_p^{l_p(t)}(t),
where Φr is the interior region of hole Φ and N is the total number of pixels in the hole region Φr.
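The histogram update and the regional background similarity can be sketched as below. Normalizing on the fly inside the similarity (rather than as a separate pass over the histogram) is a simplification of this sketch:

```python
def update_histogram(hist, color_bin, labeled_background, depth_valid, beta=0.95):
    """One accumulation step h^l(t) = beta * h^l(t-1) + delta(...): decay all
    bins, then add 1 to the current color bin only when the pixel was
    initially labeled background and its depth is valid (nonzero)."""
    out = [beta * h for h in hist]
    if labeled_background and depth_valid:
        out[color_bin] += 1.0
    return out

def region_background_similarity(hists, color_bins):
    """lh over a hole region: mean normalized histogram frequency of each
    pixel's current color bin (hists and color_bins are parallel lists)."""
    total = 0.0
    for hist, b in zip(hists, color_bins):
        s = sum(hist)
        total += hist[b] / s if s > 0 else 0.0
    return total / len(hists)
```

A hole whose pixels keep showing colors already accumulated as background scores high (background hole); a hole of novel colors scores low (foreground hole).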
In this embodiment, the weighted sum p_Φ of the contour edge e_Φc and the regional background similarity lh_Φr is used to compute the similarity between a hole and the background. Since the reliability of the color-block-based border sharpness is closely tied to the color complexity of the local region, the weight w_c of e_Φc depends on the number of nonzero color subspaces in the neighborhood and is computed as:
w_c = 1 − (1/M) · Σ_{p∈Φc} ( N_p^n / L ),
where N_p^n characterizes the number of nonzero color blocks in the neighborhood of pixel p. As with the computation of w_c, the weight w_r of the background similarity lh_Φr is tied to the confidence of the background color model: the model is reliable only when the scene contains little illumination change, i.e. when the number of nonzero color subspaces in the background color model of the hole region is small. Accordingly, w_r is computed as:
w_r = 1 − (1/N) · Σ_{p∈Φr} ( Σ_{l=1}^{L} δ( h_p^l(t) ≠ 0 ) / L ),
The similarity p_Φ between the hole and the background is then estimated as:
p_Φ = w · lh_Φr + (1 − w) · e_Φc,
where w = w_r / (w_r + w_c). When p_Φ is below a given empirical threshold Tb, this embodiment fills the region as foreground; Tb is set to 0.3 in this embodiment.
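The complete hole decision then reduces to a small function. The normalization of w from w_r and w_c is an assumption of this sketch (the exact combination formula is not legible in the source text):

```python
def hole_is_foreground(lh_region, e_contour, w_r, w_c, Tb=0.3):
    """p_phi = w * lh + (1 - w) * e; the hole is filled as foreground when
    p_phi falls below the empirical threshold Tb.  Taking w as the
    normalized w_r is an assumption of this sketch."""
    w = w_r / (w_r + w_c) if (w_r + w_c) > 0 else 0.5
    p_phi = w * lh_region + (1 - w) * e_contour
    return p_phi < Tb
```

A hole that neither resembles the accumulated background (low lh) nor sits on a sharp contour (low e) is judged a depth-loss foreground hole and filled.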
The temporally consistent border optimization algorithm of the invention, which addresses mis-segmentation caused by depth loss or depth estimation error at the foreground/background border, is described below. Because depth is discontinuous where foreground meets background, depth information is often lost at these junctions, and foreground is mistaken for background. Conversely, even where depth is obtained at a junction it is often inaccurately estimated, so background is easily mistaken for foreground. The rectangular region of Fig. 3(c) (lower-left region of Fig. 3(d)) clearly shows that the depth loss at the border of the foreground head is largest and the misjudgment severe; in addition, at the hand and garment edges of the foreground, background is plainly misjudged as foreground, as shown in the rectangular region of Fig. 3(c) (right region of Fig. 3(d)). Although the per-frame errors caused in these ways are small, they produce flicker in the video and seriously degrade the visual effect. To eliminate these mis-segmentations, the pixels of the foreground/background border area must be relabeled using the color image corresponding to the depth image. This embodiment adopts the idea of matting: it first computes an alpha value for the border area and then quantizes the alpha value with a threshold to realize a binary segmentation of the border region. The alpha value follows the definition used in soft segmentation and matting and reflects how foreground and background compose at a pixel: it takes continuous values between 0 and 1, a value closer to 0 indicating the pixel is more similar to the background and a value closer to 1 more similar to the foreground.
This embodiment adopts the method proposed in the first subsection of the second section of the paper "Real-time post-processing of online video segmentation" (denoted method [1]) to find the pixels near the border. Concretely, denoting the set of border pixels by Ω, Ω is defined as:
Ω(L_e) = { p | τ0 < s_p < τ1 },  s_p = (1/L_e²) · Σ_{q∈N_p(L_e)} a_q^s,
where N_p(L_e) is the neighborhood of pixel p with window size L_e × L_e; a_q^s is the binary segmentation label of pixel q after foreground hole filling; s_p is the mean of the binary segmentation labels of all pixels in the neighborhood of p; and Ω(L_e) is the set of pixels in a band straddling the border, whose width is controlled by the parameters τ0, τ1 with 0 < τ0 < τ1 < 1.
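The band Ω(L_e) can be sketched directly from this definition; scanning only window-complete interior pixels is a simplification of this sketch:

```python
def boundary_band(labels, Le=3, tau0=0.1, tau1=0.9):
    """Pixels whose neighborhood-average binary label s_p lies strictly
    between tau0 and tau1, i.e. a band straddling the foreground/background
    border of the label image."""
    h, w = len(labels), len(labels[0])
    half = Le // 2
    band = []
    for y in range(half, h - half):
        for x in range(half, w - half):
            s = sum(labels[ny][nx]
                    for ny in range(y - half, y + half + 1)
                    for nx in range(x - half, x + half + 1)) / (Le * Le)
            if tau0 < s < tau1:
                band.append((y, x))
    return band
```

Pixels deep inside foreground (s_p near 1) or background (s_p near 0) are excluded; only the transition strip is relabeled by the matting step.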
Because the depth information a depth sensor obtains at depth discontinuities is highly unstable, the depth fluctuates strongly even when the foreground is stationary, and the depth-based segmentation result is correspondingly unstable. If the alpha value of a pending pixel is estimated from the information of the current frame alone, the alpha values of two consecutive frames may therefore differ even for a static foreground, and inter-frame flicker cannot be eliminated. To keep the segmentation consistent, temporal information can be used to revise the alpha value of the current frame; moreover, the smaller the motion of a pixel between two frames, the more similar its segmentation results should be. Based on this analysis, this embodiment provides a border optimization algorithm based on temporal, color, border and motion information. The algorithm first computes the local alpha value of each pending pixel with the local color model and border function proposed by the existing method [1], then estimates the motion probability maps between the current frame and the previous two frames with a simple estimation method, and finally, using the motion probabilities as weights, takes the weighted sum of the local alpha values and the temporal alpha values as the mixed alpha value of the pending pixel.
The features used in the above border optimization algorithm are described one by one below.
This embodiment computes the local alpha value of a pending pixel as the weighted sum of a color alpha value and a border alpha value. The color alpha value α_p^c is computed from the local color model of the pixel. Let N_p(L_b) be the neighborhood of pixel p with window size L_b, containing the foreground and background sample sets of p; the color alpha value α_p^c is then:
α_p^c = P(c_p | M_p^F) / ( P(c_p | M_p^F) + P(c_p | M_p^B) ),
where c_p is the RGB color of pixel p, and P(c_p | M_p^F) and P(c_p | M_p^B) are the foreground and background color similarities of pixel p; the color models are uniformly blocked Gaussian mixture models.
In regions where the foreground and background colors are very close, the color alpha value has obvious errors, yielding wide translucent areas and rough borders; in such cases the result of the binary segmentation should be deferred to. The existing method [1] uses a four-parameter bounded function to compute the border alpha value of a pixel: intuitively, the farther a pixel is from the border, the lower its similarity to the foreground and the smaller its border alpha value should be. Specifically, the border alpha value is computed as:
α_p^b = a_p / ( 1 + e^{(c_p − s_p)/δ_p} ) + b_p,
where the computation of the parameters δ_p, a_p, b_p, c_p follows the existing method [1].
In this embodiment, the weight of the color alpha value depends on the sharpness of the border, and the weight of the border alpha value depends on the error rate of the binary segmentation; the local alpha value is computed as:
α_p^l = (1 − w_p) · α_p^c + w_p · α_p^b,
where w_p = w_p^b / ( w_p^b + w_p^c ),  w_p^b = (1/9) · Σ_{q∈N_p(3)} | α_q^c − a_q^s |,  w_p^c = γ_p.
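The local blending above can be sketched as two small functions; passing the 3×3 neighborhood as flat lists is an assumption of this sketch:

```python
def binary_error_weight(alpha_c_window, labels_window):
    """w_p^b: mean |alpha^c - a^s| over the 3x3 neighborhood, i.e. how much
    the color alpha disagrees with the binary segmentation labels."""
    return sum(abs(a - s) for a, s in zip(alpha_c_window, labels_window)) / 9.0

def local_alpha(alpha_c, alpha_b, w_b, w_c):
    """alpha^l = (1 - w) alpha^c + w alpha^b with w = w_b / (w_b + w_c):
    the border alpha takes over as the color alpha becomes unreliable."""
    w = w_b / (w_b + w_c) if (w_b + w_c) > 0 else 0.0
    return (1 - w) * alpha_c + w * alpha_b
```

When the color alpha agrees with the binary labels (w_b = 0) the color model is trusted fully; as disagreement grows, weight shifts toward the border alpha.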
Because video frames are captured continuously, successive frames are correlated and so are their segmentation results. Intuitively, if a pixel stays still between two frames, its class should stay consistent; the continuity of the temporal alpha value at the current frame can therefore be weighed by the motion probability of the pixel. Since the estimation of the temporal local alpha value uses a local color model and local border properties, summing the motion over each pixel's neighborhood reflects the motion of a pixel more accurately. This embodiment computes the motion probability of a pixel as the edge-weighted frame difference over the pixel's neighborhood. Defining the motion probability of pixel p from time t−1 to time t as p_p^m(t−1), its formula is:
p_p^m(t−1) = [ Σ_{q∈N_p(L_s)} f_q(t−1) · e_q(t) ] / [ Σ_{q∈N_p(L_s)} e_q(t) ],
where f_q(t−1) is the frame difference of pixel q at time t relative to time t−1—normalized so that it can serve as a probability estimate—and e_q(t) is the gradient of pixel q at the current time. To suppress noise, the frame difference is not obtained directly from the gray-level images of the two frames but from the two frames after Gaussian smoothing, i.e.:
f_p(t−1) = Norm( | G(g_p(t)) − G(g_p(t−1)) | ),
where Norm(·) denotes a normalization function, G(·) a Gaussian kernel of scale 0.8, and g_p(t), g_p(t−1) the gray values of pixel p at times t and t−1. The edge of the image is obtained by convolution with the first derivative of the Gaussian, i.e.:
e_p(t) = | ∇G(g_p(t)) |,
This embodiment computes the motion probability p_p^m(t−2) of pixel p at time t relative to time t−2 with the same method. It is worth noting that the motion probability of pixel p at time t relative to time t itself, p_p^m(t), is 0.
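The edge-weighted motion probability reduces to a weighted average; here the neighborhood values are passed as flat lists, which is an assumption of this sketch (the Gaussian smoothing and normalization are assumed already applied to the inputs):

```python
def motion_probability(frame_diffs, edge_strengths):
    """p^m: edge-weighted mean of the normalized frame differences over a
    pixel's neighborhood; strong edges dominate the estimate, flat areas
    contribute little."""
    den = sum(edge_strengths)
    if den == 0:
        return 0.0   # no edges in the window: treat as stationary
    return sum(f * e for f, e in zip(frame_diffs, edge_strengths)) / den
```

Weighting by edge strength keeps sensor noise in textureless areas from registering as motion, which is the point of using edges rather than a plain frame difference.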
This embodiment estimates the mixed alpha value of each pending pixel as the weighted sum of the local alpha values and the temporal alpha values, with weight coefficients determined by the motion probabilities; the mixed alpha value is computed as:
α_p^h(t) = ṗ_p(t−2) · α_p^l(t−2) + ṗ_p(t−1) · α_p^l(t−1) + ṗ_p(t) · α_p^l(t),
where α_p^l(t−2), α_p^l(t−1) and α_p^l(t) are the local alpha values of pixel p at times t−2, t−1 and t respectively, and ṗ_p(t−2), ṗ_p(t−1), ṗ_p(t) are the normalized weight coefficients:
ṗ_p(t−2) = p_p(t−2) / ( p_p(t−2) + p_p(t−1) + p_p(t) ),
ṗ_p(t−1) = p_p(t−1) / ( p_p(t−2) + p_p(t−1) + p_p(t) ),
ṗ_p(t) = p_p(t) / ( p_p(t−2) + p_p(t−1) + p_p(t) ),
where p_p(t−2) = 1 − p_p^m(t−2), p_p(t−1) = 1 − p_p^m(t−1) and p_p(t) = 1 − p_p^m(t) are derived from the motion probabilities of pixel p at time t relative to times t−2, t−1 and t. The mixed-alpha formula above shows that the larger the motion probability of pixel p relative to a given moment, the smaller the contribution of that moment's alpha value to the continuity at time t.
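The temporal blend above can be sketched as follows; the three lists are ordered t−2, t−1, t:

```python
def mixed_alpha(local_alphas, motion_probs):
    """alpha^h(t): blend of the local alphas at t-2, t-1 and t, weighted by
    each frame's stillness 1 - p^m; since p^m(t) = 0 by definition, the
    current frame always contributes."""
    stills = [1.0 - m for m in motion_probs]
    total = sum(stills)   # > 0 because the current frame's stillness is 1
    return sum(s * a for s, a in zip(stills, local_alphas)) / total
```

With full motion relative to both past frames, only the current local alpha survives; with no motion, the three frames are averaged equally, which is what suppresses flicker for a static foreground.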
The present invention also proposes a post-processing system for depth-sensor-based online video segmentation. As shown in Fig. 2, a schematic of the post-processing system after online video segmentation, the system includes:
an online video foreground/background segmentation module, which extracts features from a video frame and its corresponding depth image and performs foreground/background segmentation of the video frame on these features to obtain a binary image;
a detection and filling module, which detects the foreground holes in the binary image and fills them, obtaining the binary image after foreground hole filling;
an optimization module, which performs border optimization on the binary image after foreground hole filling, obtaining the optimized binary image;
a virtual-real fusion module, which merges the optimized binary image with a virtual background and the video frame, generating the virtual-real fusion composite video.
The detection and filling module includes:
a contour detection module, which performs contour detection on the binary image, records the number of contours as Num, and initializes a contour counter n = 1;
a first judgment module, which judges whether n is less than or equal to the contour number Num; if so, the hole marking module is executed, otherwise the optimization module is executed;
a hole marking module, which labels the interior region enclosed by the n-th contour and records the contour together with its interior region as a hole;
a statistics module, which counts the number of nonzero-depth pixels at the positions corresponding to the hole region;
a second judgment module, which judges whether this count is zero; if it is zero, the first computation module is executed, otherwise the increment module is executed;
a first computation module, which computes the hole contour edge and its weight and the hole regional background similarity and its weight, and weights them to obtain the hole background similarity;
a third judgment module, which judges whether the hole background similarity is below the given threshold Tb; if so, the filling module is entered, otherwise the increment module is entered;
a filling module, which fills the hole as foreground, obtaining the binary image after foreground hole filling;
an increment module, which increments the contour counter n by one and returns to the first judgment module.
The optimization module includes:
an acquisition module, which obtains the foreground/background border transition region of the binary image after foreground hole filling;
a second computation module, which computes the local alpha value of each pixel in the border transition region and the motion probability of each pixel relative to the previous two frames, and, with the motion probabilities as weights, computes the weighted sum of the temporal segmentation results and the local alpha values of each pixel in the border transition region to obtain the mixed alpha value, where the alpha value follows the definition used in soft segmentation and matting and reflects how foreground and background compose: it takes continuous values between 0 and 1, a value closer to 0 indicating the pixel is more similar to the background and a value closer to 1 more similar to the foreground;
a foreground/background judgment module, which judges whether the mixed alpha value exceeds a given threshold Tf; if so, the pixel at the corresponding position of the binary image is set to 1, indicating the pixel is foreground; otherwise it is set to 0, indicating the pixel is background;
a binary image obtaining module, which obtains the border-optimized binary image from the foreground and background labels.
As in the prior art, the virtual-real fusion system of the invention is divided into three main steps, as shown in the flowchart of Fig. 4: video foreground/background segmentation, segmentation post-processing, and virtual-real fusion. The video foreground/background segmentation is realized with a depth sensor; its concrete implementation can follow existing depth-based online video segmentation techniques and is not detailed here. Virtual-real fusion has also been studied extensively, and its concrete implementation can follow those research results. The invention mainly considers that depth-sensor-based online video segmentation suffers foreground holes, where foreground is mistaken for background because of depth loss, and border mis-segmentation, caused by inaccurate depth estimation, and that the prior art struggles to balance accuracy and real-time performance; it therefore provides a segmentation post-processing method, described here in detail. The input of the post-processing is the video frame, its corresponding depth image and the initial binary segmentation result; to keep temporal consistency, the input also includes the temporal video frames and temporal segmentation results, and the output is the post-processed binary image. The process comprises foreground hole detection and filling, and foreground/background border optimization; the concrete flowchart is shown in Fig. 5.
Foreground hole detection and filling is introduced first. As shown in the rectangular region of Fig. 3(c) (upper-left region of Fig. 3(d)), when the hair of the foreground (the target person) hangs down over the shoulders, depth is lost where the hair meets the shoulders, forming holes in which foreground is mistaken for background; these holes greatly degrade the precision of the segmentation. When foreground holes are small enough, mathematical morphology—dilation—can usually repair them; but when a foreground hole is large, dilation with a correspondingly large mask is likely to mistakenly fill background regions as foreground. Moreover, the arms-akimbo posture of the target person also forms holes inside the foreground, as shown in the rectangular region of Fig. 3(c) (middle region of Fig. 3(d)), so not all holes inside the foreground can be filled indiscriminately. Based on the characteristics of the depth data obtained by the depth sensor and the intrinsic properties of foreground holes, the invention gives a discrimination algorithm for foreground holes.
The present invention first passes through profile algorithm and finds all profiles in binary segmentation image, and the inside inclusion region then successively each profile carried out carries out labelling, and each profile and interior zone thereof are designated as a hole Φ.For each hole, first the present invention travels through the depth data of statistics position, perforated, judge whether that the degree of depth is not the pixel of 0, if existed, then this hole is made without filling, because this hole is not due to what degree of depth disappearance produced, otherwise this hole becomes candidate's prospect hole, and it is further differentiated by the context similarity that the weighted sum based on contour edge and region background color similarity calculates, if the context similarity of candidate's prospect hole is less than threshold value Tb, then it is prospect by this candidate's prospect holes filling in bianry image.
One by one the feature utilized in above-mentioned prospect hole distinguished number is described explanation below.
Owing to the contour edge of prospect hole is typically small, the contour edge of background hole is relatively big, before therefore contour edge is used as one-dimensional differentiation, the feature of background hole.Gradient is usually used in calculating the edge of pixel, but owing to dividing phenomenon serious based on the Video segmentation of the depth information mistake on border, even the background hole therefore within prospect, its profile is also not necessarily the border of real prospect and background, mean that the gradient of contour pixel might not be big, thus before will be unable to distinguish based on the gradient of profile, background hole.Based on above-mentioned consideration, the invention provides the approximate data at a kind of edge, specifically, the present embodiment adopts sharpness of border degree to estimate the edge of pixel.First frame of video converted to gray level image by the present embodiment and it is carried out gaussian filtering, then greyscale color space being divided into L=32 color sub-spaces Bl(l=1,2 ..., L).Note Np(Ls) be the neighborhood window size of pixel p it is LsNeighborhood, then Np(Ls) in must comprise foreground pixel and background pixel simultaneously because p is the pixel on profile.Note Np(Ls) in foreground pixel and background pixel sample set respectivelyWithIf Np(Ls) in color sub-spaces BlComprise sample set simultaneouslyWithIn element, then it is assumed that it is ambiguous.Note NpFor the sample set that all ambiguous color sub-spaces compriseWithThe sum of middle color card, then the sharpness of border degree of pixel p is:
&gamma; p = 1 - N p L s 2 ,
Thus the edge e of pixel ppp, the edge of whole profile can be obtained by the mean value calculation at the edge of all contour pixels.Computing formula is as follows:
e &Phi; c = 1 M &Sigma; p &Element; &Phi; c e p ,
Wherein ΦcBeing the profile of Φ, M is hole profile ΦcOn the total number of pixel.
Owing to the regional background similarity of prospect hole is low, the regional background similarity of background hole is high, before therefore regional background similarity is also used as one-dimensional differentiation, the feature of background hole.In order to calculate the context similarity of perforated, it is necessary first to the background color of pixel is modeled.It not absolute rest due to scene, the such as interference factor such as illumination variation and dynamic shadow all can occur, it is therefore desirable to adopt the model of real-time update that background color is modeled.The present invention adopts accumulation difference color histogram that the background color of each pixel is modeled.For each pixel in frame of video, and if only if, and it is labeled as background in the initial segmentation of t, and when its degree of depth is not 0, the color of this pixel is referred to as background color, and carries out accumulation histogram modeling.In the present embodiment, it is 32 sub-blocks that greyscale color space is divided evenly, and the difference color histogram of t pixel p is:
H p ( t ) = [ h p 1 ( t ) , h p 2 ( t ) , . . . , h p L ( t ) ] , L=32
WhereinCharacterize the distribution of color of t pixel p at the l color block BlIn frequency, computing formula is:
h p l ( t ) = &beta; * h p l ( t - 1 ) + &delta; ( l p ( t ) = l , a p b ( t ) = 0 , dp(t)≠0)
Wherein β=0.95, is used for weakening historical background color for effect current, background color model.The effect of δ (.) function is when parameter is true value expression formula, and functional value is 1, is otherwise 0.Lp(t),dpT () represents the pixel p color sub-spaces label in t, initial binary dividing mark and depth value respectively.According to formulaDifference color histogram H to above-mentioned each pixel ppT () performs normalization operation after, the similarity of each candidate's hole and difference color histogram can be calculated by following formula and obtain:
Wherein ΦrBeing the interior zone of hole Φ, N is perforated ΦrThe total number of pixel.
In the present embodiment, use contour edgeWith regional background similarityWeighted sum pΦCalculate the similarity of hole and background.Owing to the reliability of the pure and fresh degree in border based on color block and the color complexity of regional area are closely coupled, thereforeWeight wcDepend on the number of neighborhood non-zero color subspace, can be calculated by following formula and obtain:
w c = 1 - 1 M ( &Sigma; p &Element; &Phi; c N p n L ) ,
WhereinCharacterize the number of non-zero color block in the neighborhood of pixel p.With wcCalculating the same, background pixelWeight wrRelevant to the confidence level of background color model, scene that and if only if exists a small amount of illumination variation, when namely the number of the non-zero color subspace in the background color model of perforated is less, background color model is only reliably.Therefore, wrCan calculate according to the following formula:
w r = 1 - 1 N &Sigma; p &Element; &Phi; r ( &Sigma; l = 1 L &delta; ( h p l ( t ) &NotEqual; 0 ) L )
Thus the similarity p of hole and backgroundΦCan be obtained by following formula estimation:
p &Phi; = w * lh &Phi; r + ( 1 - w ) * e &Phi; c ,
Wherein,Work as pΦLess than given empirical value TbTime, this area filling is prospect by the present embodiment.What deserves to be explained is, T in the present embodimentbIt is set to 0.3.
It is described below and the present invention is directed to owing to the border of foreground and background occurs that the mistake that the degree of depth is lost or depth estimation mistake causes splits the border optimized algorithm that the sequential provided is consistent.Due to the intersection in foreground and background, the degree of depth is discrete, and therefore these intersections often lose depth information, occurs that prospect is mistaken for the phenomenon of background.On the other hand, even if obtaining depth information at intersection, also often occur estimating inaccurate situation, it is easy to occur being mistaken for background the phenomenon of prospect.Can significantly find out that the depth information that prospect head border is lost is maximum from Fig. 3 (c) rectangular area (the bottom left section region Fig. 3 (d)), misjudgment phenomenon is serious, additionally, then there is the phenomenon that background is significantly judged to prospect by mistake in hand edge or garment edge in prospect, in Fig. 3 (c) shown in rectangular area (right areas in Fig. 
3 (d)).Although the mistake frame by frame caused by above-mentioned reason is smaller, but flicker can be caused in video, have a strong impact on visual effect.In order to eliminate these segmentations by mistake, it is necessary to utilize the color image information of corresponding depth image that the pixel of the juncture area of foreground and background is re-started labelling.In the present embodiment, adopt the thought scratching figure, first calculate the alpha value of juncture area, then again through certain threshold value quantizing alpha value, realizing the binary segmentation of borderline region, wherein alpha value has used for reference the definition in soft segmentation or stingy figure, for reflecting the synthesis situation of foreground and background.Successive value between alpha value desirable 0 to 1, it is more similar to background that alpha value more levels off to 0 expression pixel, and it is more similar to prospect that alpha value more levels off to 1 expression pixel.
The present embodiment adopts the method proposed in the first trifle of the second section of paper " Online Video segmentation real-time post-treatment " be designated as method [1], find the pixel near border.Concrete implementation method is: the set of note boundary pixel is Ω, then Ω can be defined by following formula:
Ω(Le)={p|τ0<sp<τ1}, s p = 1 L e 2 &Sigma; q &Element; N p ( L e ) a q s
Wherein Np(Le) be the window size of pixel p it is Le×LeNeighborhood;For pixel q binary segmentation label after prospect holes filling;SpMeansigma methods for pixel binary segmentation labels all in the neighborhood of pixel p;Ω (Le) for the collection of pixels in boundaries on either side one belt-like zone, the width in this region is by parameter τ0, τ1Control, 0 < τ01<1。
The depth information obtained at the discontinuous place of the degree of depth due to depth transducer is highly unstable, even if prospect remains stationary, its depth information also there will be very big fluctuation, so that the segmentation result based on depth information is also rather unstable.Therefore, the information being based only upon present frame removes to estimate the alpha value of pending pixel, and when prospect is static, the alpha value of front and back two frame is also likely to be different, can not eliminate interframe flicker.In order to keep the concordance of segmentation result, it is possible to use time sequence information revises the alpha value of present frame.Further, before and after certain pixel, the motion of two frames is more little, and segmentation result should be more similar.Based on above-mentioned analysis, present embodiments provide a kind of border optimized algorithm based on sequential, color, border and movable information.Local color model that this algorithm proposes first by existing method [1] and boundary function calculate the local alpha value of each pending pixel, then a kind of simple estimation method is adopted, estimate the probability of motion figure of present frame and adjacent front cross frame, then using probability of motion as weights, the weighted sum mixing alpha value as pending pixel of local alpha value and sequential alpha value is asked for.
One by one the feature utilized in above-mentioned border optimized algorithm is described explanation below.
The present embodiment adopts the weighted sum of color alpha value and border alpha value to calculate the local alpha value of pending pixel.Wherein color alpha valueIt is based on the local color model calculating of pixel.If Np(Lb) be the window size of pixel p it is LbNeighborhood, Np(Ls) in foreground pixel and background pixel sample set respectivelyWithThen color alpha valueCan be calculated by following formula:
&alpha; p c = P ( c p | M p F ) P ( c p | M p F ) + P ( c p | M p B ) ,
Wherein cpIt is the RGB color of pixel p,WithBeing foreground color similarity and the background color similarity of pixel p respectively, wherein color model is uniform piecemeal mixed Gauss model.
Due to prospect, background color very close to region, color alpha value has obvious mistake, causes that wide translucent area and border are rough.In this case, it should defer to the result of binary segmentation.Existing method [1] uses one or four bound of parameter functions to calculate the border alpha value of pixel, and intuitively, a pixel distance border is more remote, then it should be more low with the similarity of prospect, and border alpha value also should be less.Specifically, border alpha value is calculated by following formula:
$$\alpha_p^b = \frac{a_p}{1 + e^{(c_p - s_p)/\delta_p}} + b_p,$$
The computation of the parameters $\delta_p$, $a_p$, $b_p$, $c_p$ follows existing method [1].
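The four-parameter bounded (sigmoid) form can be sketched as follows. Since the patent defers the parameter computation to existing method [1], interpreting the input as a signed distance from the binary boundary and the default parameter values are assumptions for illustration only:

```python
import math

def boundary_alpha(d, a=1.0, b=0.0, c=0.0, delta=2.0):
    """Four-parameter bounded function: alpha_b = a / (1 + exp((d - c)/delta)) + b.
    With d a signed distance from the binary boundary (positive = away from the
    foreground), alpha_b decays monotonically from a + b toward b as the pixel
    moves farther from the foreground, as the text requires."""
    return a / (1.0 + math.exp((d - c) / delta)) + b
```

At the boundary itself (`d = c`) the value is `a/2 + b`, i.e. 0.5 with the defaults.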
In the present embodiment, the weight of the color alpha value depends on the sharpness of the boundary, and the weight of the boundary alpha value depends on the error rate of the binary segmentation. The local alpha value is computed as follows:
$$\alpha_p^l = (1 - w_p)\,\alpha_p^c + w_p\,\alpha_p^b,$$
$$w_p = \frac{w_p^b}{w_p^b + w_p^c}, \qquad w_p^b = \frac{1}{9}\sum_{q \in N_p(3)} \left|\alpha_q^c - \alpha_q^s\right|, \qquad w_p^c = \gamma_p.$$
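The blending of the two alphas can be sketched directly from the formulas above; here `binary_win` stands in for the binary segmentation labels $\alpha_q^s$ over the 3×3 window, and the handling of $w_p^c = \gamma_p$ as an externally supplied boundary-blur weight is an assumption:

```python
def color_error_weight(alpha_c_win, binary_win):
    """w_b = (1/9) * sum over the 3x3 window of |alpha_c_q - alpha_s_q|,
    i.e. how badly the color model disagrees with the binary segmentation."""
    return sum(abs(ac, ) if False else abs(ac - s) for ac, s in zip(alpha_c_win, binary_win)) / 9.0

def local_alpha(alpha_c, alpha_b, w_b, w_c):
    """alpha_l = (1 - w) * alpha_c + w * alpha_b with w = w_b / (w_b + w_c):
    the larger the color-model error w_b, the more the boundary alpha dominates."""
    w = w_b / (w_b + w_c)
    return (1.0 - w) * alpha_c + w * alpha_b
```

When the color model is perfectly consistent with the binary labels (`w_b = 0`), the local alpha reduces to the color alpha alone.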
Because video frames are captured continuously, consecutive frames are correlated and their segmentation results are likewise interdependent. Intuitively, if a pixel remains stationary across two consecutive frames, its classification should stay consistent. The temporal continuity of the local alpha value at the current frame can therefore be weighed by the pixel's motion probability. Since the estimation of the local alpha value uses a local color model and local boundary properties, summing the motion over each pixel's neighborhood reflects the motion of the pixel more accurately than the pixel alone. The present embodiment computes the motion probability of a pixel using an edge-weighted frame difference over the pixel neighborhood. Let $p_p^m(t-1)$ denote the motion probability of pixel p from time t-1 to time t; it is computed as follows:
$$p_p^m(t-1) = \frac{\sum_{q \in N_p(L_s)} f_q(t-1)\, e_q(t)}{\sum_{q \in N_p(L_s)} e_q(t)},$$
where $f_q(t-1)$ is the frame difference of pixel q at time t relative to time t-1; so that it can serve as a probability estimate, the frame difference here is normalized. $e_q(t)$ is the gradient of pixel q at the current time. To suppress noise, the frame difference is not obtained by directly differencing the grayscale images of the two frames, but by differencing the two frames after Gaussian smoothing, i.e.:
$$f_p(t-1) = \mathrm{Norm}\big(\left|G(g_p(t)) - G(g_p(t-1))\right|\big),$$
where $\mathrm{Norm}(\cdot)$ denotes the normalization function, $G(\cdot)$ denotes a Gaussian kernel function with scale 0.8, and $g_p(t)$ and $g_p(t-1)$ are the color values of pixel p at times t and t-1, respectively. The image edges are obtained by convolving with the first derivative of the Gaussian function, i.e.:
$$e_p(t) = \left|\nabla G(g_p(t))\right|,$$
The present embodiment uses the same method to compute the motion probability $p_p^m(t-2)$ of pixel p at time t relative to time t-2. It should be noted that the motion probability of pixel p at time t relative to time t itself, $p_p^m(t)$, is 0.
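The edge-weighted frame difference can be sketched in one dimension as follows. The reduction to a 1-D signal, the central-difference gradient standing in for the derivative-of-Gaussian edge, and the division by the full intensity range standing in for $\mathrm{Norm}(\cdot)$ are all simplifying assumptions for the example:

```python
import math

def gaussian_smooth(sig, sigma=0.8, radius=2):
    """1-D Gaussian smoothing with replicated borders (the patent uses scale 0.8)."""
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(k)
    k = [v / s for v in k]
    n = len(sig)
    return [sum(k[j + radius] * sig[min(max(i + j, 0), n - 1)]
                for j in range(-radius, radius + 1)) for i in range(n)]

def motion_probability(gray_t, gray_t1, lo=0.0, hi=255.0):
    """p_m = sum_q f_q * e_q / sum_q e_q over the (here: whole 1-D) neighborhood,
    with f the normalized difference of the smoothed frames and e a simple gradient."""
    st, st1 = gaussian_smooth(gray_t), gaussian_smooth(gray_t1)
    f = [abs(a - b) / (hi - lo) for a, b in zip(st, st1)]   # normalized frame difference
    e = [abs(st[min(i + 1, len(st) - 1)] - st[max(i - 1, 0)]) / 2.0
         for i in range(len(st))]                           # central-difference edge strength
    denom = sum(e)
    return sum(fq * eq for fq, eq in zip(f, e)) / denom if denom > 0 else 0.0
```

Two identical frames yield a motion probability of exactly 0, which is why a static foreground keeps a stable alpha under the temporal weighting below.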
The present embodiment estimates the mixed alpha value of each pending pixel as the weighted sum of the local alpha values at the current frame and the two preceding frames (the latter serving as the temporal alpha values), with the weight coefficients determined by the motion probabilities. The mixed alpha value is computed as follows:
$$\alpha_p^h(t) = \dot p_p(t-2)\,\alpha_p^l(t-2) + \dot p_p(t-1)\,\alpha_p^l(t-1) + \dot p_p(t)\,\alpha_p^l(t),$$
where $\alpha_p^l(t-2)$, $\alpha_p^l(t-1)$ and $\alpha_p^l(t)$ are the local alpha values of pixel p at times t-2, t-1 and t, respectively, and $\dot p_p(t-2)$, $\dot p_p(t-1)$, $\dot p_p(t)$ are the normalized weight coefficients, i.e.:
$$\dot p_p(t-2) = \frac{p_p(t-2)}{p_p(t-2) + p_p(t-1) + p_p(t)},$$
$$\dot p_p(t-1) = \frac{p_p(t-1)}{p_p(t-2) + p_p(t-1) + p_p(t)},$$
$$\dot p_p(t) = \frac{p_p(t)}{p_p(t-2) + p_p(t-1) + p_p(t)},$$
where $p_p(t-2) = 1 - p_p^m(t-2)$, $p_p(t-1) = 1 - p_p^m(t-1)$ and $p_p(t) = 1 - p_p^m(t)$ are the complements of the motion probabilities of pixel p at time t relative to times t-2, t-1 and t, respectively. The above formula for the mixed alpha value shows that the larger the motion probability of pixel p at time t relative to a given moment, the weaker the temporal continuity at time t, and the smaller the weight given to that moment's alpha value.
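The temporal mixing above can be sketched in a few lines; the three-element lists are ordered `[t-2, t-1, t]`, and per the text the motion probability of the current frame relative to itself is 0:

```python
def mixing_alpha(local_alphas, motion_probs):
    """alpha_h(t) = sum_k w_k * alpha_l(k) for k in {t-2, t-1, t},
    where w_k = (1 - p_m(k)) normalized over the three frames.
    A frame with high motion probability contributes little to the mix."""
    still = [1.0 - pm for pm in motion_probs]   # stillness = 1 - motion probability
    z = sum(still)
    return sum(s / z * a for s, a in zip(still, local_alphas))
```

With full motion in both preceding frames the mixed alpha collapses to the current local alpha; with no motion anywhere the three frames are averaged equally.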
Because the foreground hole detection and filling algorithm provided by the invention fully considers both the cause of foreground holes and their intrinsic properties, it can reliably identify foreground holes while excluding background holes, and since its computation is very simple, it runs fast. The boundary optimization algorithm provided by the invention, for its part, fuses multiple features: unlike matting algorithms it does not let background colors bleed into regions where foreground and background are similar, and unlike feathering algorithms it does not blur the boundary. At the same time, unlike existing method [1], the invention incorporates temporal information and can thus suppress the abrupt changes in segmentation caused by inaccurate depth estimation, eliminating flicker in the virtual reality fusion video. More importantly, all features adopted by the boundary optimization algorithm provided by the invention have fast computation methods, so the real-time requirement can be met.
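As a much-simplified sketch of the hole-filling idea summarized above, the following code substitutes a border-connectivity flood fill for contour detection, treats all enclosed background pixels as one candidate hole, and uses the fraction of hole pixels with valid (non-zero) depth as a stand-in for the patent's weighted contour-edge/background-similarity score; all of these substitutions are assumptions made to keep the example self-contained:

```python
from collections import deque

def fill_foreground_holes(binary, depth, t_b=0.3):
    """Fill enclosed background regions ("holes") in a 0/1 binary mask as foreground
    when a stand-in background-similarity score falls below the threshold t_b.
    binary and depth are equal-sized lists of lists; binary is modified in place."""
    h, w = len(binary), len(binary[0])
    outside = [[False] * w for _ in range(h)]
    # Seed with every background pixel touching the image border.
    dq = deque((r, c) for r in range(h) for c in range(w)
               if binary[r][c] == 0 and (r in (0, h - 1) or c in (0, w - 1)))
    for r, c in dq:
        outside[r][c] = True
    while dq:  # flood fill the true background inward from the border
        r, c = dq.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and binary[nr][nc] == 0 and not outside[nr][nc]:
                outside[nr][nc] = True
                dq.append((nr, nc))
    # Background pixels not reachable from the border are hole candidates.
    holes = [(r, c) for r in range(h) for c in range(w)
             if binary[r][c] == 0 and not outside[r][c]]
    if holes:
        valid = sum(1 for r, c in holes if depth[r][c] != 0)
        if valid / len(holes) < t_b:  # little background evidence -> foreground hole
            for r, c in holes:
                binary[r][c] = 1
    return binary
```

A hole with no valid depth inside a foreground ring gets filled, while the border-connected background is left untouched.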
Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art can make various corresponding changes and modifications according to the present invention, but all such corresponding changes and modifications shall fall within the protection scope of the appended claims of the present invention.

Claims (8)

1. A post-processing method for online video segmentation based on a depth sensor, characterized in that it comprises:
Step 1: extracting features from a video frame and its corresponding depth image acquired by the depth sensor, and performing foreground/background segmentation of the video frame on said features to obtain a binary image;
Step 2: detecting and filling the foreground holes in the binary image to obtain a hole-filled binary image;
Step 3: performing boundary optimization on the hole-filled binary image to obtain an optimized binary image, wherein said Step 3 comprises:
Step 301: obtaining the foreground/background boundary transition region of the hole-filled binary image;
Step 302: computing the local alpha value of each pixel in said boundary transition region, wherein the alpha value reflects the blending of foreground and background and takes continuous values between 0 and 1, an alpha value closer to 0 indicating the pixel is more similar to the background, and an alpha value closer to 1 indicating the pixel is more similar to the foreground;
Step 303: computing, for each pixel in said boundary transition region, the motion probability relative to the two preceding frames;
Step 304: using the motion probabilities as weights, computing the weighted sum of the temporal segmentation results and the local alpha value of each pixel in said boundary transition region to obtain a mixed alpha value;
Step 305: judging whether said mixed alpha value exceeds a given threshold T_f; if so, setting the pixel at the corresponding position of the hole-filled binary image to 1, indicating that the pixel is foreground; otherwise setting the pixel at the corresponding position to 0, indicating that the pixel is background;
Step 306: obtaining the boundary-optimized binary image;
Step 4: fusing a virtual background with said video frame according to the optimized binary image to generate a virtual reality fusion image.
2. The post-processing method after online video segmentation according to claim 1, characterized in that said Step 2 comprises:
Step 201: performing contour detection on said binary image, denoting the number of contours as Num, and initializing a contour counter n = 1;
Step 202: judging whether n is less than or equal to said contour number Num; if so, performing Step 203, otherwise performing Step 3;
Step 203: marking the region enclosed by the n-th contour, and recording the contour together with its enclosed region as a hole;
Step 204: counting the number of non-zero depth pixels at the positions corresponding to the hole region;
Step 205: judging whether said number is zero; if it is non-zero, performing Step 206, otherwise performing Step 211;
Step 206: computing the hole contour edge and its weight;
Step 207: computing the background similarity of the hole region and its weight;
Step 208: weighting said hole contour edge and its weight together with said hole-region background similarity and its weight to obtain the hole background similarity;
Step 209: judging whether said hole background similarity is less than a given threshold T_b; if so, performing Step 210, otherwise performing Step 211;
Step 210: filling said hole as foreground to obtain the hole-filled binary image;
Step 211: incrementing the contour counter n by one and returning to Step 202.
3. The post-processing method after online video segmentation according to claim 2, characterized in that the threshold T_b in said Step 209 is 0.3.
4. The post-processing method after online video segmentation according to claim 1, characterized in that the threshold T_f in said Step 305 is 0.5.
5. A post-processing system for online video segmentation based on a depth sensor, characterized in that it comprises:
an online video foreground/background segmentation module, for extracting features from a video frame and its corresponding depth image, and performing foreground/background segmentation of the video frame on said features to obtain a binary image;
a detection and filling module, for detecting and filling the foreground holes in the binary image to obtain a hole-filled binary image;
an optimization module, for performing boundary optimization on the hole-filled binary image to obtain an optimized binary image, wherein said optimization module comprises an acquisition module, a second computing module, a foreground/background judging module and a binary image acquisition module:
said acquisition module, for obtaining the foreground/background boundary transition region of the hole-filled binary image;
said second computing module, for computing the local alpha value of each pixel in said boundary transition region and the motion probability of each pixel relative to the two preceding frames, and, using the motion probabilities as weights, computing the weighted sum of the temporal segmentation results and the local alpha value of each pixel to obtain a mixed alpha value, wherein the alpha value reflects the blending of foreground and background and takes continuous values between 0 and 1, an alpha value closer to 0 indicating the pixel is more similar to the background, and an alpha value closer to 1 indicating the pixel is more similar to the foreground;
said foreground/background judging module, for judging whether said mixed alpha value exceeds a given threshold T_f; if so, setting the pixel value at the corresponding position of the hole-filled binary image to 1, indicating that the pixel is foreground; otherwise setting the pixel value at the corresponding position to 0, indicating that the pixel is background;
said binary image acquisition module, for obtaining the boundary-optimized binary image according to said foreground and background;
a virtual reality fusion module, for fusing a virtual background with said video frame according to the optimized binary image to generate a virtual reality fusion video.
6. The post-processing system after online video segmentation according to claim 5, characterized in that said detection and filling module comprises:
a contour detection module, for performing contour detection on said binary image, denoting the number of contours as Num, and initializing a contour counter n = 1;
a first judging module, for judging whether n is less than or equal to said contour number Num; if so, invoking the hole marking module, otherwise invoking the optimization module;
a hole marking module, for marking the region enclosed by the n-th contour, and recording the contour together with its enclosed region as a hole;
a counting module, for counting the number of non-zero depth pixels at the positions corresponding to the hole region;
a second judging module, for judging whether said number is zero; if it is non-zero, invoking the first computing module, otherwise invoking the increment module;
said first computing module, for computing the hole contour edge and its weight, computing the background similarity of the hole region and its weight, and weighting them to obtain the hole background similarity;
a third judging module, for judging whether said hole background similarity is less than a given threshold T_b; if so, invoking the filling module, otherwise invoking the increment module;
a filling module, for filling said hole as foreground to obtain the hole-filled binary image;
an increment module, for incrementing the contour counter n by one and returning to the first judging module.
7. The post-processing system after online video segmentation according to claim 6, characterized in that the threshold T_b in said third judging module is 0.3.
8. The post-processing system after online video segmentation according to claim 5, characterized in that the threshold T_f in said foreground/background judging module is 0.5.
CN201210395366.4A 2012-10-17 2012-10-17 Post-processing method and system for online video segmentation based on a depth sensor Active CN102999901B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210395366.4A CN102999901B (en) Post-processing method and system for online video segmentation based on a depth sensor


Publications (2)

Publication Number Publication Date
CN102999901A CN102999901A (en) 2013-03-27
CN102999901B (en) 2016-06-29

Family

ID=47928435






Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant