CN110163873A - Bilateral video object segmentation method and system - Google Patents
Bilateral video object segmentation method and system
- Publication number
- CN110163873A (application CN201910417693.7A)
- Authority
- CN
- China
- Prior art keywords
- grid
- grid cell
- background
- color
- bilateral
- Prior art date
- Legal status
- Granted
Classifications
(all under G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL)
- G06T7/11 — Image analysis; segmentation; region-based segmentation
- G06T7/136 — Image analysis; segmentation; edge detection involving thresholding
- G06T7/194 — Image analysis; segmentation; edge detection involving foreground-background segmentation
- G06T7/90 — Image analysis; determination of colour characteristics
- G06T2207/10016 — Indexing scheme for image analysis or image enhancement; image acquisition modality; video; image sequence
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a bilateral video object segmentation method and system. The method maps a video sequence with key-frame markers into a high-dimensional bilateral space, reducing the amount of video data to be processed, and then uses the non-empty grid cells as the nodes of a graph while building a graph-cut optimization model. Its key step is to accurately estimate the likelihood that each grid cell belongs to the foreground or background by constructing a confidence dynamic appearance model based on the Gaussian distribution rule. A higher-order term is introduced into the energy function to strengthen the spatio-temporal correlation of nodes that are non-adjacent but share similar appearance features. Finally, the energy function is solved with a max-flow/min-cut algorithm to obtain the label value of each grid cell, which realizes the final label assignment of the video pixels. The method not only effectively eliminates the interference of adverse factors on segmentation, but also handles video object segmentation in complex scenes quickly and accurately.
Description
Technical field
The present invention relates to video object segmentation technology, and in particular to a bilateral video object segmentation method and system that combines a confidence dynamic appearance model with a robust higher-order term.
Background art
Video object segmentation extracts the target objects of interest to the user from a video, i.e., it divides all pixels into foreground/background spatio-temporal regions that are coherent in appearance, motion, and so on. It is a prerequisite for many high-level vision applications, such as object detection, video retrieval, security surveillance, film and television post-production, and intelligent transportation.
Existing video object segmentation methods fall into two broad classes: fully automatic methods and interactive methods. The former usually estimate the target objects in a video automatically based on optical flow, feature point/region tracking, object proposal regions, and the like, and then solve the problem via clustering, dynamic programming, graph-cut optimization, etc.; they find it difficult to achieve segmentation that is both efficient and highly accurate. Interactive methods instead require the user to provide suitable interaction, i.e., to mark foreground and background regions in one or more key frames, which serve as "hard" constraints for generating a video object segmentation result that matches the user's intent. Such methods are suited to applications that do not demand real-time operation but require high accuracy at object boundaries, and they are currently the most common approach to video object segmentation.
In general, videos of different scenes may contain complex phenomena such as similar foreground/background regions, object occlusion, fast motion, blurred boundaries, camera shake, illumination changes, dynamic shadows, and flowing water, and video data volumes are large. These adverse factors make existing video object segmentation methods inefficient and leave them struggling to produce high-quality segmentation results.
The method proposed by Marki et al. (Nicolas Marki, Federico Perazzi, Oliver Wang, Alexander Sorkine-Hornung: Bilateral Space Video Segmentation. CVPR 2016: 743-751) was the first to perform video object segmentation in a bilateral space. It maps the video data into a high-dimensional bilateral space using nearest-neighbor interpolation and defines a graph-cut optimization model on the regularly sampled bilateral grid vertices, so that a standard max-flow/min-cut algorithm can compute a globally optimal solution and produce the final video object segmentation result. The method segments video objects quickly and responds promptly to user interaction, but because it does not address problems such as building a robust global appearance model or establishing long-range spatio-temporal connections, it cannot guarantee the correctness of segmentation results propagated over long durations.
The method proposed by Chen et al. (Chen, Hao Chuanyan: A video foreground segmentation algorithm based on dynamic bilateral grids. Journal of Computer-Aided Design and Computer Graphics, 2018, 30(11): 2101-2107) mainly targets difficulties in video foreground segmentation such as similar foreground/background colors, object occlusion, and data redundancy. It first constructs a high-dimensional mapping space for the video data and, within a loop-partitioning framework, updates the grid frame by frame using optical flow. Because this strategy does not make good use of the temporal correlation between video frames, the method struggles to obtain spatio-temporally consistent video object segmentation results when the optical flow is estimated inaccurately.
Summary of the invention
To address the above shortcomings of the prior art, in one aspect a bilateral video object segmentation method combining a confidence dynamic appearance model with a robust higher-order term is provided. It preprocesses the video data with a bilateral grid to reduce the amount of data to be processed; it builds the graph and defines the energy function over the non-empty grid cells instead of individual pixels, realizing video object segmentation in the high-dimensional bilateral space and greatly improving its efficiency; and, by constructing a confidence dynamic appearance model and introducing a higher-order term, it defines a global graph-cut optimization model that produces high-precision video object segmentation results and can quickly and accurately handle foreground extraction from videos with complex scenes.
An example of this method is as follows:

A bilateral video object segmentation method, comprising:
(1) Video preprocessing: select multiple key frames from the given video sequence and segment them so that foreground and background pixels are accurately marked. Map every pixel of the labeled key frames and of the remaining frames of the video sequence into a high-dimensional bilateral feature space, and divide that space regularly to obtain a bilateral grid, i.e., a set of grid cells. Determine foreground grid cells and background grid cells from the marked pixels each cell contains; all foreground grid cells and all background grid cells constitute the foreground seed set and the background seed set, respectively. Meanwhile, reset all pixels in any grid cell that contains both marked foreground pixels and marked background pixels to unlabeled pixels;
(2) Defining the graph-cut optimization model: construct a confidence dynamic appearance model from all non-empty grid cells, estimate the likelihood that each grid cell belongs to the foreground or background, and define color criteria based on the Gaussian distribution rule to identify grid cells whose colors belong to an unlabeled color class or an ambiguous color class.

The confidence dynamic appearance model comprises a dynamic foreground appearance model A_f and a dynamic background appearance model A_b, composed respectively of the foreground appearance models A_f(t_l) and background appearance models A_b(t_l) of all time layers, l = 1, ..., Γ_t, where Γ_t is the time dimension of the bilateral grid and each A_f(t_l), A_b(t_l) is a Gaussian mixture model (GMM). For any grid cell v_i in time layer t_l, the likelihood of belonging to the foreground is computed as P(v_i) = p(c_i | A_f(t_l)) / (p(c_i | A_f(t_l)) + p(c_i | A_b(t_l))), and the likelihood of belonging to the background is 1 - P(v_i).

For grid cells identified as belonging to an unlabeled color class or an ambiguous color class, P(v_i) is set to 0.5.
On the basis of the video preprocessing, construct a graph G based on a Markov random field and define an energy function comprising a data term, a smoothness term, and a higher-order term, where the data term is computed from the likelihood, estimated by the confidence dynamic appearance model, that each grid cell belongs to the foreground or background;
(3) Minimize the energy function to obtain the optimal partition of graph G and thus the label value of each grid cell; assign each label value to all pixels in the corresponding grid cell to obtain the video segmentation result.
Preferably, the color criteria include:

Identification of unlabeled color classes: let the foreground appearance model A_f(t_l) of time layer t_l be a Gaussian mixture model composed of K Gaussian components, the k-th component (k ∈ {1, ..., K}) being a Gaussian N(c | μ_k, σ_k) with weight ω_k, mean μ_k, and covariance matrix σ_k. Set the value of the parameter λ so that any known foreground color c_f falls within the interval [μ_k - λσ_k, μ_k + λσ_k] with a probability of 95% or more, and take the probability density estimate p(μ_k + λσ_k | μ_k, σ_k) computed at μ_k + λσ_k as the minimum probability density threshold δ_f of A_f(t_l).

Similarly, substituting a known background color c_b and the background appearance model A_b(t_l) into the formula above yields the corresponding minimum probability density threshold δ_b. For any grid cell v_i in time layer t_l whose color satisfies p(c_i | A_f(t_l)) < δ_f and p(c_i | A_b(t_l)) < δ_b, the corresponding color is considered to belong to an unlabeled color class.

Identification of ambiguous color classes: for time layer t_l and any given color c, if c belongs to the foreground seed set but p(c | A_f(t_l)) < δ_f, or c belongs to the background seed set but p(c | A_b(t_l)) < δ_b, the color is deemed ambiguous. The colors of this category constitute the ambiguous color class set, which is used to train a Gaussian mixture model A_u(t_l), such that for any grid cell v_i of time layer t_l satisfying p(c_i | A_u(t_l)) > δ_u, the corresponding color value is regarded as ambiguous, where the probability density threshold δ_u is computed with the Gaussian distribution rule in the same way as δ_f and δ_b.
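The 95% rule above can be checked numerically. The sketch below (hypothetical helper names, a single 1-D Gaussian component for simplicity) finds the λ that places roughly 95% of the probability mass inside [μ - λσ, μ + λσ] and evaluates the density at that boundary as the minimum threshold:

```python
import math

def normal_density(x, mu, sigma):
    """1-D Gaussian probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def min_density_threshold(mu, sigma, target=0.95):
    """Find lambda such that ~95% of the mass of the component lies inside
    [mu - lam*sigma, mu + lam*sigma] (via the error function), then return
    (lam, density at the interval boundary).  Colors whose density falls
    below the returned threshold are treated as unlabeled."""
    lam = 0.0
    while math.erf(lam / math.sqrt(2)) < target:   # mass of N(0,1) in [-lam, lam]
        lam += 1e-3
    return lam, normal_density(mu + lam * sigma, mu, sigma)
```

For a unit Gaussian this recovers the familiar λ ≈ 1.96 of the two-sided 95% interval.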
Preferably, the smoothness term E_s(v_i, v_j) is defined as follows:

E_s(v_i, v_j) = n_i · n_j · |l_i - l_j| · exp(-β‖c_i - c_j‖²)

where n_i and n_j are the total numbers of pixels in grid cells v_i and v_j; l_i and l_j are the label values of v_i and v_j; ‖c_i - c_j‖ computes the color difference between two neighboring grid vertices; and β = (2⟨(c_i - c_j)²⟩)⁻¹ is a constant, in which ⟨·⟩ denotes the expected value over samples of the video sequence.
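An illustrative sketch of this contrast-sensitive smoothness cost for one pair of neighboring cells, with β estimated from sampled color differences. Scalar colors are used for simplicity, and the exact scaling by the pixel counts n_i · n_j is an assumed reconstruction, since the patent's formula itself is not reproduced in the text:

```python
import math

def beta_constant(color_pairs):
    """beta = (2 * <(c_i - c_j)^2>)^-1 estimated from sampled color pairs."""
    mean_sq = sum((ci - cj) ** 2 for ci, cj in color_pairs) / len(color_pairs)
    return 1.0 / (2.0 * mean_sq)

def smooth_cost(n_i, n_j, l_i, l_j, c_i, c_j, beta):
    """Smoothness penalty of a neighboring cell pair: zero for equal labels,
    otherwise a pixel-count-scaled cost that decays with color difference,
    so cutting between similar colors is expensive."""
    if l_i == l_j:
        return 0.0
    return n_i * n_j * math.exp(-beta * (c_i - c_j) ** 2)
```

Note that similar neighbors with different labels incur a higher cost than dissimilar ones, which is exactly what encourages cuts along color edges.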
Preferably, the higher-order term is defined as follows:

Cluster all non-empty grid cells so that cells with similar features belong to the same cluster. Define one auxiliary node per cluster, {A_i | i = 1, ..., k'}, where k' is the number of clusters; connect all grid cells of a cluster to the corresponding auxiliary node by edges and set the weight of each such edge to 1. The higher-order term E_h is defined over these unit-weight edges, where φ(·) is an indicator function describing that any two distinct grid cells v_i, v_j of a cluster are both connected to its auxiliary node, C_k' is the set of k' clusters, and l_i and l_j are the label values of v_i and v_j.
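Because each cluster member is tied to its auxiliary node by a unit-weight edge, the cheapest label for the auxiliary node cuts only the edges of the minority side, which is how such auxiliary-node constructions penalize label disagreement within a cluster. A sketch of that per-cluster cost (an illustrative reading, not the patent's exact formula):

```python
def higher_order_cost(member_labels):
    """Per-cluster cost induced by unit-weight edges to the auxiliary node:
    whichever binary label the auxiliary node takes, the edges to members
    holding the other label are cut, so the minimum achievable cost equals
    the size of the minority-label side."""
    ones = sum(member_labels)
    return min(ones, len(member_labels) - ones)
```

A perfectly coherent cluster therefore costs nothing, and the cost grows with the number of dissenting cells.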
Preferably, the minimization of the energy function is realized with a max-flow/min-cut algorithm.
Preferably, the method of determining foreground and background grid cells from the marked pixels contained in a grid cell comprises: when the number of marked foreground pixels in a grid cell is at least half of the total number of pixels in that cell, the cell is regarded as a foreground grid cell; similarly, background grid cells are determined from the proportion of marked background pixels in the cell.
Preferably, the key frames are segmented with an interactive image segmentation method, such as Lazy Snapping or Grabcut.
Preferably, the method further comprises manually correcting erroneously segmented regions in the key-frame segmentation results.
In another aspect, a bilateral video object segmentation system is also provided, comprising:

A video preprocessing unit, for selecting multiple key frames from a given video sequence and segmenting them so that foreground and background pixels are accurately marked; mapping every pixel of the labeled key frames and of the remaining frames of the video sequence into the high-dimensional bilateral feature space; dividing that space regularly to obtain the bilateral grid, i.e., a set of grid cells; determining foreground grid cells and background grid cells from the marked pixels each cell contains, all foreground and background grid cells constituting the foreground seed set and the background seed set, respectively; and meanwhile resetting all pixels in any grid cell that contains both marked foreground pixels and marked background pixels to unlabeled pixels;

A confidence dynamic appearance model construction unit, for constructing the confidence dynamic appearance model from all non-empty grid cells, estimating the likelihood that each grid cell belongs to the foreground or background, and defining color criteria based on the Gaussian distribution rule to identify grid cells of unlabeled or ambiguous color classes. The confidence dynamic appearance model comprises a dynamic foreground appearance model A_f and a dynamic background appearance model A_b, composed respectively of the foreground appearance models A_f(t_l) and background appearance models A_b(t_l) of all time layers, where Γ_t is the time dimension of the bilateral grid and each per-layer model is a Gaussian mixture model (GMM). For any grid cell v_i in time layer t_l, the likelihood of belonging to the foreground is computed as P(v_i) = p(c_i | A_f(t_l)) / (p(c_i | A_f(t_l)) + p(c_i | A_b(t_l))), the likelihood of belonging to the background being 1 - P(v_i); for grid cells identified as unlabeled or ambiguous, P(v_i) is set to 0.5;

A graph-cut optimization model construction unit, for constructing, on the basis of the video preprocessing, the graph G based on a Markov random field and defining the energy function comprising a data term, a smoothness term, and a higher-order term, the data term being computed from the likelihood, estimated by the confidence dynamic appearance model, that each grid cell belongs to the foreground or background;

A foreground extraction unit, for minimizing the energy function to obtain the optimal partition of graph G and thus the label value of each grid cell, and assigning each label value to all pixels in the corresponding grid cell to obtain the video segmentation result.
Preferably, the color criteria of the confidence dynamic appearance model construction unit include:

Identification of unlabeled color classes: let the foreground appearance model A_f(t_l) of time layer t_l be a Gaussian mixture model composed of K Gaussian components, the k-th component (k ∈ {1, ..., K}) being a Gaussian N(c | μ_k, σ_k) with weight ω_k, mean μ_k, and covariance matrix σ_k. Set the value of the parameter λ so that any known foreground color c_f falls within the interval [μ_k - λσ_k, μ_k + λσ_k] with a probability of 95% or more, and take the probability density estimate p(μ_k + λσ_k | μ_k, σ_k) computed at μ_k + λσ_k as the minimum probability density threshold δ_f of A_f(t_l).

Similarly, substituting a known background color c_b and the background appearance model A_b(t_l) into the formula above yields the corresponding minimum probability density threshold δ_b. For any grid cell v_i in time layer t_l whose color satisfies p(c_i | A_f(t_l)) < δ_f and p(c_i | A_b(t_l)) < δ_b, the corresponding color is considered to belong to an unlabeled color class.

Identification of ambiguous color classes: for time layer t_l and any given color c, if c belongs to the foreground seed set but p(c | A_f(t_l)) < δ_f, or c belongs to the background seed set but p(c | A_b(t_l)) < δ_b, the color is deemed ambiguous. The colors of this category constitute the ambiguous color class set, which is used to train a Gaussian mixture model A_u(t_l), such that for any grid cell v_i of time layer t_l satisfying p(c_i | A_u(t_l)) > δ_u, the corresponding color value is regarded as ambiguous, where the probability density threshold δ_u is computed with the Gaussian distribution rule in the same way as δ_f and δ_b.
Preferably, in the energy function defined by the graph-cut optimization model construction unit, the smoothness term E_s(v_i, v_j) is defined as follows:

E_s(v_i, v_j) = n_i · n_j · |l_i - l_j| · exp(-β‖c_i - c_j‖²)

where n_i and n_j are the total numbers of pixels in grid cells v_i and v_j; l_i and l_j are the label values of v_i and v_j; ‖c_i - c_j‖ computes the color difference between two neighboring grid vertices; and β = (2⟨(c_i - c_j)²⟩)⁻¹ is a constant, in which ⟨·⟩ denotes the expected value over samples of the video sequence.
Preferably, in the energy function defined by the graph-cut optimization model construction unit, the higher-order term is defined as follows:

Cluster all non-empty grid cells so that cells with similar features belong to the same cluster. Define one auxiliary node per cluster, {A_i | i = 1, ..., k'}, where k' is the number of clusters; connect all grid cells of a cluster to the corresponding auxiliary node by edges and set the weight of each such edge to 1. The higher-order term E_h is defined over these unit-weight edges, where φ(·) is an indicator function describing that any two distinct grid cells v_i, v_j of a cluster are both connected to its auxiliary node, C_k' is the set of k' clusters, and l_i and l_j are the label values of v_i and v_j.
Preferably, the foreground extraction unit minimizes the energy function with a max-flow/min-cut algorithm.
Preferably, the video preprocessing unit determines foreground and background grid cells as follows: when the number of marked foreground pixels in a grid cell is at least half of the total number of pixels in that cell, the cell is regarded as a foreground grid cell; similarly, background grid cells are determined from the proportion of marked background pixels in the cell.
Preferably, the video preprocessing unit segments the key frames with an interactive image segmentation method, such as Lazy Snapping or Grabcut.
Preferably, the key-frame segmentation of the video preprocessing unit further comprises manually correcting erroneously segmented regions in the key-frame segmentation results.
An example of the beneficial effects of the present invention is as follows:

Compared with the prior art, the scheme maps the video sequence with key-frame markers into the high-dimensional bilateral space, reducing the video data to be processed; it then uses the non-empty grid cells as the nodes of a graph and builds the graph-cut optimization model, whose key step is constructing the confidence dynamic appearance model by analyzing the Gaussian distribution rule, so as to accurately estimate the likelihood that each grid cell belongs to the foreground or background; and it introduces a higher-order term into the energy function to strengthen the spatio-temporal correlation of nodes that are non-adjacent but share similar appearance features. Finally, the energy function is solved with a max-flow/min-cut algorithm to obtain the label value of each grid cell, realizing the final label assignment of the video pixels. The present invention not only effectively eliminates the interference of adverse factors on segmentation, but also handles video object segmentation in complex scenes quickly and accurately; for example, it can rapidly process segmentation tasks on videos of 480p and higher resolution. Experimental results show that when processing video at a resolution of 4080p with this method, the segmentation time per frame is about 0.35 s, which is substantially better than the existing BVS (Bilateral Space Video Segmentation) method in both segmentation efficiency and quality.
Brief description of the drawings
Fig. 1 is the framework diagram of the bilateral video object segmentation method of the specific embodiment of the present invention;
Fig. 2 is an illustration of the bilateral grid constructed in the specific embodiment;
Fig. 3 is a schematic diagram of the global graph-cut optimization model of the specific embodiment;
Fig. 4 is a visualization of the higher-order term for the three classes of undirected edges in Fig. 3;
Fig. 5 shows some frames of an example video;
Fig. 6 shows the segmentation results of the specific embodiment on the video frames shown in Fig. 5;
Fig. 7 shows the segmentation results of the BVS method on the video frames shown in Fig. 5;
Fig. 8 is a structural schematic diagram of the bilateral video object segmentation system of the specific embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only a part of the embodiments of the present invention, not all of them.
A bilateral video object segmentation method, as shown in Fig. 1, comprises:

1. Video preprocessing
First, select multiple key frames from the given video sequence, for example at a fixed frame interval, then segment these key frames precisely so that foreground and background pixels are accurately marked. This can be realized with an interactive image segmentation method such as Lazy Snapping or Grabcut; alternatively, the foreground/background regions of a key frame can be marked roughly by hand and the mark information spread automatically to pixels, i.e., automatically labeling adjacent unmarked pixels whose color difference is below a given threshold. For erroneously segmented regions in the key-frame segmentation results, the user is allowed to make manual corrections to ensure the accuracy of the key-frame segmentation.
According to the RGB color value and space-time coordinates of each pixel, map every pixel p = [x, y, t]^T of the labeled key frames and of the remaining frames of the video sequence into the high-dimensional bilateral feature space (c_r, c_g, c_b, x, y, t), where (c_r, c_g, c_b) is the RGB color value, (x, y) is the two-dimensional spatial coordinate of the pixel, and t is the time coordinate of the video.
Set a sampling rate on each dimension, namely the sampling rate s_r of the color axes, the sampling rate s_s of the spatial axes, and the sampling rate s_t of the time axis, and divide the high-dimensional bilateral feature space regularly to obtain the bilateral grid Γ, i.e., the set of grid cells (as shown in Fig. 2). Each pixel p can then be mapped to its grid cell v_i by formula (1):

Γ([c_r/s_r], [c_g/s_r], [c_b/s_r], [x/s_s], [y/s_s], [t/s_t]) += (I(x, y, t), 1)    (1)

where [·] is the floor operation, used to compute the coordinates of the grid cell; the homogeneous coordinates (I(x, y, t), 1) accumulate, for each grid cell v_i, its accumulated color value and pixel count; and I(x, y, t) is the color value of each pixel.
The color value c_i of each grid cell v_i is computed as the mean c_i = (1/n_i) Σ_j c_i^j, where c_i^j is the color value of the j-th pixel in v_i and n_i is the total number of pixels in v_i. Obviously, empty grid cells (containing no pixels) need no color value and are not used in the subsequent video object segmentation.
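As an illustrative sketch (the function names and default sampling rates below are hypothetical, not part of the patent), the regular division of formula (1) and the per-cell average color can be implemented with a hash map keyed by the floored 6-D coordinates, so only non-empty cells are ever stored:

```python
def build_bilateral_grid(pixels, s_r=16, s_s=8, s_t=2):
    """Map pixels into a 6-D bilateral grid, analogous to formula (1).

    pixels: iterable of (r, g, b, x, y, t) integer tuples.
    Returns {cell_coord: [accumulated_rgb, pixel_count]} for non-empty cells.
    """
    grid = {}
    for (r, g, b, x, y, t) in pixels:
        # floor division implements the [c/s] floor operation of formula (1)
        cell = (r // s_r, g // s_r, b // s_r, x // s_s, y // s_s, t // s_t)
        acc = grid.setdefault(cell, [[0, 0, 0], 0])
        acc[0][0] += r; acc[0][1] += g; acc[0][2] += b
        acc[1] += 1
    return grid

def cell_color(grid, cell):
    """Average RGB color of a non-empty grid cell (mean over its pixels)."""
    (ar, ag, ab), n = grid[cell]
    return (ar / n, ag / n, ab / n)
```

Storing only the occupied cells is what shrinks the data volume: all pixels of a cell are later handled through that single node.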
Foreground and background grid cells are determined from the marked pixels contained in each non-empty grid cell. Specifically, when the number of marked foreground pixels in a grid cell is at least half of the total number of pixels in the cell, the cell is regarded as a foreground grid cell; in a similar way, background grid cells are determined from the proportion of marked background pixels in the cell. These foreground and background grid cells constitute the foreground seed set S_f and the background seed set S_b, respectively, and the identified background grid cells require no subsequent computation.
Furthermore, since a video may contain complex scenes such as similar foreground/background, violent object motion, and object occlusion, some grid cells may simultaneously contain marked foreground pixels and marked background pixels. This embodiment regards these as conflicting grid cells and resets all of their pixels to unlabeled pixels, so as to avoid erroneous segmentation caused by label conflicts in subsequent computation.
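A minimal sketch of the seed-cell rule and the conflict reset described above, assuming each cell is summarized by its total, foreground-marked, and background-marked pixel counts (the dictionary layout is an assumption made for illustration):

```python
def classify_cells(cells):
    """Split non-empty cells into foreground seeds, background seeds, and
    conflict cells whose pixels are reset to unlabeled.

    cells: {cell_id: (n_total, n_fg_marked, n_bg_marked)}
    """
    seeds_fg, seeds_bg, conflicts = set(), set(), set()
    for cid, (n, fg, bg) in cells.items():
        if fg > 0 and bg > 0:
            conflicts.add(cid)        # mixed markings: reset to unlabeled
        elif fg >= n / 2:
            seeds_fg.add(cid)         # majority of pixels marked foreground
        elif bg >= n / 2:
            seeds_bg.add(cid)         # majority of pixels marked background
    return seeds_fg, seeds_bg, conflicts
```

Cells with too few marked pixels fall into none of the three sets and remain ordinary unlabeled nodes of the graph.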
2. Definition of the global graph-cut optimization model

On the basis of the video preprocessing, cluster all non-empty grid cells so that cells with similar features belong to the same cluster; each cluster consists of one or more grid cells, as shown in Fig. 2, where points of different colors or shapes in the grid cells represent different clusters. The clustering can be realized with the K-means algorithm and mainly accounts for the feature consistency of the pixels within a grid cell and the edge smoothness between grid cells. An auxiliary node is defined for each cluster, {A_i | i = 1, ..., k'}, where k' is the number of clusters; the value of k' may differ across video sequences.
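The clustering step can be sketched with a naive K-means over the cells' average colors. This is an illustrative pure-Python implementation (the patent only specifies "the K-means algorithm"; a library routine would normally be used instead):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Naive K-means over grid-cell feature vectors (e.g. average colors).
    Returns the cluster index assigned to every point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest center by squared Euclidean distance
        for i, p in enumerate(points):
            assign[i] = min(range(k),
                            key=lambda j: sum((a - b) ** 2
                                              for a, b in zip(p, centers[j])))
        # update step: recompute each center as the mean of its members
        for j in range(k):
            members = [points[i] for i, a in enumerate(assign) if a == j]
            if members:
                centers[j] = tuple(sum(d) / len(members) for d in zip(*members))
    return assign
```

Each resulting cluster index later identifies which auxiliary node a grid cell is attached to.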
A graph G = ⟨V, ε⟩ based on a Markov random field (MRF) is constructed (the global graph-cut optimization model shown in Fig. 3), where V = Γ ∪ {s, s'} ∪ {A_i}: Γ is the bilateral grid, s and s' are the terminal nodes representing "foreground" and "background", and A_i are the auxiliary nodes. The edge set ε comprises three classes of undirected edges: the n-links connecting directly adjacent grid cells (or nodes), the t-links connecting every grid cell to the two terminal nodes, and the a-links connecting each grid cell in a cluster to the corresponding auxiliary node. The higher-order visualization is shown in Fig. 4, in which the grid cells marked light and dark represent two different clusters.
Video object segmentation can then be converted into a binary label assignment problem on the Markov random field, i.e., each grid cell v_i ∈ Γ is assigned a unique binary label l_i ∈ {0, 1}, where "0" denotes background and "1" denotes foreground. The energy function is defined on the constructed graph G:

E(L) = Σ_{v_i ∈ Γ} E_d(l_i) + λ_1 Σ_{(v_i, v_j) ∈ N} E_s(l_i, l_j) + λ_2 Σ_{k'} E_h(L)    (2)

where the data term E_d computes the cost of the label value of any node v_i; the smoothness term E_s computes the cost when adjacent nodes v_i and v_j are assigned different labels; N is the neighborhood system in the bilateral space; the higher-order term E_h strengthens the spatio-temporal correlation between distant, non-adjacent nodes, so that nodes that are not directly adjacent but have similar color features can be assigned the same label; k' is the number of clusters; and the parameters λ_1, λ_2 (> 0) balance the three energy terms.
Finally, minimizing the above energy function E(L) with a max-flow/min-cut algorithm yields the optimal partition of graph G (see Fig. 3, where "Min-Cut" denotes the minimum cut algorithm), and thus the label value of each grid cell. To obtain the final video object segmentation result, the label value of each grid cell must further be assigned to all pixels in that cell, i.e., the label value of every pixel agrees with the label value of the grid cell containing it.
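The global optimization can be illustrated with a toy Edmonds-Karp max-flow/min-cut solver. A production system would use a dedicated graph-cut library (for instance the Boykov-Kolmogorov algorithm), but the sketch below shows how the source side of the minimum cut yields the binary labels:

```python
from collections import deque, defaultdict

def max_flow_min_cut(capacity, s, t):
    """Edmonds-Karp max-flow; returns (flow_value, source_side_nodes).

    capacity: nested dict, capacity[u][v] = capacity of directed edge u->v.
    The nodes reachable from s in the final residual graph form the source
    side of the minimum cut, i.e. the 'foreground' side in a graph-cut model.
    """
    res = defaultdict(lambda: defaultdict(int))   # residual capacities
    for u in capacity:
        for v, c in capacity[u].items():
            res[u][v] += c
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in list(res[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            break
        # push the bottleneck capacity along the found path
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push
    # source side of the min cut = nodes still reachable from s
    side, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v, c in res[u].items():
            if c > 0 and v not in side:
                side.add(v)
                q.append(v)
    return flow, side
```

Grid cells in the returned source side receive label 1 and the rest label 0; the label is then copied to every pixel of the cell.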
To compute the data term E_d, the confidence dynamic appearance model is constructed from all non-empty grid cells to estimate the likelihood that each grid cell belongs to the foreground or background.
The confidence dynamic appearance model comprises a dynamic foreground appearance model A_f and a dynamic background appearance model A_b, composed respectively of the foreground appearance models A_f(t_l) and background appearance models A_b(t_l) of all time layers, where Γ_t is the time dimension of the bilateral grid and each per-layer model is a Gaussian mixture model (GMM). Specifically, when estimating the foreground appearance model A_f(t_l) of time layer t_l, the foreground GMM is trained on the average RGB color values of the non-empty grid cells. However, since the distance of these grid cells from the current time layer t_l differs, so does the likelihood that they belong to the foreground; the weight of each non-empty grid cell v_i therefore needs to be computed, formula (3):

w_i = (n_i^f / n_i) · exp(-|t'_l - t_l| / γ)    (3)

where n_i^f is the number of pixels in grid cell v_i that have been marked as foreground, describing the cell's contribution to building the foreground appearance model; exp(·) measures the distance between the time layer t'_l containing v_i and the current time layer t_l, with the parameter γ balancing the exponential function; and n_i^f / n_i computes the ratio of foreground pixels in the grid cell.
Similarly, replacing n_i^f and A_f(t_l) in formula (3) with n_i^b and A_b(t_l) estimates the background appearance model A_b(t_l).
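The per-cell training weight can be sketched as follows. Since formula (3) itself is not reproduced in the text, the exact exponential decay form below (foreground ratio damped by temporal distance) is an assumption consistent with the description of its terms:

```python
import math

def cell_weight(n_fg, n_total, t_cell, t_current, gamma=2.0):
    """Training weight of a non-empty grid cell for the foreground GMM of the
    current time layer: the foreground-pixel ratio n_fg/n_total, damped
    exponentially by the temporal distance |t_cell - t_current|.
    NOTE: the decay form exp(-|dt|/gamma) is an assumed reconstruction."""
    return (n_fg / n_total) * math.exp(-abs(t_cell - t_current) / gamma)
```

Cells in the current layer contribute at full ratio, while distant layers contribute progressively less to the trained mixture.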
The likelihood that each grid cell belongs to the foreground or background is estimated with the constructed appearance models. Specifically, for any grid cell v_i, the foreground and background appearance models corresponding to its time layer t_l are selected to compute its likelihood of belonging to the foreground, formula (4):

P(v_i) = p(c_i | A_f(t_l)) / (p(c_i | A_f(t_l)) + p(c_i | A_b(t_l)))    (4)

and the likelihood of belonging to the background is 1 - P(v_i).
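A 1-D sketch of this likelihood estimate: two small Gaussian mixtures stand in for the per-layer foreground and background models, and the foreground likelihood is taken as the normalized density (an assumed reconstruction, consistent with the neutral value P(v_i) = 0.5 used for unreliable cells):

```python
import math

def gauss(c, mu, sigma):
    """1-D Gaussian probability density."""
    return math.exp(-0.5 * ((c - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gmm_density(c, components):
    """Density of a 1-D Gaussian mixture; components = [(weight, mu, sigma)]."""
    return sum(w * gauss(c, mu, sigma) for w, mu, sigma in components)

def foreground_probability(c, fg_model, bg_model):
    """Normalized foreground likelihood P(v_i); background is 1 - P(v_i)."""
    pf, pb = gmm_density(c, fg_model), gmm_density(c, bg_model)
    return 0.5 if pf + pb == 0 else pf / (pf + pb)
```

A cell color near the foreground mixture yields P close to 1, while a color equidistant from both models yields the neutral 0.5.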
When user interaction is insufficient or some foreground and background colors are similar, the constructed appearance models cannot fit the color distributions of the foreground and background well. The estimated likelihood of unmarked grid cells belonging to the foreground or background is then unreliable, which eventually leads to low-quality segmentation results. Color criteria therefore need to be further defined to identify unreliable estimates, so that various forms of user input can be accommodated.
Criterion one: identifying unlabeled color classes
When user interaction is insufficient, some color classes in the video sequence are not marked by the user. Lacking sufficient prior information, the constructed foreground/background appearance models cannot accurately describe the foreground/background color distributions, and the probability density estimates computed for the unlabeled color classes are typically very small. Suppose the foreground appearance model of time layer tl is a Gaussian mixture model composed of K Gaussian components, the k-th component (k ∈ K) being parameterized by its weight, mean μk and covariance matrix σk. In general, any known foreground color cf falling in the interval [μk − λσk, μk + λσk] has a probability density estimate p(cf | μk, σk) of 95% or more, while colors outside this interval have density estimates below 5%; the value of the parameter λ is determined by the point at which the cumulative density estimate over this interval reaches 95%. The probability density estimate computed at μk + λσk can therefore be taken as the minimum probability density threshold.

Similarly, substituting a known background color cb into formula (5) yields the corresponding minimum probability density threshold for the background. For any grid cell vi in time layer tl, when its color c falls below both thresholds, the color is considered to belong to the unlabeled color class set of that time layer.
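Criterion one can be sketched as below, assuming diagonal-covariance components and λ ≈ 1.96 (the two-sided 95% point of a 1-D Gaussian); `component_pdf`, `min_density_threshold`, and `is_unlabeled` are illustrative names, not the patent's notation.

```python
import numpy as np

LAMBDA = 1.96  # chosen so [mu - lambda*sigma, mu + lambda*sigma]
               # covers ~95% of a 1-D Gaussian's cumulative density

def component_pdf(c, weight, mu, var):
    """Weighted density of one diagonal-covariance Gaussian component."""
    d = np.asarray(c, float) - mu
    norm = np.prod(np.sqrt(2.0 * np.pi * var))
    return weight * np.exp(-0.5 * np.sum(d * d / var)) / norm

def min_density_threshold(weight, mu, var):
    """Density at mu + lambda*sigma: the minimum-density cutoff below
    which a color is no longer considered explained by this component."""
    return component_pdf(mu + LAMBDA * np.sqrt(var), weight, mu, var)

def is_unlabeled(c, fg_components, bg_components):
    """A color belongs to the unlabeled class when it clears neither the
    foreground nor the background minimum-density thresholds."""
    below = lambda comps: all(
        component_pdf(c, w, mu, var) < min_density_threshold(w, mu, var)
        for w, mu, var in comps)
    return below(fg_components) and below(bg_components)
```

A component is a `(weight, mean, variance)` triple here; a color near a component mean clears the threshold, while a color far from every foreground and background component is flagged as unlabeled.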
Criterion two: identifying ambiguous color classes
In time layer tl, when some foreground and background colors are similar, i.e., when the foreground seed set and the background seed set partially overlap in color space, the overlapping colors are regarded as ambiguous; these foreground and background seed sets can also be used to verify the accuracy of the constructed foreground and background appearance models. For time layer tl and any color c, if the color belongs to the foreground seed set but is not classified as foreground by the models, or belongs to the background seed set but is not classified as background, the color is deemed misclassified, and the misclassified colors constitute the ambiguous color class set of that time layer. An additional Gaussian mixture model is then trained on the ambiguous color set; for any grid cell vi of time layer tl, when its density under this model exceeds the probability density threshold, its color value c is considered to belong to the ambiguous color class, where the threshold is computed using the Gaussian distribution rule:
Grid cells identified as belonging to unlabeled or ambiguous color classes are regarded as having unreliable foreground/background likelihood estimates. Assigning these cells an uninformative probability value (set to 0.5) enhances the robustness of the constructed confidence appearance model.
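A simplified sketch of criterion two and the 0.5 assignment: here ambiguity is detected as an exact overlap of the two seed color sets (the patent additionally trains a GMM on the ambiguous colors and thresholds its density, which follows the same pattern as criterion one). Names are illustrative.

```python
def ambiguous_colors(fg_seeds, bg_seeds):
    """Colors appearing in both seed sets overlap in color space and are
    treated as ambiguous (exact match; a radius test would be used for
    noisy real video)."""
    return {tuple(c) for c in fg_seeds} & {tuple(c) for c in bg_seeds}

def update_probability(p, color, ambiguous, unlabeled):
    """Cells whose color is ambiguous or unlabeled get the
    uninformative likelihood 0.5; others keep their estimate."""
    if tuple(color) in ambiguous or tuple(color) in unlabeled:
        return 0.5
    return p
```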
The foreground likelihood of each grid cell is therefore updated as:

where the unlabeled color class set and the ambiguous color class set are each composed of the corresponding per-layer sets of all time layers.
Finally, the data term is obtained as the negative logarithm of the likelihood, estimated by the confidence appearance model, that each grid cell belongs to the foreground:
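The data term as the negative log of the (updated) foreground likelihood can be sketched as below; the binary label convention (1 = foreground, 0 = background) and the small epsilon guarding log(0) are assumptions of this sketch.

```python
import math

def data_term(p_fg, label, eps=1e-10):
    """Unary cost of assigning `label` (1 = foreground, 0 = background)
    to a cell with estimated foreground likelihood p_fg."""
    p = p_fg if label == 1 else 1.0 - p_fg
    return -math.log(max(p, eps))
```

A confidently-foreground cell (p_fg near 1) is cheap to label foreground and expensive to label background; the 0.5 cells from the color criteria cost the same either way, so the smooth and higher-order terms decide them.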
The smooth term is defined analogously to the smooth term of a traditional MRF energy function; its value is computed from the color difference between directly adjacent nodes vi and vj in the neighborhood system N:

where ni and nj are the total pixel counts of grid cells vi and vj, describing the contribution of each cell; li and lj are their label values; ‖ci − cj‖ computes the color difference between the two adjacent grid vertices; and β = (2⟨(ci − cj)²⟩)⁻¹ is a constant, where ⟨·⟩ denotes the expected value over samples of the video sequence. Note that because the color values of adjacent grid cells are usually very similar, the computed smooth-term values are large and close to one another, so the smooth term encourages directly adjacent grid cells to receive the same label. Moreover, when the sampling rates are set to sr = 256, ss = 1 and st = 1, the smooth term equals its pixel-based counterpart.
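A sketch of the pairwise term under common graph-cut conventions: the penalty applies only when the labels differ, falls off with the squared color difference via β = (2⟨(ci − cj)²⟩)⁻¹, and is scaled here by the summed pixel counts of the two cells (the exact scaling in the patent's formula, given only as an image, is an assumption).

```python
import numpy as np

def beta_from_pairs(color_pairs):
    """beta = (2 * <(ci - cj)^2>)^-1 over neighboring-cell color pairs."""
    diffs = [np.sum((np.asarray(ci, float) - np.asarray(cj, float)) ** 2)
             for ci, cj in color_pairs]
    return 1.0 / (2.0 * np.mean(diffs))

def smooth_term(ci, cj, ni, nj, li, lj, beta):
    """Pairwise cost: zero if labels agree, otherwise a contrast-weighted
    penalty scaled by the pixel counts of both cells, so similar-colored
    neighbors are strongly discouraged from taking different labels."""
    if li == lj:
        return 0.0
    d2 = np.sum((np.asarray(ci, float) - np.asarray(cj, float)) ** 2)
    return (ni + nj) * np.exp(-beta * d2)
```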
The smooth term mainly maintains spatio-temporal consistency between adjacent grid cells, but it cannot handle blurred and low-contrast object boundaries well, so the resulting segmentation boundaries tend to be over-smoothed. When extracting objects with fine silhouettes, such as trees or bushes, the segmented object may deviate considerably from the actual contour, lowering the accuracy of the segmentation result. To exploit the spatio-temporal relationship between distant grid cells with similar appearance features, the connectivity of non-adjacent object regions with similar features is enhanced and a higher-order term is defined. The weight of each a-link edge is set to 1, and the higher-order term is defined as:

where φ(·) is an indicator function describing the connection between any non-duplicated grid cell in a cluster and the corresponding auxiliary node, and Ck' is the set of k' clusters.
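With unit-weight a-links, the higher-order term reduces to counting, per cluster, the member cells whose label disagrees with the cluster's auxiliary-node label; a sketch with hypothetical variable names:

```python
def higher_order_term(labels, clusters, aux_labels):
    """Sum over clusters of the disagreements between each member cell's
    label and the cluster's auxiliary-node label (a-link weight = 1)."""
    return sum(int(labels[i] != aux_labels[k])
               for k, members in enumerate(clusters)
               for i in members)
```

During optimization the auxiliary labels are free variables of the graph cut, so each cluster is softly encouraged toward a single label, linking non-adjacent cells with similar appearance.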
Compared with the prior art, this scheme maps a video sequence with key-frame annotations into a high-dimensional bilateral space, reducing the amount of video data to process, and then treats the non-empty grid cells as graph nodes to build the graph-cut optimization model. The key is to construct a confidence dynamic appearance model by analyzing the Gaussian distribution rule, accurately estimating the likelihood that each grid cell belongs to the foreground or background, and to introduce a higher-order term into the energy function to strengthen the spatio-temporal correlation of non-adjacent nodes with similar appearance features. Finally, the energy function is solved with the max-flow/min-cut algorithm to obtain the label of each grid cell, and the labels are assigned to the video pixels. This method not only eliminates the interference of unfavorable factors with segmentation, but also handles video object segmentation in complex scenes quickly and accurately.
Taking an actual video sequence (resolution 4080p) as an example, some of its frames are shown in Fig. 5. Segmenting this sequence with the above method yields the results for the corresponding frames shown in Fig. 6, at about 0.35 s per frame. Processing the same video with the existing BVS method (Nicolas Marki, Federico Perazzi, Oliver Wang, Alexander Sorkine-Hornung: Bilateral Space Video Segmentation. CVPR 2016: 743-751) produces the results shown in Fig. 7 for the frames of Fig. 5, at about 0.84 s per frame, clearly inferior to this scheme.
An illustrative embodiment of a bilateral video object segmentation system, whose structural diagram is shown in Fig. 8, comprises:
Video preprocessing unit: selects multiple key frames from a given video sequence, segments them, and accurately marks foreground and background pixels; maps each pixel of the labeled key frames and the remaining frames of the video sequence into a high-dimensional bilateral feature space; partitions the high-dimensional bilateral feature space regularly to obtain a bilateral grid, i.e., a set of grid cells; determines foreground and background grid cells according to the marked pixels contained in the grid cells, all foreground grid cells and background grid cells respectively constituting the foreground seed set and the background seed set; and resets all pixels of any grid cell containing both marked foreground pixels and marked background pixels to unlabeled pixels.
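The mapping into the bilateral grid can be sketched as below, interpreting sr as the number of color bins over [0, 256) and ss, st as spatial/temporal strides, so that sr = 256, ss = 1, st = 1 reduces to pixel-level cells as the description states; this interpretation of the sampling rates is an assumption of the sketch.

```python
import numpy as np

def to_bilateral_grid(frames, sr=256, ss=1, st=1):
    """Quantize every pixel (r, g, b, x, y, t) into a bilateral-grid cell;
    returns {cell: [(t, y, x), ...]}. Color is binned into sr levels,
    space and time by strides ss and st."""
    grid = {}
    for t, frame in enumerate(frames):
        h, w, _ = frame.shape
        for y in range(h):
            for x in range(w):
                r, g, b = (int(v) for v in frame[y, x])
                cell = (r * sr // 256, g * sr // 256, b * sr // 256,
                        x // ss, y // ss, t // st)
                grid.setdefault(cell, []).append((t, y, x))
    return grid
```

Coarser sampling rates merge similar nearby pixels into one cell, which is what shrinks the data the graph cut must process.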
Confidence appearance model construction unit: constructs a confidence dynamic appearance model from all non-empty grid cells, estimates the likelihood that each grid cell belongs to the foreground or background, and defines color criteria according to the Gaussian distribution rule to identify grid cells of unlabeled or ambiguous color classes. The confidence dynamic appearance model comprises a dynamic foreground appearance model and a dynamic background appearance model, composed respectively of the per-layer foreground and background appearance models of all time layers tl, where Γt is the time dimension of the bilateral grid and each per-layer model is a Gaussian mixture model (GMM). For any grid cell vi in time layer tl, its foreground likelihood P(vi) is computed as described above, and its background likelihood is 1 − P(vi); for grid cells identified as belonging to unlabeled or ambiguous color classes, P(vi) is set to 0.5.
Graph-cut optimization model construction unit: on the basis of the video preprocessing, constructs a graph G based on a Markov random field and defines an energy function comprising a data term, a smooth term, and a higher-order term, where the data term is computed from the likelihood, estimated by the confidence appearance model, that each grid cell belongs to the foreground or background.
Video foreground extraction unit: minimizes the energy function to obtain the optimal partition of graph G, thereby obtaining the label of each grid cell; each label is assigned to all pixels of the corresponding grid cell to produce the video segmentation result.
The system of this embodiment can be used to execute the technical solution of the video object segmentation method embodiment shown in Fig. 1; its implementation principle and technical effect are similar and are not repeated here.
The above embodiments illustrate the invention; the invention, however, is not limited to the specific details of these embodiments, and the various equivalent substitutions or simple variations made by those skilled in the art within the technical concept of the invention all fall within the scope of protection of the invention.
Claims (10)
1. A bilateral video object segmentation method, characterized by comprising:
(1) Video preprocessing: selecting multiple key frames from a given video sequence, segmenting them, and accurately marking foreground and background pixels; mapping each pixel of the labeled key frames and the remaining frames of the video sequence into a high-dimensional bilateral feature space; partitioning the high-dimensional bilateral feature space regularly to obtain a bilateral grid, i.e., a set of grid cells; determining foreground and background grid cells according to the marked pixels contained in the grid cells, all foreground grid cells and background grid cells respectively constituting a foreground seed set and a background seed set; and resetting all pixels of any grid cell containing both marked foreground pixels and marked background pixels to unlabeled pixels;
(2) Defining the graph-cut optimization model: constructing a confidence dynamic appearance model from all non-empty grid cells, estimating the likelihood that each grid cell belongs to the foreground or background, and defining color criteria according to the Gaussian distribution rule to identify grid cells of unlabeled or ambiguous color classes;

the confidence dynamic appearance model comprises a dynamic foreground appearance model and a dynamic background appearance model, composed respectively of the per-layer foreground and background appearance models of all time layers, where Γt is the time dimension of the bilateral grid and each per-layer model is a Gaussian mixture model; for any grid cell vi in time layer tl, its foreground likelihood P(vi) is computed as follows, and its background likelihood is 1 − P(vi); for grid cells identified as belonging to unlabeled or ambiguous color classes, P(vi) is set to 0.5;
on the basis of the video preprocessing, constructing a graph G based on a Markov random field and defining an energy function comprising a data term, a smooth term, and a higher-order term, where the data term is computed from the likelihood, estimated by the confidence dynamic appearance model, that each grid cell belongs to the foreground or background;
(3) Minimizing the energy function to obtain the optimal partition of graph G, thereby obtaining the label of each grid cell; assigning each label to all pixels of the corresponding grid cell to produce the video segmentation result.
2. The bilateral video object segmentation method according to claim 1, characterized in that the color criteria comprise:

identification of unlabeled color classes: the foreground appearance model of time layer tl is set as a Gaussian mixture model composed of K Gaussian components, the k-th component (k ∈ K) being parameterized by its weight, mean and covariance matrix; the value of the parameter λ is set such that any known foreground color cf falling in the interval [μk − λσk, μk + λσk] has a probability density estimate p(cf | μk, σk) of 95% or more, and the probability density estimate computed at μk + λσk is taken as the minimum probability density threshold; similarly, a known background color cb is substituted into the formula to obtain the corresponding minimum probability density threshold; for any grid cell vi in time layer tl, when its color falls below both thresholds, the color is considered to belong to an unlabeled color class;

identification of ambiguous color classes: for time layer tl and any color c, if the color belongs to the foreground seed set but is not classified as foreground by the models, or belongs to the background seed set but is not classified as background, the color is deemed to belong to an ambiguous color class; the colors belonging to this class constitute the ambiguous color class set, which is used to train a Gaussian mixture model such that, for any grid cell vi of time layer tl, when the density condition is satisfied, its color value is considered ambiguous, where the probability density threshold is computed using the Gaussian distribution rule:
3. The bilateral video object segmentation method according to claim 1, characterized in that the smooth term is defined as follows:

where ni and nj are the total pixel counts of grid cells vi and vj respectively; li and lj are the label values of vi and vj; ‖ci − cj‖ computes the color difference between two adjacent grid vertices; and β = (2⟨(ci − cj)²⟩)⁻¹ is a constant, where ⟨·⟩ denotes the expected value over samples of the video sequence.
4. The bilateral video object segmentation method according to claim 1, characterized in that the higher-order term is defined as follows: all non-empty grid cells are clustered so that grid cells with similar features belong to the same cluster; an auxiliary node {Ai | i = 1, …, k'} is defined for each cluster, where k' is the number of clusters; edges connecting all grid cells of a cluster to the corresponding auxiliary node are added, the weight of each such edge is set to 1, and the higher-order term is defined as:

where φ(·) is an indicator function describing the connection between any non-duplicated grid cells vi, vj of a cluster and their auxiliary node, Ck' is the set of k' clusters, and li and lj are the label values of vi and vj.
5. The bilateral video object segmentation method according to claim 1, characterized in that minimizing the energy function is realized using the max-flow/min-cut algorithm.
6. The bilateral video object segmentation method according to claim 1, characterized in that the method of determining foreground and background grid cells according to the marked pixels contained in a grid cell comprises: when the number of marked foreground pixels in a grid cell is greater than or equal to half of the total number of pixels in that grid cell, the grid cell is regarded as a foreground grid cell; a background grid cell is determined similarly according to the proportion of marked background pixels in the grid cell.
7. A bilateral video object segmentation system, characterized by comprising:

a video preprocessing unit for selecting multiple key frames from a given video sequence, segmenting them, and accurately marking foreground and background pixels; mapping each pixel of the labeled key frames and the remaining frames of the video sequence into a high-dimensional bilateral feature space; partitioning the high-dimensional bilateral feature space regularly to obtain a bilateral grid, i.e., a set of grid cells; determining foreground and background grid cells according to the marked pixels contained in the grid cells, all foreground grid cells and background grid cells respectively constituting a foreground seed set and a background seed set; and resetting all pixels of any grid cell containing both marked foreground pixels and marked background pixels to unlabeled pixels;
a confidence appearance model construction unit for constructing a confidence dynamic appearance model from all non-empty grid cells, estimating the likelihood that each grid cell belongs to the foreground or background, and defining color criteria according to the Gaussian distribution rule to identify grid cells of unlabeled or ambiguous color classes; the confidence dynamic appearance model comprises a dynamic foreground appearance model and a dynamic background appearance model, composed respectively of the per-layer foreground and background appearance models of all time layers, where Γt is the time dimension of the bilateral grid and each per-layer model is a Gaussian mixture model; for any grid cell vi in time layer tl, its foreground likelihood P(vi) is computed as follows, and its background likelihood is 1 − P(vi); for grid cells identified as belonging to unlabeled or ambiguous color classes, P(vi) is set to 0.5;
a graph-cut optimization model construction unit for constructing, on the basis of the video preprocessing, a graph G based on a Markov random field and defining an energy function comprising a data term, a smooth term, and a higher-order term, where the data term is computed from the likelihood, estimated by the confidence dynamic appearance model, that each grid cell belongs to the foreground or background; and

a foreground extraction unit for minimizing the energy function to obtain the optimal partition of graph G, thereby obtaining the label of each grid cell; each label is assigned to all pixels of the corresponding grid cell to produce the video segmentation result.
8. The bilateral video object segmentation system according to claim 7, characterized in that the color criteria of the confidence appearance model construction unit comprise:

identification of unlabeled color classes: the foreground appearance model of time layer tl is set as a Gaussian mixture model composed of K Gaussian components, the k-th component (k ∈ K) being parameterized by its weight, mean and covariance matrix; the value of the parameter λ is set such that any known foreground color cf falling in the interval [μk − λσk, μk + λσk] has a probability density estimate p(cf | μk, σk) of 95% or more, and the probability density estimate computed at μk + λσk is taken as the minimum probability density threshold; similarly, a known background color cb is substituted into the formula to obtain the corresponding minimum probability density threshold; for any grid cell vi in time layer tl, when its color falls below both thresholds, the color is considered to belong to an unlabeled color class;

identification of ambiguous color classes: for time layer tl and any color c, if the color belongs to the foreground seed set but is not classified as foreground by the models, or belongs to the background seed set but is not classified as background, the color is deemed to belong to an ambiguous color class; the colors belonging to this class constitute the ambiguous color class set, which is used to train a Gaussian mixture model such that, for any grid cell vi of time layer tl, when the density condition is satisfied, its color value is considered ambiguous, where the probability density threshold is computed using the Gaussian distribution rule:
9. The bilateral video object segmentation system according to claim 7, characterized in that, in the energy function defined by the graph-cut optimization model construction unit, the smooth term is defined as follows:

where ni and nj are the total pixel counts of grid cells vi and vj respectively; li and lj are the label values of vi and vj; ‖ci − cj‖ computes the color difference between two adjacent grid vertices; and β = (2⟨(ci − cj)²⟩)⁻¹ is a constant, where ⟨·⟩ denotes the expected value over samples of the video sequence.
10. The bilateral video object segmentation system according to claim 7, characterized in that, in the energy function defined by the graph-cut optimization model construction unit, the higher-order term is defined as follows: all non-empty grid cells are clustered so that grid cells with similar features belong to the same cluster; an auxiliary node {Ai | i = 1, …, k'} is defined for each cluster, where k' is the number of clusters; edges connecting all grid cells of a cluster to the corresponding auxiliary node are added, the weight of each such edge is set to 1, and the higher-order term is defined as:

where φ(·) is an indicator function describing the connection between any non-duplicated grid cells vi, vj of a cluster and their auxiliary node, Ck' is the set of k' clusters, and li and lj are the label values of vi and vj.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910417693.7A CN110163873B (en) | 2019-05-20 | 2019-05-20 | Bilateral video target segmentation method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110163873A true CN110163873A (en) | 2019-08-23 |
CN110163873B CN110163873B (en) | 2023-02-24 |
Family
ID=67631529
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910417693.7A Active CN110163873B (en) | 2019-05-20 | 2019-05-20 | Bilateral video target segmentation method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110163873B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160225130A1 (en) * | 2015-01-29 | 2016-08-04 | Honeywell International Inc. | Algorithm for measuring wear pin length using an input image |
CN106875406A (en) * | 2017-01-24 | 2017-06-20 | 北京航空航天大学 | The video semanteme object segmentation methods and device of image guiding |
CN107590818A (en) * | 2017-09-06 | 2018-01-16 | 华中科技大学 | A kind of interactive video dividing method |
CN107644429A (en) * | 2017-09-30 | 2018-01-30 | 华中科技大学 | A kind of methods of video segmentation based on strong goal constraint saliency |
Non-Patent Citations (4)
Title |
---|
QIAN XIE ET AL.: "Object Detection and Tracking Under Occlusion for Object-Level RGB-D Video Segmentation", 《IEEE TRANSACTIONS ON MULTIMEDIA》 * |
GUI Yan et al.: "Fast and Robust Image Segmentation Method Based on Bilateral Grid and Confidence Color Models", Journal of Computer-Aided Design & Computer Graphics *
TIAN Ying et al.: "Bilateral Video Object Segmentation Method with Dynamic Appearance Models and Higher-Order Energy", Journal of Frontiers of Computer Science and Technology *
TIAN Ying: "Research on Interactive Bilateral Image and Video Object Segmentation Algorithms in Complex Scenes", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111259489A (en) * | 2020-01-13 | 2020-06-09 | 智慧航海(青岛)科技有限公司 | Ship cargo stowage stability testing method based on virtual simulation platform |
CN111259489B (en) * | 2020-01-13 | 2023-08-29 | 智慧航海(青岛)科技有限公司 | Ship cargo load stability testing method based on virtual simulation platform |
CN111291663A (en) * | 2020-01-22 | 2020-06-16 | 中山大学 | Rapid video target object segmentation method utilizing space-time information |
CN111291663B (en) * | 2020-01-22 | 2023-06-20 | 中山大学 | Method for quickly segmenting video target object by using space-time information |
CN111859024A (en) * | 2020-07-15 | 2020-10-30 | 北京字节跳动网络技术有限公司 | Video classification method and device and electronic equipment |
CN114092422A (en) * | 2021-11-11 | 2022-02-25 | 长沙理工大学 | Image multi-target extraction method and system based on deep circulation attention |
Also Published As
Publication number | Publication date |
---|---|
CN110163873B (en) | 2023-02-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||