CN104134217B - Video salient object segmentation method based on supervoxel graph cut - Google Patents

Info

Publication number: CN104134217B (granted 2017-02-15); application publication CN104134217A (2014-11-05)
Application number: CN201410366737.5A
Authority: CN (China)
Priority / filing date: 2014-07-29
Legal status: Active (granted)
Inventors: 吴怀宇, 潘春洪, 郑荟
Assignee (original and current): Institute of Automation, Chinese Academy of Sciences
Other languages: Chinese (zh)

Abstract

The invention discloses a method for segmenting the salient object in a video. The method includes the following steps: first, the static saliency of each frame in the video sequence is computed over superpixels to obtain a static saliency map; second, the optical flow between consecutive frames of the video sequence is computed over superpixels, and the dynamic saliency of each frame is computed to obtain a dynamic saliency map; third, the static and dynamic saliency maps are fused into a dynamic-static saliency map; fourth, an objectness map of each frame in the video sequence is computed; fifth, a spatio-temporal over-segmentation of the video sequence is computed, and the static saliency, dynamic saliency and objectness values are each mapped onto this over-segmentation; sixth, a segmentation energy function over saliency, objectness and continuity is established and, on the supervoxel-level spatio-temporal over-segmentation, optimized by iterative graph cut for each video frame, so that every frame is segmented into two classes and the salient foreground object is obtained.

Description

A video salient object segmentation method based on supervoxel graph cut
Technical field
The present invention relates to the technical field of computer vision, and in particular to a video salient object segmentation method based on supervoxel graph cut, a method built on dynamic-static saliency, objectness and persistence.
Background art
Salient object segmentation in video sequences, as a foundation of video processing, is widely applied in many fields of computer vision, such as video summarization, human action recognition, video retrieval, object recognition in video, and video activity analysis. The general difficulties in segmenting objects in a video sequence include camera motion, background motion and change, and the motion and deformation of the salient foreground object itself. Salient object segmentation in video can be divided into two broad classes: non-automatic segmentation and automatic segmentation.
Non-automatic segmentation: this kind of method requires user participation. The user must manually annotate the salient object in the first frame or in some key frames of the video as initialization data, after which region tracking or propagation yields the salient object segmentation of every frame. Its drawback is that manual annotation is tedious and time-consuming, so it is unsuitable for practical applications with larger data volumes.
Automatic segmentation: this kind of method has several implementations. 1) Methods based on background subtraction: they mainly model and update the background and difference each frame against the background image to obtain the pixel regions that differ strongly; they are less suitable when the background motion changes sharply. 2) Methods based on clustering, such as motion clustering, trajectory clustering and spatio-temporal clustering; these are unsuitable when the object's displacement is complicated, for example when only part of the object moves. 3) Methods based on object motion: these typically first divide the video frames into many clusters that may contain objects and then segment within the clusters that may contain objects; their complexity can be high.
Although segmentation has been a research problem for many years, the sharp increase in video data has raised the demand for automatic video salient object segmentation accordingly. Such segmentation inevitably faces background motion and change as well as the uncertainty of the foreground object's own compound motion and deformation. There is therefore a need for a low-cost, convenient, accurate and practical method of segmenting salient objects in video suitable for ordinary users.
Summary of the invention
In order to solve the problems of the prior art, an object of the present invention is to provide a video salient object segmentation method based on graph cut.
In order to reach this goal, the present invention constructs an energy equation from the appearance, motion, objectness and persistence information of the object, which reduces the interference of a moving background, and uses image over-segmentation into superpixels and video spatio-temporal over-segmentation into supervoxels to reduce the complexity of the algorithm.
The video salient object segmentation method based on supervoxel graph cut proposed by the present invention includes the following steps:
Step 1: segment the salient object in the first frame of the video sequence. This step further includes: Step 101, over-segment the frame to obtain superpixels; Step 102, compute the static saliency map from the contrast and distribution of color features; Step 103, compute the dynamic saliency map from the contrast and continuity of the optical flow magnitude; Step 104, fuse the static saliency map and the dynamic saliency map to obtain the dynamic-static saliency map; Step 105, compute the objectness of the first frame and the candidate ROI region of each potential object; Step 106, fuse the dynamic-static saliency map with the object ROIs and filter out unnecessary ROI regions; Step 107, with the ROI region and the dynamic-static saliency as weak constraints, construct an energy equation and segment with iterative graph cut to obtain an estimate of the salient object. Step 2: segment the salient object in every frame of the video sequence other than the first. This step further includes: Step 201, propagate the estimated region of the previous frame to the next frame as a prior; Step 202, apply steps 101, 102, 103, 104 and 105 to this frame to compute the required mid-level feature values; Step 203, compute the spatio-temporal over-segmentation of the video, construct an energy equation over appearance, motion, objectness and persistence, and minimize this energy equation with graph cut to obtain the salient object segmentation.
Beneficial effects of the present invention: based on image over-segmentation into superpixels, the present invention obtains the static and dynamic saliency maps from the contrast and continuity of color and optical flow respectively; the use of superpixels reduces the complexity of the algorithm, and considering feature distribution as well as feature contrast also reduces the interference of background objects whose color is close to the foreground. The objectness computation adds a further cue for segmentation and improves accuracy. Applying graph cut with the supervoxels of the video spatio-temporal over-segmentation as units further reduces the space-time complexity, and graph cut itself has linear complexity, so the computational cost of the algorithm is low and no expensive professional equipment is needed. Unlike traditional non-automatic video salient object segmentation methods, the present invention requires no manual annotation by professionals while enabling higher-quality salient object segmentation in video sequences.
Brief description of the drawings
Fig. 1 is a flow chart of the video salient object segmentation method based on supervoxel graph cut of the present invention;
Fig. 2A is the original image of a single video frame;
Fig. 2B is a schematic diagram of the over-segmentation of a single video frame, i.e. the superpixels;
Fig. 3 is a schematic diagram of the static saliency of a video frame;
Fig. 4 is a schematic diagram of the dynamic saliency of a video frame;
Fig. 5 is a schematic diagram of the objectness of a video frame;
Fig. 6 is a schematic diagram of the pixel-level objectness of a video frame;
Fig. 7 is a schematic diagram of the fused dynamic-static saliency of a video frame;
Fig. 8 is a schematic diagram of the salient supervoxel result of a video;
Fig. 9 is a schematic diagram of the video salient object segmentation result;
Fig. 10 is a schematic diagram of the fusion of the dynamic and static saliency maps; from left to right: the original video frame, the dynamic saliency map, the static saliency map, and the fused dynamic-static saliency map;
Fig. 11 is a segmentation result figure; the leftmost contour circles the segmented region, followed from left to right by the fused dynamic-static saliency map, the dynamic saliency map, the static saliency map, and the objectness map.
Specific embodiments
The present invention will be described in detail below. It should be noted that the described embodiments are intended only to facilitate the understanding of the present invention and do not limit it in any way.
The present invention uses graph cut to segment the salient object in a video sequence based on dynamic-static saliency, objectness and persistence. The method has two stages: the processing of the first frame, and the segmentation of every subsequent frame. The first stage preprocesses the first frame of the video to obtain an estimate of its salient object region; because of the first frame's temporal limitation and its importance for propagation, it is preprocessed in order to reach a more accurate result. The second stage processes the video frames one by one to obtain the salient object segmentation of each frame; this is the core procedure, in which the design of the energy equation revolves around object appearance, motion, objectness and persistence so as to reduce the interference of background change and of the object's own deformation and motion.
According to the method of the invention, the salient object region of the first frame is first estimated by preprocessing. Then the static saliency of each frame in the video sequence is computed over superpixels to obtain the static saliency map; the optical flow between every pair of consecutive frames is computed over superpixels and the dynamic saliency of each frame is computed to obtain the dynamic saliency map; the static and dynamic saliency maps are fused into the dynamic-static saliency map; the objectness map of each frame in the video sequence is computed; the supervoxels of the spatio-temporal over-segmentation of the video sequence are computed, and the pixel-level static saliency, dynamic saliency and objectness values are each mapped onto this over-segmentation; a segmentation energy function over saliency, objectness and continuity is set up and, at the level of the spatio-temporal over-segmentation, optimized with graph cut for each video frame to segment each frame into two classes, yielding the salient foreground object.
Fig. 1 shows the video salient object segmentation method based on supervoxel graph cut of the present invention.
The video salient object segmentation method according to the present invention comprises the following steps:
Step 1: first, each frame image of the video sequence is over-segmented with a k-means algorithm to obtain superpixels. The superpixels are illustrated in Fig. 2.

In this step, pixels with similar color that are close neighbors are clustered based on the 5-dimensional information of each frame image consisting of the Lab color and the position coordinates x, y, which yields the over-segmentation of the single frame image; the Lab values are the 3 dimensions of the Lab color space, and x, y are the horizontal and vertical coordinates of the pixel. The resulting over-segmentation is approximately homogeneous in color and space; Fig. 2 is a schematic diagram of the over-segmentation. Because over-segmented regions mostly retain the information needed for further image segmentation and generally do not destroy object boundaries in the image, the image can be processed directly at the superpixel level to reduce computational cost. A sketch of this step follows.
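For illustration, a minimal sketch of this over-segmentation step, assuming scikit-image's SLIC (which performs exactly this kind of k-means clustering in the 5-D Lab + x, y space); the file name and parameter values are placeholders, not values from the patent:

```python
import numpy as np
from skimage import io, color
from skimage.segmentation import slic

frame = io.imread("frame_0001.png")                 # hypothetical frame path
# SLIC clusters pixels on (L, a, b, x, y), the 5-D feature described above.
labels = slic(frame, n_segments=400, compactness=10, start_label=0)

# Per-superpixel mean Lab color c_j and mean position p_j, reused below.
lab = color.rgb2lab(frame)
n = labels.max() + 1
ys, xs = np.mgrid[0:frame.shape[0], 0:frame.shape[1]]
c = np.array([lab[labels == j].mean(axis=0) for j in range(n)])
p = np.array([(xs[labels == j].mean(), ys[labels == j].mean()) for j in range(n)])
```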
Step 2: compute the static saliency map and the dynamic saliency map of the video frame.

In this step, both the static and the dynamic saliency map first require computing a center-surround contrast saliency map and a distribution-compactness saliency map. For the static saliency map, the color-contrast saliency map and the color-distribution saliency map are computed first, and the final static saliency map is their fusion; the dynamic saliency map is likewise obtained by fusing the contrast saliency map of the optical flow values with the motion-continuity saliency map of the optical flow values.
The static color contrast is computed as:

Cs_j = Σ_{k=1..N} w(p_j, p_k) · ||c_j − c_k||

where N is the total number of over-segments (superpixels) of the video frame; Cs_j is the static color contrast of the j-th superpixel, with j ranging from 1 to N; c_j is the mean Lab color of the j-th superpixel and c_k that of the k-th superpixel, with k running from 1 to N; p_j is the mean position of all pixels in the j-th superpixel and p_k that of the k-th superpixel; w(p_j, p_k) is a coefficient on the positional relationship, which can either be set to the constant 1 or made to vary with the positional relationship (distance) between superpixels; here it is set to the Gaussian weight w(p_j, p_k) = exp(−||p_j − p_k||² / (2σ_p²)). ||c_j − c_k|| is the difference between c_j and c_k: the larger the difference, the larger the static color contrast Cs_j, and a larger contrast means the superpixel is more unique in color.
The contrast equation of the dynamic motion magnitude is:

Cm_j = Σ_{k=1..N} w(p_j, p_k) · (1 − exp(−D(Hf_j, Hf_k)))

where Cm_j is the dynamic motion contrast of the j-th superpixel; p_j is likewise the mean position of all pixels in the j-th superpixel, with j ranging from 1 to N and N the total number of over-segments of the video frame, and p_k is that of the k-th superpixel, with k running from 1 to N; w(p_j, p_k) is a coefficient that can either be set to the constant 1 or made to vary with the positional relationship (distance) between superpixels, and is here again set to the Gaussian weight; Hf_j is the optical-flow magnitude histogram of the j-th superpixel and Hf_k that of the k-th superpixel. The flow-magnitude histogram involved in this algorithm has depth 2: the first layer is the flow-magnitude histogram in the abscissa (x) direction and the second layer is the flow-magnitude histogram in the ordinate (y) direction, so this histogram design captures not only the distribution of motion magnitude but also, to some extent, the motion direction. D(Hf_j, Hf_k) is the chi-square distance between the histograms Hf_j and Hf_k; since the chi-square distance ranges from 0 to infinity, a negative exponential function is used here to map it into [0, 1) for the computation. Thus the larger the chi-square distance between Hf_j and Hf_k, the larger the dynamic motion contrast Cm_j, and a larger contrast means the superpixel is more unique in motion intensity. A sketch of this computation follows.
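As an illustrative sketch of this motion-contrast term, assuming a dense flow field `flow` of shape H x W x 2 (e.g. from cv2.calcOpticalFlowFarneback) plus the `labels`, `p`, `n` and `frame` of the superpixel sketch above; the bin count, value range and σ_p are assumptions:

```python
import numpy as np

def flow_histogram(flow, mask, bins=8, rng=(-20.0, 20.0)):
    # Depth-2 histogram: one layer per flow component (x, then y), so the
    # motion direction is partly kept, as noted in the description.
    hx, _ = np.histogram(flow[mask, 0], bins=bins, range=rng, density=True)
    hy, _ = np.histogram(flow[mask, 1], bins=bins, range=rng, density=True)
    return np.concatenate([hx, hy])

def chi2(h1, h2, eps=1e-10):
    # Chi-square distance between two histograms, in [0, +inf).
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

Hf = np.array([flow_histogram(flow, labels == j) for j in range(n)])
sigma_p = 0.25 * max(frame.shape[:2])
W = np.exp(-((p[:, None] - p[None, :]) ** 2).sum(-1) / (2 * sigma_p ** 2))
D = np.array([[chi2(Hf[j], Hf[k]) for k in range(n)] for j in range(n)])
Cm = (W * (1.0 - np.exp(-D))).sum(axis=1)   # chi-square mapped into [0, 1)
```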
The static distribution compactness is computed as:

Ds_j = Σ_{k=1..N} w(c_j, c_k) · ||p_k − μ_j||²,  with μ_j = Σ_{k=1..N} w(c_j, c_k) · p_k

where Ds_j is the static distribution compactness of the j-th superpixel: the lower the spatial variation of the j-th superpixel's color, the lower Ds_j, i.e. the more spatially compact the superpixel. w(c_j, c_k) is a coefficient on the color similarity between superpixels, which can either be set to the constant 1 or made to vary with the color similarity of the superpixels; here it is set to the Gaussian weight w(c_j, c_k) = exp(−||c_j − c_k||² / (2σ_c²)), normalized over k. p_k is the mean position of all pixels of the k-th superpixel; N is the total number of over-segments of the video frame; and μ_j denotes the mean position of the superpixels whose color is similar to that of the j-th superpixel.
The dynamic motion continuity is computed as:

Dm_j = Σ_{k=1..N} w(Hf_j, Hf_k) · ||p_k − μm_j||²,  with μm_j = Σ_{k=1..N} w(Hf_j, Hf_k) · p_k

where Dm_j is the dynamic motion continuity of the j-th superpixel; w(Hf_j, Hf_k) is a coefficient on the similarity of the motion-magnitude histograms between superpixels, set here, analogously to the static case, to a weight that decreases with the chi-square distance, e.g. w(Hf_j, Hf_k) = exp(−D(Hf_j, Hf_k)); Hf_j is the flow-magnitude histogram of the j-th superpixel and Hf_k that of the k-th superpixel; D(Hf_j, Hf_k) is the chi-square distance between Hf_j and Hf_k, which grows the more dissimilar the histograms are; and μm_j denotes the mean position of the over-segments whose flow-magnitude histograms are similar to Hf_j, p_k being the mean of all pixel positions of the k-th superpixel.
The static saliency map Ss fuses the static color contrast Cs with the static distribution compactness Ds:

Ss_j = Cs_j · exp(−λ · Ds_j)

where Ss_j is the static saliency of the j-th superpixel, Cs_j its static color contrast, Ds_j its static distribution compactness, and λ a positive coefficient: the larger Cs_j and the smaller Ds_j, the larger Ss_j. A sketch of the static terms and their fusion follows.
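A corresponding sketch of Cs, Ds and the fused Ss, reusing `c`, `p`, `n` and `frame` from the superpixel sketch; the σ values and the coefficient `lam` are assumptions, not values from the patent:

```python
import numpy as np

sigma_p = 0.25 * max(frame.shape[:2])
sigma_c, lam = 20.0, 6.0

dp2 = ((p[:, None] - p[None, :]) ** 2).sum(-1)          # squared center distances
dc = np.linalg.norm(c[:, None] - c[None, :], axis=-1)   # Lab color differences

Wp = np.exp(-dp2 / (2 * sigma_p ** 2))                  # Gaussian position weight
Cs = (Wp * dc).sum(axis=1)                              # static color contrast

Wc = np.exp(-dc ** 2 / (2 * sigma_c ** 2))              # Gaussian color weight
Wc /= Wc.sum(axis=1, keepdims=True)
mu = Wc @ p                                             # mean position of similar colors
Ds = (Wc * ((p[None, :, :] - mu[:, None, :]) ** 2).sum(-1)).sum(axis=1)

Ss = Cs * np.exp(-lam * Ds / Ds.max())                  # fused static saliency
```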
The dynamic saliency map Sm fuses the dynamic motion contrast Cm with the dynamic motion continuity Dm:

Sm_j = Cm_j · exp(−λ · Dm_j)

where Sm_j is the dynamic saliency of the j-th superpixel, Cm_j its dynamic motion contrast and Dm_j its dynamic motion continuity: the larger Cm_j and the smaller Dm_j, the larger Sm_j.
The static saliency of a video frame is illustrated in Fig. 3, and the dynamic saliency of a video frame in Fig. 4.
Step 3: fuse the static saliency map and the dynamic saliency map.

The strategy taken in this step is to let the static saliency map Ss and the dynamic saliency map Sm complement each other. Since human attention is more easily attracted by motion, regions with very high motion saliency are retained, while regions without very high motion saliency may be noise brought in by the optical-flow algorithm or by background motion and need to be weighed together with the static saliency map. In the fusion, Sal_j denotes the dynamic-static saliency value of the j-th superpixel, obtained by fusing its static saliency value Ss_j with its dynamic saliency value Sm_j under a deliberately high threshold Ts, set here to 0.8. Ts is set this high for three reasons: first, the motion-priority principle, so that regions with high motion saliency are kept; second, so that regions with ambiguous motion saliency values are corrected by the static saliency, reducing the influence of optical-flow noise and camera movement; and last, so that when the motion saliency is very small its influence is increased, suppressing the interference of salient background objects with the salient foreground object. A sketch of one plausible fusion rule follows.
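The exact piecewise formula is not reproduced on this page, so the following is only one plausible reading of the three rules above; the threshold Ts = 0.8 comes from the description, while the low-motion branch is an assumption:

```python
import numpy as np

def fuse(Ss, Sm, Ts=0.8):
    Ss = Ss / Ss.max()                      # normalize both maps to [0, 1]
    Sm = Sm / Sm.max()
    # Rule 1: motion priority - keep regions of very high motion saliency.
    # Rules 2-3: elsewhere let the static map correct ambiguous motion values,
    # with Sm as a factor so static-only background objects stay suppressed.
    return np.where(Sm >= Ts, Sm, 0.5 * (Ss + Sm) * Sm)

Sal = fuse(Ss, Sm)                          # dynamic-static saliency per superpixel
```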
Fig. 7 is a schematic diagram of the dynamic-static saliency obtained by fusing the dynamic and static saliency maps of a video frame.
Step 4: compute the objectness of the video frame.

In this step, the objectness computation for the first frame is slightly different: besides the pixel-level objectness map computed for every frame, candidate ROI regions of object-like areas of the video sequence are also needed. The input here includes not only the color contrast and superpixels obtained before, but also the boundary information obtained with the Canny edge detector. All three inputs are closely tied to objects: the color contrast expresses the contrast between the foreground object's color and the background; each over-segment of the superpixel decomposition is a color-coherent region that preserves boundary information, so a single over-segment very likely belongs to the same object; and boundaries are likewise an important attribute of objects. An objectness detector based on a Bayesian model then yields the final candidate ROI regions Ro that potentially contain an object, together with their objectness values O, while the intermediate result yields the output pixel-level probability objectness map.
Fig. 5 is the ROI schematic diagram of the objectness of a video frame, and Fig. 6 is the schematic diagram of the pixel-level objectness of a video frame.
Step 5: screen the objectness candidate ROI regions.

In this step, the dynamic-static saliency map is first processed with a threshold of 0.5: regions with saliency greater than or equal to 0.5 are retained and the others discarded, giving the saliency map Rh of saliency above 0.5. For convenience of later operations, this thresholded saliency map must be binarized: the present invention uses a flood-fill algorithm to fill the connected regions of the image with 1 and then sets the remaining regions to 0. After the binarized thresholded saliency map is obtained, a morphological opening, i.e. erosion followed by dilation, is applied to it to remove bright areas of small size and reduce the interference of noise. A sketch of this clean-up follows.
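A sketch with OpenCV, assuming `sal_map` is the per-pixel dynamic-static saliency in [0, 1]; the kernel size is an assumption:

```python
import cv2
import numpy as np

binary = (sal_map >= 0.5).astype(np.uint8)           # threshold at 0.5

# Morphological opening (erosion then dilation) removes small bright
# noise regions, as in the description.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)

# Label the connected regions R_S (the role the flood fill plays above);
# each region is later fitted with a slightly widened bounding box.
n_cc, cc_labels = cv2.connectedComponents(opened)
```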
Thereafter, for each connected region R_S, a covering ROI region is fitted. By horizontal and vertical scanning, the leftmost, rightmost, highest and lowest points of the connected region R_S are found: ((x_l, y_l), (x_r, y_r), (x_u, y_u), (x_d, y_d)), where x_l, y_l are the coordinates of the leftmost point, x_r, y_r those of the rightmost point, x_u, y_u those of the topmost point, and x_d, y_d those of the bottommost point. The four vertex coordinates (in counterclockwise order) of the fitted covering ROI region are ((x_l − 0.05(x_r − x_l), y_u), (x_l − 0.05(x_r − x_l), y_d), (x_r + 0.05(x_r − x_l), y_d), (x_r + 0.05(x_r − x_l), y_u)), i.e. the region is widened by 5% of its width on the left and right, and the description likewise extends it up and down; here x_l − 0.05(x_r − x_l), y_u are the coordinates of the upper-left vertex of the fitted ROI rectangle, x_l − 0.05(x_r − x_l), y_d those of the lower-left vertex, x_r + 0.05(x_r − x_l), y_d those of the lower-right vertex, and x_r + 0.05(x_r − x_l), y_u those of the upper-right vertex.
What is done next is a preliminary screening of the candidate ROI regions Ro that may contain an object. First, for each candidate ROI region Ro_i that may contain an object, the area of the region where it intersects Rs is computed and compared with Ro_i's own area; this ratio should exceed the threshold To. Besides considering the intersection of the candidate region with the salient region, a further screening criterion is that the candidate should enclose the salient region as far as possible, so the ratio of the area of the intersection of Ro_i with Rs to the area of Rs is also required to exceed the threshold Ts, as expressed below:

R = { Ro_i | area(Ro_i ∩ Rs) ÷ area(Ro_i) > To ∧ area(Ro_i ∩ Rs) ÷ area(Rs) > Ts }    (8)

where R is the set of candidate regions Ro_i retained by the above formula; Ro_i denotes the i-th candidate ROI region; area(Ro_i ∩ Rs) denotes the size of the region where the candidate ROI region Ro_i intersects the salient region Rs; area(Ro_i) denotes the size of Ro_i; area(Rs) denotes the size of the salient region Rs; and To and Ts are both thresholds. This screening mainly excludes clearly non-compliant candidate regions so as to reduce the computation of the finer screening in the next step.
Finally, for each candidate ROI region in the screened set R, three saliency-value distribution histograms are computed: the histogram Hin of the set In of superpixels completely inside the region; the histogram Hsu of the set Su of surrounding superpixels that lie outside the region or partly outside it; and the histogram Hbu of the set Bu of superpixels on the outermost ring of the set In, adjacent to the set Su. The contrast between Hin and Hsu and the contrast between Hsu and Hbu are then computed: the greater the difference between the saliency distributions of the superpixels inside the ROI and of those around it, the more likely an object lies inside the region; and the greater the difference between the saliency distributions of the inner ring and of the surrounding superpixels, the better the region agrees with the object boundary. Finally, the algorithm selects the ROI region with the maximum difference value Diff as the final candidate ROI region, where Diff is computed as:

Diff_i = (1 − e^(−D(Hsu_i, Hin_i))) + α · (1 − e^(−D(Hsu_i, Hbu_i)))²    (9)

where Diff_i denotes the difference value of the i-th candidate ROI region; Hin_i denotes the saliency-value distribution histogram of the superpixel set In inside the i-th candidate ROI region; Hsu_i denotes that of the superpixel set Su surrounding In (these superpixels are outside the i-th candidate ROI region or partly outside it); and Hbu_i denotes that of the superpixel set Bu directly adjacent to the set Su, i.e. the "outermost ring" of the set In. Since the chi-square distance ranges from 0 to infinity, both 1 − e^(−D(Hsu, Hin)) and 1 − e^(−D(Hsu, Hbu)) range over 0 to 1. Because object shapes are irregular rather than rectangular, the importance of boundary agreement is relatively low, so the second contrast term is squared and multiplied by a factor α smaller than 1. The ROI candidate with the maximum Diff value in R is the final estimated ROI region. A sketch of this screening follows.
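A sketch of the two-stage screening of eqs. (8) and (9), reusing `chi2` from the motion sketch above; the threshold values and α are assumptions:

```python
import numpy as np

def screen(rois, Rs, To=0.3, Ts=0.5):
    # rois: list of boolean masks Ro_i; Rs: boolean salient-region mask. Eq. (8).
    keep = []
    for Ro in rois:
        inter = float((Ro & Rs).sum())
        if inter / Ro.sum() > To and inter / Rs.sum() > Ts:
            keep.append(Ro)
    return keep

def diff_score(Hin, Hsu, Hbu, alpha=0.5):
    # Diff = (1 - e^{-D(Hsu,Hin)}) + alpha * (1 - e^{-D(Hsu,Hbu)})^2, eq. (9).
    return ((1 - np.exp(-chi2(Hsu, Hin)))
            + alpha * (1 - np.exp(-chi2(Hsu, Hbu))) ** 2)
```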
Step 6: the salient object segmentation of the first frame.
To segment the salient object of the first frame, the following energy equation is built:

E(X) = A(X) + O(X) + AC(X) + OC(X)    (10)

where E(X) is the energy equation in units of superpixels, X is the superpixel set, A(X) is the object appearance unary term, O(X) is the objectness unary term, AC(X) is the color binary term, and OC(X) is the objectness binary term.
A(X) is the unary term on object appearance. First, two RGB-color Gaussian mixture models (GMMs) are clustered for the first frame: one GMM is the foreground model FG, fitted on the region Rh whose fused dynamic-static saliency from the previous step exceeds 0.5, and the other GMM is the background model BG, fitted on the remaining region. Because a GMM can compute a probability density from data, i.e. perform density estimation, the role of the GMMs here is to infer how large the probability is that a given superpixel is foreground or background. If an over-segment matches the foreground well but is labeled background (the background label being 0 and the foreground label 1), its penalty will be very large:

A(X) = Σ_i U(x_i, l_{x_i}),  with U(x_i, 1) = −log p(x_i ∈ FG) and U(x_i, 0) = −log p(x_i ∈ BG)

where A(X) is the object appearance unary term, l_{x_i} is the label of superpixel x_i (label 0 is background, 1 is foreground), U is the potential function, and p(x_i ∈ FG) and p(x_i ∈ BG) are the probabilities that superpixel x_i belongs to the foreground FG and to the background BG, respectively. A sketch of fitting these GMMs follows.
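A sketch of fitting the two GMMs and turning them into unary costs, assuming scikit-learn and an array `rgb_means` (n x 3) of superpixel mean RGB colors; the component count is an assumption:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

fg_gmm = GaussianMixture(n_components=5).fit(rgb_means[Sal > 0.5])
bg_gmm = GaussianMixture(n_components=5).fit(rgb_means[Sal <= 0.5])

# score_samples returns log densities, so labeling superpixel x_i as
# foreground costs -log p(x_i | FG), and as background -log p(x_i | BG).
cost_fg = -fg_gmm.score_samples(rgb_means)    # penalty when labeled 1
cost_bg = -bg_gmm.score_samples(rgb_means)    # penalty when labeled 0
```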
O(X) is the unary term on objectness. The region outside the ROI finally obtained in the previous step is treated as background, and for the possible object inside the ROI a GMM of objectness is computed in the same way; the design of this objectness unary term parallels that of the appearance unary term:

O(X) = Σ_i U(x_i, l_{x_i}),  with U(x_i, 1) = −log p(x_i ∈ OBJ) and U(x_i, 0) = −log p(x_i ∈ OBG)

where O(X) is the objectness unary term, l_{x_i} is the label of superpixel x_i (label 0 is background, 1 is foreground), U is the potential function, and p(x_i ∈ OBJ) and p(x_i ∈ OBG) are the probabilities that superpixel x_i belongs to the possible object OBJ and to the background OBG outside the object, respectively.
The binary terms, in turn, concern the relations between over-segments: they are costs and penalties for discontinuity between neighboring over-segments. If two neighboring over-segments differ very little, the probability that they belong to the same object or the same background is high; if they differ greatly, the two over-segments likely lie on an edge between object and background and are more likely to be separated. Thus the larger the difference between two neighboring over-segments, the smaller the energy.
First comes the binary term AC(X), which penalizes appearance-color discontinuity; the larger the distance across which the discontinuity occurs, the weaker its influence. Its formula is:

AC(X) = Σ_{(i,j) neighbors} δ(l_{x_i} ≠ l_{x_j}) · K_ij,  with K_ij = γ · dist(x_i, x_j)⁻¹ · exp(−β · dcor(x_i, x_j)²)

where AC(X) is the color binary term; δ(l_{x_i} ≠ l_{x_j}) is the coefficient of K_ij, equal to 1 when the labels of the two superpixels differ and 0 otherwise; dist is the Euclidean distance between the centers of the two superpixels; dcor is the difference of the mean colors of the over-segments; and γ and β are coefficients.
Similarly, the binary term OC(X), which penalizes objectness discontinuity, resembles the color binary term; here the pixel-level objectness map computed in step 4 is needed first, and the pixel-level objectness values are mapped one by one onto the over-segments according to position. The formula of the binary term OC(X) is:

OC(X) = Σ_{(i,j) neighbors} δ(l_{x_i} ≠ l_{x_j}) · K_ij,  with K_ij = γ · dist(x_i, x_j)⁻¹ · exp(−β · dobj(x_i, x_j)²)

where OC(X) is the objectness binary term; δ(l_{x_i} ≠ l_{x_j}) is the coefficient of K_ij, equal to 1 when the labels differ and 0 otherwise; dist is the Euclidean distance between the centers of the two superpixels; dobj is the difference of the objectness values of the over-segments; and γ and β are coefficients.
Once this energy equation is established, the t-links (connections between nodes and terminal nodes) and n-links (connections between nodes) are all in place, the graph required by graph cut exists, and graph cut can be used to minimize the energy equation and obtain the segmentation. The iterative idea of GrabCut is used here: each iteration refines the parameters of the GMMs modeling target and background, so that the image segmentation improves. The salient object segmentation of the first frame is thus finally obtained. A sketch of the graph construction follows.
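A sketch of one graph-cut pass over the superpixel graph, assuming the PyMaxflow library; `edges` (adjacent superpixel pairs), the matrices `dcor`, `dobj` and `dist`, the scalars `gamma` and `beta`, and the objectness costs `obj_cost_fg`/`obj_cost_bg` are assumed precomputed as described above:

```python
import numpy as np
import maxflow

g = maxflow.Graph[float]()
nodes = g.add_nodes(n)

# t-links: the source side stands for foreground, so the capacity toward
# the source is the cost of ending up background, and vice versa.
for i in range(n):
    g.add_tedge(nodes[i], cost_bg[i] + obj_cost_bg[i],
                cost_fg[i] + obj_cost_fg[i])

# n-links: the pairwise costs AC + OC between adjacent superpixels.
for i, j in edges:
    k_ij = gamma * (np.exp(-beta * dcor[i, j] ** 2)
                    + np.exp(-beta * dobj[i, j] ** 2)) / dist[i, j]
    g.add_edge(nodes[i], nodes[j], k_ij, k_ij)

g.maxflow()
fg = np.array([g.get_segment(v) == 0 for v in nodes])   # True = foreground
```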
Step 7: map the static saliency, dynamic saliency and objectness of each frame onto the spatio-temporal over-segmentation of the video.

This step first performs the spatio-temporal over-segmentation of the video with a supervoxel method, obtaining the supervoxels; then the static saliency and dynamic saliency obtained in steps 1 and 2, together with the pixel-level objectness, are mapped one by one onto the supervoxel over-segmentation according to position, and for each supervoxel the means of the static saliency, dynamic saliency and objectness of all pixels it contains are computed as the static saliency value, dynamic saliency value and objectness value of that supervoxel, as sketched below.
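A sketch of this mapping step, assuming an integer supervoxel label volume `sv` (T x H x W) and pixel-level maps of the same shape:

```python
import numpy as np

def supervoxel_means(sv, pixel_map):
    # Mean of a pixel-level map inside every supervoxel.
    n_sv = sv.max() + 1
    sums = np.bincount(sv.ravel(), weights=pixel_map.ravel(), minlength=n_sv)
    counts = np.bincount(sv.ravel(), minlength=n_sv)
    return sums / np.maximum(counts, 1)

Ss_sv = supervoxel_means(sv, static_sal)     # static saliency per supervoxel
Sm_sv = supervoxel_means(sv, dynamic_sal)    # dynamic saliency per supervoxel
Obj_sv = supervoxel_means(sv, objectness)    # objectness per supervoxel
```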
Fig. 8 is a schematic diagram of the salient supervoxel over-segmentation result of a video.
Step 8: the salient object segmentation of each frame.
To segment the salient object of each frame after the first, an energy equation is again built; it differs slightly from the energy equation of the first frame:

EF(V) = AF(V) + ACF(V) + OCF(V) + PCF(V)    (21)

where EF(V) is the energy equation in units of supervoxels, V is the supervoxel set, AF(V) is the object appearance unary term, ACF(V) is the color binary term, OCF(V) is the objectness binary term, and PCF(V) is the persistence binary term.
AF(V) is still the unary term on object appearance, defined similarly to A(X) in formula (10). It is first assumed that the motion of the salient object between two frames is smooth and gentle. Using the optical flow obtained during the dynamic saliency computation, the displacement of each pixel in the salient object region segmented in the previous frame is computed from the direction and speed of the flow, giving its position in the next frame. To speed up the algorithm, the node unit of the graph is the spatio-temporal supervoxel of the video rather than the pixel: the set of spatio-temporal supervoxels containing the pixels propagated from the previous frame forms the possible foreground salient object, and the remaining region is background. Two RGB-color Gaussian mixture models (GMMs) are clustered on these two regions, establishing the foreground model FG and the background model BG:

AF(V) = Σ_i U(v_i, l_{v_i}),  with U(v_i, 1) = −log p(v_i ∈ FG) and U(v_i, 0) = −log p(v_i ∈ BG)

where AF(V) is the object appearance unary term, l_{v_i} is the label of the spatio-temporal supervoxel v_i (label 0 is background, 1 is foreground), U is the potential function, and p(v_i ∈ FG) and p(v_i ∈ BG) are the probabilities that v_i belongs to the foreground FG and to the background BG. A sketch of the flow-based propagation follows.
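A sketch of the flow-based propagation, assuming a boolean `prev_mask` of the previous frame's salient object and the dense flow `flow` between the two frames:

```python
import numpy as np

def propagate(prev_mask, flow):
    # Shift every foreground pixel by its flow vector; supervoxels that
    # receive these pixels seed the possible foreground of the next frame.
    H, W = prev_mask.shape
    ys, xs = np.nonzero(prev_mask)
    nx = np.clip(np.round(xs + flow[ys, xs, 0]).astype(int), 0, W - 1)
    ny = np.clip(np.round(ys + flow[ys, xs, 1]).astype(int), 0, H - 1)
    out = np.zeros_like(prev_mask)
    out[ny, nx] = True
    return out
```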
The setting of the binary term ACF(V) is almost identical to that of AC(X) in formula (10); the only difference is that the node of the graph is no longer a superpixel but a supervoxel, and dcor denotes the difference of the mean colors of all pixels of the spatio-temporal supervoxels:

ACF(V) = Σ_{(i,j) neighbors} δ(l_{v_i} ≠ l_{v_j}) · K_ij,  with K_ij = γ · dist(v_i, v_j)⁻¹ · exp(−β · dcor(v_i, v_j)²)

where ACF(V) is the appearance-color binary term; δ(l_{v_i} ≠ l_{v_j}) is the coefficient of K_ij, equal to 1 when the labels differ and 0 otherwise; dist is the Euclidean distance between the centers of the spatio-temporal supervoxels; dcor is the difference of the mean colors of the two spatio-temporal supervoxels; and γ and β are coefficients.
Similarly, the setting of the binary term OCF(V) is almost identical to that of OC(X) in formula (10); the pixel-level objectness map computed in step 4 is needed first, and the pixel-level objectness values are mapped one by one onto the over-segments according to position, dobj being the difference of the mean objectness values of neighboring supervoxels:

OCF(V) = Σ_{(i,j) neighbors} δ(l_{v_i} ≠ l_{v_j}) · K_ij,  with K_ij = γ · dist(v_i, v_j)⁻¹ · exp(−β · dobj(v_i, v_j)²)

where OCF(V) is the objectness binary term; δ(l_{v_i} ≠ l_{v_j}) is the coefficient of K_ij, equal to 1 when the labels differ and 0 otherwise; dist is the Euclidean distance between the centers of the spatio-temporal supervoxels; dobj is the difference of the objectness values of the supervoxels; and γ and β are coefficients.
Since the salient object in the video is assumed to move smoothly and gently between frames, that is, it has persistence, a binary term PCF(V) on persistence is designed, which concerns the inter-frame continuity of the temporal over-segmentation. If a supervoxel whose continuity with the previous frame is very high and whose appearance is very similar is labeled differently from the corresponding over-segment of the previous frame, it receives a large penalty; conversely, if two supervoxels of consecutive frames with very high continuity and very similar appearance receive the same label, the penalty is small. The continuity degree of two supervoxels is represented by the ratio pers: the number of pixels of the previous-frame supervoxel that the optical flow displaces into the next-frame supervoxel, divided by the total number of pixels of the former supervoxel. The formula is:

PCF(V) = Σ_{(i,j)} δ(l_{v_i} ≠ l_{v'_j}) · K_ij,  with K_ij = γ · pers(v_i, v'_j) · exp(−β · dcor(v_i, v'_j)²)    (30)

where PCF(V) is the persistence binary term; v denotes a supervoxel of the current frame and v' a supervoxel of the previous frame; δ(l_{v_i} ≠ l_{v'_j}) is the coefficient of K_ij, equal to 1 when the labels differ and 0 otherwise; dcor is the difference of the mean colors of the two supervoxels; γ and β are coefficients; and pers(v_i, v'_j) computes the continuity degree of the two supervoxels of consecutive frames, represented by the ratio of the number of pixels of the previous frame's over-segment v'_j that the optical flow displaces into the current frame's over-segment v_i to the total number of pixels of v'_j, as sketched below.
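A sketch of the pers ratio, assuming boolean masks for the previous-frame supervoxel v'_j and the current-frame supervoxel v_i, and the dense flow between the two frames:

```python
import numpy as np

def persistence(prev_sv_mask, curr_sv_mask, flow):
    # Fraction of v'_j's pixels that the flow carries into v_i, in [0, 1].
    ys, xs = np.nonzero(prev_sv_mask)
    if len(ys) == 0:
        return 0.0
    H, W = curr_sv_mask.shape
    nx = np.clip(np.round(xs + flow[ys, xs, 0]).astype(int), 0, W - 1)
    ny = np.clip(np.round(ys + flow[ys, xs, 1]).astype(int), 0, H - 1)
    return curr_sv_mask[ny, nx].sum() / float(len(ys))
```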
Once this energy equation is established, the t-links (connections between nodes and terminal nodes) and n-links (connections between nodes) are all in place, the graph required by graph cut exists, and graph cut can be used to minimize the energy equation and obtain the segmentation. The iterative idea of GrabCut is again used: each iteration refines the parameters of the GMMs modeling target and background, so that the segmentation improves. The salient object segmentation of each frame is thus obtained. Fig. 9 is a schematic diagram of the salient object cut result in video frames.
The specific embodiments described above further elaborate the purpose, technical solution and beneficial effects of the present invention. It should be understood that the foregoing is only a specific embodiment of the present invention and does not limit it; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A video salient object segmentation method based on supervoxel graph cut, the method comprising the following steps:
Step 1: segmenting the salient object in the first frame of a video sequence, this step further including:
Step 101, over-segmenting the frame to obtain superpixels; Step 102, computing the static saliency map from the contrast and distribution of color features; Step 103, computing the dynamic saliency map from the contrast and continuity of the optical flow magnitude; Step 104, fusing the static saliency map and the dynamic saliency map to obtain the dynamic-static saliency map; Step 105, computing the objectness of the first frame and the candidate ROI region of each potential object; Step 106, fusing the dynamic-static saliency map with the object ROIs and filtering out unnecessary ROI regions; Step 107, with the ROI region and the dynamic-static saliency as weak constraints, constructing an energy equation and segmenting with iterative graph cut to obtain an estimate of the salient object;
Step 2: segmenting the salient object in every frame of the video sequence other than the first, this step further including:
Step 201, propagating the estimated region of the previous frame to the next frame as a prior; Step 202, applying steps 101, 102, 103, 104 and 105 to this frame to compute the required mid-level feature values; Step 203, computing the spatio-temporal over-segmentation of the video, constructing an energy equation over appearance, motion, objectness and persistence, and minimizing this energy equation with graph cut to obtain the salient object segmentation.
2. The method of claim 1, characterized in that step 101 further includes: clustering pixels with similar color that are close neighbors based on the Lab color and position x, y information of each frame image, obtaining the over-segmentation of the single frame image, i.e. the superpixels, where the Lab values are the 3 dimensions of the Lab color space and x, y are the horizontal and vertical coordinates of the pixel.
3. The method of claim 1, characterized in that steps 102 and 103 further include: both the static saliency map and the dynamic saliency map first require computing a center-surround contrast saliency map and a distribution-compactness saliency map; for the static saliency map, the color-contrast saliency map and the color-distribution saliency map are computed first and the final static saliency map is their fusion; the dynamic saliency map is likewise obtained by fusing the contrast saliency map of the optical flow values with the motion-continuity saliency map of the optical flow values.
4. The method of claim 1, characterized in that step 104 further includes: analyzing the respective advantages and deficiencies of the dynamic and static saliency maps, and fusing the static saliency map and the dynamic saliency map under threshold control with a piecewise function to obtain the dynamic-static saliency map.
5. The method of claim 1, characterized in that step 105 further includes: detecting with an objectness detector whether this frame contains ROI regions of objects.
6. The method of claim 1, characterized in that step 106 further includes: using the degree to which an ROI region covers the dynamic-static salient region to filter out some objectness ROI candidates, screening the ROI regions that may contain the salient object.
7. The method of claim 1, characterized in that in step 107, an energy equation on the ROI region and the dynamic-static saliency map is established, and iterative graph cut optimization minimizes this energy equation, minimizing the segmentation cost.
8. The method of claim 1, characterized in that in step 201, the displacement of the salient object segmentation region obtained in the previous frame is estimated from the direction and magnitude of the optical flow and propagated to the next frame.
9. The method of claim 1, characterized in that in step 202, the pixel-level objectness map is computed based on saliency, color contrast and edge-detection information.
10. The method of claim 1, characterized in that in step 203, constructing the energy equation further includes: constructing the persistence binary term based on the prior estimate of the previous frame propagated in step 201, constructing the object appearance unary term based on the dynamic-static saliency map, constructing the binary term on color continuity based on appearance color, and constructing the binary term on the object based on objectness; finally, iterative graph cut optimization again minimizes this energy equation, minimizing the segmentation penalty and thus obtaining the binary segmentation.