CN104134217B - Video salient object segmentation method based on supervoxel graph cut - Google Patents

Info

Publication number: CN104134217B (granted 2017-02-15); application publication CN104134217A (2014-11-05)
Application number: CN201410366737.5A
Authority: CN (China)
Priority / filing date: 2014-07-29
Legal status: Active (granted)
Inventors: 吴怀宇, 潘春洪, 郑荟
Assignee (original and current): Institute of Automation, Chinese Academy of Sciences
Other languages: Chinese (zh)

Abstract

The invention discloses a method for segmenting the salient object in a video. The method includes the following steps: first, the static saliency of each frame in the video sequence is computed over superpixels to obtain a static saliency map; second, the optical flow between consecutive frames of the video sequence is computed over superpixels, and the dynamic saliency of each frame is computed to obtain a dynamic saliency map; third, the static and dynamic saliency maps are fused into a dynamic-static saliency map; fourth, an objectness map of each frame in the video sequence is computed; fifth, a spatio-temporal over-segmentation of the video sequence is computed, and the static saliency, dynamic saliency and objectness values are each mapped onto this over-segmentation; sixth, a segmentation energy function over saliency, objectness and continuity is established and, on the supervoxel-level spatio-temporal over-segmentation, optimized by iterative graph cut for each video frame, so that every frame is segmented into two classes and the salient foreground object is obtained.

Description

A video salient object segmentation method based on supervoxel graph cut
Technical field
The present invention relates to the technical field of computer vision, and in particular to a video salient object segmentation method based on supervoxel graph cut, a method built on dynamic-static saliency, objectness and persistence.
Background art
Salient object segmentation in video sequences, as a foundation of video processing, is widely applied in many fields of computer vision, such as video summarization, human action recognition, video retrieval, object recognition in video, and video activity analysis. The general difficulties in segmenting objects in a video sequence include camera motion, background motion and change, and the motion and deformation of the salient foreground object itself. Salient object segmentation in video can be divided into two broad classes: non-automatic segmentation and automatic segmentation.
Non-automatic segmentation: this kind of method requires user participation. The user must manually annotate the salient object in the first frame or in some key frames of the video as initialization data, after which region tracking or propagation yields the salient object segmentation of every frame. Its drawback is that manual annotation is tedious and time-consuming, so it is unsuitable for practical applications with larger data volumes.
Automatic segmentation: this kind of method has several implementations. 1) Methods based on background subtraction: they mainly model and update the background and difference each frame against the background image to obtain the pixel regions that differ strongly; they are less suitable when the background motion changes sharply. 2) Methods based on clustering, such as motion clustering, trajectory clustering and spatio-temporal clustering; these are unsuitable when the object's displacement is complicated, for example when only part of the object moves. 3) Methods based on object motion: these typically first divide the video frames into many clusters that may contain objects and then segment within the clusters that may contain objects; their complexity can be high.
Although segmentation has been a research problem for many years, the sharp increase in video data has raised the demand for automatic video salient object segmentation accordingly. Such segmentation inevitably faces background motion and change as well as the uncertainty of the foreground object's own compound motion and deformation. There is therefore a need for a low-cost, convenient, accurate and practical method of segmenting salient objects in video suitable for ordinary users.
Summary of the invention
In order to solve the problems of the prior art, an object of the present invention is to provide a video salient object segmentation method based on graph cut.
In order to reach this goal, the present invention constructs an energy equation from the appearance, motion, objectness and persistence information of the object, which reduces the interference of a moving background, and uses image over-segmentation into superpixels and video spatio-temporal over-segmentation into supervoxels to reduce the complexity of the algorithm.
The video salient object segmentation method based on supervoxel graph cut proposed by the present invention includes the following steps:
Step 1: segment the salient object in the first frame of the video sequence. This step further includes: Step 101, over-segment the frame to obtain superpixels; Step 102, compute the static saliency map from the contrast and distribution of color features; Step 103, compute the dynamic saliency map from the contrast and continuity of the optical flow magnitude; Step 104, fuse the static saliency map and the dynamic saliency map to obtain the dynamic-static saliency map; Step 105, compute the objectness of the first frame and the candidate ROI region of each potential object; Step 106, fuse the dynamic-static saliency map with the object ROIs and filter out unnecessary ROI regions; Step 107, with the ROI region and the dynamic-static saliency as weak constraints, construct an energy equation and segment with iterative graph cut to obtain an estimate of the salient object. Step 2: segment the salient object in every frame of the video sequence other than the first. This step further includes: Step 201, propagate the estimated region of the previous frame to the next frame as a prior; Step 202, apply steps 101, 102, 103, 104 and 105 to this frame to compute the required mid-level feature values; Step 203, compute the spatio-temporal over-segmentation of the video, construct an energy equation over appearance, motion, objectness and persistence, and minimize this energy equation with graph cut to obtain the salient object segmentation.
Beneficial effects of the present invention: based on image over-segmentation into superpixels, the present invention obtains the static and dynamic saliency maps from the contrast and continuity of color and optical flow respectively; the use of superpixels reduces the complexity of the algorithm, and considering feature distribution as well as feature contrast also reduces the interference of background objects whose color is close to the foreground. The objectness computation adds a further cue for segmentation and improves accuracy. Applying graph cut with the supervoxels of the video spatio-temporal over-segmentation as units further reduces the space-time complexity, and graph cut itself has linear complexity, so the computational cost of the algorithm is low and no expensive professional equipment is needed. Unlike traditional non-automatic video salient object segmentation methods, the present invention requires no manual annotation by professionals while enabling higher-quality salient object segmentation in video sequences.
Brief description of the drawings
Fig. 1 is a flow chart of the video salient object segmentation method based on supervoxel graph cut of the present invention;
Fig. 2A is the original image of a single video frame;
Fig. 2B is a schematic diagram of the over-segmentation of a single video frame, i.e. the superpixels;
Fig. 3 is a schematic diagram of the static saliency of a video frame;
Fig. 4 is a schematic diagram of the dynamic saliency of a video frame;
Fig. 5 is a schematic diagram of the objectness of a video frame;
Fig. 6 is a schematic diagram of the pixel-level objectness of a video frame;
Fig. 7 is a schematic diagram of the fused dynamic-static saliency of a video frame;
Fig. 8 is a schematic diagram of the salient supervoxel result of a video;
Fig. 9 is a schematic diagram of the video salient object segmentation result;
Fig. 10 is a schematic diagram of the fusion of the dynamic and static saliency maps; from left to right: the original video frame, the dynamic saliency map, the static saliency map, and the fused dynamic-static saliency map;
Fig. 11 is a segmentation result figure; the leftmost contour circles the segmented region, followed from left to right by the fused dynamic-static saliency map, the dynamic saliency map, the static saliency map, and the objectness map.
Specific embodiments
The present invention will be described in detail below. It should be noted that the described embodiments are intended only to facilitate the understanding of the present invention and do not limit it in any way.
The present invention uses graph cut to segment the salient object in a video sequence based on dynamic-static saliency, objectness and persistence. The method has two stages: the processing of the first frame, and the segmentation of every subsequent frame. The first stage preprocesses the first frame of the video to obtain an estimate of its salient object region; because of the first frame's temporal limitation and its importance for propagation, it is preprocessed in order to reach a more accurate result. The second stage processes the video frames one by one to obtain the salient object segmentation of each frame; this is the core procedure, in which the design of the energy equation revolves around object appearance, motion, objectness and persistence so as to reduce the interference of background change and of the object's own deformation and motion.
According to the method of the invention, the salient object region of the first frame is first estimated by preprocessing. Then the static saliency of each frame in the video sequence is computed over superpixels to obtain the static saliency map; the optical flow between every pair of consecutive frames is computed over superpixels and the dynamic saliency of each frame is computed to obtain the dynamic saliency map; the static and dynamic saliency maps are fused into the dynamic-static saliency map; the objectness map of each frame in the video sequence is computed; the supervoxels of the spatio-temporal over-segmentation of the video sequence are computed, and the pixel-level static saliency, dynamic saliency and objectness values are each mapped onto this over-segmentation; a segmentation energy function over saliency, objectness and continuity is set up and, at the level of the spatio-temporal over-segmentation, optimized with graph cut for each video frame to segment each frame into two classes, yielding the salient foreground object.
Fig. 1 shows the video salient object segmentation method based on supervoxel graph cut of the present invention.
The video salient object segmentation method according to the present invention comprises the following steps:
Step 1: first, each frame image of the video sequence is over-segmented with a k-means algorithm to obtain superpixels. The superpixels are illustrated in Fig. 2.

In this step, pixels with similar color that are close neighbors are clustered based on the 5-dimensional information of each frame image consisting of the Lab color and the position coordinates x, y, which yields the over-segmentation of the single frame image; the Lab values are the 3 dimensions of the Lab color space, and x, y are the horizontal and vertical coordinates of the pixel. The resulting over-segmentation is approximately homogeneous in color and space; Fig. 2 is a schematic diagram of the over-segmentation. Because over-segmented regions mostly retain the information needed for further image segmentation and generally do not destroy object boundaries in the image, the image can be processed directly at the superpixel level to reduce computational cost. A sketch of this step follows.
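For illustration, a minimal sketch of this over-segmentation step, assuming scikit-image's SLIC (which performs exactly this kind of k-means clustering in the 5-D Lab + x, y space); the file name and parameter values are placeholders, not values from the patent:

```python
import numpy as np
from skimage import io, color
from skimage.segmentation import slic

frame = io.imread("frame_0001.png")                 # hypothetical frame path
# SLIC clusters pixels on (L, a, b, x, y), the 5-D feature described above.
labels = slic(frame, n_segments=400, compactness=10, start_label=0)

# Per-superpixel mean Lab color c_j and mean position p_j, reused below.
lab = color.rgb2lab(frame)
n = labels.max() + 1
ys, xs = np.mgrid[0:frame.shape[0], 0:frame.shape[1]]
c = np.array([lab[labels == j].mean(axis=0) for j in range(n)])
p = np.array([(xs[labels == j].mean(), ys[labels == j].mean()) for j in range(n)])
```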
Step 2: compute the static saliency map and the dynamic saliency map of the video frame.

In this step, both the static and the dynamic saliency map first require computing a center-surround contrast saliency map and a distribution-compactness saliency map. For the static saliency map, the color-contrast saliency map and the color-distribution saliency map are computed first, and the final static saliency map is their fusion; the dynamic saliency map is likewise obtained by fusing the contrast saliency map of the optical flow values with the motion-continuity saliency map of the optical flow values.
The static color contrast is computed as:

Cs_j = Σ_{k=1..N} w(p_j, p_k) · ||c_j − c_k||

where N is the total number of over-segments (superpixels) of the video frame; Cs_j is the static color contrast of the j-th superpixel, with j ranging from 1 to N; c_j is the mean Lab color of the j-th superpixel and c_k that of the k-th superpixel, with k running from 1 to N; p_j is the mean position of all pixels in the j-th superpixel and p_k that of the k-th superpixel; w(p_j, p_k) is a coefficient on the positional relationship, which can either be set to the constant 1 or made to vary with the positional relationship (distance) between superpixels; here it is set to the Gaussian weight w(p_j, p_k) = exp(−||p_j − p_k||² / (2σ_p²)). ||c_j − c_k|| is the difference between c_j and c_k: the larger the difference, the larger the static color contrast Cs_j, and a larger contrast means the superpixel is more unique in color.
The contrast equation of the dynamic motion magnitude is:

Cm_j = Σ_{k=1..N} w(p_j, p_k) · (1 − exp(−D(Hf_j, Hf_k)))

where Cm_j is the dynamic motion contrast of the j-th superpixel; p_j is likewise the mean position of all pixels in the j-th superpixel, with j ranging from 1 to N and N the total number of over-segments of the video frame, and p_k is that of the k-th superpixel, with k running from 1 to N; w(p_j, p_k) is a coefficient that can either be set to the constant 1 or made to vary with the positional relationship (distance) between superpixels, and is here again set to the Gaussian weight; Hf_j is the optical-flow magnitude histogram of the j-th superpixel and Hf_k that of the k-th superpixel. The flow-magnitude histogram involved in this algorithm has depth 2: the first layer is the flow-magnitude histogram in the abscissa (x) direction and the second layer is the flow-magnitude histogram in the ordinate (y) direction, so this histogram design captures not only the distribution of motion magnitude but also, to some extent, the motion direction. D(Hf_j, Hf_k) is the chi-square distance between the histograms Hf_j and Hf_k; since the chi-square distance ranges from 0 to infinity, a negative exponential function is used here to map it into [0, 1) for the computation. Thus the larger the chi-square distance between Hf_j and Hf_k, the larger the dynamic motion contrast Cm_j, and a larger contrast means the superpixel is more unique in motion intensity. A sketch of this computation follows.
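As an illustrative sketch of this motion-contrast term, assuming a dense flow field `flow` of shape H x W x 2 (e.g. from cv2.calcOpticalFlowFarneback) plus the `labels`, `p`, `n` and `frame` of the superpixel sketch above; the bin count, value range and σ_p are assumptions:

```python
import numpy as np

def flow_histogram(flow, mask, bins=8, rng=(-20.0, 20.0)):
    # Depth-2 histogram: one layer per flow component (x, then y), so the
    # motion direction is partly kept, as noted in the description.
    hx, _ = np.histogram(flow[mask, 0], bins=bins, range=rng, density=True)
    hy, _ = np.histogram(flow[mask, 1], bins=bins, range=rng, density=True)
    return np.concatenate([hx, hy])

def chi2(h1, h2, eps=1e-10):
    # Chi-square distance between two histograms, in [0, +inf).
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

Hf = np.array([flow_histogram(flow, labels == j) for j in range(n)])
sigma_p = 0.25 * max(frame.shape[:2])
W = np.exp(-((p[:, None] - p[None, :]) ** 2).sum(-1) / (2 * sigma_p ** 2))
D = np.array([[chi2(Hf[j], Hf[k]) for k in range(n)] for j in range(n)])
Cm = (W * (1.0 - np.exp(-D))).sum(axis=1)   # chi-square mapped into [0, 1)
```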
The static distribution compactness is computed as:

Ds_j = Σ_{k=1..N} w(c_j, c_k) · ||p_k − μ_j||²,  with μ_j = Σ_{k=1..N} w(c_j, c_k) · p_k

where Ds_j is the static distribution compactness of the j-th superpixel: the lower the spatial variation of the j-th superpixel's color, the lower Ds_j, i.e. the more spatially compact the superpixel. w(c_j, c_k) is a coefficient on the color similarity between superpixels, which can either be set to the constant 1 or made to vary with the color similarity of the superpixels; here it is set to the Gaussian weight w(c_j, c_k) = exp(−||c_j − c_k||² / (2σ_c²)), normalized over k. p_k is the mean position of all pixels of the k-th superpixel; N is the total number of over-segments of the video frame; and μ_j denotes the mean position of the superpixels whose color is similar to that of the j-th superpixel.
The dynamic motion continuity is computed as:

Dm_j = Σ_{k=1..N} w(Hf_j, Hf_k) · ||p_k − μm_j||²,  with μm_j = Σ_{k=1..N} w(Hf_j, Hf_k) · p_k

where Dm_j is the dynamic motion continuity of the j-th superpixel; w(Hf_j, Hf_k) is a coefficient on the similarity of the motion-magnitude histograms between superpixels, set here, analogously to the static case, to a weight that decreases with the chi-square distance, e.g. w(Hf_j, Hf_k) = exp(−D(Hf_j, Hf_k)); Hf_j is the flow-magnitude histogram of the j-th superpixel and Hf_k that of the k-th superpixel; D(Hf_j, Hf_k) is the chi-square distance between Hf_j and Hf_k, which grows the more dissimilar the histograms are; and μm_j denotes the mean position of the over-segments whose flow-magnitude histograms are similar to Hf_j, p_k being the mean of all pixel positions of the k-th superpixel.
The static saliency map Ss fuses the static color contrast Cs with the static distribution compactness Ds:

Ss_j = Cs_j · exp(−λ · Ds_j)

where Ss_j is the static saliency of the j-th superpixel, Cs_j its static color contrast, Ds_j its static distribution compactness, and λ a positive coefficient: the larger Cs_j and the smaller Ds_j, the larger Ss_j. A sketch of the static terms and their fusion follows.
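A corresponding sketch of Cs, Ds and the fused Ss, reusing `c`, `p`, `n` and `frame` from the superpixel sketch; the σ values and the coefficient `lam` are assumptions, not values from the patent:

```python
import numpy as np

sigma_p = 0.25 * max(frame.shape[:2])
sigma_c, lam = 20.0, 6.0

dp2 = ((p[:, None] - p[None, :]) ** 2).sum(-1)          # squared center distances
dc = np.linalg.norm(c[:, None] - c[None, :], axis=-1)   # Lab color differences

Wp = np.exp(-dp2 / (2 * sigma_p ** 2))                  # Gaussian position weight
Cs = (Wp * dc).sum(axis=1)                              # static color contrast

Wc = np.exp(-dc ** 2 / (2 * sigma_c ** 2))              # Gaussian color weight
Wc /= Wc.sum(axis=1, keepdims=True)
mu = Wc @ p                                             # mean position of similar colors
Ds = (Wc * ((p[None, :, :] - mu[:, None, :]) ** 2).sum(-1)).sum(axis=1)

Ss = Cs * np.exp(-lam * Ds / Ds.max())                  # fused static saliency
```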
The dynamic saliency map Sm fuses the dynamic motion contrast Cm with the dynamic motion continuity Dm:

Sm_j = Cm_j · exp(−λ · Dm_j)

where Sm_j is the dynamic saliency of the j-th superpixel, Cm_j its dynamic motion contrast and Dm_j its dynamic motion continuity: the larger Cm_j and the smaller Dm_j, the larger Sm_j.
The static saliency of a video frame is illustrated in Fig. 3, and the dynamic saliency of a video frame in Fig. 4.
Step 3: fuse the static saliency map and the dynamic saliency map.

The strategy taken in this step is to let the static saliency map Ss and the dynamic saliency map Sm complement each other. Since human attention is more easily attracted by motion, regions with very high motion saliency are retained, while regions without very high motion saliency may be noise brought in by the optical-flow algorithm or by background motion and need to be weighed together with the static saliency map. In the fusion, Sal_j denotes the dynamic-static saliency value of the j-th superpixel, obtained by fusing its static saliency value Ss_j with its dynamic saliency value Sm_j under a deliberately high threshold Ts, set here to 0.8. Ts is set this high for three reasons: first, the motion-priority principle, so that regions with high motion saliency are kept; second, so that regions with ambiguous motion saliency values are corrected by the static saliency, reducing the influence of optical-flow noise and camera movement; and last, so that when the motion saliency is very small its influence is increased, suppressing the interference of salient background objects with the salient foreground object. A sketch of one plausible fusion rule follows.
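The exact piecewise formula is not reproduced on this page, so the following is only one plausible reading of the three rules above; the threshold Ts = 0.8 comes from the description, while the low-motion branch is an assumption:

```python
import numpy as np

def fuse(Ss, Sm, Ts=0.8):
    Ss = Ss / Ss.max()                      # normalize both maps to [0, 1]
    Sm = Sm / Sm.max()
    # Rule 1: motion priority - keep regions of very high motion saliency.
    # Rules 2-3: elsewhere let the static map correct ambiguous motion values,
    # with Sm as a factor so static-only background objects stay suppressed.
    return np.where(Sm >= Ts, Sm, 0.5 * (Ss + Sm) * Sm)

Sal = fuse(Ss, Sm)                          # dynamic-static saliency per superpixel
```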
Fig. 7 is a schematic diagram of the dynamic-static saliency obtained by fusing the dynamic and static saliency maps of a video frame.
Step 4: compute the objectness of the video frame.

In this step, the objectness computation for the first frame is slightly different: besides the pixel-level objectness map computed for every frame, candidate ROI regions of object-like areas of the video sequence are also needed. The input here includes not only the color contrast and superpixels obtained before, but also the boundary information obtained with the Canny edge detector. All three inputs are closely tied to objects: the color contrast expresses the contrast between the foreground object's color and the background; each over-segment of the superpixel decomposition is a color-coherent region that preserves boundary information, so a single over-segment very likely belongs to the same object; and boundaries are likewise an important attribute of objects. An objectness detector based on a Bayesian model then yields the final candidate ROI regions Ro that potentially contain an object, together with their objectness values O, while the intermediate result yields the output pixel-level probability objectness map.
Fig. 5 is the ROI schematic diagram of the objectness of a video frame, and Fig. 6 is the schematic diagram of the pixel-level objectness of a video frame.
Step 5: screen the objectness candidate ROI regions.

In this step, the dynamic-static saliency map is first processed with a threshold of 0.5: regions with saliency greater than or equal to 0.5 are retained and the others discarded, giving the saliency map Rh of saliency above 0.5. For convenience of later operations, this thresholded saliency map must be binarized: the present invention uses a flood-fill algorithm to fill the connected regions of the image with 1 and then sets the remaining regions to 0. After the binarized thresholded saliency map is obtained, a morphological opening, i.e. erosion followed by dilation, is applied to it to remove bright areas of small size and reduce the interference of noise. A sketch of this clean-up follows.
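A sketch with OpenCV, assuming `sal_map` is the per-pixel dynamic-static saliency in [0, 1]; the kernel size is an assumption:

```python
import cv2
import numpy as np

binary = (sal_map >= 0.5).astype(np.uint8)           # threshold at 0.5

# Morphological opening (erosion then dilation) removes small bright
# noise regions, as in the description.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)

# Label the connected regions R_S (the role the flood fill plays above);
# each region is later fitted with a slightly widened bounding box.
n_cc, cc_labels = cv2.connectedComponents(opened)
```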
Thereafter, for each connected region R_S, a covering ROI region is fitted. By horizontal and vertical scanning, the leftmost, rightmost, highest and lowest points of the connected region R_S are found: ((x_l, y_l), (x_r, y_r), (x_u, y_u), (x_d, y_d)), where x_l, y_l are the coordinates of the leftmost point, x_r, y_r those of the rightmost point, x_u, y_u those of the topmost point, and x_d, y_d those of the bottommost point. The four vertex coordinates (in counterclockwise order) of the fitted covering ROI region are ((x_l − 0.05(x_r − x_l), y_u), (x_l − 0.05(x_r − x_l), y_d), (x_r + 0.05(x_r − x_l), y_d), (x_r + 0.05(x_r − x_l), y_u)), i.e. the region is widened by 5% of its width on the left and right, and the description likewise extends it up and down; here x_l − 0.05(x_r − x_l), y_u are the coordinates of the upper-left vertex of the fitted ROI rectangle, x_l − 0.05(x_r − x_l), y_d those of the lower-left vertex, x_r + 0.05(x_r − x_l), y_d those of the lower-right vertex, and x_r + 0.05(x_r − x_l), y_u those of the upper-right vertex.
What is done next is a preliminary screening of the candidate ROI regions Ro that may contain an object. First, for each candidate ROI region Ro_i that may contain an object, the area of the region where it intersects Rs is computed and compared with Ro_i's own area; this ratio should exceed the threshold To. Besides considering the intersection of the candidate region with the salient region, a further screening criterion is that the candidate should enclose the salient region as far as possible, so the ratio of the area of the intersection of Ro_i with Rs to the area of Rs is also required to exceed the threshold Ts, as expressed below:

R = { Ro_i | area(Ro_i ∩ Rs) ÷ area(Ro_i) > To ∧ area(Ro_i ∩ Rs) ÷ area(Rs) > Ts }    (8)

where R is the set of candidate regions Ro_i retained by the above formula; Ro_i denotes the i-th candidate ROI region; area(Ro_i ∩ Rs) denotes the size of the region where the candidate ROI region Ro_i intersects the salient region Rs; area(Ro_i) denotes the size of Ro_i; area(Rs) denotes the size of the salient region Rs; and To and Ts are both thresholds. This screening mainly excludes clearly non-compliant candidate regions so as to reduce the computation of the finer screening in the next step.
Finally, for each candidate ROI region in the screened set R, three saliency-value distribution histograms are computed: the histogram Hin of the set In of superpixels completely inside the region; the histogram Hsu of the set Su of surrounding superpixels that lie outside the region or partly outside it; and the histogram Hbu of the set Bu of superpixels on the outermost ring of the set In, adjacent to the set Su. The contrast between Hin and Hsu and the contrast between Hsu and Hbu are then computed: the greater the difference between the saliency distributions of the superpixels inside the ROI and of those around it, the more likely an object lies inside the region; and the greater the difference between the saliency distributions of the inner ring and of the surrounding superpixels, the better the region agrees with the object boundary. Finally, the algorithm selects the ROI region with the maximum difference value Diff as the final candidate ROI region, where Diff is computed as:

Diff_i = (1 − e^(−D(Hsu_i, Hin_i))) + α · (1 − e^(−D(Hsu_i, Hbu_i)))²    (9)

where Diff_i denotes the difference value of the i-th candidate ROI region; Hin_i denotes the saliency-value distribution histogram of the superpixel set In inside the i-th candidate ROI region; Hsu_i denotes that of the superpixel set Su surrounding In (these superpixels are outside the i-th candidate ROI region or partly outside it); and Hbu_i denotes that of the superpixel set Bu directly adjacent to the set Su, i.e. the "outermost ring" of the set In. Since the chi-square distance ranges from 0 to infinity, both 1 − e^(−D(Hsu, Hin)) and 1 − e^(−D(Hsu, Hbu)) range over 0 to 1. Because object shapes are irregular rather than rectangular, the importance of boundary agreement is relatively low, so the second contrast term is squared and multiplied by a factor α smaller than 1. The ROI candidate with the maximum Diff value in R is the final estimated ROI region. A sketch of this screening follows.
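A sketch of the two-stage screening of eqs. (8) and (9), reusing `chi2` from the motion sketch above; the threshold values and α are assumptions:

```python
import numpy as np

def screen(rois, Rs, To=0.3, Ts=0.5):
    # rois: list of boolean masks Ro_i; Rs: boolean salient-region mask. Eq. (8).
    keep = []
    for Ro in rois:
        inter = float((Ro & Rs).sum())
        if inter / Ro.sum() > To and inter / Rs.sum() > Ts:
            keep.append(Ro)
    return keep

def diff_score(Hin, Hsu, Hbu, alpha=0.5):
    # Diff = (1 - e^{-D(Hsu,Hin)}) + alpha * (1 - e^{-D(Hsu,Hbu)})^2, eq. (9).
    return ((1 - np.exp(-chi2(Hsu, Hin)))
            + alpha * (1 - np.exp(-chi2(Hsu, Hbu))) ** 2)
```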
Step 6: the salient object segmentation of the first frame.
To segment the salient object of the first frame, the following energy equation is built:

E(X) = A(X) + O(X) + AC(X) + OC(X)    (10)

where E(X) is the energy equation in units of superpixels, X is the superpixel set, A(X) is the object appearance unary term, O(X) is the objectness unary term, AC(X) is the color binary term, and OC(X) is the objectness binary term.
A(X) is the unary term on object appearance. First, two RGB-color Gaussian mixture models (GMMs) are clustered for the first frame: one GMM is the foreground model FG, fitted on the region Rh whose fused dynamic-static saliency from the previous step exceeds 0.5, and the other GMM is the background model BG, fitted on the remaining region. Because a GMM can compute a probability density from data, i.e. perform density estimation, the role of the GMMs here is to infer how large the probability is that a given superpixel is foreground or background. If an over-segment matches the foreground well but is labeled background (the background label being 0 and the foreground label 1), its penalty will be very large:

A(X) = Σ_i U(x_i, l_{x_i}),  with U(x_i, 1) = −log p(x_i ∈ FG) and U(x_i, 0) = −log p(x_i ∈ BG)

where A(X) is the object appearance unary term, l_{x_i} is the label of superpixel x_i (label 0 is background, 1 is foreground), U is the potential function, and p(x_i ∈ FG) and p(x_i ∈ BG) are the probabilities that superpixel x_i belongs to the foreground FG and to the background BG, respectively. A sketch of fitting these GMMs follows.
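A sketch of fitting the two GMMs and turning them into unary costs, assuming scikit-learn and an array `rgb_means` (n x 3) of superpixel mean RGB colors; the component count is an assumption:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

fg_gmm = GaussianMixture(n_components=5).fit(rgb_means[Sal > 0.5])
bg_gmm = GaussianMixture(n_components=5).fit(rgb_means[Sal <= 0.5])

# score_samples returns log densities, so labeling superpixel x_i as
# foreground costs -log p(x_i | FG), and as background -log p(x_i | BG).
cost_fg = -fg_gmm.score_samples(rgb_means)    # penalty when labeled 1
cost_bg = -bg_gmm.score_samples(rgb_means)    # penalty when labeled 0
```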
O(X) is the unary term on objectness. The region outside the ROI finally obtained in the previous step is treated as background, and for the possible object inside the ROI a GMM of objectness is computed in the same way; the design of this objectness unary term parallels that of the appearance unary term:

O(X) = Σ_i U(x_i, l_{x_i}),  with U(x_i, 1) = −log p(x_i ∈ OBJ) and U(x_i, 0) = −log p(x_i ∈ OBG)

where O(X) is the objectness unary term, l_{x_i} is the label of superpixel x_i (label 0 is background, 1 is foreground), U is the potential function, and p(x_i ∈ OBJ) and p(x_i ∈ OBG) are the probabilities that superpixel x_i belongs to the possible object OBJ and to the background OBG outside the object, respectively.
The binary terms, in turn, concern the relations between over-segments: they are costs and penalties for discontinuity between neighboring over-segments. If two neighboring over-segments differ very little, the probability that they belong to the same object or the same background is high; if they differ greatly, the two over-segments likely lie on an edge between object and background and are more likely to be separated. Thus the larger the difference between two neighboring over-segments, the smaller the energy.
First comes the binary term AC(X), which penalizes appearance-color discontinuity; the larger the distance across which the discontinuity occurs, the weaker its influence. Its formula is:

AC(X) = Σ_{(i,j) neighbors} δ(l_{x_i} ≠ l_{x_j}) · K_ij,  with K_ij = γ · dist(x_i, x_j)⁻¹ · exp(−β · dcor(x_i, x_j)²)

where AC(X) is the color binary term; δ(l_{x_i} ≠ l_{x_j}) is the coefficient of K_ij, equal to 1 when the labels of the two superpixels differ and 0 otherwise; dist is the Euclidean distance between the centers of the two superpixels; dcor is the difference of the mean colors of the over-segments; and γ and β are coefficients.
Similarly, the binary term OC(X), which penalizes objectness discontinuity, resembles the color binary term; here the pixel-level objectness map computed in step 4 is needed first, and the pixel-level objectness values are mapped one by one onto the over-segments according to position. The formula of the binary term OC(X) is:

OC(X) = Σ_{(i,j) neighbors} δ(l_{x_i} ≠ l_{x_j}) · K_ij,  with K_ij = γ · dist(x_i, x_j)⁻¹ · exp(−β · dobj(x_i, x_j)²)

where OC(X) is the objectness binary term; δ(l_{x_i} ≠ l_{x_j}) is the coefficient of K_ij, equal to 1 when the labels differ and 0 otherwise; dist is the Euclidean distance between the centers of the two superpixels; dobj is the difference of the objectness values of the over-segments; and γ and β are coefficients.
Once this energy equation is established, the t-links (connections between nodes and terminal nodes) and n-links (connections between nodes) are all in place, the graph required by graph cut exists, and graph cut can be used to minimize the energy equation and obtain the segmentation. The iterative idea of GrabCut is used here: each iteration refines the parameters of the GMMs modeling target and background, so that the image segmentation improves. The salient object segmentation of the first frame is thus finally obtained. A sketch of the graph construction follows.
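A sketch of one graph-cut pass over the superpixel graph, assuming the PyMaxflow library; `edges` (adjacent superpixel pairs), the matrices `dcor`, `dobj` and `dist`, the scalars `gamma` and `beta`, and the objectness costs `obj_cost_fg`/`obj_cost_bg` are assumed precomputed as described above:

```python
import numpy as np
import maxflow

g = maxflow.Graph[float]()
nodes = g.add_nodes(n)

# t-links: the source side stands for foreground, so the capacity toward
# the source is the cost of ending up background, and vice versa.
for i in range(n):
    g.add_tedge(nodes[i], cost_bg[i] + obj_cost_bg[i],
                cost_fg[i] + obj_cost_fg[i])

# n-links: the pairwise costs AC + OC between adjacent superpixels.
for i, j in edges:
    k_ij = gamma * (np.exp(-beta * dcor[i, j] ** 2)
                    + np.exp(-beta * dobj[i, j] ** 2)) / dist[i, j]
    g.add_edge(nodes[i], nodes[j], k_ij, k_ij)

g.maxflow()
fg = np.array([g.get_segment(v) == 0 for v in nodes])   # True = foreground
```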
Step 7: map the static saliency, dynamic saliency and objectness of each frame onto the spatio-temporal over-segmentation of the video.

This step first performs the spatio-temporal over-segmentation of the video with a supervoxel method, obtaining the supervoxels; then the static saliency and dynamic saliency obtained in steps 1 and 2, together with the pixel-level objectness, are mapped one by one onto the supervoxel over-segmentation according to position, and for each supervoxel the means of the static saliency, dynamic saliency and objectness of all pixels it contains are computed as the static saliency value, dynamic saliency value and objectness value of that supervoxel, as sketched below.
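A sketch of this mapping step, assuming an integer supervoxel label volume `sv` (T x H x W) and pixel-level maps of the same shape:

```python
import numpy as np

def supervoxel_means(sv, pixel_map):
    # Mean of a pixel-level map inside every supervoxel.
    n_sv = sv.max() + 1
    sums = np.bincount(sv.ravel(), weights=pixel_map.ravel(), minlength=n_sv)
    counts = np.bincount(sv.ravel(), minlength=n_sv)
    return sums / np.maximum(counts, 1)

Ss_sv = supervoxel_means(sv, static_sal)     # static saliency per supervoxel
Sm_sv = supervoxel_means(sv, dynamic_sal)    # dynamic saliency per supervoxel
Obj_sv = supervoxel_means(sv, objectness)    # objectness per supervoxel
```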
Fig. 8 is a schematic diagram of the salient supervoxel over-segmentation result of a video.
Step 8: the salient object segmentation of each frame.
To segment the salient object of each frame after the first, an energy equation is again built; it differs slightly from the energy equation of the first frame:

EF(V) = AF(V) + ACF(V) + OCF(V) + PCF(V)    (21)

where EF(V) is the energy equation in units of supervoxels, V is the supervoxel set, AF(V) is the object appearance unary term, ACF(V) is the color binary term, OCF(V) is the objectness binary term, and PCF(V) is the persistence binary term.
AF(V) is still the unary term on object appearance, defined similarly to A(X) in formula (10). It is first assumed that the motion of the salient object between two frames is smooth and gentle. Using the optical flow obtained during the dynamic saliency computation, the displacement of each pixel in the salient object region segmented in the previous frame is computed from the direction and speed of the flow, giving its position in the next frame. To speed up the algorithm, the node unit of the graph is the spatio-temporal supervoxel of the video rather than the pixel: the set of spatio-temporal supervoxels containing the pixels propagated from the previous frame forms the possible foreground salient object, and the remaining region is background. Two RGB-color Gaussian mixture models (GMMs) are clustered on these two regions, establishing the foreground model FG and the background model BG:

AF(V) = Σ_i U(v_i, l_{v_i}),  with U(v_i, 1) = −log p(v_i ∈ FG) and U(v_i, 0) = −log p(v_i ∈ BG)

where AF(V) is the object appearance unary term, l_{v_i} is the label of the spatio-temporal supervoxel v_i (label 0 is background, 1 is foreground), U is the potential function, and p(v_i ∈ FG) and p(v_i ∈ BG) are the probabilities that v_i belongs to the foreground FG and to the background BG. A sketch of the flow-based propagation follows.
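A sketch of the flow-based propagation, assuming a boolean `prev_mask` of the previous frame's salient object and the dense flow `flow` between the two frames:

```python
import numpy as np

def propagate(prev_mask, flow):
    # Shift every foreground pixel by its flow vector; supervoxels that
    # receive these pixels seed the possible foreground of the next frame.
    H, W = prev_mask.shape
    ys, xs = np.nonzero(prev_mask)
    nx = np.clip(np.round(xs + flow[ys, xs, 0]).astype(int), 0, W - 1)
    ny = np.clip(np.round(ys + flow[ys, xs, 1]).astype(int), 0, H - 1)
    out = np.zeros_like(prev_mask)
    out[ny, nx] = True
    return out
```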
The setting of the binary term ACF(V) is almost identical to that of AC(X) in formula (10); the only difference is that the node of the graph is no longer a superpixel but a supervoxel, and dcor denotes the difference of the mean colors of all pixels of the spatio-temporal supervoxels:

ACF(V) = Σ_{(i,j) neighbors} δ(l_{v_i} ≠ l_{v_j}) · K_ij,  with K_ij = γ · dist(v_i, v_j)⁻¹ · exp(−β · dcor(v_i, v_j)²)

where ACF(V) is the appearance-color binary term; δ(l_{v_i} ≠ l_{v_j}) is the coefficient of K_ij, equal to 1 when the labels differ and 0 otherwise; dist is the Euclidean distance between the centers of the spatio-temporal supervoxels; dcor is the difference of the mean colors of the two spatio-temporal supervoxels; and γ and β are coefficients.
Similarly, the setting of the binary term OCF(V) is almost identical to that of OC(X) in formula (10); the pixel-level objectness map computed in step 4 is needed first, and the pixel-level objectness values are mapped one by one onto the over-segments according to position, dobj being the difference of the mean objectness values of neighboring supervoxels:

OCF(V) = Σ_{(i,j) neighbors} δ(l_{v_i} ≠ l_{v_j}) · K_ij,  with K_ij = γ · dist(v_i, v_j)⁻¹ · exp(−β · dobj(v_i, v_j)²)

where OCF(V) is the objectness binary term; δ(l_{v_i} ≠ l_{v_j}) is the coefficient of K_ij, equal to 1 when the labels differ and 0 otherwise; dist is the Euclidean distance between the centers of the spatio-temporal supervoxels; dobj is the difference of the objectness values of the supervoxels; and γ and β are coefficients.
Since the salient object in the video is assumed to move smoothly and gently between frames, that is, it has persistence, a binary term PCF(V) on persistence is designed, which concerns the inter-frame continuity of the temporal over-segmentation. If a supervoxel whose continuity with the previous frame is very high and whose appearance is very similar is labeled differently from the corresponding over-segment of the previous frame, it receives a large penalty; conversely, if two supervoxels of consecutive frames with very high continuity and very similar appearance receive the same label, the penalty is small. The continuity degree of two supervoxels is represented by the ratio pers: the number of pixels of the previous-frame supervoxel that the optical flow displaces into the next-frame supervoxel, divided by the total number of pixels of the former supervoxel. The formula is:

PCF(V) = Σ_{(i,j)} δ(l_{v_i} ≠ l_{v'_j}) · K_ij,  with K_ij = γ · pers(v_i, v'_j) · exp(−β · dcor(v_i, v'_j)²)    (30)

where PCF(V) is the persistence binary term; v denotes a supervoxel of the current frame and v' a supervoxel of the previous frame; δ(l_{v_i} ≠ l_{v'_j}) is the coefficient of K_ij, equal to 1 when the labels differ and 0 otherwise; dcor is the difference of the mean colors of the two supervoxels; γ and β are coefficients; and pers(v_i, v'_j) computes the continuity degree of the two supervoxels of consecutive frames, represented by the ratio of the number of pixels of the previous frame's over-segment v'_j that the optical flow displaces into the current frame's over-segment v_i to the total number of pixels of v'_j, as sketched below.
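A sketch of the pers ratio, assuming boolean masks for the previous-frame supervoxel v'_j and the current-frame supervoxel v_i, and the dense flow between the two frames:

```python
import numpy as np

def persistence(prev_sv_mask, curr_sv_mask, flow):
    # Fraction of v'_j's pixels that the flow carries into v_i, in [0, 1].
    ys, xs = np.nonzero(prev_sv_mask)
    if len(ys) == 0:
        return 0.0
    H, W = curr_sv_mask.shape
    nx = np.clip(np.round(xs + flow[ys, xs, 0]).astype(int), 0, W - 1)
    ny = np.clip(np.round(ys + flow[ys, xs, 1]).astype(int), 0, H - 1)
    return curr_sv_mask[ny, nx].sum() / float(len(ys))
```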
Once this energy equation is established, the t-links (connections between nodes and terminal nodes) and n-links (connections between nodes) are all in place, the graph required by graph cut exists, and graph cut can be used to minimize the energy equation and obtain the segmentation. The iterative idea of GrabCut is again used: each iteration refines the parameters of the GMMs modeling target and background, so that the segmentation improves. The salient object segmentation of each frame is thus obtained. Fig. 9 is a schematic diagram of the salient object cut result in video frames.
The specific embodiments described above further elaborate the purpose, technical solution and beneficial effects of the present invention. It should be understood that the foregoing is only a specific embodiment of the present invention and does not limit it; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A video salient object segmentation method based on supervoxel graph cut, the method comprising the following steps:
Step 1: segmenting the salient object in the first frame of a video sequence, this step further including:
Step 101, over-segmenting the frame to obtain superpixels; Step 102, computing the static saliency map from the contrast and distribution of color features; Step 103, computing the dynamic saliency map from the contrast and continuity of the optical flow magnitude; Step 104, fusing the static saliency map and the dynamic saliency map to obtain the dynamic-static saliency map; Step 105, computing the objectness of the first frame and the candidate ROI region of each potential object; Step 106, fusing the dynamic-static saliency map with the object ROIs and filtering out unnecessary ROI regions; Step 107, with the ROI region and the dynamic-static saliency as weak constraints, constructing an energy equation and segmenting with iterative graph cut to obtain an estimate of the salient object;
Step 2: segmenting the salient object in every frame of the video sequence other than the first, this step further including:
Step 201, propagating the estimated region of the previous frame to the next frame as a prior; Step 202, applying steps 101, 102, 103, 104 and 105 to this frame to compute the required mid-level feature values; Step 203, computing the spatio-temporal over-segmentation of the video, constructing an energy equation over appearance, motion, objectness and persistence, and minimizing this energy equation with graph cut to obtain the salient object segmentation.
2. The method of claim 1, characterized in that step 101 further includes: clustering pixels with similar color that are close neighbors based on the Lab color and position x, y information of each frame image, obtaining the over-segmentation of the single frame image, i.e. the superpixels, where the Lab values are the 3 dimensions of the Lab color space and x, y are the horizontal and vertical coordinates of the pixel.
3. The method of claim 1, characterized in that steps 102 and 103 further include: both the static saliency map and the dynamic saliency map first require computing a center-surround contrast saliency map and a distribution-compactness saliency map; for the static saliency map, the color-contrast saliency map and the color-distribution saliency map are computed first and the final static saliency map is their fusion; the dynamic saliency map is likewise obtained by fusing the contrast saliency map of the optical flow values with the motion-continuity saliency map of the optical flow values.
4. The method of claim 1, characterized in that step 104 further includes: analyzing the respective advantages and deficiencies of the dynamic and static saliency maps, and fusing the static saliency map and the dynamic saliency map under threshold control with a piecewise function to obtain the dynamic-static saliency map.
5. The method of claim 1, characterized in that step 105 further includes: detecting with an objectness detector whether this frame contains ROI regions of objects.
6. The method of claim 1, characterized in that step 106 further includes: using the degree to which an ROI region covers the dynamic-static salient region to filter out some objectness ROI candidates, screening the ROI regions that may contain the salient object.
7. The method of claim 1, characterized in that in step 107, an energy equation on the ROI region and the dynamic-static saliency map is established, and iterative graph cut optimization minimizes this energy equation, minimizing the segmentation cost.
8. The method of claim 1, characterized in that in step 201, the displacement of the salient object segmentation region obtained in the previous frame is estimated from the direction and magnitude of the optical flow and propagated to the next frame.
9. The method of claim 1, characterized in that in step 202, the pixel-level objectness map is computed based on saliency, color contrast and edge-detection information.
10. The method of claim 1, characterized in that in step 203, constructing the energy equation further includes: constructing the persistence binary term based on the prior estimate of the previous frame propagated in step 201, constructing the object appearance unary term based on the dynamic-static saliency map, constructing the binary term on color continuity based on appearance color, and constructing the binary term on the object based on objectness; finally, iterative graph cut optimization again minimizes this energy equation, minimizing the segmentation penalty and thus obtaining the binary segmentation.