CN107590818A

CN107590818A - A kind of interactive video dividing method

Info

Publication number: CN107590818A
Application number: CN201710794283.5A
Authority: CN
Inventors: 韩守东; 杨迎春; 刘昱均; 陈阳; 胡卓
Original assignee: Huazhong University of Science and Technology; Shenzhen Huazhong University of Science and Technology Research Institute
Current assignee: Huazhong University of Science and Technology; Shenzhen Huazhong University of Science and Technology Research Institute
Priority date: 2017-09-06
Filing date: 2017-09-06
Publication date: 2018-01-16
Anticipated expiration: 2037-09-06
Also published as: CN107590818B

Abstract

The invention discloses a kind of interactive video dividing method, first, target profile curve is carried out to estimate, obtain target object and estimate initial profile in present frame, and on the basis of present frame target estimates contour line, the beeline for showing that each pixel estimates profile to target, the position attribution as the pixel are mapped by distance.On the basis of each pixel of present frame is except three-dimensional color attribute, the position attribution of reflection space-time restriction is added, i.e., each pixel distance estimates the distance value of objective contour, expands to higher dimensional space.Each attribute of higher dimensional space is first divided into multiple histogram Bin in advance when establishing graph structure, then the smooth item of interframe is added in the data item being calculated by global probabilistic model, data item as energy function model, finally energy function solution to model is obtained using max-flow min-cut algorithm, the movable information of target has successfully been incorporated, has added the space-time expending of Video segmentation.

Description

A kind of interactive video dividing method

Technical field

The invention belongs to the video dividing technique field in image procossing and machine vision, is handed over more particularly, to one kind Mutual formula methods of video segmentation.

Background technology

Video segmentation is a kind of Closing Binary Marker problem, it is intended to using the video or image sequence as an entirety, passes through one Fixed method is partitioned into the object with practical significance.Video segmentation play the role of in many fields it is important, such as： In target identification, Video segmentation can provide prior information for target identification；In Image Coding, Video segmentation can be improved and regarded The efficiency of frequency compressed encoding.Usually, according to whether adding man-machine interactively, Video segmentation can be divided into non-interactive type Video segmentation Split two kinds with interactive video.Non-interactive type Video segmentation mainly uses the motion feature of object video, such as based on light stream or Non-interactive type methods of video segmentation based on gradient descent method.To moving object applicability in video be present in such method Preferably, if target to be split is fixed either motion slowly or alternately moved, such method is due to that can not pass through motion Feature predicts the Probability Area of target object, so as to being unable to reach the purpose of segmentation.And interactive video dividing method passes through The irregular motion of above-mentioned target can preferably be solved the problems, such as by adding man-machine interactively.

Video shows as the space-time expending of Video segmentation result, includes the time as a space-time entirety, space-time restriction Continuity and spatial continuity, time continuity show as motion of the target in front and rear two frame, are that segmentation result effectively transmits Important leverage；Spatial continuity is used in image segmentation earliest, and its form of expression is adjacent pixel or adjacent area Similarity, smooth item (N-link) is commonly known as in energy function, is to ensure target object globality in segmentation result Necessary condition.Video segmentation is segmented in the extension on time dimension as image, and therefore, space-time expending is for Video segmentation result Transmission it is most important.

Space-time expending is the important indicator for judging a Video segmentation result transitivity, is various based on motion analysis Methods of video segmentation uses most attributes.Space-time expending includes time continuity and spatial continuity, and time continuity leads to Often reflect the motion feature of target, the spatial continuity key reaction shape information of target.Regarded using time and space continuity Frequency division segmentation method is a lot, and some methods first carry out super-pixel pre-segmentation, then by calculating the super-pixel similarity of adjacent two frame To distribute interframe smooth item, when calculating two super-pixel centre distance using only the space length of two super-pixel, by the time Continuity is that movable information is described with a display model.Based on the locus of points tracking methods of video segmentation using dense optical flow with The locus of points of track Long time scale, and cluster by the locus of points to obtain the space-time expending of target.Also some methods of video segmentation When considering space-time expending, generally time dimension and Spatial Dimension are treated on an equal basis, i.e., an adjacent unit of time dimension and An adjacent location equivalence of space dimension, but time and space are discrepant, adjacent two frames same positions in actual conditions Pixel conventional video segmentation in be temporally adjacent, the distance for calculating time dimension is exactly a unit, then calculate work as The Euclidean distance for time and space is just briefly described in the time-space matrix of the neighborhood territory pixel of previous frame in previous frame pixel.At present, base Methods of video segmentation in bilateral space regards video to be split as an entirety, also has pixel color comprising time and space altogether Six attributes, by being mapped to sextuple bilateral space to each dimension linear interpolation, then to bilateral space using traditional Figure segmentation method solves, and obtains the label of bilateral space interior joint, it is every to obtain video to be split finally by linear interpolation inverse mapping The preceding background probability of one frame pixel.Current most of methods of video segmentation are all based on graph theory, have many methods directly will figure Cut model and be generalized to Video segmentation from image segmentation, by light stream or other tracking modes on the basis of original spatial continuity Add time continuity.Traditional figure cuts model and generally only considers some neighbouring pixels when similitude between considering pixel Point, this kind of method do not consider the contact between pixel similar in a wide range of color well.

The content of the invention

For the disadvantages described above or Improvement requirement of prior art, the invention provides a kind of interactive video dividing method, Thus it is not strong to solve segmentation result accuracy present in existing interactive video cutting techniques, space-time expending it is inconsistent and The technical problems such as interactive quantity is excessive.

To achieve the above object, the invention provides a kind of interactive video dividing method, including：

(1) according to the segmentation result of previous frame image, contour line of the target in previous frame image is obtained；

(2) contour line of the target in previous frame image is mapped to current frame image, and to each picture on contour line Vegetarian refreshments all matches position of the pixel in current frame image, obtains target and estimates initial profile in current frame image Line；

(3) initial profile line is estimated in current frame image based on target, each pixel is drawn by distance mapping To the beeline for estimating initial profile line, the position attribution as the pixel；

(4) each pixel in current frame image is transformed into YUV color spaces by RGB color, and in current frame image In each pixel YUV color attributes on the basis of, the position attribution of each pixel is added, by the feature of each pixel attribute Dimension expands to higher dimensional space；

(5) the smooth item of current frame pixel point to previous frame neighborhood territory pixel point is converted according to the mark of previous frame pixel For data item, the data item after conversion is added in the data item being calculated by global probabilistic model, after superposition Data item of the data item as energy function model, obtain energy function model；

(6) solve energy function model and obtain energy function solution to model, and using current frame image as previous frame image, Step (1)~step (5) is continued executing with until Video segmentation terminates.

Preferably, step (4) specifically includes：

(4.1) byBy each pixel in current frame image by RGB color transforms to YUV color spaces, wherein, [c_y c_u c_v]^TRepresent the pixel value in YUV color spaces, [c_r c_g c_b]^TRepresent the value in RGB color；

(4.2) by b (x)=[c_y,c_u,c_v,l]^TThe intrinsic dimensionality of each pixel attribute is expanded into higher dimensional space, wherein, L represents pixel x position attribution, and b (x) represents higher dimensional space attribute corresponding to pixel x.

Preferably, in step (5), the smooth item by current frame pixel point to previous frame neighborhood territory pixel point is according to upper The mark conversion of one frame pixel is data item, including：

ByBy current frame pixel point x to previous frame neighborhood territory pixel point smooth item according to upper The mark conversion of one frame pixel is data itemWherein, the light stream value of each pixel of y ' expressions present frame is to present frame Previous frame pixel corresponding to pixel x,For neighborhood territory pixel set in y ' frame, ω_xy′Represent pixel x and previous frame field Pixel y ' similarity, | s_y′| represent pixel y ' mark value.

Preferably, ω_xy′Calculation be：Wherein, | | x-y'| | represent pixel x With previous frame field pixel y ' time-space matrix, Δ I represents pixel x and previous frame field pixel y ' color distance, and σ represents figure The gradient mean value of picture, and(x_x,x_y) represent pixel x transverse and longitudinal coordinate, (y_x',y_y') Represent pixel y ' transverse and longitudinal coordinate.

Preferably, the energy function model obtained in step (5) is expressed as：Wherein, S represents foreground pixel set,Represent background pixel Set, and By the data item for converting to obtain to the smooth item N-link of interframe, D (x) is represented by global general The data item that rate model is calculated, θ^sExpression prospect statistics with histogram,The statistics with histogram of background is represented, τ is fore/background The weight of distributional difference, η represent the weight of smooth item in frame,The similarity of adjacent pixel in present frame is represented,Represent in color histogram, preceding context similarity difference.

Preferably,Calculation be：Wherein, s_xAnd s_yRepresent respectively pixel x, Y mark, N represent the set of adjacent pixel pair in present frame, ω_xyThe similarity of pixel x, y is represented, and‖ x-y | | the time-space matrix of pixel x, y is represented, Δ I represents the color distance of pixel x, y, σ For the gradient mean value of image.

Preferably, the differentiation mode of foreground pixel and background pixel is：

Obtain target in current frame image estimate initial profile line after, by By pixel according to whether estimating in initial profile line, it is divided into foreground seeds point and background seed point, wherein, M represents mapping Mask code matrix afterwards, d (M (x)) represent the distance map generated by mask code matrix, and dis is distance threshold, and M (x) represents pixel x's Mask value, Seeds (x) have three kinds of values, when Seeds (x) values are 1, x are arranged into foreground seeds point, as Seeds (x) When value is 0, x is arranged to background seed point, when Seeds (x) values are -1, x is arranged to unknown regions.

In general, by the contemplated above technical scheme of the present invention compared with prior art, it can obtain down and show Beneficial effect：

1st, the present invention is additionally added reflection space-time expending for each pixel of present frame in addition to having R, G, B color attribute Position attribution, i.e., each pixel distance estimates the distance value of objective contour, successfully incorporated the movable information of target, increases The space-time expending of Video segmentation is added.

2nd, the present invention considers the difference of time dimension and Spatial Dimension in space-time expending, when replacing traditional with light flow valuve Between dimension amount, multilayer graph structure is equivalent to individual layer graph structure by converting interframe smooth item.

3rd, the present invention improves the space-time of interframe segmentation result by the smooth item conversion of light stream interframe and bilateral space-time restriction Continuity, while improve the accuracy of segmentation result.

Brief description of the drawings

Fig. 1 is a kind of schematic flow sheet for interactive video dividing method that present example provides；

Fig. 2 is a kind of arrowband light stream display figure that present example provides；

Fig. 3 is a kind of RGB color that present example provides；

Fig. 4 is a kind of YUV color spaces that present example provides；

Fig. 5 is a kind of distance map that present example provides；

Fig. 6 is the final segmentation that a kind of interactive video dividing method based on the present invention that present example provides obtains As a result.

Embodiment

In order to make the purpose , technical scheme and advantage of the present invention be clearer, it is right below in conjunction with drawings and Examples The present invention is further elaborated.It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention, and It is not used in the restriction present invention.As long as in addition, technical characteristic involved in each embodiment of invention described below Conflict can is not formed each other to be mutually combined.

A kind of interactive video dividing method proposed by the present invention, first, the segmentation result of previous frame is passed to, obtains target In the contour line of previous frame, then carry out target profile curve and estimate, obtain estimate initial profile of the target object in present frame, and On the basis of present frame target estimates contour line, the most short distance for showing that each pixel estimates profile to target is mapped by distance From the position attribution as the pixel.Then, each pixel of present frame is transformed into YUV color spaces by RGB color, And on the basis of each pixel of present frame is except Y, U, V three-dimensional color attribute, the position attribution of reflection space-time restriction is added, i.e., Each pixel distance estimates the distance value of objective contour, expands to higher dimensional space.When establishing graph structure, first by higher dimensional space Each attribute is divided into multiple histogram Bin in advance, is then superimposed the data item for converting to obtain to the smooth item N-link of interframe Into the data item being calculated by global probabilistic model, as the data item of energy function model, energy letter is finally given Exponential model.Energy function model is finally solved, by the movable information of the invention for successfully having incorporated target, adds video point The space-time expending cut, satisfied segmentation result can be obtained under less man-machine interactively.

It is as shown in Figure 1 a kind of schematic flow sheet of interactive video dividing method provided in an embodiment of the present invention, in Fig. 1 Following steps are specifically included in shown method：

In an optional embodiment, sparse optical flow matching algorithm can be used by the target wheel in previous frame image Exterior feature is delivered to present frame, obtains target and estimates initial profile line in current frame image, wherein, specifically obtained using which kind of mode Initial profile line is estimated in current frame image to target, uniqueness restriction will not be done in embodiments of the present invention.

In an optional embodiment, the present invention can use the bilateral grid Γ tables being made up of rule sampling point ν Show, be first then distributed to pixel lifting on Grid Sampling point into higher dimensional space, then Grid Sampling point is built Graph structure.

In an optional embodiment, calculate higher dimensional space graph structure node data item (T-link) and smoothly During item (N-link), T-link and N-link can be distributed according to the Onecut parted patterns of standard.But the embodiment of the present invention is not Uniqueness restriction is done to the mode for distributing T-link and N-link.

Wherein, RGB color and YUV color spaces are all the color models to the color description of image, and RGB is (red green It is blue) it is the space defined according to eye recognition color, most of color can be represented.But in machine vision and image procossing Field, the processing to image do not use RGB color generally, because RGB color contains only three kinds of colors of RGB Passage, the image details such as tone, brightness, saturation degree are put together consideration, so being difficult to quantitative processing these detail sections. And in yuv space, each pixel has a luminance signal Y, and two carrier chrominance signals U and V.Luminance signal is to intensity Measurement, separates consideration by luminance signal and carrier chrominance signal, can change brightness value in the case where not influenceing color.YUV colors Space can be converted to by RGB color, first colored image into gray-scale map and extracted three main colors Passage is changed into extra two carrier chrominance signals to describe color.The YUV color spaces for converting to obtain by RGB color also may be used To be inversely transformed into RGB color.Specifically include following sub-step：

(4.1) by(1) by each pixel in current frame image YUV color spaces are transformed to by RGB color, wherein, [c_y c_u c_v]^TRepresent the pixel value in YUV color spaces, [c_r c_g c_b]^TRepresent the value in RGB color；

(4.2) by b (x)=[c_y,c_u,c_v,l]^T(2) intrinsic dimensionality of each pixel attribute is expanded into higher dimensional space, its In, l represents pixel x position attribution, and b (x) represents higher dimensional space attribute corresponding to pixel x.

Wherein, three color attributes and a position attribution lifting are combined to each pixel to four dimensional feature spaces, it is false If the color attribute of a pixelThen the higher dimensional space after lifting isL makees Position attribution for each pixel is that the distance for estimating contour line recently by calculating each pixel distance obtains.Contour line Estimation contain the movable information of target.

Wherein, the pixel of color space can be mapped to higher dimensional space, common interpolation by way of a variety of interpolation Mode has, arest neighbors interpolation, linear interpolation and exponential interpolation etc..

In an optional embodiment, adopted for Reduction Computation amount, it is necessary to carry out drop to the higher dimensional space after mapping Sample.Such as arest neighbors interpolation method can be used, after above-mentioned High Dimensional Mapping, by the value of each dimension of higher-dimension node by most Neighbour's mode rounds, if j-th of dimension values of higher dimensional space node iThen arest neighbors interpolation method such as following formula：

Wherein,RepresentWith no more thanMaximum integer difference.

In an optional embodiment, it is based on individual layer graph structure, i.e., to current mostly that traditional figure, which cuts model, Whole or the modeling of local narrowband region in two field picture, this figure mode of building is effectively and necessary in image segmentation , but Video segmentation is extended to, because object of which movement has temporal continuity, and the motion bit of adjacent two frames object Move little, if only only considering to establish graph structure in the current frame, then can not make full use of the segmentation result of previous frame, institute To establish adjacent two frame or the graph structure of multiframe, the space-time expending and final segmentation that can effectively ensure that segmentation result are tied The accuracy of fruit.

But multilayer graph structure increases with the number of plies, the node and side quantity of figure can increase sharply, and it is huge to calculate time cost Greatly.Therefore, the present invention equivalent multilayer graph structure by the way of previous frame interframe N-link conversions.Specifically, due to previous frame The all pixels point of previous frame is divided into two classes by segmentation result, and adjacent with the pixel that present frame is to be split is located at upper one Either the pixel of frame is marked as prospect, or it is marked as background.Current frame pixel point to previous frame neighborhood territory pixel point N-link according to the mark of previous frame pixel conversion be T-link, on the one hand can reduce the amount of calculation finally solved, separately On the one hand constrained by the use of light stream as space-time expending.

The amount of calculation that graph structure solves depends on the number of its interior joint and connects the side number of each node.For unexcellent The multilayer graph structure of change, with the increase of the number of plies, its node and side can also increase sharply.Answered in view of the room and time of calculating Miscellaneous degree, the present invention is by the way of the smooth item conversion of interframe, with the equivalent multilayer graph structure of individual layer graph structure.Traditional interframe conversion Mode is as follows：

Wherein,Neighborhood territory pixel set of the pixel x in previous frame with position pixel y is represented, | s_y| represent pixel y mark Value (0 or 1), w_xyPixel x and previous frame field pixel y similarity is represented,||x-y|| Pixel x and y time-space matrix are represented, Δ I represents pixel x and y color distance, and σ represents the gradient mean value of whole image.And Calculate pixel x and pixel y time-space matrix | | x-y | | when, the time interval of consecutive frame is considered with 1, i.e., one in frame Pixel.The pixel x and pixel y time-space matrix calculated in this way | | x-y | | such as following formula：

Wherein, x_xAnd x_yPixel x transverse and longitudinal coordinate is represented respectively, to pixel y similarly, when calculating time-space matrix, formula (5) Roughly time dimension and Spatial Dimension are equally treated, have ignored time dimension and Spatial Dimension when calculating space-time expending Difference.

Difference in view of calculating space-time expending time dimension and Spatial Dimension, when calculating neighbouring relations, it is impossible to One unit of time dimension and a unit of Spatial Dimension are equally treated.Therefore, in an optional embodiment, this The light stream value that the smooth item translation method of interframe of the invention based on optical flow constraint solves each pixel of present frame is light stream vectors index To previous frame pixel y ' corresponding to current frame pixel point x.Light stream mapping equation such as following formula：

In formula (6), f represents light stream mapping, i.e., obtains optical flow field by light stream pyramid, and picture is found by the mapping of f positions Plain x corresponds to the pixel y ' of previous frame.A kind of arrowband light stream display figure of present example offer is provided.

In an optional embodiment, in step (5), by current frame pixel point to previous frame neighborhood territory pixel point Smooth item is data item according to the conversion of the mark of previous frame pixel, is specifically included：

BySmooth item by current frame pixel point x to previous frame neighborhood territory pixel point It is data item according to the conversion of the mark of previous frame pixelWherein, the light stream value of each pixel of y ' expressions present frame arrives Previous frame pixel corresponding to current frame pixel point x,For neighborhood territory pixel set in y ' frame, ω_xy′Represent pixel x and upper one Frame field pixel y ' similarity, | s_y′| represent pixel y ' mark value.Then according to pixel y ' be marked as prospect or Data item after background converts formula (7) is added separately in the source node and sink nodes of figure.

Wherein, ω_xy′Calculation be：Wherein, | | x-y'| | represent picture Plain x and previous frame field pixel y ' time-space matrix, Δ I represent pixel x and previous frame field pixel y ' color distance, σ tables The gradient mean value of diagram picture, and(x_x,x_y) represent pixel x transverse and longitudinal coordinate, (y_x', y_y') represent pixel y ' transverse and longitudinal coordinate.

In an optional embodiment, the energy function model obtained in step (5) is expressed as：Wherein, S represents foreground pixel set,Represent background Pixel set, and By the data item for converting to obtain to the smooth item N-link of interframe, D (x) is represented by complete The data item that office's probabilistic model is calculated, and global probabilistic model is made up of fore/background mixed Gauss model, in key frame Fore/background mixed Gauss model is initialized on segmentation result, and the parameter of global probabilistic model is completed according to the distance map of generation Renewal, θ^sExpression prospect statistics with histogram,Represent background statistics with histogram, τ be fore/background distributional difference weight, η tables Show the weight of smooth item in frame,The similarity of adjacent pixel in present frame is represented,Represent in color histogram, Preceding context similarity difference, namely the L1 distances of preceding background.

Wherein,Calculation be：Wherein, s_xAnd s_yPixel x, y is represented respectively Mark, N represent present frame in adjacent pixel pair set, ω_xyThe similarity of pixel x, y is represented, and| | x-y | | the time-space matrix of pixel x, y is represented, Δ I represents the color distance of pixel x, y, σ For the gradient mean value of image.

Wherein, the differentiation mode of foreground pixel and background pixel is：

Obtain target in current frame image estimate initial profile line after, by By pixel according to whether estimating in initial profile line, it is divided into foreground seeds point and background seed point, wherein, M represents mapping Mask code matrix afterwards, then it is that prospect is mapped as in M and is mapped as the cut-off rule of background area, d (M (x)) to estimate initial profile line The distance map generated by mask code matrix is represented, dis is distance threshold, and M (x) represents pixel x mask value, by profile transformation, If pixel x is mapped as prospect, M (x) values are 1, if pixel x is mapped as background, M (x) values are 0, therefore, Seeds (x) has three kinds of values, and when Seeds (x) values are 1, x is arranged into foreground seeds point, when Seeds (x) values are 0 When, x is arranged to background seed point, when Seeds (x) values are -1, x is arranged to unknown regions.

In conjunction with the drawings and specific embodiments, the present invention is further described.

The method flow of the present invention is as shown in figure 1, existing illustrate by taking test video bear as an example：

(1) previous frame Accurate Segmentation profile is obtained

Assuming that previous frame segmentation result is reliable, previous frame is likely to be key frame, needs to add for key frame and hands over Result mutually is accurately segmented, and initializes global probabilistic model, be i.e. fore/background gauss hybrid models, clustering algorithm uses Kmeans++, fore/background mixed Gaussian number are disposed as 5, by the segmentation result of previous frame, obtain target object upper one The precise boundary of frame.

(2) preferably match to obtain the initial position of target in the current frame using sparse optical flow

Matched by sparse optical flow and the objective contour of previous frame is mapped to present frame, i.e., to each pixel on profile Its position in the current frame is all matched, and then obtains target and estimates profile in the initial of present frame.

(3) profile generation distance map is initially estimated according to target

Profile is estimated in the initial of present frame according to target, the distance map that target range profile is obtained using distance mapping is got over Close to profile apart from smaller, bigger further away from profile distance, generation distance map is as shown in Figure 5.

(4) RGB color is converted to YUV color spaces

The original RGB color colour space is converted by above formula (1), obtains YUV color spaces, RGB color such as Fig. 3 institutes Show, YUV color spaces are as shown in Figure 4.

(5) higher dimensional space maps

YUV color attributes are carried out into intrinsic dimensionality plus position attribution according to above formula (2) to extend to obtain higher dimensional space.

(6) interframe N-link is converted

Interframe N-link is converted using above formula (7), reduced value is added separately in T-link.

(7) structure segmentation energy function model

Foreground seeds point and background seed point are chosen according to distance map, in selected seed point, distance profile line is more than one The pixel for determining threshold value is set to seed point；Except seed point, remaining pixel is set as unknown regions；And carried on the back before updating Scape probabilistic model；The data item of energy function model mainly includes the part of the smooth item N-link conversions of interframe, and computational methods are shown in Above formula (7), data item are the side that ordinary node is connected with source point and meeting point in graph structure.

(8) Accurate Segmentation result is obtained by max-flow/minimal cut algorithm

Final energy function model is shown in above formula (9), solves the model equivalency in solving minimal cut problem, and maximum flow problem It is dual problem with minimal cut problem, so the final max-flow for being equivalent to solution figure, can preferably use maxflow algorithms, Present frame Accurate Segmentation result is obtained, and present frame is switched into previous frame, continues above-mentioned (1)~(7), until Video segmentation knot Beam, obtained final segmentation result are as shown in Figure 6.

As it will be easily appreciated by one skilled in the art that the foregoing is merely illustrative of the preferred embodiments of the present invention, not to The limitation present invention, all any modification, equivalent and improvement made within the spirit and principles of the invention etc., all should be included Within protection scope of the present invention.

Claims

A kind of 1. interactive video dividing method, it is characterised in that including：

(1) according to the segmentation result of previous frame image, contour line of the target in previous frame image is obtained；

(2) contour line of the target in previous frame image is mapped to current frame image, and to each pixel on contour line Position of the pixel in current frame image is all matched, target is obtained and estimates initial profile line in current frame image；

(3) initial profile line is estimated in current frame image based on target, draws each pixel in advance by distance mapping Estimate the beeline of initial profile line, the position attribution as the pixel；

(4) each pixel in current frame image is transformed into YUV color spaces by RGB color, and it is each in current frame image On the basis of the YUV color attributes of pixel, the position attribution of each pixel is added, by the intrinsic dimensionality of each pixel attribute Expand to higher dimensional space；

(5) it is number according to the conversion of the mark of previous frame pixel by the smooth item of current frame pixel point to previous frame neighborhood territory pixel point According to item, the data item after conversion is added in the data item being calculated by global probabilistic model, by the number after superposition According to data item of the item as energy function model, energy function model is obtained；

(6) solve energy function model and obtain energy function solution to model, and using current frame image as previous frame image, continue Step (1)~step (5) is performed until Video segmentation terminates.
2. according to the method for claim 1, it is characterised in that step (4) specifically includes：

(4.1) byBy each pixel in current frame image by RGB face Color space transformation to YUV color spaces, wherein, [c_y c_u c_v]^TRepresent the pixel value in YUV color spaces, [c_r c_g c_b]^TTable Show the value in RGB color；

(4.2) by b (x)=[c_y,c_u,c_v,l]^TThe intrinsic dimensionality of each pixel attribute is expanded into higher dimensional space, wherein, l is represented Pixel x position attribution, b (x) represent higher dimensional space attribute corresponding to pixel x.
3. according to the method for claim 1, it is characterised in that described by current frame pixel point to upper one in step (5) The smooth item of frame neighborhood territory pixel point is data item according to the conversion of the mark of previous frame pixel, including：

ByBy current frame pixel point x to previous frame neighborhood territory pixel point smooth item according to previous frame The mark conversion of pixel is data itemWherein, the light stream value of each pixel of y ' expressions present frame is to current frame pixel point Previous frame pixel corresponding to x,For neighborhood territory pixel set in y ' frame, ω_xy' represent pixel x and previous frame field pixel y ' Similarity, | s_y′| represent pixel y ' mark value.
4. according to the method for claim 3, it is characterised in that ω_xy′Calculation be： Wherein, | | x-y'| | pixel x and previous frame field pixel y ' time-space matrix is represented, Δ I represents pixel x and previous frame field picture Plain y ' color distance, σ represent the gradient mean value of image, and(x_x,x_y) represent pixel X transverse and longitudinal coordinate, (y_x',y_y') represent pixel y ' transverse and longitudinal coordinate.
5. according to the method described in Claims 1-4 any one, it is characterised in that the energy function obtained in step (5) Model is expressed as：Wherein, S represents foreground pixel set,Table Show background pixel set, and Pass through the data item for converting to obtain to the smooth item N-link of interframe, D (x) tables Show the data item being calculated by global probabilistic model, θ^sExpression prospect statistics with histogram,The statistics with histogram of background is represented, τ is the weight of fore/background distributional difference, and η represents the weight of smooth item in frame,Adjacent pixel is similar in expression present frame Degree,Represent in color histogram, preceding context similarity difference.
6. according to the method for claim 5, it is characterised in thatCalculation be： Wherein, s_xAnd s_yThe mark of pixel x, y is represented respectively, and N represents the set of adjacent pixel pair in present frame, ω_xyRepresent pixel x, y Similarity, and| | x-y | | the time-space matrix of pixel x, y is represented, Δ I represents pixel x, y Color distance, σ be image gradient mean value.
7. according to the method for claim 5, it is characterised in that the differentiation mode of foreground pixel and background pixel is：

Obtain target in current frame image estimate initial profile line after, by By pixel according to whether estimating in initial profile line, it is divided into foreground seeds point and background seed point, wherein, M represents mapping Mask code matrix afterwards, d (M (x)) represent the distance map generated by mask code matrix, and dis is distance threshold, and M (x) represents pixel x's Mask value, Seeds (x) have three kinds of values, when Seeds (x) values are 1, x are arranged into foreground seeds point, as Seeds (x) When value is 0, x is arranged to background seed point, when Seeds (x) values are -1, x is arranged to unknown regions.