CN107590818B - An interactive video segmentation method - Google Patents

An interactive video segmentation method

Info

Publication number
CN107590818B (application CN201710794283.5A)
Authority
CN (China)
Prior art keywords
pixel, previous frame, space, target
Legal status
Expired - Fee Related
Other languages
Chinese (zh)
Other versions
CN107590818A (en)
Inventors
韩守东, 杨迎春, 刘昱均, 陈阳, 胡卓
Assignees
Huazhong University of Science and Technology
Shenzhen Huazhong University of Science and Technology Research Institute
Application filed by Huazhong University of Science and Technology and Shenzhen Huazhong University of Science and Technology Research Institute
Priority to CN201710794283.5A
Publication of application CN107590818A; application granted and publication of CN107590818B

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an interactive video segmentation method. First, the target contour is estimated to obtain the estimated initial contour of the target object in the current frame; based on this estimated contour line, the shortest distance from each pixel to the estimated contour is obtained by distance mapping and used as the position attribute of that pixel. On top of the three-dimensional color attribute of each current-frame pixel, this position attribute reflecting the spatio-temporal constraint (i.e., the distance from the pixel to the estimated target contour) is added, expanding each pixel to a high-dimensional space. When the graph structure is built, each attribute of the high-dimensional space is first divided into multiple histogram bins in advance; the inter-frame smooth term is then converted and superimposed onto the data term computed from the global probability model, and the result serves as the data term of the energy function model. Finally, the energy function model is solved with the max-flow/min-cut algorithm. The motion information of the target is thereby successfully incorporated, which strengthens the spatio-temporal continuity of the video segmentation.

Description

An interactive video segmentation method
Technical field
The invention belongs to the field of video segmentation techniques in image processing and machine vision, and more particularly relates to an interactive video segmentation method.
Background art
Video segmentation is a binary labeling problem: it aims to treat a video or image sequence as a whole and, by certain methods, segment out the objects of practical significance. Video segmentation plays an important role in many fields. For example, in target recognition, video segmentation can provide prior information for the recognizer; in image coding, video segmentation can improve the efficiency of video compression coding. Generally, according to whether human interaction is involved, video segmentation can be divided into non-interactive and interactive video segmentation. Non-interactive video segmentation mainly relies on the motion features of the video object, such as non-interactive methods based on optical flow or on gradient descent. Such methods work well when the video contains moving objects; however, if the target to be segmented is stationary, moves slowly, or moves intermittently, they cannot predict the probable region of the target object from motion features and therefore fail to achieve the segmentation. Interactive video segmentation methods, by introducing human interaction, can better handle such irregular motion of the target.
A video is a spatio-temporal whole, and the spatio-temporal constraint manifests itself as the spatio-temporal continuity of the segmentation result, which comprises temporal continuity and spatial continuity. Temporal continuity appears as the motion of the target between two successive frames and is the key guarantee that the segmentation result is propagated effectively. Spatial continuity was used in image segmentation first: it takes the form of the similarity between adjacent pixels or adjacent regions, is usually known as the smooth term (N-link) in the energy function, and is a necessary condition for preserving the integrity of the target object in the segmentation result. Video segmentation is the extension of image segmentation along the time dimension; spatio-temporal continuity is therefore essential for propagating the segmentation result.
Spatio-temporal continuity is an important indicator for judging how well a video segmentation result propagates, and it is the attribute most used by the various motion-analysis-based video segmentation methods. It comprises temporal continuity, which usually reflects the motion features of the target, and spatial continuity, which mainly reflects the shape information of the target. Many video segmentation methods exploit temporal and spatial continuity. Some methods first perform a superpixel pre-segmentation and then assign inter-frame smooth terms by computing the similarity of superpixels in two adjacent frames; when computing the distance between two superpixel centers, only the spatial distance of the two superpixels is used, while temporal continuity, i.e., the motion information, is described by an appearance model. Point-trajectory-based video segmentation methods use dense optical flow to track point trajectories over long time scales and cluster the trajectories to obtain the spatio-temporal continuity of the target. Other video segmentation methods, when considering spatio-temporal continuity, usually treat the time dimension and the space dimension equally, i.e., one adjacent unit of the time dimension is equated with one adjacent unit of the space dimension; in reality, however, time and space differ: pixels at the same position in two adjacent frames are temporally adjacent in conventional video segmentation, the distance along the time dimension is exactly one unit, and the spatio-temporal distance between a current-frame pixel and a neighborhood pixel in the previous frame is then simply computed as a Euclidean distance over time and space. Currently, video segmentation methods based on bilateral space regard the video to be segmented as a whole with six attributes in total (time, space, and pixel color), map it into the six-dimensional bilateral space by linear interpolation along each dimension, solve the bilateral space with a conventional graph-cut method to obtain the labels of the bilateral-space nodes, and finally obtain the foreground/background probability of every frame pixel of the video by inverse linear interpolation. Most current video segmentation methods are based on graph theory; many directly generalize the graph-cut model from image segmentation to video segmentation, adding temporal continuity through optical flow or other tracking schemes on top of the original spatial continuity. When considering the similarity between pixels, traditional graph-cut models usually consider only some nearby pixels; such methods do not account well for the connections between pixels of similar color over a wide range.
Summary of the invention
Aiming at the above defects or improvement requirements of the prior art, the present invention provides an interactive video segmentation method, thereby solving the technical problems of existing interactive video segmentation techniques, such as insufficient accuracy of the segmentation result, inconsistent spatio-temporal continuity, and an excessive amount of interaction.
To achieve the above object, the present invention provides an interactive video segmentation method, comprising:
(1) according to the segmentation result of the previous frame image, obtaining the contour line of the target in the previous frame image;
(2) mapping the contour line of the target in the previous frame image to the current frame image, and matching every pixel on the contour line to its position in the current frame image, to obtain the estimated initial contour line of the target in the current frame image;
(3) based on the estimated initial contour line of the target in the current frame image, obtaining by distance mapping the shortest distance from each pixel to the estimated initial contour line, as the position attribute of that pixel;
(4) transforming each pixel in the current frame image from the RGB color space to the YUV color space, and, on the basis of the YUV color attribute of each pixel in the current frame image, adding the position attribute of each pixel, thereby expanding the feature dimension of each pixel attribute to a high-dimensional space;
(5) converting the smooth term from each current-frame pixel to its previous-frame neighborhood pixels into a data term according to the labels of the previous-frame pixels, superimposing the converted data term onto the data term computed from the global probability model, and taking the superimposed data term as the data term of the energy function model, to obtain the energy function model;
(6) solving the energy function model to obtain its solution, taking the current frame image as the previous frame image, and continuing to execute steps (1) to (5) until the video segmentation ends.
Preferably, step (4) specifically includes:
(4.1) transforming each pixel in the current frame image from the RGB color space to the YUV color space by
$$\begin{bmatrix} c_y \\ c_u \\ c_v \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.147 & -0.289 & 0.436 \\ 0.615 & -0.515 & -0.100 \end{bmatrix} \begin{bmatrix} c_r \\ c_g \\ c_b \end{bmatrix}$$
wherein $[c_y\ c_u\ c_v]^T$ indicates the pixel value in the YUV color space and $[c_r\ c_g\ c_b]^T$ indicates the value in the RGB color space;
(4.2) expanding the feature dimension of each pixel attribute to the high-dimensional space by $b(x) = [c_y, c_u, c_v, l]^T$, wherein $l$ indicates the position attribute of pixel $x$ and $b(x)$ indicates the corresponding high-dimensional attribute of pixel $x$.
Preferably, in step (5), converting the smooth term from each current-frame pixel to its previous-frame neighborhood pixels into a data term according to the labels of the previous-frame pixels comprises:
converting the smooth term from current-frame pixel $x$ to its previous-frame neighborhood pixels into the data term $D'(x)$ according to the labels of the previous-frame pixels by
$$D'(x) = \sum_{y' \in \mathcal{N}_{y'}} \omega_{xy'}\,|s_{y'}|$$
wherein $y'$ denotes the previous-frame pixel to which the optical flow value of the current frame maps current-frame pixel $x$, $\mathcal{N}_{y'}$ is the intra-frame neighborhood pixel set of $y'$, $\omega_{xy'}$ indicates the similarity between pixel $x$ and previous-frame neighborhood pixel $y'$, and $|s_{y'}|$ indicates the label value of pixel $y'$.
Preferably, $\omega_{xy'}$ is calculated as
$$\omega_{xy'} = \frac{1}{\|x-y'\|}\exp\!\left(-\frac{\Delta I^2}{2\sigma^2}\right), \qquad \|x-y'\| = \sqrt{(x_x - y_{x'})^2 + (x_y - y_{y'})^2}$$
wherein $\|x-y'\|$ indicates the spatio-temporal distance between pixel $x$ and previous-frame neighborhood pixel $y'$, $\Delta I$ indicates the color distance between pixel $x$ and previous-frame neighborhood pixel $y'$, $\sigma$ indicates the gradient mean of the image, $(x_x, x_y)$ indicates the horizontal and vertical coordinates of pixel $x$, and $(y_{x'}, y_{y'})$ indicates the horizontal and vertical coordinates of pixel $y'$.
Preferably, the energy function model obtained in step (5) is expressed as
$$E(S,\bar S) = \sum_{x}\bigl(D(x) + D'(x)\bigr) - \tau\,\|\theta_S - \theta_{\bar S}\|_{L_1} + \eta \sum_{(x,y)\in N} \omega_{xy}\,|s_x - s_y|$$
wherein $S$ indicates the foreground pixel set, $\bar S$ indicates the background pixel set, $D'(x)$ is the data term converted from the inter-frame smooth term N-link, $D(x)$ indicates the data term computed from the global probability model, $\theta_S$ indicates the foreground histogram statistics, $\theta_{\bar S}$ indicates the background histogram statistics, $\tau$ is the weight of the foreground/background distribution difference, $\eta$ indicates the weight of the intra-frame smooth term, $\omega_{xy}$ indicates the similarity of adjacent pixels in the current frame, and $\|\theta_S - \theta_{\bar S}\|_{L_1}$ indicates the foreground/background similarity difference in the color histogram.
Preferably, the intra-frame smooth term is calculated as
$$\sum_{(x,y)\in N} \omega_{xy}\,|s_x - s_y|, \qquad \omega_{xy} = \frac{1}{\|x-y\|}\exp\!\left(-\frac{\Delta I^2}{2\sigma^2}\right)$$
wherein $s_x$ and $s_y$ respectively indicate the labels of pixels $x$ and $y$, $N$ indicates the set of adjacent pixel pairs in the current frame, $\omega_{xy}$ indicates the similarity of pixels $x$ and $y$, $\|x-y\|$ indicates the spatio-temporal distance of pixels $x$ and $y$, $\Delta I$ indicates the color distance of pixels $x$ and $y$, and $\sigma$ is the gradient mean of the image.
Preferably, foreground pixels and background pixels are distinguished as follows:
after the estimated initial contour line of the target in the current frame image is obtained, each pixel is divided into foreground seed points and background seed points according to whether it lies inside the estimated initial contour line, by
$$\mathrm{Seeds}(x) = \begin{cases} 1, & M(x) = 1 \ \text{and} \ d(M(x)) > dis \\ 0, & M(x) = 0 \ \text{and} \ d(M(x)) > dis \\ -1, & \text{otherwise} \end{cases}$$
wherein $M$ indicates the mask matrix after mapping, $d(M(x))$ indicates the distance map generated from the mask matrix, $dis$ is the distance threshold, and $M(x)$ indicates the mask value of pixel $x$. $\mathrm{Seeds}(x)$ takes three values: when $\mathrm{Seeds}(x)$ is 1, $x$ is set as a foreground seed point; when $\mathrm{Seeds}(x)$ is 0, $x$ is set as a background seed point; and when $\mathrm{Seeds}(x)$ is $-1$, $x$ is set to the unknown region.
In general, compared with the prior art, the above technical scheme conceived by the present invention can achieve the following beneficial effects:
1. For each pixel of the current frame, in addition to the R, G, B color attributes, the present invention adds a position attribute that reflects the spatio-temporal continuity, i.e., the distance from each pixel to the estimated target contour, thereby successfully incorporating the motion information of the target and strengthening the spatio-temporal continuity of the video segmentation.
2. The present invention takes into account the difference between the time dimension and the space dimension in the spatio-temporal continuity, replaces the traditional time-dimension measure with the optical flow value, and makes a single-layer graph structure equivalent to a multi-layer graph structure by converting the inter-frame smooth term.
3. The present invention improves the spatio-temporal continuity of the inter-frame segmentation results through the optical-flow-based inter-frame smooth term conversion and the bilateral spatio-temporal constraint, while also improving the accuracy of the segmentation results.
Description of the drawings
Fig. 1 is a flow diagram of an interactive video segmentation method provided by an embodiment of the present invention;
Fig. 2 is a narrowband optical flow display provided by an embodiment of the present invention;
Fig. 3 is an RGB color space provided by an embodiment of the present invention;
Fig. 4 is a YUV color space provided by an embodiment of the present invention;
Fig. 5 is a distance map provided by an embodiment of the present invention;
Fig. 6 is a final segmentation result, provided by an embodiment of the present invention, obtained with the interactive video segmentation method of the invention.
Specific embodiment
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not conflict.
In the interactive video segmentation method proposed by the present invention, first, the segmentation result of the previous frame is passed in and the contour line of the target in the previous frame is obtained; the target contour is then estimated to obtain the estimated initial contour of the target object in the current frame, and, based on the estimated contour line in the current frame, the shortest distance from each pixel to the estimated target contour is obtained by distance mapping and used as the position attribute of that pixel. Next, each pixel of the current frame is transformed from the RGB color space to the YUV color space, and, on top of the three color attributes Y, U, V of each current-frame pixel, the position attribute reflecting the spatio-temporal constraint, i.e., the distance from each pixel to the estimated target contour, is added, expanding the pixel to a high-dimensional space. When the graph structure is built, each attribute of the high-dimensional space is first divided into multiple histogram bins in advance; the data term converted from the inter-frame smooth term N-link is then superimposed onto the data term computed from the global probability model to serve as the data term of the energy function model, and the energy function model is thus obtained. Finally, the energy function model is solved. The present invention thereby successfully incorporates the motion information of the target, strengthens the spatio-temporal continuity of the video segmentation, and can obtain a satisfactory segmentation result with less human interaction.
Fig. 1 is a flow diagram of an interactive video segmentation method provided by an embodiment of the present invention; the method shown in Fig. 1 specifically includes the following steps:
(1) according to the segmentation result of the previous frame image, obtaining the contour line of the target in the previous frame image;
(2) mapping the contour line of the target in the previous frame image to the current frame image, and matching every pixel on the contour line to its position in the current frame image, to obtain the estimated initial contour line of the target in the current frame image;
In an optional embodiment, a sparse optical flow matching algorithm can be used to propagate the target contour in the previous frame image to the current frame, obtaining the estimated initial contour line of the target in the current frame image; the specific manner of obtaining the estimated initial contour line of the target in the current frame image is not uniquely restricted in the embodiments of the present invention.
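Purely as an illustration of this optional embodiment, the following sketch propagates the previous frame's contour with pyramidal Lucas-Kanade sparse optical flow via OpenCV; the function name and tracker parameters are assumptions of the sketch, not prescriptions of the patent.

```python
import cv2
import numpy as np

# Illustrative sketch: propagate the previous frame's target contour to the
# current frame with pyramidal Lucas-Kanade sparse optical flow.
def propagate_contour(prev_gray, curr_gray, contour_pts):
    """contour_pts: (N, 1, 2) float32 array of contour pixel coordinates."""
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, contour_pts, None,
        winSize=(21, 21), maxLevel=3)
    # Keep only the contour points that were tracked successfully; together
    # they form the estimated initial contour in the current frame.
    return next_pts[status.ravel() == 1]
```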
(3) based on the estimated initial contour line of the target in the current frame image, obtaining by distance mapping the shortest distance from each pixel to the estimated initial contour line, as the position attribute of that pixel;
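A minimal sketch of such a distance mapping, assuming the estimated initial contour is available as the boundary of a binary mask and using OpenCV's distance transform (the helper name is hypothetical):

```python
import cv2
import numpy as np

# Each pixel receives its shortest distance to the contour as the position
# attribute l: inside pixels measure to the outside region and vice versa,
# so at every pixel exactly one of the two terms is non-zero.
def contour_distance_map(mask):
    """mask: array with 1 inside the estimated contour, 0 outside."""
    inside = cv2.distanceTransform(mask.astype(np.uint8), cv2.DIST_L2, 5)
    outside = cv2.distanceTransform((1 - mask).astype(np.uint8), cv2.DIST_L2, 5)
    return inside + outside
```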
(4) transforming each pixel in the current frame image from the RGB color space to the YUV color space, and, on the basis of the YUV color attribute of each pixel in the current frame image, adding the position attribute of each pixel, thereby expanding the feature dimension of each pixel attribute to a high-dimensional space;
In an optional embodiment, the present invention can use a bilateral grid Γ composed of regular sampling points ν: the pixels are first lifted to the high-dimensional space and then distributed onto the grid sampling points, and the graph structure is built on the grid sampling points.
In an optional embodiment, when computing the data terms (T-links) and smooth terms (N-links) of the nodes of the high-dimensional graph structure, the T-links and N-links can be assigned according to the standard OneCut segmentation model; however, the embodiments of the present invention do not uniquely restrict the manner of assigning T-links and N-links.
The RGB color space and the YUV color space are both color models for describing the colors of an image. RGB (red, green, blue) is a space defined according to the colors recognized by the human eye and can represent most colors. However, image processing in machine vision does not usually work in the RGB color space, because it contains only the three color channels R, G, and B and mixes together image details such as hue, brightness, and saturation, which makes these details hard to process quantitatively. In the YUV space, each pixel has one luminance signal Y and two chrominance signals U and V. The luminance signal measures intensity; separating the luminance signal from the chrominance signals makes it possible to change the brightness without affecting the color. The YUV color space can be converted from the RGB color space: the color image is first converted to a grayscale image, and the three main color channels are extracted and turned into two additional chrominance signals that describe the color. A YUV color space converted from the RGB color space can also be inversely transformed back to the RGB color space. The conversion specifically includes the following sub-steps:
(4.1) each pixel in the current frame image is transformed from the RGB color space to the YUV color space by
$$\begin{bmatrix} c_y \\ c_u \\ c_v \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.147 & -0.289 & 0.436 \\ 0.615 & -0.515 & -0.100 \end{bmatrix} \begin{bmatrix} c_r \\ c_g \\ c_b \end{bmatrix} \tag{1}$$
wherein $[c_y\ c_u\ c_v]^T$ indicates the pixel value in the YUV color space and $[c_r\ c_g\ c_b]^T$ indicates the value in the RGB color space;
(4.2) the feature dimension of each pixel attribute is expanded to the high-dimensional space by
$$b(x) = [c_y, c_u, c_v, l]^T \tag{2}$$
wherein $l$ indicates the position attribute of pixel $x$ and $b(x)$ indicates the corresponding high-dimensional attribute of pixel $x$.
The three color attributes and one position attribute are combined to lift each pixel to a four-dimensional feature space: assuming the color attribute of a pixel is $[c_y, c_u, c_v]^T$, the lifted high-dimensional attribute is $b(x) = [c_y, c_u, c_v, l]^T$. Here $l$, the position attribute of each pixel, is obtained by computing the distance from each pixel to the nearest point of the estimated contour line; the contour estimation contains the motion information of the target.
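The following sketch illustrates this 4-D lifting, under the assumption that the frame is a BGR image as read by OpenCV and that the distance map of step (3) is available:

```python
import cv2
import numpy as np

# Sketch of the lifting b(x) = [Y, U, V, l]^T of formula (2): convert the
# frame to YUV and stack the contour-distance map as the fourth channel.
def lift_to_feature_space(frame_bgr, dist_map):
    yuv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YUV).astype(np.float32)
    return np.dstack([yuv, dist_map.astype(np.float32)])  # shape H x W x 4
```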
The pixels of the color space can be mapped to the high-dimensional space by various interpolation schemes; common schemes include nearest-neighbor interpolation, linear interpolation, and exponential interpolation.
In an optional embodiment, to reduce the amount of computation, the mapped high-dimensional space needs to be downsampled. For example, the nearest-neighbor interpolation method can be used: after the above high-dimensional mapping, the value of each dimension of a high-dimensional node is rounded in the nearest-neighbor manner. If the $j$-th dimension value of high-dimensional node $i$ is $\xi_i^j$, the nearest-neighbor interpolation is given by
$$\tilde{\xi}_i^j = \left\lfloor \xi_i^j + \tfrac{1}{2} \right\rfloor \tag{3}$$
wherein $\lfloor\cdot\rfloor$ indicates the maximum integer not greater than its argument.
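A small sketch of this rounding, assuming an additional per-dimension grid cell size (a parameter this sketch introduces, not one fixed by the patent):

```python
import numpy as np

# Nearest-neighbour downsampling of formula (3): every dimension of a lifted
# node is divided by its grid cell size and rounded with floor(v + 0.5),
# snapping the node to a regular grid sample.
def snap_to_grid(features, cell_sizes):
    """features: (N, 4) lifted pixel attributes; cell_sizes: per-dimension step."""
    scaled = features / np.asarray(cell_sizes, dtype=np.float32)
    return np.floor(scaled + 0.5).astype(np.int64)
```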
(5) converting the smooth term from each current-frame pixel to its previous-frame neighborhood pixels into a data term according to the labels of the previous-frame pixels, superimposing the converted data term onto the data term computed from the global probability model, and taking the superimposed data term as the data term of the energy function model, to obtain the energy function model;
In an optional embodiment, it is noted that traditional graph-cut models are mostly based on a single-layer graph structure, i.e., the whole current frame image or a local narrowband region of it is modeled. This way of building the graph is effective and necessary in image segmentation; but when extended to video segmentation, since object motion is temporally continuous and the displacement of an object between two adjacent frames is small, building the graph structure only in the current frame cannot make full use of the segmentation result of the previous frame. Building a graph structure over two adjacent frames or over multiple frames can therefore effectively guarantee the spatio-temporal continuity of the segmentation result and the accuracy of the final segmentation.
However, as a multi-layer graph structure gains layers, the numbers of nodes and edges of the graph increase sharply and the computation time cost becomes enormous. The present invention therefore emulates the multi-layer graph structure by converting the N-links to the previous frame. Specifically, since the segmentation result of the previous frame divides all its pixels into two classes, a previous-frame pixel adjacent to a current-frame pixel to be segmented is either marked as foreground or marked as background. Converting the N-link from a current-frame pixel to its previous-frame neighborhood pixels into a T-link according to the labels of the previous-frame pixels reduces the amount of computation of the final solve on the one hand and, on the other hand, uses optical flow as a spatio-temporal constraint.
The amount of computation for solving a graph structure depends on the number of its nodes and the number of edges connecting each node. For an unoptimized multi-layer graph structure, nodes and edges increase sharply with the number of layers. Considering the space and time complexity of the computation, the present invention uses inter-frame smooth term conversion to make a single-layer graph structure equivalent to a multi-layer one. The traditional inter-frame conversion is as follows:
$$D_{t-1}(x) = \sum_{y \in \mathcal{N}_y} w_{xy}\,|s_y| \tag{4}$$
wherein $\mathcal{N}_y$ indicates the neighborhood pixel set of the pixel $y$ co-located with pixel $x$ in the previous frame, $|s_y|$ indicates the label value of pixel $y$ (0 or 1), and $w_{xy}$ indicates the similarity between pixel $x$ and previous-frame neighborhood pixel $y$, with $w_{xy} = \frac{1}{\|x-y\|}\exp\!\left(-\frac{\Delta I^2}{2\sigma^2}\right)$, where $\|x-y\|$ indicates the spatio-temporal distance between pixels $x$ and $y$, $\Delta I$ indicates their color distance, and $\sigma$ indicates the gradient mean of the whole image. When the spatio-temporal distance $\|x-y\|$ is computed, the time interval between adjacent frames is taken as 1, i.e., one intra-frame pixel unit. The spatio-temporal distance of pixels $x$ and $y$ computed this way is
$$\|x-y\| = \sqrt{(x_x - y_x)^2 + (x_y - y_y)^2 + 1} \tag{5}$$
wherein $x_x$ and $x_y$ respectively indicate the horizontal and vertical coordinates of pixel $x$, and similarly for pixel $y$. When computing the spatio-temporal distance, formula (5) roughly treats the time dimension and the space dimension equally and thus ignores their difference in the spatio-temporal continuity.
Considering this difference between the time dimension and the space dimension in computing spatio-temporal continuity, one unit of the time dimension and one unit of the space dimension cannot be treated equally when computing neighborhood relations. Therefore, in an optional embodiment, the inter-frame smooth term conversion of the present invention is based on an optical flow constraint: the optical flow value of each current-frame pixel is solved, i.e., the optical flow vector indexes the previous-frame pixel $y'$ corresponding to current-frame pixel $x$. The optical flow mapping is
$$y' = f(x) \tag{6}$$
In formula (6), $f$ indicates the optical flow mapping: the optical flow field is obtained through the optical flow pyramid, and the previous-frame pixel $y'$ corresponding to pixel $x$ is found through the position mapping $f$. Fig. 2 shows a narrowband optical flow display provided by an embodiment of the present invention.
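As an illustration, the sketch below computes a pyramidal dense flow field from the current frame to the previous frame and reads off $y'$ for every pixel; Farneback flow is used here as a stand-in for whichever pyramid flow implementation is meant:

```python
import cv2
import numpy as np

# Sketch of the flow mapping f in formula (6): flow is estimated from the
# current frame towards the previous frame, so adding it to each pixel's
# coordinates yields the matched previous-frame position y'.
def map_to_previous_frame(curr_gray, prev_gray):
    flow = cv2.calcOpticalFlowFarneback(
        curr_gray, prev_gray, None, pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    h, w = curr_gray.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    # y' = f(x): coordinates of the previous-frame pixel matched to each x.
    return xs + flow[..., 0], ys + flow[..., 1]
```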
In an optional embodiment, in step (5), converting the smooth term from each current-frame pixel to its previous-frame neighborhood pixels into a data term according to the labels of the previous-frame pixels specifically includes:
converting the smooth term from current-frame pixel $x$ to its previous-frame neighborhood pixels into the data term $D'(x)$ according to the labels of the previous-frame pixels by
$$D'(x) = \sum_{y' \in \mathcal{N}_{y'}} \omega_{xy'}\,|s_{y'}| \tag{7}$$
wherein $y'$ denotes the previous-frame pixel to which the optical flow value of the current frame maps current-frame pixel $x$, $\mathcal{N}_{y'}$ is the intra-frame neighborhood pixel set of $y'$, $\omega_{xy'}$ indicates the similarity between pixel $x$ and previous-frame neighborhood pixel $y'$, and $|s_{y'}|$ indicates the label value of pixel $y'$. Then, according to whether pixel $y'$ is marked as foreground or background, the data term converted by formula (7) is added to the source-node edge or the sink-node edge of the graph, respectively.
Wherein $\omega_{xy'}$ is calculated as
$$\omega_{xy'} = \frac{1}{\|x-y'\|}\exp\!\left(-\frac{\Delta I^2}{2\sigma^2}\right), \qquad \|x-y'\| = \sqrt{(x_x - y_{x'})^2 + (x_y - y_{y'})^2} \tag{8}$$
wherein $\|x-y'\|$ indicates the spatio-temporal distance between pixel $x$ and previous-frame neighborhood pixel $y'$, $\Delta I$ indicates the color distance between pixel $x$ and previous-frame neighborhood pixel $y'$, $\sigma$ indicates the gradient mean of the image, $(x_x, x_y)$ indicates the horizontal and vertical coordinates of pixel $x$, and $(y_{x'}, y_{y'})$ indicates the horizontal and vertical coordinates of pixel $y'$.
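The following sketch combines formulas (7) and (8) under the assumed boundary-weight form above; the helper names are hypothetical:

```python
import numpy as np

# Weight of formula (8): a colour-difference Gaussian (sigma is the image
# gradient mean) divided by the spatial distance between x and y'.
def omega_xy(x_coord, yp_coord, delta_i, sigma):
    spatial = np.hypot(x_coord[0] - yp_coord[0], x_coord[1] - yp_coord[1])
    return np.exp(-delta_i ** 2 / (2.0 * sigma ** 2)) / max(spatial, 1e-6)

# Conversion of formula (7): the labelled neighbourhood of the flow-matched
# position y' collapses the smooth term of x into source/sink data terms.
def convert_interframe_smooth_term(x_coord, neighbours, labels, delta_is, sigma):
    """neighbours: coordinates of the y' neighbourhood; labels: s_{y'} in {0, 1}."""
    labels = np.asarray(labels)
    w = np.array([omega_xy(x_coord, n, d, sigma)
                  for n, d in zip(neighbours, delta_is)])
    to_source = float(w[labels == 1].sum())  # foreground-labelled neighbours
    to_sink = float(w[labels == 0].sum())    # background-labelled neighbours
    return to_source, to_sink
```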
In an optional embodiment, the energy function model obtained in step (5) is expressed as
$$E(S,\bar S) = \sum_{x}\bigl(D(x) + D'(x)\bigr) - \tau\,\|\theta_S - \theta_{\bar S}\|_{L_1} + \eta \sum_{(x,y)\in N} \omega_{xy}\,|s_x - s_y| \tag{9}$$
wherein $S$ indicates the foreground pixel set, $\bar S$ indicates the background pixel set, $D'(x)$ is the data term converted from the inter-frame smooth term N-link, and $D(x)$ indicates the data term computed from the global probability model. The global probability model consists of the foreground/background Gaussian mixture models; it is initialized on the segmentation result of the key frame, and its parameter update is completed according to the generated distance map. $\theta_S$ indicates the foreground histogram statistics, $\theta_{\bar S}$ indicates the background histogram statistics, $\tau$ is the weight of the foreground/background distribution difference, $\eta$ indicates the weight of the intra-frame smooth term, $\omega_{xy}$ indicates the similarity of adjacent pixels in the current frame, and $\|\theta_S - \theta_{\bar S}\|_{L_1}$ indicates the foreground/background similarity difference in the color histogram, namely the $L_1$ distance between foreground and background.
Wherein the intra-frame smooth term is calculated as
$$\sum_{(x,y)\in N} \omega_{xy}\,|s_x - s_y|, \qquad \omega_{xy} = \frac{1}{\|x-y\|}\exp\!\left(-\frac{\Delta I^2}{2\sigma^2}\right)$$
wherein $s_x$ and $s_y$ respectively indicate the labels of pixels $x$ and $y$, $N$ indicates the set of adjacent pixel pairs in the current frame, $\omega_{xy}$ indicates the similarity of pixels $x$ and $y$, $\|x-y\|$ indicates the spatio-temporal distance of pixels $x$ and $y$, $\Delta I$ indicates the color distance of pixels $x$ and $y$, and $\sigma$ is the gradient mean of the image.
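For illustration, a sketch of these intra-frame weights between horizontal neighbours; the colour distance is simplified here to an intensity difference, and the spatial distance between direct neighbours is 1:

```python
import numpy as np

# N-link weights between each pixel and its right-hand neighbour, using the
# same assumed weight form as above (Gaussian of the intensity difference,
# spatial distance 1 for direct neighbours).
def horizontal_nlink_weights(gray, sigma):
    """gray: float32 image; returns an (H, W-1) array of neighbour weights."""
    delta_i = np.abs(gray[:, 1:] - gray[:, :-1])
    return np.exp(-delta_i ** 2 / (2.0 * sigma ** 2))
```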
Wherein foreground pixels and background pixels are distinguished as follows:
after the estimated initial contour line of the target in the current frame image is obtained, each pixel is divided into foreground seed points and background seed points according to whether it lies inside the estimated initial contour line, by
$$\mathrm{Seeds}(x) = \begin{cases} 1, & M(x) = 1 \ \text{and} \ d(M(x)) > dis \\ 0, & M(x) = 0 \ \text{and} \ d(M(x)) > dis \\ -1, & \text{otherwise} \end{cases}$$
wherein $M$ indicates the mask matrix after mapping, in which the estimated initial contour line is mapped to the dividing line between the regions mapped as foreground and as background; $d(M(x))$ indicates the distance map generated from the mask matrix; $dis$ is the distance threshold; and $M(x)$ indicates the mask value of pixel $x$: through the contour transformation, if pixel $x$ is mapped as foreground then $M(x)$ takes the value 1, and if pixel $x$ is mapped as background then $M(x)$ takes the value 0. $\mathrm{Seeds}(x)$ therefore takes three values: when $\mathrm{Seeds}(x)$ is 1, $x$ is set as a foreground seed point; when $\mathrm{Seeds}(x)$ is 0, $x$ is set as a background seed point; and when $\mathrm{Seeds}(x)$ is $-1$, $x$ is set to the unknown region.
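A minimal sketch of this seed selection rule, assuming the mapped mask and its distance map are available as arrays:

```python
import numpy as np

# Pixels farther than the threshold dis from the estimated contour become
# seeds, labelled by the side of the mask they lie on; everything else stays
# in the unknown region (-1).
def select_seeds(mask, dist_map, dis):
    seeds = np.full(mask.shape, -1, dtype=np.int8)
    seeds[(mask == 1) & (dist_map > dis)] = 1  # foreground seed points
    seeds[(mask == 0) & (dist_map > dis)] = 0  # background seed points
    return seeds
```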
(6) solving the energy function model to obtain its solution, taking the current frame image as the previous frame image, and continuing to execute steps (1) to (5) until the video segmentation ends.
The present invention is further described below in conjunction with the drawings and specific embodiments.
The method flow of the invention is shown in Fig. 1 and is illustrated here taking the test video bear as an example:
(1) Obtain the accurate segmentation contour of the previous frame
The previous-frame segmentation result is assumed to be reliable; the previous frame may well be a key frame, and a key frame requires an interactively added accurate segmentation result. The global probability model, i.e., the foreground/background Gaussian mixture models, is initialized; the clustering algorithm uses kmeans++, and the numbers of foreground/background Gaussian components are both set to 5. From the segmentation result of the previous frame, the precise contour of the target object in the previous frame is obtained.
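A hedged sketch of this initialization, with scikit-learn's Gaussian mixture standing in for the patent's foreground/background model (the 'k-means++' initializer requires scikit-learn 1.1 or later):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# One 5-component Gaussian mixture per class, clustered with k-means++
# initialisation, fitted on the key frame's labelled pixels.
def init_global_model(fg_pixels, bg_pixels):
    """fg_pixels, bg_pixels: (N, 3) colour samples from the key-frame result."""
    fg_gmm = GaussianMixture(n_components=5, init_params='k-means++').fit(fg_pixels)
    bg_gmm = GaussianMixture(n_components=5, init_params='k-means++').fit(bg_pixels)
    return fg_gmm, bg_gmm
```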
(2) Preferably use sparse optical flow matching to obtain the initial position of the target in the current frame
The target contour of the previous frame is mapped to the current frame by sparse optical flow matching, i.e., each pixel on the contour is matched to its position in the current frame, thereby obtaining the initially estimated contour of the target in the current frame.
(3) Generate the distance map according to the initially estimated target contour
According to the initially estimated contour of the target in the current frame, the distance map of each pixel's distance to the contour is obtained by distance mapping: the closer a pixel is to the contour, the smaller its distance, and the farther from the contour, the larger its distance. The generated distance map is shown in Fig. 5.
(4) Convert the RGB color space to the YUV color space
The original RGB color space is converted by formula (1) above to obtain the YUV color space; the RGB color space is shown in Fig. 3 and the YUV color space in Fig. 4.
(5) High-dimensional space mapping
The feature dimension of the YUV color attribute is extended with the position attribute according to formula (2) above to obtain the high-dimensional space.
(6) Inter-frame N-link conversion
The inter-frame N-links are converted using formula (7) above, and the converted values are added to the corresponding T-links.
(7) Build the segmentation energy function model
Foreground and background seed points are chosen according to the distance map: pixels whose distance to the contour line exceeds a certain threshold are set as seed points, and apart from the seed points the remaining pixels are set as the unknown region; the foreground/background probability models are then updated. The data term of the energy function model mainly includes the part converted from the inter-frame smooth term N-link, computed as in formula (7) above; the data terms are the edges connecting ordinary nodes with the source and sink in the graph structure.
(8) Obtain the accurate segmentation result by the max-flow/min-cut algorithm
The final energy function model is given in formula (9) above. Solving this model is equivalent to solving a min-cut problem, and since the max-flow problem and the min-cut problem are dual, it finally amounts to solving the max flow of the graph, preferably with the maxflow algorithm. The accurate segmentation result of the current frame is obtained, the current frame is switched to the previous frame, and the above (1) to (7) continue until the video segmentation ends; the final segmentation result obtained is shown in Fig. 6.
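As an illustration of this step only, the sketch below builds a grid graph with PyMaxflow and solves the cut; for brevity it uses a uniform N-link weight where the patent uses $\omega_{xy}$:

```python
import maxflow  # PyMaxflow
import numpy as np

# Grid nodes for the frame, N-links between neighbours, and the superimposed
# data terms as T-links; the min-cut of this graph yields the labels.
def solve_segmentation(data_to_source, data_to_sink, nlink_weight):
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(data_to_source.shape)
    g.add_grid_edges(nodes, nlink_weight)                   # intra-frame N-links
    g.add_grid_tedges(nodes, data_to_source, data_to_sink)  # T-links
    g.maxflow()
    return g.get_grid_segments(nodes)  # boolean label map of the cut
```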
Those skilled in the art will readily understand that the foregoing is merely preferred embodiments of the present invention and is not intended to limit the present invention; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention shall all fall within the protection scope of the present invention.

Claims (4)

1. An interactive video segmentation method, characterized by comprising:
(1) according to the segmentation result of the previous frame image, obtaining the contour line of the target in the previous frame image;
(2) mapping the contour line of the target in the previous frame image to the current frame image, and matching every pixel on the contour line to its position in the current frame image, to obtain the estimated initial contour line of the target in the current frame image;
(3) based on the estimated initial contour line of the target in the current frame image, obtaining by distance mapping the shortest distance from each pixel to the estimated initial contour line, as the position attribute of that pixel;
(4) transforming each pixel in the current frame image from the RGB color space to the YUV color space, and, on the basis of the YUV color attribute of each pixel in the current frame image, adding the position attribute of each pixel, thereby expanding the feature dimension of each pixel attribute to a high-dimensional space;
(5) converting the smooth term from each current-frame pixel to its previous-frame neighborhood pixels into a data term according to the labels of the previous-frame pixels, superimposing the converted data term onto the data term computed from the global probability model, and taking the superimposed data term as the data term of the energy function model, to obtain the energy function model;
(6) solving the energy function model to obtain its solution, taking the current frame image as the previous frame image, and continuing to execute steps (1) to (5) until the video segmentation ends;
Wherein, step (4) specifically includes:
(4.1) transforming each pixel in the current frame image from the RGB color space to the YUV color space by
$$\begin{bmatrix} c_y \\ c_u \\ c_v \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.147 & -0.289 & 0.436 \\ 0.615 & -0.515 & -0.100 \end{bmatrix} \begin{bmatrix} c_r \\ c_g \\ c_b \end{bmatrix}$$
wherein $[c_y\ c_u\ c_v]^T$ indicates the pixel value in the YUV color space and $[c_r\ c_g\ c_b]^T$ indicates the value in the RGB color space;
(4.2) expanding the feature dimension of each pixel attribute to the high-dimensional space by $b(x) = [c_y, c_u, c_v, l]^T$, wherein $l$ indicates the position attribute of pixel $x$ and $b(x)$ indicates the corresponding high-dimensional attribute of pixel $x$;
in step (5), converting the smooth term from each current-frame pixel to its previous-frame neighborhood pixels into a data term according to the labels of the previous-frame pixels comprises:
converting the smooth term from current-frame pixel $x$ to its previous-frame neighborhood pixels into the data term $D'(x)$ according to the labels of the previous-frame pixels by
$$D'(x) = \sum_{y' \in \mathcal{N}_{y'}} \omega_{xy'}\,|s_{y'}|$$
wherein $y'$ denotes the previous-frame pixel to which the optical flow value of the current frame maps current-frame pixel $x$, $\mathcal{N}_{y'}$ is the intra-frame neighborhood pixel set of $y'$, $\omega_{xy'}$ indicates the similarity between pixel $x$ and previous-frame neighborhood pixel $y'$, and $|s_{y'}|$ indicates the label value of pixel $y'$;
the energy function model obtained in step (5) is expressed as
$$E(S,\bar S) = \sum_{x}\bigl(D(x) + D'(x)\bigr) - \tau\,\|\theta_S - \theta_{\bar S}\|_{L_1} + \eta \sum_{(x,y)\in N} \omega_{xy}\,|s_x - s_y|$$
wherein $S$ indicates the foreground pixel set, $\bar S$ indicates the background pixel set, $D'(x)$ is the data term converted from the inter-frame smooth term N-link, $D(x)$ indicates the data term computed from the global probability model, $\theta_S$ indicates the foreground histogram statistics, $\theta_{\bar S}$ indicates the background histogram statistics, $\tau$ is the weight of the foreground/background distribution difference, $\eta$ indicates the weight of the intra-frame smooth term, $\omega_{xy}$ indicates the similarity of adjacent pixels in the current frame, and $\|\theta_S - \theta_{\bar S}\|_{L_1}$ indicates the foreground/background similarity difference in the color histogram.
2. The method according to claim 1, characterized in that $\omega_{xy'}$ is calculated as
$$\omega_{xy'} = \frac{1}{\|x-y'\|}\exp\!\left(-\frac{\Delta I^2}{2\sigma^2}\right), \qquad \|x-y'\| = \sqrt{(x_x - y_{x'})^2 + (x_y - y_{y'})^2}$$
wherein $\|x-y'\|$ indicates the spatio-temporal distance between pixel $x$ and previous-frame neighborhood pixel $y'$, $\Delta I$ indicates the color distance between pixel $x$ and previous-frame neighborhood pixel $y'$, $\sigma$ indicates the gradient mean of the image, $(x_x, x_y)$ indicates the horizontal and vertical coordinates of pixel $x$, and $(y_{x'}, y_{y'})$ indicates the horizontal and vertical coordinates of pixel $y'$.
3. The method according to claim 1 or 2, characterized in that the intra-frame smooth term is calculated as
$$\sum_{(x,y)\in N} \omega_{xy}\,|s_x - s_y|, \qquad \omega_{xy} = \frac{1}{\|x-y\|}\exp\!\left(-\frac{\Delta I^2}{2\sigma^2}\right)$$
wherein $s_x$ and $s_y$ respectively indicate the labels of pixels $x$ and $y$, $N$ indicates the set of adjacent pixel pairs in the current frame, $\omega_{xy}$ indicates the similarity of pixels $x$ and $y$, $\|x-y\|$ indicates the spatio-temporal distance of pixels $x$ and $y$, $\Delta I$ indicates the color distance of pixels $x$ and $y$, and $\sigma$ is the gradient mean of the image.
4. The method according to claim 1 or 2, characterized in that foreground pixels and background pixels are distinguished as follows:
after the estimated initial contour line of the target in the current frame image is obtained, each pixel is divided into foreground seed points and background seed points according to whether it lies inside the estimated initial contour line, by
$$\mathrm{Seeds}(x) = \begin{cases} 1, & M(x) = 1 \ \text{and} \ d(M(x)) > dis \\ 0, & M(x) = 0 \ \text{and} \ d(M(x)) > dis \\ -1, & \text{otherwise} \end{cases}$$
wherein $M$ indicates the mask matrix after mapping, $d(M(x))$ indicates the distance map generated from the mask matrix, $dis$ is the distance threshold, and $M(x)$ indicates the mask value of pixel $x$; $\mathrm{Seeds}(x)$ takes three values: when $\mathrm{Seeds}(x)$ is 1, $x$ is set as a foreground seed point; when $\mathrm{Seeds}(x)$ is 0, $x$ is set as a background seed point; and when $\mathrm{Seeds}(x)$ is $-1$, $x$ is set to the unknown region.
CN201710794283.5A 2017-09-06 2017-09-06 An interactive video segmentation method Expired - Fee Related CN107590818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710794283.5A 2017-09-06 2017-09-06 An interactive video segmentation method

Publications (2)

Publication Number Publication Date
CN107590818A CN107590818A (en) 2018-01-16
CN107590818B (en) 2019-10-25

Family

ID=61051076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710794283.5A Expired - Fee Related CN107590818B (en) 2017-09-06 2017-09-06 An interactive video segmentation method

Country Status (1)

Country Link
CN (1) CN107590818B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108389217A * 2018-01-31 2018-08-10 华东理工大学 An image synthesis method based on gradient-domain mixing
CN108961261B (en) * 2018-03-14 2022-02-15 中南大学 Optic disk region OCT image hierarchy segmentation method based on space continuity constraint
CN109978891A (en) * 2019-03-13 2019-07-05 浙江商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN110163873B (en) * 2019-05-20 2023-02-24 长沙理工大学 Bilateral video target segmentation method and system
CN111985266A (en) * 2019-05-21 2020-11-24 顺丰科技有限公司 Scale map determination method, device, equipment and storage medium
CN110610453B (en) * 2019-09-02 2021-07-06 腾讯科技(深圳)有限公司 Image processing method and device and computer readable storage medium
CN112784630A (en) * 2019-11-06 2021-05-11 广东毓秀科技有限公司 Method for re-identifying pedestrians based on local features of physical segmentation
CN111539993B (en) * 2020-04-13 2021-10-19 中国人民解放军军事科学院国防科技创新研究院 Space target visual tracking method based on segmentation
CN113191266B (en) * 2021-04-30 2021-10-22 江苏航运职业技术学院 Remote monitoring management method and system for ship power device
CN116912246B (en) * 2023-09-13 2023-12-29 潍坊医学院 Tumor CT data processing method based on big data

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101527043A (en) * 2009-03-16 2009-09-09 江苏银河电子股份有限公司 Video picture segmentation method based on moving target outline information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7657060B2 (en) * 2004-03-31 2010-02-02 Microsoft Corporation Stylization of video


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Efficient Segmentation and Reconstruction of Three-dimensional Objects; 张佳伟; China Master's Theses Full-text Database, Information Science and Technology; 2016-07-15; full text *
Research and Implementation of Interactive Segmentation of Moving Objects in Video; 韩军 et al.; Journal of Image and Graphics; 2003-02-28; full text *
Research on Content-based Segmentation of Moving Objects in Video; 包红强; China Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology; 2005-11-15; full text *
Reconstruction and Enhancement of Video Scenes; 章国锋; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2010-07-15; full text *

Also Published As

Publication number Publication date
CN107590818A (en) 2018-01-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
Granted publication date: 20191025
Termination date: 20200906