CN102750712A - Moving object segmenting method based on local space-time manifold learning


Info

Publication number: CN102750712A (application); granted as CN102750712B
Application number: CN201210187511.XA
Authority: CN (China)
Inventors: 林倞, 江波, 徐元璐, 梁小丹
Applicant/Assignee: Sun Yat-sen University
Priority and filing date: 2012-06-07
Publication dates: 2012-10-24 (CN102750712A), 2015-06-17 (CN102750712B)
Other languages: Chinese (zh)
Legal status: Granted, Active

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a moving object segmentation method based on local spatio-temporal manifold learning. The method comprises the following steps: 1. partitioning the input video into spatio-temporal cubes and extracting an illumination-invariant feature; 2. building a background model based on local spatio-temporal manifold learning; 3. segmenting moving objects with the local spatio-temporal manifolds and maintaining the model by online updating. The method effectively describes local spatio-temporal variation, handles the scale adaptability of the SIFT (scale-invariant feature transform) feature point set extracted from the expanded images of the input video, overcomes the failure to suppress dynamic backgrounds and illumination changes during moving object segmentation, and provides reliable and effective moving targets for an intelligent surveillance platform.

Description

Moving object segmentation method based on local spatio-temporal manifold learning
Technical field
The present invention relates to the field of video surveillance, and specifically to moving object segmentation technology, covering illumination-invariant feature extraction, offline learning of local spatio-temporal manifolds, separation of moving targets from the background, and online updating maintenance of local spatio-temporal manifolds.
Technical background
At present, security systems for public places built around video surveillance are being widely deployed. Unlike traditional manual video monitoring, which incurs huge labor costs and suffers from frequent false alarms and missed detections, intelligent video surveillance is intuitive, accurate, and timely, carries rich information, and is therefore receiving growing attention and adoption in industry. In recent years, with the rapid development of computing power and networks, the computation and transmission bottlenecks of earlier intelligent surveillance have been overcome, and sophisticated, highly effective intelligent surveillance techniques are gradually replacing the simple, efficiency-oriented traditional ones. Moving object segmentation, as a fundamental technique of intelligent video surveillance, has long been a focus of numerous research institutions.
The fundamental purpose of moving object segmentation is to remove the interference that the background of a surveillance video causes to high-level intelligent video analysis (moving target tracking, line-crossing detection, behavior recognition, etc.). Its basic approach is to extract features from the background of the surveillance video and then build a background model with a statistical/probabilistic model. When a new frame arrives, the method judges whether the image can be explained by the background model, thereby deciding whether a foreground object has appeared and segmenting it out.
Traditional moving object segmentation methods, such as the Gaussian mixture model (GMM), direct frame differencing, and background mean-image methods, all rest on the basic assumptions that the scene in the surveillance video is essentially static (and the camera does not shake) and that no sudden illumination change occurs, which confines their applicability to some simple indoor scenarios. As video surveillance keeps spreading to more and more complex outdoor scenes, traditional moving object segmentation methods can no longer meet the demands of intelligent video analysis.
Present-day surveillance videos commonly exhibit the following problems:
1. Dynamic background, such as trees waving in the wind, fountains, water surfaces, and weather changes (rain, snow) appearing in the scene; a simple background model cannot fully represent every situation that may occur in such dynamic backgrounds.
2. Illumination variation, such as switching lights on and off in an indoor scene or the automatic exposure compensation of the camera, which causes significant global changes in the video.
3. Camera shake: constrained by objective conditions, the camera shooting the surveillance video inevitably shakes to some degree. This remains a hard problem for current moving target tracking, and no simple and effective method yet solves it well.
To solve problems 1 and 2 above, many improved moving object segmentation techniques have been proposed. By the type of background model, they can roughly be divided into three classes: pixel-based segmentation, image-block-based segmentation, and spatio-temporal-cube-based segmentation.
Pixel-based segmentation maintains several simple probabilistic models for every pixel of the surveillance video. Unlike classical methods that simply use pixel values as modeling samples, current pixel-based moving object segmentation techniques generally use a local texture code, such as the local binary pattern (LBP), to describe a pixel within its local neighborhood. Such features record only the relative order between the current pixel and its neighboring pixels, which amounts to recording the texture variation pattern of the local region; since the relative order of pixel values stays essentially unchanged under illumination variation, this improves the ability to describe dynamic scenes and illumination changes in video. A recent study of this kind is "Modeling pixel process with scale invariant local patterns for background subtraction in complex scenes", published at CVPR 2010. Pixel-based segmentation techniques are simple and computationally efficient, but because local texture features can only describe very simple variations, they perform poorly on videos with more complex dynamic backgrounds and illumination changes.
Image-block-based segmentation, unlike pixel-based segmentation, takes image blocks of fixed size as the basic modeling units and describes the background with principal component analysis (PCA), discriminative models (generalized 1-SVM), and the like. These methods work better in many fairly complex scenes, but they have two limitations: 1. for surveillance videos of long duration, many models exhibit obvious degradation; 2. the basic assumption of image blocks separates the video's temporal and spatial dimensions, so the representation of spatio-temporal variation is not comprehensive enough. In addition, there is no block-based local texture feature, so these methods are rather sensitive to illumination variation.
Spatio-temporal-cube-based segmentation goes one step beyond image blocks: it partitions the video into spatio-temporal cubes and models the background under the assumption that each cube describes a simple spatio-temporal change of the background in the video. Common models include sparse dictionary representation (sparse coding) and spatio-temporal saliency. Cube-based segmentation can describe the variation of dynamic backgrounds in video very well, at the corresponding cost of higher model complexity and computation. As with image-block-based segmentation, no feature has yet been proposed for describing the texture statistics inside a spatio-temporal cube, so how to improve these methods under illumination variation remains an open question.
In view of the above description of moving object segmentation methods, an effective segmentation technique should focus on solving two problems: 1. how to describe the variation of dynamic backgrounds with a relatively simple model; 2. how to design an illumination-invariant feature for the basic modeling unit.
Summary of the invention
The present invention aims to solve the problems of complex dynamic backgrounds and sudden illumination changes in moving object segmentation, so as to provide stable and reliable foreground regions for moving target tracking, target behavior recognition, and line-crossing/cross-region detection in intelligent video surveillance.
The technical scheme of the present invention is a moving object segmentation method based on local spatio-temporal manifold learning, comprising the following steps:
1) inputting an offline video;
2) partitioning the offline video into spatio-temporal cubes and extracting center-symmetric spatio-temporal local texture coding features, obtaining spatio-temporal texture variation description sequences of the spatio-temporal cube features;
3) building a background model based on local spatio-temporal manifold learning from the spatio-temporal texture variation description sequences of the spatio-temporal cube features;
4) inputting a live video;
5) partitioning the live video into spatio-temporal cubes and extracting center-symmetric spatio-temporal local texture coding features;
6) obtaining the moving target and background parts by judging the distance between the live video's spatio-temporal cubes and the spatio-temporal cubes predicted by the background model;
7) updating and maintaining the background model according to the background parts;
8) if a new live video is input, jumping to step 4); otherwise ending.
Further, step 2) comprises the following steps:
21) partitioning the offline video into spatio-temporal cubes to obtain the spatio-temporal cube sequence of each unit position in the video;
22) extracting center-symmetric spatio-temporal local texture coding features from the offline video to obtain the spatio-temporal texture variation description sequences of the spatio-temporal cube features.
Further, the center-symmetric spatio-temporal local texture coding feature extraction of step 22) comprises the following steps:
221) taking any pixel of a spatio-temporal cube, computing its feature vector within the 3*3*3 cubic neighborhood of the pixel, and dividing this 3*3*3 cube into 4 spatio-temporal planes on which the spatio-temporal local texture codes of the pixel are extracted;
222) using pixel pairs symmetric about the center pixel, encoding each pair by comparison against a set scale threshold to compute the pair's center-symmetric spatio-temporal local texture code, and concatenating the center-symmetric spatio-temporal local texture codes into one long feature vector as the spatio-temporal texture variation description of the cube.
Further, step 3) comprises the following steps:
31) unfolding the spatio-temporal cube sequence obtained in step 21) into a sequence matrix;
32) decomposing the sequence matrix by principal component analysis, setting a threshold on the eigenvalues of the principal component analysis of the sequence matrix, and taking the d eigenvalues greater than the threshold as the structure description parameters of the local spatio-temporal manifold;
33) re-expressing the sequence matrix with the structure description parameters of the local spatio-temporal manifold to obtain the state sequence of the local spatio-temporal manifold;
34) computing, by linear fitting, the variation description parameters of the local spatio-temporal manifold and the error matrix E left after linearly fitting the manifold state variation;
35) decomposing the error matrix E by principal component analysis, setting a threshold on the eigenvalues of the principal component analysis of the error matrix E, and taking the d_ε eigenvalues greater than the threshold as the generative noise description parameters.
Further, step 6) predicts the next spatio-temporal cube's local spatio-temporal manifold state and feature vector from the local spatio-temporal manifold and sets a decision distance threshold: if the Euclidean distance between a pixel's feature vector in the live video and the predicted feature vector is less than the threshold, the pixel belongs to the background part; otherwise it is a moving target point.
Further, in step 7) the background model is updated according to the background parts by an incremental update that combines a set update threshold with the CCIPCA algorithm.
To keep the background model reliable and stable, the present technical scheme builds the background model from an offline training video. The scheme effectively describes local spatio-temporal variation, handles the scale adaptability of the SIFT feature point set in the expanded images of the input video, overcomes the failure to suppress dynamic backgrounds and illumination changes during moving object segmentation, and provides reliable and effective moving targets for an intelligent surveillance platform.
Description of drawings
Fig. 1 is the system block diagram of the present invention;
Fig. 2 is a schematic diagram of the spatio-temporal illumination-invariant symmetric texture coding in the embodiment;
Fig. 3 shows the moving object segmentation results.
Specific embodiments
As shown in Fig. 1, the present invention includes the following steps:
1) inputting an offline video;
2) partitioning the offline video into spatio-temporal cubes and extracting center-symmetric spatio-temporal local texture coding features, obtaining spatio-temporal texture variation description sequences of the spatio-temporal cube features;
3) building a background model based on local spatio-temporal manifold learning from the spatio-temporal texture variation description sequences;
4) inputting a live video;
5) partitioning the live video into spatio-temporal cubes and extracting center-symmetric spatio-temporal local texture coding features;
6) obtaining the moving target and background parts by judging the distance between the live video's spatio-temporal cubes and the spatio-temporal cubes predicted by the background model;
7) updating and maintaining the background model according to the background parts;
8) if a new live video is input, jumping to step 4); otherwise ending.
The concrete implementation of each step is as follows.
(1) Step 2): spatio-temporal cube partition and center-symmetric spatio-temporal local texture coding feature extraction on the offline video
A spatio-temporal cube is the basic analysis unit, segmented from the input video, that fuses temporal and spatial variation. In the system, for every t consecutive frames we extract an h*w image block at the same unit position of each frame, then stack all t image blocks into an h*w*t spatio-temporal cube (4*4*5 is used uniformly in the system). Following this principle, the input video can be divided into spatio-temporal cube sequences V_i = {v_{i,1}, v_{i,2}, ..., v_{i,n}, ...}, where i indexes the unit position.
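As a minimal sketch of this partition (assuming a grayscale video stored as a numpy array of shape (frames, height, width); the function name and the dictionary layout are illustrative, not from the patent):

```python
import numpy as np

def partition_cubes(video, h=4, w=4, t=5):
    """Split a video (frames, height, width) into h*w*t spatio-temporal
    cubes; returns one cube sequence V_i per unit position i = (row, col)."""
    F, H, W = video.shape
    cubes = {}
    for n in range(F // t):                     # cube index along time
        clip = video[n * t:(n + 1) * t]         # t consecutive frames
        for r in range(H // h):
            for c in range(W // w):
                block = clip[:, r * h:(r + 1) * h, c * w:(c + 1) * w]
                cubes.setdefault((r, c), []).append(block)
    return cubes
```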
To extract a feature on the spatio-temporal cube that can resist illumination variation, the present invention proposes an improved center-symmetric spatio-temporal local texture pattern (CS-STLTP); the feature extraction process is shown in Fig. 2.
Unlike other local texture codes, which are computed only over the spatial (3*3) neighborhood of a pixel, we compute the code within a three-dimensional pixel neighborhood in order to include spatio-temporal change information. To keep feature computation efficient, the feature of any pixel in a spatio-temporal cube is computed inside the 3*3*3 cubic neighborhood of that pixel. From the viewpoint of solid geometry, to express all spatio-temporal changes completely, this 3*3*3 cube can be divided into four spatio-temporal planes; a local texture code is computed on each plane, and the codes are finally concatenated into one long feature vector as the spatio-temporal change description of the pixel.
Considering that such a feature vector is 4 times as long as before (originally only one 3*3 plane was computed; now four 3*3 spatio-temporal planes are), a center-symmetric coding scheme is adopted to halve the feature vector, which preserves a good description of the spatio-temporal changes while keeping execution efficient. In center-symmetric coding, instead of comparing the center pixel one by one with its 8 adjacent pixels to compute a texture code, the center pixel is ignored and its 8 adjacent pixels are divided into 4 pixel pairs symmetric about the center; each pair is then compared separately, finally yielding one texture code. This halves the number of comparisons per plane from 8 to 4 and speeds up feature computation.
In real surveillance videos, owing to various problems of the capture device, the acquired video generally contains some noise. To suppress the effect of noise on the feature, we do not simply compare the magnitudes of two pixel values; instead, a scale-variation threshold is set: if the compared pixel value lies within the scale-variation range of the reference pixel, the two pixels are regarded as the same type and the subtle difference between them is ignored; if it lies outside that range, two cases are distinguished according to the order of magnitude. In our system, the coding rule for comparing pixel values is set as follows:
$$
S_\tau(p_i, p_j) =
\begin{cases}
1, & p_j \ge (1+\tau)\,p_i \\
0, & (1-\tau)\,p_i < p_j < (1+\tau)\,p_i \\
-1, & p_j \le (1-\tau)\,p_i
\end{cases}
$$

where p_i and p_j are the values of the two pixels in a pair and S_τ(p_i, p_j) is the texture code.
According to the above feature computation method, the feature of one pixel is 4*4 = 16 dimensional (48 dimensional if the feature is extracted per color channel). The center-symmetric spatio-temporal local texture patterns (CS-STLTP) of all pixels in a spatio-temporal cube are concatenated into one long feature vector as the spatio-temporal texture variation description of the cube.
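A minimal sketch of the per-pixel code follows; the threshold value τ and, in particular, the choice of the four planes are our assumptions (the patent only states that the 3*3*3 neighborhood splits into four spatio-temporal planes; here we take the four planes through the center that contain the time axis):

```python
import numpy as np

def s_tau(p_i, p_j, tau):
    """Ternary comparison of one center-symmetric pixel pair (scale threshold tau)."""
    if p_j >= (1 + tau) * p_i:
        return 1
    if p_j <= (1 - tau) * p_i:
        return -1
    return 0

# 4 center-symmetric pairs inside a 3x3 plane, center at (1, 1)
PAIRS = [((0, 0), (2, 2)), ((0, 1), (2, 1)), ((0, 2), (2, 0)), ((1, 0), (1, 2))]

def cs_stltp(vol, tau=0.05):
    """16-dim CS-STLTP code of the center pixel of a 3*3*3 volume vol[t, y, x]."""
    assert vol.shape == (3, 3, 3)
    vol = vol.astype(float)
    planes = [
        vol[:, 1, :],                                                # t-x plane
        vol[:, :, 1],                                                # t-y plane
        np.stack([vol[t, [0, 1, 2], [0, 1, 2]] for t in range(3)]),  # t-diagonal
        np.stack([vol[t, [0, 1, 2], [2, 1, 0]] for t in range(3)]),  # t-antidiagonal
    ]
    return np.array([s_tau(pl[a], pl[b], tau)
                     for pl in planes for a, b in PAIRS], dtype=np.int8)
```

Concatenating cs_stltp over all pixels of a cube (per color channel if desired) yields the cube's feature vector.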
(2) Step 3): building the background model based on local spatio-temporal manifold learning from the spatio-temporal texture variation description sequences of the spatio-temporal cube features
Given the spatio-temporal cube feature description sequences obtained from the offline training video in step 2), this step describes the variation relation within each sequence with a local spatio-temporal manifold.
To simplify the structure and variation of the local spatio-temporal manifold on which a spatio-temporal cube sequence lies, we first assume that the cube sequence satisfies local linear change and therefore describe it with a linear dynamic system (LDS). The same assumption has been widely accepted and used in the research field of dynamic textures: many complex texture variations with intricate, hard-to-describe motion, such as flame, smoke, and water surfaces, can be modeled, synthesized, and recognized with linear dynamic systems. We therefore assume that the dynamic background present in a surveillance video can be described reasonably well by a linear dynamic system; in the present system an autoregressive moving average (ARMA) model is adopted for the modeling.
The structure and variation of the local spatio-temporal manifold of a spatio-temporal cube sequence are therefore described as

$$v_{i,n} = C_{i,n} z_{i,n} + \omega_{i,n}$$
$$z_{i,n+1} = A_{i,n} z_{i,n} + B_{i,n} \varepsilon_{i,n}$$
$$\omega_{i,n} \sim N(0, \Sigma_{\omega_i}), \qquad \varepsilon_{i,n} \sim N(0, 1)$$

where C_{i,n} describes the structure of the local spatio-temporal manifold (from the viewpoint of linear algebra, the basis of the manifold), z_{i,n} is the current manifold state, A_{i,n} and B_{i,n} describe the variation of the manifold state, and ω_{i,n} and ε_{i,n} are, respectively, the noise in the spatio-temporal cube and the noise in the manifold state variation. In the system, the dimension of the local spatio-temporal manifold of a cube sequence is assumed to be d, and the dimension of the generative noise in the manifold state variation is d_ε.
To determine the parameters of the above model, the concrete steps are as follows.
Unfold the spatio-temporal cube sequence V_i = {v_{i,1}, v_{i,2}, ..., v_{i,n}} into a sequence matrix W_{i,n}, each column of which is the feature vector of one spatio-temporal cube. The description of the sequence is thereby converted into the description of a linear space in the sense of linear algebra.
Under the basic assumption that the dimension of the local spatio-temporal manifold is d, C_{i,n} can be regarded as a basis of this manifold, and the principal components are chosen as this special basis. Apply principal component analysis (PCA) to the matrix W_{i,n}; the first d principal components (PCs) of W_{i,n} constitute the structure description C_{i,n} of the local spatio-temporal manifold.
With the obtained structure description C_{i,n}, re-represent the spatio-temporal cube sequence V_i = {v_{i,1}, v_{i,2}, ..., v_{i,n}} to obtain the local spatio-temporal manifold state Z_i = {z_{i,1}, z_{i,2}, ..., z_{i,n}} corresponding to each spatio-temporal cube.
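A minimal sketch of this decomposition (whether the patent mean-centers W before PCA is not stated; centering is the usual PCA convention and is an assumption here):

```python
import numpy as np

def fit_manifold_structure(W, d):
    """W: sequence matrix, one cube feature vector per column.
    Returns the manifold structure C (first d principal components)
    and the state sequence Z, one state z_{i,n} per column."""
    mean = W.mean(axis=1, keepdims=True)
    U, S, _ = np.linalg.svd(W - mean, full_matrices=False)
    C = U[:, :d]              # basis of the local spatio-temporal manifold
    Z = C.T @ (W - mean)      # re-expressed cube sequence (manifold states)
    return C, Z, mean
```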
Assuming that the manifold states corresponding to the cube sequence change linearly, in the whole state sequence Z_i = {z_{i,1}, z_{i,2}, ..., z_{i,n}} each next state z_{i,n} is obtained from the previous state z_{i,n-1} through the state variation description A_{i,n}. A_{i,n} can therefore be solved by linear fitting:

$$\min_{A_{i,n}} \left\| \begin{bmatrix} z_{i,2} & z_{i,3} & \cdots & z_{i,n} \end{bmatrix} - A_{i,n} \begin{bmatrix} z_{i,1} & z_{i,2} & \cdots & z_{i,n-1} \end{bmatrix} \right\|$$
After computing A_{i,n}, the error left after linearly fitting the manifold state variation is

$$E_i = \begin{bmatrix} z_{i,2} & z_{i,3} & \cdots & z_{i,n} \end{bmatrix} - A_{i,n} \begin{bmatrix} z_{i,1} & z_{i,2} & \cdots & z_{i,n-1} \end{bmatrix} = B_{i,n} \begin{bmatrix} \varepsilon_{i,1} & \varepsilon_{i,2} & \cdots & \varepsilon_{i,n-1} \end{bmatrix}$$

The error E_i can thus be regarded as generated by the generative noise ε_i ~ N(0, 1). Since the dimension of the generative noise in the manifold state variation was assumed above to be d_ε, E_i can be further reduced by principal component analysis: the first d_ε principal components of E_i are selected as B_{i,n}, the generative noise description in the manifold state variation.
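A minimal sketch of this fit (the least-squares solution via pseudo-inverse and the scaling of B are our conventions; the patent only specifies linear fitting of A and taking the first d_ε principal components of E):

```python
import numpy as np

def fit_state_dynamics(Z, d_eps):
    """Fit the state variation A by least squares and the generative noise
    description B from the fitting error E (states are the columns of Z)."""
    Z0, Z1 = Z[:, :-1], Z[:, 1:]
    A = Z1 @ np.linalg.pinv(Z0)            # min_A ||Z1 - A Z0||
    E = Z1 - A @ Z0                        # linear-fit error matrix
    U, S, _ = np.linalg.svd(E, full_matrices=False)
    # Scale so that B @ eps with eps ~ N(0, I) matches the error energy
    # (an assumption; the patent just keeps the first d_eps components).
    B = U[:, :d_eps] * (S[:d_eps] / np.sqrt(max(Z0.shape[1] - 1, 1)))
    return A, B, E
```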
In practical applications, background scenes in surveillance video vary. For static unit regions in the scene, such as ceiling, sky, or ground, the local spatio-temporal manifold on which their cube sequences lie stays essentially unchanged, so both the structure and the variation of their manifolds should be described with low-dimensional parameters. For unit regions containing dynamic background, such as waving trees, a rippling water surface, or a swinging curtain, the manifold structure is complicated and the state variation is also relatively complex, so high-dimensional parameter descriptions should be used.
Therefore, in the system, the dimension of a cube sequence's local spatio-temporal manifold and the dimension of the generative noise in its state variation are determined adaptively from the magnitudes of the eigenvalues of the unfolded sequence matrix: thresholds are set on the eigenvalues of the principal component analysis of the sequence matrix W_{i,n} and of the error matrix E_i, and all principal components whose eigenvalues exceed the threshold are selected as the basis of W_{i,n} and E_i. That is,
$$d^* = \arg\max_d \{ \Sigma_d > T_d \}, \qquad d_\varepsilon^* = \arg\max_d \{ \Sigma_d^\varepsilon > T_{d_\varepsilon} \}$$

where Σ_d and Σ_d^ε are the d-th eigenvalues of the sequence matrix W_{i,n} and of the error matrix E_i, respectively. The benefit of this adaptive description of the local spatio-temporal manifold is that it simplifies the models of static unit regions of the input video and reduces the computation spent on those regions both during moving object segmentation and during the online updating maintenance of the model.
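In code, the adaptive choice is a one-liner over the PCA eigenvalues (the thresholds T_d and T_ε are system parameters):

```python
import numpy as np

def adaptive_dim(eigvals, T):
    """Number of principal components whose eigenvalue exceeds threshold T."""
    return int(np.sum(np.asarray(eigvals) > T))

# e.g. d = adaptive_dim(S_W, T_d); d_eps = adaptive_dim(S_E, T_eps)
```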
(3) Moving object segmentation based on the local spatio-temporal manifold and online updating maintenance of the model
Online moving object segmentation is performed with the local spatio-temporal manifolds obtained from training; the local spatio-temporal manifolds are then updated and maintained according to the background parts obtained after segmentation.
Given a local spatio-temporal manifold obtained by training, the state of the next spatio-temporal cube can be predicted from the variation pattern of the manifold, and the predicted next spatio-temporal cube feature vector can then be reconstructed from the predicted state, i.e.
$$\hat{z}_{i,n+1} = A_{i,n} z_{i,n} + B_{i,n} \varepsilon_{i,n}$$
$$\hat{v}_{i,n+1} = C_{i,n} \hat{z}_{i,n+1}$$
In the system, z_{i,n} is the most recent local spatio-temporal manifold state computed by the offline training, and ε_{i,n} is assumed to follow the standard normal distribution.
The predicted spatio-temporal cube feature vector is compared with the feature vector of the newly arrived cube to decide which pixels of the new cube belong to a moving target and which belong to the background. To keep the segmentation algorithm efficient, the system segments moving targets simply by setting a distance threshold: when the Euclidean distance between a newly arrived pixel's feature vector and the predicted feature vector of that pixel is less than the segmentation threshold T_p, the pixel is classified as a background point; otherwise it is regarded as a moving target point. Setting background points to 0 and moving target points to 255 in a binary result image yields the moving object segmentation result of the new t frames (a time span long enough to form a spatio-temporal cube).
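A minimal sketch of the prediction and thresholding (the per-pixel distance is taken over each pixel's 16-dim sub-feature; the mean offset from the PCA step is omitted for brevity):

```python
import numpy as np

def segment_cube(A, B, C, z_prev, v_new, T_p, dim_per_pixel=16, rng=None):
    """Predict the next cube feature from the trained manifold and label each
    pixel as background (0) or moving target (255) by Euclidean distance."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal(B.shape[1])   # generative noise ~ N(0, I)
    z_pred = A @ z_prev + B @ eps           # predicted manifold state
    v_pred = C @ z_pred                     # predicted cube feature vector
    diff = (v_new - v_pred).reshape(-1, dim_per_pixel)
    dist = np.linalg.norm(diff, axis=1)     # one distance per pixel
    return np.where(dist < T_p, 0, 255).astype(np.uint8)
```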
Because the background scene of the input video may change over time, the local spatio-temporal manifold of each unit region may also change. It is therefore necessary to update and maintain every local spatio-temporal manifold online, so that each manifold keeps describing the most recent structure and variation pattern.
However, if new spatio-temporal cubes are used directly as samples for updating the local spatio-temporal manifold, the moving target parts of those cubes will be mixed into the background model; that is, if new cubes are not screened, the moving foreground will pollute the background model. A simple update threshold is therefore set to exclude part of this pollution: when the Euclidean distance between a new pixel's feature vector and the predicted feature vector of that pixel is less than the update threshold T_u, the new feature vector is accepted as the update sample; otherwise the predicted feature vector is used as the update sample.
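A sketch of this per-pixel sample selection (the helper name is illustrative; it pairs with segment_cube above):

```python
import numpy as np

def select_update_sample(v_new, v_pred, T_u, dim_per_pixel=16):
    """Accept the observed per-pixel sub-feature where it is within T_u of
    the prediction; elsewhere fall back to the predicted sub-feature."""
    obs = v_new.reshape(-1, dim_per_pixel)
    pred = v_pred.reshape(-1, dim_per_pixel)
    close = np.linalg.norm(obs - pred, axis=1) < T_u
    return np.where(close[:, None], obs, pred).reshape(v_new.shape)
```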
Because of the linear-change assumption of the local spatio-temporal manifold and the various kinds of noise that may exist in the input video, it cannot be guaranteed that all background lies on the local spatio-temporal manifold, so the update threshold cannot guarantee that such background points get folded into the manifold. Likewise, the update threshold cannot guarantee that moving targets very similar to the background are excluded. The present system therefore uses an incremental updating maintenance method to alleviate the model pollution caused by wrongly classified or missed moving targets.
The update of a local spatio-temporal manifold is divided into two steps: 1. updating the structure of the manifold; 2. updating the variation of the manifold.
In the system, combining the update threshold with CCIPCA both preserves the running efficiency of the system during the updating maintenance of the local spatio-temporal manifolds and ensures that the updated manifold describes the changed manifold structure well.
The CCIPCA update algorithm treats the update sample as a force acting on the old manifold basis: by iteratively computing the projections of the update sample onto each old basis vector and deflecting each basis vector according to its projection, it realizes an incremental update of the manifold basis.
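A minimal sketch of one such incremental step, following the standard CCIPCA update of Weng et al. (the amnesic parameter and its value come from that algorithm's usual formulation, not from the patent):

```python
import numpy as np

def ccipca_step(V, u, n, amnesic=2.0):
    """One CCIPCA update of the basis columns of V with sample u, where n is
    the number of samples seen so far (n must exceed amnesic + 1). Column
    norms estimate eigenvalues; normalize columns to recover an orthonormal C."""
    u = u.astype(float).copy()
    for i in range(V.shape[1]):
        v = V[:, i]
        # Deflect the i-th basis vector toward u by u's projection on v
        v = ((n - 1 - amnesic) / n) * v \
            + ((1 + amnesic) / n) * (u @ v) / np.linalg.norm(v) * u
        V[:, i] = v
        v_unit = v / np.linalg.norm(v)
        u -= (u @ v_unit) * v_unit          # deflate: remove component i
    return V
```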
After the structure of the local spatio-temporal manifold has been updated (C_{i,n} updated to C_{i,n+1}), the corresponding updated manifold state can be computed back from the update sample, i.e.

$$z_{i,n+1} = C_{i,n+1}^{T} \, \hat{v}_{i,n+1}$$
The variation description A_{i,n} of the local spatio-temporal manifold is obtained by linear fitting. In the online updating stage, we want the updated A_{i,n+1} to describe the most recent variation pattern of the manifold as well as possible, because the next manifold state predicted through A_{i,n+1} should be as accurate as possible. Therefore, in the algorithm the updated variation description A_{i,n+1} is obtained by fitting the most recent L states.
Finally, the linear fitting error E_i is recomputed, and principal component analysis on E extracts the first d_ε principal components to give the updated generative noise description B_{i,n+1} of the local spatio-temporal manifold.
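In code this refit can simply reuse the fitting sketch above on a sliding window (the window size L is a system parameter):

```python
# Refit the dynamics on the most recent L states to obtain the updated
# A_{i,n+1} and B_{i,n+1} (reuses fit_state_dynamics from the sketch above).
A_new, B_new, _ = fit_state_dynamics(Z[:, -L:], d_eps)
```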
Fig. 3 shows the moving object segmentation results: the first column is a frame of the original video (including a crowded public place (scene 1), dynamic textures (scenes 2, 3, 5), and sudden illumination change (scene 4)); the second column is the hand-labeled segmentation result (ground truth); the third column is the result of our moving object segmentation method based on local spatio-temporal manifold learning.

Claims (6)

1. A moving object segmentation method based on local spatio-temporal manifold learning, characterized by comprising the following steps:
1) inputting an offline video;
2) partitioning the offline video into spatio-temporal cubes and extracting center-symmetric spatio-temporal local texture coding features, obtaining spatio-temporal texture variation description sequences of the spatio-temporal cube features;
3) building a background model based on local spatio-temporal manifold learning from the spatio-temporal texture variation description sequences of the spatio-temporal cube features;
4) inputting a live video;
5) partitioning the live video into spatio-temporal cubes and extracting center-symmetric spatio-temporal local texture coding features;
6) obtaining the moving target and background parts by judging the distance between the live video's spatio-temporal cubes and the spatio-temporal cubes predicted by the background model;
7) updating and maintaining the background model according to the background parts;
8) if a new live video is input, jumping to step 4); otherwise ending.
2. The moving object segmentation method based on local spatio-temporal manifold learning according to claim 1, characterized in that step 2) comprises the following steps:
21) partitioning the offline video into spatio-temporal cubes to obtain the spatio-temporal cube sequence of each unit position in the video;
22) extracting center-symmetric spatio-temporal local texture coding features from the offline video to obtain the spatio-temporal texture variation description sequences of the spatio-temporal cube features.
3. The moving object segmentation method based on local spatio-temporal manifold learning according to claim 2, characterized in that the center-symmetric spatio-temporal local texture coding feature extraction of step 22) comprises the following steps:
221) taking any pixel of a spatio-temporal cube, computing its feature vector within the 3*3*3 cubic neighborhood of the pixel, and dividing this 3*3*3 cube into 4 spatio-temporal planes on which the spatio-temporal local texture codes of the pixel are extracted;
222) using pixel pairs symmetric about the center pixel, encoding each pair by comparison against a set scale threshold to compute the pair's center-symmetric spatio-temporal local texture code, and concatenating the center-symmetric spatio-temporal local texture codes into one long feature vector as the spatio-temporal texture variation description of the cube.
4. The moving object segmentation method based on local spatio-temporal manifold learning according to claim 2, characterized in that step 3) comprises the following steps:
31) unfolding the spatio-temporal cube sequence obtained in step 21) into a sequence matrix;
32) decomposing the sequence matrix by principal component analysis, setting a threshold on the eigenvalues of the principal component analysis of the sequence matrix, and taking the d eigenvalues greater than the threshold as the structure description parameters of the local spatio-temporal manifold;
33) re-expressing the sequence matrix with the structure description parameters of the local spatio-temporal manifold to obtain the state sequence of the local spatio-temporal manifold;
34) computing, by linear fitting, the variation description parameters of the local spatio-temporal manifold and the error matrix E left after linearly fitting the manifold state variation;
35) decomposing the error matrix E by principal component analysis, setting a threshold on the eigenvalues of the principal component analysis of the error matrix E, and taking the d_ε eigenvalues greater than the threshold as the generative noise description parameters.
5. The moving object segmentation method based on local spatio-temporal manifold learning according to claim 1, characterized in that said step 6) predicts the next spatio-temporal cube's local spatio-temporal manifold state and feature vector from the local spatio-temporal manifold and sets a decision distance threshold: if the Euclidean distance between a pixel's feature vector in the live video and the predicted feature vector is less than the threshold, the pixel belongs to the background part; otherwise it is a moving target point.
6. The moving object segmentation method based on local spatio-temporal manifold learning according to claim 5, characterized in that the updating of the background model according to the background parts in said step 7) is an incremental update combining a set update threshold with the CCIPCA algorithm.
CN201210187511.XA 2012-06-07 2012-06-07 Moving object segmenting method based on local space-time manifold learning Active CN102750712B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210187511.XA CN102750712B (en) 2012-06-07 2012-06-07 Moving object segmenting method based on local space-time manifold learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210187511.XA CN102750712B (en) 2012-06-07 2012-06-07 Moving object segmenting method based on local space-time manifold learning

Publications (2)

Publication Number Publication Date
CN102750712A true CN102750712A (en) 2012-10-24
CN102750712B CN102750712B (en) 2015-06-17

Family

ID=47030861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210187511.XA Active CN102750712B (en) 2012-06-07 2012-06-07 Moving object segmenting method based on local space-time manifold learning

Country Status (1)

Country Link
CN (1) CN102750712B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104240268A (en) * 2014-09-23 2014-12-24 奇瑞汽车股份有限公司 Pedestrian tracking method based on manifold learning and sparse representation
CN104537684A (en) * 2014-06-17 2015-04-22 浙江立元通信技术股份有限公司 Real-time moving object extraction method in static scene
CN105898306A (en) * 2015-12-11 2016-08-24 乐视云计算有限公司 Code rate control method and device for sport video
CN106844573A (en) * 2017-01-05 2017-06-13 天津大学 Video summarization method based on manifold ranking
CN108257143A (en) * 2017-12-12 2018-07-06 交通运输部规划研究院 A kind of method based on remote Sensing Interpretation analytical technology extraction container bridge
CN108965873A (en) * 2018-07-24 2018-12-07 北京大学 A kind of adaptive division methods of pulse array coding
CN109033941A (en) * 2018-06-05 2018-12-18 江苏大学 A kind of micro- expression recognition method based on sparse projection study
CN109819251A (en) * 2019-01-10 2019-05-28 北京大学 A kind of decoding method of pulse array signals
CN110111361A (en) * 2019-04-22 2019-08-09 湖北工业大学 A kind of moving target detecting method based on multi-threshold self-optimizing background modeling

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102013022A (en) * 2010-11-23 2011-04-13 北京大学 Selective feature background subtraction method aiming at thick crowd monitoring scene

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102013022A (en) * 2010-11-23 2011-04-13 北京大学 Selective feature background subtraction method aiming at thick crowd monitoring scene

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAOBAI LIU ET AL: "Integrating Spatio-Temporal Context with Multiview Representation for Object Recognition in Visual Surveillance", 《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY》, vol. 21, no. 4, 30 April 2011 (2011-04-30), pages 393 - 407, XP011372802, DOI: doi:10.1109/TCSVT.2010.2087570 *
刘翠微等 (Liu Cuiwei et al.): "基于时空视频块的背景建模" [Background modeling based on spatio-temporal video blocks], 《北京理工大学学报》 [Transactions of Beijing Institute of Technology], vol. 32, no. 4, 30 April 2012 (2012-04-30), pages 390-393 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104537684A (en) * 2014-06-17 2015-04-22 浙江立元通信技术股份有限公司 Real-time moving object extraction method in static scene
CN104240268B (en) * 2014-09-23 2017-11-14 奇瑞汽车股份有限公司 A kind of pedestrian tracting method based on manifold learning and rarefaction representation
CN104240268A (en) * 2014-09-23 2014-12-24 奇瑞汽车股份有限公司 Pedestrian tracking method based on manifold learning and sparse representation
CN105898306A (en) * 2015-12-11 2016-08-24 乐视云计算有限公司 Code rate control method and device for sport video
CN106844573B (en) * 2017-01-05 2020-02-14 天津大学 Video abstract acquisition method based on manifold sorting
CN106844573A (en) * 2017-01-05 2017-06-13 天津大学 Video summarization method based on manifold ranking
CN108257143A (en) * 2017-12-12 2018-07-06 交通运输部规划研究院 A kind of method based on remote Sensing Interpretation analytical technology extraction container bridge
CN108257143B (en) * 2017-12-12 2020-09-15 交通运输部规划研究院 Method for extracting container loading bridge based on remote sensing interpretation analysis technology
CN109033941A (en) * 2018-06-05 2018-12-18 江苏大学 A kind of micro- expression recognition method based on sparse projection study
CN109033941B (en) * 2018-06-05 2021-07-20 江苏大学 Micro-expression identification method based on sparse projection learning
CN108965873A (en) * 2018-07-24 2018-12-07 北京大学 A kind of adaptive division methods of pulse array coding
CN108965873B (en) * 2018-07-24 2020-02-14 北京大学 Adaptive partitioning method for pulse array coding
CN109819251A (en) * 2019-01-10 2019-05-28 北京大学 A kind of decoding method of pulse array signals
CN110111361A (en) * 2019-04-22 2019-08-09 湖北工业大学 A kind of moving target detecting method based on multi-threshold self-optimizing background modeling
CN110111361B (en) * 2019-04-22 2021-05-18 湖北工业大学 Moving object detection method based on multi-threshold self-optimization background modeling

Also Published As

Publication number Publication date
CN102750712B (en) 2015-06-17


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant