Summary of the invention
The present invention proposes a visual tracking method based on sparse representation, which has strong robustness to tracking-target pose changes, ambient illumination changes, occlusion and similar situations, and improves the handling of target occlusion.
The present invention includes two parts: first, the judgement dictionary D, the coupling dictionary A and the grayscale matrix T of the tracking target are constructed; then, in each current frame, the candidate image most similar to the tracking target is searched among the candidate images based on the judgement dictionary D, the coupling dictionary A and the grayscale matrix T, determining the target image of the currently tracked target. The detailed process is:
Step 1: determine the initialisation image and the target image of the initialisation image. In this step, the first frame of the video or image sequence is usually taken as the initialisation image, and the target image (a rectangular image block), i.e. the target template, is determined; the target image (target size and position coordinates in the image) can be obtained from a manually set initial position in the initialisation image.
Step 2: generate the judgement dictionary D, the coupling dictionary A and the target-template grayscale matrix T based on the current target image:
In the initialisation image, determine the rectangular region within P1 pixels of the target image centre as the sample region of the foreground template (P1 is a preset value, set empirically according to the processing environment, usually 1 to 10). Determine the rectangular region within P2 pixels of the target image centre, excluding the sample region of the foreground template, as the sample region of the background template (P2 is a preset value greater than P1, set empirically; the difference between P2 and P1 is usually 1 to 5); this is the ring-shaped region surrounding the sample region of the foreground template. Randomly sample the sample region of the foreground template and select N_p sampled images of the same size as the target image as the foreground template D+ (N_p is a preset value, set empirically, with a usual range of [30, 60]); randomly sample the sample region of the background template and select N_n sampled images of the same size as the target image as the background template D- (N_n is a preset value, set empirically, with a usual range of [200, 300]). The foreground template D+ and the background template D- together form the judgement dictionary D.
Based on a preset rectangular sliding window of size w*h (the width w and height h are preset values, set empirically according to the processing environment; they can usually be set from the target image size, e.g. w to 1/8, 1/16, 1/32 or 1/64 of the target image width, and h to 1/8, 1/16, 1/32 or 1/64 of the target image height), slide-sample the target image left, right, up and down. The left-right sampling interval is uniform, as is the up-down interval (generally the left-right interval can be set to w/2 and the up-down interval to h/2; other settings such as w/4 and h/4 are of course possible, depending on the required fineness of the computation). This yields the fragment set of the target image. From the fragment set, choose K representative fragments by K-means clustering, and form the training set from these K fragments to obtain the coupling dictionary A; K is the usual parameter of the K-means clustering method, a preset value set empirically according to the processing environment, with a usual range of [50, 100] (in the K-means computations described below, K may be set to different values, usually also in [50, 100]). In addition, set up the grayscale matrix T for storing the gray value of each fragment of the target image's fragment set; in the grayscale matrix T, the gray value of each fragment is the sum of the pixel values of all points of the fragment.
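As an illustrative sketch (not part of the claimed method), the fragment sampling, the K-means construction of the coupling dictionary A and the grayscale matrix T described in step 2 can be expressed as follows; the function names are hypothetical, grayscale images are assumed, and scikit-learn's KMeans stands in for the K-means clustering step (here the K cluster centres are used directly as the atoms of A, a simplification of the training described later):

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_fragments(target, w, h):
    """Slide a w*h window over the target image with intervals w/2, h/2.

    Returns the fragment set (one flattened fragment per row) and the
    grayscale matrix T, whose entries are the pixel-value sums of fragments.
    """
    H, W = target.shape
    frags, grays = [], []
    for y in range(0, H - h + 1, h // 2):
        row = []
        for x in range(0, W - w + 1, w // 2):
            f = target[y:y + h, x:x + w]
            frags.append(f.ravel().astype(float))
            row.append(f.sum())  # gray value of the fragment: sum of pixels
        grays.append(row)
    return np.array(frags), np.array(grays)

def build_coupling_dictionary(frags, K=50, seed=0):
    """K-means on the fragment set; simplification: use the K cluster
    centres directly as the columns (atoms) of dictionary A."""
    km = KMeans(n_clusters=K, n_init=10, random_state=seed).fit(frags)
    return km.cluster_centers_.T

# Tiny demonstration on a synthetic 64x64 grayscale target image
target = np.random.default_rng(0).integers(0, 256, (64, 64))
frags, T = extract_fragments(target, w=8, h=8)
A = build_coupling_dictionary(frags, K=10)
```

With a 64x64 target and an 8x8 window at half-window intervals, this yields a 15x15 grid of fragments, matching the count formula [W/(w/2) - 1] * [H/(h/2) - 1] used later in the embodiment.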
Step 3: based on the constructed judgement dictionary D, coupling dictionary A and grayscale matrix T, determine the target location in each frame subsequent to the initialisation image, i.e. perform target tracking on every non-first frame:
301: sampling and pre-selection of candidate particles
First, the sampling of candidate particles: sample the current frame (this step is completed directly by conventional techniques and is not limited by the present invention; sampling is usually completed by particle filtering, and the number of sampled particles is determined by the actual situation, generally about 500) to obtain the first candidate image sample set. Then choose K cluster centres from the first candidate sample set by K-means clustering, and compute the confidence value H_c of each cluster centre:
The corresponding sparse coefficient is computed from formula (1):

    α = argmin_α ‖X − Dα‖₂² + μ‖α‖₁    (1)

where D is the judgement dictionary (specifically the foreground template D+, which may also be called the foreground template dictionary D+, and the background template D-, which may also be called the background template dictionary D-), X is a sample (in the present invention, each candidate image), and the sparse coefficient α is obtained by solving this L1 optimization problem; μ is the preset coefficient of the sparse representation formula, set empirically according to the experimental environment. ‖·‖₁ and ‖·‖₂ denote the 1-norm and the 2-norm of the corresponding vector, respectively.
If a sample can be represented with a small reconstruction error on the foreground template set, the sample is likely the target; conversely, if a sample can be represented with a small reconstruction error on the background template set, the sample is likely background. Therefore the confidence value H_c of a sample can be constructed from its reconstruction error on the foreground template set and its reconstruction error on the background template set:

    H_c = exp(−ε_f + ε_b)    (2)
In the above formula, ε_f = ‖X − D+α+‖₂² is the reconstruction error of the sample (candidate image) X on the foreground template D+, where α+ is the sparse coefficient of the candidate image X corresponding to the cluster centre with respect to the foreground template D+ (solved from formula (1)); ε_b = ‖X − D-α-‖₂² is the reconstruction error of the sample X on the background template D-, where α- is the sparse coefficient of the candidate image X corresponding to the cluster centre with respect to the background template D- (solved from formula (1)).
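A minimal sketch of formulas (1) and (2), using scikit-learn's Lasso as the L1 solver; this solver choice is an implementation assumption not mandated by the text (Lasso's alpha corresponds to μ up to a constant scaling), and the function names are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_coeff(D, x, mu=0.01):
    """Formula (1): min ||x - D a||_2^2 + mu ||a||_1, solved via Lasso.

    D: dictionary with one template per column; x: flattened sample.
    """
    lasso = Lasso(alpha=mu, fit_intercept=False, max_iter=5000)
    lasso.fit(D, x)
    return lasso.coef_

def confidence(x, D_pos, D_neg, mu=0.01):
    """Formula (2): H_c = exp(-eps_f + eps_b) from the two reconstruction
    errors of x on the foreground (D_pos) and background (D_neg) templates."""
    a_pos = sparse_coeff(D_pos, x, mu)
    a_neg = sparse_coeff(D_neg, x, mu)
    eps_f = np.sum((x - D_pos @ a_pos) ** 2)  # error on foreground templates
    eps_b = np.sum((x - D_neg @ a_neg) ** 2)  # error on background templates
    return np.exp(-eps_f + eps_b)
```

A candidate that is well reconstructed by the foreground templates but poorly by the background templates thus receives a large H_c.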
Take the cluster centre with the largest confidence value among the K cluster centres as the candidate sampling centre of this round, and sample around this candidate sampling centre to obtain the second candidate image sample set.
Next is the pre-selection of candidate particles. This process mainly removes, by pre-selection, the large number of candidate particles with greater deviation, so as to further narrow the candidate range. It proceeds as follows: for the second candidate image sample set, compute the confidence value of each candidate image according to formulas (1) and (2), then choose the N candidates with the highest confidence values as the new candidate images for further processing (N is an empirical value, usually in [20, 60]). The candidate image with the highest confidence is not directly selected as the target here, because, owing to occlusion, the highest-confidence candidate may not be the optimal candidate; the optimal candidate can only be selected after the occlusion handling described below.
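The pre-selection step above (keeping the N highest-confidence candidates) amounts to a top-N selection; a sketch with illustrative names:

```python
import numpy as np

def preselect(candidates, confidences, N=20):
    """Keep the N candidates with the highest confidence values H_c."""
    order = np.argsort(confidences)[::-1][:N]  # indices sorted descending
    return [candidates[i] for i in order]
```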
In the subsequent tracking process the target may be occluded by the background. Without occlusion handling, the target may be treated as background during tracking, producing erroneous tracking results. Occlusion handling is therefore necessary, so that the present invention also has good robustness to occlusion. The so-called occlusion handling is as follows:
First, fragment-sample each new candidate image with a sliding window (recording the coordinate position of each fragment), in the same way the target image is sampled in step 2, so that the sampled fragments cover every local region of the candidate image. Then, for each fragment, compute the corresponding sparse coefficient β_i with respect to the coupling dictionary A using formula (1), and compute the reconstruction error of each fragment y_i as

    ε_i = ‖y_i − Aβ_i‖₂²

(the subscript i identifies different fragments). A fragment belonging to the background has a larger reconstruction error, while a fragment belonging to the foreground has a smaller one. Set a threshold ε_0 (its range is generally [0.3, 0.6]) and judge the attribute of each fragment of the candidate image in turn: when ε_i > ε_0, the fragment belongs to the background; when ε_i ≤ ε_0, the fragment belongs to the foreground. Then build an attribute matrix recording the attribute of each fragment in turn (for example, if the fragments of the target image or a candidate image total m*n, where m is the number of fragments per row and n the number per column, the attribute matrix is a two-dimensional matrix of order m*n; the attributes can of course also be stored in turn as a row vector or column vector, provided the target image and the candidate images use the same storage scheme, so that matching is possible): when a fragment belongs to the foreground, the corresponding position in the matrix is set to 1; when it belongs to the background, its value is set to 0. A 0-1 matrix is thus built for each candidate. In the same way, a 0-1 matrix is built for the target in the initialisation image (the first frame) as the target template (target-template attribute matrix); each candidate is then matched against the template, and the candidate image with the highest matching degree is the target image of the current frame. If the tracking target is matched directly based on the attribute matrix corresponding to the target image and the attribute matrices corresponding to the candidate images, then the target-template grayscale matrix T need not be generated in step 2; only the target attribute matrix corresponding to the target image is generated as described above.
The concrete calculation method is as follows:
For each new candidate image c, first set a rectangular sliding window of size w*h, and slide-sample the candidate image c with this sliding window to obtain the fragment set of candidate image c. Based on the coupling dictionary A (which may also be called the sparse dictionary A), solve formula (3) to obtain the sparse coefficient vector β_i of each fragment y_i:

    β_i = argmin_β ‖y_i − Aβ‖₂² + μ‖β‖₁    (3)

i.e. formula (1) with the judgement dictionary replaced by the coupling dictionary A, and with the sample corresponding to each fragment.
Then compute the reconstruction error ε_i = ‖y_i − Aβ_i‖₂² of each fragment y_i, set the attribute value of each fragment according to the magnitude relationship between the reconstruction error ε_i and the preset threshold ε_0, and record the attribute values of the fragments in turn in the attribute matrix S_c.
If the tracking target is matched directly based on the attribute matrix corresponding to the target image and the attribute matrices corresponding to the candidate images, the N attribute matrices S_c are directly matched against the target attribute matrix S, and the candidate image corresponding to the attribute matrix S_c with the highest matching degree is taken as the target image of the current frame.
Meanwhile, the present invention can also track the target image of the current frame by the similarity between the grayscale matrix of the target and the grayscale matrices corresponding to the new candidate images. Specifically, the target-template grayscale matrix T (storing in turn the gray value of each fragment of the target image) must first be determined in step 2 from the target image of the initialisation image. Then, during occlusion handling, a grayscale matrix F_c is also set for each new candidate image, storing the gray value of each fragment of the new candidate image c. To simplify computation, F_c is normalized based on the attribute matrix S_c of the candidate image: when an attribute value in S_c is 0, the value at the corresponding position in F_c is set to 0; when an attribute value in S_c is 1, the value at the corresponding position in F_c is the gray value of the corresponding fragment, i.e. the sum of the pixel values of all points of that fragment. Finally, the new candidate image c whose grayscale matrix F_c is most similar to the grayscale matrix T is taken as the target image of the current frame.
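Normalizing the candidate grayscale matrix F_c by the attribute matrix S_c, as described above, is a single element-wise operation; a sketch with illustrative names, where fragment_sums holds the pixel-value sum of each fragment:

```python
import numpy as np

def candidate_gray_matrix(fragment_sums, S_c):
    """F_c: keep the gray value (pixel sum) of foreground fragments
    (S_c == 1) and zero out background fragments (S_c == 0)."""
    return np.where(S_c == 1, fragment_sums, 0)
```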
Further, the similarity L_c between a candidate image and the target image can be computed according to formula (4), where M is the sum of the gray values of all points of the target image (the target-template grayscale matrix T) and normalizes the computed value; F_c^ij denotes each element of the grayscale matrix F_c of candidate image c, and T_ij each element of the target-template grayscale matrix T. The similarity obtained from the formula lies in the range [0, 1]. Among all candidates, find the candidate with the maximum similarity L_c; this candidate image has the highest matching degree with the target image and can therefore be taken as the tracking target (target image) of the current frame.
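Formula (4) itself is not reproduced in the text; the sketch below therefore assumes a plausible form consistent with the description (an element-wise comparison of F_c and T, normalized by the total gray value M of the target template, yielding a value in [0, 1]). The exact expression is an assumption, not the patent's formula:

```python
import numpy as np

def similarity(F_c, T):
    """Hypothetical form of formula (4): a histogram-intersection style
    score, normalized by M = sum of all gray values of the target
    template, so the result lies in [0, 1]."""
    M = T.sum()
    return float(np.minimum(F_c, T).sum() / M)
```

With this form, a candidate whose foreground fragments reproduce the template's gray values scores near 1, while zeroed (occluded) fragments reduce the score.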
In order to keep the tracking process effective over time, after the target image of the current frame has been determined, an update process is arranged. The concrete update process may be:
Based on the target image of the current frame, update the foreground template D+, the background template D-, the coupling dictionary A, and the target-template attribute matrix S or target-template grayscale matrix T according to step 2.

Or: based on the target image of the current frame, update the coupling dictionary A and the grayscale matrix T according to step 2; and every 5-10 frames, based on the target image of the current frame, randomly sample the sample region of the background template according to step 2, select N' sampled images of the same size as the target image, add them to the background template D-, and delete from D- the N' sampled images longest separated in time from the current frame, where 1 ≤ N' ≤ N_n (in the subsequent tracking process the tracking environment changes continuously; the tracking target changes little, but the background changes considerably, so the background template can be updated periodically). Further, the target image of the current frame can also be added to the foreground template D+, deleting from D+ the sampled image longest separated in time from the current frame.
In sum, owing to the adoption of the above technical scheme, the beneficial effects of the invention are: strong robustness to tracking-target pose changes, ambient illumination changes, occlusion and similar situations, and an improved handling of target occlusion.
Embodiment
To make the object, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the embodiments and accompanying drawings.
See Fig. 1. For an actual current frame, first judge whether the current frame is the first frame. If it is the first frame, the judgement dictionary (foreground template and background template), the coupling dictionary and the other information required by the tracking of the present invention must be obtained from the target image information in the first frame (the position and size of the target in the image, with width and height W and H respectively, obtained in advance). Specifically, for the first frame, first sample the sample regions shown in Fig. 2 to obtain positive and negative templates as the training set. The positive templates are images randomly sampled by translating the target location marked by the central rectangular frame up, down, left and right by 1-5 pixels; N_p cluster centres are then chosen by K-means clustering as the positive templates (foreground template). The negative templates are images sampled at positions relatively far from the marked position (i.e. in a ring-shaped region at a certain distance from the target centre, likewise normalized in size to W*H); N_n images are likewise chosen by the K-means clustering method as the negative templates (background template).
The coupling dictionary A and the grayscale matrix T must also be computed in the first frame, i.e. the fragments of the target template are obtained and the corresponding computations performed. The method is: first set a rectangular sliding window of size w*h and slide-sample the target image left, right, up and down with it, with a left-right sampling interval of w/2 and an up-down sampling interval of h/2. If the height and width of the target image are H and W respectively, the number of samples is [W/(w/2) − 1] * [H/(h/2) − 1] (as shown in Fig. 3, where the number of samples is 8*8). These fragments are first clustered by the K-means clustering method to find the K most representative fragments, which are then stacked by column into column vectors to form the training set, and an approximation method (e.g. convex relaxation) is used to generate the coupling dictionary A. For the target-template grayscale matrix T, each point in T corresponds to the gray value of one local fragment of the target image.
If the current frame is not the first frame, the target location of the current frame must be estimated (determining the target image of the current frame). First sample the current frame by particle filtering to obtain many candidate images as the first candidate image sample set; then choose K cluster centres by K-means clustering, compute the confidence value of each cluster centre according to formulas (1) and (2), and take the cluster centre with the largest confidence value as the candidate sampling centre. Then obtain a number of candidate images again by particle filtering around this sampling centre as the second candidate image sample set. Compute the confidence value of each candidate image in the second candidate image sample set, and take the 20 candidates with the highest confidence values as the new candidate images.
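The particle-filter sampling is used here only as a conventional sampling mechanism; a minimal Gaussian-perturbation stand-in (not a full particle filter; the function name and the spread parameter are illustrative assumptions):

```python
import numpy as np

def sample_particles(center, n=500, pos_std=8.0, rng=None):
    """Sample n candidate states (x, y) around a centre by Gaussian
    perturbation: a minimal stand-in for the particle-filter sampling
    step that produces the candidate image sample set."""
    rng = rng or np.random.default_rng()
    return center + rng.normal(scale=pos_std, size=(n, 2))
```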
For each of the 20 new candidate images, obtain fragments in the same way fragments were obtained in the first frame. For each fragment, use the coupling dictionary A to compute the reconstruction error ε_i of the fragment: when ε_i > ε_0, the fragment belongs to the background; when ε_i ≤ ε_0, the fragment belongs to the foreground, and the value at the corresponding position of the attribute matrix S_c is set to 1. By repeating this computation, the corresponding attribute matrix S_c is obtained for each candidate image. Then, similarly to the way the grayscale matrix T was obtained in the first frame, the grayscale matrix F_c of each candidate image is computed; the difference from the first frame is that the attribute matrix S_c must be combined: when the attribute value of a point in the attribute matrix is 0, the normalized gray value of the corresponding point in the grayscale matrix is also 0; when the attribute value is 1, the normalized gray value is unchanged. With reference to Fig. 3, taking a total of 8*8 fragments as an example, the corresponding attribute matrix S_c, target-template grayscale matrix T and new-candidate grayscale matrix F_c are all two-dimensional 8*8 matrices. If the attribute value of the fragment at (3,4) is 0, the value of S_34 in the attribute matrix S_c is 0, and the value of F_34 in the grayscale matrix F_c is also 0; if the attribute value of the fragment at (4,2) is 1, the value of S_42 in S_c is 1, and the value of F_42 in F_c is the sum obtained by adding up all pixel values of the fragment at (4,2).
Finally, the similarity between each candidate and the template is computed according to formula (4), the candidate image with the highest similarity is taken as the target image of the current frame, and this target image is used to update the foreground template, the background template, the coupling dictionary A and the target-template grayscale matrix T.