Summary of the invention
The present invention proposes a visual tracking method based on sparse representation, which has strong robustness to tracking-target pose changes, ambient illumination changes, occlusion and similar situations, and improves the handling of target occlusion.
The present invention includes two parts: first, the judgement dictionary D, the coupling dictionary A and the grayscale matrix T of the tracking target are constructed; then, in each current frame, the candidate image most similar to the tracking target is searched among the candidate images based on the judgement dictionary D, the coupling dictionary A and the grayscale matrix T, determining the target image of the currently tracked target. The detailed process is:
Step 1: determine the initialisation image and the target image of the initialisation image. In this step, the first frame of the video or image sequence is usually taken as the initialisation image, and the target image (a rectangular image block), i.e. the target template, is determined; the target image (target size and position coordinates in the image) can be obtained from a manually set initial position in the initialisation image.
Step 2: generate the judgement dictionary D, the coupling dictionary A and the target-template grayscale matrix T based on the current target image:
In the initialisation image, determine the rectangular region within P1 pixels of the target image centre as the sample region of the foreground template (P1 is a preset value, set empirically according to the processing environment, usually 1 to 10). Determine the rectangular region within P2 pixels of the target image centre, excluding the sample region of the foreground template, as the sample region of the background template (P2 is a preset value greater than P1, set empirically; the difference between P2 and P1 is usually 1 to 5); this is the ring-shaped region surrounding the sample region of the foreground template. Randomly sample the sample region of the foreground template and select N_p sampled images of the same size as the target image as the foreground template D+ (N_p is a preset value, set empirically, with a usual range of [30, 60]); randomly sample the sample region of the background template and select N_n sampled images of the same size as the target image as the background template D- (N_n is a preset value, set empirically, with a usual range of [200, 300]). The foreground template D+ and the background template D- together form the judgement dictionary D.
Based on a preset rectangular sliding window of size w*h (the width w and height h are preset values, set empirically according to the processing environment; they can usually be set from the target image size, e.g. w to 1/8, 1/16, 1/32 or 1/64 of the target image width, and h to 1/8, 1/16, 1/32 or 1/64 of the target image height), slide-sample the target image left, right, up and down. The left-right sampling interval is uniform, as is the up-down interval (generally the left-right interval can be set to w/2 and the up-down interval to h/2; other settings such as w/4 and h/4 are of course possible, depending on the required fineness of the computation). This yields the fragment set of the target image. From the fragment set, choose K representative fragments by K-means clustering, and form the training set from these K fragments to obtain the coupling dictionary A; K is the usual parameter of the K-means clustering method, a preset value set empirically according to the processing environment, with a usual range of [50, 100] (in the K-means computations described below, K may be set to different values, usually also in [50, 100]). In addition, set up the grayscale matrix T for storing the gray value of each fragment of the target image's fragment set; in the grayscale matrix T, the gray value of each fragment is the sum of the pixel values of all points of the fragment.
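As an illustrative sketch (not part of the claimed method), the fragment sampling, the K-means construction of the coupling dictionary A and the grayscale matrix T described in step 2 can be expressed as follows; the function names are hypothetical, grayscale images are assumed, and scikit-learn's KMeans stands in for the K-means clustering step (here the K cluster centres are used directly as the atoms of A, a simplification of the training described later):

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_fragments(target, w, h):
    """Slide a w*h window over the target image with intervals w/2, h/2.

    Returns the fragment set (one flattened fragment per row) and the
    grayscale matrix T, whose entries are the pixel-value sums of fragments.
    """
    H, W = target.shape
    frags, grays = [], []
    for y in range(0, H - h + 1, h // 2):
        row = []
        for x in range(0, W - w + 1, w // 2):
            f = target[y:y + h, x:x + w]
            frags.append(f.ravel().astype(float))
            row.append(f.sum())  # gray value of the fragment: sum of pixels
        grays.append(row)
    return np.array(frags), np.array(grays)

def build_coupling_dictionary(frags, K=50, seed=0):
    """K-means on the fragment set; simplification: use the K cluster
    centres directly as the columns (atoms) of dictionary A."""
    km = KMeans(n_clusters=K, n_init=10, random_state=seed).fit(frags)
    return km.cluster_centers_.T

# Tiny demonstration on a synthetic 64x64 grayscale target image
target = np.random.default_rng(0).integers(0, 256, (64, 64))
frags, T = extract_fragments(target, w=8, h=8)
A = build_coupling_dictionary(frags, K=10)
```

With a 64x64 target and an 8x8 window at half-window intervals, this yields a 15x15 grid of fragments, matching the count formula [W/(w/2) - 1] * [H/(h/2) - 1] used later in the embodiment.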
Step 3: based on the constructed judgement dictionary D, coupling dictionary A and grayscale matrix T, determine the target location in each frame subsequent to the initialisation image, i.e. perform target tracking on every non-first frame:
301: sampling and pre-selection of candidate particles
First, the sampling of candidate particles: sample the current frame (this step is completed directly by conventional techniques and is not limited by the present invention; sampling is usually completed by particle filtering, and the number of sampled particles is determined by the actual situation, generally about 500) to obtain the first candidate image sample set. Then choose K cluster centres from the first candidate sample set by K-means clustering, and compute the confidence value H_c of each cluster centre:
The corresponding sparse coefficient is computed from formula (1):

    α = argmin_α ‖X − Dα‖₂² + μ‖α‖₁    (1)

where D is the judgement dictionary (specifically the foreground template D+, which may also be called the foreground template dictionary D+, and the background template D-, which may also be called the background template dictionary D-), X is a sample (in the present invention, each candidate image), and the sparse coefficient α is obtained by solving this L1 optimization problem; μ is the preset coefficient of the sparse representation formula, set empirically according to the experimental environment. ‖·‖₁ and ‖·‖₂ denote the 1-norm and the 2-norm of the corresponding vector, respectively.
If a sample can be represented with a small reconstruction error on the foreground template set, the sample is likely the target; conversely, if a sample can be represented with a small reconstruction error on the background template set, the sample is likely background. Therefore the confidence value H_c of a sample can be constructed from its reconstruction error on the foreground template set and its reconstruction error on the background template set:

    H_c = exp(−ε_f + ε_b)    (2)
In the above formula, ε_f = ‖X − D+α+‖₂² is the reconstruction error of the sample (candidate image) X on the foreground template D+, where α+ is the sparse coefficient of the candidate image X corresponding to the cluster centre with respect to the foreground template D+ (solved from formula (1)); ε_b = ‖X − D-α-‖₂² is the reconstruction error of the sample X on the background template D-, where α- is the sparse coefficient of the candidate image X corresponding to the cluster centre with respect to the background template D- (solved from formula (1)).
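A minimal sketch of formulas (1) and (2), using scikit-learn's Lasso as the L1 solver; this solver choice is an implementation assumption not mandated by the text (Lasso's alpha corresponds to μ up to a constant scaling), and the function names are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_coeff(D, x, mu=0.01):
    """Formula (1): min ||x - D a||_2^2 + mu ||a||_1, solved via Lasso.

    D: dictionary with one template per column; x: flattened sample.
    """
    lasso = Lasso(alpha=mu, fit_intercept=False, max_iter=5000)
    lasso.fit(D, x)
    return lasso.coef_

def confidence(x, D_pos, D_neg, mu=0.01):
    """Formula (2): H_c = exp(-eps_f + eps_b) from the two reconstruction
    errors of x on the foreground (D_pos) and background (D_neg) templates."""
    a_pos = sparse_coeff(D_pos, x, mu)
    a_neg = sparse_coeff(D_neg, x, mu)
    eps_f = np.sum((x - D_pos @ a_pos) ** 2)  # error on foreground templates
    eps_b = np.sum((x - D_neg @ a_neg) ** 2)  # error on background templates
    return np.exp(-eps_f + eps_b)
```

A candidate that is well reconstructed by the foreground templates but poorly by the background templates thus receives a large H_c.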
Take the cluster centre with the largest confidence value among the K cluster centres as the candidate sampling centre of this round, and sample around this candidate sampling centre to obtain the second candidate image sample set.
Next is the pre-selection of candidate particles. This process mainly removes, by pre-selection, the large number of candidate particles with greater deviation, so as to further narrow the candidate range. It proceeds as follows: for the second candidate image sample set, compute the confidence value of each candidate image according to formulas (1) and (2), then choose the N candidates with the highest confidence values as the new candidate images for further processing (N is an empirical value, usually in [20, 60]). The candidate image with the highest confidence is not directly selected as the target here, because, owing to occlusion, the highest-confidence candidate may not be the optimal candidate; the optimal candidate can only be selected after the occlusion handling described below.
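The pre-selection step above (keeping the N highest-confidence candidates) amounts to a top-N selection; a sketch with illustrative names:

```python
import numpy as np

def preselect(candidates, confidences, N=20):
    """Keep the N candidates with the highest confidence values H_c."""
    order = np.argsort(confidences)[::-1][:N]  # indices sorted descending
    return [candidates[i] for i in order]
```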
In the subsequent tracking process the target may be occluded by the background. Without occlusion handling, the target may be treated as background during tracking, producing erroneous tracking results. Occlusion handling is therefore necessary, so that the present invention also has good robustness to occlusion. The so-called occlusion handling is as follows:
First, fragment-sample each new candidate image with a sliding window (recording the coordinate position of each fragment), in the same way the target image is sampled in step 2, so that the sampled fragments cover every local region of the candidate image. Then, for each fragment, compute the corresponding sparse coefficient β_i with respect to the coupling dictionary A using formula (1), and compute the reconstruction error of each fragment y_i as

    ε_i = ‖y_i − Aβ_i‖₂²

(the subscript i identifies different fragments). A fragment belonging to the background has a larger reconstruction error, while a fragment belonging to the foreground has a smaller one. Set a threshold ε_0 (its range is generally [0.3, 0.6]) and judge the attribute of each fragment of the candidate image in turn: when ε_i > ε_0, the fragment belongs to the background; when ε_i ≤ ε_0, the fragment belongs to the foreground. Then build an attribute matrix recording the attribute of each fragment in turn (for example, if the fragments of the target image or a candidate image total m*n, where m is the number of fragments per row and n the number per column, the attribute matrix is a two-dimensional matrix of order m*n; the attributes can of course also be stored in turn as a row vector or column vector, provided the target image and the candidate images use the same storage scheme, so that matching is possible): when a fragment belongs to the foreground, the corresponding position in the matrix is set to 1; when it belongs to the background, its value is set to 0. A 0-1 matrix is thus built for each candidate. In the same way, a 0-1 matrix is built for the target in the initialisation image (the first frame) as the target template (target-template attribute matrix); each candidate is then matched against the template, and the candidate image with the highest matching degree is the target image of the current frame. If the tracking target is matched directly based on the attribute matrix corresponding to the target image and the attribute matrices corresponding to the candidate images, then the target-template grayscale matrix T need not be generated in step 2; only the target attribute matrix corresponding to the target image is generated as described above.
The concrete calculation method is as follows:
For each new candidate image c, first set a rectangular sliding window of size w*h, and slide-sample the candidate image c with this sliding window to obtain the fragment set of candidate image c. Based on the coupling dictionary A (which may also be called the sparse dictionary A), solve formula (3) to obtain the sparse coefficient vector β_i of each fragment y_i:

    β_i = argmin_β ‖y_i − Aβ‖₂² + μ‖β‖₁    (3)

i.e. formula (1) with the judgement dictionary replaced by the coupling dictionary A, and with the sample corresponding to each fragment.
Then compute the reconstruction error ε_i = ‖y_i − Aβ_i‖₂² of each fragment y_i, set the attribute value of each fragment according to the magnitude relationship between the reconstruction error ε_i and the preset threshold ε_0, and record the attribute values of the fragments in turn in the attribute matrix S_c.
If the tracking target is matched directly based on the attribute matrix corresponding to the target image and the attribute matrices corresponding to the candidate images, the N attribute matrices S_c are directly matched against the target attribute matrix S, and the candidate image corresponding to the attribute matrix S_c with the highest matching degree is taken as the target image of the current frame.
Meanwhile, the present invention can also track the target image of the current frame by the similarity between the grayscale matrix of the target and the grayscale matrices corresponding to the new candidate images. Specifically, the target-template grayscale matrix T (storing in turn the gray value of each fragment of the target image) must first be determined in step 2 from the target image of the initialisation image. Then, during occlusion handling, a grayscale matrix F_c is also set for each new candidate image, storing the gray value of each fragment of the new candidate image c. To simplify computation, F_c is normalized based on the attribute matrix S_c of the candidate image: when an attribute value in S_c is 0, the value at the corresponding position in F_c is set to 0; when an attribute value in S_c is 1, the value at the corresponding position in F_c is the gray value of the corresponding fragment, i.e. the sum of the pixel values of all points of that fragment. Finally, the new candidate image c whose grayscale matrix F_c is most similar to the grayscale matrix T is taken as the target image of the current frame.
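Normalizing the candidate grayscale matrix F_c by the attribute matrix S_c, as described above, is a single element-wise operation; a sketch with illustrative names, where fragment_sums holds the pixel-value sum of each fragment:

```python
import numpy as np

def candidate_gray_matrix(fragment_sums, S_c):
    """F_c: keep the gray value (pixel sum) of foreground fragments
    (S_c == 1) and zero out background fragments (S_c == 0)."""
    return np.where(S_c == 1, fragment_sums, 0)
```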
Further, the similarity L_c between a candidate image and the target image can be computed according to formula (4), where M is the sum of the gray values of all points of the target image (the target-template grayscale matrix T) and normalizes the computed value; F_c^ij denotes each element of the grayscale matrix F_c of candidate image c, and T_ij each element of the target-template grayscale matrix T. The similarity obtained from the formula lies in the range [0, 1]. Among all candidates, find the candidate with the maximum similarity L_c; this candidate image has the highest matching degree with the target image and can therefore be taken as the tracking target (target image) of the current frame.
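Formula (4) itself is not reproduced in the text; the sketch below therefore assumes a plausible form consistent with the description (an element-wise comparison of F_c and T, normalized by the total gray value M of the target template, yielding a value in [0, 1]). The exact expression is an assumption, not the patent's formula:

```python
import numpy as np

def similarity(F_c, T):
    """Hypothetical form of formula (4): a histogram-intersection style
    score, normalized by M = sum of all gray values of the target
    template, so the result lies in [0, 1]."""
    M = T.sum()
    return float(np.minimum(F_c, T).sum() / M)
```

With this form, a candidate whose foreground fragments reproduce the template's gray values scores near 1, while zeroed (occluded) fragments reduce the score.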
In order to keep the tracking process effective over time, after the target image of the current frame has been determined, an update process is arranged. The concrete update process may be:
Based on the target image of the current frame, update the foreground template D+, the background template D-, the coupling dictionary A, and the target-template attribute matrix S or target-template grayscale matrix T according to step 2.

Or: based on the target image of the current frame, update the coupling dictionary A and the grayscale matrix T according to step 2; and every 5-10 frames, based on the target image of the current frame, randomly sample the sample region of the background template according to step 2, select N' sampled images of the same size as the target image, add them to the background template D-, and delete from D- the N' sampled images longest separated in time from the current frame, where 1 ≤ N' ≤ N_n (in the subsequent tracking process the tracking environment changes continuously; the tracking target changes little, but the background changes considerably, so the background template can be updated periodically). Further, the target image of the current frame can also be added to the foreground template D+, deleting from D+ the sampled image longest separated in time from the current frame.
In sum, owing to the adoption of the above technical scheme, the beneficial effects of the invention are: strong robustness to tracking-target pose changes, ambient illumination changes, occlusion and similar situations, and an improved handling of target occlusion.
Embodiment
To make the object, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the embodiments and accompanying drawings.
See Fig. 1. For an actual current frame, first judge whether the current frame is the first frame. If it is the first frame, the judgement dictionary (foreground template and background template), the coupling dictionary and the other information required by the tracking of the present invention must be obtained from the target image information in the first frame (the position and size of the target in the image, with width and height W and H respectively, obtained in advance). Specifically, for the first frame, first sample the sample regions shown in Fig. 2 to obtain positive and negative templates as the training set. The positive templates are images randomly sampled by translating the target location marked by the central rectangular frame up, down, left and right by 1-5 pixels; N_p cluster centres are then chosen by K-means clustering as the positive templates (foreground template). The negative templates are images sampled at positions relatively far from the marked position (i.e. in a ring-shaped region at a certain distance from the target centre, likewise normalized in size to W*H); N_n images are likewise chosen by the K-means clustering method as the negative templates (background template).
The coupling dictionary A and the grayscale matrix T must also be computed in the first frame, i.e. the fragments of the target template are obtained and the corresponding computations performed. The method is: first set a rectangular sliding window of size w*h and slide-sample the target image left, right, up and down with it, with a left-right sampling interval of w/2 and an up-down sampling interval of h/2. If the height and width of the target image are H and W respectively, the number of samples is [W/(w/2) − 1] * [H/(h/2) − 1] (as shown in Fig. 3, where the number of samples is 8*8). These fragments are first clustered by the K-means clustering method to find the K most representative fragments, which are then stacked by column into column vectors to form the training set, and an approximation method (e.g. convex relaxation) is used to generate the coupling dictionary A. For the target-template grayscale matrix T, each point in T corresponds to the gray value of one local fragment of the target image.
If the current frame is not the first frame, the target location of the current frame must be estimated (determining the target image of the current frame). First sample the current frame by particle filtering to obtain many candidate images as the first candidate image sample set; then choose K cluster centres by K-means clustering, compute the confidence value of each cluster centre according to formulas (1) and (2), and take the cluster centre with the largest confidence value as the candidate sampling centre. Then obtain a number of candidate images again by particle filtering around this sampling centre as the second candidate image sample set. Compute the confidence value of each candidate image in the second candidate image sample set, and take the 20 candidates with the highest confidence values as the new candidate images.
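The particle-filter sampling is used here only as a conventional sampling mechanism; a minimal Gaussian-perturbation stand-in (not a full particle filter; the function name and the spread parameter are illustrative assumptions):

```python
import numpy as np

def sample_particles(center, n=500, pos_std=8.0, rng=None):
    """Sample n candidate states (x, y) around a centre by Gaussian
    perturbation: a minimal stand-in for the particle-filter sampling
    step that produces the candidate image sample set."""
    rng = rng or np.random.default_rng()
    return center + rng.normal(scale=pos_std, size=(n, 2))
```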
For each of the 20 new candidate images, obtain fragments in the same way fragments were obtained in the first frame. For each fragment, use the coupling dictionary A to compute the reconstruction error ε_i of the fragment: when ε_i > ε_0, the fragment belongs to the background; when ε_i ≤ ε_0, the fragment belongs to the foreground, and the value at the corresponding position of the attribute matrix S_c is set to 1. By repeating this computation, the corresponding attribute matrix S_c is obtained for each candidate image. Then, similarly to the way the grayscale matrix T was obtained in the first frame, the grayscale matrix F_c of each candidate image is computed; the difference from the first frame is that the attribute matrix S_c must be combined: when the attribute value of a point in the attribute matrix is 0, the normalized gray value of the corresponding point in the grayscale matrix is also 0; when the attribute value is 1, the normalized gray value is unchanged. With reference to Fig. 3, taking a total of 8*8 fragments as an example, the corresponding attribute matrix S_c, target-template grayscale matrix T and new-candidate grayscale matrix F_c are all two-dimensional 8*8 matrices. If the attribute value of the fragment at (3,4) is 0, the value of S_34 in the attribute matrix S_c is 0, and the value of F_34 in the grayscale matrix F_c is also 0; if the attribute value of the fragment at (4,2) is 1, the value of S_42 in S_c is 1, and the value of F_42 in F_c is the sum obtained by adding up all pixel values of the fragment at (4,2).
Finally, the similarity between each candidate and the template is computed according to formula (4), the candidate image with the highest similarity is taken as the target image of the current frame, and this target image is used to update the foreground template, the background template, the coupling dictionary A and the target-template grayscale matrix T.