CN103413312A - Video target tracking method based on neighborhood components analysis and scale space theory - Google Patents

Video target tracking method based on neighborhood components analysis and scale space theory

Info

Publication number
CN103413312A
Authority
CN
China
Prior art keywords
target
sample
particle
new
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013103619324A
Other languages
Chinese (zh)
Other versions
CN103413312B (en)
Inventor
贾静平 (Jia Jingping)
夏宏 (Xia Hong)
魏振华 (Wei Zhenhua)
Current Assignee
North China Electric Power University
Original Assignee
North China Electric Power University
Priority date
Filing date
Publication date
Application filed by North China Electric Power University
Priority to CN201310361932.4A
Publication of CN103413312A
Application granted
Publication of CN103413412B
Legal status: Expired (fee related)


Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the field of computer vision and relates to a video target tracking method based on neighborhood components analysis (NCA) and scale space theory. The method uses the feature transformation learned by NCA to obtain the features that best distinguish target from background, yielding an optimal linear classifier that separates target pixels from background pixels in any frame; updating the target's feature model is thereby solved from the viewpoint of classifier design. A particle confidence computation based on the multi-scale normalized Laplacian filter function is proposed: the state diversity and convergence of the particle filter keep the algorithm from being trapped in local optima after the target is occluded, while the scale space foundation preserves tracking accuracy. The method localizes both the position and the scale of the target accurately, adapts effectively to illumination and color changes of the target, and handles occlusion robustly.

Description

Video target tracking method based on neighborhood components analysis and scale space theory
Technical field
The invention belongs to the technical field of computer vision and specifically relates to a video target tracking method based on neighborhood components analysis and scale space theory.
Background art
In many computer vision applications, such as intelligent surveillance, robot vision, and human-computer interfaces, moving targets in a video sequence must be tracked. Because target appearance is diverse and target motion is uncertain, achieving robust real-time tracking under varying conditions, together with reliable estimation of the target's changing scale as its distance varies, has long been a research focus. Mean shift is the most-studied tracking method of recent years in academia. It is a nonparametric density gradient-ascent algorithm for finding extrema of a probability density function: it iterates along the gradient of the target state probability function to find the most probable target state in a local image region. Such algorithms are fast and widely praised. However, scale adaptation in mean shift has always been a difficult problem. The traditional mean shift algorithm uses a kernel of fixed bandwidth and cannot adapt to the zooming of the tracked target, which easily causes inaccurate localization. When the chosen bandwidth is too large, the feature probability distribution extracted from the candidate region includes background interference, which degrades localization; conversely, when the bandwidth is too small, only the feature distribution of part of the target is captured, which likewise causes localization error.
In 1998, Bradski et al. improved the basic mean shift method: after searching for the target position, a more accurate target scale is obtained from higher-order moments of the target state probability distribution, giving the scale-adaptive mean shift method (CamShift). However, the method fails easily when the color of the surrounding environment is close to the target color. Dong Rong et al. (Dong Rong, Li Bo, Chen Qimei. A multi-degree-of-freedom mean-shift tracking algorithm based on SIFT features. Control and Decision, 2012(03): 399-402+407) proposed adjusting the bandwidth and orientation of the mean shift kernel using the target scale and orientation obtained from SIFT features, improving the adaptability of mean shift tracking to target scale changes. Because mean shift itself may converge to a local optimum in the state space, this algorithm still cannot guarantee tracking accuracy in theory. Qin Jian et al. (Qin Jian, Zeng Xiaoping, Li Yongming. A boundary-force-based adaptive kernel bandwidth algorithm for Mean-Shift. Journal of Software, 2009(07): 1726-1734) proposed a mean shift tracker that introduces a region likelihood to extract local target information, compares region likelihoods between consecutive frames to build a boundary force, locates boundary points by evaluating that force, and adaptively updates the kernel bandwidth. For the same reason, constrained by the inherent shortcoming of mean shift, this algorithm likewise cannot guarantee tracking accuracy in theory. Similar problems exist in other mean-shift-based algorithms (e.g. Wang Yong, Chen Fenxiong, Guo Hongxin. Kernel-space histogram target tracking with offset correction. Acta Automatica Sinica, 2012(03): 430-436).
In the granted invention patent CN101281648A (a low-complexity scale-adaptive video target tracking method), the algorithm proposed by the inventors takes particle filtering as its framework and uses mean shift in the importance sampling function, improving sampling efficiency and overcoming the tendency of single-scale mean shift to converge to local optima. However, because it does not consider adapting the target's own features when illumination changes significantly, localization accuracy is hard to guarantee in that case.
Neighborhood components analysis (NCA) is a supervised distance metric learning method proposed by Goldberger et al. (J. Goldberger, S. Roweis, G. Hinton, R. Salakhutdinov. Neighbourhood Components Analysis. Advances in Neural Information Processing Systems 17, 2005, 513-520). Its goal is to learn from a training set a linear transformation matrix that maximizes the average leave-one-out classification performance in the transformed space. It measures sample data under a given distance metric and then classifies multi-class clusters. Its purpose is functionally the same as that of the k-nearest-neighbor algorithm: it directly uses the notion of stochastic neighbors to relate a test sample to nearby labeled training samples.
A typical NCA procedure is as follows. Let $x_i$ denote the $i$-th sample in the training set and $c_i$ its class; the training set transformed by the linear transformation matrix $A$ is $AX$. In the new transformed space, the whole data set is considered as stochastic nearest neighbors. Using the squared Euclidean distance in the transformed space, the probability that sample point $x_j$ is the nearest neighbor of $x_i$ under a leave-one-out scheme is defined as

$$p_{ij} = \frac{\exp(-\|Ax_i - Ax_j\|^2)}{\sum_{k \neq i} \exp(-\|Ax_i - Ax_k\|^2)}, \qquad p_{ii} = 0.$$

The classification accuracy of $x_i$ is the accuracy over its same-class neighbor set $C_i = \{\,j \mid c_i = c_j\,\}$:

$$p_i = \sum_{j \in C_i} p_{ij}.$$

The matrix maximizing the classification accuracy is chosen as $A_{new}$, namely $A_{new} = \arg\max_A \sum_i p_i$. For convenience of computation, the objective function $f(A) = \sum_i p_i = \sum_i \sum_{j \in C_i} p_{ij}$ is rewritten as

$$g(A) = \sum_i \log\Bigl(\sum_{j \in C_i} p_{ij}\Bigr).$$

Its gradient can be derived as

$$\frac{\partial g}{\partial A} = 2A \sum_i \Biggl( \sum_k p_{ik}\, x_{ik} x_{ik}^{\mathsf T} - \frac{\sum_{j \in C_i} p_{ij}\, x_{ij} x_{ij}^{\mathsf T}}{\sum_{j \in C_i} p_{ij}} \Biggr),$$

where $x_{ij} = x_i - x_j$. A conjugate gradient multivariate optimization method then yields $A_{new} = \arg\max_A g(A)$. $A_{new}$ makes the training set attain the maximum average leave-one-out classification performance in the new transformed space.
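The objective $g(A)$ and its gradient above can be sketched directly in numpy. This is a minimal illustration, not the patent's solver (which uses conjugate gradient or BFGS); the function and variable names are ours:

```python
import numpy as np

def nca_objective_and_grad(A, X, labels):
    """Log form g(A) of the NCA objective and its gradient.

    A: (d, d) linear transform, X: (N, d) samples, labels: (N,) class ids.
    Memory is O(N^2 d^2); fine for small sketches only.
    """
    AX = X @ A.T                                   # transformed samples
    diff = AX[:, None, :] - AX[None, :, :]         # pairwise differences
    dist2 = np.sum(diff ** 2, axis=2)              # squared Euclidean distances
    np.fill_diagonal(dist2, np.inf)                # leave-one-out: p_ii = 0
    K = np.exp(-dist2)
    p = K / K.sum(axis=1, keepdims=True)           # stochastic neighbour probs p_ij
    same = labels[:, None] == labels[None, :]
    p_i = np.sum(p * same, axis=1)                 # prob. of correct classification
    g = np.sum(np.log(p_i))
    # gradient: 2A * sum_i (sum_k p_ik x_ik x_ik^T - sum_{j in C_i} p_ij x_ij x_ij^T / p_i)
    Xdiff = X[:, None, :] - X[None, :, :]          # x_ij = x_i - x_j
    outer = Xdiff[..., :, None] * Xdiff[..., None, :]
    term1 = np.sum(p[..., None, None] * outer, axis=(0, 1))
    term2 = np.sum((p * same / p_i[:, None])[..., None, None] * outer, axis=(0, 1))
    return g, 2 * A @ (term1 - term2)
```

Any gradient-based multivariate optimizer can then ascend $g$ starting from the identity matrix.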
The multi-scale normalized Laplacian filter comes from the scale space theory proposed by Lindeberg (Lindeberg, T. "Feature Detection with Automatic Scale Selection". International Journal of Computer Vision, 1998, vol. 30(2), pp. 79-116) and is used to detect gray blobs at different scales. A gray image is regarded as a two-dimensional function $f: \mathbb{R}^2 \to \mathbb{R}$. Its linear scale space representation $L$ is defined as the convolution of $f$ with a $D$-dimensional Gaussian kernel $g$ of variable width $t$:

$$L(\cdot;\, t) = g(\cdot;\, t) * f(\cdot), \qquad g(x;\, t) = \frac{1}{(2\pi t)^{D/2}} \exp\Bigl(-\frac{x_1^2 + \cdots + x_D^2}{2t}\Bigr),$$

where $x = (x_1, \ldots, x_D)^{\mathsf T}$ and $t$ is called the scale parameter of $L$. With $L_{xx}$ and $L_{yy}$ denoting the second partial derivatives of $L$ in the horizontal and vertical directions, the value of the multi-scale normalized Laplacian filter function at $(x, y, t)$ can be determined as

$$\mathrm{Laplacian}(x, y, t) = \bigl(t\,(L_{xx}(x, y) + L_{yy}(x, y))\bigr)^2.$$

When a gray image contains several square gray blobs of different sizes, this function attains a maximum at the center of each blob under some scale parameter $t$; by checking the scales and positions of these maxima, the center and scale of each blob can be determined.
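The blob-detection behavior above can be sketched with a sampled Gaussian and central differences. This is a simplified stand-in (the patent's own computation uses the Bessel-function discrete Gaussian described later); all names here are ours:

```python
import numpy as np

def gaussian_kernel1d(t):
    """Sampled Gaussian with variance t, truncated at 4 standard deviations."""
    r = int(np.ceil(4 * np.sqrt(t)))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2.0 * t))
    return k / k.sum()

def smooth(img, t):
    """Separable Gaussian smoothing of a 2-D array (zero-padded borders)."""
    k = gaussian_kernel1d(t)
    out = np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 1, out)

def scale_norm_laplacian(img, t):
    """(t * (L_xx + L_yy))**2 with second derivatives as central differences."""
    L = smooth(img, t)
    Lxx = np.zeros_like(L)
    Lyy = np.zeros_like(L)
    Lyy[1:-1, :] = L[2:, :] - 2 * L[1:-1, :] + L[:-2, :]
    Lxx[:, 1:-1] = L[:, 2:] - 2 * L[:, 1:-1] + L[:, :-2]
    return (t * (Lxx + Lyy)) ** 2
```

Scanning a few scales and taking the global maximum of the response recovers a blob's center, which is exactly how the particle responses are compared later.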
Summary of the invention
The object of the invention is to address the shortcomings of the prior art described above by proposing a video target tracking method based on neighborhood components analysis and scale space theory.
The video target tracking method based on neighborhood components analysis and scale space theory comprises the following steps:
Step 1: In the first frame, determine the initial target rectangle by detection or manual marking, obtain the initial target state, and initialize the particle filter.

The initial target rectangle is determined by an object detection method or by manual marking; its top-left corner is at $(r, c)$ and its width and height are $(w, h)$. The initial scale parameter $s$ of the target is computed as

$$s = \bigl(13 + (w - 34) \times 0.47619\bigr)^2,$$

the target center point is $\bigl(r + \tfrac{h}{2},\ c + \tfrac{w}{2}\bigr)$, and the width-to-height ratio of the target is recorded as $asr = \tfrac{w}{h}$.

Given the initial target state, set the lower sampling bound $lb$ and the upper sampling bound $ub$. Use $N$ particles to describe the diversity of the target state, initialize the weights of all particles uniformly to $\tfrac{1}{N}$, and initialize each component of each particle to a random value uniformly distributed in $[lb, ub]$.
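Step 1 can be sketched as follows. The scale formula and sampling bounds follow the text (the concrete bound values appear in the embodiment); the function and variable names, and the fixed random seed, are ours:

```python
import numpy as np

def init_particles(r, c, w, h, N=800, seed=0):
    """Initial state and uniformly drawn particles for the tracker (a sketch).

    (r, c) is the top-left corner of the target box, (w, h) its size.
    Each particle is a (row, col, scale) state vector.
    """
    s = (13 + (w - 34) * 0.47619) ** 2             # initial scale parameter
    center = (r + h / 2.0, c + w / 2.0)            # target centre point
    asr = w / h                                    # width/height ratio, kept fixed
    lb = np.array([r - 3 * h, c - 3 * w, 0.1 * s])  # lower sampling bound
    ub = np.array([r + 4 * h, c + 4 * w, 2.0 * s])  # upper sampling bound
    rng = np.random.default_rng(seed)
    particles = rng.uniform(lb, ub, size=(N, 3))    # uniform in [lb, ub]
    weights = np.full(N, 1.0 / N)                   # uniform initial weights
    return s, center, asr, lb, ub, particles, weights
```

For a 34-pixel-wide target the formula gives $s = 13^2 = 169$, which is the reference point of the scale mapping.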
Step 2: Sample inside the target rectangle and on the background region surrounding it in the current frame, obtaining a training set X of 2K samples.

Inside the target rectangle, sample image positions from a two-dimensional Gaussian distribution whose mean is the target center point, obtaining K sampling locations; the pixel features at these K locations form the K target-class samples of the training set. Then set up polar coordinates with the center of the ellipse inscribed in the target rectangle as the pole and the direction parallel to the target width as the polar axis, and sample at random in these coordinates: the angle of each sample point is a random number uniformly distributed in $[0, 2\pi)$, and the polar radius is a multiple of the ellipse's radius at the same angle, the multiple being the sum of an exponentially distributed random number and a floating-point constant greater than 1. This yields K new sample points lying outside the target region on the surrounding background; the pixel features at these points form the K background-class samples of the training set. The 2K samples together form the training set X.
Step 3: Perform neighborhood components analysis (NCA) on the training set X of 2K samples, solving for the new linear transformation matrix $A_{new}$ with a vectorized BFGS multivariate optimization algorithm; the 2K training samples are then transformed by $A_{new}$, giving the transformed training set AX.
Step 4: Acquire the next frame, which becomes the current frame. The feature vectors of the pixels at all positions of this frame form the test sample set, which is likewise transformed by $A_{new}$ into the transformed test sample set $S_{new}$. Using the transformed training set AX from the previous frame, classify the transformed test samples $S_{new}$ of this frame and obtain the classification probability $p_{post}$ of each test sample. Taking the probability of belonging to the target class as the pixel value at each test sample's position yields a new gray-scale image, the target probability map $I_{likelihood}$.
Step 5: On the target probability map $I_{likelihood}$, evaluate the multi-scale normalized Laplacian filter function centered at each particle's position. Let the maximum of these values be vmax; compute the confidence of each particle from the normalized distance between its multi-scale normalized Laplacian value and vmax.
Step 6: Update the particle filter and obtain the filter's output state, which gives the new position of the target rectangle in the current frame. If tracking is not yet finished, go to step 2; otherwise stop.
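The particle filter update of step 6 can be sketched as below. The patent does not spell out the resampling scheme or the state estimator; systematic resampling and a weighted-mean output state are our assumptions, and all names are ours:

```python
import numpy as np

def update_filter(particles, weights, confidences, rng):
    """Re-weight particles by confidence, emit the weighted-mean state,
    and resample (systematic resampling; an assumed scheme)."""
    w = weights * confidences
    w /= w.sum()
    state = w @ particles                            # filter output (row, col, scale)
    N = len(particles)
    positions = (rng.random() + np.arange(N)) / N    # stratified positions in [0, 1)
    idx = np.searchsorted(np.cumsum(w), positions)   # systematic resampling
    return state, particles[idx], np.full(N, 1.0 / N)
```

Resetting the weights to $1/N$ after resampling keeps the filter ready for the next frame's confidence update.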
The detailed process of sampling on the background is as follows:

Generate a random number uniformly distributed on $[0, 2\pi)$ as the polar angle $\alpha$.

Generate a random number $\chi$ from the exponential distribution with rate parameter $\lambda = 0.5$ and compute the polar radius

$$\rho = (\chi + \beta)\, r_{ell}(\alpha),$$

where $\beta > 1$ is the parameter controlling the distance of the sample points from the target edge, $w$ and $h$ are the target width and height, and $r_{ell}(\alpha)$ is the radius at angle $\alpha$ of the ellipse inscribed in the target rectangle (semi-axes $w/2$ and $h/2$). The feature of the pixel at the image position corresponding to $(\alpha, \rho)$ forms one background-class sample; repeating this process K times yields K background-class samples.
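The background sampling just described can be sketched as follows. The exact radius formula is garbled in the source, so the standard polar form of the inscribed ellipse is used here under that stated construction; names and the fixed seed are ours, and $\beta = 1.2$ is the value the embodiment uses:

```python
import numpy as np

def sample_background(center, w, h, K=300, beta=1.2, lam=0.5, seed=0):
    """Draw K background positions outside the box-inscribed ellipse (a sketch).

    The ellipse radius at each uniform angle is scaled by (chi + beta),
    chi ~ Exponential(rate=lam), so every sample falls outside the ellipse.
    """
    rng = np.random.default_rng(seed)
    cy, cx = center
    a, b = w / 2.0, h / 2.0                          # ellipse semi-axes
    alpha = rng.uniform(0.0, 2 * np.pi, K)           # uniform polar angles
    chi = rng.exponential(scale=1.0 / lam, size=K)   # rate lambda -> scale 1/lambda
    r_ell = a * b / np.sqrt((b * np.cos(alpha)) ** 2 + (a * np.sin(alpha)) ** 2)
    rho = (chi + beta) * r_ell                       # multiple > 1 of ellipse radius
    rows = cy + rho * np.sin(alpha)
    cols = cx + rho * np.cos(alpha)
    return np.stack([rows, cols], axis=1)
```

In practice the returned positions would be rounded and clipped to the image bounds before reading pixel features.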
The computation of the classification probability $p_{post}$ of each test sample proceeds as follows:

The dimension of each sample is a set value n, and the i-th sample is written as the n-dimensional row vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$, where $x_{in}$ is the n-th component of the sample. With each sample as one row, the linearly transformed training samples of target and background form a $2K \times n$ matrix AX whose first K rows come from the target and last K rows from the background. Build a $2K \times 2$ matrix G whose rows encode the class of the corresponding training sample: if the i-th sample belongs to the target, row i of G is (1, 0); if it belongs to the background, row i of G is (0, 1). Similarly, the test sample set $S_{new}$ forms a $(W \times H) \times n$ matrix, where W and H are the image width and height. With AX as the training set and G as the supervision, classify the transformed test samples $S_{new}$ to obtain the class of each sample together with its probability $p_{post}$ of belonging to the target class. Replacing the pixel value at each test sample's image position with the corresponding $p_{post}$ generates the target probability map $I_{likelihood}$.
The multi-scale normalized Laplacian filter function is computed concretely as follows:

(1) For a continuous scale variable t and a discrete spatial variable n, the discrete Gaussian kernel function $T: \mathbb{Z} \times \mathbb{R}_+ \to \mathbb{R}$ is $T(n; t) = e^{-t} I_n(t)$, where $I_n(t)$ is the modified Bessel function of the first kind. Its second-order difference can be computed as

$$\partial_{xx} T(n;\, t) = T(n-1;\, t) - 2\,T(n;\, t) + T(n+1;\, t),$$

with * denoting one-dimensional discrete signal convolution where it appears below.

(2) For a given scale variable t, convolve each row of the target probability map matrix $I_{likelihood}$ with $\partial_{xx} T(\cdot;\, t)$, then convolve each column of the result matrix with $T(n;\, t)$; denote the result of the second convolution by $L_{xx}(x, y)$. Similarly, convolve each column of $I_{likelihood}$ with $\partial_{xx} T(\cdot;\, t)$, then convolve each row of the result matrix with $T(n;\, t)$; denote the result of the second convolution by $L_{yy}(x, y)$. The value of the multi-scale normalized Laplacian filter function at $(x, y, t)$ is then

Laplacian(x,y,t)=(t(L xx(x,y)+L yy(x,y))) 2
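The Bessel-based discrete Gaussian $T(n; t) = e^{-t} I_n(t)$ of step (1) can be sketched with plain numpy by evaluating $I_n(t)$ through its integral representation (this quadrature approach and the truncation radius are our choices, made to avoid extra dependencies):

```python
import numpy as np

def discrete_gaussian_kernel(t):
    """Discrete Gaussian T(n; t) = exp(-t) * I_n(t), truncated at |n| <= 4*sqrt(t)+1.

    I_n(t) = (1/pi) * integral_0^pi exp(t*cos(theta)) * cos(n*theta) d(theta),
    evaluated by midpoint quadrature (a sketch; scipy.special.ive would also work).
    """
    radius = int(np.ceil(4 * np.sqrt(t))) + 1
    n = np.arange(-radius, radius + 1)
    M = 4000
    theta = (np.arange(M) + 0.5) * np.pi / M
    integrand = np.exp(t * np.cos(theta))[None, :] * np.cos(n[:, None] * theta)
    I_n = integrand.sum(axis=1) / M      # (1/pi) * integral, d(theta) = pi/M
    return n, np.exp(-t) * I_n
```

Unlike a sampled Gaussian, this kernel has total mass exactly 1 and variance exactly t over all integers, which is why Lindeberg's theory prefers it for discrete scale space.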
The particle confidence is computed as follows: obtain the maximum vmax among the multi-scale normalized Laplacian values of all particles, compute the distance d between each particle's multi-scale normalized Laplacian value and vmax, and the variance var of all particles' Laplacian filter values; the confidence conf of the particle is then obtained from

$$conf = e^{-\frac{d^2}{2 \times var}}.$$
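The confidence formula above is a one-liner in practice; this sketch (names ours) maps each particle's Laplacian response to a confidence in (0, 1], with the best particle pinned at 1:

```python
import numpy as np

def particle_confidences(lap_values):
    """Confidence of each particle from its multi-scale normalised
    Laplacian response, relative to the best-responding particle."""
    v = np.asarray(lap_values, dtype=float)
    d = v.max() - v                      # distance to the best response vmax
    var = v.var()                        # variance over all particles' responses
    return np.exp(-d ** 2 / (2.0 * var))
```

Normalizing by the response variance makes the confidence scale-free: a spread-out particle set penalizes stragglers gently, a tight set penalizes them sharply.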
The new position of the target rectangle in the current frame is obtained from the filter output as follows. The state output by the filter is $(\dot r, \dot c, s)$, where $(\dot r, \dot c)$ is the target center and s the scale parameter. The target width is computed by

$$w = \frac{\sqrt{s} - 13}{0.47619} + 34,$$

the height is $h = \frac{w}{asr}$, and the top-left corner of the target is at $\bigl(r = \dot r - \tfrac{h}{2},\ c = \dot c - \tfrac{w}{2}\bigr)$.
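Recovering the output rectangle from the filter state inverts the scale formula of step 1, $s = (13 + (w - 34) \times 0.47619)^2$. A sketch with our own names:

```python
import math

def state_to_box(r_dot, c_dot, s, asr):
    """Filter state (centre row, centre col, scale) -> (top, left, w, h)."""
    w = (math.sqrt(s) - 13.0) / 0.47619 + 34.0   # invert the scale mapping
    h = w / asr                                  # aspect ratio is kept fixed
    return (r_dot - h / 2.0, c_dot - w / 2.0, w, h)
```

Round-tripping through the step 1 formula recovers the original width exactly, which is a convenient sanity check on the two formulas.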
Beneficial effects of the invention: the invention uses a feature transformation to obtain the features that best distinguish target from background, and obtains in any frame an optimal classifier separating target and background pixels, so that the problem of updating the target features is solved from the viewpoint of classifier design. The state diversity and convergence of the particle filter prevent the algorithm from being trapped in local optima after the target is occluded, while the scale space foundation guarantees tracking accuracy. The danger of traditional algorithms such as mean shift sinking into locally optimal states is avoided, and optimal estimates of target position and scale are obtained at the same time. The invention localizes the target's position and scale more accurately, adapts more effectively to illumination and color changes of the target, and handles target occlusion robustly.
Brief description of the drawings

Fig. 1 is the workflow diagram of the tracking method of the invention;

Fig. 2 shows the tracking results of the prior-art CamShift method when the target's color features change rapidly; (a) shows the method starting to drift from the real target as early as the second frame, and (b)-(f) show its results on subsequent typical frames;

Fig. 3 shows the tracking results of the invention, which adaptively updates the best features, when the target's color features change rapidly;

Fig. 4 shows the tracking results of the prior-art L1APG method when target brightness and scale change simultaneously; (a)-(b) show the method still tracking the glass bottle target in the first frames, and (c)-(f) show it deviating in the target height direction once the surface brightness of the target changes;

Fig. 5 shows the tracking results of the invention, which adaptively updates the best features and adjusts the scale, when target brightness and scale change simultaneously; (a) shows the start of tracking; (b)-(c) show the algorithm of the invention still tracking accurately as the target surface darkens and the size shrinks; (d)-(f) show it still tracking accurately as the target surface darkens and then brightens again while the size first grows and then shrinks;

Fig. 6 shows the target probability maps generated by the invention during tracking; (a)-(f) are the maps corresponding to the respective frames of Fig. 5;

Fig. 7 is a schematic diagram of the invention's sampling of target and background during tracking; the area inside the white square is the target region and the area outside it is the background; circles mark target-region sample points and x marks background sample points;

Fig. 8 shows, for an example of the invention, the projection of the training samples onto the first two feature dimensions before neighborhood components analysis; circles are sample points from the target and x are sample points from the background;

Fig. 9 shows the projection onto two dimensions of the transformed feature space after neighborhood components analysis; circles are sample points from the target and x are sample points from the background;

Fig. 10 compares the embodiment of the invention with the prior-art L1APG on target width accuracy;

Fig. 11 compares the embodiment of the invention with the prior-art L1APG on target height accuracy;

Fig. 12 shows the tracking results of the invention under partial and full occlusion of the target; (a)-(c) show tracking before, during, and after the first, partial occlusion, and (d)-(f) show tracking before, during, and after the second, full occlusion.
Embodiment

In this embodiment, target tracking is performed on the "glass cylinder" video sequence. The parameters are set as follows: total number of particles N = 800; state vector dimension stateN = 3, corresponding to the coordinates of the particle position in the width and height directions plus the scale parameter; observation vector dimension measureN = 3; number of target-region samples K = 300; maximum number of NCA iterations maxIter = 100.

During tracking, changes of the environment may cause the brightness and color of the target to change, and the target features should be revised in time to adapt to these changes. The invention therefore uses a neighborhood components analysis (NCA) transformation to compute a linear transformation matrix that projects the pixel features of target and background into a new space in which the Mahalanobis distance between the target-class and background-class samples is maximized, so that a binary classifier trained on the transformed features has a smaller classification error than one trained on the original features. Transforming every pixel feature of a new frame with this matrix and classifying it with this classifier produces a better target probability map than the original features would. In this map the target appears as a region of higher gray values and the background as darker regions. To further improve accuracy, a particle filter is introduced to estimate the target state and, following Lindeberg's scale space theory, a particle confidence computation based on the multi-scale normalized Laplacian filter function is proposed; the danger of traditional algorithms such as mean shift sinking into locally optimal states is avoided, while optimal estimates of target position and scale are obtained at the same time.

Compared with the prior art, the invention uses the feature transformation of neighborhood components analysis (NCA) to obtain the features that best distinguish target from background and an optimal linear classifier separating target and background pixels in any frame, solving the feature update problem from the viewpoint of classifier design; and it proposes a particle confidence computation based on the multi-scale normalized Laplacian filter function, using the state diversity and convergence of the particle filter to avoid being trapped in local optima after the target is occluded while the scale space foundation guarantees tracking accuracy. Against the currently popular mean shift tracker and the recently proposed L1APG tracker (Chenglong, B., et al. Real time robust L1 tracker using accelerated proximal gradient approach. In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, 2012), under identical experimental conditions the invention localizes the target's position and scale more accurately, adapts more effectively to illumination and color changes, and handles target occlusion robustly.
As shown in Fig. 1, the video target tracking method based on neighborhood components analysis and scale space theory of this embodiment comprises the following steps:

Step 1: In the first frame, determine the initial target rectangle by detection or manual marking, obtain the initial target state, and initialize the particle filter.

Initialize the target state: in the first frame of the sequence, determine the target rectangle by an object detection method or by manual marking, obtaining the top-left corner $(r, c)$ and the width and height $(w, h)$ of the rectangle; compute the initial scale parameter s by

$$s = \bigl(13 + (w - 34) \times 0.47619\bigr)^2 \qquad (1)$$

The target center point is $\bigl(r + \tfrac{h}{2},\ c + \tfrac{w}{2}\bigr)$, and the width-to-height ratio of the target is recorded as $asr = \tfrac{w}{h}$. The lower sampling bound of each particle's initial state is set to $lb = [r - 3h,\ c - 3w,\ 0.1s]$ and the upper bound to $ub = [r + 4h,\ c + 4w,\ 2s]$.
Step 2: Sample inside the target rectangle and on the surrounding background in the current frame, obtaining a training set X of 2K samples.

Sample image pixel positions from a two-dimensional Gaussian distribution with mean at the target rectangle center and covariance matrix $\Sigma = \mathrm{diag}\bigl(\tfrac{h}{3},\ \tfrac{w}{3}\bigr)$, and choose K = 300 pixels lying inside the target rectangle as the target-class samples of the training set. Then set up polar coordinates with the center of the ellipse inscribed in the target rectangle as the pole and the direction parallel to the target width as the polar axis, and sample at random: the angle of each sample point is a random number uniformly distributed in $[0, 2\pi)$, and the polar radius is a multiple of the ellipse's radius at the same angle, the multiple being an exponentially distributed random number plus 1.2. In this way K = 300 pixel positions lying outside the target rectangle on the surrounding background are obtained, and they form the background-class samples.
Step 3: Perform neighborhood components analysis (NCA) on the training set X of 2K samples, solving for the new linear transformation matrix $A_{new}$ with a vectorized BFGS multivariate optimization algorithm; the 2K training samples are then transformed by $A_{new}$, giving the transformed training set AX.

Perform NCA on the training set X obtained in step 2, with the initial linear transformation matrix set to the identity, $A = I_3$. Solve for $A_{new}$ with the vectorized BFGS (Broyden-Fletcher-Goldfarb-Shanno) multivariate optimization algorithm; the maximum number of iterations is set to maxIter = 100, and the loop exits when the gradient norm falls below $10^{-3}$. If the optimizer returns successfully with an extremum, the corresponding multivariate parameter is taken as $A_{new}$; otherwise set $A_{new} = A$. Transform the 2K training samples by $A_{new}$, obtaining the transformed training set $AX = A_{new} \times X'$.
Step 4:, obtain the next frame image, become present frame, according to pixels from left to right, from top to bottom order is arranged in position by the RGB three-component of each pixel in this two field picture, forms the matrix S of W * H * 3, the RGB three-component of a pixel of its each behavior, W is picture traverse, H is picture altitude.To S according to A newCarry out conversion, obtain the test sample book collection S after conversion new=A new* S'; Using the AX that obtains in step 3 as training set, to S newAccording to target, background two classes classify, and obtain S newIn each sample (corresponding each pixel) belong to the Probability p of target Post, with the p of each pixel PostValue replaces former RGB component, obtains size and is the destination probability distribution plan I of W * H Likelihood
The probability p_post of each test sample is computed as follows:
Suppose each sample has dimension n; the i-th sample can be expressed as the n-dimensional row vector x_i = (x_i1, x_i2, …, x_in), where x_in is its n-th component. Taking each sample as one row, the linearly transformed training samples of target and background form a 2K × n matrix AX, whose first K rows come from the target and last K rows from the background. Build a 2K × 2 matrix G whose rows encode the class of the corresponding training sample: row i of G is (1, 0) if sample i belongs to the target, and (0, 1) if it belongs to the background. Similarly, the test sample set S_new forms a (W × H) × n matrix, where W and H are the image width and height. With AX as the training set and G as the supervision, classify the transformed test samples S_new; this yields not only the class of each sample of S_new but also the probability p_post that it belongs to the target class. Replacing the pixel value at each test sample's image position with the corresponding p_post generates the target probability map I_likelihood.
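One way to realise this soft classification is a stochastic nearest-neighbour vote in the transformed space, mirroring the weighting NCA uses during training. The sketch below is our illustration, not the patent's code; the dense pairwise-distance matrix makes it memory-hungry for full-size images, so a practical implementation would process pixels in blocks.

```python
import numpy as np

def target_probability_map(AX, G, S_new, W, H):
    """Soft nearest-neighbour classification of every transformed pixel
    feature in S_new (shape (W*H, n)) against the transformed training
    set AX (shape (2K, n)) with one-hot labels G (shape (2K, 2), first
    column = target class).  Returns the H x W target probability map."""
    d2 = ((S_new[:, None, :] - AX[None, :, :]) ** 2).sum(-1)   # (W*H, 2K)
    wts = np.exp(-d2)                       # distance-decaying vote weights
    wts /= wts.sum(axis=1, keepdims=True) + 1e-12
    p_post = wts @ G[:, 0]                  # weighted vote for the target
    return p_post.reshape(H, W)
```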
Step 5: On the target probability map I_likelihood, compute the value of the multi-scale normalized Laplacian filter function centred at each particle's position. Let vmax be the maximum of these values; each particle's confidence is computed from the normalized distance of its filter value to vmax. For a particle with current state (x, y, t), the discrete template of the kernel T(n; t) at scale t is

t_mask = (T(−N; t), …, T(0; t), …, T(N; t)),  T(n; t) = e⁻ᵗ Iₙ(t),

where Iₙ(t) is the modified Bessel function of the first kind. Its second-order differential ∂²T/∂n² has the discrete template

d2t_mask = t_mask * (1, −2, 1).

At the particle's scale, convolve the target probability map I_likelihood along its two directions in turn with the discrete template d2t_mask of ∂²T/∂n² and the discrete template t_mask of T(n; t), obtaining the first result matrix L_xx(x, y) and the second result matrix L_yy(x, y).

The value of the multi-scale normalized Laplacian filter function at the particle position is then computed by formula (2): Laplacian(x, y, t) = (t(L_xx(x, y) + L_yy(x, y)))²    (2)
Obtain the maximum vmax of the multi-scale normalized Laplacian filter values of all particles, compute the distance d between each particle's filter value and vmax, and the variance var of all particles' filter values; the confidence conf of the particle is then obtained by formula (3):

conf = e^(−d² / (2 × var))    (3)
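A sketch of the filter and confidence computation: the discrete Gaussian kernel T(n; t) = e⁻ᵗIₙ(t) is available in SciPy as `scipy.special.ive`; the template truncation radius and the function names are our assumptions.

```python
import numpy as np
from scipy.special import ive            # ive(n, t) = exp(-t) * I_n(t)
from scipy.ndimage import convolve1d

def normalized_laplacian(I, t, half=None):
    """(t * (L_xx + L_yy))**2 on the probability map I at scale t,
    using the discrete Gaussian kernel T(n; t) = e^{-t} I_n(t)."""
    if half is None:
        half = max(1, int(np.ceil(4.0 * np.sqrt(t))))    # truncation radius
    t_mask = ive(np.arange(-half, half + 1), t)          # discrete Gaussian
    d2t_mask = np.convolve(t_mask, [1.0, -2.0, 1.0])     # second difference
    Lxx = convolve1d(convolve1d(I, d2t_mask, axis=1), t_mask, axis=0)
    Lyy = convolve1d(convolve1d(I, d2t_mask, axis=0), t_mask, axis=1)
    return (t * (Lxx + Lyy)) ** 2

def particle_confidences(values):
    """conf = exp(-d^2 / (2 var)), with d the distance of each particle's
    filter value to the maximum value vmax over all particles."""
    values = np.asarray(values, dtype=float)
    d = values.max() - values
    var = values.var()
    return np.ones_like(values) if var == 0 else np.exp(-d ** 2 / (2.0 * var))
```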
Step 6: Update the particle filter and obtain its output state, which gives the new position of the target's bounding rectangle in the current frame; if tracking is not finished, go to step 2, otherwise stop.
The target rectangle is obtained as follows:
The filter output state is (ṙ, ċ, s), where (ṙ, ċ) is the target centre and s the scale parameter. The width of the target is computed by formula (4):

w = (√s − 13) / 0.47619 + 34    (4)

The height is h = w / asr, and the top-left corner of the target is (r = ṙ − h/2, c = ċ − w/2).
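Inverting the scale parameterisation s = (13 + (w − 34) × 0.47619)² from step 1 recovers the rectangle from the filter state; a sketch (function name is ours):

```python
import numpy as np

def state_to_rect(r_dot, c_dot, s, asr):
    """Recover (top-left r, c, width w, height h) from the filter state:
    centre (r_dot, c_dot) and scale s = (13 + (w - 34) * 0.47619)**2."""
    w = (np.sqrt(s) - 13.0) / 0.47619 + 34.0    # invert the scale mapping
    h = w / asr                                  # restore the aspect ratio
    return r_dot - h / 2.0, c_dot - w / 2.0, w, h
```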
Target tracking was carried out on several colour test sequences according to the above steps; the tracked targets undergo significant changes in colour, illumination or size during tracking, and occlusion also occurs.
Subfigures (a)–(f) of Figs. 2, 3, 4, 5 and 12 depict the position and size changes of the targets in chronological order.
Fig. 3 shows the tracking results obtained by the present embodiment on the "colour-changing square" sequence; the tracked target is a square, and the tracking result is indicated by a light box. During its motion, the square's colour changes rapidly from pure red to pure blue over about 60 frames. Fig. 2 shows the failure of Camshift, a typical mean-shift-class algorithm: Camshift cannot adapt to such a drastic change of the target's colour feature (black ellipse in Fig. 2; the colour change is visible only in a colour print, not on an ordinary printer) and quickly loses the target because of the rapid colour change. The present embodiment, by contrast, searches out in the current frame the optimal transformed feature for distinguishing target from background, adapts well to the colour change, and tracks the square accurately.
Fig. 5 shows the tracking results obtained by the present embodiment on the "glass bottle" sequence; the tracked target is a glass bottle, indicated by a white box. During tracking the bottle repeatedly enters and leaves shadows in the background, causing sharp changes of its surface brightness, while its size in the image also changes significantly with its distance to the camera. The L1APG tracking algorithm, newly published in 2012, can track the bottle but gradually accumulates a larger error in the scale parameter, which finally degrades the localization accuracy, as shown in Fig. 4. Figs. 10 and 11 plot, against video frame number, the true target width and height, the width and height obtained by the present embodiment, and those obtained by L1APG. From these curves: the mean error of the width obtained by the present embodiment against the true width is 4.474529, with variance 13.178290, whereas for L1APG it is 8.821549, with variance 106.027808; the mean height error of the present embodiment is 1.009198, with variance 12.256838, whereas for L1APG it is 41.400282, with variance 689.699398, clearly larger than the result of the present embodiment. The present embodiment thus tracks the target's size changes well, adapts to the brightness changes caused by illumination, and achieves higher target localization accuracy. Fig. 6 shows the target probability map corresponding to each frame of Fig. 5 during processing: the neighbourhood components analysis highlights the target region while suppressing the background immediately surrounding the target, providing the conditions for improved localization accuracy. Fig. 7 is a schematic of the target and background sampling during tracking: the white square encloses the target region, outside it is background; circles mark target-region sample points and crosses mark background sample points. The target samples are dense near the target centre and sparse near the target boundary, effectively reducing the errors and interference that may be introduced near the boundary; the background samples are dense immediately around the target and sparse farther away, ensuring that the trained classifier has the best separating effect between the target and its immediately adjacent background. Fig. 8 projects each training sample onto the first two dimensions of feature space before neighbourhood components analysis (circles: target samples; crosses: background samples): because some background regions resemble the target's features, the two sample sets overlap and are hard to separate. Fig. 9 shows the same two-dimensional projection after the transformation: the between-class distance has clearly grown and the two classes are easy to separate.
Fig. 12 shows the tracking results obtained by the present embodiment on a "cyclist" sequence; the target is indicated by a white box. The target is occluded twice in succession by background interference, yet each time the occlusion ends the present method continues to track the target, showing that the method copes successfully with target occlusion.

Claims (6)

1. A video target tracking method based on neighbourhood components analysis and scale-space theory, characterized in that the method comprises the following steps:
Step 1: in the first frame, determine the initial location of the target by a detected or manually marked rectangle, obtain the corresponding initial state of the target, and initialize the particle filter;
Determine the initial target rectangle by an object detection method or by manual marking; the top-left corner of the target rectangle is (r, c), and the width and height of the rectangle are (w, h). The initial scale parameter s of the target is computed by:
s = ((13 + (w − 34) × 0.47619))²
The target centre point is (r + h/2, c + w/2);
Record the target's width-to-height ratio: asr = w/h;
Set the lower bound lb and the upper bound ub for sampling the target's initial state;
Use N particles to describe the diversity of the target state, initialize the weights of all particles uniformly to 1/N, and initialize each component of each particle to a random value uniformly distributed in the range [lb, ub];
Step 2: sample within the target rectangle in the current frame and on the background region outside the target rectangle, obtaining a training set X of 2K samples;
Within the target rectangle, sample the target area from a two-dimensional Gaussian distribution whose expectation is the target centre, obtaining K sampling positions; the features of the target pixels at these K positions form the K target-class samples of the training set. Set up polar coordinates centred at the minimum circumscribed ellipse of the target rectangle, with the pole axis parallel to the target width, and sample randomly in these polar coordinates: the angle of each sample is a random number uniformly distributed in [0, 2π), and the pole radius is the radius of the ellipse at that angle multiplied by the sum of an exponentially distributed random number and a floating-point number greater than 1. Sampling in this way yields K new sampling positions lying outside the target area on the surrounding background; the features of the background pixels at these new positions form the K background-class samples of the training set. The 2K samples so obtained form the training set X;
Step 3: perform neighbourhood components analysis (NCA) on the training set X of 2K samples, using the vector BFGS multivariate optimization algorithm to solve for the new linear transformation matrix A_new; transform the 2K training samples by A_new to obtain the transformed training sample set AX;
Step 4: acquire the next frame, which becomes the current frame; form the test sample set from the vectors of pixel features at all positions of this frame and likewise transform it by A_new to obtain the transformed test sample set S_new; classify S_new using the transformed training sample set AX from the previous frame, obtaining the probability p_post of each test sample; taking the probability of belonging to the target class as the pixel value at each test sample's position yields a new grey-scale image, the target probability map I_likelihood;
Step 5: on the target probability map I_likelihood, compute the value of the multi-scale normalized Laplacian filter function centred at each particle's position; let vmax be the maximum of these values, and compute each particle's confidence from the normalized distance of its filter value to vmax;
Step 6: update the particle filter and obtain its output state, which gives the new position of the target's bounding rectangle in the current frame; if tracking is not finished, go to step 2, otherwise stop.
2. The video target tracking method based on neighbourhood components analysis and scale-space theory according to claim 1, characterized in that the background sampling proceeds as follows:
Generate a random number uniformly distributed on the interval [0, 2π) as the polar angle α;
Generate a random number χ from an exponential distribution with rate parameter λ = 0.5 and compute the pole radius
ρ = (χ + β) × ρ_e(α),
where ρ_e(α) is the pole radius of the point of the circumscribed ellipse at angle α, β > 1, w and h are the width and height of the target, and β is the parameter controlling the distance of the sampled points from the target edge;
The features of the pixel at the image coordinates corresponding to the polar coordinates (ρ, α) form one background-class sample; repeating this process K times yields K background-class samples.
3. The video target tracking method based on neighbourhood components analysis and scale-space theory according to claim 1, characterized in that the probability p_post of each test sample is computed as follows:
Each sample has a set dimension n; the i-th sample is expressed as the n-dimensional row vector x_i = (x_i1, x_i2, …, x_in), where x_in is its n-th component. Taking each sample as one row, the linearly transformed training samples of target and background form a 2K × n matrix AX, whose first K rows come from the target and last K rows from the background. Build a 2K × 2 matrix G whose rows encode the class of the corresponding training sample: row i of G is (1, 0) if sample i belongs to the target, and (0, 1) if it belongs to the background. Similarly, the test sample set S_new forms a (W × H) × n matrix, where W and H are the image width and height. With AX as the training set and G as the supervision, classify the transformed test samples S_new, obtaining the class of each sample of S_new and the probability p_post that it belongs to the target class; replacing the pixel value at each test sample's image position with the corresponding p_post generates the target probability map I_likelihood.
4. The video target tracking method based on neighbourhood components analysis and scale-space theory according to claim 1, characterized in that the multi-scale normalized Laplacian filter function is computed as follows:
(1) The Gaussian kernel function T: ℤ × ℝ⁺ → ℝ with continuous scale variable t and discrete space variable n is T(n; t) = e⁻ᵗ Iₙ(t), where Iₙ(t) is the modified Bessel function of the first kind; its second-order differential ∂²T/∂n² can be computed by differences:
∂²T/∂n² = T(n; t) * (1, −2, 1),
where * denotes one-dimensional discrete signal convolution;
(2) Given the scale variable t, convolve every row of the target probability map matrix I_likelihood with ∂²T/∂n², then every column of the result matrix with T(n; t); denote the result of the second convolution by L_xx(x, y). Similarly, convolve every column of I_likelihood with ∂²T/∂n², then every row of the result matrix with T(n; t); denote the result of the second convolution by L_yy(x, y). The value of the multi-scale normalized Laplacian filter function at (x, y, t) is:
Laplacian(x, y, t) = (t(L_xx(x, y) + L_yy(x, y)))²
5. The video target tracking method based on neighbourhood components analysis and scale-space theory according to claim 1, characterized in that the particle confidence is computed as follows:
Obtain the maximum vmax of the multi-scale normalized Laplacian filter values of all particles, compute the distance d between each particle's filter value and vmax, and the variance var of all particles' filter values; the confidence conf of the particle is then:
conf = e^(−d² / (2 × var)).
6. The video target tracking method based on neighbourhood components analysis and scale-space theory according to claim 1, characterized in that the new position of the target's bounding rectangle in the current frame is obtained as follows:
The filter output state is (ṙ, ċ, s), where (ṙ, ċ) is the target centre and s the scale parameter;
The width of the target is computed by:
w = (√s − 13) / 0.47619 + 34;
The height is h = w / asr, and the top-left corner of the target is (r = ṙ − h/2, c = ċ − w/2).
CN201310361932.4A 2013-08-19 2013-08-19 Based on the video target tracking method of neighbourhood's constituent analysis and Scale-space theory Expired - Fee Related CN103413312B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310361932.4A CN103413312B (en) 2013-08-19 2013-08-19 Based on the video target tracking method of neighbourhood's constituent analysis and Scale-space theory


Publications (2)

Publication Number Publication Date
CN103413312A true CN103413312A (en) 2013-11-27
CN103413312B CN103413312B (en) 2016-01-20

Family

ID=49606317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310361932.4A Expired - Fee Related CN103413312B (en) 2013-08-19 2013-08-19 Based on the video target tracking method of neighbourhood's constituent analysis and Scale-space theory

Country Status (1)

Country Link
CN (1) CN103413312B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104715249A (en) * 2013-12-16 2015-06-17 株式会社理光 Object tracking method and device
CN105321188A (en) * 2014-08-04 2016-02-10 江南大学 Foreground probability based target tracking method
CN106127811A (en) * 2016-06-30 2016-11-16 西北工业大学 Target scale adaptive tracking method based on context
WO2018068718A1 (en) * 2016-10-13 2018-04-19 夏普株式会社 Target tracking method and target tracking device
CN108419249A (en) * 2018-03-02 2018-08-17 中南民族大学 3-D wireless sensor network cluster dividing covering method, terminal device and storage medium
CN111105441A (en) * 2019-12-09 2020-05-05 嘉应学院 Related filtering target tracking algorithm constrained by previous frame target information
CN112614158A (en) * 2020-12-18 2021-04-06 北京理工大学 Sampling frame self-adaptive multi-feature fusion online target tracking method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
NAN JIANG等: "《Computer Vision and Pattern Recognition(CVPR), 2011 IEEE Conference》", 25 June 2011 *
JIANG NAN: "Adaptive Distance Metric and Robust Video Motion Analysis", China Doctoral Dissertations Full-text Database *
JIA Jingping et al.: "Adaboost Target Tracking Algorithm", Pattern Recognition and Artificial Intelligence *
JIA Jingping et al.: "Target Tracking Algorithm Based on Support Vector Machine and Trust Region", Acta Scientiarum Naturalium Universitatis Pekinensis (Online Preprint) *


Also Published As

Publication number Publication date
CN103413312B (en) 2016-01-20


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160120

Termination date: 20190819

CF01 Termination of patent right due to non-payment of annual fee