CN107844739B — Robust target tracking method based on adaptive simultaneous sparse representation — Google Patents

Robust target tracking method based on adaptive simultaneous sparse representation

Info

Publication number: CN107844739B (application CN201710625586.4A; other version CN107844739A)
Authority: CN (China)
Prior art keywords: template, tracking, target, sparse, model
Legal status: Active (granted)
Inventors: 樊庆宇, 李厚彪, 羊恺, 王梦云, 陈鑫, 李滚
Assignee (original and current): University of Electronic Science and Technology of China

Classifications

    • G06V 20/41 — Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/42 — Higher-level, semantic clustering, classification or understanding of video scenes of sport video content
    • G06F 18/23213 — Non-hierarchical clustering techniques using statistics or function optimisation, with a fixed number of clusters, e.g. K-means clustering
    • G06T 7/251 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving models

Abstract

The invention discloses a robust target tracking method based on adaptive simultaneous sparse representation, comprising the following steps: S1, adaptively establishing a simultaneous sparse tracking model according to the magnitude of the Laplacian noise energy; S2, solving the established tracking model; and S3, updating the template. The tracking method has good tracking and recognition performance and strong resistance to interference, and achieves accurate, stable and real-time target tracking.

Description

Robust target tracking method based on self-adaptive simultaneous sparse representation
Technical Field
The invention belongs to the technical field of computer image processing, relates to a target tracking method, and more particularly relates to a robust target tracking method based on adaptive simultaneous sparse representation.
Background
Object tracking plays an important role in computer vision. With the availability of high-quality computers and cameras and the need for automatic video analysis, there is strong interest in object tracking. Its main tasks include detection of moving objects of interest, continuous tracking from frame to frame, and behavior analysis of the tracked objects. Current applications of object tracking include motion recognition, video retrieval, human-computer interaction, traffic monitoring, vehicle navigation, and the like.
Although many tracking algorithms have been proposed, target tracking still faces many challenges. The actual tracking process often encounters disturbances such as target pose change, illumination change, background clutter, and partial or complete occlusion, and these problems frequently cause tracking failure (tracking drift). Long-term occlusion is especially damaging. The impact of illumination change depends on the environment of the target: the greater the change, the greater the effect on tracking. Background clutter and pose change degrade tracking accuracy and can cause drift, and video noise accumulates tracking error until tracking finally fails.
For target tracking, handling appearance variation of the target is a basic and challenging problem. Appearance variation can be divided into intrinsic and extrinsic changes: pose change is an intrinsic appearance change, while illumination change, background clutter, partial occlusion, and complete occlusion are extrinsic appearance changes. Coping with these appearance changes requires an adaptive tracking method, i.e., an online learning method. Online learning methods can be roughly classified into two types: generative approaches and discriminative approaches. A generative method searches for the region most similar to the tracked target, while a discriminative method can be viewed as a binary classification problem whose main purpose is to train a classifier on known training samples to separate the target from the background. Although both achieve reliable tracking to some degree, each has its own shortcomings. First, discriminative methods place high demands on feature extraction, which makes them sensitive to noise during tracking, and they may fail on heavily corrupted targets; generative methods cannot accurately locate a region similar to the target against a cluttered background, so tracking failure occurs easily. Second, discriminative methods need sufficiently large training sets: good samples improve the classifier while bad samples weaken it, and introducing bad samples into the classifier degrades tracking. Generative methods are sensitive to the template: once an occluded target is mistakenly introduced into the template, tracking fails. Hence neither class of methods is sufficiently robust for tracking in real scenes.
Disclosure of Invention
The present invention is directed to solving the above problems, and its object is to provide a robust target tracking method capable of continuously and accurately tracking a target object under the various challenges encountered in video, such as illumination change, scale change, occlusion, deformation, motion blur, fast motion, rotation, background clutter, and low resolution.
The object of the invention is realized by a robust target tracking method based on adaptive simultaneous sparse representation, comprising the following steps:
s1, adaptively establishing a simultaneous sparse tracking model according to the size of the Laplace noise energy;
s2, solving the established tracking model;
and S3, updating the template.
Further, step S1 is performed as follows: compare the Laplacian noise energy $\|S\|_2$ with a given noise energy threshold $\tau$, and adaptively establish the simultaneous sparse tracking model according to the comparison result:

When $\|S\|_2 \le \tau$, the simultaneous sparse tracking model is:

$$\min_{X}\ \tfrac{1}{2}\|Y - DX\|_F^2 + \lambda_1\|X\|_{1,1}, \quad \text{s.t. } X \ge 0 \tag{1}$$

When $\|S\|_2 > \tau$, the simultaneous sparse tracking model is:

$$\min_{X,S}\ \tfrac{1}{2}\|Y - DX - S\|_F^2 + \lambda_1\|X\|_{1,1} + \lambda_2\|S\|_{1,1}, \quad \text{s.t. } X \ge 0 \tag{2}$$

where $D = [T, I]$ is the tracking template, $T = [t_1, t_2, \dots, t_n] \in \mathbb{R}^{d \times n}$ is the target template set, $d$ denotes the image dimension, $n$ denotes the number of template basis vectors, each column of $T$ is a zero-mean vector, $Y$ is the candidate target set, $\lambda_1$ and $\lambda_2$ are the regularization parameters of the model, $X$ is the sparse coefficient matrix, and $S$ is the Laplacian noise. The term $\tfrac{1}{2}\|Y - DX - S\|_F^2$ is the fidelity term of the sparse model, $\|X\|_{1,1}$ is the $\ell_{1,1}$ norm of the coefficient matrix, and $\|S\|_{1,1}$ characterizes the Laplacian noise energy.
Further, step S2 comprises:
S21, calculating the similarity between the tracking target $y_t$ and the mean of the template $T$, denoted sim;
S22, comparing sim with the cosine-angle threshold $\alpha$ and adaptively selecting a model for tracking:
when sim < $\alpha$, the tracking model of equation (1) is adopted and solved by the alternating direction method of multipliers (ADMM) to obtain the sparse coefficient $X$;
when sim ≥ $\alpha$, the tracking model of equation (2) is adopted and solved by ADMM to obtain the sparse coefficient $X$ and the Laplacian noise $S$.
Further, in step S21, the similarity between the tracking target and the template mean is calculated according to:

$$\mathrm{sim} = \arccos\!\left(\frac{\langle y, c\rangle}{\|y\|_2\,\|c\|_2}\right) \tag{3}$$

where $c$ is the mean of the template $T$ and $y$ is the tracked target; sim is the arc cosine of the normalized inner product of the target and the template mean, i.e. the cosine angle between them.
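Equation (3) can be sketched as follows (an illustrative sketch, with a clip added to guard against floating-point rounding outside [-1, 1]):

```python
import numpy as np

def template_similarity(y, T):
    """sim = arccos(<y, c> / (||y||_2 ||c||_2)), c the column-wise template mean.
    Small sim means the candidate is close (in angle) to the template mean."""
    c = T.mean(axis=1)
    cos_angle = (y @ c) / (np.linalg.norm(y) * np.linalg.norm(c))
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
```

A candidate identical to the template mean gives sim = 0; an orthogonal one gives sim = π/2, which would exceed any reasonable threshold α and trigger the noise-aware model (2).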
Further, when $\|S\|_2 \le \tau$, the template $T$ is updated.
Further, the template updating method comprises:
S31, performing singular value decomposition on the current template $T$ and the tracked target $y$ respectively:

$$T = USV^T, \qquad y = usv^T \tag{4}$$

S32, incrementally updating $U, S, V$ with the singular vectors $u, s, v$ to obtain new singular vectors $U^*, S^*, V^*$; the new template is then represented as:

$$T^* = U^* S^* V^{*T} \tag{5}$$

Further, the method also includes step S33: training the template with the unsupervised K-means learning method, with $K$ given as the number of initial classes:

$$J = \sum_{i}\sum_{k=1}^{K} r_{ik}\,\|t_i^* - u_k\|^2 \tag{6}$$

where $i$ denotes the $i$-th sample; when $t_i^*$ belongs to class $k$, $r_{ik} = 1$, otherwise $r_{ik} = 0$; $u_k$ is the mean of all samples belonging to class $k$; and $J$ denotes the sum of distances from the sample points to the means of their classes. Through $u_k$ the mean of all samples of the $k$-th class is obtained, and the dimension of the original template is reduced. The updated template is thus:

$$T_{\mathrm{new}} = [u_1, u_2, \dots, u_k] \tag{7}$$
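A minimal NumPy sketch of the K-means reduction in step S33 (equations (6)–(7)); the initialization scheme and iteration count are assumptions, not taken from the patent:

```python
import numpy as np

def kmeans_reduce_template(T, k, iters=20, seed=0):
    """Cluster the template columns t_i into k classes and return
    T_new = [u_1, ..., u_k], the class means (equation (7))."""
    rng = np.random.default_rng(seed)
    cols = T.T                                    # samples are template columns
    centers = cols[rng.choice(len(cols), k, replace=False)]
    for _ in range(iters):
        # r_ik = 1 for the nearest center, 0 otherwise (equation (6))
        d = ((cols[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = cols[labels == j].mean(0)
    return centers.T                              # d x k reduced template
```

Replacing an n-column template by k < n class means is what removes the redundant template vectors mentioned in the text.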
The invention also provides an adaptive sparse representation fast target tracking method, comprising the following steps:
A. establishing the target tracking model:

$$\min_{X,S}\ \tfrac{1}{2}\|Y - TX - S\|_F^2 + \lambda_1\|X\|_{1,1} + \lambda_2\|S\|_{1,1}, \quad \text{s.t. } X \ge 0 \tag{8}$$

B. obtaining the optimal tracking target by an alternating iteration method;
C. updating the target template $T$ and obtaining a new template $T^*$;
D. returning the best tracking target and target template set, and continuing target tracking on the next frame.
Preferably, step C comprises:
C1, calculating the similarity between the tracking target and the template mean, denoted sim;
C2, comparing the similarity sim with the cosine-angle thresholds $\alpha$ and $\beta$: if sim < $\alpha$, execute step C3 to continue updating the template; if $\alpha \le$ sim $\le \beta$, set $y \leftarrow m$ and execute step C3 to update the template; if sim > $\beta$, the tracking target is essentially dissimilar to the template, indicating that the target is severely contaminated by noise, and the template is not updated;
C3, performing singular value decomposition on the template $T$ to obtain $T = U\Sigma V^T$, incrementally updating the left singular vector $U$ of the template $T$ to obtain the new singular vector $U^*$, and combining equations (5) and (6) to compute the new template $T^*$.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention provides a new template selection and updating method under the sparse representation framework. The method emphasizes real-time updating of the template; because it is the left singular vector of the template that is updated in real time, the error introduced by updating can be kept very low, in contrast to methods that denoise the target and then introduce a new target template. The method also introduces the K-means technique into template updating, which reduces the dimensionality of the template, effectively removes redundant template vectors, improves the real-time performance of tracking, and weakens the influence of noise. Traditional template updating methods easily introduce large noise errors and cause much uncertainty when tracking the target in the next frame;
(2) The target tracking model of the invention fully considers the influence of both Gaussian noise and Laplacian noise, and the regularization term of the model is selected adaptively according to the magnitude of the Laplacian noise energy, which improves both tracking accuracy and real-time performance;
(3) The invention incorporates the ADMM algorithm into the solution of the tracking model, and by controlling the regularization parameters the solution of the model is made more stable.
Detailed Description
The robust target tracking method based on adaptive simultaneous sparse representation according to the present invention will be further described with reference to specific embodiments.
A robust target tracking method based on adaptive simultaneous sparse representation comprises the following steps:
S1, adaptively establishing a simultaneous sparse tracking model according to the Laplacian noise energy.
Compare the Laplacian noise energy $\|S\|_2$ with a given noise energy threshold $\tau$, and adaptively establish the simultaneous sparse tracking model according to the comparison result:

When $\|S\|_2 \le \tau$, the simultaneous sparse tracking model is:

$$\min_{X}\ \tfrac{1}{2}\|Y - DX\|_F^2 + \lambda_1\|X\|_{1,1}, \quad \text{s.t. } X \ge 0 \tag{1}$$

When $\|S\|_2 > \tau$, the simultaneous sparse tracking model is:

$$\min_{X,S}\ \tfrac{1}{2}\|Y - DX - S\|_F^2 + \lambda_1\|X\|_{1,1} + \lambda_2\|S\|_{1,1}, \quad \text{s.t. } X \ge 0 \tag{2}$$

where $D = [T, I]$ denotes the tracking template and $I$ denotes the trivial template. Given the target template image set $T = [t_1, \dots, t_n] \in \mathbb{R}^{d \times n}$, $d$ denotes the image dimension, $n$ the number of template basis vectors, and each column of $T$ is a zero-mean vector. $Y = [y_1, y_2, \dots, y_m]$ denotes all candidate targets. Assuming the noise follows a mixed Laplacian-Gaussian distribution:

$$Y = TX + S + E \tag{9}$$

where $S$ denotes the Laplacian noise, $E$ the Gaussian noise, and $X$ the sparse coefficient matrix. $\lambda_1$ and $\lambda_2$ are the regularization parameters of the model, and $\tfrac{1}{2}\|Y - DX - S\|_F^2$ is the fidelity term of the sparse model; after taking Laplacian noise into account, the model has this deformed fidelity term. $\|X\|_{1,1}$ is the $\ell_{1,1}$ norm of the coefficient matrix, which better extracts the similarity between particles and effectively removes redundant template information, and $\|S\|_{1,1}$ characterizes the Laplacian noise energy.
Although existing sparse-representation tracking algorithms cope to some extent with partial occlusion, illumination change, pose change, background clutter and so on, their models are built on the assumption that the noise follows a Gaussian distribution. This treats the noise distribution too simply, so tracking failure may occur under more complex noise. The target tracking model of the invention fully considers both Gaussian and Laplacian noise, and the regularization term is selected adaptively according to the magnitude of the Laplacian noise energy, improving both tracking accuracy and real-time performance.
S2, solving the established tracking model.
Preferably, step S2 comprises:
S21, calculating the similarity between the tracking target $y_t$ and the template mean, denoted sim;
preferably, the similarity between the tracked target and the template mean is calculated according to:

$$\mathrm{sim} = \arccos\!\left(\frac{\langle y, c\rangle}{\|y\|_2\,\|c\|_2}\right) \tag{3}$$

S22, comparing sim with the cosine-angle threshold $\alpha$ and adaptively selecting a model for tracking:
when sim < $\alpha$, the tracking model of equation (1) is adopted and solved by the alternating direction method of multipliers (ADMM) to obtain the sparse coefficient $X$;
when sim ≥ $\alpha$, the tracking model of equation (2) is adopted and solved by ADMM to obtain the sparse coefficient $X$ and the noise $S$.
For a clearer explanation of the solution of the established tracking model, the solution of equation (2) is illustrated below.
The objective functions of equations (1) and (2) are convex optimization problems, so they can be solved after converting the constraints away. The alternating direction method of multipliers (ADMM) is a classical splitting method with the advantages of a stable solution and fast convergence; using ADMM, problem (2) is solved as follows.
First, convert the constrained problem into an unconstrained one:

$$\min_{X,S}\ \tfrac{1}{2}\|Y - DX - S\|_F^2 + \lambda_1\|X\|_{1,1} + \lambda_2\|S\|_{1,1} + \tau_+(X) \tag{10}$$

where $\tau_+$ is the indicator function of the non-negative orthant ($x_i$ denotes the $i$-th row of $X$; if $x_i$ is non-negative then $\tau_+(x_i) = 0$, otherwise $\tau_+(x_i) = +\infty$). Thus the optimization problem (2) has the following equivalent form:

$$\min_{X,S,V}\ \tfrac{1}{2}\|Y - V_1 - S\|_F^2 + \lambda_1\|V_2\|_{1,1} + \lambda_2\|S\|_{1,1} + \tau_+(V_3)$$
$$\text{s.t. } DX = V_1,\ X = V_2,\ X = V_3. \tag{11}$$
where $V_1, V_2, V_3$ are auxiliary splitting variables. Equation (11) can be written compactly as

$$\min_{X,S,V}\ f(V) + \lambda_2\|S\|_{1,1}, \quad \text{s.t. } GX = V \tag{12}$$

where

$$G = \begin{bmatrix} D \\ I \\ I \end{bmatrix}, \qquad V = \begin{bmatrix} V_1 \\ V_2 \\ V_3 \end{bmatrix} \tag{13}$$

$$f(V) = \tfrac{1}{2}\|Y - V_1 - S\|_F^2 + \lambda_1\|V_2\|_{1,1} + \tau_+(V_3) \tag{14}$$

The augmented Lagrangian function of equation (11) is

$$\mathcal{L}(X, S, V, U) = f(V) + \lambda_2\|S\|_{1,1} + \tfrac{\beta}{2}\|GX - V - U\|_F^2 \tag{15}$$

where $\beta$ denotes the penalty parameter and $U = [U_1, U_2, U_3]^T$ is the scaled Lagrange multiplier.
Equation (15) can be decomposed into sub-optimization problems, namely the $X$ sub-problem, the $S$ sub-problem, and the $V$ sub-problems, which are solved alternately:

$$X^{+} = \arg\min_X\ \tfrac{\beta}{2}\|GX - V - U\|_F^2$$
$$S^{+} = \arg\min_S\ \tfrac{1}{2}\|Y - V_1 - S\|_F^2 + \lambda_2\|S\|_{1,1}$$
$$V_1^{+} = \arg\min_{V_1}\ \tfrac{1}{2}\|Y - V_1 - S\|_F^2 + \tfrac{\beta}{2}\|DX - V_1 - U_1\|_F^2$$
$$V_2^{+} = \arg\min_{V_2}\ \lambda_1\|V_2\|_{1,1} + \tfrac{\beta}{2}\|X - V_2 - U_2\|_F^2$$
$$V_3^{+} = \arg\min_{V_3}\ \tau_+(V_3) + \tfrac{\beta}{2}\|X - V_3 - U_3\|_F^2$$

Thus, by the extremum principle, setting the first derivative of each sub-problem to zero yields the optimal solution of equation (15) as follows:

$$V_1^* = [\beta(TX - U_1) + (Y - S)]/(1 + \beta)$$
$$V_2^* = \mathrm{shrink}(X - U_2,\ \lambda_1/\beta)$$
$$V_3^* = \max(0,\ X - U_3)$$
$$S^* = \mathrm{shrink}(Y - V_1,\ \lambda_2)$$
$$X^* = (T^T T + 2I)^{-1}\,[\,T^T(V_1 + U_1) + V_2 + U_2 + V_3 + U_3\,]$$

where shrink is the soft-thresholding operator: for a vector $p$ and $\lambda \ge 0$, $\mathrm{shrink}(p, \lambda) = \mathrm{sign}(p)\,\max(|p| - \lambda,\ 0)$ element-wise (for non-negative $p$ this reduces to $\max(p - \lambda,\ 0)$).
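The shrink operator above is standard element-wise soft thresholding; a one-line sketch:

```python
import numpy as np

def shrink(p, lam):
    """Soft thresholding: sign(p) * max(|p| - lam, 0), applied element-wise.
    For non-negative p this reduces to max(p - lam, 0), matching the text."""
    return np.sign(p) * np.maximum(np.abs(p) - lam, 0.0)
```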
Similarly, the ADMM method can be used to solve equation (1), giving the following updates:

$$V_1^* = [\beta(TX - U_1) + Y]/(1 + \beta)$$
$$V_2^* = \mathrm{shrink}(X - U_2,\ \lambda_1/\beta)$$
$$V_3^* = \max(0,\ X - U_3)$$
$$X^* = (T^T T + 2I)^{-1}\,[\,T^T(V_1 + U_1) + V_2 + U_2 + V_3 + U_3\,]$$
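The updates for equation (1) assemble into the following hedged ADMM sketch; the values of β, the iteration count, and the use of T in place of D (dropping the trivial template, as in the closed-form updates above) are illustrative assumptions:

```python
import numpy as np

def admm_model1(Y, T, lam1=0.1, beta=1.0, loops=500):
    """ADMM for min 0.5||Y - TX||_F^2 + lam1*||X||_{1,1} s.t. X >= 0 (equation (1))."""
    n, m = T.shape[1], Y.shape[1]
    X = np.zeros((n, m))
    V1, V2, V3 = T @ X, X.copy(), X.copy()
    U1, U2, U3 = np.zeros_like(V1), np.zeros_like(X), np.zeros_like(X)
    Ginv = np.linalg.inv(T.T @ T + 2 * np.eye(n))      # factor once, reuse
    for _ in range(loops):
        V1 = (beta * (T @ X - U1) + Y) / (1 + beta)    # V1-update
        V2 = np.sign(X - U2) * np.maximum(np.abs(X - U2) - lam1 / beta, 0)  # shrink
        V3 = np.maximum(0, X - U3)                      # projection onto X >= 0
        X = Ginv @ (T.T @ (V1 + U1) + V2 + U2 + V3 + U3)  # X-update
        U1 += V1 - T @ X                                # scaled dual updates
        U2 += V2 - X
        U3 += V3 - X
    return np.maximum(X, 0)                             # enforce the constraint exactly
```

With T = I the problem decouples into non-negative soft thresholding of Y, which gives a quick sanity check of the iteration.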
the general form of the solution of equations (1) and (2) is thus obtained by analysis and solution of the sub-problem. If a given lagrange multiplier β is input, the optimal numerical solutions of equations (1) and (2) can be obtained by alternating direction iteration, where table 1 is the ADMM solution algorithm process of equation (1), and table 2 is the ADMM solution algorithm process of equation (2).
TABLE 1
(ADMM algorithm listing for equation (1); available only as an image in the source, not reproduced)
TABLE 2
(ADMM algorithm listing for equation (2); available only as an image in the source, not reproduced)
S3, updating the template.
As a preferred scheme, the template $T$ is updated when $\|S\|_2 \le \tau$.
Further, the template updating method comprises:
S31, performing singular value decomposition on the current template $T$ and the tracked target $y$ respectively:

$$T = USV^T, \qquad y = usv^T \tag{4}$$

S32, incrementally updating $U, S, V$ with the singular vectors $u, s, v$, thereby obtaining new singular vectors $U^*, S^*, V^*$; the new template is then represented as:

$$T^* = U^* S^* V^{*T} \tag{5}$$

Preferably, the method further comprises step S33: training the template with the unsupervised K-means learning method, with $K$ given as the number of initial classes:

$$J = \sum_{i}\sum_{k=1}^{K} r_{ik}\,\|t_i^* - u_k\|^2 \tag{6}$$

where $i$ denotes the $i$-th sample; when $t_i^*$ belongs to class $k$, $r_{ik} = 1$, otherwise $r_{ik} = 0$; $u_k$ is the mean of all samples belonging to class $k$; and $J$ denotes the sum of distances from the sample points to their class means. Through $u_k$ the mean of all samples of the $k$-th class is obtained, and the dimension of the original template is reduced. The updated template is thus:

$$T_{\mathrm{new}} = [u_1, u_2, \dots, u_k] \tag{7}$$
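Steps S31–S32 can be sketched as below. The patent performs an incremental SVD update; the batch recomputation here is a simplification under that caveat, with the `rank` argument standing in for keeping only the leading singular vectors:

```python
import numpy as np

def update_template_svd(T, y, rank=None):
    """Append the tracked target y to the template T, take the SVD (equation (4)),
    and rebuild the template from the leading singular triplets (equation (5))."""
    A = np.column_stack([T, y])
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = rank if rank is not None else min(A.shape)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]   # T* = U* S* V*^T
```

With no truncation the reconstruction is exact; truncating to a small rank discards the directions least supported by the data, which is what keeps the update error low.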
The template updating method provided by the invention is robust to occlusion and illumination change. Unlike traditional template updating methods, it emphasizes selecting templates that contribute significantly to target tracking while avoiding trivial templates, and it trains the template unsupervised via the K-means algorithm, which largely removes redundant template information and improves the real-time performance of tracking.
As shown in Table 3, the adaptive simultaneous sparse representation target tracking algorithm of the embodiment of the invention proceeds as follows:
TABLE 3
(algorithm listing available only as an image in the source; not reproduced)
In the following, the target tracking method provided by the present invention (Ours) is compared experimentally with five existing methods with good tracking performance: circulant-matrix tracking with the kernel trick (CSK), accelerated gradient L1 tracking (L1APG), multi-task tracking (MTT), sparse prototype tracking (SPT), and sparse collaborative-model tracking (SCM). The experiments were performed on a platform with Matlab 2012a, 2 GB of memory, and an Intel(R) Core(TM) i3 CPU.
Data and experimental description:
The experiments use 14 different videos with tracking challenges; the factors influencing the tracking results include occlusion, illumination change, background clutter, pose change, low resolution and fast motion, and the attribute profile of the video sequences is shown in Table 4. In Table 4 the videos contain different noise, where OV denotes out of view, BC background clutter, OCC full occlusion, OCP partial occlusion, OPR out-of-plane rotation, IV illumination variation, LR low resolution, FM fast motion, and SV scale variation. The experiments adopt three evaluation measures, each of which reflects tracking quality to some extent: the Center Location Error, the Overlap Ratio, and the Area Under the Curve (AUC). Given the ground-truth target box $R_g$ and the tracked target box $R_t$ of a frame, with center positions $p_g = (x_g, y_g)$ and $p_t = (x_t, y_t)$, the center location error is $\mathrm{CLE} = \|p_g - p_t\|_2$ and the overlap ratio is

$$\mathrm{OR} = \frac{\mathrm{Area}(R_g \cap R_t)}{\mathrm{Area}(R_g \cup R_t)}$$

where Area(·) counts all pixels in the region. Each point of the AUC curve gives the success rate of video tracking when the overlap ratio exceeds a given threshold $\eta$. Specifically, let $\eta = 0.5$; a frame is considered successfully tracked when OR > 0.5. The tracking results are shown in Tables 4, 5 and 6.
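The three evaluation measures can be sketched as follows; the (x, y, w, h) box convention is an assumption for illustration:

```python
import numpy as np

def center_error(bg, bt):
    """CLE = ||p_g - p_t||_2, with p the box center."""
    (xg, yg, wg, hg), (xt, yt, wt, ht) = bg, bt
    return float(np.hypot((xg + wg / 2) - (xt + wt / 2),
                          (yg + hg / 2) - (yt + ht / 2)))

def overlap_ratio(bg, bt):
    """OR = Area(Rg ∩ Rt) / Area(Rg ∪ Rt) for axis-aligned boxes."""
    (xg, yg, wg, hg), (xt, yt, wt, ht) = bg, bt
    iw = max(0.0, min(xg + wg, xt + wt) - max(xg, xt))
    ih = max(0.0, min(yg + hg, yt + ht) - max(yg, yt))
    inter = iw * ih
    return inter / (wg * hg + wt * ht - inter)

def success_rate(gt_boxes, tr_boxes, eta=0.5):
    """Fraction of frames with OR > eta (the AUC curve samples this over eta)."""
    return float(np.mean([overlap_ratio(g, t) > eta
                          for g, t in zip(gt_boxes, tr_boxes)]))
```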
TABLE 4
Video sequence Number of frames Noise(s) Video sequence Number of frames Noise(s)
Walking2 495 SV,OCP,LR Suv 945 OCC,OV,BC
Car4 659 IV,SV CarDark 393 IV,BC,LR
Car2 913 IV,SV,BC Deer 71 FM,LR,BC
Girl 500 OPR,OCC,LR Singr2 366 IV,OPR,BC
FaceOcc2 812 OCC,OPR,IV Skater2 435 SV,OPR
Football 362 OCC,OPR,BC Dudek 1145 OCC,BC,OV
FaceOcc1 892 OCC Subway 175 OCC,BC
Table 5 compares the performance of the various algorithms based on average overlap ratio, where AOR denotes the total average overlap ratio; a larger average overlap ratio indicates better tracking performance. The parameters in the experiments are set as follows: regularization parameters $\lambda_1 = 0.1$, $\lambda_2 = 0.1$; penalty factor $\beta = 0.1$; minimum cosine-angle threshold $\alpha_{\min} = 20$ and maximum $\alpha_{\max} = 35$; the maximum number of template basis vectors is 15; the number of particle samples is 600; the image block size is 25 × 25; the maximum number of iterations Loop is 20; and the convergence error tol is 0.001. The parameters $\lambda_1, \lambda_2$ are obtained by cross-validation, and the adjustment of $\lambda_2$ follows the rule that if the energy of the Laplacian noise $S$ is large (i.e., the target suffers heavy occlusion, shape change or illumination change), then $\lambda_2$ should be small, and vice versa. As the comparison data in Table 5 show, the tracking method provided by the invention (Ours) attains a significantly higher average overlap ratio than the other tracking methods, both per video category and in the total average, i.e., the tracking method of the invention achieves the best tracking performance.
TABLE 5
(table available only as an image in the source; not reproduced)
The comparison of the performance of the various algorithms based on average center location error is shown in Table 6, where ACLE denotes the total average center error; a smaller average center error indicates better tracking performance. As is apparent from the data in Table 6, the tracking method provided by the invention (Ours) has a significantly lower average center error than the other tracking methods, both per video category and in the total average, i.e., the tracking method of the invention achieves the best performance.
TABLE 6
(table available only as an image in the source; not reproduced)
Table 7 compares the performance of the various algorithms based on average success rate, where ASR denotes the total average success rate.
TABLE 7
(table available only as an image in the source; not reproduced)
As the comparison data in Table 7 show, the tracking method provided by the invention (Ours) surpasses the other tracking methods both in per-category average success rate and in total average success rate.
To further understand the proposed tracking algorithm, the specific influence of the Laplacian noise term in the model and of the template update criterion on the tracking effect is described below.
The traditional template updating method updates directly via the similarity between the tracked target and the template: if the similarity exceeds a given threshold, the target is considered heavily contaminated by noise, so the tracked target replaces the template vector with the smallest original weight. This replacement is crude because it introduces large noise errors and causes much uncertainty when tracking the target in the next frame; the new template updating method provided by the invention weakens the influence of noise. Specifically:
(1) the new template updating method effectively balances the weights between the original template vectors and the new tracked target, realizing template updating through forgetting factors;
(2) the new template updating method introduces the K-means method, which effectively reduces redundant template vectors and improves the real-time performance of tracking; the class centers are computed by weighted averaging, which effectively suppresses noise.
The following experiments separately compare the influence of template updating and of the Laplacian term on tracking performance. The compared methods are the MTT algorithm, the ASSAT algorithm (Laplace only), ASSAT (template update only), and ASSAT (Laplace + template update); the selected sequences include Skater2, Dudek, Suv, Walking2, Subway, Deer, etc.
Table 8 compares the influence of the Laplacian term on the experimental results. As Table 8 shows, except for the Walking2 sequence, adding Laplacian noise yields better tracking than the MTT algorithm, but the original template updating method limits its performance; the proposed new template updating method boosts the tracking performance of the ASSAT algorithm.
TABLE 8
(table available only as an image in the source; not reproduced)
Table 9 compares the influence of different template updates on the experimental results. As Table 9 shows, the tracking performance of the template-update-only ASSAT method is almost the same as that of the IVT method, with little improvement; on Skater2 neither approach performs well, because the sequences contain heavy occlusion, and an ASSAT variant that considers only template updating but not Laplacian noise cannot track the target effectively, and likewise for IVT. In fact, under heavy occlusion, ignoring the Laplacian noise lets the noise corrupt the sparse structure of the solution $X$ in equation (1).
TABLE 9
(table available only as an image in the source; not reproduced)
The influence of the noise model on the solution under occlusion is also as follows: when the Laplacian noise is considered, the obtained solution is sparse and optimal; when it is not considered, the obtained solution is a dense, non-optimal one, and the sparse structure of the solution directly affects the tracking performance of the algorithm.
The invention also provides a self-adaptive sparse representation rapid target tracking method, which comprises the following steps:
A. establishing a target tracking model:
Figure GDA0002480841750000202
s.t.X≥0. (8)
B. obtaining the best tracking target by an alternating iteration method;
C. updating the target template T to obtain a new template T*;
D. returning the best tracking target and the target template set, and continuing with target tracking of the next frame.
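The four steps above can be sketched as a minimal tracking loop. Everything here is an illustrative stand-in under assumed sizes: `sample_candidates` replaces the patent's candidate sampling, the projected proximal-gradient solver replaces the alternating-iteration solver of step B, and a sliding-window column swap replaces the incremental-SVD template update of step C.

```python
import numpy as np

def sample_candidates(rng, n=20, dim=64):
    # Stand-in for candidate sampling: columns of Y are candidate patches.
    return rng.standard_normal((dim, n))

def solve_sparse_coeffs(D, Y, lam=0.1, iters=100):
    # Step B (simplified): projected proximal gradient on
    # 0.5*||DX - Y||_F^2 + lam*||X||_{1,1}  subject to  X >= 0.
    X = np.zeros((D.shape[1], Y.shape[1]))
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant
    for _ in range(iters):
        G = D.T @ (D @ X - Y)
        X = np.maximum(X - step * (G + lam), 0.0)
    return X

def update_template(T, y):
    # Step C (simplified): drop the oldest column, append the new target.
    return np.column_stack([T[:, 1:], y])

rng = np.random.default_rng(1)
T = rng.standard_normal((64, 10))              # step A: target template
for _ in range(3):                             # step D: loop over frames
    Y = sample_candidates(rng)
    X = solve_sparse_coeffs(T, Y)
    errs = np.linalg.norm(T @ X - Y, axis=0)   # reconstruction error per candidate
    best = int(np.argmin(errs))                # best tracking target
    T = update_template(T, Y[:, best])
```

The candidate with the smallest reconstruction error under the sparse code is taken as the tracking result for the frame, which is the usual selection rule in sparse-representation trackers.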
Preferably, the step C includes:
C1, calculate the similarity between the tracking target and the template mean, denoted sim;
C2, compare the similarity sim with the cosine-angle thresholds α and β: if sim < α, execute step C3 to update the template as usual; if α ≤ sim ≤ β, set y ← m and execute step C3 to update the template; if sim > β, the tracking target is essentially dissimilar to the template, indicating that the target is severely contaminated by noise, and the template is not updated;
C3, perform singular value decomposition on the template T to obtain T = UΣV^T, incrementally update the left singular vectors U of the template T to obtain new singular vectors U*, and, combining formulas (5) and (6), compute the new template:
(formula image not reproduced in the source text)
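Steps C1-C3 can be sketched as follows. The threshold values, the batch SVD used in place of a true incremental SVD, and the fixed template size are all assumptions for illustration, not the patent's exact update rule.

```python
import numpy as np

def update_template_svd(T, y):
    # Step C3 (batch stand-in for incremental SVD): augment the template with
    # the tracked target and keep the best rank-k reconstruction.
    M = np.column_stack([T, y])
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    k = T.shape[1]
    return ((U[:, :k] * s[:k]) @ Vt[:k, :])[:, :k]   # new template, same size

def maybe_update_template(T, y, alpha=0.2, beta=1.0):
    m = T.mean(axis=1)                                # template mean
    cos = m @ y / (np.linalg.norm(m) * np.linalg.norm(y) + 1e-12)
    sim = np.arccos(np.clip(cos, -1.0, 1.0))          # step C1: cosine angle
    if sim > beta:                                     # heavy contamination:
        return T                                       # do not update
    if sim >= alpha:                                   # moderate drift:
        y = m                                          # replace y by the mean m
    return update_template_svd(T, y)                   # step C3
```

A small angle (sim < α) means the tracked target still resembles the template, so it is safe to fold it into the basis; a very large angle (sim > β) suggests severe occlusion or noise, so the template is left untouched.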
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A robust target tracking method based on adaptive simultaneous sparse representation, characterized by comprising the following steps:
s1, adaptively establishing a simultaneous sparse tracking model according to the size of the Laplace noise energy;
s2, solving the established tracking model;
s3, updating the template;
the execution method of step S1 is: compare the Laplace noise energy ||S||_2 with a given noise energy threshold τ, and adaptively establish the simultaneous sparse tracking model according to the comparison result:
when ||S||_2 ≤ τ, the sparse tracking model is:

min_X (1/2)||DX - Y||_F^2 + λ1||X||_{1,1}, s.t. X ≥ 0 (1)
when ||S||_2 > τ, the simultaneous sparse tracking model is:

min_{X,S} (1/2)||DX - Y - S||_F^2 + λ1||X||_{1,1} + λ2||S||_{1,1}, s.t. X ≥ 0 (2)
wherein D denotes the tracking template, Y the candidate target set, λ1 and λ2 the regularization parameters of the model, X the sparse coefficients, and S the Laplace noise; (1/2)||DX - Y - S||_F^2 is the fidelity term of the sparse model, ||X||_{1,1} is the l_{1,1} norm of the coefficient matrix X, and ||S||_{1,1} characterizes the Laplace noise energy.
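A minimal sketch of this adaptive choice follows. The proximal alternating solver below is a simplified stand-in for the ADMM solver referenced in the claims, and the initial noise estimate, step size, and parameter values are assumptions for illustration.

```python
import numpy as np

def soft(v, t):
    # Shrinkage operator: proximal map of the l1 / l_{1,1} norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def adaptive_sparse_track(D, Y, tau=1.0, lam1=0.05, lam2=0.5, iters=300):
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    X = np.zeros((D.shape[1], Y.shape[1]))
    S = soft(Y - D @ X, lam2)                 # rough initial Laplace-noise estimate
    use_model_2 = np.linalg.norm(S, 2) > tau  # ||S||_2 > tau  ->  model (2)
    if not use_model_2:
        S = np.zeros_like(Y)                  # model (1): no explicit noise term
    for _ in range(iters):
        R = D @ X + S - Y                     # residual of the fidelity term
        X = np.maximum(X - step * (D.T @ R + lam1), 0.0)   # prox step, X >= 0
        if use_model_2:
            S = soft(Y - D @ X, lam2)         # exact prox for lam2*||S||_{1,1}
    return X, S, use_model_2
```

When the estimated noise energy is below τ the solver degenerates to the simpler model (1); above τ it keeps updating S, letting heavy occlusion be absorbed into the sparse noise term instead of corrupting X.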
2. The adaptive simultaneous sparse representation-based robust target tracking method according to claim 1, wherein the tracking template D is represented as D = [T, I], where T is the target template, T ∈ R^{d×n}, d denotes the dimension of the image, n denotes the number of template basis vectors, and d ≫ n; each column of T is a zero-mean vector, and I denotes the trivial template.
3. The adaptive simultaneous sparse representation-based robust target tracking method according to claim 2, wherein said step S2 comprises:
S21, calculate the similarity between the tracking target y_t and the mean of the template T, denoted sim;
S22, compare sim with the cosine-angle threshold α and adaptively select a model for tracking:
when sim < α, adopt the tracking model of formula (1) above, solve it by the alternating direction method of multipliers (ADMM), and obtain the sparse coefficients X;
when sim ≥ α, adopt the simultaneous sparse tracking model of formula (2) above, solve it by ADMM, and obtain the sparse coefficients X and the Laplace noise S.
4. The adaptive simultaneous sparse representation-based robust target tracking method according to claim 3, wherein the similarity between the tracking target and the mean of the template T is calculated in step S21 according to the following formula:

sim = arccos( c^T y / (||c||_2 ||y||_2) ) (3)

where c is the mean of the template T and y is the tracking target; sim is the arccosine of the normalized correlation between the target and the template mean, i.e. the cosine angle.
5. The robust target tracking method based on adaptive simultaneous sparse representation according to any one of claims 1 to 4, wherein when ||S||_2 ≤ τ, the template T is updated.
6. The adaptive simultaneous sparse representation-based robust target tracking method according to claim 5, wherein updating the template T comprises the steps of:
S31, perform singular value decomposition on the current template T and the tracked target y, respectively, as follows:

(formula image not reproduced in the source text)

S32, incrementally update the singular vectors U, S and V to obtain new singular vectors U*, S* and V*; the new template is then represented as:

T* = U*S*V*^T (5)
7. The robust target tracking method based on adaptive simultaneous sparse representation according to claim 6, further comprising step S33: train the template with the unsupervised K-means method, with K given as the number of initial classes:

J = Σ_i Σ_k r_{ik} ||x_i - u_k||^2 (6)

where i indexes the ith sample; when sample x_i belongs to class k, r_{ik} = 1, otherwise r_{ik} = 0; u_k is the mean of all samples in the kth class; and J represents the sum of the distances of the sample points to the mean of the class to which they belong. With the class means u_k, the updated template is:

T_new = [u_1, u_2, …, u_K] (7).
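Formulas (6)-(7) amount to standard Lloyd-style K-means over the template columns, with the K class means forming the new template. A plain NumPy sketch follows; K, the iteration count, and the random initialization are illustrative choices, not values fixed by the claim.

```python
import numpy as np

def kmeans_template(T, K=3, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    X = T.T                                          # samples = template columns
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        # squared distances of every sample to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)                        # r_ik = 1 for nearest class
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(0)  # u_k = mean of class k
    return centers.T                                 # T_new = [u_1, ..., u_K]
```

Each Lloyd iteration alternately minimizes J in (6) over the assignments r_{ik} (nearest-center step) and over the means u_k (class-mean step), so the returned columns are exactly the u_k of formula (7).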
CN201710625586.4A 2017-07-27 2017-07-27 Robust target tracking method based on self-adaptive simultaneous sparse representation Active CN107844739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710625586.4A CN107844739B (en) 2017-07-27 2017-07-27 Robust target tracking method based on self-adaptive simultaneous sparse representation


Publications (2)

Publication Number Publication Date
CN107844739A CN107844739A (en) 2018-03-27
CN107844739B true CN107844739B (en) 2020-09-04

Family

ID=61683198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710625586.4A Active CN107844739B (en) 2017-07-27 2017-07-27 Robust target tracking method based on self-adaptive simultaneous sparse representation

Country Status (1)

Country Link
CN (1) CN107844739B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738683A (en) * 2018-07-19 2020-01-31 中移(杭州)信息技术有限公司 computer vision tracking method and device
CN109858546B (en) * 2019-01-28 2021-03-30 北京工业大学 Image identification method based on sparse representation
CN110610527B (en) * 2019-08-15 2023-09-22 苏州瑞派宁科技有限公司 SUV computing method, device, equipment, system and computer storage medium
CN112630774A (en) * 2020-12-29 2021-04-09 北京润科通用技术有限公司 Target tracking data filtering processing method and device
CN115861379B (en) * 2022-12-21 2023-10-20 山东工商学院 Video tracking method for updating templates based on local trusted templates by twin network

Citations (5)

Publication number Priority date Publication date Assignee Title
WO2012078702A1 (en) * 2010-12-10 2012-06-14 Eastman Kodak Company Video key frame extraction using sparse representation
CN103440645A (en) * 2013-08-16 2013-12-11 东南大学 Target tracking algorithm based on self-adaptive particle filter and sparse representation
CN105930812A (en) * 2016-04-27 2016-09-07 东南大学 Vehicle brand type identification method based on fusion feature sparse coding model
CN106203495A (en) * 2016-07-01 2016-12-07 广东技术师范学院 A kind of based on the sparse method for tracking target differentiating study
CN106651912A (en) * 2016-11-21 2017-05-10 广东工业大学 Compressed sensing-based robust target tracking method


Non-Patent Citations (3)

Title
Robust visual tracking via multi-task sparse learning; Tianzhu Zhang et al.; 2012 IEEE Conference on Computer Vision and Pattern Recognition; 2012-06-21; pp. 2042-2049 *
A new range-spread target detection method using sparse representation; Zhang Xiaowei et al.; Journal of Xidian University (Natural Science Edition); 2014-06-30; Vol. 41, No. 3, pp. 20-25 *
Research on an improved multiple-selection orthogonal matching pursuit algorithm based on compressed sensing; Huang Jingjing; China Master's Theses Full-text Database, Information Science and Technology; 2015-12-15; pp. 14-28 *

Also Published As

Publication number Publication date
CN107844739A (en) 2018-03-27

Similar Documents

Publication Publication Date Title
CN107844739B (en) Robust target tracking method based on self-adaptive simultaneous sparse representation
Kraus et al. Uncertainty estimation in one-stage object detection
CN109800689B (en) Target tracking method based on space-time feature fusion learning
CN107633226B (en) Human body motion tracking feature processing method
CN110738690A (en) unmanned aerial vehicle video middle vehicle speed correction method based on multi-target tracking framework
CN111582349B (en) Improved target tracking algorithm based on YOLOv3 and kernel correlation filtering
CN111340855A (en) Road moving target detection method based on track prediction
CN110033472B (en) Stable target tracking method in complex infrared ground environment
CN112802054A (en) Mixed Gaussian model foreground detection method fusing image segmentation
Wang et al. Forward–backward mean-shift for visual tracking with local-background-weighted histogram
Song et al. Target detection via HSV color model and edge gradient information in infrared and visible image sequences under complicated background
Zhan et al. Salient superpixel visual tracking with graph model and iterative segmentation
Xin et al. Deep learning for robust outdoor vehicle visual tracking
Lyu et al. Probabilistic object detection via deep ensembles
CN109272036B (en) Random fern target tracking method based on depth residual error network
Chu et al. Target tracking via particle filter and convolutional network
CN108320301B (en) Target tracking optimization method based on tracking learning detection
Moridvaisi et al. An extended KCF tracking algorithm based on TLD structure in low frame rate videos
CN110751670A (en) Target tracking method based on fusion
CN110827327B (en) Fusion-based long-term target tracking method
Altundogan et al. Multiple object tracking with dynamic fuzzy cognitive maps using deep learning
JP2016071872A (en) Method and device for tracking object and tracking feature selection method
Zhang et al. Boosting the speed of real-time multi-object trackers
Ren et al. Robust visual tracking based on scale invariance and deep learning
CN110781803B (en) Human body posture recognition method based on extended Kalman filter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant