CN106203495B

CN106203495B - Target tracking method based on sparse discriminant learning

Info

Publication number: CN106203495B
Application number: CN201610519851.6A
Authority: CN
Inventors: 詹瑾; 肖政宏
Original assignee: Guangdong Polytechnic Normal University
Current assignee: Guangdong Polytechnic Normal University
Priority date: 2016-07-01
Filing date: 2016-07-01
Publication date: 2020-03-17
Anticipated expiration: 2036-07-01
Also published as: CN106203495A

Abstract

The invention discloses a target tracking method based on sparse discriminant learning, comprising the following steps: sampling the target and its surrounding background in the first frame and constructing a target appearance model; extracting features from the two-dimensional image blocks of the target appearance model and normalizing them to obtain an initial dictionary; introducing a supervised discriminative dictionary learning method, in which a classification error term is added to the reconstruction error term, and training a discriminative dictionary; solving for the minimum reconstruction error under a sparsity constraint and updating the dictionary and the sparse codes with an iterative alternating optimization strategy; and measuring the similarity between samples by the Euclidean distance and taking the sample with the highest similarity as the tracking target. According to the embodiment of the invention, when the appearance model of the target is established, the surrounding background that is spatially correlated with the target is added as a cue template, so that changes in the target's pose are handled more robustly.

Description

Target tracking method based on sparse discriminant learning

Technical Field

The invention relates to the field of information technology, and in particular to a target tracking method based on sparse discriminant learning.

Background

Target tracking is one of the fundamental problems in computer vision research and is widely applied in surveillance, motion estimation, human-computer interaction, and similar areas. Many tracking algorithms proposed in recent years, such as particle filtering, Boosting-based algorithms, and L₁ tracking algorithms, track target objects well in certain scenes. However, a video is a time-ordered image sequence of a complex scene that may include illumination change, occlusion, motion deformation, background clutter, and target scale change, so constructing an adaptive target representation model that yields a robust tracking algorithm is both a research hotspot and a difficult problem in the tracking field.

Among recent tracking algorithms, some build a target appearance model and formulate tracking as an optimal template matching or maximum-likelihood region estimation problem. These are called generative-model-based tracking algorithms, and their core issue is how to construct an accurate appearance model that adapts to changes in target appearance. Others treat tracking as a binary classification problem that separates the foreground target from the background. These are called discriminative-model-based tracking algorithms and use a variety of classifiers, such as naive Bayes classifiers, boosting-based classifiers, support vector machines, and P-N learning classifiers. Classifiers are generally more accurate when more training examples are available, so online updating is usually used to obtain more training examples.

Signal reconstruction by sparse representation has performed prominently in applications such as face recognition, image classification, and image denoising and deblurring, and tracking algorithms based on sparse representation have also made progress. These algorithms use the target information in the first frame to create a template set comprising target and noise templates. Each sample is represented as a sparse linear combination of template column vectors, the sparse coefficients are obtained by solving an L₁-regularized least-squares problem, and the candidate with the minimum reconstruction error is taken as the tracking target. Because of the noise templates, these algorithms handle occlusion more accurately and robustly, but when the target deforms frequently the basis vectors of the template set can no longer represent the deformed target. In addition, the template set is in effect a dictionary obtained without any learning process, so the resulting sparse codes are not guaranteed to be maximally sparse and the sample with the minimum reconstruction error is not necessarily the best candidate, which makes drift errors prone to accumulate.

Disclosure of Invention

The invention provides a target tracking method based on sparse discriminant learning in which, when the target appearance model is established, the surrounding background that is spatially correlated with the target is added as a cue template so as to approximate changes in the target's pose.

The invention provides a target tracking method based on sparse discriminant learning, which comprises the following steps:

constructing a target appearance model;

converting the two-dimensional image blocks of the constructed target appearance model into one-dimensional vectors through feature extraction, and normalizing the one-dimensional vectors;

introducing a supervised discriminative dictionary learning method into the tracking, and adding a classification error term to the reconstruction error term;

solving for the minimum reconstruction error under a sparsity constraint, and updating the dictionary and the sparse codes with an iterative alternating optimization strategy;

and measuring the similarity between vectors by the Euclidean distance, and taking the sample with the highest similarity as the candidate tracking target.

The constructing of the target appearance model comprises:

representing the target appearance model with a dictionary, converting the tracking problem into an optimal sparse approximation problem, and modeling the target appearance as a template dictionary containing target and noise templates.

Introducing the supervised discriminative dictionary learning method into the tracking and adding the classification error term to the reconstruction error term specifically comprises:

solving an objective function based on a dictionary and sparse coding, wherein the objective function is as follows:

wherein the first term is the reconstruction error term, $Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{n \times N}$ are the training samples, D is the set of overcomplete dictionary bases to be learned, and $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{k \times N}$ are the sparse coding coefficients; the second term is the classification error term, expressed here using the loss function of the linear classifier $f(x) = Wx + b$, where W is the classifier parameter and $h_i$ is the class label of $y_i$; ε is the threshold that controls sparsity.

Solving for the minimum reconstruction error under the sparsity constraint is specifically:

converting the constraint into an $L_1$-norm regularization term on the sparse codes to ensure sparsity, which turns the problem into an unconstrained optimization problem:

wherein $G = [g_1, g_2, \ldots, g_N] \in \mathbb{R}^{2 \times N}$ is the set of class labels marking each sample of Y as foreground or background: $g_i = [1, 0]^T$ when $y_i$ is a target sample, and $g_i = [0, 1]^T$ otherwise; α is a parameter controlling the weight of the class information, and λ is the regularization parameter controlling the sparsity of the reconstruction.

Updating the dictionary and the sparse codes with the iterative alternating optimization strategy specifically comprises:

let

so that the problem is converted into the standard minimum-reconstruction-error approximation problem under a sparsity constraint:

updating D and X with an iterative alternating optimization strategy:

step one: initializing Y, G, D, Y′, D′;

step two: fixing D′, solving the optimization model with orthogonal matching pursuit (OMP), and updating the sparse codes X;

step three: fixing X, solving a constrained least-squares problem, and updating the dictionary D′;

step four: terminating the iteration if the maximum number of iterations is reached, or if the absolute difference between the objective function values of two successive iterations falls below a threshold; otherwise returning to step two.

Measuring the similarity between vectors by the Euclidean distance and taking the sample with the highest similarity as the candidate tracking target is specifically:

where $R(i)$ is the reconstruction error of the i-th sample, $d_i$ is the dictionary column vector corresponding to the maximum non-negative value in the sparse code of the i-th sample, and $d_0$ is the dictionary column vector corresponding to the initial target.

In the present invention, the spatial correlation between the target and its surroundings is taken into account when modeling the target's appearance. Because the area around the target contains part of the target information together with background information, it can approximately represent the target when the target deforms over a period of time. Using a supervised discriminative dictionary learning method, an overcomplete dictionary that balances reconstructive and discriminative performance is obtained by iterative optimization, and the resulting sparse codes are both highly sparse and discriminative. In the candidate search, similarity is constructed from the distribution of the maximum non-negative values of the sample sparse codes together with the minimum reconstruction error in order to determine the candidate target. Specifically, when the appearance model of the target is established, the surrounding background that is spatially correlated with the target is added as a cue template so as to approximate changes in the target's pose; the dictionary obtained by discriminative sparse learning yields sparse codes with higher sparsity and discriminability; and the estimated candidate tracking target is obtained from the distribution of the dictionary basis vectors corresponding to the maximum non-negative values in the sparse codes and from the minimum reconstruction error of the samples. Experimental results show that the method tracks the target well under occlusion, illumination change, pose change, background clutter, and similar conditions.

Drawings

To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention, and a person skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a target tracking method based on sparse discriminant learning according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a target tracking method based on sparse discriminant learning according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the embodiment of the invention, the spatial correlation between the target and its surroundings is taken into account when modeling the target's appearance. Because the area around the target contains part of the target information together with background information, it can approximately represent the target when the target deforms over a period of time. In addition, using a supervised discriminative dictionary learning method, an overcomplete dictionary that balances reconstructive and discriminative performance is obtained by iterative optimization, and the resulting sparse codes are both highly sparse and discriminative. In the candidate search, similarity is constructed from the distribution of the maximum non-negative values of the sample sparse codes together with the minimum reconstruction error in order to determine the candidate target.

Fig. 1 shows a flowchart of a target tracking method based on discriminant sparse template matching in an embodiment of the present invention, including the following steps:

S101, constructing a target appearance model;

In sparse representation theory, a signal $y \in \mathbb{R}^n$ can be optimally represented by m basis vectors of a dictionary D:

where Λ denotes the index set of the selected basis vectors and $d_i$ denotes a column vector of the dictionary D. If the basis vectors of the dictionary form a linearly dependent set and the dictionary is redundant, the target signal has infinitely many zero-error representations, and the goal is to find the sparsest of them, i.e. to solve the following optimization problem:

$\min_x \|x\|_0 \quad \text{s.t.} \quad y = Dx$

where D is the matrix whose columns are the dictionary atoms (basis vectors), and the $L_0$ norm $\|\cdot\|_0$ counts the non-zero elements of a vector. If the dictionary basis vectors to be used are fixed, this is a least-squares problem; if the optimal basis vectors must be selected, it is a combinatorial problem, which is NP-hard. Terence Tao et al. have proven that, when D satisfies the uniform uncertainty principle (UUP), the $L_0$ norm can be replaced by the $L_1$ norm, so that the problem becomes a convex optimization problem. Considering the minimum-reconstruction-error approximation of the target signal under a sparsity constraint, the problem is converted into the following optimization problem:
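As a sketch of the two relations this step describes in words, and assuming $x_i$ as the coefficient of atom $d_i$ and a sparsity bound $\varepsilon$ (both symbols are assumptions chosen to match the notation used later in the description), the sparse representation and its error-tolerant approximation could be written as:

```latex
\begin{align*}
y &= \sum_{i \in \Lambda} x_i d_i, \qquad |\Lambda| = m \\
\hat{x} &= \arg\min_{x} \|y - Dx\|_2^2 \quad \text{s.t.} \quad \|x\|_0 \le \varepsilon
\end{align*}
```

Replacing $\|x\|_0$ with $\|x\|_1$ in the constraint gives the convex variant that the UUP argument permits.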

A dictionary is used to represent the appearance model of the target, which converts the tracking problem into an optimal sparse approximation problem, so learning an effective dictionary that copes with the various target changes is very important. The target appearance is usually modeled as a template dictionary containing target and noise templates, but such a dictionary cannot approximate a deformed target. When the target deforms, the background around it changes little over a short time; that is, the spatial relationship between target and background is preserved and can be used to approximate the target. The template dictionary is therefore extended into three parts: a target template T, a cue template B, and a noise template, where the cue template is sampled around the target. Tracking then amounts to finding, in each subsequent frame, the sample that best matches the target template or the cue template. Because the basis vectors of templates of the same class lie in a linear subspace, a template dictionary with enough basis vectors can represent two cases. First, when the target is not deformed, a candidate target sample $y_1$ of the subsequent frame lies approximately in the subspace spanned by the column vectors of T, $y_1 \approx Tx + e$, and a non-target sample $y_2$ lies approximately in the subspace spanned by the column vectors of B, $y_2 \approx Bx + e$, where e represents occlusion and noise and has small coefficients. Second, when the target is partially deformed, both the candidate target sample $y_1$ and the non-target sample $y_2$ are approximately represented by the column vectors of B, $y \approx Bx + e$, but the candidate target sample has very small noise coefficients whereas the non-target sample has very large ones. Combining the two cases, a sampled signal can be expressed as:

where $T = [t_1, t_2, \ldots, t_p] \in \mathbb{R}^{n \times p}$ is the target template, $B = [b_1, b_2, \ldots, b_q] \in \mathbb{R}^{n \times q}$ is the cue template, and $I \in \mathbb{R}^{n \times n}$ is the noise template, an identity matrix of order n; n is the feature dimension of a signal.

$D = [d_1, d_2, \ldots, d_k] \in \mathbb{R}^{n \times k}$ ($k = p + q + n$) is the initial template dictionary used for dictionary learning, and x is the sparse code of y under the dictionary D.
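Under the definitions of T, B, I, and D given here, the combined representation of a sampled signal could be written as follows. The sub-vector names $x_T$ and $x_B$ are assumptions introduced for illustration; e is the vector of noise coefficients described in the preceding paragraph.

```latex
y = \begin{bmatrix} T & B & I \end{bmatrix}
    \begin{bmatrix} x_T \\ x_B \\ e \end{bmatrix}
  = Dx, \qquad x \in \mathbb{R}^{k},\quad k = p + q + n
```

In this reading, an undeformed target puts most of its weight in $x_T$, a background or deformed sample puts it in $x_B$, and e absorbs occlusion and noise.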

S102, converting a two-dimensional image block into a one-dimensional vector for processing through feature extraction on the constructed target appearance model, and performing normalization processing on the one-dimensional vector;

In this embodiment, the templates are sampled as image blocks of equal size; each two-dimensional image block is converted by feature extraction into a one-dimensional vector, and the one-dimensional vectors are normalized. After normalization the vectors have equal energy and can be regarded as vectors that start at the origin and point in different directions. The target template T is sampled around the target box given in the first frame. Let $l_0(c)$ denote the position of the target's center point; the sampling center points of the target template fall within a circle centered at $l_0(c)$ with radius $r_0$: $T = \{t \mid \|l_0(t) - l_0(c)\| < r_0\}$. The cue template is sampled randomly, with center points falling in an annulus centered at $l_0(c)$: $B = \{b \mid r_1 < \|l_0(b) - l_0(c)\| < r_2\}$, with $r_0 < r_1 < r_2$.
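The sampling and normalization in this step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, raw grayscale intensities stand in for the unspecified feature extraction, and the block size and radii in the usage lines are arbitrary.

```python
import numpy as np

def sample_centers(center, r_in, r_out, count, rng):
    """Draw `count` points whose distance from `center` lies between r_in and r_out."""
    # Sampling the squared radius uniformly gives a uniform density over the area.
    radius = np.sqrt(rng.uniform(r_in**2, r_out**2, size=count))
    angle = rng.uniform(0.0, 2.0 * np.pi, size=count)
    offsets = np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)
    return np.asarray(center, dtype=float) + offsets

def patch_to_unit_vector(frame, center, size):
    """Crop a size x size block around (x, y) `center`, flatten it, scale to unit L2 norm."""
    half = size // 2
    cx, cy = np.round(center).astype(int)
    # Clamp the center so that the block stays inside the frame.
    cx = int(np.clip(cx, half, frame.shape[1] - (size - half)))
    cy = int(np.clip(cy, half, frame.shape[0] - (size - half)))
    block = frame[cy - half: cy - half + size, cx - half: cx - half + size]
    vec = block.astype(float).ravel()  # assumption: raw intensities as features
    return vec / (np.linalg.norm(vec) + 1e-12)

def build_initial_dictionary(frame, center, r0, r1, r2, p, q, size, seed=0):
    """Assemble D = [T, B, I] from p target samples and q cue samples."""
    rng = np.random.default_rng(seed)
    t_centers = sample_centers(center, 0.0, r0, p, rng)   # disc of radius r0
    b_centers = sample_centers(center, r1, r2, q, rng)    # annulus r1..r2
    T = np.stack([patch_to_unit_vector(frame, c, size) for c in t_centers], axis=1)
    B = np.stack([patch_to_unit_vector(frame, c, size) for c in b_centers], axis=1)
    I = np.eye(size * size)                               # noise template
    return np.hstack([T, B, I])

# Usage on a synthetic 120 x 160 frame with the target centered at (x=80, y=60).
frame = np.random.default_rng(1).random((120, 160))
D = build_initial_dictionary(frame, center=(80, 60), r0=4, r1=8, r2=16,
                             p=10, q=20, size=8)
print(D.shape)  # (64, 94), i.e. n = 64 and k = p + q + n
```

Because every column has unit norm, the Euclidean distance between two columns depends only on the angle between them, which matches the remark that normalized vectors differ only in direction.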

S103, introducing a supervised discriminative dictionary learning method into the tracking, and adding a classification error term to the reconstruction error term;

Dictionary learning aims to find a new feature space with strong representational power, and the learned dictionary strongly influences the performance of sparse coding. For the tracking problem, a dictionary obtained without learning, or by unsupervised dictionary learning, can achieve a certain tracking accuracy, but better results require incorporating the class information of target and background. This is called supervised discriminative dictionary learning, where discriminability means that the dictionary atoms selected by the optimal sparse code must be linearly independent. A supervised discriminative dictionary learning method is applied to the tracking problem by adding a classification error term to the reconstruction error term. The objective function for solving the dictionary and the sparse codes is therefore:

where the first term is the reconstruction error term, $Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{n \times N}$ are the training samples, and D is the set of overcomplete dictionary bases to be learned. $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{k \times N}$ are the sparse coding coefficients. The second term is the classification error term, expressed here using the loss function of the linear classifier $f(x) = Wx + b$, where W is the classifier parameter and $h_i$ is the class label of $y_i$. ε is the threshold that controls sparsity.
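One way to write the objective described here, consistent with the definitions of Y, D, X, f, and $h_i$, is sketched below. The symbol $L\{\cdot\}$ for the classification loss follows the wording of claim 1; the Frobenius norm for the matrix residual and the exact argument list of the minimization are assumptions.

```latex
\langle D, W, b, X \rangle = \arg\min_{D, W, b, X}
  \|Y - DX\|_F^2 + \sum_{i=1}^{N} L\{h_i, f(x_i)\}
  \quad \text{s.t.} \quad \|x_i\|_0 \le \varepsilon,\; i = 1, \ldots, N \tag{2}
```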

S104, solving for the minimum reconstruction error under a sparsity constraint, and updating the dictionary and the sparse codes with an iterative alternating optimization strategy;

The constraint of equation (2) is converted into an $L_1$-norm regularization term on the sparse codes to ensure sparsity, which turns the problem into an unconstrained optimization problem:

where $G = [g_1, g_2, \ldots, g_N] \in \mathbb{R}^{2 \times N}$ is the set of class labels marking each sample of Y as foreground or background: $g_i = [1, 0]^T$ when $y_i$ is a target sample, and $g_i = [0, 1]^T$ otherwise. α is a parameter that controls the weight of the class information, and λ is a regularization parameter that controls the sparsity of the reconstruction.
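Assuming a quadratic classification loss with the bias b absorbed or dropped (an assumption; the text does not name the loss), the unconstrained problem with the label matrix G, the weight α, and the regularization parameter λ could take the form:

```latex
\langle D, W, X \rangle = \arg\min_{D, W, X}
  \|Y - DX\|_F^2 + \alpha \|G - WX\|_F^2 + \lambda \|X\|_1 \tag{3}
```

Here $\|X\|_1$ denotes the sum of the absolute values of the entries of X.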

Equation (3) is transformed as follows to make it easier to solve. Let

This converts equation (3) into the standard minimum-reconstruction-error approximation problem under a sparsity constraint:
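A substitution that achieves this, assuming the classification term is the quadratic $\alpha\|G - WX\|_F^2$, is sketched below. The stacking is the usual device for merging two squared-error terms and is offered as a plausible reading, not as the patent's exact formula.

```latex
\begin{align*}
Y' &= \begin{pmatrix} Y \\ \sqrt{\alpha}\, G \end{pmatrix}, \qquad
D' = \begin{pmatrix} D \\ \sqrt{\alpha}\, W \end{pmatrix} \\
\|Y' - D'X\|_F^2 &= \|Y - DX\|_F^2 + \alpha \|G - WX\|_F^2 \\
\langle D', X \rangle &= \arg\min_{D', X} \|Y' - D'X\|_F^2 + \lambda \|X\|_1
\end{align*}
```

Because the stacked residual splits into the two original squared errors, minimizing over D′ and X is equivalent to minimizing the weighted sum of reconstruction and classification errors.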

D and X are updated with an iterative alternating optimization strategy:

(1) initialize Y, G, D, Y′, D′;

(2) fix D′, solve the optimization model with orthogonal matching pursuit (OMP), and update the sparse codes X;

(3) fix X, solve a constrained least-squares problem, and update the dictionary D′;

(4) terminate the iteration if the maximum number of iterations is reached, or if the absolute difference between the objective function values of two successive iterations falls below a threshold; otherwise return to (2).
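The four steps above can be sketched in NumPy as follows. Several choices here are assumptions rather than details taken from the text: the classifier W is initialized by ridge regression, the "constraint" in step (3) is taken to be unit-norm dictionary columns, the dictionary update is a plain least-squares fit, and sparsity is controlled by a maximum number of non-zero coefficients per code (the quantity OMP works with) instead of λ. All function and parameter names are hypothetical.

```python
import numpy as np

def omp(D, y, sparsity):
    """Greedy OMP: approximate y with at most `sparsity` columns (atoms) of D."""
    x = np.zeros(D.shape[1])
    support, coeffs, residual = [], np.zeros(0), y.copy()
    for _ in range(sparsity):
        correlations = np.abs(D.T @ residual)
        correlations[np.array(support, dtype=int)] = 0.0  # do not reselect an atom
        atom = int(np.argmax(correlations))
        if correlations[atom] < 1e-12:  # residual is already explained
            break
        support.append(atom)
        # Least-squares refit on the selected atoms, then update the residual.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x[np.array(support, dtype=int)] = coeffs
    return x

def train_discriminative_dictionary(Y, G, D0, alpha=1.0, sparsity=5,
                                    max_iter=20, tol=1e-6):
    n, sqrt_a = Y.shape[0], np.sqrt(alpha)
    # Step (1): initialize. W0 comes from ridge regression on initial codes (assumption).
    X = np.column_stack([omp(D0, y, sparsity) for y in Y.T])
    W0 = G @ X.T @ np.linalg.inv(X @ X.T + 1e-3 * np.eye(X.shape[0]))
    Y_aug = np.vstack([Y, sqrt_a * G])            # Y'
    D_aug = np.vstack([D0, sqrt_a * W0])          # D'
    D_aug = D_aug / (np.linalg.norm(D_aug, axis=0) + 1e-12)
    previous = np.inf
    for _ in range(max_iter):
        # Step (2): fix D', update every sparse code with OMP.
        X = np.column_stack([omp(D_aug, y, sparsity) for y in Y_aug.T])
        # Step (3): fix X, refit D' by least squares.
        D_new = Y_aug @ np.linalg.pinv(X)
        norms = np.linalg.norm(D_new, axis=0)
        unused = norms < 1e-10
        D_new[:, unused] = D_aug[:, unused]       # keep atoms that no sample used
        norms[unused] = 1.0
        # Rescale to unit-norm columns; scaling the rows of X keeps D' X unchanged.
        D_aug, X = D_new / norms, X * norms[:, None]
        # Step (4): stop when the objective changes by less than tol.
        value = np.linalg.norm(Y_aug - D_aug @ X) ** 2
        if abs(previous - value) < tol:
            break
        previous = value
    return D_aug[:n], D_aug[n:] / sqrt_a, X       # D, W, sparse codes

# Usage on synthetic data: 40 samples of dimension 16, 24 atoms, 2 classes.
rng = np.random.default_rng(0)
Y = rng.standard_normal((16, 40))
Y /= np.linalg.norm(Y, axis=0)
G = np.zeros((2, 40))
G[0, :20] = 1.0   # first 20 samples labeled as target
G[1, 20:] = 1.0   # remaining samples labeled as background
D0 = rng.standard_normal((16, 24))
D0 /= np.linalg.norm(D0, axis=0)
D, W, X = train_discriminative_dictionary(Y, G, D0)
print(D.shape, W.shape, X.shape)
```

Splitting D′ back into D and W at the end recovers a dictionary for coding unlabeled candidates and a linear classifier on their codes; in practice the columns of D would typically be renormalized before it is used on its own.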

S105, measuring the similarity between vectors by the Euclidean distance, and taking the sample with the highest similarity as the candidate tracking target.

The appearance of a video target is usually correlated between consecutive frames, so in the current frame the samples drawn around the target's center point in the previous frame serve as the estimated target regions. Once the discriminative dictionary has been learned as described above, the sparse code and the reconstruction error of each sample can be computed. The row index of the maximum non-negative value in a sparse code identifies a column basis vector of the dictionary, namely the template vector that best approximates the sample. The vectors discussed below all refer to dictionary basis vectors.

FIG. 2 shows a schematic diagram of the target tracking method based on discriminant sparse template matching in an embodiment of the invention. Among all samples, the one with the smallest reconstruction error is not necessarily the best tracking target. Because the basis vectors of the target template lie in a linear subspace, the best candidate target should also lie in this subspace and be closest to the initial target vector. The Euclidean distance is therefore used to measure the similarity between vectors, and the sample with the highest similarity is the candidate tracking target. Often many samples satisfy this condition, in which case the one with the minimum reconstruction error is selected as the tracking target sample; that is, the candidate tracking target is the sample with the minimum reconstruction error under the constraint on the sparse-code distribution. The best tracking target sample obtained by the method is:
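The selection rule described in this step can be sketched as follows. The function name `select_target`, the tie tolerance `tol`, and the exact way distance and reconstruction error are combined are assumptions. The description states only that the atom at the largest non-negative coefficient is compared with the initial target vector by Euclidean distance, and that reconstruction error decides among the closest samples.

```python
import numpy as np

def select_target(D, codes, samples, d0, tol=1e-9):
    """Return the column index in `samples` of the chosen candidate."""
    # R(i): squared reconstruction error of each sample under its sparse code.
    errors = np.sum((samples - D @ codes) ** 2, axis=0)
    # d_i: dictionary atom at the row of the largest non-negative coefficient.
    best_rows = np.argmax(np.where(codes >= 0, codes, -np.inf), axis=0)
    atoms = D[:, best_rows]
    # Euclidean distance from each d_i to the initial target vector d0.
    distances = np.linalg.norm(atoms - d0[:, None], axis=0)
    # Keep the samples whose atom is closest to d0, then take the smallest error.
    closest = np.flatnonzero(distances <= distances.min() + tol)
    return int(closest[np.argmin(errors[closest])])

# Toy example: three candidates coded over a 3-atom dictionary.
D = np.eye(3)
d0 = D[:, 0]                                   # atom of the initial target
codes = np.array([[0.9, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0]])
samples = np.array([[0.9, 1.0, 0.0],
                    [0.1, 0.0, 1.0],
                    [0.0, 0.3, 0.0]])
chosen = select_target(D, codes, samples, d0)
print(chosen)  # 0
```

In the toy data the third candidate has zero reconstruction error but is coded by a different atom, so it is rejected; among the two candidates coded by the target atom, the first has the smaller error and is chosen.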

In summary, the spatial correlation between the target and its surroundings is taken into account when modeling the target's appearance. Because the area around the target contains part of the target information together with background information, it can approximately represent the target when the target deforms over a period of time. Using a supervised discriminative dictionary learning method, an overcomplete dictionary that balances reconstructive and discriminative performance is obtained by iterative optimization, and the resulting sparse codes are both highly sparse and discriminative. In the candidate search, similarity is constructed from the distribution of the maximum non-negative values of the sample sparse codes together with the minimum reconstruction error in order to determine the candidate target. Specifically, when the appearance model of the target is established, the surrounding background that is spatially correlated with the target is added as a cue template so as to approximate changes in the target's pose; the dictionary obtained by discriminative sparse learning yields sparse codes with higher sparsity and discriminability; and the estimated candidate tracking target is obtained from the distribution of the dictionary basis vectors corresponding to the maximum non-negative values in the sparse codes and from the minimum reconstruction error of the samples. Experimental results show that the method tracks the target well under occlusion, illumination change, pose change, background clutter, and similar conditions.

Those skilled in the art will appreciate that all or part of the steps of the methods in the above embodiments may be implemented by a program that instructs the relevant hardware. The program may be stored in a computer-readable storage medium, which may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.

The target tracking method based on sparse discriminant learning provided by the embodiment of the present invention is described in detail above, and a specific example is applied in the present document to explain the principle and the implementation of the present invention, and the description of the above embodiment is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A target tracking method based on sparse discriminant learning is characterized by comprising the following steps:

constructing a target appearance model;

measuring the similarity between vectors by adopting Euclidean distance, and taking a sample with the highest similarity as a candidate tracking target;

introducing a supervised discriminative dictionary learning method into the tracking and adding a classification error term to the reconstruction error term, which specifically comprises:

wherein the first term is the reconstruction error term, $Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{n \times N}$ are the training samples, D is the set of overcomplete dictionary bases to be learned, and $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{k \times N}$ are the sparse coding coefficients; the second term is the classification error term, expressed as the loss function of the linear classifier $f(x) = Wx + b$, where W is the classifier parameter and $h_i$ is the class label of $y_i$; ε is the threshold controlling sparsity; $L\{\cdot\}$ is the classification loss function;

converting the two-dimensional image blocks of the constructed target appearance model into one-dimensional vectors through feature extraction and normalizing the one-dimensional vectors, which comprises:

sampling the templates as image blocks of equal size, converting the two-dimensional image blocks into one-dimensional vectors through feature extraction, and normalizing the one-dimensional vectors; after normalization, the one-dimensional vectors have equal energy and can be regarded as vectors that start at the origin and point in different directions;

the target template T is sampled around the target box given in the first frame; let $l_0(c)$ denote the position of the target's center point; the sampling center points of the target template fall within the range centered at $l_0(c)$ with radius $r_0$, $T = \{t \mid \|l_0(t) - l_0(c)\| < r_0\}$; $l_0(t)$ denotes the sampling center point position of each sample in the target template T; t denotes each sample in the set T;

the cue template is sampled randomly, with center points falling in an annulus centered at $l_0(c)$, the cue template being $B = \{b \mid r_1 < \|l_0(b) - l_0(c)\| < r_2\}$ with $r_0 < r_1 < r_2$; the radii of the annulus are $r_1$ and $r_2$, with $r_1 < r_2$; $l_0(b)$ denotes the sampling center point position of each sample in the cue template.

2. The sparse discriminant learning-based target tracking method of claim 1, wherein the constructing the target appearance model comprises:

representing the target appearance model with a dictionary and converting the tracking problem into an optimal sparse approximation problem, wherein the target appearance model is a template dictionary comprising a target template T, a cue template B, and a noise template.

3. The sparse discriminant learning-based target tracking method of claim 2, wherein the solving of the minimum reconstruction error term under the sparsity constraint is specifically:

converting the constraint into an $L_1$-norm regularization term on the sparse codes to ensure sparsity, which turns the problem into an unconstrained optimization problem:

4. The sparse discriminant learning-based target tracking method of claim 3, wherein updating the dictionary and the sparse codes with the iterative alternating optimization strategy specifically comprises:

let

updating D and X with an iterative alternating optimization strategy:

step one: initializing Y, G, D, Y′, D′;

step two: fixing D', solving an optimization model by adopting OMP, and updating the sparse code X;

5. The sparse discriminant learning-based target tracking method of claim 4, wherein measuring the similarity between vectors by the Euclidean distance and taking the sample with the highest similarity as the candidate tracking target is specifically: