CN105139422B - A self-explanatory target tracking method and device - Google Patents


Publication number: CN105139422B (application CN201510502036.4A)
Authority: CN (China)
Prior art keywords: target, candidate, template, current frame, background
Legal status: Active
Application number: CN201510502036.4A
Other languages: Chinese (zh)
Other versions: CN105139422A (en)
Inventors: 汤雅妃, 张云勇, 王志军, 郭志斌
Current Assignee: China United Network Communications Group Co Ltd
Original Assignee: China United Network Communications Group Co Ltd
Application filed by China United Network Communications Group Co Ltd
Priority claimed from CN201510502036.4A
Publication of CN105139422A
Application granted
Publication of CN105139422B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Abstract

An embodiment of the present invention provides a self-explanatory target tracking method and device, relating to the field of communications and intended to improve the accuracy and robustness of a target tracking device. The method includes: obtaining candidate targets of the current frame; obtaining, from N_t groups of target templates T and N_b groups of background templates B, the template with the maximum similarity to each candidate target; dividing each candidate target into the category corresponding to its most similar template; for each candidate target in the target category, calculating the likelihood of the candidate target becoming the target using the template with the maximum similarity and the background template with the minimum similarity to that candidate target, and locating the candidate target with the maximum likelihood as the current-frame target; and, according to the obtained current-frame target, updating the corresponding target template T, and updating the background templates B according to the candidate targets divided into background categories. The embodiment of the present invention applies to the field of target tracking.

Description

Self-explanatory target tracking method and device
Technical Field
The invention relates to the field of communications, in particular to a self-explanatory target tracking method and device.
Background
Target tracking is one of the core technologies in the field of modern intelligent technology and is widely applied in the national economy and even in defense. Its essential aim is, given the motion state of a target on an initial video frame, to estimate the motion state of the target in the subsequent sequence of video frames. A general target tracking method first generates candidate targets, then determines a model for expressing the candidate targets, next determines a target localization criterion that estimates the likelihood of a candidate target being the current-frame target, and finally captures the latest changes of the tracked target through an online update strategy.
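The four-stage pipeline just described (candidate generation, expression model, localization criterion, online update) can be sketched as a minimal loop. This is an illustrative skeleton only: the dense-jitter candidate search, SSD scoring, and blended template update are generic placeholder choices, not the patented expression method.

```python
import numpy as np

def track(frames, init_box):
    """Minimal generate -> score -> locate -> update tracking loop.
    init_box = (x, y, w, h) of the target on the first frame."""
    x, y, w, h = init_box
    template = frames[0][y:y + h, x:x + w].astype(float)
    path = [(x, y)]
    for frame in frames[1:]:
        # 1) candidate generation: dense sampling around the last position
        cand = [(int(np.clip(x + dx, 0, frame.shape[1] - w)),
                 int(np.clip(y + dy, 0, frame.shape[0] - h)))
                for dx in range(-3, 4) for dy in range(-3, 4)]
        # 2-3) expression + localization: SSD against the target template
        scores = [np.sum((frame[cy:cy + h, cx:cx + w] - template) ** 2)
                  for cx, cy in cand]
        x, y = cand[int(np.argmin(scores))]
        # 4) online update: blend the template toward the new appearance
        template = 0.9 * template + 0.1 * frame[y:y + h, x:x + w]
        path.append((x, y))
    return path
```

Each stage of the loop corresponds to one of the four design decisions the paragraph lists; the patent's contribution replaces stages 2-4.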
In the prior art, reconstructable expression methods have attracted more and more attention. These methods reconstruct an object through its expression; subspace expression and sparse expression are common examples, which respectively mine the similarity relation of candidate targets in a low-dimensional space and the neighbor relation between candidate targets and a dictionary (template) so as to improve expression accuracy.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
however, in both subspace expression and sparse expression, the similarity of candidate targets in the low-dimensional space and the neighbor relation between candidate targets and the dictionary (template) are unidirectional; that is, only previously located targets are used to express the candidate targets, and the mutual expression relation between the candidate targets themselves is not considered.
Disclosure of Invention
The embodiment of the invention provides a self-explanatory target tracking method and device, which can improve the accuracy and robustness of a target tracking device.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
in a first aspect, a self-explanatory target tracking method is provided, including:
acquiring a candidate target of a tracking target in a current frame;
obtaining, from N_t groups of target templates T and N_b groups of background templates B, the template with the maximum similarity to each candidate target;
dividing each candidate target into a category corresponding to the template with the maximum similarity to the candidate target, wherein the category comprises a target category and at least one background category, all target templates correspond to the target category, and each group of background templates respectively corresponds to one background category;
for each candidate target in the target category, respectively calculating the likelihood of the candidate target becoming the target using the template with the maximum similarity to the candidate target and the background template with the minimum similarity to the candidate target, and locating the candidate target with the maximum likelihood as the current-frame target;
and updating the target template T corresponding to the current frame target according to the obtained current frame target, and updating the background template B according to the candidate targets classified into the background category.
With reference to the first aspect, in a first possible implementation manner of the first aspect, the obtaining, from N_t groups of target templates T and N_b groups of background templates B, of the template with the largest similarity to each candidate target includes:
construction of a plasmid containing N t Group target template T, candidate targets C and N of current frame b A sample matrix X of a background template B is formed, the background template B comprises background images of a preset number of image sequences in a frame video sequence before the current frame, and the target template T comprises a preset number of target images extracted from a frame sequence before the current frame;
constructing a similarity matrix A of the sample matrix X;
obtaining, according to the similarity matrix A, the template with the highest similarity to each candidate target among the N_t groups of target templates T and the N_b groups of background templates B, wherein N_l = N_b + N_t.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the constructing a similarity matrix a of the sample matrix X includes:
constructing a self-explanatory low-rank constraint model of the sample matrix X:

min_Z ||Z||_*  s.t.  X = XZ

wherein X is the sample matrix and Z is the expression matrix of X;
and carrying out low-rank constraint decomposition of the image matrix on the sample matrix X to obtain:

min_{Z,E} ||Z||_* + λ||E||_1  s.t.  X = XZ + E

wherein E is the sparse error part, λ > 0 controls the trade-off between the low rank of Z and the sparse error E, and the nuclear norm ||Z||_* is the sum of the singular values of the matrix Z;
and constructing the similarity matrix A of the sample matrix X according to the formula A = (|Z| + |Z^T|)/2.
With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, before the similarity matrix A of the sample matrix X is constructed according to that formula, the method further comprises solving for the expression matrix Z by minimizing its nuclear norm ||Z||_*:
introducing a relaxation variable into the formula to obtain:

min_{Z_1,Z_2,E} ||Z_1||_* + λ||E||_1  s.t.  X = XZ_2 + E,  Z_2 = Z_1

wherein Z_2 is the relaxation variable and Z_1 is the renamed Z;
applying the augmented Lagrange multiplier method to the formula to obtain:

L(Z_1, Z_2, E) = ||Z_1||_* + λ||E||_1 + <Y_1, X − XZ_2 − E> + <Y_2, Z_2 − Z_1> + (τ/2)(||X − XZ_2 − E||_F² + ||Z_2 − Z_1||_F²)

wherein Y_1 and Y_2 are Lagrange multipliers, τ > 0 is the penalty parameter, ||·||_F is the Frobenius norm, and <·,·> is the inner product of two matrices;
and obtaining the Z_1 that minimizes the function L(Z_1, Z_2, E) through an iterative algorithm, thereby obtaining the expression matrix Z.
With reference to the third possible implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, the obtaining, through an iterative algorithm, of the Z_1 that minimizes the function L(Z_1, Z_2, E) comprises:
performing iterative calculation on L(Z_1, Z_2, E) using the soft-thresholding (shrinkage) operator S_σ(x) = sign(x)·max(|x| − σ, 0) to obtain:

Z_1^*, Z_2^* and E^*, which are respectively the optimal solutions minimizing L(Z_1, Z_2, E);
when the iterative algorithm satisfies L (Z) in adjacent reiterations 1 ,Z 2 And E) stopping the iteration when the difference is 0.001-0.01 or the iteration time exceeds the allowed maximum value of 50-100.
With reference to the first aspect, in a fifth possible implementation manner of the first aspect, the calculating, for each candidate target in the target category, of the likelihood of the candidate target becoming the target using the template with the largest similarity to the candidate target and the background template with the smallest similarity to the candidate target, and the locating of the candidate target with the largest likelihood as the current-frame target, include:
calculating, according to the likelihood formula, the likelihood L_l(c) of the candidate target becoming the target using the template with the maximum similarity to the candidate target and the background template with the minimum similarity to the candidate target;

where c is the candidate target, σ is the scale of the exponential function, ρ and β are the weighting parameters of the respective terms, ĉ is the reconstructed candidate target, t is the target descriptor of candidate target c, b_k is the background descriptor of the background template with the minimum similarity to the candidate target, and e_j is the j-th element of the expression error e of the candidate target c;
minimizing, according to the formula, the likelihood L_l(c) of the candidate target becoming the target over the target template groups, wherein N_θ is the number of target template groups used;
and determining, according to the formula c^* = argmax_{c∈C} L(c), the candidate with the maximum likelihood among all candidate targets in the current frame as the current-frame target, wherein C is the set of all candidate targets in the current frame.
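The patent's exact likelihood formula did not survive extraction here, but the arg-max localization step can be sketched as follows, assuming (as a stand-in) a Gaussian-kernel likelihood of the reconstruction error; the function name and the σ value are illustrative only.

```python
import numpy as np

def locate(candidates, reconstructed, sigma=0.1):
    """Score each candidate column by exp(-||c - c_hat||^2 / sigma) and
    return the index of the arg-max: c* = argmax_{c in C} L(c).
    The Gaussian-kernel form is an assumed stand-in for L_l(c)."""
    errs = np.sum((candidates - reconstructed) ** 2, axis=0)  # per column
    likelihood = np.exp(-errs / sigma)
    return int(np.argmax(likelihood)), likelihood
```

Candidates and their reconstructions are stored column-wise, matching the sample-matrix convention used for X.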
With reference to the first aspect, in a sixth possible implementation manner of the first aspect, the updating, according to the obtained current frame target, the target template T corresponding to the current frame target includes:
calculating the correlation between the current frame target and the l-th group of target templates;

and when the correlation between the current frame target and the l-th group of target templates is less than a predefined threshold θ_k, replacing the target template with the minimum correlation by the current frame target.
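The update rule above can be sketched as follows; normalized cross-correlation is an assumed choice of "correlation", and θ = 0.8 is an illustrative threshold, not the patent's θ_k.

```python
import numpy as np

def update_templates(T, target, theta=0.8):
    """If no template column of T correlates with the current-frame
    target above theta, replace the least-correlated template."""
    t = target / (np.linalg.norm(target) + 1e-12)
    Tn = T / (np.linalg.norm(T, axis=0, keepdims=True) + 1e-12)
    corr = Tn.T @ t                 # correlation with each template
    T = T.copy()
    if corr.max() < theta:          # target appearance is new enough
        T[:, int(np.argmin(corr))] = target
    return T
```

Replacing only the least-correlated template keeps the template set diverse while still absorbing genuinely new appearances.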
With reference to the first aspect, in a seventh possible implementation manner of the first aspect, the background template B is obtained by shifting the target area of each frame by 10%-30% of the pixels of the width or height of the target area in m directions, where the m directions are obtained by uniform sampling over the direction range [0°, 360°), and 6 ≤ m ≤ 10.
In a second aspect, an embodiment of the present invention provides a self-explanatory target tracking apparatus, including:
the acquisition unit is used for acquiring a candidate target of the tracking target in the current frame;
a calculation unit, configured to obtain, from N_t groups of target templates T and N_b groups of background templates B, the template with the maximum similarity to each candidate target;
the classification unit is used for classifying each candidate target into a category corresponding to the template with the maximum similarity with the candidate target, wherein the category comprises a target category and at least one background category, all the target templates correspond to the target category, and each group of background templates respectively corresponds to one background category;
a positioning unit, configured to calculate, for each candidate target in the target category, the likelihood of the candidate target becoming the target using the template with the greatest similarity to the candidate target and the background template with the smallest similarity to the candidate target, and to locate the candidate target with the greatest likelihood as the current-frame target;
and the updating unit is used for updating the target template T corresponding to the current frame target according to the obtained current frame target and updating each background template B according to the candidate targets divided into the background categories.
With reference to the second aspect, in a first possible implementation manner of the second aspect, the computing unit includes:
a first construction module, configured to construct a sample matrix X containing the N_t groups of target templates T, the candidate targets C of the current frame, and the N_b groups of background templates B, wherein the background templates B comprise background images of a preset number of image sequences in the video frames before the current frame, and the target templates T comprise target images extracted from the frame sequence before the current frame;
the second construction module is used for constructing a similarity matrix A of the sample matrix X;
a calculation module, configured to obtain, according to the similarity matrix A, the template with the highest similarity to each candidate target among the N_t groups of target templates T and the N_b groups of background templates B, wherein N_l = N_b + N_t.
With reference to the first possible implementation manner of the second aspect, in a second possible implementation manner of the second aspect, the second building module is specifically configured to:
constructing a self-explanatory low-rank constraint model of the sample matrix X, min_Z ||Z||_* s.t. X = XZ, wherein X is the sample matrix and Z is the expression matrix of X;
and carrying out low-rank constraint decomposition of the image matrix on the sample matrix X to obtain:

min_{Z,E} ||Z||_* + λ||E||_1  s.t.  X = XZ + E

wherein E is the sparse error part, λ > 0 controls the trade-off between the low rank of Z and the sparse error E, and the nuclear norm ||Z||_* is the sum of the singular values of the matrix Z;
and constructing the similarity matrix A of the sample matrix X according to the formula A = (|Z| + |Z^T|)/2.
With reference to the second possible implementation manner of the second aspect, in a third possible implementation manner of the second aspect, the second construction module is further configured to solve for the expression matrix Z by minimizing its nuclear norm ||Z||_*:
introducing a relaxation variable into the formula to obtain:

min_{Z_1,Z_2,E} ||Z_1||_* + λ||E||_1  s.t.  X = XZ_2 + E,  Z_2 = Z_1

wherein Z_2 is the relaxation variable and Z_1 is the renamed Z;
applying the augmented Lagrange multiplier method to the formula to obtain:

L(Z_1, Z_2, E) = ||Z_1||_* + λ||E||_1 + <Y_1, X − XZ_2 − E> + <Y_2, Z_2 − Z_1> + (τ/2)(||X − XZ_2 − E||_F² + ||Z_2 − Z_1||_F²)

wherein Y_1 and Y_2 are Lagrange multipliers, τ > 0 is the penalty parameter, ||·||_F is the Frobenius norm, and <·,·> is the inner product of two matrices;
and obtaining the Z_1 that minimizes the function L(Z_1, Z_2, E) through an iterative algorithm, thereby obtaining the expression matrix Z.
With reference to the third possible implementation manner of the second aspect, in a fourth possible implementation manner of the second aspect, the obtaining, through an iterative algorithm, of the Z_1 that minimizes the function L(Z_1, Z_2, E) comprises:
performing iterative calculation on L(Z_1, Z_2, E) using the soft-thresholding (shrinkage) operator S_σ(x) = sign(x)·max(|x| − σ, 0) to obtain:

Z_1^*, Z_2^* and E^*, which are respectively the optimal solutions minimizing L(Z_1, Z_2, E);
the iterative algorithm stops when either of the following two criteria is met: the difference of L(Z_1, Z_2, E) between adjacent iterations is 0.001-0.01, or the number of iterations exceeds the allowed maximum of 50-100.
With reference to the second aspect, in a fifth possible implementation manner of the second aspect, the positioning unit includes:
a first calculation module, configured to calculate, according to the likelihood formula, the likelihood L_l(c) of the candidate target becoming the target using the template with the maximum similarity to the candidate target and the background template with the minimum similarity to the candidate target;

where c is the candidate target, σ is the scale of the exponential function, ρ and β are the weighting parameters of the respective terms, ĉ is the reconstructed candidate target, t is the target descriptor of candidate target c, b_k is the background descriptor of the background template with the minimum similarity to the candidate target, and e_j is the j-th element of the expression error e of the candidate target c;
a second calculation module, configured to minimize, according to the formula, the likelihood L_l(c) of the candidate target becoming the target over the target template groups, wherein N_θ is the number of target template groups used;
and a decision module, configured to determine, according to the formula c^* = argmax_{c∈C} L(c), the candidate with the maximum likelihood among all candidate targets in the current frame as the current-frame target, wherein C is the set of all candidate targets in the current frame.
With reference to the second aspect, in a sixth possible implementation manner of the second aspect, the updating unit includes a target template updating module and a background template updating module, and the target template updating module includes:
the judging submodule is used for calculating the correlation between the current frame target and the l group of target templates;
a replacing sub-module for replacing when the correlation between the current frame target and the ith group of target templates is less than a predefined threshold value theta k And replacing the target template with the minimum correlation with the current frame target.
The embodiment of the invention provides a self-explanatory target tracking method, which includes: obtaining candidate targets of the tracking target in the current frame; obtaining, from N_t groups of target templates T and N_b groups of background templates B, the template with the maximum similarity to each candidate target; dividing each candidate target into the category corresponding to the template with the maximum similarity to that candidate target, wherein the categories comprise a target category and at least one background category, all target templates correspond to the target category, and each group of background templates corresponds to one background category; for each candidate target in the target category, calculating the likelihood of the candidate target becoming the target using the template with the maximum similarity and the background template with the minimum similarity to that candidate target, and locating the candidate target with the maximum likelihood as the current-frame target; and updating the target template T corresponding to the current-frame target according to the obtained current-frame target, and updating each background template B according to the candidate targets divided into the background categories. The method mines, based on the low-rank constraint, the correlation and clustering structure between candidate targets on the same frame. Such a clustering structure has better discrimination performance, so the tracking algorithm can better handle some challenging scenes during target localization, and selecting the template with the maximum similarity to each candidate target makes the candidates more accurate.
Meanwhile, the target template and background template information of the candidate target are added into the localization criterion, so that the self-explanatory target tracking method can accurately locate the target in various challenging tracking scenes.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a first schematic flow chart of a self-explanatory target tracking method according to a first embodiment of the present invention;
fig. 2a-2b are schematic diagrams illustrating a self-explanatory low rank constraint of a self-explanatory target tracking method according to an embodiment of the present invention;
fig. 3 is a first schematic structural diagram of a self-explanatory target tracking apparatus according to a second embodiment of the present invention;
fig. 4 is a second schematic structural diagram of a self-explanatory target tracking apparatus according to a second embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
An embodiment of the present invention provides a self-explanatory target tracking method, as shown in fig. 1, including:

S101, acquiring candidate targets of the tracking target in the current frame;
the tracking target can be from a front-end monitoring device, a graphic image processing device or a mobile phone camera to quickly capture the target or a panoramic image acquired by a high-definition camera, and the tracking target does not affect the implementation of the purpose of the invention, which is not limited in the embodiment of the invention.
Particle filtering obtains a minimum-variance estimate of the system state by finding a group of random samples propagated in the state space. The probability density function of a random sample in the current frame is approximated by the probability distribution of the random samples before the current frame, and the target area of the sample on the current frame is obtained from this probability density function. The motion state variable of a sample in the target area of the current frame is calculated through the sample's motion state transition model, and candidate targets are generated by densely sampling in the target area according to the motion state variable of the current frame. Owing to the spatial neighbor similarity of the candidate targets, the candidate targets present a clustering structure.
The spatial neighbor similarity means that, under certain conditions, the positions of the candidate targets are close, the pixel values of the candidate targets satisfy a similarity condition, or the gray values of the candidate targets are equal.
It should be noted that, before step S101, the invention may further include converting the original image pixels into gray-value pixels and normalizing them to [0, 1]. Because the original image pixels, e.g., color or black-and-white image pixels, may change with illumination over time, they may be converted to gray-value pixels before the candidate targets are acquired in order to suppress the effect of illumination changes. This also further reduces the computation load of the device and improves its real-time processing capability. Further, in order to enhance the smoothness and definition of the image, the image area corresponding to the target position is uniformly scaled to a size of 20×20 pixels.
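The preprocessing just described (grayscale conversion, normalization to [0, 1], uniform scaling to 20×20) can be sketched with plain NumPy; the luminance weights and nearest-neighbour resampling are common illustrative choices, not mandated by the text.

```python
import numpy as np

def preprocess(rgb, out=20):
    """RGB uint8 patch -> normalized out x out grayscale patch."""
    gray = rgb @ np.array([0.299, 0.587, 0.114])   # luminance grayscale
    gray = gray / 255.0                            # normalize to [0, 1]
    rows = np.arange(out) * gray.shape[0] // out   # nearest-neighbour
    cols = np.arange(out) * gray.shape[1] // out   # resize grid
    return gray[np.ix_(rows, cols)]
```

Fixing every patch to the same 20×20 shape lets patches be stacked as equal-length column vectors in the sample matrix X used later.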
Illustratively, step S101 may be implemented by steps A1-A5:
A1, obtaining, by particle filtering, the candidate observation region corresponding to each of at least two particles in the frames before the current frame, wherein the candidate observation region is represented by Y_i = {x_i, y_i, δ, φ}, Y_i is the motion state variable of the particle on the i-th frame, x_i and y_i are the two-dimensional coordinates of the candidate observation region on the i-th frame, δ is the scale scaling coefficient of the candidate observation region, and φ is a rotation angle parameter;
A2, estimating the motion state Y_t of each particle on the current frame according to the motion state transition model of the current-frame particles, p(Y_t | Y_{t-1}) ~ N(Y_t; Y_{t-1}, Σ), and the posterior probability distribution of the motion state of the previous frame of the tracked target, p(Y_t | y_{1:t-1}) = ∫ p(Y_t | Y_{t-1}) p(Y_{t-1} | y_{1:t-1}) dY_{t-1}; using the motion state Y_t of each particle on the current frame to update p(Y_t | y_{1:t-1}) = ∫ p(Y_t | Y_{t-1}) p(Y_{t-1} | y_{1:t-1}) dY_{t-1}, obtaining the updated posterior probability distribution, wherein Y_t represents the motion state of each particle on the current frame, Y_{t-1} represents the motion state of the frame immediately preceding the tracked target, and Σ is the covariance matrix Σ = diag{v_x, v_y, v_δ, v_φ}, whose diagonal elements are respectively the variances of the state variable parameters x_i, y_i, δ, φ;
A3, determining the target position of the tracked target on the current frame according to the motion state of the particle with the maximum probability under the updated posterior distribution p(Y_t | c, y_{1:t-1});
A4, cutting, according to the target position of the tracked target, the image area corresponding to the target position from the current frame to form a set of image blocks of the tracked target, wherein the image blocks are of uniform size;
and A5, stretching each image block of the tracked target into a column vector to obtain the candidate targets.
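Steps A1-A5 can be sketched as follows: each particle's new state {x, y, δ, φ} is drawn from N(Y_{t-1}, Σ) with Σ = diag{v_x, v_y, v_δ, v_φ}, and the state of the highest-probability particle is taken as the estimate. The observation model itself is omitted here, so the weights are given rather than computed.

```python
import numpy as np

def propagate(particles, var_diag, rng):
    """One prediction step of the particle filter: draw each particle's
    new state (x, y, scale, angle) from N(previous state, Sigma)."""
    noise = rng.normal(0.0, np.sqrt(var_diag), size=particles.shape)
    return particles + noise

def best_state(particles, weights):
    """Motion state of the particle with the maximum probability."""
    return particles[int(np.argmax(weights))]
```

The selected state then defines the target area from which image blocks are cut and stretched into candidate-target column vectors.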
Step S102, obtaining, from N_t groups of target templates T and N_b groups of background templates B, the template with the maximum similarity to each candidate target;
the target templates include a preset number of target images extracted from a frame sequence before the current frame, and preferably, the number of the target templates in the embodiment of the present invention is 10. It should be noted that, in the problem of target tracking, sparsity is not the ultimate target that we pursue, and accuracy and processing speed are relatively more important, so the target template is used as a key parameter in the method of the present invention, and has a larger value of N t The speed of operation of the algorithm will be reduced. The method is finally set to be N according to a plurality of experimental results t =10。
The background template is obtained by shifting the target area of each frame by 10%-30% of the pixels of the width or height of the target area in m directions, where the m directions are obtained by uniform sampling over the range [0°, 360°), and 6 ≤ m ≤ 10. Preferably, the background template of the invention is obtained by shifting the target area of each frame by 20% of the pixels of the width or height of the target area in 8 directions, where the 8 directions are obtained by uniform sampling over the range [0°, 360°).
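The preferred background-template construction (shift the target box by 20% of its width/height along 8 directions uniformly sampled over [0°, 360°)) can be sketched as:

```python
import numpy as np

def background_boxes(x, y, w, h, m=8, ratio=0.2):
    """Shift the target box (x, y, w, h) by ratio*w / ratio*h along m
    directions uniformly sampled over [0, 360) degrees."""
    angles = np.arange(m) * (2.0 * np.pi / m)
    dx = np.round(np.cos(angles) * ratio * w).astype(int)
    dy = np.round(np.sin(angles) * ratio * h).astype(int)
    return [(x + int(i), y + int(j), w, h) for i, j in zip(dx, dy)]
```

Each returned box is then cropped from the frame to give one background-template image surrounding the target.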
Illustratively, step S102 may be implemented by steps A6-A7:
A6, constructing a sample matrix X containing the N_t groups of target templates T, the candidate targets C of the current frame, and the N_b groups of background templates B, wherein the background templates B comprise background images of a preset number of image sequences in the video frames before the current frame, and the target templates T comprise a preset number of target images extracted from the frame sequence before the current frame;
a7, constructing a similarity matrix A of the sample matrix X;
the method comprises the steps of tightly sampling in a target area based on the particle tracking algorithm to generate candidate targets, enabling the candidate targets to present a clustering structure due to spatial neighbor similarity of the candidate targets, enabling the candidate targets to be clustered through low-rank expression, further modeling the clustering structure in data, enabling expression errors to be large but sparse, carrying out matrix decomposition on a self-interpretation low-rank constraint model of the candidate targets, and then respectively applying low-rank constraint on sub-components, thereby achieving the purpose of low-rank approximation.
Illustratively, step A7 of the embodiment of the present invention is specifically implemented by the following steps:
A self-explanatory low-rank constraint model of the sample matrix X is constructed, as shown in fig. 2a:

min_Z ||Z||_*  s.t.  X = XZ

wherein X is the sample matrix and Z is the expression matrix of X;
and low-rank constraint decomposition of the image matrix is carried out on the sample matrix X to obtain:

min_{Z,E} ||Z||_* + λ||E||_1  s.t.  X = XZ + E

wherein E is the sparse error part, λ > 0 controls the trade-off between the low rank of Z and the sparse error E, and the nuclear norm ||Z||_* is the sum of the singular values of the matrix Z.
As shown in fig. 2a, the sample matrix X in the upper row of fig. 2a is decomposed into the reconstructed image blocks XZ in the middle row and the sparse error part E in the lower row. As can be seen from fig. 2b, when an image block differs greatly from the mean image, the sparse error part E is dense, so the reconstructed image block XZ corresponding to that image block is pulled toward the mean. Meanwhile, Z describes the similarity between one sample and the other samples in the sense of low-rank expression reconstruction, so the expression matrix Z clearly presents a cluster structure, and the target expression method provided by the invention has good discriminability.
The similarity matrix A of the sample matrix X is constructed according to the formula A = (|Z| + |Z^T|)/2, where the operator |·| denotes the element-wise absolute value of a matrix.
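The similarity construction A = (|Z| + |Zᵀ|)/2 and the subsequent most-similar-template lookup of step S102 can be sketched as:

```python
import numpy as np

def similarity_matrix(Z):
    """A = (|Z| + |Z^T|) / 2: element-wise absolute value, symmetrized."""
    return (np.abs(Z) + np.abs(Z.T)) / 2.0

def most_similar_template(A, n_templates, sample_idx):
    """Index of the template (one of the first n_templates columns of the
    sample matrix X) with the largest similarity to the given sample."""
    return int(np.argmax(A[:n_templates, sample_idx]))
```

Symmetrizing |Z| makes A usable as an undirected affinity, so the expression of sample i by sample j and vice versa both count toward their similarity.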
Before constructing the similarity matrix A of the sample matrix X according to this formula, the method further comprises solving the nuclear-norm minimization for the expression matrix Z.
Illustratively, the nuclear-norm minimization ||Z||_* is solved in the following steps:
introducing a relaxation variable into the formula min_Z ||Z||_* + λ||E||_1 s.t. X = XZ + E gives:
min ||Z_1||_* + λ||E||_1 s.t. X = XZ_2 + E, Z_2 = Z_1,
where Z_2 is the relaxation variable and Z_1 is the renamed Z;
applying the augmented Lagrange multiplier method to this formula gives:
L(Z_1, Z_2, E) = ||Z_1||_* + λ||E||_1 + <Y_1, X − XZ_2 − E> + <Y_2, Z_2 − Z_1> + (τ/2)(||X − XZ_2 − E||_F^2 + ||Z_2 − Z_1||_F^2),
where Y_1 and Y_2 are Lagrange multipliers, τ > 0 is the penalty parameter, ||·||_F is the Frobenius norm, and <·,·> is the inner product of two matrices;
the function L(Z_1, Z_2, E) is then minimized by an iterative algorithm to obtain Z_1, and thereby the expression matrix Z.
Illustratively, the minimization of L(Z_1, Z_2, E) by an iterative algorithm proceeds as follows:
iterate on L(Z_1, Z_2, E) using the soft-threshold shrinkage operator S_σ(x) = sign(x) · max(|x| − σ, 0), where σ is the soft-threshold parameter and sign(·) is the sign function taking the values 1, 0 and −1:
the minimization alternates over three sub-problems: updating Z_1 when Z_2* and E* are known, updating Z_2 when Z_1* and E* are known, and updating E when Z_1* and Z_2* are known. Specifically:
when Z_2* and E* are known, minimizing L(Z_1, Z_2, E) is equivalent to solving:
min_{Z_1} ||Z_1||_* + (τ/2) ||Z_1 − (Z_2* + Y_2/τ)||_F^2.
This is a convex optimization problem in Z_1 whose global minimum can be computed by the singular value thresholding (SVT) algorithm; the solution is Z_1* = U S_{1/τ}(S) V^T, where [U, S, V^T] = svd(Z_2* + Y_2/τ).
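The Z_1 sub-problem admits this closed form because singular value thresholding minimizes a nuclear-norm term plus a quadratic fit. A minimal sketch (the matrix M stands in for Z_2* + Y_2/τ and the threshold for 1/τ; generic names, not the patent's):

```python
import numpy as np

def svt(M, tau):
    # Singular value thresholding: the minimizer of
    # tau * ||Z||_* + 0.5 * ||Z - M||_F^2.
    # Shrink each singular value of M toward zero by tau.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Singular values 3, 1, 0.2 shrink to 2.5, 0.5, 0 with threshold 0.5
Z1 = svt(np.diag([3.0, 1.0, 0.2]), 0.5)
```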
When Z_1* and E* are known, minimizing L(Z_1, Z_2, E) is equivalent to solving a quadratic problem in Z_2, whose global minimum can be found via least squares.
When Z_1* and Z_2* are known, minimizing L(Z_1, Z_2, E) is equivalent to solving:
min_E λ||E||_1 + (τ/2) ||E − (X − XZ_2* + Y_1/τ)||_F^2,
where Z_1*, Z_2* and E* denote the optimal solutions minimizing L(Z_1, Z_2, E). This is a convex optimization problem in E whose global minimum can be obtained by the soft-thresholding (shrinkage) algorithm as E* = S_{λ/τ}(X − XZ_2* + Y_1/τ).
The iteration stops when the change of L(Z_1, Z_2, E) between consecutive iterations falls within 0.001-0.01, or when the iteration count exceeds the allowed maximum of 50-100.
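The E sub-problem and the stopping rule can likewise be sketched; the tolerance and iteration cap below are single illustrative values picked inside the stated 0.001-0.01 and 50-100 bands, not the patent's exact settings:

```python
import numpy as np

def soft_threshold(M, sigma):
    # Element-wise shrinkage sign(m) * max(|m| - sigma, 0),
    # the closed-form minimizer of the convex E sub-problem.
    return np.sign(M) * np.maximum(np.abs(M) - sigma, 0.0)

def should_stop(prev_L, cur_L, iteration, tol=0.005, max_iter=80):
    # Stop when the objective change between consecutive iterations is
    # within tolerance, or the iteration count exceeds the allowed maximum.
    return abs(prev_L - cur_L) <= tol or iteration >= max_iter

E = soft_threshold(np.array([[2.0, -0.3],
                             [0.1, -1.5]]), 0.5)
```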
A8, according to the similarity matrix A, the template with the highest similarity to each candidate target among the N_t groups of target templates T and the N_b groups of background templates B is obtained, where N_l = N_b + N_t.
Step S103, dividing each candidate target into a category corresponding to the template with the maximum similarity to the candidate target, wherein the category comprises a target category and at least one background category, all the target templates correspond to the target category, and each group of background templates respectively corresponds to one background category;
The target category is the class into which the previously obtained N_t groups of target templates are placed according to the similarity between templates. Preferably, because the N_t target templates selected during tracking in this embodiment are highly similar to one another, all N_t groups of target templates are placed in a single target category.
The background categories are the classes of the previously obtained N_b groups of background templates. Preferably, the 8 background templates adopted in this embodiment differ considerably from one another during tracking, so 8 background classes are required, the 8 background templates corresponding to their respective classes.
Step S104, for each candidate target in the target category, respectively calculating the possibility that the candidate target becomes the target by using the template with the maximum similarity to the candidate target and the background template with the minimum similarity to the candidate target, and positioning the candidate target with the maximum target possibility as the current frame target;
it should be noted that candidate objects that are similar to the object descriptor and have a larger difference from the background descriptor will have a greater probability of being the object.
The target descriptor is built from the reconstructed target template most similar to the candidate target; using the reconstructed target template makes the target descriptor more stable.
The target descriptor t is given by the corresponding formula, whose quantities are, respectively, a reconstructed target template, a reconstructed background template, and a reconstructed candidate target.
Preferably, in the embodiment of the present invention, each background descriptor is composed of 8 components and corresponds to 8 background categories, respectively. Each component is defined as the weighted average of all reconstructed samples in its corresponding background class, with the weight being the similarity of the candidate object to each sample in that class.
It should be noted that constructing the background descriptor from all background samples in the corresponding background category improves its diversity: in each background category only one background template is a labeled sample. The weighted-average strategy makes the background descriptor smoother and more stable, and since the weights take different values for different candidate targets, different candidate targets obtain different background descriptors.
The background descriptor component b_k = D_k w, where D_k contains all reconstructed background-template samples of the background class S_k corresponding to the k-th component, and the weight w is calculated via w_l = A_ij, j ∈ S_k, l = 1, 2, ....
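The weighted-average construction of each background-descriptor component can be sketched as follows (normalizing the weights to sum to one is an assumption; the text only says "weighted average" with similarity weights):

```python
import numpy as np

def background_component(D_k, similarities):
    # b_k = D_k @ w: columns of D_k are the reconstructed background
    # samples of class S_k; w holds the candidate's similarity to each
    # sample (entries of A restricted to S_k), normalized to sum to 1.
    w = np.asarray(similarities, dtype=float)
    return D_k @ (w / w.sum())

# Two reconstructed samples as columns; the candidate is 3x more similar
# to the second sample, so b_k leans toward it.
D_k = np.array([[1.0, 3.0],
                [0.0, 2.0]])
b_k = background_component(D_k, [1.0, 3.0])
```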
Illustratively, the embodiment of the present invention calculates the probability L_l(c) of each candidate target becoming the target according to the formula, using the template with the maximum similarity to the candidate target and the background template with the minimum similarity to it:
where c is the candidate target, σ is the scale of the exponential function, ρ and β are the weight parameters of the respective terms; the formula involves the reconstructed candidate target; t is the target descriptor of candidate target c, b_k is the background descriptor of the background template with minimal similarity to the candidate target, and e_j is the j-th element of the expression error e of candidate target c;
according to the formula, the likelihood L_l(c) of the candidate target becoming the target is minimized over the N_θ groups of target templates used, where N_θ is the number of target template groups;
according to the formula, the candidate target whose probability is maximal among all candidate targets in the current frame is determined as the current frame target, where C is the set of all candidate targets in the current frame.
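Step S104's final selection reduces to an argmax over the candidate set C. The sketch below leaves the likelihood as a pluggable callable, since the exact formula of L_l(c) is not reproduced here; the scoring function in the example is a hypothetical stand-in (close to a target descriptor t, far from a background descriptor b):

```python
import numpy as np

def locate_target(candidates, likelihood):
    # Evaluate the target likelihood of every candidate in the current
    # frame and return the one with the maximum score (step S104).
    scores = [likelihood(c) for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]

t = np.array([1.0, 1.0])      # hypothetical target descriptor
b = np.array([-1.0, 0.0])     # hypothetical background descriptor
score = lambda c: -np.linalg.norm(c - t) + np.linalg.norm(c - b)

C = [np.array([0.9, 1.1]), np.array([-0.8, 0.2]), np.array([0.0, 0.0])]
target, best_score = locate_target(C, score)
```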
Step S105, according to the obtained current frame target, updating the target template T corresponding to the current frame target, and updating each background template B according to the candidate targets divided into the background categories.
Calculating the correlation between the current frame target and the l-th group of target templates;
when the correlation between the current frame target and the l-th group of target templates is smaller than a predefined threshold θ_k, the target template with the minimum correlation is replaced by the current frame target.
Preferably, 2 groups of target templates are selected in this embodiment. When the correlation between the current frame target and the 1st group of target templates is smaller than the predefined threshold θ_1 = 15, the target template with the minimum correlation is replaced by the current frame target; when the correlation between the current frame target and the 2nd group of target templates is smaller than the predefined threshold θ_2 = 30, the target template with the minimum correlation is likewise replaced by the current frame target.
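The update rule of step S105 can be sketched as follows. Normalized correlation is used here as a placeholder for the correlation measure (the patent defers that calculation to Mei and Ling's method), and aggregating the per-group decision through the best correlation is likewise an assumption:

```python
import numpy as np

def update_template_group(templates, target, theta):
    # Correlate the current-frame target with every template in the group;
    # if even the best correlation is below the threshold theta, replace
    # the least-correlated template with the current-frame target.
    def ncc(a, b):
        a = a - a.mean()
        b = b - b.mean()
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    corrs = [ncc(t, target) for t in templates]
    if max(corrs) < theta:
        templates[int(np.argmin(corrs))] = target.copy()
    return templates

group = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
group = update_template_group(group, np.array([0.0, 0.0, 1.0]), theta=0.5)
```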
It should be noted that the method for calculating the correlation between the candidate target and the l-th group of target templates is a relatively mature technique; its specific calculation process can be found in: Mei X, Ling H. Robust visual tracking and vehicle classification via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2011, 33(11): 2259-2272.
The embodiment of the invention provides a self-explanatory target tracking method. The method obtains candidate targets of the tracking target in the current frame; obtains, among the N_t groups of target templates T and the N_b groups of background templates B, the template with the maximum similarity to each candidate target; divides each candidate target into the category corresponding to that template, where the categories comprise a target category and at least one background category, all target templates corresponding to the target category and each group of background templates corresponding to one background category; for each candidate target in the target category, calculates the probability of the candidate becoming the target using the template with the maximum similarity to it and the background template with the minimum similarity to it, and locates the candidate with the maximum probability as the current frame target; and updates the target template T corresponding to the current frame target and updates each background template B according to the candidate targets divided into the background categories. Based on the low-rank constraint, the method mines the correlation and the clustering structure among candidate targets within the same frame. The clustering structure has good discriminability, so the tracking algorithm handles challenging scenes in target localization better, and selecting the template with the maximum similarity to each candidate target makes localization more accurate.
Meanwhile, the target template and the background template information of the candidate target are added into the positioning judgment criterion, so that the self-explanatory target tracking method can accurately position the target under various challenging tracking scenes.
An example of an experiment using a self-explanatory target tracking method of the present invention is given below.
Illustratively, the experimental conditions are as follows: a Windows XP operating system and a MATLAB 7.0 experimental platform. The test data set is an internationally mainstream data set comprising 20 video segments, each containing multiple tracking difficulties and challenges, as shown in Table 1:
TABLE 1 difficulties and challenges in testing 20 experimental videos in a data set
The marked entries in Table 1 indicate that the present invention takes the corresponding scenarios into account when conducting the experiments.
Illustratively, in this experimental example the self-explanatory target tracking algorithm uses 10 target templates and 8 background templates and generates 200 candidate targets per frame; the observation regions corresponding to the candidate targets are uniformly scaled to 20 × 20 pixels. In addition, the motion state transition model of particles in the current frame is p(Y_t | Y_{t-1}) ~ N(Y_t | Y_{t-1}, Σ) with covariance matrix Σ = diag{v_x, v_y, v_δ, v_φ}, and the parameters of the candidate-target likelihood formula are σ = 0.01, ρ = 0.01, β = 0.001.
The tracking performance of the self-explanatory target tracking algorithm is objectively evaluated by the average tracking error and by the success rate at an overlap-rate threshold of 0.5. The proposed tracking algorithm is compared with 17 other internationally mainstream tracking algorithms (Frag, OAB, IVT, SOAB, L1, VTD, MIL, Struck, ASLA, SCM, 2DPCA, CT, LRST, TLD, MTT, SRPCA and LSST). Tables 2 and 3 give, for the proposed algorithm and the 17 comparison algorithms, the tracking error and the success rate at the 0.5 overlap threshold on the 20 test videos; the best result for each video is marked in bold.
TABLE 2 average tracking error comparison (unit: pixel)
TABLE 3 comparison of success rates at an overlap threshold of 0.5 (unit:%)
Combining Table 2 and Table 3, it can be seen that, compared with the 17 other excellent tracking algorithms, the Self-Explanatory target Tracking algorithm (SET) provided by the present invention achieves the best success rate on 17 of the 20 videos; on the 3 videos car11, girl and thusl its success rate is second only to MTT. In general, the proposed SET algorithm performs well on the 20 test videos and attains a high success rate.
Example 2
An embodiment of the present invention provides a moving target tracking apparatus, as shown in fig. 3, including:
an obtaining unit 201, configured to obtain a candidate target of a tracking target in a current frame;
a calculation unit 202 for obtaining N t Group target templates T and N b The template with the maximum similarity with each candidate target in the group background template B;
a classifying unit 203, configured to classify each candidate target into a category corresponding to the template with the largest similarity to the candidate target, where the category includes a target category and at least one background category, all target templates correspond to the target category, and each group of background templates corresponds to one background category;
a positioning unit 204, configured to separately calculate, for each candidate object in the object class, a probability that the candidate object becomes an object by using the template with the largest similarity to the candidate object and the background template with the smallest similarity to the candidate object, and position the candidate object with the largest object probability as the current frame object;
an updating unit 205, configured to update the target template T corresponding to the current frame target according to the obtained current frame target, and update each background template B according to the candidate targets classified into the background category.
Optionally, the obtaining unit 201 is specifically configured to:
obtaining, by particle filtering, the candidate observation region corresponding to each of at least two particles in the frames before the current frame, the candidate observation region being represented as Y_i = {x_i, y_i, δ, φ}, where Y_i is the motion state variable of the particle in the i-th frame, x_i and y_i are the two-dimensional coordinates of the candidate observation region in the i-th frame, δ is the variance of the scale-scaling coefficient of the candidate observation region, and φ is the rotation angle parameter;
estimating the motion state Y_t of each particle in the current frame according to the motion state transition model of the current-frame particles p(Y_t | Y_{t-1}) ~ N(Y_t | Y_{t-1}, Σ) and the posterior probability distribution of the tracked target's motion state in the previous frame p(Y_t | y_{1:t-1}) = ∫ p(Y_t | Y_{t-1}) p(Y_{t-1} | y_{1:t-1}) dY_{t-1}; using the motion state Y_t of each particle in the current frame to update this distribution and obtain the updated posterior probability distribution, where Y_t denotes the motion state of each particle in the current frame, Y_{t-1} denotes the motion state of the tracked target in the previous frame, and Σ is the covariance matrix Σ = diag{v_x, v_y, v_δ, v_φ}, whose diagonal elements denote the variances of the state-variable parameters x_i, y_i, δ and φ, respectively;
determining the target position of the tracked target in the current frame from the motion state of the particle that maximizes the probability p(Y_t | c, y_{1:t-1});
according to the target position of the tracking target, cutting an image area corresponding to the target position from the current frame to form an image block set of the tracking target, wherein the image blocks are image blocks with uniform sizes;
and stretching each image block of the tracked target into a column vector to obtain the candidate targets.
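The candidate-generation steps above can be sketched as particle propagation under the Gaussian state-transition model with diagonal covariance Σ = diag{v_x, v_y, v_δ, v_φ} (the particle count matches the experiment's 200; the numeric variances below are illustrative):

```python
import numpy as np

def propagate_particles(states, variances, rng):
    # p(Y_t | Y_{t-1}) = N(Y_t | Y_{t-1}, Sigma): perturb each particle's
    # state (x, y, scale, rotation) with independent Gaussian noise whose
    # per-component variances are the diagonal of Sigma.
    noise = rng.normal(0.0, np.sqrt(variances), size=states.shape)
    return states + noise

rng = np.random.default_rng(0)
states = np.tile([50.0, 40.0, 1.0, 0.0], (200, 1))     # 200 particles
variances = np.array([4.0, 4.0, 0.01, 0.01])           # v_x, v_y, v_delta, v_phi
new_states = propagate_particles(states, variances, rng)
```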
As shown in fig. 4, optionally, the calculating unit 202 includes:
a first building block 2021 for building a building block containing N t Group target template T, candidate targets C and N of current frame b A sample matrix X of a background template B is set, the background template B comprises background images of a preset number of image sequences in a frame video sequence before the current frame, and the target template T comprises a preset number of target images extracted from a frame sequence before the current frame;
a second constructing module 2022, configured to construct a similarity matrix a of the sample matrix X;
optionally, the second building block 2022 is specifically configured to:
constructing a self-interpreting low-rank constraint model of the sample matrix X: min_Z ||Z||_* s.t. X = XZ, where X is the sample matrix and Z is the expression matrix of X;
and carrying out image matrix low-rank constraint decomposition on the sample matrix X to obtain:
min_{Z,E} ||Z||_* + λ||E||_1 s.t. X = XZ + E,
where E is the sparse error component, λ > 0 controls the weight between the low-rank property of Z and the sparse error E, and the nuclear norm ||Z||_* is the sum of the singular values of the matrix Z;
according to the formula, constructing a similarity matrix A of the sample matrix X.
Wherein, before the similarity matrix A of the sample matrix X is constructed according to the formula, the method further comprises solving the nuclear-norm minimization for the expression matrix Z:
introducing a relaxation variable into the formula min_Z ||Z||_* + λ||E||_1 s.t. X = XZ + E gives:
min ||Z_1||_* + λ||E||_1 s.t. X = XZ_2 + E, Z_2 = Z_1,
where Z_2 is the relaxation variable and Z_1 is the renamed Z;
applying the augmented Lagrange multiplier method to this formula gives:
L(Z_1, Z_2, E) = ||Z_1||_* + λ||E||_1 + <Y_1, X − XZ_2 − E> + <Y_2, Z_2 − Z_1> + (τ/2)(||X − XZ_2 − E||_F^2 + ||Z_2 − Z_1||_F^2),
where Y_1 and Y_2 are Lagrange multipliers, τ > 0 is the penalty parameter, ||·||_F is the Frobenius norm, and <·,·> is the inner product of two matrices;
the function L(Z_1, Z_2, E) is minimized by an iterative algorithm to obtain Z_1, thereby obtaining the expression matrix Z.
a computing module 2023 configured to obtain, according to the similarity matrix A, the template with the maximum similarity to each candidate target among the N_t groups of target templates T and the N_b groups of background templates B, where N_l = N_b + N_t.
Optionally, the positioning unit 204 includes:
a first calculating module 2041 configured to calculate, according to the formula, the probability L_l(c) of each candidate target becoming the target using the template with the maximum similarity to the candidate target and the background template with the minimum similarity to it:
where c is the candidate target, σ is the scale of the exponential function, ρ and β are the weight parameters of the respective terms; the formula involves the reconstructed candidate target; t is the target descriptor of candidate target c, b_k is the background descriptor of the background template with minimal similarity to the candidate target, and e_j is the j-th element of the expression error e of candidate target c;
a second calculating module 2042 configured to minimize the likelihood L_l(c) of the candidate target becoming the target over the N_θ groups of target templates used, where N_θ is the number of target template groups;
a decision module 2043 configured to determine the candidate target with the maximum probability among all candidate targets in the current frame as the current frame target, where C is the set of all candidate targets in the current frame.
Optionally, the updating unit 205 includes a target template updating module 205b and a background template updating module 205a, where the target template updating module includes:
a judgment sub-module 2051 configured to calculate the correlation between the current frame target and the l-th group of target templates;
a replacing sub-module 2052 configured to replace the target template with the minimum correlation by the current frame target when the correlation between the current frame target and the l-th group of target templates is smaller than a predefined threshold θ_k.
The embodiment of the invention provides a self-explanatory target tracking device, comprising: an acquisition unit for acquiring candidate targets of the tracking target in the current frame; a calculation unit for obtaining, among the N_t groups of target templates T and the N_b groups of background templates B, the template with the maximum similarity to each candidate target; a classification unit for dividing each candidate target into the category corresponding to the template with the maximum similarity to it; a positioning unit for calculating, for each candidate target in the target category, the probability of the candidate becoming the target using the template with the maximum similarity to it and the background template with the minimum similarity to it, and locating the candidate with the maximum probability as the current frame target; and an updating unit for updating the target template T corresponding to the current frame target and updating each background template B according to the candidate targets divided into the background categories. Based on the low-rank constraint, the device mines the correlation and the clustering structure among candidate targets within the same frame. The clustering structure has good discriminability, so the tracking algorithm handles challenging scenes in target localization better, and selecting the template with the maximum similarity to each candidate target makes localization more accurate.
Meanwhile, the target template and the background template information of the candidate target are added into the positioning judgment criterion, so that the self-explanatory target tracking method can accurately position the target under various challenging tracking scenes.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (15)

1. A self-interpreting target tracking method, comprising:
acquiring a candidate target of a tracking target in a current frame;
obtaining N t Group target templates T and N b The template with the maximum similarity with each candidate target in the group background template B;
dividing each candidate target into a category corresponding to the template with the maximum similarity to the candidate target, wherein the category comprises a target category and at least one background category, all the target templates correspond to the target category, and each group of background templates respectively corresponds to one background category;
for each candidate target in the target category, respectively calculating the possibility that the candidate target becomes the target by using the template with the maximum similarity to the candidate target and the background template with the minimum similarity to the candidate target, and positioning the candidate target with the maximum target possibility as the current frame target;
and updating the target template T corresponding to the current frame target according to the obtained current frame target, and updating the background template B according to the candidate targets divided into the background categories.
2. The method of claim 1, wherein the obtaining, among the N_t groups of target templates T and the N_b groups of background templates B, of the template with the greatest similarity to each candidate target comprises:
constructing a sample matrix X containing the N_t groups of target templates T, the candidate targets C of the current frame and the N_b groups of background templates B, the background templates B including background images of a preset number of image sequences in the video sequence before the current frame, and the target templates T including a preset number of target images extracted from the frame sequence before the current frame;
constructing a similarity matrix A of the sample matrix X;
according to the similarity matrix A, obtaining the template with the maximum similarity to each candidate target among the N_t groups of target templates T and the N_b groups of background templates B, where N_l = N_b + N_t.
3. The method of claim 2, the constructing a similarity matrix a of the sample matrix X, comprising:
constructing a self-explanatory low-rank constraint model of the sample matrix X:
min_Z ||Z||_* s.t. X = XZ, wherein X is a sample matrix and Z is an expression matrix for X;
and carrying out image matrix low-rank constraint decomposition on the sample matrix X to obtain:
min_{Z,E} ||Z||_* + λ||E||_1 s.t. X = XZ + E,
where E is the sparse error component, λ > 0 controls the weight between the low-rank property of Z and the sparse error E, and the nuclear norm ||Z||_* is the sum of the singular values of the matrix Z;
according to the formula, constructing a similarity matrix A of the sample matrix X.
4. The method of claim 3, wherein, before the similarity matrix A of the sample matrix X is constructed according to the formula, the method further comprises solving the nuclear-norm minimization for the expression matrix Z:
introducing a relaxation variable into the formula min_Z ||Z||_* + λ||E||_1 s.t. X = XZ + E gives:
wherein Z is 2 As a relaxation variable, Z 1 Is renamed Z;
to formulaUsing the enhanced lagrange multiplier method, we obtain:
wherein Y_1 and Y_2 are Lagrange multipliers, τ > 0 is a penalty parameter, ||·||_F is the Frobenius norm, and <·,·> denotes the inner product of two matrices;
obtaining a minimization function L (Z) by an iterative algorithm 1 ,Z 2 E) obtaining Z 1 Thereby obtaining the expression matrix Z.
5. Method according to claim 4, characterized in that said minimization function L (Z) is obtained by means of an iterative algorithm 1 ,Z 2 E) obtaining Z 1 The method comprises the following steps:
performing iterative calculation on L(Z_1, Z_2, E) using the soft-threshold shrinkage algorithm S_σ(x) = sign(x) · max(|x| − σ, 0) to obtain:
wherein Z_1*, Z_2* and E* are respectively the optimal solutions minimizing L(Z_1, Z_2, E); σ denotes the soft-threshold parameter; the newly calculated S is substituted into the threshold-shrinkage algorithm, and Z_1* is the product of the three matrices U, S_σ(S) and V^T so computed;
when the iterative algorithm satisfies L (Z) in the adjacent reiteration 1 ,Z 2 And E) stopping the iteration when the difference is 0.001-0.01 or the iteration time exceeds the allowed maximum value of 50-100.
6. The method of claim 1, wherein for each candidate object in the object class, the candidate object with the highest object probability is located as the current frame object by respectively calculating the probability of the candidate object being the object by using the template with the highest similarity to the candidate object and the background template with the lowest similarity to the candidate object, and the method comprises:
according to the formula, calculating, using the template with the maximum similarity to the candidate target and the background template with the minimum similarity to it, the probability L_l(c) of the candidate target becoming the target;
wherein k denotes the k-th component; c is a candidate target, σ is the scale of the exponential function, ρ and β are the weight parameters of the corresponding terms; the formula involves the reconstructed candidate target; t is the target descriptor of candidate target c, b_k is the k-th component of the background descriptor of candidate target c, and e_j is the j-th element of the expression error e of candidate target c;
according to the formula, minimizing the likelihood L_l(c) of the candidate target becoming the target, where N_θ is the number of target template groups used; Y_t denotes the motion state variable of the candidate target in the t-th frame, and p(c | Y_t) denotes the possibility that candidate target c becomes the target given the motion state variable of the t-th frame candidate; according to the formula, determining the candidate target with the maximum probability among all candidate targets in the current frame as the current frame target, where C is the set of all candidate targets in the current frame.
7. The method according to claim 1, wherein said updating the target template T corresponding to the current frame target according to the obtained current frame target comprises:
calculating the correlation between the current frame target and the l group target template;
when the correlation between the current frame target and the l group target template is less than a predefined threshold value theta k And replacing the target template with the minimum correlation by the current frame target.
8. The method of claim 1, wherein the background templates B are obtained by shifting the target region of each frame by 10%-30% of the pixels of the target region's width or height in m directions, the m directions being uniformly sampled in the range [0°, 360°), where 6 ≤ m ≤ 10.
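Claim 8's construction of the background templates can be sketched by computing the m shift vectors; the ratio and m below are single illustrative values inside the claimed 10%-30% and 6-10 ranges:

```python
import math

def background_offsets(width, height, m=8, ratio=0.2):
    # m directions uniformly sampled over [0, 360) degrees; each shift is
    # `ratio` of the target region's width (horizontally) and height
    # (vertically) along that direction.
    offsets = []
    for i in range(m):
        angle = 2.0 * math.pi * i / m
        offsets.append((ratio * width * math.cos(angle),
                        ratio * height * math.sin(angle)))
    return offsets

offs = background_offsets(40, 20)   # 8 shifted background regions
```

Cropping the target-sized window at each offset then yields one background template per direction.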
9. A self-interpreting object tracking device, comprising:
the acquisition unit is used for acquiring a candidate target of the tracking target in the current frame;
a calculation unit for obtaining N t Group target templates T and N b The template with the maximum similarity with each candidate target in the group background template B;
the classification unit is used for classifying each candidate target into a category corresponding to the template with the maximum similarity to the candidate target, wherein the category comprises a target category and at least one background category, all the target templates correspond to the target category, and each group of background templates respectively corresponds to one background category;
a positioning unit, configured to separately calculate, for each candidate object in the object class, a probability that the candidate object becomes an object using the template with the greatest similarity to the candidate object and the background template with the smallest similarity to the candidate object, and position the candidate object with the greatest probability as a current frame object;
and the updating unit is used for updating the target template T corresponding to the current frame target according to the obtained current frame target and updating each background template B according to the candidate targets divided into the background categories.
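The classification step of claim 9 (each candidate joins the class of its most-similar template) can be sketched as below; mapping one background template column per background class is a simplification of the claim's grouped templates:

```python
import numpy as np

def classify(sim, n_target):
    """Assign each candidate to the class of its most-similar template.
    `sim` has one row per candidate and one column per template;
    columns 0..n_target-1 are target templates, the rest are background
    templates, one background class per column (an assumed layout)."""
    labels = []
    for row in sim:
        j = int(np.argmax(row))              # most similar template
        labels.append('target' if j < n_target
                      else f'background{j - n_target}')
    return labels
```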
10. The apparatus of claim 9, wherein the calculation unit comprises:
a first construction module for constructing a sample matrix X containing the N_t groups of target templates T, the candidate targets C of the current frame, and the N_b groups of background templates B, wherein the background template B comprises background images of a preset number of image sequences in the video frame sequence before the current frame, and the target template T comprises a preset number of target images extracted from the frame sequence before the current frame;
a second construction module for constructing a similarity matrix A of the sample matrix X;
a computing module for obtaining, based on the similarity matrix A, the template in the N_t groups of target templates T and the N_b groups of background templates B with the maximum similarity to each candidate target, wherein N_l = N_b + N_t.
11. The apparatus according to claim 10, wherein the second construction module is specifically configured to:
construct a self-explanatory low-rank constraint model of the sample matrix X: min_Z ||Z||_*, s.t. X = XZ, wherein X is the sample matrix and Z is the representation matrix of X;
perform low-rank constraint decomposition on the sample matrix X to obtain: min_{Z,E} ||Z||_* + λ||E||_{2,1}, s.t. X = XZ + E,
wherein E is the sparse error component, λ > 0 controls the trade-off between the low-rank property of Z and the sparse error E, and the nuclear norm ||Z||_* is the sum of the singular values of the matrix Z;
and construct the similarity matrix A of the sample matrix X according to the formula A = (|Z| + |Z^T|) / 2.
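For the noise-free model min ||Z||_*, s.t. X = XZ, a known closed-form minimizer is the shape interaction matrix Z = V_r V_r^T built from the right singular vectors of X; the sketch below uses it, together with the common symmetrization A = (|Z| + |Z^T|)/2, as a stand-in for the patent's own formula, which is an image in the source and may differ:

```python
import numpy as np

def similarity_matrix(X, tol=1e-10):
    """Noise-free self-expressive low-rank model: min ||Z||_* s.t. X = XZ.
    Closed-form minimizer: Z = Vr @ Vr.T, where Vr holds the right
    singular vectors of X for nonzero singular values. The
    symmetrization into A is an assumed convention."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int((s > tol * s.max()).sum())   # numerical rank of X
    Vr = Vt[:r].T
    Z = Vr @ Vr.T                        # shape interaction matrix
    A = (np.abs(Z) + np.abs(Z.T)) / 2    # symmetric similarity matrix
    return Z, A
```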
12. The apparatus of claim 11, wherein the second construction module is further configured to compute the nuclear norm ||Z||_* of the representation matrix Z by:
introducing a relaxation variable into the formula min ||Z||_* + λ||E||_{2,1}, s.t. X = XZ + E, to obtain:
min_{Z1,Z2,E} ||Z1||_* + λ||E||_{2,1}, s.t. X = XZ2 + E, Z2 = Z1,
wherein Z2 is the relaxation variable and Z1 is the renamed Z;
applying the augmented Lagrange multiplier method to the above formula to obtain:
L(Z1, Z2, E) = ||Z1||_* + λ||E||_{2,1} + <Y1, X - XZ2 - E> + <Y2, Z2 - Z1> + (τ/2)(||X - XZ2 - E||_F^2 + ||Z2 - Z1||_F^2),
wherein Y1 and Y2 are Lagrange multipliers, τ > 0 is a penalty parameter, ||·||_F is the Frobenius norm, and <·,·> denotes the inner product of two matrices;
and obtaining Z1 by minimizing the function L(Z1, Z2, E) through an iterative algorithm, thereby obtaining the representation matrix Z.
13. The apparatus of claim 12, wherein obtaining Z1 by minimizing the function L(Z1, Z2, E) through an iterative algorithm comprises:
iteratively computing L(Z1, Z2, E) using the threshold shrinkage algorithm S_σ(x) = sgn(x)·max(|x| - σ, 0) to obtain:
Z1* = U S_σ(Σ) V^T,
wherein Z1*, Z2* and E* are respectively the optimal solutions minimizing L(Z1, Z2, E), σ denotes the soft threshold parameter, the newly calculated matrix S is substituted into the threshold shrinkage algorithm, and Z1* is the product of the three matrices U, S_σ(Σ) and V^T obtained from the singular value decomposition of S;
stopping the iteration when the difference of L(Z1, Z2, E) between adjacent iterations is less than a threshold in the range 0.001-0.01, or when the number of iterations exceeds the allowed maximum of 50-100.
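The threshold shrinkage step of claim 13 is the singular value thresholding operator, the proximal operator of the nuclear norm; a sketch with an assumed threshold σ:

```python
import numpy as np

def soft_threshold(x, sigma):
    """Elementwise shrinkage S_sigma(x) = sgn(x) * max(|x| - sigma, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - sigma, 0.0)

def svt(M, sigma):
    """Singular value thresholding: shrink the singular values of M and
    rebuild U @ S_sigma(Sigma) @ V^T, as in the Z1 update of claim 13."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(soft_threshold(s, sigma)) @ Vt
```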
14. The apparatus of claim 9, wherein the positioning unit comprises:
a first calculation module for respectively calculating the probability L_l(c) that each candidate target becomes the target, using the template with the greatest similarity to the candidate target and the background template with the smallest similarity to the candidate target,
wherein k denotes the k-th component, c is a candidate target, σ is the scale of the exponential function, ρ and β are the weight parameters of the corresponding terms, ĉ is the reconstructed candidate target, t is the target descriptor of the candidate target c, b_k is the k-th component of the background descriptor of the candidate target c, and e_j is the j-th element of the representation error e of the candidate target c;
a second calculation module for minimizing the likelihood L_l(c) of the candidate target becoming the target, wherein N_θ is the number of target template groups used, Y_t is the motion state variable of the t-th frame candidate target, and p(c|Y_t) represents the likelihood that the candidate target c becomes the target given the motion state of the candidate target in the t-th frame;
a determination module for determining the current frame target according to the formula c* = argmax_{c∈C} p(c|Y_t), i.e. the candidate target with the maximum probability among all candidate targets in the current frame, wherein C is the set of all candidate targets in the current frame.
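The determination module of claim 14 reduces to an argmax over candidate scores; the sketch below takes the likelihood as an opaque callable, because the patent's own L_l(c) formula is an image that is not reproduced in this text:

```python
import numpy as np

def pick_target(candidates, likelihood):
    """Return the candidate maximizing the likelihood, c* = argmax L(c),
    together with its score. `likelihood` is any scoring callable; the
    patent's specific L_l(c) is not reproduced here."""
    scores = [likelihood(c) for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best], max(scores)
```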
15. The apparatus of claim 9, wherein the updating unit comprises a target template updating module and a background template updating module, and the target template updating module comprises:
a judging submodule for calculating the correlation between the current frame target and the l-th group of target templates;
a replacing submodule for replacing the target template with the minimum correlation by the current frame target when the correlation between the current frame target and the l-th group of target templates is less than a predefined threshold θ_k.
CN201510502036.4A 2015-08-14 2015-08-14 A kind of self-explanatory method for tracking target and device Active CN105139422B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510502036.4A CN105139422B (en) 2015-08-14 2015-08-14 A kind of self-explanatory method for tracking target and device


Publications (2)

Publication Number Publication Date
CN105139422A CN105139422A (en) 2015-12-09
CN105139422B true CN105139422B (en) 2018-05-18

Family

ID=54724754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510502036.4A Active CN105139422B (en) 2015-08-14 2015-08-14 A kind of self-explanatory method for tracking target and device

Country Status (1)

Country Link
CN (1) CN105139422B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105513093B (en) * 2015-12-10 2018-02-16 电子科技大学 A kind of method for tracking target represented based on low-rank matrix
CN109002802B (en) * 2018-07-23 2021-06-15 武汉科技大学 Video foreground separation method and system based on adaptive robust principal component analysis
CN112578351A (en) * 2020-02-28 2021-03-30 加特兰微电子科技(上海)有限公司 Target matching method, integrated circuit, radio device and apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103024349A (en) * 2012-11-13 2013-04-03 大连民族学院 MLE (Maximum Likelihood Estimation) video target tracking method based on sparse limitation
CN103295242A (en) * 2013-06-18 2013-09-11 南京信息工程大学 Multi-feature united sparse represented target tracking method
CN103413143A (en) * 2013-07-29 2013-11-27 西北工业大学 Video target tracking method based on dynamic sparse projection
CN104200226A (en) * 2014-09-01 2014-12-10 西安电子科技大学 Particle filtering target tracking method based on machine learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7627171B2 (en) * 2003-07-03 2009-12-01 Videoiq, Inc. Methods and systems for detecting objects of interest in spatio-temporal signals
US7848566B2 (en) * 2004-10-22 2010-12-07 Carnegie Mellon University Object recognizer and detector for two-dimensional images using bayesian network based classifier




Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant