CN111429481B

CN111429481B - Target tracking method, device and terminal based on adaptive expression

Info

Publication number: CN111429481B
Application number: CN202010192778.2A
Authority: CN
Inventors: 米建勋; 李人杰
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Dragon Totem Technology Hefei Co ltd
Priority date: 2020-03-18
Filing date: 2020-03-18
Publication date: 2022-08-05
Anticipated expiration: 2040-03-18
Also published as: CN111429481A

Abstract

The invention discloses a target tracking method, a target tracking device and a target tracking terminal based on self-adaptive expression, and discloses a target tracking method and a target tracking device based on self-adaptive expression, which are used for tracking a specified target in a video and realizing accurate tracking of the video target in a motion state. The method comprises the following steps: acquiring a current video frame; checking whether the template exists; initializing a tracking target template; generating a candidate sample; calculating an adaptive expression; tracking by expression; checking whether the tracking task is completed; and updating the tracking target template. Compared with the prior art, the method uses the self-adaptive expression model, namely the sparse expression can be subjected to self-adaptive change according to the structural relationship between candidate samples, and the negative template is introduced to improve the model discrimination, so that the target tracking method based on the automatic expression has strong robustness and high accuracy; the method is particularly suitable for generating effective accurate tracking in an intelligent auxiliary driving system of an automobile.

Description

Target tracking method, device and terminal based on adaptive expression

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a target tracking method, a target tracking device and a target tracking terminal based on adaptive expression.

Background

With the development of artificial intelligence technology, visual tracking plays an increasingly important role, and application scenes such as video monitoring, human-computer interaction, behavior recognition, action analysis and the like are wide and are applied to the visual tracking technology. Although researchers today propose many different tracking algorithms, the tracking problem still faces many challenges that have not been fully solved, such as fast motion of the target, occlusion, changes in the target's dimensions, pose and illumination, which all require further research and learning.

Currently, sparse representation algorithms are successfully applied to the tracking problem. Compared with other tracking methods, the sparse tracking method can well process the situation when the tracked target is shielded, and has higher robustness. The traditional sparse tracking method takes templates which can be dynamically updated in the tracking process as dictionaries, and sparsely expresses candidate samples on the dictionaries. After learning the sparse representation of these candidate samples, the candidate sample with the smallest reconstruction error is selected as the tracked target.

However, the current sparse tracking method still has certain problems, and the current sparse tracking method ignores the relation between candidate samples. Because of the strategy of densely sampling the candidate samples, some of the candidate samples may be very similar, with a low rank property between the candidate samples. The existing sparse tracking method only explores and utilizes the relationship information between templates and ignores the relationship between candidate samples.

Disclosure of Invention

The present invention provides a target tracking method, apparatus and terminal based on adaptive expression to solve the above problems, and the present invention explores and utilizes the structural relationship between candidate samples for tracking the specified target in the video, thereby implementing accurate tracking of the video target in motion state.

The invention is realized by the following technical scheme:

in a first aspect, the present invention provides a target tracking method based on adaptive expression, including the following steps:

s10: acquiring a current video frame;

s11: checking whether the template exists, if not, executing S12 to initialize and track the target template; if the template already exists, executing S13 to generate a candidate sample in the current image frame;

s12: initializing a tracking target template: determining the position of a tracking target in a first frame image of a video through setting or manual designation, selecting a positive template and a negative template near the tracking target, and constructing a template set consisting of the positive template and the negative template, wherein each template is converted into a vector form in advance; the positive template in the invention is used for modeling the information contained in the tracking target, and the negative template in the invention is used for modeling the background information; by using the positive template and the negative template, when the tracked target is mixed with the background, the target can be identified by the target tracking method based on the adaptive expression, so that the target can be tracked;

s13: generating a candidate sample: selecting candidate samples in the current image frame according to the target position tracked in the previous image frame, and constructing a candidate sample set consisting of the candidate samples, wherein each candidate sample is converted into a vector form in advance;

s14: calculating an adaptive expression: taking a matrix formed by the candidate sample set obtained in the step S13 as a dictionary, taking a template as a target of sparse expression, and using a trace lasso as a regular term to calculate to obtain a matrix formed by self-adaptive expression, namely a self-adaptive expression matrix;

s15: tracking by expression: constructing a weight matrix, carrying out point multiplication on the weight matrix and the self-adaptive expression matrix obtained in the step S14 to obtain a re-weighted expression matrix, wherein each element in the re-weighted expression matrix represents separable characteristics of corresponding candidate samples, and calculating the identification score of each candidate sample through the separable characteristics, wherein the candidate sample with the highest identification score is the tracked target;

s16: whether the tracking task is completed or not is checked, and if the current image frame is the last frame of the video, the target tracking task is completed; if the current image frame is not the last frame of the video, executing S17 to update the tracking target template, and continuing the target tracking task;

s17: updating a tracking target template: the target template is updated using the result tracked in the current image frame, and execution returns to S10.

The working principle is as follows: based on video moving target tracking, the sparse tracking method in the prior art has certain problems, only the relation information between templates is explored and utilized, and the relation between candidate samples is ignored; this is because with the strategy of dense sampling of candidate samples, some of the candidate samples may be very similar, with a low rank property between the candidate samples. The invention adopts the scheme to combine the candidate sample with the template, uses the candidate sample matrix as a dictionary and the template as a regression target, thereby effectively reducing the calculation burden of the algorithm; self-adaptive expression is introduced, the low-rank structure of a candidate sample matrix can be explored through the self-adaptive expression, and the structural relation among candidate samples is effectively utilized; and negative templates are used as expression targets, so that background information is effectively utilized. Compared with the prior art, the method can explore and utilize the relation between the candidate samples, and has the characteristics of high accuracy and strong robustness.

The method provided by the invention is particularly suitable for an intelligent auxiliary driving system of an automobile and can generate an effective tracking effect, and the road information in front is acquired in real time in an image mode through the vehicle-mounted camera, so that the real-time accurate tracking of the front vehicle target is realized.

Further, the step S12 specifically includes:

s120: determining an initial position of a tracking target in a first frame image frame of a video through set or manually designated coordinates; in a video tracking task, the initial position of a tracking target is generally marked manually according to the specific properties of the task, or the initial position of the tracking target is directly set.

S121: selecting a specified number of front templates to construct a front template set T near the position of a tracking target in a first frame image _p o _s ＝[t ₁ ,t ₂ ,…,t _p ]And the front template meets the following condition, | | Q _p -Q ₀ ||<σ, wherein Q _p Representing the center of the front template, Q ₀ Represents the center of the initial target, σ is a specified inner threshold; since the front template is an image block, for the convenience of calculation, the ith front template is converted into the form t of a vector _i . In order to enable the front template to model information contained in the tracked target, the front template needs to be selected near the tracked target, so that the front template is similar to the tracked target, the information of the tracked target is reserved, the front template is slightly different from the tracked target, and the robustness of the method is improved.

Selecting a finger near the position of the tracking target in the first frame imageA certain number of negative templates form a negative template set T _neg ＝[t _p+1 ,t _p+2 ,…,t _p+n ]And the negative template satisfies the following conditions,

wherein Q _n Represents the center of the negative template,

is a specified outer threshold; since the negative template is an image block, the jth negative template is converted into the form t of a vector for ease of computation _j . In order to make the negative template model the background information, the negative template is selected in a circular area with the tracking target as the center, so that the negative template has a larger difference with the tracking target and naturally introduces the background information.

Constructing a template set T ═ T composed of positive templates and negative templates _pos ,T _neg ]＝[t ₁ ,t ₂ ,…,t _p+n ]This matrix includes all templates. The initial template set T contains tracking target information and background information.

Further, the step S13 specifically includes:

s130: obtaining a state variable z corresponding to the target of the previous frame according to the target position tracked in the previous frame of image _t-1 (ii) a State variable z _t-1 Contains six mapping parameters: x-coordinate, y-coordinate, rotation angle, size, aspect ratio, tilt direction, and these six mapping parameters may locate the target position in the video image.

S131: state variable z of candidate sample according to motion model _t Sampling, i.e. with probability distribution

Sampling the state variables, wherein Σ is a diagonal covariance matrix; it is difficult to sample the candidate sample directly in the current frame image, so an indirect way is selected, namely the state variable z of the candidate sample _t To carry outAnd sampling, namely realizing the sampling of the candidate sample in the image. The motion change of the tracking target is simplified into the change of the state variable corresponding to the tracking target, the generality is not lost, and the change of the state variable is modeled by using Gaussian distribution. The expectation of the Gaussian distribution is the state variable z corresponding to the target of the previous frame _t-1 The diagonal covariance matrix is determined by the specific properties of the tracking data, and the diagonal covariance matrix is different for different tracking data.

S132: from the sampled state variable z _t Candidate samples are selected in the current image frame, and a candidate sample set C is constructed. The sampled state variables are used to determine the positions of the candidate samples, and the candidate samples are converted into a vector form, which then constitutes a candidate sample matrix C.

Further, in step S14, the following formula (1) is used to calculate the adaptive expression matrix X corresponding to the template set T:

wherein x is _i Is corresponding to the template t _i The adaptive expression matrix X is a set of adaptive expressions, i.e. X ═ X ₁ ,x ₂ ,…,x _p+n ]，Diag(x _* ) The operator expresses x adaptively _i Conversion to diagonal matrix Diag (x) _i ) C is a video candidate sample set, lambda is a regular term coefficient, p is the number of positive templates, and n is the number of negative templates;

in step S14, the above equation (1) is solved using an iterative reweighting algorithm.

Further, the step S15 specifically includes:

s150: constructing a weight matrix W, wherein the weight matrix is constructed by the formula (2):

W _i,j ＝exp(-||C _i -T _j || ₂ ) (2)

wherein, C _i Is the ith candidate sample and T _j Is the jth template, W _i,j Is the value of the ith row and jth column of the weight matrix WAlso represents the similarity between the ith candidate sample and the jth template;

s151: performing dot multiplication on the weight matrix W and the adaptive expression matrix X to obtain a re-weighted expression matrix M, i.e. M ═ W ^ X, recording the re-weighted expression matrix as an expression map M, and dividing the expression map M into two parts { M ^ M _pos ,M _neg }；

S152: calculating a separability score sc for each candidate sample _i The separability score is calculated by formula (3),

sc _i ＝∑M _i,pos -∑M _i,neg (3)

wherein M is _i,pos Is a matrix M _pos Row i of (1), representing the separable features of the ith candidate sample on the front template set, M _i,neg Is a matrix M _neg Line i of (1), represents the separable features of the ith candidate sample on the negative template set, sc _i Is the separability score of the ith candidate sample;

s153: the higher the identification score is, the more similar the candidate sample is to the tracking target, so that the candidate sample with the highest separability score is selected as the target tracked by the current image frame.

Further, the step S17 specifically includes:

s170: calculating the similarity vector eta ═ eta (eta) ₁ ,η ₂ ,…,η _p ) Wherein eta _i Similarity between the current tracking result and the ith front template based on Euclidean distance;

s171: calculating the maximum similarity omega as max (eta), comparing the maximum similarity omega with an updating threshold theta, and updating the front template corresponding to the maximum similarity by using the current tracking result if the maximum similarity omega is greater than the updating threshold theta;

s172: selecting a specified number of negative templates near the current tracking result as a new negative template set T _neg And the negative template satisfies the following conditions,

where a and

respectively designated inner and outer thresholds, Q _n And Q _k Representing the center of the negative template and the center of the current trace result, respectively.

Furthermore, the method is suitable for an intelligent auxiliary driving system of an automobile, and road information in front is acquired in real time in an image mode through the vehicle-mounted camera, so that real-time tracking of a front vehicle target is realized.

In a second aspect, the present invention provides an adaptive representation-based target tracking apparatus, including:

the acquisition module is used for acquiring the current image frame of the video;

the first judging module is used for judging whether the template exists or not;

the device comprises an initialization setting module, a tracking task starting module and a tracking task stopping module, wherein the initialization setting module is used for setting an initial position of a tracking target in a video image, and corresponding parameters include but are not limited to a state variable variance and a front template updating threshold value and starting a tracking task;

the tracking target template building module is used for building an initial positive template and a negative template of the tracking target near the initial position of the specified tracking target;

the tracking target sample generation module is used for randomly sampling the current image frame of the tracking task to generate a sample of a tracking target;

the adaptive expression calculation module is used for calculating the adaptive expression of the template on the candidate sample;

the target tracking module is used for obtaining self-adaptive expression through calculation and tracking the target of the current frame;

the second judgment module is used for judging whether the tracking task is completed or not;

and the tracking target template updating module is used for updating the positive template and the negative template of the tracking target according to the current tracking result in the tracking process.

Further, the tracking target template building module is specifically configured to:

selecting a specified number of front templates to construct a front template near the position of the tracking target in the first frame imageSet of surface templates T _p o _s ＝[t ₁ ,t ₂ ,…,t _p ]And the front template meets the following condition, | | Q _p -Q ₀ ||<σ, wherein Q _p Representing the center of the front template, Q ₀ Represents the center of the initial target, σ is a specified inner threshold;

selecting a specified number of negative templates to form a negative template set T near the position of the tracking target in the first frame image _neg ＝[t _p+1 ,t _p+2 ,…,t _p+n ]And the negative template satisfies the following conditions,

wherein Q _n Represents the center of the negative template,

is a specified outer threshold;

constructing a template set T ═ T composed of positive templates and negative templates _p o _s ,T _neg ]＝[t ₁ ,t ₂ ,…,t _p+n ]This matrix includes all templates.

Further, the tracking target sample generation module is specifically configured to:

obtaining a state variable z corresponding to the target of the previous frame according to the target position tracked in the previous frame of image _t-1 ；

State variable z of candidate sample according to motion model _t Sampling, i.e. with probability distribution

Sampling the state variables, wherein Σ is a diagonal covariance matrix;

from the sampled state variable z _t Candidate samples are selected in the current image frame, and a candidate sample set C is constructed.

Further, the adaptive expression calculation module is specifically configured to:

calculating an adaptive expression matrix X corresponding to the template set T by adopting the following formula (1):

the above equation (1) is solved using an iterative reweighting algorithm.

Further, the target tracking module is specifically configured to:

constructing a weight matrix W, wherein the weight matrix is constructed by the formula (2):

W _i,j ＝exp(-||C _i -T _j || ₂ ) (2)

wherein, C _i Is the ith candidate sample and T _j Is the jth template, W _i,j Is the value of the ith row and the jth column of the weight matrix W, and simultaneously represents the similarity between the ith candidate sample and the jth template;

performing dot multiplication on the weight matrix W and the adaptive expression matrix X to obtain a re-weighted expression matrix M, i.e. M ═ W ^ X, recording the re-weighted expression matrix as an expression map M, and dividing the expression map M into two parts { M ^ M _pos ,M _neg }；

Calculating a separability score sc for each candidate sample _i The separability score is calculated by formula (3),

sc _i ＝∑M _i,pos -∑M _i,neg (3)

wherein M is _i,pos Is a matrix M _pos Row i of (1), representing the separable features of the ith candidate sample on the front template set, M _i,neg Is a matrix M _neg Line i of (1), represents the separable features of the ith candidate sample on the negative template set, sc _i Is the ithA separability score of the candidate sample;

the higher the identification score is, the more similar the candidate sample is to the tracking target, so the candidate sample with the highest separability score is selected as the target tracked by the current image frame.

Further, the tracking target template updating module is specifically configured to:

calculating the similarity vector eta ═ eta (eta) ₁ ,η ₂ ,…,η _p ) Wherein eta _i Similarity between the current tracking result and the ith front template based on Euclidean distance;

calculating the maximum similarity omega as max (eta), comparing the maximum similarity omega with an updating threshold theta, and updating the front template corresponding to the maximum similarity by using the current tracking result if the maximum similarity omega is greater than the updating threshold theta;

selecting a specified number of negative templates near the current tracking result as a new negative template set T _neg And the negative template satisfies the following conditions,

where a and

In a third aspect, the present invention provides an adaptive representation-based target tracking terminal, including: a processor and a memory, wherein the memory has stored therein program code, which when executed by the processor, causes the processor to perform the steps of the above-described adaptive expression-based object tracking method.

In a fourth aspect, the present invention also provides a computer storage medium having a computer program stored thereon, which when executed by a processor, performs the steps of the above-described adaptive expression-based object tracking method.

Compared with the prior art, the invention has the following advantages and beneficial effects:

1. according to the invention, the candidate samples are combined with the template, the candidate sample matrix is used as a dictionary, and the template is used as a regression target, so that the calculation burden of the algorithm can be effectively reduced; self-adaptive expression is introduced, the low-rank structure of a candidate sample matrix can be explored through the self-adaptive expression, and the structural relation among candidate samples is effectively utilized; and negative templates are used as expression targets, so that background information is effectively utilized. Compared with the prior art, the method can explore and utilize the relation between candidate samples, and has the characteristics of high accuracy and strong robustness.

2. The method is particularly suitable for an intelligent auxiliary driving system of an automobile and can generate an effective tracking effect, and the road information in front is acquired in real time in an image mode through the vehicle-mounted camera, so that the real-time accurate tracking of the front vehicle target is realized.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:

FIG. 1 is a flowchart of a target tracking method based on adaptive expression according to an embodiment of the present invention;

fig. 2 is a schematic diagram of a target tracking terminal based on adaptive expression according to an embodiment of the present invention;

fig. 3 is a schematic diagram of another adaptive expression-based target tracking apparatus according to an embodiment of the present invention.

Detailed Description

To make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

Example 1

As shown in fig. 1 to 3, fig. 1 is a flow chart of a target tracking method based on adaptive expression, the target tracking method based on adaptive expression of the present invention is applied to an intelligent auxiliary driving system of an automobile for tracking, and road information in front is acquired in real time in an image manner by a vehicle-mounted camera, so that a front vehicle target is accurately tracked in real time; the method comprises the following steps:

s10: acquiring a current video frame;

the invention can acquire the current image frame through the video stream or the camera.

s12: initializing a tracking target template: determining the position of a tracked vehicle target in a first frame image of a video through setting or manual designation, selecting a positive template and a negative template near the tracked vehicle target, and constructing a template set consisting of the positive template and the negative template, wherein each template is converted into a vector form in advance;

the positive template in the invention is used for modeling information contained in a tracked vehicle target, and the negative template in the invention is used for modeling background information. By using the positive template and the negative template, when the situation that the tracked vehicle target is mixed with the background occurs, the target tracking method based on the self-adaptive expression can also identify the target, and the tracking of the vehicle target is realized.

Specifically, the step S12 specifically includes:

S121: selecting a specified number of front templates to construct a front template set T near the position of a tracking target in a first frame image _p o _s ＝[t ₁ ,t ₂ ,…,t _p ]And the front template meets the following condition, | | Q _p -Q ₀ ||<σ, wherein Q _p Representing the center of the front template, Q ₀ Represents the center of the initial target, σ is a specified inner threshold; since the front template is an image block, the ith front template is converted into the form t of a vector for calculation _i . In order to enable the front template to model information contained in the tracked target, the front template needs to be selected near the tracked target, so that the front template is similar to the tracked target, the information of the tracked target is reserved, the front template is slightly different from the tracked target, and the robustness of the method is improved.

wherein Q _n Represents the center of the negative template,

Constructing a template set T ═ T composed of positive templates and negative templates _p o _s ,T _neg ]＝[t ₁ ,t ₂ ,…,t _p+n ]This matrix includes all templates. The initial template set T contains tracking target information and background information.

S13: generating a candidate sample: selecting candidate samples in the current image frame according to the target position tracked in the previous image frame, and constructing a candidate sample set consisting of the candidate samples, wherein each candidate sample is converted into a vector form in advance; since the motion of the tracking target in the video usually has continuity, the position of the tracking target in the current frame has a great relationship with the position of the tracking target in the previous frame. To simplify the model, the present invention models the motion model using a Gaussian distribution.

Specifically, the step S13 specifically includes:

Sampling the state variables, wherein Σ is a diagonal covariance matrix; it is difficult to sample the candidate sample directly in the current frame image, so an indirect way is selected, namely the state variable z of the candidate sample _t And sampling is carried out, and candidate samples are sampled in the image. The motion change of the tracking target is simplified into the change of the state variable corresponding to the tracking target, the generality is not lost, and the change of the state variable is modeled by using Gaussian distribution. The expectation of the Gaussian distribution is the state variable z corresponding to the target of the previous frame _t-1 The diagonal covariance matrix is determined by the specific properties of the tracking data, and the diagonal covariance matrix is different for different tracking data.

S14: calculation adaptive tableTo achieve: taking a matrix formed by the candidate sample set obtained in the step S13 as a dictionary, taking a template as a target of sparse expression, and using a trace lasso as a regular term to calculate to obtain a matrix formed by self-adaptive expression, namely a self-adaptive expression matrix; wherein the values of the adaptive expression matrix elements represent the relationship between the template and the corresponding candidate samples. For example, X _i,j The relationship between the jth template and the ith candidate sample is shown. In general, the larger the matrix element value, the stronger the relationship between a sample and the corresponding candidate sample.

Specifically, in step S14, the following formula (1) is used to calculate the adaptive expression matrix X corresponding to the template set T:

when the candidate samples are each different, which means that the candidate sample matrix C is column-full-rank, the regularization is decomposed into

When there is a high correlation between the candidate samples, this means that the rank of the sample matrix C is 1, and C ^T C＝11 ^T (1 is a vector with element sizes of one), the regularization becomes

The candidate sample matrix C is not full rank, as studies have shown, it typically has a rank number greater than 1. In summary, the regularization term is/ ₁ Norm and l ₂ A balance between norms, i.e. | | x _i || ₂ ＜||CDiag(x _i )|| _* ＜||x _i || ₁ And dynamic adjustment can be carried out according to the internal structure of the matrix sample C, so as to obtain a sparse expression x adaptive to the structure relation of the candidate samples _i 。

In step S14, the above equation (1) is solved using an iterative reweighting algorithm. For a clearer explanation of the solution, the solution idea of equation (1) is explained in this embodiment:

known formula (4):

wherein X represents a matrix in a general sense, | | X | | Y _* Representing the nuclear norm of a matrix X, X ^T Represents the transpose of the matrix X, (XX) ^T ) ^-1/4 Representative matrix XX ^T The root of the fourth square of the root,

representative matrix (XX) ^T ) ^-1/4 The square of the frobenius norm of X;

first, equation (4) is used, and the jth column of the expression matrix X is set to X _.j Converting equation (1) to equation (5):

wherein X is an expression matrix, T is a template set, C is a video candidate sample set, lambda is a regular term coefficient, p is the number of positive templates, n is the number of negative templates, X is the number of positive templates _.j Represents the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template, Diag (X) _.* ) The operator expresses X adaptively _.j Conversion to diagonal matrix Diag (X) _.j )；

Initializing the weight matrix W _j ：

Where C is the set of video candidate samples, X _.j Represents the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template, Diag (X) _.* ) The operator expresses X adaptively _.j Conversion to diagonal matrix Diag (X) _.j )；

A simple form of equation (5) can then be obtained:

wherein X is an expression matrix, T is a template set, C is a video candidate sample set, lambda is a regular term coefficient, p is the number of positive templates, n is the number of negative templates, X is the number of positive templates _.j Is the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template,

is corresponding to the adaptive expression X _.j Weight matrix of (2), Diag (X) _.* ) The operator expresses X adaptively _.j Conversion to diagonal matrix Diag (X) _.j )；

It is obvious that equation (7) can be written as a sum equation:

m＝height(X)

wherein X is an expression matrix, T is a template set, T _.j Is the jth column of the template set matrix T, C is the video candidate sample set, λ is the regular term coefficient, p is the number of positive templates, n is the number of negative templates, and m is that of the matrix XNumber of lines, X _.j Is the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template, X _i,j Represents the value of the ith row and the jth column of the matrix X,

is corresponding to the adaptive expression X _.j The weight matrix of (a) is determined,

representing a weight matrix

Column i of (1), operator expresses X adaptively _.j Conversion to diagonal matrix Diag (X) _.j )；

The formula (9) is derived, and the analytical solution of the formula (9) can be obtained if the derivative is zero, wherein the derivative of the formula (9) is as follows:

wherein T is a template set, T _.j Is the jth column of the template set matrix T, C is the video candidate sample set, λ is the regular term coefficient,

is an expression matrix

Corresponding to the adaptive representation of the jth template,

is corresponding to an adaptive expression

The weight matrix of (a) is determined,

representing a weight matrix

The (c) th column of (a),

can be made correspondingly

Calculating to obtain;

and finally solving the formula (1) by using an iterative reweighting method. The solution process repeats the following iterative process until the results converge:

1) given a

Each column of matrix X is updated by:

wherein T is a template set, T _.j Is the jth column of the template set matrix T, C is the set of video candidate samples,

is an expression matrix

Corresponding to the adaptive representation of the jth template,

can be made correspondingly

Calculating to obtain;

2) given a

Updating each weight matrix by

Where C is the set of video candidate samples,

is an expression matrix

Corresponding to the adaptive expression of the jth template, Diag (X) _.* ) Operators express themselves adaptively

Conversion into diagonal matrices

S15: tracking by expression: constructing a weight matrix, performing dot multiplication on the weight matrix and the adaptive expression matrix obtained in S14 to obtain a re-weighted expression matrix, wherein each element in the re-weighted expression matrix represents separable characteristics of a corresponding candidate sample, calculating the identification score of each candidate sample through the separable characteristics, and the candidate sample with the highest identification score is the tracked target; the weight matrix W represents the similarity relation between the template and the candidate sample, and the tracking result can be more robust and accurate by introducing the weight matrix W.

Specifically, the step S15 specifically includes:

W _i,j ＝exp(-||C _i -T _j || ₂ ) (2)

wherein, C _i Is the ith candidate sample and T _j Is the jth template; w _i,j Is the value of the ith row and the jth column of the weight matrix W, and simultaneously represents the similarity between the ith candidate sample and the jth template; elements of the weight matrix reflect the similarity relation between the candidate sample and the template, and the tracking error is reduced by introducing the weight matrix.

S151: performing dot multiplication on the weight matrix W and the adaptive expression matrix X to obtain a re-weighted expression matrix M, i.e. M ═ W ^ X, recording the re-weighted expression matrix as an expression map M, and dividing the expression map M into two parts { M ^ M _pos ,M _neg }; just as the template set is divided into a positive template set and a negative template set according to the labels of the templates themselves, the expression map M is also divided into a positive expression map M according to the corresponding template label of each column _p o _s And negative expression map M _neg 。

sc _i ＝∑M _i,pos -∑M _i,neg (3)

wherein M is _i,p o _s Is a matrix M _p o _s Row i of (1), representing the separable features of the ith candidate sample on the front template set, M _i,neg Is a matrix M _neg Line i of (1), represents the separable features of the ith candidate sample on the negative template set, sc _i Is the separability score of the ith candidate sample; a larger positive eigenvalue means that the candidate sample is more similar to the positive template, whereas a larger negative eigenvalue means that the candidate sample is more similar to the negative template.

S153: the higher the identification score is, the more similar the candidate sample is to the tracking target, so the candidate sample with the highest separability score is selected as the target tracked by the current image frame. If a candidate sample has a larger positive characteristic value and a smaller negative characteristic value, the candidate sample is more similar to the positive template and less similar to the negative template, and is more likely to be a tracked target, so that the candidate target with the highest identification score is selected as the tracked target.

s17: updating a tracking target template: updating the target template using the result tracked in the current image frame and returning to execute S10; because a new image frame needs to be read in the camera or in the video, the operation of reading the current image frame needs to be executed once each time the method flow is executed, after the target template is updated, the step to be executed is to read the new image frame, and after S17, the step returns to execute S10; in an actual scene, the scale, posture and the like of a tracking target can change frequently, and if the tracking task is not stopped, the target template needs to be updated. The updated target template can contain the latest information of the tracking target, so that the following tracking task is more accurate.

Specifically, the step S17 specifically includes:

s170: calculating the similarity vector eta ═ eta (eta) ₁ ,η ₂ ,…,η _p ) Wherein eta _i Similarity between the current tracking result and the ith front template based on Euclidean distance; if the tracked target is shielded, which indicates that the tracked target is greatly different from the front template, the template cannot be updated by the currently tracked target. In order to determine whether to update the front template, the similarity between the current tracking result and the front template is calculated.

S171: calculating the maximum similarity omega to max (eta), comparing the maximum similarity omega with an updating threshold theta, and if the maximum similarity omega is greater than the updating threshold theta, updating the front template corresponding to the maximum similarity by using the current tracking result; if the tracked target is shielded, the maximum similarity is smaller than the updating threshold, so that the condition that the template is updated by the shielded target is avoided, and the accuracy is reduced; if the tracked target is not shielded, the maximum similarity is larger than the updating threshold, and the tracked result is used for updating the front template, so that the front template can contain the latest target information, and the accuracy of the next tracking task is improved.

where a and

respectively designated inner and outer thresholds, Q _n And Q _k Representing the center of the negative template and the center of the current trace result, respectively. In an actual scene, the background in the video often changes rapidly, so that the template needs to be updated by the background of the current frame, the negative template can contain the latest background information, and the accuracy of the following tracking task is improved.

In the embodiment, the running vehicle target can be accurately tracked under the conditions of dynamic blurring, illumination change, rotation, background confusion and the like of the running vehicle tracking target. In order to verify the method provided by the invention, 8 challenging image sequence evaluation method performances are selected in experimental simulation, and 12 advanced tracker methods are compared, wherein the method comprises the following steps: 1) IVT; 2) APGL 1; 3) PN; 4) TD; 5) MIL; 6) FragTrack; 7) MTT; 8) OSPT; 9) ASLAS; 10) LSAT; 11) an SCM; 12) and (4) DSSM. The following experiments are all based on Matlab 2019a, the computer memory is 8GB, and the CPU is

Core ^TM i5 on a platform.

The experiment of the embodiment of the invention adopts two evaluation criteria: center point error and overlap ratio.

Center point error in image frame refers to tracking bounding box B _T The central point of (A) and the corresponding labeled real boundary frame B _G Euclidean distance between the center points. And mean center point error refers to the average of the center point error of the tracker over the entire image sequence. The overlap ratio is calculated according to the formula provided by the PASCALVOC challenge, i.e.

And the average overlap ratio refers to the average of the overlap ratio of the tracker over the entire image sequence. A good tracker will not only have a small centroid error but also a large overlap ratio.

In the simulation experiment, the parameters were set as follows: the regularization parameter λ is set to 5, the sizes of the templates and the candidate samples are both adjusted to 20 × 20, the number of candidate samples is set to 500 to 600 according to the specific form of the test video image sequence, the positive template update threshold is set to 0.4, and the numbers of positive templates and negative templates are both set to 10. The regular parameter lambda, the number of the positive templates and the number of the negative templates are obtained by a cross validation method.

Table 1 shows the comparison of performance of various methods based on mean center point error, where Average represents the Average of all mean center point errors, and the smaller the mean center point error, the better the tracking performance; wherein Ours is the process of the invention.

TABLE 1 comparison of Performance of various methods based on mean center Point error

Table 2 shows a comparison of the performance of various methods based on Average overlap rate, where Average represents the Average of the Average overlap rate, and the larger the Average overlap rate, the better the tracking performance; wherein Ours is the process of the invention.

TABLE 2 comparison of Performance of various methods based on average overlap ratio

The comparison of mean center point error and mean overlap ratio are shown in tables 1 and 2, respectively. It follows that the last row of tables 1 and 2 shows the average result of each tracker over all image sequences. Compared with other advanced tracking methods, the method (i.e. the Ours method in the table) achieves competitive results on almost all image sequences, and the method has the best target tracking effect in the average results in the two tables. In summary, compared with the prior art, the target tracking method based on adaptive expression provided by the embodiment of the invention has the characteristics of strong robustness and high accuracy.

The invention adopts the scheme to combine the candidate sample with the template, uses the candidate sample matrix as a dictionary and the template as a regression target, thereby effectively reducing the calculation burden of the algorithm; self-adaptive expression is introduced, the low-rank structure of a candidate sample matrix can be explored through the self-adaptive expression, and the structural relation among candidate samples is effectively utilized; and negative templates are used as expression targets, so that background information is effectively utilized. Compared with the prior art, the method can explore and utilize the relation between the candidate samples, and has the characteristics of high accuracy and strong robustness.

The method provided by the invention can generate an effective tracking effect in an intelligent auxiliary driving system of the automobile, and the road information in front is acquired in real time in an image mode through the vehicle-mounted camera, so that the real-time accurate tracking of the front vehicle target is realized.

Example 2

As shown in fig. 1 to fig. 3, the present embodiment is different from embodiment 1 in that the present invention provides an adaptive representation-based target tracking apparatus, as shown in fig. 3, the apparatus includes:

selecting a specified number of front templates to construct a front template set T near the position of a tracking target in a first frame image _p o _s ＝[t ₁ ,t ₂ ,…,t _p ]And the front template meets the following condition, | | Q _p -Q ₀ ||<σ, wherein Q _p Representing the center of the front template, Q ₀ Represents the center of the initial target, σ is a specified inner threshold;

wherein Q _n Represents the center of the negative template,

is a specified outer threshold;

Sampling the state variables, wherein Σ is a diagonal covariance matrix;

wherein x is _i Is corresponding to the template t _i The adaptive expression matrix X is a set of adaptive expressions, i.e. X ═ X ₁ ,x ₂ ,…,x _p+n ]The Diag (x) operator expresses x adaptively _i Conversion to diagonal matrix Diag (x) _i ) C is a video candidate sample set, lambda is a regular term coefficient, p is the number of positive templates, and n is the number of negative templates;

the above equation (1) is solved using an iterative reweighting algorithm.

Further, the target tracking module is specifically configured to:

W _i,j ＝exp(-||C _i -T _j || ₂ ) (2)

wherein, C _i Is the ith candidate sample and T _j Is the jth template; w is a group of _i,j Is the value of the ith row and jth column of the weight matrix WThe time also represents the similarity between the ith candidate sample and the jth template;

performing dot multiplication on the weight matrix W and the adaptive expression matrix X to obtain a re-weighted expression matrix M, wherein M is W X, recording the re-weighted expression matrix as an expression map M, and dividing the expression map M into two parts { M _pos ,M _neg }；

sc _i ＝∑M _i,pos -∑M _i,neg (3)

wherein M is _i,p o _s Is a matrix M _p o _s Row i of (1), representing the separable features of the ith candidate sample on the front template set, M _i,neg Is a matrix M _neg Line i of (1), represents the separable features of the ith candidate sample on the negative template set, sc _i Is the separability score of the ith candidate sample;

where a and

Example 3

As shown in fig. 1 to fig. 3, the present embodiment is different from embodiment 1 in that the present embodiment provides an adaptive representation-based target tracking terminal, which includes: a processor and a memory, wherein the memory stores program code, and when the program code is executed by the processor, the processor is caused to execute the steps of the adaptive expression-based target tracking method of embodiment 1.

Since the terminal is the device in the method in the embodiment of the present invention, and the principle of the device to solve the problem is similar to that of the method, the implementation of the method and the facility may refer to the implementation of the method, and repeated details are not repeated.

Specifically, as shown in fig. 2, the terminal includes: a processor 200 and a memory 201, wherein the memory 201 stores program code that, when executed by the processor 200, causes the processor 200 to perform the steps of:

s10: acquiring a current video frame;

s17: updating a tracking target template: updating the target template using the result tracked in the current image frame and returning to execute S10; since a new image frame needs to be read in the camera or in the video, and the operation of reading the current image frame needs to be performed each time in the method flow, after the target template is updated, the step to be performed is to read the new image frame, and after S17, the step returns to perform S10.

As an optional implementation, the processor 200 is specifically configured to:

wherein Q _n Representing the center of the negative template,

As an optional implementation, the processor 200 is specifically configured to:

Sampling state variables, wherein Σ is a diagonal covariance matrix; it is difficult to sample the candidate sample directly in the current frame image, so an indirect way is selected, namely the state variable z of the candidate sample _t And sampling is carried out, and candidate samples are sampled in the image. The motion change of the tracking target is simplified into the change of the state variable corresponding to the tracking target, and the change of the state variable is modeled by using Gaussian distribution without loss of generality. The expectation of the Gaussian distribution is the state variable z corresponding to the target of the previous frame _t-1 The diagonal covariance matrix is determined by the specific properties of the tracking data, and the diagonal covariance matrix is different for different tracking data.

As an optional implementation, the processor 200 is specifically configured to:

in the step S14, the following formula (1) is adopted to calculate the adaptive expression matrix X corresponding to the template set T:

wherein x is _i Is corresponding to the template t _i The adaptive expression matrix X is a set of adaptive expressions, i.e. X ═ X ₁ ,x ₂ ,…,x _p+n ]The Diag (x) operator expresses x adaptively _i Conversion to diagonal matrix Diag (x) _i ) (ii) a C is a video candidate sample set, lambda is a regular term coefficient, p is the number of positive templates, and n is the number of negative templates;

As an optional implementation, the processor 200 is specifically configured to:

W _i,j ＝exp(-||C _i -T _j || ₂ ) (2)

wherein, C _i Is the ith candidate sample and T _j Is the jth template; w _i,j Is the value of the ith row and the jth column of the weight matrix W, and simultaneously represents the similarity between the ith candidate sample and the jth template;

s151: performing dot multiplication on the weight matrix W and the adaptive expression matrix X to obtain a re-weighted expression matrix M, i.e. M ═ W ^ X, recording the re-weighted expression matrix as an expression map M, and dividing the expression map M into two parts { M ^ M _p o _s ,M _neg }；

sc _i ＝∑M _i,pos -∑M _i,neg (3)

wherein M is _i,p o _s Is a matrix M _p o _s Row i of (1), representing the separable features of the ith candidate sample on the front template set, M _i,neg Is a matrix M _neg Line i of (1), represents the separable features of the ith candidate sample on the negative template set, sc _i Is the ithA separability score for the candidate samples;

s153: the higher the identification score is, the more similar the candidate sample is to the tracking target, so the candidate sample with the highest separability score is selected as the target tracked by the current image frame.

As an optional implementation, the processor 200 is specifically configured to:

where a and

Example 4

As shown in fig. 1 to 3, the present embodiment is different from embodiment 1 in that the present embodiment provides a computer storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the adaptive expression-based target tracking method in embodiment 1.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A target tracking method based on adaptive expression is characterized by comprising the following steps:

s10: acquiring a current video frame;

s12: initializing a tracking target template: determining the position of a tracking target in a first frame image of a video through setting or manual designation, selecting a positive template and a negative template near the tracking target, and constructing a template set consisting of the positive template and the negative template, wherein each template is converted into a vector form in advance;

s16: whether the tracking task is finished or not is checked, and if the current image frame is the last frame of the video, the target tracking task is finished; if the current image frame is not the last frame of the video, executing S17 to update the tracking target template, and continuing the target tracking task;

s17: updating a tracking target template: updating the target template using the result tracked in the current image frame and returning to execute S10;

the step S15 specifically includes:

W _i,j ＝exp(-||C _i -T _j || ₂ ) (2)

sc _i ＝∑M _i,pos -∑M _i,neg (3)

s153: and selecting the candidate sample with the highest separability score as the target tracked by the current image frame.

2. The method for tracking an object based on adaptive expression of claim 1, wherein the step S12 specifically includes:

s120: determining an initial position of a tracking target in a first frame image frame of a video through set or manually designated coordinates;

s121: in the first frameSelecting a specified number of front templates to construct a front template set T near the position of a tracking target in the image _pos ＝[t ₁ ,t ₂ ,...,t _p ]And the front template meets the following condition, | | Q _p -Q ₀ ||<σ, wherein Q _p Representing the center of the front template, Q ₀ Represents the center of the initial target, σ is a specified inner threshold;

selecting a specified number of negative templates to form a negative template set T near the position of the tracking target in the first frame image _neg ＝[t _p+1 ,t _p+2 ,...,t _p+n ]And the negative template satisfies the following conditions,

wherein Q _n Represents the center of the negative template,

is a specified outer threshold;

constructing a template set T ═ T composed of positive templates and negative templates _pos ,T _neg ]＝[t ₁ ,t ₂ ,...,t _p+n ]This matrix includes all templates.

3. The method for tracking an object based on adaptive expression of claim 1, wherein the step S13 specifically includes:

s130: obtaining a state variable z corresponding to the target of the previous frame according to the target position tracked in the previous frame of image _t-1 ；

Sampling the state variables, wherein Σ is a diagonal covariance matrix;

s132: from the sampled state variable z _t Selecting candidate samples in the current image frame and constructingA set of candidate samples C.

4. The method for tracking an object based on adaptive expression of claim 1, wherein the adaptive expression matrix X corresponding to the template set T is calculated in step S14 by using the following formula (1):

wherein x is _i Is corresponding to the template t _i The adaptive expression matrix X is a set of adaptive expressions, i.e. X ═ X ₁ ,x ₂ ,...,x _p+n ]，Diag(x _* ) The operator expresses x adaptively _i Conversion to diagonal matrix Diag (x) _i ) C is a video candidate sample set, lambda is a regular term coefficient, p is the number of positive templates, and n is the number of negative templates;

5. The method for tracking an object based on adaptive expression of claim 1, wherein the step S17 specifically includes:

wherein sigmaAnd

6. The target tracking method based on the adaptive expression is characterized in that the method is suitable for an intelligent auxiliary driving system of an automobile, road information in front is acquired in real time in an image mode through a vehicle-mounted camera, and real-time tracking of a front vehicle target is achieved.

7. An adaptive representation-based target tracking apparatus, comprising:

the tracking target template updating module is used for updating a positive template and a negative template of the tracking target according to the current tracking result in the tracking process;

the execution process of the target tracking module is as follows:

W _i,j ＝exp(-||C _i -T _j || ₂ ) (2)

sc _i ＝∑M _i,pos -∑M _i,neg (3)

and selecting the candidate sample with the highest separability score as a target tracked by the current image frame.

8. An adaptive representation-based target tracking terminal, the terminal comprising: a processor and a memory, wherein the memory has stored therein program code, which when executed by the processor causes the processor to perform the steps of the method for adaptive expression based object tracking according to any one of claims 1 to 6.

9. A computer storage medium having a computer program stored thereon, wherein the program, when executed by a processor, performs the steps of a method for adaptive expression based object tracking according to any of claims 1 to 6.