CN111429481A - Target tracking method, device and terminal based on adaptive expression - Google Patents


Publication number: CN111429481A (application CN202010192778.2A)
Authority: CN (China)
Prior art keywords: tracking, template, target, expression, matrix
Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Application number: CN202010192778.2A
Other languages: Chinese (zh)
Other versions: CN111429481B (en)
Inventors: 米建勋, 李人杰
Current assignee: Dragon Totem Technology Hefei Co ltd (the listed assignees may be inaccurate)
Original assignee: Chongqing University of Posts and Telecommunications
Application filed by Chongqing University of Posts and Telecommunications; priority to CN202010192778.2A (the priority date is an assumption and is not a legal conclusion).
Publication of CN111429481A; application granted; publication of CN111429481B.
Legal status: Active; anticipated expiration.

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/251: Analysis of motion using feature-based methods involving models
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 20/584: Recognition of vehicle lights or traffic lights

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target tracking method, device and terminal based on adaptive expression, used for tracking a specified target in a video and realizing accurate tracking of the video target in a motion state. The method comprises the following steps: acquiring a current video frame; checking whether the template exists; initializing the tracking target template; generating candidate samples; calculating the adaptive expression; tracking by expression; checking whether the tracking task is completed; and updating the tracking target template. Compared with the prior art, the method uses an adaptive expression model, that is, the sparse expression can change adaptively according to the structural relationship among candidate samples, and introduces negative templates to improve the discriminative power of the model; the adaptive-expression-based target tracking method provided by the invention therefore has strong robustness and high accuracy. The method is particularly suitable for producing effective, accurate tracking in an intelligent auxiliary driving system of an automobile.

Description

Target tracking method, device and terminal based on adaptive expression
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a target tracking method, a target tracking device and a target tracking terminal based on adaptive expression.
Background
With the development of artificial intelligence technology, visual tracking plays an increasingly important role; it is applied in a wide range of scenarios such as video surveillance, human-computer interaction, behavior recognition and action analysis. Although researchers have proposed many different tracking algorithms, the tracking problem still faces many challenges that have not been fully solved, such as fast motion of the target, occlusion, and changes in the target's scale, pose and illumination, all of which require further research.
At present, sparse representation algorithms have been successfully applied to the tracking problem. Compared with other tracking methods, sparse tracking handles occlusion of the tracked target well and is more robust. The traditional sparse tracking method takes templates, which can be dynamically updated during tracking, as a dictionary, and sparsely expresses candidate samples on this dictionary. After learning the sparse representations of the candidate samples, the candidate sample with the smallest reconstruction error is selected as the tracked target.
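As a point of reference, this conventional template-dictionary pipeline can be sketched in Python (a minimal illustration only; the ISTA lasso solver, the dimensions, and the synthetic data are assumptions of the example, not details from the patent):

```python
import numpy as np

def ista_lasso(D, y, lam=0.1, n_iter=200):
    """Solve min_x 0.5*||y - D x||^2 + lam*||x||_1 with plain ISTA."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - D.T @ (D @ x - y) / L    # gradient step on the smooth part
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return x

def classic_sparse_track(templates, candidates, lam=0.1):
    """Classic sparse tracker: express each candidate on the template
    dictionary and keep the one with the smallest reconstruction error."""
    errors = [np.linalg.norm(c - templates @ ista_lasso(templates, c, lam))
              for c in candidates.T]     # one column per candidate
    return int(np.argmin(errors))

rng = np.random.default_rng(0)
T = rng.standard_normal((32, 8))                      # 8 templates, 32-dim
target = T @ np.array([0.9, 0, 0, 0, 0.4, 0, 0, 0])   # lies in template span
C = rng.standard_normal((32, 5))                      # 5 candidate samples
C[:, 3] = target + 0.01 * rng.standard_normal(32)     # candidate 3 ~ target
best = classic_sparse_track(T, C)
```

Note that here each candidate is expressed on the templates; the invention described below reverses the roles of dictionary and regression target.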
However, current sparse tracking methods still have a notable problem: they ignore the relationships between candidate samples. Because candidate samples are densely sampled, some of them may be very similar, so the candidate sample matrix has a low-rank property. Existing sparse tracking methods only explore and utilize the relationship information between templates and ignore the relationships between candidate samples.
Disclosure of Invention
The present invention provides a target tracking method, apparatus and terminal based on adaptive expression to solve the above problems. The invention explores and utilizes the structural relationship between candidate samples to track a specified target in a video, thereby achieving accurate tracking of the video target in a motion state.
The invention is realized by the following technical scheme:
in a first aspect, the present invention provides a target tracking method based on adaptive expression, including the following steps:
s10: acquiring a current video frame;
s11: checking whether the template exists; if not, executing S12 to initialize the tracking target template; if the template already exists, executing S13 to generate candidate samples in the current image frame;
s12: initializing the tracking target template: determining the position of the tracking target in the first frame image of the video through setting or manual designation, selecting positive templates and negative templates near the tracking target, and constructing a template set consisting of the positive and negative templates, wherein each template is converted into vector form in advance; the positive templates are used to model the information contained in the tracking target, and the negative templates are used to model the background information; by using both positive and negative templates, the adaptive-expression-based target tracking method can still identify the target when it blends into the background, so that the target can be tracked;
s13: generating a candidate sample: selecting candidate samples in the current image frame according to the target position tracked in the previous image frame, and constructing a candidate sample set consisting of the candidate samples, wherein each candidate sample is converted into a vector form in advance;
s14: calculating an adaptive expression: taking a matrix formed by the candidate sample set obtained in the step S13 as a dictionary, taking a template as a target of sparse expression, and using a trace lasso as a regular term to calculate to obtain a matrix formed by self-adaptive expression, namely a self-adaptive expression matrix;
s15: tracking by expression: constructing a weight matrix and multiplying it element-wise with the adaptive expression matrix obtained in step S14 to obtain a re-weighted expression matrix, wherein each element of the re-weighted expression matrix represents a separable feature of the corresponding candidate sample; the separability score of each candidate sample is calculated from these separable features, and the candidate sample with the highest score is the tracked target;
s16: checking whether the tracking task is completed: if the current image frame is the last frame of the video, the target tracking task is completed; if not, executing S17 to update the tracking target template and continuing the target tracking task;
s17: updating a tracking target template: the target template is updated using the result tracked in the current image frame, and execution returns to S10.
The working principle is as follows: for video moving-target tracking, the prior-art sparse tracking methods have a clear shortcoming: they only explore and utilize the relationship information between templates and ignore the relationships between candidate samples, even though, under the dense-sampling strategy, some candidate samples may be very similar and the candidate sample matrix therefore has a low-rank property. The present scheme combines the candidate samples with the templates, using the candidate sample matrix as the dictionary and the templates as regression targets, which effectively reduces the computational burden of the algorithm; adaptive expression is introduced, through which the low-rank structure of the candidate sample matrix can be explored and the structural relationships among candidate samples effectively utilized; and negative templates are used as expression targets, so that background information is effectively utilized. Compared with the prior art, the method can explore and utilize the relationships between candidate samples, and is highly accurate and robust.
The method provided by the invention is particularly suitable for the intelligent auxiliary driving system of an automobile, where it produces an effective tracking result: the road information ahead is acquired in real time as images through the vehicle-mounted camera, realizing real-time, accurate tracking of the vehicle target in front.
Further, the step S12 specifically includes:
S120: determining the initial position of the tracking target in the first frame of the video through set or manually designated coordinates; in a video tracking task, the initial position of the tracking target is generally marked manually according to the specific properties of the task, or is set directly.
S121: selecting a specified number of positive templates near the position of the tracking target in the first frame image to construct the positive template set Tpos = [t1, t2, …, tp], each positive template satisfying the condition ||Qp − Q0|| < σ, wherein Qp represents the center of the positive template, Q0 represents the center of the initial target, and σ is a specified inner threshold; since a positive template is an image block, the ith positive template is converted into vector form ti for calculation. For the positive templates to model the information contained in the tracked target, they must be selected near the tracked target, so that each positive template is similar to the tracked target and retains its information while differing from it slightly, which improves the robustness of the method.
Selecting a specified number of negative templates near the position of the tracking target in the first frame image to form the negative template set Tneg = [tp+1, tp+2, …, tp+n], each negative template satisfying the condition σ < ||Qn − Q0|| < σ̄, wherein Qn represents the center of the negative template and σ̄ is a specified outer threshold; since a negative template is an image block, the jth negative template is converted into vector form tj for ease of computation. For the negative templates to model the background information, they are selected in an annular region centered on the tracking target, so that each negative template differs substantially from the tracking target and naturally introduces background information.
Constructing the template set T = [Tpos, Tneg] = [t1, t2, …, tp+n] composed of the positive and negative templates; this matrix includes all templates. The initial template set T contains both tracking target information and background information.
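A minimal sketch of this template-set construction, assuming grayscale frames, square patches, and rejection sampling of template centers (all illustrative choices; the patent does not fix a sampling scheme):

```python
import numpy as np

def build_template_set(frame, q0, patch=(8, 8), p=8, n=16,
                       sigma=4.0, sigma_out=20.0, seed=0):
    """Build T = [T_pos, T_neg]: p positive templates with center distance
    ||Q - q0|| < sigma and n negative templates with
    sigma < ||Q - q0|| < sigma_out, each cropped and vectorised."""
    rng = np.random.default_rng(seed)

    def crop(center):                    # image block -> column vector
        y, x = int(center[0]), int(center[1])
        return frame[y:y + patch[0], x:x + patch[1]].reshape(-1)

    def sample_center(r_in, r_out):      # rejection-sample a disk/ring offset
        while True:
            d = rng.uniform(-r_out, r_out, size=2)
            if r_in <= np.linalg.norm(d) < r_out:
                return np.asarray(q0, dtype=float) + d

    pos = np.stack([crop(sample_center(0.0, sigma)) for _ in range(p)], axis=1)
    neg = np.stack([crop(sample_center(sigma, sigma_out)) for _ in range(n)],
                   axis=1)
    return np.concatenate([pos, neg], axis=1)   # T = [T_pos, T_neg]

frame = np.random.default_rng(1).random((100, 100))   # stand-in first frame
T = build_template_set(frame, q0=(50, 50))            # 64-dim, 8+16 columns
```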
Further, the step S13 specifically includes:
S130: obtaining the state variable zt-1 corresponding to the target of the previous frame according to the target position tracked in the previous image frame; the state variable zt-1 contains six mapping parameters: x-coordinate, y-coordinate, rotation angle, size, aspect ratio and tilt direction, and these six parameters locate the target position in the video image.
S131: sampling the state variable zt of the candidate samples according to the motion model, i.e. sampling the state variables from the probability distribution
p(zt | zt-1) = N(zt; zt-1, Σ),
wherein Σ is a diagonal covariance matrix. It is difficult to sample candidate samples directly in the current frame image, so an indirect route is taken: the state variable zt of each candidate sample is sampled, and the candidate samples are then cropped from the image. Without loss of generality, the motion change of the tracking target is simplified into the change of its corresponding state variable, and this change is modeled with a Gaussian distribution whose expectation is the state variable zt-1 of the previous frame's target. The diagonal covariance matrix is determined by the specific properties of the tracking data and differs for different tracking data.
S132: selecting candidate samples in the current image frame from the sampled state variables zt and constructing the candidate sample set C. The sampled state variables determine the positions of the candidate samples, which are converted into vector form and then form the candidate sample matrix C.
Further, in step S14, the following formula (1) is used to calculate the adaptive expression matrix X corresponding to the template set T:
min over xi: ||ti − C·xi||² + λ·||C·Diag(xi)||*, for i = 1, 2, …, p + n    (1)
wherein xi is the adaptive expression corresponding to template ti; the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x1, x2, …, xp+n]; the Diag(·) operator converts the adaptive expression xi into the diagonal matrix Diag(xi); ||·||* denotes the nuclear norm, so the regular term is the trace lasso; C is the set of video candidate samples; λ is the regular term coefficient; p is the number of positive templates; and n is the number of negative templates;
in step S14, the above equation (1) is solved using an iterative reweighting algorithm.
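One possible form of the iterative reweighting algorithm for formula (1), based on the variational identity ||A||* = min over S ≻ 0 of (1/2)(tr(AᵀS⁻¹A) + tr(S)) with A = C·Diag(x); the smoothing constant mu, the warm start, and the fixed iteration count are assumptions of this sketch, not details taken from the patent:

```python
import numpy as np

def trace_lasso_irls(C, t, lam=0.1, n_iter=30, mu=1e-8):
    """Sketch of an iteratively reweighted solver for
        min_x ||t - C x||^2 + lam * ||C Diag(x)||_*  .
    Alternates S = (A A^T + mu I)^{1/2} with A = C Diag(x) and a
    ridge-like closed-form update of x."""
    d = C.shape[0]
    x = np.linalg.lstsq(C, t, rcond=None)[0]        # least-squares warm start
    CtC, Ctt = C.T @ C, C.T @ t
    for _ in range(n_iter):
        A = C * x                                   # = C @ Diag(x)
        w, V = np.linalg.eigh(A @ A.T + mu * np.eye(d))
        S_inv = (V / np.sqrt(np.maximum(w, mu))) @ V.T   # S^{-1}
        D = np.einsum('ji,jk,ki->i', C, S_inv, C)        # D_i = c_i^T S^-1 c_i
        x = np.linalg.solve(CtC + 0.5 * lam * np.diag(D), Ctt)
    return x

rng = np.random.default_rng(2)
Cd = rng.standard_normal((20, 10))           # candidate dictionary (columns)
x_true = np.zeros(10)
x_true[0], x_true[1] = 1.0, -0.5             # template built from 2 candidates
t = Cd @ x_true                              # template to express
x = trace_lasso_irls(Cd, t)
```

In the patent's setting this solve is repeated for every template ti, and the resulting columns stacked into X.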
Further, the step S15 specifically includes:
S150: constructing a weight matrix W by formula (2):
Wi,j = exp(−||Ci − Tj||²)    (2)
wherein Ci is the ith candidate sample, Tj is the jth template, and Wi,j, the value in row i and column j of the weight matrix W, represents the similarity between the ith candidate sample and the jth template;
S151: performing element-wise (Hadamard) multiplication of the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; the re-weighted expression matrix is recorded as the expression graph M and divided into two parts {Mpos, Mneg};
S152: calculating the separability score sci of each candidate sample by formula (3):
sci = Σ Mi,pos − Σ Mi,neg    (3)
wherein Mi,pos is the ith row of matrix Mpos, representing the separable features of the ith candidate sample on the positive template set; Mi,neg is the ith row of matrix Mneg, representing the separable features of the ith candidate sample on the negative template set; and sci is the separability score of the ith candidate sample;
S153: the higher the separability score, the more similar the candidate sample is to the tracking target; therefore, the candidate sample with the highest separability score is selected as the target tracked in the current image frame.
Further, the step S17 specifically includes:
S170: calculating the similarity vector η = (η1, η2, …, ηp), wherein ηi is the Euclidean-distance-based similarity between the current tracking result and the ith positive template;
S171: calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if ω > θ, the positive template corresponding to the maximum similarity is updated with the current tracking result;
S172: selecting a specified number of negative templates near the current tracking result as the new negative template set Tneg, each negative template satisfying the condition
a < ||Qn − Qk|| < ā,
wherein a and ā are respectively the designated inner and outer thresholds, and Qn and Qk represent the center of the negative template and the center of the current tracking result, respectively.
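A sketch of the positive-template update of S170 and S171; the exponential similarity exp(−||·||) is an assumed concretization of the Euclidean-distance-based similarity the patent names, and θ = 0.8 is illustrative:

```python
import numpy as np

def update_positive_templates(T_pos, result, theta=0.8):
    """eta_i = exp(-||result - t_i||); if the maximum similarity omega
    exceeds theta, overwrite that positive template with the current
    tracking result."""
    eta = np.exp(-np.linalg.norm(T_pos - result[:, None], axis=0))
    i = int(np.argmax(eta))
    omega = float(eta[i])
    T_new = T_pos.copy()
    if omega > theta:
        T_new[:, i] = result
    return T_new, omega

rng = np.random.default_rng(3)
T_pos = rng.standard_normal((16, 4))                   # 4 positive templates
result = T_pos[:, 1] + 0.01 * rng.standard_normal(16)  # near template 1
T_new, omega = update_positive_templates(T_pos, result)
```

The threshold θ keeps the template set from drifting: only a result very close to an existing positive template may replace it.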
Furthermore, the method is suitable for an intelligent auxiliary driving system of an automobile, and road information in front is acquired in real time in an image mode through the vehicle-mounted camera, so that real-time tracking of a front vehicle target is realized.
In a second aspect, the present invention provides an adaptive representation-based target tracking apparatus, including:
the acquisition module is used for acquiring the current image frame of the video;
the first judging module is used for judging whether the template exists or not;
the initialization setting module is used for setting the initial position of the tracking target in the video image and the corresponding parameters, including but not limited to the state-variable variance and the positive-template update threshold, and for starting the tracking task;
the tracking target template building module is used for building an initial positive template and a negative template of the tracking target near the initial position of the specified tracking target;
the tracking target sample generation module is used for randomly sampling the current image frame of the tracking task to generate a sample of a tracking target;
the adaptive expression calculation module is used for calculating the adaptive expression of the template on the candidate sample;
the target tracking module is used for tracking the target in the current frame with the calculated adaptive expression;
the second judgment module is used for judging whether the tracking task is completed or not;
and the tracking target template updating module is used for updating the positive template and the negative template of the tracking target according to the current tracking result in the tracking process.
Further, the tracking target template building module is specifically configured to:
selecting a specified number of positive templates near the position of the tracking target in the first frame image to construct the positive template set Tpos = [t1, t2, …, tp], each positive template satisfying the condition ||Qp − Q0|| < σ, wherein Qp represents the center of the positive template, Q0 represents the center of the initial target, and σ is a specified inner threshold;
selecting a specified number of negative templates near the position of the tracking target in the first frame image to form the negative template set Tneg = [tp+1, tp+2, …, tp+n], each negative template satisfying the condition σ < ||Qn − Q0|| < σ̄, wherein Qn represents the center of the negative template and σ̄ is a specified outer threshold;
constructing the template set T = [Tpos, Tneg] = [t1, t2, …, tp+n] composed of the positive and negative templates; this matrix includes all templates.
Further, the tracking target sample generation module is specifically configured to:
obtaining the state variable zt-1 corresponding to the target of the previous frame according to the target position tracked in the previous image frame;
sampling the state variable zt of the candidate samples according to the motion model, i.e. sampling the state variables from the probability distribution p(zt | zt-1) = N(zt; zt-1, Σ), wherein Σ is a diagonal covariance matrix;
selecting candidate samples in the current image frame from the sampled state variables zt and constructing the candidate sample set C.
Further, the adaptive expression calculation module is specifically configured to:
calculating the adaptive expression matrix X corresponding to the template set T by the following formula (1):
min over xi: ||ti − C·xi||² + λ·||C·Diag(xi)||*, for i = 1, 2, …, p + n    (1)
wherein xi is the adaptive expression corresponding to template ti; the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x1, x2, …, xp+n]; the Diag(·) operator converts the adaptive expression xi into the diagonal matrix Diag(xi); ||·||* denotes the nuclear norm; C is the set of video candidate samples; λ is the regular term coefficient; p is the number of positive templates; and n is the number of negative templates;
the above equation (1) is solved using an iterative reweighting algorithm.
Further, the target tracking module is specifically configured to:
constructing a weight matrix W by formula (2):
Wi,j = exp(−||Ci − Tj||²)    (2)
wherein Ci is the ith candidate sample, Tj is the jth template, and Wi,j, the value in row i and column j of the weight matrix W, represents the similarity between the ith candidate sample and the jth template;
performing element-wise (Hadamard) multiplication of the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; the re-weighted expression matrix is recorded as the expression graph M and divided into two parts {Mpos, Mneg};
calculating the separability score sci of each candidate sample by formula (3):
sci = Σ Mi,pos − Σ Mi,neg    (3)
wherein Mi,pos is the ith row of matrix Mpos, representing the separable features of the ith candidate sample on the positive template set; Mi,neg is the ith row of matrix Mneg, representing the separable features of the ith candidate sample on the negative template set; and sci is the separability score of the ith candidate sample;
the higher the separability score, the more similar the candidate sample is to the tracking target; therefore, the candidate sample with the highest separability score is selected as the target tracked in the current image frame.
Further, the tracking target template updating module is specifically configured to:
calculating the similarity vector η = (η1, η2, …, ηp), wherein ηi is the Euclidean-distance-based similarity between the current tracking result and the ith positive template;
calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if ω > θ, updating the positive template corresponding to the maximum similarity with the current tracking result;
selecting a specified number of negative templates near the current tracking result as the new negative template set Tneg, each negative template satisfying the condition a < ||Qn − Qk|| < ā, wherein a and ā are respectively the designated inner and outer thresholds, and Qn and Qk represent the center of the negative template and the center of the current tracking result, respectively.
In a third aspect, the present invention provides an adaptive representation-based target tracking terminal, including: a processor and a memory, wherein the memory has stored therein program code, which when executed by the processor, causes the processor to perform the steps of the above-described adaptive expression-based object tracking method.
In a fourth aspect, the present invention also provides a computer storage medium having a computer program stored thereon, which when executed by a processor, performs the steps of the above-described adaptive expression-based object tracking method.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. according to the invention, the candidate samples are combined with the template, the candidate sample matrix is used as a dictionary, and the template is used as a regression target, so that the calculation burden of the algorithm can be effectively reduced; self-adaptive expression is introduced, the low-rank structure of a candidate sample matrix can be explored through the self-adaptive expression, and the structural relation among candidate samples is effectively utilized; and negative templates are used as expression targets, so that background information is effectively utilized. Compared with the prior art, the method can explore and utilize the relation between the candidate samples, and has the characteristics of high accuracy and strong robustness.
2. The method is particularly suitable for an intelligent auxiliary driving system of an automobile and can generate an effective tracking effect, and the road information in front is acquired in real time in an image mode through the vehicle-mounted camera, so that the real-time accurate tracking of the front vehicle target is realized.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flowchart of a target tracking method based on adaptive expression according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a target tracking terminal based on adaptive expression according to an embodiment of the present invention;
fig. 3 is a schematic diagram of another adaptive expression-based target tracking apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 1 to 3, fig. 1 is a flow chart of a target tracking method based on adaptive expression, the target tracking method based on adaptive expression of the present invention is applied to an intelligent auxiliary driving system of an automobile for tracking, and road information in front is acquired in real time in an image manner by a vehicle-mounted camera, so that a front vehicle target is accurately tracked in real time; the method comprises the following steps:
s10: acquiring a current video frame;
the invention can acquire the current image frame through the video stream or the camera.
S11: checking whether the template exists, if not, executing S12 to initialize and track the target template; if the template already exists, executing S13 to generate a candidate sample in the current image frame;
s12: initializing a tracking target template: determining the position of a tracked vehicle target in a first frame image of a video through setting or manual designation, selecting a positive template and a negative template near the tracked vehicle target, and constructing a template set consisting of the positive template and the negative template, wherein each template is converted into a vector form in advance;
the positive template in the invention is used for modeling information contained in a tracked vehicle target, and the negative template in the invention is used for modeling background information. By using the positive template and the negative template, when the situation that the tracked vehicle target is mixed with the background occurs, the target tracking method based on the self-adaptive expression can also identify the target, and the tracking of the vehicle target is realized.
Specifically, the step S12 specifically includes:
s120: determining an initial position of a tracking target in a first frame image frame of a video through set or manually designated coordinates; in a video tracking task, the initial position of a tracking target is generally marked manually according to the specific properties of the task, or the initial position of the tracking target is directly set.
S121: selecting a specified number of front templates to construct a front template set T near the position of a tracking target in a first frame imagepos=[t1,t2,…,tp]And the front template meets the following condition, | | Qp-Q0||<σ, wherein QpRepresenting the center of the front template, Q0Represents the center of the initial target, σ is a specified inner threshold; since the front template is an image block, the ith front template is converted into the form t of a vector for calculationi. In order to enable the front template to model information contained in the tracked target, the front template needs to be selected near the tracked target, so that the front template is similar to the tracked target, the information of the tracked target is reserved, the front template is slightly different from the tracked target, and the robustness of the method is improved.
Near the position of the tracking target in the first frame image, select a specified number of negative templates to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], where each negative template satisfies the condition σ < ||Q_n − Q_0|| < γ, Q_n denotes the center of the negative template, and γ is a specified outer threshold. Since each negative template is an image block, the jth negative template is converted into vector form t_j for ease of computation. For the negative templates to model the background information, they are selected in an annular region centered on the tracking target, so that the negative templates differ greatly from the tracking target and naturally introduce background information.
Construct the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}] composed of the positive and negative templates; this matrix includes all templates. The initial template set T contains both tracking target information and background information.
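For illustration, the template initialization of steps S120–S122 can be sketched as follows. The patch size, threshold values, and grayscale-frame representation are assumptions chosen for the example, not values fixed by the invention:

```python
import numpy as np

def init_templates(frame, center, tpl=(20, 20), n_pos=10, n_neg=10,
                   sigma=4.0, inner=8.0, outer=30.0, seed=0):
    """Sample positive templates within distance sigma of the initial target
    center Q_0 and negative templates in the ring (inner, outer), then flatten
    each image block into a column of the template matrix T = [T_pos, T_neg]."""
    rng = np.random.default_rng(seed)
    h, w = tpl

    def patch(q):
        # crop an h x w image block whose top-left corner is at q, as a vector
        x, y = int(round(q[0])), int(round(q[1]))
        return frame[y:y + h, x:x + w].reshape(-1).astype(float)

    pos, neg = [], []
    while len(pos) < n_pos:                      # ||Q_p - Q_0|| < sigma
        q = center + rng.uniform(-sigma, sigma, 2)
        if np.linalg.norm(q - center) < sigma:
            pos.append(patch(q))
    while len(neg) < n_neg:                      # inner < ||Q_n - Q_0|| < outer
        q = center + rng.uniform(-outer, outer, 2)
        if inner < np.linalg.norm(q - center) < outer:
            neg.append(patch(q))
    return np.stack(pos + neg, axis=1)           # one vectorized template per column

frame = np.zeros((200, 200)); frame[90:110, 90:110] = 1.0   # toy frame
T = init_templates(frame, np.array([100.0, 100.0]))
```

Each column of the returned matrix is one vectorized template, matching the column layout T = [t_1, …, t_{p+n}] used in the formulas of this description.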
S13: generating a candidate sample: selecting candidate samples in the current image frame according to the target position tracked in the previous image frame, and constructing a candidate sample set consisting of the candidate samples, wherein each candidate sample is converted into a vector form in advance; since the motion of the tracking target in the video usually has continuity, the position of the tracking target in the current frame has a great relationship with the position of the tracking target in the previous frame. To simplify the model, the present invention models the motion model using a Gaussian distribution.
Specifically, the step S13 specifically includes:
S130: obtain the state variable z_{t−1} corresponding to the target from the target position tracked in the previous image frame. The state variable z_{t−1} contains six mapping parameters: x-coordinate, y-coordinate, rotation angle, size, aspect ratio, and tilt direction; these six mapping parameters locate the target position in the video image.
S131: sample the state variable z_t of each candidate sample according to the motion model, i.e. sample state variables with the probability distribution

p(z_t | z_{t−1}) = N(z_t; z_{t−1}, Σ)

where Σ is a diagonal covariance matrix. It is difficult to sample candidate samples directly in the current frame image, so an indirect way is chosen: the state variable z_t of each candidate sample is sampled, and the candidate samples are then taken from the image. The motion change of the tracking target is simplified into a change of its corresponding state variable, and this change is modeled, without loss of generality, by a Gaussian distribution. The expectation of the Gaussian distribution is the state variable z_{t−1} corresponding to the target of the previous frame; the diagonal covariance matrix is determined by the specific properties of the tracking data and differs across tracking data.
S132: from the sampled state variable ztCandidate samples are selected in the current image frame, and a candidate sample set C is constructed. The sampled state variables are used to determine the positions of the candidate samples, and the candidate samples are converted into a vector form, which then constitutes a candidate sample matrix C.
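The sampling of steps S130–S132 can be sketched as follows; the ordering of the six state parameters and the variance values are illustrative assumptions:

```python
import numpy as np

def sample_states(z_prev, n=500,
                  stddev=(4.0, 4.0, 0.02, 0.01, 0.005, 0.001), seed=0):
    """Draw candidate state variables z_t ~ N(z_{t-1}, Sigma) with a diagonal
    covariance Sigma; z = (x, y, rotation, size, aspect ratio, tilt)."""
    rng = np.random.default_rng(seed)
    Sigma = np.diag(np.square(stddev))           # diagonal covariance matrix
    return rng.multivariate_normal(z_prev, Sigma, size=n)

z_prev = np.array([120.0, 80.0, 0.0, 1.0, 1.0, 0.0])  # state from frame t-1
Z = sample_states(z_prev)                              # one candidate state per row
```

Each sampled row is then used to crop and rescale an image patch, which is flattened into one column of the candidate sample matrix C.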
S14: calculating an adaptive expression: taking a matrix formed by the candidate sample set obtained in the step S13 as a dictionary, taking a template as a target of sparse expression, and using a trace lasso as a regular term to calculate to obtain a matrix formed by self-adaptive expression, namely a self-adaptive expression matrix; wherein the values of the adaptive expression matrix elements represent the relationship between the template and the corresponding candidate samples. For example, Xi,jThe relationship between the jth template and the ith candidate sample is shown. In general, the larger the matrix element value, the stronger the relationship between a sample and the corresponding candidate sample.
Specifically, in step S14, the adaptive expression matrix X corresponding to the template set T is calculated using the following formula (1):

min_{x_i} ||t_i − C x_i||_2^2 + λ ||C Diag(x_i)||_*,  i = 1, 2, …, p+n   (1)

where x_i is the adaptive expression corresponding to the template t_i, the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x_1, x_2, …, x_{p+n}], the Diag(·) operator converts the adaptive expression x_i into the diagonal matrix Diag(x_i), ||·||_* denotes the nuclear norm, C is the video candidate sample set, λ is the regular term coefficient, p is the number of positive templates, and n is the number of negative templates;
When the candidate samples are all different, which means the candidate sample matrix C has full column rank, the regularizer decomposes into

||C Diag(x_i)||_* = ||x_i||_1

When there is high correlation between the candidate samples, meaning the rank of the sample matrix C is 1 and C^T C = 11^T (1 being the vector whose elements are all one), the regularizer becomes

||C Diag(x_i)||_* = ||x_i||_2
As studies have shown, the candidate sample matrix C is typically not full rank and its rank is greater than 1. In summary, the regularization term is a balance between the l1 norm and the l2 norm, i.e. ||x_i||_2 ≤ ||C Diag(x_i)||_* ≤ ||x_i||_1, and it adjusts dynamically according to the internal structure of the candidate sample matrix C, thereby yielding a sparse expression x_i adapted to the structural relations of the candidate samples.
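The two limiting cases and the interpolation property can be checked numerically. The following sketch assumes unit-norm candidate columns; the l1 case is demonstrated with orthonormal columns, for which the identity holds exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

def trace_lasso(C, x):
    """Nuclear norm of C Diag(x): the sum of singular values of C * x
    (broadcasting scales column i of C by x_i)."""
    return np.linalg.svd(C * x, compute_uv=False).sum()

x = rng.standard_normal(8)

# General unit-norm columns: ||x||_2 <= ||C Diag(x)||_* <= ||x||_1.
C = rng.standard_normal((50, 8))
C /= np.linalg.norm(C, axis=0)
mid = trace_lasso(C, x)

# Orthonormal columns: the trace lasso reduces to the l1 norm.
Q, _ = np.linalg.qr(rng.standard_normal((50, 8)))
as_l1 = trace_lasso(Q, x)

# Identical (rank-1) columns: the trace lasso reduces to the l2 norm.
c = rng.standard_normal((50, 1))
C1 = np.repeat(c / np.linalg.norm(c), 8, axis=1)
as_l2 = trace_lasso(C1, x)
```

This matches the balance property stated above: the regularizer tightens toward the l1 norm as the candidate columns decorrelate and toward the l2 norm as they become identical.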
In step S14, the above equation (1) is solved using an iterative reweighting algorithm. For a clearer explanation of the solution, the solution idea of equation (1) is explained in this embodiment:
Known formula (4):

||X||_* = ||(XX^T)^{−1/4} X||_F^2   (4)

where X represents a matrix in the general sense, ||X||_* represents the nuclear norm of the matrix X, X^T represents the transpose of X, (XX^T)^{−1/4} represents the −1/4 power of the matrix XX^T, and ||(XX^T)^{−1/4} X||_F^2 represents the square of the Frobenius norm of (XX^T)^{−1/4} X;
first, equation (4) is used, and the jth column of the expression matrix X is set to X.jConverting equation (1) to equation (5):
Figure BDA0002416513070000106
wherein X is an expression matrix, T is a template set, C is a video candidate sample set, and λ is a regular term coefficient,p is the number of positive templates, n is the number of negative templates, X.jRepresents the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template, Diag (X).*) The operator expresses X adaptively.jConversion to diagonal matrix Diag (X).j);
Initialize the weight matrices W^j:

W^j = (C Diag(X_{·j}) (C Diag(X_{·j}))^T)^{−1/4}   (6)

where C is the video candidate sample set and X_{·j} is the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template;
A simple form of equation (5) can then be obtained:

min_X ||T − CX||_F^2 + λ Σ_{j=1}^{p+n} ||W^j C Diag(X_{·j})||_F^2   (7)

where X is the expression matrix, T is the template set, C is the video candidate sample set, λ is the regular term coefficient, p is the number of positive templates, n is the number of negative templates, X_{·j} is the jth column of X, corresponding to the adaptive expression of the jth template, and W^j is the weight matrix corresponding to the adaptive expression X_{·j};
It is obvious that equation (7) can be written as a sum over columns:

min_X Σ_{j=1}^{p+n} ( ||T_{·j} − C X_{·j}||_2^2 + λ ||W^j C Diag(X_{·j})||_F^2 )   (8)

and, since the ith column of C Diag(X_{·j}) is C_{·i} X_{i,j}, each summand can be expanded as

f_j(X_{·j}) = ||T_{·j} − C X_{·j}||_2^2 + λ Σ_{i=1}^{m} X_{i,j}^2 ||W^j C_{·i}||_2^2   (9)

where T_{·j} is the jth column of the template set matrix T, C is the video candidate sample set, λ is the regular term coefficient, p is the number of positive templates, n is the number of negative templates, m is the number of rows of the matrix X (the number of candidate samples), X_{i,j} represents the value in the ith row and jth column of X, C_{·i} is the ith candidate sample, and W^j is the weight matrix corresponding to the adaptive expression X_{·j};
Differentiating equation (9) and setting the derivative to zero yields the analytical solution of equation (9). The derivative of equation (9) is

∂f_j/∂X_{·j} = 2 C^T (C X_{·j} − T_{·j}) + 2λ D^j X_{·j}   (10)

where D^j = Diag(||W^j C_{·1}||_2^2, …, ||W^j C_{·m}||_2^2) is the diagonal matrix calculated from the weight matrix W^j corresponding to the adaptive expression X_{·j}. Setting the derivative to zero gives

X_{·j} = (C^T C + λ D^j)^{−1} C^T T_{·j}   (11)

where T_{·j} is the jth column of the template set matrix T, C is the video candidate sample set, and λ is the regular term coefficient; the diagonal matrix D^j can be calculated from the corresponding weight matrix W^j.
Finally, equation (1) is solved by the iterative reweighting method. The solution process repeats the following two steps until the result converges:

1) Given the weight matrices W^j, update each column of the matrix X by

X_{·j} = (C^T C + λ D^j)^{−1} C^T T_{·j}   (12)

where T_{·j} is the jth column of the template set matrix T, C is the video candidate sample set, X_{·j} is the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template, and the diagonal matrix D^j is calculated from the corresponding weight matrix W^j;

2) Given the expression matrix X, update each weight matrix by

W^j = (C Diag(X_{·j}) (C Diag(X_{·j}))^T)^{−1/4}   (13)

where C is the video candidate sample set, X_{·j} is the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template, and the Diag(·) operator converts X_{·j} into the diagonal matrix Diag(X_{·j}).
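A minimal numerical sketch of the iterative reweighting solution of equations (12) and (13); the ridge initialization and the eps smoothing of the negative fourth root are implementation assumptions added for numerical stability:

```python
import numpy as np

def adaptive_expression(T, C, lam=0.1, n_iter=15, eps=1e-6):
    """Solve min ||t_j - C x_j||^2 + lam ||C Diag(x_j)||_* column by column:
    alternately update x_j by the closed form (12) and the weight matrix
    W^j = (C Diag(x_j)(C Diag(x_j))^T)^(-1/4) of (13)."""
    m = C.shape[1]                      # number of candidate samples
    X = np.zeros((m, T.shape[1]))
    CtC, CtT = C.T @ C, C.T @ T
    for j in range(X.shape[1]):
        x = np.linalg.solve(CtC + lam * np.eye(m), CtT[:, j])   # ridge start
        for _ in range(n_iter):
            A = C * x                                           # C Diag(x)
            vals, vecs = np.linalg.eigh(A @ A.T)
            W = vecs @ np.diag((np.maximum(vals, 0) + eps) ** -0.25) @ vecs.T
            D = np.sum((W @ C) ** 2, axis=0)                    # ||W^j c_i||^2
            x = np.linalg.solve(CtC + lam * np.diag(D), CtT[:, j])
        X[:, j] = x
    return X

# Toy check: one template that is an exact sparse combination of candidates.
rng = np.random.default_rng(1)
C = rng.standard_normal((30, 10)); C /= np.linalg.norm(C, axis=0)
x_true = np.zeros(10); x_true[[2, 5]] = [1.0, -0.5]
t = (C @ x_true).reshape(-1, 1)
X = adaptive_expression(t, C, lam=0.01)
```

With a small regular term coefficient, the reconstruction residual stays small while the weights concentrate the expression on the candidate subspace actually used by the template.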
S15: tracking by expression: constructing a weight matrix, carrying out point multiplication on the weight matrix and the self-adaptive expression matrix obtained in the step S14 to obtain a re-weighted expression matrix, wherein each element in the re-weighted expression matrix represents separable characteristics of corresponding candidate samples, and calculating the identification score of each candidate sample through the separable characteristics, wherein the candidate sample with the highest identification score is the tracked target; the weight matrix W represents the similarity relation between the template and the candidate sample, and the tracking result can be more robust and accurate by introducing the weight matrix W.
Specifically, the step S15 specifically includes:
s150: constructing a weight matrix W, wherein the weight matrix is constructed by the formula (2):
W_{i,j} = exp(−||C_i − T_j||_2)   (2)
wherein C_i is the ith candidate sample and T_j is the jth template; W_{i,j} is the value in the ith row and jth column of the weight matrix W and represents the similarity between the ith candidate sample and the jth template. The elements of the weight matrix reflect the similarity relation between candidate samples and templates, and introducing the weight matrix reduces the tracking error.
S151: perform point multiplication of the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; record the re-weighted expression matrix as the expression map M and divide it into two parts {M_pos, M_neg}. Just as the template set is divided into positive and negative template sets according to the labels of the templates, the expression map M is divided, according to the template label corresponding to each column, into the positive expression map M_pos and the negative expression map M_neg.
S152: calculate the separability score sc_i of each candidate sample by formula (3):

sc_i = Σ M_{i,pos} − Σ M_{i,neg}   (3)

where M_{i,pos} is the ith row of the matrix M_pos, representing the separable features of the ith candidate sample on the positive template set, M_{i,neg} is the ith row of the matrix M_neg, representing the separable features of the ith candidate sample on the negative template set, and sc_i is the separability score of the ith candidate sample. A larger positive feature value means the candidate sample is more similar to the positive templates, whereas a larger negative feature value means it is more similar to the negative templates.
S153: since a higher identification score means the candidate sample is more similar to the tracking target, the candidate sample with the highest separability score is selected as the target tracked in the current image frame. A candidate sample with a larger positive feature value and a smaller negative feature value is more similar to the positive templates and less similar to the negative templates, and is therefore more likely to be the tracked target.
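The scoring pipeline of S150–S153 can be sketched as follows; the matrix shapes (candidates as columns of C, one template per column of T and of X) follow the layout used in this description:

```python
import numpy as np

def track_step(C, T, X, p):
    """Weight matrix W_{ij} = exp(-||C_i - T_j||_2), reweighted expression
    map M = W ⊙ X split into {M_pos, M_neg}, and separability scores
    sc_i = sum(M_i,pos) - sum(M_i,neg); returns the winning candidate index."""
    d2 = np.sum(C**2, 0)[:, None] + np.sum(T**2, 0)[None, :] - 2 * C.T @ T
    W = np.exp(-np.sqrt(np.maximum(d2, 0)))     # (candidates x templates)
    M = W * X                                   # point multiplication W ⊙ X
    sc = M[:, :p].sum(axis=1) - M[:, p:].sum(axis=1)
    return int(np.argmax(sc)), sc

# Toy check: candidate 0 matches the positive template, candidate 1 the negative.
C = np.eye(4)[:, :2]            # candidates c_0 = e_1, c_1 = e_2
T = np.eye(4)[:, :2]            # first template positive (p = 1), second negative
X = np.eye(2)                   # each template expressed by its own candidate
best, sc = track_step(C, T, X, p=1)
```

The candidate aligned with the positive template receives the highest separability score, as step S153 requires.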
S16: whether the tracking task is completed or not is checked, and if the current image frame is the last frame of the video, the target tracking task is completed; if the current image frame is not the last frame of the video, executing S17 to update the tracking target template, and continuing the target tracking task;
S17: updating the tracking target template: update the target template using the result tracked in the current image frame and return to execute S10. Because a new image frame must be read from the camera or the video, the operation of reading the current image frame is executed once in each pass of the method flow; after the target template is updated, the next step is to read a new image frame, so after S17 the flow returns to S10. In an actual scene, the scale, posture and other attributes of the tracking target change frequently, so if the tracking task has not stopped, the target template needs to be updated. The updated target template contains the latest information of the tracking target, making the subsequent tracking task more accurate.
Specifically, the step S17 specifically includes:
S170: calculate the similarity vector η = (η_1, η_2, …, η_p), where η_i is the similarity, based on Euclidean distance, between the current tracking result and the ith positive template. If the tracked target is occluded, it differs greatly from the positive templates, and the templates cannot be updated with the currently tracked target; to decide whether to update the positive templates, the similarity between the current tracking result and each positive template is calculated.
S171: calculate the maximum similarity ω = max(η) and compare it with the update threshold θ. If ω is greater than θ, the positive template corresponding to the maximum similarity is updated with the current tracking result. If the tracked target is occluded, the maximum similarity falls below the update threshold, which avoids updating the templates with an occluded target and degrading accuracy; if the tracked target is not occluded, the maximum similarity exceeds the update threshold and the tracking result is used to update the positive template, so that the positive templates contain the latest target information and the accuracy of the next tracking task is improved.
S172: select a specified number of negative templates near the current tracking result as the new negative template set T_neg, where each negative template satisfies the condition a < ||Q_n − Q_k|| < b, with a and b the specified inner and outer thresholds, Q_n the center of the negative template, and Q_k the center of the current tracking result. In an actual scene, the background in the video often changes rapidly, so the templates must be updated with the background of the current frame; the negative templates then contain the latest background information, which improves the accuracy of the subsequent tracking task.
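A sketch of the occlusion-guarded positive-template update of S170–S171. The exp(−distance) similarity form is an assumption for illustration; the description only specifies that η is based on Euclidean distance:

```python
import numpy as np

def update_positive(T_pos, result, theta=0.4):
    """Replace the most similar positive template with the current tracking
    result only when its similarity exceeds theta (occlusion guard)."""
    # eta_i: Euclidean-distance-based similarity (assumed exp(-distance) form)
    eta = np.exp(-np.linalg.norm(T_pos - result[:, None], axis=0))
    i = int(np.argmax(eta))                     # omega = max(eta)
    if eta[i] > theta:                          # occluded results stay out
        T_pos = T_pos.copy()
        T_pos[:, i] = result
    return T_pos

T_pos = np.stack([np.zeros(4), np.ones(4)], axis=1)
updated = update_positive(T_pos, 0.9 * np.ones(4))   # close to template 1
occluded = update_positive(T_pos, 5.0 * np.ones(4))  # far from all templates
```

A result close to a template replaces that template; a result far from every template (e.g. under heavy occlusion) leaves the template set unchanged.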
To verify the method provided by the invention, 8 challenging image sequences were selected in the experimental simulation to evaluate the performance of the method, and 12 advanced tracker methods were compared: 1) IVT, 2) APGL1, 3) PN, 4) TD, 5) MIL, 6) FragTrack, 7) MTT, 8) OSPT, 9) ASLAS, 10) LSAT, 11) SCM and 12) DSSM. The following experiments are all based on Matlab 2019a, on a platform with 8GB of memory and an Intel Core™ i5 CPU.
The experiment of the embodiment of the invention adopts two evaluation criteria: center point error and overlap ratio.
The center point error in an image frame refers to the Euclidean distance between the center point of the tracking bounding box B_T and the center point of the corresponding annotated ground-truth bounding box B_G, and the average center point error is the average of the center point error over the entire image sequence. The overlap ratio is calculated according to the formula provided by the PASCAL VOC challenge, i.e.

overlap = area(B_T ∩ B_G) / area(B_T ∪ B_G)

and the average overlap ratio refers to the average of the overlap ratio of the tracker over the entire image sequence. A good tracker will not only have a small center point error but also a large overlap ratio.
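The two evaluation criteria can be sketched directly; the (x, y, width, height) box convention is an assumption for the example:

```python
import numpy as np

def center_error(box_t, box_g):
    """Euclidean distance between the centers of the tracking bounding box B_T
    and the ground-truth bounding box B_G; boxes are (x, y, w, h)."""
    ct = np.array([box_t[0] + box_t[2] / 2, box_t[1] + box_t[3] / 2])
    cg = np.array([box_g[0] + box_g[2] / 2, box_g[1] + box_g[3] / 2])
    return float(np.linalg.norm(ct - cg))

def overlap_ratio(box_t, box_g):
    """PASCAL VOC overlap: area(B_T ∩ B_G) / area(B_T ∪ B_G)."""
    x1 = max(box_t[0], box_g[0]); y1 = max(box_t[1], box_g[1])
    x2 = min(box_t[0] + box_t[2], box_g[0] + box_g[2])
    y2 = min(box_t[1] + box_t[3], box_g[1] + box_g[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_t[2] * box_t[3] + box_g[2] * box_g[3] - inter
    return inter / union
```

Averaging these two quantities over all frames of a sequence gives the average center point error and average overlap ratio reported in the tables below.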
In the simulation experiment, the parameters were set as follows: the regularization parameter λ is set to 5, the sizes of the templates and the candidate samples are both adjusted to 20 × 20, the number of candidate samples is set to 500 to 600 according to the specific form of the test video image sequence, the positive template update threshold is set to 0.4, and the numbers of positive templates and negative templates are both set to 10. The regular parameter lambda, the number of the positive templates and the number of the negative templates are obtained by a cross validation method.
Table 1 shows the comparison of performance of various methods based on mean center point error, where Average represents the Average of all mean center point errors, and the smaller the mean center point error, the better the tracking performance; wherein Ours is the process of the invention.
TABLE 1 comparison of Performance of various methods based on mean center Point error
(Table 1 is reproduced as an image in the original document.)
Table 2 shows a comparison of the performance of various methods based on Average overlap rate, where Average represents the Average of the Average overlap rate, and the larger the Average overlap rate, the better the tracking performance; wherein Ours is the process of the invention.
TABLE 2 comparison of Performance of various methods based on average overlap ratio
(Table 2 is reproduced as an image in the original document.)
The comparisons of average center point error and average overlap ratio are shown in Tables 1 and 2, respectively; the last row of each table shows the average result of each tracker over all image sequences. Compared with the other advanced tracking methods, the proposed method (the Ours column in the tables) achieves competitive results on almost all image sequences and the best average target tracking performance in both tables. In summary, compared with the prior art, the adaptive-expression-based target tracking method provided by the embodiment of the invention is highly robust and accurate.
The invention adopts the scheme to combine the candidate sample with the template, uses the candidate sample matrix as a dictionary and the template as a regression target, thereby effectively reducing the calculation burden of the algorithm; self-adaptive expression is introduced, the low-rank structure of a candidate sample matrix can be explored through the self-adaptive expression, and the structural relation among candidate samples is effectively utilized; and negative templates are used as expression targets, so that background information is effectively utilized. Compared with the prior art, the method can explore and utilize the relation between the candidate samples, and has the characteristics of high accuracy and strong robustness.
The method provided by the invention can generate an effective tracking effect in an intelligent auxiliary driving system of the automobile, and the road information in front is acquired in real time in an image mode through the vehicle-mounted camera, so that the real-time accurate tracking of the front vehicle target is realized.
Example 2
As shown in fig. 1 to 3, the present embodiment is different from embodiment 1 in that the present invention provides an adaptive representation-based target tracking apparatus, as shown in fig. 3, the apparatus includes:
the acquisition module is used for acquiring the current image frame of the video;
the first judging module is used for judging whether the template exists or not;
the device comprises an initialization setting module, a tracking task starting module and a tracking task stopping module, wherein the initialization setting module is used for setting an initial position of a tracking target in a video image, and corresponding parameters include but are not limited to a state variable variance and a front template updating threshold value and starting a tracking task;
the tracking target template building module is used for building an initial positive template and a negative template of the tracking target near the initial position of the specified tracking target;
the tracking target sample generation module is used for randomly sampling the current image frame of the tracking task to generate a sample of a tracking target;
the adaptive expression calculation module is used for calculating the adaptive expression of the template on the candidate sample;
the target tracking module is used for obtaining self-adaptive expression through calculation and tracking the target of the current frame;
the second judgment module is used for judging whether the tracking task is completed or not;
and the tracking target template updating module is used for updating the positive template and the negative template of the tracking target according to the current tracking result in the tracking process.
Further, the tracking target template building module is specifically configured to:
selecting a specified number of positive templates near the position of the tracking target in the first frame image to construct a positive template set T_pos = [t_1, t_2, …, t_p], where each positive template satisfies ||Q_p − Q_0|| < σ, Q_p denotes the center of the positive template, Q_0 denotes the center of the initial target, and σ is a specified inner threshold;

selecting a specified number of negative templates near the position of the tracking target in the first frame image to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], where each negative template satisfies σ < ||Q_n − Q_0|| < γ, Q_n denotes the center of the negative template, and γ is a specified outer threshold;

constructing the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}] composed of the positive and negative templates; this matrix includes all templates.
Further, the tracking target sample generation module is specifically configured to:
obtaining the state variable z_{t−1} corresponding to the target from the target position tracked in the previous image frame;

sampling the state variable z_t of each candidate sample according to the motion model, i.e. sampling state variables with the probability distribution p(z_t | z_{t−1}) = N(z_t; z_{t−1}, Σ), where Σ is a diagonal covariance matrix;

selecting candidate samples in the current image frame from the sampled state variables z_t, and constructing the candidate sample set C.
Further, the adaptive expression calculation module is specifically configured to:
calculating the adaptive expression matrix X corresponding to the template set T using the following formula (1):

min_{x_i} ||t_i − C x_i||_2^2 + λ ||C Diag(x_i)||_*,  i = 1, 2, …, p+n   (1)

where x_i is the adaptive expression corresponding to the template t_i, the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x_1, x_2, …, x_{p+n}], the Diag(·) operator converts x_i into the diagonal matrix Diag(x_i), C is the video candidate sample set, λ is the regular term coefficient, p is the number of positive templates, and n is the number of negative templates; the above equation (1) is solved using the iterative reweighting algorithm.
Further, the target tracking module is specifically configured to:
constructing a weight matrix W, wherein the weight matrix is constructed by the formula (2):
W_{i,j} = exp(−||C_i − T_j||_2)   (2)

wherein C_i is the ith candidate sample and T_j is the jth template; W_{i,j} is the value in the ith row and jth column of the weight matrix W and represents the similarity between the ith candidate sample and the jth template;
performing point multiplication of the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; recording the re-weighted expression matrix as the expression map M and dividing it into two parts {M_pos, M_neg};
calculating the separability score sc_i of each candidate sample by formula (3):

sc_i = Σ M_{i,pos} − Σ M_{i,neg}   (3)

where M_{i,pos} is the ith row of the matrix M_pos, representing the separable features of the ith candidate sample on the positive template set, M_{i,neg} is the ith row of the matrix M_neg, representing the separable features of the ith candidate sample on the negative template set, and sc_i is the separability score of the ith candidate sample;

since a higher identification score means the candidate sample is more similar to the tracking target, selecting the candidate sample with the highest separability score as the target tracked in the current image frame.
Further, the tracking target template updating module is specifically configured to:
calculating the similarity vector η = (η_1, η_2, …, η_p), where η_i is the similarity, based on Euclidean distance, between the current tracking result and the ith positive template;

calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if ω is greater than θ, updating the positive template corresponding to the maximum similarity with the current tracking result;

selecting a specified number of negative templates near the current tracking result as the new negative template set T_neg, where each negative template satisfies a < ||Q_n − Q_k|| < b, with a and b the specified inner and outer thresholds, Q_n the center of the negative template, and Q_k the center of the current tracking result.
Example 3
As shown in fig. 1 to fig. 3, the present embodiment is different from embodiment 1 in that the present embodiment provides an adaptive representation-based target tracking terminal, which includes: a processor and a memory, wherein the memory stores program code, and when the program code is executed by the processor, the processor is caused to execute the steps of the adaptive expression-based target tracking method of embodiment 1.
Since the terminal is the device that performs the method of the embodiment of the present invention, and the principle by which the terminal solves the problem is similar to that of the method, the implementation of the terminal may refer to the implementation of the method; repeated details are not repeated.
Specifically, as shown in fig. 2, the terminal includes: a processor 200 and a memory 201, wherein the memory 201 stores program code that, when executed by the processor 200, causes the processor 200 to perform the steps of:
s10: acquiring a current video frame;
s11: checking whether the template exists, if not, executing S12 to initialize and track the target template; if the template already exists, executing S13 to generate a candidate sample in the current image frame;
s12: initializing a tracking target template: determining the position of a tracking target in a first frame image of a video through setting or manual designation, selecting a positive template and a negative template near the tracking target, and constructing a template set consisting of the positive template and the negative template, wherein each template is converted into a vector form in advance; the positive template in the invention is used for modeling the information contained in the tracking target, and the negative template in the invention is used for modeling the background information; by using the positive template and the negative template, when the tracked target is mixed with the background, the target can be identified by the target tracking method based on the adaptive expression, so that the target can be tracked;
s13: generating a candidate sample: selecting candidate samples in the current image frame according to the target position tracked in the previous image frame, and constructing a candidate sample set consisting of the candidate samples, wherein each candidate sample is converted into a vector form in advance;
s14: calculating an adaptive expression: taking a matrix formed by the candidate sample set obtained in the step S13 as a dictionary, taking a template as a target of sparse expression, and using a trace lasso as a regular term to calculate to obtain a matrix formed by self-adaptive expression, namely a self-adaptive expression matrix;
s15: tracking by expression: constructing a weight matrix, carrying out point multiplication on the weight matrix and the self-adaptive expression matrix obtained in the step S14 to obtain a re-weighted expression matrix, wherein each element in the re-weighted expression matrix represents separable characteristics of corresponding candidate samples, and calculating the identification score of each candidate sample through the separable characteristics, wherein the candidate sample with the highest identification score is the tracked target;
s16: whether the tracking task is completed or not is checked, and if the current image frame is the last frame of the video, the target tracking task is completed; if the current image frame is not the last frame of the video, executing S17 to update the tracking target template, and continuing the target tracking task;
s17: updating a tracking target template: updating the target template using the result tracked in the current image frame and returning to execute S10; since a new image frame needs to be read in the camera or in the video, and the operation of reading the current image frame needs to be performed each time in the method flow, after the target template is updated, the step to be performed is to read the new image frame, and after S17, the step returns to perform S10.
As an optional implementation, the processor 200 is specifically configured to:
S120: determining the initial position of the tracking target in the first image frame of the video through preset or manually designated coordinates; in a video tracking task, the initial position of the tracking target is generally marked manually according to the specific properties of the task, or is set directly.
S121: selecting a specified number of front templates to construct a front template set T near the position of a tracking target in a first frame imagepos=[t1,t2,…,tp]And the front template meets the following condition, | | Qp-Q0||<σ, wherein QpRepresenting the center of the front template, Q0Represents the center of the initial target, σ is a specified inner threshold; since the front template is an image block, the ith front template is converted into the form t of a vector for calculationi. In order to enable the front template to model information contained in the tracked target, the front template needs to be selected near the tracked target, so that the front template is similar to the tracked target, the information of the tracked target is reserved, the front template is slightly different from the tracked target, and the robustness of the method is improved.
Selecting a specified number of negative templates near the position of the tracking target in the first frame image to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], wherein each negative template satisfies the condition σ < ||Q_n − Q_0|| < σ̄, with Q_n the center of the negative template and σ̄ a specified outer threshold. Since each negative template is an image block, the j-th negative template is converted into vector form t_j for ease of computation. For the negative templates to model the background information, they are selected in an annular region (between the inner and outer thresholds) centered on the tracking target, so that each negative template differs considerably from the tracking target and naturally introduces background information.
Constructing the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}] composed of the positive and negative templates; this matrix includes all templates. The initial template set T thus contains both tracking target information and background information.
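As a rough numerical sketch of the template initialization in S120–S121 (an illustrative assumption, not the patent's reference implementation: the patch size, template counts, threshold values, and the helper name `sample_templates` are all chosen for demonstration), template centers can be drawn around the initial target center and the resulting image blocks vectorized:

```python
import numpy as np

def sample_templates(frame, q0, patch=(24, 24), p=10, n=30,
                     sigma=4.0, sigma_out=20.0, rng=None):
    """Sample p positive and n negative template vectors around center q0.

    Positive centers satisfy ||Q_p - Q_0|| < sigma; negative centers lie in
    the annulus sigma <= ||Q_n - Q_0|| < sigma_out (illustrative thresholds).
    """
    rng = np.random.default_rng(rng)
    h, w = patch

    def crop(center):
        # Cut an h-by-w image block around the center, then vectorize (t_i).
        cy, cx = int(round(center[0])), int(round(center[1]))
        block = frame[cy - h // 2: cy - h // 2 + h,
                      cx - w // 2: cx - w // 2 + w]
        return block.reshape(-1).astype(float)

    def draw(r_min, r_max, count):
        # Rejection-sample centers with r_min <= distance < r_max from q0.
        centers = []
        while len(centers) < count:
            offset = rng.uniform(-r_max, r_max, size=2)
            if r_min <= np.linalg.norm(offset) < r_max:
                centers.append(np.asarray(q0) + offset)
        return centers

    T_pos = np.stack([crop(c) for c in draw(0.0, sigma, p)], axis=1)
    T_neg = np.stack([crop(c) for c in draw(sigma, sigma_out, n)], axis=1)
    return np.hstack([T_pos, T_neg])   # T = [T_pos, T_neg]
```

Positive centers fall inside the inner radius so the templates stay similar to the target; negative centers fall in the annulus between the inner and outer thresholds, which introduces background appearance into T.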
As an optional implementation, the processor 200 is specifically configured to:
S130: obtaining the state variable z_{t-1} corresponding to the target of the previous frame according to the target position tracked in the previous image frame; the state variable z_{t-1} contains six mapping parameters: x-coordinate, y-coordinate, rotation angle, size, aspect ratio, and tilt direction, and these six mapping parameters locate the target position in the video image.
S131: sampling the state variable z_t of the candidate samples according to the motion model, i.e. sampling the state variables from the probability distribution

p(z_t | z_{t-1}) = N(z_t; z_{t-1}, Σ),

wherein Σ is a diagonal covariance matrix. It is difficult to sample candidate samples directly in the current frame image, so an indirect approach is chosen: the state variable z_t of each candidate sample is sampled, and the candidate samples are then taken from the image accordingly. The motion of the tracking target is simplified, without loss of generality, into a change of its corresponding state variable, and this change is modeled with a Gaussian distribution. The expectation of the Gaussian is the state variable z_{t-1} corresponding to the previous frame's target; the diagonal covariance matrix is determined by the specific properties of the tracking data and differs between tracking sequences.
S132: selecting candidate samples in the current image frame from the sampled state variables z_t and constructing the candidate sample set C. The sampled state variables determine the positions of the candidate samples; the candidate samples are converted into vector form and then form the candidate sample matrix C.
As an optional implementation, the processor 200 is specifically configured to:
In step S14, the following formula (1) is adopted to calculate the adaptive expression matrix X corresponding to the template set T:

x_i = argmin_{x_i} (1/2)·||t_i − C·x_i||_2^2 + λ·||C·Diag(x_i)||_*,  i = 1, 2, …, p+n    (1)

wherein x_i is the adaptive expression corresponding to the template t_i, and the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x_1, x_2, …, x_{p+n}]; the Diag(·) operator converts the adaptive expression x_i into the diagonal matrix Diag(x_i); ||·||_* denotes the trace (nuclear) norm of the trace lasso regularizer; C is the candidate sample set; λ is the regularization coefficient; p is the number of positive templates and n is the number of negative templates.
In step S14, the above equation (1) is solved using an iterative reweighting algorithm.
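The patent does not spell out its iterative reweighting steps. One standard iteratively-reweighted least-squares scheme for the trace lasso (from the variational form ||C·Diag(x)||_* = min_S (1/2)[Σ_j x_j² c_jᵀS⁻¹c_j + tr(S)], with optimal S = (C·Diag(x)²·Cᵀ)^{1/2}) alternates a ridge-style solve with a weight update. The sketch below follows that scheme under the assumption that it matches the solver intended here; μ is a small smoothing constant added for numerical stability:

```python
import numpy as np

def trace_lasso_irls(C, t, lam=0.01, mu=1e-6, iters=30):
    """Iteratively reweighted solver sketch for
       min_x 0.5*||t - C x||^2 + lam*||C Diag(x)||_*   (trace lasso).
    """
    d, m = C.shape
    x = np.linalg.lstsq(C, t, rcond=None)[0]      # least-squares warm start
    G = C.T @ C
    for _ in range(iters):
        # S = (C Diag(x)^2 C^T + mu*I)^{1/2}; build S^{-1} via eigh.
        A = C @ np.diag(x ** 2) @ C.T + mu * np.eye(d)
        vals, vecs = np.linalg.eigh(A)
        S_inv = (vecs / np.sqrt(vals)) @ vecs.T   # inverse of the sqrt of A
        # Per-column weights w_j = c_j^T S^{-1} c_j, then a ridge-style solve.
        w = np.einsum('ij,ik,kj->j', C, S_inv, C)
        x = np.linalg.solve(G + lam * np.diag(w), C.T @ t)
    return x
```

With λ small the solution stays close to least squares; larger λ lets the trace lasso interpolate between ℓ1-like and ℓ2-like behavior depending on how correlated the dictionary columns are.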
As an optional implementation, the processor 200 is specifically configured to:
S150: constructing a weight matrix W, wherein the weight matrix is constructed by formula (2):

W_{i,j} = exp(−||C_i − T_j||²)    (2)

wherein C_i is the i-th candidate sample and T_j is the j-th template; W_{i,j} is the entry in row i, column j of the weight matrix W and represents the similarity between the i-th candidate sample and the j-th template;
S151: performing point multiplication of the weight matrix W with the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; the re-weighted expression matrix is recorded as the expression map M and divided into two parts {M_pos, M_neg};
S152: calculating a separability score sc_i for each candidate sample by formula (3):

sc_i = Σ M_{i,pos} − Σ M_{i,neg}    (3)

wherein M_{i,pos} is the i-th row of the matrix M_pos and represents the separable features of the i-th candidate sample on the positive template set, M_{i,neg} is the i-th row of the matrix M_neg and represents the separable features of the i-th candidate sample on the negative template set, and sc_i is the separability score of the i-th candidate sample;
S153: the higher the separability score, the more similar the candidate sample is to the tracking target; therefore the candidate sample with the highest separability score is selected as the target tracked in the current image frame.
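Steps S150–S153 reduce to a few array operations. A numpy sketch, assuming candidates and templates are stored as columns and the expression matrix X has one row per candidate and one column per template (raw pixel vectors would underflow exp(−‖·‖²), so normalized features are assumed):

```python
import numpy as np

def track_by_expression(C, T, X, p):
    """Score candidates: W_{i,j} = exp(-||C_i - T_j||^2), M = W * X,
    sc_i = sum(M_i,pos) - sum(M_i,neg); return the best candidate index.

    C: (d, num_candidates); T: (d, p+n), first p columns are positive
    templates; X: (num_candidates, p+n) adaptive expression matrix.
    """
    diff = C[:, :, None] - T[:, None, :]            # (d, nc, p+n)
    W = np.exp(-np.sum(diff ** 2, axis=0))          # pairwise similarities
    M = W * X                                       # re-weighted expression map
    scores = M[:, :p].sum(axis=1) - M[:, p:].sum(axis=1)  # separability
    return int(np.argmax(scores)), scores
```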
As an optional implementation, the processor 200 is specifically configured to:
S170: calculating the similarity vector η = (η_1, η_2, …, η_p), wherein η_i is the similarity, based on Euclidean distance, between the current tracking result and the i-th positive template;
S171: calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if the maximum similarity ω is larger than the update threshold θ, the positive template corresponding to the maximum similarity is updated with the current tracking result;
S172: selecting a specified number of negative templates near the current tracking result as the new negative template set T_neg, wherein each negative template satisfies the condition a < ||Q_n − Q_k|| < σ̄, with a and σ̄ the specified inner and outer thresholds, and Q_n and Q_k the centers of the negative template and of the current tracking result, respectively.
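The positive-template update in S170–S171 can be sketched as follows; mapping the Euclidean distance d to a similarity via exp(−d) is an assumed choice, since the text only states that the similarity is based on Euclidean distance:

```python
import numpy as np

def update_templates(T_pos, result_vec, theta=0.9):
    """Replace the most similar positive template with the current tracking
    result when its similarity exceeds the update threshold theta.

    T_pos: (d, p) positive templates as columns; result_vec: (d,) tracked
    target vector. exp(-distance) as similarity is an assumption here.
    """
    eta = np.exp(-np.linalg.norm(T_pos - result_vec[:, None], axis=0))
    omega, idx = eta.max(), int(eta.argmax())
    if omega > theta:                    # update only confident matches
        T_pos = T_pos.copy()
        T_pos[:, idx] = result_vec
    return T_pos
```

The negative template set is then rebuilt around the new result as in S172, using the same annulus sampling as at initialization.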
Example 4
As shown in fig. 1 to 3, the present embodiment is different from embodiment 1 in that the present embodiment provides a computer storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the adaptive expression-based target tracking method in embodiment 1.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A target tracking method based on adaptive expression is characterized by comprising the following steps:
s10: acquiring a current video frame;
s11: checking whether the template exists, if not, executing S12 to initialize and track the target template; if the template already exists, executing S13 to generate a candidate sample in the current image frame;
s12: initializing a tracking target template: determining the position of a tracking target in a first frame image of a video through setting or manual designation, selecting a positive template and a negative template near the tracking target, and constructing a template set consisting of the positive template and the negative template, wherein each template is converted into a vector form in advance;
S13: generating candidate samples: selecting candidate samples in the current image frame according to the target position tracked in the previous image frame, and constructing a candidate sample set from these samples, wherein each candidate sample is converted into vector form in advance;
S14: calculating the adaptive expression: taking the matrix formed by the candidate sample set obtained in step S13 as a dictionary and the templates as the targets of sparse expression, and using the trace lasso as the regularization term, computing the matrix formed by the adaptive expressions, i.e. the adaptive expression matrix;
S15: tracking by expression: constructing a weight matrix and point-multiplying it with the adaptive expression matrix obtained in step S14 to obtain a re-weighted expression matrix, wherein each element of the re-weighted expression matrix represents a separable feature of the corresponding candidate sample; the identification score of each candidate sample is calculated from these separable features, and the candidate sample with the highest identification score is the tracked target;
S16: checking whether the tracking task is complete: if the current image frame is the last frame of the video, the target tracking task is finished; if the current image frame is not the last frame, executing S17 to update the tracking target template and continuing the target tracking task;
s17: updating a tracking target template: the target template is updated using the result tracked in the current image frame, and execution returns to S10.
2. The method for tracking an object based on adaptive expression of claim 1, wherein the step S12 specifically includes:
s120: determining an initial position of a tracking target in a first frame image frame of a video through set or manually designated coordinates;
S121: selecting a specified number of positive templates near the position of the tracking target in the first frame image to construct a positive template set T_pos = [t_1, t_2, …, t_p], wherein each positive template satisfies the condition ||Q_p − Q_0|| < σ, with Q_p the center of the positive template, Q_0 the center of the initial target, and σ a specified inner threshold;
selecting a specified number of negative templates near the position of the tracking target in the first frame image to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], wherein each negative template satisfies the condition σ < ||Q_n − Q_0|| < σ̄, with Q_n the center of the negative template and σ̄ a specified outer threshold;
constructing the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}] composed of the positive and negative templates; this matrix includes all templates.
3. The method for tracking an object based on adaptive expression of claim 1, wherein the step S13 specifically includes:
S130: obtaining the state variable z_{t-1} corresponding to the target of the previous frame according to the target position tracked in the previous image frame;
S131: sampling the state variable z_t of the candidate samples according to the motion model, i.e. sampling the state variables from the probability distribution p(z_t | z_{t-1}) = N(z_t; z_{t-1}, Σ), wherein Σ is a diagonal covariance matrix;
S132: selecting candidate samples in the current image frame from the sampled state variables z_t and constructing the candidate sample set C.
4. The method for tracking an object based on adaptive expression of claim 1, wherein the adaptive expression matrix X corresponding to the template set T is calculated in step S14 by using the following formula (1):

x_i = argmin_{x_i} (1/2)·||t_i − C·x_i||_2^2 + λ·||C·Diag(x_i)||_*,  i = 1, 2, …, p+n    (1)

wherein x_i is the adaptive expression corresponding to the template t_i, the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x_1, x_2, …, x_{p+n}], the Diag(·) operator converts the adaptive expression x_i into the diagonal matrix Diag(x_i), ||·||_* denotes the trace (nuclear) norm, C is the candidate sample set, λ is the regularization coefficient, p is the number of positive templates, and n is the number of negative templates;
in step S14, the above equation (1) is solved using an iterative reweighting algorithm.
5. The method for tracking an object based on adaptive expression of claim 1, wherein the step S15 specifically includes:
S150: constructing a weight matrix W, wherein the weight matrix is constructed by formula (2):

W_{i,j} = exp(−||C_i − T_j||²)    (2)

wherein C_i is the i-th candidate sample, T_j is the j-th template, and W_{i,j} is the entry in row i, column j of the weight matrix W, representing the similarity between the i-th candidate sample and the j-th template;
S151: performing point multiplication of the weight matrix W with the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; the re-weighted expression matrix is recorded as the expression map M and divided into two parts {M_pos, M_neg};
S152: calculating a separability score sc_i for each candidate sample by formula (3):

sc_i = Σ M_{i,pos} − Σ M_{i,neg}    (3)

wherein M_{i,pos} is the i-th row of the matrix M_pos, representing the separable features of the i-th candidate sample on the positive template set, M_{i,neg} is the i-th row of the matrix M_neg, representing the separable features of the i-th candidate sample on the negative template set, and sc_i is the separability score of the i-th candidate sample;
S153: selecting the candidate sample with the highest separability score as the target tracked in the current image frame.
6. The method for tracking an object based on adaptive expression of claim 1, wherein the step S17 specifically includes:
S170: calculating the similarity vector η = (η_1, η_2, …, η_p), wherein η_i is the similarity, based on Euclidean distance, between the current tracking result and the i-th positive template;
S171: calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if the maximum similarity ω is larger than the update threshold θ, updating the positive template corresponding to the maximum similarity with the current tracking result;
S172: selecting a specified number of negative templates near the current tracking result as the new negative template set T_neg, wherein each negative template satisfies the condition a < ||Q_n − Q_k|| < σ̄, with a and σ̄ the specified inner and outer thresholds, and Q_n and Q_k the centers of the negative template and of the current tracking result, respectively.
7. The target tracking method based on adaptive expression, characterized in that the method is applied to an intelligent driver-assistance system of an automobile, wherein road information ahead is acquired in real time in the form of images through a vehicle-mounted camera, so as to achieve real-time tracking of the vehicle target ahead.
8. An adaptive representation-based target tracking apparatus, comprising:
the acquisition module is used for acquiring the current image frame of the video;
the first judging module is used for judging whether the template exists or not;
the initialization setting module is used for setting the initial position of the tracking target in the video image and the corresponding parameters, including but not limited to the state-variable variance and the positive template update threshold, and for starting the tracking task;
the tracking target template building module is used for building the initial positive and negative templates of the tracking target near the designated initial position of the tracking target;
the tracking target sample generation module is used for randomly sampling the current image frame of the tracking task to generate a sample of a tracking target;
the adaptive expression calculation module is used for calculating the adaptive expression of the template on the candidate sample;
the target tracking module is used for tracking the target in the current frame according to the adaptive expression obtained through calculation;
the second judgment module is used for judging whether the tracking task is completed or not;
and the tracking target template updating module is used for updating the positive template and the negative template of the tracking target according to the current tracking result in the tracking process.
9. An adaptive representation-based target tracking terminal, the terminal comprising: a processor and a memory, wherein the memory has stored therein a program code, which when executed by the processor causes the processor to perform the steps of the adaptive expression based object tracking method of any one of claims 1 to 7.
10. A computer storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of an adaptive representation-based object tracking method according to any one of claims 1 to 7.
CN202010192778.2A 2020-03-18 2020-03-18 Target tracking method, device and terminal based on adaptive expression Active CN111429481B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010192778.2A CN111429481B (en) 2020-03-18 2020-03-18 Target tracking method, device and terminal based on adaptive expression


Publications (2)

Publication Number Publication Date
CN111429481A true CN111429481A (en) 2020-07-17
CN111429481B CN111429481B (en) 2022-08-05

Family

ID=71548059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010192778.2A Active CN111429481B (en) 2020-03-18 2020-03-18 Target tracking method, device and terminal based on adaptive expression

Country Status (1)

Country Link
CN (1) CN111429481B (en)


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544483A (en) * 2013-10-25 2014-01-29 合肥工业大学 United target tracking method based on local sparse representation and system thereof
CN103810252A (en) * 2014-01-21 2014-05-21 南京信息工程大学 Image retrieval method based on group sparse feature selection
CN104239902A (en) * 2014-09-12 2014-12-24 西安电子科技大学 Hyper-spectral image classification method based on non-local similarity and sparse coding
CN106570884A (en) * 2016-09-30 2017-04-19 天津大学 On-line dictionary updating model-based object tracking method
CN107203747A (en) * 2017-05-17 2017-09-26 江南大学 Sparse conjunctive model method for tracking target based on adaptively selected mechanism
CN107784664A (en) * 2017-12-05 2018-03-09 韶关学院 A kind of fast robust method for tracking target sparse based on K
CN108492312A (en) * 2018-02-26 2018-09-04 大连大学 Visual tracking method based on reverse rarefaction representation under illumination variation
CN109003291A (en) * 2018-07-11 2018-12-14 上饶师范学院 Method for tracking target and device
CN109492671A (en) * 2018-10-16 2019-03-19 北京理工大学 A kind of multipolarization High Range Resolution target identification method based on LSTM
US20190158270A1 (en) * 2017-11-21 2019-05-23 International Business Machines Corporation Exchanging Asset, Maintenance, And Spares Parts Information Via Blockchain


Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
EDOUARD GRAVE: "Trace Lasso: a trace norm regularization for correlated designs", HTTPS://PROCEEDINGS.NEURIPS.CC/PAPER/2011/HASH/33CEB07BF4EEB3DA587E268D663ABA1A-ABSTRACT.HTML *
JIAN-XUN MI: "Adaptive Class Preserving Representation for Image Classification", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
KALAL, ZDENEK: "Tracking-Learning-Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence *
ZHI HAN: "Active Trace: A Sparse Spatiotemporal Representation for Videos", IEEE Access (Volume 5) *
LIU Haicang (刘海仓): "Research on image super-resolution and target tracking methods based on sparse representation", China Doctoral Dissertations Full-text Database (Information Science and Technology) *
KANG Wei (康伟): "Research on target tracking algorithms based on correlation filtering and metric learning", China Master's Theses Full-text Database (Information Science and Technology) *
ZHANG Weidong (张伟东) et al.: "Target tracking algorithm based on feature selection and temporally consistent sparse appearance model", Pattern Recognition and Artificial Intelligence *
YANG Honghong (杨红红), QU Shiru (曲仕茹): "Traffic target tracking based on sparsity-constrained deep learning", China Journal of Highway and Transport *
SU Qiaoping (苏巧平): "Multiple-instance learning target tracking algorithm based on sparse expression", Computer Engineering *
HUANG Hongtu (黄宏图): "A survey of video target tracking based on sparse representation", Acta Automatica Sinica *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070805A (en) * 2020-09-10 2020-12-11 深圳市豪恩汽车电子装备股份有限公司 Motor vehicle target real-time image tracking device and method
CN113486820A (en) * 2021-07-09 2021-10-08 厦门理工学院 Bidirectional target tracking method and system based on efficient template updating and selecting mechanism
CN113486820B (en) * 2021-07-09 2023-06-06 厦门理工学院 Bidirectional target tracking method and system based on efficient template updating and selecting mechanism
CN116108091A (en) * 2022-12-26 2023-05-12 小米汽车科技有限公司 Data processing method, event tracking analysis method, device, equipment and medium
CN116108091B (en) * 2022-12-26 2024-01-23 小米汽车科技有限公司 Data processing method, event tracking analysis method, device, equipment and medium

Also Published As

Publication number Publication date
CN111429481B (en) 2022-08-05

Similar Documents

Publication Publication Date Title
CN111429481B (en) Target tracking method, device and terminal based on adaptive expression
CN111915644B (en) Real-time target tracking method of twin guide anchor frame RPN network
CN109711283A (en) A kind of joint doubledictionary and error matrix block Expression Recognition algorithm
CN104424629A (en) X-ray chest radiography lung segmentation method and device
CN101281648A (en) Method for tracking dimension self-adaption video target with low complex degree
CN110208660B (en) Training method and device for diagnosing partial discharge defects of power equipment
CN109087337B (en) Long-time target tracking method and system based on hierarchical convolution characteristics
Zhang et al. Sparse learning-based correlation filter for robust tracking
Yang et al. Visual tracking with long-short term based correlation filter
CN109784155B (en) Visual target tracking method based on verification and error correction mechanism and intelligent robot
CN114387513A (en) Robot grabbing method and device, electronic equipment and storage medium
CN106485750A (en) A kind of estimation method of human posture based on supervision Local Subspace
Cheng et al. Parallel vertex diffusion for unified visual grounding
CN115239760A (en) Target tracking method, system, equipment and storage medium
Wu et al. An unsupervised real-time framework of human pose tracking from range image sequences
CN117894065A (en) Multi-person scene behavior recognition method based on skeleton key points
CN112614158B (en) Sampling frame self-adaptive multi-feature fusion online target tracking method
CN112733758B (en) Curve-based three-dimensional face recognition method and system under Riemann geometric invariance
Wang et al. Robot grasping in dense clutter via view-based experience transfer
CN111914751B (en) Image crowd density identification detection method and system
CN112507940A (en) Skeleton action recognition method based on difference guidance representation learning network
CN112906432A (en) Error detection and correction method applied to human face key point positioning task
CN112633078A (en) Target tracking self-correcting method, system, medium, equipment, terminal and application
CN113362372B (en) Single target tracking method and computer readable medium
Li et al. Spatial-temporal graph Transformer for object tracking against noise spoofing interference

Legal Events

- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant
- TR01: Transfer of patent right
  Effective date of registration: 20240404
  Address after: 230000 Floor 1, Building 2, Phase I, E-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province
  Patentee after: Dragon Totem Technology (Hefei) Co., Ltd.
  Country or region after: China
  Address before: 400000 No. 2 Chongwen Road, Nanshan Street, Nan'an District, Chongqing
  Patentee before: Chongqing University of Posts and Telecommunications
  Country or region before: China