Disclosure of Invention
To solve the above problems, the present invention provides a target tracking method, apparatus and terminal based on adaptive expression. The invention explores and utilizes the structural relationship between candidate samples when tracking a specified target in a video, thereby achieving accurate tracking of a moving video target.
The invention is realized by the following technical scheme:
in a first aspect, the present invention provides a target tracking method based on adaptive expression, including the following steps:
s10: acquiring a current video frame;
s11: checking whether the template exists: if the template does not exist, executing S12 to initialize the tracking target template; if the template already exists, executing S13 to generate candidate samples in the current image frame;
s12: initializing a tracking target template: determining the position of a tracking target in a first frame image of a video through setting or manual designation, selecting a positive template and a negative template near the tracking target, and constructing a template set consisting of the positive template and the negative template, wherein each template is converted into a vector form in advance; the positive template in the invention is used for modeling the information contained in the tracking target, and the negative template in the invention is used for modeling the background information; by using the positive template and the negative template, when the tracked target is mixed with the background, the target can be identified by the target tracking method based on the adaptive expression, so that the target can be tracked;
s13: generating a candidate sample: selecting candidate samples in the current image frame according to the target position tracked in the previous image frame, and constructing a candidate sample set consisting of the candidate samples, wherein each candidate sample is converted into a vector form in advance;
s14: calculating an adaptive expression: taking a matrix formed by the candidate sample set obtained in the step S13 as a dictionary, taking a template as a target of sparse expression, and using a trace lasso as a regular term to calculate to obtain a matrix formed by self-adaptive expression, namely a self-adaptive expression matrix;
s15: tracking by expression: constructing a weight matrix, carrying out point multiplication on the weight matrix and the self-adaptive expression matrix obtained in the step S14 to obtain a re-weighted expression matrix, wherein each element in the re-weighted expression matrix represents separable characteristics of corresponding candidate samples, and calculating the identification score of each candidate sample through the separable characteristics, wherein the candidate sample with the highest identification score is the tracked target;
s16: whether the tracking task is completed or not is checked, and if the current image frame is the last frame of the video, the target tracking task is completed; if the current image frame is not the last frame of the video, executing S17 to update the tracking target template, and continuing the target tracking task;
s17: updating a tracking target template: the target template is updated using the result tracked in the current image frame, and execution returns to S10.
The working principle is as follows: in video moving-target tracking, the sparse tracking methods of the prior art have a clear limitation: they explore and utilize only the relational information between templates and ignore the relations between candidate samples. Under a dense candidate-sampling strategy, however, some candidate samples can be very similar, so the candidate samples possess a low-rank property. The present scheme combines the candidate samples with the templates, using the candidate sample matrix as the dictionary and the templates as the regression targets, which effectively reduces the computational burden of the algorithm; it introduces the adaptive expression, through which the low-rank structure of the candidate sample matrix can be explored and the structural relations among the candidate samples effectively utilized; and it uses the negative templates as expression targets, so that background information is effectively utilized. Compared with the prior art, the method can explore and utilize the relations between candidate samples and is characterized by high accuracy and strong robustness.
The method provided by the invention is particularly suitable for an intelligent auxiliary driving system of an automobile and can generate an effective tracking effect, and the road information in front is acquired in real time in an image mode through the vehicle-mounted camera, so that the real-time accurate tracking of the front vehicle target is realized.
Further, the step S12 specifically includes:
s120: determining an initial position of a tracking target in a first frame image frame of a video through set or manually designated coordinates; in a video tracking task, the initial position of a tracking target is generally marked manually according to the specific properties of the task, or the initial position of the tracking target is directly set.
S121: selecting a specified number of positive templates near the tracking target position in the first frame image to construct a positive template set T_pos = [t_1, t_2, …, t_p], where each positive template satisfies ||Q_p − Q_0|| < σ, Q_p representing the center of the positive template, Q_0 the center of the initial target, and σ a specified inner threshold. Since each positive template is an image block, the ith positive template is converted into vector form t_i for convenience of computation. To let the positive templates model the information contained in the tracked target, they must be selected near the target, so that they resemble the target and retain its information while still differing slightly from it, which improves the robustness of the method.
Selecting a specified number of negative templates near the tracking target position in the first frame image to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], where each negative template satisfies σ < ||Q_n − Q_0|| < β, Q_n representing the center of the negative template and β a specified outer threshold. Since each negative template is an image block, the jth negative template is converted into vector form t_j for convenience of computation. To let the negative templates model the background information, they are selected in a ring-shaped region around the tracking target, so that they differ substantially from the target and naturally introduce background information.
Constructing the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}] composed of the positive and negative templates; this matrix contains all templates. The initial template set T thus carries both tracking-target information and background information.
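The geometric conditions on template centers can be sketched as follows. This is illustrative only: the function samples center coordinates satisfying the inner-threshold condition for positive templates and a ring condition (inner radius r_inner, outer radius r_outer) for negative templates; patch extraction and vectorization of the image blocks are omitted, and all names are hypothetical.

```python
import math
import random

def sample_template_centers(q0, p, n, sigma, r_inner, r_outer, rng):
    """Sample p positive centers with ||Q_p - q0|| < sigma and n negative
    centers with r_inner < ||Q_n - q0|| < r_outer (a ring around the target).
    Illustrative geometry only; image-patch handling is omitted."""
    def ring_point(r_lo, r_hi):
        # pick a radius inside the open interval and a uniform angle
        r = rng.uniform(r_lo, r_hi)
        a = rng.uniform(0.0, 2.0 * math.pi)
        return (q0[0] + r * math.cos(a), q0[1] + r * math.sin(a))
    pos = [ring_point(0.0, sigma * 0.999) for _ in range(p)]
    neg = [ring_point(r_inner * 1.001, r_outer * 0.999) for _ in range(n)]
    return pos, neg
```

The small 0.999 / 1.001 factors simply keep the sampled radii strictly inside the open intervals required by the conditions.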
Further, the step S13 specifically includes:
S130: obtaining the state variable z_{t−1} corresponding to the previous-frame target from the target position tracked in the previous image frame. The state variable z_{t−1} contains six mapping parameters (x-coordinate, y-coordinate, rotation angle, scale, aspect ratio and skew direction), and these six mapping parameters locate the target position in the video image.
S131: sampling the candidate-sample state variables z_t according to the motion model, i.e. sampling the state variables from the probability distribution p(z_t | z_{t−1}) = N(z_t; z_{t−1}, Σ), where Σ is a diagonal covariance matrix. It is difficult to sample candidate samples directly in the current frame image, so an indirect route is taken: the state variables z_t of the candidate samples are sampled, which realizes sampling of the candidate samples in the image. The motion of the tracked target is thereby simplified to changes of its state variable and, without loss of generality, those changes are modeled with a Gaussian distribution. The expectation of the Gaussian distribution is the previous-frame state variable z_{t−1}, and the diagonal covariance matrix is determined by the specific properties of the tracking data; different tracking data call for different diagonal covariance matrices.
S132: selecting candidate samples in the current image frame from the sampled state variables z_t and constructing the candidate sample set C. The sampled state variables determine the positions of the candidate samples; the candidate samples are converted into vector form and then constitute the candidate sample matrix C.
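The Gaussian motion model of S130 to S132 can be sketched directly: draw each of the six state parameters independently from a normal distribution centered on the previous state, with per-parameter variances from the diagonal covariance. Parameter names and values here are illustrative assumptions, not values from the invention.

```python
import random

def sample_states(z_prev, diag_cov, num, rng):
    """Draw candidate state variables z_t ~ N(z_{t-1}, Sigma), Sigma diagonal,
    as in S131. Each state holds six mapping parameters:
    (x, y, rotation, scale, aspect ratio, skew direction)."""
    samples = []
    for _ in range(num):
        samples.append(tuple(rng.gauss(mu, var ** 0.5)
                             for mu, var in zip(z_prev, diag_cov)))
    return samples
```

Setting a variance to zero freezes the corresponding parameter, which is how a task-specific covariance (e.g. no rotation for a forward-facing vehicle camera) would be expressed.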
Further, in step S14, the adaptive expression matrix X corresponding to the template set T is calculated with the following formula (1):
min over x_i of ||t_i − C x_i||₂² + λ ||C Diag(x_i)||_*,  i = 1, 2, …, p + n  (1)
where x_i is the adaptive expression corresponding to template t_i, the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x_1, x_2, …, x_{p+n}], the Diag(·) operator converts the adaptive expression x_i into the diagonal matrix Diag(x_i), ||·||_* is the nuclear norm (so that ||C Diag(x_i)||_* is the trace-lasso regularizer), C is the video candidate sample set, λ is the regularization coefficient, p is the number of positive templates and n is the number of negative templates;
In step S14, the above equation (1) is solved using an iterative reweighting algorithm.
Further, the step S15 specifically includes:
s150: constructing a weight matrix W, wherein the weight matrix is constructed by the formula (2):
W_{i,j} = exp(−||C_i − T_j||₂)  (2)
where C_i is the ith candidate sample, T_j is the jth template, and W_{i,j}, the value in the ith row and jth column of the weight matrix W, also represents the similarity between the ith candidate sample and the jth template;
S151: performing element-wise (Hadamard) multiplication of the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; the re-weighted expression matrix is recorded as the expression map M and divided into two parts {M_pos, M_neg};
S152: calculating the separability score sc_i of each candidate sample by formula (3),
sc_i = Σ M_{i,pos} − Σ M_{i,neg}  (3)
where M_{i,pos} is the ith row of the matrix M_pos, representing the separable features of the ith candidate sample on the positive template set, M_{i,neg} is the ith row of the matrix M_neg, representing the separable features of the ith candidate sample on the negative template set, and sc_i is the separability score of the ith candidate sample;
S153: the higher the separability score, the more similar the candidate sample is to the tracked target, so the candidate sample with the highest separability score is selected as the target tracked in the current image frame.
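Steps S150 to S152 reduce to a few array operations. A compact numpy sketch follows; the shape convention is an assumption for illustration (columns of C are vectorized candidates, columns of T are vectorized templates with the first p positive, X is the candidates-by-templates adaptive expression matrix).

```python
import numpy as np

def separability_scores(C, T, X, p):
    """W[i,j] = exp(-||C_i - T_j||_2), M = W * X elementwise, and
    sc_i = sum of M over positive columns minus sum over negative columns.
    C: d x m candidates, T: d x (p+n) templates, X: m x (p+n)."""
    # pairwise squared distances between candidate and template columns
    d2 = ((C[:, :, None] - T[:, None, :]) ** 2).sum(axis=0)   # shape m x (p+n)
    W = np.exp(-np.sqrt(d2))           # weight matrix, eq. (2)
    M = W * X                          # re-weighted expression map M = W (.) X
    return M[:, :p].sum(axis=1) - M[:, p:].sum(axis=1)        # eq. (3)
```

The tracked target is then simply `np.argmax(separability_scores(C, T, X, p))`, matching S153.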
Further, the step S17 specifically includes:
S170: calculating the similarity vector η = (η_1, η_2, …, η_p), where η_i is the Euclidean-distance-based similarity between the current tracking result and the ith positive template;
S171: calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if the maximum similarity ω is greater than the update threshold θ, the positive template corresponding to the maximum similarity is updated with the current tracking result;
S172: selecting a specified number of negative templates near the current tracking result as the new negative template set T_neg, where each negative template satisfies α < ||Q_n − Q_k|| < β, α and β being the specified inner and outer thresholds and Q_n and Q_k representing the center of the negative template and the center of the current tracking result, respectively.
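The positive-template update rule of S170 and S171 is a conditional replace-most-similar operation. The sketch below keeps it abstract: the similarity function is passed in (the invention specifies a Euclidean-distance-based score; the toy version in the usage below is only for illustration), and all names are hypothetical.

```python
def update_positive_templates(templates, result, theta, similarity):
    """S170-S171: if the best similarity between the current tracking result
    and any positive template exceeds the update threshold theta, replace
    that template with the result; otherwise leave the set unchanged."""
    sims = [similarity(result, t) for t in templates]   # similarity vector eta
    omega = max(sims)                                   # omega = max(eta)
    if omega > theta:
        templates = list(templates)                     # copy, keep caller's list
        templates[sims.index(omega)] = result
    return templates
```

Replacing only the single most similar template keeps the positive set diverse while still absorbing appearance changes of the target over time.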
Furthermore, the method is suitable for an intelligent auxiliary driving system of an automobile, and road information in front is acquired in real time in an image mode through the vehicle-mounted camera, so that real-time tracking of a front vehicle target is realized.
In a second aspect, the present invention provides an adaptive representation-based target tracking apparatus, including:
the acquisition module is used for acquiring the current image frame of the video;
the first judging module is used for judging whether the template exists or not;
the initialization setting module is used for setting the initial position of the tracking target in the video image and the corresponding parameters, including but not limited to the state-variable variance and the positive-template update threshold, and for starting the tracking task;
the tracking target template building module is used for building an initial positive template and a negative template of the tracking target near the initial position of the specified tracking target;
the tracking target sample generation module is used for randomly sampling the current image frame of the tracking task to generate a sample of a tracking target;
the adaptive expression calculation module is used for calculating the adaptive expression of the template on the candidate sample;
the target tracking module is used for obtaining self-adaptive expression through calculation and tracking the target of the current frame;
the second judgment module is used for judging whether the tracking task is completed or not;
and the tracking target template updating module is used for updating the positive template and the negative template of the tracking target according to the current tracking result in the tracking process.
Further, the tracking target template building module is specifically configured to:
selecting a specified number of positive templates near the tracking target position in the first frame image to construct a positive template set T_pos = [t_1, t_2, …, t_p], where each positive template satisfies ||Q_p − Q_0|| < σ, Q_p representing the center of the positive template, Q_0 the center of the initial target, and σ a specified inner threshold;
selecting a specified number of negative templates near the tracking target position in the first frame image to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], where each negative template satisfies σ < ||Q_n − Q_0|| < β, Q_n representing the center of the negative template and β a specified outer threshold;
constructing the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}] composed of the positive and negative templates; this matrix contains all templates.
Further, the tracking target sample generation module is specifically configured to:
obtaining the state variable z_{t−1} corresponding to the previous-frame target from the target position tracked in the previous image frame;
sampling the candidate-sample state variables z_t according to the motion model, i.e. sampling the state variables from p(z_t | z_{t−1}) = N(z_t; z_{t−1}, Σ), where Σ is a diagonal covariance matrix;
selecting candidate samples in the current image frame from the sampled state variables z_t and constructing the candidate sample set C.
Further, the adaptive expression calculation module is specifically configured to:
calculating the adaptive expression matrix X corresponding to the template set T with the following formula (1):
min over x_i of ||t_i − C x_i||₂² + λ ||C Diag(x_i)||_*,  i = 1, 2, …, p + n  (1)
where x_i is the adaptive expression corresponding to template t_i, the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x_1, x_2, …, x_{p+n}], the Diag(·) operator converts the adaptive expression x_i into the diagonal matrix Diag(x_i), C is the video candidate sample set, λ is the regularization coefficient, p is the number of positive templates and n is the number of negative templates;
the above equation (1) is solved using an iterative reweighting algorithm.
Further, the target tracking module is specifically configured to:
constructing a weight matrix W, wherein the weight matrix is constructed by the formula (2):
W_{i,j} = exp(−||C_i − T_j||₂)  (2)
where C_i is the ith candidate sample, T_j is the jth template, and W_{i,j}, the value in the ith row and jth column of the weight matrix W, also represents the similarity between the ith candidate sample and the jth template;
performing element-wise (Hadamard) multiplication of the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; the re-weighted expression matrix is recorded as the expression map M and divided into two parts {M_pos, M_neg};
calculating the separability score sc_i of each candidate sample by formula (3),
sc_i = Σ M_{i,pos} − Σ M_{i,neg}  (3)
where M_{i,pos} is the ith row of the matrix M_pos, representing the separable features of the ith candidate sample on the positive template set, M_{i,neg} is the ith row of the matrix M_neg, representing the separable features of the ith candidate sample on the negative template set, and sc_i is the separability score of the ith candidate sample;
the higher the separability score, the more similar the candidate sample is to the tracked target, so the candidate sample with the highest separability score is selected as the target tracked in the current image frame.
Further, the tracking target template updating module is specifically configured to:
calculating the similarity vector η = (η_1, η_2, …, η_p), where η_i is the Euclidean-distance-based similarity between the current tracking result and the ith positive template;
calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if the maximum similarity ω is greater than the update threshold θ, the positive template corresponding to the maximum similarity is updated with the current tracking result;
selecting a specified number of negative templates near the current tracking result as the new negative template set T_neg, where each negative template satisfies α < ||Q_n − Q_k|| < β, α and β being the specified inner and outer thresholds and Q_n and Q_k representing the center of the negative template and the center of the current tracking result, respectively.
In a third aspect, the present invention provides an adaptive representation-based target tracking terminal, including: a processor and a memory, wherein the memory has stored therein program code, which when executed by the processor, causes the processor to perform the steps of the above-described adaptive expression-based object tracking method.
In a fourth aspect, the present invention also provides a computer storage medium having a computer program stored thereon, which when executed by a processor, performs the steps of the above-described adaptive expression-based object tracking method.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. according to the invention, the candidate samples are combined with the template, the candidate sample matrix is used as a dictionary, and the template is used as a regression target, so that the calculation burden of the algorithm can be effectively reduced; self-adaptive expression is introduced, the low-rank structure of a candidate sample matrix can be explored through the self-adaptive expression, and the structural relation among candidate samples is effectively utilized; and negative templates are used as expression targets, so that background information is effectively utilized. Compared with the prior art, the method can explore and utilize the relation between candidate samples, and has the characteristics of high accuracy and strong robustness.
2. The method is particularly suitable for an intelligent auxiliary driving system of an automobile and can generate an effective tracking effect, and the road information in front is acquired in real time in an image mode through the vehicle-mounted camera, so that the real-time accurate tracking of the front vehicle target is realized.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 1 to 3, fig. 1 is a flow chart of a target tracking method based on adaptive expression, the target tracking method based on adaptive expression of the present invention is applied to an intelligent auxiliary driving system of an automobile for tracking, and road information in front is acquired in real time in an image manner by a vehicle-mounted camera, so that a front vehicle target is accurately tracked in real time; the method comprises the following steps:
s10: acquiring a current video frame;
the invention can acquire the current image frame through the video stream or the camera.
S11: checking whether the template exists: if the template does not exist, executing S12 to initialize the tracking target template; if the template already exists, executing S13 to generate candidate samples in the current image frame;
s12: initializing a tracking target template: determining the position of a tracked vehicle target in a first frame image of a video through setting or manual designation, selecting a positive template and a negative template near the tracked vehicle target, and constructing a template set consisting of the positive template and the negative template, wherein each template is converted into a vector form in advance;
the positive template in the invention is used for modeling information contained in a tracked vehicle target, and the negative template in the invention is used for modeling background information. By using the positive template and the negative template, when the situation that the tracked vehicle target is mixed with the background occurs, the target tracking method based on the self-adaptive expression can also identify the target, and the tracking of the vehicle target is realized.
Specifically, the step S12 specifically includes:
s120: determining an initial position of a tracking target in a first frame image frame of a video through set or manually designated coordinates; in a video tracking task, the initial position of a tracking target is generally marked manually according to the specific properties of the task, or the initial position of the tracking target is directly set.
S121: selecting a specified number of positive templates near the tracking target position in the first frame image to construct a positive template set T_pos = [t_1, t_2, …, t_p], where each positive template satisfies ||Q_p − Q_0|| < σ, Q_p representing the center of the positive template, Q_0 the center of the initial target, and σ a specified inner threshold. Since each positive template is an image block, the ith positive template is converted into vector form t_i for convenience of computation. To let the positive templates model the information contained in the tracked target, they must be selected near the target, so that they resemble the target and retain its information while still differing slightly from it, which improves the robustness of the method.
Selecting a specified number of negative templates near the tracking target position in the first frame image to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], where each negative template satisfies σ < ||Q_n − Q_0|| < β, Q_n representing the center of the negative template and β a specified outer threshold. Since each negative template is an image block, the jth negative template is converted into vector form t_j for convenience of computation. To let the negative templates model the background information, they are selected in a ring-shaped region around the tracking target, so that they differ substantially from the target and naturally introduce background information.
Constructing the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}] composed of the positive and negative templates; this matrix contains all templates. The initial template set T thus carries both tracking-target information and background information.
S13: generating candidate samples: selecting candidate samples in the current image frame according to the target position tracked in the previous image frame, and constructing a candidate sample set consisting of the candidate samples, wherein each candidate sample is converted into vector form in advance. Since the motion of the tracked target in a video is usually continuous, the target position in the current frame is strongly related to its position in the previous frame. To simplify the model, the invention models the motion model with a Gaussian distribution.
Specifically, the step S13 specifically includes:
s130: obtaining a state variable z corresponding to the target of the previous frame according to the target position tracked in the previous frame of image t-1 (ii) a State variable z t-1 Contains six mapping parameters: x-coordinate, y-coordinate, rotation angle, size, aspect ratio, tilt direction, and these six mapping parameters may locate the target position in the video image.
S131: state variable z of candidate sample according to motion model
t Sampling, i.e. with probability distribution
Sampling the state variables, wherein Σ is a diagonal covariance matrix; it is difficult to sample the candidate sample directly in the current frame image, so an indirect way is selected, namely the state variable z of the candidate sample
t And sampling is carried out, and candidate samples are sampled in the image. The motion change of the tracking target is simplified into the change of the state variable corresponding to the tracking target, the generality is not lost, and the change of the state variable is modeled by using Gaussian distribution. The expectation of the Gaussian distribution is the state variable z corresponding to the target of the previous frame
t-1 The diagonal covariance matrix is determined by the specific properties of the tracking data, and the diagonal covariance matrix is different for different tracking data.
S132: from the sampled state variable z t Candidate samples are selected in the current image frame, and a candidate sample set C is constructed. The sampled state variables are used to determine the positions of the candidate samples, and the candidate samples are converted into a vector form, which then constitutes a candidate sample matrix C.
S14: calculation adaptive tableTo achieve: taking a matrix formed by the candidate sample set obtained in the step S13 as a dictionary, taking a template as a target of sparse expression, and using a trace lasso as a regular term to calculate to obtain a matrix formed by self-adaptive expression, namely a self-adaptive expression matrix; wherein the values of the adaptive expression matrix elements represent the relationship between the template and the corresponding candidate samples. For example, X i,j The relationship between the jth template and the ith candidate sample is shown. In general, the larger the matrix element value, the stronger the relationship between a sample and the corresponding candidate sample.
Specifically, in step S14, the adaptive expression matrix X corresponding to the template set T is calculated with the following formula (1):
min over x_i of ||t_i − C x_i||₂² + λ ||C Diag(x_i)||_*,  i = 1, 2, …, p + n  (1)
where x_i is the adaptive expression corresponding to template t_i, the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x_1, x_2, …, x_{p+n}], the Diag(·) operator converts the adaptive expression x_i into the diagonal matrix Diag(x_i), ||·||_* is the nuclear norm, C is the video candidate sample set, λ is the regularization coefficient, p is the number of positive templates and n is the number of negative templates.
When the candidate samples are mutually uncorrelated, so that the (column-normalized) candidate sample matrix C satisfies C^T C = I, the regularizer decomposes into the l₁ norm, ||C Diag(x_i)||_* = ||x_i||₁. When the candidate samples are highly correlated, so that the sample matrix C has rank 1 and C^T C = 11^T (1 being the vector whose elements are all one), the regularizer becomes the l₂ norm, ||C Diag(x_i)||_* = ||x_i||₂. As studies have shown, in practice the candidate sample matrix C is not full rank, and its rank is typically greater than 1. In summary, the regularization term is a balance between the l₁ norm and the l₂ norm, i.e. ||x_i||₂ ≤ ||C Diag(x_i)||_* ≤ ||x_i||₁, and it adjusts dynamically to the internal structure of the sample matrix C, yielding a sparse expression x_i adapted to the structural relations among the candidate samples.
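The interpolation property of the trace lasso between the l₁ and l₂ norms can be checked numerically. The helper below evaluates ||C Diag(x)||_* via the singular values of the column-scaled dictionary; the stated bounds assume unit-norm columns of C, and the numbers used are arbitrary illustrations.

```python
import numpy as np

def trace_lasso(C, x):
    """Nuclear norm of C Diag(x): C * x scales column i of C by x[i],
    and the nuclear norm is the sum of singular values."""
    return np.linalg.svd(C * x, compute_uv=False).sum()
```

With an orthonormal dictionary the value collapses to the l₁ norm of x; with identical (rank-1) unit columns it collapses to the l₂ norm; for anything in between it sits inside those two bounds, which is exactly the adaptivity argued above.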
In step S14, the above equation (1) is solved using an iterative reweighting algorithm. For a clearer explanation of the solution, the solution idea of equation (1) is explained in this embodiment:
wherein X represents a matrix in a general sense, | | X | | Y
* Representing the nuclear norm of a matrix X, X
T Represents the transpose of the matrix X, (XX)
T )
-1/4 Representative matrix XX
T The root of the fourth square of the root,
representative matrix (XX)
T )
-1/4 The square of the frobenius norm of X;
first, equation (4) is used, and the jth column of the expression matrix X is set to X .j Converting equation (1) to equation (5):
wherein X is an expression matrix, T is a template set, C is a video candidate sample set, lambda is a regular term coefficient, p is the number of positive templates, n is the number of negative templates, X is the number of positive templates .j Represents the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template, Diag (X) .* ) The operator expresses X adaptively .j Conversion to diagonal matrix Diag (X) .j );
Initializing the weight matrix W j :
Where C is the set of video candidate samples, X .j Represents the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template, Diag (X) .* ) The operator expresses X adaptively .j Conversion to diagonal matrix Diag (X) .j );
A simple form of equation (5) can then be obtained:
wherein X is an expression matrix, T is a template set, C is a video candidate sample set, lambda is a regular term coefficient, p is the number of positive templates, n is the number of negative templates, X is the number of positive templates
.j Is the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template,
is corresponding to the adaptive expression X
.j Weight matrix of (2), Diag (X)
.* ) The operator expresses X adaptively
.j Conversion to diagonal matrix Diag (X)
.j );
It is obvious that equation (7) can be written as a sum equation:
wherein X is the expression matrix, T is the template set, T_.j is the jth column of the template set matrix T, C is the video candidate sample set, λ is the regular-term coefficient, p is the number of positive templates, n is the number of negative templates, m is the number of rows of the matrix X (m = height(X)), X_.j is the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template, X_i,j is the value in the ith row and jth column of X, W_j is the weight matrix corresponding to the adaptive expression X_.j, W_j,i denotes the ith column of the weight matrix W_j, and the Diag(·) operator converts the adaptive expression X_.j into the diagonal matrix Diag(X_.j);
Differentiating equation (9) and setting the derivative to zero yields the analytical solution of equation (9); the derivative of equation (9) is as follows:
wherein T is the template set, T_.j is the jth column of the template set matrix T, C is the video candidate sample set, λ is the regular-term coefficient, X_.j is the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template, W_j is the weight matrix corresponding to the adaptive expression X_.j, W_j,i denotes its ith column, and W_j can be calculated from the corresponding X_.j;
and finally, equation (1) is solved using an iterative reweighting method. The solution process repeats the following two steps until the result converges:
1) given W_j, update each column of the matrix X by:
wherein T is the template set, T_.j is the jth column of the template set matrix T, C is the video candidate sample set, X_.j is the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template, and W_j can be calculated from the corresponding X_.j;
2) given X_.j, update each weight matrix W_j by:
where C is the video candidate sample set, X_.j is the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template, and the Diag(·) operator converts the adaptive expression X_.j into the diagonal matrix Diag(X_.j).
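The alternation between steps 1) and 2) above can be sketched in code. This is only an illustrative sketch under stated assumptions: the true update formulas are the equations of the method (not reproduced in this text), so the weight below is assumed, for illustration, to come from diag(CᵀC)·x², and the column update is written as the resulting reweighted ridge system.

```python
import numpy as np

def adaptive_expression(T, C, lam=5.0, n_iter=20, eps=1e-6):
    """Alternate between the column update 1) and the weight update 2).

    T: (d, p+n) template matrix, C: (d, k) candidate-sample dictionary.
    The specific weight form here is an assumption for illustration.
    """
    X = np.linalg.lstsq(C, T, rcond=None)[0]      # warm start: least squares
    G = C.T @ C                                   # Gram matrix of the dictionary
    for _ in range(n_iter):
        for j in range(T.shape[1]):
            x = X[:, j]
            # step 2): weight derived from the current expression X_.j
            w = np.sqrt(np.maximum(np.diag(G) * x ** 2, eps))
            # step 1): column update with the reweighting term on the diagonal
            D = np.diag(lam / (w + eps))
            X[:, j] = np.linalg.solve(G + D, C.T @ T[:, j])
    return X
```

With λ > 0 the system G + D is positive definite, so each column update has a unique solution.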
S15: tracking by expression: constructing a weight matrix, performing dot multiplication on the weight matrix and the adaptive expression matrix obtained in S14 to obtain a re-weighted expression matrix, wherein each element in the re-weighted expression matrix represents separable characteristics of a corresponding candidate sample, calculating the identification score of each candidate sample through the separable characteristics, and the candidate sample with the highest identification score is the tracked target; the weight matrix W represents the similarity relation between the template and the candidate sample, and the tracking result can be more robust and accurate by introducing the weight matrix W.
Specifically, the step S15 specifically includes:
s150: constructing a weight matrix W, wherein the weight matrix is constructed by the formula (2):
W_i,j = exp(-||C_i - T_j||^2) (2)
wherein C_i is the ith candidate sample, T_j is the jth template, and W_i,j, the value in the ith row and jth column of the weight matrix W, represents the similarity between the ith candidate sample and the jth template; the elements of the weight matrix reflect the similarity relation between candidate samples and templates, and introducing the weight matrix reduces the tracking error.
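Equation (2) is straightforward to express directly; a minimal sketch, with candidate samples and templates vectorized as the rows of C and T:

```python
import numpy as np

def weight_matrix(C, T):
    """Equation (2): W[i, j] = exp(-||C_i - T_j||^2).

    C: (k, d) candidate samples as rows; T: (m, d) templates as rows.
    """
    d2 = ((C[:, None, :] - T[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2)
```

A candidate identical to a template gets weight 1, and the weight decays rapidly with distance.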
S151: performing dot multiplication on the weight matrix W and the adaptive expression matrix X to obtain a re-weighted expression matrix M, i.e. M ═ W ^ X, recording the re-weighted expression matrix as an expression map M, and dividing the expression map M into two parts { M ^ M pos ,M neg }; just as the template set is divided into a positive template set and a negative template set according to the labels of the templates themselves, the expression map M is also divided into a positive expression map M according to the corresponding template label of each column p o s And negative expression map M neg 。
S152: calculating a separability score sc for each candidate sample i The separability score is calculated by formula (3),
sc i =∑M i,pos -∑M i,neg (3)
wherein M is i,p o s Is a matrix M p o s Row i of (1), representing the separable features of the ith candidate sample on the front template set, M i,neg Is a matrix M neg Line i of (1), represents the separable features of the ith candidate sample on the negative template set, sc i Is the separability score of the ith candidate sample; a larger positive eigenvalue means that the candidate sample is more similar to the positive template, whereas a larger negative eigenvalue means that the candidate sample is more similar to the negative template.
S153: the higher the identification score is, the more similar the candidate sample is to the tracking target, so the candidate sample with the highest separability score is selected as the target tracked by the current image frame. If a candidate sample has a larger positive characteristic value and a smaller negative characteristic value, the candidate sample is more similar to the positive template and less similar to the negative template, and is more likely to be a tracked target, so that the candidate target with the highest identification score is selected as the tracked target.
S16: whether the tracking task is completed or not is checked, and if the current image frame is the last frame of the video, the target tracking task is completed; if the current image frame is not the last frame of the video, executing S17 to update the tracking target template, and continuing the target tracking task;
s17: updating the tracking target template: updating the target template using the result tracked in the current image frame, then returning to execute S10; because a new image frame must be read from the camera or the video each time the method flow is executed, the step following the template update is to read a new image frame, so after S17 the flow returns to S10. In an actual scene, the scale, posture, and other properties of the tracking target change frequently, so the target template must be updated as long as the tracking task has not stopped; an updated target template contains the latest information about the tracking target, making subsequent tracking more accurate.
Specifically, the step S17 specifically includes:
S170: calculating the similarity vector η = (η_1, η_2, …, η_p), wherein η_i is the Euclidean-distance-based similarity between the current tracking result and the ith positive template; if the tracked target is occluded, it differs greatly from the positive templates, and the template must not be updated with the currently tracked target. The similarity between the current tracking result and the positive templates is therefore calculated to decide whether to update them.
S171: calculating the maximum similarity omega to max (eta), comparing the maximum similarity omega with an updating threshold theta, and if the maximum similarity omega is greater than the updating threshold theta, updating the front template corresponding to the maximum similarity by using the current tracking result; if the tracked target is shielded, the maximum similarity is smaller than the updating threshold, so that the condition that the template is updated by the shielded target is avoided, and the accuracy is reduced; if the tracked target is not shielded, the maximum similarity is larger than the updating threshold, and the tracked result is used for updating the front template, so that the front template can contain the latest target information, and the accuracy of the next tracking task is improved.
S172: selecting a specified number of negative templates near the current tracking result as a new negative template set T
neg And the negative template satisfies the following conditions,
where a and
respectively designated inner and outer thresholds, Q
n And Q
k Representing the center of the negative template and the center of the current trace result, respectively. In an actual scene, the background in the video often changes rapidly, so that the template needs to be updated by the background of the current frame, the negative template can contain the latest background information, and the accuracy of the following tracking task is improved.
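The annulus condition on new negative-template centers (inner and outer thresholds around the current tracking result) can be illustrated by sampling directly in polar coordinates; `sigma` and `beta` here stand for the inner and outer thresholds:

```python
import numpy as np

def sample_negative_centers(Q_k, sigma, beta, n, seed=0):
    """Draw n negative-template centers Q_n with
    sigma < ||Q_n - Q_k|| < beta around the current result Q_k (2-D)."""
    rng = np.random.default_rng(seed)
    r = rng.uniform(sigma, beta, n)             # radii inside the annulus
    ang = rng.uniform(0.0, 2.0 * np.pi, n)      # uniform angles
    return np.asarray(Q_k) + np.stack([r * np.cos(ang), r * np.sin(ang)], axis=1)
```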
In this embodiment, a moving vehicle target can be accurately tracked under dynamic blurring, illumination change, rotation, background clutter, and other challenging conditions. To verify the proposed method, 8 challenging image sequences were selected in the experimental simulation to evaluate performance, and 12 state-of-the-art tracking methods were compared: 1) IVT; 2) APGL1; 3) PN; 4) TD; 5) MIL; 6) FragTrack; 7) MTT; 8) OSPT; 9) ASLAS; 10) LSAT; 11) SCM; 12) DSSM. All of the following experiments were run in Matlab 2019a on a platform with 8 GB of memory and a Core™ i5 CPU.
The experiment of the embodiment of the invention adopts two evaluation criteria: center point error and overlap ratio.
The center point error in an image frame is the Euclidean distance between the center of the tracking bounding box B_T and the center of the corresponding labeled ground-truth bounding box B_G, and the mean center point error is the average of the center point error over the entire image sequence. The overlap ratio is calculated according to the formula provided by the PASCAL VOC challenge, i.e. overlap = area(B_T ∩ B_G) / area(B_T ∪ B_G), and the average overlap ratio is the average of the overlap ratio over the entire image sequence. A good tracker has both a small center point error and a large overlap ratio.
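Both evaluation criteria are simple to compute; a sketch with boxes given as (x, y, w, h) tuples:

```python
import numpy as np

def center_error(bt, bg):
    """Euclidean distance between the centers of boxes (x, y, w, h)."""
    ct = np.array([bt[0] + bt[2] / 2.0, bt[1] + bt[3] / 2.0])
    cg = np.array([bg[0] + bg[2] / 2.0, bg[1] + bg[3] / 2.0])
    return float(np.linalg.norm(ct - cg))

def overlap_ratio(bt, bg):
    """PASCAL VOC criterion: area(Bt ∩ Bg) / area(Bt ∪ Bg)."""
    x1, y1 = max(bt[0], bg[0]), max(bt[1], bg[1])
    x2 = min(bt[0] + bt[2], bg[0] + bg[2])
    y2 = min(bt[1] + bt[3], bg[1] + bg[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = bt[2] * bt[3] + bg[2] * bg[3] - inter
    return inter / union
```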
In the simulation experiments, the parameters were set as follows: the regularization parameter λ is set to 5; the templates and candidate samples are both resized to 20 × 20; the number of candidate samples is set to 500 to 600 depending on the specific test video image sequence; the positive-template update threshold is set to 0.4; and the numbers of positive and negative templates are both set to 10. The regularization parameter λ and the numbers of positive and negative templates were obtained by cross-validation.
Table 1 shows the comparison of performance of various methods based on mean center point error, where Average represents the Average of all mean center point errors, and the smaller the mean center point error, the better the tracking performance; wherein Ours is the process of the invention.
TABLE 1 comparison of Performance of various methods based on mean center Point error
Table 2 shows a comparison of the performance of various methods based on Average overlap rate, where Average represents the Average of the Average overlap rate, and the larger the Average overlap rate, the better the tracking performance; wherein Ours is the process of the invention.
TABLE 2 comparison of Performance of various methods based on average overlap ratio
The comparisons of mean center point error and mean overlap ratio are shown in Tables 1 and 2, respectively; the last row of each table gives the average result of each tracker over all image sequences. Compared with the other state-of-the-art tracking methods, the proposed method (the "Ours" row in the tables) achieves competitive results on almost all image sequences and the best average result in both tables. In summary, compared with the prior art, the adaptive-expression-based target tracking method provided by the embodiment of the invention is both robust and accurate.
The invention adopts the scheme to combine the candidate sample with the template, uses the candidate sample matrix as a dictionary and the template as a regression target, thereby effectively reducing the calculation burden of the algorithm; self-adaptive expression is introduced, the low-rank structure of a candidate sample matrix can be explored through the self-adaptive expression, and the structural relation among candidate samples is effectively utilized; and negative templates are used as expression targets, so that background information is effectively utilized. Compared with the prior art, the method can explore and utilize the relation between the candidate samples, and has the characteristics of high accuracy and strong robustness.
The method provided by the invention can generate an effective tracking effect in an intelligent auxiliary driving system of the automobile, and the road information in front is acquired in real time in an image mode through the vehicle-mounted camera, so that the real-time accurate tracking of the front vehicle target is realized.
Example 2
As shown in fig. 1 to fig. 3, the present embodiment is different from embodiment 1 in that the present invention provides an adaptive representation-based target tracking apparatus, as shown in fig. 3, the apparatus includes:
the acquisition module is used for acquiring the current image frame of the video;
the first judging module is used for judging whether the template exists or not;
the initialization setting module is used for setting the initial position of the tracking target in the video image and the corresponding parameters, including but not limited to the state-variable variance and the positive-template update threshold, and for starting the tracking task;
the tracking target template building module is used for building an initial positive template and a negative template of the tracking target near the initial position of the specified tracking target;
the tracking target sample generation module is used for randomly sampling the current image frame of the tracking task to generate a sample of a tracking target;
the adaptive expression calculation module is used for calculating the adaptive expression of the template on the candidate sample;
the target tracking module is used for obtaining self-adaptive expression through calculation and tracking the target of the current frame;
the second judgment module is used for judging whether the tracking task is completed or not;
and the tracking target template updating module is used for updating the positive template and the negative template of the tracking target according to the current tracking result in the tracking process.
Further, the tracking target template building module is specifically configured to:
selecting a specified number of positive templates near the position of the tracking target in the first frame image to construct the positive template set T_pos = [t_1, t_2, …, t_p], each positive template satisfying ||Q_p - Q_0|| < σ, wherein Q_p denotes the center of the positive template, Q_0 denotes the center of the initial target, and σ is a specified inner threshold;
selecting a specified number of negative templates near the position of the tracking target in the first frame image to form the negative template set T_neg = [t_p+1, t_p+2, …, t_p+n], each negative template satisfying σ < ||Q_n - Q_0|| < β, wherein Q_n denotes the center of the negative template and β is a specified outer threshold;
constructing the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_p+n], a matrix that includes all templates.
Further, the tracking target sample generation module is specifically configured to:
obtaining the state variable z_t-1 corresponding to the previous frame's target according to the target position tracked in the previous image frame;
sampling the state variable z_t of each candidate sample according to the motion model, i.e. sampling the state variables from the Gaussian distribution p(z_t | z_t-1) = N(z_t; z_t-1, Σ), where Σ is a diagonal covariance matrix;
selecting candidate samples in the current image frame from the sampled state variables z_t and constructing the candidate sample set C.
Further, the adaptive expression calculation module is specifically configured to:
calculating an adaptive expression matrix X corresponding to the template set T by adopting the following formula (1):
wherein x_i is the adaptive expression corresponding to the template t_i, the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x_1, x_2, …, x_p+n], the Diag(·) operator converts the adaptive expression x_i into the diagonal matrix Diag(x_i), C is the video candidate sample set, λ is the regular-term coefficient, p is the number of positive templates, and n is the number of negative templates;
the above equation (1) is solved using an iterative reweighting algorithm.
Further, the target tracking module is specifically configured to:
constructing a weight matrix W, wherein the weight matrix is constructed by the formula (2):
W_i,j = exp(-||C_i - T_j||^2) (2)
wherein C_i is the ith candidate sample, T_j is the jth template, and W_i,j, the value in the ith row and jth column of the weight matrix W, represents the similarity between the ith candidate sample and the jth template;
performing dot multiplication of the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; record the re-weighted expression matrix as the expression map M and divide it into two parts {M_pos, M_neg};
calculating the separability score sc_i of each candidate sample by formula (3):
sc_i = ΣM_i,pos - ΣM_i,neg (3)
wherein M_i,pos is the ith row of the matrix M_pos, representing the separable features of the ith candidate sample on the positive template set, M_i,neg is the ith row of the matrix M_neg, representing the separable features of the ith candidate sample on the negative template set, and sc_i is the separability score of the ith candidate sample;
the higher the identification score is, the more similar the candidate sample is to the tracking target, so the candidate sample with the highest separability score is selected as the target tracked by the current image frame.
Further, the tracking target template updating module is specifically configured to:
calculating the similarity vector η = (η_1, η_2, …, η_p), wherein η_i is the Euclidean-distance-based similarity between the current tracking result and the ith positive template;
calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if ω > θ, updating the positive template corresponding to the maximum similarity with the current tracking result;
selecting a specified number of negative templates near the current tracking result as the new negative template set T_neg, each negative template satisfying α < ||Q_n - Q_k|| < β, where α and β are the specified inner and outer thresholds, respectively, and Q_n and Q_k denote the center of the negative template and the center of the current tracking result, respectively.
Example 3
As shown in fig. 1 to fig. 3, the present embodiment is different from embodiment 1 in that the present embodiment provides an adaptive representation-based target tracking terminal, which includes: a processor and a memory, wherein the memory stores program code, and when the program code is executed by the processor, the processor is caused to execute the steps of the adaptive expression-based target tracking method of embodiment 1.
Since the terminal is the device in the method in the embodiment of the present invention, and the principle of the device to solve the problem is similar to that of the method, the implementation of the method and the facility may refer to the implementation of the method, and repeated details are not repeated.
Specifically, as shown in fig. 2, the terminal includes: a processor 200 and a memory 201, wherein the memory 201 stores program code that, when executed by the processor 200, causes the processor 200 to perform the steps of:
s10: acquiring a current video frame;
s11: checking whether the template exists, if not, executing S12 to initialize and track the target template; if the template already exists, executing S13 to generate a candidate sample in the current image frame;
s12: initializing a tracking target template: determining the position of a tracking target in a first frame image of a video through setting or manual designation, selecting a positive template and a negative template near the tracking target, and constructing a template set consisting of the positive template and the negative template, wherein each template is converted into a vector form in advance; the positive template in the invention is used for modeling the information contained in the tracking target, and the negative template in the invention is used for modeling the background information; by using the positive template and the negative template, when the tracked target is mixed with the background, the target can be identified by the target tracking method based on the adaptive expression, so that the target can be tracked;
s13: generating a candidate sample: selecting candidate samples in the current image frame according to the target position tracked in the previous image frame, and constructing a candidate sample set consisting of the candidate samples, wherein each candidate sample is converted into a vector form in advance;
s14: calculating an adaptive expression: taking a matrix formed by the candidate sample set obtained in the step S13 as a dictionary, taking a template as a target of sparse expression, and using a trace lasso as a regular term to calculate to obtain a matrix formed by self-adaptive expression, namely a self-adaptive expression matrix;
s15: tracking by expression: constructing a weight matrix, carrying out point multiplication on the weight matrix and the self-adaptive expression matrix obtained in the step S14 to obtain a re-weighted expression matrix, wherein each element in the re-weighted expression matrix represents separable characteristics of corresponding candidate samples, and calculating the identification score of each candidate sample through the separable characteristics, wherein the candidate sample with the highest identification score is the tracked target;
s16: whether the tracking task is completed or not is checked, and if the current image frame is the last frame of the video, the target tracking task is completed; if the current image frame is not the last frame of the video, executing S17 to update the tracking target template, and continuing the target tracking task;
s17: updating the tracking target template: updating the target template using the result tracked in the current image frame, then returning to execute S10; because a new image frame must be read from the camera or the video each time the method flow is executed, the step following the template update is to read a new image frame, so after S17 the flow returns to S10.
As an optional implementation, the processor 200 is specifically configured to:
s120: determining an initial position of a tracking target in a first frame image frame of a video through set or manually designated coordinates; in a video tracking task, the initial position of a tracking target is generally marked manually according to the specific properties of the task, or the initial position of the tracking target is directly set.
S121: selecting a specified number of front templates to construct a front template set T near the position of a tracking target in a first frame image p o s =[t 1 ,t 2 ,…,t p ]And the front template meets the following condition, | | Q p -Q 0 ||<σ, wherein Q p Representing the center of the front template, Q 0 Represents the center of the initial target, σ is a specified inner threshold; since the front template is an image block, the ith front template is converted into the form t of a vector for calculation i . In order to enable the front template to model information contained in the tracked target, the front template needs to be selected near the tracked target, so that the front template is similar to the tracked target, the information of the tracked target is reserved, the front template is slightly different from the tracked target, and the robustness of the method is improved.
Selecting a specified number of negative templates near the position of the tracking target in the first frame image to form the negative template set T_neg = [t_p+1, t_p+2, …, t_p+n], each negative template satisfying σ < ||Q_n - Q_0|| < β, wherein Q_n denotes the center of the negative template and β is a specified outer threshold. Since each negative template is an image block, the jth negative template is converted into the vector form t_j for ease of computation. To let the negative templates model the background information, they are selected in a ring-shaped area centered on the tracking target, so that each negative template differs substantially from the tracking target and naturally introduces background information.
Constructing the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_p+n], a matrix that includes all templates. The initial template set T contains both tracking-target information and background information.
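The template-set assembly just described (vectorizing each image-block template and stacking the vectors as the columns of T = [T_pos, T_neg]) can be sketched as:

```python
import numpy as np

def build_template_set(pos_patches, neg_patches):
    """Stack vectorized positive then negative image-block templates
    as the columns of the template matrix T = [t_1, ..., t_{p+n}]."""
    cols = [np.asarray(b, float).ravel() for b in pos_patches + neg_patches]
    return np.stack(cols, axis=1)
```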
As an optional implementation, the processor 200 is specifically configured to:
S130: obtaining the state variable z_t-1 corresponding to the previous frame's target according to the target position tracked in the previous image frame; the state variable z_t-1 contains six mapping parameters (x-coordinate, y-coordinate, rotation angle, size, aspect ratio, and tilt direction), which together locate the target position in the video image.
S131: state variable z of candidate sample according to motion model
t Sampling, i.e. with probability distribution
Sampling state variables, wherein Σ is a diagonal covariance matrix; it is difficult to sample the candidate sample directly in the current frame image, so an indirect way is selected, namely the state variable z of the candidate sample
t And sampling is carried out, and candidate samples are sampled in the image. The motion change of the tracking target is simplified into the change of the state variable corresponding to the tracking target, and the change of the state variable is modeled by using Gaussian distribution without loss of generality. The expectation of the Gaussian distribution is the state variable z corresponding to the target of the previous frame
t-1 The diagonal covariance matrix is determined by the specific properties of the tracking data, and the diagonal covariance matrix is different for different tracking data.
S132: from the sampled state variable z t Candidate samples are selected in the current image frame, and a candidate sample set C is constructed. The sampled state variables are used to determine the positions of the candidate samples, and the candidate samples are converted into a vector form, which then constitutes a candidate sample matrix C.
As an optional implementation, the processor 200 is specifically configured to:
in the step S14, the following formula (1) is adopted to calculate the adaptive expression matrix X corresponding to the template set T:
wherein x_i is the adaptive expression corresponding to the template t_i, the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x_1, x_2, …, x_p+n], the Diag(·) operator converts the adaptive expression x_i into the diagonal matrix Diag(x_i), C is the video candidate sample set, λ is the regular-term coefficient, p is the number of positive templates, and n is the number of negative templates;
in step S14, the above equation (1) is solved using an iterative reweighting algorithm.
As an optional implementation, the processor 200 is specifically configured to:
s150: constructing a weight matrix W, wherein the weight matrix is constructed by the formula (2):
W_i,j = exp(-||C_i - T_j||^2) (2)
wherein C_i is the ith candidate sample, T_j is the jth template, and W_i,j, the value in the ith row and jth column of the weight matrix W, represents the similarity between the ith candidate sample and the jth template;
s151: performing dot multiplication of the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; record the re-weighted expression matrix as the expression map M and divide it into two parts {M_pos, M_neg};
S152: calculating a separability score sc for each candidate sample i The separability score is calculated by formula (3),
sc i =∑M i,pos -∑M i,neg (3)
wherein M is i,p o s Is a matrix M p o s Row i of (1), representing the separable features of the ith candidate sample on the front template set, M i,neg Is a matrix M neg Line i of (1), represents the separable features of the ith candidate sample on the negative template set, sc i Is the ithA separability score for the candidate samples;
s153: the higher the identification score is, the more similar the candidate sample is to the tracking target, so the candidate sample with the highest separability score is selected as the target tracked by the current image frame.
As an optional implementation, the processor 200 is specifically configured to:
s170: calculating the similarity vector η = (η_1, η_2, …, η_p), wherein η_i is the Euclidean-distance-based similarity between the current tracking result and the ith positive template;
s171: calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if ω > θ, updating the positive template corresponding to the maximum similarity with the current tracking result;
s172: selecting a specified number of negative templates near the current tracking result as the new negative template set T_neg, each negative template satisfying α < ||Q_n - Q_k|| < β, where α and β are the specified inner and outer thresholds, respectively, and Q_n and Q_k denote the center of the negative template and the center of the current tracking result, respectively.
Example 4
As shown in fig. 1 to 3, the present embodiment is different from embodiment 1 in that the present embodiment provides a computer storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the adaptive expression-based target tracking method in embodiment 1.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.