CN111429481B - Target tracking method, device and terminal based on adaptive expression

Info

Publication number
CN111429481B
Authority
CN
China
Prior art keywords
template
tracking
target
expression
matrix
Prior art date
Legal status
Active
Application number
CN202010192778.2A
Other languages
Chinese (zh)
Other versions
CN111429481A (en)
Inventor
米建勋
李人杰
Current Assignee
Dragon Totem Technology Hefei Co ltd
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202010192778.2A
Publication of CN111429481A
Application granted
Publication of CN111429481B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving models
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V20/00 Scenes; scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, of vehicle lights or traffic lights


Abstract

The invention discloses a target tracking method, device and terminal based on adaptive expression, used for tracking a specified target in a video and achieving accurate tracking of a moving video target. The method comprises the following steps: acquiring the current video frame; checking whether a template set exists; initializing the tracking target template; generating candidate samples; calculating the adaptive expression; tracking by expression; checking whether the tracking task is complete; and updating the tracking target template. Compared with the prior art, the method uses an adaptive expression model, in which the sparse expression adapts to the structural relationships between candidate samples, and introduces negative templates to improve the discrimination of the model, so the target tracking method based on adaptive expression is highly robust and accurate. The method is particularly suitable for producing effective, accurate tracking in an intelligent driver-assistance system of an automobile.

Description

Target tracking method, device and terminal based on adaptive expression
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a target tracking method, a target tracking device and a target tracking terminal based on adaptive expression.
Background
With the development of artificial intelligence technology, visual tracking plays an increasingly important role and is widely applied in scenarios such as video surveillance, human-computer interaction, behavior recognition and action analysis. Although researchers have proposed many different tracking algorithms, tracking still faces many unsolved challenges, such as fast target motion, occlusion, and changes in the target's scale, pose and illumination, all of which require further research.
Currently, sparse representation algorithms have been successfully applied to the tracking problem. Compared with other tracking methods, sparse tracking handles occlusion of the tracked target well and is more robust. Traditional sparse tracking methods take templates that are dynamically updated during tracking as a dictionary and sparsely represent candidate samples over this dictionary. After learning the sparse representations of the candidate samples, the candidate sample with the smallest reconstruction error is selected as the tracked target.
However, current sparse tracking methods still have a notable problem: they ignore the relationships between candidate samples. Because candidate samples are densely sampled, some of them may be very similar, giving the candidate samples a low-rank property. Existing sparse tracking methods only explore and exploit the relationship information between templates while ignoring the relationships between candidate samples.
Disclosure of Invention
To solve the above problems, the present invention provides a target tracking method, apparatus and terminal based on adaptive expression. The invention explores and exploits the structural relationships between candidate samples to track a specified target in a video, thereby achieving accurate tracking of a moving video target.
The invention is realized by the following technical scheme:
In a first aspect, the present invention provides a target tracking method based on adaptive expression, comprising the following steps:
S10: acquire the current video frame;
S11: check whether a template set exists; if not, execute S12 to initialize the tracking target template; if a template set already exists, execute S13 to generate candidate samples in the current image frame;
S12: initialize the tracking target template: determine the position of the tracking target in the first frame image of the video through presetting or manual designation, select positive templates and negative templates near the tracking target, and construct a template set consisting of the positive and negative templates, where each template is converted into vector form in advance. The positive templates model the information contained in the tracking target, and the negative templates model the background information; by using both positive and negative templates, the adaptive-expression-based target tracking method can still distinguish the target when it blends into the background, so the target can be tracked;
S13: generate candidate samples: select candidate samples in the current image frame according to the target position tracked in the previous image frame, and construct a candidate sample set consisting of the candidate samples, where each candidate sample is converted into vector form in advance;
S14: calculate the adaptive expression: taking the matrix formed by the candidate sample set obtained in S13 as a dictionary, the templates as the targets of the sparse expression, and the trace lasso as the regularization term, calculate the matrix formed by the adaptive expressions, i.e. the adaptive expression matrix;
S15: track by expression: construct a weight matrix and element-wise multiply it with the adaptive expression matrix obtained in S14 to obtain a re-weighted expression matrix. Each element of the re-weighted expression matrix represents a separable feature of the corresponding candidate sample; compute each candidate sample's separability score from these features, and the candidate sample with the highest score is the tracked target;
S16: check whether the tracking task is complete: if the current image frame is the last frame of the video, the target tracking task is complete; if not, execute S17 to update the tracking target template and continue the target tracking task;
S17: update the tracking target template: update the target template using the tracking result in the current image frame, and return to S10.
The working principle is as follows: in video moving-target tracking, prior-art sparse tracking methods have a clear shortcoming: they only explore and exploit the relationship information between templates and ignore the relationships between candidate samples, even though, under dense sampling of candidates, some candidate samples may be very similar, giving the candidate sample matrix a low-rank property. The present scheme combines the candidate samples with the templates, using the candidate sample matrix as the dictionary and the templates as the regression targets, which effectively reduces the computational burden of the algorithm; it introduces the adaptive expression, through which the low-rank structure of the candidate sample matrix can be explored and the structural relationships among candidate samples effectively exploited; and it uses negative templates as expression targets, effectively exploiting background information. Compared with the prior art, the method explores and exploits the relationships between candidate samples and is both accurate and robust.
The method provided by the invention is particularly suitable for an intelligent driver-assistance system of an automobile, where it produces an effective tracking effect: the vehicle-mounted camera acquires images of the road ahead in real time, enabling accurate real-time tracking of the vehicle target ahead.
Further, the step S12 specifically includes:
S120: determining the initial position of the tracking target in the first image frame of the video through preset or manually designated coordinates; in a video tracking task, the initial position of the tracking target is generally marked manually according to the specific properties of the task, or set directly.
S121: selecting, near the position of the tracking target in the first frame image, a specified number of positive templates to construct a positive template set T_pos = [t_1, t_2, …, t_p], where each positive template satisfies ||Q_p − Q_0|| < σ, with Q_p the center of the positive template, Q_0 the center of the initial target, and σ a specified inner threshold. Since each positive template is an image block, the i-th positive template is converted into vector form t_i for the convenience of calculation. For the positive templates to model the information contained in the tracked target, they must be selected near the tracked target, so that each positive template is similar to the tracked target and retains its information while differing slightly from it, which improves the robustness of the method.
Selecting, near the position of the tracking target in the first frame image, a specified number of negative templates to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], where each negative template satisfies σ < ||Q_n − Q_0|| < γ, with Q_n the center of the negative template and γ a specified outer threshold. Since each negative template is an image block, the j-th negative template is converted into vector form t_j for ease of computation. For the negative templates to model the background information, they are selected in an annular region centered on the tracking target, so that each negative template differs substantially from the tracking target and naturally introduces background information.
Constructing the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}] consisting of the positive and negative templates; this matrix contains all templates. The initial template set T contains both tracking-target information and background information.
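To make the template construction concrete, the following is a minimal NumPy sketch, not the patent's implementation. It assumes grayscale frames as 2-D arrays, targets away from the image border, 20×20 patches and p = n = 10 (matching the parameter settings reported later in the embodiment), and illustrative pixel thresholds sigma = 2 and gamma = 8; the helper names sample_patch and initialize_templates are hypothetical.

```python
import numpy as np

def sample_patch(frame, center, size=(20, 20)):
    """Crop a size[0] x size[1] patch around center (y, x) and flatten it
    to a unit-norm column vector."""
    h, w = size
    y, x = int(round(center[0])), int(round(center[1]))
    patch = frame[y - h // 2:y + h // 2, x - w // 2:x + w // 2].astype(np.float64)
    v = patch.ravel()
    return v / (np.linalg.norm(v) + 1e-12)

def initialize_templates(frame, target_center, p=10, n=10, sigma=2.0, gamma=8.0,
                         rng=np.random.default_rng(0)):
    """Build T = [T_pos, T_neg]: positive centers satisfy ||Q_p - Q_0|| < sigma,
    negative centers lie in the annulus sigma < ||Q_n - Q_0|| < gamma."""
    target_center = np.asarray(target_center, dtype=float)
    pos, neg = [], []
    while len(pos) < p:
        offset = rng.uniform(-sigma, sigma, size=2)
        if np.linalg.norm(offset) < sigma:          # inside the inner disc
            pos.append(sample_patch(frame, target_center + offset))
    while len(neg) < n:
        offset = rng.uniform(-gamma, gamma, size=2)
        if sigma < np.linalg.norm(offset) < gamma:  # inside the annular region
            neg.append(sample_patch(frame, target_center + offset))
    return np.column_stack(pos + neg)               # columns t_1, ..., t_{p+n}
```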
Further, the step S13 specifically includes:
S130: obtaining the state variable z_{t−1} corresponding to the target of the previous frame according to the target position tracked in the previous image frame; the state variable z_{t−1} contains six mapping parameters: x-coordinate, y-coordinate, rotation angle, size, aspect ratio and tilt direction, and these six mapping parameters locate the target position in the video image.
S131: sampling the state variable z_t of each candidate sample according to the motion model, i.e. sampling the state variables with the probability distribution p(z_t | z_{t−1}) = N(z_t; z_{t−1}, Σ), where Σ is a diagonal covariance matrix. It is difficult to sample candidate samples directly in the current frame image, so an indirect approach is chosen: the state variables z_t of the candidate samples are sampled, which realizes the sampling of the candidate samples in the image. The motion change of the tracking target is simplified into the change of its corresponding state variable and, without loss of generality, modeled with a Gaussian distribution. The expectation of the Gaussian distribution is the state variable z_{t−1} corresponding to the target of the previous frame; the diagonal covariance matrix is determined by the specific properties of the tracking data and differs between tracking datasets.
S132: selecting candidate samples in the current image frame from the sampled state variables z_t and constructing the candidate sample set C. The sampled state variables determine the positions of the candidate samples; the candidate samples are converted into vector form and then form the candidate sample matrix C.
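Candidate generation can be sketched in the same hypothetical style, reusing sample_patch from above. For brevity, only the two translation components of the six-parameter state variable are perturbed; the diagonal covariance default is illustrative, while num = 500 follows the candidate count reported in the embodiment.

```python
def generate_candidates(frame, prev_center, num=500, cov=np.diag([4.0, 4.0]),
                        rng=np.random.default_rng(1)):
    """Sample candidate states from the Gaussian motion model
    p(z_t | z_{t-1}) = N(z_t; z_{t-1}, Sigma) and crop one patch per state."""
    centers = rng.multivariate_normal(np.asarray(prev_center, float), cov, size=num)
    C = np.column_stack([sample_patch(frame, c) for c in centers])
    return C, centers   # C: (400, num) candidate matrix; centers: sampled states
```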
Further, in step S14, the following formula (1) is used to calculate the adaptive expression matrix X corresponding to the template set T:
min_{x_i} ||t_i − C x_i||_2^2 + λ ||C Diag(x_i)||_*,  i = 1, …, p+n   (1)

where x_i is the adaptive expression corresponding to template t_i; the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x_1, x_2, …, x_{p+n}]; the Diag(·) operator converts the adaptive expression x_i into the diagonal matrix Diag(x_i); ||·||_* is the nuclear norm; C is the candidate sample set; λ is the regularization coefficient; p is the number of positive templates; and n is the number of negative templates.
In step S14, the above formula (1) is solved using an iterative reweighting algorithm.
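Since the nuclear norm is the sum of singular values, the trace-lasso regularizer of formula (1) can be evaluated directly from a singular value decomposition. A one-function sketch (hypothetical name), useful for checking an implementation:

```python
def trace_lasso(C, x):
    """||C Diag(x)||_*: the trace-lasso regularizer of formula (1),
    computed as the sum of singular values of C Diag(x)."""
    return np.linalg.svd(C @ np.diag(x), compute_uv=False).sum()
```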
Further, the step S15 specifically includes:
S150: constructing the weight matrix W by formula (2):

W_{i,j} = exp(−||C_i − T_j||_2)   (2)

where C_i is the i-th candidate sample and T_j is the j-th template; W_{i,j}, the entry in the i-th row and j-th column of W, represents the similarity between the i-th candidate sample and the j-th template;
S151: element-wise multiplying the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; recording the re-weighted expression matrix as the expression map M and dividing it into two parts {M_pos, M_neg};
S152: calculating the separability score sc_i of each candidate sample by formula (3):

sc_i = Σ M_{i,pos} − Σ M_{i,neg}   (3)

where M_{i,pos} is the i-th row of matrix M_pos, representing the separable features of the i-th candidate sample on the positive template set; M_{i,neg} is the i-th row of matrix M_neg, representing the separable features of the i-th candidate sample on the negative template set; and sc_i is the separability score of the i-th candidate sample;
S153: the higher the separability score, the more similar the candidate sample is to the tracking target, so the candidate sample with the highest separability score is selected as the target tracked in the current image frame.
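A compact sketch of S150 through S153 under the conventions of the earlier sketches: C holds the m candidate vectors as columns, T holds the p + n templates as columns, and X is the adaptive expression matrix with entry (i, j) relating candidate i to template j; track_by_expression is a hypothetical name.

```python
def track_by_expression(C, T, X, p=10):
    """Formulas (2)-(3): weight matrix W, re-weighted expression map
    M = W * X, separability scores; returns the best candidate's index."""
    # W[i, j] = exp(-||C_i - T_j||_2): similarity of candidate i to template j.
    dists = np.linalg.norm(C[:, :, None] - T[:, None, :], axis=0)   # (m, p+n)
    W = np.exp(-dists)
    M = W * X                                               # expression map
    scores = M[:, :p].sum(axis=1) - M[:, p:].sum(axis=1)    # sc_i, formula (3)
    return int(np.argmax(scores))
```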
Further, the step S17 specifically includes:
S170: calculating the similarity vector η = (η_1, η_2, …, η_p), where η_i is the similarity, based on Euclidean distance, between the current tracking result and the i-th positive template;
S171: calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if ω > θ, updating the positive template corresponding to the maximum similarity with the current tracking result;
S172: selecting a specified number of negative templates near the current tracking result as the new negative template set T_neg, where each negative template satisfies α < ||Q_n − Q_k|| < β, with α and β the specified inner and outer thresholds, and Q_n and Q_k the center of the negative template and the center of the current tracking result, respectively.
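The update step can be sketched likewise. The patent fixes only that the similarity η_i is based on Euclidean distance, so the exp(-distance) form below is an assumption, as are the alpha and beta defaults; theta = 0.4 follows the update threshold reported in the embodiment.

```python
def update_templates(frame, new_center, T, p=10, theta=0.4, alpha=2.0, beta=8.0,
                     rng=np.random.default_rng(2)):
    """S170-S172: replace the most similar positive template when the result
    is confident, then resample all negative templates around the new position."""
    result = sample_patch(frame, new_center)
    # eta_i: Euclidean-distance-based similarity to the i-th positive template
    # (the exp(-distance) form is an assumed choice, not fixed by the patent).
    eta = np.exp(-np.linalg.norm(T[:, :p] - result[:, None], axis=0))
    if eta.max() > theta:                     # omega = max(eta) vs threshold theta
        T[:, int(np.argmax(eta))] = result
    n = T.shape[1] - p
    neg = []
    while len(neg) < n:                       # alpha < ||Q_n - Q_k|| < beta
        offset = rng.uniform(-beta, beta, size=2)
        if alpha < np.linalg.norm(offset) < beta:
            neg.append(sample_patch(frame, np.asarray(new_center, float) + offset))
    T[:, p:] = np.column_stack(neg)
    return T
```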
Furthermore, the method is suitable for an intelligent driver-assistance system of an automobile: the vehicle-mounted camera acquires images of the road ahead in real time, so that the vehicle target ahead is tracked in real time.
In a second aspect, the present invention provides an adaptive representation-based target tracking apparatus, including:
the acquisition module is used for acquiring the current image frame of the video;
the first judging module is used for judging whether the template exists or not;
the initialization setting module, used for setting the initial position of the tracking target in the video image and the corresponding parameters, including but not limited to the state variable variance and the positive template update threshold, and for starting the tracking task;
the tracking target template building module is used for building an initial positive template and a negative template of the tracking target near the initial position of the specified tracking target;
the tracking target sample generation module is used for randomly sampling the current image frame of the tracking task to generate a sample of a tracking target;
the adaptive expression calculation module is used for calculating the adaptive expression of the template on the candidate sample;
the target tracking module, used for tracking the target in the current frame according to the computed adaptive expression;
the second judgment module is used for judging whether the tracking task is completed or not;
and the tracking target template updating module is used for updating the positive template and the negative template of the tracking target according to the current tracking result in the tracking process.
Further, the tracking target template building module is specifically configured to:
selecting, near the position of the tracking target in the first frame image, a specified number of positive templates to construct a positive template set T_pos = [t_1, t_2, …, t_p], where each positive template satisfies ||Q_p − Q_0|| < σ, with Q_p the center of the positive template, Q_0 the center of the initial target, and σ a specified inner threshold;
selecting, near the position of the tracking target in the first frame image, a specified number of negative templates to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], where each negative template satisfies σ < ||Q_n − Q_0|| < γ, with Q_n the center of the negative template and γ a specified outer threshold;
constructing the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}] consisting of the positive and negative templates; this matrix contains all templates.
Further, the tracking target sample generation module is specifically configured to:
obtaining the state variable z_{t−1} corresponding to the target of the previous frame according to the target position tracked in the previous image frame;
sampling the state variable z_t of each candidate sample according to the motion model, i.e. sampling the state variables with the probability distribution p(z_t | z_{t−1}) = N(z_t; z_{t−1}, Σ), where Σ is a diagonal covariance matrix;
selecting candidate samples in the current image frame from the sampled state variables z_t and constructing the candidate sample set C.
Further, the adaptive expression calculation module is specifically configured to:
calculating the adaptive expression matrix X corresponding to the template set T with the following formula (1):

min_{x_i} ||t_i − C x_i||_2^2 + λ ||C Diag(x_i)||_*,  i = 1, …, p+n   (1)

where x_i is the adaptive expression corresponding to template t_i; the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x_1, x_2, …, x_{p+n}]; the Diag(·) operator converts the adaptive expression x_i into the diagonal matrix Diag(x_i); C is the candidate sample set; λ is the regularization coefficient; p is the number of positive templates; and n is the number of negative templates;
solving the above formula (1) using an iterative reweighting algorithm.
Further, the target tracking module is specifically configured to:
constructing the weight matrix W by formula (2):

W_{i,j} = exp(−||C_i − T_j||_2)   (2)

where C_i is the i-th candidate sample and T_j is the j-th template; W_{i,j}, the entry in the i-th row and j-th column of W, represents the similarity between the i-th candidate sample and the j-th template;
element-wise multiplying the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; recording the re-weighted expression matrix as the expression map M and dividing it into two parts {M_pos, M_neg};
calculating the separability score sc_i of each candidate sample by formula (3):

sc_i = Σ M_{i,pos} − Σ M_{i,neg}   (3)

where M_{i,pos} is the i-th row of M_pos, representing the separable features of the i-th candidate sample on the positive template set; M_{i,neg} is the i-th row of M_neg, representing the separable features of the i-th candidate sample on the negative template set; and sc_i is the separability score of the i-th candidate sample;
the higher the separability score, the more similar the candidate sample is to the tracking target, so the candidate sample with the highest separability score is selected as the target tracked in the current image frame.
Further, the tracking target template updating module is specifically configured to:
calculating the similarity vector η = (η_1, η_2, …, η_p), where η_i is the similarity, based on Euclidean distance, between the current tracking result and the i-th positive template;
calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if ω > θ, updating the positive template corresponding to the maximum similarity with the current tracking result;
selecting a specified number of negative templates near the current tracking result as the new negative template set T_neg, where each negative template satisfies α < ||Q_n − Q_k|| < β, with α and β the specified inner and outer thresholds, and Q_n and Q_k the center of the negative template and the center of the current tracking result, respectively.
In a third aspect, the present invention provides an adaptive representation-based target tracking terminal, including: a processor and a memory, wherein the memory has stored therein program code, which when executed by the processor, causes the processor to perform the steps of the above-described adaptive expression-based object tracking method.
In a fourth aspect, the present invention also provides a computer storage medium having a computer program stored thereon, which when executed by a processor, performs the steps of the above-described adaptive expression-based object tracking method.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention combines the candidate samples with the templates, using the candidate sample matrix as the dictionary and the templates as the regression targets, which effectively reduces the computational burden of the algorithm; it introduces the adaptive expression, through which the low-rank structure of the candidate sample matrix can be explored and the structural relationships among candidate samples effectively exploited; and it uses negative templates as expression targets, effectively exploiting background information. Compared with the prior art, the method explores and exploits the relationships between candidate samples and is both accurate and robust.
2. The method is particularly suitable for an intelligent driver-assistance system of an automobile, where it produces an effective tracking effect: the vehicle-mounted camera acquires images of the road ahead in real time, enabling accurate real-time tracking of the vehicle target ahead.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flowchart of a target tracking method based on adaptive expression according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a target tracking terminal based on adaptive expression according to an embodiment of the present invention;
fig. 3 is a schematic diagram of another adaptive expression-based target tracking apparatus according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 1 to 3, fig. 1 is a flowchart of the target tracking method based on adaptive expression. The method is applied to an intelligent driver-assistance system of an automobile: the vehicle-mounted camera acquires images of the road ahead in real time, so that the vehicle target ahead is accurately tracked in real time. The method comprises the following steps:
S10: acquire the current video frame;
The invention can acquire the current image frame from a video stream or from a camera.
S11: check whether a template set exists; if not, execute S12 to initialize the tracking target template; if a template set already exists, execute S13 to generate candidate samples in the current image frame;
S12: initialize the tracking target template: determine the position of the tracked vehicle target in the first frame image of the video through presetting or manual designation, select positive templates and negative templates near the tracked vehicle target, and construct a template set consisting of the positive and negative templates, where each template is converted into vector form in advance;
The positive templates model the information contained in the tracked vehicle target, and the negative templates model the background information. By using both positive and negative templates, the target tracking method based on adaptive expression can still distinguish the target when the tracked vehicle blends into the background, so the vehicle target can be tracked.
Specifically, the step S12 includes:
S120: determine the initial position of the tracking target in the first image frame of the video through preset or manually designated coordinates. In a video tracking task, the initial position of the tracking target is generally marked manually according to the specific properties of the task, or set directly.
S121: near the position of the tracking target in the first frame image, select a specified number of positive templates to construct a positive template set T_pos = [t_1, t_2, …, t_p], where each positive template satisfies ||Q_p − Q_0|| < σ, with Q_p the center of the positive template, Q_0 the center of the initial target, and σ a specified inner threshold. Since each positive template is an image block, the i-th positive template is converted into vector form t_i for calculation. For the positive templates to model the information contained in the tracked target, they must be selected near the tracked target, so that each positive template is similar to the tracked target and retains its information while differing slightly from it, which improves the robustness of the method.
Near the position of the tracking target in the first frame image, select a specified number of negative templates to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], where each negative template satisfies σ < ||Q_n − Q_0|| < γ, with Q_n the center of the negative template and γ a specified outer threshold. Since each negative template is an image block, the j-th negative template is converted into vector form t_j for ease of computation. For the negative templates to model the background information, they are selected in an annular region centered on the tracking target, so that each negative template differs substantially from the tracking target and naturally introduces background information.
Construct the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}] consisting of the positive and negative templates; this matrix contains all templates. The initial template set T contains both tracking-target information and background information.
S13: generating a candidate sample: selecting candidate samples in the current image frame according to the target position tracked in the previous image frame, and constructing a candidate sample set consisting of the candidate samples, wherein each candidate sample is converted into a vector form in advance; since the motion of the tracking target in the video usually has continuity, the position of the tracking target in the current frame has a great relationship with the position of the tracking target in the previous frame. To simplify the model, the present invention models the motion model using a Gaussian distribution.
Specifically, the step S13 includes:
S130: obtain the state variable z_{t−1} corresponding to the target of the previous frame according to the target position tracked in the previous image frame. The state variable z_{t−1} contains six mapping parameters: x-coordinate, y-coordinate, rotation angle, size, aspect ratio and tilt direction; these six mapping parameters locate the target position in the video image.
S131: sample the state variable z_t of each candidate sample according to the motion model, i.e. sample the state variables with the probability distribution p(z_t | z_{t−1}) = N(z_t; z_{t−1}, Σ), where Σ is a diagonal covariance matrix. It is difficult to sample candidate samples directly in the current frame image, so an indirect approach is chosen: the state variables z_t of the candidate samples are sampled, and the candidate samples are thereby sampled in the image. The motion change of the tracking target is simplified into the change of its corresponding state variable and, without loss of generality, modeled with a Gaussian distribution. The expectation of the Gaussian distribution is the state variable z_{t−1} corresponding to the target of the previous frame; the diagonal covariance matrix is determined by the specific properties of the tracking data and differs between tracking datasets.
S132: select candidate samples in the current image frame from the sampled state variables z_t and construct the candidate sample set C. The sampled state variables determine the positions of the candidate samples; the candidate samples are converted into vector form and then form the candidate sample matrix C.
S14: calculation adaptive tableTo achieve: taking a matrix formed by the candidate sample set obtained in the step S13 as a dictionary, taking a template as a target of sparse expression, and using a trace lasso as a regular term to calculate to obtain a matrix formed by self-adaptive expression, namely a self-adaptive expression matrix; wherein the values of the adaptive expression matrix elements represent the relationship between the template and the corresponding candidate samples. For example, X i,j The relationship between the jth template and the ith candidate sample is shown. In general, the larger the matrix element value, the stronger the relationship between a sample and the corresponding candidate sample.
Specifically, in step S14, the following formula (1) is used to calculate the adaptive expression matrix X corresponding to the template set T:
Figure BDA0002416513070000101
wherein x is i Is corresponding to the template t i The adaptive expression matrix X is a set of adaptive expressions, i.e. X ═ X 1 ,x 2 ,…,x p+n ],Diag(x * ) The operator expresses x adaptively i Conversion to diagonal matrix Diag (x) i ) C is a video candidate sample set, lambda is a regular term coefficient, p is the number of positive templates, and n is the number of negative templates;
when the candidate samples are each different, which means that the candidate sample matrix C is column-full-rank, the regularization is decomposed into
Figure BDA0002416513070000102
When there is a high correlation between the candidate samples, this means that the rank of the sample matrix C is 1, and C T C=11 T (1 is a vector with element sizes of one), the regularization becomes
Figure BDA0002416513070000103
The candidate sample matrix C is not full rank, as studies have shown, it typically has a rank number greater than 1. In summary, the regularization term is/ 1 Norm and l 2 A balance between norms, i.e. | | x i || 2 <||CDiag(x i )|| * <||x i || 1 And dynamic adjustment can be carried out according to the internal structure of the matrix sample C, so as to obtain a sparse expression x adaptive to the structure relation of the candidate samples i
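The two extreme cases and the interpolation property are easy to verify numerically with the trace_lasso helper sketched earlier; in this check, orthonormal columns stand in for the "all different" case and all matrices have unit-norm columns, the setting in which the bounds hold.

```python
rng = np.random.default_rng(3)
x = rng.normal(size=6)

Q = np.linalg.qr(rng.normal(size=(20, 6)))[0]        # orthonormal columns
c = rng.normal(size=(20, 1)); c /= np.linalg.norm(c)
R1 = np.tile(c, (1, 6))                              # identical unit columns, rank 1

print(np.isclose(trace_lasso(Q, x), np.abs(x).sum()))      # l1 extreme: True
print(np.isclose(trace_lasso(R1, x), np.linalg.norm(x)))   # l2 extreme: True

G = rng.normal(size=(20, 6))
G /= np.linalg.norm(G, axis=0)                       # general unit-norm columns
v = trace_lasso(G, x)
print(np.linalg.norm(x) <= v <= np.abs(x).sum())     # interpolation: True
```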
In step S14, the above formula (1) is solved using an iterative reweighting algorithm. For a clearer explanation, the solution idea of formula (1) is described in this embodiment.
Formula (4) is known:

||X||_* = ||(X X^T)^{-1/4} X||_F^2   (4)

where X denotes a matrix in the general sense, ||X||_* denotes the nuclear norm of X, X^T the transpose of X, (X X^T)^{-1/4} the negative quarter power of the matrix X X^T, and ||(X X^T)^{-1/4} X||_F^2 the squared Frobenius norm of the matrix (X X^T)^{-1/4} X.
First, using formula (4) and writing the j-th column of the expression matrix X as X_{.j}, formula (1) is converted into formula (5):

min_X Σ_{j=1}^{p+n} ( ||T_{.j} − C X_{.j}||_2^2 + λ ||C Diag(X_{.j})||_* )   (5)

where X is the expression matrix, T is the template set, C is the candidate sample set, λ is the regularization coefficient, p is the number of positive templates, n is the number of negative templates, X_{.j} denotes the j-th column of X, corresponding to the adaptive expression of the j-th template, and the Diag(·) operator converts X_{.j} into the diagonal matrix Diag(X_{.j}).
Initialize the weight matrices W^j by formula (6):

W^j = [ (C Diag(X_{.j})) (C Diag(X_{.j}))^T ]^{-1/4}   (6)

A simple form of formula (5) is then obtained:

min_X Σ_{j=1}^{p+n} ( ||T_{.j} − C X_{.j}||_2^2 + λ ||W^j C Diag(X_{.j})||_F^2 )   (7)

where W^j is the weight matrix corresponding to the adaptive expression X_{.j}.
Clearly, formula (7) can be written as a sum. Let m = height(X) be the number of rows of X; since the i-th column of C Diag(X_{.j}) is X_{i,j} C_{.i},

||W^j C Diag(X_{.j})||_F^2 = Σ_{i=1}^m X_{i,j}^2 ||(W^j C)_{.i}||_2^2   (8)

so the objective becomes

min_X Σ_{j=1}^{p+n} ( ||T_{.j} − C X_{.j}||_2^2 + λ Σ_{i=1}^m X_{i,j}^2 ||(W^j C)_{.i}||_2^2 )   (9)

where T_{.j} is the j-th column of the template set matrix T, X_{i,j} is the entry in the i-th row and j-th column of X, and (W^j C)_{.i} is the i-th column of the weighted candidate matrix W^j C.
Differentiating formula (9) with respect to X_{.j} and setting the derivative to zero yields the analytic solution. The derivative is

−2 C^T (T_{.j} − C X_{.j}) + 2 λ D^j X_{.j} = 0   (10)

where D^j = Diag( ||(W^j C)_{.1}||_2^2, …, ||(W^j C)_{.m}||_2^2 ), and therefore

X_{.j} = (C^T C + λ D^j)^{-1} C^T T_{.j}   (11)

where D^j can be calculated from the corresponding W^j.
Finally, formula (1) is solved with the iterative reweighting method. The solution process repeats the following two steps until the result converges:
1) given the weight matrices W^j, update each column of the matrix X by formula (11), X_{.j} = (C^T C + λ D^j)^{-1} C^T T_{.j}, where T_{.j} is the j-th column of the template set matrix T, C is the candidate sample set, and D^j is calculated from the corresponding W^j;
2) given X, update each weight matrix by formula (6), W^j = [ (C Diag(X_{.j})) (C Diag(X_{.j}))^T ]^{-1/4}, where the Diag(·) operator converts the adaptive expression X_{.j} into the diagonal matrix Diag(X_{.j}).
S15: track by expression: construct a weight matrix and element-wise multiply it with the adaptive expression matrix obtained in S14 to obtain a re-weighted expression matrix. Each element of the re-weighted expression matrix represents a separable feature of the corresponding candidate sample; the separability score of each candidate sample is computed from these features, and the candidate sample with the highest score is the tracked target. The weight matrix W represents the similarity relationship between the templates and the candidate samples; introducing it makes the tracking result more robust and accurate.
Specifically, the step S15 includes:
S150: construct the weight matrix W by formula (2):

W_{i,j} = exp(−||C_i − T_j||_2)   (2)

where C_i is the i-th candidate sample and T_j is the j-th template; W_{i,j}, the entry in the i-th row and j-th column of W, represents the similarity between the i-th candidate sample and the j-th template. The elements of the weight matrix reflect the similarity relationship between candidate samples and templates; introducing the weight matrix reduces the tracking error.
S151: element-wise multiply the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; record the re-weighted expression matrix as the expression map M and divide it into two parts {M_pos, M_neg}. Just as the template set is divided into positive and negative template sets according to the labels of the templates, the expression map M is divided, according to the template label corresponding to each column, into the positive expression map M_pos and the negative expression map M_neg.
S152: calculate the separability score sc_i of each candidate sample by formula (3):

sc_i = Σ M_{i,pos} − Σ M_{i,neg}   (3)

where M_{i,pos} is the i-th row of M_pos, representing the separable features of the i-th candidate sample on the positive template set; M_{i,neg} is the i-th row of M_neg, representing the separable features of the i-th candidate sample on the negative template set; and sc_i is the separability score of the i-th candidate sample. A larger positive feature value means the candidate sample is more similar to the positive templates, whereas a larger negative feature value means it is more similar to the negative templates.
S153: the higher the separability score, the more similar the candidate sample is to the tracking target, so the candidate sample with the highest separability score is selected as the target tracked in the current image frame. If a candidate sample has large positive feature values and small negative feature values, it is similar to the positive templates and dissimilar from the negative templates, and is therefore more likely to be the tracked target.
S16: check whether the tracking task is complete: if the current image frame is the last frame of the video, the target tracking task is complete; if not, execute S17 to update the tracking target template and continue the target tracking task;
S17: update the tracking target template: update the target template using the tracking result in the current image frame, and return to S10. Because a new image frame must be read from the camera or the video, the operation of reading the current image frame is executed on every pass through the method flow; after the target template is updated, the next step is to read a new image frame, so execution returns from S17 to S10. In a real scene, the scale, pose and so on of the tracking target change frequently, so as long as the tracking task has not stopped, the target template must be updated. The updated target template contains the latest information about the tracking target, making the subsequent tracking task more accurate.
Specifically, the step S17 includes:
S170: calculate the similarity vector η = (η_1, η_2, …, η_p), where η_i is the similarity, based on Euclidean distance, between the current tracking result and the i-th positive template. If the tracked target is occluded, it differs greatly from the positive templates, and the template must not be updated with the currently tracked target; to decide whether to update the positive templates, the similarity between the current tracking result and the positive templates is calculated.
S171: calculate the maximum similarity ω = max(η) and compare it with the update threshold θ; if ω > θ, update the positive template corresponding to the maximum similarity with the current tracking result. If the tracked target is occluded, the maximum similarity falls below the update threshold, which avoids updating the template with an occluded target and thereby reducing accuracy; if the tracked target is not occluded, the maximum similarity exceeds the update threshold and the tracking result is used to update the positive template, so the positive templates contain the latest target information and the accuracy of the next tracking task improves.
S172: select a specified number of negative templates near the current tracking result as the new negative template set T_neg, where each negative template satisfies α < ||Q_n − Q_k|| < β, with α and β the specified inner and outer thresholds, and Q_n and Q_k the center of the negative template and the center of the current tracking result, respectively. In a real scene the background in the video often changes rapidly, so the templates must be updated with the background of the current frame; the negative templates then contain the latest background information, improving the accuracy of the subsequent tracking task.
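With the hypothetical helpers sketched above (initialize_templates, generate_candidates, adaptive_expression, track_by_expression, update_templates), the whole flow S10 through S17 wires together as a simple loop; this driver is illustrative only.

```python
def track_video(frames, init_center):
    """Illustrative end-to-end loop for S10-S17, using the sketches above."""
    templates, center = None, np.asarray(init_center, dtype=float)
    trajectory = []
    for frame in frames:                                    # S10: current frame
        if templates is None:                               # S11 -> S12
            templates = initialize_templates(frame, center)
        else:                                               # S11 -> S13
            C, centers = generate_candidates(frame, center)
            X = adaptive_expression(C, templates)           # S14: formula (1)
            best = track_by_expression(C, templates, X)     # S15: formulas (2)-(3)
            center = centers[best]
            templates = update_templates(frame, center, templates)  # S17
        trajectory.append(center.copy())                    # S16: stop after last frame
    return trajectory
```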
In this embodiment, the moving vehicle target can be accurately tracked under conditions such as motion blur, illumination change, rotation and background clutter of the tracked vehicle. To verify the proposed method, 8 challenging image sequences were selected in the experimental simulation to evaluate performance, and 12 state-of-the-art tracker methods were compared: 1) IVT; 2) APGL1; 3) PN; 4) TD; 5) MIL; 6) FragTrack; 7) MTT; 8) OSPT; 9) ASLAS; 10) LSAT; 11) SCM; 12) DSSM. All of the following experiments were run on a platform based on Matlab 2019a, with 8 GB of memory and an Intel Core i5 CPU.
The experiment of the embodiment of the invention adopts two evaluation criteria: center point error and overlap ratio.
The center point error in an image frame refers to the Euclidean distance between the center point of the tracking bounding box B_T and the center point of the corresponding labeled ground-truth bounding box B_G; the mean center point error is the average of the center point error of a tracker over the entire image sequence. The overlap ratio is calculated according to the formula provided by the PASCAL VOC challenge, i.e.

overlap = area(B_T ∩ B_G) / area(B_T ∪ B_G)

and the average overlap ratio is the average of the overlap ratio of a tracker over the entire image sequence. A good tracker has not only a small center point error but also a large overlap ratio.
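Both criteria are straightforward to compute for axis-aligned boxes given as (x, y, w, h); a small sketch with hypothetical helper names:

```python
def center_error(bt, bg):
    """Euclidean distance between the centers of tracking box B_T and
    ground-truth box B_G, both given as (x, y, w, h)."""
    ct = np.array([bt[0] + bt[2] / 2.0, bt[1] + bt[3] / 2.0])
    cg = np.array([bg[0] + bg[2] / 2.0, bg[1] + bg[3] / 2.0])
    return float(np.linalg.norm(ct - cg))

def overlap_ratio(bt, bg):
    """PASCAL VOC overlap: area(B_T intersect B_G) / area(B_T union B_G)."""
    x1, y1 = max(bt[0], bg[0]), max(bt[1], bg[1])
    x2 = min(bt[0] + bt[2], bg[0] + bg[2])
    y2 = min(bt[1] + bt[3], bg[1] + bg[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = bt[2] * bt[3] + bg[2] * bg[3] - inter
    return inter / union if union > 0 else 0.0
```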
In the simulation experiment, the parameters were set as follows: the regularization parameter λ is set to 5, the sizes of the templates and the candidate samples are both adjusted to 20 × 20, the number of candidate samples is set to 500 to 600 according to the specific form of the test video image sequence, the positive template update threshold is set to 0.4, and the numbers of positive templates and negative templates are both set to 10. The regular parameter lambda, the number of the positive templates and the number of the negative templates are obtained by a cross validation method.
Table 1 shows the comparison of performance of various methods based on mean center point error, where Average represents the Average of all mean center point errors, and the smaller the mean center point error, the better the tracking performance; wherein Ours is the process of the invention.
TABLE 1: Comparison of performance of various methods based on mean center point error
(The table is reproduced as an image in the original document.)
Table 2 shows a comparison of the performance of various methods based on Average overlap rate, where Average represents the Average of the Average overlap rate, and the larger the Average overlap rate, the better the tracking performance; wherein Ours is the process of the invention.
TABLE 2: Comparison of performance of various methods based on average overlap ratio
(The table is reproduced as an image in the original document.)
The comparisons of mean center point error and average overlap ratio are shown in tables 1 and 2, respectively; the last row of each table shows the average result of each tracker over all image sequences. Compared with the other state-of-the-art tracking methods, the proposed method (the "Ours" method in the tables) achieves competitive results on almost all image sequences and the best average target tracking performance in both tables. In summary, compared with the prior art, the target tracking method based on adaptive expression provided by this embodiment of the invention is robust and accurate.
The invention combines the candidate samples with the templates, using the candidate sample matrix as the dictionary and the templates as the regression targets, which effectively reduces the computational burden of the algorithm; it introduces the adaptive expression, through which the low-rank structure of the candidate sample matrix can be explored and the structural relationships among candidate samples effectively exploited; and it uses negative templates as expression targets, effectively exploiting background information. Compared with the prior art, the method explores and exploits the relationships between candidate samples and is both accurate and robust.
The method produces an effective tracking effect in an intelligent driver-assistance system of an automobile: the vehicle-mounted camera acquires images of the road ahead in real time, enabling accurate real-time tracking of the vehicle target ahead.
Example 2
As shown in fig. 1 to fig. 3, this embodiment differs from embodiment 1 in that the invention provides a target tracking apparatus based on adaptive expression; as shown in fig. 3, the apparatus includes:
the acquisition module is used for acquiring the current image frame of the video;
the first judging module is used for judging whether the template exists or not;
the initialization setting module, used for setting the initial position of the tracking target in the video image and the corresponding parameters, including but not limited to the state variable variance and the positive template update threshold, and for starting the tracking task;
the tracking target template building module is used for building an initial positive template and a negative template of the tracking target near the initial position of the specified tracking target;
the tracking target sample generation module is used for randomly sampling the current image frame of the tracking task to generate a sample of a tracking target;
the adaptive expression calculation module is used for calculating the adaptive expression of the template on the candidate sample;
the target tracking module, used for tracking the target in the current frame according to the computed adaptive expression;
the second judgment module is used for judging whether the tracking task is completed or not;
and the tracking target template updating module is used for updating the positive template and the negative template of the tracking target according to the current tracking result in the tracking process.
Further, the tracking target template building module is specifically configured to:
selecting a specified number of positive templates near the tracking target position in the first frame image to construct a positive template set T_pos = [t_1, t_2, …, t_p], where each positive template satisfies ||Q_p − Q_0|| < σ, with Q_p the center of the positive template, Q_0 the center of the initial target, and σ a specified inner threshold;
selecting a specified number of negative templates near the tracking target position in the first frame image to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], where each negative template satisfies σ < ||Q_n − Q_0|| < γ, with Q_n the center of the negative template and γ a specified outer threshold;
constructing the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}], a matrix containing all templates.
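Purely as an illustration of this module, the following is a minimal sketch in Python/NumPy, assuming grayscale frames stored as 2-D arrays, a target comfortably away from the frame borders, and fixed 32×32 patches; the function names, patch size, sampling strategy, and threshold defaults are assumptions for the sketch, not values taken from the patent.

```python
import numpy as np

def crop_patch(frame, center, size=32):
    """Crop a size x size patch centered at (x, y) and flatten it to a vector."""
    x, y = int(round(center[0])), int(round(center[1]))
    h = size // 2
    patch = frame[y - h:y + h, x - h:x + h]
    return patch.astype(np.float64).ravel()

def build_templates(frame, q0, p=10, n=20, sigma=4.0, sigma_out=12.0, rng=None):
    """Build p positive templates with centers satisfying ||Q_p - Q_0|| < sigma and
    n negative templates in the ring sigma < ||Q_n - Q_0|| < sigma_out."""
    rng = rng if rng is not None else np.random.default_rng(0)
    q0 = np.asarray(q0, dtype=float)
    pos, neg = [], []
    while len(pos) < p:
        offset = rng.normal(scale=sigma / 2.0, size=2)
        if np.linalg.norm(offset) < sigma:                  # inner threshold
            pos.append(crop_patch(frame, q0 + offset))
    while len(neg) < n:
        offset = rng.uniform(-sigma_out, sigma_out, size=2)
        if sigma < np.linalg.norm(offset) < sigma_out:      # annular region around target
            neg.append(crop_patch(frame, q0 + offset))
    return np.column_stack(pos + neg)   # template matrix T = [T_pos, T_neg], d x (p+n)
```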
Further, the tracking target sample generation module is specifically configured to:
obtaining the state variable z_{t−1} corresponding to the target in the previous frame according to the target position tracked in the previous image frame;
sampling the state variable z_t of each candidate sample according to the motion model, i.e., drawing z_t from the probability distribution p(z_t | z_{t−1}) = N(z_t; z_{t−1}, Σ), where Σ is a diagonal covariance matrix;
selecting candidate samples in the current image frame according to the sampled state variables z_t and constructing the candidate sample set C.
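This sampling step can be sketched as follows, reusing crop_patch from the previous sketch; for brevity the state variable is reduced to a 2-D center rather than the six mapping parameters, and the diagonal covariance values are placeholder assumptions.

```python
import numpy as np

def sample_candidates(frame, z_prev, num=300, diag_cov=(4.0, 4.0), rng=None):
    """Draw candidate states z_t ~ N(z_{t-1}, Sigma) with diagonal Sigma, then crop
    the corresponding patches to form the candidate matrix C (one column per sample)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    Sigma = np.diag(diag_cov)
    states = rng.multivariate_normal(mean=np.asarray(z_prev, float), cov=Sigma, size=num)
    C = np.column_stack([crop_patch(frame, s) for s in states])
    return C, states
```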
Further, the adaptive expression calculation module is specifically configured to:
calculating the adaptive expression matrix X corresponding to the template set T by the following formula (1):
min_X Σ_{i=1}^{p+n} ( (1/2)·||t_i − C·x_i||_2^2 + λ·||C·Diag(x_i)||_* )   (1)
where x_i is the adaptive expression corresponding to template t_i; the adaptive expression matrix X = [x_1, x_2, …, x_{p+n}] is the set of adaptive expressions; the Diag(·) operator converts the adaptive expression x_i into the diagonal matrix Diag(x_i); ||·||_* denotes the trace (nuclear) norm of the trace lasso regularizer; C is the candidate sample matrix; λ is the regularization coefficient; p is the number of positive templates; and n is the number of negative templates;
the above equation (1) is solved using an iterative reweighting algorithm.
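The patent names an iterative reweighting algorithm without spelling out the updates, so the sketch below uses one standard majorization of the trace lasso (alternating a per-coordinate weighted ridge solve with a refresh of S = (C Diag(x)^2 C^T)^{1/2}); the update rule, iteration count, and smoothing eps are assumptions. For context, for a dictionary with unit-norm columns the penalty ||C Diag(x)||_* behaves like an l2 penalty when the columns are identical and like an l1 penalty when they are orthogonal, which is the sense in which the expression adapts to the correlation structure of the candidate samples.

```python
import numpy as np
from scipy.linalg import sqrtm

def trace_lasso_irls(C, t, lam=0.1, iters=20, eps=1e-6):
    """Approximately solve min_x 0.5*||t - C x||^2 + lam*||C Diag(x)||_*
    by iterative reweighting: each pass solves a per-coordinate weighted ridge
    problem, then refreshes the weights from the current solution."""
    d, k = C.shape
    x = np.linalg.lstsq(C, t, rcond=None)[0]          # plain least-squares start
    CtC = C.T @ C
    for _ in range(iters):
        S = sqrtm(C @ np.diag(x ** 2) @ C.T + eps * np.eye(d)).real
        A = C.T @ np.linalg.solve(S, C)               # A = C^T S^{-1} C
        D = np.diag(np.diag(A))                       # reweighting matrix
        x = np.linalg.solve(CtC + lam * D, C.T @ t)   # weighted ridge update
    return x

def adaptive_expression_matrix(C, T, lam=0.1):
    """Stack the expressions of every template t_i over the candidate dictionary C."""
    return np.column_stack([trace_lasso_irls(C, T[:, i], lam) for i in range(T.shape[1])])
```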
Further, the target tracking module is specifically configured to:
constructing a weight matrix W by formula (2):
W_{i,j} = exp(−||C_i − T_j||^2)   (2)
where C_i is the ith candidate sample and T_j is the jth template; W_{i,j}, the entry in the ith row and jth column of W, also represents the similarity between the ith candidate sample and the jth template;
performing element-wise (Hadamard) multiplication of the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M = W ⊙ X, recorded as the expression map M and divided into two parts {M_pos, M_neg};
calculating a separability score sc_i for each candidate sample by formula (3):
sc_i = Σ M_{i,pos} − Σ M_{i,neg}   (3)
where M_{i,pos} is the ith row of M_pos, representing the separable features of the ith candidate sample on the positive template set; M_{i,neg} is the ith row of M_neg, representing the separable features of the ith candidate sample on the negative template set; and sc_i is the separability score of the ith candidate sample;
a higher separability score indicates that the candidate sample is more similar to the tracking target, so the candidate sample with the highest separability score is selected as the target tracked in the current image frame.
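A compact sketch of this re-weighting and scoring step follows; it assumes candidate and template vectors have been scaled (e.g., to unit norm) so that exp(−||C_i − T_j||^2) does not underflow, and that the first p columns of the template-indexed matrices correspond to the positive templates.

```python
import numpy as np

def select_target(C, T, X, p):
    """W_ij = exp(-||C_i - T_j||^2); M = W ⊙ X is the expression map; the
    separability score is the positive-part row sum minus the negative-part row sum."""
    d2 = ((C[:, :, None] - T[:, None, :]) ** 2).sum(axis=0)  # m x (p+n) squared distances
    W = np.exp(-d2)
    M = W * X                                        # element-wise (Hadamard) product
    sc = M[:, :p].sum(axis=1) - M[:, p:].sum(axis=1) # formula (3) per candidate
    return int(np.argmax(sc)), sc
```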
Further, the tracking target template updating module is specifically configured to:
calculating the similarity vector η = (η_1, η_2, …, η_p), where η_i is the Euclidean-distance-based similarity between the current tracking result and the ith positive template;
calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if ω > θ, updating the positive template corresponding to the maximum similarity with the current tracking result;
selecting a specified number of negative templates near the current tracking result as the new negative template set T_neg, where each negative template satisfies σ < ||Q_n − Q_k|| < γ, with σ and γ the specified inner and outer thresholds, and Q_n and Q_k the centers of the negative template and of the current tracking result, respectively.
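A sketch of this update rule follows; the patent states only that η_i is a Euclidean-distance-based similarity, so the mapping 1/(1 + distance) used here is an assumption, as are the resampling details (crop_patch comes from the earlier sketch).

```python
import numpy as np

def update_templates(T_pos, T_neg, frame, result_patch, q_k, theta,
                     sigma=4.0, sigma_out=12.0, rng=None):
    """If the best Euclidean-distance similarity to a positive template exceeds
    theta, overwrite that template with the tracking result; then rebuild the
    negative set in the ring sigma < ||Q_n - Q_k|| < sigma_out around the result."""
    rng = rng if rng is not None else np.random.default_rng(0)
    eta = 1.0 / (1.0 + np.linalg.norm(T_pos - result_patch[:, None], axis=0))
    if eta.max() > theta:                      # omega = max(eta) vs threshold theta
        T_pos[:, int(np.argmax(eta))] = result_patch
    n = T_neg.shape[1]
    neg = []
    while len(neg) < n:
        offset = rng.uniform(-sigma_out, sigma_out, size=2)
        if sigma < np.linalg.norm(offset) < sigma_out:
            neg.append(crop_patch(frame, np.asarray(q_k, float) + offset))
    return T_pos, np.column_stack(neg)
```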
Example 3
As shown in fig. 1 to fig. 3, this embodiment differs from embodiment 1 in that it provides a target tracking terminal based on adaptive expression, the terminal comprising: a processor and a memory, wherein the memory stores program code which, when executed by the processor, causes the processor to execute the steps of the adaptive-expression-based target tracking method of embodiment 1.
Since the terminal is the device that carries out the method of the embodiment of the present invention, and the principle by which the device solves the problem is similar to that of the method, the implementation of the terminal may refer to the implementation of the method; repeated details are not repeated.
Specifically, as shown in fig. 2, the terminal includes: a processor 200 and a memory 201, wherein the memory 201 stores program code that, when executed by the processor 200, causes the processor 200 to perform the steps of:
s10: acquiring a current video frame;
s11: checking whether a template exists; if not, executing S12 to initialize the tracking target template; if a template already exists, executing S13 to generate candidate samples in the current image frame;
s12: initializing the tracking target template: determining the position of the tracking target in the first frame image of the video through setting or manual designation, selecting positive and negative templates near the tracking target, and constructing a template set composed of the positive and negative templates, each template being converted into vector form in advance; the positive templates in the invention model the information contained in the tracking target, and the negative templates model the background information; by using both, the adaptive-expression-based target tracking method can still distinguish the target when it blends into the background and thus keep tracking it;
s13: generating a candidate sample: selecting candidate samples in the current image frame according to the target position tracked in the previous image frame, and constructing a candidate sample set consisting of the candidate samples, wherein each candidate sample is converted into a vector form in advance;
s14: calculating the adaptive expression: taking the matrix formed by the candidate sample set obtained in step S13 as the dictionary, taking the templates as the targets of the sparse expression, and using the trace lasso as the regularization term, calculating the matrix formed by the adaptive expressions, namely the adaptive expression matrix;
s15: tracking via the expression: constructing a weight matrix and multiplying it element-wise with the adaptive expression matrix obtained in step S14 to obtain the re-weighted expression matrix, in which each element represents a separable feature of the corresponding candidate sample; the separability score of each candidate sample is calculated from these separable features, and the candidate sample with the highest score is the tracked target;
s16: checking whether the tracking task is completed: if the current image frame is the last frame of the video, the target tracking task is completed; if not, executing S17 to update the tracking target template and continuing the target tracking task;
s17: updating the tracking target template: updating the target template using the result tracked in the current image frame and returning to S10; since a new image frame must be read from the camera or the video, and reading the current image frame occurs on every pass of the method flow, the step after updating the target template is to read a new image frame, so after S17 the flow returns to S10.
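To show how these steps fit together, here is a minimal end-to-end sketch of the S10 to S17 loop, wiring up the helper functions from the earlier sketches (build_templates, sample_candidates, adaptive_expression_matrix, select_target, update_templates); the simplified 2-D state, the loop structure, and all default parameters remain assumptions of the sketch rather than the patent's implementation.

```python
import numpy as np

def run_tracker(frames, q0, p=10, n=20, lam=0.1, theta=0.6,
                sigma=4.0, sigma_out=12.0):
    """Track a target through an iterable of frames, returning the per-frame states."""
    T, z, track = None, np.asarray(q0, dtype=float), []
    for frame in frames:                                    # S10: read next frame
        if T is None:                                       # S11/S12: initialize templates
            T = build_templates(frame, z, p, n, sigma, sigma_out)
            track.append(z.copy())
            continue
        C, states = sample_candidates(frame, z)             # S13: candidate samples
        X = adaptive_expression_matrix(C, T, lam)           # S14: adaptive expression
        best, _ = select_target(C, T, X, p)                 # S15: expression-map tracking
        z = states[best]
        track.append(z.copy())
        T_pos, T_neg = update_templates(T[:, :p], T[:, p:], frame,
                                        C[:, best], z, theta, sigma, sigma_out)  # S17
        T = np.hstack([T_pos, T_neg])
    return track                                            # S16: stops after last frame
```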
As an optional implementation, the processor 200 is specifically configured to:
s120: determining the initial position of the tracking target in the first image frame of the video through set or manually designated coordinates; in a video tracking task, the initial position of the tracking target is generally marked manually according to the specific properties of the task, or set directly.
S121: selecting a specified number of positive templates near the tracking target position in the first frame image to construct a positive template set T_pos = [t_1, t_2, …, t_p], where each positive template satisfies ||Q_p − Q_0|| < σ, with Q_p the center of the positive template, Q_0 the center of the initial target, and σ a specified inner threshold. Since each positive template is an image block, the ith positive template is converted into vector form t_i for calculation. For the positive templates to model the information contained in the tracked target, they must be selected near the tracked target, so that each positive template resembles the tracked target and retains its information while differing slightly from it, which improves the robustness of the method.
Selecting a specified number of negative templates near the tracking target position in the first frame image to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], where each negative template satisfies σ < ||Q_n − Q_0|| < γ, with Q_n the center of the negative template and γ a specified outer threshold. Since each negative template is an image block, the jth negative template is converted into vector form t_j for ease of computation. For the negative templates to model the background information, they are selected in a ring centered on the tracking target, so that each negative template differs substantially from the tracking target and naturally brings in background information.
Constructing the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}], a matrix containing all templates; the initial template set T contains both tracking target information and background information.
As an optional implementation, the processor 200 is specifically configured to:
s130: obtaining the state variable z_{t−1} corresponding to the target in the previous frame according to the target position tracked in the previous image frame; the state variable z_{t−1} contains six mapping parameters: x-coordinate, y-coordinate, rotation angle, size, aspect ratio, and tilt direction, which together locate the target position in the video image.
S131: sampling the state variable z_t of each candidate sample according to the motion model, i.e., drawing z_t from the probability distribution p(z_t | z_{t−1}) = N(z_t; z_{t−1}, Σ), where Σ is a diagonal covariance matrix. It is difficult to sample candidate samples directly in the current frame image, so an indirect route is taken: the state variable z_t of each candidate sample is sampled, which in turn samples candidate regions in the image. The motion of the tracking target is simplified to the change of its state variable, and without loss of generality this change is modeled by a Gaussian distribution whose expectation is the state variable z_{t−1} corresponding to the previous frame's target; the diagonal covariance matrix is determined by the specific properties of the tracking data and differs between tracking datasets.
S132: selecting candidate samples in the current image frame according to the sampled state variables z_t and constructing the candidate sample set C. The sampled state variables determine the positions of the candidate samples; each candidate sample is converted into vector form, and these vectors form the candidate sample matrix C.
As an optional implementation, the processor 200 is specifically configured to:
in the step S14, the adaptive expression matrix X corresponding to the template set T is calculated by the following formula (1):
min_X Σ_{i=1}^{p+n} ( (1/2)·||t_i − C·x_i||_2^2 + λ·||C·Diag(x_i)||_* )   (1)
where x_i is the adaptive expression corresponding to template t_i; the adaptive expression matrix X = [x_1, x_2, …, x_{p+n}] is the set of adaptive expressions; the Diag(·) operator converts the adaptive expression x_i into the diagonal matrix Diag(x_i); ||·||_* denotes the trace (nuclear) norm; C is the candidate sample matrix; λ is the regularization coefficient; p is the number of positive templates; and n is the number of negative templates;
in step S14, the above equation (1) is solved using an iterative reweighting algorithm.
As an optional implementation, the processor 200 is specifically configured to:
s150: constructing a weight matrix W by formula (2):
W_{i,j} = exp(−||C_i − T_j||^2)   (2)
where C_i is the ith candidate sample and T_j is the jth template; W_{i,j}, the entry in the ith row and jth column of W, also represents the similarity between the ith candidate sample and the jth template;
s151: performing element-wise (Hadamard) multiplication of the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M = W ⊙ X, recorded as the expression map M and divided into two parts {M_pos, M_neg};
s152: calculating a separability score sc_i for each candidate sample by formula (3):
sc_i = Σ M_{i,pos} − Σ M_{i,neg}   (3)
where M_{i,pos} is the ith row of M_pos, representing the separable features of the ith candidate sample on the positive template set; M_{i,neg} is the ith row of M_neg, representing the separable features of the ith candidate sample on the negative template set; and sc_i is the separability score of the ith candidate sample;
s153: a higher separability score indicates that the candidate sample is more similar to the tracking target, so the candidate sample with the highest separability score is selected as the target tracked in the current image frame.
As an optional implementation, the processor 200 is specifically configured to:
s170: calculating the similarity vector η = (η_1, η_2, …, η_p), where η_i is the Euclidean-distance-based similarity between the current tracking result and the ith positive template;
s171: calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if ω > θ, updating the positive template corresponding to the maximum similarity with the current tracking result;
s172: selecting a specified number of negative templates near the current tracking result as the new negative template set T_neg, where each negative template satisfies σ < ||Q_n − Q_k|| < γ, with σ and γ the specified inner and outer thresholds, and Q_n and Q_k the centers of the negative template and of the current tracking result, respectively.
Example 4
As shown in fig. 1 to 3, the present embodiment is different from embodiment 1 in that the present embodiment provides a computer storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the adaptive expression-based target tracking method in embodiment 1.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

1. A target tracking method based on adaptive expression is characterized by comprising the following steps:
s10: acquiring a current video frame;
s11: checking whether a template exists; if not, executing S12 to initialize the tracking target template; if a template already exists, executing S13 to generate candidate samples in the current image frame;
s12: initializing a tracking target template: determining the position of a tracking target in a first frame image of a video through setting or manual designation, selecting a positive template and a negative template near the tracking target, and constructing a template set consisting of the positive template and the negative template, wherein each template is converted into a vector form in advance;
s13: generating a candidate sample: selecting candidate samples in the current image frame according to the target position tracked in the previous image frame, and constructing a candidate sample set consisting of the candidate samples, wherein each candidate sample is converted into a vector form in advance;
s14: calculating an adaptive expression: taking a matrix formed by the candidate sample set obtained in the step S13 as a dictionary, taking a template as a target of sparse expression, and using a trace lasso as a regular term to calculate to obtain a matrix formed by self-adaptive expression, namely a self-adaptive expression matrix;
s15: tracking by expression: constructing a weight matrix, carrying out point multiplication on the weight matrix and the self-adaptive expression matrix obtained in the step S14 to obtain a re-weighted expression matrix, wherein each element in the re-weighted expression matrix represents separable characteristics of corresponding candidate samples, and calculating the identification score of each candidate sample through the separable characteristics, wherein the candidate sample with the highest identification score is the tracked target;
s16: checking whether the tracking task is finished: if the current image frame is the last frame of the video, the target tracking task is finished; if the current image frame is not the last frame of the video, executing S17 to update the tracking target template and continuing the target tracking task;
s17: updating a tracking target template: updating the target template using the result tracked in the current image frame and returning to execute S10;
the step S15 specifically includes:
s150: constructing a weight matrix W by formula (2):
W_{i,j} = exp(−||C_i − T_j||^2)   (2)
where C_i is the ith candidate sample and T_j is the jth template; W_{i,j}, the entry in the ith row and jth column of W, also represents the similarity between the ith candidate sample and the jth template;
s151: performing element-wise multiplication of the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M = W ⊙ X, recorded as the expression map M and divided into two parts {M_pos, M_neg};
s152: calculating a separability score sc_i for each candidate sample by formula (3):
sc_i = Σ M_{i,pos} − Σ M_{i,neg}   (3)
where M_{i,pos} is the ith row of M_pos, representing the separable features of the ith candidate sample on the positive template set; M_{i,neg} is the ith row of M_neg, representing the separable features of the ith candidate sample on the negative template set; and sc_i is the separability score of the ith candidate sample;
s153: selecting the candidate sample with the highest separability score as the target tracked by the current image frame.
2. The method for tracking an object based on adaptive expression of claim 1, wherein the step S12 specifically includes:
s120: determining an initial position of a tracking target in a first frame image frame of a video through set or manually designated coordinates;
s121: selecting a specified number of positive templates near the tracking target position in the first frame image to construct a positive template set T_pos = [t_1, t_2, …, t_p], where each positive template satisfies ||Q_p − Q_0|| < σ, with Q_p the center of the positive template, Q_0 the center of the initial target, and σ a specified inner threshold;
selecting a specified number of negative templates near the tracking target position in the first frame image to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], where each negative template satisfies σ < ||Q_n − Q_0|| < γ, with Q_n the center of the negative template and γ a specified outer threshold;
constructing the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}], a matrix containing all templates.
3. The method for tracking an object based on adaptive expression of claim 1, wherein the step S13 specifically includes:
s130: obtaining the state variable z_{t−1} corresponding to the target in the previous frame according to the target position tracked in the previous image frame;
s131: sampling the state variable z_t of each candidate sample according to the motion model, i.e., drawing z_t from the probability distribution p(z_t | z_{t−1}) = N(z_t; z_{t−1}, Σ), where Σ is a diagonal covariance matrix;
s132: selecting candidate samples in the current image frame according to the sampled state variables z_t and constructing the candidate sample set C.
4. The method for tracking an object based on adaptive expression of claim 1, wherein the adaptive expression matrix X corresponding to the template set T is calculated in step S14 by using the following formula (1):
min_X Σ_{i=1}^{p+n} ( (1/2)·||t_i − C·x_i||_2^2 + λ·||C·Diag(x_i)||_* )   (1)
where x_i is the adaptive expression corresponding to template t_i; the adaptive expression matrix X = [x_1, x_2, …, x_{p+n}] is the set of adaptive expressions; the Diag(·) operator converts the adaptive expression x_i into the diagonal matrix Diag(x_i); ||·||_* denotes the trace (nuclear) norm; C is the candidate sample matrix; λ is the regularization coefficient; p is the number of positive templates; and n is the number of negative templates;
in step S14, the above equation (1) is solved using an iterative reweighting algorithm.
5. The method for tracking an object based on adaptive expression of claim 1, wherein the step S17 specifically includes:
s170: calculating the similarity vector η = (η_1, η_2, …, η_p), where η_i is the Euclidean-distance-based similarity between the current tracking result and the ith positive template;
s171: calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if ω > θ, updating the positive template corresponding to the maximum similarity with the current tracking result;
s172: selecting a specified number of negative templates near the current tracking result as the new negative template set T_neg, where each negative template satisfies σ < ||Q_n − Q_k|| < γ, with σ and γ the specified inner and outer thresholds, and Q_n and Q_k the centers of the negative template and of the current tracking result, respectively.
6. The target tracking method based on adaptive expression, characterized in that the method is applied to an automotive intelligent driving-assistance system, wherein the road information ahead is captured in real time as images by a vehicle-mounted camera, achieving real-time tracking of the vehicle target in front.
7. An adaptive representation-based target tracking apparatus, comprising:
the acquisition module is used for acquiring the current image frame of the video;
the first judging module is used for judging whether the template exists or not;
the initialization setting module, used for setting the initial position of the tracking target in the video image and the corresponding parameters, including but not limited to the state variable variance and the positive-template update threshold, and for starting the tracking task;
the tracking target template building module is used for building an initial positive template and a negative template of the tracking target near the initial position of the specified tracking target;
the tracking target sample generation module is used for randomly sampling the current image frame of the tracking task to generate a sample of a tracking target;
the adaptive expression calculation module, used for calculating the adaptive expression of the templates over the candidate samples;
the target tracking module, used for tracking the target in the current frame by means of the calculated adaptive expression;
the second judgment module is used for judging whether the tracking task is completed or not;
the tracking target template updating module is used for updating a positive template and a negative template of the tracking target according to the current tracking result in the tracking process;
the execution process of the target tracking module is as follows:
constructing a weight matrix W by formula (2):
W_{i,j} = exp(−||C_i − T_j||^2)   (2)
where C_i is the ith candidate sample and T_j is the jth template; W_{i,j}, the entry in the ith row and jth column of W, also represents the similarity between the ith candidate sample and the jth template;
performing element-wise multiplication of the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M = W ⊙ X, recorded as the expression map M and divided into two parts {M_pos, M_neg};
calculating a separability score sc_i for each candidate sample by formula (3):
sc_i = Σ M_{i,pos} − Σ M_{i,neg}   (3)
where M_{i,pos} is the ith row of M_pos, representing the separable features of the ith candidate sample on the positive template set; M_{i,neg} is the ith row of M_neg, representing the separable features of the ith candidate sample on the negative template set; and sc_i is the separability score of the ith candidate sample;
and selecting the candidate sample with the highest separability score as the target tracked by the current image frame.
8. An adaptive representation-based target tracking terminal, the terminal comprising: a processor and a memory, wherein the memory has stored therein program code, which when executed by the processor causes the processor to perform the steps of the method for adaptive expression based object tracking according to any one of claims 1 to 6.
9. A computer storage medium having a computer program stored thereon, wherein the program, when executed by a processor, performs the steps of a method for adaptive expression based object tracking according to any of claims 1 to 6.
CN202010192778.2A 2020-03-18 2020-03-18 Target tracking method, device and terminal based on adaptive expression Active CN111429481B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010192778.2A CN111429481B (en) 2020-03-18 2020-03-18 Target tracking method, device and terminal based on adaptive expression


Publications (2)

Publication Number Publication Date
CN111429481A CN111429481A (en) 2020-07-17
CN111429481B true CN111429481B (en) 2022-08-05

Family

ID=71548059

Country Status (1)

Country Link
CN (1) CN111429481B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070805B (en) * 2020-09-10 2021-05-14 深圳市豪恩汽车电子装备股份有限公司 Motor vehicle target real-time image tracking device and method
CN113486820B (en) * 2021-07-09 2023-06-06 厦门理工学院 Bidirectional target tracking method and system based on efficient template updating and selecting mechanism
CN116108091B (en) * 2022-12-26 2024-01-23 小米汽车科技有限公司 Data processing method, event tracking analysis method, device, equipment and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106570884A (en) * 2016-09-30 2017-04-19 天津大学 On-line dictionary updating model-based object tracking method
CN109492671A (en) * 2018-10-16 2019-03-19 北京理工大学 A kind of multipolarization High Range Resolution target identification method based on LSTM

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544483B (en) * 2013-10-25 2016-09-14 合肥工业大学 A kind of joint objective method for tracing based on local rarefaction representation and system thereof
CN103810252B (en) * 2014-01-21 2017-02-08 南京信息工程大学 Image retrieval method based on group sparse feature selection
CN104239902B (en) * 2014-09-12 2018-04-24 西安电子科技大学 Hyperspectral image classification method based on non local similitude and sparse coding
CN107203747B (en) * 2017-05-17 2021-01-08 江南大学 Sparse combined model target tracking method based on self-adaptive selection mechanism
US20190158270A1 (en) * 2017-11-21 2019-05-23 International Business Machines Corporation Exchanging Asset, Maintenance, And Spares Parts Information Via Blockchain
CN107784664B (en) * 2017-12-05 2021-07-27 韶关学院 K-sparse-based rapid robust target tracking method
CN108492312B (en) * 2018-02-26 2021-06-29 大连大学 Visual tracking method based on reverse sparse representation under illumination change
CN109003291A (en) * 2018-07-11 2018-12-14 上饶师范学院 Method for tracking target and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Target tracking algorithm based on feature selection and temporally consistent sparse appearance model; Zhang Weidong et al.; Pattern Recognition and Artificial Intelligence; 2018-03-15 (No. 03); pp. 1-6 *
Yang Honghong et al. Traffic target tracking based on sparsity-constrained deep learning. China Journal of Highway and Transport. 2016 (No. 06) *

Also Published As

Publication number Publication date
CN111429481A (en) 2020-07-17

Similar Documents

Publication Publication Date Title
Deng et al. Joint hand detection and rotation estimation using CNN
CN112926410B (en) Target tracking method, device, storage medium and intelligent video system
CN111795704A (en) Method and device for constructing visual point cloud map
CN111429481B (en) Target tracking method, device and terminal based on adaptive expression
CN111476302A (en) fast-RCNN target object detection method based on deep reinforcement learning
CN110032925B (en) Gesture image segmentation and recognition method based on improved capsule network and algorithm
CN110427799A (en) Based on the manpower depth image data Enhancement Method for generating confrontation network
CN107067410B (en) Manifold regularization related filtering target tracking method based on augmented samples
Zhang et al. Sparse learning-based correlation filter for robust tracking
Yang et al. Visual tracking with long-short term based correlation filter
CN109784155B (en) Visual target tracking method based on verification and error correction mechanism and intelligent robot
KR20240144139A (en) Facial pose estimation method, apparatus, electronic device and storage medium
CN106485750A (en) A kind of estimation method of human posture based on supervision Local Subspace
Huang et al. Tracking-by-detection of 3d human shapes: from surfaces to volumes
CN112489088A (en) Twin network visual tracking method based on memory unit
CN114118303B (en) Face key point detection method and device based on prior constraint
EP1801731B1 (en) Adaptive scene dependent filters in online learning environments
CN117854155B (en) Human skeleton action recognition method and system
WO2008046963A1 (en) Object tracking in computer vision
CN117894065A (en) Multi-person scene behavior recognition method based on skeleton key points
CN117333937A (en) Human body posture estimation method and device based on classification and distillation and electronic equipment
CN112614158B (en) Sampling frame self-adaptive multi-feature fusion online target tracking method
CN116758212A (en) 3D reconstruction method, device, equipment and medium based on self-adaptive denoising algorithm
CN111914751B (en) Image crowd density identification detection method and system
CN112507940A (en) Skeleton action recognition method based on difference guidance representation learning network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240404

Address after: 230000 floor 1, building 2, phase I, e-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province

Patentee after: Dragon totem Technology (Hefei) Co.,Ltd.

Country or region after: China

Address before: 400000 No.2 Chongwen Road, Nanshan street, Nan'an District, Chongqing

Patentee before: CHONGQING University OF POSTS AND TELECOMMUNICATIONS

Country or region before: China