CN111429481A - Target tracking method, device and terminal based on adaptive expression - Google Patents


Publication number: CN111429481A (application CN202010192778.2A)
Authority: CN (China)
Prior art keywords: tracking, template, target, expression, matrix
Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Application number: CN202010192778.2A
Other languages: Chinese (zh)
Other versions: CN111429481B (en)
Inventors: 米建勋, 李人杰
Current assignee: Dragon Totem Technology Hefei Co ltd (the listed assignees may be inaccurate)
Original assignee: Chongqing University of Posts and Telecommunications
Application filed by Chongqing University of Posts and Telecommunications; priority to CN202010192778.2A (the priority date is an assumption and is not a legal conclusion).
Publication of CN111429481A; application granted; publication of CN111429481B.
Legal status: Active; anticipated expiration.

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/251: Analysis of motion using feature-based methods involving models
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 20/584: Recognition of vehicle lights or traffic lights

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target tracking method, device and terminal based on adaptive expression, used for tracking a specified target in a video and realizing accurate tracking of the video target in a motion state. The method comprises the following steps: acquiring a current video frame; checking whether the template exists; initializing the tracking target template; generating candidate samples; calculating the adaptive expression; tracking by expression; checking whether the tracking task is completed; and updating the tracking target template. Compared with the prior art, the method uses an adaptive expression model, that is, the sparse expression can change adaptively according to the structural relationship among candidate samples, and introduces negative templates to improve the discriminative power of the model; the adaptive-expression-based target tracking method provided by the invention therefore has strong robustness and high accuracy. The method is particularly suitable for producing effective, accurate tracking in an intelligent auxiliary driving system of an automobile.

Description

Target tracking method, device and terminal based on adaptive expression
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a target tracking method, a target tracking device and a target tracking terminal based on adaptive expression.
Background
With the development of artificial intelligence technology, visual tracking plays an increasingly important role; it is applied in a wide range of scenarios such as video surveillance, human-computer interaction, behavior recognition and action analysis. Although researchers have proposed many different tracking algorithms, the tracking problem still faces many challenges that have not been fully solved, such as fast motion of the target, occlusion, and changes in the target's scale, pose and illumination, all of which require further research.
At present, sparse representation algorithms have been successfully applied to the tracking problem. Compared with other tracking methods, sparse tracking handles occlusion of the tracked target well and is more robust. The traditional sparse tracking method takes templates, which can be dynamically updated during tracking, as a dictionary, and sparsely expresses candidate samples on this dictionary. After learning the sparse representations of the candidate samples, the candidate sample with the smallest reconstruction error is selected as the tracked target.
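As a point of reference, this conventional template-dictionary pipeline can be sketched in Python (a minimal illustration only; the ISTA lasso solver, the dimensions, and the synthetic data are assumptions of the example, not details from the patent):

```python
import numpy as np

def ista_lasso(D, y, lam=0.1, n_iter=200):
    """Solve min_x 0.5*||y - D x||^2 + lam*||x||_1 with plain ISTA."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - D.T @ (D @ x - y) / L    # gradient step on the smooth part
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return x

def classic_sparse_track(templates, candidates, lam=0.1):
    """Classic sparse tracker: express each candidate on the template
    dictionary and keep the one with the smallest reconstruction error."""
    errors = [np.linalg.norm(c - templates @ ista_lasso(templates, c, lam))
              for c in candidates.T]     # one column per candidate
    return int(np.argmin(errors))

rng = np.random.default_rng(0)
T = rng.standard_normal((32, 8))                      # 8 templates, 32-dim
target = T @ np.array([0.9, 0, 0, 0, 0.4, 0, 0, 0])   # lies in template span
C = rng.standard_normal((32, 5))                      # 5 candidate samples
C[:, 3] = target + 0.01 * rng.standard_normal(32)     # candidate 3 ~ target
best = classic_sparse_track(T, C)
```

Note that here each candidate is expressed on the templates; the invention described below reverses the roles of dictionary and regression target.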
However, current sparse tracking methods still have a notable problem: they ignore the relationships between candidate samples. Because candidate samples are densely sampled, some of them may be very similar, so the candidate sample matrix has a low-rank property. Existing sparse tracking methods only explore and utilize the relationship information between templates and ignore the relationships between candidate samples.
Disclosure of Invention
The present invention provides a target tracking method, apparatus and terminal based on adaptive expression to solve the above problems. The invention explores and utilizes the structural relationship between candidate samples to track a specified target in a video, thereby achieving accurate tracking of the video target in a motion state.
The invention is realized by the following technical scheme:
in a first aspect, the present invention provides a target tracking method based on adaptive expression, including the following steps:
s10: acquiring a current video frame;
s11: checking whether the template exists; if not, executing S12 to initialize the tracking target template; if the template already exists, executing S13 to generate candidate samples in the current image frame;
s12: initializing the tracking target template: determining the position of the tracking target in the first frame image of the video through setting or manual designation, selecting positive templates and negative templates near the tracking target, and constructing a template set consisting of the positive and negative templates, wherein each template is converted into vector form in advance; the positive templates are used to model the information contained in the tracking target, and the negative templates are used to model the background information; by using both positive and negative templates, the adaptive-expression-based target tracking method can still identify the target when it blends into the background, so that the target can be tracked;
s13: generating a candidate sample: selecting candidate samples in the current image frame according to the target position tracked in the previous image frame, and constructing a candidate sample set consisting of the candidate samples, wherein each candidate sample is converted into a vector form in advance;
s14: calculating an adaptive expression: taking a matrix formed by the candidate sample set obtained in the step S13 as a dictionary, taking a template as a target of sparse expression, and using a trace lasso as a regular term to calculate to obtain a matrix formed by self-adaptive expression, namely a self-adaptive expression matrix;
s15: tracking by expression: constructing a weight matrix and multiplying it element-wise with the adaptive expression matrix obtained in step S14 to obtain a re-weighted expression matrix, wherein each element of the re-weighted expression matrix represents a separable feature of the corresponding candidate sample; the separability score of each candidate sample is calculated from these separable features, and the candidate sample with the highest score is the tracked target;
s16: checking whether the tracking task is completed: if the current image frame is the last frame of the video, the target tracking task is completed; if not, executing S17 to update the tracking target template and continuing the target tracking task;
s17: updating a tracking target template: the target template is updated using the result tracked in the current image frame, and execution returns to S10.
The working principle is as follows: for video moving-target tracking, the prior-art sparse tracking methods have a clear shortcoming: they only explore and utilize the relationship information between templates and ignore the relationships between candidate samples, even though, under the dense-sampling strategy, some candidate samples may be very similar and the candidate sample matrix therefore has a low-rank property. The present scheme combines the candidate samples with the templates, using the candidate sample matrix as the dictionary and the templates as regression targets, which effectively reduces the computational burden of the algorithm; adaptive expression is introduced, through which the low-rank structure of the candidate sample matrix can be explored and the structural relationships among candidate samples effectively utilized; and negative templates are used as expression targets, so that background information is effectively utilized. Compared with the prior art, the method can explore and utilize the relationships between candidate samples, and is highly accurate and robust.
The method provided by the invention is particularly suitable for the intelligent auxiliary driving system of an automobile, where it produces an effective tracking result: the road information ahead is acquired in real time as images through the vehicle-mounted camera, realizing real-time, accurate tracking of the vehicle target in front.
Further, the step S12 specifically includes:
S120: determining the initial position of the tracking target in the first frame of the video through set or manually designated coordinates; in a video tracking task, the initial position of the tracking target is generally marked manually according to the specific properties of the task, or is set directly.
S121: selecting a specified number of positive templates near the position of the tracking target in the first frame image to construct the positive template set Tpos = [t1, t2, …, tp], each positive template satisfying the condition ||Qp − Q0|| < σ, wherein Qp represents the center of the positive template, Q0 represents the center of the initial target, and σ is a specified inner threshold; since a positive template is an image block, the ith positive template is converted into vector form ti for calculation. For the positive templates to model the information contained in the tracked target, they must be selected near the tracked target, so that each positive template is similar to the tracked target and retains its information while differing from it slightly, which improves the robustness of the method.
Selecting a specified number of negative templates near the position of the tracking target in the first frame image to form the negative template set Tneg = [tp+1, tp+2, …, tp+n], each negative template satisfying the condition σ < ||Qn − Q0|| < σ̄, wherein Qn represents the center of the negative template and σ̄ is a specified outer threshold; since a negative template is an image block, the jth negative template is converted into vector form tj for ease of computation. For the negative templates to model the background information, they are selected in an annular region centered on the tracking target, so that each negative template differs substantially from the tracking target and naturally introduces background information.
Constructing the template set T = [Tpos, Tneg] = [t1, t2, …, tp+n] composed of the positive and negative templates; this matrix includes all templates. The initial template set T contains both tracking target information and background information.
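A minimal sketch of this template-set construction, assuming grayscale frames, square patches, and rejection sampling of template centers (all illustrative choices; the patent does not fix a sampling scheme):

```python
import numpy as np

def build_template_set(frame, q0, patch=(8, 8), p=8, n=16,
                       sigma=4.0, sigma_out=20.0, seed=0):
    """Build T = [T_pos, T_neg]: p positive templates with center distance
    ||Q - q0|| < sigma and n negative templates with
    sigma < ||Q - q0|| < sigma_out, each cropped and vectorised."""
    rng = np.random.default_rng(seed)

    def crop(center):                    # image block -> column vector
        y, x = int(center[0]), int(center[1])
        return frame[y:y + patch[0], x:x + patch[1]].reshape(-1)

    def sample_center(r_in, r_out):      # rejection-sample a disk/ring offset
        while True:
            d = rng.uniform(-r_out, r_out, size=2)
            if r_in <= np.linalg.norm(d) < r_out:
                return np.asarray(q0, dtype=float) + d

    pos = np.stack([crop(sample_center(0.0, sigma)) for _ in range(p)], axis=1)
    neg = np.stack([crop(sample_center(sigma, sigma_out)) for _ in range(n)],
                   axis=1)
    return np.concatenate([pos, neg], axis=1)   # T = [T_pos, T_neg]

frame = np.random.default_rng(1).random((100, 100))   # stand-in first frame
T = build_template_set(frame, q0=(50, 50))            # 64-dim, 8+16 columns
```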
Further, the step S13 specifically includes:
S130: obtaining the state variable zt-1 corresponding to the target of the previous frame according to the target position tracked in the previous image frame; the state variable zt-1 contains six mapping parameters: x-coordinate, y-coordinate, rotation angle, size, aspect ratio and tilt direction, and these six parameters locate the target position in the video image.
S131: sampling the state variable zt of the candidate samples according to the motion model, i.e. sampling the state variables from the probability distribution
p(zt | zt-1) = N(zt; zt-1, Σ),
wherein Σ is a diagonal covariance matrix. It is difficult to sample candidate samples directly in the current frame image, so an indirect route is taken: the state variable zt of each candidate sample is sampled, and the candidate samples are then cropped from the image. Without loss of generality, the motion change of the tracking target is simplified into the change of its corresponding state variable, and this change is modeled with a Gaussian distribution whose expectation is the state variable zt-1 of the previous frame's target. The diagonal covariance matrix is determined by the specific properties of the tracking data and differs for different tracking data.
S132: selecting candidate samples in the current image frame from the sampled state variables zt and constructing the candidate sample set C. The sampled state variables determine the positions of the candidate samples, which are converted into vector form and then form the candidate sample matrix C.
Further, in step S14, the following formula (1) is used to calculate the adaptive expression matrix X corresponding to the template set T:
min over xi: ||ti − C·xi||² + λ·||C·Diag(xi)||*, for i = 1, 2, …, p + n    (1)
wherein xi is the adaptive expression corresponding to template ti; the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x1, x2, …, xp+n]; the Diag(·) operator converts the adaptive expression xi into the diagonal matrix Diag(xi); ||·||* denotes the nuclear norm, so the regular term is the trace lasso; C is the set of video candidate samples; λ is the regular term coefficient; p is the number of positive templates; and n is the number of negative templates;
in step S14, the above equation (1) is solved using an iterative reweighting algorithm.
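One possible form of the iterative reweighting algorithm for formula (1), based on the variational identity ||A||* = min over S ≻ 0 of (1/2)(tr(AᵀS⁻¹A) + tr(S)) with A = C·Diag(x); the smoothing constant mu, the warm start, and the fixed iteration count are assumptions of this sketch, not details taken from the patent:

```python
import numpy as np

def trace_lasso_irls(C, t, lam=0.1, n_iter=30, mu=1e-8):
    """Sketch of an iteratively reweighted solver for
        min_x ||t - C x||^2 + lam * ||C Diag(x)||_*  .
    Alternates S = (A A^T + mu I)^{1/2} with A = C Diag(x) and a
    ridge-like closed-form update of x."""
    d = C.shape[0]
    x = np.linalg.lstsq(C, t, rcond=None)[0]        # least-squares warm start
    CtC, Ctt = C.T @ C, C.T @ t
    for _ in range(n_iter):
        A = C * x                                   # = C @ Diag(x)
        w, V = np.linalg.eigh(A @ A.T + mu * np.eye(d))
        S_inv = (V / np.sqrt(np.maximum(w, mu))) @ V.T   # S^{-1}
        D = np.einsum('ji,jk,ki->i', C, S_inv, C)        # D_i = c_i^T S^-1 c_i
        x = np.linalg.solve(CtC + 0.5 * lam * np.diag(D), Ctt)
    return x

rng = np.random.default_rng(2)
Cd = rng.standard_normal((20, 10))           # candidate dictionary (columns)
x_true = np.zeros(10)
x_true[0], x_true[1] = 1.0, -0.5             # template built from 2 candidates
t = Cd @ x_true                              # template to express
x = trace_lasso_irls(Cd, t)
```

In the patent's setting this solve is repeated for every template ti, and the resulting columns stacked into X.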
Further, the step S15 specifically includes:
S150: constructing a weight matrix W by formula (2):
Wi,j = exp(−||Ci − Tj||²)    (2)
wherein Ci is the ith candidate sample, Tj is the jth template, and Wi,j, the value in row i and column j of the weight matrix W, represents the similarity between the ith candidate sample and the jth template;
S151: performing element-wise (Hadamard) multiplication of the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; the re-weighted expression matrix is recorded as the expression graph M and divided into two parts {Mpos, Mneg};
S152: calculating the separability score sci of each candidate sample by formula (3):
sci = Σ Mi,pos − Σ Mi,neg    (3)
wherein Mi,pos is the ith row of matrix Mpos, representing the separable features of the ith candidate sample on the positive template set; Mi,neg is the ith row of matrix Mneg, representing the separable features of the ith candidate sample on the negative template set; and sci is the separability score of the ith candidate sample;
S153: the higher the separability score, the more similar the candidate sample is to the tracking target; therefore, the candidate sample with the highest separability score is selected as the target tracked in the current image frame.
Further, the step S17 specifically includes:
S170: calculating the similarity vector η = (η1, η2, …, ηp), wherein ηi is the Euclidean-distance-based similarity between the current tracking result and the ith positive template;
S171: calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if ω > θ, the positive template corresponding to the maximum similarity is updated with the current tracking result;
S172: selecting a specified number of negative templates near the current tracking result as the new negative template set Tneg, each negative template satisfying the condition
a < ||Qn − Qk|| < ā,
wherein a and ā are respectively the designated inner and outer thresholds, and Qn and Qk represent the center of the negative template and the center of the current tracking result, respectively.
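A sketch of the positive-template update of S170 and S171; the exponential similarity exp(−||·||) is an assumed concretization of the Euclidean-distance-based similarity the patent names, and θ = 0.8 is illustrative:

```python
import numpy as np

def update_positive_templates(T_pos, result, theta=0.8):
    """eta_i = exp(-||result - t_i||); if the maximum similarity omega
    exceeds theta, overwrite that positive template with the current
    tracking result."""
    eta = np.exp(-np.linalg.norm(T_pos - result[:, None], axis=0))
    i = int(np.argmax(eta))
    omega = float(eta[i])
    T_new = T_pos.copy()
    if omega > theta:
        T_new[:, i] = result
    return T_new, omega

rng = np.random.default_rng(3)
T_pos = rng.standard_normal((16, 4))                   # 4 positive templates
result = T_pos[:, 1] + 0.01 * rng.standard_normal(16)  # near template 1
T_new, omega = update_positive_templates(T_pos, result)
```

The threshold θ keeps the template set from drifting: only a result very close to an existing positive template may replace it.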
Furthermore, the method is suitable for an intelligent auxiliary driving system of an automobile, and road information in front is acquired in real time in an image mode through the vehicle-mounted camera, so that real-time tracking of a front vehicle target is realized.
In a second aspect, the present invention provides an adaptive representation-based target tracking apparatus, including:
the acquisition module is used for acquiring the current image frame of the video;
the first judging module is used for judging whether the template exists or not;
the initialization setting module is used for setting the initial position of the tracking target in the video image and the corresponding parameters, including but not limited to the state-variable variance and the positive-template update threshold, and for starting the tracking task;
the tracking target template building module is used for building an initial positive template and a negative template of the tracking target near the initial position of the specified tracking target;
the tracking target sample generation module is used for randomly sampling the current image frame of the tracking task to generate a sample of a tracking target;
the adaptive expression calculation module is used for calculating the adaptive expression of the template on the candidate sample;
the target tracking module is used for tracking the target in the current frame with the calculated adaptive expression;
the second judgment module is used for judging whether the tracking task is completed or not;
and the tracking target template updating module is used for updating the positive template and the negative template of the tracking target according to the current tracking result in the tracking process.
Further, the tracking target template building module is specifically configured to:
selecting a specified number of positive templates near the position of the tracking target in the first frame image to construct the positive template set Tpos = [t1, t2, …, tp], each positive template satisfying the condition ||Qp − Q0|| < σ, wherein Qp represents the center of the positive template, Q0 represents the center of the initial target, and σ is a specified inner threshold;
selecting a specified number of negative templates near the position of the tracking target in the first frame image to form the negative template set Tneg = [tp+1, tp+2, …, tp+n], each negative template satisfying the condition σ < ||Qn − Q0|| < σ̄, wherein Qn represents the center of the negative template and σ̄ is a specified outer threshold;
constructing the template set T = [Tpos, Tneg] = [t1, t2, …, tp+n] composed of the positive and negative templates; this matrix includes all templates.
Further, the tracking target sample generation module is specifically configured to:
obtaining the state variable zt-1 corresponding to the target of the previous frame according to the target position tracked in the previous image frame;
sampling the state variable zt of the candidate samples according to the motion model, i.e. sampling the state variables from the probability distribution p(zt | zt-1) = N(zt; zt-1, Σ), wherein Σ is a diagonal covariance matrix;
selecting candidate samples in the current image frame from the sampled state variables zt and constructing the candidate sample set C.
Further, the adaptive expression calculation module is specifically configured to:
calculating the adaptive expression matrix X corresponding to the template set T by the following formula (1):
min over xi: ||ti − C·xi||² + λ·||C·Diag(xi)||*, for i = 1, 2, …, p + n    (1)
wherein xi is the adaptive expression corresponding to template ti; the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x1, x2, …, xp+n]; the Diag(·) operator converts the adaptive expression xi into the diagonal matrix Diag(xi); ||·||* denotes the nuclear norm; C is the set of video candidate samples; λ is the regular term coefficient; p is the number of positive templates; and n is the number of negative templates;
the above equation (1) is solved using an iterative reweighting algorithm.
Further, the target tracking module is specifically configured to:
constructing a weight matrix W by formula (2):
Wi,j = exp(−||Ci − Tj||²)    (2)
wherein Ci is the ith candidate sample, Tj is the jth template, and Wi,j, the value in row i and column j of the weight matrix W, represents the similarity between the ith candidate sample and the jth template;
performing element-wise (Hadamard) multiplication of the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; the re-weighted expression matrix is recorded as the expression graph M and divided into two parts {Mpos, Mneg};
calculating the separability score sci of each candidate sample by formula (3):
sci = Σ Mi,pos − Σ Mi,neg    (3)
wherein Mi,pos is the ith row of matrix Mpos, representing the separable features of the ith candidate sample on the positive template set; Mi,neg is the ith row of matrix Mneg, representing the separable features of the ith candidate sample on the negative template set; and sci is the separability score of the ith candidate sample;
the higher the separability score, the more similar the candidate sample is to the tracking target; therefore, the candidate sample with the highest separability score is selected as the target tracked in the current image frame.
Further, the tracking target template updating module is specifically configured to:
calculating the similarity vector η = (η1, η2, …, ηp), wherein ηi is the Euclidean-distance-based similarity between the current tracking result and the ith positive template;
calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if ω > θ, updating the positive template corresponding to the maximum similarity with the current tracking result;
selecting a specified number of negative templates near the current tracking result as the new negative template set Tneg, each negative template satisfying the condition a < ||Qn − Qk|| < ā, wherein a and ā are respectively the designated inner and outer thresholds, and Qn and Qk represent the center of the negative template and the center of the current tracking result, respectively.
In a third aspect, the present invention provides an adaptive representation-based target tracking terminal, including: a processor and a memory, wherein the memory has stored therein program code, which when executed by the processor, causes the processor to perform the steps of the above-described adaptive expression-based object tracking method.
In a fourth aspect, the present invention also provides a computer storage medium having a computer program stored thereon, which when executed by a processor, performs the steps of the above-described adaptive expression-based object tracking method.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. according to the invention, the candidate samples are combined with the template, the candidate sample matrix is used as a dictionary, and the template is used as a regression target, so that the calculation burden of the algorithm can be effectively reduced; self-adaptive expression is introduced, the low-rank structure of a candidate sample matrix can be explored through the self-adaptive expression, and the structural relation among candidate samples is effectively utilized; and negative templates are used as expression targets, so that background information is effectively utilized. Compared with the prior art, the method can explore and utilize the relation between the candidate samples, and has the characteristics of high accuracy and strong robustness.
2. The method is particularly suitable for an intelligent auxiliary driving system of an automobile and can generate an effective tracking effect, and the road information in front is acquired in real time in an image mode through the vehicle-mounted camera, so that the real-time accurate tracking of the front vehicle target is realized.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flowchart of a target tracking method based on adaptive expression according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a target tracking terminal based on adaptive expression according to an embodiment of the present invention;
fig. 3 is a schematic diagram of another adaptive expression-based target tracking apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 1 to 3, fig. 1 is a flow chart of a target tracking method based on adaptive expression, the target tracking method based on adaptive expression of the present invention is applied to an intelligent auxiliary driving system of an automobile for tracking, and road information in front is acquired in real time in an image manner by a vehicle-mounted camera, so that a front vehicle target is accurately tracked in real time; the method comprises the following steps:
s10: acquiring a current video frame;
the invention can acquire the current image frame through the video stream or the camera.
S11: checking whether the template exists, if not, executing S12 to initialize and track the target template; if the template already exists, executing S13 to generate a candidate sample in the current image frame;
s12: initializing a tracking target template: determining the position of a tracked vehicle target in a first frame image of a video through setting or manual designation, selecting a positive template and a negative template near the tracked vehicle target, and constructing a template set consisting of the positive template and the negative template, wherein each template is converted into a vector form in advance;
the positive template in the invention is used for modeling information contained in a tracked vehicle target, and the negative template in the invention is used for modeling background information. By using the positive template and the negative template, when the situation that the tracked vehicle target is mixed with the background occurs, the target tracking method based on the self-adaptive expression can also identify the target, and the tracking of the vehicle target is realized.
Specifically, the step S12 specifically includes:
s120: determining an initial position of a tracking target in a first frame image frame of a video through set or manually designated coordinates; in a video tracking task, the initial position of a tracking target is generally marked manually according to the specific properties of the task, or the initial position of the tracking target is directly set.
S121: selecting a specified number of front templates to construct a front template set T near the position of a tracking target in a first frame imagepos=[t1,t2,…,tp]And the front template meets the following condition, | | Qp-Q0||<σ, wherein QpRepresenting the center of the front template, Q0Represents the center of the initial target, σ is a specified inner threshold; since the front template is an image block, the ith front template is converted into the form t of a vector for calculationi. In order to enable the front template to model information contained in the tracked target, the front template needs to be selected near the tracked target, so that the front template is similar to the tracked target, the information of the tracked target is reserved, the front template is slightly different from the tracked target, and the robustness of the method is improved.
Near the position of the tracking target in the first frame image, select a specified number of negative templates to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], where each negative template satisfies the condition σ < ||Q_n − Q_0|| < γ, Q_n denotes the center of the negative template, and γ is a specified outer threshold. Since each negative template is an image block, the jth negative template is converted into vector form t_j for ease of computation. For the negative templates to model the background information, they are selected in an annular region centered on the tracking target, so that the negative templates differ greatly from the tracking target and naturally introduce background information.
Construct the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}] composed of the positive and negative templates; this matrix includes all templates. The initial template set T contains both tracking target information and background information.
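For illustration, the template initialization of steps S120–S122 can be sketched as follows. The patch size, threshold values, and grayscale-frame representation are assumptions chosen for the example, not values fixed by the invention:

```python
import numpy as np

def init_templates(frame, center, tpl=(20, 20), n_pos=10, n_neg=10,
                   sigma=4.0, inner=8.0, outer=30.0, seed=0):
    """Sample positive templates within distance sigma of the initial target
    center Q_0 and negative templates in the ring (inner, outer), then flatten
    each image block into a column of the template matrix T = [T_pos, T_neg]."""
    rng = np.random.default_rng(seed)
    h, w = tpl

    def patch(q):
        # crop an h x w image block whose top-left corner is at q, as a vector
        x, y = int(round(q[0])), int(round(q[1]))
        return frame[y:y + h, x:x + w].reshape(-1).astype(float)

    pos, neg = [], []
    while len(pos) < n_pos:                      # ||Q_p - Q_0|| < sigma
        q = center + rng.uniform(-sigma, sigma, 2)
        if np.linalg.norm(q - center) < sigma:
            pos.append(patch(q))
    while len(neg) < n_neg:                      # inner < ||Q_n - Q_0|| < outer
        q = center + rng.uniform(-outer, outer, 2)
        if inner < np.linalg.norm(q - center) < outer:
            neg.append(patch(q))
    return np.stack(pos + neg, axis=1)           # one vectorized template per column

frame = np.zeros((200, 200)); frame[90:110, 90:110] = 1.0   # toy frame
T = init_templates(frame, np.array([100.0, 100.0]))
```

Each column of the returned matrix is one vectorized template, matching the column layout T = [t_1, …, t_{p+n}] used in the formulas of this description.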
S13: generating a candidate sample: selecting candidate samples in the current image frame according to the target position tracked in the previous image frame, and constructing a candidate sample set consisting of the candidate samples, wherein each candidate sample is converted into a vector form in advance; since the motion of the tracking target in the video usually has continuity, the position of the tracking target in the current frame has a great relationship with the position of the tracking target in the previous frame. To simplify the model, the present invention models the motion model using a Gaussian distribution.
Specifically, the step S13 specifically includes:
S130: obtain the state variable z_{t−1} corresponding to the target from the target position tracked in the previous image frame. The state variable z_{t−1} contains six mapping parameters: x-coordinate, y-coordinate, rotation angle, size, aspect ratio, and tilt direction; these six mapping parameters locate the target position in the video image.
S131: sample the state variable z_t of each candidate sample according to the motion model, i.e. sample state variables with the probability distribution

p(z_t | z_{t−1}) = N(z_t; z_{t−1}, Σ)

where Σ is a diagonal covariance matrix. It is difficult to sample candidate samples directly in the current frame image, so an indirect way is chosen: the state variable z_t of each candidate sample is sampled, and the candidate samples are then taken from the image. The motion change of the tracking target is simplified into a change of its corresponding state variable, and this change is modeled, without loss of generality, by a Gaussian distribution. The expectation of the Gaussian distribution is the state variable z_{t−1} corresponding to the target of the previous frame; the diagonal covariance matrix is determined by the specific properties of the tracking data and differs across tracking data.
S132: from the sampled state variable ztCandidate samples are selected in the current image frame, and a candidate sample set C is constructed. The sampled state variables are used to determine the positions of the candidate samples, and the candidate samples are converted into a vector form, which then constitutes a candidate sample matrix C.
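The sampling of steps S130–S132 can be sketched as follows; the ordering of the six state parameters and the variance values are illustrative assumptions:

```python
import numpy as np

def sample_states(z_prev, n=500,
                  stddev=(4.0, 4.0, 0.02, 0.01, 0.005, 0.001), seed=0):
    """Draw candidate state variables z_t ~ N(z_{t-1}, Sigma) with a diagonal
    covariance Sigma; z = (x, y, rotation, size, aspect ratio, tilt)."""
    rng = np.random.default_rng(seed)
    Sigma = np.diag(np.square(stddev))           # diagonal covariance matrix
    return rng.multivariate_normal(z_prev, Sigma, size=n)

z_prev = np.array([120.0, 80.0, 0.0, 1.0, 1.0, 0.0])  # state from frame t-1
Z = sample_states(z_prev)                              # one candidate state per row
```

Each sampled row is then used to crop and rescale an image patch, which is flattened into one column of the candidate sample matrix C.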
S14: calculating an adaptive expression: taking a matrix formed by the candidate sample set obtained in the step S13 as a dictionary, taking a template as a target of sparse expression, and using a trace lasso as a regular term to calculate to obtain a matrix formed by self-adaptive expression, namely a self-adaptive expression matrix; wherein the values of the adaptive expression matrix elements represent the relationship between the template and the corresponding candidate samples. For example, Xi,jThe relationship between the jth template and the ith candidate sample is shown. In general, the larger the matrix element value, the stronger the relationship between a sample and the corresponding candidate sample.
Specifically, in step S14, the adaptive expression matrix X corresponding to the template set T is calculated using the following formula (1):

min_{x_i} ||t_i − C x_i||_2^2 + λ ||C Diag(x_i)||_*,  i = 1, 2, …, p+n   (1)

where x_i is the adaptive expression corresponding to the template t_i, the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x_1, x_2, …, x_{p+n}], the Diag(·) operator converts the adaptive expression x_i into the diagonal matrix Diag(x_i), ||·||_* denotes the nuclear norm, C is the video candidate sample set, λ is the regular term coefficient, p is the number of positive templates, and n is the number of negative templates;
When the candidate samples are all different, which means the candidate sample matrix C has full column rank, the regularizer decomposes into

||C Diag(x_i)||_* = ||x_i||_1

When there is high correlation between the candidate samples, meaning the rank of the sample matrix C is 1 and C^T C = 11^T (1 being the vector whose elements are all one), the regularizer becomes

||C Diag(x_i)||_* = ||x_i||_2
As studies have shown, the candidate sample matrix C is typically not full rank and its rank is greater than 1. In summary, the regularization term is a balance between the l1 norm and the l2 norm, i.e. ||x_i||_2 ≤ ||C Diag(x_i)||_* ≤ ||x_i||_1, and it adjusts dynamically according to the internal structure of the candidate sample matrix C, thereby yielding a sparse expression x_i adapted to the structural relations of the candidate samples.
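The two limiting cases and the interpolation property can be checked numerically. The following sketch assumes unit-norm candidate columns; the l1 case is demonstrated with orthonormal columns, for which the identity holds exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

def trace_lasso(C, x):
    """Nuclear norm of C Diag(x): the sum of singular values of C * x
    (broadcasting scales column i of C by x_i)."""
    return np.linalg.svd(C * x, compute_uv=False).sum()

x = rng.standard_normal(8)

# General unit-norm columns: ||x||_2 <= ||C Diag(x)||_* <= ||x||_1.
C = rng.standard_normal((50, 8))
C /= np.linalg.norm(C, axis=0)
mid = trace_lasso(C, x)

# Orthonormal columns: the trace lasso reduces to the l1 norm.
Q, _ = np.linalg.qr(rng.standard_normal((50, 8)))
as_l1 = trace_lasso(Q, x)

# Identical (rank-1) columns: the trace lasso reduces to the l2 norm.
c = rng.standard_normal((50, 1))
C1 = np.repeat(c / np.linalg.norm(c), 8, axis=1)
as_l2 = trace_lasso(C1, x)
```

This matches the balance property stated above: the regularizer tightens toward the l1 norm as the candidate columns decorrelate and toward the l2 norm as they become identical.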
In step S14, the above equation (1) is solved using an iterative reweighting algorithm. For a clearer explanation of the solution, the solution idea of equation (1) is explained in this embodiment:
Known formula (4):

||X||_* = ||(XX^T)^{−1/4} X||_F^2   (4)

where X represents a matrix in the general sense, ||X||_* represents the nuclear norm of the matrix X, X^T represents the transpose of X, (XX^T)^{−1/4} represents the −1/4 power of the matrix XX^T, and ||(XX^T)^{−1/4} X||_F^2 represents the square of the Frobenius norm of (XX^T)^{−1/4} X;
first, equation (4) is used, and the jth column of the expression matrix X is set to X.jConverting equation (1) to equation (5):
Figure BDA0002416513070000106
wherein X is an expression matrix, T is a template set, C is a video candidate sample set, and λ is a regular term coefficient,p is the number of positive templates, n is the number of negative templates, X.jRepresents the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template, Diag (X).*) The operator expresses X adaptively.jConversion to diagonal matrix Diag (X).j);
Initialize the weight matrices W^j:

W^j = (C Diag(X_{·j}) (C Diag(X_{·j}))^T)^{−1/4}   (6)

where C is the video candidate sample set and X_{·j} is the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template;
A simple form of equation (5) can then be obtained:

min_X ||T − CX||_F^2 + λ Σ_{j=1}^{p+n} ||W^j C Diag(X_{·j})||_F^2   (7)

where X is the expression matrix, T is the template set, C is the video candidate sample set, λ is the regular term coefficient, p is the number of positive templates, n is the number of negative templates, X_{·j} is the jth column of X, corresponding to the adaptive expression of the jth template, and W^j is the weight matrix corresponding to the adaptive expression X_{·j};
It is obvious that equation (7) can be written as a sum over columns:

min_X Σ_{j=1}^{p+n} ( ||T_{·j} − C X_{·j}||_2^2 + λ ||W^j C Diag(X_{·j})||_F^2 )   (8)

and, since the ith column of C Diag(X_{·j}) is C_{·i} X_{i,j}, each summand can be expanded as

f_j(X_{·j}) = ||T_{·j} − C X_{·j}||_2^2 + λ Σ_{i=1}^{m} X_{i,j}^2 ||W^j C_{·i}||_2^2   (9)

where T_{·j} is the jth column of the template set matrix T, C is the video candidate sample set, λ is the regular term coefficient, p is the number of positive templates, n is the number of negative templates, m is the number of rows of the matrix X (the number of candidate samples), X_{i,j} represents the value in the ith row and jth column of X, C_{·i} is the ith candidate sample, and W^j is the weight matrix corresponding to the adaptive expression X_{·j};
Differentiating equation (9) and setting the derivative to zero yields the analytical solution of equation (9). The derivative of equation (9) is

∂f_j/∂X_{·j} = 2 C^T (C X_{·j} − T_{·j}) + 2λ D^j X_{·j}   (10)

where D^j = Diag(||W^j C_{·1}||_2^2, …, ||W^j C_{·m}||_2^2) is the diagonal matrix calculated from the weight matrix W^j corresponding to the adaptive expression X_{·j}. Setting the derivative to zero gives

X_{·j} = (C^T C + λ D^j)^{−1} C^T T_{·j}   (11)

where T_{·j} is the jth column of the template set matrix T, C is the video candidate sample set, and λ is the regular term coefficient; the diagonal matrix D^j can be calculated from the corresponding weight matrix W^j.
Finally, equation (1) is solved by the iterative reweighting method. The solution process repeats the following two steps until the result converges:

1) Given the weight matrices W^j, update each column of the matrix X by

X_{·j} = (C^T C + λ D^j)^{−1} C^T T_{·j}   (12)

where T_{·j} is the jth column of the template set matrix T, C is the video candidate sample set, X_{·j} is the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template, and the diagonal matrix D^j is calculated from the corresponding weight matrix W^j;

2) Given the expression matrix X, update each weight matrix by

W^j = (C Diag(X_{·j}) (C Diag(X_{·j}))^T)^{−1/4}   (13)

where C is the video candidate sample set, X_{·j} is the jth column of the expression matrix X, corresponding to the adaptive expression of the jth template, and the Diag(·) operator converts X_{·j} into the diagonal matrix Diag(X_{·j}).
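A minimal numerical sketch of the iterative reweighting solution of equations (12) and (13); the ridge initialization and the eps smoothing of the negative fourth root are implementation assumptions added for numerical stability:

```python
import numpy as np

def adaptive_expression(T, C, lam=0.1, n_iter=15, eps=1e-6):
    """Solve min ||t_j - C x_j||^2 + lam ||C Diag(x_j)||_* column by column:
    alternately update x_j by the closed form (12) and the weight matrix
    W^j = (C Diag(x_j)(C Diag(x_j))^T)^(-1/4) of (13)."""
    m = C.shape[1]                      # number of candidate samples
    X = np.zeros((m, T.shape[1]))
    CtC, CtT = C.T @ C, C.T @ T
    for j in range(X.shape[1]):
        x = np.linalg.solve(CtC + lam * np.eye(m), CtT[:, j])   # ridge start
        for _ in range(n_iter):
            A = C * x                                           # C Diag(x)
            vals, vecs = np.linalg.eigh(A @ A.T)
            W = vecs @ np.diag((np.maximum(vals, 0) + eps) ** -0.25) @ vecs.T
            D = np.sum((W @ C) ** 2, axis=0)                    # ||W^j c_i||^2
            x = np.linalg.solve(CtC + lam * np.diag(D), CtT[:, j])
        X[:, j] = x
    return X

# Toy check: one template that is an exact sparse combination of candidates.
rng = np.random.default_rng(1)
C = rng.standard_normal((30, 10)); C /= np.linalg.norm(C, axis=0)
x_true = np.zeros(10); x_true[[2, 5]] = [1.0, -0.5]
t = (C @ x_true).reshape(-1, 1)
X = adaptive_expression(t, C, lam=0.01)
```

With a small regular term coefficient, the reconstruction residual stays small while the weights concentrate the expression on the candidate subspace actually used by the template.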
S15: tracking by expression: constructing a weight matrix, carrying out point multiplication on the weight matrix and the self-adaptive expression matrix obtained in the step S14 to obtain a re-weighted expression matrix, wherein each element in the re-weighted expression matrix represents separable characteristics of corresponding candidate samples, and calculating the identification score of each candidate sample through the separable characteristics, wherein the candidate sample with the highest identification score is the tracked target; the weight matrix W represents the similarity relation between the template and the candidate sample, and the tracking result can be more robust and accurate by introducing the weight matrix W.
Specifically, the step S15 specifically includes:
s150: constructing a weight matrix W, wherein the weight matrix is constructed by the formula (2):
W_{i,j} = exp(−||C_i − T_j||_2)   (2)
wherein C_i is the ith candidate sample and T_j is the jth template; W_{i,j} is the value in the ith row and jth column of the weight matrix W and represents the similarity between the ith candidate sample and the jth template. The elements of the weight matrix reflect the similarity relation between candidate samples and templates, and introducing the weight matrix reduces the tracking error.
S151: perform point multiplication of the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; record the re-weighted expression matrix as the expression map M and divide it into two parts {M_pos, M_neg}. Just as the template set is divided into positive and negative template sets according to the labels of the templates, the expression map M is divided, according to the template label corresponding to each column, into the positive expression map M_pos and the negative expression map M_neg.
S152: calculate the separability score sc_i of each candidate sample by formula (3):

sc_i = Σ M_{i,pos} − Σ M_{i,neg}   (3)

where M_{i,pos} is the ith row of the matrix M_pos, representing the separable features of the ith candidate sample on the positive template set, M_{i,neg} is the ith row of the matrix M_neg, representing the separable features of the ith candidate sample on the negative template set, and sc_i is the separability score of the ith candidate sample. A larger positive feature value means the candidate sample is more similar to the positive templates, whereas a larger negative feature value means it is more similar to the negative templates.
S153: since a higher identification score means the candidate sample is more similar to the tracking target, the candidate sample with the highest separability score is selected as the target tracked in the current image frame. A candidate sample with a larger positive feature value and a smaller negative feature value is more similar to the positive templates and less similar to the negative templates, and is therefore more likely to be the tracked target.
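The scoring pipeline of S150–S153 can be sketched as follows; the matrix shapes (candidates as columns of C, one template per column of T and of X) follow the layout used in this description:

```python
import numpy as np

def track_step(C, T, X, p):
    """Weight matrix W_{ij} = exp(-||C_i - T_j||_2), reweighted expression
    map M = W ⊙ X split into {M_pos, M_neg}, and separability scores
    sc_i = sum(M_i,pos) - sum(M_i,neg); returns the winning candidate index."""
    d2 = np.sum(C**2, 0)[:, None] + np.sum(T**2, 0)[None, :] - 2 * C.T @ T
    W = np.exp(-np.sqrt(np.maximum(d2, 0)))     # (candidates x templates)
    M = W * X                                   # point multiplication W ⊙ X
    sc = M[:, :p].sum(axis=1) - M[:, p:].sum(axis=1)
    return int(np.argmax(sc)), sc

# Toy check: candidate 0 matches the positive template, candidate 1 the negative.
C = np.eye(4)[:, :2]            # candidates c_0 = e_1, c_1 = e_2
T = np.eye(4)[:, :2]            # first template positive (p = 1), second negative
X = np.eye(2)                   # each template expressed by its own candidate
best, sc = track_step(C, T, X, p=1)
```

The candidate aligned with the positive template receives the highest separability score, as step S153 requires.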
S16: whether the tracking task is completed or not is checked, and if the current image frame is the last frame of the video, the target tracking task is completed; if the current image frame is not the last frame of the video, executing S17 to update the tracking target template, and continuing the target tracking task;
S17: updating the tracking target template: update the target template using the result tracked in the current image frame and return to execute S10. Because a new image frame must be read from the camera or the video, the operation of reading the current image frame is executed once in each pass of the method flow; after the target template is updated, the next step is to read a new image frame, so after S17 the flow returns to S10. In an actual scene, the scale, posture and other attributes of the tracking target change frequently, so if the tracking task has not stopped, the target template needs to be updated. The updated target template contains the latest information of the tracking target, making the subsequent tracking task more accurate.
Specifically, the step S17 specifically includes:
S170: calculate the similarity vector η = (η_1, η_2, …, η_p), where η_i is the similarity, based on Euclidean distance, between the current tracking result and the ith positive template. If the tracked target is occluded, it differs greatly from the positive templates, and the templates cannot be updated with the currently tracked target; to decide whether to update the positive templates, the similarity between the current tracking result and each positive template is calculated.
S171: calculate the maximum similarity ω = max(η) and compare it with the update threshold θ. If ω is greater than θ, the positive template corresponding to the maximum similarity is updated with the current tracking result. If the tracked target is occluded, the maximum similarity falls below the update threshold, which avoids updating the templates with an occluded target and degrading accuracy; if the tracked target is not occluded, the maximum similarity exceeds the update threshold and the tracking result is used to update the positive template, so that the positive templates contain the latest target information and the accuracy of the next tracking task is improved.
S172: select a specified number of negative templates near the current tracking result as the new negative template set T_neg, where each negative template satisfies the condition a < ||Q_n − Q_k|| < b, with a and b the specified inner and outer thresholds, Q_n the center of the negative template, and Q_k the center of the current tracking result. In an actual scene, the background in the video often changes rapidly, so the templates must be updated with the background of the current frame; the negative templates then contain the latest background information, which improves the accuracy of the subsequent tracking task.
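A sketch of the occlusion-guarded positive-template update of S170–S171. The exp(−distance) similarity form is an assumption for illustration; the description only specifies that η is based on Euclidean distance:

```python
import numpy as np

def update_positive(T_pos, result, theta=0.4):
    """Replace the most similar positive template with the current tracking
    result only when its similarity exceeds theta (occlusion guard)."""
    # eta_i: Euclidean-distance-based similarity (assumed exp(-distance) form)
    eta = np.exp(-np.linalg.norm(T_pos - result[:, None], axis=0))
    i = int(np.argmax(eta))                     # omega = max(eta)
    if eta[i] > theta:                          # occluded results stay out
        T_pos = T_pos.copy()
        T_pos[:, i] = result
    return T_pos

T_pos = np.stack([np.zeros(4), np.ones(4)], axis=1)
updated = update_positive(T_pos, 0.9 * np.ones(4))   # close to template 1
occluded = update_positive(T_pos, 5.0 * np.ones(4))  # far from all templates
```

A result close to a template replaces that template; a result far from every template (e.g. under heavy occlusion) leaves the template set unchanged.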
To verify the method provided by the invention, 8 challenging image sequences were selected in the experimental simulation to evaluate the performance of the method, and 12 advanced tracker methods were compared: 1) IVT, 2) APGL1, 3) PN, 4) TD, 5) MIL, 6) FragTrack, 7) MTT, 8) OSPT, 9) ASLAS, 10) LSAT, 11) SCM and 12) DSSM. The following experiments are all based on Matlab 2019a, on a platform with 8GB of memory and an Intel Core™ i5 CPU.
The experiment of the embodiment of the invention adopts two evaluation criteria: center point error and overlap ratio.
The center point error in an image frame refers to the Euclidean distance between the center point of the tracking bounding box B_T and the center point of the corresponding annotated ground-truth bounding box B_G, and the average center point error is the average of the center point error over the entire image sequence. The overlap ratio is calculated according to the formula provided by the PASCAL VOC challenge, i.e.

overlap = area(B_T ∩ B_G) / area(B_T ∪ B_G)

and the average overlap ratio refers to the average of the overlap ratio of the tracker over the entire image sequence. A good tracker will not only have a small center point error but also a large overlap ratio.
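The two evaluation criteria can be sketched directly; the (x, y, width, height) box convention is an assumption for the example:

```python
import numpy as np

def center_error(box_t, box_g):
    """Euclidean distance between the centers of the tracking bounding box B_T
    and the ground-truth bounding box B_G; boxes are (x, y, w, h)."""
    ct = np.array([box_t[0] + box_t[2] / 2, box_t[1] + box_t[3] / 2])
    cg = np.array([box_g[0] + box_g[2] / 2, box_g[1] + box_g[3] / 2])
    return float(np.linalg.norm(ct - cg))

def overlap_ratio(box_t, box_g):
    """PASCAL VOC overlap: area(B_T ∩ B_G) / area(B_T ∪ B_G)."""
    x1 = max(box_t[0], box_g[0]); y1 = max(box_t[1], box_g[1])
    x2 = min(box_t[0] + box_t[2], box_g[0] + box_g[2])
    y2 = min(box_t[1] + box_t[3], box_g[1] + box_g[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_t[2] * box_t[3] + box_g[2] * box_g[3] - inter
    return inter / union
```

Averaging these two quantities over all frames of a sequence gives the average center point error and average overlap ratio reported in the tables below.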
In the simulation experiment, the parameters were set as follows: the regularization parameter λ is set to 5, the sizes of the templates and the candidate samples are both adjusted to 20 × 20, the number of candidate samples is set to 500 to 600 according to the specific form of the test video image sequence, the positive template update threshold is set to 0.4, and the numbers of positive templates and negative templates are both set to 10. The regular parameter lambda, the number of the positive templates and the number of the negative templates are obtained by a cross validation method.
Table 1 shows the comparison of performance of various methods based on mean center point error, where Average represents the Average of all mean center point errors, and the smaller the mean center point error, the better the tracking performance; wherein Ours is the process of the invention.
TABLE 1 comparison of Performance of various methods based on mean center Point error
(Table 1 is reproduced as an image in the original document.)
Table 2 shows a comparison of the performance of various methods based on Average overlap rate, where Average represents the Average of the Average overlap rate, and the larger the Average overlap rate, the better the tracking performance; wherein Ours is the process of the invention.
TABLE 2 comparison of Performance of various methods based on average overlap ratio
(Table 2 is reproduced as an image in the original document.)
The comparisons of average center point error and average overlap ratio are shown in Tables 1 and 2, respectively; the last row of each table shows the average result of each tracker over all image sequences. Compared with the other advanced tracking methods, the proposed method (the Ours column in the tables) achieves competitive results on almost all image sequences and the best average target tracking performance in both tables. In summary, compared with the prior art, the adaptive-expression-based target tracking method provided by the embodiment of the invention is highly robust and accurate.
The invention adopts the scheme to combine the candidate sample with the template, uses the candidate sample matrix as a dictionary and the template as a regression target, thereby effectively reducing the calculation burden of the algorithm; self-adaptive expression is introduced, the low-rank structure of a candidate sample matrix can be explored through the self-adaptive expression, and the structural relation among candidate samples is effectively utilized; and negative templates are used as expression targets, so that background information is effectively utilized. Compared with the prior art, the method can explore and utilize the relation between the candidate samples, and has the characteristics of high accuracy and strong robustness.
The method provided by the invention can generate an effective tracking effect in an intelligent auxiliary driving system of the automobile, and the road information in front is acquired in real time in an image mode through the vehicle-mounted camera, so that the real-time accurate tracking of the front vehicle target is realized.
Example 2
As shown in fig. 1 to 3, the present embodiment is different from embodiment 1 in that the present invention provides an adaptive representation-based target tracking apparatus, as shown in fig. 3, the apparatus includes:
the acquisition module is used for acquiring the current image frame of the video;
the first judging module is used for judging whether the template exists or not;
the device comprises an initialization setting module, a tracking task starting module and a tracking task stopping module, wherein the initialization setting module is used for setting an initial position of a tracking target in a video image, and corresponding parameters include but are not limited to a state variable variance and a front template updating threshold value and starting a tracking task;
the tracking target template building module is used for building an initial positive template and a negative template of the tracking target near the initial position of the specified tracking target;
the tracking target sample generation module is used for randomly sampling the current image frame of the tracking task to generate a sample of a tracking target;
the adaptive expression calculation module is used for calculating the adaptive expression of the template on the candidate sample;
the target tracking module is used for obtaining self-adaptive expression through calculation and tracking the target of the current frame;
the second judgment module is used for judging whether the tracking task is completed or not;
and the tracking target template updating module is used for updating the positive template and the negative template of the tracking target according to the current tracking result in the tracking process.
Further, the tracking target template building module is specifically configured to:
selecting a specified number of positive templates near the position of the tracking target in the first frame image to construct a positive template set T_pos = [t_1, t_2, …, t_p], where each positive template satisfies ||Q_p − Q_0|| < σ, Q_p denotes the center of the positive template, Q_0 denotes the center of the initial target, and σ is a specified inner threshold;

selecting a specified number of negative templates near the position of the tracking target in the first frame image to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], where each negative template satisfies σ < ||Q_n − Q_0|| < γ, Q_n denotes the center of the negative template, and γ is a specified outer threshold;

constructing the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}] composed of the positive and negative templates; this matrix includes all templates.
Further, the tracking target sample generation module is specifically configured to:
obtaining the state variable z_{t−1} corresponding to the target from the target position tracked in the previous image frame;

sampling the state variable z_t of each candidate sample according to the motion model, i.e. sampling state variables with the probability distribution p(z_t | z_{t−1}) = N(z_t; z_{t−1}, Σ), where Σ is a diagonal covariance matrix;

selecting candidate samples in the current image frame from the sampled state variables z_t, and constructing the candidate sample set C.
Further, the adaptive expression calculation module is specifically configured to:
calculating the adaptive expression matrix X corresponding to the template set T using the following formula (1):

min_{x_i} ||t_i − C x_i||_2^2 + λ ||C Diag(x_i)||_*,  i = 1, 2, …, p+n   (1)

where x_i is the adaptive expression corresponding to the template t_i, the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x_1, x_2, …, x_{p+n}], the Diag(·) operator converts x_i into the diagonal matrix Diag(x_i), C is the video candidate sample set, λ is the regular term coefficient, p is the number of positive templates, and n is the number of negative templates; the above equation (1) is solved using the iterative reweighting algorithm.
Further, the target tracking module is specifically configured to:
constructing a weight matrix W, wherein the weight matrix is constructed by the formula (2):
W_{i,j} = exp(−||C_i − T_j||_2)   (2)

wherein C_i is the ith candidate sample and T_j is the jth template; W_{i,j} is the value in the ith row and jth column of the weight matrix W and represents the similarity between the ith candidate sample and the jth template;
performing point multiplication of the weight matrix W and the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; recording the re-weighted expression matrix as the expression map M and dividing it into two parts {M_pos, M_neg};
calculating the separability score sc_i of each candidate sample by formula (3):

sc_i = Σ M_{i,pos} − Σ M_{i,neg}   (3)

where M_{i,pos} is the ith row of the matrix M_pos, representing the separable features of the ith candidate sample on the positive template set, M_{i,neg} is the ith row of the matrix M_neg, representing the separable features of the ith candidate sample on the negative template set, and sc_i is the separability score of the ith candidate sample;

since a higher identification score means the candidate sample is more similar to the tracking target, selecting the candidate sample with the highest separability score as the target tracked in the current image frame.
Further, the tracking target template updating module is specifically configured to:
calculating the similarity vector η = (η_1, η_2, …, η_p), where η_i is the similarity, based on Euclidean distance, between the current tracking result and the ith positive template;

calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if ω is greater than θ, updating the positive template corresponding to the maximum similarity with the current tracking result;

selecting a specified number of negative templates near the current tracking result as the new negative template set T_neg, where each negative template satisfies a < ||Q_n − Q_k|| < b, with a and b the specified inner and outer thresholds, Q_n the center of the negative template, and Q_k the center of the current tracking result.
Example 3
As shown in fig. 1 to fig. 3, the present embodiment is different from embodiment 1 in that the present embodiment provides an adaptive representation-based target tracking terminal, which includes: a processor and a memory, wherein the memory stores program code, and when the program code is executed by the processor, the processor is caused to execute the steps of the adaptive expression-based target tracking method of embodiment 1.
Since the terminal is the device that performs the method of the embodiment of the present invention, and the principle by which the terminal solves the problem is similar to that of the method, the implementation of the terminal may refer to the implementation of the method; repeated details are not repeated.
Specifically, as shown in fig. 2, the terminal includes: a processor 200 and a memory 201, wherein the memory 201 stores program code that, when executed by the processor 200, causes the processor 200 to perform the steps of:
s10: acquiring a current video frame;
s11: checking whether the template exists, if not, executing S12 to initialize and track the target template; if the template already exists, executing S13 to generate a candidate sample in the current image frame;
s12: initializing a tracking target template: determining the position of a tracking target in a first frame image of a video through setting or manual designation, selecting a positive template and a negative template near the tracking target, and constructing a template set consisting of the positive template and the negative template, wherein each template is converted into a vector form in advance; the positive template in the invention is used for modeling the information contained in the tracking target, and the negative template in the invention is used for modeling the background information; by using the positive template and the negative template, when the tracked target is mixed with the background, the target can be identified by the target tracking method based on the adaptive expression, so that the target can be tracked;
s13: generating a candidate sample: selecting candidate samples in the current image frame according to the target position tracked in the previous image frame, and constructing a candidate sample set consisting of the candidate samples, wherein each candidate sample is converted into a vector form in advance;
s14: calculating an adaptive expression: taking a matrix formed by the candidate sample set obtained in the step S13 as a dictionary, taking a template as a target of sparse expression, and using a trace lasso as a regular term to calculate to obtain a matrix formed by self-adaptive expression, namely a self-adaptive expression matrix;
s15: tracking by expression: constructing a weight matrix, carrying out point multiplication on the weight matrix and the self-adaptive expression matrix obtained in the step S14 to obtain a re-weighted expression matrix, wherein each element in the re-weighted expression matrix represents separable characteristics of corresponding candidate samples, and calculating the identification score of each candidate sample through the separable characteristics, wherein the candidate sample with the highest identification score is the tracked target;
s16: whether the tracking task is completed or not is checked, and if the current image frame is the last frame of the video, the target tracking task is completed; if the current image frame is not the last frame of the video, executing S17 to update the tracking target template, and continuing the target tracking task;
s17: updating a tracking target template: updating the target template using the result tracked in the current image frame and returning to execute S10; since a new image frame needs to be read in the camera or in the video, and the operation of reading the current image frame needs to be performed each time in the method flow, after the target template is updated, the step to be performed is to read the new image frame, and after S17, the step returns to perform S10.
As an optional implementation, the processor 200 is specifically configured to:
S120: determining the initial position of the tracking target in the first image frame of the video through preset or manually designated coordinates; in a video tracking task, the initial position of the tracking target is generally marked manually according to the specific properties of the task, or is set directly.
S121: selecting a specified number of front templates to construct a front template set T near the position of a tracking target in a first frame imagepos=[t1,t2,…,tp]And the front template meets the following condition, | | Qp-Q0||<σ, wherein QpRepresenting the center of the front template, Q0Represents the center of the initial target, σ is a specified inner threshold; since the front template is an image block, the ith front template is converted into the form t of a vector for calculationi. In order to enable the front template to model information contained in the tracked target, the front template needs to be selected near the tracked target, so that the front template is similar to the tracked target, the information of the tracked target is reserved, the front template is slightly different from the tracked target, and the robustness of the method is improved.
Selecting a specified number of negative templates near the position of the tracking target in the first frame image to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], wherein each negative template satisfies the condition σ < ||Q_n − Q_0|| < σ̄, with Q_n the center of the negative template and σ̄ a specified outer threshold. Since each negative template is an image block, the j-th negative template is converted into vector form t_j for ease of computation. For the negative templates to model the background information, they are selected in an annular region (between the inner and outer thresholds) centered on the tracking target, so that each negative template differs considerably from the tracking target and naturally introduces background information.
Constructing the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}] composed of the positive and negative templates; this matrix includes all templates. The initial template set T thus contains both tracking target information and background information.
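As a rough numerical sketch of the template initialization in S120–S121 (an illustrative assumption, not the patent's reference implementation: the patch size, template counts, threshold values, and the helper name `sample_templates` are all chosen for demonstration), template centers can be drawn around the initial target center and the resulting image blocks vectorized:

```python
import numpy as np

def sample_templates(frame, q0, patch=(24, 24), p=10, n=30,
                     sigma=4.0, sigma_out=20.0, rng=None):
    """Sample p positive and n negative template vectors around center q0.

    Positive centers satisfy ||Q_p - Q_0|| < sigma; negative centers lie in
    the annulus sigma <= ||Q_n - Q_0|| < sigma_out (illustrative thresholds).
    """
    rng = np.random.default_rng(rng)
    h, w = patch

    def crop(center):
        # Cut an h-by-w image block around the center, then vectorize (t_i).
        cy, cx = int(round(center[0])), int(round(center[1]))
        block = frame[cy - h // 2: cy - h // 2 + h,
                      cx - w // 2: cx - w // 2 + w]
        return block.reshape(-1).astype(float)

    def draw(r_min, r_max, count):
        # Rejection-sample centers with r_min <= distance < r_max from q0.
        centers = []
        while len(centers) < count:
            offset = rng.uniform(-r_max, r_max, size=2)
            if r_min <= np.linalg.norm(offset) < r_max:
                centers.append(np.asarray(q0) + offset)
        return centers

    T_pos = np.stack([crop(c) for c in draw(0.0, sigma, p)], axis=1)
    T_neg = np.stack([crop(c) for c in draw(sigma, sigma_out, n)], axis=1)
    return np.hstack([T_pos, T_neg])   # T = [T_pos, T_neg]
```

Positive centers fall inside the inner radius so the templates stay similar to the target; negative centers fall in the annulus between the inner and outer thresholds, which introduces background appearance into T.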
As an optional implementation, the processor 200 is specifically configured to:
S130: obtaining the state variable z_{t-1} corresponding to the target of the previous frame according to the target position tracked in the previous image frame; the state variable z_{t-1} contains six mapping parameters: x-coordinate, y-coordinate, rotation angle, size, aspect ratio, and tilt direction, and these six mapping parameters locate the target position in the video image.
S131: sampling the state variable z_t of the candidate samples according to the motion model, i.e. sampling the state variables from the probability distribution

p(z_t | z_{t-1}) = N(z_t; z_{t-1}, Σ),

wherein Σ is a diagonal covariance matrix. It is difficult to sample candidate samples directly in the current frame image, so an indirect approach is chosen: the state variable z_t of each candidate sample is sampled, and the candidate samples are then taken from the image accordingly. The motion of the tracking target is simplified, without loss of generality, into a change of its corresponding state variable, and this change is modeled with a Gaussian distribution. The expectation of the Gaussian is the state variable z_{t-1} corresponding to the previous frame's target; the diagonal covariance matrix is determined by the specific properties of the tracking data and differs between tracking sequences.
S132: selecting candidate samples in the current image frame from the sampled state variables z_t and constructing the candidate sample set C. The sampled state variables determine the positions of the candidate samples; the candidate samples are converted into vector form and then form the candidate sample matrix C.
As an optional implementation, the processor 200 is specifically configured to:
In step S14, the following formula (1) is adopted to calculate the adaptive expression matrix X corresponding to the template set T:

x_i = argmin_{x_i} (1/2)·||t_i − C·x_i||_2^2 + λ·||C·Diag(x_i)||_*,  i = 1, 2, …, p+n    (1)

wherein x_i is the adaptive expression corresponding to the template t_i, and the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x_1, x_2, …, x_{p+n}]; the Diag(·) operator converts the adaptive expression x_i into the diagonal matrix Diag(x_i); ||·||_* denotes the trace (nuclear) norm of the trace lasso regularizer; C is the candidate sample set; λ is the regularization coefficient; p is the number of positive templates and n is the number of negative templates.
In step S14, the above equation (1) is solved using an iterative reweighting algorithm.
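The patent does not spell out its iterative reweighting steps. One standard iteratively-reweighted least-squares scheme for the trace lasso (from the variational form ||C·Diag(x)||_* = min_S (1/2)[Σ_j x_j² c_jᵀS⁻¹c_j + tr(S)], with optimal S = (C·Diag(x)²·Cᵀ)^{1/2}) alternates a ridge-style solve with a weight update. The sketch below follows that scheme under the assumption that it matches the solver intended here; μ is a small smoothing constant added for numerical stability:

```python
import numpy as np

def trace_lasso_irls(C, t, lam=0.01, mu=1e-6, iters=30):
    """Iteratively reweighted solver sketch for
       min_x 0.5*||t - C x||^2 + lam*||C Diag(x)||_*   (trace lasso).
    """
    d, m = C.shape
    x = np.linalg.lstsq(C, t, rcond=None)[0]      # least-squares warm start
    G = C.T @ C
    for _ in range(iters):
        # S = (C Diag(x)^2 C^T + mu*I)^{1/2}; build S^{-1} via eigh.
        A = C @ np.diag(x ** 2) @ C.T + mu * np.eye(d)
        vals, vecs = np.linalg.eigh(A)
        S_inv = (vecs / np.sqrt(vals)) @ vecs.T   # inverse of the sqrt of A
        # Per-column weights w_j = c_j^T S^{-1} c_j, then a ridge-style solve.
        w = np.einsum('ij,ik,kj->j', C, S_inv, C)
        x = np.linalg.solve(G + lam * np.diag(w), C.T @ t)
    return x
```

With λ small the solution stays close to least squares; larger λ lets the trace lasso interpolate between ℓ1-like and ℓ2-like behavior depending on how correlated the dictionary columns are.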
As an optional implementation, the processor 200 is specifically configured to:
S150: constructing a weight matrix W, wherein the weight matrix is constructed by formula (2):

W_{i,j} = exp(−||C_i − T_j||²)    (2)

wherein C_i is the i-th candidate sample and T_j is the j-th template; W_{i,j} is the entry in row i, column j of the weight matrix W and represents the similarity between the i-th candidate sample and the j-th template;
S151: performing point multiplication of the weight matrix W with the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; the re-weighted expression matrix is recorded as the expression map M and divided into two parts {M_pos, M_neg};
S152: calculating a separability score sc_i for each candidate sample by formula (3):

sc_i = Σ M_{i,pos} − Σ M_{i,neg}    (3)

wherein M_{i,pos} is the i-th row of the matrix M_pos and represents the separable features of the i-th candidate sample on the positive template set, M_{i,neg} is the i-th row of the matrix M_neg and represents the separable features of the i-th candidate sample on the negative template set, and sc_i is the separability score of the i-th candidate sample;
S153: the higher the separability score, the more similar the candidate sample is to the tracking target; therefore the candidate sample with the highest separability score is selected as the target tracked in the current image frame.
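Steps S150–S153 reduce to a few array operations. A numpy sketch, assuming candidates and templates are stored as columns and the expression matrix X has one row per candidate and one column per template (raw pixel vectors would underflow exp(−‖·‖²), so normalized features are assumed):

```python
import numpy as np

def track_by_expression(C, T, X, p):
    """Score candidates: W_{i,j} = exp(-||C_i - T_j||^2), M = W * X,
    sc_i = sum(M_i,pos) - sum(M_i,neg); return the best candidate index.

    C: (d, num_candidates); T: (d, p+n), first p columns are positive
    templates; X: (num_candidates, p+n) adaptive expression matrix.
    """
    diff = C[:, :, None] - T[:, None, :]            # (d, nc, p+n)
    W = np.exp(-np.sum(diff ** 2, axis=0))          # pairwise similarities
    M = W * X                                       # re-weighted expression map
    scores = M[:, :p].sum(axis=1) - M[:, p:].sum(axis=1)  # separability
    return int(np.argmax(scores)), scores
```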
As an optional implementation, the processor 200 is specifically configured to:
S170: calculating the similarity vector η = (η_1, η_2, …, η_p), wherein η_i is the similarity, based on Euclidean distance, between the current tracking result and the i-th positive template;
S171: calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if the maximum similarity ω is larger than the update threshold θ, the positive template corresponding to the maximum similarity is updated with the current tracking result;
S172: selecting a specified number of negative templates near the current tracking result as the new negative template set T_neg, wherein each negative template satisfies the condition a < ||Q_n − Q_k|| < σ̄, with a and σ̄ the specified inner and outer thresholds, and Q_n and Q_k the centers of the negative template and of the current tracking result, respectively.
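The positive-template update in S170–S171 can be sketched as follows; mapping the Euclidean distance d to a similarity via exp(−d) is an assumed choice, since the text only states that the similarity is based on Euclidean distance:

```python
import numpy as np

def update_templates(T_pos, result_vec, theta=0.9):
    """Replace the most similar positive template with the current tracking
    result when its similarity exceeds the update threshold theta.

    T_pos: (d, p) positive templates as columns; result_vec: (d,) tracked
    target vector. exp(-distance) as similarity is an assumption here.
    """
    eta = np.exp(-np.linalg.norm(T_pos - result_vec[:, None], axis=0))
    omega, idx = eta.max(), int(eta.argmax())
    if omega > theta:                    # update only confident matches
        T_pos = T_pos.copy()
        T_pos[:, idx] = result_vec
    return T_pos
```

The negative template set is then rebuilt around the new result as in S172, using the same annulus sampling as at initialization.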
Example 4
As shown in fig. 1 to 3, the present embodiment is different from embodiment 1 in that the present embodiment provides a computer storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the adaptive expression-based target tracking method in embodiment 1.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A target tracking method based on adaptive expression is characterized by comprising the following steps:
s10: acquiring a current video frame;
s11: checking whether the template exists, if not, executing S12 to initialize and track the target template; if the template already exists, executing S13 to generate a candidate sample in the current image frame;
s12: initializing a tracking target template: determining the position of a tracking target in a first frame image of a video through setting or manual designation, selecting a positive template and a negative template near the tracking target, and constructing a template set consisting of the positive template and the negative template, wherein each template is converted into a vector form in advance;
S13: generating candidate samples: selecting candidate samples in the current image frame according to the target position tracked in the previous image frame, and constructing a candidate sample set from these samples, wherein each candidate sample is converted into vector form in advance;
S14: calculating the adaptive expression: taking the matrix formed by the candidate sample set obtained in step S13 as a dictionary and the templates as the targets of sparse expression, and using the trace lasso as the regularization term, computing the matrix formed by the adaptive expressions, i.e. the adaptive expression matrix;
S15: tracking by expression: constructing a weight matrix and point-multiplying it with the adaptive expression matrix obtained in step S14 to obtain a re-weighted expression matrix, wherein each element of the re-weighted expression matrix represents a separable feature of the corresponding candidate sample; the identification score of each candidate sample is calculated from these separable features, and the candidate sample with the highest identification score is the tracked target;
S16: checking whether the tracking task is complete: if the current image frame is the last frame of the video, the target tracking task is finished; if the current image frame is not the last frame, executing S17 to update the tracking target template and continuing the target tracking task;
s17: updating a tracking target template: the target template is updated using the result tracked in the current image frame, and execution returns to S10.
2. The method for tracking an object based on adaptive expression of claim 1, wherein the step S12 specifically includes:
s120: determining an initial position of a tracking target in a first frame image frame of a video through set or manually designated coordinates;
S121: selecting a specified number of positive templates near the position of the tracking target in the first frame image to construct a positive template set T_pos = [t_1, t_2, …, t_p], wherein each positive template satisfies the condition ||Q_p − Q_0|| < σ, with Q_p the center of the positive template, Q_0 the center of the initial target, and σ a specified inner threshold;
selecting a specified number of negative templates near the position of the tracking target in the first frame image to form a negative template set T_neg = [t_{p+1}, t_{p+2}, …, t_{p+n}], wherein each negative template satisfies the condition σ < ||Q_n − Q_0|| < σ̄, with Q_n the center of the negative template and σ̄ a specified outer threshold;
constructing the template set T = [T_pos, T_neg] = [t_1, t_2, …, t_{p+n}] composed of the positive and negative templates; this matrix includes all templates.
3. The method for tracking an object based on adaptive expression of claim 1, wherein the step S13 specifically includes:
S130: obtaining the state variable z_{t-1} corresponding to the target of the previous frame according to the target position tracked in the previous image frame;
S131: sampling the state variable z_t of the candidate samples according to the motion model, i.e. sampling the state variables from the probability distribution p(z_t | z_{t-1}) = N(z_t; z_{t-1}, Σ), wherein Σ is a diagonal covariance matrix;
S132: selecting candidate samples in the current image frame from the sampled state variables z_t and constructing the candidate sample set C.
4. The method for tracking an object based on adaptive expression of claim 1, wherein the adaptive expression matrix X corresponding to the template set T is calculated in step S14 by using the following formula (1):

x_i = argmin_{x_i} (1/2)·||t_i − C·x_i||_2^2 + λ·||C·Diag(x_i)||_*,  i = 1, 2, …, p+n    (1)

wherein x_i is the adaptive expression corresponding to the template t_i, the adaptive expression matrix X is the set of adaptive expressions, i.e. X = [x_1, x_2, …, x_{p+n}], the Diag(·) operator converts the adaptive expression x_i into the diagonal matrix Diag(x_i), ||·||_* denotes the trace (nuclear) norm, C is the candidate sample set, λ is the regularization coefficient, p is the number of positive templates, and n is the number of negative templates;
in step S14, the above equation (1) is solved using an iterative reweighting algorithm.
5. The method for tracking an object based on adaptive expression of claim 1, wherein the step S15 specifically includes:
S150: constructing a weight matrix W, wherein the weight matrix is constructed by formula (2):

W_{i,j} = exp(−||C_i − T_j||²)    (2)

wherein C_i is the i-th candidate sample, T_j is the j-th template, and W_{i,j} is the entry in row i, column j of the weight matrix W, representing the similarity between the i-th candidate sample and the j-th template;
S151: performing point multiplication of the weight matrix W with the adaptive expression matrix X to obtain the re-weighted expression matrix M, i.e. M = W ⊙ X; the re-weighted expression matrix is recorded as the expression map M and divided into two parts {M_pos, M_neg};
S152: calculating a separability score sc_i for each candidate sample by formula (3):

sc_i = Σ M_{i,pos} − Σ M_{i,neg}    (3)

wherein M_{i,pos} is the i-th row of the matrix M_pos, representing the separable features of the i-th candidate sample on the positive template set, M_{i,neg} is the i-th row of the matrix M_neg, representing the separable features of the i-th candidate sample on the negative template set, and sc_i is the separability score of the i-th candidate sample;
S153: selecting the candidate sample with the highest separability score as the target tracked in the current image frame.
6. The method for tracking an object based on adaptive expression of claim 1, wherein the step S17 specifically includes:
S170: calculating the similarity vector η = (η_1, η_2, …, η_p), wherein η_i is the similarity, based on Euclidean distance, between the current tracking result and the i-th positive template;
S171: calculating the maximum similarity ω = max(η) and comparing it with the update threshold θ; if the maximum similarity ω is larger than the update threshold θ, updating the positive template corresponding to the maximum similarity with the current tracking result;
S172: selecting a specified number of negative templates near the current tracking result as the new negative template set T_neg, wherein each negative template satisfies the condition a < ||Q_n − Q_k|| < σ̄, with a and σ̄ the specified inner and outer thresholds, and Q_n and Q_k the centers of the negative template and of the current tracking result, respectively.
7. The target tracking method based on adaptive expression, characterized in that the method is applied to an intelligent driver-assistance system of an automobile, wherein road information ahead is acquired in real time in the form of images through a vehicle-mounted camera, so as to achieve real-time tracking of the vehicle target ahead.
8. An adaptive representation-based target tracking apparatus, comprising:
the acquisition module is used for acquiring the current image frame of the video;
the first judging module is used for judging whether the template exists or not;
the initialization setting module is used for setting the initial position of the tracking target in the video image and the corresponding parameters, including but not limited to the state-variable variance and the positive template update threshold, and for starting the tracking task;
the tracking target template building module is used for building the initial positive and negative templates of the tracking target near the designated initial position of the tracking target;
the tracking target sample generation module is used for randomly sampling the current image frame of the tracking task to generate a sample of a tracking target;
the adaptive expression calculation module is used for calculating the adaptive expression of the template on the candidate sample;
the target tracking module is used for tracking the target in the current frame according to the adaptive expression obtained through calculation;
the second judgment module is used for judging whether the tracking task is completed or not;
and the tracking target template updating module is used for updating the positive template and the negative template of the tracking target according to the current tracking result in the tracking process.
9. An adaptive representation-based target tracking terminal, the terminal comprising: a processor and a memory, wherein the memory has stored therein a program code, which when executed by the processor causes the processor to perform the steps of the adaptive expression based object tracking method of any one of claims 1 to 7.
10. A computer storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of an adaptive representation-based object tracking method according to any one of claims 1 to 7.
CN202010192778.2A 2020-03-18 2020-03-18 Target tracking method, device and terminal based on adaptive expression Active CN111429481B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010192778.2A CN111429481B (en) 2020-03-18 2020-03-18 Target tracking method, device and terminal based on adaptive expression


Publications (2)

Publication Number Publication Date
CN111429481A true CN111429481A (en) 2020-07-17
CN111429481B CN111429481B (en) 2022-08-05

Family

ID=71548059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010192778.2A Active CN111429481B (en) 2020-03-18 2020-03-18 Target tracking method, device and terminal based on adaptive expression

Country Status (1)

Country Link
CN (1) CN111429481B (en)


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544483A (en) * 2013-10-25 2014-01-29 合肥工业大学 United target tracking method based on local sparse representation and system thereof
CN103810252A (en) * 2014-01-21 2014-05-21 南京信息工程大学 Image retrieval method based on group sparse feature selection
CN104239902A (en) * 2014-09-12 2014-12-24 西安电子科技大学 Hyper-spectral image classification method based on non-local similarity and sparse coding
CN106570884A (en) * 2016-09-30 2017-04-19 天津大学 On-line dictionary updating model-based object tracking method
CN107203747A (en) * 2017-05-17 2017-09-26 江南大学 Sparse conjunctive model method for tracking target based on adaptively selected mechanism
CN107784664A (en) * 2017-12-05 2018-03-09 韶关学院 A kind of fast robust method for tracking target sparse based on K
CN108492312A (en) * 2018-02-26 2018-09-04 大连大学 Visual tracking method based on reverse rarefaction representation under illumination variation
CN109003291A (en) * 2018-07-11 2018-12-14 上饶师范学院 Method for tracking target and device
CN109492671A (en) * 2018-10-16 2019-03-19 北京理工大学 A kind of multipolarization High Range Resolution target identification method based on LSTM
US20190158270A1 (en) * 2017-11-21 2019-05-23 International Business Machines Corporation Exchanging Asset, Maintenance, And Spares Parts Information Via Blockchain


Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
EDOUARD GRAVE: "Trace Lasso: a trace norm regularization for correlated designs", HTTPS://PROCEEDINGS.NEURIPS.CC/PAPER/2011/HASH/33CEB07BF4EEB3DA587E268D663ABA1A-ABSTRACT.HTML *
JIAN-XUN MI: "Adaptive Class Preserving Representation for Image Classification", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
KALAL, ZDENEK: "Tracking-Learning-Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence *
ZHI HAN: "Active Trace: A Sparse Spatiotemporal Representation for Videos", IEEE Access (Volume 5) *
LIU Haicang (刘海仓): "Research on image super-resolution and target tracking methods based on sparse representation", China Doctoral Dissertations Full-text Database (Information Science and Technology) *
KANG Wei (康伟): "Research on target tracking algorithms based on correlation filtering and metric learning", China Master's Theses Full-text Database (Information Science and Technology) *
ZHANG Weidong (张伟东) et al.: "Target tracking algorithm based on feature selection and temporally consistent sparse appearance model", Pattern Recognition and Artificial Intelligence *
YANG Honghong (杨红红), QU Shiru (曲仕茹): "Traffic target tracking based on sparsity-constrained deep learning", China Journal of Highway and Transport *
SU Qiaoping (苏巧平): "Multiple-instance learning target tracking algorithm based on sparse expression", Computer Engineering *
HUANG Hongtu (黄宏图): "A survey of video target tracking based on sparse representation", Acta Automatica Sinica *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070805A (en) * 2020-09-10 2020-12-11 深圳市豪恩汽车电子装备股份有限公司 Motor vehicle target real-time image tracking device and method
CN113486820A (en) * 2021-07-09 2021-10-08 厦门理工学院 Bidirectional target tracking method and system based on efficient template updating and selecting mechanism
CN113486820B (en) * 2021-07-09 2023-06-06 厦门理工学院 Bidirectional target tracking method and system based on efficient template updating and selecting mechanism
CN116108091A (en) * 2022-12-26 2023-05-12 小米汽车科技有限公司 Data processing method, event tracking analysis method, device, equipment and medium
CN116108091B (en) * 2022-12-26 2024-01-23 小米汽车科技有限公司 Data processing method, event tracking analysis method, device, equipment and medium

Also Published As

Publication number Publication date
CN111429481B (en) 2022-08-05

Similar Documents

Publication Publication Date Title
CN111429481B (en) Target tracking method, device and terminal based on adaptive expression
CN111915644B (en) Real-time target tracking method of twin guide anchor frame RPN network
CN109711283A (en) A kind of joint doubledictionary and error matrix block Expression Recognition algorithm
CN104424629A (en) X-ray chest radiography lung segmentation method and device
CN101281648A (en) Method for tracking dimension self-adaption video target with low complex degree
CN110208660B (en) Training method and device for diagnosing partial discharge defects of power equipment
CN109087337B (en) Long-time target tracking method and system based on hierarchical convolution characteristics
Zhang et al. Sparse learning-based correlation filter for robust tracking
Yang et al. Visual tracking with long-short term based correlation filter
CN109784155B (en) Visual target tracking method based on verification and error correction mechanism and intelligent robot
CN114387513A (en) Robot grabbing method and device, electronic equipment and storage medium
CN106485750A (en) A kind of estimation method of human posture based on supervision Local Subspace
Cheng et al. Parallel vertex diffusion for unified visual grounding
CN115239760A (en) Target tracking method, system, equipment and storage medium
Wu et al. An unsupervised real-time framework of human pose tracking from range image sequences
CN117894065A (en) Multi-person scene behavior recognition method based on skeleton key points
CN112614158B (en) Sampling frame self-adaptive multi-feature fusion online target tracking method
CN112733758B (en) Curve-based three-dimensional face recognition method and system under Riemann geometric invariance
Wang et al. Robot grasping in dense clutter via view-based experience transfer
CN111914751B (en) Image crowd density identification detection method and system
CN112507940A (en) Skeleton action recognition method based on difference guidance representation learning network
CN112906432A (en) Error detection and correction method applied to human face key point positioning task
CN112633078A (en) Target tracking self-correcting method, system, medium, equipment, terminal and application
CN113362372B (en) Single target tracking method and computer readable medium
Li et al. Spatial-temporal graph Transformer for object tracking against noise spoofing interference

Legal Events

- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant
- TR01: Transfer of patent right
  Effective date of registration: 20240404
  Address after: 230000 Floor 1, Building 2, Phase I, E-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province
  Patentee after: Dragon Totem Technology (Hefei) Co., Ltd.
  Country or region after: China
  Address before: 400000 No. 2 Chongwen Road, Nanshan Street, Nan'an District, Chongqing
  Patentee before: Chongqing University of Posts and Telecommunications
  Country or region before: China