CN114372997A - Target tracking method based on quality and similarity evaluation online template updating - Google Patents

Target tracking method based on quality and similarity evaluation online template updating

Info

Publication number
CN114372997A
CN114372997A (application number CN202111476809.8A)
Authority
CN
China
Prior art keywords
template
frame
pool
target tracking
templates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111476809.8A
Other languages
Chinese (zh)
Inventor
李雅倩
赵明
肖存军
李海滨
张文明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanshan University
Original Assignee
Yanshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanshan University filed Critical Yanshan University
Priority to CN202111476809.8A
Publication of CN114372997A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G06T 7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/251 - Analysis of motion using feature-based methods involving models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target tracking method with online template updating based on quality and similarity evaluation, comprising the following steps: S1, generating N augmented templates $\{T_n^{t_0}\}_{n=1}^{N}$ from the initial template frame and establishing a template pool of size M; S2, extracting features of the template frame and the search frame with the target tracking module, obtaining a response score map by convolutional response, and evaluating the quality of the current new template $T^{t_c}$ according to a quality evaluation index; S3, for a template that passes the quality evaluation, measuring the cosine similarity between $T^{t_c}$ and the templates in the template pool to decide whether the new template needs to be added to the pool; S4, fusing the templates in the template pool with template-specific weights to obtain the final template at time $t_i$; and S5, convolving the feature maps extracted from the template frame and the search frame with dilated convolution layers of different dilation rates and aspect ratios, then performing feature fusion.

Description

Target tracking method based on quality and similarity evaluation online template updating
Technical Field
The invention relates to video image processing technology, and in particular to a target tracking method with online template updating based on quality and similarity evaluation.
Background
Artificial intelligence technology is being applied ever more widely and has attracted broad attention across industries. Target tracking is one of its most important branches and has developed very rapidly. The main task of target tracking is to detect the initial position and size of the target and to keep the target locked so that it is not lost from the field of view. With the continuous development of computer vision, higher requirements are placed on video processing technology; target tracking has therefore received great attention and has broad application prospects.
Target tracking means that, given the position and size of the target in the first frame, the target's position and size are located in subsequent frames. With continuous improvement of algorithms, tracking performance has improved greatly. However, target tracking is still challenged by drastic changes in target appearance, motion blur, interference from similar objects, occlusion, and so on. These challenges make the tracked target prone to drift, resulting in tracking failures.
Tracking results are mainly evaluated by accuracy, robustness, average overlap rate, and speed. Accuracy measures the overlap between the predicted target box and the ground-truth box: the higher the overlap, the better the tracker performs, and vice versa. Robustness measures the ability to recover after a tracking failure, that is, to re-identify the target in unfamiliar image sequences; the lower the robustness value, the better the tracker's adaptability and performance. Real-time performance means the tracker must at least keep up with video playback without stalling, which requires a speed of at least 24 FPS.
Object tracking is essentially continuous object detection over an image sequence; the result of the previous frame affects the result of the next frame. A tracker essentially consists of three steps: feature extraction, target feature matching, and determination of the target's position and size. Feature extraction preprocesses the target: the original image cannot be matched directly without preprocessing, so it must be cropped and modeled to obtain features of the target, and these features are the basis for identifying it. Target feature matching compares the features of the current frame with the extracted target features to find a region that conforms to them; the region whose features differ least from the target features is generally regarded as the tracked target being sought. Determining the target's position and size yields the matched target region, including its size and location, which is the output of the tracker. Current tracking algorithms fall mainly into two categories: traditional target tracking algorithms and deep-learning-based tracking algorithms. Traditional algorithms generally use a generative model, typically filtering-based trackers such as Kalman filtering and particle filtering; deep-learning-based trackers are mainly discriminative, with high accuracy and strong robustness.
Traditional target tracking algorithms run fast, require little data, and suit simple application scenes, but their accuracy and robustness are poor and they cannot handle complex scenes. Deep-learning-based tracking algorithms have high accuracy and adaptability and can handle tracking tasks in complex scenes, but they need more data, have larger models, and run more slowly. In current deep-learning-based Siamese network tracking, the initial template alone cannot cope with severe deformation, occlusion, interference from similar objects, and other problems that arise as tracking continues over time. The target template therefore needs to be updated so that the tracker can adapt to deformation, occlusion, and similar challenges.
The processing flow of a prior-art target tracking system based on a Siamese network with template updating is shown in FIG. 1; the specific steps are:
1. Read images frame by frame and preprocess them; given the position and size of the target in the first frame, define the processed and cropped image as the template frame;
2. Extract features from the template frame and the search frame and perform a cross-correlation operation to obtain a score map and a regression map;
3. Select the position with the maximum score as the new position and size of the target;
4. Take the current frame as a new template, store it in a fixed-size template pool, assign each template a weight via a convolution layer, and then concatenate them to form a new template;
5. Display the tracking result of the current frame;
6. Repeat steps 2-5 until the whole video sequence has been processed.
the target tracking system scheme based on the twin network in the prior art has the following disadvantages:
1. Deep learning models are already large with many parameters, and the above scheme updates the template every frame, which adds a huge amount of computation and inevitably reduces tracking speed. Moreover, after long accumulation the weight of the initial template is continuously diluted; once the target is lost, subsequent frames find it hard to re-acquire the target, even though the initial template frame is known to be reliable;
2. The above update scheme alleviates model degradation to some extent, but it does not consider the reliability of the templates added to the pool. When unreliable templates enter the template pool, the pool is polluted, making it harder to locate the target in subsequent frames; the resulting new templates are then unreliable, errors accumulate continuously, and the target is lost more easily.
A prior-art target tracking process based on Siamese network feature fusion is shown in FIG. 2; the receptive field is enlarged mainly through two layers of dilated convolution so that more multi-scale information can be acquired.
The prior-art target tracking process based on Siamese network feature fusion has the following defects:
1. Obtaining a larger receptive field through two layers of dilated convolution loses part of the information, which persists in the form of holes in the sampling grid; information loss in this kind of feature fusion is therefore unavoidable;
2. The above process uses dilated convolutions with equal aspect ratio (1:1), so the hole positions always coincide; no matter how many dilated convolution layers are stacked, the lost part of the information cannot be recovered.
Disclosure of Invention
The invention provides a target tracking method with online template updating based on quality and similarity evaluation, which solves the problems of existing template updating and feature fusion.
In order to solve the above technical problems, the technical scheme adopted by the invention is as follows: a target tracking method with online template updating based on quality and similarity evaluation, comprising the following steps:

S1, generating N augmented templates $\{T_n^{t_0}\}_{n=1}^{N}$ from the initial template frame and establishing a template pool of size M;

S2, extracting features of the template frame and the search frame with the target tracking module, obtaining a response score map by convolutional response, and evaluating the quality of the current new template $T^{t_c}$ according to the quality evaluation index;

S3, for a template that passes the quality evaluation, measuring the cosine similarity between $T^{t_c}$ and the templates in the template pool to decide whether the new template needs to be added to the pool;

S4, fusing the templates in the template pool with template-specific weights to obtain the final template at time $t_i$;

and S5, convolving the feature maps extracted from the template frame and the search frame with dilated convolution layers of different dilation rates and aspect ratios, then performing feature fusion.
The technical scheme of the invention is further improved as follows: in step S1, the target tracking module performs data augmentation on the video picture at the initial time $t_0$; according to the target tracking task, rotation, translation, scale transformation, and flipping operations are applied to the given target to obtain templates of the target in different poses, which are stored in a template pool of size M.
The technical scheme of the invention is further improved as follows: in step S2, features are extracted from the template frame and the search frame by the backbone network of the target tracking module according to the provided image data; convolution operations are applied to the obtained feature maps to obtain a classification map and a regression map, and quality evaluation is performed on the classification map.
The technical scheme of the invention is further improved as follows: the quality evaluation index is computed as:

$$A = \alpha_1 \frac{F_{\max}}{\operatorname{mean}(F_{\max})} + \alpha_2 \frac{APCE}{\operatorname{mean}(APCE)}$$

$$APCE = \frac{\left|F_{\max} - F_{\min}\right|^{2}}{\operatorname{mean}\!\left(\sum_{i}\left(F_{i} - F_{\min}\right)^{2}\right)}$$

where A denotes the quality assessment value, $\alpha_1$ is the weight parameter for the degree of fluctuation of the maximum score, $\alpha_2$ is the weight parameter for the degree of fluctuation of the multi-peak detection value, $F_{\max}$ is the maximum of the current classification score, $F_{\max}/\operatorname{mean}(F_{\max})$ expresses the degree of score fluctuation, $\operatorname{mean}(F_{\max})$ is the mean of the maximum classification scores of the historical frames, $\operatorname{mean}(APCE)$ is the mean peak energy of the historical frames, APCE is the current average peak-to-correlation energy, $F_{\min}$ is the minimum of the current frame's classification score, and $F_i$ is each value of the classification score.
The technical scheme of the invention is further improved as follows: in step S3, the current template $T^{t_c}$ is compared with each template in the template pool by cosine similarity, computed as:

$$S = \left\{\operatorname{COS}\!\left(T^{t_c},\, T_i\right)\right\}_{i=1}^{M}, \qquad \operatorname{COS}(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$$

where $T^{t_c}$ denotes the new template of the current frame, $T_i$ denotes the templates in the template pool, S denotes the set of cosine similarity metrics, COS denotes the cosine similarity, and i is the template index within the pool.
The technical scheme of the invention is further improved as follows: in step S4, all templates in the template pool have different weights; apart from the initial template, the other templates are assigned weights according to their distance from the current frame template, with the specific weight assignment formula:

$$Tgt_n = \begin{cases} \beta, & n = 1 \\[4pt] (1-\beta)\,\dfrac{n}{\sum_{k=2}^{N} k}, & 2 \le n \le N \end{cases}$$

where $Tgt_n$ denotes the weight of each template in the pool, N denotes the number of templates stored in the pool, β denotes the weight of the initial template, and $n / \sum_{k=2}^{N} k$ normalizes the subsequent template weights;

the final output template is:

$$T_{new} = \sum_{n=1}^{N} Tgt_n \, T^{t_n}$$

where $T_{new}$ denotes the resulting final matching template, $n \in [1, N]$, and $T^{t_n}$ denotes the historical frame templates.
The technical scheme of the invention is further improved as follows: in step S5, the dilated convolution layer consists of four 3 × 3 convolutions with dilation rates $(m, n) \in \{(1,1), (1,2), (2,1), (2,2)\}$, and the feature fusion process can be expressed as:

$$f_{out} = \sum_{(m,n)} \varphi_{m,n}\!\left(f_T\right) \star \varphi_{m,n}\!\left(f_S\right)$$

where $f_T$ denotes the template frame features, $f_S$ denotes the search frame features, $\varphi_{m,n}$ denotes a single dilated convolution, and $\star$ denotes the cross-correlation operation.
Due to the adoption of the above technical scheme, the invention achieves the following technical progress:
1. The invention performs data augmentation from the initial template frame and establishes a fixed-size template pool to store updated templates. The template frame and the search frame are each convolved by dilated convolutions with different dilation rates and aspect ratios and then fused by correlation to obtain a classification score map and a regression map; multi-peak detection is computed from the classification scores to judge the reliability of a new template, and the new template is then compared for similarity with the templates in the pool to judge the necessity of updating. Deciding through these two indices whether a new template should enter the pool avoids updating every frame, which speeds up tracking, while guaranteeing the reliability of the templates in the pool. The invention can be applied to real-time tracking of particular targets under natural conditions, such as video surveillance and autonomous driving;
2. The template updating method provided by the invention considers the reliability of a new template, reducing template pool pollution and improving model robustness; it also considers the necessity of template updating, reducing information redundancy and improving model accuracy. In addition, a dilated-convolution feature fusion scheme with different dilation rates and aspect ratios is proposed, which obtains a larger receptive field without reducing information and better performs multi-scale estimation of the target.
Drawings
FIG. 1 is a schematic diagram of a prior-art target tracking system based on a Siamese network with template updating;
FIG. 2 is a schematic diagram of a prior-art target tracking process based on Siamese network feature fusion;
FIG. 3 is a flowchart of the target tracking implementation process with online template updating based on quality and similarity evaluation according to an embodiment of the invention;
FIG. 4 is a schematic diagram of dilated convolution provided in an embodiment of the invention;
FIG. 5 is a schematic diagram of the processing procedure of the template update strategy according to an embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples:
As shown in FIG. 3, the invention provides a target tracking method with online template updating based on quality and similarity evaluation, comprising the following steps:

S1, generating N augmented templates $\{T_n^{t_0}\}_{n=1}^{N}$ from the initial template frame and establishing a template pool of size M. The specific steps are as follows:

A GPU (graphics processing unit) can be used for accelerated computation; the invention performs image processing on computer hardware such as a GPU. The target tracking module performs data augmentation on the video picture at the initial time $t_0$: according to the target tracking task, rotation, translation, scale transformation, and flipping operations are applied to the given target to obtain templates of the target in different poses, which are then stored in a template pool of size M.
S2, extracting features of the template frame and the search frame with the target tracking module, obtaining a response score map by convolutional response, and evaluating the quality of the current new template $T^{t_c}$ according to the quality evaluation index. The specific steps are as follows:

Given the image data, the template frame and the search frame are input into a Siamese network. Because the Siamese network shares weights between its two branches, many parameters are saved and computation is accelerated. Features are extracted from the template frame and the search frame by the backbone network of the target tracking module, convolution operations are applied to the resulting feature maps to obtain a classification map and a regression map, and quality evaluation is performed on the classification score map.
The invention is illustrated by the following specific examples:
s21, image feature extraction:
at the initial time, namely, at the time when T is 0, in the tracking task, one of the template frames at the initial time gives the position information and the scale information of the target, N incremental templates have been generated through step S1, at this time, the information of the incremental templates is obtained through the initial template frame, and there is no information of the subsequent frames, so that the new template T obtained by fusion in the template pool can be obtainednewViewed as an initial template frame
Figure BDA0003393738140000072
At an initial time, that is, at a time when t is 0, an initial template frame (with a size of 127 × 127 × 3) and a search frame (with a size of 303 × 303 × 3), where 3 denotes that they both have 3 color channels, and feature matrices of 7 × 7 × 256 and 31 × 31 × 256 are obtained through extraction of a backbone network, respectively, where the backbone network is composed of five convolutional layers and two pooling layers, where the convolutional core size of the first layer is 11 × 11 and the step size is 2; the second layer is a pooling layer with the size of 3 multiplied by 3 and the step length of 2; the convolution kernel size of the third layer is 5 multiplied by 5, and the step length is 1; the fourth layer is a pooling layer with the size of 3 multiplied by 3 and the step length of 2; the fifth layer is a convolution layer, the size of a convolution kernel is 3 multiplied by 3, and the step length is 1; the sixth layer is a convolutional layer with a convolutional kernel size of 3 × 3 with a step size of 1, the seventh layer is a convolutional layer with a convolutional kernel size of 3 × 3 with a step size of 1.
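A sketch of the described backbone follows; the channel widths (96/256/384/384/256) are assumptions in the AlexNet style, since the description fixes only kernel sizes and strides, and exact output sizes depend on padding conventions:

```python
import torch
import torch.nn as nn

# Backbone per the description: 5 conv layers + 2 pooling layers.
# Channel widths are assumed; the text fixes only kernels and strides.
backbone = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=2),   # layer 1: 11x11 conv, stride 2
    nn.MaxPool2d(kernel_size=3, stride=2),        # layer 2: 3x3 pool, stride 2
    nn.Conv2d(96, 256, kernel_size=5, stride=1),  # layer 3: 5x5 conv, stride 1
    nn.MaxPool2d(kernel_size=3, stride=2),        # layer 4: 3x3 pool, stride 2
    nn.Conv2d(256, 384, kernel_size=3, stride=1), # layer 5: 3x3 conv, stride 1
    nn.Conv2d(384, 384, kernel_size=3, stride=1), # layer 6: 3x3 conv, stride 1
    nn.Conv2d(384, 256, kernel_size=3, stride=1), # layer 7: 3x3 conv, stride 1
)

z = backbone(torch.rand(1, 3, 127, 127))  # template features, on the order of 7 px wide
x = backbone(torch.rand(1, 3, 303, 303))  # search features, on the order of 31 px wide
```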
S22, feature fusion:

After the initial frame template is processed by the backbone network, a 7 × 7 × 256 feature matrix is obtained; similarly, the search frame yields a 31 × 31 × 256 feature matrix. Both then pass through four dilated convolution layers, as follows. The 7 × 7 × 256 template feature matrix is input to the first dilated convolution layer, whose kernel is 3 × 3 with dilation rate (1,1), equivalent to an ordinary 3 × 3 convolution; the 31 × 31 × 256 search feature matrix is processed by a layer with the same parameters, and the two resulting feature maps are cross-correlated to obtain a feature map of size 25 × 25 × 256. The template feature matrix is likewise input to a second dilated convolution layer with a 3 × 3 kernel and dilation rate (1,2), the search feature matrix is processed the same way, and cross-correlation again yields a 25 × 25 × 256 feature map. The third layer uses a 3 × 3 kernel with dilation rate (2,1) and the fourth layer a 3 × 3 kernel with dilation rate (2,2), each producing a 25 × 25 × 256 feature map in the same manner. Thus the template frame and the search frame each undergo four dilated convolutions, and finally the four 25 × 25 × 256 feature maps obtained from the four correlation operations are fused with equal weights.
S23, obtaining the classification branch and the regression branch:

The obtained 25 × 25 × 256 feature map is passed through three convolutions with channel compression to obtain a classification branch, a quality evaluation branch, and a regression branch, respectively. The classification branch feature map is 19 × 19 × 1, the quality evaluation branch feature map is 19 × 19 × 1, and the regression branch feature map is 19 × 19 × 4. The classification branch scores each sample location, the quality evaluation branch assigns higher weight to the important part of the classification score, and the regression branch estimates the distances t = (l, t, r, b) from the target center to the left, top, right, and bottom edges of the target bounding box.
S24, quality evaluation: according to the template $T^{t_c}$ and the classification branch feature map provided by the tracking result, quality evaluation is performed on the classification score map. The quality evaluation ensures the reliability of the template; the quality evaluation index is computed as:

$$A = \alpha_1 \frac{F_{\max}}{\operatorname{mean}(F_{\max})} + \alpha_2 \frac{APCE}{\operatorname{mean}(APCE)}$$

$$APCE = \frac{\left|F_{\max} - F_{\min}\right|^{2}}{\operatorname{mean}\!\left(\sum_{i}\left(F_{i} - F_{\min}\right)^{2}\right)}$$

where A denotes the quality assessment value, $\alpha_1$ is the weight parameter for the degree of fluctuation of the maximum score, $\alpha_2$ is the weight parameter for the degree of fluctuation of the multi-peak detection value, $F_{\max}$ is the maximum of the current classification score, $F_{\max}/\operatorname{mean}(F_{\max})$ expresses the degree of score fluctuation, $\operatorname{mean}(F_{\max})$ is the mean of the maximum classification scores of the historical frames, $\operatorname{mean}(APCE)$ is the mean peak energy of the historical frames, APCE is the current average peak-to-correlation energy, $F_{\min}$ is the minimum of the current frame's classification score, and $F_i$ is each value of the classification score.

After numerous experiments, this text takes $\alpha_1 = 1$ and $\alpha_2 = 2$ and sets the threshold on the quality value A to 1.8: when A is below 1.8, the quality of the current template is considered poor, and it is not added to the template pool for tracking of the next frame.
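A minimal sketch of this quality gate, using the index as reconstructed above ($\alpha_1 = 1$, $\alpha_2 = 2$, and the 1.8 threshold come from the text; the running history means are supplied by the caller, and the function names are illustrative):

```python
import torch

def apce(score_map: torch.Tensor) -> float:
    """Average peak-to-correlation energy of a classification score map."""
    f_max, f_min = score_map.max(), score_map.min()
    return ((f_max - f_min).abs() ** 2 / ((score_map - f_min) ** 2).mean()).item()

def quality_value(score_map: torch.Tensor, hist_fmax_mean: float,
                  hist_apce_mean: float, a1: float = 1.0, a2: float = 2.0) -> float:
    """Quality index A; a template with A below the threshold is not pooled."""
    return (a1 * score_map.max().item() / hist_fmax_mean
            + a2 * apce(score_map) / hist_apce_mean)

score_map = torch.rand(19, 19)                 # classification score map from S23
A = quality_value(score_map, hist_fmax_mean=0.9, hist_apce_mean=8.0)
keep = A >= 1.8                                # quality gate from the text
```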
S3, for a template that passes the quality evaluation, the cosine similarity between $T^{t_c}$ and the templates in the template pool is measured; this similarity evaluation detects the necessity of a template update and decides whether the new template needs to be added to the pool, as shown in FIG. 5. The current template $T^{t_c}$ is compared with each template in the pool by cosine similarity, computed as:

$$S = \left\{\operatorname{COS}\!\left(T^{t_c},\, T_i\right)\right\}_{i=1}^{M}, \qquad \operatorname{COS}(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$$

where $T^{t_c}$ denotes the new template of the current frame, $T_i$ denotes the templates in the template pool, S denotes the set of cosine similarity metrics, COS denotes the cosine similarity, and i is the template index within the pool. After several experiments, the threshold on S was set to 0.15.
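A minimal sketch of this similarity check; templates are compared as flattened feature tensors, and the direction of the 0.15 threshold test (update only when the new template is sufficiently dissimilar from every pooled template) is an assumption, since the text fixes only the value:

```python
import torch
import torch.nn.functional as F

def pool_similarities(t_new: torch.Tensor, pool: list) -> list:
    """Cosine similarity between the new template and every template in the pool."""
    v = t_new.flatten().unsqueeze(0)
    return [F.cosine_similarity(v, t.flatten().unsqueeze(0)).item() for t in pool]

def needs_update(t_new: torch.Tensor, pool: list, s_thr: float = 0.15) -> bool:
    # Assumed rule: add the new template only when its dissimilarity to even the
    # most similar pooled template exceeds the threshold (no redundant updates).
    return min(1.0 - s for s in pool_similarities(t_new, pool)) > s_thr

pool = [torch.rand(256, 7, 7) for _ in range(5)]
print(needs_update(torch.rand(256, 7, 7), pool, s_thr=0.15))
```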
S4, fusing the templates in the template pool with template-specific weights to obtain the final template at time $t_i$. All templates in the pool have different weights: apart from the initial template, the other templates are assigned weights according to their distance from the current frame template. This weight assignment preserves the characteristics of the initial template to the greatest extent while incorporating the latest information about the target. The specific weight assignment formula is:

$$Tgt_n = \begin{cases} \beta, & n = 1 \\[4pt] (1-\beta)\,\dfrac{n}{\sum_{k=2}^{N} k}, & 2 \le n \le N \end{cases}$$

where $Tgt_n$ denotes the weight of each template in the pool, N denotes the number of templates stored in the pool, β denotes the weight of the initial template, and $n / \sum_{k=2}^{N} k$ normalizes the subsequent template weights.

The final output template is:

$$T_{new} = \sum_{n=1}^{N} Tgt_n \, T^{t_n}$$

where $T_{new}$ denotes the resulting final matching template, $n \in [1, N]$, and $T^{t_n}$ denotes the historical frame templates. After several experiments, the value of β was set to 0.5.
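A minimal sketch of the weighted fusion, with β = 0.5 from the text; the linear-in-index weighting of the non-initial templates follows the reconstruction given above, not a formula stated verbatim:

```python
import torch

def fuse_pool(pool: list, beta: float = 0.5) -> torch.Tensor:
    """Weighted fusion: the initial template keeps weight beta; later templates
    (closer to the current frame) share the remaining 1 - beta, growing with n."""
    n_tpl = len(pool)
    if n_tpl == 1:
        return pool[0]
    denom = sum(range(2, n_tpl + 1))                 # normalizer for indices 2..N
    weights = [beta] + [(1.0 - beta) * k / denom for k in range(2, n_tpl + 1)]
    return sum(w * t for w, t in zip(weights, pool))

pool = [torch.rand(256, 7, 7) for _ in range(5)]
t_new = fuse_pool(pool)        # final matching template T_new
print(t_new.shape)             # torch.Size([256, 7, 7])
```

Note that the weights sum to 1 by construction, so the fused template stays on the same scale as the pooled templates.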
And S5, convolving the feature maps extracted from the template frame and the search frame with dilated convolution layers of different dilation rates and aspect ratios, then performing feature fusion.

A schematic diagram of the dilated convolution provided in the embodiment of the invention is shown in FIG. 4. Information is easily lost when extracting features from the template frame and the search frame; therefore, before the template frame is correlated with the search frame, both first pass through the dilated convolution layers. Dilated convolutions with different dilation rates and aspect ratios enlarge the receptive field, so more scale information is obtained without losing other information. The dilated convolution layer consists of four 3 × 3 convolutions with dilation rates $(m, n) \in \{(1,1), (1,2), (2,1), (2,2)\}$, and the feature fusion process can be expressed as:

$$f_{out} = \sum_{(m,n)} \varphi_{m,n}\!\left(f_T\right) \star \varphi_{m,n}\!\left(f_S\right)$$

where $f_T$ denotes the template frame features, $f_S$ denotes the search frame features, $\varphi_{m,n}$ denotes a single dilated convolution, and $\star$ denotes the cross-correlation operation.
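A minimal sketch of the four-branch fusion; the "same" padding (so the 7 × 7 and 31 × 31 feature maps keep their sizes before correlation), the depthwise form of the cross-correlation, and the equal-weight summation are assumptions consistent with the description:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

RATES = [(1, 1), (1, 2), (2, 1), (2, 2)]

class DilatedFusion(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        # One 3x3 dilated conv per branch; padding = dilation keeps spatial size.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=(m, n), dilation=(m, n))
            for m, n in RATES
        ])

    def forward(self, f_t: torch.Tensor, f_s: torch.Tensor) -> torch.Tensor:
        out = 0
        for conv in self.branches:
            zt, xs = conv(f_t), conv(f_s)        # shared conv per branch (Siamese)
            # Depthwise cross-correlation: template features act as the kernel.
            out = out + F.conv2d(xs, zt.squeeze(0).unsqueeze(1), groups=zt.shape[1])
        return out                                # equal-weight fusion (sum)

fuse = DilatedFusion()
f_t = torch.rand(1, 256, 7, 7)      # template features
f_s = torch.rand(1, 256, 31, 31)    # search features
print(fuse(f_t, f_s).shape)         # torch.Size([1, 256, 25, 25])
```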
The invention performs data augmentation from the initial template frame and establishes a fixed-size template pool to store updated templates. The template frame and the search frame are each convolved by dilated convolutions with different dilation rates and aspect ratios and then fused by correlation to obtain a classification score map and a regression map; multi-peak detection is computed from the classification scores to judge the reliability of a new template, and the new template is then compared for similarity with the templates in the pool to judge the necessity of updating. Deciding through these two indices whether a new template should enter the pool avoids updating every frame, which speeds up tracking, while also guaranteeing the reliability of the templates in the pool.

The template updating method provided by the invention considers the reliability of a new template, reducing template pool pollution and improving model robustness; it also considers the necessity of template updating, reducing information redundancy and improving the accuracy of the template. In addition, a dilated-convolution feature fusion scheme with different dilation rates and aspect ratios is proposed, which obtains a larger receptive field without reducing information and better performs multi-scale estimation of the target.

Claims (7)

1. A target tracking method with online template updating based on quality and similarity evaluation, characterized in that the method comprises the following steps:

S1, generating N augmented templates $\{T_n^{t_0}\}_{n=1}^{N}$ from the initial template frame and establishing a template pool of size M;

S2, extracting features of the template frame and the search frame with the target tracking module, obtaining a response score map by convolutional response, and evaluating the quality of the current new template $T^{t_c}$ according to the quality evaluation index;

S3, for a template that passes the quality evaluation, measuring the cosine similarity between $T^{t_c}$ and the templates in the template pool to decide whether the new template needs to be added to the pool;

S4, fusing the templates in the template pool with template-specific weights to obtain the final template at time $t_i$;

and S5, convolving the feature maps extracted from the template frame and the search frame with dilated convolution layers of different dilation rates and aspect ratios, then performing feature fusion.
2. The method of claim 1, characterized in that: in step S1, the target tracking module performs data augmentation on the video picture at the initial time $t_0$; according to the target tracking task, rotation, translation, scale transformation, and flipping operations are applied to the given target to obtain templates of the target in different poses, which are stored in a template pool of size M.
3. The method of claim 2, characterized in that: in step S2, features are extracted from the template frame and the search frame by the backbone network of the target tracking module according to the provided image data; convolution operations are applied to the obtained feature maps to obtain a classification map and a regression map, and quality evaluation is performed on the classification map.
4. The method of claim 3, characterized in that the quality evaluation index is computed as:

$$A = \alpha_1 \frac{F_{\max}}{\operatorname{mean}(F_{\max})} + \alpha_2 \frac{APCE}{\operatorname{mean}(APCE)}$$

$$APCE = \frac{\left|F_{\max} - F_{\min}\right|^{2}}{\operatorname{mean}\!\left(\sum_{i}\left(F_{i} - F_{\min}\right)^{2}\right)}$$

where A denotes the quality assessment value, $\alpha_1$ is the weight parameter for the degree of fluctuation of the maximum score, $\alpha_2$ is the weight parameter for the degree of fluctuation of the multi-peak detection value, $F_{\max}$ is the maximum of the current classification score, $F_{\max}/\operatorname{mean}(F_{\max})$ expresses the degree of score fluctuation, $\operatorname{mean}(F_{\max})$ is the mean of the maximum classification scores of the historical frames, $\operatorname{mean}(APCE)$ is the mean peak energy of the historical frames, APCE is the current average peak-to-correlation energy, $F_{\min}$ is the minimum of the current frame's classification score, and $F_i$ is each value of the classification score.
5. The method of claim 4, characterized in that: in step S3, the current template $T^{t_c}$ is compared with each template in the template pool by cosine similarity, computed as:

$$S = \left\{\operatorname{COS}\!\left(T^{t_c},\, T_i\right)\right\}_{i=1}^{M}, \qquad \operatorname{COS}(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$$

where $T^{t_c}$ denotes the new template of the current frame, $T_i$ denotes the templates in the template pool, S denotes the set of cosine similarity metrics, COS denotes the cosine similarity, and i is the template index within the pool.
6. The method of claim 5, characterized in that: in step S4, all templates in the template pool have different weights; apart from the initial template, the other templates are assigned weights according to their distance from the current frame template, with the specific weight assignment formula:

$$Tgt_n = \begin{cases} \beta, & n = 1 \\[4pt] (1-\beta)\,\dfrac{n}{\sum_{k=2}^{N} k}, & 2 \le n \le N \end{cases}$$

where $Tgt_n$ denotes the weight of each template in the pool, N denotes the number of templates stored in the pool, β denotes the weight of the initial template, and $n / \sum_{k=2}^{N} k$ normalizes the subsequent template weights;

the final output template is:

$$T_{new} = \sum_{n=1}^{N} Tgt_n \, T^{t_n}$$

where $T_{new}$ denotes the resulting final matching template, $n \in [1, N]$, and $T^{t_n}$ denotes the historical frame templates.
7. The method of claim 6, characterized in that: in step S5, the dilated convolution layer consists of four 3 × 3 convolutions with dilation rates $(m, n) \in \{(1,1), (1,2), (2,1), (2,2)\}$, and the feature fusion process can be expressed as:

$$f_{out} = \sum_{(m,n)} \varphi_{m,n}\!\left(f_T\right) \star \varphi_{m,n}\!\left(f_S\right)$$

where $f_T$ denotes the template frame features, $f_S$ denotes the search frame features, $\varphi_{m,n}$ denotes a single dilated convolution, and $\star$ denotes the cross-correlation operation.
CN202111476809.8A 2021-12-06 2021-12-06 Target tracking method based on quality and similarity evaluation online template updating Pending CN114372997A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111476809.8A CN114372997A (en) 2021-12-06 2021-12-06 Target tracking method based on quality and similarity evaluation online template updating

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111476809.8A CN114372997A (en) 2021-12-06 2021-12-06 Target tracking method based on quality and similarity evaluation online template updating

Publications (1)

Publication Number Publication Date
CN114372997A 2022-04-19

Family

ID=81140514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111476809.8A Pending CN114372997A (en) 2021-12-06 2021-12-06 Target tracking method based on quality and similarity evaluation online template updating

Country Status (1)

Country Link
CN (1) CN114372997A (en)

Similar Documents

Publication Publication Date Title
CN111797716B (en) Single target tracking method based on Siamese network
CN110335319B (en) Semantic-driven camera positioning and map reconstruction method and system
CN109146921B (en) Pedestrian target tracking method based on deep learning
CN112184752A (en) Video target tracking method based on pyramid convolution
CN110473231B (en) Target tracking method of twin full convolution network with prejudging type learning updating strategy
CN110781262B (en) Semantic map construction method based on visual SLAM
CN113674328A (en) Multi-target vehicle tracking method
CN111767847B (en) Pedestrian multi-target tracking method integrating target detection and association
WO2023065395A1 (en) Work vehicle detection and tracking method and system
CN108564598B (en) Improved online Boosting target tracking method
CN109472191A (en) A kind of pedestrian based on space-time context identifies again and method for tracing
CN107688830B (en) Generation method of vision information correlation layer for case serial-parallel
CN112446882A (en) Robust visual SLAM method based on deep learning in dynamic scene
CN113298014B (en) Closed loop detection method, storage medium and equipment based on reverse index key frame selection strategy
CN113902991A (en) Twin network target tracking method based on cascade characteristic fusion
CN113033454A (en) Method for detecting building change in urban video camera
CN115131760A (en) Lightweight vehicle tracking method based on improved feature matching strategy
CN114882351B (en) Multi-target detection and tracking method based on improved YOLO-V5s
CN110688512A (en) Pedestrian image search algorithm based on PTGAN region gap and depth neural network
CN113628246A (en) Twin network target tracking method based on 3D convolution template updating
CN114372997A (en) Target tracking method based on quality and similarity evaluation online template updating
CN112200831B (en) Dynamic template-based dense connection twin neural network target tracking method
CN112613472B (en) Pedestrian detection method and system based on deep search matching
CN114067240A (en) Pedestrian single-target tracking method based on online updating strategy and fusing pedestrian characteristics
CN113888603A (en) Loop detection and visual SLAM method based on optical flow tracking and feature matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination