CN114897941A - Target tracking method based on Transformer and CNN - Google Patents
- Publication number
- CN114897941A (application number CN202210819539.4A)
- Authority
- CN
- China
- Prior art keywords
- target
- network
- tracking
- transformer
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/248—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
Abstract
The invention discloses a target tracking method based on a Transformer and a CNN, which comprises the following steps: cropping the target and the search area according to the initial target position; generating an online template library from the known target by means of image augmentation; extracting target features through a CNN network; analyzing the target features to obtain corresponding frame score maps; performing a similarity calculation on the two predicted targets and, if the similarity is higher than a certain threshold, outputting the result directly after simple processing; if the similarity is lower than the threshold, judging the cause of the missed detection and, if only a single network has missed the target, updating the template library with the current correct target and correcting the corresponding network; and, if both networks miss the target at the same time, stopping tracking, expanding the search range and attempting to recover the target. The invention ensures robust tracking of deformed and blurred targets, adaptive capacity to target scale changes during tracking, stable long-term tracking of the target, and robust tracking of occluded and deformed targets.
Description
Technical Field
The invention belongs to the technical field of single target tracking in computer vision, and particularly relates to a target tracking method based on a Transformer and a CNN.
Background
As one of the important tasks of computer vision, single-target tracking has wide application prospects in real life, such as pedestrian detection and tracking against complex backgrounds, human-computer interaction, and autonomous driving. Existing single-target tracking algorithms achieve high tracking precision and good short-term tracking performance when the target sits against a simple background with little deformation or occlusion.
However, the existing mainstream tracking algorithms all have considerable defects. Conventional tracking algorithms based on correlation filtering update the scale of the tracked target with low precision, so missed and false detections easily occur when the target size changes. Twin-network (Siamese) tracking algorithms can track stably while the target deforms, but lack an effective template updating mechanism and are therefore insufficiently robust when tracking occluded or blurred targets. Tracking algorithms based on the Transformer structure have a strong advantage in dealing with target occlusion, but a Transformer structure that only performs linear partitioning and mapping of target features extracts local target features poorly and has weak discrimination of target deformation.
Disclosure of Invention
In view of the above, the invention adopts an ensemble of deep learning tracking algorithms with complementary performance to solve the long-term single-target tracking problem when the target is occluded and deformed.
To solve these problems, the invention provides a dual-network deep learning tracking method based on ensemble learning, which realizes stable tracking of morphologically variable targets against a complex background. The method combines the translation invariance and strong local-feature extraction capability of the CNN tracking network with the strong anti-occlusion capability of the Transformer tracking network, thereby tracking deformed and occluded targets stably. Meanwhile, to realize long-term tracking, the invention provides an online learning method that optimizes the network weights online.
In addition, the invention adopts an adaptive search-range strategy according to the size of the tracked target to improve the accuracy of the tracking result.
Specifically, the invention discloses a target tracking method based on a Transformer and a CNN, which comprises the following steps:
initial target loading: cropping the target and the search area according to the initial target position;
online target template library generation: generating an online template library from the known target by means of image augmentation;
feature extraction: extracting target features through a CNN network;
target prediction: using two deep learning networks with different structures simultaneously to obtain corresponding frame score maps by analyzing the target features, and converting the score maps into the relative position of the target in the frame through a corner regression network;
similarity judgment: in order to reduce computation and improve the real-time performance of the tracking algorithm, performing a similarity calculation on the two targets obtained in the target prediction step; if the similarity is higher than a certain threshold, both algorithms are considered to be tracking stably and the result is output directly after simple processing;
missed detection correction: if the similarity is lower than the threshold and only a single network has missed the target, updating the template library with the current correct target and correcting the corresponding network;
target recovery: if both networks miss the target at the same time and the target is lost, stopping tracking, expanding the search range and attempting to recover the target.
Further, the cropping method is as follows:
wherein the length and width of the final cropped target template are determined from the length and width of the initial target and an adaptive amplification factor of the search range, which varies according to the size of the target.
Further, the image augmentation means includes image rotation, image size transformation and motion blur. The image size transformation comprises scaling the template image through a Gaussian pyramid and bilinear interpolation respectively to obtain different scale features of the current target; the motion blur blurs the image using mean filtering.
Further, the Gaussian pyramid formula is as follows:

$$G_{l}(i,j)=\sum_{m=-2}^{2}\sum_{n=-2}^{2}w(m,n)\,G_{l-1}(2i+m,\,2j+n)$$

wherein $w$ is the Gaussian convolution kernel, $G_{l-1}$ is the original template image, and $G_{l}$ is the template image reduced to one quarter of its area;
The bilinear interpolation algorithm formula is as follows:

$$f(x,y)=\frac{(x_{2}-x)(y_{2}-y)\,f(Q_{11})+(x-x_{1})(y_{2}-y)\,f(Q_{21})+(x_{2}-x)(y-y_{1})\,f(Q_{12})+(x-x_{1})(y-y_{1})\,f(Q_{22})}{(x_{2}-x_{1})(y_{2}-y_{1})}$$

wherein $Q_{11}=(x_{1},y_{1})$, $Q_{21}=(x_{2},y_{1})$, $Q_{12}=(x_{1},y_{2})$ and $Q_{22}=(x_{2},y_{2})$ are the four pixels neighbouring the interpolated point $(x,y)$.
further, a residual error network ResNet based on CNN is used as a backbone network to realize feature extraction of a target template library and a search area.
Further, the target prediction uses a convolution tracking network and a Transformer tracking network;
the convolution tracking network consists of convolutional layers and a linear fully connected layer, and updates the convolutional classifier by learning the features of the target template library online; in order to accelerate the convergence of the classification model, the model weights are optimized with the Gauss-Newton iteration method during updating, and the updated classifier locates the current-frame target in the search area to obtain the corresponding score map;
the Transformer tracking network consists of an Attention module and a linear fully connected layer; in order to further strengthen the local information perception of the Transformer network, a convolutional layer is used to flatten the picture features extracted by ResNet and map them into the Q, K and V components required by the attention calculation;
after the attention calculation between the current search area and the target template features, the corresponding score of the search area feature F is obtained through a linear fully connected layer (MLP);
the Transformer network score calculation formula is as follows:

$$\mathrm{Score}(F)=\mathrm{MLP}\left(\mathrm{softmax}\left(\frac{QK^{\mathsf{T}}}{\sqrt{d}}\right)V\right)$$

wherein $Q$, $K$ and $V$ are derived from mapping the features extracted by ResNet and $d$ is the data dimension;
to ensure long-term tracking capability, a cross-entropy loss function $L_{ce}$ and a triplet loss function $L_{tri}$ are combined by a weighted average, as shown in the following equations:

$$L=\lambda_{1}L_{ce}+\lambda_{2}L_{tri}$$

$$L_{tri}=\max\left(d_{ap}-d_{an}+m,\,0\right)$$

wherein $\lambda_{1}$ and $\lambda_{2}$ represent the weights of the corresponding loss functions, $m$ is a training threshold constant, and $d_{ap}$ and $d_{an}$ represent the Mahalanobis distances between the current result and the positive and negative samples respectively.
Furthermore, in order to obtain a more accurate target scale estimate, a corner regression network is used: through a structure of multiple convolutional layers plus a corner pooling layer, the score maps of the two prediction networks are converted into corresponding tracking boxes and network confidences.
Further, the similarity of the two predicted targets is represented by an image structure similarity SSIM, and an SSIM index calculation formula is shown as the following formula:
$$\mathrm{SSIM}=\left[l(x,y)\right]^{\alpha}\cdot\left[c(x,y)\right]^{\beta}\cdot\left[s(x,y)\right]^{\gamma}$$

wherein $l(x,y)$ represents the luminance similarity of the targets predicted by the convolutional network and the Transformer network, $c(x,y)$ represents the contrast similarity of the two predicted targets, $s(x,y)$ represents the structural similarity of the two predicted targets, and $\alpha$, $\beta$ and $\gamma$ represent the corresponding similarity weights.
Further, the network correction method includes:
for the convolution tracking network, a temporary template library is reconstructed using the current target, and the classifier weights are optimized with the online updating method of the target prediction step;
for the Transformer network, the current correct target position is used as a positive sample, the missed Transformer detection result is used as a negative sample, and the contrastive loss is calculated:

$$L_{con}=\frac{1}{2}\left[y\,D^{2}+(1-y)\,\max\left(m-D,\,0\right)^{2}\right]$$

wherein $m$ is the training threshold constant and $D$ is the Mahalanobis distance between the input samples.
Compared with the prior art, the invention has the following beneficial effects:
two deep learning tracking networks are integrated to synchronously track the target, and the accuracy of long-term tracking of the target is improved by a method of integrating a complementary network.
A template base updating strategy based on confidence coefficient and similarity is provided, and the tracking robustness of deformation and fuzzy targets is ensured;
an angular point regression network is provided to ensure the self-adaptive capacity to the target scale change during tracking;
the target missing detection and correction strategy based on online learning and complementary network integration is provided, and the stability of tracking the target for a long time is ensured;
a convolution-fused Transformer tracking network is provided, and robustness of tracking of occluded and deformed targets is guaranteed.
Drawings
FIG. 1 is a flowchart of the process of the present invention;
FIG. 2 is a flow diagram of the convolution tracking network of the present invention;
FIG. 3 is a flow chart of the Transformer tracking network of the present invention.
Detailed Description
The invention is further described with reference to the accompanying drawings, but the invention is not limited in any way, and any alterations or substitutions based on the teaching of the invention are within the scope of the invention.
The invention provides a dual-network deep learning tracking method based on ensemble learning, which realizes stable tracking of morphologically variable targets against a complex background. The method combines the translation invariance and strong local-feature extraction capability of the CNN tracking network with the strong anti-occlusion capability of the Transformer tracking network, thereby tracking deformed and occluded targets stably. Meanwhile, to realize long-term tracking, the invention provides an online learning method that optimizes the network weights online.
In addition, the invention also adopts a search-range adaptive strategy according to the size of the tracked target to improve the accuracy of the tracking result. Referring to the flow chart of FIG. 1, the steps of the invention include:
S1 initial target loading: cropping the target and the search area according to the initial target position;
To reduce the amount of computation, the invention crops the target. The cropping size is determined by formula (1), wherein the length and width of the final cropped target template are determined from the length and width of the initial target and an adaptive amplification factor of the search range that varies according to the size of the target;
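By way of illustration only, the following Python sketch shows how a target-centred search region might be cropped with an adaptive amplification factor; the function name crop_search_region and the specific amplification rule are assumptions and do not reproduce formula (1).

```python
import numpy as np

def crop_search_region(frame, cx, cy, w, h, base_factor=4.0):
    """Crop a square search region centred on the target (illustrative).

    The amplification factor is larger for small targets and smaller for
    large ones, so the search window stays proportionate (assumed rule,
    not the exact formula of the patent).
    """
    img_h, img_w = frame.shape[:2]
    # Adaptive amplification: smaller targets get a relatively larger window.
    scale = base_factor * np.clip(64.0 / np.sqrt(w * h), 0.5, 2.0)
    side = np.sqrt(w * h) * scale

    x1 = int(np.clip(cx - side / 2, 0, img_w - 1))
    y1 = int(np.clip(cy - side / 2, 0, img_h - 1))
    x2 = int(np.clip(cx + side / 2, 1, img_w))
    y2 = int(np.clip(cy + side / 2, 1, img_h))
    return frame[y1:y2, x1:x2]
```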
s2 online target template library generation: generating an online template library by using an image augmentation means according to a known target;
The target from step S1 is augmented; the image augmentation methods mainly include the following (an illustrative sketch is given after this list):
a. rotating the image;
b. image size transformation: the template image is scaled through a Gaussian pyramid and bilinear interpolation respectively to obtain different scale features of the current target.
Formula (2) is the general formula of the Gaussian pyramid, wherein $w$ is the Gaussian convolution kernel, $G_{l-1}$ is the original template image, and $G_{l}$ is the template image reduced to one quarter of its area.
Formula (3), given above, is the general formula of the bilinear interpolation algorithm adopted by the invention.
c. motion blur: the image is blurred using mean filtering.
d. feature enhancement: the invention selectively enhances weak information according to the luminance information and the amount of structural information of the current target.
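The following OpenCV-based Python sketch illustrates how such an online template library could be generated; the rotation angles, scale factors and kernel size are illustrative assumptions rather than values taken from the invention.

```python
import cv2
import numpy as np

def augment_template(template):
    """Generate an online template library from a single target patch (sketch)."""
    h, w = template.shape[:2]
    templates = [template]

    # a. rotation about the patch centre
    for angle in (-10, 10):
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        templates.append(cv2.warpAffine(template, M, (w, h)))

    # b. scale changes: Gaussian-pyramid reduction and bilinear up-scaling
    templates.append(cv2.pyrDown(template))                       # quarter-area copy
    templates.append(cv2.resize(template, (int(w * 1.5), int(h * 1.5)),
                                interpolation=cv2.INTER_LINEAR))  # bilinear

    # c. motion blur approximated by mean filtering
    templates.append(cv2.blur(template, (5, 5)))
    return templates
```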
S3 feature extraction: extracting target features through a CNN;
The method uses the CNN-based residual network ResNet as the backbone network to extract features of the target template library and the search area; ResNet is prior art in the field and is not described in detail here.
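As an illustrative sketch only, the backbone could be set up as follows; the choice of ResNet-50 truncated after its third stage is an assumption, since the invention only specifies a CNN-based ResNet.

```python
import torch
import torchvision

# Truncate a ResNet-50 after layer3 and use it as the shared backbone for
# the template library and the search region (pretrained weights would
# normally be loaded here).
backbone = torchvision.models.resnet50(weights=None)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-3])
feature_extractor.eval()

with torch.no_grad():
    search = torch.randn(1, 3, 256, 256)   # cropped search region
    feat = feature_extractor(search)       # shape (1, 1024, 16, 16)
```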
S4 target prediction: two deep learning networks with different structures are used simultaneously to obtain corresponding frame score maps by analyzing the target features, and the score maps are converted into the relative position of the target in the frame through a corner regression network.
The two tracking networks used in the invention are as follows:
a. Convolution tracking network:
The convolution tracking network mainly consists of convolutional layers and a linear fully connected layer, and updates the convolutional classifier by learning the features of the target template library online. To accelerate the convergence of the classification model, the invention optimizes the model weights with the Gauss-Newton iteration method. The updated classifier locates the current-frame target in the search area to obtain the corresponding score map. To reduce the computation time of the convolutional network, a depthwise separable convolutional network is adopted for the convolution computation (see the sketch below). FIG. 2 is a flow chart of the convolutional network of the invention.
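The depthwise separable convolution mentioned above can be sketched in PyTorch as follows; the layer sizes are placeholders and this is not the patented classifier itself.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise + pointwise convolution: a cheaper drop-in for a standard
    nn.Conv2d, illustrating the speed-up referred to above."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```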
b. Transformer tracking network:
The Transformer tracking network of the invention mainly consists of an Attention module and a linear fully connected layer. To further strengthen the local information perception of the Transformer network, the invention uses a convolutional layer in place of the linear mapping and position encoding of the generic Transformer structure: the picture features extracted by ResNet are flattened and mapped into the Q, K and V components required by the attention calculation. After the attention calculation between the current search area and the target template features, the corresponding score of the search area feature F is obtained through a linear fully connected layer (MLP).
The Transformer network score is calculated by formula (4):

$$\mathrm{Score}(F)=\mathrm{MLP}\left(\mathrm{softmax}\left(\frac{QK^{\mathsf{T}}}{\sqrt{d}}\right)V\right)\qquad(4)$$

wherein the $Q$, $K$ and $V$ components are derived from mapping the ResNet-extracted features, and $d$ is the data dimension.
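An illustrative sketch of the score computation of formula (4) is given below; the token counts, feature dimension and MLP head sizes are assumptions.

```python
import torch
import torch.nn as nn

def attention_score(q, k, v, mlp):
    """softmax(QK^T / sqrt(d)) V followed by an MLP head (sketch)."""
    d = q.size(-1)
    attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    return mlp(attn @ v)

# Q comes from the flattened search-region features, K and V from the
# template features; the sizes below are placeholders.
d_model = 256
mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1))
q = torch.randn(1, 1024, d_model)   # search-region tokens
k = torch.randn(1, 64, d_model)     # template tokens
v = torch.randn(1, 64, d_model)
scores = attention_score(q, k, v, mlp)   # (1, 1024, 1): one score per location
```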
FIG. 3 is a flow chart of the Transformer network of the present invention.
Regarding the loss function, to ensure the long-term tracking capability of the invention, a cross-entropy loss function $L_{ce}$ and the triplet loss function $L_{tri}$ commonly used in re-identification problems are combined by the weighted average of equations (5) and (6):

$$L=\lambda_{1}L_{ce}+\lambda_{2}L_{tri}\qquad(5)$$

$$L_{tri}=\max\left(d_{ap}-d_{an}+m,\,0\right)\qquad(6)$$

wherein $\lambda_{1}$ and $\lambda_{2}$ represent the weights of the corresponding loss functions, $m$ is a training threshold constant, and $d_{ap}$ and $d_{an}$ represent the Mahalanobis distances between the current result and the positive and negative samples respectively.
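The weighted combination of equations (5) and (6) can be sketched as follows; Euclidean distance stands in for the Mahalanobis distance, and the weights and margin are placeholder values.

```python
import torch.nn.functional as F

def tracking_loss(logits, labels, anchor, positive, negative,
                  w_ce=1.0, w_tri=1.0, margin=0.3):
    """Weighted sum of cross-entropy and triplet losses (illustrative)."""
    ce = F.cross_entropy(logits, labels)
    d_ap = F.pairwise_distance(anchor, positive)   # distance to positive samples
    d_an = F.pairwise_distance(anchor, negative)   # distance to negative samples
    tri = F.relu(d_ap - d_an + margin).mean()      # hinge-style triplet term
    return w_ce * ce + w_tri * tri
```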
To obtain a more accurate target scale estimate, the invention uses a corner regression network, a structure of multiple convolutional layers plus a corner pooling layer, which converts the score maps of the two prediction networks into corresponding tracking boxes and network confidences.
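A minimal sketch of a CornerNet-style top-left corner pooling operation, one possible building block of such a corner regression head, is given below; it is illustrative and not the exact structure of the invention.

```python
import torch

def top_left_corner_pool(feat):
    """For every location, take the max over everything to its right and the
    max over everything below it, then add the two maps (sketch)."""
    # flip + cumulative max + flip back == running max over the remaining direction
    right_max = torch.flip(torch.cummax(torch.flip(feat, dims=[-1]), dim=-1).values,
                           dims=[-1])
    bottom_max = torch.flip(torch.cummax(torch.flip(feat, dims=[-2]), dim=-2).values,
                            dims=[-2])
    return right_max + bottom_max

heat = top_left_corner_pool(torch.randn(1, 256, 20, 20))  # pooled corner heatmap
```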
S5 similarity judgment: to reduce computation and improve the real-time performance of the tracking algorithm, the similarity of the two targets obtained in step S4 is calculated; if the similarity is higher than a certain threshold, both algorithms are considered stable and the result can be output directly after simple processing;
To reduce computation and improve the real-time performance of the tracking algorithm, the similarity of the two predicted targets obtained in step S4 is calculated first; if the similarity is higher than a certain threshold, both algorithms are considered to be tracking stably and the result can be output directly after simple processing. Meanwhile, to ensure the long-term tracking capability of the invention, the confidences of the two tracking networks are also consulted when outputting the tracking result: if the similarity of the two networks is higher than the threshold but a confidence is below its corresponding threshold, the current target is considered to have deformed greatly and is selected and added to the online target template library; if the size of the online template library exceeds a set threshold, the oldest template is deleted according to the time it was added.
In the invention, the similarity of the two predicted targets is represented by the image Structural Similarity (SSIM) index, which is calculated by formula (7).
$$\mathrm{SSIM}=\left[l(x,y)\right]^{\alpha}\cdot\left[c(x,y)\right]^{\beta}\cdot\left[s(x,y)\right]^{\gamma}\qquad(7)$$

wherein $l(x,y)$ represents the luminance similarity of the targets predicted by the convolutional network and the Transformer network, $c(x,y)$ represents the contrast similarity of the two predicted targets, $s(x,y)$ represents the structural similarity of the two predicted targets, and $\alpha$, $\beta$ and $\gamma$ represent the corresponding similarity weights.
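A simplified sketch of an SSIM computation follows; it uses global patch statistics and the conventional stabilising constants, and folds the contrast and structure terms together rather than weighting them separately as in formula (7).

```python
import numpy as np

def ssim(x, y, c1=6.5025, c2=58.5225):
    """Global SSIM between two grayscale patches of equal size (sketch)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast_structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure
```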
S6 missed detection correction: if the similarity is lower than the threshold and only a single network has missed the target, the template library is updated with the current correct target and the corresponding network is corrected.
If the SSIM is below the threshold, a missed detection has occurred and its cause needs to be judged. The invention uses each network's confidence in its currently predicted target as the missed-detection indicator: if only one network's confidence is lower than its corresponding threshold, that network is judged to have missed the target; the output of the other network is then taken as the correct output, the missed-detection network is corrected, and target tracking can continue (a decision sketch is given below).
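The decision logic of steps S5 to S7 can be summarised by the following sketch; all threshold values and the box-averaging rule are assumptions.

```python
def fuse_predictions(box_cnn, conf_cnn, box_trf, conf_trf,
                     similarity, sim_thr=0.6, conf_thr=0.5):
    """Return the fused box plus a flag saying which network, if any, needs
    correction, or whether global re-detection should start (sketch)."""
    if similarity >= sim_thr:
        # Both networks agree: average the two boxes.
        fused = [(a + b) / 2 for a, b in zip(box_cnn, box_trf)]
        return fused, None
    if conf_cnn >= conf_thr > conf_trf:
        return box_cnn, "correct_transformer"   # only the Transformer missed
    if conf_trf >= conf_thr > conf_cnn:
        return box_trf, "correct_cnn"           # only the CNN tracker missed
    return None, "target_lost"                  # both missed: expand search range
```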
The network correction methods adopted by the invention are as follows:
a. For the convolution tracking network, the invention reconstructs a temporary template library with the current target and optimizes the classifier weights with the online updating method of step S4, thereby correcting the missed-detection network.
b. For the Transformer network, the invention likewise corrects the missed-detection network by online learning. The current correct target position is used as a positive sample, the missed Transformer detection result as a negative sample, and the contrastive loss of formula (8) is calculated:

$$L_{con}=\frac{1}{2}\left[y\,D^{2}+(1-y)\,\max\left(m-D,\,0\right)^{2}\right]\qquad(8)$$

wherein $m$ is the training threshold constant and $D$ is the Mahalanobis distance between the input samples. Meanwhile, the Gauss-Newton iteration method is adopted to correct the network weights.
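Formula (8) can be sketched as follows; Euclidean distance replaces the Mahalanobis distance and the margin value is a placeholder.

```python
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, label, margin=1.0):
    """Contrastive loss between embedding pairs (sketch); label is 1 for a
    positive pair and 0 for a negative pair."""
    d = F.pairwise_distance(emb_a, emb_b)
    pos = label * d.pow(2)                          # pull positive pairs together
    neg = (1 - label) * F.relu(margin - d).pow(2)   # push negative pairs apart
    return 0.5 * (pos + neg).mean()
```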
S7 target retrieval: if both networks miss the target at the same time and the target is lost, tracking is stopped, the search range is expanded, and an attempt is made to recover the target.
If the confidences of both networks are lower than their corresponding thresholds, both networks are considered to have missed the target and the target is lost. In this case the invention stops tracking, expands the target search range, and tries to recover the target using the Kuhn-Munkres bipartite matching algorithm; the weights used in the matching are likewise given by the contrastive loss of formula (8).
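A sketch of the re-detection step using the Kuhn-Munkres (Hungarian) algorithm, as provided by scipy.optimize.linear_sum_assignment, is given below; the way the cost matrix is built is an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def recover_target(candidate_costs):
    """candidate_costs[i, j]: matching cost (e.g. contrastive-loss value)
    between stored template i and candidate region j (sketch)."""
    rows, cols = linear_sum_assignment(candidate_costs)  # Kuhn-Munkres matching
    best = np.argmin(candidate_costs[rows, cols])        # cheapest matched pair
    return cols[best]   # index of the candidate most likely to be the target

costs = np.random.rand(3, 5)   # 3 templates scored against 5 candidate regions
idx = recover_target(costs)
```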
Comparative experiment:
the hardware environment for the experiment of the invention is i7-8700 CPU and Yingwei GTX 2080Ti GPU. The software environments are Python3.6 and CUDA 11.0. The VOT 2019 public data set is adopted in the experimental data set, and the algorithm is compared with a current leading edge single-target tracking algorithm. The results of the experiments are shown in the following table:
compared with the prior art, the invention has the following beneficial effects:
two deep learning tracking networks are integrated to synchronously track the target, and the accuracy of long-term tracking of the target is improved by a method of integrating a complementary network.
A template base updating strategy based on confidence coefficient and similarity is provided, and the tracking robustness of deformation and fuzzy targets is ensured;
an angular point regression network is provided to ensure the self-adaptive capacity to the target scale change during tracking;
the target missing detection and correction strategy based on online learning and complementary network integration is provided, and the stability of tracking the target for a long time is ensured;
a convolution-fused Transformer tracking network is provided, and robustness of tracking of occluded and deformed targets is guaranteed.
The word "preferred" is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as "preferred" is not necessarily to be construed as advantageous over other aspects or designs. Rather, use of the word "preferred" is intended to present concepts in a concrete fashion. The term "or" as used in this application is intended to mean an inclusive "or" rather than an exclusive "or". That is, unless specified otherwise or clear from context, "X employs A or B" is intended to include either of the permutations as a matter of course. That is, if X employs A; b is used as X; or X employs both A and B, then "X employs A or B" is satisfied in any of the foregoing examples.
Also, although the disclosure has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art based upon a reading and understanding of this specification and the annexed drawings. The present disclosure includes all such modifications and alterations, and is limited only by the scope of the appended claims. In particular regard to the various functions performed by the above described components (e.g., elements, etc.), the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., one that is functionally equivalent), even though not structurally equivalent to the disclosed structure which performs the function in the herein illustrated exemplary implementations of the disclosure. In addition, while a particular feature of the disclosure may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for a given or particular application. Furthermore, to the extent that the terms "includes", "has", "contains", or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising".
Each functional unit in the embodiments of the present invention may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer-readable storage medium. The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, etc. Each apparatus or system described above may execute the storage method in the corresponding method embodiment.
In summary, the above-mentioned embodiment is an implementation manner of the present invention, but the implementation manner of the present invention is not limited by the above-mentioned embodiment, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be regarded as equivalent replacements within the protection scope of the present invention.
Claims (9)
1. A target tracking method based on a Transformer and a CNN, characterized by comprising the following steps:
initial target loading: cropping the target and the search area according to the initial target position;
online target template library generation: generating an online template library from the known target by means of image augmentation;
feature extraction: extracting target features through a CNN network;
target prediction: using two deep learning networks with different structures simultaneously to obtain corresponding frame score maps by analyzing the target features, and converting the score maps into the relative position of the target in the frame through a corner regression network;
similarity judgment: in order to reduce computation and improve the real-time performance of the tracking algorithm, performing a similarity calculation on the two targets obtained in the target prediction step; if the similarity is higher than a certain threshold, both algorithms are considered to be tracking stably and the result is output directly after simple processing;
missed detection correction: if the similarity is lower than the threshold and only a single network has missed the target, updating the template library with the current correct target and correcting the corresponding network;
target recovery: if both networks miss the target at the same time and the target is lost, stopping tracking, expanding the search range and attempting to recover the target.
2. The Transformer and CNN-based target tracking method according to claim 1, wherein the cropping method is as follows:
3. The Transformer and CNN-based target tracking method according to claim 1, wherein the image augmentation means includes image rotation, image size transformation and motion blur; the image size transformation comprises scaling the template image through a Gaussian pyramid and bilinear interpolation respectively to obtain different scale features of the current target, and the motion blur blurs the image using mean filtering.
5. The Transformer and CNN-based target tracking method according to claim 1, wherein a CNN-based residual network ResNet is used as the backbone network to extract features of the target template library and the search area.
6. The Transformer and CNN based target tracking method of claim 1, wherein the target prediction uses a convolution tracking network and a Transformer tracking network;
the convolution tracking network consists of convolutional layers and a linear fully connected layer, and updates the convolutional classifier by learning the features of the target template library online; in order to accelerate the convergence of the classification model, the model weights are optimized with the Gauss-Newton iteration method during updating, and the updated classifier locates the current-frame target in the search area to obtain the corresponding score map;
the Transformer tracking network consists of an Attention module and a linear fully connected layer; in order to further strengthen the local information perception of the Transformer network, a convolutional layer is used to flatten the picture features extracted by ResNet and map them into the Q, K and V components required by the attention calculation;
after the attention calculation between the current search area and the target template features, the corresponding score of the search area feature F is obtained through a linear fully connected layer MLP;
the Transformer network score calculation formula is as follows:

$$\mathrm{Score}(F)=\mathrm{MLP}\left(\mathrm{softmax}\left(\frac{QK^{\mathsf{T}}}{\sqrt{d}}\right)V\right)$$

wherein $Q$, $K$ and $V$ are derived from mapping the features extracted by ResNet and $d$ is the data dimension;
to ensure long-term tracking capability, a cross-entropy loss function $L_{ce}$ and a triplet loss function $L_{tri}$ are combined by a weighted average, as shown in the following equations:

$$L=\lambda_{1}L_{ce}+\lambda_{2}L_{tri}$$

$$L_{tri}=\max\left(d_{ap}-d_{an}+m,\,0\right)$$

wherein $\lambda_{1}$ and $\lambda_{2}$ represent the weights of the corresponding loss functions, $m$ is a training threshold constant, and $d_{ap}$ and $d_{an}$ represent the Mahalanobis distances between the current result and the positive and negative samples respectively.
7. The Transformer and CNN-based target tracking method according to claim 1, wherein, to obtain a more accurate target scale estimate, a corner regression network is used, which is a structure of multiple convolutional layers plus a corner pooling layer and converts the score maps of the two prediction networks into corresponding tracking boxes and network confidences.
8. The Transformer and CNN-based target tracking method according to claim 1, wherein the similarity between two predicted targets is represented by image structure similarity SSIM, and the SSIM index calculation formula is shown as follows:
$$\mathrm{SSIM}=\left[l(x,y)\right]^{\alpha}\cdot\left[c(x,y)\right]^{\beta}\cdot\left[s(x,y)\right]^{\gamma}$$

wherein $l(x,y)$ represents the luminance similarity of the targets predicted by the convolutional network and the Transformer network, $c(x,y)$ represents the contrast similarity of the two predicted targets, $s(x,y)$ represents the structural similarity of the two predicted targets, and $\alpha$, $\beta$ and $\gamma$ represent the corresponding similarity weights.
9. The Transformer and CNN-based target tracking method according to claim 1, wherein the network correction method comprises:
for the convolution tracking network, a temporary template library is reconstructed using the current target, and the classifier weights are optimized with the online updating method of the target prediction step;
for the Transformer network, the contrastive loss is calculated using the current correct target position as a positive sample and the missed Transformer detection result as a negative sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210819539.4A CN114897941B (en) | 2022-07-13 | 2022-07-13 | Target tracking method based on Transformer and CNN |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210819539.4A CN114897941B (en) | 2022-07-13 | 2022-07-13 | Target tracking method based on Transformer and CNN |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114897941A true CN114897941A (en) | 2022-08-12 |
CN114897941B CN114897941B (en) | 2022-09-30 |
Family
ID=82729589
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210819539.4A Active CN114897941B (en) | 2022-07-13 | 2022-07-13 | Target tracking method based on Transformer and CNN |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114897941B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190147602A1 (en) * | 2017-11-13 | 2019-05-16 | Qualcomm Technologies, Inc. | Hybrid and self-aware long-term object tracking |
CN110533691A (en) * | 2019-08-15 | 2019-12-03 | 合肥工业大学 | Method for tracking target, equipment and storage medium based on multi-categorizer |
CN110660082A (en) * | 2019-09-25 | 2020-01-07 | 西南交通大学 | Target tracking method based on graph convolution and trajectory convolution network learning |
CN112561907A (en) * | 2020-12-24 | 2021-03-26 | 南开大学 | Video tampering operation detection method and device based on double-current network |
CN113256637A (en) * | 2021-07-15 | 2021-08-13 | 北京小蝇科技有限责任公司 | Urine visible component detection method based on deep learning and context correlation |
CN113628249A (en) * | 2021-08-16 | 2021-11-09 | 电子科技大学 | RGBT target tracking method based on cross-modal attention mechanism and twin structure |
CN113902773A (en) * | 2021-09-24 | 2022-01-07 | 南京信息工程大学 | Long-term target tracking method using double detectors |
CN114529581A (en) * | 2022-01-28 | 2022-05-24 | 西安电子科技大学 | Multi-target tracking method based on deep learning and multi-task joint training |
-
2022
- 2022-07-13 CN CN202210819539.4A patent/CN114897941B/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190147602A1 (en) * | 2017-11-13 | 2019-05-16 | Qualcomm Technologies, Inc. | Hybrid and self-aware long-term object tracking |
CN110533691A (en) * | 2019-08-15 | 2019-12-03 | 合肥工业大学 | Method for tracking target, equipment and storage medium based on multi-categorizer |
CN110660082A (en) * | 2019-09-25 | 2020-01-07 | 西南交通大学 | Target tracking method based on graph convolution and trajectory convolution network learning |
CN112561907A (en) * | 2020-12-24 | 2021-03-26 | 南开大学 | Video tampering operation detection method and device based on double-current network |
CN113256637A (en) * | 2021-07-15 | 2021-08-13 | 北京小蝇科技有限责任公司 | Urine visible component detection method based on deep learning and context correlation |
CN113628249A (en) * | 2021-08-16 | 2021-11-09 | 电子科技大学 | RGBT target tracking method based on cross-modal attention mechanism and twin structure |
CN113902773A (en) * | 2021-09-24 | 2022-01-07 | 南京信息工程大学 | Long-term target tracking method using double detectors |
CN114529581A (en) * | 2022-01-28 | 2022-05-24 | 西安电子科技大学 | Multi-target tracking method based on deep learning and multi-task joint training |
Non-Patent Citations (5)
Title |
---|
QIANGYU LI ET AL.: "Visual Object Tracking: Method and Comparison", 《ICETCI》 * |
XIN LI ET AL.: "Dual-regression model for visual tracking", 《NEURAL NETWORKS》 * |
YABIN ZHU ET AL.: "RGBT tracking by trident fusion network", 《IEEE》 * |
YIHONG ZHANG ET AL.: "Parallel three-branch correlation filters for complex marine environmental object tracking based on a confidence mechanism", 《SENSORS》 * |
MA YONG ET AL.: "Research progress on autonomous navigation and cooperative control of unmanned waterborne system platforms", 《UNMANNED SYSTEMS TECHNOLOGY》 *
Also Published As
Publication number | Publication date |
---|---|
CN114897941B (en) | 2022-09-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110335290B (en) | Twin candidate region generation network target tracking method based on attention mechanism | |
CN111354017B (en) | Target tracking method based on twin neural network and parallel attention module | |
CN108090919B (en) | Improved kernel correlation filtering tracking method based on super-pixel optical flow and adaptive learning factor | |
CN106599836B (en) | Multi-face tracking method and tracking system | |
CN108647694B (en) | Context-aware and adaptive response-based related filtering target tracking method | |
CN112733822B (en) | End-to-end text detection and identification method | |
CN112150493B (en) | Semantic guidance-based screen area detection method in natural scene | |
CN111260688A (en) | Twin double-path target tracking method | |
CN110120064B (en) | Depth-related target tracking algorithm based on mutual reinforcement and multi-attention mechanism learning | |
CN112364931B (en) | Few-sample target detection method and network system based on meta-feature and weight adjustment | |
CN110942471B (en) | Long-term target tracking method based on space-time constraint | |
CN112232134B (en) | Human body posture estimation method based on hourglass network and attention mechanism | |
CN113221925B (en) | Target detection method and device based on multi-scale image | |
CN110889865A (en) | Video target tracking method based on local weighted sparse feature selection | |
CN112329784A (en) | Correlation filtering tracking method based on space-time perception and multimodal response | |
CN110706256A (en) | Detection tracking algorithm optimization method based on multi-core heterogeneous platform | |
CN114445715A (en) | Crop disease identification method based on convolutional neural network | |
CN116030396A (en) | Accurate segmentation method for video structured extraction | |
CN113393385B (en) | Multi-scale fusion-based unsupervised rain removing method, system, device and medium | |
CN113627481A (en) | Multi-model combined unmanned aerial vehicle garbage classification method for smart gardens | |
CN114897941B (en) | Target tracking method based on Transformer and CNN | |
CN114882076B (en) | Lightweight video object segmentation method based on big data memory storage | |
CN116385281A (en) | Remote sensing image denoising method based on real noise model and generated countermeasure network | |
CN115661860A (en) | Method, device and system for dog behavior and action recognition technology and storage medium | |
CN114202694A (en) | Small sample remote sensing scene image classification method based on manifold mixed interpolation and contrast learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |