CN113393496A - Target tracking method based on space-time attention mechanism - Google Patents
Target tracking method based on space-time attention mechanism
- Publication number
- CN113393496A (application CN202110755862.5A)
- Authority
- CN
- China
- Prior art keywords
- attention mechanism
- image
- target
- space
- channel
- Prior art date
- 2021-07-05
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/251: Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving models
- G06F18/22: Pattern recognition; matching criteria, e.g. proximity measures
- G06N3/045: Neural networks; architecture, e.g. interconnection topology; combinations of networks
- G06N3/048: Neural networks; activation functions
- G06N3/08: Neural networks; learning methods
- G06T2207/10016: Indexing scheme for image analysis; image acquisition modality: video, image sequence
Abstract
The invention provides a target tracking method based on a space-time attention mechanism, comprising the following steps: constructing a network model for acquiring a template image of a target to be tracked and an image to be tracked; constructing a channel attention mechanism model and fusing it into the network model; constructing a space attention mechanism model and fusing it into the network model; training the network model fused with the channel attention mechanism model and the space attention mechanism model according to a loss function; and tracking the target in the video with the trained network model to obtain a target tracking result. Compared with the prior art, tracking performance is greatly improved: the target can still be tracked stably under complex conditions such as occlusion, deformation, and background interference, the target-drift problem during tracking is effectively alleviated, and a more stable tracking result is provided to the user.
Description
Technical Field
The invention relates to the technical field of electronic signal detection, in particular to a target tracking method based on a space-time attention mechanism.
Background
Target tracking is one of the most active research directions in computer vision, with wide application in fields such as intelligent surveillance, human-computer interaction, and autonomous driving. Target tracking means establishing the positional relation of an object across a continuous video sequence to obtain its complete motion trajectory: given the target's coordinates in the first frame, its position and size are computed in each subsequent frame. Target tracking provides a basis for behavior understanding, reasoning, and decision making; it underpins higher-level video processing tasks such as target recognition, behavior analysis, video compression coding, and video understanding, and is a necessary prerequisite for executing higher-level intelligent behaviors. Although target tracking has advanced considerably in recent years, and many efficient algorithms have been proposed to address challenging problems in specific scenes, difficulties such as occlusion, illumination change, scale change, and background interference remain, so target tracking is still a hard research problem.
In target tracking methods based on the fully convolutional twin (Siamese) network, tracking is performed through a template-matching strategy. The features lack discriminative power, and only the target information of the first-frame image is used during tracking, so performance degrades when the target undergoes deformation, occlusion, and similar challenges. Moreover, because the twin network retains only the first-frame image features, target-feature contamination is avoided, but the method cannot capture changes of the target in subsequent frames. When the target deforms significantly, the response score at the target's true position may therefore drop, increasing the risk of losing the target.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the defects of the prior art, the invention provides a target tracking method based on a space-time attention mechanism for tracking targets in video.
The technical scheme is as follows: the invention first improves feature discrimination through a twin network architecture; it then introduces improved channel and spatial attention mechanisms that apply different weights to features at different channels and spatial locations, focusing on the features most useful for target tracking. In addition, an efficient online target-template updating mechanism is provided that fuses the first-frame image features with high-confidence image features from subsequent tracking frames.
The invention provides a target tracking method based on a space-time attention mechanism, which comprises the following steps:
step 1, constructing a network model for acquiring a template image of a target to be tracked and an image to be tracked;
step 2, constructing a channel attention mechanism model, and fusing the channel attention mechanism model into a network model;
step 3, constructing a space attention mechanism model, and fusing the space attention mechanism model into a network model;
step 4, training the network model fused with the channel attention mechanism model and the space attention mechanism model according to a loss function;
step 5, tracking the target in the video with the trained network model to obtain the target tracking result.
Further, in one implementation, the network model comprises a template branch and a search branch. The template branch obtains the template image from the first frame containing the target to be tracked; the template image is obtained by initialization, i.e. the first-frame image is initialized as the template image. The search branch receives a search image during tracking, the search image being the current frame of the target to be tracked in the tracking video. The template image of the first frame received by the template branch is denoted z, the search image of the current frame received by the search branch is denoted x, and the feature extraction network is denoted φ(·).
Further, in one implementation, the step 2 includes:
step 2-1, performing feature compression on the image features through a global average pooling layer in the network model to obtain a feature vector of size 1 × 1 × C, where C is the number of channels; in the invention, the image features are the features obtained by feature extraction on the template image in the template branch;
step 2-2, performing dimensionality reduction and then dimensionality restoration on the compressed feature vector through two fully connected layers in the network model and their corresponding activation functions, outputting a feature vector of size 1 × 1 × C;
step 2-3, generating the channel weight α corresponding to each channel with a Sigmoid function;
step 2-4, multiplying each channel of the image features φ(z) input to the channel attention mechanism model by its corresponding channel weight α to obtain the new channel weight for each channel. In the invention, the favorable channel features are selected according to the new channel weights and used for template matching with the image to be tracked, thereby determining the target position.
Further, in one implementation, the step 3 includes:
step 3-1, sending the template image of the first frame into the spatial attention mechanism network to obtain a weight for each pixel of the feature map, the feature map being the image features;
step 3-2, performing template matching between the weighted first-frame image features and the image to be tracked: the weighted first-frame image is input into the spatial attention mechanism network, each frame of the image to be tracked is input into the network model, and the weight of each pixel of the feature map is multiplied with the input image features φ(·) to obtain the response map of the target for template matching;
and step 3-3, taking the position corresponding to the highest-scoring response map as the position of the target to be tracked.
Further, in one implementation, the step 4 includes:
step 4-1, the space-time attention module formed by the channel attention and the spatial attention of steps 2-1 to 3-3 receives a pair consisting of a template image z and a search image x as input during offline training of the target tracking algorithm based on the space-time attention mechanism;
step 4-2, feeding the template image z to the channel attention mechanism model and the space attention mechanism model respectively for feature selection, where the channel attention mechanism model generates the channel weight α from the input image features and the spatial attention mechanism network generates the weight β from the input image features;
step 4-3, obtaining the weighted feature map h(z) according to the following formula:

h(z) = α · β · φ(z)    (1)

where · denotes element-wise multiplication, broadcast over channels for α and over spatial positions for β;
step 4-4, according to the following formula, the network model takes the weighted feature map h(z) as a convolution kernel and performs a sliding convolution operation over the feature map φ(x) obtained by feeding the search image x through the feature extraction network:

f(z,x) = h(z) * φ(x)    (2)

where * denotes the sliding (cross-correlation) convolution and f(z,x) is the final response map of the feature-fusion cross-correlation operation between the template image z and the search image x; in the invention, the final response map is the map finally obtained after the template-image branch, passed through the attention mechanism module, is cross-correlated with the search-image branch;
and step 4-5, obtaining the final network model by continually optimizing the loss function given by the following formula:

l(y,v) = log(1 + exp(-yv))    (3)

where l(y,v) is the loss function, y is the ground truth, and v is the predicted value.
Further, in one implementation, the step 5 includes:
step 5-1, sending the first-frame template image into the feature extraction network and the attention mechanism network for feature extraction;
step 5-2, computing the similarity between the feature map obtained from each subsequent search frame through the feature extraction network and the template image using a convolution operation, i.e. a correlation computation; the similarity between the template image and the search image is calculated according to the following formula:

sim(A,B) = A·B / (||A|| ||B||);

and step 5-3, obtaining the response map and determining the final position of the target to be tracked from the position corresponding to the highest-scoring response, i.e. obtaining the target tracking result. In the invention, a higher computed similarity score indicates the target to be tracked, while a lower similarity score indicates the target has been lost.
In the prior art, performance degrades when the target is challenged by deformation, occlusion, and the like, increasing the risk of target loss.
To address this problem, the invention introduces a channel attention mechanism and a spatial attention mechanism, so that the algorithm focuses on the spatial positions and channels whose features benefit target tracking. Specifically, the invention provides an efficient online updating mechanism that fuses the first-frame image features with high-confidence image features from subsequent tracking frames, reducing the risk of tracking failure when the target faces challenges such as occlusion and deformation. In the invention, a high-confidence image feature means a frame in which the target is in good condition; such frames are selected through the channel-attention steps described above. Fusion means feeding the images separately to the template branch and search branch of the network model and using the network model to fuse the features. Experimental results show that the proposed method achieves higher precision on the OTB2013 and OTB2015 datasets.
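By way of illustration, the following is a minimal PyTorch sketch of such an online template update. The linear-interpolation fusion rule, the confidence threshold, and the update rate lam are assumptions for illustration only; the patent states just that the first-frame features are fused with high-confidence features from subsequent frames.

```python
import torch

def update_template(first_feat: torch.Tensor,
                    frame_feat: torch.Tensor,
                    template_feat: torch.Tensor,
                    peak_score: float,
                    threshold: float = 0.9,
                    lam: float = 0.1) -> torch.Tensor:
    """Fuse the first-frame feature with a high-confidence frame feature."""
    if peak_score < threshold:
        # low-confidence frame (occlusion, deformation): keep the template
        return template_feat
    # hypothetical fusion rule: stay anchored to the first-frame feature to
    # limit drift while absorbing the appearance of the new confident frame
    return (1.0 - lam) * first_feat + lam * frame_feat
```

Keeping the first-frame feature as the anchor of the interpolation is one simple way to avoid the template contamination discussed in the background section.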
Compared with the prior art, the invention has the following remarkable advantages:
first, the target can still be tracked stably when it encounters complex conditions such as occlusion, deformation, and background interference;
second, the target-drift problem during tracking is effectively alleviated, providing a more stable tracking result for the user.
Drawings
In order to illustrate the technical solution of the present invention more clearly, the drawings used in the embodiments are briefly described below; those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of an algorithm architecture of a target tracking method based on a spatiotemporal attention mechanism according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a workflow of a channel attention mechanism in a space-time attention mechanism-based target tracking method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a workflow of a spatial attention mechanism in a target tracking method based on a spatiotemporal attention mechanism according to an embodiment of the present invention;
FIG. 4a is a schematic diagram of a first tracking result of an exemplary target of a target tracking method based on a spatiotemporal attention mechanism according to an embodiment of the present invention;
FIG. 4b is a diagram illustrating a second tracking result of an exemplary target of a target tracking method based on a spatiotemporal attention mechanism according to an embodiment of the present invention;
FIG. 4c is a schematic diagram of a third tracking result of an exemplary target of the target tracking method based on the spatiotemporal attention mechanism according to the embodiment of the present invention;
FIG. 4d is a diagram illustrating a fourth tracking result of an exemplary target of a target tracking method based on a spatiotemporal attention mechanism according to an embodiment of the present invention;
FIG. 4e is a diagram illustrating a fifth tracking result of an exemplary target of the target tracking method based on the spatiotemporal attention mechanism according to an embodiment of the present invention;
fig. 4f is a schematic diagram of a sixth tracking result of an exemplary target of the target tracking method based on the spatiotemporal attention mechanism according to the embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in fig. 1 to fig. 3, the embodiment of the invention discloses a target tracking method based on a space-time attention mechanism, applied to long-duration target tracking scenes; the spatial attention mechanism effectively captures global information and thus tracks the target better. The method comprises the following steps:
step 1, constructing a network model for acquiring a template image of a target to be tracked and an image to be tracked;
step 2, constructing a channel attention mechanism model, and fusing the channel attention mechanism model into a network model;
step 3, constructing a space attention mechanism model, and fusing the space attention mechanism model into a network model;
step 4, training the network model fused with the channel attention mechanism model and the space attention mechanism model according to a loss function;
step 5, tracking the target in the video with the trained network model to obtain the target tracking result.
In the target tracking method based on the space-time attention mechanism provided by this embodiment, the network model comprises a template branch and a search branch. The template branch obtains the template image from the first frame containing the target to be tracked; the template image is obtained by initialization, i.e. the first-frame image is initialized as the template image. The search branch receives a search image during tracking, the search image being the current frame of the target to be tracked in the tracking video. The template image of the first frame received by the template branch is denoted z, the search image of the current frame received by the search branch is denoted x, and the feature extraction network is denoted φ(·).
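As a concrete illustration, the following is a minimal PyTorch sketch of this twin-branch structure. The AlexNet-style layer configuration, the 127/255 crop sizes, and all names are assumptions for illustration only; the patent specifies just a shared feature extraction network φ applied to both branches.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Shared backbone phi(.); the same weights serve both branches."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(256, 256, kernel_size=3),
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.body(img)

phi = FeatureExtractor()
z = torch.randn(1, 3, 127, 127)  # template branch: first-frame crop
x = torch.randn(1, 3, 255, 255)  # search branch: current-frame crop
fz, fx = phi(z), phi(x)          # phi(z): (1, 256, 10, 10), phi(x): (1, 256, 26, 26)
```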
As shown in fig. 2, in the target tracking method based on the spatiotemporal attention mechanism provided in this embodiment, the step 2 includes:
step 2-1, performing feature compression on the image features through a global average pooling layer in the network model to obtain a feature vector of size 1 × 1 × C, where C is the number of channels; in this embodiment, the image features are the features obtained by feature extraction on the template image in the template branch;
step 2-2, performing dimensionality reduction and then dimensionality restoration on the compressed feature vector through two fully connected layers in the network model and their corresponding activation functions, outputting a feature vector of size 1 × 1 × C;
step 2-3, generating the channel weight α corresponding to each channel with a Sigmoid function;
step 2-4, multiplying each channel of the image features φ(z) input to the channel attention mechanism model by its corresponding channel weight α to obtain the new channel weight for each channel. In this embodiment, the favorable channel features are selected according to the new channel weights and used for template matching with the image to be tracked, thereby accurately determining the target position.
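The following is a minimal sketch of steps 2-1 to 2-4, assuming a squeeze-and-excitation style layout; the reduction ratio r = 16 and the ReLU between the two fully connected layers are assumptions the patent does not fix.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # step 2-1: squeeze to 1 x 1 x C
        self.fc = nn.Sequential(             # step 2-2: reduce then restore dims
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                    # step 2-3: channel weights alpha in (0, 1)
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:  # feat: (B, C, H, W)
        b, c, _, _ = feat.shape
        alpha = self.fc(self.pool(feat).view(b, c)).view(b, c, 1, 1)
        return feat * alpha                  # step 2-4: reweight every channel
```

Channels whose weight α is close to 1 are the favorable channel features used for template matching.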
As shown in fig. 3, in the target tracking method based on the spatiotemporal attention mechanism provided in this embodiment, the step 3 includes:
step 3-1, sending the template image of the first frame into the spatial attention mechanism network to obtain a weight for each pixel of the feature map, the feature map being the image features;
step 3-2, performing template matching between the weighted first-frame image features and the image to be tracked: the weighted first-frame image is input into the spatial attention mechanism network, each frame of the image to be tracked is input into the network model, and the weight of each pixel of the feature map is multiplied with the input image features φ(·) to obtain the response map of the target for template matching;
and step 3-3, taking the position corresponding to the highest-scoring response map as the position of the target to be tracked.
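Correspondingly, a minimal sketch of the spatial attention network of steps 3-1 to 3-3. Producing the per-pixel weight map β with a single 1 × 1 convolution is an assumption; the patent states only that the network assigns one weight to each pixel of the feature map.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # hypothetical choice: a 1x1 convolution scores every spatial position
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:  # feat: (B, C, H, W)
        beta = torch.sigmoid(self.score(feat))  # step 3-1: per-pixel weights
        return feat * beta                      # step 3-2: reweight positions
```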
In the target tracking method based on the spatio-temporal attention mechanism provided in this embodiment, the step 4 includes:
step 4-1, the space-time attention module formed by the channel attention and the spatial attention of steps 2-1 to 3-3 receives a pair consisting of a template image z and a search image x as input during offline training of the target tracking algorithm based on the space-time attention mechanism;
step 4-2, feeding the template image z to the channel attention mechanism model and the space attention mechanism model respectively for feature selection, where the channel attention mechanism model generates the channel weight α from the input image features and the spatial attention mechanism network generates the weight β from the input image features;
step 4-3, obtaining the weighted feature map h(z) according to the following formula:

h(z) = α · β · φ(z)    (1)

where · denotes element-wise multiplication, broadcast over channels for α and over spatial positions for β;
step 4-4, according to the following formula, the network model takes the weighted feature map h(z) as a convolution kernel and performs a sliding convolution operation over the feature map φ(x) obtained by feeding the search image x through the feature extraction network:

f(z,x) = h(z) * φ(x)    (2)

where * denotes the sliding (cross-correlation) convolution and f(z,x) is the final response map of the feature-fusion cross-correlation operation between the template image z and the search image x; in this embodiment, the final response map is the map finally obtained after the template-image branch, passed through the attention mechanism module, is cross-correlated with the search-image branch;
and step 4-5, obtaining the final network model by continually optimizing the loss function given by the following formula:

l(y,v) = log(1 + exp(-yv))    (3)

where l(y,v) is the loss function, y is the ground truth, and v is the predicted value.
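Putting the pieces together, the following is a hedged sketch of steps 4-3 to 4-5: the attention-weighted template feature h(z) is slid over the search feature φ(x) as a convolution kernel (equation (2)), and the response map is trained with the logistic loss of equation (3). The feature shapes and the toy label map are assumptions; in the full model h(z) and φ(x) would come from the backbone and attention sketches above.

```python
import torch
import torch.nn.functional as F

# stand-ins for h(z) and phi(x) with SiamFC-like shapes (assumed)
hz = torch.randn(1, 256, 10, 10, requires_grad=True)  # weighted template h(z), eq. (1)
fx = torch.randn(1, 256, 26, 26)                      # search feature phi(x)

# equation (2): sliding (cross-correlation) convolution with h(z) as the kernel
v = F.conv2d(fx, hz)        # response map f(z, x), shape (1, 1, 17, 17)

# equation (3): logistic loss with labels y in {-1, +1}
y = torch.full_like(v, -1.0)
y[..., 8, 8] = 1.0          # hypothetical: positive label at the map centre
loss = torch.log1p(torch.exp(-y * v)).mean()
loss.backward()             # gradients flow back into the template features
```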
In the target tracking method based on the spatiotemporal attention mechanism provided in this embodiment, the step 5 includes:
step 5-1, sending the first-frame template image into the feature extraction network and the attention mechanism network for feature extraction;
step 5-2, computing the similarity between the feature map obtained from each subsequent search frame through the feature extraction network and the template image using a convolution operation, i.e. a correlation computation; the similarity between the template image and the search image is calculated according to the following formula:

sim(A,B) = A·B / (||A|| ||B||);

and step 5-3, obtaining the response map and determining the final position of the target to be tracked from the position corresponding to the highest-scoring response, i.e. obtaining the target tracking result. In this embodiment, a higher computed similarity score indicates the target to be tracked, while a lower similarity score indicates the target has been lost.
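The correlation of step 5-2 is cosine similarity, sim(A,B) = A·B / (||A|| ||B||). A small sketch follows, assuming the template feature and the candidate windows of the search feature have been flattened into vectors (the shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def cosine_response(template_feat: torch.Tensor,
                    search_windows: torch.Tensor) -> torch.Tensor:
    # template_feat: (D,) flattened template feature A
    # search_windows: (N, D) flattened candidate windows B_1..B_N
    return F.cosine_similarity(search_windows, template_feat.unsqueeze(0), dim=1)

A = torch.randn(256 * 10 * 10)                 # flattened template feature
B = torch.randn(17 * 17, 256 * 10 * 10)        # one row per response-map position
scores = cosine_response(A, B)                 # (289,) similarity scores
row, col = divmod(scores.argmax().item(), 17)  # step 5-3: highest-scoring position
```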
In the prior art, performance degrades when the target is challenged by deformation, occlusion, and the like, increasing the risk of target loss. The invention provides a target tracking method based on a space-time attention mechanism suited to stable tracking under complex conditions such as occlusion, illumination change, and background interference. Its characteristics are as follows. First, a twin network architecture is adopted to improve feature discrimination; then, improved channel and spatial attention mechanisms are introduced that apply different weights to features at different channels and spatial positions, focusing on the features most useful for target tracking. In addition, an efficient online target-template updating mechanism is provided that fuses the first-frame image features with high-confidence image features from subsequent frames to reduce the risk of target drift. These steps are repeated until a full video segment has been tracked. Finally, the proposed tracking method was tested on the OTB2013 and OTB2015 datasets, and the experimental results show a 7.6% performance improvement over current mainstream tracking algorithms. As shown in fig. 4a to 4f, this embodiment visualizes some challenging scenes with fast motion, occlusion, scale change, and the like.
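A sketch of how the pieces could be chained over a video, under the same assumptions as the earlier sketches (phi, ChannelAttention, SpatialAttention, and update_template as defined above); frame cropping around the predicted position and score normalization are omitted and stubbed out:

```python
import torch
import torch.nn.functional as F

def track(frames, phi, channel_att, spatial_att, update_template):
    """frames: list of tensors; frames[0] is the 127x127 template crop,
    the rest are 255x255 search crops (the cropping itself is omitted)."""
    weight = lambda f: spatial_att(channel_att(f))  # h(.) from the sketches above
    first = weight(phi(frames[0]))                  # h(z) of the first frame
    template = first
    positions = []
    for frame in frames[1:]:
        fx = phi(frame)                             # phi(x), search features
        v = F.conv2d(fx, template)                  # response map f(z, x)
        peak = v.max().item()                       # confidence of this frame
        idx = v.flatten().argmax().item()
        positions.append(divmod(idx, v.shape[-1]))  # (row, col) of the peak
        # stand-in for re-cropping a 127x127 template patch around the peak
        z_new = frame[..., :127, :127]
        template = update_template(first, weight(phi(z_new)), template, peak)
    return positions
```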
Compared with the prior art, the invention has the following remarkable advantages:
first, the target can still be tracked stably when it encounters complex conditions such as occlusion, deformation, and background interference;
second, the target-drift problem during tracking is effectively alleviated, providing a more stable tracking result for the user.
In specific implementation, the present invention further provides a computer storage medium that can store a program; when executed, the program can perform some or all of the steps of the embodiments of the target tracking method based on the space-time attention mechanism provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The same and similar parts in the various embodiments in this specification may be referred to each other. The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention.
Claims (6)
1. A target tracking method based on a space-time attention mechanism is characterized by comprising the following steps:
step 1, constructing a network model for acquiring a template image of a target to be tracked and an image to be tracked;
step 2, constructing a channel attention mechanism model, and fusing the channel attention mechanism model into a network model;
step 3, constructing a space attention mechanism model, and fusing the space attention mechanism model into a network model;
step 4, training the network model fused with the channel attention mechanism model and the space attention mechanism model according to a loss function;
and step 5, tracking the target in the video with the trained network model to obtain the target tracking result.
2. The space-time attention mechanism-based target tracking method according to claim 1, wherein the network model comprises a template branch and a search branch; the template branch obtains the template image from the first frame containing the target to be tracked, the template image being obtained by initialization; the search branch receives a search image during tracking, the search image being the current frame of the target to be tracked in the tracking video; the template image of the first frame received by the template branch is denoted z, the search image of the current frame received by the search branch is denoted x, and the feature extraction network is denoted φ(·).
3. The space-time attention mechanism-based target tracking method according to claim 1, wherein the step 2 comprises:
step 2-1, performing feature compression on the image features through a global average pooling layer in the network model to obtain a feature vector of size 1 × 1 × C, where C is the number of channels;
step 2-2, performing dimensionality reduction and then dimensionality restoration on the compressed feature vector through two fully connected layers in the network model and their corresponding activation functions, outputting a feature vector of size 1 × 1 × C;
step 2-3, generating the channel weight α corresponding to each channel with a Sigmoid function;
and step 2-4, multiplying each channel of the image features φ(z) input to the channel attention mechanism model by its corresponding channel weight α to obtain the new channel weight for each channel.
4. The space-time attention mechanism-based target tracking method according to claim 1, wherein the step 3 comprises:
step 3-1, sending the template image of the first frame into the spatial attention mechanism network to obtain a weight for each pixel of the feature map, the feature map being the image features;
step 3-2, performing template matching between the weighted first-frame image features and the image to be tracked: the weighted first-frame image is input into the spatial attention mechanism network, each frame of the image to be tracked is input into the network model, and the weight of each pixel of the feature map is multiplied with the input image features φ(·) to obtain the response map of the target for template matching;
and step 3-3, taking the position corresponding to the highest-scoring response map as the position of the target to be tracked.
5. The space-time attention mechanism-based target tracking method according to claim 1, wherein the step 4 comprises:
step 4-1, the space-time attention module formed by the channel attention and the spatial attention of steps 2-1 to 3-3 receives a pair consisting of a template image z and a search image x as input during offline training of the target tracking algorithm based on the space-time attention mechanism;
step 4-2, feeding the template image z to the channel attention mechanism model and the space attention mechanism model respectively for feature selection, wherein the channel attention mechanism model generates the channel weight α from the input image features and the spatial attention mechanism network generates the weight β from the input image features;
step 4-3, obtaining the weighted feature map h(z) according to the following formula:

h(z) = α · β · φ(z)    (1)

wherein · denotes element-wise multiplication of the weights with φ(z);
step 4-4, according to the following formula, the network model takes the weighted feature map h(z) as a convolution kernel and performs a sliding convolution operation over the feature map φ(x) obtained by feeding the search image x through the feature extraction network:

f(z,x) = h(z) * φ(x)    (2)

wherein f(z,x) is the final response map of the feature-fusion cross-correlation operation between the template image z and the search image x;
and step 4-5, obtaining the final network model by continually optimizing the loss function given by the following formula:

l(y,v) = log(1 + exp(-yv))    (3)

wherein l(y,v) is the loss function, y is the ground truth, and v is the predicted value.
6. The space-time attention mechanism-based target tracking method according to claim 1, wherein the step 5 comprises:
step 5-1, sending the first-frame template image into the feature extraction network and the attention mechanism network for feature extraction;
step 5-2, computing the similarity between the feature map obtained from each subsequent search frame through the feature extraction network and the template image using a convolution operation, i.e. a correlation computation, the similarity between the template image and the search image being calculated according to the following formula:

sim(A,B) = A·B / (||A|| ||B||);

and step 5-3, obtaining the response map and determining the final position of the target to be tracked from the position corresponding to the highest-scoring response, i.e. obtaining the target tracking result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110755862.5A | 2021-07-05 | 2021-07-05 | Target tracking method based on space-time attention mechanism
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110755862.5A | 2021-07-05 | 2021-07-05 | Target tracking method based on space-time attention mechanism
Publications (1)
Publication Number | Publication Date |
---|---|
CN113393496A true CN113393496A (en) | 2021-09-14 |
Family
ID=77625185
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110755862.5A Pending CN113393496A (en) | 2021-07-05 | 2021-07-05 | Target tracking method based on space-time attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113393496A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111598117A (en) * | 2019-02-21 | 2020-08-28 | 成都通甲优博科技有限责任公司 | Image recognition method and device |
CN109978921A (en) * | 2019-04-01 | 2019-07-05 | 南京信息工程大学 | A kind of real-time video target tracking algorithm based on multilayer attention mechanism |
CN110675423A (en) * | 2019-08-29 | 2020-01-10 | 电子科技大学 | Unmanned aerial vehicle tracking method based on twin neural network and attention model |
CN110751018A (en) * | 2019-09-03 | 2020-02-04 | 上海交通大学 | Group pedestrian re-identification method based on mixed attention mechanism |
CN111462175A (en) * | 2020-03-11 | 2020-07-28 | 华南理工大学 | Space-time convolution twin matching network target tracking method, device, medium and equipment |
CN112164094A (en) * | 2020-09-22 | 2021-01-01 | 江南大学 | Fast video target tracking method based on twin network |
CN112233147A (en) * | 2020-12-21 | 2021-01-15 | 江苏移动信息系统集成有限公司 | Video moving target tracking method and device based on two-way twin network |
Non-Patent Citations (2)
Title |
---|
DANLU ZHANG ET AL.: "SIAMESE NETWORK COMBINED WITH ATTENTION MECHANISM FOR OBJECT TRACKING", 《INTERNATIONAL ARCHIVES OF THE PHOTOGRAMMETRY, REMOTE SENSING & SPATIAL INFORMATION SCIENCES》 * |
LIU Hongzhi: "Recommender Systems" (《推荐系统》), China Machine Press (机械工业出版社), 31 May 2020 *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113793362A (en) * | 2021-09-22 | 2021-12-14 | 清华大学 | Pedestrian track extraction method and device based on multi-lens video |
CN113793362B (en) * | 2021-09-22 | 2024-05-07 | 清华大学 | Pedestrian track extraction method and device based on multi-lens video |
CN114782488A (en) * | 2022-04-01 | 2022-07-22 | 燕山大学 | Underwater target tracking method based on channel perception |
CN117522925A (en) * | 2024-01-05 | 2024-02-06 | 成都合能创越软件有限公司 | Method and system for judging object motion state in mobile camera under attention mechanism |
CN117522925B (en) * | 2024-01-05 | 2024-04-16 | 成都合能创越软件有限公司 | Method and system for judging object motion state in mobile camera under attention mechanism |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Girdhar et al. | Detect-and-track: Efficient pose estimation in videos | |
Leng et al. | Realize your surroundings: Exploiting context information for small object detection | |
CN113393496A (en) | Target tracking method based on space-time attention mechanism | |
Kugarajeevan et al. | Transformers in single object tracking: an experimental survey | |
Lai et al. | Real-time micro-expression recognition based on ResNet and atrous convolutions | |
Huang et al. | End-to-end multitask siamese network with residual hierarchical attention for real-time object tracking | |
CN112446900B (en) | Twin neural network target tracking method and system | |
Saribas et al. | TRAT: Tracking by attention using spatio-temporal features | |
CN116235209A (en) | Sparse optical flow estimation | |
Wei et al. | Novel video prediction for large-scale scene using optical flow | |
He et al. | Towards robust visual tracking for unmanned aerial vehicle with tri-attentional correlation filters | |
CN114596432B (en) | Visual tracking method and system based on foreground region corresponding template features | |
Li et al. | Robust visual tracking with channel attention and focal loss | |
CN118212463A (en) | Target tracking method based on fractional order hybrid network | |
Zhang et al. | Dual attentional Siamese network for visual tracking | |
Zhao et al. | Context-aware and part alignment for visible-infrared person re-identification | |
Li et al. | Video prediction for driving scenes with a memory differential motion network model | |
Feng et al. | Exploring the potential of Siamese network for RGBT object tracking | |
Sun et al. | Deblurring transformer tracking with conditional cross-attention | |
Wang et al. | EMAT: Efficient feature fusion network for visual tracking via optimized multi-head attention | |
Gong et al. | Research on an improved KCF target tracking algorithm based on CNN feature extraction | |
CN110942463A (en) | Video target segmentation method based on generation countermeasure network | |
Ge et al. | A visual tracking algorithm combining parallel network and dual attention-aware mechanism | |
Gong et al. | Visual tracking with pyramidal feature fusion and transformer based model predictor | |
Liu et al. | Adversarial erasing attention for person re-identification in camera networks under complex environments |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20210914 |