CN113393496A - Target tracking method based on space-time attention mechanism - Google Patents

Target tracking method based on space-time attention mechanism

Info

Publication number
CN113393496A
CN113393496A
Authority
CN
China
Prior art keywords
attention mechanism
image
target
space
channel
Prior art date
Legal status
Pending
Application number
CN202110755862.5A
Other languages
Chinese (zh)
Inventor
后弘毅
陆保国
褚孔统
Current Assignee
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date
Filing date
Publication date
Application filed by CETC 28 Research Institute filed Critical CETC 28 Research Institute
Priority to CN202110755862.5A priority Critical patent/CN113393496A/en
Publication of CN113393496A publication Critical patent/CN113393496A/en
Pending legal-status Critical Current

Classifications

    • G06T 7/251 — Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving models
    • G06F 18/22 — Pattern recognition; matching criteria, e.g. proximity measures
    • G06N 3/045 — Neural networks; combinations of networks
    • G06N 3/048 — Neural networks; activation functions
    • G06N 3/08 — Neural networks; learning methods
    • G06T 2207/10016 — Indexing scheme for image analysis; image acquisition modality: video; image sequence


Abstract

The invention provides a target tracking method based on a space-time attention mechanism, which comprises the following steps: constructing a network model for acquiring a template image of the target to be tracked and the image to be tracked; constructing a channel attention mechanism model and fusing it into the network model; constructing a spatial attention mechanism model and fusing it into the network model; training the network model fused with the channel attention mechanism model and the spatial attention mechanism model according to a loss function; and tracking the target in a video with the trained network model to obtain the target tracking result. Compared with the prior art, the tracking performance is greatly improved: the target can still be tracked stably when it encounters complex conditions such as occlusion, deformation and background interference, the target-drift problem during tracking is effectively alleviated, and a more stable tracking result is provided for the user.

Description

Target tracking method based on space-time attention mechanism
Technical Field
The invention relates to the technical field of electronic signal detection, in particular to a target tracking method based on a space-time attention mechanism.
Background
Target tracking is one of the hot research directions in the field of computer vision and is widely applied in fields such as intelligent surveillance, human-computer interaction and autonomous driving. Target tracking means establishing, over a continuous video sequence, the positional correspondence of the object to be tracked so as to obtain its complete motion trajectory: given the target's coordinate position in the first frame, its position and size are computed in the subsequent images of the sequence. Target tracking can provide a basis for behavior understanding, reasoning and decision making; it underlies higher-level video processing tasks such as target recognition, behavior analysis, video compression coding and video understanding, and is a necessary premise for executing high-level intelligent behaviors. Although target tracking has advanced considerably in recent years and many efficient algorithms have been proposed to solve challenging problems in specific scenes, difficulties such as occlusion, illumination change, scale change and background interference remain, so target tracking is still a difficult research task.
The target tracking method based on a fully convolutional Siamese (twin) network performs tracking through a template-matching strategy. Its features are insufficiently discriminative, and only the target information of the first-frame image is used during tracking, so performance degrades when the target is challenged by deformation, occlusion and the like. Moreover, because the Siamese network retains only the image features of the first frame, it avoids contaminating the target features but cannot capture the changes of the target in subsequent frames. Consequently, when the target deforms strongly, the response score at the target's true position may become low, increasing the risk of losing the target.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the defects of the prior art, the invention provides a target tracking method based on a space-time attention mechanism for tracking targets in video.
The technical scheme is as follows: the invention first improves the discriminative power of the features through a Siamese network architecture; it then introduces improved channel attention and spatial attention mechanisms that apply different weights to features at different channels and spatial positions, focusing on the features at the spatial and channel positions that benefit target tracking. In addition, an efficient online target-template updating mechanism is provided, which fuses the image features of the first frame with higher-confidence image features from subsequent tracked frames.
The invention provides a target tracking method based on a space-time attention mechanism, which comprises the following steps:
step 1, constructing a network model for acquiring a template image of a target to be tracked and an image to be tracked;
step 2, constructing a channel attention mechanism model, and fusing the channel attention mechanism model into a network model;
step 3, constructing a space attention mechanism model, and fusing the space attention mechanism model into a network model;
step 4, training a network model fused into the channel attention mechanism model and the space attention mechanism model according to a loss function;
step 5, tracking the target in the video by using the network model obtained by training, to obtain the target tracking result.
Further, in one implementation, the network model includes a template branch and a search branch. The template branch is used for obtaining the template image from the first frame containing the target to be tracked; the template image is obtained by initialization, i.e. the first-frame image is initialized as the template image. The search branch is used for receiving a search image during tracking, where the search image is the current frame of the tracking video. The first-frame template image received by the template branch is denoted z, the current-frame search image received by the search branch is denoted x, and the feature extraction network is denoted φ(·).
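As a concrete illustration (not part of the patent text), the two-branch model can be sketched in PyTorch; the AlexNet-style backbone and all layer sizes below are assumptions, since the patent does not fix the architecture of φ:

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Feature extraction network phi(.), shared by both branches.
    The AlexNet-style layer stack here is an assumption."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=2), nn.BatchNorm2d(96),
            nn.ReLU(inplace=True), nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5), nn.BatchNorm2d(256),
            nn.ReLU(inplace=True), nn.MaxPool2d(3, stride=2),
            nn.Conv2d(256, 256, kernel_size=3),
        )

    def forward(self, img):
        return self.features(img)

class SiameseModel(nn.Module):
    """Template branch and search branch share one phi."""
    def __init__(self):
        super().__init__()
        self.phi = Backbone()

    def forward(self, z, x):
        # z: first-frame template image, x: current-frame search image
        return self.phi(z), self.phi(x)
```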
Further, in one implementation, the step 2 includes:
step 2-1, performing feature compression on the image features through a global average pooling layer in the network model to obtain a feature vector of size 1 × 1 × C, where C is the number of channels; in the invention, the image features are the features extracted from the template image on the template branch;
step 2-2, performing dimensionality reduction and then dimensionality expansion on the compressed feature vector through two fully connected layers in the network model with their corresponding activation functions, and outputting a feature vector of size 1 × 1 × C;
step 2-3, generating the channel weight α corresponding to each channel by using a Sigmoid function;
step 2-4, multiplying each channel of the image features φ(z) input to the channel attention mechanism model by its corresponding channel weight α to obtain the new weighted channel features. In the invention, the favorable channel features are selected according to the obtained new channel weights and are used for template matching with the image to be tracked, thereby determining the target position.
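A minimal sketch of this squeeze-and-excitation-style channel attention, assuming PyTorch; the reduction ratio of the two fully connected layers is an assumption the patent leaves open:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Steps 2-1 to 2-4: GAP -> FC (down) -> FC (up) -> sigmoid -> reweight."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)  # squeeze to a 1 x 1 x C vector
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # dimensionality reduction
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # dimensionality expansion
        )

    def forward(self, feat):                 # feat: (B, C, H, W), e.g. phi(z)
        b, c, _, _ = feat.shape
        alpha = torch.sigmoid(self.fc(self.gap(feat).view(b, c)))  # channel weights
        return feat * alpha.view(b, c, 1, 1)  # per-channel reweighting
```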
Further, in one implementation, the step 3 includes:
step 3-1, sending the template image of the first frame into the spatial attention mechanism network to obtain the weight of each pixel of the feature map, where the feature map is the image features;
step 3-2, performing template matching between the weighted first-frame image features and the image to be tracked, namely inputting the first-frame image into the spatial attention mechanism network, inputting each frame of the image to be tracked into the network model, and multiplying the weight of each pixel of the feature map with the input image features φ(x) to obtain the response map of the target for template matching;
step 3-3, taking the position corresponding to the highest score in the response map as the position of the target to be tracked.
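A comparable sketch of the spatial attention branch; the patent only requires a per-pixel weight for the feature map, so the small convolutional weight head below is an assumption:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Step 3-1: produce one weight per pixel of the feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(channels, max(channels // 8, 1), kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(max(channels // 8, 1), 1, kernel_size=1),  # one weight per position
        )

    def forward(self, feat):                  # feat: (B, C, H, W)
        beta = torch.sigmoid(self.head(feat))  # (B, 1, H, W) pixel weights
        return feat * beta                     # step 3-2 style reweighting
```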
Further, in one implementation, the step 4 includes:
step 4-1, during offline training of the space-time-attention-mechanism target tracking algorithm, the space-time attention module formed by the channel attention and spatial attention of steps 2-1 to 3-3 receives as input a pair consisting of a template image z and a search image x;
step 4-2, sending the template image z to the channel attention mechanism model and the spatial attention mechanism model respectively for feature selection, wherein the channel attention mechanism model generates the channel weight α from the input image features and the spatial attention mechanism network generates the weight β from the input image features;
step 4-3, obtaining the weighted feature map h(z) according to the following formula:

h(z) = α ⊙ β ⊙ φ(z)   (1)

where ⊙ denotes element-wise (broadcast) multiplication;
step 4-4, according to the following formula, the network model uses the weighted feature map h(z) as a convolution kernel and performs a sliding convolution over the feature map φ(x), obtained by sending the search image x through the feature extraction network φ:

f(z, x) = h(z) ⋆ φ(x)   (2)

wherein f(z, x) is the final response map of the feature-fusion cross-correlation between the template image z and the search image x; in the invention, the final response map is the map finally obtained after the template-image branch, passed through the attention mechanism modules, is cross-correlated with the search-image branch.
Step 4-5, obtaining the final network model by continually optimizing the loss function according to the following formula:

l(y, v) = log(1 + exp(−yv))   (3)

where l(y, v) is the loss function, y is the ground truth and v is the predicted value.
Further, in one implementation, the step 5 includes:
step 5-1, sending the first-frame template image into the feature extraction network and the attention mechanism networks for feature extraction;
step 5-2, performing, with a convolution (correlation) operation, similarity calculation between the template image and the feature map obtained by passing each subsequent frame's search image through the feature extraction network φ, the similarity between the template image and the search image being calculated according to the following formula:

sim(A, B) = A·B / (||A|| ||B||);

step 5-3, obtaining the response map and determining the final position of the target to be tracked from the position corresponding to the highest score in the response map, i.e. obtaining the target tracking result. In the invention, a higher computed similarity score indicates the target that needs to be tracked, while a low similarity score indicates that the target has been lost.
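The cosine similarity of step 5-2 admits a direct sketch; flattening the whole feature maps before normalizing is an assumption, since the patent does not spell out the normalization granularity:

```python
import torch

def cosine_similarity(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """sim(A, B) = A.B / (||A|| ||B||) over flattened feature maps."""
    a = feat_a.flatten()
    b = feat_b.flatten()
    return torch.dot(a, b) / (a.norm() * b.norm() + 1e-12)  # eps avoids division by zero
```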
In the prior art, tracking performance degrades when the target is challenged by deformation, occlusion and the like, and the risk of losing the target increases.
To address this problem, the invention introduces a channel attention mechanism and a spatial attention mechanism so that the algorithm focuses more on the spatial and channel positions whose features benefit target tracking. Specifically, the invention provides an efficient online updating mechanism that fuses the image features of the first frame with higher-confidence image features from subsequent tracked frames, reducing the risk of tracking failure when the target encounters challenges such as occlusion and deformation. In the invention, a higher-confidence image feature means a frame in which the target is in better condition; the selection of such frames is realized through the aforementioned steps of the channel attention mechanism. The fusion sends the images separately into the template branch and search branch of the network model and uses the network model to fuse the features. The experimental results show that the proposed method achieves higher precision on the OTB2013 and OTB2015 datasets.
Compared with the prior art, the invention has the following notable advantages:
first, the target can still be tracked stably when it encounters complex conditions such as occlusion, deformation and background interference;
second, the target-drift problem during tracking is effectively alleviated, providing a more stable tracking result for the user.
Drawings
In order to illustrate the technical solution of the present invention more clearly, the drawings needed in the embodiments are briefly described below; those skilled in the art can obviously obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic diagram of an algorithm architecture of a target tracking method based on a spatiotemporal attention mechanism according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a workflow of a channel attention mechanism in a space-time attention mechanism-based target tracking method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a workflow of a spatial attention mechanism in a target tracking method based on a spatiotemporal attention mechanism according to an embodiment of the present invention;
FIG. 4a is a schematic diagram of a first tracking result of an exemplary target of a target tracking method based on a spatiotemporal attention mechanism according to an embodiment of the present invention;
FIG. 4b is a diagram illustrating a second tracking result of an exemplary target of a target tracking method based on a spatiotemporal attention mechanism according to an embodiment of the present invention;
FIG. 4c is a schematic diagram of a third tracking result of an exemplary target of the target tracking method based on the spatiotemporal attention mechanism according to the embodiment of the present invention;
FIG. 4d is a diagram illustrating a fourth tracking result of an exemplary target of a target tracking method based on a spatiotemporal attention mechanism according to an embodiment of the present invention;
FIG. 4e is a diagram illustrating a fifth tracking result of an exemplary target of the target tracking method based on the spatiotemporal attention mechanism according to an embodiment of the present invention;
fig. 4f is a schematic diagram of a sixth tracking result of an exemplary target of the target tracking method based on the spatiotemporal attention mechanism according to the embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in fig. 1 to fig. 3, the embodiment of the invention discloses a target tracking method based on a space-time attention mechanism, applied to long-term target tracking scenes; the spatial attention mechanism can effectively capture global information and thus track the target better. The method comprises the following steps:
step 1, constructing a network model for acquiring a template image of a target to be tracked and an image to be tracked;
step 2, constructing a channel attention mechanism model, and fusing the channel attention mechanism model into a network model;
step 3, constructing a space attention mechanism model, and fusing the space attention mechanism model into a network model;
step 4, training a network model fused into the channel attention mechanism model and the space attention mechanism model according to a loss function;
step 5, tracking the target in the video by using the network model obtained by training, to obtain the target tracking result.
In the target tracking method based on the space-time attention mechanism provided by this embodiment, the network model includes a template branch and a search branch. The template branch is used for obtaining the template image from the first frame containing the target to be tracked; the template image is obtained by initialization, i.e. the first-frame image is initialized as the template image. The search branch is used for receiving a search image during tracking, where the search image is the current frame of the tracking video. The first-frame template image received by the template branch is denoted z, the current-frame search image received by the search branch is denoted x, and the feature extraction network is denoted φ(·).
As shown in fig. 2, in the target tracking method based on the space-time attention mechanism provided in this embodiment, the step 2 includes:
step 2-1, performing feature compression on the image features through a global average pooling layer in the network model to obtain a feature vector of size 1 × 1 × C, where C is the number of channels; in this embodiment, the image features are the features extracted from the template image on the template branch;
step 2-2, performing dimensionality reduction and then dimensionality expansion on the compressed feature vector through two fully connected layers in the network model with their corresponding activation functions, and outputting a feature vector of size 1 × 1 × C;
step 2-3, generating the channel weight α corresponding to each channel by using a Sigmoid function;
step 2-4, multiplying each channel of the image features φ(z) input to the channel attention mechanism model by its corresponding channel weight α to obtain the new weighted channel features. In this embodiment, the favorable channel features are selected according to the obtained new channel weights and are used for template matching with the image to be tracked, thereby accurately determining the target position.
As shown in fig. 3, in the target tracking method based on the space-time attention mechanism provided in this embodiment, the step 3 includes:
step 3-1, sending the template image of the first frame into the spatial attention mechanism network to obtain the weight of each pixel of the feature map, where the feature map is the image features;
step 3-2, performing template matching between the weighted first-frame image features and the image to be tracked, namely inputting the first-frame image into the spatial attention mechanism network, inputting each frame of the image to be tracked into the network model, and multiplying the weight of each pixel of the feature map with the input image features φ(x) to obtain the response map of the target for template matching;
step 3-3, taking the position corresponding to the highest score in the response map as the position of the target to be tracked.
In the target tracking method based on the space-time attention mechanism provided in this embodiment, the step 4 includes:
step 4-1, during offline training of the space-time-attention-mechanism target tracking algorithm, the space-time attention module formed by the channel attention and spatial attention of steps 2-1 to 3-3 receives as input a pair consisting of a template image z and a search image x;
step 4-2, sending the template image z to the channel attention mechanism model and the spatial attention mechanism model respectively for feature selection, wherein the channel attention mechanism model generates the channel weight α from the input image features and the spatial attention mechanism network generates the weight β from the input image features;
step 4-3, obtaining the weighted feature map h(z) according to the following formula:

h(z) = α ⊙ β ⊙ φ(z)   (1)

step 4-4, according to the following formula, the network model uses the weighted feature map h(z) as a convolution kernel and performs a sliding convolution over the feature map φ(x), obtained by sending the search image x through the feature extraction network φ:

f(z, x) = h(z) ⋆ φ(x)   (2)

wherein f(z, x) is the final response map of the feature-fusion cross-correlation between the template image z and the search image x; in this embodiment, the final response map is the map obtained after the template-image branch, passed through the attention mechanism modules, is cross-correlated with the search-image branch.
Step 4-5, obtaining the final network model by continually optimizing the loss function according to the following formula:

l(y, v) = log(1 + exp(−yv))   (3)

where l(y, v) is the loss function, y is the ground truth and v is the predicted value.
In the target tracking method based on the space-time attention mechanism provided in this embodiment, the step 5 includes:
step 5-1, sending the first-frame template image into the feature extraction network and the attention mechanism networks for feature extraction;
step 5-2, performing, with a convolution (correlation) operation, similarity calculation between the template image and the feature map obtained by passing each subsequent frame's search image through the feature extraction network φ, the similarity between the template image and the search image being calculated according to the following formula:

sim(A, B) = A·B / (||A|| ||B||);

step 5-3, obtaining the response map and determining the final position of the target to be tracked from the position corresponding to the highest score in the response map, i.e. obtaining the target tracking result. In this embodiment, a higher computed similarity score indicates the target that needs to be tracked, while a low similarity score indicates that the target has been lost.
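Putting the illustrative modules above together, a per-frame tracking loop might look like the following sketch; the names (SiameseModel, ChannelAttention, SpatialAttention, cross_correlation) refer to the earlier hypothetical code, and the simple peak-picking policy is only the most direct reading of steps 3-3 and 5-3:

```python
import torch

def track_video(model, chan_att, spat_att, template, frames):
    """Illustrative loop: weight the template once (steps 2 and 3),
    then correlate it against every incoming frame (step 5)."""
    with torch.no_grad():
        phi_z = model.phi(template)             # phi(z), first-frame features
        h_z = spat_att(chan_att(phi_z))         # approx. h(z) = alpha . beta . phi(z)
        positions = []
        for frame in frames:                    # each search image x
            phi_x = model.phi(frame)            # phi(x)
            response = cross_correlation(h_z, phi_x)  # formula (2)
            w = response.shape[-1]
            idx = response.flatten(2).argmax(-1)      # flat index of the peak score
            positions.append((idx // w, idx % w))     # (row, col) of the target
        return positions
```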
In the prior art, tracking performance degrades when the target is challenged by deformation, occlusion and the like, and the risk of losing the target increases. The invention provides a target tracking method based on a space-time attention mechanism, suitable for tracking a target stably under complex conditions such as occlusion, illumination change and background interference. Its characteristics are as follows: first, a Siamese network architecture is adopted to improve the discriminative power of the features; then, improved channel attention and spatial attention mechanisms are introduced, applying different weights to features at different channels and spatial positions and focusing on the features at the spatial and channel positions that benefit target tracking. In addition, an efficient online target-template updating mechanism is provided, which fuses the image features of the first frame with higher-confidence image features from subsequent tracked frames to reduce the risk of target drift. These steps are repeated until one segment of video has been tracked. Finally, the proposed tracking method was tested on the OTB2013 and OTB2015 datasets; the experimental results show a performance improvement of 7.6% over current mainstream tracking algorithms. As shown in fig. 4a to 4f, the target tracking method based on the space-time attention mechanism provided by this embodiment visualizes some challenging scenes with fast motion, occlusion, scale change and the like.
Compared with the prior art, the invention has the following notable advantages:
first, the target can still be tracked stably when it encounters complex conditions such as occlusion, deformation and background interference;
second, the target-drift problem during tracking is effectively alleviated, providing a more stable tracking result for the user.
In specific implementation, the present invention further provides a computer storage medium. The computer storage medium may store a program which, when executed, may perform some or all of the steps of the embodiments of the target tracking method based on the space-time attention mechanism provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The same and similar parts in the various embodiments in this specification may be referred to each other. The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention.

Claims (6)

1. A target tracking method based on a space-time attention mechanism is characterized by comprising the following steps:
step 1, constructing a network model for acquiring a template image of a target to be tracked and an image to be tracked;
step 2, constructing a channel attention mechanism model, and fusing the channel attention mechanism model into a network model;
step 3, constructing a space attention mechanism model, and fusing the space attention mechanism model into a network model;
step 4, training a network model fused into the channel attention mechanism model and the space attention mechanism model according to a loss function;
step 5, tracking the target in the video by using the network model obtained by training, to obtain the target tracking result.
2. The space-time attention mechanism-based target tracking method according to claim 1, wherein the network model comprises a template branch and a search branch; the template branch is used for obtaining the template image from the first frame containing the target to be tracked, the template image being obtained by initialization; the search branch is used for receiving a search image during tracking, the search image being the current frame of the target to be tracked in the tracking video; the template image of the first frame received by the template branch is denoted z, the search image of the current frame received by the search branch is denoted x, and the feature extraction network is denoted φ(·).
3. The space-time attention mechanism-based target tracking method according to claim 1, wherein the step 2 comprises:
step 2-1, performing feature compression on the image features through a global average pooling layer in the network model to obtain a feature vector of size 1 × 1 × C, wherein C is the number of channels;
step 2-2, performing dimensionality reduction and then dimensionality expansion on the compressed feature vector through two fully connected layers in the network model and their corresponding activation functions, and outputting a feature vector of size 1 × 1 × C;
step 2-3, generating the channel weight α corresponding to each channel by using a Sigmoid function;
step 2-4, multiplying each channel of the image features φ(z) input to the channel attention mechanism model by its corresponding channel weight α to obtain the new weighted channel features.
4. The space-time attention mechanism-based target tracking method according to claim 1, wherein the step 3 comprises:
step 3-1, sending the template image of the first frame into the spatial attention mechanism network to obtain the weight of each pixel of the feature map, wherein the feature map is the image features;
step 3-2, performing template matching between the weighted first-frame image features and the image to be tracked, namely inputting the first-frame image into the spatial attention mechanism network, inputting each frame of the image to be tracked into the network model, and multiplying the weight of each pixel of the feature map with the input image features φ(x) to obtain the response map of the target for template matching;
step 3-3, taking the position corresponding to the highest score in the response map as the position of the target to be tracked.
5. The space-time attention mechanism-based target tracking method according to claim 1, wherein the step 4 comprises:
step 4-1, during offline training of the space-time-attention-mechanism target tracking algorithm, receiving, by the space-time attention module formed by the channel attention and spatial attention of steps 2-1 to 3-3, a pair consisting of a template image z and a search image x as input;
step 4-2, sending the template image z to the channel attention mechanism model and the spatial attention mechanism model respectively for feature selection, wherein the channel attention mechanism model generates the channel weight α from the input image features and the spatial attention mechanism network generates the weight β from the input image features;
step 4-3, obtaining the weighted feature map h(z) according to the following formula:

h(z) = α ⊙ β ⊙ φ(z)   (1)

step 4-4, according to the following formula, using the weighted feature map h(z) as a convolution kernel and performing a sliding convolution over the feature map φ(x) obtained by sending the search image x through the feature extraction network φ:

f(z, x) = h(z) ⋆ φ(x)   (2)

wherein f(z, x) is the final response map of the feature-fusion cross-correlation between the template image z and the search image x;
step 4-5, obtaining the final network model by continually optimizing the loss function according to the following formula:

l(y, v) = log(1 + exp(−yv))   (3)

wherein l(y, v) is the loss function, y is the ground truth and v is the predicted value.
6. The space-time attention mechanism-based target tracking method according to claim 1, wherein the step 5 comprises:
step 5-1, sending the first-frame template image into the feature extraction network and the attention mechanism networks for feature extraction;
step 5-2, performing, with a convolution (correlation) operation, similarity calculation between the template image and the feature map obtained by passing each subsequent frame's search image through the feature extraction network φ, the similarity between the template image and the search image being calculated according to the following formula:

sim(A, B) = A·B / (||A|| ||B||);

step 5-3, obtaining the response map, and determining the final position of the target to be tracked from the position corresponding to the highest score in the response map, namely obtaining the target tracking result.
CN202110755862.5A 2021-07-05 2021-07-05 Target tracking method based on space-time attention mechanism Pending CN113393496A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110755862.5A CN113393496A (en) 2021-07-05 2021-07-05 Target tracking method based on space-time attention mechanism


Publications (1)

Publication Number Publication Date
CN113393496A true CN113393496A (en) 2021-09-14

Family

ID=77625185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110755862.5A Pending CN113393496A (en) 2021-07-05 2021-07-05 Target tracking method based on space-time attention mechanism

Country Status (1)

Country Link
CN (1) CN113393496A (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598117A (en) * 2019-02-21 2020-08-28 成都通甲优博科技有限责任公司 Image recognition method and device
CN109978921A (en) * 2019-04-01 2019-07-05 南京信息工程大学 A kind of real-time video target tracking algorithm based on multilayer attention mechanism
CN110675423A (en) * 2019-08-29 2020-01-10 电子科技大学 Unmanned aerial vehicle tracking method based on twin neural network and attention model
CN110751018A (en) * 2019-09-03 2020-02-04 上海交通大学 Group pedestrian re-identification method based on mixed attention mechanism
CN111462175A (en) * 2020-03-11 2020-07-28 华南理工大学 Space-time convolution twin matching network target tracking method, device, medium and equipment
CN112164094A (en) * 2020-09-22 2021-01-01 江南大学 Fast video target tracking method based on twin network
CN112233147A (en) * 2020-12-21 2021-01-15 江苏移动信息系统集成有限公司 Video moving target tracking method and device based on two-way twin network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DANLU ZHANG ET AL.: "SIAMESE NETWORK COMBINED WITH ATTENTION MECHANISM FOR OBJECT TRACKING", 《INTERNATIONAL ARCHIVES OF THE PHOTOGRAMMETRY, REMOTE SENSING & SPATIAL INFORMATION SCIENCES》 *
刘宏志 (LIU Hongzhi): "推荐系统 [Recommender Systems]", 31 May 2020, 机械工业出版社 (China Machine Press) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113793362A (en) * 2021-09-22 2021-12-14 清华大学 Pedestrian track extraction method and device based on multi-lens video
CN113793362B (en) * 2021-09-22 2024-05-07 清华大学 Pedestrian track extraction method and device based on multi-lens video
CN114782488A (en) * 2022-04-01 2022-07-22 燕山大学 Underwater target tracking method based on channel perception
CN117522925A (en) * 2024-01-05 2024-02-06 成都合能创越软件有限公司 Method and system for judging object motion state in mobile camera under attention mechanism
CN117522925B (en) * 2024-01-05 2024-04-16 成都合能创越软件有限公司 Method and system for judging object motion state in mobile camera under attention mechanism


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210914