CN113658218A - Dual-template dense twin network tracking method and device and storage medium

Info

Publication number: CN113658218A (application CN202110811344.0A)
Authority: CN (China)
Prior art keywords: template, new, image, dense, branch
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN113658218B (en)
Inventors: 胡栋, 张虎, 张庆军
Current and original assignee: Nanjing University of Posts and Telecommunications
Application CN202110811344.0A filed by Nanjing University of Posts and Telecommunications
Publication of CN113658218A; application granted and published as CN113658218B

Classifications

    • G06T7/246 Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06F18/22 Pattern recognition; matching criteria, e.g. proximity measures
    • G06N3/045 Neural networks; combinations of networks
    • G06N3/08 Neural networks; learning methods
    • G06T2207/10016 Image acquisition modality: video; image sequence
    • G06T2207/20081 Special algorithmic details: training; learning
    • G06T2207/20084 Special algorithmic details: artificial neural networks [ANN]
    • G06T2207/20221 Special algorithmic details: image fusion; image merging


Abstract

The invention belongs to the technical field of video analysis and discloses a dual-template dense twin network target tracking method based on global context. Based on the twin network framework, the AlexNet network in the twin network is replaced by a deeper dense convolutional network, in which features from different layers are concatenated along the channel dimension to realize feature reuse, and a global attention module is added after the network to capture context-dependent information. In addition, a new template-updating scheme is designed: a frame with better performance is extracted from the historical tracking results and processed to serve as a new target template, and the original target template and the new template are fused by a spatiotemporal attention mechanism to obtain the final template features.

Description

Dual-template dense twin network tracking method and device and storage medium
Technical Field
The invention relates to a dual-template dense twin network tracking method, device and storage medium, and belongs to the technical field of video analysis.
Background
Visual target tracking is an important component of computer vision and is widely applied in intelligent video surveillance, intelligent transportation, autonomous driving, military reconnaissance and other areas. Target tracking technology therefore has important research significance in both the civil and the military security fields.
In 2016, Luca Bertinetto et al. began using twin (Siamese) networks for target tracking and proposed the SiamFC algorithm, with which target tracking formally entered the twin network era. SiamFC adopts a twin network structure: one branch extracts target template information, the other branch extracts search region features, the two sets of features are then cross-correlated, and the target position is determined from the maximum of the response map. Guo et al. proposed the dynamic twin network algorithm DSiam based on SiamFC, which can learn target appearance changes online and suppress irrelevant background, improving the capability for online target updating. He et al. improved the feature extraction network and proposed the SA-Siam algorithm, which comprehensively utilizes the semantic and appearance features of the image.
However, in real-life scenes, owing to the complexity and uncertainty of the environment, the tracking performance of a tracker degrades when the target undergoes background clutter or target deformation during video target tracking.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, provides a dual-template dense twin network tracking method, device and storage medium, and solves the technical problem that the tracking performance of a tracker degrades when the target undergoes background clutter or target deformation in video target tracking.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
in a first aspect, the invention provides a dual-template dense twin network tracking method, which comprises the following steps:
acquiring and preprocessing an original template image, a new template image and a search image;
inputting the preprocessed original template image, new template image and search image into a pre-constructed and trained twin network model to obtain an original template feature map, a new template feature map and a search image feature map;
passing the original template feature map and the new template feature map through a pre-added spatiotemporal attention module to obtain an original template weight map and a new template weight map, respectively;
performing weighted fusion of the original template weight map with the original template feature map, and of the new template weight map with the new template feature map, to obtain a fused template feature map;
and performing a cross-correlation operation between the fused template feature map and the search image feature map to obtain a response map, and determining the target position from the maximum of the response map.
Further, the twin network model includes a template branch, a new template branch and a search branch, and the twin network adopts a dense convolutional network;
a global attention module is added after the template branch and the new template branch,
and a shared spatiotemporal attention module is added after the template branch and the new template branch;
the dense convolutional network comprises a convolutional layer, dense block 1, a transition layer, dense block 2, a transition layer, dense block 3 and dense block 4; the template branch and the new template branch are identical in structure.
Further, preprocessing the original template image, the new template image and the search image includes:
giving an initial target frame in the original template image, the new template image and the search image to obtain the initial target center and target scale, and calculating the sizes of the adjusted original template image, new template image and search image from the initial target frame, initial target center and target scale.
Further, obtaining the original template weight map and the new template weight map includes:
calculating the similarity between the new template feature map and the original template feature map using a cosine similarity measure, and performing a softmax operation at each position of the new template feature map and the original template feature map to obtain the original template weight map and the new template weight map.
In a second aspect, the invention provides a dual-template dense twin network tracking method, which comprises the following steps:
acquiring and preprocessing an original template image, a new template image and a search image;
inputting the preprocessed original template image, new template image and search image into a pre-constructed and trained twin network model to obtain a first original template feature map, a first new template feature map and a search image feature map;
inputting the first original template feature map and the first new template feature map into a pre-added global attention module to obtain a second original template feature map and a second new template feature map;
inputting the second original template feature map and the second new template feature map into a pre-added spatiotemporal attention module to obtain an original template weight map and a new template weight map, respectively;
performing weighted fusion of the original template weight map with the second original template feature map, and of the new template weight map with the second new template feature map, to obtain a fused template feature map;
and performing a cross-correlation operation between the fused template feature map and the search image feature map to obtain a response map, and determining the target position from the maximum of the response map.
Further, the twin network model includes a template branch, a new template branch and a search branch, and the twin network adopts a dense convolutional network;
a global attention module is added after the template branch and the new template branch,
and a shared spatiotemporal attention module is added after the template branch and the new template branch;
the dense convolutional network comprises a convolutional layer, dense block 1, a transition layer, dense block 2, a transition layer, dense block 3 and dense block 4; the template branch and the new template branch are identical in structure.
In a third aspect, the invention provides a dual-template dense twin network tracking device, comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any of the above.
In a fourth aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods described above.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention designs a new template that complements the original template in features, realizing dynamic updating of the target template. After selection, the original template and the new template are input into the twin network to obtain their respective feature maps; the two feature maps are then passed through the spatiotemporal attention module to obtain their respective weight maps, and the weight maps are weighted and fused with the corresponding feature maps to obtain a fused template feature map.
2. The method is based on the twin network framework and concatenates features between layers along the channel dimension to realize feature reuse, thereby improving the generalization capability of the features.
3. A global attention module is added after the twin network template branches to aggregate global context information of the target, so that the network outputs feature maps with richer semantic information, further enhancing the robustness of the network to target appearance changes.
Drawings
FIG. 1 is a flowchart of a method for tracking a dual-template dense twin network according to an embodiment of the present invention;
FIG. 2 is a flow chart of an algorithm provided by an embodiment of the present invention;
FIG. 3 is a diagram of a network architecture provided by an embodiment of the present invention;
FIG. 4 is a diagram of a global attention module provided by an embodiment of the present invention;
FIG. 5 is a block diagram of a spatiotemporal attention module provided by an embodiment of the present invention;
FIG. 6 is a precision plot provided by an embodiment of the present invention;
FIG. 7 is a success rate plot provided by an embodiment of the present invention;
FIG. 8 is a precision plot under background clutter provided by an embodiment of the present invention;
FIG. 9 is a success rate plot under background clutter provided by an embodiment of the present invention;
FIG. 10 is a diagram of partial tracking results provided by an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only intended to illustrate the technical solutions of the present invention more clearly and do not thereby limit the protection scope of the present invention.
Example 1
As shown in FIG. 1, the dual-template dense twin network tracking method provided in this embodiment includes:
acquiring and preprocessing an original template image, a new template image and a search image;
inputting the preprocessed original template image, new template image and search image into a pre-constructed and trained twin network model to obtain an original template feature map, a new template feature map and a search image feature map;
passing the original template feature map and the new template feature map through a pre-added spatiotemporal attention module to obtain an original template weight map and a new template weight map, respectively;
performing weighted fusion of the original template weight map with the original template feature map, and of the new template weight map with the new template feature map, to obtain a fused template feature map;
and performing a cross-correlation operation between the fused template feature map and the search image feature map to obtain a response map, and determining the target position from the maximum of the response map.
Specifically, the twin network model includes a template branch, a new template branch and a search branch, and the twin network adopts a dense convolutional network;
a global attention module is added after the template branch and the new template branch,
and a shared spatiotemporal attention module is added after the template branch and the new template branch;
the dense convolutional network comprises a convolutional layer, dense block 1, a transition layer, dense block 2, a transition layer, dense block 3 and dense block 4; the template branch and the new template branch are identical in structure.
Specifically, preprocessing the original template image, the new template image and the search image includes:
giving an initial target frame in the original template image, the new template image and the search image to obtain the initial target center and target scale, and calculating the sizes of the adjusted original template image, new template image and search image from the initial target frame, initial target center and target scale.
Specifically, obtaining the original template weight map and the new template weight map includes:
calculating the similarity between the new template feature map and the original template feature map using a cosine similarity measure, and performing a softmax operation at each position of the new template feature map and the original template feature map to obtain the original template weight map and the new template weight map.
In this embodiment, a new template is designed to complement the original template in features, realizing dynamic updating of the target template. After selection, the original template and the new template are input into the twin network to obtain their respective feature maps; the two feature maps are then passed through the spatiotemporal attention module to obtain their respective weight maps, and the weight maps are weighted and fused with the corresponding feature maps to obtain a fused template feature map. This algorithm, which updates the target template in real time, maintains a good tracking effect even when the target is deformed.
Example 2
In order to make the objects, implementation and advantages of the present invention clearer, the sequence Singer1 from the public test set OTB Benchmark is taken as an example below, and the specific implementation of the present invention is further described in detail with reference to the accompanying drawings, as follows:
the embodiment provides a double-template dense twin network tracking method. The network of the method has three inputs, an original template image, here a first frame image, a new template image and a search image. And obtaining a feature map with global context information by the template image and the new template image through the same dense network and the global attention module. And then the two feature maps are subjected to weight calibration through a space-time attention module to obtain respective weights, and then the weights are summed with the corresponding feature maps to obtain a fused template feature map. And extracting the self characteristic diagram of the search image of the search branch through a same dense network, then performing cross correlation on the fused template characteristic diagram and the characteristic diagram of the search image to obtain a final response diagram, and finally determining the final position of the tracking target according to the response diagram.
The method comprises the following steps:
step 1, adjusting and training a twin network structure:
structure adjustment: the AlexNet network in the original twin network is replaced by a dense convolutional network, the dense convolutional network comprises a convolutional layer, dense blocks 1, a transition layer, dense blocks 2, a transition layer, dense blocks 3 and dense blocks 4, a new template branch completely consistent with a template branch structure is added, a global attention module is added after the template branch and the new template branch, as shown in figure 4, a common space-time attention module is added after the template branch and the new template branch, as shown in figure 5, the improved network structure is as shown in figure 3, the improved network model is trained by an ImageNet data set, and improved network parameters are obtained.
Training process: a logistic regression objective is optimized by stochastic gradient descent. The network is trained for 50 epochs, each containing 50000 sample pairs, with the batch size set to 8. The parameters of the neural network are initialized from a Gaussian distribution, stochastic gradient descent with momentum 0.9 is used, and the learning rate is exponentially decayed from 10⁻³ to 10⁻⁸.
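A sketch of this training setup in PyTorch follows, using only the hyper-parameters stated above; `model` and `train_loader` are assumed placeholders, and the exact form of the logistic loss is an assumption.

```python
import torch

def logistic_loss(response, labels):
    """Mean logistic loss log(1 + exp(-y * v)); labels are +1/-1 maps (assumed)."""
    return torch.log1p(torch.exp(-labels * response)).mean()

def train(model, train_loader, epochs=50, lr_start=1e-3, lr_end=1e-8):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr_start, momentum=0.9)
    # Exponential decay taking the learning rate from 1e-3 to 1e-8 in 50 epochs.
    gamma = (lr_end / lr_start) ** (1.0 / epochs)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)
    for epoch in range(epochs):                   # each epoch: 50000 sample pairs
        for z, z_new, x, labels in train_loader:  # batch size 8
            response = model(z, z_new, x)
            loss = logistic_loss(response, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
```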
Step 2: in the first frame template image of Singer1, the initial target frame is given as (48, 98, 40, 142), where the initial target center is pos = (48, 98) and the target scale is target = 40 × 142. The template image, the new template image and the search image input to the network are read, and, given an initial target frame (μ, v, w, h), the target position is pos = (μ, v) and the target scale is target = (w, h). A standard template image can then be generated by the following formula:

$$s(w + 2p) \times s(h + 2p) = A$$

where A = 127², s is a scale factor and p denotes the context margin around the target. The crop is expanded and resized to generate a 127 × 127 template image; a new template image of the same size and a search image of size 255 × 255 are generated in the same way.
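The cropping rule can be sketched in Python as below. Only the constraint A = 127² is stated in the text; the context margin p = (w + h)/4 and the mean-value border padding are assumptions borrowed from the usual SiamFC preprocessing.

```python
import numpy as np
import cv2

def make_template(image, pos, target, out_size=127):
    """Crop a context-padded square around the target and resize it.

    Implements s*(w + 2p) * s*(h + 2p) = out_size**2 with p = (w + h) / 4
    (assumed SiamFC-style rule). pos = (cx, cy), target = (w, h).
    """
    w, h = target
    p = (w + h) / 4.0
    side = int(round(np.sqrt((w + 2 * p) * (h + 2 * p))))  # square crop side
    cx, cy = pos
    half = side // 2
    # Pad with the channel-wise mean so crops near the border stay square.
    mean = image.mean(axis=(0, 1))
    padded = cv2.copyMakeBorder(image, side, side, side, side,
                                cv2.BORDER_CONSTANT, value=mean.tolist())
    x0, y0 = int(cx - half + side), int(cy - half + side)
    patch = padded[y0:y0 + side, x0:x0 + side]
    return cv2.resize(patch, (out_size, out_size))

# Singer1, first frame: initial box (48, 98, 40, 142)
# template = make_template(frame0, pos=(48, 98), target=(40, 142))
```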
Step 3: the preprocessed template image, new template image and search image are input into the network, and their respective feature maps are extracted by the dense convolutional network.
Step 4: feature maps with global context information are extracted from the template feature map and the new template feature map by the global attention module. First, context modeling is performed on the feature map: the features of all positions are aggregated to form a global context feature, mainly using a 1 × 1 convolution and a softmax function. A feature transformation is then applied to capture the interdependency among channels, mainly using a 1 × 1 convolution, layer normalization and a ReLU function. Finally, the global context feature is merged into the features of all positions. The process can be expressed by the following formula:

$$z_i = x_i + \delta\left(\sum_{j} \frac{e^{W_k x_j}}{\sum_{m} e^{W_k x_m}}\, x_j\right)$$

where $z_i$ is the output feature map at position i, $x_i$ is the input feature map at position i, i is the position index, $e^{W_k x_j} / \sum_m e^{W_k x_m}$ is the weight of the global attention pooling, $\delta(\cdot) = W_{v2}\,\mathrm{ReLU}(\mathrm{LN}(W_{v1}(\cdot)))$ denotes the feature transformation, $W_{v2}$, $W_{v1}$ and $W_k$ are all 1 × 1 convolutions, m and j enumerate all positions, and $\mathrm{LN}(\cdot)$ denotes layer normalization.
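The module can be sketched in PyTorch as a GCNet-style block, as below; the bottleneck ratio is an assumption not fixed by the text.

```python
import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    """GCNet-style global attention module (sketch).

    The text fixes the 1x1 convolutions W_k, W_v1, W_v2, layer
    normalization and ReLU; the bottleneck ratio is assumed.
    """
    def __init__(self, channels, ratio=4):
        super().__init__()
        self.w_k = nn.Conv2d(channels, 1, kernel_size=1)            # attention logits
        self.transform = nn.Sequential(                              # delta(.)
            nn.Conv2d(channels, channels // ratio, kernel_size=1),   # W_v1
            nn.LayerNorm([channels // ratio, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // ratio, channels, kernel_size=1),   # W_v2
        )

    def forward(self, x):
        n, c, h, w = x.shape
        # alpha_j = softmax over all spatial positions j of W_k x_j
        alpha = self.w_k(x).view(n, 1, h * w).softmax(dim=-1)
        # Global context: attention-weighted sum of features over positions.
        ctx = torch.bmm(x.view(n, c, h * w), alpha.transpose(1, 2))
        ctx = ctx.view(n, c, 1, 1)
        # z_i = x_i + delta(context), broadcast to every position i.
        return x + self.transform(ctx)
```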
Step 5: the two template feature maps with global context information are fused by the spatiotemporal attention mechanism to obtain the fused template features. Spatial attention is first used to derive the weight magnitudes at different spatial positions. Specifically, a bottleneck subnetwork first maps the feature map $f$ from one space to another, generating the embedded features $\tilde{f}$. Then, the feature vectors of the original template and the new template at a spatial position p are taken out, the cosine similarity between each of them and the feature vector of the original template is computed, and a softmax operation is performed, giving the feature weight of each template at position p. The same operation is performed for every position p of the feature map, producing a spatial weight map.

The similarity $w_i(p)$ between the new template features and the original template features is calculated using the cosine similarity measure:

$$w_i(p) = \frac{\tilde{f}_i(p) \cdot \tilde{f}_z(p)}{\left\|\tilde{f}_i(p)\right\| \left\|\tilde{f}_z(p)\right\|}$$

where p is the spatial position on the feature map, z denotes the original template, i denotes the new template or the original template, $\tilde{f}_i(p)$ is the embedded feature of the new or original template at position p, and $\tilde{f}_z(p)$ is the embedded feature of the original template at position p.
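A sketch of this spatial weighting follows; `embed` stands for the assumed bottleneck subnetwork, and the per-position weights are later broadcast over channels.

```python
import torch
import torch.nn.functional as F

def spatial_weight_maps(f_z, f_new, embed):
    """Cosine-similarity spatial attention over two templates (sketch).

    f_z, f_new: template feature maps with global context, (1, C, H, W)
    returns:    weight maps w_z, w_new of shape (1, 1, H, W)
    """
    e_z, e_new = embed(f_z), embed(f_new)
    # Cosine similarity with the original template at every position p.
    # For the original template itself this is identically 1 before softmax.
    s_z = F.cosine_similarity(e_z, e_z, dim=1)      # (1, H, W), all ones
    s_new = F.cosine_similarity(e_new, e_z, dim=1)  # w_new(p) before softmax
    # Softmax across the two templates at each position p.
    w = torch.stack([s_z, s_new], dim=0).softmax(dim=0)
    return w[0].unsqueeze(1), w[1].unsqueeze(1)
```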
After the weight maps output by the spatial attention module are obtained, channel attention is added after the spatial attention in order to further enhance network performance.
The spatial attention module outputs two 9 × 9 × 128 weight maps: the original target template weight map and the new template weight map. For each weight map, global average pooling along the channel dimension first yields a 1 × 1 × 128 weight vector. To some extent this vector has a global receptive field: it characterizes the global distribution of responses in the channel dimension while shielding the influence of the spatial distribution of information. The vector then passes sequentially through two fully connected layers of size 1 × 1 × 128, which learn the channel weights from the correlation between channels; the activation function after the first fully connected layer is ReLU and that after the second is Sigmoid. The 1 × 1 × 128 weight vector obtained after the second fully connected layer thus assigns, via the Sigmoid function, a weight to each channel, and each channel of the weight map output by the spatial attention module is multiplied by the weight of the corresponding channel to obtain the final weight map. Finally, the template features extracted by the feature extraction network are multiplied by the final weight maps and summed to obtain the final fused template features.
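The channel-attention step matches a squeeze-and-excitation layout and can be sketched as below; the two 128-wide fully connected layers without a reduction bottleneck follow the description, while everything else is illustrative.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention over a spatial weight map (sketch):
    GAP -> FC -> ReLU -> FC -> Sigmoid, then per-channel rescaling."""
    def __init__(self, channels=128):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels),  # FC1: 1x1x128 -> 1x1x128
            nn.ReLU(inplace=True),
            nn.Linear(channels, channels),  # FC2: 1x1x128 -> 1x1x128
            nn.Sigmoid(),
        )

    def forward(self, w_spatial):
        # w_spatial: spatial-attention weight map, e.g. (n, 128, 9, 9)
        n, c, _, _ = w_spatial.shape
        v = w_spatial.mean(dim=(2, 3))       # global average pooling -> (n, 128)
        scale = self.fc(v).view(n, c, 1, 1)  # one weight per channel
        return w_spatial * scale             # final weight map
```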
Step 6: the fused template features are cross-correlated with the previously obtained search image features to obtain the final response map, from which the target position is determined.
Step 7: the average peak-to-correlation energy (APCE) of the response map is calculated. If it is larger than the average APCE of the previous T frames, the new template image is updated; otherwise it is not. The APCE values of the response maps from frame t-30 to frame t are calculated, and the frame with the largest APCE among these 30 frames is selected as the new target template.
The average peak-to-correlation energy, which reflects the degree of oscillation of the score map, can be expressed by the following formula:

$$\mathrm{APCE} = \frac{\left|s_{\max} - s_{\min}\right|^{2}}{\frac{1}{|M|}\sum_{v \in M}\left(s[v] - s_{\min}\right)^{2}}$$

where $s_{\max}$ and $s_{\min}$ are the maximum and minimum values in the score map s, $s[v]$ is each predicted value, and |M| is the number of elements in the score map.
A frame is selected from the previous prediction results as the new template according to the average peak-to-correlation energy:

$$n^{*} = \arg\max_{t-30 \le n \le t} \mathrm{APCE}_{n}$$

where $\mathrm{APCE}_{n}$ is the average peak-to-correlation energy of the score map of the n-th frame.
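The update criterion can be sketched as follows; the bookkeeping of past frames and score maps is an illustrative assumption.

```python
import numpy as np

def apce(score_map):
    """Average peak-to-correlation energy of a response (score) map."""
    s_max, s_min = score_map.max(), score_map.min()
    return (s_max - s_min) ** 2 / np.mean((score_map - s_min) ** 2)

def pick_new_template(frames, score_maps):
    """Select, from the last 30 frames, the frame whose response map has
    the largest APCE; its tracked patch becomes the new target template."""
    values = [apce(s) for s in score_maps[-30:]]
    best = int(np.argmax(values))
    return frames[-30:][best]
```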
Step 8: judge whether the current frame is the last frame. If so, the procedure ends; otherwise, return to step 2, read a new frame of the search image, regenerate the standard search image and continue target tracking. The algorithm flow chart is shown in FIG. 2.
To verify the effect of the method of the invention, the following verification experiments were performed:
the invention adopts 100 recognized and marked video sequences on an OTB (Online Tracking benchmark) platform to test, and the video sequences simulate various conditions in a real scene, including illumination change, scale transformation, partial or serious shielding, deformation and the like. Table 1 shows the hardware and software simulation environment for the experiments of the present invention.
Table 1. Hardware and software simulation environment for the experiments

CPU: Intel Xeon W-2133
GPU: Nvidia GeForce RTX 2080 Ti
Memory: 32 GB
Operating system: Windows 10
Development environment: PyTorch
Programming language: Python 3.6
On the OTB test platform there are two main evaluation criteria: precision (precision plot) and success rate (success plot).

In the target tracking process, precision reflects whether the tracker can accurately follow the target in subsequent frames. The target center obtained by the algorithm is called the predicted value, and the manually annotated target position is called the ground truth; precision is computed from the average Euclidean distance between them. The smaller this distance is relative to a given threshold, the closer the prediction is to the ground truth and the more accurate the tracking result. The precision curve represents the proportion of frames, out of the total number of frames, in which the distance between the predicted value and the ground truth is within the given threshold, and the precision values at different thresholds form the final precision plot.
The success rate is measured by the overlap accuracy between the candidate target box obtained by tracking and the manually annotated real region. If B denotes the predicted target bounding box and B* the manually annotated ground-truth bounding box, the overlap score is:

$$S = \frac{\left|B \cap B^{*}\right|}{\left|B \cup B^{*}\right|}$$

where $\left|B \cap B^{*}\right|$ is the area of the intersection of regions B and B* and $\left|B \cup B^{*}\right|$ is the area of their union. Whether the algorithm tracks the target object in a frame can be judged by testing whether the overlap score exceeds a given threshold; as the threshold varies from 0 to 1, the success rate changes, yielding the success rate plot.
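Both OTB criteria can be sketched as below; `preds` and `gts` are assumed lists of (x, y, w, h) boxes, the 20-pixel precision threshold follows the text, and the success plot sweeps the overlap threshold from 0 to 1.

```python
import numpy as np

def precision(pred_centers, gt_centers, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels."""
    d = np.linalg.norm(np.asarray(pred_centers) - np.asarray(gt_centers), axis=1)
    return float(np.mean(d <= threshold))

def overlap_score(b, b_star):
    """IoU overlap S = |B n B*| / |B u B*| for (x, y, w, h) boxes."""
    x1, y1 = max(b[0], b_star[0]), max(b[1], b_star[1])
    x2 = min(b[0] + b[2], b_star[0] + b_star[2])
    y2 = min(b[1] + b[3], b_star[1] + b_star[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = b[2] * b[3] + b_star[2] * b_star[3] - inter
    return inter / union

def success_rate(preds, gts, threshold=0.5):
    """Fraction of frames whose overlap score exceeds one chosen threshold."""
    scores = [overlap_score(b, bs) for b, bs in zip(preds, gts)]
    return float(np.mean([s > threshold for s in scores]))
```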
As shown in FIG. 6, the precision of the invention reaches 84.8% at a threshold of 20 pixels. Compared with the SRDCF, Staple, CFNet and SiamFC algorithms, the precision of the invention is improved by 5.9%, 6.5%, 7.0% and 8.3%, respectively. As can be seen in FIG. 7, the success rate of the invention reaches 63.2%, an improvement of 3.4%, 4.5%, 5.0% and 5.4% over the SRDCF, CFNet, SiamFC and Staple algorithms, respectively. The remaining two plots show the precision and success rate calculated on the sequences with the background clutter challenge. As can be seen from FIG. 8, under background clutter the precision of the invention is improved by 2.3%, 4.1%, 6.1% and 10.0% over the Staple, SRDCF, CFNet and SiamFC algorithms, respectively. As can be seen from FIG. 9, under background clutter the success rate of the invention is improved by 0.8%, 1.5%, 3.3% and 5.7% over the Staple, SRDCF, CFNet and SiamFC algorithms, respectively. These data show that the invention achieves excellent results.
FIG. 10 shows partial tracking results of the invention; the three selected sequences involve challenges such as background clutter, target deformation and scale variation. For the Bird1 sequence, all algorithms can still track the target at frame 10, but from frame 208 onward, owing to the change in target appearance, the comparison algorithms other than the proposed one gradually lose the target or track a wrong one, showing that the tracking effect of the invention is the best. For the Jump sequence, CFNet has already lost the target by frame 10, SRDCF loses it at frame 29, Staple loses it at frame 62, and by frame 116 every tracker except the invention has lost the target. For the Skiing sequence, the target appearance changes more markedly: SiamFC and Staple have lost the target by frame 25, CFNet loses it at frame 42, and up to frame 60 only SRDCF and the invention can still track the target accurately.
The invention provides a dual-template dense twin network tracking algorithm based on global context. The AlexNet network is replaced by a dense convolutional network with stronger feature extraction capability, improving the generalization of the features. To further improve the expressiveness of the target appearance features, a global attention module is added after the twin network template branches; it aggregates global context information of the target and thereby improves the robustness of the deep features to target appearance changes. A new template is designed to complement the original template, realizing dynamic template updating, and this real-time template-updating algorithm maintains a good tracking effect even when the target is deformed.
Example 3
The embodiment provides a dual-template dense twin network tracking method, which comprises the following steps:
acquiring and preprocessing an original template image, a new template image and a search image;
inputting the preprocessed original template image, new template image and search image into a pre-constructed and trained twin network model to obtain a first original template feature map, a first new template feature map and a search image feature map;
inputting the first original template feature map and the first new template feature map into a pre-added global attention module to obtain a second original template feature map and a second new template feature map;
inputting the second original template feature map and the second new template feature map into a pre-added spatiotemporal attention module to obtain an original template weight map and a new template weight map, respectively;
performing weighted fusion of the original template weight map with the second original template feature map, and of the new template weight map with the second new template feature map, to obtain a fused template feature map;
and performing a cross-correlation operation between the fused template feature map and the search image feature map to obtain a response map, and determining the target position from the maximum of the response map.
Further, the twin network model includes a template branch, a new template branch and a search branch, and the twin network adopts a dense convolutional network;
a global attention module is added after the template branch and the new template branch,
and a shared spatiotemporal attention module is added after the template branch and the new template branch;
the dense convolutional network comprises a convolutional layer, dense block 1, a transition layer, dense block 2, a transition layer, dense block 3 and dense block 4; the template branch and the new template branch are identical in structure.
Example 4
The invention provides a dual-template dense twin network tracking device, comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of:
acquiring and preprocessing an original template image, a new template image and a search image;
inputting the preprocessed original template image, new template image and search image into a pre-constructed and trained twin network model to obtain an original template feature map, a new template feature map and a search image feature map;
passing the original template feature map and the new template feature map through a pre-added spatiotemporal attention module to obtain an original template weight map and a new template weight map, respectively;
performing weighted fusion of the original template weight map with the original template feature map, and of the new template weight map with the new template feature map, to obtain a fused template feature map;
and performing a cross-correlation operation between the fused template feature map and the search image feature map to obtain a response map, and determining the target position from the maximum of the response map.
Example 5
The invention provides a computer-readable storage medium on which a computer program is stored which, when executed by a processor, carries out the steps of any one of the following methods:
acquiring and preprocessing an original template image, a new template image and a search image;
inputting the preprocessed original template image, new template image and search image into a pre-constructed and trained twin network model to obtain an original template feature map, a new template feature map and a search image feature map;
passing the original template feature map and the new template feature map through a pre-added spatiotemporal attention module to obtain an original template weight map and a new template weight map, respectively;
performing weighted fusion of the original template weight map with the original template feature map, and of the new template weight map with the new template feature map, to obtain a fused template feature map;
and performing a cross-correlation operation between the fused template feature map and the search image feature map to obtain a response map, and determining the target position from the maximum of the response map.
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (8)

1. A dual-template dense twin network tracking method, characterized by comprising the following steps:
acquiring and preprocessing an original template image, a new template image and a search image;
inputting the preprocessed original template image, new template image and search image into a pre-constructed and trained twin network model to obtain an original template feature map, a new template feature map and a search image feature map;
passing the original template feature map and the new template feature map through a pre-added spatiotemporal attention module to obtain an original template weight map and a new template weight map, respectively;
performing weighted fusion of the original template weight map with the original template feature map, and of the new template weight map with the new template feature map, to obtain a fused template feature map;
and performing a cross-correlation operation between the fused template feature map and the search image feature map to obtain a response map, and determining the target position from the maximum of the response map.
2. The dual-template dense twin network tracking method of claim 1, wherein the twin network model comprises a template branch, a new template branch and a search branch, and the twin network adopts a dense convolutional network;
a global attention module is added after the template branch and the new template branch,
and a shared spatiotemporal attention module is added after the template branch and the new template branch;
the dense convolutional network comprises a convolutional layer, dense block 1, a transition layer, dense block 2, a transition layer, dense block 3 and dense block 4; the template branch and the new template branch are identical in structure.
3. The dual-template dense twin network tracking method of claim 1, wherein preprocessing the original template image, the new template image and the search image comprises:
giving an initial target frame in the original template image, the new template image and the search image to obtain the initial target center and target scale, and calculating the sizes of the adjusted original template image, new template image and search image from the initial target frame, initial target center and target scale.
4. The dual-template dense twin network tracking method of claim 1, wherein obtaining the original template weight map and the new template weight map comprises:
calculating the similarity between the new template feature map and the original template feature map using a cosine similarity measure, and performing a softmax operation at each position of the new template feature map and the original template feature map to obtain the original template weight map and the new template weight map.
5. A dual-template dense twin network tracking method, characterized by comprising the following steps:
acquiring and preprocessing an original template image, a new template image and a search image;
inputting the preprocessed original template image, new template image and search image into a pre-constructed and trained twin network model to obtain a first original template feature map, a first new template feature map and a search image feature map;
inputting the first original template feature map and the first new template feature map into a pre-added global attention module to obtain a second original template feature map and a second new template feature map;
inputting the second original template feature map and the second new template feature map into a pre-added spatiotemporal attention module to obtain an original template weight map and a new template weight map, respectively;
performing weighted fusion of the original template weight map with the second original template feature map, and of the new template weight map with the second new template feature map, to obtain a fused template feature map;
and performing a cross-correlation operation between the fused template feature map and the search image feature map to obtain a response map, and determining the target position from the maximum of the response map.
6. The dual-template dense twin network tracking method of claim 5, wherein the twin network model comprises a template branch, a new template branch and a search branch, and the twin network adopts a dense convolutional network;
a global attention module is added after the template branch and the new template branch,
and a shared spatiotemporal attention module is added after the template branch and the new template branch;
the dense convolutional network comprises a convolutional layer, dense block 1, a transition layer, dense block 2, a transition layer, dense block 3 and dense block 4; the template branch and the new template branch are identical in structure.
7. A dual-template dense twin network tracking device, characterized by comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of claims 1 to 6.
8. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202110811344.0A 2021-07-19 2021-07-19 Dual-template dense twin network tracking method, device and storage medium Active CN113658218B (en)

Priority Applications (1)

Application Number: CN202110811344.0A; Priority Date: 2021-07-19; Filing Date: 2021-07-19
Title: Dual-template dense twin network tracking method, device and storage medium

Publications (2)

Publication Number Publication Date
CN113658218A 2021-11-16
CN113658218B (en) 2023-10-13

Family

ID=78477667

Family Applications (1)

Application Number: CN202110811344.0A (Active, granted as CN113658218B); Priority Date: 2021-07-19; Filing Date: 2021-07-19

Country Status (1)

Country Link
CN (1) CN113658218B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110335290A (en) * 2019-06-04 2019-10-15 大连理工大学 Twin candidate region based on attention mechanism generates network target tracking method
CN111179314A (en) * 2019-12-30 2020-05-19 北京工业大学 Target tracking method based on residual dense twin network
CN111291679A (en) * 2020-02-06 2020-06-16 厦门大学 Target specific response attention target tracking method based on twin network
CN111354017A (en) * 2020-03-04 2020-06-30 江南大学 Target tracking method based on twin neural network and parallel attention module
CN111462175A (en) * 2020-03-11 2020-07-28 华南理工大学 Space-time convolution twin matching network target tracking method, device, medium and equipment
US20200327679A1 (en) * 2019-04-12 2020-10-15 Beijing Moviebook Science and Technology Co., Ltd. Visual target tracking method and apparatus based on deeply and densely connected neural network
CN112164094A (en) * 2020-09-22 2021-01-01 江南大学 Fast video target tracking method based on twin network
CN112258554A (en) * 2020-10-07 2021-01-22 大连理工大学 Double-current hierarchical twin network target tracking method based on attention mechanism
CN112348849A (en) * 2020-10-27 2021-02-09 南京邮电大学 Twin network video target tracking method and device
CN112560695A (en) * 2020-12-17 2021-03-26 中国海洋大学 Underwater target tracking method, system, storage medium, equipment, terminal and application
CN112785624A (en) * 2021-01-18 2021-05-11 苏州科技大学 RGB-D characteristic target tracking method based on twin network
CN112837344A (en) * 2019-12-18 2021-05-25 沈阳理工大学 Target tracking method for generating twin network based on conditional confrontation

Also Published As

Publication number Publication date
CN113658218B (en) 2023-10-13


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant