CN112785626A - Twin network small target tracking method based on multi-scale feature fusion - Google Patents

Twin network small target tracking method based on multi-scale feature fusion

Info

Publication number
CN112785626A
Authority
CN
China
Prior art keywords
layer, size, convolution, image, feature
Prior art date
Legal status
Pending
Application number
CN202110111717.3A
Other languages
Chinese (zh)
Inventor
涂铮铮
朱庆文
李成龙
汤进
罗斌
Current Assignee
Anhui University
Original Assignee
Anhui University
Priority date
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202110111717.3A
Publication of CN112785626A
Legal status: Pending


Classifications

    • G06T 7/246: Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Pattern recognition; matching criteria, e.g. proximity measures
    • G06F 18/253: Pattern recognition; fusion techniques of extracted features
    • G06N 3/04: Neural networks; architecture, e.g. interconnection topology
    • G06N 3/08: Neural networks; learning methods
    • G06T 7/73: Image analysis; determining position or orientation of objects or cameras using feature-based methods
    • G06T 2207/10016: Image acquisition modality; video; image sequence
    • G06T 2207/20081: Special algorithmic details; training; learning
    • G06T 2207/20084: Special algorithmic details; artificial neural networks [ANN]
    • G06T 2207/20132: Image segmentation details; image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a twin network small target tracking method based on multi-scale feature fusion. A multi-scale feature fusion module and an optimized twin neural network jointly exploit the advantage that the low layers of a deep neural network structure localize the target accurately and the advantage that the high layers capture the semantic information of the target. By effectively fusing different layers, the bottom-layer information is fully used and the problem that the convolution operations of a deep network discard the information of a small target is avoided, so the small-target challenge in the tracking process is solved and a good tracking effect is achieved.

Description

Twin network small target tracking method based on multi-scale feature fusion
Technical Field
The invention relates to a visual identification technology, in particular to a twin network small target tracking method based on multi-scale feature fusion.
Background
Moving object tracking means that, given the position of an object of interest in the first frame of a video sequence, a tracker continuously and accurately tracks that object in real time over the subsequent frames and returns its position. In recent years, theoretical methods for target tracking have developed rapidly; it is an important research direction in the field of computer vision and has been successfully applied to many fields such as video surveillance, unmanned driving and semantic segmentation. The emergence of deep learning methods has greatly promoted progress on the tracking problem, but small target tracking remains a very large challenge; in particular, how to accurately track small targets in real time against a complex background is a key research problem.
At present, the challenges of small target tracking mainly come from two aspects. On one hand, as the depth of the neural network increases, the features of a small target object become very difficult to acquire, so the target is hard to represent. On the other hand, during tracking, small objects tend to drift suddenly and substantially compared with normal-sized objects because of lens jitter. Current research focuses only on the tracking results for normal-sized target objects on generic data sets and ignores the small target tracking problem.
Existing small target tracking algorithms are based on traditional machine learning and are greatly limited either in accuracy or in real-time tracking performance. A deep neural network can extract high-level semantic information thanks to its larger number of layers and thus express features better, but for small target objects the position information is gradually lost by the repeated convolution operations as the network deepens.
Therefore, by using the deep neural network structure of a twin network and fusing complementary feature information of different network layers from the perspective of multi-scale feature fusion, real-time and robust tracking of small target objects in complex scenes and environments can be achieved. However, applying existing twin networks raises the following problems: how to effectively fuse the multi-scale features of different network layers, the fuzzy target position and scarce semantic information of existing deep neural networks, and ultimately the difficulty of obtaining small target features.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to overcome the defects of the prior art and provides a twin network small target tracking method based on multi-scale feature fusion.
The technical scheme is as follows: the invention discloses a twin network small target tracking method based on multi-scale feature fusion, which comprises the following steps:
step (1), performing size modification and data augmentation preprocessing on a template image x and an image y to be searched in turn, to obtain a cropped training sample pair of fixed corresponding sizes, and feeding them respectively into the template branch and the search branch of a twin network structure;
step (2), the template branch and the search branch share a feature extractor, namely a multi-scale feature fusion module, which is used to obtain multi-scale fused feature vectors and comprises two stages: bottom-up feature extraction and top-down lateral feature fusion;
during bottom-up feature extraction, an optimized twin network structure is constructed; it comprises 5 convolutional layers whose outputs are denoted in order as { C1, C2, C3, C4, C5 };
during top-down lateral fusion, the higher-layer features are first up-sampled to a larger size and fused with the lower-layer features; iterating this step generates the multi-scale fused feature maps of the template branch and the search branch respectively;
step (3), inputting the template feature map and the search feature map obtained in step (2) into a similarity function and performing a cross-correlation operation to obtain a response map; the position with the highest value in the response map is taken as the position where the target objects of the two images are most similar, i.e. the target position in the image to be searched (the frame to be tracked);
and (4) expanding the response map to the size of the original image y to be searched (for example, 255 × 255), analyzing the response map to obtain the final tracking result, and multiplying the position with the maximum score by the total stride of the five convolutional layers of the optimized twin network structure to obtain the position information of the current target in the image to be searched.
Further, the specific method for modifying the size of the template image x in the step (1) is as follows:
the target frame of the first frame is known in the target tracking process and is denoted (x_min, y_min, w, h); the size of the template image x is then calculated from this first-frame target frame, i.e. a square region centered on the target to be tracked is cropped out, according to the formula:
s(w+2p)×s(h+2p)=A
[formula image: definition of the context margin p]
where (x_min, y_min) is the coordinate of the lower-left corner of the target frame, w and h are the width and height of the frame, s is the scaling factor, and A is set to 127 × 127; the above operation enlarges the target frame, and the crop is then resized to 127 × 127 to obtain the template image x.
The invention calls the first frame of a video the template frame (namely the template image x); every subsequent frame is a frame in which the target position is to be searched (namely an image y to be searched), and each position is represented by the four values of the lower-left corner coordinates together with the width and the height.
The specific method for modifying the size of the image y to be searched is as follows:
firstly, the center of the target frame predicted from the previous frame is taken as the cropping center; then the side length of the square region is determined in the same proportion as the region cropped for the template image x; finally, the crop is resized to 255 × 255.
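As an illustration of the cropping rule above, a minimal Python sketch is given below; it assumes the common SiamFC convention p = (w + h) / 4 for the context margin, since the exact definition of p is only given in the formula image, and the helper name is hypothetical.

import math

def crop_sizes(w, h, A=127 * 127, search_resize=255, template_resize=127):
    # Assumed context margin p = (w + h) / 4 (SiamFC convention; the exact definition
    # of p is left to the formula image in the text).
    p = (w + h) / 4.0
    # Solve s(w + 2p) x s(h + 2p) = A for the scaling factor s.
    s = math.sqrt(A / ((w + 2 * p) * (h + 2 * p)))
    # Side of the square region cropped around the target in the original frame.
    template_side = math.sqrt((w + 2 * p) * (h + 2 * p))
    # The search region keeps the same proportion before being resized to 255 x 255.
    search_side = template_side * search_resize / template_resize
    return s, template_side, search_side

# Example: a 40 x 30 target box; the crops are then resized to 127 x 127 and 255 x 255.
print(crop_sizes(40, 30))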
Further, an optimized twin network structure is constructed in the step (2) to extract features from bottom to top, and the optimized twin network structure is set as follows:
firstly, the first layer is a convolutional layer that convolves the image with 96 kernels of size 11 × 11 and stride 2, followed by a 3 × 3 max-pooling operation and a batch normalization operation, and outputs C1;
secondly, the second layer is a convolutional layer that applies 5 × 5 kernels with 256 channels in total and stride 1, split into two groups (originally assigned to two GPUs); a 3 × 3 max-pooling operation and a batch normalization operation then extract the feature information, and C2 is output;
thirdly, the third layer is a convolutional layer that performs a grouped 3 × 3 convolution with 192 channels per group, followed by a batch normalization operation, and outputs C3;
fourthly, the fourth layer is a convolutional layer that performs a grouped 3 × 3 convolution with 192 channels per group, followed by a batch normalization operation, and outputs C4;
fifthly, the fifth layer is a convolutional layer that uses only a grouped 3 × 3 convolution with 128 channels per group, and finally outputs the 256-dimensional high-level semantic feature C5.
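A possible PyTorch rendering of the five layers described above is sketched below. It assumes two groups in the grouped layers (so 192 channels per group means 384 channels in total, and 128 per group means 256 in total), a stride of 2 for both pooling layers, and ReLU activations, none of which are spelled out in the text; it is a sketch under those assumptions, not the exact network of the invention.

import torch
import torch.nn as nn

class OptimizedSiameseBackbone(nn.Module):
    """Sketch of the five-layer feature extractor; returns C1..C5 for later fusion."""
    def __init__(self):
        super().__init__()
        # Layer 1: 96 kernels of 11 x 11, stride 2, then 3 x 3 max pooling and batch normalization.
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=2),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(96), nn.ReLU(inplace=True))
        # Layer 2: 5 x 5 kernels, 256 channels split into two groups, stride 1, then pooling and batch norm.
        self.layer2 = nn.Sequential(
            nn.Conv2d(96, 256, kernel_size=5, stride=1, groups=2),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True))
        # Layers 3 and 4: grouped 3 x 3 convolutions, 192 channels per group (384 in total), batch norm.
        self.layer3 = nn.Sequential(
            nn.Conv2d(256, 384, kernel_size=3, groups=2),
            nn.BatchNorm2d(384), nn.ReLU(inplace=True))
        self.layer4 = nn.Sequential(
            nn.Conv2d(384, 384, kernel_size=3, groups=2),
            nn.BatchNorm2d(384), nn.ReLU(inplace=True))
        # Layer 5: a single grouped 3 x 3 convolution, 128 channels per group, giving the 256-d feature C5.
        self.layer5 = nn.Conv2d(384, 256, kernel_size=3, groups=2)

    def forward(self, x):
        c1 = self.layer1(x)
        c2 = self.layer2(c1)
        c3 = self.layer3(c2)
        c4 = self.layer4(c3)
        c5 = self.layer5(c4)
        return c1, c2, c3, c4, c5

# Under these assumptions a 127 x 127 template yields a 6 x 6 x 256 C5 and a 255 x 255
# search image yields a 22 x 22 x 256 C5, i.e. a total stride of 8.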
Further, the specific method of top-down lateral feature fusion in step (2) is as follows:
(A) using interpolation, namely 2-times nearest-neighbor up-sampling, new elements are inserted between the pixels of the fifth-layer feature map so that it matches the feature size of the fourth layer; this enlarges the high-layer feature size and prepares it for the next fusion step; the feature maps are then successively enlarged towards the sizes of the fourth, third and second layers;
(B) a 1 × 1 convolution is applied at layer C5 to obtain the low-resolution feature P5; a 1 × 1 convolution kernel then changes the channel number of the fourth-layer feature map C4 generated in the bottom-up stage, fixing it uniformly to 256-d to facilitate the subsequent feature fusion; the processed fourth-layer result and the up-sampled fifth-layer result are then added together, and a 3 × 3 convolution kernel is applied to the fused result to suppress the aliasing that may be produced during up-sampling; the final result is recorded as P4;
and (C) process (B) is iterated to finally generate more accurate feature maps, yielding the multi-scale fused feature maps of the template branch and the branch to be searched respectively.
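The top-down lateral fusion of (A)-(C) can be sketched as a small feature-pyramid module like the one below; the 256-channel lateral convolutions, the addition, the 3 × 3 smoothing and the nearest-neighbor up-sampling follow the text, while the channel counts follow the backbone sketch above and the remaining details (padding, which pyramid level is finally used) are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    """Fuses high-level semantic features with lower-level, higher-resolution features."""
    def __init__(self, channels=(256, 384, 384, 256)):
        super().__init__()
        c2, c3, c4, c5 = channels
        # 1 x 1 lateral convolutions fix every level to 256 channels before fusion.
        self.lat5 = nn.Conv2d(c5, 256, kernel_size=1)
        self.lat4 = nn.Conv2d(c4, 256, kernel_size=1)
        self.lat3 = nn.Conv2d(c3, 256, kernel_size=1)
        self.lat2 = nn.Conv2d(c2, 256, kernel_size=1)
        # 3 x 3 convolutions smooth each fused map to reduce up-sampling aliasing.
        self.smooth4 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
        self.smooth3 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
        self.smooth2 = nn.Conv2d(256, 256, kernel_size=3, padding=1)

    @staticmethod
    def _upsample_add(top, lateral):
        # Nearest-neighbor up-sampling enlarges the higher-level map to the lower level's size.
        top = F.interpolate(top, size=lateral.shape[-2:], mode="nearest")
        return top + lateral

    def forward(self, c2, c3, c4, c5):
        p5 = self.lat5(c5)
        p4 = self.smooth4(self._upsample_add(p5, self.lat4(c4)))  # the P4 of step (B)
        p3 = self.smooth3(self._upsample_add(p4, self.lat3(c3)))  # iteration as in step (C)
        p2 = self.smooth2(self._upsample_add(p3, self.lat2(c2)))
        return p2, p3, p4, p5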
Further, in step (3), the multi-scale fused feature maps corresponding to the template branch and the branch to be searched are cross-correlated to obtain a response map. Specifically, the two fused features have sizes 22 × 22 × 256 and 6 × 6 × 256 respectively; the 6 × 6 × 256 feature is used as a convolution kernel and convolved over the 22 × 22 × 256 feature to obtain a 17 × 17 response map, on which the tracked target position receives a higher score;
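The cross-correlation itself reduces to an ordinary convolution in which the 6 × 6 × 256 template feature acts as the kernel sliding over the 22 × 22 × 256 search feature; a minimal sketch, assuming a batch of one and omitting the bias term, is:

import torch
import torch.nn.functional as F

def cross_correlation(template_feat, search_feat):
    """template_feat: (1, 256, 6, 6); search_feat: (1, 256, 22, 22) -> response map (1, 1, 17, 17)."""
    # The template feature is used directly as a convolution kernel over the search feature.
    kernel = template_feat  # shape (out_channels=1, in_channels=256, 6, 6)
    return F.conv2d(search_feat, kernel)  # 22 - 6 + 1 = 17 in each spatial dimension

z = torch.randn(1, 256, 6, 6)    # fused template feature
y = torch.randn(1, 256, 22, 22)  # fused search feature
print(cross_correlation(z, y).shape)  # torch.Size([1, 1, 17, 17])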
during training, after the 17 × 17 response map is obtained, positive and negative samples are determined: if the distance of a position from the target on the search image is smaller than R, that position is counted as a positive sample, otherwise it is regarded as a negative sample;
finally, a binary cross-entropy logistic loss function is adopted and the whole deep network is trained with stochastic gradient descent, with the number of training iterations set to 50, the mini-batch size set to 8, and the learning rate decaying from 10⁻² to 10⁻⁸;
the similarity function formula is as follows:
Figure BDA0002919146990000041
wherein,
Figure BDA0002919146990000042
for convolution kernel, in
Figure BDA0002919146990000043
Is subjected to convolution, b1The value of each position on the score map is represented.
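A minimal sketch of this training setup is given below; the radius R, the 0/1 label values and the exponential form of the learning-rate decay from 10⁻² to 10⁻⁸ over the 50 iterations are assumptions, and model stands for the whole twin network.

import torch
import torch.nn as nn

def make_labels(size=17, radius_r=2.0):
    # Positions whose distance to the target (the map center) is below R are positive, the rest negative.
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
    dist = torch.sqrt(((xs - size // 2) ** 2 + (ys - size // 2) ** 2).float())
    return (dist < radius_r).float().view(1, 1, size, size)

def build_optimizer(model, epochs=50):
    # Stochastic gradient descent; the learning rate decays from 1e-2 towards 1e-8 over 50 iterations.
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    gamma = (1e-8 / 1e-2) ** (1.0 / epochs)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)
    return optimizer, scheduler

criterion = nn.BCEWithLogitsLoss()  # binary cross-entropy logistic loss on the response map
labels = make_labels()              # a mini-batch of 8 would repeat this along the batch dimension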
Beneficial effects: the invention is provided with a multi-scale feature fusion module that comprehensively considers the advantage of low layers of a deep neural network structure in locating the target accurately and the advantage of high layers in capturing the semantic information of the target; through effective fusion of different layers, the bottom-layer information is fully used and the problem that the convolution operations of a deep network discard the information of a small target is avoided. In addition, the invention optimizes the existing twin network structure, giving a visual target tracking method that can accurately track small target objects.
In conclusion, the invention comprehensively and effectively fuses features of different network layers, addresses the small-target challenge in the tracking process and achieves a good tracking effect.
Drawings
FIG. 1 is a schematic overall flow diagram of the present invention;
FIG. 2 is a block diagram illustrating a multi-scale feature fusion module for a branch to be searched according to an embodiment of the present invention;
FIG. 3 is a comparative illustration of an embodiment of the present invention;
fig. 3(a) is a visualized feature map obtained with the present invention, and fig. 3(b) is a visualized feature map obtained with an existing twin network.
Detailed Description
The technical solution of the present invention is described in detail below, but the scope of the present invention is not limited to the embodiments.
In practical applications of target tracking, the target is often filmed by a camera at medium-to-high altitude, and how to track the target continuously and accurately in such long-distance scenes is a difficult research problem in the tracking field.
The invention is based on an optimized twin network and performs feature fusion through a top-down multi-scale fusion method, which solves the difficulty of tracking small objects in the prior art. As shown in figure 1, the twin network small target tracking method based on multi-scale feature fusion of the invention comprises the following steps:
step (1), performing size modification and data augmentation preprocessing on a template image x and an image y to be searched in turn, to obtain a cropped training sample pair of fixed sizes, and inputting the samples respectively into the template branch and the search branch of a twin network structure;
in the target tracking process, the size of a first frame target frame is set as (x _ min, y _ min, w, h); then, the size of the template image x is calculated according to the first frame target frame, that is, a square area is cut out by taking the target to be tracked as the center, and the calculation formula is as follows:
s(w+2p)×s(h+2p)=A
[formula image: definition of the context margin p]
where s is the scaling factor and A is set to 127 × 127; the above operation enlarges the target frame, and the crop is then resized to 127 × 127 to obtain the template image x;
in training, the specific method for modifying the size of the image y to be searched comprises the following steps:
firstly, the center of the target frame predicted from the previous frame is taken as the cropping center; then the side length of the square region is determined in the same proportion as the region cropped for the template image x; finally, the crop is resized to 255 × 255;
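For completeness, a possible implementation of the step (1) preprocessing pipeline is sketched below with torchvision; the four augmentation modes (random stretch, random crop, normalization, conversion to tensor) are the ones named in claim 3, while RandomStretch is a hypothetical re-implementation and the crop and normalization parameters are assumptions.

import random
from torchvision import transforms
from torchvision.transforms import functional as TF

class RandomStretch:
    """Hypothetical random stretch: rescales the crop by a small random factor."""
    def __init__(self, max_stretch=0.05):
        self.max_stretch = max_stretch

    def __call__(self, img):
        scale = 1.0 + random.uniform(-self.max_stretch, self.max_stretch)
        w, h = img.size
        return TF.resize(img, (round(h * scale), round(w * scale)))

# Search-branch preprocessing: stretch, crop to 255 x 255, convert to tensor, normalize.
search_transforms = transforms.Compose([
    RandomStretch(),
    transforms.RandomCrop(255, pad_if_needed=True),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])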
step (2), the template branch and the search branch share a feature extractor, namely a multi-scale feature fusion module, which is used to obtain multi-scale fused feature vectors and comprises two stages: bottom-up feature extraction and top-down lateral feature fusion;
as shown in fig. 2, an optimized twin network structure is constructed to extract features from bottom to top, and it is set up as follows:
firstly, the first layer is a convolutional layer that convolves the image with 96 kernels of size 11 × 11 and stride 2, followed by a 3 × 3 max-pooling operation and a batch normalization operation, and outputs C1;
secondly, the second layer is a convolutional layer that applies 5 × 5 kernels with 256 channels in total and stride 1, split into two groups (originally assigned to two GPUs); a 3 × 3 max-pooling operation and a batch normalization operation then extract the feature information, and C2 is output;
thirdly, the third layer is a convolutional layer that performs a grouped 3 × 3 convolution with 192 channels per group, followed by a batch normalization operation, and outputs C3;
fourthly, the fourth layer is a convolutional layer that performs a grouped 3 × 3 convolution with 192 channels per group, followed by a batch normalization operation, and outputs C4;
fifthly, the fifth layer is a convolutional layer that uses only a grouped 3 × 3 convolution with 128 channels per group, and finally outputs the 256-dimensional high-level semantic feature C5.
The specific method of top-down lateral feature fusion is as follows:
(A) using interpolation, namely 2-times nearest-neighbor up-sampling, new elements are inserted between the pixels of the fifth-layer feature map so that it matches the feature size of the fourth layer; this enlarges the high-layer feature size and prepares it for the next fusion step; the feature maps are then successively enlarged towards the sizes of the fourth, third and second layers;
(B) a 1 × 1 convolution is applied at layer C5 to obtain the low-resolution feature P5; a 1 × 1 convolution kernel then changes the channel number of the fourth-layer feature map C4 generated in the bottom-up stage, fixing it uniformly to 256-d to facilitate the subsequent feature fusion; the processed fourth-layer result and the up-sampled fifth-layer result are then added together, and a 3 × 3 convolution kernel is applied to the fused result to suppress the aliasing that may be produced during up-sampling; the final result is recorded as P4;
iterating process (B) finally generates more accurate feature maps, yielding the multi-scale fused feature maps of the template branch and the branch to be searched respectively;
and (3) acquiring a response map by a cross-correlation operation on the multi-scale fused feature maps corresponding to the template branch and the branch to be searched. Specifically, the two fused features have sizes 22 × 22 × 256 and 6 × 6 × 256 respectively; the 6 × 6 × 256 feature is used as a convolution kernel and convolved over the 22 × 22 × 256 feature to obtain a 17 × 17 response map, on which the tracked target position receives a higher score;
during training, positive and negative samples need to be determined after the response map is obtained: if the distance of a position from the target on the search image is smaller than R, that position is counted as a positive sample, otherwise it is regarded as a negative sample;
finally, a binary cross-entropy logistic loss function is adopted and the whole deep network is trained with stochastic gradient descent, with the number of training iterations set to 50, the mini-batch size set to 8, and the learning rate decaying from 10⁻² to 10⁻⁸;
the similarity function formula is as follows:
f(x, y) = φ(x) * φ(y) + b·1
where φ(x) is the template feature map used as the convolution kernel, the convolution is performed over the search feature map φ(y), and b·1 denotes a bias that takes the value b at every position of the score map;
and (4) expanding the response map to the size of the original image, analyzing the response map to obtain the final tracking result, and multiplying the position with the maximum score by the total stride of the five convolutional layers of the optimized twin network structure to obtain the position information of the current target in the image to be searched.
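The mapping from the peak of the response map back to image coordinates can be sketched as follows; the total stride of 8 follows from the layer strides assumed in the backbone sketch, and the 16-times bicubic up-sampling of the response map is an assumption used only for finer localization.

import torch
import torch.nn.functional as F

def locate_target(response, total_stride=8, upscale=16):
    """response: (1, 1, 17, 17) score map -> (dx, dy) offset from the search-image center in pixels."""
    # Up-sample the response map towards the size of the original search image.
    response = F.interpolate(response, scale_factor=upscale, mode="bicubic", align_corners=False)
    # Find the position with the maximum score.
    width = response.shape[-1]
    idx = torch.argmax(response.view(-1)).item()
    peak_y, peak_x = divmod(idx, width)
    # Convert the displacement from the map center back to search-image pixels via the total stride.
    center = width // 2
    dx = (peak_x - center) * total_stride / upscale
    dy = (peak_y - center) * total_stride / upscale
    return dx, dy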
As shown in FIG. 3, the target obtained by the method of the present invention is positioned accurately and its features are clearer.
As can be seen from the above embodiment, the present invention treats target tracking as learning a similarity metric. The template image x and the image y to be searched are input into the twin network structure and undergo the same transformation, and a multi-scale feature fusion module is designed to obtain the corresponding feature vectors; finally, the template feature map is used as a convolution kernel to perform a cross-correlation operation over the search features, generating a response map that compares the similarity between the two: positions with higher similarity return a high score, marking the target position, while other positions return a low score.
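Putting the pieces together, a possible forward pass of the whole similarity-learning network, under the same assumptions as the sketches above and with a batch of one, could look like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseSmallTargetTracker(nn.Module):
    """Hypothetical end-to-end sketch: shared backbone, top-down fusion, cross-correlation."""
    def __init__(self, backbone, fusion):
        super().__init__()
        self.backbone = backbone   # e.g. the OptimizedSiameseBackbone sketch above
        self.fusion = fusion       # e.g. the TopDownFusion sketch above
        self.bias = nn.Parameter(torch.zeros(1))  # the b term of the similarity function

    def embed(self, image):
        c1, c2, c3, c4, c5 = self.backbone(image)
        p2, p3, p4, p5 = self.fusion(c2, c3, c4, c5)
        return p5  # which fused level is correlated is an assumption

    def forward(self, template, search):
        z = self.embed(template)     # template branch
        y = self.embed(search)       # search branch
        response = F.conv2d(y, z)    # template feature acts as the convolution kernel
        return response + self.bias  # higher scores mark the most similar positions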

Claims (6)

1. A twin network small target tracking method based on multi-scale feature fusion is characterized in that: the method comprises the following steps:
step (1), performing size modification and data augmentation preprocessing on a template image x and an image y to be searched in turn, to obtain a cropped training sample pair of fixed corresponding sizes, and feeding them respectively into a template branch and a search branch of a twin network structure;
step (2), the template branch and the search branch sharing a feature extractor, namely a multi-scale feature fusion module, which is used to obtain multi-scale fused feature vectors and comprises two stages: bottom-up feature extraction and top-down lateral feature fusion;
during bottom-up feature extraction, an optimized twin network structure is constructed; it comprises 5 convolutional layers whose outputs are denoted in order as { C1, C2, C3, C4, C5 };
during top-down lateral fusion, the higher-layer features are first up-sampled to a larger size and fused with the lower-layer features; iterating this step generates the multi-scale fused feature maps of the template branch and the branch to be searched respectively;
step (3), inputting the template feature map and the search feature map obtained in step (2) into a similarity function and performing a cross-correlation operation to obtain a response map; the position with the highest value in the response map is taken as the most similar position of the target objects of the two images, i.e. the position of the target object in the image y to be searched;
and (4) expanding the response map to the size of the original image y to be searched, analyzing the response map to obtain the final tracking result, and multiplying the position with the maximum score by the total stride of the five convolutional layers of the optimized twin network structure to obtain the position information of the current target in the image to be searched.
2. The twin network small target tracking method based on multi-scale feature fusion as claimed in claim 1, wherein: the specific method for modifying the size of the template image x in the step (1) is as follows:
setting the size of the first frame target frame as (x_min, y_min, w, h); then calculating the size of the template image x from the first frame target frame, i.e. cropping out a square region centered on the target to be tracked, according to the formula:
s(w+2p)×s(h+2p)=A
[formula image: definition of the context margin p]
where s is the scaling factor and A is set to 127 × 127; the above operation enlarges the target frame, and the crop is then resized to 127 × 127 to obtain the template image x;
the specific method for modifying the size of the image y to be searched comprises the following steps:
firstly, the center of a target frame predicted according to the previous frame is taken as a cutting center, and then the side length of a square area cut out according to a template image x is determined according to the proportion; finally, the size is modified to 255 x 255.
3. The twin network small target tracking method based on multi-scale feature fusion as claimed in claim 1, wherein: the data augmentation in step (1) is used to increase the deep-learning training data and employs four augmentation modes: random stretching (RandomStretch), random cropping (RandomCrop), normalization and conversion into a tensor (ToTensor);
finally, the size is modified to the size that needs to be input into the network structure.
4. The twin network small target tracking method based on multi-scale feature fusion as claimed in claim 1, wherein: in the step (2), an optimized twin network structure is constructed to extract features from bottom to top, and the optimized twin network structure is set as follows:
firstly, the first layer is a convolutional layer that convolves the image with 96 kernels of size 11 × 11 and stride 2, followed by a 3 × 3 max-pooling operation and a batch normalization operation, and outputs C1;
secondly, the second layer is a convolutional layer that applies 5 × 5 kernels with 256 channels in total and stride 1, split into two groups (originally assigned to two GPUs); a 3 × 3 max-pooling operation and a batch normalization operation then extract the feature information, and C2 is output;
thirdly, the third layer is a convolutional layer that performs a grouped 3 × 3 convolution with 192 channels per group, followed by a batch normalization operation, and outputs C3;
fourthly, the fourth layer is a convolutional layer that performs a grouped 3 × 3 convolution with 192 channels per group, followed by a batch normalization operation, and outputs C4;
fifthly, the fifth layer is a convolutional layer that uses only a grouped 3 × 3 convolution with 128 channels per group, and finally outputs the 256-dimensional high-level semantic feature C5.
5. The twin network small target tracking method based on multi-scale feature fusion as claimed in claim 1, wherein: the specific method of top-down lateral feature fusion in step (2) comprises the following steps:
(A) using interpolation, namely 2-times up-sampling, new elements are inserted between the pixels of the fifth-layer feature map so that it matches the feature size of the fourth layer; this enlarges the high-layer feature size and prepares it for the next fusion step; the feature maps are then successively enlarged towards the sizes of the fourth, third and second layers;
(B) a 1 × 1 convolution is applied at layer C5 to obtain the low-resolution feature P5; a 1 × 1 convolution kernel then changes the channel number of the fourth-layer feature map C4 generated in the bottom-up stage, fixing it uniformly to 256-d; the processed fourth-layer result and the up-sampled fifth-layer result are then added together, a 3 × 3 convolution kernel is applied to the fused result, and the final result is recorded as P4;
and (C) process (B) is iterated to finally generate the feature maps, yielding the multi-scale fused feature maps of the template branch and the branch to be searched respectively.
6. The twin network small target tracking method based on multi-scale feature fusion as claimed in claim 1, wherein: in step (3), a response map is acquired by a cross-correlation operation on the multi-scale fused feature maps corresponding to the template branch and the branch to be searched;
the specific process of the cross-correlation operation is as follows: using the multi-scale fused features corresponding to the template branch and the branch to be searched, whose sizes are 6 × 6 × 256 and 22 × 22 × 256 respectively, the 6 × 6 × 256 feature is taken as a convolution kernel and convolved over the 22 × 22 × 256 feature to obtain a 17 × 17 response map;
during training, after the 17 × 17 response map is obtained, positive and negative samples are determined: if the distance of a position from the target on the search image is smaller than R, that position is counted as a positive sample, otherwise it is regarded as a negative sample;
finally, the whole deep network is iteratively trained using a binary cross-entropy logistic loss function and stochastic gradient descent;
the similarity function formula is as follows:
f(x, y) = φ(x) * φ(y) + b·1
where φ(x) is the template feature map used as the convolution kernel, the convolution is performed over the search feature map φ(y), and b·1 denotes a bias that takes the value b at every position of the score map.
CN202110111717.3A 2021-01-27 2021-01-27 Twin network small target tracking method based on multi-scale feature fusion Pending CN112785626A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110111717.3A CN112785626A (en) 2021-01-27 2021-01-27 Twin network small target tracking method based on multi-scale feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110111717.3A CN112785626A (en) 2021-01-27 2021-01-27 Twin network small target tracking method based on multi-scale feature fusion

Publications (1)

Publication Number Publication Date
CN112785626A true CN112785626A (en) 2021-05-11

Family

ID=75758302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110111717.3A Pending CN112785626A (en) 2021-01-27 2021-01-27 Twin network small target tracking method based on multi-scale feature fusion

Country Status (1)

Country Link
CN (1) CN112785626A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113223053A (en) * 2021-05-27 2021-08-06 广东技术师范大学 Anchor-free target tracking method based on fusion of twin network and multilayer characteristics
CN113627488A (en) * 2021-07-13 2021-11-09 武汉大学 Twin network online update-based single target tracking method and device
CN113808166A (en) * 2021-09-15 2021-12-17 西安电子科技大学 Single-target tracking method based on clustering difference and depth twin convolutional neural network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109191491A (en) * 2018-08-03 2019-01-11 华中科技大学 The method for tracking target and system of the twin network of full convolution based on multilayer feature fusion
CN111179307A (en) * 2019-12-16 2020-05-19 浙江工业大学 Visual target tracking method for full-volume integral and regression twin network structure
CN111291679A (en) * 2020-02-06 2020-06-16 厦门大学 Target specific response attention target tracking method based on twin network
CN111489361A (en) * 2020-03-30 2020-08-04 中南大学 Real-time visual target tracking method based on deep feature aggregation of twin network
CN111681259A (en) * 2020-05-17 2020-09-18 天津理工大学 Vehicle tracking model establishing method based on Anchor-free mechanism detection network
CN111898504A (en) * 2020-07-20 2020-11-06 南京邮电大学 Target tracking method and system based on twin circulating neural network
CN112184752A (en) * 2020-09-08 2021-01-05 北京工业大学 Video target tracking method based on pyramid convolution

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109191491A (en) * 2018-08-03 2019-01-11 华中科技大学 The method for tracking target and system of the twin network of full convolution based on multilayer feature fusion
CN111179307A (en) * 2019-12-16 2020-05-19 浙江工业大学 Visual target tracking method for full-volume integral and regression twin network structure
CN111291679A (en) * 2020-02-06 2020-06-16 厦门大学 Target specific response attention target tracking method based on twin network
CN111489361A (en) * 2020-03-30 2020-08-04 中南大学 Real-time visual target tracking method based on deep feature aggregation of twin network
CN111681259A (en) * 2020-05-17 2020-09-18 天津理工大学 Vehicle tracking model establishing method based on Anchor-free mechanism detection network
CN111898504A (en) * 2020-07-20 2020-11-06 南京邮电大学 Target tracking method and system based on twin circulating neural network
CN112184752A (en) * 2020-09-08 2021-01-05 北京工业大学 Video target tracking method based on pyramid convolution

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
崔洲涓 et al., "Lightweight Siamese attention network target tracking for UAVs" (面向无人机的轻量级Siamese注意力网络目标跟踪), Acta Optica Sinica (光学学报) *
杨哲 et al., "Target tracking algorithm based on a Siamese network fusing multiple templates" (基于孪生网络融合多模板的目标跟踪算法), Computer Engineering and Applications (计算机工程与应用) *
武玉伟, "Fundamentals and Applications of Deep Learning" (深度学习基础与应用), 30 April 2020, Beijing: Beijing Institute of Technology Press *
董洪义, "PyTorch Object Detection in Practice for Deep Learning" (深度学习之PyTorch物体检测实战), 31 January 2020, Beijing: China Machine Press *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113223053A (en) * 2021-05-27 2021-08-06 广东技术师范大学 Anchor-free target tracking method based on fusion of twin network and multilayer characteristics
CN113627488A (en) * 2021-07-13 2021-11-09 武汉大学 Twin network online update-based single target tracking method and device
CN113627488B (en) * 2021-07-13 2023-07-21 武汉大学 Single-target tracking method and device based on online update of twin network
CN113808166A (en) * 2021-09-15 2021-12-17 西安电子科技大学 Single-target tracking method based on clustering difference and depth twin convolutional neural network
CN113808166B (en) * 2021-09-15 2023-04-18 西安电子科技大学 Single-target tracking method based on clustering difference and depth twin convolutional neural network

Similar Documents

Publication Publication Date Title
CN111462175B (en) Space-time convolution twin matching network target tracking method, device, medium and equipment
CN110738207B (en) Character detection method for fusing character area edge information in character image
US10719940B2 (en) Target tracking method and device oriented to airborne-based monitoring scenarios
CN110443827B (en) Unmanned aerial vehicle video single-target long-term tracking method based on improved twin network
CN107273800B (en) Attention mechanism-based motion recognition method for convolutional recurrent neural network
CN112184752A (en) Video target tracking method based on pyramid convolution
CN111340844B (en) Multi-scale characteristic optical flow learning calculation method based on self-attention mechanism
CN111460914B (en) Pedestrian re-identification method based on global and local fine granularity characteristics
CN112132856B (en) Twin network tracking method based on self-adaptive template updating
CN112785626A (en) Twin network small target tracking method based on multi-scale feature fusion
CN111639692A (en) Shadow detection method based on attention mechanism
CN113743269B (en) Method for recognizing human body gesture of video in lightweight manner
CN113744311A (en) Twin neural network moving target tracking method based on full-connection attention module
CN113034545A (en) Vehicle tracking method based on CenterNet multi-target tracking algorithm
CN112163498A (en) Foreground guiding and texture focusing pedestrian re-identification model establishing method and application thereof
CN118097150B (en) Small sample camouflage target segmentation method
CN116740135B (en) Infrared dim target tracking method and device, electronic equipment and storage medium
CN114519807A (en) Global self-attention target detection method combining channel space attention
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
CN115661754A (en) Pedestrian re-identification method based on dimension fusion attention
CN111882581A (en) Multi-target tracking method for depth feature association
CN113129332A (en) Method and apparatus for performing target object tracking
CN112509014B (en) Robust interpolation light stream computing method matched with pyramid shielding detection block
CN116935249A (en) Small target detection method for three-dimensional feature enhancement under unmanned airport scene
CN117218378A (en) High-precision regression infrared small target tracking method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210511)