CN115187629A - Method for fusing target tracking features by using graph attention network

Info

Publication number
CN115187629A
CN115187629A (application CN202210569792.9A)
Authority
CN
China
Prior art keywords
features
network
module
target tracking
frame image
Prior art date
Legal status
Pending
Application number
CN202210569792.9A
Other languages
Chinese (zh)
Inventor
郑忠龙
张大伟
林飞龙
Current Assignee
Zhejiang Normal University CJNU
Original Assignee
Zhejiang Normal University CJNU
Priority date
Filing date
Publication date
Application filed by Zhejiang Normal University CJNU
Priority to CN202210569792.9A
Publication of CN115187629A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]

Abstract

The invention provides a method for fusing target tracking features by using a graph attention network, belonging to the technical field of computer vision. The method comprises the following steps: S1: cropping the current frame image to obtain a search area, and cropping the previous frame image to obtain a template image; S2: inputting the search area into a data enhancement module to obtain several groups of first training samples, and inputting the template image into the data enhancement module to obtain several groups of second training samples; S3: inputting a group of first training samples into a feature extraction module to extract search area features, and inputting a group of second training samples into the feature extraction module to extract template features; S4: inputting the search area features and the template features into a graph attention module to obtain fusion features; S5: inputting the fusion features into a regression module to obtain the final predicted bounding box coordinates. The method offers advantages in both performance and model complexity.

Description

Method for fusing target tracking features by using graph attention network
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method for fusing target tracking features by using a graph attention network.
Background
Visual object tracking refers to detecting, extracting, identifying and tracking a moving object in an image sequence to obtain its position, so that subsequent processing and analysis can be performed. It has wide applications in the real world, such as intelligent video surveillance, human-computer interaction, robot visual navigation, virtual reality and medical diagnosis. Target tracking remains a challenging visual problem because the appearance of the target changes greatly across the image sequence due to deformation, occlusion, illumination changes and the like.
In recent years, deep learning methods have been successfully applied to the target tracking task because convolutional neural networks can learn effective target feature representations. Wang Naiyan et al. first proposed a target tracking model based on a convolutional neural network (CNN), SO-DLT, which uses an AlexNet-like network structure to extract target features. Since then, various CNN-based methods have emerged that significantly improve tracking performance, and there has been a trend towards designing deeper networks in pursuit of better performance.
SiamFC proposed a new fully convolutional Siamese network, which uses the Siamese structure to compute the similarity between two inputs with a convolutional network; however, as the appearance of the target keeps changing and the background becomes increasingly complex, a model that predicts only a single scale cannot adapt to the scale change of the target. SiamRPN applied the RPN module from object detection to the tracking task, i.e. classification and regression branches; the regression branch allows the algorithm to discard the original feature pyramid and solves the problem of unreliable response map scores, improving accuracy and speed at the same time while converting the original similarity calculation problem into regression and classification problems. DaSiamRPN introduced detection data sets to expand the training set and improve the generalization ability of the model, and also added hard negative samples of the same and different classes to improve its discrimination ability. However, the above trackers all build their networks on AlexNet-like architectures; attempts to train trackers with more complex architectures, such as ResNet, did not improve performance. SiamRPN++ analyzed existing trackers and found that the tracking networks proposed so far need to satisfy strict translation invariance, which padding in deep networks destroys. Current target tracking methods therefore cannot guarantee tracking performance and high efficiency at the same time, and cannot balance the relationship between performance and speed.
Chinese patent application CN111161311A, published 2020-05-15, discloses a visual multi-target tracking method and device based on deep learning. The method comprises: sequentially acquiring candidate detection boxes of tracking targets in the current video frame through a target detection network model, recording their coordinate positions and obtaining the corresponding template images; taking every frame of the video except the first frame as the image of the area to be searched; and inputting each template image together with the image of the area to be searched into a target tracking network model built from a Siamese convolutional neural network, so as to obtain the tracking result for each tracking target. In that method and device, the template images obtained with the detection network and the images of the areas to be searched are fed into the Siamese tracking network to obtain the tracking results at low computational cost, achieving real-time and accurate multi-target tracking. However, the target tracking method of the above patent cannot adapt to scale changes of the target, cannot guarantee tracking performance and high efficiency at the same time, and cannot balance the relationship between performance and speed.
Disclosure of Invention
The present invention is directed to a method for fusing target tracking features using a graph attention network, which overcomes the above-mentioned shortcomings of the prior art.
The invention provides a method for fusing target tracking features by using a graph attention network, which comprises the following steps:
s1: obtaining a current frame image and a previous frame image, cutting the current frame image according to the size and the position of a bounding box predicted by the previous frame image to obtain a search area, and cutting the previous frame image to obtain a template image;
s2: inputting the search area into a data enhancement module to obtain a plurality of groups of first training samples, and inputting the template image into the data enhancement module to obtain a plurality of groups of second training samples;
s3: inputting a group of first training samples into a feature extraction module to extract the features of a search area, and inputting a group of second training samples into the feature extraction module to extract template features;
s4: inputting the search area features and the template features into a graph attention module to obtain fusion features;
s5: and inputting the fusion features into a regression module to obtain the final predicted bounding box coordinates.
Further, step S1 comprises: F'_t = C(F_t), F'_{t-1} = C(F_{t-1}), where F_t is the current frame image, F_{t-1} is the previous frame image, and C(·) is the cropping operation. If the bounding box predicted in the previous frame image is centered at (x_{t-1}, y_{t-1}) with width W_{t-1} and height H_{t-1}, then the current frame image and the previous frame image are both cropped with (x_{t-1}, y_{t-1}) as the center and 2W_{t-1}, 2H_{t-1} as the width and height.
Further, step S2 comprises: obtaining the first training samples as Aug(F'_t) and the second training samples as Aug(F'_{t-1}), where Aug(·) is the data enhancement operation of the data enhancement module; within Aug(·), the crop of the current frame image at each time step is sampled according to a Laplace distribution.
Further, step S3 comprises: f_s = ResNet(F'_t), f_t = ResNet(F'_{t-1}), where f_s is the search area feature and f_t is the template feature; the feature extraction module comprises a feature extraction network used to extract the search area features and the template features, the feature extraction network being ResNet18 without its final average pooling layer and softmax layer.
Further, step S4 comprises: f̂ = F_0(f_s, f_t), where f̂ is the fusion feature and F_0(·) is the fusion operation of the graph attention module GAM.
Further, step S5 comprises: f = F_3(F_2(F_1(f̂))), [left, bottom, right, top] = F_4(f), where F_i(·) is the i-th group of fully connected layers, each group being provided with a ReLU activation function and a Dropout operation, F_4(·) is the output layer, and [left, bottom, right, top] are the horizontal and vertical coordinates of the lower-left and upper-right corners of the predicted bounding box.
Further, in the data enhancement operation, the parameters of the Laplace distribution are set to b_c = 1/5 and b_s = 1/5, where b_c and b_s are the variation scales of the bounding box center and size, respectively.
Further, in the data enhancement operation, the aspect-ratio variation scales of the bounding box are constrained to α_w, α_h ∈ (0.6, 1.4) to prevent the bounding box from being stretched excessively.
Further, the graph attention module GAM comprises linear transformation operations and a similarity calculation operation, and its specific operations are as follows: the search area feature f_s and the template feature f_t are split into two sets of 1 × 1 × c nodes, N_s and N_t. For each pair of nodes n_s^i ∈ N_s and n_t^j ∈ N_t, after a linear transformation, the similarity between the two nodes is calculated with the inner product e_{ij} = (W_s n_s^i)^T (W_t n_t^j), where W_s and W_t are linear transformation matrices and e_{ij} denotes the similarity between nodes n_s^i and n_t^j. After all node pairs have been computed, e_{ij} is fed into a softmax function, a_{ij} = exp(e_{ij}) / Σ_k exp(e_{ik}), where a_{ij} is the normalized similarity score. After the normalized similarity scores are obtained, one more linear transformation is applied to the two sets of nodes, and the template information embedded into each search area node is computed from the similarity scores: g_i = Σ_j a_{ij} W_g n_t^j, where W_g is a linear transformation matrix and g_i is the template information aggregated to node n_s^i. After the same kind of linear transformation is applied to n_s^i, the aggregated feature g_i and the transformed search area node feature are concatenated to give n̂_s^i, the enhanced node feature.
Further, the feature extraction module also comprises an SGD optimizer used to fine-tune the parameters of the feature extraction network during training.
The method for fusing target tracking features by using the graph attention network has the following beneficial effects:
the invention provides a deep regression network for tracking based on graph attention, which can improve the tracking performance and keep high efficiency, can construct a local relation between the characteristics of a search area and the characteristics of a template, can integrate effective target information, enhances the characteristic representation of a tracked target, better balances the relation between the performance and the speed, and has superiority in the aspects of performance and model complexity.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention. In the drawings, like reference numerals are used to indicate like elements. The drawings in the following description are directed to some, but not all embodiments of the invention. For a person skilled in the art, other figures can be derived from these figures without inventive effort.
FIG. 1 is a flow chart of a method of fusing target tracking features using a graph attention network in accordance with the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
Please refer to fig. 1. The method for fusing target tracking features by using the graph attention network comprises the following steps:
s1: obtaining a current frame image and a previous frame image, cutting the current frame image according to the size and the position of a bounding box predicted by the previous frame image to obtain a search area, and cutting the previous frame image to obtain a template image;
s2: inputting the search area into a data enhancement module to obtain a plurality of groups of first training samples, and inputting the template image into the data enhancement module to obtain a plurality of groups of second training samples;
s3: inputting a group of first training samples into a feature extraction module to extract the features of a search area, and inputting a group of second training samples into the feature extraction module to extract template features;
s4: inputting the search area features and the template features into a graph attention module to obtain fusion features;
s5: and inputting the fusion features into a regression module to obtain the final predicted bounding box coordinates.
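For orientation only, the five steps above can be sketched as a single tracking-step function in Python/PyTorch style. The function and argument names (crop, augment, backbone, gam, regressor) are placeholders for the modules described in the following paragraphs and are not names used in this application; the sketch only illustrates the data flow S1-S5.

def track_one_step(frame_t, frame_prev, prev_box, crop, augment, backbone, gam, regressor):
    # S1: crop the current frame around the previously predicted box to get the
    #     search area, and crop the previous frame to get the template image.
    search = crop(frame_t, prev_box)
    template = crop(frame_prev, prev_box)
    # S2: data enhancement produces several groups of training samples.
    search_samples = augment(search)
    template_samples = augment(template)
    # S3: extract search area features and template features with the backbone.
    f_s = backbone(search_samples[0])
    f_t = backbone(template_samples[0])
    # S4: fuse the two feature sets with the graph attention module.
    fused = gam(f_s, f_t)
    # S5: regress the final bounding box [left, bottom, right, top].
    return regressor(fused)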
The step S1 comprises the following steps: f' t =C(F t ),F′ t =C(F t ) In which F is t For the current frame image, F t-1 C (-) for the previous frame, if the previous frame predicts the bounding box and
Figure BDA0003658685160000061
is a center, and has a width and a height of W t-1 、H t-1 Then the current frame image and the previous frame image are processed to obtain the image
Figure BDA0003658685160000062
Is a center, 2W t-1 、2H t-1 And cutting for width and height.
The step S2 comprises: obtaining the first training samples as Aug(F'_t) and the second training samples as Aug(F'_{t-1}), where Aug(·) is the data enhancement operation of the data enhancement module, F'_t is the search area sample and F'_{t-1} is the template image sample. Within Aug(·), the crops are sampled according to a Laplace distribution; sampling and cropping the search area sample and the template image sample in this way yields ten groups of corresponding training samples for each.
The step S3 comprises: f_s = ResNet(F'_t), f_t = ResNet(F'_{t-1}), where f_s is the search area feature and f_t is the template feature. The feature extraction module comprises a feature extraction network used to extract the search area features and the template features; the feature extraction network is ResNet18, used without its final average pooling layer and softmax layer.
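One common way to obtain such a truncated ResNet18 is shown below using torchvision; the use of torchvision and of ImageNet pre-trained weights are assumptions of this sketch, while dropping the final average pooling and fully connected (softmax) layers follows the description above.

import torch
import torch.nn as nn
from torchvision.models import resnet18

# Keep all convolutional stages of ResNet18 and drop the last two children
# (the average pooling layer and the fully connected classifier), so the
# network outputs a spatial feature map instead of class scores.
backbone = resnet18(pretrained=True)
feature_extractor = nn.Sequential(*list(backbone.children())[:-2])

x = torch.randn(1, 3, 128, 128)      # e.g. a search-area crop
f_s = feature_extractor(x)           # feature map of shape (1, 512, 4, 4)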
The step S4 comprises: f̂ = F_0(f_s, f_t), where f̂ is the fusion feature and F_0(·) is the fusion operation of the graph attention module GAM.
The step S5 comprises: f = F_3(F_2(F_1(f̂))), [left, bottom, right, top] = F_4(f), where F_i(·) is the i-th group of fully connected layers (there are three groups in total), each group being provided with a ReLU activation function and a Dropout operation, F_4(·) is the output layer, and [left, bottom, right, top] are the horizontal and vertical coordinates of the lower-left and upper-right corners of the predicted bounding box. The number of nodes in each group of fully connected layers in the regression module may be 4096, and the number of nodes in the output layer may be 4.
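A minimal sketch of such a regression head is given below; the flattened input dimension in_dim and the dropout probability are assumptions of the sketch, while the three 4096-node fully connected groups with ReLU and Dropout and the 4-node output layer follow the description above.

import torch
import torch.nn as nn

class RegressionHead(nn.Module):
    # Three fully connected groups (Linear + ReLU + Dropout) followed by a
    # 4-node output layer that predicts [left, bottom, right, top].
    def __init__(self, in_dim, hidden=4096, p=0.5):
        super().__init__()
        blocks = []
        dims = [in_dim, hidden, hidden, hidden]
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            blocks += [nn.Linear(d_in, d_out), nn.ReLU(inplace=True), nn.Dropout(p)]
        self.fc = nn.Sequential(*blocks)      # F1, F2, F3
        self.out = nn.Linear(hidden, 4)       # F4: output layer

    def forward(self, fused):
        f = self.fc(torch.flatten(fused, start_dim=1))
        return self.out(f)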
In the data enhancement operation, the parameters of the Laplace distribution are set to b_c = 1/5 and b_s = 1/5, where b_c and b_s are the variation scales of the bounding box center and size, respectively.
In the data enhancement operation, the aspect-ratio variation scales of the bounding box are constrained to α_w, α_h ∈ (0.6, 1.4) to prevent the bounding box from being stretched excessively.
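By way of illustration, the Laplace-distributed jitter of the bounding box center and size could be sketched as below. Making the center jitter proportional to the box size and clipping the per-axis scale factors are assumptions of this sketch; the values b_c = b_s = 1/5 and the (0.6, 1.4) limits follow the description above.

import numpy as np

def sample_jittered_boxes(cx, cy, w, h, b_c=1/5, b_s=1/5, n_samples=10, rng=None):
    # Draw center offsets and size scales from Laplace distributions and clip
    # the width/height scale factors to (0.6, 1.4) to avoid over-stretching.
    rng = rng if rng is not None else np.random.default_rng()
    boxes = []
    for _ in range(n_samples):
        dx = rng.laplace(0.0, b_c) * w
        dy = rng.laplace(0.0, b_c) * h
        a_w = float(np.clip(1.0 + rng.laplace(0.0, b_s), 0.6, 1.4))
        a_h = float(np.clip(1.0 + rng.laplace(0.0, b_s), 0.6, 1.4))
        boxes.append((cx + dx, cy + dy, w * a_w, h * a_h))
    return boxes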
In this application the entire tracking network is divided into a feature extraction module, a graph attention module, and a regression module. The minimum batch size is set to 50 and the initial learning rate to 1e-5; the learning rate is reduced by a decay factor of 0.1 every 1 × 10^5 iterations, and the network weights are optimized with an L1 loss function. The target tracking network framework is implemented on the PyTorch platform, and the experiments are carried out on an Nvidia 1080Ti graphics card.
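Under the stated settings, the optimizer, learning-rate schedule and loss could be set up as sketched below; the SGD momentum value and the helper name are assumptions of this sketch, while the initial learning rate of 1e-5, the decay factor of 0.1 every 1 × 10^5 iterations and the L1 loss follow the description above.

import torch
import torch.nn as nn

def build_training(model, lr=1e-5, step_size=100_000, gamma=0.1):
    # SGD fine-tunes all parameters (including the ResNet18 backbone); the
    # learning rate decays by a factor of 0.1 every 1e5 iterations and the
    # predicted box coordinates are regressed with an L1 loss.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=gamma)
    criterion = nn.L1Loss()
    return optimizer, scheduler, criterion

# One training iteration (pred_box, gt_box: tensors of shape (batch, 4)):
#   loss = criterion(pred_box, gt_box)
#   optimizer.zero_grad(); loss.backward(); optimizer.step(); scheduler.step()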
The graph attention module GAM comprises several linear transformation operations and a similarity calculation operation. Its specific operations are as follows: the search area feature f_s and the template feature f_t are split into two sets of 1 × 1 × c nodes, N_s and N_t. For each pair of nodes n_s^i ∈ N_s and n_t^j ∈ N_t, after a linear transformation, the similarity between the two nodes is calculated with the inner product e_{ij} = (W_s n_s^i)^T (W_t n_t^j), where W_s and W_t are linear transformation matrices and e_{ij} denotes the similarity between nodes n_s^i and n_t^j. After all node pairs have been computed, e_{ij} is fed into a softmax function, a_{ij} = exp(e_{ij}) / Σ_k exp(e_{ik}), where a_{ij} is the normalized similarity score. After the normalized similarity scores are obtained, one more linear transformation is applied to the two sets of nodes, and the template information embedded into each search area node is computed from the similarity scores: g_i = Σ_j a_{ij} W_g n_t^j, where W_g is a linear transformation matrix and g_i is the template information aggregated to node n_s^i, i.e. the aggregated feature. After the same kind of linear transformation is applied to n_s^i, the aggregated feature g_i and the transformed search area node feature are concatenated to give the enhanced node feature n̂_s^i, from which a richer feature representation is obtained. Through the graph attention module mechanism, the invention establishes the relation between the search area and the parts of the target template and describes the attention the network assigns to each part, so that richer and more effective features are obtained.
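Read literally, the node-wise operations described above amount to an attention step over 1 × 1 × c nodes, which can be sketched as the module below. Implementing the linear transformations as 1 × 1 convolutions, the final ReLU and the exact tensor reshaping are assumptions about one possible realisation; the inner-product similarity, softmax normalisation, weighted aggregation, concatenation and the 512 → 256 down-sampling layer follow the description in this application.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionFusion(nn.Module):
    # Treats every spatial position of f_s and f_t as a 1x1xc node, aggregates
    # template information to each search node via softmax-normalised
    # inner-product similarity, and concatenates it with the transformed
    # search node feature before reducing the channels back to c.
    def __init__(self, channels=256):
        super().__init__()
        self.w_s = nn.Conv2d(channels, channels, 1)        # transform of search nodes (W_s)
        self.w_t = nn.Conv2d(channels, channels, 1)        # transform of template nodes (W_t)
        self.w_g = nn.Conv2d(channels, channels, 1)        # transform before aggregation (W_g)
        self.w_v = nn.Conv2d(channels, channels, 1)        # transform of search nodes before concat
        self.down = nn.Conv2d(2 * channels, channels, 1)   # 512 -> 256 down-sampling layer

    def forward(self, f_s, f_t):
        b, c, hs, ws = f_s.shape
        q = self.w_s(f_s).flatten(2).transpose(1, 2)       # (b, Ns, c) search nodes
        k = self.w_t(f_t).flatten(2)                       # (b, c, Nt) template nodes
        e = torch.bmm(q, k)                                # (b, Ns, Nt) similarities e_ij
        a = F.softmax(e, dim=-1)                           # normalised scores a_ij over template nodes
        v = self.w_g(f_t).flatten(2).transpose(1, 2)       # (b, Nt, c)
        g = torch.bmm(a, v)                                # (b, Ns, c) aggregated template info g_i
        g = g.transpose(1, 2).reshape(b, c, hs, ws)
        fused = torch.cat([g, self.w_v(f_s)], dim=1)       # concatenation, 2c channels
        return F.relu(self.down(fused))                    # enhanced node features (b, c, hs, ws)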
The feature extraction module uses ResNet18 as the backbone network for extracting the search area features f_s and the template features f_t, and during training SGD is used to fine-tune the parameters of the backbone network.
In this embodiment, the data set, the evaluation index, and the implementation details are sequentially performed as follows:
(1) Data set
The training split of the GOT-10k data set is selected as the training data, using 9335 video sequences; the search area and the template image are obtained by cropping two adjacent frames randomly sampled from the corresponding video sequence. For testing, GOT-10k is also used as the benchmark data set.
(2) Evaluation index
For fairness of evaluation, the tracker is trained strictly following the GOT-10k protocol: the model is trained using only the GOT-10k training set and evaluated on the GOT-10k test set. The performance of the model is evaluated with the GOT-10k metrics AO, SR_{0.5} and SR_{0.75}.
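For reference, AO is the average overlap (IoU) between the predicted and ground-truth boxes over all frames, and SR_{0.5} / SR_{0.75} are the fractions of frames whose overlap exceeds 0.5 / 0.75. A small sketch of these metrics is given below; the [left, top, right, bottom] box convention is an assumption of the sketch.

import numpy as np

def iou(box_a, box_b):
    # boxes in [left, top, right, bottom] format
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * max(0.0, min(ay2, by2) - max(ay1, by1))
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter + 1e-9)

def got10k_metrics(pred_boxes, gt_boxes):
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    ao = overlaps.mean()               # average overlap
    sr_05 = (overlaps > 0.5).mean()    # success rate at IoU > 0.5
    sr_075 = (overlaps > 0.75).mean()  # success rate at IoU > 0.75
    return ao, sr_05, sr_075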
(3) Implementation details
When training the model, a video sequence is randomly selected from the training set, and two adjacent frames are then randomly selected from it. The two frames are cropped according to the position and size of the bounding box in the earlier frame to obtain the search area and the template image. In the feature extraction module, ResNet18 is used as the backbone network, without the final average pooling layer and softmax layer. In the GAM, the input features have 256 channels; after the linear transformations and similarity computation, the output feature g_i still has 256 channels, the concatenated feature has 512 channels, and a down-sampling layer finally reduces it back to 256 channels. The minimum batch size is set to 50 and the initial learning rate to 1e-5; the learning rate is reduced by a decay factor of 0.1 every 1 × 10^5 iterations, and the network weights are optimized with an L1 loss function. The training data are expanded by random sampling from a Laplace distribution. The target tracking network framework is implemented on the PyTorch platform, and the experiments are carried out on an Nvidia 1080Ti graphics card.
The present embodiment verifies the proposed method of fusing target tracking features using the graph attention network and the effectiveness of GAM through ablation experiments.
The influence of the backbone network on the model performance is analyzed firstly, then the effectiveness of the deeper convolutional neural network is proved by fine-tuning network parameters, and finally GAM is added to verify the effectiveness of the module.
Table 1 shows the results of the ablation experiments, which indicate that different backbone networks greatly affect the speed and performance of the tracker. However, when ResNet18 is used as the backbone network without fine-tuning, the performance is worse than with AlexNet without fine-tuning. The main reason is that the ResNet18 structure is deeper and its pre-trained parameters are not well suited to the tracking task. After using ResNet18 and fine-tuning it, the performance improves greatly while the tracking speed still meets real-time requirements. In addition, to study the effectiveness of the GAM, the module was added with all other conditions unchanged; as shown in Table 1, the model improves performance by 1.7%, 2.5% and 1.4%, respectively, while the tracking speed increases from 35.20 fps to 36.02 fps. The tracking performance is improved without affecting the tracking speed, which demonstrates the effectiveness of the module.
TABLE 1. Influence of the backbone network and the GAM on various performance indicators of the model on the GOT-10k data set
The model is compared with 10 advanced tracking methods on the GOT-10k data set, including SiamRPN, DaSiamRPN, CFNet, VITAL, SiamFC, GOTURN, CCOT, ECO and MDNet. As shown in Table 2, the quantitative results indicate that the invention performs well relative to these current advanced tracking methods.
TABLE 2. Performance of the proposed model and 10 comparison models on the GOT-10k data set
The invention provides a deep regression tracker that performs feature fusion based on a graph attention network and comprises a feature extraction module, a graph attention module and a regression module. The graph attention module constructs the relation between the search area and the parts of the target template, fuses effective target information, enhances the feature representation of the tracked target, and describes the attention the network assigns to each part, so that richer and more effective features are obtained. Comparisons with other advanced trackers on a benchmark data set show that the invention achieves better results, better balances the relationship between performance and speed, and offers advantages in both performance and model complexity.
The above-described aspects may be implemented individually or in various combinations, and such variations are within the scope of the present invention.
It should be noted that in the description of the present application, the terms "upper end", "lower end" and "bottom end" indicating the orientation or positional relationship are based on the orientation or positional relationship shown in the drawings, or the orientation or positional relationship which the product of the application is conventionally placed in use, and are only for convenience of describing the present application and simplifying the description, but do not indicate or imply that the device referred to must have a specific orientation, be constructed in a specific orientation and be operated, and thus should not be construed as limiting the present application. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
Finally, it should be noted that: the above examples are only for illustrating the technical solution of the present invention, and are not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for fusing target tracking features using a graph attention network, comprising the steps of:
s1: acquiring a current frame image and a previous frame image, cutting the current frame image according to the size and the position of a bounding box predicted by the previous frame image to acquire a search area, and cutting the previous frame image to acquire a template image;
s2: inputting the search area into a data enhancement module to obtain a plurality of groups of first training samples, and inputting the template image into the data enhancement module to obtain a plurality of groups of second training samples;
s3: inputting a group of first training samples into a feature extraction module to extract the features of a search area, and inputting a group of second training samples into the feature extraction module to extract template features;
s4: inputting the search area features and the template features into a graph attention module to obtain fusion features;
s5: and inputting the fusion features into a regression module to obtain the final predicted bounding box coordinates.
2. The method for fusing target tracking features using a graph attention network as claimed in claim 1, wherein step S1 comprises: F'_t = C(F_t), F'_{t-1} = C(F_{t-1}), where F_t is the current frame image, F_{t-1} is the previous frame image, and C(·) is the cropping operation; if the bounding box predicted in the previous frame image is centered at (x_{t-1}, y_{t-1}) with width W_{t-1} and height H_{t-1}, then the current frame image and the previous frame image are both cropped with (x_{t-1}, y_{t-1}) as the center and 2W_{t-1}, 2H_{t-1} as the width and height.
3. The method for fusing target tracking features using a graph attention network as claimed in claim 2, wherein step S2 comprises: obtaining the first training samples as Aug(F'_t) and the second training samples as Aug(F'_{t-1}), where Aug(·) is the data enhancement operation of the data enhancement module; within Aug(·), the crop of the current frame image at each time step is sampled according to a Laplace distribution.
4. The method for fusing target tracking features using a graph attention network as claimed in claim 3, wherein step S3 comprises: f_s = ResNet(F'_t), f_t = ResNet(F'_{t-1}), where f_s is the search area feature and f_t is the template feature; the feature extraction module comprises a feature extraction network used to extract the search area features and the template features, the feature extraction network being ResNet18 without its final average pooling layer and softmax layer.
5. The method for fusing target tracking features using a graph attention network as claimed in claim 4, wherein step S4 comprises: f̂ = F_0(f_s, f_t), where f̂ is the fusion feature and F_0(·) is the fusion operation of the graph attention module GAM.
6. The method for fusing target tracking features using a graph attention network as claimed in claim 5, wherein step S5 comprises: f = F_3(F_2(F_1(f̂))), [left, bottom, right, top] = F_4(f), where F_i(·) is the i-th group of fully connected layers, each group being provided with a ReLU activation function and a Dropout operation, F_4(·) is the output layer, and [left, bottom, right, top] are the horizontal and vertical coordinates of the lower-left and upper-right corners of the predicted bounding box.
7. The method for fusing target tracking features using a graph attention network as claimed in any one of claims 3 to 6, wherein: in the data enhancement operation, the parameters of the Laplace distribution are set to b_c = 1/5 and b_s = 1/5, where b_c and b_s are the variation scales of the bounding box center and size, respectively.
8. The method for fusing target tracking features using a graph attention network as claimed in claim 7, wherein: in the data enhancement operation, the aspect-ratio variation scales of the bounding box are constrained to α_w, α_h ∈ (0.6, 1.4) to prevent the bounding box from being stretched excessively.
9. The method for fusing target tracking features using a graph attention network as claimed in claim 5 or 6, wherein the graph attention module GAM comprises linear transformation operations and a similarity calculation operation, and the specific operations of the GAM comprise: splitting the search area feature f_s and the template feature f_t into two sets of 1 × 1 × c nodes, N_s and N_t, respectively; for each pair of nodes n_s^i ∈ N_s and n_t^j ∈ N_t, after a linear transformation, calculating the similarity between the two nodes with the inner product e_{ij} = (W_s n_s^i)^T (W_t n_t^j), where W_s and W_t are linear transformation matrices and e_{ij} denotes the similarity between nodes n_s^i and n_t^j; after all node pairs have been computed, feeding e_{ij} into a softmax function, a_{ij} = exp(e_{ij}) / Σ_k exp(e_{ik}), where a_{ij} is the normalized similarity score; after the normalized similarity scores are obtained, applying one more linear transformation to the two sets of nodes and computing, from the similarity scores, the template information embedded into each search area node: g_i = Σ_j a_{ij} W_g n_t^j, where W_g is a linear transformation matrix and g_i is the template information aggregated to node n_s^i; and, after applying the same kind of linear transformation to n_s^i, concatenating the aggregated feature g_i with the transformed search area node feature to obtain the enhanced node feature n̂_s^i.
10. The method for fusing target tracking features using a graph attention network as claimed in claim 4, wherein: the feature extraction module further comprises an SGD optimizer used to fine-tune the parameters of the feature extraction network during training.
CN202210569792.9A 2022-05-24 2022-05-24 Method for fusing target tracking features by using graph attention network Pending CN115187629A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210569792.9A CN115187629A (en) 2022-05-24 2022-05-24 Method for fusing target tracking features by using graph attention network

Publications (1)

Publication Number Publication Date
CN115187629A 2022-10-14

Family

ID=83514297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210569792.9A Pending CN115187629A (en) 2022-05-24 2022-05-24 Method for fusing target tracking features by using graph attention network

Country Status (1)

Country Link
CN (1) CN115187629A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115578421A (en) * 2022-11-17 2023-01-06 中国石油大学(华东) Target tracking algorithm based on multi-graph attention mechanism
CN115578421B (en) * 2022-11-17 2023-03-14 中国石油大学(华东) Target tracking algorithm based on multi-graph attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination