CN115578421A - Target tracking algorithm based on multi-graph attention mechanism - Google Patents

Target tracking algorithm based on multi-graph attention mechanism

Info

Publication number
CN115578421A
Authority
CN
China
Prior art keywords
target
classification
branch
graph
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211438781.3A
Other languages
Chinese (zh)
Other versions
CN115578421B (en)
Inventor
齐玉娟
闫石磊
叶志鹏
王延江
刘宝弟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China filed Critical China University of Petroleum East China
Priority to CN202211438781.3A
Publication of CN115578421A
Application granted
Publication of CN115578421B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components, by matching or filtering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target tracking algorithm based on a multi-graph attention mechanism, belonging to the technical field of general image data processing or generation and used for tracking a target in a video. The first frame of the video and each subsequent frame are taken as the input of a template branch and a search branch respectively, and features are extracted from both through a Siamese network. The output features are fed into a graph attention module for a cross-correlation operation. The resulting output is fed into an anchor-free tracking head network: a classification branch yields the classification score of each pixel of the feature map, a centerness branch yields the distance relation between each pixel and the target center, and a regression branch yields the bounding-box information corresponding to each pixel. The classification score is multiplied by the centerness branch to obtain a refined classification score; the pixel with the highest score and its corresponding bounding-box information give the position of the target in the current frame, and the above steps are repeated.

Description

Target tracking algorithm based on multi-graph attention mechanism
Technical Field
The invention discloses a target tracking algorithm based on a multi-graph attention mechanism, and belongs to the technical field of general image data processing or generation.
Background
Target tracking is one of the three mainstream directions of computer vision and has long attracted attention. As research deepens, its applications widen to fields such as intelligent surveillance, vehicle tracking and human-computer interaction. In practice, complex and changeable scenes are often encountered: the target is occluded, the background is cluttered and changing, the target's appearance varies, motion blur occurs, and so on. Existing trackers cannot handle these problems well, so tracking a moving target still faces great challenges, and target tracking algorithms must be continuously explored and improved.
Single-object tracking means that the object to be tracked is given in the first frame of a video and is then tracked in the subsequent frames. Earlier research was mainly based on correlation-filter algorithms; with the development of deep learning, the strong feature-extraction capability of convolutional neural networks drew wide attention, and the research direction of target tracking gradually shifted toward deep learning.
Branches have gradually emerged within deep-learning-based tracking research. Among them, Siamese-network-based tracking algorithms, by virtue of their unique advantages, let the tracker reasonably balance tracking speed and tracking accuracy. However, when facing a blurred target or a cluttered background, existing trackers struggle to extract the target's features accurately and cannot precisely detect its position. On the other hand, most Siamese trackers perform similarity matching against the search region with the features of the whole template picture as the core; the state of the target is not fixed during tracking, and when the target deforms or is occluded its global features change, so global similarity matching affects the accuracy of the final result.
Disclosure of Invention
The invention aims to provide a target tracking algorithm based on a multi-graph attention mechanism, in order to solve the problems in the prior art that a tracking algorithm cannot locate the target accurately when the target's global features change, and that the feature-extraction capability of existing networks cannot cope with the complexity and variability of the target's background.
A target tracking algorithm based on a multi-graph attention mechanism, comprising:
S1. taking the first frame of a video and each subsequent frame as the input of a template branch and a search branch respectively, and extracting features from both through a Siamese network;
S2. feeding the output features obtained in S1 into a graph attention module for a cross-correlation operation;
S3. feeding the output obtained in S2 into an anchor-free tracking head network, obtaining the classification score of each pixel of the feature map through a classification branch, the distance relation between each pixel and the target center through a centerness branch, and the bounding-box information corresponding to each pixel through a regression branch;
S4. multiplying the classification score obtained in S3 by the centerness branch to obtain a refined classification score, and finding the pixel with the highest score and its corresponding bounding-box information to obtain the position of the target in the current frame;
S5. repeating S1 to S4 until the position of the target is obtained in all subsequent frames of the video. An illustrative sketch of this per-frame flow is given after these steps.
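Purely as an illustration of the S1 to S5 flow, and not part of the patent text itself, the per-frame loop might be sketched as follows in PyTorch-style Python; `backbone`, `graph_attention` and `head` are hypothetical stand-ins for the modules described below, and a batch size of one is assumed:

```python
import torch

def track_video(frames, backbone, graph_attention, head):
    """Sketch of S1-S5: template features from frame 0, then per-frame inference."""
    z_feat = backbone(frames[0])                  # S1: template-branch features
    boxes = []
    for frame in frames[1:]:
        x_feat = backbone(frame)                  # S1: search-branch features
        fused = graph_attention(z_feat, x_feat)   # S2: cross-correlation via graph attention
        cls_map, cen_map, reg_map = head(fused)   # S3: classification / centerness / regression
        score = cls_map[0, 1] * cen_map[0, 0]     # S4: refine positive score by centerness
        idx = torch.argmax(score)                 # pixel with the highest refined score
        i, j = divmod(idx.item(), score.shape[-1])
        boxes.append(reg_map[0, :, i, j])         # its (l, t, r, b) box distances
    return boxes                                  # S5: repeat for all subsequent frames
```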
The Siamese network in S1 is a weight-sharing GoogLeNet using the Inception V3 structure, combined with the SimAM attention mechanism. The specific operations are as follows:
the Inception V3 structure of GoogLeNet is adjusted so that only the convolution and pooling layers in front of the Inception blocks and the three modules Inception A, Inception B and Inception C are used, while the following Inception modules and other network layers are not used;
attention modules are added, with a SimAM attention module placed after each of the three Inception modules and additional SimAM attention modules following the first and third Inception modules. A minimal backbone sketch under these assumptions is given below.
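The following sketch of such a truncated backbone assumes torchvision's `inception_v3` layer naming (`Conv2d_1a_3x3` through `Mixed_5d`); exactly which three Inception blocks the text intends is ambiguous, so the `Mixed_5b/5c/5d` blocks here are placeholders, and the attention factory defaults to `nn.Identity` where a SimAM module (sketched after the FIG. 3 discussion) would go:

```python
import torch.nn as nn
from torchvision.models import inception_v3

def build_backbone(attn_factory=nn.Identity) -> nn.Sequential:
    """Truncated GoogLeNet/Inception-v3 trunk with attention modules interleaved."""
    net = inception_v3(weights=None, aux_logits=False)
    return nn.Sequential(
        # stem: the convolution and pooling layers in front of the Inception blocks
        net.Conv2d_1a_3x3, net.Conv2d_2a_3x3, net.Conv2d_2b_3x3, net.maxpool1,
        net.Conv2d_3b_1x1, net.Conv2d_4a_3x3, net.maxpool2,
        # three Inception blocks are kept, each followed by an attention module
        net.Mixed_5b, attn_factory(),
        net.Mixed_5c, attn_factory(),
        net.Mixed_5d, attn_factory(),
    )
```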
The graph attention module of S2 is constructed as follows:
S2.1. Build graphs from the feature maps of the template frame and the search frame, taking each $1\times1\times C$ part of a feature map as a node, and construct the corresponding bipartite graph $G=(V,E)$, where the node set $V$ is composed of the nodes $z_i$ of the template subgraph $V_z$ and the nodes $x_j$ of the search subgraph $V_x$, and $E$ is the edge set, $E=\{e_{ij}\mid \forall z_i\in V_z,\ \forall x_j\in V_x\}$;
S2.2. According to the constructed bipartite graph $G$, compute the similarity between the nodes of $V_z$ and $V_x$, using three graph attention modules that operate on the two graphs respectively to obtain the corresponding similarity maps;
S2.3. Normalize the three similarity maps $e^k_{ij}$ ($k=1,2,3$) by softmax to obtain the attention $\alpha^k_{ij}$ of the nodes in $V_z$ toward the nodes in $V_x$, so that any node $j$ in $V_x$ obtains its aggregated feature $m^k_j=\sum_i \alpha^k_{ij}\,W^k_v z_i$;
S2.4. Fuse the obtained aggregated features $m^k_j$ with the linearized features of the corresponding nodes in $V_x$ to obtain the feature representation $\hat{x}^k_j=\mathrm{cat}(m^k_j,\,W^k_q x_j)$;
S2.5. Through the above operations, the feature representations $\hat{x}^k_j$ of all nodes $j$ are obtained, together with the corresponding three complete feature maps $F_k$, which are fused to obtain the final feature representation used for subsequent localization and tracking. A code sketch of this module is given below.
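A minimal PyTorch sketch of S2.1 to S2.5, under the assumptions that the three graph attention heads share one structure, that node features are the $1\times1\times C$ columns of the feature maps, and that the linearizations $W_z$, $W_x$, $W_v$, $W_q$ are $1\times1$ convolutions; all class and layer names are illustrative:

```python
import torch
import torch.nn as nn

class GraphAttentionHead(nn.Module):
    """One head: e_ij = (Wz zi)^T (Wx xj), softmax over template nodes i,
    aggregation m_j = sum_i a_ij * Wv zi, then cat(m_j, Wq xj)."""
    def __init__(self, c: int):
        super().__init__()
        self.wz = nn.Conv2d(c, c, 1)   # linearizes template nodes (1x1 conv)
        self.wx = nn.Conv2d(c, c, 1)   # linearizes search nodes
        self.wv = nn.Conv2d(c, c, 1)   # value projection of template nodes
        self.wq = nn.Conv2d(c, c, 1)   # projection of search nodes kept in the cat()

    def forward(self, z, x):           # z: (N, C, Hz, Wz), x: (N, C, Hx, Wx)
        n, c = x.shape[:2]
        zi = self.wz(z).flatten(2)                      # (N, C, Nz) template node vectors
        xj = self.wx(x).flatten(2)                      # (N, C, Nx) search node vectors
        e = torch.einsum('nci,ncj->nij', zi, xj)        # bipartite similarities e_ij
        a = e.softmax(dim=1)                            # attention of V_z nodes toward each x_j
        v = self.wv(z).flatten(2)                       # (N, C, Nz)
        m = torch.einsum('nci,nij->ncj', v, a)          # aggregated features m_j
        out = torch.cat([m, self.wq(x).flatten(2)], 1)  # cat(m_j, Wq xj): (N, 2C, Nx)
        return out.view(n, 2 * c, *x.shape[2:])

class MultiGraphAttention(nn.Module):
    """Three heads in parallel; their maps F1..F3 are fused by a 1x1 convolution (S2.5)."""
    def __init__(self, c: int):
        super().__init__()
        self.heads = nn.ModuleList(GraphAttentionHead(c) for _ in range(3))
        self.fuse = nn.Conv2d(3 * 2 * c, c, 1)

    def forward(self, z, x):
        return self.fuse(torch.cat([h(z, x) for h in self.heads], dim=1))
```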
The tracking head network of S3 is divided into a classification branch and a regression branch: the classification branch distinguishes the category of the target and localizes it, while the regression branch regresses the target's bounding box to obtain the scale information of the target.
The response map produced by the classification branch is $R_{cls}\in\mathbb{R}^{h\times w\times 2}$, where $R$ denotes the response map, $h$ and $w$ denote its height and width, and 2 denotes its number of channels; the two channels store the classification score of every pixel, namely the probability of being a positive sample and the probability of being a negative sample.
The final response map of the regression branch is $R_{reg}\in\mathbb{R}^{h\times w\times 4}$, whose pixels correspond one-to-one with the pixels of the classification response map. The four channels corresponding to each point $(i,j)$ contain the distances from that point to the four sides of the bounding box, denoted $t(i,j)=(l,t,r,b)$, where $t(i,j)$ is the set of four channels corresponding to $(i,j)$ and $l$, $t$, $r$, $b$ are the distances from the point to the left, top, right and bottom sides of the bounding box respectively.
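For illustration, an anchor-free head with these output shapes might be sketched as follows; the tower depth and the choice to share the classification tower with the centerness output are assumptions, in the spirit of SiamCAR-style heads:

```python
import torch.nn as nn

class AnchorFreeHead(nn.Module):
    """Outputs R_cls (2 channels), R_cen (1 channel) and R_reg (4 channels: l, t, r, b)."""
    def __init__(self, c: int):
        super().__init__()
        def tower():
            return nn.Sequential(nn.Conv2d(c, c, 3, padding=1),
                                 nn.BatchNorm2d(c), nn.ReLU(inplace=True))
        self.cls_tower, self.reg_tower = tower(), tower()
        self.cls_out = nn.Conv2d(c, 2, 3, padding=1)  # per-pixel positive/negative scores
        self.cen_out = nn.Conv2d(c, 1, 3, padding=1)  # per-pixel centerness score
        self.reg_out = nn.Conv2d(c, 4, 3, padding=1)  # distances to the four box sides

    def forward(self, f):
        ct, rt = self.cls_tower(f), self.reg_tower(f)
        # the final ReLU keeps the regressed distances non-negative
        return self.cls_out(ct), self.cen_out(ct), self.reg_out(rt).relu()
```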
The classification branch and the centerness branch use a cross-entropy loss function to measure the accuracy of the classification and of the centerness score respectively, and the regression branch uses an IoU loss function. The final loss of the whole network, $L$, is expressed as:
$L=\lambda_1 L_{cls}+\lambda_2 L_{cen}+\lambda_3 L_{reg}$,
where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are set to 1, 1 and 2 respectively, and $L_{cls}$, $L_{cen}$ and $L_{reg}$ denote the classification loss, the centerness loss and the regression loss.
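A hedged sketch of this combined loss follows; the positive-pixel masking and the exact IoU-loss form are not specified in the text and are assumptions here (assumed shapes: `cls_logits` (N, 2, h, w), `cls_gt` (N, h, w) long, `cen_logits`/`cen_gt`/`pos_mask` (N, h, w), `reg_pred`/`reg_gt` (N, h, w, 4)):

```python
import torch
import torch.nn.functional as F

def iou_loss(pred, target, eps=1e-6):
    """-log IoU for (l, t, r, b) distance vectors of shape (M, 4); an assumed formulation."""
    inter_w = torch.minimum(pred[:, 0], target[:, 0]) + torch.minimum(pred[:, 2], target[:, 2])
    inter_h = torch.minimum(pred[:, 1], target[:, 1]) + torch.minimum(pred[:, 3], target[:, 3])
    inter = inter_w.clamp(min=0) * inter_h.clamp(min=0)
    area_p = (pred[:, 0] + pred[:, 2]) * (pred[:, 1] + pred[:, 3])
    area_g = (target[:, 0] + target[:, 2]) * (target[:, 1] + target[:, 3])
    iou = inter / (area_p + area_g - inter + eps)
    return -torch.log(iou + eps).mean()

def total_loss(cls_logits, cen_logits, reg_pred, cls_gt, cen_gt, reg_gt, pos_mask):
    """L = lambda1*L_cls + lambda2*L_cen + lambda3*L_reg with (1, 1, 2) as in the text."""
    l_cls = F.cross_entropy(cls_logits, cls_gt)             # two-class cross entropy per pixel
    l_cen = F.binary_cross_entropy_with_logits(cen_logits[pos_mask], cen_gt[pos_mask])
    l_reg = iou_loss(reg_pred[pos_mask], reg_gt[pos_mask])  # regress positive pixels only
    return 1.0 * l_cls + 1.0 * l_cen + 2.0 * l_reg
```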
The response map of the centerness branch in S4 is $R_{cen}\in\mathbb{R}^{h\times w\times 1}$; the centerness score of each pixel is $C(i,j)$, and multiplying $C(i,j)$ by the classification score yields a more accurate target score.
Compared with the prior art, the invention uses the Inception V3 structure of GoogLeNet and modifies it to better suit the proposed model, reducing the training parameters; combining it with the SimAM attention mechanism greatly improves the extraction of target features under complex backgrounds and target blur without adding new parameters, which improves the accuracy of the subsequent target localization. By constructing multiple bipartite graphs over the feature maps of the template branch and the search branch, the traditional global matching centered on the whole template picture is converted into local feature matching, which effectively alleviates inaccurate feature matching when the target deforms or is occluded during tracking, improves the classification accuracy of each pixel of the feature map, and thus improves the tracking accuracy of the tracker.
Drawings
FIG. 1 is a technical flow chart of the present invention.
Fig. 2 is an overall block diagram of the present invention.
FIG. 3 is a schematic diagram of the SimAM attention mechanism of the present invention.
FIG. 4 is a block diagram of the graph attention module of the present invention.
FIG. 5 is a graph comparing the precision of the present invention and existing tracking algorithms on UAV123.
FIG. 6 is a graph comparing the success rate of the present invention and existing tracking algorithms on UAV123.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer and more complete, the technical solutions of the present invention are described below clearly, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Some of the English terms used in the invention are explained here. GoogLeNet: a deep-learning network architecture. Inception V3: a neural-network structure. SimAM: a three-dimensional attention mechanism. Inception A, Inception B, Inception C: specific network modules in GoogLeNet. Padding: filling. IoU (intersection over union): a criterion measuring how accurately an object is detected in a particular dataset, computed by dividing the overlap of two regions by their union and comparing the result against a set threshold. UAV123: a dataset for testing tracker performance. CNN: convolutional neural network. Ground truth: the manually marked extent of the object to be detected in the training-set images. ResNet: a residual neural network. AlexNet: a deep-learning network architecture. GOT10K: a dataset for testing tracker performance. COCO, ImageNet DET, ImageNet VID and YouTube-BB: common target-tracking training sets used to train the network. SiamGAT, SiamCAR, KCF, Ocean-online, CFNet, MDNet, ECO, SiamFC, SPM, SiamRPN++, SiamFC++, CGACD, SiamBAN, SiamRPN, SiamDW: relatively advanced tracking algorithms in the target-tracking field.
The technical process of the invention is shown in FIG. 1. The overall network of the model consists of a feature-extraction module, a graph attention module and a tracking head network. The feature-extraction module consists of two weight-sharing CNNs that extract the features of the template picture and of the search region respectively; the graph attention module computes the similarity between the template picture and the search region and embeds the template's feature information into the search region; the tracking head network consists of classification and regression branches and localizes and tracks the target. The Siamese network structure of the invention is shown in Table 1 and FIG. 2.
TABLE 1
Figure 21051DEST_PATH_IMAGE081
The SimAM attention mechanism is inspired by the attention mechanism of the human brain and can derive 3-D attention weights for a feature map without additional parameters, as shown in FIG. 3. In neuroscience, information-rich neurons typically exhibit firing patterns different from those of peripheral neurons, and the activation of such a neuron typically suppresses the peripheral neurons, a phenomenon known as spatial-domain suppression. Neurons with a spatial-domain suppression effect should therefore be given higher importance. To find these neurons, one can measure the linear separability between a target neuron and the other neurons. Based on these findings, SimAM defines an energy function whose minimization is equivalent to training the linear separability between a neuron $t$ and the other neurons in the same channel; using binary labels and adding a regularization term yields the final energy function. The lower the energy, the more the neuron $t$ differs from the peripheral neurons and the higher its importance, so the significance of a neuron can be measured by $1/e^*_t$, where $e^*_t$ denotes its minimal energy. Guided by the energy function and the importance of the mined neurons, the features are enhanced as defined by the attention mechanism. The whole feature-extraction process can be represented as
$F_z=\varphi(z),\quad F_x=\varphi(x),$
where $\varphi$ denotes the convolutional backbone, $z$ and $x$ denote the inputs of the template branch and the search branch respectively, and $F_z$ and $F_x$ denote the feature maps obtained after feature extraction by Inception V3.
The invention uses three graph attention modules to operate on the graph respectively; the resulting similarity maps can be expressed as
$e^k_{ij}=\left(W^k_z z_i\right)^{\mathrm T}\left(W^k_x x_j\right),\quad k=1,2,3,$
where $z_i$ and $x_j$ denote the node vectors of $V_z$ and $V_x$ respectively, and $W^k_z$ and $W^k_x$ are $1\times1$ convolutions that linearize the node vectors.
A moving target is often exposed to illumination changes, motion blur and similar disturbances. To address this, the method builds bipartite graphs from the features of the template picture and the features of the search region, establishes local relations between the nodes, and then performs the similarity computation through multiple graph attention modules; the detailed process is shown in FIG. 4.
The aggregated features are $m^k_j=\sum_i \alpha^k_{ij}\,W^k_v z_i$, $k=1,2,3$. The obtained aggregated features $m^k_j$ are fused with the linearized features of the corresponding nodes in $V_x$ to obtain the more expressive features
$\hat{x}^k_j=\mathrm{cat}\!\left(m^k_j,\;W^k_q x_j\right),\quad k=1,2,3,$
where $\mathrm{cat}$ denotes the concatenation of features.
Through the above operations, the feature representations $\hat{x}^k_j$ of all nodes $j$ are obtained, together with the corresponding three complete feature maps $F_k$, which are fused to obtain the final feature representation used for subsequent localization and tracking:
$F=\mathrm{Conv}_{1\times1}\!\left(\mathrm{cat}(F_1,F_2,F_3)\right),$
where $\mathrm{cat}$ denotes the channel-wise concatenation of the three feature maps $F_1$, $F_2$, $F_3$, and the feature information is then fused by a convolution kernel of size $1\times1$.
To speed up the regression network, the classification branch adopts the cross-entropy loss function and the regression branch adopts the IoU loss function. The top-left and bottom-right corners of the target's bounding box are denoted $(x_0,y_0)$ and $(x_1,y_1)$ respectively. The distances from any point $(x,y)$ in the search region to the sides of the bounding box can then be expressed as
$l=x-x_0,\quad t=y-y_0,\quad r=x_1-x,\quad b=y_1-y,$
where $l$ is the distance from the point to the left side, $r$ to the right side, $t$ to the top side and $b$ to the bottom side of the bounding box.
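As a small sketch (the mapping from response-map pixels to image coordinates is omitted and assumed given), the regression target at each point follows directly from these definitions:

```python
import torch

def regression_targets(xs, ys, box):
    """Distances (l, t, r, b) from points (xs, ys) to the sides of box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return torch.stack([xs - x0,    # l: distance to the left side
                        ys - y0,    # t: distance to the top side
                        x1 - xs,    # r: distance to the right side
                        y1 - ys],   # b: distance to the bottom side
                       dim=-1)
```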
The difference between the ground-truth bounding box and the predicted box is computed through the IoU loss function, and the target box is regressed.
Investigation shows that the score of the classification branch does not always indicate the target's position accurately, and that most high-quality target boxes are generated near the center of the target. The invention therefore adds a centerness branch to the classification branch to further evaluate the classification score. The response map of the centerness branch is $R_{cen}\in\mathbb{R}^{h\times w\times 1}$, and the centerness score $C(i,j)$ of each pixel is expressed as
$C(i,j)=\mathbb{I}(i,j)\times\sqrt{\dfrac{\min(l,r)}{\max(l,r)}\times\dfrac{\min(t,b)}{\max(t,b)}},$
where $\mathbb{I}(i,j)$ is the indicator function of the target, equal to 1 when $(i,j)$ falls inside the target's bounding box and 0 otherwise. Multiplying $C(i,j)$ by the classification score yields a more accurate target score, which makes the localization more precise.
The method of the invention was tested experimentally on GOT10K and UAV123 and compared with some of the currently more advanced trackers. For the comparison on the UAV123 dataset, the model of the invention was trained on only one dataset, GOT10K, whereas the other trackers were trained on four datasets: COCO, ImageNet DET, ImageNet VID and YouTube-BB.
UAV123 contains 123 fully annotated high-definition videos captured from a low-altitude aerial perspective, together with benchmarks. It covers many attributes, including aspect-ratio change, background clutter, camera motion, fast motion, full occlusion, illumination variation, low resolution, out-of-view, partial occlusion, similar objects, scale variation and viewpoint change, and can therefore test the comprehensive performance of a tracker well. The GOT10K test set consists of 180 video sequences covering 84 classes of moving objects and 32 forms of motion, which brings the test experiments closer to reality and allows a better evaluation of tracker performance.
The tracker of the invention was tested and evaluated on GOT10K together with advanced trackers such as SiamGAT, SiamCAR, KCF and Ocean; the final results are shown in Table 2.
TABLE 2
[Table 2 is rendered only as an image in the original publication and is not reproduced here.]
AO denotes the average overlap between the tracker's predicted boxes and the ground-truth target boxes; $SR_{0.5}$ and $SR_{0.75}$ denote the proportion of successfully tracked frames in which the overlap between the predicted box and the ground-truth box exceeds 50% and 75% respectively, which evaluates the tracker's precision more accurately. The table shows that the tracker of the invention achieves good results in overall performance.
Comparing the tracker of the invention with advanced trackers such as Ocean, SiamRPN++ and SiamCAR, the resulting precision plot and success plot are shown in FIG. 5 and FIG. 6 respectively. FIG. 5 is the OPE precision plot on UAV123, covering precision against the location-error threshold. The figures show that, even though a smaller training set was used, the model of the invention has clear advantages in both precision and accuracy.
The test results on the GOT10K and UAV123 datasets show that the tracker of the invention achieves a marked improvement in comprehensive performance, which verifies the effectiveness of the proposed algorithm.
Although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that: it is to be understood that modifications may be made to the technical solutions described in the foregoing embodiments, or some or all of the technical features may be equivalently replaced, and the modifications or the replacements may not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A target tracking algorithm based on a multi-graph attention mechanism, characterized by comprising:
S1. taking the first frame of a video and each subsequent frame as the input of a template branch and a search branch respectively, and extracting features from both through a Siamese network;
S2. feeding the output features obtained in S1 into a graph attention module for a cross-correlation operation;
S3. feeding the output obtained in S2 into an anchor-free tracking head network, obtaining the classification score of each pixel of the feature map through a classification branch, the distance relation between each pixel and the target center through a centerness branch, and the bounding-box information corresponding to each pixel through a regression branch;
S4. multiplying the classification score obtained in S3 by the centerness branch to obtain a refined classification score, and finding the pixel with the highest score and its corresponding bounding-box information to obtain the position of the target in the current frame;
S5. repeating S1 to S4 until the position of the target is obtained in all subsequent frames of the video.
2. The target tracking algorithm based on a multi-graph attention mechanism according to claim 1, characterized in that the Siamese network in S1 is a weight-sharing GoogLeNet using the Inception V3 structure, combined with the SimAM attention mechanism, with the following specific operations:
adjusting the Inception V3 structure of GoogLeNet so that only the convolution and pooling layers in front of the Inception blocks and the three modules Inception A, Inception B and Inception C are used, while the following Inception modules and other network layers are not used;
adding attention modules, with a SimAM attention module placed after each of the three Inception modules and additional SimAM attention modules following the first and third Inception modules.
3. The target tracking algorithm based on a multi-graph attention mechanism according to claim 1, characterized in that the graph attention module of S2 is constructed as follows:
S2.1. building graphs from the feature maps of the template frame and the search frame, taking each $1\times1\times C$ part of a feature map as a node, and constructing the corresponding bipartite graph $G=(V,E)$, where the node set $V$ is composed of the nodes $z_i$ of the template subgraph $V_z$ and the nodes $x_j$ of the search subgraph $V_x$, and $E$ is the edge set, $E=\{e_{ij}\mid \forall z_i\in V_z,\ \forall x_j\in V_x\}$;
S2.2. according to the constructed bipartite graph $G$, computing the similarity between the nodes of $V_z$ and $V_x$, using three graph attention modules that operate on the two graphs respectively to obtain the corresponding similarity maps;
S2.3. normalizing the three similarity maps $e^k_{ij}$ ($k=1,2,3$) by softmax to obtain the attention $\alpha^k_{ij}$ of the nodes in $V_z$ toward the nodes in $V_x$, so that any node $j$ in $V_x$ obtains its aggregated feature $m^k_j=\sum_i \alpha^k_{ij}\,W^k_v z_i$;
S2.4. fusing the obtained aggregated features $m^k_j$ with the linearized features of the corresponding nodes in $V_x$ to obtain the feature representation $\hat{x}^k_j=\mathrm{cat}(m^k_j,\,W^k_q x_j)$;
S2.5. through the above operations, obtaining the feature representations $\hat{x}^k_j$ of all nodes $j$ and the corresponding three complete feature maps $F_k$, which are fused to obtain the final feature representation used for subsequent localization and tracking.
4. The target tracking algorithm based on a multi-graph attention mechanism according to claim 1, characterized in that the tracking head network of S3 is divided into a classification branch and a regression branch, the classification branch distinguishing the category of the target and localizing it, and the regression branch regressing the target's bounding box to obtain the scale information of the target;
the response map produced by the classification branch is $R_{cls}\in\mathbb{R}^{h\times w\times 2}$, where $R$ denotes the response map, $h$ and $w$ denote its height and width, and 2 denotes its number of channels, the two channels storing the classification score of every pixel, namely the probability of being a positive sample and the probability of being a negative sample;
the final response map of the regression branch is $R_{reg}\in\mathbb{R}^{h\times w\times 4}$, whose pixels correspond one-to-one with the pixels of the classification response map, the four channels corresponding to each point $(i,j)$ containing the distances from that point to the four sides of the bounding box, denoted $t(i,j)=(l,t,r,b)$, where $t(i,j)$ is the set of four channels corresponding to $(i,j)$ and $l$, $t$, $r$, $b$ are the distances from the point to the left, top, right and bottom sides of the bounding box respectively.
5. The target tracking algorithm based on a multi-graph attention mechanism according to claim 3, characterized in that the classification branch and the centerness branch use a cross-entropy loss function to measure the accuracy of the classification and of the centerness score respectively, and the regression branch uses an IoU loss function; the final loss of the whole network, $L$, is expressed as:
$L=\lambda_1 L_{cls}+\lambda_2 L_{cen}+\lambda_3 L_{reg}$,
where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are set to 1, 1 and 2 respectively, and $L_{cls}$, $L_{cen}$ and $L_{reg}$ denote the classification loss, the centerness loss and the regression loss.
6. The target tracking algorithm based on a multi-graph attention mechanism according to claim 5, characterized in that the response map of the centerness branch of S4 is $R_{cen}\in\mathbb{R}^{h\times w\times 1}$, the centerness score of each pixel is $C(i,j)$, and multiplying $C(i,j)$ by the classification score yields a more accurate target score.
CN202211438781.3A 2022-11-17 2022-11-17 Target tracking algorithm based on multi-graph attention mechanism Active CN115578421B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211438781.3A CN115578421B (en) 2022-11-17 2022-11-17 Target tracking algorithm based on multi-graph attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211438781.3A CN115578421B (en) 2022-11-17 2022-11-17 Target tracking algorithm based on multi-graph attention mechanism

Publications (2)

Publication Number Publication Date
CN115578421A true CN115578421A (en) 2023-01-06
CN115578421B CN115578421B (en) 2023-03-14

Family

ID=84589711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211438781.3A Active CN115578421B (en) 2022-11-17 2022-11-17 Target tracking algorithm based on multi-graph attention mechanism

Country Status (1)

Country Link
CN (1) CN115578421B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090043818A1 (en) * 2005-10-26 2009-02-12 Cortica, Ltd. Signature generation for multimedia deep-content-classification by a large-scale matching system and method thereof
CN113256677A (en) * 2021-04-16 2021-08-13 浙江工业大学 Method for tracking visual target with attention
CN114707604A (en) * 2022-04-07 2022-07-05 江南大学 Twin network tracking system and method based on space-time attention mechanism
CN114821390A (en) * 2022-03-17 2022-07-29 齐鲁工业大学 Twin network target tracking method and system based on attention and relationship detection
CN115187629A (en) * 2022-05-24 2022-10-14 浙江师范大学 Method for fusing target tracking features by using graph attention network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIN-KUI: "A visual attention model for robot object tracking", International Journal of Automation & Computing *
DONG JIFU ET AL.: "Online adaptive Siamese network tracking algorithm based on an attention mechanism", Laser & Optoelectronics Progress *

Also Published As

Publication number Publication date
CN115578421B (en) 2023-03-14

Similar Documents

Publication Publication Date Title
Zhang et al. SCSTCF: spatial-channel selection and temporal regularized correlation filters for visual tracking
Liu et al. Overview and methods of correlation filter algorithms in object tracking
Liu et al. Small traffic sign detection from large image
CN109165540B (en) Pedestrian searching method and device based on prior candidate box selection strategy
Peng et al. Rgb-t crowd counting from drone: A benchmark and mmccn network
CN107871106A (en) Face detection method and device
CN111625667A (en) Three-dimensional model cross-domain retrieval method and system based on complex background image
CN109492596B (en) Pedestrian detection method and system based on K-means clustering and regional recommendation network
CN113256677A (en) Method for tracking visual target with attention
CN112884742A (en) Multi-algorithm fusion-based multi-target real-time detection, identification and tracking method
Wang et al. Pyramid-dilated deep convolutional neural network for crowd counting
Wang et al. AutoScaler: Scale-attention networks for visual correspondence
CN111881804A (en) Attitude estimation model training method, system, medium and terminal based on joint training
Cao et al. FDTA: Fully convolutional scene text detection with text attention
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
Jin et al. The Open Brands Dataset: Unified brand detection and recognition at scale
Tong et al. Transformer based line segment classifier with image context for real-time vanishing point detection in manhattan world
Ma et al. Robust line segments matching via graph convolution networks
CN114492634A (en) Fine-grained equipment image classification and identification method and system
Liu et al. Graph matching based on feature and spatial location information
CN117557804A (en) Multi-label classification method combining target structure embedding and multi-level feature fusion
Fan et al. Generating high quality crowd density map based on perceptual loss
CN115578421B (en) Target tracking algorithm based on multi-graph attention machine mechanism
Japar et al. Coherent group detection in still image
CN114863132A (en) Method, system, equipment and storage medium for modeling and capturing image spatial domain information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant