CN114926652A - Twin tracking method and system based on interactive and convergent feature optimization - Google Patents

Twin tracking method and system based on interactive and convergent feature optimization Download PDF

Info

Publication number
CN114926652A
CN114926652A CN202210600748.XA CN202210600748A CN114926652A CN 114926652 A CN114926652 A CN 114926652A CN 202210600748 A CN202210600748 A CN 202210600748A CN 114926652 A CN114926652 A CN 114926652A
Authority
CN
China
Prior art keywords
features
template
layer
module
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210600748.XA
Other languages
Chinese (zh)
Inventor
陈思
许瑞
王大寒
朱顺痣
吴芸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University of Technology
Original Assignee
Xiamen University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University of Technology filed Critical Xiamen University of Technology
Priority to CN202210600748.XA priority Critical patent/CN114926652A/en
Publication of CN114926652A publication Critical patent/CN114926652A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a twin tracking method and a twin tracking system based on interactive and convergent feature optimization, wherein the method comprises the following steps: initializing a template image and a search area image; constructing a feature extraction network to obtain template multilayer features and search area multilayer features; constructing a gated double-view aggregation module to optimize the characteristics of the multilayer template; constructing a semantic-guided attention module to realize coarse-grained feature optimization of a search area; constructing a correlation graph aggregation module to realize fine-grained feature optimization of a search area; and constructing a head prediction network, and predicting the position of the target of the current frame. According to the method and the system, the target significant features are enhanced through self-attention aggregation and interaction of the template features and the search area features, and background noise is suppressed, so that a more stable, robust and accurate tracking result is obtained in a challenging scene.

Description

Twin tracking method and system based on interaction and convergent feature optimization
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a twin tracking method and system based on interaction and convergent feature optimization.
Background
In the field of computer vision, target tracking is one of the most important and active research subjects, and has wide application in the aspects of unmanned driving, intelligent security, human-computer interaction, unmanned aerial vehicles and the like. For a single target tracker or tracking system, it is intended to continuously predict the spatial position of an object in a subsequent video sequence based on the given arbitrary object coordinate information in the first frame.
In recent years, the application of twin networks in the field of target tracking has been greatly developed, which not only benefits from deep learning to obtain better feature expression, but also can realize real-time tracking speed through means such as parameter sharing and offline training, and thus the twin networks are becoming the mainstream of the research in the tracking field. The basic idea of the twin network-based tracking algorithm is as follows: and taking a target area corresponding to the real frame in the first frame of the video as a template, and taking the subsequent frame as a search area. And in the tracking process, the area most similar to the template is matched in the search area and is used as the predicted position of the target of the current frame. SiamFC (berthinetto L, Valmadre J, henqueries J F, et al. full-conditional parameters networks for object tracking. proceedings of the European Conference reference Computer Vision works, 2016, pp.850-865.) and SiamRPN (Li B, Yan J, Wu W, et al. high performance video tracking with parameter registration network protocol) get an appearance model by training offline on a large data set and do not perform parameter updates while tracking online, thus not only is tracker accuracy high, but also has advantages in particular in speed. However, due to the fixed template, the tracker is not particularly sensitive to changes in the appearance of the target and is also susceptible to interference from similar and complex backgrounds. In order to accommodate changes in the appearance of the target, CFNet (Valmadre J, Bertonitto L, Henriques JF, et al. End-to-end representation learning for correlation filter based tracking. Procedings of the IEEE Conference on Computer Vision and Pattern registration.2017, pp.5000-5008.) and RASNet (Wang Q, Teng Z, Xing J, et al. learning entries: identification Z, calculation aspect position for high performance on line Vision tracking. tracking IEEE Conference Computer Vision and Pattern registration.2018, pp.4854-4863) embed the relevant template and local branch attention module, respectively, by updating the local module with the filter parameters. GradNet (Li P, ChenB, Ouyang W, et al.GradNet: Gradient-defined network for visual object tracking. proceedings of the IEEE International Conference on computer Vision.2019, pp.6161-6170.) and UpdateNet (Zhang L, Abel G, Joost V, et al.learning the model data for parameter tracking. proceedings of IEEE International Conference on computer Vision.2019, pp.4009-4018.) template parameter updates are achieved using a process of web iterative learning. Relative to GradNet and UpdateNet, which directly update parameters on the first frame template, MemDTC (Yang T, Antoni B. visual tracking via dynamic network, IEEE Transactions on Pattern Analysis and Machine analysis.2021, pp.360-374) stores reliable target templates in the tracking process by adding a memory unit, so that effective information of the first frame template can be completely saved, and quick recovery of the tracker in case of drift can be facilitated. In addition, in order to improve the discrimination ability of the twin tracker for similar and complex backgrounds, an interference sensing module capable of incremental learning is designed on line by DaSiamRPN (Zhu Z, Wang Q, Li B, et al. Disfractor-aware networks for visual object tracking. proceedings of the European Conference on Computer Vision.2018, pp.103-119.). Nocal-Sim (Tan H, Zhang X, Zhang Z, et al. Nocal-Sim: learning visual characteristics and response with advanced non-local blocks for real-time parameter tracking. IEEE Transactions on Image processing.2021, pp.30:2656 and 2668.) utilizes the remote dependence of non-local attention to enhance the learning of feature weights associated with the target. siamDW (Zhang Z, Peng H. De and wire dimension networks for real-time visual tracking. proceedings of the IEEE Conference on computer Vision and Pattern registration.2019, pp.4591-4600.) designs a deeper and wider network architecture for the twin tracker, and further excavates the feature extraction and discrimination capability of the depth network.
Although the twin tracking algorithm has made some progress in how to design deeper and wider backbone networks, better matching methods, more accurate output representation, and more efficient online updating mechanisms, a more effective solution for scenes such as similar interference, complex background, and occlusion is still lacking.
Disclosure of Invention
The invention aims to provide a twin tracking method and system based on interactive and convergent feature optimization, which are beneficial to obtaining more stable, robust and accurate tracking results in a complex environment.
In order to realize the purpose, the invention adopts the technical scheme that: a twin tracking method based on interactive and convergent feature optimization comprises the following steps:
s1, initializing a template image and a search area image;
s2, constructing a feature extraction network, inputting a template image and a search area image, and acquiring corresponding template multilayer features F z And search region Multi-layer feature F x
S3, constructing a gating double-view aggregation module GDA to optimize template multilayer characteristics, and constructing template multilayer characteristics F z Inputting the GDA module to obtain optimized template multilayer characteristics
Figure BDA0003669794020000031
S4 attention module for constructing semantic guidanceSGA to achieve coarse-grained feature optimization of search regions
Figure BDA0003669794020000032
And search region Multi-layer feature F x Inputting into SGA module to obtain coarse-grained optimized search region characteristics
Figure BDA0003669794020000033
S5, constructing a correlation graph aggregation module (CGA) to realize fine-grained feature optimization of the search area
Figure BDA0003669794020000034
And
Figure BDA0003669794020000035
inputting the CGA module to obtain fine-grained optimized search region characteristics
Figure BDA0003669794020000036
S6, constructing a head prediction network
Figure BDA0003669794020000037
And
Figure BDA0003669794020000038
and inputting and predicting the position of the target of the current frame.
Further, the specific implementation method of step S1 is:
cutting out a template image with the size of 3 multiplied by 127 on the first frame image according to a target real boundary frame given by the first frame; starting from the second frame, the search area image with the size of 3 × 255 × 255 is cut out with the target prediction bounding box center coordinates of the previous frame as a reference point.
Further, the specific implementation method of step S2 is as follows:
adopting ResNet-50 as a feature extraction network, taking a template image and a search area image as input, and acquiring template multilayer features
Figure BDA0003669794020000039
And search area multi-layer features
Figure BDA00036697940200000310
Where l represents the total number of layers of extracted template or search area features,
Figure BDA00036697940200000311
respectively representing the template features and the search region features of the ith layer, i belongs to [1, l ∈ [)]。
Further, the specific implementation method of step S3 is:
the GDA module comprises three sub-modules, namely a local view attention LA, a global view attention GA and a convergence gate; the LA module is used for highlighting high-frequency information of a local visual angle; for single layer template features of size C H W
Figure BDA00036697940200000312
Local visual angle attention feature
Figure BDA00036697940200000313
Expressed as:
Figure BDA00036697940200000314
wherein, W 2 Is a learnable convolution parameter of size
Figure BDA00036697940200000315
And
Figure BDA00036697940200000316
where r denotes the channel compression parameter;
Figure BDA00036697940200000317
representing batch normalization; sigma represents a sigmoid function;
Figure BDA00036697940200000318
represents a bit-wise multiplication; high frequency characteristics
Figure BDA00036697940200000319
Obtained by subtracting the local mean, expressed as:
Figure BDA00036697940200000320
Figure BDA0003669794020000041
in the formula, W 1 Are learnable convolution parameters;
Figure BDA0003669794020000042
is a warp W 1 Features of the convolution map; AvgPool (·) denotes average pooling for obtaining average signal strength of local areas; ks and s denote window size and step size, respectively; δ denotes the nonlinear activation function, here with ReLU;
the LA module focuses on a fixed receptive field and aggregates information of a local area through convolution operation; the GA module is used for aggregating global information of different receptive fields through interaction of multilayer features; characteristic F ═ x for a set of l layers 1 ,x 2 ,...,x l For any two-layer feature }
Figure BDA0003669794020000043
And
Figure BDA0003669794020000044
first, three convolutional layers, θ (-), φ (-), and g (-), are used for x i Linear mapping is carried out to obtain feature maps of ' query ', ' key ' and ' value
Figure BDA0003669794020000045
Figure BDA0003669794020000046
And
Figure BDA0003669794020000047
namely, it is
Q=θ(x i )
K 1 =φ(x i )
V 1 =g(x i )
At the same time, feature x j Sharing convolutional layers phi (-) and g (-) to obtain corresponding feature maps
Figure BDA0003669794020000048
And
Figure BDA0003669794020000049
namely, it is
K 2 =φ(x j )
V 2 =g(x j )
Then, the 'key' and 'value' of each layer are spliced together respectively to obtain a global representation of the multilayer characteristic
Figure BDA00036697940200000410
And
Figure BDA00036697940200000411
where l × H × W, where l denotes the total number of feature layers queried; the global features K and V are then represented as:
K=[φ(x i )||φ(x j )]
V=[g(x i )||g(x j )]
wherein, [ | · ] represents that the characteristic is spliced according to the space dimension;
thus, the attention feature y is derived from a standard non-local attention formula i Expressed as:
Figure BDA0003669794020000051
in the above formula, the first and second carbon atoms are,
Figure BDA0003669794020000052
it represents the softmax operation along the j dimension;
finally, y is transformed by a convolution layer ξ (-) i The number of channels is unified with the original characteristic diagram and is added to the original characteristic diagram in a residual error mode, so that:
Figure BDA0003669794020000053
for template multi-layer features
Figure BDA0003669794020000054
According to the above formula, wherein the i-th layer is characterized
Figure BDA0003669794020000055
The updates through the GA module are represented as:
Figure BDA0003669794020000056
wherein the content of the first and second substances,
Figure BDA0003669794020000057
the LA module and the GA module reduce the redundancy of channels to a certain extent through channel compression parameters r and m and respectively obtain attention characteristics under local and global perspectives
Figure BDA0003669794020000058
And
Figure BDA0003669794020000059
on the basis, the gating mechanism utilizing the aggregation gating module is fused adaptively
Figure BDA00036697940200000510
And
Figure BDA00036697940200000511
thereby enhancing the effective representation of salient features; for input features
Figure BDA00036697940200000512
And
Figure BDA00036697940200000513
firstly, splicing two features together, learning the correlation between the two features by using a convolution layer of 1 multiplied by 1, and obtaining a normalized correlation matrix W by using a sigmoid function gate Let it represent
Figure BDA00036697940200000514
The weight matrix of (1), then
Figure BDA00036697940200000515
Is expressed as 1-W gate (ii) a Then, feature weighting is realized in a mode of multiplying the weight matrix and the features element by element, and finally the obtained optimized features
Figure BDA00036697940200000516
Expressed as:
Figure BDA00036697940200000517
further, the specific implementation method of step S4 is:
optimized features for GDA module output
Figure BDA00036697940200000518
The SGA module extracts global semantic information of spatial dimension layer by layer, generates a target semantic attention matrix, and then compares the target semantic attention matrix with the characteristics of a search area
Figure BDA00036697940200000519
Interacting layer by layer to obtain coarse-grained optimized search region characteristics
Figure BDA00036697940200000520
In particular, for the ith layer template features
Figure BDA00036697940200000521
Generated target semantic attention moment array
Figure BDA0003669794020000061
Expressed as:
Figure BDA0003669794020000062
wherein GAP (·) represents global average pooling in spatial dimensions, and σ is sigmoid function;
then, the SGA module adopts global view attention aggregation multi-layer characteristics, and the global view attention aggregation multi-layer characteristics share parameters with the global view attention in the GDA module to reduce actual calculation amount; through the interaction of target semantic information and the 'inquiry', 'key' and 'value' characteristics of the search area, the ith layer of search area characteristics
Figure BDA0003669794020000063
The resulting Q, K and V are expressed as:
Figure BDA0003669794020000064
Figure BDA0003669794020000065
Figure BDA0003669794020000066
the optimized characteristics obtained by the layer
Figure BDA0003669794020000067
Expressed as:
Figure BDA0003669794020000068
wherein the content of the first and second substances,
Figure BDA0003669794020000069
in the above equation, i represents the current layer and j represents other multi-layer features.
Further, the specific implementation method of step S5 is:
CGA module and
Figure BDA00036697940200000610
and
Figure BDA00036697940200000611
as input, on one hand, calculating the correlation of spatial pixels of a search area and the template overall; on the other hand, local relevance is calculated based on the salient features of the template only; by fusing global and local correlation, the relation of space positions is strengthened by adopting graph convolution, so that the construction of an attention map associated with the target feature is realized; in particular, features are optimized for the template of the ith layer
Figure BDA00036697940200000612
And search area optimization features
Figure BDA00036697940200000613
Firstly, respectively cutting the template along the space and the channel to obtain the space characteristics
Figure BDA00036697940200000614
And channel characteristics
Figure BDA00036697940200000615
Wherein N is 1 =H 1 ×W 1 (ii) a For a certain pixel in the search area, firstly, the correlation between the certain pixel and the template spatial feature is calculated to obtain a spatial correlation diagram S 1 Expressed as:
Figure BDA00036697940200000616
wherein, Corr (·) is a correlation calculation function, and an inner product mode is adopted;
then, at S 1 On the basis, the retrieval of the template global information is realized by calculating the correlation with the channel characteristics, and a correlation graph S obtained by calculation at the moment 2 Expressed as:
Figure BDA0003669794020000071
the global relevance of a certain pixel in the search area to the template is expressed as:
Figure BDA0003669794020000072
Figure BDA0003669794020000073
wherein MaxPool (·) is the maximum pooling operation, ks and s represent the window size and step size, respectively;
the local correlation is then expressed as:
Figure BDA0003669794020000074
finally, fusing the two correlation graphs by adopting a method of adding corresponding elements, and constructing graph relations of the two correlation graphs, thereby enhancing the association of positions; the obtained correlation diagram and the characteristics of the search area are obtained
Figure BDA0003669794020000075
Adding to obtain finer grained optimization features
Figure BDA0003669794020000076
Expressed as:
Figure BDA0003669794020000077
wherein GCN (-) is a two-layer graph convolution network;
Figure BDA0003669794020000078
representing the addition of corresponding elements; further obtaining fine-grained optimized search region characteristics
Figure BDA0003669794020000079
The invention also provides a twin tracking system based on interactive and convergent feature optimization, comprising a memory, a processor and computer program instructions stored on the memory and executable by the processor, the method steps being able to be carried out when the computer program instructions are executed by the processor.
Compared with the prior art, the invention has the following beneficial effects: the method and the system enhance the target salient features and inhibit background noise through self-attention aggregation and interaction of the template features and the search area features. In particular, a novel interaction and aggregation network is employed that includes a gated dual-perspective attention module, a semantically guided attention module, and a dependency graph aggregation module. The gated double-view attention module is used for aggregating the outputs of the local view attention submodule and the global view attention submodule based on a gating mechanism and is used for enhancing the characteristics of target significance and discriminability. The attention module of semantic guidance is to extract semantic information of a target, and the semantic information is used as feature optimization of a priori guidance search area. Further, for the optimized template and the search area characteristics, local similarity and global similarity are respectively constructed in a correlation graph aggregation module, and spatial position relation is strengthened through a graph convolution network.
Drawings
FIG. 1 is a flow chart of a method implementation of an embodiment of the present invention.
FIG. 2 is a result of comparing the accuracy of the method of the present invention with that of other target tracking methods under different attributes in the embodiment of the present invention.
Detailed Description
The invention is further explained by the following embodiments in conjunction with the drawings.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present embodiment provides a twin tracking method based on interactive and convergent feature optimization, which includes the following steps:
s1, initializing the template image and the search area image. The specific implementation method comprises the following steps:
cutting out a template image with the size of 3 multiplied by 127 on the first frame image according to a given target real bounding box (ground-computing bounding box) of the first frame; starting from the second frame, the search area image with the size of 3 × 255 × 255 is cut out with the target prediction bounding box center coordinates of the previous frame as a reference point. The target prediction bounding box refers to the target position predicted by each frame and is given in the form of (x, y, w, h), where (x, y) denotes the center position of the prediction bounding box, and w and h denote the width and height of the prediction bounding box, respectively.
S2, constructing a feature extraction network, inputting the template image and the search area image, and obtaining the corresponding template multilayer features
Figure BDA0003669794020000081
And a search areaDomain multi-layer features
Figure BDA0003669794020000082
The specific implementation method comprises the following steps:
adopting ResNet-50 and its improved network as characteristic extraction network, using template image and search area image as input to obtain template multilayer characteristics
Figure BDA0003669794020000083
And search area multi-layer features
Figure BDA0003669794020000084
Where l represents the total number of layers of extracted template or search area features,
Figure BDA0003669794020000085
respectively represent the ith layer (i e [1, l ]]) Template features and search area features.
S3, constructing a gating double-view aggregation module GDA to optimize template multilayer characteristics
Figure BDA0003669794020000091
Inputting the GDA module to obtain optimized template multilayer characteristics
Figure BDA0003669794020000092
The specific implementation method comprises the following steps:
1) the GDA module includes three sub-modules, local-view attention (LA), global-view attention (GA), and aggregation gate (aggregation gate). The LA module is used for highlighting high-frequency information of a local visual angle. In order to save calculation parameters, a bottleneck structure scheme is adopted for design, and detail characteristics are highlighted in a residual error connection mode, so that the purposes of protecting original characteristics and preventing effective information from being erased by errors are achieved. In addition, for the attribute of convolution local connection, essentially each element of the feature map represents the embedded feature and signal strength of a specific area in the feature map of the previous layer, so that high-frequency discrimination can be obtained by subtracting the average signal strengthAnd (4) sexual information. In the case where the difference between the target and the background is not significant, it is important to express high-frequency information for enhancing the discrimination. Thus, for single layer template features of size C H W
Figure BDA0003669794020000093
Local visual angle attention feature
Figure BDA0003669794020000094
Expressed as:
Figure BDA0003669794020000095
wherein, W 2 Is a learnable convolution parameter of size
Figure BDA0003669794020000096
And
Figure BDA0003669794020000097
where r denotes the channel compression parameter;
Figure BDA0003669794020000098
representing batch normalization; sigma represents a sigmoid function;
Figure BDA0003669794020000099
represents a bit-wise multiplication; high frequency characteristics
Figure BDA00036697940200000910
Obtained by subtracting the local mean, expressed as:
Figure BDA00036697940200000911
Figure BDA00036697940200000912
in the formula, W 1 For learnable volumesA product parameter;
Figure BDA00036697940200000913
is a warp W 1 Features of the convolution map; AvgPool (·) denotes average pooling for obtaining average signal strength of local areas; ks and s represent window size and step size, respectively; δ denotes the nonlinear activation function, here ReLU.
2) The LA module focuses on a fixed receptive field and aggregates information of a local area through convolution operation; and the GA module is used for aggregating the global information of different receptive fields through the interaction of multilayer features. The standard non-local attention captures the dependency relationship of different positions through the interaction at the pixel level, so that the performance of the visual task is greatly improved, but the information interaction of different network layers is not considered, and the importance of different receptive fields on semantic information mining is ignored. In view of the above, the standard non-local attention is extended to cross-layer non-local attention, which aggregates multiple layers of semantic information to the current layer through interaction with different receptive fields, thereby obtaining richer feature representations. For convenience of presentation and general generalization, the characteristic F ═ x for a set of l layers 1 ,x 2 ,...,x l In any two layers of features
Figure BDA0003669794020000101
And
Figure BDA0003669794020000102
first, three convolutional layers, θ (-), φ (-), and g (-), are used for x i Performing linear mapping to obtain feature maps of ' query ', ' key ' and ' value
Figure BDA0003669794020000103
And
Figure BDA0003669794020000104
namely that
Q=θ(x i )
K 1 =φ(x i )
V 1 =g(x i )
At the same time, feature x j Sharing convolutional layers phi (-) and g (-) to obtain corresponding feature maps
Figure BDA0003669794020000105
And
Figure BDA0003669794020000106
namely, it is
K 2 =φ(x j )
V 2 =g(x j )
Then, splicing the key and the value of each layer together respectively to obtain a global representation of the multilayer feature
Figure BDA0003669794020000107
And
Figure BDA0003669794020000108
where S ═ l × H × W, where l denotes the total number of feature layers queried. The global features K and V are then represented as:
K=[φ(x i )||φ(x j )]
V=[g(x i )||g(x j )]
wherein, [ | · ] represents that the features are spliced according to spatial dimensions.
Thus, the attention feature y is derived from a standard non-local attention formula i Expressed as:
Figure BDA0003669794020000109
in the above formula, the first and second carbon atoms are,
Figure BDA00036697940200001010
which represents the softmax operation along the j dimension.
Finally, y is transformed by a convolution layer ξ (-) i The number of channels is unified with the original characteristic diagram and is added to the original characteristic diagram in a residual error mode, so that:
Figure BDA0003669794020000111
for template multi-layer features
Figure BDA0003669794020000112
According to the above formula, wherein the ith layer is characterized
Figure BDA0003669794020000113
The updates through the GA module are represented as:
Figure BDA0003669794020000114
wherein the content of the first and second substances,
Figure BDA0003669794020000115
3) the LA module and the GA module reduce the redundancy of channels to a certain extent through channel compression parameters r and m and respectively obtain attention characteristics under local and global perspectives
Figure BDA0003669794020000116
And
Figure BDA0003669794020000117
on the basis, the gating mechanism utilizing the aggregation gating module is fused adaptively
Figure BDA0003669794020000118
And
Figure BDA0003669794020000119
thereby enhancing the effective representation of the salient features. For input features
Figure BDA00036697940200001110
And
Figure BDA00036697940200001111
firstly, splicing two features together, learning the correlation between the two features by using a 1 multiplied by 1 convolutional layer, and obtaining a normalized correlation matrix W by using a sigmoid function gate Let it represent
Figure BDA00036697940200001112
The weight matrix of (1), then
Figure BDA00036697940200001113
Is expressed as 1-W gate . Then, realizing feature weighting by multiplying the weight matrix and the features element by element (Hardamard product), and finally obtaining the optimized features
Figure BDA00036697940200001114
Expressed as:
Figure BDA00036697940200001115
s4, constructing a semantic-guided attention module SGA to realize coarse-grained feature optimization of the search area
Figure BDA00036697940200001116
And search area multi-layer features
Figure BDA00036697940200001117
Inputting the obtained data into SGA module to obtain coarse-grained optimized search region characteristics
Figure BDA00036697940200001118
The specific implementation method comprises the following steps:
1) optimized features for GDA module output
Figure BDA00036697940200001119
The SGA module extracts global semantic information of space dimensions layer by layer, generates a target semantic attention matrix and then compares the target semantic attention matrix with the characteristics of a search area
Figure BDA00036697940200001120
Interacting layer by layer to obtain coarse-grained optimized search region characteristics
Figure BDA00036697940200001121
In particular, for the ith layer template features
Figure BDA00036697940200001122
Generated target semantic attention moment array
Figure BDA00036697940200001123
Expressed as:
Figure BDA00036697940200001124
wherein GAP (-) represents the global average pooling in spatial dimension, and σ is sigmoid function.
2) The SGA module adopts global view attention aggregation multi-layer characteristics, and the global view attention aggregation multi-layer characteristics share parameters with global view attention in the GDA module to reduce actual calculation amount; in particular, the ith layer search area features are characterized by interaction of target semantic information with the 'query', 'key' and 'value' features of the search area
Figure BDA0003669794020000121
The resulting Q, K and V are expressed as:
Figure BDA0003669794020000122
Figure BDA0003669794020000123
Figure BDA0003669794020000124
thus the layer is obtainedTo optimize the characteristics
Figure BDA0003669794020000125
Expressed as:
Figure BDA0003669794020000126
wherein the content of the first and second substances,
Figure BDA0003669794020000127
in the above equation, i represents the current layer and j represents other multi-layer features.
S5, constructing a correlation graph aggregation module CGA to realize fine-grained feature optimization of the search area, and optimizing the correlation graph aggregation module CGA
Figure BDA0003669794020000128
And with
Figure BDA0003669794020000129
Inputting the CGA module to obtain fine-grained optimized search region characteristics
Figure BDA00036697940200001210
The specific implementation method comprises the following steps:
CGA module and
Figure BDA00036697940200001211
and
Figure BDA00036697940200001212
as input, on one hand, calculating the correlation of spatial pixels of a search area and the global template; on the other hand, local correlation is calculated based on the salient features of the template only; and (3) by fusing global and local correlation, strengthening the relation of spatial positions by adopting graph convolution, thereby realizing the construction of the attention map associated with the target feature.
1) Optimizing features for a template of an ith layer
Figure BDA00036697940200001213
And search area optimization features
Figure BDA00036697940200001214
Firstly, the template is respectively cut along the space and the channel to obtain the space characteristics
Figure BDA00036697940200001215
And channel characteristics
Figure BDA00036697940200001216
Wherein N is 1 =H 1 ×W 1 (ii) a For a certain pixel in the search area, firstly calculating the correlation between the certain pixel and the spatial feature of the template to obtain a spatial correlation diagram S 1 Expressed as:
Figure BDA00036697940200001217
where Corr (-) is a correlation calculation function, here in the form of an inner product.
Then, at S 1 On the basis, the retrieval of the template global information is realized by calculating the correlation with the channel characteristics, and the correlation graph S obtained by calculation at the moment 2 Expressed as:
Figure BDA0003669794020000131
for simplicity of description, the global correlation of a certain pixel in the search area with the template is expressed as:
Figure BDA0003669794020000132
Figure BDA0003669794020000133
where MaxPool (·) is the maximum pooling operation, and ks and s represent the window size and step size, respectively.
The local correlation is then expressed as:
Figure BDA0003669794020000134
2) fusing the two correlation graphs by adopting a method of adding corresponding elements, and constructing graph relations of the two correlation graphs, thereby enhancing the association of positions; the obtained correlation diagram and the characteristics of the search area
Figure BDA0003669794020000135
Adding to obtain finer grained optimization features
Figure BDA0003669794020000136
Expressed as:
Figure BDA0003669794020000137
wherein GCN (-) is a two-layer graph convolution network;
Figure BDA0003669794020000138
representing the addition of corresponding elements; further obtaining fine-grained optimized search region characteristics
Figure BDA0003669794020000139
S6, constructing a head prediction network
Figure BDA00036697940200001310
And
Figure BDA00036697940200001311
and inputting and predicting the position of the target of the current frame.
The present embodiment also provides a twin tracking system based on interactive and convergent feature optimization, comprising a memory, a processor and computer program instructions stored on the memory and executable by the processor, which when executed by the processor, implement the above-mentioned method steps.
In this embodiment, an OTB100 data set is used for comparison and verification, and fig. 2 shows the result of the accuracy comparison between the FRIA-Track and other target tracking methods under different attributes. Table 1 shows the success rate comparison results of the method proposed by the present invention with other target tracking methods on OTB100 data set.
TABLE 1 comparison of the present invention with other target tracking methods
Figure BDA0003669794020000141
As can be seen from FIG. 2, the FRIA-Track method of the present invention shows the best level under 8 attributes, and the performance under 10 attributes exceeds the standard algorithm, SiamCAR. As can be seen from table 1, the method of the present invention has the best success rate compared to other target tracking methods.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is directed to preferred embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow. However, any simple modification, equivalent change and modification of the above embodiments according to the technical essence of the present invention are within the protection scope of the technical solution of the present invention.

Claims (7)

1. A twin tracking method based on interactive and convergent feature optimization is characterized by comprising the following steps:
s1, initializing a template image and a search area image;
s2, constructing a feature extraction network, inputting the template image and the search area image, and acquiring corresponding template multilayer features F z And search region Multi-layer feature F x
S3, constructing a gating dual-view aggregation module GDA to optimizeTemplate multilayer characterization, template multilayer characterization F z Inputting the GDA module to obtain optimized template multi-layer features
Figure FDA0003669794010000011
S4, constructing a semantic-guided attention module SGA to realize coarse-grained feature optimization of the search area
Figure FDA0003669794010000012
And search region Multi-layer feature F x Inputting the obtained data into SGA module to obtain coarse-grained optimized search region characteristics
Figure FDA0003669794010000013
S5, constructing a correlation graph aggregation module (CGA) to realize fine-grained feature optimization of the search area
Figure FDA0003669794010000014
And
Figure FDA0003669794010000015
inputting the CGA module to obtain fine-grained optimized search region characteristics
Figure FDA0003669794010000016
S6, constructing a head prediction network
Figure FDA0003669794010000017
And
Figure FDA0003669794010000018
and inputting and predicting the position of the target of the current frame.
2. The twin tracking method based on interaction and convergent feature optimization according to claim 1, wherein the step S1 is implemented by:
cutting out a template image with the size of 3 multiplied by 127 on the first frame image according to a target real bounding box given by the first frame; starting from the second frame, the search area image with the size of 3 × 255 × 255 is cut out with the target prediction bounding box center coordinates of the previous frame as a reference point.
3. The twin tracking method based on interaction and convergent feature optimization according to claim 1, wherein the step S2 is implemented by:
adopting ResNet-50 as a feature extraction network, taking a template image and a search area image as input, and acquiring template multilayer features
Figure FDA0003669794010000019
And search region multi-layer features
Figure FDA00036697940100000110
Where l represents the total number of levels of extracted template or search area features,
Figure FDA00036697940100000111
respectively representing the template characteristics and the search region characteristics of the ith layer, i belongs to [1, l ]]。
4. The twin tracking method based on interactive and convergent feature optimization according to claim 1, wherein the specific implementation method of step S3 is:
the GDA module comprises three sub-modules, namely a local view attention LA, a global view attention GA and an aggregation gate; the LA module is used for highlighting high-frequency information of a local visual angle; for single layer template features of size C × H × W
Figure FDA00036697940100000112
Local visual angle attention feature
Figure FDA00036697940100000113
Expressed as:
Figure FDA0003669794010000021
wherein, W 2 Is a learnable convolution parameter of size
Figure FDA0003669794010000022
And
Figure FDA0003669794010000023
where r denotes the channel compression parameter;
Figure FDA0003669794010000024
representing batch normalization; sigma represents a sigmoid function;
Figure FDA0003669794010000025
represents a bit-wise multiplication; high frequency characteristics
Figure FDA0003669794010000026
Obtained by subtracting the local mean, expressed as:
Figure FDA0003669794010000027
Figure FDA0003669794010000028
in the formula, W 1 Are learnable convolution parameters;
Figure FDA0003669794010000029
is a warp W 1 Features of the convolution map; AvgPool (·) denotes average pooling for obtaining average signal strength of local areas; ks and s denote window size and step size, respectively; delta denotes the non-linear activation function, used hereReLU;
The LA module focuses on a fixed receptive field and aggregates information of a local area through convolution operation; the GA module is used for aggregating global information of different receptive fields through interaction of multilayer features; characteristic F ═ x for a set of l layers 1 ,x 2 ,...,x l For arbitrary two-layer features }
Figure FDA00036697940100000210
And
Figure FDA00036697940100000211
first, three convolutional layers, θ (-), φ (-), and g (-), are used to pair x i Linear mapping is carried out to obtain feature maps of ' query ', ' key ' and ' value
Figure FDA00036697940100000212
Figure FDA00036697940100000213
And
Figure FDA00036697940100000214
namely, it is
Q=θ(x i )
K 1 =φ(x i )
V 1 =g(x i )
At the same time, feature x j Sharing convolutional layers phi (-) and g (-) to obtain corresponding feature maps
Figure FDA00036697940100000215
And
Figure FDA00036697940100000216
namely, it is
K 2 =φ(x j )
V 2 =g(x j )
Then, the "key" and the "value" of each layer are spliced together respectively to obtainGlobal representation of multi-layer features
Figure FDA00036697940100000217
And
Figure FDA00036697940100000218
where S ═ l × H × W, where l denotes the total number of feature layers queried; the global features K and V are then expressed as:
K=[φ(x i )||φ(x j )]
V=[g(x i )||g(x j )]
wherein, [ | · ] represents that the characteristic is spliced according to the space dimension;
thus, the attention feature y is derived from a standard non-local attention formula i Expressed as:
Figure FDA0003669794010000031
in the above formula, the first and second carbon atoms are,
Figure FDA0003669794010000032
it represents the softmax operation along the j dimension;
finally, y is transformed by a convolutional layer xi (·) i The number of channels is unified with the original characteristic diagram and is added to the original characteristic diagram in a residual error mode, so that the following results are obtained:
Figure FDA0003669794010000033
for template multi-layer features
Figure FDA0003669794010000034
According to the above formula, wherein the ith layer is characterized
Figure FDA0003669794010000035
Updates through the GA Module are represented as:
Figure FDA0003669794010000036
Wherein, the first and the second end of the pipe are connected with each other,
Figure FDA0003669794010000037
the LA module and the GA module reduce the redundancy of channels to a certain extent through channel compression parameters r and m and respectively obtain attention characteristics under local and global perspectives
Figure FDA0003669794010000038
And
Figure FDA0003669794010000039
on the basis, the gating mechanism utilizing the aggregation gating module is fused adaptively
Figure FDA00036697940100000310
And
Figure FDA00036697940100000311
thereby enhancing the effective representation of salient features; for input features
Figure FDA00036697940100000312
And
Figure FDA00036697940100000313
firstly, splicing two features together, learning the correlation between the two features by using a 1 multiplied by 1 convolutional layer, and obtaining a normalized correlation matrix W by using a sigmoid function gate Let it represent
Figure FDA00036697940100000314
The weight matrix of (1), then
Figure FDA00036697940100000315
Is expressed as 1-W gate (ii) a Then, feature weighting is realized in a mode of multiplying the weight matrix and the features element by element, and finally the obtained optimized features
Figure FDA00036697940100000316
Expressed as:
Figure FDA0003669794010000041
5. the twin tracking method based on interaction and convergent feature optimization according to claim 1, wherein the step S4 is implemented by:
optimized features for GDA module output
Figure FDA0003669794010000042
The SGA module extracts global semantic information of space dimensions layer by layer, generates a target semantic attention matrix and then compares the target semantic attention matrix with the characteristics of a search area
Figure FDA0003669794010000043
Interacting layer by layer to obtain coarse-grained optimized search region characteristics
Figure FDA0003669794010000044
In particular, for the ith layer template features
Figure FDA0003669794010000045
Generated target semantic attention moment array
Figure FDA0003669794010000046
Expressed as:
Figure FDA0003669794010000047
wherein GAP (-) represents global average pooling in spatial dimension, and sigma is sigmoid function;
then, the SGA module adopts global view attention aggregation multi-layer characteristics, and the global view attention aggregation multi-layer characteristics share parameters with the global view attention in the GDA module to reduce actual calculation amount; through the interaction of target semantic information and the 'inquiry', 'key' and 'value' characteristics of the search area, the ith layer of search area characteristics
Figure FDA0003669794010000048
The resulting Q, K and V are expressed as:
Figure FDA0003669794010000049
Figure FDA00036697940100000410
Figure FDA00036697940100000411
the optimized characteristics obtained by the layer
Figure FDA00036697940100000412
Expressed as:
Figure FDA00036697940100000413
wherein, the first and the second end of the pipe are connected with each other,
Figure FDA00036697940100000414
in the above equation, i represents the current layer and j represents other multi-layer features.
6. The twin tracking method based on interactive and convergent feature optimization according to claim 1, wherein the specific implementation method of step S5 is:
CGA module and
Figure FDA00036697940100000415
and
Figure FDA00036697940100000416
as input, on one hand, calculating the correlation of spatial pixels of a search area and the global template; on the other hand, local relevance is calculated based on the salient features of the template only; by fusing global and local correlation, the relation of space positions is strengthened by adopting graph convolution, so that the construction of an attention map associated with the target feature is realized; in particular, features are optimized for the template of the ith layer
Figure FDA0003669794010000051
And search area optimization features
Figure FDA0003669794010000052
Firstly, respectively cutting the template along the space and the channel to obtain the space characteristics
Figure FDA0003669794010000053
And channel characteristics
Figure FDA0003669794010000054
Wherein N is 1 =H 1 ×W 1 (ii) a For a certain pixel in the search area, firstly, the correlation between the certain pixel and the template spatial feature is calculated to obtain a spatial correlation diagram S 1 Expressed as:
Figure FDA0003669794010000055
wherein, Corr (·) is a correlation calculation function, and an inner product mode is adopted;
then, at S 1 On the basis, the retrieval of the template global information is realized by calculating the correlation with the channel characteristics, and the correlation graph S obtained by calculation at the moment 2 Expressed as:
Figure FDA0003669794010000056
the global relevance of a certain pixel in the search area to the template is expressed as:
Figure FDA0003669794010000057
Figure FDA0003669794010000058
wherein, MaxPool (-) is the maximum pooling operation, and ks and s respectively represent the window size and the step size;
the local correlation is then expressed as:
Figure FDA0003669794010000059
finally, fusing the two correlation graphs by adopting a method of adding corresponding elements, and constructing graph relations of the two correlation graphs, thereby enhancing the association of positions; the obtained correlation diagram and the characteristics of the search area are obtained
Figure FDA00036697940100000510
Adding to obtain finer grained optimization features
Figure FDA00036697940100000511
Expressed as:
Figure FDA00036697940100000512
wherein GCN (-) is a two-layer graph convolution network;
Figure FDA00036697940100000513
representing the addition of corresponding elements; further obtaining fine-grained optimized search region characteristics
Figure FDA00036697940100000514
7. Twin tracking system based on interactive and convergent feature optimization, comprising a memory, a processor and computer program instructions stored on the memory and executable by the processor, which when executed by the processor, are capable of implementing the method steps of any of claims 1 to 6.
CN202210600748.XA 2022-05-30 2022-05-30 Twin tracking method and system based on interactive and convergent feature optimization Pending CN114926652A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210600748.XA CN114926652A (en) 2022-05-30 2022-05-30 Twin tracking method and system based on interactive and convergent feature optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210600748.XA CN114926652A (en) 2022-05-30 2022-05-30 Twin tracking method and system based on interactive and convergent feature optimization

Publications (1)

Publication Number Publication Date
CN114926652A true CN114926652A (en) 2022-08-19

Family

ID=82812296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210600748.XA Pending CN114926652A (en) 2022-05-30 2022-05-30 Twin tracking method and system based on interactive and convergent feature optimization

Country Status (1)

Country Link
CN (1) CN114926652A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115457259A (en) * 2022-09-14 2022-12-09 华洋通信科技股份有限公司 Image rapid saliency detection method based on multi-channel activation optimization

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115457259A (en) * 2022-09-14 2022-12-09 华洋通信科技股份有限公司 Image rapid saliency detection method based on multi-channel activation optimization
CN115457259B (en) * 2022-09-14 2023-10-31 华洋通信科技股份有限公司 Image rapid saliency detection method based on multichannel activation optimization

Similar Documents

Publication Publication Date Title
Kejriwal et al. High performance loop closure detection using bag of word pairs
CN111144364B (en) Twin network target tracking method based on channel attention updating mechanism
CN106780631B (en) Robot closed-loop detection method based on deep learning
Dusmanu et al. Multi-view optimization of local feature geometry
Fu et al. Fast ORB-SLAM without keypoint descriptors
CN112651262A (en) Cross-modal pedestrian re-identification method based on self-adaptive pedestrian alignment
Hu et al. Semantic SLAM based on improved DeepLabv3⁺ in dynamic scenarios
CN115439507A (en) Three-dimensional video target tracking method based on multi-level mutual enhancement and relevant pyramid
Sehgal et al. Lidar-monocular visual odometry with genetic algorithm for parameter optimization
Urdiales et al. An improved deep learning architecture for multi-object tracking systems
CN114926652A (en) Twin tracking method and system based on interactive and convergent feature optimization
Liu et al. Learning optical flow and scene flow with bidirectional camera-lidar fusion
Zeng et al. NCT: noise-control multi-object tracking
Tsintotas et al. The revisiting problem in simultaneous localization and mapping
Huang et al. Correlation-filter based scale-adaptive visual tracking with hybrid-scheme sample learning
Bazeille et al. Combining odometry and visual loop-closure detection for consistent topo-metrical mapping
CN116797799A (en) Single-target tracking method and tracking system based on channel attention and space-time perception
CN116245913A (en) Multi-target tracking method based on hierarchical context guidance
CN115880332A (en) Target tracking method for low-altitude aircraft visual angle
CN115830631A (en) One-person one-file system construction method based on posture-assisted occluded human body re-recognition
Jiang et al. Semantic closed-loop based visual mapping algorithm for automated valet parking
Zhang et al. Rt-track: robust tricks for multi-pedestrian tracking
Tan et al. Online visual tracking via background-aware Siamese networks
Cai et al. Explicit invariant feature induced cross-domain crowd counting
CN116152298B (en) Target tracking method based on self-adaptive local mining

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination