CN117173607A - Multi-level fusion multi-target tracking method, system and computer readable storage medium - Google Patents
- Publication number
- CN117173607A (application number CN202311018804.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The application discloses a multi-level fusion multi-target tracking method, system and computer readable storage medium, comprising the following steps: extracting target re-identification features with a combined zero-shot segmentation network, target relationship graph neural network processing, and multi-level long-range trajectory fusion. Sub-method one comprises: performing segmentation preprocessing on human targets with the Transformer-based segmentation model SAM; and inputting the segmentation-preprocessed target image patches into a pre-trained re-identification network to extract the re-identification features of the human targets. Sub-method two comprises: given a continuous video segment and the corresponding set of detection boxes, constructing a target relationship graph model with a GNN. Sub-method three comprises: training the target relationship graph models of sub-method two on sequences of different levels, and using these models to link video clips of different sizes. The method has the effect of improving the feature recognition accuracy of multi-target tracking.
Description
Technical Field
The application relates to the technical field of machine vision, and in particular to a multi-level fusion multi-target tracking method, system and computer readable storage medium.
Background
Multi-object tracking (Multiple Object Tracking) is one of the fundamental tasks of computer vision; its goal is to track multiple objects over time in a video sequence. The core challenge of multi-target tracking is to handle occlusion, motion blur, illumination changes, and other difficulties while maintaining the identity of each object across consecutive frames.
Most multi-target tracking algorithms follow the TBD (Tracking-by-Detection) paradigm: targets are first detected in each frame; the per-frame detections are then associated to form trajectories.
When high-precision object detection is available, data association mainly occurs between detections that are close in time, so-called short-term association. In general, simple cues such as positional or motion proximity and local appearance are sufficient to ensure accurate association.
However, crowded scenes pose different challenges: objects may frequently be occluded and go undetected for several frames. This requires associating detections across distant frames, i.e., long-term association.
Given the differences in the nature of these tasks, solutions commonly used for short-term association tend to perform poorly in long-term association scenarios.
In response to the challenges of long-term association, the main problem of existing tracking methods based on visual information is that the extracted visual appearance features have weak representational power, making it difficult to accurately match the same target and to distinguish different targets. For example, methods based on convolutional neural networks (CNNs) rely on the receptive field provided by the convolution kernel, so global features are difficult to extract and a degree of information loss results; Transformer-based methods focus more on high-level semantic information but lack the ability to model low-level information, and struggle with the translation, rotation, and distortion caused by target motion. In addition, some methods rely on an independent re-identification mechanism that feeds detection boxes into a pre-trained re-identification model to extract target appearance features; the problem is that the detection boxes carry background information that introduces noise, reducing the discriminability of the appearance features to some extent.
In view of the above, the present application proposes a new technical solution.
Disclosure of Invention
In order to improve the feature recognition accuracy of multi-target tracking, the application provides a multi-level fusion multi-target tracking method, system and computer readable storage medium.
In a first aspect, the present application provides a multi-level fusion multi-target tracking method, which adopts the following technical scheme:
a multi-level fusion multi-target tracking method comprises: extracting target re-identification features with a combined zero-shot segmentation network, target relationship graph neural network processing, and multi-level long-range trajectory fusion;
extracting target re-identification features with the combined zero-shot segmentation network is referred to as sub-method one, which comprises the following steps:
performing segmentation preprocessing on human targets with the Transformer-based segmentation model SAM; and,
inputting the segmentation-preprocessed target image patches into a pre-trained re-identification network, and extracting the re-identification features of the human targets;
the target relationship graph neural network processing is referred to as sub-method two, which comprises: given a continuous video segment and the corresponding set of detection boxes, constructing a target relationship graph model with a GNN and generating target trajectories;
the multi-level long-range trajectory fusion is referred to as sub-method three, which comprises: training the target relationship graph models of sub-method two on sequences of different levels, and linking video clips of different sizes with the target relationship graph models to complete trajectory merging from short sequences to long sequences and obtain the target trajectories of the complete video.
Optionally, the Transformer-based segmentation model SAM performs segmentation preprocessing on human targets as follows. Given a video frame sequence:
the YOLOX detector is selected for target detection to obtain the target boxes of all human bodies in each video frame;
each video frame and the corresponding set of target boxes are input into the segmentation model SAM for instance segmentation, the highest-scoring segmentation mask of each target is output, and the mask is upsampled to the size of the target box;
each target box is cropped from the original video frame to obtain an image patch for each target;
and, according to the corresponding segmentation mask, each channel value of the image patch outside the mask is set to the mean value of that channel.
Optionally, inputting the segmentation-preprocessed target image patches into the pre-trained re-identification network and extracting the re-identification features of the human targets comprises: resizing the preprocessed target image patches to a uniform size, inputting them into a ResNet-50-based pre-trained re-identification network, and extracting for each target an appearance feature vector whose dimension is a preset parameter.
Optionally, sub-method two comprises: initializing the node features of the target relationship graph model to the appearance feature vectors output by sub-method one, and initializing the edge features to the temporal and spatial relative positions between node pairs together with the cosine distance between their appearance features.
Optionally, sub-method two further comprises:
mapping the node features and the edge features into the same feature space with two different MLP layers, respectively;
the features contained in the nodes and edges are propagated through the graph by neural message passing, which includes:
node-to-edge message passing: the features of node u, the features of node v, and the features of the connecting edge (u, v) between them are concatenated and passed through one MLP layer to obtain the updated features of edge (u, v);
edge-to-node message passing: for a node u, the features of u are first concatenated with the features of each edge (u, v) and passed through one MLP layer to obtain updated temporary features for edge (u, v); the features of all temporary edges connected to u are then averaged to obtain the updated features of node u;
the message passing process is repeated N times, so that every node can aggregate the features of neighbor nodes and edges within distance N;
and binary classification is performed on the resulting edge features to complete trajectory prediction and generate target trajectories.
Optionally, sub-method three comprises:
S11, dividing the whole video into several short clips, constructing a target relationship graph for each clip according to sub-method two, and performing message passing to obtain the node and edge features of the current level;
S12, aggregating the node and edge features belonging to the same trajectory by average pooling, and entering the next level after M training iterations;
S13, at the next level, taking the node and edge features output by the sub-clips of the previous level as input, constructing a target relationship graph for the new level as in sub-method two, and performing message passing to obtain the node and edge features of the new level, thereby completing trajectory merging from short sequences to long sequences;
S14, repeating steps S12 and S13 several times to obtain the target trajectories of the complete video.
In a second aspect, the application provides a multi-level fusion multi-target tracking system, which adopts the following technical scheme:
a multi-level fusion multi-target tracking system comprising a memory and a processor, the memory having stored thereon a computer program that can be loaded by the processor to perform any of the multi-level fusion multi-target tracking methods described above.
In a third aspect, the present application provides a computer readable storage medium, which adopts the following technical scheme:
a computer readable storage medium storing a computer program that can be loaded by a processor to execute any of the multi-level fusion multi-target tracking methods described above.
In summary, the present application provides at least the following beneficial technical effects: SAM is used to segment the human targets, yielding human target features free of background noise and improving the accuracy of human target identification during tracking;
the hierarchical trajectory fusion strategy reduces the memory overhead required to model the whole video as a network; it not only improves the execution speed of the model but also links the target trajectories of different levels with a parameter-sharing relationship graph network model, realizing target tracking from short sequences to long sequences. The method is suitable for multi-target tracking in long-term association scenarios and can improve the accuracy of multi-target tracking.
Drawings
FIG. 1 is a schematic diagram of the architecture of the present application.
Detailed Description
The present application is described in further detail below with reference to fig. 1.
The embodiment of the application discloses a multi-level fusion multi-target tracking method.
Referring to fig. 1, the multi-level fusion multi-target tracking method includes: extracting target re-identification features with a combined zero-shot segmentation network, target relationship graph neural network processing, and multi-level long-range trajectory fusion.
Sub-method one, extracting target re-identification features with the combined zero-shot segmentation network, includes performing segmentation preprocessing on human targets with the Transformer-based segmentation model SAM. Specifically, given a video frame sequence:
the anchor-free detector YOLOX is selected for target detection to obtain the target boxes (x, y, w, h) of all human bodies in each video frame, where (x, y) is the center point of the target box, w is its width, and h is its height.
Each video frame and the corresponding set of target boxes are input into the segmentation model SAM for instance segmentation; the highest-scoring segmentation mask of each target is output, and the mask is upsampled to the size of the target box, e.g., w × h × 3.
Then, each target box is cropped from the original video frame to obtain an image patch for each target;
and, according to the corresponding segmentation mask, each channel value of the image patch outside the mask is set to the mean value of that channel.
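The masking step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `patch` and `mask` are hypothetical toy inputs standing in for the cropped YOLOX box and the SAM mask, and whether the per-channel mean is computed over the whole patch (as here) or only over background pixels is an assumption left open by the text.

```python
# Sketch of the mask-based background suppression: pixels outside the
# segmentation mask are replaced by the mean value of their channel.
def suppress_background(patch, mask):
    """patch: H x W list of (r, g, b) tuples; mask: H x W, 1 = target pixel."""
    h, w = len(patch), len(patch[0])
    out = [[list(px) for px in row] for row in patch]  # deep copy
    for c in range(3):
        vals = [patch[i][j][c] for i in range(h) for j in range(w)]
        mean_c = sum(vals) / len(vals)  # per-channel mean over the patch
        for i in range(h):
            for j in range(w):
                if mask[i][j] == 0:     # background pixel -> channel mean
                    out[i][j][c] = mean_c
    return out

# Toy 2x2 patch: one target pixel (top-left), three background pixels.
patch = [[(10, 0, 0), (20, 0, 0)],
         [(30, 0, 0), (40, 0, 0)]]
mask = [[1, 0],
        [0, 0]]
clean = suppress_background(patch, mask)
```

The target pixel keeps its original values, while every background pixel collapses to the channel mean, which is what removes background texture before re-identification.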
This design aims to reduce the influence of background noise on the target's appearance representation. The segmentation-preprocessed target image patches are then input into a pre-trained re-identification network to extract the re-identification features of the human targets.
Specifically: the preprocessed target image patches are resized to a uniform size, e.g., 256×256×3, input into a ResNet-50-based pre-trained re-identification network, and an appearance feature vector whose dimension is a preset parameter (e.g., 2048×1) is extracted for each target.
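The uniform-resize step that precedes the re-identification network can be sketched as below. The ResNet-50 backbone itself is not reproduced; the nearest-neighbour interpolation choice is an assumption, since the text only specifies the fixed output size.

```python
# Sketch of resizing an arbitrary H x W image patch to a fixed size
# (mirroring the 256 x 256 x 3 input described in the text) with
# nearest-neighbour sampling.
def resize_nearest(patch, out_h, out_w):
    in_h, in_w = len(patch), len(patch[0])
    return [[patch[i * in_h // out_h][j * in_w // out_w]
             for j in range(out_w)]
            for i in range(out_h)]

patch = [[(1, 1, 1), (2, 2, 2)],
         [(3, 3, 3), (4, 4, 4)]]      # toy 2x2 "image patch"
resized = resize_nearest(patch, 4, 4) # 4x4 patch, each pixel repeated 2x2
```

In practice a library resize (e.g. bilinear) would be used; the point is only that every patch reaches the network at the same spatial size so the extracted appearance vectors are comparable.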
With this design, SAM is used to segment the human targets, yielding human target features free of background noise and improving the accuracy of human target identification during tracking.
Sub-method two, target relationship graph neural network processing, includes: given a continuous video segment and the corresponding set of detection boxes, constructing a target relationship graph model with a GNN (graph neural network) and generating target trajectories. Specifically:
the node features of the target relationship graph model are initialized to the appearance feature vectors output by sub-method one, and the edge features are initialized to the temporal and spatial relative positions between node pairs together with the cosine distance between their appearance features. For example:
specifically, each edge feature is initialized as the concatenation of several features, (t_v − t_u, x, y, w, h, similarity(f(u), f(v))), where t_v − t_u is the time interval between nodes u and v, (x, y, w, h) are the spatial location coordinates of the target, f(·) denotes the pre-trained ResNet-50 network, and similarity(·, ·) denotes the appearance similarity between the two nodes, computed as the cosine similarity: similarity(f(u), f(v)) = f(u)·f(v) / (‖f(u)‖ ‖f(v)‖).
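The edge-feature initialization can be sketched as follows. The appearance vectors are hypothetical stand-ins for the ResNet-50 output, and reading (x, y, w, h) as the *relative* box geometry between the two detections is an assumption; the text leaves the exact spatial encoding open.

```python
# Sketch of initializing one edge feature: concatenate the time gap,
# the relative box geometry, and the cosine similarity of the two
# appearance vectors.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def init_edge_feature(node_u, node_v):
    # each node: (frame index t, box (x, y, w, h), appearance vector f)
    t_u, box_u, f_u = node_u
    t_v, box_v, f_v = node_v
    rel = [bv - bu for bu, bv in zip(box_u, box_v)]  # relative geometry
    return [t_v - t_u] + rel + [cosine_similarity(f_u, f_v)]

u = (3, (10.0, 20.0, 50.0, 100.0), [1.0, 0.0])
v = (7, (12.0, 21.0, 50.0, 100.0), [1.0, 0.0])
edge = init_edge_feature(u, v)  # time gap, box offsets, appearance similarity
```

Identical appearance vectors give a similarity of 1.0, so edges between visually similar detections start with a strong appearance signal before any message passing.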
after the initialization of the node features and the edge features is completed, the node features and the edge features are mapped to the same feature space by using two different MLP layers respectively, and feature dimensions are ensured to be 256.
The features contained in the nodes and edges then propagate through the graph by neural message passing, which includes:
node-to-edge message passing: the features of node u, the features of node v, and the features of the connecting edge (u, v) between them are concatenated and passed through one MLP layer to obtain the updated features of edge (u, v);
edge-to-node message passing: for a node u, the features of u are first concatenated with the features of each edge (u, v) and passed through one MLP layer to obtain updated temporary features for edge (u, v); the features of all temporary edges connected to u are then averaged to obtain the updated features of node u.
The message passing process is repeated N (e.g., 12) times, so that every node can aggregate the features of neighbor nodes and edges within distance N.
Finally, binary classification is performed on the resulting edge features to complete trajectory prediction and generate target trajectories.
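One round of the node-to-edge and edge-to-node updates above can be sketched structurally as below. The learned MLPs are replaced by a hypothetical fixed map (the mean of the concatenated inputs) so the update pattern is visible without a deep-learning framework; a real implementation would learn these layers.

```python
# Toy sketch of one message-passing round over 1-dimensional features.
def mlp(concat):                      # placeholder for a learned MLP layer
    return [sum(concat) / len(concat)]

def node_to_edge(h_u, h_v, h_uv):
    # concatenate node u, node v and edge (u, v) features, then "MLP"
    return mlp(h_u + h_v + h_uv)

def edge_to_node(h_u, incident):
    # incident: features of every edge (u, v) connected to node u
    tmp = [mlp(h_u + h_uv) for h_uv in incident]   # temporary edge features
    dim = len(tmp[0])
    # average the temporary edge features to update node u
    return [sum(t[d] for t in tmp) / len(tmp) for d in range(dim)]

h_u, h_v, h_uv = [1.0], [3.0], [2.0]
new_edge = node_to_edge(h_u, h_v, h_uv)
new_node = edge_to_node(h_u, [h_uv, [4.0]])
```

Repeating these two updates N times lets each node absorb information from all nodes and edges within graph distance N, which is what makes the final per-edge binary classification trajectory-aware.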
Sub-method three, multi-level long-range trajectory fusion, is mainly implemented recursively: the (parameter-sharing) relationship graph network model of sub-method two is trained on sequences of different levels, and the model is used to link video clips of different sizes, realizing trajectory merging from short sequences to long sequences, with corresponding clip lengths of [2, 4, 8, 16, 32, 64, 128, 256].
Specifically:
s11, dividing the whole video into a plurality of small fragments, constructing a target relation diagram for each fragment according to the second sub-method, and performing message transmission to obtain node and edge characteristics of the current level;
s12, aggregating the node and edge characteristics of the same track in an average pooling mode, and entering the next level after M training iterations (such as 200);
s13, in the next level, taking the node and edge characteristics output by a plurality of sub-segments of the previous level as input, continuously constructing a target relation diagram of the new level as in the second sub-method, and performing message transmission to obtain the node and edge characteristics of the new level, so as to finish track merging from short order to long order;
s14, repeating the steps S12 and S13 for a plurality of times (for example, 8 times), completing training of the 8 relationship graph network models, obtaining a target track of the complete video, and completing fusion of the multi-level tracks.
On the one hand, the hierarchical trajectory fusion strategy reduces the memory overhead that would otherwise be required to model the whole video as a complete graph network;
on the other hand, short-term trajectories are fused into long-term trajectories by sub-graph fusion, which improves the execution speed of the model while preserving its generalization ability.
It should be noted that:
1) The generalization ability of existing methods is poor. Using different techniques for different time spans requires strong assumptions about the cues needed at each time scale, which greatly limits the applicability of these strategies. For example, in tracking scenes where people wear uniforms and the frame rate is high, such as dance videos, local trackers based on distance or motion tend to be more reliable than trackers based on visual information. However, when the camera moves vigorously or the frame rate is low, the performance of a local tracker may degrade significantly, and visual cues may become the most reliable. In general, these differences inevitably lead to the need to customize a specific solution for each scenario.
2) Existing methods cannot process long videos. As the time span between the detections to be associated grows, the association becomes more ambiguous due to significant visual changes and large displacements. Thus, short-term tracking methods using hand-designed visual and motion cues cannot cope with arbitrary time spans. Graph-based approaches are more robust, but association over large time spans requires building extremely large graphs, which is impractical in terms of both computation and memory usage; these problems are alleviated by the present approach.
The embodiment of the application also discloses a multi-level fusion multi-target tracking system.
A multi-level fusion multi-target tracking system comprising a memory and a processor, the memory having stored thereon a computer program capable of being loaded by the processor and performing a multi-level fusion multi-target tracking method as described above.
The embodiment of the application also discloses a computer readable storage medium.
A computer readable storage medium storing a computer program capable of being loaded by a processor and executing a multi-level fusion multi-target tracking method as described above.
The above embodiments are not intended to limit the scope of the present application, so: all equivalent changes in structure, shape and principle of the application should be covered in the scope of protection of the application.
Claims (8)
1. A multi-level fusion multi-target tracking method, characterized by: extracting target re-identification features with a combined zero-shot segmentation network, target relationship graph neural network processing, and multi-level long-range trajectory fusion;
extracting target re-identification features with the combined zero-shot segmentation network is referred to as sub-method one, which comprises the following steps:
performing segmentation preprocessing on human targets with the Transformer-based segmentation model SAM; and,
inputting the segmentation-preprocessed target image patches into a pre-trained re-identification network, and extracting the re-identification features of the human targets;
the target relationship graph neural network processing is referred to as sub-method two, which comprises: given a continuous video segment and the corresponding set of detection boxes, constructing a target relationship graph model with a GNN and generating target trajectories;
the multi-level long-range trajectory fusion is referred to as sub-method three, which comprises: training the target relationship graph models of sub-method two on sequences of different levels, and linking video clips of different sizes with the target relationship graph models to complete trajectory merging from short sequences to long sequences and obtain the target trajectories of the complete video.
2. The multi-level fusion multi-target tracking method of claim 1, wherein the Transformer-based segmentation model SAM performs segmentation preprocessing on human targets as follows. Given a video frame sequence:
the YOLOX detector is selected for target detection to obtain the target boxes of all human bodies in each video frame;
each video frame and the corresponding set of target boxes are input into the segmentation model SAM for instance segmentation, the highest-scoring segmentation mask of each target is output, and the mask is upsampled to the size of the target box;
each target box is cropped from the original video frame to obtain an image patch for each target;
and, according to the corresponding segmentation mask, each channel value of the image patch outside the mask is set to the mean value of that channel.
3. The multi-level fusion multi-target tracking method of claim 2, wherein inputting the segmentation-preprocessed target image patches into the pre-trained re-identification network and extracting the re-identification features of the human targets comprises:
resizing the preprocessed target image patches to a uniform size, inputting them into a ResNet-50-based pre-trained re-identification network, and extracting for each target an appearance feature vector whose dimension is a preset parameter.
4. The multi-level fusion multi-target tracking method of claim 3, wherein sub-method two comprises: initializing the node features of the target relationship graph model to the appearance feature vectors output by sub-method one, and initializing the edge features to the temporal and spatial relative positions between node pairs together with the cosine distance between their appearance features.
5. The multi-level fusion multi-target tracking method of claim 4, wherein sub-method two further comprises:
mapping the node features and the edge features into the same feature space with two different MLP layers, respectively;
the features contained in the nodes and edges are propagated through the graph by neural message passing, which includes:
node-to-edge message passing: the features of node u, the features of node v, and the features of the connecting edge (u, v) between them are concatenated and passed through one MLP layer to obtain the updated features of edge (u, v);
edge-to-node message passing: for a node u, the features of u are first concatenated with the features of each edge (u, v) and passed through one MLP layer to obtain updated temporary features for edge (u, v); the features of all temporary edges connected to u are then averaged to obtain the updated features of node u;
repeating the message passing process N times, so that every node can aggregate the features of neighbor nodes and edges within distance N, wherein N is a natural number;
and performing binary classification on the resulting edge features to complete trajectory prediction and generate target trajectories.
6. The multi-level fusion multi-target tracking method of claim 5, wherein sub-method three comprises:
S11, dividing the whole video into several short clips, constructing a target relationship graph for each clip according to sub-method two, and performing message passing to obtain the node and edge features of the current level;
S12, aggregating the node and edge features belonging to the same trajectory by average pooling, and entering the next level after M training iterations, wherein M is a natural number;
S13, at the next level, taking the node and edge features output by the sub-clips of the previous level as input, constructing a target relationship graph for the new level as in sub-method two, and performing message passing to obtain the node and edge features of the new level, thereby completing trajectory merging from short sequences to long sequences;
S14, repeating steps S12 and S13 several times to obtain the target trajectories of the complete video.
7. A multi-level fusion multi-target tracking system, characterized by: comprising a memory and a processor, the memory having stored thereon a computer program that can be loaded by the processor to perform the multi-level fusion multi-target tracking method according to any one of claims 1 to 6.
8. A computer readable storage medium storing a computer program that can be loaded by a processor to execute the multi-level fusion multi-target tracking method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311018804.XA CN117173607A (en) | 2023-08-11 | 2023-08-11 | Multi-level fusion multi-target tracking method, system and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117173607A true CN117173607A (en) | 2023-12-05 |
Family
ID=88934676
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311018804.XA Pending CN117173607A (en) | 2023-08-11 | 2023-08-11 | Multi-level fusion multi-target tracking method, system and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117173607A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117726821A (en) * | 2024-02-05 | 2024-03-19 | 武汉理工大学 | Medical behavior identification method for region shielding in medical video |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |