CN112734800A - Multi-target tracking system and method based on joint detection and characterization extraction - Google Patents
- Publication number
- CN112734800A CN112734800A CN202011510839.1A CN202011510839A CN112734800A CN 112734800 A CN112734800 A CN 112734800A CN 202011510839 A CN202011510839 A CN 202011510839A CN 112734800 A CN112734800 A CN 112734800A
- Authority
- CN
- China
- Prior art keywords
- target
- frame
- candidate
- joint detection
- characterization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20104—Interactive definition of region of interest [ROI]
Abstract
The invention discloses a multi-target tracking system and method based on joint detection and characterization extraction, and relates to the field of computer vision tracking. The technical scheme reduces the number of network parameters to be trained and the computational cost, and improves both the efficiency of the algorithm and the multi-target tracking precision.
Description
Technical Field
The invention relates to the field of computer vision tracking, in particular to a multi-target tracking system and method based on joint detection and characterization extraction.
Background
With the rapid development of Internet technology, the continuously improving performance of devices such as smartphones and computers, and falling manufacturing costs, vast amounts of image and video data are generated every moment. As the saying goes, "a picture is worth a thousand words": images and videos contain huge amounts of valuable information. How to use these data quickly and accurately has become an urgent problem. Computer vision technology, now developing rapidly, can use the powerful computing capability of computers to process image data in place of the human eye, and has become a core technology in many fields.
Multi-object tracking (MOT) is an important research direction in the field of computer vision. Its task is to continuously track and locate multiple targets in a video sequence, such as pedestrians on a street or vehicles on a road, while keeping their identity information unchanged, and then derive the motion trajectory of each target. Multi-target tracking not only detects the spatio-temporal information of targets in a video accurately, but also provides a great deal of valuable information for pose prediction, action recognition, behavior analysis, and so on. Multi-target tracking algorithms are widely used in intelligent video surveillance, autonomous driving, intelligent robots, intelligent human-computer interaction, intelligent transportation, sports video analysis, and other fields, and have become a popular research direction in recent years.
The multi-target tracking problem is an extension of the single-target tracking problem. Given a particular target, the task of single-target tracking is to continuously track that target in the scene. The task of multi-target tracking is to track a series of targets of interest in the scene, such as pedestrians and vehicles. Compared with single-target tracking, multi-target tracking must therefore complete two additional tasks:
(1) judging changes in the number of targets in the scene, and completing the initialization of new trajectories and the termination of old trajectories;
(2) maintaining the identity information of the tracked targets.
Currently, tracking-by-detection is the mainstream paradigm of multi-target tracking, and it can be divided into the following two independent subtasks:
target detection, which detects target positions in the current image;
data association, which associates the detection results with existing trajectories.
Researchers often use pre-trained target detection models directly, so the video multi-target tracking problem reduces to a data association problem over the detection results. To obtain an optimal association result, the association cost and the optimization algorithm, the two key links of data association, have become the research focus of tracking-by-detection algorithms.
A preliminary multi-target tracking method is designed in the domestic patent "Multi-target tracking method, device, electronic equipment and storage medium" (application No. 202010573301.9), but it does not consider the frequent occlusion between targets in the scene, so trajectories break frequently. The domestic patent application No. 202010605987.5, titled "An integrated target detection and association pedestrian multi-target tracking method", proposes a model that performs target detection and target feature extraction simultaneously, but its target association step uses only a simple threshold discrimination method, so the method cannot obtain the optimal matching between targets in scenes where multiple similar targets appear at the same time.
The domestic patent "A vehicle multi-target tracking method based on a target center point" (application No. 20201059041.1) integrates a vehicle detection model and a tracking model in one network, which greatly reduces the amount of computation and the running time and simplifies the tracking-by-detection pipeline.
Accordingly, those skilled in the art are devoted to developing a multi-target tracking system and method based on joint detection and characterization extraction.
Disclosure of Invention
In view of the above drawbacks of the prior art, the technical problems to be solved by the present invention are: 1. how to improve the running speed of the algorithm while maintaining its accuracy; 2. how to organically combine the two major links of target detection and data association, and improve the tracking precision by comprehensively utilizing information.
In order to achieve the above purpose, the present invention provides a multi-target tracking system based on joint detection and characterization extraction, which is characterized by comprising a joint detection and characterization extraction module, a trajectory prediction module and a candidate frame screening module, wherein the joint detection and characterization extraction module is composed of a backbone network, a region selection network, a target boundary frame regressor and a characterization extractor.
Furthermore, the trajectory prediction module adopts a linear motion model, infers the possible position of the tracked target in the current video frame according to the motion information of the trajectory, and corrects the existing trajectory to reduce errors.
Furthermore, the candidate frame screening module adopts a non-maximum suppression algorithm with identity transfer, which can screen out the optimal candidate frames by confidence and simultaneously completes the data association of detection candidate frames and trajectories through identity transfer.
Further, the backbone network adopts a backbone network capable of extracting image features, and a feature pyramid network is established on the basis of the backbone network.
Further, the target bounding box regressor and the characterization extractor both adopt a deep neural network structure, and the target bounding box regressor uses a full-connection layer network.
A multi-target tracking method based on joint detection and characterization extraction comprises the following steps:
step 1, making the active track set and the inactive track set empty sets; inputting the video frame sequence frame by frame into the backbone network to obtain a feature table of the current frame image;
step 2, generating candidate frames according to the information in the feature table by using the trajectory prediction module and the RPN, bounding box regressor and characterization extractor in the joint detection and characterization extraction module;
step 3, screening the optimal candidate frames from the candidate frames by a non-maximum suppression method with identity transfer;
step 4, updating the tracks according to the screening result, including track generation, extension and deletion;
and step 5, if the current frame is not the last frame of the video, returning to step 1; otherwise, ending.
Further, the step 2 further comprises:
step 2.1, detecting a target in the image;
2.2, predicting the possible position of the track;
step 2.3, generating a candidate frame;
and 2.4, extracting the characterization vectors.
Further, step 3 adopts a non-maximum suppression method with identity transfer, and specifically includes:
step 3.1, clustering the input candidate frames according to the intersection over union (IoU) between target bounding boxes: candidate frames belonging to the same target are clustered into one class using their spatial relationship, and candidate frames not belonging to the same target are separated;
3.2, if a certain cluster in the clustering result contains a candidate frame with an identity label, transmitting the identity label of the candidate frame to all candidate frames in the cluster;
and 3.3, deleting the candidate box with the non-maximum confidence coefficient in each cluster, and only keeping the candidate box with the maximum confidence coefficient in the cluster.
Further, the step 4 further includes:
step 4.1, updating the track in the active track set;
step 4.2, comparing the characterization vectors between the inactive track set and the screening result, and performing a re-identification operation;
and 4.3, updating the tracks successfully re-identified in the inactive track set and adding them to the active track set; treating the screening results that fail re-identification as new targets, creating tracks for them, and adding those to the active track set.
Further, the adopted re-identification method is a short-term method based on Euclidean distance between the characterization vectors.
Technical effects
1. A joint detection and characterization extraction module is provided. It can both detect target positions in the image and extract target appearance characterizations for subsequent re-identification, greatly reducing the number of network parameters to be trained and the computational cost.
2. A candidate box generation module is designed. The module generates detection candidate boxes by searching for target positions in the current image, directionally generates trajectory candidate boxes corresponding to existing trajectories, and extracts target features within the candidate boxes, so that target positions in the image can be detected accurately, the subsequent data association step is greatly facilitated, and the algorithm efficiency is noticeably improved.
3. A candidate box screening module is designed. The module adopts a non-maximum suppression algorithm with identity transfer, which screens the most accurate target bounding boxes by a unified criterion and noticeably improves tracking precision; the association between candidate boxes and existing trajectories is completed efficiently through the identity transfer operation.
The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.
Drawings
FIG. 1 is a system flow diagram of a preferred embodiment of the present invention;
FIG. 2 is a block diagram of a joint detection and characterization extraction model according to a preferred embodiment of the present invention;
FIG. 3 is a schematic diagram of a regression network for the target bounding box according to a preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of a token extraction network in accordance with a preferred embodiment of the present invention.
Detailed Description
The technical contents of the preferred embodiments of the present invention will be more clearly and easily understood by referring to the drawings attached to the specification. The present invention may be embodied in many different forms of embodiments and the scope of the invention is not limited to the embodiments set forth herein.
As shown in fig. 1, a target tracking method based on a joint detection and characterization extraction module includes the following steps:
the first step is as follows: movement track setSet of inactivation trajectoriesConverting the video frame sequence I to { I ═ I0,i1,...,iT-1Inputting the frames into the main network of the module to obtain the characteristic table F of the current frame imaget;
The second step is that: based on the characteristic table FtThe candidate frame C is generated by the following four stepst。
2.1 detecting objects in the image. The RPN generates a reference bounding box on each pixel of the image and based on the feature table FtFrom which to find areas where there is a possibility of targets
2.2 predicting the possible positions of the trajectory. The track prediction module infers the possible position of the tracked target in the current video frame according to the motion information of the trackWith RPN outputDifferent, output of prediction moduleWith identity information of the corresponding track, which will be between the candidate box and the trackThe association of (a) provides convenience.
2.3 generating candidate boxes. Bounding box with low precisionAndand characteristic table FtInputting the candidate frame C into a target bounding box grouping module to obtain a candidate frame Ct=Dt+Bt. Therein called DtTo detect candidate boxes, BtIs a track candidate box. In this step, the process is carried out,will automatically pass on to Bt。
2.4 extracting the characterization vector. The step is prepared for a subsequent pedestrian re-identification link. Algorithm will frame candidates CtAnd characteristic table FtThe input is input into a characterization extractor of the module, and a characterization vector of each candidate box is calculated.
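The merging of the two candidate-box sources in step 2.3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, array layout, and the convention of using -1 for "no identity yet" are all assumptions.

```python
import numpy as np

def generate_candidates(det_boxes, track_boxes, track_ids):
    """Merge RPN detections (no identity yet) with predicted track boxes.

    det_boxes:   (N, 4) boxes [x1, y1, x2, y2] from the RPN
    track_boxes: (M, 4) boxes from the trajectory prediction module
    track_ids:   (M,) identity labels carried over from the trajectories
    """
    boxes = np.vstack([det_boxes, track_boxes]).astype(float)
    # -1 marks a detection candidate that carries no identity information
    ids = np.concatenate([np.full(len(det_boxes), -1, dtype=int),
                          np.asarray(track_ids, dtype=int)])
    return boxes, ids

det = np.array([[0, 0, 10, 10]])
trk = np.array([[20, 20, 30, 30]])
boxes, ids = generate_candidates(det, trk, [7])
```

The combined set then flows into the regressor and the screening step, where the identity labels carried by the trajectory boxes enable the identity transfer described below.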
The third step: and screening candidate frames.
Candidate frame C generated in the second steptComprises two parts: 1) detection candidate frame D from RPNt(ii) a 2) Trajectory candidate box B from the prediction modulet. Neither of these can be directly regarded as a tracking result of the current frame. Firstly, detecting that a candidate frame is not associated with a track, so that the candidate frame does not carry identity information; secondly, because the prediction accuracy of the prediction module is limited, the direct use of the trajectory candidate box will make the trajectory accuracy not high. The method adopts a non-maximum inhibition method with identity transfer from CtScreening out the optimal candidate frame C'tThe method comprises the following steps:
3.1 clustering. According to the intersection ratio (IoU) between the target bounding boxes, the candidate box set C is processedtAnd clustering, namely clustering the candidate frames belonging to the same target into one class by using the spatial relationship among the candidate frames, and distinguishing the candidate frames not belonging to the same target.
3.2 identity transfer. And if a certain cluster in the clustering result contains a candidate frame with the identity label, transmitting the identity label of the candidate frame to all candidate frames in the cluster.
3.3 inhibition. And deleting the candidate box with non-maximum confidence in each cluster, and only keeping the candidate box with the maximum confidence in the cluster.
This step is to mix CtScreening is optimal candidate frame C'tOf which is C't=D′t+B′t,D′tA bounding box, B ', representing the screening result without the identity information yet'tIs a bounding box with identity information.
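The three-step screening above (IoU clustering, identity transfer, suppression) can be sketched as follows. This is one hedged reading of the described procedure, not the patented implementation: the greedy clustering around the highest-confidence box, the 0.5 IoU threshold, and -1 as the "no identity" marker are all assumptions.

```python
import numpy as np

def iou(a, b):
    # a, b: boxes as [x1, y1, x2, y2]
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def nms_with_identity(boxes, scores, ids, iou_thresh=0.5):
    order = list(np.argsort(scores)[::-1])        # highest confidence first
    kept_boxes, kept_scores, kept_ids = [], [], []
    while order:
        best = order[0]
        # 3.1 clustering: boxes overlapping the current best form one cluster
        cluster = [j for j in order if iou(boxes[best], boxes[j]) >= iou_thresh]
        # 3.2 identity transfer: any labelled member lends its id to the cluster
        labelled = [ids[j] for j in cluster if ids[j] >= 0]
        kept_id = labelled[0] if labelled else -1
        # 3.3 suppression: keep only the highest-confidence member
        kept_boxes.append(boxes[best])
        kept_scores.append(scores[best])
        kept_ids.append(kept_id)
        order = [j for j in order if j not in cluster]
    return np.array(kept_boxes), np.array(kept_scores), np.array(kept_ids)

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
ids = np.array([-1, 3, -1])       # box 1 is a trajectory box with identity 3
b, s, i = nms_with_identity(boxes, scores, ids)
```

Here the high-confidence detection box inherits identity 3 from the overlapping trajectory box, so the association between detections and tracks falls out of the suppression step itself.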
The fourth step: and (4) track processing, including track generation, extension and deletion.
4.1 update the track. According to B'tThe corresponding track in T is updated according to the identity information and the position information in the T; deleting ' B ' in activity track set T 'tCorrelating the traces and adding them into the inactivation trace set T';
4.2 re-identification.
Because targets in the scene frequently occlude one another, a candidate box in D'_t, which carries no identity label in the screening result, may be a newly appearing target, but it may also belong to part of the trajectory of an occluded target. To reduce trajectory fragmentation while keeping the algorithm online and real-time, a short-term pedestrian re-identification method is adopted to judge whether a box in D'_t is an occluded target: first, the trajectories in the inactive trajectory set T' are retained for an additional T_s frames, during which the trajectory prediction module continues to predict their positions; then, the distance between the characterization vectors of D'_t and of the trajectories in T' is used to judge whether the two are the same target. To reduce the false re-identification rate, the following criteria are set: first, the distance between the two characterization vectors must be smaller than a threshold; second, the intersection over union between the two bounding boxes must be greater than a threshold.
After the re-identification step, the trajectories in the inactive trajectory set T' that are successfully re-identified are updated and added to the active trajectory set T. A candidate box in D'_t that fails re-identification is a newly appearing target; a new trajectory is created for it and added to the active trajectory set T.
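The dual-threshold re-identification criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary layout of the inactive tracks, the specific threshold values, and the function names are hypothetical.

```python
import numpy as np

def iou(a, b):
    # a, b: boxes as [x1, y1, x2, y2]
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def reidentify(cand_box, cand_vec, lost_tracks, dist_thresh=0.6, iou_thresh=0.3):
    """lost_tracks: {track_id: (predicted_box, characterization_vector)}.

    A lost track is matched only if BOTH criteria hold: small Euclidean
    distance between characterization vectors AND large IoU with the
    position predicted by the trajectory prediction module.
    """
    best_id, best_dist = -1, float("inf")
    for tid, (pred_box, vec) in lost_tracks.items():
        d = float(np.linalg.norm(np.asarray(cand_vec) - np.asarray(vec)))
        if d < dist_thresh and iou(cand_box, pred_box) > iou_thresh and d < best_dist:
            best_id, best_dist = tid, d
    return best_id  # -1 means: treat the candidate as a new target

lost = {4: ([0.0, 0.0, 10.0, 10.0], [1.0, 0.0])}
matched = reidentify([1.0, 1.0, 11.0, 11.0], [1.1, 0.0], lost)
```

Requiring both thresholds keeps the check cheap (no bipartite matching) while reducing false re-identifications, in line with the criteria set above.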
The fifth step: if the current frame is not the last frame of the video, returning to the first step; otherwise, ending.
As shown in fig. 2, the multi-target tracking system based on joint detection and characterization extraction uses the designed joint detection and characterization extraction module as its core skeleton, with a trajectory prediction module and a candidate box screening module added to complete the multi-target tracking task. The joint detection and characterization extraction module consists of a backbone network, a region selection network (RPN), a target bounding box regressor and a characterization extractor. The module can both detect target positions in the image and extract the characterization vector of each target.
The backbone network adopts any backbone capable of extracting image features, such as AlexNet, VGG, the ResNet series, the Inception series, the DenseNet series, the ResNeXt series, and so on. In addition, a Feature Pyramid Network (FPN) is built on top of the backbone, so that target positions can be detected accurately based on feature tables of different scales.
The region selection network adopts the RPN structure of Faster R-CNN, which searches the image for regions containing objects. The RPN first generates a large number of reference bounding boxes (anchors) at each pixel location in the image. Next, it looks up the features corresponding to each reference bounding box in the feature table and judges whether a target exists inside it; at the same time, a bounding box regression method makes the reference bounding box fit the actual position of the target as closely as possible. Typically, the RPN generates reference bounding boxes with aspect ratios of {1:2, 1:1, 2:1}. In practical applications, suitable aspect ratios can be chosen according to the characteristics of the targets of interest to improve accuracy and efficiency.
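One common way to realize the 1:2, 1:1, 2:1 aspect ratios mentioned above is to fix the anchor area and vary only the shape; the sketch below assumes that convention and a single hypothetical scale (32 pixels), neither of which is specified by the patent.

```python
import numpy as np

def make_anchors(cx, cy, scale=32.0, ratios=(0.5, 1.0, 2.0)):
    """Reference boxes centered at (cx, cy); each ratio r = height/width,
    with the area held fixed at scale**2 so only the shape varies."""
    anchors = []
    for r in ratios:
        w = scale / np.sqrt(r)
        h = scale * np.sqrt(r)
        anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)

a = make_anchors(0.0, 0.0)
```

In a full RPN this generation is repeated at every feature-map location, and the regression head then refines each anchor toward the true target position.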
As shown in fig. 3 and 4, both the target bounding box regressor and the characterization extractor adopt deep neural network structures. Deep neural networks have excellent fitting and feature representation capabilities and can effectively improve the accuracy of the algorithm. The target bounding box regressor shown in fig. 3 uses a 4-layer fully connected network (layers numbered 1 to 4). From the feature table and a bounding box with poor localization precision, it produces a more precisely localized bounding box and a corresponding confidence. The characterization extractor shown in fig. 4 uses a 3-layer fully connected network (layers numbered 5 to 7). It extracts the characterization vector of a target from the feature table and the target bounding box. The generated characterization vectors satisfy the following property: given a distance metric, the distance between the characterization vectors of the same target in different frames of the video is sufficiently small, while the distance between the characterization vectors of different targets is sufficiently large.
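A minimal numpy sketch of the two fully connected heads described above, using random untrained weights purely to show the layer structure and output shapes: a 4-layer regressor producing box offsets plus a confidence, and a 3-layer extractor producing a characterization vector. All layer widths (256-d input, 128-d embedding, 5-d regression output) are assumptions; the patent does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(dims):
    # random (untrained) weights, purely to illustrate the layer structure
    return [(rng.standard_normal((i, o)) * 0.01, np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def forward(layers, x):
    for k, (w, b) in enumerate(layers):
        x = x @ w + b
        if k < len(layers) - 1:
            x = np.maximum(x, 0.0)   # ReLU between fully connected layers
    return x

feat_dim = 256                                  # assumed RoI feature size
regressor = mlp([feat_dim, 256, 256, 256, 5])   # 4 FC layers: 4 box offsets + confidence
extractor = mlp([feat_dim, 256, 256, 128])      # 3 FC layers: characterization vector
x = rng.standard_normal((2, feat_dim))          # features of 2 candidate boxes
```

A trained version of the extractor would additionally be optimized with a metric-learning loss so that the distance property stated above holds.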
The trajectory prediction module infers the possible position of a tracked target in the current video frame from the motion information of its trajectory and corrects the existing trajectory to reduce errors. This effectively reduces the search space and improves tracking precision. The trajectory prediction module predicts the most likely position of each trajectory at the current time based on a linear motion model.
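A constant-velocity sketch of such a linear motion model: the next box center is extrapolated from the last two observed centers, and the box size is carried over unchanged. The fixed-size assumption and the function name are illustrative, not taken from the patent.

```python
def predict_next(track_boxes):
    """track_boxes: list of [x1, y1, x2, y2], ordered oldest to newest.
    Extrapolate the center with constant velocity; keep the latest size."""
    (px1, py1, px2, py2), (x1, y1, x2, y2) = track_boxes[-2], track_boxes[-1]
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    vx = cx - (px1 + px2) / 2          # per-frame center displacement
    vy = cy - (py1 + py2) / 2
    w, h = x2 - x1, y2 - y1
    ncx, ncy = cx + vx, cy + vy
    return [ncx - w / 2, ncy - h / 2, ncx + w / 2, ncy + h / 2]
```

For a target moving 5 pixels per frame along x, the predicted box simply continues that motion, which is exactly what seeds the trajectory candidate boxes B_t in step 2.2.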
The candidate box screening module employs a non-maximum suppression algorithm with identity transfer. Unlike ordinary non-maximum suppression, suppression with identity transfer is performed after clustering: if a cluster in the clustering result contains a candidate box with an identity label, that identity label is transferred to all candidate boxes in the cluster. The module can screen out the optimal candidate boxes by confidence and completes the data association between detection candidate boxes and trajectories through identity transfer, thereby avoiding complex similarity measurement and bipartite graph assignment procedures.
The embodiment of the application also provides electronic equipment which comprises a processor and a memory.
The memory is used for storing a computer program;
the processor is used for realizing any one of the multi-target tracking methods when executing the program stored in the memory.
Embodiments of the present application may also provide a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements any one of the multi-target tracking methods described above.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.
Claims (10)
1. The multi-target tracking system based on joint detection and characterization extraction is characterized by comprising a joint detection and characterization extraction module, a trajectory prediction module and a candidate frame screening module, wherein the joint detection and characterization extraction module is composed of a backbone network, a region selection network, a target boundary frame regressor and a characterization extractor.
2. The multi-target tracking system based on joint detection and feature extraction as claimed in claim 1, wherein the trajectory prediction module uses a linear motion model to infer the possible position of the tracked target in the current video frame according to the motion information of the trajectory, and corrects the existing trajectory to reduce the error.
3. The multi-target tracking system based on joint detection and characterization extraction according to claim 1, wherein the candidate frame screening module adopts a non-maximum suppression algorithm with identity transfer, which can screen out the optimal candidate frames by confidence and simultaneously completes the data association of detection candidate frames and trajectories through identity transfer.
4. The multi-target tracking system based on joint detection and feature extraction as claimed in claim 1, wherein the backbone network adopts a backbone network capable of extracting image features, and a feature pyramid network is established on the basis of the backbone network.
5. The joint detection and characterization extraction based multi-target tracking system of claim 1 wherein the target bounding box regressor and the characterization extractor both employ a deep neural network structure, the target bounding box regressor using a full connectivity layer network.
6. A multi-target tracking method based on joint detection and characterization extraction is characterized by comprising the following steps:
step 1, making an active track set as an empty set and an inactive track set as an empty set; inputting the video frame sequence into the backbone network frame by frame to obtain a feature table of the current frame image;
step 2, generating a candidate frame by utilizing the functions of the RPN, the boundary frame regressor, the characterization extractor and the like in the track prediction module and the joint detection and characterization extraction module according to the information in the feature table;
step 3, screening out the optimal candidate frames from the candidate frames by adopting a non-maximum suppression method with identity transfer;
step 4, updating the track according to the screening result, including track generation, extension and deletion;
and 5, if the current frame is not the last frame of the video, returning to the first step, and if not, ending.
7. The multi-target tracking method based on joint detection and characterization extraction as claimed in claim 6, wherein said step 2 further comprises:
step 2.1, detecting a target in the image;
2.2, predicting the possible position of the track;
step 2.3, generating a candidate frame;
and 2.4, extracting the characterization vectors.
8. The multi-target tracking method based on joint detection and characterization extraction according to claim 6, wherein the step 3 adopts a non-maximum suppression method with identity transfer, specifically comprising:
step 3.1, clustering the input candidate frames according to the intersection over union (IoU) between target bounding boxes: candidate frames belonging to the same target are clustered into one class using their spatial relationship, and candidate frames not belonging to the same target are separated;
3.2, if a certain cluster in the clustering result contains a candidate frame with an identity label, transmitting the identity label of the candidate frame to all candidate frames in the cluster;
and 3.3, deleting the candidate box with the non-maximum confidence coefficient in each cluster, and only keeping the candidate box with the maximum confidence coefficient in the cluster.
9. The multi-target tracking method based on joint detection and characterization extraction as claimed in claim 6, wherein said step 4 further comprises:
step 4.1, updating the track in the active track set;
4.2, comparing the characteristics between the inactivation track set and the screening result, and carrying out re-identification operation;
and 4.3, updating the tracks successfully re-identified in the inactivation track set, adding the tracks into the active track set, taking the screening result of the re-identification failure as a new target, creating tracks for the tracks and adding the tracks into the active track set.
10. The multi-target tracking method based on joint detection and feature extraction as claimed in claim 9, wherein the re-recognition method is a short-term one based on Euclidean distance between feature vectors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011510839.1A CN112734800A (en) | 2020-12-18 | 2020-12-18 | Multi-target tracking system and method based on joint detection and characterization extraction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112734800A true CN112734800A (en) | 2021-04-30 |
Family
ID=75603418
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011510839.1A Pending CN112734800A (en) | 2020-12-18 | 2020-12-18 | Multi-target tracking system and method based on joint detection and characterization extraction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112734800A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113469118A (en) * | 2021-07-20 | 2021-10-01 | 京东科技控股股份有限公司 | Multi-target pedestrian tracking method and device, electronic equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109919974A (en) * | 2019-02-21 | 2019-06-21 | University of Shanghai for Science and Technology | Online multi-object tracking method based on multi-candidate association in the R-FCN framework |
CN110991272A (en) * | 2019-11-18 | 2020-04-10 | Northeastern University | Multi-target vehicle track identification method based on video tracking |
CN111080673A (en) * | 2019-12-10 | 2020-04-28 | Tsinghua Shenzhen International Graduate School | Anti-occlusion target tracking method |
CN111126152A (en) * | 2019-11-25 | 2020-05-08 | State Grid Info-Telecom Yili Technology Co., Ltd. | Video-based multi-target pedestrian detection and tracking method |
CN111476116A (en) * | 2020-03-24 | 2020-07-31 | Nanjing New Generation Artificial Intelligence Research Institute Co., Ltd. | Rotor unmanned aerial vehicle system for vehicle detection and tracking and detection and tracking method |
CN111639551A (en) * | 2020-05-12 | 2020-09-08 | Huazhong University of Science and Technology | Online multi-target tracking method and system based on Siamese network and long- and short-term cues |
History
- 2020-12-18: Application CN202011510839.1A filed (CN); published as CN112734800A, status Pending
Non-Patent Citations (1)
Title |
---|
Pei Mingtao (裴明涛): "Video Event Analysis and Understanding" (视频事件分析与理解), Beijing Institute of Technology Press * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Marvasti-Zadeh et al. | Deep learning for visual tracking: A comprehensive survey | |
CN109858390B (en) | Human skeleton behavior identification method based on end-to-end space-time diagram learning neural network | |
US20220383535A1 (en) | Object Tracking Method and Device, Electronic Device, and Computer-Readable Storage Medium | |
Han et al. | Adaptive discriminative deep correlation filter for visual object tracking | |
US20220254157A1 (en) | Video 2D Multi-Person Pose Estimation Using Multi-Frame Refinement and Optimization | |
Kim et al. | CDT: Cooperative detection and tracking for tracing multiple objects in video sequences | |
Shuai et al. | Multi-object tracking with siamese track-rcnn | |
Li et al. | When object detection meets knowledge distillation: A survey | |
Chang et al. | Fast Random-Forest-Based Human Pose Estimation Using a Multi-scale and Cascade Approach | |
CN114898403A (en) | Pedestrian multi-target tracking method based on Attention-JDE network | |
Wu et al. | FSANet: Feature-and-spatial-aligned network for tiny object detection in remote sensing images | |
Urdiales et al. | An improved deep learning architecture for multi-object tracking systems | |
Pang et al. | Analysis of computer vision applied in martial arts | |
CN114926859A (en) | Pedestrian multi-target tracking method in dense scene combined with head tracking | |
Song et al. | Detection and tracking of safety helmet based on DeepSort and YOLOv5 | |
Zhang et al. | Residual memory inference network for regression tracking with weighted gradient harmonized loss | |
Gu et al. | Real-time streaming perception system for autonomous driving | |
CN112734800A (en) | Multi-target tracking system and method based on joint detection and characterization extraction | |
Zhang et al. | Action detection with two-stream enhanced detector | |
Yang et al. | Explorations on visual localization from active to passive | |
CN116245913A (en) | Multi-target tracking method based on hierarchical context guidance | |
Yi et al. | Single online visual object tracking with enhanced tracking and detection learning | |
Nalaie et al. | AttTrack: Online deep attention transfer for multi-object tracking | |
Narmadha et al. | Robust Deep Transfer Learning Based Object Detection and Tracking Approach. | |
Dou et al. | Boosting cnn-based pedestrian detection via 3d lidar fusion in autonomous driving |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||