CN116109673A - Multi-frame trajectory tracking system and method based on pedestrian pose estimation - Google Patents

Multi-frame trajectory tracking system and method based on pedestrian pose estimation

Info

Publication number
CN116109673A
CN116109673A (application CN202310095186.2A)
Authority
CN
China
Prior art keywords
frame
detection
tracking
frame image
pose
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310095186.2A
Other languages
Chinese (zh)
Inventor
田炜
高众
艾文瑾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN202310095186.2A
Publication of CN116109673A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/62 - Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 - Proximity, similarity or dissimilarity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30196 - Human being; Person
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30241 - Trajectory
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a multi-frame trajectory tracking system and method based on pedestrian pose estimation. The system is built on a Tracking by Object Detection framework: pose keypoint detection is added on top of single-frame object detection, and the keypoint information is introduced into tracking, so that tracking proceeds through a joint target-and-pose detection paradigm. The method comprises the following steps: a single-frame image is fed through feature extraction into a detector, which outputs detection confidences and detection box coordinates; the pose of the pedestrian in each detection box is predicted separately; a tracker is initialized from the single-frame model output corresponding to the first frame image of the video; and, on top of detection-box matching and pose-based matching, tracking is further optimized using the detector's reference points, so that target associations between two frame images are established from joint judgments on pose and detector reference points. Compared with the prior art, the invention optimizes the overall tracking effect, improves detection and association performance in scenes with occlusion and motion, and effectively improves the tracking result.

Description

Multi-frame trajectory tracking system and method based on pedestrian pose estimation
Technical Field
The invention relates to the technical field of automatic driving, in particular to a multi-frame trajectory tracking system and method based on pedestrian pose estimation.
Background
Autonomous driving is one of the major trends in the development of the automotive industry in recent years, and its corresponding detection and control technologies have become a current research hotspot.
In addition to the vehicles coming and going, a common traffic environment inevitably contains a considerable number of pedestrians, so pedestrian detection is an unavoidable link in automatic driving technology. How to detect and track human body pose in an automatic driving environment with a vehicle-mounted visual perception system has therefore become an important problem in this field. Traditional algorithms use hand-crafted features and complex human models to obtain local representations and global pose structure; given the complexity of the human body, more and more models now use deep learning methods to extract the relevant features.
In recent years, deep learning algorithms have developed rapidly, and a large number of efficient models and well-curated datasets have been published, enabling multi-target understanding and tracking based on human pose estimation. However, most existing methods analyze tracking trajectories directly from the detector output. Although many powerful detector models perform excellently, simply tracking from detector results degrades the overall association performance in scenes with occlusion and motion, leading to a poor actual tracking effect.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a multi-frame trajectory tracking method based on pedestrian pose estimation, which effectively improves the tracking effect by integrating human pose information on top of an existing detector.
The aim of the invention can be achieved by the following technical scheme: a multi-frame trajectory tracking system based on pedestrian pose estimation, implemented on a Tracking by Object Detection framework, in which pose keypoint detection is added on the basis of single-frame object detection and the keypoint information is introduced into tracking, so that tracking is performed through a joint target-and-pose detection paradigm; the system comprises a single-frame model and a tracker connected in sequence, the single-frame model being connected to a vehicle-mounted camera so as to obtain single-frame images from the captured video data, and containing a detector and a multi-person pose estimation module, the detector being used for outputting the detection confidences and detection boxes of all pedestrian targets in a single-frame image;
the multi-person pose estimation module is used for performing 2D human pose estimation on each pedestrian target and outputting the corresponding 2D pose keypoint coordinates;
the tracker is used for tracking and matching the pedestrian targets in the current frame image and the previous frame image according to the output data of the single-frame model, and for synchronously updating its own parameters.
Further, the detector specifically adopts the Transformer-based Deformable DETR framework.
A multi-frame trajectory tracking method based on pedestrian pose estimation comprises the following steps:
S1, extracting a single-frame image from the video data acquired by a vehicle-mounted camera and inputting it into the single-frame model;
S2, the single-frame model processing the input single-frame image and outputting the detection confidences, detection boxes and 2D pose keypoint coordinates of all pedestrian targets in the image;
S3, initializing the parameters of the tracker according to the single-frame model output data corresponding to the first frame image in the video data, after which the single-frame model keeps updating its output data to the tracker;
S4, the tracker performing tracking matching on the pedestrian targets in the current frame image and the previous frame image, outputting the tracking result, and synchronously updating its own parameters. A sketch of this loop is given below.
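For illustration only, a minimal Python sketch of the S1-S4 loop follows; SingleFrameModel and Tracker are hypothetical stand-ins for the detector-plus-pose module and the tracker, and their names and interfaces are assumptions, not part of the original disclosure:

    # Sketch of the S1-S4 loop. single_frame_model and tracker are hypothetical
    # callables standing in for the components described above.
    import cv2  # assumed available for reading the on-board camera stream

    def run_tracking(video_path, single_frame_model, tracker):
        cap = cv2.VideoCapture(video_path)
        initialized = False
        while True:
            ok, frame = cap.read()                      # S1: extract a single frame
            if not ok:
                break
            confs, boxes, keypoints = single_frame_model(frame)       # S2
            if not initialized:
                tracker.initialize(confs, boxes, keypoints)           # S3: first frame
                initialized = True
            else:
                yield tracker.match_and_update(confs, boxes, keypoints)  # S4
        cap.release()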
Further, the step S2 specifically comprises the following steps:
S21, inputting the single-frame image into the detector of the single-frame model, which outputs the detection confidences and detection boxes corresponding to all pedestrian targets in the image;
S22, according to the data output by the detector, the multi-person pose estimation module of the single-frame model performing 2D pose estimation on each pedestrian target in the image and outputting the corresponding 2D pose keypoint coordinates.
Further, initializing the parameters of the tracker in step S3 specifically means initializing the following parameters of the tracker:
the detection confidence, the detection box, the 2D pose keypoints and the track ID, wherein the detection confidence, detection box and 2D pose keypoints are taken from the single-frame model output data corresponding to the first frame image in the video data, and the track IDs are non-repeating identifiers assigned starting from 0.
Further, the specific process by which the tracker in step S4 performs tracking matching between the pedestrian targets of the current frame image and of the previous frame image is as follows:
S41, from the single-frame model data corresponding to the current frame image and to the previous frame image, respectively calculating the detection-box matching degree, the keypoint similarity and the reference-point-based matching score, and summing the three results to obtain the final matching score matrix;
S42, determining the trajectory similarity between the current frame image and the previous frame image from the final matching score matrix; if the trajectory similarity exceeds the corresponding preset threshold, judging that the target in the previous frame image has found its matching object in the current frame image, i.e. the matching succeeds;
otherwise, judging that the target in the previous frame image has no matching object in the current frame image, i.e. the matching fails.
Further, the matching degree of the detection frame is specifically:
$$\mathrm{GIoU}(A,B)=\frac{|A\cap B|}{|A\cup B|}-\frac{|C\setminus(A\cup B)|}{|C|}$$
wherein A and B correspond to the areas occupied by the two detection boxes respectively, and C is the area occupied by the minimum rectangle circumscribing A and B.
Further, the key point similarity is specifically:
$$\mathrm{OKS}=\frac{\sum_i \exp\left(-d_i^{2}/(2S^{2}\kappa_i^{2})\right)\,\delta(v_i>0)}{\sum_i \delta(v_i>0)},\qquad d_i=\sqrt{(x_i-\hat{x}_i)^{2}+(y_i-\hat{y}_i)^{2}},\qquad S=\sqrt{(x_2-x_1)(y_2-y_1)}$$
wherein d_i is the Euclidean distance between corresponding keypoints, S is the size of the object, x and y are the keypoint coordinate values, (x_1, y_1) and (x_2, y_2) are the two vertex coordinates on the diagonal of the object truth box, κ_i is a per-keypoint normalization constant, and δ(v_i>0) selects the labeled keypoints.
Further, the calculation of the reference-point-based matching score comprises the following steps:
1) performing a primary matching of the reference points according to the embedding features and the detection boxes;
2) rearranging the embedding features according to the order given by the primary matching result of the reference points, obtaining the reference point order of the current frame;
3) obtaining the offsets of the corresponding reference points with a group of multi-layer perceptrons, yielding the reference point coordinates required by the tracking branch;
4) rearranging the reference points of the previous frame according to the current frame and feeding them to the decoder of the current frame to obtain the reference-point-based matching score.
Further, the specific process of updating the tracker's own parameters in step S4 is as follows:
if the matching succeeds, the confidence, detection box and 2D keypoint parameters currently stored in the tracker are updated with the single-frame model data corresponding to the current frame, and the successfully matched target is kept in the activated state;
if the matching fails, the state of the corresponding target is set to suspended and a suspension counter is incremented by one; when a matching object for the target is found again in a subsequent frame image, the suspension counter is cleared, and if the suspension counter exceeds a preset threshold, tracking of the target is closed.
Compared with the prior art, the invention has the following advantages:
1. The invention builds on a Tracking-by-Detection framework, adds pose keypoint detection on the basis of single-frame object detection, and introduces the pose information into tracking, thereby constructing a multi-frame trajectory tracking system comprising a single-frame model and a tracker. The single-frame model contains a detector and a multi-person pose estimation module: the detector predicts the detection box in which each pedestrian is located, and the multi-person 2D pose estimation module predicts each person's pose separately and outputs the coordinates of the 2D human keypoints. The tracker is initialized with the single-frame model output of the first frame image of the video; it then establishes associations between two frame images and updates the tracking trajectories by jointly using the pedestrians' predicted detection boxes and keypoint coordinates, so that the tracking result is optimized and the tracking effect is effectively improved.
2. The detector in the single-frame model adopts the Transformer-based Deformable DETR framework and outputs per-person detection confidences and detection boxes. The feature extraction capability of Deformable DETR can thus be fully utilized, and the prior structural information of the human body is mined from the sampling points in Deformable DETR, realizing a reference-point-based tracking optimization structure that improves the tracker's overall detection and association performance in scenes with occlusion and motion.
3. According to the invention, the tracker performs tracking matching between the targets of the current and previous frames by calculating the detection-box matching degree, the keypoint similarity and the reference-point-based matching score from the single-frame model output, and synchronously updates the tracker parameters based on the matching result. This guarantees real-time optimization of the tracker parameters and fully ensures trajectory tracking accuracy.
4. On top of detection-box-based matching and pose-based matching, the tracker further performs tracking optimization based on the detector's reference points: by calculating a reference-point-based matching score, target associations between two frames are established from joint judgments on pose and detector reference points. Compared with methods that track trajectories directly from the detector output, this greatly optimizes the overall tracking effect and improves detection and association performance in scenes with occlusion and motion.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a process by which a tracker calculates a reference point based matching score.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples.
Examples
The multi-frame trajectory tracking system based on pedestrian pose estimation is implemented on a Tracking by Object Detection framework: pose keypoint detection is added on the basis of single-frame object detection, and the keypoint information is introduced into tracking, i.e. a Tracking by Object and Pose Detection framework is constructed, in which tracking is performed through a joint target-and-pose detection paradigm. Specifically, the system comprises a single-frame model and a tracker connected in sequence, the single-frame model being connected to a vehicle-mounted camera so as to obtain single-frame images from the video data it captures;
a detector and a multi-person pose estimation module are arranged in the single-frame model: the detector is used for outputting the detection confidences and detection boxes corresponding to all pedestrian targets in the single-frame image, and the multi-person pose estimation module is used for performing 2D human pose estimation on each pedestrian target and outputting the corresponding 2D pose keypoint coordinates;
the tracker is used for tracking and matching the pedestrian targets in the current frame image and the previous frame image according to the output data of the single-frame model, and for synchronously updating its own parameters.
In this embodiment, the detector specifically employs the Transformer-based Deformable DETR framework and outputs per-person detection confidences and detection boxes. Deformable DETR can provide the human detection boxes and corresponding feature extraction for the tracking task, but it does not consider the prior structural information of the human body; the present scheme therefore adds a multi-person pose estimation module dedicated to human pose information. The module adopts a top-down scheme: it performs 2D human pose estimation on each individual according to the human detection boxes predicted by the detector, and takes the 2D keypoint coordinates as output, as in the sketch below.
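A minimal sketch of that top-down crop-and-regress flow, assuming pose_net is some single-person 2D pose network available as a callable; its name, signature and the input size are placeholders, not part of the disclosure:

    import numpy as np

    def estimate_poses(image, boxes, pose_net, input_size=(192, 256)):
        """Top-down 2D pose estimation: crop each detected pedestrian box, run a
        single-person pose network on the crop, then map the predicted keypoints
        back into full-image coordinates."""
        w_in, h_in = input_size
        all_keypoints = []
        for (x1, y1, x2, y2) in boxes.astype(int):
            crop = image[y1:y2, x1:x2]                 # per-person crop from its box
            kpts = pose_net(crop, input_size)          # assumed: (K, 2) in crop coords
            scale = np.array([(x2 - x1) / w_in, (y2 - y1) / h_in])
            all_keypoints.append(kpts * scale + np.array([x1, y1]))
        return all_keypoints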
Since relevant human pose labels are needed, this embodiment trains the pose-estimation-based tracker in advance on PoseTrack, a multi-frame image dataset annotated with human skeleton keypoints, so that detection boxes and joint keypoints can be integrated for multi-frame tracking.
Applying this system, a multi-frame trajectory tracking method based on pedestrian pose estimation is realized, as shown in FIG. 1, comprising the following steps:
S1, extracting a single-frame image from the video data acquired by the vehicle-mounted camera and inputting it into the single-frame model;
S2, the single-frame model processing the input single-frame image and outputting the detection confidences, detection boxes and 2D pose keypoint coordinates of all pedestrian targets in the image;
S3, initializing the parameters of the tracker according to the single-frame model output data corresponding to the first frame image in the video data, after which the single-frame model keeps updating its output data to the tracker; the initialized parameters comprise the detection confidence, the detection box, the 2D pose keypoints and the track ID, where the track IDs are non-repeating identifiers assigned starting from 0 and the other three correspond to the single-frame model output for the first frame image, as in the sketch below;
s4, the tracker performs tracking matching on pedestrian targets in the current frame image and the previous frame image (specifically, performs tracking matching on three information of the matching degree of the detection frame, the similarity of the key points and the matching score of the tracking optimization module based on the reference points), and synchronously updates the parameters of the tracker;
wherein the matching degree score based on the detection boxes is calculated as
$$\mathrm{GIoU}(A,B)=\frac{|A\cap B|}{|A\cup B|}-\frac{|C\setminus(A\cup B)|}{|C|}$$
wherein A and B correspond to the areas occupied by the two detection boxes, and C is the area occupied by the minimum rectangle circumscribing A and B;
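A sketch of this score; writing it in GIoU form is an inference from the A/B/C description above, since the original formula appears only as an image:

    def box_match_score(a, b):
        """Detection-box matching degree in the GIoU form implied by the text.
        Boxes are (x1, y1, x2, y2); C is the minimal rectangle enclosing both."""
        ax1, ay1, ax2, ay2 = a
        bx1, by1, bx2, by2 = b
        inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
        inter = inter_w * inter_h
        union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
        c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
        iou = inter / union if union > 0 else 0.0
        return iou - (c_area - union) / c_area if c_area > 0 else iou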
the similarity score based on keypoints is calculated as
$$\mathrm{OKS}=\frac{\sum_i \exp\left(-d_i^{2}/(2S^{2}\kappa_i^{2})\right)\,\delta(v_i>0)}{\sum_i \delta(v_i>0)},\qquad d_i=\sqrt{(x_i-\hat{x}_i)^{2}+(y_i-\hat{y}_i)^{2}},\qquad S=\sqrt{(x_2-x_1)(y_2-y_1)}$$
where d_i denotes the Euclidean distance between corresponding keypoints, S the size of the object, x and y the keypoint coordinate values, and (x_1, y_1), (x_2, y_2) the two vertex coordinates on the diagonal of the object truth box;
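A sketch of this keypoint similarity in the OKS style the definitions imply; the per-keypoint falloff constant kappa is an assumed value, as the original formula appears only as an image:

    import numpy as np

    def keypoint_similarity(kpts_a, kpts_b, truth_box, kappa=0.1):
        """Keypoint similarity: d is the per-keypoint Euclidean distance and S
        the object size derived from the truth box's diagonal vertices."""
        x1, y1, x2, y2 = truth_box
        s = np.sqrt(abs((x2 - x1) * (y2 - y1)))        # object size S
        d = np.linalg.norm(np.asarray(kpts_a) - np.asarray(kpts_b), axis=1)
        return float(np.mean(np.exp(-d ** 2 / (2 * (s * kappa) ** 2))))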
as shown in fig. 2, the multi-frame tracking optimization module flow based on the reference point is as follows:
1) Performing primary matching on the reference points according to the characteristics and the detection frame;
2) Rearranging the sounding features according to the sequence of the primary matching result of the reference points to obtain the sequence of the reference points of the current frame;
3) Calculating a group of MLPs to obtain offset of a corresponding reference point, and obtaining reference point coordinates required by a tracking branch;
4) Rearranging the reference point sequence of the previous frame according to the current frame and sending the reference point sequence into a decoder of the current frame, so that a matching result based on the reference point can be obtained;
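A sketch of steps 2 and 3 under assumptions: the text states only that a group of multi-layer perceptrons produces the offsets, so the two-layer shape and layer sizes here are illustrative:

    import torch
    import torch.nn as nn

    class ReferencePointOffset(nn.Module):
        """Rearranges embeddings by the primary-match order (step 2), then a
        small MLP maps each embedding to a (dx, dy) offset added to its
        reference point (step 3), giving the coordinates for the tracking branch."""
        def __init__(self, embed_dim=256, hidden_dim=256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(embed_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, 2),
            )

        def forward(self, embeddings, ref_points, match_order):
            emb = embeddings[match_order]     # step 2: reorder by primary matching
            ref = ref_points[match_order]
            return ref + self.mlp(emb)        # step 3: offset the reference points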
the primary matching result of the reference points is as follows:
for the ith detection result of a given T-th frame,
Figure BDA0004071507920000066
representing the corresponding ebedding feature, +.>
Figure BDA0004071507920000067
Reference point for DETR, +.>
Figure BDA0004071507920000068
A predicted detection frame size;
Figure BDA0004071507920000071
Figure BDA0004071507920000072
and obtaining a reference point matching result between two continuous frames of images by using a Hungary algorithm through the weighted sum of the two distance matrixes.
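A sketch of that primary matching under stated assumptions: the exact distance forms and weights are not specified beyond "the weighted sum of the two distance matrices", so w_emb and w_ref are illustrative:

    import numpy as np
    from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

    def primary_match(emb_t, ref_t, emb_prev, ref_prev, w_emb=1.0, w_ref=1.0):
        """Build the embedding-feature and reference-point distance matrices
        between consecutive frames and solve their weighted sum with the
        Hungarian algorithm."""
        d_emb = np.linalg.norm(emb_t[:, None, :] - emb_prev[None, :, :], axis=-1)
        d_ref = np.linalg.norm(ref_t[:, None, :] - ref_prev[None, :, :], axis=-1)
        cost = w_emb * d_emb + w_ref * d_ref
        rows, cols = linear_sum_assignment(cost)   # minimal-cost one-to-one match
        return rows, cols, cost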
The three calculation results are accumulated to obtain the final matching score matrix, from which the trajectory similarity between the current frame image and the previous frame image is determined. If the trajectory similarity exceeds the corresponding preset threshold, it is judged that the target in the previous frame image has found its matching object in the current frame image, i.e. the matching succeeds; otherwise, it is judged that the target in the previous frame image has no matching object in the current frame image, i.e. the matching fails. A sketch of this decision follows.
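A simplified greedy sketch of that decision; the threshold value is assumed, and a strict one-to-one assignment could instead reuse the Hungarian step above:

    import numpy as np

    def associate(box_scores, kpt_scores, ref_scores, threshold=0.5):
        """Accumulate the three score matrices (rows: previous-frame targets,
        columns: current-frame detections) and accept a match only when the
        best accumulated similarity exceeds the preset threshold."""
        total = box_scores + kpt_scores + ref_scores
        matches, unmatched_prev = {}, []
        for i in range(total.shape[0]):
            j = int(np.argmax(total[i]))
            if total[i, j] > threshold:
                matches[i] = j              # target i found its matching object
            else:
                unmatched_prev.append(i)    # matching fails for target i
        return matches, unmatched_prev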
In addition, when the tracker updates its own parameters, it follows these rules (sketched below):
1) For trajectories matched successfully in the current frame: the confidence, detection box, 2D keypoints and other data stored in the tracker are updated with the current frame's results, and the trajectory is kept in the activated state.
2) For trajectories with no matching object in the current frame: the state is set to suspended and the suspension counter is incremented by one; the counter is cleared if a matching object is found again, and the trajectory is closed if the suspension counter exceeds the threshold.
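A minimal sketch of these update rules, reusing the Track record from the initialization sketch; the suspension threshold max_suspend is an assumed value:

    def update_tracks(tracks, matches, detections, max_suspend=30):
        """Matched tracks refresh their stored data and stay active; unmatched
        tracks are suspended and closed once the counter passes the threshold."""
        alive = []
        for i, track in enumerate(tracks):
            if i in matches:
                conf, box, kpts = detections[matches[i]]
                track.confidence, track.box, track.keypoints = conf, box, kpts
                track.active = True
                track.suspend_count = 0     # counter cleared on a new match
                alive.append(track)
            else:
                track.active = False        # state turns to suspended
                track.suspend_count += 1
                if track.suspend_count <= max_suspend:
                    alive.append(track)     # otherwise the trajectory is closed
        return alive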
In summary, this technical scheme recognizes that multi-target tracking can generally be divided into two subtasks, detection and association, and that existing methods analyze tracking trajectories directly from the detector output, which degrades the overall detection and association performance in scenes with occlusion and motion. A detector can provide the human detection boxes and corresponding feature extraction for the tracking task, but it does not consider the prior structural information of the human body; this scheme therefore adds a multi-person pose estimation module to emphasize human pose information, so that the improved framework detects pedestrians and their pose keypoints simultaneously and integrates the human pose information. By exploiting joint judgments on pose and detection-box reference points, a definite association can be established between two frames instead of isolated detections, which effectively optimizes the tracking result.

Claims (10)

1. A multi-frame trajectory tracking system based on pedestrian pose estimation, characterized in that it adopts a Tracking by Object Detection framework, adds pose keypoint detection on the basis of single-frame object detection, and introduces the keypoint information into tracking so as to track through a joint target-and-pose detection paradigm; the system comprises a single-frame model and a tracker connected in sequence, the single-frame model being connected to a vehicle-mounted camera so as to obtain single-frame images from the video data acquired by the vehicle-mounted camera; a detector and a multi-person pose estimation module are arranged in the single-frame model, the detector being used for outputting the detection confidences and detection boxes corresponding to all pedestrian targets in the single-frame image;
the multi-person pose estimation module is used for performing 2D human pose estimation on the pedestrian targets and outputting the corresponding 2D pose keypoint coordinates;
the tracker is used for tracking and matching the pedestrian targets in the current frame image and the previous frame image according to the output data of the single-frame model, and for synchronously updating its own parameters.
2. The multi-frame trajectory tracking system based on pedestrian pose estimation according to claim 1, wherein the detector specifically adopts the Transformer-based Deformable DETR framework.
3. A multi-frame trajectory tracking method based on pedestrian pose estimation, characterized by comprising the following steps:
S1, extracting a single-frame image from video data acquired by a vehicle-mounted camera and inputting it into a single-frame model;
S2, the single-frame model processing the input single-frame image and outputting the detection confidences, detection boxes and 2D pose keypoint coordinates of all pedestrian targets in the single-frame image;
S3, initializing the parameters of a tracker according to the single-frame model output data corresponding to the first frame image in the video data, the single-frame model then updating its output data to the tracker;
S4, the tracker performing tracking matching on the pedestrian targets in the current frame image and the previous frame image, outputting the tracking result, and synchronously updating its own parameters.
4. The multi-frame trajectory tracking method based on pedestrian pose estimation according to claim 3, wherein said step S2 specifically comprises the following steps:
S21, inputting the single-frame image into the detector of the single-frame model, which outputs the detection confidences and detection boxes corresponding to all pedestrian targets in the single-frame image;
S22, according to the data output by the detector, the multi-person pose estimation module of the single-frame model performing 2D pose estimation on each pedestrian target in the single-frame image and outputting the corresponding 2D pose keypoint coordinates.
5. The multi-frame trajectory tracking method based on pedestrian pose estimation according to claim 3, wherein initializing the parameters of the tracker in step S3 specifically means initializing the following parameters of the tracker:
the detection confidence, the detection box, the 2D pose keypoints and the track ID, wherein the detection confidence, detection box and 2D pose keypoints correspond to the single-frame model output data for the first frame image in the video data, and the track IDs are non-repeating identifiers assigned starting from 0.
6. The multi-frame trajectory tracking method based on pedestrian pose estimation according to claim 3, wherein the specific process by which the tracker in step S4 performs tracking matching between the pedestrian targets of the current frame image and of the previous frame image is as follows:
S41, from the single-frame model data corresponding to the current frame image and to the previous frame image, respectively calculating the detection-box matching degree, the keypoint similarity and the reference-point-based matching score, and summing the three results to obtain the final matching score matrix;
S42, determining the trajectory similarity between the current frame image and the previous frame image from the final matching score matrix; if the trajectory similarity exceeds the corresponding preset threshold, judging that the target in the previous frame image has found its matching object in the current frame image, i.e. the matching succeeds;
otherwise, judging that the target in the previous frame image has no matching object in the current frame image, i.e. the matching fails.
7. The multi-frame trajectory tracking method based on pedestrian pose estimation according to claim 6, wherein the detection-box matching degree is specifically:
$$\mathrm{GIoU}(A,B)=\frac{|A\cap B|}{|A\cup B|}-\frac{|C\setminus(A\cup B)|}{|C|}$$
wherein A and B correspond to the areas occupied by the two detection boxes respectively, and C is the area occupied by the minimum rectangle circumscribing A and B.
8. The multi-frame trajectory tracking method based on pedestrian pose estimation according to claim 6, wherein the keypoint similarity is specifically:
$$\mathrm{OKS}=\frac{\sum_i \exp\left(-d_i^{2}/(2S^{2}\kappa_i^{2})\right)\,\delta(v_i>0)}{\sum_i \delta(v_i>0)},\qquad d_i=\sqrt{(x_i-\hat{x}_i)^{2}+(y_i-\hat{y}_i)^{2}},\qquad S=\sqrt{(x_2-x_1)(y_2-y_1)}$$
wherein d_i is the Euclidean distance between corresponding keypoints, S is the size of the object, x and y are the keypoint coordinate values, and (x_1, y_1), (x_2, y_2) are the two vertex coordinates on the diagonal of the object truth box.
9. The multi-frame trajectory tracking method based on pedestrian pose estimation according to claim 6, wherein the process of calculating the reference-point-based matching score comprises:
1) performing a primary matching of the reference points according to the embedding features and the detection boxes;
2) rearranging the embedding features according to the order given by the primary matching result of the reference points, obtaining the reference point order of the current frame;
3) obtaining the offsets of the corresponding reference points with a group of multi-layer perceptrons, yielding the reference point coordinates required by the tracking branch;
4) rearranging the reference points of the previous frame according to the current frame, and feeding them to the decoder of the current frame to obtain the reference-point-based matching score.
10. The multi-frame trajectory tracking method based on pedestrian pose estimation according to claim 3, wherein the specific process of updating the parameters of the tracker in step S4 is as follows:
if the matching succeeds, updating the confidence, detection box and 2D keypoint parameters currently stored in the tracker with the single-frame model data corresponding to the current frame, while keeping the successfully matched target in the activated state;
if the matching fails, setting the state of the corresponding target to suspended and incrementing a suspension counter by one; when a matching object for the target is found in a subsequent frame image, clearing the suspension counter; and if the suspension counter exceeds a preset threshold, closing the tracking of the target.
CN202310095186.2A 2023-01-20 2023-01-20 Multi-frame trajectory tracking system and method based on pedestrian pose estimation Pending CN116109673A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310095186.2A 2023-01-20 2023-01-20 Multi-frame trajectory tracking system and method based on pedestrian pose estimation (CN116109673A)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310095186.2A 2023-01-20 2023-01-20 Multi-frame trajectory tracking system and method based on pedestrian pose estimation (CN116109673A)

Publications (1)

Publication Number Publication Date
CN116109673A (en) 2023-05-12

Family

ID=86253869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310095186.2A Pending Multi-frame trajectory tracking system and method based on pedestrian pose estimation (CN116109673A) 2023-01-20 2023-01-20

Country Status (1)

Country Link
CN (1) CN116109673A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117351405A (en) * 2023-12-06 2024-01-05 江西珉轩智能科技有限公司 Crowd behavior analysis system and method
CN117351405B (en) * 2023-12-06 2024-02-13 江西珉轩智能科技有限公司 Crowd behavior analysis system and method

Similar Documents

Publication Publication Date Title
WO2021098261A1 (en) Target detection method and apparatus
CN113506317B (en) Multi-target tracking method based on Mask R-CNN and apparent feature fusion
US8345984B2 (en) 3D convolutional neural networks for automatic human action recognition
CN110717411A (en) Pedestrian re-identification method based on deep layer feature fusion
EP2395478A1 (en) Monocular 3D pose estimation and tracking by detection
CN112257569B (en) Target detection and identification method based on real-time video stream
CN111862145B (en) Target tracking method based on multi-scale pedestrian detection
Wang et al. MCF3D: Multi-stage complementary fusion for multi-sensor 3D object detection
WO2023030182A1 (en) Image generation method and apparatus
CN113608663B (en) Fingertip tracking method based on deep learning and K-curvature method
CN112926475B (en) Human body three-dimensional key point extraction method
CN116109673A (en) Multi-frame track tracking system and method based on pedestrian gesture estimation
CN115375737A (en) Target tracking method and system based on adaptive time and serialized space-time characteristics
CN114926796A (en) Bend detection method based on novel mixed attention module
CN113269038A (en) Multi-scale-based pedestrian detection method
CN115063717B (en) Video target detection and tracking method based on real scene modeling of key area
CN116312512A (en) Multi-person scene-oriented audiovisual fusion wake-up word recognition method and device
CN116092189A (en) Bimodal human behavior recognition method based on RGB data and bone data
CN114973305B (en) Accurate human body analysis method for crowded people
WO2019136591A1 (en) Salient object detection method and system for weak supervision-based spatio-temporal cascade neural network
Xie et al. Pedestrian detection and location algorithm based on deep learning
CN114820723A (en) Online multi-target tracking method based on joint detection and association
Das et al. Indian sign language recognition system for emergency words by using shape and deep features
CN113569650A (en) Unmanned aerial vehicle autonomous inspection positioning method based on electric power tower label identification
CN112069943A (en) Online multi-person posture estimation and tracking method based on top-down framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination