CN116645402A - Online pedestrian tracking method based on improved target detection network

Online pedestrian tracking method based on improved target detection network

Info

Publication number
CN116645402A
Authority
CN
China
Prior art keywords
frame
pedestrian
detection
target
score
Prior art date
2023-03-30
Legal status
Pending
Application number
CN202310327267.0A
Other languages
Chinese (zh)
Inventor
蒋畅江
舒鹏
刘朋
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
2023-03-30
Filing date
2023-03-30
Publication date
2023-08-25
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202310327267.0A priority Critical patent/CN116645402A/en
Publication of CN116645402A publication Critical patent/CN116645402A/en
Pending legal-status Critical Current

Classifications

    • G06T7/277 Image analysis; analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V10/74 Image or video pattern matching; proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G06T2207/10016 Image acquisition modality: video; image sequence
    • G06V2201/07 Target detection
    • Y02T10/40 Engine management systems (climate change mitigation technologies for road transport)

Abstract

The invention relates to an online pedestrian tracking method based on an improved target detection network, which belongs to the field of target detection and comprises the following steps: the current frame image is input into a YOLOX target detection network fused with a CA attention mechanism; the detection results are divided into high-score detection frames and low-score detection frames according to the detection confidence; the high-score detection frames and the Kalman filtering prediction frames undergo similarity matching using a ReID network and GIoU, and successfully matched pairs are updated by Kalman filtering; a new tracking track is established for each detection frame that has a high score but matches no existing track; a second matching is then performed between the low-score detection frames and the tracks left unmatched in the first round (such tracks usually correspond to objects whose scores drop because of severe occlusion in the current frame); tracks that match no detection frame are retained for 30 frames and matched again when the target reappears, and deleted if no match is found. The invention can effectively reduce the influence of occlusion on recognition and improve the recognition rate and recognition speed.

Description

Online pedestrian tracking method based on improved target detection network
Technical Field
The invention belongs to the field of target detection, and relates to an online pedestrian tracking method based on an improved target detection network.
Background
Multi-target tracking in video is a fundamental and important task for many visual applications, such as video surveillance and autonomous driving. The purpose of the task is to locate multiple objects in each frame and to obtain a trajectory for each identity. Most current methods are tracking-by-detection methods, in either an online or an offline tracking mode: the online mode constructs an association matrix from the similarity between targets and detection frames and matches their positions with a matching algorithm; the offline mode constructs a graph from the detection frames within a period and the similarities between them, and solves the tracking problem by sub-graph partitioning.
The SORT algorithm proposed by Bewley et al. is a simple online real-time multi-target tracking algorithm: it mainly uses Kalman filtering to propagate each target into the next frame and then establishes associations using IoU as the metric. The DeepSORT method proposed by Wojke et al. improves on SORT by fusing a re-identification network and performing similarity matching between detection frames and predicted trajectories with the Hungarian algorithm.
Existing target tracking systems are easily affected by factors such as occlusion, camera resolution and background changes, leading to problems such as target identification errors. Most existing tracking systems build on detection results, so detection performance directly affects the tracking effect, yet an overly heavy, high-accuracy detector greatly reduces the overall speed. When a target is disturbed by the background or occluded, the ID number of its target frame may change, and the number may also change when an already-numbered target re-enters the shooting range.
Disclosure of Invention
In view of the above, the present invention aims to provide a pedestrian tracking method that uses YOLOX with a fused attention mechanism as the target detection network, so as to suppress the influence of occlusion on recognition and improve the recognition rate and recognition speed. In the method, Coordinate Attention (CA) is fused into the Neck part of YOLOX to improve the feature extraction capability; the target detection result is then input into a tracker, a ReID re-identification module and GIoU are used to perform similarity matching between the detection frames and the prediction frames, and the matched tracks are finally obtained according to the matching distance cost.
In order to achieve the above purpose, the present invention provides the following technical solutions:
an online pedestrian tracking method based on an improved target detection network, comprising the following steps:
S1: acquiring pedestrian images in the pedestrian video frames and preprocessing them;
S2: inputting the picture into a YOLOX target detection network, wherein a CA attention module is fused at the backbone output position of the YOLOX target detection network;
S3: inputting the output of target detection into a tracker to obtain high-score detection frames and low-score detection frames, the confidence of the high-score detection frames being higher than that of the low-score detection frames;
S4: performing similarity matching between the high-score detection frames and the Kalman filtering prediction frames; for a detection frame that matches no track but has a high score, a new track is opened and updated by Kalman filtering; successfully matched detection frames update the track set by Kalman filtering; tracks that fail to match wait for the second similarity matching;
S5: performing similarity matching between the low-score detection frames and the still-unmatched tracks, with IoU as the matching measure; successfully matched tracks are updated by Kalman filtering, detection frames that fail to match are deleted, and tracks that fail to match are retained for a certain time and then deleted if they cannot be matched to a detection frame again.
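For illustration only, the following Python sketch mirrors the two-stage association of S3 to S5; the class and function names, the threshold values and the stubbed Kalman and matching details are assumptions and are not taken from the patent text:

```python
# Minimal sketch of the two-stage association of S3-S5. The two matchers are
# passed in as callables; Track is a toy stand-in whose predict/update methods
# abstract the Kalman filter. All names and threshold values are assumptions.

HIGH_THRESH, LOW_THRESH, MAX_LOST = 0.6, 0.1, 30

class Track:
    def __init__(self, det):
        self.box, self.score = det
        self.lost = 0
    def predict(self):
        return self.box              # stand-in for the Kalman prediction frame
    def update(self, det):
        self.box, self.score = det   # stand-in for the Kalman state update
        self.lost = 0

def step(tracks, dets, match_reid_giou, match_iou):
    """One frame of tracking: dets is a list of (box, confidence) pairs."""
    high = [d for d in dets if d[1] >= HIGH_THRESH]
    low = [d for d in dets if LOW_THRESH <= d[1] < HIGH_THRESH]

    # First matching: high-score frames vs. Kalman prediction frames,
    # using the fused ReID + GIoU similarity (S4).
    matched, unmatched_tracks, unmatched_high = match_reid_giou(tracks, high)
    # Second matching: remaining tracks vs. low-score frames, IoU only (S5).
    matched2, unmatched_tracks2, _ = match_iou(unmatched_tracks, low)

    for t, d in matched + matched2:
        t.update(d)
    for t in unmatched_tracks2:
        t.lost += 1                  # kept for up to MAX_LOST frames
    tracks = [t for t in tracks if t.lost <= MAX_LOST]
    tracks += [Track(d) for d in unmatched_high]   # new track per unmatched high-score frame
    return tracks
```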
Further, step S1 is to acquire a pedestrian image in the pedestrian video frame, and pre-process the pedestrian image, which specifically includes:
S11: acquiring pedestrian images in pedestrian video frames and sampling them into a frame image set $\{I_1, I_2, \cdots, I_n\}$; detecting each frame image using YOLOX-s and outputting the detection sets $\{D_1, D_2, \cdots, D_n\}$ of frames 1 to n, together with the coordinate position information $\{P_1, P_2, \cdots, P_m\}$ of pedestrians 1 to m, comprising the center coordinates, the aspect ratio, and the accelerations in each direction;
s12: preprocessing pedestrian images, and adopting Mosaic data enhancement and MixUp data enhancement.
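As an illustration of the S12 preprocessing, the sketch below shows a typical MixUp operation on two training images; the Beta parameter and the box-pooling convention are assumptions, since the text does not specify them:

```python
import numpy as np

def mixup(img_a, img_b, boxes_a, boxes_b, alpha=8.0):
    """MixUp data enhancement sketch: blend two same-sized images and pool
    their pedestrian boxes. alpha=8.0 is an assumed Beta parameter."""
    lam = np.random.beta(alpha, alpha)
    mixed = lam * img_a.astype(np.float32) + (1.0 - lam) * img_b.astype(np.float32)
    return mixed.astype(img_a.dtype), boxes_a + boxes_b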
Further, in step S2, the CA attention module input/output flow includes:
S21: data is input from the backbone output position; each channel is encoded along the horizontal and vertical directions using pooling kernels of size $(H, 1)$ and $(1, W)$, outputting a direction-aware attention feature map $z^h$ of size $C \times H \times 1$ and a direction-aware attention feature map $z^w$ of size $C \times 1 \times W$;
S22: $z^h$ and $z^w$ are spliced by Concat, and a 1×1 convolution module generates the intermediate feature map $f \in R^{C/r \times 1 \times (H+W)}$, where $r$ is the channel downsampling ratio;
S23: $f$ is then split along the horizontal and vertical directions into $f^h \in R^{C/r \times H}$ and $f^w \in R^{C/r \times W}$, and two further 1×1 convolutions adjust $f^h$ and $f^w$ to tensors with the same number of channels as the input $X$;
S24: the Sigmoid activation function then yields the attention weights $g^h$ and $g^w$ for the two independent spatial directions; $g^h$ and $g^w$ are expanded to finally obtain an output feature map with stronger characterization information, which is output to the Neck part of YOLOX-s and finally passes through the detection head.
Further, step S3 specifically comprises: inputting the output of target detection into a tracker in which two confidence thresholds are set, a high-score threshold (high_thresh) and a low-score threshold (low_thresh); detection frames above the high-score threshold are high-score detection frames, those lying between the high-score and low-score thresholds are low-score detection frames, and all pedestrian frames with confidence below the low-score threshold are deleted, finally yielding a set of high-score detection frames and a set of low-score detection frames.
Further, the Kalman filtering prediction frame in step S4 is obtained by predicting the track set through Kalman filtering, with the state update equation

$\hat{x}_k = \hat{x}_k^- + K_k\left(z_k - H\hat{x}_k^-\right)$

where $\hat{x}_k$ denotes the posterior state estimate at time $k$, $\hat{x}_k^-$ denotes the prior estimate, i.e. the optimal prediction based on the previous time, $z_k$ denotes the observed value, $K_k$ is the Kalman gain and $H$ is the observation matrix. The predictions form the prediction frame set $D_t$.
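A self-contained numpy sketch of this predict/update cycle is given below; the 8-dimensional constant-velocity state and the noise magnitudes are assumptions in the style of SORT-class trackers, not values taken from the patent:

```python
import numpy as np

DIM = 8                                  # assumed state: (cx, cy, a, h) + velocities
F = np.eye(DIM); F[:4, 4:] = np.eye(4)   # constant-velocity transition, dt = 1 frame
H = np.eye(4, DIM)                       # observation extracts the box part
Q = np.eye(DIM) * 1e-2                   # assumed process noise
R = np.eye(4) * 1e-1                     # assumed measurement noise

def predict(x, P):
    """Prior estimate x_k^- and covariance, forming the prediction frame."""
    return F @ x, F @ P @ F.T + Q

def update(x_prior, P_prior, z):
    """State update: posterior x_k from prior x_k^- and observed box z_k."""
    S = H @ P_prior @ H.T + R
    K = P_prior @ H.T @ np.linalg.inv(S)        # Kalman gain K_k
    x_post = x_prior + K @ (z - H @ x_prior)    # the state update equation above
    P_post = (np.eye(DIM) - K @ H) @ P_prior
    return x_post, P_post
```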
In step S4, the similarity matching between the high-score detection frames and the Kalman filtering prediction frames is performed by obtaining a final similarity c through the ReID module and GIoU and matching the tracks using the Hungarian algorithm;
the ReID module compares pedestrians in the pedestrian track library with pedestrians under the high-score detection frames, extracting the appearance feature distance of the pedestrians under the high-score detection frames so as to judge whether they are the same pedestrian, and updates the pedestrian information in the pedestrian track library; the pedestrian track library comprises pedestrian appearance features and pedestrian positions;
the GIoU considers the non-overlapping part of the detected frame and the predicted frame and reflects the overlapping mode and the overlapping degree of the detected frame and the predicted frame.
Further, the ReID module extracts feature vectors from the prediction frames and the detection frames respectively using a ReID network model; the image is cropped using the coordinates in $P_j$, the cropped set of pedestrian images under the high-score detection frames is input into a pedestrian re-identification network model to obtain the appearance features of the pedestrians under the high-score detection frames, these are compared with the appearance features of pedestrians that have appeared in the images before, and the feature-vector similarity $d^{(1)}(i, j)$ is calculated.
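For illustration, a cosine version of $d^{(1)}$ over L2-normalised embeddings is sketched below; the patent does not fix the exact distance, so treating $d^{(1)}(i, j)$ as cosine similarity is an assumption:

```python
import numpy as np

def reid_similarity(track_feats, det_feats):
    """d1[i, j]: cosine similarity between the ReID embedding of track i
    (rows) and high-score detection j (columns)."""
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    return t @ d.T   # in [-1, 1]; larger means more similar
```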
Further, the GIoU similarity between a prediction frame and a detection frame is calculated as

$GIoU = IoU - \frac{|C \setminus (A \cup B)|}{|C|}$

where $IoU$ is the intersection-over-union of the prediction frame $A$ and the detection frame $B$, and $C$ is their smallest enclosing box; this yields the similarity $d^{(2)}(i, j)$, and according to the formula $c_{i,j} = \mu d^{(1)}(i,j) + (1-\mu)d^{(2)}(i,j)$ the hyper-parameter $\mu$ is set to obtain the final similarity $c$.
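The sketch below computes GIoU per its standard definition and fuses the two similarities; taking $d^{(2)}$ directly as the GIoU value and the choice $\mu = 0.5$ are assumptions, since the patent leaves both unspecified:

```python
def giou(a, b):
    """GIoU of boxes a, b in (x1, y1, x2, y2) form: IoU minus the fraction
    of the smallest enclosing box C not covered by the union."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])     # enclosing box corners
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    return inter / union - (c_area - union) / c_area

def fused_similarity(d1, d2, mu=0.5):
    """c_ij = mu * d1_ij + (1 - mu) * d2_ij, with assumed mu = 0.5."""
    return mu * d1 + (1.0 - mu) * d2
```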
Further, matching the tracks using the Hungarian algorithm specifically comprises:
initializing a bipartite graph and determining, from the input cost matrix, which previous-frame targets and current-frame targets may match; letting U be the previous-frame set and V the current-frame set, matching proceeds in ID order:
first, target 1 of the current frame is matched, say to target 1 of the previous frame; then target 2 is matched, then target 3; if previous-frame target 3 is already matched by target 1 or 2, the target in U holding it is re-matched to another target, and if target 1 in U is already matched to target 2 in V, target 2 in U is re-matched, and so on for targets 1, 2 and 3 in U; the remaining targets are then matched in the same way, and any target in V finally left unmatched is treated as a new target; overall this is a recursive augmenting process.
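In practice the same optimal assignment can be obtained with scipy's linear_sum_assignment, which solves the assignment problem the recursive procedure above describes; in the sketch below the gate min_sim is an assumed hyper-parameter:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(similarity, min_sim=0.3):
    """Assign tracks (rows) to detections (columns) by maximising the fused
    similarity c; pairs below min_sim are treated as unmatched."""
    rows, cols = linear_sum_assignment(1.0 - similarity)   # minimise cost
    matches = [(r, c) for r, c in zip(rows, cols) if similarity[r, c] >= min_sim]
    unmatched_tracks = sorted(set(range(similarity.shape[0])) - {r for r, _ in matches})
    unmatched_dets = sorted(set(range(similarity.shape[1])) - {c for _, c in matches})
    return matches, unmatched_tracks, unmatched_dets
```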
The invention has the beneficial effects that:
(1) Because real multi-target tracking scenes suffer from problems such as very small targets and frequent occlusion, the method adds a coordinate attention module to the Neck part of YOLOX, so that the system attends better to the person detail features in the video stream; information loss during feature extraction is reduced, the feature fusion part carries richer information, and the computational cost is small, thereby improving the detection effect, reducing false detections and giving a better tracking effect.
(2) The invention uses the target detection network to detect pedestrians in the current frame image, obtaining high-score and low-score detection frames; the high-score detection frames and the prediction frames undergo a first similarity matching with ReID and GIoU as the metrics, and the low-score detection frames and the unmatched tracks undergo a second similarity matching with IoU. Matching pedestrians twice strengthens the pedestrian tracking precision, fusing the ReID module gives a better tracking effect under occlusion, and problems such as lost pedestrian tracks and identity (ID) switches can be avoided.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of an online pedestrian tracking method based on an improved target detection network according to the present invention;
FIG. 2 is a schematic illustration of CA attention module addition locations;
FIG. 3 shows the experimental results and detection-frame results of the present invention compared with the DeepSORT algorithm on a section of verification video, wherein (a) and (b) are screenshots of the DeepSORT algorithm and (c) and (d) are the experimental results of the present invention.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the present invention with reference to specific examples. The invention may also be practiced or carried out in other, different embodiments, and the details of the present description may be modified or varied in various respects without departing from the spirit and scope of the present invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the invention, and the following embodiments and the features within them may be combined with each other where no conflict arises.
Wherein the drawings are for illustrative purposes only and are shown in schematic, non-physical, and not intended to limit the invention; for the purpose of better illustrating embodiments of the invention, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the size of the actual product; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The same or similar reference numbers in the drawings of the embodiments of the invention correspond to the same or similar components. In the description of the present invention, it should be understood that terms such as "upper", "lower", "left", "right", "front" and "rear", which indicate an orientation or positional relationship based on the drawings, are used only for convenience of describing the present invention and simplifying the description; they do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation. Such positional terms are therefore merely illustrative and should not be construed as limiting the present invention, and their specific meaning can be understood by those of ordinary skill in the art according to the specific circumstances.
The invention relates to a pedestrian tracking algorithm with an improved object detection network, whose flow chart is shown in FIG. 1. The method specifically comprises the following steps:
Step one, acquiring pedestrian images in pedestrian video frames and sampling them into a frame image set $\{I_1, I_2, \cdots, I_n\}$; detecting each frame image using YOLOX-s and outputting the detection results $\{D_1, D_2, \cdots, D_n\}$, where $D_i$ denotes the detection set of the i-th frame, together with $\{P_1, P_2, \cdots, P_m\}$, where $P_j$ denotes the coordinate position information of each pedestrian, comprising the center coordinates, the aspect ratio, and the accelerations in each direction. The pedestrian images are preprocessed with Mosaic and MixUp data enhancement.
Step two, inputting the picture into the YOLOX-s target detection network, which fuses a CA attention mechanism; the CA attention sits at the backbone output position of YOLOX-s, as shown in FIG. 2, and its input/output flow is as follows: data is input from the backbone output position; each channel is encoded along the horizontal and vertical directions using pooling kernels of size $(H, 1)$ and $(1, W)$, outputting a direction-aware attention feature map $z^h$ of size $C \times H \times 1$ and a direction-aware attention feature map $z^w$ of size $C \times 1 \times W$. $z^h$ and $z^w$ are spliced by Concat, and a 1×1 convolution module generates the intermediate feature map $f \in R^{C/r \times 1 \times (H+W)}$, where $r$ is the channel downsampling ratio. $f$ is then split along the horizontal and vertical directions into $f^h \in R^{C/r \times H}$ and $f^w \in R^{C/r \times W}$, two further 1×1 convolutions adjust $f^h$ and $f^w$ to tensors with the same number of channels as the input $X$, and the Sigmoid activation function yields the attention weights $g^h$ and $g^w$ for the two independent spatial directions; $g^h$ and $g^w$ are expanded to finally obtain an output feature map with stronger characterization information, which is output to the Neck part of YOLOX-s and finally passes through the detection head.
Step three, inputting the output of target detection into the tracker, in which two confidence thresholds are set, a high-score threshold (high_thresh) and a low-score threshold (low_thresh); detection frames above the high-score threshold are high-score detection frames, those between the two thresholds are low-score detection frames, and all pedestrian frames with confidence below the low-score threshold are deleted, finally yielding a set of high-score detection frames and a set of low-score detection frames, the confidence of the high-score detection frames being higher than that of the low-score detection frames.
Step four, predicting the track set by Kalman filtering, with the state update equation

$\hat{x}_k = \hat{x}_k^- + K_k\left(z_k - H\hat{x}_k^-\right)$

where $\hat{x}_k$ denotes the posterior state estimate at time $k$, $\hat{x}_k^-$ denotes the prior estimate, i.e. the optimal prediction based on the previous time, $z_k$ denotes the observed value, $K_k$ is the Kalman gain and $H$ is the observation matrix. The predictions form the prediction frame set $D_t$.
Feature vectors are extracted from the prediction frames and the detection frames respectively using a ReID network model; the image is cropped using the coordinates in $P_j$, the cropped set of pedestrian images (i.e. the pedestrian images under the high-score detection frames) is input into a pedestrian re-identification network model to obtain the pedestrian appearance features (the appearance features under the high-score detection frames), these are compared with the appearance features of pedestrians that have appeared in the images before, and the feature-vector similarity $d^{(1)}(i, j)$ is calculated. The similarity between the prediction frames and the detection frames is then calculated with the GIoU formula

$GIoU = IoU - \frac{|C \setminus (A \cup B)|}{|C|}$

where $IoU$ is the intersection-over-union, $A$ and $B$ are the prediction frame and the detection frame, and $C$ is their smallest enclosing box; this yields the similarity $d^{(2)}(i, j)$, and according to the formula $c_{i,j} = \mu d^{(1)}(i,j) + (1-\mu)d^{(2)}(i,j)$ the hyper-parameter $\mu$ is set to obtain the final similarity $c$.
Similarity matching is then performed between the high-score detection frames and the Kalman filtering prediction frames, the matching measure being the final similarity c obtained from the ReID feature metric and GIoU. The ReID module compares pedestrians in the pedestrian track library with pedestrians under the high-score detection frames, extracts the appearance feature distance of the pedestrians under the high-score detection frames so as to judge whether they are the same pedestrian, and updates the pedestrian information in the pedestrian track library. The pedestrian track library comprises pedestrian appearance features and pedestrian positions. GIoU considers the non-overlapping part of the detection frame and the prediction frame, which IoU does not, and reflects both the overlapping manner and the overlapping degree of the two frames. According to the similarity c, the tracks are matched using the Hungarian algorithm; for detection frames that match no track but have a high score, a new track is opened and updated by Kalman filtering; detection frames successfully matched to tracks update the track set by Kalman filtering; tracks that fail to match wait for the second similarity matching.
Step five, similarity matching is performed between the low-score detection frames and the still-unmatched tracks, with IoU adopted as the matching measure; after successful matching the track is updated by Kalman filtering, detection frames that fail to match are deleted, and tracks that fail to match are retained for 30 frames and deleted if they cannot be matched to a detection frame again.
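The sketch below isolates the 30-frame retention rule of step five; the track fields are illustrative assumptions consistent with the association loop sketched earlier:

```python
MAX_LOST = 30   # frames a lost track is retained before deletion

def prune_tracks(tracks, matched):
    """Keep matched tracks alive, age unmatched ones, delete stale ones.
    `matched` is the set of track objects re-matched in the current frame."""
    alive = []
    for t in tracks:
        if t in matched:
            t.lost = 0               # the target reappeared and was re-matched
        else:
            t.lost += 1              # still occluded or missing this frame
        if t.lost <= MAX_LOST:
            alive.append(t)          # otherwise the track is deleted
    return alive
```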
Examples: multi-target tracking experiment
The model is trained on the MOT17 and CrowdHuman datasets and validated on half of the MOT17 test set. The online pedestrian tracking algorithm based on the improved target detection network uses Mosaic and MixUp for data enhancement, adopts a cosine annealing strategy for dynamically updating the learning rate, and uses FP16 mixed precision to accelerate convergence. The experimental data are shown in Table 1.
TABLE 1
Compared with other methods, the method greatly improves precision, with a low ID-switch frequency (high IDF1) and good real-time performance (high FPS), which fully demonstrates that the method not only improves multi-target tracking precision but also effectively controls the influence of missed targets on the experimental results.
FIG. 3 compares the experimental results and detection-frame results on a section of verification video: (a) and (b) are screenshots of the DeepSORT algorithm, where a target is falsely detected because of the background and position information is missing; (c) and (d) are the experimental results of the invention, where the dummy is no longer falsely detected as a real person. From (b) and (d) it can be seen that the invention effectively tracks the target when the target pedestrian is occluded or small; even after occlusion the same target is matched again in subsequent frames, showing good robustness to occlusion.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims (9)

1. An online pedestrian tracking method based on an improved target detection network, characterized in that the method comprises the following steps:
S1: acquiring pedestrian images in the pedestrian video frames and preprocessing them;
S2: inputting the picture into a YOLOX target detection network, wherein a CA attention module is fused at the backbone output position of the YOLOX target detection network;
S3: inputting the output of target detection into a tracker to obtain high-score detection frames and low-score detection frames, the confidence of the high-score detection frames being higher than that of the low-score detection frames;
S4: performing similarity matching between the high-score detection frames and the Kalman filtering prediction frames; for a detection frame that matches no track but has a high score, a new track is opened and updated by Kalman filtering; successfully matched detection frames update the track set by Kalman filtering; tracks that fail to match wait for the second similarity matching;
S5: performing similarity matching between the low-score detection frames and the still-unmatched tracks, with IoU as the matching measure; successfully matched tracks are updated by Kalman filtering, detection frames that fail to match are deleted, and tracks that fail to match are retained for a certain time and then deleted if they cannot be matched to a detection frame again.
2. The improved object detection network-based online pedestrian tracking method of claim 1, wherein: step S1, acquiring a pedestrian image in a pedestrian video frame, and preprocessing the pedestrian image, specifically including:
S11: acquiring pedestrian images in pedestrian video frames and sampling them into a frame image set $\{I_1, I_2, \cdots, I_n\}$; detecting each frame image using YOLOX-s and outputting the detection sets $\{D_1, D_2, \cdots, D_n\}$ of frames 1 to n, together with the coordinate position information $\{P_1, P_2, \cdots, P_m\}$ of pedestrians 1 to m, comprising the center coordinates, the aspect ratio, and the accelerations in each direction;
s12: preprocessing pedestrian images, and adopting Mosaic data enhancement and MixUp data enhancement.
3. The improved object detection network-based online pedestrian tracking method of claim 1, wherein: in step S2, the CA attention module input/output flow includes:
S21: data is input from the backbone output position; each channel is encoded along the horizontal and vertical directions using pooling kernels of size $(H, 1)$ and $(1, W)$, outputting a direction-aware attention feature map $z^h$ of size $C \times H \times 1$ and a direction-aware attention feature map $z^w$ of size $C \times 1 \times W$;
S22: $z^h$ and $z^w$ are spliced by Concat, and a 1×1 convolution module generates the intermediate feature map $f \in R^{C/r \times 1 \times (H+W)}$, where $r$ is the channel downsampling ratio;
S23: $f$ is then split along the horizontal and vertical directions into $f^h \in R^{C/r \times H}$ and $f^w \in R^{C/r \times W}$, and two further 1×1 convolutions adjust $f^h$ and $f^w$ to tensors with the same number of channels as the input $X$;
S24: the Sigmoid activation function then yields the attention weights $g^h$ and $g^w$ for the two independent spatial directions; $g^h$ and $g^w$ are expanded to finally obtain an output feature map with stronger characterization information, which is output to the Neck part of YOLOX-s and finally passes through the detection head.
4. The improved object detection network-based online pedestrian tracking method of claim 1, wherein: step S3 specifically comprises: inputting the output of target detection into a tracker in which two confidence thresholds are set, a high-score threshold and a low-score threshold; detection frames above the high-score threshold are high-score detection frames, those lying between the high-score and low-score thresholds are low-score detection frames, and all pedestrian frames with confidence below the low-score threshold are deleted, finally yielding a set of high-score detection frames and a set of low-score detection frames.
5. The improved object detection network-based online pedestrian tracking method of claim 1, wherein: the Kalman filtering prediction frame in step S4 is obtained by predicting the track set through Kalman filtering, with the state update equation

$\hat{x}_k = \hat{x}_k^- + K_k\left(z_k - H\hat{x}_k^-\right)$

where $\hat{x}_k$ denotes the posterior state estimate at time $k$, $\hat{x}_k^-$ denotes the prior estimate, i.e. the optimal prediction based on the previous time, $z_k$ denotes the observed value, $K_k$ is the Kalman gain and $H$ is the observation matrix; the predictions form the prediction frame set $D_t$.
6. The improved object detection network-based online pedestrian tracking method of claim 5, wherein: in step S4, the similarity matching between the high-score detection frames and the Kalman filtering prediction frames is performed by obtaining a final similarity c through the ReID module and GIoU and matching the tracks using the Hungarian algorithm;
the ReID module compares pedestrians in the pedestrian track library with pedestrians under the high-score detection frames, extracts the appearance feature distance of the pedestrians under the high-score detection frames so as to judge whether they are the same pedestrian, and updates the pedestrian information in the pedestrian track library; the pedestrian track library comprises pedestrian appearance features and pedestrian positions;
the GIoU considers the non-overlapping part of the detected frame and the predicted frame and reflects the overlapping mode and the overlapping degree of the detected frame and the predicted frame.
7. The improved object detection network-based online pedestrian tracking method of claim 6, wherein: the ReID module extracts feature vectors from the prediction frames and the detection frames respectively using a ReID network model; the image is cropped using the coordinates in $P_j$, the cropped set of pedestrian images under the high-score detection frames is input into a pedestrian re-identification network model to obtain the appearance features of the pedestrians under the high-score detection frames, these are compared with the appearance features of pedestrians that have appeared in the images before, and the feature-vector similarity $d^{(1)}(i, j)$ is calculated.
8. The improved object detection network-based online pedestrian tracking method of claim 7, wherein: the GIoU similarity between a prediction frame and a detection frame is calculated as

$GIoU = IoU - \frac{|C \setminus (A \cup B)|}{|C|}$

where $IoU$ is the intersection-over-union of the prediction frame $A$ and the detection frame $B$, and $C$ is their smallest enclosing box; this yields the similarity $d^{(2)}(i, j)$, and according to the formula $c_{i,j} = \mu d^{(1)}(i,j) + (1-\mu)d^{(2)}(i,j)$ the hyper-parameter $\mu$ is set to obtain the final similarity $c$.
9. The improved object detection network-based online pedestrian tracking method of claim 7, wherein: matching the tracks using the Hungarian algorithm specifically comprises:
initializing a bipartite graph and determining, from the input cost matrix, which previous-frame targets and current-frame targets may match; letting U be the previous-frame set and V the current-frame set, matching proceeds in ID order:
first, target 1 of the current frame is matched, say to target 1 of the previous frame; then target 2 is matched, then target 3; if previous-frame target 3 is already matched by target 1 or 2, the target in U holding it is re-matched to another target, and if target 1 in U is already matched to target 2 in V, target 2 in U is re-matched, and so on for targets 1, 2 and 3 in U; the remaining targets are then matched in the same way, and any target in V finally left unmatched is treated as a new target.
CN202310327267.0A 2023-03-30 2023-03-30 Online pedestrian tracking method based on improved target detection network Pending CN116645402A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310327267.0A CN116645402A (en) 2023-03-30 2023-03-30 Online pedestrian tracking method based on improved target detection network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310327267.0A CN116645402A (en) 2023-03-30 2023-03-30 Online pedestrian tracking method based on improved target detection network

Publications (1)

Publication Number Publication Date
CN116645402A true CN116645402A (en) 2023-08-25

Family

ID=87617593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310327267.0A Pending CN116645402A (en) 2023-03-30 2023-03-30 Online pedestrian tracking method based on improved target detection network

Country Status (1)

Country Link
CN (1) CN116645402A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935446A (en) * 2023-09-12 2023-10-24 深圳须弥云图空间科技有限公司 Pedestrian re-recognition method and device, electronic equipment and storage medium
CN116935446B (en) * 2023-09-12 2024-02-20 深圳须弥云图空间科技有限公司 Pedestrian re-recognition method and device, electronic equipment and storage medium
CN117058139A (en) * 2023-10-11 2023-11-14 苏州凌影云诺医疗科技有限公司 Lower digestive tract focus tracking and key focus selecting method and system
CN117058139B (en) * 2023-10-11 2024-01-26 苏州凌影云诺医疗科技有限公司 Lower digestive tract focus tracking and key focus selecting method and system
CN117522924A (en) * 2023-11-22 2024-02-06 重庆大学 Depth-associated multi-target tracking method based on detection positioning confidence level guidance
CN117576165A (en) * 2024-01-15 2024-02-20 武汉理工大学 Ship multi-target tracking method and device, electronic equipment and storage medium
CN117576764A (en) * 2024-01-15 2024-02-20 四川大学 Video irrelevant person automatic identification method based on multi-target tracking
CN117576764B (en) * 2024-01-15 2024-04-16 四川大学 Video irrelevant person automatic identification method based on multi-target tracking
CN117576165B (en) * 2024-01-15 2024-04-19 武汉理工大学 Ship multi-target tracking method and device, electronic equipment and storage medium
CN117636402A (en) * 2024-01-23 2024-03-01 广州市德赛西威智慧交通技术有限公司 Pedestrian re-identification-based passenger analysis method and device and computer storage medium
CN117649430A (en) * 2024-01-29 2024-03-05 中国石油大学(华东) Multi-target tracking method based on Kalman filtering and correlation matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination