CN115620098B - Evaluation method and system of cross-camera pedestrian tracking algorithm and electronic equipment


Info

Publication number
CN115620098B
CN115620098B (application CN202211636207.9A)
Authority
CN
China
Prior art keywords
frame
pedestrian
matching
prediction
node
Prior art date
Legal status
Active
Application number
CN202211636207.9A
Other languages
Chinese (zh)
Other versions
CN115620098A (en)
Inventor
尚果超
Current Assignee
China Telecom Digital City Technology Co., Ltd.
Original Assignee
China Telecom Digital City Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by China Telecom Digital City Technology Co., Ltd.
Priority to CN202211636207.9A
Publication of CN115620098A
Application granted
Publication of CN115620098B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/776 Validation; Performance evaluation
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects


Abstract

The invention provides an evaluation method and system for a cross-camera pedestrian tracking algorithm, and an electronic device, relating to the technical field of artificial intelligence. The method comprises the following steps: first, based on a test data set, generating pedestrian prediction results by using a cross-camera pedestrian tracking algorithm model, where the test data set comprises tag frames and each pedestrian prediction result comprises prediction frames; then dividing the surveillance video into nodes along a time axis, and determining the number of prediction matching nodes and the number of tag matching nodes; performing frame matching between the tag frames and the prediction frames to determine target matching nodes; calculating the recall rate and accuracy rate of each pedestrian prediction result based on the number of prediction matching nodes, the number of tag matching nodes and the number of target matching nodes; and finally, calculating an evaluation score for the cross-camera pedestrian tracking algorithm model based on the recall rate and the accuracy rate. The method alleviates the technical problems of low prediction efficiency and low precision in existing cross-camera pedestrian tracking algorithms and achieves the technical effect of improving prediction accuracy.

Description

Evaluation method and system of cross-camera pedestrian tracking algorithm and electronic equipment
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an evaluation method and system for a cross-camera pedestrian tracking algorithm, and an electronic device.
Background
In an actual cross-environment tracking scene, factors such as camera resolution, recording distance, illumination conditions, pedestrian occlusion and clothing color changes prevent the algorithm from accurately predicting and tracking pedestrians in every frame. However, if the algorithm can identify the video frames in which a pedestrian appears at the predicted key nodes, the pedestrian's motion trajectory can still be judged successfully. Most prior art relies on frame-by-frame comparison to evaluate the percentage of correctly predicted frames, which is time-consuming and unreliable. In other words, the existing cross-camera pedestrian tracking technology suffers from low prediction efficiency and low precision.
Disclosure of Invention
The invention aims to provide an evaluation method and system for a cross-camera pedestrian tracking algorithm, and an electronic device, so as to solve the technical problems of low prediction efficiency and low precision of the existing cross-camera pedestrian tracking algorithm.
In a first aspect, an embodiment of the present invention provides an evaluation method for a cross-camera pedestrian tracking algorithm, where the method includes: generating at least one pedestrian prediction result by using a cross-camera pedestrian tracking algorithm model based on a pre-acquired test data set; the pre-acquired test data set comprises cross-camera surveillance video clips and corresponding label frames; the pedestrian prediction result comprises a prediction frame;
dividing the monitoring video according to the nodes of a time axis, and determining the number of prediction matching nodes contained in the prediction frame and the number of label matching nodes contained in the label frame;
performing frame matching on the label frame and the predicted frame to determine a target matching node;
calculating recall rate and accuracy rate of each of the pedestrian prediction results based on the number of the prediction matching nodes, the number of the tag matching nodes and the number of the target matching nodes;
and calculating the evaluation score of the cross-camera pedestrian tracking algorithm model based on the recall rate and the accuracy rate.
In some possible embodiments, the method further comprises: acquiring a target surveillance video clip containing a target pedestrian picture and a corresponding label frame according to target pedestrian videos of a plurality of cross-scene surveillance cameras; the parameters of the tag frame comprise a target camera number, a target frame number, a target pedestrian ID number and target rectangular frame position information; generating target pedestrian library information based on the target surveillance video clips and the tag frames; and generating a test data set according to the target pedestrian library information.
In some possible embodiments, the step of performing frame matching on the tag frame and the predicted frame to determine a target matching node includes: matching parameters contained in the prediction frame and the label frame respectively; wherein the parameters of the predicted frame include: predicting the number of a camera, the number of a frame, the ID number of a pedestrian and the position information of a rectangular frame;
if the parameters of the predicted frame and the parameters of the label frame both meet the matching conditions, the predicted frame and the label frame are correctly matched; wherein, the matching conditions include: the predicted camera number matches the target camera number, the predicted frame number matches the target frame number, the predicted pedestrian ID number matches the target pedestrian ID number, and the predicted rectangular frame position information matches the target rectangular frame position information.
In some possible embodiments, the condition that the predicted rectangular frame position information matches the target rectangular frame position information is: the intersection ratio IOU of the prediction rectangular frame and the target rectangular frame is larger than a set threshold value; the IOU is the ratio of the first overlapping area to the first joint area; the first overlap region is an intersection region of the prediction rectangular frame and the target rectangular frame; the first union region is a union region of the prediction rectangular frame and the target rectangular frame.
In some possible embodiments, the step of performing frame matching on the tag frame and the predicted frame to determine a target matching node further includes: and if the number of the correct matches between the predicted frame and the label frame in the current node is not less than half of the number of the label frames, determining that the current node is a target matching node.
In some possible embodiments, the step of dividing the monitoring video according to nodes of a time axis and determining the number of prediction matching nodes included in the prediction frame and the number of tag matching nodes included in the tag frame includes: dividing the monitoring video according to the nodes of a time axis to generate a plurality of time nodes; determining the number of prediction matching nodes and the number of label matching nodes according to the track of the label frame, the relation between the track of the prediction frame and the time node; wherein, the relationship between the track of the label frame and the time node comprises: if the track of the tag frame of the pedestrian passes through one camera in the current time axis, the track of the tag frame comprises the time node, and the time node is a tag matching node; the relationship between the trajectory of the predicted frame and the time node includes: if the trajectory of the predicted frame of the pedestrian falls within a certain time node, the trajectory of the predicted frame includes the time node, and the time node is the prediction matching node.
In some possible embodiments, the method further comprises: calculating the total recall rate and the total accuracy rate of all the pedestrian prediction results in the test data set based on the recall rate and the accuracy rate of each pedestrian prediction result;
the step of calculating the evaluation score of the cross-camera pedestrian tracking algorithm model based on the recall rate and the accuracy rate comprises the following steps: and carrying out weighted average on the total recall rate and the total accuracy rate, and determining the evaluation score of the cross-camera pedestrian tracking algorithm model.
In a second aspect, an embodiment of the present invention provides an evaluation system for a cross-camera pedestrian tracking algorithm, where the system includes:
the pedestrian prediction result generation module is used for generating at least one pedestrian prediction result by utilizing a cross-camera pedestrian tracking algorithm model based on a pre-acquired test data set; the pre-acquired test data set comprises a monitoring video clip crossing the camera and a corresponding label frame; the pedestrian prediction result comprises a prediction frame;
the node number determining module is used for dividing the monitoring video according to the nodes of a time axis and determining the number of prediction matching nodes contained in the prediction frame and the number of label matching nodes contained in the label frame;
a target matching node generation module, configured to perform frame matching on the tag frame and the predicted frame, and determine a target matching node;
a calculating module, configured to calculate a recall rate and an accuracy rate of each pedestrian prediction result based on the number of prediction matching nodes, the number of tag matching nodes, and the number of target matching nodes;
the calculation module is further used for calculating an evaluation score of the cross-camera pedestrian tracking algorithm model based on the recall rate and the accuracy rate.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor implements the steps of the method according to any one of the above first aspect when executing the computer program.
The invention provides an evaluation method and system for a cross-camera pedestrian tracking algorithm, and an electronic device. The method comprises: first, based on a pre-acquired test data set, generating pedestrian prediction results by using a cross-camera pedestrian tracking algorithm model, where the test data set comprises cross-camera surveillance video clips and corresponding tag frames, and each pedestrian prediction result comprises prediction frames; then dividing the surveillance video into nodes along a time axis, and determining the number of prediction matching nodes and the number of tag matching nodes; performing frame matching between the tag frames and the prediction frames to determine target matching nodes; calculating the recall rate and accuracy rate of each pedestrian prediction result based on the number of prediction matching nodes, the number of tag matching nodes and the number of target matching nodes; and finally, calculating an evaluation score of the cross-camera pedestrian tracking algorithm model based on the recall rate and the accuracy rate. The method realizes evaluation of existing cross-camera pedestrian tracking algorithms, alleviates the technical problems of low prediction efficiency and low precision in existing models, and achieves the technical effect of improving prediction accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic flowchart of an evaluation method of a cross-camera pedestrian tracking algorithm according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of another cross-camera pedestrian tracking algorithm evaluation method according to an embodiment of the present invention;
fig. 3 is a schematic diagram illustrating a corresponding relationship between a time axis and a matching node according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
A traditional cross-camera pedestrian tracking system mainly comprises a pedestrian detection module, a pedestrian target tracking module and a pedestrian re-identification module. The pedestrian detection module usually adopts a lightweight network (RBFNet) to perform multi-scale, multi-aspect-ratio feature fusion; the pedestrian target tracking module generally adopts the SORT tracking algorithm, which takes the detection results of the pedestrian detection algorithm as a key component, propagates the target state into future frames, associates current detections with existing targets, and manages the life cycles of tracked targets; the pedestrian re-identification module usually adopts a ReID algorithm to calculate the feature distance between pedestrian library images and detected target images, so as to judge whether they show the same pedestrian.
The task of cross-camera pedestrian tracking is as follows: (1) a pedestrian library, i.e. the specified target pedestrian information to be identified, is given; (2) the algorithm model is required to find the pedestrians of the pedestrian library, locate their positions, and track them in videos of different scenes; (3) if a pedestrian disappears from the current scene, the algorithm model is required to search the videos of the other scenes to continue tracking; (4) if a pedestrian target appears in several videos, or several times in a single video, the algorithm model is required to search iteratively to continue tracking; (5) the tracking result (camera number, frame number, pedestrian ID number and rectangular frame position information) of each pedestrian in the pedestrian library is predicted.
The principle of the cross-camera pedestrian tracking algorithm is as follows: (1) detect all pedestrians in the video frames with a pedestrian detection network to obtain pedestrian detection frames; (2) perform person re-identification on the detected pedestrians and extract the target pedestrian in the video; (3) track the target pedestrian with a pedestrian tracking algorithm, and if the tracking ID is lost, re-localize the target with a pedestrian re-identification (ReID) algorithm; (4) calculate the distance between the center coordinates of the target pedestrian detection frames over three consecutive frames, and if the distance is greater than a set threshold, judge the tracking to be wrong and re-localize the target with ReID; (5) if the tracking succeeds, continue tracking with the SORT algorithm.
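As an illustrative aside, step (4) above amounts to a simple continuity check on the tracker. The following is a minimal sketch of such a check; the function name and the threshold value are assumptions for illustration, not values fixed by the method:

    def tracking_drifted(boxes, max_center_dist=80.0):
        # boxes: detection frames (x1, y1, x2, y2) of the same target in
        # consecutive frames. Flags a tracking error when the center of the
        # detection frame jumps farther than the set threshold between
        # neighbouring frames, in which case the target should be
        # re-localized with ReID.
        def center(b):
            return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)
        for prev, cur in zip(boxes, boxes[1:]):
            (px, py), (cx, cy) = center(prev), center(cur)
            if ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5 > max_center_dist:
                return True
        return False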
In an actual cross-environment tracking scene, factors such as camera resolution, recording distance, illumination conditions, pedestrian occlusion and clothing color changes prevent the algorithm from fully and accurately predicting and tracking pedestrians in every frame. However, if the algorithm can identify the video frames in which a pedestrian appears at the predicted key nodes, the pedestrian's motion trajectory can still be judged successfully. Most prior art relies on frame-by-frame comparison to evaluate the percentage of correctly predicted frames, which is time-consuming and unreliable. In other words, the existing cross-camera pedestrian tracking technology suffers from low prediction efficiency and low precision, and a comprehensive evaluation system capable of effectively assessing the precision of such algorithms is lacking.
Based on this, the embodiment of the invention provides an evaluation method and system for a cross-camera pedestrian tracking algorithm and electronic equipment, so as to realize comprehensive evaluation of the cross-camera pedestrian tracking algorithm, thereby alleviating the technical problems of low prediction efficiency and low precision of the existing cross-camera pedestrian tracking algorithm.
To facilitate understanding of the present embodiment, the evaluation method for a cross-camera pedestrian tracking algorithm disclosed in the embodiment of the present invention is first described in detail. Referring to the flowchart shown in fig. 1, the method may be executed by an electronic device and mainly includes the following steps:
S110: generating at least one pedestrian prediction result by using a cross-camera pedestrian tracking algorithm model based on a pre-acquired test data set; the pre-acquired test data set comprises cross-camera surveillance video clips and corresponding label frames; the pedestrian prediction result comprises a prediction frame;
S120: dividing the monitoring video according to nodes of a time axis, and determining the number of prediction matching nodes contained in a prediction frame and the number of label matching nodes contained in a label frame;
In this embodiment, a target surveillance video clip containing a target pedestrian picture and the corresponding tag frame may be obtained from target pedestrian videos of a plurality of cross-scene surveillance cameras; target pedestrian library information is then generated based on the target surveillance video clips and the tag frames; and a test data set is generated from the target pedestrian library information.
The parameters of the tag frame comprise a target camera number, a target frame number, a target pedestrian ID number and target rectangular frame position information; parameters of the predicted frame include: the predicted camera number, the predicted frame number, the predicted pedestrian ID number and the predicted rectangular frame position information.
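For illustration only, the tag frames and prediction frames can be represented as simple records; the field names in the following sketch are hypothetical and merely mirror the parameters listed above:

    from dataclasses import dataclass

    @dataclass
    class FrameRecord:
        # One tag (annotated) or predicted pedestrian box in one video frame.
        camera_id: str       # camera number, e.g. "1.mp4"
        frame_index: int     # frame number within that camera's video
        pedestrian_id: int   # pedestrian ID number in the pedestrian library
        x1: float            # rectangular frame position: top-left corner
        y1: float
        x2: float            # bottom-right corner
        y2: float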
S130: carrying out frame matching on the label frame and the predicted frame, and determining a target matching node;
In this embodiment, step S130 of performing frame matching on the tag frame and the predicted frame and determining the target matching node may include:
(1) Matching the parameters contained in the predicted frame against those contained in the tag frame;
(2) If the parameters of the predicted frame and the parameters of the label frame both meet the matching conditions, the predicted frame and the label frame are correctly matched;
the matching conditions specifically include: the predicted camera number matches with the target camera number, the predicted frame number matches with the target frame number, the predicted pedestrian ID number matches with the target pedestrian ID number, and the predicted rectangular frame position information matches with the target rectangular frame position information.
In this embodiment, the condition that the predicted rectangular frame position information matches the target rectangular frame position information may be: the intersection ratio IOU of the prediction rectangular frame and the target rectangular frame is larger than a set threshold value.
The intersection ratio IOU of the prediction rectangular frame and the target rectangular frame is the ratio of the first overlapping area to the first joint area; the first overlapping area is an intersection area of the prediction rectangular frame and the target rectangular frame; the first joint area is a union area of the prediction rectangular frame and the target rectangular frame.
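A minimal sketch of this IOU test follows, assuming rectangular frames are given as (x1, y1, x2, y2) corner coordinates; the function names and the default threshold are illustrative assumptions:

    def iou(box_a, box_b):
        # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
        ax1, ay1, ax2, ay2 = box_a
        bx1, by1, bx2, by2 = box_b
        ix1, iy1 = max(ax1, bx1), max(ay1, by1)   # first overlapping area
        ix2, iy2 = min(ax2, bx2), min(ay2, by2)
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((ax2 - ax1) * (ay2 - ay1)
                 + (bx2 - bx1) * (by2 - by1) - inter)  # first union area
        return inter / union if union > 0 else 0.0

    def positions_match(pred_box, tag_box, threshold=0.7):
        # Rectangular frame positions match when the IOU exceeds the threshold.
        return iou(pred_box, tag_box) > threshold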
In one embodiment, the step of performing frame matching on the tag frame and the predicted frame to determine the target matching node further includes: and if the number of the correct matches between the predicted frame and the label frame in the current node is not less than half of the number of the label frames, determining that the current node is a target matching node.
S140: calculating the recall rate and the accuracy rate of each pedestrian prediction result based on the number of the prediction matching nodes, the number of the label matching nodes and the number of the target matching nodes;
S150: calculating the evaluation score of the cross-camera pedestrian tracking algorithm model based on the recall rate and the accuracy rate.
The present embodiment thus provides an evaluation method for a cross-camera pedestrian tracking algorithm: first, pedestrian prediction results are generated with the cross-camera pedestrian tracking algorithm model based on the pre-acquired test data set, where the test data set comprises cross-camera surveillance video clips and corresponding tag frames, and each prediction result comprises prediction frames; the surveillance video is then divided into nodes along the time axis, and the numbers of prediction matching nodes and tag matching nodes are determined; frame matching is performed between the tag frames and the prediction frames to determine target matching nodes; the recall rate and accuracy rate of each pedestrian prediction result are calculated from the numbers of prediction matching nodes, tag matching nodes and target matching nodes; and finally the evaluation score of the cross-camera pedestrian tracking algorithm model is calculated from the recall rate and the accuracy rate. The method realizes evaluation of existing cross-camera pedestrian tracking algorithms, alleviates the technical problems of low prediction efficiency and low precision in existing models, and achieves the technical effect of improving prediction accuracy.
In an embodiment, the dividing the monitoring video according to nodes of a time axis in step S120, and determining the number of prediction matching nodes included in the prediction frame and the number of tag matching nodes included in the tag frame may include: firstly, dividing a monitoring video according to nodes of a time axis to generate a plurality of time nodes; and then determining the number of the prediction matching nodes and the number of the label matching nodes according to the track of the label frame and the relation between the track of the prediction frame and the time node.
The relationship between the track of the label frame and the time node comprises the following steps: if the track of the tag frame of the pedestrian passes through one camera in the current time axis, the track of the tag frame comprises a time node, and the time node is a tag matching node; the relationship of the trajectory of the predicted frame to the temporal node includes: if the track of the predicted frame of the pedestrian falls within a certain time node, the track of the predicted frame comprises the time node, and the time node is a prediction matching node.
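The node bookkeeping described above could look like the following sketch, assuming each camera's time axis is cut into nodes of a fixed number of frames; the names and the node length are assumptions:

    def node_of(frame_index, node_len=100):
        # Map a frame number to its fixed-interval node on the time axis.
        return frame_index // node_len

    def matching_nodes(track, node_len=100):
        # track: list of (camera_id, frame_index) pairs of one pedestrian's
        # tag or predicted frames. Returns the set of (camera, node) pairs
        # the track passes through.
        return {(cam, node_of(f, node_len)) for cam, f in track}

The tag matching node count and the prediction matching node count of S120 are then the sizes of these sets for the tag frame track and the predicted frame track respectively.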
In an embodiment, after calculating the recall rate and the accuracy rate of the prediction result of each pedestrian in the step S140, the method may further include: and calculating the total recall rate and the total accuracy rate of all the pedestrian prediction results in the test data set based on the recall rate and the accuracy rate of each pedestrian prediction result.
As a specific example, the recall rate and accuracy rate of an individual pedestrian prediction result are calculated as follows:

$$\mathrm{Recall}_i = \frac{\sum_{n=1}^{N} T_{i,n}}{\sum_{n=1}^{N} G_{i,n}} \quad \text{(Equation 1)}$$

$$\mathrm{Precision}_i = \frac{\sum_{n=1}^{N} T_{i,n}}{\sum_{n=1}^{N} P_{i,n}} \quad \text{(Equation 2)}$$

where $i$ is the pedestrian number, $n$ is the camera number, $N$ is the total number of cameras, $G_{i,n}$ is the number of tag matching nodes of the $i$-th pedestrian's tag frame track in camera $n$, $P_{i,n}$ is the number of prediction matching nodes of the $i$-th pedestrian's predicted frame track in camera $n$, and $T_{i,n}$ is the number of target matching nodes of the $i$-th pedestrian in camera $n$.
The total recall rate and total accuracy rate over all pedestrian prediction results in the test data set are calculated as follows:

$$\mathrm{Recall} = \frac{1}{M}\sum_{i=1}^{M}\mathrm{Recall}_i \quad \text{(Equation 3)}$$

$$\mathrm{Precision} = \frac{1}{M}\sum_{i=1}^{M}\mathrm{Precision}_i \quad \text{(Equation 4)}$$

where $M$ is the number of pedestrians annotated in the pedestrian library of the test data set; if $M = 10$, ten different pedestrians are annotated in the videos.
Correspondingly, the step S150 of calculating the evaluation score of the cross-camera pedestrian tracking algorithm model based on the recall ratio and the accuracy ratio may include: and carrying out weighted average on the total recall rate and the total accuracy rate, and determining the evaluation score of the cross-camera pedestrian tracking algorithm model.
As a specific example, the evaluation score is calculated as the weighted (harmonic) average of the total recall rate and total accuracy rate:

$$\mathrm{Fscore} = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \quad \text{(Equation 5)}$$

Finally, algorithm precision evaluation ranking is carried out on the principle that the larger the Fscore value, the higher the model precision.
The evaluation method for the cross-camera pedestrian tracking algorithm provided by this embodiment combines the pedestrian detection, pedestrian re-identification and pedestrian tracking modules of the existing cross-camera pedestrian tracking algorithm for comprehensive evaluation. It can effectively evaluate the accuracy of pedestrian detection, the accuracy of pedestrian re-identification and the effectiveness of tracking, overcomes the algorithm's sensitivity to initial-frame information, and avoids the unreliability of threshold-percentage evaluation; the method is better suited to practical application scenarios and can effectively evaluate the precision of the algorithm.
As a specific example, an overall test evaluation flow for a cross-camera pedestrian tracking algorithm mainly includes the following four steps: (1) prepare a diversified test set; (2) input the pedestrian library information into the cross-camera pedestrian tracking algorithm model and store the pedestrian results identified and tracked by the model; (3) divide the video into nodes at fixed intervals along the time axis, calculate the Recall and Precision values of all pedestrian libraries, and finally take their weighted average to obtain the Fscore value; (4) rank the algorithms on the principle that a larger Fscore value indicates higher model precision.
Referring to fig. 2, the specific algorithm evaluation test procedure is as follows:
1. Producing a test data set;
s1-1, preparing a test data set, wherein the test data set comprises a monitoring video clip and a pedestrian library label (ground-route). Pedestrian videos (indoor and outdoor recordings) of N monitoring cameras across a scene are manufactured, and the pedestrian videos comprise various factors such as body shielding, head shielding, backpacks, front (back and side) faces of people, dresses in different color styles, different fuzzy degrees and the like. And the pedestrian library is randomly intercepted from the monitoring video and then manually marked, wherein the pedestrian library comprises a camera number, a frame number, a pedestrian ID number and rectangular frame position information.
Table 1: examples of manual annotations
cameraID    FrameIndex    TargetId    X1         Y1         X2         Y2
1.mp4       2895          23509       0.825      208.152    295.004    1046.361
2.mp4       500           23595       123.456    235.236    333.123    678.123
N.mp4       419           23598       234.689    678.123    123.123    456.123
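For illustration, one way to read annotation rows in the format of Table 1 might be the following sketch; the whitespace-separated plain-text layout is an assumption about the annotation file:

    def parse_annotation_line(line):
        # Row format: cameraID FrameIndex TargetId X1 Y1 X2 Y2.
        cam, frame, target, x1, y1, x2, y2 = line.split()
        return {"camera_id": cam,
                "frame_index": int(frame),
                "pedestrian_id": int(target),
                "box": (float(x1), float(y1), float(x2), float(y2))}

    # e.g. parse_annotation_line("2.mp4 500 23595 123.456 235.236 333.123 678.123")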
2. Obtaining a model prediction result by using the test set;
and S2-1, inputting the pedestrian library information into a cross-camera pedestrian tracking algorithm model, and storing pedestrian results (camera number, frame number, pedestrian ID number and rectangular frame position information) identified and tracked by the model.
3. Video node division and Fscore calculation;
and S3-1, dividing video nodes, and counting the number of nodes contained in the prediction frame, the number of nodes contained in the label frame and the number of matched nodes. For all that
Figure F_221216173255895_895024009
Each camera of the cameras is arranged on a time axis
Figure F_221216173256006_006803010
And each matching node has a certain range (for example, front and back 50 frames) on the time axis.
Referring to fig. 3, in the above steps, a node refers to a section of the video obtained by dividing the time axis at fixed frame intervals. Frame matching means that a predicted frame is considered to match a tag frame if the camera number, frame number, pedestrian ID number and rectangular frame position all match.
Here, rectangular frame position matching is measured by the intersection-over-union (IOU): if the IOU of the detection frame and the annotated frame is greater than a set threshold, the rectangular frame positions are considered to match. The threshold has no fixed value and may, for example, be set above 0.7.
A tag frame containing a matching node (i.e., a tag matching node): if the motion trajectory of a pedestrian passes through a camera within the time axis, the pedestrian's tag frame track is considered to contain the corresponding node, and that node is a tag matching node. A predicted frame containing a matching node (i.e., a prediction matching node): if the predicted frame track falls within a node, the predicted frame track is considered to contain the corresponding node, and that node is a prediction matching node.
50% frame matching: for a specified tracking object, within the range of a tag matching node, when the number of correctly predicted frames is greater than or equal to half the number of tag frames, this is considered 50% frame matching.
Target matching node: for a specified tracking object, within the range of a tag matching node, when at least 50% of the submitted tracking track (i.e., the predicted frame track) can be matched to the Ground Truth tracking track (i.e., the tag frame track), the tracking object is considered to match that node during cross-camera tracking, and the node is a target matching node.
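A sketch of how the 50% frame-matching rule decides a target matching node follows; here correct_match stands for the camera/frame/ID/IOU frame-matching conditions described above, and the grouping of frames by node is assumed to have been done already:

    def is_target_matching_node(tag_frames, pred_frames, correct_match):
        # tag_frames, pred_frames: frames of one specified tracking object
        # falling within one tag matching node's range on the time axis.
        # The node is a target matching node when the number of correctly
        # matched predicted frames is at least half the number of tag frames.
        if not tag_frames:
            return False
        matched = sum(1 for t in tag_frames
                      if any(correct_match(p, t) for p in pred_frames))
        return matched >= len(tag_frames) / 2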
S3-2, calculate the Recall and Precision values of each pedestrian's prediction result from the number of matching nodes contained in the predicted frames, the number of matching nodes contained in the tag frames (ground truth), and the number of 50%-frame-matched nodes. As can be seen from fig. 3, the predicted frames contain 5 matching nodes, the tag frames contain 7 matching nodes, and 2 target matching nodes are matched. The evaluation result on this track is therefore Recall = 2/7 and Precision = 2/5. The specific calculation formulas are as follows.
For the $i$-th pedestrian target, the evaluation indexes are:

$$\mathrm{Recall}_i = \frac{\sum_{n=1}^{N} T_{i,n}}{\sum_{n=1}^{N} G_{i,n}} \quad \text{(Equation 1)}$$

$$\mathrm{Precision}_i = \frac{\sum_{n=1}^{N} T_{i,n}}{\sum_{n=1}^{N} P_{i,n}} \quad \text{(Equation 2)}$$

where $N$ is the total number of cameras, $G_{i,n}$ is the number of matching nodes contained in the tag frames (the Ground Truth track) of the $i$-th specified tracking object in camera $n$, $P_{i,n}$ is the number of matching nodes contained in the predicted frames of the $i$-th specified tracking object in camera $n$, and $T_{i,n}$ is the number of 50%-frame-matched (target) matching nodes of the $i$-th specified tracking object in camera $n$.
S3-3, calculate the Recall and Precision values over all pedestrian library prediction results:

$$\mathrm{Recall} = \frac{1}{M}\sum_{i=1}^{M}\mathrm{Recall}_i \quad \text{(Equation 3)}$$

$$\mathrm{Precision} = \frac{1}{M}\sum_{i=1}^{M}\mathrm{Precision}_i \quad \text{(Equation 4)}$$
and S3-4, calculating the Fscore value of the cross-camera pedestrian tracking algorithm. The Recall and Precision weighted average of all pedestrian pools resulted in the Fscore value.
Figure F_221216173256747_747046018
(equation 5);
and S3-5, performing algorithm precision evaluation sequencing according to the principle that the larger the Fscore value is, the higher the model precision is.
The evaluation method for the cross-camera pedestrian tracking algorithm provided by the embodiment of the present application can evaluate algorithm precision more effectively and efficiently, and is better suited to practical application scenarios. The method introduces the concepts of frame matching, detection position matching, node matching, tag frames containing matching nodes, predicted frames containing matching nodes, 50% frame matching and target matching nodes; it calculates the Recall and Precision of all specified tracked pedestrians over the video time-axis nodes, takes their weighted average to obtain the Fscore value, and judges the quality of the algorithm by the magnitude of the Fscore.
It realizes comprehensive evaluation of the combined pedestrian detection, pedestrian re-identification and pedestrian tracking modules of the existing cross-camera pedestrian tracking algorithm; it can effectively evaluate the accuracy of pedestrian detection, the accuracy of pedestrian re-identification and the effectiveness of tracking, overcomes the algorithm's sensitivity to initial-frame information, and avoids the unreliability of threshold-percentage evaluation.
In addition, the embodiment of the invention also provides an evaluation system of a cross-camera pedestrian tracking algorithm, which comprises the following steps:
the pedestrian prediction result generation module is used for generating at least one pedestrian prediction result by utilizing a cross-camera pedestrian tracking algorithm model based on a pre-acquired test data set; the method comprises the steps that a pre-acquired test data set comprises cross-camera monitoring video clips and corresponding label frames; the pedestrian prediction result comprises a prediction frame;
the node number determining module is used for dividing the monitoring video according to the nodes of the time axis and determining the number of the prediction matching nodes contained in the prediction frame and the number of the label matching nodes contained in the label frame;
the target matching node generation module is used for carrying out frame matching on the label frame and the prediction frame to determine a target matching node;
the calculation module is used for calculating the recall rate and the accuracy rate of each pedestrian prediction result based on the number of the prediction matching nodes, the number of the label matching nodes and the number of the target matching nodes;
and the calculation module is also used for calculating the evaluation score of the cross-camera pedestrian tracking algorithm model based on the recall rate and the accuracy rate.
It should be noted that the evaluation system of the cross-camera pedestrian tracking algorithm provided by the embodiment of the invention can be deployed on embedded terminal equipment or an AI cloud server and implemented as a terminal evaluation product or an AI platform evaluation system.
The evaluation system of the cross-camera pedestrian tracking algorithm provided by the embodiment of the present application may be specific hardware on the device, or software or firmware installed on the device. The system provided by the embodiment of the present application has the same implementation principle and technical effects as the foregoing method embodiments; for brevity, where the system embodiment is silent, reference may be made to the corresponding contents of the foregoing method embodiments. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. The evaluation system provided by this embodiment has the same technical features as the evaluation method provided above, so it can solve the same technical problems and achieve the same technical effects.
The embodiment of the application further provides an electronic device, and specifically, the electronic device comprises a processor and a storage device; the storage means has stored thereon a computer program which, when executed by the processor, performs the method of any of the above described embodiments.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application, where the electronic device 400 includes: a processor 40, a memory 41, a bus 42 and a communication interface 43, wherein the processor 40, the communication interface 43 and the memory 41 are connected through the bus 42; the processor 40 is arranged to execute executable modules, such as computer programs, stored in the memory 41.
The Memory 41 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. The communication connection between the network elements of the system and at least one other network element is realized through at least one communication interface 43 (which may be wired or wireless), using the Internet, a wide area network, a local area network, a metropolitan area network, and the like.
The bus 42 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 4, but that does not indicate only one bus or one type of bus.
The memory 41 is used for storing a program, the processor 40 executes the program after receiving an execution instruction, and the method executed by the apparatus defined by the flow process disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 40, or implemented by the processor 40.
The processor 40 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by instructions in the form of hardware integrated logic circuits or software in the processor 40. The Processor 40 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in a memory 41, and the processor 40 reads the information in the memory 41 and completes the steps of the method in combination with the hardware thereof.
Corresponding to the method, the embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores machine executable instructions, and when the computer executable instructions are called and executed by a processor, the computer executable instructions cause the processor to execute the steps of the method.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some communication interfaces, indirect coupling or communication connection between devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present invention or a part thereof which substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It should be noted that: like reference numbers and letters in the figures refer to like items and, thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures, and moreover, the terms "first," "second," "third," etc. are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A cross-camera pedestrian tracking algorithm evaluation method is characterized by comprising the following steps:
generating at least one pedestrian prediction result by using a cross-camera pedestrian tracking algorithm model based on a pre-acquired test data set; the pre-acquired test data set comprises a monitoring video clip crossing the camera and a corresponding label frame; the pedestrian prediction result comprises a prediction frame;
dividing the monitoring video according to the nodes of a time axis, and determining the number of prediction matching nodes contained in the prediction frame and the number of label matching nodes contained in the label frame;
dividing the monitoring video according to the nodes of a time axis, and determining the number of the prediction matching nodes contained in the prediction frame and the number of the label matching nodes contained in the label frame, wherein the steps comprise: dividing the monitoring video according to the nodes of a time axis to generate a plurality of time nodes; determining the number of prediction matching nodes and the number of label matching nodes according to the track of the label frame and the relation between the track of the prediction frame and the time node; wherein the relationship between the track of the tag frame and the time node comprises: if the track of the tag frame of the pedestrian passes through one camera in the current time axis, the track of the tag frame comprises the time node, and the time node is a tag matching node; the relationship of the trajectory of the predicted frame to the temporal node includes: if the track of the predicted frame of the pedestrian falls within a certain time node, the track of the predicted frame comprises the time node, and the time node is the predicted matching node;
performing frame matching on the tag frame and the predicted frame, and determining a target matching node; if the number of the correct matches between the predicted frame and the label frame in the current node is not less than half of the number of the label frames, determining that the current node is a target matching node;
calculating recall rate and precision rate of each pedestrian prediction result based on the number of prediction matching nodes, the number of tag matching nodes and the number of target matching nodes;
calculating an evaluation score of the cross-camera pedestrian tracking algorithm model based on the recall rate and the accuracy rate.
2. The cross-camera pedestrian tracking algorithm evaluation method of claim 1, further comprising:
acquiring a target monitoring video clip containing a target pedestrian picture and a corresponding label frame according to target pedestrian videos of a plurality of monitoring cameras across scenes;
the parameters of the tag frame comprise a target camera number, a target frame number, a target pedestrian ID number and target rectangular frame position information;
generating target pedestrian library information based on the target surveillance video clip and the tag frame;
and generating a test data set according to the information of the target pedestrian library.
3. The cross-camera pedestrian tracking algorithm evaluation method according to claim 2, wherein the step of performing frame matching on the tag frame and the predicted frame to determine a target matching node comprises:
respectively matching parameters contained in the predicted frame and the label frame; wherein the parameters of the predicted frame include: predicting the number of a camera, the number of a frame, the ID number of a pedestrian and the position information of a rectangular frame;
if the parameters of the predicted frame and the parameters of the label frame both meet the matching conditions, the predicted frame and the label frame are correctly matched; wherein the matching condition includes: the predicted camera number matches the target camera number, the predicted frame number matches the target frame number, the predicted pedestrian ID number matches the target pedestrian ID number, and the predicted rectangular frame position information matches the target rectangular frame position information.
4. The cross-camera pedestrian tracking algorithm evaluation method according to claim 3, wherein the condition that the predicted rectangular frame position information matches the target rectangular frame position information is: the intersection ratio IOU of the prediction rectangular frame and the target rectangular frame is larger than a set threshold value;
the IOU is the ratio of the first overlapping area to the first joint area; the first overlapping area is an intersection area of the prediction rectangular frame and the target rectangular frame; the first joint region is a union region of the prediction rectangular frame and the target rectangular frame.
5. The cross-camera pedestrian tracking algorithm evaluation method of claim 1, further comprising:
calculating the total recall rate and the total accuracy rate of all the pedestrian prediction results in the test data set based on the recall rate and the accuracy rate of each pedestrian prediction result;
calculating an evaluation score of the cross-camera pedestrian tracking algorithm model based on the recall rate and the accuracy rate, comprising:
and carrying out weighted average on the total recall rate and the total accuracy rate, and determining the evaluation score of the cross-camera pedestrian tracking algorithm model.
6. An evaluation system for a cross-camera pedestrian tracking algorithm, the system comprising:
the pedestrian prediction result generation module is used for generating at least one pedestrian prediction result by utilizing a cross-camera pedestrian tracking algorithm model based on a pre-acquired test data set; the pre-acquired test data set comprises a monitoring video clip crossing the camera and a corresponding label frame; the pedestrian prediction result comprises a prediction frame;
the node number determining module is used for dividing the monitoring video according to nodes of a time axis and determining the number of prediction matching nodes contained in the prediction frame and the number of label matching nodes contained in the label frame; the node number determining module is also used for dividing the monitoring video according to the nodes of the time axis to generate a plurality of time nodes; determining the number of prediction matching nodes and the number of label matching nodes according to the track of the label frame and the relation between the track of the prediction frame and the time node; wherein the relationship between the track of the tag frame and the time node comprises: if the track of the tag frame of the pedestrian passes through one camera in the current time axis, the track of the tag frame comprises the time node, and the time node is a tag matching node; the relationship of the trajectory of the predicted frame to the temporal node includes: if the track of the predicted frame of the pedestrian falls in a certain time node, the track of the predicted frame comprises the time node, and the time node is the predicted matching node;
the target matching node generation module is used for carrying out frame matching on the label frame and the predicted frame and determining a target matching node; if the number of the correct matches between the predicted frame and the label frame in the current node is not less than half of the number of the label frames, determining that the current node is a target matching node;
the calculation module is used for calculating the recall rate and the accuracy rate of each pedestrian prediction result based on the prediction matching node number, the label matching node number and the target matching node number;
the calculation module is further used for calculating an evaluation score of the cross-camera pedestrian tracking algorithm model based on the recall rate and the accuracy rate.
7. An electronic device comprising a memory and a processor, wherein the memory stores a computer program operable on the processor, and wherein the processor implements the steps of the method of any of claims 1 to 5 when executing the computer program.
CN202211636207.9A 2022-12-20 2022-12-20 Evaluation method and system of cross-camera pedestrian tracking algorithm and electronic equipment Active CN115620098B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211636207.9A CN115620098B (en) 2022-12-20 2022-12-20 Evaluation method and system of cross-camera pedestrian tracking algorithm and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211636207.9A CN115620098B (en) 2022-12-20 2022-12-20 Evaluation method and system of cross-camera pedestrian tracking algorithm and electronic equipment

Publications (2)

Publication Number Publication Date
CN115620098A CN115620098A (en) 2023-01-17
CN115620098B (en) 2023-03-10

Family

ID=84880864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211636207.9A Active CN115620098B (en) 2022-12-20 2022-12-20 Evaluation method and system of cross-camera pedestrian tracking algorithm and electronic equipment

Country Status (1)

Country Link
CN (1) CN115620098B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117152197B (en) * 2023-10-30 2024-01-23 成都睿芯行科技有限公司 Method and system for determining tracking object and method and system for tracking

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930768A (en) * 2016-04-11 2016-09-07 武汉大学 Spatial-temporal constraint-based target re-identification method
CN109146910A (en) * 2018-08-27 2019-01-04 公安部第研究所 A kind of video content analysis index Evaluation Method based on target positioning
CN111242972A (en) * 2019-12-23 2020-06-05 中国电子科技集团公司第十四研究所 Online cross-scale multi-fluid target matching and tracking method
CN113095263A (en) * 2021-04-21 2021-07-09 中国矿业大学 Method and device for training heavy identification model of pedestrian under shielding and method and device for heavy identification of pedestrian under shielding
CN114037886A (en) * 2021-11-04 2022-02-11 重庆紫光华山智安科技有限公司 Image recognition method and device, electronic equipment and readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6968057B2 (en) * 1994-03-17 2005-11-22 Digimarc Corporation Emulsion products and imagery employing steganography
CN103971384B (en) * 2014-05-27 2017-01-25 苏州经贸职业技术学院 Node cooperation target tracking method of wireless video sensor
CN107563313B (en) * 2017-08-18 2020-07-07 北京航空航天大学 Multi-target pedestrian detection and tracking method based on deep learning


Also Published As

Publication number Publication date
CN115620098A (en) 2023-01-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant