CN117392176A - Pedestrian tracking method and system for video monitoring and computer readable medium - Google Patents

Pedestrian tracking method and system for video monitoring and computer readable medium

Info

Publication number
CN117392176A
Authority
CN
China
Prior art keywords
pedestrian
frame
tracking method
video
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311381410.0A
Other languages
Chinese (zh)
Inventor
陈从平
吴伟鹏
陆洋
陈奔
刘雅玄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou University
Original Assignee
Changzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou University filed Critical Changzhou University
Priority to CN202311381410.0A
Publication of CN117392176A
Legal status: Pending

Classifications

    • G06T 7/248 — Analysis of motion using feature-based methods (e.g. tracking of corners or segments) involving reference images or patches
    • G06T 7/277 — Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06T 7/74 — Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06N 3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N 3/08 — Learning methods
    • G06V 10/44 — Local feature extraction by analysis of parts of the pattern (e.g. edges, contours, loops, corners, strokes or intersections); connectivity analysis
    • G06V 10/75 — Organisation of the matching processes (e.g. simultaneous or sequential comparisons of image or video features); coarse-fine and multi-scale approaches; context analysis; selection of dictionaries
    • G06V 10/764 — Recognition or understanding using pattern recognition or machine learning, using classification (e.g. of video objects)
    • G06V 10/82 — Recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V 20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 40/10 — Human or animal bodies (e.g. vehicle occupants or pedestrians); body parts (e.g. hands)
    • G06T 2207/10016 — Video; image sequence
    • G06T 2207/20024 — Filtering details
    • G06T 2207/20081 — Training; learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/30196 — Human being; person
    • G06T 2207/30232 — Surveillance
    • Y02T 10/40 — Engine management systems

Abstract

The invention relates to the technical field of image processing, and in particular to a pedestrian tracking method and system for video monitoring and a computer readable medium. The method comprises: processing a video stream containing pedestrian targets frame by frame, generating an image corresponding to each video frame, and preprocessing the image; inputting continuous frame image data into an improved YOLOv5 network model and performing feature extraction and target detection to obtain the bounding box information of a target pedestrian; establishing a lightweight DeepSort tracking model and replacing the original DeepSort feature extraction module with a MobileNetV2 feature extraction module; and associating the pedestrian motion information frame by frame by calculating the Mahalanobis distance between the detected pedestrian position and the position predicted by a Kalman filter. The invention solves the problems that traditional monitoring easily produces unstable monitoring results and easily misses important information, as well as the constraint of limited computing resources on edge devices.

Description

Pedestrian tracking method and system for video monitoring and computer readable medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a pedestrian tracking method and system for video surveillance, and a computer readable medium.
Background
With the development of computer vision technology, cameras have been widely used in the field of video surveillance. In particular, with people's increasing demands for public and personal security, cameras of various sizes and types are now deployed in large numbers at borders, public buildings, public transportation, shops, office buildings, parking lots and even homes.
An intelligent monitoring technology therefore needs to be introduced to reduce the burden on operators in traditional monitoring systems and to improve the monitoring effect. YOLOv5 is a deep-learning-based target detection algorithm; however, target detection alone is not sufficient, because in actual monitoring scenes pedestrians are often occluded, making it difficult for conventional target tracking algorithms to track pedestrian trajectories accurately.
In addition, running complex models on embedded and edge devices is often constrained by computational resources and memory limitations; therefore, a lighter-weight feature extraction module is adopted to reduce the number of model parameters and accommodate the resource limits of edge devices.
Disclosure of Invention
Aiming at the defects of existing methods, the invention solves the problems that traditional monitoring easily produces unstable monitoring results and easily misses important information, as well as the constraint of limited computing resources on edge devices.
The technical scheme adopted by the invention is as follows: the pedestrian tracking method for video monitoring comprises the following steps:
Step one, processing a video stream containing pedestrian targets frame by frame to generate an image corresponding to each video frame, and preprocessing the image;
Further, the preprocessing operation includes: adjusting the brightness and contrast of the image, reducing noise, and resizing the image.
Step two, inputting continuous frame image data into an improved YOLOv5 network model, and performing feature extraction and target detection to obtain the bounding box information of a target pedestrian;
Further, in the improved YOLOv5 network model, the C3 modules of layers 2, 4, 6 and 8 of the YOLOv5 network backbone are replaced with DAMC3 modules; a DAMC3 module consists of a spatial attention module, a channel attention module and a convolution module connected in sequence at the output of the C3 module.
Step three, establishing a lightweight DeepSort tracking model, and replacing the original DeepSort feature extraction module with a MobileNetV2 feature extraction module;
Further, the MobileNetV2 feature extraction module includes:
an expansion layer, which increases the dimension of low-dimensional features by point-by-point convolution with a 1x1 convolution kernel; a depthwise separable convolution, which performs channel-by-channel convolution with a 3x3 convolution kernel to reduce the number of computed parameters; and a projection layer, which reduces the dimension of high-dimensional features by point-by-point convolution with a 1x1 convolution kernel.
And step four, associating the pedestrian motion information frame by frame by calculating the Mahalanobis distance between the detected pedestrian position and the position predicted by the Kalman filter.
Further, the Mahalanobis distance is defined as:

d(i, j) = (d_j - y_i)^T S_i^{-1} (d_j - y_i)

where d_j is the position of detection box j, y_i is the predicted position of Kalman filter i, and S_i is the covariance matrix between the detected position and the predicted position.
Further, the fourth step further includes:
and carrying out IOU matching on the associated pedestrian prediction frame and the pedestrian detection frame transmitted by the second frame detection, setting a threshold value, confirming the tracking state and carrying out cascade matching.
Further, a pedestrian tracking system for video monitoring, comprising: a memory for storing instructions executable by the processor; and the processor is used for executing the instructions to realize the pedestrian tracking method for video monitoring.
Further, a computer readable medium storing computer program code, characterized in that the computer program code, when executed by a processor, implements a pedestrian tracking method for video surveillance.
The invention has the beneficial effects that:
1. By adding the dual-attention mechanism DAM to YOLOv5, the representation capability of pedestrian features is enhanced, target detection accuracy is improved, the estimation of target position and scale is improved, the influence of noise and occlusion is suppressed, and the amount of computation is effectively reduced;
2. The combination of the improved YOLOv5 network model and the lightweight DeepSort tracking model improves robustness for pedestrian targets in complex scenes: the improved YOLOv5 network model has stronger target detection capability, and the lightweight DeepSort tracking model uses appearance features and a motion model for target association and can handle challenges such as occlusion and appearance changes, thereby improving the accuracy and robustness of pedestrian tracking in intelligent video monitoring;
3. by replacing the original feature extraction convolution module with the MobileNetV2 feature extraction module, the parameter quantity of the model is greatly reduced, and the whole system can be easily deployed to edge equipment with limited computing resources and storage capacity.
Drawings
FIG. 1 is a schematic logic flow diagram of a pedestrian tracking method for video surveillance of the present invention;
FIG. 2 is a diagram of the improved YOLOv5 network model of the present invention;
FIG. 3 is a schematic diagram of a dual-attention mechanism DAM of the present invention;
FIG. 4 is a block diagram of a dual-attention mechanism DAMC3 of the present invention;
FIG. 5 is a diagram of a MobileNet V2 feature extraction module of the present invention;
FIG. 6 is a schematic diagram of a depth separable convolution structure of the present invention;
FIG. 7 is a tracking state flow diagram of the present invention;
FIG. 8 is a graph of the pedestrian tracking effect of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples, which are simplified schematic illustrations showing only the basic structure of the invention, and thus only those constructions relevant to the invention are shown.
As shown in fig. 1, the pedestrian tracking method for video monitoring includes the steps of:
Step one, processing a video stream containing pedestrian targets frame by frame to generate video frame images;
firstly, acquiring a real-time monitoring video, and decoding each frame of video to obtain image data.
Secondly, before each frame is processed, a preprocessing operation is performed on the image to optimize it and improve the performance of the subsequent target detection and tracking algorithms. The preprocessing operation includes adjusting the brightness and contrast of the image, reducing noise, and resizing the image; this improves the quality and clarity of the image so that targets can be detected and tracked in it more easily and accurately.
finally, repeating the steps, and carrying out the same processing on the next frame of video until all the video frames are processed.
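For illustration only, the following is a minimal sketch of the frame-by-frame decoding and preprocessing described above, written in Python with OpenCV; the video path, brightness/contrast factors, blur kernel size and target resolution are assumptions for the example and are not values specified by the invention.

```python
# Illustrative sketch: frame-by-frame decoding and preprocessing with OpenCV.
import cv2

def preprocess(frame, target_size=(640, 640), alpha=1.2, beta=10):
    """Adjust brightness/contrast, denoise, and resize a single frame."""
    # Brightness/contrast adjustment: out = alpha * frame + beta
    frame = cv2.convertScaleAbs(frame, alpha=alpha, beta=beta)
    # Noise reduction with a small Gaussian blur
    frame = cv2.GaussianBlur(frame, (3, 3), 0)
    # Resize to the detector's expected input size
    return cv2.resize(frame, target_size)

cap = cv2.VideoCapture("surveillance.mp4")  # hypothetical video source
while True:
    ok, frame = cap.read()
    if not ok:            # all frames processed
        break
    image = preprocess(frame)
    # ... pass `image` to the detection/tracking pipeline (steps two to four)
cap.release()
```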
Step two, as shown in FIG. 2 (the improved YOLOv5 network model), continuous frame image data are input into the network model for feature extraction and target detection, and the bounding box information of the target pedestrian is obtained;
To improve the accuracy and stability of pedestrian detection and tracking, YOLOv5 is improved by adding a dual-attention mechanism DAM (Dual Attention Model). The attention mechanism stems from an intuition about the human brain: when processing large amounts of information, it tends to focus on important information while ignoring secondary information. The principle of the dual-attention mechanism DAM is shown in FIG. 3; by introducing a spatial attention mechanism and a channel attention mechanism, the DAM module can better capture spatial correlations and channel correlations in the image. Both attention mechanisms are very useful for the pedestrian tracking task: the spatial attention mechanism helps the model focus on important regions of the image, such as the positions of pedestrians, improving the accuracy of pedestrian detection and ensuring that key parts of pedestrians are not ignored; the channel attention mechanism allows the model to automatically learn weights over the different channels of the feature map to emphasize channels related to pedestrian features, helping the model better distinguish pedestrians from the background or other objects and improving the discriminability of the features.
The DAM module applies the spatial attention mechanism first. Spatial attention helps the model focus on specific regions of the image when processing visual tasks, which benefits the pedestrian target detection task; features at specific positions can be selectively enhanced through spatial attention, reducing computational cost. If the channel attention mechanism were applied first, the operation would act on the whole feature map without considering position-specific information, increasing the amount of computation.
As shown in FIG. 4, the original C3 layer is changed to the DAMC3 layer. The benefits of introducing the DAM inside the C3 layer include the following (an illustrative sketch is given after this list):
1. Finer feature weighting: the attention mechanism of the DAM can be applied more precisely to a specific feature map; the model can adjust channel and spatial attention on the feature map generated by the C3 layer more finely, better adapting to, integrating and utilizing the features of that layer to improve detection and recognition performance.
2. Better model fine-tuning: placing the DAM inside a particular layer can make the model easier to fine-tune, because the DAM can adapt more precisely to the features of that layer, which is helpful for transfer learning and fine-tuning for a particular task.
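As an illustration only, the following PyTorch sketch shows one possible reading of the DAMC3 structure described above (the output of a C3 block passed through spatial attention, then channel attention, then a convolution). The attention kernel size, the channel reduction ratio and the final 1x1 convolution are assumptions, and the YOLOv5 C3 block itself is only referenced, not reimplemented.

```python
# Minimal sketch of a DAMC3-style block: C3 output -> spatial attention -> channel attention -> conv.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Pool along the channel axis, then learn a per-pixel attention map
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):  # reduction ratio is an assumption
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )

    def forward(self, x):
        # Learn a per-channel weight and rescale the feature map
        return x * torch.sigmoid(self.fc(x))

class DAMC3(nn.Module):
    """C3 block followed by spatial attention, channel attention and a 1x1 conv."""
    def __init__(self, c3_block, channels):
        super().__init__()
        self.c3 = c3_block                 # existing YOLOv5 C3 module (not shown here)
        self.sa = SpatialAttention()
        self.ca = ChannelAttention(channels)
        self.conv = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        y = self.c3(x)
        return self.conv(self.ca(self.sa(y)))   # spatial first, then channel, then conv
```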
The obtained continuous frame image data are input into the network model for feature extraction and target detection, and the bounding box information of the target pedestrian, (x, y, w, h, dx, dy, dw, dh), is obtained and output, where (x, y) are the center coordinates of the bounding box of the target pedestrian, w is its width, h is its height, and (dx, dy, dw, dh) are the corresponding velocities of x, y, w and h in the image coordinate system.
Step three, a lightweight DeepSort tracking model is established, and the original feature extraction module of DeepSort is replaced by a MobileNetV2 feature extraction module. With little loss of accuracy, the number of model parameters is greatly reduced and the model weight shrinks from the original 45M to 2.5M, making it convenient to deploy the tracking model on edge devices with limited computing power.
Because the computing power of embedded edge monitoring devices is limited, and MobileNetV2 is designed to provide good performance in resource-limited environments, the main reason that replacing DeepSort's original feature extraction module with MobileNetV2 reduces the number of parameters is as follows: MobileNetV2 is itself a lightweight convolutional neural network architecture with few parameters relative to large feature extraction networks.
Key factors by which MobileNetV2 reduces the number of parameters:
1. Depthwise separable convolution: MobileNetV2 uses depthwise separable convolutions instead of standard convolutions. A depthwise separable convolution separates the spatial and channel dimensions and performs the convolution on each channel rather than on the entire feature map, which reduces the number of parameters in the model.
2. Lightweight structure: MobileNetV2 is designed with fewer layers and parameters to accommodate the computing resource limitations of embedded and mobile devices, and it uses a range of lightweight design strategies, such as residual connections and batch normalization, to reduce the complexity and parameter count of the network.
3. Inverted residuals: MobileNetV2 also introduces inverted residual blocks, further reducing the number of parameters. These inverted residual blocks contain a lightweight expansion convolution that increases the dimension of the feature map and a linear projection convolution that reduces it again.
The principle of the MobileNetV2 feature extraction module is shown in FIG. 5. The expansion layer increases the dimension of the low-dimensional features by point-by-point convolution with a 1x1 convolution kernel; the depthwise separable convolution (Depthwise Convolution) performs channel-by-channel convolution with a 3x3 convolution kernel to reduce the number of computed parameters; and the projection layer performs point-by-point convolution with a 1x1 convolution kernel to reduce the dimension of the high-dimensional features. The depthwise separable convolution decomposes the convolution into two steps, processing spatial information once and channel information once, which significantly reduces the number of parameters required for the computation; this separation allows the model to be more lightweight while still maintaining high detection performance.
In the channel-by-channel convolution, the feature map of each channel is computed by its own convolution kernel, as shown in the first part of FIG. 6; the number of channels of the feature map obtained after this step is the same as the number of input channels.
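For illustration, the following PyTorch sketch shows a standard MobileNetV2-style inverted-residual block matching the expansion/depthwise/projection description above. The expansion factor of 6 and the residual condition follow the published MobileNetV2 design and are assumptions here, not values taken from the invention.

```python
# Sketch of an inverted-residual block: 1x1 expansion -> 3x3 depthwise conv -> 1x1 linear projection.
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        hidden = in_ch * expand
        self.use_res = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # Expansion layer: 1x1 pointwise conv raises the dimension
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # Depthwise 3x3 conv: one filter per channel (groups=hidden) cuts parameters
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # Projection layer: 1x1 pointwise conv lowers the dimension (linear, no ReLU)
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return x + self.block(x) if self.use_res else self.block(x)
```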
The bounding box information obtained from the improved YOLOv5 network model is used as input to initialize the Kalman filter for the target pedestrian; the Kalman filter then computes prediction box information, giving a preliminary prediction of the target pedestrian's position in the current frame.
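A minimal sketch of this initialization and prediction step is given below, assuming a constant-velocity Kalman model over the eight-dimensional state (x, y, w, h, dx, dy, dw, dh) described above; the noise magnitudes and initial covariance are placeholder assumptions, not values from the invention.

```python
# Illustrative constant-velocity Kalman initialization and prediction with NumPy.
import numpy as np

def init_kalman(bbox):
    """bbox = (x, y, w, h) from the detector; returns initial state mean and covariance."""
    mean = np.zeros(8)
    mean[:4] = bbox              # position part comes from the detection
    cov = np.eye(8)              # initial uncertainty (assumed)
    return mean, cov

F = np.eye(8)
F[:4, 4:] = np.eye(4)            # x_{k+1} = x_k + v_k (unit time step)
Q = np.eye(8) * 1e-2             # process noise (assumed)

def predict(mean, cov):
    """Predict the pedestrian's bounding box state in the next frame."""
    return F @ mean, F @ cov @ F.T + Q
```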
Step four, associating the pedestrian motion information frame by frame by calculating the Mahalanobis distance between the detected pedestrian position and the position predicted by the Kalman filter;
the mahalanobis distance expression is:
wherein: d, d j Is the position of the detection frame j; y is i Is the predicted position of the Kalman filter i; s is S i Is the covariance matrix of the detected and predicted positions.
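As an illustration only, a minimal NumPy sketch of this distance computation under the notation above (inputs are assumed to be position vectors of equal length and a matching covariance matrix):

```python
# Squared Mahalanobis distance between detection d_j and Kalman-predicted position y_i.
import numpy as np

def mahalanobis_sq(d_j, y_i, S_i):
    diff = np.asarray(d_j) - np.asarray(y_i)
    return float(diff.T @ np.linalg.inv(S_i) @ diff)
```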
IOU matching is performed between the associated pedestrian prediction box and the pedestrian detection box passed in from the detection of the second frame; a threshold is set, the tracking state is confirmed, and cascade matching is performed.
The IOU matching calculation formula is:

IOU = |A ∩ B| / |A ∪ B|

where A represents the prediction box and B represents the actual detection box.
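For illustration, a minimal Python sketch of this IOU computation, assuming boxes are given as (x1, y1, x2, y2) corner coordinates (a convention not specified by the invention):

```python
# Intersection-over-union of a predicted box A and a detected box B.
def iou(A, B):
    ix1, iy1 = max(A[0], B[0]), max(A[1], B[1])
    ix2, iy2 = min(A[2], B[2]), min(A[3], B[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (A[2] - A[0]) * (A[3] - A[1])
    area_b = (B[2] - B[0]) * (B[3] - B[1])
    return inter / (area_a + area_b - inter + 1e-9)
```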
Tracking states fall into three categories: confirmed, unconfirmed, and deleted.
The tracking state flow shown in FIG. 7 is as follows: first, in the initialization stage, a track T is created from the target detection result of the first frame and a Kalman filter is used for position prediction; at this point the track is in the unconfirmed state. Then IOU (intersection-over-union) matching is performed between the target pedestrian detection box and the prediction box of the current frame, a cost matrix is calculated, and the Hungarian algorithm is applied to the cost matrix to obtain a linear matching result.
According to the matching result, the following cases are handled: 1. if a track T is unmatched, the unmatched track T is deleted; 2. an unmatched detection box D is initialized as a new track T; 3. if a Kalman-filter prediction box is successfully matched with a pedestrian detection box, the variables of the matched track T are updated through the Kalman filter. The target matching step is repeated in loop iterations until a track T in the confirmed state appears or the video frames end.
In addition, a cascade matching stage is included: Kalman filtering is used to predict the boxes corresponding to tracks T in the confirmed state and the unconfirmed state, and the boxes of the confirmed tracks T are cascade-matched with the detection boxes D. Cascade matching uses appearance features and motion information, and the appearance features and motion information of the previous n frames are stored, which improves matching accuracy.
Finally, in the stage of processing the matching result, the cases are handled according to the cascade matching result: if a track T is successfully matched, its variables are updated through Kalman filtering; if a detection box D is unmatched, IOU matching is performed between the unconfirmed tracks T together with the unmatched tracks T on one side and the unmatched detection boxes D on the other, and a cost matrix is calculated; the Hungarian algorithm is then applied again to obtain a linear matching result. The matching-result processing step is repeated in loop iterations until the video frames end.
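A sketch of the linear matching step described above is given below, using SciPy's Hungarian-algorithm solver (linear_sum_assignment); the cost function and the gating threshold value are assumptions for illustration.

```python
# Illustrative linear assignment over a track/detection cost matrix.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(tracks, detections, cost_fn, max_cost=0.7):
    """Return matched (track, detection) index pairs plus unmatched indices."""
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    cost = np.array([[cost_fn(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)     # Hungarian algorithm
    matches, un_t, un_d = [], set(range(len(tracks))), set(range(len(detections)))
    for r, c in zip(rows, cols):
        if cost[r, c] <= max_cost:               # reject matches above the threshold
            matches.append((r, c))
            un_t.discard(r)
            un_d.discard(c)
    return matches, sorted(un_t), sorted(un_d)
```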
According to the associated and updated tracker states, the result of successfully tracking the pedestrian is output: bounding boxes are drawn, unique IDs are assigned, and pedestrian trajectories are generated; the effect is shown in FIG. 8.
Taking the above preferred embodiments of the present invention as an illustration, persons skilled in the relevant art can make various changes and modifications without departing from the technical idea of the present invention. The technical scope of the present invention is not limited to the description, but must be determined according to the scope of the claims.

Claims (8)

1. The pedestrian tracking method for video monitoring is characterized by comprising the following steps of:
step one, processing a video stream containing pedestrian targets frame by frame to generate an image corresponding to each video frame, and preprocessing the image;
step two, inputting continuous frame image data into an improved YOLOv5 network model, and performing feature extraction and target detection to obtain the bounding box information of a target pedestrian;
step three, establishing a lightweight DeepSort tracking model, and replacing the original DeepSort feature extraction module with a MobileNetV2 feature extraction module;
and step four, associating the pedestrian motion information frame by frame by calculating the Mahalanobis distance between the detected pedestrian position and the position predicted by the Kalman filter.
2. The pedestrian tracking method for video surveillance of claim 1, wherein the preprocessing operation includes: adjusting the brightness and contrast of the image, reducing noise, and resizing the image.
3. The pedestrian tracking method for video surveillance of claim 1, wherein the improvement to the YOLOv5 network model is to replace the C3 modules of layers 2, 4, 6 and 8 of the YOLOv5 network backbone with DAMC3 modules; a DAMC3 module consists of a spatial attention module, a channel attention module and a convolution module connected in sequence at the output of the C3 module.
4. The pedestrian tracking method for video surveillance of claim 1, wherein the MobileNetV2 feature extraction module comprises:
an expansion layer, which increases the dimension of low-dimensional features by point-by-point convolution with a 1x1 convolution kernel; a depthwise separable convolution, which performs channel-by-channel convolution with a 3x3 convolution kernel to reduce the number of computed parameters; and a projection layer, which reduces the dimension of high-dimensional features by point-by-point convolution with a 1x1 convolution kernel.
5. The pedestrian tracking method for video surveillance of claim 1, wherein the Mahalanobis distance is defined as:

d(i, j) = (d_j - y_i)^T S_i^{-1} (d_j - y_i)

where d_j is the position of detection box j, y_i is the predicted position of Kalman filter i, and S_i is the covariance matrix between the detected position and the predicted position.
6. The pedestrian tracking method for video surveillance of claim 1, wherein step four further includes:
and carrying out IOU matching on the associated pedestrian prediction frame and the pedestrian detection frame transmitted by the second frame detection, setting a threshold value, confirming the tracking state and carrying out cascade matching.
7. A pedestrian tracking system for video surveillance, comprising: a memory for storing instructions executable by the processor; a processor for executing instructions to implement the pedestrian tracking method for video surveillance of any one of claims 1-6.
8. Computer readable medium storing computer program code, characterized in that the computer program code, when executed by a processor, implements the pedestrian tracking method for video monitoring as claimed in any one of claims 1-6.
CN202311381410.0A 2023-10-24 2023-10-24 Pedestrian tracking method and system for video monitoring and computer readable medium Pending CN117392176A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311381410.0A CN117392176A (en) 2023-10-24 2023-10-24 Pedestrian tracking method and system for video monitoring and computer readable medium

Publications (1)

Publication Number Publication Date
CN117392176A 2024-01-12

Family

ID=89436993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311381410.0A Pending CN117392176A (en) 2023-10-24 2023-10-24 Pedestrian tracking method and system for video monitoring and computer readable medium

Country Status (1)

Country Link
CN (1) CN117392176A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination