CN116030412A - Escalator monitoring video anomaly detection method and system - Google Patents

Escalator monitoring video anomaly detection method and system

Info

Publication number: CN116030412A
Application number: CN202211703754.4A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: frame, key point, model, YOLOv5, training
Inventors: 童勤峰, 钟毅, 卓荣荣, 杨建党, 蒋俊涛, 刘勇
Applicant/Assignee: Zhejiang University (ZJU); Ningbo Hongda Elevator Co., Ltd.
Legal status: Pending

Classifications

  • Y02B50/00: Energy efficient technologies in elevators, escalators and moving walkways, e.g. energy saving or recuperation technologies (Y02B: climate change mitigation technologies related to buildings)


Abstract

The invention discloses an escalator monitoring video anomaly detection method and system, relating to the field of escalator detection. A trained YOLOv5 target detection model and a trained HR-Net key point extraction model are combined to obtain the key point heat map corresponding to each frame picture in a data set, and the key point inter-frame change map corresponding to each key point heat map is obtained; a convolutional neural network is then trained on image label pairs, each containing a key point inter-frame change map and its corresponding label, to obtain a convolutional neural network model. When detection of an escalator monitoring video to be examined starts, its picture set is input frame by frame into the YOLOv5 target detection model to obtain the pedestrian target frame positions corresponding to each frame picture; the HR-Net key point extraction model predicts the key point heat map corresponding to each pedestrian target frame position; the key point inter-frame change maps are input into the convolutional neural network model, which predicts the corresponding labels; and the behavior state on the escalator is obtained from the labels, thereby realizing intelligent escalator monitoring video anomaly detection.

Description

Escalator monitoring video anomaly detection method and system
Technical Field
The invention relates to the field of escalator detection, in particular to an escalator monitoring video anomaly detection method and system.
Background
With the advance of urban construction and the improvement of people's material standard of living, urban infrastructure is continuously being perfected, and people pay ever more attention to the safety of these facilities. The escalator is widely used in public places such as subway stations and shopping malls. However, safety accidents related to escalators are also increasing, due to equipment problems, improper user behavior, and the like. According to news reports, behaviors such as moving against the direction of travel, running, falling, or carrying a baby carriage or large pieces of luggage on an escalator easily cause safety accidents in which passengers are injured. Monitoring and raising alarms on such abnormal behavior in a timely, accurate and efficient manner helps to respond quickly to accidents, avoid casualties and improve the level of emergency handling.
To raise the level of anomaly detection, researchers have carried out a number of studies and explorations. Shao Haibo proposed an escalator safety monitoring system (authorized bulletin number CN205257749U) comprising a first camera group for capturing an overall image of the escalator and the passengers on it, a second camera group for capturing the mechanical parts of the escalator, and a data processing device for image analysis and processing. Liu Zhuo et al. proposed a safety monitoring device for an escalator (authorized bulletin number CN204310668U); the monitoring device is relatively separated from the control system, so it can be conveniently and flexibly applied to different control systems and facilitates modular design of the control system. However, these inventions lack an automatic algorithm design (or an intelligent detection design) and remain focused only on information acquisition; they do not involve the joint application of the YOLOv5 model, the HR-Net model and a convolutional neural network model to escalator monitoring video anomaly detection. The present invention realizes intelligent detection of escalator monitoring video anomalies through this joint application, and at the same time greatly improves detection accuracy.
Disclosure of Invention
In order to realize intelligent detection of escalator monitoring video anomalies and improve the accuracy of escalator monitoring video anomaly detection, the invention provides an escalator monitoring video anomaly detection method combining a YOLOv5 model, an HR-Net model and a convolutional neural network model, which comprises the following steps:
acquiring a monitoring video of an escalator, converting the monitoring video into a picture set with continuous time points, judging whether a current frame is a normal frame or not frame by frame, if so, marking the current frame as a positive sample, and if not, marking the current frame as a negative sample, so as to obtain a model data set containing the positive sample and the negative sample; the positive sample is specifically a picture marked with a normal label, and the negative sample is specifically a picture marked with an abnormal label;
respectively training a YOLOv5 model and an HR-Net model through a large-scale data set to obtain a corresponding YOLOv5 target detection model and an HR-Net key point extraction model;
inputting pictures in the model data set into a YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to the pictures of each frame; predicting a keypoint heat map corresponding to each pedestrian target frame position through an HR-Net keypoint extraction model;
for each frame's key point heat map, acquiring the corresponding forward difference map and backward difference map, and obtaining the key point inter-frame change map corresponding to the current frame's key point heat map by taking the union (a pixel-wise OR operation) of the forward difference map and the backward difference map;
acquiring image label pairs corresponding to each key point inter-frame change graph, and training a convolutional neural network through the image label pairs to obtain a convolutional neural network model; the image tag pair comprises a key point inter-frame change graph and a tag corresponding to the key point inter-frame change graph;
inputting the escalator monitoring video image set to be detected into a YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to all frame images, predicting key point heat maps corresponding to all pedestrian target frame positions through an HR-Net key point extraction model, obtaining key point inter-frame change maps corresponding to all frame key point heat maps, and inputting the key point inter-frame change maps into a convolutional neural network model to predict to obtain corresponding labels.
Further, predicting the key point heat map corresponding to each pedestrian target frame position through the HR-Net key point extraction model specifically comprises the following steps: sequentially inputting the pedestrian target frame positions into the HR-Net key point extraction model to obtain, through the HR-Net key point extraction model, a high-resolution feature map comprising human key points and the confidences of the human key point rectangular bounding boxes; estimating the human body posture for those human key point rectangular bounding boxes in the high-resolution feature map whose confidence is higher than the set threshold, to obtain the pixel coordinates of the human key points and their prediction confidences, thereby obtaining the key point heat map corresponding to each pedestrian target frame position.
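The confidence-threshold filtering step above can be sketched as follows. This is a minimal illustration only: the detection dictionaries and the 0.5 threshold are assumptions for the example, since the patent does not specify the data structures or the threshold value.

```python
def filter_by_confidence(bounding_boxes, threshold=0.5):
    """Keep only the key point bounding boxes whose confidence is higher
    than the set threshold; only these are passed on to pose estimation."""
    return [box for box in bounding_boxes if box["confidence"] > threshold]

# Toy detections: (x, y, w, h) boxes with prediction confidences.
detections = [
    {"box": (10, 20, 50, 120), "confidence": 0.91},  # kept
    {"box": (200, 30, 40, 90), "confidence": 0.32},  # discarded
]
kept = filter_by_confidence(detections, threshold=0.5)
```

Boxes that survive the filter would then go through human body posture estimation to yield pixel coordinates and prediction confidences for each key point.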
Further, the large-scale dataset is the MS COCO dataset; training the YOLOv5 model through the large-scale dataset specifically comprises the following steps:
training the YOLOv5 model through the MS COCO dataset, training the regression branch of the YOLOv5 model with the loss function L_CIoU during training, and training the target (objectness) and class branches of the YOLOv5 model with the BCE loss function.
Further, the formula expression of the loss function L_CIoU is:

L_CIoU = 1 - IoU + ρ²(b, b^gt) / c² + αv

wherein:

IoU = Intersection(A, B) / Union(A, B)

v = (4 / π²) · (arctan(w^gt / h^gt) - arctan(w / h))²

wherein Intersection(A, B) represents the intersection area of the predicted frame A of the YOLOv5 model and the target frame B, and Union(A, B) represents the union area of the predicted frame A of the YOLOv5 model and the target frame B; b and b^gt respectively represent the center points of the predicted frame and the real frame, ρ²(b, b^gt) is the squared Euclidean distance between the center points of the predicted frame and the real frame, c is the diagonal distance of the minimum closure area that can simultaneously contain the predicted frame and the real frame, and α is the loss weight; v is the aspect-ratio consistency term; w^gt is the width of the real frame, h^gt is the height of the real frame, w is the width of the predicted frame, and h is the height of the predicted frame;
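As a concreteness check, the L_CIoU expression can be sketched in plain Python. The (x1, y1, x2, y2) box representation and the weighting α = v / ((1 - IoU) + v) follow the standard CIoU formulation and are assumptions here, since the patent text only calls α "the loss weight".

```python
import math

def ciou_loss(pred, gt):
    """CIoU loss for axis-aligned boxes given as (x1, y1, x2, y2):
    L_CIoU = 1 - IoU + rho^2(b, b_gt)/c^2 + alpha*v."""
    # intersection and union areas
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter)

    # squared Euclidean distance between box centers: rho^2(b, b_gt)
    bx, by = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (bx - gx) ** 2 + (by - gy) ** 2

    # squared diagonal of the minimum closure area containing both boxes: c^2
    cx1, cy1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    cx2, cy2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2

    # aspect-ratio consistency term v and its weight alpha
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0

    return 1 - iou + rho2 / c2 + alpha * v
```

A perfectly matching prediction gives a loss of zero; the loss grows as the boxes drift apart in overlap, center distance, or aspect ratio.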
the formula expression of the BCE loss function is as follows:

BCELoss = -[ŷ · log(p) + (1 - ŷ) · log(1 - p)]

wherein p is the probability, predicted by the YOLOv5 model, that the sample is a positive sample; ŷ is the label of the sample, taking the value 1 when the sample belongs to the positive samples and 0 otherwise; and BCELoss is the loss value.
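A plain-Python sketch of the per-sample BCE computation above; the small epsilon clamp is an implementation detail added here to avoid log(0), not part of the patent's formula.

```python
import math

def bce_loss(p, y_hat):
    """Binary cross-entropy for one sample:
    BCELoss = -[y_hat*log(p) + (1 - y_hat)*log(1 - p)],
    where p is the predicted positive-sample probability and y_hat is 1
    for a positive sample and 0 otherwise."""
    eps = 1e-12  # guard against log(0)
    p = min(max(p, eps), 1 - eps)
    return -(y_hat * math.log(p) + (1 - y_hat) * math.log(1 - p))
```

A confident correct prediction (p near 1 for a positive sample) yields a small loss; a maximally uncertain prediction (p = 0.5) yields log 2.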
Further, training the YOLOv5 model through a large-scale dataset further comprises:
for the picture data in the large-scale dataset, randomly flipping the pictures currently input into the YOLOv5 model or the HR-Net model, and applying Mosaic data augmentation, which specifically comprises the following steps: stitching any four pictures together, obtaining a new picture after stitching, and adding it to training, so as to expand the dataset.
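The four-picture stitching can be sketched as below. Note that the full Mosaic augmentation used in YOLOv5 also applies random scaling and cropping around a random center point; this minimal sketch, which operates on nested-list "images" of equal size, shows only the 2x2 stitching itself.

```python
def mosaic_stitch(a, b, c, d):
    """Stitch four H x W images into one 2H x 2W image:
    a | b
    -----
    c | d
    """
    top = [ra + rb for ra, rb in zip(a, b)]       # rows of a and b side by side
    bottom = [rc + rd for rc, rd in zip(c, d)]    # rows of c and d side by side
    return top + bottom

img = [[1, 2], [3, 4]]                   # a 2x2 toy grayscale image
big = mosaic_stitch(img, img, img, img)  # the 4x4 stitched training picture
```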
Further, the acquisition formula of the backward difference map is: BDI_k = |H_{k-1} - H_k|;
the acquisition formula of the forward difference map is: FDI_k = |H_k - H_{k+1}|;
wherein the value range of k is (1, n-2); n represents the total number of frames of key point heat maps, H_k represents the k-th frame's key point heat map, H_{k-1} represents the (k-1)-th frame's key point heat map, and H_{k+1} represents the (k+1)-th frame's key point heat map; BDI_k represents the backward difference map corresponding to the k-th frame's key point heat map, and FDI_k represents the forward difference map corresponding to the k-th frame's key point heat map;
the acquisition formula of the key point inter-frame change map is: CDI_k = BDI_k ∪ FDI_k, wherein CDI_k represents the key point inter-frame change map and ∪ denotes the pixel-wise union (OR) operation.
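The three formulas above can be sketched in plain Python on nested-list heat maps. Reading the ∪ operator as a pixel-wise maximum (which reduces to logical OR on binary maps) is an assumption here, since the patent does not define the union of two real-valued maps.

```python
def abs_diff(h_a, h_b):
    """Element-wise |h_a - h_b| of two equally sized heat maps."""
    return [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(h_a, h_b)]

def change_map(h_prev, h_cur, h_next):
    """CDI_k = BDI_k U FDI_k for frame k, given frames k-1, k and k+1."""
    bdi = abs_diff(h_prev, h_cur)   # BDI_k = |H_{k-1} - H_k|
    fdi = abs_diff(h_cur, h_next)   # FDI_k = |H_k - H_{k+1}|
    return [[max(x, y) for x, y in zip(rb, rf)] for rb, rf in zip(bdi, fdi)]

# Three consecutive binary key point heat maps of a 2x2 region.
h0 = [[0, 0], [0, 1]]
h1 = [[0, 1], [0, 1]]
h2 = [[1, 1], [0, 0]]
cdi = change_map(h0, h1, h2)  # fires wherever either difference map fires
```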
The invention also provides an escalator monitoring video anomaly detection system, which comprises:
the data set acquisition module is used for acquiring the monitoring video of the escalator, converting the monitoring video into a picture set with continuous time points, judging whether the current frame is a normal frame or not frame by frame, if yes, marking the current frame as a positive sample, and if not, marking the current frame as a negative sample, so as to obtain a model data set containing the positive sample and the negative sample; the positive sample is specifically a picture marked with a normal label, and the negative sample is specifically a picture marked with an abnormal label;
the first training module is used for respectively training the YOLOv5 model and the HR-Net model through a large-scale data set to obtain a corresponding YOLOv5 target detection model and an HR-Net key point extraction model;
the key point heat map acquisition module is used for inputting pictures in the model data set into the YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to the pictures of each frame; predicting a keypoint heat map corresponding to each pedestrian target frame position through an HR-Net keypoint extraction model;
the inter-frame change map acquisition module is used for acquiring the forward difference map and the backward difference map corresponding to each frame's key point heat map, and obtaining the key point inter-frame change map corresponding to the current frame's key point heat map by taking the union (a pixel-wise OR operation) of the forward difference map and the backward difference map;
the second training module is used for acquiring image label pairs corresponding to the inter-frame change graphs of the key points, and training the convolutional neural network through the image label pairs to obtain a convolutional neural network model; the image tag pair comprises a key point inter-frame change graph and a tag corresponding to the key point inter-frame change graph;
the detection module is used for inputting the escalator monitoring video image set to be detected into the YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to the images of each frame, predicting key point heat maps corresponding to the pedestrian target frame positions through the HR-Net key point extraction model, obtaining key point inter-frame change maps corresponding to the key point heat maps of each frame, and inputting the key point inter-frame change maps into the convolutional neural network model to predict to obtain corresponding labels.
Further, predicting the key point heat map corresponding to each pedestrian target frame position through the HR-Net key point extraction model specifically comprises the following steps: sequentially inputting the pedestrian target frame positions into the HR-Net key point extraction model to obtain, through the HR-Net key point extraction model, a high-resolution feature map comprising human key points and the confidences of the human key point rectangular bounding boxes; estimating the human body posture for those human key point rectangular bounding boxes in the high-resolution feature map whose confidence is higher than the set threshold, to obtain the pixel coordinates of the human key points and their prediction confidences, thereby obtaining the key point heat map corresponding to each pedestrian target frame position.
Further, the large-scale dataset is the MS COCO dataset; training the YOLOv5 model through the large-scale dataset specifically comprises the following steps: training the YOLOv5 model through the MS COCO dataset, training the regression branch of the YOLOv5 model with the loss function L_CIoU during training, and training the target (objectness) and class branches of the YOLOv5 model with the BCE loss function.
Further, training the YOLOv5 model through a large-scale dataset further comprises:
for the picture data in the large-scale dataset, randomly flipping the pictures currently input into the YOLOv5 model or the HR-Net model, and applying Mosaic data augmentation, which specifically comprises the following steps: stitching any four pictures together, obtaining a new picture after stitching, and adding it to training, so as to expand the dataset.
Compared with the prior art, the invention at least has the following beneficial effects:
(1) According to the method, a trained YOLOv5 target detection model and a trained HR-Net key point extraction model are combined to obtain the key point heat map corresponding to each frame picture in the data set, the key point inter-frame change map corresponding to each key point heat map is obtained, and a convolutional neural network is trained on image label pairs, each comprising a key point inter-frame change map and its corresponding label, to obtain a convolutional neural network model. When detection starts, the picture set of the escalator monitoring video to be detected is input frame by frame into the YOLOv5 target detection model to obtain the pedestrian target frame positions corresponding to each frame picture, the key point heat map corresponding to each pedestrian target frame position is predicted through the HR-Net key point extraction model, the key point inter-frame change maps corresponding to the key point heat maps are input into the convolutional neural network model to predict the corresponding labels, and the behavior state (abnormal or normal) on the escalator is obtained through the labels, thereby realizing intelligent detection of escalator monitoring video anomalies and at the same time greatly improving anomaly detection accuracy;
(2) Specifically, the pedestrians in each picture are located through the YOLOv5 target detection model, the human key points are extracted through the HR-Net key point extraction model, the key point heat maps are obtained jointly, and the key point inter-frame change maps are obtained by exploiting the inter-frame relationship, so as to capture abnormal behaviors and reflect inter-frame changes of the human posture, thereby greatly improving the accuracy of anomaly detection;
(3) According to the invention, the key point inter-frame change map corresponding to the current frame's key point heat map can be obtained by taking the union (OR operation) of the forward difference map and the backward difference map, so that inter-frame changes of human actions are obtained very quickly and conveniently; this facilitates the capture of abnormal actions and speeds up anomaly detection;
(4) When the method starts detection, once the key point heat maps have been obtained, detection can be realized with only a small convolutional neural network model, greatly improving detection efficiency.
Drawings
FIG. 1 is a flow chart of a method for detecting anomaly of monitoring video of an escalator;
FIG. 2 is a block diagram of an escalator surveillance video anomaly detection system;
FIG. 3 is a network structure diagram of the HR-Net model.
Detailed Description
The following are specific embodiments of the present invention, and the technical solutions of the present invention will be further described with reference to the accompanying drawings; however, the present invention is not limited to these embodiments.
Example 1
In order to realize intelligent detection of the monitoring video of the escalator, as shown in fig. 1, the invention provides a method for detecting abnormality of the monitoring video of the escalator, which comprises the following steps:
acquiring a monitoring video of an escalator through a camera or other sensing equipment, converting the monitoring video into a picture set with continuous time points, manually judging frame by frame whether the current frame is a normal frame, if so, marking the current frame as a positive sample, and if not, marking the current frame as a negative sample, so as to obtain a model dataset containing positive samples and negative samples; a positive sample is specifically a picture marked with a normal label, and a negative sample is specifically a picture marked with an abnormal label;
It should be noted that, in general, rare anomalies such as moving against the direction of travel, falls, baby carriages and large pieces of luggage are manually marked as negative samples.
respectively training a YOLOv5 model (specifically YOLOv5s, with three detection heads) and an HR-Net model through a large-scale dataset to obtain the corresponding YOLOv5 target detection model and HR-Net key point extraction model;
the large-scale data set is an MS COCO data set; splitting the MS COCO data set to obtain a training set and a verification set, wherein the YOLOv5 model is trained through the large-scale data set (before training, the images in the data set are required to be unified into a designated size), and the method specifically comprises the following steps:
training the YOLOv5 model through the training set, training the regression branch of the YOLOv5 model with the loss function L_CIoU during the training process, training the target (objectness) and class branches of the YOLOv5 model through the BCE loss function, verifying the trained model through the verification set, and taking the model that performs best on the verification set across all rounds as the YOLOv5 target detection model.
The formula expression of the loss function L_CIoU is:

L_CIoU = 1 - IoU + ρ²(b, b^gt) / c² + αv

wherein:

IoU = Intersection(A, B) / Union(A, B)

v = (4 / π²) · (arctan(w^gt / h^gt) - arctan(w / h))²

wherein Intersection(A, B) represents the intersection area of the predicted frame A of the YOLOv5 model and the target frame B, and Union(A, B) represents the union area of the predicted frame A of the YOLOv5 model and the target frame B; b and b^gt respectively represent the center points of the predicted frame and the real frame, ρ²(b, b^gt) is the squared Euclidean distance between the center points of the predicted frame and the real frame, c is the diagonal distance of the minimum closure area that can simultaneously contain the predicted frame and the real frame, and α is the loss weight; v is the aspect-ratio consistency term; w^gt is the width of the real frame, h^gt is the height of the real frame, w is the width of the predicted frame, and h is the height of the predicted frame;
the formula expression of the BCE loss function is as follows:

BCELoss = -[ŷ · log(p) + (1 - ŷ) · log(1 - p)]

wherein p is the probability, predicted by the YOLOv5 model, that the sample is a positive sample; ŷ is the label of the sample, taking the value 1 when the sample belongs to the positive samples and 0 otherwise; and BCELoss is the loss value.
It should be noted that the loss function used for training the HR-Net model is:

MSELoss = (1/m) · Σ_{i=1}^{m} (ŷ_i - y_i)²

wherein ŷ_i is the true pixel coordinate value of the sample, y_i is the predicted pixel coordinate value of the sample, m is the total number of pixels of the sample, and MSELoss is the loss value.
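A one-function sketch of the MSE loss above, operating on flat lists of pixel coordinate values (the flat-list representation is an assumption made for this illustration):

```python
def mse_loss(y_true, y_pred):
    """MSELoss = (1/m) * sum_i (y_true_i - y_pred_i)^2 over the m pixel
    coordinate values of a sample."""
    m = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / m
```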
When training the YOLOv5 model through a large-scale dataset, the method further comprises:
for the picture data in the large-scale dataset, randomly flipping the pictures currently input into the YOLOv5 model or the HR-Net model, and applying Mosaic data augmentation, which specifically comprises the following steps: stitching any four pictures together, obtaining a new picture after stitching, and adding it to training, so as to expand the dataset.
Inputting pictures in the model data set into a YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to the pictures of each frame; predicting a keypoint heat map corresponding to each pedestrian target frame position through an HR-Net keypoint extraction model;
the method for predicting the keypoint heat map corresponding to the target frame position of each pedestrian through the HR-Net keypoint extraction model specifically comprises the following steps: and sequentially inputting the positions of the target frames of the pedestrians into an HR-Net key point extraction model to obtain a high-resolution feature map comprising human key points and human key point rectangular bounding boxes confidence degrees through the HR-Net key point extraction model, and estimating the human body gestures of the human key point rectangular bounding boxes with the confidence degrees higher than a set threshold in the high-resolution feature map to obtain pixel coordinates of the human key points and the prediction confidence degrees thereof, so as to obtain a key point heat map corresponding to the positions of the target frames of the pedestrians.
It should be explained in detail that, as shown in fig. 3, the HR-Net key point extraction model starts from a high-resolution sub-network, gradually adds sub-networks from high to low resolution one by one, and connects the multi-resolution sub-networks in parallel; throughout the whole process, information is repeatedly exchanged across the parallel multi-resolution sub-networks, completing the repeated multi-scale fusion process, so that a high-resolution feature map is obtained, loss of high-resolution information is avoided, and the predicted key point heat map is more accurate.
For each frame's key point heat map, the corresponding forward difference map and backward difference map are acquired, and the key point inter-frame change map corresponding to the current frame's key point heat map is obtained by taking the union (a pixel-wise OR operation) of the forward difference map and the backward difference map;
By taking the union (OR operation) of the forward difference map and the backward difference map, the invention can obtain the key point inter-frame change map corresponding to the current frame's key point heat map, thereby obtaining the inter-frame changes of human actions very quickly and conveniently; this facilitates the capture of abnormal actions while speeding up anomaly detection.
The acquisition formula of the backward difference map is: BDI_k = |H_{k-1} - H_k|;
the acquisition formula of the forward difference map is: FDI_k = |H_k - H_{k+1}|;
wherein n represents the total number of frames of key point heat maps, H_k represents the k-th frame's key point heat map, H_{k-1} represents the (k-1)-th frame's key point heat map, and H_{k+1} represents the (k+1)-th frame's key point heat map; BDI_k represents the backward difference map corresponding to the k-th frame's key point heat map, and FDI_k represents the forward difference map corresponding to the k-th frame's key point heat map. Because the first frame's key point heat map has no backward difference map and the last frame's has no forward difference map, the value range of k is (1, n-2);
the acquisition formula of the key point inter-frame change map is: CDI_k = BDI_k ∪ FDI_k, wherein CDI_k represents the key point inter-frame change map corresponding to the k-th frame's key point heat map and ∪ denotes the pixel-wise union (OR) operation.
Image label pairs (CDI_k, Y_k), with Y_k ∈ {+1, -1}, corresponding to the key point inter-frame change maps are acquired, and a convolutional neural network (in this embodiment, ResNet-10) is trained through the image label pairs to obtain the convolutional neural network model; each image label pair comprises a key point inter-frame change map and the label corresponding to that key point inter-frame change map, wherein Y_k represents the label corresponding to the key point inter-frame change map.
Escalator monitoring video is acquired in real time, and the picture set corresponding to the escalator monitoring video to be detected (i.e., acquired in real time) is input frame by frame into the YOLOv5 target detection model to obtain the pedestrian target frame positions corresponding to each frame picture; the key point heat map corresponding to each pedestrian target frame position is predicted through the HR-Net key point extraction model, the key point inter-frame change maps corresponding to each frame's key point heat map are acquired, and the key point inter-frame change maps are input into the convolutional neural network model to predict the corresponding labels, thereby realizing real-time detection of escalator monitoring video anomalies.
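The real-time detection flow described above can be sketched end to end with stub models. Every function below is a hypothetical stand-in (the patent does not specify model interfaces), and the heat maps are collapsed to scalars purely for brevity. The sketch makes one structural point explicit: because CDI_k needs frames k-1, k and k+1, the label for frame k can only be emitted once frame k+1 has arrived.

```python
def detect_pedestrians(frame):             # stand-in for YOLOv5
    return [(0, 0, 4, 4)]                  # one dummy pedestrian target frame

def keypoint_heatmap(frame, boxes):        # stand-in for HR-Net
    return sum(sum(row) for row in frame)  # scalar "heat map" for brevity

def classify(cdi):                         # stand-in for the trained CNN
    return -1 if cdi > 0 else +1           # -1 abnormal, +1 normal

def detect_stream(frames):
    """Run the full pipeline over a frame sequence; yields one label per
    interior frame (first and last frames have no inter-frame change map)."""
    heats = [keypoint_heatmap(f, detect_pedestrians(f)) for f in frames]
    labels = []
    for k in range(1, len(heats) - 1):
        bdi = abs(heats[k - 1] - heats[k])      # BDI_k
        fdi = abs(heats[k] - heats[k + 1])      # FDI_k
        labels.append(classify(max(bdi, fdi)))  # union -> max for scalars
    return labels

frames = [[[0, 0]], [[0, 0]], [[0, 1]]]  # motion appears in the last frame
labels = detect_stream(frames)
```

In a real deployment the stubs would be replaced by the trained YOLOv5, HR-Net and ResNet-10 models, with a rolling three-frame buffer over the live video stream.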
Once detection has started and the key point heat maps have been obtained, the invention can realize detection with only a small convolutional neural network model, thereby greatly improving detection efficiency.
According to the method, a trained YOLOv5 target detection model and a trained HR-Net key point extraction model are combined to obtain the key point heat map corresponding to each frame picture in the data set, the key point inter-frame change map corresponding to each key point heat map is obtained, and a convolutional neural network is trained on image label pairs, each comprising a key point inter-frame change map and its corresponding label, to obtain a convolutional neural network model. When detection starts, the picture set of the escalator monitoring video to be detected is input frame by frame into the YOLOv5 target detection model to obtain the pedestrian target frame positions corresponding to each frame picture, the key point heat map corresponding to each pedestrian target frame position is predicted through the HR-Net key point extraction model, the key point inter-frame change maps corresponding to the key point heat maps are input into the convolutional neural network model to predict the corresponding labels, and the behavior state (abnormal or normal) on the escalator is obtained through the labels.
Example two
As shown in fig. 2, the invention further provides a system for detecting the abnormality of the monitoring video of the escalator, which comprises:
the data set acquisition module is used for acquiring the monitoring video of the escalator, converting the monitoring video into a picture set with continuous time points, judging whether the current frame is a normal frame or not frame by frame, if yes, marking the current frame as a positive sample, and if not, marking the current frame as a negative sample, so as to obtain a model data set containing the positive sample and the negative sample; the positive sample is specifically a picture marked with a normal label, and the negative sample is specifically a picture marked with an abnormal label;
the first training module is used for respectively training the YOLOv5 model and the HR-Net model through a large-scale data set to obtain a corresponding YOLOv5 target detection model and an HR-Net key point extraction model;
the large-scale data set is the MS COCO data set; the training of the YOLOv5 model through the large-scale data set specifically comprises the following steps: training the YOLOv5 model through the MS COCO data set, and during training, training the regression branch of the YOLOv5 model through the CIoU loss function L_CIoU = 1 − IoU + ρ²(b, b^gt)/c² + αv, and training the target and class branches of the YOLOv5 model through the BCE loss function.
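A minimal plain-Python sketch of the two loss functions named above, for a single box pair and a single sample. The function names and the (x1, y1, x2, y2) box convention are illustrative assumptions; the text describes α only as "the loss weight", so it is exposed as a parameter here (YOLOv5 itself uses α = v / ((1 − IoU) + v)).

```python
import math

def ciou_loss(box_a, box_b, alpha=1.0):
    """CIoU loss between a prediction box and a target box, each (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union areas.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Squared center distance over squared diagonal of the minimum enclosing box.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency term v.
    v = (4 / math.pi ** 2) * (math.atan((bx2 - bx1) / (by2 - by1))
                              - math.atan((ax2 - ax1) / (ay2 - ay1))) ** 2
    return 1 - iou + rho2 / c2 + alpha * v

def bce_loss(p, y):
    """Binary cross-entropy for one sample: label y in {0, 1}, prediction p in (0, 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))
```

For identical boxes the CIoU loss is exactly 0 (IoU = 1, zero center distance, equal aspect ratios), which is a convenient sanity check.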
When training the YOLOv5 model through a large-scale data set, the method further comprises:
for the picture data in the large-scale data set, randomly flipping the picture currently input into the YOLOv5 model or the HR-Net model, and applying Mosaic data augmentation, which specifically comprises: splicing any four pictures into a new picture, and adding the new picture to training to expand the data set.
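A minimal sketch of the four-picture splicing step of Mosaic augmentation. The function name, output size, and crop-to-quadrant strategy are illustrative assumptions; a real pipeline (e.g. YOLOv5's) would also remap each picture's bounding-box labels into the new canvas coordinates.

```python
import numpy as np

def mosaic_splice(imgs, out_h=640, out_w=640):
    """Splice four pictures into one Mosaic picture, one per quadrant."""
    assert len(imgs) == 4, "Mosaic augmentation splices exactly four pictures"
    h, w = out_h // 2, out_w // 2
    canvas = np.zeros((out_h, out_w, 3), dtype=np.uint8)
    anchors = [(0, 0), (0, w), (h, 0), (h, w)]  # top-left corner of each quadrant
    for img, (y, x) in zip(imgs, anchors):
        patch = img[:h, :w]                      # crop each source to quadrant size
        ph, pw = patch.shape[:2]
        canvas[y:y + ph, x:x + pw] = patch       # smaller sources are zero-padded
    return canvas
```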
The key point heat map acquisition module is used for inputting pictures in the model data set into the YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to the pictures of each frame; predicting a keypoint heat map corresponding to each pedestrian target frame position through an HR-Net keypoint extraction model;
the method for predicting the keypoint heat map corresponding to the target frame position of each pedestrian through the HR-Net keypoint extraction model specifically comprises the following steps: and sequentially inputting the positions of the target frames of the pedestrians into an HR-Net key point extraction model to obtain a high-resolution feature map comprising human key points and human key point rectangular bounding boxes confidence degrees through the HR-Net key point extraction model, and estimating the human body gestures of the human key point rectangular bounding boxes with the confidence degrees higher than a set threshold in the high-resolution feature map to obtain pixel coordinates of the human key points and the prediction confidence degrees thereof, so as to obtain a key point heat map corresponding to the positions of the target frames of the pedestrians.
The inter-frame change map acquisition module is used for acquiring a forward difference map and a backward difference map corresponding to each frame key point heat map, and obtaining the key point inter-frame change map corresponding to the current frame key point heat map by applying an OR (union) operation to the forward difference map and the backward difference map;
the second training module is used for acquiring image label pairs corresponding to the inter-frame change graphs of the key points, and training the convolutional neural network through the image label pairs to obtain a convolutional neural network model; the image tag pair comprises a key point inter-frame change graph and a tag corresponding to the key point inter-frame change graph;
the detection module is used for inputting the escalator monitoring video image set to be detected into the YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to the images of each frame, predicting key point heat maps corresponding to the pedestrian target frame positions through the HR-Net key point extraction model, obtaining key point inter-frame change maps corresponding to the key point heat maps of each frame, and inputting the key point inter-frame change maps into the convolutional neural network model to predict to obtain corresponding labels.
Specifically, pedestrians in the picture are located through the YOLOv5 target detection model, human key points are extracted through the HR-Net key point extraction model, the two are combined to obtain the key point heat map, and the inter-frame relation is used to obtain the key point inter-frame change map, so that abnormal behaviors are captured, inter-frame changes of the human posture are reflected, and the accuracy of anomaly detection is greatly improved.
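The inter-frame change step described above can be sketched as follows. This is a sketch assuming grayscale heat maps, where the OR (union) of two difference maps is taken as the element-wise maximum; a binarise-then-bitwise-OR variant would also match the text.

```python
import numpy as np

def change_maps(heatmaps):
    """Key point inter-frame change maps from a sequence of per-frame heat maps."""
    cdis = []
    for k in range(1, len(heatmaps) - 1):
        bdi = np.abs(heatmaps[k - 1] - heatmaps[k])  # difference with the previous frame
        fdi = np.abs(heatmaps[k] - heatmaps[k + 1])  # difference with the next frame
        cdis.append(np.maximum(bdi, fdi))            # CDI_k = BDI_k "OR" FDI_k
    return cdis
```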
Example three
The invention further provides an escalator monitoring video anomaly detection device, which comprises a memory and a processor; the memory is used for storing a computer program; the processor is used for implementing the escalator monitoring video anomaly detection method described above when executing the computer program.
It should be noted that all directional indications (such as up, down, left, right, front and rear) in the embodiments of the present invention are merely used to explain the relative positional relationship, movement and the like between the components in a particular posture (as shown in the drawings); if the particular posture changes, the directional indication changes accordingly.
Furthermore, descriptions such as those referred to herein as "first," "second," "a," and the like are provided for descriptive purposes only and are not to be construed as indicating or implying a relative importance or an implicit indication of the number of features being indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
In the present invention, unless otherwise expressly specified and limited, the terms "connected", "fixed" and the like are to be construed broadly; for example, "fixed" may be a fixed connection, a removable connection, or an integral body; may be a mechanical connection or an electrical connection; may be a direct connection or an indirect connection through an intermediate medium; and may be an internal communication between two elements or an interaction relationship between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In addition, the technical solutions of the embodiments of the present invention may be combined with each other, but it is necessary to be based on the fact that those skilled in the art can implement the technical solutions, and when the technical solutions are contradictory or cannot be implemented, the combination of the technical solutions should be considered as not existing, and not falling within the scope of protection claimed by the present invention.

Claims (10)

1. An escalator monitoring video anomaly detection method, characterized by comprising the following steps:
acquiring a monitoring video of an escalator, converting the monitoring video into a picture set with continuous time points, judging whether a current frame is a normal frame or not frame by frame, if so, marking the current frame as a positive sample, and if not, marking the current frame as a negative sample, so as to obtain a model data set containing the positive sample and the negative sample; the positive sample is specifically a picture marked with a normal label, and the negative sample is specifically a picture marked with an abnormal label;
respectively training a YOLOv5 model and an HR-Net model through a large-scale data set to obtain a corresponding YOLOv5 target detection model and an HR-Net key point extraction model;
inputting pictures in the model data set into a YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to the pictures of each frame; predicting a keypoint heat map corresponding to each pedestrian target frame position through an HR-Net keypoint extraction model;
for each frame of key point heat map, obtaining a corresponding forward difference map and a corresponding backward difference map, and obtaining the key point inter-frame change map corresponding to the current frame key point heat map by applying an OR (union) operation to the forward difference map and the backward difference map;
acquiring image label pairs corresponding to each key point inter-frame change graph, and training a convolutional neural network through the image label pairs to obtain a convolutional neural network model; the image tag pair comprises a key point inter-frame change graph and a tag corresponding to the key point inter-frame change graph;
inputting the escalator monitoring video image set to be detected into a YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to all frame images, predicting key point heat maps corresponding to all pedestrian target frame positions through an HR-Net key point extraction model, obtaining key point inter-frame change maps corresponding to all frame key point heat maps, and inputting the key point inter-frame change maps into a convolutional neural network model to predict to obtain corresponding labels.
2. The escalator monitoring video anomaly detection method according to claim 1, wherein the predicting of the key point heat map corresponding to each pedestrian target frame position through the HR-Net key point extraction model specifically comprises: sequentially inputting each pedestrian target frame position into the HR-Net key point extraction model to obtain a high-resolution feature map comprising human key points and the confidence degrees of human key point rectangular bounding boxes; and performing human body pose estimation on the human key point rectangular bounding boxes whose confidence degree is higher than a set threshold in the high-resolution feature map to obtain the pixel coordinates of the human key points and their prediction confidence degrees, thereby obtaining the key point heat map corresponding to each pedestrian target frame position.
3. The escalator surveillance video anomaly detection method of claim 1, wherein the large-scale dataset is an MS COCO dataset; the training of the YOLOv5 model by a large-scale data set specifically comprises the following steps:
training the YOLOv5 model through the MS COCO data set, and during training, training the regression branch of the YOLOv5 model through the CIoU loss function L_CIoU, and training the target and class branches of the YOLOv5 model through the BCE loss function.
4. The method for detecting the abnormal condition of the monitoring video of the escalator according to claim 3, wherein,
the loss function L_CIoU has the formula expression:
L_CIoU = 1 − IoU + ρ²(b, b^gt)/c² + αv;
wherein:
IoU = Intersection(A, B)/Union(A, B);
v = (4/π²)·(arctan(w^gt/h^gt) − arctan(w/h))²;
wherein Intersection(A, B) represents the intersection area of the prediction frame A and the target frame B of the YOLOv5 model, and Union(A, B) represents the union area of the prediction frame A and the target frame B of the YOLOv5 model; b and b^gt respectively represent the center points of the prediction frame and the real frame; ρ(b, b^gt) is the Euclidean distance between the center points of the prediction frame and the real frame; c is the diagonal distance of the minimum closure area that can simultaneously contain the prediction frame and the real frame; α is the loss weight; L_CIoU represents the loss value; w^gt is the width of the real frame, h^gt is the height of the real frame, w is the width of the prediction frame, and h is the height of the prediction frame;
the formula expression of the BCE loss function is as follows:
BCELoss = −(ŷ·log p + (1 − ŷ)·log(1 − p));
wherein p is the probability that the YOLOv5 model predicts the sample to be a positive sample; ŷ is the label of the sample, which takes the value 1 when the sample belongs to a positive sample and 0 otherwise; and BCELoss is the loss value.
5. The escalator surveillance video anomaly detection method of claim 4, further comprising, when training the YOLOv5 model with a large-scale dataset:
for the picture data in the large-scale data set, randomly flipping the picture currently input into the YOLOv5 model or the HR-Net model, and applying Mosaic data augmentation, which specifically comprises: splicing any four pictures into a new picture, and adding the new picture to training to expand the data set.
6. The escalator monitoring video anomaly detection method according to claim 5, wherein
the backward difference map is obtained as: BDI_k = |H_{k-1} − H_k|;
the forward difference map is obtained as: FDI_k = |H_k − H_{k+1}|;
wherein the value of k ranges from 1 to n−2; n represents the total number of frames of key point heat maps, H_k represents the k-th frame key point heat map, H_{k-1} represents the (k−1)-th frame key point heat map, H_{k+1} represents the (k+1)-th frame key point heat map, BDI_k represents the backward difference map corresponding to the k-th frame key point heat map, and FDI_k represents the forward difference map corresponding to the k-th frame key point heat map;
the key point inter-frame change map is obtained as: CDI_k = BDI_k ∪ FDI_k, wherein ∪ denotes the element-wise OR (union) operation and CDI_k represents the key point inter-frame change map corresponding to the k-th frame key point heat map.
7. An escalator surveillance video anomaly detection system, comprising:
the data set acquisition module is used for acquiring the monitoring video of the escalator, converting the monitoring video into a picture set with continuous time points, judging whether the current frame is a normal frame or not frame by frame, if yes, marking the current frame as a positive sample, and if not, marking the current frame as a negative sample, so as to obtain a model data set containing the positive sample and the negative sample; the positive sample is specifically a picture marked with a normal label, and the negative sample is specifically a picture marked with an abnormal label;
the first training module is used for respectively training the YOLOv5 model and the HR-Net model through a large-scale data set to obtain a corresponding YOLOv5 target detection model and an HR-Net key point extraction model;
the key point heat map acquisition module is used for inputting pictures in the model data set into the YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to the pictures of each frame; predicting a keypoint heat map corresponding to each pedestrian target frame position through an HR-Net keypoint extraction model;
the inter-frame change map acquisition module is used for acquiring a forward difference map and a backward difference map corresponding to each frame key point heat map, and obtaining the key point inter-frame change map corresponding to the current frame key point heat map by applying an OR (union) operation to the forward difference map and the backward difference map;
the second training module is used for acquiring image label pairs corresponding to the inter-frame change graphs of the key points, and training the convolutional neural network through the image label pairs to obtain a convolutional neural network model; the image tag pair comprises a key point inter-frame change graph and a tag corresponding to the key point inter-frame change graph;
the detection module is used for inputting the escalator monitoring video image set to be detected into the YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to the images of each frame, predicting key point heat maps corresponding to the pedestrian target frame positions through the HR-Net key point extraction model, obtaining key point inter-frame change maps corresponding to the key point heat maps of each frame, and inputting the key point inter-frame change maps into the convolutional neural network model to predict to obtain corresponding labels.
8. The escalator surveillance video anomaly detection system according to claim 7, wherein the predicting of the key point heat map corresponding to each pedestrian target frame position through the HR-Net key point extraction model specifically comprises: sequentially inputting each pedestrian target frame position into the HR-Net key point extraction model to obtain a high-resolution feature map comprising human key points and the confidence degrees of human key point rectangular bounding boxes; and performing human body pose estimation on the human key point rectangular bounding boxes whose confidence degree is higher than a set threshold in the high-resolution feature map to obtain the pixel coordinates of the human key points and their prediction confidence degrees, thereby obtaining the key point heat map corresponding to each pedestrian target frame position.
9. The escalator surveillance video anomaly detection system of claim 8, wherein the large-scale data set is the MS COCO data set; the training of the YOLOv5 model through the large-scale data set specifically comprises the following steps: training the YOLOv5 model through the MS COCO data set, and during training, training the regression branch of the YOLOv5 model through the CIoU loss function L_CIoU, and training the target and class branches of the YOLOv5 model through the BCE loss function.
10. The escalator surveillance video anomaly detection system of claim 9, further comprising, when training the YOLOv5 model with a large-scale dataset:
for the picture data in the large-scale data set, randomly flipping the picture currently input into the YOLOv5 model or the HR-Net model, and applying Mosaic data augmentation, which specifically comprises: splicing any four pictures into a new picture, and adding the new picture to training to expand the data set.
CN202211703754.4A 2022-12-29 2022-12-29 Escalator monitoring video anomaly detection method and system Pending CN116030412A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211703754.4A CN116030412A (en) 2022-12-29 2022-12-29 Escalator monitoring video anomaly detection method and system


Publications (1)

Publication Number Publication Date
CN116030412A true CN116030412A (en) 2023-04-28

Family

ID=86073410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211703754.4A Pending CN116030412A (en) 2022-12-29 2022-12-29 Escalator monitoring video anomaly detection method and system

Country Status (1)

Country Link
CN (1) CN116030412A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117292329A (en) * 2023-11-24 2023-12-26 烟台大学 Method, system, medium and equipment for monitoring abnormal work of building robot
CN117292329B (en) * 2023-11-24 2024-03-08 烟台大学 Method, system, medium and equipment for monitoring abnormal work of building robot


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination