CN116030412A - Escalator monitoring video anomaly detection method and system - Google Patents
- Publication number: CN116030412A
- Application number: CN202211703754.4A
- Authority
- CN
- China
- Prior art keywords
- frame
- key point
- model
- yolov5
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02B—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
- Y02B50/00—Energy efficient technologies in elevators, escalators and moving walkways, e.g. energy saving or recuperation technologies
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses an escalator monitoring video anomaly detection method and system, relating to the field of escalator detection. A trained YOLOv5 target detection model and an HR-Net key point extraction model are combined to obtain a key point heat map for each frame picture in a data set, and a key point inter-frame change map is derived from each key point heat map. A convolutional neural network is trained on image-label pairs, each containing a key point inter-frame change map and its corresponding label, to obtain a convolutional neural network model. When detection of an escalator monitoring video begins, its picture set is input frame by frame into the YOLOv5 target detection model to obtain the pedestrian target frame positions in each frame picture; the HR-Net key point extraction model predicts the key point heat map corresponding to each pedestrian target frame position; and the resulting key point inter-frame change maps are input into the convolutional neural network model, which predicts the corresponding labels. The behavior state on the escalator is obtained from these labels, realizing intelligent anomaly detection for escalator monitoring video.
Description
Technical Field
The invention relates to the field of escalator detection, in particular to an escalator monitoring video anomaly detection method and system.
Background
With the advance of urban construction and the improvement of living standards, urban infrastructure is continuously being perfected, and people pay ever more attention to the safety of these facilities. The escalator is widely used in public places such as subway stations and shopping malls. However, safety accidents related to escalators are also increasing, due to equipment problems, improper user behavior, and the like. According to news reports, behaviors such as moving against the direction of travel, running, falling, or carrying baby carriages and large pieces of luggage on an escalator easily cause safety accidents and passenger injuries. Monitoring and alarming on abnormal escalator behavior in a timely, accurate and efficient manner helps respond quickly to accidents, avoid casualties, and raise the level of emergency handling.
To raise the level of abnormality detection, researchers have carried out a number of studies. Shao Haibo proposed an escalator safety monitoring system (authorized bulletin number CN205257749U) comprising a first camera group for capturing an overall image of the escalator and the passengers on it, a second camera group for capturing the mechanical parts of the escalator, and a data processing device for image analysis and processing. Liu Zhuo et al. proposed a safety monitoring device for an escalator (authorized bulletin number CN204310668U) in which the monitoring device is relatively separated from the control system, so that it can be conveniently and flexibly applied to different control systems and facilitates modular design of the control system. However, these inventions lack automated algorithm design (intelligent detection design) and remain focused on information acquisition; they do not involve the joint application of a YOLOv5 model, an HR-Net model and a convolutional neural network model to escalator monitoring video anomaly detection. The present invention realizes intelligent detection of escalator monitoring video anomalies through the joint application of these three models, while greatly improving detection accuracy.
Disclosure of Invention
In order to realize intelligent detection of escalator monitoring video anomalies and improve detection accuracy, the invention provides an escalator monitoring video anomaly detection method combining a YOLOv5 model, an HR-Net model and a convolutional neural network model, which comprises the following steps:
acquiring a monitoring video of an escalator and converting it into a picture set with continuous time points; judging frame by frame whether the current frame is a normal frame, marking it as a positive sample if so and as a negative sample if not, thereby obtaining a model data set containing positive samples and negative samples; a positive sample is specifically a picture marked with a normal label, and a negative sample is specifically a picture marked with an abnormal label;
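The sample-splitting step above can be sketched as follows. The function name and the label convention (+1 for normal, −1 for abnormal, matching the Y_k ∈ {+1, −1} labels used later in the description) are illustrative assumptions, not part of the patent.

```python
def build_dataset(frames, frame_labels):
    """Split manually reviewed frames into positive (normal) and negative
    (abnormal) samples, forming the model data set."""
    positives = [(f, +1) for f, lab in zip(frames, frame_labels) if lab == "normal"]
    negatives = [(f, -1) for f, lab in zip(frames, frame_labels) if lab != "normal"]
    return positives, negatives
```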
respectively training a YOLOv5 model and an HR-Net model through a large-scale data set to obtain a corresponding YOLOv5 target detection model and an HR-Net key point extraction model;
inputting pictures in the model data set into a YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to the pictures of each frame; predicting a keypoint heat map corresponding to each pedestrian target frame position through an HR-Net keypoint extraction model;
for each frame of the key point heat map, obtaining the corresponding forward difference map and backward difference map, and obtaining the key point inter-frame change map corresponding to the current frame's key point heat map by applying an OR (union) operation to the forward difference map and the backward difference map;
acquiring the image-label pair corresponding to each key point inter-frame change map, and training a convolutional neural network through the image-label pairs to obtain a convolutional neural network model; each image-label pair comprises a key point inter-frame change map and the label corresponding to that change map;
inputting the picture set of the escalator monitoring video to be detected frame by frame into the YOLOv5 target detection model to obtain the pedestrian target frame positions in each frame picture; predicting the key point heat map corresponding to each pedestrian target frame position through the HR-Net key point extraction model; obtaining the key point inter-frame change map corresponding to each frame's key point heat map; and inputting the key point inter-frame change maps into the convolutional neural network model to predict the corresponding labels.
Further, predicting the key point heat map corresponding to each pedestrian target frame position through the HR-Net key point extraction model specifically comprises the following steps: sequentially inputting the pedestrian target frame positions into the HR-Net key point extraction model to obtain a high-resolution feature map containing human key points and the confidences of the rectangular bounding boxes of those key points; performing human pose estimation on the rectangular bounding boxes in the high-resolution feature map whose confidence is higher than a set threshold, obtaining the pixel coordinates of the human key points and their prediction confidences, and thereby obtaining the key point heat map corresponding to each pedestrian target frame position.
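The confidence-threshold filtering in this step can be sketched as below. Treating each prediction as an (x, y, confidence) triple and the 0.5 default threshold are illustrative assumptions; the patent only specifies that boxes above a set threshold are kept.

```python
def select_keypoints(keypoints, threshold=0.5):
    """Keep only key point predictions whose confidence exceeds the set
    threshold, as done before human pose estimation."""
    return [(x, y, c) for (x, y, c) in keypoints if c > threshold]
```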
Further, the large-scale dataset is the MS COCO dataset; training the YOLOv5 model through a large-scale data set specifically comprises the following steps:
training the YOLOv5 model through the MS COCO data set; during training, the regression branch of the YOLOv5 model is trained through the CIoU loss function, and the target and class branches of the YOLOv5 model are trained through the BCE loss function.
wherein:
L_CIoU = 1 − IoU + ρ²(b, b^gt)/c² + αv, with IoU = Intersection(A, B)/Union(A, B) and v = (4/π²)·(arctan(w^gt/h^gt) − arctan(w/h))²;
wherein Intersection(A, B) represents the intersection area of the predicted frame A and the target frame B of the YOLOv5 model, and Union(A, B) represents the union area of the predicted frame A and the target frame B; b and b^gt respectively represent the center points of the predicted frame and the real frame; ρ²(b, b^gt) is the squared Euclidean distance between the center points of the predicted frame and the real frame; c is the diagonal length of the smallest enclosing region that can simultaneously contain the predicted frame and the real frame; α is the loss weight, α = v/((1 − IoU) + v); L_CIoU represents the loss value; w^gt is the width of the real frame, h^gt is the height of the real frame, w is the width of the predicted frame, and h is the height of the predicted frame;
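A minimal numeric sketch of this regression loss, assuming the standard CIoU formulation consistent with the quantities defined above (IoU overlap, squared center distance, enclosure diagonal, aspect-ratio term). Representing boxes as (x1, y1, x2, y2) corner coordinates is an assumption of this sketch, not the patent's specification.

```python
import math

def ciou_loss(pred, target, eps=1e-9):
    """CIoU regression loss: 1 - IoU + rho^2/c^2 + alpha*v."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    # IoU = Intersection(A, B) / Union(A, B)
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / (union + eps)
    # squared distance between the two box centers
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4
    # diagonal of the smallest enclosure containing both boxes
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio consistency term v and its weight alpha
    v = (4 / math.pi ** 2) * (math.atan((tx2 - tx1) / (ty2 - ty1))
                              - math.atan((px2 - px1) / (py2 - py1))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```

For identical boxes every term vanishes, so the loss is (numerically) zero; for disjoint boxes the center-distance term keeps pushing the prediction toward the target even though IoU is zero.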
the formula expression of the BCE loss function is as follows:
BCELoss = −[ŷ·log(p) + (1 − ŷ)·log(1 − p)]
wherein p is the probability that the YOLOv5 model predicts the sample to be a positive sample; ŷ is the sample label, taking the value 1 when the sample belongs to the positive samples and 0 otherwise; BCELoss is the loss value.
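The BCE formula above can be checked with a direct translation into code (the function name is illustrative):

```python
import math

def bce_loss(p, y_hat):
    """Binary cross-entropy: p is the predicted positive-class probability,
    y_hat is 1 for a positive sample and 0 otherwise."""
    return -(y_hat * math.log(p) + (1 - y_hat) * math.log(1 - p))
```

As expected, confident correct predictions are penalized less than confident wrong ones.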
Further, when training the YOLOv5 model by a large-scale dataset, it further comprises:
for the picture data in the large-scale data set, randomly flipping the pictures currently input to the YOLOv5 model or the HR-Net model, and applying Mosaic data augmentation, which specifically comprises: splicing any four pictures to obtain a new picture, and adding it to training so as to expand the data set.
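The random flip and four-picture splice can be sketched as below, assuming equal-sized inputs arranged in a 2×2 grid; production Mosaic implementations also rescale and re-clip bounding boxes, which is omitted here.

```python
import numpy as np

def random_hflip(img, rng):
    # randomly mirror the picture left-right (the random flip step)
    return img[:, ::-1] if rng.random() < 0.5 else img

def mosaic4(imgs):
    """Splice any four equal-sized pictures into one new 2x2 picture,
    as in Mosaic data augmentation."""
    a, b, c, d = imgs
    top = np.concatenate([a, b], axis=1)
    bottom = np.concatenate([c, d], axis=1)
    return np.concatenate([top, bottom], axis=0)
```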
Further, the formula for obtaining the forward difference map is: BDI_k = |H_{k−1} − H_k|;
the formula for obtaining the backward difference map is: FDI_k = |H_k − H_{k+1}|;
wherein k ranges from 1 to n−2 (frames being indexed from 0 to n−1); n represents the total number of frames of key point heat maps, H_k represents the k-th frame key point heat map, H_{k−1} represents the (k−1)-th frame key point heat map, and H_{k+1} represents the (k+1)-th frame key point heat map; BDI_k represents the forward difference map corresponding to the k-th frame key point heat map, and FDI_k represents the backward difference map corresponding to the k-th frame key point heat map;
the formula for obtaining the key point inter-frame change map is: CDI_k = BDI_k ∪ FDI_k, wherein CDI_k represents the key point inter-frame change map.
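The difference-map computation can be sketched as follows. Interpreting the union of the two difference maps as a pixel-wise maximum (equivalent to logical OR on binary maps) is an assumption of this sketch; the patent only writes CDI_k = BDI_k ∪ FDI_k.

```python
import numpy as np

def keypoint_change_maps(heatmaps):
    """Compute CDI_k for a stack of per-frame key point heat maps of shape
    (n, H, W); the first frame has no forward difference and the last frame
    has no backward difference, so k runs from 1 to n-2."""
    n = heatmaps.shape[0]
    cdis = []
    for k in range(1, n - 1):
        bdi = np.abs(heatmaps[k - 1] - heatmaps[k])   # forward difference map
        fdi = np.abs(heatmaps[k] - heatmaps[k + 1])   # backward difference map
        cdis.append(np.maximum(bdi, fdi))             # CDI_k = BDI_k U FDI_k
    return np.stack(cdis)
```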
The invention also provides an escalator monitoring video anomaly detection system, which comprises:
the data set acquisition module, used for acquiring the monitoring video of the escalator and converting it into a picture set with continuous time points, judging frame by frame whether the current frame is a normal frame, marking it as a positive sample if so and as a negative sample if not, thereby obtaining a model data set containing positive samples and negative samples; a positive sample is specifically a picture marked with a normal label, and a negative sample is specifically a picture marked with an abnormal label;
the first training module is used for respectively training the YOLOv5 model and the HR-Net model through a large-scale data set to obtain a corresponding YOLOv5 target detection model and an HR-Net key point extraction model;
the key point heat map acquisition module is used for inputting pictures in the model data set into the YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to the pictures of each frame; predicting a keypoint heat map corresponding to each pedestrian target frame position through an HR-Net keypoint extraction model;
the inter-frame change map acquisition module, used for acquiring the forward difference map and the backward difference map corresponding to each frame's key point heat map, and obtaining the key point inter-frame change map corresponding to the current frame's key point heat map by applying an OR (union) operation to the forward difference map and the backward difference map;
the second training module, used for acquiring the image-label pair corresponding to each key point inter-frame change map, and training a convolutional neural network through the image-label pairs to obtain a convolutional neural network model; each image-label pair comprises a key point inter-frame change map and the label corresponding to that change map;
the detection module, used for inputting the picture set of the escalator monitoring video to be detected frame by frame into the YOLOv5 target detection model to obtain the pedestrian target frame positions in each frame picture, predicting the key point heat map corresponding to each pedestrian target frame position through the HR-Net key point extraction model, obtaining the key point inter-frame change map corresponding to each frame's key point heat map, and inputting the key point inter-frame change maps into the convolutional neural network model to predict the corresponding labels.
Further, predicting the key point heat map corresponding to each pedestrian target frame position through the HR-Net key point extraction model specifically comprises the following steps: sequentially inputting the pedestrian target frame positions into the HR-Net key point extraction model to obtain a high-resolution feature map containing human key points and the confidences of the rectangular bounding boxes of those key points; performing human pose estimation on the rectangular bounding boxes in the high-resolution feature map whose confidence is higher than a set threshold, obtaining the pixel coordinates of the human key points and their prediction confidences, and thereby obtaining the key point heat map corresponding to each pedestrian target frame position.
Further, the large-scale dataset is the MS COCO dataset; training the YOLOv5 model through a large-scale data set specifically comprises the following steps: training the YOLOv5 model through the MS COCO data set; during training, the regression branch of the YOLOv5 model is trained through the CIoU loss function, and the target and class branches of the YOLOv5 model are trained through the BCE loss function.
Further, when training the YOLOv5 model by a large-scale dataset, it further comprises:
for the picture data in the large-scale data set, randomly flipping the pictures currently input to the YOLOv5 model or the HR-Net model, and applying Mosaic data augmentation, which specifically comprises: splicing any four pictures to obtain a new picture, and adding it to training so as to expand the data set.
Compared with the prior art, the invention at least has the following beneficial effects:
(1) According to the method, a trained YOLOv5 target detection model and an HR-Net key point extraction model are combined to obtain a key point heat map for each frame picture in the data set, and a key point inter-frame change map is derived from each key point heat map. A convolutional neural network is trained on image-label pairs, each comprising a key point inter-frame change map and its corresponding label, to obtain a convolutional neural network model. When detection starts, the picture set of the escalator monitoring video to be detected is input frame by frame into the YOLOv5 target detection model to obtain the pedestrian target frame positions in each frame picture; the key point heat map corresponding to each pedestrian target frame position is predicted through the HR-Net key point extraction model; and the key point inter-frame change maps corresponding to the key point heat maps are input into the convolutional neural network model, which predicts the corresponding labels. The behavior state on the escalator (abnormal or normal) is obtained from these labels, realizing intelligent detection of escalator monitoring video anomalies while greatly improving detection accuracy;
(2) Specifically, pedestrians in each picture are located through the YOLOv5 target detection model, human body key points are extracted through the HR-Net key point extraction model, and the two are combined to obtain the key point heat map; the inter-frame relationship is then used to obtain the key point inter-frame change map, which captures abnormal behavior and reflects inter-frame changes in human posture, greatly improving the accuracy of anomaly detection;
(3) According to the invention, the key point inter-frame change map corresponding to the current frame's key point heat map is obtained by applying an OR (union) operation to the forward difference map and the backward difference map, so that inter-frame changes in human motion are obtained quickly and conveniently; this helps capture abnormal motion and speeds up anomaly detection;
(4) When detection starts, once the key point heat maps are obtained, detection only requires a small convolutional neural network model, greatly improving detection efficiency.
Drawings
FIG. 1 is a flow chart of a method for detecting anomaly of monitoring video of an escalator;
FIG. 2 is a block diagram of an escalator surveillance video anomaly detection system;
FIG. 3 is a network structure diagram of the HR-Net model.
Detailed Description
The following are specific embodiments of the present invention, and the technical solutions of the present invention are further described with reference to the accompanying drawings; however, the present invention is not limited to these embodiments.
Example 1
In order to realize intelligent detection of the monitoring video of the escalator, as shown in fig. 1, the invention provides a method for detecting abnormality of the monitoring video of the escalator, which comprises the following steps:
acquiring a monitoring video of an escalator through a camera or other sensing equipment and converting it into a picture set with continuous time points; manually judging frame by frame whether the current frame is a normal frame, marking it as a positive sample if so and as a negative sample if not, thereby obtaining a model data set containing positive samples and negative samples; a positive sample is specifically a picture marked with a normal label, and a negative sample is specifically a picture marked with an abnormal label;
It should be noted that, in general, rare anomalies such as moving against the direction of travel, falls, baby carriages, and large pieces of luggage are manually marked as negative samples.
Respectively training a YOLOv5 model (specifically YOLOv5s, with three detection heads) and an HR-Net model through a large-scale data set to obtain the corresponding YOLOv5 target detection model and HR-Net key point extraction model;
the large-scale data set is the MS COCO data set; the MS COCO data set is split to obtain a training set and a verification set. Training the YOLOv5 model through the large-scale data set (before training, the images in the data set must be resized to a unified, designated size) specifically comprises the following steps:
training the YOLOv5 model on the training set; during training, the regression branch of the YOLOv5 model is trained through the CIoU loss function, and the target and class branches of the YOLOv5 model are trained through the BCE loss function; the trained model is then verified on the verification set, and the model that performs best on the verification set across all rounds is taken as the YOLOv5 target detection model.
wherein:
L_CIoU = 1 − IoU + ρ²(b, b^gt)/c² + αv, with IoU = Intersection(A, B)/Union(A, B) and v = (4/π²)·(arctan(w^gt/h^gt) − arctan(w/h))²;
wherein Intersection(A, B) represents the intersection area of the predicted frame A and the target frame B of the YOLOv5 model, and Union(A, B) represents the union area of the predicted frame A and the target frame B; b and b^gt respectively represent the center points of the predicted frame and the real frame; ρ²(b, b^gt) is the squared Euclidean distance between the center points of the predicted frame and the real frame; c is the diagonal length of the smallest enclosing region that can simultaneously contain the predicted frame and the real frame; α is the loss weight, α = v/((1 − IoU) + v); L_CIoU represents the loss value; w^gt is the width of the real frame, h^gt is the height of the real frame, w is the width of the predicted frame, and h is the height of the predicted frame;
the formula expression of the BCE loss function is as follows:
BCELoss = −[ŷ·log(p) + (1 − ŷ)·log(1 − p)]
wherein p is the probability that the YOLOv5 model predicts the sample to be a positive sample; ŷ is the sample label, taking the value 1 when the sample belongs to the positive samples and 0 otherwise; BCELoss is the loss value.
It should be noted that the loss function used for training the HR-Net model is:
MSELoss = (1/m)·Σ_{i=1}^{m} (ŷ_i − y_i)²
wherein ŷ_i is the true pixel coordinate value of the sample, y_i is the predicted pixel coordinate value, m is the total number of pixels of the sample, and MSELoss is the loss value.
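The mean-squared-error loss for the HR-Net key point targets translates directly to code (the function name is illustrative):

```python
def hrnet_mse_loss(y_true, y_pred):
    """Mean squared error over the m pixel coordinate values of a sample,
    as used for training the HR-Net model."""
    m = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / m
```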
When training the YOLOv5 model by a large-scale dataset, it further comprises:
for the picture data in the large-scale data set, randomly flipping the pictures currently input to the YOLOv5 model or the HR-Net model, and applying Mosaic data augmentation, which specifically comprises: splicing any four pictures to obtain a new picture, and adding it to training so as to expand the data set.
Inputting pictures in the model data set into a YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to the pictures of each frame; predicting a keypoint heat map corresponding to each pedestrian target frame position through an HR-Net keypoint extraction model;
the method for predicting the keypoint heat map corresponding to the target frame position of each pedestrian through the HR-Net keypoint extraction model specifically comprises the following steps: and sequentially inputting the positions of the target frames of the pedestrians into an HR-Net key point extraction model to obtain a high-resolution feature map comprising human key points and human key point rectangular bounding boxes confidence degrees through the HR-Net key point extraction model, and estimating the human body gestures of the human key point rectangular bounding boxes with the confidence degrees higher than a set threshold in the high-resolution feature map to obtain pixel coordinates of the human key points and the prediction confidence degrees thereof, so as to obtain a key point heat map corresponding to the positions of the target frames of the pedestrians.
It should be explained in detail that, as shown in fig. 3, the HR-Net key point extraction model starts from a high-resolution subnetwork, gradually adds subnetworks from high to low resolution one by one, and connects the multi-resolution subnetworks in parallel. Throughout the process, information is repeatedly exchanged across the parallel multi-resolution subnetworks, completing a repeated multi-scale fusion process and yielding a high-resolution feature map; this avoids loss of high-resolution information and makes the predicted key point heat map more accurate.
For each frame of the key point heat map, the corresponding forward difference map and backward difference map are obtained, and the key point inter-frame change map corresponding to the current frame's key point heat map is obtained by applying an OR (union) operation to the forward difference map and the backward difference map;
by obtaining the key point inter-frame change map in this way, the invention captures inter-frame changes in human motion very quickly and conveniently, which helps capture abnormal motion and speeds up anomaly detection.
The formula for obtaining the forward difference map is: BDI_k = |H_{k−1} − H_k|;
the formula for obtaining the backward difference map is: FDI_k = |H_k − H_{k+1}|;
wherein n represents the total number of frames of key point heat maps, H_k represents the k-th frame key point heat map, H_{k−1} represents the (k−1)-th frame key point heat map, and H_{k+1} represents the (k+1)-th frame key point heat map; BDI_k represents the forward difference map corresponding to the k-th frame key point heat map, and FDI_k represents the backward difference map corresponding to the k-th frame key point heat map. Because the first frame's key point heat map has no forward difference map and the last frame's has no backward difference map, k ranges from 1 to n−2 (frames being indexed from 0 to n−1);
the formula for obtaining the key point inter-frame change map is: CDI_k = BDI_k ∪ FDI_k, wherein CDI_k represents the key point inter-frame change map corresponding to the k-th frame key point heat map.
Acquiring the image-label pairs (CDI_k, Y_k), Y_k ∈ {+1, −1}, corresponding to the key point inter-frame change maps, and training a convolutional neural network (in this embodiment, ResNet-10) through the image-label pairs to obtain a convolutional neural network model; each image-label pair comprises a key point inter-frame change map and the label corresponding to that change map, where Y_k represents the label corresponding to the key point inter-frame change map.
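As an illustrative stand-in for the ResNet-10 classifier (which would normally be trained with a deep learning framework), the sketch below fits a plain logistic-regression classifier on flattened change maps under a binary cross-entropy objective. All names, the choice of classifier, and the mapping of +1 to "normal" are assumptions of this sketch, not the patent's implementation.

```python
import numpy as np

def train_change_map_classifier(cdis, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression stand-in for the ResNet-10 classifier on
    flattened key point inter-frame change maps; labels are in {+1, -1}."""
    X = cdis.reshape(len(cdis), -1)
    y = (np.asarray(labels, dtype=float) + 1) / 2      # map {-1,+1} -> {0,1}
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))         # sigmoid
        grad = p - y                                   # gradient of BCE w.r.t. logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_label(w, b, cdi):
    """Return +1 (normal) or -1 (abnormal) for one change map."""
    p = 1.0 / (1.0 + np.exp(-(cdi.ravel() @ w + b)))
    return 1 if p >= 0.5 else -1
```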
Escalator monitoring video is acquired in real time, and the picture set corresponding to the escalator monitoring video to be detected (i.e., acquired in real time) is input frame by frame into the YOLOv5 target detection model to obtain the pedestrian target frame positions in each frame picture; the key point heat map corresponding to each pedestrian target frame position is predicted through the HR-Net key point extraction model; and the key point inter-frame change maps corresponding to the key point heat maps are obtained and input into the convolutional neural network model, which predicts the corresponding labels, thereby realizing real-time detection of escalator monitoring video anomalies.
After detection starts and the key point heat maps are obtained, the invention only requires a small convolutional neural network model to complete detection, greatly improving detection efficiency.
According to the method, a trained YOLOv5 target detection model and an HR-Net key point extraction model are combined to obtain a key point heat map for each frame picture in the data set, and a key point inter-frame change map is derived from each key point heat map. A convolutional neural network is trained on image-label pairs, each comprising a key point inter-frame change map and its corresponding label, to obtain a convolutional neural network model. When detection starts, the picture set of the escalator monitoring video to be detected is input frame by frame into the YOLOv5 target detection model to obtain the pedestrian target frame positions in each frame picture; the key point heat map corresponding to each pedestrian target frame position is predicted through the HR-Net key point extraction model; and the key point inter-frame change maps corresponding to the key point heat maps are input into the convolutional neural network model, which predicts the corresponding labels. The behavior state on the escalator (abnormal or normal) is obtained from these labels.
Example two
As shown in fig. 2, the invention further provides a system for detecting the abnormality of the monitoring video of the escalator, which comprises:
the data set acquisition module, used for acquiring the monitoring video of the escalator and converting it into a picture set with continuous time points, judging frame by frame whether the current frame is a normal frame, marking it as a positive sample if so and as a negative sample if not, thereby obtaining a model data set containing positive samples and negative samples; a positive sample is specifically a picture marked with a normal label, and a negative sample is specifically a picture marked with an abnormal label;
the first training module is used for respectively training the YOLOv5 model and the HR-Net model through a large-scale data set to obtain a corresponding YOLOv5 target detection model and an HR-Net key point extraction model;
the large-scale data set is the MS COCO data set; training the YOLOv5 model through a large-scale data set specifically comprises the following steps: training the YOLOv5 model through the MS COCO data set; during training, the regression branch of the YOLOv5 model is trained through the CIoU loss function, and the target and class branches of the YOLOv5 model are trained through the BCE loss function.
When training the YOLOv5 model by a large-scale dataset, it further comprises:
for the picture data in the large-scale data set, randomly flipping the pictures currently input to the YOLOv5 model or the HR-Net model, and applying Mosaic data augmentation, which specifically comprises: splicing any four pictures to obtain a new picture, and adding it to training so as to expand the data set.
The key point heat map acquisition module is used for inputting pictures in the model data set into the YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to the pictures of each frame; predicting a keypoint heat map corresponding to each pedestrian target frame position through an HR-Net keypoint extraction model;
the method for predicting the keypoint heat map corresponding to the target frame position of each pedestrian through the HR-Net keypoint extraction model specifically comprises the following steps: and sequentially inputting the positions of the target frames of the pedestrians into an HR-Net key point extraction model to obtain a high-resolution feature map comprising human key points and human key point rectangular bounding boxes confidence degrees through the HR-Net key point extraction model, and estimating the human body gestures of the human key point rectangular bounding boxes with the confidence degrees higher than a set threshold in the high-resolution feature map to obtain pixel coordinates of the human key points and the prediction confidence degrees thereof, so as to obtain a key point heat map corresponding to the positions of the target frames of the pedestrians.
The inter-frame change map acquisition module is used for acquiring the forward difference map and the backward difference map corresponding to each frame's key point heat map, and obtaining the key point inter-frame change map corresponding to the current frame's key point heat map by applying an OR (union) operation to the forward difference map and the backward difference map;
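Using the difference-map formulas given later in the claims, the change map for frame k combines the absolute differences against the previous and next heat maps with an elementwise OR; a minimal sketch, assuming the heat maps are 2-D float arrays and a small binarization threshold (the threshold value is an assumption):

```python
import numpy as np

def change_map(h_prev, h_cur, h_next, thresh=0.1):
    """Keypoint inter-frame change map (CDI_k) for the current frame.

    BDI_k = |H_(k-1) - H_k|, FDI_k = |H_k - H_(k+1)|; the change map is
    their elementwise OR after binarizing with a small threshold.
    """
    bdi = np.abs(h_prev - h_cur) > thresh   # difference with previous frame
    fdi = np.abs(h_cur - h_next) > thresh   # difference with next frame
    return np.logical_or(bdi, fdi).astype(np.uint8)
```

Because both neighbours are required, the first and last frames of a sequence have no change map, matching the k range stated in the claims.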
the second training module is used for acquiring image label pairs corresponding to the inter-frame change graphs of the key points, and training the convolutional neural network through the image label pairs to obtain a convolutional neural network model; the image tag pair comprises a key point inter-frame change graph and a tag corresponding to the key point inter-frame change graph;
the detection module is used for inputting the escalator monitoring video picture set to be detected into the YOLOv5 target detection model frame by frame to obtain the pedestrian target frame positions corresponding to each frame picture; predicting the key point heat map corresponding to each pedestrian target frame position through the HR-Net key point extraction model; obtaining the key point inter-frame change map corresponding to each frame's key point heat map; and inputting the key point inter-frame change maps into the convolutional neural network model to predict the corresponding labels.
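The detection pipeline can be summarized in one loop; the three model objects below are hypothetical callables standing in for the trained YOLOv5, HR-Net, and CNN models (the patent fixes no API, so these signatures are assumptions):

```python
import numpy as np

def detect_anomalies(frames, yolo, hrnet, cnn, thresh=0.1):
    """Frame-by-frame pipeline sketch with assumed callable signatures:
      yolo(frame)         -> pedestrian target boxes
      hrnet(frame, boxes) -> keypoint heat map (2-D array) for the frame
      cnn(change_map)     -> predicted label string
    """
    # Stage 1: per-frame keypoint heat maps via detector + pose model
    heatmaps = [hrnet(f, yolo(f)) for f in frames]
    labels = []
    # Stage 2: change maps need both a previous and a next frame
    for k in range(1, len(heatmaps) - 1):
        bdi = np.abs(heatmaps[k - 1] - heatmaps[k]) > thresh
        fdi = np.abs(heatmaps[k] - heatmaps[k + 1]) > thresh
        labels.append(cnn(np.logical_or(bdi, fdi)))
    return labels
```

A label of "abnormal" for a frame then triggers whatever alarm or logging logic the surrounding system defines.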
Specifically, pedestrians in the picture are located through the YOLOv5 target detection model, human key points are extracted through the HR-Net key point extraction model, and the two are combined to obtain the key point heat map; the key point inter-frame change map is then obtained by using the inter-frame relationship, so that abnormal behaviors are captured and the inter-frame change of the human body posture is reflected, which greatly improves the accuracy of anomaly detection.
Example III
The invention also provides an escalator monitoring video anomaly detection device, which comprises a memory and a processor; the memory is used for storing a computer program; the processor is used for implementing the escalator monitoring video anomaly detection method when executing the computer program.
It should be noted that all directional indicators (such as up, down, left, right, front, rear, etc.) in the embodiments of the present invention are merely used to explain the relative positional relationship, movement, etc. between the components in a particular posture (as shown in the drawings), and if the particular posture is changed, the directional indicator is changed accordingly.
Furthermore, descriptions such as those referred to herein as "first," "second," "a," and the like are provided for descriptive purposes only and are not to be construed as indicating or implying a relative importance or an implicit indication of the number of features being indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
In the present invention, unless specifically stated and limited otherwise, the terms "connected," "affixed," and the like are to be construed broadly, and for example, "affixed" may be a fixed connection, a removable connection, or an integral body; can be mechanically or electrically connected; either directly or indirectly, through intermediaries, or both, may be in communication with each other or in interaction with each other, unless expressly defined otherwise. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In addition, the technical solutions of the embodiments of the present invention may be combined with each other, but it is necessary to be based on the fact that those skilled in the art can implement the technical solutions, and when the technical solutions are contradictory or cannot be implemented, the combination of the technical solutions should be considered as not existing, and not falling within the scope of protection claimed by the present invention.
Claims (10)
1. The method for detecting the abnormality of the monitoring video of the escalator is characterized by comprising the following steps of:
acquiring a monitoring video of an escalator, converting the monitoring video into a picture set with continuous time points, judging whether a current frame is a normal frame or not frame by frame, if so, marking the current frame as a positive sample, and if not, marking the current frame as a negative sample, so as to obtain a model data set containing the positive sample and the negative sample; the positive sample is specifically a picture marked with a normal label, and the negative sample is specifically a picture marked with an abnormal label;
respectively training a YOLOv5 model and an HR-Net model through a large-scale data set to obtain a corresponding YOLOv5 target detection model and an HR-Net key point extraction model;
inputting pictures in the model data set into a YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to the pictures of each frame; predicting a keypoint heat map corresponding to each pedestrian target frame position through an HR-Net keypoint extraction model;
for each frame's key point heat map, acquiring the corresponding forward difference map and backward difference map, and obtaining the key point inter-frame change map corresponding to the current frame's key point heat map by applying an OR (union) operation to the forward difference map and the backward difference map;
acquiring image label pairs corresponding to each key point inter-frame change graph, and training a convolutional neural network through the image label pairs to obtain a convolutional neural network model; the image tag pair comprises a key point inter-frame change graph and a tag corresponding to the key point inter-frame change graph;
inputting the escalator monitoring video picture set to be detected into the YOLOv5 target detection model frame by frame to obtain the pedestrian target frame positions corresponding to each frame picture; predicting the key point heat maps corresponding to the pedestrian target frame positions through the HR-Net key point extraction model; obtaining the key point inter-frame change map corresponding to each frame's key point heat map; and inputting the key point inter-frame change maps into the convolutional neural network model to predict the corresponding labels.
2. The escalator monitoring video anomaly detection method according to claim 1, wherein predicting the key point heat map corresponding to each pedestrian target frame position through the HR-Net key point extraction model specifically comprises: sequentially inputting the pedestrian target frame positions into the HR-Net key point extraction model to obtain a high-resolution feature map comprising human key points and the confidence degrees of the human key point rectangular bounding boxes; human body pose estimation is then performed on the rectangular bounding boxes whose confidence degrees are higher than a set threshold in the high-resolution feature map, yielding the pixel coordinates of the human key points and their prediction confidence degrees, so as to obtain the key point heat map corresponding to each pedestrian target frame position.
3. The escalator surveillance video anomaly detection method of claim 1, wherein the large-scale dataset is an MS COCO dataset; the training of the YOLOv5 model by a large-scale data set specifically comprises the following steps: training the YOLOv5 model by the MS COCO data set, wherein during training the regression branch of the YOLOv5 model is trained through the CIoU loss function, and the objectness and class branches of the YOLOv5 model are trained through the BCE loss function.
4. The escalator surveillance video anomaly detection method according to claim 3, wherein the formula expression of the CIoU loss function is:
IoU = Intersection(A, B) / Union(A, B)
L_CIoU = 1 − IoU + ρ²(b, b^gt) / c² + αv, where v = (4/π²)·(arctan(w^gt/h^gt) − arctan(w/h))²
wherein Intersection(A, B) represents the intersection area of the prediction frame A of the YOLOv5 model and the target frame B, and Union(A, B) represents the union area of the prediction frame A and the target frame B; b and b^gt respectively represent the center points of the prediction frame and the real frame; ρ(b, b^gt) is the Euclidean distance between the center points of the prediction frame and the real frame; c is the diagonal distance of the minimum closure area that can simultaneously contain the prediction frame and the real frame; α is the loss weight; L_CIoU represents the loss value; w^gt is the width of the real frame, h^gt is the height of the real frame, w is the width of the prediction frame, and h is the height of the prediction frame;
the formula expression of the BCE loss function is as follows: L_BCE = −[y·log(p) + (1 − y)·log(1 − p)], wherein y is the true label and p is the predicted probability;
5. The escalator surveillance video anomaly detection method of claim 4, further comprising, when training the YOLOv5 model with a large-scale dataset:
for the picture data in the large-scale data set, randomly flipping the pictures currently input to the YOLOv5 model or the HR-Net model, and applying Mosaic data augmentation, which specifically comprises: splicing any four pictures to obtain a new picture, and adding it to training so as to expand the data set.
6. The escalator monitoring video anomaly detection method according to claim 5, wherein
the backward difference map is obtained by the formula: BDI_k = |H_(k−1) − H_k|;
the forward difference map is obtained by the formula: FDI_k = |H_k − H_(k+1)|;
wherein the value range of k is (1, n−2); n represents the total number of frames of key point heat maps, H_k represents the k-th frame key point heat map, H_(k−1) represents the (k−1)-th frame key point heat map, H_(k+1) represents the (k+1)-th frame key point heat map, BDI_k represents the backward difference map corresponding to the k-th frame key point heat map, and FDI_k represents the forward difference map corresponding to the k-th frame key point heat map;
the key point inter-frame change map is obtained by the formula: CDI_k = BDI_k ∪ FDI_k, wherein CDI_k represents the key point inter-frame change map.
7. An escalator surveillance video anomaly detection system, comprising:
the data set acquisition module is used for acquiring the monitoring video of the escalator, converting the monitoring video into a picture set with continuous time points, judging whether the current frame is a normal frame or not frame by frame, if yes, marking the current frame as a positive sample, and if not, marking the current frame as a negative sample, so as to obtain a model data set containing the positive sample and the negative sample; the positive sample is specifically a picture marked with a normal label, and the negative sample is specifically a picture marked with an abnormal label;
the first training module is used for respectively training the YOLOv5 model and the HR-Net model through a large-scale data set to obtain a corresponding YOLOv5 target detection model and an HR-Net key point extraction model;
the key point heat map acquisition module is used for inputting pictures in the model data set into the YOLOv5 target detection model frame by frame to obtain pedestrian target frame positions corresponding to the pictures of each frame; predicting a keypoint heat map corresponding to each pedestrian target frame position through an HR-Net keypoint extraction model;
the inter-frame change map acquisition module is used for acquiring the forward difference map and the backward difference map corresponding to each frame's key point heat map, and obtaining the key point inter-frame change map corresponding to the current frame's key point heat map by applying an OR (union) operation to the forward difference map and the backward difference map;
the second training module is used for acquiring image label pairs corresponding to the inter-frame change graphs of the key points, and training the convolutional neural network through the image label pairs to obtain a convolutional neural network model; the image tag pair comprises a key point inter-frame change graph and a tag corresponding to the key point inter-frame change graph;
the detection module is used for inputting the escalator monitoring video picture set to be detected into the YOLOv5 target detection model frame by frame to obtain the pedestrian target frame positions corresponding to each frame picture; predicting the key point heat map corresponding to each pedestrian target frame position through the HR-Net key point extraction model; obtaining the key point inter-frame change map corresponding to each frame's key point heat map; and inputting the key point inter-frame change maps into the convolutional neural network model to predict the corresponding labels.
8. The escalator surveillance video anomaly detection system according to claim 7, wherein predicting the key point heat map corresponding to each pedestrian target frame position through the HR-Net key point extraction model specifically comprises: sequentially inputting the pedestrian target frame positions into the HR-Net key point extraction model to obtain a high-resolution feature map comprising human key points and the confidence degrees of the human key point rectangular bounding boxes; human body pose estimation is then performed on the rectangular bounding boxes whose confidence degrees are higher than a set threshold in the high-resolution feature map, yielding the pixel coordinates of the human key points and their prediction confidence degrees, so as to obtain the key point heat map corresponding to each pedestrian target frame position.
9. The escalator surveillance video anomaly detection system of claim 8, wherein the large-scale dataset is an MS COCO dataset; the training of the YOLOv5 model by a large-scale data set specifically comprises the following steps: training the YOLOv5 model by the MS COCO data set, wherein during training the regression branch of the YOLOv5 model is trained through the CIoU loss function, and the objectness and class branches of the YOLOv5 model are trained through the BCE loss function.
10. The escalator surveillance video anomaly detection system of claim 9, further comprising, when training the YOLOv5 model with a large-scale dataset:
for the picture data in the large-scale data set, randomly flipping the pictures currently input to the YOLOv5 model or the HR-Net model, and applying Mosaic data augmentation, which specifically comprises: splicing any four pictures to obtain a new picture, and adding it to training so as to expand the data set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211703754.4A CN116030412A (en) | 2022-12-29 | 2022-12-29 | Escalator monitoring video anomaly detection method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211703754.4A CN116030412A (en) | 2022-12-29 | 2022-12-29 | Escalator monitoring video anomaly detection method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116030412A true CN116030412A (en) | 2023-04-28 |
Family
ID=86073410
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211703754.4A Pending CN116030412A (en) | 2022-12-29 | 2022-12-29 | Escalator monitoring video anomaly detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116030412A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117292329A (en) * | 2023-11-24 | 2023-12-26 | 烟台大学 | Method, system, medium and equipment for monitoring abnormal work of building robot |
CN117292329B (en) * | 2023-11-24 | 2024-03-08 | 烟台大学 | Method, system, medium and equipment for monitoring abnormal work of building robot |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102153591B1 (en) | Method and apparatus for detecting garbage dumping action in real time on video surveillance system | |
CN107292240B (en) | Person finding method and system based on face and body recognition | |
WO2019179024A1 (en) | Method for intelligent monitoring of airport runway, application server and computer storage medium | |
US11288887B2 (en) | Object tracking method and apparatus | |
CN111860352B (en) | Multi-lens vehicle track full tracking system and method | |
CN111738240A (en) | Region monitoring method, device, equipment and storage medium | |
CN104966304A (en) | Kalman filtering and nonparametric background model-based multi-target detection tracking method | |
CN111325048B (en) | Personnel gathering detection method and device | |
CN111079621A (en) | Method and device for detecting object, electronic equipment and storage medium | |
CN112836683A (en) | License plate recognition method, device, equipment and medium for portable camera equipment | |
CN116030412A (en) | Escalator monitoring video anomaly detection method and system | |
CN111666821A (en) | Personnel gathering detection method, device and equipment | |
CN111723656B (en) | Smog detection method and device based on YOLO v3 and self-optimization | |
CN113505704B (en) | Personnel safety detection method, system, equipment and storage medium for image recognition | |
CN111079722A (en) | Hoisting process personnel safety monitoring method and system | |
Purohit et al. | Multi-sensor surveillance system based on integrated video analytics | |
CN113920585A (en) | Behavior recognition method and device, equipment and storage medium | |
CN116403162B (en) | Airport scene target behavior recognition method and system and electronic equipment | |
CN112528903A (en) | Face image acquisition method and device, electronic equipment and medium | |
US10783365B2 (en) | Image processing device and image processing system | |
CN116311166A (en) | Traffic obstacle recognition method and device and electronic equipment | |
CN111627224A (en) | Vehicle speed abnormality detection method, device, equipment and storage medium | |
US20230267779A1 (en) | Method and system for collecting and monitoring vehicle status information | |
CN111368726B (en) | Construction site operation face personnel number statistics method, system, storage medium and device | |
CN114639084A (en) | Road side end vehicle sensing method based on SSD (solid State disk) improved algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||