CN111144247A - Escalator passenger reverse-running detection method based on deep learning - Google Patents


Info

Publication number
CN111144247A
CN111144247A
Authority
CN
China
Prior art keywords
passenger
detection
retrograde
target
escalator
Prior art date
Legal status
Granted
Application number
CN201911292323.1A
Other languages
Chinese (zh)
Other versions
CN111144247B (en)
Inventor
王曰海
柳能
奚永新
唐慧明
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201911292323.1A
Publication of CN111144247A
Application granted
Publication of CN111144247B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02BCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B50/00Energy efficient technologies in elevators, escalators and moving walkways, e.g. energy saving or recuperation technologies

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep-learning-based method for detecting wrong-direction (retrograde) passengers on an escalator, comprising the following steps: step one, acquiring image frames from the monitoring video stream of the escalator and setting a detection region of interest (ROI); step two, detecting the head positions of passengers inside the ROI specified in step one with a target detection algorithm; step three, judging the orientation of each head detected in step two with a classifier, to decide whether the passenger may be travelling in the wrong direction; step four, tracking each possible wrong-direction target (head) from step three with a multi-target tracking algorithm to obtain a tracking trajectory; and step five, analysing each trajectory from step four to judge whether the passenger is in fact travelling in the wrong direction. Under complex conditions the method judges wrong-direction passenger behaviour on the escalator effectively, with high accuracy and real-time performance, helping to avoid accidents.

Description

Escalator passenger reverse-running detection method based on deep learning
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a deep-learning-based method for detecting wrong-direction passengers on an escalator.
Background
With the popularity of smartphones, many people have developed the habit of looking at their phones while walking. When taking an escalator, they may therefore fail to notice which entrance they are approaching and step onto the wrong escalator; if they do not leave it in time after entering against its direction of travel, serious riding accidents such as falls can occur. According to statistics, wrong-direction travel causes the largest share of all escalator safety accidents, so detecting whether a passenger has entered the escalator in the wrong direction, and reminding that passenger, are necessary measures for ensuring passenger safety.
At present most escalators are not equipped with any device that can detect wrong-direction travel. Staff may be assigned to keep order at a small number of high-traffic locations, but it is tiring for them to also watch for wrong-direction passengers, and escalators in most public places have no staff at all, so accidents occur easily. In some public places an infrared detector is installed at the escalator, raising an alarm whenever a pedestrian is detected at the opening; its drawback is that, because no behaviour recognition is performed, it frequently produces false alarms.
The invention with publication number CN106503632A discloses an intelligent escalator safety monitoring method based on video analysis, which comprises: first, acquiring the real-time video image sequence from a camera installed in the monitoring area; second, establishing a Gaussian mixture background model and suppressing shadows in the video images to extract the foreground; then using a classifier to identify human objects in the extracted foreground and computing the corner-point optical flow of each human object, thereby judging whether abnormal behaviours such as wrong-direction travel or falls exist, so that measures such as an emergency stop of the escalator can be taken in time to protect the safety of riders. By judging abnormal behaviour through corner-point optical flow, that method can reduce the probability of trampling accidents caused by wrong-direction travel or falls, greatly protecting passenger safety. However, it relies on a single judgment cue and its accuracy is low in complex wrong-direction scenarios.
Therefore, applying computer vision to the escalator's existing monitoring camera, tracking wrong-direction candidates and analysing their behaviour with artificial-intelligence methods, so as to judge accurately and quickly whether a passenger is travelling in the wrong direction, is of great significance for reducing escalator accidents.
Disclosure of Invention
The invention aims to judge wrong-direction passenger behaviour on an escalator effectively under complex conditions, with high accuracy and real-time performance, and provides a deep-learning-based escalator wrong-direction passenger detection method.
The deep-learning-based method for detecting wrong-direction escalator passengers comprises the following steps:
step one, acquiring image frames from the monitoring video stream of an escalator, and setting a detection region of interest (ROI);
step two, detecting the head positions of passengers in the detection region specified in step one with a target detection algorithm;
step three, judging the orientation of each head detected in step two with a classifier, to decide whether the passenger may be travelling in the wrong direction;
step four, tracking each possible wrong-direction target (head) from step three with a multi-target tracking algorithm to obtain a tracking trajectory;
step five, analysing each trajectory from step four to judge whether the passenger is travelling in the wrong direction;
in the first step, the video stream comes from a monitoring camera of the escalator, and the camera is arranged right opposite to the escalator opening and can clearly shoot images of passengers riding the escalator getting on and off the escalator. After the camera is installed, a rectangular area is set, the rectangular area is an exit of the escalator and serves as a detection area (ROI) of the invention, and when passengers enter from the exit, the phenomenon of reverse driving is considered to occur.
In step two, the target detection algorithm adopted is YOLOv3. A head detection model is trained with transfer learning, heads are detected in each video frame, and false detections and targets outside the ROI are filtered out in post-processing. The head detection comprises the following steps:
(2.1) During training, the YOLOv3 model structure is built first. The YOLOv3 network comprises a Darknet-53 backbone for feature extraction and three fully convolutional output heads called yolo layers, each producing results on a feature map of a different scale. At a yolo layer, the output dimensions are:
S×S×(B×(5+C))
where S is the size of the feature map, B is the number of boxes output per grid cell, 5 covers the 4 bounding-box parameters plus a confidence score for whether an object is contained, and C is the number of categories.
In step two there is only one category (head), so C = 1. The picture resolution used in this invention is 416×416, so ((52×52) + (26×26) + (13×13)) × 3 = 10647 bounding boxes are predicted.
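The box-count arithmetic above can be checked with a short calculation; this is a minimal sketch assuming the standard YOLOv3 output strides of 8, 16 and 32 (which yield the 52×52, 26×26 and 13×13 grids for a 416×416 input) and 3 anchors per grid cell. The function name is illustrative, not from the patent.

```python
# Sketch: number of bounding boxes YOLOv3 predicts for a square input.
# Assumes the usual three output strides (8, 16, 32) and 3 anchors per cell.
def yolo_v3_box_count(resolution=416, strides=(8, 16, 32), anchors_per_cell=3):
    total = 0
    for s in strides:
        grid = resolution // s          # feature-map side length at this scale
        total += grid * grid * anchors_per_cell
    return total
```

For a 416×416 input this evaluates to (52·52 + 26·26 + 13·13) × 3 = 10647, matching the count stated in the text.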
A Darknet-53 model pretrained on the ILSVRC image dataset is used as the initial weights of the network backbone, and the remaining parameters are initialized with the Kaiming method; this amounts to transfer learning and strengthens the backbone's ability to extract convolutional features.
The network is trained with a public head dataset. The loss function used in training comprises a category loss and a bounding-box regression loss: the category loss uses binary cross-entropy and the box regression loss uses a squared-error loss.
(2.2) During detection, pictures are normalized and fed into the trained network, and the network output is obtained at the yolo layers; the network output is converted into box predictions as follows:
bx = σ(tx) + cx
by = σ(ty) + cy
bw = pw·e^(tw)
bh = ph·e^(th)
where tx, ty, tw, th are the box outputs of the network, cx, cy are the coordinates of the top-left corner of the corresponding feature-map cell, and pw, ph are the width and height of the matched anchor.
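The box-decoding transform above can be sketched directly; this is a minimal illustration (function and parameter names are mine, not the patent's) of the standard YOLOv3 decoding, with coordinates expressed in grid units.

```python
import math

# Sketch of the YOLOv3 box decoding described above.
# t_x, t_y, t_w, t_h: raw network outputs for one box
# c_x, c_y: top-left coordinates of the feature-map cell
# p_w, p_h: width and height of the matched anchor
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    b_x = sigmoid(t_x) + c_x     # box centre x, in grid units
    b_y = sigmoid(t_y) + c_y     # box centre y, in grid units
    b_w = p_w * math.exp(t_w)    # box width
    b_h = p_h * math.exp(t_h)    # box height
    return b_x, b_y, b_w, b_h
```

The sigmoid keeps the predicted centre inside its grid cell, while the exponential scales the anchor dimensions multiplicatively.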
The probability of a given class predicted by each box is given by:
P = objectness × Pi
where objectness is the confidence that the current bbox contains an object, and Pi is the conditional probability of predicting the i-th class within the current bbox.
(2.3) Post-processing. For a 416×416 image YOLOv3 predicts 10647 bounding boxes; a non-maximum suppression (NMS) algorithm is applied to the output boxes, so that effective boxes with confidence above a certain threshold are screened out of the many mutually overlapping outputs. In addition, exploiting the particularity of the scene, predictions whose aspect ratio or size fall outside a plausible range are also filtered out, using prior knowledge to reduce network error. Finally, the screened boxes lying inside the ROI set in step one are output.
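The NMS step described above can be sketched as follows; this is a minimal greedy implementation under the assumption that boxes are (x1, y1, x2, y2, score) tuples, with illustrative default thresholds (the patent fixes only the 0.6 confidence threshold, later in the detailed description).

```python
# Minimal sketch of confidence filtering plus greedy non-maximum suppression.
# Boxes are (x1, y1, x2, y2, score) tuples in pixel coordinates.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter <= 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, score_thresh=0.6, iou_thresh=0.5):
    # Drop low-confidence boxes, then repeatedly keep the highest-scoring
    # remaining box and discard anything overlapping it above iou_thresh.
    boxes = sorted((b for b in boxes if b[4] >= score_thresh),
                   key=lambda b: b[4], reverse=True)
    kept = []
    while boxes:
        best = boxes.pop(0)
        kept.append(best)
        boxes = [b for b in boxes if iou(best, b) < iou_thresh]
    return kept
```

The aspect-ratio and size filters mentioned in the text would be additional predicates applied alongside the score threshold.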
In step three, the classifier used is a CNN classifier that judges whether a face is oriented towards the escalator; using face orientation makes the wrong-direction judgment more accurate. The step specifically comprises:
(3.1) Training the classifier. The network is trained with transfer learning: its convolutional layers are initialized with weights pretrained on the ILSVRC image dataset, the fully connected layer parameters are initialized randomly, and an effective model is obtained by training.
(3.2) Detection. The head regions output in step two are cropped, normalized, and fed to the network to obtain classification results; a head oriented towards the exit region (ROI) of the escalator is defined as a passenger possibly travelling in the wrong direction.
In step four, the multi-target tracking algorithm adopted is DeepSORT; a tracker is created for each possibly wrong-direction head from step three to obtain a tracking trajectory. The step comprises:
(4.1) The prediction is estimated with a Kalman filter. The motion state of a head is described by the eight state parameters (x, y, r, h, ẋ, ẏ, ṙ, ḣ), where x and y are the centre coordinates of the head detection box, r is its aspect ratio and h its height, while ẋ, ẏ, ṙ, ḣ are the rates of change of x, y, r, h, i.e. velocity information. Pedestrian walking is treated as constant-velocity, so a Kalman filter with a linear observation model is used to estimate the passenger's motion state.
(4.2) Target association. The DeepSORT algorithm considers motion-information association and appearance-information association simultaneously.
For motion-information association, the Mahalanobis distance between the detected target position and the position predicted by the Kalman filter is used:
d(1)(i,j) = (dj − yi)T Si^(−1) (dj − yi)
where dj is the head position of a possibly wrong-direction passenger detected in step three, yi is the prediction of the i-th Kalman filter from the previous frame, and Si is the covariance matrix between the detected position and the mean tracked position.
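The squared Mahalanobis distance above can be computed as a small matrix expression; this is a minimal numpy sketch (function name and arguments are illustrative), where d is the detected position, y the Kalman prediction and S the covariance matrix.

```python
import numpy as np

# Sketch of the motion-association gate: squared Mahalanobis distance
# between a detected position d and a Kalman-predicted position y,
# under covariance matrix S.
def mahalanobis_sq(d, y, S):
    diff = np.asarray(d, dtype=float) - np.asarray(y, dtype=float)
    return float(diff @ np.linalg.inv(np.asarray(S, dtype=float)) @ diff)
```

With an identity covariance this reduces to the squared Euclidean distance, which makes the formula easy to sanity-check.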
For appearance-information association, the picture dj of each possibly wrong-direction head detected in step three is fed into a convolutional neural network to obtain a feature vector rj with ||rj|| = 1. If dj is successfully associated, rj is put into the successfully-associated feature set of that track, which retains the feature vectors of the most recent k successfully associated frames.
The specific procedure is to compute the distance between each new detection's feature and every feature in the recently associated feature set of each Kalman filter:
d(2)(i,j) = min{ 1 − rjT·rk(i) : rk(i) ∈ Ri }
where Ri is the feature set retained for the i-th tracker. If the computed distance is below a specified threshold, the association is deemed successful.
Finally, a linear weighting of the two association measures is taken as the final association metric:
ci,j = λ·d(1)(i,j) + (1 − λ)·d(2)(i,j)
where λ is the linear weighting coefficient between the two distances.
in addition, a cascade matching method is adopted to give a large priority to a target appearing recently, so that errors caused by position updating uncertainty due to long-time shielding are solved.
(4.3) Start and end of tracking trajectories. When a track goes 3 consecutive frames without being matched to a detection box, tracking is considered ended. When a detection box matches no target in any tracker, a new target may have appeared; if the Kalman prediction of this new target can be matched to detections for 3 consecutive frames, its tracker is added to the tracking list, otherwise the detection is considered a false positive of the detector and the target is deleted.
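The track start/end rules above can be sketched as a small state machine; this is a minimal illustration (class and helper names are mine): a track is confirmed after 3 consecutive matches, an unconfirmed track is discarded on any miss, and a confirmed track is deleted after 3 consecutive misses.

```python
# Sketch of the track lifecycle rules described in (4.3).
class Track:
    CONFIRM_HITS = 3   # consecutive matches needed to confirm a new track
    MAX_MISSES = 3     # consecutive misses after which a track ends

    def __init__(self):
        self.hits = 0
        self.misses = 0
        self.confirmed = False
        self.deleted = False

    def update(self, matched):
        if self.deleted:
            return
        if matched:
            self.misses = 0
            self.hits += 1
            if self.hits >= self.CONFIRM_HITS:
                self.confirmed = True
        else:
            self.hits = 0
            self.misses += 1
            # tentative tracks die on any miss; confirmed tracks after 3
            if not self.confirmed or self.misses >= self.MAX_MISSES:
                self.deleted = True

def run_track(events):
    """Helper: feed a matched/unmatched sequence, return (confirmed, deleted)."""
    t = Track()
    for m in events:
        t.update(m)
    return t.confirmed, t.deleted
```

For example, three matches confirm a track, and three subsequent misses end it.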
In step five, each trajectory from step four is analysed to judge whether the passenger is travelling in the wrong direction, as follows:
(5.1) For each trajectory, initialize a record ΔY = 0, representing the total displacement along the Y axis (parallel to the escalator's direction of installation);
(5.2) record the pixel difference (Δxi, Δyi) between the tracked head coordinates (xi, yi) in the i-th frame and the previous point (xi−1, yi−1);
(5.3) update ΔY = ΔY + Δyi and compare ΔY with THΔY (the wrong-direction threshold); if ΔY is greater than THΔY, wrong-direction travel has occurred and a wrong-direction signal is sent.
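The trajectory analysis in steps (5.1)–(5.3) can be sketched as a small accumulator; this is a minimal illustration (class and method names are mine) that sums the per-frame vertical displacement of one tracked head and flags wrong-direction travel once the running total exceeds the threshold, which here is a tuning parameter.

```python
# Sketch of the per-track Delta-Y accumulation of steps (5.1)-(5.3).
class RetrogradeChecker:
    def __init__(self, threshold):
        self.threshold = threshold   # TH_DeltaY, a tuning parameter
        self.delta_y = 0.0           # running total displacement along Y
        self.prev_y = None

    def observe(self, y):
        # y: head-centre y coordinate (pixels) in the current frame.
        if self.prev_y is not None:
            self.delta_y += y - self.prev_y
        self.prev_y = y
        return self.delta_y > self.threshold   # True => wrong-direction signal

def run_positions(ys, threshold):
    """Helper: feed a sequence of y coordinates, return the final flag."""
    checker = RetrogradeChecker(threshold)
    flag = False
    for y in ys:
        flag = checker.observe(y)
    return flag
```

Accumulating displacement over the whole track, rather than reacting to a single frame, makes the decision robust to per-frame detection jitter.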
Compared with the prior art, the invention has the following advantages and beneficial effects:
the method and the device have the advantages that the video frame is directly obtained from the monitoring camera, and then whether the passenger in the monitoring video has the behavior of going backwards or not is analyzed. Secondly, the method is based on the tracking of passengers in the video, and then the behaviors of the passengers are understood and analyzed, so that the method has the advantages of high accuracy and stable effect.
Compared with detecting whole pedestrians, directly detecting heads has the following advantages:
1) there are many pedestrians on an escalator, so whole-body detection suffers from occlusion, whereas with a monitoring camera at a top-down viewing angle head occlusion is not severe;
2) head detection is more conducive to the subsequent judgment of walking direction, because a face is clearly distinguishable from the back of a head, while the front and back of a whole body are not clearly distinguishable, especially under occlusion;
3) heads are rarely occluded, so compared with whole pedestrians the tracking effect is better and the trajectory better reflects the passenger's true path. In addition, the invention can store the video segment of each wrong-direction event and generate a report, which is helpful for the design of escalators in public places and for staffing arrangements.
Drawings
Fig. 1 is a flow chart of an escalator passenger reverse detection method based on deep learning according to the invention.
Fig. 2 is a schematic photograph of an acquired image frame and a set ROI region.
Fig. 3 is a photograph showing the detection of the occurrence of the retrograde motion phenomenon.
Detailed Description
For purposes of making the objects, aspects and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and examples, it being understood that the specific examples described herein are for purposes of illustration only and are not all of the embodiments. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
As shown in fig. 1, the method for detecting passenger reverse movement of an escalator based on deep learning provided by the invention comprises the following steps:
step one, acquiring image frames from a monitoring video stream of an escalator, and setting a detection area (ROI):
the video stream comes from a monitoring camera of the escalator, the monitoring camera can be a Haikang or Dahua network camera, the frame rate is 25fps, and the resolution is 1920 x 1080. The camera is arranged right opposite to the escalator entrance, and can clearly shoot images of passengers riding the escalator for getting on and off the escalator. After the camera is installed, a rectangular area is set to serve as a detection area (ROI), the rectangular area is an exit of the escalator, and when passengers enter the escalator from the exit, the phenomenon of reverse driving is considered to occur.
Because the processing speed is lower than the reading speed of the video stream, the frame currently being processed would otherwise lag behind the latest frame read. A multithreading scheme is therefore used to reduce latency: one thread reads frames from the video stream while another thread processes images; when processing is slower than reading, the reading thread keeps updating the buffer automatically so that the processing thread always handles the most recent frame.
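The two-thread buffering scheme above can be sketched as a one-slot latest-frame buffer; this is a minimal illustration (class and helper names are mine): the reader thread overwrites the slot on every frame, so older unprocessed frames are simply dropped and the processing thread always sees the newest one.

```python
import threading

# Sketch of the latest-frame buffer shared between a reader thread and a
# processing thread: put() overwrites, get() returns the newest frame.
class LatestFrameBuffer:
    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None

    def put(self, frame):
        # Called by the reader thread for every decoded frame; any frame
        # the processor did not consume in time is silently replaced.
        with self._lock:
            self._frame = frame

    def get(self):
        # Called by the processing thread; returns the most recent frame.
        with self._lock:
            return self._frame

def latest_after(frames):
    """Helper: push frames in order, return what a consumer would see."""
    buf = LatestFrameBuffer()
    for f in frames:
        buf.put(f)
    return buf.get()
```

In a real deployment the reader loop would wrap the camera's capture API (for example a frame-grabbing call on the RTSP stream) and run as a daemon thread.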
Step two, detecting the head position of the passenger in the ROI area specified in the step one by using an object detection algorithm:
the adopted target detection algorithm is a YOLOv3 algorithm, a human head detection model is trained by using a transfer learning algorithm, the human head in a video frame is detected, and some false detection targets and targets outside the ROI are filtered out through post-processing. The invention adopts direct detection of human head, and comprises the following steps:
during training, a model structure of YOLOv3 is built, the resolution of the picture used in the invention is 416x416 images, and ((52x52) + (26x26) +13x13)) x3 is predicted to be 10647 bounding boxes. The method is characterized in that a darknet53 model trained on an ILSVRC image data set is used as initial weight of a network backbone, and the rest parameters adopt a kaiming initialization method, so that transfer learning is performed equivalently, and the capability of extracting the network features from the convolution features is enhanced.
The network is trained with a public head dataset, the Brainwash dataset. The loss function used in training comprises a category loss and a bounding-box regression loss: the category loss uses binary cross-entropy and the box regression loss uses a squared-error loss.
During prediction, pictures are normalized and fed into the trained network, and the network output is obtained at the yolo layers; the network output is converted into box predictions as follows:
bx = σ(tx) + cx
by = σ(ty) + cy
bw = pw·e^(tw)
bh = ph·e^(th)
where tx, ty, tw, th are the box outputs of the network, cx, cy are the coordinates of the top-left corner of the corresponding feature-map cell, and pw, ph are the width and height of the matched anchor.
The probability of a given class predicted by each box is given by:
P = objectness × Pi
where objectness is the confidence that the current bbox contains an object, and Pi is the conditional probability of predicting the i-th class within the current bbox.
In post-processing, for a 416×416 image YOLOv3 predicts ((52×52) + (26×26) + (13×13)) × 3 = 10647 bounding boxes. In step two the non-maximum suppression (NMS) algorithm is applied to the output boxes, screening effective boxes with confidence greater than 0.6 out of the many mutually overlapping outputs. In addition, exploiting the particularity of the scene, predictions whose aspect ratio or size fall outside a plausible range are also filtered out, using prior knowledge to reduce network error. Finally, the screened boxes lying inside the ROI set in step one are output.
Step three, using a classifier to judge the direction of the human head detected in the step two:
a CNN classifier is added for judging whether the face faces towards the elevator or not, and the reverse phenomenon can be more accurately judged through the face direction. The method comprises the following steps:
Training the network. The classification network used is ResNet-50 with a training input size of 34×34. The network is trained with transfer learning: the convolutional layers are initialized with weights pretrained on the ILSVRC image dataset, the fully connected layer parameters are initialized randomly, and an effective model is obtained by training.
Detection. The head regions output in step two are cropped, resized to 34×34, normalized, and fed to the network to obtain classification results; a head oriented towards the exit region (ROI) of the escalator is defined as a passenger possibly travelling in the wrong direction.
Step four, tracking each possible retrograde target (head) in the step three by using a multi-target tracking algorithm to obtain a tracking track:
the multi-target tracking algorithm adopts a deppsort algorithm, and a tracker is generated for the head of the person who may move in the wrong direction in the third step to obtain a tracking track. The method comprises the following steps:
and estimating the prediction result by using a Kalman filter, and describing the motion state of the human head by adopting eight state parameters of (x, y, r, h, x ^ y, y ^ r, r ^ h), wherein x and y are the central coordinates of the human head detection frame, r represents the aspect ratio of the human head frame, and h represents the height of the human head frame, wherein x ^ y, y ^, r ^ h represents the variation of x, y, r and h, namely speed information. In which the pedestrian walking is considered to be at a constant speed, the kalman filter of the linear observation model is used here to estimate the motion state of the passenger.
Target association. The DeepSORT algorithm considers motion-information association and appearance-information association simultaneously:
For motion-information association, the Mahalanobis distance between the detected target position and the position predicted by the Kalman filter is used:
d(1)(i,j) = (dj − yi)T Si^(−1) (dj − yi)
where dj is the head position of a possibly wrong-direction passenger detected in step three, yi is the prediction of the i-th Kalman filter from the previous frame, and Si is the covariance matrix between the detected position and the mean tracked position.
For appearance-information association, the picture dj of each possibly wrong-direction head detected in step three is fed into a convolutional neural network to obtain a feature vector rj with ||rj|| = 1; if dj is successfully associated, rj is put into the successfully-associated feature set, which retains the feature vectors of the most recent k successfully associated frames. The distance between each new detection's feature and every feature in the recently associated feature set of each Kalman filter is computed as:
d(2)(i,j) = min{ 1 − rjT·rk(i) : rk(i) ∈ Ri }
where Ri is the feature set retained for the i-th tracker. If the computed distance is below a specified threshold, the association is deemed successful. Finally, a linear weighting of the two association measures is taken as the final association metric:
ci,j = λ·d(1)(i,j) + (1 − λ)·d(2)(i,j)
in addition, a cascade matching method is adopted to give a large priority to a target appearing recently, so that errors caused by position updating uncertainty due to long-time shielding are solved.
Start and end of tracking trajectories. When a track goes 3 consecutive frames without being matched to a detection box, tracking is considered ended. When a detection box matches no target in any tracker, a new target may have appeared; if the Kalman prediction of this new target can be matched to detections for 3 consecutive frames, its tracker is added to the tracking list, otherwise the detection is considered a false positive of the detector and the target is deleted.
Step five, analyzing each track in the step four, and judging whether the passenger has a retrograde motion:
analyzing each track to judge whether the passenger has the behavior of going backwards or not, wherein the process is as follows:
Initialize a record ΔY = 0 for each trajectory, representing the total displacement along the Y axis (parallel to the escalator's direction of installation);
record the pixel difference (Δxi, Δyi) between the tracked head coordinates (xi, yi) in the i-th frame and the previous point (xi−1, yi−1);
update ΔY = ΔY + Δyi and compare ΔY with THΔY (the wrong-direction threshold); if ΔY is greater than THΔY, wrong-direction travel has occurred and a wrong-direction signal is sent.

Claims (10)

1. A deep-learning-based escalator wrong-direction passenger detection method, characterized by comprising the following steps:
step one, acquiring image frames from the monitoring video stream of an escalator, and setting a detection region;
step two, detecting the head positions of passengers in the detection region specified in step one with a target detection algorithm;
step three, judging the orientation of each head detected in step two with a classifier, to decide whether the passenger may be travelling in the wrong direction;
step four, tracking each possibly wrong-direction head target from step three with a multi-target tracking algorithm to obtain a tracking trajectory;
and step five, analysing each trajectory from step four to judge whether the passenger is travelling in the wrong direction.
2. The deep-learning-based escalator passenger reverse-running detection method of claim 1, wherein: in step one, the monitoring video stream comes from a monitoring camera of the escalator, the camera being mounted directly facing the escalator entrance, and a rectangular area corresponding to the escalator exit being set as the detection area.
3. The deep-learning-based escalator passenger reverse-running detection method of claim 1, wherein: in step two, the target detection algorithm trains a human-head detection model by transfer learning, detects the human heads in each video frame, and filters out false detections and targets outside the detection area through post-processing.
4. The deep-learning-based escalator passenger reverse-running detection method of claim 3, wherein step two specifically comprises the following steps:
(2.1) during training, first building the target detection model structure; adopting a transfer-learning method, using a model pre-trained on the ILSVRC image data set as the initial weights of the network backbone, and randomly initializing the remaining parameters; training the network on a public human-head data set, the loss function comprising a category loss term and a bounding-box position regression loss term;
(2.2) during detection, normalizing each picture and inputting it into the trained network to obtain the network output;
and (2.3) post-processing: the network outputs a plurality of bounding boxes, which are processed with the non-maximum suppression (NMS) algorithm, thereby screening, from the many mutually overlapping output boxes, the valid boxes whose confidence exceeds a certain threshold.
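The NMS screening in step (2.3) can be sketched as follows (the score and IoU thresholds are illustrative values, not taken from the patent):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, score_thr=0.5, iou_thr=0.45):
    """Return indices of valid boxes: confidence above score_thr, with
    overlapping lower-scoring boxes suppressed."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return keep
```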
5. The deep-learning-based escalator passenger reverse-running detection method of claim 1, wherein: in step three, a classifier is added to judge the head orientation, i.e., whether the face is oriented toward the escalator, and whether reverse running may be occurring is judged from that orientation.
6. The deep-learning-based escalator passenger reverse-running detection method of claim 5, wherein step three specifically comprises:
(3.1) training a classifier: taking the head-region images output by the target detector as training data, with whether the head faces the escalator direction as the label; training the network by transfer learning, initializing the convolutional layers with weights pre-trained on the ILSVRC image data set and initializing the fully connected layer parameters randomly, finally obtaining an effective model;
and (3.2) detection: cropping the head region output in step two, normalizing the picture, and feeding it to the network to obtain the classification result; a head facing the exit-side detection area of the escalator is defined as a passenger who may be running in reverse.
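As an illustration of step (3.2), the crop-and-normalize preprocessing and the final orientation decision might look like the following (the normalization constants, the 0.5 probability threshold, and all function names are our assumptions):

```python
def crop_and_normalize(frame, box, mean=0.5, std=0.5):
    """frame: H x W grayscale image as nested lists of values in [0, 255];
    box: (x1, y1, x2, y2) head box from the detector.
    Scales the crop to [0, 1] and normalizes it before the classifier."""
    x1, y1, x2, y2 = box
    crop = [row[x1:x2] for row in frame[y1:y2]]
    return [[(v / 255.0 - mean) / std for v in row] for row in crop]

def may_be_retrograde(prob_facing_exit, thr=0.5):
    """A head classified as facing the exit-side detection area is
    flagged as a passenger who may be running in reverse."""
    return prob_facing_exit > thr
```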
7. The deep-learning-based escalator passenger reverse-running detection method of claim 1, wherein: in step four, the multi-target tracking algorithm adopts the DeepSort algorithm, generating a tracker for each head flagged in step three as possibly running in reverse, to obtain tracking tracks.
8. The deep-learning-based escalator passenger reverse-running detection method of claim 7, wherein step four specifically comprises the following steps:
(4.1) estimating a prediction result with a Kalman filter, describing the motion state of the head with eight state parameters (x, y, r, h, ẋ, ẏ, ṙ, ḣ), where x and y are the center coordinates of the head detection box, r is the aspect ratio of the box, h is the height of the box, and ẋ, ẏ, ṙ, ḣ are the rates of change of x, y, r, h, i.e., the velocity information;
(4.2) performing target association, the DeepSort algorithm considering motion-information association and appearance-information association simultaneously;
and (4.3) track initiation and termination: when a track fails to match any detection box for a plurality of consecutive frames, the track is considered ended; when a detection box cannot be matched to the target in any tracker, a new target may have appeared; if the Kalman prediction results of the new target match the detection results over a plurality of consecutive frames, a tracker for the target is added to the tracking list; otherwise the detection is treated as a false positive of the detector and the target is deleted.
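Steps (4.1) and (4.2) can be sketched together as follows (the constant-velocity prediction omits the covariance bookkeeping of a full Kalman filter, and the combination weight `lam` is an illustrative assumption):

```python
# 8-dimensional state: [x, y, r, h, vx, vy, vr, vh], where vx..vh are
# the per-frame rates of change of x, y, r, h (dt = 1 frame).

def predict(state):
    """One constant-velocity prediction step; a full Kalman filter would
    also propagate an 8 x 8 covariance matrix."""
    x, y, r, h, vx, vy, vr, vh = state
    return [x + vx, y + vy, r + vr, h + vh, vx, vy, vr, vh]

def cosine_distance(a, b):
    """Appearance distance between two re-identification feature vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    na = sum(p * p for p in a) ** 0.5
    nb = sum(q * q for q in b) ** 0.5
    return 1.0 - dot / (na * nb)

def association_cost(motion_dist, track_feat, det_feat, lam=0.5):
    """DeepSort-style cost combining motion and appearance association."""
    return lam * motion_dist + (1.0 - lam) * cosine_distance(track_feat, det_feat)
```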
9. The deep-learning-based escalator passenger reverse-running detection method of claim 1, wherein: step five comprises analyzing each track from step four and judging whether the passenger exhibits reverse-running behavior.
10. The deep-learning-based escalator passenger reverse-running detection method of claim 9, wherein the specific process of step five is as follows:
(5.1) initializing a record ΔY = 0 for each track, representing the cumulative displacement along the Y axis;
(5.2) recording the coordinates (x_i, y_i) of the tracked head in the i-th frame image and the pixel difference (Δx_i, Δy_i) relative to the previous point (x_{i-1}, y_{i-1});
(5.3) letting ΔY = ΔY + Δy_i and comparing ΔY with TH_ΔY, where TH_ΔY is the reverse-running threshold; if ΔY is greater than TH_ΔY, reverse running has occurred and a reverse-running signal is sent.
CN201911292323.1A 2019-12-16 2019-12-16 Escalator passenger reverse detection method based on deep learning Active CN111144247B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911292323.1A CN111144247B (en) 2019-12-16 2019-12-16 Escalator passenger reverse detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911292323.1A CN111144247B (en) 2019-12-16 2019-12-16 Escalator passenger reverse detection method based on deep learning

Publications (2)

Publication Number Publication Date
CN111144247A true CN111144247A (en) 2020-05-12
CN111144247B CN111144247B (en) 2023-10-13

Family

ID=70518425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911292323.1A Active CN111144247B (en) 2019-12-16 2019-12-16 Escalator passenger reverse detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN111144247B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582243A (en) * 2020-06-05 2020-08-25 上海商汤智能科技有限公司 Countercurrent detection method, device, electronic equipment and storage medium
CN111860282A (en) * 2020-07-15 2020-10-30 中国电子科技集团公司第三十八研究所 Subway section passenger flow volume statistics and pedestrian retrograde motion detection method and system
CN111986231A (en) * 2020-08-10 2020-11-24 深思考人工智能科技(上海)有限公司 Multi-target tracking method and system
CN112613365A (en) * 2020-12-11 2021-04-06 北京影谱科技股份有限公司 Pedestrian detection and behavior analysis method and device and computing equipment
CN112668432A (en) * 2020-12-22 2021-04-16 上海幻维数码创意科技股份有限公司 Human body detection tracking method in ground interactive projection system based on YoloV5 and Deepsort
CN112785625A (en) * 2021-01-20 2021-05-11 北京百度网讯科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN112801072A (en) * 2021-04-14 2021-05-14 浙江大学 Elevator non-flat-layer door opening fault recognition device and method based on computer vision
CN112800841A (en) * 2020-12-28 2021-05-14 深圳市捷顺科技实业股份有限公司 Pedestrian counting method, device and system and computer readable storage medium
CN112836667A (en) * 2021-02-20 2021-05-25 上海吉盛网络技术有限公司 Method for judging falling and retrograde of passenger on ascending escalator
CN112875481A (en) * 2021-01-13 2021-06-01 深圳英飞拓科技股份有限公司 Operation control method and device of escalator, terminal equipment and medium
CN113256690A (en) * 2021-06-16 2021-08-13 中国人民解放军国防科技大学 Pedestrian multi-target tracking method based on video monitoring
CN113361351A (en) * 2021-05-27 2021-09-07 湖南信达通信息技术有限公司 Image recognition-based retrograde determination method and system
CN113435402A (en) * 2021-07-14 2021-09-24 深圳市比一比网络科技有限公司 Method and system for detecting non-civilized behavior of train compartment
CN113723372A (en) * 2021-11-01 2021-11-30 北京卓建智菡科技有限公司 Prompting method and device, computer equipment and computer readable storage medium
CN114120210A (en) * 2022-01-29 2022-03-01 通号通信信息集团有限公司 Pedestrian detection method, electronic device, and computer-readable medium
CN117315550A (en) * 2023-11-29 2023-12-29 南京市特种设备安全监督检验研究院 Detection method for dangerous behavior of escalator passengers

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014028667A (en) * 2012-07-31 2014-02-13 West Japan Railway Co Escalator monitoring system
CN106127148A (en) * 2016-06-21 2016-11-16 华南理工大学 A kind of escalator passenger's unusual checking algorithm based on machine vision
CN106503632A (en) * 2016-10-10 2017-03-15 南京理工大学 A kind of escalator intelligent and safe monitoring method based on video analysis
CN107220992A (en) * 2017-06-16 2017-09-29 华南理工大学 With reference to machine vision and the escalator floor plates video frequency monitoring method of infrared array
WO2019025872A2 (en) * 2018-11-26 2019-02-07 Wasfi Alshdaifat Autonomous city transportation means with artificial telepathy
CN109522793A (en) * 2018-10-10 2019-03-26 华南理工大学 More people's unusual checkings and recognition methods based on machine vision
CN110532852A (en) * 2019-07-09 2019-12-03 长沙理工大学 Subway station pedestrian's accident detection method based on deep learning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NICOLAI WOJKE: "Simple Online and Realtime Tracking with a Deep Association Metric", pages 1-5 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582243A (en) * 2020-06-05 2020-08-25 上海商汤智能科技有限公司 Countercurrent detection method, device, electronic equipment and storage medium
CN111582243B (en) * 2020-06-05 2024-03-26 上海商汤智能科技有限公司 Countercurrent detection method, countercurrent detection device, electronic equipment and storage medium
CN111860282A (en) * 2020-07-15 2020-10-30 中国电子科技集团公司第三十八研究所 Subway section passenger flow volume statistics and pedestrian retrograde motion detection method and system
CN111986231A (en) * 2020-08-10 2020-11-24 深思考人工智能科技(上海)有限公司 Multi-target tracking method and system
CN112613365A (en) * 2020-12-11 2021-04-06 北京影谱科技股份有限公司 Pedestrian detection and behavior analysis method and device and computing equipment
CN112668432A (en) * 2020-12-22 2021-04-16 上海幻维数码创意科技股份有限公司 Human body detection tracking method in ground interactive projection system based on YoloV5 and Deepsort
CN112800841B (en) * 2020-12-28 2024-05-17 深圳市捷顺科技实业股份有限公司 Pedestrian counting method, device and system and computer readable storage medium
CN112800841A (en) * 2020-12-28 2021-05-14 深圳市捷顺科技实业股份有限公司 Pedestrian counting method, device and system and computer readable storage medium
CN112875481A (en) * 2021-01-13 2021-06-01 深圳英飞拓科技股份有限公司 Operation control method and device of escalator, terminal equipment and medium
CN112785625A (en) * 2021-01-20 2021-05-11 北京百度网讯科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN112785625B (en) * 2021-01-20 2023-09-22 北京百度网讯科技有限公司 Target tracking method, device, electronic equipment and storage medium
CN112836667B (en) * 2021-02-20 2022-11-15 上海吉盛网络技术有限公司 Method for judging falling and reverse running of passengers going upstairs escalator
CN112836667A (en) * 2021-02-20 2021-05-25 上海吉盛网络技术有限公司 Method for judging falling and retrograde of passenger on ascending escalator
CN112801072A (en) * 2021-04-14 2021-05-14 浙江大学 Elevator non-flat-layer door opening fault recognition device and method based on computer vision
CN113361351A (en) * 2021-05-27 2021-09-07 湖南信达通信息技术有限公司 Image recognition-based retrograde determination method and system
CN113256690A (en) * 2021-06-16 2021-08-13 中国人民解放军国防科技大学 Pedestrian multi-target tracking method based on video monitoring
CN113256690B (en) * 2021-06-16 2021-09-17 中国人民解放军国防科技大学 Pedestrian multi-target tracking method based on video monitoring
CN113435402A (en) * 2021-07-14 2021-09-24 深圳市比一比网络科技有限公司 Method and system for detecting non-civilized behavior of train compartment
CN113723372B (en) * 2021-11-01 2022-01-18 北京卓建智菡科技有限公司 Prompting method and device, computer equipment and computer readable storage medium
CN113723372A (en) * 2021-11-01 2021-11-30 北京卓建智菡科技有限公司 Prompting method and device, computer equipment and computer readable storage medium
CN114120210A (en) * 2022-01-29 2022-03-01 通号通信信息集团有限公司 Pedestrian detection method, electronic device, and computer-readable medium
CN117315550A (en) * 2023-11-29 2023-12-29 南京市特种设备安全监督检验研究院 Detection method for dangerous behavior of escalator passengers
CN117315550B (en) * 2023-11-29 2024-02-23 南京市特种设备安全监督检验研究院 Detection method for dangerous behavior of escalator passengers

Also Published As

Publication number Publication date
CN111144247B (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN111144247B (en) Escalator passenger reverse detection method based on deep learning
CN112257557B (en) High-altitude parabolic detection and identification method and system based on machine vision
CN110765964B (en) Method for detecting abnormal behaviors in elevator car based on computer vision
KR101375583B1 (en) Object Density Estimation in Video
Faro et al. Adaptive background modeling integrated with luminosity sensors and occlusion processing for reliable vehicle detection
US9569531B2 (en) System and method for multi-agent event detection and recognition
CN111260693B (en) High-altitude parabolic detection method
US20060245618A1 (en) Motion detection in a video stream
EP2093698A1 (en) Crowd congestion analysis
CN111401311A (en) High-altitude parabolic recognition method based on image detection
CN107977646B (en) Partition delivery detection method
US8599261B1 (en) Vision-based car counting for multi-story carparks
Sharma Human detection and tracking using background subtraction in visual surveillance
CN114926422B (en) Method and system for detecting passenger flow of getting on and off vehicles
Ahmad et al. Robust background subtraction based person’s counting from overhead view
CN110713082B (en) Elevator control method, system, device and storage medium
CN106056078A (en) Crowd density estimation method based on multi-feature regression ensemble learning
JP7125843B2 (en) Fault detection system
JP5110246B2 (en) Fall detection device, program, fall detection method, and fall detection system
Liu et al. Metro passenger flow statistics based on yolov3
KR20030018487A (en) Method and apparatus for counting the number of entering people at the gate using image
Wickramasinghe et al. Pedestrian Detection, Tracking, Counting, Waiting Time Calculation and Trajectory Detection for Pedestrian Crossings Traffic light systems
CN113223081A (en) High-altitude parabolic detection method and system based on background modeling and deep learning
CN112580633A (en) Public transport passenger flow statistical device and method
Hradiš et al. Real-time tracking of participants in meeting video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant