CN111382705A

CN111382705A - Reverse behavior detection method and device, electronic equipment and readable storage medium

Info

Publication number: CN111382705A
Application number: CN202010164016.1A
Authority: CN
Inventors: 黄泽; 胡太祥; 王梦秋
Original assignee: Alnnovation Guangzhou Technology Co ltd
Current assignee: Alnnovation Guangzhou Technology Co ltd
Priority date: 2020-03-10
Filing date: 2020-03-10
Publication date: 2020-07-07

Abstract

The application provides a retrograde behavior detection method and device, electronic equipment and a readable storage medium, and relates to the technical field of security. The method comprises the following steps: acquiring n frames of video images of the escalator to be detected; taking i from 2 to n in turn, determining a target tracking frame of a target object in the ith frame of the n frames of video images, thereby obtaining a plurality of target tracking frames; tracking the target object in the n frames of video images by using the plurality of target tracking frames to obtain a motion trajectory of the target object in the n frames of video images; and determining, based on the motion trajectory, whether the target object exhibits retrograde behavior on the escalator to be detected. The scheme requires no human participation in detection and can greatly reduce the consumption of human resources.

Description

Reverse behavior detection method and device, electronic equipment and readable storage medium

Technical Field

The present application relates to the field of security technologies, and in particular, to a method and an apparatus for detecting a retrograde motion behavior, an electronic device, and a readable storage medium.

Background

Escalators play an important role in public places such as shopping malls, subways and the entrances and exits of railway stations. In normal use no dedicated staff are assigned to supervise an escalator, so some pedestrians walk against its direction of travel for fun. Going the wrong way on an escalator can cause the pedestrian to fall and lead to a serious safety accident. To avoid this, in the prior art a worker generally checks through video monitoring whether a pedestrian is moving against the escalator's direction of travel, but this approach consumes a large amount of human resources.

Disclosure of Invention

An object of the embodiments of the present application is to provide a retrograde behavior detection method and device, an electronic device, and a readable storage medium, so as to solve the problem in the prior art that manually monitoring whether a pedestrian on an escalator exhibits retrograde behavior consumes considerable human resources.

In a first aspect, an embodiment of the present application provides a retrograde behavior detection method, where the method includes: acquiring n frames of video images of the escalator to be detected; taking i from 2 to n in turn, determining a target tracking frame of a target object in the ith frame of the n frames of video images, thereby obtaining a plurality of target tracking frames in total, wherein the ith target tracking frame is the tracking frame of optimal scale determined from target tracking frames of different scales, which are obtained by scaling the target tracking frame of the target object in the (i-x)th frame of video image at different scales, each target tracking frame indicates the area where the target object is located in the video image, n is an integer greater than or equal to 2, and x is an integer greater than or equal to 1; tracking the target object in the n frames of video images by using the plurality of target tracking frames to obtain a motion trajectory of the target object in the n frames of video images; and determining, based on the motion trajectory, whether the target object exhibits retrograde behavior on the escalator to be detected.

In the implementation process, the target tracking frame with the optimal scale of the target object in each frame of video image is obtained based on the target tracking frames with different scales, so that in the process of tracking the target object, the target tracking frame with the optimal scale can better adapt to the size change of the target object in the video image, and the target object can be effectively tracked, thereby accurately obtaining the motion track of the target object, automatically determining whether the target object has a retrograde motion behavior based on the motion track without human participation in detection, and greatly reducing the consumption of human resources.

Optionally, the target object includes a head, a shoulder and a body of a target pedestrian, and the determining the target tracking frame of the target object in the ith frame of the n frames of video images to obtain a plurality of target tracking frames includes:

determining a head tracking frame of the head in an ith frame of video images in the n frames of video images to obtain a plurality of head tracking frames;

determining a shoulder tracking frame of the shoulder in the ith frame of video image in the n frames of video images to obtain a plurality of shoulder tracking frames;

determining a body tracking frame of the body in the ith frame of video image in the n frames of video images to obtain a plurality of body tracking frames;

the tracking the target object in the n frames of video images by using the plurality of target tracking frames to obtain the motion trail of the target object in the n frames of video images includes:

tracking the head of the target pedestrian in the n frames of video images by using the plurality of head tracking frames to obtain a head motion track of the head of the target pedestrian in the n frames of video images;

tracking the shoulder of the target pedestrian in the n frames of video images by using the plurality of shoulder tracking frames to obtain the shoulder motion track of the shoulder of the target pedestrian in the n frames of video images;

tracking the body of the target pedestrian in the n frames of video images by using the plurality of body tracking frames to obtain a body motion track of the body of the target pedestrian in the n frames of video images;

and obtaining the motion trail of the target pedestrian in the n frames of video images based on the head motion trail, the shoulder motion trail and the body motion trail.

In the implementation process, the target object is divided into the head, the shoulder and the body, which are tracked separately. This effectively avoids losing the target object when some parts are occluded, so the motion trajectory of the target object can be obtained more accurately.

Optionally, the obtaining a motion trajectory of the target pedestrian in the n-frame video images based on the head motion trajectory, the shoulder motion trajectory and the body motion trajectory includes:

determining a complete motion trajectory from the head motion trajectory, the shoulder motion trajectory and the body motion trajectory;

and taking the complete motion track as the motion track of the target pedestrian in the n frames of video images.

In the implementation process, the complete motion track is taken as the motion track of the target pedestrian, so that the situation that the motion track obtained by the target pedestrian in some video images is incomplete due to occlusion can be effectively avoided.
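The selection of a complete trajectory can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes that a trajectory is a list with one entry per frame, that an occluded frame is recorded as `None`, and that "complete" means no entry is missing. The function name `pick_complete_trajectory` and the dictionary keys are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) position of a tracked part in one frame


def pick_complete_trajectory(
    trajectories: Dict[str, List[Optional[Point]]],
) -> Optional[Tuple[str, List[Point]]]:
    """Return the first part trajectory that has a point in every frame.

    Assumption: a frame in which the part was occluded is stored as None.
    Returns None when every part trajectory has at least one gap.
    """
    for part, track in trajectories.items():
        if track and all(point is not None for point in track):
            return part, [point for point in track if point is not None]
    return None


# Head and body are each occluded in one frame; the shoulder track is complete.
tracks = {
    "head": [(50.0, 20.0), None, (50.0, 26.0)],
    "shoulder": [(50.0, 40.0), (50.0, 43.0), (50.0, 46.0)],
    "body": [(50.0, 80.0), (50.0, 83.0), None],
}
chosen = pick_complete_trajectory(tracks)
```

In this example the shoulder trajectory would be used as the pedestrian's motion trajectory, because it is the only one without a gap.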

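Optionally, the head motion trajectory includes head trajectory coordinates of the head in each frame of video image, the shoulder motion trajectory includes shoulder trajectory coordinates of the shoulder in each frame of video image, the body motion trajectory includes body trajectory coordinates of the body in each frame of video image, and the obtaining the motion trajectory of the target pedestrian in the n frames of video images based on the head motion trajectory, the shoulder motion trajectory and the body motion trajectory includes: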

acquiring a track coordinate mean value in each frame of video image based on the head track coordinate, the shoulder track coordinate and the body track coordinate;

and generating the motion track of the target pedestrian in the n frames of video images based on the track coordinate mean value.

In the implementation process, the motion trail of the target pedestrian can be better embodied by the track coordinate mean value through acquiring the track coordinate mean value of the head track coordinate, the shoulder track coordinate and the body track coordinate, so that the motion trail of the target pedestrian can be more accurately obtained.
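The per-frame coordinate mean can be illustrated with a short sketch. It assumes each part trajectory is a list of (x, y) coordinates with one entry per frame and that all three lists have the same length; the function name `mean_trajectory` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def mean_trajectory(
    head: List[Point], shoulder: List[Point], body: List[Point]
) -> List[Point]:
    """Average the head, shoulder and body coordinates frame by frame."""
    if not (len(head) == len(shoulder) == len(body)):
        raise ValueError("all three trajectories must cover the same frames")
    return [
        ((hx + sx + bx) / 3.0, (hy + sy + by) / 3.0)
        for (hx, hy), (sx, sy), (bx, by) in zip(head, shoulder, body)
    ]


# Two frames in which the pedestrian moves 3 pixels down the image.
head_track = [(10.0, 10.0), (10.0, 13.0)]
shoulder_track = [(10.0, 20.0), (10.0, 23.0)]
body_track = [(10.0, 30.0), (10.0, 33.0)]
pedestrian_track = mean_trajectory(head_track, shoulder_track, body_track)
```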

Optionally, the determining a target tracking frame of a target object in an i-th frame of the n frames of video images includes:

determining a target tracking frame of a target object in the i-x frame video image;

zooming the target tracking frame of the target object in the i-x frame video image at different scales to obtain a plurality of target tracking frames at different scales;

and determining a tracking frame with the optimal scale of the target object in the ith frame of video image based on the plurality of target tracking frames with different scales.

In the implementation process, the target tracking frame with the optimal scale of the target object in each frame of video image is obtained based on the target tracking frames with different scales, so that in the process of tracking the target object, the target tracking frame with the optimal scale can better adapt to the size change of the target object in the video image, and the target object can be effectively tracked.

Optionally, the determining a tracking frame of an optimal scale of the target object in the ith frame of video image based on the plurality of target tracking frames of different scales includes:

calculating, through a kernelized correlation filter (KCF) model, the response value obtained when each of the target tracking frames of different scales is matched against the filter, to obtain the response value corresponding to the target tracking frame of each scale;

determining the maximum response value in the response values corresponding to the target tracking frames of all scales;

and taking the target tracking frame of the scale corresponding to the maximum response value as the target tracking frame of the optimal scale of the target object in the ith frame of video image.

In the implementation process, the KCF model performs well at target tracking, so the target tracking frame of optimal scale obtained through the KCF model adapts better to changes in the size of the target object, and the target object can be tracked effectively.
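The selection step (compute a response for each candidate scale, then keep the candidate with the maximum response) can be sketched as below. The KCF response itself is not implemented here: `response_fn` is a hypothetical callable standing in for it, and `toy_response` is an invented scoring function used only to make the example runnable. Boxes are assumed to be (center x, center y, width, height) tuples.

```python
from typing import Callable, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, width, height), an assumed layout


def select_optimal_scale(
    candidates: Sequence[Box], response_fn: Callable[[Box], float]
) -> Tuple[Box, float]:
    """Return the candidate box with the largest response, and that response."""
    if not candidates:
        raise ValueError("at least one candidate box is required")
    responses = [response_fn(box) for box in candidates]
    best_index = max(range(len(responses)), key=responses.__getitem__)
    return candidates[best_index], responses[best_index]


def toy_response(box: Box, true_width: float = 50.0) -> float:
    """Invented stand-in for a filter response: peaks when the width matches."""
    return 1.0 / (1.0 + abs(box[2] - true_width))


candidate_boxes = [(0.0, 0.0, 40.0, 80.0), (0.0, 0.0, 48.0, 96.0), (0.0, 0.0, 60.0, 120.0)]
best_box, best_response = select_optimal_scale(candidate_boxes, toy_response)
```

In a real system the response function would come from the trained correlation filter; only the argmax structure shown here is taken from the text.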

Optionally, the determining whether the target object has a retrograde motion behavior in the escalator to be detected based on the motion trajectory includes:

and determining whether the target object has a retrograde motion behavior in the escalator to be detected or not through a classifier based on the motion trail.

In the implementation process, the retrograde behavior of the target object is identified through the classifier, so that the method is more convenient and efficient.
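The text does not specify the classifier at this point, so the following is only a hypothetical stand-in that shows what a decision on a motion trajectory could look like. It is a simple geometric rule, not a trained classifier: the net displacement of the trajectory is projected onto an assumed escalator travel direction, and a sufficiently negative projection is treated as retrograde. The names `is_retrograde`, `escalator_direction` and `min_distance` are all assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def is_retrograde(
    trajectory: List[Point],
    escalator_direction: Tuple[float, float],
    min_distance: float = 0.0,
) -> bool:
    """Rule-based stand-in for the classifier (not the application's method).

    Returns True when the net displacement along the escalator's travel
    direction is negative by more than min_distance (in pixels).
    """
    if len(trajectory) < 2:
        return False
    dx, dy = escalator_direction
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        raise ValueError("escalator_direction must be a non-zero vector")
    (x0, y0), (x1, y1) = trajectory[0], trajectory[-1]
    projected = ((x1 - x0) * dx + (y1 - y0) * dy) / norm
    return projected < -min_distance


# Assumed setup: the escalator carries people toward the top of the image (0, -1).
going_down = [(100.0, 300.0), (100.0, 320.0), (100.0, 350.0)]
going_up = [(100.0, 300.0), (100.0, 250.0)]
```

A learned classifier would replace this rule with a model trained on labelled trajectories; the input (a trajectory) and the output (retrograde or not) stay the same.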

In a second aspect, an embodiment of the present application provides a retrograde behavior detection apparatus, where the apparatus includes:

the video image acquisition module is used for acquiring n frames of video images of the escalator to be detected;

a target tracking frame determining module, configured to take i from 2 to n in turn and determine a target tracking frame of a target object in the ith frame of the n frames of video images, thereby obtaining a plurality of target tracking frames in total, where the ith target tracking frame is the tracking frame of optimal scale determined from target tracking frames of different scales, which are obtained by scaling the target tracking frame of the target object in the (i-x)th frame of video image at different scales, each target tracking frame indicates the area where the target object is located in the video image, n is an integer greater than or equal to 2, and x is an integer greater than or equal to 1;

a motion track obtaining module, configured to track the target object in the n frames of video images by using the multiple target tracking frames, and obtain a motion track of the target object in the n frames of video images;

and the retrograde motion behavior detection module is used for determining whether the target object has retrograde motion behavior in the escalator to be detected based on the motion trail.

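Optionally, the target object includes a head, a shoulder and a body of a target pedestrian, and the target tracking frame determination module is configured to: determine a head tracking frame of the head in the ith frame of the n frames of video images to obtain a plurality of head tracking frames; determine a shoulder tracking frame of the shoulder in the ith frame of the n frames of video images to obtain a plurality of shoulder tracking frames; and determine a body tracking frame of the body in the ith frame of the n frames of video images to obtain a plurality of body tracking frames;

the motion trail obtaining module is configured to: track the head of the target pedestrian in the n frames of video images by using the plurality of head tracking frames to obtain a head motion trajectory; track the shoulder of the target pedestrian in the n frames of video images by using the plurality of shoulder tracking frames to obtain a shoulder motion trajectory; track the body of the target pedestrian in the n frames of video images by using the plurality of body tracking frames to obtain a body motion trajectory; and obtain the motion trajectory of the target pedestrian in the n frames of video images based on the head motion trajectory, the shoulder motion trajectory and the body motion trajectory.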

Optionally, the motion trajectory acquiring module is configured to determine a complete motion trajectory from the head motion trajectory, the shoulder motion trajectory, and the body motion trajectory; and taking the complete motion track as the motion track of the target pedestrian in the n frames of video images.

Optionally, the head movement track includes head track coordinates of the head in each frame of video image, the shoulder movement track includes shoulder track coordinates of the shoulder in each frame of video image, the body movement track includes body track coordinates of the body in each frame of video image, and the movement track acquiring module is configured to acquire a track coordinate mean value in each frame of video image based on the head track coordinates, the shoulder track coordinates, and the body track coordinates; and generating the motion track of the target pedestrian in the n frames of video images based on the track coordinate mean value.

Optionally, the target tracking frame determining module is further configured to determine a target tracking frame of a target object in the i-x frame video image; zooming the target tracking frame of the target object in the i-x frame video image at different scales to obtain a plurality of target tracking frames at different scales; and determining a tracking frame with the optimal scale of the target object in the ith frame of video image based on the plurality of target tracking frames with different scales.

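Optionally, the target tracking frame determining module is further configured to: calculate, through a kernelized correlation filter (KCF) model, the response value obtained when each of the target tracking frames of different scales is matched against the filter, to obtain the response value corresponding to the target tracking frame of each scale; determine the maximum response value among the response values corresponding to the target tracking frames of all scales; and take the target tracking frame of the scale corresponding to the maximum response value as the target tracking frame of optimal scale of the target object in the ith frame of video image.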

Optionally, the retrograde motion behavior detection module is configured to determine, by the classifier, whether a retrograde motion behavior exists in the escalator to be detected for the target object based on the motion trajectory.

In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, the steps in the method as provided in the first aspect are executed.

In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps in the method as provided in the first aspect.

Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.

Fig. 1 is a schematic structural diagram of an electronic device for executing a retrograde behavior detection method according to an embodiment of the present disclosure;

Fig. 2 is a flowchart of a retrograde behavior detection method according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a target tracking frame corresponding to a target object in a video image according to an embodiment of the present disclosure;

Fig. 4 is a schematic diagram of a KCF model provided in an embodiment of the present application;

Fig. 5 is a schematic diagram of a target tracking frame corresponding to each part of a target object in a video image according to an embodiment of the present disclosure;

Fig. 6 is a block diagram of a retrograde behavior detection apparatus according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.

The embodiment of the application provides a retrograde behavior detection method, which obtains a target tracking frame with an optimal scale of a target object in each frame of video image through target tracking frames with different scales, so that the target tracking frame with the optimal scale can better adapt to the size change of the target object in the video image in the process of tracking the target object, thereby facilitating the realization of effective tracking of the target object, accurately obtaining the motion track of the target object, automatically determining whether the target object has retrograde behavior based on the motion track without human participation in detection, and greatly reducing the consumption of human resources.

Referring to fig. 1, fig. 1 is a schematic structural diagram of an electronic device for executing a retrograde behavior detection method according to an embodiment of the present disclosure. The electronic device may include at least one processor 110 (such as a CPU), at least one communication interface 120, at least one memory 130, and at least one communication bus 140. The communication bus 140 is used for realizing direct connection communication among these components. The communication interface 120 of the device in the embodiment of the present application is used for signaling or data communication with other node devices. The memory 130 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). Optionally, the memory 130 may be at least one memory device located remotely from the aforementioned processor. The memory 130 stores computer-readable instructions, and when the computer-readable instructions are executed by the processor 110, the electronic device executes the method shown in fig. 2 below. For example, the memory 130 may be configured to store the tracking frames of different scales corresponding to the target object, the plurality of tracking frames of optimal scale, and the motion trajectory of the target object; the processor 110 may be configured to track the target object by using the tracking frames of optimal scale, obtain the motion trajectory of the target object, and then determine, based on the motion trajectory, whether the target object exhibits retrograde behavior on the escalator to be detected.

It will be appreciated that the configuration shown in fig. 1 is merely illustrative and that the electronic device may also include more or fewer components than shown in fig. 1 or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.

Referring to fig. 2, fig. 2 is a flowchart of a retrograde behavior detection method according to an embodiment of the present application, where the method includes the following steps:

Step S110: acquiring n frames of video images of the escalator to be detected.

The escalator to be detected may be a passenger escalator in any scene, such as a shopping mall, a subway or a pedestrian overpass. To ensure the safety of pedestrians on the escalator, behavior detection needs to be performed on each object on the escalator, for example to detect whether any object exhibits the dangerous behavior of going against the direction of travel.

To detect pedestrian behavior effectively, the escalator to be detected can be filmed by a camera to obtain a video stream or a plurality of images of the escalator, and the behavior of each object in the video stream or images is then detected by image processing. For example, the video stream of the escalator may be obtained first and then split into frames to obtain n frames of video images, where n is an integer greater than or equal to 2. For instance, a rate of 25 frames per second may be set, so that the video stream is divided by time and 25 images are taken from each second, yielding a plurality of frames of video images. The frame rate may of course be chosen according to actual requirements; for example, 1 frame per second or 5 frames per second may be set instead.
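The framing step can be illustrated with a small sketch that works out which frames of a stream to keep for a chosen sampling rate. It does not read a real video; `sample_frame_indices` and its parameters are hypothetical names, and the even-spacing rule is an assumption about how the frames would be picked.

```python
from typing import List


def sample_frame_indices(
    total_frames: int, source_fps: float, target_fps: float
) -> List[int]:
    """Return the 0-based indices of the frames kept when sampling at target_fps."""
    if total_frames < 0 or source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame count must be >= 0 and rates must be positive")
    if target_fps >= source_fps:
        # Cannot sample faster than the source: keep every frame.
        return list(range(total_frames))
    step = source_fps / target_fps  # source frames per kept frame
    indices: List[int] = []
    k = 0
    while int(k * step) < total_frames:
        indices.append(int(k * step))
        k += 1
    return indices


# Two seconds of a 25 frames-per-second stream, sampled at three different rates.
all_frames = sample_frame_indices(50, 25.0, 25.0)
five_per_second = sample_frame_indices(50, 25.0, 5.0)
one_per_second = sample_frame_indices(50, 25.0, 1.0)
```

A lower sampling rate gives a smaller n for the same clip, which trades temporal resolution for less computation in the later tracking steps.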

Step S120: taking i from 2 to n in turn, determining the target tracking frame of the target object in the ith frame of the n frames of video images, thereby obtaining a plurality of target tracking frames.

In the embodiment of the present application, the target object may be a pedestrian or another movable object, such as a dog. There may be one or more target objects, and the user may set the type and/or number of target objects according to actual needs. When there are a plurality of target objects, whether each of them exhibits retrograde behavior may be detected according to the retrograde behavior detection method provided in the embodiment of the present application.

To track the target object, the target object may be identified first. For example, the target object may first be detected in the first of the n frames of video images; if the n frames of video images comprise 100 frames in total, the first frame of video image is the first of those 100 frames. Detecting the target object in the first frame allows it to be tracked in the subsequent frames of video images.

In some embodiments, a machine learning algorithm may be used to detect the target object in the first frame of video image. If the target object is a pedestrian, the pedestrian may be identified based on feature information of the pedestrian, such as features of the face, clothes and limbs. The machine learning algorithm may be the Viola-Jones face detection algorithm, a Deformable Part Model (DPM) algorithm, or the like. A deep learning algorithm may also be used to detect the target object in the first frame of video image, such as a Fast Regions with Convolutional Neural Network features (Fast R-CNN) algorithm, a Single Shot MultiBox Detector (SSD) algorithm, a Feature Pyramid Network (FPN) algorithm, a You Only Look Once (YOLO) algorithm, or a YOLOv3 algorithm.

Here i is greater than or equal to 1 and less than or equal to n, and when i is equal to 1 the ith frame of video image is the first frame of video image. After the target object in the first frame of video image is detected, a target tracking frame (that is, a bounding box) corresponding to the target object can be generated automatically; this target tracking frame indicates the area where the target object is located in the first frame of video image. More generally, for each frame of video image, the target tracking frame indicates the area where the target object is located in that video image. To track the target object better, the target tracking frame may completely enclose the corresponding features of the target object; if the target object is a pedestrian, the target tracking frame may enclose all features of the pedestrian, as shown in fig. 3.

After the target object in the first frame of video image is detected, the electronic device can automatically generate a target tracking frame corresponding to the target object according to the size and the position of the target object in the first frame of video image.

In addition, as the escalator to be detected moves, the target object moves with it, so the size of the target object in the video stream changes. To adapt to this size change and track the target object effectively, the target tracking frame also needs to be scaled at different scales to obtain a plurality of target tracking frames of different scales. That is, the ith target tracking frame is the tracking frame of optimal scale determined from target tracking frames of different scales, which are obtained by scaling the target tracking frame of the target object in the (i-x)th frame of video image; the ith target tracking frame is thus the tracking frame of optimal scale for the target object in the ith frame of video image.

It can be understood that scaling the target tracking frame at different scales may refer to enlarging or reducing the size of the target tracking frame by different factors, for example enlarging it by a factor of 2, enlarging it by a factor of 2.5, reducing it by a factor of 2, and reducing it by a factor of 2.5, so that four target tracking frames of different scales are obtained.
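Generating the differently scaled candidates can be sketched as follows. The sketch assumes that the four example factors are an enlargement by 2 and by 2.5 and a reduction by 2 and by 2.5 (multipliers 2.0, 2.5, 0.5 and 0.4), that a box is stored as (center x, center y, width, height), and that scaling keeps the center fixed. None of these choices is stated in the text, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, width, height), an assumed layout


def scale_box(box: Box, factor: float) -> Box:
    """Scale the width and height by factor while keeping the center fixed."""
    if factor <= 0:
        raise ValueError("scale factor must be positive")
    cx, cy, width, height = box
    return (cx, cy, width * factor, height * factor)


def scaled_candidates(
    box: Box, factors: Sequence[float] = (2.0, 2.5, 0.5, 0.4)
) -> List[Box]:
    """Return one candidate tracking frame per scale factor."""
    return [scale_box(box, factor) for factor in factors]


previous_box = (100.0, 200.0, 40.0, 80.0)
candidates = scaled_candidates(previous_box)
```

Factors this large are taken directly from the example in the text; a tracker that updates every frame might use values much closer to 1.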

It should be noted that the target tracking frame may have any shape, such as a rectangle, a polygon, a triangle, and the like, but for convenience of subsequent calculation, in the embodiment of the present application, the target tracking frame may be understood as a rectangle.

To obtain the ith target tracking frame, the target tracking frame of the target object in the (i-x)th frame of video image may be determined first; that target tracking frame is then scaled at different scales to obtain a plurality of target tracking frames of different scales, and the tracking frame of optimal scale for the target object in the ith frame of video image is determined from them.

For example, in some embodiments, x is 1. When i is 2, to obtain the target tracking frame of the target object in the 2nd frame of video image, the target tracking frame of the target object in the 1st frame of video image may be scaled at different scales to obtain a plurality of target tracking frames of different scales; the target tracking frame of optimal scale for the target object in the 2nd frame is then determined from them, which gives the target tracking frame of the target object in the 2nd frame of video image.

When i is 3, to obtain the target tracking frame of the target object in the 3rd frame of video image, the target tracking frame of the target object in the 2nd frame of video image is scaled at different scales to obtain a plurality of target tracking frames of different scales; the target tracking frame of optimal scale for the target object in the 3rd frame is then determined from them, which gives the target tracking frame of the target object in the 3rd frame of video image. The target tracking frames of the target object in the other frames of video images can be obtained in the same manner.

When x is equal to 1, the ith frame of video image is the frame immediately following the (i-x)th frame of video image. Because the size of the target object may change in the next frame, the target tracking frame of optimal scale is chosen precisely to adapt to that change: if the target object becomes smaller in the next frame, the target tracking frame of optimal scale will be a reduced target tracking frame. In other words, the target tracking frame can be updated in real time so that it adapts well to the size change of the target object and tracks the target object better.

For another example, in some embodiments, x is 2. When i is 3, to obtain the target tracking frame of the target object in the 3rd frame of video image, the target tracking frame of the target object in the 1st frame of video image may be scaled at different scales to obtain a plurality of target tracking frames of different scales; the target tracking frame of optimal scale for the target object in the 3rd frame is then determined from them, which gives the target tracking frame of the target object in the 3rd frame of video image. In this case, the target tracking frame of the target object in the 2nd frame of video image may be the same as that in the 1st frame of video image, that is, the target tracking frames of the target object in the 1st and 2nd frames of video images are consistent.
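The two examples (x equal to 1 and x equal to 2) follow the same loop, sketched below. This is an illustration under assumptions: frames are numbered from 1, the frame for frame i is recomputed from the frame of frame i-x whenever (i-1) is a multiple of x and is otherwise carried over from frame i-1, and `update_fn` is a hypothetical callable that stands in for the scale search.

```python
from typing import Callable, Dict, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, width, height), an assumed layout


def track_with_interval(
    first_box: Box, n: int, x: int, update_fn: Callable[[Box, int], Box]
) -> Dict[int, Box]:
    """Return the tracking frame used for each video frame 1..n.

    update_fn(previous_box, i) is assumed to return the optimal-scale
    tracking frame for video frame i, given the frame from video frame i-x.
    """
    if n < 2 or x < 1:
        raise ValueError("n must be >= 2 and x must be >= 1")
    boxes: Dict[int, Box] = {1: first_box}
    for i in range(2, n + 1):
        if (i - 1) % x == 0:
            boxes[i] = update_fn(boxes[i - x], i)  # re-estimate from frame i-x
        else:
            boxes[i] = boxes[i - 1]  # keep the most recent tracking frame
    return boxes


def halve(box: Box, frame_index: int) -> Box:
    """Toy update used only for the example: the target shrinks by half."""
    cx, cy, width, height = box
    return (cx, cy, width / 2.0, height / 2.0)


every_frame = track_with_interval((0.0, 0.0, 64.0, 128.0), n=3, x=1, update_fn=halve)
every_other = track_with_interval((0.0, 0.0, 64.0, 128.0), n=5, x=2, update_fn=halve)
```

With x equal to 2 the scale search runs for frames 3 and 5 only, while frames 2 and 4 reuse the preceding tracking frame, which matches the second example above.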

It is to be understood that the size of the target object may not change significantly in a short time, so calculating a target tracking frame of optimal scale for every frame of video image may involve a large amount of computation. To reduce the amount of computation, x may take a value greater than 1, in which case the (i-x)th frame of video image is separated from the ith frame of video image by several frames.

For example, in some embodiments the n frames of video images may be grouped. If the n frames comprise 100 frames of video images, the 100 frames are divided into 10 groups of 10 frames each, and each group of 10 frames has only one target tracking frame of optimal scale. For the first group, the target object in the first frame of video image is detected and its target tracking frame is generated; that target tracking frame is scaled at different scales to obtain a plurality of target tracking frames of different scales, and the target tracking frame of optimal scale in the next frame of video image is determined from them. Because the size of the target object may change little within these 10 frames, the target tracking frame of optimal scale may remain unchanged for the subsequent 9 frames, and the target object in those 9 frames of the first group is tracked using that target tracking frame.

Then, for the second group of video images, the target tracking frame of optimal scale obtained in the first group is scaled at different scales to obtain a new target tracking frame of optimal scale, which is used to track the target object in the 10 frames of the second group. The target tracking frames of optimal scale for the subsequent groups are determined in the same way. That is, the target tracking frame of optimal scale is updated once per group of video images rather than for every frame of video image, which reduces the amount of data processing.
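The grouping described above can be sketched as a small helper that splits frame numbers 1..n into consecutive groups, so that the scale search is run once per group. The function name `split_into_groups` is hypothetical, and letting the last group be shorter when n is not a multiple of the group size is an assumption.

```python
from typing import List


def split_into_groups(n: int, group_size: int) -> List[List[int]]:
    """Split frame numbers 1..n into consecutive groups of group_size frames."""
    if n < 1 or group_size < 1:
        raise ValueError("n and group_size must be positive")
    return [
        list(range(start, min(start + group_size, n + 1)))
        for start in range(1, n + 1, group_size)
    ]


groups = split_into_groups(100, 10)
# One scale search per group instead of one per frame after the first.
searches_per_group = len(groups)
searches_per_frame = 100 - 1
```

For the 100-frame example this means 10 scale searches instead of 99, at the cost of reacting more slowly to changes in the target's size.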

Additionally, in some embodiments, a neural network model may be employed to determine the target tracking frame with the optimal scale based on the plurality of target tracking frames with different scales. For example, a Long Short-Term Memory (LSTM) network model may be employed to predict the target tracking frame with the optimal scale. It can be understood that the LSTM model may be trained in advance. In the training process, the plurality of target tracking frames with different scales in each frame of video image of consecutive multi-frame video images may be used as training samples, and the target tracking frame with the optimal scale in each frame of video image may be used as the corresponding label, and both are input into the LSTM model for training. After the training is completed, the target tracking frame with the optimal scale of the target object in each frame of video image may be predicted by the trained LSTM model, so that one target tracking frame with the optimal scale is obtained for each frame of video image, and a plurality of target tracking frames are obtained in total.

Step S130: and tracking the target object in the n frames of video images by using the plurality of target tracking frames to obtain the motion trail of the target object in the n frames of video images.

For example, after the target tracking frame with the optimal scale is determined for the second frame of video image in the above manner, the target tracking frame with the optimal scale in the second frame of video image may be scaled in different scales to obtain a plurality of target tracking frames with different scales, and then the target tracking frame with the optimal scale in the third frame of video image is determined based on the target tracking frames with different scales, so that the target tracking frame with the optimal scale of the target object in each frame of video image may be obtained, and after the target object is tracked, the motion trajectory of the target object may be more accurately obtained.

It can be understood that, after the target object in the first frame of video image is identified, the target object continues to be identified in the subsequent frames of video images, and the target object is then labeled in each subsequent frame of video image by using the target tracking frame of the optimal scale corresponding to the target object in that frame, so that the position of the target object in each frame of video image can be obtained. The position of the target object in each frame of video image can be understood as the coordinates of the central point of the target tracking frame of the optimal scale corresponding to the target object in that frame, or as the coordinates of each vertex (e.g., the four vertices of a rectangle) of that target tracking frame. The coordinate change of the target tracking frame with the optimal scale across the frames of video images can then be used as the motion track of the target object in the plurality of frames of video images.
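As a minimal sketch of how a motion track may be formed from tracking frames, the following takes the central point of each frame's tracking frame. The box layout `(x_min, y_min, x_max, y_max)` and the function names are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]


def box_center(box: Box) -> Point:
    """Central point of a rectangular tracking frame."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def motion_track(boxes: Sequence[Box]) -> List[Point]:
    """One central point per frame, in frame order, forms the motion track."""
    return [box_center(b) for b in boxes]


if __name__ == "__main__":
    print(motion_track([(0, 0, 10, 20), (2, 4, 12, 24)]))
```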

Step S140: and determining whether the target object has a retrograde motion behavior in the escalator to be detected or not based on the motion trail.

The moving direction of the escalator to be detected can be marked in advance in the captured multi-frame video images. Understandably, the moving direction of the target object can be determined based on the motion track of the target object. If the moving direction of the target object is opposite to the moving direction of the escalator to be detected, this indicates that the target object exhibits the dangerous behavior of going backwards; if the moving direction of the target object is the same as the moving direction of the escalator to be detected, this indicates that the target object does not exhibit the dangerous behavior of going backwards.
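The direction comparison can be sketched as follows. The application does not specify how the two directions are compared; the sketch assumes that the escalator direction is pre-marked as a 2D vector in image coordinates and that "opposite" is tested with the sign of a dot product against the overall displacement of the track. The name `is_retrograde` and the `min_displacement` threshold are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def is_retrograde(track: Sequence[Point],
                  escalator_dir: Tuple[float, float],
                  min_displacement: float = 1.0) -> bool:
    """Return True if the overall motion of the track opposes the escalator direction.

    min_displacement (in pixels, an assumed unit) ignores targets that are
    effectively standing still.
    """
    if len(track) < 2:
        return False
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    if math.hypot(dx, dy) < min_displacement:
        return False
    # A negative dot product means the displacement points against the escalator.
    return dx * escalator_dir[0] + dy * escalator_dir[1] < 0


if __name__ == "__main__":
    up = (0.0, -1.0)  # escalator moves toward the top of the image
    print(is_retrograde([(5, 10), (5, 20), (5, 30)], up))
```

Using only the first and last points keeps the sketch short; a practical system might smooth the track or vote over several segments.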

In some embodiments, if it is determined that the target object has a dangerous behavior of going backwards, the electronic device may output warning information, such as sending a voice prompt message, sending a dangerous warning message to an administrator terminal of the escalator, or sending a warning sound, so as to prompt the target object or the administrator that the dangerous behavior of going backwards exists, and timely stop the dangerous behavior of the target object, thereby ensuring the safety of pedestrians in the escalator to be detected.

Or, in other embodiments, whether the target object has a retrograde motion behavior in the escalator to be detected can be determined by the classifier based on the motion track, so that the method is more convenient and efficient.

The classifier can be a Support Vector Machine (SVM), which performs well on classification tasks. For example, the SVM model can be trained in advance. In the training process, the running direction of the escalator and the retrograde direction of the target object can be used as training samples to train the SVM model, so that the trained SVM model can be used to distinguish, and thus better recognize, the retrograde behavior of the target object. For simplicity of description, the specific recognition process of the SVM model is not described in detail herein.

In some embodiments, in order to better adapt to the size change of the target object and obtain a more accurate target tracking frame with an optimal scale, in the process of determining the tracking frame with the optimal scale of the target object in the i-th frame video image, a Kernelized Correlation Filter (KCF) model may be used to match each of the target tracking frames with different scales against the filter and calculate a response value for each of them. The maximum response value among the response values corresponding to the target tracking frames of the respective scales is then determined, and the target tracking frame of the scale corresponding to the maximum response value is used as the target tracking frame with the optimal scale of the target object in the i-th frame video image.

The principle of the KCF model is to construct a filter that is applied to the object to be tracked so as to obtain a maximum response value; the area generating the maximum response is the position of the object. The principle is shown in fig. 4.

For example, the target tracking frame of the target object in the i-x frame video image may be scaled in different scales. If 4 target tracking frames of different scales are obtained, the 4 target tracking frames are input into the filter and response values are calculated to obtain 4 response values. The maximum response value is then determined from the 4 response values, and the target tracking frame of the scale corresponding to the maximum response value may be used as the target tracking frame of the optimal scale of the target object in the i-th frame video image. For the specific implementation process involved in the KCF model, reference may be made to the related implementation process in the prior art, and for brevity of description, the detailed description is omitted here.
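The scale selection in this example can be sketched as below. The sketch does not implement a KCF filter: the `response` argument is a hypothetical stand-in for whatever function scores a candidate frame against the filter. The box layout `(cx, cy, w, h)` and the scale factors are likewise assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (cx, cy, w, h)


def scaled_candidates(box: Box, factors: Sequence[float]) -> List[Box]:
    """Scale the previous tracking frame about its centre by each factor."""
    cx, cy, w, h = box
    return [(cx, cy, w * f, h * f) for f in factors]


def best_scale_box(prev_box: Box,
                   factors: Sequence[float],
                   response: Callable[[Box], float]) -> Box:
    """Return the candidate whose response value is the maximum."""
    return max(scaled_candidates(prev_box, factors), key=response)


if __name__ == "__main__":
    # Toy response that peaks when the frame width is 55 pixels; a real
    # system would obtain this value from the correlation filter instead.
    toy_response = lambda b: -abs(b[2] - 55.0)
    print(best_scale_box((100.0, 200.0, 50.0, 100.0), (0.9, 1.0, 1.1, 1.2), toy_response))
```

Passing the response function as a parameter keeps the selection logic independent of the particular tracker that supplies the response values.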

In some embodiments, since a part of the target object may be occluded while it moves, the target object may be further divided into a plurality of parts that are tracked separately, so as to track the target object better. For example, if the target object is a pedestrian and includes the head, the shoulder and the body of the target pedestrian, then when the plurality of target tracking frames are obtained, a head tracking frame of the head in the i-th frame video image may be determined to obtain a plurality of head tracking frames in total, a shoulder tracking frame of the shoulder in the i-th frame video image may be determined to obtain a plurality of shoulder tracking frames in total, and a body tracking frame of the body in the i-th frame video image may be determined to obtain a plurality of body tracking frames in total.

It should be understood that the above-mentioned process of obtaining a plurality of head tracking frames, a plurality of shoulder tracking frames, and a plurality of body tracking frames is similar to that described above, that is, the head tracking frame, the shoulder tracking frame, and the body tracking frame in the i-th frame video image are all obtained by performing different-scale scaling on the head tracking frame, the shoulder tracking frame, and the body tracking frame of the target object in the i-x-th frame video image, and then the head tracking frame, the shoulder tracking frame, and the body tracking frame of the optimal scale are determined based on the plurality of head tracking frames, shoulder tracking frames, and body tracking frames of different scales.

Of course, the head tracking frame, the shoulder tracking frame and the body tracking frame are all obtained only when the target object is not occluded. It can be understood that if the head, the shoulder and the body of the target pedestrian can all be detected in the first frame of video image, the head tracking frame, the shoulder tracking frame and the body tracking frame can be automatically generated, as shown in fig. 5, and the head tracking frame, the shoulder tracking frame and the body tracking frame of the optimal scale of the target pedestrian in the second frame of video image are then determined in the above manner. If the body of the target pedestrian is occluded in the first frame of video image and only the head and the shoulder can be detected, the head tracking frame and the shoulder tracking frame may be generated first, and the head tracking frame and the shoulder tracking frame of the optimal scale of the target pedestrian in the second frame of video image are then determined in the above manner. If the body of the target pedestrian is detected in the second frame of video image, a body tracking frame is generated at that point, and the body tracking frame of the optimal scale of the target pedestrian in the third frame of video image may then be determined in the above manner. In this way, the obtained plurality of head tracking frames, plurality of shoulder tracking frames and plurality of body tracking frames may be used to track the target object in the n frames of video images.

In some embodiments, the head of the target pedestrian may be tracked in the n frames of video images by using a plurality of head tracking frames, a head movement track of the head of the target pedestrian in the n frames of video images is obtained, the shoulder of the target pedestrian is tracked in the n frames of video images by using a plurality of shoulder tracking frames, a shoulder movement track of the shoulder of the target pedestrian in the n frames of video images is obtained, the body of the target pedestrian is tracked in the n frames of video images by using a plurality of body tracking frames, a body movement track of the body of the target pedestrian in the n frames of video images is obtained, and then a movement track of the target pedestrian in the n frames of video images is obtained based on the head movement track, the shoulder movement track and the body movement track.

The head movement track can be understood as a track formed by coordinates of each head tracking frame in each frame video image, the shoulder movement track can be understood as a track formed by coordinates of each shoulder tracking frame in each frame video image, and the body movement track can be understood as a track formed by coordinates of each body tracking frame in each frame video image, so that the movement tracks can be used as the movement tracks of the target pedestrian.

Since the head, the shoulder or the body of the target pedestrian may be missing in some frames of video images, the head motion trajectory, the shoulder motion trajectory or the body motion trajectory obtained as above may also be incomplete. In order to better track the target pedestrian, in some embodiments, a complete motion trajectory may be determined from among the head motion trajectory, the shoulder motion trajectory and the body motion trajectory, and the complete motion trajectory may then be used as the motion trajectory of the target pedestrian in the n frames of video images.

That is, since each motion trajectory is formed by the coordinates of the tracking frame, it can be determined whether there are n head coordinates from the head motion trajectory, if there are n head coordinates, it indicates that the head motion trajectory is a complete motion trajectory, and if there are less than n head coordinates, it indicates that the head motion trajectory is an incomplete motion trajectory; the same is true for the judgment mode of whether the shoulder motion track and the body motion track are complete motion tracks, namely whether n shoulder coordinates exist is determined from the shoulder motion track, if so, the shoulder motion track is a complete motion track, and if less than n shoulder coordinates exist, the shoulder motion track is an incomplete motion track; and determining whether n body coordinates exist in the body motion track, if so, indicating that the body motion track is a complete motion track, and if less than n body coordinates exist, indicating that the body motion track is an incomplete motion track.

If the head movement track is determined to be a complete movement track based on the above manner, and the shoulder movement track and the body movement track are incomplete movement tracks, the head movement track can be used as the movement track of the target pedestrian, if the head movement track and the shoulder movement track are both complete movement tracks, one of the movement tracks can be arbitrarily selected to be used as the movement track of the target pedestrian, and of course, each coordinate in the two complete movement tracks can also be averaged, and the obtained coordinate average value is used as the movement track of the target pedestrian.
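The completeness check and the selection rule can be sketched as follows. The sketch assumes that each part trajectory is stored as a list with one entry per frame, where `None` marks a frame in which the part was not tracked; this representation and the names `is_complete` and `select_track` are illustrative assumptions. When several trajectories are complete, the sketch averages them, which is one of the two options mentioned above.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Track = Sequence[Optional[Point]]  # one entry per frame; None = part not tracked


def is_complete(track: Track, n: int) -> bool:
    """A trajectory is complete when it holds n coordinates, one per frame."""
    return len(track) == n and all(p is not None for p in track)


def select_track(tracks: Sequence[Track], n: int) -> Optional[List[Point]]:
    """Use the complete trajectory, or the per-frame average if several are complete."""
    complete = [t for t in tracks if is_complete(t, n)]
    if not complete:
        return None  # no part was tracked in every frame
    k = len(complete)
    return [
        (sum(t[i][0] for t in complete) / k, sum(t[i][1] for t in complete) / k)
        for i in range(n)
    ]


if __name__ == "__main__":
    head = [(0, 0), (1, 1), (2, 2)]
    shoulder = [(0, 2), None, (2, 4)]  # shoulder missing in the second frame
    body = [(0, 5), (1, 6)]            # body tracked in only two frames
    print(select_track([head, shoulder, body], n=3))
```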

In some embodiments, the head movement trajectory includes head trajectory coordinates of the head in each frame of video image, the shoulder movement trajectory includes shoulder trajectory coordinates of the shoulder in each frame of video image, and the body movement trajectory includes body trajectory coordinates of the body in each frame of video image, in the process of obtaining the movement trajectory of the target pedestrian based on the head movement trajectory, the shoulder movement trajectory and the body movement trajectory, a trajectory coordinate mean value in each frame of video image may be obtained based on the head trajectory coordinates, the shoulder trajectory coordinates and the body trajectory coordinates, and then the movement trajectory of the target pedestrian in n frames of video image may be generated based on the trajectory coordinate mean value.

For example, in the ith frame of video image, the head track coordinate of the target pedestrian is x1, the shoulder track coordinate is x2, and the body track coordinate is x3, and the average value of the track coordinates is x = (x1 + x2 + x3)/3, so that the track coordinate average values in each frame of video image can be obtained, and the track coordinate average values can form the motion track of the target pedestrian.

It should be noted that, if only the head trajectory coordinate x1 and the shoulder trajectory coordinate x2 are included in a certain frame of video image, the average value of the trajectory coordinates in that frame of video image is x = (x1 + x2)/2. The average value of the trajectory coordinates when a certain part of the target pedestrian is absent in other frames of video images can be obtained in the same way, which is not described in detail herein.
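The per-frame averaging, including the case where a part is absent in a frame, can be sketched as follows. The text uses a single symbol per coordinate; the sketch assumes each coordinate is a 2D point and that an absent part is stored as `None`. The name `mean_trajectory` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def mean_trajectory(head: Sequence[Optional[Point]],
                    shoulder: Sequence[Optional[Point]],
                    body: Sequence[Optional[Point]]) -> List[Optional[Point]]:
    """Average the coordinates of the parts that are present in each frame."""
    merged: List[Optional[Point]] = []
    for parts in zip(head, shoulder, body):
        present = [p for p in parts if p is not None]
        if not present:
            merged.append(None)  # no part was tracked in this frame
            continue
        k = len(present)  # 3 when all parts are present, 2 when one is absent, ...
        merged.append((sum(p[0] for p in present) / k,
                       sum(p[1] for p in present) / k))
    return merged


if __name__ == "__main__":
    head = [(0, 0), (2, 2)]
    shoulder = [(0, 3), (4, 4)]
    body = [(3, 6), None]  # body absent in the second frame
    print(mean_trajectory(head, shoulder, body))
```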

Of course, in the above embodiment, in order to facilitate more accurate tracking of the target pedestrian, the target pedestrian may be further divided into more detailed parts, such as a head part, a left shoulder part, a right shoulder part, a body part, and lower limbs, each part may be tracked separately, and then the tracks obtained by tracking may be merged into the motion track of the target pedestrian.

Referring to fig. 6, fig. 6 is a block diagram of a reverse behavior detection apparatus 200 according to an embodiment of the present disclosure, where the apparatus 200 may be a module, a program segment, or a code on an electronic device. It should be understood that the apparatus 200 corresponds to the above-mentioned embodiment of the method of fig. 2, and can perform various steps related to the embodiment of the method of fig. 2, and the specific functions of the apparatus 200 can be referred to the above description, and the detailed description is appropriately omitted here to avoid redundancy.

Optionally, the apparatus 200 comprises:

the video image acquisition module 210 is used for acquiring n frames of video images of the escalator to be detected;

a target tracking frame determining module 220, configured to sequentially take i as 2 to n, determine a target tracking frame of a target object in an ith frame of video image in the n frames of video images, and obtain a plurality of target tracking frames in total, where the ith target tracking frame is obtained by scaling the target tracking frame of the target object in the ith-x frame of video image in different scales, and is a tracking frame of an optimal scale determined based on the target tracking frames in different scales, each target tracking frame is used to indicate an area where the target object is located in the video image, n is an integer greater than or equal to 2, and x is an integer greater than or equal to 1;

a motion track obtaining module 230, configured to track the target object in the n frames of video images by using the multiple target tracking frames, so as to obtain a motion track of the target object in the n frames of video images;

and a reverse behavior detection module 240, configured to determine whether a reverse behavior exists in the escalator to be detected for the target object based on the motion trajectory.

Optionally, the target object includes a head, a shoulder and a body of a target pedestrian, and the target tracking frame determining module 220 is configured to:

the motion trajectory obtaining module 230 is configured to:

Optionally, the motion trajectory obtaining module 230 is configured to determine a complete motion trajectory from the head motion trajectory, the shoulder motion trajectory, and the body motion trajectory; and taking the complete motion track as the motion track of the target pedestrian in the n frames of video images.

Optionally, the head movement track includes head track coordinates of the head in each frame of video image, the shoulder movement track includes shoulder track coordinates of the shoulder in each frame of video image, the body movement track includes body track coordinates of the body in each frame of video image, and the movement track acquiring module 230 is configured to acquire a track coordinate mean value in each frame of video image based on the head track coordinates, the shoulder track coordinates, and the body track coordinates; and generating the motion track of the target pedestrian in the n frames of video images based on the track coordinate mean value.

Optionally, the target tracking frame determining module 220 is further configured to determine a target tracking frame of a target object in the i-x frame video image; zooming the target tracking frame of the target object in the i-x frame video image at different scales to obtain a plurality of target tracking frames at different scales; and determining a tracking frame with the optimal scale of the target object in the ith frame of video image based on the plurality of target tracking frames with different scales.

Optionally, the target tracking frame determining module 220 is further configured to:

Optionally, the retrograde motion behavior detecting module 240 is configured to determine, by a classifier, whether a retrograde motion behavior exists in the escalator to be detected for the target object based on the motion trajectory.

The embodiment of the present application provides a readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, performs the method process performed by the electronic device in the method embodiment shown in fig. 2.

The present embodiments disclose a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the methods provided by the above-described method embodiments, for example, comprising: acquiring n frames of video images of the escalator to be detected; sequentially taking i as 2 to n, determining a target tracking frame of a target object in an ith frame of video image in the n frames of video images, and obtaining a plurality of target tracking frames in total, wherein the ith target tracking frame is obtained by scaling the target tracking frame of the target object in the ith-x frame of video image in different scales, and the tracking frame with the optimal scale is determined based on the target tracking frames with different scales, each target tracking frame is used for indicating an area where the target object is located in the video image, n is an integer greater than or equal to 2, and x is an integer greater than or equal to 1; tracking the target object in the n frames of video images by using the plurality of target tracking frames to obtain a motion track of the target object in the n frames of video images; and determining whether the target object has a retrograde motion behavior in the escalator to be detected or not based on the motion trail.

In summary, the present application provides a retrograde motion detection method, an apparatus, an electronic device, and a readable storage medium, where a target tracking frame with an optimal scale of a target object in each frame of a video image is obtained based on target tracking frames with different scales, so that in a process of tracking the target object, the target tracking frame with the optimal scale can better adapt to size change of the target object in the video image, so as to implement effective tracking of the target object, thereby accurately obtaining a motion trajectory of the target object, and then automatically determining whether a retrograde motion exists in the target object based on the motion trajectory, without human involvement in detection, thereby greatly reducing consumption of human resources.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.

The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method for detecting retrograde behavior, the method comprising:

acquiring n frames of video images of the escalator to be detected;

sequentially taking i as 2 to n, determining a target tracking frame of a target object in an ith frame of video image in the n frames of video images, and obtaining a plurality of target tracking frames in total, wherein the ith target tracking frame is obtained by scaling the target tracking frame of the target object in the ith-x frame of video image in different scales, and the tracking frame with the optimal scale is determined based on the target tracking frames with different scales, each target tracking frame is used for indicating an area where the target object is located in the video image, n is an integer greater than or equal to 2, and x is an integer greater than or equal to 1;

tracking the target object in the n frames of video images by using the plurality of target tracking frames to obtain a motion track of the target object in the n frames of video images;

and determining whether the target object has a retrograde motion behavior in the escalator to be detected or not based on the motion trail.

2. The method according to claim 1, wherein the target object comprises a head, a shoulder and a body of a target pedestrian, and the determining the target tracking frame of the target object in the i-th frame of the n frames of video images obtains a plurality of target tracking frames, including:

3. The method according to claim 2, wherein the obtaining the motion trajectory of the target pedestrian in the n-frame video images based on the head motion trajectory, the shoulder motion trajectory and the body motion trajectory comprises:

4. The method of claim 2, wherein the head motion trajectory comprises head trajectory coordinates of the head in each frame of video image, the shoulder motion trajectory comprises shoulder trajectory coordinates of the shoulder in each frame of video image, the body motion trajectory comprises body trajectory coordinates of the body in each frame of video image, and the obtaining the motion trajectory of the target pedestrian in the n frames of video image based on the head motion trajectory, the shoulder motion trajectory and the body motion trajectory comprises:

5. The method of claim 1, wherein determining a target tracking frame for a target object in an i-th frame of the n frames of video images comprises:

6. The method of claim 5, wherein determining the optimal scale tracking box for the target object in the ith frame of video image based on the plurality of different scale tracking boxes comprises:

7. The method according to claim 1, wherein the determining whether the target object has a retrograde motion in the escalator to be detected based on the motion trajectory comprises:

8. A retrograde behavior detection apparatus, characterized in that the apparatus comprises:

9. An electronic device comprising a processor and a memory, the memory storing computer readable instructions that, when executed by the processor, perform the method of any of claims 1-7.

10. A readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.