CN112418118A - Unsupervised method and device for detecting pedestrian intrusion under a bridge - Google Patents

Unsupervised method and device for detecting pedestrian intrusion under a bridge

Info

Publication number
CN112418118A
CN112418118A (application CN202011357277.1A)
Authority
CN
China
Prior art keywords
pedestrian
data
track
track data
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011357277.1A
Other languages
Chinese (zh)
Inventor
张威奕
邹晓芳
杨德元
王勇
卢佳程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Merchants Bureau Highway Network Technology Holding Co ltd
Hubei Edong Yangtze River Highway Bridge Co ltd
Zhaoshang Xinzhi Technology Co ltd
Original Assignee
China Merchants Bureau Highway Network Technology Holding Co ltd
Hubei Edong Yangtze River Highway Bridge Co ltd
Zhaoshang Xinzhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Merchants Bureau Highway Network Technology Holding Co ltd, Hubei Edong Yangtze River Highway Bridge Co ltd, Zhaoshang Xinzhi Technology Co ltd filed Critical China Merchants Bureau Highway Network Technology Holding Co ltd
Priority to CN202011357277.1A priority Critical patent/CN112418118A/en
Publication of CN112418118A publication Critical patent/CN112418118A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The unsupervised under-bridge pedestrian intrusion detection method and device provided by one or more embodiments of this specification first mark pedestrian position information in an original image with a pre-trained neural network model to obtain a marked image. The marked image is then processed with a Kalman-filter multi-target tracking algorithm and the Hungarian algorithm to obtain trajectory data for each pedestrian, and the trajectory data are examined for abnormal data. Abnormal data indicate that a pedestrian has abnormally entered the detection area, so that pedestrian intrusion under the bridge is monitored with high precision.

Description

Unsupervised method and device for detecting pedestrian intrusion under a bridge
Technical Field
One or more embodiments of the present disclosure relate to the field of under-bridge pedestrian intrusion detection technologies, and in particular to an unsupervised under-bridge pedestrian intrusion detection method and apparatus.
Background
A bridge is a key node and a major piece of infrastructure on a transportation line, and its safety is extremely important. Most bridges are now equipped with an under-bridge monitoring system: an internet-connected monitoring system is installed at the positions to be monitored, and the collected data are processed and analyzed in real time to judge whether an abnormal pedestrian intrusion has occurred, thereby safeguarding the bridge.
For example, patent CN103150550A discloses a road pedestrian event detection method based on motion-trajectory analysis. That method detects pedestrian trajectories from differences in image grey values using the traditional frame-difference method. Its drawback is low accuracy, which makes the detection result prone to false alarms.
Disclosure of Invention
In view of the above, one or more embodiments of the present disclosure are directed to a method and a device for unsupervised under-bridge pedestrian intrusion detection, so as to solve the technical problems in the prior art.
In view of the above, one or more embodiments of the present specification provide an unsupervised under-bridge pedestrian intrusion detection method, including:
acquiring an original image;
inputting the original image into a pre-trained neural network model to obtain a marked image which is output by the neural network model and is used for marking pedestrian position information;
processing the marked image by using a Kalman filtering multi-target tracking algorithm and a Hungarian algorithm to obtain the trajectory data of each pedestrian in the marked image;
and detecting abnormal data in the track data and outputting the abnormal data.
As an alternative embodiment, the neural network model is the YOLO v3 model;
the training method of the neural network model comprises the following steps:
acquiring a training set, wherein the training set comprises training images marked with pedestrian information;
inputting the training images in the training set into an initial YOLO v3 model for iterative operation until a preset iteration number is reached;
obtaining a trained YOLO v3 model.
As an optional implementation, the inputting the original image into a pre-trained neural network model to obtain a marked image which is output by the neural network model and is used for marking pedestrian position information includes:
inputting the original image into a pre-trained neural network model to obtain a marked image which is output by the neural network model and is used for marking pedestrian position information;
and masking the non-detection area in the marked image by using a mask to obtain a masked marked image.
As an optional implementation manner, after masking the non-detection area in the marked image with a mask to obtain a masked marked image, the method further includes:
extracting binary background information of an original image;
calculating the difference between the binary background information of two frames of original images separated by a preset time; if the difference is larger than a threshold, performing image offset matching with SIFT features; if the matching succeeds, shifting the mask by the same offset, and if the matching fails, clearing the mask; and if the SIFT matching succeeds and the mask is within its retention period, superimposing the current mask on the stored mask, generating a new mask from the superposition result, and having the new mask take part in mask covering again.
As an optional implementation, the detecting and outputting abnormal data in the trajectory data includes:
acquiring historical track data;
clustering the historical track data with a clustering algorithm to obtain the regular path information along which pedestrians usually travel;
preprocessing the track data;
comparing the preprocessed track data with the conventional path information, and marking the track data deviating from the conventional path information as first abnormal data;
marking the track data whose dwell time exceeds a threshold in the preprocessed track data as second abnormal data;
and outputting the first abnormal data and the second abnormal data.
As an optional implementation, the preprocessing the trajectory data includes:
extracting direction features of the track points in the track data to eliminate noise in the track data;
and filtering out abnormal tracks whose start and end points are not in the detection area.
As an optional implementation, the pedestrian position information includes:
the frame number F in which the pedestrian appears, the horizontal coordinate X of the top-left corner of the pedestrian bounding box, the vertical coordinate Y of the top-left corner of the pedestrian bounding box, the width W of the pedestrian bounding box, and the height H of the pedestrian bounding box.
Corresponding to the above unsupervised under-bridge pedestrian intrusion detection method, the present invention further provides an unsupervised under-bridge pedestrian intrusion detection device, comprising:
the acquisition module is used for acquiring an original image;
the marking module is used for inputting the original image into a pre-trained neural network model to obtain a marked image which is output by the neural network model and is used for marking pedestrian position information;
the processing module is used for processing the marked image by using a Kalman filtering multi-target tracking algorithm and a Hungarian algorithm to obtain the trajectory data of each pedestrian in the marked image;
and the detection module is used for detecting abnormal data in the track data and outputting the abnormal data.
As an optional implementation, the detection module includes:
an acquisition unit configured to acquire historical track data;
the clustering unit is used for clustering the historical track data with a clustering algorithm to obtain the regular path information along which pedestrians usually travel;
the preprocessing unit is used for preprocessing the track data;
the comparison unit is used for comparing the preprocessed track data with the conventional path information and marking the track data deviating from the conventional path information as first abnormal data;
the marking unit is used for marking the track data whose dwell time exceeds a threshold in the preprocessed track data as second abnormal data;
and the output unit is used for outputting the first abnormal data and the second abnormal data.
As an optional implementation, the preprocessing unit is configured to:
extracting direction features of the track points in the track data to eliminate noise in the track data;
and filtering out abnormal tracks whose start and end points are not in the detection area.
As can be seen from the above, in the unsupervised under-bridge pedestrian intrusion detection method and device provided in one or more embodiments of this specification, pedestrian position information in the original image is first marked by a pre-trained neural network model to obtain a marked image; the marked image is processed by a Kalman-filter multi-target tracking algorithm and the Hungarian algorithm to obtain trajectory data for each pedestrian; and the trajectory data are then examined for abnormal data, which indicate that a pedestrian has abnormally entered the detection area. Pedestrian intrusion under the bridge is thus monitored with high precision.
Drawings
To more clearly illustrate one or more embodiments of this specification or the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only one or more embodiments of this specification, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a logic diagram of a method according to one embodiment of the present description;
FIG. 2 is a logic diagram of a method according to another embodiment of the present disclosure;
FIG. 3 is a logic diagram of detecting anomalous data in a method in accordance with one or more embodiments of the present disclosure;
FIG. 4 is a schematic view of an apparatus according to one or more embodiments of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present disclosure more apparent, the present disclosure is further described in detail below with reference to specific embodiments.
In order to achieve the above object, the present invention provides an unsupervised under-bridge pedestrian intrusion detection method, comprising:
acquiring an original image;
inputting the original image into a pre-trained neural network model to obtain a marked image which is output by the neural network model and is used for marking pedestrian position information;
processing the marked image by using a Kalman filtering multi-target tracking algorithm and a Hungarian algorithm to obtain the trajectory data of each pedestrian in the marked image;
and detecting abnormal data in the track data and outputting the abnormal data.
In this embodiment of the invention, pedestrian position information in the original image is first marked by a pre-trained neural network model to obtain a marked image; the marked image is processed by a Kalman-filter multi-target tracking algorithm and the Hungarian algorithm to obtain the trajectory data of each pedestrian; and the trajectory data are then examined for abnormal data, which indicate that a pedestrian has abnormally entered the detection area, so that pedestrian intrusion under the bridge is monitored with high precision.
Fig. 1 shows an unsupervised under-bridge pedestrian intrusion detection method, comprising:
and S100, acquiring an original image.
S200, inputting the original image into a pre-trained neural network model to obtain a marked image which is output by the neural network model and is used for marking the pedestrian position information.
Traditional target detection based on image features and an AdaBoost cascade classifier falls far short of the required precision. Detection models represented by Faster R-CNN can meet the precision and recall requirements, but their huge computational load makes detection very slow, failing the real-time requirements of a high-definition camera. Early algorithms such as YOLO and YOLOv2 are fast enough but sacrifice precision and recall, and their limited ability to detect small targets cannot satisfy high-viewpoint pedestrian trajectory detection in road and bridge scenes. The YOLOv3 algorithm effectively balances the accuracy of deep convolutional neural networks in target detection: first, it improves how the network measures the loss of training samples, learning from the samples with a combination of three kinds of loss; second, it adopts the new DarkNet-53 backbone with residual connections, increasing network depth; finally, it performs multi-scale prediction, capturing sample features at every scale.
Preferably, the neural network model is a YOLO v3 model;
the training method of the neural network model comprises the following steps:
acquiring a training set, wherein the training set comprises training images marked with pedestrian information;
inputting the training images in the training set into an initial YOLO v3 model for iterative operation until a preset iteration number is reached;
obtaining a trained YOLO v3 model.
The training set uses image data from two sources. The first is a data set captured and annotated from under-bridge video: the position of every pedestrian in each frame is marked in the image, and this data set helps the model adapt to the target domain. The second is the open Pascal VOC 2007+2012 data set, which strengthens the generalization ability of the model. For the first kind of data, the construction process is: extract 170 frames of original images from the under-bridge video and annotate them manually with the LabelImg software in VOC data-set format to build the training images. For the second kind, the Pascal VOC data set is screened for samples carrying the person tag. After the two kinds of data are merged, they are divided into a training set and a validation set at a ratio of 8:2; the training set is fed into the initial YOLO v3 model for iterative training to obtain the network weights, and the validation set is used to check the training effect of the model.
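The 8:2 split described above can be sketched as follows. The 170 bridge frames come from the text; the file names and the VOC sample count are illustrative assumptions, not taken from the patent.

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    """Shuffle the merged sample list and split it 8:2 into training and validation sets."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Merge the bridge-scene annotations with the Pascal VOC "person" subset,
# then divide 8:2 as described in the text.
bridge_samples = [f"bridge_{i:03d}.xml" for i in range(170)]    # 170 annotated frames
voc_samples = [f"voc_person_{i:04d}.xml" for i in range(500)]   # illustrative count
train_set, val_set = split_dataset(bridge_samples + voc_samples)
```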
Extracting pedestrian position information with YOLOv3 yields accurate pedestrian positions in the monitored scene with high precision and efficiency, and adapts to different scenes when the scene changes because the monitoring camera deflects.
The surveillance video acquired in real time is split into frames, and each resulting original image is fed into the trained YOLOv3 network to obtain the position information of every pedestrian in the image.
The pedestrian position information is represented by the circumscribed rectangle of the pedestrian. In each frame it is expressed as a tuple (F, X, Y, W, H): the frame number F in which the pedestrian appears, the horizontal coordinate X of the top-left corner of the bounding box, the vertical coordinate Y of the top-left corner, the width W of the bounding box, and the height H of the bounding box.
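A minimal sketch of this tuple representation; the type and function names are illustrative, not from the patent.

```python
from typing import NamedTuple

class PedestrianBox(NamedTuple):
    """One detection per frame: the (F, X, Y, W, H) tuple described in the text."""
    F: int    # frame number in which the pedestrian appears
    X: float  # horizontal coordinate of the top-left corner
    Y: float  # vertical coordinate of the top-left corner
    W: float  # bounding-box width
    H: float  # bounding-box height

def corners(box: PedestrianBox):
    """Convert (X, Y, W, H) to (x1, y1, x2, y2) corner form, handy for IoU tests."""
    return box.X, box.Y, box.X + box.W, box.Y + box.H

det = PedestrianBox(F=12, X=100.0, Y=40.0, W=30.0, H=80.0)
```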
As an alternative implementation, as shown in fig. 2, after S200, the method further includes:
and S500, masking the non-detection area in the marked image by using a mask to obtain a masked marked image.
The mask is created by delimiting the non-detection area.
The purpose of masking is to keep the model from processing regions of the image that need not be detected, which would otherwise increase the computational load and could produce false alarms. Mask covering is performed mainly by element-wise multiplication of the mask matrix with the actual image.
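The mask-covering principle, element-wise multiplication of a 0/1 mask matrix with the image, can be sketched with NumPy; the shapes and pixel values here are illustrative.

```python
import numpy as np

def apply_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out non-detection pixels by element-wise multiplication with a 0/1 mask."""
    if image.ndim == 3:                      # colour image: broadcast the mask over channels
        return image * mask[..., np.newaxis]
    return image * mask

# A 4x4 grey image whose right half is a non-detection area.
frame = np.full((4, 4), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:, :2] = 1                              # left half is the detection area
masked = apply_mask(frame, mask)
```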
Optionally, after S500, the method further includes:
extracting binary background information of an original image;
and acquiring a gray scale image of the original image, and setting 1 for pixel points which are larger than a given threshold value in the gray scale image, otherwise, setting 0 for the pixel points.
The difference between the binary background information of two frames of original images separated by a preset time is calculated. If the difference exceeds a threshold, the monitoring camera (for example, a dome camera) used to acquire the original images has deflected; part of the non-detection area originally covered by the mask may then be exposed again and cause false alarms, so the marked images acquired after the deflection must be revised. In that case, image offset matching is performed with SIFT features. If the matching succeeds, the mask is shifted by the same offset so that it stays aligned with the area to be detected, i.e. it remains consistent, in the geodetic coordinate system, with the forbidden under-bridge space seen by the camera. If the matching fails, the camera rotation is considered too large for the mask to stay consistent with the under-bridge space, and the previously retained mask is cleared. If the SIFT matching succeeds and the mask is within its retention period, the current mask is superimposed on the stored mask: the overlapping positions are considered places where pedestrians frequently appear that do not belong to the content to be detected for under-bridge intrusion, and they can be shielded. Finally, a new mask is generated from the superposition result and again takes part in mask covering.
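The binarization, background-difference, and mask-offset steps above might be sketched as follows, using NumPy only. The SIFT matching itself (typically done with OpenCV) is omitted; the function names and the choice to clear borders exposed by the shift are assumptions for illustration.

```python
import numpy as np

def binarize(gray: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Binary background: 1 where the grey value exceeds the threshold, else 0."""
    return (gray > threshold).astype(np.uint8)

def background_change(gray_a: np.ndarray, gray_b: np.ndarray, threshold: int = 128) -> float:
    """Fraction of pixels whose binary background value differs between two frames."""
    diff = binarize(gray_a, threshold) != binarize(gray_b, threshold)
    return float(diff.mean())

def shift_mask(mask: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Shift the mask by the offset recovered from image matching.

    Border rows/columns exposed by the shift are set to 0 (assumption: newly
    exposed regions are treated as unmasked until the next mask update).
    """
    shifted = np.roll(mask, shift=(dy, dx), axis=(0, 1))
    if dy > 0:
        shifted[:dy, :] = 0
    elif dy < 0:
        shifted[dy:, :] = 0
    if dx > 0:
        shifted[:, :dx] = 0
    elif dx < 0:
        shifted[:, dx:] = 0
    return shifted
```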
S300, processing the marked image by using a Kalman filtering multi-target tracking algorithm and a Hungarian algorithm to obtain track data of each pedestrian in the marked image;
the Kalman filtering multi-target tracking algorithm correlates pedestrian position information detected by the trained YOLOv3 algorithm in an image and gives a pedestrian number, wherein the number form is an integer form, and the order of the number is determined by the sequence of the pedestrian appearing in the image.
The Kalman-filter multi-target tracking algorithm judges whether the matching value between two detection boxes representing pedestrian position information in adjacent frames reaches a certain threshold (for example, a box intersection-over-union greater than 0.5); if so, the two detection boxes are considered to belong to the same target, and the Hungarian algorithm is used for the matching.
The Hungarian algorithm works as follows. When a predicted bounding box can be matched with a detected bounding box, the detected box updates the state (and is assigned a number); the detected box comes from the YOLOv3 network in the previous step, while the predicted box is produced by feeding the current frame's detected box into the Kalman filter to predict the pedestrian's position in the next frame. If a predicted box matches no detected box, the parameters of the predicted target box continue to be predicted with the classical motion model. If no corresponding detected box matches the predicted box within a certain time (for example, 25 frames, chosen for the traffic detection scene), the predicted box is deleted, completing the moving-target tracking track. Finally, the numbered pedestrian bounding-box information for all frames is output.
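The IoU gating and Hungarian assignment between predicted and detected boxes can be sketched as below. This uses SciPy's `linear_sum_assignment` (an implementation of the Hungarian method) and is a simplified stand-in for the tracker described, not the patent's actual code; boxes are (x, y, w, h) tuples as in the text.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def match(predicted, detected, iou_threshold=0.5):
    """Hungarian assignment of predicted to detected boxes, minimising 1 - IoU.

    Pairs whose IoU falls below the threshold are discarded, matching the
    gating described in the text.
    """
    cost = np.array([[1.0 - iou(p, d) for d in detected] for p in predicted])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_threshold]
```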
S400, detecting abnormal data in the track data and outputting the abnormal data.
The abnormal data indicate either that a pedestrian has abnormally intruded or that a pedestrian's dwell time in the detection area exceeds a threshold.
As an alternative embodiment, as shown in fig. 3, S400 includes:
S401, acquiring historical track data;
S402, clustering the historical track data with a clustering algorithm to obtain the regular path information along which pedestrians usually travel;
Optionally, a spectral clustering algorithm is used together with the DTW distance as the trajectory similarity measure.
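A minimal pure-Python sketch of the DTW trajectory distance mentioned here, i.e. the similarity measure that would feed the spectral clustering; building the affinity matrix and the clustering itself are omitted.

```python
import math

def dtw_distance(traj_a, traj_b):
    """Dynamic time warping distance between two trajectories of (x, y) points.

    Classic O(n*m) dynamic programme: each cell holds the cheapest cumulative
    cost of aligning the prefixes, where cost is the Euclidean point distance.
    """
    n, m = len(traj_a), len(traj_b)
    inf = float("inf")
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(traj_a[i - 1], traj_b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]
```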
S403, preprocessing the track data;
optionally, the S403 includes:
extracting direction features of the track points in the track data to eliminate noise in the track data;
and filtering out abnormal tracks whose start and end points are not in the detection area.
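The start/end filtering step might be sketched as below. Whether a track is kept when only one endpoint lies in the detection area is not specified in the text; as an assumption for illustration, a track is dropped only when both endpoints fall outside the area.

```python
def in_area(point, area):
    """True if point (x, y) lies inside the rectangular detection area (x1, y1, x2, y2)."""
    x, y = point
    x1, y1, x2, y2 = area
    return x1 <= x <= x2 and y1 <= y <= y2

def filter_tracks(tracks, area):
    """Discard abnormal tracks whose start and end points both lie outside the area."""
    return [t for t in tracks
            if t and (in_area(t[0], area) or in_area(t[-1], area))]

area = (0, 0, 10, 10)                       # illustrative detection rectangle
tracks = [
    [(1, 1), (2, 2)],                       # fully inside: kept
    [(20, 20), (30, 30)],                   # fully outside: dropped
    [(1, 1), (20, 20)],                     # starts inside: kept under this assumption
]
kept = filter_tracks(tracks, area)
```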
S404, comparing the preprocessed track data with the conventional path information, and marking the track data deviating from the conventional path information as first abnormal data;
S405, marking the track data whose dwell time exceeds a threshold in the preprocessed track data as second abnormal data;
and S406, outputting the first abnormal data and the second abnormal data.
The neural network model and target tracking method adopted in this embodiment of the invention are highly real-time: image detection reaches 23 FPS and target tracking reaches 110 FPS, meeting the real-time requirement of under-bridge pedestrian monitoring, so the method can be applied effectively in practice.
Because a mask updating mechanism is adopted, this embodiment of the invention is applicable not only to fixed cameras but also to movable or zooming cameras mounted on a pan-tilt head, giving it wide applicability.
For ease of understanding, a specific embodiment is described below.
Examples
The selected configuration is a desktop machine with an Intel Core i7-8700 CPU, 16 GB of memory, and a GTX 1660 Ti graphics card, serving as the training and testing machine. The input of this implementation case is high-point surveillance video from below a bridge in Edong, and the specific steps are as follows:
and manually calibrating the pedestrian position information of the pedestrians at the intersection by adopting Labelimage software to obtain a manually calibrated data set.
The manually calibrated data set is merged with the pedestrian portion of the Pascal VOC 2007+2012 data set, and the merged training set is fed into the original YOLOv3 network for iterative training to obtain a trained YOLO v3 model.
The resulting YOLO v3 model serves as the network model for detection. The surveillance video is detected by using detect.py to read the RTSP video stream, obtaining a marking box (i.e. pedestrian position information) for each pedestrian in the picture; the time, position, width, and other information of the marking box are stored in data.txt in the corresponding format.
A mask-processing script is then called on the marking boxes, and the SORT tracker is automatically invoked to track them, producing pedestrian marking boxes carrying pedestrian numbers.
A timing script then times the pedestrian information obtained above; if a single person stays in the video longer than a set period (for example, 6 seconds, chosen according to the environment), the system automatically pushes warning information to the cloud server via a POST request.
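The dwell-time check can be sketched as follows. The per-frame ID sets, the reset-on-absence rule, and the function name are assumptions made for illustration; the actual POST to the cloud server is omitted.

```python
def dwell_alerts(frames, fps=25, max_seconds=6.0):
    """Return pedestrian IDs whose continuous presence exceeds max_seconds.

    frames: a list, one entry per video frame, each a set of pedestrian IDs
    visible in the detection area in that frame. A pedestrian's streak resets
    to zero as soon as it is absent from a frame (assumption).
    """
    limit = int(fps * max_seconds)
    streak = {}            # pedestrian ID -> consecutive frames present
    alerts = set()
    for ids in frames:
        for pid in ids:
            streak[pid] = streak.get(pid, 0) + 1
            if streak[pid] > limit:
                alerts.add(pid)          # here the real system would POST a warning
        for pid in list(streak):
            if pid not in ids:
                streak[pid] = 0
    return alerts
```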
If camera rotation is detected, the cameraChange.py file automatically matches the image offset and applies it to the mask offset; if no rotation occurs, the maskUpdate.py file automatically updates the mask.
It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities.
It should be noted that the method of one or more embodiments of the present disclosure may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method of one or more embodiments of the present disclosure, and the devices may interact with each other to complete the method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Corresponding to the above unsupervised under-bridge pedestrian intrusion detection method, the present invention further provides an unsupervised under-bridge pedestrian intrusion detection device which, as shown in figure 4, comprises:
an obtaining module 10, configured to obtain an original image;
the marking module 20 is configured to input the original image into a pre-trained neural network model, and obtain a marked image output by the neural network model and used for marking pedestrian position information;
the processing module 30 is configured to process the tag image by using a kalman filtering multi-target tracking algorithm and a hungarian algorithm to obtain trajectory data of each pedestrian in the tag image;
and the detection module 40 is used for detecting abnormal data in the track data and outputting the abnormal data.
As an optional implementation, the detection module includes:
an acquisition unit configured to acquire historical track data;
the clustering unit is used for clustering the historical track data with a clustering algorithm to obtain the regular path information along which pedestrians usually travel;
the preprocessing unit is used for preprocessing the track data;
the comparison unit is used for comparing the preprocessed track data with the conventional path information and marking the track data deviating from the conventional path information as first abnormal data;
the marking unit is used for marking the track data whose dwell time exceeds a threshold in the preprocessed track data as second abnormal data;
and the output unit is used for outputting the first abnormal data and the second abnormal data.
As an optional implementation, the preprocessing unit is configured to:
extracting direction features of the track points in the track data to eliminate noise in the track data;
and filtering out abnormal tracks whose track starting point and track ending point are not in the detection area.
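The two preprocessing steps can be sketched as follows, under stated assumptions: a track point whose direction from the last kept point deviates sharply from the direction toward the following point is treated as jitter noise and dropped, and whole tracks whose start or end point lies outside a rectangular detection region are discarded. The angle threshold and the rectangular region are illustrative, not taken from the patent.

```python
import math

def direction(p, q):
    """Heading angle of the vector from p to q, in radians."""
    return math.atan2(q[1] - p[1], q[0] - p[0])

def denoise_by_direction(track, max_turn=math.radians(120)):
    """Drop interior points that disagree with the local direction of travel."""
    if len(track) < 3:
        return list(track)
    out = [track[0]]
    for i in range(1, len(track) - 1):
        d_to_point = direction(out[-1], track[i])
        d_skipping = direction(out[-1], track[i + 1])
        turn = abs(d_to_point - d_skipping)
        turn = min(turn, 2 * math.pi - turn)   # wrap to [0, pi]
        if turn <= max_turn:
            out.append(track[i])               # consistent with travel direction
    out.append(track[-1])
    return out

def filter_by_region(tracks, region):
    """Keep only tracks whose start AND end lie inside the detection region."""
    x0, y0, x1, y1 = region
    inside = lambda p: x0 <= p[0] <= x1 and y0 <= p[1] <= y1
    return [t for t in tracks if inside(t[0]) and inside(t[-1])]
```

For instance, a single backtracking point injected into an otherwise straight track is removed by the direction check, and a track that starts outside the region never reaches the anomaly comparison at all.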
It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present specification should have the ordinary meaning as understood by those of ordinary skill in the art to which this disclosure belongs.
For convenience of description, the above device is described as being divided into various modules by function. Of course, when implementing one or more embodiments of the present specification, the functionality of the modules may be implemented in one or more pieces of software and/or hardware.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the present disclosure, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and many other variations of different aspects of one or more embodiments of the present specification exist, which are not provided in detail for the sake of brevity.
It is intended that the one or more embodiments of the present specification embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (10)

1. An unsupervised under-bridge pedestrian intrusion detection method is characterized by comprising the following steps:
acquiring an original image;
inputting the original image into a pre-trained neural network model to obtain a marked image which is output by the neural network model and is used for marking pedestrian position information;
processing the marked image by using a Kalman filtering multi-target tracking algorithm and a Hungarian algorithm to obtain the trajectory data of each pedestrian in the marked image;
and detecting abnormal data in the track data and outputting the abnormal data.
2. The unsupervised under-bridge pedestrian intrusion detection method of claim 1, wherein the neural network model is a YOLO v3 model;
the training method of the neural network model comprises the following steps:
acquiring a training set, wherein the training set comprises training images marked with pedestrian information;
inputting the training images in the training set into an initial YOLO v3 model for iterative operation until a preset iteration number is reached;
obtaining a trained YOLO v3 model.
3. The unsupervised under-bridge pedestrian intrusion detection method according to claim 1, wherein the inputting the original image into a pre-trained neural network model to obtain a labeled image output by the neural network model and used for labeling pedestrian position information comprises:
inputting the original image into a pre-trained neural network model to obtain a marked image which is output by the neural network model and is used for marking pedestrian position information;
and masking the non-detection area in the marked image by using a mask to obtain a masked marked image.
4. The unsupervised under-bridge pedestrian intrusion detection method according to claim 3, wherein after masking the non-detection area in the marked image with a mask to obtain a masked marked image, the method further comprises:
extracting binary background information of an original image;
calculating a difference value between the binary background information of two frames of original images separated by a preset time interval; if the difference value is larger than a threshold value, performing image offset matching using SIFT features; if the matching succeeds, offsetting the mask by the same offset, and if the matching fails, clearing the mask; and if the matching using the SIFT features succeeds and the mask is within its retention period, superimposing the current mask and the stored mask, generating a new mask from the superimposition result, and using the new mask for mask covering again.
5. The unsupervised under-bridge pedestrian intrusion detection method of claim 1, wherein the detecting and outputting of abnormal data in the trajectory data comprises:
acquiring historical track data;
clustering the historical track data by adopting a clustering algorithm to obtain regular path information describing routes frequently traveled by pedestrians;
preprocessing the track data;
comparing the preprocessed track data with the conventional path information, and marking the track data deviating from the conventional path information as first abnormal data;
marking the track data whose dwell time exceeds a threshold value in the preprocessed track data as second abnormal data;
and outputting the first abnormal data and the second abnormal data.
6. The unsupervised under-bridge pedestrian intrusion detection method of claim 5, wherein the preprocessing the trajectory data comprises:
extracting direction features of the track points in the track data to eliminate noise in the track data;
and filtering out abnormal tracks whose track starting point and track ending point are not in the detection area.
7. The unsupervised under-bridge pedestrian intrusion detection method of claim 1, wherein the pedestrian location information comprises:
the frame number F in which the pedestrian appears, the horizontal position coordinate X of the upper left corner of the pedestrian bounding box, the vertical position coordinate Y of the upper left corner of the pedestrian bounding box, the width W of the pedestrian bounding box, and the height H of the pedestrian bounding box.
8. An unsupervised under-bridge pedestrian intrusion detection device, comprising:
the acquisition module is used for acquiring an original image;
the marking module is used for inputting the original image into a pre-trained neural network model to obtain a marked image which is output by the neural network model and is used for marking pedestrian position information;
the processing module is used for processing the marked image by using a Kalman filtering multi-target tracking algorithm and a Hungarian algorithm to obtain the trajectory data of each pedestrian in the marked image;
and the detection module is used for detecting abnormal data in the track data and outputting the abnormal data.
9. The unsupervised under-bridge pedestrian intrusion detection device of claim 8, wherein the detection module comprises:
an acquisition unit configured to acquire historical track data;
the clustering unit is used for clustering the historical track data by adopting a clustering algorithm to obtain regular path information describing routes frequently traveled by pedestrians;
the preprocessing unit is used for preprocessing the track data;
the comparison unit is used for comparing the preprocessed track data with the conventional path information and marking the track data deviating from the conventional path information as first abnormal data;
the marking unit is used for marking the track data whose dwell time exceeds a threshold value in the preprocessed track data as second abnormal data;
and the output unit is used for outputting the first abnormal data and the second abnormal data.
10. The unsupervised under-bridge pedestrian intrusion detection device of claim 9, wherein the preprocessing unit is configured to:
extracting direction features of the track points in the track data to eliminate noise in the track data;
and filtering out abnormal tracks whose track starting point and track ending point are not in the detection area.
CN202011357277.1A 2020-11-27 2020-11-27 Method and device for detecting pedestrian intrusion under unsupervised bridge Pending CN112418118A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011357277.1A CN112418118A (en) 2020-11-27 2020-11-27 Method and device for detecting pedestrian intrusion under unsupervised bridge

Publications (1)

Publication Number Publication Date
CN112418118A true CN112418118A (en) 2021-02-26

Family

ID=74843690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011357277.1A Pending CN112418118A (en) 2020-11-27 2020-11-27 Method and device for detecting pedestrian intrusion under unsupervised bridge

Country Status (1)

Country Link
CN (1) CN112418118A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101621615A (en) * 2009-07-24 2010-01-06 南京邮电大学 Self-adaptive background modeling and moving target detecting method
CN103150901A (en) * 2013-02-05 2013-06-12 长安大学 Abnormal traffic condition detection method based on vehicle motion vector field analysis
US20190294869A1 (en) * 2018-03-26 2019-09-26 Nvidia Corporation Object behavior anomaly detection using neural networks
CN110674712A (en) * 2019-09-11 2020-01-10 苏宁云计算有限公司 Interactive behavior recognition method and device, computer equipment and storage medium
CN111126152A (en) * 2019-11-25 2020-05-08 国网信通亿力科技有限责任公司 Video-based multi-target pedestrian detection and tracking method
CN111860318A (en) * 2020-07-20 2020-10-30 杭州品茗安控信息技术股份有限公司 Construction site pedestrian loitering detection method, device, equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhong Luming: "Video target tracking method under SIFT-based dynamic background", Journal of Nanchang Institute of Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113391642A (en) * 2021-05-28 2021-09-14 西南交通大学 Unmanned aerial vehicle autonomous obstacle avoidance method and system based on monocular vision
CN114187546A (en) * 2021-12-01 2022-03-15 山东大学 Combined action recognition method and system
CN114187546B (en) * 2021-12-01 2024-06-25 山东大学 Combined action recognition method and system

Similar Documents

Publication Publication Date Title
CN109035299B (en) Target tracking method and device, computer equipment and storage medium
CN103699905B (en) Method and device for positioning license plate
CN110706261A (en) Vehicle violation detection method and device, computer equipment and storage medium
CN112434566B (en) Passenger flow statistics method and device, electronic equipment and storage medium
CN113343985B (en) License plate recognition method and device
CN111931582A (en) Image processing-based highway traffic incident detection method
CN112634329B (en) Scene target activity prediction method and device based on space-time and or graph
CN115018840B (en) Method, system and device for detecting cracks of precision casting
CN112836683B (en) License plate recognition method, device, equipment and medium for portable camera equipment
CN112418118A (en) Method and device for detecting pedestrian intrusion under unsupervised bridge
CN112530159B (en) Self-calibration type multi-lane-level traffic flow detection method and electronic equipment
CN112989962A (en) Track generation method and device, electronic equipment and storage medium
CN110866428A (en) Target tracking method and device, electronic equipment and storage medium
US20200342247A1 (en) Systems and methods for automatic recognition of vehicle information
CN110298302B (en) Human body target detection method and related equipment
CN111553184A (en) Small target detection method and device based on electronic purse net and electronic equipment
CN114373162A (en) Dangerous area personnel intrusion detection method and system for transformer substation video monitoring
CN112784494B (en) Training method of false positive recognition model, target recognition method and device
CN117765348A (en) Target detection model deployment method, target detection method and electronic equipment
CN111353342A (en) Shoulder recognition model training method and device, and people counting method and device
CN111210634B (en) Intelligent traffic information processing method and device, intelligent traffic system and server
CN115311458A (en) Real-time expressway pedestrian intrusion event detection method based on multi-task learning
CN114445787A (en) Non-motor vehicle weight recognition method and related equipment
CN112232124A (en) Crowd situation analysis method, video processing device and device with storage function
CN114926973B (en) Video monitoring method, device, system, server and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210226