CN112598709A - Pedestrian movement speed intelligent sensing method based on video stream - Google Patents

Pedestrian movement speed intelligent sensing method based on video stream Download PDF

Info

Publication number
CN112598709A
CN112598709A (application CN202011559927.0A)
Authority
CN
China
Prior art keywords
pedestrian
coordinate system
frame
camera
video stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011559927.0A
Other languages
Chinese (zh)
Other versions
CN112598709B (en)
Inventor
寄珊珊
李特
孟启炜
朱世强
顾建军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202011559927.0A priority Critical patent/CN112598709B/en
Publication of CN112598709A publication Critical patent/CN112598709A/en
Application granted granted Critical
Publication of CN112598709B publication Critical patent/CN112598709B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30204Marker
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30232Surveillance

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian movement speed intelligent sensing method based on video streaming, which comprises the following steps: acquiring the internal and external parameters of a camera by a camera calibration method, and establishing a coordinate conversion model from the image coordinate system to the space coordinate system; acquiring the real-time video stream of a monitoring camera and carrying out pedestrian target detection and tracking to obtain a pedestrian ID and a coarse detection-frame positioning result in the corresponding image coordinate system; performing part segmentation on each detected pedestrian with a human body part segmentation model to obtain a fine pedestrian foot-point positioning result in the image coordinate system; and using the coordinate conversion model to resolve the pedestrian foot-point position from the image coordinate system into the real-world space coordinate system, then calculating the pedestrian movement speed from the frame interval time of the video stream, thereby realizing intelligent sensing of pedestrian movement speed based on the camera video stream.

Description

Pedestrian movement speed intelligent sensing method based on video stream
Technical Field
The invention relates to the technical field of machine vision and deep learning, in particular to a pedestrian movement speed intelligent sensing method based on video streaming.
Background
Intelligent sensing of pedestrians under monitoring cameras, including detection, tracking, spatial position localization, and real-time movement speed calculation, is of great significance for building intelligent security systems and freeing up manpower.
Traditional methods acquire the motion information of a target pedestrian through external sensors or devices worn by the pedestrian, which gives a poor user experience. Vision-based intelligent algorithms achieve non-contact perception of pedestrian spatial information and calculation of movement speed, but their accuracy is limited by the precision of the model converting from the image coordinate system to the space coordinate system, and is particularly sensitive to the precision of the pedestrian detection frame.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a pedestrian movement speed intelligent sensing method based on video streaming.
The purpose of the invention is realized by the following technical scheme: a pedestrian movement speed intelligent sensing method based on video streaming comprises the following steps:
Step one, acquiring calibration pictures under the camera, calibrating the camera, obtaining the internal parameters and external parameters of the monitoring camera, and establishing a coordinate conversion model from the image coordinate system to the world coordinate system;
Step two, acquiring a real-time video stream of the monitoring camera according to a streaming media protocol, detecting and tracking pedestrians by adopting a pedestrian detection model and a multi-target tracking model, outputting pedestrian IDs and detection frames, and obtaining the pedestrian coarse positioning result in the image coordinate system;
Step three, acquiring pedestrian pictures according to the pedestrian detection frames from step two, inputting them to a trained human body part segmentation model for pedestrian part segmentation, and obtaining the fine pedestrian foot-point positioning result in the image coordinate system;
Step four, performing coordinate conversion calculation on the pixel information of the pedestrian foot points in the image coordinate system with the coordinate conversion model established in step one, obtaining the pedestrian's spatial coordinates in a system whose XY plane is the ground plane and whose Z axis is perpendicular to the ground; combining the spatial coordinate information with the pedestrian tracking algorithm to obtain the spatial moving distance of the target pedestrian across interval frames, and then calculating the pedestrian movement speed from the interval frame time.
Furthermore, the camera calibration method comprises obtaining the internal parameters and external parameters of the camera. The internal parameters are obtained by a checkerboard calibration method and comprise the intrinsic matrix $M_C$ and the distortion coefficient dist; the external parameters are calculated with an ArUco marker two-dimensional code calibration cloth, using the PNP method in combination with the monitoring camera's internal parameters, and comprise a rotation matrix R and a translation matrix T.
further, the third step comprises the following substeps:
(3.1) according to the pedestrian coarse positioning result obtained in the step two, cutting all pedestrian pictures from the original image frame of the real-time video stream of the monitoring camera for batch processing, inputting a trained human body part segmentation model for pedestrian segmentation, and outputting a pedestrian part segmentation result;
(3.2) judging whether the video frame image contains the pedestrian foot points according to the pedestrian segmentation result, and if not, not carrying out coordinate conversion calculation; if the pedestrian foot points are contained, the pedestrian foot points are converted into pixel coordinates of the original video image frame through scale conversion, then the lower frame of the pedestrian detection frame is corrected according to the lower boundary result of the pedestrian foot point pixels, and the middle point of the connecting line of the lower left corner and the lower right corner of the corrected pedestrian detection frame is output and used as a pedestrian foot point fine positioning output value.
Further, the training process of the human body part segmentation model is as follows: the PPSS dataset is used as the training set and input to the human body part segmentation model, training proceeds by stochastic gradient descent with an initial learning rate of 0.03, the learning rate is updated through a linear decay schedule, and training of the human body part segmentation model finishes when the number of training iterations or the loss function reaches a threshold.
Further, the pedestrian movement speed $v_i$ is calculated as:

$$v_i = \frac{f}{k} \sqrt{\left(x_i^N - x_i^{N-k}\right)^2 + \left(y_i^N - y_i^{N-k}\right)^2}$$

where $x_i^{N-k}$ and $y_i^{N-k}$ are the abscissa and ordinate of the pedestrian with ID $i$ in the ground two-dimensional coordinate system at frame $N-k$, $x_i^N$ and $y_i^N$ are the abscissa and ordinate at frame $N$, $f$ is the video stream frame rate, and $k$ is the number of interval frames.
Compared with the prior art, the invention has the following beneficial effects. A non-contact intelligent sensing method is adopted: a coordinate conversion model resolves the spatial position information, and the pedestrian movement speed is calculated from the frame interval time of the video stream. In the conversion from the image coordinate system to the space coordinate system, the model precision is limited by the pedestrian foot-point position information; the detection frame obtained from the pedestrian detection model alone is relatively coarse, and when a pedestrian is occluded, the lower boundary given by the detection frame is not the true lower boundary of the pedestrian's foot position, which easily causes a large error. On top of the detection frame output by pedestrian detection, a pedestrian part segmentation network segments the human body parts, so foot-point judgment can be performed and a more accurate pixel-level foot-point position in the image coordinate system obtained. A more accurate pedestrian spatial position therefore follows, the calculation accuracy of the subsequent movement speed perception improves correspondingly, and centimeter-level accuracy can be reached.
Drawings
Fig. 1 is a flow chart of a pedestrian movement speed intelligent sensing method based on video streaming.
Detailed Description
Fig. 1 is a flow chart of a pedestrian movement speed intelligent sensing method based on video streaming, and the pedestrian movement speed intelligent sensing method includes the following steps:
Step one, acquiring calibration pictures under the camera, calibrating the camera, obtaining the internal parameters and external parameters of the monitoring camera, and establishing a coordinate conversion model from the image coordinate system to the world coordinate system.
A checkerboard calibration board with 11 × 8 inner corner points is manufactured, and about 20 checkerboard calibration pictures are collected at different positions, angles, and poses. Corners are detected in each calibration picture and refined to sub-pixel accuracy, and pictures with fewer than 88 detected corners are removed. The camera is then calibrated according to Zhang's calibration method to obtain the camera intrinsic matrix $M_C$ and the distortion coefficient dist; precision is judged with the re-projection error, distortion correction is performed according to the distortion coefficient, and the corrected and optimized camera internal parameters are output.
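As a concrete illustration, a minimal OpenCV sketch of this intrinsic calibration step follows; it is a sketch under stated assumptions, not the patent's implementation: the file paths, the sub-pixel termination criteria, and the unit square size are ours.

```python
import glob
import cv2
import numpy as np

PATTERN = (11, 8)  # inner-corner grid of the checkerboard described above

# Board corner positions on the Z = 0 plane, in units of one (assumed) square
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib/*.jpg"):   # ~20 shots at varied positions/poses
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if not found:                       # fewer than 88 corners: discard frame
        continue
    corners = cv2.cornerSubPix(         # refine corners to sub-pixel accuracy
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
    obj_points.append(objp)
    img_points.append(corners)

# Zhang's method: recovers the intrinsic matrix M_C and distortion coeffs dist
rms, M_C, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("re-projection RMS error:", rms)  # the precision check described above
```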
the method comprises the steps of generating two-dimensional code markers for external parameter calibration by adopting an ArUco two-dimensional code dictionary, manufacturing calibration cloth, wherein the number of the two-dimensional code markers on the calibration cloth is 5 × 5, the length of each marker is 50cm, the interval between the markers is also 50cm, and the larger size of the marker is selected to meet the requirement that a monitoring camera can identify the two-dimensional code mark points at the far position in a monitoring vision field. The method comprises the steps of placing a two-dimensional code calibration cloth on the ground flatly, collecting a picture, detecting two-dimensional code mark points in the calibrated picture, obtaining pixel coordinates of the mark points when the number of the detected two-dimensional code mark points is more than or equal to 4, carrying out PNP calculation by combining internal parameters of a camera, calculating a camera rotation matrix R and a translation matrix T, and obtaining external parameters of the camera.
According to the field of view of the monitoring camera deployed in the indoor monitoring scene, a three-dimensional world coordinate system is established with the ground plane as the two-dimensional XY plane and the direction perpendicular to the ground as the Z axis, and a suitable reference point is selected as the coordinate origin. The conversion relation between the image coordinate system and the world coordinate system is then established by combining the camera's internal and external parameters:
$$Z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M_C \left( R \begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} + T \right)$$

Multiplying through by the inverse matrices (using $P^{-1}P = E$) gives:

$$\begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} = R^{-1} \left( Z_C \, M_C^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} - T \right)$$

which is the conversion formula from the two-dimensional image coordinate system to the three-dimensional world coordinate system. Here $Z_C$ denotes the depth in the camera coordinate system, $[u, v]$ the image pixel coordinates, $[X_W, Y_W, Z_W]$ the world coordinates of the pedestrian target, $R$ the monitoring camera rotation matrix, $T$ the monitoring camera translation matrix, and $M_C$ the monitoring camera intrinsic matrix. The unknown parameter $Z_C$ is solved as follows: let

$$P_1 = R^{-1} M_C^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \qquad P_2 = R^{-1} T,$$

so that $[X_W, Y_W, Z_W]^T = Z_C P_1 - P_2$. Because the third component of $[u, v, 1]^T$ is 1, the third row yields $Z_W = Z_C \cdot P_1[2] - P_2[2]$, and therefore $Z_C = (Z_W + P_2[2]) / P_1[2]$.

All parameters of the conversion from the two-dimensional image coordinate system to the three-dimensional world coordinate system are now available. Substituting the pedestrian foot-point coordinates $[u, v]$ in the image coordinate system, and setting the coordinate along the axis perpendicular to the ground to $Z_W = 0$, yields the pedestrian's coordinate output in the space coordinate system.
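For illustration, the solved back-projection reduces to a few lines of linear algebra. A minimal sketch, assuming NumPy, undistorted pixel coordinates, and the $M_C$, $R$, $T$ obtained above (the helper name is ours, not the patent's):

```python
import numpy as np

def pixel_to_world(u, v, M_C, R, T, Zw=0.0):
    """Back-project pixel [u, v] onto the plane Z_W = Zw (the ground).

    Implements Z_C = (Z_W + P2[2]) / P1[2] with P1 = R^-1 M_C^-1 [u, v, 1]^T
    and P2 = R^-1 T, then [X_W, Y_W, Z_W]^T = Z_C * P1 - P2.
    """
    uv1 = np.array([u, v, 1.0]).reshape(3, 1)
    R_inv = np.linalg.inv(R)
    P1 = R_inv @ np.linalg.inv(M_C) @ uv1
    P2 = (R_inv @ T).reshape(3, 1)
    Zc = (Zw + P2[2, 0]) / P1[2, 0]          # depth along the camera axis
    return (Zc * P1 - P2).flatten()          # [X_W, Y_W, Z_W], Z_W == Zw
```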
Step two, acquiring a real-time video stream of the monitoring camera according to a streaming media protocol, wherein the frame rate is 25 fps and the resolution is 1920 × 1080; adopting a pedestrian detection model and a multi-target tracking model to detect and track pedestrians, outputting pedestrian IDs and detection frames, and obtaining the pedestrian coarse positioning result in the image coordinate system.
the pedestrian detection model adopts a YoloV3 algorithm, and a backbone network is Darknet 53; the size of the input picture after preprocessing is 416, the number of detection frames output by the network is [ (13 × 13) + (26 × 26) + (52 × 52) ] × 3-10647, then the detection frames output by the non-maximum suppression NMS algorithm are subjected to post-processing, and frame screening is carried out according to the set threshold and the confidence coefficient to obtain the output of pedestrian detection;
the multi-target tracking model adopts a Deepsort-based multi-target tracking algorithm to continuously track the pedestrians, firstly, a Kalman filter is adopted to estimate the motion state of the pedestrians, and motion information association is carried out according to the Markov distance between the detected frame position of the current frame and the position predicted by the Kalman filter; appearance information association firstly adopts a pedestrian ReID feature extraction network based on Resnet50 to extract appearance features, the feature dimension is 512, then similarity calculation is carried out by adopting Euclidean distance, whether association is successful or not is judged through a set threshold value, and finally linear weighting is carried out by combining two associations of motion features and appearance features to serve as a final association result; and if the tracked target continuously exceeds 3 frames and is not matched with the detection frame, judging whether the target is deleted or not by setting the maximum survival period, thereby realizing continuous tracking of multiple pedestrians under continuous frame video images and obtaining the ID of the pedestrian.
Step three, acquiring a pedestrian picture according to the pedestrian detection frame in the step two, inputting a trained human body part segmentation model for segmenting the pedestrian part, and acquiring a pedestrian foot point fine positioning result under an image coordinate system;
the human body part segmentation model adopts a semantic segmentation network DeepLab V3+, a light-weight network MobileNet V2 is adopted as a backbone network, the segmentation model comprises an encoder module and a decoder module, the encoder module performs feature extraction through the series connection of a plurality of layers of convolutional layers, and then multi-scale features are obtained through a void space pyramid pooling module; the decoder module performs two times of upsampling by using the bilinear difference value, wherein the first time of upsampling fuses bottom layer features with the same resolution ratio features in the decoder module network, and the feature pixels restored to the original input size can be output by the segmentation network after two times of upsampling. The training process of the human body part segmentation model comprises the following steps: the method comprises the steps of inputting a human body part segmentation model by using a PPSS dataset as a training set, training by using a random gradient descent mode, setting an initial learning rate to be 0.03, updating the learning rate by using a linear attenuation learning strategy, finishing training of the human body part segmentation model after the training times or the Loss function reaches a threshold value, and adopting a Softmax Loss function as a Loss function of the human body part segmentation model by a person skilled in the art.
The third step specifically comprises the following substeps:
(3.1) According to the pedestrian coarse positioning result from step two, all pedestrian pictures are cropped from the original image frame of the monitoring camera's real-time video stream and batched into an input tensor of shape N × C × H × W for the trained human body part segmentation model, where N is the number of pedestrian targets in the single frame, C = 3 is the number of input image channels, H = 160 is the image height, and W = 80 is the image width. The feature encoding unit produces a dimensionality-reduced semantic feature map, the feature decoding unit restores features to the original input resolution after two upsampling steps, and the output feature pixel values take values in the set {0, 1, 2, 3, 4, 5, 6, 7}, representing the human body parts hair, face, upper body, arms, lower body, legs, feet, and background respectively;
(3.2) After the human body part segmentation model runs, a fine-grained pixel-level semantic segmentation result is obtained on top of the pedestrian detection frame, allowing the detection frame to be further corrected. Whether the video frame image contains the pedestrian's foot points is judged from the segmentation result: foot pixels carry the label value 6, so if no pixel with value 6 is present, the current pedestrian is judged to have no visible foot points and no coordinate conversion is performed. Note that missing foot points may be due to occlusion between pedestrians; when the target reappears, tracking continues via the pedestrian tracking algorithm, and the subsequent movement speed is computed from the recorded time interval and moving distance. If foot points are present, they are converted to pixel coordinates of the original video frame through scale conversion, the lower edge of the pedestrian detection frame is corrected according to the lower boundary of the foot-point pixels, and the midpoint of the line connecting the lower-left and lower-right corners of the corrected detection frame is output as the fine foot-point positioning value.
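A sketch of this foot-point check and detection-frame correction, assuming a NumPy label map from the segmentation model (the function name and the uniform vertical rescaling are ours):

```python
import numpy as np

FOOT_LABEL = 6   # pixel value that marks feet in the part-segmentation output

def refine_foot_point(part_mask, box):
    """Refine a detection box's lower edge from the part-segmentation mask.

    part_mask: H x W label map (values 0..7) for the resized pedestrian crop.
    box: [x1, y1, x2, y2] of the crop in original-frame pixel coordinates.
    Returns the fine foot-point [u, v] in frame coordinates, or None.
    """
    ys, _ = np.nonzero(part_mask == FOOT_LABEL)
    if ys.size == 0:
        return None                        # feet occluded: skip conversion

    x1, y1, x2, y2 = box
    scale_y = (y2 - y1) / part_mask.shape[0]   # undo the crop's resize
    y2_refined = y1 + ys.max() * scale_y       # corrected lower edge

    # Midpoint of the corrected bottom edge = fine foot-point output value
    return np.array([(x1 + x2) / 2.0, y2_refined])
```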
Step four, coordinate conversion calculation is performed on the pixel information of the pedestrian foot points in the image coordinate system with the coordinate conversion model established in step one, obtaining the pedestrian's spatial coordinates in a system whose XY plane is the ground plane and whose Z axis is perpendicular to the ground; the spatial coordinate information is combined with the pedestrian tracking algorithm to obtain the spatial moving distance of the target pedestrian across interval frames, and the pedestrian movement speed is then calculated from the interval frame time.
(4.1) The two-dimensional image coordinates $[u, v]$ of the pedestrian foot point, determined from the pedestrian detection frame and the pedestrian part segmentation network, are substituted into the image-to-world conversion model established in step one:

$$\begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} = R^{-1} \left( Z_C \, M_C^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} - T \right)$$

With the corresponding parameters substituted, the pedestrian foot-point coordinates $[X, Y, Z]$ in the world coordinate system are obtained;
(4.2) Combined with the pedestrian tracking algorithm, the pedestrian ID and the corresponding pedestrian space coordinate information under consecutive interval frames are obtained. Ideally the height of a pedestrian's foot point on the ground is 0, i.e. $Z = 0$, so the XY coordinates $[X, Y]$ of the foot point on the two-dimensional ground plane are obtained. Suppose a continuously tracked pedestrian has identity ID $i$, ground two-dimensional coordinates $(x_i^{N-k}, y_i^{N-k})$ at frame $N-k$, and ground two-dimensional coordinates $(x_i^N, y_i^N)$ at frame $N$; the video stream frame rate is $f$ and $k$ is the number of interval frames. The distance the pedestrian has moved on the two-dimensional plane by the current frame $N$ is the Euclidean distance between the two positions, and combining it with the interval frame time gives the pedestrian movement speed:

$$v_i = \frac{f}{k} \sqrt{\left(x_i^N - x_i^{N-k}\right)^2 + \left(y_i^N - y_i^{N-k}\right)^2}$$
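A direct sketch of this speed computation (the function name and the k = 5 default are illustrative choices, not the patent's):

```python
import math

def pedestrian_speed(pos_prev, pos_now, f=25.0, k=5):
    """Ground-plane speed of one tracked ID between frames N-k and N.

    pos_prev, pos_now: (x, y) ground coordinates at frames N-k and N, metres;
    f: video stream frame rate in Hz; k: number of interval frames.
    """
    dx = pos_now[0] - pos_prev[0]
    dy = pos_now[1] - pos_prev[1]
    return math.hypot(dx, dy) * f / k      # Euclidean distance / (k / f)
```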
and sending the pedestrian target ID and the pedestrian movement speed to the front end for visual display through the message middleware, thereby finishing intelligent perception of the pedestrian movement speed based on the video stream.
In summary, the video stream is acquired by a camera deployed at the monitoring point, the real-time movement speed of the target pedestrian is obtained by a non-contact visual perception algorithm, behaviors such as a pedestrian standing still or moving abnormally fast are judged, and effective early warning and customized services are provided for a pedestrian intelligent perception system in a monitoring scene. The method realizes coordinate conversion from the image coordinate system to the real-world space coordinate system through camera calibration; because the conversion precision is limited by the foot-point positioning result in the image coordinate system, foot-point judgment and fine positioning are further performed with a human body part segmentation model on top of the pedestrian detection frame, making the video-stream-based intelligent perception of pedestrian movement speed more accurate.

Claims (5)

1. A pedestrian movement speed intelligent sensing method based on video streaming is characterized by comprising the following steps:
Step one, acquiring calibration pictures under the camera, calibrating the camera, obtaining the internal parameters and external parameters of the monitoring camera, and establishing a coordinate conversion model from the image coordinate system to the world coordinate system;
Step two, acquiring a real-time video stream of the monitoring camera according to a streaming media protocol, detecting and tracking pedestrians by adopting a pedestrian detection model and a multi-target tracking model, outputting pedestrian IDs and detection frames, and obtaining the pedestrian coarse positioning result in the image coordinate system;
Step three, acquiring pedestrian pictures according to the pedestrian detection frames from step two, inputting them to a trained human body part segmentation model for pedestrian part segmentation, and obtaining the fine pedestrian foot-point positioning result in the image coordinate system;
Step four, performing coordinate conversion calculation on the pixel information of the pedestrian foot points in the image coordinate system with the coordinate conversion model established in step one, obtaining the pedestrian's spatial coordinates in a system whose XY plane is the ground plane and whose Z axis is perpendicular to the ground; combining the spatial coordinate information with the pedestrian tracking algorithm to obtain the spatial moving distance of the target pedestrian across interval frames, and then calculating the pedestrian movement speed from the interval frame time.
2. The method according to claim 1, wherein the camera calibration method comprises obtaining the internal parameters and external parameters of the camera; the internal parameters are obtained by a checkerboard calibration method and comprise the intrinsic matrix $M_C$ and the distortion coefficients dist, and the external parameters are calculated with an ArUco marker two-dimensional code calibration cloth, using the PNP method in combination with the internal parameters of the monitoring camera, and comprise a rotation matrix R and a translation matrix T.
3. The video stream-based pedestrian motion speed intelligent sensing method according to claim 1, wherein the third step comprises the following sub-steps:
(3.1) according to the pedestrian coarse positioning result obtained in the step two, cutting all pedestrian pictures from the original image frame of the real-time video stream of the monitoring camera for batch processing, inputting a trained human body part segmentation model for pedestrian segmentation, and outputting a pedestrian part segmentation result;
(3.2) judging whether the video frame image contains the pedestrian foot points according to the pedestrian segmentation result, and if not, not carrying out coordinate conversion calculation; if the pedestrian foot points are contained, the pedestrian foot points are converted into pixel coordinates of the original video image frame through scale conversion, then the lower frame of the pedestrian detection frame is corrected according to the lower boundary result of the pedestrian foot point pixels, and the middle point of the connecting line of the lower left corner and the lower right corner of the corrected pedestrian detection frame is output and used as a pedestrian foot point fine positioning output value.
4. The pedestrian movement speed intelligent sensing method based on the video stream according to claim 1, wherein the training process of the human body part segmentation model is as follows: the PPSS dataset is used as the training set and input to the human body part segmentation model, training proceeds by stochastic gradient descent with an initial learning rate of 0.03, the learning rate is updated through a linear decay schedule, and training of the human body part segmentation model finishes when the number of training iterations or the loss function reaches a threshold.
5. The video stream-based pedestrian movement speed intelligent sensing method according to claim 1, wherein the pedestrian movement speed $v_i$ is calculated as:

$$v_i = \frac{f}{k} \sqrt{\left(x_i^N - x_i^{N-k}\right)^2 + \left(y_i^N - y_i^{N-k}\right)^2}$$

where $x_i^{N-k}$ and $y_i^{N-k}$ are the abscissa and ordinate of the pedestrian with ID $i$ in the ground two-dimensional coordinate system at frame $N-k$, $x_i^N$ and $y_i^N$ are the abscissa and ordinate at frame $N$, $f$ is the video stream frame rate, and $k$ is the number of interval frames.
CN202011559927.0A 2020-12-25 2020-12-25 Pedestrian movement speed intelligent sensing method based on video stream Active CN112598709B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011559927.0A CN112598709B (en) 2020-12-25 2020-12-25 Pedestrian movement speed intelligent sensing method based on video stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011559927.0A CN112598709B (en) 2020-12-25 2020-12-25 Pedestrian movement speed intelligent sensing method based on video stream

Publications (2)

Publication Number Publication Date
CN112598709A true CN112598709A (en) 2021-04-02
CN112598709B CN112598709B (en) 2022-11-01

Family

ID=75201948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011559927.0A Active CN112598709B (en) 2020-12-25 2020-12-25 Pedestrian movement speed intelligent sensing method based on video stream

Country Status (1)

Country Link
CN (1) CN112598709B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658221A (en) * 2021-07-28 2021-11-16 同济大学 Monocular camera-based AGV pedestrian following method
CN113840228A (en) * 2021-08-25 2021-12-24 北京航空航天大学杭州创新研究院 Pedestrian indoor positioning method based on positioning matching

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120237086A1 (en) * 2009-12-03 2012-09-20 National Institute Of Advanced Industrial Science And Technology Moving body positioning device
US20140348382A1 (en) * 2013-05-22 2014-11-27 Hitachi, Ltd. People counting device and people trajectory analysis device
US20180075593A1 (en) * 2016-09-15 2018-03-15 Qualcomm Incorporated Automatic scene calibration method for video analytics
US20200005490A1 (en) * 2016-09-28 2020-01-02 Chung Ang University Academic Cooperation Foundation Normalized metadata generation device, object occlusion detection device and method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Cai, H., et al.: "Multi-scale body-part mask guided attention for person reidentification", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
Mei Hongxiang: "A target localization method based on image coordinate detection", Computer & Digital Engineering *
Chen Cheng et al.: "Efficient recovery of 3D human motion from monocular video", Journal of Computer-Aided Design & Computer Graphics *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658221A (en) * 2021-07-28 2021-11-16 同济大学 Monocular camera-based AGV pedestrian following method
CN113658221B (en) * 2021-07-28 2024-04-26 同济大学 AGV pedestrian following method based on monocular camera
CN113840228A (en) * 2021-08-25 2021-12-24 北京航空航天大学杭州创新研究院 Pedestrian indoor positioning method based on positioning matching
CN113840228B (en) * 2021-08-25 2024-04-02 北京航空航天大学杭州创新研究院 Pedestrian indoor positioning method based on positioning matching

Also Published As

Publication number Publication date
CN112598709B (en) 2022-11-01

Similar Documents

Publication Publication Date Title
CN111462200B (en) Cross-video pedestrian positioning and tracking method, system and equipment
CN106846359B (en) Moving target rapid detection method based on video sequence
CN110569704B (en) Multi-strategy self-adaptive lane line detection method based on stereoscopic vision
JP6095018B2 (en) Detection and tracking of moving objects
Sidla et al. Pedestrian detection and tracking for counting applications in crowded situations
CN109934848B (en) Method for accurately positioning moving object based on deep learning
CN108629946B (en) Human body falling detection method based on RGBD sensor
CN107481315A (en) A kind of monocular vision three-dimensional environment method for reconstructing based on Harris SIFT BRIEF algorithms
CN104598883B (en) Target knows method for distinguishing again in a kind of multiple-camera monitoring network
CN112801074B (en) Depth map estimation method based on traffic camera
CN112598709B (en) Pedestrian movement speed intelligent sensing method based on video stream
CN111027432B (en) Gait feature-based visual following robot method
CN104200492B (en) Video object automatic detection tracking of taking photo by plane based on profile constraints
CN110941996A (en) Target and track augmented reality method and system based on generation of countermeasure network
CN110516639B (en) Real-time figure three-dimensional position calculation method based on video stream natural scene
CN107481267A (en) A kind of shooting projection interactive system and method based on binocular vision
CN115376034A (en) Motion video acquisition and editing method and device based on human body three-dimensional posture space-time correlation action recognition
CN116309686A (en) Video positioning and speed measuring method, device and equipment for swimmers and storage medium
KR100574227B1 (en) Apparatus and method for separating object motion from camera motion
CN107045630B (en) RGBD-based pedestrian detection and identity recognition method and system
CN115880643B (en) Social distance monitoring method and device based on target detection algorithm
CN108986162B (en) Dish and background segmentation method based on inertial measurement unit and visual information
Vigus et al. Video object tracking using region split and merge and a Kalman filter tracking algorithm
CN114170317B (en) Swimming pool drowning prevention head position judging method and device and computer equipment
CN112395985B (en) Ground unmanned vehicle vision road detection method based on unmanned aerial vehicle image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant