CN112598709B - Pedestrian movement speed intelligent sensing method based on video stream - Google Patents

Pedestrian movement speed intelligent sensing method based on video stream

Info

Publication number
CN112598709B
CN112598709B (application CN202011559927.0A)
Authority
CN
China
Prior art keywords
pedestrian
coordinate system
frame
camera
video stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011559927.0A
Other languages
Chinese (zh)
Other versions
CN112598709A (en)
Inventor
寄珊珊
李特
孟启炜
朱世强
顾建军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202011559927.0A priority Critical patent/CN112598709B/en
Publication of CN112598709A publication Critical patent/CN112598709A/en
Application granted granted Critical
Publication of CN112598709B publication Critical patent/CN112598709B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30204Marker
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30232Surveillance

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian movement speed intelligent sensing method based on a video stream, which comprises the following steps: obtaining the internal and external parameters of a camera by a camera calibration method, and establishing a coordinate conversion model from the image coordinate system to a spatial coordinate system; acquiring a real-time video stream from a monitoring camera and performing pedestrian target detection and tracking to obtain a pedestrian ID and a coarse detection-frame positioning result in the corresponding image coordinate system; segmenting each detected pedestrian with a human body part segmentation model to obtain a fine positioning result of the pedestrian foot points in the image coordinate system; using the coordinate conversion model to convert the positions of the pedestrian foot points from the image coordinate system to the real-world spatial coordinate system, and calculating the pedestrian movement speed in combination with the frame interval time of the video stream, thereby realizing intelligent sensing of pedestrian movement speed based on the camera video stream.

Description

Pedestrian movement speed intelligent sensing method based on video stream
Technical Field
The invention relates to the technical field of machine vision and deep learning, in particular to a pedestrian movement speed intelligent sensing method based on video streaming.
Background
Intelligent sensing of pedestrians under a monitoring camera, including detection, tracking, spatial position positioning and real-time movement speed calculation, is of great significance for building intelligent security systems and reducing manual monitoring effort.
Traditional methods acquire the motion information of a target pedestrian by means of external sensors or devices worn by the pedestrian, which gives a poor user experience. Vision-based intelligent algorithms achieve contactless perception of pedestrian spatial information and movement speed, but their accuracy is limited by the precision of the model that converts from the image coordinate system to the spatial coordinate system, and is particularly sensitive to the precision of the pedestrian detection frame.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a pedestrian movement speed intelligent perception method based on video streaming.
The purpose of the invention is realized by the following technical scheme: a pedestrian movement speed intelligent sensing method based on video streaming comprises the following steps:
step one, acquiring a calibration picture under the camera, calibrating the camera, acquiring the internal parameters and external parameters of the monitoring camera, and establishing a coordinate conversion model from the image coordinate system to the world coordinate system;
step two, acquiring a real-time video stream of a monitoring camera according to a streaming media protocol, detecting and tracking pedestrians by adopting a pedestrian detection model and a multi-target tracking model, outputting a pedestrian ID and a detection frame, and obtaining a pedestrian coarse positioning result under an image coordinate system;
step three, acquiring a pedestrian picture according to the pedestrian detection frame in the step two, inputting a trained human body part segmentation model for segmenting the pedestrian part, and acquiring a pedestrian foot point fine positioning result under an image coordinate system;
step four, according to the pixel information of the pedestrian foot points in the image coordinate system, carrying out coordinate conversion calculation with the coordinate conversion model established in step one to obtain the spatial coordinate information of the pedestrian in a coordinate system taking the ground plane as the two-dimensional XY plane and the axis perpendicular to the ground as the Z axis; combining the spatial coordinate information with the pedestrian tracking algorithm to obtain the spatial moving distance of the target pedestrian over the interval frames, and then calculating the pedestrian movement speed in combination with the interval frame time.
Furthermore, the camera calibration method obtains the internal parameters and external parameters of the camera. The internal parameters, comprising an internal parameter matrix M_C and a distortion coefficient dist, are obtained by a checkerboard calibration method; the external parameters, comprising a rotation matrix R and a translation matrix T, are solved by the PnP method using an ArUco marker two-dimensional code calibration cloth in combination with the internal parameters of the monitoring camera.
further, the third step comprises the following substeps:
(3.1) according to the pedestrian coarse positioning result obtained in the step two, cutting all pedestrian pictures from the original image frame of the real-time video stream of the monitoring camera for batch processing, inputting a trained human body part segmentation model for pedestrian segmentation, and outputting a pedestrian part segmentation result;
(3.2) judging whether the video frame image contains the pedestrian foot points according to the pedestrian segmentation result, and if not, not carrying out coordinate conversion calculation; if the pedestrian foot points are contained, the pedestrian foot points are converted into pixel coordinates of the original video image frame through scale conversion, then the lower frame of the pedestrian detection frame is corrected according to the lower boundary result of the pedestrian foot point pixels, and the middle point of the connecting line of the lower left corner and the lower right corner of the corrected pedestrian detection frame is output and used as a pedestrian foot point fine positioning output value.
Further, the training process of the human body part segmentation model comprises the following steps: the PPSS dataset is used as a training set, a human body part segmentation model is input, training is carried out in a random gradient descent mode, the initial learning rate is set to be 0.03, the learning rate is updated through a linear attenuation learning strategy, and when the training times or the loss function reaches a threshold value, training of the human body part segmentation model is completed.
Further, the pedestrian movement speed $\bar{v}_i$ is calculated as follows:

$$\bar{v}_i = \frac{f}{k}\sqrt{\left(x_N^i - x_{N-k}^i\right)^2 + \left(y_N^i - y_{N-k}^i\right)^2}$$

wherein $x_{N-k}^i$ and $y_{N-k}^i$ represent the abscissa and ordinate of the pedestrian with ID $i$ in the ground two-dimensional coordinate system at frame $N-k$, $x_N^i$ and $y_N^i$ represent the abscissa and ordinate of the same pedestrian at frame $N$, $f$ is the video stream frame rate, and $k$ is the number of interval frames.
Compared with the prior art, the invention has the following beneficial effects. A contactless intelligent sensing method is adopted: a coordinate conversion model solves the spatial position information, and the pedestrian movement speed is calculated in combination with the frame interval time of the video stream. In the conversion from the image coordinate system to the spatial coordinate system, the model accuracy is limited by the positioning of the pedestrian foot points; the detection frame obtained from the pedestrian detection model alone is relatively coarse, and when a pedestrian is occluded, the lower boundary given by the detection frame is not the lower boundary of the real foot point position, which easily causes large errors. On top of the detection frame output by pedestrian detection, a pedestrian part segmentation network segments the human body parts, which enables foot point judgment and yields more accurate pixel-level foot point positions in the image coordinate system. A more accurate pedestrian spatial position is therefore obtained, the accuracy of the subsequent pedestrian movement speed perception improves correspondingly, and centimeter-level accuracy can be reached.
Drawings
Fig. 1 is a flow chart of a pedestrian movement speed intelligent sensing method based on video streaming.
Detailed Description
Fig. 1 is a flow chart of a pedestrian movement speed intelligent sensing method based on video streaming, and the pedestrian movement speed intelligent sensing method includes the following steps:
Step one, acquiring a calibration picture under the camera, calibrating the camera, acquiring the internal parameters and external parameters of the monitoring camera, and establishing a coordinate conversion model from the image coordinate system to the world coordinate system.
A checkerboard calibration plate with 11 × 8 inner corner points is manufactured, and about 20 checkerboard calibration pictures are collected at different positions, angles and poses. Corner points are detected in each checkerboard picture and refined to sub-pixel accuracy, and pictures with fewer than 88 detected corner points are removed. The camera is then calibrated according to Zhang's calibration method to obtain the camera internal parameter matrix M_C and the distortion coefficient dist; the reprojection error is used to judge the calibration accuracy, distortion correction is performed according to the distortion coefficients, and the corrected and optimized internal camera parameters are output.
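For illustration only, a minimal OpenCV sketch of this intrinsic calibration step, assuming the 11 × 8 corner grid described above; the folder name `calib_images/` and the square size are hypothetical:

```python
import glob
import cv2
import numpy as np

PATTERN = (11, 8)   # inner corner grid, as described above
SQUARE = 0.025      # assumed square edge length in meters

# Template of 3D corner positions on the board plane (Z = 0).
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

obj_points, img_points = [], []
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3)

for path in glob.glob("calib_images/*.jpg"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if not found:
        continue  # drop pictures without all 88 detected corner points
    corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
    obj_points.append(objp)
    img_points.append(corners)

# Zhang's method: returns the RMS reprojection error, M_C and dist.
rms, M_C, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
```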
the method comprises the steps of generating two-dimensional code markers for external reference calibration by adopting an ArUco two-dimensional code dictionary, manufacturing calibration cloth, wherein the number of the two-dimensional code markers on the calibration cloth is 5 × 5, the length of each marker is 50cm, the interval between the markers is also 50cm, and selecting a larger marker size to meet the requirement that a monitoring camera can identify two-dimensional code mark points far away in a monitoring view field. The method comprises the steps of placing a two-dimensional code calibration cloth on the ground flatly, collecting a picture, detecting two-dimensional code mark points in the calibrated picture, obtaining pixel coordinates of the mark points when the number of the detected two-dimensional code mark points is more than or equal to 4, carrying out PNP calculation by combining internal parameters of a camera, calculating a camera rotation matrix R and a translation matrix T, and obtaining external parameters of the camera.
According to the field of view of the monitoring camera deployed in an indoor monitoring scene, a three-dimensional world coordinate system is established with the ground plane as the two-dimensional X and Y axes and the direction perpendicular to the ground as the Z axis, and a suitable reference point is selected as the coordinate origin. The conversion relation between the image coordinate system and the world coordinate system is then established from the camera's internal and external parameters:
$$Z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M_C \left( R \begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} + T \right)$$

Using $P^{-1}P = E$ (with $E$ the identity matrix), this yields the conversion formula from the two-dimensional image coordinate system to the three-dimensional world coordinate system:

$$\begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} = R^{-1} \left( M_C^{-1} Z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} - T \right)$$

wherein $Z_C$ denotes the camera-coordinate depth, $[u, v]$ the image pixel coordinates, $[X_W, Y_W, Z_W]$ the world coordinates of the pedestrian target, $R$ the monitoring camera rotation matrix, $T$ the monitoring camera translation matrix, and $M_C$ the monitoring camera internal parameter matrix. The unknown parameter $Z_C$ is solved as follows. Let

$$P_1 = R^{-1} M_C^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \qquad P_2 = R^{-1} T,$$

so that $[X_W, Y_W, Z_W]^T = Z_C \cdot P_1 - P_2$. The third row gives $Z_W = Z_C \cdot P_1[2] - P_2[2]$, and therefore $Z_C = (Z_W + P_2[2]) / P_1[2]$.

All parameters for the conversion from the two-dimensional image coordinate system to the three-dimensional world coordinate system are thus obtained. Substituting the pedestrian foot point coordinates $[u, v]$ in the image coordinate system and setting the coordinate along the axis perpendicular to the ground to $Z_W = 0$ yields the coordinate output of the pedestrian in the spatial coordinate system.
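The following minimal numpy sketch (an illustration under the stated assumption that foot points lie on the ground plane, not the patented implementation) shows this image-to-world solve:

```python
import numpy as np

def image_to_world(u, v, M_C, R, T, Z_w=0.0):
    """Back-project pixel (u, v) onto the world plane Z = Z_w."""
    uv1 = np.array([u, v, 1.0])
    R_inv = np.linalg.inv(R)
    P1 = R_inv @ np.linalg.inv(M_C) @ uv1  # pixel-dependent direction
    P2 = R_inv @ T.reshape(3)              # constant offset
    Z_c = (Z_w + P2[2]) / P1[2]            # solve the third row for Z_C
    return Z_c * P1 - P2                   # [X_W, Y_W, Z_W]
```

Setting `Z_w = 0` returns the pedestrian's ground-plane position directly from the foot point pixel.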
Step two, acquiring a real-time video stream of the monitoring camera according to a streaming media protocol, wherein the frame rate is 25fps, and the resolution is 1920 × 1080; adopting a pedestrian detection model and a multi-target tracking model to detect and track pedestrians, outputting a pedestrian ID and a detection frame, and obtaining a pedestrian coarse positioning result under an image coordinate system;
the pedestrian detection model adopts a YoloV3 algorithm, and a backbone network is Darknet53; the size of the input picture after preprocessing is 416 × 416, the number of detection frames output by the network is [ (13 × 13) + (26 × 26) + (52 × 52) ] + 3=10647, then the detection frames output by the non-maximum suppression NMS algorithm are subjected to post-processing, and frame screening is performed according to the set threshold and confidence coefficient to obtain the output of pedestrian detection;
the multi-target tracking model adopts a Deepsort-based multi-target tracking algorithm to continuously track the pedestrian, firstly, a Kalman filter is adopted to estimate the motion state of the pedestrian, and the relevance of motion information is carried out according to the Markov distance between the position of the detected frame of the current frame and the position predicted by the Kalman filter; appearance information association is carried out by firstly adopting a pedestrian ReID feature extraction network based on Resnet50 to extract appearance features, wherein the feature dimension is 512, then similarity calculation is carried out by adopting Euclidean distance, whether association succeeds or not is judged through a set threshold value, and finally linear weighting is carried out by combining two associations of motion features and appearance features to serve as a final association result; and if the Kalman prediction result of the new target can be matched with the detection result in 3 continuous frames, adding the tracker of the target into a tracking list, distributing a new ID, and if the tracked target continuously exceeds 3 frames and is not matched with the detection frame, judging whether the target is deleted or not by setting a maximum survival period, thereby realizing continuous tracking of multiple pedestrians under continuous frame video images and obtaining the pedestrian ID.
Step three, acquiring a pedestrian picture according to the pedestrian detection frame in the step two, inputting a trained human body part segmentation model to segment a pedestrian part, and acquiring a pedestrian foot point fine positioning result under an image coordinate system;
the human body part segmentation model adopts a semantic segmentation network DeepLab V3+ and a light-weight network MobileNet V2 as a main network, the segmentation model comprises an encoder module and a decoder module, the encoder module performs feature extraction through the series connection of a plurality of layers of convolution layers, and then multi-scale features are obtained through a void space pyramid pooling module; the decoder module performs two times of upsampling by using the bilinear difference value, wherein the first time of upsampling fuses bottom layer features with the same resolution ratio features in the decoder module network, and the feature pixels restored to the original input size can be output by the segmentation network after two times of upsampling. The training process of the human body part segmentation model comprises the following steps: the method comprises the steps of inputting a human body part segmentation model by using a PPSS dataset as a training set, training by using a random gradient descent mode, setting an initial learning rate to be 0.03, updating the learning rate by using a linear attenuation learning strategy, finishing training of the human body part segmentation model after the training times or the Loss function reaches a threshold value, and adopting a Softmax Loss function as a Loss function of the human body part segmentation model by a person skilled in the art.
The third step specifically comprises the following substeps:
(3.1) According to the pedestrian coarse positioning result of step two, all pedestrian pictures are cropped from the original image frame of the monitoring camera's real-time video stream for batch processing and input as an N × C × H × W tensor to the trained human body part segmentation model, which outputs the pedestrian part segmentation result. Here N is the number of pedestrian targets in a single frame image, C is the number of channels of the input image with C = 3, H is the image height with H = 160, and W is the image width with W = 80. A dimension-reduced semantic feature map is obtained through the feature encoding unit, and the feature decoding unit then restores the features to the original input resolution after two rounds of upsampling. The set of feature pixel values is {0,1,2,3,4,5,6,7}, representing the human body parts: hair, face, upper body, arms, lower body, legs, feet, and background, respectively;
(3.2) After processing by the human body part segmentation model, a pixel-level fine-grained semantic segmentation result of the human body is obtained on top of the pedestrian detection frame, which allows the detection frame to be further corrected. Whether the video frame image contains the pedestrian's foot points is judged from the segmentation result according to the feature value 6 (the pixel value that marks pedestrian foot points); if the current pedestrian has no foot points, no coordinate conversion calculation is performed. It should be noted that missing foot points may be due to occlusion between pedestrians; when the target reappears, it can be tracked continuously by the pedestrian tracking algorithm, and the subsequent pedestrian movement speed calculation is carried out using the recorded time interval and moving distance. If foot points are present, they are converted into pixel coordinates of the original video image frame through scale conversion, the lower border of the pedestrian detection frame is corrected according to the lower boundary of the foot point pixels, and the midpoint of the line connecting the lower-left and lower-right corners of the corrected detection frame is output as the fine foot point positioning value.
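As an illustrative sketch (not the patented code), the foot point judgment and detection-frame correction of substep (3.2) could look as follows, assuming `mask` is the part-segmentation output for one pedestrian crop and `box` is that pedestrian's detection frame in original-frame pixels:

```python
import numpy as np

FOOT = 6  # feature value marking pedestrian foot pixels

def refine_foot_point(mask, box):
    """mask: (H, W) part labels for the crop; box: (x1, y1, x2, y2) in
    original-frame pixels. Returns the foot midpoint (u, v), or None."""
    rows, _ = np.nonzero(mask == FOOT)
    if rows.size == 0:
        return None  # occluded: skip coordinate conversion this frame
    x1, y1, x2, y2 = box
    # Scale the lowest foot row back to original-frame coordinates and
    # use it to correct the lower border of the detection frame.
    v_foot = y1 + rows.max() * (y2 - y1) / mask.shape[0]
    # Midpoint of the line between the corrected lower-left/right corners.
    return ((x1 + x2) / 2.0, v_foot)
```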
Step four, according to the pixel information of the pedestrian foot points in the image coordinate system, coordinate conversion calculation is carried out with the coordinate conversion model established in step one to obtain the spatial coordinate information of the pedestrian in the coordinate system that takes the ground plane as the two-dimensional XY plane and the axis perpendicular to the ground as the Z axis. The spatial coordinate information is combined with the pedestrian tracking algorithm to obtain the spatial moving distance of the target pedestrian over the interval frames, and the pedestrian movement speed is then calculated in combination with the interval frame time.
(4.1) The two-dimensional image coordinates $[u, v]$ of the pedestrian foot points in the image coordinate system, determined by the pedestrian detection frame and the pedestrian part segmentation network, are combined with the image-to-world coordinate conversion model established in step one:

$$\begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} = R^{-1} \left( M_C^{-1} Z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} - T \right)$$

Substituting the corresponding parameters yields the coordinates $[X, Y, Z]$ of the pedestrian foot points in the world coordinate system.

(4.2) Combined with the pedestrian tracking algorithm, the pedestrian ID and the corresponding pedestrian spatial coordinate information over consecutive interval frames are obtained. Ideally the height of a pedestrian foot point on the ground is 0, i.e. $Z = 0$, so the ground-plane coordinates $[X, Y]$ of the foot point are obtained. Suppose a continuously tracked pedestrian with identity ID $i$ has ground two-dimensional coordinates $(x_{N-k}^i, y_{N-k}^i)$ at frame $N-k$ and $(x_N^i, y_N^i)$ at frame $N$, the video stream frame rate is $f$, and $k$ is the number of interval frames. The moving distance of the pedestrian on the two-dimensional plane up to the current frame $N$ is calculated as the Euclidean distance, and combining it with the interval frame time gives the pedestrian movement speed:

$$\bar{v}_i = \frac{f}{k}\sqrt{\left(x_N^i - x_{N-k}^i\right)^2 + \left(y_N^i - y_{N-k}^i\right)^2}$$
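For illustration, a minimal sketch of this speed computation; the frame rate follows the 25 fps stream described in step two, while the interval `k` and the per-ID track buffer are assumptions:

```python
import math

def pedestrian_speed(track, f=25.0, k=5):
    """track: list of (x, y) ground-plane positions, one per frame, for a
    single pedestrian ID. Returns speed in m/s over the last k frames."""
    if len(track) <= k:
        return None
    (x0, y0), (x1, y1) = track[-1 - k], track[-1]
    return (f / k) * math.hypot(x1 - x0, y1 - y0)
```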
The pedestrian target ID and pedestrian movement speed are sent through message middleware to the front end for visual display, thereby completing the video-stream-based intelligent sensing of pedestrian movement speed.
To sum up, this application uses a camera deployed at a monitoring point to acquire the video stream and a contactless visual perception algorithm to obtain the real-time movement speed of target pedestrians, enabling judgments about behaviors such as stationary pedestrians or pedestrians moving too fast, and providing effective early warning and customized services for intelligent pedestrian perception systems in monitoring scenarios. The method realizes the coordinate conversion from the image coordinate system to the real-world spatial coordinate system through camera calibration; because the conversion accuracy is limited by the positioning result of the pedestrian foot points in the image coordinate system, a human body part segmentation model is applied on top of the pedestrian detection frame for foot point judgment and fine positioning, making the intelligent sensing of pedestrian movement speed based on the camera video stream more accurate.

Claims (4)

1. A pedestrian movement speed intelligent sensing method based on video streaming is characterized by comprising the following steps:
step one, acquiring a calibration picture under the camera, calibrating the camera, acquiring internal parameters and external parameters of the monitoring camera, and establishing a coordinate conversion model from an image coordinate system to a world coordinate system;
step two, acquiring a real-time video stream of the monitoring camera according to a streaming media protocol, detecting and tracking pedestrians by adopting a pedestrian detection model and a multi-target tracking model, outputting a pedestrian ID and a detection frame, and obtaining a pedestrian coarse positioning result under the image coordinate system;
step three, acquiring a pedestrian picture according to the pedestrian detection frame in the step two, inputting a trained human body part segmentation model for segmenting the pedestrian part, and acquiring a pedestrian foot point fine positioning result under an image coordinate system;
the third step comprises the following substeps:
(3.1) according to the pedestrian coarse positioning result obtained in the step two, cutting all pedestrian pictures from the original image frame of the real-time video stream of the monitoring camera for batch processing, inputting a trained human body part segmentation model for pedestrian segmentation, and outputting a pedestrian part segmentation result;
(3.2) judging whether the video frame image contains the pedestrian foot points according to the pedestrian segmentation result, and if not, not carrying out coordinate conversion calculation; if the pedestrian foot points are contained, converting the pixel coordinates into pixel coordinates of the original video image frame through scale conversion, correcting a lower border of the pedestrian detection frame according to a lower border result of the pixel of the pedestrian foot points, and outputting a middle point of a connecting line of a lower left corner and a lower right corner of the corrected pedestrian detection frame as a pedestrian foot point fine positioning output value;
step four, according to pixel information of the pedestrian foot points in an image coordinate system, coordinate conversion calculation is carried out by combining the coordinate conversion model established in the step one, and space coordinate information of the pedestrian in a three-dimensional world coordinate system with the ground plane as a two-dimensional plane XY axis and a coordinate axis perpendicular to the ground as a Z axis is obtained; and combining the space coordinate information with a pedestrian tracking algorithm to obtain the moving distance of the space position of the target pedestrian under the interval frame, and then combining the interval frame time to calculate the moving speed of the pedestrian.
2. The method as claimed in claim 1, wherein the camera calibration method obtains internal parameters and external parameters of the camera; the internal parameters, comprising an internal parameter matrix $M_C$ and a distortion coefficient dist, are obtained by a checkerboard calibration method, and the external parameters, comprising a rotation matrix R and a translation matrix T, are solved by the PnP method using an ArUco marker two-dimensional code calibration cloth in combination with the internal parameters of the monitoring camera.
3. The pedestrian motion speed intelligent perception method based on the video stream as claimed in claim 1, wherein the training process of the human body part segmentation model is as follows: the PPSS dataset is used as a training set, a human body part segmentation model is input, training is carried out in a random gradient descent mode, the initial learning rate is set to be 0.03, the learning rate is updated through a linear attenuation learning strategy, and when the training times or the loss function reaches a threshold value, training of the human body part segmentation model is completed.
4. The video stream-based pedestrian movement speed intelligent sensing method according to claim 1, wherein the pedestrian movement speed $\bar{v}_i$ is calculated as follows:

$$\bar{v}_i = \frac{f}{k}\sqrt{\left(x_N^i - x_{N-k}^i\right)^2 + \left(y_N^i - y_{N-k}^i\right)^2}$$

wherein $x_{N-k}^i$ and $y_{N-k}^i$ represent the abscissa and ordinate of the pedestrian with ID $i$ in the ground two-dimensional coordinate system at frame $N-k$, $x_N^i$ and $y_N^i$ represent the abscissa and ordinate of the same pedestrian at frame $N$, $f$ is the video stream frame rate, and $k$ is the number of interval frames.
CN202011559927.0A 2020-12-25 2020-12-25 Pedestrian movement speed intelligent sensing method based on video stream Active CN112598709B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011559927.0A CN112598709B (en) 2020-12-25 2020-12-25 Pedestrian movement speed intelligent sensing method based on video stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011559927.0A CN112598709B (en) 2020-12-25 2020-12-25 Pedestrian movement speed intelligent sensing method based on video stream

Publications (2)

Publication Number Publication Date
CN112598709A CN112598709A (en) 2021-04-02
CN112598709B true CN112598709B (en) 2022-11-01

Family

ID=75201948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011559927.0A Active CN112598709B (en) 2020-12-25 2020-12-25 Pedestrian movement speed intelligent sensing method based on video stream

Country Status (1)

Country Link
CN (1) CN112598709B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658221B (en) * 2021-07-28 2024-04-26 同济大学 AGV pedestrian following method based on monocular camera
CN113840228B (en) * 2021-08-25 2024-04-02 北京航空航天大学杭州创新研究院 Pedestrian indoor positioning method based on positioning matching

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPWO2011068184A1 (en) * 2009-12-03 2013-04-18 独立行政法人産業技術総合研究所 Mobile positioning device
JP6276519B2 (en) * 2013-05-22 2018-02-07 株式会社 日立産業制御ソリューションズ Person counting device and human flow line analyzing device
US10372970B2 (en) * 2016-09-15 2019-08-06 Qualcomm Incorporated Automatic scene calibration method for video analytics
WO2018062647A1 (en) * 2016-09-28 2018-04-05 중앙대학교 산학협력단 Normalized-metadata generation apparatus, object occlusion detection apparatus, and methods thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Multi-scale body-part mask guided attention for person reidentification; Cai, H., et al.; 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019-12-31; full text *
A target localization method based on image coordinate detection (一种基于图像坐标检测的目标定位方法); Mei Hongxiang; Computer & Digital Engineering (《计算机与数字工程》); 2016-03-20 (No. 03); full text *
Efficient recovery of 3D human motion from monocular video (单目视频人体三维运动高效恢复); Chen Cheng et al.; Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》); 2009-08-15 (No. 08); full text *

Also Published As

Publication number Publication date
CN112598709A (en) 2021-04-02

Similar Documents

Publication Publication Date Title
CN111462200B (en) Cross-video pedestrian positioning and tracking method, system and equipment
JP6095018B2 (en) Detection and tracking of moving objects
CN106846359B (en) Moving target rapid detection method based on video sequence
Sidla et al. Pedestrian detection and tracking for counting applications in crowded situations
CN104204721B (en) Single camera distance estimations
Zhou et al. Seamless fusion of LiDAR and aerial imagery for building extraction
CN107481315A (en) A kind of monocular vision three-dimensional environment method for reconstructing based on Harris SIFT BRIEF algorithms
CN115439424A (en) Intelligent detection method for aerial video image of unmanned aerial vehicle
CN110458877B (en) Navigation method based on bionic vision for fusing infrared and visible light information
CN112598709B (en) Pedestrian movement speed intelligent sensing method based on video stream
CN110941996A (en) Target and track augmented reality method and system based on generation of countermeasure network
CN104200492B (en) Video object automatic detection tracking of taking photo by plane based on profile constraints
CN114973028B (en) Aerial video image real-time change detection method and system
CN111247526B (en) Method and system for tracking position and direction of target object moving on two-dimensional plane
CN114022910A (en) Swimming pool drowning prevention supervision method and device, computer equipment and storage medium
CN110458862A (en) A kind of motion target tracking method blocked under background
CN115376034A (en) Motion video acquisition and editing method and device based on human body three-dimensional posture space-time correlation action recognition
CN116309686A (en) Video positioning and speed measuring method, device and equipment for swimmers and storage medium
CN116978009A (en) Dynamic object filtering method based on 4D millimeter wave radar
KR100574227B1 (en) Apparatus and method for separating object motion from camera motion
CN111531546B (en) Robot pose estimation method, device, equipment and storage medium
CN108986162B (en) Dish and background segmentation method based on inertial measurement unit and visual information
Kröhnert Automatic waterline extraction from smartphone images
CN113920254B (en) Monocular RGB (Red Green blue) -based indoor three-dimensional reconstruction method and system thereof
CN103345762B (en) Bayes's visual tracking method based on manifold learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant