CN109522854B - Pedestrian traffic statistical method based on deep learning and multi-target tracking - Google Patents
- Publication number
- CN109522854B (application CN201811400758.9A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- coordinates
- current
- current pedestrian
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Abstract
The invention relates to the technical field of image processing, and provides a pedestrian flow statistics method based on deep learning and multi-target tracking. The invention mainly comprises the following steps. S1: shoot a pedestrian monitoring video and read the images in the video. S2: set an effective area in the images and initialize a flow count. S3: construct a deep-learning pedestrian detection model and train it. S4: detect the current pedestrian to obtain the coordinates and image block of the current pedestrian frame. S5: track the current pedestrian with a deep-learning multi-target tracking algorithm and generate the coordinates of the current pedestrian. S6: generate the moving track of the current pedestrian. S7: judge whether the current pedestrian has left the effective area; if so, go to step S8, otherwise return to step S4. S8: select a noise threshold and perform noise judgment. S9: delete the coordinates of the current pedestrian in the consecutive video frames. The invention can provide accurate flow statistics in practical usage scenarios.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a pedestrian flow statistical method based on deep learning and multi-target tracking.
Background
The popularization of surveillance cameras and the development of image processing technology provide good conditions for intelligent monitoring. Pedestrian flow statistics is widely applied in intelligent monitoring and can be used in various settings such as hospitals, passageways and shops. Accurate flow information supports reasonable resource allocation and the estimation of shop rent levels, operating conditions and the like, and is therefore of great significance.
At present there are many pedestrian flow statistics methods. One class relies on hardware sensor equipment; the other processes video directly. Sensors are strongly affected by crowd density. The video-processing methods mainly include the following: a. face recognition, which is affected by face occlusion and head pose; b. head detection and tracking, which requires a top-mounted camera and thus places requirements on the installation environment, is strongly affected by hat occlusion, and uses a simple tracking algorithm prone to tracking errors; c. head-and-shoulder detection and tracking, whose simple tracking algorithm is error-prone when pedestrian density is high; d. methods that recognize pedestrians by clothing or other traditional hand-crafted features, which easily produce large errors; e. multi-camera methods, which require several cameras to match and count pedestrians. In addition, these approaches basically count a pedestrian when the target's moving track crosses an auxiliary line; because the tracking algorithms used are simple, tracking becomes confused when many people are present, the tracks are disordered, the count is affected, and the accuracy of pedestrian flow statistics is low.
Disclosure of Invention
The invention provides a pedestrian flow statistical method based on deep learning and multi-target tracking, which can provide accurate flow statistical results in practical use scenes.
The technical scheme adopted by the invention is as follows:
a pedestrian flow statistical method based on deep learning and multi-target tracking comprises the following steps:
s1: shooting a pedestrian monitoring video in real time, and reading images in continuous video frames of the pedestrian monitoring video;
s2: setting an effective area of images in continuous video frames, and setting a flow count with an initial value of 0;
s3: constructing a pedestrian detection model based on deep learning and training the pedestrian detection model;
s4: scaling the image containing the effective area, and then using the trained pedestrian detection model to detect the current pedestrian in the image, obtaining the coordinates and image block of the current pedestrian frame, wherein the current pedestrian frame indicates the area of the image occupied by the current pedestrian;
s5: tracking the current pedestrian in real time by using a multi-target tracking algorithm based on deep learning, and generating coordinates of the current pedestrian in a continuous video frame image;
s6: generating a moving track of the current pedestrian according to the coordinates of the current pedestrian in the images of the continuous video frames;
s7: judging whether the current pedestrian leaves the effective area in real time by utilizing a multi-target tracking algorithm according to the moving track of the current pedestrian; if yes, go to step S8, otherwise go to step S4;
s8: selecting a noise threshold and performing noise judgment on the moving track of the current pedestrian; if the number of coordinates in the moving track of the current pedestrian is smaller than the noise threshold, judging the moving track of the current pedestrian to be a noise track; if the number of coordinates is greater than or equal to the noise threshold, judging the moving track of the current pedestrian to be an effective track and adding 1 to the value of the flow count;
s9: the coordinates of the current pedestrian in the consecutive video frames are deleted and then step S4 is repeated.
Preferably, in the step S3, the pedestrian detection model is trained by using a YOLOv3 network structure.
Preferably, the specific steps of step S5 are as follows:
s501: reading the coordinates and image blocks of the current pedestrian frame;
s502: constructing a pedestrian feature extraction model based on deep learning and training the pedestrian feature extraction model, and performing depth apparent feature extraction on image blocks of a current pedestrian frame by using the trained deep network pedestrian feature extraction model to generate depth apparent features of different current pedestrian frames in continuous video frames;
s503: according to the coordinates of the current pedestrian frame, using a Kalman filter to calculate the predicted coordinates and updated coordinates of the current pedestrian, and calculating the Mahalanobis distance d1 between the predicted coordinates of the current pedestrian and the coordinates of the current pedestrian frame;
s504: calculating the cosine distance d2 between the depth apparent features of different current pedestrian frames in the consecutive video frames obtained in step S502;
s505: combining the distance d1 obtained in step S503 and the distance d2 obtained in step S504 into a fusion metric c, where c = λ·d1 + (1 − λ)·d2 and λ = 0.1;
s506: and according to the fusion metric c, performing target matching on different current pedestrian frames in the continuous video frames by using a Hungarian matching algorithm to obtain the coordinates of the current pedestrian in the continuous video frames.
Further preferably, in step S502, the deep network pedestrian feature extraction model is obtained by training a residual network.
Further preferably, in step S506, the target matching is performed on each current pedestrian frame according to a cascade image matching algorithm.
Further preferably, the specific steps of step S6 are as follows:
s601: establishing a coordinate buffer queue of the current pedestrian movement track, wherein the size of the queue is set as m;
s602: taking the updated coordinates of the current pedestrian obtained in the step S503 as a current pedestrian tracking frame, calculating the frame center coordinates of the current pedestrian tracking frame, and adding the frame center coordinates as the current pedestrian trajectory coordinates into a current pedestrian coordinate buffer queue;
s603: after the current pedestrian tracking frame is updated, recalculating the central coordinates of the frame, and adding the recalculated central coordinates of the frame into a coordinate buffer queue;
s604: comparing the number of coordinates in the buffer queue with the queue size m; if the buffer queue already holds m coordinates, appending the newly added frame centre coordinate and removing the earliest coordinate, so that the number of coordinates in the buffer queue never exceeds m, thereby generating the moving track of the current pedestrian.
Preferably, in step S8, the noise threshold is selected as follows:
s801: based on the current pedestrian movement track, intercepting n sample videos from the pedestrian monitoring video, recorded as VIDEO_i, i = 1, 2, …, n;
s802: manually counting the pedestrian flow in the n sample videos VIDEO_1, …, VIDEO_n, and recording the manual counting results as NUM_1, …, NUM_n;
s803: setting the value set of the noise threshold as {j; j = 2, 3, …, 20}; for each threshold value j, performing flow statistics on the n sample videos VIDEO_1, …, VIDEO_n based on steps S1 to S9, and recording the resulting flow count as RESULT_ij;
s804: calculating the difference between the flow count obtained in step S803 and the manual count obtained in step S802, recorded as E_ij = RESULT_ij − NUM_i;
s805: calculating, for the different threshold values j, the weighted average F_j of the corresponding differences E_ij, which in the equally weighted case is F_j = (1/n)·Σ_{i=1..n} E_ij;
s806: comparing the absolute values of F_j for the different noise threshold values j; if F_k has the smallest absolute value, where k ∈ {j; j = 2, 3, …, 20}, then k is the finally selected noise threshold.
Further preferably, in step S801, the number of sample videos n is 3 or 4, and the duration of each sample video is 0.5 to 2 hours.
Compared with the prior art, the invention has the beneficial effects that:
1) In the pedestrian flow statistics process, the deep-learning pedestrian detection model is applied to detect the current pedestrian and the multi-target tracking algorithm is used to track the current pedestrian. A deep neural network has stronger representation capability than traditional feature extraction, so detecting pedestrians with a deep neural network yields more accurate detection results. In addition, the deep-learning multi-target tracking algorithm uses a deep association metric, so tracking remains effective when the current pedestrian is occluded or pedestrians briefly overlap; the tracking track is more accurate and, compared with simple tracking methods, tracking confusion is less likely.
2) Compared with the prior-art counting mode of accumulating when the target's moving track crosses an auxiliary line, the invention delimits an effective area of the image, generates the moving track of the current pedestrian within the effective area, and counts the track only after noise judgment. This avoids the prior art's heavy dependence on the correctness of the moving track, which causes inaccurate and incomplete tracks, so the statistical effect of the pedestrian flow is more accurate.
3) Performing denoising three times effectively avoids the influence of uncertain factors on pedestrian statistics. Specifically, step S2 first sets an effective area for the images in consecutive video frames, removing environmental detection noise caused by uncertain factors such as the illumination environment and pedestrians in non-attention areas; this realizes the first denoising step. Next, step S7 uses the multi-target tracking algorithm to judge in real time whether the current pedestrian has left the effective area; the tracking algorithm removes pedestrian detection noise caused by uncertain factors such as occluded and incomplete pedestrians, realizing the second denoising step. Finally, step S8 selects a noise threshold and performs noise judgment on the moving track of the current pedestrian; the noise threshold effectively fits the real pedestrian flow, and judging noise tracks against it realizes the third, macroscopic denoising step, adjusting the flow count so that the statistical result is more accurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block flow diagram of the present invention.
Detailed Description
The pedestrian traffic statistical method based on deep learning and multi-target tracking provided by the invention will be described in detail by way of embodiments with reference to the accompanying drawings. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto.
The term "and/or" herein merely describes an association between objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, B exists alone, or A and B exist at the same time. The term "/and" describes another association relationship and indicates that two relationships may exist; for example, "A/and B" may mean: A exists alone, or A and B exist at the same time. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.
Example (b):
a pedestrian flow statistical method based on deep learning and multi-target tracking comprises the following steps:
S1: shooting the pedestrian monitoring video in real time, and reading the images in consecutive video frames of the pedestrian monitoring video. It should be noted that in this step a network surveillance camera may be, but is not limited to being, used to shoot the pedestrian monitoring video: the camera is first placed at a position where pedestrians can be captured, and is then accessed over the network, via local transmission, or in other ways to read the video images.
S2: the effective area of the images in successive video frames is set and the flow count is set to an initial value of 0. It should be noted that, setting of the effective region is to perform black filling and shielding on the non-attention region which is likely to affect the pedestrian statistics based on the observation requirement, so as to achieve the purpose of removing the noise generated by the surrounding environment.
S3: constructing a pedestrian detection model based on deep learning and training the pedestrian detection model; preferably, in the step S3, the pedestrian detection model is trained by using a YOLOv3 network structure.
S4: scaling the image containing the effective area, specifically to 416 × 416 pixels, and then using the trained pedestrian detection model to detect the current pedestrian in the image, obtaining the coordinates and image block of the current pedestrian frame, wherein the current pedestrian frame indicates the area of the image occupied by the current pedestrian. It should be understood that the pedestrian detection model can also be implemented with Faster R-CNN (Faster Region-based Convolutional Neural Network), SSD (Single Shot MultiBox Detector) or other object detection networks; training with the YOLOv3 network structure gives the model both accuracy and speed.
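For illustration only (the function below is not from the patent): after detecting on the 416 × 416 detector input, a predicted box must be mapped back to original image coordinates. A minimal sketch, assuming a plain resize with no letterbox padding:

```python
def scale_box_back(box, orig_w, orig_h, net_size=416):
    """Map an (x1, y1, x2, y2) box predicted on the net_size x net_size
    detector input back to original image coordinates, assuming the frame
    was plainly resized (no letterbox padding) as in step S4."""
    sx = orig_w / net_size
    sy = orig_h / net_size
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

# A box covering the whole 416x416 input maps to the whole 1920x1080 frame.
full = scale_box_back((0, 0, 416, 416), 1920, 1080)
```

If the detector uses letterbox resizing instead, the padding offsets would have to be subtracted before scaling.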
S5: and tracking the current pedestrian in real time by using a multi-target tracking algorithm based on deep learning, and generating the coordinates of the current pedestrian in the continuous video frame images.
S6: and generating a moving track of the current pedestrian according to the coordinates of the current pedestrian in the images of the continuous video frames.
S7: judging whether the current pedestrian leaves the effective area in real time by utilizing a multi-target tracking algorithm according to the moving track of the current pedestrian; if so, the process proceeds to step S8, otherwise, the process proceeds to step S4.
S8: selecting a noise threshold and performing noise judgment on the moving track of the current pedestrian; if the number of coordinates in the moving track of the current pedestrian is smaller than the noise threshold, the moving track is judged to be a noise track; if the number of coordinates is greater than or equal to the noise threshold, the moving track is judged to be an effective track and 1 is added to the value of the flow count. As a default, the noise threshold j may be set to a small value such as 6, 7, …, 12.
It should be noted that, in the pedestrian flow statistics process, the factors that can affect the result mainly include uncertain factors such as the illumination environment in the video, pedestrians in non-attention areas, occluded pedestrians and incomplete pedestrians. During statistics these factors can affect the acquisition of the coordinates and image blocks of the current pedestrian frame, thereby disturbing the real-time tracking of the current pedestrian and ultimately the flow statistics. In the invention, the uncertain factors influencing pedestrian flow statistics are collectively called noise; setting a noise threshold makes it possible to judge whether the moving track of the current pedestrian is a noise track, and current pedestrian tracks judged to be noise are removed, thus avoiding the influence of uncertain factors on the statistical result.
S9: the coordinates of the current pedestrian in the consecutive video frames are deleted and then step S4 is repeated.
In the pedestrian flow counting process, the deep-learning pedestrian detection model built in step S3 is applied in step S4 to detect the current pedestrian, and the multi-target tracking algorithm of step S5 tracks the current pedestrian. A deep neural network has stronger representation capability than traditional feature extraction, so detecting pedestrians with a deep neural network yields more accurate detection results. In addition, the deep-learning multi-target tracking algorithm of step S5 uses a deep association metric, so tracking remains effective when the current pedestrian is occluded or pedestrians momentarily overlap; the tracking track is therefore more accurate and, compared with simple tracking methods, tracking confusion is less likely.
In the prior art, pedestrian flow is usually counted by accumulating whenever a target's moving track crosses an auxiliary line. This depends heavily on the correctness of the moving track, i.e. on the quality of the tracking; when the pedestrian flow is dense the tracking effect is often poor and confusion frequently occurs, leaving pedestrian tracks inaccurate and incomplete and seriously degrading the statistical result. The invention instead delimits an effective area of the image, generates the moving track of the current pedestrian within that area, and counts the track only after noise judgment, effectively avoiding the prior art's heavy dependence on track correctness, so the pedestrian flow statistics are more accurate.
It should also be understood that the invention performs denoising three times, effectively avoiding the influence of uncertain factors on pedestrian statistics. Specifically, step S2 first sets an effective area for the images in consecutive video frames, removing environmental detection noise caused by uncertain factors such as the illumination environment and pedestrians in non-attention areas; this realizes the first denoising step. Next, step S7 uses the multi-target tracking algorithm to judge in real time whether the current pedestrian has left the effective area; the tracking algorithm removes pedestrian detection noise caused by uncertain factors such as occluded and incomplete pedestrians, realizing the second denoising step. Finally, step S8 selects a noise threshold and performs noise judgment on the moving track of the current pedestrian; the noise threshold effectively fits the real pedestrian flow, and judging noise tracks against it realizes the third, macroscopic denoising step, adjusting the flow count so that the statistical result of the pedestrian flow is more accurate.
Further, the specific steps of step S5 are as follows:
s501: and reading the coordinates and the image blocks of the current pedestrian frame.
S502: constructing a deep-learning pedestrian feature extraction model and training it, then using the trained deep network pedestrian feature extraction model to perform depth apparent feature extraction on the image block of the current pedestrian frame, generating depth apparent features for the different current pedestrian frames in the consecutive video frames. Preferably, the deep network pedestrian feature extraction model is obtained by training a residual network; the depth apparent feature is a 128-dimensional feature vector.
S503: according to the coordinates of the current pedestrian frame, using a Kalman filter to calculate the predicted coordinates and updated coordinates of the current pedestrian, and calculating the Mahalanobis distance d1 between the predicted coordinates of the current pedestrian and the coordinates of the current pedestrian frame. It should be noted that the Kalman filter provides a better estimate of the moving target's position and makes the motion track smoother.
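The gating distance of step S503 can be sketched as follows. This is an illustrative computation on a 2-D toy state; the real predicted mean and covariance would come from the Kalman filter and are assumed here:

```python
import numpy as np

def mahalanobis_sq(pred_mean, pred_cov, detection):
    """Squared Mahalanobis distance d1 between the Kalman-predicted
    coordinates and a detected pedestrian-frame coordinate (step S503).
    pred_cov is the predicted state covariance from the filter."""
    diff = detection - pred_mean
    return float(diff @ np.linalg.inv(pred_cov) @ diff)

# Toy 2-D example: predicted centre (10, 20) with isotropic variance 4.
pred_mean = np.array([10.0, 20.0])
pred_cov = np.eye(2) * 4.0
d1 = mahalanobis_sq(pred_mean, pred_cov, np.array([12.0, 20.0]))
```

Unlike Euclidean distance, this weights each deviation by the filter's own uncertainty, which is why it suits gating track-to-detection associations.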
S504: calculating the cosine distance d2 between the depth apparent features of different current pedestrian frames in the consecutive video frames obtained in step S502.
S505: combining the distance d1 obtained in step S503 and the distance d2 obtained in step S504 into a fusion metric c, where c = λ·d1 + (1 − λ)·d2 and λ = 0.1.
S506: according to the fusion metric c, performing target matching between different current pedestrian frames in the consecutive video frames using the Hungarian matching algorithm, obtaining the coordinates of the current pedestrian in the consecutive video frames. Preferably, the target matching of each current pedestrian frame follows a cascade matching algorithm, which is based on geometric characteristics and tends to offer better invariance and stability.
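A toy sketch of steps S505 and S506 (illustrative only; the brute-force search below stands in for the Hungarian algorithm and is practical only for small matrices):

```python
from itertools import permutations
import numpy as np

def fuse(d1, d2, lam=0.1):
    """Fusion metric c = lam*d1 + (1 - lam)*d2 with lam = 0.1 (step S505)."""
    return lam * d1 + (1 - lam) * d2

def min_cost_assignment(cost):
    """Optimal one-to-one track-to-detection assignment by exhaustive search,
    a stand-in for Hungarian matching (step S506)."""
    n = cost.shape[0]
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i, p[i]] for i in range(n)))
    return list(best)

d1 = np.array([[0.0, 5.0], [5.0, 0.0]])  # motion (Mahalanobis) distances
d2 = np.array([[0.1, 0.9], [0.9, 0.1]])  # appearance (cosine) distances
c = fuse(d1, d2)
match = min_cost_assignment(c)  # match[i]: detection assigned to track i
```

A production system would use a polynomial-time Hungarian solver; the fused cost matrix is the same either way.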
Further, the specific steps of step S6 are as follows:
s601: and establishing a coordinate buffer queue of the current pedestrian movement track, wherein the size of the queue is set as m.
S602: and taking the updated coordinates of the current pedestrian obtained in the step S503 as a current pedestrian tracking frame, calculating the frame center coordinates of the current pedestrian tracking frame, and adding the frame center coordinates as the current pedestrian track coordinates into a current pedestrian coordinate buffer queue.
S603: and after the current pedestrian tracking frame is updated, recalculating the central coordinates of the frame, and adding the recalculated central coordinates of the frame into a coordinate buffer queue.
S604: comparing the number of coordinates in the buffer queue with the queue size m; if the buffer queue already holds m coordinates, appending the newly added frame centre coordinate and removing the earliest coordinate, so that the number of coordinates in the buffer queue never exceeds m, thereby generating the moving track of the current pedestrian.
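Steps S601 to S604 map naturally onto a fixed-length queue; a minimal sketch (the variable names are illustrative, not from the patent):

```python
from collections import deque

def box_center(x1, y1, x2, y2):
    """Centre of a tracking frame, used as the trajectory coordinate (S602)."""
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

# deque(maxlen=m) implements S604: once m coordinates are buffered, appending
# a new centre automatically evicts the earliest one.
m = 3
trajectory = deque(maxlen=m)
for t in range(5):  # simulate 5 tracking-frame updates
    trajectory.append(box_center(t, t, t + 10, t + 10))
```

After the five updates only the three most recent centres remain, which is exactly the bounded moving track the patent describes.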
Further, in order to further reduce the error in the pedestrian flow statistics to obtain a more accurate pedestrian flow statistics result, in step S8, the noise threshold is selected as follows:
S801: based on the current pedestrian movement track, intercepting n sample videos from the pedestrian monitoring video, recorded as VIDEO_i, i = 1, 2, …, n. In this step, the number of sample videos n is 3 or 4, and the duration of each sample video is 0.5 to 2 hours.
S802: manually counting the pedestrian flow in the n sample videos VIDEO_1, …, VIDEO_n, and recording the manual counting results as NUM_1, …, NUM_n.
S803: setting the value set of the noise threshold as {j; j = 2, 3, …, 20}; for each threshold value j, performing flow statistics on the n sample videos VIDEO_1, …, VIDEO_n based on steps S1 to S9, and recording the resulting flow count as RESULT_ij.
S804: calculating the difference between the flow count obtained in step S803 and the manual count obtained in step S802, recorded as E_ij = RESULT_ij − NUM_i.
S805: calculating, for the different threshold values j, the weighted average F_j of the corresponding differences E_ij, which in the equally weighted case is F_j = (1/n)·Σ_{i=1..n} E_ij.
S806: comparing the absolute values of F_j for the different noise threshold values j; if F_k has the smallest absolute value, where k ∈ {j; j = 2, 3, …, 20}, then k is the finally selected noise threshold, i.e. the optimal noise threshold.
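Steps S803 to S806 can be sketched as below. The counts are invented for illustration, and F_j is taken as the plain equal-weight average of the errors E_ij:

```python
def select_noise_threshold(results, manual, thresholds):
    """Return the threshold k whose mean error F_j = mean_i(RESULT_ij - NUM_i)
    has the smallest absolute value (steps S803-S806).

    results[j][i]: automated count for sample video i under threshold j.
    manual[i]:     manually counted ground truth NUM_i for video i.
    """
    n = len(manual)
    best_k, best_abs = None, float("inf")
    for j in thresholds:
        f_j = sum(results[j][i] - manual[i] for i in range(n)) / n
        if abs(f_j) < best_abs:
            best_k, best_abs = j, abs(f_j)
    return best_k

# Hypothetical counts for n = 3 sample videos under thresholds 2, 3 and 4.
manual = [100, 80, 120]
results = {2: [110, 90, 130], 3: [101, 79, 121], 4: [90, 70, 110]}
k = select_noise_threshold(results, manual, thresholds=[2, 3, 4])
```

With these invented numbers the mean errors are 10, 1/3 and −10, so the threshold closest to the manual counts is selected.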
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (7)
1. A pedestrian flow statistical method based on deep learning and multi-target tracking is characterized in that: the method comprises the following steps:
s1: shooting a pedestrian monitoring video in real time, and reading images in continuous video frames of the pedestrian monitoring video;
s2: setting an effective area of images in continuous video frames, and setting a flow count with an initial value of 0;
s3: constructing a pedestrian detection model based on deep learning and training the pedestrian detection model;
s4: scaling the image containing the effective area, and then using the trained pedestrian detection model to detect the current pedestrian in the image, obtaining the coordinates and image block of the current pedestrian frame, wherein the current pedestrian frame indicates the area of the image occupied by the current pedestrian;
s5: tracking the current pedestrian in real time by using a multi-target tracking algorithm based on deep learning, and generating coordinates of the current pedestrian in a continuous video frame image;
s6: generating a moving track of the current pedestrian according to the coordinates of the current pedestrian in the images of the continuous video frames;
s7: judging whether the current pedestrian leaves the effective area in real time by utilizing a multi-target tracking algorithm according to the moving track of the current pedestrian; if yes, go to step S8, otherwise go to step S4;
s8: selecting a noise threshold and performing noise judgment on the moving track of the current pedestrian; if the number of coordinates in the moving track of the current pedestrian is smaller than the noise threshold, judging the moving track of the current pedestrian to be a noise track; if the number of coordinates is greater than or equal to the noise threshold, judging the moving track of the current pedestrian to be an effective track and adding 1 to the value of the flow count;
s9: deleting the coordinates of the current pedestrian in the continuous video frames, and then repeating the step S4;
In step S8, the noise threshold is selected as follows:
S801: based on the current pedestrian movement trajectory, intercepting n segments of sample video from the pedestrian monitoring video, denoted VIDEO_i, i = 1, 2, …, n;
S802: manually counting the pedestrian flow of each of the n sample videos VIDEO_1, …, VIDEO_n, and recording the counting results as NUM_1, …, NUM_n;
S803: setting the candidate value set of the noise threshold as {j | j = 2, 3, …, 20}, and, for each candidate value, performing flow statistics on the n sample videos VIDEO_1, …, VIDEO_n according to steps S1 to S9, recording the resulting flow count as RESULT_ij;
S804: calculating the difference between the flow count obtained in step S803 and the manual count obtained in step S802, denoted E_ij = RESULT_ij − NUM_i;
S805: for each candidate threshold value j, calculating the weighted average F_j of the corresponding differences E_ij;
S806: comparing the absolute values of F_j for the different candidate threshold values j; if |F_k| is the smallest, where k ∈ {j | j = 2, 3, …, 20}, then k is the finally selected noise threshold.
2. The pedestrian traffic statistical method based on deep learning and multi-target tracking as claimed in claim 1, characterized in that: in step S3, the pedestrian detection model is trained using the YOLOv3 network structure.
3. The pedestrian traffic statistical method based on deep learning and multi-target tracking as claimed in claim 1, characterized in that: the specific steps of step S5 are as follows:
S501: reading the coordinates and image patch of the current pedestrian frame;
S502: constructing a deep-learning-based pedestrian feature extraction model and training it, then using the trained deep-network pedestrian feature extraction model to extract deep appearance features from the image patches of the current pedestrian frames, generating the deep appearance features of the different current pedestrian frames in consecutive video frames;
S503: from the coordinates of the current pedestrian frame, calculating the predicted coordinates and the updated coordinates of the current pedestrian with a Kalman filter, and calculating the Mahalanobis distance d_1 between the predicted coordinates of the current pedestrian and the coordinates of the current pedestrian frame;
S504: calculating the cosine distance d_2 between the deep appearance features of the different current pedestrian frames obtained in step S502;
S505: combining the distance d_1 obtained in step S503 and the distance d_2 obtained in step S504 into a fused metric c = λd_1 + (1 − λ)d_2, where λ = 0.1;
S506: according to the fused metric c, performing target matching of the different current pedestrian frames in consecutive video frames with the Hungarian matching algorithm to obtain the coordinates of the current pedestrian in the consecutive video frames.
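Steps S503 to S506 follow the association scheme of the cited DeepSORT paper: a motion (Mahalanobis) distance and an appearance (cosine) distance are fused and the assignment is solved with the Hungarian algorithm. The sketch below uses scipy's `linear_sum_assignment` as the Hungarian solver; λ = 0.1 is taken from the claim, while the Kalman covariance and feature vectors are toy assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

LAMBDA = 0.1  # fusion weight from step S505

def cosine_distance(a, b):
    """d_2: cosine distance between deep appearance features (S504)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return 1.0 - a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

def mahalanobis_distance(x, mean, cov):
    """d_1: Mahalanobis distance between a detection and the Kalman
    prediction (S503); cov is the predicted state covariance."""
    d = np.asarray(x, float) - np.asarray(mean, float)
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def match(tracks, detections):
    """S505-S506: build the fused cost c = LAMBDA*d_1 + (1-LAMBDA)*d_2 for
    every track/detection pair and solve the assignment with the Hungarian
    algorithm. tracks carry 'pos', 'cov', 'feat'; detections 'pos', 'feat'."""
    cost = np.zeros((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            d1 = mahalanobis_distance(d['pos'], t['pos'], t['cov'])
            d2 = cosine_distance(t['feat'], d['feat'])
            cost[i, j] = LAMBDA * d1 + (1 - LAMBDA) * d2
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))
```

When two tracks and two detections appear in swapped order, the fused cost correctly pairs each track with the detection that is both close in position and similar in appearance.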
4. The pedestrian traffic statistical method based on deep learning and multi-target tracking as claimed in claim 3, characterized in that: in step S502, the deep-network pedestrian feature extraction model is obtained by training a residual network.
5. The pedestrian traffic statistical method based on deep learning and multi-target tracking as claimed in claim 3, characterized in that: in step S506, target matching of the current pedestrian frames is performed according to a cascade matching algorithm.
6. The pedestrian traffic statistical method based on deep learning and multi-target tracking as claimed in claim 3, characterized in that: the specific steps of step S6 are as follows:
S601: establishing a coordinate buffer queue for the current pedestrian movement trajectory, with the queue size set to m;
S602: taking the updated coordinates of the current pedestrian obtained in step S503 as the current pedestrian tracking frame, calculating the center coordinates of the tracking frame, and adding the center coordinates to the current pedestrian coordinate buffer queue as the current pedestrian trajectory coordinates;
S603: after each update of the current pedestrian tracking frame, recalculating the frame center coordinates and adding them to the coordinate buffer queue;
S604: comparing the number of coordinates in the buffer queue with the queue size m; if they are equal, adding the newly calculated frame center coordinates to the queue while removing the oldest coordinates, so that the number of coordinates in the buffer queue never exceeds m; the buffered coordinates form the movement trajectory of the current pedestrian.
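The fixed-size coordinate buffer of steps S601 to S604 behaves like a ring buffer: once m coordinates are stored, each new frame center displaces the oldest one. In Python this is exactly `collections.deque` with `maxlen` set. The sketch below is illustrative only; the queue size m = 4 and the (x1, y1, x2, y2) box format are assumptions, not values from the claim.

```python
from collections import deque

def frame_center(box):
    """Center of a pedestrian tracking frame given as (x1, y1, x2, y2) (S602)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

m = 4                          # queue size (S601), hypothetical value
trajectory = deque(maxlen=m)   # oldest coordinates drop out automatically (S604)

# S603: after each tracking-frame update, recompute and enqueue the center
for box in [(0, 0, 2, 2), (1, 0, 3, 2), (2, 0, 4, 2),
            (3, 0, 5, 2), (4, 0, 6, 2)]:
    trajectory.append(frame_center(box))

# After 5 updates with m = 4, only the 4 most recent centers remain.
print(list(trajectory))
```

The noise test of step S8 then simply compares `len(trajectory)` against the selected threshold once the pedestrian leaves the effective area.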
7. The pedestrian traffic statistical method based on deep learning and multi-target tracking as claimed in claim 1, characterized in that: in step S801, the number n of sample video segments is 3 or 4, and the duration of each sample video is 0.5 to 2 hours.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811400758.9A CN109522854B (en) | 2018-11-22 | 2018-11-22 | Pedestrian traffic statistical method based on deep learning and multi-target tracking |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109522854A CN109522854A (en) | 2019-03-26 |
CN109522854B true CN109522854B (en) | 2021-05-11 |
Family
ID=65777627
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811400758.9A Active CN109522854B (en) | 2018-11-22 | 2018-11-22 | Pedestrian traffic statistical method based on deep learning and multi-target tracking |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109522854B (en) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10909387B2 (en) * | 2019-03-28 | 2021-02-02 | Mitsubishi Electric Research Laboratories, Inc. | Method and system for predicting dynamical flows from control inputs and limited observations |
CN110309717A (en) * | 2019-05-23 | 2019-10-08 | 南京熊猫电子股份有限公司 | A kind of pedestrian counting method based on deep neural network |
CN110346005B (en) * | 2019-06-11 | 2020-09-22 | 华南理工大学 | Coriolis mass flowmeter digital signal processing method based on deep learning |
CN110490901A (en) * | 2019-07-15 | 2019-11-22 | 武汉大学 | The pedestrian detection tracking of anti-attitudes vibration |
CN110543868A (en) * | 2019-09-09 | 2019-12-06 | 福建省趋普物联科技有限公司 | Monitoring method and system based on face recognition and head and shoulder detection |
CN110717408B (en) * | 2019-09-20 | 2022-04-12 | 台州智必安科技有限责任公司 | People flow counting method based on TOF camera |
CN110992305A (en) * | 2019-10-31 | 2020-04-10 | 中山大学 | Package counting method and system based on deep learning and multi-target tracking technology |
CN111145551A (en) * | 2020-01-03 | 2020-05-12 | 南京邮电大学 | Intersection traffic planning system based on CNN detection follows chapter rate |
CN111523472A (en) * | 2020-04-23 | 2020-08-11 | 杭州海康威视系统技术有限公司 | Active target counting method and device based on machine vision |
CN111640135A (en) * | 2020-05-25 | 2020-09-08 | 台州智必安科技有限责任公司 | TOF camera pedestrian counting method based on hardware front end |
CN112215873A (en) * | 2020-08-27 | 2021-01-12 | 国网浙江省电力有限公司电力科学研究院 | Method for tracking and positioning multiple targets in transformer substation |
CN112036367A (en) * | 2020-09-16 | 2020-12-04 | 南通天成现代农业科技有限公司 | People number detection method of YOLO convolutional neural network |
CN112183304A (en) * | 2020-09-24 | 2021-01-05 | 高新兴科技集团股份有限公司 | Off-position detection method, system and computer storage medium |
CN112613365A (en) * | 2020-12-11 | 2021-04-06 | 北京影谱科技股份有限公司 | Pedestrian detection and behavior analysis method and device and computing equipment |
CN112541440B (en) * | 2020-12-16 | 2023-10-17 | 中电海康集团有限公司 | Subway people stream network fusion method and people stream prediction method based on video pedestrian recognition |
CN112633205A (en) * | 2020-12-28 | 2021-04-09 | 北京眼神智能科技有限公司 | Pedestrian tracking method and device based on head and shoulder detection, electronic equipment and storage medium |
CN113763418B (en) * | 2021-03-02 | 2024-02-02 | 华南理工大学 | Multi-target tracking method based on head and shoulder detection |
CN113158813A (en) * | 2021-03-26 | 2021-07-23 | 精英数智科技股份有限公司 | Real-time statistical method and device for flow target |
CN113158897A (en) * | 2021-04-21 | 2021-07-23 | 新疆大学 | Pedestrian detection system based on embedded YOLOv3 algorithm |
CN113221808A (en) * | 2021-05-26 | 2021-08-06 | 新疆爱华盈通信息技术有限公司 | Dinner plate counting statistical method and device based on image recognition |
CN115082862A (en) * | 2022-07-07 | 2022-09-20 | 南京杰迈视讯科技有限公司 | High-precision pedestrian flow statistical method based on monocular camera |
CN116167625B (en) * | 2023-04-25 | 2023-08-18 | 湖南工商大学 | Trampling risk assessment method based on deep learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101231755A (en) * | 2007-01-25 | 2008-07-30 | 上海遥薇实业有限公司 | Moving target tracking and quantity statistics method |
CN101872414A (en) * | 2010-02-10 | 2010-10-27 | 杭州海康威视软件有限公司 | People flow rate statistical method and system capable of removing false targets |
CN106600631A (en) * | 2016-11-30 | 2017-04-26 | 郑州金惠计算机系统工程有限公司 | Multiple target tracking-based passenger flow statistics method |
CN108021848A (en) * | 2016-11-03 | 2018-05-11 | 浙江宇视科技有限公司 | Passenger flow volume statistical method and device |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101231755A (en) * | 2007-01-25 | 2008-07-30 | 上海遥薇实业有限公司 | Moving target tracking and quantity statistics method |
CN101872414A (en) * | 2010-02-10 | 2010-10-27 | 杭州海康威视软件有限公司 | People flow rate statistical method and system capable of removing false targets |
CN108021848A (en) * | 2016-11-03 | 2018-05-11 | 浙江宇视科技有限公司 | Passenger flow volume statistical method and device |
CN106600631A (en) * | 2016-11-30 | 2017-04-26 | 郑州金惠计算机系统工程有限公司 | Multiple target tracking-based passenger flow statistics method |
Non-Patent Citations (2)
Title |
---|
Simple Online and Realtime Tracking with a Deep Association Metric; Nicolai Wojke et al.; 2017 IEEE International Conference on Image Processing (ICIP); 2017; pp. 3645-3649 *
Pedestrian flow statistics based on convolutional neural networks; Zhang Yajun et al.; Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition); April 2017; Vol. 29, No. 2; pp. 265-271 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109522854B (en) | Pedestrian traffic statistical method based on deep learning and multi-target tracking | |
CN104751491B (en) | A kind of crowd's tracking and people flow rate statistical method and device | |
US10735694B2 (en) | System and method for activity monitoring using video data | |
TWI448977B (en) | Method and apparatus for video analytics based object counting | |
CN106778712B (en) | Multi-target detection and tracking method | |
CN104978567B (en) | Vehicle checking method based on scene classification | |
CN108229256B (en) | Road construction detection method and device | |
CN112561951B (en) | Motion and brightness detection method based on frame difference absolute error and SAD | |
Wei et al. | City-scale vehicle tracking and traffic flow estimation using low frame-rate traffic cameras | |
CN106570449B (en) | A kind of flow of the people defined based on region and popularity detection method and detection system | |
Hu et al. | A novel approach for crowd video monitoring of subway platforms | |
Fradi et al. | Spatio-temporal crowd density model in a human detection and tracking framework | |
CN110889347B (en) | Density traffic flow counting method and system based on space-time counting characteristics | |
CN109583361A (en) | The scene video text tracking method minimized based on energy | |
Almomani et al. | Segtrack: A novel tracking system with improved object segmentation | |
Cheng | Highway traffic flow estimation for surveillance scenes damaged by rain | |
Tursun et al. | A video based real-time vehicle counting system using optimized virtual loop method | |
CN107067411B (en) | Mean-shift tracking method combined with dense features | |
Makawana et al. | Moving vehicle detection and speed measurement in video sequence | |
Guo et al. | Vehicle detection and tracking based on optical field | |
CN110830734B (en) | Abrupt change and gradual change lens switching identification method and system | |
Kuplyakov et al. | A distributed tracking algorithm for counting people in video by head detection | |
CN114399532A (en) | Camera position and posture determining method and device | |
Yang et al. | Crowd activity change point detection in videos via graph stream mining | |
Pan et al. | A novel vehicle flow detection algorithm based on motion saliency for traffic surveillance system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||