CN107145821A - A method and system for crowd density detection based on deep learning - Google Patents


Info

Publication number
CN107145821A
CN107145821A
Authority
CN
China
Prior art keywords
crowd
image
density
frame image
grade
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710177154.1A
Other languages
Chinese (zh)
Inventor
李康顺
黄鸿涛
郑泽标
陆誉升
冯思聪
邓坚
Current Assignee
South China Agricultural University
Original Assignee
South China Agricultural University
Priority date
Filing date
Publication date
Application filed by South China Agricultural University filed Critical South China Agricultural University
Priority to CN201710177154.1A priority Critical patent/CN107145821A/en
Publication of CN107145821A publication Critical patent/CN107145821A/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/40 Analysis of texture
    • G06T7/41 Analysis of texture based on statistical description of texture
    • G06T7/45 Analysis of texture based on statistical description of texture using co-occurrence matrix computation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a crowd density detection method and system based on deep learning. The detection method proceeds as follows. Background image information is first obtained through background learning, and the target foreground image of each frame is then extracted using that background information. A low-density crowd model is built from frames whose target foreground has been extracted and which belong to the low-density crowd grade, and a high-density crowd model is built from frames whose target foreground has been extracted and which belong to the high-density crowd grades. Each frame whose crowd density is to be detected is first fed to the low-density crowd model; if the crowd count it returns does not exceed a set value, the crowd density grade is determined from that count, and if the count exceeds the set value, the frame is passed to the high-density crowd model, which determines the density grade. The method offers high detection accuracy with a small amount of computation.

Description

A Method and System for Crowd Density Detection Based on Deep Learning

Technical Field

The invention belongs to the field of machine vision, and in particular relates to a method and system for detecting crowd density based on deep learning.

Background

With the rapid economic development of China, urbanization is becoming increasingly pronounced. More and more people are moving into cities, so the population density of many public places (subways, airports, commercial districts, stadiums, and so on) keeps growing. Crowding is common, especially during public holidays. Crowds, as a special object of management, are receiving ever more attention from society. How to monitor crowds effectively in real time and eliminate the safety hazards caused by overcrowding is therefore one of the urgent problems facing society today. For subways, as a component of urban rail transit systems, the need for crowd density detection is even more pressing.

Traditional methods judge whether a scene is crowded by counting people. However, because monitored scenes differ in area, simply counting people at entrances and exits or via mobile-phone signals consumes considerable manpower and money and produces large errors. Moreover, different sections of a subway differ in area, so the crowdedness of a scene cannot be judged accurately from a head count alone; for handling emergencies in public places, the crowd density matters more, and the head count serves only as auxiliary data.

Current research on crowd density falls into two categories: pixel-based methods and texture-analysis methods. The pixel-based approach was first proposed by Davies in "Crowd monitoring using image processing" (Electronics & Communication Engineering Journal, 1995, 7(1): 37-47): the crowd foreground is extracted against the background, edge detection yields the number of foreground edge pixels, a linear model of crowd size is fitted against calibrated head counts, and feeding the extracted edge-pixel count into this model yields the corresponding crowd size. Because of perspective distortion, the numbers of crowd foreground pixels and edge pixels appear larger for people near the camera and smaller for those far away. Pixel-based methods work well when crowd density is low; as density rises, occlusion between pedestrians breaks the linear relationship these methods rely on. In 1998, Marana proposed a crowd density estimation method based on texture analysis, which rests on the observation that crowd images of different densities exhibit different texture patterns: a high-density crowd appears as a fine texture pattern, while a low-density crowd image has a low-frequency background and appears as a coarse texture pattern. Texture-based density estimation can handle high-density crowds, but the algorithms are computationally heavy, use many features, and, when the background is complex, estimate low- and medium-density crowds with large errors. Since then, how to combine different texture-analysis methods to improve the accuracy of crowd density estimation has become a research hotspot. In addition, the background image used during image processing in prior-art crowd density detection is usually obtained by averaging each pixel over time; it is sensitive to illumination changes and to multimodal backgrounds, so its adaptability degrades as the environment changes, which harms the detection accuracy of crowd density.

In addition, prior-art crowd density detection systems usually transmit the acquired images over a network to a remote control centre, which analyses the images and performs the detection. Such systems need considerable bandwidth for image transmission and suffer from slow image processing and poor real-time performance.

Summary of the Invention

The first object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a deep-learning-based crowd density detection method with high detection accuracy and a small amount of computation.

The second object of the present invention is to provide a deep-learning-based crowd density detection system for implementing the above method.

The first object of the present invention is achieved through the following technical solution: a deep-learning-based crowd density detection method with the following steps.

S1. Acquire each frame in real time through a camera, take the first several frames, and perform background learning on them to obtain background image information.

S2. For each subsequent frame, use the background difference method, based on the background image information obtained in step S1, to extract the target foreground image of the frame.
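A minimal sketch of the background difference of step S2, assuming a simple per-pixel absolute difference; the threshold value of 25 is illustrative and not taken from the patent.

```python
import numpy as np

def extract_foreground(frame_gray, background_gray, threshold=25):
    """Background-difference sketch for step S2: a pixel whose gray
    value deviates from the learned background by more than `threshold`
    is foreground.  The threshold value is illustrative only."""
    diff = np.abs(frame_gray.astype(np.int16) - background_gray.astype(np.int16))
    return (diff > threshold).astype(np.uint8)   # 1 = foreground, 0 = background

# toy frame: a bright 2x2 "person" blob against a dark learned background
bg = np.zeros((4, 4), dtype=np.uint8)
frame = bg.copy()
frame[1:3, 1:3] = 200
mask = extract_foreground(frame, bg)
print(int(mask.sum()))  # 4 foreground pixels
```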

S3. Select multiple frames whose target foreground images have been extracted in step S2 and which belong to the low-density crowd grade, and calibrate the number of people in each selected frame. Fit the relationship between the number of target-foreground pixels and the number of people in these frames to obtain the first low-density crowd model, or fit the relationship between the number of edge pixels of the target foreground and the number of people to obtain the second low-density crowd model. At the same time, select multiple frames whose target foreground images have been extracted in step S2 and which belong to the various grades of the high-density crowd range as training samples, extract the texture features of each training sample's target foreground image with a gray-level co-occurrence matrix, input these texture features to a BP neural network, and train the network to obtain the high-density crowd model.
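The first low-density model of step S3 is a fitted relationship between foreground pixel count and calibrated head count. A minimal sketch with a least-squares line follows; the calibration numbers are made up for illustration and do not come from the patent.

```python
import numpy as np

# Hypothetical calibration data for the first low-density model:
# foreground-pixel counts and hand-labelled head counts per frame.
pixel_counts = np.array([1200.0, 2500.0, 3600.0, 5100.0, 6400.0])
people_counts = np.array([3.0, 6.0, 9.0, 13.0, 16.0])

# least-squares line: people ~ a * pixels + b
a, b = np.polyfit(pixel_counts, people_counts, deg=1)

def estimate_people(num_foreground_pixels):
    """Estimated head count for a frame's foreground pixel count."""
    return a * num_foreground_pixels + b

print(round(estimate_people(4000)))  # about 10 people
```

The second low-density model would be fitted the same way, only with edge-pixel counts in place of raw foreground-pixel counts.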

S4. For each frame obtained in step S2 whose crowd density is to be detected, input the number of target-foreground pixels into the first low-density crowd model to obtain a crowd count, then check whether this count exceeds a set value F. If not, determine the crowd density grade from the count; if so, go to step S5.

Alternatively, for each frame obtained in step S2 whose crowd density is to be detected, input the number of edge pixels of the target foreground into the second low-density crowd model to obtain a crowd count, then check whether this count exceeds the set value F. If not, determine the crowd density grade from the count; if so, go to step S5.

S5. Extract the texture features of the frame's target foreground image with a gray-level co-occurrence matrix, input them into the high-density crowd model, and obtain the crowd density grade from the model's output.
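Steps S4 and S5 together form a two-stage pipeline. A sketch of the routing logic follows; the threshold F, the grade boundaries, and both stand-in models are illustrative assumptions, since the patent fixes none of these values.

```python
def detect_density_level(pixel_count, low_model, high_model, texture=None, F=50):
    """Two-stage routing of steps S4/S5.  `low_model` maps a foreground
    pixel count to an estimated head count; `high_model` maps texture
    features to a density grade.  F and the grade boundaries below are
    illustrative assumptions."""
    count = low_model(pixel_count)
    if count <= F:
        # low-density regime: grade follows directly from the head count
        return "low" if count < 30 else "medium"
    # high-density regime: defer to the texture-based BP-network model
    return high_model(texture)

grade = detect_density_level(3000, low_model=lambda px: px / 150,
                             high_model=lambda t: "high")
print(grade)  # low
```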

Preferably, the background learning process in step S1 is as follows:

S11. Convert the first of the extracted frames to a grayscale image and build an initial codebook for each pixel of that grayscale image. Each pixel of the first frame corresponds to one initial codebook; each initial codebook contains one codeword, which records the gray value of the corresponding pixel in the first frame. Also set the initial learning threshold.

S12. For the frames after the first among the extracted frames, whenever the next frame is acquired, first convert it to grayscale, then perform the following for each pixel of the grayscale frame:

Match the pixel against the current codebook built from the same pixel position in previous frames: check whether the pixel's gray value lies within the learning-threshold range of some codeword in that codebook.

If so, update that codeword's member variables from the pixel's gray value; the member variables include the maximum and minimum gray values seen at the pixel.

If not, create a new codeword that records the pixel's gray value, add it to the current codebook to obtain an updated codebook, and update the current learning threshold.

S13. Check whether the frame processed in S12 is the last of the extracted frames.

If not, continue with step S12 when the next frame is acquired.

If so, background learning is complete, and the background image information is obtained from the per-pixel codebooks produced by step S12.

Furthermore, in step S11 the initial learning threshold is set to 10.

Furthermore, in step S12 the update is implemented by incrementing the current learning threshold by 1.

Furthermore, in step S12 the learning-threshold range of a codeword spans from (gray value recorded by the codeword - learning threshold) to (gray value recorded by the codeword + learning threshold).

Preferably, the method further comprises step S6: judging whether the crowd-density-grade detection result obtained for each frame through step S4 or S5 is normal. The specific process is as follows:

Obtain the crowd density grades of the frames immediately before and after the current frame, and compare the current frame's grade with both of them:

If the current frame's grade differs from both the previous frame's grade and the next frame's grade, the current frame's detection is judged erroneous, and the frame, labelled with its actual crowd density grade, is used as a training sample in the next round of training of the first low-density crowd model, the second low-density crowd model, or the high-density crowd model.

If the current frame's grade differs from the previous frame's but matches the next frame's, the crowd density is deemed to have changed abruptly between the previous frame and the current frame, and the current frame's detection is normal.

If the current frame's grade matches the previous frame's but differs from the next frame's, the crowd density is deemed to have changed abruptly between the current frame and the next frame, and the current frame's detection is normal.
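The three rules of step S6 can be sketched directly:

```python
def check_detection(prev_grade, curr_grade, next_grade):
    """Consistency rule of step S6: a grade that differs from BOTH
    neighbouring frames is treated as a detection error (the frame is
    then re-labelled and reused as a training sample); agreement with
    either neighbour is read as a genuine abrupt density change."""
    if curr_grade != prev_grade and curr_grade != next_grade:
        return "error"
    return "normal"

print(check_detection("low", "high", "low"))   # isolated spike -> error
print(check_detection("low", "high", "high"))  # real transition -> normal
```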

Preferably, the texture features include ASM energy, contrast, inverse difference moment, entropy, and autocorrelation.
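These five features can be computed from a normalized gray-level co-occurrence matrix. The patent gives no formulas, so the definitions below follow the standard Haralick conventions, and reading "autocorrelation" as Haralick correlation is an assumption.

```python
import numpy as np

def glcm_features(P):
    """The five texture features listed above, computed from a
    normalized gray-level co-occurrence matrix P using standard
    Haralick definitions (assumed; the patent gives no formulas)."""
    P = P / P.sum()
    i, j = np.indices(P.shape)
    asm = np.sum(P ** 2)                               # ASM (energy)
    contrast = np.sum(((i - j) ** 2) * P)              # contrast
    idm = np.sum(P / (1.0 + (i - j) ** 2))             # inverse difference moment
    entropy = -np.sum(P[P > 0] * np.log(P[P > 0]))     # entropy
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum(((i - mu_i) ** 2) * P))
    sd_j = np.sqrt(np.sum(((j - mu_j) ** 2) * P))
    corr = np.sum((i - mu_i) * (j - mu_j) * P) / (sd_i * sd_j)  # correlation
    return asm, contrast, idm, entropy, corr

# toy GLCM of a perfectly correlated two-level texture
P = np.array([[0.5, 0.0], [0.0, 0.5]])
asm, con, idm, ent, corr = glcm_features(P)
print(asm, con, corr)  # 0.5 0.0 1.0
```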

Preferably, in step S1 the first 30 frames are taken, and background learning is performed on these 30 frames to obtain the background image information.

The second object of the present invention is achieved through the following technical solution: a deep-learning-based crowd density detection system for implementing the above crowd density detection method, comprising a camera for acquiring each frame in real time, a local image processing device, and a control centre; the camera is connected to the local image processing device through a data line, and the local image processing device is connected to the control centre through a network.

The local image processing device detects the crowd density of each frame and sends the corresponding crowd density information to the control centre over the network. The local image processing device comprises:

a background modelling module, which obtains the first several frames from the camera and performs background learning on them to obtain background image information;

a background difference module, which, for each subsequent frame, extracts the target foreground image using the background difference method and the background image information;

an edge detection module, which performs edge detection on the target foreground image in a frame;

a pixel statistics module, which counts the number of pixels of the target foreground image in a frame and the number of its edge pixels;

a texture feature extraction module, which extracts texture features from the target foreground image in a frame using a gray-level co-occurrence matrix;

a low-density crowd model building module, which fits the first low-density crowd model from the relationship between the number of target-foreground pixels and the calibrated number of people in the selected frames belonging to the low-density crowd grade, or fits the second low-density crowd model from the relationship between the number of edge pixels of the target foreground and the calibrated number of people in those frames;

a high-density crowd model building module, which inputs the texture features of training sample images belonging to the various high-density crowd grades into a BP neural network and trains it to obtain the high-density crowd model;

a low-density crowd density detection module, which, for each frame whose crowd density is to be detected, inputs the number of target-foreground pixels into the first low-density crowd model to obtain the frame's crowd count; when the count exceeds the set value F, the frame is passed to the high-density crowd density detection module, and when it does not, the frame's crowd density grade is obtained from the count. Alternatively, for each such frame, the module inputs the number of edge pixels of the target foreground into the second low-density crowd model to obtain the frame's crowd count, passing the frame to the high-density crowd density detection module when the count exceeds F and otherwise obtaining the frame's crowd density grade from the count; and

a high-density crowd density detection module, which, on receiving a frame from the low-density crowd density detection module, first obtains the texture features of the frame's target foreground image through the texture feature extraction module, inputs them into the high-density crowd model, and obtains the frame's crowd density grade from the model.

Preferably, the local image processing device is an ARM development board, and the background modelling, background difference, edge detection, pixel statistics, texture feature extraction, low-density crowd model building, high-density crowd model building, low-density crowd density detection, and high-density crowd density detection modules are all built on the software platform of the ARM development board.

Compared with the prior art, the present invention has the following advantages and effects:

(1) The invention first performs background learning on the first several frames acquired by the camera to obtain background image information, then extracts the target foreground image of each subsequently acquired frame using that information. Multiple frames with extracted target foregrounds belonging to the low-density crowd grade are selected and their crowd counts calibrated, so that a low-density crowd model can be built by pixel statistics; at the same time, multiple frames with extracted target foregrounds belonging to the various high-density crowd grades are selected as training samples, the texture features of each sample's target foreground are input to a BP neural network, and the network is trained to obtain a high-density crowd model. Each frame whose crowd density is to be detected is first input to the low-density crowd model; when the crowd count it returns does not exceed a set value, the crowd density grade is judged from the count, and when the count exceeds the set value, the frame is input to the high-density crowd model, which judges the density. The invention thus combines pixel statistics with texture features: the density grade of a low-density crowd is obtained by pixel statistics, while a high-density crowd that pixel statistics cannot judge correctly is graded by texture features, giving high detection accuracy with a small amount of computation. Moreover, the background image in this method is obtained by learning from the first several frames captured by the camera; the per-pixel time-series model can adapt to motion during modelling and handles temporal fluctuation well, so background learning copes with complex dynamic backgrounds. The background image obtained this way yields a more accurate target foreground image and further improves the accuracy of crowd density detection.

(2) The crowd density detection system of the invention consists mainly of a camera, a local image processing device, and a control centre. Each frame captured by the camera is sent directly over a data line to the local image processing device, which processes the frames to obtain the crowd density and sends it to the control centre. Because the invention processes images directly on the local image processor, it does not need to send bulky images over the network to a back end for processing; only a small amount of bandwidth is needed to transmit the detected crowd density values to the control centre. The system therefore occupies little bandwidth and processes quickly.

Brief Description of the Drawings

Figure 1 is a flow chart of the crowd density detection method of the present invention.

Detailed Description

The present invention is described in further detail below in conjunction with the embodiment and the accompanying drawing, but embodiments of the invention are not limited thereto.

Embodiment

This embodiment discloses a deep-learning-based crowd density detection method, shown in Figure 1, with the following steps.

S1. Acquire each frame in real time through a camera, take the first several frames, and perform background learning on them to obtain background image information; in this embodiment the first 30 frames are used for background learning.

The background learning in this step proceeds as follows:

S11. Convert the first of the extracted frames to a grayscale image and build an initial codebook for each pixel of that grayscale image. Each pixel of the first frame corresponds to one initial codebook; each initial codebook contains one codeword, which records the gray value of the corresponding pixel in the first frame. Also set the initial learning threshold; in this embodiment it is set to 10.

S12. For the frames after the first among the extracted frames, whenever the next frame is acquired, first convert it to grayscale, then perform the following for each pixel of the grayscale frame:

将该帧灰度图像的像素点与之前帧灰度图像的相同位置像素点构成的当前码本进行码本匹配，检测该帧灰度图像像素点灰度值是否在之前帧灰度图像相同位置像素点构成的当前码本的某个码元的学习阈值范围内；其中码元的学习阈值范围为：码元记录的像素点灰度值-学习阈值～码元记录的像素点灰度值+学习阈值。Match each pixel of the current grayscale frame against the current codebook built at the same pixel position from the previous grayscale frames, and check whether the pixel's gray value falls within the learning threshold range of some codeword in that codebook; a codeword's learning threshold range is: (gray value recorded by the codeword − learning threshold) to (gray value recorded by the codeword + learning threshold).

若是，则根据该帧灰度图像的像素点灰度值更新该码元的码元成员变量，其中码元的成员变量中包括像素点的灰度值最大值和灰度值最小值；If so, update that codeword's member variables according to the pixel's gray value; a codeword's member variables include the maximum and minimum gray values of the pixel;

若否，则根据该帧灰度图像的像素点的灰度值建立一个新码元，通过该新码元记录该帧灰度图像的该像素点的灰度值，并且添加到当前码本中，得到更新后的码本，同时更新当前学习阈值；本实施例中通过对当前学习阈值进行加1以实现学习阈值的更新；If not, create a new codeword from the pixel's gray value, record that gray value in the new codeword, and add it to the current codebook to obtain the updated codebook, while also updating the current learning threshold; in this embodiment, the learning threshold is updated by adding 1 to its current value;

S13、检测S12中获取到的帧是否为取出的前若干帧图像中的最后一帧图像。S13. Check whether the frame obtained in S12 is the last of the extracted frames.

若否,则在获取到下一帧图像时,继续执行步骤S12;If not, when the next frame of image is acquired, continue to execute step S12;

若是，则背景学习完成，根据步骤S12获取到的各像素点分别对应的码本获取到背景图像信息。If yes, background learning is complete, and the background image information is obtained from the codebooks of the individual pixels produced in step S12.
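步骤S11～S13的码本背景学习过程可以用如下最小示意实现（假设性示例：函数名、码元以[最小值,最大值]表示、以及全局阈值加1的更新方式均为说明性选择，并非专利的确切实现）。A minimal sketch of the per-pixel codebook learning in steps S11–S13; the function names, the [min, max] codeword representation, and the global "+1" threshold update are illustrative assumptions, not the patent's exact implementation:

```python
import numpy as np

def learn_background(frames, start_threshold=10):
    """frames: list of 2-D uint8 grayscale arrays; returns per-pixel codebooks."""
    h, w = frames[0].shape
    threshold = start_threshold
    # codebooks[y][x] is a list of codewords; each codeword stores [min, max] gray values (S11)
    codebooks = [[[[int(frames[0][y, x]), int(frames[0][y, x])]]
                  for x in range(w)] for y in range(h)]
    for frame in frames[1:]:                      # S12: process each later frame
        for y in range(h):
            for x in range(w):
                g = int(frame[y, x])
                for cw in codebooks[y][x]:
                    lo, hi = cw
                    # match: gray value within [recorded min - T, recorded max + T]
                    if lo - threshold <= g <= hi + threshold:
                        cw[0], cw[1] = min(lo, g), max(hi, g)   # update member variables
                        break
                else:
                    codebooks[y][x].append([g, g])  # no match: create a new codeword
                    threshold += 1                  # update the learning threshold
    return codebooks                                # S13: codebooks hold the background info

def background_image(codebooks):
    """One simple way to read out a background image: midpoint of the first codeword."""
    return np.array([[(cb[0][0] + cb[0][1]) // 2 for cb in row]
                     for row in codebooks], dtype=np.uint8)
```

例如对30帧静止场景运行后，每个像素通常只保留一个码元，其灰度区间即为背景。For a static scene, each pixel typically keeps a single codeword whose gray range is the background.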

S2、针对之后的各帧图像，依据步骤S1中获取到的背景图像信息，采用背景差分法提取出各帧图像中的目标前景图像；S2. For each subsequent frame, extract the target foreground image from the frame by background subtraction, based on the background image information obtained in step S1;
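步骤S2的背景差分可以用如下示意实现（差分阈值25为假设值，专利未限定具体阈值）。A sketch of step S2's background subtraction; the difference threshold 25 is an illustrative assumption, not specified by the patent:

```python
import numpy as np

def extract_foreground(frame, background, diff_threshold=25):
    """frame, background: 2-D uint8 grayscale arrays; returns a boolean foreground mask."""
    # compute in int16 to avoid uint8 wrap-around on subtraction
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > diff_threshold
```

掩模中为真的像素即目标前景像素，其数目/边缘像素数即后续S3、S4所统计的量。Pixels that are True in the mask form the target foreground; their count (or edge-pixel count) is what steps S3 and S4 operate on.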

S3、选取出多帧步骤S2中已提取出目标前景图像且属于低密度人群等级的图像，对选取出的各帧图像标定人群数量，依据上述选取的各帧图像中目标前景图像像素的数目和人群数量之间的关系拟合得到第一低密度人群模型，或者根据上述选取各帧图像中目标前景图像的边缘像素数目和人群数量之间的关系拟合得到第二低密度人群模型；同时选取出多帧步骤S2中已提取出目标前景图像且属于高密度人群等级中各个等级的图像作为训练样本，采用灰度共生矩阵提取各训练样本目标前景图像的纹理特征，将各训练样本目标前景图像的纹理特征输入至BP神经网络，对BP神经网络进行训练，得到高密度人群模型；S3. Select multiple frames from which the target foreground image has been extracted in step S2 and which belong to the low-density crowd levels, and label each selected frame with its crowd count. Fit the relation between the number of target-foreground pixels in each selected frame and the crowd count to obtain the first low-density crowd model, or fit the relation between the number of edge pixels of the target foreground image and the crowd count to obtain the second low-density crowd model. At the same time, select multiple frames from step S2 that belong to the individual high-density crowd levels as training samples, extract the texture features of each sample's target foreground image with the gray-level co-occurrence matrix, feed these texture features into a BP neural network, and train the network to obtain the high-density crowd model;
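第一低密度人群模型的拟合可以用如下最小二乘示意（线性形式为假设，专利仅要求"拟合关系"；第二模型将前景像素数换成边缘像素数即可）。A sketch of fitting the first low-density crowd model by least squares; the linear form is an assumption, as the patent only requires fitting the relation, and the second model is obtained by substituting edge-pixel counts for pixel counts:

```python
import numpy as np

def fit_low_density_model(pixel_counts, person_counts):
    """Least-squares fit: count ≈ a * pixels + b; returns the pair (a, b)."""
    a, b = np.polyfit(pixel_counts, person_counts, deg=1)
    return a, b

def predict_count(model, pixel_count):
    """Apply the fitted model to a new frame's foreground-pixel count."""
    a, b = model
    return a * pixel_count + b
```

拟合得到的人数估计随后在S4中与阈值F比较。The predicted count is then compared against the threshold F in step S4.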

S4、针对于步骤S2中获取到的需要检测人群密度的各帧图像，将图像的目标前景图像像素的数目输入至第一低密度人群模型，获取到该帧人群数量，然后判断获取到的人群数量是否超过一定值F，若否，则根据上述获取到的人群数量获取到该帧图像的人群密度等级，若是，则进入步骤S5；S4. For each frame obtained in step S2 whose crowd density is to be detected, input the number of target-foreground pixels into the first low-density crowd model to obtain the crowd count of that frame, then judge whether the obtained count exceeds a fixed value F. If not, determine the frame's crowd density level from the obtained count; if so, go to step S5;

或者针对于步骤S2中获取到的需要检测人群密度的各帧图像，将图像目标前景图像的边缘像素的数目输入至第二低密度人群模型，获取到该帧图像的人群数量，然后判断获取到的人群数量是否超过一定值F，若否，则根据上述获取到的人群数量确定出该帧图像的人群密度等级；若是，则进入步骤S5；Or, for each frame obtained in step S2 whose crowd density is to be detected, input the number of edge pixels of the target foreground image into the second low-density crowd model to obtain the frame's crowd count, then judge whether the obtained count exceeds the fixed value F. If not, determine the frame's crowd density level from the obtained count; if so, go to step S5;

S5、采用灰度共生矩阵提取该帧图像的目标前景图像的纹理特征，将提取的纹理特征输入至高密度人群模型中，通过高密度人群模型的输出获取到人群密度等级。S5. Extract the texture features of the frame's target foreground image with the gray-level co-occurrence matrix, input the extracted features into the high-density crowd model, and obtain the crowd density level from the model's output.
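步骤S4与S5的判定流程可以概括为如下示意（阈值F=50与低密度等级划分均为假设值；高密度分类器即S5中训练好的BP神经网络，这里用一个可调用对象代替）。The S4→S5 decision flow can be sketched as follows; the threshold F=50 and the low-density level boundaries are illustrative assumptions, and the high-density classifier stands in for the trained BP network of step S5:

```python
import numpy as np

def classify_density(foreground, low_model, high_classifier, F=50):
    """foreground: boolean mask from S2; low_model: (a, b) from the pixel-count fit."""
    a, b = low_model
    count = a * foreground.sum() + b          # S4: low-density count estimate
    if count <= F:
        # assumed mapping from crowd count to low-density levels
        return "very_low" if count < F / 2 else "low"
    # S5: fall back to GLCM texture features + trained BP network
    return high_classifier(foreground)
```

实际系统中 high_classifier 会先提取纹理特征再送入BP网络。In the real system, high_classifier would extract the texture features and feed them to the BP network.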

S6、判断各帧图像通过步骤S4或S5获取到的人群密度等级检测结果是否正常；具体过程如下：S6. Judge whether the crowd density level obtained for each frame in step S4 or S5 is normal; the specific procedure is as follows:

获取当前帧图像的前一帧图像人群密度等级和后一帧图像人群密度等级；将当前帧图像人群密度等级及其前一帧图像人群密度等级和后一帧图像人群密度等级进行比较：Obtain the crowd density levels of the frames immediately before and after the current frame, and compare the current frame's crowd density level with both:

若当前帧图像人群密度等级与其前一帧图像人群密度等级和后一帧图像人群密度等级均不相同，则判断当前帧图像人群密度等级检测出错，根据该帧的实际所属人群密度等级作为下一次第一低密度人群模型、第二低密度人群模型或高密度人群模型的训练样本；If the current frame's crowd density level differs from both the previous frame's and the next frame's levels, the detection for the current frame is judged erroneous, and the frame, labelled with its actual crowd density level, is used as a training sample in the next round of training the first low-density crowd model, the second low-density crowd model, or the high-density crowd model;

若当前帧图像人群密度等级与其前一帧图像人群密度等级不相同，而与其下一帧图像人群密度等级相同，则认定人群密度在前一帧图像和当前帧图像之间发生了突变，当前帧图像人群密度等级检测正常；If the current frame's level differs from the previous frame's but matches the next frame's, the crowd density is considered to have changed abruptly between the previous frame and the current frame, and the current frame's detection is normal;

若当前帧图像人群密度等级与其前一帧图像人群密度等级相同，而与其下一帧图像人群密度等级不相同，则认定人群密度在当前帧图像和下一帧图像之间发生了突变，当前帧图像人群密度等级检测正常。If the current frame's level matches the previous frame's but differs from the next frame's, the crowd density is considered to have changed abruptly between the current frame and the next frame, and the current frame's detection is normal.
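上述S6的时序一致性校验逻辑可以概括为如下示意（函数名为说明性假设）。The temporal consistency check of step S6 reduces to the following sketch; the function name is an illustrative assumption:

```python
def check_detection(prev_level, curr_level, next_level):
    """Return True if the current frame's density level is considered normal."""
    if curr_level != prev_level and curr_level != next_level:
        # isolated level matching neither neighbour: detection error;
        # the frame would be re-labelled and used as a new training sample
        return False
    # matches at least one neighbour: an abrupt but sustained change
    # is accepted as a real change in crowd density
    return True
```

例如等级序列 low→high→low 中间那帧会被判为检测出错，而 low→high→high 中的突变被接受。For the sequence low→high→low the middle frame is flagged as an error, while the change in low→high→high is accepted.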

其中上述提到的图像的目标前景图像的纹理特征包括ASM能量(angular second moment)、对比度(contrast)、逆差矩(inverse difference moment)、熵(entropy)和自相关(correlation)。The texture features of the target foreground image mentioned above include ASM energy (angular second moment), contrast, inverse difference moment, entropy, and correlation.
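灰度共生矩阵及上述五个纹理特征可以用纯numpy实现如下（量化为8个灰度级、仅取水平相邻方向，这些参数为假设性选择）。The gray-level co-occurrence matrix and the five texture features above can be sketched in plain numpy; quantizing to 8 gray levels and using only the horizontal neighbour offset are illustrative choices:

```python
import numpy as np

def glcm_features(gray, levels=8):
    """gray: 2-D uint8 image; returns the five GLCM texture features."""
    q = (gray.astype(np.float64) * levels / 256).astype(int)   # quantize gray levels
    glcm = np.zeros((levels, levels))
    for i, j in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):      # offset (0, 1): right neighbour
        glcm[i, j] += 1
    p = glcm / glcm.sum()                                      # normalize to probabilities
    idx_i, idx_j = np.indices(p.shape)
    asm = (p ** 2).sum()                                       # ASM energy
    contrast = ((idx_i - idx_j) ** 2 * p).sum()                # contrast
    idm = (p / (1.0 + (idx_i - idx_j) ** 2)).sum()             # inverse difference moment
    entropy = -(p[p > 0] * np.log(p[p > 0])).sum()             # entropy
    mu_i, mu_j = (idx_i * p).sum(), (idx_j * p).sum()
    sd_i = np.sqrt(((idx_i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((idx_j - mu_j) ** 2 * p).sum())
    corr = ((idx_i - mu_i) * (idx_j - mu_j) * p).sum() / (sd_i * sd_j + 1e-12)
    return {"asm": asm, "contrast": contrast, "idm": idm,
            "entropy": entropy, "correlation": corr}
```

这五个数即送入BP神经网络的输入向量。These five values form the input vector fed to the BP neural network.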

本实施例还公开了一种用于实现人群密度检测方法的基于深度学习的人群密度检测系统，包括用于实时获取每帧图像的摄像头、本地图像处理装置和控制中心，摄像头通过数据线连接本地图像处理装置，本地图像处理装置通过网络连接控制中心；其中本地图像处理装置，用于针对各帧图像检测人群密度，并且将各帧图像对应的人群密度信息通过网络发送至控制中心；This embodiment also discloses a deep-learning-based crowd density detection system for implementing the crowd density detection method, comprising a camera for acquiring each frame of image in real time, a local image processing device, and a control center; the camera is connected to the local image processing device via a data cable, and the local image processing device is connected to the control center via a network. The local image processing device detects the crowd density for each frame and sends the crowd density information corresponding to each frame to the control center over the network;

本实施例中本地图像处理装置包括:In this embodiment, the local image processing device includes:

背景建模模块,用于从摄像头中获取前若干帧图像,并且针对这若干帧图像进行背景学习,得到背景图像信息;The background modeling module is used to obtain the previous frames of images from the camera, and perform background learning on these frames of images to obtain background image information;

背景差分模块,用于针对之后的各帧图像,依据背景图像信息,采用背景差分法提取出各帧图像中的目标前景图像;The background difference module is used to extract the target foreground image in each frame image by using the background difference method according to the background image information for each subsequent frame image;

边缘检测模块,用于针对图像中目标前景图像进行边缘检测;An edge detection module is used to perform edge detection for the target foreground image in the image;

像素统计模块,用于统计图像中目标前景图像的像素数目,用于统计图像中目标前景图像的边缘像素数目;A pixel statistics module is used to count the number of pixels of the target foreground image in the image, and is used to count the number of edge pixels of the target foreground image in the image;

纹理特征提取模块，用于采用灰度共生矩阵针对图像中的目标前景图像进行纹理特征提取；The texture feature extraction module is used to extract texture features from the target foreground image in the image using the gray-level co-occurrence matrix;

低密度人群模型建立模块，用于依据选取的属于低密度人群等级的各帧图像中目标前景图像像素的数目及其标定的人群数量之间的关系拟合得到第一低密度人群模型，或者用于根据选取的属于低密度人群等级的各帧图像中目标前景图像的边缘像素数目和标定的人群数量之间的关系拟合得到第二低密度人群模型；The low-density crowd model building module fits the first low-density crowd model from the relation between the number of target-foreground pixels in each selected frame of the low-density crowd levels and the labelled crowd count, or fits the second low-density crowd model from the relation between the number of edge pixels of the target foreground image in each selected frame and the labelled crowd count;

高密度人群模型建立模块,用于将属于高密度人群等级中各个等级的训练样本图像对应的纹理特征输入至BP神经网络,对BP神经网络进行训练,建立得到高密度人群模型;The high-density crowd model building module is used to input the texture features corresponding to the training sample images belonging to each level in the high-density crowd level to the BP neural network, train the BP neural network, and establish the high-density crowd model;

低密度人群密度检测模块，用于针对于需要检测人群密度的各帧图像，将图像的目标前景图像像素的数目输入至第一低密度人群模型，获取到该帧人群数量，当检测到人群数量超过一定值F时，则将该帧图像输入至高密度人群密度检测模块，当检测到人群数量未超过一定值F时，则根据人群数量获取到该帧图像的人群密度等级；或者用于针对于需要检测人群密度的各帧图像，将图像的目标前景图像的边缘像素数目输入至第二低密度人群模型，获取到该帧人群数量，当检测到人群数量超过一定值F时，则将该帧图像输入至高密度人群密度检测模块，当检测到人群数量未超过一定值F时，则根据人群数量获取到该帧图像的人群密度等级；The low-density crowd density detection module, for each frame whose crowd density is to be detected, inputs the number of target-foreground pixels into the first low-density crowd model to obtain the frame's crowd count; when the count exceeds the fixed value F the frame is passed to the high-density crowd density detection module, and otherwise the frame's crowd density level is determined from the count. Alternatively, for each such frame, it inputs the number of edge pixels of the target foreground image into the second low-density crowd model to obtain the frame's crowd count, passing the frame to the high-density crowd density detection module when the count exceeds F and determining the crowd density level from the count otherwise;

高密度人群密度检测模块，用于在接收到低密度人群密度检测模块输入的图像时，首先通过纹理特征提取模块获取到该帧图像目标前景图像的纹理特征，将该纹理特征输入至高密度人群模型，通过高密度人群模型获取到该帧图像的人群密度等级。The high-density crowd density detection module, upon receiving a frame from the low-density crowd density detection module, first obtains the texture features of the frame's target foreground image through the texture feature extraction module, inputs these features into the high-density crowd model, and obtains the frame's crowd density level from the model.

在本实施例中本地图像处理装置采用三星S5PV210处理器，使用搭载该处理器的华清FS210开发板（根据实际需要可选择其他开发板），将程序移植到ARM板平台。本地图像处理装置中的背景建模模块、背景差分模块、边缘检测模块、像素统计模块、纹理特征提取模块、低密度人群模型建立模块、高密度人群模型建立模块、低密度人群密度检测模块和高密度人群密度检测模块均由ARM开发板中软件平台搭建构成。In this embodiment, the local image processing device uses a Samsung S5PV210 processor on a Huaqing FS210 development board (other boards may be chosen as needed), and the program is ported to the ARM board platform. The background modeling module, background difference module, edge detection module, pixel statistics module, texture feature extraction module, low-density crowd model building module, high-density crowd model building module, low-density crowd density detection module, and high-density crowd density detection module of the local image processing device are all built on the software platform of the ARM development board.

S5PV210采用了ARM Cortex-A8内核，ARM v7指令集，主频可达1GHz，64/32位内部总线结构，32KB/32KB的数据/指令一级缓存，512KB的二级缓存，可以实现2000 DMIPS（每秒运算20亿条指令）的高性能运算能力。它包含强大的硬件编解码功能，支持MPEG-1/2/4、H.263、H.264等格式视频的编解码，支持模拟/数字TV输出；JPEG硬件编解码最大支持8000×8000分辨率。The S5PV210 uses an ARM Cortex-A8 core with the ARMv7 instruction set, a clock frequency of up to 1 GHz, a 64/32-bit internal bus, 32 KB/32 KB data/instruction L1 caches, and a 512 KB L2 cache, delivering up to 2000 DMIPS (2 billion instructions per second). It integrates powerful hardware codecs supporting MPEG-1/2/4, H.263, and H.264 video, supports analog/digital TV output, and provides JPEG hardware encoding/decoding at resolutions up to 8000×8000.

内建高性能PowerVR SGX540 3D图形引擎和2D图形引擎，支持2D/3D图形加速，是第五代PowerVR产品，其多边形生成率为2800万多边形/秒，像素填充率可达2.5亿/秒，在3D和多媒体方面比以往大幅提升，能够支持DX9、SM3.0、OpenGL 2.0等PC级别显示技术。It has a built-in high-performance PowerVR SGX540 3D graphics engine and a 2D graphics engine supporting 2D/3D acceleration. As a fifth-generation PowerVR product, it reaches a polygon generation rate of 28 million polygons per second and a pixel fill rate of up to 250 million pixels per second, a large improvement in 3D and multimedia performance, and supports PC-level display technologies such as DX9, SM3.0, and OpenGL 2.0.

具备IVA3硬件加速器，具备出色的图形解码性能，可以支持全高清、多标准的视频编码，流畅播放和录制30帧/秒的1920×1080像素(1080p)的视频文件，可以更快解码更高质量的图像和视频；同时，内建的HDMI v1.3可以将高清视频输出到外部显示器上。It is equipped with an IVA3 hardware accelerator with excellent video decoding performance, supports full-HD multi-standard video encoding, smoothly plays and records 1920×1080 (1080p) video at 30 frames per second, and decodes higher-quality images and video faster; its built-in HDMI v1.3 can output high-definition video to an external display.

上述实施例为本发明较佳的实施方式，但本发明的实施方式并不受上述实施例的限制，其他的任何未背离本发明的精神实质与原理下所作的改变、修饰、替代、组合、简化，均应为等效的置换方式，都包含在本发明的保护范围之内。The above embodiment is a preferred implementation of the present invention, but the implementations of the present invention are not limited to it; any other change, modification, substitution, combination, or simplification made without departing from the spirit and principles of the present invention is an equivalent replacement and falls within the protection scope of the present invention.

Claims (10)

1. A crowd density detection method based on deep learning, characterized in that the steps are as follows:
S1, acquiring each frame of image in real time through a camera, taking the first several frames, and performing background learning on these frames to obtain background image information;
S2, for each subsequent frame, extracting the target foreground image from the frame by background subtraction based on the background image information obtained in step S1;
S3, selecting multiple frames from which the target foreground image has been extracted in step S2 and which belong to the low-density crowd levels, labelling each selected frame with its crowd count, and fitting the relation between the number of target-foreground pixels in each selected frame and the crowd count to obtain a first low-density crowd model, or fitting the relation between the number of edge pixels of the target foreground image and the crowd count to obtain a second low-density crowd model; at the same time, selecting multiple frames from step S2 that belong to the individual high-density crowd levels as training samples, extracting the texture features of each sample's target foreground image with the gray-level co-occurrence matrix, inputting these texture features into a BP neural network, and training the network to obtain a high-density crowd model;
S4, for each frame obtained in step S2 whose crowd density is to be detected, inputting the number of target-foreground pixels into the first low-density crowd model to obtain the crowd count, and then judging whether the obtained count exceeds a fixed value F; if not, determining the crowd density level from the obtained count; if so, going to step S5;
or, for each frame obtained in step S2 whose crowd density is to be detected, inputting the number of edge pixels of the target foreground image into the second low-density crowd model to obtain the crowd count, and then judging whether the obtained count exceeds the fixed value F; if not, determining the crowd density level from the obtained count; if so, going to step S5;
S5, extracting the texture features of the frame's target foreground image with the gray-level co-occurrence matrix, inputting the extracted features into the high-density crowd model, and obtaining the crowd density level from the model's output.
2. The deep-learning-based crowd density detection method according to claim 1, characterized in that the background learning in step S1 proceeds as follows:
S11, converting the first of the extracted frames into a grayscale image and building an initial codebook for each pixel of that grayscale image; each pixel of the first frame corresponds to one initial codebook, each initial codebook contains one codeword, and the codeword records the gray value of the corresponding pixel in the first frame; and setting an initial learning threshold;
S12, for the frames after the first one, whenever the next frame is acquired, first converting it into a grayscale image and performing the following operations on each pixel of that grayscale image:
matching the pixel against the current codebook built at the same pixel position from the previous grayscale frames, and checking whether the pixel's gray value falls within the learning threshold range of some codeword of that codebook;
if so, updating that codeword's member variables according to the pixel's gray value, the member variables including the maximum and minimum gray values of the pixel;
if not, creating a new codeword from the pixel's gray value, recording that gray value in the new codeword, adding it to the current codebook to obtain the updated codebook, and updating the current learning threshold;
S13, checking whether the frame obtained in S12 is the last of the extracted frames;
if not, continuing with step S12 when the next frame is acquired;
if so, background learning is complete, and the background image information is obtained from the codebooks of the individual pixels produced in step S12.
3. The deep-learning-based crowd density detection method according to claim 2, characterized in that the initial learning threshold in step S11 is set to 10.
4. The deep-learning-based crowd density detection method according to claim 2, characterized in that in step S12 the learning threshold is updated by adding 1 to the current learning threshold.
5. The deep-learning-based crowd density detection method according to claim 2, characterized in that in step S12 the learning threshold range of a codeword is: (gray value recorded by the codeword − learning threshold) to (gray value recorded by the codeword + learning threshold).
6. The deep-learning-based crowd density detection method according to claim 1, characterized in that it further comprises step S6, judging whether the crowd density level obtained for each frame in step S4 or S5 is normal; the specific procedure is as follows:
obtaining the crowd density levels of the frames immediately before and after the current frame, and comparing the current frame's crowd density level with both;
if the current frame's crowd density level differs from both the previous frame's and the next frame's levels, judging the detection for the current frame to be erroneous, and using the frame, labelled with its actual crowd density level, as a training sample in the next round of training the first low-density crowd model, the second low-density crowd model, or the high-density crowd model;
if the current frame's level differs from the previous frame's but matches the next frame's, determining that the crowd density changed abruptly between the previous frame and the current frame, and that the current frame's detection is normal;
if the current frame's level matches the previous frame's but differs from the next frame's, determining that the crowd density changed abruptly between the current frame and the next frame, and that the current frame's detection is normal.
7. The deep-learning-based crowd density detection method according to claim 1, characterized in that the texture features include ASM energy, contrast, inverse difference moment, entropy, and correlation.
8. The deep-learning-based crowd density detection method according to claim 1, characterized in that in step S1 the first 30 frames are taken and background learning is performed on these 30 frames to obtain the background image information.
9. A deep-learning-based crowd density detection system for implementing the crowd density detection method of claim 1, comprising a camera for acquiring each frame of image in real time, characterized in that it further comprises a local image processing device and a control center; the camera is connected to the local image processing device via a data cable, and the local image processing device is connected to the control center via a network;
the local image processing device detects the crowd density for each frame and sends the crowd density information corresponding to each frame to the control center over the network; the local image processing device comprises:
a background modeling module for obtaining the first several frames from the camera and performing background learning on them to obtain background image information;
a background difference module for extracting, for each subsequent frame, the target foreground image from the frame by background subtraction based on the background image information;
an edge detection module for performing edge detection on the target foreground image in the image;
a pixel statistics module for counting the number of pixels of the target foreground image in the image and counting the number of edge pixels of the target foreground image in the image;
a texture feature extraction module for extracting texture features from the target foreground image in the image using the gray-level co-occurrence matrix;
a low-density crowd model building module for fitting the first low-density crowd model from the relation between the number of target-foreground pixels in each selected frame of the low-density crowd levels and the labelled crowd count, or for fitting the second low-density crowd model from the relation between the number of edge pixels of the target foreground image in each selected frame and the labelled crowd count;
a high-density crowd model building module for inputting the texture features corresponding to the training sample images of the individual high-density crowd levels into a BP neural network and training the network to obtain the high-density crowd model;
a low-density crowd density detection module for inputting, for each frame whose crowd density is to be detected, the number of target-foreground pixels into the first low-density crowd model to obtain the frame's crowd count, passing the frame to the high-density crowd density detection module when the count exceeds a fixed value F, and determining the frame's crowd density level from the count otherwise; or for inputting the number of edge pixels of the target foreground image into the second low-density crowd model to obtain the frame's crowd count, passing the frame to the high-density crowd density detection module when the count exceeds F, and determining the frame's crowd density level from the count otherwise;
a high-density crowd density detection module for, upon receiving a frame from the low-density crowd density detection module, first obtaining the texture features of the frame's target foreground image through the texture feature extraction module, inputting these features into the high-density crowd model, and obtaining the frame's crowd density level from the model.
10. The deep-learning-based crowd density detection system according to claim 9, characterized in that the local image processing device is an ARM development board; the background modeling module, background difference module, edge detection module, pixel statistics module, texture feature extraction module, low-density crowd model building module, high-density crowd model building module, low-density crowd density detection module, and high-density crowd density detection module of the local image processing device are all built on the software platform of the ARM development board.
CN201710177154.1A 2017-03-23 2017-03-23 A method and system for crowd density detection based on deep learning Pending CN107145821A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710177154.1A CN107145821A (en) 2017-03-23 2017-03-23 A method and system for crowd density detection based on deep learning


Publications (1)

Publication Number Publication Date
CN107145821A true CN107145821A (en) 2017-09-08

Family

ID=59783709


Country Status (1)

Country Link
CN (1) CN107145821A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009477A (en) * 2017-11-10 2018-05-08 东软集团股份有限公司 Stream of people's quantity detection method, device, storage medium and the electronic equipment of image
CN108021949A (en) * 2017-12-27 2018-05-11 重庆交通开投科技发展有限公司 Crowded degree detection method, device, system and electronic equipment
CN108072385A (en) * 2017-12-06 2018-05-25 爱易成技术(天津)有限公司 Space coordinates localization method, device and the electronic equipment of mobile target
CN108768585A (en) * 2018-04-27 2018-11-06 南京邮电大学 Uplink based on deep learning exempts from signaling NOMA system multi-user detection methods
CN108810814A (en) * 2018-02-25 2018-11-13 王昆 A kind of orientation big data transmitted bandwidth distribution method
CN108830145A (en) * 2018-05-04 2018-11-16 深圳技术大学(筹) A kind of demographic method and storage medium based on deep neural network
CN108985256A (en) * 2018-08-01 2018-12-11 曜科智能科技(上海)有限公司 Based on the multiple neural network demographic method of scene Density Distribution, system, medium, terminal
CN109508583A (en) * 2017-09-15 2019-03-22 杭州海康威视数字技术股份有限公司 A kind of acquisition methods and device of distribution trend
CN110084112A (en) * 2019-03-20 2019-08-02 太原理工大学 A kind of traffic congestion judgment method based on image procossing
WO2020125057A1 (en) * 2018-12-20 2020-06-25 北京海益同展信息科技有限公司 Livestock quantity identification method and apparatus
CN111383340A (en) * 2018-12-28 2020-07-07 成都皓图智能科技有限责任公司 Background filtering method, device and system based on 3D image
CN112364788A (en) * 2020-11-13 2021-02-12 润联软件系统(深圳)有限公司 Monitoring video crowd quantity monitoring method based on deep learning and related components thereof
CN113538401A (en) * 2021-07-29 2021-10-22 燕山大学 A crowd counting method and system combining cross-modal information in complex scenes

Citations (2)

Publication number Priority date Publication date Assignee Title
CN102982341A (en) * 2012-11-01 2013-03-20 南京师范大学 Self-intended crowd density estimation method for camera capable of straddling
US20170068860A1 (en) * 2015-09-09 2017-03-09 Alex Adekola System for measuring crowd density


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wu Guodong: "Research on Crowd Density Estimation Methods in Intelligent Surveillance", China Master's Theses Full-text Database, Information Science and Technology *
Chai Bin: "Intelligent Video Surveillance of Sudden Crowd Gathering Events", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508583A (en) * 2017-09-15 2019-03-22 杭州海康威视数字技术股份有限公司 Method and device for acquiring a distribution trend
CN109508583B (en) * 2017-09-15 2020-11-06 杭州海康威视数字技术股份有限公司 Method and device for acquiring crowd distribution characteristics
CN108009477B (en) * 2017-11-10 2020-08-21 东软集团股份有限公司 Image people flow number detection method and device, storage medium and electronic equipment
CN108009477A (en) * 2017-11-10 2018-05-08 东软集团股份有限公司 Method, device, storage medium, and electronic equipment for detecting the number of people in an image
CN108072385A (en) * 2017-12-06 2018-05-25 爱易成技术(天津)有限公司 Spatial coordinate localization method, device, and electronic equipment for moving targets
CN108021949A (en) * 2017-12-27 2018-05-11 重庆交通开投科技发展有限公司 Congestion degree detection method, device, system, and electronic equipment
CN108810814A (en) * 2018-02-25 2018-11-13 王昆 A directional big-data transmission bandwidth allocation method
CN108810814B (en) * 2018-02-25 2019-04-12 南京飞畅软件技术有限公司 A directional big-data transmission bandwidth allocation method
CN108768585A (en) * 2018-04-27 2018-11-06 南京邮电大学 Multi-user detection method for uplink signaling-free NOMA systems based on deep learning
CN108768585B (en) * 2018-04-27 2021-03-16 南京邮电大学 Multi-user detection method for uplink signaling-free non-orthogonal multiple access NOMA system based on deep learning
CN108830145A (en) * 2018-05-04 2018-11-16 深圳技术大学(筹) A crowd counting method and storage medium based on deep neural networks
CN108985256A (en) * 2018-08-01 2018-12-11 曜科智能科技(上海)有限公司 Crowd counting method, system, medium, and terminal based on multiple neural networks and scene density distribution
WO2020125057A1 (en) * 2018-12-20 2020-06-25 北京海益同展信息科技有限公司 Livestock quantity identification method and apparatus
CN111383340A (en) * 2018-12-28 2020-07-07 成都皓图智能科技有限责任公司 Background filtering method, device and system based on 3D image
CN111383340B (en) * 2018-12-28 2023-10-17 成都皓图智能科技有限责任公司 Background filtering method, device and system based on 3D image
CN110084112A (en) * 2019-03-20 2019-08-02 太原理工大学 A traffic congestion judgment method based on image processing
CN110084112B (en) * 2019-03-20 2022-09-20 太原理工大学 Traffic jam judging method based on image processing
CN112364788A (en) * 2020-11-13 2021-02-12 润联软件系统(深圳)有限公司 Deep-learning-based method for monitoring crowd counts in surveillance video, and related components
CN112364788B (en) * 2020-11-13 2021-08-03 润联软件系统(深圳)有限公司 Monitoring video crowd quantity monitoring method based on deep learning and related components thereof
CN113538401A (en) * 2021-07-29 2021-10-22 燕山大学 A crowd counting method and system combining cross-modal information in complex scenes
CN113538401B (en) * 2021-07-29 2022-04-05 燕山大学 A crowd counting method and system combining cross-modal information in complex scenes

Similar Documents

Publication Publication Date Title
CN107145821A (en) A method and system for crowd density detection based on deep learning
CN110610510B (en) Target tracking method and device, electronic equipment and storage medium
CN108615226B (en) An Image Dehazing Method Based on Generative Adversarial Networks
WO2018006825A1 (en) Video coding method and apparatus
CN113283356B (en) Multistage attention scale perception crowd counting method
CN109389569B (en) Monitoring video real-time defogging method based on improved DehazeNet
CN102194443A (en) Display method and system for a picture-in-picture video window, and video processing equipment
CN111709888A (en) Aerial image defogging method based on an improved generative adversarial network
CN113689372A (en) Image processing method, apparatus, storage medium, and program product
WO2023179161A1 (en) Video frame rate control method and apparatus, and electronic device and storage medium
CN111950457A (en) Oilfield Safety Production Image Recognition Method and System
CN112464893A (en) Congestion degree classification method in complex environment
CN108471497A (en) A real-time ship target detection method based on a monopod video camera
CN105930814A (en) Method for detecting abnormal crowd gathering behavior based on a video monitoring platform
CN117953578A (en) Elevator passenger behavior detection method based on depth vision technology
CN104616321B (en) A luggage image motion behavior description method based on the scale-invariant feature transform (SIFT)
CN109479127A (en) Compressing picture segment data using video coding
CN118865351A (en) A road parking license plate recognition method and system based on 5G neural network
CN113724527A (en) Parking space management method
CN113240585A (en) Image processing method and device based on a generative adversarial network, and storage medium
CN111311603B (en) Method and device for outputting number information of target objects
CN118368377A (en) Method for saving storage space of monitoring video
CN111461772A (en) A system and method for integrating video advertisements based on generative adversarial networks
CN117240987A (en) Photo-level frame interpolation algorithm based on AI
CN115278255A (en) Data storage system for safety management of strength instrument

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170908