CN107657226B - People number estimation method based on deep learning - Google Patents
- Publication number
- CN107657226B
- Authority
- CN
- China
- Prior art keywords
- training
- image
- neural network
- network
- people
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a deep-learning-based method for estimating the number of people in an image, in the field of crowd density estimation. The method uses a single-column convolutional neural network built from convolutional layers and pooling layers. The network learns crowd features from a large number of training samples, estimates the crowd density map of an input image, and integrates that density map to obtain an estimate of the number of people in the image. Compared with other current deep learning algorithms, the convolutional neural network adopted by the invention has a simple structure, low complexity, short training time, and higher estimation accuracy.
Description
Technical Field
The invention belongs to the technical field of digital image processing, and particularly relates to crowd density estimation based on deep learning.
Background
With the rapid development of science and technology and rising living standards, demand for artificial intelligence has grown, and the technology is gradually being applied in fields such as intelligent driving, intelligent monitoring, and security. Estimating the number of people from video images has important application value in intelligent monitoring and security: in large public places such as event venues and railway stations, a timely estimate of the number of people helps to evacuate over-dense crowds in time and to prevent safety accidents such as stampedes. The estimate can also be used to trigger abnormality warning signals.
Current people counting algorithms can be summarized in 3 categories:
(1) Methods based on target detection:
A detection model is built from pedestrian target features, such as the head, the whole pedestrian, or the combined head-and-shoulder contour. A detector is trained on these features, targets are detected with a sliding window, and the number of detected targets is taken as the number of people. The detector usually takes the form of features plus a classifier: the features are mainly HOG (histogram of oriented gradients), LBP (local binary pattern) and the like, and the classifier is mainly Adaboost, SVM and the like. The accuracy of detection-based methods depends heavily on the target detector used. They are suitable only for scenes with a simple background, few people, and little or no occlusion between pedestrians, so their practicability and scope of application are limited.
(2) Methods based on density map or people number regression:
These methods estimate the number of people in an image by building a regression model between image features and the number of people, or between image features and a crowd density map. Commonly used features include edge features, texture features, and the like, and commonly used regression functions include Gaussian regression, linear regression, and the like. The methods are mainly used for surveillance video scenes, where foreground segmentation extracts the target area of the video image so that effective features can be extracted. However, the algorithms depend mainly on feature selection, and existing methods based on edge information, texture information, or the fusion of multiple features have poor accuracy. Designing effective features remains the main problem. The methods also depend strongly on the scene and transfer poorly between scenes, that is, they generalize poorly.
(3) Methods based on deep learning:
Deep learning currently shows remarkable advantages in many research areas of computer vision. Although deep learning has only recently been applied to people counting, it already improves markedly on traditional algorithms in accuracy and generalization. These methods use a deep convolutional neural network and train it on a large number of labeled samples to learn crowd features, so that it outputs the number of people in the image. However, most existing deep learning algorithms adopt a multi-column convolutional neural network, which has high complexity, requires many samples, and takes a long time to train.
Disclosure of Invention
The object of the invention is to address the problems above by providing a deep-learning-based people number estimation method that uses a single-column convolutional neural network.
The invention discloses a people number estimation method based on deep learning, which comprises the following steps:
constructing a deep neural network model: a single-column convolutional neural network based on 10 convolutional layers and 2 pooling layers, wherein the convolution kernels of the first 6 convolutional layers are all 5x5, the convolution kernels of the 7th to 9th convolutional layers are all 3x3, and the convolution kernel of the last convolutional layer is 1x1; the 2 pooling layers use max pooling, and each pooling kernel is 2x2;
training the constructed deep neural network model on collected training sample data to obtain a trained deep neural network model, wherein the loss function of the deep neural network model is $L(\Theta) = \frac{1}{2M}\sum_{n=1}^{M} \lVert \hat{F}(X_n;\Theta) - F_n \rVert_2^2$, wherein $\hat{F}(X_n;\Theta)$ is the density map obtained by the forward calculation of the network for input image $X_n$, M is the number of training samples, and the real density map of the input image is $F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G(x)$, wherein $\delta(x - x_i)$ is an impulse function at the position of a head in the image, $x_i$ is the position of the i-th head, N is the total number of heads, $*$ denotes convolution, and G is a Gaussian kernel;
inputting the image to be estimated into the trained deep neural network model to obtain an estimated density map of the image to be estimated, and integrating the estimated density map to obtain the estimated number of people in the image to be estimated.
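The layer configuration in the first step can be written down as a small sketch. The text gives the kernel sizes but not the channel counts, the padding, the pooling stride, or where the 2 pooling layers sit among the convolutions, so the stride of 2 and the size-preserving ("same") padding below are assumptions, and `output_size` is a hypothetical helper. Under those assumptions the output density map is one quarter of the input resolution in each dimension.

```python
# Kernel sizes as stated in the text: six 5x5, three 3x3, one 1x1 convolution.
CONV_KERNELS = [5] * 6 + [3] * 3 + [1]
# Two 2x2 max-pooling layers. Their position among the convolutions is not
# stated in the text, so they are listed separately rather than interleaved.
POOL_KERNELS = [2, 2]
POOL_STRIDE = 2  # assumption: stride equals the pooling kernel size


def output_size(height, width):
    """Spatial size of the output density map.

    Assumes every convolution preserves the spatial size ("same" padding)
    and that the input height and width are divisible by 4.
    """
    for _ in POOL_KERNELS:
        height //= POOL_STRIDE
        width //= POOL_STRIDE
    return height, width


print(len(CONV_KERNELS), len(POOL_KERNELS))
print(output_size(768, 1024))
```

If these assumptions hold, the ground-truth density map used as the label would need the same quarter resolution as the network output.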
In summary, by adopting the above technical scheme, the invention has the following beneficial effects: the method is based on a single-column convolutional neural network whose loss function is constructed from the density map alone. The network structure is simple and effective; it improves estimation accuracy while reducing network complexity, model training time, and the risk of overfitting.
Drawings
FIG. 1: flow diagram of the people number estimation process based on deep learning.
FIG. 2: structure diagram of the people number estimation convolutional neural network.
FIG. 3: comparison of the network structure of the existing people number estimation network MCNN (Multi-Column Convolutional Neural Network) with that of the neural network Crowd-CNN of the invention, wherein 3-a is the existing MCNN network structure and 3-b is the Crowd-CNN network structure of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
The invention discloses a single-column convolutional neural network based on 10 convolutional layers and 2 pooling layers, named Crowd-CNN for short, which simplifies existing deep learning network structures and estimates the number of people in an image. Referring to FIG. 1, the specific implementation steps of the invention are as follows:
Step 1, constructing and training a deep neural network:
Step 1-1, preparing training data: for the Crowd-CNN network structure, this embodiment uses the databases UCSD, Shanghaitech PartA and Shanghaitech PartB, which are commonly used in the field of people counting. The annotation (ground truth) of each sample is the head position information (x, y) in the image sample, that is, the coordinates of the center pixel of each head in the image. A density map is then calculated from the head coordinates and used as the label information of the network, and a tool in the Caffe framework generates LMDB data files (containing training and test sample data) from the sample images and the label information.
Calculating a density map: the density map of a sample is calculated with a Gaussian kernel from the head position information in the training image sample. The density map based on a geometry-adaptive Gaussian kernel is calculated as:
$$F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G(x) \qquad (1)$$
where $\delta(x - x_i)$ is an impulse function at the position of a head in the image, $x_i$ is the head position vector, that is, the head position information (x, y), N is the total number of heads, $*$ denotes convolution, and G is a Gaussian kernel.
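A minimal sketch of the density map calculation follows, in plain Python. It places one Gaussian per annotated head so that each head contributes a total mass of 1. The fixed `sigma`, the truncation `radius`, and the renormalisation of Gaussians clipped at the image border are assumptions made for illustration; the text calls the kernel geometry-adaptive but does not say how its width is chosen per head. Head positions are assumed to be integer pixel coordinates inside the image.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a (2*radius+1) x (2*radius+1) Gaussian kernel that sums to 1."""
    size = 2 * radius + 1
    kernel = [
        [math.exp(-((r - radius) ** 2 + (c - radius) ** 2) / (2.0 * sigma ** 2))
         for c in range(size)]
        for r in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def density_map(height, width, heads, sigma=2.0, radius=6):
    """Sum one unit-mass Gaussian per head position (x, y) = (column, row)."""
    kernel = gaussian_kernel(sigma, radius)
    dmap = [[0.0] * width for _ in range(height)]
    for x, y in heads:
        # Collect the kernel cells that fall inside the image.
        cells = []
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                r, c = y + dr, x + dc
                if 0 <= r < height and 0 <= c < width:
                    cells.append((r, c, kernel[dr + radius][dc + radius]))
        # Renormalise so a head near the border still adds exactly 1 (assumption).
        mass = sum(v for _, _, v in cells)
        for r, c, v in cells:
            dmap[r][c] += v / mass
    return dmap


heads = [(5, 5), (20, 12), (0, 0)]  # hypothetical head annotations
dmap = density_map(24, 32, heads)
count = sum(sum(row) for row in dmap)
print(round(count, 6))
```

Because each head adds unit mass, the sum over the map equals the number of annotated heads, which is what lets the network's output be integrated into a count.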
Step 1-2, constructing the network: the overall structure of the deep learning network of the invention is shown in FIG. 2, and the detailed structure is shown in FIG. 3-b. It has 10 convolutional layers and 2 pooling layers, uses max pooling, and uses the Euclidean distance loss function. The Euclidean distance loss function (Euclidean Loss) is calculated as:
$$L(\Theta) = \frac{1}{2M}\sum_{n=1}^{M} \lVert \hat{F}(X_n;\Theta) - F_n \rVert_2^2 \qquad (2)$$
where $\hat{F}(X_n;\Theta)$ is the density map obtained by the forward calculation of the network, $F_n$ is the real density map F(x) of the input image $X_n$ calculated by the density map formula above, that is, the label information input to the network, and M is the number of training samples.
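The Euclidean loss can be illustrated with a few lines of plain Python. This is a sketch, not the Caffe layer itself: it assumes the 1/(2M) normalisation that Caffe's Euclidean loss layer uses, and the two tiny 2x2 "density maps" are made-up values.

```python
def euclidean_loss(predicted, target):
    """Return 1/(2M) * sum over M maps of the squared pixel-wise difference.

    `predicted` and `target` are lists of M density maps, each a list of rows.
    """
    m = len(predicted)
    total = 0.0
    for f_hat, f in zip(predicted, target):
        for row_hat, row in zip(f_hat, f):
            total += sum((a - b) ** 2 for a, b in zip(row_hat, row))
    return total / (2.0 * m)


# Two hypothetical 2x2 predicted maps and their ground-truth labels.
pred = [[[0.1, 0.2], [0.0, 0.3]], [[0.5, 0.0], [0.0, 0.0]]]
true = [[[0.0, 0.2], [0.0, 0.5]], [[0.5, 0.0], [0.1, 0.0]]]
print(euclidean_loss(pred, true))
```

The loss compares maps pixel by pixel, so it penalises errors in where people are placed as well as errors in the total count.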
Step 1-3, training the network: using the Caffe framework, the training data and test data (LMDB files) generated in step 1-1 and the network file constructed in step 1-2 are loaded into the Caffe training process. The network error is calculated through the forward calculation of the network and the loss function of formula (2); the error is then propagated backward, the error gradient of the weights of each layer is calculated, and the weights are updated so that the network error gradually decreases. This process is repeated, searching for the most effective network parameters, until the network loss reaches its minimum or a value that meets the requirement. Training is then complete and a network model is obtained. The process can be summarized simply as parameter optimization.
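The forward, loss, backward, and update cycle can be sketched on a toy problem. The example below is not the Crowd-CNN training itself: it replaces the network with a hypothetical one-weight model F_hat = w * X, and the learning rate, iteration count, and sample values are illustrative only. It processes one sample per iteration.

```python
import random


def train(samples, lr=0.05, iterations=200, seed=0):
    """Stochastic gradient descent with one sample per step on F_hat = w * X."""
    rng = random.Random(seed)
    w = 0.0
    history = []
    for _ in range(iterations):
        x, f = rng.choice(samples)                 # pick one training sample
        f_hat = [w * xi for xi in x]               # forward calculation
        loss = 0.5 * sum((a - b) ** 2 for a, b in zip(f_hat, f))
        # Backward pass: gradient of the loss with respect to the single weight.
        grad = sum((a - b) * xi for a, b, xi in zip(f_hat, f, x))
        w -= lr * grad                             # weight update
        history.append(loss)
    return w, history


# Hypothetical samples generated with a true weight of 0.5.
samples = [
    ([1.0, 2.0], [0.5, 1.0]),
    ([2.0, 1.0], [1.0, 0.5]),
    ([0.5, 0.5], [0.25, 0.25]),
]
w, history = train(samples)
print(round(w, 4), history[0] > history[-1])
```

In the real setting the same loop runs over every layer's weights, with the gradients supplied by backpropagation.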
Step 2, testing:
The image to be tested is fed into the network structure constructed in step 1, and the network model parameters trained in step 1 are loaded for forward calculation to obtain the estimated density map $\hat{F}$ of the image. Integrating (summing) the density map gives the estimated number of people $\hat{Z}$.
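On a discrete image, integrating the density map amounts to summing its pixel values. A minimal sketch, with a made-up 3x3 map standing in for the network output:

```python
def estimate_count(density_map):
    """Integrate (sum) a density map given as a list of rows."""
    return sum(sum(row) for row in density_map)


# Hypothetical network output for a very small image.
estimated_map = [
    [0.00, 0.10, 0.05],
    [0.20, 0.90, 0.25],
    [0.05, 0.30, 0.15],
]
print(round(estimate_count(estimated_map), 2))
```

The result is generally not an integer; whether and how to round it is not specified in the text.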
In the test experiments, the invention adopts two metrics that are standard in the field of people counting, mean absolute error (MAE) and mean square error (MSE), which measure the accuracy and the stability of the algorithm, respectively:
$$\mathrm{MAE} = \frac{1}{M}\sum_{i=1}^{M} \lvert Z_i - \hat{Z}_i \rvert, \qquad \mathrm{MSE} = \sqrt{\frac{1}{M}\sum_{i=1}^{M} (Z_i - \hat{Z}_i)^2}$$
where M is the number of test samples, $Z_i$ is the actual number of people in test sample i, and $\hat{Z}_i$ is the number of people in test sample i calculated by the network.
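The two metrics can be computed as follows. The sketch assumes the convention common in the crowd counting literature, in which the quantity called MSE is the square root of the mean squared count error; the counts in the example are made up.

```python
import math


def mae(actual, estimated):
    """Mean absolute error between actual and estimated counts."""
    return sum(abs(z - z_hat) for z, z_hat in zip(actual, estimated)) / len(actual)


def mse(actual, estimated):
    """Root of the mean squared count error (called MSE in crowd counting)."""
    mean_sq = sum((z - z_hat) ** 2 for z, z_hat in zip(actual, estimated)) / len(actual)
    return math.sqrt(mean_sq)


actual_counts = [10, 20, 30]      # hypothetical ground-truth counts
estimated_counts = [12, 18, 33]   # hypothetical network estimates
print(mae(actual_counts, estimated_counts), mse(actual_counts, estimated_counts))
```

MAE weights all errors equally, while the squared term makes MSE more sensitive to occasional large misses, which is why it is read as a stability measure.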
The simple-structure network proposed by the invention was compared with the better-performing MCNN network (network structure shown in FIG. 3-a) in experimental tests on the common people counting databases UCSD, Shanghaitech PartA and Shanghaitech PartB. The network structure adopted by the invention is simpler and greatly reduces training time while maintaining accuracy. The results of the experimental comparison are shown in Tables 1, 2 and 3.
TABLE 1 network training iteration number comparison
TABLE 2 MCNN network test results
Database | MSE | MAE |
---|---|---|
Shanghaitech PartA | 173.2 | 110.2 |
Shanghaitech PartB | 41.3 | 26.4 |
UCSD | 1.35 | 1.07 |
TABLE 3 Crowd-CNN network test results
Database | MSE | MAE |
---|---|---|
Shanghaitech PartA | 170.38 | 109.05 |
Shanghaitech PartB | 42.1 | 26.04 |
UCSD | 1.21 | 1.03 |
The comparison shows that people number estimation based on the Crowd-CNN network structure is accurate. Compared with the MCNN network structure, it has a simpler network structure and greatly reduced network parameters and training time, which greatly reduces the amount of training data required and lowers the risk of overfitting. The error is also reduced on every measure except the MSE on Shanghaitech PartB (42.1 versus 41.3 for MCNN).
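The size of the differences between Tables 2 and 3 can be checked with a short calculation. The values below are copied from the two tables; `relative_change` is a hypothetical helper that reports the change from MCNN to Crowd-CNN as a percentage, with negative values meaning a lower error.

```python
# (MSE, MAE) per database, copied from Table 2 (MCNN) and Table 3 (Crowd-CNN).
mcnn = {
    "Shanghaitech PartA": (173.2, 110.2),
    "Shanghaitech PartB": (41.3, 26.4),
    "UCSD": (1.35, 1.07),
}
crowd_cnn = {
    "Shanghaitech PartA": (170.38, 109.05),
    "Shanghaitech PartB": (42.1, 26.04),
    "UCSD": (1.21, 1.03),
}


def relative_change(old, new):
    """Percentage change from old to new; negative means the error went down."""
    return 100.0 * (new - old) / old


changes = {
    name: tuple(relative_change(o, n) for o, n in zip(mcnn[name], crowd_cnn[name]))
    for name in mcnn
}
for name, (d_mse, d_mae) in changes.items():
    print(f"{name}: MSE {d_mse:+.2f}%  MAE {d_mae:+.2f}%")
```

The calculation only restates the tabulated numbers in relative terms; it says nothing about statistical significance.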
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.
Claims (1)
1. A method for estimating the number of people based on deep learning is characterized by comprising the following steps:
constructing a deep neural network model:
a single-column convolutional neural network based on 10 convolutional layers and 2 pooling layers, wherein the convolution kernels of the first 6 convolutional layers are all 5x5, the convolution kernels of the 7th to 9th convolutional layers are all 3x3, and the convolution kernel of the last convolutional layer is 1x1; the 2 pooling layers use max pooling, and each pooling kernel is 2x2;
preparing training data:
adopting the databases UCSD, Shanghaitech PartA and Shanghaitech PartB, which are commonly used in the people counting field, wherein the annotation of each sample is the head position information (x, y) in the image sample, namely the coordinates of the head center pixel in the image; then calculating a density map from the head coordinates as the label information of the network, and generating LMDB data files comprising training data and test data from the sample images and the label information by using a tool in the Caffe framework;
calculating a density map: calculating the density map of a sample with a Gaussian kernel according to the head position information in the training image sample; the density map based on a geometry-adaptive Gaussian kernel is calculated as: $F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G(x)$, wherein $\delta(x - x_i)$ is an impulse function at the position of a head in the image, $x_i$ is the head position vector, namely the head position information (x, y), N is the total number of heads, $*$ denotes convolution, and G is a Gaussian kernel;
training the constructed deep neural network model based on training sample data to obtain a trained deep neural network model:
loading the generated training data and test data and the constructed network file of the deep neural network model into a Caffe training process by using the Caffe framework, calculating the network error through the forward calculation of the network and the loss function $L(\Theta)$, propagating the error backward, calculating the error gradient of the weights of each layer of the network, updating the weights, and gradually reducing the network error value; repeating this process and searching for the most effective network parameters until the network loss is reduced to the minimum or to a value meeting the requirement; wherein the loss function of the deep neural network model is $L(\Theta) = \frac{1}{2M}\sum_{n=1}^{M} \lVert \hat{F}(X_n;\Theta) - F_n \rVert_2^2$, wherein $\hat{F}(X_n;\Theta)$ is the density map obtained by the forward calculation of the network, M is the number of training samples, and $F_n$ is the real density map F(x) of the input image $X_n$ calculated according to the formula $F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G(x)$, namely the label information input to the network;
during training, the batch size (the number of samples selected for one training step) is set to 1, the learning rate base_lr is set to 1e-7, and the deep neural network model is trained for 800,000 iterations;
inputting an image to be estimated into a trained deep neural network model to obtain an estimated density map of the image to be estimated, and integrating the estimated density map to obtain the estimated number of people of the image to be estimated;
testing the trained deep neural network model in a people counting database UCSD, Shanghaitech PartA and Shanghaitech PartB, wherein the average absolute error and mean square error corresponding to each people counting database are specifically as follows:
the average absolute error and the mean square error of the UCSD are respectively as follows: 1.03, 1.21;
the average absolute error and the mean square error of the people counting database Shanghaitech PartA are respectively as follows: 109.05, 170.38;
the average absolute error and the mean square error of the people counting database Shanghaitech PartB are 26.04 and 42.1 respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710862828.1A CN107657226B (en) | 2017-09-22 | 2017-09-22 | People number estimation method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710862828.1A CN107657226B (en) | 2017-09-22 | 2017-09-22 | People number estimation method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107657226A (en) | 2018-02-02 |
CN107657226B (en) | 2020-12-29 |
Family
ID=61130780
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710862828.1A Active CN107657226B (en) | 2017-09-22 | 2017-09-22 | People number estimation method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107657226B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108647637B (en) * | 2018-05-09 | 2020-06-30 | 广州飞宇智能科技有限公司 | Video acquisition and analysis device and method based on crowd identification |
CN108876774A (en) * | 2018-06-07 | 2018-11-23 | 浙江大学 | A kind of people counting method based on convolutional neural networks |
CN109166100A (en) * | 2018-07-24 | 2019-01-08 | 中南大学 | Multi-task learning method for cell count based on convolutional neural networks |
CN109117791A (en) * | 2018-08-14 | 2019-01-01 | 中国电子科技集团公司第三十八研究所 | A kind of crowd density drawing generating method based on expansion convolution |
CN109101930B (en) * | 2018-08-18 | 2020-08-18 | 华中科技大学 | Crowd counting method and system |
CN109191440A (en) * | 2018-08-24 | 2019-01-11 | 上海应用技术大学 | Glass blister detection and method of counting |
CN109359520B (en) * | 2018-09-04 | 2021-12-17 | 汇纳科技股份有限公司 | Crowd counting method, system, computer readable storage medium and server |
CN109447008B (en) * | 2018-11-02 | 2022-02-15 | 中山大学 | Crowd analysis method based on attention mechanism and deformable convolutional neural network |
CN109858388A (en) * | 2019-01-09 | 2019-06-07 | 武汉中联智诚科技有限公司 | A kind of intelligent tourism management system |
CN109934148A (en) * | 2019-03-06 | 2019-06-25 | 华瑞新智科技(北京)有限公司 | A kind of real-time people counting method, device and unmanned plane based on unmanned plane |
CN110598672B (en) * | 2019-09-23 | 2023-07-04 | 天津天地伟业机器人技术有限公司 | Multi-region people counting method based on single camera |
CN110991225A (en) * | 2019-10-22 | 2020-04-10 | 同济大学 | Crowd counting and density estimation method and device based on multi-column convolutional neural network |
CN110879990A (en) * | 2019-11-22 | 2020-03-13 | 成都考拉悠然科技有限公司 | Method for predicting queuing waiting time of security check passenger in airport and application thereof |
CN111178276B (en) * | 2019-12-30 | 2024-04-02 | 上海商汤智能科技有限公司 | Image processing method, image processing apparatus, and computer-readable storage medium |
CN111723693B (en) * | 2020-06-03 | 2022-05-27 | 云南大学 | Crowd counting method based on small sample learning |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8195598B2 (en) * | 2007-11-16 | 2012-06-05 | Agilence, Inc. | Method of and system for hierarchical human/crowd behavior detection |
CN104077613B (en) * | 2014-07-16 | 2017-04-12 | 电子科技大学 | Crowd density estimation method based on cascaded multilevel convolution neural network |
CN104320617B (en) * | 2014-10-20 | 2017-09-01 | 中国科学院自动化研究所 | A kind of round-the-clock video frequency monitoring method based on deep learning |
CN107624189B (en) * | 2015-05-18 | 2020-11-20 | 北京市商汤科技开发有限公司 | Method and apparatus for generating a predictive model |
CN104992223B (en) * | 2015-06-12 | 2018-02-16 | 安徽大学 | Intensive Population size estimation method based on deep learning |
CN105528589B (en) * | 2015-12-31 | 2019-01-01 | 上海科技大学 | Single image crowd's counting algorithm based on multiple row convolutional neural networks |
CN106203331B (en) * | 2016-07-08 | 2019-05-17 | 苏州平江历史街区保护整治有限责任公司 | A kind of crowd density evaluation method based on convolutional neural networks |
CN106326937B (en) * | 2016-08-31 | 2019-08-09 | 郑州金惠计算机系统工程有限公司 | Crowd density distribution estimation method based on convolutional neural networks |
CN106845621B (en) * | 2017-01-18 | 2019-04-30 | 山东大学 | Dense population number method of counting and system based on depth convolutional neural networks |
Also Published As
Publication number | Publication date |
---|---|
CN107657226A (en) | 2018-02-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||