CN107657226B - People number estimation method based on deep learning - Google Patents

Info

Publication number
CN107657226B
Authority
CN
China
Prior art keywords
training
image
neural network
network
people
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710862828.1A
Other languages
Chinese (zh)
Other versions
CN107657226A (en)
Inventor
解梅
秦方
李佩伦
苏星霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201710862828.1A
Publication of CN107657226A
Application granted
Publication of CN107657226B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a people number estimation method based on deep learning and belongs to the field of deep-learning-based crowd density estimation. The method uses a single-column convolutional neural network built from convolutional layers and pooling layers. By training on a large number of samples, the network learns crowd features and estimates the crowd density map of an input image; the density map is then integrated to obtain the estimated number of people in the image. Compared with other current deep learning algorithms, the convolutional neural network adopted by the invention has a simple structure, low complexity, short training time and higher estimation accuracy.

Description

People number estimation method based on deep learning
Technical Field
The invention belongs to the technical field of digital images, and particularly relates to crowd density estimation based on deep learning.
Background
With the rapid development of science and technology and the continuous rise in economic level, people's demands on daily life keep growing. This has driven the rapid development of artificial intelligence, which is gradually being applied in many fields, including intelligent driving, intelligent monitoring and security. Estimating the number of people from video images has important application value in intelligent monitoring and security: in large public places such as large event venues and railway stations, a timely estimate of the number of people from images helps to evacuate over-dense crowds in time and to prevent safety accidents such as stampedes. In addition, the method can also be used to raise abnormality warning signals.
Current people counting algorithms can be grouped into 3 categories:
(1) Methods based on target detection:
A detection model is built from target features of pedestrians. The selected target features include the human head, the whole pedestrian, the combined head-and-shoulder contour and the like. A detector is trained on these features, targets are detected with a sliding window, and the number of detected targets is taken as the number of people. The detector mainly takes the form of features plus a classifier: the features are mainly HOG (histogram of oriented gradients), LBP (local binary pattern) and the like, and the classifier is mainly Adaboost, SVM and the like. The accuracy of detection-based methods depends heavily on the target detection method used. They are only suitable for scenes with a simple background, sparse crowds and little or no occlusion between pedestrians, so their practicality and applicability are low.
(2) Methods based on density map or people number regression:
These methods estimate the number of people in an image by building a regression model between image features and the number of people, or between image features and a crowd density map. Commonly used features include edge features and texture features, and commonly used regression functions include Gaussian regression and linear regression. These methods are mainly used in surveillance video scenes, where foreground segmentation extracts the target region of the video image so that effective features can be extracted. However, they depend mainly on feature selection. Existing methods based on edge information, texture information or the fusion of several kinds of feature information have poor accuracy, and how to design effective features remains their main problem. They also depend strongly on the scene and transfer poorly between different scenes, that is, they generalize poorly.
(3) Methods based on deep learning:
Deep learning currently shows clear superiority in many research areas of computer vision. Although deep learning has only recently been applied to people counting, it brings a marked improvement in accuracy and generalization over traditional algorithms. These methods use a deep convolutional neural network that is trained on a large number of labeled samples to learn crowd features and thereby output the number of people in the image. However, most existing deep learning algorithms adopt a multi-column convolutional neural network, which suffers from high complexity, a large sample requirement and long training time.
Disclosure of Invention
The object of the invention is: in view of the above problems, to provide a people number estimation method based on deep learning with a single-column convolutional neural network.
The people number estimation method based on deep learning of the invention comprises the following steps:
constructing a deep neural network model: a single-column convolutional neural network based on 10 convolutional layers and 2 pooling layers, wherein the convolution kernels of the first 6 convolutional layers are all of size 5x5, the convolution kernels of the 7th to 9th convolutional layers are all of size 3x3, and the convolution kernel of the last convolutional layer is of size 1x1; both pooling layers use max pooling, and each pooling kernel is of size 2x2;
training the constructed deep neural network model on collected training sample data to obtain a trained deep neural network model, wherein the loss function of the deep neural network model is
$$L(\theta) = \frac{1}{2M} \sum_{n=1}^{M} \left\| \hat{F}(X_n; \theta) - F_n \right\|_2^2$$
where $\hat{F}(X_n; \theta)$ is the density map obtained by the forward calculation of the network for input image $X_n$, M is the number of training samples, and $F_n$ is the real density map of input image $X_n$, calculated as
$$F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G(x)$$
where $\delta(x - x_i)$ is an impulse function at the position of a head in the image, $x_i$ is the position of the i-th head, N is the total number of heads, and G is a Gaussian kernel;
inputting the image to be estimated into the trained deep neural network model to obtain an estimated density map of the image to be estimated, and integrating the estimated density map to obtain the estimated number of people of the image to be estimated.
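The layer configuration described above can be written down as a short sketch. The kernel sizes and layer counts come from the text; the positions of the two pooling layers, the stride-2 pooling and the size-preserving convolutions are assumptions made only for illustration, since the text does not state them.

```python
# Layer list for the single-column network described in the text:
# 10 convolutional layers (six 5x5, three 3x3, one 1x1) and two 2x2 max-pooling layers.
# ASSUMPTIONS (not stated in the text): where the pooling layers sit, stride-2
# pooling, and size-preserving ("same") convolutions.
LAYERS = (
    [("conv", 5)] * 2 + [("pool", 2)]      # assumed pooling position
    + [("conv", 5)] * 2 + [("pool", 2)]    # assumed pooling position
    + [("conv", 5)] * 2
    + [("conv", 3)] * 3
    + [("conv", 1)]
)


def output_size(height, width, layers=LAYERS):
    """Spatial size of the output density map under the assumptions above."""
    for kind, k in layers:
        if kind == "pool":
            height, width = height // k, width // k
    return height, width


def receptive_field(layers=LAYERS):
    """Receptive field (in input pixels) of one output pixel."""
    rf, jump = 1, 1
    for kind, k in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k - 1) steps
        if kind == "pool":
            jump *= k          # stride-k pooling multiplies the step size
    return rf


n_conv = sum(1 for kind, _ in LAYERS if kind == "conv")
n_pool = sum(1 for kind, _ in LAYERS if kind == "pool")
print(n_conv, n_pool, output_size(768, 1024), receptive_field())
```

Whatever the actual pooling positions are, two 2x2 stride-2 pooling layers would make the output density map one quarter of the input size in each dimension, so the label density maps would need to match that resolution.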
In summary, by adopting the above technical scheme, the invention has the following beneficial effects: the method is based on a single-column convolutional neural network, and the loss function is built from the density map alone. The network structure is simple and effective; it improves the estimation accuracy while reducing the network complexity, the model training time and the risk of network overfitting.
Drawings
FIG. 1: flow diagram of the people number estimation processing based on deep learning.
FIG. 2: structure diagram of the people number estimation convolutional neural network.
FIG. 3: comparison of the network structure of the existing people number estimation network MCNN (Multi-Column Convolutional Neural Network) with that of the neural network Crowd-CNN of the invention, wherein 3-a is the existing MCNN network structure and 3-b is the Crowd-CNN network structure of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
The invention discloses a single-column convolutional neural network based on 10 convolutional layers and 2 pooling layers, called Crowd-CNN for short, which simplifies existing deep learning network structures and estimates the number of people in an image. Referring to FIG. 1, the specific implementation steps of the invention are as follows:
Step 1, constructing a deep neural network and training it:
Step 1-1, preparing training data: for the Crowd-CNN network structure, this embodiment uses the databases UCSD, Shanghaitech PartA and Shanghaitech PartB, which are commonly used in the field of people counting. The annotation information (ground truth) of each sample is the head position information (x, y) in the image sample, namely the coordinates of the center pixel of each head in the image. A density map is then calculated from the head coordinates as the label information of the network, and LMDB data files (comprising training and test sample data) are generated from the sample images and the label information using a tool of the Caffe framework.
Calculating a density map: a Gaussian-kernel-based density map of the sample is calculated from the head position information in the training image sample. The density map based on a geometry-adaptive Gaussian kernel is calculated as:
$$F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G(x) \qquad (1)$$
where $\delta(x - x_i)$ is an impulse function at the position of a head in the image, $x_i$ is the head position vector, i.e. the head position information (x, y), N is the total number of heads, and G is a Gaussian kernel.
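As an illustration of the density-map calculation, the following is a minimal NumPy sketch that places one normalized Gaussian kernel at each annotated head position. It is a simplification under stated assumptions: a fixed kernel width `sigma` and window `size` are used for every head (both values are hypothetical), whereas the text refers to a geometry-adaptive kernel whose details are not given here.

```python
import numpy as np


def gaussian_kernel(size, sigma):
    """Normalized 2-D Gaussian kernel G with odd side length `size` (sums to 1)."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def density_map(shape, heads, sigma=4.0, size=15):
    """F(x) = sum_i delta(x - x_i) * G(x) for head centres given as (x, y) pixels.

    ASSUMPTION: one fixed sigma for all heads (not geometry-adaptive).
    """
    h, w = shape
    r = size // 2
    kernel = gaussian_kernel(size, sigma)
    # Pad so that kernels near the border can be added without index checks.
    padded = np.zeros((h + 2 * r, w + 2 * r))
    for x, y in heads:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < h and 0 <= col < w:
            padded[row:row + size, col:col + size] += kernel
    # Crop back to the image size; mass falling outside the image is discarded.
    return padded[r:-r, r:-r]


heads = [(20, 30), (40, 10)]           # hypothetical (x, y) head annotations
label = density_map((64, 64), heads)
print(label.shape, label.sum())
```

Because each kernel sums to 1, the density map sums to the number of annotated heads as long as no kernel is clipped by the image border.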
Step 1-2, constructing a network: the overall structure of the deep learning network of the invention is shown in FIG. 2, and the detailed structure is shown in FIG. 3-b. It has 10 convolutional layers and 2 pooling layers, uses max pooling, and uses a Euclidean distance loss function. The Euclidean distance loss function (Euclidean Loss) is calculated as:
$$L(\theta) = \frac{1}{2M} \sum_{n=1}^{M} \left\| \hat{F}(X_n; \theta) - F_n \right\|_2^2 \qquad (2)$$
where $\hat{F}(X_n; \theta)$ is the density map obtained by the forward calculation of the network, $F_n$ is the real density map F(x) of input image $X_n$ calculated with the density-map formula of step 1-1, i.e. the label information input to the network, and M is the number of training samples.
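A Euclidean distance loss over a batch of density maps can be sketched in a few lines of NumPy. The 1/(2M) normalization used here is the one applied by Caffe's Euclidean loss layer; treat it as an assumption about the exact form of the loss, and note that the function name is illustrative only.

```python
import numpy as np


def euclidean_loss(pred, target):
    """L = 1/(2M) * sum_n ||pred_n - target_n||^2 over a batch of M density maps.

    `pred` and `target` have shape (M, H, W).
    ASSUMPTION: 1/(2M) normalization, as in Caffe's Euclidean loss layer.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    m = pred.shape[0]
    diff = (pred - target).reshape(m, -1)   # flatten each map to a vector
    return float((diff ** 2).sum() / (2.0 * m))


pred = np.zeros((2, 2, 2))    # two predicted 2x2 density maps
target = np.ones((2, 2, 2))   # two ground-truth 2x2 density maps
print(euclidean_loss(pred, target))
```

The loss compares the maps pixel by pixel, so the predicted and ground-truth density maps must have the same resolution.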
Step 1-3, training the network: using the Caffe framework, the training data and test data (LMDB files) generated in step 1-1 and the network file constructed in step 1-2 are loaded into the Caffe training process. The network error is calculated through the forward calculation of the network and the loss function of formula (2); the error is then back-propagated, the error gradient of the weights of each layer of the network is calculated, and the weights are updated so that the network error gradually decreases. This process is repeated to search for the most effective network parameters until the network loss reaches its minimum or a value that meets the requirement. The training process is then finished and a network model is obtained. In short, the process is one of parameter optimization.
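The forward / loss / back-propagation / update cycle described in this step can be illustrated with a deliberately tiny model. The sketch below is not the Caffe training procedure: it fits a hypothetical two-parameter model (`density = w * feature + b`) with per-sample gradient descent, and the learning rate and epoch count are chosen for the toy problem only.

```python
import numpy as np


def train_toy(features, targets, lr=0.5, epochs=200):
    """Fit density = w * feature + b with per-sample (batch size 1) gradient descent.

    ASSUMPTION: a toy two-parameter model standing in for the real network.
    """
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, f in zip(features, targets):
            pred = w * x + b                  # forward calculation
            err = pred - f
            total += 0.5 * np.mean(err ** 2)  # Euclidean-style loss for one sample
            grad_w = np.mean(err * x)         # back-propagation: dL/dw
            grad_b = np.mean(err)             # back-propagation: dL/db
            w -= lr * grad_w                  # weight update
            b -= lr * grad_b
        history.append(total / len(features))
    return w, b, history


rng = np.random.default_rng(0)
features = [rng.random((8, 8)) for _ in range(4)]   # hypothetical inputs
targets = [0.5 * x + 0.1 for x in features]         # hypothetical label maps
w, b, history = train_toy(features, targets)
print(w, b, history[0], history[-1])
```

The recorded loss history plays the role of the network error that the text describes as gradually decreasing until it is small enough to stop training.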
Step 2, testing:
The image to be tested is fed into the network structure constructed in step 1, and the network model parameters trained in step 1 are loaded for forward calculation to obtain the estimated density map of the image, $\hat{F}(x)$.
Integrating the density map over all pixel positions x gives the estimated number of people:
$$\hat{Z} = \sum_{x} \hat{F}(x)$$
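On a discrete image, integrating the density map amounts to summing its pixel values. A minimal sketch, with a hypothetical function name and a hand-made density map standing in for a network output:

```python
import numpy as np


def estimate_count(density):
    """Integrate (sum) a density map to get the estimated number of people."""
    return float(np.sum(density))


# Hypothetical 4x4 density map: one person spread over four pixels,
# plus half of a person who is partly visible at the image border.
density = np.zeros((4, 4))
density[1:3, 1:3] = 0.25
density[0, 3] = 0.5
print(estimate_count(density))
```

The result is generally not an integer; whether to round it is an application decision that the text does not specify.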
In the test experiments the invention uses the two evaluation metrics that are standard in the field of people counting, the mean absolute error (MAE) and the mean square error (MSE), which measure the accuracy and the stability of the algorithm respectively.
Mean absolute error (MAE) definition:
$$\mathrm{MAE} = \frac{1}{M} \sum_{i=1}^{M} \left| Z_i - \hat{Z}_i \right|$$
Mean square error (MSE) definition:
$$\mathrm{MSE} = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \left( Z_i - \hat{Z}_i \right)^2}$$
where M is the number of test samples, $Z_i$ is the actual number of people in test sample i, and $\hat{Z}_i$ is the number of people in test sample i calculated by the network.
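The two metrics can be computed as follows. This sketch assumes that MSE is the square root of the mean squared count error, which is the convention customarily used in crowd-counting evaluations; the sample counts are made up for illustration.

```python
import numpy as np


def mae(actual, estimated):
    """Mean absolute error between actual and estimated people counts."""
    a = np.asarray(actual, dtype=np.float64)
    e = np.asarray(estimated, dtype=np.float64)
    return float(np.mean(np.abs(a - e)))


def mse(actual, estimated):
    """'MSE' as used in crowd counting: root of the mean squared count error.

    ASSUMPTION: the square-root form of the metric.
    """
    a = np.asarray(actual, dtype=np.float64)
    e = np.asarray(estimated, dtype=np.float64)
    return float(np.sqrt(np.mean((a - e) ** 2)))


actual = [10, 20, 30]      # hypothetical ground-truth counts
estimated = [12, 18, 33]   # hypothetical network estimates
print(mae(actual, estimated), mse(actual, estimated))
```

Because MSE squares the per-image errors before averaging, a few badly estimated images raise it more than they raise MAE, which is why it is read as a measure of stability.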
The simple-structure network provided by the invention was compared with the well-performing MCNN network (network structure shown in FIG. 3-a) in experiments on the common people counting databases UCSD, Shanghaitech PartA and Shanghaitech PartB. The network structure adopted by the invention is simple and greatly reduces the training time while maintaining accuracy. The results of the experimental comparison are shown in Tables 1, 2 and 3.
TABLE 1 Comparison of network training iteration counts
[Table 1 is an image in the original publication and is not reproduced here.]
TABLE 2 MCNN network test results
Database              MSE       MAE
Shanghaitech PartA    173.2     110.2
Shanghaitech PartB    41.3      26.4
UCSD                  1.35      1.07
TABLE 3 Crowd-CNN network test results
Database              MSE       MAE
Shanghaitech PartA    170.38    109.05
Shanghaitech PartB    42.1      26.04
UCSD                  1.21      1.03
The comparison shows that people number estimation based on the Crowd-CNN network structure is accurate. Compared with the MCNN network structure, Crowd-CNN has a simpler network structure, far fewer network parameters and a much shorter training time, which in turn greatly reduces the amount of training data required and the risk of network overfitting. At the same time, the error is also reduced on five of the six measurements in Tables 2 and 3, the exception being the MSE on Shanghaitech PartB.
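The comparison between Tables 2 and 3 can be checked with a few lines of arithmetic. The values below are copied from the two tables; the dictionary layout and function name are illustrative only.

```python
# Test results copied from Table 2 (MCNN) and Table 3 (Crowd-CNN).
MCNN = {
    "Shanghaitech PartA": {"MSE": 173.2, "MAE": 110.2},
    "Shanghaitech PartB": {"MSE": 41.3, "MAE": 26.4},
    "UCSD": {"MSE": 1.35, "MAE": 1.07},
}
CROWD_CNN = {
    "Shanghaitech PartA": {"MSE": 170.38, "MAE": 109.05},
    "Shanghaitech PartB": {"MSE": 42.1, "MAE": 26.04},
    "UCSD": {"MSE": 1.21, "MAE": 1.03},
}


def differences(baseline, proposed):
    """Per-database, per-metric difference (proposed - baseline); negative is better."""
    return {
        db: {m: round(proposed[db][m] - baseline[db][m], 2) for m in baseline[db]}
        for db in baseline
    }


diffs = differences(MCNN, CROWD_CNN)
improved = sum(1 for db in diffs for m in diffs[db] if diffs[db][m] < 0)
for db, row in diffs.items():
    print(db, row)
print("metrics improved:", improved, "of 6")
```

A negative difference means Crowd-CNN has the lower error for that database and metric.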
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.

Claims (1)

1. A method for estimating the number of people based on deep learning, characterized by comprising the following steps:
constructing a deep neural network model:
a single-column convolutional neural network based on 10 convolutional layers and 2 pooling layers, wherein the convolution kernels of the first 6 convolutional layers are all of size 5x5, the convolution kernels of the 7th to 9th convolutional layers are all of size 3x3, and the convolution kernel of the last convolutional layer is of size 1x1; both pooling layers use max pooling, and each pooling kernel is of size 2x2;
preparing training data:
using the databases UCSD, Shanghaitech PartA and Shanghaitech PartB, which are commonly used in the people counting field, wherein the annotation information of each sample is the head position information (x, y) in the image sample, namely the coordinates of the head center pixel in the image; then calculating a density map from the head coordinates as the label information of the network, and generating LMDB data files, comprising training data and test data, from the sample images and the label information using a tool of the Caffe framework;
calculating a density map: calculating a Gaussian-kernel-based density map of the sample from the head position information in the training image sample; the density map based on a geometry-adaptive Gaussian kernel is calculated as:
$$F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G(x)$$
wherein $\delta(x - x_i)$ is an impulse function at the position of a head in the image, $x_i$ is the head position vector, namely the head position information (x, y), N is the total number of heads, and G is a Gaussian kernel;
training the constructed deep neural network model based on training sample data to obtain a trained deep neural network model:
using the Caffe framework, loading the generated training data and test data and the network file of the constructed deep neural network model into the Caffe training process, calculating the network error through the forward calculation of the network and the loss function $L(\theta)$, back-propagating the error, calculating the error gradient of the weights of each layer of the network, and updating the weights so that the network error gradually decreases; repeating this process to search for the most effective network parameters until the network loss reaches its minimum or a value that meets the requirement; wherein the loss function of the deep neural network model is
$$L(\theta) = \frac{1}{2M} \sum_{n=1}^{M} \left\| \hat{F}(X_n; \theta) - F_n \right\|_2^2$$
wherein $\hat{F}(X_n; \theta)$ is the density map obtained by the forward calculation of the network, M is the number of training samples, and $F_n$ is the real density map F(x) of input image $X_n$, namely the label information input to the network, calculated according to the formula
$$F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G(x)$$
during training, the batch size (the number of samples used in one training iteration) is set to 1, the learning rate base_lr is set to 1e-7, and the deep neural network model is trained for 800,000 iterations;
inputting an image to be estimated into a trained deep neural network model to obtain an estimated density map of the image to be estimated, and integrating the estimated density map to obtain the estimated number of people of the image to be estimated;
testing the trained deep neural network model on the people counting databases UCSD, Shanghaitech PartA and Shanghaitech PartB, wherein the mean absolute error and the mean square error for each people counting database are as follows:
the mean absolute error and the mean square error on the people counting database UCSD are 1.03 and 1.21 respectively;
the mean absolute error and the mean square error on the people counting database Shanghaitech PartA are 109.05 and 170.38 respectively;
the mean absolute error and the mean square error on the people counting database Shanghaitech PartB are 26.04 and 42.1 respectively.
CN201710862828.1A 2017-09-22 2017-09-22 People number estimation method based on deep learning Active CN107657226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710862828.1A CN107657226B (en) 2017-09-22 2017-09-22 People number estimation method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710862828.1A CN107657226B (en) 2017-09-22 2017-09-22 People number estimation method based on deep learning

Publications (2)

Publication Number Publication Date
CN107657226A (en) 2018-02-02
CN107657226B (en) 2020-12-29

Family

ID=61130780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710862828.1A Active CN107657226B (en) 2017-09-22 2017-09-22 People number estimation method based on deep learning

Country Status (1)

Country Link
CN (1) CN107657226B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647637B (en) * 2018-05-09 2020-06-30 广州飞宇智能科技有限公司 Video acquisition and analysis device and method based on crowd identification
CN108876774A (en) * 2018-06-07 2018-11-23 浙江大学 A kind of people counting method based on convolutional neural networks
CN109166100A (en) * 2018-07-24 2019-01-08 中南大学 Multi-task learning method for cell count based on convolutional neural networks
CN109117791A (en) * 2018-08-14 2019-01-01 中国电子科技集团公司第三十八研究所 A kind of crowd density drawing generating method based on expansion convolution
CN109101930B (en) * 2018-08-18 2020-08-18 华中科技大学 Crowd counting method and system
CN109191440A (en) * 2018-08-24 2019-01-11 上海应用技术大学 Glass blister detection and method of counting
CN109359520B (en) * 2018-09-04 2021-12-17 汇纳科技股份有限公司 Crowd counting method, system, computer readable storage medium and server
CN109447008B (en) * 2018-11-02 2022-02-15 中山大学 Crowd analysis method based on attention mechanism and deformable convolutional neural network
CN109858388A (en) * 2019-01-09 2019-06-07 武汉中联智诚科技有限公司 A kind of intelligent tourism management system
CN109934148A (en) * 2019-03-06 2019-06-25 华瑞新智科技(北京)有限公司 A kind of real-time people counting method, device and unmanned plane based on unmanned plane
CN110598672B (en) * 2019-09-23 2023-07-04 天津天地伟业机器人技术有限公司 Multi-region people counting method based on single camera
CN110991225A (en) * 2019-10-22 2020-04-10 同济大学 Crowd counting and density estimation method and device based on multi-column convolutional neural network
CN110879990A (en) * 2019-11-22 2020-03-13 成都考拉悠然科技有限公司 Method for predicting queuing waiting time of security check passenger in airport and application thereof
CN111178276B (en) * 2019-12-30 2024-04-02 上海商汤智能科技有限公司 Image processing method, image processing apparatus, and computer-readable storage medium
CN111723693B (en) * 2020-06-03 2022-05-27 云南大学 Crowd counting method based on small sample learning

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8195598B2 (en) * 2007-11-16 2012-06-05 Agilence, Inc. Method of and system for hierarchical human/crowd behavior detection
CN104077613B (en) * 2014-07-16 2017-04-12 电子科技大学 Crowd density estimation method based on cascaded multilevel convolution neural network
CN104320617B (en) * 2014-10-20 2017-09-01 中国科学院自动化研究所 A kind of round-the-clock video frequency monitoring method based on deep learning
CN107624189B (en) * 2015-05-18 2020-11-20 北京市商汤科技开发有限公司 Method and apparatus for generating a predictive model
CN104992223B (en) * 2015-06-12 2018-02-16 安徽大学 Intensive Population size estimation method based on deep learning
CN105528589B (en) * 2015-12-31 2019-01-01 上海科技大学 Single image crowd's counting algorithm based on multiple row convolutional neural networks
CN106203331B (en) * 2016-07-08 2019-05-17 苏州平江历史街区保护整治有限责任公司 A kind of crowd density evaluation method based on convolutional neural networks
CN106326937B (en) * 2016-08-31 2019-08-09 郑州金惠计算机系统工程有限公司 Crowd density distribution estimation method based on convolutional neural networks
CN106845621B (en) * 2017-01-18 2019-04-30 山东大学 Dense population number method of counting and system based on depth convolutional neural networks

Also Published As

Publication number Publication date
CN107657226A (en) 2018-02-02

Similar Documents

Publication Publication Date Title
CN107657226B (en) People number estimation method based on deep learning
CN106096561B (en) Infrared pedestrian detection method based on image block deep learning features
CN107016357B (en) Video pedestrian detection method based on time domain convolutional neural network
CN111611874B (en) Face mask wearing detection method based on ResNet and Canny
Li et al. Adaptive deep convolutional neural networks for scene-specific object detection
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
CN103824070B (en) A kind of rapid pedestrian detection method based on computer vision
CN107563349A (en) A kind of Population size estimation method based on VGGNet
CN107633226B (en) Human body motion tracking feature processing method
CN111191667B (en) Crowd counting method based on multiscale generation countermeasure network
CN112767485B (en) Point cloud map creation and scene identification method based on static semantic information
CN105528794A (en) Moving object detection method based on Gaussian mixture model and superpixel segmentation
CN109034035A (en) Pedestrian's recognition methods again based on conspicuousness detection and Fusion Features
CN105678231A (en) Pedestrian image detection method based on sparse coding and neural network
CN103049751A (en) Improved weighting region matching high-altitude video pedestrian recognizing method
CN101976504B (en) Multi-vehicle video tracking method based on color space information
CN105701448B (en) Three-dimensional face point cloud nose detection method and the data processing equipment for applying it
CN104504395A (en) Method and system for achieving classification of pedestrians and vehicles based on neural network
CN105022982A (en) Hand motion identifying method and apparatus
CN104077605A (en) Pedestrian search and recognition method based on color topological structure
CN104091157A (en) Pedestrian detection method based on feature fusion
CN106023257A (en) Target tracking method based on rotor UAV platform
CN103927511A (en) Image identification method based on difference feature description
CN107767416B (en) Method for identifying pedestrian orientation in low-resolution image
CN111079518B (en) Ground-falling abnormal behavior identification method based on law enforcement and case handling area scene

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant