CN106815563B - Human body apparent structure-based crowd quantity prediction method - Google Patents
- Publication number
- CN106815563B
- Authority
- CN
- China
- Prior art keywords
- scene
- pedestrian
- image
- crowd
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
Abstract
The invention discloses a crowd quantity prediction method based on a human body apparent structure, which is used for predicting the crowd quantity in a given scene image. The method specifically comprises the following steps: acquiring a monitoring image data set used for training a crowd quantity prediction model, and defining an algorithm target; modeling an apparent semantic structure of a pedestrian body in the monitoring image data set, and performing combined modeling on density distribution and body shape of the pedestrian; establishing a prediction model of the crowd quantity according to the modeling result in the step S2; and predicting the number of people in the scene image by using the prediction model. The method is suitable for predicting the number of people in a real video monitoring scene, and has better effect and robustness in the face of various complex conditions.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a crowd quantity prediction method based on a human body apparent structure.
Background
Since the end of the 20th century, with the development of computer vision, intelligent video surveillance technology has attracted widespread attention and research. Crowd counting is one of its important and challenging tasks; the goal is to accurately predict the number of pedestrians in images of high-density crowds. The three key factors of the crowd counting task are the pedestrian's body, the head, and their contextual structure. When humans count a crowd, they use the semantic structure of different body parts as cues to judge each person's position accurately. Accurately predicting the crowd quantity therefore requires analyzing the semantic structure of the pedestrian's body.
Existing crowd counting methods generally fall into three categories: (1) crowd counting based on pedestrian detectors, which uses various pedestrian detectors to match each pedestrian in the image; (2) crowd counting based on global regression, which mainly models the mapping between the crowd image and the crowd quantity; (3) crowd counting based on density estimation, which models the density distribution of the crowd and predicts the crowd quantity from that distribution. Existing methods model the pedestrian's body as a whole, or only the pedestrian's head. They ignore the rich semantic structure information of the pedestrian's body parts, which can be used to improve the performance of crowd counting algorithms.
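The density-estimation category can be illustrated with a minimal sketch: if every annotated person contributes a kernel that sums to 1, the crowd quantity is the sum of the density map. The function names, the isotropic Gaussian kernel, and the value of `sigma` below are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def point_density(shape, heads, sigma=2.0):
    """Build a density map with one normalized 2-D Gaussian per head position.

    shape: (height, width) of the map; heads: list of (row, col) positions.
    sigma is a hypothetical fixed kernel width chosen for illustration.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=np.float64)
    for cy, cx in heads:
        g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
        density += g / g.sum()  # each person contributes a total mass of 1
    return density

def count_from_density(density_map):
    """The predicted crowd quantity is the discrete integral of the density."""
    return float(density_map.sum())

heads = [(10, 12), (30, 40), (25, 8)]
density = point_density((48, 64), heads)
estimated = count_from_density(density)
```

Normalizing each kernel over the image, rather than using the analytic Gaussian constant, keeps the sum exact even when a person stands near the image border.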
Disclosure of Invention
To solve the above problems, the present invention provides a crowd quantity prediction method based on the human body apparent structure, for predicting the number of people in a given scene image. Based on a deep neural network, the method semantically models the apparent body structure and the density distribution of pedestrians, predicts the crowd quantity from the modeling result, and adapts better to the complex conditions found in real video monitoring scenes.
In order to achieve the purpose, the technical scheme of the invention is as follows:
A crowd quantity prediction method based on the human body apparent structure comprises the following steps:
S1, acquiring a monitoring image data set for training a crowd quantity prediction model, and defining the algorithm target;
S2, modeling the apparent semantic structure of the pedestrian body in the monitoring image data set, and jointly modeling the density distribution and the body shape of the pedestrians;
S3, establishing a crowd quantity prediction model according to the modeling result of step S2;
S4, predicting the crowd quantity in the scene image using the prediction model.
Further, in step S1, the monitoring image data set for training the crowd quantity prediction model includes scene images $X_{train}$, manually labeled pedestrian head positions $P_{train}$, and scene depth maps.
Further, in step S2, the modeling the apparent semantic structure of the pedestrian body specifically includes:
S21, according to the head positions $P_{train}$ of all pedestrians in the monitoring image data set and their respective scene depth values, determining the position and size of each pedestrian image bounding box, and cropping the pedestrian images $I_{train}$ from the scene images;
S22, inputting each pedestrian image in $I_{train}$ into a single-pedestrian semantic segmentation system for semantic segmentation;
S23, for each scene image $X$, restoring the segmentation results of all pedestrians to their original sizes and positions to obtain the crowd semantic structure map $B$ of the scene image, which reflects the semantic structure information of the body parts of all pedestrians in $X$.
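The restoration in step S23 can be sketched as pasting each pedestrian's part-label mask back into a scene-sized map. This is a minimal sketch under stated assumptions: the box format `(top, left, height, width)`, the integer label convention (0 for background), and the rule that later pedestrians overwrite earlier ones where they overlap are all hypothetical, and each mask is assumed to have already been resized back to its bounding-box size.

```python
import numpy as np

def restore_segmentations(scene_shape, boxes, part_masks):
    """Paste per-pedestrian part-label masks back at their original positions.

    scene_shape: (H, W) of the scene image.
    boxes: list of (top, left, height, width) in scene coordinates (assumed format).
    part_masks: list of integer arrays of shape (height, width);
                0 = background, k > 0 = a body-part label (assumed convention).
    """
    structure_map = np.zeros(scene_shape, dtype=np.int32)
    scene_h, scene_w = scene_shape
    for (top, left, box_h, box_w), mask in zip(boxes, part_masks):
        # Clip the box to the scene so partially visible pedestrians still fit.
        t0, l0 = max(top, 0), max(left, 0)
        t1, l1 = min(top + box_h, scene_h), min(left + box_w, scene_w)
        if t0 >= t1 or l0 >= l1:
            continue  # box lies entirely outside the scene
        crop = mask[t0 - top:t1 - top, l0 - left:l1 - left]
        region = structure_map[t0:t1, l0:l1]  # view into the output map
        # Write only labelled pixels so background does not erase neighbours.
        region[crop > 0] = crop[crop > 0]
    return structure_map

boxes = [(2, 3, 4, 2), (-1, -1, 3, 3)]
masks = [np.ones((4, 2), dtype=np.int32), np.full((3, 3), 2, dtype=np.int32)]
scene_map = restore_segmentations((6, 6), boxes, masks)
```

Writing only the non-background pixels is a design choice for crowded scenes, where bounding boxes of neighbouring pedestrians frequently overlap.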
Further, in step S2, the joint modeling of the density distribution and the body shape of the pedestrian specifically includes:
S24, for scene image $X$, jointly modeling the density distribution and the body shape of the pedestrians to obtain a structured crowd density map $D$ according to formula (1),
wherein $p$ is a pixel position on $D$; one two-dimensional Gaussian kernel approximates the shape of a human head and a second two-dimensional Gaussian kernel approximates the shape of a human body; $P_h$ and $P_b$ are the center positions of the head and body of the $i$-th pedestrian, respectively, where $P_h$ is taken from $P_{train}$ and $P_b$ is estimated from $P_h$ and the scene depth value; $\sigma_h$ and $\sigma_b$ are the respective parameters of the head and body kernels, each estimated from the scene depth value at the corresponding center position; $B_m$ is obtained by binarizing the crowd semantic structure map $B$; $C$ is the number of pedestrians in the scene; and $Z$ is a normalization factor such that the density of each pedestrian on $D$ sums to 1. The structured crowd density map $D$ reflects the density distribution and body shape information of all pedestrians in scene image $X$.
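The construction of the structured crowd density map can be sketched as follows. The exact formula is not preserved in this text, so the combination used here (head kernel plus body kernel gated by the binarized structure map, divided by a per-pedestrian normalizer) is an assumption that merely matches the definitions above; the isotropic kernels and all function and parameter names are hypothetical.

```python
import numpy as np

def gaussian_2d(shape, center, sigma):
    """Unnormalized isotropic 2-D Gaussian evaluated on a pixel grid."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    cy, cx = center
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def structured_density(shape, heads, bodies, sigmas_h, sigmas_b, body_mask):
    """Sum one normalized head-plus-body blob per pedestrian.

    heads, bodies: (row, col) centers of each pedestrian's head and body.
    sigmas_h, sigmas_b: per-pedestrian kernel widths (in the text these are
        estimated from scene depth; here they are simply given).
    body_mask: binarized crowd semantic structure map, values in {0, 1}.
    Assumed form: (head kernel + body kernel * mask) / Z for each pedestrian.
    """
    density = np.zeros(shape, dtype=np.float64)
    for p_h, p_b, s_h, s_b in zip(heads, bodies, sigmas_h, sigmas_b):
        person = gaussian_2d(shape, p_h, s_h) + gaussian_2d(shape, p_b, s_b) * body_mask
        z = person.sum()  # normalizer so this pedestrian's density sums to 1
        if z > 0:
            density += person / z
    return density

mask = np.zeros((40, 30))
mask[10:35, 5:25] = 1.0
d_map = structured_density(
    (40, 30),
    heads=[(8, 10), (12, 20)],
    bodies=[(20, 10), (24, 20)],
    sigmas_h=[1.5, 1.5],
    sigmas_b=[5.0, 5.0],
    body_mask=mask,
)
```

Because every pedestrian is normalized separately, the sum of the map equals the number of pedestrians regardless of how much of each body the mask keeps.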
Further, in step S3, the establishing a prediction model of the population specifically includes:
S31, establishing a deep convolutional neural network whose input is a scene image $X$ and whose outputs are, for that image, a prediction of the crowd semantic structure map $B$, a prediction of the structured crowd density map $D$, and a prediction of the pedestrian count $C$; the structure of the neural network can thus be expressed as a mapping from $X$ to these three predictions,
wherein the predicted crowd semantic structure map is one of the outputs of the neural network and is indexed by pixel position $(h, w)$ and channel $i$, $B$ is generated by the method described in step S23, and $B(h, w)$ denotes the value of $B$ at pixel position $(h, w)$;
S35, the loss function of the whole neural network is
$L = L_c + \lambda_d L_d + \lambda_b L_b$    formula (5)
The whole neural network is trained under the loss function $L$ using stochastic gradient descent and the back-propagation algorithm.
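Formula (5) combines three task losses with weights $\lambda_d$ and $\lambda_b$. The individual loss definitions are not preserved in this text, so the sketch below assumes common choices: squared error for the count term $L_c$, mean squared error for the density term $L_d$, and pixel-wise softmax cross-entropy for the structure term $L_b$. All function names and default weights are hypothetical.

```python
import numpy as np

def count_loss(c_pred, c_true):
    """Assumed form of L_c: half squared error on the pedestrian count."""
    return 0.5 * (c_pred - c_true) ** 2

def density_loss(d_pred, d_true):
    """Assumed form of L_d: half mean squared error over density-map pixels."""
    return 0.5 * np.mean((d_pred - d_true) ** 2)

def structure_loss(b_logits, b_true):
    """Assumed form of L_b: pixel-wise softmax cross-entropy.

    b_logits: (H, W, K) scores, one channel per label.
    b_true:   (H, W) integer labels in [0, K).
    """
    shifted = b_logits - b_logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    h_idx, w_idx = np.indices(b_true.shape)
    return -np.mean(log_probs[h_idx, w_idx, b_true])

def total_loss(c_pred, c_true, d_pred, d_true, b_logits, b_true,
               lambda_d=1.0, lambda_b=1.0):
    """L = L_c + lambda_d * L_d + lambda_b * L_b, as in formula (5)."""
    return (count_loss(c_pred, c_true)
            + lambda_d * density_loss(d_pred, d_true)
            + lambda_b * structure_loss(b_logits, b_true))
```

The weights let the auxiliary density and structure tasks guide the shared features without dominating the count objective; suitable values would have to be tuned on validation data.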
Further, in step S4, predicting the crowd quantity in the scene image specifically includes: inputting the scene image $X_{test}$ to be predicted into the trained neural network; the pedestrian count $C_{test}$ output by the network is the crowd quantity prediction result.
Compared with existing crowd quantity prediction methods, the crowd quantity prediction method based on the human body apparent structure has the following beneficial effects:
First, the method identifies the semantic nature of the crowd counting problem, and defines and models its three key factors: the body, the head, and their contextual structure. This formulation adapts better to the complexity of real scenes.
Second, the method establishes the crowd quantity prediction model on a deep convolutional neural network. The deep convolutional neural network expresses visual features well; in addition, visual feature extraction, pedestrian semantic modeling, and crowd quantity regression are unified in a single framework, which improves the final performance of the method.
The crowd quantity prediction method based on the human body apparent structure has good application value in an intelligent video monitoring analysis system, and can effectively improve the efficiency and accuracy of crowd quantity prediction. For example, in the application scene of public safety, the crowd quantity prediction method can quickly and accurately predict the pedestrian quantity in the shooting area of the monitoring camera, and provides decision basis for daily operation and emergency treatment in public places.
Drawings
Fig. 1 is a schematic flow chart of a human body apparent structure-based crowd quantity prediction method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
On the contrary, the invention is intended to cover alternatives, modifications, and equivalents which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details.
Referring to fig. 1, in a preferred embodiment of the present invention, a method for predicting the number of people based on the apparent structure of human body comprises the following steps:
First, a monitoring image data set for training the crowd quantity prediction model is obtained. This data set comprises scene images $X_{train}$, manually labeled pedestrian head positions $P_{train}$, and scene depth maps.
Secondly, the apparent semantic structure of the pedestrian body in the obtained monitoring image data set is modeled. Specifically, the method comprises the following steps:
First, according to the head positions $P_{train}$ of all pedestrians in the monitoring image data set and their respective scene depth values, the position and size of each pedestrian image bounding box are determined, and the pedestrian images $I_{train}$ are cropped from the scene images;
Second, each pedestrian image in $I_{train}$ is input into a single-pedestrian semantic segmentation system for semantic segmentation;
Third, for each scene image $X$, the segmentation results of all pedestrians are restored to their original sizes and positions to obtain the crowd semantic structure map $B$ of the scene image, which reflects the semantic structure information of the body parts of all pedestrians in $X$.
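The first of these steps, deriving a bounding box from a head position and a scene depth value, can be sketched with a simple pinhole-camera assumption in which apparent height is inversely proportional to depth. The patent text does not state how the box is computed, so the focal length, assumed person height, aspect ratio, and head offset below are all hypothetical parameters chosen for illustration.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Box:
    top: int
    left: int
    height: int
    width: int

def pedestrian_box(head_yx, depth_m, focal_px=800.0, person_height_m=1.7,
                   aspect=0.4, head_fraction=0.08):
    """Estimate a pedestrian bounding box from a head position and depth.

    Assumes a pinhole model: pixel height = focal_px * person_height_m / depth_m.
    head_fraction places the annotated head point slightly below the box top.
    """
    height = focal_px * person_height_m / depth_m
    width = aspect * height
    head_y, head_x = head_yx
    top = head_y - head_fraction * height
    left = head_x - width / 2.0
    return Box(round(top), round(left), round(height), round(width))

def crop_pedestrian(scene, box):
    """Crop the box from the scene image, clipping at the image border."""
    scene_h, scene_w = scene.shape[:2]
    t0, l0 = max(box.top, 0), max(box.left, 0)
    t1 = min(box.top + box.height, scene_h)
    l1 = min(box.left + box.width, scene_w)
    return scene[t0:t1, l0:l1]

scene = np.zeros((240, 320, 3), dtype=np.uint8)
box = pedestrian_box((100, 200), depth_m=8.0)
patch = crop_pedestrian(scene, box)
```

Pedestrians farther from the camera get smaller boxes, so the crops passed to the segmentation system show people at roughly comparable relative scale.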
Next, the density distribution and the body shape of the pedestrians are jointly modeled. For scene image $X$, the density distribution and the body shape of the pedestrians are jointly modeled to obtain a structured crowd density map $D$ according to formula (1),
where $p$ is a pixel position on $D$; one two-dimensional Gaussian kernel approximates the shape of a human head and a second two-dimensional Gaussian kernel approximates the shape of a human body. $P_h$ and $P_b$ are the center positions of the head and body of the $i$-th pedestrian, respectively; $P_h$ is taken from $P_{train}$, and $P_b$ is estimated from $P_h$ and the scene depth value. $\sigma_h$ and $\sigma_b$ are the respective parameters of the head and body kernels, each estimated from the scene depth value at the corresponding center position. $B_m$ is obtained by binarizing the crowd semantic structure map $B$. $C$ is the number of pedestrians in the scene, and $Z$ is a normalization factor such that the density of each pedestrian on $D$ sums to 1. The structured crowd density map $D$ reflects the density distribution and body shape information of all pedestrians in scene image $X$.
Then, a crowd quantity prediction model is established. Specifically:
First, a deep convolutional neural network is established. Its input is a scene image $X$ and its outputs are, for that image, a prediction of the crowd semantic structure map $B$, a prediction of the structured crowd density map $D$, and a prediction of the pedestrian count $C$. The structure of the neural network can thus be expressed as a mapping from $X$ to these three predictions.
Here the predicted crowd semantic structure map is one of the outputs of the neural network and is indexed by pixel position $(h, w)$ and channel $i$, and $B(h, w)$ denotes the value of $B$ at pixel position $(h, w)$.
The predicted structured crowd density map is likewise one of the outputs of the neural network, and $D$ is generated by the method described in formula (1).
In the fifth step, the loss function of the whole neural network is
$L = L_c + \lambda_d L_d + \lambda_b L_b$    formula (5)
The whole neural network is trained under the loss function $L$ using stochastic gradient descent and the back-propagation algorithm.
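Training under a weighted multi-task loss with stochastic gradient descent can be illustrated on a toy model. This is only a sketch: the linear model, the synthetic data, the learning rate, and the omission of the structure term are all assumptions made to keep the example short, and the gradient is derived by hand with the chain rule rather than by a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_cells = 64, 8

# Synthetic data: each sample has per-cell features, a per-cell "density"
# target, and a count target equal to the sum of the density target.
w_true = rng.uniform(0.5, 1.5, size=n_cells)
X = rng.uniform(0.0, 1.0, size=(n_samples, n_cells))
D_true = X * w_true
C_true = D_true.sum(axis=1)

def loss_and_grad(w, x, d_true, c_true, lambda_d):
    """Toy two-task loss L = L_c + lambda_d * L_d and its gradient in w."""
    d_pred = x * w                # density head shares the parameters w
    c_pred = d_pred.sum()         # count head sums the predicted density
    loss = 0.5 * (c_pred - c_true) ** 2 + lambda_d * 0.5 * np.sum((d_pred - d_true) ** 2)
    # Chain rule: d c_pred / d w = x and d d_pred / d w = diag(x).
    grad = (c_pred - c_true) * x + lambda_d * (d_pred - d_true) * x
    return loss, grad

def mean_loss(w, lambda_d):
    return float(np.mean([loss_and_grad(w, X[i], D_true[i], C_true[i], lambda_d)[0]
                          for i in range(n_samples)]))

def train(lr=0.05, lambda_d=1.0, epochs=20):
    w = np.zeros(n_cells)
    for _ in range(epochs):
        for i in rng.permutation(n_samples):  # one sample per update step
            _, grad = loss_and_grad(w, X[i], D_true[i], C_true[i], lambda_d)
            w -= lr * grad
    return w

initial = mean_loss(np.zeros(n_cells), 1.0)
w_fit = train()
final = mean_loss(w_fit, 1.0)
```

Both loss terms push the shared parameters in the same update, which is the sense in which the auxiliary task supervises the features used for counting.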
Finally, the established model is used to predict the crowd quantity in the scene image to be predicted. Specifically, the scene image $X_{test}$ to be predicted is input into the trained neural network, and the pedestrian count $C_{test}$ output by the network is the crowd quantity prediction result.
In the above embodiment, the crowd quantity prediction method of the present invention first models the apparent body structure and the density distribution of pedestrians as two semantic scene models. On this basis, the original problem is converted into a multi-task learning problem, and a crowd quantity prediction model is established based on a deep neural network. Finally, the trained crowd quantity prediction model is used to predict the number of pedestrians in a new scene image.
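The multi-task formulation can be sketched as one shared feature extractor feeding three output heads. The architecture of the patented network is not described in this text, so the single 3x3 convolution, the head designs, the channel counts, and all names below are hypothetical; the sketch only shows how one input image yields a structure map, a density map, and a count.

```python
import numpy as np

def conv3x3(x, kernels):
    """3x3 convolution with zero padding. x: (H, W, Cin); kernels: (3, 3, Cin, Cout)."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, kernels.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            # Shifted window times the (Cin, Cout) weights of this kernel tap.
            out += padded[dy:dy + h, dx:dx + w, :] @ kernels[dy, dx]
    return out

def forward(image, params):
    """Map one scene image to (structure logits, density map, count)."""
    features = np.maximum(conv3x3(image, params["shared"]), 0.0)      # shared ReLU features
    structure_logits = features @ params["structure"]                  # (H, W, K) part scores
    density = np.maximum(features @ params["density"], 0.0)[..., 0]    # (H, W), non-negative
    count = float(features.mean(axis=(0, 1)) @ params["count"])        # scalar count estimate
    return structure_logits, density, count

rng = np.random.default_rng(0)
params = {
    "shared": rng.normal(scale=0.1, size=(3, 3, 3, 8)),
    "structure": rng.normal(scale=0.1, size=(8, 4)),
    "density": rng.normal(scale=0.1, size=(8, 1)),
    "count": rng.normal(scale=0.1, size=(8,)),
}
image = rng.uniform(size=(16, 20, 3))
structure_logits, density, count = forward(image, params)
```

Sharing the features means the structure and density heads act as extra supervision for the representation from which the count is regressed.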
Through the technical scheme, the embodiment of the invention develops the crowd quantity prediction algorithm applied to the video monitoring scene based on the deep learning technology. The invention can effectively model the body semantic structure information and the density distribution information of the pedestrian at the same time, thereby predicting the accurate crowd number.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (2)
1. A crowd quantity prediction method based on human body apparent structure is characterized by comprising the following steps:
S1, obtaining a monitoring image data set for training a crowd quantity prediction model, the data set comprising scene images $X_{train}$, manually labeled pedestrian head positions $P_{train}$, and scene depth maps, and defining the algorithm target as: predicting the number of pedestrians $C_{test}$ in a scene image $X_{test}$;
S2, modeling the apparent semantic structure of the pedestrian body in the monitoring image data set, and jointly modeling the density distribution and the body shape of the pedestrians, specifically comprising:
S21, according to the head positions $P_{train}$ of all pedestrians in the monitoring image data set and their respective scene depth values, determining the position and size of each pedestrian image bounding box, and cropping the pedestrian images $I_{train}$ from the scene images;
S22, inputting each pedestrian image in $I_{train}$ into a single-pedestrian semantic segmentation system for semantic segmentation;
S23, for each scene image $X$, restoring the segmentation results of all pedestrians to their original sizes and positions to obtain the crowd semantic structure map $B$ of the scene image, which reflects the semantic structure information of the body parts of all pedestrians in $X$;
S24, for scene image $X$, jointly modeling the density distribution and the body shape of the pedestrians to obtain a structured crowd density map $D$ according to formula (1),
wherein $p$ is a pixel position on $D$; one two-dimensional Gaussian kernel approximates the shape of a human head and a second two-dimensional Gaussian kernel approximates the shape of a human body; $P_h$ and $P_b$ are the center positions of the head and body of the $i$-th pedestrian, respectively, $P_h$ being taken from $P_{train}$ and $P_b$ being estimated from $P_h$ and the scene depth value at head position $P_h$; $\sigma_h$ and $\sigma_b$ are the respective parameters of the head and body kernels, estimated from the scene depth value at head position $P_h$ and the scene depth value at body center position $P_b$, respectively; $B_m$ is obtained by binarizing the crowd semantic structure map $B$; $C$ is the number of pedestrians in scene image $X$; $Z$ is a normalization coefficient such that the density of each pedestrian on $D$ sums to 1; and the structured crowd density map $D$ reflects the density distribution and body shape information of all pedestrians in scene image $X$;
s3, establishing a prediction model of the crowd quantity according to the modeling result in the step S2, which specifically comprises the following steps:
S31, establishing a deep convolutional neural network whose input is a scene image $X$ and whose outputs are, for that image, a prediction of the crowd semantic structure map $B$, a prediction of the structured crowd density map $D$, and a prediction of the number of pedestrians in $X$; the structure of the neural network can thus be expressed as a mapping from $X$ to these three predictions,
wherein the predicted crowd semantic structure map is one of the outputs of the neural network and is indexed by pixel position $(h, w)$ and channel $i$, $B$ is generated by the method described in step S23, and $B(h, w)$ denotes the value of $B$ at pixel position $(h, w)$;
S35, the loss function of the whole neural network is
$L = L_c + \lambda_d L_d + \lambda_b L_b$    formula (5)
training the whole neural network under the loss function $L$ using stochastic gradient descent and the back-propagation algorithm;
and S4, predicting the number of people in the scene image by using the prediction model.
2. The crowd quantity prediction method based on human body apparent structure according to claim 1, wherein in step S4, predicting the crowd quantity in the scene image specifically comprises: inputting the scene image $X$ to be predicted into the trained neural network, the pedestrian count $C$ output for the scene image $X$ being the prediction result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611225785.8A CN106815563B (en) | 2016-12-27 | 2016-12-27 | Human body apparent structure-based crowd quantity prediction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611225785.8A CN106815563B (en) | 2016-12-27 | 2016-12-27 | Human body apparent structure-based crowd quantity prediction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106815563A (en) | 2017-06-09 |
CN106815563B (en) | 2020-06-02 |
Family
ID=59110304
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611225785.8A Active CN106815563B (en) | 2016-12-27 | 2016-12-27 | Human body apparent structure-based crowd quantity prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106815563B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109508583B (en) * | 2017-09-15 | 2020-11-06 | 杭州海康威视数字技术股份有限公司 | Method and device for acquiring crowd distribution characteristics |
CN107622244B (en) * | 2017-09-25 | 2020-08-28 | 华中科技大学 | Indoor scene fine analysis method based on depth map |
CN110505440A (en) * | 2018-05-18 | 2019-11-26 | 杭州海康威视数字技术股份有限公司 | A kind of area monitoring method and device |
CN109961060B (en) * | 2019-04-11 | 2021-04-30 | 北京百度网讯科技有限公司 | Method and apparatus for generating crowd density information |
CN112026686B (en) * | 2019-06-04 | 2022-04-12 | 上海汽车集团股份有限公司 | Method and device for automatically adjusting position of vehicle seat |
CN115083112B (en) * | 2022-08-22 | 2022-11-22 | 枫树谷(成都)科技有限责任公司 | Intelligent early warning emergency management system and deployment method thereof |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101976353A (en) * | 2010-10-28 | 2011-02-16 | 北京智安邦科技有限公司 | Statistical method and device of low density crowd |
CN102063613A (en) * | 2010-12-28 | 2011-05-18 | 北京智安邦科技有限公司 | People counting method and device based on head recognition |
CN103020606A (en) * | 2012-12-27 | 2013-04-03 | 北京大学 | Pedestrian detection method based on spatio-temporal context information |
CN103093211A (en) * | 2013-01-27 | 2013-05-08 | 西安电子科技大学 | Human motion tracking method based on deep nuclear information image feature |
CN103646257A (en) * | 2013-12-30 | 2014-03-19 | 中国科学院自动化研究所 | Video monitoring image-based pedestrian detecting and counting method |
CN105184260A (en) * | 2015-09-10 | 2015-12-23 | 北京大学 | Image characteristic extraction method, pedestrian detection method and device |
CN106066993A (en) * | 2016-05-23 | 2016-11-02 | 上海交通大学 | A kind of crowd's semantic segmentation method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150285639A1 (en) * | 2014-04-04 | 2015-10-08 | Umm-Al-Qura University | Method and system for crowd sensing to be used for automatic semantic identification |
Also Published As
Publication number | Publication date |
---|---|
CN106815563A (en) | 2017-06-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106815563B (en) | Human body apparent structure-based crowd quantity prediction method | |
CN107967451B (en) | Method for counting crowd of still image | |
CN110472531B (en) | Video processing method, device, electronic equipment and storage medium | |
CN111611878B (en) | Method for crowd counting and future people flow prediction based on video image | |
CN108830145B (en) | People counting method based on deep neural network and storage medium | |
CN108921051B (en) | Pedestrian attribute identification network and technology based on cyclic neural network attention model | |
Ke et al. | Multi-dimensional traffic congestion detection based on fusion of visual features and convolutional neural network | |
US10735694B2 (en) | System and method for activity monitoring using video data | |
CN110276264B (en) | Crowd density estimation method based on foreground segmentation graph | |
CN103971386B (en) | A kind of foreground detection method under dynamic background scene | |
CN110147743A (en) | Real-time online pedestrian analysis and number system and method under a kind of complex scene | |
CN107301376B (en) | Pedestrian detection method based on deep learning multi-layer stimulation | |
CN110427839A (en) | Video object detection method based on multilayer feature fusion | |
CN111191667B (en) | Crowd counting method based on multiscale generation countermeasure network | |
CN110826447A (en) | Restaurant kitchen staff behavior identification method based on attention mechanism | |
CN102142085B (en) | Robust tracking method for moving flame target in forest region monitoring video | |
CN116258608B (en) | Water conservancy real-time monitoring information management system integrating GIS and BIM three-dimensional technology | |
CN110298297A (en) | Flame identification method and device | |
CN110942015A (en) | Crowd density estimation method | |
CN110163060B (en) | Method for determining crowd density in image and electronic equipment | |
CN111709300A (en) | Crowd counting method based on video image | |
CN109614896A (en) | A method of the video content semantic understanding based on recursive convolution neural network | |
CN114519302A (en) | Road traffic situation simulation method based on digital twin | |
CN110827320A (en) | Target tracking method and device based on time sequence prediction | |
CN113435432B (en) | Video anomaly detection model training method, video anomaly detection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||