CN112818945A - Convolutional network construction method suitable for subway station crowd counting - Google Patents
- Publication number
- CN112818945A (application CN202110250379.1A)
- Authority
- CN
- China
- Prior art keywords
- density
- crowd
- network
- training
- density map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING › G06V20/00—Scenes; Scene-specific elements › G06V20/50—Context or environment of the image › G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects › G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06F—ELECTRIC DIGITAL DATA PROCESSING › G06F18/00—Pattern recognition › G06F18/20—Analysing › G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation › G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00—Computing arrangements based on biological models › G06N3/02—Neural networks › G06N3/04—Architecture, e.g. interconnection topology › G06N3/045—Combinations of networks
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00—Computing arrangements based on biological models › G06N3/02—Neural networks › G06N3/08—Learning methods
Abstract
A convolutional network construction method suitable for subway station crowd counting comprises the following steps. First, a deep learning framework is built as a density generation network: the crowd in each collected crowd-image sample is annotated using Gaussian filtering to generate a crowd density map; the total crowd count is computed from the density map; each crowd-image sample and its corresponding density map are fed as a pair into the density generation network for training; and the network is optimized with a loss function. Next, a discrimination network is designed to judge the accuracy of the generated density maps, and is likewise optimized with a loss function. Finally, the adversarial generative network is trained, predicting crowd density in an adversarial manner: the minimax problem between the generative model and the discriminative model is optimized by joint alternating iteration; the output of the discriminator gives the generator feedback on density-map location and prediction accuracy; and the two networks are trained against each other until the discriminator can no longer correctly judge the samples produced by the generator.
Description
Technical Field
The invention relates to the field of crowd counting, and in particular to a method for constructing a density generation model based on a convolutional neural network.
Background Art
Crowd counting is an important subject in computer vision. Its main task is to estimate the number of people in an image: to obtain the crowd density accurately and report the head count. In recent years, crowd counting has been widely applied in intelligent video surveillance, public safety, intelligent early warning, and related fields. However, because targets deform under factors such as viewing angle, occlusion, and scale variation, crowd counting remains a challenging task, especially in subway station scenes.
Traditional crowd counting methods rely on object detection, with research focused on feature extraction and feature classification, and researchers have accordingly proposed many kinds of features and classifiers. However, because traditional object detection uses hand-designed features, detection accuracy falls short of practical requirements even when the best nonlinear classifiers are used for feature classification. Hand-designed features have three main drawbacks: 1) they are low-level features with insufficient power to represent the target; 2) their separability is poor, leading to high classification error rates; 3) they are task-specific, making it difficult to capture multi-scale characteristics comprehensively for crowd counting. Davies et al. first observed an approximately linear relationship between the pixel features of an image and the total number of people in it; they separated the foreground from the whole image by a three-frame difference method, counted the pixel features of the foreground region, and fitted a regression equation mapping the feature counts to the total head count. Researchers subsequently turned their attention to the convolutional neural network (CNN), using the density map as the regression target.
Disclosure of Invention
The technical problem addressed by the invention is to provide a method for constructing a density generation model based on a convolutional neural network, solving the problems that existing crowd counting models perform poorly when applied to complex images or multi-scale targets and are not well suited to counting crowds in subway stations.
In order to solve this technical problem, the invention adopts the following technical scheme:
A convolutional network construction method suitable for subway station crowd counting comprises the following steps:
Step 1: a deep learning framework is built as a density generation network:
the crowd counting model is configured with a U-Net architecture, and the first 13 layers of the base network VGG-16 are used for feature extraction; this truncated VGG comprises 13 convolutional layers and three max-pooling layers;
Step 2: first, the crowd in each collected crowd-image sample is annotated using Gaussian filtering to generate a crowd density map, and the total crowd count is computed from the density map;
then, each crowd-image sample and its corresponding density map are fed as a pair into the density generation network for training;
the density generation network comprises a feature extraction module and a density generation module; the feature extraction module consists of the first 13 layers of the VGG-16 model, uses 3x3 convolution kernels throughout, and a 1x1 convolution produces the density map;
Step 3: the network is optimized with a loss function;
Step 4: the density generation network is trained as follows:
(a) training uses stochastic gradient descent, and the first five convolutional layers of the density generation network are initialized from a pre-trained VGG model;
(b) the weights of the newly added convolutional layers are randomly initialized from a zero-mean Gaussian distribution with standard deviation 0.01;
(c) in each iteration, a batch of labelled training data is fed into the network and the parameters are then updated;
Step 5: a discrimination network, i.e. a discriminator (discrimination module), is designed to judge the accuracy of the generated density map;
Step 6: the discrimination network is optimized with a loss function;
the generated density map is labelled 0 and the real density map is labelled 1; the output of the discriminator represents the probability that the generated density map is a real density map;
an additional adversarial loss function is used to improve the quality of the generated density map; it is expressed as
Ladv = -log(D(lc, G(lc; θ)))
where G(lc; θ) is the density map produced by the generation network for the input image lc;
Step 7: the adversarial generative network is trained, and crowd density is predicted in an adversarial training mode:
the minimax problem between the generative model and the discriminative model is optimized with a joint alternating-iteration training scheme, in which the generation network is trained to produce accurate crowd density maps that fool the discriminator, while the discriminator is trained to distinguish generated density maps from real density-map labels;
meanwhile, the output of the discriminator provides the generator with feedback on density-map location and prediction accuracy;
the two networks are trained against each other until the discriminator can no longer correctly judge the samples produced by the generator.
The density-map generation of step 2 comprises the following sub-steps:
2.1) generating a single-channel image of the same size as the original sample image, with all pixels set to 0;
2.2) marking each point annotated with a human head in the label as 1;
2.3) processing this image with Gaussian filtering; the resulting image is the crowd density map; the Gaussian filter is a fixed kernel with parameters μ = 15 and σ = 4.
In step 3), the loss function is the Euclidean distance.
Drawings
Fig. 1 is a schematic diagram of a network architecture.
Detailed Description
The technical scheme is explained below with reference to the accompanying drawing.
A convolutional network construction method suitable for subway station crowd counting comprises the following steps:
Step 1: building the PyTorch deep learning framework:
the crowd counting model is configured with a U-Net architecture, and the first 13 layers of the base network VGG-16 are used for feature extraction; this truncated VGG comprises 13 convolutional layers and three max-pooling layers.
Step 2: sample processing:
a fixed-kernel Gaussian filter (μ = 15, σ = 4) is used to generate a label density map from the crowd annotations, as follows:
1) generate a single-channel image of the same size as the original image, with all pixels set to 0;
2) mark each point annotated with a human head in the label as 1;
3) process this image with Gaussian filtering; the resulting image is the crowd density map.
Each crowd image and its density map are then fed as a pair into the density generation network for training.
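The three sub-steps above can be sketched in Python. This is an illustrative sketch, not part of the claimed method: the helper name `make_density_map` and the use of SciPy's `gaussian_filter` are assumptions; σ = 4 follows the fixed-kernel parameters stated above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_density_map(shape, head_points, sigma=4.0):
    """Build a crowd density map from head annotations.

    1) start from an all-zero single-channel image of the given shape;
    2) set each annotated head position to 1;
    3) apply a fixed-kernel Gaussian filter.
    The patent states fixed parameters mu = 15 and sigma = 4; SciPy's
    gaussian_filter derives the kernel extent from sigma itself, so
    only sigma is passed here.
    """
    density = np.zeros(shape, dtype=np.float64)
    for r, c in head_points:
        density[int(r), int(c)] = 1.0
    # Gaussian filtering preserves the integral, so the density map
    # still sums to the number of annotated heads (the total count).
    return gaussian_filter(density, sigma=sigma)

dm = make_density_map((64, 64), [(20, 20), (30, 40), (40, 30)])
print(dm.sum())  # ~3.0: one unit of mass per annotated head
```

Summing the density map recovers the total crowd count mentioned in step 2.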
As shown in Fig. 1, the density generation network consists of two parts: a feature extraction module and a density generation module.
In this scheme, the feature extraction module consists of the first 13 layers of the VGG-16 model and uses 3x3 convolution kernels throughout, and a 1x1 convolution produces the density map.
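As a hedged illustration of the feature-extraction module just described, the PyTorch sketch below stacks the 13 VGG-16 convolutional layers (3x3 kernels) with three max-pooling stages and the 1x1 output convolution. The class name is an assumption, and the U-Net decoder side is omitted for brevity, so the output is 1/8 of the input resolution.

```python
import torch
import torch.nn as nn

class DensityGenerator(nn.Module):
    """Sketch of the density generation network: the 13 convolutional
    layers of VGG-16 (3x3 kernels) interleaved with three max-pooling
    stages, followed by a 1x1 convolution that produces the
    single-channel density map."""
    def __init__(self):
        super().__init__()
        cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
               512, 512, 512, 512, 512, 512]  # 13 convs, 3 pools
        layers, in_ch = [], 3
        for v in cfg:
            if v == "M":
                layers.append(nn.MaxPool2d(2))
            else:
                layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),
                           nn.ReLU(inplace=True)]
                in_ch = v
        self.features = nn.Sequential(*layers)
        self.head = nn.Conv2d(512, 1, kernel_size=1)  # 1x1 conv -> density

    def forward(self, x):
        return self.head(self.features(x))

g = DensityGenerator()
out = g(torch.randn(1, 3, 32, 32))
print(tuple(out.shape))  # (1, 1, 4, 4): three 2x pooling stages
```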
Step 3: optimizing the density generation network with a loss function;
this scheme uses the Euclidean distance as the loss function.
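A minimal sketch of the Euclidean-distance loss follows; the exact normalization (sum versus mean over pixels, and the 1/2 factor) is an assumption, since the scheme does not specify it.

```python
import numpy as np

def euclidean_loss(pred, target):
    """Euclidean (L2) loss between predicted and ground-truth density
    maps: half the mean squared pixel-wise difference (one common
    normalization; assumed here)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return 0.5 * np.mean((pred - target) ** 2)

print(euclidean_loss([[0.1, 0.2]], [[0.1, 0.2]]))  # 0.0 for a perfect map
```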
Step 4: the network is trained as follows:
(a) training uses stochastic gradient descent and, to prevent overfitting, the first five convolutional layers of the density generation network are initialized from a pre-trained VGG model;
(b) the weights of the newly added convolutional layers are randomly initialized from a zero-mean Gaussian distribution with standard deviation 0.01;
(c) in each iteration, a batch of labelled training data is fed into the network and the parameters are then updated.
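Steps (a) to (c) can be sketched as follows. The tiny stand-in network, learning rate, and dummy data are assumptions made so the example is self-contained; the pre-trained-VGG copy of step (a) is noted in a comment but omitted, since it requires downloading a checkpoint.

```python
import torch
import torch.nn as nn

# Minimal stand-in for the generator so the training sketch is
# self-contained; the real network is the VGG-based generator above.
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(8, 1, kernel_size=1))

# (b) newly added convolution layers: zero-mean Gaussian, std 0.01.
# (a) would additionally copy the first five conv layers from a
# pre-trained VGG checkpoint (omitted here).
for m in net.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.normal_(m.weight, mean=0.0, std=0.01)
        nn.init.zeros_(m.bias)

opt = torch.optim.SGD(net.parameters(), lr=1e-4)  # stochastic gradient descent
loss_fn = nn.MSELoss()  # Euclidean-distance loss of step 3

images = torch.randn(4, 3, 32, 32)         # dummy labelled mini-batch
targets = torch.rand(4, 1, 32, 32) * 0.01  # dummy density maps
for _ in range(3):  # (c) one parameter update per mini-batch
    opt.zero_grad()
    loss = loss_fn(net(images), targets)
    loss.backward()
    opt.step()
```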
Step 5: designing a network for judging the accuracy of the generated density map;
Step 6: optimizing the discrimination network with a loss function;
the density map label that marks the generated density map as 0 true is marked as 1. The output of the discriminator represents the probability that the generated density map is a true density map. An additional penalty function is used in the method to improve the quality of the generated density map. The loss-fighting function is expressed by
Ladv = -log(D(lc, G(lc; θ)))
where G(lc; θ) is the density map produced by the generation network for the input image lc.
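The adversarial loss above can be sketched directly; the epsilon clamp is an added numerical-stability assumption, not part of the stated formula.

```python
import torch

def adversarial_loss(d_prob):
    """Ladv = -log(D(lc, G(lc; θ))): the generator is penalized when
    the discriminator assigns a low probability to its density map
    being real. d_prob is the discriminator output, a probability
    in (0, 1]."""
    eps = 1e-12  # avoid log(0) for a fully confident discriminator
    return -torch.log(d_prob.clamp_min(eps)).mean()

# A confident "real" verdict gives near-zero loss; a confident "fake"
# verdict gives a large loss, pushing the generator to improve.
print(adversarial_loss(torch.tensor([0.99])).item())
print(adversarial_loss(torch.tensor([0.01])).item())
```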
Step 7: training the adversarial generative network.
In this scheme, crowd density is predicted in an adversarial training mode, and the minimax problem between the generative model and the discriminative model is optimized with a joint alternating-iteration training scheme. The generation network is trained to produce accurate crowd density maps that fool the discriminator, while the discriminator is trained to distinguish generated density maps from real density-map labels. Meanwhile, the output of the discriminator provides the generator with feedback on density-map location and prediction accuracy. The two networks are trained against each other to improve the generated result, until the discriminator can no longer correctly judge the samples produced by the generator.
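The joint alternating iteration can be sketched end to end with tiny stand-in networks. All architecture details here are placeholders; only the alternating scheme itself follows the text: the discriminator sees (image, density map) pairs and is updated first, then the generator is updated against the frozen discriminator with the adversarial loss plus the Euclidean pixel loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-in networks so the alternating scheme runs end to end;
# the real generator/discriminator are the networks described above.
G = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1))
D = nn.Sequential(nn.Conv2d(4, 8, 3, padding=1), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                  nn.Linear(8, 1), nn.Sigmoid())
opt_g = torch.optim.SGD(G.parameters(), lr=1e-3)
opt_d = torch.optim.SGD(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

img = torch.randn(2, 3, 16, 16)  # dummy crowd images
real = torch.rand(2, 1, 16, 16)  # dummy ground-truth density maps

for _ in range(2):  # joint alternating iterations
    # Discriminator step: real (image, density) pairs -> label 1,
    # generated pairs -> label 0.
    fake = G(img).detach()
    d_real = D(torch.cat([img, real], dim=1))
    d_fake = D(torch.cat([img, fake], dim=1))
    loss_d = (bce(d_real, torch.ones_like(d_real))
              + bce(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: fool the discriminator (the adversarial loss)
    # plus the Euclidean pixel loss on the density map.
    fake = G(img)
    d_fake = D(torch.cat([img, fake], dim=1))
    loss_g = (bce(d_fake, torch.ones_like(d_fake))
              + F.mse_loss(fake, real))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```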
Claims (3)
1. A convolutional network construction method suitable for subway station crowd counting, characterized by comprising the following steps:
step 1, building a deep learning framework as a density generation network:
the crowd counting model is configured with a U-Net architecture, and the first 13 layers of the base network VGG-16 are used for feature extraction; this truncated VGG comprises 13 convolutional layers and three max-pooling layers;
step 2, first annotating the crowd in each collected crowd-image sample using Gaussian filtering to generate a crowd density map, and computing the total crowd count from the density map;
then feeding each crowd-image sample and its corresponding density map as a pair into the density generation network for training;
the density generation network comprising a feature extraction module and a density generation module; the feature extraction module consisting of the first 13 layers of the VGG-16 model, using 3x3 convolution kernels throughout, with a 1x1 convolution producing the density map;
step 3, optimizing with a loss function;
step 4, training the density generation network as follows:
(a) training with stochastic gradient descent, and initializing the first five convolutional layers of the density generation network from a pre-trained VGG model;
(b) randomly initializing the weights of the newly added convolutional layers from a zero-mean Gaussian distribution with standard deviation 0.01;
(c) in each iteration, feeding a batch of labelled training data into the network and then updating the parameters;
step 5, designing a discrimination network for judging the accuracy of the generated density map;
step 6, optimizing the discrimination network with a loss function;
labelling the generated density map as 0 and the real density map as 1, the output of the discriminator representing the probability that the generated density map is a real density map;
adopting an additional adversarial loss function to improve the quality of the generated density map;
step 7, training the adversarial generative network:
predicting crowd density in an adversarial training mode:
optimizing the minimax problem between the generative model and the discriminative model with a joint alternating-iteration training scheme, wherein the generation network is trained to produce accurate crowd density maps that fool the discriminator, and the discriminator is trained to distinguish generated density maps from real density-map labels;
meanwhile, the output of the discriminator providing the generator with feedback on density-map location and prediction accuracy;
the two networks being trained against each other until the discriminator can no longer correctly judge the samples produced by the generator.
2. The convolutional network construction method suitable for subway station crowd counting according to claim 1, wherein the density-map generation of step 2) comprises:
2.1) generating a single-channel image of the same size as the original sample image, with all pixels set to 0;
2.2) marking each point annotated with a human head in the label as 1;
2.3) processing this image with Gaussian filtering, the resulting image being the crowd density map.
3. The convolutional network construction method suitable for subway station crowd counting according to claim 1, wherein in step 3) the loss function is the Euclidean distance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110250379.1A CN112818945A (en) | 2021-03-08 | 2021-03-08 | Convolutional network construction method suitable for subway station crowd counting |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112818945A true CN112818945A (en) | 2021-05-18 |
Family
ID=75862967
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110250379.1A Pending CN112818945A (en) | 2021-03-08 | 2021-03-08 | Convolutional network construction method suitable for subway station crowd counting |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112818945A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255526A (en) * | 2021-05-28 | 2021-08-13 | 华中科技大学 | Momentum-based confrontation sample generation method and system for crowd counting model |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108764085A (en) * | 2018-05-17 | 2018-11-06 | 上海交通大学 | Based on the people counting method for generating confrontation network |
CN109522857A (en) * | 2018-11-26 | 2019-03-26 | 山东大学 | A kind of Population size estimation method based on production confrontation network model |
EP3618077A1 (en) * | 2018-08-27 | 2020-03-04 | Koninklijke Philips N.V. | Generating metadata for trained model |
US20200074186A1 (en) * | 2018-08-28 | 2020-03-05 | Beihang University | Dense crowd counting method and apparatus |
CN110879982A (en) * | 2019-11-15 | 2020-03-13 | 苏州大学 | Crowd counting system and method |
CN111191667A (en) * | 2018-11-15 | 2020-05-22 | 天津大学青岛海洋技术研究院 | Crowd counting method for generating confrontation network based on multiple scales |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210518 |