CN112818945A - Convolutional network construction method suitable for subway station crowd counting - Google Patents

Convolutional network construction method suitable for subway station crowd counting

Info

Publication number
CN112818945A
CN112818945A (application CN202110250379.1A)
Authority
CN
China
Prior art keywords
density
crowd
network
training
density map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110250379.1A
Other languages
Chinese (zh)
Inventor
张正
田青
仝淑贞
张华�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China University of Technology
Original Assignee
North China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China University of Technology filed Critical North China University of Technology
Priority to CN202110250379.1A priority Critical patent/CN112818945A/en
Publication of CN112818945A publication Critical patent/CN112818945A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A convolutional network construction method suitable for subway station crowd counting comprises the following steps. First, a deep learning framework is built as a density generation network: the collected crowd-image samples are annotated and smoothed with a Gaussian filter to generate crowd density maps; the total number of people in each density map is computed; each crowd image and its corresponding density map are paired and fed into the density generation network for training; the network is optimized with a loss function. Next, a discrimination network is designed to judge the accuracy of the generated density maps, and is likewise optimized with a loss function. Finally, the adversarial generative network is trained and crowd density is predicted in an adversarial fashion: the minimax problem between the generative model and the discriminative model is optimized through joint alternating iteration; the discriminator's output provides the generator with feedback on density-map location and prediction accuracy; the two networks are trained against each other until the discriminator can no longer correctly distinguish the samples produced by the generator.

Description

Convolutional network construction method suitable for subway station crowd counting
Technical Field
The invention relates to the field of population counting, in particular to a density generation model construction method based on a convolutional neural network.
Background Art
Crowd counting is an important topic in computer vision. Its main task is to estimate, from an image, the crowd density and the number of people present. In recent years crowd counting has been widely applied in intelligent video surveillance, public safety, intelligent early warning, and related fields. However, because targets deform under changes of viewing angle, occlusion, and scale, crowd counting remains a challenging task, especially in subway station scenes.
Traditional crowd counting methods rely on object detection, with research focused on feature extraction and feature classification, and researchers have accordingly proposed many kinds of features and classifiers. However, because traditional object detection uses hand-designed features, detection accuracy falls short of practical requirements even when the best nonlinear classifier is used for feature classification. Hand-designed features have three main drawbacks: 1) they are low-level features whose expressive power for the target is insufficient; 2) their separability is poor, so classification error rates are high; 3) they are task-specific, making it difficult to capture multi-scale characteristics comprehensively enough for crowd counting. Davies et al. first observed a nearly linear relationship between the pixel features of an image and the total number of people in it: the authors separated the foreground from the whole image by a three-frame difference method, counted the pixel features of the foreground region, and then fit the mapping between feature counts and total people with a regression equation. Researchers subsequently turned to convolutional neural networks (CNNs), using the density map as the regression target.
Disclosure of Invention
The technical problem addressed by the invention is to provide a density generation model construction method based on a convolutional neural network, aimed at the problems that existing crowd counting models perform poorly on complex images and multi-scale targets and are therefore unsuitable for counting crowds in subway stations.
In order to solve the technical problems, the invention adopts the following technical scheme:
a convolutional network construction method suitable for subway station crowd counting comprises the following steps:
step 1, building a deep learning framework as a density generation network:
the crowd counting model is configured by adopting a U-Net algorithm, and the front 13 layers of a basic network VGG16 are used for feature extraction, wherein the VGG comprises 13 convolution layers and three maximum pooling layers;
firstly, marking the crowd by using a Gaussian filtering method on a sample of the acquired crowd map, and generating a crowd density map; calculating the total number of the crowds in the crowd density map;
then, taking the samples of the crowd graph and the corresponding crowd density graph as a combination to be sent to a density generation network for training;
the density generation network comprises a feature extraction module and a density generation module; the feature extraction module is composed of the front 13 layers in the VGG16 model, all uses a 3x3 convolution kernel, and uses 1x1 convolution to obtain a density map;
Step 3: optimize with a loss function;
Step 4: train the density generation network as follows:
(a) train with stochastic gradient descent, initializing the first five convolutional layers of the density generation network from a pre-trained VGG model;
(b) initialize the weights of the newly added convolutional layers randomly from a zero-mean Gaussian distribution with standard deviation 0.01;
(c) in each iteration, feed a batch of labeled training data into the network and then update the parameters;
Step 5: design a discrimination network, i.e. a discriminator (discrimination module), to judge the accuracy of the generated density map;
Step 6: optimize the discrimination network with a loss function;
Label the generated density map as 0 and the real density map as 1; the discriminator's output represents the probability that the generated density map is a real density map;
An additional adversarial loss function is used to improve the quality of the generated density map; the adversarial loss is
Ladv = -log(D(Ic, G(Ic; θ)))
where Ic is the input crowd image and G(Ic; θ) is the density map produced by the generation network from Ic.
And 7: training confrontation generation network
Adopting an antagonistic training mode to predict the crowd density:
the maximum and minimum problems of the generated model and the discrimination model are optimized by adopting a training mode of joint alternating iteration; wherein the training generation network is used to generate an accurate population density map to fool the discriminator, and conversely, the discriminator is trained to discriminate between the generated density map and a true density map label;
meanwhile, the output of the discriminator provides the generator with the feedback of the density map position and the prediction precision;
the two networks compete for training at the same time until the samples generated by the generator cannot be correctly judged by the arbiter.
The density map generation of step 2) comprises the following steps:
2.1) generate a single-channel image of the same size as the original sample image with all pixels set to 0;
2.2) set to 1 each point annotated as a human head in the label;
2.3) process this image with Gaussian filtering to obtain the crowd density map; the Gaussian filter is a fixed-kernel Gaussian filter with parameters chosen as μ = 15 and σ = 4;
In step 3), the loss function is the Euclidean distance.
Drawings
Fig. 1 is a schematic diagram of a network architecture.
Detailed Description
The technical scheme is explained below with reference to the accompanying drawing:
A convolutional network construction method suitable for subway station crowd counting comprises the following steps:
Step 1: build the deep learning framework in PyTorch:
The crowd counting model is configured with a U-Net-style architecture, and the first 13 layers of the backbone network VGG16 are used for feature extraction; this VGG front end comprises 13 convolutional layers and three max-pooling layers.
Step 2: sample treatment:
using a fixed kernel gaussian filter mu-15 and sigma-4 to generate a label density map for the group labels, wherein the method comprises the following steps:
1, generating a single-channel picture with the same size as the original picture, wherein all pixel points are 0;
2 mark the spot with a human head in label as 1;
3 processing the graph through Gaussian filtering to form a graph which is a crowd density graph;
Each crowd image and its density map are then paired and fed into the density generation network for training.
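The sample-processing steps above can be sketched as follows. This is a minimal NumPy illustration rather than the patented implementation: it assumes μ = 15 denotes the (odd) kernel window size and σ = 4 the standard deviation, and it normalizes the kernel so that each annotated head contributes one unit of mass to the map, which is what lets the head count be recovered by summing the density map.

```python
import numpy as np

def gaussian_kernel(size=15, sigma=4.0):
    """Fixed 2-D Gaussian kernel, normalized to sum to 1 so that each
    annotated head contributes exactly 1 to the map's total mass."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def make_density_map(shape, head_points, size=15, sigma=4.0):
    """shape: (H, W); head_points: list of (row, col) head annotations."""
    density = np.zeros(shape, dtype=np.float64)
    k = gaussian_kernel(size, sigma)
    r = size // 2
    H, W = shape
    for (y, x) in head_points:
        # Clip the kernel at the image border so off-image mass is dropped
        y0, y1 = max(0, y - r), min(H, y + r + 1)
        x0, x1 = max(0, x - r), min(W, x + r + 1)
        ky0, kx0 = y0 - (y - r), x0 - (x - r)
        density[y0:y1, x0:x1] += k[ky0:ky0 + (y1 - y0), kx0:kx0 + (x1 - x0)]
    return density

dm = make_density_map((64, 64), [(20, 20), (40, 31)])
print(round(dm.sum(), 3))  # ≈ 2.0: integrating the map recovers the head count
```

Summing the map recovers the total crowd count, which is the property step 1 of the method relies on when it "calculates the total number of people in the crowd density map".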
As shown in fig. 1, the density generation network consists of two parts: a feature extraction module and a density generation module.
In this scheme the feature extraction module consists of the first 13 layers of the VGG16 model, uses 3x3 convolution kernels throughout, and produces the density map with a 1x1 convolution.
Step 3: optimize the density generation network with a loss function.
This scheme uses the Euclidean distance as the loss function:
L(θ) = (1/(2N)) Σ_{i=1}^{N} ||G(Ii; θ) − Di||²
where N is the number of training samples, G(Ii; θ) is the density map estimated for the input image Ii, and Di is the corresponding ground-truth density map.
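The Euclidean-distance loss of step 3 can be computed directly. A small NumPy sketch follows; the batch shapes and the 1/(2N) scaling are illustrative assumptions following common crowd-counting practice, not a quotation of the patent's exact formula:

```python
import numpy as np

def euclidean_loss(pred_maps, gt_maps):
    """Pixel-wise Euclidean (L2) loss averaged over a batch of density maps.
    pred_maps, gt_maps: arrays of shape (N, H, W)."""
    n = pred_maps.shape[0]
    diff = pred_maps - gt_maps
    return np.sum(diff ** 2) / (2.0 * n)

# Toy batch: 2 predicted maps of zeros vs. ground truth of ones
pred = np.zeros((2, 4, 4))
gt = np.ones((2, 4, 4))
print(euclidean_loss(pred, gt))  # 32 unit squared errors over 2N=4 -> 8.0
```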
Step 4: train the network as follows:
(a) train with stochastic gradient descent; to prevent overfitting, initialize the first five convolutional layers of the density generation network from a pre-trained VGG model;
(b) initialize the weights of the newly added convolutional layers randomly from a zero-mean Gaussian distribution with standard deviation 0.01;
(c) in each iteration, feed a batch of labeled training data into the network and then update the parameters;
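Steps (a)-(c) describe standard mini-batch stochastic gradient descent with zero-mean Gaussian initialization (std 0.01). The toy NumPy sketch below illustrates only that training schedule; a one-parameter linear model stands in for the real convolutional network, and all names and hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for (image, density-map) pairs: y = 3*x exactly
X = rng.normal(size=(256, 1))
y = 3.0 * X

# (b) zero-mean Gaussian initialization with standard deviation 0.01
w = rng.normal(0.0, 0.01, size=(1, 1))

lr, batch = 0.1, 32
for epoch in range(50):
    idx = rng.permutation(len(X))
    for s in range(0, len(X), batch):
        # (c) one mini-batch of labelled data per parameter update
        b = idx[s:s + batch]
        grad = X[b].T @ (X[b] @ w - y[b]) / len(b)  # gradient of 0.5*MSE
        w -= lr * grad                              # (a) SGD step

print(float(w[0, 0]))  # converges toward the true coefficient 3.0
```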
Step 5: design a network for judging the accuracy of the generated density map.
Step 6: optimize the discrimination network with a loss function.
the density map label that marks the generated density map as 0 true is marked as 1. The output of the discriminator represents the probability that the generated density map is a true density map. An additional penalty function is used in the method to improve the quality of the generated density map. The loss-fighting function is expressed by
Ladv=-log(D(lc,(G(lc;θ))))
G (lc; theta)) is a density map obtained by generating the network, with the real value of lc.
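A minimal numeric reading of the adversarial term Ladv follows; here `d_out` is an assumed stand-in for the discriminator's conditional output D(Ic, G(Ic; θ)), i.e. its estimated probability that the generated map is real:

```python
import numpy as np

def adversarial_loss(d_out):
    """Ladv = -log D(Ic, G(Ic; theta)): the generator's extra loss term.
    d_out is the discriminator's probability that the generated map is real."""
    return float(-np.log(d_out))

# The closer the discriminator is to being fooled (d_out -> 1),
# the smaller the generator's adversarial penalty.
print(round(adversarial_loss(0.5), 4))   # -log(0.5) ≈ 0.6931
print(round(adversarial_loss(0.99), 4))  # ≈ 0.0101
```

Minimizing Ladv therefore pushes the generator to produce density maps the discriminator scores as real, which is exactly the "fooling" objective of step 7.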
And 7: training the countermeasure generation network.
According to the scheme, a confrontation training mode is adopted for predicting the crowd density, and the maximum and minimum problems are optimized by adopting a combined alternate iteration training mode for generating the model and judging the model. Wherein the training generation network is used to generate an accurate population density map to fool the discriminators, and conversely, the discriminators are trained to discriminate between the generated density map and the true density map labels. At the same time, the output of the discriminator will provide feedback to the generator of the density map location and prediction accuracy. The two networks compete for training at the same time so as to improve the generated effect until the sample generated by the generator cannot be correctly judged by the discriminator.
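The joint alternating iteration can be sketched as the following schedule skeleton. It is illustrative only: the discriminator probabilities are fixed inputs and the actual parameter updates are elided, so it shows just the ordering of the two turns and the loss each turn minimizes (binary cross-entropy for the discriminator, -log D(G) for the generator):

```python
import numpy as np

def bce(p, label):
    """Binary cross-entropy for one probability p against target label 0/1."""
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

def train_adversarial(d_on_real, d_on_fake, steps=3):
    """Skeleton of the joint alternating-iteration schedule.
    d_on_real / d_on_fake: discriminator probabilities for real and
    generated maps; real parameter updates are deliberately elided."""
    log = []
    for _ in range(steps):
        # Discriminator turn: push real maps toward label 1, generated toward 0
        d_loss = bce(d_on_real, 1) + bce(d_on_fake, 0)
        # Generator turn: minimize Ladv = -log D(generated), i.e. fool D
        g_loss = bce(d_on_fake, 1)
        log.append(("D", float(d_loss)))
        log.append(("G", float(g_loss)))
    return log

history = train_adversarial(d_on_real=0.9, d_on_fake=0.2)
print(history[0])  # the discriminator update comes first in each iteration
```

Training stops, per the scheme, when d_on_fake can no longer be pushed away from d_on_real, i.e. the discriminator can no longer tell generated samples apart.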

Claims (3)

1. A convolutional network construction method suitable for subway station crowd counting is characterized by comprising the following steps:
step 1, building a deep learning framework as a density generation network:
the crowd counting model is configured by adopting a U-Net algorithm, and the front 13 layers of a basic network VGG16 are used for feature extraction, wherein the VGG comprises 13 convolution layers and three maximum pooling layers;
firstly, marking the crowd by using a Gaussian filtering method on a sample of the acquired crowd map, and generating a crowd density map; calculating the total number of the crowds in the crowd density map;
then, taking the samples of the crowd graph and the corresponding crowd density graph as a combination to be sent to a density generation network for training;
the density generation network comprises a feature extraction module and a density generation module; the feature extraction module is composed of the front 13 layers in the VGG16 model, all uses a 3x3 convolution kernel, and uses 1x1 convolution to obtain a density map;
step 3, optimizing with a loss function;
step 4, training the density generation network as follows:
(a) training with stochastic gradient descent, and initializing the first five convolutional layers of the density generation network from a pre-trained VGG model;
(b) initializing the weights of the newly added convolutional layers randomly from a zero-mean Gaussian distribution with standard deviation 0.01;
(c) in each iteration, feeding a batch of labeled training data into the network and then updating the parameters;
step 5, designing a discrimination network for judging the accuracy of the generated density map;
step 6, optimizing the discrimination network with a loss function;
labeling the generated density map as 0 and the real density map as 1, the discriminator's output representing the probability that the generated density map is a real density map;
adopting an additional adversarial loss function to improve the quality of the generated density map;
step 7, training the adversarial generative network:
predicting crowd density by adversarial training:
optimizing the minimax problem between the generative model and the discriminative model with a joint alternating-iteration training scheme, wherein the generation network is trained to produce density maps accurate enough to fool the discriminator, while the discriminator is trained to distinguish generated density maps from real density-map labels;
meanwhile, the discriminator's output providing the generator with feedback on density-map location and prediction accuracy;
the two networks being trained against each other until the discriminator can no longer correctly judge the samples produced by the generator.
2. The convolutional network construction method suitable for subway station crowd counting as claimed in claim 1, wherein the density map generation of step 2) comprises:
2.1) generating a single-channel image of the same size as the original sample image with all pixels set to 0;
2.2) setting to 1 each point annotated as a human head in the label;
2.3) processing this image with Gaussian filtering to obtain the crowd density map.
3. The convolutional network construction method suitable for subway station crowd counting as claimed in claim 1, wherein in step 3) the loss function is the Euclidean distance.
CN202110250379.1A 2021-03-08 2021-03-08 Convolutional network construction method suitable for subway station crowd counting Pending CN112818945A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110250379.1A CN112818945A (en) 2021-03-08 2021-03-08 Convolutional network construction method suitable for subway station crowd counting


Publications (1)

Publication Number Publication Date
CN112818945A true CN112818945A (en) 2021-05-18

Family

ID=75862967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110250379.1A Pending CN112818945A (en) 2021-03-08 2021-03-08 Convolutional network construction method suitable for subway station crowd counting

Country Status (1)

Country Link
CN (1) CN112818945A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255526A (en) * 2021-05-28 2021-08-13 华中科技大学 Momentum-based confrontation sample generation method and system for crowd counting model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764085A (en) * 2018-05-17 2018-11-06 上海交通大学 Based on the people counting method for generating confrontation network
CN109522857A (en) * 2018-11-26 2019-03-26 山东大学 A kind of Population size estimation method based on production confrontation network model
EP3618077A1 (en) * 2018-08-27 2020-03-04 Koninklijke Philips N.V. Generating metadata for trained model
US20200074186A1 (en) * 2018-08-28 2020-03-05 Beihang University Dense crowd counting method and apparatus
CN110879982A (en) * 2019-11-15 2020-03-13 苏州大学 Crowd counting system and method
CN111191667A (en) * 2018-11-15 2020-05-22 天津大学青岛海洋技术研究院 Crowd counting method for generating confrontation network based on multiple scales




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210518