CN112818945A - Convolutional network construction method suitable for subway station crowd counting - Google Patents

Convolutional network construction method suitable for subway station crowd counting

Info

Publication number
CN112818945A
CN112818945A (application CN202110250379.1A)
Authority
CN
China
Prior art keywords
density
crowd
network
training
density map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110250379.1A
Other languages
Chinese (zh)
Inventor
张正
田青
仝淑贞
张华�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China University of Technology
Original Assignee
North China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China University of Technology filed Critical North China University of Technology
Priority to CN202110250379.1A priority Critical patent/CN112818945A/en
Publication of CN112818945A publication Critical patent/CN112818945A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A convolutional network construction method suitable for subway station crowd counting comprises the following steps. First, a deep learning framework is built as a density generation network: the collected crowd-image samples are annotated and smoothed with a Gaussian filter to generate crowd density maps; the total number of people in each density map is computed; each crowd image and its corresponding density map are paired and fed into the density generation network for training; the network is optimized with a loss function. Next, a discrimination network is designed to judge the accuracy of the generated density maps, and is likewise optimized with a loss function. Finally, the adversarial generative network is trained and crowd density is predicted in an adversarial fashion: the minimax problem between the generative model and the discriminative model is optimized through joint alternating iteration; the discriminator's output provides the generator with feedback on density-map location and prediction accuracy; the two networks are trained against each other until the discriminator can no longer correctly distinguish the samples produced by the generator.

Description

Convolutional network construction method suitable for subway station crowd counting
Technical Field
The invention relates to the field of population counting, in particular to a density generation model construction method based on a convolutional neural network.
Background Art
Crowd counting is an important topic in computer vision. Its main task is to estimate, from an image, the crowd density and the number of people present. In recent years crowd counting has been widely applied in intelligent video surveillance, public safety, intelligent early warning, and related fields. However, because targets deform under changes of viewing angle, occlusion, and scale, crowd counting remains a challenging task, especially in subway station scenes.
Traditional crowd counting methods rely on object detection, with research focused on feature extraction and feature classification, and researchers have accordingly proposed many kinds of features and classifiers. However, because traditional object detection uses hand-designed features, detection accuracy falls short of practical requirements even when the best nonlinear classifier is used for feature classification. Hand-designed features have three main drawbacks: 1) they are low-level features whose expressive power for the target is insufficient; 2) their separability is poor, so classification error rates are high; 3) they are task-specific, making it difficult to capture multi-scale characteristics comprehensively enough for crowd counting. Davies et al. first observed a nearly linear relationship between the pixel features of an image and the total number of people in it: the authors separated the foreground from the whole image by a three-frame difference method, counted the pixel features of the foreground region, and then fit the mapping between feature counts and total people with a regression equation. Researchers subsequently turned to convolutional neural networks (CNNs), using the density map as the regression target.
Disclosure of Invention
The technical problem addressed by the invention is to provide a density generation model construction method based on a convolutional neural network, aimed at the problems that existing crowd counting models perform poorly on complex images and multi-scale targets and are therefore unsuitable for counting crowds in subway stations.
In order to solve the technical problems, the invention adopts the following technical scheme:
a convolutional network construction method suitable for subway station crowd counting comprises the following steps:
step 1, building a deep learning framework as a density generation network:
the crowd counting model is configured by adopting a U-Net algorithm, and the front 13 layers of a basic network VGG16 are used for feature extraction, wherein the VGG comprises 13 convolution layers and three maximum pooling layers;
firstly, marking the crowd by using a Gaussian filtering method on a sample of the acquired crowd map, and generating a crowd density map; calculating the total number of the crowds in the crowd density map;
then, taking the samples of the crowd graph and the corresponding crowd density graph as a combination to be sent to a density generation network for training;
the density generation network comprises a feature extraction module and a density generation module; the feature extraction module is composed of the front 13 layers in the VGG16 model, all uses a 3x3 convolution kernel, and uses 1x1 convolution to obtain a density map;
Step 3: optimize with a loss function;
Step 4: train the density generation network as follows:
(a) train with stochastic gradient descent, initializing the first five convolutional layers of the density generation network from a pre-trained VGG model;
(b) initialize the weights of the newly added convolutional layers randomly from a zero-mean Gaussian distribution with standard deviation 0.01;
(c) in each iteration, feed a batch of labeled training data into the network and then update the parameters;
Step 5: design a discrimination network, i.e. a discriminator (discrimination module), to judge the accuracy of the generated density map;
Step 6: optimize the discrimination network with a loss function;
Label the generated density map as 0 and the real density map as 1; the discriminator's output represents the probability that the generated density map is a real density map;
An additional adversarial loss function is used to improve the quality of the generated density map; the adversarial loss is
Ladv = -log(D(Ic, G(Ic; θ)))
where Ic is the input crowd image and G(Ic; θ) is the density map produced by the generation network from Ic.
And 7: training confrontation generation network
Adopting an antagonistic training mode to predict the crowd density:
the maximum and minimum problems of the generated model and the discrimination model are optimized by adopting a training mode of joint alternating iteration; wherein the training generation network is used to generate an accurate population density map to fool the discriminator, and conversely, the discriminator is trained to discriminate between the generated density map and a true density map label;
meanwhile, the output of the discriminator provides the generator with the feedback of the density map position and the prediction precision;
the two networks compete for training at the same time until the samples generated by the generator cannot be correctly judged by the arbiter.
The density map generation of step 2) comprises the following steps:
2.1) generate a single-channel image of the same size as the original sample image with all pixels set to 0;
2.2) set to 1 each point annotated as a human head in the label;
2.3) process this image with Gaussian filtering to obtain the crowd density map; the Gaussian filter is a fixed-kernel Gaussian filter with parameters chosen as μ = 15 and σ = 4;
In step 3), the loss function is the Euclidean distance.
Drawings
Fig. 1 is a schematic diagram of a network architecture.
Detailed Description
The technical scheme is explained below with reference to the accompanying drawing:
A convolutional network construction method suitable for subway station crowd counting comprises the following steps:
Step 1: build the deep learning framework in PyTorch:
The crowd counting model is configured with a U-Net-style architecture, and the first 13 layers of the backbone network VGG16 are used for feature extraction; this VGG front end comprises 13 convolutional layers and three max-pooling layers.
Step 2: sample treatment:
using a fixed kernel gaussian filter mu-15 and sigma-4 to generate a label density map for the group labels, wherein the method comprises the following steps:
1, generating a single-channel picture with the same size as the original picture, wherein all pixel points are 0;
2 mark the spot with a human head in label as 1;
3 processing the graph through Gaussian filtering to form a graph which is a crowd density graph;
Each crowd image and its density map are then paired and fed into the density generation network for training.
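The sample-processing steps above can be sketched as follows. This is a minimal NumPy illustration rather than the patented implementation: it assumes μ = 15 denotes the (odd) kernel window size and σ = 4 the standard deviation, and it normalizes the kernel so that each annotated head contributes one unit of mass to the map, which is what lets the head count be recovered by summing the density map.

```python
import numpy as np

def gaussian_kernel(size=15, sigma=4.0):
    """Fixed 2-D Gaussian kernel, normalized to sum to 1 so that each
    annotated head contributes exactly 1 to the map's total mass."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def make_density_map(shape, head_points, size=15, sigma=4.0):
    """shape: (H, W); head_points: list of (row, col) head annotations."""
    density = np.zeros(shape, dtype=np.float64)
    k = gaussian_kernel(size, sigma)
    r = size // 2
    H, W = shape
    for (y, x) in head_points:
        # Clip the kernel at the image border so off-image mass is dropped
        y0, y1 = max(0, y - r), min(H, y + r + 1)
        x0, x1 = max(0, x - r), min(W, x + r + 1)
        ky0, kx0 = y0 - (y - r), x0 - (x - r)
        density[y0:y1, x0:x1] += k[ky0:ky0 + (y1 - y0), kx0:kx0 + (x1 - x0)]
    return density

dm = make_density_map((64, 64), [(20, 20), (40, 31)])
print(round(dm.sum(), 3))  # ≈ 2.0: integrating the map recovers the head count
```

Summing the map recovers the total crowd count, which is the property step 1 of the method relies on when it "calculates the total number of people in the crowd density map".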
As shown in fig. 1, the density generation network consists of two parts: a feature extraction module and a density generation module.
In this scheme the feature extraction module consists of the first 13 layers of the VGG16 model, uses 3x3 convolution kernels throughout, and produces the density map with a 1x1 convolution.
Step 3: optimize the density generation network with a loss function.
This scheme uses the Euclidean distance as the loss function:
L(θ) = (1/(2N)) Σ_{i=1}^{N} ||G(Ii; θ) − Di||²
where N is the number of training samples, G(Ii; θ) is the density map estimated for the input image Ii, and Di is the corresponding ground-truth density map.
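The Euclidean-distance loss of step 3 can be computed directly. A small NumPy sketch follows; the batch shapes and the 1/(2N) scaling are illustrative assumptions following common crowd-counting practice, not a quotation of the patent's exact formula:

```python
import numpy as np

def euclidean_loss(pred_maps, gt_maps):
    """Pixel-wise Euclidean (L2) loss averaged over a batch of density maps.
    pred_maps, gt_maps: arrays of shape (N, H, W)."""
    n = pred_maps.shape[0]
    diff = pred_maps - gt_maps
    return np.sum(diff ** 2) / (2.0 * n)

# Toy batch: 2 predicted maps of zeros vs. ground truth of ones
pred = np.zeros((2, 4, 4))
gt = np.ones((2, 4, 4))
print(euclidean_loss(pred, gt))  # 32 unit squared errors over 2N=4 -> 8.0
```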
Step 4: train the network as follows:
(a) train with stochastic gradient descent; to prevent overfitting, initialize the first five convolutional layers of the density generation network from a pre-trained VGG model;
(b) initialize the weights of the newly added convolutional layers randomly from a zero-mean Gaussian distribution with standard deviation 0.01;
(c) in each iteration, feed a batch of labeled training data into the network and then update the parameters;
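Steps (a)-(c) describe standard mini-batch stochastic gradient descent with zero-mean Gaussian initialization (std 0.01). The toy NumPy sketch below illustrates only that training schedule; a one-parameter linear model stands in for the real convolutional network, and all names and hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for (image, density-map) pairs: y = 3*x exactly
X = rng.normal(size=(256, 1))
y = 3.0 * X

# (b) zero-mean Gaussian initialization with standard deviation 0.01
w = rng.normal(0.0, 0.01, size=(1, 1))

lr, batch = 0.1, 32
for epoch in range(50):
    idx = rng.permutation(len(X))
    for s in range(0, len(X), batch):
        # (c) one mini-batch of labelled data per parameter update
        b = idx[s:s + batch]
        grad = X[b].T @ (X[b] @ w - y[b]) / len(b)  # gradient of 0.5*MSE
        w -= lr * grad                              # (a) SGD step

print(float(w[0, 0]))  # converges toward the true coefficient 3.0
```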
Step 5: design a network for judging the accuracy of the generated density map.
Step 6: optimize the discrimination network with a loss function.
the density map label that marks the generated density map as 0 true is marked as 1. The output of the discriminator represents the probability that the generated density map is a true density map. An additional penalty function is used in the method to improve the quality of the generated density map. The loss-fighting function is expressed by
Ladv=-log(D(lc,(G(lc;θ))))
G (lc; theta)) is a density map obtained by generating the network, with the real value of lc.
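A minimal numeric reading of the adversarial term Ladv follows; here `d_out` is an assumed stand-in for the discriminator's conditional output D(Ic, G(Ic; θ)), i.e. its estimated probability that the generated map is real:

```python
import numpy as np

def adversarial_loss(d_out):
    """Ladv = -log D(Ic, G(Ic; theta)): the generator's extra loss term.
    d_out is the discriminator's probability that the generated map is real."""
    return float(-np.log(d_out))

# The closer the discriminator is to being fooled (d_out -> 1),
# the smaller the generator's adversarial penalty.
print(round(adversarial_loss(0.5), 4))   # -log(0.5) ≈ 0.6931
print(round(adversarial_loss(0.99), 4))  # ≈ 0.0101
```

Minimizing Ladv therefore pushes the generator to produce density maps the discriminator scores as real, which is exactly the "fooling" objective of step 7.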
And 7: training the countermeasure generation network.
According to the scheme, a confrontation training mode is adopted for predicting the crowd density, and the maximum and minimum problems are optimized by adopting a combined alternate iteration training mode for generating the model and judging the model. Wherein the training generation network is used to generate an accurate population density map to fool the discriminators, and conversely, the discriminators are trained to discriminate between the generated density map and the true density map labels. At the same time, the output of the discriminator will provide feedback to the generator of the density map location and prediction accuracy. The two networks compete for training at the same time so as to improve the generated effect until the sample generated by the generator cannot be correctly judged by the discriminator.
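The joint alternating iteration can be sketched as the following schedule skeleton. It is illustrative only: the discriminator probabilities are fixed inputs and the actual parameter updates are elided, so it shows just the ordering of the two turns and the loss each turn minimizes (binary cross-entropy for the discriminator, -log D(G) for the generator):

```python
import numpy as np

def bce(p, label):
    """Binary cross-entropy for one probability p against target label 0/1."""
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

def train_adversarial(d_on_real, d_on_fake, steps=3):
    """Skeleton of the joint alternating-iteration schedule.
    d_on_real / d_on_fake: discriminator probabilities for real and
    generated maps; real parameter updates are deliberately elided."""
    log = []
    for _ in range(steps):
        # Discriminator turn: push real maps toward label 1, generated toward 0
        d_loss = bce(d_on_real, 1) + bce(d_on_fake, 0)
        # Generator turn: minimize Ladv = -log D(generated), i.e. fool D
        g_loss = bce(d_on_fake, 1)
        log.append(("D", float(d_loss)))
        log.append(("G", float(g_loss)))
    return log

history = train_adversarial(d_on_real=0.9, d_on_fake=0.2)
print(history[0])  # the discriminator update comes first in each iteration
```

Training stops, per the scheme, when d_on_fake can no longer be pushed away from d_on_real, i.e. the discriminator can no longer tell generated samples apart.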

Claims (3)

1. A convolutional network construction method suitable for subway station crowd counting is characterized by comprising the following steps:
step 1, building a deep learning framework as a density generation network:
the crowd counting model is configured by adopting a U-Net algorithm, and the front 13 layers of a basic network VGG16 are used for feature extraction, wherein the VGG comprises 13 convolution layers and three maximum pooling layers;
firstly, marking the crowd by using a Gaussian filtering method on a sample of the acquired crowd map, and generating a crowd density map; calculating the total number of the crowds in the crowd density map;
then, taking the samples of the crowd graph and the corresponding crowd density graph as a combination to be sent to a density generation network for training;
the density generation network comprises a feature extraction module and a density generation module; the feature extraction module is composed of the front 13 layers in the VGG16 model, all uses a 3x3 convolution kernel, and uses 1x1 convolution to obtain a density map;
step 3, optimizing with a loss function;
step 4, training the density generation network as follows:
(a) training with stochastic gradient descent, and initializing the first five convolutional layers of the density generation network from a pre-trained VGG model;
(b) initializing the weights of the newly added convolutional layers randomly from a zero-mean Gaussian distribution with standard deviation 0.01;
(c) in each iteration, feeding a batch of labeled training data into the network and then updating the parameters;
step 5, designing a discrimination network for judging the accuracy of the generated density map;
step 6, optimizing the discrimination network with a loss function;
labeling the generated density map as 0 and the real density map as 1, the discriminator's output representing the probability that the generated density map is a real density map;
adopting an additional adversarial loss function to improve the quality of the generated density map;
step 7, training the adversarial generative network:
predicting crowd density by adversarial training:
optimizing the minimax problem between the generative model and the discriminative model with a joint alternating-iteration training scheme, wherein the generation network is trained to produce density maps accurate enough to fool the discriminator, while the discriminator is trained to distinguish generated density maps from real density-map labels;
meanwhile, the discriminator's output providing the generator with feedback on density-map location and prediction accuracy;
the two networks being trained against each other until the discriminator can no longer correctly judge the samples produced by the generator.
2. The convolutional network construction method suitable for subway station crowd counting as claimed in claim 1, wherein the density map generation of step 2) comprises:
2.1) generating a single-channel image of the same size as the original sample image with all pixels set to 0;
2.2) setting to 1 each point annotated as a human head in the label;
2.3) processing this image with Gaussian filtering to obtain the crowd density map.
3. The convolutional network construction method suitable for subway station crowd counting as claimed in claim 1, wherein in step 3) the loss function is the Euclidean distance.
CN202110250379.1A 2021-03-08 2021-03-08 Convolutional network construction method suitable for subway station crowd counting Pending CN112818945A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110250379.1A CN112818945A (en) 2021-03-08 2021-03-08 Convolutional network construction method suitable for subway station crowd counting


Publications (1)

Publication Number Publication Date
CN112818945A true CN112818945A (en) 2021-05-18

Family

ID=75862967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110250379.1A Pending CN112818945A (en) 2021-03-08 2021-03-08 Convolutional network construction method suitable for subway station crowd counting

Country Status (1)

Country Link
CN (1) CN112818945A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255526A (en) * 2021-05-28 2021-08-13 华中科技大学 Momentum-based confrontation sample generation method and system for crowd counting model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764085A (en) * 2018-05-17 2018-11-06 上海交通大学 Based on the people counting method for generating confrontation network
CN109522857A (en) * 2018-11-26 2019-03-26 山东大学 A kind of Population size estimation method based on production confrontation network model
EP3618077A1 (en) * 2018-08-27 2020-03-04 Koninklijke Philips N.V. Generating metadata for trained model
US20200074186A1 (en) * 2018-08-28 2020-03-05 Beihang University Dense crowd counting method and apparatus
CN110879982A (en) * 2019-11-15 2020-03-13 苏州大学 Crowd counting system and method
CN111191667A (en) * 2018-11-15 2020-05-22 天津大学青岛海洋技术研究院 Crowd counting method for generating confrontation network based on multiple scales




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210518