WO2020169043A1

WO2020169043A1 - Dense crowd counting method, apparatus and device, and storage medium

Info

Publication number: WO2020169043A1
Application number: PCT/CN2020/075795
Authority: WO
Inventors: 张莉; 陆金刚; 周伟达; 王邦军; 章晓芳; 屈蕴茜; 赵雷
Original assignee: 苏州大学
Priority date: 2019-02-21
Filing date: 2020-02-19
Publication date: 2020-08-27
Also published as: CN109858461A; CN109858461B

Abstract

Provided are a dense crowd counting method, apparatus and device, and a computer-readable storage medium. The method comprises: inputting an image to be tested into a target multi-scale multi-column convolutional neural network model comprising multiple columns of parallel convolutional neural networks, wherein each column of convolutional neural networks comprises multiple convolutional layers with different convolutional kernel sizes and quantities; processing the image to be tested by using each convolutional layer in each column of convolutional neural networks, and fusing feature maps output by pre-selected convolutional layers in each column of convolutional neural networks, so as to obtain estimated density maps output by each column of convolutional neural networks; fusing the estimated density maps output by each column of convolutional neural networks to obtain a target estimated density map of the image to be tested; and calculating the number of people in the image to be tested according to the target estimated density map. By means of the provided method, apparatus and device and computer-readable storage medium, the accuracy of a dense crowd image prediction result is improved.

Description

Method, device, equipment and storage medium for counting dense crowds

This application claims priority to Chinese patent application No. 201910129612.3, filed with the Chinese Patent Office on February 21, 2019 and entitled "A method, device, equipment, and storage medium for dense crowd counting", the entire contents of which are incorporated herein by reference.

Technical field

The present invention relates to the field of computer vision technology, in particular to a method, device, equipment and computer-readable storage medium for counting dense crowds.

Background art

For crowd control and public safety, accurately estimating crowd size from images or videos has become an increasingly important application of computer vision technology. The task of crowd counting in computer vision is to automatically count the number of people in an image or video. Accurate crowd counting is required to support crowd control and public safety in scenarios such as public gatherings and sports events.

Traditional dense crowd counting methods fall into two types: detection-based methods and regression-based methods. Detection-based methods treat the crowd as a set of individually detected entities. However, pedestrians in dense crowds often occlude one another, which makes detection especially challenging when estimating crowds in still images. Regression-based methods regress a scalar value (such as the number of people) or a density map from features extracted from the crowd image. They basically have two steps: first, extract effective features from the crowd image; second, use a regression function to estimate the number of people. However, crowd counting by regression is susceptible to the sharp changes in viewing angle and scale that usually exist in crowd images.

At the same time, deep learning has been successfully applied to the estimation of dense crowd images. The mainstream estimation approach adopts the idea of a density map: a neural network is designed whose input is the original image and whose output is the density map of the crowd. The first step of this kind of method is to apply a Gaussian filter to the ground-truth annotations of the image to obtain the density map corresponding to the image. Zhang et al. proposed a multi-column convolutional neural network in "Single-Image Crowd Counting via Multi-Column Convolutional Neural Network". The network is composed of three parallel columns of convolutional neural networks. Each column uses convolution kernels with a different receptive field size, corresponding to human heads of different scales; the columns have the same composition except for the size and number of convolution kernels; max pooling and the ReLU activation function are adopted; finally, the feature maps of the three columns are concatenated along the channel dimension, and a convolution kernel is used to map them to the output estimated density map. However, the structure of the multi-column convolutional neural network is simple and the number of layers is small. Some features extracted by earlier convolutional layers may be discarded in subsequent processing, so that the extracted features are insufficient, which affects the final result.

In summary, it can be seen that how to improve the accuracy of the prediction results of dense crowds is a problem to be solved at present.

Summary of the invention

The purpose of the present invention is to provide a method, device, equipment, and computer-readable storage medium for dense crowd counting, to solve the problem of poor performance of the neural networks for dense crowd counting provided in the prior art.

In order to solve the above technical problems, the present invention provides a dense crowd counting method, including: inputting the image to be tested into a pre-trained target multi-scale multi-column convolutional neural network model, wherein the target multi-scale multi-column convolutional neural network model includes multiple columns of parallel convolutional neural networks, and each column of convolutional neural network includes multiple convolutional layers with different sizes and numbers of convolution kernels; inputting the image to be tested into each column of convolutional neural network, processing the image to be tested with each convolutional layer in each column, and fusing the feature maps output by the preselected convolutional layers in each column, so as to obtain the estimated density map output by each column of convolutional neural network; fusing the estimated density maps output by the columns to obtain the target estimated density map of the image to be tested; and calculating the number of people in the image to be tested according to the target estimated density map of the image to be tested.

Preferably, before the image to be tested is input into the pre-trained target multi-scale multi-column convolutional neural network model, the method includes:

After performing filtering processing on the pre-created crowd image data set by using a Gaussian filter, a density map of each image in the crowd image data set is obtained, thereby constructing a target training set;

The target training set is used to train the multi-scale and multi-column convolutional neural network model to obtain the target multi-scale and multi-column convolutional neural network model after the training is completed.

Preferably, filtering the pre-created crowd image data set with a Gaussian filter to obtain a density map of each image in the crowd image data set, thereby constructing the target training set, includes:

Obtaining the pre-collected crowd image data set $\{(X_i, Y_i)\}_{i=1}^{N}$,

where $X_i$ is the $i$-th image in the crowd image data set, of size $m \times n$; $Y_i$ is the head coordinate point map corresponding to the $i$-th image, also of size $m \times n$; and $N$ is the total number of images in the crowd image data set;

Filtering each image $X_i$ in the crowd image data set with a Gaussian filter

to obtain the density map $M_i$ of each image $X_i$, and using the density maps $M_i$ of the images $X_i$ to construct the target training set $\{(X_i, M_i)\}_{i=1}^{N}$.
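The density-map construction described above can be illustrated with a minimal sketch. It assumes a fixed Gaussian width `sigma`, a kernel truncated at about three standard deviations, and renormalization of kernels clipped at the image border; none of these choices is stated in this document, and the function names are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a (2*radius+1) x (2*radius+1) Gaussian kernel that sums to 1."""
    size = 2 * radius + 1
    kernel = [
        [math.exp(-((r - radius) ** 2 + (c - radius) ** 2) / (2.0 * sigma ** 2))
         for c in range(size)]
        for r in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


def density_map(point_map, sigma=4.0):
    """Turn a head point map Y_i (m x n, 1 at each head) into a density map M_i."""
    m, n = len(point_map), len(point_map[0])
    radius = max(1, int(3 * sigma))  # assumption: truncate at about 3 sigma
    kernel = gaussian_kernel(sigma, radius)
    density = [[0.0] * n for _ in range(m)]
    for r in range(m):
        for c in range(n):
            heads = point_map[r][c]
            if heads == 0:
                continue
            # Keep only the kernel cells that fall inside the image.
            cells = [
                (r + dr, c + dc, kernel[dr + radius][dc + radius])
                for dr in range(-radius, radius + 1)
                for dc in range(-radius, radius + 1)
                if 0 <= r + dr < m and 0 <= c + dc < n
            ]
            # Assumption: renormalize clipped kernels so each head still adds 1.
            mass = sum(weight for _, _, weight in cells)
            for rr, cc, weight in cells:
                density[rr][cc] += heads * weight / mass
    return density
```

With this normalization the density map integrates to the number of annotated heads, which is the property that lets the count be read off later by summing pixel values.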

Preferably, training the multi-scale multi-column convolutional neural network model using the target training set includes:

Input the current crowd image in the target training set into each column of the convolutional neural network of the multi-scale and multi-column convolutional neural network model;

Wherein, the columns of convolutional neural networks in the multi-scale multi-column convolutional neural network model are parallel to one another, and all columns have the same network structure except for the size and number of convolution kernels;

After the estimated density maps of the current crowd image output by the columns are concatenated along the channel dimension, the result passes through a final convolutional layer with a convolution kernel size of 1*1, which maps the concatenated feature maps to the target estimated density map of the current crowd image; this target estimated density map is taken as the network output of the multi-scale multi-column convolutional neural network model.
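As a rough illustration of this fusion step: a 1*1 convolution with a single output channel applied to the column density maps stacked along the channel dimension is a per-pixel weighted sum plus a bias. The sketch below assumes that form; `fuse_columns`, the weights, and the bias are hypothetical stand-ins for parameters that would be learned during training.

```python
def fuse_columns(column_maps, weights, bias=0.0):
    """Fuse per-column density maps with a single-output 1*1 convolution.

    column_maps: list of K density maps, each a list of rows (H x W).
    weights:     K scalars, one per input channel (hypothetical values).
    """
    if len(column_maps) != len(weights):
        raise ValueError("need exactly one weight per column")
    height, width = len(column_maps[0]), len(column_maps[0][0])
    for density in column_maps:
        if len(density) != height or any(len(row) != width for row in density):
            raise ValueError("all column maps must have the same size")
    # A 1*1 kernel never looks at neighbouring pixels: each output pixel
    # depends only on the K channel values at the same position.
    return [
        [bias + sum(w * density[r][c] for w, density in zip(weights, column_maps))
         for c in range(width)]
        for r in range(height)
    ]
```

For example, `fuse_columns([a, b, c], [0.5, 0.25, 0.25])` blends three column outputs of identical size into one target estimated density map.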

Preferably, each column of the convolutional neural network of the multi-scale and multi-column convolutional neural network model includes:

The first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer, the deconvolutional layer, the sixth convolutional layer, and the seventh convolutional layer;

Wherein, the convolution kernel size of the first convolutional layer differs from that of the other convolutional layers; the second, third, fourth, fifth and sixth convolutional layers have the same convolution kernel size; and the third, fourth, fifth and sixth convolutional layers have the same number of convolution kernels;

The pooling layers between the first, second, third and fourth convolutional layers use max pooling over a 2*2 region with a stride of 2;

The pooling layer between the fourth convolutional layer and the fifth convolutional layer uses max pooling over a 3*3 region with a stride of 1, so that the feature map output by the fourth convolutional layer keeps the same size after pooling;

The activation function of each convolutional layer adopts the ReLU function;

The feature map output by the fourth convolutional layer and the feature map output by the fifth convolutional layer are concatenated along the channel dimension and then input to the deconvolutional layer. The feature map output by the deconvolutional layer and the feature map output by the third convolutional layer are concatenated along the channel dimension and then input to the sixth convolutional layer. The seventh convolutional layer outputs the estimated density map of the image to be tested as the output of each column of convolutional neural network.
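The phrase "concatenated along the channel dimension" can be made concrete with a small sketch. It assumes feature maps stored as nested lists indexed `[channel][row][col]`; the helper names are hypothetical. The size check shows why the stride-1 pooling matters: concatenation is only defined when both feature maps share the same height and width.

```python
def zeros(channels, height, width):
    """Build an all-zero feature map indexed as [channel][row][col]."""
    return [[[0.0] * width for _ in range(height)] for _ in range(channels)]


def concat_channels(first, second):
    """Concatenate two feature maps along the channel axis."""
    def spatial_size(feature_map):
        return len(feature_map[0]), len(feature_map[0][0])

    if spatial_size(first) != spatial_size(second):
        raise ValueError("feature maps must share height and width")
    # Channels of `first` come first, followed by the channels of `second`.
    return first + second


# Stand-ins for the fourth- and fifth-layer outputs (2 channels each, 3 x 4).
conv4_out = zeros(2, 3, 4)
conv5_out = zeros(2, 3, 4)
merged = concat_channels(conv4_out, conv5_out)  # 4 channels, still 3 x 4
```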

Preferably, calculating the number of people in the image to be tested according to the target estimated density map of the image to be tested includes:

Inputting the image T to be tested into the target multi-scale multi-column convolutional neural network model to obtain the estimated density map of the image T to be tested;

Calculating the sum of all pixel values in the estimated density map to obtain the number of people in the image to be tested.
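The counting step amounts to a single summation, sketched here. The function name `count_people` is hypothetical, and rounding the result to an integer is left to the caller because this document does not specify it.

```python
import math


def count_people(density):
    """Return the estimated head count: the sum of all density-map pixels."""
    # math.fsum limits floating-point drift when adding many small values.
    return math.fsum(value for row in density for value in row)


# A toy 2 x 2 density map whose pixel values add up to two people.
estimated = count_people([[0.25, 0.25], [0.5, 1.0]])
```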

The present invention also provides a device for counting dense crowds, including:

The input module is used to input the image to be tested into the pre-trained target multi-scale multi-column convolutional neural network model, wherein the target multi-scale multi-column convolutional neural network model includes multiple columns of parallel convolutional neural networks, and each column of convolutional neural network includes multiple convolutional layers with different sizes and numbers of convolution kernels;

The processing module is configured to input the image to be tested into each column of convolutional neural network, use each convolutional layer in each column to process the image to be tested, and fuse the feature maps output by the preselected convolutional layers in each column, so as to obtain the estimated density map output by each column of convolutional neural network;

The output module is used to fuse the estimated density map output by each column of the convolutional neural network to obtain the target estimated density map of the image to be tested;

The calculation module is used to calculate the number of people in the image to be tested according to the target estimated density map of the image to be tested.

Preferably, the output module includes:

The training module is used to filter the pre-created crowd image data set by using a Gaussian filter, and then obtain the density map of each image in the crowd image data set, thereby constructing a target training set;

The present invention also provides a device for counting dense crowds, including: a memory for storing a computer program; and a processor for implementing the steps of the above-mentioned dense crowd counting method when executing the computer program.

The present invention also provides a computer-readable storage medium having a computer program stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above-mentioned dense crowd counting method are realized.

The dense crowd counting method provided by the present invention uses a pre-trained target multi-scale multi-column convolutional neural network model to predict the image to be tested. The target model includes multiple columns of parallel convolutional neural networks. After the image to be tested is input into the target model, it is input into each column separately. Each column includes multiple convolutional layers with different sizes and numbers of convolution kernels; the different convolutional layers in each column process the image to be tested, and the feature maps output by the preselected convolutional layers in each column are fused, so that features of the image at different scales are extracted. This solves the problem in the prior art that some features extracted by earlier convolutional layers may be discarded in subsequent processing, leaving insufficient features and reducing the accuracy of the prediction result for the image to be tested. The method provided by the present invention introduces the multi-scale idea: features extracted by earlier convolutional layers are combined with features extracted by later convolutional layers, that is, features with different levels of detail are combined. This compensates for features that may be discarded when the feature maps produced by the earlier convolutional layers of a conventional neural network are pooled, and improves both the performance of the dense crowd counting neural network and the accuracy of the prediction result for dense crowd images.

Description of the drawings

In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show merely some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative work.

FIG. 1 is a flowchart of a first specific embodiment of a method for counting dense crowds provided by the present invention;

FIG. 2 is a structural diagram of the multi-scale multi-column convolutional neural network provided by the present invention;

FIG. 3 is a flowchart of a second specific embodiment of the method for counting dense crowds provided by the present invention;

FIG. 4 is a structural block diagram of a device for counting dense crowds according to an embodiment of the present invention.

Detailed description

The core of the present invention is to provide a dense crowd counting method, device, equipment and computer readable storage medium, which improve the performance of the dense crowd counting neural network and the accuracy of the dense crowd image prediction result.

In order to enable those skilled in the art to better understand the solution of the present invention, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

Please refer to Fig. 1, which is a flowchart of a first specific embodiment of a method for counting dense crowds provided by the present invention; the specific operation steps are as follows:

Step S101: Input the image to be tested into the pre-trained target multi-scale multi-column convolutional neural network model, where the target multi-scale multi-column convolutional neural network model includes multiple columns of parallel convolutional neural networks, and each column of convolutional neural network includes multiple convolutional layers with different sizes and numbers of convolution kernels;

Before inputting the image to be tested into the pre-trained target multi-scale and multi-column convolutional neural network model, it is necessary to train the multi-scale and multi-column convolutional neural network (SaMCNN).

When training the multi-scale multi-column convolutional neural network, a Gaussian filter is first used to filter the pre-created crowd image data set $\{(X_i, Y_i)\}_{i=1}^{N}$, which yields the density map $M_i$ of each image $X_i$ in the crowd image data set and thereby the target training set $\{(X_i, M_i)\}_{i=1}^{N}$.

Here, $X_i$ is the $i$-th image in the crowd image data set, of size $m \times n$; $Y_i$ is the head coordinate point map corresponding to the $i$-th image, also of size $m \times n$; and $N$ is the total number of images in the crowd image data set.

The target training set $\{(X_i, M_i)\}_{i=1}^{N}$ is then used to train the multi-scale multi-column convolutional neural network model, giving the target multi-scale multi-column convolutional neural network model once training is completed.

As shown in FIG. 2, the multi-scale multi-column convolutional neural network may include multiple columns of convolutional neural networks; in this embodiment, three columns are taken as an example. Each column of convolutional neural network includes a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a deconvolutional layer, a sixth convolutional layer, and a seventh convolutional layer. The convolution kernel size of the first convolutional layer differs from that of the other convolutional layers; the second, third, fourth, fifth and sixth convolutional layers have the same convolution kernel size; and the third, fourth, fifth and sixth convolutional layers have the same number of convolution kernels. The activation function of each convolutional layer is the ReLU function.

The pooling layers between the first, second, third and fourth convolutional layers use max pooling over a 2*2 region with a stride of 2; the pooling layer between the fourth convolutional layer and the fifth convolutional layer uses max pooling over a 3*3 region with a stride of 1, so that the feature map output by the fourth convolutional layer keeps the same size after pooling.
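The effect of these two pooling settings on feature-map size can be checked with the standard output-size formula, out = floor((in + 2 * padding - kernel) / stride) + 1. The sketch below is illustrative only: the padding of 1 for the 3*3 pooling is an assumption needed to keep the size unchanged and is not stated in this document, and `pool_output_size` is a hypothetical helper.

```python
def pool_output_size(size, kernel, stride, padding=0):
    """Output length along one axis for a pooling (or convolution) window."""
    return (size + 2 * padding - kernel) // stride + 1


# 2*2 max pooling with stride 2 halves each dimension.
halved = pool_output_size(768, kernel=2, stride=2)              # 384

# 3*3 max pooling with stride 1 keeps the size only if the input is padded
# by one pixel on each side (assumption, not stated in the source).
same = pool_output_size(96, kernel=3, stride=1, padding=1)      # 96

# Without padding the same window would shrink the map by two pixels.
shrunk = pool_output_size(96, kernel=3, stride=1, padding=0)    # 94
```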

Step S102: Input the image to be tested into each column of convolutional neural network, use each convolutional layer in each column to process the image to be tested, and fuse the feature maps output by the preselected convolutional layers in each column, so as to obtain the estimated density map output by each column of convolutional neural network;

The image to be tested is input into the target multi-scale multi-column convolutional neural network model, that is, into each column of convolutional neural network of the model. The convolutional layers and pooling layers in each column process the image to be tested. Between the fourth and fifth convolutional layers of each column, max pooling over a 3*3 region with a stride of 1 is used, so that the feature map has the same size before and after pooling and the feature maps output by these two convolutional layers can be concatenated along the channel dimension. After the fifth convolutional layer, the deconvolutional layer up-samples the preceding feature maps, and the result is then concatenated along the channel dimension with the feature maps obtained by the third convolutional layer.
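The data flow through one column can be traced in terms of feature-map shapes. The sketch below rests on several assumptions that this document does not state: every convolution preserves height and width, the 3*3 pooling is padded so the size is unchanged, the deconvolutional layer up-samples by a factor of 2 and outputs `k` channels, and the kernel counts `k1`, `k2`, `k` are placeholder values. Only the equal kernel count of the third to sixth convolutional layers comes from the description.

```python
def trace_column(height, width, k1=16, k2=32, k=16):
    """Trace (channels, height, width) after each stage of one column."""
    if height % 8 or width % 8:
        raise ValueError("this sketch assumes height and width divisible by 8")
    shapes = {}
    h, w = height, width
    shapes["conv1"] = (k1, h, w)
    h, w = h // 2, w // 2              # 2*2 max pooling, stride 2
    shapes["conv2"] = (k2, h, w)
    h, w = h // 2, w // 2              # 2*2 max pooling, stride 2
    shapes["conv3"] = (k, h, w)
    h, w = h // 2, w // 2              # 2*2 max pooling, stride 2
    shapes["conv4"] = (k, h, w)
    # 3*3 max pooling, stride 1: size unchanged (padding of 1 assumed).
    shapes["conv5"] = (k, h, w)
    shapes["concat(conv4, conv5)"] = (2 * k, h, w)
    h, w = 2 * h, 2 * w                # deconvolution, x2 up-sampling assumed
    shapes["deconv"] = (k, h, w)
    shapes["concat(deconv, conv3)"] = (2 * k, h, w)
    shapes["conv6"] = (k, h, w)
    shapes["conv7"] = (1, h, w)        # single-channel estimated density map
    return shapes


shapes = trace_column(768, 1024)
```

Under these assumptions, a 768*1024 input yields a column density map at one quarter of the input resolution, and the up-sampled feature map matches the third-layer output in size, which is what makes the second concatenation possible.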

Step S103: After fusing the estimated density maps output by each column of the convolutional neural network, the target estimated density map of the image to be tested is obtained;

Step S104: According to the target estimated density map of the image to be tested, the number of people in the image to be tested is calculated.

The method provided in this embodiment uses a multi-scale multi-column convolutional neural network to test the image to be tested. Compared with the multi-column convolutional neural network, the multi-scale multi-column convolutional neural network increases the number of layers in each column and introduces the multi-scale idea, combining the feature maps extracted by earlier convolutional layers with the feature maps extracted by later convolutional layers; this improves the performance of the neural network for counting dense crowds and the accuracy of the prediction results for dense crowd images.

Based on the above embodiment, in this embodiment the second part of the ShanghaiTech data set can be selected as the crowd image data set, and the density maps of the images in this second part are used to train the multi-scale multi-column convolutional neural network model. Please refer to FIG. 3, which is a flowchart of a second specific embodiment of the method for counting dense crowds provided by the present invention; the specific operation steps are as follows:

Step S301: Filter the crowd images in the second part of the ShanghaiTech data set with a Gaussian filter to obtain the density maps of the crowd images in the second part, and use them to construct the target training set;

In this embodiment, the second part of the ShanghaiTech data set can be selected as the crowd image data set $\{(X_i, Y_i)\}_{i=1}^{N}$,

where $X_i$ is the $i$-th image of the crowd image data set, with a size of 768*1024; $Y_i$ is the head coordinate point map corresponding to the $i$-th image, with a size of 768*1024; and $N$ is the total number of images in the crowd image data set.
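A head coordinate point map of this size could be built from a list of annotated head centers roughly as follows. This is a sketch under stated assumptions: the `(row, col)` coordinate format, the truncation to integer pixels, and the clamping of out-of-range annotations are all hypothetical choices, and `head_point_map` is not a function from any data set toolkit.

```python
def head_point_map(head_coords, height=768, width=1024):
    """Build a point map with one unit of mass per annotated head.

    head_coords: iterable of (row, col) head-center positions in pixels
                 (hypothetical annotation format).
    """
    point_map = [[0] * width for _ in range(height)]
    for row, col in head_coords:
        # Assumption: truncate to a pixel index and clamp to the image.
        r = min(max(int(row), 0), height - 1)
        c = min(max(int(col), 0), width - 1)
        # Accumulate so that two heads in the same pixel are both counted.
        point_map[r][c] += 1
    return point_map


heads = [(10.2, 20.7), (767.9, 1023.4), (10.9, 20.1)]
point_map = head_point_map(heads)
```

Accumulating instead of setting the pixel to 1 keeps the sum of the point map equal to the number of annotations, even in very dense regions.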

The ShanghaiTech data set contains 1,198 annotated images and 330,165 head-center annotations. It is divided into two parts: the first part includes 482 images randomly crawled from the Internet, of which 300 are used for training and 182 for testing; the second part includes 716 images taken on the streets of Shanghai, of which 400 are used for training and 316 for testing.

Step S302: Use the target training set to train the multi-scale multi-column convolutional neural network model to obtain the target multi-scale multi-column convolutional neural network model after training;

Step S303: Input the image T to be tested into the target multi-scale multi-column convolutional neural network model, where the target multi-scale multi-column convolutional neural network model includes multiple columns of parallel convolutional neural networks, and each column of convolutional neural network includes multiple convolutional layers with different sizes and numbers of convolution kernels;

Step S304: After the image T to be tested has been input into the target multi-scale multi-column convolutional neural network model, output the estimated density map of the image T to be tested;

Step S305: Calculate the sum of all pixel values in the estimated density map to obtain the number of people in the image T to be tested.

The multi-scale multi-column convolutional neural network model provided in this embodiment and the multi-column convolutional neural network model are compared on the same data set for crowd counting. As can be seen from Table 1, the mean absolute error (MAE) and mean square error (MSE) of the counting results of the network model proposed in this embodiment are both smaller than those of the prior-art network model, so better performance is obtained.

Table 1: Comparison of crowd counting results
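For reference, the two metrics named above can be computed from per-image true and estimated counts as sketched below. The formulas are not given in this document, so this is an assumption-laden sketch: `mse` returns the plain mean of squared errors by default, while `root=True` gives the square-root form that crowd-counting papers such as the multi-column convolutional neural network paper report under the name MSE. The function names are hypothetical.

```python
import math


def mae(true_counts, pred_counts):
    """Mean absolute error between true and estimated per-image counts."""
    if len(true_counts) != len(pred_counts) or not true_counts:
        raise ValueError("need two non-empty lists of equal length")
    errors = [abs(t - p) for t, p in zip(true_counts, pred_counts)]
    return sum(errors) / len(errors)


def mse(true_counts, pred_counts, root=False):
    """Mean squared error; with root=True, its square root."""
    if len(true_counts) != len(pred_counts) or not true_counts:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(t - p) ** 2 for t, p in zip(true_counts, pred_counts)]
    mean_squared = sum(squared) / len(squared)
    return math.sqrt(mean_squared) if root else mean_squared
```

MAE reflects the average accuracy of the estimates, while the squared-error metric penalizes large misses more heavily and so is often read as a measure of robustness.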

Please refer to FIG. 4, which is a structural block diagram of a device for counting dense crowds according to an embodiment of the present invention. The device may specifically include:

The input module 100 is used to input the image to be tested into a pre-trained target multi-scale multi-column convolutional neural network model, wherein the target multi-scale multi-column convolutional neural network model includes multiple columns of parallel convolutional neural networks, and each column of convolutional neural network includes multiple convolutional layers with different sizes and numbers of convolution kernels;

The processing module 200 is configured to input the image to be tested into each column of convolutional neural network, use each convolutional layer in each column to process the image to be tested, and fuse the feature maps output by the preselected convolutional layers in each column, so as to obtain the estimated density map output by each column of convolutional neural network;

The output module 300 is configured to fuse the estimated density map output by each column of the convolutional neural network to obtain the target estimated density map of the image to be tested;

The calculation module 400 is configured to calculate the number of people in the image to be tested according to the target estimated density map of the image to be tested.

The device for counting dense crowds of this embodiment is used to implement the aforementioned method for counting dense crowds, so its specific implementation can be found in the foregoing method embodiments. For example, the input module 100, the processing module 200, the output module 300, and the calculation module 400 are respectively used to implement steps S101, S102, S103, and S104 of the above-mentioned dense crowd counting method. For the specific implementation, please refer to the description of the corresponding parts of the embodiments, which is not repeated here.

Specific embodiments of the present invention also provide a device for counting dense crowds, including: a memory for storing a computer program; and a processor for implementing the steps of the above-mentioned dense crowd counting method when executing the computer program.

A specific embodiment of the present invention also provides a computer-readable storage medium having a computer program stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above-mentioned dense crowd counting method are realized.

The various embodiments in this specification are described in a progressive manner, and each embodiment focuses on the differences from other embodiments, and the same or similar parts between the various embodiments can be referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant part can be referred to the description of the method part.

Those skilled in the art may further realize that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. In order to clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms according to function. Whether these functions are executed by hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered as going beyond the scope of the present invention.

The steps of the method or algorithm described in the embodiments disclosed herein can be implemented directly by hardware, by a software module executed by a processor, or by a combination of the two. The software module can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the technical field.

The method, device, equipment and computer-readable storage medium for counting dense crowds provided by the present invention have been introduced in detail above. Specific examples are used in this article to illustrate the principle and implementation of the present invention. The description of the above examples is only used to help understand the method and core idea of the present invention. It should be pointed out that for those of ordinary skill in the art, without departing from the principle of the present invention, several improvements and modifications can be made to the present invention, and these improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims

1. A method for counting dense crowds, characterized by comprising:

inputting the image to be tested into a pre-trained target multi-scale multi-column convolutional neural network model, wherein the target multi-scale multi-column convolutional neural network model includes multiple columns of parallel convolutional neural networks, and each column of convolutional neural network includes multiple convolutional layers with different sizes and numbers of convolution kernels;

inputting the image to be tested into each column of convolutional neural network, using each convolutional layer in each column to process the image to be tested, and fusing the feature maps output by the preselected convolutional layers in each column, so as to obtain the estimated density map output by each column of convolutional neural network;

After fusing the estimated density maps output by each column of the convolutional neural network, the target estimated density map of the image to be tested is obtained;

According to the target estimated density map of the image to be tested, the number of people in the image to be tested is calculated.
2. The method according to claim 1, wherein, before the inputting of the image to be tested into the pre-trained target multi-scale and multi-column convolutional neural network model, the method comprises:

After performing filtering processing on the pre-created crowd image data set by using a Gaussian filter, a density map of each image in the crowd image data set is obtained, thereby constructing a target training set;

The target training set is used to train the multi-scale and multi-column convolutional neural network model to obtain the target multi-scale and multi-column convolutional neural network model after the training is completed.
3. The method according to claim 2, wherein the filtering of the pre-created crowd image data set by using a Gaussian filter to obtain a density map of each image in the crowd image data set, thereby constructing the target training set, comprises:

Obtain the pre-collected crowd image data set $\{(X_i, Y_i)\}_{i=1}^{N}$, where $X_i$ is the $i$-th image in the crowd image data set, of size $m \times n$; $Y_i$ is the head coordinate point map corresponding to the $i$-th image, of size $m \times n$; and $N$ is the total number of images in the crowd image data set;

Use the Gaussian filter to filter each image $X_i$ in the crowd image data set $\{(X_i, Y_i)\}_{i=1}^{N}$ to obtain the density map $M_i$ of each image $X_i$, and use the density map $M_i$ of each image $X_i$ to construct the target training set $\{(X_i, M_i)\}_{i=1}^{N}$.
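The density map construction described above can be illustrated with a minimal sketch. It assumes that the Gaussian filter is applied to the head coordinate point map of each image (one unit of mass per annotated head), that a fixed bandwidth `sigma` is used, and that the border is zero-padded; none of these details is specified in the claims, and the function names `gaussian_kernel_1d` and `density_map` are hypothetical.

```python
import numpy as np

def gaussian_kernel_1d(sigma: float, radius: int) -> np.ndarray:
    """Return a 1-D Gaussian kernel normalised to sum to 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def density_map(point_map: np.ndarray, sigma: float = 4.0) -> np.ndarray:
    """Blur a head coordinate point map into a density map of the same size.

    sigma is an assumed fixed bandwidth; the claims do not give a value.
    """
    radius = int(3 * sigma)
    k = gaussian_kernel_1d(sigma, radius)
    # Zero-pad so that the output keeps the m x n size of the input.
    padded = np.pad(point_map.astype(np.float64), radius)
    # Separable filtering: filter every row, then every column.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

# Toy point map with three annotated heads, all far from the border.
Y = np.zeros((64, 64))
for r, c in [(20, 20), (30, 40), (45, 25)]:
    Y[r, c] = 1.0
M = density_map(Y, sigma=2.0)
```

Because the kernel is normalised, each head contributes a total mass of one to the density map as long as its kernel window lies inside the image, so the sum of the map matches the number of annotated heads.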
4. The method according to claim 2, wherein said training a multi-scale and multi-column convolutional neural network model by using the target training set comprises:

Input the current crowd image in the target training set into each column of the convolutional neural network of the multi-scale and multi-column convolutional neural network model;

Wherein, each column of the convolutional neural network in the multi-scale and multi-column convolutional neural network model is parallel to each other, and the convolutional neural network of each column has the same network structure except for the size and number of convolution kernels;

After the estimated density maps of the current crowd image output by each column of the convolutional neural network are concatenated along the channel dimension, the result passes through a total convolutional layer with a convolution kernel size of 1*1, and the feature map output by the total convolutional layer is mapped to the target estimated density map of the current crowd image, so that the target estimated density map of the current crowd image is used as the network output of the multi-scale and multi-column convolutional neural network model.
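The fusion step described above can be sketched as follows. The sketch assumes that each column outputs a single-channel estimated density map of the same height and width, and that the total convolutional layer has one output channel; the function name `fuse_columns` and the weight values are hypothetical, since in the claimed method the 1*1 kernel is learned during training.

```python
import numpy as np

def fuse_columns(column_maps, weights, bias=0.0):
    """Concatenate per-column density maps on the channel axis and apply a 1*1 convolution.

    column_maps: list of arrays shaped (H, W), one per column (assumed single-channel).
    weights: array shaped (C,), one 1*1 kernel weight per input channel.
    """
    stacked = np.stack(column_maps, axis=0)  # shape (C, H, W)
    # A 1*1 convolution with one output channel is a per-pixel weighted sum over channels.
    return np.tensordot(weights, stacked, axes=([0], [0])) + bias  # shape (H, W)

# Three toy column outputs with constant values 1, 2 and 3.
maps = [np.full((4, 5), v, dtype=np.float64) for v in (1.0, 2.0, 3.0)]
fused = fuse_columns(maps, weights=np.array([0.5, 0.25, 0.25]))
```

With these illustrative weights every pixel of `fused` equals 0.5*1 + 0.25*2 + 0.25*3 = 1.75, which shows that the 1*1 kernel mixes the columns pixel by pixel without looking at neighbouring pixels.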
5. The method according to claim 4, wherein each column of the convolutional neural network of the multi-scale and multi-column convolutional neural network model comprises:

The first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer, the deconvolutional layer, the sixth convolutional layer, and the seventh convolutional layer;

Wherein, the convolution kernel size of the first convolutional layer differs from that of the other convolutional layers; the convolution kernel sizes of the second convolutional layer, the third convolutional layer, the fourth convolutional layer, and the fifth convolutional layer are the same as that of the sixth convolutional layer; and the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer, and the sixth convolutional layer have the same number of convolution kernels;

The pooling layers between the first convolutional layer, the second convolutional layer, the third convolutional layer, and the fourth convolutional layer use max pooling over a 2*2 region with a stride of 2;

The pooling layer between the fourth convolutional layer and the fifth convolutional layer uses max pooling over a 3*3 region with a stride of 1, so that the size of the feature map output by the fourth convolutional layer remains unchanged after pooling;

The activation function of each convolutional layer adopts the ReLU function;

The feature map output by the fourth convolutional layer and the feature map output by the fifth convolutional layer are concatenated along the channel dimension and then input to the deconvolutional layer; the feature map output by the deconvolutional layer and the feature map output by the third convolutional layer are concatenated along the channel dimension and then input to the sixth convolutional layer; and the seventh convolutional layer outputs the estimated density map of the image to be tested as the output of each column of the convolutional neural network model.
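The effect of the two pooling configurations on feature map size can be illustrated with a minimal sketch. It assumes a single-channel floating-point feature map and, for the 3*3 pooling with a stride of 1, a padding of one pixel on each side; the claims state only that the size remains unchanged, so the padding value and the function name `max_pool2d` are assumptions.

```python
import numpy as np

def max_pool2d(x, size, stride, pad=0):
    """Max pooling over a 2-D float feature map, with optional -inf padding."""
    if pad:
        # Padding with -inf guarantees that padded cells never win the maximum.
        x = np.pad(x, pad, constant_values=-np.inf)
    h = (x.shape[0] - size) // stride + 1
    w = (x.shape[1] - size) // stride + 1
    out = np.empty((h, w), dtype=x.dtype)
    for i in range(h):
        for j in range(w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

feat = np.arange(64, dtype=np.float64).reshape(8, 8)
halved = max_pool2d(feat, size=2, stride=2)         # 2*2 region, stride 2
same = max_pool2d(halved, size=3, stride=1, pad=1)  # 3*3 region, stride 1, assumed padding 1
```

In this sketch the 2*2 pooling with a stride of 2 halves each spatial dimension (8*8 becomes 4*4), while the 3*3 pooling with a stride of 1 keeps the 4*4 size. This is consistent with the fourth and fifth convolutional layers producing feature maps that can be concatenated along the channel dimension.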
6. The method according to any one of claims 1 to 5, wherein the calculating the number of people in the image to be tested according to the target estimated density map of the image to be tested comprises:

Input the image T to be tested into the target multi-scale multi-column convolutional neural network model to obtain the estimated density map of the image T to be tested; then calculate the sum of all pixel values in the estimated density map to obtain the number of people in the image T to be tested.
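The counting step can be sketched in a few lines, assuming the estimated density map is available as a 2-D array; the function name `count_people` and the toy values are hypothetical.

```python
import numpy as np

def count_people(density: np.ndarray) -> float:
    """Return the estimated head count: the sum of all density map pixel values."""
    return float(np.sum(density))

# Toy density map: two people, each spread over a small patch.
density = np.zeros((6, 6))
density[1:3, 1:3] = 0.25  # four pixels, total mass 1
density[4, 2:4] = 0.5     # two pixels, total mass 1
estimated = count_people(density)
```

The result is generally a non-integer value for a real estimated density map, since the network output is continuous.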
7. A device for counting dense crowds, characterized in that it comprises:

The input module is used to input the image to be tested into the pre-trained target multi-scale and multi-column convolutional neural network model, wherein the target multi-scale and multi-column convolutional neural network model includes multiple columns of parallel convolutional neural networks, and each column of the convolutional neural network includes multiple convolutional layers with different sizes and numbers of convolution kernels;

The processing module is configured to input the image to be tested into each column of the convolutional neural network, use each convolutional layer in each column of the convolutional neural network to process the image to be tested, and fuse the feature maps output by the preselected convolutional layers in each column of the convolutional neural network, so as to obtain the estimated density map output by each column of the convolutional neural network;

The output module is used to fuse the estimated density map output by each column of the convolutional neural network to obtain the target estimated density map of the image to be tested;

The calculation module is used to calculate the number of people in the image to be tested according to the target estimated density map of the image to be tested.
8. The device according to claim 7, wherein, before the output module, the device comprises:

The training module is used to filter the pre-created crowd image data set by using a Gaussian filter, and then obtain the density map of each image in the crowd image data set, thereby constructing a target training set;

The target training set is used to train the multi-scale and multi-column convolutional neural network model to obtain the target multi-scale and multi-column convolutional neural network model after the training is completed.
9. A device for counting dense crowds, characterized in that it comprises:

A memory, used to store a computer program;

A processor, configured to implement the steps of the dense crowd counting method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the dense crowd counting method according to any one of claims 1 to 7 are implemented.