CN111898578A - Crowd density acquisition method and device, electronic equipment and computer program

Info

Publication number: CN111898578A
Application number: CN202010797785.5A
Authority: CN
Inventors: 王昌安
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-08-10
Filing date: 2020-08-10
Publication date: 2020-11-06
Anticipated expiration: 2040-08-10
Also published as: CN111898578B

Abstract

Embodiments of the application provide a crowd density acquisition method and device, electronic equipment and a computer program, relating to the technical field of image processing. The method comprises: obtaining an initial crowd density map of an image to be detected, dividing the initial crowd density map according to at least two sizes, and obtaining a plurality of sub-areas for each size; performing quantization processing on the number of people in the plurality of sub-areas to obtain quantization processing results of the initial crowd density map corresponding to the at least two sizes; determining the crowd density category of the image to be detected according to the quantization processing results respectively corresponding to the at least two sizes; and determining, from a plurality of trained prediction models, a target prediction model corresponding to the crowd density category, and inputting the image to be detected into the target prediction model to obtain a target crowd density map of the image to be detected. Compared with prediction models in the prior art, the target prediction model is more specialized, so the output crowd density map has higher accuracy.

Description

Crowd density acquisition method and device, electronic equipment and computer program

Technical Field

The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for acquiring crowd density, an electronic device, and a computer program.

Background

Crowd density estimation is a technology for automatically estimating the total number of people in a given image, and plays an important role in the fields of people flow statistics, public safety, monitoring and the like.

The current crowd density estimation algorithm mainly utilizes a deep learning technology, extracts crowd density characteristics rich in high-level semantic information from an image through a convolutional neural network, and then regresses a crowd density graph to obtain an estimation result of the crowd density.

The prior art usually uses the estimated total number of people in an image as the crowd density feature. The main idea is to first calculate, for each head point, the average distance to its N nearest neighbors, and then take the mean of this value over all heads in the image as the measure of the crowd density in the image.

Disclosure of Invention

Embodiments of the present invention provide a method, an apparatus, an electronic device and a computer program for obtaining a crowd density that overcome or at least partially solve the above-mentioned problems.

In a first aspect, a method for obtaining crowd density is provided, the method comprising:

obtaining an initial crowd density graph of an image to be detected, dividing the initial crowd density graph according to at least two sizes, and obtaining a plurality of sub-areas according to each size;

carrying out quantization processing on the number of people in the plurality of sub-areas to obtain quantization processing results corresponding to at least two sizes of the initial crowd density map;

determining the crowd density category of the image to be detected according to the quantization processing results respectively corresponding to the at least two sizes;

and determining a target prediction model corresponding to the crowd density category from the trained prediction models, and inputting the image to be detected into the target prediction model to obtain a target crowd density graph of the image to be detected.

Further, performing quantization processing on the number of people in the plurality of sub-areas to obtain quantization processing results corresponding to at least two sizes of the initial crowd density map comprises:

determining the number of people in the sub-region for the sub-region with any size, and performing quantization processing on the number of people in the sub-region to obtain a quantized value of the number of people in the sub-region;

and normalizing the frequencies of the quantized values of the number of people in the sub-areas to obtain the quantization processing result corresponding to that size of the initial crowd density graph.

Further, performing quantization processing on the number of people in the sub-region to obtain a quantized value of the number of people in the sub-region comprises:

determining the maximum number of persons for a single subregion from the number of persons in each subregion of any size;

and carrying out quantization processing on the number of people in each sub-area with the same size according to the maximum number of people in each sub-area to obtain a quantified value of the number of people in each sub-area.

Further, determining the crowd density category of the image to be detected according to the quantization processing results corresponding to the at least two sizes respectively, comprising:

obtaining crowd density characteristics according to the quantization processing results respectively corresponding to the at least two sizes;

and inputting the crowd density characteristics into a pre-constructed clustering model to obtain the crowd density category of the image to be detected output by the clustering model.

Further, after the image to be detected is input to the target prediction model to obtain a target crowd density map of the image to be detected, the method further comprises the following steps:

determining at least one prediction model which does not correspond to the crowd density category from the plurality of trained prediction models and using the prediction model as an auxiliary prediction model;

inputting the image to be detected into an auxiliary prediction model to obtain an auxiliary crowd density graph of the image to be detected;

and obtaining a final crowd density map of the image to be detected according to the target crowd density map and the auxiliary crowd density map.

Further, the training method of the prediction model corresponding to any crowd density category comprises the following steps:

acquiring a plurality of sample images with any crowd density category and a crowd density map of each sample image;

training an initial neural network model by taking the sample image as a sample and the crowd density graph of the sample image as a sample label until a loss function of the initial neural network model is converged to obtain a prediction model corresponding to any crowd density category;

the dilation rate of the dilated convolution layer of the prediction model is negatively correlated with the crowd density category matched with the prediction model.

In a second aspect, there is provided an apparatus for obtaining crowd density, the apparatus comprising:

the dividing module is used for obtaining an initial crowd density map of the image to be detected, dividing the initial crowd density map according to at least two sizes and obtaining a plurality of sub-areas according to each size;

the quantization module is used for performing quantization processing on the number of people in the plurality of sub-areas to obtain quantization processing results corresponding to at least two sizes of the initial crowd density map;

the category determination module is used for determining the crowd density category of the image to be detected according to the quantization processing results respectively corresponding to the at least two sizes;

and the prediction module is used for determining a target prediction model corresponding to the crowd density category from the trained prediction models and inputting the image to be detected into the target prediction model to obtain a target crowd density graph of the image to be detected.

Further, the quantization module comprises:

the sub-region quantization sub-module is used for determining the number of people in the sub-region for the sub-region with any size and performing quantization processing on the number of people in the sub-region to obtain a quantified value of the number of people in the sub-region;

and the normalization processing sub-module is used for normalizing the frequencies of the quantized values of the number of people in the sub-areas to obtain the quantization processing result corresponding to any size of the initial crowd density map.

Further, the sub-region quantization sub-module comprises a quantization value obtaining unit for performing quantization processing on the number of people in the sub-region to obtain a quantization value of the number of people in the sub-region, and the quantization value obtaining unit comprises:

the maximum number of people acquiring subunit is used for determining the maximum number of people of a single subregion from the number of people in each subregion of any size;

and the people number quantization subunit is used for performing quantization processing on the number of people in each sub-area with the same size according to the maximum number of people in each sub-area to obtain a quantified value of the number of people in each sub-area.

Further, the category determination module includes:

the density characteristic acquisition submodule is used for acquiring crowd density characteristics according to the quantization processing results respectively corresponding to at least two sizes;

and the clustering submodule is used for inputting the crowd density characteristics into a pre-constructed clustering model to obtain the crowd density category of the image to be detected output by the clustering model.

Further, the device for acquiring the crowd density further comprises:

the auxiliary model acquisition module is used for determining at least one prediction model which does not correspond to the crowd density category from the plurality of trained prediction models and taking the prediction model as an auxiliary prediction model;

the auxiliary density map acquisition module is used for inputting the image to be detected into the auxiliary prediction model to obtain an auxiliary crowd density map of the image to be detected;

and the final density map acquisition module is used for acquiring a final crowd density map of the image to be detected according to the target crowd density map and the auxiliary crowd density map.

Further, the device for acquiring the crowd density further comprises a training module for training a prediction model corresponding to any crowd density category, and the training module comprises:

the sample acquisition sub-module is used for acquiring a plurality of sample images with any crowd density category and a crowd density map of each sample image;

the training submodule is used for training the initial neural network model by taking the sample image as a sample and the crowd density graph of the sample image as a sample label until the loss function of the initial neural network model is converged to obtain a prediction model corresponding to any crowd density category.

In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method provided in the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the method as provided in the first aspect.

In a fifth aspect, an embodiment of the present invention provides a computer program, where the computer program includes computer instructions stored in a computer-readable storage medium, and when a processor of a computer device reads the computer instructions from the computer-readable storage medium, the processor executes the computer instructions, so that the computer device executes the steps of implementing the method provided in the first aspect.

According to the method, the device, the electronic equipment and the computer program for acquiring the crowd density, an initial crowd density map of the image to be detected is acquired, the initial crowd density map is divided according to at least two sizes, and a plurality of sub-areas are obtained for each size. The number of people in the plurality of sub-areas is quantized to obtain quantization processing results of the initial crowd density map corresponding to the at least two sizes. These quantization processing results generally form a feature vector of several hundred dimensions, which can represent the distribution of the crowd in the image in a fine-grained manner. The crowd density category of the image to be detected is then obtained from the quantization processing results corresponding to the at least two sizes, and the crowd density map of the image to be detected is predicted by a prediction model matched with that crowd density category.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.

FIG. 1 illustrates a hardware environment diagram formed by an image capture terminal and an electronic device;

FIG. 2 is a flow chart of one implementation of a method for obtaining a crowd density map as disclosed in an embodiment of the present application;

FIG. 3 is a schematic diagram of a network structure of a crowd density map prediction model according to an embodiment of the present application;

FIG. 4 is a schematic flow chart of a method for obtaining a crowd density map according to another embodiment of the present application;

FIG. 5 is a schematic structural diagram of an apparatus for obtaining a crowd density map according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present invention.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

The application provides a crowd density obtaining method, a crowd density obtaining device, an electronic device and a computer program, and aims to solve the technical problems in the prior art.

The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

The crowd density map acquiring device may include a crowd density map acquiring device running in the electronic device and a crowd density map acquiring device running in the background server/platform.

The electronic device may be, but is not limited to, an electronic device such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The electronic device and the background server/platform may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein. In one example, the crowd density map acquisition device running in the electronic device may be a client running in a smartphone. The client can be an application program client or a webpage client.

The crowd density map acquisition device running in the background server/platform can be a hardware component of the server/platform, and can also be a functional module or component.

The background server or the platform may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services.

Cloud computing is a computing model that distributes computing tasks over a resource pool formed by a large number of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". To the user, resources in the "cloud" appear infinitely expandable: they can be obtained at any time, used on demand, expanded at any time, and paid for according to use.

As a basic capability provider of cloud computing, a cloud computing resource pool (called an IaaS (Infrastructure as a Service) platform for short) is established, and multiple types of virtual resources are deployed in the resource pool for external clients to select and use.

According to the logic function division, a PaaS (Platform as a Service) layer can be deployed on an IaaS (Infrastructure as a Service) layer, a SaaS (Software as a Service) layer is deployed on the PaaS layer, and the SaaS can also be directly deployed on the IaaS. PaaS is a platform on which software runs, such as a database or a web container. SaaS comprises various kinds of business software, such as web portals and bulk SMS senders. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.

In an alternative embodiment, the crowd density map obtaining method may be applied to the hardware environment formed by the image capturing terminal 11 and the electronic device 12 shown in fig. 1. As shown in fig. 1, the image capturing terminal 11 may be connected with the electronic device 12 in a wireless manner. The wireless modes include but are not limited to: wide area network, metropolitan area network, local area network, bluetooth, or NFC (Near Field Communication). The image capturing terminal 11 may also be connected to the electronic device 12 in a wired manner.

The image acquisition terminal 11 may be used to monitor the crowd in the crowd activity scene and acquire the crowd image 111. The image capturing terminal 11 may further transmit the captured crowd image 111 to the electronic device 12.

The electronic device 12 may obtain, based on the obtained crowd image 111, a crowd density map 121 corresponding to the crowd image 111 by using the crowd density map obtaining method provided in the present application. The electronic device 12 may predict potential safety hazards, such as stampede accidents, based on the crowd density map 121.

Optionally, the electronic device 12 may count the number of people in the currently acquired crowd image 111 based on the crowd density map 121, for example, if a black dot in the crowd density map 121 represents one person, the crowd density map 121 shown in fig. 1 includes 10 people in total.

Of course, the above only illustrates an alternative display manner of the crowd density map 121, and other display manners are also possible, for example, a black dot in the crowd density map 121 may represent 10 people.

Referring to fig. 2, fig. 2 is a flowchart of an implementation manner of the method for obtaining a crowd density map according to the embodiment of the present disclosure. As shown in fig. 2, the method comprises the steps of:

S101, obtaining an initial crowd density graph of an image to be detected, dividing the initial crowd density graph according to at least two sizes, and obtaining a plurality of sub-areas according to each size.

Specifically, an image to be detected is input into an initial prediction model, and an initial crowd density graph of the image to be detected output by the initial prediction model is obtained; the initial prediction model is obtained by taking a plurality of sample images with at least two crowd density categories as samples and taking a crowd density map of each sample image as a sample label for training.

The initial prediction model in the embodiment of the application may be an existing single deep convolutional neural network, whose predicted crowd density map serves as the initial crowd density map. The initial crowd density map predicts, for each pixel point of the image to be detected, the number of people at the corresponding position in the actual scene. The application places no specific requirement on the accuracy of the initial crowd density map, so the number of people it predicts can be understood as a rough estimate.

The principle of the crowd density graph generated by the initial prediction model of the embodiment of the invention is as follows:

For a given head center point $x_i$, the annotation can be represented as an image of the same size as the original image in which only position $x_i$ is 1 and all other positions are 0; equivalently, it can be written as the function $\delta(x - x_i)$. A crowd image with $N$ head annotations can therefore be expressed as the function $H(x)$:

$$H(x) = \sum_{i=1}^{N} \delta(x - x_i)$$

Note that integrating this map gives the total number of people. Convolving the map with a Gaussian kernel $G_\sigma$ yields the density map $D = G_\sigma * H(x)$. Because the Gaussian kernel is normalized, integrating the density map $D$ obtained after convolution also gives the total number of people.
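The principle above can be illustrated with a minimal sketch, which is not the implementation of the embodiment. It places one normalized Gaussian at every annotated head point, which is equivalent to convolving the map of unit impulses with the kernel, and then sums the result to recover the head count. The function names, the kernel parameters `sigma` and `radius`, and the head coordinates are assumptions chosen only for illustration.

```python
import math

def gaussian_kernel(sigma, radius):
    """Return a 2-D Gaussian kernel (list of rows) normalized to sum to 1."""
    size = 2 * radius + 1
    kernel = [[math.exp(-((dx - radius) ** 2 + (dy - radius) ** 2) / (2 * sigma ** 2))
               for dx in range(size)] for dy in range(size)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]

def density_map(head_points, height, width, sigma=2.0, radius=6):
    """Add one normalized Gaussian per head point given as (row, col)."""
    kernel = gaussian_kernel(sigma, radius)
    density = [[0.0] * width for _ in range(height)]
    for r, c in head_points:
        for dy, row in enumerate(kernel):
            for dx, v in enumerate(row):
                y, x = r + dy - radius, c + dx - radius
                # Kernel mass falling outside the image is dropped, so heads
                # closer than `radius` to the border contribute less than 1.
                if 0 <= y < height and 0 <= x < width:
                    density[y][x] += v
    return density

heads = [(20, 20), (22, 25), (40, 10)]  # hypothetical head annotations
D = density_map(heads, 64, 64)
count = sum(map(sum, D))                # integrating the density map
```

Because every head in this example lies at least `radius` pixels from the border, the sum of `D` equals the number of annotated heads up to floating-point error.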

The embodiment of the application can construct the second sample set, the second sample set comprises sample images of at least two crowd density categories, and thus, for the sample images in the second sample set, the initial prediction model is trained without considering the crowd density categories.

In the embodiment of the application, the initial crowd density graph of the image to be detected can be divided into at least two grid graphs according to different sizes, and the sizes of the sub-areas in each grid graph are the same. For example, the size of the initial crowd density map of the image to be detected is 128 × 128. If the initial crowd density map is divided into a grid with the size of 4 × 4, a grid map having 1024 sub-regions of 4 × 4 may be obtained; if it is divided with the size of 8 × 8, a grid map having 256 sub-regions of 8 × 8 may be obtained; if it is divided with the size of 16 × 16, a grid map having 64 sub-regions of 16 × 16 may be obtained; and if it is divided with the size of 32 × 32, a grid map having 16 sub-regions of 32 × 32 may be obtained.

After the grid map of each size is obtained, the crowd density values of the sub-areas can be integrated to obtain the number of people in each sub-area, and the density information of the grid map can be obtained by counting the number of people in all the sub-areas in one grid map.

It should be understood that, since a grid map is simply the initial crowd density map divided into sub-areas of a specific size, the number of people in a sub-area can be obtained by integrating the crowd density values that the initial crowd density map records for the pixel points within that sub-area.
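The grid division and per-sub-area integration can be sketched as follows, assuming the density map is stored as a list of rows and using the sizes 4, 8, 16 and 32 from the 128 × 128 example. The function name `subregion_counts` and the uniform toy density map are hypothetical.

```python
def subregion_counts(density, cell):
    """Sum the density values inside each cell x cell sub-area (row-major order)."""
    height, width = len(density), len(density[0])
    counts = []
    for top in range(0, height, cell):
        for left in range(0, width, cell):
            counts.append(sum(density[y][x]
                              for y in range(top, min(top + cell, height))
                              for x in range(left, min(left + cell, width))))
    return counts

# Toy 128 x 128 density map: 64 people spread uniformly (illustrative only).
density = [[64 / (128 * 128)] * 128 for _ in range(128)]

# One list of sub-area head counts per grid size.
grids = {cell: subregion_counts(density, cell) for cell in (4, 8, 16, 32)}
```

Each entry of `grids` holds the head counts of one grid map, and the counts of every grid map add up to the same total as the whole density map.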

S102, quantizing the number of people in the plurality of sub-areas to obtain quantized processing results corresponding to at least two sizes of the initial crowd density graph.

It should be noted that existing methods often use the total number of people in the image directly as the crowd density feature. The main idea is to calculate, for each head point, the average distance to its N nearest neighbors, and to take the mean of this value over all heads in the image as the measure of the crowd density in the image. Analysis shows, however, that this approach in effect quantizes the crowd density category into a single scalar, which cannot accurately reflect the crowd density category of the image.

In order to overcome the above problems in the prior art, in the embodiment of the present application, at least two sizes are adopted to divide the initial crowd density map respectively, so as to obtain the sub-regions corresponding to each size. Each sub-region is a grid cell, and the number of people in each sub-region of the same size is quantized, so as to obtain the quantization processing result of the initial crowd density map for that size. Because sub-areas of different sizes contain different numbers of pixel points, the number of people in sub-areas of different sizes also differs. The number of sub-areas contained in the grid graphs divided according to different sizes differs as well, so each grid graph yields its own density information. By combining the quantization processing results of all the grid graphs of different sizes, for example by weighted summation or splicing, a crowd density feature containing the quantization processing results of the different sizes can be obtained. The crowd density feature formed in this way is usually a feature vector of several hundred dimensions, and can finely characterize the crowd distribution in the image.
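One possible reading of this quantization step is sketched below, under stated assumptions: the head count of each sub-area is mapped uniformly onto a fixed number of levels relative to the maximum count of a single sub-area of that size, the frequency of each level is normalized by the number of sub-areas, and the per-size results are spliced (concatenated). The number of levels, the uniform mapping, the function names and the toy counts are all assumptions, since the text does not fix them.

```python
def quantized_histogram(counts, levels=8):
    """Quantize sub-area counts against their maximum and return the
    normalized frequency of each quantization level."""
    max_count = max(counts)
    hist = [0] * levels
    for c in counts:
        # Map [0, max_count] onto integer levels 0 .. levels - 1.
        level = 0 if max_count == 0 else min(int(c / max_count * levels), levels - 1)
        hist[level] += 1
    return [h / len(counts) for h in hist]

def crowd_density_feature(counts_by_size, levels=8):
    """Splice the per-size quantization results into one feature vector."""
    feature = []
    for size in sorted(counts_by_size):
        feature.extend(quantized_histogram(counts_by_size[size], levels))
    return feature

# Hypothetical sub-area head counts for two grid sizes.
counts_by_size = {
    4: [0.0, 0.5, 1.0, 2.0],
    8: [3.5, 0.0],
}
feature = crowd_density_feature(counts_by_size, levels=4)
```

With these toy counts the feature has 4 entries per size, and the entries belonging to each size sum to 1. Weighted summation, the other combination mentioned in the text, would require the per-size results to have the same length, which this layout also satisfies.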

S103, determining the crowd density category of the image to be detected according to the quantization processing results respectively corresponding to the at least two sizes;

The crowd density category is the level reached by the crowd density in the image to be detected, and is the result of a qualitative analysis of that crowd density.

According to the embodiment of the application, the crowd density characteristics of the images to be detected can be obtained according to the quantization processing results corresponding to at least two sizes respectively, the crowd density characteristics of a certain number of sample images are combined for clustering, and the crowd density category of the images to be detected is determined according to the clustering result of the images to be detected and the crowd density category of the sample images which are clustered into the same category with the images to be detected.

Specifically, the crowd density characteristics of the image to be detected are input into a crowd density classification model which is constructed in advance, and the crowd density category of the image to be detected output by the crowd density classification model is obtained;

the construction method of the crowd density classification model comprises the following steps:

collecting a plurality of sample images, and acquiring crowd density characteristics of the plurality of sample images;

and constructing a crowd density classification model by adopting a clustering algorithm based on the crowd density characteristics of the multiple sample images.

After the crowd density characteristics of the image to be detected are obtained, the crowd density category of the image to be detected can be obtained by inputting the crowd density characteristics into the pre-trained clustering model.

The clustering model may be constructed according to sample data, where the sample data includes crowd density features corresponding to different crowd density categories, such as crowd density features corresponding to a sparse category, a moderate category, a dense category, and so on.

The clustering algorithm used in the clustering model is not particularly limited, and may be, for example, the K-Means clustering algorithm, the mean shift clustering algorithm, a density-based clustering algorithm (DBSCAN, Density-Based Spatial Clustering of Applications with Noise), the Expectation Maximization (EM) clustering algorithm using a Gaussian Mixture Model (GMM), or an agglomerative hierarchical clustering algorithm.

The following describes the process of constructing a clustering model according to the present application, taking the K-Means algorithm as an example:

1) first, determining the crowd density categories and their number, and randomly initializing the center point of each category, wherein each center point is a vector of the same length as each data point (crowd density feature);

2) calculating the distance from each data point to each center point, and assigning the data point to the category whose center point is closest;

3) recalculating the central point of each category and taking the recalculated central point as a new central point;

4) the above steps are repeated until the center of each category does not change much after each iteration. It is also possible to randomly initialize the center point multiple times and then select the one that has the best run result.
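Steps 1) to 4) can be sketched in a few lines, assuming squared Euclidean distance and initial center points drawn from the data points themselves; the text only says the centers are initialized randomly, so both choices are assumptions. The function names, the tolerance `tol` and the two-dimensional toy features are hypothetical.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center point closest to `point`."""
    return min(range(len(centers)), key=lambda k: squared_distance(point, centers[k]))

def k_means(points, k, iterations=100, tol=1e-9, seed=0):
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]        # step 1
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]        # step 2
        new_centers = []
        for j in range(k):                                    # step 3
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty category
        shift = max(squared_distance(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:                                       # step 4
            break
    return centers

# Toy crowd density features (real ones would have hundreds of dimensions).
points = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
centers = k_means(points, k=2)
labels = [nearest(p, centers) for p in points]
```

Once the centers are fixed, the crowd density category of a new image can be taken as `nearest(feature, centers)`, which corresponds to inputting the crowd density feature into the constructed clustering model.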

And S104, determining a target prediction model corresponding to the crowd density category from the trained prediction models, and inputting the image to be detected into the target prediction model to obtain a target crowd density graph of the image to be detected.

Because the crowd distribution density in actual scenes varies over a wide range, a single deep convolutional network trained on sample images from scenes with widely differing distributions must perform reasonably evenly across all scenes, and therefore performs poorly on scenes of a single scale or density, which limits the accuracy of the overall crowd density estimation. In other words, when the prior art predicts crowd density maps for a large number of images to be detected, every image is input directly into the single deep convolutional network, so the crowd density estimation accuracy is low. The present application instead first determines the crowd density category of the image to be detected, which lays a foundation for subsequently selecting a prediction model matched with that category. When the crowd density graph is predicted by using the prediction model matched with the image to be detected, a crowd density graph with higher accuracy can be output.

It should be understood that after the crowd density categories are qualitatively described, a large number of sample images can be collected for each crowd density category, and a prediction model corresponding to each crowd density category can then be trained on the sample images of that category. Thus, for any image to be detected, as long as its crowd density category is determined, a prediction model matched with the image to be detected can be found among the plurality of pre-trained prediction models.

Before obtaining a prediction model matched with the crowd density category of the image to be detected, at least two prediction models corresponding to different crowd density categories need to be trained. After the crowd density category of the image to be detected is determined, the prediction model matched with that crowd density category is obtained and used as the target prediction model.

The target prediction model is obtained by taking the sample images in a first sample set as samples (the crowd density category of the sample images in the first sample set is the same as that of the image to be detected), taking the crowd density maps of those sample images as sample labels, and training an initial neural network model until its loss function converges, thereby obtaining the prediction model corresponding to the crowd density category of the image to be detected. That is, the prediction model matching the crowd density category of the image to be detected in step S104 is one whose training samples have the same crowd density category as the image to be detected. Because the training samples of the target prediction model used in the embodiment of the application are sample images with the same crowd density category as the image to be detected, the trained target prediction model is dedicated to predicting crowd density maps for images of that crowd density level. This specialization improves the counting precision of the model on images of a given crowd density category.

According to the method and the device, an initial crowd density map of the image to be detected is obtained, the initial crowd density map is divided according to at least two sizes, and a plurality of sub-areas are obtained for each size. The number of people in the plurality of sub-areas is quantized to obtain quantization processing results corresponding to the at least two sizes of the initial crowd density map. These quantization processing results generally form a feature vector of hundreds of dimensions, which can represent the crowd distribution in the image in a fine-grained manner. The crowd density category of the image to be detected is then obtained from the quantization processing results corresponding to the at least two sizes, and the crowd density map of the image to be detected is predicted by a prediction model matched with that crowd density category.

By applying the embodiment of the application to a crowd density map acquisition scene of a mass of images to be detected, the problem of crowd density estimation in a large scale and with large crowd density distribution change can be solved by collecting sample images of different crowd density categories in advance and respectively training prediction models of corresponding crowd density categories.

On the basis of the above embodiments, the embodiment of the present application may also count the number of people based on the crowd density map output by the target prediction model.

By counting people from the predicted crowd density map, the embodiment of the application can accurately obtain the number of people contained in the image to be detected, which supports the assessment of potential safety hazards.
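
As a minimal sketch of counting from a density map, the following Python assumes, as is common for density-map methods, that the values of the map sum to the number of people in the image. The function name and the use of nested lists in place of an image array are illustrative assumptions.

```python
def count_people(density_map):
    """Estimate the head count as the sum of all density values.

    `density_map` is a list of rows, each a list of non-negative floats
    (assumption: plain lists stand in for an image array).
    """
    return sum(sum(row) for row in density_map)


# A 2 x 2 toy density map whose values sum to 2.0 people.
toy_map = [[0.25, 0.25],
           [0.5, 1.0]]
estimated_count = count_people(toy_map)
```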

On the basis of the embodiments, the embodiment of the application can also obtain the heat map based on the finally obtained crowd density map, and the color of each pixel point in the heat map can be more and more striking along with the increase of the number of people in the actual scene corresponding to the pixel point. The density distribution of the crowd is visualized through the heat map, so that the judgment of potential safety hazard events is facilitated.

On the basis of the above embodiments, after the crowd density category of the image to be detected output by the crowd density classification model is obtained, the method further comprises: judging whether the crowd density of the image to be detected is abnormal according to the crowd density category of the image to be detected.

As can be seen from the above embodiments, the crowd density category in the embodiment of the application is a qualitative description of the crowd density, such as sparse, moderate, or crowded. After the crowd density category of the image to be detected is determined, whether the crowd density is abnormal can be judged directly from the category. For example, the crowd density is judged abnormal when the category is crowded, and normal when the category is sparse or moderate.

On the basis of the foregoing embodiments, as an optional embodiment, the performing quantization processing on the number of people in the multiple sub-regions to obtain quantization processing results corresponding to at least two sizes of the initial crowd density map respectively includes:

S1021, for a sub-region of any size, determining the number of people in the sub-region, and quantizing the number of people in the sub-region to obtain a quantized value of the number of people in the sub-region.

S1022, normalizing the frequencies of the quantized values of the number of people in the sub-regions to obtain the quantization processing result corresponding to that size of the initial crowd density map.

Quantization refers to the process of approximating data that takes many distinct values by data that takes fewer distinct values. In the embodiment of the application, for sub-regions of any size, the number of people in a sub-region is discrete rather than continuous. In addition, the sub-regions themselves are numerous, so the head counts obtained across the sub-regions are widely dispersed.

Because the quantized values of the number of people in different sub-regions may be the same, the embodiment of the application does not use the quantized values directly as indexes for measuring the crowd density in the image. Instead, the frequency of each quantized value is counted first, and the frequencies are then normalized to obtain the quantization processing result corresponding to each size of the initial crowd density map. The frequency of a quantized value is the number of times that value appears in a grid map. For example, if a grid map has 16 sub-regions and two of them have a quantized value of 3, the frequency of the quantized value 3 is 2. The frequencies of the different quantized values over the 16 sub-regions are counted and then normalized, so that when the processing result is used as the input of the model, both the accuracy and the convergence rate of the model can be improved.
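
The frequency counting and normalization can be sketched as follows. The text does not say which normalization is used, so this sketch assumes division by the total number of sub-regions. The function name and the `num_levels` parameter are hypothetical, and the quantized values are assumed to lie in `range(num_levels)`.

```python
from collections import Counter


def normalized_frequencies(quantized_values, num_levels):
    """Return the share of sub-regions that take each quantized value.

    Assumption: normalization divides each frequency by the number of
    sub-regions, so the result sums to 1.
    """
    counts = Counter(quantized_values)
    total = len(quantized_values)
    return [counts.get(level, 0) / total for level in range(num_levels)]


# 16 sub-regions, two of which have the quantized value 3 (frequency 2).
values = [0] * 10 + [1] * 4 + [3] * 2
result = normalized_frequencies(values, num_levels=4)
```

In this toy case the quantized value 3 has frequency 2, which becomes 2 / 16 = 0.125 after normalization.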

On the basis of the above embodiments, as an optional embodiment, the quantifying the number of people in the sub-region to obtain the quantified value of the number of people in the sub-region includes:

It should be understood that the maximum number of people for a single grid in the embodiments of the present application refers to the maximum number of people for a single sub-area in a grid map of the same size.

Optionally, in the embodiment of the present application, the number of people in each sub-region in the grid map is quantized according to the maximum number of people in a single sub-region, and is calculated by the following quantization formula:

c = int(N * log(p - q) / log(C_Max))

wherein c represents the quantization value; N represents the quantization level; int represents a rounding operation; p is the number of people in the sub-region; q is a quantization coefficient, which is a positive number less than the minimum number of people in a single sub-region; and C_Max is the maximum number of people in a single sub-region. The values of N and q can be adjusted according to actual conditions; optionally, N is 25 and q is 2.99.
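
A direct transcription of the quantization formula into Python might look like the sketch below. The function and parameter names are hypothetical. Python's `int()` truncates toward zero, which is only one possible reading of the rounding operation, and the input checks are added assumptions so that both logarithms are defined and the divisor is non-zero.

```python
import math


def quantize_count(p, c_max, n_levels=25, q=2.99):
    """Quantize the head count p of one sub-region.

    c_max is the maximum head count of a single sub-region in the same
    grid map; n_levels and q default to the optional values N = 25 and
    q = 2.99 given in the text.
    """
    if p - q <= 0:
        raise ValueError("p - q must be positive for log(p - q) to be defined")
    if c_max <= 1:
        raise ValueError("c_max must exceed 1 so that log(c_max) is positive")
    ratio = math.log(p - q) / math.log(c_max)
    return int(n_levels * ratio)  # int() truncates toward zero
```

With these definitions, a sub-region for which p - q equals C_Max maps to the value N, and one for which p - q equals 1 maps to 0.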

On the basis of the foregoing embodiments, as an optional embodiment, determining the crowd density category of the image to be measured according to the quantization processing results corresponding to at least two sizes respectively includes:

inputting the crowd density feature into a pre-constructed clustering model to obtain the crowd density category of the image to be detected output by the clustering model.

Specifically, the quantization processing results of the initial crowd density map divided according to the at least two sizes may be sorted in a preset order, for example from the smallest size to the largest or from the largest size to the smallest, and then spliced end to end in that order to obtain the crowd density feature of the image to be detected. For example, let C8, C16, C32 and C64 denote the quantization processing results of the grid maps obtained by dividing a certain initial crowd density map by sizes of 8 × 8, 16 × 16, 32 × 32 and 64 × 64, respectively, where C8 is (0.15, 0.40, 0.45); C16 is (0.25, 0.35, 0.40); C32 is (0.15, 0.15, 0.3, 0.4); and C64 is (0.2, 0.3, 0.50). The crowd density feature obtained after splicing is then (0.15, 0.40, 0.45, 0.25, 0.35, 0.40, 0.15, 0.15, 0.3, 0.4, 0.2, 0.3, 0.50).
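
The splicing in the example above can be reproduced with a few lines of Python. The function name and the dictionary keyed by grid size are illustrative assumptions, and the ascending order is one of the two preset orders mentioned in the text.

```python
def build_density_feature(results_by_size):
    """Splice per-size quantization results in ascending order of size."""
    feature = []
    for size in sorted(results_by_size):
        feature.extend(results_by_size[size])
    return feature


# The values C8, C16, C32 and C64 from the example in the text.
results = {
    8: (0.15, 0.40, 0.45),
    16: (0.25, 0.35, 0.40),
    32: (0.15, 0.15, 0.3, 0.4),
    64: (0.2, 0.3, 0.50),
}
feature = build_density_feature(results)
```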

On the basis of the above embodiments, as an optional embodiment, after the prediction model set is constructed, when the prediction of the crowd density map needs to be performed on the image to be detected, the embodiment of the present application may also select a plurality of prediction models that do not match with the crowd density type of the image to be detected from the prediction model set according to the crowd density type of the image to be detected, predict the crowd density map of the image to be detected by using the unmatched prediction models, and finally obtain the final crowd density map by comprehensively considering the crowd density maps output by the matched prediction models and the unmatched prediction models. The benefits of this operation are: the accuracy rate of the crowd density graph can be improved under the condition that the crowd density category is inaccurate.

On the basis of the foregoing embodiments, as an optional embodiment, after the step of inputting the image to be measured into the target prediction model to obtain the target crowd density map of the image to be measured, the method further includes:

It should be understood that the training samples of a prediction model that does not correspond to the crowd density category of the image to be detected are sample images whose crowd density category differs from that of the image to be detected.

Specifically, the target crowd density graph and the auxiliary crowd density graph are weighted and averaged to obtain a weighted and averaged crowd density graph of the image to be detected, and the weighted and averaged crowd density graph is used as a final crowd density graph of the image to be detected.
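
A pixel-wise weighted average of the target and auxiliary crowd density maps could be sketched as below. The text does not specify the weights, so the weights in the example are purely hypothetical, as are the function name and the use of nested lists for the maps.

```python
def weighted_average_maps(maps, weights):
    """Pixel-wise weighted average of equally sized density maps."""
    if not maps or len(maps) != len(weights):
        raise ValueError("one weight is required per density map")
    total = sum(weights)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(w * m[r][c] for m, w in zip(maps, weights)) / total
         for c in range(cols)]
        for r in range(rows)
    ]


target_map = [[2.0, 0.0]]      # output of the matched (target) model
auxiliary_map = [[0.0, 4.0]]   # output of one unmatched (auxiliary) model
# Hypothetical weights: the target map counts three times as much.
final_map = weighted_average_maps([target_map, auxiliary_map], [3, 1])
```

Giving the target map the larger weight reflects the idea that the matched model is the primary predictor, while the auxiliary maps only hedge against a wrong category decision.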

The training method of the prediction model corresponding to any crowd density category comprises the following steps:

and training the initial neural network model by taking the sample image as a sample and the crowd density graph of the sample image as a sample label until the loss function of the initial neural network model is converged to obtain a prediction model corresponding to any crowd density category.

Referring to fig. 3, fig. 3 is a schematic diagram of the network structure of a prediction model according to an embodiment of the present disclosure. The prediction model is a CSRNet network model, which is mainly divided into a front-end network and a back-end network. A VGG-16 network with the fully connected layers removed is used as the front-end network of the CSRNet. For a receptive field of the same size, a model with smaller convolution kernels and more convolutional layers performs better than a model with larger convolution kernels and fewer convolutional layers. In order to balance accuracy and resource overhead, the VGG-16 network in this embodiment of the application uses a combination of 10 convolutional layers and 3 pooling layers (max-pooling). Specifically, along the direction of computation, the VGG-16 network sequentially includes two first convolutional layers, a first pooling layer, two second convolutional layers, a second pooling layer, three third convolutional layers, a third pooling layer, and three fourth convolutional layers. The first convolutional layer is represented as 3-64-1 in fig. 3, meaning that the convolution kernel size is 3, the number of channels is 64, and the void rate (dilation rate) is 1. Similarly, the second convolutional layer has a convolution kernel size of 3, 128 channels, and a void rate of 1; the third convolutional layer has a convolution kernel size of 3, 256 channels, and a void rate of 1; and the fourth convolutional layer has a convolution kernel size of 3, 512 channels, and a void rate of 1.

The back-end network adopts six void (dilated) convolutional layers with the same void rate α (α is a positive number), and finally adopts one ordinary 1 × 1 convolutional layer to output the result. Void convolutional layers can enlarge the receptive field of the convolution kernels without increasing the amount of computation. Specifically, the back-end network includes 3 first void convolutional layers, 1 second void convolutional layer, 1 third void convolutional layer, 1 fourth void convolutional layer, and 1 final output layer. The convolution kernel size of the first void convolutional layer is 3 × 3 with 512 channels; that of the second void convolutional layer is 3 × 3 with 256 channels; that of the third void convolutional layer is 3 × 3 with 128 channels; and that of the fourth void convolutional layer is 3 × 3 with 64 channels.
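
The layer layout described above can be written down as a plain Python specification, which makes the layer counts easy to check. This is a framework-independent sketch rather than a trainable model: the tuple format, the function name, the pooling kernel size of 2, and the single output channel of the final 1 × 1 layer are assumptions not stated in the text.

```python
def csrnet_layer_specs(dilation=2):
    """Return (front_end, back_end) as lists of layer tuples.

    Each tuple is (kind, kernel_size, channels, void_rate).
    """
    front_end = []
    # (channels, number of 3x3 conv layers, whether a max-pool follows)
    vgg_blocks = [(64, 2, True), (128, 2, True), (256, 3, True), (512, 3, False)]
    for channels, repeats, pool_after in vgg_blocks:
        front_end += [("conv", 3, channels, 1)] * repeats
        if pool_after:
            front_end.append(("maxpool", 2, None, None))  # kernel size assumed
    # Six void (dilated) 3x3 conv layers sharing one void rate.
    back_end = [("conv", 3, ch, dilation) for ch in (512, 512, 512, 256, 128, 64)]
    # Final ordinary 1x1 conv layer; a single output channel is assumed.
    back_end.append(("conv", 1, 1, 1))
    return front_end, back_end


front_end, back_end = csrnet_layer_specs(dilation=2)
```

Passing a different `dilation` value changes only the six void convolutional layers of the back-end network, which is the single knob the text varies between prediction models.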

As an end-to-end network, the CSRNet network model is trained in the most direct way: the 10 convolutional layers in the front-end network come from an already trained VGG-16, so only fine-tuning is needed. The parameters of the other convolutional layers are initialized from a Gaussian distribution with a standard deviation of 0.01. During training, the learning rate of stochastic gradient descent is fixed at 1e-6. The Euclidean distance is used to measure the distance between the generated density map and the ground truth. The loss function L_reg is defined in terms of N, the total number of pixels in the training image; the true density value of the i-th pixel; and z_i, the density prediction value of the i-th pixel. The prediction model is optimized so that the crowd density map predicted by the prediction model is as close as possible to the real crowd density map.
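
The exact expression of L_reg is not legible in the text above. As an illustration only, the following Python sketch assumes one common form of a Euclidean loss, namely the sum of squared per-pixel differences divided by 2N. The normalization factor and the function name are assumptions and may differ from the formula of the application.

```python
def euclidean_loss(predicted, target):
    """Assumed form: sum over pixels of (z_i - true_i)^2, divided by 2N."""
    total = 0.0
    n_pixels = 0
    for pred_row, true_row in zip(predicted, target):
        for z, z_true in zip(pred_row, true_row):
            total += (z - z_true) ** 2
            n_pixels += 1
    return total / (2 * n_pixels)


predicted_map = [[1.0, 2.0], [3.0, 4.0]]
true_map = [[0.0, 0.0], [0.0, 0.0]]
loss = euclidean_loss(predicted_map, true_map)
```

Whatever the exact normalization, the loss is zero only when the predicted map equals the true map at every pixel, which is the property the optimization relies on.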

On the basis of the above embodiments, when the prediction model corresponding to any crowd density category is trained, the void rate of the void convolutional layers of the prediction model is negatively correlated with the crowd density of the category corresponding to the prediction model. That is, a prediction model trained on sample images of a high crowd density category uses a smaller void rate α in its void convolutional layers, and a prediction model trained on sample images of a low crowd density category uses a larger void rate α.
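
The negative correlation can be expressed as a simple lookup. The category names follow the qualitative labels used earlier in the description, while the numeric void rates are hypothetical placeholders chosen only so that denser categories receive smaller values.

```python
# Hypothetical void (dilation) rates. The only property taken from the text
# is that a denser crowd category gets a smaller void rate.
VOID_RATE_BY_CATEGORY = {"sparse": 4, "moderate": 2, "crowded": 1}


def void_rate_for(category):
    """Look up the void rate used when training the model for `category`."""
    return VOID_RATE_BY_CATEGORY[category]
```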

Fig. 4 is a schematic flowchart of a method for obtaining a crowd density map according to another embodiment of the present application, as shown in fig. 4, the method includes:

S20, obtaining a second sample set, wherein the second sample set comprises sample images of N crowd density categories, and a sample subset is constructed from the sample images of each crowd density category, so that N sample subsets are obtained, N being an integer not less than 2;

S21, respectively training a prediction model corresponding to each crowd density category according to the N sample subsets in the second sample set to obtain N prediction models;

S22, constructing a prediction model set, wherein the prediction model set comprises the N prediction models;

S23, determining the crowd density category of the image to be detected;

S24, acquiring, from the prediction model set, a prediction model matched with the crowd density category of the image to be detected, and taking the prediction model as the target prediction model;

S25, inputting the image to be detected into the target prediction model, and obtaining the target crowd density map of the image to be detected output by the target prediction model.
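
Steps S20 to S25 amount to training one model per category and then dispatching each image to the matching model. The sketch below shows only this control flow: `train_fn` and `classify_fn` are hypothetical stand-ins for the training procedure and the category determination, and both function names are assumptions.

```python
def train_model_set(sample_subsets, train_fn):
    """S20-S22: train one prediction model per crowd density category.

    `sample_subsets` maps a category to its sample subset; `train_fn`
    (hypothetical) turns a sample subset into a callable model.
    """
    return {category: train_fn(samples)
            for category, samples in sample_subsets.items()}


def predict_density_map(image, model_set, classify_fn):
    """S23-S25: pick the model matching the image's category and run it."""
    category = classify_fn(image)        # S23: determine the category
    target_model = model_set[category]   # S24: select the target model
    return category, target_model(image)  # S25: predict the density map
```

Keeping the model set as a mapping from category to model makes the selection in S24 a single lookup, and new categories can be added without touching the prediction code.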

Fig. 5 is a schematic structural diagram of an apparatus for obtaining crowd density according to an embodiment of the present application. As shown in fig. 5, the apparatus for obtaining crowd density includes a dividing module 101, a quantization module 102, a category determination module 103, and a prediction module 104. Specifically:

the dividing module 101 is configured to obtain an initial crowd density map of an image to be detected, divide the initial crowd density map according to at least two sizes, and obtain a plurality of sub-areas for each size;

the quantization module 102 is configured to perform quantization processing on the number of people in the plurality of sub-areas to obtain quantization processing results corresponding to the at least two sizes of the initial crowd density map;

the category determination module 103 is configured to determine a crowd density category of the image to be detected according to the quantization processing results corresponding to the at least two sizes;

the prediction module 104 is configured to determine a target prediction model corresponding to the crowd density category from the trained prediction models, and input the image to be detected into the target prediction model to obtain a target crowd density map of the image to be detected.

The device for acquiring crowd density according to the embodiment of the present invention specifically executes the process of the method according to the embodiment of the present invention, and please refer to the content of the method for acquiring crowd density in detail, which is not described herein again. According to the device for acquiring the crowd density, provided by the embodiment of the invention, the initial crowd density graph of the image to be detected is acquired, the initial crowd density graph is divided into at least two sizes, and a plurality of sub-areas are obtained for each size; the method comprises the steps of carrying out quantization processing on the number of people in a plurality of sub-regions to obtain quantization processing results corresponding to at least two sizes of an initial crowd density map, wherein the quantization processing results corresponding to the at least two sizes are characteristic vectors of hundreds of dimensions generally, the distribution condition of crowds in an image can be represented in a refined mode, the crowd density type of the image to be measured is further obtained by utilizing the quantization processing results corresponding to the at least two sizes, and the crowd density map of the image to be measured is predicted by selecting a prediction model matched with the crowd density type of the image to be measured.

On the basis of the above embodiments, as an alternative embodiment, the quantization module includes:

On the basis of the foregoing embodiments, as an optional embodiment, the sub-region quantization sub-module includes a quantized value obtaining unit configured to perform quantization processing on the number of people in the sub-region to obtain a quantized value of the number of people in the sub-region, and the quantized value obtaining unit includes:

On the basis of the foregoing embodiments, as an alternative embodiment, the category determining module includes:

On the basis of the above embodiments, as an alternative embodiment, the apparatus for obtaining crowd density further includes:

On the basis of the foregoing embodiments, as an optional embodiment, the apparatus for obtaining crowd density further includes a training module for training a prediction model corresponding to any crowd density category, where the training module includes:

An embodiment of the present application provides an electronic device, including: a memory and a processor; at least one program stored in the memory for execution by the processor, which when executed by the processor, implements: dividing the initial crowd density map by at least two sizes by obtaining the initial crowd density map of the image to be detected, and obtaining a plurality of sub-regions according to each size; the method comprises the steps of carrying out quantization processing on the number of people in a plurality of sub-regions to obtain quantization processing results corresponding to at least two sizes of an initial crowd density map, wherein the quantization processing results corresponding to the at least two sizes are characteristic vectors of hundreds of dimensions generally, the distribution condition of crowds in an image can be represented in a refined mode, the crowd density type of the image to be measured is further obtained by utilizing the quantization processing results corresponding to the at least two sizes, and the crowd density map of the image to be measured is predicted by selecting a prediction model matched with the crowd density type of the image to be measured.

In an alternative embodiment, an electronic device is provided. As shown in fig. 6, the electronic device 4000 comprises a processor 4001 and a memory 4003. The processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004. In practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiment of the present application.

The Processor 4001 may be a CPU (Central Processing Unit), a general-purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application specific integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 4001 may also be a combination that performs a computational function, including, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, or the like.

Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (extended industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.

The Memory 4003 may be, but is not limited to, a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.

The memory 4003 is used for storing application codes for executing the scheme of the present application, and the execution is controlled by the processor 4001. Processor 4001 is configured to execute application code stored in memory 4003 to implement what is shown in the foregoing method embodiments.

The present application provides a computer-readable storage medium, on which a computer program is stored, which, when running on a computer, enables the computer to execute the corresponding content in the foregoing method embodiments. Compared with the prior art, the initial crowd density graph of the image to be detected is obtained, the initial crowd density graph is divided into at least two sizes, and a plurality of sub-areas are obtained according to each size; the method comprises the steps of carrying out quantization processing on the number of people in a plurality of sub-regions to obtain quantization processing results corresponding to at least two sizes of an initial crowd density map, wherein the quantization processing results corresponding to the at least two sizes are characteristic vectors of hundreds of dimensions generally, the distribution condition of crowds in an image can be represented in a refined mode, the crowd density type of the image to be measured is further obtained by utilizing the quantization processing results corresponding to the at least two sizes, and the crowd density map of the image to be measured is predicted by selecting a prediction model matched with the crowd density type of the image to be measured.

The embodiment of the present application provides a computer program, which includes computer instructions stored in a computer-readable storage medium, and when a processor of a computer device reads the computer instructions from the computer-readable storage medium, the processor executes the computer instructions, so that the computer device executes the contents as shown in the foregoing method embodiment.

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not restricted to the exact order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

The foregoing is only a partial embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims

1. A method for obtaining population density, comprising:

quantifying the number of people in the plurality of sub-areas to obtain quantified processing results corresponding to at least two sizes of the initial crowd density graph;

and determining a target prediction model corresponding to the crowd density category from a plurality of trained prediction models, and inputting the image to be detected into the target prediction model to obtain a target crowd density graph of the image to be detected.

2. The method for obtaining the population density according to claim 1, wherein the quantifying the population in the plurality of sub-regions to obtain the quantified results corresponding to at least two sizes of the initial population density map respectively comprises:

for a subregion with any size, determining the number of people in the subregion, and performing quantization processing on the number of people in the subregion to obtain a quantified value of the number of people in the subregion;

and carrying out normalization processing on the frequency of the number of people quantization values in the sub-area to obtain a quantization processing result corresponding to any size of the initial crowd density map.

3. The method for obtaining the crowd density according to claim 2, wherein the quantifying the number of people in the sub-region to obtain the quantified value of the number of people in the sub-region comprises:

determining the maximum number of persons in a single subregion from the number of persons in each subregion of any size;

and carrying out quantization processing on the number of people in each sub-area with the same size according to the maximum number of people in the single sub-area to obtain a quantified value of the number of people in each sub-area.

4. The method for obtaining the crowd density according to claim 1, wherein the determining the crowd density category of the image to be measured according to the quantization processing results corresponding to the at least two sizes respectively comprises:

5. The method for obtaining the crowd density according to claim 1, wherein the inputting the image to be measured into the target prediction model to obtain a target crowd density map of the image to be measured further comprises:

determining at least one prediction model which does not correspond to the crowd density category from a plurality of trained prediction models and using the prediction model as an auxiliary prediction model;

inputting the image to be detected into the auxiliary prediction model to obtain an auxiliary crowd density graph of the image to be detected;

6. The method for acquiring the crowd density according to claim 1 or 5, wherein the method for training the prediction model corresponding to any crowd density category comprises:

obtaining a plurality of sample images with any one crowd density category and a crowd density map of each sample image;

training an initial neural network model by taking the sample image as a sample and taking a crowd density graph of the sample image as a sample label until a loss function of the initial neural network model is converged to obtain a prediction model corresponding to any crowd density category;

7. An apparatus for obtaining population density, comprising:

the quantification module is used for quantifying the number of people in the plurality of sub-areas to obtain quantification processing results corresponding to at least two sizes of the initial crowd density graph;

and the prediction module is used for determining a target prediction model corresponding to the crowd density category from a plurality of trained prediction models and inputting the image to be detected into the target prediction model to obtain a target crowd density graph of the image to be detected.

8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the method for obtaining population density according to any one of claims 1 to 6.

9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for obtaining a population density according to any one of claims 1 to 6.

10. A computer program, characterized in that the computer program comprises computer instructions stored in a computer readable storage medium, which when read by a processor of a computer device from the computer readable storage medium, cause the processor to execute the computer instructions, causing the computer device to perform the steps of the crowd density obtaining method according to any one of claims 1-6.