CN113052827B - Crowd counting method and system based on multi-branch expansion convolutional neural network - Google Patents
Crowd counting method and system based on multi-branch expansion convolutional neural network
- Publication number: CN113052827B
- Application number: CN202110354656.3A
- Authority: CN (China)
- Prior art keywords: image, crowd, head position, module, label
- Prior art date: 2021-03-30
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/0002—Image analysis; Inspection of images, e.g. flaw detection
- G06N3/045—Neural networks; Architecture; Combinations of networks
- G06N3/08—Neural networks; Learning methods
- G06T2207/10004—Image acquisition modality; Still image; Photographic image
- G06T2207/20081—Special algorithmic details; Training; Learning
- G06T2207/20084—Special algorithmic details; Artificial neural networks [ANN]
- G06T2207/30196—Subject of image; Human being; Person
- G06T2207/30242—Subject of image; Counting objects in image
Abstract
The invention belongs to the field of computer vision and provides a crowd counting method and system based on a multi-branch dilated convolutional neural network. The method comprises the following steps: acquiring scene images containing crowds and generating, from each scene image, a crowd density map label and a head position binary map label; constructing a training set from the training samples, where each image together with its crowd density map label and head position binary map label forms one training sample; training a multi-branch dilated convolution crowd counting network model on the training set to obtain the optimal network parameters and generate a trained multi-branch dilated convolution crowd counting network model; inputting the image to be detected into the trained multi-branch dilated convolution crowd counting network model and outputting a crowd density map; and summing the pixel values of the crowd density map to obtain the crowd counting result.
Description
Technical Field
The invention belongs to the field of computer vision and particularly relates to a crowd counting method and system based on a multi-branch dilated convolutional neural network.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Crowd counting aims to estimate, in real time, the distribution of the crowd appearing in image or video data and to count the number of people present. In recent years, crowd counting has become a research hotspot in the field of computer vision. Its main application is intelligent security: providing the crowd distribution and the number of people in real time makes it possible to analyze and control pedestrian flow effectively and to prevent safety accidents.
Because of the influence of shooting angle and shooting distance, the apparent size of people in the target crowd varies greatly within an image or video, which poses a great challenge for research on crowd counting methods.
Disclosure of Invention
To address the scale differences of target crowds, the invention provides a crowd counting method and system based on a multi-branch dilated convolutional neural network. A multi-branch dilated convolutional network with shared training parameters is designed to extract features with different receptive fields using fewer network parameters, and a supervised head position binary map is used to guide the network to attend to head positions, thereby achieving more accurate crowd counting.
In order to achieve the purpose, the invention adopts the following technical scheme:
A first aspect of the invention provides a crowd counting method based on a multi-branch dilated convolutional neural network.
The crowd counting method based on a multi-branch dilated convolutional neural network comprises the following steps:
acquiring scene images containing crowds and generating, from each scene image, a crowd density map label and a head position binary map label;
constructing a training set from the training samples, where each image together with its crowd density map label and head position binary map label forms one training sample;
training a multi-branch dilated convolution crowd counting network model on the training set to obtain the optimal network parameters and generate a trained multi-branch dilated convolution crowd counting network model;
inputting the image to be detected into the trained multi-branch dilated convolution crowd counting network model and outputting a crowd density map;
and summing the pixel values of the crowd density map to obtain the crowd counting result.
A second aspect of the invention provides a crowd counting system based on a multi-branch dilated convolutional neural network.
The crowd counting system based on a multi-branch dilated convolutional neural network comprises:
a label generation module configured to: acquire scene images containing crowds and generate, from each scene image, a crowd density map label and a head position binary map label;
a training set building module configured to: construct a training set from the training samples, where each image together with its crowd density map label and head position binary map label forms one training sample, and the samples together form the training set;
a model training module configured to: train a multi-branch dilated convolution crowd counting network model on the training set to obtain the optimal network parameters and generate a trained multi-branch dilated convolution crowd counting network model;
a crowd counting application module configured to: input the image to be detected into the trained multi-branch dilated convolution crowd counting network model and output a crowd density map;
an output module configured to: sum the pixel values of the crowd density map to obtain the crowd counting result.
A third aspect of the invention provides a computer-readable storage medium.
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, carries out the steps of the crowd counting method based on a multi-branch dilated convolutional neural network described in the first aspect above.
A fourth aspect of the invention provides a computer device.
A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the crowd counting method based on a multi-branch dilated convolutional neural network described in the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention uses dilated convolution to design three feature extraction branches whose extracted features have different receptive fields, which effectively handles the head-size differences caused by shooting angle and shooting distance.
2. The invention shares network parameters among the multiple dilated convolution branches, which effectively reduces the number of trainable parameters and improves the training speed of the network.
3. The head position binary map estimation module designed by the invention, on the one hand, supervises and guides the network to extract more stable features and, on the other hand, helps the crowd density map estimation module locate head positions more accurately, thereby improving the accuracy of crowd counting.
4. The original image is used as the input of the multi-branch dilated convolution crowd counting network model, and its outputs are the head position binary map generated by the binary map estimation module and the crowd density map generated by the crowd density map estimation module. After supervised training, the binary map estimation module outputs a binary map that represents head positions and sizes; the Hadamard product of this binary map and the fused features is then computed and used as the input of the crowd density map estimation module, so that the crowd density map estimation module can locate head positions more accurately for density estimation, which mitigates counting errors caused by scale differences in the target crowd.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
FIG. 1 is a flow chart of the crowd counting method based on a multi-branch dilated convolutional neural network according to the present invention;
FIG. 2 is a flow chart of the crowd counting method in an embodiment;
FIG. 3 is a block diagram of the multi-branch dilated convolutional neural network in an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
Example one
As shown in FIGS. 1-2, this embodiment provides a crowd counting method based on a multi-branch dilated convolutional neural network. The embodiment is illustrated by applying the method to a server; it is understood that the method may also be applied to a terminal, or to a system comprising a terminal and a server and implemented through interaction between the terminal and the server. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this application. In this embodiment, the method comprises the following steps:
step (1): acquiring a scene image containing crowds and respectively generating a crowd density map label and a head position binary map label according to the scene image;
specifically, scene images containing crowds are obtained, and the position of the head of each image is marked;
generating a crowd density map label according to the marked head position;
and generating a human head position binary image label by using a binarization function according to the crowd density image label.
In an example, the server may obtain crowd scene images and annotate the head position in each image. The crowd scene images may be captured by cameras used for scene monitoring, for example subway monitoring or shopping mall monitoring.
In step (1), the crowd density map label is generated with fixed-size Gaussian kernels: a Gaussian kernel of fixed size whose values sum to 1 is placed at each head position. For the head position binary map, the head size is first estimated with a nearest-neighbour algorithm, then the pixels at the head position are set to 1 and all other positions to 0, so that the head position binary map label carries head size information.
Specifically, the crowd density map label is generated from the annotated head positions as shown in formula (1):

Σ_{l_i ∈ l} δ(x - l_i) * G_{σ_i}(x)    (1)

where l denotes the set of all head positions in the image and l_i denotes the coordinates of the centre of the i-th target head position; δ(·) is the impulse function, G(·) is the Gaussian kernel, and σ_i denotes the variance of the Gaussian kernel, which is set to 8 in this embodiment. That is, each head position is covered with a Gaussian kernel whose values sum to 1 and whose variance is 8, and non-head positions are set to 0.
Specifically, the head position binary map label is generated from the annotated head positions as shown in formula (2):

B( Σ_{l_i ∈ l} δ(x - l_i) * G_{σ_i}(x) )    (2)

In formula (2), B(·) is the binarization function, and the other variables and functions have the same meaning as in formula (1). The head position binary map label is therefore generated as follows: a response map is first built in the same way as the crowd density map label but with the variance of the Gaussian kernel set to 15, and the binarization function then converts the result into a binary map, i.e. non-zero pixels are reset to 1 and all remaining pixels are 0.
Step (2): constructing a training set from the training samples; each image together with its crowd density map label and head position binary map label forms one training sample.
Constructing the training set from the training samples comprises:
performing data expansion on each training sample by random cropping, mirroring and rotation to construct the training set.
Specifically, random cropping, mirroring and rotation are used for data expansion: 50 image blocks whose length and width are multiples of 32 and smaller than the original image size are cropped at random; horizontal and vertical mirroring are then applied to the 50 image blocks, giving 150 image blocks in total; finally, each of the 150 image blocks is additionally rotated by 15 degrees, giving 300 image blocks. Note that the same operations are applied to the crowd density map label and the head position binary map label.
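A rough sketch of this expansion scheme is given below, assuming images and labels are NumPy arrays; the crop-size rule (multiples of 32, smaller than the original) and the counts (50/150/300) follow the text above, while everything else (random generator, rotation settings) is an illustrative choice.

```python
import numpy as np
from scipy.ndimage import rotate

def expand(image, seed=0):
    """Expand one training image into 300 blocks: 50 random crops, plus
    horizontal and vertical mirrors (150), plus a 15-degree rotation of
    each (300). Assumes the image is at least 64x64; apply the same
    transforms to the density and binary map labels."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    crops = []
    for _ in range(50):                                  # 50 random crops
        ch = 32 * int(rng.integers(1, max(2, h // 32)))  # multiple of 32,
        cw = 32 * int(rng.integers(1, max(2, w // 32)))  # smaller than the image
        y = int(rng.integers(0, h - ch + 1))
        x = int(rng.integers(0, w - cw + 1))
        crops.append(image[y:y + ch, x:x + cw])
    blocks = (crops
              + [c[:, ::-1] for c in crops]              # horizontal mirror
              + [c[::-1, :] for c in crops])             # vertical mirror -> 150
    blocks += [rotate(b, 15, reshape=False) for b in blocks]  # rotate -> 300
    return blocks
```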
Step (3): training the multi-branch dilated convolution crowd counting network model on the training set to obtain the optimal network parameters and generate a trained multi-branch dilated convolution crowd counting network model.
The multi-branch dilated convolution crowd counting network model comprises a multi-branch dilated convolution module, a feature fusion module, a binary map estimation module and a density map estimation module.
Considering that dilated convolution obtains features with a larger receptive field using fewer parameters, this embodiment designs a multi-branch convolution module that extracts multi-scale features with three parameter-sharing dilated convolution branches, and designs a head position binary map estimation module to enhance the features at head positions, which helps the crowd density map estimation module locate head positions more accurately and improves the accuracy of crowd counting.
In one embodiment, the multi-branch convolution module comprises three dilated convolution branches that share network parameters and have different dilation rates, and performs multi-scale feature extraction on the crowd images. The feature fusion module fuses the features of the three dilated convolution branches and then performs further feature extraction on the fused features to generate a feature map. The binary map estimation module estimates the binary map under supervision with a cross-entropy loss function. The density map estimation module receives the output of the binary map estimation module, computes its Hadamard product with the feature map generated by the feature fusion module, and then estimates the crowd density map with a three-layer convolution operation under supervision with a cross-entropy loss function.
Specifically, FIG. 3 shows the crowd counting network based on the multi-branch dilated convolutional network. As shown in FIG. 3, the multi-branch dilated convolution module first performs multi-scale feature extraction on a crowd image block. The module comprises three branches; each branch uses 3x3 convolution kernels, the branches share parameters, and their dilation rates are set to 1, 2 and 3 respectively. With this arrangement, the network can extract features with different receptive fields using fewer parameters and can effectively cope with differences in head size.
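To make the parameter-sharing idea concrete, the following is a minimal PyTorch sketch in which a single set of 3x3 convolution weights is reused at dilation rates 1, 2 and 3; the channel sizes, activation and single-layer depth are placeholders chosen for this sketch, not the patent's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedDilatedBranches(nn.Module):
    """Three dilated convolution branches that share one set of weights."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        # one set of 3x3 weights, shared by all three branches
        self.weight = nn.Parameter(torch.empty(out_ch, in_ch, 3, 3))
        self.bias = nn.Parameter(torch.zeros(out_ch))
        nn.init.kaiming_normal_(self.weight)

    def forward(self, x):
        # same weights, different dilation rates -> different receptive fields
        return [F.relu(F.conv2d(x, self.weight, self.bias,
                                padding=d, dilation=d)) for d in (1, 2, 3)]
```

Because the three branches reuse the same weights, the trainable parameter count is that of a single branch, while the receptive field still changes with the dilation rate.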
Next, the feature fusion module fuses the features extracted by the different branches; specifically, the three feature maps are added together, a 1x1 convolution reduces the dimensionality, and a 3x3 convolution performs further feature extraction.
After fusion, the features are split into two paths that are fed into the head position binary map generation module and the crowd density estimation module respectively. The head position binary map generation module further extracts features from the fused features, and the head position binary map label supervises the prediction of head positions; this supervised training allows the network to extract more stable features. In addition, the Hadamard product of the head position binary map produced by the supervised head position binary map generation module and the features obtained by the feature fusion module is used as the input of the crowd density map estimation module, which helps the crowd density map estimation module locate head positions more accurately when estimating the crowd density.
Step (4): inputting the image to be detected into the trained multi-branch dilated convolution crowd counting network model and outputting a crowd density map.
Step (5): summing the pixel values of the crowd density map to obtain the crowd counting result.
For a test image, the image to be tested is input into the trained multi-branch dilated convolution crowd counting network model, a crowd density map is estimated for the newly received image data, and finally the pixel values of the output crowd density map are summed to obtain the predicted number of people in the image.
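As a usage illustration under the assumptions of the previous sketch, the count is obtained simply by summing the predicted density map:

```python
model = MultiBranchCrowdCounter()
model.eval()
with torch.no_grad():
    image = torch.rand(1, 3, 384, 512)        # stand-in for a test image tensor
    density, _ = model(image)
    predicted_count = density.sum().item()    # pixel-wise sum = estimated head count
print(f"estimated number of people: {predicted_count:.1f}")
```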
Example two
This embodiment provides a crowd counting system based on a multi-branch dilated convolutional neural network.
The crowd counting system based on a multi-branch dilated convolutional neural network comprises:
a label generation module configured to: acquire scene images containing crowds and generate, from each scene image, a crowd density map label and a head position binary map label;
a training set building module configured to: construct a training set from the training samples, where each image together with its crowd density map label and head position binary map label forms one training sample;
a model training module configured to: train a multi-branch dilated convolution crowd counting network model on the training set to obtain the optimal network parameters and generate a trained multi-branch dilated convolution crowd counting network model;
a crowd counting application module configured to: input the image to be detected into the trained multi-branch dilated convolution crowd counting network model and output a crowd density map;
an output module configured to: sum the pixel values of the crowd density map to obtain the crowd counting result.
The multi-branch dilated convolution crowd counting network model comprises a multi-branch dilated convolution module, a feature fusion module, a binary map estimation module and a crowd density map estimation module. The multi-branch dilated convolution module consists of three dilated convolution branches with different dilation rates, the feature fusion module consists of a feature summation layer and convolution layers, and the binary map estimation module and the crowd density map estimation module each consist of three convolution layers. Note that the original image is used as the input of the multi-branch dilated convolution crowd counting network model, and its outputs are the head position binary map generated by the binary map estimation module and the crowd density map generated by the crowd density map estimation module. After supervised training, the binary map estimation module outputs a binary map that represents head positions and sizes; the Hadamard product of this binary map and the fused features is then computed and used as the input of the crowd density map estimation module, helping the crowd density map estimation module locate head positions more accurately for density estimation.
Illustratively, the multi-branch dilated convolution module consists of three convolution branches with dilation rates of 1, 2 and 3 and convolution kernel size 3x3; each branch comprises four convolution layers, of which the first two are each followed by max pooling. The feature fusion module first sums the features of the three dilated convolution branches, then reduces the dimensionality with a 1x1 convolution and extracts further features with a 3x3 convolution. The binary map estimation module comprises three convolution layers and estimates the binary map under cross-entropy loss supervision. The crowd density map estimation module first receives the output of the binary map estimation module and computes its Hadamard product with the feature map generated by the feature fusion module, and then estimates the crowd density map with a three-layer convolution operation under cross-entropy loss supervision.
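For illustration, one possible training step under the sketches above is given below. The patent specifies cross-entropy supervision for both outputs; this sketch keeps binary cross-entropy for the head position binary map but substitutes a pixel-wise MSE for the density map (a common choice for density regression), and the optimizer, learning rate and loss weighting are assumptions rather than the patent's stated settings.

```python
import torch.optim as optim  # model, nn from the previous sketches are assumed in scope

model = MultiBranchCrowdCounter()
optimizer = optim.Adam(model.parameters(), lr=1e-4)
bce, mse = nn.BCELoss(), nn.MSELoss()

def train_step(images, density_labels, binary_labels, weight=1.0):
    """One optimization step combining both supervision signals."""
    optimizer.zero_grad()
    density_pred, binary_pred = model(images)
    loss = mse(density_pred, density_labels) + weight * bce(binary_pred, binary_labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```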
Example three
This embodiment provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the crowd counting method based on a multi-branch dilated convolutional neural network described in the first embodiment above.
Example four
This embodiment provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the crowd counting method based on a multi-branch dilated convolutional neural network described in the first embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer readable storage medium and executed by a computer to implement the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (6)
1. A crowd counting method based on a multi-branch dilated convolutional neural network, characterized by comprising the following steps:
acquiring scene images containing crowds and generating, from each scene image, a crowd density map label and a head position binary map label;
constructing a training set from the training samples, wherein each image together with its crowd density map label and head position binary map label forms one training sample;
training a multi-branch dilated convolution crowd counting network model on the training set to obtain the optimal network parameters and generate a trained multi-branch dilated convolution crowd counting network model;
inputting the image to be detected into the trained multi-branch dilated convolution crowd counting network model and outputting a crowd density map;
summing the pixel values of the crowd density map to obtain the crowd counting result;
wherein the multi-branch dilated convolution crowd counting network model comprises a multi-branch convolution module, a feature fusion module, a binary map estimation module and a density map estimation module;
the multi-branch convolution module comprises three dilated convolution branches that share network parameters and have different dilation rates, and performs multi-scale feature extraction on the crowd images;
the binary map estimation module estimates the binary map under supervision with a cross-entropy loss function;
the density map estimation module receives the output of the binary map estimation module, computes its Hadamard product with the feature map generated by the feature fusion module, and then estimates the crowd density map with a three-layer convolution operation under supervision with a cross-entropy loss function;
the feature fusion module fuses the features of the three dilated convolution branches and then performs further feature extraction on the fused features to generate a feature map; the fused features are split into two paths that are fed into the head position binary map generation module and the crowd density estimation module respectively; the head position binary map generation module further extracts features from the fused features, the head position binary map label supervises the prediction of head positions, and the Hadamard product of the head position binary map produced by the supervised head position binary map generation module and the features obtained by the feature fusion module serves as the input of the crowd density map estimation module;
the generated head position binary image label is as follows:
wherein B () is a binarization function, l represents a position set of all human heads in the image, and l is a binary function i Coordinates representing the center of the ith target head position, δ () being a pulse function, G () being a gaussian kernel, σ i The variance of the gaussian sum is indicated.
2. The method according to claim 1, wherein acquiring scene images containing crowds and generating a crowd density map label and a head position binary map label from each scene image comprises:
acquiring scene images containing crowds, and marking the positions of the heads in each image;
generating a crowd density map label according to the marked head position;
and generating a human head position binary image label by using a binarization function according to the crowd density image label.
3. The method according to claim 1, wherein constructing the training set from the training samples comprises:
performing data expansion on each training sample by random cropping, mirroring and rotation.
4. A crowd counting system based on a multi-branch dilated convolutional neural network, characterized by comprising:
a label generation module configured to: acquire scene images containing crowds and generate, from each scene image, a crowd density map label and a head position binary map label;
a training set construction module configured to: construct a training set from the training samples, wherein each image together with its crowd density map label and head position binary map label forms one training sample;
a model training module configured to: train a multi-branch dilated convolution crowd counting network model on the training set to obtain the optimal network parameters and generate a trained multi-branch dilated convolution crowd counting network model;
a crowd counting application module configured to: input the image to be detected into the trained multi-branch dilated convolution crowd counting network model and output a crowd density map;
an output module configured to: sum the pixel values of the crowd density map to obtain the crowd counting result;
wherein the multi-branch dilated convolution crowd counting network model comprises a multi-branch convolution module, a feature fusion module, a binary map estimation module and a density map estimation module;
the multi-branch convolution module comprises three dilated convolution branches that share network parameters and have different dilation rates, and performs multi-scale feature extraction on the crowd images;
the binary map estimation module estimates the binary map under supervision with a cross-entropy loss function;
the density map estimation module receives the output of the binary map estimation module, computes its Hadamard product with the feature map generated by the feature fusion module, and then estimates the crowd density map with a three-layer convolution operation under supervision with a cross-entropy loss function;
the feature fusion module fuses the features of the three dilated convolution branches and then performs further feature extraction on the fused features to generate a feature map; the fused features are split into two paths that are fed into the head position binary map generation module and the crowd density estimation module respectively; the head position binary map generation module further extracts features from the fused features, the head position binary map label supervises the prediction of head positions, and the Hadamard product of the head position binary map produced by the supervised head position binary map generation module and the features obtained by the feature fusion module serves as the input of the crowd density map estimation module;
the generated head position binary image label is as follows:
wherein B () is a binarization function, l represents a position set of all human heads in the image, and l is a binary function i Coordinates representing the center of the ith target head position, δ () being a pulse function, G () being a gaussian kernel, σ i The variance of the gaussian sum is indicated.
5. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, carries out the steps of the crowd counting method based on a multi-branch dilated convolutional neural network as claimed in any one of claims 1 to 3.
6. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the crowd counting method based on a multi-branch dilated convolutional neural network as claimed in any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110354656.3A CN113052827B (en) | 2021-03-30 | 2021-03-30 | Crowd counting method and system based on multi-branch expansion convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110354656.3A CN113052827B (en) | 2021-03-30 | 2021-03-30 | Crowd counting method and system based on multi-branch expansion convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113052827A CN113052827A (en) | 2021-06-29 |
CN113052827B true CN113052827B (en) | 2022-12-27 |
Family
ID=76517105
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110354656.3A Active CN113052827B (en) | 2021-03-30 | 2021-03-30 | Crowd counting method and system based on multi-branch expansion convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113052827B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110135325A (en) * | 2019-05-10 | 2019-08-16 | 山东大学 | Crowd's number method of counting and system based on dimension self-adaption network |
CN110674704A (en) * | 2019-09-05 | 2020-01-10 | 同济大学 | Crowd density estimation method and device based on multi-scale expansion convolutional network |
CN111626184A (en) * | 2020-05-25 | 2020-09-04 | 齐鲁工业大学 | Crowd density estimation method and system |
CN111915627A (en) * | 2020-08-20 | 2020-11-10 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Semantic segmentation method, network, device and computer storage medium |
CN112101195A (en) * | 2020-09-14 | 2020-12-18 | 腾讯科技(深圳)有限公司 | Crowd density estimation method and device, computer equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109815867A (en) * | 2019-01-14 | 2019-05-28 | 东华大学 | A kind of crowd density estimation and people flow rate statistical method |
CN112232140A (en) * | 2020-09-25 | 2021-01-15 | 浙江远传信息技术股份有限公司 | Crowd counting method and device, electronic equipment and computer storage medium |
- 2021-03-30: Application CN202110354656.3A filed in China; granted as patent CN113052827B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN113052827A (en) | 2021-06-29 |
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant
- CP03: Change of name, title or address
  - Patentee after: Qilu University of Technology (Shandong Academy of Sciences), 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501, China
  - Patentee before: Qilu University of Technology, 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501, China