CN113052827B - Crowd counting method and system based on a multi-branch dilated convolutional neural network - Google Patents

Crowd counting method and system based on a multi-branch dilated convolutional neural network

Info

Publication number
CN113052827B (application CN202110354656.3A)
Authority
CN
China
Prior art keywords
image
crowd
head position
module
label
Prior art date
Legal status: Active
Application number
CN202110354656.3A
Other languages
Chinese (zh)
Other versions
CN113052827A (en)
Inventors
Zhang Youmei (张友梅)
Zhang Yu (张瑜)
Liu Weilong (刘伟龙)
Current Assignee
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date
Filing date
Publication date
Application filed by Qilu University of Technology
Priority to CN202110354656.3A
Publication of CN113052827A
Application granted; publication of CN113052827B
Current legal status: Active

Classifications

    • G06T 7/0002 — Image analysis; inspection of images, e.g. flaw detection
    • G06N 3/045 — Neural networks; combinations of networks
    • G06N 3/08 — Neural networks; learning methods
    • G06T 2207/10004 — Image acquisition modality; still image, photographic image
    • G06T 2207/20081 — Special algorithmic details; training, learning
    • G06T 2207/20084 — Special algorithmic details; artificial neural networks [ANN]
    • G06T 2207/30196 — Subject of image; human being, person
    • G06T 2207/30242 — Subject of image; counting objects in image


Abstract

The invention belongs to the field of computer vision and provides a crowd counting method and system based on a multi-branch dilated convolutional neural network. The method comprises the following steps: acquiring scene images containing crowds and generating, from each scene image, a crowd density map label and a head position binary map label; constructing a training set from the training samples, where each image together with its corresponding crowd density map label and head position binary map label forms one training sample; training a multi-branch dilated convolution crowd counting network model on the training set to obtain the optimal network parameters, yielding a trained multi-branch dilated convolution crowd counting network model; feeding the image to be tested into the trained model and outputting a crowd density map; and summing the pixel values of the crowd density map to obtain the crowd counting result.

Description

Crowd counting method and system based on a multi-branch dilated convolutional neural network
Technical Field
The invention belongs to the field of computer vision and in particular relates to a crowd counting method and system based on a multi-branch dilated convolutional neural network.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Crowd counting aims to estimate, in real time, the spatial distribution of the people appearing in an image or video and to count the number of people in the image or video data. In recent years, crowd counting has become a research hotspot in computer vision. Its main application field is intelligent security: providing crowd distribution and headcount in real time makes it possible to analyze and control pedestrian flow effectively and to prevent safety accidents.
Because of the influence of shooting angle and shooting distance, the scale of the target crowd varies greatly within an image or video, which poses a great challenge to crowd counting research.
Disclosure of Invention
To address the scale variation of target crowds, the invention provides a crowd counting method and system based on a multi-branch dilated convolutional neural network: a multi-branch dilated convolutional network with shared training parameters is designed to extract features with different receptive fields using fewer network parameters, and a supervised head position binary map guides the network to attend to head positions, achieving more accurate crowd counting.
To this end, the invention adopts the following technical scheme:
A first aspect of the invention provides a crowd counting method based on a multi-branch dilated convolutional neural network.
The crowd counting method based on the multi-branch dilated convolutional neural network comprises the following steps:
acquiring scene images containing crowds and generating, from each scene image, a crowd density map label and a head position binary map label;
constructing a training set from the training samples, where each image together with its corresponding crowd density map label and head position binary map label forms one training sample;
training a multi-branch dilated convolution crowd counting network model on the training set to obtain the optimal network parameters, yielding a trained multi-branch dilated convolution crowd counting network model;
feeding the image to be tested into the trained multi-branch dilated convolution crowd counting network model and outputting a crowd density map;
and summing the pixel values of the crowd density map to obtain the crowd counting result.
A second aspect of the invention provides a crowd counting system based on a multi-branch dilated convolutional neural network.
The crowd counting system based on the multi-branch dilated convolutional neural network comprises:
a label generation module configured to: acquire scene images containing crowds and generate, from each scene image, a crowd density map label and a head position binary map label;
a training set construction module configured to: construct a training set from the training samples, where each image together with its corresponding crowd density map label and head position binary map label forms one training sample, and several samples are combined into the training set;
a model training module configured to: train a multi-branch dilated convolution crowd counting network model on the training set to obtain the optimal network parameters, yielding a trained multi-branch dilated convolution crowd counting network model;
a crowd counting application module configured to: feed the image to be tested into the trained multi-branch dilated convolution crowd counting network model and output a crowd density map;
an output module configured to: sum the pixel values of the crowd density map to obtain the crowd counting result.
A third aspect of the invention provides a computer-readable storage medium.
A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, carrying out the steps of the crowd counting method based on a multi-branch dilated convolutional neural network according to the first aspect above.
A fourth aspect of the invention provides a computer apparatus.
A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing, when executing the program, the steps of the crowd counting method based on a multi-branch dilated convolutional neural network according to the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention uses dilated convolution to design three feature extraction branches. The extracted features have different receptive fields, which effectively copes with the head scale variation caused by shooting angle and shooting distance.
2. The invention shares network parameters across the dilated convolution branches, which effectively reduces the number of trainable parameters and speeds up network training.
3. The head position binary map estimation module designed in the invention, on the one hand, supervises and guides the network to extract more robust features and, on the other hand, helps the crowd density map estimation module localize head positions more accurately, enhancing counting accuracy.
4. The original image serves as the input of the multi-branch dilated convolution crowd counting network model, whose outputs comprise the head position binary map generated by the binary map estimation module and the crowd density map generated by the crowd density map estimation module. After supervised training, the binary map estimation module outputs a binary map that represents head positions and sizes; its Hadamard product with the fused features then serves as the input of the crowd density map estimation module, so that head positions are localized more accurately for density estimation and the counting error caused by target crowd scale variation is reduced.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and together with the description serve to explain, not limit, the invention.
FIG. 1 is a flow chart of the crowd counting method based on a multi-branch dilated convolutional neural network according to the present invention;
FIG. 2 is a flow chart of the crowd counting method in an embodiment;
FIG. 3 is a block diagram of the multi-branch dilated convolutional neural network in an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well unless the context clearly indicates otherwise, and the terms "comprises" and/or "comprising" specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
Example one
As shown in FIGS. 1-2, this embodiment provides a crowd counting method based on a multi-branch dilated convolutional neural network. The embodiment is illustrated with the method applied to a server; it will be understood that the method may also be applied to a terminal, or to a system comprising a terminal and a server and implemented through their interaction. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, tablet computer, laptop computer, desktop computer, smart speaker, or smart watch. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this application. In this embodiment, the method includes the following steps:
Step (1): acquiring scene images containing crowds and generating, from each scene image, a crowd density map label and a head position binary map label.
Specifically, scene images containing crowds are acquired and the head positions in each image are annotated;
a crowd density map label is generated from the annotated head positions;
and a head position binary map label is generated from the crowd density map label using a binarization function.
In an example, the server may obtain scene images of crowds and annotate the head position in each image. The scene images may be captured by cameras used for scene monitoring, e.g. subway monitoring or shopping mall monitoring.
In step (1), the crowd density map label is generated with fixed-size Gaussian kernels: each head position is covered with a fixed-size Gaussian kernel whose values sum to 1. For the head position binary map, the head size is first computed with a nearest-neighbour algorithm, then the pixels at head positions are set to 1 and all other positions to 0, so the head position binary map label carries head size information.
Specifically, the crowd density map label is generated from the annotated head positions as in Equation (1):

$$D(x)=\sum_{l_i\in l}\delta(x-l_i)*G_{\sigma_i}(x) \qquad (1)$$

where l denotes the set of all head positions in the image and l_i the coordinates of the centre of the i-th target head; δ(·) is the impulse function, G(·) a Gaussian kernel, and σ_i the variance of the Gaussian, set to 8 in this embodiment. That is, each head position is covered with a Gaussian kernel whose values sum to 1 and whose variance is 8, and non-head positions are set to 0.
Specifically, the head position binary map label is generated from the annotated head positions as in Equation (2):

$$M(x)=B\!\left(\sum_{l_i\in l}\delta(x-l_i)*G_{\sigma}(x)\right) \qquad (2)$$

where B(·) is the binarization function and the variables and functions shared with Equation (1) have the same meaning. That is, the head position binary map label is generated by first following the crowd density map procedure with the Gaussian variance set to 15, and then converting the result into a binary map with the binarization function: non-zero pixels are reset to 1 and the rest are 0.
Step (2): constructing a training set from the training samples; each image together with its corresponding crowd density map label and head position binary map label forms one training sample.
Constructing the training set comprises:
performing data augmentation on each training sample by random cropping, mirroring, and rotation.
Specifically, random cropping, mirroring, and rotation are used for data augmentation. First, 50 image blocks whose height and width are multiples of 32 and smaller than the original image size are randomly cropped; then the 50 image blocks are horizontally and vertically mirrored, respectively, yielding 150 image blocks; finally the 150 image blocks are each rotated by 15 degrees, yielding 300 image blocks. The same operations are applied to the crowd density map labels and head position binary map labels.
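The augmentation above can be sketched as below. This is a hedged NumPy sketch: the function names and RNG plumbing are assumptions, it assumes images of at least 64x64, and the 15-degree rotation (which needs an interpolating rotate such as `scipy.ndimage.rotate`) is only indicated in a comment to keep the sketch dependency-free. The key point is that the identical crop/flip is applied to the image and both labels so they stay aligned.

```python
import numpy as np

def random_crop(img, dmap, bmap, rng):
    """Crop a block whose side lengths are multiples of 32 and strictly
    smaller than the original image; the same window is cut from both labels."""
    h, w = img.shape[:2]
    ch = 32 * int(rng.integers(1, h // 32))
    cw = 32 * int(rng.integers(1, w // 32))
    y = int(rng.integers(0, h - ch + 1))
    x = int(rng.integers(0, w - cw + 1))
    win = (slice(y, y + ch), slice(x, x + cw))
    return img[win], dmap[win], bmap[win]

def mirror_pair(arr):
    """Horizontal and vertical mirrors of one array.
    A 15-degree rotation (e.g. scipy.ndimage.rotate) would follow the same
    pattern: apply it identically to the image block and its labels."""
    return np.flip(arr, axis=1), np.flip(arr, axis=0)
```

Cropping 50 blocks and then taking both mirrors of each reproduces the 50 → 150 expansion described above; rotating each of those gives 300.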
Step (3): training a multi-branch dilated convolution crowd counting network model on the training set to obtain the optimal network parameters, yielding a trained multi-branch dilated convolution crowd counting network model.
The multi-branch dilated convolution crowd counting network model comprises a multi-branch dilated convolution module, a feature fusion module, a binary map estimation module, and a density map estimation module.
Considering that dilated convolution obtains features with a larger receptive field using fewer parameters, this embodiment designs a multi-branch convolution module that extracts multi-scale features with three parameter-sharing dilated convolution branches, and designs a head position binary map estimation module to enhance the features at head positions, thereby helping the crowd density map estimation module localize head positions more accurately and improving counting accuracy.
In one embodiment, the multi-branch convolution module comprises three dilated convolution branches that share network parameters but have different dilation rates, used for multi-scale feature extraction from crowd images. The feature fusion module fuses the features of the three dilated convolution branches and then extracts features from the fused result to generate a feature map. The binary map estimation module estimates the binary map under supervision with a cross-entropy loss function. The density map estimation module receives the output of the binary map estimation module, computes its Hadamard product with the feature map generated by the feature fusion module, and then estimates the crowd density map with a three-layer convolution operation under supervision with a cross-entropy loss function.
Specifically, FIG. 3 shows the crowd counting network based on the multi-branch dilated convolutional network. As shown in FIG. 3, the multi-branch dilated convolution module first extracts multi-scale features from a crowd image block. The module comprises three branches; each branch uses 3x3 convolution kernels, the parameters are shared, and the three branches are set to dilation rates of 1, 2, and 3, respectively. With this arrangement the network extracts features with different receptive fields using fewer parameters and effectively copes with head scale variation.
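The parameter-sharing arrangement can be sketched as below. This is a hedged PyTorch sketch (the class name, channel counts, and initialization are assumptions, and each real branch stacks several such layers): one 3x3 weight tensor is reused at dilation rates 1, 2, and 3, so three receptive fields cost the parameters of a single branch. With padding equal to the dilation rate, all three outputs keep the input's spatial size.

```python
import torch
import torch.nn.functional as F
from torch import nn

class SharedDilatedBranches(nn.Module):
    """One shared set of 3x3 conv weights, applied at several dilation
    rates to produce features with different receptive fields."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 3)):
        super().__init__()
        self.rates = rates
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x):
        # padding = dilation keeps the spatial size for a 3x3 kernel
        return [F.conv2d(x, self.weight, self.bias, padding=r, dilation=r)
                for r in self.rates]
```

The parameter count is that of a single 3x3 convolution, however many dilation rates are used, which is the saving the description attributes to sharing.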
Then the feature fusion module fuses the features extracted by the different branches: specifically, the three feature maps are summed, a 1x1 convolution reduces the dimensionality, and a 3x3 convolution extracts further features.
after the characteristics are fused, the characteristics are divided into two paths which are respectively input into a head position binary image generation module and a crowd density estimation module. The human head position binary image generation module further extracts features based on the fused features, the human head position binary image label is used for supervising and predicting the position of the human head, and the supervised training can enable the network to extract more stable features; in addition, the head position binary image generated by the head position binary image generation module after supervised training and the feature Hadamard product obtained by the feature fusion module are used as the input of the crowd density image estimation module, so that the auxiliary crowd density image estimation module can more accurately position the head position and estimate the crowd density.
Step (4): feeding the image to be tested into the trained multi-branch dilated convolution crowd counting network model and outputting a crowd density map.
Step (5): summing the pixel values of the crowd density map to obtain the crowd counting result.
For a test image, the image is fed into the trained multi-branch dilated convolution crowd counting network model, a crowd density map is estimated for the newly received image data, and finally the pixel values of the output crowd density map are summed to obtain the predicted number of people in the image.
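Steps (4)-(5) reduce to a single summation at inference time. A minimal sketch follows; the function name is an assumption, and rounding to the nearest integer is a common convention rather than something the text mandates.

```python
import numpy as np

def count_from_density_map(density_map):
    """Predicted head count = sum of all density map pixel values."""
    return int(round(float(np.sum(density_map))))
```

Because every training-label Gaussian sums to 1, the network's output density map integrates to (approximately) the number of heads, so the sum is the count.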
Example two
This embodiment provides a crowd counting system based on a multi-branch dilated convolutional neural network.
The crowd counting system based on the multi-branch dilated convolutional neural network comprises:
a label generation module configured to: acquire scene images containing crowds and generate, from each scene image, a crowd density map label and a head position binary map label;
a training set construction module configured to: construct a training set from the training samples, where each image together with its corresponding crowd density map label and head position binary map label forms one training sample;
a model training module configured to: train a multi-branch dilated convolution crowd counting network model on the training set to obtain the optimal network parameters, yielding a trained multi-branch dilated convolution crowd counting network model;
a crowd counting application module configured to: feed the image to be tested into the trained multi-branch dilated convolution crowd counting network model and output a crowd density map;
an output module configured to: sum the pixel values of the crowd density map to obtain the crowd counting result.
The multi-branch dilated convolution crowd counting network model comprises a multi-branch dilated convolution module, a feature fusion module, a binary map estimation module, and a crowd density map estimation module. The multi-branch dilated convolution module consists of three dilated convolution branches with different dilation rates; the feature fusion module consists of a feature summation layer and convolution layers; the binary map estimation module and the crowd density map estimation module each consist of three convolution layers. The original image serves as the input of the multi-branch dilated convolution crowd counting network model, whose outputs comprise the head position binary map generated by the binary map estimation module and the crowd density map generated by the crowd density map estimation module. After supervised training, the binary map estimation module outputs a binary map that represents head positions and sizes; its Hadamard product with the fused features serves as the input of the crowd density map estimation module, helping it localize head positions more accurately for density estimation.
Illustratively, the multi-branch dilated convolution module consists of three convolution branches with dilation rates of 1, 2, and 3 and 3x3 convolution kernels; each branch comprises four convolution layers, the first two of which are followed by max pooling. The feature fusion module first sums the features of the three dilated convolution branches, then reduces the dimensionality with a 1x1 convolution, and extracts further features with a 3x3 convolution. The binary map estimation module comprises three convolution layers and estimates the binary map under cross-entropy-loss supervision. The crowd density map estimation module first receives the output of the binary map estimation module and computes its Hadamard product with the feature map generated by the feature fusion module, and then estimates the crowd density map with a three-layer convolution operation under cross-entropy-loss supervision.
Example three
This embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the crowd counting method based on a multi-branch dilated convolutional neural network described in Embodiment One above.
Example four
This embodiment provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the steps of the crowd counting method based on a multi-branch dilated convolutional neural network described in Embodiment One.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer readable storage medium and executed by a computer to implement the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. The crowd counting method based on the multi-branch expansion convolution neural network is characterized by comprising the following steps:
acquiring a scene image containing crowds and respectively generating a crowd density map label and a head position binary map label according to the scene image;
constructing a training set from training samples, wherein each image together with its corresponding crowd density map label and head position binary map label serves as one training sample;
training a multi-branch expansion convolution crowd counting network model on the training set to obtain optimal network parameters, thereby producing a trained multi-branch expansion convolution crowd counting network model;
inputting an image to be detected into the trained multi-branch expansion convolution crowd counting network model and outputting a crowd density map;
summing the pixel values of the crowd density map to obtain the crowd counting result;
the multi-branch expansion convolution crowd counting network model comprises: a multi-branch convolution module, a feature fusion module, a binary map estimation module and a density map estimation module;
the multi-branch convolution module comprises three expansion (dilated) convolution branches that share network parameters but use different expansion rates, and performs multi-scale feature extraction on the crowd image;
the binary map estimation module estimates the head position binary map under the supervision of a cross-entropy loss function;
the density map estimation module receives the output of the binary map estimation module, computes the Hadamard (element-wise) product of that output and the feature map generated by the feature fusion module, and then estimates the crowd density map through a three-layer convolution operation under the supervision of a cross-entropy loss function;
the feature fusion module fuses the features of the three expansion convolution branches and then performs feature extraction on the fused features to generate a feature map; after fusion, the features are split into two paths that are fed to the head position binary map generation module and the crowd density estimation module, respectively; the head position binary map generation module further extracts features from the fused features, with the head position binary map label supervising the prediction of head positions; the head position binary map generated after this supervised training is combined with the features from the feature fusion module via a Hadamard product, and the result serves as the input of the crowd density map estimation module;
the generated head position binary map label is:

B^{gt}(x) = B( Σ_{l_i ∈ l} δ(x − l_i) * G_{σ_i}(x) )

wherein B(·) is a binarization function, l denotes the set of all head positions in the image, l_i denotes the coordinates of the center of the i-th head, δ(·) is the impulse (Dirac delta) function, * denotes convolution, G(·) is a Gaussian kernel, and σ_i denotes the variance of the Gaussian kernel.
2. The method according to claim 1, wherein acquiring a scene image containing a crowd and generating a crowd density map label and a head position binary map label from the scene image comprises:
acquiring scene images containing crowds and marking the head positions in each image;
generating a crowd density map label from the marked head positions; and
generating a head position binary map label from the crowd density map label using a binarization function.
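Claims 1-2 describe the label pipeline: annotate head centers, place a Gaussian kernel on each center to obtain the density map label, then binarize that map. A minimal pure-Python sketch of that pipeline, assuming a single fixed sigma for every head rather than the per-head σ_i of the formula, and a hypothetical threshold for the binarization function B:

```python
import math

def density_map_label(shape, heads, sigma=1.5):
    """Crowd density map label: one grid-normalized 2-D Gaussian per
    annotated head center, so the whole map sums to the head count."""
    h, w = shape
    dmap = [[0.0] * w for _ in range(h)]
    for (cy, cx) in heads:
        bump = [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
                 for x in range(w)] for y in range(h)]
        norm = sum(sum(row) for row in bump)  # normalize over the grid
        for y in range(h):
            for x in range(w):
                dmap[y][x] += bump[y][x] / norm
    return dmap

def binary_map_label(dmap, threshold=1e-3):
    """Binarization function B(): mark pixels where the density map is
    above a (hypothetical) activity threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in dmap]
```

Because each Gaussian bump is normalized to integrate to 1, summing the density map's pixel values recovers the head count, which is exactly the counting step of claim 1.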
3. The method according to claim 1, wherein constructing a training set from the training samples comprises:
performing data augmentation on each training sample by random cropping, mirroring and rotation.
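Claim 3's augmentation (random cropping, mirroring, rotation) must apply the identical spatial transform to the image and to both of its labels, or the annotations drift off the heads. A hedged sketch with illustrative function names, using 90-degree rotation as a stand-in for arbitrary rotation:

```python
import random

def mirror(img):
    """Horizontal flip."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate 90 degrees clockwise."""
    return [list(r) for r in zip(*img[::-1])]

def random_crop(img, ch, cw, rng):
    """Crop a ch x cw window at a random position; call once to get
    offsets shared by image and labels in a real pipeline."""
    y = rng.randrange(len(img) - ch + 1)
    x = rng.randrange(len(img[0]) - cw + 1)
    return [row[x:x + cw] for row in img[y:y + ch]]

def augment(sample, rng):
    """Draw one random set of transforms, then apply it to every plane
    of an (image, density_label, binary_label) training sample."""
    ops = []
    if rng.random() < 0.5:
        ops.append(mirror)
    if rng.random() < 0.5:
        ops.append(rotate90)
    out = []
    for plane in sample:
        for op in ops:
            plane = op(plane)
        out.append(plane)
    return out
```

The key design point is that the transform set is drawn once per sample and reused for all three planes, keeping image and labels spatially aligned.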
4. A crowd counting system based on a multi-branch expansion convolutional neural network, characterized by comprising:
a tag generation module configured to: acquire a scene image containing a crowd, and generate a crowd density map label and a head position binary map label from the scene image;
a training set construction module configured to: construct a training set from training samples, wherein each image together with its corresponding crowd density map label and head position binary map label serves as one training sample;
a model training module configured to: train a multi-branch expansion convolution crowd counting network model on the training set to obtain optimal network parameters, thereby producing a trained multi-branch expansion convolution crowd counting network model;
a crowd counting application module configured to: input an image to be detected into the trained multi-branch expansion convolution crowd counting network model and output a crowd density map;
an output module configured to: sum the pixel values of the crowd density map to obtain the crowd counting result;
the multi-branch expansion convolution crowd counting network model comprises: a multi-branch convolution module, a feature fusion module, a binary map estimation module and a density map estimation module;
the multi-branch convolution module comprises three expansion (dilated) convolution branches that share network parameters but use different expansion rates, and performs multi-scale feature extraction on the crowd image;
the binary map estimation module estimates the head position binary map under the supervision of a cross-entropy loss function;
the density map estimation module receives the output of the binary map estimation module, computes the Hadamard (element-wise) product of that output and the feature map generated by the feature fusion module, and then estimates the crowd density map through a three-layer convolution operation under the supervision of a cross-entropy loss function;
the feature fusion module fuses the features of the three expansion convolution branches and then performs feature extraction on the fused features to generate a feature map; after fusion, the features are split into two paths that are fed to the head position binary map generation module and the crowd density estimation module, respectively; the head position binary map generation module further extracts features from the fused features, with the head position binary map label supervising the prediction of head positions; the head position binary map generated after this supervised training is combined with the features from the feature fusion module via a Hadamard product, and the result serves as the input of the crowd density map estimation module;
the generated head position binary map label is:

B^{gt}(x) = B( Σ_{l_i ∈ l} δ(x − l_i) * G_{σ_i}(x) )

wherein B(·) is a binarization function, l denotes the set of all head positions in the image, l_i denotes the coordinates of the center of the i-th head, δ(·) is the impulse (Dirac delta) function, * denotes convolution, G(·) is a Gaussian kernel, and σ_i denotes the variance of the Gaussian kernel.
5. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the steps of the crowd counting method based on a multi-branch expansion convolutional neural network as claimed in any one of claims 1 to 3.
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the crowd counting method based on a multi-branch expansion convolutional neural network as claimed in any one of claims 1 to 3.
CN202110354656.3A 2021-03-30 2021-03-30 Crowd counting method and system based on multi-branch expansion convolutional neural network Active CN113052827B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110354656.3A CN113052827B (en) 2021-03-30 2021-03-30 Crowd counting method and system based on multi-branch expansion convolutional neural network


Publications (2)

Publication Number Publication Date
CN113052827A CN113052827A (en) 2021-06-29
CN113052827B true CN113052827B (en) 2022-12-27

Family

ID=76517105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110354656.3A Active CN113052827B (en) 2021-03-30 2021-03-30 Crowd counting method and system based on multi-branch expansion convolutional neural network

Country Status (1)

Country Link
CN (1) CN113052827B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135325A (en) * 2019-05-10 2019-08-16 Shandong University Crowd counting method and system based on scale-adaptive network
CN110674704A (en) * 2019-09-05 2020-01-10 Tongji University Crowd density estimation method and device based on multi-scale expansion convolutional network
CN111626184A (en) * 2020-05-25 2020-09-04 Qilu University of Technology Crowd density estimation method and system
CN111915627A (en) * 2020-08-20 2020-11-10 Institute of Artificial Intelligence, Hefei Comprehensive National Science Center (Anhui Provincial Artificial Intelligence Laboratory) Semantic segmentation method, network, device and computer storage medium
CN112101195A (en) * 2020-09-14 2020-12-18 Tencent Technology (Shenzhen) Co., Ltd. Crowd density estimation method and device, computer equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815867A (en) * 2019-01-14 2019-05-28 Donghua University Crowd density estimation and pedestrian flow statistics method
CN112232140A (en) * 2020-09-25 2021-01-15 Zhejiang Yuanchuan Information Technology Co., Ltd. Crowd counting method and device, electronic equipment and computer storage medium


Also Published As

Publication number Publication date
CN113052827A (en) 2021-06-29

Similar Documents

Publication Publication Date Title
CN109446889B (en) Object tracking method and device based on twin matching network
US20230081645A1 (en) Detecting forged facial images using frequency domain information and local correlation
Raza et al. Appearance based pedestrians’ head pose and body orientation estimation using deep learning
CN112597941B (en) Face recognition method and device and electronic equipment
US20170161591A1 (en) System and method for deep-learning based object tracking
CN109840477B (en) Method and device for recognizing shielded face based on feature transformation
CN112348828B (en) Instance segmentation method and device based on neural network and storage medium
CN111626184B (en) Crowd density estimation method and system
CN111539290B (en) Video motion recognition method and device, electronic equipment and storage medium
CN106709461A (en) Video based behavior recognition method and device
CN114331829A (en) Countermeasure sample generation method, device, equipment and readable storage medium
JP7292492B2 (en) Object tracking method and device, storage medium and computer program
CN111652181B (en) Target tracking method and device and electronic equipment
Sengar et al. Motion detection using block based bi-directional optical flow method
CN112418195B (en) Face key point detection method and device, electronic equipment and storage medium
Ma et al. Fusioncount: Efficient crowd counting via multiscale feature fusion
JP2023131117A (en) Joint perception model training, joint perception method, device, and medium
CN115018039A (en) Neural network distillation method, target detection method and device
CN114581918A (en) Text recognition model training method and device
CN114170290A (en) Image processing method and related equipment
CN115311518A (en) Method, device, medium and electronic equipment for acquiring visual attribute information
CN111353429A (en) Interest degree method and system based on eyeball turning
CN114677611B (en) Data identification method, storage medium and device
CN111914809B (en) Target object positioning method, image processing method, device and computer equipment
CN113052827B (en) Crowd counting method and system based on multi-branch expansion convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501

Patentee after: Qilu University of Technology (Shandong Academy of Sciences)

Country or region after: China

Address before: 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501

Patentee before: Qilu University of Technology

Country or region before: China