CN111626134A - Dense crowd counting method, system and terminal based on hidden density distribution - Google Patents

Info

Publication number
CN111626134A
Authority
CN
China
Prior art keywords
density
gaussian
hidden
map
dense
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010349623.5A
Other languages
Chinese (zh)
Other versions
CN111626134B (en
Inventor
杨华
高宇康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202010349623.5A priority Critical patent/CN111626134B/en
Publication of CN111626134A publication Critical patent/CN111626134A/en
Application granted granted Critical
Publication of CN111626134B publication Critical patent/CN111626134B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a dense crowd counting method, system and terminal based on hidden density distribution. The method comprises: obtaining an adaptive hidden Gaussian density map from a crowd point map through a Gaussian network; guiding the optimization of the hidden Gaussian density map with a count loss term, a smoothing term and a Bayesian term so that the generated map is of higher quality; taking the hidden Gaussian density map as the training target and, combining an adversarial loss function and a Bayesian loss function, mapping the dense crowd image to a predicted density distribution map; and summing the predicted density distribution map to obtain the predicted number of people. The density predictor, the hidden Gaussian density generator and the discriminator are trained alternately and optimized cooperatively. The invention improves accuracy and robustness, and is of practical value because it does not increase the number of parameters or the amount of computation in the inference stage.

Description

Dense crowd counting method, system and terminal based on hidden density distribution
Technical Field
The invention relates to the technical field of computer vision, in particular to a dense crowd counting method, a dense crowd counting system and a dense crowd counting terminal based on hidden density distribution.
Background
With the rapid growth of the world population and accelerating urbanization, accurately counting people in high-density crowds, so that warnings can be issued in time and the flow of people can be controlled and directed effectively, has become an important research problem. Most existing methods extract image features with a multi-layer convolutional neural network and regress the count.
However, existing methods often suffer from low-quality generated density maps, inaccurate prediction in high-density regions, high parameter redundancy, a large amount of computation, and poor generalization because hyper-parameters must be tuned manually for each scene. In practical applications, a model is expected to save storage and computing resources while keeping good prediction accuracy, and to remain robust across different scenes.
A prior-art search found Chinese patent application No. 201810986919.0, which discloses a dense crowd counting method and apparatus: an image to be detected that contains human figures is obtained and input into a convolutional neural network model to obtain a crowd density map, and the number of human figures in the image is determined from that density map. This process extracts feature information from the image to be detected, achieves crowd counting and density estimation, and benefits downstream applications such as safety monitoring and crowd management. However, its counting accuracy is low, it struggles with cross-scene, cross-scale and cross-density-level challenges, and its hyper-parameters must be tuned manually for each scene, which limits its use in real scenes.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a dense crowd counting method, system and terminal based on hidden density distribution that improve performance and solve crowd counting adaptively across a variety of scenes.
According to a first aspect of the present invention, there is provided a dense crowd counting method based on hidden density distribution, comprising:
acquiring the dense crowd coordinate data of a dense crowd image $I_c(x, y)$ and converting it into a dense crowd point map $D_t(x, y)$;
passing the dense crowd point map $D_t(x, y)$ through a hidden Gaussian density generator to obtain an adaptive hidden Gaussian density map $G(x, y)$;
taking the hidden Gaussian density map $G(x, y)$ as the learning target of a density predictor, and constraining the generation target with a multi-level loss function;
inputting the dense crowd image $I_c(x, y)$ into the density predictor, which outputs a density prediction map $D_p(x, y)$;
summing all pixel values of the density prediction map $D_p(x, y)$ to obtain the final predicted number of people.
Optionally, passing the dense crowd point map $D_t(x, y)$ through the hidden Gaussian density generator to obtain the adaptive hidden Gaussian density map $G(x, y)$ comprises:
the hidden Gaussian density generator uses a Gaussian network to convolve the dense crowd point map $D_t(x, y)$ with N Gaussian kernels K having different variance σ values, obtaining first feature maps that carry information at different scales, and applies the same convolution operation to the first feature maps to obtain second feature maps;
features are extracted from the second feature maps by a plurality of mask Gaussian convolution modules, whose initialization parameters are constrained by a Gaussian envelope so that the second feature maps are decomposed into features of different levels; the features of different levels of each pair of adjacent mask Gaussian convolution modules are added in sequence by a residual operation to obtain more robust features;
the features pass through a decoding module formed by multiple convolution layers, in which the number of output channels of each layer is smaller than its number of input channels, to finally obtain the hidden Gaussian density map $D_s(x, y)$.
Optionally, constraining the generation target with a multi-level loss function comprises:
using a mean square error loss function to constrain, pixel by pixel, the density prediction map $D_p(x, y)$ output by the density predictor so that its distribution stays similar to that of the hidden Gaussian density map $D_s(x, y)$;
using a Bayesian loss function to constrain, over the range of each pedestrian point, the density prediction map $D_p(x, y)$ output by the density predictor so that it stays close to the probability distribution of the manually annotated pedestrian coordinates;
using an adversarial loss function, with a discriminator judging whether the predicted density map $D_p(x, y)$ is real or generated, so that the generated $D_p(x, y)$ retains more high-frequency information, i.e. prediction accuracy in dense regions is improved.
Optionally, inputting the dense crowd image $I_c(x, y)$ into the density predictor and outputting the density prediction map $D_p(x, y)$ comprises:
taking a pre-trained VggNet as the feature extraction network, inputting the dense crowd image $I_c(x, y)$ into the VggNet network to obtain a feature map, upsampling the feature map, and passing it through multiple convolution layers to obtain the output density prediction map $D_p(x, y)$.
Optionally, the discriminator parameters are updated with an LSGAN loss function according to the density prediction map $D_p(x, y)$ output by the density predictor and the hidden Gaussian density map $G(x, y)$.
Optionally, the method further comprises optimizing the hidden Gaussian density map $G(x, y)$ to generate a higher-quality hidden Gaussian density map, where the optimization uses the count loss term or the Bayesian term alone, the smoothing term together with the count loss term, or the smoothing term together with the Bayesian term; wherein:
a count loss term is designed, using the L1 distance to constrain the total count of the hidden Gaussian density map $G(x, y)$ to be close to the total number of annotated people;
a smoothing term is designed, using a smoothing loss function to constrain each pixel of the hidden Gaussian density map $G(x, y)$ to be coherent with its surrounding pixels;
a Bayesian term is designed which, through Gaussian modeling of the foreground points and the background, constrains the probability distribution of the hidden Gaussian density map $G(x, y)$ to be consistent with that of the manually annotated points in the training data, reducing the interference of background noise regions with the target crowd regions;
the higher-quality hidden Gaussian density map is taken as the learning target of the density predictor.
Optionally, the method further comprises updating the parameters of the hidden Gaussian density generator according to the smoothing term, the Bayesian term and the count error term of the target loss function.
Optionally, the density map predictor is updated according to the mean square error term computed against the output of the hidden Gaussian density generator, the adversarial loss term fed back by the discriminator, and the Bayesian loss term.
According to a second aspect of the present invention, there is provided a dense population counting system based on hidden density distribution, comprising:
a dense crowd point map acquisition module, which acquires the dense crowd coordinate data of a dense crowd image $I_c(x, y)$ and converts it into a dense crowd point map $D_t(x, y)$;
a hidden Gaussian density generator, which obtains an adaptive hidden Gaussian density map $G(x, y)$ from the dense crowd point map $D_t(x, y)$;
a density map predictor, which takes the hidden Gaussian density map $G(x, y)$ as its learning target, with the generation target constrained by a multi-level loss function, and which maps the dense crowd image $I_c(x, y)$ to a density prediction map $D_p(x, y)$;
a people number prediction module, which sums all pixel values of the density prediction map $D_p(x, y)$ to obtain the final predicted number of people.
According to a third aspect of the present invention, there is provided an electronic terminal, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the above dense crowd counting method based on implicit density distribution when executing the computer program.
Compared with the prior art, the invention has at least one of the following beneficial effects:
The method, system and terminal obtain an adaptive hidden Gaussian density map from the crowd point map through a Gaussian network; this map is of higher quality and is better suited to the learning and optimization of the predictor network. In addition, the multi-level loss function optimizes the learning target and improves accuracy without increasing the model parameters or the amount of computation.
The method, system and terminal extract more robust features with the pre-trained VggNet network, which improves accuracy and robustness.
The method, system and terminal guide the optimization of the hidden Gaussian density map with the count loss term, the smoothing term and the Bayesian term, constraining the learning target at multiple scales (pixel by pixel, pedestrian point by pedestrian point, and image patch by image patch) so that the generated map is of higher quality.
The method, system and terminal train the density predictor, the hidden Gaussian density generator and the discriminator alternately and optimize them cooperatively, forming a cooperative learning framework that further improves accuracy; because the number of parameters and the amount of computation in the inference stage do not increase, the method has strong practical value.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of a dense population counting method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a dense population counting method in a preferred embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the effect of the dense crowd counting method according to an embodiment of the invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
FIG. 1 illustrates a dense crowd counting method based on hidden density distribution according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
S100, acquiring the dense crowd coordinate data of a dense crowd image $I_c(x, y)$ and converting it into a dense crowd point map $D_t(x, y)$;
S200, passing the dense crowd point map $D_t(x, y)$ through a hidden Gaussian density generator to obtain an adaptive hidden Gaussian density map $G(x, y)$;
S300, taking the hidden Gaussian density map $G(x, y)$ as the learning target of a density predictor, and constraining the generation target with a multi-level loss function;
S400, inputting the dense crowd image $I_c(x, y)$ into the density predictor, which outputs a density prediction map $D_p(x, y)$;
S500, summing all pixel values of the density prediction map $D_p(x, y)$ to obtain the final predicted number of people.
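Step S500 reduces to a single sum over the predicted map. A minimal sketch in plain Python, assuming the density map is held as a list of rows; the function name `count_from_density` and the toy values are illustrative and do not come from the patent:

```python
def count_from_density(density_map):
    """Return the predicted head count: the sum of all pixel values of D_p(x, y)."""
    return sum(sum(row) for row in density_map)


# Toy 2x3 density map; a real map would come from the density predictor.
toy_map = [
    [0.5, 0.25, 0.0],
    [0.75, 1.0, 0.0],
]
predicted_people = count_from_density(toy_map)
```

Because the count is a plain sum, it does not depend on where the density mass sits in the map, only on its total.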
According to the embodiment of the invention, the adaptive hidden Gaussian density map is obtained through the Gaussian network according to the crowd point map, so that the quality is higher, and the network learning and optimization of a predictor are facilitated. Meanwhile, by adopting a multi-level loss function, the learning target is optimized and the precision of the method is improved under the condition that model parameters and calculated amount are not increased.
FIG. 2 is a schematic diagram of the dense crowd counting method in a preferred embodiment of the present invention. In the figure, the hidden Gaussian density generation network serves as the hidden Gaussian density generator, the density prediction network as the density map predictor, and the discrimination network as the discriminator. The crowd annotation information is converted into a point map, an initial density map is obtained by Gaussian convolution, and this density map is concatenated with the dense crowd coordinate data (coordinate map) and input into the Gaussian convolution modules to obtain the hidden Gaussian density map, which serves as the learning target of the density predictor. The density predictor consists of a VggNet that extracts high-dimensional features and an upsampling decoding layer that restores the details of the density map. The discriminator constrains the output of the density predictor so that more high-frequency features are retained.
In another embodiment, the present invention further provides a dense crowd counting system based on hidden density distribution, which can be used to implement the above method. It includes a dense crowd point map acquisition module, a hidden Gaussian density generator, a density map predictor and a people number prediction module, wherein: the dense crowd point map acquisition module acquires the dense crowd coordinate data of the dense crowd image $I_c(x, y)$ and converts it into the dense crowd point map $D_t(x, y)$; the hidden Gaussian density generator obtains an adaptive hidden Gaussian density map $G(x, y)$ from the dense crowd point map $D_t(x, y)$; the density map predictor takes the hidden Gaussian density map $G(x, y)$ as its learning target, with the generation target constrained by a multi-level loss function, and maps the dense crowd image $I_c(x, y)$ to the density prediction map $D_p(x, y)$; and the people number prediction module sums all pixel values of the density prediction map $D_p(x, y)$ to obtain the final predicted number of people.
In order to better illustrate the implementation of the technical solution of the present invention, a specific application example of the dense population counting method based on implicit density distribution is given below, and the specific operation steps may include:
S101, acquiring a dense crowd image $I_c(x, y)$;
In this embodiment, the original image set may include three-channel color images or single-channel grayscale images.
S102, acquiring dense crowd coordinate data and converting it into a dense crowd point map $D_t(x, y)$;
In this embodiment, the crowd point map is an image scaled to 1/8 the size of the dense crowd image, in which pixels where a pedestrian is present have value 1 and all other pixels have value 0.
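The conversion in S102 can be sketched as follows. This is only an illustration under stated assumptions: annotations are taken to be (x, y) pixel coordinates in the full-resolution image, integer division maps them to the 1/8-scale grid, and the function name `make_point_map` is hypothetical.

```python
def make_point_map(coords, image_height, image_width, scale=8):
    """Build the point map D_t at 1/scale resolution.

    coords: iterable of (x, y) head positions in full-resolution pixels.
    Cells that contain at least one annotated pedestrian are set to 1,
    all other cells stay 0, as described in the text.
    """
    h, w = image_height // scale, image_width // scale
    point_map = [[0] * w for _ in range(h)]
    for x, y in coords:
        col = min(int(x) // scale, w - 1)  # clamp to the last column
        row = min(int(y) // scale, h - 1)  # clamp to the last row
        point_map[row][col] = 1
    return point_map


heads = [(3, 5), (20, 9), (63, 31)]
d_t = make_point_map(heads, image_height=32, image_width=64)
```

Note that with binary cell values, two annotations falling into the same 1/8-scale cell collapse into a single 1 in this sketch; the source does not say how such collisions are handled.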
S103, obtaining the hidden Gaussian density map $G(x, y)$ from the crowd point map $D_t(x, y)$.
In this embodiment, the dense crowd point map $D_t(x, y)$ is convolved with 18 Gaussian kernels K whose variance σ values are sampled from three normal distributions: $\sigma_1 \sim N(0.5, 0.02)$, $\sigma_2 \sim N(1, 0.02)$ and $\sigma_3 \sim N(1.5, 0.02)$. This yields first feature maps carrying information at different scales; the same convolution operation is then applied to the first feature maps to obtain the second feature maps.
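The first Gaussian layer can be sketched in dependency-free Python as below. Several details are assumptions rather than statements of the patent: the second parameter of each normal distribution (0.02) is treated as a standard deviation, the 18 kernels are split evenly as six per distribution, the kernel radius is 3, and zero padding is used at the borders. All function names are illustrative.

```python
import math
import random


def gaussian_kernel(sigma, radius=3):
    """Return a normalized (2*radius+1) x (2*radius+1) Gaussian kernel."""
    size = 2 * radius + 1
    kernel = [
        [math.exp(-((i - radius) ** 2 + (j - radius) ** 2) / (2 * sigma ** 2))
         for j in range(size)]
        for i in range(size)
    ]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def convolve_same(image, kernel):
    """Zero-padded 'same' 2-D convolution, written as a scatter from non-zero pixels."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if image[y][x] == 0:
                continue  # a point map is sparse: only annotated pixels contribute
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        out[yy][xx] += image[y][x] * kernel[dy + r][dx + r]
    return out


def sample_sigmas(rng, means=(0.5, 1.0, 1.5), spread=0.02, per_mean=6):
    """Draw per_mean sigma values around each mean (18 in total by default)."""
    return [abs(rng.gauss(m, spread)) for m in means for _ in range(per_mean)]


rng = random.Random(0)
sigmas = sample_sigmas(rng)
point_map = [[0] * 15 for _ in range(15)]
point_map[7][7] = 1  # one annotated person in the centre
first_feature_maps = [convolve_same(point_map, gaussian_kernel(s)) for s in sigmas]
```

Each kernel is normalized, so an annotated point that lies far enough from the border contributes a total mass of 1 to every feature map, whatever its σ.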
In this embodiment, a mask Gaussian convolution module is designed to extract further features from the feature map: the initialization parameters W are constrained by a Gaussian envelope G, so that the feature map X is decomposed into features O of different levels. The original feature map input to the mask Gaussian convolution module is added to the feature map after element-wise multiplication by the mask, giving more robust features; these are then input to a decoding module formed by multiple convolution layers to obtain the hidden Gaussian density map $G(x, y)$.
Figure BDA0002471384540000061
Specifically, in a preferred embodiment, the hidden Gaussian density generator consists of two Gaussian convolution layers, six mask Gaussian convolution modules and a decoding module formed by ordinary convolutions. To extract multi-scale features, a feature map is output through a skip connection after every two mask Gaussian convolution modules; these feature maps are concatenated, and the hidden Gaussian density map is then decoded by a series of ordinary convolution layers with gradually decreasing channel counts. For example, each convolution outputs 1/2 as many channels as it receives. The features O of different levels arise from the internal structure of the mask Gaussian convolution module: a module generally consists of three columns, each with a different Gaussian kernel variance parameter, which yields features of different levels.
The first Gaussian convolution layer convolves the dense crowd point map $D_t(x, y)$ with N Gaussian kernels K having different variance σ values to obtain first feature maps carrying information at different scales, and the second Gaussian convolution layer applies the same convolution operation to the first feature maps to obtain second feature maps. Six mask Gaussian convolution modules then extract features from the second feature maps, decomposing them into six features of different levels with initialization parameters constrained by a Gaussian envelope. In other embodiments, a different number of mask Gaussian convolution modules may be used as needed.
The feature map that is multiplied element-wise is the input of the mask Gaussian convolution module. For example, if the input feature map is x1, the weight mask is w1 and the output is y1, then y1 = x1 + x1 × w1. The second feature map is the input of the first mask Gaussian convolution module, the output of the first module is the input of the second, and so on: the input of each subsequent module (i.e. its element-wise-multiplied feature map) is the output of the previous module. The overall flow is: coordinate point map (manual annotation data) → first feature map → second feature map → first mask Gaussian convolution module → each module's output feeding the next module.
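The residual relation y1 = x1 + x1 × w1 and the chaining of modules can be illustrated with a small sketch. It deliberately leaves out the Gaussian-constrained convolutions inside each module and uses fixed, hand-written masks; the names `masked_residual` and `chain_modules` are hypothetical.

```python
def masked_residual(features, mask):
    """Apply y = x + x * w element-wise to a 2-D feature map."""
    return [
        [x + x * w for x, w in zip(f_row, m_row)]
        for f_row, m_row in zip(features, mask)
    ]


def chain_modules(second_feature_map, masks):
    """Feed each module's output to the next module and collect every output."""
    outputs = []
    current = second_feature_map
    for mask in masks:
        current = masked_residual(current, mask)
        outputs.append(current)
    return outputs


x1 = [[1.0, 2.0], [3.0, 4.0]]
w1 = [[0.5, 0.0], [1.0, 0.25]]
outs = chain_modules(x1, [w1, w1])  # two modules sharing one toy mask
```

Where the mask is 0 the input passes through unchanged, which is the usual motivation for a residual form: the module can only add to what it receives, never erase it.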
And S104, guiding the optimization of the hidden Gaussian density map according to the counting loss term, the smoothing term and the Bayes term to generate a higher-quality hidden Gaussian density map G (x, y).
In this embodiment, a count loss term is designed, using the L1 distance to constrain the total count of the hidden Gaussian density map to be close to the total number of annotated people. Let $C_{gt}$ be the true number of people in the image; then:
$$L_c = \left\| \sum_{x,y} D_s(x, y) - C_{gt} \right\|_1$$
where $L_c$ denotes the count loss term.
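As a worked example of the count loss term, the sketch below evaluates the L1 distance between the summed density map and the annotated count. The function name `count_loss` and the toy values are illustrative.

```python
def count_loss(hidden_density_map, true_count):
    """L_c = | sum over pixels of D_s(x, y) - C_gt |."""
    predicted_total = sum(sum(row) for row in hidden_density_map)
    return abs(predicted_total - true_count)


d_s = [[0.25, 0.5], [1.0, 0.75]]  # pixel values sum to 2.5
loss = count_loss(d_s, true_count=3)
```

Since both quantities are scalars, the L1 norm reduces to an absolute difference; here the map holds 2.5 people against 3 annotated, giving a loss of 0.5.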
In this embodiment, a smoothing term is designed. Regions of the hidden Gaussian density map where pixel values change sharply are hard for the network to model and learn, so a smoothing loss function constrains each pixel to be coherent with its surrounding pixels. This reduces the proportion of abnormal regions, makes the network easier to converge, and improves performance.
Figure BDA0002471384540000071
$L_s$ denotes the smoothing term, and H and W denote the height and width of the generated hidden Gaussian density map.
In this embodiment, a Bayesian term is designed. Through Gaussian modeling of the foreground points and the background, it constrains the probability distribution of the hidden Gaussian density map to be consistent with that of the annotation points, reducing the interference of background noise regions with the target crowd regions. Let $F(\cdot)$ denote the L1 distance function, $c_n$ the total count associated with the n-th annotation point, and $c_0$ the total count associated with the background.
$$L_{bay} = F(1 - E[c_n]) + F(0 - E[c_0])$$
$L_{bay}$ denotes the Bayesian term. $E[c_n]$ is the expectation of the count distribution associated with each annotation point, which should be as close to 1 as possible. $E[c_0]$ is the expectation of the count distribution associated with the background.
In this embodiment, an annotation point is a manually placed label in the training data. For example, if an image contains 100 people, 100 coordinate positions must be annotated when the training data is prepared; projected onto the image, these data form the dense crowd point map $D_t(x, y)$, i.e. the input to the hidden Gaussian density generator.
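One way to read the Bayesian term is sketched below. The patent text does not spell out how $E[c_n]$ and $E[c_0]$ are computed, so everything beyond the formula itself is an assumption: each annotation point gets an isotropic Gaussian likelihood with a hypothetical bandwidth `sigma`, the background gets a constant likelihood `bg_likelihood`, each pixel's density is shared out according to the resulting posterior, and the per-point terms are summed over n.

```python
import math


def bayesian_term(density_map, points, sigma=1.0, bg_likelihood=0.05):
    """Return (loss, per-point expected counts, background expected count)."""
    expected = [0.0] * len(points)
    expected_bg = 0.0
    for y, row in enumerate(density_map):
        for x, value in enumerate(row):
            # Unnormalized Gaussian likelihood of this pixel under each point.
            liks = [
                math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
                for px, py in points
            ]
            norm = sum(liks) + bg_likelihood
            for n, lik in enumerate(liks):
                expected[n] += lik / norm * value  # posterior times density
            expected_bg += bg_likelihood / norm * value
    # F is the L1 distance: each point should collect 1, the background 0.
    loss = sum(abs(1.0 - e) for e in expected) + abs(0.0 - expected_bg)
    return loss, expected, expected_bg


points = [(1, 1), (3, 3)]  # (x, y) annotation coordinates
density = [[0.0] * 5 for _ in range(5)]
density[1][1] = 1.0
density[3][3] = 1.0
loss, expected, expected_bg = bayesian_term(density, points)
```

Because the posteriors at each pixel sum to 1, the expected counts of all points plus the background always add up to the total density; density placed far from every annotation is attributed mostly to the background and is penalized by the second term.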
In this embodiment, the count loss term, the smoothing term and the Bayesian term are optimized together, which gives the best results. In other embodiments, the count loss term or the Bayesian term may be used alone, or the smoothing term may be combined with either of them. For example, in practice only the count loss term may be retained, at the cost of some accuracy.
S105, taking the pre-trained VggNet as the feature extraction network, inputting the dense crowd image $I_c(x, y)$ into the VggNet network to obtain a feature map, and upsampling the feature map to obtain the output density prediction map $D_p(x, y)$. All pixel values of the density prediction map are summed to obtain the final predicted number of people.
In this embodiment, the VggNet network serves as the front end of the density predictor and, together with the upsampling and multi-layer convolution, forms the complete density predictor.
In this embodiment, VggNet uses its feature extraction part, excluding the fully connected layers, and downsamples to 1/16 the size of the original image; the result is decoded by one upsampling layer and multiple convolution layers, and the output density prediction map $D_p(x, y)$ is 1/8 the size of the input dense crowd image.
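The resolution bookkeeping of the predictor can be checked with a few lines of arithmetic. The sketch assumes input sizes divisible by 16 and a single 2× upsampling step, which is what the 1/16 to 1/8 change implies; the function name is illustrative.

```python
def predictor_output_size(height, width, backbone_stride=16, upsample_factor=2):
    """Spatial size of D_p: downsample by the backbone stride, then upsample once."""
    return (height // backbone_stride * upsample_factor,
            width // backbone_stride * upsample_factor)


out_h, out_w = predictor_output_size(512, 768)
```

For a 512 × 768 input this gives 64 × 96, i.e. 1/8 of the input in each dimension, matching the resolution of the 1/8-scale crowd point map.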
In another preferred embodiment, building on the above embodiment, a multi-level loss function is used to constrain the generation target. The multi-level loss function may include a pixel-by-pixel mean square error loss function, a per-pedestrian-point Bayesian loss function and an adversarial loss function. Specifically:
The mean square error loss function constrains, pixel by pixel, the output $D_p(x, y)$ of the density prediction network and the output $D_s(x, y)$ of the hidden Gaussian density generator so that their distributions stay similar. In this embodiment, the mean square error loss function $L_1$ is:
$$L_1 = \left\| D_p(x, y) - D_s(x, y) \right\|_1$$
The Bayesian loss function constrains, over the range of each pedestrian point, the density prediction map $D_p(x, y)$ output by the density prediction network so that it stays close to the probability distribution of the manually annotated ground-truth pedestrian coordinates. Let $F(\cdot)$ denote the L1 distance function, $c_n$ the total count associated with the n-th annotation point, and $c_0$ the total count associated with the background. In this embodiment, the Bayesian loss function $L_2$ is:
$$L_2 = F(1 - E[c_n]) + F(0 - E[c_0])$$
$E[c_n]$ is the expectation of the count distribution associated with each annotation point, which should be as close to 1 as possible. $E[c_0]$ is the expectation of the count distribution associated with the background.
Using the adversarial loss function, a discriminator judges the authenticity of the predicted density prediction map D_p(x, y), so that the generated density prediction map D_p(x, y) retains more high-frequency information, that is, the prediction accuracy in dense regions is improved. x_r denotes the real density map, i.e. D_s(x, y), and x_f denotes the output of the predictor, i.e. D_p(x, y). In this embodiment, the adversarial loss function L_3 is specifically:
L_3 = E[(Dis(x_r) - 0)^2] + E[(Dis(x_f) - 1)^2]
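The least-squares form of this loss can be sketched as follows. The target values (0 for the real map, 1 for the predicted map) follow the formula as printed; they are exposed as parameters because other least-squares GAN setups use the opposite labels. The function name and the sample scores are hypothetical.

```python
def lsgan_discriminator_loss(scores_real, scores_fake, real_target=0.0, fake_target=1.0):
    """Least-squares discriminator loss over flattened patch scores."""
    def mean_sq(scores, target):
        # Mean squared distance between the scores and their target label.
        return sum((s - target) ** 2 for s in scores) / len(scores)

    return mean_sq(scores_real, real_target) + mean_sq(scores_fake, fake_target)


# Hypothetical discriminator outputs Dis(x_r) and Dis(x_f) for two patches each.
loss = lsgan_discriminator_loss([0.0, 0.2], [1.0, 0.8])
print(round(loss, 6))  # 0.04
```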
The discriminator adopts a PatchGAN network structure with a receptive field of 16 × 16. The dense crowd image I_c(x, y) is downsampled to 1/8 size and concatenated with the Gaussian density map to form the input of the discriminator. The discriminator loss function uses LSGAN to ensure stability and robustness.
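How the discriminator input might be assembled is sketched below with nested lists. Average pooling as the downsampling method, the channel-first layout, and both function names are assumptions; the document only states that the image is downsampled to 1/8 size and concatenated with the density map.

```python
def downsample(channel, factor=8):
    """Average-pool one 2-D channel by `factor` along each side (assumed method)."""
    h, w = len(channel) // factor, len(channel[0]) // factor
    return [[sum(channel[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)]
            for y in range(h)]


def discriminator_input(image_channels, density_map, factor=8):
    """Stack the downsampled image channels with the density map, channel first."""
    small = [downsample(c, factor) for c in image_channels]
    if (len(small[0]), len(small[0][0])) != (len(density_map), len(density_map[0])):
        raise ValueError("density map must match the downsampled image size")
    return small + [density_map]


image = [[[0.5] * 16 for _ in range(16)] for _ in range(3)]   # 3 channels, 16 x 16
density = [[0.1, 0.0], [0.0, 0.2]]                            # 2 x 2 density map
stacked = discriminator_input(image, density)
print(len(stacked), len(stacked[0]), len(stacked[0][0]))  # 4 2 2
```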
In the preferred embodiment, the parameters of the hidden Gaussian density generator are updated based on the smoothing term, the Bayesian term, and the count error term of the objective loss function.
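Two of these terms can be sketched on a toy map. The count term uses an L1 distance, as the claims state; the smoothing term is written here as a total-variation style sum of neighbor differences, which is an assumption, since this section does not give its exact form. Function names are hypothetical.

```python
def count_term(density, num_annotated):
    """L1 distance between the map's total count and the annotated head count."""
    return abs(sum(map(sum, density)) - num_annotated)


def smoothing_term(density):
    """Sum of absolute differences between adjacent pixels (assumed form)."""
    h, w = len(density), len(density[0])
    horiz = sum(abs(density[y][x + 1] - density[y][x])
                for y in range(h) for x in range(w - 1))
    vert = sum(abs(density[y + 1][x] - density[y][x])
               for y in range(h - 1) for x in range(w))
    return horiz + vert


spiky = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]   # one isolated impulse
flat = [[1.0 / 9] * 3 for _ in range(3)]                      # same count, spread evenly
print(smoothing_term(spiky), smoothing_term(flat))  # 4.0 0.0
```

Both maps carry a total count of one person, so the count term does not distinguish them; only the smoothing term penalizes the isolated impulse.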
In another preferred embodiment, on the basis of the above embodiment, the density predictor, the discriminator, and the hidden Gaussian density generator can be optimized cooperatively. The discriminator parameters are updated with the LSGAN loss function according to the output of the density predictor and the output of the hidden Gaussian density generator. The density map predictor is updated according to the mean square error term associated with the output of the hidden Gaussian density generator, the adversarial loss term fed back by the discriminator, and the Bayesian loss term.
To make the optimization target of the density predictor less affected by noise, the hidden Gaussian density generator can first be pre-trained on the training data, after which its output G(x, y) is used as the ground truth of the density predictor.
To ensure the stability of the training process, the density predictor and the discriminator are updated alternately in sequence, the hidden Gaussian density generator is updated once every 100 iterations, and the learning rate of the hidden Gaussian density generator is set to 1/5 of the learning rate of the density predictor.
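The update schedule can be sketched as a simple generator of update events. The order within one iteration (predictor first, then discriminator), the function names, and the sample learning rate are assumptions; only the 100-iteration interval and the 1/5 ratio come from the text.

```python
def update_order(num_iterations, generator_every=100):
    """Yield (iteration, module) pairs in the order the modules are updated."""
    for it in range(1, num_iterations + 1):
        yield it, "predictor"
        yield it, "discriminator"
        # The hidden Gaussian density generator is updated far less often.
        if it % generator_every == 0:
            yield it, "generator"


def generator_learning_rate(predictor_lr):
    """Learning rate of the hidden Gaussian density generator: 1/5 of the predictor's."""
    return predictor_lr / 5


steps = list(update_order(200))
print(sum(1 for _, module in steps if module == "generator"))  # 2
```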
Based on the steps of the above embodiment, the training data of the specific example come from the ShanghaiTech dataset and the UCF_QNRF dataset. The former comprises 300 crowd pictures of different scenes and sizes, and the latter comprises 1200 larger images with different viewing angles, sizes, and density levels. The test data comprise 182 and 334 pictures respectively. Each picture contains 50 to 3500 pedestrians.
The evaluation criteria are MAE (mean absolute error) and MSE (mean square error). Let N be the number of pictures in the test set, C_i the predicted number of people for the i-th picture, and the symbol in Figure BDA0002471384540000091 the real number of people in the i-th picture. The criteria are defined as follows:
Figure BDA0002471384540000092
Figure BDA0002471384540000093
Figure BDA0002471384540000094
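Because the formulas are available here only as image references, the following is a sketch under the definitions commonly used in crowd counting, not a transcription of the patent's formulas. In particular, taking the square root in the MSE is an assumption (it is exposed as the `take_root` parameter), and the sample counts are made up.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and real per-picture counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def mse(predicted, actual, take_root=True):
    """Mean squared count error; the root is taken by default (assumed convention)."""
    mean_sq = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
    return math.sqrt(mean_sq) if take_root else mean_sq


predicted = [100, 210, 290]   # hypothetical predicted counts C_i
actual = [110, 200, 300]      # hypothetical real counts
print(mae(predicted, actual), mse(predicted, actual))  # 10.0 10.0
```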
The results obtained by the embodiment of the invention show a substantial improvement in accuracy and good robustness. In addition, compared with the baseline, the embodiment of the invention does not increase the number of parameters or the amount of computation in the inference stage, which gives it strong application value.
Fig. 3 is a schematic diagram illustrating the effect of the dense crowd counting method based on implicit density distribution according to the embodiment of the present invention. As shown in Fig. 3, without the implicit density distribution method of the above embodiment, the generated density map has low counting accuracy and poor quality, and it is difficult for it to accurately reflect the crowd distribution.
In another embodiment of the present invention, an electronic terminal is further provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the dense crowd counting method based on implicit density distribution is implemented.
It should be noted that, the steps in the method provided by the present invention may be implemented by using corresponding modules, devices, units, and the like in the system, and those skilled in the art may refer to the technical solution of the system to implement the step flow of the method, that is, the embodiment in the system may be understood as a preferred example for implementing the method, and details are not described herein.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices provided by the present invention in purely computer readable program code means, the method steps can be fully programmed to implement the same functions by implementing the system and its various devices in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices thereof provided by the present invention can be regarded as a hardware component, and the devices included in the system and various devices thereof for realizing various functions can also be regarded as structures in the hardware component; means for performing the functions may also be regarded as structures within both software modules and hardware components for performing the methods.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.

Claims (10)

1. A dense crowd counting method based on hidden density distribution is characterized by comprising the following steps:
acquiring dense crowd coordinate data of a dense crowd image I_c(x, y) and converting the data into a dense crowd point diagram D_t(x, y);
passing the dense crowd point diagram D_t(x, y) through a hidden Gaussian density generator to obtain an adaptive hidden Gaussian density map G(x, y);
taking the hidden Gaussian density map G(x, y) as a learning target of a density predictor, and adopting a multi-level loss function to constrain the generation target;
inputting the dense crowd image I_c(x, y) into the density predictor and outputting a predicted density prediction map D_p(x, y);
summing all pixel values of the density prediction map D_p(x, y) to obtain the final predicted number of people.
2. The dense crowd counting method based on hidden density distribution according to claim 1, characterized in that passing the dense crowd point diagram D_t(x, y) through a hidden Gaussian density generator to obtain an adaptive hidden Gaussian density map G(x, y) comprises:
the hidden Gaussian density generator adopts a Gaussian network that convolves the dense crowd point diagram D_t(x, y) with N Gaussian kernels K having different variance σ values to obtain first feature maps carrying information at different scales, and performs the same convolution operation on the first feature maps to obtain second feature maps;
extracting features from the second feature maps with a plurality of masked Gaussian convolution modules, whose parameters are initialized under a Gaussian envelope constraint and which decompose the second feature maps into features of different levels, and using residual operations to add, in sequence, the features of different levels from every two adjacent masked Gaussian convolution modules to obtain more robust features;
inputting the more robust features into a decoding module formed by multiple convolution layers, wherein the number of output channels of each convolution layer is progressively smaller than its number of input channels, finally obtaining the hidden Gaussian density map D_s(x, y).
3. The method for dense population counting based on implicit density distribution according to claim 1, wherein the constraint on the generation goal by using the multi-level loss function comprises:
using a mean square error loss function to impose a pixel-by-pixel constraint so that the distribution of the density prediction map D_p(x, y) output by the density predictor remains similar to that of the hidden Gaussian density map D_s(x, y);
using a Bayesian loss function to impose a constraint over pedestrian point ranges so that the density prediction map D_p(x, y) output by the density predictor remains close to the probability distribution of the manually annotated pedestrian coordinate positions;
using an adversarial loss function, with a discriminator judging the authenticity of the predicted density prediction map D_p(x, y), so that the generated density prediction map D_p(x, y) retains more high-frequency information, that is, the prediction accuracy in dense regions is improved.
4. The method according to claim 1, wherein inputting the dense crowd image I_c(x, y) into the density predictor and outputting the predicted density prediction map D_p(x, y) comprises:
taking a pre-trained VggNet as the feature extraction network, inputting the dense crowd image I_c(x, y) into the VggNet network to obtain a feature map, up-sampling the feature map, and passing it through multiple convolution layers to obtain the output density prediction map D_p(x, y), wherein the feature extraction network, the up-sampling layer, and the multiple convolution layers form the density predictor.
5. The dense crowd counting method based on implicit density distribution according to claim 4, wherein discriminator parameters are updated using an LSGAN loss function according to the density prediction map D_p(x, y) output by the density predictor and the hidden Gaussian density map G(x, y).
6. The dense crowd counting method based on the implicit density distribution according to any one of claims 1 to 5, further comprising optimizing the hidden Gaussian density map G(x, y), wherein the optimization is performed according to a count loss term or a Bayesian term, or according to a smoothing term and a count loss term, or according to a smoothing term and a Bayesian term, so as to generate a higher-quality hidden Gaussian density map; wherein:
designing a counting loss item, and adopting an L1 distance to constrain that the total number of the hidden Gaussian density graph G (x, y) is close to the total number of the marked people;
designing a smoothing term, and adopting a smoothing term loss function constraint to enable pixel points of the hidden Gaussian density map G (x, y) to have coherence with surrounding pixels;
designing a Bayesian term, and constraining the probability distribution of a hidden Gaussian density map G (x, y) to be consistent with the probability distribution of manually marked marking points in training data by carrying out Gaussian modeling on a foreground point and a background so as to reduce the interference of a background noise region on a target crowd region;
and taking the implicit Gaussian density map with higher quality as a learning target of the density predictor.
7. The method of claim 6, further comprising: updating the parameters of the hidden Gaussian density generator according to the smoothing term, the Bayesian term, and the count error term of the target loss function.
8. The method according to claim 6, characterized in that the density map predictor is updated according to a mean square error term associated with the output of the hidden Gaussian density generator, an adversarial loss term fed back by the discriminator, and a Bayesian loss function.
9. A dense population counting system based on hidden density distribution, comprising:
a dense crowd point diagram acquisition module, configured to acquire dense crowd coordinate data of a dense crowd image I_c(x, y) and convert the data into a dense crowd point diagram D_t(x, y);
a hidden Gaussian density generator, configured to generate an adaptive hidden Gaussian density map G(x, y) from the dense crowd point diagram D_t(x, y);
a density map predictor, configured to take the hidden Gaussian density map G(x, y) as a learning target, with a multi-level loss function constraining the generation target, and to convert the dense crowd image I_c(x, y) into a predicted density prediction map D_p(x, y);
a population number prediction module, configured to sum all pixel values of the density prediction map D_p(x, y) to obtain the final predicted number of people.
10. An electronic terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to any of claims 1-8 when executing the computer program.
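For illustration only, the point-diagram, multi-scale Gaussian convolution, and pixel-summation steps recited in claims 1 and 2 can be sketched as below. This is not the claimed generator: the learned parts (masked Gaussian convolution modules, decoder) are omitted, and the function names, kernel radius, σ values, and sample coordinates are all hypothetical.

```python
import math


def point_map(coords, height, width):
    """Place a unit impulse at each annotated (x, y) head position."""
    grid = [[0.0] * width for _ in range(height)]
    for x, y in coords:
        grid[y][x] += 1.0
    return grid


def gaussian_kernel(sigma, radius):
    """Normalised 2-D Gaussian kernel with side length 2 * radius + 1."""
    k = [[math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
          for dx in range(-radius, radius + 1)]
         for dy in range(-radius, radius + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def convolve(grid, kernel):
    """Spread each pixel's mass with the kernel; mass falling outside the grid is dropped."""
    h, w, r = len(grid), len(grid[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if grid[y][x] == 0.0:
                continue
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        out[yy][xx] += grid[y][x] * kernel[dy + r][dx + r]
    return out


def count(density):
    """Predicted number of people: the sum of all pixel values."""
    return sum(map(sum, density))


points = [(4, 4), (10, 7)]
d_t = point_map(points, height=16, width=16)
for sigma in (1.0, 2.0):
    smoothed = convolve(d_t, gaussian_kernel(sigma, radius=3))
    print(round(count(smoothed), 6))  # 2.0 for both sigmas
```

Because each kernel is normalised and the sample points lie away from the border, every smoothed map keeps the same total count as the point diagram.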
CN202010349623.5A 2020-04-28 2020-04-28 Dense crowd counting method, system and terminal based on hidden density distribution Active CN111626134B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010349623.5A CN111626134B (en) 2020-04-28 2020-04-28 Dense crowd counting method, system and terminal based on hidden density distribution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010349623.5A CN111626134B (en) 2020-04-28 2020-04-28 Dense crowd counting method, system and terminal based on hidden density distribution

Publications (2)

Publication Number Publication Date
CN111626134A true CN111626134A (en) 2020-09-04
CN111626134B CN111626134B (en) 2023-04-21

Family

ID=72258122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010349623.5A Active CN111626134B (en) 2020-04-28 2020-04-28 Dense crowd counting method, system and terminal based on hidden density distribution

Country Status (1)

Country Link
CN (1) CN111626134B (en)

Also Published As

Publication number Publication date
CN111626134B (en) 2023-04-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant