CN110188597A - Dense crowd counting and precise localization method and system based on attention-mechanism recurrent zooming
- Publication number
- CN110188597A (application CN201910293903.6A)
- Authority
- CN
- China
- Prior art keywords
- crowd
- branch
- scaling
- image
- count
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30242—Counting objects in image
Abstract
The present invention relates to a dense crowd counting and precise localization method and system based on attention-mechanism recurrent zooming. Unlike existing crowd counting methods, which estimate crowd size from a density map or through face or pedestrian detection, the invention uses a carefully designed three-branch deep neural network to obtain, for an input image, a crowd counting density map, a crowd location map, and an attention map of dense candidate regions. The initial crowd count of the image is obtained from the crowd counting density map; the position coordinates of each person in the image are obtained from the crowd location map; several densely crowded regions of the image are obtained from the attention map, cropped from the original image, enlarged to twice the original resolution, and fed into a subsequent network to obtain more accurate localization results.
Description
Technical field
The present invention relates to a method for dense crowd counting and precise crowd localization in images, and in particular to a method and system that obtains precise crowd localization through attention-mechanism recurrent zooming. It belongs to the field of computer vision.
Background technique
As urbanization advances, urban populations have risen sharply, and video surveillance cameras are now densely installed in and around many cities and are increasingly part of daily work and life. One of the most important applications of this video data is intelligent video surveillance. In China, with its population of 1.3 billion, problems caused by the sheer size of the population have long threatened public safety. Elsewhere in the world, too, uncontrollable incidents occur when crowds become too dense at large events. Using surveillance data effectively to allocate law enforcement personnel, build additional transport facilities, and guide and divert crowds is therefore of great significance for maintaining public order and protecting personal safety. Traditional video surveillance, however, relies on manual monitoring and reporting, which consumes considerable manpower and resources. Automated video analysis and processing not only frees up labor but can also mine massive amounts of video for useful knowledge and patterns. Crowd counting, a subfield of video analysis, is important for pedestrian analysis, emergency monitoring, traffic planning, and many other applications.
Existing crowd counting techniques fall into two main categories: estimation by integrating a density map, and estimation by face or pedestrian detection. With the development of deep learning, many researchers use deep neural networks to learn a crowd density map and obtain the number of people in the picture by integrating it. This approach already achieves good accuracy. Its main drawback is that, although the integral of the learned density map is close to the true number of people in the picture, the learned density distribution differs considerably from the true distribution, which hinders further crowd analysis.
Deep learning has also brought significant progress in conventional object detection, so some researchers estimate crowd size by detecting the faces or pedestrians that appear in an image. Although this approach gives accurate positions of people and avoids the inaccurate distributions predicted by density-map methods, it has a major problem: existing face and pedestrian detectors perform poorly in extremely dense scenes. Crowd estimation usually involves exactly such scenes, in which faces and bodies are hard to make out, so this approach struggles to achieve good results there.
Summary of the invention
To address the inaccurate predictions of density-map-based methods and the poor performance of detection-based methods in dense scenes, the object of the present invention is to provide a method and system for dense crowd counting and precise localization based on attention-mechanism recurrent zooming. Using deep learning, the invention proposes an attention-based recurrent zooming network that converts the crowd size estimation problem for a dense picture into two problems: initial crowd estimation and precise crowd localization.
Unlike existing crowd counting methods, which estimate crowd size from a density map or through face or pedestrian detection, the invention uses a carefully designed three-branch deep neural network to obtain, for an input image, a crowd counting density map, a crowd location map, and a zooming region proposal attention map. The initial crowd count of the image is obtained from the crowd counting density map; the position coordinates of each person in the image are obtained from the crowd location map; several densely crowded regions of the image are obtained from the zooming region proposal attention map, cropped from the original image, enlarged to twice the original resolution, and fed into the subsequent recurrent zooming network to obtain more accurate localization results. Because a crowd count can be obtained from both the crowd counting density map and the crowd location map, the invention also proposes a scene-adaptive weight, with which the two crowd counts are weighted to obtain a more accurate crowd size estimate.
The dense crowd counting and precise localization method based on attention-mechanism recurrent zooming of the invention comprises the following steps:
1) establishing a three-branch deep neural network that obtains, for an input image, a crowd counting density map, a crowd location map, and a zooming region proposal attention map;
2) obtaining the initial crowd count of the image from the crowd counting density map, obtaining the position coordinates of each person in the image from the crowd location map, and obtaining several densely crowded regions of the image from the zooming region proposal attention map;
3) cropping the densely crowded regions from the image, obtaining precise localization results by increasing their resolution, and updating the crowd location map with those results;
4) weighting the crowd count obtained from the crowd counting density map and the crowd count obtained from the crowd location map to obtain an accurate crowd count.
The method is described further below. Its detailed flow is shown in Figure 1 and comprises the following steps:
Step1: network construction and parameter initialization. As shown in Figure 1, the proposed method includes two main networks: the main network (MainNet) and the recurrent attention zooming network (Recurrent Attention Zooming Net, abbreviated RAZNet). MainNet comprises a localization branch (Localization Branch), a counting branch (Counting Branch), and a zooming region proposal branch (Zooming Region Proposal Branch).
MainNet uses the first 13 layers of the VGG-16 network as its base network. The localization branch consists of dilated convolutional layers and 3 deconvolutional layers, and finally outputs a single feature map with the same resolution as the original image. The counting branch consists only of dilated convolutional layers and outputs a feature map at 1/8 of the original picture resolution. The feature map output by the localization branch and the up-sampled feature map output by the counting branch are concatenated to form the input of the zooming region proposal branch.
RAZNet omits the counting branch; the rest of it is identical to MainNet. We initialize the MainNet base network with VGG-16 parameters trained on the ImageNet dataset, and we initialize RAZNet with the parameters of the trained MainNet.
Step2: model training. To help the model converge, we train the three branches in turn, in the order counting branch, localization branch, zooming region proposal branch. After MainNet training is complete, RAZNet is initialized with the MainNet parameters and fine-tuned.
Step3: selection of the fusion weight. After model training is complete, we obtain on the training set the crowd counts produced by the localization branch and by the counting branch. Using the true crowd counts already available for the images, we learn a fusion weight between the localization branch count and the counting branch count that brings the predicted value closer to the true value.
Step4: network inference. After model training is complete, for each test picture MainNet produces a crowd density map, a crowd location map, and a zooming region proposal attention map. Several dense regions are obtained from the attention map, cropped from the original image, and enlarged to twice their original height and width. RAZNet then produces a new crowd location map and a new zooming region proposal attention map for these pictures. Inference ends when no new dense region can be found in the zooming region proposal attention map.
Step5: obtaining the final crowd count and head position coordinates. We take the positions of the peak points in the crowd location map as the final predicted head coordinates. To obtain the peak points, we first apply non-maximum suppression (Non-Maximum Suppression, NMS) to the crowd location map and then take all positions whose response exceeds a certain threshold as the head anchor points. Using the fusion weight obtained in Step3, we compute the fused crowd count of the counting branch and the localization branch as the final crowd count.
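The inference procedure of Step4 can be sketched as the loop below. This is a minimal illustration under stated assumptions, not the patented implementation: `main_net`, `raz_net`, `propose_regions`, and `crop_and_zoom` are hypothetical callables standing in for the trained networks and the cropping logic, and `max_depth` is an added safety bound that the text does not mention.

```python
def recurrent_zoom_inference(image, main_net, raz_net, propose_regions,
                             crop_and_zoom, scale=2, max_depth=4):
    """Run MainNet once, then RAZNet on dense regions until none remain.

    All four callables are hypothetical placeholders:
      main_net(image)      -> (density_map, location_map, attention_map)
      raz_net(image)       -> (location_map, attention_map)
      propose_regions(att) -> list of dense regions (empty list means stop)
      crop_and_zoom(img, region, scale) -> cropped and enlarged image
    """
    density_map, location_map, attention = main_net(image)
    # Each result is (zoom depth, region, location map); depth 0 is the full image.
    results = [(0, None, location_map)]
    pending = [(image, attention, 0)]
    while pending:
        current, att, depth = pending.pop()
        if depth >= max_depth:  # assumption: hard stop to guarantee termination
            continue
        for region in propose_regions(att):
            zoomed = crop_and_zoom(current, region, scale)
            zoom_loc, zoom_att = raz_net(zoomed)
            results.append((depth + 1, region, zoom_loc))
            # The zoomed picture may itself contain dense regions to zoom again.
            pending.append((zoomed, zoom_att, depth + 1))
    return density_map, results
```

The loop stops on its own once `propose_regions` returns an empty list for every pending attention map, which corresponds to the stopping condition described in Step4.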
As shown in Figure 1, the method contains two basic network modules, MainNet and RAZNet. MainNet has three branches and RAZNet has two. The names and functions of the network modules and branches are:
1. Main network (MainNet): performs initial crowd counting and coarse crowd localization on the input picture. The zooming region proposal attention map produced by this network guides the subsequent cropping and enlargement of dense regions.
2. Recurrent attention zooming network (RAZNet): performs crowd localization on the dense regions selected by MainNet and obtains more accurate localization results for those local regions. The network itself also produces a zooming region proposal attention map, which determines whether regions are cropped again and passed through RAZNet once more.
3. Localization branch (Localization Branch): takes features from the base network, passes them through 6 dilated convolutional layers with 3 deconvolutional layers interspersed between them, and outputs a crowd location map with the same resolution as the network input image.
4. Counting branch (Counting Branch): takes features from the base network, passes them through 6 dilated convolutional layers, and outputs a crowd density map whose height and width are each 1/8 of those of the network input image.
5. Zooming region proposal branch (Zooming Region Proposal Branch): takes features from the localization branch and the counting branch, concatenates them as its input, passes them through 3 dilated convolutional layers, and outputs a zooming region proposal attention map with the same resolution as the network input image.
Corresponding to the above method, the invention also provides a dense crowd counting and precise localization system based on attention-mechanism recurrent zooming, comprising:
a main network module, which contains a three-branch deep neural network for obtaining, for an input image, a crowd counting density map, a crowd location map, and a zooming region proposal attention map; the initial crowd count of the image is obtained from the crowd counting density map, the position coordinates of each person in the image are obtained from the crowd location map, and several densely crowded regions of the image are obtained from the zooming region proposal attention map; the densely crowded regions are cropped from the image and their resolution is increased;
a recurrent zooming network module, which takes the densely crowded regions with increased resolution as input, obtains precise localization results, and updates the crowd location map with them;
a fusion counting module, which weights the crowd count obtained from the crowd counting density map and the crowd count obtained from the crowd location map to obtain an accurate crowd count.
Compared with existing crowd counting techniques, the attention-based recurrent zooming method for dense crowd counting and precise localization described in the invention has the following advantages:
1. The technique gives accurate positions of the people in the picture.
2. The attention mechanism automatically finds the densely crowded regions of the image, and precise localization results are obtained by increasing the resolution of those regions.
3. The crowd counting result and the crowd localization result are fused with a scene-adaptive weight, which improves the accuracy of the crowd count.
Brief description of the drawings
Fig. 1 is a schematic diagram of the network structure;
Fig. 2 is a schematic diagram of generating densely crowded candidate regions from the attention map.
Specific embodiment
To make the above objects, features, and advantages of the invention clearer and easier to understand, the invention is described in further detail below through specific embodiments and the accompanying drawings.
1. Target data generation
For model training we need, for each image, its true crowd counting density map, true crowd location map (head location map), and true zooming region proposal attention map as training data.
(1) Generating the true crowd counting density map: following earlier crowd counting work, we generate the crowd density map from the head coordinates given in the annotations. For each annotated head we introduce a Gaussian kernel. For each pixel coordinate $x$ in the true crowd counting density map, its density value $D(x)$ is computed as shown in the following formula, where $N$ is the total number of heads in the image, the $N$ head coordinates are denoted $x_1, \dots, x_N$, $\bar{d}_i$ is the average distance from $x_i$ to its 4 nearest heads, $Z_i$ is the normalization parameter of the Gaussian kernel for each head, and $\beta$ is the scaling factor for the distance, which we set empirically to 0.1.
$$D(x) = \sum_{i=1}^{N} \frac{1}{Z_i} \exp\left(-\frac{\lVert x - x_i \rVert^2}{2(\beta \bar{d}_i)^2}\right)$$
(2) Generating the true head location map: we set each annotated head point and its four-neighborhood to 1 to obtain the final head location map.
(3) Generating the true zooming region proposal attention map: for each pixel in the map, we find the three head positions nearest to that point, compute the average of the three distances, and apply a Gaussian transform to that value to obtain the response for the pixel. The attention map therefore reflects how dense the crowd is in different regions.
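The three kinds of target data described in (1) to (3) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code. Heads are assumed to be integer `(row, col)` pixel coordinates inside the image. The normalizer is assumed to make each head contribute a total mass of 1. The fallback for an image with a single head, the floor on `sigma`, and the `sigma` of the attention map's Gaussian transform are all assumptions, since the text does not give them.

```python
import math

def density_map(heads, height, width, beta=0.1, k=4):
    """Density map: one normalized Gaussian per head, sigma = beta * mean k-NN distance."""
    dmap = [[0.0] * width for _ in range(height)]
    for i, (r, c) in enumerate(heads):
        dists = sorted(math.hypot(r - r2, c - c2)
                       for j, (r2, c2) in enumerate(heads) if j != i)[:k]
        mean_d = sum(dists) / len(dists) if dists else 1.0  # assumption: lone-head fallback
        sigma = max(beta * mean_d, 1e-3)                    # assumption: floor on sigma
        kernel = [[math.exp(-((y - r) ** 2 + (x - c) ** 2) / (2 * sigma ** 2))
                   for x in range(width)] for y in range(height)]
        z = sum(map(sum, kernel))  # normalizer: each head integrates to 1
        for y in range(height):
            for x in range(width):
                dmap[y][x] += kernel[y][x] / z
    return dmap

def head_location_map(heads, height, width):
    """Binary map: each head pixel and its four-neighborhood are set to 1."""
    loc = [[0] * width for _ in range(height)]
    for r, c in heads:
        for dr, dc in ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)):
            y, x = r + dr, c + dc
            if 0 <= y < height and 0 <= x < width:  # clip at the image border
                loc[y][x] = 1
    return loc

def attention_map(heads, height, width, k=3, sigma=4.0):
    """Attention map: Gaussian of the mean distance to the k nearest heads."""
    amap = [[0.0] * width for _ in range(height)]
    if not heads:
        return amap
    for y in range(height):
        for x in range(width):
            dists = sorted(math.hypot(y - r, x - c) for r, c in heads)[:k]
            mean_d = sum(dists) / len(dists)
            amap[y][x] = math.exp(-mean_d ** 2 / (2 * sigma ** 2))  # sigma is hypothetical
    return amap
```

Summing the density map recovers the number of annotated heads, which is the property the counting branch relies on. The attention map is close to 1 inside tight clusters and decays toward 0 in sparse areas.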
2. Network structure construction.
The invention is a method that uses deep learning for crowd counting and precise crowd localization. The structure of the deep neural network is shown in Figure 1. The network includes two main networks, MainNet and RAZNet. MainNet uses the first 13 layers of VGG-16 as its base network, followed by three parts: the localization branch, the counting branch, and the zooming region proposal branch. RAZNet consists of two parts: the localization branch and the zooming region proposal branch. The detailed configuration parameters of MainNet and RAZNet are given in Table 1 below.
Table 1. Configuration parameters of MainNet and RAZNet
3. Training process of MainNet and RAZNet.
We train MainNet first. As described in step 2, MainNet consists of three parts: the counting branch, the localization branch, and the zooming region proposal branch. To help the model converge, we train them in turn in the order counting branch, localization branch, zooming region proposal branch.
(1) For the counting branch, we use the MSE loss between the density map output by the branch and the true density map as the optimization objective. The MSE is computed as shown in the following formula, where $\varepsilon_{den}(I)$ is the loss on picture $I$, $m$ and $n$ are the height and width of the input picture, and $\phi(p)$ and $\hat{\phi}(p)$ are the predicted and true values at the $p$-th pixel of the output crowd counting density map.
$$\varepsilon_{den}(I) = \frac{1}{mn} \sum_{p=1}^{mn} \left(\phi(p) - \hat{\phi}(p)\right)^2$$
(2) After the counting branch converges, the parameters it has learned are used to initialize the localization branch. Unlike the counting branch, the localization branch uses the weighted binary cross-entropy (BCE) loss between the predicted head location map and the true head location map as its optimization objective. $\varepsilon_{loc}(I)$ is the BCE loss on picture $I$, where $m$ and $n$ are the height and width of the input picture, $Y(x_p)$ is the true value at the $p$-th pixel, $\psi(p)$ is the predicted value at the $p$-th pixel, and $\gamma$ is the weight, which we set empirically to 100.
$$\varepsilon_{loc}(I) = \frac{1}{mn} \sum_{p=1}^{mn} l(x_p)$$
$$l(x_p) = -\gamma\, Y(x_p) \cdot \log(\psi(p)) - (1 - Y(x_p)) \cdot \log(1 - \psi(p))$$
(3) After the counting branch and the localization branch have been trained, we fix the parameters of these two branches and start training the zooming region proposal branch, which uses the MSE loss as its optimization objective.
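The two objectives used in (1) to (3) can be written out as a short sketch. It assumes that both losses are averaged over all pixels of the map (the text only names `m` and `n`), and it clips predictions with a hypothetical `eps` to keep the logarithms finite. Maps are plain nested lists here, whereas a real training loop would use tensors.

```python
import math

def mse_loss(pred, target):
    """Mean squared error between a predicted and a true map (nested lists)."""
    m, n = len(pred), len(pred[0])
    total = sum((pred[y][x] - target[y][x]) ** 2
                for y in range(m) for x in range(n))
    return total / (m * n)  # assumption: average over all m * n pixels

def weighted_bce_loss(pred, target, gamma=100.0, eps=1e-7):
    """Weighted binary cross-entropy; gamma up-weights the rare head pixels."""
    m, n = len(pred), len(pred[0])
    total = 0.0
    for y in range(m):
        for x in range(n):
            p = min(max(pred[y][x], eps), 1.0 - eps)  # assumption: clip for stability
            t = target[y][x]
            total += -gamma * t * math.log(p) - (1 - t) * math.log(1 - p)
    return total / (m * n)  # assumption: average over all m * n pixels
```

With `gamma = 100`, missing a head pixel costs about 100 times more than an equally confident false positive, which compensates for head pixels being a tiny fraction of the location map.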
After MainNet training is complete, we train RAZNet, which retains only the localization branch and the zooming region proposal branch. The training data of RAZNet differs from that of MainNet: as shown in Fig. 2, we use the attention map to find several densely crowded regions in the original image and use them as training samples for RAZNet. Because the network structures of RAZNet and MainNet are nearly identical, we initialize RAZNet with the parameters learned in MainNet and fine-tune the localization branch and the zooming region proposal branch in turn.
4. Obtaining the total number of people in the image from the counting branch.
Integrating the crowd density map produced by the counting branch gives the total number of people in the image as predicted by that branch.
5. Obtaining the head position coordinates and the total number of people in the picture from the localization branch.
The localization branch produces a head location map. To obtain the final head coordinates, we must extract the local peak points of this map and apply non-maximum suppression (NMS).
1) We first pass the head location map through an average pooling layer with a 3x3 kernel, to highlight possible peak points within local regions;
2) the result of the first step is then passed through a max pooling layer with a 3x3 kernel, and the max-pooled map is compared pixel by pixel with the map before max pooling; the positions where the two maps are identical are the required local peak points;
3) among the peak points obtained in the second step, those whose response exceeds a certain threshold are the final head position coordinates;
4) counting the head position coordinates obtained gives the total number of people in the image.
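Steps 1) to 4) can be sketched as follows. The sketch assumes stride-1 pooling that keeps the map size and averages only over the pixels that fall inside the image at the border. The threshold value `0.5` is a hypothetical choice, since the text only says "a certain threshold".

```python
def pool3x3(grid, reduce_fn):
    """Stride-1 3x3 pooling that keeps the map size; border windows are clipped."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [grid[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = reduce_fn(window)
    return out

def find_heads(location_map, threshold=0.5):
    """Return (row, col) of local peaks whose smoothed response exceeds threshold."""
    # 1) average pooling highlights likely peaks
    smoothed = pool3x3(location_map, lambda v: sum(v) / len(v))
    # 2) max pooling; a pixel equal to its 3x3 maximum is a local peak
    local_max = pool3x3(smoothed, max)
    # 3) keep only peaks above the threshold
    return [(y, x)
            for y in range(len(smoothed))
            for x in range(len(smoothed[0]))
            if smoothed[y][x] == local_max[y][x] and smoothed[y][x] > threshold]

# 4) the crowd count from the localization branch is simply len(find_heads(location_map))
```

A flat plateau of equal responses would yield several adjacent peaks under this comparison, so a production version may need an extra tie-breaking rule.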
6. Learning the fusion weight of the localization branch and the counting branch according to the scene.
After model training, steps 4 and 5 give us the crowd counts of the counting branch and the localization branch on the training set. Using the true crowd counts already available for the images, we can learn a fusion weight between the localization branch count and the counting branch count (denoted α in Fig. 1) that brings the predicted value closer to the true value. For example, when the crowd count from the counting branch and the crowd count from the localization branch differ by more than 150, the value from the counting branch is the more accurate one, and we choose to trust the counting branch result.
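One possible way to learn such a weight is sketched below. The text does not say how α is fitted or which branch it multiplies, so the grid search, the mean-absolute-error criterion, and the convention `alpha * counting + (1 - alpha) * localization` are all assumptions. Only the rule of trusting the counting branch when the two counts differ by more than 150 comes from the description.

```python
def fuse(count_density, count_location, alpha, gap=150):
    """Fuse the two branch counts; trust the counting branch on large disagreement."""
    if abs(count_density - count_location) > gap:
        return count_density
    # assumption: alpha weights the counting branch, (1 - alpha) the localization branch
    return alpha * count_density + (1 - alpha) * count_location

def learn_alpha(density_counts, location_counts, true_counts, steps=101):
    """Grid-search alpha in [0, 1] to minimize mean absolute error on a training set."""
    best_alpha, best_mae = 0.0, float("inf")
    for s in range(steps):
        alpha = s / (steps - 1)
        mae = sum(abs(fuse(d, l, alpha) - t)
                  for d, l, t in zip(density_counts, location_counts, true_counts))
        mae /= len(true_counts)
        if mae < best_mae:
            best_alpha, best_mae = alpha, mae
    return best_alpha
```

For instance, `learn_alpha([100, 200], [120, 240], [110, 220])` selects the midpoint between the two branches, because the true counts lie exactly halfway between the branch estimates.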
7. Fusing the localization branch results of MainNet and RAZNet.
By design, the result obtained by RAZNet is a precise localization result for one region of the original image, and in theory it is more accurate than the result of the MainNet localization branch. Replacing MainNet's detections in a region with RAZNet's detections for that region completes the precise head localization task.
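The replacement described above can be sketched as follows. The representation of a region as a half-open box `(top, left, bottom, right)` in original-image pixels, and the mapping of zoomed coordinates back by dividing by the zoom factor, are assumptions made for illustration.

```python
def merge_detections(main_points, region, zoom_points, scale=2):
    """Replace MainNet detections inside `region` with RAZNet detections.

    main_points: (row, col) heads found by MainNet in original-image pixels.
    region:      (top, left, bottom, right), half-open box in the original image.
    zoom_points: (row, col) heads found by RAZNet in the enlarged crop.
    """
    top, left, bottom, right = region
    # Keep MainNet detections that fall outside the zoomed region.
    kept = [(y, x) for y, x in main_points
            if not (top <= y < bottom and left <= x < right)]
    # Map RAZNet detections from crop coordinates back to the original image.
    mapped = [(top + y / scale, left + x / scale) for y, x in zoom_points]
    return kept + mapped
```

When RAZNet is applied recursively, the same mapping would be applied once per zoom level, starting from the innermost crop.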
8. Fusing the density-map-based and detection-based counting results with the scene-adaptive fusion weight to improve the accuracy of the head count.
Using the fusion weight between the localization branch and the counting branch learned in step 6, we fuse the results of the two branches on the test set to obtain the final crowd count.
The performance of the invention on three datasets commonly used for crowd counting, ShanghaiTech_A, ShanghaiTech_B, and UCF_QNRF, is shown in Table 2. On the evaluation metrics mean absolute error (Mean Absolute Error, MAE) and mean squared error (Mean Squared Error, MSE), it outperforms previous methods. In the table, "-" indicates that the method did not report performance on that dataset.
Table 2. Performance comparison of the present invention and other methods
The methods compared with the invention are:
MCNN (Y. Zhang, D. Zhou, S. Chen, S. Gao, and Y. Ma. Single-image crowd counting via multi-column convolutional neural network. In CVPR, 2016);
Switch-CNN (D. B. Sam, S. Surya, and R. V. Babu. Switching convolutional neural network for crowd counting. In CVPR, 2017);
CP-CNN (V. A. Sindagi and V. M. Patel. Generating high-quality crowd density maps using contextual pyramid CNNs. In ICCV, 2017);
CSRNet (Y. Li, X. Zhang, and D. Chen. CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes. In CVPR, 2018).
Another embodiment of the invention provides a dense crowd counting and precise localization system based on attention-mechanism recurrent zooming, comprising:
a main network module, which contains a three-branch deep neural network for obtaining, for an input image, a crowd counting density map, a crowd location map, and a zooming region proposal attention map; the initial crowd count of the image is obtained from the crowd counting density map, the position coordinates of each person in the image are obtained from the crowd location map, and several densely crowded regions of the image are obtained from the zooming region proposal attention map; the densely crowded regions are cropped from the image and their resolution is increased;
a recurrent zooming network module, which takes the densely crowded regions with increased resolution as input, obtains precise localization results, and updates the crowd location map with them;
a fusion counting module, which weights the crowd count obtained from the crowd counting density map and the crowd count obtained from the crowd location map to obtain an accurate crowd count.
In the invention, the base network of MainNet may be changed from VGG16 to a stronger model such as VGG19 or a model of the ResNet family; a stronger base network can bring better results.
In the invention, when RAZNet is trained, the resolution may be enlarged to twice the original or to a higher multiple, as far as GPU memory allows.
The above embodiments merely illustrate the technical solution of the invention and do not limit it. A person of ordinary skill in the art may modify the technical solution of the invention or replace it with equivalents without departing from the principle and scope of the invention. The scope of protection of the invention is defined by the claims.
Claims (10)
1. A dense crowd counting and precise localization method based on attention-mechanism recurrent zooming, characterized by comprising the following steps:
1) establishing a three-branch deep neural network that obtains, for an input image, a crowd counting density map, a crowd location map, and a zooming region proposal attention map;
2) obtaining the initial crowd count of the image from the crowd counting density map, obtaining the position coordinates of each person in the image from the crowd location map, and obtaining several densely crowded regions of the image from the zooming region proposal attention map;
3) cropping the densely crowded regions from the image, obtaining precise localization results by increasing their resolution, and updating the crowd location map with those results;
4) weighting the crowd count obtained from the crowd counting density map and the crowd count obtained from the crowd location map to obtain an accurate crowd count.
2. The method according to claim 1, characterized in that the three-branch deep neural network constitutes a main network comprising a localization branch, a counting branch, and a zooming region proposal branch; the localization branch consists of dilated convolutional layers and 3 deconvolutional layers and finally outputs a crowd location map with the same resolution as the original image; the counting branch consists only of dilated convolutional layers and outputs a crowd counting density map at 1/8 of the original image resolution; the feature maps output by the localization branch and the counting branch are concatenated as the input of the zooming region proposal branch, which passes them through 3 dilated convolutional layers and outputs a zooming region proposal attention map with the same resolution as the input image.
3. The method according to claim 1 or 2, characterized in that increasing the resolution means enlarging the resolution to twice the original.
4. The method according to claim 2, characterized in that obtaining precise localization results by increasing the resolution means feeding the densely crowded regions, after their resolution has been increased, into a recurrent zooming network to obtain precise localization results; the recurrent zooming network contains no counting branch, and the rest of it is identical to the main network.
5. The method according to claim 4, wherein the cyclic scaling network itself produces a scaling candidate region
attention map, and this attention map is used to decide whether to crop regions again and pass them through the cyclic
scaling network again, until no new densely crowded region can be found in the scaling candidate region attention map.
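The crop, enlarge and re-localize loop of claims 3 to 5 can be sketched as a recursion. This is only an illustration under stated assumptions: `network` and `find_dense_regions` are hypothetical callables standing in for the trained cyclic scaling network and for the thresholding of its attention map, images are nested lists, enlargement uses nearest-neighbour repetition, refined points replace the coarse points inside each region, and `max_depth` is a safety guard not mentioned in the claims.

```python
def crop(image, box):
    """Cut the region box = (top, left, bottom, right) out of a nested-list image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def upscale_2x(image):
    """Double the resolution by repeating every pixel (nearest neighbour)."""
    out = []
    for row in image:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def cyclic_zoom(image, network, find_dense_regions, max_depth=3):
    """Return head coordinates (y, x) in the coordinate frame of `image`.

    network(image) -> (points, attention) and
    find_dense_regions(attention) -> list of boxes are hypothetical interfaces.
    """
    points, attention = network(image)
    result = list(points)
    if max_depth == 0:
        return result
    # The loop ends naturally when no dense region is found any more.
    for box in find_dense_regions(attention):
        top, left, bottom, right = box
        zoomed = upscale_2x(crop(image, box))
        refined = cyclic_zoom(zoomed, network, find_dense_regions, max_depth - 1)
        # Update step: drop coarse points inside the box, add refined ones
        # mapped back to the parent frame (zoomed coordinates are halved).
        result = [(y, x) for (y, x) in result
                  if not (top <= y < bottom and left <= x < right)]
        result += [(top + y / 2, left + x / 2) for (y, x) in refined]
    return result
```

Because each pass doubles the resolution of the cropped region, coordinates found at a deeper level are divided by 2 and offset by the crop origin before they are merged into the parent result.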
6. The method according to claim 4, wherein the three branches of the main network are trained successively in the
order counting branch, positioning branch, scaling candidate region branch; the parameters of the trained main network are
used as the initialization parameters of the cyclic scaling network, and the cyclic scaling network is then fine-tuned.
7. The method according to claim 6, wherein, for the counting branch, the MSE loss between the crowd count density map
output by the branch and the ground-truth crowd count density map is used as the optimization objective function, and gradient
updates are applied to the model parameters of the branch; after the counting branch converges, the parameters learned by the
counting branch are fixed and serve as the initialization parameters of the positioning branch; the positioning branch uses the
weighted BSE loss between the predicted head location map and the ground-truth head location map as its optimization objective
function, and gradient updates are applied to the model parameters of the branch; after the counting branch and the positioning
branch have been learned, the parameters of these two branches are fixed and training of the scaling candidate region branch
begins, with the MSE loss as its optimization objective function.
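The staged schedule of claims 6 and 7 can be written down as plain data together with the MSE objective used by the counting and scaling candidate region branches. This is a sketch only: the stage and branch names are hypothetical labels, the maps are nested lists, and the weighted BSE loss is referenced by name only because the claims do not define it.

```python
def mse_loss(pred, target):
    """Mean squared error between two maps of identical shape (nested lists)."""
    total, n = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            n += 1
    return total / n


# Training order from the claims: each stage trains one branch while the
# previously trained branches stay fixed. Labels are illustrative only.
TRAINING_STAGES = [
    {"train": "counting", "frozen": [], "loss": "MSE"},
    {"train": "positioning", "frozen": ["counting"], "loss": "weighted BSE"},
    {"train": "scaling_candidate_region",
     "frozen": ["counting", "positioning"], "loss": "MSE"},
]

example_loss = mse_loss([[1, 2], [3, 4]], [[1, 2], [3, 6]])
```

After the three stages, the trained main network parameters would initialize the cyclic scaling network before fine-tuning, as stated in claim 6.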
8. The method according to claim 1, wherein, in obtaining the accurate crowd count value by weighting,
the weights are obtained as follows:
a) on the training set, crowd count values are obtained from the crowd count density map and from the crowd location map respectively;
b) using the known ground-truth crowd count value of each image, the fusion weights between the two crowd count values
obtained in step a) are learned.
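One way steps a) and b) could be realized is an ordinary least-squares fit of two weights. The claim does not say how the weights are learned, so least squares is an assumption here, and the function name and the toy numbers are hypothetical.

```python
def fit_fusion_weights(density_counts, location_counts, true_counts):
    """Fit (w_density, w_location) minimizing
    sum((w_density * a + w_location * b - y) ** 2) over the training set,
    by solving the 2x2 normal equations with Cramer's rule.
    """
    saa = sum(a * a for a in density_counts)
    sbb = sum(b * b for b in location_counts)
    sab = sum(a * b for a, b in zip(density_counts, location_counts))
    say = sum(a * y for a, y in zip(density_counts, true_counts))
    sby = sum(b * y for b, y in zip(location_counts, true_counts))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("the two count estimates are collinear; weights are not unique")
    w_density = (say * sbb - sab * sby) / det
    w_location = (saa * sby - sab * say) / det
    return w_density, w_location


# Toy training set (hypothetical): counts per image from each source,
# with ground truth generated as 0.3 * density + 0.7 * location.
w_d, w_l = fit_fusion_weights([10, 20, 30], [12, 18, 33], [11.4, 18.6, 32.1])
```

The fitted weights can then be applied to the two count estimates of a new image to produce the fused count.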
9. The method according to claim 1, wherein the crowd count value is obtained from the crowd location map
as follows:
a) non-maximum suppression is applied to the crowd location map, and all location points whose response exceeds a given
threshold are taken as peak points;
b) the positions of the peak points in the crowd location map are taken as the head position coordinates;
c) the head position coordinates are counted to obtain the total number of people appearing in the image.
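Steps a) to c) can be sketched as a local-maximum search followed by thresholding. The window radius of 1 and the threshold of 0.5 are assumed values, the function name is hypothetical, and the location map is a nested list of responses; ties between equal neighbouring responses are all kept in this simple version.

```python
def find_head_points(location_map, threshold=0.5, radius=1):
    """Return (y, x) peaks: responses above `threshold` that are not
    exceeded by any neighbour inside a (2 * radius + 1) square window."""
    height = len(location_map)
    width = len(location_map[0]) if height else 0
    peaks = []
    for y in range(height):
        for x in range(width):
            value = location_map[y][x]
            if value <= threshold:
                continue
            is_peak = True
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    if (ny, nx) != (y, x) and location_map[ny][nx] > value:
                        is_peak = False
            if is_peak:
                peaks.append((y, x))
    return peaks


# Toy location map (hypothetical responses).
location_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.0, 0.4, 0.8],
]
heads = find_head_points(location_map)
people_count = len(heads)  # step c): count the head coordinates
```

Here the response 0.6 at (2, 3) exceeds the threshold but is suppressed by its stronger neighbour 0.8 at (3, 3), so only two heads are reported.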
10. A dense crowd counting and accurate positioning system based on attention mechanism cyclic scaling, comprising:
a main network module, comprising a three-branch deep neural network, for obtaining respectively the crowd count density map,
the crowd location map and the scaling candidate region attention map corresponding to an input image; an initial crowd count
value for the image is obtained from the crowd count density map, the position coordinates of each person in the image are
obtained from the crowd location map, and several densely crowded regions in the image are obtained from the scaling candidate
region attention map; the densely crowded regions are cropped from the image and their resolution is increased;
a cyclic scaling network module, which takes the resolution-increased densely crowded regions as input, obtains accurate
person positioning results, and updates the crowd location map with them;
a fusion counting module, which obtains an accurate crowd count value by weighting the crowd count value derived from the
crowd count density map and the crowd count value derived from the crowd location map.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910007050 | 2019-01-04 | ||
CN2019100070505 | 2019-01-04 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110188597A (en) | 2019-08-30 |
CN110188597B (en) | 2021-06-15 |
Family
ID=67714173
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910293903.6A Active CN110188597B (en) | 2019-01-04 | 2019-04-12 | Crowd counting and positioning method and system based on attention mechanism cyclic scaling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110188597B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060195199A1 (en) * | 2003-10-21 | 2006-08-31 | Masahiro Iwasaki | Monitoring device |
JP3910626B2 (en) * | 2003-10-21 | 2007-04-25 | 松下電器産業株式会社 | Monitoring device |
CN102013022A (en) * | 2010-11-23 | 2011-04-13 | 北京大学 | Selective feature background subtraction method aiming at thick crowd monitoring scene |
CN105122270A (en) * | 2012-11-21 | 2015-12-02 | 派尔高公司 | Method and system for counting people using depth sensor |
CN108764085A (en) * | 2018-05-17 | 2018-11-06 | 上海交通大学 | Based on the people counting method for generating confrontation network |
CN108805619A (en) * | 2018-06-07 | 2018-11-13 | 肇庆高新区徒瓦科技有限公司 | A kind of stream of people's statistical system for billboard |
CN109101930A (en) * | 2018-08-18 | 2018-12-28 | 华中科技大学 | A kind of people counting method and system |
-
2019
- 2019-04-12 CN CN201910293903.6A patent/CN110188597B/en active Active
Non-Patent Citations (4)
Title |
---|
JIANG LIU et al.: "DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation", 《2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 * |
YINGYING ZHANG et al.: "Single-Image Crowd Counting via Multi-Column Convolutional Neural Network", 《2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 * |
YUHONG LI et al.: "CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes", 《2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 * |
GUO WENSHENG et al.: "People counting method based on adaptive overlapping segmentation and deep neural networks", 《Computer Science》 * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112598725A (en) * | 2019-09-17 | 2021-04-02 | 佳能株式会社 | Image processing apparatus, image processing method, and computer readable medium |
CN110852267A (en) * | 2019-11-11 | 2020-02-28 | 复旦大学 | Crowd density estimation method and device based on optical flow fusion type deep neural network |
CN110852267B (en) * | 2019-11-11 | 2022-06-14 | 复旦大学 | Crowd density estimation method and device based on optical flow fusion type deep neural network |
CN111445442A (en) * | 2020-03-05 | 2020-07-24 | 中国平安人寿保险股份有限公司 | Crowd counting method and device based on neural network, server and storage medium |
CN111445442B (en) * | 2020-03-05 | 2024-04-30 | 中国平安人寿保险股份有限公司 | Crowd counting method and device based on neural network, server and storage medium |
CN111428653A (en) * | 2020-03-27 | 2020-07-17 | 湘潭大学 | Pedestrian congestion state determination method, device, server and storage medium |
CN111428653B (en) * | 2020-03-27 | 2024-02-02 | 湘潭大学 | Pedestrian congestion state judging method, device, server and storage medium |
CN111626184A (en) * | 2020-05-25 | 2020-09-04 | 齐鲁工业大学 | Crowd density estimation method and system |
CN111680648A (en) * | 2020-06-12 | 2020-09-18 | 成都数之联科技有限公司 | Training method of target density estimation neural network |
CN111950458A (en) * | 2020-08-12 | 2020-11-17 | 每步科技(上海)有限公司 | Natatorium monitoring system and method and intelligent robot |
CN112084959A (en) * | 2020-09-11 | 2020-12-15 | 腾讯科技(深圳)有限公司 | Crowd image processing method and device |
CN112084959B (en) * | 2020-09-11 | 2024-04-16 | 腾讯科技(深圳)有限公司 | Crowd image processing method and device |
CN112183627A (en) * | 2020-09-28 | 2021-01-05 | 中星技术股份有限公司 | Method for generating predicted density map network and vehicle annual inspection mark number detection method |
CN113205280A (en) * | 2021-05-28 | 2021-08-03 | 广西大学 | Electric vehicle charging station site selection method for Liqun guided attention inference network |
CN113205280B (en) * | 2021-05-28 | 2023-06-23 | 广西大学 | Electric vehicle charging station address selection method of plum cluster guiding attention reasoning network |
CN114120361A (en) * | 2021-11-19 | 2022-03-01 | 西南交通大学 | Crowd counting and positioning method based on coding and decoding structure |
CN114241411A (en) * | 2021-12-15 | 2022-03-25 | 平安科技(深圳)有限公司 | Counting model processing method and device based on target detection and computer equipment |
CN114241411B (en) * | 2021-12-15 | 2024-04-09 | 平安科技(深圳)有限公司 | Counting model processing method and device based on target detection and computer equipment |
CN114494999A (en) * | 2022-01-18 | 2022-05-13 | 西南交通大学 | Double-branch combined target intensive prediction method and system |
Also Published As
Publication number | Publication date |
---|---|
CN110188597B (en) | 2021-06-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110188597A (en) | A kind of dense population counting and accurate positioning method and system based on attention mechanism circulation scaling | |
CN107230351B (en) | A kind of Short-time Traffic Flow Forecasting Methods based on deep learning | |
CN105354548B (en) | A kind of monitor video pedestrian recognition methods again based on ImageNet retrievals | |
CN110517487B (en) | Urban area traffic resource regulation and control method and system based on thermodynamic diagram change identification | |
CN102324016B (en) | Statistical method for high-density crowd flow | |
CN109753946A (en) | A kind of real scene pedestrian's small target deteection network and detection method based on the supervision of body key point | |
CN113240688A (en) | Integrated flood disaster accurate monitoring and early warning method | |
CN109241871A (en) | A kind of public domain stream of people's tracking based on video data | |
CN109376576A (en) | The object detection method for training network from zero based on the intensive connection of alternately update | |
CN108399745A (en) | A kind of city road network trend prediction method at times based on unmanned plane | |
CN105243356B (en) | A kind of method and device that establishing pedestrian detection model and pedestrian detection method | |
CN109190507A (en) | A kind of passenger flow crowding calculation method and device based on rail transit train | |
CN109635748A (en) | The extracting method of roadway characteristic in high resolution image | |
CN109002752A (en) | A kind of complicated common scene rapid pedestrian detection method based on deep learning | |
Rong et al. | Pest identification and counting of yellow plate in field based on improved mask r-cnn | |
CN109635720A (en) | The illegal road occupying real-time detection method actively monitored based on video | |
CN110633678A (en) | Rapid and efficient traffic flow calculation method based on video images | |
Zheng et al. | A review of remote sensing image object detection algorithms based on deep learning | |
CN105224911A (en) | A kind of various visual angles pedestrian detection method and system in real time | |
CN110163060A (en) | The determination method and electronic equipment of crowd density in image | |
Zhang et al. | cst-ml: Continuous spatial-temporal meta-learning for traffic dynamics prediction | |
CN104077571B (en) | A kind of crowd's anomaly detection method that model is serialized using single class | |
Zhou et al. | Method for judging parking status based on yolov2 target detection algorithm | |
CN109977796A (en) | Trail current detection method and device | |
CN109711313A (en) | It is a kind of to identify the real-time video monitoring algorithm that sewage is toppled over into river |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||