CN110188597A - Dense crowd counting and precise localization method and system based on attention-based recurrent zooming - Google Patents


Info

Publication number
CN110188597A
Authority
CN
China
Prior art keywords
crowd
branch
scaling
image
count
Prior art date
Legal status
Granted
Application number
CN201910293903.6A
Other languages
Chinese (zh)
Other versions
CN110188597B (en)
Inventor
陈刚
刘臣臣
王成成
黄波
韩峻
糜俊青
翁昕钰
穆亚东
Current Assignee
Mid Star Technology Ltd By Share Ltd
Peking University
Original Assignee
Mid Star Technology Ltd By Share Ltd
Peking University
Priority date
Filing date
Publication date
Application filed by Mid Star Technology Ltd By Share Ltd and Peking University
Publication of CN110188597A
Application granted
Publication of CN110188597B
Legal status: Active
Anticipated expiration


Classifications

    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/045 Combinations of networks
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G06T2207/20081 Training; Learning
    • G06T2207/30196 Human being; Person
    • G06T2207/30242 Counting objects in image

Abstract

The present invention relates to a dense crowd counting and precise localization method and system based on attention-based recurrent zooming. Unlike existing crowd counting methods, which estimate the number of people either from a density map or through face or pedestrian detection, the invention uses a carefully designed three-branch deep neural network to obtain, for an input image, a crowd counting density map, a crowd location map, and a dense-region candidate attention map. The crowd counting density map gives an initial crowd count for the image; the crowd location map gives the position coordinates of each person in the image; and the dense-region candidate map identifies several densely crowded regions, which are cropped from the original image, enlarged to twice their original resolution, and fed into a subsequent network to obtain more accurate person localization results.

Description

Dense crowd counting and precise localization method and system based on attention-based recurrent zooming
Technical field
The present invention relates to dense crowd counting and precise crowd localization in images, and in particular to a method and system that precisely localize crowds using attention-based recurrent zooming. It belongs to the field of computer vision.
Background art
With the urbanization of society, the urban population has risen sharply, and video surveillance cameras are densely installed in and around many cities and are increasingly used in our daily work and lives. One of the most important applications of this video data is intelligent video surveillance. In China, with a population of 1.3 billion, the many problems caused by the sheer size of the population have long threatened public safety. Likewise, elsewhere in the world, overcrowding at large-scale events can lead to uncontrollable incidents. Effectively using surveillance data to deploy law-enforcement personnel rationally and to build additional transport facilities that guide and disperse crowds is therefore of great significance for maintaining public order and protecting personal safety. However, traditional video surveillance requires human operators to monitor and report events, which consumes a great deal of manpower and material resources. Automated video analysis and processing not only frees up labor but also mines the data, learning useful knowledge and patterns from massive amounts of video. Crowd counting, one field of video analysis, is important for crowd and pedestrian analysis, emergency monitoring, traffic planning, and many other areas.
Existing crowd counting techniques fall into two broad categories: estimating the count by integrating a density map, and estimating it through face or pedestrian detection. With the development of deep learning, many researchers use deep neural networks to learn a crowd density map and obtain the number of people in an image by integrating that map. This approach already achieves good accuracy, but its main drawback is that, although the integral of the learned density map closely matches the number of people in the image, the learned density distribution can differ considerably from the true distribution, which hinders further crowd analysis.
Deep learning has also brought significant progress to traditional object detection, so some researchers estimate the number of people by detecting the faces or pedestrians that appear in an image. This approach can give accurate positions of people and avoids the inaccurate distributions predicted by density-map methods, but it has a serious problem of its own: existing face and pedestrian detectors perform poorly in ultra-dense scenes. Crowd estimation usually involves exactly such scenes, where faces or bodies are hard to make out, so this approach struggles to give good results there.
Summary of the invention
To address the inaccurate distributions predicted by density-map methods and the poor performance of detection-based methods in dense scenes, the purpose of the present invention is to provide a dense crowd counting and precise localization solution and system based on attention-based recurrent zooming. Using deep learning, the invention proposes an attention-based recurrent zooming network that converts the problem of estimating the number of people in a dense image into two subproblems: initial crowd estimation and precise crowd localization.
Unlike existing methods that estimate the crowd size from a density map or through face or pedestrian detection, the present invention uses a carefully designed three-branch deep neural network to obtain, for an input image, a crowd counting density map, a crowd location map, and a zooming candidate region attention map. The crowd counting density map gives an initial crowd count for the image; the crowd location map gives the position coordinates of each person in the image; and the zooming candidate region attention map identifies several densely crowded regions, which are cropped from the original image, enlarged to twice their original resolution, and fed into the subsequent recurrent zooming network to obtain more accurate person localization results. Since a crowd count can be derived from both the crowd counting density map and the crowd location map, the invention further proposes a scene-adaptive weight with which the two counts are combined into a more accurate crowd estimate.
The dense crowd counting and precise localization method based on attention-based recurrent zooming of the present invention comprises the following steps:
1) A three-branch deep neural network is established to obtain, for an input image, a crowd counting density map, a crowd location map, and a zooming candidate region attention map;
2) An initial crowd count for the image is obtained from the crowd counting density map, the position coordinates of each person in the image are obtained from the crowd location map, and several densely crowded regions of the image are obtained from the zooming candidate region attention map;
3) The densely crowded regions are cropped from the image, precise localization results are obtained by increasing their resolution, and the crowd location map is updated with these results;
4) The crowd count obtained from the crowd counting density map and the crowd count obtained from the crowd location map are combined by weighting to obtain an accurate crowd count.
The method is described further below. Its detailed flow is illustrated in Fig. 1 and comprises the following steps:
Step1: Network construction and parameter initialization. As shown in Fig. 1, the proposed method comprises two main networks: the main network (MainNet) and the recurrent attention zooming network (Recurrent Attention Zooming Net, RAZNet). MainNet comprises a localization branch (Localization Branch), a counting branch (Counting Branch), and a zooming region proposal branch (Zooming Region Proposal Branch).
MainNet uses the first 13 layers of the VGG-16 network as its backbone. The localization branch consists of dilated convolutional layers and 3 deconvolutional layers and finally outputs a one-channel feature map with the same resolution as the original image; the counting branch consists only of dilated convolutional layers and outputs a feature map at 1/8 of the original image resolution. The output of the localization branch and the upsampled output of the counting branch are concatenated and used as the input of the zooming region proposal branch (Zooming Region Proposal Branch).
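To make the resolution bookkeeping in this paragraph concrete, the following Python sketch tracks only spatial sizes, not weights. It assumes that "the first 13 layers of VGG-16" means ten 3×3 convolutions plus three 2×2 max-pooling layers (conv1_1 through conv4_3), which is consistent with the counting branch's 1/8 output; the layer list and the function names are illustrative and are not taken from Table 1.

```python
# Assumed reading of "the first 13 layers of VGG-16": 10 convs + 3 max-pools.
VGG16_FRONT_13 = [
    "conv", "conv", "pool",          # block 1
    "conv", "conv", "pool",          # block 2
    "conv", "conv", "conv", "pool",  # block 3
    "conv", "conv", "conv",          # block 4 (its pool is not included)
]


def backbone_stride(layers):
    """Each 2x2, stride-2 max-pool halves H and W; same-padded 3x3 convs keep them."""
    stride = 1
    for layer in layers:
        if layer == "pool":
            stride *= 2
    return stride


def mainnet_shapes(h, w, num_deconv=3):
    """Spatial sizes (H, W) of MainNet outputs for an h x w input (hypothetical helper)."""
    s = backbone_stride(VGG16_FRONT_13)
    feat = (h // s, w // s)             # backbone output: 1/8 resolution
    counting = feat                     # dilated convs with matching padding keep the size
    loc = feat
    for _ in range(num_deconv):         # each stride-2 deconvolution doubles H and W
        loc = (loc[0] * 2, loc[1] * 2)
    upsampled_counting = (counting[0] * s, counting[1] * s)
    if upsampled_counting != loc:       # concatenation needs equal spatial sizes
        raise ValueError("localization and upsampled counting maps differ in size")
    return {"backbone": feat, "counting": counting,
            "localization": loc, "zoom_proposal_input": loc}
```

For a 512×768 input this gives a 64×96 counting map and a 512×768 localization map; three stride-2 deconvolutions exactly undo the three poolings, which is why the two branch outputs can be concatenated after the counting map is upsampled by 8.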
Compared with MainNet, RAZNet lacks the counting branch; the rest of it is identical to MainNet. The parameters of VGG-16 trained on the ImageNet dataset are used to initialize the MainNet backbone, and RAZNet is initialized with the parameters of the trained MainNet.
Step2: Model training. To ease convergence, the three branches are trained one after another in the order counting branch, localization branch, zooming region proposal branch. After MainNet has been trained, its parameters are used to initialize RAZNet, which is then fine-tuned.
Step3: Selection of the fusion weight. After training, the crowd counts produced by the localization branch and by the counting branch can be computed on the training set. Using the known ground-truth crowd count of each image, a fusion weight between the counts of the localization and counting branches is learned so that the fused prediction is closer to the ground truth.
Step4: Network inference. After training, for each test image MainNet produces a crowd density map, a crowd location map, and a zooming candidate region attention map. Several dense regions are obtained from the attention map; these regions are cropped from the original image, their height and width are enlarged to twice the original size, and the enlarged crops are passed through RAZNet to obtain new crowd location maps and zooming candidate region attention maps. Inference ends when no new dense region can be found in the zooming candidate region attention map.
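The inference loop of Step4 can be sketched as a recursion. In this Python sketch, images are plain 2-D lists; `main_net`, `raz_net` and `find_regions` are hypothetical stand-ins (the text does not say how boxes are read off the attention map); nearest-neighbour interpolation is assumed for the 2× enlargement; and `max_depth` is an added safety cap, since the text stops only when no new region is found. Points found in an enlarged crop are halved and offset back into the parent image, replacing the coarse points inside that region, which mirrors the replacement rule given later for fusing MainNet and RAZNet results.

```python
def crop(img, box):
    """Cut box = (y0, x0, y1, x1), end-exclusive, out of a 2-D list image."""
    y0, x0, y1, x1 = box
    return [row[x0:x1] for row in img[y0:y1]]


def upscale2x(img):
    """Nearest-neighbour 2x enlargement of height and width (interpolation assumed)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def zoom_inference(img, main_net, raz_net, find_regions, depth=0, max_depth=4):
    """Return head points (y, x) in img's coordinates.

    main_net / raz_net: hypothetical callables img -> (points, attention_map).
    find_regions: hypothetical callable attention_map -> list of non-overlapping boxes.
    """
    net = main_net if depth == 0 else raz_net
    points, attention = net(img)
    if depth >= max_depth:              # safety cap, not part of the described method
        return points
    for box in find_regions(attention):
        y0, x0, y1, x1 = box
        # Discard coarse points inside the dense box; the zoomed pass replaces them.
        points = [(y, x) for (y, x) in points if not (y0 <= y < y1 and x0 <= x < x1)]
        sub = zoom_inference(upscale2x(crop(img, box)), main_net, raz_net,
                             find_regions, depth + 1, max_depth)
        # Map points from the 2x-enlarged crop back to this image's coordinates.
        points += [(y0 + y / 2, x0 + x / 2) for (y, x) in sub]
    return points
```

With stub networks, a MainNet pass that flags the box (4, 4, 8, 8) and a RAZNet pass that finds heads at (2, 2) and (6, 6) in the enlarged crop yields parent-image points (5.0, 5.0) and (7.0, 7.0) in place of the coarse point inside the box.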
Step5: Obtaining the final crowd count and head coordinates. The positions of the peak points in the crowd location map are taken as the final predicted head coordinates. To obtain the peaks, non-maximum suppression (Non-Maximum Suppression, NMS) is first applied to the crowd location map, and then all locations whose response exceeds a certain threshold are taken as head positions. Using the fusion weight obtained in Step3, the fused crowd count of the counting and localization branches is computed and taken as the final crowd count.
As shown in Fig. 1, the method contains two basic network modules, MainNet and RAZNet; MainNet has three branches and RAZNet has two. The names and functions of the network modules and branches are:
1. Main network (MainNet): performs initial crowd counting and coarse crowd localization on the input image; the zooming candidate region attention map produced by this network guides the subsequent cropping and enlargement of dense regions.
2. Recurrent attention zooming network (RAZNet): localizes the crowd in the dense regions selected by MainNet, obtaining more accurate localization results for those local regions. The network itself also produces a zooming candidate region attention map, from which it decides whether to crop regions again and pass them through RAZNet once more.
3. Localization branch (Localization Branch): takes features from the backbone and, through 6 dilated convolutional layers with 3 deconvolutional layers interspersed among them, outputs a crowd location map with the same resolution as the network's input image.
4. Counting branch (Counting Branch): takes features from the backbone and, through 6 dilated convolutional layers, outputs a crowd density map whose height and width are each 1/8 of those of the network's input image.
5. Zooming region proposal branch (Zooming Region Proposal Branch): takes the features from the localization and counting branches, concatenates them as its input, and through 3 dilated convolutional layers outputs a zooming candidate region attention map with the same resolution as the network's input image.
Corresponding to the above method, the present invention also provides a dense crowd counting and precise localization system based on attention-based recurrent zooming, comprising:
A main network module, comprising the three-branch deep neural network, which obtains for an input image a crowd counting density map, a crowd location map, and a zooming candidate region attention map; it obtains an initial crowd count from the crowd counting density map, the position coordinates of each person in the image from the crowd location map, and several densely crowded regions of the image from the zooming candidate region attention map, then crops those regions from the image and increases their resolution;
A recurrent zooming network module, which takes the densely crowded regions at increased resolution as input, obtains precise person localization results, and updates the crowd location map with them;
A fusion counting module, which combines by weighting the crowd count obtained from the crowd counting density map and the crowd count obtained from the crowd location map to obtain an accurate crowd count.
Compared with existing crowd counting techniques, the dense crowd counting and precise localization method based on attention-based recurrent zooming described in the invention has the following advantages:
1. It can accurately give the positions of the people in an image.
2. The attention mechanism automatically finds densely crowded regions in the image, and increasing the resolution of those regions yields precise localization results.
3. The crowd counting and crowd localization results are fused with a scene-adaptive weight, improving the accuracy of crowd counting.
Description of the drawings
Fig. 1 is a schematic diagram of the network structure;
Fig. 2 is a schematic diagram of the densely crowded candidate regions generated from the attention map.
Detailed description of embodiments
To make the above objectives, features, and advantages of the present invention clearer and easier to understand, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
1. Generation of target data
For model training, each image is paired with its ground-truth crowd counting density map, ground-truth crowd location map (head location map), and ground-truth zooming candidate region attention map as training data.
(1) Generating the ground-truth crowd counting density map: following previous crowd counting work, the corresponding crowd density map is generated from the head coordinates given in the annotations. For each annotated head, a Gaussian kernel is introduced, and the density value D(x) at each pixel coordinate x of the ground-truth crowd counting density map is computed from these kernels. Here N is the total number of heads in the image, the N head coordinates are denoted x_1, ..., x_N, d̄_i is the average distance from x_i to its 4 nearest heads, Z_i is the normalization parameter of the Gaussian kernel for each head, and β is the distance scaling factor, which is empirically set to 0.1.
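The text lists the symbols but not the formula itself. One form consistent with those definitions, assumed here rather than taken from the patent, is the geometry-adaptive kernel $D(x)=\sum_{i=1}^{N}\frac{1}{Z_i}\exp\left(-\frac{\lVert x-x_i\rVert^2}{2(\beta\,\bar d_i)^2}\right)$. The Python sketch below implements that assumed form; taking Z_i as the kernel's sum over the image grid (so each head contributes exactly 1), the `min_sigma` floor, and the fallback for a lone head are all illustrative choices.

```python
import math


def density_map(heads, h, w, beta=0.1, k=4, min_sigma=0.5):
    """Geometry-adaptive Gaussian density map on an h x w grid (assumed form).

    heads: list of (y, x) head coordinates inside the image.
    sigma_i = beta * (mean distance from head i to its k nearest other heads).
    Z_i is the kernel's sum over the grid, so each head contributes exactly 1.
    min_sigma and the lone-head fallback (sigma = 1.0) are assumptions.
    """
    dmap = [[0.0] * w for _ in range(h)]
    for i, (yi, xi) in enumerate(heads):
        near = sorted(math.dist((yi, xi), p) for j, p in enumerate(heads) if j != i)[:k]
        sigma = max(beta * sum(near) / len(near), min_sigma) if near else 1.0
        kernel = [[math.exp(-((y - yi) ** 2 + (x - xi) ** 2) / (2 * sigma ** 2))
                   for x in range(w)] for y in range(h)]
        z = sum(map(sum, kernel))       # normalization Z_i over the grid
        for y in range(h):
            for x in range(w):
                dmap[y][x] += kernel[y][x] / z
    return dmap
```

Because every kernel is normalized on the grid, summing the map recovers the number of annotated heads up to floating-point error, which is the property the counting branch relies on when its predicted map is integrated.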
(2) Generating the ground-truth head location map: each annotated head point and its 4-neighborhood are set to 1, giving the final head location map.
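A minimal Python sketch of this target, assuming head coordinates are rounded to the nearest pixel and that neighbours falling outside the image are simply skipped (neither detail is stated in the text):

```python
def head_location_map(heads, h, w):
    """Binary target: each annotated head pixel and its 4-neighbourhood set to 1."""
    loc = [[0] * w for _ in range(h)]
    for y, x in heads:
        y, x = int(round(y)), int(round(x))      # rounding to the grid is assumed
        for dy, dx in ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:      # skip neighbours outside the image
                loc[yy][xx] = 1
    return loc
```

Each interior head therefore marks a small plus-shaped patch of 5 pixels (3 at an image corner), giving the localization branch a slightly larger positive target than a single pixel.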
(3) Generating the ground-truth zooming candidate region attention map: for each pixel in the map, the three nearest head positions are found and the mean of the three distances is computed; applying a Gaussian transform to this value gives the response at that pixel. The attention map thus reflects how densely packed the crowd is in different regions.
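The following Python sketch shows one way to compute this target. The exact Gaussian transform is not given, so the response exp(-d̄² / (2σ²)) and the bandwidth `sigma=20.0` are hypothetical choices; with them the response is close to 1 where heads are tightly packed and decays in sparse areas.

```python
import math


def zoom_attention_target(heads, h, w, sigma=20.0, k=3):
    """Per-pixel response: Gaussian of the mean distance to the k nearest heads.

    sigma is a hypothetical bandwidth; the text does not specify one.
    """
    att = [[0.0] * w for _ in range(h)]
    if not heads:
        return att
    for y in range(h):
        for x in range(w):
            nearest = sorted(math.dist((y, x), p) for p in heads)[:k]
            d = sum(nearest) / len(nearest)      # mean distance to nearest heads
            att[y][x] = math.exp(-d * d / (2 * sigma * sigma))
    return att
```

For a tight cluster of three heads plus one isolated head, the response at the cluster exceeds the response at the isolated head, which in turn exceeds that of an empty corner, matching the intent that the map reflect local crowd density.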
2. Network construction.
The present invention performs crowd counting and precise crowd localization using deep learning; the structure of the deep neural network is shown in Fig. 1. The network comprises two main networks, MainNet and RAZNet. MainNet uses the first 13 layers of VGG-16 as its backbone, followed by three parts: the localization branch, the counting branch, and the zooming region proposal branch. RAZNet consists of two parts: the localization branch and the zooming region proposal branch. The detailed configuration parameters of MainNet and RAZNet are listed in Table 1.
Table 1. Configuration parameters of MainNet and RAZNet
3. Training of MainNet and RAZNet.
MainNet is trained first. As described in Section 2, MainNet consists of three parts, the counting branch, the localization branch, and the zooming region proposal branch; to ease convergence, the model is trained in the order counting branch, localization branch, zooming region proposal branch.
(1) For the counting branch, the MSE loss between the density map output by the branch and the ground-truth density map is used as the optimization objective. Here ε_den(I) is the loss on image I, m and n are the height and width of the input image, and φ(p) and φ̂(p) are the predicted and ground-truth values at the p-th pixel of the crowd counting density map.
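The MSE formula itself is not reproduced in the text. A minimal Python sketch, assuming the common form ε_den(I) = (1/(m·n)) Σ_p (φ(p) − φ̂(p))², i.e. the squared error averaged over all pixels of the map:

```python
def density_mse(pred, gt):
    """eps_den(I): mean over pixels of (phi(p) - phi_gt(p))**2; 1/(m*n) scaling assumed."""
    m, n = len(pred), len(pred[0])
    sq = sum((p - g) ** 2 for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    return sq / (m * n)
```

The same loss would serve the zooming region proposal branch, which the text says is also trained with MSE. Note that the counting branch's map is 1/8 of the input size, so in practice m and n are the dimensions of whichever map is being compared.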
(2) After the counting branch converges, the parameters it has learned are used to initialize the localization branch. Unlike the counting branch, the localization branch uses a weighted binary cross-entropy (BCE) loss between the predicted head location map and the ground-truth head location map as its optimization objective. ε_loc(I) is the BCE loss on image I, where m and n are the height and width of the input image, Y(x_p) is the ground-truth value at the p-th pixel, ψ(p) is the predicted value at the p-th pixel, and γ is a weight, empirically set to 100. The per-pixel loss is:
$$l(x_p) = -\gamma\, Y(x_p)\log\psi(p) - \bigl(1 - Y(x_p)\bigr)\log\bigl(1 - \psi(p)\bigr)$$
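A Python sketch of this loss, assuming ε_loc(I) averages l(x_p) over all m·n pixels (the aggregation is not spelled out) and clamping ψ(p) away from 0 and 1 so the logarithms stay finite:

```python
import math


def weighted_bce(pred, gt, gamma=100.0, eps=1e-7):
    """Mean over pixels of l(x_p) = -gamma*Y*log(psi) - (1 - Y)*log(1 - psi)."""
    total, count = 0.0, 0
    for pr, gr in zip(pred, gt):
        for psi, y in zip(pr, gr):
            psi = min(max(psi, eps), 1 - eps)    # clamp to avoid log(0)
            total += -gamma * y * math.log(psi) - (1 - y) * math.log(1 - psi)
            count += 1
    return total / count
```

Presumably γ = 100 compensates for class imbalance: head pixels are a tiny fraction of the map, so an unweighted loss would be dominated by the background.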
(3) After the counting and localization branches have been trained, their parameters are fixed and training of the zooming region proposal branch begins; this branch uses an MSE loss as its optimization objective.
After MainNet has been trained, RAZNet is trained; RAZNet keeps only the localization branch and the zooming region proposal branch. Its training data differ from MainNet's: as illustrated in Fig. 2, several densely crowded regions of the original image are located from the attention map and used as RAZNet's training samples. Since RAZNet and MainNet have nearly the same network structure, RAZNet is initialized with the parameters learned by MainNet, and its localization branch and zooming region proposal branch are fine-tuned in turn.
4. Obtaining the total number of people in the image from the counting branch.
Integrating (summing over all pixels) the crowd density map produced by the counting branch gives the total number of people in the image as predicted by that branch.
5. Obtaining head coordinates and the total number of people from the localization branch.
The localization branch outputs a head location map; its local peaks must be extracted and passed through a non-maximum suppression (NMS) operation before the final head coordinates are obtained:
1) First, 3×3 average pooling is applied to the head location map to highlight possible peaks within local regions;
2) Then 3×3 max pooling is applied to the result of the first step, and the max-pooled map is compared pixel by pixel with the map before max pooling; positions where the two maps are equal are the required local peaks;
3) Among the peaks found in the second step, those whose response exceeds a certain threshold are taken as the final head coordinates;
4) Counting the resulting head coordinates gives the total number of people in the image.
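These four steps can be sketched in Python as follows. Stride-1 pooling with the window clipped at the image border, and the threshold value 0.5, are assumptions; the text says only "a certain threshold".

```python
def _pool3x3(m, op):
    """Stride-1 3x3 pooling; at borders the window is clipped to valid pixels."""
    h, w = len(m), len(m[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = [m[yy][xx] for yy in range(max(0, y - 1), min(h, y + 2))
                   for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = op(win)
    return out


def find_heads(loc_map, threshold=0.5):
    """Peak extraction on a head location map; threshold is a hypothetical value."""
    smooth = _pool3x3(loc_map, lambda v: sum(v) / len(v))   # step 1: 3x3 average pooling
    peak = _pool3x3(smooth, max)                            # step 2: 3x3 max pooling
    return [(y, x)                                          # step 3: local max above threshold
            for y in range(len(smooth)) for x in range(len(smooth[0]))
            if smooth[y][x] == peak[y][x] and smooth[y][x] > threshold]
```

Step 4 is then `len(find_heads(loc_map))`. Comparing the map with its max-pooled version keeps only pixels that are the maximum of their 3×3 neighbourhood, which is a cheap form of NMS; the threshold discards the flat zero background, where every pixel trivially equals its neighbourhood maximum.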
6. Learning the fusion weight of the localization and counting branches according to the scene.
After training, the crowd counts of the localization branch and of the counting branch on the training set are obtained following Sections 4 and 5. Using the known ground-truth crowd count of each image, a fusion weight between the counts of the localization and counting branches (denoted α in Fig. 1) is learned so that the prediction is closer to the ground truth. For example, when the crowd count from the counting branch and the crowd count from the localization branch differ by more than 150, the value from the counting branch is more accurate, and the counting branch's result is used.
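The text does not say how α is learned or which branch it multiplies. The Python sketch below assumes fused = α·c_cnt + (1 − α)·c_loc and fits α by ordinary least squares on the training images, which has the closed form α = Σ(c_cnt − c_loc)(g − c_loc) / Σ(c_cnt − c_loc)², clipped to [0, 1]. The 150-person override comes from the text; skipping those images during fitting is an added assumption, and both function names are hypothetical.

```python
def fit_alpha(cnt, loc, gt, gap=150):
    """Least-squares alpha for fused = alpha*cnt + (1 - alpha)*loc, clipped to [0, 1].

    Images where the two branch counts differ by more than `gap` are skipped,
    because the override rule in fuse() decides those cases.
    """
    pairs = [(c, l, g) for c, l, g in zip(cnt, loc, gt) if abs(c - l) <= gap]
    num = sum((c - l) * (g - l) for c, l, g in pairs)
    den = sum((c - l) ** 2 for c, l, _ in pairs)
    if den == 0:
        return 0.5      # branches agree everywhere, so any alpha gives the same result
    return min(1.0, max(0.0, num / den))


def fuse(c_cnt, c_loc, alpha, gap=150):
    """Final count: trust the counting branch when the two counts differ by > gap."""
    if abs(c_cnt - c_loc) > gap:
        return c_cnt
    return alpha * c_cnt + (1 - alpha) * c_loc
```

For training counts (100, 80) and (200, 220) with ground truths 90 and 210, the fit gives α = 0.5, so `fuse(100, 80, 0.5)` returns 90.0, while `fuse(1000, 700, 0.5)` returns 1000 because the gap exceeds 150.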
7. Fusing the results of the localization branches of MainNet and RAZNet.
By the design of the network, RAZNet produces a precise localization result for a particular region of the original image, which should in principle be more accurate than the result of MainNet's localization branch. Replacing MainNet's detections within each such region with RAZNet's detections for that region completes the precise head localization.
8. Obtaining a scene-adaptive fusion weight and fusing the density-map-based and detection-based results to improve head-counting accuracy.
Using the fusion weight between the localization and counting branches learned in Section 6, the results of the two branches on the test set are fused to obtain the final crowd count.
The performance of the present invention on three datasets commonly used for crowd counting, ShanghaiTech_A, ShanghaiTech_B and UCF_QNRF, is shown in Table 2. On both evaluation metrics, mean absolute error (Mean Absolute Error, MAE) and mean squared error (Mean Squared Error, MSE), it outperforms previous methods. A "-" in the table indicates that the method did not report results on that dataset.
Table 2. Comparison of the present invention with other methods
The methods compared with the present invention are MCNN (Y. Zhang, D. Zhou, S. Chen, S. Gao, and Y. Ma. Single-image crowd counting via multi-column convolutional neural network. In CVPR, 2016), Switch-CNN (D. B. Sam, S. Surya, and R. V. Babu. Switching convolutional neural network for crowd counting. In CVPR, 2017), CP-CNN (V. A. Sindagi and V. M. Patel. Generating high-quality crowd density maps using contextual pyramid CNNs. In ICCV, 2017), and CSRNet (Y. Li, X. Zhang, and D. Chen. CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes. In CVPR, 2018).
Another embodiment of the present invention provides a dense crowd counting and precise localization system based on attention-based recurrent zooming, comprising:
A main network module, comprising the three-branch deep neural network, which obtains for an input image a crowd counting density map, a crowd location map, and a zooming candidate region attention map; it obtains an initial crowd count from the crowd counting density map, the position coordinates of each person in the image from the crowd location map, and several densely crowded regions of the image from the zooming candidate region attention map, then crops those regions from the image and increases their resolution;
A recurrent zooming network module, which takes the densely crowded regions at increased resolution as input, obtains precise person localization results, and updates the crowd location map with them;
A fusion counting module, which combines by weighting the crowd count obtained from the crowd counting density map and the crowd count obtained from the crowd location map to obtain an accurate crowd count.
In the present invention, the MainNet backbone can be changed from VGG16 to a stronger model such as VGG19 or the ResNet series; a stronger backbone can yield better results.
In the present invention, when RAZNet is trained, the resolution can be enlarged to twice the original or to a higher multiple, as far as GPU memory allows.
The above embodiments are merely illustrative of the technical solution of the present invention and do not limit it. Those of ordinary skill in the art may modify the technical solution of the present invention or replace it with equivalents without departing from the principle and scope of the present invention; the scope of protection of the present invention shall be defined by the claims.

Claims (10)

1. A dense crowd counting and precise localization method based on attention-based recurrent zooming, characterized by comprising the following steps:
1) establishing a three-branch deep neural network to obtain, for an input image, a crowd counting density map, a crowd location map, and a zooming candidate region attention map;
2) obtaining an initial crowd count for the image from the crowd counting density map, obtaining the position coordinates of each person in the image from the crowd location map, and obtaining several densely crowded regions of the image from the zooming candidate region attention map;
3) cropping the densely crowded regions from the image, obtaining precise localization results by increasing their resolution, and updating the crowd location map with these results;
4) combining by weighting the crowd count obtained from the crowd counting density map and the crowd count obtained from the crowd location map to obtain an accurate crowd count.
2. the method according to claim 1, wherein three branch deep neural network constitute master network, The master network includes positioning branch, counter branch and scaling candidate region branch;The positioning branch is by empty convolutional layer It is constituted with 3 warp laminations, finally exports one layer of crowd's location map identical with original image resolution sizes;The meter Number branch is only made of empty convolutional layer, which exports crowd's count density figure of 1/8 size of original image resolution ratio;It will determine The characteristic pattern of position branch and counter branch output, which is done, to be spliced, and as the input of the scaling candidate region branch, the scaling is waited Convolutional layer is rolled up by 3 cavities by favored area branch, exports scaling candidate region identical with input image resolution size and pays attention to Try hard to.
3. The method according to claim 1 or 2, characterized in that increasing the resolution means enlarging it to twice the original resolution.
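Step 3 crops a dense region, enlarges it to twice its size, and writes the refined detections back into the full-image location map. That write-back requires mapping coordinates from the enlarged crop back to the original image. The following is a minimal pure-Python sketch, assuming nearest-neighbour enlargement (the claims do not name an interpolation method) and ignoring half-pixel offsets. All names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive


def crop(image: Grid, box: Box) -> Grid:
    """Cut the region `box` out of a row-major image."""
    r0, c0, r1, c1 = box
    return [row[c0:c1] for row in image[r0:r1]]


def upscale2x(patch: Grid) -> Grid:
    """Nearest-neighbour 2x enlargement: every pixel becomes a 2x2 block."""
    out: Grid = []
    for row in patch:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def to_original(point: Tuple[float, float], box: Box, scale: int = 2) -> Tuple[float, float]:
    """Map a (row, col) detected in the enlarged crop back to original-image coordinates."""
    r, c = point
    return (box[0] + r / scale, box[1] + c / scale)
```

For example, a head found at (5, 3) in the enlarged crop of box (10, 20, 30, 40) lands at (12.5, 21.5) in the original image.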
4. The method according to claim 2, characterized in that obtaining precise localization results by increasing the resolution comprises feeding the densely crowded regions, after their resolution has been increased, into a recurrent zooming network to obtain precise person localization results. The recurrent zooming network contains no counting branch and is otherwise identical to the master network.
5. The method according to claim 4, characterized in that the recurrent zooming network itself produces a zoom-candidate-region attention map. Based on this map, it decides whether to crop regions again and pass them through the recurrent zooming network once more. This continues until no new densely crowded region can be found in the zoom-candidate-region attention map.
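The loop of claims 4 and 5 can be sketched as a recursion. Here `zoom_net` is a hypothetical stand-in for the recurrent zooming network. It is assumed to return head points and new dense sub-regions already mapped to original-image coordinates. The cumulative doubling of the scale at each level and the `max_depth` cap are assumptions added for illustration. The claims only state that zooming continues until no new dense region is found.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive
# Hypothetical interface: zoom_net(box, scale) runs the recurrent zooming network on
# `box` enlarged by `scale` and returns (head points, new dense sub-boxes), both
# already mapped back to original-image coordinates.
ZoomNet = Callable[[Box, int], Tuple[List[Point], List[Box]]]


def inside(p: Point, box: Box) -> bool:
    return box[0] <= p[0] < box[2] and box[1] <= p[1] < box[3]


def recurrent_zoom(box: Box, points: List[Point], zoom_net: ZoomNet,
                   scale: int = 2, max_depth: int = 4) -> List[Point]:
    """Refine `points` inside `box`, recursing until no new dense region appears."""
    if max_depth == 0:  # safety cap (an assumption, not part of the claims)
        return points
    refined, sub_boxes = zoom_net(box, scale)
    # The zoomed-in detections replace the coarser ones inside this box.
    points = [p for p in points if not inside(p, box)] + refined
    for sub in sub_boxes:
        if sub != box:  # an unchanged box is not a *new* dense region
            points = recurrent_zoom(sub, points, zoom_net, scale * 2, max_depth - 1)
    return points
```

Recursion stops naturally when `zoom_net` reports no sub-regions. The depth cap only guards against a network that keeps proposing regions indefinitely.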
6. The method according to claim 4, characterized in that the three branches of the master network are trained one after another, in the order counting branch, localization branch, zoom-candidate-region branch. The parameters of the trained master network are then used to initialize the recurrent zooming network, which is fine-tuned.
7. The method according to claim 6, characterized in that the counting branch is trained by gradient updates of its model parameters. Its objective function is the MSE loss between the crowd density map output by the branch and the ground-truth crowd density map. After the counting branch converges, its learned parameters are fixed and used to initialize the localization branch. The localization branch is then trained by gradient updates, with a weighted binary cross-entropy (BCE) loss between the predicted head-location map and the ground-truth head-location map as its objective function. After the counting and localization branches have been learned, their parameters are fixed and training of the zoom-candidate-region branch begins, with an MSE loss as its objective function.
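The two objective functions of claim 7 are sketched below on flattened maps. The positive-class weight `pos_weight` is a hypothetical choice. Head pixels are sparse, so up-weighting them is one common reason for a weighted BCE, but the claim does not state how the weights are set.

```python
import math
from typing import Sequence


def mse_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over flattened maps (counting and zoom-attention branches)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def weighted_bce_loss(pred: Sequence[float], target: Sequence[float],
                      pos_weight: float = 10.0, eps: float = 1e-7) -> float:
    """Binary cross-entropy with head pixels (target 1) up-weighted by `pos_weight`."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= pos_weight * t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)
```

Under the schedule of claims 6 and 7, `mse_loss` drives the counting branch first. `weighted_bce_loss` then drives the localization branch, initialized from the counting branch's learned parameters. Finally, `mse_loss` drives the zoom-candidate-region branch while both earlier branches stay fixed.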
8. The method according to claim 1, characterized in that the weights used to obtain the accurate crowd count by weighting are obtained as follows:
a) on a training set, obtaining one crowd count from the crowd density map and another from the crowd location map;
b) using the known ground-truth crowd count of each image, learning fusion weights between the two crowd counts obtained in step a).
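Claim 8 does not say how the fusion weights are learned. One simple possibility, assumed here, is linear least squares without a bias term. The weights a and b minimize the sum over training images of (a*d_i + b*l_i - y_i)^2, where d_i and l_i are the density-map and location-map counts and y_i is the ground truth. The 2x2 normal equations then have the closed-form solution used below. Function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_fusion_weights(density_counts: Sequence[float], location_counts: Sequence[float],
                       true_counts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares weights (a, b) minimising sum((a*d + b*loc - y)^2) over the training set."""
    sdd = sum(d * d for d in density_counts)
    sll = sum(loc * loc for loc in location_counts)
    sdl = sum(d * loc for d, loc in zip(density_counts, location_counts))
    sdy = sum(d * y for d, y in zip(density_counts, true_counts))
    sly = sum(loc * y for loc, y in zip(location_counts, true_counts))
    det = sdd * sll - sdl * sdl  # determinant of the 2x2 normal equations
    if abs(det) < 1e-12:
        raise ValueError("the two count estimates are collinear on this data")
    a = (sdy * sll - sdl * sly) / det  # Cramer's rule
    b = (sdd * sly - sdl * sdy) / det
    return a, b


def fuse(density_count: float, location_count: float, weights: Tuple[float, float]) -> float:
    """Weighted combination of the two counts for a new image."""
    a, b = weights
    return a * density_count + b * location_count
```

A bias term or a constraint such as a + b = 1 are equally plausible variants. Which one the patent uses is not stated.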
9. The method according to claim 1, characterized in that the crowd count is obtained from the crowd location map as follows:
a) applying non-maximum suppression to the crowd location map, then taking all location points whose response exceeds a given threshold as peak points;
b) taking the positions of the peak points in the crowd location map as head-location coordinates;
c) counting the head-location coordinates to obtain the total number of persons appearing in the image.
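The peak-picking procedure of claim 9 can be sketched as follows. It assumes non-maximum suppression over a (2*radius+1) x (2*radius+1) window; the window size, the threshold, and the function name are hypothetical. Keeping a point only if it is both a local maximum and above the threshold gives the same set as suppressing first and thresholding second. The crowd count is then the number of returned coordinates.

```python
from typing import List, Tuple

Grid = List[List[float]]


def find_heads(loc_map: Grid, threshold: float = 0.5, radius: int = 1) -> List[Tuple[int, int]]:
    """Head coordinates: local maxima of `loc_map` whose response exceeds `threshold`.

    A point survives non-maximum suppression if no neighbour in its window is
    larger; on ties, the earlier point in raster order wins.
    """
    h, w = len(loc_map), len(loc_map[0])
    heads: List[Tuple[int, int]] = []
    for r in range(h):
        for c in range(w):
            v = loc_map[r][c]
            if v <= threshold:
                continue
            is_peak = True
            for rr in range(max(0, r - radius), min(h, r + radius + 1)):
                for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                    nv = loc_map[rr][cc]
                    if (rr, cc) != (r, c) and (nv > v or (nv == v and (rr, cc) < (r, c))):
                        is_peak = False
            if is_peak:
                heads.append((r, c))
    return heads
```

For example, `len(find_heads(loc_map))` gives the location-map crowd count that claim 8 fuses with the density-map count.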
10. A dense crowd counting and precise localization system based on attention-guided recurrent zooming, characterized by comprising:
a master network module comprising the three-branch deep neural network. For an input image, it obtains a crowd density map, a crowd location map, and a zoom-candidate-region attention map. It obtains an initial crowd count from the crowd density map, the coordinates of every person in the image from the crowd location map, and several densely crowded regions of the image from the zoom-candidate-region attention map. It then crops the densely crowded regions out of the image and increases their resolution;
a recurrent zooming network module, which takes as input the densely crowded regions with increased resolution, obtains precise person localization results from them, and updates the crowd location map with these results;
a fusion counting module, which combines, by weighting, the crowd count obtained from the crowd density map and the crowd count obtained from the crowd location map to obtain an accurate crowd count.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910007050 2019-01-04
CN2019100070505 2019-01-04

Publications (2)

Publication Number Publication Date
CN110188597A 2019-08-30
CN110188597B 2021-06-15

Family

ID=67714173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910293903.6A Active CN110188597B 2019-01-04 2019-04-12 Crowd counting and positioning method and system based on attention mechanism cyclic scaling

Country Status (1)

Country Link
CN (1) CN110188597B

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060195199A1 * 2003-10-21 2006-08-31 Masahiro Iwasaki Monitoring device
JP3910626B2 * 2003-10-21 2007-04-25 Matsushita Electric Industrial Co., Ltd. Monitoring device
CN102013022A * 2010-11-23 2011-04-13 Peking University Selective-feature background subtraction method for dense crowd surveillance scenes
CN105122270A * 2012-11-21 2015-12-02 Pelco, Inc. Method and system for counting people using a depth sensor
CN108764085A * 2018-05-17 2018-11-06 Shanghai Jiao Tong University People counting method based on a generative adversarial network
CN108805619A * 2018-06-07 2018-11-13 Zhaoqing High-tech Zone Tuwa Technology Co., Ltd. Pedestrian-flow statistics system for billboards
CN109101930A * 2018-08-18 2018-12-28 Huazhong University of Science and Technology People counting method and system

Non-Patent Citations (4)

Title
JIANG LIU et al.: "DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
YINGYING ZHANG et al.: "Single-Image Crowd Counting via Multi-Column Convolutional Neural Network", 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
YUHONG LI et al.: "CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
GUO WENSHENG et al.: "A People Counting Method Based on Adaptive Overlapping Segmentation and Deep Neural Networks", Computer Science *

Cited By (19)

Publication number Priority date Publication date Assignee Title
CN112598725A * 2019-09-17 2021-04-02 Canon Inc. Image processing apparatus, image processing method, and computer-readable medium
CN110852267A * 2019-11-11 2020-02-28 Fudan University Crowd density estimation method and device based on an optical-flow-fusion deep neural network
CN110852267B * 2019-11-11 2022-06-14 Fudan University Crowd density estimation method and device based on an optical-flow-fusion deep neural network
CN111445442A * 2020-03-05 2020-07-24 Ping An Life Insurance Company of China, Ltd. Crowd counting method and device based on a neural network, server, and storage medium
CN111445442B * 2020-03-05 2024-04-30 Ping An Life Insurance Company of China, Ltd. Crowd counting method and device based on a neural network, server, and storage medium
CN111428653A * 2020-03-27 2020-07-17 Xiangtan University Pedestrian congestion state determination method, device, server, and storage medium
CN111428653B * 2020-03-27 2024-02-02 Xiangtan University Pedestrian congestion state determination method, device, server, and storage medium
CN111626184A * 2020-05-25 2020-09-04 Qilu University of Technology Crowd density estimation method and system
CN111680648A * 2020-06-12 2020-09-18 Chengdu Shuzhilian Technology Co., Ltd. Training method for a target density estimation neural network
CN111950458A * 2020-08-12 2020-11-17 Meibu Technology (Shanghai) Co., Ltd. Natatorium monitoring system and method, and intelligent robot
CN112084959A * 2020-09-11 2020-12-15 Tencent Technology (Shenzhen) Co., Ltd. Crowd image processing method and device
CN112084959B * 2020-09-11 2024-04-16 Tencent Technology (Shenzhen) Co., Ltd. Crowd image processing method and device
CN112183627A * 2020-09-28 2021-01-05 Zhongxing Technology Co., Ltd. Method for generating a density-map prediction network and method for detecting the number of vehicle annual-inspection marks
CN113205280A * 2021-05-28 2021-08-03 Guangxi University Electric vehicle charging station siting method based on a Lie-group-guided attention reasoning network
CN113205280B * 2021-05-28 2023-06-23 Guangxi University Electric vehicle charging station siting method based on a Lie-group-guided attention reasoning network
CN114120361A * 2021-11-19 2022-03-01 Southwest Jiaotong University Crowd counting and localization method based on an encoder-decoder structure
CN114241411A * 2021-12-15 2022-03-25 Ping An Technology (Shenzhen) Co., Ltd. Counting-model processing method and device based on object detection, and computer equipment
CN114241411B * 2021-12-15 2024-04-09 Ping An Technology (Shenzhen) Co., Ltd. Counting-model processing method and device based on object detection, and computer equipment
CN114494999A * 2022-01-18 2022-05-13 Southwest Jiaotong University Dual-branch joint dense object prediction method and system

Also Published As

Publication number Publication date
CN110188597B 2021-06-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant